CN106601250A - Speech control method, device, and equipment

Info

Publication number
CN106601250A
CN106601250A
Authority
CN
China
Prior art keywords
word
node
isolated
voice
keyword
Prior art date
Legal status (assumed; not a legal conclusion)
Pending
Application number
CN201510765415.2A
Other languages
Chinese (zh)
Inventor
刘芨可
Current Assignee
Individual
Original Assignee
Individual
Priority date (assumed; not a legal conclusion)
Filing date
Publication date
Application filed by Individual
Priority to CN201510765415.2A
Publication of CN106601250A
Legal status: Pending

Abstract

The invention discloses a speech control method, device, and equipment. The method comprises the following steps: obtaining a first isolated word, the first isolated word being a spoken word; judging whether the first isolated word matches a preset first keyword, the first keyword being any keyword in the keyword set of a first node; when the first isolated word matches the first keyword, obtaining a preset second keyword set according to the first keyword, the second keyword set being the keyword set of a second node, and the second node being a child node of the first node; obtaining a second isolated word; judging whether the second isolated word matches a second keyword, the second keyword being any keyword in the second keyword set; and when the second isolated word matches the second keyword, executing the instruction corresponding to the second keyword.

Description

Speech control method, device, and equipment
Technical field
The present invention relates to electronic technology, and more particularly to a speech control method, device, and equipment.
Background art
Voice control is a leading topic in today's smart-device control field; its aim is to let equipment carry out predetermined behavior accurately in response to a person's spoken commands. The world's major information technology (IT) companies have each released their own speech recognition engines, such as Apple's SIRI, Google's Google Now, and Microsoft's Cortana. Domestic IT companies have also launched their own speech recognition services, such as Baidu's voice assistant. These voice platforms showcase the appeal of controlling equipment by speech: equipment begins to understand human language and to act according to our wishes. Regrettably, however, voice control has so far not seen large-scale use; it is employed only in settings where real-time and accuracy requirements are not high.
The most important part of voice control is speech recognition. According to the 2011 annual report of the Chinese Information Processing Society of China, recognition accuracy for English in complex environments such as telephone calls and meetings is around 80%, still a long way from the human error rate of 2% to 4%. In fact, speech recognition remains far from the ultimate ideal in many respects:
1) Recognition and understanding of natural language. Continuous speech must first be decomposed into units such as words and phonemes, and rules for understanding the semantics must then be established. In reality, human speech is vast and its semantics endlessly varied; even exchanges between ordinary people carry a 2% to 4% error rate, let alone a machine.
2) The volume of speech information is large. Speech patterns differ not only between speakers but also for the same speaker; for example, a speaker's speech signal differs between casual and deliberate speaking, and a person's way of speaking also changes over time.
3) Ambiguity of speech. Different words may sound alike to a speaker, which is fairly common in both English and Chinese; for example, when a person says the Chinese pinyin "feiji", it might mean Fiji, an aircraft, or a fat chicken.
4) The acoustic characteristics of single letters, words, and phrases are affected by context, which changes stress, tone, volume, speaking rate, and so on.
5) Environmental noise and interference severely affect speech recognition, leading to a low recognition rate.
6) Audio signal processing technology is limited; two pronunciations that the human ear can clearly distinguish may have nearly identical acoustic features to a speech recognition engine.
Current speech control systems mainly take the form of a voice assistant on the equipment. In operation, the voice assistant collects the user's voice through the voice-capture component on the equipment, analyzes the collected voice signal, makes a semantic judgment, and then controls the equipment accordingly. Fig. 1-1 is the workflow diagram of a typical voice assistant in the related art. As shown in Fig. 1-1, the workflow of the voice assistant includes:
Step S101, the operating system loads and starts the voice assistant;
Step S102, the speech recognition engine in the voice assistant obtains voice input from the voice-capture component;
Here, the user speaks, the voice-capture component on the equipment collects the user's voice input, and the speech recognition engine in the voice assistant then receives the voice input from the voice-capture component;
Step S103, the voice assistant judges whether semantic analysis succeeded; if yes, proceed to step S104, otherwise proceed to step S105;
Here, after complex speech signal processing and semantic analysis, the assistant judges whether the analysis succeeded. If it succeeded, it proceeds to step S104 and executes the instruction obtained from semantic analysis; if the analysis did not succeed, it proceeds to step S105, and the voice assistant prompts "please say it again".
Step S104, executing the instruction obtained from semantic analysis;
Step S105, the voice assistant prompts the user to "please say it again".
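For orientation, the workflow above amounts to the following loop. This is a minimal sketch; capture_voice, parse_semantics, execute, and prompt are invented stand-ins for the assistant's components, not names from any actual product:

```python
# Minimal sketch of the related-art voice-assistant loop of Fig. 1-1.

def assistant_loop(capture_voice, parse_semantics, execute, prompt):
    while True:
        speech = capture_voice()            # S102: obtain voice input
        command = parse_semantics(speech)   # S103: speech and semantic analysis
        if command is not None:
            execute(command)                # S104: run the parsed instruction
        else:
            prompt("Please say it again.")  # S105: ask the user to repeat
```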
In this scheme, whatever the parsed instruction is, the voice assistant executes it, whether or not it reflects the user's intention. Even when prompted to repeat, the user does not know how to improve recognition accuracy, so the user simply repeats the utterance, and this repetition does not improve accuracy. It can be seen that the accuracy of this voice control approach is low; for example, it is difficult to locate and control a target precisely, so it is unsuitable for low-fault-tolerance environments such as precision instrument control.
Summary of the invention
In view of this, embodiments of the present invention provide a speech control method, device, and equipment to solve at least one problem in the prior art: they can improve the accuracy of voice instruction recognition, ensure that the parsed instruction reflects the user's intention, and thereby improve the user experience.
The technical solutions of the embodiments of the present invention are realized as follows:
In a first aspect, an embodiment of the present invention provides a speech control method, the method comprising: obtaining a first isolated word, the first isolated word being a spoken word; judging whether the first isolated word matches a preset first keyword to obtain a first judgment result, the first keyword being any keyword in the keyword set of a first node; when the first judgment result shows that the first isolated word matches the first keyword, obtaining a preset second keyword set according to the first keyword, the second keyword set being the keyword set of a second node, the second node being a child node of the first node; obtaining a second isolated word; judging whether the second isolated word matches a second keyword to obtain a second judgment result, the second keyword being any keyword in the second keyword set; and when the second judgment result shows that the second isolated word matches the second keyword, executing the instruction corresponding to the second keyword.
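For concreteness, the two-level matching of the first aspect could be sketched as follows. This is an illustrative reading of the claim, assuming keyword sets are plain Python sets; get_isolated_word, child_keywords, and actions are hypothetical stand-ins for the recognizer and the preset associations:

```python
# Illustrative sketch of the first-aspect method; all names are
# assumptions, not definitions from the patent text.

def voice_control(get_isolated_word, first_keywords, child_keywords, actions):
    """first_keywords: keyword set of the first node.
    child_keywords: maps a first keyword to its child node's keyword set.
    actions: maps a second keyword to the instruction to execute."""
    first = get_isolated_word()           # obtain the first isolated word
    if first not in first_keywords:       # first judgment result: no match
        return None
    second_set = child_keywords[first]    # preset second keyword set
    second = get_isolated_word()          # obtain the second isolated word
    if second in second_set:              # second judgment result: match
        return actions[second]()          # execute the corresponding instruction
    return None
```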
In a second aspect, an embodiment of the present invention provides a speech control device, the device comprising: a first acquisition unit for obtaining a first isolated word, the first isolated word being a spoken word; a first judging unit for judging whether the first isolated word matches a preset first keyword to obtain a first judgment result, the first keyword being any keyword in the keyword set of a first node; a second acquisition unit for obtaining, when the first judgment result shows that the first isolated word matches the first keyword, a preset second keyword set according to the first keyword, the second keyword set being the keyword set of a second node, the second node being a child node of the first node; a third acquisition unit for obtaining a second isolated word; a second judging unit for judging whether the second isolated word matches a second keyword to obtain a second judgment result, the second keyword being any keyword in the second keyword set; and an execution unit for executing, when the second judgment result shows that the second isolated word matches the second keyword, the instruction corresponding to the second keyword.
In a third aspect, an embodiment of the present invention provides voice control equipment, the equipment comprising: a memory for storing the first keyword set and the second keyword set; and a processor for obtaining a first isolated word, the first isolated word being a spoken word; judging whether the first isolated word matches a preset first keyword to obtain a first judgment result, the first keyword being any keyword in the keyword set of a first node; when the first judgment result shows that the first isolated word matches the first keyword, obtaining a preset second keyword set according to the first keyword, the second keyword set being the keyword set of a second node, the second node being a child node of the first node; obtaining a second isolated word; judging whether the second isolated word matches a second keyword to obtain a second judgment result, the second keyword being any keyword in the second keyword set; and when the second judgment result shows that the second isolated word matches the second keyword, executing the instruction corresponding to the second keyword.
Embodiments of the present invention provide a speech control method, device, and equipment, wherein: a first isolated word is obtained; whether the first isolated word matches a preset first keyword is judged, the first keyword being any keyword in the keyword set of a first node; when the first isolated word matches the first keyword, a preset second keyword set is obtained according to the first keyword, the second keyword set being the keyword set of a second node, the second node being a child node of the first node; a second isolated word is obtained; whether the second isolated word matches a second keyword is judged, the second keyword being any keyword in the second keyword set; and when the second isolated word matches the second keyword, the instruction corresponding to the second keyword is executed. In this way, the accuracy of voice instruction recognition can be improved, ensuring that the parsed instruction reflects the user's intention and thereby improving the user experience.
Description of the drawings
Fig. 1-1 is the workflow diagram of a typical voice assistant in the related art;
Fig. 1-2 is a flowchart of the speech control method of embodiment one of the present invention;
Fig. 1-3 is a schematic diagram of the correspondence between the first keyword and the second keyword set in an embodiment of the present invention;
Fig. 2-1 is a schematic diagram of the hardware structure of the electronic equipment in embodiment two of the present invention;
Fig. 2-2 is a first flowchart of the speech control method of embodiment two of the present invention;
Fig. 2-3 is a second flowchart of the speech control method of embodiment two of the present invention;
Fig. 2-4 is a third flowchart of the speech control method of embodiment two of the present invention;
Fig. 3 is a flowchart of the speech control method of embodiment three of the present invention;
Fig. 4 is a flowchart of the speech control method of embodiment four of the present invention;
Fig. 5 is a flowchart of the speech control method of embodiment five of the present invention;
Fig. 6-1 is a schematic diagram of the correspondence between nodes in an embodiment of the present invention;
Fig. 6-2 is a schematic diagram of the correspondence between each functional node, acting as a parent node, and its child nodes;
Fig. 6-3 is a schematic diagram of the nodes when a search engine is voice-controlled to search for a target object in an embodiment of the present invention;
Fig. 7 is a schematic diagram of the composition of the speech control system as implemented in embodiment seven of the present invention;
Fig. 8 is a flowchart of the speech control method of embodiment eight of the present invention;
Fig. 9-1 is a schematic diagram of the correspondence between parent and child nodes in embodiment nine of the present invention;
Fig. 9-2 is a flowchart of the speech control method of embodiment nine of the present invention;
Fig. 10-1 is a schematic diagram of the correspondence between parent and child nodes in the "Three Heroes Fighting Lü Bu" game embodiment of the present invention;
Fig. 10-2 is a flowchart of the speech control method of embodiment ten of the present invention;
Fig. 11 is a schematic diagram of the correspondence between parent and child nodes in the smart-home embodiment eleven of the present invention;
Fig. 12 is a schematic diagram of the correspondence between parent and child nodes in the fighter-plane voice control embodiment twelve of the present invention;
Fig. 13 is a schematic diagram of the composition of the speech control device of an embodiment of the present invention;
Fig. 14 is a control system diagram of a mobile phone voice assistant in the prior art when playing the Sun Nan song "Save";
Fig. 15 is a control system diagram of the speech control method provided by an embodiment of the present invention when playing the Sun Nan song "Save".
Specific embodiments
The embodiments of the present invention below use the isolated word recognition technique from speech recognition. Isolated word recognition performs spectrum analysis and speech feature extraction on the input speech signal, then matches the result against a limited keyword list and takes the highest-scoring keyword as the recognition result. Before spectrum analysis, the speech signal is preprocessed; preprocessing includes steps such as pre-filtering, sampling and quantization, windowing, endpoint detection, and pre-emphasis. The purpose of speech feature extraction is to derive a time-varying sequence of speech features from the speech waveform.
The theory and practice of small-vocabulary isolated word recognition are by now quite mature. For example, the dynamic time warping (DTW) algorithm solves the problem of matching templates of unequal length, and in settings with little environmental interference, the accuracy of isolated word recognition is already close to 100%, fully adequate for most intelligent control applications. It follows that the speech control method provided by the embodiments of the present invention can be guaranteed in terms of control accuracy.
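As an illustration of the template matching just mentioned, the following is a textbook DTW distance, using one scalar feature per frame for brevity. It is generic reference code, not code from the patent:

```python
# Dynamic time warping (DTW) distance between two feature sequences,
# the unequal-length template-matching step referred to above.

def dtw_distance(a, b):
    n, m = len(a), len(b)
    inf = float("inf")
    d = [[inf] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])       # local frame distance
            d[i][j] = cost + min(d[i - 1][j],      # insertion
                                 d[i][j - 1],      # deletion
                                 d[i - 1][j - 1])  # match
    return d[n][m]

def recognize(features, templates):
    # Pick the keyword whose template warps to the input most cheaply.
    return min(templates, key=lambda w: dtw_distance(features, templates[w]))
```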
The technical solution of the present invention is further elaborated below with reference to the accompanying drawings and specific embodiments.
Embodiment one
To solve the problems described in the background above, an embodiment of the present invention provides a speech control method. The method is applied to electronic equipment, and the functions it realizes can be implemented by a processor in the electronic equipment calling program code; the program code can, of course, be stored in a computer storage medium. The electronic equipment therefore includes at least a processor and a storage medium.
Fig. 1-2 is a flowchart of the speech control method of embodiment one of the present invention. As shown in Fig. 1-2, the speech control method includes:
Step S111, obtaining a first isolated word;
Here, in concrete implementations the electronic equipment can be any equipment with a processor and a storage medium, such as the mobile terminals, wearable terminals, and fixed terminals common in daily life, and can also include vehicle-mounted terminals, bank transaction terminals, supermarket transaction terminals, and express parcel delivery terminals. Mobile terminals can include at least mobile phones, tablet computers, personal digital assistants (PDA), navigators, game consoles, smart toys, and the like; wearable terminals can include at least smart watches, smart glasses, smart running shoes, and the like; fixed terminals can include at least desktop computers, all-in-one computers, television sets, projectors, audio equipment, and the like. The "smart" in smart toys and smart watches above means that the equipment includes a processor and a storage medium, so that it can execute programmed instructions automatically or according to the settings of an operator such as the user.
Here, the first isolated word is a spoken word. In concrete implementations there are various ways to obtain the first isolated word. For example, when the electronic equipment includes a voice-capture component such as a microphone, the processor in the electronic equipment can call the voice-capture component to collect the voice of an operator such as the user, and then obtain the first isolated word by isolated word recognition. When the electronic equipment does not include a voice-capture component, it can also receive a voice signal sent by other equipment and then obtain the first isolated word by isolated word recognition. Besides these two ways of obtaining the first isolated word, those skilled in the art can realize step S111 using various prior techniques; this embodiment imposes no limitation and gives no further description.
Step S112, judging whether the first isolated word matches a preset first keyword; if yes, proceed to step S113; if no, proceed to step S114;
Here, the first keyword is any keyword in the keyword set of the first node.
Here, judging whether the first isolated word matches the preset first keyword yields a first judgment result. When the first judgment result shows that the first isolated word matches the first keyword, proceed to step S113 to obtain a preset second keyword set; when the first judgment result shows that the first isolated word does not match the first keyword, proceed to step S114 to output first prompting information, the first prompting information being used to prompt the user to input a voice signal again.
Step S113, obtaining a preset second keyword set according to the first keyword;
Here, when the first judgment result shows that the first isolated word matches the first keyword, the preset second keyword set is obtained according to the first keyword.
Here, the second keyword set is the keyword set of the second node, and the second node is a child node of the first node; in other words, the first node is the parent node of the second node.
Here, the second keyword set comprises the keywords corresponding to the child node of the first keyword. As shown in Fig. 1-3, suppose 10 is the first keyword set and 11 is a first keyword: when the first isolated word matches first keyword-1 (11), the second keyword set (12) is obtained according to first keyword-1, and if the second isolated word then matches second keyword-2 (13), the instruction corresponding to second keyword-2 is executed. In this example, the node corresponding to first keyword-1 (11) is the parent node (first node), and the nodes corresponding to second keyword-1 through second keyword-N are child nodes (second node).
Here, obtaining the preset second keyword set according to the first keyword can be implemented as follows in concrete implementations: query preset first association information with the first keyword to obtain the second keyword set, where the first association information indicates the association (parent-child relation) between the first keyword and the second keywords. In concrete implementations, the first association information can be embodied in the form of a list.
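A minimal sketch of such list-form first association information as a lookup table follows; the keyword names are invented for illustration:

```python
# Minimal sketch of the "first association information" as a lookup
# table; keyword names are invented, not from the patent.

association = {
    "music": {"play", "pause", "next"},   # child keyword set of "music"
    "video": {"play", "stop"},            # child keyword set of "video"
}

def second_keyword_set(first_keyword):
    # Query the association information with the first keyword to obtain
    # the keyword set of its child node (the second keyword set).
    return association.get(first_keyword, set())
```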
Step S114, outputting the first prompting information and returning to step S111;
Here, when the first judgment result shows that the first isolated word does not match the first keyword, the first prompting information is output, the first prompting information being used to prompt the user to input a voice signal again. Note that the first prompting information can also carry the reason for re-input: for example, if the user speaks too softly, or speaks a dialect the electronic equipment cannot recognize, the reason is "not caught"; if the user's speech is clear but the first isolated word does not match the first keyword, the reason is "no corresponding instruction", prompting the user to change the command.
Here, when the electronic equipment has a display screen, outputting the first prompting information can mean displaying it on the display screen of the electronic equipment; when the electronic equipment has a voice output component such as a speaker, outputting the first prompting information can mean playing it as speech.
Step S115, obtaining a second isolated word;
Here, the second isolated word is also a spoken word; it can be obtained in a manner similar to that used for the first isolated word.
Step S116, judging whether the second isolated word matches a second keyword; if yes, proceed to step S117; if no, proceed to step S118;
Here, the second keyword is any keyword in the second keyword set.
Here, judging whether the second isolated word matches the second keyword yields a second judgment result. When the second judgment result shows that the second isolated word matches the second keyword, proceed to step S117 to execute the instruction corresponding to the second keyword; when the second judgment result shows that the second isolated word does not match the second keyword, proceed to step S118 to output the first prompting information.
Step S117, executing the instruction corresponding to the second keyword;
When the second judgment result shows that the second isolated word matches the second keyword, the instruction corresponding to the second keyword is executed.
Step S118, outputting the first prompting information and returning to step S115;
Here, when the second judgment result shows that the second isolated word does not match the second keyword, the first prompting information prompts the user to input a voice signal again.
In an embodiment of the present invention, the method also includes:
when the first judgment result shows that the first isolated word matches the first keyword, executing the instruction corresponding to the first keyword.
In embodiments of the present invention, the first keyword set and the second keyword set are preset. In concrete implementations, the first keyword set and the second keyword set can exist as command-word sets or instruction sets. They can be preset on the storage medium of the electronic equipment itself, or stored on the storage medium of other equipment, where the other equipment is connected to the electronic equipment in a wired and/or wireless manner; specifically, the first keyword set and the second keyword set can be stored on database equipment, or stored on remote equipment reached in a wired and/or wireless manner.
Here, in concrete implementations, a keyword set (such as the first keyword set or the second keyword set) can be generated dynamically. For example, for the song playlist on a mobile phone, when the user downloads a new song, the title, artist, and genre of the newly downloaded song are added to the song keyword set.
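A sketch of this dynamic extension, with invented names, might look as follows:

```python
# Sketch of dynamically extending a node's keyword set when new content
# arrives, per the downloaded-song example above; names are invented.

song_keywords = {"play", "pause", "next"}

def on_song_downloaded(title, artist, genre):
    # A newly downloaded song contributes its title, artist, and genre
    # as fresh keywords for the playback node.
    song_keywords.update({title, artist, genre})

on_song_downloaded("Legend", "Wang Fei", "pop")
```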
The terms "first" and "second" in expressions such as first isolated word, second isolated word, first keyword, and second keyword, here and in the following embodiments, indicate neither an order nor a temporal precedence; they are used only to distinguish the nouns. As stated above, the speech control method provided by the embodiments of the present invention is applied to electronic equipment, and the electronic equipment can comprise several successive sub-processes while performing voice control. If each sub-process in the sequence is regarded as a node, then one speech control process can be expressed in terms of the first node and the second node of the above method. The first node and the second node in the embodiments of the present invention have a temporal before-after relation, and this relation arises from the parent-child relation between the first node and the second node.
In embodiments of the present invention, voice matching can have three outcomes: 1) the expected keyword is matched correctly; 2) the matched keyword is not the expected keyword; 3) speech recognition fails. The embodiments of the present invention handle cases 1) and 3) correctly. In case 2), however, a wrong operation could result: if the voice is matched to the wrong keyword and the instruction is executed directly, the user has no chance to confirm. Therefore, before the instruction corresponding to the second keyword is executed, the method provided by the embodiments of the present invention also includes:
Step S119, outputting inquiry information, the inquiry information being used to ask the user whether to execute the instruction corresponding to the second keyword;
Step S120, obtaining the user's response to the inquiry information;
Step S121, determining, according to the response, whether to execute the instruction corresponding to the second keyword.
In this way, after each voice match completes, the user is prompted with the keyword that was matched, so that the user knows the current position and can easily make the next choice.
It should be noted that the above method illustrates only two nodes, which does not mean the speech control process has only two sub-processes. The first node above can be understood as the Nth node, and correspondingly the second node as the (N+1)th node, where N is an integer greater than or equal to 0. With the first node understood as the Nth node and the second node as the (N+1)th node, the above speech control method can be expressed as: Step S121, obtaining an Nth isolated word, the Nth isolated word being a spoken word; Step S122, judging whether the Nth isolated word matches a preset Nth keyword, and if yes, proceeding to step S123, otherwise to step S124, where the Nth keyword is any keyword in the keyword set of the Nth node; Step S123, obtaining a preset (N+1)th keyword set, the (N+1)th keyword set being the keyword set of the (N+1)th node and the (N+1)th node being a child node of the Nth node; Step S124, outputting the first prompting information; Step S125, obtaining an (N+1)th isolated word; Step S126, judging whether the (N+1)th isolated word matches an (N+1)th keyword, and if yes, proceeding to step S127, otherwise to step S128; Step S127, executing the instruction corresponding to the (N+1)th keyword; Step S128, outputting the first prompting information.
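Expressed as code, this generalization is a loop that descends one tree level per matched isolated word; a sketch under the same assumed names as the earlier one:

```python
# The Nth/(N+1)th generalization as a loop: descend one level per
# matched isolated word until a keyword carries an instruction.

def descend(get_isolated_word, root_keywords, children, actions, prompt):
    current = root_keywords                  # keyword set of the Nth node
    while True:
        word = get_isolated_word()           # obtain the Nth isolated word
        if word not in current:
            prompt("Please say it again.")   # first prompting information
            continue
        if word in actions:
            return actions[word]()           # execute the keyword's instruction
        current = children[word]             # keyword set of the (N+1)th node
```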
In embodiments of the present invention, the parent-child node arrangement is a multiway tree structure: a parent node has one or more child nodes, at most fewer than 100, so the children are easy to remember and to display. The depth of the multiway tree is unlimited, and a second-level node can in turn become the parent of a next-level node. The functions of each node fall into three classes: 1) the first class is jump functions (zero or more), which jump to some next-level node; 2) the second class is the return function (exactly one), which jumps to the node one level up, or, for the root node, to itself; 3) the third class is application functions (zero or more), i.e., functions the equipment performs, such as playing a video.
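One possible model of such a functional node with the three function classes is sketched below; the data structure is an illustration, not the patent's own definition:

```python
# One way to model a functional node of the multiway tree.

from dataclasses import dataclass, field
from typing import Callable, Dict, Optional

@dataclass
class FunctionNode:
    name: str
    parent: Optional["FunctionNode"] = None
    # Class 1: jump functions, one entry per child keyword.
    children: Dict[str, "FunctionNode"] = field(default_factory=dict)
    # Class 3: optional application function the equipment performs.
    action: Optional[Callable[[], None]] = None

    def go_back(self) -> "FunctionNode":
        # Class 2: the return function jumps to the parent node, or to
        # itself when this node is the root.
        return self.parent if self.parent is not None else self
```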
In embodiments of the present invention, a first isolated word is obtained; whether the first isolated word matches a preset first keyword is judged, the first keyword being any keyword in the keyword set of the first node; when the first isolated word matches the first keyword, a preset second keyword set is obtained according to the first keyword, the second keyword set being the keyword set of the second node and the second node being a child node of the first node; a second isolated word is obtained; whether the second isolated word matches a second keyword is judged, the second keyword being any keyword in the second keyword set; and when the second isolated word matches the second keyword, the instruction corresponding to the second keyword is executed. In this way, the accuracy of voice instruction recognition can be improved, ensuring that the parsed instruction reflects the user's intention and thereby improving the user experience.
Embodiment two
Based on the foregoing embodiments, an embodiment of the present invention provides a speech control method. The method is applied to electronic equipment, and the functions it realizes can be implemented by a processor in the electronic equipment calling program code; the program code can, of course, be stored in a computer storage medium. The electronic equipment therefore includes at least a processor and a storage medium.
Fig. 2-1 is a schematic diagram of the hardware structure of the electronic equipment in embodiment two of the present invention. As shown in Fig. 2-1, the hardware of the electronic equipment 20 includes at least a voice-capture component 21, a processor 22, and a storage medium 23, where: the storage medium 23 can store program code used to perform the method provided by this embodiment two, and is additionally used to store keyword sets, such as the first keyword set and the second keyword set; the processor 22 calls the voice-capture component according to the program code, receives from the voice-capture component the voice signal it collects, and processes the voice signal according to the program code; and the voice-capture component 21 accepts the call of the processor 22, collects the voice signal of an operator such as the user, and sends the collected voice signal to the processor 22.
Fig. 2-2 is a first flowchart of the speech control method of embodiment two of the present invention. As shown in Fig. 2-1 and Fig. 2-2, the speech control method includes:
Step S201, collecting, with the voice-capture component, the voice signal sent by the user;
Here, the processor on the electronic equipment calls the voice-capture component to collect the voice signal sent by the user; the voice-capture component is then in the collecting state, and when an operator such as the user emits a voice signal, the voice-capture component collects it and sends the collected voice signal to the processor, at which point the processor has obtained the voice signal collected by the voice-capture component.
Here, the processor can also preprocess the voice signal collected by the voice-capture component; for example, the preprocessing can include spectrum analysis and speech feature extraction of the voice signal. Those skilled in the art can preprocess the voice signal using various prior techniques, which are not described further here.
Step S202, judging whether the voice signal is an isolated word; if yes, proceed to step S203; otherwise, proceed to step S204;
Here, the first isolated word is a spoken word.
Here, judging whether the voice signal is an isolated word yields a third judgment result. When the third judgment result shows that the voice signal is an isolated word, proceed to step S203 and determine the voice signal to be the first isolated word; when the third judgment result shows that the voice signal is not an isolated word, proceed to step S204, i.e., extract spoken words from the voice signal according to a preset first rule and determine the extracted spoken words to be the first isolated word.
Here, whether the voice signal is an isolated word can be judged as follows: for example, perform spectrum analysis on the voice signal and decide from the spectrum whether it is an isolated word; or decide from the duration of the voice signal. In general, an isolated word is a single word, so if the voice signal is an isolated word, its spectrum and duration differ considerably from those of a full sentence. For example, "song", "Wang Fei", "title", and "Legend" are four independent words (isolated words), while "a song by Wang Fei, whose title is Legend" is a full sentence; each of the above isolated words differs from the full sentence in spectrum and/or duration, so whether the voice signal is an isolated word can be decided from the spectrum and/or the duration.
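As a crude illustration of the duration criterion only, consider the check below; the 0.8-second threshold is invented for the example, and a real implementation would also examine the spectrum as described:

```python
# Duration-based heuristic for the isolated-word judgment of step S202.

def looks_like_isolated_word(samples, sample_rate, max_seconds=0.8):
    duration = len(samples) / sample_rate   # utterance length in seconds
    return duration <= max_seconds          # short utterances pass as single words
```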
Step S203, determining the voice signal to be the first isolated word;
Here, when the third judgment result shows that the voice signal is an isolated word, the voice signal is determined to be the first isolated word.
Step S204, extracting spoken words from the voice signal according to the preset first rule, and determining the extracted spoken words to be the first isolated word;
Here, when the third judgment result shows that the voice signal is not an isolated word, spoken words are extracted from the voice signal according to the preset first rule, and the extracted spoken words are determined to be the first isolated word.
Here, the first rule refers to preset rules for extracting spoken words. Generally speaking, the extraction rule mainly extracts the nouns in the voice signal, and can also specify which words are determined to be isolated words and which are not. For example, practice shows that demonstrative pronouns such as "I, you, he, it, this, that" are of little use in speech recognition, so during extraction they need not be extracted, or can be ignored.
The following illustrates "extracting spoken words from the voice signal according to the preset first rule". Example 1: the user says "I want to listen to a song by Wang Fei, the title is Legend; I want to watch a film, the film is called 《Dogface opens loud, high-pitched sound》". Different first rules yield different extraction results (the results being spoken words): 1) if the first rule is "extract the nouns in the voice signal", the result is "I, Wang Fei, song, title, Legend, film, film title, Dogface opens loud, high-pitched sound"; 2) if the first rule is "extract the nouns in the voice signal and ignore pronouns, verbs, and predicates", the result is "Wang Fei, song, title, Legend, film, film title, Dogface opens loud, high-pitched sound"; 3) if the electronic equipment can only play songs and cannot play films, words about films are not extraction targets, i.e., the first rule is "extract the nouns in the voice signal and ignore words about films", and the result is "I, Wang Fei, song, title, Legend".
Example 2: the user says "I want to listen to a song". Different first rules yield different results: 1) if the first rule is "extract the nouns in the voice signal", the result is "I, song"; 2) if the first rule is "extract the nouns in the voice signal and ignore pronouns, verbs, and predicates", the result is "song".
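A toy version of rule 2) as a stop-list filter is sketched below; the word lists are invented for illustration, and a real system would use a part-of-speech tagger rather than hard-coded sets:

```python
# Sketch of a "first rule" as a stop-list filter over recognized words.

STOP_WORDS = {"i", "you", "he", "it", "this", "that",
              "want", "to", "listen", "see", "a"}

def extract_voice_words(words):
    # Keep nouns; ignore pronouns, verbs, and predicates (approximated
    # here by a hard-coded stop list).
    return [w for w in words if w.lower() not in STOP_WORDS]

print(extract_voice_words("I want to listen Wang Fei song".split()))
# -> ['Wang', 'Fei', 'song']
```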
Here, the above steps S201 to S204 in fact provide one way to realize "obtaining the first isolated word".
Step S112, judging whether the first isolated word matches a preset first keyword; if yes, proceed to step S113; if no, proceed to step S114;
Here, the first keyword is any keyword in the keyword set of the first node.
Step S113, obtaining a preset second keyword set according to the first keyword;
Here, the second keyword set is the keyword set of the second node, and the second node is a child node of the first node; in other words, the first node is the parent node of the second node.
Step S114, outputting the first prompting information and returning to step S201;
Here, the first prompting information is used to prompt the user to input a voice signal again.
Step S115, obtaining a second isolated word;
Here, the second isolated word is also a spoken word; it can be obtained in a manner similar to that used for the first isolated word.
Step S116, judging whether the second isolated word matches a second keyword; if yes, proceed to step S117; if no, proceed to step S118;
Here, the second keyword is any keyword in the second keyword set.
Step S117, executing the instruction corresponding to the second keyword;
Step S118, outputting the first prompting information and returning to step S115;
Here, the first prompting information is used to prompt the user to input a voice signal again.
Steps S112 to S118 of this embodiment two correspond to the corresponding steps in embodiment one; those skilled in the art can refer to embodiment one to understand them, and to save space they are not described again here.
In embodiments of the present invention, as stated above, the number of extracted spoken words may equal 1, may equal 0, or may be greater than 1. Two ways of realizing "determining the extracted spoken words to be the first isolated word", depending on the number of extracted spoken words, are given below.
Mode one: determining the extracted spoken words to be the first isolated word includes:
when the number of extracted spoken words equals 0, treating this as no voice signal input and outputting the first prompting information to prompt the user to input again; when the number of extracted spoken words equals 1, determining the extracted spoken word to be the first isolated word; and when the number of extracted spoken words is greater than 1, determining every extracted spoken word to be a first isolated word. In this mode, regardless of whether the number of extracted spoken words is greater than 1, the extracted spoken words all serve as the first isolated word.
Mode two: determining the extracted spoken words to be the first isolated word includes:
when the number of extracted spoken words is greater than 1, outputting second prompting information, the second prompting information being used to prompt the user to select one spoken word from the extracted spoken words; obtaining a first operation of the user, the first operation being used to select one spoken word from the extracted spoken words; and, based on the first operation, determining the selected spoken word to be the first isolated word.
Here, outputting the second prompting information is similar to outputting the first prompting information in embodiment one, and is not described again.
Here, the first operation can be any kind of user operation. For example, when the input device of the electronic equipment is a touch-sensitive display screen, the first operation can be a touch operation; when the input device is a mouse, the first operation can be a click performed with the mouse; when the input device is a button, the first operation can be a button press; and when the input device is a microphone, the first operation can be a voice command.
In another embodiment of the present invention, the flow shown in Fig. 2-3 can also be adopted. As shown in Fig. 2-3, the flow includes:
Step S211, collecting, with the voice-capture component, the voice signal sent by the user;
Step S212, judging whether the voice signal is some keyword; if yes, proceed to step S213; otherwise, proceed to step S214;
Step S213, determining the voice signal to be the first keyword;
Step S214, outputting the first prompting information;
Step S215, judging whether the first keyword corresponds to an application function; if yes, proceed to step S218; if no, proceed to step S217;
Here, since every keyword corresponds to a node, each node may, in addition to having child nodes, also have a corresponding application function. The application function can be any function; for example, when the method provided by the embodiments of the present invention is applied to the gaming field, the application function can be attacking, and when applied to the express-delivery field, the application function can be printing the corresponding address.
Step S216, executing the jump or return function corresponding to the first keyword, and obtaining the second keyword set;
Here, jump and return functions are not application functions; they merely move the flow to the next step and produce no actual output to the user.
Step S217, setting the second keyword set as the first keyword set;
Step S218, outputting the message "execute application function xxx?";
Step S219, obtaining the third keyword set, "yes" and "no";
Step S220, collecting, with the voice-capture component, the voice signal sent by the user;
Step S221, judging whether the voice signal is a keyword; if yes, proceed to step S223; if no, proceed to step S222;
Here, the keywords in step S221 comprise only "yes" and "no"; the third keyword set therefore contains only the two keywords "yes" and "no".
Step S222, outputting the first prompting information;
Step S223, judging whether the keyword is "yes"; if yes, proceed to step S224; otherwise, proceed to step S211;
Step S224, executing the application function corresponding to the first keyword.
In the embodiment shown in Fig. 2-3, when the voice is matched to a keyword, inquiry information is output for that keyword before the corresponding function is executed; the inquiry information is used to ask the user whether to execute the instruction corresponding to the keyword. The user can then input a response by voice, the response being a reply to the inquiry information, and by analyzing the user's response it is determined whether the application function corresponding to the first keyword is executed.
In another embodiment of the present invention, the flow shown in Fig. 2-4 can also be adopted. As shown in Fig. 2-4, the flow includes:
Step S231, collecting, with the voice-capture component, the voice signal sent by the user;
Step S232, judging whether the voice signal is a keyword; if yes, proceed to step S233; otherwise, proceed to step S224;
Step S233, outputting the first prompting information;
Step S234, determining the voice signal to be the first keyword;
Step S235, judging whether the first keyword corresponds to an application function;
Step S236, obtaining the second keyword set according to the first keyword;
Step S237, collecting, with the voice-capture component, the voice signal sent by the user;
Step S238, judging whether the voice signal is a keyword; if yes, proceed to step S240; otherwise, proceed to step S239;
Step S239, outputting the first prompting information;
Step S240, determining the voice signal to be the second isolated word;
Step S241, judging whether the second keyword corresponds to an application function;
Step S242, executing the instruction corresponding to the second keyword;
Step S243, outputting the message "execute application function xxx?";
Step S244, obtaining the third keyword set ("yes" and "no");
Step S245, collecting, with the voice-capture component, the voice signal sent by the user;
Step S246, judging whether the voice signal is "yes";
Step S247, outputting the first prompting information;
Step S248, executing the function corresponding to the first keyword.
Embodiment three
Based on the foregoing embodiments, an embodiment of the present invention provides a speech control method. The method is applied to electronic equipment, and the functions it realizes can be implemented by a processor in the electronic equipment calling program code; the program code can, of course, be stored in a computer storage medium. The electronic equipment therefore includes at least a processor and a storage medium.
Fig. 3 is a flowchart of the speech control method of embodiment three of the present invention. As shown in Fig. 3, the speech control method includes:
Step S301, collecting, with the voice-capture component, the voice signal sent by the user;
Step S302, judging whether the voice signal is an isolated word; if yes, proceed to step S303; otherwise, proceed to step S304;
Here, the first isolated word is a spoken word.
Step S303, determining the voice signal to be the first isolated word;
Step S304, extracting spoken words from the voice signal according to the preset first rule, then proceeding to step S305 or step S306;
Here, steps S301 to S304 of this embodiment three correspond to steps S201 to S204 of embodiment two; those skilled in the art can refer to embodiment two to understand them, and to save space they are not described again here.
Step S305, when the number of extracted spoken words equals 1, determining the extracted spoken word to be the first isolated word;
Step S306, when the number of extracted spoken words is greater than 1, judging whether parent-child node relations exist between the extracted spoken words; if yes, proceed to step S307; otherwise, proceed to step S308;
Here, judging whether parent-child node relations exist between the extracted spoken words yields a fourth judgment result; when the fourth judgment result shows that parent-child node relations exist between the extracted spoken words, proceed to step S307; otherwise, proceed to step S308.
Here, whether parent-child node relations exist between the extracted spoken words can be determined as follows: take an extracted spoken word as a keyword of a parent node, then look up the preset first association information to obtain the child keyword set corresponding to that keyword, the first association information indicating the child keyword set corresponding to the keyword of a parent node; if the child keyword set contains another of the extracted spoken words, a parent-child relation is determined to exist between that spoken word and the other. In this way it is determined in turn whether each of the other extracted spoken words has a parent-child relation with the remaining extracted spoken words.
Step S307, determining the keyword corresponding to the child node to be the first isolated word, then proceeding to step S112;
Here, when the fourth judgment result shows that parent-child node relations exist between the extracted spoken words, the keyword corresponding to the child node is determined to be the first isolated word.
Step S308, taking all of the extracted spoken words as the first isolated word, then proceeding to step S112;
Here, when the fourth judgment result shows that no parent-child node relations exist between the extracted spoken words, all of the extracted spoken words are taken as the first isolated word.
Here, the above steps S305 to S308 in fact provide one way to realize "determining the extracted spoken words to be the first isolated word" of embodiment two.
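A sketch of this parent-child detection among the extracted words, reusing the association-table idea from embodiment one, is given below; the function name is invented:

```python
# Sketch of steps S306-S308: detect a parent-child pair among the
# extracted spoken words via the first association information, and
# promote the child keyword to first isolated word.

def pick_first_isolated_word(extracted, association):
    for parent in extracted:
        child_set = association.get(parent, set())
        for other in extracted:
            if other != parent and other in child_set:
                return [other]        # S307: the child keyword is the word
    return list(extracted)            # S308: no relation; keep all words
```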
Step S112, judging whether the first isolated word matches a preset first keyword; if yes, proceed to step S113; if no, proceed to step S114;
Here, the first keyword is any keyword in the keyword set of the first node.
Step S113, obtaining a preset second keyword set according to the first keyword;
Here, the second keyword set is the keyword set of the second node, and the second node is a child node of the first node; in other words, the first node is the parent node of the second node.
Step S114, outputting the first prompting information and returning to step S301;
Here, the first prompting information is used to prompt the user to input a voice signal again.
Step S115, obtaining a second isolated word;
Here, the second isolated word is also a spoken word; it can be obtained in a manner similar to that used for the first isolated word.
Step S116, judging whether the second isolated word matches a second keyword; if yes, proceed to step S117; if no, proceed to step S118;
Here, the second keyword is any keyword in the second keyword set.
Step S117, executing the instruction corresponding to the second keyword;
Step S118, outputting the first prompting information;
Here, the first prompting information is used to prompt the user to input a voice signal again.
Steps S112 to S118 of this embodiment three correspond to the corresponding steps in embodiment one; those skilled in the art can refer to embodiment one to understand them, and to save space they are not described again here.
Embodiment four
Based on the foregoing embodiments, an embodiment of the present invention provides a speech control method. The method is applied to electronic equipment, and the functions it realizes can be implemented by a processor in the electronic equipment calling program code; the program code can, of course, be stored in a computer storage medium. The electronic equipment therefore includes at least a processor and a storage medium.
Fig. 4 is a flowchart of the speech control method of embodiment four of the present invention. As shown in Fig. 4, the speech control method includes:
Step S111, obtaining a first isolated word;
Step S112, judging whether the first isolated word matches a preset first keyword; if yes, proceed to step S113; if no, proceed to step S114;
Here, the first keyword is any keyword in the keyword set of the first node.
Step S113, obtaining a preset second keyword set according to the first keyword, then proceeding to step S405;
Step S114, outputting the first prompting information and returning to step S111;
Step S405, outputting the keywords in the second keyword set and outputting fourth prompting information;
Here, the fourth prompting information is used to prompt the user to select one keyword from the second keyword set to be determined as the second isolated word. In concrete implementations, the keywords of the second keyword set can be included in the fourth prompting information, so that only the fourth prompting information needs to be output. In this embodiment, the fourth prompting information is output in a manner similar to the first prompting information above, and this is not described again.
Step S406, obtaining a third operation of the user, the third operation being a response to the fourth prompting information;
Step S407, obtaining the second isolated word based on the third operation.
Here, the third operation is similar to the aforementioned first operation and is not described again.
Here, the above steps S405 to S407 essentially provide one way of "obtaining the second isolated word", the second isolated word also being a spoken word.
Step S116, judging whether the second isolated word matches a second keyword; if yes, proceed to step S117; if no, proceed to step S118;
Here, the second keyword is any keyword in the second keyword set.
Here, judging whether the second isolated word matches the second keyword yields a second judgment result. When the second judgment result shows that the second isolated word matches the second keyword, proceed to step S117 to execute the instruction corresponding to the second keyword; when the second judgment result shows that the second isolated word does not match the second keyword, proceed to step S118 to output the first prompting information.
Step S117, executing the instruction corresponding to the second keyword;
When the second judgment result shows that the second isolated word matches the second keyword, the instruction corresponding to the second keyword is executed.
Step S118, outputting the first prompting information and returning to step S405.
Embodiment five
Based on the foregoing embodiments, an embodiment of the present invention provides a speech control method. The method is applied to electronic equipment, and the functions it realizes can be implemented by a processor in the electronic equipment calling program code; the program code can, of course, be stored in a computer storage medium. The electronic equipment therefore includes at least a processor and a storage medium.
Fig. 5 is a flowchart of the speech control method of embodiment five of the present invention. As shown in Fig. 5, the speech control method includes:
Step S501, obtaining a first isolated word;
Step S502, judging whether the first isolated word matches a preset first keyword; if yes, proceed to step S503; if no, proceed to step S114;
Here, the first keyword is any keyword in the keyword set of the first node.
Step S503, outputting third prompting information;
Here, the third prompting information is used to prompt the user to confirm whether the first keyword is the voice signal the user input. The third prompting information is output in a manner similar to the first prompting information above, and this is not described again.
Step S504, obtaining a second operation of the user;
Here, the second operation is a response to the third prompting information.
Step S505, when it is determined, based on the second operation, that the first keyword is the voice signal the user input, obtaining the preset second keyword set according to the first keyword;
Step S506, when it is determined, based on the second operation, that the first keyword is not the voice signal the user input, outputting the first prompting information and returning to step S501.
Step S114, outputting the first prompting information and returning to step S501;
Step S115, obtaining a second isolated word;
Step S116, judging whether the second isolated word matches a second keyword; if yes, proceed to step S117; if no, proceed to step S118;
Here, the second keyword is any keyword in the second keyword set.
Here, judging whether the second isolated word matches the second keyword yields a second judgment result. When the second judgment result shows that the second isolated word matches the second keyword, proceed to step S117 to execute the instruction corresponding to the second keyword; when the second judgment result shows that the second isolated word does not match the second keyword, proceed to step S118 to output the first prompting information.
Step S117, executing the instruction corresponding to the second keyword;
When the second judgment result shows that the second isolated word matches the second keyword, the instruction corresponding to the second keyword is executed.
Step S118, outputting the first prompting information and returning to step S115.
Embodiment six
Based on the foregoing embodiments, this embodiment of the present invention provides a voice control method whose purpose is to realize accurate control of a sophisticated system by voice instructions. Specifically, by improving the accuracy of voice instruction recognition, the control target is accurately located, so that a complex system can be actively and accurately controlled and the behavior of the system kept in strict conformance with the user's expectation.
This embodiment focuses on the relation, introduced in the foregoing embodiments, between the first node and the second node. In the various embodiments of the invention, the control target is divided by function into any number of functional nodes, which are arranged and combined in the form of a multiway tree, as shown in Fig. 6-1. Each functional node has zero or more sub-functional nodes. For example, the root node 60 has multiple sub-functional nodes (functional nodes 61 to 6n); the functional node 61, itself a sub-functional node of the root node, in turn has multiple sub-functional nodes (functional nodes 611 to 61n); among these, the functional node 612 has no further sub-functional node, while the functional node 61n has two sub-functional nodes (functional nodes 61n1 and 61n2), of which the functional node 61n1 has no further sub-functional node and the functional node 61n2 has two sub-functional nodes (functional nodes 61n21 and 61n22).
Fig. 6-2 is a schematic diagram of the correspondence between each functional node, acting as a parent node, and its child nodes. As shown in Fig. 6-2, each functional node in Fig. 6-1 can serve as a parent node, and the next-level functional nodes connected to it serve as its child nodes (child node 1 to child node n). Each parent node in turn has multiple states, for example selected, to-be-selected, dormant and locked. "Selected" means that the voice signal sent by the user matches the command or key word corresponding to the node, so that the node is activated. "To-be-selected" means that the parent of the node (in other words, the node one level above it) is in the selected state, so that the node, as a child of the selected node, is in the to-be-selected state; for example, when the root node is in the selected state, the states of functional nodes 61 to 6n are to-be-selected. "Dormant" means that, when the device is in a dormant state, the functional node is in a dormant state and its state is not switched; "locked" means that the functional node is kept in a continuous working state, and "unlocked" means that the lock is released.
As can be seen from Fig. 6-1 and Fig. 6-2, each functional node, when acting as a parent node, has an entry for jumping to its child nodes. Therefore, in the method provided by this embodiment of the present invention, each functional node is a node in the multiway tree, so that a parent node can be searched downwards to traverse its child nodes and thereby realize the functions of the corresponding child nodes. Starting from the root node, all the child nodes of the whole multiway tree can be searched downwards, so that the functions corresponding to all the child nodes can be realized.
Based on the above multiway tree structure of functional nodes, the method provided by the present invention locates the destination node step by step by issuing isolated-word voice commands level by level. Besides the functional nodes and their interrelation, and in order to improve the search efficiency, fault tolerance and extensibility of the multiway tree instruction system, this embodiment of the present invention additionally provides seven special global control functions: return, query, jump, lock, unlock, insert and delete. The global control functions work throughout the whole system working cycle, that is, the instruction system is always ready to receive global control instructions, whereas a functional node's instruction set works only when that functional node is selected. Specifically:
The return function returns to the upper-level functional node or to the root node, realizing correction of erroneous commands and upward search of the multiway tree. The query function feeds back to the user the information of the current functional node and its child nodes, realizing awareness of the system state; according to the current state, the user can decide the search path of the next step. The jump function jumps directly to any other specified functional node, such as the root node or a frequently visited node, without searching level by level, which improves search efficiency. The lock function locks system control onto the current functional node, after which the system no longer accepts any voice instruction except the "unlock" instruction; while locked, the system continuously performs a certain function, for example playing a film, and after unlocking the system resumes accepting instructions. The unlock function releases the locking of the voice command system and resumes the reception of voice instructions. The insert function inserts a new sub-functional node under a given functional node, adding a new function to the system and thereby ensuring the extensibility of the system. The delete function deletes a specified functional node, removing an unnecessary function from the system.
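For illustration only, the seven global control functions could be modeled in C# (the development language later named for the reference implementation) as an enumeration keyed by spoken command word; this is a minimal sketch, and all identifier names and the English command words are assumptions:

using System.Collections.Generic;

enum GlobalCommand { Return, Query, Jump, Lock, Unlock, Insert, Delete }

static class GlobalCommands
{
    // Spoken command word -> global control function; the English words are placeholders.
    public static readonly Dictionary<string, GlobalCommand> ByWord =
        new Dictionary<string, GlobalCommand>
        {
            { "return", GlobalCommand.Return }, { "query", GlobalCommand.Query },
            { "jump", GlobalCommand.Jump }, { "lock", GlobalCommand.Lock },
            { "unlock", GlobalCommand.Unlock }, { "insert", GlobalCommand.Insert },
            { "delete", GlobalCommand.Delete }
        };
}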
The depth of the multiway tree depends on the complexity of the system and on how its functions are divided; the depth should be kept as small as possible, so as to ensure the efficiency of searching for the destination node. The maximum number of child nodes in the multiway tree, that is, the maximum number of voice instructions a node can receive, depends on three factors: the complexity of the target system, the degree of similarity between instructions, and the isolated-word recognition capability of the speech recognition engine. At the current level of development of speech recognition technology, and on the premise that the instructions are reasonably designed, the isolated-word recognition accuracy for an instruction set of fewer than 50 words can reach 100%. A three-level multiway tree voice command system can therefore cover at most 2551 functional nodes (1+50+50*50=2551), and a four-level one 127551 functional nodes (1+50+50*50+50*50*50=127551), each additional level multiplying the increment by a factor of 50.
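As a quick check of this arithmetic, a minimal C# sketch (illustrative only, not part of the patent) that computes the capacity of a multiway tree for a given number of levels and branching factor:

static long MaxNodes(int levels, int branching)
{
    // Capacity = 1 + b + b^2 + ... + b^(levels-1), e.g. 1 + 50 + 50*50 = 2551 for three levels.
    long total = 0, perLevel = 1;
    for (int i = 0; i < levels; i++)
    {
        total += perLevel;
        perLevel *= branching;
    }
    return total;
}

// MaxNodes(3, 50) == 2551; MaxNodes(4, 50) == 127551.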
In order to accurately trigger the function belonging to a certain node, the command issuer, such as the user, sends instructions level by level and locates the specified functional node after several jumps. For example, if the user wants to set the mobile phone brightness to 80, the user sends verbal instructions step by step in the sequence "system setting" -> "display brightness" -> "80". Such a sequence of verbal instructions (or voice instructions) is called a "command chain" in the embodiments of the present invention. The shorter the command chain, the higher the search efficiency; an efficient voice command system should therefore keep the multiway tree balanced, so that the command chain lengths of all leaf nodes are essentially equal.
Iterating the search from high-level functional nodes down to low-level ones can, in theory, accurately locate target nodes of unlimited subdivision. For example, a voice instruction as complicated as "please send the parcel to Room 1, Unit 2, Building 204, Wang Chunyuan quarter, Beiyuan home, Chaoyang District, Beijing, China" can hardly be parsed correctly by current speech recognition engines, yet it can be completed accurately through the multiway tree voice command system, as illustrated in the later embodiments of the present invention.
In addition, in implementation an optional element, the "command prompt menu", can be provided; it always displays the current node's instruction set and the global control function instructions, listing all the instructions that can be received so as to prompt the user about the operations available next. The command prompt menu is not mandatory and can be replaced by another mechanism: the appearance of the UI element corresponding to each node is updated as the node state changes, for example by setting the icon of a node to red when that node is selected.
An example is described below. Fig. 6-3 is a schematic diagram of the nodes involved when a destination object is searched for through a voice-controlled search engine according to an embodiment of the present invention. As shown in Fig. 6-3, the root node is the start 630 node, and the map 631 node, the picture 632 node, the webpage 633 node, the library 634 node, the video 635 node and the audio 636 node are all child nodes of the start 630 node. The video 635 node in turn includes multiple child nodes, for example the film 6351 node, and the film 6351 node further includes multiple child nodes, one of which is the comedy 63511 node; the comedy 63511 node further includes several child nodes, among them the domestic 007 node 635112. The working process of the search engine is described below:
The user issues verbal instructions level by level. Suppose the user wants to watch the film "Domestic 007". In the prior art, the user says a single sentence, "I want to watch the comedy movie Domestic 007", and the electronic device searches according to that sentence. Because the sentence is long, the electronic device runs into many difficulties during speech recognition; for example, it may recognize "comedy" as "drama" or recognize "Domestic 007" as "007", and with such recognition errors the final search result can hardly be correct. In the voice control method provided by this embodiment of the present invention, by contrast, verbal instructions are issued level by level and the electronic device recognizes them as voice signals one by one. Taking the same example of the user wanting to watch "Domestic 007", the process by which the user sends verbal instructions and the electronic device executes them is completely different from the prior art. Specifically, after the user confirms the start (corresponding to the start 630 node in Fig. 6-3), the map 631 node, the picture 632 node, the webpage 633 node, the library 634 node, the video 635 node and the audio 636 node are all in the to-be-selected state. The user first says the verbal instruction "video", whereupon the video 635 node enters the selected state and the child nodes of the video 635 node, such as the film 6351 node and the TV series node, enter the to-be-selected state. The user then says "film", whereupon the film 6351 node enters the selected state; the user then says "comedy", whereupon the comedy 63511 node enters the selected state; finally the user says "domestic 007", and the film "Domestic 007" is found. The user can next say the verbal instruction "open", whereupon the search engine plays the film "Domestic 007" and the user can watch it.
In the above search process, if the user says a wrong verbal instruction, the user only needs to say the verbal instruction "return". For example, if the user meant to say "video" but mistakenly said "audio", the audio 636 node is selected; when the user realizes the erroneous input, the user can input the "return" command, whereupon the system returns to the upper-level node of the audio 636 node (the start 630 node). In implementation, the "return" verbal instruction can make the electronic device return either to the upper-level node or to the root node; which level is returned to can be set by those skilled in the art according to the actual situation and is not described further here.
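The level-by-level descent and "return" correction just described can be sketched minimally in C#; DemoNode and all member names here are assumptions made for illustration:

using System.Collections.Generic;

class DemoNode
{
    public string CommandWord;
    public Dictionary<string, DemoNode> Children = new Dictionary<string, DemoNode>();
}

static class Navigator
{
    public static DemoNode Step(DemoNode current, string spokenWord, Stack<DemoNode> path)
    {
        if (spokenWord == "return" && path.Count > 0)
            return path.Pop();                      // global "return": back to the parent node
        DemoNode child;
        if (current.Children.TryGetValue(spokenWord, out child))
        {
            path.Push(current);                     // remember the way back for "return"
            return child;                           // the matched child becomes the selected node
        }
        return current;                             // no match: stay on this node and prompt again
    }
}

Saying "video", "film", "comedy" and "domestic 007" in sequence thus walks one edge of the tree per isolated word, and a mistaken "audio" is undone by a single "return".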
Embodiment seven
The following takes a mobile phone as the electronic device for illustration. In implementation, the voice control system provided by the above embodiments can be embodied in the form of a software program (an App, application), for example as the voice assistant or voice control system on a mobile phone. Fig. 7 is a schematic diagram of the composition of the voice control system of Embodiment seven of the present invention when implemented. As shown in Fig. 7, the voice control system includes:
1) Functional node registration table
The functional node registration table is a global static table; all functional nodes in the multiway tree add their registration information to it. The structure of a registration entry is as follows:
Functional node identification information (ID): functional node access means (such as a memory address, an index or a handle). The corresponding node can be looked up and accessed in the registration table by its node ID.
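A minimal C# sketch of such a registration table, assuming the FunctionNode class sketched under item 2) below; an object reference stands in for the access means:

using System.Collections.Generic;

static class NodeRegistry
{
    // Global static table: node ID -> node access means (here simply an object reference).
    static readonly Dictionary<string, FunctionNode> Table = new Dictionary<string, FunctionNode>();

    public static void Register(FunctionNode node) { Table[node.NodeId] = node; }

    public static FunctionNode Lookup(string nodeId)
    {
        FunctionNode node;
        return Table.TryGetValue(nodeId, out node) ? node : null;
    }
}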
2) Functional nodes and the topology of the multiway tree
● The functional node data structure contains the following information (a sketch in code follows this list):
● node ID, representing the unique identifier of the node;
● command word, representing the verbal instruction that navigates from the parent node to this node;
● parent node ID, representing the unique identifier of the parent node;
● node state, representing the current state of the node, which is one of "selected", "to-be-selected", "dormant" and "unlocked/locked"; when the system has searched to a certain node, that node's state is "selected", i.e. it is the current node; the states of the current node's child nodes are "to-be-selected"; nodes in neither of these two states are "dormant" or "unlocked/locked";
● child node search instruction set, representing the set of child node search information; the structure of a child node search entry is: child node command word: child node ID;
● node function instruction set, representing the set of node function invocation information; the structure of a node function entry is: node function command word: function entry point of the node function implementation.
3) Global control functions
To improve fault tolerance and provide a reverse search capability, this embodiment of the present invention adds the global control function "return", explained in detail in Embodiment six; "return" is present in the voice recognition instruction set at all times.
4) Voice instruction prompt menu
In implementation, the voice instruction prompt menu can be placed at the edge of the display screen of the electronic device and always displays the "command words" currently to be matched by the voice instruction system. The content of the menu is the current node instruction set and the global control function instruction set; its main purpose is to prompt the user about the currently acceptable instructions.
5) Interface for third-party applications
To make it convenient for third-party applications or components to join the voice command system of this embodiment of the present invention, the voice control system provides the following API for third-party users:
RegisterNode(node instance);
AddNode(command chain);
DeleteNode(command chain);
DisableNode(command chain);
EnableNode(command chain);
A third party divides its App or component into multiway tree functions and implements all the functional nodes. By calling this API, a third-party vendor can insert, delete, enable and disable its functional nodes in the multiway tree system.
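A hedged C# sketch of what this third-party interface could look like; the parameter types (a node instance, and a command chain given as a string array) are assumptions read off the listing above:

interface IVoiceControlSystem
{
    void RegisterNode(FunctionNode nodeInstance); // add the node to the functional node registration table
    void AddNode(string[] commandChain);          // insert a new node at the position the chain addresses
    void DeleteNode(string[] commandChain);       // delete the node the chain addresses
    void DisableNode(string[] commandChain);      // make the addressed node temporarily unavailable
    void EnableNode(string[] commandChain);       // re-enable a disabled node
}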
The nouns used in the embodiments of the present invention are explained as follows:
Function: a behavior of the target system that meets a user need, for example playing a video, viewing a document, or listening to music.
Functional node: a node in the multiway tree; it provides the links between parent and child nodes, and some functions.
Current node: the currently selected node; the whole system has only one current node at any time.
Voice recognition instruction set: the set of voice instructions currently to be recognized, waiting to be matched against the user's command.
Node instruction set: the set of child node search instructions and node function instructions supported by a node.
Command chain: the instruction sequence from the root node to the destination node.
Embodiment eight
Based on the foregoing embodiments, this embodiment of the present invention provides yet another voice control method. Fig. 8 is a schematic flowchart of the voice control method according to Embodiment eight of the present invention. As shown in Fig. 8, the method includes:
Step S800: start, i.e. system start-up.
Step S801: the system loads, registers all functional nodes in the functional node registration table, and sets the root node as the current node.
Here, initialization can be performed after the system loads.
Step S802: set the current node to the "selected" state, set the UI appearance to the UI element corresponding to the current node, set the current node's child nodes to the "to-be-selected" state, and set the voice recognition instruction set to the current node instruction set plus the global control function instructions.
Here, the voice instruction prompt menu is emptied and the current node instruction set and global control function instructions are added to it again. The child node search command words may be set to green, the node function command words to blue, and the global control function command words to yellow. If no voice instruction menu is used, the display of the UI elements corresponding to the current node, its child nodes and the node functions can be updated instead, to indicate to the user the state of the node or function corresponding to each UI element.
Step S803: the speech recognition engine waits for voice input.
Here, the speech recognition engine waits in a listening state for the user to input a voice instruction, until a voice signal is captured. The speech recognition engine can serve as a component of the voice control system, calling the voice capture component to capture voice signals and receiving the voice signals thus captured. The voice signal can be input in the form of an isolated word; if the voice signal sent by the user is not an isolated word, the voice signal can be processed to extract an isolated word.
Step S804: determine whether the voice signal has a matching command word.
Here, the captured voice signal is matched against the voice recognition instruction set. If it successfully matches a certain instruction, step S805 is executed; if the matching fails, step S803 is executed. The command word here is equivalent to the key word in the above embodiments.
Step S805: obtain the matched command word and play it back as audio.
Here, the successfully matched command word is played back as audio to inform the user of the matching result.
Step S806: determine whether the successfully matched command word is a child node search function; if yes, execute step S807; if not, execute step S808.
Step S807: set all node states to dormant, set the selected child node as the new current node, and continue with step S802.
Step S808: determine whether the successfully matched command word is the "return" global control function; if yes, execute step S809; if not, execute step S810.
Step S809: set all node states to dormant, set the parent node as the new current node, and continue with step S802.
Here, if no voice instruction menu is used, the display of the UI elements corresponding to the nodes can be updated to indicate to the user that all nodes are in the "dormant" state. The parent node of the current node is set as the new current node; if the current node is the root node, its parent node is the root node itself.
Step S810: determine whether the successfully matched command word is an application function; if yes, execute step S812; if not, execute step S811.
Step S811: the command word matches nothing in the current node instruction set or the global control instruction set; play the voice prompt "unknown command, please re-enter" and continue with step S802.
Step S812: set the voice recognition instruction set to the "yes" and "no" confirmation command words, empty the voice instruction prompt menu, add the "yes" and "no" confirmation menu items, and play back the audio "execute function XXX?".
Here, in the prompt menu "yes" may be set to green and "no" to red; XXX denotes the application function recognized in step S810.
Step S813: the speech recognition engine waits for voice input.
Here, the speech recognition engine waits in a listening state for the user to input a voice instruction, until a voice signal is captured.
Step S814: determine whether the voice signal has a matching command word.
Here, the captured voice signal is matched against the voice recognition instruction set configured in step S812. If it successfully matches a certain instruction, step S815 is executed; if the matching fails, step S813 is executed.
Step S815: determine whether the command word matched in step S814 is "yes"; if yes, execute step S816; if not, execute step S802.
Step S816: execute the application function recognized in step S810; after the function finishes, execute step S802.
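Steps S802 to S816 amount to the following loop, sketched in C# under the assumptions of the FunctionNode and NodeRegistry sketches above; the console-based helper stubs are hypothetical stand-ins for the speech recognition engine and audio playback of steps S803 to S805 and S812 to S815:

using System;

static class VoiceLoop
{
    public static void Run(FunctionNode root)
    {
        FunctionNode current = root;
        while (true)
        {
            current.State = NodeState.Selected;                    // S802: select the current node
            string word = RecognizeOne();                          // S803/S804: wait for a command word
            PlayBack(word);                                        // S805: play back the matched word
            string childId;
            Action fn;
            if (current.ChildSearchInstructions.TryGetValue(word, out childId))
            {                                                      // S806/S807: child node search
                current.State = NodeState.Dormant;
                current = NodeRegistry.Lookup(childId);
            }
            else if (word == "return")                             // S808/S809: global "return"
            {
                current.State = NodeState.Dormant;
                FunctionNode parent = current.ParentId == null ? null : NodeRegistry.Lookup(current.ParentId);
                current = parent ?? current;                       // the root's parent is the root itself
            }
            else if (current.FunctionInstructions.TryGetValue(word, out fn))
            {
                if (ConfirmYesNo(word)) fn();                      // S810, S812-S816: confirm, then execute
            }
            else
            {
                PlayBack("unknown command, please re-enter");      // S811
            }
        }
    }

    // Hypothetical stubs: a real system would wire these to the speech engine and audio output.
    static string RecognizeOne() { return Console.ReadLine(); }
    static void PlayBack(string word) { Console.WriteLine(word); }
    static bool ConfirmYesNo(string word)
    {
        PlayBack("execute function " + word + "?");
        return Console.ReadLine() == "yes";
    }
}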
Embodiment nine
Based on the foregoing embodiments, this embodiment of the present invention is described as applied to the field of express mail delivery. Fig. 9-1 is a schematic diagram of the correspondence between parent and child nodes according to Embodiment nine of the present invention, and Fig. 9-2 is a schematic flowchart of the voice control method according to Embodiment nine. When the flow shown in Fig. 9-2 is realized by a software program, it can be divided into four functional modules, wherein:
M1 is the system initialization module: it initializes the voice control system, generates the functional node registration table and registers all functional nodes.
M2 is the system state update module: when the current node changes, this module updates the voice recognition instruction set and the system display, so as to prompt the user about the instruction set that can currently be issued.
M3 is the functional node search module: according to the user's instructions, it searches for the destination node having a certain function.
M4 is the function execution module: after the user has found the node containing the target function, this module performs the final confirmation and execution.
The voice control method can be regarded as a voice-controlled Beijing mail delivery system; the speech recognition engine may adopt the Microsoft speech recognition engine (Microsoft Speech SDK), the development platform may be Microsoft .NET 4.0, and the development language may be C#. With reference to Fig. 9-1 and Fig. 9-2, the execution steps of the voice-controlled Beijing mail delivery system for the function "send the parcel to Room 1201, Unit 2, Building 1, Wang Chunyuan quarter, Beiyuan home, Chaoyang District, Beijing" are given below.
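Since module M2 re-points the recognizer at the current node's command words whenever the current node changes, that step can be sketched with the Microsoft Speech SDK named above (the System.Speech.Recognition API of .NET); this is a hedged sketch, and every name other than the SDK's own types is an assumption:

using System.Collections.Generic;
using System.Linq;
using System.Speech.Recognition;

static class RecognizerSetup
{
    // Restrict recognition to the current node's command words plus the global "return".
    public static void SetRecognitionSet(SpeechRecognitionEngine engine, IEnumerable<string> commandWords)
    {
        engine.UnloadAllGrammars();
        Choices choices = new Choices(commandWords.Concat(new[] { "return" }).ToArray());
        engine.LoadGrammar(new Grammar(new GrammarBuilder(choices)));
    }
}

For the root node "Beijing", for example, the call would pass the child command words "Haidian District", "Chaoyang District", "Dongcheng District", and so on.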
Step M1: start. The system loads, initializes the voice control system, registers all nodes of the Beijing mail delivery system in the functional node registration table, and sets the root node "Beijing" as the current node.
Here, the current node is "Beijing".
Step M2: set the current node to the "selected" state.
Here, the child nodes of the "Beijing" node, such as "Haidian District", "Chaoyang District" and "Dongcheng District", are set to the "to-be-selected" state; the speech recognition engine is configured, and the voice recognition instruction set is set to the child node command words "Haidian District", "Chaoyang District", "Dongcheng District", etc., plus the "return" command word. The voice instruction prompt menu is emptied, and the menu items for the child nodes "Haidian District", "Chaoyang District", "Dongcheng District", etc., and the "return" global control function are added, prompting the user about the set of instructions that can currently be issued. The child node menu items "Haidian District", "Chaoyang District", "Dongcheng District", etc., are green, and the global control function "return" menu item is yellow.
Step M3: node search; the speech recognition engine waits for voice input.
Specifically, after the user issues the instruction "Chaoyang District", the voice instruction is received and matched against the voice recognition instruction set configured in step M2. If it correctly matches the "Chaoyang District" command word, the "Chaoyang District" child node is set as the current node and the system plays back the "Chaoyang District" command word. If it erroneously matches the "Haidian District" command word, the "Haidian District" child node is set as the current node and the system plays back "Haidian District"; steps M2 and M3 then continue. The user, now aware that the wrong child node "Haidian District" was matched, issues the instruction "return". Following the "return" instruction, the system again sets the parent node "Beijing" as the current node and continues with steps M2 and M3, and the user can re-issue the "Chaoyang District" instruction. This error-return mechanism applies throughout any node search and is not repeated in the steps below.
Here, the user issues voice instructions to the system in sequence: Chaoyang District > Beiyuan home > Wang Chunyuan > No. 1 building > Unit 2 > 1201 > send parcel. The control system receives the user's voice instructions and repeats steps M2 and M3 until the node "send parcel" is found. When the current node is "send parcel", step M307 detects that the current node is an application function, and step M4 is executed.
Step M4: execute the application function corresponding to the current node.
Specifically, the speech recognition engine is configured and the voice recognition instruction set is set to the "yes" and "no" confirmation command words. The voice instruction prompt menu is emptied, and the "yes" and "no" confirmation menu items are added to prompt the user about the set of instructions that can currently be issued, where "yes" is green and "no" is red. The audio "send the parcel to Room 1201, Unit 2, Building 1, Wang Chunyuan quarter, Beiyuan home, Chaoyang District, Beijing?" is played back, and the system waits for the user's voice instruction. After the user issues the instruction "yes", the system receives the voice instruction and matches it against the confirmation instruction set "yes"/"no". If it correctly matches the "yes" command word, the system executes the function of sending the parcel to Room 1201, Unit 2, Building 1, Wang Chunyuan quarter, Beiyuan home, Chaoyang District, Beijing, and continues with step M2. If the user issues the instruction "no", the function is not executed and step M2 is executed directly.
In this embodiment of the present invention, the voice control system is initialized, all nodes of the Beijing mail delivery system are registered in the functional node registration table, and the root node "Beijing" is set as the current node. Here, after the user issues the instruction "Beijing", the voice instruction "Beijing" is received and the "Beijing" node is set to the "selected" state.
Here, the child nodes of the "Beijing" node, such as "Haidian District", "Chaoyang District" and "Dongcheng District", are set to the "to-be-selected" state; the speech recognition engine is configured, and the voice recognition instruction set is set to the child node command words "Haidian District", "Chaoyang District", "Dongcheng District", etc., plus the "return" command word. The voice instruction prompt menu is emptied, and the menu items for these child nodes and the "return" global control function are added, prompting the user about the set of instructions that can currently be issued. The child node menu items are green, and the global control function "return" menu item is yellow.
After the user issues the instruction "Chaoyang District", the voice instruction is received and matched against the configured voice recognition instruction set. If it correctly matches the "Chaoyang District" command word, the "Chaoyang District" child node is set as the current node and the system plays back the "Chaoyang District" command word. If it erroneously matches the "Haidian District" command word, the "Haidian District" child node is set as the current node and the system plays back "Haidian District". The user, aware that the wrong child node "Haidian District" was matched, issues the instruction "return"; following the "return" instruction, the system again sets the parent node "Beijing" as the current node, and the user can re-issue the "Chaoyang District" instruction. This error-return mechanism applies throughout any node search and is not described in detail again.
The "Chaoyang District" node is set to the "selected" state. The child nodes of the "Chaoyang District" node, such as "Wangjing", "Beiyuan home" and "San Litun", are set to the "to-be-selected" state. The speech recognition engine is configured, and the voice recognition instruction set is set to these child node command words plus the "return" command word. The voice instruction prompt menu is emptied, and the menu items for these child nodes and the "return" global control function are added, to prompt the user about the set of instructions that can currently be issued. The child node menu items "Wangjing", "Beiyuan home", "San Litun", etc., are green, and the global control function "return" menu item is yellow.
After the user issues the instruction "Beiyuan home", the voice instruction is received and matched against the configured voice recognition instruction set. If it correctly matches the "Beiyuan home" command word, the "Beiyuan home" child node is set as the current node and the system plays back the "Beiyuan home" command word. The "Beiyuan home" node is set to the "selected" state, and its child nodes, such as "Wang Chunyuan", "Embroidered Juyuan" and "Xuhui Aodu", are set to the "to-be-selected" state. The speech recognition engine is configured, and the voice recognition instruction set is set to these child node command words plus the "return" command word. The voice instruction prompt menu is emptied, and the menu items for these child nodes and the "return" global control function are added, to prompt the user about the set of instructions that can currently be issued. The child node menu items are green, and the global control function "return" menu item is yellow.
After the user issues the instruction "Wang Chunyuan", the voice instruction is received and matched against the configured voice recognition instruction set. If it correctly matches the "Wang Chunyuan" command word, the "Wang Chunyuan" child node is set as the current node and the system plays back the "Wang Chunyuan" command word. The "Wang Chunyuan" node is set to the "selected" state, and its child nodes, such as "No. 1 building", "No. 2 building" and "No. 10 building", are set to the "to-be-selected" state. The speech recognition engine is configured, and the voice recognition instruction set is set to these child node command words plus the "return" command word. The voice instruction prompt menu is emptied, and the menu items for these child nodes and the "return" global control function are added, to prompt the user about the set of instructions that can currently be issued. The child node menu items are green, and the global control function "return" menu item is yellow.
After the user issues the instruction "No. 1 building", the voice instruction is received and matched against the configured voice recognition instruction set. If it correctly matches the "No. 1 building" command word, the "No. 1 building" child node is set as the current node and the system plays back the "No. 1 building" command word. The "No. 1 building" node is set to the "selected" state, and its child nodes "Unit 1", "Unit 2", "Unit 3" and "Unit 4" are set to the "to-be-selected" state. The speech recognition engine is configured, and the voice recognition instruction set is set to the "Unit 1" to "Unit 4" child node command words plus the "return" command word. The voice instruction prompt menu is emptied, and the menu items for these child nodes and the "return" global control function are added, to prompt the user about the set of instructions that can currently be issued. The child node menu items are green, and the global control function "return" menu item is yellow.
After the user issues the instruction "Unit 2", the voice instruction is received and matched against the configured voice recognition instruction set. If it correctly matches the "Unit 2" command word, the "Unit 2" child node is set as the current node and the system plays back the "Unit 2" command word. The "Unit 2" node is set to the "selected" state, and its child nodes, such as "101", "102" and "1203", are set to the "to-be-selected" state. The speech recognition engine is configured, and the voice recognition instruction set is set to these child node command words plus the "return" command word. The voice instruction prompt menu is emptied, and the menu items for these child nodes and the "return" global control function are added, to prompt the user about the set of instructions that can currently be issued. The child node menu items are green, and the global control function "return" menu item is yellow.
Here, after the user issues the instruction "1201", the voice instruction is received and matched against the configured voice recognition instruction set. If it correctly matches the "1201" command word, the "1201" child node is set as the current node and the system plays back the "1201" command word. The "1201" node is set to the "selected" state, and the functions of the "1201" node, "post bill", "send parcel" and "send mail", are set to the "to-be-selected" state. The speech recognition engine is configured, and the voice recognition instruction set is set to the "post bill", "send parcel" and "send mail" node function command words plus the "return" command word. The voice instruction prompt menu is emptied, and the menu items for the node functions "post bill", "send parcel", "send mail" and the "return" global control function are added, to prompt the user about the set of instructions that can currently be issued. The node function menu items "post bill", "send parcel", "send mail" are blue, and the global control function "return" menu item is yellow.
Specifically, after the user issues the instruction "send parcel", the voice instruction is received and matched against the configured voice recognition instruction set. If it correctly matches the "send parcel" command word, the system plays back "send parcel". The speech recognition engine is configured and the voice recognition instruction set is set to the "yes"/"no" confirmation command words. The voice instruction prompt menu is emptied, and the "yes" and "no" confirmation menu items are added to prompt the user about the set of instructions that can currently be issued, where "yes" is green and "no" is red. The audio "send the parcel to Room 1201, Unit 2, Building 1, Wang Chunyuan quarter, Beiyuan home, Chaoyang District, Beijing?" is played back, and the system waits for the user's voice instruction. After the user issues the instruction "yes", the system receives the voice instruction and matches it against the confirmation instruction set "yes"/"no". If it correctly matches the "yes" command word, the system executes the function of sending the parcel to Room 1201, Unit 2, Building 1, Wang Chunyuan quarter, Beiyuan home, Chaoyang District, Beijing. If the user issues the instruction "no", the function is not executed.
Embodiment ten
The above technical solution of the embodiments of the present invention is also applicable to the field of games, illustrated here with the turn-based combat game "Three Heroes Fight Lü Bu". In the prior art, the user (i.e. the player) controls the characters "Guan Yu", "Liu Bei" and "Zhang Fei" in the game with keyboard and mouse to fight the enemy characters "Lü Bu", "Hua Xiong" and "Dong Zhuo" controlled by the game software. The actions of the game include:
Attack action: 100% success rate; reduces the target character's life points by 100.
Rest action: recovers 80 of one's own life points.
Guan Yu's special move, "Power Splitting Mount Hua": 60% success rate; on a successful hit, the target character loses 200 life points; on a miss, no life points are reduced.
Liu Bei's scheme, "Sweet Words": 70% success rate; on success, the target character forfeits its next-round action; on failure, it has no effect on the enemy.
Zhang Fei's special move, "Dangyang Shout": 100% success rate; reduces every enemy character's life points by 50.
The enemy characters "Lü Bu", "Hua Xiong" and "Dong Zhuo" in the game are controlled by the game software; on the enemy's turn, the enemy characters randomly attack our characters "Guan Yu", "Liu Bei" and "Zhang Fei" with a 100% attack success rate. Lü Bu's attack reduces the target's life points by a random value between 120 and 180, Hua Xiong's by a random value between 80 and 140, and Dong Zhuo's by a random value between 60 and 120. The initial life points of every character are 1000; a character whose life points fall to 0 withdraws from combat. The side whose characters all withdraw first is the loser.
In the technical solution provided by this embodiment of the present invention, the user can control the characters "Guan Yu", "Liu Bei" and "Zhang Fei" in the game to fight by voice. Fig. 10-1 is a schematic diagram of the correspondence between parent and child nodes in the "Three Heroes Fight Lü Bu" game according to this embodiment of the present invention, and Fig. 10-2 is a schematic flowchart of the voice control method according to Embodiment ten. When the flow shown in Fig. 10-2 is realized by a software program, it can be divided into five functional modules, wherein:
The voice control part of the flow of Embodiment ten is identical to steps S800 to S816 in Fig. 8. Its particularity is that, in the voice control system it represents, the set of functional nodes loaded during system initialization in step S801 is the one shown in Fig. 10-1; in addition, after the function executed in step S816 finishes, the flow continues with the game logic part at step S817 rather than with step S802.
This example adds a game logic implementation part, namely steps S817 to S825, on the basis of the flow of Fig. 8.
Embodiment ten comprises the following steps:
Step S800: start, i.e. system start-up.
Step S801: the system loads, loading all functional nodes of the functional node registration table shown in Fig. 10-1, and sets the root node "start" as the current node.
Steps S802 to S815 are identical to the corresponding steps in Embodiment eight and are not described again here.
Step S816: execute the application function recognized in step S810; after the function finishes, execute step S817.
Step S817: delete our character that has just performed its action from the child node set of the root node "start".
Step S818: determine whether the life points of the "Lü Bu", "Hua Xiong" and "Dong Zhuo" nodes are all less than 0; if yes, execute step S819; if not, execute step S820.
Step S819: the game ends; our side wins.
Step S820: determine whether all our characters with life points greater than 0 have performed their actions for this round; if yes, execute step S821; if not, execute step S825.
Step S821: the computer controls all enemy characters to perform this round's actions, attacking our characters at random; our characters' life points change accordingly. Continue with step S822.
Step S822: determine whether the life points of all our characters are less than 0; if yes, execute step S823; if not, execute step S824.
Step S823: the game ends; the enemy wins.
Step S824: add all our character nodes with life points greater than 0 as child nodes of the "start" root node.
Step S825: set all node states to the "dormant" state, set the "start" root node as the current node, and continue with step S802.
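The combat rules listed in this embodiment and the round bookkeeping of steps S817 to S825 can be sketched together in C#; every class, method and list name below is an illustrative assumption, Liu Bei's "Sweet Words" scheme is omitted for brevity, and a single 60-180 damage range stands in for the per-attacker ranges:

using System;
using System.Collections.Generic;
using System.Linq;

class Character
{
    public string Name;
    public int Life = 1000;                          // every character starts with 1000 life points
    public bool Active { get { return Life > 0; } }  // at 0 the character withdraws from combat
}

static class Game
{
    static readonly Random Rng = new Random();

    public static void Attack(Character target) { target.Life -= 100; }   // 100% success, 100 damage
    public static void Rest(Character self) { self.Life += 80; }          // recover 80 life points

    public static void PowerSplitMountHua(Character target)               // Guan Yu: 60% chance of 200 damage
    {
        if (Rng.NextDouble() < 0.6) target.Life -= 200;
    }

    public static void DangyangShout(IEnumerable<Character> enemies)      // Zhang Fei: 50 damage to every enemy
    {
        foreach (Character e in enemies) e.Life -= 50;
    }

    // Returns true when the game has ended; otherwise the next selection round begins (step S802).
    public static bool AfterAction(Character actor, List<Character> ours, List<Character> enemies,
                                   HashSet<Character> actedThisRound)
    {
        actedThisRound.Add(actor);                                    // S817: actor leaves the "start" child set
        if (enemies.All(e => !e.Active)) return true;                 // S818/S819: all enemies down, we win
        if (ours.Where(c => c.Active).All(actedThisRound.Contains))   // S820: has everyone acted this round?
        {
            List<Character> targets = ours.Where(c => c.Active).ToList();
            foreach (Character e in enemies.Where(e => e.Active))     // S821: enemy round, random targets
                targets[Rng.Next(targets.Count)].Life -= Rng.Next(60, 181);
            if (ours.All(c => !c.Active)) return true;                // S822/S823: all of ours down, enemy wins
            actedThisRound.Clear();                                   // S824: survivors rejoin the child set
        }
        return false;                                                 // S825: back to node selection
    }
}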
In this embodiment of the present invention, the system loads and initializes the voice control system. Here, all nodes of the "Three Heroes Fight Lü Bu" game are registered in the functional node registration table (see Fig. 10-1), and the root node "start" is set as the current node. The "start" node is set to the "selected" state; the UI appearance is set to the UI element corresponding to the current node, the current node's child nodes are set to the "to-be-selected" state, the voice instruction prompt menu is emptied and then the current node instruction set and global control function instructions are added to it again; the child node search command words are green, the node function command words are blue, and the global control function "return" is yellow; the current node instruction set and the global control function "return" instruction are set as the voice recognition instruction set.
Here, to begin controlling "Guan Yu" to perform this round's action, the "start" node is set to the "selected" state, and the child nodes "Guan Yu", "Liu Bei" and "Zhang Fei" of the "start" node are set to the "to-be-selected" state. The speech recognition engine is configured, and the voice recognition instruction set is set to the child node command words "Guan Yu", "Liu Bei" and "Zhang Fei" plus the "return" command word. The voice instruction prompt menu is emptied, and the menu items for the child nodes "Guan Yu", "Liu Bei", "Zhang Fei" and the "return" global control function are added, prompting the user about the set of instructions that can currently be issued. The child node menu items "Guan Yu", "Liu Bei", "Zhang Fei" are green, and the global control function "return" menu item is yellow.
Here, after the user issues the instruction "Guan Yu", the voice instruction is received and matched against the configured voice recognition instruction set. If it correctly matches the "Guan Yu" command word, the "Guan Yu" child node is set as the current node and the system plays back the "Guan Yu" command word. If it erroneously matches the "Zhang Fei" command word, the "Zhang Fei" child node is set as the current node and the system plays back "Zhang Fei". The user, aware that the wrong child node "Zhang Fei" was matched, issues the instruction "return"; following the "return" instruction, the system again sets the parent node "start" as the current node, and the user can re-issue the "Guan Yu" instruction. This error-return mechanism applies throughout any node search and is not repeated in the steps below.
Here, the "Guan Yu" node is set to the "selected" state, and the child nodes "attack", "special move" and "rest" of the "Guan Yu" node are set to the "to-be-selected" state. The speech recognition engine is configured, and the voice recognition instruction set is set to the child node command words "attack", "special move" and "rest" plus the "return" command word. The voice instruction prompt menu is emptied, and the menu items for the child nodes "attack", "special move", "rest" and the "return" global control function are added, to prompt the user about the set of instructions that can currently be issued. The child node menu items "attack", "special move", "rest" are green, and the global control function "return" menu item is yellow.
Here, after the user issues the instruction "attack", the voice instruction is received and matched against the configured voice recognition instruction set. If it correctly matches the "attack" command word, the "attack" child node is set as the current node and the system plays back the "attack" command word.
Here, the "attack" node is set to the "selected" state, and of the three nodes "Lü Bu", "Hua Xiong" and "Dong Zhuo", those with life points greater than 0 are added as child nodes of the "attack" node. Assuming the life points of all three nodes are greater than 0, the child nodes "Lü Bu", "Hua Xiong" and "Dong Zhuo" of the "attack" node are set to the "to-be-selected" state. The speech recognition engine is configured, and the voice recognition instruction set is set to the "Lü Bu", "Hua Xiong" and "Dong Zhuo" command words plus the "return" command word. The voice instruction prompt menu is emptied, and the menu items "Lü Bu", "Hua Xiong", "Dong Zhuo" and the "return" global control function are added, to prompt the user about the set of instructions that can currently be issued. The child node menu items "Lü Bu", "Hua Xiong", "Dong Zhuo" are green, and the global control function "return" menu item is yellow.
After the user issues the instruction "Lü Bu", the voice instruction is received and matched against the configured voice recognition instruction set. If it correctly matches the "Lü Bu" command word, the system plays back the "Lü Bu" command word. The speech recognition engine is configured and the voice recognition instruction set is set to the "yes"/"no" confirmation command words. The voice instruction prompt menu is emptied, and the "yes" and "no" confirmation menu items are added to prompt the user about the set of instructions that can currently be issued, where "yes" is green and "no" is red. The audio "Guan Yu attacks Lü Bu?" is played back, and the system waits for the user's voice instruction. After the user issues the instruction "yes", the system receives the voice instruction and matches it against the confirmation instruction set "yes"/"no". If it correctly matches the "yes" command word, the system executes the function "Guan Yu attacks Lü Bu". If the user issues the instruction "no", the function is not executed.
By " Guan Yu " from root node " from the beginning of " child node set in delete, judge " Lv Bu ", " Hua Xiong ", whether the vital values of " Dong Zhuo " node all less than 0.If it is, game over, we wins.Judge whether our personage of the vital values more than 0 has all performed epicycle action (Liu is standby and Zhang Fei is also not carried out action).All nodes are set to into " dormancy " state, root node " beginning " is set to into present node.
" beginning " node is set to into " choosing " state.By " beginning " node child node, " Liu is standby ", " Zhang Fei " are set to " state to be selected ".Speech recognition engine is set, voice recognition instruction collection is set to, " Liu is standby ", " Zhang Fei " child node coding line and " return " coding line.Phonetic order prompting menu is emptied, addition menu item " Liu is standby ", " Zhang Fei " child node and " return " overall situation control function point out user currently to send the set of instruction.Its child nodes " Liu is standby ", " Zhang Fei " menu item is green, and global control function " return " menu item is yellow.
After user sends instruction " Liu is standby ", phonetic order is received, and the voice recognition instruction for arranging concentrates matching.If " Liu is standby " child node is set to present node by successful match to " Liu is standby " command word, system playback " Liu is standby " command word." Liu is standby " node is set to into " choosing " state.By the child node of " Liu is standby " node, " attack ", " scheme ", " rest " is set to " state to be selected ".Speech recognition engine is set, voice recognition instruction collection is set to into " attack ", " scheme ", " rest " child node coding line and " return " coding line.Empty phonetic order prompting menu, addition menu item " attack ", " scheme ", " rest " child node and " return " overall situation control function, to point out user currently to send the set of instruction.Its child nodes " attack ", " scheme ", " rest " menu item are green, and global control function " return " menu item is yellow.
After user sends instruction " rest ", phonetic order is received, and the voice recognition instruction for arranging concentrates matching.If successful match is to " rest " command word, system playback " rest " command word.Speech recognition engine is set, voice recognition instruction collection is set to into "Yes", "No" confirms coding line.Phonetic order prompting menu is emptied, adds menu item "Yes", "No" confirms coding line, to point out user currently to send the set of instruction.Wherein "Yes" is green, and "No" is redness.Audio playback " whether Liu is standby is rested ", and wait user speech to instruct.After user sends instruction "Yes", system receives phonetic order, and is confirming instruction set "Yes", matches in "No".If successful match is to "Yes" command word, system perform function " the standby rest of Liu ".
By " Liu is standby " from root node " from the beginning of " child node set in delete, judge " Lv Bu ", " Hua Xiong ", whether the vital values of " Dong Zhuo " node all less than 0.If it is, game over, we wins.Judge whether our personage of the vital values more than 0 has all performed epicycle action (Zhang Fei is also not carried out action).All node states are set to into " dormancy ", root node is set to into present node.
(starting to control Zhang Fei's execution epicycle action), by " beginning " node " choosing " state is set to.By " beginning " node child node, " Zhang Fei " is set to " state to be selected ".Speech recognition engine is set, voice recognition instruction collection is set to, " Zhang Fei " child node coding line and " return " coding line.Phonetic order prompting menu, addition menu item " Zhang Fei " child node and " return " overall situation control function are emptied, points out user currently to send the set of instruction.Its child nodes " Zhang Fei " menu item is green, and global control function " return " menu item is yellow.
After user sends instruction " Zhang Fei ", phonetic order is received, and the voice recognition instruction for arranging concentrates matching.If " Zhang Fei " child node is set to present node by successful match to " Zhang Fei " command word, system playback " Zhang Fei " command word." Zhang Fei " node is set to into " choosing " state.By the child node of " Zhang Fei " node, " attack ", " stunt ", " rest " are set to " state to be selected ".Speech recognition engine is set, voice recognition instruction collection is set to into " attack ", " stunt ", " rest " child node coding line and " return " coding line.Empty phonetic order prompting menu, addition menu item " attack ", " stunt ", " rest " child node and " return " overall situation control function, to point out user currently to send the set of instruction.Its child nodes " attack ", " stunt ", " rest " menu item are green, and global control function " return " menu item is yellow.
After the user issues the instruction "special move", the voice instruction is received and matched against the configured voice recognition instruction set. If it correctly matches the "special move" command word, the "special move" child node is set as the current node and the system plays back the "special move" command word.
The "special move" node is set to the "selected" state, and the child node "Dangyang Shout" of the "special move" node is set to the "to-be-selected" state. The speech recognition engine is configured, and the voice recognition instruction set is set to the "Dangyang Shout" child node command word plus the "return" command word. The voice instruction prompt menu is emptied, and the menu items for the child node "Dangyang Shout" and the "return" global control function are added, to prompt the user about the set of instructions that can currently be issued. The child node menu item "Dangyang Shout" is green, and the global control function "return" menu item is yellow.
After the user issues the instruction "Dangyang Shout", the voice instruction is received and matched against the configured voice recognition instruction set. If it correctly matches the "Dangyang Shout" command word, the system plays back the command word "Dangyang Shout". The speech recognition engine is configured and the voice recognition instruction set is set to the "yes"/"no" confirmation command words. The voice instruction prompt menu is emptied, and the "yes" and "no" confirmation menu items are added to prompt the user about the set of instructions that can currently be issued, where "yes" is green and "no" is red. The audio "use Zhang Fei's special move Dangyang Shout?" is played back, and the system waits for the user's voice instruction. After the user issues the instruction "yes", the system receives the voice instruction and matches it against the confirmation instruction set "yes"/"no". If it correctly matches the "yes" command word, the system executes the function "Zhang Fei's special move Dangyang Shout". If the user issues the instruction "no", the function is not executed.
Then "Zhang Fei" is deleted from the child node set of the root node "start", and it is determined whether the life points of the "Lü Bu", "Hua Xiong" and "Dong Zhuo" nodes are all less than 0; if yes, the game ends and our side wins. It is then determined whether all our characters with life points greater than 0 have performed this round's actions (everyone has now acted this round). The computer therefore controls the enemy characters to perform this round's actions, attacking our characters at random. After all enemy characters have acted, it is determined whether the life points of all our characters are less than 0; if yes, the enemy wins and the game ends.
All our character nodes with life points greater than 0 are added as child nodes of the "start" root node; all node states are set to the "dormant" state, and the "start" root node is set as the current node. The second round of combat then begins, and so on, until all characters on one side have withdrawn from combat, whereupon the other side wins and the game ends.
Embodiment 11
Based on the foregoing embodiments, this embodiment of the present invention is described as applied to the smart home field. Fig. 11 is a schematic diagram of the correspondence between the smart home parent and child nodes according to Embodiment eleven of the present invention. When the smart home voice control method of Embodiment eleven is realized by a software program, it can be divided into four major steps, wherein:
M1 is system initialization: it initializes the voice control system, generates the functional node registration table and registers all functional nodes.
M2 is the system state update module: when the current node changes, this module updates the voice recognition instruction set and the system display, so as to prompt the user about the instruction set that can currently be issued.
M3 is the functional node search module: according to the user's instructions, it searches for the destination node having a certain function.
M4 is the function execution module: after the user has found the node containing the target function, this module performs the final confirmation and execution.
The voice control method can be regarded as a voice-controlled smart home system; the speech recognition engine may adopt the Microsoft speech recognition engine (Microsoft Speech SDK), the development platform may be Microsoft .NET 4.0, and the development language may be C#. With reference to Fig. 11, the execution steps of the voice-controlled smart home system for the function "turn on the toilet main light" are given below.
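A minimal C# sketch of assembling the "housekeeper > toilet > main light > on" branch from the FunctionNode and NodeRegistry sketches of Embodiment seven; all IDs, command words and the action body are illustrative assumptions:

using System;

static class SmartHomeSetup
{
    public static void BuildToiletLightBranch()
    {
        FunctionNode housekeeper = new FunctionNode { NodeId = "housekeeper", CommandWord = "housekeeper" };
        FunctionNode toilet = new FunctionNode { NodeId = "toilet", CommandWord = "toilet", ParentId = "housekeeper" };
        FunctionNode mainLight = new FunctionNode { NodeId = "toilet.mainlight", CommandWord = "main light", ParentId = "toilet" };

        // Child node search instructions: spoken command word -> child node ID.
        housekeeper.ChildSearchInstructions["toilet"] = "toilet";
        toilet.ChildSearchInstructions["main light"] = "toilet.mainlight";

        // "on" is a node function of the light node rather than a further child node.
        mainLight.FunctionInstructions["on"] = () => Console.WriteLine("toilet main light switched on");

        NodeRegistry.Register(housekeeper);
        NodeRegistry.Register(toilet);
        NodeRegistry.Register(mainLight);
    }
}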
Step M1, starts;System loads, initialize speech control system, and all nodes in intelligent home system are registered in functional node registration table, and root node " house keeper " is set to into present node.
Here, present node is " house keeper ".
Step M2, present node is set to " choosing " state;
Here, by " house keeper " node child node, " toilet ", " parlor ", " kitchen ", " bedroom ", " study " is set to " state to be selected ", speech recognition engine is set, voice recognition instruction collection is set to into " toilet ", " parlor ", " kitchen ", the child node coding lines such as " bedroom ", " study " and " return " coding line.The child nodes such as phonetic order prompting menu, addition menu item " toilet ", " parlor ", " kitchen ", " bedroom ", " study " and " return " overall situation control function are emptied, points out user currently to send the set of instruction.The menu items such as its child nodes " toilet ", " parlor ", " kitchen ", " bedroom ", " study " are green, and global control function " return " menu item is yellow.
Step M3, node search: the speech recognition engine waits for voice input;
Specifically, after the user issues the instruction "bathroom", the system receives the voice instruction and matches it against the voice recognition instruction set configured in step M2. If it matches the command word "bathroom", the "bathroom" child node is set as the current node and the system plays back the command word "bathroom". If it is wrongly matched to the command word "study", the "study" child node is set as the current node, the system plays back "study", and steps M2 and M3 are executed again. At this point the user knows that the wrong child node "study" was matched last time, and therefore issues the instruction "return". On the "return" instruction, the system sets the parent node "Housekeeper" as the current node again and continues with steps M2 and M3; the user can then reissue the "bathroom" instruction. This error-return mechanism applies during any node search and is not repeated in the following steps.
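The matching and error-return behavior of step M3 might look as follows in C#; the method name is hypothetical, playback uses System.Speech.Synthesis, and the FunctionNode type is again the one from the earlier sketch.

using System.Speech.Synthesis;

// M3 sketch: apply one recognized command word to the current node, including
// the error-return mechanism described above.
static FunctionNode HandleCommandWord(FunctionNode current, string word)
{
    using (var tts = new SpeechSynthesizer())
    {
        if (word == "return")
            return current.Parent ?? current;     // back to the superior node

        foreach (var child in current.Children)
            if (child.Keyword == word)
            {
                tts.Speak(word);                  // play back the matched command word
                return child;                     // the child becomes the current node
            }

        tts.Speak("please say it again");         // no match: prompt re-input
        return current;
    }
}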
Here, the user issues voice instructions to the system in turn: bathroom > main light > on. The control system receives the user's voice instructions and repeats steps M2 and M3 until the node "on" is found. When the current node is "on", step M307 detects that the current node is an application function, and step M4 is executed.
Step M4, the application function corresponding to the current node is executed;
Specifically, the speech recognition engine is configured and the voice recognition instruction set is set to the confirmation command words "yes" and "no". The voice instruction prompt menu is cleared, and the menu items "yes" and "no" are added to prompt the user with the set of instructions that can currently be issued, where "yes" is shown in green and "no" in red. The system plays back "turn on the bathroom main light?" and waits for the user's voice instruction. After the user issues the instruction "yes", the system receives the voice instruction and matches it against the confirmation instruction set {"yes", "no"}. If it matches the command word "yes", the system executes the function "turn on the bathroom main light" and continues with step M2. If the user issues the instruction "no", the function is not executed and step M2 is executed directly.
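A hedged C# sketch of the confirmation in step M4 follows, again assuming the hypothetical FunctionNode type; Recognize() waits synchronously for the user's reply, and the prompt text is the one used in this embodiment.

using System.Speech.Recognition;
using System.Speech.Synthesis;

// M4 sketch: final yes/no confirmation before executing the application
// function of the current node; either way, control then returns to step M2.
static void ConfirmAndExecute(SpeechRecognitionEngine engine, FunctionNode current)
{
    engine.UnloadAllGrammars();
    engine.LoadGrammar(new Grammar(new GrammarBuilder(new Choices("yes", "no"))));

    using (var tts = new SpeechSynthesizer())
        tts.Speak("turn on the bathroom main light?");

    RecognitionResult answer = engine.Recognize();  // wait for the user's voice reply
    if (answer != null && answer.Text == "yes" && current.Execute != null)
        current.Execute();                          // perform the function on "yes"
}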
Embodiment 12
Based on the foregoing embodiments, this embodiment of the present invention is illustrated by taking application to the field of fighter plane voice control as an example. Figure 12 is a schematic diagram of the parent-child node relationships of the fighter plane voice control of Embodiment 12. When the fighter plane voice control method of Embodiment 12 is implemented in program software, it can be divided into four main steps, wherein:
M1 is system initialization: the voice control system is initialized, the function node registration table is generated, and all function nodes are registered.
M2 is the system state update module: when the current node changes, this module updates the voice recognition instruction set and refreshes the system display to prompt the user with the set of instructions that can currently be issued.
M3 is the function node search module: according to user instructions, it searches for the target node having a given function.
M4 is the function execution module: after the user has located the node containing the target function, this module performs the final confirmation and execution.
The voice control method can be regarded as a voice-controlled fighter plane system. The speech recognition engine can be the Microsoft speech recognition engine Microsoft Speech SDK, the development platform can be Microsoft .NET 4.0, and the development language can be C#.
When high acceleration occurs or the aircraft is buffeted heavily, manual control is extremely difficult for a fighter pilot. In particular, in an urgent abnormal situation where the fighter plane is out of control and the pilot needs to trigger the ejection seat to escape, realizing "eject the ejection seat" by voice instruction is a good choice in circumstances where manual control is difficult.
With reference to Figure 12, the execution steps of the voice-controlled fighter plane control system for the function "eject the ejection seat" are given below.
Step M1, start: the system loads and initializes the voice control system, registers all nodes of the fighter plane control system in the function node registration table, and sets the root node "Fighter" as the current node.
Here, the current node is "Fighter".
Step M2, the current node is set to the "selected" state;
Here, the child nodes of the "Fighter" node, namely "fire control", "flight control" and "system", are set to the "to-be-selected" state; the speech recognition engine is configured, and the voice recognition instruction set is set to the child-node command words "fire control", "flight control" and "system" plus the "return" command word. The voice instruction prompt menu is cleared, and the menu items for the child nodes "fire control", "flight control" and "system" and for the global control function "return" are added, prompting the user with the set of instructions that can currently be issued. The child-node menu items "fire control", "flight control" and "system" are shown in green, and the global control function "return" menu item is shown in yellow.
Step M3, node search: the speech recognition engine waits for voice input;
Specifically, after the user issues the instruction "system", the system receives the voice instruction and matches it against the voice recognition instruction set configured in step M2. If it matches the command word "system", the "system" child node is set as the current node and the system plays back the command word "system". If it is wrongly matched to the command word "fire control", the "fire control" child node is set as the current node, the system plays back "fire control", and steps M2 and M3 are executed again. At this point the user knows that the wrong child node "fire control" was matched last time, and therefore issues the instruction "return". On the "return" instruction, the system sets the parent node "Fighter" as the current node again and continues with steps M2 and M3; the user can then reissue the "system" instruction. This error-return mechanism applies during any node search and is not repeated in the following steps.
Here, the user issues voice instructions to the system in turn: system > ejection seat > eject. The control system receives the user's voice instructions and repeats steps M2 and M3 until the node "eject" is found. When the current node is "eject", step M307 detects that the current node is an application function, and step M4 is executed.
Step M4, the application function corresponding to the current node is executed;
Specifically, the speech recognition engine is configured and the voice recognition instruction set is set to the confirmation command words "yes" and "no". The voice instruction prompt menu is cleared, and the menu items "yes" and "no" are added to prompt the user with the set of instructions that can currently be issued, where "yes" is shown in green and "no" in red. The system plays back "eject the ejection seat?" and waits for the user's voice instruction. After the user issues the instruction "yes", the system receives the voice instruction and matches it against the confirmation instruction set {"yes", "no"}. If it matches the command word "yes", the system executes the function "eject the ejection seat" and continues with step M2. If the user issues the instruction "no", the function is not executed and step M2 is executed directly.
Embodiment 13
Based on the foregoing embodiments, this embodiment of the present invention further provides a voice control apparatus. Each unit included in the apparatus, each module included in each unit, and each submodule included in each module can be implemented by a processor in an electronic device, or by a specific logic circuit. In specific embodiments, the processor can be a central processing unit (CPU), a microprocessor (MPU), a digital signal processor (DSP), a field programmable gate array (FPGA), or the like.
Figure 13 is a schematic diagram of the composition structure of the voice control apparatus of Embodiment 13. As shown in Figure 13, the apparatus 1300 includes:
a first acquisition unit 1301, configured to acquire a first isolated word, the first isolated word being a voice word;
a first judging unit 1302, configured to judge whether the first isolated word matches a preset first keyword, to obtain a first judgment result, the first keyword being any keyword in the keyword set of a first node;
a second acquisition unit 1303, configured to, when the first judgment result shows that the first isolated word matches the first keyword, acquire a preset second keyword set according to the first keyword, the second keyword set being the keyword set of a second node, the second node being a child node of the first node;
a third acquisition unit 1304, configured to acquire a second isolated word;
a second judging unit 1305, configured to judge whether the second isolated word matches a second keyword, to obtain a second judgment result, the second keyword being any keyword in the second keyword set;
an execution unit 1306, configured to execute the instruction corresponding to the second keyword when the second judgment result shows that the second isolated word matches the second keyword.
In this embodiment of the present invention, the apparatus further includes a first output unit, configured to output first prompt information when the first judgment result shows that the first isolated word does not match the first keyword, the first prompt information being used to prompt the user to input a voice signal again.
It should be noted that the description of the above apparatus embodiment is similar to the description of the above method embodiments and has beneficial effects similar to those of the method embodiments, and is therefore not repeated. For technical details not disclosed in the apparatus embodiments of the present invention, refer to the description of the method embodiments of the present invention; to save space, they are not repeated here.
Embodiment 14
Based on the foregoing embodiments, this embodiment of the present invention further provides a voice control apparatus. Each unit included in the apparatus, each module included in each unit, and each submodule included in each module can be implemented by a processor in an electronic device, or by a specific logic circuit. In specific embodiments, the processor can be a central processing unit (CPU), a microprocessor (MPU), a digital signal processor (DSP), a field programmable gate array (FPGA), or the like.
The apparatus includes a first acquisition unit, a first judging unit, a second acquisition unit, a third acquisition unit, a second judging unit and an execution unit, wherein the first acquisition unit includes a collection module, a judging module and a first determining module, wherein:
the collection module is configured to collect, using a voice collection component, the voice signal issued by the user;
the judging module is configured to judge whether the voice signal is an isolated word, to obtain a third judgment result;
the first determining module is configured to determine the voice signal as the first isolated word when the third judgment result shows that the voice signal is an isolated word;
the first judging unit is configured to judge whether the first isolated word matches a preset first keyword, to obtain a first judgment result, the first keyword being any keyword in the keyword set of a first node;
the second acquisition unit is configured to, when the first judgment result shows that the first isolated word matches the first keyword, acquire a preset second keyword set according to the first keyword, the second keyword set being the keyword set of a second node, the second node being a child node of the first node;
the third acquisition unit is configured to acquire a second isolated word;
the second judging unit is configured to judge whether the second isolated word matches a second keyword, to obtain a second judgment result, the second keyword being any keyword in the second keyword set;
the execution unit is configured to execute the instruction corresponding to the second keyword when the second judgment result shows that the second isolated word matches the second keyword.
In this embodiment of the present invention, the apparatus further includes a first output unit, configured to output first prompt information when the first judgment result shows that the first isolated word does not match the first keyword, the first prompt information being used to prompt the user to input a voice signal again.
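As one possible reading of the first acquisition unit above, the following C# sketch collects a voice signal and applies the isolated-word judgment. Treating a single-word recognition result as an isolated word is an assumption made only for illustration; the embodiment does not fix a particular criterion.

using System.Speech.Recognition;

// Hypothetical sketch of the first acquisition unit: collect a voice signal,
// judge whether it is an isolated word, and if so determine it as the first
// isolated word.
static string AcquireFirstIsolatedWord(SpeechRecognitionEngine engine)
{
    engine.SetInputToDefaultAudioDevice();          // the voice collection component
    engine.LoadGrammar(new DictationGrammar());     // accept free-form speech
    RecognitionResult signal = engine.Recognize();  // blocks until speech is heard
    if (signal == null) return null;                // nothing collected

    string[] words = signal.Text.Split(' ');        // third judgment: isolated word?
    return words.Length == 1 ? words[0] : null;     // isolated word -> first isolated word
}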
It should be noted that the description of the above apparatus embodiment is similar to the description of the above method embodiments and has beneficial effects similar to those of the method embodiments, and is therefore not repeated. For technical details not disclosed in the apparatus embodiments of the present invention, refer to the description of the method embodiments of the present invention; to save space, they are not repeated here.
Embodiment 15
Based on the foregoing embodiments, this embodiment of the present invention further provides a voice control apparatus. Each unit included in the apparatus, each module included in each unit, and each submodule included in each module can be implemented by a processor in an electronic device, or by a specific logic circuit. In specific embodiments, the processor can be a central processing unit (CPU), a microprocessor (MPU), a digital signal processor (DSP), a field programmable gate array (FPGA), or the like.
The apparatus includes a first acquisition unit, a first judging unit, a second acquisition unit, a third acquisition unit, a second judging unit and an execution unit, wherein the first acquisition unit includes a collection module, a judging module, a first determining module, an extraction module and a second determining module, wherein:
the collection module is configured to collect, using a voice collection component, the voice signal issued by the user;
the judging module is configured to judge whether the voice signal is an isolated word, to obtain a third judgment result;
the first determining module is configured to determine the voice signal as the first isolated word when the third judgment result shows that the voice signal is an isolated word;
the extraction module is configured to extract voice words from the voice signal according to a preset first rule when the third judgment result shows that the voice signal is not an isolated word;
the second determining module is configured to determine the extracted voice words as first isolated words;
the first judging unit is configured to judge whether the first isolated word matches a preset first keyword, to obtain a first judgment result, the first keyword being any keyword in the keyword set of a first node;
the second acquisition unit is configured to, when the first judgment result shows that the first isolated word matches the first keyword, acquire a preset second keyword set according to the first keyword, the second keyword set being the keyword set of a second node, the second node being a child node of the first node;
the third acquisition unit is configured to acquire a second isolated word;
the second judging unit is configured to judge whether the second isolated word matches a second keyword, to obtain a second judgment result, the second keyword being any keyword in the second keyword set;
the execution unit is configured to execute the instruction corresponding to the second keyword when the second judgment result shows that the second isolated word matches the second keyword.
In this embodiment of the present invention, the apparatus further includes a first output unit, configured to output first prompt information when the first judgment result shows that the first isolated word does not match the first keyword, the first prompt information being used to prompt the user to input a voice signal again.
In this embodiment of the present invention, the second determining module is configured to determine the extracted voice word as the first isolated word when the number of extracted voice words equals 1, and to determine each extracted voice word as a first isolated word when the number of extracted voice words is greater than 1.
In this embodiment of the present invention, the second determining module includes:
a judging submodule, configured to judge, when the number of extracted voice words is greater than 1, whether a parent-child node relationship exists among the extracted voice words, to obtain a fourth judgment result;
a first determining submodule, configured to determine the keyword corresponding to the child node as the first isolated word when the fourth judgment result shows that a parent-child node relationship exists among the extracted voice words.
In this embodiment of the present invention, the second determining module includes:
an output submodule, configured to output second prompt information when the number of extracted voice words is greater than 1, the second prompt information being used to prompt the user to select one voice word from the extracted voice words;
an acquisition submodule, configured to acquire a first operation of the user, the first operation being used to select one voice word from the extracted voice words;
a second determining submodule, configured to determine the selected voice word as the first isolated word based on the first operation.
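The two variants of the second determining module can be made concrete with a hedged C# sketch. The direct parent-child test over a keyword-to-parent map, and the console prompt standing in for the second prompt information and the user's first operation, are illustrative assumptions.

using System;
using System.Collections.Generic;

// Hypothetical sketch of the second determining module: resolve several
// extracted voice words into one first isolated word. parentOf maps each
// keyword to the keyword of its parent node.
static string DetermineFirstIsolatedWord(List<string> extracted,
                                         Dictionary<string, string> parentOf)
{
    if (extracted.Count == 1) return extracted[0];

    // Fourth judgment: if one extracted word's node is a child of another
    // extracted word's node, take the child's keyword.
    foreach (string word in extracted)
        if (parentOf.ContainsKey(word) && extracted.Contains(parentOf[word]))
            return word;

    // No parent-child relationship: output the second prompt information and
    // let the user's first operation select one word.
    Console.WriteLine("Please select one word: " + string.Join(", ", extracted));
    string choice = Console.ReadLine();
    return extracted.Contains(choice) ? choice : null;
}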
It should be noted that the description of the above apparatus embodiment is similar to the description of the above method embodiments and has beneficial effects similar to those of the method embodiments, and is therefore not repeated. For technical details not disclosed in the apparatus embodiments of the present invention, refer to the description of the method embodiments of the present invention; to save space, they are not repeated here.
Embodiment 16
Based on the foregoing embodiments, this embodiment of the present invention further provides a voice control apparatus. Each unit included in the apparatus, each module included in each unit, and each submodule included in each module can be implemented by a processor in an electronic device, or by a specific logic circuit. In specific embodiments, the processor can be a central processing unit (CPU), a microprocessor (MPU), a digital signal processor (DSP), a field programmable gate array (FPGA), or the like.
The apparatus includes a first acquisition unit, a first judging unit, a second acquisition unit, a third acquisition unit, a second judging unit and an execution unit, wherein the third acquisition unit includes an output module, a first acquisition module and a second acquisition module, wherein:
the first acquisition unit is configured to acquire a first isolated word, the first isolated word being a voice word;
the first judging unit is configured to judge whether the first isolated word matches a preset first keyword, to obtain a first judgment result, the first keyword being any keyword in the keyword set of a first node;
the second acquisition unit is configured to, when the first judgment result shows that the first isolated word matches the first keyword, acquire a preset second keyword set according to the first keyword, the second keyword set being the keyword set of a second node, the second node being a child node of the first node;
the output module is configured to output the keywords in the second keyword set and to output fourth prompt information, the fourth prompt information being used to prompt the user to select one keyword from the second keyword set to be determined as the second isolated word;
the first acquisition module is configured to acquire a third operation of the user, the third operation being a response operation to the fourth prompt information;
the second acquisition module is configured to acquire the second isolated word based on the third operation;
the second judging unit is configured to judge whether the second isolated word matches a second keyword, to obtain a second judgment result, the second keyword being any keyword in the second keyword set;
the execution unit is configured to execute the instruction corresponding to the second keyword when the second judgment result shows that the second isolated word matches the second keyword.
In this embodiment of the present invention, the apparatus further includes a first output unit, configured to output first prompt information when the first judgment result shows that the first isolated word does not match the first keyword, the first prompt information being used to prompt the user to input a voice signal again.
It should be noted that the description of the above apparatus embodiment is similar to the description of the above method embodiments and has beneficial effects similar to those of the method embodiments, and is therefore not repeated. For technical details not disclosed in the apparatus embodiments of the present invention, refer to the description of the method embodiments of the present invention; to save space, they are not repeated here.
Embodiment 17
Based on the foregoing embodiments, this embodiment of the present invention further provides a voice control apparatus. Each unit included in the apparatus, each module included in each unit, and each submodule included in each module can be implemented by a processor in an electronic device, or by a specific logic circuit. In specific embodiments, the processor can be a central processing unit (CPU), a microprocessor (MPU), a digital signal processor (DSP), a field programmable gate array (FPGA), or the like.
The apparatus includes a first acquisition unit, a first judging unit, a second acquisition unit, a second output unit, a fourth acquisition unit, a third acquisition unit, a second judging unit, an execution unit and a third output unit, wherein:
the first acquisition unit is configured to acquire a first isolated word, the first isolated word being a voice word;
the first judging unit is configured to judge whether the first isolated word matches a preset first keyword, to obtain a first judgment result, the first keyword being any keyword in the keyword set of a first node;
the second acquisition unit is configured to, when the first judgment result shows that the first isolated word matches the first keyword, acquire a preset second keyword set according to the first keyword, the second keyword set being the keyword set of a second node, the second node being a child node of the first node;
the second output unit is configured to output third prompt information when the first judgment result shows that the first isolated word matches the first keyword, the third prompt information being used to prompt the user to confirm whether the first keyword is the voice signal input by the user;
the fourth acquisition unit is configured to acquire a second operation of the user, the second operation being a response operation to the third prompt information, and to trigger the third acquisition unit when it is determined based on the second operation that the first keyword is the voice signal input by the user;
the third acquisition unit is configured to acquire a second isolated word;
the second judging unit is configured to judge whether the second isolated word matches a second keyword, to obtain a second judgment result, the second keyword being any keyword in the second keyword set;
the execution unit is configured to execute the instruction corresponding to the second keyword when the second judgment result shows that the second isolated word matches the second keyword;
the third output unit is configured to output the first prompt information when it is determined based on the second operation that the first keyword is not the voice signal input by the user.
In this embodiment of the present invention, the apparatus further includes a first output unit, configured to output first prompt information when the first judgment result shows that the first isolated word does not match the first keyword, the first prompt information being used to prompt the user to input a voice signal again.
It should be noted that the description of the above apparatus embodiment is similar to the description of the above method embodiments and has beneficial effects similar to those of the method embodiments, and is therefore not repeated. For technical details not disclosed in the apparatus embodiments of the present invention, refer to the description of the method embodiments of the present invention; to save space, they are not repeated here.
Embodiment 18
Based on the foregoing embodiments, this embodiment of the present invention further provides a voice control device. The device includes a memory and a processor, wherein:
the memory is configured to store the first keyword set and the second keyword set;
the processor is configured to: acquire a first isolated word, the first isolated word being a voice word; judge whether the first isolated word matches a preset first keyword, to obtain a first judgment result, the first keyword being any keyword in the keyword set of a first node;
when the first judgment result shows that the first isolated word matches the first keyword, acquire a preset second keyword set according to the first keyword, the second keyword set being the keyword set of a second node, the second node being a child node of the first node; acquire a second isolated word; judge whether the second isolated word matches a second keyword, to obtain a second judgment result, the second keyword being any keyword in the second keyword set;
when the second judgment result shows that the second isolated word matches the second keyword, execute the instruction corresponding to the second keyword.
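A compact C# sketch of the processor flow just described is given below; FunctionNode is the hypothetical node type from the earlier sketch, and getSecondWord stands in for acquiring the second isolated word.

using System;

// Sketch of the two-level flow: match the first isolated word against the
// first node's keyword set, load the matched child's keyword set, then match
// the second isolated word and execute the corresponding instruction.
static bool RunTwoLevelFlow(FunctionNode firstNode, string firstWord,
                            Func<string> getSecondWord)
{
    FunctionNode secondNode = null;
    foreach (var child in firstNode.Children)        // first judgment
        if (child.Keyword == firstWord) { secondNode = child; break; }
    if (secondNode == null)
        return false;                                // would output the first prompt information

    string secondWord = getSecondWord();             // acquire the second isolated word
    foreach (var grandchild in secondNode.Children)  // second judgment
        if (grandchild.Keyword == secondWord)
        {
            if (grandchild.Execute != null)
                grandchild.Execute();                // execute the matching instruction
            return true;
        }
    return false;
}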
It should be noted that the description of the above device embodiment is similar to the description of the method embodiments and has the same beneficial effects as the method embodiments, and is therefore not repeated. For technical details not disclosed in the device embodiment of the present invention, those skilled in the art can refer to the description of the method embodiments of the present invention; to save space, they are not repeated here.
The voice control method, apparatus and device provided by the above embodiments of the present invention make use of isolated word recognition plus feedback confirmation. Existing voice control systems for complex devices are all based on keyword recognition. Because voices and speaking rates differ greatly from person to person, and because semantic analysis is complex, this kind of control has a high error rate; moreover, the prior art is an open-loop control mode that cannot prevent the system from misunderstanding a user instruction and executing the wrong function.
In contrast, the above embodiments of the present invention use isolated word recognition plus confirmation: whenever the user searches to a new node, or before a function is executed, the user receives confirmation information from the system. If the system's feedback does not match the user's expectation, the user can choose to "return" to the superior node or choose not to execute the function. It can thus be seen that the embodiments of the present invention provide a closed-loop control mode, which improves recognition accuracy and essentially eliminates the possibility of misoperation. The following takes a mobile phone assistant as an example. Figure 14 is a control system diagram of a prior-art mobile phone voice assistant executing "play the song Rescue by Sun Nan", and Figure 15 is a control system diagram of the voice control method provided by the embodiments of the present invention executing the same task. As shown in Figures 14 and 15, in Figure 14 the mobile phone assistant receives only one user instruction, "play the song Rescue by Sun Nan", which is executed after speech recognition and keyword recognition. Whether the instruction can be parsed correctly depends entirely on the speech recognition capability of the voice assistant; unfortunately, current speech recognition engines have low accuracy on ordinary human language and are strongly affected by the environment, so this open-loop control system has a high error rate and low reliability.
In Figure 15, the multiway-tree voice control system avoids the immature techniques of the speech recognition field: using isolated word recognition plus feedback confirmation, it decomposes one complex control instruction into multiple control steps such as "song", "Sun Nan" and "Rescue". This greatly improves the accuracy of instruction recognition and entirely avoids misoperation of the system.
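To make the decomposition concrete, the following C# fragment builds the Figure 15 example with the hypothetical FunctionNode type from the earlier sketch; the node labels follow the example, and the playback action is purely illustrative.

using System;

static void BuildFigure15Tree()
{
    var root = new FunctionNode { Keyword = "assistant" };
    var song = root.AddChild("song");
    var sunNan = song.AddChild("Sun Nan");
    sunNan.AddChild("Rescue", () => Console.WriteLine("playing Rescue..."));

    // The user says "song", "Sun Nan" and "Rescue" in turn; each word is
    // matched against a handful of sibling keywords and played back, so a
    // wrong jump can be undone with "return" before anything is executed.
}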
In summary, the embodiments of the present invention have the following technical effects:
1) accurate recognition of control instructions;
Because a step-by-step progressive positioning mode is used, the number of child nodes of each function node is limited, and the fewer the child nodes, the higher the accuracy of instruction recognition. By keeping the maximum number of child nodes within 30 and making the audio instructions of sibling child nodes as distinct as possible, a recognition rate of voice instructions close to 100% can be ensured. Moreover, the multiway-tree voice command system uses isolated word recognition and requires no semantic analysis, which fundamentally avoids misjudging the user's intention.
2) the expectations of the instruction issuer (e.g., the user) and the state of the controlled object are knowable, which essentially eliminates the possibility of misoperation;
The information of every node in the multiway-tree system is known. During the search for a target node, each time a new function node is reached, the voice system feeds back the information of the newly reached node. For example, in the operation sequence "System settings -> Display -> Brightness -> 80", when the current node is "Display" and the user issues the password "Brightness", the voice system jumps to the Brightness node after correct recognition and plays the confirmation voice "Brightness". If the instruction is not recognized, the system prompts "please say it again". If the instruction is recognized incorrectly and the system jumps to a wrong child node, such as "Saturation", the user says "return" and reissues the instruction "Brightness". Through this progressive search, the user can be 100% certain that the function found by the system is exactly the expected function. In addition, the confirmation step before a function is finally executed essentially eliminates the possibility of the system misrecognizing an instruction and executing the wrong function.
3) accurate positioning of fine-grained control targets;
By increasing the length of the instruction chain, the system can progressively approach a small control target and finally lock onto it. A reasonably designed multiway-tree voice system can, in theory, accurately control arbitrarily fine-grained targets.
4) prompting of the currently available instructions;
The instruction prompt menu, or the UI elements corresponding to the nodes, change and update with the node states, prompting the user with the currently available instruction set and helping the user quickly select the next operation.
In summary, the method, apparatus and device provided by the embodiments of the present invention remove the obstacles to using voice control in the control of complex, intelligent and precision devices, enabling people to control complex systems accurately and with low error rates by voice.
It should be understood that "one embodiment" or "an embodiment" mentioned throughout this specification means that a particular feature, structure or characteristic related to the embodiment is included in at least one embodiment of the present invention. Therefore, "in one embodiment" or "in an embodiment" appearing in various places throughout the specification does not necessarily refer to the same embodiment. Furthermore, these particular features, structures or characteristics can be combined in any suitable manner in one or more embodiments. It should also be understood that, in the various embodiments of the present invention, the sequence numbers of the above processes do not imply an order of execution; the execution order of the processes should be determined by their functions and internal logic, and should not constitute any limitation on the implementation of the embodiments of the present invention. The numbering of the embodiments of the present invention is for description only and does not represent the relative merits of the embodiments.
It should be noted that, herein, the terms "comprise", "include" or any other variants thereof are intended to cover non-exclusive inclusion, so that a process, method, article or apparatus including a series of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article or apparatus. In the absence of further limitation, an element defined by the phrase "including a ..." does not exclude the existence of other identical elements in the process, method, article or apparatus that includes the element.
In the several embodiments provided in this application, it should be understood that the disclosed apparatus and method can be implemented in other ways. The apparatus embodiments described above are only illustrative. For example, the division of the units is only a division by logical function, and there can be other ways of division in actual implementation; for instance, multiple units or components can be combined or integrated into another system, or some features can be ignored or not executed. In addition, the couplings, direct couplings or communication connections between the components shown or discussed can be indirect couplings or communication connections through some interfaces, devices or units, and can be electrical, mechanical or of other forms.
The units described above as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; they can be located in one place or distributed over multiple network elements. Some or all of the units can be selected according to actual needs to achieve the purpose of the solutions of this embodiment.
In addition, the functional units in the various embodiments of the present invention can all be integrated into one processing unit, or each unit can serve as a separate unit, or two or more units can be integrated into one unit; the above integrated unit can be implemented in the form of hardware, or in the form of hardware plus a software functional unit.
Those of ordinary skill in the art will appreciate that all or part of the steps for implementing the above method embodiments can be completed by hardware related to program instructions. The aforementioned program can be stored in a computer-readable storage medium, and when executed, the program performs the steps of the above method embodiments. The aforementioned storage medium includes various media that can store program code, such as a removable storage device, a read-only memory (Read Only Memory, ROM), a magnetic disk or an optical disk.
Alternatively, if the above integrated unit of the present invention is implemented in the form of a software functional module and sold or used as an independent product, it can also be stored in a computer-readable storage medium. Based on such an understanding, the technical solutions of the embodiments of the present invention, or the part thereof that contributes to the prior art, can essentially be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for causing a computer device (which can be a personal computer, a server, a network device or the like) to execute all or part of the methods described in the various embodiments of the present invention. The aforementioned storage medium includes various media that can store program code, such as a removable storage device, a ROM, a magnetic disk or an optical disk.
The above are only specific embodiments of the present invention, but the protection scope of the present invention is not limited thereto. Any change or replacement that can readily occur to those familiar with the art within the technical scope disclosed by the present invention shall be covered by the protection scope of the present invention. Therefore, the protection scope of the present invention shall be defined by the scope of the claims.

Claims (10)

1. A voice control method, characterized in that the method comprises:
acquiring a first isolated word, the first isolated word being a voice word;
judging whether the first isolated word matches a preset first keyword, to obtain a first judgment result, the first keyword being any keyword in a keyword set of a first node;
when the first judgment result shows that the first isolated word matches the first keyword, acquiring a preset second keyword set according to the first keyword, the second keyword set being a keyword set of a second node, the second node being a child node of the first node;
acquiring a second isolated word;
judging whether the second isolated word matches a second keyword, to obtain a second judgment result, the second keyword being any keyword in the second keyword set;
when the second judgment result shows that the second isolated word matches the second keyword, executing an instruction corresponding to the second keyword.
2. The method according to claim 1, characterized in that the method further comprises:
when the first judgment result shows that the first isolated word does not match the first keyword, outputting first prompt information, the first prompt information being used to prompt the user to input a voice signal again.
3. The method according to claim 1, characterized in that the method further comprises:
when the first judgment result shows that the first isolated word matches the first keyword, executing an instruction corresponding to the first keyword.
4. The method according to claim 1, characterized in that, before the instruction corresponding to the second keyword is executed, the method further comprises:
outputting inquiry information, the inquiry information being used to ask the user whether to execute the instruction corresponding to the second keyword;
acquiring the user's response to the inquiry information;
determining, according to the response, whether to execute the instruction corresponding to the second keyword.
5. The method according to claim 1, characterized in that acquiring the first isolated word comprises:
collecting, using a voice collection component, a voice signal issued by the user;
judging whether the voice signal is an isolated word, to obtain a third judgment result;
when the third judgment result shows that the voice signal is an isolated word, determining the voice signal as the first isolated word.
6. The method according to any one of claims 1 to 5, characterized in that acquiring the second isolated word comprises:
outputting the keywords in the second keyword set, and outputting fourth prompt information, the fourth prompt information being used to prompt the user to select one keyword from the second keyword set to be determined as the second isolated word;
acquiring a third operation of the user, the third operation being a response operation to the fourth prompt information;
acquiring the second isolated word based on the third operation.
7. The method according to any one of claims 1 to 5, characterized in that the method further comprises:
when the first judgment result shows that the first isolated word matches the first keyword, outputting third prompt information, the third prompt information being used to prompt the user to confirm whether the first keyword is the voice signal input by the user;
acquiring a second operation of the user, the second operation being a response operation to the third prompt information;
when it is determined based on the second operation that the first keyword is the voice signal input by the user, acquiring the second isolated word.
8. The method according to claim 7, characterized in that the method further comprises:
when it is determined based on the second operation that the first keyword is not the voice signal input by the user, outputting the first prompt information, the first prompt information being used to prompt the user to input a voice signal again.
9. A voice control apparatus, characterized in that the apparatus comprises:
a first acquisition unit, configured to acquire a first isolated word, the first isolated word being a voice word;
a first judging unit, configured to judge whether the first isolated word matches a preset first keyword, to obtain a first judgment result, the first keyword being any keyword in a keyword set of a first node;
a second acquisition unit, configured to, when the first judgment result shows that the first isolated word matches the first keyword, acquire a preset second keyword set according to the first keyword, the second keyword set being a keyword set of a second node, the second node being a child node of the first node;
a third acquisition unit, configured to acquire a second isolated word;
a second judging unit, configured to judge whether the second isolated word matches a second keyword, to obtain a second judgment result, the second keyword being any keyword in the second keyword set;
an execution unit, configured to execute an instruction corresponding to the second keyword when the second judgment result shows that the second isolated word matches the second keyword.
10. A voice control device, characterized in that the device comprises:
a memory, configured to store a first keyword set and a second keyword set; and
a processor, configured to:
acquire a first isolated word, the first isolated word being a voice word;
judge whether the first isolated word matches a preset first keyword, to obtain a first judgment result, the first keyword being any keyword in a keyword set of a first node;
when the first judgment result shows that the first isolated word matches the first keyword, acquire a preset second keyword set according to the first keyword, the second keyword set being a keyword set of a second node, the second node being a child node of the first node;
acquire a second isolated word;
judge whether the second isolated word matches a second keyword, to obtain a second judgment result, the second keyword being any keyword in the second keyword set;
when the second judgment result shows that the second isolated word matches the second keyword, execute an instruction corresponding to the second keyword.
CN201510765415.2A 2015-11-10 2015-11-10 Speech control method and device and equipment Pending CN106601250A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510765415.2A CN106601250A (en) 2015-11-10 2015-11-10 Speech control method and device and equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201510765415.2A CN106601250A (en) 2015-11-10 2015-11-10 Speech control method and device and equipment

Publications (1)

Publication Number Publication Date
CN106601250A true CN106601250A (en) 2017-04-26

Family

ID=58555435

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510765415.2A Pending CN106601250A (en) 2015-11-10 2015-11-10 Speech control method and device and equipment

Country Status (1)

Country Link
CN (1) CN106601250A (en)

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1333615A (en) * 2000-07-10 2002-01-30 国际商业机器公司 Conversation based information searching method and conversation machine
CN101206858A (en) * 2007-12-12 2008-06-25 北京中星微电子有限公司 Method and system for testing alone word voice endpoint
CN102236686A (en) * 2010-05-07 2011-11-09 盛乐信息技术(上海)有限公司 Voice sectional song search method
CN102708858A (en) * 2012-06-27 2012-10-03 厦门思德电子科技有限公司 Voice bank realization voice recognition system and method based on organizing way
CN103338311A (en) * 2013-07-11 2013-10-02 成都西可科技有限公司 Method for starting APP with screen locking interface of smartphone
CN103434462A (en) * 2013-08-26 2013-12-11 浙江吉利汽车研究院有限公司 Automobile intelligent control method and control system
CN104244056A (en) * 2014-09-05 2014-12-24 海信集团有限公司 Voice processing method, device and system

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Zhao Li: "Speech Signal Processing", 31 March 2003, China Machine Press *

Cited By (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107680592B (en) * 2017-09-30 2020-09-22 惠州Tcl移动通信有限公司 Mobile terminal voice recognition method, mobile terminal and storage medium
CN107680592A (en) * 2017-09-30 2018-02-09 惠州Tcl移动通信有限公司 A kind of mobile terminal sound recognition methods and mobile terminal and storage medium
CN108305625A (en) * 2018-01-29 2018-07-20 深圳春沐源控股有限公司 Voice control method and device, electronic equipment and computer readable storage medium
CN108231079A (en) * 2018-02-01 2018-06-29 北京百度网讯科技有限公司 For the method, apparatus, equipment and computer readable storage medium of control electronics
CN108376544A (en) * 2018-03-27 2018-08-07 京东方科技集团股份有限公司 A kind of information processing method, device, equipment and computer readable storage medium
CN109147785A (en) * 2018-09-19 2019-01-04 淄博职业学院 A kind of method and apparatus of Korean audio-frequency information processing
CN109741739A (en) * 2018-12-21 2019-05-10 上海拍拍贷金融信息服务有限公司 A kind of application processing method and device
CN110517690A (en) * 2019-08-30 2019-11-29 四川长虹电器股份有限公司 The bootstrap technique and system of voice control function
CN110703614A (en) * 2019-09-11 2020-01-17 珠海格力电器股份有限公司 Voice control method and device, semantic network construction method and device
CN111622616A (en) * 2020-04-15 2020-09-04 阜阳万瑞斯电子锁业有限公司 Personal voice recognition unlocking system and method for electronic lock
CN112289318A (en) * 2020-12-30 2021-01-29 成都启英泰伦科技有限公司 Voice remote control method and device
CN113160812A (en) * 2021-02-23 2021-07-23 青岛歌尔智能传感器有限公司 Speech recognition apparatus, speech recognition method, and readable storage medium
CN113488049A (en) * 2021-07-07 2021-10-08 杭州贯农科技有限公司 Plug-in type voice recognition method, intelligent electronic scale and transaction platform

Similar Documents

Publication Publication Date Title
CN106601250A (en) Speech control method and device and equipment
US20200311167A1 (en) Method of and system for inferring user intent in search input in a conversational interaction system
CN102708454B (en) Solution of terminal fault provides method and device
CN104699784B (en) A kind of data search method and device based on interactive mode input
WO2018000278A1 (en) Context sensitive multi-round dialogue management system and method based on state machines
CN105531758B (en) Use the speech recognition of foreign words grammer
CN108159687B (en) Automatic guidance system and intelligent sound box equipment based on multi-person interaction process
WO2010034217A1 (en) Method for controlling game system by speech and game system
CN106796787A (en) The linguistic context carried out using preceding dialog behavior in natural language processing is explained
JP7413568B2 (en) Method and device for correcting spoken dialogue
CN103956169A (en) Speech input method, device and system
US20190042185A1 (en) Flexible voice-based information retrieval system for virtual assistant
CN106131173A (en) Mobile terminal and mobile terminal remote are assisted and by aid method, device
CN106601242A (en) Executing method and device of operation event and terminal
CN109686370A (en) The method and device of fighting landlord game is carried out based on voice control
CN110164020A (en) Ballot creation method, device, computer equipment and computer readable storage medium
US20220161131A1 (en) Systems and devices for controlling network applications
CN109725798A (en) The switching method and relevant apparatus of Autonomous role
EP3635572B1 (en) Subquery generation from a query
US12032643B2 (en) Method of and system for inferring user intent in search input in a conversational interaction system
CN113763944B (en) AI video cloud interaction system based on pseudo person logic knowledge base
CN117709340A (en) Extensible instruction identification method suitable for multiple scenes and electronic equipment
CN112148861A (en) Intelligent voice broadcasting method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20170426