CN105261366A

CN105261366A - Voice identification method, voice engine and terminal

Info

Publication number: CN105261366A
Application number: CN201510548423.1A
Authority: CN
Inventors: 王维平
Original assignee: Nubia Technology Co Ltd
Current assignee: Nubia Technology Co Ltd
Priority date: 2015-08-31
Filing date: 2015-08-31
Publication date: 2016-01-20
Anticipated expiration: 2035-08-31
Also published as: CN105261366B

Abstract

The invention discloses a voice identification method, a voice engine and a terminal. The voice identification method comprises: receiving a voice instruction; determining whether the network is normal; if so, sending the voice instruction to a cloud voice engine for identification; and receiving the identification result returned by the cloud voice engine and updating the local grammar base of a local voice engine according to that result. With this method, when the network is normal, the cloud voice engine identifies the voice instruction and the voice instruction is added to the local grammar base; when the network is abnormal, the local grammar base containing the added voice instruction is used for identification. The identification accuracy and efficiency of the local voice engine are thereby improved.

Description

Speech recognition method, speech engine and terminal

Technical field

The present invention relates to the field of speech recognition, and more particularly to a speech recognition method, a speech engine and a terminal.

Background art

With the significant advances in speech recognition technology, people are increasingly using it, for example in voice assistants on mobile phones, intelligent robots and smart homes.

To improve the accuracy of speech recognition, the prior art deploys a speech engine in the cloud. By analyzing big data and drawing on a wide range of resources, a cloud speech engine can accurately identify the user's instructions and intent. For example, Apple's Siri relies entirely on a cloud speech engine, and its recognition accuracy is very high.

When the cloud is used for speech recognition, the local device must connect to the cloud device over a network in order to interact with it. When there is no network or the network is slow, the local device cannot reach the cloud device, so the cloud speech recognition function cannot be used normally. This causes considerable inconvenience to the user.

Alternatively, speech recognition can be performed by a local speech engine on the local device. This avoids the inconvenience caused by a missing or slow network, but because the local speech engine is constrained by resources such as storage space, CPU and memory, it can only recognize very simple voice instructions and cannot recognize more complex ones. It is therefore severely limited and cannot meet users' needs.

The prior art is therefore deficient and needs improvement.

Summary of the invention

The technical problem to be solved by the present invention is to provide a speech recognition method, a speech engine and a terminal that address the above defects of the prior art.

The technical solution adopted by the present invention to solve this problem is to provide a speech recognition method, comprising:

receiving a voice instruction;

judging whether the current network is normal and, if so, sending the voice instruction to a cloud speech engine for recognition;

receiving the recognition result returned by the cloud speech engine, and updating the local grammar library of a local speech engine according to the recognition result.

Updating the local grammar library of the local speech engine comprises:

classifying the instruction according to the recognition result;

determining, based on the classification result, whether the instruction is of a preset type and, if so, performing syntax parsing on the voice instruction and extracting key grammar information;

judging whether the extracted key grammar information meets a preset condition and, if so, converting the voice instruction into a preset format and adding it to the local grammar library.

The preset condition comprises: the frequency of use of the grammar exceeds a preset value.

Judging whether the extracted key grammar information meets the preset condition comprises:

judging, according to the extracted key grammar information, whether it belongs to a grammar whose frequency of use exceeds the preset value.

The method further comprises:

after receiving the voice instruction, performing speech recognition according to the local grammar library to obtain a recognition result.

The method further comprises: performing a corresponding operation according to the recognition result.

The method further comprises:

presetting a time period;

within the preset time period, updating the local grammar library of the local speech engine according to the recognition results returned by the cloud speech engine.

The method further comprises:

when the preset time period expires, deleting from the local grammar library the voice instructions that were added by the updates.

In another aspect, a speech engine is provided, comprising:

a speech receiving module, configured to receive a voice instruction input by a user;

a judging module, configured to judge whether the current network is normal and, if so, send the voice instruction to a cloud speech engine for recognition;

a receiving module, configured to receive the recognition result returned by the cloud speech engine;

an update module, configured to update a grammar library according to the recognition result.

The update module comprises:

a classification unit, configured to classify the instruction according to the recognition result;

a parsing unit, configured to determine, based on the classification result, whether the instruction is of a preset type and, if so, to perform syntax parsing on the voice instruction and extract key grammar information;

a judging unit, configured to judge whether the extracted key grammar information meets a preset condition and, if so, to convert the voice instruction into a preset format and add it to the grammar library.

In a third aspect, a terminal is provided, comprising the above speech engine.

Implementing the speech recognition method, speech engine and terminal of the present invention has the following beneficial effects: when the network is normal, the cloud speech engine recognizes the voice instruction, and the voice instruction is added to the local grammar library; when the network is abnormal, the local grammar library containing the added voice instruction is used for recognition, which improves the recognition accuracy and efficiency of the local speech engine. By learning and updating the local grammar library, the range of voice instructions that the local speech engine can recognize is enlarged and the user experience is improved.

Brief description of the drawings

The invention is further described below with reference to the drawings and embodiments. In the drawings:

Fig. 1 is a flow chart of the speech recognition method of an embodiment of the present invention;

Fig. 2 is a recognition flow chart of the cloud speech engine of an embodiment of the present invention;

Fig. 3 is a detailed flow chart of step S3 of the speech recognition method of an embodiment of the present invention;

Fig. 4 is a schematic flow chart of recognition by the local speech engine of an embodiment of the present invention;

Fig. 5 is a flow chart of the speech recognition method of another embodiment of the present invention;

Fig. 6 is a structural diagram of the speech engine of an embodiment of the present invention;

Fig. 7 is a schematic diagram of speech recognition performed by the speech engine of an embodiment of the present invention;

Fig. 8 is a schematic diagram of the recognition process of the intelligent learning module of an embodiment of the present invention;

Fig. 9 is a schematic diagram of the communication between the terminal and the cloud speech engine of an embodiment of the present invention.

Detailed description of the embodiments

To solve the problem that prior-art speech recognition methods are constrained by network conditions and by the limitations of the local speech engine, embodiments of the present invention provide a speech recognition method, a speech engine and a terminal. By adding the recognition results of the cloud speech engine to the local grammar library, the local grammar library is updated through learning and the range of voice instructions that the local speech engine can recognize is enlarged. This improves the recognition accuracy and efficiency of the local speech engine, frees speech recognition from the constraints of network conditions and of the local speech engine, and improves the user experience.

The overall idea by which the embodiments solve the above technical problem is: receive a voice instruction; judge whether the current network is normal; if it is normal, send the voice instruction to the cloud speech engine for recognition, receive the recognition result returned by the cloud speech engine, and update the local grammar library of the local speech engine according to the recognition result; if the network is abnormal, use the local speech engine for speech recognition.
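The overall idea above can be sketched as a small dispatch function. This is only an illustrative sketch: the function name `recognize`, the injected callables and the in-memory stand-ins (`fake_cloud`, `fake_local`, `learn`) are assumptions made for the example and do not appear in the patent.

```python
from typing import Callable, Dict, Optional


def recognize(
    instruction: str,
    network_ok: Callable[[], bool],
    cloud_recognize: Callable[[str], str],
    local_recognize: Callable[[str], Optional[str]],
    update_local_grammar: Callable[[str, str], None],
) -> Optional[str]:
    """Route one voice instruction to the cloud or the local engine."""
    if network_ok():
        result = cloud_recognize(instruction)
        # Learn from the cloud result so the local engine can handle
        # the same instruction later without a network.
        update_local_grammar(instruction, result)
        return result
    # Network abnormal: fall back to the local speech engine.
    return local_recognize(instruction)


# Hypothetical in-memory stand-ins so the sketch can be run end to end.
local_grammar: Dict[str, str] = {}


def fake_cloud(instruction: str) -> str:
    return "make_call" if "phone" in instruction else "unknown"


def fake_local(instruction: str) -> Optional[str]:
    return local_grammar.get(instruction)


def learn(instruction: str, result: str) -> None:
    local_grammar[instruction] = result
```

With these stand-ins, an instruction that the local engine cannot recognize offline becomes recognizable offline once it has been sent to the cloud at least once.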

To give a clearer understanding of the technical features, objects and effects of the present invention, specific embodiments are now described in detail with reference to the drawings.

Fig. 1 is a flow chart of the speech recognition method of an embodiment of the present invention. The method comprises the following steps:

S1: receive a voice instruction.

Specifically, the voice instruction is captured by a voice acquisition device (for example, a microphone). The voice instruction is issued by the user, for example "phone Xiao Ming", "connect to wifi" or "open the camera".

S2: judge whether the current network is normal and, if so, send the voice instruction to the cloud speech engine for recognition.

Specifically, the terminal connects to the cloud speech engine over a network such as wifi, 3G, 4G or GPRS. Whether the network is normal can be judged by, for example, detecting the signal quality of the connected network. If the network is disconnected or the network speed is too slow, the network is judged to be abnormal.
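One possible way to express this judgment in code is shown below. The patent mentions signal quality as one criterion; this sketch instead assumes a reachability probe combined with a latency threshold, and the names `is_network_normal`, `probe` and `max_latency_s` are hypothetical.

```python
import time
from typing import Callable


def is_network_normal(
    probe: Callable[[], bool],
    max_latency_s: float = 1.0,
) -> bool:
    """Return True only if the probe succeeds and answers quickly enough."""
    start = time.monotonic()
    try:
        reachable = probe()
    except OSError:
        # Connection errors are treated as a disconnected network.
        return False
    elapsed = time.monotonic() - start
    # A reachable but slow network is also judged abnormal.
    return reachable and elapsed <= max_latency_s
```

In practice the probe might, for instance, wrap `socket.create_connection` with a timeout against the cloud engine's address; that address and the threshold value are left to the implementer.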

The cloud speech engine is in essence a server with very strong computing power and a grammar library. In this embodiment, if the network is normal, the voice instruction is recognized by the cloud speech engine, which improves recognition accuracy. Fig. 2 is the recognition flow chart of the cloud speech engine of this embodiment: when a voice instruction is received, the recognition result is output after steps such as grammatical analysis, big data analysis, feature extraction, pattern matching, algorithm analysis and result screening are performed in sequence. It should be understood that the recognition process of the cloud speech engine is prior art and is not described in detail here.

S3: receive the recognition result returned by the cloud speech engine, and update the local grammar library of the local speech engine according to the recognition result.

Specifically, the user's voice instruction can be identified from the recognition result returned by the cloud speech engine, and the corresponding operation is then performed. For example, for the voice instruction "phone Xiao Ming", the operation of dialing "Xiao Ming" can be carried out according to the returned recognition result.

In this embodiment, in addition to performing the corresponding operation according to the recognition result, the local grammar library is also updated: the recognized voice instruction is added to the local grammar library, so that the next time the same voice instruction is received it can be recognized accurately using the local grammar library. It should be understood that a voice instruction must be added in the preset format of the local grammar library, for example by storing only its subject and predicate (for example, "I want to phone") or only its predicate (for example, "phone").

Referring to Fig. 3, step S3 above specifically comprises:

S31: classify the instruction according to the recognition result.

In this embodiment, the instruction types include making a phone call, querying, searching, taking photos, cleaning up the phone, and so on. The recognition result corresponding to each voice instruction can thus be assigned to exactly one instruction type. Classifying instructions makes recognition easier and also provides a reference for deciding whether the local grammar library needs to be updated. For example, the recognition results of the voice instructions "please open the camera" and "start taking photos" are both to open the camera and take photos, so both belong to the instruction type "taking photos".
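Step S31 can be illustrated with a simple keyword-based classifier. The patent does not specify how classification is done, so the phrase table `INSTRUCTION_TYPES` and the function `classify` below are assumptions chosen only to show one recognition result mapping to exactly one instruction type.

```python
from typing import Optional

# Hypothetical mapping from instruction type to trigger phrases.
INSTRUCTION_TYPES = {
    "take_photo": ("open camera", "start taking photos", "take a photo"),
    "make_call": ("phone", "call", "dial"),
    "search": ("search",),
}


def classify(recognition_result: str) -> Optional[str]:
    """Map a recognition result to exactly one instruction type, or None."""
    text = recognition_result.lower()
    # The first matching type wins, so each result gets a single type.
    for instruction_type, phrases in INSTRUCTION_TYPES.items():
        if any(phrase in text for phrase in phrases):
            return instruction_type
    return None
```

Both camera phrasings from the example above fall into the same `take_photo` type, which is the property the classification step relies on.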

S32: determine, based on the classification result, whether the instruction is of a preset type and, if so, perform syntax parsing on the voice instruction and extract key grammar information.

In this embodiment, the instructions of a preset type can be instructions that the user uses frequently or instructions configured by the user, for example making a phone call, taking photos or querying. For efficiency, after classification it is judged whether the instruction is of a preset type. If it is not, the flow ends and no further processing is performed, that is, the instruction is not added to the local grammar library. This ensures that the instructions added to the local grammar library are ones that are useful to the user, which both saves storage space and improves the user experience. For example, if the user seldom uses the calculator and the preset types do not include opening the calculator, then the voice instruction "open the calculator" does not belong to a preset type, and the subsequent syntax parsing and key grammar information extraction are not performed.

Syntax parsing and key information extraction act as a filter on the voice instruction. For example, the recognition results of the voice instructions "I would like to open the Bluetooth function", "I want to open the Bluetooth function", "open Bluetooth", "open the Bluetooth function" and "please open Bluetooth" are all the operation of opening Bluetooth. Syntax parsing therefore yields the same key grammar information, "open Bluetooth", for all of them. Extracting key grammar information makes the subsequent judgment and the addition to the local grammar library more accurate.
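The filtering effect described above can be sketched with a filler-word stripper. This is a deliberately crude stand-in for real syntax parsing: the filler list `FILLER_PATTERN` and the function `extract_key_info` are hypothetical and are tuned only to the Bluetooth example.

```python
import re

# Hypothetical filler words that carry no command meaning.
FILLER_PATTERN = re.compile(
    r"\b(i|would like to|want to|wish to|please|the|function)\b",
    re.IGNORECASE,
)


def extract_key_info(instruction: str) -> str:
    """Strip filler words, lowercase, and collapse whitespace."""
    stripped = FILLER_PATTERN.sub(" ", instruction)
    return " ".join(stripped.lower().split())
```

All five phrasings of the Bluetooth instruction reduce to the same key string, so later steps only need to reason about one form.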

It should be understood that syntax parsing can be implemented by extracting the sentence trunk, for example by analyzing the sentence into subject, predicate, object, attributive, adverbial and complement. If the voice instruction is "I want to open the Bluetooth function", the subject is "I", the predicate is "open" and the object is "Bluetooth". Syntax parsing can also be implemented in other ways; the present invention is not limited in this respect.

Key information extraction is based on the result of syntax parsing. If syntax parsing is implemented by sentence trunk extraction, the predicate and object can be extracted as the key information; for the example above, the extracted key information is "open Bluetooth". Alternatively, the subject, predicate and object together, or the object alone, can be extracted as the key information; the present invention is not limited in this respect. According to pairing comparison

S33: judge whether the extracted key grammar information meets a preset condition and, if so, convert the voice instruction into a preset format and add it to the local grammar library.

After the key grammar information is extracted, it is used to judge whether the preset condition is met. The preset condition can be, for example, that the frequency of use of the grammar exceeds a preset value. The key grammar information is thus used to judge whether the voice instruction belongs to a grammar whose frequency of use exceeds the preset value. If it does, the voice instruction is frequently used and is added to the local grammar library; otherwise it is not added. This saves storage space in the local grammar library and ensures that only grammars that are genuinely useful to the user are added, which improves the user experience.

In this embodiment, the preset conditions are set and stored in advance as combinations of frequency of use and function. For example, for the operation "making a phone call" (the function mentioned above), suppose usage statistics show that the following grammars have each been used more than the preset value (for example, 10 times): "phone", "call", "dial", "dial to", "call out", "make a phone call to". The preset condition for the function "making a phone call" can then be set and stored as shown in Table 1.

Table 1

After the key grammar information is extracted, Table 1 is looked up to judge whether the corresponding voice instruction belongs to one of the above grammars. If it does, it is added to the local grammar library. When a voice instruction is added to the local grammar library, it must be converted into a preset format. The preset format is one that the local speech engine can recognize; for example, the object, interjections and the like are not stored. To avoid duplicate storage, repeated grammars are deleted when storing.
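The frequency check and the duplicate-free addition in step S33 might look like the following. The threshold constant, the use of a `Counter` for usage statistics and both function names are assumptions for illustration; the patent only states that the frequency of use must exceed a preset value and that repeated grammars are not stored twice.

```python
from collections import Counter
from typing import Dict, List

PRESET_FREQUENCY = 10  # hypothetical threshold, echoing the "10 times" example


def meets_preset_condition(key_info: str, usage: Counter) -> bool:
    """True if this grammar has been used more often than the preset value."""
    return usage[key_info] > PRESET_FREQUENCY


def add_to_grammar_library(
    library: Dict[str, List[str]], function: str, grammar: str
) -> bool:
    """Append the grammar under its function unless it is already stored."""
    entries = library.setdefault(function, [])
    if grammar in entries:
        return False  # avoid duplicate storage
    entries.append(grammar)
    return True
```

A caller would first test `meets_preset_condition` and only then call `add_to_grammar_library`, so rarely used phrasings never consume storage.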

The speech recognition method of this embodiment is described below using the concrete example of "making a phone call".

The grammar of the local grammar library of the local speech engine in this embodiment has the following form:

<CallCmd>;
!tag(CALLCMD_TAG,
"phone" !id(1001) |
"make a phone call to" !id(1002) |
"call" !id(1003) |
"dial to" !id(1004) |
"calling" !id(1005) |
"call out" !id(1006) |
);

According to this local grammar library, when the user wants to make a phone call and says "phone Xiao Ming", "make a phone call to Xiao Ming", "call Xiao Ming" or the like, the local speech engine can recognize the instruction, return the recognition result and carry out the operation of phoning Xiao Ming. If the voice instruction is "I want to phone Xiao Ming", however, it cannot be recognized, because the local grammar library does not define a corresponding grammar. Under the speech recognition method of this embodiment, the voice instruction "I want to phone Xiao Ming" is sent to the cloud speech engine. The cloud speech engine recognizes it, obtains the recognition result (the operation of making a phone call) and returns that result, so the phone call can be made. After steps S31 to S33 above, the voice instruction is added to the local grammar library (stored in a form that omits the object). The local grammar library after the addition is as follows:

<CallCmd>;
!tag(CALLCMD_TAG,
"phone" !id(1001) |
"make a phone call to" !id(1002) |
"call" !id(1003) |
"dial to" !id(1004) |
"calling" !id(1005) |
"call out" !id(1006) |
"I want to phone" !id(1007) |
);

The local grammar library has thus gained one grammar. The next time recognition is performed, the voice instruction "I want to phone Xiao Ming" can be recognized quickly and accurately even when there is no network and the cloud speech engine cannot be used.
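To make the update concrete, the sketch below regenerates a rule in a cleaned-up version of the grammar layout shown above after a phrase has been appended. The exact grammar syntax expected by a real local engine is not specified in the source, so `render_rule` and its output format are assumptions modeled on the listing.

```python
from typing import List


def render_rule(
    rule_name: str, tag: str, phrases: List[str], first_id: int = 1001
) -> str:
    """Render one grammar rule, assigning consecutive ids to the phrases."""
    lines = [f"<{rule_name}>;", f"!tag({tag},"]
    for offset, phrase in enumerate(phrases):
        lines.append(f'"{phrase}" !id({first_id + offset}) |')
    lines.append(");")
    return "\n".join(lines)


call_phrases = [
    "phone", "make a phone call to", "call",
    "dial to", "calling", "call out",
]
# Learned from the cloud result; the object ("Xiao Ming") is not stored.
call_phrases.append("I want to phone")
updated_rule = render_rule("CallCmd", "CALLCMD_TAG", call_phrases)
```

Because ids are assigned by position, the appended phrase receives the next free id after the six existing entries.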

With the speech recognition method of this embodiment, the cloud is used for speech recognition when a network is available, and the local speech engine is used when it is not. Because the local speech engine continually "expands its knowledge" from the cloud's recognition results, it becomes able to recognize a large number of complex instructions, achieving self-learning with memory. This both widens the range of speech recognition and improves the user experience.

In this embodiment, each time a voice instruction is received, the engine to use can be chosen according to the network state: the cloud speech engine when the network is normal, or the local speech engine when it is abnormal. Alternatively, the local speech engine can attempt recognition first, with the cloud speech engine used only when the local engine cannot recognize the instruction. Because the local speech engine continually gains knowledge under the method of this embodiment, it can recognize more, and more complex, instructions than a prior-art local speech engine. The latter approach therefore saves the user data traffic and cost while still ensuring accurate recognition, further improving the user experience.
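The second, local-first strategy can be sketched as follows. The function name `recognize_local_first` and the convention that the local engine returns `None` when it cannot recognize an instruction are assumptions made for the example.

```python
from typing import Callable, Optional


def recognize_local_first(
    instruction: str,
    local_recognize: Callable[[str], Optional[str]],
    cloud_recognize: Callable[[str], str],
    network_ok: Callable[[], bool],
) -> Optional[str]:
    """Try the local engine first; use the cloud only when it fails."""
    result = local_recognize(instruction)
    if result is not None:
        return result  # recognized locally, no network traffic used
    if network_ok():
        return cloud_recognize(instruction)
    return None  # unrecognized locally and no usable network
```

The design trade-off is that every locally recognized instruction avoids a round trip to the cloud, at the cost of one extra local attempt for instructions the local grammar library does not yet cover.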

Fig. 4 is a schematic flow chart of recognition by the local speech engine of this embodiment. After receiving a voice instruction, it performs grammatical analysis, dynamic syntax parsing, pattern matching and result screening in sequence, and then outputs the recognition result. It should be understood that the local grammar library of the local speech engine is the library as extended and updated by the method of this embodiment, and that grammatical analysis, dynamic syntax parsing, pattern matching and result screening are prior art and are not described in detail here.

Fig. 5 is a flow chart of the speech recognition method of another embodiment of the present invention. The method of this embodiment comprises:

S100: preset a time period.

The preset time period can be measured in days, months, hours and so on; for example, it can be set to 2 months.

S101: within the preset time period, update the local grammar library of the local speech engine according to the recognition results returned by the cloud speech engine.

Specifically, when a voice instruction is received within the preset time period, it is judged, as in the embodiment above, whether the network is normal; if it is, the voice instruction is sent to the cloud speech engine for recognition. When the cloud returns the recognition result, the voice instruction is added to the local grammar library. The next time the user says this voice instruction, the local speech engine can recognize it accurately.

S102: when the preset time period expires, delete from the local grammar library the voice instructions that were added by the updates.

In this embodiment, deleting the voice instructions added during the preset time period frees storage space. In practice, moreover, a user's needs can differ from one period to another. For example, a user who works in city A may mostly use the voice instruction "make a phone call", but mostly use "take photos" and "query" while traveling in city B. If the travel period is set as the preset time period, grammar instructions related to "taking photos" are added to the local grammar library during the trip, and once the user returns to the place of work these instructions and other rarely used instructions are deleted, freeing storage space and saving memory.
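A time-limited grammar library along the lines of steps S100 to S102 could be sketched like this. The class `TimedGrammarLibrary` and its methods are hypothetical; the sketch keeps learned grammars separate from the base grammars so that only the learned ones are removed at expiry.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import List


@dataclass
class TimedGrammarLibrary:
    period_end: datetime                      # end of the preset time period
    base: List[str] = field(default_factory=list)
    learned: List[str] = field(default_factory=list)

    def learn(self, grammar: str, now: datetime) -> bool:
        """Add a grammar only while the preset period is still running."""
        if now >= self.period_end:
            return False
        if grammar in self.base or grammar in self.learned:
            return False  # already stored
        self.learned.append(grammar)
        return True

    def expire(self, now: datetime) -> None:
        """Drop everything learned once the preset period has ended."""
        if now >= self.period_end:
            self.learned.clear()

    def grammars(self) -> List[str]:
        return self.base + self.learned
```

For the travel example, `period_end` would be set to the end of the trip, so photo-related grammars learned on the road disappear automatically afterwards while the base grammars remain.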

It should be understood that the other implementation details of this embodiment (for example, the process of updating the local grammar library) are the same as in the embodiment above and are not repeated here.

Fig. 6 is a structural diagram of the speech engine of an embodiment of the present invention, which comprises:

a speech receiving module 40, configured to receive a voice instruction input by a user;

a judging module 41, configured to judge whether the current network is normal and, if so, send the voice instruction to the cloud speech engine for recognition;

a receiving module 42, configured to receive the recognition result returned by the cloud speech engine;

an update module 43, configured to update the grammar library according to the recognition result.

The update module 43 comprises:

a classification unit 431, configured to classify the instruction according to the recognition result;

a parsing unit 432, configured to determine, based on the classification result, whether the instruction is of a preset type and, if so, to perform syntax parsing on the voice instruction and extract key grammar information;

a judging unit 433, configured to judge whether the extracted key grammar information meets a preset condition and, if so, to convert the voice instruction into a preset format and add it to the grammar library.

In addition, the speech engine of this embodiment also comprises a speech recognition module 44, which performs the grammatical analysis, dynamic syntax parsing, pattern matching and result screening described for Fig. 4 above, and a grammar library 45 for storing grammars.

It should be understood that the speech engine of this embodiment corresponds to the speech recognition method described above and is not described in detail here.

Referring to Fig. 7, the speech receiving module 40, judging module 41, receiving module 42 and update module 43 described above form an intelligent learning module (not labeled). Specifically, referring to Fig. 8, the intelligent learning module receives the recognition result returned by the cloud speech engine in step 70; classifies the instruction in step 72; performs syntax parsing in step 74; after parsing, extracts the key grammar information in step 76; and, after deciding from the key grammar information whether to update, updates the local grammar library in step 80. In this way it learns from the recognition results of the cloud speech engine and updates the grammar library of the local speech engine accordingly. It should be understood that the intelligent learning module is included in the local speech engine (this relationship is not shown in Fig. 7).
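The control flow of the intelligent learning module, with its two early exits, can be sketched as a single function with pluggable stages. The function `learn_from_cloud_result` and its parameters are assumptions; the step numbers in the comments refer to the steps named in the description of Fig. 8.

```python
from typing import Callable, List, Optional, Set


def learn_from_cloud_result(
    instruction: str,
    recognition_result: str,
    classify: Callable[[str], Optional[str]],
    preset_types: Set[str],
    extract_key_info: Callable[[str], str],
    meets_condition: Callable[[str], bool],
    library: List[str],
) -> bool:
    """Run classification, parsing, extraction and update; True if stored."""
    # Step 70 has already happened: recognition_result came from the cloud.
    instruction_type = classify(recognition_result)  # step 72
    if instruction_type not in preset_types:
        return False  # not a preset type: stop without parsing
    key_info = extract_key_info(instruction)  # steps 74 and 76
    if not meets_condition(key_info) or key_info in library:
        return False  # condition not met, or grammar already stored
    library.append(key_info)  # step 80
    return True
```

Passing the stages in as callables keeps the sketch independent of any particular classifier or parser, which the source leaves open.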

With the speech engine of this embodiment, when a network is available or the network is good, the cloud speech engine is used for speech recognition and the recognition result is updated into the grammar library of the local speech engine; when there is no network or the network is poor, the local speech engine is used for speech recognition. Because the local speech engine continually "expands its knowledge" from the cloud's recognition results, it becomes able to recognize a large number of complex instructions, achieving self-learning with memory, widening the range of speech recognition and improving the user experience.

In another aspect, an embodiment of the present invention also provides a terminal comprising the above speech engine. It should be understood that the terminal can be a hardware device such as a mobile phone, tablet computer, personal digital assistant or e-book reader that includes a communication unit, an audio/video (A/V) input unit, a user input unit, a sensing unit, an output unit, a memory, an interface unit, a controller, a power supply unit and the like.

Referring to Fig. 9, the terminal 1 of this embodiment is communicatively connected to the cloud speech engine 2. The terminal 1 captures the user's voice instructions through a voice acquisition device such as a microphone. When a network is available or the network is good, the cloud speech engine 2 is used for speech recognition and the recognition result is updated into the grammar library of the local speech engine of the terminal 1; when there is no network or the network is poor, the local speech engine of the terminal 1 is used for speech recognition. Because what people say, and the voice instructions they give, within a day or a period of time are largely repetitive, updating the cloud speech engine's recognition results into the local grammar library while a network is available lets the local speech engine continually "expand its knowledge" from those results. The local speech engine thus becomes able to recognize a large number of complex instructions, achieving self-learning with memory, widening the range of speech recognition and improving the user experience.

The speech recognition method, speech engine and terminal of the embodiments of the present invention can greatly improve the recognition capability of the local speech engine, allow the user's previously used voice instructions to be recognized accurately even without a network, achieve self-learning, and greatly improve the user experience.

Any process or method description in a flow chart, or otherwise described in the embodiments of the present invention, can be understood as representing a module, segment or portion of code that includes one or more executable instructions for implementing the steps of a specific logical function or process. The scope of the embodiments of the present invention includes other implementations in which functions may be performed out of the order shown or discussed, including substantially concurrently or in the reverse order depending on the functions involved, as should be understood by those skilled in the art to which the embodiments of the present invention belong.

The present invention can also be implemented as a computer program product that contains all the features enabling the method of the present invention to be implemented and that, when installed in a computer system and run, carries out the method of the present invention. A computer program in this document means any expression, in any programming language, code or notation, of a set of instructions intended to cause a system having information processing capability to perform a particular function either directly or after either or both of the following: a) conversion to another language, code or notation; b) reproduction in a different format.

The embodiments of the present invention have been described above with reference to the drawings, but the present invention is not limited to the specific embodiments described. These embodiments are merely illustrative, not restrictive. In light of the present invention, those of ordinary skill in the art can devise many other forms without departing from the spirit of the present invention and the scope protected by the claims, all of which fall within the protection of the present invention.

Claims

1. A speech recognition method, characterized by comprising:

receiving a voice instruction;

2. The speech recognition method according to claim 1, characterized in that updating the local grammar library of the local speech engine comprises:

classifying the instruction according to the recognition result;

3. The speech recognition method according to claim 2, characterized in that the preset condition comprises: the frequency of use of the grammar exceeds a preset value;

4. The speech recognition method according to claim 1, characterized in that the method further comprises:

5. The speech recognition method according to claim 1, characterized in that the method further comprises:

performing a corresponding operation according to the recognition result.

6. The speech recognition method according to any one of claims 1-5, characterized in that the method further comprises:

presetting a time period;

7. The speech recognition method according to claim 6, characterized in that the method further comprises:

8. A speech engine, characterized by comprising:

a speech receiving module, configured to receive a voice instruction input by a user;

9. The speech engine according to claim 8, characterized in that the update module comprises:

10. A terminal, characterized by comprising the speech engine according to any one of claims 8-9.