CN108831458A

CN108831458A - Offline voice-to-command conversion method and system

Info

Publication number: CN108831458A
Application number: CN201810533495.2A
Authority: CN
Inventors: 马鸿飞; 刘海模; 吴晓东; 苏云鹏; 刘雄; 肖虎; 卢敬光
Original assignee: Guangdong Sheng General Technology Co Ltd
Current assignee: Guangdong Sheng General Technology Co Ltd
Priority date: 2018-05-29
Filing date: 2018-05-29
Publication date: 2018-11-16

Abstract

An offline voice-to-command conversion method comprises the following steps: receiving the speech text of each of a plurality of training voices, and constructing a corresponding voice phrase dictionary based on the speech texts, the voice phrase dictionary including at least the text information of the corresponding speech text; receiving multiple segments of input speech for each training voice; forming a corresponding speech recognition template based on the multiple segments of input speech of each training voice and the voice phrase dictionary, and storing the speech recognition template locally; and mapping the speech recognition template to a corresponding operation command, and outputting the operation command to a control device. The beneficial effect of the invention is that, because the trained speech recognition templates are stored locally, the content of a speaker's command can be conveniently confirmed offline without sending the speech content to an external server for recognition, which improves the efficiency of voice control.

Description

Offline voice-to-command conversion method and system

Technical field

The present invention relates to the technical field of voice control, and in particular to an offline voice-to-command conversion method and a system implementing the method.

Background art

Speech is the most common and most natural form of human communication. As a key technology for human-machine interfaces in information technology, speech recognition has important research significance and broad application value. With the growing popularity of speech recognition in recent years, the ability to issue control instructions to machines by voice has been applied successfully in many consumer products, and the dream of people conversing with machines in natural language has been preliminarily realized. Although speech recognition has an extremely wide range of applications, and each implementation must be adapted to its specific application scenario, every speech recognition application involves converting speech itself into speech content.

Compared with traditional device control technology, voice-based device control offers users a more direct and convenient mode of interaction (for example, the user need not enter commands manually). In the prior art, however, speech is easily affected by other conditions (such as background noise and differences in pronunciation between speakers), which makes recognition unstable. In addition, determining the speech content, that is, converting it from natural language into a computer language that a machine can accept, often requires the device to connect online to an external database for semantic conversion. These practical problems raise the cost of using voice-based device control.

Summary of the invention

The purpose of the present invention is to overcome the deficiencies of the prior art by providing an offline voice-to-command conversion method and system that implement voice-based device control offline and minimize the influence of external conditions on the conversion of speech content.

To achieve the above purpose, the present invention first provides an offline voice-to-command conversion method comprising the following steps: receiving the speech text of each of a plurality of training voices, and constructing a corresponding voice phrase dictionary based on the speech texts, the voice phrase dictionary including at least the text information of the corresponding speech text; receiving multiple segments of input speech for each training voice; forming a corresponding speech recognition template based on the multiple segments of input speech of each training voice and the voice phrase dictionary, and storing the speech recognition template locally; and mapping the speech recognition templates to corresponding operation commands, and outputting the corresponding operation command to a control device according to the speech recognition template matched by instruction speech.

In one or more embodiments of the method, the voice phrase dictionary further includes at least one of the following pronunciation characteristics of the speech text: phrases, words, single characters, syllables and phonemes.

In one or more embodiments of the method, forming the speech recognition template further includes the following sub-steps: computing the Mel-frequency cepstral coefficients of each segment of input speech of a speech text, to form a Gaussian mixture model for each speech text; forming a corresponding hidden Markov model from the transition matrix of each segment of input speech; and forming the speech recognition template based on the Gaussian mixture model and the hidden Markov model of each speech text.

In one or more embodiments of the method, the content of the operation commands and/or the correspondence between operation commands and speech recognition templates is user-defined.

Further, in the above method embodiment, the content of the operation commands and/or the correspondence between operation commands and speech recognition templates is stored locally.

In one or more embodiments of the method, the speech recognition template is trained from dynamically updated collected input speech.

Secondly, the present invention also provides an offline voice-to-command conversion device comprising the following modules: a text receiving module, configured to receive the speech text of each of a plurality of training voices and to construct a corresponding voice phrase dictionary based on the speech texts, the voice phrase dictionary including at least the text information of the corresponding speech text; a speech receiving module, configured to receive multiple segments of input speech for each training voice; a template generation module, configured to form a corresponding speech recognition template based on the multiple segments of input speech of each training voice and the voice phrase dictionary, and to store the speech recognition template locally; and a speech mapping module, configured to map the speech recognition templates to corresponding operation commands and to output the corresponding operation command to a control device according to the speech recognition template matched by instruction speech.

In one or more device embodiments, the voice phrase dictionary further includes at least one of the following pronunciation characteristics of the speech text: phrases, words, single characters, syllables and phonemes.

In one or more device embodiments, the template generation module further includes the following sub-modules: a first modeling module, configured to compute the Mel-frequency cepstral coefficients of each segment of input speech of a speech text, to form a Gaussian mixture model for each speech text; a second modeling module, configured to form a corresponding hidden Markov model from the transition matrix of each segment of input speech; and a template creation module, configured to form the speech recognition template based on the Gaussian mixture model and the hidden Markov model of each speech text.

In one or more device embodiments, the content of the operation commands and/or the correspondence between operation commands and speech recognition templates is user-defined.

Further, in the above device embodiment, the content of the operation commands and/or the correspondence between operation commands and speech recognition templates is stored locally.

In one or more device embodiments, the speech recognition template is trained from dynamically updated collected input speech.

Finally, the present invention also discloses a computer-readable storage medium storing computer instructions which, when executed by a processor, implement the steps of any of the foregoing methods.

The beneficial effect of the present invention is that, because the trained speech recognition templates are stored locally, the content of a speaker's command can be conveniently confirmed offline without sending the speech content to an external server for recognition, which improves the efficiency of voice control.

Brief description of the drawings

Fig. 1 is a flow chart of one embodiment of the offline voice-to-command conversion method;

Fig. 2 is a schematic diagram of one configuration of the method shown in Fig. 1;

Fig. 3 is a flow chart of the sub-steps of the speech recognition template formation process;

Fig. 4 is a schematic diagram of user-defined correspondences between speech and operation commands;

Fig. 5 is a schematic diagram of the configuration of another embodiment of the offline voice-to-command conversion method;

Fig. 6 is a module structure diagram of one embodiment of the offline voice-to-command conversion system.

Detailed description of the embodiments

The concept, specific structure and technical effects of the present invention are described clearly and completely below with reference to the embodiments and the drawings, so that the purpose, solutions and effects of the invention can be fully understood. It should be noted that, where no conflict arises, the embodiments of this application and the features in those embodiments may be combined with one another. The same reference numerals are used throughout the drawings to denote the same or similar parts.

Fig. 1 is a flow chart of one embodiment of the offline voice-to-command conversion method. The method comprises the following steps: receiving the speech text of each of a plurality of training voices, and constructing a corresponding voice phrase dictionary based on the speech texts, the voice phrase dictionary including at least the text information of the corresponding speech text; receiving multiple segments of input speech for each training voice; forming a corresponding speech recognition template based on the multiple segments of input speech of each training voice and the voice phrase dictionary, and storing the speech recognition template locally; and mapping the speech recognition templates to corresponding operation commands, and outputting the corresponding operation command to a control device according to the speech recognition template matched by instruction speech.

As shown in Fig. 2, in one embodiment each operation command corresponds to one training voice. The speech text of a training voice may be a grammatically complete sentence, or one or more keywords. The voice phrase dictionary records the content of the speech text at least in text form, as the text information of the speech text. A single user reads the speech text aloud several times, or several users each read it aloud, to form the multiple segments of input speech for that speech text. For the training voice corresponding to each operation command, the multiple segments of input speech and the voice phrase dictionary are used in training to form the corresponding speech recognition template. Once the speech recognition templates have been trained, when instruction speech containing the content of a speech text is received from a user, the instruction speech is matched against the speech recognition templates to determine which of the templates it corresponds to, and the corresponding operation command is output to the control device. The matching of instruction speech against speech recognition templates can be implemented with algorithms conventional in the art, and the present invention places no limitation on this.
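The enroll-then-match flow described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the names `Template`, `OfflineRecognizer`, `enroll`, `match` and `reject_above` are hypothetical, each utterance is assumed to be already reduced to a short feature vector, and a nearest-example Euclidean distance stands in for whatever conventional matching algorithm an implementation would actually use.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Sequence


@dataclass
class Template:
    text: str      # text information recorded in the voice phrase dictionary
    command: str   # operation command this template is mapped to
    examples: List[List[float]] = field(default_factory=list)


def distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two equal-length feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


class OfflineRecognizer:
    """Hypothetical holder of locally stored templates (no network access)."""

    def __init__(self, reject_above: float = 1.0) -> None:
        self.templates: Dict[str, Template] = {}
        self.reject_above = reject_above  # assumed rejection threshold

    def enroll(self, text: str, command: str,
               utterances: Sequence[Sequence[float]]) -> None:
        """Build one template from several readings of the same speech text."""
        if not utterances:
            raise ValueError("at least one input speech segment is required")
        self.templates[text] = Template(text, command,
                                        [list(u) for u in utterances])

    def match(self, utterance: Sequence[float]) -> Optional[str]:
        """Return the command of the closest template, or None if none is close."""
        best_command, best_distance = None, float("inf")
        for template in self.templates.values():
            d = min(distance(utterance, e) for e in template.examples)
            if d < best_distance:
                best_command, best_distance = template.command, d
        return best_command if best_distance <= self.reject_above else None


recognizer = OfflineRecognizer()
recognizer.enroll("open the door", "OPEN", [[0.0, 1.0], [0.1, 0.9]])
recognizer.enroll("close the door", "CLOSE", [[1.0, 0.0], [0.9, 0.1]])
```

Calling `recognizer.match(features)` on a new utterance then yields the command to forward to the control device, or `None` when no template is within the assumed threshold.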

In one embodiment, the voice phrase dictionary further includes at least one of the following pronunciation characteristics of the speech text: phrases, words, single characters, syllables and phonemes. These pronunciation characteristics may be derived from the speech text itself or entered manually, to improve the accuracy of the generated speech recognition template. They can also be used to pre-process the input speech: when the input speech is clearly inconsistent with the pronunciation characteristics (for example, pauses fall in the wrong places in the input speech, or external noise is mixed into it), a prompt can be issued requesting that the input speech be received again.
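One possible shape for a dictionary entry and for the consistency pre-check is sketched below. Everything here is an assumption made for illustration: the `PhraseEntry` fields, the `is_consistent` helper, the `tolerance` parameter, and the idea that an upstream step (not shown) supplies a detected syllable count for the recording. The text does not say how the inconsistency is measured.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class PhraseEntry:
    """Hypothetical entry of the voice phrase dictionary."""
    text: str                                          # required text information
    words: List[str] = field(default_factory=list)     # optional characteristics
    syllables: List[str] = field(default_factory=list)
    phonemes: List[str] = field(default_factory=list)


def is_consistent(entry: PhraseEntry, detected_syllables: int,
                  tolerance: int = 1) -> bool:
    """Compare the expected syllable count with the count found in a recording.

    Returns True when no syllables were recorded for the entry, because
    there is then nothing to check against.
    """
    if not entry.syllables:
        return True
    return abs(len(entry.syllables) - detected_syllables) <= tolerance


entry = PhraseEntry(
    text="open the door",
    words=["open", "the", "door"],
    syllables=["o", "pen", "the", "door"],
)
```

A caller would prompt the user to record again whenever `is_consistent` returns `False`, for instance when noise causes far more syllable-like segments to be detected than the entry expects.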

Referring to the sub-step flow chart of the speech recognition template formation process shown in Fig. 3, in one embodiment the speech recognition template can be formed by the following steps: computing the Mel-frequency cepstral coefficients (MFCC) of each segment of input speech corresponding to a speech text, and forming a Gaussian mixture model for each speech text from the MFCCs and the pronunciation characteristics (such as syllables and phonemes); forming a corresponding hidden Markov model from the transition matrix of each segment of input speech; and forming the speech recognition template from the hybrid model that combines the Gaussian mixture model and the hidden Markov model of each speech text. A person skilled in the art can compute the MFCCs of the input speech by conventional technical means in the art, and the present invention places no limitation on this.
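To make the Gaussian-mixture and hidden-Markov parts concrete, the sketch below scores a sequence of feature frames against one template using the forward algorithm in the log domain. It is a minimal illustration, not the patented procedure: the frames are assumed to be MFCC vectors that have already been extracted, the model parameters are assumed to have been trained elsewhere, the Gaussians are assumed to have diagonal covariance, and all function names are hypothetical. `hz_to_mel` shows one common Hz-to-mel convention used when building MFCC filter banks.

```python
import math
from typing import Sequence, Tuple

# One GMM is (weights, means, variances) with diagonal covariance.
Gmm = Tuple[Sequence[float], Sequence[Sequence[float]], Sequence[Sequence[float]]]


def hz_to_mel(f_hz: float) -> float:
    """Map a frequency in Hz onto the mel scale (one common convention)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def safe_log(p: float) -> float:
    """Natural log that maps zero probability to negative infinity."""
    return math.log(p) if p > 0.0 else float("-inf")


def log_sum_exp(values: Sequence[float]) -> float:
    """Numerically stable log(sum(exp(v))) over a non-empty sequence."""
    m = max(values)
    if m == float("-inf"):
        return m
    return m + math.log(sum(math.exp(v - m) for v in values))


def gmm_log_likelihood(x: Sequence[float], weights: Sequence[float],
                       means: Sequence[Sequence[float]],
                       variances: Sequence[Sequence[float]]) -> float:
    """log p(x) under a diagonal-covariance Gaussian mixture model."""
    terms = []
    for w, mu, var in zip(weights, means, variances):
        ll = safe_log(w)
        for xi, mi, vi in zip(x, mu, var):
            ll -= 0.5 * (math.log(2.0 * math.pi * vi) + (xi - mi) ** 2 / vi)
        terms.append(ll)
    return log_sum_exp(terms)


def hmm_log_likelihood(frames: Sequence[Sequence[float]],
                       initial: Sequence[float],
                       transition: Sequence[Sequence[float]],
                       state_gmms: Sequence[Gmm]) -> float:
    """Forward algorithm: log p(frames) for an HMM with one GMM per state."""
    n = len(initial)
    alpha = [safe_log(initial[s]) + gmm_log_likelihood(frames[0], *state_gmms[s])
             for s in range(n)]
    for x in frames[1:]:
        alpha = [
            log_sum_exp([alpha[r] + safe_log(transition[r][s]) for r in range(n)])
            + gmm_log_likelihood(x, *state_gmms[s])
            for s in range(n)
        ]
    return log_sum_exp(alpha)
```

Under these assumptions, recognition amounts to computing `hmm_log_likelihood` for the incoming frames against every stored template and choosing the template with the highest score.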

Referring to the schematic diagram of another embodiment shown in Fig. 4, in this embodiment the content of the operation commands and the correspondence between operation commands and speech recognition templates are user-defined. The user can define the operation commands, and the correspondence between operation commands and speech recognition templates, according to the actual application scenario. For example, in an access control application, the operation commands may be defined as the two commands "open the door" and "close the door". In the same scenario, speech recognition templates can be customized for these two operation commands; for example, a speech recognition template whose speech text is a verification password such as "123456" can be mapped to the operation command "open the door". Only after the instruction speech "123456" is uttered does the access control system receive the "open the door" operation command.
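The access control example can be written down as a small configuration object. The class and method names below (`CommandConfig`, `define_command`, `map_template`, `command_for`) are hypothetical, and the check that a mapping may only point at a previously defined command is an added assumption rather than something the text requires.

```python
from typing import Dict, Optional, Set


class CommandConfig:
    """Hypothetical user-defined commands and template-to-command mapping."""

    def __init__(self) -> None:
        self.commands: Set[str] = set()
        self.template_to_command: Dict[str, str] = {}

    def define_command(self, command: str) -> None:
        """Register the content of one operation command."""
        self.commands.add(command)

    def map_template(self, speech_text: str, command: str) -> None:
        """Map the template trained for speech_text to a defined command."""
        if command not in self.commands:
            raise KeyError(f"undefined operation command: {command!r}")
        self.template_to_command[speech_text] = command

    def command_for(self, matched_text: Optional[str]) -> Optional[str]:
        """Look up the command for the template that the recognizer matched."""
        if matched_text is None:
            return None
        return self.template_to_command.get(matched_text)


config = CommandConfig()
config.define_command("open the door")
config.define_command("close the door")
config.map_template("123456", "open the door")
```

With this configuration, only a match on the "123456" template produces "open the door"; any other matched text, or no match at all, produces no command.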

Further, referring to the configuration diagram of the embodiment shown in Fig. 5, the content of the operation commands and/or the correspondence between operation commands and speech recognition templates can be stored in a local database. Instruction speech that is issued can then be mapped to the corresponding operation command using the command content and/or the correspondence stored in the local database, so that the operation command can be output to the control device without a network connection.
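A local database for the mapping could be as simple as an SQLite file on the device. The sketch below uses Python's standard `sqlite3` module; the table name `command_map`, its columns and the helper names are assumptions, and `":memory:"` is used only so the example runs without touching the disk (a real deployment would pass a file path).

```python
import sqlite3
from typing import Optional


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the local mapping store."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS command_map ("
        "template_text TEXT PRIMARY KEY, command TEXT NOT NULL)"
    )
    return conn


def save_mapping(conn: sqlite3.Connection, template_text: str,
                 command: str) -> None:
    """Insert or overwrite the command mapped to one template."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT OR REPLACE INTO command_map (template_text, command) "
            "VALUES (?, ?)",
            (template_text, command),
        )


def load_command(conn: sqlite3.Connection,
                 template_text: str) -> Optional[str]:
    """Return the stored command for a matched template, if any."""
    row = conn.execute(
        "SELECT command FROM command_map WHERE template_text = ?",
        (template_text,),
    ).fetchone()
    return row[0] if row else None


store = open_store()
save_mapping(store, "123456", "open the door")
```

Because both the lookup and the stored data are local, resolving a matched template to its command involves no network round trip.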

In one or more embodiments, the speech recognition template is trained from dynamically updated collected input speech. In the access control scenario above, the user can raise the security of the access control system by updating the speech recognition templates regularly, preventing unauthorized persons from entering a restricted place.
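One way to read "dynamically updated" is that each template is retrained from a bounded window of the most recently collected utterances. The sketch below assumes exactly that; the `RollingTemplate` class, the window size `max_examples` and the use of a plain mean vector as the "trained" template are all illustrative assumptions, not details given in the text.

```python
from collections import deque
from typing import Deque, List, Optional, Sequence


class RollingTemplate:
    """Hypothetical template retrained from the newest utterances only."""

    def __init__(self, text: str, max_examples: int = 5) -> None:
        self.text = text
        # Oldest utterances are dropped automatically once the window is full.
        self.examples: Deque[List[float]] = deque(maxlen=max_examples)
        self.mean: Optional[List[float]] = None

    def add_utterance(self, features: Sequence[float]) -> None:
        """Collect a new utterance and retrain the template."""
        self.examples.append(list(features))
        self._retrain()

    def _retrain(self) -> None:
        """Recompute the template as the mean of the current window."""
        n = len(self.examples)
        self.mean = [sum(column) / n for column in zip(*self.examples)]


template = RollingTemplate("123456", max_examples=2)
for utterance in ([0.0, 0.0], [2.0, 2.0], [4.0, 4.0]):
    template.add_utterance(utterance)
```

After the third utterance the first one has left the window, so the template reflects only the two most recent recordings.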

Fig. 6 is a module structure diagram of one embodiment of the offline voice-to-command conversion system. The system comprises the following modules: a text receiving module, configured to receive the speech text of each of a plurality of training voices and to construct a corresponding voice phrase dictionary based on the speech texts, the voice phrase dictionary including at least the text information of the corresponding speech text; a speech receiving module, configured to receive multiple segments of input speech for each training voice; a template generation module, configured to form a corresponding speech recognition template based on the multiple segments of input speech of each training voice and the voice phrase dictionary, and to store the speech recognition template locally; and a speech mapping module, configured to map the speech recognition templates to corresponding operation commands and to output the corresponding operation command to a control device according to the speech recognition template matched by instruction speech.

As shown in Fig. 2, in one embodiment each operation command corresponds to one training voice. The speech text of a training voice may be a grammatically complete sentence, or one or more keywords. The text receiving module records the content of the speech text at least in text form to build the voice phrase dictionary, as the text information of the speech text. A single user reads the speech text aloud several times, or several users each read it aloud, to form the multiple segments of input speech for that speech text, which are received by the speech receiving module. For the training voice corresponding to each operation command, the template generation module trains on the multiple segments of input speech and the voice phrase dictionary to form the corresponding speech recognition template. Once the speech recognition templates have been trained, when instruction speech containing the content of a speech text is received from a user, the speech mapping module matches the instruction speech against the speech recognition templates to determine which of the templates it corresponds to, and outputs the corresponding operation command to the control device. The matching of instruction speech against speech recognition templates can be implemented with algorithms conventional in the art, and the present invention places no limitation on this.
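The division of work among the four modules might be wired together as in the sketch below. The class names follow the module names in the text, but their methods, the feature-vector representation of speech, the mean-vector "training" and the nearest-template matching are all placeholder assumptions chosen to keep the example short; `send` stands in for whatever interface delivers commands to the control device.

```python
from typing import Callable, Dict, List, Optional, Sequence


class TextReceivingModule:
    def __init__(self) -> None:
        self.dictionary: Dict[str, Dict[str, str]] = {}

    def receive(self, speech_text: str) -> None:
        """Record the speech text, in text form, in the phrase dictionary."""
        self.dictionary[speech_text] = {"text": speech_text}


class SpeechReceivingModule:
    def __init__(self) -> None:
        self.segments: Dict[str, List[List[float]]] = {}

    def receive(self, speech_text: str, features: Sequence[float]) -> None:
        """Collect one segment of input speech for a speech text."""
        self.segments.setdefault(speech_text, []).append(list(features))


class TemplateGenerationModule:
    def generate(self, dictionary: Dict[str, Dict[str, str]],
                 segments: Dict[str, List[List[float]]]) -> Dict[str, List[float]]:
        """Placeholder training: average the segments of each speech text."""
        templates: Dict[str, List[float]] = {}
        for text in dictionary:
            examples = segments.get(text, [])
            if examples:
                n = len(examples)
                templates[text] = [sum(col) / n for col in zip(*examples)]
        return templates


class SpeechMappingModule:
    def __init__(self, templates: Dict[str, List[float]],
                 commands: Dict[str, str],
                 send: Callable[[str], None]) -> None:
        self.templates = templates
        self.commands = commands
        self.send = send  # stand-in for the link to the control device

    def handle(self, features: Sequence[float]) -> Optional[str]:
        """Match instruction speech to a template and output its command."""
        if not self.templates:
            return None
        best = min(
            self.templates,
            key=lambda t: sum((a - b) ** 2
                              for a, b in zip(features, self.templates[t])),
        )
        command = self.commands.get(best)
        if command is not None:
            self.send(command)
        return command
```

Unlike a production matcher, this placeholder always picks the nearest template and never rejects an utterance, so a real speech mapping module would also need some form of rejection for speech that matches nothing well.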

In one embodiment, the template generation module may include the following sub-modules: a first modeling module, configured to compute the Mel-frequency cepstral coefficients (MFCC) of each segment of input speech corresponding to a speech text, and to form a Gaussian mixture model for each speech text from the MFCCs and the pronunciation characteristics (such as syllables and phonemes); a second modeling module, configured to form a corresponding hidden Markov model from the transition matrix of each segment of input speech; and a template creation module, configured to form the speech recognition template from the hybrid model that combines the Gaussian mixture model and the hidden Markov model of each speech text. A person skilled in the art can compute the MFCCs of the input speech by conventional technical means in the art, and the present invention places no limitation on this.

Although the present invention has been described in considerable detail and with particular reference to several embodiments, it is not intended to be limited to any of these details or embodiments or to any particular embodiment. Rather, it should be construed with reference to the appended claims so as to give those claims the broadest possible interpretation in view of the prior art, thereby effectively covering the intended scope of the invention. Furthermore, the invention has been described above in terms of embodiments foreseeable by the inventors for the purpose of providing a useful description, and insubstantial modifications of the invention not presently foreseen may nonetheless represent equivalents of it.

Claims

1. An offline voice-to-command conversion method, characterized by comprising the following steps:

receiving the speech text of each of a plurality of training voices, and constructing a corresponding voice phrase dictionary based on the speech texts, the voice phrase dictionary including at least the text information of the corresponding speech text;

receiving multiple segments of input speech for each training voice;

forming a corresponding speech recognition template based on the multiple segments of input speech of each training voice and the voice phrase dictionary, and storing the speech recognition template locally; and

mapping the speech recognition templates to corresponding operation commands, and outputting the corresponding operation command to a control device according to the speech recognition template matched by instruction speech.

2. The method according to claim 1, characterized in that the voice phrase dictionary further includes at least one of the following pronunciation characteristics of the speech text: phrases, words, single characters, syllables and phonemes.

3. The method according to claim 1, characterized in that forming the speech recognition template further includes the following sub-steps:

computing the Mel-frequency cepstral coefficients of each segment of input speech of a speech text, to form a Gaussian mixture model for each speech text;

forming a corresponding hidden Markov model from the transition matrix of each segment of input speech; and

forming the speech recognition template based on the Gaussian mixture model and the hidden Markov model of each speech text.

4. The method according to claim 1, characterized in that the content of the operation commands and/or the correspondence between operation commands and speech recognition templates is user-defined.

5. The method according to claim 4, characterized in that the content of the operation commands and/or the correspondence between operation commands and speech recognition templates is stored locally.

6. The method according to claim 1, characterized in that the speech recognition template is trained from dynamically updated collected input speech.

7. An offline voice-to-command conversion device, characterized by comprising the following modules:

a text receiving module, configured to receive the speech text of each of a plurality of training voices and to construct a corresponding voice phrase dictionary based on the speech texts, the voice phrase dictionary including at least the text information of the corresponding speech text;

a speech receiving module, configured to receive multiple segments of input speech for each training voice;

a template generation module, configured to form a corresponding speech recognition template based on the multiple segments of input speech of each training voice and the voice phrase dictionary, and to store the speech recognition template locally; and

a speech mapping module, configured to map the speech recognition templates to corresponding operation commands and to output the corresponding operation command to a control device according to the speech recognition template matched by instruction speech.

8. A computer-readable storage medium storing computer instructions, characterized in that the instructions, when executed by a processor, implement the steps of the method according to any one of claims 1 to 6.