CN109584860A

CN109584860A - Method and system for defining a voice wake-up word

Info

Publication number: CN109584860A
Application number: CN201710889765.9A
Authority: CN
Inventors: 朱泽春; 时春平
Original assignee: Joyoung Co Ltd
Current assignee: Joyoung Co Ltd
Priority date: 2017-09-27
Filing date: 2017-09-27
Publication date: 2019-04-05
Anticipated expiration: 2037-09-27
Also published as: CN109584860B

Abstract

The application proposes a method and system for defining a voice wake-up word, relating to the technical field of voice interaction. The method comprises: training a custom wake-up word with a voice training model according to wake-up word information; and returning a voice library file according to the training result, the voice library file being used for matching against voice instructions so that a user can wake up the device with the custom wake-up word. The voice library file includes an audio file corresponding to the wake-up word information, or includes a voice matching algorithm and voice matching parameters corresponding to the wake-up word information. By training the custom wake-up word with a voice training model to form a voice library file, the wake-up word of the voice system becomes user-definable.

Description

Method and system for defining a voice wake-up word

Technical field

The present invention relates to the technical field of voice interaction, and in particular to a method and system for defining a voice wake-up word.

Background art

The smart home has become a lifestyle for young people, and online voice systems all use a voice wake-up function. Voice wake-up is a form of speech recognition technology: without direct contact with the hardware, a device can be woken up by voice. The voice wake-up word is usually a fixed phrase such as "hello, robot". Because this wake-up word must be recognized locally, it cannot be replaced at will.

Summary of the invention

The present invention provides a method and system for defining a voice wake-up word, allowing users to customize the wake-up word of a voice system.

To achieve the above object, the present invention adopts the following technical solutions:

In a first aspect, the present invention provides a method for defining a voice wake-up word, comprising:

training a custom wake-up word with a voice training model according to wake-up word information;

returning a voice library file according to the training result, the voice library file being used for matching against voice instructions so that the user can wake up the device with the custom wake-up word;

wherein the voice library file includes an audio file corresponding to the wake-up word information, or includes a voice matching algorithm and voice matching parameters corresponding to the wake-up word information.

Preferably, the method further comprises:

verifying the identity information of the user to determine whether the user has permission to customize the wake-up word;

determining, according to the verification result, whether to execute the corresponding wake-up word training step.

Preferably, the training period of the wake-up word is determined and/or fed back according to the wake-up word information, the wake-up word information including the content of the wake-up word, the public usage frequency of the wake-up word, the length of the wake-up word, and the degree of similarity between the characters of the wake-up word.

Preferably,

after the training period expires, the user who has wake-up word permission is prompted that the custom wake-up word is available; and/or,

when waking up the device fails, a user who does not have wake-up word permission is prompted with the initial wake-up word and/or the custom wake-up word.

Preferably, the method further comprises:

receiving a custom wake-up word instruction input by the user by voice, or receiving a custom wake-up word instruction input by the user through a terminal device.

Preferably, when the user inputs the custom wake-up word instruction by voice, the audio information of the custom wake-up word is extracted during the wake-up word customization process; before the training period expires, the input wake-up instruction is compared with this audio information for similarity, and the device is woken up when the similarity exceeds a threshold.

Preferably, the method further comprises:

extracting, according to the identity information of the user, the voice interaction information of the user during voice interaction;

training on the voice interaction information with the voice training model, and extracting the acoustic parameters of the smallest speech units in the voice interaction information;

synthesizing the voice library file by concatenation according to the wake-up word information and the acoustic parameters of the smallest speech units.

Preferably, after verifying the identity information of the user and before training the custom wake-up word, the method further comprises:

performing speech recognition and semantic understanding on the wake-up word information.

In a second aspect, the present invention further provides a system for defining a voice wake-up word, comprising:

a voice wake-up module and a voice training module. The voice training module is provided with a voice training model and is used for training a custom wake-up word with the voice training model according to wake-up word information and returning a voice library file according to the training result. The voice wake-up module is used for matching the returned voice library file against voice instructions so that the user can wake up the device with the custom wake-up word. The voice library file includes an audio file corresponding to the wake-up word information, or includes a voice matching algorithm and voice matching parameters corresponding to the wake-up word information.

Preferably, the system further comprises an identity recognition module. The voice wake-up module is located on a local device, and the voice training module is located on a cloud platform. The local device further comprises a voice acquisition module, a noise processing module, a voice transfer control module, and a first voice transfer module. The cloud platform further comprises a second voice transfer module, a speech recognition module, and a semantic understanding module. The voice transfer control module adjusts the flow direction of the voice signal according to the type of the voice instruction, so as to wake up the local device or send the voice signal to the cloud platform. The cloud platform parses the voice signal with the speech recognition module and the semantic understanding module, and returns the corresponding parsing result or voice library file according to the type of the voice instruction.

Compared with the prior art, the present invention trains the custom wake-up word to obtain a voice library file with which the device can be woken up using the custom wake-up word, and has the following beneficial effects:

1. In the technical solution of the present invention, the custom wake-up word is trained with a voice training model to form a voice library file, so that the wake-up word of the voice system is user-definable.

2. The present invention can identify by voiceprint recognition whether the user is an administrator, and an administrator can modify the wake-up word.

3. The present invention can store multiple wake-up words, and the user can wake up the device with any of them.

4. In embodiments of the present invention, the voice acquisition module, noise processing module, and voice transfer control module are always in the working state, while the speech recognition module, semantic understanding module, and voice training module of the cloud server enter the working state only after being triggered, which achieves real-time speech recognition while saving energy.

5. The present invention allows the custom wake-up word instruction to be input by voice or through a terminal device, so the user can choose an appropriate way to add or modify a custom wake-up word.

Brief description of the drawings

Fig. 1 is a flowchart of the method for defining a voice wake-up word according to an embodiment of the present invention;

Fig. 2 is a structural schematic diagram of the system for defining a voice wake-up word according to an embodiment of the present invention.

Detailed description of the embodiments

To make the object, technical solutions, and beneficial effects of the present invention clearer, embodiments of the present invention are described below with reference to the accompanying drawings. It should be noted that, where no conflict arises, the embodiments of the present application and the features in the embodiments may be combined with one another arbitrarily.

Embodiment one

With reference to Fig. 1, this embodiment describes a method for defining a voice wake-up word, comprising:

S101: training a custom wake-up word with a voice training model according to wake-up word information;

S102: returning a voice library file according to the training result, the voice library file being used for matching against voice instructions so that the user can wake up the device with the custom wake-up word.

In this embodiment of the present invention, the custom wake-up word is trained with a voice training model to form a voice library file, so that the wake-up word of the voice system is user-definable.
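As a rough illustration of what S102 returns, the sketch below models the voice library file as a small Python record with the two variants described above. The class name `VoiceLibraryFile`, its field names, the algorithm label `template_match`, and the threshold value are all assumptions made for illustration; the source does not specify a file layout, and `train_wake_word` is a stand-in that performs no real training.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class VoiceLibraryFile:
    """Hypothetical container for the result returned by the training step."""
    wake_word: str
    # Variant 1: a reference audio recording of the wake-up word.
    audio: Optional[bytes] = None
    # Variant 2: a named matching algorithm plus its parameters.
    match_algorithm: Optional[str] = None
    match_params: dict = field(default_factory=dict)

    def is_valid(self) -> bool:
        # The text requires an audio file, or an algorithm together with parameters.
        has_audio = self.audio is not None
        has_matcher = self.match_algorithm is not None and bool(self.match_params)
        return has_audio or has_matcher


def train_wake_word(wake_word: str) -> VoiceLibraryFile:
    """Stand-in for S101/S102; returns a placeholder library file."""
    return VoiceLibraryFile(
        wake_word=wake_word,
        match_algorithm="template_match",  # assumed label, not from the source
        match_params={"threshold": 0.8},   # assumed value, not from the source
    )
```

A device-side wake-up module would then load such a record and match incoming voice instructions against either the audio or the algorithm and parameters it carries.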

Preferably, the method is preceded by verification of the user's identity information.

Specifically, the user's identity information and the corresponding permission are set either by pre-enrollment or by default. When a user attempts to modify the wake-up word, the user's identity information is verified: if the user has permission to customize the wake-up word, the wake-up word training step is executed; if not, the training step is not executed.

When the user's identity information and the corresponding permission are set by pre-enrollment, voiceprint information can be used: the administrator's voiceprint is enrolled in advance, and the administrator is given permission to customize the wake-up word. During identity verification, the user's voice input is compared with the pre-enrolled administrator voiceprint; if the voiceprints match, the user is determined to have permission to customize the wake-up word, and if they do not match, the user has no such permission.

In this embodiment, voiceprint recognition identifies whether the user is an administrator, and an administrator can modify the wake-up word.

When the user's identity information and the corresponding permission are set by default, the first user to wake up the device can be set as the administrator, and the administrator is given permission to customize the wake-up word. During identity verification, the user's voice input is compared with the stored administrator voiceprint; if the voiceprints match, the user is determined to have permission to customize the wake-up word, and if they do not match, the user has no such permission.
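The voiceprint comparison described above can be sketched as follows. This is a minimal illustration that assumes voiceprints are available as fixed-length numeric feature vectors and that a match is decided by cosine similarity against a threshold; the function names and the default threshold of 0.85 are hypothetical, and the source does not say how voiceprints are represented or compared.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # a silent or empty voiceprint matches nothing
    return dot / (norm_a * norm_b)


def has_customize_permission(
    user_voiceprint: Sequence[float],
    admin_voiceprint: Sequence[float],
    threshold: float = 0.85,  # assumed value, not from the source
) -> bool:
    """Grant permission only when the user's voiceprint matches the administrator's."""
    return cosine_similarity(user_voiceprint, admin_voiceprint) >= threshold
```

Only when `has_customize_permission` returns true would the training step S101 be executed.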

The custom wake-up word instruction can be received in the following ways: receiving a custom wake-up word instruction input by the user by voice, or receiving one input by the user through a terminal device. The user can thus choose an appropriate way to add or modify a custom wake-up word.

In an embodiment of the present invention, the permission verification information may be stored in the memory of the smart terminal itself, and identity verification is performed on the smart terminal by matching the user identity information against the permission verification information to determine whether the user has permission to modify the wake-up word. A pre-built wake-up word voice training model may also be provided on the smart terminal itself, and the smart terminal performs voice training on the custom wake-up word according to that voice training model.

Alternatively, the permission verification information may be stored on a cloud server: the smart device sends the user identity information to the cloud server, and the cloud server performs identity verification by matching the user identity information against the permission verification information to determine whether the user has permission to modify the wake-up word. The voice training model may likewise be stored on the cloud server: the cloud server is provided with a pre-built wake-up word voice training model and uses it to perform voice training on the custom wake-up word according to the wake-up word information.

Embodiment two

This embodiment describes the flow of the method for defining a voice wake-up word before and after the training period expires:

The training period of the wake-up word is determined and/or fed back according to the wake-up word information, the wake-up word information including the content of the wake-up word, the public usage frequency of the wake-up word, the length of the wake-up word, and the degree of similarity between the characters of the wake-up word.

Different custom wake-up words have different training periods. This embodiment can return the training period to the user, indicating how long it will take for the custom wake-up word to take effect, so that the user knows when the custom wake-up word can be used.

Here, the public usage frequency of the wake-up word denotes the number of users, among those who use a given smart appliance, smart appliances of a given brand, or smart appliances of a given manufacturer, who use the custom wake-up word to be set. The more users use the custom wake-up word, the shorter the corresponding training period; the fewer users use it, the longer the training period. For example, for a Joyoung-brand smart appliance, user A sets custom wake-up word 1 to "Small Sun, Small Sun" and user B sets custom wake-up word 2 to "Small Nine, Small Nine". Because many users use "Small Sun, Small Sun", the training period of custom wake-up word 1 is shorter than that of custom wake-up word 2.

The degree of similarity between the characters of the wake-up word denotes how close the pronunciations of the individual characters of the custom wake-up word are. The smaller the pronunciation difference, the longer the corresponding training period; the larger the difference, the shorter the training period. For example, for a Joyoung-brand smart appliance, user C sets custom wake-up word 3 to "Persimmon, Persimmon". The pronunciation difference between the characters of custom wake-up word 1 "Small Sun, Small Sun" is larger than that between the characters of custom wake-up word 3 "Persimmon, Persimmon", so the training period of custom wake-up word 1 is shorter than that of custom wake-up word 3.
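The qualitative relationships above can be made concrete with a toy estimator. The source states only the directions (more public users means a shorter period; closer pronunciations mean a longer period) and gives no formula, so the function below, its name, its constants, and the assumption that longer wake-up words take longer to train are purely illustrative.

```python
def estimate_training_hours(
    public_users: int,
    char_count: int,
    pronunciation_distance: float,
    base_hours: float = 24.0,  # assumed baseline, not from the source
) -> float:
    """Toy estimate of the training period for a custom wake-up word.

    public_users: number of users already using this wake-up word.
    char_count: length of the wake-up word in characters.
    pronunciation_distance: 0.0 (characters sound alike) to 1.0 (very distinct).
    """
    distance = min(max(pronunciation_distance, 0.0), 1.0)
    # More public usage shortens the period.
    popularity_factor = 1.0 / (1.0 + public_users / 1000.0)
    # Smaller pronunciation differences lengthen the period.
    similarity_factor = 2.0 - distance
    # Assumption: each character beyond four adds 10 percent.
    length_factor = 1.0 + 0.1 * max(char_count - 4, 0)
    return base_hours * popularity_factor * similarity_factor * length_factor
```

With these assumed constants, a widely used wake-up word with distinct characters yields a shorter estimate than a rarely used one whose characters sound alike, which matches the ordering in the examples above.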

Preferably, after the training period expires, the user who has wake-up word permission is prompted that the custom wake-up word is available; and/or, when waking up the device fails, a user who does not have wake-up word permission is prompted with the initial wake-up word and/or the custom wake-up word.

This embodiment can store multiple wake-up words, and the user can wake up the device with any of them.

To bridge the gap between receiving the custom wake-up word instruction and the expiry of the wake-up word training period, this embodiment may include:

When the user inputs the custom wake-up word instruction by voice, the audio information of the custom wake-up word is extracted during the wake-up word customization process; before the training period expires, the input wake-up instruction is compared with this audio information for similarity, and the device is woken up when the similarity exceeds a threshold.

Before the training period expires, once a user with permission has customized the wake-up word, that user can already wake up the device with the custom wake-up word, which bridges the gap between receiving the custom wake-up word instruction and the expiry of the training period. Other users can still wake up the device with the initial wake-up word. To distinguish whether the speaker is the user with permission to customize the wake-up word, the audio information is compared: an administrator's utterance has high similarity to the audio information of the custom wake-up word, while another user's utterance has low similarity. In this embodiment, the similarity to the audio information of the custom wake-up word is judged against a threshold, and the result determines whether to wake up the device.
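The bridging behaviour can be sketched as a single decision function: before the training period expires, the incoming utterance is compared with the audio captured during customization; afterwards, the trained voice library takes over. The sketch assumes audio is represented as numeric feature vectors compared by cosine similarity, and the names `PendingWakeWord`, `should_wake`, and `library_match`, as well as the threshold of 0.9, are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import Callable, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


@dataclass
class PendingWakeWord:
    template: Sequence[float]  # features of the audio captured during customization
    ready_at: float            # time at which the training period expires
    threshold: float = 0.9     # assumed value, not from the source


def should_wake(
    pending: PendingWakeWord,
    features: Sequence[float],
    now: float,
    library_match: Callable[[Sequence[float]], bool],
) -> bool:
    """Decide whether an incoming utterance wakes the device."""
    if now >= pending.ready_at:
        # Training has finished: defer to the trained voice library file.
        return library_match(features)
    # Training still running: compare against the stored customization audio.
    return cosine_similarity(pending.template, features) > pending.threshold
```

Because the stored template comes from the administrator's own voice, an utterance of the same word by another speaker would be expected to score lower, which is how the text distinguishes the two cases.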

The method further comprises training on the administrator's daily voice interaction:

In this embodiment, during daily voice interaction, the administrator's voice interaction information is extracted and used for training. Training on everyday voice interaction information helps shorten the training period, so that the user can start using the custom wake-up word sooner.

In this embodiment, the voice library file can be generated by rule-based synthesis. The acoustic parameters of the smallest speech units are stored, together with the rules for composing syllables from phonemes, words from syllables, and sentences from words, and for controlling prosody such as pitch and stress. Given the voice data to be synthesized, the rules automatically convert it into a continuous speech waveform. For waveform concatenation and prosody control, a representative algorithm is pitch-synchronous overlap-add (PSOLA), which preserves the main segmental features of the original pronunciation while allowing suprasegmental features such as fundamental frequency, duration, and intensity to be adjusted flexibly during concatenation. Its core idea is to concatenate stored speech directly using the PSOLA algorithm and thereby synthesize complete speech. Unlike traditional waveform-editing synthesis, which simply concatenates different speech units, rule-based synthesis first selects the most suitable speech units from a large speech corpus for concatenation, often using a variety of complex techniques during unit selection, and finally, at concatenation time, modifies the prosodic features of the synthesized speech with an algorithm such as PSOLA, so that the synthesized speech achieves high sound quality.
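To give a feel for waveform concatenation, the sketch below joins stored unit waveforms with a linear crossfade over a fixed overlap. This is deliberately not PSOLA: it does no pitch marking and cannot adjust fundamental frequency or duration. It only illustrates the overlap-add idea of blending the end of one unit into the start of the next. The function name and the representation of waveforms as lists of samples are assumptions.

```python
from typing import List, Sequence


def crossfade_concat(units: Sequence[Sequence[float]], overlap: int) -> List[float]:
    """Concatenate speech-unit waveforms, blending `overlap` samples at each joint."""
    if not units:
        return []
    out = list(units[0])
    for unit in units[1:]:
        # Never overlap more samples than either side actually has.
        n = min(overlap, len(out), len(unit))
        start = len(out) - n
        for i in range(n):
            # Weight ramps from the previous unit towards the next unit.
            w = (i + 1) / (n + 1)
            out[start + i] = (1.0 - w) * out[start + i] + w * unit[i]
        out.extend(unit[n:])
    return out
```

For two four-sample units and an overlap of two, the result has six samples, with the two joint samples blended between the units. A real PSOLA implementation would instead window the signal around pitch marks and reposition those windows to change pitch and duration.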

Embodiment three

As shown in Fig. 2, this embodiment of the present invention provides a system for defining a voice wake-up word, comprising:

a voice wake-up module 111 and a voice training module 221. The voice training module 221 is provided with a voice training model and is used for training a custom wake-up word with the voice training model according to wake-up word information and returning a voice library file according to the training result. The voice wake-up module 111 is used for matching the returned voice library file against voice instructions so that the user can wake up the device with the custom wake-up word. The voice library file includes an audio file corresponding to the wake-up word information, or includes a voice matching algorithm and voice matching parameters corresponding to the wake-up word information.

The system further comprises an identity recognition module 222. The voice wake-up module 111 is located on the local device 11, and the voice training module 221 is located on the cloud server 22. The local device 11 further comprises a voice acquisition module 112, a noise processing module 113, a voice transfer control module 114, and a first voice transfer module 115. The cloud server 22 further comprises a second voice transfer module 225, a speech recognition module 223, and a semantic understanding module 224. The voice transfer control module 114 adjusts the flow direction of the voice signal according to the type of the voice instruction, so as to wake up the local device 11 or send the voice signal to the cloud server 22. The cloud server 22 parses the voice signal with the speech recognition module 223 and the semantic understanding module 224, and returns the corresponding parsing result or voice library file according to the type of the voice instruction.

In this embodiment, after the identity information of the user is verified and before the custom wake-up word is trained, speech recognition and semantic understanding are performed on the wake-up word information.

In this embodiment, the voice acquisition module 112, noise processing module 113, and voice transfer control module 114 are always in the working state. When the voice transfer control module 114 determines from the type of the voice instruction that it is a custom wake-up word instruction, the voice signal is sent to the cloud server 22 through the first voice transfer module 115; the speech recognition module 223, semantic understanding module 224, and voice training module 221 of the cloud server 22 then enter the working state and begin the corresponding operations on the wake-up word information. When the voice transfer control module 114 determines from the type of the voice instruction that it is not a custom wake-up word instruction, the voice wake-up module 111 enters the working state and wakes up the local device.
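The routing performed by the voice transfer control module 114 can be sketched as a lookup from instruction type to destination and activated modules. The module labels below reuse the reference numerals from the description; the enum, the function name `route`, and the idea that the instruction type is already known when routing happens are assumptions, since the source does not say how the type is determined.

```python
from enum import Enum


class InstructionType(Enum):
    CUSTOM_WAKE_WORD = "custom_wake_word"  # a request to define or modify the wake-up word
    OTHER = "other"                        # any instruction that is not such a request


# Modules described as always being in the working state on the local device.
ALWAYS_ON = [
    "voice_acquisition_112",
    "noise_processing_113",
    "voice_transfer_control_114",
]


def route(instruction_type: InstructionType) -> dict:
    """Return the destination of the voice signal and the modules that are active."""
    if instruction_type is InstructionType.CUSTOM_WAKE_WORD:
        return {
            "destination": "cloud_server_22",
            "active": ALWAYS_ON + [
                "first_voice_transfer_115",
                "speech_recognition_223",
                "semantic_understanding_224",
                "voice_training_221",
            ],
        }
    return {
        "destination": "local_device_11",
        "active": ALWAYS_ON + ["voice_wake_up_111"],
    }
```

Keeping the cloud-side modules out of the active list unless a customization instruction arrives mirrors the energy-saving behaviour the description attributes to this arrangement.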

Embodiment four

With reference to Fig. 2, the following describes how the present invention implements customization of the voice wake-up word in a voice interaction system:

When the system is in the working state, the voice acquisition module 112, noise processing module 113, and voice transfer control module 114 remain in the working state, waiting for the user's voice input. When the user asks to modify the wake-up word, the first voice transfer module 115, speech recognition module 223, and semantic understanding module 224 start working, ultimately achieving real-time speech recognition;

During speech recognition, when the user uses a preset command to modify the wake-up word, for example "I want to modify the wake-up word", the cloud server 22 identifies the user through the identity recognition module 222 to determine whether the user is the administrator and has permission to modify the wake-up word. If the user is the administrator, modification is allowed; otherwise the user is told that the wake-up word cannot be modified. After identification passes, the wake-up word provided by the user is uploaded to the cloud server 22 and voice training begins. Different wake-up words take different amounts of time to train, so the system can tell the user how long it will take for the wake-up word to take effect, reminding the user when it can be used. The administrator's identity can be enrolled as follows: when the device is used for the first time, it prompts the user to enroll administrator information; the user records voice information as prompted, and after enrollment the device stores the user's voiceprint information on the cloud server 22.

After voice training is complete, the cloud server 22 returns the voice library file to the local device 11, which stores it in memory upon receipt;

The memory can store multiple wake-up words at the same time, and the user can wake up the device with any of them. Add and overwrite strategy: a voice prompt asks whether to overwrite the original wake-up word, and the user chooses either to add the custom wake-up word or to overwrite the original one. Storage of up to 5 wake-up words is generally supported.

In this embodiment, a button can be provided on the device to trigger restoration of the voice wake-up word. After the device receives the instruction, it restores the original voice library file from storage and deletes the custom wake-up words stored by the user, thereby implementing the wake-up word restore function.
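The add, overwrite, and restore behaviour can be sketched as a small in-memory store. The class name `WakeWordStore` and its methods are hypothetical, and the sketch assumes that "overwrite" means the new word replaces the currently stored words while the factory wake-up word is kept aside for restoration; the source does not spell out these details.

```python
from typing import List


class WakeWordStore:
    """Hypothetical store for wake-up words on the local device."""

    def __init__(self, initial_word: str, capacity: int = 5):
        self.initial_word = initial_word  # kept so that restore() can bring it back
        self.capacity = capacity          # the description mentions up to 5 words
        self.words: List[str] = [initial_word]

    def add(self, word: str, overwrite: bool = False) -> bool:
        """Add a custom wake-up word, or overwrite the stored ones if requested."""
        if overwrite:
            # Assumed meaning of "overwrite": the new word replaces the stored words.
            self.words = [word]
            return True
        if word in self.words:
            return True
        if len(self.words) >= self.capacity:
            return False  # store is full; the caller may offer to overwrite
        self.words.append(word)
        return True

    def restore(self) -> None:
        """Delete custom wake-up words and return to the original one."""
        self.words = [self.initial_word]

    def matches(self, word: str) -> bool:
        """Any stored wake-up word wakes the device."""
        return word in self.words
```

A button press would map to `restore()`, and the voice prompt asking whether to overwrite would decide the `overwrite` argument passed to `add()`.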

The voice wake-up word can also be viewed on the device and modified by text editing. When a wake-up word is modified or added, the device sends the edited text to the cloud server, which verifies the user's identity. If verification passes, the modification is allowed and the wake-up word is passed to the training model for training; the cloud server estimates the training time and feeds it back to the user.
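The cloud-side handling of a text-edited wake-up word can be sketched as one request handler: verify identity, reject unauthorized users, otherwise accept the word and report an estimated training time. The function name, the response fields, the use of a set of administrator IDs for identity verification, and the injected `estimate_hours` callable are all assumptions for illustration; no actual training is started here.

```python
from typing import Callable, Set


def handle_modify_request(
    user_id: str,
    new_wake_word: str,
    admin_ids: Set[str],
    estimate_hours: Callable[[str], float],
) -> dict:
    """Hypothetical cloud-side handler for a wake-up word modification request."""
    # Step 1: identity verification; only administrators may modify the wake-up word.
    if user_id not in admin_ids:
        return {"allowed": False, "message": "wake-up word cannot be modified"}

    # Step 2: basic sanity check on the submitted text.
    word = new_wake_word.strip()
    if not word:
        return {"allowed": False, "message": "wake-up word is empty"}

    # Step 3: the word would be passed to the training model here;
    # the estimated training time is returned to the user.
    return {
        "allowed": True,
        "wake_word": word,
        "estimated_hours": estimate_hours(word),
    }
```

For example, `handle_modify_request("admin", " Small Nine ", {"admin"}, lambda w: 12.0)` returns a response with `allowed` set to true, the trimmed wake-up word, and the estimate of 12.0 hours supplied by the callable.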

Although the embodiments disclosed by the present invention are as described above, their content is provided only to facilitate understanding of the technical solutions of the present invention and is not intended to limit the invention. Any person skilled in the art to which the present invention pertains may make modifications and changes in the form and details of implementation without departing from the core technical solutions disclosed herein, but the scope of protection of the present invention shall still be subject to the scope defined by the appended claims.

Claims

1. A method for defining a voice wake-up word, characterized by comprising:

training a custom wake-up word with a voice training model according to wake-up word information;

returning a voice library file according to the training result, the voice library file being used for matching against voice instructions so that the user can wake up the device with the custom wake-up word;

wherein the voice library file includes an audio file corresponding to the wake-up word information, or includes a voice matching algorithm and voice matching parameters corresponding to the wake-up word information.

2. The method according to claim 1, characterized in that the method further comprises:

verifying the identity information of the user to determine whether the user has permission to customize the wake-up word;

determining, according to the verification result, whether to execute the corresponding wake-up word training step.

3. The method according to claim 2, characterized in that the training period of the wake-up word is determined and/or fed back according to the wake-up word information, the wake-up word information including the content of the wake-up word, the public usage frequency of the wake-up word, the length of the wake-up word, and the degree of similarity between the characters of the wake-up word.

4. The method according to claim 3, characterized in that,

after the training period expires, the user who has wake-up word permission is prompted that the custom wake-up word is available; and/or

when waking up the device fails, a user who does not have wake-up word permission is prompted with the initial wake-up word and/or the custom wake-up word.

5. The method according to any one of claims 1-4, characterized in that the method further comprises:

receiving a custom wake-up word instruction input by the user by voice, or receiving a custom wake-up word instruction input by the user through a terminal device.

6. The method according to claim 5, characterized in that, when the user inputs the custom wake-up word instruction by voice, the audio information of the custom wake-up word is extracted during the wake-up word customization process; before the training period expires, the input wake-up instruction is compared with the audio information for similarity, and the device is woken up when the similarity exceeds a threshold.

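7. The method according to claim 2, characterized in that the method further comprises:

extracting, according to the identity information of the user, the voice interaction information of the user during voice interaction;

training on the voice interaction information with the voice training model, and extracting the acoustic parameters of the smallest speech units in the voice interaction information;

synthesizing the voice library file by concatenation according to the wake-up word information and the acoustic parameters of the smallest speech units.

8. The method according to claim 2, characterized in that, after verifying the identity information of the user and before training the custom wake-up word, the method further comprises:

performing speech recognition and semantic understanding on the wake-up word information.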

9. A system for defining a voice wake-up word, characterized by comprising:

a voice wake-up module and a voice training module, the voice training module being provided with a voice training model and being used for training a custom wake-up word with the voice training model according to wake-up word information and returning a voice library file according to the training result; the voice wake-up module being used for matching the returned voice library file against voice instructions so that the user can wake up the device with the custom wake-up word; the voice library file including an audio file corresponding to the wake-up word information, or including a voice matching algorithm and voice matching parameters corresponding to the wake-up word information.

10. The system according to claim 9, characterized in that the system further comprises an identity recognition module; the voice wake-up module is located on a local device, and the voice training module is located on a cloud platform; the local device further comprises a voice acquisition module, a noise processing module, a voice transfer control module, and a first voice transfer module; the cloud platform further comprises a second voice transfer module, a speech recognition module, and a semantic understanding module; the voice transfer control module adjusts the flow direction of the voice signal according to the type of the voice instruction, so as to wake up the local device or send the voice signal to the cloud platform; the cloud platform parses the voice signal with the speech recognition module and the semantic understanding module, and returns the corresponding parsing result or voice library file according to the type of the voice instruction.