CN110060693A

CN110060693A - Model training method, device, electronic equipment and storage medium

Info

Publication number: CN110060693A
Application number: CN201910305432.6A
Authority: CN
Inventors: Cao Bing (曹冰)
Original assignee: Guangdong Oppo Mobile Telecommunications Corp Ltd
Current assignee: Guangdong Oppo Mobile Telecommunications Corp Ltd
Priority date: 2019-04-16
Filing date: 2019-04-16
Publication date: 2019-07-26

Abstract

This application discloses a model training method, apparatus, electronic device and storage medium, belonging to the field of communication technology. The method includes: obtaining a voice signal that includes at least one voice content; searching the at least one voice content for the voice content that matches a wake-up word, as the target voice content; obtaining the voiceprint information corresponding to the target voice content; intercepting the voiceprint information according to the wake-up word to obtain standard voiceprint information; and training a voiceprint model with the standard voiceprint information to obtain a target voiceprint model. With the model training method provided by the embodiments of this application, acquiring and processing the voiceprint information yields more standard voiceprint information, which in turn yields a better voiceprint model and improves the user's voiceprint wake-up experience.

Description

Model training method, device, electronic equipment and storage medium

Technical field

This application relates to the field of communication technology, and more particularly to a model training method, apparatus, electronic device and storage medium.

Background

In recent years, with the rapid development of intelligent voice processing, the Internet and cloud computing, smart devices on the market have become able to respond to voice commands issued by users. However, existing speech recognition technology still has drawbacks: false wake-ups often occur when a user performs voice wake-up or voice recognition. For example, the user's speech does not contain the wake-up word, yet the device mistakenly recognizes a wake-up word in it and is falsely woken up.

Summary of the invention

In view of this, this application proposes a model training method, apparatus and electronic device to solve the above problems.

In a first aspect, an embodiment of this application provides a model training method applied to an electronic device. The method includes: obtaining a voice signal, the voice signal including at least one voice content; searching the at least one voice content for the voice content that matches a wake-up word, as the target voice content; obtaining the voiceprint information corresponding to the target voice content; intercepting the voiceprint information according to the wake-up word to obtain standard voiceprint information; and training a voiceprint model with the standard voiceprint information to obtain a target voiceprint model.

In a second aspect, an embodiment of this application provides a voiceprint recognition method applied to an electronic device. The method includes: obtaining a voice signal of a user to be identified and using it as the voice signal under test; identifying the wake-up word in the voice signal under test with a wake-up model; and looking up the voiceprint information corresponding to the wake-up word and recognizing it with the target voiceprint model to obtain a recognition result.

In a third aspect, an embodiment of this application provides a model training apparatus applied to an electronic device. The apparatus includes a voice obtaining module, a searching module, an information obtaining module, an information interception module and a model training module. The voice obtaining module is configured to obtain a voice signal that includes at least one voice content. The searching module is configured to search the at least one voice content for the voice content that matches a wake-up word, as the target voice content. The information obtaining module is configured to obtain the voiceprint information corresponding to the target voice content. The information interception module is configured to intercept the voiceprint information according to the wake-up word to obtain standard voiceprint information. The model training module is configured to train a voiceprint model with the standard voiceprint information to obtain a target voiceprint model.

In a fourth aspect, an embodiment of this application further provides an electronic device, including one or more processors, a memory, and one or more application programs. The one or more application programs are stored in the memory, configured to be executed by the one or more processors, and configured to perform the above method.

In a fifth aspect, an embodiment of this application further provides a computer-readable storage medium storing program code, which can be called by a processor to perform the above method.

Compared with the prior art, the model training method, apparatus, electronic device and storage medium provided by the embodiments of this application train the voiceprint model as follows: a voice signal including at least one voice content is obtained; the voice content matching the wake-up word is found among the at least one voice content and taken as the target voice content; the voiceprint information corresponding to the target voice content is obtained and intercepted according to the wake-up word to obtain standard voiceprint information; finally, the voiceprint model is trained with the standard voiceprint information to obtain the target voiceprint model. By obtaining standard voiceprint information, the embodiments of this application make the final target voiceprint model more accurate and effective.

Brief description of the drawings

To describe the technical solutions in the embodiments of this application more clearly, the accompanying drawings required for describing the embodiments are briefly introduced below. Apparently, the drawings described below show only some embodiments of this application; a person skilled in the art may derive other drawings from them without creative effort.

Fig. 1 is a schematic structural diagram of a voiceprint wake-up system proposed in this application;

Fig. 2 is an example diagram of the voiceprint wake-up process in a voiceprint wake-up system proposed in this application;

Fig. 3 is a flowchart of the model training method provided by one embodiment of this application;

Fig. 4 is a flowchart of the model training method provided by another embodiment of this application;

Fig. 5 is a flowchart of step S240 in the model training method provided by another embodiment of this application;

Fig. 6 is a flowchart of the model training method provided by yet another embodiment of this application;

Fig. 7 is a flowchart of a concrete application of the model training method provided by a further embodiment of this application;

Fig. 8 is an interface diagram showing specific use of the model training method provided by a further embodiment of this application;

Fig. 9 is a flowchart of additional steps in the model training method provided by a further embodiment of this application;

Fig. 10 is a module block diagram of the model training apparatus provided by the embodiments of this application;

Fig. 11 is a module block diagram of the electronic device provided by the embodiments of this application;

Fig. 12 shows a storage unit, provided by the embodiments of this application, for storing or carrying program code that implements the model training method according to the embodiments of this application.

Detailed description of embodiments

To help a person skilled in the art better understand the solutions of this application, the technical solutions in the embodiments of this application are described clearly and completely below with reference to the accompanying drawings.

Referring to Fig. 1, a voiceprint wake-up system may include a training process and a recognition process. The main work of the training process is to obtain a speech model and a voiceprint model. The speech model is mainly used to identify whether a voice signal to be recognized contains the wake-up word; the voiceprint model is mainly used to determine the identity of the user to be identified. Because every person's voiceprint is different, the user's identity can be determined by recognizing the voiceprint information. In this embodiment, the speech model and the voiceprint model may be stored in advance in the electronic device or on a server, and both may be obtained by training on a data set. When the user inputs a voice signal to be recognized into the electronic device, the speech model and the voiceprint model in the device use model matching to judge whether the input voice signal contains the wake-up word and, at the same time, whether the identity of the user matches the identity pre-stored in the device. The electronic device is woken up only when both conditions are met.
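The two-condition wake decision described above can be sketched as follows. This is a minimal illustration only: the names `WakeCheck` and `should_wake` and the 0.7 similarity threshold are hypothetical, and the speech model and voiceprint model are assumed to have already produced their outputs.

```python
from dataclasses import dataclass


@dataclass
class WakeCheck:
    contains_wake_word: bool  # output of the speech (wake-word) model
    speaker_score: float      # similarity to the enrolled user, from the voiceprint model


def should_wake(check: WakeCheck, speaker_threshold: float = 0.7) -> bool:
    """Wake only when the wake word is present AND the speaker matches the enrolled identity."""
    identity_matches = check.speaker_score >= speaker_threshold
    return check.contains_wake_word and identity_matches
```

A correct wake word spoken by an unknown voice, or the enrolled voice without the wake word, both leave the device asleep.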

Fig. 2 is an example diagram of the voiceprint wake-up process in a voiceprint wake-up system proposed in this application. As Fig. 2 shows, the user inputs the voice signal to be recognized through a MIC (microphone). After receiving it, the MIC transmits the voice signal to a DSP (digital signal processor) for first-level wake-up. In one embodiment, the DSP is a microprocessor dedicated to digital signal operations; it is mainly used to implement various digital signal processing algorithms quickly and in real time, offloading work from the CPU. To perform the first-level wake-up, the DSP may contain a voice wake-up algorithm and a voiceprint recognition algorithm. After DSP processing, a wake-up signal and the voice signal are obtained, and both are passed to Android for final voice signal recognition. In other words, once the DSP's first-level wake-up function is enabled and the user inputs a voice signal containing the wake-up word, the process moves from the first-level wake-up stage to the second-level wake-up stage, in which the voice signal to be recognized is input to the Android application processor and recognized there.
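The two-level flow of Fig. 2 is essentially a cascade: the cheap first-level check on the DSP gates the more expensive second-level check on the application processor. A minimal sketch, assuming each stage is available as a callable returning a boolean (`two_stage_wake` and the stage callables are hypothetical, not an Android or DSP API):

```python
from typing import Callable, Sequence

Samples = Sequence[float]


def two_stage_wake(
    audio: Samples,
    first_stage: Callable[[Samples], bool],   # e.g. DSP wake-word + voiceprint check
    second_stage: Callable[[Samples], bool],  # e.g. stricter check on the application processor
) -> str:
    """Run the first-level check; escalate to the second level only on a hit."""
    if not first_stage(audio):
        return "asleep"    # first level rejected: the second stage is never invoked
    if not second_stage(audio):
        return "rejected"  # first level passed, but the second-level check failed
    return "awake"
```

The design choice is that the always-on, low-power stage filters most audio, so the second stage only runs for candidate wake-ups.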

Current voiceprint-based voice wake-up requires the user's voiceprint information to be collected before use. Typically, a preset phrase is used and the user is guided to read it aloud 3 to 5 times, from which the model is built and the user's voiceprint is trained. In other words, when training the voiceprint model, the prior art sends all of the audio captured from the user to the application processor for training. This obviously introduces a lot of redundant information into voiceprint model training, and the resulting model may be unable to recognize voiceprints accurately.

Therefore, to overcome the above drawbacks, as shown in Fig. 3, an embodiment of this application provides a model training method applicable to an electronic device. This embodiment describes the steps performed in the electronic device; the method may include steps S110 to S150.

Step S110: Obtain a voice signal, the voice signal including at least one voice content.

In the embodiments of the present invention, the electronic device may be a mobile phone, a notebook computer, a tablet personal computer, a palmtop computer, a laptop computer, a personal digital assistant (PDA), a mobile internet device (MID), a wearable device (for example, a smartwatch such as the iWatch, a smart band, or a pedometer), or another electronic device on which an instant messaging client can be installed and deployed.

The electronic device obtains a voice signal, which may be input by the user through a sound input interface. Because the sound capture environment is usually uncontrollable, the voice signal in this embodiment includes at least one voice content. For example, while user A is speaking to the electronic device, user B nearby may be speaking to it at the same time, so the voice signal received by the device may contain not just one voice content but two or more. These voice contents may or may not be identical. For example, user A may say "aaa please unlock mobile phone" while user B says "aaa I want to unlock mobile phone".

In one embodiment, the voice signal may also contain other redundant signals. These are unrelated to the speech the electronic device actually needs to recognize and are mainly interference produced by the surrounding environment while the user is speaking. For example, there may be noise from other objects around the user, such as wind, whistles, barking, or sound from other electronic devices; this embodiment refers to such sounds as interference signals in the voice signal.

Step S120: Search the at least one voice content for the voice content that matches the wake-up word, as the target voice content.

After obtaining the voice signal, the electronic device searches it for the voice content that matches the wake-up word. As described above, the voice signal may include at least one voice content, so after obtaining the signal the device searches all of the voice contents it contains. Through this search the device can determine whether the voice signal contains the wake-up word. If a voice content matching the wake-up word is found among the at least one voice content, the voice content containing the keyword is taken as the target voice content, and the electronic device may also be woken up.

In addition, this embodiment may use a neural network model to match voice contents against the wake-up word. After the voice signal is input to the neural network model, the model extracts features from the voice contents in the signal and uses the extracted features to search for the voice content that matches the wake-up word; any voice content containing the wake-up word is taken as the target voice content. For example, suppose the voice signal contains two voice contents, "aaa please unlock mobile phone" and "bbb please unlock mobile phone", and the wake-up word is "aaa". After receiving both, the neural network model first performs feature extraction and recognition on each to obtain recognition results, and then matches those results against the wake-up word. "aaa please unlock mobile phone" contains "aaa" while "bbb please unlock mobile phone" does not, so the former is taken as the target voice content and the latter is not.
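A minimal sketch of the selection in step S120, assuming a recognizer is available as a callable that turns a voice content into a transcript. The `recognize` callable stands in for the neural network's feature extraction and recognition; it and `select_target_contents` are hypothetical names, and plain substring matching is used for the final comparison.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")  # whatever represents one voice content (audio segment, features, ...)


def select_target_contents(
    contents: Sequence[T],
    recognize: Callable[[T], str],  # stand-in for neural-network feature extraction + recognition
    wake_word: str,
) -> List[T]:
    """Keep every voice content whose recognized text contains the wake-up word."""
    targets = []
    for content in contents:
        transcript = recognize(content)
        if wake_word in transcript:
            targets.append(content)
    return targets
```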

Step S130: Obtain the voiceprint information corresponding to the target voice content.

After obtaining the target voice content, the electronic device looks up the voiceprint information corresponding to it. A voiceprint is the spectrum of the sound waves carrying verbal information, as displayed by electro-acoustic instruments. The voiceprint maps of any two people differ to some extent, because people's vocal organs and manner of speaking differ considerably. This embodiment can therefore determine the identity of the user corresponding to a voiceprint by recognizing the voiceprint information. In one embodiment, voice contents and voiceprint information are stored in the electronic device in one-to-one correspondence, so once a voice content is obtained the device can find its voiceprint information. For example, if the target voice content "aaa please unlock mobile phone" was spoken by user A, the voiceprint information corresponding to it is user A's voiceprint information.

Step S140: Intercept the voiceprint information according to the wake-up word to obtain standard voiceprint information.

After obtaining the voiceprint information corresponding to the target voice content, the electronic device intercepts it according to the wake-up word. The main purpose of the interception is to remove redundant information so that it does not interfere with voiceprint model training. In one embodiment, intercepting the voiceprint information may mean cutting out the noise information in it, or cutting out the redundant information in it. For example, there is waiting time before and after user A says "aaa please unlock mobile phone", and this waiting time may contain some noise or redundant information. To make the final voiceprint information more standard, everything captured during these periods before and after the utterance should be cut out.

The basic principle of intercepting the voiceprint information according to the wake-up word is that voiceprint information and voice content correspond one to one. As described above, the target voice content contains the wake-up word, so the wake-up word also has its own corresponding voiceprint information, and the voiceprint information can therefore be intercepted according to the wake-up word. For example, if the target voice content is "aaa please unlock mobile phone" and the wake-up word is "aaa", the redundant part of the voiceprint information is the voiceprint information for "please unlock mobile phone". Interception removes this redundant part and yields the standard voiceprint information, namely the voiceprint information corresponding to "aaa".
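One way to realise the interception of step S140 is to crop a per-frame feature matrix to the time span of the wake-up word. The sketch assumes a word-level alignment `(word, start_s, end_s)` is available from the recognizer and a 10 ms frame shift; the function name and its inputs are hypothetical, since the text does not say how the span is located.

```python
from typing import List, Tuple

import numpy as np


def intercept_wake_word(
    features: np.ndarray,                       # (num_frames, feature_dim) per-frame voiceprint features
    alignment: List[Tuple[str, float, float]],  # assumed (word, start_s, end_s) from a recognizer
    wake_word: str,
    frame_shift_s: float = 0.01,                # assumed 10 ms hop between feature frames
) -> np.ndarray:
    """Return only the frames covering the wake-up word; everything else is dropped."""
    for word, start_s, end_s in alignment:
        if word == wake_word:
            start = int(round(start_s / frame_shift_s))
            end = int(round(end_s / frame_shift_s))
            return features[start:end]
    raise ValueError(f"wake word {wake_word!r} not found in alignment")
```

Leading and trailing silence, as well as the "please unlock mobile phone" frames, fall outside the returned slice.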

Step S150: Train the voiceprint model with the standard voiceprint information to obtain the target voiceprint model.

After obtaining the standard voiceprint information through the interception operation, the electronic device inputs it into the voiceprint model for training. The voiceprint model may be stored in the application processor of the electronic device; its main function is to extract features from voiceprint information and thereby determine the identity of the corresponding user. In this embodiment the voiceprint model is an initial model: before it is trained with the standard voiceprint information, its voiceprint recognition accuracy is low and its error rate is high. Training it with the standard voiceprint information not only improves its recognition accuracy; because the voiceprint information in this embodiment has gone through a series of processing steps, the voiceprint model trained on it also better fits the user's actual needs.
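The text does not specify the voiceprint model's structure or training procedure. As a stand-in only, the sketch below "trains" a target model by averaging normalised embeddings of the standard voiceprint segments, a common enrollment approach. The toy `embed` function (mean of frames) is a placeholder for a real speaker encoder, and all names are hypothetical.

```python
from typing import List

import numpy as np


def embed(segment: np.ndarray) -> np.ndarray:
    """Toy embedding: mean feature vector over frames, L2-normalised (placeholder for a speaker encoder)."""
    v = segment.mean(axis=0)
    return v / np.linalg.norm(v)


def enroll(standard_voiceprints: List[np.ndarray]) -> np.ndarray:
    """Target model = normalised centroid of the per-utterance embeddings."""
    centroid = np.mean([embed(s) for s in standard_voiceprints], axis=0)
    return centroid / np.linalg.norm(centroid)
```

Because only wake-word segments are enrolled, the centroid is not pulled toward whatever else the user happened to say.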

The model training method proposed in the embodiments of this application obtains standard voiceprint information by searching, matching and intercepting the acquired voice signal. This voiceprint information contains no extra interference or redundant information, which makes the final voiceprint model more accurate and effective: it can recognize the voiceprint information in a voice signal more accurately, improving the accuracy of voiceprint recognition.

Another embodiment of this application provides a model training method applicable to an electronic device. Referring to Fig. 4, the model training method may include steps S210 to S270.

Step S210: Detect whether there is voice signal input.

After receiving a recording trigger signal, the electronic device can detect whether there is voice signal input. That is, the electronic device may be provided with a voice trigger button or a voice activation module, and when the trigger button or voice activation module is triggered by the user, the device receives a trigger signal.

As one implementation, voice signal input may be detected on a timed basis, where timed detection means checking for voice signal input at a preset time interval. For example, the device may check for voice signal input every 1 second, or every 0.5 seconds.
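Timed detection can be illustrated offline by checking the most recent interval of audio at each tick of a preset interval and flagging input when its mean energy exceeds a threshold. Energy thresholding is an assumption (the text does not say how input is detected), and `poll_for_speech` and its defaults are hypothetical.

```python
from typing import Optional

import numpy as np


def poll_for_speech(
    signal: np.ndarray,
    fs: int,
    interval_s: float = 0.5,        # preset detection interval, e.g. 0.5 s or 1 s
    energy_threshold: float = 0.01,  # assumed mean-energy threshold for "input present"
) -> Optional[float]:
    """Every `interval_s` seconds, check the audio received during the preceding interval.

    Returns the check time (in seconds) at which input was first detected, or None.
    """
    step = int(round(interval_s * fs))
    for end in range(step, len(signal) + 1, step):
        window = signal[end - step:end]
        if np.mean(window ** 2) > energy_threshold:
            return end / fs
    return None
```

With a 0.5 s interval, speech starting at 1.2 s is first noticed at the 1.5 s check, so the interval bounds the detection latency.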

After obtaining the trigger signal, the electronic device can enable its voice wake-up function without voiceprint judgment. Here, voice wake-up without voiceprint judgment means the device only recognizes the wake-up word and does not process the voiceprint; that is, it only processes the voice content of the voice signal, not the voiceprint information corresponding to that voice content. After enabling this function, the device may check for voice signal input periodically or in real time; which detection mode is used is not specifically limited here.

Step S220: When there is voice signal input, obtain the voice signal and perform noise removal on it.

As noted above, real-world voice signals usually carry noise. To make the acquired voice signal cleaner, this embodiment denoises it: when the electronic device detects voice signal input, it performs noise reduction on the signal. Common speech noise falls into four types: impulse noise, periodic noise, broadband noise and speech interference, and different noise types call for different denoising methods. Impulse noise can be removed directly from the voice signal; periodic noise can be removed with a notch filter; broadband noise can be filtered out with nonlinear methods; and speech interference can be removed with a comb filter.
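For the periodic-noise case, a minimal notch filter sketch: a second-order IIR section with zeros on the unit circle at the interfering frequency and poles just inside it at radius `r`. The 50 Hz mains-hum example and r = 0.98 are assumptions chosen for illustration; the function name is hypothetical.

```python
import math

import numpy as np


def notch_filter(x: np.ndarray, fs: float, f0: float, r: float = 0.98) -> np.ndarray:
    """Second-order IIR notch at f0 Hz: zeros at exp(+/- j*w0), poles at r*exp(+/- j*w0)."""
    w0 = 2 * math.pi * f0 / fs
    b = [1.0, -2 * math.cos(w0), 1.0]        # numerator: nulls the frequency w0
    a = [1.0, -2 * r * math.cos(w0), r * r]  # denominator: keeps the notch narrow
    y = np.zeros(len(x), dtype=float)
    for n in range(len(x)):
        acc = b[0] * x[n]
        if n >= 1:
            acc += b[1] * x[n - 1] - a[1] * y[n - 1]
        if n >= 2:
            acc += b[2] * x[n - 2] - a[2] * y[n - 2]
        y[n] = acc
    return y
```

Moving `r` closer to 1 narrows the notch (less damage to nearby speech) at the cost of a longer start-up transient.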

In addition, this embodiment may use LMS (least mean squares) adaptive filtering, basic spectral subtraction, Wiener filtering and similar methods to reduce noise in the voice signal; it may also use deep learning to suppress noise, or use a noise gate, noise-sampling-based removal or similar methods. Which denoising method is used is not specifically limited here and can be chosen according to actual needs.
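LMS adaptive filtering is usually applied as an adaptive noise canceller: a reference pickup that sees only the noise is filtered to predict the noise in the primary microphone, and the prediction error is the cleaned signal. A minimal sketch, assuming such a reference signal exists (the text does not describe the microphone setup); the function name, tap count and step size `mu` are hypothetical.

```python
from typing import Tuple

import numpy as np


def lms_noise_canceller(
    primary: np.ndarray,    # microphone signal: speech + noise
    reference: np.ndarray,  # assumed second pickup correlated with the noise only
    num_taps: int = 4,
    mu: float = 0.01,       # step size; must be small enough for stability
) -> Tuple[np.ndarray, np.ndarray]:
    """Return (cleaned signal, final filter weights)."""
    w = np.zeros(num_taps)
    buf = np.zeros(num_taps)  # most recent reference samples, newest first
    out = np.zeros(len(primary))
    for n in range(len(primary)):
        buf = np.roll(buf, 1)
        buf[0] = reference[n]
        noise_estimate = w @ buf
        e = primary[n] - noise_estimate  # error = cleaned output sample
        w = w + mu * e * buf             # LMS weight update
        out[n] = e
    return out, w
```

Because the speech is uncorrelated with the reference, minimising the error power removes only the noise component.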

Step S230: Obtain the voice signal, the voice signal including at least one voice content.

Step S240: Search the at least one voice content for the voice content that matches the wake-up word, as the target voice content.

Referring to Fig. 5, step S240 may include steps S241 and S242.

Step S241: Obtain the multiple keywords in each voice content, the multiple keywords together forming one voice content.

In one embodiment, the voice signal obtained by the electronic device includes at least one voice content, and each voice content may include multiple keywords; that is, multiple keywords form one voice content, so the keywords of a voice content must be obtained first. For example, suppose the voice signal input by user A contains the voice content "aaa please unlock mobile phone" (in the original Chinese, "aaa请解锁手机"). This voice content clearly contains multiple keywords: 'a', 'a', 'a', '请' (please), '解' and '锁' (unlock), and '手' and '机' (mobile phone). These eight keywords together form the voice content "aaa please unlock mobile phone". In this embodiment a keyword may be a Chinese character, a letter, a digit, an underscore, a special character, and so on; the specific type is not limited here.

Step S242: Take the voice content corresponding to the keywords that match the wake-up word as the target voice content.

After obtaining the multiple keywords of each voice content, the electronic device matches these keywords against the wake-up word one by one, determining by matching whether each voice content contains the wake-up word; any voice content that does is taken as a target voice content. For example, let the wake-up word be "aaa", voice content A be "aaa please unlock mobile phone", voice content B be "aaa please open mobile phone", and voice content C be "bbb please unlock mobile phone". Matching the keywords of A, B and C one by one against the wake-up word shows that A and B both contain "aaa", so both can be target voice contents, while C does not contain "aaa" and cannot. Hence there may be one target voice content in this embodiment, or several; the number depends on the actual situation.
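Steps S241 and S242 amount to finding the wake-up word's keywords as a contiguous run inside each voice content's keyword sequence. A minimal sketch, with English tokens standing in for the Chinese characters of the example and hypothetical function names; returning the matched span also gives the keyword positions that step S341 needs.

```python
from typing import List, Optional, Sequence, Tuple


def find_wake_word(keywords: Sequence[str], wake_keywords: Sequence[str]) -> Optional[Tuple[int, int]]:
    """Return (start, end) keyword indices of the first contiguous match, or None."""
    k = len(wake_keywords)
    for i in range(len(keywords) - k + 1):
        if list(keywords[i:i + k]) == list(wake_keywords):
            return i, i + k
    return None


def select_targets(contents: Sequence[Sequence[str]], wake_keywords: Sequence[str]) -> List[int]:
    """Indices of the voice contents that contain the wake-up word."""
    return [idx for idx, kw in enumerate(contents) if find_wake_word(kw, wake_keywords) is not None]
```

For example, with contents A ("aaa please unlock mobile phone"), B ("aaa please open mobile phone") and C ("bbb please unlock mobile phone") tokenised this way, `select_targets` returns `[0, 1]`.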

Step S250: Obtain the voiceprint information corresponding to the target voice content.

Step S260: Intercept the voiceprint information according to the wake-up word to obtain standard voiceprint information.

Step S270: Train the voiceprint model with the standard voiceprint information to obtain the target voiceprint model.

This application can detect voice signal input on a timed basis or in real time; as soon as there is voice signal input, the electronic device processes the acquired signal accordingly, which makes data processing faster and more effective. In addition, after obtaining the voice signal this application denoises it, which makes the obtained voiceprint information more standard and, in turn, makes the final voiceprint model more accurate and effective.

Yet another embodiment of this application provides a model training method applicable to an electronic device. Referring to Fig. 6, the model training method may include steps S310 to S350.

Step S310: Obtain a voice signal, the voice signal including at least one voice content.

Step S320: Search the at least one voice content for the voice content that matches the wake-up word, as the target voice content.

Step S330: Obtain the voiceprint information corresponding to the target voice content.

Step S340: Intercept the voiceprint information according to the wake-up word to obtain standard voiceprint information.

Step S340 may include steps S341 and S342.

Step S341: Search the multiple keywords corresponding to the target voice content for the keyword corresponding to the wake-up word.

After obtaining the voiceprint information corresponding to the target voice content, the electronic device searches the target voice content for the keyword corresponding to the wake-up word. As described above, however many target voice contents there are, each of them necessarily contains the wake-up word. For example, target voice contents A and B are "aaa please unlock mobile phone" and "aaa please open mobile phone" respectively; what the two have in common is that both contain the wake-up word "aaa". The keyword corresponding to the wake-up word can therefore be found among the multiple keywords of each target voice content.

Step S342: Take the voiceprint information corresponding to the found keyword as the standard voiceprint information.

In one embodiment, voice contents and voiceprint information are stored in one-to-one correspondence, and identical voice contents may nonetheless correspond to different voiceprint information. For example, user A and user B may both say "aaa please unlock mobile phone", yet their voiceprint information differs, because every person's voiceprint is different; even the same person's voiceprint may differ across situations. In target voice content A of step S341, the keyword corresponding to the wake-up word is "aaa" and its voiceprint information is voiceprint information A; in target voice content B, the keyword corresponding to the wake-up word is also "aaa" and its voiceprint information is voiceprint information B. The final standard voiceprint information is therefore voiceprint information A and voiceprint information B. In this embodiment, the redundant part of the voiceprint information, i.e. everything other than the wake-up word, is intercepted according to the wake-up word: the voiceprint information corresponding to "please unlock mobile phone" is redundant interference information and can be cut out directly.
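Steps S341 and S342 can be sketched by keeping only the voiceprint segments attached to the wake-up word's keywords in each target voice content, giving one standard voiceprint per content (voiceprint information A and B in the example). The per-keyword `(keyword, features)` pairing is an assumption about how the one-to-one storage might look; the function and type names are hypothetical.

```python
from typing import List, Sequence, Tuple

import numpy as np

# Assumed storage layout: one target voice content = list of (keyword, per-keyword feature frames).
Content = Sequence[Tuple[str, np.ndarray]]


def standard_voiceprints(targets: Sequence[Content], wake_keywords: Sequence[str]) -> List[np.ndarray]:
    """For each target content, concatenate the frames of the wake-word keywords only."""
    results = []
    k = len(wake_keywords)
    for content in targets:
        words = [w for w, _ in content]
        for i in range(len(words) - k + 1):
            if words[i:i + k] == list(wake_keywords):
                # Keep the wake-word segments; the remaining keywords are discarded as redundant.
                results.append(np.concatenate([feat for _, feat in content[i:i + k]], axis=0))
                break
    return results
```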

Step S350: Train the voiceprint model with the standard voiceprint information to obtain the target voiceprint model.

The embodiments of this application intercept the redundant information in the voiceprint information and remove extra interference, so the final standard voiceprint information is more accurate and better meets the standard requirements of voiceprint recognition, which in turn yields a better voiceprint model.

Referring to Fig. 7, this application provides a flowchart of a concrete application of the model training method, namely a voiceprint recognition method applicable to an electronic device. As Fig. 7 shows, the method may include steps S410 to S430.

Step S410: Obtain the voice signal of the user to be identified, and use it as the voice signal under test.

The voice signal of the user to be identified is obtained; it may be input through the voice input interface of the electronic device. As shown in Fig. 8, the voice input interface displays "Please input the wake-up word 'aaa'". In other words, when the user wants the electronic device to recognize a voice signal, the user speaks a voice signal containing the wake-up word "aaa" to the device, and the device takes the received voice signal as the voice signal under test. In addition, the wake-up word in this embodiment may have two syllables, four syllables or another number of syllables; it can be preset, and its specific content is not limited here.

Step S420: Identify the wake-up word in the voice signal under test using a wake-up model.

In this embodiment, the wake-up model may be a trained general-purpose speech recognition model. The speech recognition model produces a recognition result for the voice signal to be tested, and the device then determines whether the recognition result matches the wake-up word; if it does, the electronic device can be woken up. Alternatively, voice data containing the wake-up word may be used as training data to train a neural network with a preset structure, yielding the wake-up model. As a further alternative, voice data containing both the wake-up word and other vocabulary may be used as training data to train the neural network and obtain the final wake-up model.
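The matching decision described above (recognition result versus wake-up word) can be sketched as follows. The sketch assumes the speech recognition model has already produced a text transcript; the recognizer itself is out of scope, and `wake_word_matches` is a hypothetical helper. Matching whole tokens rather than raw substrings is a design choice intended to reduce false wake-ups, for example so that "aaab" does not trigger the wake-up word "aaa".

```python
import re
from typing import List


def _tokens(text: str) -> List[str]:
    # Lower-case and split into runs of word characters.
    return re.findall(r"\w+", text.lower())


def wake_word_matches(recognized_text: str, wake_word: str) -> bool:
    """Return True if the wake word occurs as a contiguous run of whole tokens."""
    text_tokens = _tokens(recognized_text)
    wake_tokens = _tokens(wake_word)
    n = len(wake_tokens)
    if n == 0:
        return False  # an empty wake word never matches
    return any(text_tokens[i:i + n] == wake_tokens
               for i in range(len(text_tokens) - n + 1))


print(wake_word_matches("AAA, please unlock the phone", "aaa"))  # True
print(wake_word_matches("please unlock the phone", "aaa"))       # False
```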

Step S430: look up the voiceprint information corresponding to the wake-up word, and identify the voiceprint information using the target voiceprint model to obtain a recognition result.

In one embodiment, the target voiceprint model may be stored in the electronic device itself, so the device identifies the voiceprint information with its locally stored voiceprint model, which shortens the response time of the identification process. Alternatively, the target voiceprint model may be stored on a cloud server communicatively connected to the electronic device, which saves device memory to some extent. Where the voiceprint model is stored can be configured according to the user's actual needs. After the electronic device obtains the recognition result, it may unlock itself according to that result.
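The patent does not say how the target voiceprint model scores a voiceprint. One common approach, assumed here purely for illustration, compares the wake-word voiceprint vector with an enrolled template by cosine similarity and unlocks the device only when the score reaches an acceptance threshold. `cosine_similarity`, `identify_and_unlock`, and the default threshold of 0.7 are hypothetical.

```python
import math
from typing import Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat zero vectors as "no match"
    return dot / (norm_a * norm_b)


def identify_and_unlock(template: Sequence[float], probe: Sequence[float],
                        accept_threshold: float = 0.7) -> Tuple[float, bool]:
    """Return (score, unlock) for a probe voiceprint against the enrolled template."""
    score = cosine_similarity(template, probe)
    return score, score >= accept_threshold


print(identify_and_unlock([1.0, 0.0], [1.0, 0.0]))  # (1.0, True)
print(identify_and_unlock([1.0, 0.0], [0.0, 1.0]))  # (0.0, False)
```

The same scoring function works whether the template is held on the device or fetched from a cloud server; only where `template` comes from changes.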

As shown in Fig. 9, after obtaining the recognition result, the electronic device may further perform steps S440 to S450.

Step S440: obtain a recognition accuracy from the recognition result, and determine whether the recognition accuracy is greater than a target threshold.

As described above, the electronic device obtains a recognition result after identifying the voiceprint. The recognition result can be used not only to unlock the electronic device but also to obtain a recognition accuracy, which is compared with the target threshold; when the recognition accuracy is greater than the target threshold, the method proceeds to step S450. In addition, the recognition result may include parameters such as the maximum loss rate, the minimum loss rate, and weight values. These parameters allow a comprehensive analysis of the target voiceprint model so that the model can be continuously optimized.

Step S450: when the recognition accuracy is greater than the target threshold, input the voice signal to be tested into the target voiceprint model as an optimization signal to optimize the model.

After the recognition accuracy is obtained, if it is greater than the target threshold, the voice signal to be tested can be input into the target voiceprint model as an optimization signal. For example, if the target threshold is 0.8 and the recognition accuracy obtained is 0.9, the voice signal to be tested was recognized well, so it can be used as an optimization signal to update the target voiceprint model; that is, the voice signal to be tested is fed into the target voiceprint model for further training.
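The gating logic of steps S440 and S450 can be sketched as below. The patent states only that a well-recognized signal is fed back to train the model. The update rule here, which folds the new voiceprint into a template-style model as a running mean, is an assumption chosen for illustration, and `maybe_optimize` is a hypothetical name. The strict "greater than" comparison follows the text, so an accuracy exactly equal to the threshold does not trigger an update.

```python
from typing import List, Sequence, Tuple


def maybe_optimize(template: Sequence[float], n_samples: int, probe: Sequence[float],
                   accuracy: float, target_threshold: float = 0.8) -> Tuple[List[float], int]:
    """If accuracy > target_threshold, fold the probe voiceprint into the
    template as a running mean; otherwise return the model unchanged."""
    if accuracy <= target_threshold:
        return list(template), n_samples
    n = n_samples + 1
    # Incremental mean: new_mean = old_mean + (x - old_mean) / n
    updated = [t + (p - t) / n for t, p in zip(template, probe)]
    return updated, n


# Worked example from the text: threshold 0.8, accuracy 0.9 -> update.
print(maybe_optimize([0.0, 0.0], 1, [1.0, 1.0], accuracy=0.9))  # ([0.5, 0.5], 2)
```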

In the embodiments of this application, voice signals are identified using a wake-up model and a voiceprint model. Because the voiceprint model is trained on processed (standard) voiceprint information, the voice signal to be tested can be identified more accurately and effectively. Moreover, the voiceprint model can be continuously optimized and updated, which allows voice signals recorded in different situations to be recognized more accurately and makes voiceprint recognition more accurate overall.

Referring to Fig. 10, an embodiment of the application proposes a model training apparatus 500 that can be applied to an electronic device. Specifically, the model training apparatus 500 includes a voice acquisition module 510, a search module 520, an information acquisition module 530, an information interception module 540, and a model training module 550.

The voice acquisition module 510 is configured to obtain a voice signal, the voice signal including at least one voice content.

Before obtaining the voice signal, the voice acquisition module 510 is further configured to detect whether a voice signal is being input; when a voice signal is input, it obtains the voice signal and performs noise removal on it.
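Neither the detection method nor the noise-removal method is specified. The sketch below assumes a simple frame-energy detector followed by a noise gate, both common lightweight choices; the function names, the 160-sample frame (10 ms at an assumed 16 kHz sampling rate), and both thresholds are hypothetical. Production systems would more likely use a trained voice-activity detector and spectral denoising.

```python
from typing import List, Optional, Sequence


def frame_energies(samples: Sequence[float], frame_len: int = 160) -> List[float]:
    """Mean squared amplitude of each consecutive frame (the last frame may be shorter)."""
    energies = []
    for start in range(0, len(samples), frame_len):
        frame = samples[start:start + frame_len]
        energies.append(sum(x * x for x in frame) / len(frame))
    return energies


def has_voice_input(samples: Sequence[float], frame_len: int = 160,
                    energy_threshold: float = 1e-3) -> bool:
    """Detect voice input: at least one frame exceeds the energy threshold."""
    return any(e > energy_threshold for e in frame_energies(samples, frame_len))


def noise_gate(samples: Sequence[float], gate: float = 0.01) -> List[float]:
    """Very simple noise removal: zero out samples whose magnitude is below the gate."""
    return [x if abs(x) >= gate else 0.0 for x in samples]


def acquire_voice_signal(samples: Sequence[float]) -> Optional[List[float]]:
    """Return the denoised signal if voice input is detected, else None."""
    if not has_voice_input(samples):
        return None
    return noise_gate(samples)
```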

The search module 520 is configured to search the at least one voice content for the voice content that matches the wake-up word, and to use it as the target voice content.

The search module 520 may obtain multiple keywords in each voice content, the multiple keywords together forming that voice content, and then take the voice content corresponding to the keyword that matches the wake-up word as the target voice content.
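As a rough sketch of this keyword-based search, assume each voice content has already been split into a list of keywords and is keyed by a hypothetical content id; exact string equality stands in for whatever matching the wake-up model actually performs. `find_target_contents` is an illustrative name.

```python
from typing import Dict, List


def find_target_contents(contents: Dict[str, List[str]], wake_word: str) -> List[str]:
    """Return the ids of voice contents whose keyword list contains the wake word."""
    return [cid for cid, keywords in contents.items() if wake_word in keywords]


contents = {
    "A": ["aaa", "please unlock the phone"],
    "B": ["aaa", "please unlock the phone"],
    "C": ["what is the weather"],
}
print(find_target_contents(contents, "aaa"))  # ['A', 'B']
```

Voice content C contains no keyword matching the wake-up word, so it is not selected as target voice content.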

The information acquisition module 530 is configured to obtain the voiceprint information corresponding to the target voice content.

The information interception module 540 is configured to intercept the voiceprint information according to the wake-up word to obtain standard voiceprint information.

Further, the information interception module 540 may search the multiple keywords corresponding to the target voice content for the keyword corresponding to the wake-up word, and use the voiceprint information corresponding to the found keyword as the standard voiceprint information.

The model training module 550 is configured to train a voiceprint model using the standard voiceprint information to obtain a target voiceprint model.

After the target voiceprint model is obtained, the model training apparatus 500 may further obtain a recognition accuracy from the recognition result and determine whether the recognition accuracy is greater than the target threshold; when it is, the voice signal to be tested is input into the target voiceprint model as an optimization signal to optimize the model.

Referring to Fig. 11, which shows a structural block diagram of an electronic device 600 provided by an embodiment of the application. The electronic device 600 may be a smartphone, a tablet computer, an e-book reader, or another electronic device capable of running applications. The electronic device 600 in this application may include one or more of the following components: a processor 610, a memory 620, and one or more applications, where the one or more applications may be stored in the memory 620 and configured to be executed by the one or more processors 610, and the one or more programs are configured to perform the method described in the foregoing method embodiments.

The processor 610 may include one or more processing cores. The processor 610 connects the various parts of the electronic device 600 through various interfaces and lines, and performs the various functions of the electronic device 600 and processes data by running or executing instructions, programs, code sets, or instruction sets stored in the memory 620 and by calling data stored in the memory 620. Optionally, the processor 610 may be implemented in at least one hardware form among digital signal processing (DSP), field-programmable gate array (FPGA), and programmable logic array (PLA). The processor 610 may integrate one or a combination of a central processing unit (CPU), a graphics processing unit (GPU), a modem, and the like. The CPU mainly handles the operating system, the user interface, applications, and so on; the GPU is responsible for rendering and drawing the content to be displayed; the modem handles wireless communication. It will be understood that the modem may also not be integrated into the processor 610 and may instead be implemented by a separate communication chip.

The memory 620 may include random access memory (RAM) and may also include read-only memory (ROM). The memory 620 may be used to store instructions, programs, code, code sets, or instruction sets. The memory 620 may include a program storage area and a data storage area. The program storage area may store instructions for implementing the operating system, instructions for implementing at least one function (such as a touch function, a sound playback function, or an image playback function), instructions for implementing the method embodiments described herein, and so on. The data storage area may store data created by the electronic device 600 during use (such as a phone book, audio and video data, and chat records).

Referring to Fig. 12, which shows a structural block diagram of a computer-readable storage medium 700 provided by an embodiment of the application. The computer-readable storage medium 700 stores program code that can be called by a processor to execute the method described in the foregoing method embodiments.

The computer-readable storage medium 700 may be an electronic memory such as flash memory, EEPROM (electrically erasable programmable read-only memory), EPROM, a hard disk, or ROM. Optionally, the computer-readable storage medium 700 includes a non-transitory computer-readable storage medium. The computer-readable storage medium 700 has storage space for program code 710 that performs any of the method steps described above. The program code can be read from, or written into, one or more computer program products. The program code 710 may, for example, be compressed in a suitable form.

Finally, it should be noted that the above embodiments are intended only to illustrate, not to limit, the technical solutions of this application. Although the application has been described in detail with reference to the foregoing embodiments, those skilled in the art should understand that they may still modify the technical solutions described in the foregoing embodiments or make equivalent substitutions for some of their technical features, and that such modifications or substitutions do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of this application.

Claims

1. A model training method, characterized in that it is applied to an electronic device, the method comprising:

obtaining a voice signal, the voice signal comprising at least one voice content;

searching the at least one voice content for a voice content matching a wake-up word, as target voice content;

obtaining voiceprint information corresponding to the target voice content;

intercepting the voiceprint information according to the wake-up word to obtain standard voiceprint information; and

training a voiceprint model using the standard voiceprint information to obtain a target voiceprint model.

2. The method according to claim 1, characterized in that, before obtaining the voice signal, the method comprises:

detecting whether there is voice signal input; and

obtaining the voice signal when there is voice signal input, and performing noise removal on the voice signal.

3. The method according to claim 1, characterized in that searching the at least one voice content for the voice content matching the wake-up word, as target voice content, comprises:

obtaining multiple keywords in each voice content, the multiple keywords forming that voice content; and

taking the voice content corresponding to the keyword that matches the wake-up word as the target voice content.

4. The method according to claim 3, characterized in that intercepting the voiceprint information according to the wake-up word to obtain standard voiceprint information comprises:

searching the multiple keywords corresponding to the target voice content for the keyword corresponding to the wake-up word; and

taking the voiceprint information corresponding to the found keyword as the standard voiceprint information.

5. A voiceprint recognition method, characterized in that it is applied to an electronic device, the method comprising:

obtaining a voice signal of a user to be identified, and using the voice signal as a voice signal to be tested;

identifying a wake-up word in the voice signal to be tested using a wake-up model; and

looking up voiceprint information corresponding to the wake-up word, and identifying the voiceprint information using a target voiceprint model to obtain a recognition result.

6. The method according to claim 5, characterized in that, after obtaining the recognition result, the method further comprises:

obtaining a recognition accuracy according to the recognition result, and determining whether the recognition accuracy is greater than a target threshold; and

when the recognition accuracy is greater than the target threshold, inputting the voice signal to be tested into the target voiceprint model as an optimization signal to optimize the target voiceprint model.

7. The method according to claim 5, characterized in that, after obtaining the recognition result, the method further comprises:

unlocking the electronic device according to the recognition result.

8. A model training apparatus, characterized in that it is applied to an electronic device, the apparatus comprising:

a voice acquisition module, configured to obtain a voice signal, the voice signal comprising at least one voice content;

a search module, configured to search the at least one voice content for a voice content matching a wake-up word, as target voice content;

an information acquisition module, configured to obtain voiceprint information corresponding to the target voice content;

an information interception module, configured to intercept the voiceprint information according to the wake-up word to obtain standard voiceprint information; and

a model training module, configured to train a voiceprint model using the standard voiceprint information to obtain a target voiceprint model.

9. An electronic device, characterized by comprising:

one or more processors;

a memory; and

one or more applications, wherein the one or more applications are stored in the memory and configured to be executed by the one or more processors, and the one or more programs are configured to perform the method according to any one of claims 1 to 7.

10. A computer-readable storage medium, characterized in that program code is stored in the computer-readable storage medium, and the program code can be called by a processor to execute the method according to any one of claims 1 to 7.