CN110060693A - Model training method, device, electronic equipment and storage medium - Google Patents
- Publication number
- CN110060693A (application number CN201910305432.6A)
- Authority
- CN
- China
- Prior art keywords
- voiceprint
- voice
- voice content
- word
- voice signal
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L17/00—Speaker identification or verification
- G10L17/04—Training, enrolment or model building
Abstract
This application discloses a model training method, an apparatus, an electronic device and a storage medium, and belongs to the field of communication technology. The method comprises: obtaining a voice signal, the voice signal including at least one voice content; searching the at least one voice content for a voice content that matches a wake-up word, and taking it as the target voice content; obtaining the voiceprint information corresponding to the target voice content; intercepting the voiceprint information according to the wake-up word to obtain standard voiceprint information; and training a voiceprint model with the standard voiceprint information to obtain a target voiceprint model. By acquiring and processing the voiceprint information, the model training method provided by the embodiments of this application obtains more standard voiceprint information and therefore a better voiceprint model, which improves the user's voiceprint wake-up experience.
Description
Technical field
This application relates to the field of communication technology, and more particularly to a model training method, an apparatus, an electronic device and a storage medium.
Background
In recent years, with the rapid development of intelligent voice processing, the Internet and cloud computing, smart devices on the market have become able to respond to voice commands issued by users. However, existing speech recognition technology still has certain drawbacks: false wake-ups often occur during voice wake-up or voice recognition. For example, the user's speech does not contain the wake-up word, yet a wake-up word is mistakenly recognized in it, and the device is woken up by mistake.
Summary of the invention
In view of this, this application proposes a model training method, an apparatus and an electronic device to solve the above problem.
In a first aspect, an embodiment of this application provides a model training method applied to an electronic device. The method comprises: obtaining a voice signal, the voice signal including at least one voice content; searching the at least one voice content for a voice content that matches a wake-up word, and taking it as the target voice content; obtaining the voiceprint information corresponding to the target voice content; intercepting the voiceprint information according to the wake-up word to obtain standard voiceprint information; and training a voiceprint model with the standard voiceprint information to obtain a target voiceprint model.
In a second aspect, an embodiment of this application provides a voiceprint recognition method applied to an electronic device. The method comprises: obtaining the voice signal of a user to be identified and taking it as the voice signal under test; identifying the wake-up word in the voice signal under test using a wake-up model; and looking up the voiceprint information corresponding to the wake-up word and recognizing that voiceprint information using the target voiceprint model to obtain a recognition result.
In a third aspect, an embodiment of this application provides a model training apparatus applied to an electronic device. The apparatus comprises a voice obtaining module, a search module, an information obtaining module, an information interception module and a model training module. The voice obtaining module is configured to obtain a voice signal, the voice signal including at least one voice content. The search module is configured to search the at least one voice content for a voice content that matches the wake-up word, and to take it as the target voice content. The information obtaining module is configured to obtain the voiceprint information corresponding to the target voice content. The information interception module is configured to intercept the voiceprint information according to the wake-up word to obtain standard voiceprint information. The model training module is configured to train a voiceprint model with the standard voiceprint information to obtain a target voiceprint model.
In a fourth aspect, an embodiment of this application further provides an electronic device comprising one or more processors; a memory; and one or more application programs, wherein the one or more application programs are stored in the memory and configured to be executed by the one or more processors, and the one or more programs are configured to carry out the above method.
In a fifth aspect, an embodiment of this application further provides a computer-readable storage medium that stores program code, and the program code can be called by a processor to execute the above method.
Compared with the prior art, the model training method, apparatus, electronic device and storage medium provided by the embodiments of this application train the voiceprint model as follows: a voice signal including at least one voice content is obtained; a voice content that matches the wake-up word is searched for in the at least one voice content and taken as the target voice content; the voiceprint information corresponding to the target voice content is obtained and intercepted according to the wake-up word to obtain standard voiceprint information; finally, the voiceprint model is trained with the standard voiceprint information to obtain the target voiceprint model. By obtaining standard voiceprint information, the embodiments of this application make the resulting target voiceprint model more accurate and effective.
Brief description of the drawings
To explain the technical solutions in the embodiments of this application more clearly, the drawings needed in the description of the embodiments are briefly introduced below. Obviously, the drawings described below show only some embodiments of this application; those skilled in the art can derive other drawings from them without creative effort.
Fig. 1 shows a schematic structural diagram of a voiceprint wake-up system proposed by this application;
Fig. 2 shows an example diagram of the voiceprint wake-up process in a voiceprint wake-up system proposed by this application;
Fig. 3 shows a flowchart of the model training method provided by one embodiment of this application;
Fig. 4 shows a flowchart of the model training method provided by another embodiment of this application;
Fig. 5 shows a flowchart of step S240 in the model training method provided by another embodiment of this application;
Fig. 6 shows a flowchart of the model training method provided by yet another embodiment of this application;
Fig. 7 shows a specific application flowchart of the model training method provided by a further embodiment of this application;
Fig. 8 shows a specific user interface used in the model training method provided by a further embodiment of this application;
Fig. 9 shows a flowchart of other steps in the model training method provided by a further embodiment of this application;
Fig. 10 shows a block diagram of the model training apparatus provided by an embodiment of this application;
Fig. 11 shows a block diagram of the electronic device provided by an embodiment of this application;
Fig. 12 shows a storage unit, provided by an embodiment of this application, for saving or carrying program code that implements the model training method according to the embodiments of this application.
Detailed description
To help those skilled in the art better understand the solution of this application, the technical solutions in the embodiments are described clearly and completely below with reference to the drawings in the embodiments of this application.
Referring to Fig. 1, the voiceprint wake-up system may include a training process and a recognition process. The main task of the training process is to obtain a speech model and a voiceprint model. The speech model is mainly used to identify whether the voice signal to be recognized contains the wake-up word. The voiceprint model is mainly used to determine the identity of the user to be identified: because everyone's voiceprint is different, the user's identity can be determined by recognizing the voiceprint information. In this embodiment, the speech model and the voiceprint model may be stored in advance on the electronic device or on a server, and both can be obtained by training on a data set. When the user inputs a voice signal to be recognized into the electronic device, the speech model and the voiceprint model on the device use model matching to judge whether the input voice signal contains the wake-up word and, at the same time, whether the identity of the user to be identified matches the identity stored in advance on the device. The electronic device is woken up only when both conditions are met.
Fig. 2 shows an example of the voiceprint wake-up process in a voiceprint wake-up system proposed by this application. As can be seen from Fig. 2, the user inputs the voice signal to be recognized through a MIC (microphone). After receiving it, the MIC transmits the voice signal to a DSP (Digital Signal Processor) for first-level wake-up. In one embodiment, the DSP is a microprocessor dedicated to digital signal operations; it is mainly used to run digital signal processing algorithms quickly and in real time, offloading work from the CPU. To complete the first-level wake-up, the DSP may contain a voice wake-up algorithm and a voiceprint recognition algorithm. DSP processing yields a wake-up signal and a voice signal, and these two signals are input to Android for the final voice signal recognition. In other words, once the first-level wake-up function of the DSP is enabled, a voice signal that contains the wake-up word passes the first-level wake-up stage and enters the second-level wake-up stage, in which the voice signal to be recognized is input to the Android application processor, where it is recognized.
Current implementations of voiceprint-based voice wake-up need to collect the user's voiceprint information before use. Usually a preset phrase is used and the user is guided to read it aloud 3 to 5 times, which establishes the model and trains it on the user's voiceprint information. In other words, when the voiceprint model is trained in the prior art, all of the acquired audio is sent to the application processor for training. The voiceprint model is therefore trained on a great deal of redundant information, and the resulting model may fail to recognize voiceprints accurately.
Therefore, to overcome the above drawback, as shown in Fig. 3, an embodiment of this application provides a model training method that can be applied to an electronic device. This embodiment describes the steps performed on the electronic device; the method may include steps S110 to S150.
Step S110: obtain a voice signal, the voice signal including at least one voice content.
In the embodiments of this application, the electronic device may be a mobile phone, a notebook computer, a tablet computer (Tablet Personal Computer), a palmtop computer, a laptop computer (Laptop Computer), a personal digital assistant (PDA), a mobile Internet device (MID), a wearable device (for example a smartwatch such as iWatch, a smart bracelet or a pedometer), or any other electronic device on which an instant messaging client can be installed.
The electronic device obtains a voice signal, which the user may input through the sound input interface of the device. Because the environment in which the sound is produced is usually uncontrollable, the voice signal in this embodiment includes at least one voice content. For example, while user A is inputting a voice signal to the electronic device, user B nearby may also input a voice signal to it, so the voice signal received by the device may include two or more voice contents rather than just one. These voice contents may or may not be identical. For example, user A may input "aaa please unlock the phone" while user B inputs "aaa I want to unlock the phone".
In addition, in one embodiment the voice signal may also contain other redundant signals. These are unrelated to the speech the electronic device actually needs to recognize and are mainly interference produced by the surroundings while the user inputs the voice signal. For example, the environment may contain noise from other sources, such as wind, horns, barking and the sounds of other electronic devices. This embodiment refers to such sounds in the voice signal as interference signals.
Step S120: search the at least one voice content for a voice content that matches the wake-up word, and take it as the target voice content.
After obtaining the voice signal, the electronic device can search it for a voice content that matches the wake-up word. As described above, the voice signal may include at least one voice content, so the device searches all of the voice contents the signal contains. The search tells the device whether the voice signal contains the wake-up word. If a voice content that matches the wake-up word is found among the at least one voice content, the voice content containing the wake-up word is taken as the target voice content, and the electronic device can also be woken up.
In addition, this embodiment may use a neural network model to recognize the voice contents and match them against the wake-up word. After the voice signal is input to the neural network model, the model extracts features from the voice contents in the signal and, based on the extraction result, searches for a voice content that matches the wake-up word. If a voice content contains the wake-up word, it is taken as the target voice content. For example, suppose the voice signal contains two voice contents, "aaa please unlock the phone" and "bbb please unlock the phone", and the wake-up word is "aaa". After receiving the two voice contents, the neural network model first performs feature extraction and recognition on them to obtain a recognition result, and then matches the recognition result against the wake-up word. "aaa please unlock the phone" contains "aaa" and "bbb please unlock the phone" does not, so the former is taken as the target voice content and the latter is not.
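The matching in step S120 can be sketched as follows. This is a minimal illustration rather than the neural network model of the embodiment: the hypothetical `recognize` callable stands in for feature extraction and recognition, and all function and parameter names are assumptions introduced here.

```python
from typing import Callable, List, Sequence


def select_target_contents(
    segments: Sequence[bytes],
    recognize: Callable[[bytes], str],
    wake_word: str,
) -> List[int]:
    """Return the indices of the voice contents whose recognized text contains the wake-up word."""
    targets = []
    for index, segment in enumerate(segments):
        text = recognize(segment)  # hypothetical stand-in for the recognition result
        if wake_word in text:      # match the recognition result against the wake-up word
            targets.append(index)
    return targets


# Usage with a fake recognizer that maps each audio segment to fixed text.
fake_results = {
    b"seg0": "aaa please unlock the phone",
    b"seg1": "bbb please unlock the phone",
}
print(select_target_contents([b"seg0", b"seg1"], fake_results.__getitem__, "aaa"))
```

Passing the recognizer in as a parameter keeps the selection logic independent of whichever recognition model is actually used.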
Step S130: obtain the voiceprint information corresponding to the target voice content.
After obtaining the target voice content, the electronic device can look up the corresponding voiceprint information. A voiceprint is the sound wave spectrum, carrying speech information, that is displayed by an electro-acoustic instrument. The voiceprint spectrograms of any two people differ, because the vocal organs each person uses when speaking, and the way they are used, differ considerably. This embodiment can therefore determine the identity of the user who corresponds to a piece of voiceprint information by recognizing that voiceprint information. In one embodiment, voice contents and voiceprint information are stored on the electronic device in one-to-one correspondence, so once a voice content is obtained the device can find the corresponding voiceprint information. For example, if the target voice content "aaa please unlock the phone" was uttered by user A, the voiceprint information corresponding to it is user A's voiceprint information.
Step S140: intercept the voiceprint information according to the wake-up word to obtain standard voiceprint information.
After obtaining the voiceprint information corresponding to the target voice content, the electronic device can intercept it according to the wake-up word. The main purpose of interception is to remove redundant information so that it does not interfere with voiceprint model training. In one embodiment, interception may cut out noise in the voiceprint information, or cut out redundant information in it. For example, there is a waiting period before and after user A inputs "aaa please unlock the phone", and this period may contain noise or redundant information. To make the resulting voiceprint information more standard, the redundant information from the periods before and after the utterance is cut out.
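One possible way to carry out such an interception is to cut the signal down to the time span of the wake-up word. The sketch below assumes that a word-level time alignment is available and that the voiceprint information is held as raw samples. The `WordSpan` records and the function name are hypothetical; the application does not specify how the span is located.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class WordSpan:
    """Hypothetical alignment record: a word and its time span in seconds."""
    word: str
    start: float
    end: float


def intercept_by_wake_word(
    samples: List[float],
    sample_rate: int,
    spans: List[WordSpan],
    wake_word: str,
) -> List[float]:
    """Keep only the samples that fall inside the wake-up word's time span."""
    for span in spans:
        if span.word == wake_word:
            first = round(span.start * sample_rate)
            last = round(span.end * sample_rate)
            return samples[first:last]
    raise ValueError(f"wake-up word {wake_word!r} not found in alignment")


# Usage: 10 samples at 10 Hz; the wake-up word occupies 0.2 s to 0.5 s.
samples = [float(i) for i in range(10)]
spans = [WordSpan("aaa", 0.2, 0.5), WordSpan("please unlock the phone", 0.5, 0.9)]
print(intercept_by_wake_word(samples, 10, spans, "aaa"))
```

Everything outside the span, including the waiting periods before and after the utterance, is dropped in a single slice.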
In addition, the voiceprint information can be intercepted according to the wake-up word because voiceprint information and voice content are in one-to-one correspondence. As described above, the target voice content contains the wake-up word, so the wake-up word also has corresponding voiceprint information, and the voiceprint information can be intercepted according to it. For example, if the target voice content is "aaa please unlock the phone" and the wake-up word is "aaa", the redundant information is the voiceprint information corresponding to "please unlock the phone". Interception removes this redundant part and yields the standard voiceprint information, which is the voiceprint information corresponding to "aaa".
Step S150: train the voiceprint model with the standard voiceprint information to obtain the target voiceprint model.
After obtaining the standard voiceprint information through interception, the electronic device can input it to the voiceprint model for training. The voiceprint model may be stored in the application processor of the electronic device; its main function is to extract features from voiceprint information and thereby determine the identity of the corresponding user. In this embodiment, the voiceprint model is an initial model. Before it is trained with the standard voiceprint information, its voiceprint recognition accuracy is low and its error rate is high. Training it with the standard voiceprint information improves its recognition accuracy, and because the voiceprint information in this embodiment has been obtained through interception, the voiceprint model trained on it better meets the user's actual needs.
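The application does not specify the structure of the voiceprint model or its training procedure. Purely as an illustration of where the standard voiceprint information enters, the sketch below swaps real training for a much simpler enrolment step: it averages fixed-length feature vectors, one per intercepted wake-up word, into a single reference vector. The function name and the vector representation are assumptions.

```python
from typing import List


def enroll_voiceprint(standard_prints: List[List[float]]) -> List[float]:
    """Average the standard voiceprint feature vectors into one reference vector."""
    if not standard_prints:
        raise ValueError("at least one standard voiceprint is required")
    dim = len(standard_prints[0])
    if any(len(vector) != dim for vector in standard_prints):
        raise ValueError("all voiceprint vectors must have the same length")
    count = len(standard_prints)
    return [sum(vector[i] for vector in standard_prints) / count for i in range(dim)]


# Usage: two intercepted repetitions of the wake-up word from the same user.
print(enroll_voiceprint([[1.0, 2.0], [3.0, 4.0]]))
```

Because only the wake-up word portion contributes to the reference, redundant speech such as "please unlock the phone" never influences it.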
By searching, matching and intercepting the obtained voice signal, the model training method proposed by the embodiments of this application obtains standard voiceprint information that contains no extra interference or redundant information. This makes the resulting voiceprint model more accurate and effective at recognizing the voiceprint information in a voice signal, which improves the accuracy of voiceprint recognition.
Another embodiment of this application provides a model training method that can be applied to an electronic device. Referring to Fig. 4, the model training method may include steps S210 to S270.
Step S210: detect whether a voice signal is input.
After receiving a recording trigger signal, the electronic device can detect whether a voice signal is input. The electronic device may be provided with a voice trigger button or a voice trigger module; when the button or module is triggered by the user, the device receives a trigger signal.
As one implementation, the detection may be periodic, that is, the device checks for voice signal input at a preset time interval, for example every 1 second or every 0.5 seconds.
After obtaining the trigger signal, the electronic device can enable its voice wake-up function without voiceprint judgment. This means that the device only identifies the wake-up word and does no processing on the voiceprint: it processes only the voice content in the voice signal, not the voiceprint information corresponding to that content. After enabling this function, the device can detect whether a voice signal is input either periodically or in real time; which of the two is used is not specifically limited here.
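Periodic detection at a preset interval can be sketched as a simple polling loop. The `read_input` callable, the poll limit and the injectable `sleep` parameter are assumptions made for this illustration; a real device would more likely rely on an audio driver callback.

```python
import time
from typing import Callable, Optional


def wait_for_voice(
    read_input: Callable[[], Optional[bytes]],
    interval_s: float = 0.5,
    max_polls: int = 10,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[bytes]:
    """Poll for a voice signal every interval_s seconds, up to max_polls times."""
    for _ in range(max_polls):
        signal = read_input()  # hypothetical hook that returns None when nothing was input
        if signal:
            return signal
        sleep(interval_s)      # wait for the preset interval before checking again
    return None


# Usage: the third poll returns a signal; sleeping is recorded instead of performed.
feed = iter([None, None, b"voice"])
waits = []
print(wait_for_voice(lambda: next(feed), interval_s=0.5, sleep=waits.append), waits)
```

Setting `interval_s` to 1.0 or 0.5 corresponds to the two example intervals mentioned in this embodiment.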
Step S220: when a voice signal is input, obtain the voice signal and remove noise from it.
As described above, voice signals in real life usually carry noise. To make the obtained voice signal cleaner, this embodiment can denoise it: when the electronic device detects voice signal input, it can perform noise reduction on the signal. Common speech noise falls into four types: impulse noise, periodic noise, broadband noise and speech interference. Different noise types call for different denoising methods. Impulse noise can be removed from the voice signal; periodic noise can be removed with a notch filter; broadband noise can be filtered out with nonlinear processing; and speech interference can be removed with a comb filter.
In addition, this embodiment may denoise the voice signal with LMS (Least Mean Square) adaptive filtering, basic spectral subtraction, Wiener filtering and the like; it may also suppress noise with deep learning, or remove noise with a noise gate, sample-based noise reduction and the like. Which method is used is not specifically limited here and can be chosen according to actual needs.
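Of the methods listed above, the noise gate is the simplest to illustrate. The sketch below is a minimal frame-based gate that silences any frame whose mean power falls below a threshold. The frame size and threshold values are assumptions chosen for the example, not values given in this application.

```python
from typing import List


def noise_gate(samples: List[float], threshold: float, frame_size: int = 4) -> List[float]:
    """Zero out every frame whose mean power is below threshold squared."""
    gated: List[float] = []
    for start in range(0, len(samples), frame_size):
        frame = samples[start:start + frame_size]
        mean_power = sum(s * s for s in frame) / len(frame)
        if mean_power >= threshold * threshold:
            gated.extend(frame)                # loud enough: keep the frame unchanged
        else:
            gated.extend([0.0] * len(frame))   # too quiet: treat the frame as noise
    return gated


# Usage: a quiet frame of background noise followed by a loud frame of speech.
signal = [0.01, -0.02, 0.01, 0.0, 0.5, -0.6, 0.4, -0.5]
print(noise_gate(signal, threshold=0.1))
```

A gate of this kind only suppresses noise between utterances; noise that overlaps speech calls for one of the filtering methods named above.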
Step S230: obtain a voice signal, the voice signal including at least one voice content.
Step S240: search the at least one voice content for a voice content that matches the wake-up word, and take it as the target voice content.
Referring to Fig. 5, step S240 may include steps S241 to S242.
Step S241: obtain the multiple keywords in each voice content, the multiple keywords composing one voice content.
In one embodiment, the voice signal obtained by the electronic device includes at least one voice content, and each voice content may include multiple keywords; that is, multiple keywords compose one voice content. Before a voice content is matched, the multiple keywords in it are obtained first. For example, the voice signal input by user A includes the voice content "aaa please unlock the phone". This voice content includes multiple keywords, namely 'a', 'a', 'a' and the five characters that make up "please unlock the phone" in the original language (literally 'ask', 'release', 'lock', 'hand' and 'machine'); these eight keywords compose the voice content "aaa please unlock the phone". In addition, a keyword in this embodiment may be a Chinese character, a letter, a digit, an underscore, a special character and so on; the type is not specifically limited here.
Step S242: take the voice content corresponding to the keywords that match the wake-up word as the target voice content.
After obtaining the multiple keywords in each voice content, the electronic device can match these keywords against the wake-up word one by one to determine whether the voice content contains the wake-up word. If it does, the voice content containing the wake-up word is taken as the target voice content. For example, suppose the wake-up word is "aaa", voice content A is "aaa please unlock the phone", voice content B is "aaa please open the phone" and voice content C is "bbb please unlock the phone". Matching the keywords in voice contents A, B and C against the wake-up word one by one shows that A and B both contain the wake-up word "aaa" and can therefore serve as target voice contents, whereas C does not contain "aaa" and cannot. The target voice content in this embodiment may therefore be a single voice content or multiple voice contents, depending on the actual situation.
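Steps S241 and S242 can be sketched as a search for the wake-up word's keywords as a contiguous run inside each voice content's keyword list. The tokenization into keywords is assumed to have been done already, and the function names are hypothetical.

```python
from typing import Dict, List


def find_wake_word(keywords: List[str], wake_keywords: List[str]) -> int:
    """Return the index at which wake_keywords starts inside keywords, or -1 if absent."""
    width = len(wake_keywords)
    if width == 0:
        return -1
    for start in range(len(keywords) - width + 1):
        if keywords[start:start + width] == wake_keywords:  # compare keyword by keyword
            return start
    return -1


def target_contents(contents: Dict[str, List[str]], wake_keywords: List[str]) -> List[str]:
    """Return the names of the voice contents that contain the wake-up word."""
    return [name for name, kws in contents.items() if find_wake_word(kws, wake_keywords) >= 0]


# Usage with voice contents A, B and C from the example above.
contents = {
    "A": ["a", "a", "a", "please", "unlock", "phone"],
    "B": ["a", "a", "a", "please", "open", "phone"],
    "C": ["b", "b", "b", "please", "unlock", "phone"],
}
print(target_contents(contents, ["a", "a", "a"]))
```

Returning the start index, rather than a plain yes or no, also tells later steps which keywords belong to the wake-up word.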
Step S250: obtain the voiceprint information corresponding to the target voice content.
Step S260: intercept the voiceprint information according to the wake-up word to obtain standard voiceprint information.
Step S270: train the voiceprint model with the standard voiceprint information to obtain the target voiceprint model.
This application can detect whether a voice signal is input either periodically or in real time. As soon as a voice signal is input, the electronic device processes it accordingly, which makes data processing faster and more effective. In addition, the voice signal can be denoised after it is obtained, which makes the resulting voiceprint information more standard and the final voiceprint model more accurate and effective.
Yet another embodiment of this application provides a model training method that can be applied to an electronic device. Referring to Fig. 6, the model training method may include steps S310 to S350.
Step S310: obtain a voice signal, the voice signal including at least one voice content.
Step S320: search the at least one voice content for a voice content that matches the wake-up word, and take it as the target voice content.
Step S330: obtain the voiceprint information corresponding to the target voice content.
Step S340: intercept the voiceprint information according to the wake-up word to obtain standard voiceprint information.
Step S340 may include steps S341 to S342.
Step S341: search the multiple keywords corresponding to the target voice content for the keywords corresponding to the wake-up word.
After obtaining the voiceprint information corresponding to the target voice content, the electronic device can search the target voice content for the keywords corresponding to the wake-up word. As described above, however many target voice contents there are, each of them necessarily contains the wake-up word. For example, target voice contents A and B are "aaa please unlock the phone" and "aaa please open the phone" respectively; what they have in common is that both contain the wake-up word "aaa". The keywords corresponding to the wake-up word can therefore be found among the multiple keywords of each target voice content.
Step S342: take the voiceprint information corresponding to the keywords found as the standard voiceprint information.
In one embodiment, voice contents and voiceprint information are stored in one-to-one correspondence, and some voice contents are identical while their corresponding voiceprint information differs. For example, the voice contents of user A and user B may both be "aaa please unlock the phone", yet their voiceprint information differs, because everyone's voiceprint is different, and even the same person's voiceprint may differ under different circumstances. In step S341, the keyword in target voice content A that corresponds to the wake-up word is "aaa", and the voiceprint information corresponding to it is voiceprint information A; the keyword in target voice content B that corresponds to the wake-up word is "aaa", and the voiceprint information corresponding to it is voiceprint information B. The final standard voiceprint information is therefore voiceprint information A and voiceprint information B. In this embodiment, the redundant information in the voiceprint information can thus be cut out according to the wake-up word. The redundant information is everything other than the wake-up word: the voiceprint information corresponding to "please unlock the phone" is redundant, acts as interference, and can be cut out directly.
Step S350: train the voiceprint model with the standard voiceprint information to obtain the target voiceprint model.
The embodiments of this application can cut the redundant information out of the voiceprint information and remove extra interference, which makes the resulting standard voiceprint information more accurate. In other words, the standard voiceprint information better meets the requirements of voiceprint recognition, so the resulting voiceprint model is better.
Referring to Fig. 7, this application provides a specific application flow of the model training method, namely a voiceprint recognition method that can be applied to an electronic device. As Fig. 7 shows, the method may include steps S410 to S430.
Step S410: obtain the voice signal of the user to be identified and take it as the voice signal under test.
The voice signal of the user to be identified can be input through the voice input interface of the electronic device. As shown in Fig. 8, the voice input interface displays "Please input the wake-up word 'aaa'". In other words, when the user needs the electronic device to recognize a voice signal, the user sends the device a voice signal containing the wake-up word "aaa", and the device takes the received voice signal as the voice signal under test. In addition, the wake-up word in this embodiment may have two syllables, four syllables or another number of syllables. The wake-up word can be preset, and its specific content is not limited here.
Step S420: identify the wake-up word in the voice signal to be tested using a wake-up model.
The wake-up model in this embodiment can be a trained general-purpose speech recognition model. The speech recognition model produces a recognition result for the voice signal to be tested, and the result is then checked against the wake-up word; if they match, the electronic device can be woken up. Furthermore, a neural network with a preset structure can be trained on voice data containing the wake-up word to obtain the wake-up model. Alternatively, voice data containing both the wake-up word and other vocabulary can be used as training data to train the neural network and obtain the final wake-up model.
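The matching check in step S420 can be sketched as follows. This is only an illustration: the text does not say how the recognition result is compared with the wake-up word, so the case-insensitive containment test used here, and the function name `should_wake`, are assumptions.

```python
def should_wake(recognized_text: str, wake_word: str) -> bool:
    """Return True when the recognition result contains the wake-up word.

    Assumption: "matching" is treated as case-insensitive substring containment.
    """
    normalized = recognized_text.strip().lower()
    return wake_word.lower() in normalized


# The recognized text would come from the wake-up (speech recognition) model.
print(should_wake("aaa please unlock the phone", "aaa"))
print(should_wake("what time is it", "aaa"))
```

A stricter comparison, such as requiring the wake-up word to appear as a separate token, may be preferable in practice to avoid accidental matches inside longer words.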
Step S430: look up the voiceprint information corresponding to the wake-up word, and identify that voiceprint information using the target voiceprint model to obtain a recognition result.
In one embodiment, the target voiceprint model can be stored in the electronic device; the device then identifies the voiceprint information with its locally stored model, which can shorten the response time of recognition. Alternatively, the target voiceprint model can be stored on a cloud server communicatively connected to the electronic device, which saves memory on the device to some extent. Where the voiceprint model is stored can be configured according to the user's actual needs. After obtaining the recognition result, the electronic device can be unlocked according to that result.
As Fig. 9 shows, after the electronic device obtains the recognition result, the method further includes steps S440 to S450.
Step S440: obtain a recognition accuracy from the recognition result, and judge whether the recognition accuracy is greater than a target threshold.
As described above, the electronic device obtains a recognition result after identifying the voiceprint. This result can be used not only to unlock the device but also to obtain a recognition accuracy, which is compared with the target threshold; when the recognition accuracy is greater than the target threshold, step S450 is entered. In addition, the recognition result may include parameters such as the maximum loss rate, the minimum loss rate, and weight values. These parameters allow a comprehensive analysis of the target voiceprint model, which can then be optimized further.
Step S450: when the recognition accuracy is greater than the target threshold, input the voice signal to be tested into the target voiceprint model as an optimization signal to optimize the model.
After the recognition accuracy is obtained, the signal to be tested can be input into the target voiceprint model as an optimization signal when the recognition accuracy is greater than the target threshold. For example, if the target threshold is 0.8 and the recognition accuracy obtained is 0.9, the voice signal to be tested was recognized well, so it can be used as an optimization signal to update the target voiceprint model; that is, the voice signal to be tested is fed into the target voiceprint model for further training.
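The threshold rule of steps S440 and S450 can be sketched as a simple filter. The function name `select_optimization_signals` and the pairing of each signal with a precomputed accuracy are assumptions made for illustration; the threshold 0.8 and the accuracy 0.9 are the example values from the text.

```python
from typing import List, Tuple

TARGET_THRESHOLD = 0.8  # example value given in the text


def select_optimization_signals(
    results: List[Tuple[str, float]], threshold: float = TARGET_THRESHOLD
) -> List[str]:
    """Return the signals whose recognition accuracy is strictly greater than the threshold."""
    return [signal for signal, accuracy in results if accuracy > threshold]


# Each entry pairs a tested voice signal (placeholder string) with its recognition accuracy.
results = [("signal_1", 0.9), ("signal_2", 0.8), ("signal_3", 0.5)]
selected = select_optimization_signals(results)
print(selected)
```

Only `signal_1` is selected here: an accuracy equal to the threshold does not qualify, because the text requires the accuracy to be greater than the target threshold. The selected signals would then be passed to whatever retraining routine the target voiceprint model provides.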
In this embodiment of the application, voice signals can be recognized using the wake-up model and the voiceprint model. Because the voiceprint model is obtained by training on processed voiceprint information, the application can recognize the voice signal to be tested more accurately and effectively. In addition, the application can continually optimize and update the voiceprint model, which allows it to recognize voice signals under different conditions more accurately and therefore makes voiceprint recognition more accurate.
Referring to Fig. 10, an embodiment of the application proposes a model training apparatus 500 that can be applied to an electronic device. Specifically, the apparatus includes a voice acquisition module 510, a search module 520, an information acquisition module 530, an information interception module 540, and a model training module 550.
The voice acquisition module 510 is configured to obtain a voice signal, the voice signal including at least one voice content.
Before obtaining the voice signal, the voice acquisition module 510 is further configured to detect whether a voice signal is being input, to obtain the voice signal when there is voice signal input, and to perform noise removal on the voice signal.
The search module 520 is configured to search the at least one voice content for a voice content matching the wake-up word, as the target voice content.
The search module 520 can obtain the multiple keywords in each voice content, the multiple keywords together forming one voice content, and then take the voice content corresponding to a keyword that matches the wake-up word as the target voice content.
The information acquisition module 530 is configured to obtain the voiceprint information corresponding to the target voice content.
The information interception module 540 is configured to intercept the voiceprint information according to the wake-up word to obtain standard voiceprint information.
Further, the information interception module 540 can search the multiple keywords corresponding to the target voice content for the keyword corresponding to the wake-up word, and then take the voiceprint information corresponding to the keyword found as the standard voiceprint information.
The model training module 550 is configured to train the voiceprint model using the standard voiceprint information to obtain the target voiceprint model.
After the target voiceprint model is obtained, the model training apparatus 500 can also obtain a recognition accuracy from the recognition result and judge whether the recognition accuracy is greater than the target threshold; when the recognition accuracy is greater than the target threshold, the voice signal to be tested is input into the target voiceprint model as an optimization signal to optimize the model.
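The way the five modules hand data to one another can be sketched as a small pipeline. This is a structural illustration only: the class `ModelTrainingApparatus`, its callable parameters, and the toy lambdas standing in for real signal processing are all hypothetical and are not taken from the application.

```python
from typing import Any, Callable, List


class ModelTrainingApparatus:
    """Chains five stages that mirror modules 510 to 550 (hypothetical interfaces)."""

    def __init__(
        self,
        acquire: Callable[[], List[Any]],
        search: Callable[[List[Any], str], List[Any]],
        get_voiceprint: Callable[[Any], Any],
        intercept: Callable[[Any, str], Any],
        train: Callable[[List[Any]], Any],
    ) -> None:
        self.acquire = acquire                # voice acquisition module 510
        self.search = search                  # search module 520
        self.get_voiceprint = get_voiceprint  # information acquisition module 530
        self.intercept = intercept            # information interception module 540
        self.train = train                    # model training module 550

    def run(self, wake_word: str) -> Any:
        contents = self.acquire()
        targets = self.search(contents, wake_word)
        voiceprints = [self.get_voiceprint(t) for t in targets]
        standard = [self.intercept(v, wake_word) for v in voiceprints]
        return self.train(standard)


# Toy stand-ins: text replaces audio, and "vp(word)" replaces real voiceprint features.
apparatus = ModelTrainingApparatus(
    acquire=lambda: ["aaa please unlock the phone", "what time is it"],
    search=lambda contents, w: [c for c in contents if w in c.split()],
    get_voiceprint=lambda content: {word: f"vp({word})" for word in content.split()},
    intercept=lambda voiceprint, w: voiceprint[w],
    train=lambda standard: {"trained_on": standard},
)
model = apparatus.run("aaa")
print(model)
```

Passing each stage in as a callable keeps the sketch independent of any particular speech toolkit; a real implementation would replace the lambdas with audio capture, keyword spotting, feature extraction, and model training.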
Referring to Fig. 11, a structural block diagram of an electronic device 600 provided by an embodiment of the application is shown. The electronic device 600 may be a smartphone, a tablet computer, an e-book reader, or another electronic device capable of running application programs. The electronic device 600 in the application may include one or more of the following components: a processor 610, a memory 620, and one or more application programs, where the one or more application programs can be stored in the memory 620 and configured to be executed by the one or more processors 610, and the one or more programs are configured to perform the methods described in the foregoing method embodiments.
The processor 610 may include one or more processing cores. The processor 610 connects the various parts of the electronic device 600 through various interfaces and lines, and performs the functions of the electronic device 600 and processes data by running or executing the instructions, programs, code sets, or instruction sets stored in the memory 620 and by calling the data stored in the memory 620. Optionally, the processor 610 can be implemented in at least one of the following hardware forms: digital signal processing (Digital Signal Processing, DSP), field-programmable gate array (Field-Programmable Gate Array, FPGA), and programmable logic array (Programmable Logic Array, PLA). The processor 610 can integrate one or a combination of a central processing unit (Central Processing Unit, CPU), a graphics processing unit (Graphics Processing Unit, GPU), a modem, and the like. The CPU mainly handles the operating system, the user interface, application programs, and so on; the GPU is responsible for rendering and drawing display content; the modem handles wireless communication. It is understood that the modem may also not be integrated into the processor 610 and may instead be implemented separately as a communication chip.
The memory 620 may include random access memory (Random Access Memory, RAM) and may also include read-only memory (Read-Only Memory, ROM). The memory 620 can be used to store instructions, programs, code, code sets, or instruction sets. The memory 620 may include a program storage area and a data storage area. The program storage area can store instructions for implementing the operating system, instructions for implementing at least one function (such as a touch function, a sound playback function, or an image playback function), instructions for implementing the method embodiments, and so on. The data storage area can store data created by the electronic device 600 during use (such as a phone book, audio and video data, and chat records).
Referring to Fig. 12, a structural block diagram of a computer-readable storage medium 700 provided by an embodiment of the application is shown. Program code is stored in the computer-readable storage medium 700, and the program code can be called by a processor to execute the methods described in the above method embodiments.
The computer-readable storage medium 700 may be an electronic memory such as flash memory, EEPROM (electrically erasable programmable read-only memory), EPROM, a hard disk, or ROM. Optionally, the computer-readable storage medium 700 includes a non-transitory computer-readable storage medium. The computer-readable storage medium 700 has storage space for program code 710 that performs any of the method steps in the above methods. The program code can be read from or written into one or more computer program products. The program code 710 can be compressed in a suitable form.
Finally, it should be noted that the above embodiments are intended only to illustrate the technical solutions of the application, not to limit them. Although the application has been described in detail with reference to the foregoing embodiments, those skilled in the art should understand that they may still modify the technical solutions described in the foregoing embodiments or make equivalent replacements of some of the technical features, and that such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the application.
Claims (10)
1. A model training method, characterized in that it is applied to an electronic device, the method comprising:
obtaining a voice signal, the voice signal including at least one voice content;
searching the at least one voice content for a voice content matching a wake-up word, as a target voice content;
obtaining voiceprint information corresponding to the target voice content;
intercepting the voiceprint information according to the wake-up word to obtain standard voiceprint information;
training a voiceprint model using the standard voiceprint information to obtain a target voiceprint model.
2. The method according to claim 1, characterized in that, before the obtaining of the voice signal, the method comprises:
detecting whether a voice signal is being input;
obtaining the voice signal when there is voice signal input, and performing noise removal on the voice signal.
3. The method according to claim 1, characterized in that the searching of the at least one voice content for a voice content matching the wake-up word, as the target voice content, comprises:
obtaining multiple keywords in each voice content, the multiple keywords forming one voice content;
taking the voice content corresponding to a keyword that matches the wake-up word as the target voice content.
4. The method according to claim 3, characterized in that the intercepting of the voiceprint information according to the wake-up word to obtain standard voiceprint information comprises:
searching the multiple keywords corresponding to the target voice content for the keyword corresponding to the wake-up word;
taking the voiceprint information corresponding to the keyword found as the standard voiceprint information.
5. A voiceprint recognition method, characterized in that it is applied to an electronic device, the method comprising:
obtaining a voice signal of a user to be identified, and taking the voice signal as a voice signal to be tested;
identifying a wake-up word in the voice signal to be tested using a wake-up model;
looking up voiceprint information corresponding to the wake-up word, and identifying the voiceprint information using a target voiceprint model to obtain a recognition result.
6. The method according to claim 5, characterized in that, after the obtaining of the recognition result, the method further comprises:
obtaining a recognition accuracy from the recognition result, and judging whether the recognition accuracy is greater than a target threshold;
when the recognition accuracy is greater than the target threshold, inputting the voice signal to be tested into the target voiceprint model as an optimization signal to optimize the model.
7. The method according to claim 5, characterized in that, after the obtaining of the recognition result, the method further comprises:
unlocking the electronic device according to the recognition result.
8. A model training apparatus, characterized in that it is applied to an electronic device, the apparatus comprising:
a voice acquisition module, configured to obtain a voice signal, the voice signal including at least one voice content;
a search module, configured to search the at least one voice content for a voice content matching a wake-up word, as a target voice content;
an information acquisition module, configured to obtain voiceprint information corresponding to the target voice content;
an information interception module, configured to intercept the voiceprint information according to the wake-up word to obtain standard voiceprint information;
a model training module, configured to train a voiceprint model using the standard voiceprint information to obtain a target voiceprint model.
9. An electronic device, characterized by comprising:
one or more processors;
a memory;
one or more application programs, wherein the one or more application programs are stored in the memory and configured to be executed by the one or more processors, and the one or more programs are configured to perform the method according to any one of claims 1-7.
10. A computer-readable storage medium, characterized in that program code is stored in the computer-readable storage medium, and the program code can be called by a processor to execute the method according to any one of claims 1-7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910305432.6A CN110060693A (en) | 2019-04-16 | 2019-04-16 | Model training method, device, electronic equipment and storage medium |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910305432.6A CN110060693A (en) | 2019-04-16 | 2019-04-16 | Model training method, device, electronic equipment and storage medium |
Publications (1)
Publication Number | Publication Date |
---|---|
CN110060693A true CN110060693A (en) | 2019-07-26 |
Family
ID=67319209
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910305432.6A Pending CN110060693A (en) | 2019-04-16 | 2019-04-16 | Model training method, device, electronic equipment and storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110060693A (en) |
Cited By (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110491393A (en) * | 2019-08-30 | 2019-11-22 | 科大讯飞股份有限公司 | The training method and relevant apparatus of vocal print characterization model |
CN110491373A (en) * | 2019-08-19 | 2019-11-22 | Oppo广东移动通信有限公司 | Model training method, device, storage medium and electronic equipment |
CN110544468A (en) * | 2019-08-23 | 2019-12-06 | Oppo广东移动通信有限公司 | Application awakening method and device, storage medium and electronic equipment |
CN110570869A (en) * | 2019-08-09 | 2019-12-13 | 科大讯飞股份有限公司 | Voiceprint recognition method, device, equipment and storage medium |
CN110880318A (en) * | 2019-11-27 | 2020-03-13 | 云知声智能科技股份有限公司 | Voice recognition method and device |
CN111326146A (en) * | 2020-02-25 | 2020-06-23 | 北京声智科技有限公司 | Method and device for acquiring voice awakening template, electronic equipment and computer readable storage medium |
CN111627449A (en) * | 2020-05-20 | 2020-09-04 | Oppo广东移动通信有限公司 | Screen voiceprint unlocking method and device |
CN113407768A (en) * | 2021-06-24 | 2021-09-17 | 深圳市声扬科技有限公司 | Voiceprint retrieval method, device, system, server and storage medium |
CN113421573A (en) * | 2021-06-18 | 2021-09-21 | 马上消费金融股份有限公司 | Identity recognition model training method, identity recognition method and device |
CN113782005A (en) * | 2021-01-18 | 2021-12-10 | 北京沃东天骏信息技术有限公司 | Voice recognition method and device, storage medium and electronic equipment |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105575395A (en) * | 2014-10-14 | 2016-05-11 | 中兴通讯股份有限公司 | Voice wake-up method and apparatus, terminal, and processing method thereof |
CN107147618A (en) * | 2017-04-10 | 2017-09-08 | 北京猎户星空科技有限公司 | A kind of user registering method, device and electronic equipment |
CN107886957A (en) * | 2017-11-17 | 2018-04-06 | 广州势必可赢网络科技有限公司 | The voice awakening method and device of a kind of combination Application on Voiceprint Recognition |
CN108694947A (en) * | 2018-06-27 | 2018-10-23 | Oppo广东移动通信有限公司 | Sound control method, device, storage medium and electronic equipment |
TW201839629A (en) * | 2017-04-28 | 2018-11-01 | 冠捷投資有限公司 | Method of application to intelligent personal assistant |
CN108766446A (en) * | 2018-04-18 | 2018-11-06 | 上海问之信息科技有限公司 | Method for recognizing sound-groove, device, storage medium and speaker |
-
2019
- 2019-04-16 CN CN201910305432.6A patent/CN110060693A/en active Pending
Non-Patent Citations (2)
Title |
---|
俞一彪: "《基于互信息理论的说话人识别研究》", 31 December 2004, 上海大学出版社 * |
王炳锡 等: "《实用语音识别基础》", 31 January 2005, 国防工业出版社 * |
Cited By (18)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110570869B (en) * | 2019-08-09 | 2022-01-14 | 科大讯飞股份有限公司 | Voiceprint recognition method, device, equipment and storage medium |
CN110570869A (en) * | 2019-08-09 | 2019-12-13 | 科大讯飞股份有限公司 | Voiceprint recognition method, device, equipment and storage medium |
CN110491373A (en) * | 2019-08-19 | 2019-11-22 | Oppo广东移动通信有限公司 | Model training method, device, storage medium and electronic equipment |
CN110544468A (en) * | 2019-08-23 | 2019-12-06 | Oppo广东移动通信有限公司 | Application awakening method and device, storage medium and electronic equipment |
CN110544468B (en) * | 2019-08-23 | 2022-07-12 | Oppo广东移动通信有限公司 | Application awakening method and device, storage medium and electronic equipment |
CN110491393A (en) * | 2019-08-30 | 2019-11-22 | 科大讯飞股份有限公司 | The training method and relevant apparatus of vocal print characterization model |
CN110491393B (en) * | 2019-08-30 | 2022-04-22 | 科大讯飞股份有限公司 | Training method of voiceprint representation model and related device |
CN110880318A (en) * | 2019-11-27 | 2020-03-13 | 云知声智能科技股份有限公司 | Voice recognition method and device |
CN110880318B (en) * | 2019-11-27 | 2023-04-18 | 云知声智能科技股份有限公司 | Voice recognition method and device |
CN111326146A (en) * | 2020-02-25 | 2020-06-23 | 北京声智科技有限公司 | Method and device for acquiring voice awakening template, electronic equipment and computer readable storage medium |
CN111627449A (en) * | 2020-05-20 | 2020-09-04 | Oppo广东移动通信有限公司 | Screen voiceprint unlocking method and device |
CN111627449B (en) * | 2020-05-20 | 2023-02-28 | Oppo广东移动通信有限公司 | Screen voiceprint unlocking method and device |
CN113782005A (en) * | 2021-01-18 | 2021-12-10 | 北京沃东天骏信息技术有限公司 | Voice recognition method and device, storage medium and electronic equipment |
CN113782005B (en) * | 2021-01-18 | 2024-03-01 | 北京沃东天骏信息技术有限公司 | Speech recognition method and device, storage medium and electronic equipment |
CN113421573A (en) * | 2021-06-18 | 2021-09-21 | 马上消费金融股份有限公司 | Identity recognition model training method, identity recognition method and device |
CN113421573B (en) * | 2021-06-18 | 2024-03-19 | 马上消费金融股份有限公司 | Identity recognition model training method, identity recognition method and device |
CN113407768A (en) * | 2021-06-24 | 2021-09-17 | 深圳市声扬科技有限公司 | Voiceprint retrieval method, device, system, server and storage medium |
CN113407768B (en) * | 2021-06-24 | 2024-02-02 | 深圳市声扬科技有限公司 | Voiceprint retrieval method, voiceprint retrieval device, voiceprint retrieval system, voiceprint retrieval server and storage medium |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110060693A (en) | Model training method, device, electronic equipment and storage medium | |
CN107481718B (en) | Audio recognition method, device, storage medium and electronic equipment | |
CN107134279B (en) | Voice awakening method, device, terminal and storage medium | |
CN107767863B (en) | Voice awakening method and system and intelligent terminal | |
CN108182937B (en) | Keyword recognition method, device, equipment and storage medium | |
CN110890093B (en) | Intelligent equipment awakening method and device based on artificial intelligence | |
CN110265040A (en) | Training method, device, storage medium and the electronic equipment of sound-groove model | |
CN110534099A (en) | Voice wakes up processing method, device, storage medium and electronic equipment | |
CN107977183A (en) | voice interactive method, device and equipment | |
CN110570840B (en) | Intelligent device awakening method and device based on artificial intelligence | |
CN110570873B (en) | Voiceprint wake-up method and device, computer equipment and storage medium | |
CN108897732B (en) | Statement type identification method and device, storage medium and electronic device | |
CN109215646B (en) | Voice interaction processing method and device, computer equipment and storage medium | |
CN109872713A (en) | A kind of voice awakening method and device | |
CN110473536B (en) | Awakening method and device and intelligent device | |
CN110544468B (en) | Application awakening method and device, storage medium and electronic equipment | |
CN111312222A (en) | Awakening and voice recognition model training method and device | |
CN110491373A (en) | Model training method, device, storage medium and electronic equipment | |
JP6496942B2 (en) | Information processing device | |
CN110491394A (en) | Wake up the acquisition methods and device of corpus | |
CN109119073A (en) | Audio recognition method, system, speaker and storage medium based on multi-source identification | |
CN111048068B (en) | Voice wake-up method, device and system and electronic equipment | |
CN111522592A (en) | Intelligent terminal awakening method and device based on artificial intelligence | |
CN112669837B (en) | Awakening method and device of intelligent terminal and electronic equipment | |
CN113362830A (en) | Starting method, control method, system and storage medium of voice assistant |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | Application publication date: 20190726 |