CN108172230A - Voiceprint registration method, terminal device and storage medium based on a voiceprint recognition model - Google Patents
- Publication number
- CN108172230A (CN201810003939.1A)
- Authority
- CN
- China
- Prior art keywords
- voice
- per
- vector
- voiceprint
- characteristic
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L17/00—Speaker identification or verification
- G10L17/02—Preprocessing operations, e.g. segment selection; Pattern representation or modelling, e.g. based on linear discriminant analysis [LDA] or principal components; Feature selection or extraction
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L17/00—Speaker identification or verification
- G10L17/06—Decision making techniques; Pattern matching strategies
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L17/00—Speaker identification or verification
- G10L17/22—Interactive procedures; Man-machine interfaces
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/03—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
- G10L25/24—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters the extracted parameters being the cepstrum
Abstract
The invention discloses a voiceprint registration method based on a voiceprint recognition model, applied to a terminal device. The method includes: obtaining the valid speech from a user's registration recording; dividing the valid speech evenly into an integer number of parts; computing a speech feature vector for each part; judging from the computed speech feature vectors whether every part belongs to the same user; and, if every part belongs to the same user, performing voice registration with the valid speech. The invention also provides a terminal device and a storage medium. The invention can detect whether a user is attempting voiceprint registration in an environment with excessive noise or with several people speaking, keeps unsuitable recordings out of the voiceprint database (or speech database), safeguards the quality of the registered speech, prevents low-quality speech from affecting subsequent verification, and improves the usability of voiceprint recognition.
Description
Technical field
The present invention relates to the field of communication technology, and in particular to a voiceprint registration method based on a voiceprint recognition model, a terminal device, and a storage medium.
Background art
With the continuing development of speech recognition technology, more and more applications support it, for example voice unlocking and voice payment. Voiceprint registration is a very important step in such applications. If there is heavy environmental noise during registration, or if several people are speaking while the recording is made, the quality of the registered speech inevitably suffers; low-quality speech then affects subsequent verification and, in turn, the usability of voiceprint recognition. The current practice is mainly noise monitoring, that is, conventional noise detection: the environment is checked before voiceprint recognition is carried out, and before the user's speech is recorded only the ambient sound is recorded and analyzed. If the ambient sound level in decibels is too high, the environment is judged too noisy; if it is below a set threshold, noise detection passes and the user can register a voiceprint normally. However, this method can detect only environmental noise. It cannot tell whether several people's voices have entered the recording, so the quality of the registered speech is still affected.
Summary of the invention
In view of this, the present invention proposes a voiceprint registration method based on a voiceprint recognition model, a terminal device, and a storage medium. Implementing them makes it possible to detect whether a user is attempting voiceprint registration in an environment with excessive noise or with several people speaking, keeps unsuitable recordings out of the voiceprint database (speech database), safeguards the quality of the registered speech, prevents low-quality speech from affecting subsequent verification, and improves the usability of voiceprint recognition.
First, to achieve the above object, the present invention proposes a terminal device. The terminal device includes a memory and a processor; the memory stores a voiceprint registration program based on a voiceprint recognition model that can run on the processor, and the program, when executed by the processor, implements the following steps:
Obtaining the valid speech from a user's registration recording;
Dividing the valid speech evenly into an integer number of parts;
Computing a speech feature vector for each of the parts;
Judging, from the computed speech feature vectors of the parts, whether every part belongs to the same user; and
If every part belongs to the same user, performing voice registration with the valid speech.
Optionally, the step of computing a speech feature vector for each of the parts includes:
Extracting, with the MFCC method, the MFCC features of every speech frame in each part and assembling them into a matrix; and
Selecting the most essential features in the matrix with a UBM and a speech feature vector extractor to form the speech feature vector.
Optionally, the step of judging, from the computed speech feature vectors of the parts, whether every part belongs to the same user includes:
Scoring the speech feature vectors of the parts pairwise with the PLDA algorithm;
If the differences obtained from the pairwise scoring are less than a preset value, judging that every part belongs to the same user.
In addition, to achieve the above object, the present invention also provides a voiceprint registration method based on a voiceprint recognition model, applied to a terminal device. The method includes:
Obtaining the valid speech from a user's registration recording;
Dividing the valid speech evenly into an integer number of parts;
Computing a speech feature vector for each of the parts;
Judging, from the computed speech feature vectors of the parts, whether every part belongs to the same user; and
If every part belongs to the same user, performing voice registration with the valid speech.
Optionally, the step of dividing the valid speech evenly into an integer number of parts includes:
When the valid speech is text-independent, cutting the user's valid speech evenly by valid frames into an integer number of segments to obtain the integer number of parts.
Optionally, the step of computing a speech feature vector for each of the parts includes:
Extracting, with the MFCC method, the MFCC features of every speech frame in each part and assembling them into a matrix; and
Selecting the most essential features in the matrix with a UBM and a speech feature vector extractor to form the speech feature vector.
Optionally, the step of judging, from the computed speech feature vectors of the parts, whether every part belongs to the same user includes:
Scoring the speech feature vectors of the parts pairwise with the PLDA algorithm;
If the differences obtained from the pairwise scoring are less than a preset value, judging that every part belongs to the same user.
Further, to achieve the above object, the present invention also provides a storage medium. The storage medium stores a voiceprint registration program based on a voiceprint recognition model, and the program can be executed by at least one processor so that the at least one processor performs the steps of the voiceprint registration method based on a voiceprint recognition model described above.
Compared with the prior art, the voiceprint registration method based on a voiceprint recognition model, the terminal device and the storage medium proposed by the present invention first obtain the valid speech from a user's registration recording; second, divide the valid speech evenly into an integer number of parts; then compute a speech feature vector for each part; then judge from the speech feature vectors whether every part belongs to the same user; and finally, if every part belongs to the same user, perform voice registration with the valid speech. This overcomes the drawback of existing noise detection methods, which can detect only environmental noise and cannot tell whether several people's voices have entered the recording, so that the quality of the registered speech still suffers. The invention can therefore detect whether a user is attempting voiceprint registration in an environment with excessive noise or with several people speaking, keeps unsuitable recordings out of the voiceprint database (or speech database), safeguards the quality of the registered speech, prevents low-quality speech from affecting subsequent verification, and improves the usability of voiceprint recognition.
Description of the drawings
Fig. 1 is a schematic diagram of the hardware structure of a terminal device that implements the embodiments of the present invention;
Fig. 2 is an architecture diagram of a communication network system provided by an embodiment of the present invention;
Fig. 3 is a program module diagram of an embodiment of the voiceprint registration program based on a voiceprint recognition model according to the present invention;
Fig. 4 is a flow chart of an embodiment of the voiceprint registration method based on a voiceprint recognition model according to the present invention;
The realization of the object, the functional features and the advantages of the present invention will be further described in connection with the embodiments and with reference to the accompanying drawings.
Detailed description
It should be understood that the specific embodiments described here serve only to explain the present invention and are not intended to limit it.
In the following description, suffixes such as "module", "component" or "unit" used to denote elements serve only to facilitate the description of the invention and have no specific meaning in themselves. "Module", "component" and "unit" may therefore be used interchangeably.
The terminal device can be implemented in various forms. For example, the terminal device described in the present invention may include mobile terminals such as mobile phones, tablet computers, notebook computers, palmtop computers, personal digital assistants (Personal Digital Assistant, PDA), portable media players (Portable Media Player, PMP), navigation devices, wearable devices, smart bracelets and pedometers, as well as fixed terminals such as digital TVs and desktop computers.
The following description takes a mobile terminal as an example. Those skilled in the art will understand that, apart from elements used specifically for mobile purposes, the construction according to the embodiments of the present invention can also be applied to fixed terminals.
Referring to Fig. 1, which is a schematic diagram of the hardware structure of a terminal device that implements the embodiments of the present invention, the terminal device 100 may include components such as an RF (Radio Frequency) unit 101, a WiFi module 102, an audio output unit 103, an A/V (audio/video) input unit 104, a sensor 105, a display unit 106, a user input unit 107, an interface unit 108, a memory 109, a processor 110 and a power supply 111. Those skilled in the art will understand that the terminal device structure shown in Fig. 1 does not limit the terminal device: a terminal device may include more or fewer components than illustrated, combine certain components, or arrange the components differently.
To make the embodiments of the present invention easier to understand, the communication network system on which the terminal device of the present invention is based is described below.
Referring to Fig. 2, which is an architecture diagram of a communication network system provided by an embodiment of the present invention, the communication network system is an LTE system of universal mobile communication technology. The LTE system includes, communicatively connected in sequence, a UE (User Equipment) 201, an E-UTRAN (Evolved UMTS Terrestrial Radio Access Network) 202, an EPC (Evolved Packet Core) 203 and the operator's IP services 204.
Specifically, the UE 201 may be the terminal device 100 described above, which is not described again here.
Given the above description, those skilled in the art should understand that the present invention is applicable not only to LTE systems but also to other wireless communication systems, such as GSM, CDMA2000, WCDMA, TD-SCDMA and future new network systems. Likewise, the terminal device 100 may be a mobile phone, a personal electronic assistant, or any other electronic device that collects and processes speech; no limitation is imposed here.
The embodiments of the method of the present invention are proposed on the basis of the hardware structure of the terminal device 100 and the communication network system described above.
First, the present invention proposes a voiceprint registration program 300 based on a voiceprint recognition model.
Fig. 3 is the program module diagram of a first embodiment of the voiceprint registration program 300 based on a voiceprint recognition model according to the present invention.
In this embodiment, the voiceprint registration program 300 based on a voiceprint recognition model comprises a series of computer program instructions stored in the memory 109. When these instructions are executed by the processor 110, they implement the voiceprint registration operations based on a voiceprint recognition model of the embodiments of the present invention. In some embodiments, the voiceprint registration program 300 can be divided into one or more modules according to the specific operations that the individual parts of the instructions implement. For example, in Fig. 3 the voiceprint registration program 300 is divided into an acquisition module 301, a segmentation module 302, a computing module 303, a judgment module 304 and a registration module 305, where:
The acquisition module 301 obtains the valid speech from a user's registration recording. In this embodiment, the voiceprint registration program 300 is stored in the terminal device 100, which may be any terminal with a speech recognition function, such as a mobile phone, a portable computer, a personal digital assistant, a bank payment terminal or an access control device. Such devices can use speech recognition technology to implement particular functions and applications. When the terminal device 100 obtains the valid speech for a user's voice registration, acquisition can start when the user taps to begin voice input and continue until the user stops voice input. This avoids some unnecessary noise interference and improves the purity of the speech sample to be processed.
The segmentation module 302 divides the valid speech obtained by the acquisition module 301 evenly into an integer number of parts. Specifically, the segmentation module 302 cuts the user's valid speech evenly by valid frames into an integer number of segments to obtain the parts. In this embodiment, the segmentation module 302 preferably divides the obtained valid speech evenly into 3 parts. In a concrete implementation, if the user registers a text-dependent voiceprint (for example, a spoken passphrase), the text must be repeated three times for registration, and these 3 utterances serve as the 3 parts. When registration is text-independent, the speech can simply be cut evenly by valid frames into three segments. The present invention is of course not limited to dividing the valid speech evenly into 3 parts; a skilled person can choose the division according to actual needs.
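The text-independent case above can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `split_evenly` is hypothetical, the frames are assumed to have already passed some voice activity check, and dropping leftover trailing frames is an assumption made here so that every part has the same length.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def split_evenly(valid_frames: Sequence[T], parts: int = 3) -> List[List[T]]:
    """Cut a sequence of valid frames into `parts` segments of equal length."""
    if parts <= 0:
        raise ValueError("parts must be a positive integer")
    seg_len = len(valid_frames) // parts
    if seg_len == 0:
        raise ValueError("not enough valid frames to fill every part")
    # Assumption: trailing frames that do not fit are dropped so that
    # all parts contain exactly the same number of frames.
    return [
        list(valid_frames[i * seg_len:(i + 1) * seg_len])
        for i in range(parts)
    ]
```

With ten frames and the default of three parts, each part receives three frames and the last frame is discarded.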
The computing module 303 computes a speech feature vector for each of the parts produced by the segmentation module 302. In this embodiment, the computing module 303 computes the speech feature vector of each part as follows:
The Mel-frequency cepstral coefficient (MFCC) method is used to extract the MFCC features of every speech frame in each part and assemble them into a matrix, and a universal background model (UBM) together with a speech feature vector (i-vector) extractor is used to select the most essential features in the matrix, which form the speech feature vector.
Here MFCC is the abbreviation of Mel-Frequency Cepstral Coefficients. The method comprises two key steps: conversion to the mel frequency scale, followed by cepstral analysis. The mel scale is a non-linear frequency scale based on how the human ear judges equal steps in pitch. Its relation to frequency in hertz is m = 2595 \log_{10}(1 + f/700), where m is the mel-scale value and f is the frequency of the sound in hertz. In this embodiment, each part is fed into a mel-scale filter bank, which cuts the signal of each part into bands of equal width on the mel scale. This yields multiple frequency bands, and the values of the corresponding bands form a matrix.
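The mel-scale relation given above can be checked numerically with a short sketch. The function names `hz_to_mel` and `mel_to_hz` are illustrative; the inverse mapping is not stated in the text and is obtained here simply by solving the formula for f.

```python
import math


def hz_to_mel(f_hz: float) -> float:
    """Mel value for a frequency in hertz: m = 2595 * log10(1 + f / 700)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(m: float) -> float:
    """Inverse of hz_to_mel, derived by solving the formula for f."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)
```

The constants place 1000 Hz at roughly 1000 mel, which is why band edges spaced evenly in mel become progressively wider in hertz at higher frequencies.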
A common UBM (Universal Background Model) is the Gaussian mixture model (GMM). A Gaussian mixture model is defined as:
p(x) = \sum_{k=1}^{K} \pi_k \, p(x \mid k)
where K is the number of component models; \pi_k is the weight of the k-th Gaussian; and p(x \mid k) is the probability density of the k-th Gaussian, with mean \mu_k and variance \sigma_k. When the above matrix is substituted into this Gaussian model, K equals the number of nodes in the matrix, \pi_k is the weight of the k-th node (that is, the number of times the k-th node occurs), \mu_k and \sigma_k are the mean and variance of all nodes in the matrix, and p(x \mid k) is the probability of the k-th node. These values follow readily from the formula, and the extractor selects the node with the largest p(x \mid k) in the matrix as the speech feature vector of each part.
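The mixture density described above can be evaluated with a small sketch. It is deliberately simplified and rests on stated assumptions: it handles scalar observations only, whereas a practical UBM is a multivariate mixture trained on MFCC frames, and it does not implement i-vector extraction. The names `gaussian_pdf` and `gmm_density` are illustrative.

```python
import math
from typing import Sequence


def gaussian_pdf(x: float, mean: float, variance: float) -> float:
    """Density of a one-dimensional Gaussian with the given mean and variance."""
    norm = math.sqrt(2.0 * math.pi * variance)
    return math.exp(-((x - mean) ** 2) / (2.0 * variance)) / norm


def gmm_density(
    x: float,
    weights: Sequence[float],
    means: Sequence[float],
    variances: Sequence[float],
) -> float:
    """Weighted sum of K Gaussian component densities evaluated at x."""
    if not (len(weights) == len(means) == len(variances)):
        raise ValueError("weights, means and variances must have equal length")
    return sum(
        w * gaussian_pdf(x, m, v)
        for w, m, v in zip(weights, means, variances)
    )
```

For the result to be a proper probability density, the weights should be non-negative and sum to one; the sketch does not enforce this.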
Further, after the computing module 303 has computed the speech feature vector of each part, the judgment module 304 judges from these speech feature vectors whether every part belongs to the same user.
Specifically, the judgment module 304 judges whether every part belongs to the same user as follows:
The judgment module 304 performs a computation on the speech feature vectors of the parts, namely pairwise scoring, and judges from the scores whether every part belongs to the same user. In this embodiment, for example, the judgment module 304 scores the speech feature vectors of the parts pairwise with the PLDA algorithm. Specifically, the speech feature vectors obtained from the matrix contain only the corresponding frequency values. The PLDA model applies a simulated channel compensation to these values so that the speech feature vectors carry channel attributes (such as frequency bandwidth). Scoring is then done by log-likelihood ratio: the logarithm of the frequency/channel attribute values of each speech feature vector is taken, and the resulting values are compared. For example, given two test utterances, the log-likelihood ratio is computed and the difference between the logarithms obtained from the two speech feature vectors is compared with a preset value. If the difference is less than the preset value, the two speech feature vectors are judged similar, that is, the two parts may come from the same user.
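The pairwise comparison above can be sketched as follows. Note the substitution: a trained PLDA model is out of scope for a short example, so a plain Euclidean distance stands in for the PLDA log-likelihood-ratio difference. Both function names are hypothetical, and any other difference function with the same signature could be passed in instead.

```python
import math
from itertools import combinations
from typing import Callable, Dict, Sequence, Tuple

Vector = Sequence[float]


def euclidean_difference(a: Vector, b: Vector) -> float:
    """Stand-in for a PLDA-based difference; smaller means more similar."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def pairwise_differences(
    vectors: Sequence[Vector],
    diff_fn: Callable[[Vector, Vector], float] = euclidean_difference,
) -> Dict[Tuple[int, int], float]:
    """Difference for every unordered pair of parts, keyed by index pair."""
    return {
        (i, j): diff_fn(vectors[i], vectors[j])
        for i, j in combinations(range(len(vectors)), 2)
    }
```

Three parts give exactly three pairs, (0, 1), (0, 2) and (1, 2), which matches the three comparisons mentioned in the embodiment.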
Further, to exclude the influence of noise, when judging whether every part belongs to the same user the judgment module 304 also checks whether the score difference of each comparison is below the preset value. If the score difference of every comparison is below the preset value, every part is judged to belong to the same user.
In this embodiment, the segmentation module 302 divides the valid speech obtained by the acquisition module 301 evenly into 3 parts, and the judgment module 304 compares the feature vectors (i-vectors) of the three segments pairwise. Two i-vectors can be scored with a variety of algorithms; the PLDA algorithm is the usual choice. When the score difference is less than the set threshold, the two i-vectors are judged to come from the same voice. If the score differences of all three pairwise comparisons are below the threshold, the voice quality is judged acceptable and voiceprint registration can proceed normally. If any one score difference is above the threshold, the recording is judged too noisy and of poor quality, and voiceprint registration is not allowed.
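The acceptance rule described above reduces to a single check over the pairwise score differences. A minimal sketch, assuming the differences have already been computed by some scorer; the function name `registration_allowed` and the numeric threshold in the usage note are illustrative only, since the text does not give a concrete preset value.

```python
from typing import Iterable


def registration_allowed(pair_differences: Iterable[float], threshold: float) -> bool:
    """Allow registration only if every pairwise difference is below the threshold."""
    diffs = list(pair_differences)
    if not diffs:
        raise ValueError("at least one pairwise comparison is required")
    # A single comparison at or above the threshold rejects the recording.
    return all(d < threshold for d in diffs)
```

For instance, `registration_allowed([0.1, 0.2, 0.15], 0.5)` accepts the recording, while replacing any one value with 0.9 rejects it.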
The registration module 305 performs voice registration with the valid speech when every part belongs to the same user. In the voice registration phase, the registration module 305 stores the voiceprint model on the terminal device 100 for later use, which completes voice registration. In this embodiment, if the above result does not establish that every part belongs to the same user, voice registration is not allowed.
Through the program modules 301-305 above, the voiceprint registration program 300 based on a voiceprint recognition model proposed by the present invention first obtains the valid speech from a user's registration recording; second, divides the valid speech evenly into an integer number of parts; then computes a speech feature vector for each part; then judges from the speech feature vectors whether every part belongs to the same user; and finally performs voice registration with the valid speech only if every part belongs to the same user. This overcomes the drawback of existing noise detection methods, which can detect only environmental noise and cannot tell whether several people's voices have entered the recording, so that the quality of the registered speech still suffers. The program can therefore detect whether a user is attempting voiceprint registration in an environment with excessive noise or with several people speaking, keeps unsuitable recordings out of the voiceprint database (speech database), safeguards the quality of the registered speech, prevents low-quality speech from affecting subsequent verification, and improves the usability of voiceprint recognition.
In addition, the present invention also proposes a voiceprint registration method based on a voiceprint recognition model.
Fig. 4 is a schematic flow chart of a first embodiment of the voiceprint registration method based on a voiceprint recognition model according to the present invention. In this embodiment, the order in which the steps of the flow chart in Fig. 4 are executed can be changed, and certain steps can be omitted, according to different requirements.
Step S401: obtain the valid speech from a user's registration recording. In this embodiment, the voiceprint registration method based on a voiceprint recognition model is applied to the terminal device 100, which may be any terminal with a speech recognition function, such as a mobile phone, a portable computer, a personal digital assistant, a bank payment terminal or an access control device. Such devices can use speech recognition technology to implement particular functions and applications. When the terminal device 100 obtains the valid speech for a user's voice registration, acquisition can start when the user taps to begin voice input and continue until the user stops voice input. This avoids some unnecessary noise interference and improves the purity of the speech sample to be processed.
Step S402: divide the valid speech evenly into an integer number of parts. Specifically, the terminal device 100 cuts the user's valid speech evenly by valid frames into an integer number of segments to obtain the parts. In this embodiment, the terminal device 100 preferably divides the obtained valid speech evenly into 3 parts. In a concrete implementation, if the user registers a text-dependent voiceprint (for example, a spoken passphrase), the text must be repeated three times for registration, and these 3 utterances serve as the 3 parts. When registration is text-independent, the speech can simply be cut evenly by valid frames into three segments. The present invention is of course not limited to dividing the valid speech evenly into 3 parts; a skilled person can choose the division according to actual needs.
Step S403: compute a speech feature vector for each of the parts.
In this embodiment, the terminal device 100 computes the speech feature vector of each part as follows:
The Mel-frequency cepstral coefficient (MFCC) method is used to extract the MFCC features of every speech frame in each part and assemble them into a matrix, and a universal background model (UBM) together with a speech feature vector (i-vector) extractor is used to select the most essential features in the matrix, which form the speech feature vector.
Here MFCC is the abbreviation of Mel-Frequency Cepstral Coefficients. The method comprises two key steps: conversion to the mel frequency scale, followed by cepstral analysis. The mel scale is a non-linear frequency scale based on how the human ear judges equal steps in pitch. Its relation to frequency in hertz is m = 2595 \log_{10}(1 + f/700), where m is the mel-scale value and f is the frequency of the sound in hertz. In this embodiment, each part is fed into a mel-scale filter bank, which cuts the signal of each part into bands of equal width on the mel scale. This yields multiple frequency bands, and the values of the corresponding bands form a matrix.
A common UBM (Universal Background Model) is the Gaussian mixture model (GMM). A Gaussian mixture model is defined as:
p(x) = \sum_{k=1}^{K} \pi_k \, p(x \mid k)
where K is the number of component models; \pi_k is the weight of the k-th Gaussian; and p(x \mid k) is the probability density of the k-th Gaussian, with mean \mu_k and variance \sigma_k. When the above matrix is substituted into this Gaussian model, K equals the number of nodes in the matrix, \pi_k is the weight of the k-th node (that is, the number of times the k-th node occurs), \mu_k and \sigma_k are the mean and variance of all nodes in the matrix, and p(x \mid k) is the probability of the k-th node. These values follow readily from the formula, and the extractor selects the node with the largest p(x \mid k) in the matrix as the speech feature vector of each part.
Step S404: judge from the speech feature vectors of the parts whether every part belongs to the same user. If every part belongs to the same user, step S405 is performed; otherwise the flow ends.
Specifically, the terminal device 100 judges whether every part belongs to the same user as follows:
The terminal device 100 scores the voice feature vectors of the portions of voice by pairwise
comparison and judges from the scoring results whether every portion of voice belongs to the same
user. For example, in this embodiment, the terminal device 100 uses the PLDA algorithm to score the
voice feature vectors of the portions pairwise. Specifically, the voice feature vector obtained
from the matrix above contains only the corresponding frequency values; the PLDA model applies
simulated channel compensation to those values, so that the voice feature vector carries channel
attributes (such as frequency bandwidth). Scoring is then done by log-likelihood ratio: the
logarithm of the frequency/channel attribute values of each voice feature vector is taken, and the
resulting values are compared. For example, given two test voices, the log-likelihood ratio is
computed and the difference between the logarithmic values of the two voice feature vectors is
compared with a preset value; if the difference is less than the preset value, the two voice
feature vectors are judged to be similar, i.e. the two portions of voice may come from the same user.
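For orientation, a log-likelihood-ratio score between two vectors can be sketched with a simplified
two-covariance PLDA model. This is an illustrative assumption, not the model used in this
embodiment: the between-speaker and within-speaker covariances (`between`, `within`) are toy
values, the vectors are assumed to be zero-mean, and the function names are hypothetical.

```python
import numpy as np

def gaussian_logpdf(x, cov):
    """Log density of a zero-mean multivariate Gaussian with covariance cov."""
    _, logdet = np.linalg.slogdet(cov)
    quad = x @ np.linalg.solve(cov, x)
    return -0.5 * (x.shape[0] * np.log(2.0 * np.pi) + logdet + quad)

def plda_llr(x1, x2, between, within):
    """Log-likelihood ratio: same-speaker hypothesis versus different-speaker hypothesis."""
    x1 = np.asarray(x1, dtype=float)
    x2 = np.asarray(x2, dtype=float)
    total = between + within
    pair = np.concatenate([x1, x2])
    zeros = np.zeros_like(total)
    # Same speaker: the two vectors share the speaker component, so they are correlated.
    cov_same = np.block([[total, between], [between, total]])
    # Different speakers: the two vectors are independent.
    cov_diff = np.block([[total, zeros], [zeros, total]])
    return gaussian_logpdf(pair, cov_same) - gaussian_logpdf(pair, cov_diff)

between = np.eye(2)        # assumed between-speaker covariance
within = 0.1 * np.eye(2)   # assumed within-speaker (channel/noise) covariance

llr_similar = plda_llr([1.0, 1.0], [1.0, 1.0], between, within)
llr_dissimilar = plda_llr([1.0, 1.0], [-1.0, -1.0], between, within)
```

A positive ratio favours the same-speaker hypothesis and a negative ratio favours different
speakers; a deployed system would estimate both covariances from training data.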
Further, in order to exclude the influence of noise, when judging whether every portion of voice
belongs to the same user, the terminal device 100 also judges whether the score difference of each
comparison is less than the preset value; if the score difference of every comparison is below the
preset value, every portion of voice is judged to belong to the same user.
In this embodiment, the terminal device 100 divides the acquired valid voice evenly into 3 parts
and then compares the feature vectors (i-vectors) of the three segments pairwise. Several
algorithms can be used to score a pair of i-vectors; the PLDA algorithm is the usual choice, and
when the score exceeds the set threshold the two i-vectors are judged to come from the same voice.
In the pairwise comparison, if the score differences of all three comparisons are below the
threshold, the voice quality is judged to be acceptable and voiceprint registration can proceed
normally; if the score difference of any comparison is higher than the threshold, the voice is
judged to be too noisy and of poor quality, and voiceprint registration is not allowed.
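The three-way pairwise decision can be sketched as follows. This is a minimal illustration under
stated assumptions: cosine distance stands in for the PLDA score difference, and the names
`cosine_distance` and `parts_from_same_speaker`, the preset value 0.1 and the sample vectors are
all hypothetical.

```python
from itertools import combinations

import numpy as np

def cosine_distance(a, b):
    """Stand-in for the score difference between two feature vectors."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return 1.0 - float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def parts_from_same_speaker(vectors, preset, distance=cosine_distance):
    """Accept only if every pairwise comparison stays below the preset value."""
    return all(distance(a, b) < preset for a, b in combinations(vectors, 2))

# Three feature vectors, one per portion of the valid voice (toy values).
single_speaker = [[1.0, 0.0, 0.1], [0.9, 0.1, 0.0], [1.0, 0.05, 0.05]]
with_intruder = [[1.0, 0.0, 0.1], [0.9, 0.1, 0.0], [0.0, 1.0, 0.0]]

accept_single = parts_from_same_speaker(single_speaker, preset=0.1)
accept_intruder = parts_from_same_speaker(with_intruder, preset=0.1)
```

With three portions there are exactly three pairs to compare, and a single failing pair is enough
to reject the recording.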
Step S405: perform voice registration on the valid voice. In the voice registration phase, the
terminal device 100 stores the voiceprint model for later use, thereby completing voice
registration. When the above judgment result is that the voice is not valid, registration is not
performed.
Through the above steps S401-S405, the voiceprint registration method based on a voiceprint
recognition model proposed by the present invention first obtains the valid voice from the user's
registration voice; second, divides the valid voice evenly into an integer number of portions;
then calculates the voice feature vector of each portion; next, judges from the voice feature
vector of each portion whether every portion belongs to the same user; and finally, if every
portion belongs to the same user, performs voice registration on the valid voice. This overcomes
the drawback that existing noise detection methods can only detect ambient noise and cannot
recognize whether several people have been recorded, which still degrades the quality of the
registration voice. The method can therefore detect whether the user is registering a voiceprint
in an environment that is too noisy or in which several people are speaking, keeps unsatisfactory
recordings out of the voiceprint library (voice library), ensures the quality of the registration
voice, prevents low-quality voice from affecting subsequent verification, and improves the
usability of voiceprint recognition.
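The overall flow of dividing, extracting, checking and registering can be outlined in a few lines.
The sketch below is an assumption-laden outline rather than the claimed implementation: the
extractor and the same-user check are passed in as callables, and the stand-ins used here (a
per-portion mean and a Euclidean distance bound) only make the example runnable.

```python
import numpy as np

def register_voiceprint(frames, n_parts, extract, same_speaker, library):
    """Split valid frames, check that all portions agree, and register only if they do."""
    frames = np.asarray(frames, dtype=float)
    parts = np.array_split(frames, n_parts)      # near-equal split along the frame axis
    vectors = [extract(part) for part in parts]  # one feature vector per portion
    if not same_speaker(vectors):
        return False                             # reject: noisy or multi-speaker recording
    library.append(extract(frames))              # store the voiceprint for later use
    return True

def mean_vector(part):
    """Hypothetical stand-in for the feature vector extractor."""
    return part.mean(axis=0)

def close_together(vectors, preset=1.0):
    """Hypothetical stand-in for the pairwise scoring step."""
    return all(np.linalg.norm(a - b) < preset for a in vectors for b in vectors)

library = []
clean = np.ones((9, 4))                                        # consistent frames
mixed = np.vstack([np.ones((6, 4)), 10.0 * np.ones((3, 4))])   # last third differs

registered_clean = register_voiceprint(clean, 3, mean_vector, close_together, library)
registered_mixed = register_voiceprint(mixed, 3, mean_vector, close_together, library)
```

Injecting the extractor and the check keeps the control flow independent of the particular
feature and scoring models.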
The present invention also provides another embodiment, namely a storage medium storing a
voiceprint registration program based on a voiceprint recognition model, the program being
executable by at least one processor so that the at least one processor performs the steps of the
voiceprint registration method based on a voiceprint recognition model described above.
The above embodiments of the present invention are for description only and do not indicate the
relative merits of the embodiments.
From the above description of the embodiments, those skilled in the art will clearly understand
that the methods of the above embodiments can be implemented by software plus the necessary
general-purpose hardware platform, or of course by hardware, although in many cases the former is
the better implementation. Based on this understanding, the technical solution of the present
invention, in essence or in the part that contributes over the prior art, can be embodied in the
form of a software product. The computer software product is stored in a storage medium (such as
ROM/RAM, a magnetic disk, or an optical disc) and includes several instructions that cause a
terminal device (which may be a mobile phone, computer, server, air conditioner, network device,
etc.) to perform the methods described in the embodiments of the present invention.
The above are only preferred embodiments of the present invention and do not limit its patent
scope. Any equivalent structure or equivalent process transformation made using the contents of
the specification and drawings of the present invention, or any direct or indirect application in
other related technical fields, is likewise included within the scope of patent protection of the
present invention.
Claims (10)
1. A voiceprint registration method based on a voiceprint recognition model, applied to a terminal
device, characterized in that the method comprises:
obtaining the valid voice from a user's registration voice;
dividing the valid voice evenly into an integer number of portions of voice;
calculating the voice feature vector of each portion of voice respectively;
judging, according to the calculation result for the voice feature vector of each portion of
voice, whether every portion of voice belongs to the same user; and
if every portion of voice belongs to the same user, performing voice registration on the valid
voice.
2. The voiceprint registration method based on a voiceprint recognition model according to claim 1,
characterized in that the step of dividing the valid voice evenly into an integer number of
portions of voice comprises:
when the valid voice is non-text voice, cutting the user's valid voice evenly by valid frames into
an integer number of segments to obtain the integer number of portions of voice.
3. The voiceprint registration method based on a voiceprint recognition model according to claim 1,
characterized in that the step of calculating the voice feature vector of each portion of voice
respectively comprises:
extracting, using the MFCC method, the MFCC features of each frame of voice in each portion of
voice and assembling them into a matrix; and
selecting the most essential features in the matrix using a UBM and a voice feature vector
extractor to form the voice feature vector.
4. The voiceprint registration method based on a voiceprint recognition model according to claim 1,
characterized in that the step of judging, according to the calculation result for the voice
feature vector of each portion of voice, whether every portion of voice belongs to the same user
comprises:
scoring the voice feature vectors of the portions of voice by pairwise comparison; and
if the difference after pairwise scoring is less than a preset value, judging that every portion
of voice belongs to the same user.
5. The voiceprint registration method based on a voiceprint recognition model according to claim 4,
characterized in that the step of judging, according to the calculation result for the voice
feature vector of each portion of voice, whether every portion of voice belongs to the same user
further comprises:
scoring the voice feature vectors of the portions of voice by pairwise comparison using the PLDA
algorithm; and
if the score difference of every comparison is below the preset value, judging that every portion
of voice belongs to the same user.
6. A terminal device, characterized in that the terminal device comprises a memory and a
processor, the memory storing a voiceprint registration program based on a voiceprint recognition
model that can run on the processor, the program implementing the following steps when executed by
the processor:
obtaining the valid voice from a user's registration voice;
dividing the valid voice evenly into an integer number of portions of voice;
calculating the voice feature vector of each portion of voice respectively;
judging, according to the calculation result for the voice feature vector of each portion of
voice, whether every portion of voice belongs to the same user; and
if every portion of voice belongs to the same user, performing voice registration on the valid
voice.
7. The terminal device according to claim 6, characterized in that the step of calculating the
voice feature vector of each portion of voice respectively comprises:
extracting, using the MFCC method, the MFCC features of each frame of voice in each portion of
voice and assembling them into a matrix; and
selecting the most essential features in the matrix using a UBM and a voice feature vector
extractor to form the voice feature vector.
8. The terminal device according to claim 6, characterized in that the step of judging, according
to the calculation result for the voice feature vector of each portion of voice, whether every
portion of voice belongs to the same user comprises:
scoring the voice feature vectors of the portions of voice by pairwise comparison; and
if the difference after pairwise scoring is less than a preset value, judging that every portion
of voice belongs to the same user.
9. The terminal device according to claim 8, characterized in that the step of judging, according
to the calculation result for the voice feature vector of each portion of voice, whether every
portion of voice belongs to the same user further comprises:
scoring the voice feature vectors of the portions of voice by pairwise comparison using the PLDA
algorithm; and
if the score difference of every comparison is below the preset value, judging that every portion
of voice belongs to the same user.
10. A storage medium, wherein the computer-readable storage medium stores a voiceprint
registration program based on a voiceprint recognition model, the program being executable by at
least one processor so that the at least one processor performs the steps of the voiceprint
registration method based on a voiceprint recognition model according to any one of claims 1-5.
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810003939.1A CN108172230A (en) | 2018-01-03 | 2018-01-03 | Voiceprint registration method, terminal installation and storage medium based on Application on Voiceprint Recognition model |
PCT/CN2018/077670 WO2019134247A1 (en) | 2018-01-03 | 2018-02-28 | Voiceprint registration method based on voiceprint recognition model, terminal device, and storage medium |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810003939.1A CN108172230A (en) | 2018-01-03 | 2018-01-03 | Voiceprint registration method, terminal installation and storage medium based on Application on Voiceprint Recognition model |
Publications (1)
Publication Number | Publication Date |
---|---|
CN108172230A (en) | 2018-06-15 |
Family
ID=62517236
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810003939.1A Pending CN108172230A (en) | 2018-01-03 | 2018-01-03 | Voiceprint registration method, terminal installation and storage medium based on Application on Voiceprint Recognition model |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN108172230A (en) |
WO (1) | WO2019134247A1 (en) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111048072A (en) * | 2019-11-21 | 2020-04-21 | 中国南方电网有限责任公司 | Voiceprint recognition method applied to power enterprises |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102404287A (en) * | 2010-09-14 | 2012-04-04 | 盛乐信息技术(上海)有限公司 | Voiceprint identification system and method for determining voiceprint authentication threshold value through data multiplexing method |
CN104184587A (en) * | 2014-08-08 | 2014-12-03 | 腾讯科技(深圳)有限公司 | Voiceprint generation method, voiceprint generation server, client and voiceprint generation system |
CN104219050A (en) * | 2014-08-08 | 2014-12-17 | 腾讯科技(深圳)有限公司 | Voiceprint verification method and system, voiceprint verification server and voiceprint verification client side |
CN106782564A (en) * | 2016-11-18 | 2017-05-31 | 百度在线网络技术(北京)有限公司 | Method and apparatus for processing speech data |
CN107068154A (en) * | 2017-03-13 | 2017-08-18 | 平安科技(深圳)有限公司 | The method and system of authentication based on Application on Voiceprint Recognition |
CN107105343A (en) * | 2017-04-24 | 2017-08-29 | 深圳市茁壮网络股份有限公司 | A kind of authentication method of user, apparatus and system |
Family Cites Families (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US10476872B2 (en) * | 2015-02-20 | 2019-11-12 | Sri International | Joint speaker authentication and key phrase identification |
CN106601258A (en) * | 2016-12-12 | 2017-04-26 | 广东顺德中山大学卡内基梅隆大学国际联合研究院 | Speaker identification method capable of information channel compensation based on improved LSDA algorithm |
CN107146601B (en) * | 2017-04-07 | 2020-07-24 | 南京邮电大学 | Rear-end i-vector enhancement method for speaker recognition system |
CN107342077A (en) * | 2017-05-27 | 2017-11-10 | 国家计算机网络与信息安全管理中心 | A kind of speaker segmentation clustering method and system based on factorial analysis |
Cited By (14)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108962284A (en) * | 2018-07-04 | 2018-12-07 | 科大讯飞股份有限公司 | A kind of voice recording method and device |
CN108962284B (en) * | 2018-07-04 | 2021-06-08 | 科大讯飞股份有限公司 | Voice recording method and device |
CN109051405A (en) * | 2018-08-31 | 2018-12-21 | 深圳市研本品牌设计有限公司 | A kind of intelligent dustbin and storage medium |
CN109378002A (en) * | 2018-10-11 | 2019-02-22 | 平安科技(深圳)有限公司 | Method, apparatus, computer equipment and the storage medium of voice print verification |
CN110099047A (en) * | 2019-04-12 | 2019-08-06 | 平安科技(深圳)有限公司 | Registration information processing method, device, computer equipment and storage medium |
CN110099047B (en) * | 2019-04-12 | 2021-09-07 | 平安科技(深圳)有限公司 | Registration information processing method and device, computer equipment and storage medium |
CN110688640A (en) * | 2019-09-03 | 2020-01-14 | 深圳市声扬科技有限公司 | Data processing method, device and system based on voiceprint recognition and server |
CN110827834A (en) * | 2019-11-11 | 2020-02-21 | 广州国音智能科技有限公司 | Voiceprint registration method, system and computer readable storage medium |
CN110827834B (en) * | 2019-11-11 | 2022-07-12 | 广州国音智能科技有限公司 | Voiceprint registration method, system and computer readable storage medium |
CN113177816A (en) * | 2020-01-08 | 2021-07-27 | 阿里巴巴集团控股有限公司 | Information processing method and device |
WO2021232213A1 (en) * | 2020-05-19 | 2021-11-25 | 华为技术有限公司 | Voiceprint recognition apparatus, voiceprint registration apparatus and cross-device voiceprint recognition method |
CN111916074A (en) * | 2020-06-29 | 2020-11-10 | 厦门快商通科技股份有限公司 | Cross-device voice control method, system, terminal and storage medium |
WO2022077918A1 (en) * | 2020-10-12 | 2022-04-21 | 北京捷通华声科技股份有限公司 | Method for detecting validity of registered audio, detection apparatus, and electronic device |
CN112908310A (en) * | 2021-01-20 | 2021-06-04 | 宁波方太厨具有限公司 | Voice instruction recognition method and system in intelligent electric appliance |
Also Published As
Publication number | Publication date |
---|---|
WO2019134247A1 (en) | 2019-07-11 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | Application publication date: 20180615 |