CN108172230A - Voiceprint registration method, terminal device and storage medium based on a voiceprint recognition model - Google Patents

Voiceprint registration method, terminal device and storage medium based on a voiceprint recognition model

Info

Publication number
CN108172230A
Authority
CN
China
Prior art keywords
voice
per
vector
voiceprint
characteristic
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201810003939.1A
Other languages
Chinese (zh)
Inventor
王健宗
郑斯奇
于夕畔
肖京
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ping An Technology Shenzhen Co Ltd
Original Assignee
Ping An Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ping An Technology Shenzhen Co Ltd
Priority to CN201810003939.1A (CN108172230A)
Priority to PCT/CN2018/077670 (WO2019134247A1)
Publication of CN108172230A
Legal status: Pending

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L17/00 Speaker identification or verification
    • G10L17/02 Preprocessing operations, e.g. segment selection; Pattern representation or modelling, e.g. based on linear discriminant analysis [LDA] or principal components; Feature selection or extraction
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L17/00 Speaker identification or verification
    • G10L17/06 Decision making techniques; Pattern matching strategies
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L17/00 Speaker identification or verification
    • G10L17/22 Interactive procedures; Man-machine interfaces
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/03 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
    • G10L25/24 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters, the extracted parameters being the cepstrum

Abstract

The invention discloses a voiceprint registration method based on a voiceprint recognition model, applied to a terminal device. The method includes: obtaining the valid speech captured while a user records registration speech; dividing the valid speech evenly into an integer number of speech segments; computing the speech feature vector of each segment; determining, from the computed speech feature vectors, whether all segments belong to the same user; and, if all segments belong to the same user, performing voice registration with the valid speech. The invention also provides a terminal device and a storage medium. The invention can detect whether a user is attempting voiceprint registration in an environment with excessive noise or with several people speaking, keeps unqualified recordings out of the voiceprint library (or speech library), safeguards the quality of registered speech, prevents low-quality speech from affecting subsequent verification, and improves the usability of voiceprint recognition.

Description

Voiceprint registration method, terminal device and storage medium based on a voiceprint recognition model
Technical field
The present invention relates to the field of communication technology, and in particular to a voiceprint registration method based on a voiceprint recognition model, a terminal device, and a storage medium.
Background
With the continuous development of speech recognition technology, more and more applications support speech recognition, such as voice unlocking and voice payment. A very important step in such applications is voiceprint registration. If there is heavy environmental noise during registration, or several people speak while the recording is made, the quality of the registered speech inevitably suffers; low-quality speech affects subsequent verification and, in turn, the usability of voiceprint recognition. The current approach is mainly noise monitoring: conventional noise detection checks the environment before voiceprint recognition is carried out, recording and analyzing only the ambient sound before the user's speech is recorded. If the ambient sound level in decibels is too high, the environment is judged too noisy; if it is below a set threshold, noise detection passes and the user can register a voiceprint normally. However, this noise detection method can only detect environmental noise. It cannot identify whether several people are being recorded, so the quality of the registered speech is still affected.
Summary of the invention
In view of this, the present invention proposes a voiceprint registration method based on a voiceprint recognition model, a terminal device, and a storage medium. By implementing them, it is possible to detect whether a user is attempting voiceprint registration in an environment with excessive noise or with several people speaking, keep unqualified recordings out of the voiceprint library (speech library), safeguard the quality of registered speech, prevent low-quality speech from affecting subsequent verification, and improve the usability of voiceprint recognition.
First, to achieve the above object, the present invention proposes a terminal device. The terminal device includes a memory and a processor; the memory stores a voiceprint registration program based on a voiceprint recognition model that can run on the processor, and the program, when executed by the processor, implements the following steps:
obtaining the valid speech captured while a user records registration speech;
dividing the valid speech evenly into an integer number of speech segments;
computing the speech feature vector of each of the segments;
determining, from the computed speech feature vector of each segment, whether all segments belong to the same user; and
if all segments belong to the same user, performing voice registration with the valid speech.
Optionally, the step of computing the speech feature vector of each of the segments includes:
extracting the MFCC features of every frame in each segment using the MFCC method and assembling them into a matrix; and
selecting the most essential features in the matrix using a UBM and a speech feature vector extractor to form the speech feature vector.
Optionally, the step of determining, from the computed speech feature vector of each segment, whether all segments belong to the same user includes:
scoring the speech feature vectors of the segments against each other pairwise using the PLDA algorithm; and
if the difference obtained from each pairwise comparison score is less than a preset value, determining that all segments belong to the same user.
In addition, to achieve the above object, the present invention also provides a voiceprint registration method based on a voiceprint recognition model, applied to a terminal device. The method includes:
obtaining the valid speech captured while a user records registration speech;
dividing the valid speech evenly into an integer number of speech segments;
computing the speech feature vector of each of the segments;
determining, from the computed speech feature vector of each segment, whether all segments belong to the same user; and
if all segments belong to the same user, performing voice registration with the valid speech.
Optionally, the step of dividing the valid speech evenly into an integer number of speech segments includes:
when the valid speech is text-independent speech, cutting the user's valid speech evenly by valid frames into an integer number of sections to obtain the speech segments.
Optionally, the step of computing the speech feature vector of each of the segments includes:
extracting the MFCC features of every frame in each segment using the MFCC method and assembling them into a matrix; and
selecting the most essential features in the matrix using a UBM and a speech feature vector extractor to form the speech feature vector.
Optionally, the step of determining, from the computed speech feature vector of each segment, whether all segments belong to the same user includes:
scoring the speech feature vectors of the segments against each other pairwise using the PLDA algorithm; and
if the difference obtained from each pairwise comparison score is less than a preset value, determining that all segments belong to the same user.
Further, to achieve the above object, the present invention also provides a storage medium. The storage medium stores a voiceprint registration program based on a voiceprint recognition model, and the program can be executed by at least one processor so that the at least one processor performs the steps of the voiceprint registration method based on a voiceprint recognition model described above.
Compared with the prior art, the voiceprint registration method based on a voiceprint recognition model, the terminal device, and the storage medium proposed by the present invention first obtain the valid speech captured while a user records registration speech; second, divide the valid speech evenly into an integer number of speech segments; then compute the speech feature vector of each segment; then determine from those speech feature vectors whether all segments belong to the same user; and finally, if all segments belong to the same user, perform voice registration with the valid speech. This overcomes the drawback that existing noise detection methods can only detect environmental noise and cannot identify whether several people are being recorded, which still degrades the quality of registered speech. It is therefore possible to detect whether a user is attempting voiceprint registration in an environment with excessive noise or with several people speaking, keep unqualified recordings out of the voiceprint library (or speech library), safeguard the quality of registered speech, prevent low-quality speech from affecting subsequent verification, and improve the usability of voiceprint recognition.
Description of the drawings
Fig. 1 is a schematic diagram of the hardware structure of a terminal device for implementing the embodiments of the present invention;
Fig. 2 is an architecture diagram of a communications network system provided by an embodiment of the present invention;
Fig. 3 is a program module diagram of an embodiment of the voiceprint registration program based on a voiceprint recognition model of the present invention;
Fig. 4 is a flowchart of an embodiment of the voiceprint registration method based on a voiceprint recognition model of the present invention;
The realization of the object, the functional features, and the advantages of the present invention will be further described with reference to the accompanying drawings in conjunction with the embodiments.
Detailed description
It should be understood that the specific embodiments described here serve only to explain the present invention and are not intended to limit it.
In the following description, suffixes such as "module", "component", or "unit" used to denote elements serve only to facilitate the explanation of the present invention and have no specific meaning in themselves. Therefore, "module", "component", and "unit" may be used interchangeably.
The terminal device can be implemented in various forms. For example, the terminal device described in the present invention may include mobile terminals such as mobile phones, tablet computers, notebook computers, palmtop computers, personal digital assistants (PDA), portable media players (PMP), navigation devices, wearable devices, smart bracelets, and pedometers, as well as fixed terminals such as digital TVs and desktop computers.
The following description takes a mobile terminal as an example. Those skilled in the art will understand that, apart from elements used specifically for mobile purposes, the construction according to the embodiments of the present invention can also be applied to fixed terminals.
Referring to Fig. 1, which is a schematic diagram of the hardware structure of a terminal device for implementing the embodiments of the present invention, the terminal device 100 may include components such as an RF (Radio Frequency) unit 101, a WiFi module 102, an audio output unit 103, an A/V (audio/video) input unit 104, a sensor 105, a display unit 106, a user input unit 107, an interface unit 108, a memory 109, a processor 110, and a power supply 111. Those skilled in the art will understand that the terminal device structure shown in Fig. 1 does not limit the terminal device; the terminal device may include more or fewer components than illustrated, combine certain components, or arrange the components differently.
To facilitate understanding of the embodiments of the present invention, the communications network system on which the terminal device of the present invention is based is described below.
Referring to Fig. 2, which is an architecture diagram of a communications network system provided by an embodiment of the present invention, the communications network system is an LTE system of universal mobile communications technology. The LTE system includes, communicatively connected in sequence, a UE (User Equipment) 201, an E-UTRAN (Evolved UMTS Terrestrial Radio Access Network) 202, an EPC (Evolved Packet Core) 203, and the operator's IP services 204.
Specifically, the UE 201 may be the terminal device 100 described above, and details are not repeated here.
Although the above description takes the LTE system as an example, those skilled in the art should understand that the present invention is applicable not only to the LTE system but also to other wireless communication systems, such as GSM, CDMA2000, WCDMA, TD-SCDMA, and future new network systems. Likewise, the terminal device 100 may be a mobile phone, a personal electronic assistant, or another electronic device that collects and processes speech; no limitation is imposed here.
Based on the hardware structure of the terminal device 100 and the communications network system described above, the embodiments of the method of the present invention are proposed.
First, the present invention proposes a voiceprint registration program 300 based on a voiceprint recognition model.
Fig. 3 is a program module diagram of a first embodiment of the voiceprint registration program 300 based on a voiceprint recognition model of the present invention.
In this embodiment, the voiceprint registration program 300 based on a voiceprint recognition model includes a series of computer program instructions stored in the memory 109; when executed by the processor 110, these instructions implement the voiceprint registration operations based on a voiceprint recognition model of the embodiments of the present invention. In some embodiments, the program 300 may be divided into one or more modules according to the specific operations implemented by each part of the computer program instructions. For example, in Fig. 3, the program 300 may be divided into an acquisition module 301, a segmentation module 302, a computing module 303, a judgment module 304, and a registration module 305, where:
The acquisition module 301 obtains the valid speech captured while a user records registration speech. In this embodiment, the voiceprint registration program 300 is stored in the terminal device 100. The terminal device 100 of this embodiment may be any terminal with a speech recognition function, such as a mobile phone, a portable computer, a personal digital assistant, a bank payment terminal, or an access control device; these devices can implement specific functions and applications through speech recognition technology. When the terminal device 100 obtains the valid speech for voice registration, it may start capturing when the user taps to begin voice input and stop when the user ends voice input. This avoids some unnecessary noise interference and improves the purity of the speech samples to be processed.
The segmentation module 302 divides the valid speech obtained by the acquisition module 301 evenly into an integer number of speech segments. Specifically, the segmentation module 302 cuts the user's valid speech evenly by valid frames into an integer number of sections to obtain the segments. In this embodiment, the segmentation module 302 preferably divides the obtained valid speech evenly into 3 segments. In a concrete implementation, if the user registers a voiceprint in text-dependent mode (for example, with a passphrase), the user must repeat the text three times to register, and these 3 utterances are used as the 3 segments. When registering in text-independent mode, the valid frames of the speech can be cut directly and evenly into three sections. The present invention is of course not limited to dividing the valid speech evenly into 3 segments; a person skilled in the art can choose the division according to actual needs.
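The segmentation described above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function names `split_evenly` and `registration_segments`, the representation of valid frames as list items, and the rule of giving any leftover frames to the earliest segments are all assumptions introduced here.

```python
def split_evenly(frames, parts=3):
    """Cut a sequence of valid frames into `parts` contiguous, near-equal segments.

    Assumption: when the frame count is not divisible by `parts`,
    the first segments each receive one extra frame.
    """
    if parts <= 0:
        raise ValueError("parts must be positive")
    if len(frames) < parts:
        raise ValueError("need at least one valid frame per segment")
    base, extra = divmod(len(frames), parts)
    segments, start = [], 0
    for i in range(parts):
        end = start + base + (1 if i < extra else 0)
        segments.append(frames[start:end])
        start = end
    return segments


def registration_segments(recordings, text_dependent, parts=3):
    """Return the segments to compare (hypothetical helper).

    Text-dependent mode: each repetition of the passphrase is one segment.
    Text-independent mode: all valid frames are pooled and cut evenly.
    """
    if text_dependent:
        if len(recordings) != parts:
            raise ValueError("expected one recording per repetition")
        return [list(r) for r in recordings]
    pooled = [frame for rec in recordings for frame in rec]
    return split_evenly(pooled, parts)
```

With ten valid frames and three parts, `split_evenly` yields segments of 4, 3, and 3 frames, so no frame is dropped.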
The computing module 303 computes the speech feature vector of each of the segments produced by the segmentation module 302. In this embodiment, the computing module 303 computes the speech feature vector of each segment as follows:
The MFCC features of every frame in each segment are extracted using the mel-frequency cepstral coefficient (MFCC) method and assembled into a matrix, and the most essential features in the matrix are selected using a universal background model (UBM) and a speech feature vector (i-vector) extractor to form the speech feature vector.
Here, MFCC stands for Mel-Frequency Cepstral Coefficients, and the method involves two key steps: conversion to the mel frequency scale, followed by cepstral analysis. The mel scale is a non-linear frequency scale based on the human ear's perceptual judgment of equidistant changes in pitch. Its relationship to frequency in hertz is m = 2595 log10(1 + f/700), where m is the mel-scale value and f is the sound frequency in hertz. In this embodiment, each segment is fed into mel-scale filters, which cut the signal of each segment into bands of equal width on the mel scale. This yields multiple frequency bands, and the values of the corresponding bands form a matrix.
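As a small worked illustration of the mel conversion given above, the sketch below evaluates m = 2595 log10(1 + f/700) and its inverse. The function names are assumptions introduced here; only the formula comes from the text.

```python
import math


def hz_to_mel(f_hz):
    """Convert a frequency in hertz to the mel scale: m = 2595 * log10(1 + f / 700)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(m):
    """Inverse of hz_to_mel, obtained by solving the formula for f."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)
```

With these constants, 1000 Hz maps to roughly 1000 mel, and equal steps in mel correspond to progressively wider steps in hertz at higher frequencies.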
A common UBM (Universal Background Model) is the Gaussian mixture model (GMM). The Gaussian mixture model is defined as:
$$p(x) = \sum_{k=1}^{K} \pi_k \, p(x \mid k)$$
where $K$ is the number of component models; $\pi_k$ is the weight of the $k$-th Gaussian; and $p(x \mid k)$ is the $k$-th Gaussian probability density, with mean $\mu_k$ and variance $\sigma_k$. When the above matrix is substituted into this Gaussian model, $K$ equals the number of matrix nodes, $\pi_k$ is the weight of the $k$-th node (that is, the number of times the $k$-th node occurs), the mean $\mu_k$ and variance $\sigma_k$ are the mean and variance of all nodes in the matrix, and $p(x \mid k)$ is the probability of the $k$-th node. Using the formula above, the node with the largest $p(x \mid k)$ in the matrix is selected by the extractor as the speech feature vector of each segment.
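The mixture density can be evaluated directly from the definition. The sketch below is a one-dimensional illustration only: real UBMs operate on multi-dimensional MFCC vectors, and this code does not perform i-vector extraction. The function names are assumptions, and the third argument of each function is treated as a variance.

```python
import math


def gaussian_pdf(x, mean, variance):
    """Density of a one-dimensional Gaussian with the given mean and variance."""
    return math.exp(-((x - mean) ** 2) / (2.0 * variance)) / math.sqrt(2.0 * math.pi * variance)


def gmm_pdf(x, weights, means, variances):
    """Mixture density p(x) = sum over k of pi_k * p(x | k)."""
    return sum(w * gaussian_pdf(x, m, v) for w, m, v in zip(weights, means, variances))
```

The weights are expected to sum to 1 so that the mixture remains a valid probability density.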
Further, after the computing module 303 has computed the speech feature vector of each of the segments, the judgment module 304 determines from those speech feature vectors whether all segments belong to the same user.
Specifically, the judgment module 304 determines whether all segments belong to the same user as follows:
The judgment module 304 performs a calculation on the speech feature vectors of the segments, namely pairwise comparison scoring, and determines from the scoring results whether all segments belong to the same user. For example, in this embodiment, the judgment module 304 scores the speech feature vectors of the segments against each other pairwise using the PLDA algorithm. Specifically, the speech feature vector obtained above from the matrix contains only the corresponding frequency values. The PLDA model applies simulated channel compensation to those values, so that the speech feature vectors carry channel attributes (such as frequency bandwidth). Scoring is then done by log-likelihood ratio: the frequency/channel attribute values of each speech feature vector are converted to logarithms, and the logarithms are compared. For example, given two test utterances, the log-likelihood ratio is computed and the difference between the logarithms computed for the two speech feature vectors is compared with a preset value; if it is less than the preset value, the two speech feature vectors are judged to be similar, that is, the two segments probably come from the same user.
Further, to exclude the influence of noise, when determining whether all segments belong to the same user, the judgment module 304 also checks whether every comparison score is higher than a preset value; if every comparison score is higher than the preset value, all segments are judged to belong to the same user.
In this embodiment, the segmentation module 302 divides the valid speech obtained by the acquisition module 301 evenly into 3 segments, and the judgment module 304 compares the feature vectors (i-vectors) of the three segments pairwise. Many algorithms can be used to score two i-vectors against each other; the PLDA algorithm is typically used. When the score difference is less than the set threshold, the two i-vectors are judged to come from the same voice. If, when the i-vectors are compared pairwise, all three score differences are less than the threshold, the speech quality is judged acceptable and voiceprint registration can proceed normally. If any score difference is higher than the threshold, the speech is judged too noisy and of poor quality, and voiceprint registration is not allowed.
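The pairwise decision rule can be sketched as below. PLDA scoring needs a trained model and is not reproduced here; as a clearly labeled stand-in, the sketch uses cosine distance as the "score difference". The names `cosine_distance` and `all_same_speaker` and the threshold value are assumptions for illustration only.

```python
import math
from itertools import combinations


def cosine_distance(a, b):
    """Stand-in dissimilarity (NOT PLDA): 1 minus the cosine similarity of two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("zero-length feature vector")
    return 1.0 - dot / (norm_a * norm_b)


def all_same_speaker(vectors, threshold, score_difference=cosine_distance):
    """Accept only if every pair of segment vectors differs by less than the threshold."""
    return all(score_difference(a, b) < threshold for a, b in combinations(vectors, 2))
```

For three segments this performs exactly three comparisons, and a single pair at or above the threshold rejects the whole recording, matching the rule stated above.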
The registration module 305 performs voice registration with the valid speech when all segments belong to the same user. In the voice registration phase, the registration module 305 stores the voiceprint model on the terminal device 100 for later use, which completes voice registration. In this embodiment, if the above result does not satisfy the condition that all segments belong to the same user, voice registration is not allowed.
Through the program modules 301-305 above, the voiceprint registration program 300 based on a voiceprint recognition model proposed by the present invention first obtains the valid speech captured while a user records registration speech; second, divides the valid speech evenly into an integer number of speech segments; then computes the speech feature vector of each segment; then determines from those speech feature vectors whether all segments belong to the same user; and finally performs voice registration with the valid speech only when all segments belong to the same user. This overcomes the drawback that existing noise detection methods can only detect environmental noise and cannot identify whether several people are being recorded, which still degrades the quality of registered speech. It is therefore possible to detect whether a user is attempting voiceprint registration in an environment with excessive noise or with several people speaking, keep unqualified recordings out of the voiceprint library (speech library), safeguard the quality of registered speech, prevent low-quality speech from affecting subsequent verification, and improve the usability of voiceprint recognition.
In addition, the present invention also proposes a voiceprint registration method based on a voiceprint recognition model.
Fig. 4 is a schematic flowchart of a first embodiment of the voiceprint registration method based on a voiceprint recognition model of the present invention. In this embodiment, depending on requirements, the order of the steps in the flowchart shown in Fig. 4 may be changed, and certain steps may be omitted.
Step S401: obtain the valid speech captured while a user records registration speech. In this embodiment, the voiceprint registration method based on a voiceprint recognition model is applied to the terminal device 100. The terminal device 100 of this embodiment may be any terminal with a speech recognition function, such as a mobile phone, a portable computer, a personal digital assistant, a bank payment terminal, or an access control device; these devices can implement specific functions and applications through speech recognition technology. When the terminal device 100 obtains the valid speech for voice registration, it may start capturing when the user taps to begin voice input and stop when the user ends voice input. This avoids some unnecessary noise interference and improves the purity of the speech samples to be processed.
Step S402: divide the valid speech evenly into an integer number of speech segments. Specifically, the terminal device 100 cuts the user's valid speech evenly by valid frames into an integer number of sections to obtain the segments. In this embodiment, the terminal device 100 preferably divides the obtained valid speech evenly into 3 segments. In a concrete implementation, if the user registers a voiceprint in text-dependent mode (for example, with a passphrase), the user must repeat the text three times to register, and these 3 utterances are used as the 3 segments. When registering in text-independent mode, the valid frames of the speech can be cut directly and evenly into three sections. The present invention is of course not limited to dividing the valid speech evenly into 3 segments; a person skilled in the art can choose the division according to actual needs.
Step S403: compute the speech feature vector of each of the segments.
In this embodiment, the terminal device 100 computes the speech feature vector of each segment as follows:
The MFCC features of every frame in each segment are extracted using the mel-frequency cepstral coefficient (MFCC) method and assembled into a matrix, and the most essential features in the matrix are selected using a universal background model (UBM) and a speech feature vector (i-vector) extractor to form the speech feature vector.
Here, MFCC stands for Mel-Frequency Cepstral Coefficients, and the method involves two key steps: conversion to the mel frequency scale, followed by cepstral analysis. The mel scale is a non-linear frequency scale based on the human ear's perceptual judgment of equidistant changes in pitch. Its relationship to frequency in hertz is m = 2595 log10(1 + f/700), where m is the mel-scale value and f is the sound frequency in hertz. In this embodiment, each segment is fed into mel-scale filters, which cut the signal of each segment into bands of equal width on the mel scale. This yields multiple frequency bands, and the values of the corresponding bands form a matrix.
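The equal-width cutting on the mel scale can be illustrated by computing band edges. This sketch assumes a frequency range and band count chosen by the caller; the function name `mel_band_edges` is hypothetical, and a full MFCC pipeline (framing, spectrum, filter weighting, cepstral analysis) is deliberately left out.

```python
import math


def hz_to_mel(f_hz):
    """m = 2595 * log10(1 + f / 700), as given in the text."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(m):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_band_edges(f_low, f_high, n_bands):
    """Return n_bands + 1 edge frequencies in Hz, equally spaced on the mel scale."""
    if n_bands <= 0 or f_high <= f_low:
        raise ValueError("need n_bands > 0 and f_high > f_low")
    m_low, m_high = hz_to_mel(f_low), hz_to_mel(f_high)
    step = (m_high - m_low) / n_bands
    return [mel_to_hz(m_low + i * step) for i in range(n_bands + 1)]
```

Because the bands have equal width in mel, their width in hertz grows toward higher frequencies, which mirrors the ear's coarser resolution there.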
A common UBM (Universal Background Model) is the Gaussian mixture model (GMM). The Gaussian mixture model is defined as:
$$p(x) = \sum_{k=1}^{K} \pi_k \, p(x \mid k)$$
where $K$ is the number of component models; $\pi_k$ is the weight of the $k$-th Gaussian; and $p(x \mid k)$ is the $k$-th Gaussian probability density, with mean $\mu_k$ and variance $\sigma_k$. When the above matrix is substituted into this Gaussian model, $K$ equals the number of matrix nodes, $\pi_k$ is the weight of the $k$-th node (that is, the number of times the $k$-th node occurs), the mean $\mu_k$ and variance $\sigma_k$ are the mean and variance of all nodes in the matrix, and $p(x \mid k)$ is the probability of the $k$-th node. Using the formula above, the node with the largest $p(x \mid k)$ in the matrix is selected by the extractor as the speech feature vector of each segment.
Whether step S404 belongs to same according to the characteristic voice vector determination per a voice per a voice User.When every a voice belongs to same user, step S405 is performed, otherwise, terminates flow.
Specifically, the terminal installation 100 judges whether belong to same user's per a voice in the following manner Step:
The terminal device 100 evaluates the voice feature vectors of the speech segments by scoring them against each other in pairs, and determines from the scores whether every segment belongs to the same user. In this embodiment, for example, the terminal device 100 uses the PLDA algorithm to score the voice feature vectors pairwise. Specifically, the voice feature vectors obtained from the matrix above contain only the corresponding frequency values. The PLDA model applies simulated channel compensation to those values, so that the voice feature vectors carry channel attributes (such as frequency bandwidth). Scoring then uses the log-likelihood ratio: the frequency/channel attribute values of each voice feature vector are converted to logarithms, and the resulting values are compared. For example, given two test utterances, the log-likelihood ratio is computed and the difference between the logarithmic values of the two voice feature vectors is compared with a preset value. If the difference is below the preset value, the two voice feature vectors are judged to be similar, that is, the two speech segments may come from the same user.
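To make the idea of log-likelihood-ratio scoring concrete, the following Python sketch scores two scalar embeddings under a simplified, zero-mean, one-dimensional two-covariance PLDA model. This model, the function names `log_gauss` and `plda_llr_1d`, and the variance values are assumptions for illustration; the embodiment does not specify its PLDA parameters. A positive score favors the hypothesis that both values come from the same speaker, and a negative score favors different speakers.

```python
import math

def log_gauss(x: float, var: float) -> float:
    """Log density of a zero-mean 1-D Gaussian with the given variance."""
    return -0.5 * (math.log(2.0 * math.pi * var) + x * x / var)

def plda_llr_1d(x1: float, x2: float, between_var: float, within_var: float) -> float:
    """Log-likelihood ratio log p(x1, x2 | same) - log p(x1, x2 | different).

    Same speaker: (x1, x2) is jointly Gaussian with covariance
    [[B + W, B], [B, B + W]]. Different speakers: x1 and x2 are
    independent, each with variance B + W. Requires within_var > 0.
    """
    total = between_var + within_var
    det = total * total - between_var * between_var
    quad = (total * (x1 * x1 + x2 * x2) - 2.0 * between_var * x1 * x2) / det
    log_same = -math.log(2.0 * math.pi) - 0.5 * math.log(det) - 0.5 * quad
    log_diff = log_gauss(x1, total) + log_gauss(x2, total)
    return log_same - log_diff

# Illustrative values: similar embeddings score high, opposite ones score low.
score_similar = plda_llr_1d(1.0, 1.0, between_var=1.0, within_var=0.1)
score_opposite = plda_llr_1d(1.0, -1.0, between_var=1.0, within_var=0.1)
```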
Further, to exclude the influence of noise, when determining whether the speech segments belong to the same user, the terminal device 100 also checks whether the difference from every pairwise comparison is below the preset value. If the difference from every comparison is below the preset value, every speech segment is judged to belong to the same user.
In this embodiment, the terminal device 100 divides the acquired valid speech evenly into 3 parts and then compares the feature vectors (i-vectors) of the three parts in pairs. Several algorithms can be used to score a pair of i-vectors; PLDA is the usual choice, and when the score exceeds the set threshold the two i-vectors are judged to come from the same voice. If the score differences from all three pairwise comparisons are below the threshold, the speech quality is judged acceptable and voiceprint registration can proceed normally. If the score difference from any one comparison is above the threshold, the speech is judged to be too noisy and of poor quality, and voiceprint registration is not allowed.
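The three-way pairwise check described above can be sketched in Python as follows. The scoring function is passed in as a parameter because the description notes that several algorithms may be used; here Euclidean distance (`math.dist`) is a stand-in for a PLDA-based score difference, and the function names, sample vectors, and threshold of 0.5 are hypothetical.

```python
import math
from itertools import combinations

def pairwise_differences(vectors, distance_fn):
    """Score every unordered pair of vectors; three vectors give three scores."""
    return [distance_fn(a, b) for a, b in combinations(vectors, 2)]

def quality_ok(vectors, distance_fn, threshold: float) -> bool:
    """True only if every pairwise difference is below the threshold."""
    return all(d < threshold for d in pairwise_differences(vectors, distance_fn))

# Stand-in vectors for the three parts; not real i-vectors.
clean = [[1.0, 0.0], [0.9, 0.1], [1.0, 0.1]]   # three mutually similar parts
noisy = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]   # third part differs strongly

clean_passes = quality_ok(clean, math.dist, 0.5)
noisy_passes = quality_ok(noisy, math.dist, 0.5)
```

A single outlying part is enough to reject the recording, which matches the rule that any one comparison above the threshold blocks registration.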
Step S405: perform voice registration on the valid speech. In the voice registration stage, the terminal device 100 stores the voiceprint model for later use, which completes the voice registration. If the above determination finds that the speech is not valid, no registration is performed.
Through steps S401 to S405, the voiceprint registration method based on a voiceprint recognition model proposed by the present invention first obtains the valid speech from the user's registration speech; next divides the valid speech evenly into an integer number of parts; then calculates the voice feature vector of each part; then determines from the voice feature vector of each part whether every part belongs to the same user; and finally, if every part belongs to the same user, performs voice registration on the valid speech. This addresses the drawback that existing noise detection methods detect only ambient noise and cannot recognize that several people have been recorded, which still degrades the quality of the registration speech. The method can therefore detect whether a user is registering a voiceprint in an environment that is too noisy or where several people are speaking, keep unsuitable recordings out of the voiceprint library (speech library), safeguard the quality of the registration speech, prevent low-quality speech from affecting subsequent verification, and improve the usability of voiceprint recognition.
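The overall flow of steps S401 to S405 can be sketched end to end in Python. This is a minimal sketch under stated assumptions, not the embodiment's implementation: the per-part embedding is a simple mean of the frames rather than an i-vector, cosine distance replaces PLDA scoring, and the names `split_evenly`, `mean_vector`, `cosine_distance`, `try_register`, the in-memory `store`, and the threshold of 0.1 are all hypothetical.

```python
import math
from itertools import combinations

def split_evenly(frames, n_parts: int):
    """Cut the frame list into n_parts equal chunks; leftover frames are dropped."""
    size = len(frames) // n_parts
    return [frames[i * size:(i + 1) * size] for i in range(n_parts)]

def mean_vector(frames):
    """Stand-in embedding: the per-dimension mean of the frames (not an i-vector)."""
    return [sum(col) / len(frames) for col in zip(*frames)]

def cosine_distance(a, b) -> float:
    """1 - cosine similarity; returns 1.0 if either vector has zero length."""
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    if norm == 0.0:
        return 1.0
    return 1.0 - sum(x * y for x, y in zip(a, b)) / norm

def try_register(user_id, frames, store, n_parts: int = 3, threshold: float = 0.1) -> bool:
    """Register the speech only if all parts look like the same speaker."""
    if len(frames) < n_parts:
        return False  # not enough valid frames to split
    vectors = [mean_vector(part) for part in split_evenly(frames, n_parts)]
    if all(cosine_distance(a, b) < threshold for a, b in combinations(vectors, 2)):
        store[user_id] = mean_vector(frames)  # keep the model for later verification
        return True
    return False

store = {}
single = [[1.0, 0.0]] * 9                                     # one consistent speaker
mixed = [[1.0, 0.0]] * 3 + [[0.0, 1.0]] * 3 + [[1.0, 0.0]] * 3  # a second voice in the middle
single_ok = try_register("alice", single, store)
mixed_ok = try_register("bob", mixed, store)
```

In this sketch the recording with a second voice in the middle part is rejected and nothing is written to the store for that user.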
The present invention further provides another embodiment, namely a storage medium that stores a voiceprint registration program based on a voiceprint recognition model. The program can be executed by at least one processor so that the at least one processor performs the steps of the voiceprint registration method based on a voiceprint recognition model described above.
The above embodiments of the present invention are for description only and do not indicate the relative merits of the embodiments.
From the description of the above embodiments, those skilled in the art will clearly understand that the methods of the above embodiments can be implemented by software together with the necessary general-purpose hardware platform, or by hardware alone, although in many cases the former is the better implementation. On this understanding, the technical solution of the present invention, in essence or in the part that contributes over the prior art, can be embodied as a software product. The software product is stored in a storage medium (such as ROM/RAM, a magnetic disk, or an optical disc) and includes instructions that cause a terminal device (which may be a mobile phone, computer, server, air conditioner, network device, or the like) to perform the methods described in the embodiments of the present invention.
The above are only preferred embodiments of the present invention and do not limit its patent scope. Any equivalent structure or equivalent process transformation made using the contents of the specification and drawings of the present invention, whether applied directly or indirectly in other related technical fields, likewise falls within the scope of patent protection of the present invention.

Claims (10)

1. A voiceprint registration method based on a voiceprint recognition model, applied to a terminal device, characterized in that the method comprises:
Obtaining valid speech from a user's registration speech;
Dividing the valid speech evenly into an integer number of speech segments;
Calculating the voice feature vector of each of the speech segments respectively;
Determining, according to the calculation result for the voice feature vector of each speech segment, whether each speech segment belongs to the same user; and
If each speech segment belongs to the same user, performing voice registration on the valid speech.
2. The voiceprint registration method based on a voiceprint recognition model according to claim 1, characterized in that the step of dividing the valid speech evenly into an integer number of speech segments comprises:
When the valid speech is non-text (text-independent) speech, cutting the user's valid speech evenly by valid frames into an integer number of sections to obtain the integer number of speech segments.
3. The voiceprint registration method based on a voiceprint recognition model according to claim 1, characterized in that the step of calculating the voice feature vector of each of the speech segments respectively comprises:
Extracting the MFCC features of each frame in each speech segment using the MFCC method and assembling them into a matrix; and
Selecting the most essential features in the matrix using a UBM and a voice feature vector extractor to form the voice feature vector.
4. The voiceprint registration method based on a voiceprint recognition model according to claim 1, characterized in that the step of determining, according to the calculation result for the voice feature vector of each speech segment, whether each speech segment belongs to the same user comprises:
Scoring the voice feature vectors of the speech segments against each other in pairs; and
If the difference after the pairwise scoring is below a preset value, determining that each speech segment belongs to the same user.
5. The voiceprint registration method based on a voiceprint recognition model according to claim 4, characterized in that the step of determining, according to the calculation result for the voice feature vector of each speech segment, whether each speech segment belongs to the same user further comprises:
Scoring the voice feature vectors of the speech segments against each other in pairs using the PLDA algorithm; and
If the difference from every comparison is below the preset value, determining that each speech segment belongs to the same user.
6. A terminal device, characterized in that the terminal device comprises a memory and a processor, the memory storing a voiceprint registration program based on a voiceprint recognition model that is executable on the processor, the voiceprint registration program implementing the following steps when executed by the processor:
Obtaining valid speech from a user's registration speech;
Dividing the valid speech evenly into an integer number of speech segments;
Calculating the voice feature vector of each of the speech segments respectively;
Determining, according to the calculation result for the voice feature vector of each speech segment, whether each speech segment belongs to the same user; and
If each speech segment belongs to the same user, performing voice registration on the valid speech.
7. The terminal device according to claim 6, characterized in that the step of calculating the voice feature vector of each of the speech segments respectively comprises:
Extracting the MFCC features of each frame in each speech segment using the MFCC method and assembling them into a matrix; and
Selecting the most essential features in the matrix using a UBM and a voice feature vector extractor to form the voice feature vector.
8. The terminal device according to claim 6, characterized in that the step of determining, according to the calculation result for the voice feature vector of each speech segment, whether each speech segment belongs to the same user comprises:
Scoring the voice feature vectors of the speech segments against each other in pairs; and
If the difference after the pairwise scoring is below a preset value, determining that each speech segment belongs to the same user.
9. The terminal device according to claim 8, characterized in that the step of determining, according to the calculation result for the voice feature vector of each speech segment, whether each speech segment belongs to the same user further comprises:
Scoring the voice feature vectors of the speech segments against each other in pairs using the PLDA algorithm; and
If the difference from every comparison is below the preset value, determining that each speech segment belongs to the same user.
10. A storage medium, being a computer-readable storage medium that stores a voiceprint registration program based on a voiceprint recognition model, wherein the voiceprint registration program can be executed by at least one processor so that the at least one processor performs the steps of the voiceprint registration method based on a voiceprint recognition model according to any one of claims 1 to 5.
CN201810003939.1A 2018-01-03 2018-01-03 Voiceprint registration method, terminal installation and storage medium based on Application on Voiceprint Recognition model Pending CN108172230A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201810003939.1A CN108172230A (en) 2018-01-03 2018-01-03 Voiceprint registration method, terminal installation and storage medium based on Application on Voiceprint Recognition model
PCT/CN2018/077670 WO2019134247A1 (en) 2018-01-03 2018-02-28 Voiceprint registration method based on voiceprint recognition model, terminal device, and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810003939.1A CN108172230A (en) 2018-01-03 2018-01-03 Voiceprint registration method, terminal installation and storage medium based on Application on Voiceprint Recognition model

Publications (1)

Publication Number Publication Date
CN108172230A true CN108172230A (en) 2018-06-15

Family

ID=62517236

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810003939.1A Pending CN108172230A (en) 2018-01-03 2018-01-03 Voiceprint registration method, terminal installation and storage medium based on Application on Voiceprint Recognition model

Country Status (2)

Country Link
CN (1) CN108172230A (en)
WO (1) WO2019134247A1 (en)

Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108962284A (en) * 2018-07-04 2018-12-07 科大讯飞股份有限公司 A kind of voice recording method and device
CN109051405A (en) * 2018-08-31 2018-12-21 深圳市研本品牌设计有限公司 A kind of intelligent dustbin and storage medium
CN109378002A (en) * 2018-10-11 2019-02-22 平安科技(深圳)有限公司 Method, apparatus, computer equipment and the storage medium of voice print verification
CN110099047A (en) * 2019-04-12 2019-08-06 平安科技(深圳)有限公司 Registration information processing method, device, computer equipment and storage medium
CN110688640A (en) * 2019-09-03 2020-01-14 深圳市声扬科技有限公司 Data processing method, device and system based on voiceprint recognition and server
CN110827834A (en) * 2019-11-11 2020-02-21 广州国音智能科技有限公司 Voiceprint registration method, system and computer readable storage medium
CN111916074A (en) * 2020-06-29 2020-11-10 厦门快商通科技股份有限公司 Cross-device voice control method, system, terminal and storage medium
CN112908310A (en) * 2021-01-20 2021-06-04 宁波方太厨具有限公司 Voice instruction recognition method and system in intelligent electric appliance
CN113177816A (en) * 2020-01-08 2021-07-27 阿里巴巴集团控股有限公司 Information processing method and device
WO2021232213A1 (en) * 2020-05-19 2021-11-25 华为技术有限公司 Voiceprint recognition apparatus, voiceprint registration apparatus and cross-device voiceprint recognition method
WO2022077918A1 (en) * 2020-10-12 2022-04-21 北京捷通华声科技股份有限公司 Method for detecting validity of registered audio, detection apparatus, and electronic device

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111048072A (en) * 2019-11-21 2020-04-21 中国南方电网有限责任公司 Voiceprint recognition method applied to power enterprises

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102404287A (en) * 2010-09-14 2012-04-04 盛乐信息技术(上海)有限公司 Voiceprint identification system and method for determining voiceprint authentication threshold value through data multiplexing method
CN104184587A (en) * 2014-08-08 2014-12-03 腾讯科技(深圳)有限公司 Voiceprint generation method, voiceprint generation server, client and voiceprint generation system
CN104219050A (en) * 2014-08-08 2014-12-17 腾讯科技(深圳)有限公司 Voiceprint verification method and system, voiceprint verification server and voiceprint verification client side
CN106782564A (en) * 2016-11-18 2017-05-31 百度在线网络技术(北京)有限公司 Method and apparatus for processing speech data
CN107068154A (en) * 2017-03-13 2017-08-18 平安科技(深圳)有限公司 The method and system of authentication based on Application on Voiceprint Recognition
CN107105343A (en) * 2017-04-24 2017-08-29 深圳市茁壮网络股份有限公司 A kind of authentication method of user, apparatus and system

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10476872B2 (en) * 2015-02-20 2019-11-12 Sri International Joint speaker authentication and key phrase identification
CN106601258A (en) * 2016-12-12 2017-04-26 广东顺德中山大学卡内基梅隆大学国际联合研究院 Speaker identification method capable of information channel compensation based on improved LSDA algorithm
CN107146601B (en) * 2017-04-07 2020-07-24 南京邮电大学 Rear-end i-vector enhancement method for speaker recognition system
CN107342077A (en) * 2017-05-27 2017-11-10 国家计算机网络与信息安全管理中心 A kind of speaker segmentation clustering method and system based on factorial analysis

Cited By (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108962284A (en) * 2018-07-04 2018-12-07 科大讯飞股份有限公司 A kind of voice recording method and device
CN108962284B (en) * 2018-07-04 2021-06-08 科大讯飞股份有限公司 Voice recording method and device
CN109051405A (en) * 2018-08-31 2018-12-21 深圳市研本品牌设计有限公司 A kind of intelligent dustbin and storage medium
CN109378002A (en) * 2018-10-11 2019-02-22 平安科技(深圳)有限公司 Method, apparatus, computer equipment and the storage medium of voice print verification
CN110099047A (en) * 2019-04-12 2019-08-06 平安科技(深圳)有限公司 Registration information processing method, device, computer equipment and storage medium
CN110099047B (en) * 2019-04-12 2021-09-07 平安科技(深圳)有限公司 Registration information processing method and device, computer equipment and storage medium
CN110688640A (en) * 2019-09-03 2020-01-14 深圳市声扬科技有限公司 Data processing method, device and system based on voiceprint recognition and server
CN110827834A (en) * 2019-11-11 2020-02-21 广州国音智能科技有限公司 Voiceprint registration method, system and computer readable storage medium
CN110827834B (en) * 2019-11-11 2022-07-12 广州国音智能科技有限公司 Voiceprint registration method, system and computer readable storage medium
CN113177816A (en) * 2020-01-08 2021-07-27 阿里巴巴集团控股有限公司 Information processing method and device
WO2021232213A1 (en) * 2020-05-19 2021-11-25 华为技术有限公司 Voiceprint recognition apparatus, voiceprint registration apparatus and cross-device voiceprint recognition method
CN111916074A (en) * 2020-06-29 2020-11-10 厦门快商通科技股份有限公司 Cross-device voice control method, system, terminal and storage medium
WO2022077918A1 (en) * 2020-10-12 2022-04-21 北京捷通华声科技股份有限公司 Method for detecting validity of registered audio, detection apparatus, and electronic device
CN112908310A (en) * 2021-01-20 2021-06-04 宁波方太厨具有限公司 Voice instruction recognition method and system in intelligent electric appliance

Also Published As

Publication number Publication date
WO2019134247A1 (en) 2019-07-11

Similar Documents

Publication Publication Date Title
CN108172230A (en) Voiceprint registration method, terminal installation and storage medium based on Application on Voiceprint Recognition model
CN110288978B (en) Speech recognition model training method and device
CN110289003B (en) Voiceprint recognition method, model training method and server
EP2763134B1 (en) Method and apparatus for voice recognition
EP2760018B1 (en) Voice identification method and apparatus
EP1989701B1 (en) Speaker authentication
US20180197547A1 (en) Identity verification method and apparatus based on voiceprint
CN110853617B (en) Model training method, language identification method, device and equipment
CN107610707A (en) A kind of method for recognizing sound-groove and device
CN107919137A (en) The long-range measures and procedures for the examination and approval, device, equipment and readable storage medium storing program for executing
CN110457432A (en) Interview methods of marking, device, equipment and storage medium
CN112259106A (en) Voiceprint recognition method and device, storage medium and computer equipment
CN108648769A (en) Voice activity detection method, apparatus and equipment
CN113223536B (en) Voiceprint recognition method and device and terminal equipment
CN110827793A (en) Language identification method
CN111583906A (en) Role recognition method, device and terminal for voice conversation
CN108962231A (en) A kind of method of speech classification, device, server and storage medium
CN107358947A (en) Speaker recognition methods and system again
CN108831506A (en) Digital audio based on GMM-BIC distorts point detecting method and system
Drygajlo Automatic speaker recognition for forensic case assessment and interpretation
CN110019741A (en) Request-answer system answer matching process, device, equipment and readable storage medium storing program for executing
CN112562725A (en) Mixed voice emotion classification method based on spectrogram and capsule network
CN102237089A (en) Method for reducing error identification rate of text irrelevant speaker identification system
Sekkate et al. Speaker identification for OFDM-based aeronautical communication system
US10446138B2 (en) System and method for assessing audio files for transcription services

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20180615
