CN108694952A - Electronic device, identity verification method, and storage medium - Google Patents
Electronic device, identity verification method, and storage medium Download PDF Info
- Publication number
- CN108694952A CN108694952A CN201810311721.2A CN201810311721A CN108694952A CN 108694952 A CN108694952 A CN 108694952A CN 201810311721 A CN201810311721 A CN 201810311721A CN 108694952 A CN108694952 A CN 108694952A
- Authority
- CN
- China
- Prior art keywords
- user
- reading
- voice
- voiceprint
- voiceprint feature
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L17/00—Speaker identification or verification techniques
- G10L17/22—Interactive procedures; Man-machine interfaces
- G10L17/24—Interactive procedures; Man-machine interfaces the user being prompted to utter a password or a predefined phrase
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L17/00—Speaker identification or verification techniques
- G10L17/02—Preprocessing operations, e.g. segment selection; Pattern representation or modelling, e.g. based on linear discriminant analysis [LDA] or principal components; Feature selection or extraction
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L17/00—Speaker identification or verification techniques
- G10L17/04—Training, enrolment or model building
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L17/00—Speaker identification or verification techniques
- G10L17/06—Decision making techniques; Pattern matching strategies
- G10L17/08—Use of distortion metrics or a particular distance between probe pattern and reference templates
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L17/00—Speaker identification or verification techniques
- G10L17/16—Hidden Markov models [HMM]
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L17/00—Speaker identification or verification techniques
- G10L17/18—Artificial neural networks; Connectionist approaches
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/03—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
- G10L25/24—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters the extracted parameters being the cepstrum
Abstract
The present invention relates to an electronic device, an identity verification method, and a storage medium. The method includes: when a user transacts business under an IVR scenario, announcing a random code of a first preset number of digits for the user to read aloud, and after the read-aloud, establishing acoustic models of a preset type for the announced random code and for the user's read-aloud speech respectively; performing a forced overall alignment operation on the acoustic model of the announced random code and the acoustic model of the user's read-aloud speech, and calculating with a predetermined algorithm the probability that the two aligned acoustic models are identical; if the probability exceeds a preset first threshold, extracting the voiceprint feature vector of the user's read-aloud speech, obtaining the standard voiceprint feature vector stored for the user after successful registration, and calculating the distance between the voiceprint feature vector of the read-aloud speech and the standard voiceprint feature vector to verify the user's identity. The present invention applies double verification to the user's identity and can confirm the user's identity accurately.
Description
Technical field
The present invention relates to the field of communication technology, and more particularly to an electronic device, an identity verification method, and storage media.
Background technology
At present, in interactive voice response (IVR) scenarios, there are schemes that combine IVR with voiceprint recognition to verify a client's identity. For example, when a client calls to activate a credit card or to change a password after receiving the card, the client's identity needs to be verified. In existing IVR schemes, however, remote voiceprint verification is not conducted face to face, so a client may commit fraud with pre-prepared synthesized speech; the client's identity therefore cannot be confirmed accurately, and the security of identity verification is low.
Summary of the invention
The purpose of the present invention is to provide an electronic device, an identity verification method, and storage media, intended to apply double verification to a user's identity so that the identity can be confirmed accurately.
To achieve the above object, the present invention provides an electronic device comprising a memory and a processor connected to the memory. The memory stores a processing system that can run on the processor; when executed by the processor, the processing system realizes the following steps:
An acoustic-model establishment step: when a user transacts business under an interactive voice response (IVR) scenario, announce a random code of a first preset number of digits for the user to read aloud, and after the read-aloud establish acoustic models of a preset type for the announced random code and for the user's read-aloud speech respectively;
A forced overall alignment step: perform a forced overall alignment operation on the acoustic model of the announced random code and the acoustic model of the user's read-aloud speech, and calculate with a predetermined algorithm the probability that the two aligned acoustic models are identical;
An identity verification step: if the probability that the two aligned acoustic models are identical exceeds a preset first threshold, extract the voiceprint feature vector of the user's read-aloud speech, obtain the standard voiceprint feature vector stored for the user after successful registration, and calculate the distance between the two vectors to verify the user's identity.
Preferably, when executed by the processor, the processing system also realizes the following steps:
When the user performs voiceprint registration under an IVR scenario, announce a random code of a second preset number of digits for the user to read aloud a preset number of times, and after each read-aloud establish acoustic models of the preset type for the announced random code and for the read-aloud speech respectively;
Perform, for each round, a forced overall alignment operation on the acoustic model of the announced random code and the acoustic model of the corresponding read-aloud speech, and calculate with the predetermined algorithm the probability that the two aligned acoustic models are identical;
If every such probability exceeds a preset second threshold, extract the voiceprint feature vector of each read-aloud utterance and calculate the pairwise distances between these vectors, to analyze whether each read-aloud came from the same user;
If so, store the voiceprint feature vector as the user's standard voiceprint feature vector.
Preferably, the acoustic model of the preset type is a deep neural network-hidden Markov model (DNN-HMM).
Preferably, the step of extracting the voiceprint feature vector of the user's read-aloud speech includes:
applying pre-emphasis and windowing to the user's read-aloud speech, performing a Fourier transform on each windowed frame to obtain the corresponding spectrum, and feeding the spectrum into a Mel filter bank to output the Mel spectrum;
performing cepstral analysis on the Mel spectrum to obtain Mel-frequency cepstral coefficients (MFCC), and composing the voiceprint feature vector of the user's read-aloud speech from the MFCCs.
To achieve the above object, the present invention also provides an identity verification method, which includes:
S1: under an IVR scenario, when a user transacts business, announce a random code of a first preset number of digits for the user to read aloud, and after the read-aloud establish acoustic models of a preset type for the announced random code and for the user's read-aloud speech respectively;
S2: perform a forced overall alignment operation on the acoustic model of the announced random code and the acoustic model of the user's read-aloud speech, and calculate with a predetermined algorithm the probability that the two aligned acoustic models are identical;
S3: if the probability that the two aligned acoustic models are identical exceeds a preset first threshold, extract the voiceprint feature vector of the user's read-aloud speech, obtain the standard voiceprint feature vector stored for the user after successful registration, and calculate the distance between the two vectors to verify the user's identity.
Preferably, before step S1, the method further includes:
S01: when the user performs voiceprint registration under an IVR scenario, announce a random code of a second preset number of digits for the user to read aloud a preset number of times, and after each read-aloud establish acoustic models of the preset type for the announced random code and for the read-aloud speech respectively;
S02: perform, for each round, a forced overall alignment operation on the acoustic model of the announced random code and the acoustic model of the corresponding read-aloud speech, and calculate with the predetermined algorithm the probability that the two aligned acoustic models are identical;
S03: if every such probability exceeds a preset second threshold, extract the voiceprint feature vector of each read-aloud utterance and calculate the pairwise distances between these vectors, to analyze whether each read-aloud came from the same user;
S04: if so, store the voiceprint feature vector as the user's standard voiceprint feature vector.
Preferably, the acoustic model of the preset type is a deep neural network-hidden Markov model (DNN-HMM).
Preferably, the step of extracting the voiceprint feature vector of the user's read-aloud speech includes:
applying pre-emphasis and windowing to the user's read-aloud speech, performing a Fourier transform on each windowed frame to obtain the corresponding spectrum, and feeding the spectrum into a Mel filter bank to output the Mel spectrum;
performing cepstral analysis on the Mel spectrum to obtain Mel-frequency cepstral coefficients (MFCC), and composing the voiceprint feature vector of the user's read-aloud speech from the MFCCs.
Preferably, the step of calculating the distance between the voiceprint feature vector of the user's read-aloud speech and the standard voiceprint feature vector computes the cosine distance of the two vectors:
cos θ = (A · B) / (‖A‖ ‖B‖)
where A is the standard voiceprint feature vector and B is the voiceprint feature vector of the user's read-aloud speech.
The present invention also provides a computer-readable storage medium on which a processing system is stored; when executed by a processor, the processing system realizes the steps of the identity verification method described above.
The beneficial effects of the invention are as follows: when identity recognition is performed under an IVR scenario, having the user read a random code aloud effectively prevents fraud with pre-prepared synthesized speech; combining the random code with voiceprint recognition realizes double verification of the user's identity, confirms the identity accurately, and improves the security of identity verification under IVR scenarios. In addition, performing a forced overall alignment operation on the acoustic model of the announced random code and the acoustic model of the user's read-aloud speech reduces the amount of computation and improves recognition efficiency.
Description of the drawings
Fig. 1 is a schematic diagram of an optional application environment for each embodiment of the present invention;
Fig. 2 is a flow diagram of an embodiment of the identity verification method of the present invention.
Detailed description of the embodiments
To make the objects, technical solutions, and advantages of the present invention clearer, the invention is further elaborated below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described here are only used to explain the invention, not to limit it. Based on the embodiments of the present invention, all other embodiments obtained by those of ordinary skill in the art without creative work fall within the protection scope of the invention.
It should be noted that descriptions involving "first", "second", and the like in the present invention are used for description purposes only and cannot be interpreted as indicating or implying relative importance or implicitly indicating the quantity of the indicated technical features. Thus, a feature defined by "first" or "second" may explicitly or implicitly include at least one such feature. In addition, the technical solutions of the embodiments can be combined with each other, provided the combination can be implemented by a person of ordinary skill in the art; when a combination is contradictory or cannot be achieved, it should be understood that the combination does not exist and is not within the protection scope claimed by the present invention.
As shown in Fig. 1, which is a schematic diagram of the application environment of a preferred embodiment of the identity verification method of the present invention, the application environment includes an electronic device 1 and a terminal device. The electronic device 1 can exchange data with the terminal device through a suitable technology such as a network or near-field communication technology. In this embodiment, a user logs into the IVR system of the electronic device 1 through the terminal device to perform voiceprint registration and voiceprint recognition.
The terminal device includes, but is not limited to, any electronic product that can interact with a user through a keyboard, mouse, remote control, touchpad, voice-control device, or the like, for example a personal computer, tablet computer, smartphone, personal digital assistant (PDA), game machine, Internet Protocol Television (IPTV), smart wearable device, navigation device, or other mobile equipment, or a fixed terminal such as a digital TV, desktop computer, notebook, or server.
The electronic device 1 is a device that can automatically perform numerical computation and/or information processing according to preset or stored instructions. The electronic device 1 may be a computer, a single network server, a server group composed of multiple network servers, or a cloud composed of a large number of hosts or network servers based on cloud computing, where cloud computing is a kind of distributed computing: a super virtual computer composed of a group of loosely coupled computers.
In this embodiment, the electronic device 1 may include, but is not limited to, a memory 11, a processor 12, and a network interface 13 that can be communicatively connected through a system bus; the memory 11 stores a processing system that can run on the processor 12. It should be noted that Fig. 1 only illustrates the electronic device 1 with components 11-13; it should be understood that not all illustrated components are required, and more or fewer components may be implemented instead.
The memory 11 includes internal memory and at least one type of readable storage medium. The internal memory provides a cache for the operation of the electronic device 1; the readable storage medium may be a non-volatile storage medium such as a flash memory, hard disk, multimedia card, card-type memory (for example, SD or DX memory), random-access memory (RAM), static random-access memory (SRAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), programmable read-only memory (PROM), magnetic memory, magnetic disk, or optical disk. In some embodiments, the readable storage medium may be an internal storage unit of the electronic device 1, such as the hard disk of the electronic device 1; in other embodiments, the readable storage medium may be an external storage device of the electronic device 1, such as a plug-in hard disk, smart media card (SMC), secure digital (SD) card, or flash card equipped on the electronic device 1. In this embodiment, the readable storage medium of the memory 11 is generally used to store the operating system and various application software installed on the electronic device 1, such as the program code of the processing system in one embodiment of the invention. In addition, the memory 11 can also be used to temporarily store various types of data that have been output or are to be output.
The processor 12 may, in some embodiments, be a central processing unit (CPU), controller, microcontroller, microprocessor, or other data processing chip. The processor 12 is generally used to control the overall operation of the electronic device 1, such as performing control and processing related to data exchange or communication with the terminal device. In this embodiment, the processor 12 is used to run the program code stored in the memory 11 or to process data, for example to run the processing system.
The network interface 13 may include a wireless network interface or a wired network interface and is generally used to establish a communication connection between the electronic device 1 and other electronic equipment. In this embodiment, the network interface 13 is mainly used to connect the electronic device 1 with one or more terminal devices and to establish data transmission channels and communication connections between the electronic device 1 and the one or more terminal devices.
The processing system is stored in the memory 11 and includes at least one computer-readable instruction stored in the memory 11; the at least one computer-readable instruction can be executed by the processor 12 to realize the methods of the embodiments of the present application, and can be divided into different logical modules according to the functions its parts realize.
In one embodiment, when the above processing system is executed by the processor 12, the following steps are realized:
An acoustic-model establishment step: when a user transacts business under an IVR scenario, announce a random code of a first preset number of digits for the user to read aloud, and after the read-aloud establish acoustic models of a preset type for the announced random code and for the user's read-aloud speech respectively.
Under an IVR scenario, when requesting to transact business the user sends an identity code, such as an identity card number. After receiving the user's request, the system analyzes whether the business being handled needs further identity verification, and analyzes according to the identity code whether the user has a registered voiceprint. If further identity verification is needed and the user has a registered voiceprint, the system generates a random code of a first preset number of digits (for example, 8 digits), announces the random code in speech form using speech synthesis technology, and guides the user to read it aloud.
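The challenge step above can be sketched in Python. This is a minimal illustration under stated assumptions: the function names and the digit-by-digit prompt text are not from the patent, and a real IVR system would hand the prompt text to its speech synthesis engine rather than print it.

```python
import secrets

def generate_random_code(num_digits=8):
    """Generate a random numeric challenge code of the given length.

    Each digit is drawn independently from a cryptographic source, so the
    code cannot be predicted or pre-recorded by a caller.
    """
    return "".join(str(secrets.randbelow(10)) for _ in range(num_digits))

def prompt_text(code):
    """Build the announcement text handed to a TTS engine (hypothetical format)."""
    spaced = " ".join(code)  # read digit by digit, e.g. "3 1 4 1 ..."
    return f"Please read the following code aloud: {spaced}"

code = generate_random_code(8)
print(prompt_text(code))
```

Generating a fresh code per session is what defeats replayed or pre-synthesized audio: the attacker cannot know the digits in advance.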
After the user reads aloud, an acoustic model of the preset type is established for the speech of the announced random code, and an acoustic model of the preset type is established for the user's read-aloud speech. In a preferred embodiment, the acoustic model of the preset type is a deep neural network-hidden Markov acoustic model, i.e. a DNN-HMM acoustic model. In other embodiments, the acoustic model of the preset type may be another acoustic model, for example a hidden Markov acoustic model.
In a specific example, taking the DNN-HMM acoustic model: the HMM is used to describe the dynamic change of the speech signal, and each output node of the DNN is used to estimate the posterior probability of a state of a continuous-density HMM, which yields the DNN-HMM model. The speech of the announced random code and the user's read-aloud speech each consist of a series of syllables, which are recognized into words and then into a series of characters. When establishing the DNN-HMM acoustic models, this embodiment performs global character-level acoustic adaptive training based on a predetermined character speech library to obtain the DNN-HMM acoustic models of the announced random code's speech and of the user's read-aloud speech.
A forced overall alignment step: perform a forced overall alignment operation on the acoustic model of the announced random code and the acoustic model of the user's read-aloud speech, and calculate with a predetermined algorithm the probability that the two aligned acoustic models are identical.
Compared with the traditional method of comparing word by word, performing a forced overall alignment (force alignment) of the acoustic model of the announced random code with the acoustic model of the user's read-aloud speech substantially reduces the amount of computation in this embodiment, which is advantageous for improving recognition efficiency.
In one embodiment, the predetermined algorithm is a posterior probability algorithm. In other embodiments it may be a similarity algorithm, such as the edit distance between the characters in the two aligned acoustic models: the smaller the edit distance, the greater the probability that the two aligned acoustic models are identical. The similarity algorithm may also be a longest-common-subsequence algorithm: the closer the length of the obtained longest common subsequence is to the lengths of the character sequences in the two aligned acoustic models, the greater the probability that the two aligned acoustic models are identical.
An identity verification step: if the probability that the two aligned acoustic models are identical exceeds a preset first threshold, extract the voiceprint feature vector of the user's read-aloud speech, obtain the standard voiceprint feature vector stored for the user after successful registration, and calculate the distance between the two vectors to verify the user's identity.
In this embodiment, if the probability that the two aligned acoustic models are identical exceeds the preset first threshold (for example, a preset first threshold of 0.985), the characters read aloud by the user are deemed consistent with the random code announced this time. Because what is announced is a random code, fraud with pre-prepared synthesized speech is effectively prevented, which improves the security of identity recognition.
In one embodiment, the step of extracting the voiceprint feature vector of the user's read-aloud speech includes: applying pre-emphasis and windowing to the user's read-aloud speech, performing a Fourier transform on each windowed frame to obtain the corresponding spectrum, and feeding the spectrum into a Mel filter bank to output the Mel spectrum; then performing cepstral analysis on the Mel spectrum to obtain Mel-frequency cepstral coefficients (MFCC), and composing the voiceprint feature vector of the user's read-aloud speech from the MFCCs.
Specifically, the user's read-aloud speech is first framed, and pre-emphasis is then applied to the framed speech data. Pre-emphasis is in fact high-pass filtering that filters out low-frequency data so that the high-frequency characteristics of the speech data become more prominent; the transfer function of the high-pass filter is H(z) = 1 - αz^(-1), where z is the speech data and α is a constant coefficient, preferably with a value of 0.97. Because the framed speech deviates to some extent from the raw speech, windowing must then be applied to the speech data.
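A minimal NumPy sketch of the framing, pre-emphasis (y[t] = x[t] - αx[t-1] with α = 0.97, i.e. H(z) = 1 - αz^(-1)), and windowing just described. The frame length and hop size are assumed values typical for 16 kHz speech, not parameters from the patent.

```python
import numpy as np

def pre_emphasize(signal: np.ndarray, alpha: float = 0.97) -> np.ndarray:
    """Apply the high-pass filter H(z) = 1 - alpha*z^(-1): y[t] = x[t] - alpha*x[t-1]."""
    return np.append(signal[0], signal[1:] - alpha * signal[:-1])

def frame_and_window(signal: np.ndarray, frame_len: int = 400, hop: int = 160) -> np.ndarray:
    """Split into overlapping frames and apply a Hamming window to each frame
    (windowing compensates for the discontinuities introduced by framing)."""
    n_frames = 1 + max(0, (len(signal) - frame_len) // hop)
    window = np.hamming(frame_len)
    frames = np.stack([signal[i * hop : i * hop + frame_len] for i in range(n_frames)])
    return frames * window

rng = np.random.default_rng(0)
x = rng.standard_normal(16000)            # 1 s of placeholder "speech" at 16 kHz
frames = frame_and_window(pre_emphasize(x))
print(frames.shape)                        # (n_frames, 400)
```

Each row of `frames` is one windowed frame, ready for the Fourier transform and Mel filtering described above.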
In this embodiment, the cepstral analysis performed on the Mel spectrum consists, for example, of taking the logarithm and performing an inverse transform; the inverse transform is usually realized by a discrete cosine transform (DCT), and the 2nd through 13th coefficients after the DCT are taken as the Mel-frequency cepstral coefficients (MFCC). The MFCCs are the voiceprint features of one frame of speech data; the MFCCs of all frames compose a feature matrix, and this feature matrix is the voiceprint feature vector of the user's read-aloud speech.
This embodiment composes the voiceprint feature vector from the MFCCs of the speech data because the Mel-spaced frequency bands approximate the human auditory system more closely than the linearly spaced bands of the normal cepstrum, which can improve the accuracy of identity verification.
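The Mel-spectrum-to-MFCC path above (power spectrum, Mel filter bank, logarithm, then an inverse transform realized by a DCT, keeping the 2nd through 13th coefficients) can be sketched for a single frame as follows. The toy triangular filter bank and all parameter values here are assumptions for illustration only.

```python
import numpy as np

def mel_filterbank(n_filters=26, n_fft=512, sr=16000):
    """Toy triangular Mel filter bank (a simplified stand-in for a production one)."""
    def hz_to_mel(f): return 2595.0 * np.log10(1.0 + f / 700.0)
    def mel_to_hz(m): return 700.0 * (10.0 ** (m / 2595.0) - 1.0)
    mel_pts = np.linspace(hz_to_mel(0), hz_to_mel(sr / 2), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fbank = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        l, c, r = bins[i - 1], bins[i], bins[i + 1]
        for j in range(l, c):
            fbank[i - 1, j] = (j - l) / max(c - l, 1)   # rising edge
        for j in range(c, r):
            fbank[i - 1, j] = (r - j) / max(r - c, 1)   # falling edge
    return fbank

def mfcc(frame, sr=16000, n_fft=512, n_filters=26):
    """MFCCs of one windowed frame: power spectrum -> Mel filter bank -> log -> DCT,
    keeping the 2nd through 13th coefficients as in the embodiment above."""
    spec = np.abs(np.fft.rfft(frame, n_fft)) ** 2
    mel_energies = mel_filterbank(n_filters, n_fft, sr) @ spec
    log_mel = np.log(mel_energies + 1e-10)
    n = np.arange(n_filters)
    dct = np.array([np.sum(log_mel * np.cos(np.pi * k * (n + 0.5) / n_filters))
                    for k in range(n_filters)])          # DCT-II
    return dct[1:13]                                     # 2nd..13th coefficients

frame = np.hamming(400) * np.sin(2 * np.pi * 440 * np.arange(400) / 16000)
print(mfcc(frame).shape)  # (12,)
```

Stacking these 12-dimensional rows over all frames yields the feature matrix that the embodiment treats as the voiceprint feature vector.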
In one embodiment, the distance between the voiceprint feature vector of the user's read-aloud speech and the standard voiceprint feature vector is the cosine distance of the two, computed as:
cos θ = (A · B) / (‖A‖ ‖B‖)
where A is the standard voiceprint feature vector and B is the voiceprint feature vector of the user's read-aloud speech.
If the cosine distance is less than or equal to a preset distance threshold, identity verification passes; if the cosine distance is greater than the preset distance threshold, identity verification fails.
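A small sketch of the cosine-distance check just described, with the distance defined as 1 minus the cosine similarity; the threshold value 0.1 is an assumed placeholder, not a value from the patent.

```python
import numpy as np

def cosine_distance(a: np.ndarray, b: np.ndarray) -> float:
    """1 - cos(theta); 0 means identical direction, larger means less similar."""
    a, b = a.ravel(), b.ravel()
    cos_sim = float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
    return 1.0 - cos_sim

def verify(standard_vec, probe_vec, threshold=0.1):
    """Pass verification when the cosine distance does not exceed the threshold."""
    return cosine_distance(standard_vec, probe_vec) <= threshold

enrolled = np.array([1.0, 2.0, 3.0])     # stored standard voiceprint vector
same     = np.array([1.1, 2.0, 2.9])     # close to enrolled -> passes
other    = np.array([-3.0, 0.5, 1.0])    # far from enrolled -> fails
print(verify(enrolled, same), verify(enrolled, other))  # True False
```

Flattening with `ravel()` lets the same check apply when the "vector" is actually the per-frame MFCC matrix described earlier.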
In one embodiment, the standard voiceprint feature vector stored after successful registration is obtained through the following voiceprint registration steps:
when the user performs voiceprint registration under an IVR scenario, announce a random code of a second preset number of digits for the user to read aloud a preset number of times, and after each read-aloud establish acoustic models of the preset type for the announced random code and for the read-aloud speech respectively;
perform, for each round, a forced overall alignment operation on the acoustic model of the announced random code and the acoustic model of the corresponding read-aloud speech, and calculate with the predetermined algorithm the probability that the two aligned acoustic models are identical;
if every such probability exceeds a preset second threshold, extract the voiceprint feature vector of each read-aloud utterance and calculate the pairwise distances between these vectors, to analyze whether each read-aloud came from the same user;
if so, store the voiceprint feature vector as the user's standard voiceprint feature vector;
if not, prompt the user to re-enter and perform the voiceprint registration steps again.
In the IVR scenario, the user sends an identity code, such as an ID card number, when requesting registration. After receiving the user's request, a random code of the second preset number of digits is generated and announced in speech form using speech synthesis technology, and the user is guided to read it aloud the preset number of times (for example, 3 times). The second preset number of digits is, for example, 8.
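The random-code announcement during registration might be sketched as follows. `announce_for_registration` and its prompt strings are hypothetical, since the patent does not name a specific speech-synthesis API; a real IVR system would hand the code to its TTS engine.

```python
import secrets

def generate_random_code(n_digits=8):
    # Cryptographically strong digits, so an attacker cannot predict
    # the code and pre-record a matching utterance.
    return ''.join(secrets.choice('0123456789') for _ in range(n_digits))

def announce_for_registration(repeat_times=3):
    # Placeholder for the speech-synthesis announcement: one prompt per
    # required follow-reading of the same code.
    code = generate_random_code()
    prompts = [f"Please read aloud: {' '.join(code)}"] * repeat_times
    return code, prompts

code, prompts = announce_for_registration()
print(len(code), len(prompts))  # 8 3
```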
After each follow-reading, an acoustic model of the preset type is established for the voice of the announced random code, and an acoustic model of the preset type is established for the user's follow-read voice. In a preferred embodiment, the acoustic model of the preset type is a deep neural network-hidden Markov model, i.e., a DNN-HMM acoustic model. In other embodiments, it may be another acoustic model, for example a hidden Markov acoustic model. Specific examples can refer to the above embodiments and are not repeated here.
In a specific example, taking the DNN-HMM acoustic model, the HMM describes the dynamic variation of the speech signal, and each output node of the DNN estimates the posterior probability of a state of the continuous-density HMM, which yields the DNN-HMM model. The voice of each announced random code and the user's follow-read voice are each a series of syllables which, once recognized as words, form a series of characters. When establishing the DNN-HMM acoustic models, this embodiment starts from a predetermined character speech library and obtains, through global character-acoustic adaptive training, the DNN-HMM acoustic model of the announced random code's voice and the DNN-HMM acoustic model of the user's follow-read voice.
Performing a forced overall alignment (force alignment) operation on the acoustic model of each announced random code and the acoustic model of the user's follow-read voice substantially reduces the amount of computation compared with the traditional word-by-word comparison method, which helps improve recognition efficiency.
In one embodiment, the predetermined algorithm is a posterior probability algorithm; in other embodiments, it may also be a similarity algorithm. Specific examples can refer to the above embodiments and are not repeated here.
In this embodiment, if the probability that the two aligned acoustic models are identical exceeds the preset second threshold for every follow-reading (for example, a second threshold of 0.985), the characters read aloud by the user are considered consistent with the announced random code each time. Because the announced code is random, spoofing with voice synthesized by the user in advance is effectively prevented, which improves the security of identity recognition.
In one embodiment, the step of extracting the voiceprint feature vector of each follow-read voice is essentially the same as the method of extracting the voiceprint feature vector of a voice in the above embodiment and is not repeated here.
In one embodiment, the step of calculating the pairwise distances between the voiceprint feature vectors is essentially the same as the cosine distance calculation step described above and is not repeated here.
If every cosine distance is less than or equal to the preset distance threshold, the user reading aloud is the same user each time, and the voiceprint feature vector is stored as that user's standard voiceprint feature vector; if any cosine distance is greater than the preset distance threshold, the user reading aloud is not the same user each time, and the user is prompted to re-register.
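The pairwise consistency check over the enrollment readings can be sketched as below; the cosine-distance convention and the 0.4 threshold are illustrative assumptions, not values from the patent.

```python
from itertools import combinations
import numpy as np

def same_speaker(vectors, threshold=0.4):
    # Registration succeeds only if every pair of follow-read voiceprint
    # vectors is within the preset distance threshold.
    for a, b in combinations(vectors, 2):
        sim = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
        if 1.0 - sim > threshold:
            return False  # some pair too far apart: prompt re-registration
    return True

v = np.array([1.0, 0.5, 0.2])
enrol = [v, v * 1.1, v * 0.9]        # three consistent readings
print(same_speaker(enrol))           # True
print(same_speaker(enrol + [-v]))    # False: one reading points the other way
```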
Compared with the prior art, when the present invention performs identity recognition in the IVR scenario, having the user read a random code aloud effectively prevents spoofing with pre-prepared synthesized voice, and combining the random code with voiceprint recognition realizes double verification of the user's identity. The user's identity can therefore be confirmed accurately, improving the security of identity verification in the IVR scenario. In addition, performing the forced overall alignment operation on the acoustic model of the announced random code and the acoustic model of the user's follow-read voice reduces the amount of computation and improves recognition efficiency.
As shown in Fig. 2, which is a flow diagram of an embodiment of the identity verification method of the present invention, the identity verification method includes the following steps:
Step S1: in the interactive voice response (IVR) scenario, when the user transacts business, announcing a random code of a first preset number of digits for the user to read aloud, and after the follow-reading, establishing acoustic models of a preset type for the currently announced random code and for the user's current follow-read voice respectively;
In the IVR scenario, the user sends an identity code, such as an ID card number, when requesting to transact business. After receiving the user's request, the system analyzes whether the business being transacted requires further identity verification and, according to the user's identity code, whether the user has a registered voiceprint. If further identity verification is required and the user has a registered voiceprint, a random code of the first preset number of digits is generated and announced in speech form using speech synthesis technology, and the user is guided to read it aloud. The first preset number of digits is, for example, 8.
After the user's follow-reading, an acoustic model of the preset type is established for the voice of the currently announced random code, and an acoustic model of the preset type is established for the user's current follow-read voice. In a preferred embodiment, the acoustic model of the preset type is a deep neural network-hidden Markov model, i.e., a DNN-HMM acoustic model. In other embodiments, it may be another acoustic model, for example a hidden Markov acoustic model.
In a specific example, taking the DNN-HMM acoustic model, the HMM describes the dynamic variation of the speech signal, and each output node of the DNN estimates the posterior probability of a state of the continuous-density HMM, which yields the DNN-HMM model. The voice of the currently announced random code and the user's current follow-read voice are each a series of syllables which, once recognized as words, form a series of characters. When establishing the DNN-HMM acoustic models, this embodiment starts from a predetermined character speech library and obtains, through global character-acoustic adaptive training, the DNN-HMM acoustic model of the currently announced random code's voice and the DNN-HMM acoustic model of the user's current follow-read voice.
Step S2: performing a forced overall alignment operation on the acoustic model of the currently announced random code and the acoustic model of the user's current follow-read voice, and calculating, with a predetermined algorithm, the probability that the two aligned acoustic models are identical;
Performing a forced overall alignment (force alignment) operation on the acoustic model of the currently announced random code and the acoustic model of the user's current follow-read voice substantially reduces the amount of computation compared with the traditional word-by-word comparison method, which helps improve recognition efficiency.
In one embodiment, the predetermined algorithm is a posterior probability algorithm; in other embodiments, it may also be a similarity algorithm. For example, the similarity algorithm may compute the edit distance between the characters of the two aligned acoustic models: the smaller the edit distance, the greater the probability that the two aligned acoustic models are identical. The similarity algorithm may also be a longest-common-subsequence algorithm: the smaller the difference between the length of the obtained longest common subsequence and the character lengths of the two aligned acoustic models, the greater the probability that the two aligned acoustic models are identical.
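The two similarity measures mentioned above can be sketched as plain dynamic programs; mapping either score to an "identical probability" is application-specific and not fixed by the text.

```python
def edit_distance(s, t):
    # Levenshtein distance: a smaller distance means the two character
    # sequences are more likely identical.
    m, n = len(s), len(t)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        dp[i][0] = i
    for j in range(n + 1):
        dp[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if s[i - 1] == t[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,          # deletion
                           dp[i][j - 1] + 1,          # insertion
                           dp[i - 1][j - 1] + cost)   # substitution
    return dp[m][n]

def lcs_length(s, t):
    # Longest common subsequence: a length close to both inputs' lengths
    # indicates the sequences are nearly identical.
    m, n = len(s), len(t)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            dp[i][j] = (dp[i - 1][j - 1] + 1 if s[i - 1] == t[j - 1]
                        else max(dp[i - 1][j], dp[i][j - 1]))
    return dp[m][n]

print(edit_distance("80312467", "80312467"))  # 0
print(edit_distance("80312467", "80312567"))  # 1
print(lcs_length("80312467", "80312567"))     # 7
```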
Step S3: if the probability that the two aligned acoustic models are identical is greater than a preset first threshold, extracting the voiceprint feature vector of the user's current follow-read voice, obtaining the standard voiceprint feature vector pre-stored after the user registered successfully, and calculating the distance between the voiceprint feature vector of the user's current follow-read voice and the standard voiceprint feature vector so as to verify the user's identity.
In this embodiment, if the probability that the two aligned acoustic models are identical is greater than the preset first threshold (for example, a first threshold of 0.985), the characters read aloud by the user are considered consistent with the currently announced random code. Because the announced code is random, spoofing with voice synthesized by the user in advance is effectively prevented, which improves the security of identity recognition.
In one embodiment, the step of extracting the voiceprint feature vector of the user's current follow-read voice includes: performing pre-emphasis and windowing on the user's current follow-read voice, applying a Fourier transform to each windowed frame to obtain the corresponding spectrum, and feeding the spectrum into a Mel filter bank, whose output is the Mel spectrum; performing cepstral analysis on the Mel spectrum to obtain the Mel-frequency cepstral coefficients (MFCC), and forming the voiceprint feature vector of the user's current follow-read voice based on the MFCCs.
The user's current follow-read voice is first divided into frames, and pre-emphasis is then applied to the framed voice data. Pre-emphasis is essentially a high-pass filtering operation: it attenuates low-frequency components so that the high-frequency characteristics of the voice data stand out more clearly. Specifically, the transfer function of the high-pass filter is H(Z) = 1 - αZ⁻¹, where Z is the voice data and α is a constant factor, preferably with a value of 0.97. Since framing causes the speech to deviate from the original speech to a certain extent, windowing is then applied to the framed voice data.
In the present embodiment, the cepstral analysis performed on the Mel spectrum consists, for example, of taking the logarithm and applying an inverse transform. The inverse transform is usually realized by a discrete cosine transform (DCT), and the 2nd through 13th DCT coefficients are taken as the Mel-frequency cepstral coefficients (MFCC). The MFCCs are the voiceprint features of one frame of voice data; the per-frame MFCCs are assembled into a feature matrix, and this feature matrix is the voiceprint feature vector of the user's current follow-read voice.
This embodiment uses the MFCCs of the voice data to form the corresponding voiceprint feature vector. Because the Mel-scale frequency bands approximate the human auditory system more closely than the linearly spaced bands of the normal cepstrum, the accuracy of identity verification can be improved.
In one embodiment, the distance between the voiceprint feature vector of the user's current follow-read voice and the standard voiceprint feature vector is the cosine distance of the two, computed from:

cos(A, B) = (A · B) / (‖A‖ ‖B‖)

where A is the standard voiceprint feature vector and B is the voiceprint feature vector of the user's current follow-read voice.

If the cosine distance is less than or equal to a preset distance threshold, identity verification passes; if the cosine distance is greater than the preset distance threshold, identity verification fails.
In one embodiment, the standard voiceprint feature vector is pre-stored after the user registers successfully. The voiceprint registration step includes:

when the user performs voiceprint registration in the interactive voice response (IVR) scenario, announcing a random code of a second preset number of digits for the user to read aloud a preset number of times, and after each follow-reading, establishing acoustic models of a preset type for the announced random code and for the user's follow-read voice respectively;

performing a forced overall alignment operation on the acoustic model of each announced random code and the acoustic model of the corresponding user's follow-read voice, and calculating, with a predetermined algorithm, the probability that the two aligned acoustic models are identical;

if the probability that the two aligned acoustic models are identical exceeds a preset second threshold for every follow-reading, extracting the voiceprint feature vector of each follow-read voice and calculating the pairwise distances between the voiceprint feature vectors, to analyze whether the user reading aloud is the same user each time;

if so, storing that voiceprint feature vector as the user's standard voiceprint feature vector;

if not, prompting the user to re-enter and performing the voiceprint registration step again.
In the IVR scenario, the user sends an identity code, such as an ID card number, when requesting registration. After receiving the user's request, a random code of the second preset number of digits is generated and announced in speech form using speech synthesis technology, and the user is guided to read it aloud the preset number of times (for example, 3 times). The second preset number of digits is, for example, 8.
After each follow-reading, an acoustic model of the preset type is established for the voice of the announced random code, and an acoustic model of the preset type is established for the user's follow-read voice. In a preferred embodiment, the acoustic model of the preset type is a deep neural network-hidden Markov model, i.e., a DNN-HMM acoustic model. In other embodiments, it may be another acoustic model, for example a hidden Markov acoustic model. Specific examples can refer to the above embodiments and are not repeated here.
In a specific example, taking the DNN-HMM acoustic model, the HMM describes the dynamic variation of the speech signal, and each output node of the DNN estimates the posterior probability of a state of the continuous-density HMM, which yields the DNN-HMM model. The voice of each announced random code and the user's follow-read voice are each a series of syllables which, once recognized as words, form a series of characters. When establishing the DNN-HMM acoustic models, this embodiment starts from a predetermined character speech library and obtains, through global character-acoustic adaptive training, the DNN-HMM acoustic model of the announced random code's voice and the DNN-HMM acoustic model of the user's follow-read voice.
Performing a forced overall alignment (force alignment) operation on the acoustic model of each announced random code and the acoustic model of the user's follow-read voice substantially reduces the amount of computation compared with the traditional word-by-word comparison method, which helps improve recognition efficiency.
In one embodiment, the predetermined algorithm is a posterior probability algorithm; in other embodiments, it may also be a similarity algorithm. Specific examples can refer to the above embodiments and are not repeated here.
In this embodiment, if the probability that the two aligned acoustic models are identical exceeds the preset second threshold for every follow-reading (for example, a second threshold of 0.985), the characters read aloud by the user are considered consistent with the announced random code each time. Because the announced code is random, spoofing with voice synthesized by the user in advance is effectively prevented, which improves the security of identity recognition.
In one embodiment, the step of extracting the voiceprint feature vector of each follow-read voice is essentially the same as the method of extracting the voiceprint feature vector of a voice in the above embodiment and is not repeated here.
In one embodiment, the step of calculating the pairwise distances between the voiceprint feature vectors is essentially the same as the cosine distance calculation step described above and is not repeated here.
If every cosine distance is less than or equal to the preset distance threshold, the user reading aloud is the same user each time, and the voiceprint feature vector is stored as that user's standard voiceprint feature vector; if any cosine distance is greater than the preset distance threshold, the user reading aloud is not the same user each time, and the user is prompted to re-register.
The present invention also provides a computer-readable storage medium on which a processing system is stored; when executed by a processor, the processing system implements the steps of the identity verification method described above.
The serial numbers of the above embodiments of the present invention are for description only and do not represent the relative merits of the embodiments.
Through the above description of the embodiments, those skilled in the art will clearly understand that the methods of the above embodiments can be implemented by software together with the necessary general-purpose hardware platform, or of course by hardware, although in many cases the former is the better implementation. Based on this understanding, the technical solution of the present invention, or the part of it that contributes over the prior art, can be embodied in the form of a software product. The software product is stored in a storage medium (such as ROM/RAM, a magnetic disk, or an optical disc) and includes instructions that cause a terminal device (which may be a mobile phone, a computer, a server, an air conditioner, a network device, or the like) to execute the methods described in the embodiments of the present invention.
The above are only preferred embodiments of the present invention and are not intended to limit its scope. Any equivalent structural or process transformation made using the contents of the specification and drawings of the present invention, applied directly or indirectly in other related technical fields, is likewise included within the scope of protection of the present invention.
Claims (10)
1. An electronic device, characterized in that the electronic device comprises a memory and a processor connected to the memory, the memory storing a processing system runnable on the processor, the processing system implementing the following steps when executed by the processor:

an acoustic model establishment step: in the interactive voice response (IVR) scenario, when the user transacts business, announcing a random code of a first preset number of digits for the user to read aloud, and after the follow-reading, establishing acoustic models of a preset type for the currently announced random code and for the user's current follow-read voice respectively;

a forced overall alignment step: performing a forced overall alignment operation on the acoustic model of the currently announced random code and the acoustic model of the user's current follow-read voice, and calculating, with a predetermined algorithm, the probability that the two aligned acoustic models are identical;

an identity verification step: if the probability that the two aligned acoustic models are identical is greater than a preset first threshold, extracting the voiceprint feature vector of the user's current follow-read voice, obtaining the standard voiceprint feature vector pre-stored after the user registered successfully, and calculating the distance between the voiceprint feature vector of the user's current follow-read voice and the standard voiceprint feature vector so as to verify the user's identity.
2. The electronic device according to claim 1, characterized in that the processing system, when executed by the processor, further implements the following steps:

when the user performs voiceprint registration in the interactive voice response (IVR) scenario, announcing a random code of a second preset number of digits for the user to read aloud a preset number of times, and after each follow-reading, establishing acoustic models of the preset type for the announced random code and for the user's follow-read voice respectively;

performing a forced overall alignment operation on the acoustic model of each announced random code and the acoustic model of the corresponding user's follow-read voice, and calculating, with a predetermined algorithm, the probability that the two aligned acoustic models are identical;

if the probability that the two aligned acoustic models are identical exceeds a preset second threshold for every follow-reading, extracting the voiceprint feature vector of each follow-read voice and calculating the pairwise distances between the voiceprint feature vectors, to analyze whether the user reading aloud is the same user each time;

if so, storing that voiceprint feature vector as the user's standard voiceprint feature vector.
3. The electronic device according to claim 1 or 2, characterized in that the acoustic model of the preset type is a deep neural network-hidden Markov model.
4. The electronic device according to claim 1 or 2, characterized in that the step of extracting the voiceprint feature vector of the user's current follow-read voice comprises:

performing pre-emphasis and windowing on the user's current follow-read voice, applying a Fourier transform to each windowed frame to obtain the corresponding spectrum, and feeding the spectrum into a Mel filter bank, whose output is the Mel spectrum;

performing cepstral analysis on the Mel spectrum to obtain the Mel-frequency cepstral coefficients (MFCC), and forming the voiceprint feature vector of the user's current follow-read voice based on the MFCCs.
5. A method of identity verification, characterized in that the method comprises:

S1: in the interactive voice response (IVR) scenario, when the user transacts business, announcing a random code of a first preset number of digits for the user to read aloud, and after the follow-reading, establishing acoustic models of a preset type for the currently announced random code and for the user's current follow-read voice respectively;

S2: performing a forced overall alignment operation on the acoustic model of the currently announced random code and the acoustic model of the user's current follow-read voice, and calculating, with a predetermined algorithm, the probability that the two aligned acoustic models are identical;

S3: if the probability that the two aligned acoustic models are identical is greater than a preset first threshold, extracting the voiceprint feature vector of the user's current follow-read voice, obtaining the standard voiceprint feature vector pre-stored after the user registered successfully, and calculating the distance between the voiceprint feature vector of the user's current follow-read voice and the standard voiceprint feature vector so as to verify the user's identity.
6. The method of identity verification according to claim 5, characterized in that, before step S1, the method further comprises:

S01: when the user performs voiceprint registration in the interactive voice response (IVR) scenario, announcing a random code of a second preset number of digits for the user to read aloud a preset number of times, and after each follow-reading, establishing acoustic models of the preset type for the announced random code and for the user's follow-read voice respectively;

S02: performing a forced overall alignment operation on the acoustic model of each announced random code and the acoustic model of the corresponding user's follow-read voice, and calculating, with a predetermined algorithm, the probability that the two aligned acoustic models are identical;

S03: if the probability that the two aligned acoustic models are identical exceeds a preset second threshold for every follow-reading, extracting the voiceprint feature vector of each follow-read voice and calculating the pairwise distances between the voiceprint feature vectors, to analyze whether the user reading aloud is the same user each time;

S04: if so, storing that voiceprint feature vector as the user's standard voiceprint feature vector.
7. The method of identity verification according to claim 5 or 6, characterized in that the acoustic model of the preset type is a deep neural network-hidden Markov model.
8. The method of identity verification according to claim 5 or 6, characterized in that the step of extracting the voiceprint feature vector of the user's current follow-read voice comprises:

performing pre-emphasis and windowing on the user's current follow-read voice, applying a Fourier transform to each windowed frame to obtain the corresponding spectrum, and feeding the spectrum into a Mel filter bank, whose output is the Mel spectrum;

performing cepstral analysis on the Mel spectrum to obtain the Mel-frequency cepstral coefficients (MFCC), and forming the voiceprint feature vector of the user's current follow-read voice based on the MFCCs.
9. The method of identity verification according to claim 5 or 6, characterized in that the step of calculating the distance between the voiceprint feature vector of the user's current follow-read voice and the standard voiceprint feature vector comprises computing the cosine distance:

cos(A, B) = (A · B) / (‖A‖ ‖B‖)

where A is the standard voiceprint feature vector and B is the voiceprint feature vector of the user's current follow-read voice.
10. A computer-readable storage medium, characterized in that a processing system is stored on the computer-readable storage medium, and when the processing system is executed by a processor, the steps of the method of identity verification according to any one of claims 5 to 9 are implemented.
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810311721.2A CN108694952B (en) | 2018-04-09 | 2018-04-09 | Electronic device, identity authentication method and storage medium |
PCT/CN2018/102208 WO2019196305A1 (en) | 2018-04-09 | 2018-08-24 | Electronic device, identity verification method, and storage medium |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810311721.2A CN108694952B (en) | 2018-04-09 | 2018-04-09 | Electronic device, identity authentication method and storage medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN108694952A true CN108694952A (en) | 2018-10-23 |
CN108694952B CN108694952B (en) | 2020-04-28 |
Family
ID=63844884
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810311721.2A Active CN108694952B (en) | 2018-04-09 | 2018-04-09 | Electronic device, identity authentication method and storage medium |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN108694952B (en) |
WO (1) | WO2019196305A1 (en) |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103680497A (en) * | 2012-08-31 | 2014-03-26 | 百度在线网络技术(北京)有限公司 | Voice recognition system and voice recognition method based on video |
CN103986725A (en) * | 2014-05-29 | 2014-08-13 | 中国农业银行股份有限公司 | Client side, server side and identity authentication system and method |
CN107517207A (en) * | 2017-03-13 | 2017-12-26 | 平安科技(深圳)有限公司 | Server, auth method and computer-readable recording medium |
2018
- 2018-04-09: CN application CN201810311721.2A filed (granted as CN108694952B, active)
- 2018-08-24: PCT application PCT/CN2018/102208 filed (WO2019196305A1)
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109448732A (en) * | 2018-12-27 | 2019-03-08 | 科大讯飞股份有限公司 | A kind of digit string processing method and processing device |
CN109448732B (en) * | 2018-12-27 | 2021-06-08 | 科大讯飞股份有限公司 | Digital string voice processing method and device |
CN110536029A (en) * | 2019-08-15 | 2019-12-03 | 咪咕音乐有限公司 | A kind of exchange method, network side equipment, terminal device, storage medium and system |
CN110491393A (en) * | 2019-08-30 | 2019-11-22 | 科大讯飞股份有限公司 | The training method and relevant apparatus of vocal print characterization model |
CN110491393B (en) * | 2019-08-30 | 2022-04-22 | 科大讯飞股份有限公司 | Training method of voiceprint representation model and related device |
CN111161746A (en) * | 2019-12-31 | 2020-05-15 | 苏州思必驰信息科技有限公司 | Voiceprint registration method and system |
Also Published As
Publication number | Publication date |
---|---|
WO2019196305A1 (en) | 2019-10-17 |
CN108694952B (en) | 2020-04-28 |
CN115223569A (en) | Speaker verification method based on deep neural network, terminal and storage medium | |
CN116403585A (en) | Outbound customer identification method and system based on robustness characteristics |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||