CN110223676A - Optimization method and system for a spoofed-recording detection neural network model - Google Patents

Optimization method and system for a spoofed-recording detection neural network model

Info

Publication number
CN110223676A
CN110223676A CN201910516188.8A
Authority
CN
China
Prior art keywords
data
domain
feature extractor
deception
loss function
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910516188.8A
Other languages
Chinese (zh)
Inventor
俞凯
钱彦旻
王鸿基
丁翰林
王帅
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
AI Speech Ltd
Original Assignee
Shanghai Jiaotong University
AI Speech Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Jiaotong University and AI Speech Ltd
Priority to CN201910516188.8A
Publication of CN110223676A
Legal status: Pending (current)

Classifications

    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00: Speech recognition
    • G10L15/02: Feature extraction for speech recognition; Selection of recognition unit
    • G10L15/06: Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
    • G10L15/063: Training
    • G10L15/08: Speech classification or search
    • G10L15/16: Speech classification or search using artificial neural networks

Abstract

An embodiment of the present invention provides an optimization method for a spoofed-recording detection neural network model. The method comprises: constructing a spoofed-recording detection neural network model from a feature extractor, a spoofing detector and a domain predictor; inputting source-domain data and target-domain data into the feature extractor; inputting the output of the feature extractor into the spoofing detector and the domain predictor respectively, and training the spoofed-recording detection neural network model so as to reduce the loss function value of the spoofing detector and the loss function value of the domain predictor; and performing adversarial training on the feature extractor based on the reduced loss function value of the domain predictor, so that the deep features the feature extractor outputs to the spoofing detector are domain-invariant and discriminative for spoofing detection. An embodiment of the present invention also provides an optimization system for a spoofed-recording detection neural network model. The optimized model of the embodiments no longer relies on the ability to distinguish domains in replay attack detection, which improves the generalization performance of cross-domain testing.

Description

Optimization method and system for a spoofed-recording detection neural network model
Technical field
The present invention relates to the field of audio detection, and in particular to an optimization method and system for a spoofed-recording detection neural network model.
Background art
Because of the convenience and reliability of identity authentication, and because deep neural networks have brought major progress to ASV (Automatic Speaker Verification), ASV has been commercialized in applications such as call centers and telephone banking. However, the vulnerability of ASV technology is easily exposed when ASV systems face various spoofed-speech attacks.
Replay spoofing attack detection is commonly used in speaker recognition systems to determine whether an input audio is a replay attack or genuine audio, so as to protect the ASV system from malicious spoofing attacks. With front-end features extracted from the audio, a trained deep learning model discriminates well within the corresponding domain.
In the course of implementing the present invention, the inventors found at least the following problems in the related art:
As noted above, these techniques all perform well on the same dataset (domain); however, when tested across datasets (cross-domain), performance drops sharply. Within the same dataset or domain, the recording configurations (e.g. playback device, recording device and playback environment) are similar, so the replay attacks resemble one another; across datasets the recording configurations differ considerably, that is, the replay attacks differ markedly. Because the above techniques overfit the training set, they lack good generalization for replay attack types that do not appear in the training set. Therefore, in cross-dataset domain testing, the large data-distribution mismatch between the source-domain training set and the target-domain test set greatly reduces detection performance.
Summary of the invention
Embodiments of the present invention aim to at least solve the following problem in the prior art: in practice it is often hard to predict which domain a spoofed recording comes from, and the domain recognized by a spoofed-recording detection neural network model trained on a single training set often mismatches the domain of the spoofed recordings actually encountered, so that the model often detects poorly on spoofed recordings from a different domain.
In a first aspect, an embodiment of the present invention provides an optimization method for a spoofed-recording detection neural network model, comprising:
constructing a spoofed-recording detection neural network model from a feature extractor, a spoofing detector and a domain predictor, wherein the feature extractor and the spoofing detector constitute a first branch, and the feature extractor and the domain predictor constitute a second branch;
inputting source-domain data and target-domain data into the feature extractor as input samples, wherein the source-domain data carry a spoofing label and a domain label, and the target-domain data carry a domain label;
inputting the output of the feature extractor into the spoofing detector and the domain predictor respectively, and training the spoofed-recording detection neural network model so as to reduce the loss function value of the spoofing detector and reduce the loss function value of the domain predictor;
performing adversarial training on the feature extractor based on the reduced loss function value of the domain predictor, so that the deep features the feature extractor outputs to the spoofing detector are domain-invariant and discriminative for spoofing detection.
In a second aspect, an embodiment of the present invention provides an optimization system for a spoofed-recording detection neural network model, comprising:
a network model construction program module, configured to construct a spoofed-recording detection neural network model from a feature extractor, a spoofing detector and a domain predictor, wherein the feature extractor and the spoofing detector constitute a first branch, and the feature extractor and the domain predictor constitute a second branch;
a feature extraction program module, configured to input source-domain data and target-domain data into the feature extractor as input samples, wherein the source-domain data carry a spoofing label and a domain label, and the target-domain data carry a domain label;
a loss function optimization program module, configured to input the output of the feature extractor into the spoofing detector and the domain predictor respectively, and to train the spoofed-recording detection neural network model so as to reduce the loss function value of the spoofing detector and reduce the loss function value of the domain predictor;
a model optimization program module, configured to perform adversarial training on the feature extractor based on the reduced loss function value of the domain predictor, so that the deep features the feature extractor outputs to the spoofing detector are domain-invariant and discriminative for spoofing detection.
In a third aspect, an electronic device is provided, comprising: at least one processor, and a memory communicatively connected to the at least one processor, wherein the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor so that the at least one processor can perform the steps of the optimization method for a spoofed-recording detection neural network model of any embodiment of the present invention.
In a fourth aspect, an embodiment of the present invention provides a storage medium on which a computer program is stored, wherein the program, when executed by a processor, implements the steps of the optimization method for a spoofed-recording detection neural network model of any embodiment of the present invention.
The beneficial effects of the embodiments of the present invention are as follows: to reduce the performance degradation in cross-domain testing, a framework for an optimized spoofed-recording detection neural network model is proposed, in which an additional domain-prediction output is added on top of a conventional neural network model. Through adversarial training between the feature extractor and the domain predictor, the model finally learns deep features that are discriminative for replay attack detection but carry no discriminative power for domain prediction, which improves the generalization performance of cross-domain testing and solves the problem of poor detection performance in cross-dataset domain testing.
Detailed description of the invention
To explain the technical solutions in the embodiments of the present invention or in the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present invention; a person of ordinary skill in the art can obtain other drawings from them without creative effort.
Fig. 1 is a flowchart of an optimization method for a spoofed-recording detection neural network model provided by an embodiment of the present invention;
Fig. 2 is a schematic diagram of the cross-domain replay spoofing attack detection framework based on domain adversarial training on which the optimization method of an embodiment of the present invention is based;
Fig. 3 is a table of utterance counts in the ASVspoof 2017 V.2 dataset and the BTAS-PA 2016 dataset used by the optimization method of an embodiment of the present invention;
Fig. 4 is a table of the topology parameters of the LCNN model used by the optimization method of an embodiment of the present invention;
Fig. 5 is a table of the EER (%) of the baseline LCNN model and the proposed LCNN-DAT model on A-dev, A-eval, B-dev and B-eval for the optimization method of an embodiment of the present invention;
Fig. 6 is a chart of the EER of LCNN and LCNN-DAT models trained on different training data for the optimization method of an embodiment of the present invention;
Fig. 7 is a schematic structural diagram of an optimization system for a spoofed-recording detection neural network model provided by an embodiment of the present invention.
Specific embodiment
To make the objectives, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. Based on the embodiments of the present invention, all other embodiments obtained by a person of ordinary skill in the art without creative effort shall fall within the protection scope of the present invention.
Fig. 1 shows a flowchart of an optimization method for a spoofed-recording detection neural network model provided by an embodiment of the present invention, which comprises the following steps:
S11: constructing a spoofed-recording detection neural network model from a feature extractor, a spoofing detector and a domain predictor, wherein the feature extractor and the spoofing detector constitute a first branch, and the feature extractor and the domain predictor constitute a second branch;
S12: inputting source-domain data and target-domain data into the feature extractor as input samples, wherein the source-domain data carry a spoofing label and a domain label, and the target-domain data carry a domain label;
S13: inputting the output of the feature extractor into the spoofing detector and the domain predictor respectively, and training the spoofed-recording detection neural network model so as to reduce the loss function value of the spoofing detector and reduce the loss function value of the domain predictor;
S14: performing adversarial training on the feature extractor based on the reduced loss function value of the domain predictor, so that the deep features the feature extractor outputs to the spoofing detector are domain-invariant and discriminative for spoofing detection.
In this embodiment, the term "domain" can be understood as follows: the configuration of a recorded audio includes the "playback device" (the device used to play back the original audio), the "recording device" (the device used to record the audio) and the "recording environment" (the ambient environment where the audio was played, such as an office or a restaurant). Within the same dataset these configurations are relatively similar, whereas across different datasets they are much less similar. Intuitively, not every audio in the same dataset uses a different playback environment and device (environments and devices are often reused), while the overlap between two datasets may be almost zero, e.g. the playback environments and devices used may not overlap at all. In other words, the cross-dataset domain difference is huge, much larger than the differences seen when testing within the same dataset.
For step S11, a spoofed-recording detection neural network model is constructed from a feature extractor, a spoofing detector and a domain predictor. A conventional deep neural network for replay attack detection generally comprises two components: a feature extractor intended to discover discriminative features, and a spoofing detector that maps the features to spoofing labels indicating whether the input is a spoofing attack or genuine speech.
To mitigate the influence of domain mismatch, an architecture is proposed that can learn deep features which support replay attack detection but cannot distinguish between different domains. Unlike a conventional neural network, a new branch connection is established after the feature extractor through a gradient reversal layer and serves as a domain classifier. The first branch comprises the feature extractor and the spoofing detector and constitutes a standard feed-forward architecture. The second branch shares the feature extractor of the first branch and connects to a domain classifier through a gradient reversal layer.
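Purely as an illustration, the following PyTorch sketch shows one way such a two-branch model could be wired together; the class and function names (SpoofDATModel, grad_reverse), the layer sizes and the use of simple linear layers are assumptions for this example and do not reproduce the exact network of the embodiment.

```python
import torch
import torch.nn as nn


class GradReverse(torch.autograd.Function):
    """Identity in the forward pass; multiplies the gradient by -lambda in the backward pass."""

    @staticmethod
    def forward(ctx, x, lambd):
        ctx.lambd = lambd
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lambd * grad_output, None


def grad_reverse(x, lambd=1.0):
    return GradReverse.apply(x, lambd)


class SpoofDATModel(nn.Module):
    """A feature extractor shared by a spoofing detector (branch 1) and a domain classifier (branch 2)."""

    def __init__(self, feat_dim=257, hidden_dim=128):
        super().__init__()
        self.feature_extractor = nn.Sequential(
            nn.Linear(feat_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, hidden_dim), nn.ReLU(),
        )
        self.spoof_detector = nn.Linear(hidden_dim, 2)     # genuine vs. replay
        self.domain_classifier = nn.Linear(hidden_dim, 2)  # source vs. target

    def forward(self, x, lambd=1.0):
        f = self.feature_extractor(x)
        y = self.spoof_detector(f)                           # branch 1: spoofing label
        d = self.domain_classifier(grad_reverse(f, lambd))   # branch 2: domain label via GRL
        return y, d


if __name__ == "__main__":
    model = SpoofDATModel()
    x = torch.randn(8, 257)                 # a toy batch of frame-averaged features
    y_logits, d_logits = model(x)
    print(y_logits.shape, d_logits.shape)   # torch.Size([8, 2]) torch.Size([8, 2])
```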
For step S12, the prepared source-domain data and target-domain data are input into the feature extractor as input samples; the spoofing labels and domain labels of the source-domain data and target-domain data are already known when the data are collected.
For step S13, for the "source-domain data", the spoofing prediction loss is computed by the spoofing detector and, at the same time, the domain prediction loss is computed by the domain predictor; for the "target-domain data", only the domain prediction loss of the domain predictor needs to be computed, because these data carry no spoofing labels. Then, by training the spoofed-recording detection neural network model, the loss function value of the spoofing detector is reduced and the loss function value of the domain predictor is reduced.
For step S14, adversarial training is performed on the feature extractor based on the reduced loss function value of the domain predictor; through the adversarial training, the feature extractor after training no longer has the ability to distinguish domain features.
It can be seen from this embodiment that, to reduce the performance degradation in cross-domain testing, a framework for an optimized spoofed-recording detection neural network model is proposed, in which an additional domain-prediction output is added on top of a conventional neural network model. Through adversarial training between the feature extractor and the domain predictor, the model finally learns deep features that are discriminative for replay attack detection but carry no discriminative power for domain prediction, which improves the generalization performance of cross-domain testing and solves the problem of poor detection performance in cross-dataset domain testing.
In one implementation, the performing adversarial training on the feature extractor based on the reduced loss function value of the domain predictor comprises:
performing adversarial training on the feature extractor by passing the reduced loss function value of the domain predictor through a gradient reversal layer.
Further, after reversal by the gradient reversal layer, the minimized loss function value of the spoofing detector and the maximized loss function value of the domain predictor are determined.
In this embodiment, the GRL (gradient reversal layer) between the feature extractor and the domain predictor reverses the gradient during back-propagation. As a result, the loss function value of the domain predictor is in turn maximized.
It can be seen from this embodiment that maximizing the loss function value of the domain predictor through the gradient reversal layer during back-propagation helps make the recognition of the optimized spoofed-recording detection neural network model more accurate.
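As a minimal numerical check (assuming PyTorch and toy tensors), the sketch below shows that a gradient reversal layer leaves the forward value unchanged while flipping the sign of, and scaling, the gradient that flows back towards the feature extractor.

```python
import torch


class GradReverse(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x, lambd):
        ctx.lambd = lambd
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        # Reverse and scale the gradient flowing back towards the feature extractor.
        return -ctx.lambd * grad_output, None


x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)

# Without the GRL: d(sum(x))/dx = [1, 1, 1]
x.sum().backward()
print(x.grad)             # tensor([1., 1., 1.])

x.grad = None
# With the GRL (lambda = 0.5): identical forward value, gradient becomes [-0.5, -0.5, -0.5]
y = GradReverse.apply(x, 0.5)
print(torch.equal(y, x))  # True: identity in the forward pass
y.sum().backward()
print(x.grad)             # tensor([-0.5000, -0.5000, -0.5000])
```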
In one implementation, when the data amounts of the source-domain data and the target-domain data are imbalanced, the data of the domain with less data are over-sampled so that the data amounts of the source-domain data and the target-domain data match.
When collecting the source-domain data and the target-domain data, the data may be insufficient, causing an imbalance in the amount of training data and affecting the final optimization effect. To avoid this situation, the data of the domain with less data are over-sampled.
It can be seen from this embodiment that matching the data amounts of the source-domain data and the target-domain data ensures that sufficient data are available for optimization training, which improves the recognition effect of the optimized spoofed-recording detection neural network model.
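One simple way to realize the over-sampling described above is sketched below; the random-repetition strategy and the example utterance counts are assumptions for illustration, not the exact scheme of the embodiment.

```python
import random


def oversample(minority_items, target_size, seed=0):
    """Repeat randomly chosen items of the smaller domain until it matches the larger domain."""
    rng = random.Random(seed)
    if len(minority_items) >= target_size:
        return list(minority_items)
    extra = [rng.choice(minority_items) for _ in range(target_size - len(minority_items))]
    return list(minority_items) + extra


# Example: 3,000 target-domain utterances are over-sampled to match 13,000 source-domain utterances.
source_utts = [f"src_{i}.wav" for i in range(13000)]
target_utts = [f"tgt_{i}.wav" for i in range(3000)]
target_utts_balanced = oversample(target_utts, len(source_utts))
print(len(source_utts), len(target_utts_balanced))  # 13000 13000
```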
When the above steps are implemented concretely, a conventional deep neural network for replay attack detection usually contains two components: a feature extractor intended to discover discriminative features, and a spoofing detector that maps the features to spoofing labels indicating whether the input is a spoofing attack or genuine speech. Assume the input sample is x ∈ X and the output label is y ∈ Y = {[0,1], [1,0]}, where X and Y are the input feature space and the output label space respectively. In the domain-mismatch scenario, the source-domain data and the target-domain data share similar but different data distributions, denoted S(x, y) and T(x, y).
To mitigate the influence of domain mismatch, an architecture is proposed that can learn such deep features; a schematic diagram of the cross-domain replay spoofing attack detection framework based on domain adversarial training is shown in Fig. 2. Unlike a conventional neural network, a new branch is connected after the feature extractor through a gradient reversal layer and serves as the domain predictor. The architecture therefore has two output layers: one for the spoofing label y ∈ Y and the other for the domain label d ∈ D. Here Y = D = {[0,1], [1,0]}, because spoofing detection is usually modelled as a binary classification task.
Specifically, the feature extractor G_f(·; Θ_f), the spoofing detector G_y(·; Θ_y) and the domain classifier G_d(·; Θ_d) correspond to the following mapping functions:
f = G_f(x; Θ_f)
y = G_y(f; Θ_y)
d = G_d(f; Θ_d)
Denote by x_i the i-th input sample with labels y_i and d_i, where x_i comes from the source domain ((x_i, y_i) ~ S(x, y) if d_i = [0,1]) or from the target domain ((x_i, y_i) ~ T(x, y) if d_i = [1,0]). The spoofing detection loss and the domain prediction loss of the i-th input sample are expressed as:
L_y^i(Θ_f, Θ_y) = L_y(G_y(G_f(x_i; Θ_f); Θ_y), y_i)
L_d^i(Θ_f, Θ_d) = L_d(G_d(G_f(x_i; Θ_f); Θ_d), d_i)
To find spoofing-discriminative and domain-invariant features, the goal is to find the optimal parameters Θ_f, Θ_y and Θ_d that minimize the spoofing detection loss while maximizing the domain prediction loss. The total loss of the whole network over N input samples can therefore be stated as:
E(Θ_f, Θ_y, Θ_d) = (1/N) Σ_{i=1..N} [ L_y^i(Θ_f, Θ_y) - λ L_d^i(Θ_f, Θ_d) ]
where λ is a positive coefficient that balances the two losses during back-propagation. The network can in theory be optimized by finding the saddle point:
(Θ_f*, Θ_y*) = argmin over (Θ_f, Θ_y) of E(Θ_f, Θ_y, Θ_d*)
Θ_d* = argmax over Θ_d of E(Θ_f*, Θ_y*, Θ_d)
Stochastic gradient descent (SGD) is used with the help of the gradient reversal layer. For a source-domain sample, the parameters are updated as:
Θ_f ← Θ_f - α (∂L_y^i/∂Θ_f - λ ∂L_d^i/∂Θ_f)
Θ_y ← Θ_y - α ∂L_y^i/∂Θ_y
Θ_d ← Θ_d - α ∂L_d^i/∂Θ_d
where α is the learning rate. For a target-domain sample, the parameter Θ_y is not updated, the parameter Θ_d is still updated as above, and the update rule for the parameter Θ_f becomes:
Θ_f ← Θ_f + α λ ∂L_d^i/∂Θ_f
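Put together, one possible training step realizing these update rules is sketched below. It assumes cross-entropy losses, simple linear stand-ins for the actual layers, and toy batches; with the gradient reversal layer in place, the sign flip on the domain-loss gradient reaching the feature extractor is handled automatically by back-propagation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class GradReverse(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x, lambd):
        ctx.lambd = lambd
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lambd * grad_output, None


feature_extractor = nn.Sequential(nn.Linear(257, 64), nn.ReLU())
spoof_detector = nn.Linear(64, 2)
domain_classifier = nn.Linear(64, 2)
params = (list(feature_extractor.parameters()) + list(spoof_detector.parameters())
          + list(domain_classifier.parameters()))
optimizer = torch.optim.SGD(params, lr=1e-4, momentum=0.9)  # settings named in the text


def train_step(x_src, y_src, x_tgt, lambd):
    """One update on a source batch (spoofing + domain losses) and a target batch (domain loss only)."""
    optimizer.zero_grad()

    # Source-domain batch: labelled for spoofing (y) and for domain (d = 0, "source").
    f_src = feature_extractor(x_src)
    loss_spoof = F.cross_entropy(spoof_detector(f_src), y_src)
    d_src = torch.zeros(x_src.size(0), dtype=torch.long)
    loss_dom_src = F.cross_entropy(domain_classifier(GradReverse.apply(f_src, lambd)), d_src)

    # Target-domain batch: only the domain label (d = 1, "target") is available.
    f_tgt = feature_extractor(x_tgt)
    d_tgt = torch.ones(x_tgt.size(0), dtype=torch.long)
    loss_dom_tgt = F.cross_entropy(domain_classifier(GradReverse.apply(f_tgt, lambd)), d_tgt)

    # Because of the GRL, minimizing this total loss maximizes the domain loss with respect to
    # the feature extractor while minimizing it with respect to the domain classifier.
    total = loss_spoof + loss_dom_src + loss_dom_tgt
    total.backward()
    optimizer.step()
    return loss_spoof.item(), (loss_dom_src + loss_dom_tgt).item()


# Toy usage with random tensors standing in for batches of features.
x_src, y_src = torch.randn(8, 257), torch.randint(0, 2, (8,))
x_tgt = torch.randn(8, 257)
print(train_step(x_src, y_src, x_tgt, lambd=0.5))
```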
To verify the effectiveness of this method, experiments were conducted.
The experiments are carried out on the ASVspoof 2017 V.2 dataset and on the PA part of the BTAS 2016 dataset (genuine audio and replay attacks only, denoted the BTAS-PA 2016 dataset). The utterance-count table of the ASVspoof 2017 V.2 dataset and the BTAS-PA 2016 dataset shown in Fig. 3 lists the detailed utterance statistics of the two datasets.
For the ASVspoof 2017 V.2 dataset, all genuine audio comes from a subset of the original RedDots corpus, and the replayed audio is recorded under various replay configurations, i.e. different combinations of acoustic environment, playback device and recording device. The BTAS 2016 dataset is based on the public AVspoof database, in which covert recordings were made under different settings and environmental conditions; in addition, replay attacks of two "unknown" types are further added to the evaluation set, making it more challenging. Moreover, the development and evaluation sets of the ASVspoof 2017 V.2 dataset and the BTAS-PA 2016 dataset are used only as test sets in all experiments. For model selection, 10% of the training set is set aside as the validation set.
The front-end feature is a 257-dimensional spectrogram, obtained by computing a 512-point fast Fourier transform every 10 milliseconds with a window size of 25 milliseconds. The librosa library is used to extract the front-end features from the raw data, and cepstral mean and variance normalization is applied to each utterance with a 300-frame sliding window using the Kaldi toolkit. In addition, the mean and standard deviation of the training data are computed for global standardization.
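A sketch of such a front end is given below. It assumes 16 kHz audio, log-magnitude compression and a simple NumPy implementation of per-utterance sliding-window normalization; the exact tools of the embodiment (librosa for extraction, Kaldi for CMVN) and its precise settings may differ.

```python
import numpy as np
import librosa


def extract_spectrogram(wav_path, sr=16000):
    """257-dim magnitude spectrogram: 512-point FFT every 10 ms with a 25 ms window."""
    y, _ = librosa.load(wav_path, sr=sr)
    spec = np.abs(librosa.stft(y, n_fft=512, hop_length=int(0.010 * sr),
                               win_length=int(0.025 * sr)))   # (257, T)
    return np.log(spec + 1e-8).T                               # (T, 257), log compression assumed


def sliding_cmvn(feats, window=300):
    """Per-utterance mean and variance normalization over a 300-frame sliding window."""
    out = np.empty_like(feats)
    half = window // 2
    for t in range(len(feats)):
        seg = feats[max(0, t - half): t + half]
        out[t] = (feats[t] - seg.mean(axis=0)) / (seg.std(axis=0) + 1e-8)
    return out


# Usage (hypothetical file name):
# feats = sliding_cmvn(extract_spectrogram("utterance_0001.wav"))
# print(feats.shape)   # (num_frames, 257)
```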
Training is carried out at the utterance level, which means padding is needed because utterance lengths differ. To process all utterances of a batch in parallel, the shorter utterances in each batch are padded to the longest one by repeating their features. The batch size is set to 8 in all experiments.
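For example, a batch-collate function along the following lines would pad the shorter utterances of a batch by repeating their frames; the tiling strategy shown is an assumption consistent with the description above.

```python
import numpy as np


def pad_by_repeating(feats, target_len):
    """Tile an utterance's frames until it reaches target_len frames, then truncate."""
    reps = int(np.ceil(target_len / len(feats)))
    return np.tile(feats, (reps, 1))[:target_len]


def collate_batch(batch_feats):
    """Pad every utterance in the batch to the length of the longest one."""
    max_len = max(len(f) for f in batch_feats)
    return np.stack([pad_by_repeating(f, max_len) for f in batch_feats])


# Toy example: utterances of 120, 300 and 45 frames with 257-dim features, padded to 300 frames.
batch = [np.random.randn(n, 257) for n in (120, 300, 45)]
print(collate_batch(batch).shape)   # (3, 300, 257)
```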
All neural networks are implemented in PyTorch, and Xavier initialization is used for all parametric layers. Cross-entropy is used as the loss criterion, and an SGD optimizer with momentum 0.9 and a learning rate of 0.0001 is used in the training of all models. In addition, with an end-to-end scoring scheme, the performance metric EER (Equal Error Rate) is computed directly from the scores predicted by the neural network; the EER is calculated with the toolkit provided in the ASVspoof 2019 challenge.
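The EER is the operating point at which the false acceptance rate equals the false rejection rate. The experiments use the official ASVspoof 2019 toolkit, so the minimal NumPy version below is only an illustration of the computation.

```python
import numpy as np


def compute_eer(genuine_scores, spoof_scores):
    """Return the equal error rate, given scores where higher means 'more genuine'."""
    thresholds = np.sort(np.concatenate([genuine_scores, spoof_scores]))
    frr = np.array([(genuine_scores < t).mean() for t in thresholds])  # false rejection rate
    far = np.array([(spoof_scores >= t).mean() for t in thresholds])   # false acceptance rate
    idx = np.argmin(np.abs(frr - far))
    return (frr[idx] + far[idx]) / 2


# Toy example with partially overlapping score distributions.
rng = np.random.default_rng(0)
genuine = rng.normal(1.0, 1.0, 1000)
spoof = rng.normal(-1.0, 1.0, 1000)
print(f"EER = {100 * compute_eer(genuine, spoof):.2f}%")
```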
LCNN (Light CNN, a lightweight convolutional neural network) was the best-performing system of the ASVspoof 2017 challenge, in which a max feature map (MFM) activation module is used after each CNN module. Because batch-wise padding is used instead of globally padding all utterances to a maximum length, the number of frames (denoted T) differs from batch to batch. The implemented LCNN is therefore adjusted into a new version suitable for variable-length input features.
The details of the LCNN architecture are described in the topology parameter table of the LCNN model shown in Fig. 4. The max-pooling layers are configured so that short utterances with fewer than 32 frames remain valid inputs. In addition, average pooling is applied over the time dimension after the MaxPool5 layer, which significantly reduces the number of parameters in the fully connected (FC) layer FC6. Dropout layers with a rate of 0.5 are used in FC7 and FC8.
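The max feature map activation splits the channel dimension in half and keeps the element-wise maximum. The following PyTorch sketch of a single MFM convolution block is illustrative only; the channel counts and kernel size are placeholders, not the Fig. 4 topology.

```python
import torch
import torch.nn as nn


class MFM(nn.Module):
    """Max feature map: split the channels into two halves and take their element-wise maximum."""

    def forward(self, x):
        a, b = torch.chunk(x, 2, dim=1)
        return torch.max(a, b)


class MFMConvBlock(nn.Module):
    """Convolution producing 2*out_channels feature maps, halved again by the MFM activation."""

    def __init__(self, in_channels, out_channels, kernel_size=3):
        super().__init__()
        self.conv = nn.Conv2d(in_channels, 2 * out_channels, kernel_size, padding=kernel_size // 2)
        self.mfm = MFM()

    def forward(self, x):
        return self.mfm(self.conv(x))


# A (batch, channel, frequency, time) spectrogram batch with a variable number of frames T.
x = torch.randn(8, 1, 257, 300)
block = MFMConvBlock(1, 32)
print(block(x).shape)   # torch.Size([8, 32, 257, 300])
```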
A DAT (domain adversarial training) framework based on LCNN (LCNN-DAT) can readily be obtained from the baseline LCNN model. Specifically, the layers from Conv1 to MFM6 are regarded as the feature extractor, and the FC7 and FC8 layers form the spoofing detector. A copy of the spoofing detector, connected after the feature extractor through a gradient reversal layer, serves as the domain classifier. However, dropout is not used in the domain classifier.
To make up for the imbalance between the amounts of source-domain and target-domain training data, the minority-domain training data are over-sampled to match the majority-domain training data. The model is then trained with alternating batches of all source-domain data and all target-domain data. In addition, to suppress noisy signals from the domain classifier in the early training stage, the following strategy gradually changes the adaptation factor λ from 0 to 1 instead of fixing it from the beginning:
λ = 2 / (1 + exp(-r · e)) - 1
where r is set to 0.1 and e denotes the number of trained iterations.
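For illustration, the schedule can be implemented in a few lines; whether e counts iterations or epochs, and the exact form of the reconstructed formula above, are assumptions based on the description.

```python
import math


def adaptation_factor(e, r=0.1):
    """Gradually raise lambda from 0 towards 1 as training progresses (e = trained iterations/epochs)."""
    return 2.0 / (1.0 + math.exp(-r * e)) - 1.0


print([round(adaptation_factor(e), 3) for e in (0, 5, 10, 30, 60)])
# [0.0, 0.245, 0.462, 0.905, 0.995]
```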
Here, the training, development and test sets of the ASVspoof 2017 V.2 dataset and of the BTAS-PA 2016 dataset are denoted A-train, A-dev, A-eval and B-train, B-dev, B-eval respectively. The table shown in Fig. 5 compares the EER (%) of the baseline LCNN model and the LCNN-DAT model on A-dev, A-eval, B-dev and B-eval. (Using A-train + B-train as training data means that A-train is the source-domain data and B-train is the target-domain data; for B-train + A-train, vice versa.)
The implemented LCNN achieves an EER of 9.06% on A-dev and 12.39% on A-eval, which is a slight improvement. In addition, the LCNN models all perform well on B-dev and B-eval, but they overfit B-train, which explains the significant performance differences. Although the LCNN models perform well within the same domain, their generalization ability across the two datasets is very poor. By introducing the domain adversarial training framework, however, the performance degradation in cross-domain testing can be effectively reduced without weakening the overall performance in the original source domain. Specifically, with the LCNN-DAT model trained on A-train + B-train, the performance degradation on B-dev is relatively reduced by 38% and that on B-eval by 57%; with the LCNN-DAT model trained on B-train + A-train, the reduction is 33% on A-dev and 30% on A-eval. The results show that, by introducing domain adversarial training into the LCNN framework, the LCNN-DAT model generalizes better for cross-dataset replay attack detection than LCNN without DAT.
The above experiments use the entire target-domain training set for domain adversarial training. Here, the target-domain training set is randomly divided into five folds, and the first 1, 2, 3, 4 and 5 folds are used in turn as the unlabelled target-domain training data, which ensures that each smaller training set is a subset of the larger ones.
The EER chart of LCNN and LCNN-DAT models trained on different training data, shown in Fig. 6, presents the results of all systems. In all cases, a significant cross-domain performance improvement is obtained regardless of the amount of target-domain data used. Moreover, with more target-domain training data the LCNN-DAT models show better cross-domain generalization ability, without affecting their overall performance on the original source-domain data. In addition, the relative improvement is more significant when the BTAS-PA 2016 dataset is used as the target domain instead of the ASVspoof 2017 V.2 dataset. The reason may be that B-train is more than twice the size of A-train, so that the LCNN-DAT model learns more effectively from the larger amount of target-domain data and achieves better cross-domain performance.
Fig. 7 shows a schematic structural diagram of an optimization system for a spoofed-recording detection neural network model provided by an embodiment of the present invention. The system can execute the optimization method for a spoofed-recording detection neural network model described in any of the above embodiments and is configured in a terminal.
The optimization system for a spoofed-recording detection neural network model provided in this embodiment comprises: a network model construction program module 11, a feature extraction program module 12, a loss function optimization program module 13 and a model optimization program module 14.
The network model construction program module 11 is configured to construct a spoofed-recording detection neural network model from a feature extractor, a spoofing detector and a domain predictor, wherein the feature extractor and the spoofing detector constitute a first branch, and the feature extractor and the domain predictor constitute a second branch. The feature extraction program module 12 is configured to input source-domain data and target-domain data into the feature extractor as input samples, wherein the source-domain data carry a spoofing label and a domain label, and the target-domain data carry a domain label. The loss function optimization program module 13 is configured to input the output of the feature extractor into the spoofing detector and the domain predictor respectively, and to train the spoofed-recording detection neural network model so as to reduce the loss function value of the spoofing detector and reduce the loss function value of the domain predictor. The model optimization program module 14 is configured to perform adversarial training on the feature extractor based on the reduced loss function value of the domain predictor, so that the deep features the feature extractor outputs to the spoofing detector are domain-invariant and discriminative for spoofing detection.
Further, the model optimization program module is configured to:
perform adversarial training on the feature extractor by passing the reduced loss function value of the domain predictor through a gradient reversal layer.
Further, after reversal by the gradient reversal layer, the minimized loss function value of the spoofing detector and the maximized loss function value of the domain predictor are determined.
Further, when the data amounts of the source-domain data and the target-domain data are imbalanced, the data of the domain with less data are over-sampled so that the data amounts of the source-domain data and the target-domain data match.
An embodiment of the present invention further provides a non-volatile computer storage medium that stores computer-executable instructions, and the computer-executable instructions can execute the optimization method for a spoofed-recording detection neural network model in any of the above method embodiments.
In one implementation, the non-volatile computer storage medium of the present invention stores computer-executable instructions that are configured to:
construct a spoofed-recording detection neural network model from a feature extractor, a spoofing detector and a domain predictor, wherein the feature extractor and the spoofing detector constitute a first branch, and the feature extractor and the domain predictor constitute a second branch;
input source-domain data and target-domain data into the feature extractor as input samples, wherein the source-domain data carry a spoofing label and a domain label, and the target-domain data carry a domain label;
input the output of the feature extractor into the spoofing detector and the domain predictor respectively, and train the spoofed-recording detection neural network model so as to reduce the loss function value of the spoofing detector and reduce the loss function value of the domain predictor;
perform adversarial training on the feature extractor based on the reduced loss function value of the domain predictor, so that the deep features the feature extractor outputs to the spoofing detector are domain-invariant and discriminative for spoofing detection.
As a non-volatile computer-readable storage medium, it can be used to store non-volatile software programs, non-volatile computer-executable programs and modules, such as the program instructions/modules corresponding to the methods of testing software in the embodiments of the present invention. One or more program instructions are stored in the non-volatile computer-readable storage medium and, when executed by a processor, perform the optimization method for a spoofed-recording detection neural network model in any of the above method embodiments.
The non-volatile computer-readable storage medium may include a program storage area and a data storage area, wherein the program storage area can store the operating system and an application required by at least one function, and the data storage area can store data created according to the use of the device for testing software, and the like. In addition, the non-volatile computer-readable storage medium may include a high-speed random access memory and may also include a non-volatile memory, such as at least one magnetic disk storage device, a flash memory device, or another non-volatile solid-state storage device. In some embodiments, the non-volatile computer-readable storage medium optionally includes memories remotely located with respect to the processor, and these remote memories can be connected to the device for testing software through a network. Examples of such a network include, but are not limited to, the Internet, an intranet, a local area network, a mobile communication network and combinations thereof.
An embodiment of the present invention also provides an electronic device, comprising: at least one processor, and a memory communicatively connected to the at least one processor, wherein the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor so that the at least one processor can perform the steps of the optimization method for a spoofed-recording detection neural network model of any embodiment of the present invention.
The client of the embodiments of the present application exists in a variety of forms, including but not limited to:
(1) Mobile communication devices: such devices are characterized by mobile communication functions, with voice and data communication as the main goal. This type of terminal includes smart phones, multimedia phones, feature phones, low-end phones and the like.
(2) Ultra-mobile personal computer devices: such devices belong to the category of personal computers, have computing and processing functions, and generally also have mobile Internet access. This type of terminal includes PDA, MID and UMPC devices, such as tablet computers.
(3) Portable entertainment devices: such devices can display and play multimedia content. This type of device includes audio and video players, handheld devices, e-book readers, intelligent toys and portable in-vehicle navigation devices.
(4) Other electronic devices with an audio detection function.
Herein, relational terms such as "first" and "second" are used only to distinguish one entity or operation from another, and do not necessarily require or imply any such actual relationship or order between these entities or operations. Moreover, the terms "comprise" and "include" are intended to cover a non-exclusive inclusion, so that a process, method, article or device that comprises a list of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article or device. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of additional identical elements in the process, method, article or device that comprises the element.
The device embodiments described above are merely illustrative. The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; they may be located in one place or distributed over multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of an embodiment. A person of ordinary skill in the art can understand and implement the embodiments without creative effort.
Through the above description of the embodiments, a person skilled in the art can clearly understand that each embodiment can be implemented by software plus a necessary general-purpose hardware platform, and certainly also by hardware. Based on this understanding, the essence of the above technical solutions, or the part contributing to the prior art, can be embodied in the form of a software product. The computer software product may be stored in a computer-readable storage medium such as a ROM/RAM, a magnetic disk or an optical disc, and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to execute the method described in each embodiment or in certain parts of an embodiment.
Finally, it should be noted that the above embodiments are only intended to illustrate the technical solutions of the present invention, not to limit them. Although the present invention has been described in detail with reference to the foregoing embodiments, a person of ordinary skill in the art should understand that the technical solutions described in the foregoing embodiments can still be modified, or some of their technical features can be equivalently replaced, and such modifications or replacements do not make the essence of the corresponding technical solutions depart from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims (10)

1. An optimization method for a spoofed-recording detection neural network model, the method comprising:
constructing a spoofed-recording detection neural network model from a feature extractor, a spoofing detector and a domain predictor, wherein the feature extractor and the spoofing detector constitute a first branch, and the feature extractor and the domain predictor constitute a second branch;
inputting source-domain data and target-domain data into the feature extractor as input samples, wherein the source-domain data carry a spoofing label and a domain label, and the target-domain data carry a domain label;
inputting the output of the feature extractor into the spoofing detector and the domain predictor respectively, and training the spoofed-recording detection neural network model so as to reduce a loss function value of the spoofing detector and reduce a loss function value of the domain predictor; and
performing adversarial training on the feature extractor based on the reduced loss function value of the domain predictor, so that the deep features output by the feature extractor to the spoofing detector are domain-invariant and discriminative for spoofing detection.
2. The method according to claim 1, wherein the performing adversarial training on the feature extractor based on the reduced loss function value of the domain predictor comprises:
performing adversarial training on the feature extractor by passing the reduced loss function value of the domain predictor through a gradient reversal layer.
3. The method according to claim 2, wherein, after reversal by the gradient reversal layer, a minimized loss function value of the spoofing detector and a maximized loss function value of the domain predictor are determined.
4. The method according to claim 1, wherein, when the data amounts of the source-domain data and the target-domain data are imbalanced, the data of the domain with less data are over-sampled so that the data amounts of the source-domain data and the target-domain data match.
5. An optimization system for a spoofed-recording detection neural network model, the system comprising:
a network model construction program module, configured to construct a spoofed-recording detection neural network model from a feature extractor, a spoofing detector and a domain predictor, wherein the feature extractor and the spoofing detector constitute a first branch, and the feature extractor and the domain predictor constitute a second branch;
a feature extraction program module, configured to input source-domain data and target-domain data into the feature extractor as input samples, wherein the source-domain data carry a spoofing label and a domain label, and the target-domain data carry a domain label;
a loss function optimization program module, configured to input the output of the feature extractor into the spoofing detector and the domain predictor respectively, and to train the spoofed-recording detection neural network model so as to reduce a loss function value of the spoofing detector and reduce a loss function value of the domain predictor; and
a model optimization program module, configured to perform adversarial training on the feature extractor based on the reduced loss function value of the domain predictor, so that the deep features output by the feature extractor to the spoofing detector are domain-invariant and discriminative for spoofing detection.
6. The system according to claim 5, wherein the model optimization program module is configured to:
perform adversarial training on the feature extractor by passing the reduced loss function value of the domain predictor through a gradient reversal layer.
7. The system according to claim 6, wherein, after reversal by the gradient reversal layer, a minimized loss function value of the spoofing detector and a maximized loss function value of the domain predictor are determined.
8. The system according to claim 5, wherein, when the data amounts of the source-domain data and the target-domain data are imbalanced, the data of the domain with less data are over-sampled so that the data amounts of the source-domain data and the target-domain data match.
9. An electronic device, comprising: at least one processor, and a memory communicatively connected to the at least one processor, wherein the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor so that the at least one processor can perform the steps of the method according to any one of claims 1-4.
10. A storage medium on which a computer program is stored, wherein the program, when executed by a processor, implements the steps of the method according to any one of claims 1-4.
CN201910516188.8A 2019-06-14 2019-06-14 Optimization method and system for a spoofed-recording detection neural network model Pending CN110223676A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910516188.8A CN110223676A (en) 2019-06-14 2019-06-14 The optimization method and system of deception recording detection neural network model

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910516188.8A CN110223676A (en) 2019-06-14 2019-06-14 The optimization method and system of deception recording detection neural network model

Publications (1)

Publication Number Publication Date
CN110223676A true CN110223676A (en) 2019-09-10

Family

ID=67817331

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910516188.8A Pending CN110223676A (en) 2019-06-14 2019-06-14 The optimization method and system of deception recording detection neural network model

Country Status (1)

Country Link
CN (1) CN110223676A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112735381A (en) * 2020-12-29 2021-04-30 四川虹微技术有限公司 Model updating method and device
CN113284508A (en) * 2021-07-21 2021-08-20 中国科学院自动化研究所 Hierarchical differentiation based generated audio detection system

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106875007A (en) * 2017-01-25 2017-06-20 上海交通大学 End-to-end deep neural network is remembered based on convolution shot and long term for voice fraud detection
US20180082689A1 (en) * 2016-09-19 2018-03-22 Pindrop Security, Inc. Speaker recognition in the call center
CN107944410A (en) * 2017-12-01 2018-04-20 中国科学院重庆绿色智能技术研究院 A kind of cross-cutting facial characteristics analytic method based on convolutional neural networks
CN108141363A (en) * 2015-10-15 2018-06-08 诺基亚技术有限公司 For the device of certification, method and computer program product
CN108198561A (en) * 2017-12-13 2018-06-22 宁波大学 A kind of pirate recordings speech detection method based on convolutional neural networks
US20180254046A1 (en) * 2017-03-03 2018-09-06 Pindrop Security, Inc. Method and apparatus for detecting spoofing conditions
US20180374487A1 (en) * 2017-06-27 2018-12-27 Cirrus Logic International Semiconductor Ltd. Detection of replay attack
CN109754812A (en) * 2019-01-30 2019-05-14 华南理工大学 A kind of voiceprint authentication method of the anti-recording attack detecting based on convolutional neural networks
US20190180742A1 (en) * 2017-12-08 2019-06-13 Google Llc Digital assistant processing of stacked data structures

Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108141363A (en) * 2015-10-15 2018-06-08 诺基亚技术有限公司 For the device of certification, method and computer program product
US20180082689A1 (en) * 2016-09-19 2018-03-22 Pindrop Security, Inc. Speaker recognition in the call center
CN106875007A (en) * 2017-01-25 2017-06-20 上海交通大学 End-to-end deep neural network is remembered based on convolution shot and long term for voice fraud detection
US20180254046A1 (en) * 2017-03-03 2018-09-06 Pindrop Security, Inc. Method and apparatus for detecting spoofing conditions
US20180374487A1 (en) * 2017-06-27 2018-12-27 Cirrus Logic International Semiconductor Ltd. Detection of replay attack
CN107944410A (en) * 2017-12-01 2018-04-20 中国科学院重庆绿色智能技术研究院 A kind of cross-cutting facial characteristics analytic method based on convolutional neural networks
US20190180742A1 (en) * 2017-12-08 2019-06-13 Google Llc Digital assistant processing of stacked data structures
CN108198561A (en) * 2017-12-13 2018-06-22 宁波大学 A kind of pirate recordings speech detection method based on convolutional neural networks
CN109754812A (en) * 2019-01-30 2019-05-14 华南理工大学 A kind of voiceprint authentication method of the anti-recording attack detecting based on convolutional neural networks

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
HIMAWAN I et al.: "Deep domain adaptation for anti-spoofing in speaker verification systems", Computer Speech & Language *
WANG H et al.: "Cross-Domain Replay Spoofing Attack Detection Using Domain Adversarial Training", Interspeech 2019 *
WANG Q et al.: "Unsupervised Domain Adaptation via Domain Adversarial Training for Speaker Recognition", 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) *
徐涌钞: "Replay attack detection method for speaker verification systems based on high-frequency and bottleneck features", China Excellent Master's Theses Full-text Database, Information Science and Technology *

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112735381A (en) * 2020-12-29 2021-04-30 四川虹微技术有限公司 Model updating method and device
CN112735381B (en) * 2020-12-29 2022-09-27 四川虹微技术有限公司 Model updating method and device
CN113284508A (en) * 2021-07-21 2021-08-20 中国科学院自动化研究所 Hierarchical differentiation based generated audio detection system
CN113284508B (en) * 2021-07-21 2021-11-09 中国科学院自动化研究所 Hierarchical differentiation based generated audio detection system
US11763836B2 (en) 2021-07-21 2023-09-19 Institute Of Automation, Chinese Academy Of Sciences Hierarchical generated audio detection system

Similar Documents

Publication Publication Date Title
CN109637546B (en) Knowledge distillation method and apparatus
CN110246487A (en) Optimization method and system for single pass speech recognition modeling
CN107924682A (en) Neutral net for speaker verification
CN110473569A (en) Detect the optimization method and system of speaker's spoofing attack
CN111835784B (en) Data generalization method and system for replay attack detection system
CN108766445A (en) Method for recognizing sound-groove and system
CN108109613A (en) For the audio training of Intelligent dialogue voice platform and recognition methods and electronic equipment
Yang et al. Modified magnitude-phase spectrum information for spoofing detection
CN104902012B (en) The method and singing contest system of singing contest are carried out by network
CN103730114A (en) Mobile equipment voiceprint recognition method based on joint factor analysis model
CN109584884A (en) A kind of speech identity feature extractor, classifier training method and relevant device
CN108986798B (en) Processing method, device and the equipment of voice data
CN110223676A (en) The optimization method and system of deception recording detection neural network model
CN108711336A (en) A kind of piano performance points-scoring system and its method
CN109448706A (en) Neural network language model compression method and system
CN109976998A (en) A kind of Software Defects Predict Methods, device and electronic equipment
CN108091326A (en) A kind of method for recognizing sound-groove and system based on linear regression
CN108877783A (en) The method and apparatus for determining the audio types of audio data
CN110211599A (en) Using awakening method, device, storage medium and electronic equipment
CN110223678A (en) Audio recognition method and system
Shi et al. Semi-supervised acoustic event detection based on tri-training
CN111191787B (en) Training method and device of neural network for extracting speaker embedded features
Cáceres et al. The Biometric Vox system for the ASVspoof 2021 challenge
CN108417207A (en) A kind of depth mixing generation network self-adapting method and system
CN108932646A (en) User tag verification method, device and electronic equipment based on operator

Legal Events

PB01: Publication
SE01: Entry into force of request for substantive examination
TA01: Transfer of patent application right (effective date of registration: 2020-06-16)
    Address after: 215123 14 Tengfei Innovation Park, 388 Xinping street, Suzhou Industrial Park, Suzhou, Jiangsu
    Applicant after: AI SPEECH Co.,Ltd.; Shanghai Jiaotong University Intellectual Property Management Co.,Ltd.
    Address before: 215123 14 Tengfei Innovation Park, 388 Xinping street, Suzhou Industrial Park, Suzhou, Jiangsu
    Applicant before: AI SPEECH Co.,Ltd.; SHANGHAI JIAO TONG University
TA01: Transfer of patent application right (effective date of registration: 2020-10-26)
    Address after: 215123 14 Tengfei Innovation Park, 388 Xinping street, Suzhou Industrial Park, Suzhou, Jiangsu
    Applicant after: AI SPEECH Co.,Ltd.
    Address before: 215123 14 Tengfei Innovation Park, 388 Xinping street, Suzhou Industrial Park, Suzhou, Jiangsu
    Applicant before: AI SPEECH Co.,Ltd.; Shanghai Jiaotong University Intellectual Property Management Co.,Ltd.
CB02: Change of applicant information
    Address after: 215123 building 14, Tengfei Innovation Park, 388 Xinping street, Suzhou Industrial Park, Suzhou City, Jiangsu Province
    Applicant after: Sipic Technology Co.,Ltd.
    Address before: 215123 building 14, Tengfei Innovation Park, 388 Xinping street, Suzhou Industrial Park, Suzhou City, Jiangsu Province
    Applicant before: AI SPEECH Co.,Ltd.
RJ01: Rejection of invention patent application after publication (application publication date: 2019-09-10)