CN109872720A - A convolutional-neural-network-based re-recorded speech detection algorithm robust to different scenes - Google Patents

A convolutional-neural-network-based re-recorded speech detection algorithm robust to different scenes

Info

Publication number
CN109872720A
Authority
CN
China
Prior art keywords
frequency
voice
time
neural networks
pooling
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910085725.8A
Other languages
Chinese (zh)
Other versions
CN109872720B (en)
Inventor
王泳
赵雅珺
张梦鸽
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangdong Polytechnic Normal University
Original Assignee
Guangdong Polytechnic Normal University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangdong Polytechnic Normal University filed Critical Guangdong Polytechnic Normal University
Priority to CN201910085725.8A priority Critical patent/CN109872720B/en
Publication of CN109872720A publication Critical patent/CN109872720A/en
Application granted Critical
Publication of CN109872720B publication Critical patent/CN109872720B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Abstract

The invention discloses a convolutional-neural-network-based re-recorded speech detection algorithm robust to different scenes, and relates in particular to the field of speech detection algorithms. A speech time-frequency map is fed into the algorithm model, which comprises seven layers, each containing one convolutional layer and one pooling layer; the output of each convolutional layer passes through a rectified linear unit, residual connections are added between layers, the final features are extracted by global pooling, and a sigmoid predicts the detection result. The invention uses the time-frequency map as the network's input: compared with feeding raw speech data directly, the time-frequency map gives the characteristic information introduced by the re-recording device a relatively dense distribution, which favors feature extraction by the neural network, thereby speeding up training and improving accuracy.

Description

A convolutional-neural-network-based re-recorded speech detection algorithm robust to different scenes
Technical field
The present invention relates to the field of speech detection algorithms, and more particularly to a convolutional-neural-network-based re-recorded speech detection algorithm robust to different scenes.
Background
Existing research shows that deceptive speech such as voice conversion (VC), speech synthesis (SS) and re-recorded speech can effectively fool automatic speaker verification (ASV) systems and thereby gain fraudulent access. Re-recorded speech in particular can drive an ASV system's false acceptance rate up, posing a serious threat to social security. VC and SS require substantial information about and features of the target speaker, and since the existing algorithms are not yet fully mature, their cost and difficulty of implementation are relatively high. Re-recorded speech, by contrast, is easily obtained with cheap recording equipment and essentially contains all the features of the target speaker's voice; it is therefore a greater threat than VC and SS, and its detection deserves attention.
ASV (automatic speaker verification) systems are used more and more in practice, for example in access control, telephone banking and military applications. Because speaker verification requires no face-to-face contact, ASV systems are highly vulnerable to attack by deceptive speech. Deceptive speech generated by audio equipment threatens ASV systems and undermines their security. Over the past decade or so, digital audio products have not only multiplied in variety, but individual products have also integrated ever more, and ever stronger, functions. A PC equipped with audio-processing software, or a relatively inexpensive device with audio-processing capability such as a PDA, can now achieve the same or similar effects. For example, a high-quality, low-cost recording device such as a smartphone can produce deceptive speech that puts ASV systems at risk. Deceptive speech includes replay attacks, voice conversion, speech synthesis and so on. An attacker can use fraudulent speech to forge characteristic data, obtain illegitimate access to a system, and then steal the user's files and private data, causing losses that are hard to repair. Among these attacks, replay is a greater threat than voice conversion and speech synthesis. A replay attack uses speech samples captured from the real target speaker, in the form of continuous pre-recorded samples. A replay-based spoofing attack requires no technical processing of the speech, and the replayed speech shares the same spectrum and high-level features as the real target speaker's voice, making it the easiest type of voice attack to mount. Synthesized and converted speech, by contrast, differ from the real target speaker's voice by certain errors and variations rather than being identical, so detecting replay attacks is harder than detecting synthesized or converted speech.
Summary of the invention
In order to overcome the above drawbacks of the prior art, embodiments of the present invention provide a convolutional-neural-network-based re-recorded speech detection algorithm robust to different scenes. By using the time-frequency map as the network's input, rather than feeding raw speech data directly, the characteristic information introduced by the re-recording device takes on a relatively dense distribution, which favors feature extraction by the neural network, speeding up training and improving accuracy; detection of speech re-recorded with different recording devices, in different recording environments and at different recording distances achieves very high accuracy.
To achieve the above object, the invention provides the following technical scheme: a convolutional-neural-network-based re-recorded speech detection algorithm robust to different scenes, specifically comprising the following steps:
A. The original speech is captured with a recording device and undergoes D/A and A/D conversion, yielding the re-recorded speech;
B. The original speech is distorted during this conversion, and the distortion data of the original speech is computed with a distortion model whose expression is: y(t) = λ·x(αt) + η
where y(t) is the re-recorded speech, x(t) is the original speech, λ is the amplitude transformation factor, α is the time-axis linear stretching factor, and η is superimposed noise;
The corresponding frequency-domain expression is: Y(jω) = (λ/α)·X(jω/α) + N(jω)
where Y(jω), X(jω) and N(jω) are the frequency-domain representations of y(t), x(t) and η respectively; for a fixed recording device these characteristics are highly stable, i.e. λ and α are constants;
C. The re-recorded speech is converted into a speech time-frequency map by the short-time Fourier transform;
D. The speech time-frequency map is fed into the algorithm model, which comprises seven layers, each containing one convolutional layer and one pooling layer; the output of each convolutional layer passes through a rectified linear unit, residual connections are added between layers, the final features are extracted by global pooling, and a sigmoid predicts the detection result.
In a preferred embodiment, when the re-recorded speech is transformed, the short-time Fourier transform uses a 126-point Hanning (hanning) window with a hop of 50, and the time-frequency map has size 64x62; a sketch of this transform follows.
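A minimal sketch of this transform, assuming scipy and the 16 kHz sampling rate stated in Embodiment 3 below; the random array merely stands in for a real 0.2-second speech segment:

import numpy as np
from scipy.signal import stft

fs = 16000                                  # sampling rate taken from Embodiment 3
x = np.random.randn(int(0.2 * fs))          # stand-in for a 0.2 s speech segment

# 126-point Hanning window, hop 50 (overlap 126 - 50), no edge padding
f, t, Z = stft(x, fs=fs, window="hann", nperseg=126,
               noverlap=126 - 50, boundary=None, padded=False)
spectrogram = np.abs(Z)
print(spectrogram.shape)                    # (64, 62): 64 frequency bins x 62 frames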
In a preferred embodiment, the algorithm model convolves along the frequency dimension and pools along the time dimension, specifically using a 3x1 convolution kernel and 1x2 pooling, which agrees with the feature-distribution characteristics of the time-frequency map: the features of the speech time-frequency map are independent between adjacent speech frames yet consistent within a specific frequency band.
In a preferred embodiment, the algorithm model uses deep learning as its data-driven technique.
In a preferred embodiment, the re-recording device introduces variations in the frequency domain of the original sound signal, and the deep learning model derives the network's input data from the original audio signal.
In a preferred embodiment, when the algorithm model convolves along the frequency dimension it does not consider correlation along the time dimension, and while convolving along the frequency dimension it simultaneously pools along the time dimension.
In a preferred embodiment, the convolution kernels share parameters, so the characteristic information of identically distributed devices along the time dimension repeatedly trains the kernel parameters; the pooling layer uses 1x2 pooling of the time dimension, with no pooling of the frequency dimension.
Technical effects and advantages of the invention:
1. The invention uses the time-frequency map as the network's input. Compared with feeding raw speech data directly, the time-frequency map gives the characteristic information introduced by the re-recording device a relatively dense distribution, which favors feature extraction by the neural network, speeding up training and improving accuracy;
2. The invention convolves along the frequency dimension and pools along the time dimension, specifically using a 3x1 convolution kernel and 1x2 pooling. Convolving only along the frequency dimension, without considering correlation along the time dimension, greatly reduces the number of kernel parameters, so the model resists over-fitting more strongly and depends less heavily on data volume; meanwhile, because the kernels share parameters during training, the characteristic information of identically distributed devices along the time dimension repeatedly trains the kernel parameters, making training more thorough;
3. Unlike traditional machine-learning methods, the invention does not need one or more specific features to be chosen manually and then classified with a separate classifier; it spontaneously extracts the relevant features, including shallow edge features and deep features, and then classifies, simplifying the whole pipeline and achieving better results;
4. The algorithm of the invention detects speech re-recorded with different recording devices, in different recording environments and at different recording distances with very high accuracy.
Description of the drawings
Fig. 1 is a schematic diagram of the algorithm model structure of the invention.
Fig. 2 is a schematic diagram of the speech re-recording process of the invention.
Specific embodiment
The technical solutions in the embodiments of the present invention will now be described clearly and completely with reference to the drawings. The described embodiments are plainly only a part of the embodiments of the present invention, not all of them. All other embodiments obtained by a person of ordinary skill in the art from the embodiments of the present invention without creative effort fall within the scope of protection of the present invention.
Embodiment 1
Fig. 1 shows a convolutional-neural-network-based re-recorded speech detection algorithm robust to different scenes. The algorithm model has 7 layers in total, each containing one convolutional layer and one pooling layer; the output of each convolutional layer passes through a rectified linear unit, residual connections are added between layers, the final features are extracted by global pooling, and a sigmoid predicts the detection result. Convolution along the frequency dimension with pooling along the time dimension, specifically a 3x1 convolution kernel with 1x2 pooling, minimizes the model capacity, greatly reducing the risk of over-fitting and the model's dependence on data volume, while agreeing closely with the feature-distribution characteristics of the time-frequency map, so the training parameters are allocated more reasonably and a more compact set of parameters is trained on more effective features;
The speech time-frequency map is generated by the short-time Fourier transform. Compared with feeding raw speech data directly, the time-frequency map gives the characteristic information introduced by the re-recording device a relatively dense distribution, which favors feature extraction by the neural network, speeding up training and improving accuracy. The re-recording device introduces variations in the frequency domain of the original sound signal, and the performance of a deep learning model depends heavily on its data; with the raw audio signal as the network's input, the feature distribution is too sparse, greatly increasing the difficulty of extracting effective features. A sketch of the architecture follows;
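The following is a minimal PyTorch sketch of the architecture just described, offered as an illustration rather than the patented implementation: the patent does not publish channel counts, padding or the exact residual wiring, so the 32 channels, the frequency padding of 1 and the channel-expanding stem here are assumptions.

import torch
import torch.nn as nn

class RerecordDetector(nn.Module):
    # 7 layers of conv(3x1) + pool(1x2), ReLU, residual additions,
    # global pooling over time, sigmoid output
    def __init__(self, channels=32, layers=7, freq_bins=64):
        super().__init__()
        # stem expands 1 -> channels so the residual additions type-check
        self.stem = nn.Conv2d(1, channels, kernel_size=(3, 1), padding=(1, 0))
        self.convs = nn.ModuleList(
            nn.Conv2d(channels, channels, kernel_size=(3, 1), padding=(1, 0))
            for _ in range(layers))
        self.pool = nn.MaxPool2d(kernel_size=(1, 2))  # pools the time axis only
        self.relu = nn.ReLU()
        self.fc = nn.Linear(channels * freq_bins, 1)

    def forward(self, x):                    # x: (batch, 1, 64 freq, 62 time)
        x = self.relu(self.stem(x))
        for conv in self.convs:
            x = x + self.relu(conv(x))       # residual connection between layers
            if x.size(-1) > 1:               # 1x2 pooling halves the time axis
                x = self.pool(x)
        x = x.mean(dim=-1).flatten(1)        # global pooling over the remaining time axis
        return torch.sigmoid(self.fc(x))     # probability that the input is re-recorded

model = RerecordDetector()
prob = model(torch.randn(8, 1, 64, 62))      # -> shape (8, 1)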
Embodiment 2
Fig. 2 shows a convolutional-neural-network-based re-recorded speech detection algorithm robust to different scenes. Re-recording distorts the speech data to some degree, including amplitude distortion and linear stretching along the time axis, where the distortion model's expression is: y(t) = λ·x(αt) + η
where y(t) is the re-recorded speech, x(t) is the original speech, λ is the amplitude transformation factor, α is the time-axis linear stretching factor, and η is superimposed noise;
The corresponding frequency-domain expression is: Y(jω) = (λ/α)·X(jω/α) + N(jω)
where Y(jω), X(jω) and N(jω) are the frequency-domain representations of y(t), x(t) and η respectively; for a fixed recording device these characteristics are highly stable, i.e. λ and α are constants. A simulation sketch of this model follows;
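A small numpy sketch of this distortion model; the values of λ, α and the noise level below are purely illustrative assumptions:

import numpy as np

def simulate_rerecording(x, fs, lam=0.8, alpha=1.01, noise_std=0.005):
    # y(t) = lam * x(alpha * t) + eta, evaluated on the same sample grid as x
    t = np.arange(len(x)) / fs               # output sample instants
    y = lam * np.interp(alpha * t, t, x, left=0.0, right=0.0)
    return y + np.random.normal(0.0, noise_std, len(x))

fs = 16000
x = np.sin(2 * np.pi * 440 * np.arange(fs) / fs)   # 1 s, 440 Hz test tone
y = simulate_rerecording(x, fs)              # amplitude-scaled, time-stretched, noisy copy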
Embodiment 3
In this embodiment, 0.2-second speech segments serve as the experimental data; the short-time Fourier transform uses a 126-point Hanning (hanning) window with a hop of 50, and the time-frequency map has size 64x62;
Further, in the above technical solution, convolution is carried out along the frequency dimension while pooling is carried out along the time dimension. Convolving only along the frequency dimension, without considering correlation along the time dimension, greatly reduces the number of kernel parameters, so the model resists over-fitting more strongly and depends less heavily on data volume; meanwhile, because the kernels share parameters during training, the characteristic information of identically distributed devices along the time dimension repeatedly trains the kernel parameters, making training more thorough. The pooling layer uses 1x2 pooling of the time dimension, with no pooling of the frequency dimension. Pooling reduces feature dimensionality, speeds up network computation, and makes the network structure more robust to stretching and deformation of the data features; since the feature distribution of a time-frequency map exhibits no such stretching or deformation, pooling only along the time dimension both reduces the feature dimensionality and avoids losing frequency-dimension features. Through multiple layers of convolution and pooling, the features eventually become one-dimensional, with length equal to the frequency size of the time-frequency map, as the trace below illustrates;
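The arithmetic behind that last claim can be traced in a few lines (illustrative only; floor division stands in for a 1x2 max pool):

time_frames = 62                             # time axis of the 64x62 time-frequency map
for layer in range(1, 8):
    time_frames = max(1, time_frames // 2)   # each 1x2 pool halves the time axis
    print(f"layer {layer}: freq=64, time={time_frames}")
# the time axis shrinks 31, 15, 7, 3, 1, ... while all 64 frequency bins survive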
Further, in the above technical solution, the original speech library consists of 30000 speech segments recorded by 60 people in total, with a sampling frequency of 16 kHz and a quantization precision of 16 bits;
The speech of 10 randomly selected speakers serves as the test data and the speech of the remaining 50 people for training, guaranteeing the independence of the training and test data and preventing recordings of the same speaker from appearing in different data sets;
The specific recording procedure is as follows: for the training set, the original speech library was re-recorded 4 times in a quiet environment with different combinations of distance and equipment, yielding 4 re-recorded speech libraries, each containing 25000 speech segments. A total of 25000 segments were randomly extracted from the 4 libraries as negative samples, which together with the original speech constitute a training data set of 50000 segments in total. The original speech was played through a Lenovo Y40-70AT-IFI laptop; the re-recording devices were a Dell Inspiron 14 (Ins14VD-258) laptop and a Xiaomi 2S smartphone;
The 4 recording configurations are shown in Table 1:
Table 1: Recorded speech
For the test data, the same recording settings as in Table 1 were used. To verify the model's robustness to random environmental noise, recordings were made both in a quiet environment and in an environment with a certain amount of random noise; the test set comprises 4 speech libraries in total, each containing 10000 test utterances recorded under that library's configuration in the quiet and noise-bearing environments;
Further, in the above technical solution, the network's error function is the cross-entropy loss function, trained with the Adam optimization algorithm. The initial learning rate is set to 0.001 and adjusted dynamically during training, halving after every 10000 training steps, with a batch size of 32. To monitor the training effect during training, 2000 samples are randomly chosen from the training data for validation and the training-data loss is compared against the validation-data loss; adding a regularization term to the loss function with a regularization coefficient of 0.0001 effectively prevents over-fitting. A training-loop sketch follows;
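A minimal PyTorch training-loop sketch under these settings; train_loader is assumed to yield (spectrogram, label) batches of size 32, RerecordDetector is the architecture sketch from Embodiment 1, the Adam β1 and β2 are left at their common defaults since the Table 2 values are not reproduced here, and weight decay stands in for the regularization term added to the loss:

import torch
import torch.nn as nn

model = RerecordDetector()                   # architecture sketch from Embodiment 1
criterion = nn.BCELoss()                     # (binary) cross-entropy loss
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3,
                             weight_decay=1e-4)       # regularization coefficient 0.0001
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=10000, gamma=0.5)

for step, (spec, label) in enumerate(train_loader):   # batches of 32 assumed
    optimizer.zero_grad()
    loss = criterion(model(spec).squeeze(1), label.float())
    loss.backward()
    optimizer.step()
    scheduler.step()                         # halves the learning rate every 10000 steps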
Table 2 lists some important hyper-parameter settings used during training; under these settings the network converges quickly during training and finally reaches quite high accuracy;
Table 2: Hyper-parameters (β1 and β2 are the Adam optimizer parameters)
Further, in the above technical solution, this embodiment comprises 4 test experiments, carried out with different recording devices and at different recording distances; the results of each experiment are shown in Table 3:
Table 3: Experimental results
The test accuracy reaches 99.8% or higher in all cases, confirming that the experimental model generalizes very well.
Finally: the above are only preferred embodiments of the present invention and are not intended to limit it; any modification, equivalent replacement or improvement made within the spirit and principles of the present invention shall be included within its scope of protection.

Claims (7)

1. A convolutional-neural-network-based re-recorded speech detection algorithm robust to different scenes, characterized by specifically comprising the following steps:
A. The original speech is captured with a recording device and undergoes D/A and A/D conversion, yielding the re-recorded speech;
B. The original speech is distorted during this conversion, and the distortion data of the original speech is computed with a distortion model whose expression is: y(t) = λ·x(αt) + η
where y(t) is the re-recorded speech, x(t) is the original speech, λ is the amplitude transformation factor, α is the time-axis linear stretching factor, and η is superimposed noise;
The corresponding frequency-domain expression is: Y(jω) = (λ/α)·X(jω/α) + N(jω)
where Y(jω), X(jω) and N(jω) are the frequency-domain representations of y(t), x(t) and η respectively; for a fixed recording device these characteristics are highly stable, i.e. λ and α are constants;
C. The re-recorded speech is converted into a speech time-frequency map by the short-time Fourier transform;
D. The speech time-frequency map is fed into the algorithm model, which comprises seven layers, each containing one convolutional layer and one pooling layer; the output of each convolutional layer passes through a rectified linear unit, residual connections are added between layers, the final features are extracted by global pooling, and a sigmoid predicts the detection result.
2. The convolutional-neural-network-based re-recorded speech detection algorithm robust to different scenes according to claim 1, characterized in that: when the re-recorded speech is transformed, the short-time Fourier transform uses a 126-point Hanning (hanning) window with a hop of 50, and the time-frequency map has size 64x62.
3. The convolutional-neural-network-based re-recorded speech detection algorithm robust to different scenes according to claim 1, characterized in that: the algorithm model convolves along the frequency dimension and pools along the time dimension, specifically using a 3x1 convolution kernel and 1x2 pooling, which agrees with the feature-distribution characteristics of the time-frequency map, whose features are independent between adjacent speech frames yet consistent within a specific frequency band.
4. The convolutional-neural-network-based re-recorded speech detection algorithm robust to different scenes according to claim 3, characterized in that: the algorithm model uses deep learning as its data-driven technique.
5. The convolutional-neural-network-based re-recorded speech detection algorithm robust to different scenes according to claim 4, characterized in that: the re-recording device introduces variations in the frequency domain of the original sound signal, and the deep learning model derives the network's input data from the original audio signal.
6. The convolutional-neural-network-based re-recorded speech detection algorithm robust to different scenes according to claim 3, characterized in that: when the algorithm model convolves along the frequency dimension it does not consider correlation along the time dimension, and while convolving along the frequency dimension it simultaneously pools along the time dimension.
7. The convolutional-neural-network-based re-recorded speech detection algorithm robust to different scenes according to claim 3, characterized in that: the convolution kernels share parameters, so the characteristic information of identically distributed devices along the time dimension repeatedly trains the kernel parameters; the pooling layer uses 1x2 pooling of the time dimension, with no pooling of the frequency dimension.
CN201910085725.8A 2019-01-29 2019-01-29 Re-recorded voice detection algorithm for different scene robustness based on convolutional neural network Active CN109872720B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910085725.8A CN109872720B (en) 2019-01-29 2019-01-29 Re-recorded voice detection algorithm for different scene robustness based on convolutional neural network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910085725.8A CN109872720B (en) 2019-01-29 2019-01-29 Re-recorded voice detection algorithm for different scene robustness based on convolutional neural network

Publications (2)

Publication Number Publication Date
CN109872720A true CN109872720A (en) 2019-06-11
CN109872720B CN109872720B (en) 2022-11-22

Family

ID=66918246

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910085725.8A Active CN109872720B (en) 2019-01-29 2019-01-29 Re-recorded voice detection algorithm for different scene robustness based on convolutional neural network

Country Status (1)

Country Link
CN (1) CN109872720B (en)

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170092297A1 (en) * 2015-09-24 2017-03-30 Google Inc. Voice Activity Detection
US20180068675A1 (en) * 2016-09-07 2018-03-08 Google Inc. Enhanced multi-channel acoustic models
CN108198561A (en) * 2017-12-13 2018-06-22 Re-recorded speech detection method based on convolutional neural networks
CN109065030A (en) * 2018-08-01 2018-12-21 Ambient sound recognition method and system based on convolutional neural networks

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
XIANG SHIJUN: "Research on Robust Audio Watermarking", China Doctoral Dissertations Full-text Database, Information Science and Technology *

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110211604A (en) * 2019-06-17 2019-09-06 Guangdong Polytechnic Normal University A deep residual network structure for speech deformation detection
CN112614483A (en) * 2019-09-18 2021-04-06 珠海格力电器股份有限公司 Modeling method based on residual convolutional network, voice recognition method and electronic equipment
CN110797031A (en) * 2019-09-19 2020-02-14 厦门快商通科技股份有限公司 Voice change detection method, system, mobile terminal and storage medium
CN110689902A (en) * 2019-12-11 2020-01-14 北京影谱科技股份有限公司 Audio signal time sequence processing method, device and system based on neural network and computer readable storage medium
CN111370028A (en) * 2020-02-17 2020-07-03 厦门快商通科技股份有限公司 Voice distortion detection method and system
CN111916067A (en) * 2020-07-27 2020-11-10 腾讯科技(深圳)有限公司 Training method and device of voice recognition model, electronic equipment and storage medium

Also Published As

Publication number Publication date
CN109872720B (en) 2022-11-22

Similar Documents

Publication Publication Date Title
CN109872720A (en) It is a kind of that speech detection algorithms being rerecorded to different scenes robust based on convolutional neural networks
CN108711436B (en) Speaker verification system replay attack detection method based on high frequency and bottleneck characteristics
CN108922518A (en) voice data amplification method and system
CN108231067A (en) Sound scenery recognition methods based on convolutional neural networks and random forest classification
CN104835498A (en) Voiceprint identification method based on multi-type combination characteristic parameters
CN110111797A (en) Method for distinguishing speek person based on Gauss super vector and deep neural network
CN108198561A (en) A kind of pirate recordings speech detection method based on convolutional neural networks
Monge-Alvarez et al. Audio-cough event detection based on moment theory
CN108831443A (en) A kind of mobile sound pick-up outfit source discrimination based on stacking autoencoder network
CN108091326A (en) A kind of method for recognizing sound-groove and system based on linear regression
CN106531174A (en) Animal sound recognition method based on wavelet packet decomposition and spectrogram features
CN111402922B (en) Audio signal classification method, device, equipment and storage medium based on small samples
Sukhwal et al. Comparative study of different classifiers based speaker recognition system using modified MFCC for noisy environment
Cao et al. Underwater target classification at greater depths using deep neural network with joint multiple‐domain feature
CN110136746B (en) Method for identifying mobile phone source in additive noise environment based on fusion features
Zheng et al. MSRANet: Learning discriminative embeddings for speaker verification via channel and spatial attention mechanism in alterable scenarios
CN110390937A (en) A kind of across channel method for recognizing sound-groove based on ArcFace loss algorithm
Reimao Synthetic speech detection using deep neural networks
Chen et al. Deep learning in automatic detection of dysphonia: Comparing acoustic features and developing a generalizable framework
CN114863937A (en) Hybrid birdsong identification method based on deep migration learning and XGboost
El‐Dahshan et al. Intelligent methodologies for cardiac sound signals analysis and characterization in cepstrum and time‐scale domains
Zhou et al. Robust sound event classification by using denoising autoencoder
CN111785262A (en) Speaker age and gender classification method based on residual error network and fusion characteristics
CN116631406B (en) Identity feature extraction method, equipment and storage medium based on acoustic feature generation
Boujnah et al. Smartphone-captured ear and voice database in degraded conditions

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information

Address after: 510665 293 Zhongshan Avenue, Tianhe District, Guangzhou, Guangdong.

Applicant after: Guangdong Polytechnic Normal University

Address before: 510665 293 Zhongshan Avenue, Tianhe District, Guangzhou, Guangdong.

Applicant before: Guangdong Polytechnic Normal University

GR01 Patent grant