CN113223537B - Voice training data iterative updating method based on stage test feedback - Google Patents
Voice training data iterative updating method based on stage test feedback
- Publication number
- CN113223537B (application CN202110489679.5A)
- Authority
- CN
- China
- Prior art keywords: voice, training, speech, test, voices
- Prior art date: 2020-04-30
- Legal status: Active (assumed status; not a legal conclusion)
Classifications
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L17/00—Speaker identification or verification techniques
- G10L17/04—Training, enrolment or model building
- G10L15/00—Speech recognition
- G10L15/06—Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
- G10L15/063—Training
Abstract
The invention discloses an iterative updating method for speech training data based on staged test feedback. Misrecognized utterances from each staged test are continually added to a reference set, which is used to train a reference model; the likelihood score of each original training utterance on the reference model is then computed, the utterances of every class are ranked by likelihood score, and the top-ranked utterances of each class are selected in a fixed proportion to obtain the stage's core training speech. With this iterative updating method, high-quality training speech can be continually screened according to test-data feedback, and the resulting stage core training speech promptly exploits feedback from each application stage, so that future recognition performance keeps improving. The method suits speech-classification scenarios such as speech recognition, speaker recognition, and forged-speech detection.
Description
Technical Field
The invention belongs to the technical field of speech recognition, and in particular relates to an iterative updating method for speech training data based on staged test feedback.
Background
As a biometric authentication modality, voiceprint authentication offers low acquisition cost, ease of collection, and convenient remote verification, and is widely used in access control, financial transactions, forensic identification, and similar fields. The rapid development of speech-synthesis technology has, on one hand, brought more convenient services and better user experience, such as lifelike intelligent customer service, voice navigation, audiobook narration, and automated voice calls; on the other hand, it poses a serious security challenge to voiceprint authentication systems: attacks with synthesized speech markedly degrade their performance. Research on synthesized-speech detection is therefore of great importance.
The goal of synthesized-speech detection is to distinguish synthesized speech from genuine speech. Existing experimental research, driven by challenge competitions, trains on a fixed training set and typically uses large amounts of training data. In practice, however, adding more training data can actually degrade performance because the data contain redundancy, so data selection is necessary. A common engineering scenario is that testing proceeds in stages: as tests continue, results from multiple test stages accumulate, and how to use those stage results as feedback for selecting training data, so as to obtain a better model and better performance in subsequent test stages, is a practical problem worth addressing.
Disclosure of Invention
After a speech-classification system collects a stage of test data during real operation, how should those data be used to iteratively update the speech training set, and hence the classification model, so that future recognition performance improves? To address this problem, the invention provides an iterative updating method for speech training data based on staged test feedback. In a staged-testing scenario, the method uses feedback from each stage's results to select a high-quality core training set, so the model achieves better performance with less training speech, saving training time and energy while improving detection performance.
An iterative updating method for speech training data based on staged test feedback comprises the following steps (a minimal code sketch of the full loop follows the list):
S1. extract features from the original training speech and train an original model;
S2. run one round of staged testing; according to the scores of the test utterances on the original model, select the misrecognized utterances and add them to a reference speech set;
S3. train a reference model on the utterances in the reference speech set;
S4. compute the matching score of each original training utterance on the reference model;
S5. sort the utterances in each class's training set by model score;
S6. for each class, select the top-ranked utterances in a fixed proportion as the core training speech for the current stage of testing;
S7. extract features from the core training speech and train the current stage's core model;
S8. run a new round of staged testing; according to the scores of the test utterances on the core model obtained in the previous round, select the misrecognized utterances, add them to the reference speech set, and return to step S3.
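The following is a minimal runnable sketch of the S1–S8 loop for a two-class (genuine versus spoofed) task, using small Gaussian mixture models as stand-in classifiers and random vectors in place of real acoustic features; every name and parameter here is an illustrative assumption, not the patent's implementation.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
feats = lambda n: rng.normal(size=(n, 60))   # each row stands in for one utterance's feature vector

def train_gmm(X, k=4):
    """Fit a small diagonal-covariance GMM on the pooled features of one class."""
    return GaussianMixture(n_components=k, covariance_type="diag", random_state=0).fit(X)

train = {"genuine": feats(200), "spoof": feats(200)}       # original training speech
models = {c: train_gmm(X) for c, X in train.items()}       # S1: original models

reference = {c: np.empty((0, 60)) for c in train}          # reference speech set
for stage in range(3):                                     # successive staged tests
    test = {"genuine": feats(50), "spoof": feats(50)}
    # S2/S8: an utterance is "misrecognized" when its own-class likelihood
    # does not exceed the other class's likelihood
    for c, other in (("genuine", "spoof"), ("spoof", "genuine")):
        llr = models[c].score_samples(test[c]) - models[other].score_samples(test[c])
        reference[c] = np.vstack([reference[c], test[c][llr < 0]])
    # S3: reference models (fall back to the old model while the set is tiny)
    ref = {c: train_gmm(X) if len(X) >= 20 else models[c] for c, X in reference.items()}
    core = {}
    for c in train:                                        # S4-S6: score, sort, keep top half
        s = ref[c].score_samples(train[c])
        core[c] = train[c][np.argsort(-s)[: len(s) // 2]]
    models = {c: train_gmm(X) for c, X in core.items()}    # S7: core models for this stage
```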
Further, step S2 is implemented as follows: the test utterances are fed into the original model by class to obtain their scores; test utterances that score low but are genuine user speech, and test utterances that score high but are not, are treated as the misrecognized utterances of this stage's test and added to the reference speech set. The initial reference speech set consists of a known portion of the test speech.
Further, step S3 is implemented as follows: for an N-class speech-classification task, the utterances in the reference speech set are divided into N subsets by class; features are extracted from each subset in turn and a reference model is trained per class, giving N reference models, where N is a natural number greater than 1, namely the number of speech classes.
Further, step S4 is implemented as follows: the original training utterances are first grouped by class into N training sets; features are then extracted from the utterances of each set in turn and fed into the reference model of the corresponding class, which computes and outputs each utterance's matching score, i.e., its model score. Feature extraction in this step is identical to that used when training the reference models in step S3.
Further, step S5 is implemented as follows: based on the matching scores of all training utterances obtained in step S4, the utterances in each class's training set are sorted by model score from high to low.
Further, step S6 is implemented as follows: following the ordering within each training set obtained in step S5, the top-ranked utterances are selected in a fixed proportion as each class's core training speech for the current round of testing.
Further, step S7 is implemented as follows: for an N-class speech-classification task, the core training utterances of the current round are divided into N sets by class; features are extracted from each set in turn and a core model is trained per class, giving the N core models of the current round of staged testing.
Further, step S8 is implemented as follows: the test utterances are fed by class into the core models of the previous round of staged testing to obtain their scores; test utterances that score low but are genuine user speech, and test utterances that score high but are not, are treated as this stage's misrecognized utterances and added to this stage's reference speech set.
With this iterative updating method for speech training data, high-quality training speech can be continually screened according to test-data feedback, and the resulting stage core training speech promptly exploits feedback from each application stage, so that future recognition performance keeps improving. The method is therefore suitable for speech-classification scenarios such as speech recognition, speaker recognition, and forged-speech detection.
Drawings
Fig. 1 is a flowchart illustrating an iterative updating method of speech training data according to the present invention.
Detailed Description
The invention is applicable to speech-classification scenarios such as speech recognition, speaker recognition, and forged-speech detection. For a further understanding of the invention, the following embodiment describes only the specific application of selecting core training speech for synthesized-speech detection; these descriptions are intended solely to further illustrate the features and advantages of the invention and do not limit its claims.
The experimental data in this embodiment are the logical-access database of the 2019 Automatic Speaker Verification Spoofing and Countermeasures Challenge (ASVspoof 2019-LA), the database of the 2015 challenge (ASVspoof 2015), and a real-scene synthesized-speech detection data set (RS-SSD).
The ASVspoof challenge is organized jointly by several world-leading research institutions, including the University of Edinburgh (UK), EURECOM (France), NEC (Japan), and the University of Eastern Finland. The genuine speech of ASVspoof 2019 comes from 107 speakers (61 female, 46 male). The data are divided into three parts, a training set (Train), a development set (Dev), and an evaluation set (Eval); the recording environment is quiet, with no obvious channel or environmental noise. The spoofed utterances of the training and development sets are generated from genuine speech by a variety of algorithms. The training set covers 20 speakers (12 female, 8 male) with 2,580 genuine and 22,800 spoofed utterances; the development set covers 20 speakers (12 female, 8 male) with 2,548 genuine and 22,296 spoofed utterances; the evaluation set covers 67 speakers (37 female, 30 male) with 7,355 genuine and 63,882 spoofed utterances, and is about 4 GB in size.
The genuine speech of ASVspoof 2015 comes from 106 speakers (61 female, 45 male), likewise divided into a training set (Train), a development set (Dev), and an evaluation set (Eval) recorded in a quiet environment without obvious channel or environmental noise. The spoofed utterances of the training and development sets are generated from genuine speech by a variety of algorithms. The training set covers 25 speakers (15 female, 10 male) with 3,750 genuine and 12,625 spoofed utterances; the development set covers 35 speakers (20 female, 15 male) with 2,497 genuine and 49,875 spoofed utterances; the evaluation set covers 46 speakers (26 female, 20 male) with about 200,000 test utterances, and is about 20 GB in size.
The Real-scene Synthesized Speech Detection database (RS-SSD) contains synthesized speech from Google, Tencent, Baidu, and the Xinhua News Agency AI anchor, 4.12 hours in total, together with genuine speech of the same duration drawn from online media videos, Xinhua news videos, the MASC Mandarin speech corpus published by the CCNT laboratory of Zhejiang University, and the open-source Mandarin speech database AISHELL-1 provided by AISHELL (Beijing Shell Shell Technology). The speech content is varied, covering scenarios such as news reports, smart homes, autonomous driving, and industrial production.
As shown in fig. 1, the iterative updating method for speech training data based on staged test feedback of the present invention comprises the following steps:
S1. run a staged test with the existing model to obtain the stage's misrecognized utterances;
S2. add the stage-test misrecognized utterances to the set initially used for selecting training utterances;
S3. extract acoustic features from the new set and train the reference-model parameters;
S4. compute the likelihood score of every training utterance on the reference model;
S5. sort the likelihood scores of the genuine utterances in descending order and those of the synthesized utterances in ascending order;
S6. select the top-ranked genuine and synthesized utterances to form the training set.
Step S1 is implemented as follows: first, define the genuine-speech training corpus for synthesized-speech detection as X_genuine and the spoofed-speech training corpus as X_spoof; let M_genuine and M_spoof denote the target numbers of utterances to select, Q the set of test utterances misrecognized in the new test, Q_0 the test-speech set initially used for selection, and C_genuine and C_spoof the selected speech sets.
In the current testing stage of the staged-testing scenario, the existing model is used for testing to obtain the stage-test misrecognized speech set Q, which covers two cases: genuine speech misrecognized as synthesized speech, and synthesized speech misrecognized as genuine speech.
Step S2 is implemented as follows: add the stage-test misrecognized speech set Q to the test-data set Q_0 initially used for selection, forming a new set Q_new = Q ∪ Q_0 for selecting training data.
Step S3 is implemented as follows: for the new speech set Q_new used to select training data, feature data are obtained as 32-order LFCCs of the speech plus first-order and second-order delta features. With these feature data, a GMM with K Gaussian components is trained per class, and these GMMs serve as the reference models for training-speech selection: λ_genuine for genuine speech and λ_spoof for synthesized speech.
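A hedged sketch of this feature extraction follows, assuming an LFCC-style pipeline built from standard librosa/scipy operations (log linear-frequency power spectrum followed by a DCT); the frame and FFT sizes are illustrative assumptions, as the text does not specify them.

```python
import numpy as np
import librosa
import scipy.fftpack

def lfcc_96(wav_path, sr=16000, n_lfcc=32):
    """32 LFCC-like coefficients plus first- and second-order deltas (96 dims per frame)."""
    y, _ = librosa.load(wav_path, sr=sr)
    S = np.abs(librosa.stft(y, n_fft=512, hop_length=160)) ** 2   # linear-frequency power spectrum
    c = scipy.fftpack.dct(np.log(S + 1e-10), axis=0, norm="ortho")[:n_lfcc]
    d1 = librosa.feature.delta(c, order=1)                        # first-order deltas
    d2 = librosa.feature.delta(c, order=2)                        # second-order deltas
    return np.vstack([c, d1, d2]).T                               # shape (frames, 96)
```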
Training the GMM is an optimization process under the maximum-likelihood criterion, divided into parameter initialization and parameter optimization: initialization usually uses the LBG algorithm, and optimization usually uses the EM algorithm. Since GMM training and the feature-extraction procedure are standard components of existing synthesized-speech detection systems, they are not described in further detail here. The GMM order K is typically a power of 2, such as 64, 128, 512, or 1024; experiments found that with the 96-dimensional LFCC features used, a 512-component GMM detection system performs best.
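Continuing the sketch above, the reference GMM for one class might be trained as follows. Note that scikit-learn initializes with k-means rather than the LBG algorithm mentioned in the text, so this is a stand-in rather than the exact pipeline.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def train_reference_gmm(wav_paths, K=512):
    """Pool frame features from all utterances of one class and fit a K-component GMM."""
    X = np.vstack([lfcc_96(p) for p in wav_paths])   # lfcc_96 from the sketch above
    return GaussianMixture(n_components=K, covariance_type="diag",
                           max_iter=100, reg_covar=1e-4).fit(X)

# gmm_genuine = train_reference_gmm(genuine_wavs)    # λ_genuine (hypothetical file lists)
# gmm_spoof   = train_reference_gmm(spoof_wavs)      # λ_spoof
```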
Step S4 is implemented as follows: feature data for all training utterances are extracted in exactly the same manner as for the GMM reference models trained in step S3. Then, for each training utterance x, its log-likelihood is computed on the genuine-speech reference model λ_genuine and on the synthesized-speech reference model λ_spoof, and the difference is taken to obtain the log-likelihood score ratio y = log p(x | λ_genuine) - log p(x | λ_spoof).
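With the two reference GMMs above, the score ratio of one training utterance reduces to the sketch below; GaussianMixture.score returns the mean per-frame log-likelihood, which stands in for the utterance-level log-likelihood here.

```python
def llr(wav_path, gmm_genuine, gmm_spoof):
    """Log-likelihood score ratio y = log p(x|genuine) - log p(x|spoof)."""
    X = lfcc_96(wav_path)
    return gmm_genuine.score(X) - gmm_spoof.score(X)
```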
Step S5 is implemented as follows: the likelihood-score ratios of all genuine and synthesized utterances obtained in step S4 are sorted, the genuine scores in descending order to give {y_1, y_2, …, y_n} and the synthesized scores in ascending order to give {y'_1, y'_2, …, y'_n}.
Step S6 is implemented as follows: from the ranked training speech y and y' obtained in step S5, the top M_genuine genuine utterances and the top M_spoof synthesized utterances are selected and added to the selected speech sets C_genuine and C_spoof; M_genuine and M_spoof are determined by the data-selection ratio.
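Steps S5 and S6 then reduce to a sort and a slice, as in the sketch below; the data layout and variable names are assumptions.

```python
import numpy as np

def select_top(utts, scores, m, descending=True):
    """Return the m utterances with the best scores for one class."""
    order = np.argsort(scores)
    if descending:                  # genuine speech: highest LLR first
        order = order[::-1]         # synthesized speech keeps the ascending order
    return [utts[i] for i in order[:m]]

# C_genuine = select_top(genuine_utts, genuine_llrs, M_genuine, descending=True)
# C_spoof   = select_top(spoof_utts, spoof_llrs, M_spoof, descending=False)
```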
Finally, all utterances in the evaluation set were tested. The experiments were based on the GMM system; besides the selection algorithm proposed by the invention, equal-error-rate (EER) results were compared for training on all data and for a random selection method, as shown in Table 1:
TABLE 1 (the results table appears as an image in the original and is not reproduced here; the EER figures are summarized in the text below)
As can be seen from Table 1, the invention improves recognition performance to a certain extent and outperforms the random selection method. Compared with the original approach of training on all data, when only 1/3 of the data is selected for training, the EER improves by 0.77, 1.31, and 5.26 percentage points on the three data sets respectively; when only 1/2 of the data is selected, the EER improves by 0.88, 1.70, and 5.95 percentage points respectively.
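For reference, the equal error rate cited in Table 1 is the operating point where the false-rejection and false-acceptance rates coincide; a minimal threshold-sweep sketch follows (the score conventions, genuine scoring higher than spoofed, are assumptions).

```python
import numpy as np

def eer(genuine_scores, spoof_scores):
    """Equal error rate via a sweep over all candidate thresholds."""
    ts = np.sort(np.concatenate([genuine_scores, spoof_scores]))
    frr = np.array([np.mean(genuine_scores < t) for t in ts])   # genuine rejected
    far = np.array([np.mean(spoof_scores >= t) for t in ts])    # spoofed accepted
    i = np.argmin(np.abs(frr - far))                            # crossing point
    return (frr[i] + far[i]) / 2
```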
The foregoing description of the embodiments is provided to enable those skilled in the art to make and use the invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without inventive effort. Accordingly, the invention is not limited to the embodiments above, and improvements and modifications made by those skilled in the art according to this disclosure fall within the protection scope of the invention.
Claims (8)
1. An iterative updating method for speech training data based on staged test feedback, comprising the following steps:
S1. extract features from the original training speech and train an original model;
S2. run one round of staged testing; according to the scores of the test utterances on the original model, select the misrecognized utterances and add them to a reference speech set;
S3. train a reference model on the utterances in the reference speech set;
S4. compute the matching score of each original training utterance on the reference model;
S5. sort the utterances in each class's training set by model score;
S6. for each class, select the top-ranked utterances in a fixed proportion as the core training speech for the current stage of testing;
S7. extract features from the core training speech and train the current stage's core model;
S8. run a new round of staged testing; according to the scores of the test utterances on the core model obtained in the previous round, select the misrecognized utterances, add them to the reference speech set, and return to step S3.
2. The iterative updating method for speech training data according to claim 1, wherein step S2 is implemented as follows: the test utterances are fed into the original model by class to obtain their scores; test utterances that score low but are genuine user speech, and test utterances that score high but are not, are treated as the misrecognized utterances of this stage's test and added to the reference speech set, whose initial contents are a known portion of the test speech.
3. The iterative updating method for speech training data according to claim 1, wherein step S3 is implemented as follows: for an N-class speech-classification task, the utterances in the reference speech set are divided into N subsets by class; features are extracted from each subset in turn and a reference model is trained per class, giving N reference models, where N is a natural number greater than 1, namely the number of speech classes.
4. The iterative updating method for speech training data according to claim 1, wherein step S4 is implemented as follows: the original training utterances are first grouped by class into N training sets; features are then extracted from the utterances of each set in turn and fed into the reference model of the corresponding class, which computes and outputs each utterance's matching score, i.e., its model score.
5. The iterative updating method for speech training data according to claim 1, wherein step S5 is implemented as follows: based on the matching scores obtained in step S4, the utterances in each class's training set are sorted by model score from high to low.
6. The iterative updating method for speech training data according to claim 1, wherein step S6 is implemented as follows: following the ordering obtained in step S5, the top-ranked utterances of each class are selected in a fixed proportion as that class's core training speech for the current round of testing.
7. The iterative updating method for speech training data according to claim 1, wherein step S7 is implemented as follows: for an N-class speech-classification task, the core training utterances of the current round are divided into N sets by class; features are extracted from each set in turn and a core model is trained per class, giving the N core models of the current round of staged testing.
8. The iterative updating method for speech training data according to claim 1, wherein step S8 is implemented as follows: the test utterances are fed by class into the core models of the previous round of staged testing to obtain their scores; test utterances that score low but are genuine user speech, and test utterances that score high but are not, are treated as this stage's misrecognized utterances and added to this stage's reference speech set.
Applications Claiming Priority (2)

| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN2020103621269 | 2020-04-30 | | |
| CN202010362126 | 2020-04-30 | | |
Publications (2)

| Publication Number | Publication Date |
|---|---|
| CN113223537A | 2021-08-06 |
| CN113223537B | 2022-03-25 |
Family

ID=77090957

Family Applications (1)

| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CN202110489679.5A (Active) | Voice training data iterative updating method based on stage test feedback | 2020-04-30 | 2021-04-30 |

Country Status (1)

| Country | Link |
|---|---|
| CN | CN113223537B (en) |
Citations (6)

| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US6224636B1 | 1997-02-28 | 2001-05-01 | Dragon Systems, Inc. | Speech recognition using nonparametric speech models |
| US7346507B1 | 2002-06-05 | 2008-03-18 | BBN Technologies Corp. | Method and apparatus for training an automated speech recognition-based system |
| CN102332263A | 2011-09-23 | 2012-01-25 | 浙江大学 | Close neighbor principle based speaker recognition method for synthesizing emotional model |
| US9443517B1 | 2015-05-12 | 2016-09-13 | Google Inc. | Generating sounds for detectability by neural networks |
| CN108922515A | 2018-05-31 | 2018-11-30 | 平安科技(深圳)有限公司 | Speech model training method, audio recognition method, device, equipment and medium |
| US10388272B1 | 2018-12-04 | 2019-08-20 | Sorenson Ip Holdings, LLC | Training speech recognition systems using word sequences |

Family Cites Families (1)

| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US7509259B2 | 2004-12-21 | 2009-03-24 | Motorola, Inc. | Method of refining statistical pattern recognition models and statistical pattern recognizers |
Application timeline: 2021-04-30, application CN202110489679.5A filed in China; granted as patent CN113223537B (status: Active).
Non-Patent Citations (2)

| Title |
|---|
| LOU Wenhua. Text-Independent Speaker Recognition Using GMM Non-Linear Transformation. 电子器件 (Chinese Journal of Electron Devices), 2017-12-31. |
| WEI Xing. 基于长短期记忆的车辆行为动态识别网络 (Dynamic vehicle-behavior recognition network based on long short-term memory). 计算机应用 (Journal of Computer Applications), 2019-12-31. |
Also Published As

| Publication number | Publication date |
|---|---|
| CN113223537A | 2021-08-06 |
Similar Documents

| Publication | Title |
|---|---|
| CN102779510B | Speech emotion recognition method based on feature space self-adaptive projection |
| CN103531198B | Speech emotion feature normalization method based on pseudo-speaker clustering |
| CN111243602A | Voiceprint recognition method based on gender, nationality and emotional information |
| CN110610709A | Identity distinguishing method based on voiceprint recognition |
| KR101618512B1 | Gaussian mixture model based speaker recognition system and the selection method of additional training utterance |
| CN111524527A | Speaker separation method, device, electronic equipment and storage medium |
| CN110111797A | Speaker recognition method based on Gaussian supervector and deep neural network |
| CN102890930A | Speech emotion recognition method based on hidden Markov model (HMM) / self-organizing feature map neural network (SOFMNN) hybrid model |
| CN106991312B | Internet anti-fraud authentication method based on voiceprint recognition |
| CN109346084A | Speaker recognition method based on deep stacked autoencoder network |
| CN103578481A | Method for recognizing cross-linguistic voice emotion |
| CN108962247A | Multidimensional speech information recognition system based on progressive neural network, and method thereof |
| CN111091809B | Regional accent recognition method and device based on deep feature fusion |
| CN114678030A | Voiceprint identification method and device based on deep residual network and attention mechanism |
| CN110992988A | Speech emotion recognition method and device based on domain adversarial learning |
| Cao et al. | Speaker-independent speech emotion recognition based on random forest feature selection algorithm |
| CN112562725A | Mixed voice emotion classification method based on spectrogram and capsule network |
| Anguera et al. | A novel speaker binary key derived from anchor models |
| CN113223537B | Voice training data iterative updating method based on stage test feedback |
| Kaur et al. | An efficient speaker recognition using quantum neural network |
| CN110807370B | Non-intrusive conference-speaker identity confirmation method based on multiple modalities |
| CN115910073B | Voice fraud detection method based on bidirectional attention residual network |
| CN113223503B | Core training voice selection method based on test feedback |
| Ladde et al. | Use of multiple classifier system for gender driven speech emotion recognition |
| CN112489689A | Cross-database voice emotion recognition method and device based on multi-scale difference adversarial learning |
Legal Events

| Code | Title |
|---|---|
| PB01 | Publication |
| SE01 | Entry into force of request for substantive examination |
| GR01 | Patent grant |