CN112712099B - Speaker model compression system and method based on double-layer knowledge distillation - Google Patents

Speaker model compression system and method based on double-layer knowledge distillation

Info

Publication number
CN112712099B
Authority
CN
China
Prior art keywords
speaker
layer
teacher
network
model
Prior art date
Legal status
Active
Application number
CN202011079752.3A
Other languages
Chinese (zh)
Other versions
CN112712099A (en)
Inventor
Li Ruyun
Song Dandan
OuYang Peng
Current Assignee
Jiangsu Qingwei Intelligent Technology Co ltd
Original Assignee
Jiangsu Qingwei Intelligent Technology Co ltd
Priority date
Filing date
Publication date
Application filed by Jiangsu Qingwei Intelligent Technology Co ltd filed Critical Jiangsu Qingwei Intelligent Technology Co ltd
Priority to CN202011079752.3A
Publication of CN112712099A
Application granted
Publication of CN112712099B


Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00: Pattern recognition
    • G06F 18/20: Analysing
    • G06F 18/21: Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/214: Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06F 18/24: Classification techniques
    • G06F 18/241: Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F 18/2415: Classification techniques relating to the classification model, based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00: Speech recognition
    • G10L 15/22: Procedures used during a speech recognition process, e.g. man-machine dialogue

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Probability & Statistics with Applications (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a speaker model compression system and method based on double-layer knowledge distillation, belonging to the technical field of model compression by knowledge distillation. Embedding-layer knowledge distillation guides the student network to mimic the segment-level speaker representation (speaker characterization) of the teacher network, capturing the underlying distribution of each speaker's features. Logit-layer knowledge distillation guides the student network to mimic the speaker posterior probability distribution of the teacher network, exploiting the similarities between speaker classes. The method thus migrates a hierarchy of speaker characterization distributions from the teacher network. The invention addresses the problems that, in the prior art, the student network cannot achieve smaller intra-speaker differences and larger inter-speaker differences, and that the accuracy of same-speaker and different-speaker verification systems is low.

Description

Speaker model compression system and method based on double-layer knowledge distillation
Technical Field
The invention belongs to the technical field of model compression based on double-layer knowledge distillation, and in particular relates to a speaker model compression system and method based on double-layer knowledge distillation.
Background
In recent years, computing resources and data resources have become increasingly abundant, and machine learning based on deep neural networks has significantly improved the accuracy of speaker recognition systems. In situations where network connectivity is unavailable or leakage of personal privacy is a concern, it is desirable to use speaker recognition technology locally on embedded devices such as mobile phones, for example speaker recognition systems operating on embedded terminals with higher security requirements. However, existing speaker recognition techniques rely on deep neural networks, whose high computational cost and large memory footprint hinder deployment on embedded devices with limited memory resources. Accordingly, a growing body of research focuses on compressing and accelerating deep networks without significantly degrading model performance.
To compress these networks, knowledge distillation is a common approach in which a large network (the teacher) provides weighted targets to guide the training of a small network (the student). Although knowledge distillation has proven to be a practical model compression method in a variety of tasks (e.g., image classification, speech recognition, and speaker verification), previous work has only studied the impact of single-layer knowledge distillation on speaker characterization performance. As compression ratios grow larger, these methods are insufficient to close the performance gap between large and small models, and obtaining a student network that performs better than the teacher network remains a challenge.
Disclosure of Invention
The invention aims to provide a speaker model compression system and method based on double-layer knowledge distillation, so as to solve the problems that the student network in the prior art cannot achieve smaller intra-speaker differences and larger inter-speaker differences, and that the accuracy of same-speaker and different-speaker verification systems is low.
In order to achieve the above object, the present invention provides the following technical solutions:
a speaker model compression method based on double-layer knowledge distillation comprises the following steps:
s101, training a teacher model, wherein the teacher model can extract the speaker characterization of the teacher network learning, and the teacher model can predict the posterior probability distribution of the speaker of the teacher network learning.
S102, the teacher model comprises a teacher network, and the teacher network comprises a characterization layer and a posterior probability layer.
S103, training a student model by using a teacher model through knowledge distillation. The student model comprises a student network, and the student model can extract speaker characterization learned by the student network.
S104, double-layer knowledge distillation can simultaneously extract knowledge of a characterization layer and a posterior probability layer from a teacher network.
S105, carrying out characterization layer knowledge distillation through speaker characterization learned by a teacher network.
S106, the knowledge distillation of the characterization layer guides the student network to simulate the speaker characterization of the teacher network.
S107, performing posterior probability layer knowledge distillation through speaker posterior probability distribution learned by a teacher network.
S108, the posterior probability layer knowledge distillation guides the student network to simulate the speaker posterior probability distribution of the teacher network through the similarity among speaker categories.
S109, double-layer knowledge distillation can add the differences between the student network and the teacher network at the characterization layer and posterior probability layer outputs to the total classification loss.
S110, double-layer distillation can obtain the distribution of intra-speaker characterizations and the similarity of inter-class characterizations. This hierarchical distribution of speaker characterizations guides the student to achieve smaller intra-speaker differences and larger inter-speaker differences, ultimately improving the accuracy of speaker modeling.
Based on the technical scheme, the invention can also be improved as follows:
further, the token layer knowledge distillation can obtain the overall distribution of the teacher network token for each speaker, so as to directly guide the convergence of the token in the student network speaker.
Further, knowledge is extracted from the output of the posterior probability layer of the teacher network, and posterior distribution which can be predicted by the teacher model is distilled by the posterior probability layer knowledge to guide optimization of the student model. The posterior probability layer knowledge distillation is able to learn the similarity between speaker classes.
Further, knowledge is extracted from the output of the teacher's network posterior probability layer.
Further, the output of the teacher network posterior probability layer is taken as a standard, and is incorporated into the calculation of the student network loss function to guide the update of the student model parameters.
Further, posterior probability layer knowledge distillation guides optimization of student models through posterior probability distribution predicted by teacher models.
Further, the student model uses the AM-Softmax classification loss (AM-loss), introducing a parameter m to control the angular margin; the student model thus generates an angular classification margin between the characterizations of different speaker categories, which makes the requirement for correct classification stricter.
Further, the total classification loss is the sum of the cosine distance loss of the characterization layer knowledge distillation, the KL divergence loss of the posterior probability layer knowledge distillation, and the softmax loss for speaker classification.
A speaker model compression system based on double-layer knowledge distillation, comprising:
training a teacher model, the teacher model can extract the speaker characterization of the teacher network learning, and the teacher model can predict the speaker posterior probability distribution of the teacher network learning.
The teacher model comprises a teacher network, and the teacher network comprises a characterization layer and a posterior probability layer.
And training the student model by using the teacher model through knowledge distillation. The student model comprises a student network, and the student model can extract speaker characterization learned by the student network.
The double-layer knowledge distillation can extract knowledge of the characterization layer and the posterior probability layer from the teacher network simultaneously.
And carrying out characterization layer knowledge distillation through speaker characterization learned by a teacher network.
The characterization layer knowledge distillation directs the student network to mimic the speaker characterization of the teacher network.
And performing posterior probability layer knowledge distillation through speaker posterior probability distribution learned by a teacher network.
Posterior probability layer knowledge distillation directs student networks to mimic the speaker posterior probability distribution of a teacher network through similarities between speaker classes.
The double-layer knowledge distillation can add the differences between the student network and the teacher network at the characterization layer and posterior probability layer outputs to the total classification loss.
Double layer distillation can yield a distribution of intra-speaker characterization and similarity of inter-class characterization. The students are guided to realize smaller intra-speaker differences and larger inter-speaker differences through the hierarchical distribution of the speaker characterization, so that the modeling accuracy of the speakers is finally improved.
The invention has the following advantages:
The invention discloses a speaker model compression system based on double-layer knowledge distillation. Embedding-layer knowledge distillation guides the student network to mimic the segment-level speaker representation (speaker characterization) of the teacher network, capturing the underlying distribution of each speaker's features. Logit-layer knowledge distillation guides the student network to mimic the speaker posterior probability distribution of the teacher network, exploiting the similarities between speaker classes. The method migrates a hierarchy of speaker characterization distributions from the teacher network. Double-layer knowledge distillation can help the student network achieve smaller intra-speaker differences and larger inter-speaker differences, and further improve the accuracy of same-speaker and different-speaker verification systems.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings required for the description of the embodiments will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flow chart of the speaker model compression method based on double-layer knowledge distillation according to the present invention.
FIG. 2 is a flow chart of a double-layer knowledge distillation method of the present invention.
FIG. 3 is a schematic diagram of the double-layer knowledge distillation principle of the present invention.
FIG. 4 is a schematic diagram of the double-layer knowledge distillation principle of the present invention.
FIG. 5 is a graphical representation of comparative data for a double layer knowledge distillation and an original single layer knowledge distillation of the present invention.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the embodiments of the present invention more apparent, the technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention, and it is apparent that the described embodiments are some embodiments of the present invention, but not all embodiments of the present invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
As shown in FIGS. 1-5, an embodiment of the present invention provides a speaker model compression system based on double-layer knowledge distillation, comprising:
knowledge of the basic mathematical model of distillation. Knowledge distillation aims at transferring knowledge from a large teacher network T to a small student network S. The student network is trained to mimic the behavior of the teacher network. Where HT and HS represent the behavioral functions of the teacher network and the student network, respectively. This behavior function converts network inputs into an information representation, specifically the output of any layer in the network. For example, hlS represents the output of layer l in a student's network. Layer l of the student network is matched to layer l 'in the teacher network by a mapping function f (l), which means that layer l of the student network can learn information from layer l' of the teacher network. Finally, through the difference between the minimum chemical output and the teacher output, students can well simulate the behavior of a teacher network:
$\mathcal{L}_{KD} = \sum_{l=1}^{L} \lambda_l \sum_{i=1}^{N} \mathcal{L}_l\big(H_S^{l}(x_i),\, H_T^{f(l)}(x_i)\big)$
where $x_i$ denotes the $i$-th training sample; $\mathcal{L}_l$ is a loss function that constrains the difference between the output of the student's layer $l$ and the output of the teacher's layer $f(l)$ (e.g., an embedding layer or a logit layer); $\lambda_l$ is a hyper-parameter expressing the importance of distilling layer $l$; $N$ is the number of training samples; and $L$ is the total number of student layers.
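As an illustration of the layer-matching objective above, the following Python (PyTorch) sketch computes a weighted sum of per-layer distillation losses over matched layers. The dictionary-based interface, the layer names, the per-layer loss choices, and the weights are assumptions made for this sketch only, not the patent's implementation.

    import torch
    import torch.nn.functional as F

    def layer_kd_loss(student_outputs, teacher_outputs, layer_map, layer_losses, lambdas):
        """Weighted sum of per-layer distillation losses over matched layers.

        student_outputs[l] holds H_S^l(x) for a mini-batch and teacher_outputs[l2]
        holds H_T^{l2}(x); layer_map[l] = f(l) names the teacher layer matched to
        student layer l; lambdas[l] weights the importance of distilling layer l.
        """
        total = torch.zeros(())
        for l, teacher_l in layer_map.items():
            per_layer = layer_losses[l](student_outputs[l],
                                        teacher_outputs[teacher_l].detach())
            total = total + lambdas[l] * per_layer
        return total

    # Hypothetical usage: distil only the embedding layer with an MSE penalty.
    student_out = {"embedding": torch.randn(8, 512)}
    teacher_out = {"embedding": torch.randn(8, 512)}
    loss = layer_kd_loss(student_out, teacher_out,
                         layer_map={"embedding": "embedding"},
                         layer_losses={"embedding": F.mse_loss},
                         lambdas={"embedding": 1.0})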
Matching appropriate layers between the student network and the teacher network for knowledge distillation is not easy. In most cases we have to cope with their differences in width and depth.
S101, training a teacher model.
In this step, a teacher model 10 is trained, the teacher model 10 can extract the speaker characterization of the teacher network learning, and the teacher model 10 can predict the speaker posterior probability distribution of the teacher network learning.
The model compression method based on double-layer knowledge distillation takes a large speaker model as the teacher model 10 and distills it into a very small student model 20 while preserving the performance of the teacher model 10.
S102, the teacher model comprises a teacher network.
In this step, the teacher model 10 includes a teacher network including a characterization layer and a posterior probability layer. Based on the x-vector structure, a characterization layer and a posterior probability layer are selected from a teacher network to perform knowledge distillation.
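To make the two tap points concrete, the sketch below shows a minimal x-vector-style network in Python (PyTorch) that returns both the segment-level embedding (the characterization layer output) and the speaker logits whose softmax forms the posterior probability layer. The layer sizes, kernel widths, and number of frame-level layers are illustrative assumptions rather than the configuration used in the patent.

    import torch
    import torch.nn as nn

    class XVectorLike(nn.Module):
        """x-vector-style speaker network: frame-level TDNN layers, statistics
        pooling, a segment-level embedding, and a speaker classification head."""

        def __init__(self, feat_dim=30, channels=512, embed_dim=512, num_speakers=1000):
            super().__init__()
            self.frame_layers = nn.Sequential(
                nn.Conv1d(feat_dim, channels, kernel_size=5, dilation=1), nn.ReLU(),
                nn.Conv1d(channels, channels, kernel_size=3, dilation=2), nn.ReLU(),
                nn.Conv1d(channels, channels, kernel_size=3, dilation=3), nn.ReLU(),
            )
            self.segment = nn.Linear(2 * channels, embed_dim)      # after mean+std pooling
            self.classifier = nn.Linear(embed_dim, num_speakers)   # feeds the posterior layer

        def forward(self, feats):                    # feats: (batch, feat_dim, frames)
            h = self.frame_layers(feats)
            stats = torch.cat([h.mean(dim=2), h.std(dim=2)], dim=1)   # statistics pooling
            embedding = self.segment(stats)          # characterization layer output
            logits = self.classifier(embedding)      # softmax over logits gives posteriors
            return embedding, logits

A student network could reuse the same structure with fewer channels, e.g. XVectorLike(channels=128), keeping the embedding dimension matched to the teacher so that the two characterizations can be compared directly.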
S103, training a student model by using a teacher model through knowledge distillation.
In this step, the student model 20 is trained by the teacher model 10 through knowledge distillation, the student model 20 includes a student network, and the student model 20 is capable of extracting speaker characterization of the student network learning.
S104, double-layer knowledge distillation extracts knowledge of a characterization layer and a posterior probability layer from a teacher network simultaneously.
In this step, the double-layer knowledge distillation can extract knowledge of the characterization layer and the posterior probability layer from the teacher network at the same time.
S105, carrying out characterization layer knowledge distillation through speaker characterization learned by a teacher network.
In the step, the knowledge distillation of the characterization layer is carried out through the speaker characterization learned by the teacher network.
S106, the knowledge distillation of the characterization layer guides the student network to simulate the speaker characterization of the teacher network.
In this step, the characterization layer knowledge distillation directs the student network to mimic the speaker characterization of the teacher network.
S107, performing posterior probability layer knowledge distillation through speaker posterior probability distribution learned by a teacher network.
In the step, posterior probability layer knowledge distillation is carried out through speaker posterior probability distribution learned by a teacher network.
S108, the posterior probability layer knowledge distillation guides the student network to simulate the speaker posterior probability distribution of the teacher network through the similarity among speaker categories.
In this step, posterior probability layer knowledge distillation directs the student network to mimic the speaker posterior probability distribution of the teacher network through similarities between speaker classes.
S109, double-layer knowledge distillation can add the differences between the student network and the teacher network at the characterization layer and posterior probability layer outputs to the total classification loss.
In this step, the double-layer knowledge distillation can add the differences between the student network and the teacher network at the characterization layer and posterior probability layer outputs to the total classification loss.
S110, double-layer distillation can yield a distribution of intra-speaker characterization and similarity of inter-class characterization.
In this step, double-layer distillation can yield a distribution of intra-speaker characterization and similarity of inter-class characterization. The students are guided to realize smaller intra-speaker differences and larger inter-speaker differences through the hierarchical distribution of the speaker characterization, so that the modeling accuracy of the speakers is finally improved.
Assuming that the student and teacher networks produce speaker representations of the same dimension, the characterization (embedding) layer knowledge distillation constrains the similarity between the speaker representations learned by the teacher model 10 and the student model 20 via cosine similarity:
$L_{COS} = \sum_{i=1}^{N} \big(1 - \cos(H_S^{embd}(x_i),\, H_T^{embd}(x_i))\big)$
where $H_T^{embd}(x_i)$ denotes the embedding extracted by the teacher network for the $i$-th sample, and $H_S^{embd}(x_i)$ denotes the embedding computed by the student network. The remaining symbols are defined as in the knowledge-distillation formula above.
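A minimal Python (PyTorch) sketch of this characterization-layer term, written as the batch-averaged cosine distance between student and teacher embeddings; averaging over the mini-batch and detaching the teacher are implementation choices assumed here, not details stated in the patent.

    import torch.nn.functional as F

    def embedding_distill_loss(student_embd, teacher_embd):
        """L_COS: cosine distance 1 - cos(H_S^embd(x_i), H_T^embd(x_i)), averaged
        over the mini-batch. The teacher embedding is detached so that gradients
        only update the student."""
        cos = F.cosine_similarity(student_embd, teacher_embd.detach(), dim=1)
        return (1.0 - cos).mean()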
The comparison between the double-layer knowledge distillation of the present invention and the original single-layer knowledge distillation (Wang, Shuai, Yexin Yang, Tianzhe Wang, Yanmin Qian, and Kai Yu. "Knowledge Distillation for Small Foot-print Deep Speaker Embedding." In ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 6021-6025. IEEE, 2019.) is shown in FIG. 5; the test set is the XiaoAi (小爱) dataset.
The characterization layer knowledge distillation can obtain the teacher network's overall distribution of each speaker's characterization, thereby directly guiding the convergence of intra-speaker characterizations in the student network.
For speaker i, constrained by cosine similarity, the speaker characterization $S_{spk_i}$ extracted by the student model 20 converges toward the speaker characterization $T_{spk_i}$ extracted by the teacher model 10, allowing the student model 20 to achieve smaller intra-class differences.
Knowledge is extracted from the output of the teacher's network posterior probability layer, which distills the posterior distribution that can be predicted by the teacher model 10 to guide the optimization of the student model 20. The posterior probability layer knowledge distillation is able to learn the similarity between speaker classes.
By minimizing the KL divergence between the teacher network and the student network posterior probabilities:
$L_{KLD} = \sum_{i=1}^{N} \sum_{c=1}^{C} \tilde{y}_i^{c} \log\big(\tilde{y}_i^{c} / y_i^{c}\big)$
where $C$ is the number of speakers in the training set, $\tilde{y}_i$ is the posterior of the $i$-th sample predicted by the teacher network, and $y_i$ is the posterior of the $i$-th sample predicted by the student network. The remaining symbols are defined as in the cosine-distance formula above.
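A matching sketch of the posterior-probability-layer term: the KL divergence from the teacher's predicted posterior to the student's, computed from the speaker logits of both networks. Treating the teacher posterior as a fixed (detached) target and using "batchmean" averaging are assumptions of this sketch.

    import torch.nn.functional as F

    def posterior_distill_loss(student_logits, teacher_logits):
        """L_KLD: KL(teacher posterior || student posterior) over the C speaker
        classes, averaged over the mini-batch."""
        teacher_post = F.softmax(teacher_logits.detach(), dim=1)    # teacher posterior
        student_log_post = F.log_softmax(student_logits, dim=1)     # log of student posterior
        return F.kl_div(student_log_post, teacher_post, reduction="batchmean")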
The posterior probability is valuable information that encodes correlations between different classes, so the similarity between speaker classes can be learned through posterior probability layer knowledge distillation.
As shown in FIGS. 2-3, posterior probability layer knowledge distillation increases the inter-class differences of the student network; speakers with high similarity are grouped into a subclass.
Knowledge is extracted from the output of the teacher's network posterior probability layer.
The output of the teacher network posterior probability layer is taken as a standard, and is incorporated into the calculation of the student network loss function to guide the update of the parameters of the student model 20.
Posterior probability layer knowledge distillation directs optimization of student model 20 through posterior probability distributions predicted by teacher model 10.
The student model 20 uses the AM-Softmax classification loss (AM-loss), introducing a parameter m to control the angular margin; the student model 20 thus generates an angular classification margin between the characterizations of different speaker categories, which makes the requirement for correct classification stricter.
The total classification loss is the sum of the cosine distance loss of the characterization layer knowledge distillation, the KL divergence loss of the posterior probability layer knowledge distillation, and the softmax loss for speaker classification, where α and β are hyper-parameters used to balance these losses; their values are tuned experimentally.
$L_{total} = L_{A\text{-}softmax} + \alpha L_{KLD} + \beta L_{COS}$
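The sketch below combines an additive-margin softmax classification loss (one common reading of the AM-loss with margin parameter m described above) with the two distillation terms sketched earlier, weighted by α and β. The scale s, the margin m, and the reuse of embedding_distill_loss and posterior_distill_loss from the earlier sketches are assumptions of this illustration.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class AMSoftmaxLoss(nn.Module):
        """Additive-margin softmax: a margin m is subtracted from the target-class
        cosine score before scaled cross-entropy, which tightens the requirement
        for a sample to count as correctly classified."""

        def __init__(self, embed_dim, num_speakers, s=30.0, m=0.2):
            super().__init__()
            self.weight = nn.Parameter(torch.randn(num_speakers, embed_dim))
            self.s, self.m = s, m

        def forward(self, embedding, labels):
            cosine = F.linear(F.normalize(embedding), F.normalize(self.weight))
            margin = torch.zeros_like(cosine).scatter_(1, labels.unsqueeze(1), self.m)
            return F.cross_entropy(self.s * (cosine - margin), labels)

    def total_distillation_loss(am_loss, s_embd, t_embd, s_logits, t_logits,
                                labels, alpha=1.0, beta=1.0):
        """L_total = classification loss + alpha * L_KLD + beta * L_COS, reusing
        the embedding and posterior distillation loss sketches given above."""
        l_cls = am_loss(s_embd, labels)
        l_kld = posterior_distill_loss(s_logits, t_logits)
        l_cos = embedding_distill_loss(s_embd, t_embd)
        return l_cls + alpha * l_kld + beta * l_cos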
A speaker model compression system based on double-layer knowledge distillation, comprising:
training a teacher model 10, the teacher model 10 can extract the speaker characterization of the teacher's network learning, and the teacher model 10 can predict the speaker posterior probability distribution of the teacher's network learning.
The teacher model 10 includes a teacher network that includes a token layer and a posterior probability layer.
The student model 20 is trained with the teacher model 10 by knowledge distillation. The student model 20 includes a student network, and the student model 20 is capable of extracting speaker representations of student network learning.
The double-layer knowledge distillation can extract knowledge of the characterization layer and the posterior probability layer from the teacher network simultaneously.
And carrying out characterization layer knowledge distillation through speaker characterization learned by a teacher network.
The characterization layer knowledge distillation directs the student network to mimic the speaker characterization of the teacher network.
And performing posterior probability layer knowledge distillation through speaker posterior probability distribution learned by a teacher network.
Posterior probability layer knowledge distillation directs student networks to mimic the speaker posterior probability distribution of a teacher network through similarities between speaker classes.
The double-layer knowledge distillation can add the differences between the student network and the teacher network at the characterization layer and posterior probability layer outputs to the total classification loss.
Double layer distillation can yield a distribution of intra-speaker characterization and similarity of inter-class characterization. The students are guided to realize smaller intra-speaker differences and larger inter-speaker differences through the hierarchical distribution of the speaker characterization, so that the modeling accuracy of the speakers is finally improved.
Finally, it should be noted that: the above embodiments are only for illustrating the technical solution of the present invention, and are not limiting. Although the invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some of the technical features thereof can be replaced by equivalents. Such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims (9)

1. A speaker model compression method based on double-layer knowledge distillation, comprising:
s101, training a teacher model, wherein the teacher model can extract the speaker characterization of the teacher network learning, and the teacher model can predict the posterior probability distribution of the speaker of the teacher network learning;
s102, the teacher model comprises a teacher network, and the teacher network comprises a characterization layer and a posterior probability layer;
s103, training a student model by using the teacher model through knowledge distillation; the student model comprises a student network, and the student model can extract speaker characterization learned by the student network;
s104, the double-layer knowledge distillation can simultaneously extract knowledge of a characterization layer and a posterior probability layer from the teacher network;
s105, performing characterization layer knowledge distillation through speaker characterization learned by the teacher network;
s106, the characterization layer knowledge distillation guides the student network to simulate the speaker characterization of the teacher network;
s107, performing posterior probability layer knowledge distillation through the speaker posterior probability distribution learned by the teacher network;
s108, the posterior probability layer knowledge distillation guides the student network to simulate the speaker posterior probability distribution of the teacher network through the similarity among speaker categories;
s109, the double-layer knowledge distillation is capable of adding differences in the output of the characterization layer and the posterior probability layer between the student network and the teacher network to the total classification loss;
s110, the double-layer distillation can obtain the distribution of the characterization in the speaker and the similarity of the characterization among classes; the students are guided to realize smaller intra-speaker differences and larger inter-speaker differences through the hierarchical distribution of the speaker characterization, so that the modeling accuracy of the speakers is finally improved.
2. The method of claim 1, wherein the characterization layer knowledge distillation is capable of obtaining the teacher network's overall distribution of each speaker's characterization, thereby directly guiding the convergence of intra-speaker characterizations in the student network.
3. The method of claim 2, wherein knowledge is extracted from the output of the teacher network posterior probability layer, which distills posterior distributions that can be predicted by the teacher model to guide the optimization of the student model; the posterior probability layer knowledge distillation is able to learn the similarity between speaker classes.
4. The speaker model compression method based on double-layer knowledge distillation of claim 3, wherein knowledge is extracted from the output of the posterior probability layer of said teacher network.
5. The speaker model compression method based on double-layer knowledge distillation of claim 4, wherein the output of the teacher network posterior probability layer is taken as the standard and incorporated into the calculation of the student network loss function, so as to guide the update of the student model parameters.
6. The speaker model compression method based on double-layer knowledge distillation of claim 5, wherein said posterior probability layer knowledge distillation guides the optimization of said student model through the posterior probability distribution predicted by the teacher model.
7. The speaker model compression method based on double-layer knowledge distillation of claim 6, wherein the student model uses the AM-Softmax classification loss (AM-loss), introducing a parameter m to control the angular margin; the student model generates an angular classification margin between the characterizations of different speaker categories, which makes the requirement for correct classification stricter.
8. The speaker model compression method based on double-layer knowledge distillation of claim 7, wherein the total classification loss is the sum of the cosine distance loss of the characterization layer knowledge distillation, the KL divergence loss of the posterior probability layer knowledge distillation, and the softmax loss for speaker classification.
9. A speaker model compression system based on double-layer knowledge distillation, comprising:
training a teacher model, wherein the teacher model can extract the speaker characterization of the teacher network learning, and the teacher model can predict the posterior probability distribution of the speaker of the teacher network learning;
the teacher model comprises a teacher network, and the teacher network comprises a characterization layer and a posterior probability layer;
training a student model by using the teacher model through knowledge distillation; the student model comprises a student network, and the student model can extract speaker characterization learned by the student network;
the double-layer knowledge distillation can simultaneously extract knowledge of a characterization layer and a posterior probability layer from the teacher network;
carrying out characterization layer knowledge distillation through speaker characterization learned by the teacher network;
the knowledge distillation of the characterization layer guides the student network to simulate the speaker characterization of the teacher network;
performing posterior probability layer knowledge distillation through the posterior probability distribution of the speaker learned by the teacher network;
the posterior probability layer knowledge distillation guides a student network to simulate speaker posterior probability distribution of a teacher network through similarity among speaker categories;
the double-layer knowledge distillation is capable of adding differences in the output of the characterization layer and the posterior probability layer between the student network and the teacher network to the total classification loss;
the double-layer distillation can obtain the distribution of the characterization in the speaker and the similarity of the characterization among classes; the students are guided to realize smaller intra-speaker differences and larger inter-speaker differences through the hierarchical distribution of the speaker characterization, so that the modeling accuracy of the speakers is finally improved.
CN202011079752.3A 2020-10-10 2020-10-10 Speaker model compression system and method based on double-layer knowledge distillation Active CN112712099B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011079752.3A CN112712099B (en) 2020-10-10 2020-10-10 Speaker model compression system and method based on double-layer knowledge distillation

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011079752.3A CN112712099B (en) 2020-10-10 2020-10-10 Speaker model compression system and method based on double-layer knowledge distillation

Publications (2)

Publication Number Publication Date
CN112712099A CN112712099A (en) 2021-04-27
CN112712099B (en) 2024-04-12

Family

ID=75541647

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011079752.3A Active CN112712099B (en) Speaker model compression system and method based on double-layer knowledge distillation

Country Status (1)

Country Link
CN (1) CN112712099B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113361396B (en) * 2021-06-04 2023-12-26 思必驰科技股份有限公司 Multi-mode knowledge distillation method and system
CN113849641B (en) * 2021-09-26 2023-10-24 中山大学 Knowledge distillation method and system for cross-domain hierarchical relationship


Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP3598343A1 (en) * 2018-07-17 2020-01-22 Nokia Technologies Oy Method and apparatus for processing audio data
CN109637546A (en) * 2018-12-29 2019-04-16 苏州思必驰信息科技有限公司 Knowledge distillating method and device
GB201908574D0 (en) * 2019-06-14 2019-07-31 Vision Semantics Ltd Optimised machine learning
CN111599373A (en) * 2020-04-07 2020-08-28 云知声智能科技股份有限公司 Compression method of noise reduction model

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Victoria Mingote et al.; "Knowledge Distillation and Random Erasing Data Augmentation for Text-Dependent Speaker Verification"; ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP); entire document *
Shuai Wang et al.; "Knowledge Distillation for Small Foot-print Deep Speaker Embedding"; ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP); entire document *

Also Published As

Publication number Publication date
CN112712099A (en) 2021-04-27

Similar Documents

Publication Publication Date Title
Settle et al. Discriminative acoustic word embeddings: Recurrent neural network-based approaches
Zhang et al. Top-down tree long short-term memory networks
Huang et al. Speech emotion recognition from variable-length inputs with triplet loss function.
CN112071329A (en) Multi-person voice separation method and device, electronic equipment and storage medium
Markov et al. Robust speech recognition using generalized distillation framework.
CN108255805A (en) The analysis of public opinion method and device, storage medium, electronic equipment
CN113204952B (en) Multi-intention and semantic slot joint identification method based on cluster pre-analysis
CN111353029B (en) Semantic matching-based multi-turn spoken language understanding method
CN110110318B (en) Text steganography detection method and system based on cyclic neural network
Fang et al. Channel adversarial training for cross-channel text-independent speaker recognition
CN112712099B (en) Double-layer knowledge-based speaker model compression system and method by distillation
CN105139864A (en) Voice recognition method and voice recognition device
CN104217226A (en) Dialogue act identification method based on deep neural networks and conditional random fields
CN104200814A (en) Speech emotion recognition method based on semantic cells
Chen et al. Distilled binary neural network for monaural speech separation
CN113178193A (en) Chinese self-defined awakening and Internet of things interaction method based on intelligent voice chip
Zhou et al. ICRC-HIT: A deep learning based comment sequence labeling system for answer selection challenge
CN110633689B (en) Face recognition model based on semi-supervised attention network
CN114360502A (en) Processing method of voice recognition model, voice recognition method and device
CN116341558A (en) Multi-modal emotion recognition method and model based on multi-level graph neural network
CN114444481A (en) Sentiment analysis and generation method of news comments
CN116205227A (en) Keyword generation method and system based on variation inference theory
CN115795010A (en) External knowledge assisted multi-factor hierarchical modeling common-situation dialogue generation method
CN112287690A (en) Sign language translation method based on conditional sentence generation and cross-modal rearrangement
CN112463965A (en) Method and system for semantic understanding of text

Legal Events

Code Title
PB01 Publication
SE01 Entry into force of request for substantive examination
CB03 Change of inventor or designer information
Inventor after: Li Ruyun; Song Dandan; OuYang Peng
Inventor before: Li Ruyun; Song Dandan; OuYang Peng; Yin Shouyi
GR01 Patent grant