CN114566155B - Feature reduction method for continuous speech recognition - Google Patents
Feature reduction method for continuous speech recognition
- Publication number
- CN114566155B (application CN202210243971.3A, publication CN202210243971A)
- Authority
- CN
- China
- Prior art keywords
- feature
- training
- data
- features
- mean
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/06—Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
- G10L15/063—Training
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/02—Feature extraction for speech recognition; Selection of recognition unit
Abstract
A feature reduction method for continuous speech recognition comprises the following steps: step 1, preparing a training set and calculating the per-dimension mean of the speech features; step 2, using cepstral mean normalization to subtract the cepstral-coefficient mean from the features of each speech sample, and eliminating outlier data; step 3, after the data with outlying feature distributions have been eliminated, extracting the speech features of all training samples in the training set with a global-feature-mean affine-transformation scaling method; and step 4, performing the feature-normalization calculation and replacing the feature frames in the training samples, to obtain a feature-reduced training set. The invention provides a training method that reduces the feature representation. For acoustic-model training with large data volumes, the recognition accuracy is improved, misrecognition is reduced, and the training time is shortened.
Description
Technical Field
The invention belongs to the technical field of speech recognition, and in particular relates to a feature reduction method for continuous speech recognition.
Background
Traditional acoustic-model modeling is based on the parameters of a deep neural network model: the acoustic feature vector of each frame of speech data is propagated along the connections between neural-network nodes, and the network outputs the frame's posterior probability vector, in which each dimension is the classification probability of the corresponding acoustic state.
Deep-neural-network training greatly improves overall recognition performance, but at the same time it increases the demand for training data, and training on large data volumes is time-consuming. Continuous-speech-recognition decoding depends heavily on acoustic-model training, which determines the overall performance of speech-recognition decoding. Training on a large speech corpus can improve the overall performance, but the cost of assembling the data and the total duration of the training runs are both very large.
Disclosure of Invention
In order to overcome the above technical defects of the prior art, the invention discloses a feature reduction method for continuous speech recognition.
The feature reduction method for continuous speech recognition comprises the following steps:
step 1, preparing a training set, wherein the training set comprises a plurality of training samples and each training sample comprises speech and the corresponding text; calculating the per-dimension mean of the speech features of a single training sample in the training set with equation 1:

$$\bar{o}_i = \frac{1}{T}\sum_{t=1}^{T} o_i^t \qquad (1)$$

where $\bar{o}_i$ is the mean of feature dimension $i$ of the speech features, $\sum_{t=1}^{T} o_i^t$ is the summation of the frame features of the speech data, $T$ is the total number of feature frames in the speech data, and $o_i^t$ is the $i$-th dimension of the feature frame at time $t$;
step 2, using cepstral mean normalization to subtract the cepstral-coefficient mean from the features of a single speech sample, specifically as follows:

$$\hat{o}_i^t = o_i^t - \bar{o}_i \qquad (2)$$

where $o_i^t$ is the value of feature dimension $i$ of a feature frame; equation 2 subtracts the per-dimension mean of the utterance from the feature-dimension values of each feature frame to obtain the mean-normalized feature frames;

marking outlier data according to the feature-value mean and removing the outlier data from the training set;
step 3, after the data with outlying feature distributions have been eliminated, extracting the speech features of all training samples in the training set with the global-feature-mean affine-transformation scaling method;

the mean $\mu_i$ and standard deviation $\sigma_i$ of all training-sample speech data in dimension $i$ are calculated as follows:

$$\mu_i = \frac{1}{M}\sum_{m=1}^{M} x_i^m \qquad (3)$$

$$\sigma_i = \sqrt{\frac{1}{M}\sum_{m=1}^{M}\left(x_i^m - \mu_i\right)^2} \qquad (4)$$

where $x_i^m$ is the feature frame of dimension $i$ in training sample $m$, $m$ indexes the different training samples, and $M$ is the total number of training samples after the outlier data have been removed;

the training samples in step 3 have already been processed with the per-sample feature-normalized data, with the outlier data removed;
step 4, performing the feature-normalization calculation by combining equations 3 and 4, specifically as in equation 5:

$$\hat{x}_i^m = \frac{x_i^m - \mu_i}{\sigma_i} \qquad (5)$$

obtaining by equation 5 the normalized value $\hat{x}_i^m$ of every datum in the feature-reduced training data, and replacing the feature frame $x_i^m$ of each dimension $i$ in the training samples with the normalized value $\hat{x}_i^m$, to obtain the feature-reduced training set.
Preferably, in step 2, the outlier data are marked and rejected as follows: a deviation threshold is set, and training samples whose deviation from the feature-frame mean exceeds the threshold are marked as outlier data and eliminated.
The invention provides a training method based on feature reduction: by reducing the feature representation, acoustic-model training with large data volumes achieves higher recognition accuracy, fewer misrecognitions, and a shorter training time.
Drawings
FIG. 1 is a schematic flow chart of an embodiment of the present invention.
Detailed Description
Using the feature reduction method during continuous-speech-recognition acoustic-model training eliminates the contribution of noise-signal features. In the training of a large speech corpus, unnecessary feature structure inevitably exists, so the data features other than the target signal are noise-signal features. These noise signals need to be removed as far as possible before or during training, because noise is an interference term that affects the iteration of the training parameters and degrades the model accuracy and the robustness of signal processing. The retained signal features then fit the actual speech signals and the specific speech environment better, or fit the speech content of evaluation corpora such as the test and validation sets.
Secondly, the feature reduction method eliminates unnecessary correlation between features during continuous-speech-recognition acoustic-model training. Analyzed from the feature-space representation used in acoustic-model decoding, model parameters trained on the same type of speech corpus tend to share the same speech-signal features: some feature parameters are dense in feature space, while others are sparse because they occur rarely in the corpus or correspond to low-frequency speech signals. Such a feature structure yields a model with poor robustness after deep-neural-network training; the model is not suited to the specific speech-signal environment, and the result is poor generalization, a poor user experience, and in particular frequent misrecognition.
To improve upon the above-described drawbacks, embodiments of the present invention are described in further detail with reference to the flow chart shown in fig. 1.
The feature reduction method for continuous speech recognition operates on a training set for speech recognition, wherein the training set comprises M training samples and each training sample comprises speech and the corresponding text.
The method specifically comprises the following steps:
Step 1, calculating the per-dimension mean of the speech features of a single training sample with equation 1:

$$\bar{o}_i = \frac{1}{T}\sum_{t=1}^{T} o_i^t \qquad (1)$$

where $\bar{o}_i$ is the mean of feature dimension $i$ of the speech features, $\sum_{t=1}^{T} o_i^t$ is the summation of the frame features of the speech data, $T$ is the total number of feature frames in the speech data, and $o_i^t$ is the $i$-th dimension of the feature frame at time $t$.
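As a minimal sketch of equation 1 (assuming each utterance is stored as a numpy array of shape (T, D), e.g., D-dimensional cepstral features per frame; the array and function names are illustrative, not from the patent):

```python
import numpy as np

def dimension_mean(frames: np.ndarray) -> np.ndarray:
    """Equation 1: per-dimension mean over all T feature frames.

    frames: array of shape (T, D), one row per feature frame
    returns: array of shape (D,), the mean of each feature dimension
    """
    return frames.mean(axis=0)
```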
Step 2.

Cepstral mean normalization is then used to subtract the cepstral-coefficient mean from the features of a single speech sample, specifically as follows:

$$\hat{o}_i^t = o_i^t - \bar{o}_i \qquad (2)$$

where $o_i^t$ in equation 2 is feature dimension $i$ of frame $t$; equation 2 subtracts the per-dimension mean calculated in equation 1 from the feature-dimension values of all feature frames, to obtain the mean-normalized feature frames.
The outlier data are then marked according to the feature-value mean and removed from the training set.

To select the outlier data, a deviation threshold may be set, and training samples whose deviation from the feature-frame mean exceeds the threshold are marked as outlier data, as in the sketch below.
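A sketch of cepstral mean normalization plus threshold-based outlier rejection, assuming a per-sample mean-absolute-deviation criterion and an illustrative threshold value (the patent fixes neither):

```python
import numpy as np

def cmn(frames: np.ndarray) -> np.ndarray:
    """Equation 2: subtract the per-dimension mean from every frame."""
    return frames - frames.mean(axis=0)

def reject_outliers(samples: list[np.ndarray], threshold: float = 3.0) -> list[np.ndarray]:
    """Keep only the samples whose mean absolute normalized deviation
    stays below the threshold (illustrative outlier criterion)."""
    kept = []
    for frames in samples:
        normalized = cmn(frames)
        deviation = float(np.abs(normalized).mean())  # one scalar per sample
        if deviation <= threshold:
            kept.append(normalized)
    return kept
```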
Step 3: after the data with outlying feature distributions have been eliminated, the speech features of all training samples in the training set are extracted with the global-feature-mean affine-transformation scaling method, and the mean and standard deviation are calculated.

The mean $\mu_i$ and standard deviation $\sigma_i$ of all training-sample speech data in dimension $i$ are calculated as follows:

$$\mu_i = \frac{1}{M}\sum_{m=1}^{M} x_i^m \qquad (3)$$

$$\sigma_i = \sqrt{\frac{1}{M}\sum_{m=1}^{M}\left(x_i^m - \mu_i\right)^2} \qquad (4)$$

where $x_i^m$ is the feature frame of dimension $i$ in training sample $m$, $m$ indexes the different training samples, and $M$ is the total number of training samples after the outlier data have been removed.

The training samples in equations 3 and 4 have already been processed with the per-sample feature-normalized data, with the outlier data rejected.
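A sketch of the global statistics of equations 3 and 4; pooling all frames of all retained samples into one array before taking the mean and standard deviation is an implementation choice, not something the patent specifies:

```python
import numpy as np

def global_stats(samples: list[np.ndarray]) -> tuple[np.ndarray, np.ndarray]:
    """Equations 3 and 4: per-dimension mean and standard deviation
    over all M retained training samples."""
    pooled = np.concatenate(samples, axis=0)  # shape (sum of all T_m, D)
    mu = pooled.mean(axis=0)                  # equation 3
    sigma = pooled.std(axis=0)                # equation 4
    return mu, sigma
```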
Step 4, performing the feature-normalization calculation by combining equations 3 and 4, specifically as in equation 5:

$$\hat{x}_i^m = \frac{x_i^m - \mu_i}{\sigma_i} \qquad (5)$$

obtaining by equation 5 the normalized value $\hat{x}_i^m$ of every datum in the feature-reduced training data, and replacing the feature frame $x_i^m$ of each dimension $i$ in the training samples with the normalized value $\hat{x}_i^m$, to obtain the feature-reduced training set.
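A sketch of the equation-5 replacement step; the small epsilon guarding against a zero standard deviation is an added safeguard, not part of the patent:

```python
import numpy as np

def normalize_set(samples: list[np.ndarray], mu: np.ndarray,
                  sigma: np.ndarray, eps: float = 1e-8) -> list[np.ndarray]:
    """Equation 5: replace every feature frame with its globally
    mean/variance-normalized value."""
    return [(frames - mu) / (sigma + eps) for frames in samples]
```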
When every feature dimension is reduced to a standardized numerical range, the subsequent training and processing generally achieve better performance. After the feature normalization is calculated, the global-feature-mean affine transformation scales the feature data of every dimension into a fixed numerical range. Because the samples are preprocessed on the basis of prior speech-signal processing, the subsequent training covers a larger training space and more feature representations; the feature dimensions of all the data can iteratively update the deep-neural-network parameters during model-parameter adjustment, which strengthens the generalization ability of the trained model, improves the decoding results, and reduces misrecognition.
The data set after feature reduction can be processed as follows.
For model initialization in feature reduction, random values are drawn from an initial distribution defined by a mean and a standard deviation; these values and their numerical ranges are used to initialize the model parameters before training.

The initial value of the mean is set to 0, and the standard deviation is given by:

$$\sigma_w = \frac{1}{\sqrt{N_{\text{out}}}} \qquad (6)$$

Equation 6 gives the standard deviation $\sigma_w$ of the random-value distribution used for initialization, where $N_{\text{out}}$ is the number of output nodes connected by weights in the deep neural network and $w$ denotes the hidden-layer weights of the neural network. Regularized training constrains and adjusts the coefficient estimates toward the zero vector, shrinking them as close to zero as possible; regularization reduces the model complexity and removes instability from the learning process, thereby avoiding overfitting.
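A sketch of this initialization, under the stated assumption that equation 6 scales the standard deviation as $1/\sqrt{N_{\text{out}}}$ (the exact symbols of equation 6 are not recoverable from the text):

```python
import numpy as np

def init_hidden_weights(n_in: int, n_out: int, seed: int | None = None) -> np.ndarray:
    """Zero-mean Gaussian initialization of a hidden-layer weight matrix;
    standard deviation from equation 6 (assumed 1/sqrt(N_out))."""
    rng = np.random.default_rng(seed)
    sigma_w = 1.0 / np.sqrt(n_out)
    return rng.normal(loc=0.0, scale=sigma_w, size=(n_in, n_out))
```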
The norm calculation used in the regularized training process is shown in equation 7-1:

$$\|x\|_p = \left(\sum_{i=1}^{n} |x_i|^p\right)^{1/p} \qquad (7\text{-}1)$$

where $\|x\|_p$ is a non-negative norm, $n$ is the dimension of the vector, $x_i$ is the $i$-th component of the vector, $\sum_{i=1}^{n}|x_i|^p$ accumulates over all components, and $p$ is the order of the norm; when $p=2$, the exponent $1/p$ is a square root.

The core role of the norm is to manage the smoothed bias and variance when the model has too many parameters or an overly complex structure, and to handle the fit between the model's predictions and the variability of the true model parameters. When $p=2$, the norm is the square root of the sum of squares of all components of the vector; the result is the L2 norm, i.e., the Euclidean distance.
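A sketch of equation 7-1; for p = 2 the result agrees with numpy's built-in Euclidean norm:

```python
import numpy as np

def p_norm(x: np.ndarray, p: float = 2.0) -> float:
    """Equation 7-1: the order-p norm of a vector."""
    return float(np.sum(np.abs(x) ** p) ** (1.0 / p))

# Example: p_norm(np.array([3.0, 4.0])) == 5.0 == np.linalg.norm([3.0, 4.0])
```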
Using the L2 norm to scale the coefficients of the features in a vector suppresses overly large training features, and if overfitting occurs, the norm formula constrains the feature coefficients.
The calculation uses equation 7-2:

$$\tilde{J}(W) = J(W) + \lambda\, R(W), \qquad R(W) = \left\|\mathrm{vec}(W)\right\|_2^2 \qquad (7\text{-}2)$$

In equation 7-2, $J(W)$ is the loss function and $R(W)$ is the regularization term, the squared L2 norm of the weight vector $\mathrm{vec}(W)$. Continuing to decompose from equation 7-2, the regularization term $R(W)$ of the matrix $W$ on the left takes the form of equation 7-3:

$$R(W) = \sum_{l=1}^{L} \left\|\mathrm{vec}\!\left(W^{(l)}\right)\right\|_2^2 \qquad (7\text{-}3)$$

In equation 7-3, $\mathrm{vec}(W^{(l)})$ is the vector representation formed by combining the weights of hidden layer $l$ of the deep neural network; decomposing this vector further yields the accumulated representation over the values in row $i$ and column $j$, as in equation 7-4:

$$\left\|\mathrm{vec}\!\left(W^{(l)}\right)\right\|_2^2 = \sum_{i=1}^{N_{l-1}} \sum_{j=1}^{N_l} \left(w_{ij}^{(l)}\right)^2 \qquad (7\text{-}4)$$

The leftmost $R(W)$ of equation 7-2, the regularization-term calculation of the matrix $W$, is expanded in equation 7-3; in equation 7-4, $w_{ij}^{(l)}$ is the value in row $i$ and column $j$ of the weight matrix of hidden layer $l$, and the accumulated term adds the magnitudes of all weight parameters on top of the loss function.

$N_{l-1}$ and $N_l$ are the numbers of nodes in hidden layers $l-1$ and $l$ respectively, and $L$ is the number of hidden layers; the sum over the layers is the vector representation formed by the weights of all hidden layers, accumulated component by component down to the value in row $i$ and column $j$.
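A sketch of the layered L2 regularization term of equations 7-2 to 7-4; the value of lambda is illustrative, not taken from the patent:

```python
import numpy as np

def l2_regularizer(layer_weights: list[np.ndarray]) -> float:
    """Equations 7-3 and 7-4: sum of squared weights over all hidden layers."""
    return float(sum(np.sum(W ** 2) for W in layer_weights))

def regularized_loss(loss: float, layer_weights: list[np.ndarray],
                     lam: float = 1e-4) -> float:
    """Equation 7-2: training loss plus the weighted L2 regularization term."""
    return loss + lam * l2_regularizer(layer_weights)
```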
Specific examples:
The recognition content is four short sentences: ① hello, intelligent housekeeper; ② the weather is good today; ③ turn on the air conditioner and raise the temperature; ④ turn off the heating and lower the temperature. Each of the four short sentences has 2000 samples with a total duration of 2.78 hours; their content is also contained in 120 hours of assorted 2-to-4-word household phrases, for a total duration of about 135 hours.
The four sentences are used to compare and evaluate the recognition results of the traditional training method and the feature reduction training method proposed in this application, with the word error rate as the judgment criterion.
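For reference, a minimal sketch of the word-error-rate calculation by word-level edit distance (the standard definition, not code from the patent):

```python
def word_error_rate(reference: list[str], hypothesis: list[str]) -> float:
    """WER = (substitutions + deletions + insertions) / reference length,
    computed via the Levenshtein edit distance over words."""
    R, H = len(reference), len(hypothesis)
    d = [[0] * (H + 1) for _ in range(R + 1)]
    for i in range(R + 1):
        d[i][0] = i
    for j in range(H + 1):
        d[0][j] = j
    for i in range(1, R + 1):
        for j in range(1, H + 1):
            cost = 0 if reference[i - 1] == hypothesis[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution
    return d[R][H] / max(R, 1)
```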
The training uses the error back-propagation algorithm of deep neural networks with the conventional training method, namely the deep-neural-network multi-hidden-layer perceptron training method, with the conventional activation functions and training criteria of deep neural networks.
In the specific training process, a data-processing algorithm that fully and randomly shuffles the training data is adopted, and several groups of acoustic models are trained by extracting 160, 270, and 205 hours of home-environment training corpus under different parameter configurations. The test and validation results of the 160-hour acoustic model are found to be clearly better than those of a conventional acoustic model trained on a 160-hour corpus of the same level, and the test decoding results of the 270-hour and 205-hour acoustic models exceed the conventional neural-network training baseline of more than 600 hours: the new acoustic-model training method greatly reduces the word error rate of continuous speech recognition, from 11.21% to 8.96%.
The training speed also improves slightly in terms of training-iteration time, because feature reduction lowers the number of training parameters. Comparison shows that training on the 160-, 270-, and 205-hour home-environment corpora, which conventionally takes about 15 hours, 25.5 hours, and 21 hours respectively, is shortened by the new training method to 14 hours, 23 hours, and 19.5 hours. The new method of this patent therefore reduces the total training duration while offering a solution that improves recognition accuracy, providing a training method that reduces the feature representation.
Training with the traditional method resembles the repeated teaching of learning words from pictures, with the same material learned again and again. In the traditional training method, however, differences in the labeling accuracy of the training corpus and in the reliability of data cleaning introduce feature noise. In the repeated imitation-learning process, personifying the feature expression of the speech signal inside the deep neural network during training, different pronunciations of individual Chinese characters occur, such as "hello intelligent manager", "hello intelligent gateway", and the like; differences between similar-sounding feature frames, the real pronunciation, and the labeled text enter the error back-propagation of model-parameter training, producing unexpected results such as overfitting or underfitting.
The invention repeatedly emphasizes the correct content, i.e., the content of the correct text. When the speech features enter the deep neural network, a feature-reduction term is added to the loss function, i.e., to the optimization target, so that the influence of the feature-reduction coefficient is fully taken into account while the model parameters are trained: feature coefficients with little influence are shrunk to a negligible level, only the feature structure suited to the training parameters is retained, the correct speech signals still enter the subsequent network iterations at full strength, the unnecessary correlation of noisy speech signals is gradually weakened, and the feature parameters become uniformly distributed in feature space; the deep neural network is trained repeatedly in this way.
With the feature-reduction acoustic-model training method of the invention, as described by the above example, optimal model parameters can be trained under real conditions to achieve the best recognition results. The comparison data show that the new feature-reduction training method brings a large performance improvement in speech recognition rate over the traditional acoustic-model training method and reduces the word error rate of continuous speech recognition.
In a large number of trials, the feature reduction method is found to be more effective when the size of the training data set is smaller than the number of parameters in the deep-neural-network model.
Unless there is an obvious contradiction or a statement to the contrary in a given preferred embodiment, all the preferred embodiments described above may be used in any overlapping combination; the embodiments and the specific parameters in them only serve to clearly describe the inventors' verification process and are not intended to limit the scope of the invention, which remains defined by the claims; all equivalent structural changes made using the contents of the specification and drawings of the present invention are likewise included within the scope of the invention.
Claims (2)
1. A feature reduction method for continuous speech recognition, comprising the steps of:
step 1, preparing a training set, wherein the training set comprises a plurality of training samples and each training sample comprises speech and the corresponding text; calculating the per-dimension mean of the speech features of a single training sample in the training set with equation 1:

$$\bar{o}_i = \frac{1}{T}\sum_{t=1}^{T} o_i^t \qquad (1)$$

where $\bar{o}_i$ is the mean of feature dimension $i$ of the speech features, $\sum_{t=1}^{T} o_i^t$ is the summation of the frame features of the speech data, $T$ is the total number of feature frames in the speech data, and $o_i^t$ is the $i$-th dimension of the feature frame at time $t$;
step 2, using cepstral mean normalization to subtract the cepstral-coefficient mean from the features of a single speech sample, specifically as follows:

$$\hat{o}_i^t = o_i^t - \bar{o}_i \qquad (2)$$

where $o_i^t$ is the value of feature dimension $i$ of a feature frame; equation 2 subtracts the per-dimension mean of the utterance from the feature-dimension values of each feature frame to obtain the mean-normalized feature frames;

marking outlier data according to the feature-value mean and removing the outlier data from the training set;
step 3, after the data with outlying feature distributions have been eliminated, extracting the speech features of all training samples in the training set with the global-feature-mean affine-transformation scaling method;

the mean $\mu_i$ and standard deviation $\sigma_i$ of all training-sample speech data in dimension $i$ being calculated as follows:

$$\mu_i = \frac{1}{M}\sum_{m=1}^{M} x_i^m \qquad (3)$$

$$\sigma_i = \sqrt{\frac{1}{M}\sum_{m=1}^{M}\left(x_i^m - \mu_i\right)^2} \qquad (4)$$

where $x_i^m$ is the feature frame of dimension $i$ in training sample $m$, $m$ indexes the different training samples, and $M$ is the total number of training samples after the outlier data have been removed;

the training samples in step 3 having already been processed with the per-sample feature-normalized data, with the outlier data removed;
step 4, performing the feature-normalization calculation by combining equations 3 and 4, specifically as in equation 5:

$$\hat{x}_i^m = \frac{x_i^m - \mu_i}{\sigma_i} \qquad (5)$$

obtaining by equation 5 the normalized value $\hat{x}_i^m$ of every datum in the feature-reduced training data, and replacing the feature frame $x_i^m$ of each dimension $i$ in the training samples with the normalized value $\hat{x}_i^m$, to obtain the feature-reduced training set.
2. The feature reduction method for continuous speech recognition according to claim 1, wherein in step 2 the outlier data are marked and rejected as follows: a deviation threshold is set, and training samples whose deviation from the feature-frame mean exceeds the threshold are marked as outlier data and eliminated.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210243971.3A CN114566155B (en) | 2022-03-14 | 2022-03-14 | Feature reduction method for continuous speech recognition |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210243971.3A CN114566155B (en) | 2022-03-14 | 2022-03-14 | Feature reduction method for continuous speech recognition |
Publications (2)
Publication Number | Publication Date |
---|---|
CN114566155A CN114566155A (en) | 2022-05-31 |
CN114566155B true CN114566155B (en) | 2024-07-12 |
Family
ID=81720454
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210243971.3A Active CN114566155B (en) | 2022-03-14 | 2022-03-14 | Feature reduction method for continuous speech recognition |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114566155B (en) |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113470622A (en) * | 2021-09-06 | 2021-10-01 | 成都启英泰伦科技有限公司 | Conversion method and device capable of converting any voice into multiple voices |
CN113707135A (en) * | 2021-10-27 | 2021-11-26 | 成都启英泰伦科技有限公司 | Acoustic model training method for high-precision continuous speech recognition |
Family Cites Families (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP5473116B2 (en) * | 2009-08-18 | 2014-04-16 | Kddi株式会社 | Speech recognition apparatus and feature amount normalization method thereof |
KR101236539B1 (en) * | 2010-12-30 | 2013-02-25 | 부산대학교 산학협력단 | Apparatus and Method For Feature Compensation Using Weighted Auto-Regressive Moving Average Filter and Global Cepstral Mean and Variance Normalization |
CN102982799A (en) * | 2012-12-20 | 2013-03-20 | 中国科学院自动化研究所 | Speech recognition optimization decoding method integrating guide probability |
CN107633842B (en) * | 2017-06-12 | 2018-08-31 | 平安科技(深圳)有限公司 | Audio recognition method, device, computer equipment and storage medium |
CN108831443B (en) * | 2018-06-25 | 2020-07-21 | 华中师范大学 | Mobile recording equipment source identification method based on stacked self-coding network |
CN109801621B (en) * | 2019-03-15 | 2020-09-29 | 三峡大学 | Voice recognition method based on residual error gating cyclic unit |
- 2022-03-14: application CN202210243971.3A filed; granted as patent CN114566155B (status: active)
Also Published As
Publication number | Publication date |
---|---|
CN114566155A (en) | 2022-05-31 |
Legal Events
Date | Code | Title | Description
---|---|---|---
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| GR01 | Patent grant | |