CN109754789A - Method and device for recognizing speech phonemes - Google Patents
Method and device for recognizing speech phonemes
- Publication number
- CN109754789A CN201711082646.9A
- Authority
- CN
- China
- Prior art keywords
- phoneme
- voice
- identified
- model
- recognition model
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Landscapes
- Telephonic Communication Services (AREA)
- Machine Translation (AREA)
Abstract
The invention discloses a method and device for recognizing speech phonemes, relating to the technical field of speech recognition. Its main purpose is to solve the problems of inefficient phoneme segmentation and of falling into a locally optimal solution during speech recognition. The main technical solution comprises: inputting speech to be recognized into a phoneme recognition model and obtaining the expected result corresponding to the speech from the output, wherein the phoneme recognition model recognizes each phoneme in the speech through multiple types of neural network models and a hidden Markov model; training the model parameters of the phoneme recognition model according to the expected result until the rate of change of the phoneme model's output is less than a preset threshold; and determining the output whose rate of change is less than the preset threshold as the final phoneme recognition result corresponding to the speech.
Description
Technical field
The present invention relates to the technical field of speech recognition, and in particular to a method and device for recognizing speech phonemes.
Background art
In the field of speech recognition, the phoneme (phone) is the smallest unit of speech. To improve recognition accuracy, the discrimination of each phoneme must be improved first.
At present there are two mainstream approaches to training phoneme models: the Gaussian mixture model-hidden Markov model (GMM-HMM) and the deep neural network-hidden Markov model (DNN-HMM). Both use an HMM to fit the state changes of the frames corresponding to a phoneme and then use a GMM or a DNN to converge on the frames. At recognition time they decode with the Viterbi algorithm, segmenting the audio by time frame.
In the course of realizing the present invention, the inventors found that in the prior art, especially in phoneme models for recognizing Chinese pinyin, segmenting phonemes by time frame can be accurate to the millisecond level in order to improve segmentation correctness, which makes phoneme segmentation inefficient. In addition, because of the HMM's inherent unigram and bigram assumptions, the recognized phonemes may fall into a locally optimal solution when an HMM is used, which reduces the accuracy of phoneme recognition; under a trigram or four-gram assumption, the amount of computation needed to recognize phonemes is enormous.
Summary of the invention
In view of this, the present invention provides a method and device for recognizing speech phonemes, whose main purpose is to solve the problems of inefficient phoneme segmentation and of falling into a locally optimal solution during speech recognition.
To solve the above problems, the present invention mainly provides the following technical solutions:
In one aspect, the present invention provides a method for recognizing speech phonemes, the method comprising:
inputting speech to be recognized into a phoneme recognition model, and obtaining the expected result corresponding to the speech to be recognized from the output, wherein the phoneme recognition model recognizes each phoneme in the speech to be recognized through multiple types of neural network models and a hidden Markov model;
training the model parameters of the phoneme recognition model according to the expected result until the rate of change of the phoneme model's output is less than a preset threshold;
determining the output whose rate of change is less than the preset threshold as the final phoneme recognition result corresponding to the speech to be recognized.
Optionally, before the speech to be recognized is input into the phoneme recognition model, the method further comprises:
constructing the phoneme recognition model.
Optionally, constructing the phoneme recognition model comprises:
constructing a convolutional neural network (CNN) and a preset number of layers of long short-term memory network (LSTM);
adding a deep neural network (DNN) and a hidden Markov model (HMM);
building the phoneme recognition model from the CNN, the LSTM, the DNN and the HMM, and assigning initialization values to the phoneme recognition model, wherein the CNN serves as the input end for the speech to be recognized and the DNN serves as the output end for the speech to be recognized.
Optionally, obtaining the expected result corresponding to the speech to be recognized from the output comprises:
inputting the speech to be recognized into the CNN to perform noise reduction on the speech;
inputting the noise-reduced speech into the preset number of LSTM layers to fit the speech, wherein the LSTM filters out invalid phonemes by activating the forget gate and retains valid phonemes by activating the memory gate;
inputting the fitted speech into the DNN;
taking the phonemes output at each moment and their corresponding probabilities as the visible observation sequence, and recording them in the probability output matrix of the HMM;
computing a first matrix from the probability output matrix using the forward algorithm, and computing a second matrix from the probability output matrix using the backward algorithm;
computing the maximum likelihood corresponding to each phoneme from the first matrix and the second matrix, and recording it in a third matrix;
decoding the third matrix to obtain the maximum likelihood value of each phoneme, thereby obtaining the expected result.
Optionally, training the model parameters of the phoneme recognition model according to the expected result until the rate of change of the phoneme model's output is less than the preset threshold comprises:
starting from the output end of the phoneme model, performing the derivative operation of gradient descent on each phoneme in turn according to the expected result;
adjusting, according to the derivative operation, the neuron parameters for each phoneme in the phoneme model until the rate of change of the phoneme model's output is less than the preset threshold.
In a second aspect, the present invention provides a device for recognizing speech phonemes, comprising:
an input unit, configured to input speech to be recognized into a phoneme recognition model, wherein the phoneme recognition model recognizes each phoneme in the speech to be recognized through multiple types of neural network models and a hidden Markov model;
an output unit, configured to obtain the expected result corresponding to the speech to be recognized from the output after the input unit inputs the speech into the phoneme recognition model;
a training unit, configured to train the model parameters of the phoneme recognition model according to the expected result output by the output unit until the rate of change of the phoneme model's output is less than a preset threshold;
a determination unit, configured to determine the output whose rate of change is less than the preset threshold as the final phoneme recognition result corresponding to the speech to be recognized.
Optionally, the device further comprises:
a construction unit, configured to construct the phoneme recognition model before the input unit inputs the speech to be recognized into the phoneme recognition model.
Optionally, the construction unit comprises:
a first building module, configured to construct a convolutional neural network (CNN) and a preset number of layers of long short-term memory network (LSTM);
an adding module, configured to add a deep neural network (DNN) and a hidden Markov model (HMM);
a second building module, configured to build the phoneme recognition model from the CNN, the LSTM, the DNN and the HMM, wherein the CNN serves as the input end for the speech to be recognized and the DNN serves as the output end for the speech to be recognized;
an assignment module, configured to assign initialization values to the phoneme recognition model.
Optionally, the output unit comprises:
a noise reduction module, configured to input the speech to be recognized into the CNN to perform noise reduction on the speech;
a first input module, configured to input the speech noise-reduced by the noise reduction module into the preset number of LSTM layers, wherein the LSTM filters out invalid phonemes by activating the forget gate and retains valid phonemes by activating the memory gate;
a fitting module, configured to fit the speech to be recognized;
a second input module, configured to input the fitted speech into the DNN;
a recording module, configured to take the phonemes output by the DNN at each moment and their corresponding probabilities as the visible observation sequence and record them in the probability output matrix of the HMM;
a first computing module, configured to compute a first matrix from the probability output matrix using the forward algorithm;
a second computing module, configured to compute a second matrix from the probability output matrix using the backward algorithm;
a third computing module, configured to compute the maximum likelihood corresponding to each phoneme from the first matrix and the second matrix and record it in a third matrix;
a processing module, configured to decode the third matrix to obtain the maximum likelihood value of each phoneme, thereby obtaining the expected result.
Optionally, the training unit comprises:
a computing module, configured to perform, starting from the output end of the phoneme model, the derivative operation of gradient descent on each phoneme in turn according to the expected result;
an adjusting module, configured to adjust, according to the derivative operation, the neuron parameters for each phoneme in the phoneme model until the rate of change of the phoneme model's output is less than the preset threshold.
To achieve the above object, according to another aspect of the present invention, a storage medium is provided. The storage medium includes a stored program, wherein, when the program runs, the device on which the storage medium resides is controlled to execute the method for recognizing speech phonemes described above.
To achieve the above object, according to another aspect of the present invention, a processor is provided. The processor is configured to run a program, wherein the program, when running, executes the method for recognizing speech phonemes described above.
Through the above technical solutions, the technical solution provided by the present invention has at least the following advantages:
In the method and device for recognizing speech phonemes provided by the present invention, first, speech to be recognized is input into a phoneme recognition model and the expected result corresponding to the speech is obtained from the output, wherein the phoneme recognition model recognizes each phoneme in the speech through multiple types of neural network models and a hidden Markov model; second, the model parameters of the phoneme recognition model are trained according to the expected result until the rate of change of the phoneme model's output is less than a preset threshold; finally, the output whose rate of change is less than the preset threshold is determined as the final phoneme recognition result corresponding to the speech. Compared with the prior art, by using the phoneme recognition model the present invention compensates for the defects of the HMM's inherent unigram and bigram assumptions and avoids falling into a locally optimal solution. In addition, the hidden Markov model in the phoneme recognition model can decode the speech at the frame level, which reduces the scale of computation.
The above is only an overview of the technical solution of the present invention. So that the technical means of the present invention can be understood more clearly and implemented in accordance with the specification, and so that the above and other objects, features and advantages of the present invention are more readily apparent, specific embodiments of the present invention are set out below.
Brief description of the drawings
Various other advantages and benefits will become clear to those of ordinary skill in the art upon reading the following detailed description of the preferred embodiments. The drawings serve only to illustrate the preferred embodiments and are not to be considered limiting of the present invention. Throughout the drawings, the same reference numerals denote the same parts. In the drawings:
Fig. 1 shows a flow chart of a method for recognizing speech phonemes provided by an embodiment of the present invention;
Fig. 2 shows an architecture diagram of a phoneme recognition model provided by an embodiment of the present invention;
Fig. 3 shows a schematic diagram of the rate of change of the recognition output over time provided by an embodiment of the present invention;
Fig. 4 shows a flow chart of another method for recognizing speech phonemes provided by an embodiment of the present invention;
Fig. 5 shows a schematic diagram of a long short-term memory network (LSTM) fitting result provided by an embodiment of the present invention;
Fig. 6 shows a schematic diagram of a probability output matrix provided by an embodiment of the present invention;
Fig. 7 shows a schematic diagram of some of the neurons in a phoneme recognition model provided by an embodiment of the present invention;
Fig. 8 shows a block diagram of a device for recognizing speech phonemes provided by an embodiment of the present invention;
Fig. 9 shows a block diagram of another device for recognizing speech phonemes provided by an embodiment of the present invention.
Detailed description of the embodiments
Exemplary embodiments of the present disclosure are described in more detail below with reference to the accompanying drawings. Although the drawings show exemplary embodiments of the disclosure, it should be understood that the disclosure may be implemented in various forms and should not be limited by the embodiments set forth here. Rather, these embodiments are provided so that the disclosure can be understood more thoroughly and its scope fully conveyed to those skilled in the art.
An embodiment of the present invention provides a method for recognizing speech phonemes. As shown in Fig. 1, the method comprises:
101. Input the speech to be recognized into the phoneme recognition model, and obtain the expected result corresponding to the speech from the output.
The speech to be recognized in this embodiment can be any segment of speech (for example, a passage of spoken Chinese). The speech is input into the phoneme recognition model, which segments, recognizes and converts its phonemes in time order, converting the speech into Chinese pinyin; a language model then converts the pinyin into the corresponding Chinese text. This embodiment focuses on the acoustic part of speech recognition, that is, the process of recognizing speech as a phoneme sequence.
The phoneme recognition model recognizes each phoneme in the speech to be recognized through multiple types of neural network models and a hidden Markov model. The neural networks include a convolutional neural network (CNN) and a long short-term memory network (LSTM). In a practical application of this embodiment, the phoneme recognition model may include one CNN layer, five LSTM layers, one deep neural network (DNN) layer and one hidden Markov model (HMM) layer. However, this embodiment does not limit the specific number of LSTM layers, which can be as high as 170.
Fig. 2 shows an architecture diagram of a phoneme recognition model provided by an embodiment of the present invention. The CNN layer uses convolution to compensate for the effect that differences between speakers have on the audio. The five LSTM layers pass on the speech received from the CNN; they fit dynamic, time-based speech data very well and allow information from future moments to intervene in and correct the input at the current moment. The final DNN layer produces the output. The connections shown in Fig. 2 are merely illustrative; in practical applications the connections are lateral, and this embodiment does not limit them.
102. Train the model parameters of the phoneme recognition model according to the expected result until the rate of change of the phoneme model's output is less than a preset threshold.
Under normal circumstances, the expected result obtained in step 101 is not yet a correct phoneme recognition, so the model parameters of the phoneme recognition model must be trained repeatedly according to the expected result. Training essentially adjusts each model parameter of the phoneme recognition model in the reverse direction of the architecture in Fig. 2. The model parameters describe the connections between neurons in the model; in a specific application, they may be the neuron parameters between individual neurons.
In practical applications, phoneme recognition has no unique final result. However, as the phoneme recognition model is trained, the rate of change of the recognition output tends to fluctuate within a small range. Fig. 3 shows a schematic diagram of the rate of change of the recognition output over time provided by an embodiment of the present invention. The rate of change is highest at the expected result. Starting from there, the model parameters w are trained, and the change in the output becomes gradually smaller after each round of training. When the rate of change fluctuates only slightly, the training result is output. In terms of technical implementation, the training result is output when the rate of change of the output is less than the preset threshold. The preset threshold in this embodiment may be a specific value, such as 0.2, or an interval, such as 0.18-0.26; this embodiment does not limit it.
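The stopping rule described above can be sketched as follows. This is only a minimal illustration under stated assumptions, not the patent's implementation: `train_step` is a hypothetical callable, the model output is reduced to a single scalar, and the rate of change is assumed to be the absolute difference between consecutive outputs. The default threshold of 0.2 is the example value given in the text.

```python
from typing import Callable, Tuple


def train_until_stable(train_step: Callable[[], float],
                       threshold: float = 0.2,
                       max_iters: int = 1000) -> Tuple[float, int]:
    """Repeat training until the output's rate of change is below threshold.

    train_step is a hypothetical callable that runs one training pass and
    returns a scalar summary of the model output.
    Returns the last output and the number of comparisons performed.
    """
    previous = train_step()
    for iteration in range(1, max_iters + 1):
        current = train_step()
        # Assumed definition: change between two consecutive outputs.
        change_rate = abs(current - previous)
        if change_rate < threshold:
            return current, iteration
        previous = current
    # Give up after max_iters passes and return the latest output.
    return previous, max_iters
```

The `max_iters` guard is an addition for safety; the source only specifies looping until the threshold condition holds.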
103. Determine the output whose rate of change is less than the preset threshold as the final phoneme recognition result corresponding to the speech to be recognized.
It should be noted that the phoneme recognition model in this embodiment includes a hidden Markov model (HMM), so its output is the result of Viterbi decoding and can be passed directly to the language model for subsequent language-model recognition. In another implementation of this embodiment, if the phoneme recognition model does not include an HMM, the output of the DNN is still a speech sequence and must additionally be passed to an HMM for decoding. This embodiment does not limit the choice.
In the method for recognizing speech phonemes provided by the present invention, first, speech to be recognized is input into a phoneme recognition model and the expected result corresponding to the speech is obtained from the output, wherein the phoneme recognition model recognizes each phoneme in the speech through multiple types of neural network models and a hidden Markov model; second, the model parameters of the phoneme recognition model are trained according to the expected result until the rate of change of the phoneme model's output is less than a preset threshold; finally, the output whose rate of change is less than the preset threshold is determined as the final phoneme recognition result corresponding to the speech. Compared with the prior art, by using the phoneme recognition model the present invention compensates for the defects of the HMM's inherent unigram and bigram assumptions and avoids falling into a locally optimal solution. In addition, the hidden Markov model in the phoneme recognition model can decode the speech at the frame level, which reduces the scale of computation.
Further, as a refinement and extension of the above embodiment, an embodiment of the present invention provides another method for recognizing speech phonemes. As shown in Fig. 4, the method comprises:
201. Construct the phoneme recognition model.
For the constructed phoneme recognition model, refer again to Fig. 2. One CNN layer serves as the input end for the speech to be recognized. A preset number of LSTM layers is constructed, and a DNN is added as the output end for the speech to be recognized. An HMM is then added, the phoneme recognition model is built from the CNN, the LSTM, the DNN and the HMM, and initialization values are assigned to the model.
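As a rough sketch of the construction order in step 201, the layer stack can be written down as a small configuration object. The class name `PhonemeModelConfig`, its fields and the `stages` method are hypothetical and introduced here only for illustration; the default layer counts (one CNN, five LSTM, one DNN, one HMM) are the example values from the text, and no actual network is built.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PhonemeModelConfig:
    """Hypothetical description of the layer stack, not a trained model."""
    cnn_layers: int = 1    # input end
    lstm_layers: int = 5   # the "preset number of layers"
    dnn_layers: int = 1    # output end
    use_hmm: bool = True   # HMM stage used for decoding

    def stages(self) -> List[str]:
        """Return the stage names in the order the speech passes through them."""
        order = (["CNN"] * self.cnn_layers
                 + ["LSTM"] * self.lstm_layers
                 + ["DNN"] * self.dnn_layers)
        if self.use_hmm:
            order.append("HMM")
        return order
```

Calling `PhonemeModelConfig().stages()` lists the CNN first and the HMM last, which matches the input-end and output-end roles described above.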
202. Input the speech to be recognized into the phoneme recognition model, and obtain the expected result corresponding to the speech from the output, wherein the phoneme recognition model recognizes each phoneme in the speech to be recognized through multiple types of neural network models and a hidden Markov model (same as step 101).
The speech to be recognized is input into the CNN for noise reduction, which effectively addresses the problem of speech arriving over different channels. The noise-reduced speech is then input into the five LSTM layers, which fit the speech; the LSTM filters out invalid phonemes by activating the forget gate and retains valid phonemes by activating the memory gate. For example, the forget gate in an LSTM outputs, for each phoneme, a value between 0 and 1 that is applied to the corresponding number in the cell state: 1 means "keep completely" and 0 means "discard completely". The cell state may, for instance, hold the gender of the current subject so that the correct pronoun can be chosen; when a new subject appears, the old subject should be forgotten, so it is filtered out.
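The effect of the forget gate described above can be sketched in a few lines. This is a simplified illustration, assuming the gate value is a sigmoid of some pre-activation score and is multiplied element-wise into the cell state, as in a standard LSTM; the function names and the input values are hypothetical, and the other LSTM gates are omitted.

```python
import math
from typing import List


def sigmoid(x: float) -> float:
    """Squash a real number into the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def apply_forget_gate(cell_state: List[float],
                      gate_scores: List[float]) -> List[float]:
    """Scale each cell-state entry by its forget-gate value.

    A gate value near 1 keeps the entry, a value near 0 discards it.
    gate_scores are hypothetical pre-activation values.
    """
    gates = [sigmoid(score) for score in gate_scores]
    return [gate * value for gate, value in zip(gates, cell_state)]
```

With a large positive score the entry passes through almost unchanged, and with a large negative score it is driven close to zero, which corresponds to "keep completely" and "discard completely" in the text.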
In this embodiment, LSTM fitting essentially determines, for a segment of the speech sequence to be recognized, the probability that it belongs to each phoneme at a given moment. For example, if the speech to be recognized is the Chinese for "hello", the corresponding pinyin is "n i h a o". Fig. 5 shows the correspondence between the 26 letters a, b, c, d, e ... z and the moments t. At time t0 the LSTM receives the speech sequence to be recognized, "n i h a o", and determines in turn the probability that it belongs to each of the letters a, b, c, d ... z. In practical applications, a bidirectional LSTM (forward and backward) can be used so that information from future moments intervenes in and corrects the input at the current moment.
The fitted speech is input into the DNN. The DNN is a classification algorithm: its output layer has a number of neurons, each corresponding to one class. The DNN classification model requires that each output-layer neuron's value lie between 0 and 1 and that all output values sum to 1; that is, within one row or one column the output probabilities of the phonemes sum to 1, regardless of how many phonemes are actually output. In practical applications, the DNN produces its phoneme output through an activation function, which includes but is not limited to the sigmoid, softmax, ReLU and tanh activation functions; no specific limitation is imposed.
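Of the activation functions listed, softmax is the one that by construction yields values between 0 and 1 that sum to 1. The sketch below shows that property on a hypothetical vector of output-layer scores; it is an illustration of the normalization requirement, not the patent's output layer.

```python
import math
from typing import List


def softmax(scores: List[float]) -> List[float]:
    """Convert raw output-layer scores into probabilities that sum to 1."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(score - peak) for score in scores]
    total = sum(exps)
    return [value / total for value in exps]
```

Each entry of the result can be read as the probability of one phoneme class at a given moment, so one call produces one row (or column) of the kind of probability matrix the text describes.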
The phonemes output at each moment and their corresponding probabilities are taken as the visible observation sequence and recorded in the probability output matrix of the HMM. Fig. 6 shows a schematic diagram of such a probability output matrix provided by an embodiment of the present invention. The probabilities P recorded there may differ from the probabilities P recorded in Fig. 5, but the probabilities of the phonemes in one row or one column sum to 1.
A first matrix is computed from the probability output matrix using the forward algorithm, and a second matrix is computed from the probability output matrix using the backward algorithm. The maximum likelihood corresponding to each phoneme is then computed from the first matrix and the second matrix and recorded in a third matrix. For the specific implementation of the forward and backward algorithms and the method of computing the maximum likelihood, refer to the detailed descriptions in the prior art; this embodiment does not repeat them.
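Since the text defers to the prior art here, the following is a sketch of the textbook forward-backward computation, not the patent's own procedure. It assumes that the probability output matrix plays the role of per-frame state probabilities (`emis`), and that a transition matrix (`trans`) and an initial distribution (`init`) are available; those two inputs, the function name and the normalized per-frame posterior used for the third matrix are all assumptions.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def forward_backward(emis: Matrix, trans: Matrix,
                     init: List[float]) -> Tuple[Matrix, Matrix, Matrix]:
    """Return (alpha, beta, gamma) for a discrete-state HMM.

    emis[t][s]: probability of state s at frame t (probability output matrix)
    trans[i][j]: assumed transition probability from state i to state j
    init[s]: assumed initial probability of state s
    alpha is the "first matrix", beta the "second matrix", and gamma a
    per-frame normalized combination standing in for the "third matrix".
    """
    n_frames, n_states = len(emis), len(init)
    alpha = [[0.0] * n_states for _ in range(n_frames)]
    beta = [[1.0] * n_states for _ in range(n_frames)]
    # Forward pass.
    for s in range(n_states):
        alpha[0][s] = init[s] * emis[0][s]
    for t in range(1, n_frames):
        for j in range(n_states):
            incoming = sum(alpha[t - 1][i] * trans[i][j] for i in range(n_states))
            alpha[t][j] = emis[t][j] * incoming
    # Backward pass.
    for t in range(n_frames - 2, -1, -1):
        for i in range(n_states):
            beta[t][i] = sum(trans[i][j] * emis[t + 1][j] * beta[t + 1][j]
                             for j in range(n_states))
    # Combine both passes and normalize each frame.
    gamma = []
    for t in range(n_frames):
        row = [alpha[t][s] * beta[t][s] for s in range(n_states)]
        total = sum(row)
        gamma.append([value / total for value in row])
    return alpha, beta, gamma
```

The sketch works with raw probabilities for readability; for long utterances a practical version would use scaling or log probabilities to avoid underflow.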
The third matrix is decoded to obtain the maximum likelihood value of each phoneme, thereby obtaining the expected result. This embodiment uses the Viterbi algorithm to decode the phonemes.
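A minimal sketch of Viterbi decoding is given below. It is the standard textbook form operating on per-frame state probabilities, a transition matrix and an initial distribution; how the patent derives these from its third matrix is not specified, so the inputs `emis`, `trans` and `init` and the function name are assumptions made for illustration.

```python
from typing import List

Matrix = List[List[float]]


def viterbi(emis: Matrix, trans: Matrix, init: List[float]) -> List[int]:
    """Return the most likely state index for each frame.

    emis[t][s]: probability of state s at frame t
    trans[i][j]: assumed transition probability from state i to state j
    init[s]: assumed initial probability of state s
    """
    n_frames, n_states = len(emis), len(init)
    score = [[0.0] * n_states for _ in range(n_frames)]
    back = [[0] * n_states for _ in range(n_frames)]
    for s in range(n_states):
        score[0][s] = init[s] * emis[0][s]
    for t in range(1, n_frames):
        for j in range(n_states):
            # Best predecessor of state j at frame t.
            best = max(range(n_states), key=lambda i: score[t - 1][i] * trans[i][j])
            back[t][j] = best
            score[t][j] = score[t - 1][best] * trans[best][j] * emis[t][j]
    # Trace the best path backwards from the best final state.
    state = max(range(n_states), key=lambda s: score[n_frames - 1][s])
    path = [state]
    for t in range(n_frames - 1, 0, -1):
        state = back[t][state]
        path.append(state)
    path.reverse()
    return path
```

In a phoneme setting each state index would map to a phoneme label, so the returned path is a frame-level phoneme sequence.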
It should be noted that step 202 is the detailed process of obtaining the expected result. Step 203 uses the same calculation when it is executed; the difference is that the parameters of each neuron (the model parameters) differ on each calculation.
203. Train the model parameters of the phoneme recognition model according to the expected result until the rate of change of the phoneme model's output is less than a preset threshold (same as step 102).
During training, the derivative operation of gradient descent is performed on each phoneme in turn according to the expected result, starting from the output end of the phoneme model. The neuron parameters for each phoneme in the phoneme model are adjusted according to the derivative operation until the rate of change of the phoneme model's output is less than the preset threshold. Fig. 7 shows a schematic diagram of some of the neurons in a phoneme recognition model provided by an embodiment of the present invention. In this model the neurons are fully connected; Fig. 7 illustrates this using one neuron as an example, where w1, w2, w3 and w4 are neuron parameters corresponding to different probability magnitudes. The derivative operation of gradient descent is performed layer by layer from the output end to the input end: differentiating the neurons nearest the output end yields a set of values, and the values for the adjacent next column change as those derivative results change. Each adjustment is based on the result of differentiating each neuron.
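The parameter adjustment for a single neuron such as the one in Fig. 7 can be sketched as one gradient-descent step. The sigmoid activation, the squared-error loss, the learning rate and the function names are assumptions chosen for illustration; the source only states that derivatives are taken from the output end and that the parameters w1 to w4 are adjusted accordingly.

```python
import math
from typing import List


def neuron_output(weights: List[float], inputs: List[float]) -> float:
    """Sigmoid of the weighted sum of the inputs."""
    z = sum(w * x for w, x in zip(weights, inputs))
    return 1.0 / (1.0 + math.exp(-z))


def gradient_step(weights: List[float], inputs: List[float],
                  target: float, lr: float = 0.5) -> List[float]:
    """One gradient-descent update of the neuron parameters.

    Assumes the loss L = 0.5 * (y - target)**2 with a sigmoid activation,
    so dL/dw_k = (y - target) * y * (1 - y) * x_k.
    """
    y = neuron_output(weights, inputs)
    delta = (y - target) * y * (1.0 - y)
    return [w - lr * delta * x for w, x in zip(weights, inputs)]
```

In a full network the same `delta` term would be propagated to the preceding layer, which is the layer-by-layer pass from output end to input end that the text describes.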
204. Determine the output whose rate of change is less than the preset threshold as the final phoneme recognition result corresponding to the speech to be recognized (same as step 103).
If the rate of change of the phoneme model's output is greater than or equal to the preset threshold, step 203 is executed again in a loop.
Further, as an implementation of the method shown in Fig. 1, another embodiment of the present invention provides a device for recognizing speech phonemes. This device embodiment corresponds to the foregoing method embodiment. For ease of reading, the details of the method embodiment are not repeated one by one here, but it should be understood that the device in this embodiment can implement everything in the foregoing method embodiment.
An embodiment of the present invention provides a device for recognizing speech phonemes which, as shown in Figure 8, comprises:
an input unit 31, for inputting speech to be recognized into a phoneme recognition model, wherein the phoneme recognition model recognizes each phoneme in the speech to be recognized by means of multiple types of neural network models and a hidden Markov model;
an output unit 32, for obtaining, after the input unit 31 has input the speech to be recognized into the phoneme recognition model, the expected result corresponding to the speech to be recognized according to the output result;
a training unit 33, for training the model parameters of the phoneme recognition model according to the expected result output by the output unit 32, until the change rate of the output result of the phoneme recognition model is less than a preset threshold;
a determination unit 34, for determining the output result whose change rate is less than the preset threshold as the final phoneme recognition result corresponding to the speech to be recognized.
Further, as shown in Figure 9, the device also comprises:
a construction unit 35, for constructing the phoneme recognition model before the input unit 31 inputs the speech to be recognized into the phoneme recognition model.
Further, as shown in Figure 9, the construction unit 35 comprises:
a first construction module 351, for constructing a convolutional neural network CNN and a long short-term memory network LSTM with a preset number of layers;
an adding module 352, for adding a deep neural network DNN and a hidden Markov model HMM;
a second construction module 353, for constructing the phoneme recognition model from the convolutional neural network CNN, the long short-term memory network LSTM, the deep neural network DNN, and the hidden Markov model HMM, wherein the convolutional neural network CNN serves as the input end for the speech to be recognized and the deep neural network DNN serves as the output end for the speech to be recognized;
an assignment module 354, for assigning initialization values to the phoneme recognition model.
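The way the construction unit arranges the components can be sketched as follows. This is an illustrative skeleton only: `PhonemeRecognitionModel`, `build_model`, and the factory arguments are hypothetical names, and each stage is represented by a plain callable rather than a real network.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# A stage maps a list of feature frames to a new list of frames.
Stage = Callable[[list], list]


@dataclass
class PhonemeRecognitionModel:
    cnn: Stage                  # input end for the speech to be recognized
    lstm_layers: List[Stage]    # preset number of LSTM layers
    dnn: Stage                  # output end for the speech to be recognized
    hmm_params: Dict[str, list] = field(default_factory=dict)

    def acoustic_forward(self, frames: list) -> list:
        """Pass frames through the CNN, then every LSTM layer, then the DNN."""
        out = self.cnn(frames)
        for layer in self.lstm_layers:
            out = layer(out)
        return self.dnn(out)


def build_model(num_lstm_layers, make_cnn, make_lstm, make_dnn, init_hmm):
    """Assemble the stages and assign initial values (all factories are hypothetical)."""
    return PhonemeRecognitionModel(
        cnn=make_cnn(),
        lstm_layers=[make_lstm() for _ in range(num_lstm_layers)],
        dnn=make_dnn(),
        hmm_params=init_hmm(),
    )
```

Keeping the number of LSTM layers as a parameter mirrors the "preset number of layers" wording; the HMM parameters are stored separately because the HMM consumes the DNN output rather than sitting in the feed-forward chain.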
Further, as shown in Figure 9, the output unit 32 comprises:
a noise reduction module 321, for inputting the speech to be recognized into the convolutional neural network CNN and performing noise reduction on the speech to be recognized;
a first input module 322, for inputting the speech to be recognized, after noise reduction by the noise reduction module, into the long short-term memory network LSTM with the preset number of layers, wherein the LSTM filters out invalid phonemes by activating the forget gate and retains valid phonemes by activating the memory gate;
a fitting module 323, for fitting the speech to be recognized;
a second input module 324, for inputting the fitted speech to be recognized into the deep neural network DNN;
a recording module 325, for taking the phonemes at each time step and their corresponding probabilities, as output by the deep neural network DNN, as the visible observation sequence and recording them in the probability output matrix of the hidden Markov model HMM;
a first calculation module 326, for calculating a first matrix according to the probability output matrix and the forward algorithm;
a second calculation module 327, for calculating a second matrix according to the probability output matrix and the backward algorithm;
a third calculation module 328, for calculating the maximum likelihood corresponding to each phoneme according to the first matrix and the second matrix, and recording it in a third matrix;
a processing module 329, for decoding the third matrix to obtain the maximum likelihood value of each phoneme, thereby obtaining the expected result.
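The three matrices computed by modules 326 to 329 can be sketched with the standard forward-backward recursion. Several points here are assumptions rather than statements from the patent: the third matrix is interpreted as the normalized product of the forward and backward values, the DNN probabilities are used directly as emission scores, no scaling is applied (so the sketch suits short sequences only), and the names `forward_backward` and `decode` are hypothetical.

```python
def forward_backward(start, trans, emit):
    """Compute forward, backward, and per-frame posterior matrices for an HMM.

    start[i]    : initial probability of state (phoneme) i
    trans[i][j] : probability of moving from state i to state j
    emit[t][i]  : probability output matrix, score of state i at time t
    """
    T, N = len(emit), len(start)
    # First matrix: forward probabilities.
    alpha = [[0.0] * N for _ in range(T)]
    for i in range(N):
        alpha[0][i] = start[i] * emit[0][i]
    for t in range(1, T):
        for j in range(N):
            alpha[t][j] = emit[t][j] * sum(alpha[t - 1][i] * trans[i][j] for i in range(N))
    # Second matrix: backward probabilities.
    beta = [[1.0] * N for _ in range(T)]
    for t in range(T - 2, -1, -1):
        for i in range(N):
            beta[t][i] = sum(trans[i][j] * emit[t + 1][j] * beta[t + 1][j] for j in range(N))
    # Third matrix: normalized product of the first two, one row per frame.
    gamma = []
    for t in range(T):
        row = [alpha[t][i] * beta[t][i] for i in range(N)]
        total = sum(row)
        gamma.append([v / total for v in row])
    return alpha, beta, gamma


def decode(gamma, phonemes):
    """Pick the most likely phoneme in each frame of the third matrix."""
    return [phonemes[max(range(len(row)), key=row.__getitem__)] for row in gamma]
```

Per-frame decoding is the simplest reading of "decoding the third matrix"; a Viterbi pass would be an alternative if the best whole sequence, rather than the best phoneme per frame, were wanted.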
Further, as shown in Figure 9, the training unit 33 comprises:
a calculation module 331, for executing, according to the expected result and starting from the output end of the phoneme recognition model, the derivative operation of gradient descent on each phoneme in turn;
an adjustment module 332, for adjusting, according to the derivative operation, the neuron parameters for each phoneme in the phoneme recognition model, until the change rate of the output result of the phoneme recognition model is less than the preset threshold.
The device for recognizing speech phonemes includes a processor and a memory. The input unit, output unit, training unit, determination unit, and so on described above are stored in the memory as program units, and the processor executes these program units stored in the memory to implement the corresponding functions.
The processor contains a kernel, and the kernel retrieves the corresponding program unit from the memory. One or more kernels may be provided. By adjusting kernel parameters, the problems of inefficient phoneme segmentation, or of locally optimal solutions, during speech recognition are addressed.
The memory may include volatile memory in a computer-readable medium, random access memory (RAM), and/or non-volatile memory such as read-only memory (ROM) or flash memory (flash RAM). The memory includes at least one memory chip.
An embodiment of the present invention provides a storage medium on which a program is stored; when the program is executed by a processor, the recognition of speech phonemes is implemented.
An embodiment of the present invention provides a processor for running a program, wherein the recognition of speech phonemes is performed when the program runs.
An embodiment of the present invention provides a device that includes a processor, a memory, and a program stored in the memory and runnable on the processor. When executing the program, the processor performs the following steps:
inputting speech to be recognized into a phoneme recognition model, and obtaining the expected result corresponding to the speech to be recognized according to the output result, wherein the phoneme recognition model recognizes each phoneme in the speech to be recognized by means of multiple types of neural network models and a hidden Markov model;
training the model parameters of the phoneme recognition model according to the expected result, until the change rate of the output result of the phoneme recognition model is less than a preset threshold;
determining the output result whose change rate is less than the preset threshold as the final phoneme recognition result corresponding to the speech to be recognized.
Optionally, before the speech to be recognized is input into the phoneme recognition model, the method further includes:
constructing the phoneme recognition model.
Optionally, constructing the phoneme recognition model includes:
constructing a convolutional neural network CNN and a long short-term memory network LSTM with a preset number of layers;
adding a deep neural network DNN and a hidden Markov model HMM;
constructing the phoneme recognition model from the convolutional neural network CNN, the long short-term memory network LSTM, the deep neural network DNN, and the hidden Markov model HMM, and assigning initialization values to the phoneme recognition model, wherein the convolutional neural network CNN serves as the input end for the speech to be recognized and the deep neural network DNN serves as the output end for the speech to be recognized.
Optionally, obtaining the expected result corresponding to the speech to be recognized according to the output result includes:
inputting the speech to be recognized into the convolutional neural network CNN and performing noise reduction on the speech to be recognized;
inputting the noise-reduced speech to be recognized into the long short-term memory network LSTM with the preset number of layers and fitting the speech to be recognized, wherein the LSTM filters out invalid phonemes by activating the forget gate and retains valid phonemes by activating the memory gate;
inputting the fitted speech to be recognized into the deep neural network DNN;
taking the phonemes at each time step of the output and their corresponding probabilities as the visible observation sequence, and recording them in the probability output matrix of the hidden Markov model HMM;
calculating a first matrix according to the probability output matrix and the forward algorithm, and calculating a second matrix according to the probability output matrix and the backward algorithm;
calculating the maximum likelihood corresponding to each phoneme according to the first matrix and the second matrix, and recording it in a third matrix;
decoding the third matrix to obtain the maximum likelihood value of each phoneme, thereby obtaining the expected result.
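The gating behavior mentioned in the LSTM step above can be illustrated with a single-unit LSTM cell. This is a sketch under stated assumptions: the text's "memory gate" is taken to be the conventional LSTM input gate, the cell uses the standard LSTM equations, and `lstm_step` and the weight layout are hypothetical.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def lstm_step(x, h_prev, c_prev, w):
    """One time step of a single-unit LSTM cell.

    `w` maps each gate name to a tuple (input weight, recurrent weight, bias).
    Returns the new hidden state and the new cell state.
    """
    def gate(name, activation):
        wx, wh, b = w[name]
        return activation(wx * x + wh * h_prev + b)

    f = gate("forget", sigmoid)       # near 0: filter out what the cell held before
    i = gate("input", sigmoid)        # near 1: retain the new candidate ("memory gate")
    g = gate("candidate", math.tanh)  # candidate content computed from the current frame
    o = gate("output", sigmoid)
    c = f * c_prev + i * g            # old content scaled by f, new content scaled by i
    h = o * math.tanh(c)
    return h, c
```

With a strongly negative forget bias the previous cell state is discarded and only the current candidate survives; with a strongly positive forget bias and a closed input gate the previous state passes through unchanged.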
Optionally, training the model parameters of the phoneme recognition model according to the expected result, until the change rate of the output result of the phoneme recognition model is less than the preset threshold, includes:
executing, according to the expected result and starting from the output end of the phoneme recognition model, the derivative operation of gradient descent on each phoneme in turn;
adjusting, according to the derivative operation, the neuron parameters for each phoneme in the phoneme recognition model, until the change rate of the output result of the phoneme recognition model is less than the preset threshold.
The device herein may be a server, a PC, a tablet (PAD), a mobile phone, or the like.
The present application also provides a computer program product which, when executed on a data processing device, is adapted to execute program code that initializes the following method steps: inputting speech to be recognized into a phoneme recognition model, and obtaining the expected result corresponding to the speech to be recognized according to the output result, wherein the phoneme recognition model recognizes each phoneme in the speech to be recognized by means of multiple types of neural network models and a hidden Markov model;
training the model parameters of the phoneme recognition model according to the expected result, until the change rate of the output result of the phoneme recognition model is less than a preset threshold;
determining the output result whose change rate is less than the preset threshold as the final phoneme recognition result corresponding to the speech to be recognized.
Those skilled in the art should understand that embodiments of the present application may be provided as a method, a system, or a computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Moreover, the present application may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to magnetic disk storage, CD-ROM, and optical storage) containing computer-usable program code.
The present application is described with reference to flowcharts and/or block diagrams of the method, device (system), and computer program product according to embodiments of the present application. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to the processor of a general-purpose computer, special-purpose computer, embedded processor, or other programmable data processing device to produce a machine, so that the instructions executed by the processor of the computer or other programmable data processing device produce means for implementing the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.
These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or other programmable data processing device to operate in a particular manner, so that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means, the instruction means implementing the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.
These computer program instructions may also be loaded onto a computer or other programmable data processing device, so that a series of operational steps are performed on the computer or other programmable device to produce computer-implemented processing, whereby the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.
In a typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
The memory may include volatile memory in a computer-readable medium, random access memory (RAM), and/or non-volatile memory such as read-only memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.
Computer-readable media include permanent and non-permanent, removable and non-removable media, and may store information by any method or technology. The information may be computer-readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technologies, compact disc read-only memory (CD-ROM), digital versatile disc (DVD) or other optical storage, magnetic cassettes, magnetic tape or disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory computer-readable media (transitory media), such as modulated data signals and carrier waves.
It should also be noted that the terms "comprise", "include", or any other variant thereof are intended to cover non-exclusive inclusion, so that a process, method, commodity, or device that includes a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, method, commodity, or device. Without further limitation, an element defined by the phrase "including a ..." does not exclude the existence of other identical elements in the process, method, commodity, or device that includes the element.
The above are merely embodiments of the present application and are not intended to limit the present application. For those skilled in the art, various modifications and variations of the present application are possible. Any modification, equivalent replacement, improvement, and the like made within the spirit and principles of the present application shall fall within the scope of the claims of the present application.
Claims (10)
1. A method for recognizing speech phonemes, characterized by comprising:
inputting speech to be recognized into a phoneme recognition model, and obtaining the expected result corresponding to the speech to be recognized according to the output result, wherein the phoneme recognition model recognizes each phoneme in the speech to be recognized by means of multiple types of neural network models and a hidden Markov model;
training the model parameters of the phoneme recognition model according to the expected result, until the change rate of the output result of the phoneme recognition model is less than a preset threshold;
determining the output result whose change rate is less than the preset threshold as the final phoneme recognition result corresponding to the speech to be recognized.
2. The method according to claim 1, characterized in that, before the speech to be recognized is input into the phoneme recognition model, the method further comprises:
constructing the phoneme recognition model.
3. The method according to claim 2, characterized in that constructing the phoneme recognition model comprises:
constructing a convolutional neural network CNN and a long short-term memory network LSTM with a preset number of layers;
adding a deep neural network DNN and a hidden Markov model HMM;
constructing the phoneme recognition model from the convolutional neural network CNN, the long short-term memory network LSTM, the deep neural network DNN, and the hidden Markov model HMM, and assigning initialization values to the phoneme recognition model, wherein the convolutional neural network CNN serves as the input end for the speech to be recognized and the deep neural network DNN serves as the output end for the speech to be recognized.
4. The method according to claim 3, characterized in that obtaining the expected result corresponding to the speech to be recognized according to the output result comprises:
inputting the speech to be recognized into the convolutional neural network CNN and performing noise reduction on the speech to be recognized;
inputting the noise-reduced speech to be recognized into the long short-term memory network LSTM with the preset number of layers and fitting the speech to be recognized, wherein the LSTM filters out invalid phonemes by activating the forget gate and retains valid phonemes by activating the memory gate;
inputting the fitted speech to be recognized into the deep neural network DNN;
taking the phonemes at each time step of the output and their corresponding probabilities as the visible observation sequence, and recording them in the probability output matrix of the hidden Markov model HMM;
calculating a first matrix according to the probability output matrix and the forward algorithm, and calculating a second matrix according to the probability output matrix and the backward algorithm;
calculating the maximum likelihood corresponding to each phoneme according to the first matrix and the second matrix, and recording it in a third matrix;
decoding the third matrix to obtain the maximum likelihood value of each phoneme, thereby obtaining the expected result.
5. The method according to claim 4, characterized in that training the model parameters of the phoneme recognition model according to the expected result, until the change rate of the output result of the phoneme recognition model is less than the preset threshold, comprises:
executing, according to the expected result and starting from the output end of the phoneme recognition model, the derivative operation of gradient descent on each phoneme in turn;
adjusting, according to the derivative operation, the neuron parameters for each phoneme in the phoneme recognition model, until the change rate of the output result of the phoneme recognition model is less than the preset threshold.
6. A device for recognizing speech phonemes, characterized by comprising:
an input unit, for inputting speech to be recognized into a phoneme recognition model, wherein the phoneme recognition model recognizes each phoneme in the speech to be recognized by means of multiple types of neural network models and a hidden Markov model;
an output unit, for obtaining, after the input unit has input the speech to be recognized into the phoneme recognition model, the expected result corresponding to the speech to be recognized according to the output result;
a training unit, for training the model parameters of the phoneme recognition model according to the expected result output by the output unit, until the change rate of the output result of the phoneme recognition model is less than a preset threshold;
a determination unit, for determining the output result whose change rate is less than the preset threshold as the final phoneme recognition result corresponding to the speech to be recognized.
7. The device according to claim 6, characterized in that the device further comprises:
a construction unit, for constructing the phoneme recognition model before the input unit inputs the speech to be recognized into the phoneme recognition model.
8. The device according to claim 7, characterized in that the construction unit comprises:
a first construction module, for constructing a convolutional neural network CNN and a long short-term memory network LSTM with a preset number of layers;
an adding module, for adding a deep neural network DNN and a hidden Markov model HMM;
a second construction module, for constructing the phoneme recognition model from the convolutional neural network CNN, the long short-term memory network LSTM, the deep neural network DNN, and the hidden Markov model HMM, wherein the convolutional neural network CNN serves as the input end for the speech to be recognized and the deep neural network DNN serves as the output end for the speech to be recognized;
an assignment module, for assigning initialization values to the phoneme recognition model.
9. A storage medium, characterized in that the storage medium includes a stored program, wherein, when the program runs, the device on which the storage medium resides is controlled to perform the method for recognizing speech phonemes according to any one of claims 1 to 5.
10. A processor, characterized in that the processor is used to run a program, wherein, when the program runs, the method for recognizing speech phonemes according to any one of claims 1 to 5 is performed.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201711082646.9A CN109754789B (en) | 2017-11-07 | 2017-11-07 | Method and device for recognizing voice phonemes |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201711082646.9A CN109754789B (en) | 2017-11-07 | 2017-11-07 | Method and device for recognizing voice phonemes |
Publications (2)
Publication Number | Publication Date |
---|---|
CN109754789A true CN109754789A (en) | 2019-05-14 |
CN109754789B CN109754789B (en) | 2021-06-08 |
Family
ID=66400936
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201711082646.9A Active CN109754789B (en) | 2017-11-07 | 2017-11-07 | Method and device for recognizing voice phonemes |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109754789B (en) |
Also Published As
Publication number | Publication date |
---|---|
CN109754789B (en) | 2021-06-08 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
CB02 | Change of applicant information | Address after: 100083 No. 401, 4th Floor, Haitai Building, 229 North Fourth Ring Road, Haidian District, Beijing. Applicant after: Beijing Guoshuang Technology Co.,Ltd. Address before: 100086 8th Floor A, Cuigongfandian, No. 76 Zhichun Road, Shuangyushu Area, Haidian District, Beijing. Applicant before: Beijing Guoshuang Technology Co.,Ltd. |
GR01 | Patent grant | ||