CN114974228A - Rapid voice recognition method based on hierarchical recognition - Google Patents

Rapid voice recognition method based on hierarchical recognition

Info

Publication number
CN114974228A
CN114974228A
Authority
CN
China
Prior art keywords
shallow
network
speech recognition
networks
level
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202210571189.4A
Other languages
Chinese (zh)
Other versions
CN114974228B (en)
Inventor
吕志强
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Mingri Dream Beijing Technology Co ltd
Original Assignee
Mingri Dream Beijing Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Mingri Dream Beijing Technology Co ltd filed Critical Mingri Dream Beijing Technology Co ltd
Priority to CN202210571189.4A priority Critical patent/CN114974228B/en
Publication of CN114974228A publication Critical patent/CN114974228A/en
Application granted granted Critical
Publication of CN114974228B publication Critical patent/CN114974228B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/08 Speech classification or search
    • G10L15/16 Speech classification or search using artificial neural networks
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Telephonic Communication Services (AREA)

Abstract

A rapid speech recognition method based on hierarchical recognition routes speech of varying difficulty to different branches and decomposes the model level by level, so that models at different levels handle speech cases of matching difficulty. Through this hierarchical inference scheme, the invention addresses the heavy computing resources required by large-model inference, greatly reduces the overall inference complexity, saves computing resources, and reduces service latency.

Description

Rapid voice recognition method based on hierarchical recognition
Technical Field
The invention relates to the field of speech recognition, and in particular to a rapid speech recognition method based on hierarchical recognition.
Background
With the continuous growth of computing power and the accumulation of data, the accuracy of speech recognition systems has improved markedly; end-to-end modeling methods, represented by CTC and encoder-decoder architectures, make fuller use of massive data and offer stronger modeling capability. In the field of speech recognition, the convolution-augmented Conformer model proposed by Google in 2020 has repeatedly set new accuracy records and has become a standard choice for acoustic modeling. With massive training data, a multi-layer Conformer model has more parameters and has been shown to have stronger modeling capability; in general, a 12- to 24-layer Conformer model grows stronger as the number of layers increases, given sufficient training data. However, as the parameter count grows, model inference during speech recognition requires more computation, energy, latency, and resources, which limits the application of large models in practical scenarios. To make deep networks of this kind usable for speech recognition tasks, methods such as matrix decomposition are commonly used to reduce the number of hidden-layer neurons, the parameter count, and the computation, but these methods usually incur some performance loss. Moreover, the computational complexity of model inference during speech recognition still grows linearly with the number of Conformer layers.
Disclosure of Invention
The present invention is directed to a rapid speech recognition method based on hierarchical recognition, so as to solve the aforementioned problems in the prior art.
To achieve this purpose, the invention adopts the following technical scheme:
A rapid speech recognition method based on hierarchical recognition comprises the following steps:
S1, divide the deep network of the Conformer model: following the order from the bottom layer to the top layer, split the R-layer deep network into a shallow network every M layers, and lead a tap out of the last recognition layer of each shallow network to be decoded by a shallow Decoder, forming F Conformer models with shallow networks; wherein R and M denote numbers of network layers, F denotes the number of Conformer models with shallow networks, and F = R/M;
S2, divide the formed shallow networks into levels and order them from the bottom layer to the top layer of the deep network, forming a speech recognition model with F shallow networks; recognize input speech level by level according to the level of the shallow network in the speech recognition model, and judge the difficulty of the input speech;
S3, judge the difficulty of the input speech from its entropy after passing through a shallow network, and decide whether the input speech needs to be computed and recognized by the next-level shallow network; the smaller the entropy of the shallow network's output, the more certain the speech recognition result and the smaller its ambiguity; conversely, the larger the entropy, the more uncertain the result and the greater its ambiguity, requiring a network with stronger modeling capability for recognition.
A rapid speech recognition method based on hierarchical recognition comprises the following steps:
S1, divide the deep network of the Conformer model: following the order from the bottom layer to the top layer, split the R-layer deep network into a shallow network every M layers, and lead a tap out of the last recognition layer of each shallow network to be decoded by a shallow Decoder, forming F Conformer models with shallow networks; wherein R and M denote numbers of network layers, F denotes the number of Conformer models with shallow networks, and F = R/M;
S2, divide the formed shallow networks into levels and order them from the bottom layer to the top layer of the deep network, forming a speech recognition model with F shallow networks; recognize input speech level by level according to the level of the shallow network in the speech recognition model, and judge the difficulty of the input speech;
S3, select two adjacent shallow networks in order of level from low to high, and judge the consistency of the speech recognition results output by the two shallow networks; when the results of two adjacent shallow networks pass the consistency judgment, acoustic modeling is considered complete; otherwise, a network with stronger modeling capability is required for speech recognition.
Preferably, the entropy of the input speech after passing through a shallow network is calculated as:

E = -\frac{1}{L} \sum_{l=1}^{L} \sum_{i=1}^{N} p_{li} \log p_{li}

where E denotes the entropy, L the number of speech frames, N the total number of units required for speech recognition of the input speech, and p_{li} the probability of the i-th speech recognition unit in the l-th frame of the input speech.
Preferably, the difficulty of speech in step S3 is judged as follows: set an entropy threshold; when the entropy output by the shallow network at level f is below the threshold, the difficulty of the input speech is settled, and the recognition result of the input speech output by the level-f shallow network is taken as the final result; otherwise, the input speech continues to be recognized level by level through the shallow networks until the entropy of some shallow network falls below the threshold or the level reaches F; wherein f denotes the level of a shallow network within the speech recognition model.
Preferably, the difficulty of speech in step S3 is judged as follows: set a threshold on the difference between recognition results; when the difference between the speech recognition results of two levels of shallow networks is below the threshold, i.e. diff(result1, result2) < threshold, acoustic modeling is considered complete; if the difference is at or above the threshold, i.e. diff(result1, result2) ≥ threshold, the speech recognition model formed by the current shallow networks is considered to have insufficient modeling capability for the speech, and recognition continues upward level by level through the shallow models until the results of two adjacent shallow networks pass the consistency judgment or the level reaches F.
Preferably, in step S3, when the speech recognition results of two adjacent shallow networks pass the consistency judgment, the results of the current two levels of shallow networks are linearly weighted to produce the final output.
Preferably, in step S1 the deep network of the Conformer model is divided as follows: for an R-layer deep network, a tap is led out every M layers and decoded by a shallow Decoder; F shallow Decoders are provided, forming F shallow networks.
Preferably, for the speech recognition model with shallow networks formed in step S2, R/M branches are used to jointly train the level-progressive shallow networks in a multi-task fashion.
Preferably, models of different network depths share parameters within the shallow networks.
The invention has the following beneficial effects: the disclosed rapid speech recognition method based on hierarchical recognition routes speech of varying difficulty to different branches and decomposes the model level by level, so that models at different levels handle cases of matching difficulty; the hierarchical inference scheme addresses the heavy computing resources required by large-model inference, greatly reduces the overall inference complexity, saves computing resources, and reduces service latency.
Drawings
FIG. 1 is a flow diagram of a fast speech recognition process for hierarchical recognition;
FIG. 2 is a block diagram of a fast speech recognition architecture for hierarchical recognition;
FIG. 3 is a diagram of a decision criteria structure for hierarchical recognition;
FIG. 4 is a diagram of the result-based judgment structure of hierarchical recognition.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is further described in detail below with reference to the accompanying drawings. It should be understood that the detailed description and specific examples, while indicating the invention, are intended for purposes of illustration only and are not intended to limit the scope of the invention.
A rapid speech recognition method based on hierarchical recognition divides a multi-layer Conformer speech model into multiple stages, judges the recognition difficulty of input speech from the perspective of the speech model, and determines the required recognition level from that difficulty; as shown in FIG. 1, the method comprises the following steps:
S1, divide the deep network of the Conformer model: following the order from the bottom layer to the top layer, split the R-layer deep network into a shallow network every M layers, and lead a tap out of the last recognition layer of each shallow network to be decoded by a shallow Decoder, forming R/M Conformer models with shallow networks;
A specific implementation is as follows: for an R-layer deep network, a tap is led out every M layers and decoded by a shallow Decoder, giving F shallow Decoders, where R and M denote numbers of network layers, F denotes the number of Conformer models with shallow networks, and F = R/M. As shown in FIG. 2, for a 12-layer Conformer model, a shallow Decoder is attached every 3 layers, with 4 shallow Decoders arranged in order from the bottom layer to the top layer, forming 4 Conformer models with shallow networks.
S2, divide the formed shallow networks into levels and order them from the bottom layer to the top layer of the deep network, forming a speech recognition model with R/M shallow networks; use R/M branches to jointly train the level-progressive shallow networks in a multi-task fashion, with parameters shared within the shallow networks;
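A sketch of the corresponding joint training objective, under the assumption (not stated in the patent) that each branch is trained with a CTC loss; the per-branch weights are likewise illustrative:

    import torch.nn.functional as functional  # renamed to avoid clashing with F = R/M

    def joint_multitask_loss(branch_log_probs, targets, input_lens, target_lens,
                             weights=None):
        """Weighted sum of per-branch CTC losses over the F shallow networks.

        branch_log_probs: list of (batch, frames, vocab) log-posteriors, one per
        level, e.g. the output of the hypothetical HierarchicalEncoder above.
        """
        weights = weights or [1.0] * len(branch_log_probs)
        loss = 0.0
        for w, log_probs in zip(weights, branch_log_probs):
            # functional.ctc_loss expects (frames, batch, vocab)
            loss = loss + w * functional.ctc_loss(
                log_probs.transpose(0, 1), targets, input_lens, target_lens,
                zero_infinity=True)
        return loss

Because every level reuses the same bottom-up stages, the shallower models are exactly the shared prefix of the deeper ones, which realizes the parameter sharing described above.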
S3, recognize the input speech level by level according to the level of the shallow network in the speech recognition model, and judge the difficulty of the input speech;
S41, as shown in FIG. 3, judge the difficulty of the speech from its entropy after passing through a shallow network, and decide whether the speech needs to be computed and recognized by the next-level shallow network;
the calculation formula of the entropy of the voice passing through the shallow network is as follows:
Figure BDA0003659249200000051
wherein E represents entropy, L represents number of speech frames, N represents total number of units required for speech recognition in input speech, and p li Representing the probability of the ith speech recognition unit in the input speech to perform speech recognition in the l frame; the smaller the entropy value of the shallow network output is, the more certain the speech recognition result of the shallow network output is, and the smaller the ambiguity of the speech recognition result is; conversely, the larger the entropy value is, the more uncertain the speech recognition result output by the shallow network is, the greater the ambiguity of the speech recognition result is, and the network with stronger modeling capability is required to recognize.
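A direct transcription of this formula, as a sketch that assumes the log-posteriors of one utterance from a single shallow tap:

    import torch

    def frame_averaged_entropy(log_probs):
        """E = -(1/L) * sum over frames l and units i of p_li * log p_li.

        log_probs: (L, N) log-posteriors for one utterance from a shallow tap.
        """
        p = log_probs.exp()
        return -(p * log_probs).sum(dim=-1).mean().item()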
Set an entropy threshold; when the entropy output by the shallow network at level f is below the threshold, the difficulty of the speech is settled, and the recognition result output by the level-f shallow network is taken as the final result; otherwise, the speech continues to be recognized upward level by level through the shallow networks until the entropy of some shallow network falls below the threshold or the level reaches F, where f denotes the level of a shallow network within the speech recognition model.
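Putting the pieces together, a sketch of the level-by-level early-exit inference loop; it reuses the hypothetical HierarchicalEncoder and frame_averaged_entropy above, and the threshold value is task-dependent rather than fixed by the patent:

    import torch

    @torch.no_grad()
    def recognize_with_early_exit(model, x, entropy_threshold):
        """Run stages bottom-up; stop at the first level whose entropy clears the threshold."""
        for f, (stage, decoder) in enumerate(
                zip(model.stages, model.shallow_decoders), start=1):
            for layer in stage:
                x = layer(x)                   # deeper levels run only when needed
            log_probs = decoder(x).log_softmax(dim=-1)
            if frame_averaged_entropy(log_probs[0]) < entropy_threshold or f == model.F:
                return log_probs, f            # decode this tap as the final result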
With the speech-difficulty judgment of step S41, the recognition result may still be wrong even when the entropy output by the current shallow network is below the threshold; to address this, the following speech-difficulty judgment method can be used instead:
S42, as shown in FIG. 4, select two adjacent shallow networks of different levels in order of level from low to high, and judge the consistency of the speech recognition results output by the two shallow networks. Set a threshold on the difference between recognition results: when the difference between the results of the two levels of shallow networks is below the threshold, i.e. diff(result1, result2) < threshold, acoustic modeling is considered complete, the recognition results tend to converge, and the results of the current two levels are linearly weighted to produce the final output; if the difference is at or above the threshold, i.e. diff(result1, result2) ≥ threshold, the speech recognition model formed by the current shallow networks is considered to have insufficient modeling capability for the speech, and recognition continues upward level by level through the shallow models until the results of two adjacent shallow networks pass the consistency judgment or the level reaches F.
Compared with the method of step S41, the judgment of step S42 requires more computation, since at least the first two levels of recognition must be run; but the recognition accuracy is better guaranteed, and the weighted fusion of the two levels' results also provides a model-averaging effect, improving decoding accuracy over a single speech recognition model of the same depth.
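A sketch of this consistency check and fusion. The patent leaves diff and the weights unspecified, so two assumptions are made here: diff is taken as the token-level edit distance between the two levels' decoded hypotheses, and the linear weighting is applied to the two levels' posteriors.

    import torch

    def edit_distance(a, b):
        """Token-level Levenshtein distance, one plausible choice for diff(result1, result2)."""
        dp = list(range(len(b) + 1))
        for i, ta in enumerate(a, 1):
            prev, dp[0] = dp[0], i
            for j, tb in enumerate(b, 1):
                prev, dp[j] = dp[j], min(dp[j] + 1, dp[j - 1] + 1, prev + (ta != tb))
        return dp[-1]

    def fuse_or_escalate(log_probs_a, log_probs_b, hyp_a, hyp_b, threshold, w=0.5):
        """If two adjacent levels agree, return their linearly weighted posteriors; else None."""
        if edit_distance(hyp_a, hyp_b) < threshold:
            return (w * log_probs_a.exp() + (1 - w) * log_probs_b.exp()).log()
        return None  # escalate to the next shallow level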
Examples
In practical speech recognition tasks, most cases are simple ones with clean backgrounds and clear pronunciation; for these, a shallow network suffices to complete the recognition task.
Take a 12-layer Conformer deep network as an example, and suppose 20% of the tested speech requires the deep network for recognition. Even with the judgment method of step S42, the remaining 80% of cases can stop after the first two levels, i.e. about 50% of the layers, saving roughly 80% × 50% = 40% of the total computation, a considerable benefit.
Take a 24-layer Conformer deep network as another example, again with 20% of the tested cases requiring the deep network: 20% of cases need all 24 layers while 80% need no more than 12 layers. Relative to a 12-layer baseline, the added complexity is then at most 100% × 20% = 20% of the computation, far less than the 100% extra incurred by running every case through a 24-layer network.
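The arithmetic behind both examples, spelled out (the case shares and exit depths are the patent's stated assumptions):

    # 12-layer model with the S42 judgment: 80% of cases stop after two levels,
    # i.e. about 50% of the layers, so roughly 0.8 * 0.5 = 40% of compute is saved.
    saved = 0.8 * 0.5

    # 24-layer model vs. a 12-layer baseline: 80% of cases stop by layer 12,
    # 20% run all 24 layers.
    baseline = 12
    hierarchical = 0.8 * 12 + 0.2 * 24            # expected layers per case = 14.4
    extra = (hierarchical - baseline) / baseline  # 0.2 -> at most 20% extra compute
    print(saved, extra)  # 0.4 0.2; running 24 layers for every case would cost +100%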
By adopting the technical scheme disclosed by the invention, the following beneficial effects are obtained:
the invention discloses a rapid speech recognition method based on hierarchical recognition, which divides speech with different difficulties and disassembles models step by step, so that the models with different levels can process cases with different difficulties; the invention solves the problem of limited computing resources required by large model modeling by a hierarchical reasoning mode, greatly reduces the complexity of the whole reasoning, saves the computing resources and reduces the service delay.
The foregoing is only a preferred embodiment of the present invention, and it should be noted that, for those skilled in the art, various modifications and improvements can be made without departing from the principle of the present invention, and such modifications and improvements should also be considered within the scope of the present invention.

Claims (9)

1. A rapid speech recognition method based on hierarchical recognition, characterized by comprising the following steps:
S1, divide the deep network of the Conformer model: following the order from the bottom layer to the top layer, split the R-layer deep network into a shallow network every M layers, and lead a tap out of the last recognition layer of each shallow network to be decoded by a shallow Decoder, forming F Conformer models with shallow networks; wherein R and M denote numbers of network layers, F denotes the number of Conformer models with shallow networks, and F = R/M;
S2, divide the formed shallow networks into levels and order them from the bottom layer to the top layer of the deep network, forming a speech recognition model with F shallow networks; recognize input speech level by level according to the level of the shallow network in the speech recognition model, and judge the difficulty of the input speech;
S3, judge the difficulty of the input speech from its entropy after passing through a shallow network, and decide whether the input speech needs to be computed and recognized by the next-level shallow network; the smaller the entropy of the shallow network's output, the more certain the speech recognition result and the smaller its ambiguity; conversely, the larger the entropy, the more uncertain the result and the greater its ambiguity, requiring a network with stronger modeling capability for recognition.
2. A rapid speech recognition method based on hierarchical recognition, characterized by comprising the following steps:
S1, divide the deep network of the Conformer model: following the order from the bottom layer to the top layer, split the R-layer deep network into a shallow network every M layers, and lead a tap out of the last recognition layer of each shallow network to be decoded by a shallow Decoder, forming F Conformer models with shallow networks; wherein R and M denote numbers of network layers, F denotes the number of Conformer models with shallow networks, and F = R/M;
S2, divide the formed shallow networks into levels and order them from the bottom layer to the top layer of the deep network, forming a speech recognition model with F shallow networks; recognize input speech level by level according to the level of the shallow network in the speech recognition model, and judge the difficulty of the input speech;
S3, select two adjacent shallow networks in order of level from low to high, and judge the consistency of the speech recognition results output by the two shallow networks; when the results of two adjacent shallow networks pass the consistency judgment, acoustic modeling is considered complete; otherwise, a network with stronger modeling capability is required for speech recognition.
3. The method according to claim 1, wherein the entropy of the input speech passing through the shallow network is calculated by the following formula:
E = -\frac{1}{L} \sum_{l=1}^{L} \sum_{i=1}^{N} p_{li} \log p_{li}

wherein E denotes the entropy, L the number of speech frames, N the total number of units required for speech recognition of the input speech, and p_{li} the probability of the i-th speech recognition unit in the l-th frame of the input speech.
4. The rapid speech recognition method based on hierarchical recognition according to claim 1, wherein the difficulty of speech in step S3 is judged as follows: set an entropy threshold; when the entropy output by the shallow network at level f is below the threshold, the difficulty of the input speech is settled, and the recognition result of the input speech output by the level-f shallow network is taken as the final result; otherwise, the input speech continues to be recognized level by level through the shallow networks until the entropy of some shallow network falls below the threshold or the level reaches F; wherein f denotes the level of a shallow network within the speech recognition model.
5. The rapid speech recognition method based on hierarchical recognition according to claim 2, wherein the difficulty of speech in step S3 is judged as follows: set a threshold on the difference between recognition results; when the difference between the speech recognition results of two levels of shallow networks is below the threshold, i.e. diff(result1, result2) < threshold, acoustic modeling is considered complete; if the difference is at or above the threshold, i.e. diff(result1, result2) ≥ threshold, the speech recognition model formed by the current shallow networks is considered to have insufficient modeling capability for the speech, and recognition continues upward level by level through the shallow models until the results of two adjacent shallow networks pass the consistency judgment or the level reaches F.
6. The rapid speech recognition method based on hierarchical recognition according to claim 2, wherein in step S3, when the speech recognition results of two adjacent shallow networks pass the consistency judgment, the results of the current two levels of shallow networks are linearly weighted as the final output result.
7. The rapid speech recognition method based on hierarchical recognition according to claim 1 or 2, wherein in step S1 the deep network of the Conformer model is divided as follows: for an R-layer deep network, a tap is led out every M layers and decoded by a shallow Decoder; F shallow Decoders are provided, forming F shallow networks.
8. The rapid speech recognition method based on hierarchical recognition according to claim 1 or 2, wherein, for the speech recognition model with shallow networks formed in step S2, R/M branches are used to jointly train the level-progressive shallow networks in a multi-task fashion.
9. The rapid speech recognition method based on hierarchical recognition according to claim 1 or 2, wherein models of different network depths share parameters within the shallow networks.
CN202210571189.4A 2022-05-24 2022-05-24 Rapid voice recognition method based on hierarchical recognition Active CN114974228B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210571189.4A CN114974228B (en) 2022-05-24 2022-05-24 Rapid voice recognition method based on hierarchical recognition

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210571189.4A CN114974228B (en) 2022-05-24 2022-05-24 Rapid voice recognition method based on hierarchical recognition

Publications (2)

Publication Number Publication Date
CN114974228A (en) 2022-08-30
CN114974228B (en) 2023-04-11

Family

ID=82955743

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210571189.4A Active CN114974228B (en) 2022-05-24 2022-05-24 Rapid voice recognition method based on hierarchical recognition

Country Status (1)

Country Link
CN (1) CN114974228B (en)

Patent Citations (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH0667698A (en) * 1992-06-19 1994-03-11 Seiko Epson Corp Speech recognizing device
US20040015329A1 (en) * 2002-07-19 2004-01-22 Med-Ed Innovations, Inc. Dba Nei, A California Corporation Method and apparatus for evaluating data and implementing training based on the evaluation of the data
CN105393302A (en) * 2013-07-17 2016-03-09 三星电子株式会社 Multi-level speech recognition
CN106844343A (en) * 2017-01-20 2017-06-13 上海傲硕信息科技有限公司 Instruction results screening plant
CN109785831A (en) * 2017-11-14 2019-05-21 奥迪股份公司 Check method, control device and the motor vehicle of the vehicle-mounted voice identifier of motor vehicle
JP2019095600A (en) * 2017-11-22 2019-06-20 日本電信電話株式会社 Acoustic model learning device, speech recognition device, and method and program for them
CN109074808A (en) * 2018-07-18 2018-12-21 深圳魔耳智能声学科技有限公司 Sound control method, control device and storage medium
CN109192194A (en) * 2018-08-22 2019-01-11 北京百度网讯科技有限公司 Voice data mask method, device, computer equipment and storage medium
CN110413997A (en) * 2019-07-16 2019-11-05 深圳供电局有限公司 For the new word discovery method and its system of power industry, readable storage medium storing program for executing
CN110472730A (en) * 2019-08-07 2019-11-19 交叉信息核心技术研究院(西安)有限公司 A kind of distillation training method and the scalable dynamic prediction method certainly of convolutional neural networks
WO2021023202A1 (en) * 2019-08-07 2021-02-11 交叉信息核心技术研究院(西安)有限公司 Self-distillation training method and device for convolutional neural network, and scalable dynamic prediction method
CN112182154A (en) * 2020-09-25 2021-01-05 中国人民大学 Personalized search model for eliminating keyword ambiguity by utilizing personal word vector
CN112434163A (en) * 2020-11-30 2021-03-02 北京沃东天骏信息技术有限公司 Risk identification method, model construction method, risk identification device, electronic equipment and medium
CN112908301A (en) * 2021-01-27 2021-06-04 科大讯飞(上海)科技有限公司 Voice recognition method, device, storage medium and equipment
CN113807499A (en) * 2021-09-15 2021-12-17 清华大学 Lightweight neural network model training method, system, device and storage medium
CN113870845A (en) * 2021-09-26 2021-12-31 平安科技(深圳)有限公司 Speech recognition model training method, device, equipment and medium

Also Published As

Publication number Publication date
CN114974228B (en) 2023-04-11

Similar Documents

Publication Publication Date Title
CN110164476B (en) BLSTM voice emotion recognition method based on multi-output feature fusion
CN112613303B (en) Knowledge distillation-based cross-modal image aesthetic quality evaluation method
CN112685597B (en) Weak supervision video clip retrieval method and system based on erasure mechanism
CN112633010B (en) Aspect-level emotion analysis method and system based on multi-head attention and graph convolution network
CN110619319A (en) Improved MTCNN model-based face detection method and system
Fang et al. EdgeKE: an on-demand deep learning IoT system for cognitive big data on industrial edge devices
CN111612147A (en) Quantization method of deep convolutional network
CN110443784B (en) Effective significance prediction model method
CN112733964B (en) Convolutional neural network quantization method for reinforcement learning automatic perception weight distribution
CN111950715A (en) 8-bit integer full-quantization inference method and device based on self-adaptive dynamic shift
CN116863320B (en) Underwater image enhancement method and system based on physical model
CN107910009B (en) Code element rewriting information hiding detection method and system based on Bayesian inference
CN115588237A (en) Three-dimensional hand posture estimation method based on monocular RGB image
CN116109920A (en) Remote sensing image building extraction method based on transducer
CN114943335A (en) Layer-by-layer optimization method of ternary neural network
CN114974228B (en) Rapid voice recognition method based on hierarchical recognition
CN117494762A (en) Training method of student model, material processing method, device and electronic equipment
CN116167015A (en) Dimension emotion analysis method based on joint cross attention mechanism
CN115828863A (en) Automatic generation method of emergency plan in chaotic engineering test scene
CN112380874B (en) Multi-person-to-speech analysis method based on graph convolution network
CN114758645A (en) Training method, device and equipment of speech synthesis model and storage medium
CN112184846A (en) Image generation method and device, computer equipment and readable storage medium
CN113076123A (en) Adaptive template updating system and method for target tracking
Wang et al. Exploring quantization in few-shot learning
CN117808083B (en) Distributed training communication method, device, system, equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant