CN114974228B - Rapid voice recognition method based on hierarchical recognition - Google Patents

Rapid voice recognition method based on hierarchical recognition

Info

Publication number
CN114974228B
CN114974228B CN202210571189.4A
Authority
CN
China
Prior art keywords
shallow
network
networks
level
voice
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202210571189.4A
Other languages
Chinese (zh)
Other versions
CN114974228A (en)
Inventor
吕志强
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Mingri Dream Beijing Technology Co ltd
Original Assignee
Mingri Dream Beijing Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Mingri Dream Beijing Technology Co ltd filed Critical Mingri Dream Beijing Technology Co ltd
Priority to CN202210571189.4A priority Critical patent/CN114974228B/en
Publication of CN114974228A publication Critical patent/CN114974228A/en
Application granted granted Critical
Publication of CN114974228B publication Critical patent/CN114974228B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/08 Speech classification or search
    • G10L15/16 Speech classification or search using artificial neural networks
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

A rapid speech recognition method based on hierarchical recognition routes speech of different difficulty to different levels of a step-by-step decomposed model, so that models of different levels handle speech cases of matching difficulty; through this hierarchical inference scheme, the invention addresses the heavy computing resources required by large-model inference, greatly reduces overall inference complexity, saves computing resources, and lowers service latency.

Description

Rapid voice recognition method based on hierarchical recognition
Technical Field
The invention relates to the field of voice recognition, in particular to a rapid voice recognition method based on hierarchical recognition.
Background
With the continuous growth of computing power and the accumulation of data, the performance of speech recognition systems has improved markedly; end-to-end modeling methods, represented by CTC and encoder-decoder approaches, make fuller use of massive data and have stronger modeling capability. In the field of speech recognition, Google proposed the convolution-augmented Conformer model in 2020, which has repeatedly set new records in recognition accuracy and has become a standard method for acoustic modeling in current speech recognition. With massive training data, multi-layer Conformer models with larger parameter counts have proven to have stronger modeling capability; in general, a 12-to-24-layer Conformer model becomes more capable as the number of layers increases, given sufficient training data. However, as the parameter count grows, model inference during speech recognition requires more computation, energy, latency, and resources, which limits the application of large models in practical scenarios. To make deep Conformer networks usable for speech recognition tasks, methods such as reducing the number of hidden-layer neurons or applying matrix decomposition are typically used to cut the parameter count and computation, but these methods usually bring some performance loss. Meanwhile, the computational complexity of model inference still grows linearly with the number of Conformer layers.
Disclosure of Invention
The present invention aims to provide a rapid speech recognition method based on hierarchical recognition, thereby solving the aforementioned problems in the prior art.
To achieve this purpose, the invention adopts the following technical scheme:
a quick speech recognition method based on hierarchical recognition comprises the following steps:
s1, dividing deep networks of a Conformer model, dividing the deep networks with R layers into shallow networks every M layers according to the sequence from a bottom layer to a top layer, and leading out a tap from a last layer identification network in each shallow network to decode by using a shallow Decoder to form F Conformer models with shallow networks; wherein R and M represent the number of network levels, F represents the number of Conformer models with shallow networks, F = R/M;
s2, according to the sequence from the bottom layer to the top layer in the deep layer network, carrying out level division and sequencing on the formed shallow layer network to form a voice recognition model with F shallow layer networks, carrying out level-by-level recognition on input voice according to the level of the shallow layer network in the voice recognition model, and judging the difficulty level of the input voice;
s3, judging the difficulty degree of the input voice according to the entropy of the input voice passing through the shallow network, and judging whether the input voice needs to be calculated and identified by the shallow network at the next level; the smaller the entropy value of the shallow network output is, the more certain the speech recognition result of the shallow network output is, the smaller the ambiguity of the speech recognition result is; conversely, the larger the entropy value is, the more uncertain the speech recognition result output by the shallow network is, the larger the ambiguity of the speech recognition result is, the network with stronger modeling capability is required to recognize.
A rapid speech recognition method based on hierarchical recognition comprises the following steps:
S1. Divide the deep network of a Conformer model: split the R-layer deep network, from the bottom layer to the top layer, into shallow networks of M layers each, lead out a tap from the last layer of each shallow network, and decode with a shallow Decoder, forming F Conformer models with shallow networks; wherein R and M denote numbers of network layers, F denotes the number of Conformer models with shallow networks, and F = R/M;
S2. Order the resulting shallow networks by level, from the bottom layer to the top layer of the deep network, to form a speech recognition model with F shallow networks; recognize input speech level by level according to the levels of the shallow networks in the model, and judge the difficulty of the input speech;
S3. Select two adjacent shallow networks in order of increasing level and check the consistency of their speech recognition results; when the results of two adjacent shallow networks pass the consistency check, the acoustic modeling is considered complete; otherwise, a network with stronger modeling capability is required for speech recognition.
Preferably, the entropy of the input speech after passing through a shallow network is computed as:
E = -(1/L) Σ_{l=1}^{L} Σ_{i=1}^{N} p_{li} log p_{li}
where E denotes the entropy, L the number of speech frames, N the total number of units required for recognizing the input speech, and p_{li} the probability of the i-th recognition unit in the l-th frame of the input speech.
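A minimal sketch of this entropy computation, assuming the frame-level posteriors are stored as an (L, N) array (the function name and array layout are illustrative, not taken from the patent):

```python
import numpy as np

def frame_entropy(posteriors):
    """Average per-frame entropy E of frame-level posteriors.

    posteriors: array of shape (L, N) -- L speech frames, N recognition
    units; each row is a probability distribution over the units.
    """
    eps = 1e-12                       # guard against log(0)
    p = np.clip(posteriors, eps, 1.0)
    return float(-(p * np.log(p)).sum() / posteriors.shape[0])
```

A uniform distribution over N units gives the maximum per-frame entropy log N, while a fully peaked distribution gives an entropy near zero, matching the certainty interpretation above.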
Preferably, the difficulty of the speech in step S3 is judged as follows: set an entropy threshold; when the entropy output by the f-th-level shallow network is below the threshold, the difficulty of the input speech is determined and the recognition result output by the f-th-level shallow network is taken as the final result; otherwise the input speech continues to be recognized level by level through the shallow networks until the entropy output by a shallow network falls below the threshold or the F-th level is reached; where f denotes the level of the shallow network within the speech recognition model.
Preferably, the difficulty of the speech in step S3 is judged as follows: set a recognition-result difference threshold; when the difference between the recognition results of the two levels of shallow networks is below the threshold, i.e. diff(result1, result2) < threshold, the acoustic modeling is considered complete; if the difference is at or above the threshold, i.e. diff(result1, result2) ≥ threshold, the speech recognition model formed by the current shallow networks is considered to lack sufficient modeling capability for this speech, and recognition continues upward, level by level, through the shallow networks until the results of two adjacent shallow networks pass the consistency check or the F-th level is reached.
Preferably, in step S3, when the speech recognition results of two adjacent shallow networks pass the consistency check, the recognition results of the current two levels of shallow networks are linearly weighted to give the final output result.
Preferably, in step S1 the deep network of the Conformer model is divided as follows: for the R-layer deep network, a tap is led out every M layers and decoded with a shallow Decoder, and F shallow Decoders are arranged, forming F shallow networks.
Preferably, for the speech recognition model with shallow networks formed in step S2, multi-task joint training is performed on the level-progressive shallow networks using R/M branches.
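A minimal sketch of such a multi-task joint objective: the losses computed at the R/M branch Decoders are combined into one training loss. Equal branch weights are an assumption for the sketch; the patent does not specify a weighting.

```python
def joint_multitask_loss(branch_losses, weights=None):
    """Combine the per-branch (per-tap) losses of the F shallow
    networks into a single joint training objective."""
    if weights is None:
        # assumption: uniform weighting across branches
        weights = [1.0 / len(branch_losses)] * len(branch_losses)
    return sum(w * loss for w, loss in zip(weights, branch_losses))
```

Because the lower layers are shared across branches, minimizing this sum trains every shallow network while the deepest branch still sees the full stack.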
Preferably, for models of different network depths, parameters are shared within the shallow networks.
The invention has the following beneficial effects: the invention discloses a rapid speech recognition method based on hierarchical recognition, which routes speech of different difficulty to different levels of a step-by-step decomposed model, so that models of different levels handle cases of matching difficulty; through this hierarchical inference scheme, the invention addresses the heavy computing resources required by large-model inference, greatly reduces overall inference complexity, saves computing resources, and lowers service latency.
Drawings
FIG. 1 is a flow diagram of the rapid speech recognition method based on hierarchical recognition;
FIG. 2 is a structural diagram of the rapid speech recognition architecture based on hierarchical recognition;
FIG. 3 is a structural diagram of the entropy-based judgment criterion used in hierarchical recognition;
FIG. 4 is a structural diagram of the result-consistency criterion used in hierarchical recognition.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is further described in detail below with reference to the accompanying drawings. It should be understood that the detailed description and specific examples, while indicating the invention, are intended for purposes of illustration only and are not intended to limit the scope of the invention.
A rapid speech recognition method based on hierarchical recognition divides a multi-layer Conformer speech model into multiple stages, judges the recognition difficulty of the input speech from the model's perspective, and chooses the recognition level according to that difficulty; as shown in fig. 1, the method comprises the following steps:
s1, dividing deep networks of a Conformer model, dividing the deep networks with R layers into shallow networks every M layers according to the sequence from a bottom layer to a top layer, and leading out a tap from a last layer identification network in each shallow network to decode by using a shallow Decoder to form R/M Conformer models with shallow networks;
the specific implementation mode is as follows: for a deep network of an R layer, decoding is performed by using a shallow Decoder every other tap led out from the M layer, and F shallow decoders are configured, where R and M represent the number of network layers, F represents the number of provider models with a shallow network, and F = R/M. As shown in fig. 2, for the 12-layer former model, shallow decoders are connected every 3 layers, and 4 shallow decoders are sequentially arranged from the bottom layer to the top layer, thereby forming 4 former models having shallow networks.
S2. Order the resulting shallow networks by level, from the bottom layer to the top layer of the deep network, to form a speech recognition model with R/M shallow networks, and perform multi-task joint training on the level-progressive shallow networks using R/M branches, with parameters shared within the shallow networks;
S3. Recognize the input speech level by level according to the levels of the shallow networks in the speech recognition model, and judge the difficulty of the input speech;
S41. As shown in FIG. 3, judge the difficulty of the speech according to its entropy after passing through the shallow network, and decide whether it needs to be computed and recognized by the next-level shallow network;
the calculation formula of the entropy of the voice passing through the shallow network is as follows:
Figure BDA0003659249200000051
wherein E represents entropy, L represents number of speech frames, N represents total number of units required for speech recognition in input speech, and p li Representing the probability of the ith voice recognition unit in the input voice to perform voice recognition in the ith frame; the smaller the entropy value of the shallow network output is, the more certain the speech recognition result of the shallow network output is, the smaller the ambiguity of the speech recognition result is; conversely, the larger the entropy value is, the more uncertain the speech recognition result output by the shallow network is, and the greater the ambiguity of the speech recognition result is, the network with stronger modeling capability is required to recognize.
An entropy threshold is set; when the entropy output by the f-th-level shallow network is below the threshold, the difficulty of the speech is determined and the recognition result output by the f-th-level shallow network is taken as the final result; otherwise the speech continues to be recognized upward, level by level, through the shallow networks until the entropy output by a shallow network falls below the threshold or the F-th level is reached; where f denotes the level of the shallow network within the speech recognition model.
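Steps S3/S41 can be sketched as an early-exit loop; here `stages`, `decoders`, and `entropy_fn` are illustrative placeholders for the tapped encoder stages, their shallow Decoders, and the entropy of the preceding formula.

```python
import numpy as np

def hierarchical_decode(x, stages, decoders, entropy_fn, threshold):
    """Level-by-level recognition with an entropy-based early exit:
    run each shallow stage bottom-up, decode at its tap, and stop as
    soon as the output entropy drops below the threshold.  The level-F
    result is returned unconditionally."""
    level, result = 0, None
    for stage, decoder in zip(stages, decoders):
        level += 1
        for block in stage:               # run this stage's encoder layers
            x = block(x)
        posteriors = decoder(x)           # (L, N) frame-level posteriors
        result = posteriors.argmax(axis=1)
        if entropy_fn(posteriors) < threshold:
            break                         # easy utterance: exit at this level
    return result, level
```

Easy utterances thus pay only for the lowest levels, while hard ones fall through to the full-depth model.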
With the speech difficulty judgment of step S41, the recognition result may still be wrong even when the entropy output by the current shallow network is below the threshold; therefore, the following judgment method can be adopted instead:
S42. As shown in FIG. 4, select two adjacent shallow networks in order of increasing level and check the consistency of their speech recognition results. A recognition-result difference threshold is set: when the difference between the recognition results of the two levels of shallow networks is below the threshold, i.e. diff(result1, result2) < threshold, the acoustic modeling is considered complete, the recognition results tend to converge, and the results of the current two levels of shallow networks are linearly weighted to give the final output; if the difference is at or above the threshold, i.e. diff(result1, result2) ≥ threshold, the speech recognition model formed by the current shallow networks is considered to lack sufficient modeling capability for this speech, and recognition continues upward, level by level, through the shallow networks until the results of two adjacent shallow networks pass the consistency check or the F-th level is reached.
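A sketch of step S42, under two assumptions the patent leaves open: diff(result1, result2) is realised here as an edit distance over recognized unit sequences, and the fusion as an equal-weight linear combination of the two taps' frame posteriors.

```python
import numpy as np

def edit_distance(a, b):
    """Levenshtein distance between two token sequences -- one possible
    realisation of diff(result1, result2)."""
    m, n = len(a), len(b)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i
    for j in range(n + 1):
        d[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1, d[i][j - 1] + 1,
                          d[i - 1][j - 1] + cost)
    return d[m][n]

def consistent_fuse(post1, post2, threshold, w=0.5):
    """If two adjacent taps agree (diff below threshold), return the
    linearly weighted posteriors as the final result, else None to
    signal that recognition must continue at a higher level."""
    r1 = post1.argmax(axis=1)
    r2 = post2.argmax(axis=1)
    if edit_distance(list(r1), list(r2)) < threshold:
        return w * post1 + (1 - w) * post2
    return None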
Compared with the judgment method of step S41, the method of step S42 costs more computation, since at least the first two levels of recognition must be computed, but its accuracy is better guaranteed; moreover, the weighted fusion of the two levels' results also acts as a form of model averaging, which improves decoding accuracy at the same model level.
Examples
In practical speech recognition, most utterances are simple cases with a clean background and clear pronunciation, and for these a shallow network is sufficient to complete the recognition task.
Taking a 12-layer Conformer deep network as an example, suppose 20% of the tested utterances need the deep network for recognition; even with the judgment method of step S42, about 80% × 50% = 40% of the computation can be saved, a considerable gain.
Taking a 24-layer Conformer deep network as an example, again suppose 20% of the tested utterances need the deep network: if 20% of cases require the full 24 layers and 80% need no more than 12 layers, the added complexity is at most 100% × 20% = 20% of the computation, far less than the 100% increase of running every case through the 24-layer network.
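The arithmetic above can be reproduced with a small helper; the 50% figure corresponds to the assumption that easy utterances exit after half of the layers.

```python
def expected_layer_fraction(case_mix, total_layers):
    """Expected encoder cost per utterance, as a fraction of always
    running all total_layers; case_mix is a list of
    (fraction_of_cases, layers_used) pairs."""
    return sum(frac * layers for frac, layers in case_mix) / total_layers
```

For both examples the hierarchical scheme runs at about 60% of the full-depth cost, i.e. roughly 40% of the computation is saved.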
By adopting the technical scheme disclosed by the invention, the following beneficial effects are obtained:
the invention discloses a rapid speech recognition method based on hierarchical recognition, which is characterized in that speech with different difficulties is shunted to disassemble models step by step, so that the models with different levels can process cases with different difficulties; the invention solves the problem of limited computing resources required by large model modeling by a hierarchical reasoning mode, greatly reduces the complexity of the whole reasoning, saves the computing resources and reduces the service delay.
The foregoing is only a preferred embodiment of the present invention, and it should be noted that, for those skilled in the art, various modifications and improvements can be made without departing from the principle of the present invention, and such modifications and improvements should also be considered within the scope of the present invention.

Claims (1)

1. A rapid speech recognition method based on hierarchical recognition, characterized by comprising the following steps:
S1. Divide the deep network of a Conformer model: split the R-layer deep network, from the bottom layer to the top layer, into shallow networks of M layers each, and lead out a tap from the last layer of each shallow network for decoding with a shallow Decoder, forming F Conformer models with shallow networks; wherein R and M denote numbers of network layers, F denotes the number of Conformer models with shallow networks, and F = R/M;
S2. Order the resulting shallow networks by level, from the bottom layer to the top layer of the deep network, to form a speech recognition model with F shallow networks; recognize input speech level by level according to the levels of the shallow networks in the model, and judge the difficulty of the input speech;
S3. Judge the difficulty of the input speech from the entropy of its output after passing through a shallow network, and decide whether it needs to be computed and recognized by the shallow network at the next level; the smaller the entropy of a shallow network's output, the more certain and the less ambiguous its recognition result; conversely, the larger the entropy, the more uncertain and the more ambiguous the result, and a network with stronger modeling capability is required for recognition;
wherein the entropy of the input speech after passing through a shallow network is computed as:
E = -(1/L) Σ_{l=1}^{L} Σ_{i=1}^{N} p_{li} log p_{li}
where E denotes the entropy, L the number of speech frames, N the total number of units required for recognizing the input speech, and p_{li} the probability of the i-th recognition unit in the l-th frame of the input speech;
the difficulty of the speech in step S3 is judged as follows: set an entropy threshold; when the entropy output by the f-th-level shallow network is below the threshold, the difficulty of the input speech is determined and the recognition result output by the f-th-level shallow network is taken as the final result; otherwise the input speech continues to be recognized level by level through the shallow networks until the entropy output by a shallow network falls below the threshold or the F-th level is reached; wherein f denotes the level of the shallow network within the speech recognition model;
Alternatively, the method comprises the following steps:
S1. Divide the deep network of a Conformer model: split the R-layer deep network, from the bottom layer to the top layer, into shallow networks of M layers each, lead out a tap from the last layer of each shallow network, and decode with a shallow Decoder, forming F Conformer models with shallow networks; wherein R and M denote numbers of network layers, F denotes the number of Conformer models with shallow networks, and F = R/M;
S2. Order the resulting shallow networks by level, from the bottom layer to the top layer of the deep network, to form a speech recognition model with F shallow networks; recognize input speech level by level according to the levels of the shallow networks in the model, and judge the difficulty of the input speech;
S3. Select two adjacent shallow networks in order of increasing level and check the consistency of their speech recognition results; when the results of two adjacent shallow networks pass the consistency check, the acoustic modeling is considered complete; otherwise, a network with stronger modeling capability is required for recognition;
the difficulty of the speech in step S3 is judged as follows: set a recognition-result difference threshold; when the difference between the recognition results of the two levels of shallow networks is below the threshold, i.e. diff(result1, result2) < threshold, the acoustic modeling is considered complete; if the difference is at or above the threshold, i.e. diff(result1, result2) ≥ threshold, the speech recognition model formed by the current shallow networks is considered to lack sufficient modeling capability for this speech, and recognition continues upward, level by level, through the shallow networks until the results of two adjacent shallow networks pass the consistency check or the F-th level is reached;
in step S3, when the speech recognition results of two adjacent shallow networks pass the consistency check, the results of the current two levels of shallow networks are linearly weighted to give the final output;
in step S1, the deep network of the Conformer model is divided as follows: for the R-layer deep network, a tap is led out every M layers and decoded with a shallow Decoder, and F shallow Decoders are arranged, forming F shallow networks;
for the speech recognition model with shallow networks formed in step S2, multi-task joint training is performed on the level-progressive shallow networks using R/M branches;
for models of different network depths, all parameters are shared within the shallow networks.
CN202210571189.4A 2022-05-24 2022-05-24 Rapid voice recognition method based on hierarchical recognition Active CN114974228B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210571189.4A CN114974228B (en) 2022-05-24 2022-05-24 Rapid voice recognition method based on hierarchical recognition

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210571189.4A CN114974228B (en) 2022-05-24 2022-05-24 Rapid voice recognition method based on hierarchical recognition

Publications (2)

Publication Number Publication Date
CN114974228A CN114974228A (en) 2022-08-30
CN114974228B true CN114974228B (en) 2023-04-11

Family

ID=82955743

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210571189.4A Active CN114974228B (en) 2022-05-24 2022-05-24 Rapid voice recognition method based on hierarchical recognition

Country Status (1)

Country Link
CN (1) CN114974228B (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2019095600A (en) * 2017-11-22 2019-06-20 日本電信電話株式会社 Acoustic model learning device, speech recognition device, and method and program for them
WO2021023202A1 (en) * 2019-08-07 2021-02-11 交叉信息核心技术研究院(西安)有限公司 Self-distillation training method and device for convolutional neural network, and scalable dynamic prediction method
CN113807499A (en) * 2021-09-15 2021-12-17 清华大学 Lightweight neural network model training method, system, device and storage medium

Family Cites Families (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP3467556B2 (en) * 1992-06-19 2003-11-17 セイコーエプソン株式会社 Voice recognition device
US6795793B2 (en) * 2002-07-19 2004-09-21 Med-Ed Innovations, Inc. Method and apparatus for evaluating data and implementing training based on the evaluation of the data
US9305554B2 (en) * 2013-07-17 2016-04-05 Samsung Electronics Co., Ltd. Multi-level speech recognition
CN106844343B (en) * 2017-01-20 2019-11-19 上海傲硕信息科技有限公司 Instruction results screening plant
DE102017220266B3 (en) * 2017-11-14 2018-12-13 Audi Ag Method for checking an onboard speech recognizer of a motor vehicle and control device and motor vehicle
WO2020014899A1 (en) * 2018-07-18 2020-01-23 深圳魔耳智能声学科技有限公司 Voice control method, central control device, and storage medium
CN109192194A (en) * 2018-08-22 2019-01-11 北京百度网讯科技有限公司 Voice data mask method, device, computer equipment and storage medium
CN110413997B (en) * 2019-07-16 2023-04-07 深圳供电局有限公司 New word discovery method, system and readable storage medium for power industry
CN112182154B (en) * 2020-09-25 2023-10-10 中国人民大学 Personalized search model for eliminating keyword ambiguity by using personal word vector
CN112434163A (en) * 2020-11-30 2021-03-02 北京沃东天骏信息技术有限公司 Risk identification method, model construction method, risk identification device, electronic equipment and medium
CN112908301A (en) * 2021-01-27 2021-06-04 科大讯飞(上海)科技有限公司 Voice recognition method, device, storage medium and equipment
CN113870845A (en) * 2021-09-26 2021-12-31 平安科技(深圳)有限公司 Speech recognition model training method, device, equipment and medium


Also Published As

Publication number Publication date
CN114974228A (en) 2022-08-30

Similar Documents

Publication Publication Date Title
CN109977212B (en) Reply content generation method of conversation robot and terminal equipment
CN112613303B (en) Knowledge distillation-based cross-modal image aesthetic quality evaluation method
CN112685597B (en) Weak supervision video clip retrieval method and system based on erasure mechanism
CN111128137A (en) Acoustic model training method and device, computer equipment and storage medium
US11651578B2 (en) End-to-end modelling method and system
Fang et al. EdgeKE: An on-demand deep learning IoT system for cognitive big data on industrial edge devices
CN111916058A (en) Voice recognition method and system based on incremental word graph re-scoring
CN113257248B (en) Streaming and non-streaming mixed voice recognition system and streaming voice recognition method
CN112733964B (en) Convolutional neural network quantization method for reinforcement learning automatic perception weight distribution
CN112967739B (en) Voice endpoint detection method and system based on long-term and short-term memory network
CN115376491A (en) Voice confidence calculation method, system, electronic equipment and medium
CN114974228B (en) Rapid voice recognition method based on hierarchical recognition
Sterpu et al. Should we hard-code the recurrence concept or learn it instead? Exploring the Transformer architecture for Audio-Visual Speech Recognition
US11501759B1 (en) Method, system for speech recognition, electronic device and storage medium
CN116109920A (en) Remote sensing image building extraction method based on transducer
CN116028823A (en) Loss calculation method and device based on multi-mode semantic interaction
CN115828863A (en) Automatic generation method of emergency plan in chaotic engineering test scene
CN113160801B (en) Speech recognition method, device and computer readable storage medium
CN114758645A (en) Training method, device and equipment of speech synthesis model and storage medium
CN110674280B (en) Answer selection algorithm based on enhanced question importance representation
CN113076123A (en) Adaptive template updating system and method for target tracking
Wang et al. Few-shot short utterance speaker verification using meta-learning
CN112184846A (en) Image generation method and device, computer equipment and readable storage medium
CN117113063B (en) Encoding and decoding system for nanopore signals
CN117217499B (en) Campus electric scooter dispatching optimization method based on multi-source data driving

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant