CN114974228B - Rapid voice recognition method based on hierarchical recognition - Google Patents
- Publication number
- CN114974228B (application CN202210571189.4A)
- Authority
- CN
- China
- Prior art keywords
- shallow
- network
- networks
- level
- voice
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G10L15/16—Speech classification or search using artificial neural networks
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Abstract
A rapid speech recognition method based on hierarchical recognition routes speech of different difficulty levels and decomposes the model stage by stage, so that models at different levels handle speech cases of matching difficulty; this hierarchical inference scheme addresses the heavy computing resources required by large-model inference, greatly reduces overall inference complexity, saves computing resources, and lowers service latency.
Description
Technical Field
The invention relates to the field of voice recognition, in particular to a rapid voice recognition method based on hierarchical recognition.
Background
With the continuous growth of computing power and the accumulation of data, the performance of speech recognition systems has improved markedly. End-to-end modeling methods, represented by CTC and encoder-decoder architectures, exploit massive data more fully and have stronger modeling capability. In 2020, Google proposed the convolution-augmented Conformer model, which has repeatedly set new records for speech recognition accuracy and has become a standard choice for acoustic modeling in current speech recognition. With massive training data, a multi-layer Conformer model with more parameters has been shown to have stronger modeling capability; in general, a 12-to-24-layer Conformer gains modeling power as the number of layers increases. However, as the parameter count grows, model inference during speech recognition requires more computation, energy, latency, and resources, which limits the application of large models in practical scenarios. To make deep Conformer networks usable for speech recognition tasks, methods such as reducing the number of hidden-layer neurons or matrix decomposition are commonly adopted to cut the parameter count and the amount of computation, but these methods usually incur some performance loss. Meanwhile, the computational complexity of model inference still grows linearly with the number of Conformer layers.
Disclosure of Invention
The present invention aims to provide a method for fast speech recognition based on hierarchical recognition, thereby solving the aforementioned problems in the prior art.
In order to achieve the purpose, the technical scheme adopted by the invention is as follows:
A rapid speech recognition method based on hierarchical recognition comprises the following steps:
S1, dividing the deep network of a Conformer model: according to the order from the bottom layer to the top layer, the R-layer deep network is divided into shallow networks every M layers, and a tap is led out from the last layer of each shallow network and decoded with a shallow Decoder, forming F Conformer models with shallow networks; wherein R and M denote numbers of network layers, F denotes the number of Conformer models with shallow networks, and F = R/M;
S2, according to the order from the bottom layer to the top layer of the deep network, the formed shallow networks are divided into levels and sorted, forming a speech recognition model with F shallow networks; the input speech is recognized level by level according to the level of the shallow network in the speech recognition model, and its difficulty is judged;
S3, the difficulty of the input speech is judged according to the entropy of its output from the shallow network, which decides whether the input speech needs to be computed and recognized by the shallow network at the next level; the smaller the entropy of the shallow network's output, the more certain the speech recognition result and the smaller its ambiguity; conversely, the larger the entropy, the more uncertain the result and the greater its ambiguity, and a network with stronger modeling capability is required for recognition.
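The partitioning rule of step S1 (an R-layer network cut every M layers, giving F = R/M taps) can be sketched as follows; this is a minimal illustration, with the function name and list-of-indices representation chosen here rather than taken from the patent:

```python
# Hypothetical sketch of step S1: an R-layer deep network is cut every M
# layers; shallow network f (1-based) spans layers 1..f*M, and a decoder
# tap sits after its last layer. F = R / M shallow networks result.

def split_into_shallow_networks(num_layers_r, layers_per_block_m):
    """Return, for each shallow network, the encoder layer indices it covers."""
    if num_layers_r % layers_per_block_m != 0:
        raise ValueError("R must be divisible by M")
    f_count = num_layers_r // layers_per_block_m
    return [list(range(1, (f + 1) * layers_per_block_m + 1))
            for f in range(f_count)]

blocks = split_into_shallow_networks(12, 3)
print(len(blocks))               # 4 shallow networks
print([b[-1] for b in blocks])   # tap positions after layers [3, 6, 9, 12]
```

Each level is a prefix of the next, so a higher level reuses all layers already evaluated by the levels below it.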
Alternatively, a rapid speech recognition method based on hierarchical recognition comprises the following steps:
S1, dividing the deep network of a Conformer model: according to the order from the bottom layer to the top layer, the R-layer deep network is divided into shallow networks every M layers, and a tap is led out from the last layer of each shallow network and decoded with a shallow Decoder, forming F Conformer models with shallow networks; wherein R and M denote numbers of network layers, F denotes the number of Conformer models with shallow networks, and F = R/M;
S2, according to the order from the bottom layer to the top layer of the deep network, the formed shallow networks are divided into levels and sorted, forming a speech recognition model with F shallow networks; the input speech is recognized level by level according to the level of the shallow network in the speech recognition model, and its difficulty is judged;
S3, two adjacent shallow networks are selected in order of increasing level, and the consistency of their speech recognition results is judged; when the results of two adjacent shallow networks pass the consistency check, the acoustic modeling is considered complete; otherwise, a network with stronger modeling capability is required for speech recognition.
Preferably, the entropy of the input speech after passing through the shallow network is calculated as:
E = -(1/L) · Σ_{l=1}^{L} Σ_{i=1}^{N} p_{li} · log p_{li}
where E denotes the entropy, L the number of speech frames, N the total number of recognition units in the input speech, and p_{li} the probability of the i-th recognition unit in the l-th frame of the input speech.
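Under the variable definitions above, the entropy of a shallow network's frame posteriors could be computed as below; the frame-averaged form and the natural logarithm are assumptions, since the patent text does not fix them:

```python
import math

def frame_averaged_entropy(posteriors):
    """E = -(1/L) * sum over frames l and units i of p_li * log(p_li).

    posteriors[l][i] is p_li, the probability of recognition unit i at
    frame l; L = number of frames, N = number of units per frame.
    """
    num_frames = len(posteriors)
    total = 0.0
    for frame in posteriors:
        total -= sum(p * math.log(p) for p in frame if p > 0.0)
    return total / num_frames

# A peaked (confident) distribution gives low entropy, a flat one high entropy.
confident = [[0.98, 0.01, 0.01]] * 4
ambiguous = [[1 / 3, 1 / 3, 1 / 3]] * 4
print(frame_averaged_entropy(confident) < frame_averaged_entropy(ambiguous))  # True
```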
Preferably, the judgment basis of the difficulty level of the speech in step S3 is: setting an entropy threshold, when the entropy value output by the shallow network of the f-th level is smaller than the threshold, determining the difficulty degree of the input voice, and judging that the voice recognition result of the input voice output by the shallow network of the f-th level is a final result; otherwise, the input voice continues to be subjected to voice recognition step by step through the shallow network until the entropy value of the shallow network is smaller than the threshold value or the level of the shallow network is the F-th level; where f represents the level of the shallow network in the speech recognition model.
Preferably, the judgment basis for the difficulty of the speech in step S3 is: a recognition-result difference threshold is set; when the difference between the recognition results of the two levels' shallow networks is below the threshold, i.e. diff(result1, result2) < threshold, the acoustic modeling is considered complete; if the difference is at or above the threshold, i.e. diff(result1, result2) ≥ threshold, the speech recognition model formed by the current shallow networks is considered to have insufficient modeling capability for the speech, and recognition continues level by level through the higher shallow networks until two adjacent shallow networks pass the consistency check or the F-th level is reached.
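The patent does not specify the diff(result1, result2) metric; one plausible realization for comparing two levels' hypotheses is token-level edit distance, sketched here with assumed helper names:

```python
def edit_distance(a, b):
    """Levenshtein distance between two token sequences — one plausible
    diff(result1, result2); the actual metric is not specified in the patent."""
    dp = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        prev, dp[0] = dp[0], i
        for j, y in enumerate(b, 1):
            prev, dp[j] = dp[j], min(dp[j] + 1, dp[j - 1] + 1, prev + (x != y))
    return dp[-1]

def results_consistent(result1, result2, threshold):
    """Consistency check: difference strictly below the threshold."""
    return edit_distance(result1, result2) < threshold

print(results_consistent(list("hello"), list("hallo"), 2))  # True: 1 substitution
print(results_consistent(list("hello"), list("harp"), 2))   # False: distance 4
```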
Preferably, in step S3, when the speech recognition results of two adjacent shallow networks pass the consistency determination, the speech recognition results of the current two levels of the shallow networks are linearly weighted as the final output result.
Preferably, in step S1, the deep network of the Conformer model is divided as follows: for the R-layer deep network, a tap is led out every M layers and decoded with a shallow Decoder, and F shallow Decoders are arranged, forming F shallow networks.
Preferably, for the speech recognition model with the shallow network formed in step S2, R/M branches are used to perform multi-task joint training on the shallow network with progressive levels.
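The multi-task joint training over the R/M branches is not spelled out in detail; a common realization sums each shallow decoder's loss, optionally weighted, so that gradients from every branch reach the shared encoder layers. The sketch below assumes that form, with toy loss values:

```python
# Assumed joint objective for the F = R/M decoder taps: a weighted sum of
# per-branch losses. The weights and the loss values are illustrative only.

def joint_loss(branch_losses, weights=None):
    weights = weights or [1.0] * len(branch_losses)
    return sum(w * l for w, l in zip(weights, branch_losses))

# Four branches (e.g. taps after layers 3, 6, 9, 12), equal weights:
print(round(joint_loss([2.4, 1.9, 1.6, 1.5]), 2))  # 7.4
```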
Preferably, for models with different network depths, parameters are shared across the shallow networks.
The beneficial effects of the invention are as follows: the disclosed rapid speech recognition method based on hierarchical recognition routes speech of different difficulty levels and decomposes the model stage by stage, so that models at different levels handle cases of matching difficulty; the hierarchical inference scheme addresses the heavy computing resources required by large-model inference, greatly reduces overall inference complexity, saves computing resources, and lowers service latency.
Drawings
FIG. 1 is a flow diagram of a fast speech recognition process for hierarchical recognition;
FIG. 2 is a block diagram of a fast speech recognition architecture for hierarchical recognition;
FIG. 3 is a diagram of a decision criteria structure for hierarchical recognition;
FIG. 4 is a diagram of the result metric structure of hierarchical recognition.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is further described in detail below with reference to the accompanying drawings. It should be understood that the detailed description and specific examples, while indicating the invention, are intended for purposes of illustration only and are not intended to limit the scope of the invention.
A rapid speech recognition method based on hierarchical recognition divides a multi-layer Conformer speech model into multiple stages, judges the recognition difficulty of the input speech from the model's perspective, and selects the recognition level according to that difficulty; as shown in fig. 1, the method comprises the following steps:
s1, dividing deep networks of a Conformer model, dividing the deep networks with R layers into shallow networks every M layers according to the sequence from a bottom layer to a top layer, and leading out a tap from a last layer identification network in each shallow network to decode by using a shallow Decoder to form R/M Conformer models with shallow networks;
the specific implementation mode is as follows: for a deep network of an R layer, decoding is performed by using a shallow Decoder every other tap led out from the M layer, and F shallow decoders are configured, where R and M represent the number of network layers, F represents the number of provider models with a shallow network, and F = R/M. As shown in fig. 2, for the 12-layer former model, shallow decoders are connected every 3 layers, and 4 shallow decoders are sequentially arranged from the bottom layer to the top layer, thereby forming 4 former models having shallow networks.
S2, according to the order from the bottom layer to the top layer of the deep network, the formed shallow networks are divided into levels and sorted, forming a speech recognition model with R/M shallow networks; multi-task joint training is performed on the level-progressive shallow networks using R/M branches, with parameters shared among the shallow networks;
S3, the input speech is recognized level by level according to the level of the shallow network in the speech recognition model, and its difficulty is judged;
S41, as shown in FIG. 3, the difficulty of the speech is judged according to its entropy after passing through the shallow network, which decides whether the speech needs to be computed and recognized by the next-level shallow network;
the calculation formula of the entropy of the voice passing through the shallow network is as follows:
wherein E represents entropy, L represents number of speech frames, N represents total number of units required for speech recognition in input speech, and p li Representing the probability of the ith voice recognition unit in the input voice to perform voice recognition in the ith frame; the smaller the entropy value of the shallow network output is, the more certain the speech recognition result of the shallow network output is, the smaller the ambiguity of the speech recognition result is; conversely, the larger the entropy value is, the more uncertain the speech recognition result output by the shallow network is, and the greater the ambiguity of the speech recognition result is, the network with stronger modeling capability is required to recognize.
An entropy threshold is set: when the entropy output by the f-th-level shallow network is below the threshold, the difficulty of the speech is determined and the recognition result output by the f-th-level shallow network is taken as the final result; otherwise, the speech continues to be recognized level by level through the higher shallow networks until the entropy of some shallow network falls below the threshold or the F-th level is reached; here f denotes the level of the shallow network in the speech recognition model.
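The entropy gate of step S41 amounts to the following control loop; `decode_at_level` is a hypothetical callable returning a hypothesis and its entropy for a given level, not part of the patent:

```python
# Level-by-level early exit (step S41): stop at the first level whose output
# entropy falls below the threshold, or at the top level F.

def recognize_hierarchically(decode_at_level, top_level_f, entropy_threshold):
    for f in range(1, top_level_f + 1):
        hypothesis, entropy = decode_at_level(f)
        if entropy < entropy_threshold or f == top_level_f:
            return hypothesis, f

# Toy decoder whose entropy shrinks with depth.
fake_decoder = lambda f: ("hyp@%d" % f, 2.0 / f)
print(recognize_hierarchically(fake_decoder, top_level_f=4, entropy_threshold=1.5))
# ('hyp@2', 2): level 2 is the first whose entropy (1.0) is under 1.5
```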
With the speech difficulty determination of step S41, even when the entropy output by the current shallow network is below the threshold, the corresponding recognition result may still contain errors; accordingly, the following determination method may be adopted instead:
S42, as shown in FIG. 4, two adjacent shallow networks are selected in order of increasing level, and the consistency of their speech recognition results is judged; a recognition-result difference threshold is set: when the difference between the two levels' recognition results is below the threshold, i.e. diff(result1, result2) < threshold, the acoustic modeling is considered complete, the recognition results tend to converge, and the two levels' results are linearly weighted to give the final output; if the difference is at or above the threshold, i.e. diff(result1, result2) ≥ threshold, the speech recognition model formed by the current shallow networks is considered to have insufficient modeling capability for the speech, and recognition continues level by level through the higher shallow networks until two adjacent shallow networks pass the consistency check or the F-th level is reached.
Compared with the determination method of step S41, the method of step S42 requires more computation, since at least the first two levels of recognition must be run; however, recognition accuracy is better guaranteed, and the weighted fusion of the two levels' results also provides a model-averaging effect, which improves decoding accuracy at a given level.
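The linear weighting of the two consistent levels' results can be realized at the posterior level, which also yields the model-averaging effect mentioned above; the equal weight of 0.5 is an assumption here, as the patent leaves the weight open:

```python
# Linear weighting (assumed 50/50) of two adjacent levels' frame posteriors
# after they pass the consistency check.

def fuse_posteriors(p_low, p_high, w=0.5):
    return [[w * a + (1 - w) * b for a, b in zip(frame_lo, frame_hi)]
            for frame_lo, frame_hi in zip(p_low, p_high)]

low  = [[0.6, 0.4], [0.2, 0.8]]
high = [[0.8, 0.2], [0.4, 0.6]]
fused = fuse_posteriors(low, high)
print([[round(v, 3) for v in frame] for frame in fused])  # [[0.7, 0.3], [0.3, 0.7]]
```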
Examples
In practice, most speech recognition tasks involve easy cases with clean backgrounds and clear pronunciation, and for these a shallow network is sufficient to complete recognition.
Taking a Conformer model with a 12-layer deep network as an example, and assuming that 20% of the test speech requires the full deep network, even with the determination method of step S42 about 80% × 50% = 40% of the computation can be saved (the 80% of easy cases each skip roughly half of the layers), which is a considerable gain;
taking a Conformer model with a 24-layer deep network as an example, suppose 20% of the cases require the full 24-layer network while 80% need no more than 12 layers; the added complexity is then at most 100% × 20% = 20% of the 12-layer computation, far less than the 100% increase incurred by running all cases through the full 24 layers.
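The two savings figures above follow from simple expected-cost arithmetic, assuming uniform per-layer cost; this check mirrors the patent's numbers:

```python
# 12-layer case: 80% of utterances exit after roughly half the layers
# (each saving ~50% of the compute); 20% still pay the full depth.
saving_12 = 0.8 * 0.5
print("{:.0%} of the 12-layer compute saved".format(saving_12))

# 24-layer case vs a 12-layer baseline: only the 20% hard cases pay the
# extra 12 layers (+100% each), so the average overhead is 20%.
overhead_24 = 0.2 * 1.0
print("{:.0%} average extra compute vs 12 layers".format(overhead_24))
```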
By adopting the technical scheme disclosed by the invention, the following beneficial effects are obtained:
the invention discloses a rapid speech recognition method based on hierarchical recognition, which is characterized in that speech with different difficulties is shunted to disassemble models step by step, so that the models with different levels can process cases with different difficulties; the invention solves the problem of limited computing resources required by large model modeling by a hierarchical reasoning mode, greatly reduces the complexity of the whole reasoning, saves the computing resources and reduces the service delay.
The foregoing is only a preferred embodiment of the present invention, and it should be noted that, for those skilled in the art, various modifications and improvements can be made without departing from the principle of the present invention, and such modifications and improvements should also be considered within the scope of the present invention.
Claims (1)
1. A rapid speech recognition method based on hierarchical recognition, characterized by comprising the following steps:
s1, dividing deep networks of a Conformer model, dividing the deep networks with R layers into shallow networks every M layers according to the sequence from a bottom layer to a top layer, and leading out a tap from a last layer identification network in each shallow network to decode by using a shallow Decoder to form F Conformer models with shallow networks; wherein R and M represent the number of network levels, F represents the number of Conformer models with shallow networks, F = R/M;
s2, according to the sequence from the bottom layer to the top layer in the deep layer network, carrying out level division and sequencing on the formed shallow layer network to form a voice recognition model with F shallow layer networks, carrying out level-by-level recognition on input voice according to the level of the shallow layer network in the voice recognition model, and judging the difficulty level of the input voice;
s3, judging the difficulty degree of the input voice according to the entropy of the input voice passing through the shallow network, and judging whether the input voice needs to be calculated and identified by the shallow network at the next level; the smaller the entropy value of the shallow network output is, the more certain the speech recognition result of the shallow network output is, the smaller the ambiguity of the speech recognition result is; on the contrary, the larger the entropy value is, the more uncertain the speech recognition result output by the shallow network is, the larger the ambiguity of the speech recognition result is, the network with stronger modeling capability is required to be recognized;
wherein the entropy of the input speech after passing through the shallow network is calculated as:
E = -(1/L) · Σ_{l=1}^{L} Σ_{i=1}^{N} p_{li} · log p_{li}
where E denotes the entropy, L the number of speech frames, N the total number of recognition units in the input speech, and p_{li} the probability of the i-th recognition unit in the l-th frame;
the judgment basis of the speech difficulty degree in the step S3 is as follows: setting an entropy threshold, when the entropy value output by the shallow network of the f-th level is smaller than the threshold, determining the difficulty degree of the input voice, and judging that the voice recognition result of the input voice output by the shallow network of the f-th level is a final result; otherwise, the input voice continues to be subjected to voice recognition step by step through the shallow network until the entropy value of the shallow network is smaller than the threshold value or the level of the shallow network is F level; wherein f represents a level of the shallow network in the speech recognition model;
alternatively, the method comprises the steps of,
s1, dividing deep networks of a Conformer model, dividing the deep networks with R layers into shallow networks every M layers according to the sequence from a bottom layer to a top layer, leading out a tap from the last layer of identification network in each shallow network, and decoding by using a shallow Decoder to form F Conformer models with shallow networks; wherein, R and M represent the number of network layers, F represents the number of Conformer models with shallow networks, and F = R/M;
s2, according to the sequence from the bottom layer to the top layer in the deep layer network, carrying out level division and sequencing on the formed shallow layer network to form a voice recognition model with F shallow layer networks, carrying out level-by-level recognition on input voice according to the level of the shallow layer network in the voice recognition model, and judging the difficulty level of the input voice;
s3, selecting two adjacent shallow networks according to the order of the shallow networks from small to large in level, and judging the consistency of the voice recognition results output by the two shallow networks; when the voice recognition results of two adjacent shallow networks pass consistency judgment, the acoustic modeling is considered to be complete; otherwise, a network with stronger modeling capability is needed for voice recognition;
the judgment basis of the difficulty level of the voice in step S3 is as follows: setting a recognition result difference threshold, namely when the difference of the speech recognition results of the shallow networks of two levels is smaller than the difference threshold, namely diff (result 1, result 2) < threshold, the acoustic modeling is considered to be complete; if the difference between the two speech recognition results is greater than the difference threshold value, namely diff (result 1, result 2) is greater than or equal to threshold, the speech recognition model formed by the current shallow network is considered to have insufficient modeling capacity for speech, and the speech recognition is continuously carried out step by step through the shallow network upwards until the speech recognition results of the two adjacent shallow networks are judged through consistency or the level of the shallow network is the F-th level;
in step S3, when the voice recognition results of two adjacent shallow networks pass consistency judgment, the voice recognition results of the current two levels of shallow networks are used as final output results through linear weighting;
in the step S1, the deep network of the transformer model is divided in the following manner: for the deep network of the R layer, a tap is led out every M layers and decoded by using a shallow Decoder, and F shallow decoders are arranged to form F shallow networks;
aiming at the voice recognition model with the shallow network formed in the step S2, performing multi-task joint training on the shallow network with progressive levels by adopting R/M branches;
for models with different network depths, all parameters are shared in the shallow network.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210571189.4A CN114974228B (en) | 2022-05-24 | 2022-05-24 | Rapid voice recognition method based on hierarchical recognition |
Publications (2)
Publication Number | Publication Date |
---|---|
CN114974228A CN114974228A (en) | 2022-08-30 |
CN114974228B true CN114974228B (en) | 2023-04-11 |
Family
ID=82955743
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210571189.4A Active CN114974228B (en) | 2022-05-24 | 2022-05-24 | Rapid voice recognition method based on hierarchical recognition |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114974228B (en) |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2019095600A (en) * | 2017-11-22 | 2019-06-20 | 日本電信電話株式会社 | Acoustic model learning device, speech recognition device, and method and program for them |
WO2021023202A1 (en) * | 2019-08-07 | 2021-02-11 | 交叉信息核心技术研究院(西安)有限公司 | Self-distillation training method and device for convolutional neural network, and scalable dynamic prediction method |
CN113807499A (en) * | 2021-09-15 | 2021-12-17 | 清华大学 | Lightweight neural network model training method, system, device and storage medium |
Family Cites Families (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP3467556B2 (en) * | 1992-06-19 | 2003-11-17 | セイコーエプソン株式会社 | Voice recognition device |
US6795793B2 (en) * | 2002-07-19 | 2004-09-21 | Med-Ed Innovations, Inc. | Method and apparatus for evaluating data and implementing training based on the evaluation of the data |
US9305554B2 (en) * | 2013-07-17 | 2016-04-05 | Samsung Electronics Co., Ltd. | Multi-level speech recognition |
CN106844343B (en) * | 2017-01-20 | 2019-11-19 | 上海傲硕信息科技有限公司 | Instruction results screening plant |
DE102017220266B3 (en) * | 2017-11-14 | 2018-12-13 | Audi Ag | Method for checking an onboard speech recognizer of a motor vehicle and control device and motor vehicle |
WO2020014899A1 (en) * | 2018-07-18 | 2020-01-23 | 深圳魔耳智能声学科技有限公司 | Voice control method, central control device, and storage medium |
CN109192194A (en) * | 2018-08-22 | 2019-01-11 | 北京百度网讯科技有限公司 | Voice data mask method, device, computer equipment and storage medium |
CN110413997B (en) * | 2019-07-16 | 2023-04-07 | 深圳供电局有限公司 | New word discovery method, system and readable storage medium for power industry |
CN112182154B (en) * | 2020-09-25 | 2023-10-10 | 中国人民大学 | Personalized search model for eliminating keyword ambiguity by using personal word vector |
CN112434163A (en) * | 2020-11-30 | 2021-03-02 | 北京沃东天骏信息技术有限公司 | Risk identification method, model construction method, risk identification device, electronic equipment and medium |
CN112908301A (en) * | 2021-01-27 | 2021-06-04 | 科大讯飞(上海)科技有限公司 | Voice recognition method, device, storage medium and equipment |
CN113870845A (en) * | 2021-09-26 | 2021-12-31 | 平安科技(深圳)有限公司 | Speech recognition model training method, device, equipment and medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||