CN112908305A - Method and equipment for improving accuracy of voice recognition - Google Patents


Info

Publication number
CN112908305A
Authority
CN
China
Prior art keywords
sdm
decoding
asr system
original audio
historical
Legal status
Granted
Application number
CN202110132053.9A
Other languages
Chinese (zh)
Other versions
CN112908305B (en)
Inventor
范红亮
蒋莹
李轶杰
梁家恩
Current Assignee
Unisound Intelligent Technology Co Ltd
Xiamen Yunzhixin Intelligent Technology Co Ltd
Original Assignee
Unisound Intelligent Technology Co Ltd
Xiamen Yunzhixin Intelligent Technology Co Ltd
Priority date
2021-01-30
Filing date
2021-01-30
Publication date
2021-06-04
Application filed by Unisound Intelligent Technology Co Ltd and Xiamen Yunzhixin Intelligent Technology Co Ltd
Priority to CN202110132053.9A
Publication of CN112908305A
Application granted
Publication of CN112908305B
Legal status: Active

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/02 Feature extraction for speech recognition; Selection of recognition unit
    • G10L15/06 Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
    • G10L15/063 Training

Abstract

The invention relates to a method and equipment for improving speech recognition accuracy. They are applied to an ASR system that is provided with an SDM and performs speech recognition with a decoding network. The method comprises three steps. First, the SDM acquires the original audio input into the ASR system and the historical decoding information output by the decoding network. Second, the SDM processes the original audio to obtain a plurality of signal features. Third, the SDM obtains the final features of the original audio from the signal features and the historical decoding information. Adding the SDM in the decoding stage makes full use of information from multiple dimensions, such as signal features obtained directly from the audio and context information obtained from the historical decoding information. Combining this information with the original acoustic model, trained on massive data, improves the system's ability to score and recognize input speech in complex scenes and raises the recognition rate.

Description

Method and equipment for improving accuracy of voice recognition
Technical Field
The invention relates to the technical field of speech recognition, and in particular to a method and equipment for improving speech recognition accuracy.
Background
The performance of an ASR (Automatic Speech Recognition) system is strongly affected by environmental factors. Complex scenes, such as strong environmental noise or conditions that deviate far from the training data, pose a serious challenge to the recognition engine. Acoustic scoring suffers most: scores become inaccurate, and because acoustic scores are critical to the recognition result, this directly degrades the accuracy of the final recognition result.
In complex scenes, one of the most common ASR error types is the insertion error caused by background noise, such as environmental noise or background human voices. Model structure and training data are limited, so in many complex scenes speech and non-speech cannot be reliably distinguished. Background non-speech is then misrecognized as speech and produces redundant recognition results, which are insertion errors.
To cope with frequent insertion errors in complex scenes, the common current practice is to place a VAD (Voice Activity Detection) module in front of the ASR engine. The VAD first separates human voice from non-human sound, and only the human-voice segments are sent to the engine for recognition. However, this approach has obvious drawbacks:
1. VAD is not a standard component of ASR systems, and many ASR systems have no VAD module;
2. even when VAD is used to extract the speech portions, the result is not necessarily good for recognition. On the one hand, VAD does not judge speech accurately. On the other hand, ASR recognition needs context information, and even non-speech audio can be very useful for recognition;
3. VAD cannot distinguish the target speaker from background vocal interference (for example, a television playing in the background).
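The front-end VAD approach criticized above can be sketched as a hard energy gate, for contrast with the SDM described later. This minimal illustration rests on assumed parameters: 16 kHz audio, 400-sample frames with a 160-sample hop, and a hypothetical -40 dB threshold. The helper names `frame_signal` and `energy_vad_gate` are illustrative and do not come from the patent. The gate discards frames outright, so context is lost, and any loud background voice still passes.

```python
import numpy as np

def frame_signal(x: np.ndarray, frame_len: int = 400, hop: int = 160) -> np.ndarray:
    """Split a 1-D signal into overlapping frames (25 ms window, 10 ms hop at 16 kHz)."""
    if len(x) < frame_len:
        x = np.pad(x, (0, frame_len - len(x)))
    n_frames = 1 + (len(x) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return x[idx]

def energy_vad_gate(x: np.ndarray, threshold_db: float = -40.0,
                    frame_len: int = 400, hop: int = 160):
    """Hard front-end gate: keep only frames whose log-energy exceeds a fixed threshold.

    Frames judged non-speech are dropped before recognition, so the ASR engine
    never sees them (no context), and energy alone cannot tell a target
    speaker from a loud background voice.
    """
    frames = frame_signal(np.asarray(x, dtype=np.float64), frame_len, hop)
    energy_db = 10.0 * np.log10(np.mean(frames ** 2, axis=1) + 1e-12)
    keep = energy_db > threshold_db
    return frames[keep], keep
```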
Thus, there is a need for a better solution to the problems of the prior art.
Disclosure of Invention
The invention provides a method and equipment for improving speech recognition accuracy, which can solve the technical problem of low recognition rate in the prior art.
The technical scheme for solving the technical problems is as follows:
An embodiment of the invention provides a method for improving speech recognition accuracy. It is applied to an ASR system that is provided with an SDM and performs speech recognition, the ASR system being provided with a decoding network for decoding. The method comprises the following steps:
acquiring, through the SDM, the original audio input into the ASR system and the historical decoding information output by the decoding network;
processing the original audio through the SDM to obtain a plurality of signal features of the original audio;
and obtaining, through the SDM, the final features of the original audio based on the plurality of signal features and the historical decoding information.
In a specific embodiment, the method further comprises the following step:
outputting the final features to the decoding network through the SDM, so that the decoding network decodes them to obtain the recognized text.
In a specific embodiment, the signal characteristics include: signal-to-noise ratio, energy, zero crossing rate.
In a specific embodiment, the historical decoding information includes context information.
In a specific embodiment, the ASR system further includes an acoustic model;
the acoustic score of the decoding network comprises the score of the acoustic model and the score of the SDM, each with its own weight.
In a specific embodiment, the score of the SDM comprises a first score derived from the signal features of the original audio and a second score of the original audio derived from the historical decoding information; the first score and the second score each have their own weight.
An embodiment of the invention also provides equipment for improving speech recognition accuracy. It is applied to an ASR system that is provided with an SDM and performs speech recognition, the ASR system being provided with a decoding network for decoding. The equipment comprises:
an acquisition module, configured to acquire, through the SDM, the original audio input into the ASR system and the historical decoding information output by the decoding network;
a first processing module, configured to process the original audio through the SDM to obtain a plurality of signal features of the original audio;
and a second processing module, configured to obtain, through the SDM, the final features of the original audio based on the plurality of signal features and the historical decoding information.
In a specific embodiment, the equipment further comprises:
a recognition module, configured to output the final features to the decoding network through the SDM, so that the decoding network decodes them to obtain the recognized text.
In a specific embodiment, the signal characteristics include: signal-to-noise ratio, energy, zero crossing rate.
In a specific embodiment, the historical decoding information includes context information.
The beneficial effects of the invention are as follows:
The SDM is added in the decoding stage of the ASR system. It makes full use of information from multiple dimensions, such as signal features obtained directly from the audio and context information obtained from the historical decoding information. This information is combined with the original acoustic model, which is trained on massive data. As a result, the system scores and recognizes input speech better in complex scenes, and acoustic scoring becomes more accurate. This raises both the recognition rate and the overall performance of the ASR system.
Drawings
FIG. 1 is a block diagram of a prior-art ASR system;
FIG. 2 is a block diagram of the ASR system to which the method for improving speech recognition accuracy according to an embodiment of the present invention is applied;
FIG. 3 is a flowchart illustrating a method for improving speech recognition accuracy according to an embodiment of the present invention;
FIG. 4 is a schematic structural diagram of equipment for improving speech recognition accuracy according to an embodiment of the present invention;
FIG. 5 is a schematic structural diagram of another embodiment of the equipment for improving speech recognition accuracy according to the present invention, further including a recognition module.
Detailed Description
The principles and features of the invention are described below with reference to the accompanying drawings. The examples serve only to explain the invention and are not intended to limit its scope.
The method for improving speech recognition accuracy provided by the embodiment of the invention is applied to an ASR system for speech recognition. The system is provided with an SDM (Speech Detection Module) and with a decoding network for decoding.
FIG. 1 shows a prior-art ASR system, which has a training stage and a decoding stage. In the training stage, an Acoustic Model (AM) is trained on a speech database using techniques such as deep neural networks. A Language Model (LM) is trained on a text database using techniques such as n-grams and deep neural networks. In the decoding stage, the acoustic model, the language model and the pronunciation dictionary from the training stage form a decoding network. Features are extracted from the input audio, and a decoding algorithm then finds the optimal path through the decoding network to give the final recognition result.
In complex scenes, the acoustic model produces many insertion errors. The main cause is that, when computing acoustic scores, it cannot reliably tell human voice from non-human sound and misjudges noise as human voice. Because the acoustic score used for decoding comes directly from the acoustic model, the speech/non-speech decision depends entirely on that model's performance.
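The conventional decoding step finds the optimal path through the decoding network from per-frame acoustic scores and transition scores. A toy Viterbi search can illustrate it. The sketch is a simplified stand-in: a small dense state graph replaces a real decoding network built from the AM, LM and pronunciation dictionary, which production decoders typically realize as WFSTs with beam pruning. `viterbi` and its argument names are hypothetical.

```python
import numpy as np

def viterbi(log_emit: np.ndarray, log_trans: np.ndarray, log_init: np.ndarray):
    """Best state path through a toy decoding graph.

    log_emit:  (T, S) per-frame acoustic log-scores for each state
    log_trans: (S, S) transition log-probabilities, log_trans[i, j] = log P(j | i)
    log_init:  (S,)   initial-state log-probabilities
    Returns (best_path, best_log_score).
    """
    T, S = log_emit.shape
    score = log_init + log_emit[0]
    backptr = np.zeros((T, S), dtype=int)
    for t in range(1, T):
        cand = score[:, None] + log_trans          # cand[i, j]: arrive in state j from state i
        backptr[t] = np.argmax(cand, axis=0)       # best predecessor of each state j
        score = cand[backptr[t], np.arange(S)] + log_emit[t]
    path = [int(np.argmax(score))]
    for t in range(T - 1, 0, -1):                  # trace back from the last frame
        path.append(int(backptr[t, path[-1]]))
    return path[::-1], float(np.max(score))
```

With uniform transitions the best path simply follows the per-frame best acoustic score. Any noise frame that the AM scores as speech is therefore carried straight into the result, which is the dependence on the acoustic model described above.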
FIG. 2 shows the framework of the ASR system in this solution. An SDM (Speech Detection Module) is added to the speech recognition engine. It dynamically detects the occurrence of speech, helps the engine distinguish human voice from non-human sound, and compensates for the acoustic model's shortcomings in scoring. Together these improve the recognition accuracy of the ASR system.
As shown in fig. 3, the method comprises the steps of:
Step 101: acquire, through the SDM, the original audio input into the ASR system and the historical decoding information output by the decoding network. Specifically, the historical decoding information includes context information.
Step 102: process the original audio through the SDM to obtain a plurality of signal features of the original audio. Specifically, the signal features include signal-to-noise ratio, energy, zero-crossing rate, and so on.
Step 103: obtain, through the SDM, the final features of the original audio based on the plurality of signal features and the historical decoding information.
Specifically, the SDM added to the decoding stage of the ASR system has two inputs: the input audio and the historical decoding information. It has one output: the module's features for the currently input speech.
For an input audio clip A, the SDM directly computes a set of features Feat_A. These features characterize the input audio along multiple dimensions, such as signal-to-noise ratio, energy and zero-crossing rate.
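Computing a Feat_A-style feature vector per frame might look as follows. The patent names signal-to-noise ratio, energy and zero-crossing rate but does not define how they are computed, so the sketch makes three assumptions. Energy is frame log-energy in dB. The zero-crossing rate is the fraction of adjacent sample pairs that change sign. SNR is measured against a noise floor taken as a low percentile (hypothetically the 10th) of the frame energies. `feat_a` and `frame_signal` are illustrative names.

```python
import numpy as np

def frame_signal(x: np.ndarray, frame_len: int = 400, hop: int = 160) -> np.ndarray:
    """Split a 1-D signal into overlapping frames (25 ms window, 10 ms hop at 16 kHz)."""
    if len(x) < frame_len:
        x = np.pad(x, (0, frame_len - len(x)))
    n_frames = 1 + (len(x) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return x[idx]

def feat_a(x: np.ndarray, frame_len: int = 400, hop: int = 160,
           noise_percentile: float = 10.0) -> np.ndarray:
    """Per-frame [snr_db, energy_db, zcr] features (hypothetical Feat_A layout)."""
    frames = frame_signal(np.asarray(x, dtype=np.float64), frame_len, hop)
    # Frame log-energy in dB; the epsilon avoids log10(0) on digital silence.
    energy_db = 10.0 * np.log10(np.mean(frames ** 2, axis=1) + 1e-12)
    # Zero-crossing rate: fraction of adjacent sample pairs whose sign differs.
    signs = np.signbit(frames)
    zcr = np.mean(signs[:, 1:] != signs[:, :-1], axis=1)
    # Crude SNR: frame energy relative to an estimated noise floor.
    noise_floor_db = np.percentile(energy_db, noise_percentile)
    snr_db = energy_db - noise_floor_db
    return np.stack([snr_db, energy_db, zcr], axis=1)
```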
The history information already produced by the decoding network is also used as an input. Together with Feat_A, it forms the speech detection module's judgment of the current input audio, and the module outputs Feat_B as its current output feature. Information obtained from the decoding network is settled context information and therefore has high reference value. From the human-voice and non-human-voice segments already known from the history, the human-voice and non-human-voice characteristics of the current scene can be extracted.
Therefore, the SDM output feature Feat_B describes the current input audio from more dimensions than the acoustic model's score alone. Using the original-audio features and the context features from decoding together lets each compensate for the other's weaknesses. The result is a more accurate human-voice/non-human-voice judgment in complex scenes, better scoring and recognition by the engine, fewer insertion errors, and a higher recognition rate.
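The patent does not specify how the decoding history is turned into features, or how Feat_A and the history are combined into Feat_B. The sketch below shows one hypothetical reading. Already-decoded frames are labeled speech or non-speech; as an assumption, the label depends on whether a frame aligned to a phone or to a silence/noise unit. Running means of their Feat_A vectors serve as speech and non-speech prototypes. Feat_B is Feat_A concatenated with the distances to both prototypes. `HistoryPrototypes` and `feat_b` are illustrative names.

```python
import numpy as np

class HistoryPrototypes:
    """Running means of Feat_A for frames the decoder has already labeled.

    Hypothetical stand-in for Feat_Speech (label True) and Feat_NonSpeech (label False).
    """

    def __init__(self, dim: int):
        self.sums = {True: np.zeros(dim), False: np.zeros(dim)}
        self.counts = {True: 0, False: 0}

    def update(self, feats: np.ndarray, is_speech: np.ndarray) -> None:
        """Accumulate decoded frames: feats is (N, dim), is_speech a length-N bool array."""
        for label in (True, False):
            selected = feats[is_speech == label]
            self.sums[label] += selected.sum(axis=0)
            self.counts[label] += len(selected)

    def prototype(self, label: bool):
        """Mean feature vector for the label, or None if no such frame was decoded yet."""
        if self.counts[label] == 0:
            return None
        return self.sums[label] / self.counts[label]


def feat_b(frame_feat_a: np.ndarray, protos: HistoryPrototypes) -> np.ndarray:
    """Feat_A plus Euclidean distances to the speech and non-speech prototypes (NaN if unknown)."""
    distances = []
    for label in (True, False):
        proto = protos.prototype(label)
        distances.append(np.nan if proto is None else float(np.linalg.norm(frame_feat_a - proto)))
    return np.concatenate([frame_feat_a, np.array(distances)])
```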
Further, after step 103, as shown in FIG. 2, the method further includes:
outputting the final features to the decoding network through the SDM, so that the decoding network decodes them to obtain the recognized text.
Further, the ASR system also includes an acoustic model;
the acoustic score of the decoding network comprises the score of the acoustic model and the score of the SDM, each with its own weight.
Specifically, the score of the SDM comprises a first score derived from the signal features of the original audio and a second score of the original audio derived from the historical decoding information; the first score and the second score each have their own weight.
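Specifically, the acoustic score of the decoding network can be expressed by the following formula:
$$s_{\text{AM}}' = w_{\text{AM}}\, s_{\text{AM}} + w_{\text{SDM}}\left(w_{\text{SDM\_Audio}}\, s_{\text{SDM\_Audio}} + w_{\text{SDM\_History\_Dec}}\, s_{\text{SDM\_History\_Dec}}\right)$$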
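The formula is a nested weighted sum and can be transcribed directly. The sketch assumes all scores are on the same log scale and that the weights are tuned elsewhere; the values in the usage note are hypothetical. Since it uses only arithmetic, the same function also works element-wise on NumPy arrays of per-state scores.

```python
def fused_acoustic_score(s_am, s_sdm_audio, s_sdm_history_dec, *,
                         w_am, w_sdm, w_sdm_audio, w_sdm_history_dec):
    """s_AM' = w_AM * s_AM + w_SDM * (w_SDM_Audio * s_SDM_Audio + w_SDM_History_Dec * s_SDM_History_Dec)."""
    # Inner bracket: the SDM's own score, mixing the audio-based and history-based parts.
    s_sdm = w_sdm_audio * s_sdm_audio + w_sdm_history_dec * s_sdm_history_dec
    # Outer sum: acoustic-model score plus the weighted SDM score.
    return w_am * s_am + w_sdm * s_sdm
```

For example, `fused_acoustic_score(-2.0, -1.0, -3.0, w_am=0.8, w_sdm=0.2, w_sdm_audio=0.5, w_sdm_history_dec=0.5)` gives -1.6 + 0.2 × (-2.0) = -2.0. Setting `w_sdm=0` recovers the plain acoustic-model score scaled by `w_am`.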
Here, the acoustic score $s_{\text{AM}}'$ used for decoding after the SDM is added consists of two parts. One part is the score $s_{\text{AM}}$ taken directly from the acoustic model, with weight $w_{\text{AM}}$ in the final score. The other part is the score of the SDM, with weight $w_{\text{SDM}}$ in the final score. The SDM score is in turn composed of the following two components:
One component is the score $s_{\text{SDM\_Audio}}$ derived directly from the original audio. It is obtained from signal features such as the audio's signal-to-noise ratio, energy and zero-crossing rate, and its weight is $w_{\text{SDM\_Audio}}$.
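The patent does not say how s_SDM_Audio is computed from the signal features. One simple possibility, shown here as an assumption, is a logistic model over [SNR, energy, ZCR] that returns the log-probability of speech. The coefficients `DEFAULT_W` and `DEFAULT_B` are hand-set, hypothetical values for illustration only; in practice they would have to be trained.

```python
import numpy as np

# Hypothetical, hand-set coefficients for [snr_db, energy_db, zcr]; NOT trained values.
DEFAULT_W = np.array([0.3, 0.05, -4.0])
DEFAULT_B = -2.0

def s_sdm_audio(feat_a: np.ndarray, w: np.ndarray = DEFAULT_W, b: float = DEFAULT_B):
    """Log-probability that a frame (or each row of a (T, 3) array) is speech.

    Higher SNR and energy push toward speech; a high zero-crossing rate
    (noise-like) pushes away from it.
    """
    z = np.asarray(feat_a, dtype=np.float64) @ w + b
    return -np.logaddexp(0.0, -z)  # numerically stable log(sigmoid(z))
```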
The other component is the score $s_{\text{SDM\_History\_Dec}}$ of the current audio, which depends on the historical decoding information. Decoded history is usually more reliable and is strongly indicative of the characteristics of the current speech, because audio is sequential and stable over short time spans. Two features of the currently recognized scene can be derived from the historical decoding information: a speech feature, Feat_Speech, and a non-speech feature, Feat_NonSpeech. Whichever of the two the current audio more closely resembles indicates whether it is currently speech or non-speech. The weight of this component is $w_{\text{SDM\_History\_Dec}}$.
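The rule "whichever of Feat_Speech and Feat_NonSpeech the current audio more closely resembles" can be sketched as a two-class softmax over negative squared distances. The sketch returns the log-probability that the current frame is speech. The distance measure, the `temperature` parameter and the function name are assumptions. In practice the feature dimensions would need to be normalized first, because SNR in dB and zero-crossing rate live on very different scales.

```python
import numpy as np

def s_sdm_history_dec(feat_a: np.ndarray, feat_speech: np.ndarray,
                      feat_nonspeech: np.ndarray, temperature: float = 1.0):
    """log P(speech) for the current frame, judged by proximity to history prototypes."""
    d_sp = np.sum((feat_a - feat_speech) ** 2, axis=-1)     # squared distance to Feat_Speech
    d_ns = np.sum((feat_a - feat_nonspeech) ** 2, axis=-1)  # squared distance to Feat_NonSpeech
    # softmax over (-d_sp, -d_ns): log P(speech) = -log(1 + exp((d_sp - d_ns) / T))
    return -np.logaddexp(0.0, (d_sp - d_ns) / temperature)
```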
In this scheme, a speech detection module is added in the decoding stage of the ASR system. It makes full use of information from all dimensions, including signal features obtained directly from the audio and context information obtained from the historical decoding information. This information is combined with the acoustic model trained on massive data, improving the system's ability to score and recognize input speech in complex scenes and raising the recognition rate. More accurate acoustic scoring in turn improves the overall performance of the ASR engine. With these multidimensional features, the scheme scores input audio in complex scenes more comprehensively and reasonably. It also reduces insertion errors caused by complex environments and improves the system's recognition accuracy.
Embodiment 2
Embodiment 2 of the present invention further discloses equipment for improving speech recognition accuracy, shown in FIG. 4. It is applied to an ASR system that is provided with an SDM and performs speech recognition, the ASR system being provided with a decoding network for decoding. The equipment comprises:
an obtaining module 201, configured to obtain, by the SDM, an original audio input to the ASR system and historical decoding information output by the decoding network;
a first processing module 202, configured to process the original audio through the SDM to obtain a plurality of signal features of the original audio;
a second processing module 203, configured to perform processing by the SDM based on the plurality of signal features and the historical decoding information to obtain a final feature of the original audio.
In a specific embodiment, as shown in FIG. 5, the equipment further includes:
a recognition module 204, configured to output the final features to the decoding network through the SDM, so that the decoding network decodes them to obtain the recognized text.
In a specific embodiment, the signal features include signal-to-noise ratio, energy, zero-crossing rate, and so on.
In a specific embodiment, the historical decoding information includes context information.
In a specific embodiment, the ASR system further includes an acoustic model;
the acoustic score of the decoding network comprises the score of the acoustic model and the score of the SDM, each with its own weight.
In a specific embodiment, the score of the SDM comprises a first score derived from the signal features of the original audio and a second score of the original audio derived from the historical decoding information; the first score and the second score each have their own weight.
While the invention has been described with reference to specific embodiments, it will be understood by those skilled in the art that various changes in form and details may be made therein without departing from the spirit and scope of the invention as defined by the appended claims. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (10)

1. A method for improving speech recognition accuracy, applied to an ASR system that is provided with an SDM and performs speech recognition, the ASR system being provided with a decoding network for decoding; the method comprises the following steps:
acquiring, through the SDM, original audio input into the ASR system and historical decoding information output by the decoding network;
processing the original audio through the SDM to obtain a plurality of signal characteristics of the original audio;
and obtaining, through the SDM, final features of the original audio based on the plurality of signal features and the historical decoding information.
2. The method of claim 1, further comprising:
outputting the final features to the decoding network through the SDM, so that the decoding network decodes them to obtain recognized text.
3. The method of claim 1, wherein the signal features comprise: signal-to-noise ratio, energy, zero crossing rate.
4. The method of claim 1, wherein the historical decoding information comprises context information.
5. The method according to claim 1, wherein the ASR system further comprises an acoustic model;
the acoustic score of the decoding network comprises a score of the acoustic model and a score of the SDM, each of which corresponds to a respective weight.
6. The method of claim 5, wherein the score of the SDM comprises: a first score derived from the signal features of the original audio, and a second score of the original audio derived from the historical decoding information; the first score and the second score each correspond to a respective weight.
7. An apparatus for improving speech recognition accuracy, applied to an ASR system that is provided with an SDM and performs speech recognition, the ASR system being provided with a decoding network for decoding; the apparatus comprises:
an acquisition module, configured to acquire, through the SDM, original audio input into the ASR system and historical decoding information output by the decoding network;
a first processing module, configured to process the original audio through the SDM to obtain a plurality of signal characteristics of the original audio;
and a second processing module, configured to obtain, through the SDM, final features of the original audio based on the plurality of signal characteristics and the historical decoding information.
8. The apparatus of claim 7, further comprising:
a recognition module, configured to output the final features to the decoding network through the SDM, so that the decoding network decodes them to obtain recognized text.
9. The apparatus of claim 7, wherein the signal features comprise: signal-to-noise ratio, energy, zero crossing rate.
10. The apparatus of claim 7, wherein the historical decoding information comprises context information.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110132053.9A CN112908305B (en) 2021-01-30 2021-01-30 Method and equipment for improving accuracy of voice recognition

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110132053.9A CN112908305B (en) 2021-01-30 2021-01-30 Method and equipment for improving accuracy of voice recognition

Publications (2)

Publication Number Publication Date
CN112908305A 2021-06-04
CN112908305B 2023-03-21

Family

ID=76122000

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110132053.9A Active CN112908305B (en) 2021-01-30 2021-01-30 Method and equipment for improving accuracy of voice recognition

Country Status (1)

Country Link
CN (1) CN112908305B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113889076A (en) * 2021-09-13 2022-01-04 北京百度网讯科技有限公司 Speech recognition and coding/decoding method, device, electronic equipment and storage medium

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20080249771A1 (en) * 2007-04-05 2008-10-09 Wahab Sami R System and method of voice activity detection in noisy environments
CN103035243A (en) * 2012-12-18 2013-04-10 中国科学院自动化研究所 Real-time feedback method and system of long voice continuous recognition and recognition result
CN106803422A (en) * 2015-11-26 2017-06-06 中国科学院声学研究所 Language model re-scoring method based on a long short-term memory network
CN108074584A (en) * 2016-11-18 2018-05-25 南京大学 Audio signal classification method based on statistics of multiple signal features
CN108510990A (en) * 2018-07-04 2018-09-07 百度在线网络技术(北京)有限公司 Audio recognition method, device, user equipment and storage medium
CN109637526A (en) * 2019-01-08 2019-04-16 西安电子科技大学 Adaptation method for DNN acoustic models based on personal identity features
CN110399460A (en) * 2019-07-19 2019-11-01 腾讯科技(深圳)有限公司 Dialogue processing method, apparatus, device and storage medium
US10468019B1 (en) * 2017-10-27 2019-11-05 Kadho, Inc. System and method for automatic speech recognition using selection of speech models based on input characteristics
CN111754991A (en) * 2020-06-28 2020-10-09 汪秀英 Method and system for realizing distributed intelligent interaction by adopting natural language

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
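张明亮 (Zhang Mingliang) et al.: "Speech enhancement algorithm based on a fully convolutional neural network" (基于全卷积神经网络的语音增强算法), Application Research of Computers (《计算机应用研究》) *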

Also Published As

Publication number Publication date
CN112908305B (en) 2023-03-21

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant