CN113299280A - Professional vocabulary speech recognition method based on Kaldi - Google Patents
- Publication number
- CN113299280A (application number CN202110515758.9A)
- Authority
- CN
- China
- Prior art keywords
- file
- speech recognition
- professional
- kaldi
- hclg
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/06—Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/06—Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
- G10L15/063—Training
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G10L15/18—Speech classification or search using natural language modelling
Landscapes
- Engineering & Computer Science (AREA)
- Artificial Intelligence (AREA)
- Computational Linguistics (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Physics & Mathematics (AREA)
- Acoustics & Sound (AREA)
- Multimedia (AREA)
- Machine Translation (AREA)
Abstract
The invention provides a Kaldi-based speech recognition method for professional vocabulary, which addresses the low recognition accuracy of professional vocabulary in a specific field. The method comprises the following steps: S1, training a speech recognition model for the professional field through transfer learning; S2, collecting professional vocabulary and updating the lexicon finite-state transducer (L file); S3, updating the language-model finite-state transducer (G file); S4, constructing the decoding-graph HCLG file; and S5, decoding and recognizing with the HCLG file.
Description
Technical Field
The invention relates to a speech recognition method, in particular to a Kaldi-based speech recognition method for professional vocabulary, and belongs to the technical field of artificial intelligence.
Background
Driven by digitization and intelligent automation and by the general improvement in recognition accuracy, speech recognition applications have grown explosively in recent years and play an important role in smart homes, in-vehicle systems, and other areas. Kaldi is currently the most popular speech recognition toolkit; it integrates many recent advances and well-tuned scripts, greatly lowers the barrier to applying speech recognition technology, and has promoted the deployment of speech recognition in industry. Although speech recognition has advanced in recent years, its application in specific industrial fields is still limited by problems such as low recognition accuracy for professional vocabulary. One way to address this is to collect a large speech corpus containing professional vocabulary and then train a speech recognition model for the specific field. In practice, professional-field speech corpora are difficult to collect, so the amount of collected data is usually small. A general speech recognition model can therefore be trained on a large amount of general corpus data and then adapted on the professional-field corpus through transfer learning, yielding a speech recognition model suited to the professional field. However, a professional speech corpus can hardly cover all professional vocabulary, especially in application scenarios involving personal names and place names.
Disclosure of Invention
The invention aims to provide a professional vocabulary speech recognition method based on Kaldi, which can effectively solve the problem of low speech recognition accuracy of professional vocabularies in a specific field.
To achieve this purpose, the invention adopts the following technical scheme:
A Kaldi-based speech recognition method for professional vocabulary comprises the following steps:
S1, training a speech recognition model for the professional field through transfer learning;
S2, collecting professional vocabulary and updating the lexicon finite-state transducer (L file);
S3, updating the language-model finite-state transducer (G file);
S4, constructing the decoding-graph HCLG file;
S5, decoding and recognizing with the HCLG file.
In a preferred scheme of the method, the specific process of S1 is as follows: speech recognition training corpus data of the specific field is collected, and, starting from a general speech recognition model, a speech recognition model for the professional field is trained using Kaldi's transfer learning approach.
In a preferred scheme of the method, the specific process of constructing the decoding-graph HCLG file is as follows: the language-model finite-state transducer (G file) is updated with a Kaldi tool based on the professional vocabulary, and the L file and the G file are combined to obtain an LG file; a CLG file is then dynamically generated with a Kaldi tool from the updated LG file, from which the decoding-graph HCLG file is constructed.
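To make the step of combining the L file and the G file more concrete, the following is a toy sketch of finite-state transducer composition in plain Python. It is not Kaldi's implementation: Kaldi relies on weighted OpenFst operations with disambiguation symbols, determinization, and minimization, all of which are omitted here. The two-word lexicon, the phone symbols, and the names `compose`, `transduce`, `L_ARCS`, `G_OLD`, and `G_NEW` are illustrative assumptions, and the sketch assumes G has no epsilon input labels.

```python
from collections import deque

EPS = "<eps>"  # epsilon label: consumes or emits nothing


def compose(l_arcs, l_start, l_finals, g_arcs, g_start, g_finals):
    """Compose lexicon L (phones -> words) with grammar G (word acceptor).

    Arcs are stored as {state: [(in_label, out_label, next_state), ...]}.
    Simplifying assumption: G has no epsilon input labels, so an L arc that
    outputs epsilon advances L alone and any other L arc must match a G arc.
    """
    start = (l_start, g_start)
    arcs, finals = {}, set()
    seen, queue = {start}, deque([start])
    while queue:
        state = queue.popleft()
        l_state, g_state = state
        out_arcs = arcs.setdefault(state, [])
        if l_state in l_finals and g_state in g_finals:
            finals.add(state)
        for l_in, l_out, l_next in l_arcs.get(l_state, []):
            if l_out == EPS:
                targets = [(EPS, (l_next, g_state))]
            else:
                targets = [(g_out, (l_next, g_next))
                           for g_in, g_out, g_next in g_arcs.get(g_state, [])
                           if g_in == l_out]
            for out_label, nxt in targets:
                out_arcs.append((l_in, out_label, nxt))
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
    return arcs, start, finals


def transduce(arcs, start, finals, phones):
    """Return every word sequence the transducer assigns to a phone sequence."""
    results, stack = set(), [(start, 0, ())]
    while stack:
        state, pos, words = stack.pop()
        if pos == len(phones):
            if state in finals:
                results.add(words)
            continue
        for in_label, out_label, nxt in arcs.get(state, []):
            if in_label == phones[pos]:
                new_words = words if out_label == EPS else words + (out_label,)
                stack.append((nxt, pos + 1, new_words))
    return results


# Toy lexicon: "scan" = s k ae n, "stent" = s t eh n t; state 0 is start and final.
L_ARCS = {
    0: [("s", "scan", 1), ("s", "stent", 4)],
    1: [("k", EPS, 2)], 2: [("ae", EPS, 3)], 3: [("n", EPS, 0)],
    4: [("t", EPS, 5)], 5: [("eh", EPS, 6)], 6: [("n", EPS, 7)], 7: [("t", EPS, 0)],
}
G_OLD = {0: [("scan", "scan", 0)]}                         # "stent" missing from G
G_NEW = {0: [("scan", "scan", 0), ("stent", "stent", 0)]}  # G after the update
PHONES = "s k ae n s t eh n t".split()

lg_old = compose(L_ARCS, 0, {0}, G_OLD, 0, {0})
lg_new = compose(L_ARCS, 0, {0}, G_NEW, 0, {0})
print(transduce(*lg_old, PHONES))  # empty set: no path contains "stent"
print(transduce(*lg_new, PHONES))  # {('scan', 'stent')}
```

In this toy setting, a word that is present in L but absent from G never survives composition, which suggests why the method updates both files before rebuilding the graph.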
The invention has the following advantage:
it improves the speech recognition accuracy of professional vocabulary in a specific field.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description serve to explain the principles of the invention and not to limit the invention.
FIG. 1 is a schematic flow chart of an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
A Kaldi-based speech recognition method for professional vocabulary comprises the following steps:
S1, training a speech recognition model for the professional field through transfer learning: speech recognition training corpus data of the specific field is collected, and, starting from a general speech recognition model, a speech recognition model for the professional field is trained using Kaldi's transfer learning approach; this model can recognize most of the vocabulary of the professional field but cannot cover all of it;
S2, collecting professional vocabulary and updating the lexicon finite-state transducer (L file): professional vocabulary is collected according to the specific speech recognition requirements, a professional vocabulary dictionary is compiled in the format Kaldi requires, and the lexicon finite-state transducer (L file) is updated with a Kaldi tool;
S3, updating the language-model finite-state transducer (G file): the G file is updated with a Kaldi tool based on the professional vocabulary, and the L file and the G file are combined to obtain an LG file;
S4, constructing the decoding-graph HCLG file: a CLG file is dynamically generated with a Kaldi tool from the updated LG file, and the decoding-graph HCLG file is then generated from it;
S5, decoding and recognizing with the new HCLG file: decoding and recognition are performed using the new HCLG file together with the speech recognition model obtained through transfer learning.
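Steps S2 to S4 above can be sketched as follows. This is a minimal illustration under stated assumptions, not the patent's own tooling: `merge_lexicon` and `graph_build_commands` are hypothetical helper names, all directory and file paths are placeholders, and an ARPA-format language model that already contains the professional vocabulary is assumed to exist. The sketch only assembles command lines for the standard Kaldi scripts `utils/prepare_lang.sh`, `arpa2fst`, and `utils/mkgraph.sh`; it does not run them. The check against the phone set is also an assumption: it reflects that new words have to be spelled with phones the transfer-learned acoustic model already knows.

```python
def merge_lexicon(base_lines, new_entries, phone_set):
    """Merge professional terms into Kaldi-style lexicon lines.

    Each lexicon line has the form "word phone1 phone2 ...".
    new_entries is a list of (word, [phones]) pairs.
    """
    entries = set()
    for line in base_lines:
        parts = line.split()
        if len(parts) >= 2:
            entries.add((parts[0], tuple(parts[1:])))
    for word, phones in new_entries:
        unknown = [p for p in phones if p not in phone_set]
        if unknown:
            # Assumption: the acoustic model's phone set is fixed after S1.
            raise ValueError(f"{word}: phones not in the phone set: {unknown}")
        entries.add((word, tuple(phones)))
    return [" ".join((word, *phones)) for word, phones in sorted(entries)]


def graph_build_commands(dict_dir, lang_dir, model_dir, arpa_lm, oov="<UNK>"):
    """Return Kaldi command lines (argument lists) for S2 to S4; nothing is executed."""
    return [
        # S2: rebuild the lang directory (including L.fst) from the updated dictionary.
        ["utils/prepare_lang.sh", dict_dir, oov, f"{lang_dir}_tmp", lang_dir],
        # S3: compile the updated ARPA language model into G.fst.
        ["arpa2fst", "--disambig-symbol=#0",
         f"--read-symbol-table={lang_dir}/words.txt", arpa_lm, f"{lang_dir}/G.fst"],
        # S4: compose LG, then CLG, then HCLG into the graph directory.
        ["utils/mkgraph.sh", lang_dir, model_dir, f"{model_dir}/graph"],
    ]


# Hypothetical example data.
base = ["<UNK> SPN", "scan s k ae n"]
new_terms = [("stent", ["s", "t", "eh", "n", "t"])]
phones = {"SPN", "s", "k", "ae", "n", "t", "eh"}

merged = merge_lexicon(base, new_terms, phones)
commands = graph_build_commands("data/local/dict", "data/lang", "exp/domain_model",
                                "data/local/lm/domain.arpa")
for line in merged:
    print(line)
for cmd in commands:
    print(" ".join(cmd))
```

Returning argument lists rather than running the scripts keeps the sketch testable without a Kaldi installation; in a real setup the lists could be passed to `subprocess.run` from the Kaldi recipe directory.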
Finally, it should be noted that: although the present invention has been described in detail with reference to the foregoing embodiments, it will be apparent to those skilled in the art that changes may be made in the embodiments and/or equivalents thereof without departing from the spirit and scope of the invention. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.
Claims (3)
1. A Kaldi-based speech recognition method for professional vocabulary, characterized in that the method comprises the following steps:
S1, training a speech recognition model for the professional field through transfer learning;
S2, collecting professional vocabulary and updating the lexicon finite-state transducer (L file);
S3, updating the language-model finite-state transducer (G file);
S4, constructing the decoding-graph HCLG file;
S5, decoding and recognizing with the HCLG file.
2. The Kaldi-based professional vocabulary speech recognition method of claim 1, wherein the specific process of S1 is as follows: speech recognition training corpus data of the specific field is collected, and, starting from a general speech recognition model, a speech recognition model for the professional field is trained using Kaldi's transfer learning approach.
3. The Kaldi-based professional vocabulary speech recognition method according to claim 1 or 2, wherein the specific process of constructing the decoding-graph HCLG file is as follows: the language-model finite-state transducer (G file) is updated with a Kaldi tool based on the professional vocabulary, and the L file and the G file are combined to obtain an LG file; a CLG file is then dynamically generated with a Kaldi tool from the updated LG file, from which the decoding-graph HCLG file is constructed.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110515758.9A CN113299280A (en) | 2021-05-12 | 2021-05-12 | Professional vocabulary speech recognition method based on Kaldi |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110515758.9A CN113299280A (en) | 2021-05-12 | 2021-05-12 | Professional vocabulary speech recognition method based on Kaldi |
Publications (1)
Publication Number | Publication Date |
---|---|
CN113299280A (en) | 2021-08-24 |
Family
ID=77321633
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110515758.9A Pending CN113299280A (en) | 2021-05-12 | 2021-05-12 | Professional vocabulary speech recognition method based on Kaldi |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113299280A (en) |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US9966066B1 (en) * | 2016-02-03 | 2018-05-08 | Nvoq Incorporated | System and methods for combining finite state transducer based speech recognizers |
CN110610700A (en) * | 2019-10-16 | 2019-12-24 | 科大讯飞股份有限公司 | Decoding network construction method, voice recognition method, device, equipment and storage medium |
CN110797026A (en) * | 2019-09-17 | 2020-02-14 | 腾讯科技(深圳)有限公司 | Voice recognition method, device and storage medium |
CN111696525A (en) * | 2020-05-08 | 2020-09-22 | 天津大学 | Kaldi-based Chinese speech recognition acoustic model construction method |
CN112133292A (en) * | 2019-06-25 | 2020-12-25 | 南京航空航天大学 | End-to-end automatic voice recognition method for civil aviation land-air communication field |
CN112133290A (en) * | 2019-06-25 | 2020-12-25 | 南京航空航天大学 | Speech recognition method based on transfer learning and aiming at civil aviation air-land communication field |
US20210097428A1 (en) * | 2019-09-30 | 2021-04-01 | International Business Machines Corporation | Scalable and dynamic transfer learning mechanism |
-
2021
- 2021-05-12 CN CN202110515758.9A patent/CN113299280A/en active Pending
Non-Patent Citations (1)
Title |
---|
徐萍等: "基于迁移学习的个性化循环神经网络语言模型", 《南京理工大学学报》 * |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | Application publication date: 20210824 |