CN113707137B - Decoding realization method and device - Google Patents


Info

Publication number
CN113707137B
CN113707137B (application CN202111007250.4A)
Authority
CN
China
Prior art keywords
score
path
state
self
jump
Prior art date
Legal status
Active
Application number
CN202111007250.4A
Other languages
Chinese (zh)
Other versions
CN113707137A (en)
Inventor
肖艳红
赵茂祥
李全忠
何国涛
蒲瑶
Current Assignee
Puqiang Times Zhuhai Hengqin Information Technology Co ltd
Original Assignee
Puqiang Times Zhuhai Hengqin Information Technology Co ltd
Priority date
Filing date
Publication date
Application filed by Puqiang Times Zhuhai Hengqin Information Technology Co ltd
Priority to CN202111007250.4A
Publication of CN113707137A
Application granted
Publication of CN113707137B
Legal status: Active


Classifications

    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00: Speech recognition
    • G10L15/08: Speech classification or search
    • G10L15/14: Speech classification or search using statistical models, e.g. Hidden Markov Models [HMMs]
    • G10L15/142: Hidden Markov Models [HMMs]
    • G10L15/148: Duration modelling in HMMs, e.g. semi HMM, segmental models or transition probabilities
    • G10L19/00: Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis

Abstract

The invention relates to a decoding realization method and device, comprising: providing a topology for the HMM model of a modeling unit, the topology comprising a start state, an emission state, and an end state; setting a self-jump edge on the emission state so that the emission state can jump to itself; the emission state thus includes a self-jump path and a transfer path, enabling the topology to complete sequence alignment. The steps by which the topology completes sequence alignment are: when each frame of audio is decoded, calculate the acoustic and language scores of the blank character used by the self-jump path and the acoustic and language scores of the valid character used by the transfer path; compare the scores of the paths and take the highest as the emission state score; and perform sequence alignment according to the emission state score. The invention can greatly reduce the number of models in the decoding network, and thereby greatly reduce the memory required during decoding.

Description

Decoding realization method and device
Technical Field
The invention belongs to the technical field of neural networks, and particularly relates to a decoding realization method and device.
Background
In speech recognition, the input speech sequence and the output sequence are of unequal length: it is difficult to assign a pronunciation unit to a single frame of data, whereas tens of frames can readily be judged to correspond to a pronunciation unit. Traditional acoustic model training for speech recognition requires knowing what each frame of data corresponds to, so the speech must be force-aligned as a preprocessing step before training. By contrast, acoustic model training that uses CTC as the loss function is fully end to end: the data need not be aligned in advance, and only an input sequence and an output sequence are required. The CTC model introduces a blank character for alignment with the input features; the blank character has no output meaning. In decoding based on a CTC model, because each modeling unit is connected to a blank character, the decoding network contains a large number of blank-character models, even though the blank character has no actual output meaning.
The HMM is a model commonly used for sequence alignment problems and plays an important role in the decoding stage of speech recognition. It comprises the following parts:
a set of N emission states; state transition probabilities; an observation sequence, where each observation o_t belongs to the set U of acoustic model modeling units; emission probabilities, i.e. acoustic model likelihoods, each representing the probability of seeing observation o_t in state i; and two special non-emitting states (a start state and an end state), which make it convenient to splice several HMMs into a larger HMM.
In the related art, the topology of the HMM model includes a start state, an end state, and an emission state. The edges between states represent the direction and weight of each jump. Each emission state represents a modeling unit of the acoustic model (the modeling unit may be a phoneme, a pinyin syllable, a word, etc.), and its emission probability at time t is the acoustic model likelihood score of that modeling unit at time t.
The topology and the sequence alignment process are as follows: in decoding based on a CTC model, each modeling unit has one HMM, and each HMM has three states; the HMM topology of the blank character can self-jump, while the HMM topologies of the other modeling units (the valid characters) cannot.
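As a rough illustration (all names are hypothetical; the patent gives no code), the prior-art per-unit topology described above can be sketched as follows, where only the blank character's HMM receives a self-jump edge:

```python
from dataclasses import dataclass, field

@dataclass
class HmmTopology:
    unit: str            # modeling unit (phoneme, pinyin syllable, word, ...)
    can_self_jump: bool  # True only for the blank character's HMM
    # edges as (from_state, to_state, weight); states: "start", "emit", "end"
    edges: list = field(default_factory=list)

def build_unit_hmm(unit: str, is_blank: bool) -> HmmTopology:
    """Build the three-state HMM of one modeling unit (prior-art topology)."""
    edges = [("start", "emit", 1.0), ("emit", "end", 1.0)]
    if is_blank:
        edges.append(("emit", "emit", 1.0))  # blank's HMM may self-jump
    return HmmTopology(unit=unit, can_self_jump=is_blank, edges=edges)
```

In this conventional network a blank HMM sits between every pair of valid-character HMMs, which is exactly the model count the invention targets.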
Because the prior-art decoding network contains a large number of blank-character HMMs, and the blank character has no actual output meaning, the decoding network is large, so speech recognition decoding requires a large amount of memory.
Disclosure of Invention
In view of the above, the present invention aims to overcome the shortcomings of the prior art and to provide a decoding implementation method and apparatus, so as to solve the prior-art problem that the large decoding network makes speech recognition decoding memory-intensive.
In order to achieve the above purpose, the invention adopts the following technical scheme: a decoding implementation method, comprising:
providing a topology of the HMM model of a modeling unit, the topology including a start state, an emission state, and an end state; setting a self-jump edge on the emission state so that the emission state can jump to itself; the emission state thus includes a self-jump path and a transfer path, enabling the topology to complete sequence alignment. The steps by which the topology completes sequence alignment are as follows:
when each frame of audio is decoded, calculate the acoustic and language scores of the blank character used by the self-jump path and the acoustic and language scores of the valid character used by the transfer path;
compare the scores of the paths, and determine the highest score as the emission state score;
and perform sequence alignment according to the emission state score.
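A minimal sketch of these three steps for a single emission state, assuming log-domain scores that are simply summed (function and variable names are illustrative, not from the patent):

```python
def emission_state_score(blank_acoustic: float, blank_language: float,
                         char_acoustic: float, char_language: float):
    """Return (score, winning_path) for one emission state in one frame."""
    # self-jump path uses the blank character's acoustic and language scores
    self_jump = blank_acoustic + blank_language
    # transfer path uses the valid character's acoustic and language scores
    transfer = char_acoustic + char_language
    if self_jump >= transfer:
        return self_jump, "self_jump"  # frame aligns with the blank character
    return transfer, "transfer"        # frame aligns with the valid character
```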
Further, the decoding uses the Viterbi algorithm to calculate the acoustic score and the language score of the blank character used by the self-jump path, and the acoustic score and the language score of the modeling units other than the blank character used by the transfer path.
Further, comparing the scores of the paths and determining the highest-scoring path includes:
comparing the sum of the acoustic score and the language score of each path;
determining the path whose sum of acoustic and language scores is highest as the highest-scoring path.
Further, performing sequence alignment according to the highest-scoring path includes:
if the emission state score of the current frame is the score of the blank character, the frame is aligned with the blank character;
if the emission state score of the current frame is the score of a valid character, the frame is aligned with the modeling unit to which the valid character belongs.
Further, if the current frame is aligned with the blank character, this represents a self-jump of the modeling unit; if the current frame is aligned with a valid character, the HMM state of the modeling unit to which the valid character belongs jumps from the emission state to the end state, and the end state continues to expand to the start states of other modeling units, until decoding ends.
Further, voice data is obtained, and the toned pinyin corresponding to the voice data is modeled using initials, finals, and tones, generating a plurality of modeling units.
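How a toned pinyin syllable might be split into initial, final, and tone can be sketched as below; the initial inventory is a simplified assumption, since the patent does not list one:

```python
# Simplified pinyin initials; multi-letter initials must be tried first.
INITIALS = ("zh", "ch", "sh", "b", "p", "m", "f", "d", "t", "n", "l",
            "g", "k", "h", "j", "q", "x", "r", "z", "c", "s", "y", "w")

def split_pinyin(syllable: str):
    """Split e.g. 'ni3' into ('n', 'i', '3'): initial, final, tone."""
    tone = syllable[-1] if syllable[-1].isdigit() else "5"  # 5 = neutral tone
    body = syllable[:-1] if syllable[-1].isdigit() else syllable
    for ini in INITIALS:
        if body.startswith(ini):
            return ini, body[len(ini):], tone
    return "", body, tone  # zero-initial syllable such as 'an1'
```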
An embodiment of the present application provides a decoding implementation apparatus, including:
the building module is used for providing a topology of the HMM model of a modeling unit, the topology including a start state, an emission state, and an end state; setting a self-jump edge on the emission state so that the emission state can jump to itself; the emission state thus includes a self-jump path and a transfer path, enabling the topology to complete sequence alignment. The steps by which the topology completes sequence alignment are as follows:
when each frame of audio is decoded, calculate the acoustic and language scores of the blank character used by the self-jump path and the acoustic and language scores of the valid character used by the transfer path;
compare the scores of the paths, and determine the highest score as the emission state score;
and perform sequence alignment according to the emission state score.
By adopting the technical scheme, the invention has the following beneficial effects:
the invention provides a decoding realization method and a decoding realization device, comprising a topological structure of an HMM model of a modeling unit, wherein the topological structure comprises a starting state, a transmitting state and an ending state; setting a self-jump edge in the transmitting state for self-jump of the transmitting state; the transmit state includes a self-hop path and a transfer path such that the topology completes sequence alignment; the step of aligning the topology completion sequence is as follows: when each frame of audio is decoded, calculating the acoustic score and the language score of blank characters used by the self-jump path and the acoustic score and the language score of effective characters used by the transfer path; comparing the scores of each path, and determining the highest score as the emission state score; and performing sequence alignment according to the emission state score. The invention can greatly reduce the number of models in the decoding network, thereby greatly reducing the memory required in the decoding process.
Drawings
In order to more clearly illustrate the embodiments of the invention or the technical solutions in the prior art, the drawings that are required in the embodiments or the description of the prior art will be briefly described, it being obvious that the drawings in the following description are only some embodiments of the invention, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a schematic diagram of a topology sequence alignment step of an HMM model in the prior art;
FIG. 2 is a schematic diagram of a topology of an HMM model provided by the present invention;
FIG. 3 is a schematic diagram of steps for alignment of a topology completion sequence according to the present invention;
FIG. 4 is a flow chart of the decoding process according to the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention clearer, the technical solutions of the present invention will be described in detail below. It is apparent that the described embodiments are only some, not all, embodiments of the invention. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments herein without inventive effort fall within the protection scope of the invention.
The prior-art HMM topology realizes sequence alignment as follows. As shown in fig. 1, the input audio ("hello") is framed and windowed, features are extracted, and the acoustic model produces an acoustic posterior sequence. The third part of fig. 1 is the HMM sequence alignment path; only the start state of the first HMM and the end state of the last HMM are kept in the figure, and the start and end states of the intermediate HMMs are omitted. When the posterior sequence is aligned with a modeling unit, that modeling unit's HMM is used, and the emission probability of the model is the likelihood score of the modeling unit at time t; here we assume the modeling unit is pinyin. While consecutive speech frames align with the blank character, the blank character's HMM self-jumps in its emission state. When the alignment result at some time (t=3) is another modeling unit such as ni3, the blank character's HMM transfers from its emission state to its end state; that end state connects to the start state of the other modeling unit (ni3), which jumps from its start state to its emission state, and the emission probability for the frame is the acoustic likelihood score of ni3. Since ni3 cannot self-jump, it then jumps from its emission state to its end state and expands a new round of connections to the start states of other modeling units. The whole decoding process uses the Viterbi algorithm: at each time t, the acoustic and language scores of the blank character and of the other modeling units are computed, pruning is performed, and the optimal decoding result is finally obtained.
And performing sequence alignment according to the decoding result.
A specific decoding implementation method and apparatus provided in the embodiments of the present application are described below with reference to the accompanying drawings.
As shown in fig. 3, the decoding implementation method provided in the embodiments of the present application comprises:
providing a topology of the HMM model of a modeling unit, the topology including a start state, an emission state, and an end state; setting a self-jump edge on the emission state so that the emission state can jump to itself; the emission state thus includes a self-jump path and a transfer path, enabling the topology to complete sequence alignment. The steps by which the topology completes sequence alignment are as follows:
s101, when each frame of audio is decoded, calculating the acoustic score and the language score of blank characters used by a self-jump path and the acoustic score and the language score of effective characters used by a transition path;
s102, comparing the scores of each path, and determining the highest score as a transmitting state score;
s103, performing sequence alignment according to the emission state score.
It should be noted that, as shown in fig. 2, each circle represents a state of the HMM: the dark circle represents the start state, the double circle represents the end state, and the circle between them represents the emission state. The edges between states represent the direction and weight of each jump; the arc-shaped edge indicates that the state can jump to itself.
Compared with the prior art, the topology provided by the application removes the blank character's HMM and adds a self-jump edge to the emission state of every other acoustic modeling unit's HMM, where the emission probability of the self-jump edge is the emission probability of the blank character. For the input audio ("hello"), the topology provided by the application completes the same sequence alignment, as follows. The emission state of each pronunciation unit's HMM has a self-jump edge, but the emission probability of the self-jump is that of the blank character, while the emission probability of the transfer to the end state is that of the valid character. When each frame of audio is decoded, the two paths out of each modeling unit's emission state, self-jump and transfer, are traversed: the self-jump path uses the acoustic and language scores of the blank character, and the transfer path uses the acoustic and language scores of the valid character. The score of each path is computed, and the higher one is taken as the score of that character's emission state. Sequence alignment is then performed according to the emission state score.
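The alignment of one unit's emission state can be sketched as follows, assuming each frame's acoustic and language scores have already been summed into one log score per symbol (all names and data are invented for illustration):

```python
BLANK = "<blank>"  # hypothetical label for the blank character

def align_unit(frames, unit):
    """Align frames to one unit's emission state: self-jump on blank,
    transfer toward the end state when the valid character wins."""
    alignment, total = [], 0.0
    for scores in frames:
        self_jump = scores[BLANK]  # self-jump path: blank character's score
        transfer = scores[unit]    # transfer path: valid character's score
        if self_jump >= transfer:
            alignment.append(BLANK)  # stay in the emission state
            total += self_jump
        else:
            alignment.append(unit)   # jump to the end state
            total += transfer
            break  # the end state then expands other units' start states
    return alignment, total
```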
In some embodiments, the decoding uses the Viterbi algorithm to calculate the acoustic and language scores of the blank character used by the self-jump path, and the acoustic and language scores of the modeling units other than the blank character used by the transfer path.
Preferably, comparing the scores of the paths and determining the highest-scoring path includes:
comparing the sum of the acoustic score and the language score of each path;
determining the path whose sum of acoustic and language scores is highest as the highest-scoring path.
Preferably, performing sequence alignment according to the highest-scoring path includes:
if the emission state score of the current frame is the score of the blank character, the frame is aligned with the blank character;
if the emission state score of the current frame is the score of a valid character, the frame is aligned with the modeling unit to which the valid character belongs.
In some embodiments, if the current frame is aligned with the blank character, this represents a self-jump of the modeling unit; if the current frame is aligned with a valid character, the HMM state of the modeling unit to which the valid character belongs jumps from the emission state to the end state, and the end state continues to expand to the start states of other modeling units, until decoding ends.
Preferably, the method comprises obtaining voice data and modeling the toned pinyin corresponding to the voice data using initials, finals, and tones to generate a plurality of modeling units.
As shown in fig. 4, the technical solution provided in the application folds the blank character's HMM into the Viterbi algorithm. If a frame is aligned with the blank character, this represents a self-jump of the modeling unit (e.g. ni3); if the frame is aligned with the modeling unit (ni3), the HMM state of the modeling unit jumps from the emission state to the end state, and the end state continues to expand the start states of other modeling units, until decoding ends. The algorithm used in the decoding process is still the Viterbi algorithm, and the alignment results are identical. Since in the prior art a blank character is connected between the modeling units of every non-blank character on every path, the number of blank-character HMMs during decoding is very large, yet the blank character has no output meaning; the improved scheme therefore greatly reduces the number of models in the decoding network without affecting the recognition result, and in turn greatly reduces the memory required during decoding.
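A greedy, beamless caricature of the per-frame decisions of fig. 4 (a real decoder uses the Viterbi algorithm with pruning, as stated above; the frame scores below are invented):

```python
BLANK = "<blank>"  # hypothetical label for the blank character

def greedy_decode(frames, units):
    """Per frame: pick blank (self-jump) if it beats the best valid unit,
    otherwise emit the best unit (transfer to end state, expand a new unit)."""
    output = []
    for scores in frames:
        best_unit = max(units, key=lambda u: scores[u])
        if scores[BLANK] >= scores[best_unit]:
            continue                # self-jump: frame aligns with blank
        output.append(best_unit)    # transfer: frame aligns with the unit
    return output
```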
An embodiment of the present application provides a decoding implementation apparatus, including:
the building module is used for providing a topology of the HMM model of a modeling unit, the topology including a start state, an emission state, and an end state; setting a self-jump edge on the emission state so that the emission state can jump to itself; the emission state thus includes a self-jump path and a transfer path, enabling the topology to complete sequence alignment. The steps by which the topology completes sequence alignment are as follows:
when each frame of audio is decoded, calculate the acoustic and language scores of the blank character used by the self-jump path and the acoustic and language scores of the valid character used by the transfer path;
compare the scores of the paths, and determine the highest score as the emission state score;
and perform sequence alignment according to the emission state score.
The embodiment of the application provides computer equipment, which comprises a processor and a memory connected with the processor;
the memory is used for storing a computer program, and the computer program is used for executing the decoding implementation method provided by any one of the embodiments;
the processor is used to call and execute the computer program in the memory.
In summary, the present invention provides a decoding realization method and apparatus, comprising: providing a topology for the HMM model of a modeling unit, the topology comprising a start state, an emission state, and an end state; setting a self-jump edge on the emission state so that the emission state can jump to itself; the emission state thus includes a self-jump path and a transfer path, enabling the topology to complete sequence alignment. The steps by which the topology completes sequence alignment are: when each frame of audio is decoded, calculate the acoustic and language scores of the blank character used by the self-jump path and the acoustic and language scores of the valid character used by the transfer path; compare the scores of the paths and take the highest as the emission state score; and perform sequence alignment according to the emission state score. The invention can greatly reduce the number of models in the decoding network, and thereby greatly reduce the memory required during decoding.
It can be understood that the above-provided method embodiments correspond to the above-described apparatus embodiments, and corresponding specific details may be referred to each other and will not be described herein.
It will be appreciated by those skilled in the art that embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, magnetic disk storage, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
The foregoing is merely illustrative of the present invention, and the present invention is not limited thereto, and any person skilled in the art will readily recognize that variations or substitutions are within the scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (6)

1. A decoding implementation method, characterized in that it comprises:
providing a topology of an HMM model of a modeling unit, the topology including a start state, an emission state, and an end state; setting a self-jump edge in the transmitting state for self-jump of the transmitting state; the transmit state includes a self-hop path and a transfer path such that the topology completes sequence alignment; the step of aligning the topology completion sequence is as follows:
when each frame of audio is decoded, calculating the acoustic score and the language score of blank characters used by the self-jump path and the acoustic score and the language score of effective characters used by the transfer path; comparing the scores of each path, and determining the highest score as the emission state score; performing sequence alignment according to the emission state score; comparing the scores of each path, and determining the path with the highest score:
comparing the sum of the acoustic score and the language score of each path;
the path for which the sum of the acoustic score and the language score is the highest score is determined as the path of the highest score.
2. The method of claim 1, wherein
the decoding calculates an acoustic score and a language score of a blank character used by the self-jump path and an acoustic score and a language score of a modeling unit outside the blank character used by the transfer path by using a Viterbi algorithm.
3. The method of claim 1, wherein performing sequence alignment according to the highest scoring path comprises:
if the transmission state score of the current frame is the score of the blank character, the frame is aligned with the blank character;
if the transmission status score of the current frame is a score of a valid character, it indicates that the frame is aligned with the modeling unit to which the valid character belongs.
4. The method of claim 3, wherein
if the current frame is aligned with the blank character, the self-jump of the modeling unit is represented, if the current frame is aligned with the valid character, the HMM state of the modeling unit to which the valid character belongs jumps from the transmitting state to the ending state, and the ending state of the HMM state continues to be expanded and connected with the starting states of other modeling units until decoding is finished.
5. The method according to any one of claims 1 to 4, wherein
and acquiring voice data, modeling the tone pinyin corresponding to the voice data by adopting initials, finals and tones, and generating a plurality of modeling units.
6. A decoding implementation apparatus, comprising:
the building module is used for providing a topological structure of the HMM model of the modeling unit, wherein the topological structure comprises a starting state, a transmitting state and an ending state; setting a self-jump edge in the transmitting state for self-jump of the transmitting state; the transmit state includes a self-hop path and a transfer path such that the topology completes sequence alignment; the step of aligning the topology completion sequence is as follows:
when each frame of audio is decoded, calculating the acoustic score and the language score of blank characters used by the self-jump path and the acoustic score and the language score of effective characters used by the transfer path; comparing the scores of each path, and determining the highest score as the emission state score;
performing sequence alignment according to the emission state score;
comparing the scores of each path, and determining the path with the highest score:
comparing the sum of the acoustic score and the language score of each path;
the path for which the sum of the acoustic score and the language score is the highest score is determined as the path of the highest score.
CN202111007250.4A 2021-08-30 2021-08-30 Decoding realization method and device Active CN113707137B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111007250.4A CN113707137B (en) 2021-08-30 2021-08-30 Decoding realization method and device


Publications (2)

Publication Number Publication Date
CN113707137A (en): 2021-11-26
CN113707137B (en): 2024-02-20

Family

ID=78657035

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111007250.4A Active CN113707137B (en) 2021-08-30 2021-08-30 Decoding realization method and device

Country Status (1)

Country Link
CN (1) CN113707137B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117357073B (en) * 2023-12-07 2024-04-05 北京清雷科技有限公司 Sleep stage method and device based on GMM-HMM model

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2007225931A (en) * 2006-02-23 2007-09-06 Advanced Telecommunication Research Institute International Speech recognition system and computer program
CN103021408A (en) * 2012-12-04 2013-04-03 中国科学院自动化研究所 Method and device for speech recognition, optimizing and decoding assisted by stable pronunciation section
CN105529027A (en) * 2015-12-14 2016-04-27 百度在线网络技术(北京)有限公司 Voice identification method and apparatus
CN106710606A (en) * 2016-12-29 2017-05-24 百度在线网络技术(北京)有限公司 Method and device for treating voice based on artificial intelligence
CN108269568A (en) * 2017-01-03 2018-07-10 中国科学院声学研究所 A kind of acoustic training model method based on CTC
CN108305634A (en) * 2018-01-09 2018-07-20 深圳市腾讯计算机系统有限公司 Coding/decoding method, decoder and storage medium
WO2018232591A1 (en) * 2017-06-20 2018-12-27 Microsoft Technology Licensing, Llc. Sequence recognition processing
CN110136715A (en) * 2019-05-16 2019-08-16 北京百度网讯科技有限公司 Audio recognition method and device
CN112133285A (en) * 2020-08-31 2020-12-25 北京三快在线科技有限公司 Voice recognition method, voice recognition device, storage medium and electronic equipment

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
FI114051B (en) * 2001-11-12 2004-07-30 Nokia Corp Procedure for compressing dictionary data

Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2007225931A (en) * 2006-02-23 2007-09-06 Advanced Telecommunication Research Institute International Speech recognition system and computer program
CN103021408A (en) * 2012-12-04 2013-04-03 中国科学院自动化研究所 Method and device for speech recognition, optimizing and decoding assisted by stable pronunciation section
CN105529027A (en) * 2015-12-14 2016-04-27 百度在线网络技术(北京)有限公司 Voice identification method and apparatus
CN106710606A (en) * 2016-12-29 2017-05-24 百度在线网络技术(北京)有限公司 Method and device for treating voice based on artificial intelligence
CN108269568A (en) * 2017-01-03 2018-07-10 中国科学院声学研究所 A kind of acoustic training model method based on CTC
WO2018232591A1 (en) * 2017-06-20 2018-12-27 Microsoft Technology Licensing, Llc. Sequence recognition processing
CN108305634A (en) * 2018-01-09 2018-07-20 深圳市腾讯计算机系统有限公司 Coding/decoding method, decoder and storage medium
CN110136715A (en) * 2019-05-16 2019-08-16 北京百度网讯科技有限公司 Audio recognition method and device
CN112133285A (en) * 2020-08-31 2020-12-25 北京三快在线科技有限公司 Voice recognition method, voice recognition device, storage medium and electronic equipment

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Transformer-based continuous speech recognition for Vietnamese; Liu Jiawen; Qu Dan; Yang Xukui; Zhang Hao; Tang Jun; Journal of Information Engineering University; Vol. 21, No. 02; pp. 129-133 *
Research on several problems of sequence mapping based on encoder-decoder models; Hou Junfeng; Information Science & Technology, No. 01; pp. 1-95 *

Also Published As

Publication number Publication date
CN113707137A (en) 2021-11-26

Similar Documents

Publication Publication Date Title
CN111145728B (en) Speech recognition model training method, system, mobile terminal and storage medium
WO2017101450A1 (en) Voice recognition method and device
WO2017076222A1 (en) Speech recognition method and apparatus
US7562010B1 (en) Generating confidence scores from word lattices
KR20220035222A (en) Speech recognition error correction method, related devices, and readable storage medium
CN105869629B (en) Audio recognition method and device
CN107123417A (en) Optimization method and system are waken up based on the customized voice that distinctive is trained
US20150019214A1 (en) Method and device for parallel processing in model training
US20210193121A1 (en) Speech recognition method, apparatus, and device, and storage medium
CN109448719A (en) Establishment of Neural Model method and voice awakening method, device, medium and equipment
CN111916058A (en) Voice recognition method and system based on incremental word graph re-scoring
CN108389575B (en) Audio data identification method and system
WO2015003436A1 (en) Method and device for parallel processing in model training
CN108710704A (en) Determination method, apparatus, electronic equipment and the storage medium of dialogue state
CN111243574B (en) Voice model adaptive training method, system, device and storage medium
CN113707137B (en) Decoding realization method and device
CN110705254A (en) Text sentence-breaking method and device, electronic equipment and storage medium
CN105845130A (en) Acoustic model training method and device for speech recognition
CN105989839A (en) Speech recognition method and speech recognition device
WO2021139233A1 (en) Method and apparatus for generating data extension mixed strategy, and computer device
JP2002215187A (en) Speech recognition method and device for the same
CN111386566A (en) Device control method, cloud device, intelligent device, computer medium and device
CN111081254A (en) Voice recognition method and device
CN111833852B (en) Acoustic model training method and device and computer readable storage medium
CN111128172B (en) Voice recognition method, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant