CN112259084A - Speech recognition method, apparatus and storage medium

Speech recognition method, apparatus and storage medium

Info

Publication number
CN112259084A
CN112259084A (Application No. CN202010597703.2A)
Authority
CN
China
Prior art keywords
sentence
text
current
lattice
sample
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010597703.2A
Other languages
Chinese (zh)
Inventor
吴川隆
邓丽萍
张超
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Huijun Technology Co.,Ltd.
Original Assignee
Beijing Jingdong Century Trading Co Ltd
Beijing Wodong Tianjun Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Jingdong Century Trading Co Ltd, Beijing Wodong Tianjun Information Technology Co Ltd filed Critical Beijing Jingdong Century Trading Co Ltd
Priority: CN202010597703.2A
Publication: CN112259084A
Legal status: Pending

Classifications

    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 - Speech recognition
    • G10L15/08 - Speech classification or search
    • G10L15/16 - Speech classification or search using artificial neural networks
    • G10L15/22 - Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L2015/226 - Procedures using non-speech characteristics
    • G10L2015/228 - Procedures using non-speech characteristics of application context

Abstract

The disclosure provides a speech recognition method, a speech recognition apparatus, and a storage medium, and relates to the technical field of speech recognition. The disclosed speech recognition method includes: obtaining a candidate lattice from the speech signal of the current sentence; resetting a neural network model according to the preceding text corresponding to the current sentence, where the preceding text is the recognized text of one or more sentences before the current sentence; re-scoring the candidate lattice with the reset neural network model to obtain a re-scored lattice; and determining the recognized text of the current sentence from the re-scored lattice. In this way, speech recognition of the current sentence can take into account information from one or more preceding sentences, so that prior information is used more fully, the re-scoring is more accurate, and the accuracy of speech recognition is improved.

Description

Speech recognition method, apparatus and storage medium
Technical Field
The present disclosure relates to the field of speech recognition technologies, and in particular, to a speech recognition method, apparatus, and storage medium.
Background
Speech recognition is a key technology in systems such as voice quality inspection and human-machine dialogue, and is widely applied in fields such as logistics, finance, and industry. For example, if the speech recognition accuracy of a dialogue robot is poor, the speaker's real intention cannot be accurately understood and erroneous instructions may be issued.
Disclosure of Invention
It is an object of the present disclosure to improve the accuracy of speech recognition.
According to an aspect of some embodiments of the present disclosure, there is provided a speech recognition method including: obtaining a candidate lattice from a speech signal of a current sentence; resetting a neural network model according to the preceding text corresponding to the current sentence, where the preceding text is the recognized text of one or more sentences before the current sentence and the neural network model is trained on corpus samples that include preceding text; re-scoring the candidate lattice with the reset neural network model to obtain a re-scored lattice; and determining the recognized text of the current sentence from the re-scored lattice.
In some embodiments, the speech recognition method further comprises: storing the recognized text of the current sentence in a buffer so that it can serve as the preceding text of a subsequent sentence.
In some embodiments, the speech recognition method further comprises: obtaining the preceding text corresponding to the current sentence from the buffer.
In some embodiments, obtaining the candidate lattice from the speech signal of the current sentence comprises: performing one decoding pass on the speech signal based on an acoustic model and a language model to obtain the candidate lattice.
In some embodiments, determining the recognized text of the current sentence from the re-scored lattice comprises: performing acoustic-weight and language-weight analysis on the re-scored lattice, and taking the decoding result of the highest-scoring path as the recognized text of the current sentence.
In some embodiments, the neural network model comprises an LSTM (Long Short-Term Memory) model or a GRU (Gated Recurrent Unit) model.
In some embodiments, where the speech signal comes from a conversation, the preceding text corresponding to the current sentence includes the recognized text of the previous speaker's utterance closest to the current sentence.
In some embodiments, the speech recognition method further comprises training the neural network model with samples that include preceding text until the output of the loss function converges, including: obtaining a sample candidate lattice from the speech signal of a current sample sentence; resetting the neural network model to be trained according to the preceding sample text corresponding to the current sample sentence, where the preceding sample text is the sample text of one or more sentences before the current sample sentence; re-scoring the sample candidate lattice with the reset neural network model to be trained to obtain a re-scored sample lattice, and determining the recognized text of the current sample sentence; and determining the output of the loss function from the recognized text of the current sample sentence and the sample text of the current sample sentence.
In this way, speech recognition of the current sentence can take into account information from one or more preceding sentences, so that prior information is used more fully, the re-scoring is more accurate, and the accuracy of speech recognition is improved.
According to an aspect of further embodiments of the present disclosure, there is provided a speech recognition apparatus including: a decoding unit configured to obtain a candidate lattice from a speech signal of a current sentence; a reset unit configured to reset a neural network model according to the preceding text corresponding to the current sentence, where the preceding text is the recognized text of one or more sentences before the current sentence and the neural network model is trained on corpus samples that include preceding text; a re-scoring unit configured to re-score the candidate lattice with the reset neural network model to obtain a re-scored lattice; and a recognition unit configured to determine the recognized text of the current sentence from the re-scored lattice.
In some embodiments, the speech recognition apparatus further comprises a buffer unit configured to store the recognized text of the current sentence in a buffer so that it can serve as the preceding text of a subsequent sentence.
In some embodiments, the reset unit is further configured to retrieve the recognized preceding text corresponding to the current sentence from the buffer.
In some embodiments, the decoding unit is configured to decode the speech signal in one pass based on the acoustic model and the language model to obtain the candidate lattice.
In some embodiments, the recognition unit is configured to perform acoustic-weight and language-weight analysis on the re-scored lattice, and to take the decoding result of the highest-scoring path as the recognized text of the current sentence.
In some embodiments, the neural network model comprises an LSTM model or a GRU model.
In some embodiments, where the speech signal comes from a conversation, the preceding text corresponding to the current sentence includes the recognized text of the previous speaker's utterance closest to the current sentence.
In some embodiments, the speech recognition apparatus further comprises a training unit configured to train the neural network model with samples that include preceding text until the output of the loss function converges.
According to an aspect of some embodiments of the present disclosure, there is provided a speech recognition apparatus including: a memory; and a processor coupled to the memory, the processor configured to perform any of the speech recognition methods mentioned above based on instructions stored in the memory.
This apparatus can take information from one or more sentences preceding the current sentence into account during speech recognition, thereby using prior information more fully, making the re-scoring more accurate, and improving the accuracy of speech recognition.
According to an aspect of some embodiments of the present disclosure, a computer-readable storage medium is proposed, on which computer program instructions are stored, which instructions, when executed by a processor, implement the steps of any of the speech recognition methods mentioned above.
By executing the instructions on the computer-readable storage medium, information from one or more preceding sentences can be considered in the speech recognition of the current sentence, so that prior information is used more fully, the re-scoring is more accurate, and the accuracy of speech recognition is improved.
Drawings
The accompanying drawings, which are included to provide a further understanding of the disclosure and are incorporated in and constitute a part of this disclosure, illustrate embodiments of the disclosure and together with the description serve to explain the disclosure and not to limit the disclosure. In the drawings:
fig. 1 is a flow diagram of some embodiments of a speech recognition method of the present disclosure.
FIG. 2 is a flow diagram of further embodiments of speech recognition methods of the present disclosure.
Fig. 3 is a schematic diagram of some embodiments of speech recognition devices of the present disclosure.
FIG. 4 is a schematic diagram of further embodiments of speech recognition apparatus of the present disclosure.
Fig. 5 is a schematic diagram of a speech recognition device according to still other embodiments of the present disclosure.
Detailed Description
The technical solution of the present disclosure is further described in detail by the accompanying drawings and examples.
A speech recognition system first performs fast decoding with a simple language model to generate a lattice, and then re-scores the lattice with a more complex language model to obtain higher recognition accuracy. The recognition rate from a single decoding pass is often low; accuracy can be further improved by re-scoring with a complex language model trained on a large corpus. High-order n-gram language models were first adopted for re-scoring; later, neural networks, with their superior modeling capability, replaced n-gram language models in lattice re-scoring schemes.
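The two-pass scheme above can be sketched in a few lines. This is a hedged, minimal illustration, not the patent's implementation: `Path`, `rescore_lattice`, and `toy_lm_logprob` are hypothetical names, and a real system would operate on lattice arcs rather than enumerated paths.

```python
# Minimal sketch of lattice re-scoring: each candidate path carries an acoustic
# score and a first-pass LM score; a stronger language model re-scores the word
# sequence (replacing the first-pass LM score), and the combined score ranks paths.
from dataclasses import dataclass
from typing import List

@dataclass
class Path:
    words: List[str]
    acoustic_score: float       # log-domain acoustic model score
    firstpass_lm_score: float   # score from the simple first-pass LM (replaced below)

def rescore_lattice(paths, lm_logprob, lm_weight=0.7):
    """Score each path with the stronger LM and return paths sorted by
    the combined (acoustic + weighted LM) score, best first."""
    rescored = []
    for p in paths:
        total = p.acoustic_score + lm_weight * lm_logprob(p.words)
        rescored.append((total, p))
    rescored.sort(key=lambda t: t[0], reverse=True)
    return rescored

# Toy stand-in for a neural LM: rewards a known phrase (illustrative only).
def toy_lm_logprob(words):
    return -1.0 * len(words) + (2.0 if words[:2] == ["hello", "world"] else 0.0)

paths = [
    Path(["hello", "world"], acoustic_score=-5.0, firstpass_lm_score=-3.0),
    Path(["hollow", "word"], acoustic_score=-4.5, firstpass_lm_score=-3.5),
]
best_score, best_path = rescore_lattice(paths, toy_lm_logprob)[0]
print(best_path.words)  # the re-scored best path
```

Here the second path wins the first pass acoustically, but the stronger LM promotes the more plausible word sequence, which is exactly the effect re-scoring is meant to have.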
The inventors found that although neural networks perform well, the related art typically re-scores only according to the relations between neighboring words, without considering the logical relations between successive sentences.
A flow diagram of some embodiments of the speech recognition method of the present disclosure is shown in fig. 1.
In step 101, a candidate lattice is obtained from the speech signal of the current sentence.
In some embodiments, the speech signal may be decoded in one pass based on the acoustic model and the language model to obtain the candidate lattice. In some embodiments, one decoding pass can be performed in any manner in the related art to obtain the original lattice network, i.e., as the candidate lattice.
In step 102, the neural network model is reset according to the recognized preceding text corresponding to the current sentence. The preceding text may be the recognized text of one or more sentences before the current sentence, for example a predetermined number of immediately preceding sentences, or the preceding paragraph. In some embodiments, paragraphs may be divided by speech-interval time or distinguished by keywords.
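The "reset" in step 102 can be pictured as re-initializing a recurrent model's state and feeding in the preceding text before scoring. The sketch below uses a toy bigram table in place of a real LSTM/GRU; the `ContextualLM` class and its probabilities are illustrative assumptions, not the patent's implementation.

```python
# Hedged sketch of resetting a language model with preceding text: state is
# re-initialized, the recognized previous sentence(s) are consumed, and scoring
# of the current sentence is then conditioned on that cross-sentence context.
class ContextualLM:
    def __init__(self, bigram_logprobs, default=-5.0):
        self.bigrams = bigram_logprobs  # {(prev_word, word): logprob}
        self.default = default
        self.history = ["<s>"]

    def reset(self, preceding_text):
        """Re-initialize state, then consume the preceding text so the
        internal history carries cross-sentence context."""
        self.history = ["<s>"] + list(preceding_text)

    def score(self, sentence):
        total, prev = 0.0, self.history[-1]
        for w in sentence:
            total += self.bigrams.get((prev, w), self.default)
            prev = w
        return total

lm = ContextualLM({("weather", "sunny"): -0.5, ("<s>", "sunny"): -4.0})
lm.reset(["how", "is", "the", "weather"])
with_context = lm.score(["sunny"])
lm.reset([])
without_context = lm.score(["sunny"])
assert with_context > without_context  # context makes "sunny" more likely
```

The same candidate sentence scores higher when the model has been reset with a relevant preceding sentence, which is the mechanism by which prior information improves re-scoring.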
In some embodiments, steps 101 and 102 may be executed in either order.
In step 103, the candidate lattice is re-scored by the reset neural network model to obtain a re-scored lattice. In some embodiments, the re-scored lattice may be analyzed for acoustic weight and language weight, and the decoding result of the highest-scoring path taken as the recognized text of the current sentence.
In step 104, the recognized text of the current sentence is determined from the re-scored lattice.
In this way, speech recognition of the current sentence can take into account information from one or more preceding sentences, so that prior information is used more fully, the re-scoring is more accurate, and the accuracy of speech recognition is improved.
In some embodiments, where the speech signal comes from a conversation, the preceding text corresponding to the current sentence includes the recognized text of the previous speaker's utterance closest to the current sentence. In some embodiments, a change of speaker may be detected based on voice tone.
In this way, the question-answer logic of the conversation can be fully utilized, further improving the accuracy of speech recognition.
A flow diagram of further embodiments of the speech recognition method of the present disclosure is shown in fig. 2.
In step 201, one decoding pass is performed on the speech signal based on the acoustic model and a low-order language model to obtain the candidate lattice.
In step 202, the recognized preceding text corresponding to the current sentence is retrieved from the buffer. In some embodiments, the corresponding preceding text may be retrieved from the buffer according to a predetermined policy, which may include taking the recognized text of the previous speaker's nearest utterance, or the recognized text of the previous sentence or previous paragraph.
In step 203, the neural network model is reset based on the preceding text obtained from the buffer.
In step 204, the candidate lattice is re-scored by the reset neural network model, and a re-scored lattice is obtained. In some embodiments, the neural network model comprises an LSTM model or a GRU model.
In step 205, the re-scored lattice is analyzed by acoustic weight and language weight, and the decoding result of the highest-scoring path is taken as the recognized text of the current sentence.
In step 206, the recognized text of the current sentence is stored in the buffer as the preceding text of subsequent sentences.
In this way, the recognized text can be buffered and managed in time to serve as a basis for recognizing subsequent sentences; resetting the neural network model in time allows this information to be used in analyzing and estimating the current sentence, improving the prediction accuracy of the language model.
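Steps 202 and 206 together describe a small context buffer. The sketch below is a hypothetical illustration of one such policy (preferring the other speaker's most recent utterance, matching the question-answer logic mentioned earlier); the class name and the fallback rule are assumptions, not the patent's specification.

```python
# Illustrative context buffer: each recognized sentence is stored with its
# speaker, and the preceding text for the next sentence is retrieved by a
# simple policy: the most recent utterance by a different speaker, falling
# back to the last stored sentence.
from collections import deque

class ContextBuffer:
    def __init__(self, max_sentences=10):
        self.entries = deque(maxlen=max_sentences)  # (speaker, text) pairs

    def store(self, speaker, text):
        self.entries.append((speaker, text))

    def preceding_text(self, current_speaker):
        # Prefer the other party's most recent utterance (question-answer logic).
        for speaker, text in reversed(self.entries):
            if speaker != current_speaker:
                return text
        return self.entries[-1][1] if self.entries else ""

buf = ContextBuffer()
buf.store("agent", "what is your order number")
buf.store("customer", "let me check")
print(buf.preceding_text("customer"))  # the agent's question is the context
```

Bounding the buffer (here with `deque(maxlen=...)`) keeps the context window small, which matters when the model is reset once per sentence.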
In some embodiments, the neural network model needs to be trained before speech recognition is performed by any of the methods above, and the corpus samples must include preceding text. In some embodiments, training text with preceding context may be obtained for the corresponding application scenario, and the training of the neural network ends when the result of the loss function converges and becomes stable (e.g., the output changes by less than a predetermined value). During this process, a sample candidate lattice can be obtained from the speech signal of the current sample sentence, and the neural network model is reset with the preceding sample text corresponding to the current sample sentence. In some embodiments, the preceding sample text is the sample text of one or more sentences before the current sample sentence. The sample candidate lattice is then re-scored by the reset neural network model to be trained, and the optimal recognized text is determined.
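The convergence criterion described above ("the output changes by less than a predetermined value") can be sketched as a loop. `train_one_epoch` is a hypothetical stand-in for a real training step, and the tolerance and loss values are illustrative.

```python
# Hedged sketch of training until the loss converges: stop when the change
# in the loss between consecutive epochs falls below a predetermined threshold.
def train_until_converged(train_one_epoch, tol=1e-3, max_epochs=100):
    prev_loss = float("inf")
    for epoch in range(max_epochs):
        loss = train_one_epoch()
        if abs(prev_loss - loss) < tol:  # output change below predetermined value
            return epoch, loss
        prev_loss = loss
    return max_epochs, prev_loss

# Toy loss sequence decaying toward a floor, standing in for real training.
losses = iter([2.0, 1.0, 0.6, 0.51, 0.505, 0.5049])
epoch, final = train_until_converged(lambda: next(losses))
print(epoch, final)
```

In practice the stopping rule is usually paired with a validation-set check, but the epoch-to-epoch delta shown here matches the criterion stated in the text.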
In this way, the neural network model can be trained on corpus samples that include preceding text, so that the resulting model can exploit the logical relations between successive sentences during re-scoring, further improving the accuracy of speech recognition.
In tests on a speech test data set, the method of the disclosed embodiments reduced the PPL (perplexity) of a single-layer LSTM neural language model from 43.2 to 40.05; at the same time, lattice re-scoring improved the speech recognition accuracy by an absolute 0.7%, a significant improvement.
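For reference, the PPL metric cited above is the standard language-model perplexity: the exponentiated average negative log-probability per token. The snippet below computes it from per-token probabilities; the input numbers are illustrative, not from the patent's test set.

```python
# Perplexity: exp of the average negative log-probability per token.
# Lower is better; a uniform distribution over k outcomes has perplexity k.
import math

def perplexity(token_probs):
    n = len(token_probs)
    return math.exp(-sum(math.log(p) for p in token_probs) / n)

print(round(perplexity([0.25, 0.25, 0.25, 0.25]), 2))  # uniform over 4 tokens -> 4.0
```

A drop from 43.2 to 40.05, as reported, means the contextual model is on average less "surprised" by each token of the test text.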
A schematic diagram of some embodiments of the speech recognition apparatus of the present disclosure is shown in fig. 3.
The decoding unit 301 can obtain a candidate lattice from the speech signal of the current sentence. In some embodiments, the speech signal may be decoded in one pass based on the acoustic model and the language model to obtain the candidate lattice.
The reset unit 302 can reset the neural network model according to the recognized preceding text corresponding to the current sentence. The preceding text may be the recognized text of one or more sentences before the current sentence, for example a predetermined number of immediately preceding sentences, or the preceding paragraph. In some embodiments, paragraphs may be divided by speech-interval time or distinguished by keywords.
The re-scoring unit 303 can re-score the candidate lattice through the reset neural network model to obtain a re-scored lattice. In some embodiments, the re-scored lattice may be analyzed for acoustic weight and language weight, and the decoding result of the highest-scoring path taken as the recognized text of the current sentence.
The recognition unit 304 can determine the recognized text of the current sentence from the re-scored lattice.
This apparatus can take information from one or more sentences preceding the current sentence into account during speech recognition, thereby using prior information more fully, making the re-scoring more accurate, and improving the accuracy of speech recognition.
In some embodiments, as shown in fig. 3, the speech recognition apparatus may further include a buffer unit 305 capable of storing the recognized text of the current sentence in a buffer so that it can serve as the preceding text of subsequent sentences. The reset unit 302 can obtain the recognized preceding text corresponding to the current sentence from the buffer and reset the neural network model accordingly. In some embodiments, the corresponding preceding text may be retrieved from the buffer according to a predetermined policy, which may include taking the recognized text of the previous speaker's nearest utterance, or the recognized text of the previous sentence or previous paragraph.
This apparatus can buffer and manage the recognized text in time to serve as a basis for recognizing subsequent sentences; resetting the neural network model in time allows this information to be used in analyzing and estimating the current sentence, improving the prediction accuracy of the language model.
In some embodiments, as shown in fig. 3, the speech recognition apparatus may further include a training unit 306 capable of training the neural network model until the output of the loss function converges, producing the model used by the re-scoring unit 303. Training requires corpus samples that include preceding text. In some embodiments, the training unit 306 may train based on the initial speech recognition apparatus shown in fig. 3: the corpus sample is input to the decoding unit 301, which obtains a sample candidate lattice from the speech signal of the current sample sentence; the reset unit resets the neural network model to be trained with the preceding sample text corresponding to the current sample sentence; the re-scoring unit re-scores the sample candidate lattice through the reset model to obtain a re-scored sample lattice; and the recognition unit determines the recognized text of the current sample sentence. The training unit 306 then determines the output of the loss function from the recognized text of the current sample sentence and the sample text of the current sample sentence; if the training unit 306 determines that the change in the output is smaller than a predetermined value, the output is deemed to have converged and the training of the neural network model is complete.
This apparatus can train the neural network model on corpus samples that include preceding text, so that the resulting model can exploit the logical relations between successive sentences during re-scoring, further improving the accuracy of speech recognition.
A schematic structural diagram of an embodiment of the speech recognition apparatus of the present disclosure is shown in fig. 4. The speech recognition apparatus comprises a memory 401 and a processor 402. The memory 401 may be a magnetic disk, flash memory, or any other non-volatile storage medium, and stores instructions for the embodiments of the speech recognition method described above. The processor 402 is coupled to the memory 401 and may be implemented as one or more integrated circuits, such as a microprocessor or microcontroller. The processor 402 is configured to execute the instructions stored in the memory, making fuller use of prior information so that the re-scoring is more accurate and the accuracy of speech recognition is improved.
In one embodiment, as shown in FIG. 5, the speech recognition apparatus 500 includes a memory 501 and a processor 502 coupled by a bus 503. The speech recognition apparatus 500 may also be connected to an external storage device 505 via a storage interface 504 to invoke external data, and to a network or another computer system (not shown) via a network interface 506; this is not described in detail here.
In this embodiment, data and instructions are stored in the memory and processed by the processor, so that prior information can be used more fully, the re-scoring is more accurate, and the accuracy of speech recognition is improved.
In another embodiment, a computer-readable storage medium has stored thereon computer program instructions which, when executed by a processor, implement the steps of the method in the corresponding embodiment of the speech recognition method. As will be appreciated by one skilled in the art, embodiments of the present disclosure may be provided as a method, apparatus, or computer program product. Accordingly, the present disclosure may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present disclosure may take the form of a computer program product embodied on one or more computer-usable non-transitory storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present disclosure is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the disclosure. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
Thus far, the present disclosure has been described in detail. Some details that are well known in the art have not been described in order to avoid obscuring the concepts of the present disclosure. It will be fully apparent to those skilled in the art from the foregoing description how to practice the presently disclosed embodiments.
The methods and apparatus of the present disclosure may be implemented in a number of ways. For example, the methods and apparatus of the present disclosure may be implemented by software, hardware, firmware, or any combination of software, hardware, and firmware. The above-described order for the steps of the method is for illustration only, and the steps of the method of the present disclosure are not limited to the order specifically described above unless specifically stated otherwise. Further, in some embodiments, the present disclosure may also be embodied as programs recorded in a recording medium, the programs including machine-readable instructions for implementing the methods according to the present disclosure. Thus, the present disclosure also covers a recording medium storing a program for executing the method according to the present disclosure.
Finally, it should be noted that: the above examples are intended only to illustrate the technical solutions of the present disclosure and not to limit them; although the present disclosure has been described in detail with reference to preferred embodiments, those of ordinary skill in the art will understand that: modifications to the specific embodiments of the disclosure or equivalent substitutions for parts of the technical features may still be made; all such modifications are intended to be included within the scope of the claims of this disclosure without departing from the spirit thereof.

Claims (13)

1. A speech recognition method comprising:
obtaining a candidate lattice from a speech signal of a current sentence;
resetting a neural network model according to the preceding text corresponding to the current sentence, wherein the preceding text is the recognized text of one or more sentences before the current sentence, and the neural network model is trained on corpus samples that include preceding text;
re-scoring the candidate lattice through the reset neural network model to obtain a re-scored lattice; and
determining the recognized text of the current sentence according to the re-scored lattice.
2. The method of claim 1, further comprising:
storing the recognized text of the current sentence in a buffer so that it can serve as the preceding text of a subsequent sentence.
3. The method of claim 2, further comprising:
obtaining the preceding text corresponding to the current sentence from the buffer.
4. The method of claim 1, wherein the obtaining a candidate lattice from the speech signal of the current sentence comprises:
performing one decoding pass on the speech signal based on an acoustic model and a language model to obtain the candidate lattice.
5. The method of claim 1, wherein said determining the recognized text of the current sentence from the re-scored lattice comprises:
performing acoustic-weight and language-weight analysis on the re-scored lattice to obtain the decoding result of the highest-scoring path as the recognized text of the current sentence.
6. The method of claim 1, wherein the neural network model comprises an LSTM model or a GRU model.
7. The method of claim 1, wherein, in the case where the speech signal is a speech signal of a conversation,
the preceding text corresponding to the current sentence includes the recognized text of the utterance of the previous speaker closest to the current sentence.
8. The method of any of claims 1-7, further comprising:
training the neural network model with samples that include preceding text until the output of a loss function converges, comprising:
obtaining a sample candidate lattice from a speech signal of a current sample sentence;
resetting the neural network model to be trained according to the preceding sample text corresponding to the current sample sentence, wherein the preceding sample text is the sample text of one or more sentences before the current sample sentence;
re-scoring the sample candidate lattice through the reset neural network model to be trained to obtain a re-scored sample lattice, and determining the recognized text of the current sample sentence; and
determining the output of the loss function according to the recognized text of the current sample sentence and the sample text of the current sample sentence.
9. A speech recognition apparatus comprising:
a decoding unit configured to acquire a candidate lattice from a speech signal of a current sentence;
a reset unit configured to reset a neural network model according to an above text corresponding to the current sentence, wherein the above text is the recognition text of one or more sentences preceding the current sentence, and the neural network model is generated by training on corpus samples having above texts;
a re-scoring unit configured to re-score the candidate lattice through the reset neural network model to obtain a re-scored lattice; and
an identification unit configured to determine the recognition text of the current sentence according to the re-scored lattice.
10. The apparatus of claim 9, further comprising:
a cache unit configured to store the recognition text of the current sentence into a cache region to serve as the above text for a subsequent sentence.
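The units of claims 9 and 10 compose into a per-sentence pipeline: decode, reset with cached context, re-score, identify, then cache the result for the next sentence. The class below is a toy composition with injected callables standing in for the four units; none of the names come from the patent.

```python
class RecognizerPipeline:
    """Toy wiring of the decoding / reset / re-scoring / identification
    units (claim 9) plus the cache unit (claim 10)."""

    def __init__(self, decode, reset, rescore, identify):
        self._decode, self._reset = decode, reset
        self._rescore, self._identify = rescore, identify
        self._cache = []                          # above texts so far

    def recognize(self, signal):
        lattice = self._decode(signal)            # decoding unit
        model = self._reset(self._cache)          # reset unit (uses above text)
        scored = self._rescore(model, lattice)    # re-scoring unit
        text = self._identify(scored)             # identification unit
        self._cache.append(text)                  # cache unit
        return text
```

The cache makes each sentence's recognition text available as context for the next, which is the mechanism that lets the re-scoring language model condition on the conversation so far.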
11. The apparatus of claim 9 or 10, further comprising:
a training unit configured to train the neural network model with corpus samples having above texts until the output of the loss function converges.
12. A speech recognition apparatus comprising:
a memory; and
a processor coupled to the memory, the processor configured to perform the method of any of claims 1-8 based on instructions stored in the memory.
13. A computer readable storage medium having stored thereon computer program instructions which, when executed by a processor, implement the steps of the method of any one of claims 1 to 8.
CN202010597703.2A 2020-06-28 2020-06-28 Speech recognition method, apparatus and storage medium Pending CN112259084A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010597703.2A CN112259084A (en) 2020-06-28 2020-06-28 Speech recognition method, apparatus and storage medium


Publications (1)

Publication Number Publication Date
CN112259084A true CN112259084A (en) 2021-01-22

Family

ID=74224197

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010597703.2A Pending CN112259084A (en) 2020-06-28 2020-06-28 Speech recognition method, apparatus and storage medium

Country Status (1)

Country Link
CN (1) CN112259084A (en)



Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20060069558A1 (en) * 2004-09-10 2006-03-30 Beattie Valerie L Sentence level analysis
JP2008181537A (en) * 2008-02-18 2008-08-07 Sony Corp Information processor, processing method, program and storage medium
CN108711422A (en) * 2018-05-14 2018-10-26 腾讯科技(深圳)有限公司 Audio recognition method, device, computer readable storage medium and computer equipment
CN110517693A (en) * 2019-08-01 2019-11-29 出门问问(苏州)信息科技有限公司 Audio recognition method, device, electronic equipment and computer readable storage medium
CN111145733A (en) * 2020-01-03 2020-05-12 深圳追一科技有限公司 Speech recognition method, speech recognition device, computer equipment and computer readable storage medium

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
ZHANG JIAN: "Research on Recurrent Neural Network Language Model Techniques for Continuous Speech Recognition", China Master's Theses Full-text Database, Information Science and Technology, no. 7, pages 136-95 *

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112885338A (en) * 2021-01-29 2021-06-01 深圳前海微众银行股份有限公司 Speech recognition method, apparatus, computer-readable storage medium, and program product
CN113838456A (en) * 2021-09-28 2021-12-24 科大讯飞股份有限公司 Phoneme extraction method, voice recognition method, device, equipment and storage medium
WO2023050541A1 (en) * 2021-09-28 2023-04-06 科大讯飞股份有限公司 Phoneme extraction method, speech recognition method and apparatus, device and storage medium

Similar Documents

Publication Publication Date Title
US10741170B2 (en) Speech recognition method and apparatus
CN107301860B (en) Voice recognition method and device based on Chinese-English mixed dictionary
CN107195295B (en) Voice recognition method and device based on Chinese-English mixed dictionary
CN109887497B (en) Modeling method, device and equipment for speech recognition
JP5901001B1 (en) Method and device for acoustic language model training
US8818813B2 (en) Methods and system for grammar fitness evaluation as speech recognition error predictor
US10902846B2 (en) Spoken language understanding apparatus and spoken language understanding method using the same
WO2018192186A1 (en) Speech recognition method and apparatus
KR101587866B1 (en) Apparatus and method for extension of articulation dictionary by speech recognition
KR20140028174A (en) Method for recognizing speech and electronic device thereof
CN110473527B (en) Method and system for voice recognition
CN114038447A (en) Training method of speech synthesis model, speech synthesis method, apparatus and medium
JP6552999B2 (en) Text correction device, text correction method, and program
JP2020042257A (en) Voice recognition method and device
CN112259084A (en) Speech recognition method, apparatus and storage medium
JP2017058507A (en) Speech recognition device, speech recognition method, and program
JP5180800B2 (en) Recording medium for storing statistical pronunciation variation model, automatic speech recognition system, and computer program
EP3953928A1 (en) Automated speech recognition confidence classifier
CN113327575B (en) Speech synthesis method, device, computer equipment and storage medium
CN110223674B (en) Speech corpus training method, device, computer equipment and storage medium
KR20130126570A (en) Apparatus for discriminative training acoustic model considering error of phonemes in keyword and computer recordable medium storing the method thereof
US20220270637A1 (en) Utterance section detection device, utterance section detection method, and program
JP6716513B2 (en) VOICE SEGMENT DETECTING DEVICE, METHOD THEREOF, AND PROGRAM
Damavandi et al. NN-grams: Unifying neural network and n-gram language models for speech recognition
JP6082657B2 (en) Pose assignment model selection device, pose assignment device, method and program thereof

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20210526

Address after: 100176 room 1004, 10th floor, building 1, 18 Kechuang 11th Street, Beijing Economic and Technological Development Zone, Daxing District, Beijing

Applicant after: Beijing Huijun Technology Co.,Ltd.

Address before: Room A402, 4th floor, building 2, No.18, Kechuang 11th Street, Daxing District, Beijing, 100176

Applicant before: BEIJING WODONG TIANJUN INFORMATION TECHNOLOGY Co.,Ltd.

Applicant before: BEIJING JINGDONG CENTURY TRADING Co.,Ltd.
