CN109273000B - Speech recognition method - Google Patents

Speech recognition method

Info

Publication number
CN109273000B
CN109273000B
Authority
CN
China
Prior art keywords
voice recognition
recognition result
voice
corrected
speech recognition
Prior art date
Legal status
Active
Application number
CN201811186096.XA
Other languages
Chinese (zh)
Other versions
CN109273000A (en)
Inventor
马世辉
刘学军
李进波
Current Assignee
Henan Institute of Technology
Original Assignee
Henan Institute of Technology
Priority date
Filing date
Publication date
Application filed by Henan Institute of Technology
Priority to CN201811186096.XA
Publication of CN109273000A
Application granted
Publication of CN109273000B
Legal status: Active

Links

Classifications

    • G10L 15/22: Speech recognition; procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L 15/04: Speech recognition; segmentation; word boundary detection
    • G10L 15/06: Speech recognition; creation of reference templates; training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
    • Y02D 30/70: Reducing energy consumption in wireless communication networks

Abstract

The invention discloses a speech recognition method comprising the following steps: recognizing voice information with a first speech recognition method to obtain a first speech recognition result, and recognizing the same voice information with a second speech recognition method to obtain a second speech recognition result; then comparing the first speech recognition result with the second, outputting a speech recognition result according to the comparison, and displaying it. The scheme provides an effective cross-check of the key information in the voice data and increases the success rate of speech recognition.

Description

Speech recognition method
Technical Field
The invention relates to the field of voice recognition, in particular to a voice recognition method.
Background
Speech recognition is an actively developing research direction, and five main problems are involved:
(1) Recognition and understanding of natural language. Continuous speech must first be decomposed into units such as words and phonemes, and rules for understanding the semantics must then be established.
(2) The amount of speech information is large. Speech patterns differ not only between speakers but also for the same speaker; for example, a speaker's speech differs when talking casually and when speaking carefully, and the way a person speaks also changes over time.
(3) Ambiguity of speech. Different words may sound similar when spoken, which is common in both English and Chinese.
(4) The phonetic properties of individual letters, characters and words are affected by context, so that accent, tone, volume, pronunciation speed and so on are changed.
(5) Environmental noise and interference have a serious impact on speech recognition and result in low recognition rates.
To address these problems, researchers introduced deep learning from the machine-learning field into acoustic-model training for speech recognition, and multi-layer neural networks with RBM pre-training greatly improved acoustic-model accuracy. Researchers at Microsoft made breakthrough progress in this direction: after they adopted deep neural network (DNN) models, the speech recognition error rate dropped by 30%, the fastest progress in speech recognition technology in the preceding 20 years. In addition, most mainstream speech recognition decoders now adopt decoding networks based on weighted finite-state transducers (WFSTs), which can compile the language model, lexicon and acoustic units into one large decoding network, greatly improving decoding speed and providing a basis for real-time application of speech recognition.
Nevertheless, recognition accuracy remains a serious problem; in particular, there is no reasonable strategy for self-checking the recognition results: whatever is recognized is output directly, without any evaluation.
In order to solve the above problems, the present invention proposes a speech recognition method.
Disclosure of Invention
In order to solve the above problems, the present invention proposes a speech recognition method comprising:
recognizing the voice information by using a first speech recognition method to obtain a first speech recognition result, and recognizing the voice information by using a second speech recognition method to obtain a second speech recognition result; and
And comparing the first voice recognition result with the second voice recognition result, outputting the voice recognition result according to the comparison result, and displaying the voice recognition result.
Preferably, the voice information is preprocessed after acquisition; the preprocessing comprises fluency detection, endpoint detection, pre-emphasis, framing and windowing;
1) Endpoint detection
The endpoint detection adopts the following mode: set a time threshold T0, a time interval Δt and a sound threshold V0; acquire signals through an audio signal acquisition circuit, continuously collecting the sound signals of N time nodes, where N > T0/Δt;
if the sound signal > V0 at INT(0.6N) of the time nodes, sound is considered detected and the state is set to S = 1, where INT(·) denotes rounding; if the previous state was S = 0 when sound is detected, the start point of the sound is considered detected;
if the sound signal < V0 at INT(0.6N) of the time nodes, no sound is considered detected and the state is set to S = 0; if the previous state was S = 1, the end point of the sound is considered detected;
after endpoint detection, the silence at both ends of the sound signal is cut off;
2) Fluency detection
Cut the speech into a front part and a rear part and sample each part, continuously collecting the sound signals of M time nodes; if the sound signal < V0 at all M time nodes, fluency is considered problematic and that portion of the speech is cut out; what remains after cutting is the effective speech segment. Compute the lengths of the effective speech segments of the front and rear parts, divide the smaller of the two lengths by the total length of the speech to be scored, and compare the quotient with a corresponding threshold: if it is greater than the threshold, the speech is judged fluent; otherwise it is judged disfluent;
3) Pre-emphasis
A high-pass filter H(z) = 1 − αz⁻¹ with a pre-emphasis coefficient α = 0.91 is adopted to compensate signal attenuation and boost the high-frequency part of the signal. The pre-emphasized signal is then framed: the frame length is 15 ms, the speech sampling rate is 11025 Hz, the frame length is 256 samples, and the frame shift is 128 samples;
the signal x(n) of each frame is smoothed using a Hamming window.
Preferably, the first voice recognition method is as follows:
acquiring characteristic parameters from the voice information; the characteristic parameters include pitch, frequency, rate of frequency change, pitch period, gain, and band-pass unvoiced/voiced intensity;
the acquired characteristic parameters are passed through the corresponding ANN model to perform speech recognition and obtain the corresponding words and sentences.
The second voice recognition method comprises the following steps:
1) Acquire characteristic parameters from the voice information, and compute the probability of generating the observation sequence O given the model Λ.
Define the forward variable α_t(i):
α_t(i) = P{O_1, O_2, …, O_t; q_t = S_i | Λ}
that is, the probability, given the model, of having generated the partial observation symbol sequence up to time t and being in state S_i at time t;
Initialization:
α_1(i) = π_i b_i(O_1), 1 ≤ i ≤ N
where π is the initial state distribution, π = {π_i}, π_i = P[q_1 = S_i], 1 ≤ i ≤ N, and B is the observation symbol probability distribution of the states:
B = {b_j(O_k)}, b_j(O_k) = P[observation symbol at time t is O_k | q_t = S_j], 1 ≤ j ≤ N, 1 ≤ k ≤ M;
Induction:
α_{t+1}(j) = [Σ_{i=1}^{N} α_t(i) a_ij] · b_j(O_{t+1}), 1 ≤ t ≤ T−1, 1 ≤ j ≤ N;
Termination:
P(O | Λ) = Σ_{i=1}^{N} α_T(i)
where a_ij is an element of the state transition matrix and b_j(O_t) is an element of the observation symbol matrix;
2) The Baum-Welch algorithm is used to find the optimal model λ = argmax{P(O | Λ)};
3) the Viterbi algorithm solves for the optimal state transition sequence;
4) candidate syllables or initials/finals are given according to the λ corresponding to the optimal state sequence;
5) words and sentences are formed by the language model.
Preferably, the first speech recognition method is a large vocabulary speech recognition method based on a preset model, and the second speech recognition method is a speech recognition method based on an auxiliary speech data packet.
Preferably, the method further comprises:
a plurality of voice data packets are preset, the voice data packets are stored in the electronic equipment, the electronic equipment is connected with a processor, and the processor is connected with a server.
Preferably, the specific method for outputting the speech recognition result according to the comparison result comprises the following steps:
S1: Compare the first speech recognition result with the second speech recognition result. Coverage here refers to the proportion of exact repetition: starting from the first character, the ratio of the number of identical characters to the total number of characters. If the coverage of the two results is lower than a set threshold, perform the following steps:
judge whether the first and second speech recognition results have the same number of characters;
1) If they are the same, match them character by character and count the matches; then calculate the similarity R = Q(R1, R2)/Max(|R1|, |R2|), where Q(R1, R2) is the number of identical characters between R1 and R2, i.e. between the first and second speech recognition results, and Max(|R1|, |R2|) is the larger of the two lengths; go to S2;
2) If they are not the same, delete the irrelevant characters of both results, namely the stop characters and the consecutive identical characters, to obtain a corrected first speech recognition result and a corrected second speech recognition result. Judge whether the corrected results have the same number of characters; if so, R = Q(R1, R2)/Max(|R1|, |R2|), where Q(R1, R2) is the number of identical characters between the corrected first and second results and Max(|R1|, |R2|) is the larger of the two lengths; go to S2;
if the corrected first and second results differ in the number of characters, compare them from front to back and calculate the similarity RA:
RA = Q1(R1, R2)/Max(|R1|, |R2|),
where Q1(R1, R2) is the number of identical characters when the corrected results are compared from front to back, and Max(|R1|, |R2|) is the larger of the two lengths;
then compare the corrected results from back to front and calculate the similarity RB:
RB = Q2(R1, R2)/Max(|R1|, |R2|),
where Q2(R1, R2) is the number of identical characters when the corrected results are compared from back to front;
compare RA and RB and set R = Max(RA, RB); go to S2;
S2: If R is smaller than a specified value, discard the recognition result and resample.
The invention has the beneficial effects that:
1) The correct recognition rate is effectively improved;
2) misjudgments are effectively guarded against, and the output of erroneous results is automatically prevented.
Detailed Description
Specific embodiments of the present invention will now be described in order to provide a clearer understanding of the technical features, objects and effects of the present invention.
A speech recognition method, the speech recognition method comprising:
recognizing the voice information by using a first speech recognition method to obtain a first speech recognition result, and recognizing the voice information by using a second speech recognition method to obtain a second speech recognition result; and
And comparing the first voice recognition result with the second voice recognition result, outputting the voice recognition result according to the comparison result, and displaying the voice recognition result.
Preferably, the voice information is preprocessed after acquisition; the preprocessing comprises fluency detection, endpoint detection, pre-emphasis, framing and windowing;
1) Endpoint detection
The endpoint detection adopts the following mode: set a time threshold T0, a time interval Δt and a sound threshold V0; acquire signals through an audio signal acquisition circuit, continuously collecting the sound signals of N time nodes, where N > T0/Δt;
if the sound signal > V0 at INT(0.6N) of the time nodes, sound is considered detected and the state is set to S = 1, where INT(·) denotes rounding; if the previous state was S = 0 when sound is detected, the start point of the sound is considered detected;
if the sound signal < V0 at INT(0.6N) of the time nodes, no sound is considered detected and the state is set to S = 0; if the previous state was S = 1, the end point of the sound is considered detected;
after endpoint detection, the silence at both ends of the sound signal is cut off;
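As a concrete illustration, the following is a minimal Python sketch of this endpoint detector. It assumes the sound signal is available as a NumPy array of sampled levels; the function and variable names (detect_endpoints, samples, v0) are illustrative and not taken from the patent.

```python
import numpy as np

def detect_endpoints(samples, v0, n, ratio=0.6):
    """Slide a window of N time nodes over the signal: declare sound when
    the level exceeds V0 at INT(0.6*N) nodes, silence when it falls below
    V0 at INT(0.6*N) nodes, and record the 0 -> 1 and 1 -> 0 state
    transitions as the start and end points of the sound."""
    need = int(ratio * n)                          # INT(0.6 * N)
    state, start, end = 0, None, None
    for i in range(len(samples) - n + 1):
        loud = np.count_nonzero(np.abs(samples[i:i + n]) > v0)
        if state == 0 and loud >= need:
            state, start = 1, i                    # start point detected
        elif state == 1 and (n - loud) >= need:
            state, end = 0, i + n                  # end point detected
            break
    return start, end

# Cutting off the silence at both ends:
# start, end = detect_endpoints(samples, v0=0.02, n=100)
# speech = samples[start:end] if start is not None else samples
```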
2) Fluency detection
Cut the speech into a front part and a rear part and sample each part, continuously collecting the sound signals of M time nodes; if the sound signal < V0 at all M time nodes, fluency is considered problematic and that portion of the speech is cut out; what remains after cutting is the effective speech segment. Compute the lengths of the effective speech segments of the front and rear parts, divide the smaller of the two lengths by the total length of the speech to be scored, and compare the quotient with a corresponding threshold: if it is greater than the threshold, the speech is judged fluent; otherwise it is judged disfluent;
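A minimal sketch of this fluency check follows, under the same assumptions as above; the 0.4 acceptance threshold is an assumed placeholder, since the patent does not fix the value.

```python
import numpy as np

def effective_length(part, v0, m):
    """Length of a segment after cutting out every run of at least M
    consecutive nodes whose level stays below the sound threshold V0."""
    quiet = np.abs(part) < v0
    drop, run_start = 0, None
    for i, q in enumerate(np.append(quiet, False)):  # sentinel closes a final run
        if q and run_start is None:
            run_start = i
        elif not q and run_start is not None:
            if i - run_start >= m:
                drop += i - run_start                # discard the disfluent gap
            run_start = None
    return len(part) - drop

def is_fluent(samples, v0, m, threshold=0.4):
    """Split the utterance into front and rear halves, take the smaller
    effective length, divide by the total length, compare to a threshold."""
    half = len(samples) // 2
    shorter = min(effective_length(samples[:half], v0, m),
                  effective_length(samples[half:], v0, m))
    return shorter / len(samples) > threshold
```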
3) Pre-emphasis
A high-pass filter H(z) = 1 − αz⁻¹ with a pre-emphasis coefficient α = 0.91 is adopted to compensate signal attenuation and boost the high-frequency part of the signal. The pre-emphasized signal is then framed: the frame length is 15 ms, the speech sampling rate is 11025 Hz, the frame length is 256 samples, and the frame shift is 128 samples;
the signal x(n) of each frame is smoothed using a Hamming window.
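The pre-emphasis, framing and windowing stage can be sketched as follows; the frame constants are the ones quoted above, and the helper name preprocess is illustrative.

```python
import numpy as np

def preprocess(x, alpha=0.91, frame_len=256, frame_shift=128):
    """Apply H(z) = 1 - alpha*z^-1 (alpha = 0.91), cut the signal into
    256-sample frames with a 128-sample shift (the patent quotes an
    11025 Hz sampling rate), and smooth each frame with a Hamming window."""
    # y[n] = x[n] - alpha * x[n-1]; keep the first sample unchanged
    emphasized = np.append(x[0], x[1:] - alpha * x[:-1])
    n_frames = 1 + (len(emphasized) - frame_len) // frame_shift
    window = np.hamming(frame_len)
    frames = np.stack([
        emphasized[i * frame_shift : i * frame_shift + frame_len] * window
        for i in range(n_frames)
    ])
    return frames
```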
Preferably, the first voice recognition method is as follows:
acquiring characteristic parameters from the voice information; the characteristic parameters include pitch, frequency, rate of frequency change, pitch period, gain, and band-pass unvoiced/voiced intensity;
the acquired characteristic parameters are passed through the corresponding ANN model to perform speech recognition and obtain the corresponding words and sentences.
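The patent does not specify the network, so the following is only a hedged sketch of this first recognition path: a feature vector built from the listed parameters fed through a small feed-forward ANN whose architecture and weights are illustrative placeholders.

```python
import numpy as np

def ann_recognize(features, w1, b1, w2, b2):
    """Feed a feature vector (pitch, frequency, rate of frequency change,
    pitch period, gain, band-pass unvoiced/voiced intensity) through a
    two-layer feed-forward network and return the index of the
    best-scoring word/sentence unit. Assumed shapes: features (6,),
    w1 (6, H), b1 (H,), w2 (H, K), b2 (K,) for H hidden and K output units."""
    hidden = np.tanh(features @ w1 + b1)   # hidden layer
    logits = hidden @ w2 + b2              # one score per recognizable unit
    return int(np.argmax(logits))
```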
The second voice recognition method comprises the following steps:
1) Acquire characteristic parameters from the voice information, and compute the probability of generating the observation sequence O given the model Λ.
Define the forward variable α_t(i):
α_t(i) = P{O_1, O_2, …, O_t; q_t = S_i | Λ}
that is, the probability, given the model, of having generated the partial observation symbol sequence up to time t and being in state S_i at time t;
Initialization:
α_1(i) = π_i b_i(O_1), 1 ≤ i ≤ N
where π is the initial state distribution, π = {π_i}, π_i = P[q_1 = S_i], 1 ≤ i ≤ N, and B is the observation symbol probability distribution of the states:
B = {b_j(O_k)}, b_j(O_k) = P[observation symbol at time t is O_k | q_t = S_j], 1 ≤ j ≤ N, 1 ≤ k ≤ M;
Induction:
α_{t+1}(j) = [Σ_{i=1}^{N} α_t(i) a_ij] · b_j(O_{t+1}), 1 ≤ t ≤ T−1, 1 ≤ j ≤ N;
Termination:
P(O | Λ) = Σ_{i=1}^{N} α_T(i)
where a_ij is an element of the state transition matrix and b_j(O_t) is an element of the observation symbol matrix (a code sketch of this forward computation is given after step 5 below);
2) The Baum-Welch algorithm is used to find the optimal model λ = argmax{P(O | Λ)};
3) the Viterbi algorithm solves for the optimal state transition sequence;
4) candidate syllables or initials/finals are given according to the λ corresponding to the optimal state sequence;
5) words and sentences are formed by the language model.
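The forward computation in step 1) is the standard HMM forward algorithm and can be sketched directly from the formulas above; the array names A, B and pi are the usual HMM notation, not identifiers from the patent.

```python
import numpy as np

def forward_probability(A, B, pi, obs):
    """Compute P(O | Lambda) by the forward algorithm:
    initialization  alpha_1(i) = pi_i * b_i(O_1),
    induction       alpha_{t+1}(j) = [sum_i alpha_t(i) * a_ij] * b_j(O_{t+1}),
    termination     P(O | Lambda) = sum_i alpha_T(i).
    A: (N, N) transition matrix, B: (N, M) observation matrix,
    pi: (N,) initial distribution, obs: sequence of observation indices."""
    alpha = pi * B[:, obs[0]]             # initialization
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]     # induction step
    return alpha.sum()                    # termination: P(O | Lambda)
```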
Preferably, the first speech recognition method is a large vocabulary speech recognition method based on a preset model, and the second speech recognition method is a speech recognition method based on an auxiliary speech data packet.
Preferably, the method further comprises:
a plurality of voice data packets are preset, the voice data packets are stored in the electronic equipment, the electronic equipment is connected with a processor, and the processor is connected with a server.
Preferably, the specific method for outputting the speech recognition result according to the comparison result comprises the following steps:
S1: Compare the first speech recognition result with the second speech recognition result. Coverage here refers to the proportion of exact repetition: starting from the first character, the ratio of the number of identical characters to the total number of characters. If the coverage of the two results is lower than a set threshold, perform the following steps:
judge whether the first and second speech recognition results have the same number of characters;
1) If they are the same, match them character by character and count the matches; then calculate the similarity R = Q(R1, R2)/Max(|R1|, |R2|), where Q(R1, R2) is the number of identical characters between R1 and R2, i.e. between the first and second speech recognition results, and Max(|R1|, |R2|) is the larger of the two lengths; go to S2;
2) If they are not the same, delete the irrelevant characters of both results, namely the stop characters and the consecutive identical characters, to obtain a corrected first speech recognition result and a corrected second speech recognition result. Judge whether the corrected results have the same number of characters; if so, R = Q(R1, R2)/Max(|R1|, |R2|), where Q(R1, R2) is the number of identical characters between the corrected first and second results and Max(|R1|, |R2|) is the larger of the two lengths; go to S2;
if the corrected first and second results differ in the number of characters, compare them from front to back and calculate the similarity RA:
RA = Q1(R1, R2)/Max(|R1|, |R2|),
where Q1(R1, R2) is the number of identical characters when the corrected results are compared from front to back, and Max(|R1|, |R2|) is the larger of the two lengths;
then compare the corrected results from back to front and calculate the similarity RB:
RB = Q2(R1, R2)/Max(|R1|, |R2|),
where Q2(R1, R2) is the number of identical characters when the corrected results are compared from back to front;
compare RA and RB and set R = Max(RA, RB); go to S2;
S2: If R is smaller than a specified value, discard the recognition result and resample.
It should be noted that, for simplicity of description, the foregoing method embodiments are expressed as a series of combined actions, but those skilled in the art should understand that the present application is not limited by the described order of actions, as some steps may be performed in another order or simultaneously. Further, those skilled in the art will also appreciate that the embodiments described in the specification are preferred embodiments, and that the actions and elements involved are not necessarily required by the present application.
In the foregoing embodiments, the descriptions of the embodiments are focused on, and for those portions of one embodiment that are not described in detail, reference may be made to the related descriptions of other embodiments.
Those skilled in the art will appreciate that implementing all or part of the above-described methods in the embodiments may be accomplished by computer programs stored in a computer-readable storage medium, which when executed, may include the steps of the embodiments of the methods described above. Wherein the storage medium may be a magnetic disk, an optical disk, a ROM, a RAM, etc.
The foregoing disclosure is illustrative of the present invention and is not to be construed as limiting the scope of the invention, which is defined by the appended claims.

Claims (4)

1. A method of speech recognition, the method comprising:
acquiring voice information;
recognizing the voice information by using a first voice recognition method to obtain a first voice recognition result, and recognizing the voice information by using a second voice recognition method to obtain a second voice recognition result; comparing the first voice recognition result with the second voice recognition result, outputting the voice recognition result according to the comparison result, and displaying the voice recognition result;
the specific method for outputting the voice recognition result according to the comparison result comprises the following steps:
s1: comparing the first voice recognition result with the second voice recognition result, and if the coverage rate of the first voice recognition result and the second voice recognition result is lower than a set threshold value, executing the following steps, wherein the coverage rate refers to the ratio of complete repetition, the comparison is started from the first character, and the ratio of the same character number to the total character number is compared:
judging whether the character numbers of the first voice recognition result and the second voice recognition result are the same;
1) If the first voice recognition result and the second voice recognition result are the same, matching is carried out, and the matching quantity is counted; and calculates the similarity R: r=q (R1, R2)/Max (|r1|, |r2|); q (R1, R2) represents the same number as R1, R2; i.e. the same number of first speech recognition results as the second speech recognition results; max (|r1|, |r2|) represents the maximum value of R1, R2; s2, executing;
2) If not, deleting irrelevant characters of the first voice recognition result and the second voice recognition result, including: deleting the deactivated characters and the continuous identical characters; obtaining a corrected first voice recognition result and a corrected second voice recognition result; judging whether the character number of the corrected first voice recognition result is the same as that of the corrected second voice recognition result, and if so, judging that R=Q (R1, R2)/Max (|R1|, |R2|); q (R1, R2) represents the same number as R1, R2; namely, the same number of corrected first voice recognition results and corrected second voice recognition results; max (|r1|, |r2|) represents the maximum value of R1, R2; s2, executing;
if the number of characters of the corrected first voice recognition result is different from that of the corrected second voice recognition result, comparing the corrected first voice recognition result with the corrected second voice recognition result from front to back respectively, and calculating the similarity RA:
RA=Q1(R1,R2)/Max(|R1|,|R2|);
q1 (R1, R2) representing the same number of the corrected first speech recognition result and the corrected second speech recognition result compared from front to back; max (|r1|, |r2|) represents the maximum value of R1, R2;
comparing the corrected first voice recognition result with the corrected second voice recognition result from back to front, and calculating the similarity RB:
RB=Q2(R1,R2)/Max(|R1|,|R2|);
q2 (R1, R2) representing the same number of corrected first speech recognition results as corrected second speech recognition results compared from back to front; max (|r1|, |r2|) represents the maximum value of R1, R2;
compare RA, RB, r=max (RA, RB); s2, executing;
s2: if R is smaller than the appointed value, discarding the identification result, and resampling.
2. The speech recognition method according to claim 1, wherein
after the voice information is acquired, the voice information is preprocessed;
the preprocessing method comprises fluency detection, endpoint detection, pre-emphasis, framing and windowing;
1) Endpoint detection
The endpoint detection adopts the following mode: set a time threshold T0, a time interval Δt and a sound threshold V0; acquire signals through an audio signal acquisition circuit, continuously collecting the sound signals of N time nodes, where N > T0/Δt;
if the sound signal > V0 at INT(0.6N) of the time nodes, sound is considered detected and the state is set to S = 1, where INT(·) denotes rounding; if the previous state was S = 0 when sound is detected, the start point of the sound is considered detected;
if the sound signal < V0 at INT(0.6N) of the time nodes, no sound is considered detected and the state is set to S = 0; if the previous state was S = 1, the end point of the sound is considered detected;
after endpoint detection, the silence at both ends of the sound signal is cut off;
2) Fluency detection
Cut the speech into a front part and a rear part and sample each part, continuously collecting the sound signals of M time nodes; if the sound signal < V0 at all M time nodes, fluency is considered problematic and that portion of the speech is cut out; what remains after cutting is the effective speech segment; compute the lengths of the effective speech segments of the front and rear parts, divide the smaller of the two lengths by the total length of the speech to be scored, and compare the quotient with a corresponding threshold: if it is greater than the threshold, the speech is judged fluent; otherwise it is judged disfluent;
3) Pre-emphasis
A high-pass filter H(z) = 1 − αz⁻¹ with a pre-emphasis coefficient α = 0.91 is adopted to compensate signal attenuation and boost the high-frequency part of the signal; the pre-emphasized signal is framed: the frame length is 15 ms, the speech sampling rate is 11025 Hz, the frame length is 256 samples, and the frame shift is 128 samples;
the signal x(n) of each frame is smoothed using a Hamming window.
3. A speech recognition method according to claim 1, wherein the first speech recognition method is a large vocabulary speech recognition method based on a predetermined model, and the second speech recognition method is a speech recognition method based on auxiliary speech data packets.
4. A method of speech recognition according to claim 2, wherein the method further comprises:
a plurality of voice data packets are preset and stored in electronic equipment, the electronic equipment is connected with a processor, and the processor is connected with a server.
CN201811186096.XA, filed 2018-10-11 (priority 2018-10-11): Speech recognition method; Active; granted as CN109273000B.

Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
CN201811186096.XA (CN109273000B) | 2018-10-11 | 2018-10-11 | Speech recognition method

Applications Claiming Priority (1)

Application Number | Priority Date | Filing Date | Title
CN201811186096.XA (CN109273000B) | 2018-10-11 | 2018-10-11 | Speech recognition method

Publications (2)

Publication Number | Publication Date
CN109273000A | 2019-01-25
CN109273000B | 2023-05-12

Family

ID=65196556

Family Applications (1)

Application Number | Title | Priority Date | Filing Date
CN201811186096.XA (Active; granted as CN109273000B) | Speech recognition method | 2018-10-11 | 2018-10-11

Country Status (1)

Country Link
CN (1) CN109273000B (en)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110223679A (en) * 2019-06-14 2019-09-10 南京机电职业技术学院 A kind of voice recognition input devices
CN110322883B (en) * 2019-06-27 2023-02-17 上海麦克风文化传媒有限公司 Voice-to-text effect evaluation optimization method
CN110705302B (en) * 2019-10-11 2023-12-12 掌阅科技股份有限公司 Named entity identification method, electronic equipment and computer storage medium
CN110853635B (en) * 2019-10-14 2022-04-01 广东美的白色家电技术创新中心有限公司 Speech recognition method, audio annotation method, computer equipment and storage device
CN111243607A (en) * 2020-03-26 2020-06-05 北京字节跳动网络技术有限公司 Method, apparatus, electronic device, and medium for generating speaker information
CN112364876A (en) * 2020-11-25 2021-02-12 北京紫光青藤微系统有限公司 Efficient bar code binarization method and system

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105529028A (en) * 2015-12-09 2016-04-27 百度在线网络技术(北京)有限公司 Voice analytical method and apparatus

Family Cites Families (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP4734155B2 (en) * 2006-03-24 2011-07-27 株式会社東芝 Speech recognition apparatus, speech recognition method, and speech recognition program
WO2010013371A1 (en) * 2008-07-28 2010-02-04 日本電気株式会社 Dialogue speech recognition system, dialogue speech recognition method, and recording medium for storing dialogue speech recognition program
JP2011203434A (en) * 2010-03-25 2011-10-13 Fujitsu Ltd Voice recognition device and voice recognition method
CN104143330A (en) * 2013-05-07 2014-11-12 佳能株式会社 Voice recognizing method and voice recognizing system
JP6277659B2 (en) * 2013-10-15 2018-02-14 三菱電機株式会社 Speech recognition apparatus and speech recognition method
CN106663421B (en) * 2014-07-08 2018-07-06 三菱电机株式会社 Sound recognition system and sound identification method
KR20170032096A (en) * 2015-09-14 2017-03-22 삼성전자주식회사 Electronic Device, Driving Methdo of Electronic Device, Voice Recognition Apparatus, Driving Method of Voice Recognition Apparatus, and Computer Readable Recording Medium
CN106128462A (en) * 2016-06-21 2016-11-16 东莞酷派软件技术有限公司 Audio recognition method and system
CN106328148B (en) * 2016-08-19 2019-12-31 上汽通用汽车有限公司 Natural voice recognition method, device and system based on local and cloud hybrid recognition
CN108428382A (en) * 2018-02-14 2018-08-21 广东外语外贸大学 It is a kind of spoken to repeat methods of marking and system

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105529028A (en) * 2015-12-09 2016-04-27 百度在线网络技术(北京)有限公司 Voice analytical method and apparatus

Also Published As

Publication Number | Publication Date
CN109273000A | 2019-01-25

Similar Documents

Publication Publication Date Title
CN109273000B (en) Speech recognition method
US11062699B2 (en) Speech recognition with trained GMM-HMM and LSTM models
CN103928023B (en) A kind of speech assessment method and system
CN107221318B (en) English spoken language pronunciation scoring method and system
CN111862954B (en) Method and device for acquiring voice recognition model
US20220262352A1 (en) Improving custom keyword spotting system accuracy with text-to-speech-based data augmentation
KR20230003056A (en) Speech recognition using non-speech text and speech synthesis
CN106548775B (en) Voice recognition method and system
JP2005043666A (en) Voice recognition device
CN111798840B (en) Voice keyword recognition method and device
US11335324B2 (en) Synthesized data augmentation using voice conversion and speech recognition models
Wand et al. Deep Neural Network Frontend for Continuous EMG-Based Speech Recognition.
CN110265063B (en) Lie detection method based on fixed duration speech emotion recognition sequence analysis
JP2001166789A (en) Method and device for voice recognition of chinese using phoneme similarity vector at beginning or end
CN106653002A (en) Literal live broadcasting method and platform
CN114550703A (en) Training method and device of voice recognition system, and voice recognition method and device
CN111785302A (en) Speaker separation method and device and electronic equipment
CN107492373B (en) Tone recognition method based on feature fusion
Chen et al. An end-to-end speech recognition algorithm based on attention mechanism
Sajeer et al. Novel approach of implementing speech recognition using neural networks for information retrieval
KR20210059581A (en) Method and apparatus for automatic proficiency evaluation of speech
Aşlyan Syllable Based Speech Recognition
CN111312216B (en) Voice marking method containing multiple speakers and computer readable storage medium
Yeh et al. Taiwanese speech recognition based on hybrid deep neural network architecture
Vyas et al. Study of Speech Recognition Technology and its Significance in Human-Machine Interface

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant