CN109273000B - Speech recognition method - Google Patents
- Publication number
- CN109273000B (application CN201811186096.XA)
- Authority
- CN
- China
- Prior art keywords
- voice recognition
- recognition result
- voice
- corrected
- speech recognition
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
- G—PHYSICS › G10—MUSICAL INSTRUMENTS; ACOUSTICS › G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING › G10L15/00—Speech recognition
  - G10L15/22—Procedures used during a speech recognition process, e.g. man-machine dialogue
  - G10L15/04—Segmentation; Word boundary detection
  - G10L15/06—Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS › Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES › Y02D30/70—Reducing energy consumption in wireless communication networks
Abstract
The invention discloses a speech recognition method comprising the following steps: recognizing speech information with a first speech recognition method to obtain a first speech recognition result, and recognizing the same speech information with a second speech recognition method to obtain a second speech recognition result; then comparing the first result with the second result, outputting a speech recognition result according to the comparison, and displaying it. This scheme enables effective consistency checking of the key information in the speech data and increases the success rate of speech recognition.
Description
Technical Field
The invention relates to the field of speech recognition, and in particular to a speech recognition method.
Background
Speech recognition is an active research direction, and it mainly involves the following five problems:

1. Recognition and understanding of natural language by machine. Continuous speech must first be decomposed into units such as words and phonemes, and rules for understanding the semantics must then be established.

2. The amount of speech information is large. Speech patterns differ not only between speakers but also for the same speaker: the speech of a person talking casually differs from that of the same person speaking carefully, and the way a person speaks changes over time.

3. Ambiguity of speech. Different words may sound similar when spoken, which is common in both English and Chinese.

4. The phonetic properties of individual letters, characters, and words are affected by context, which changes their accent, tone, volume, and speaking rate.

5. Environmental noise and interference seriously affect speech recognition and lead to low recognition rates.
To address these problems, researchers introduced deep learning from the machine learning field into acoustic model training for speech recognition, and multi-layer neural networks with RBM pre-training greatly improved acoustic model accuracy. Researchers at Microsoft made breakthrough progress in this respect: after they adopted deep neural network (DNN) models, the speech recognition error rate dropped by 30%, the fastest progress in speech recognition technology in the preceding 20 years. Meanwhile, most mainstream speech recognition decoders have adopted decoding networks based on weighted finite-state transducers (WFST), which integrate the language model, the dictionary, and acoustic knowledge into one large decoding network, greatly improving decoding speed and laying the foundation for real-time application of speech recognition.
Nevertheless, recognition accuracy remains a serious problem; in particular, there is no reasonable strategy for self-checking the recognition results: whatever is recognized is output directly, without any evaluation.
In order to solve the above problems, the present invention proposes a speech recognition method.
Disclosure of Invention
In order to solve the above problems, the present invention proposes a speech recognition method comprising:
recognizing the voice information by using a first voice recognition method to obtain a first voice recognition result, and recognizing the voice information by using a second voice recognition method to obtain a second voice recognition result; and
And comparing the first voice recognition result with the second voice recognition result, outputting the voice recognition result according to the comparison result, and displaying the voice recognition result.
Preferably, the voice information is preprocessed after it is acquired; the preprocessing comprises fluency detection, endpoint detection, pre-emphasis, framing, and windowing;
1) Endpoint detection
The endpoint detection adopts the following mode: set a time threshold T0, a time interval Δt, and a sound threshold V0; acquire the signal through an audio signal acquisition circuit, continuously collecting the sound samples of N time nodes, where N > T0/Δt;

if the sound signal exceeds V0 at INT(0.6N) or more time nodes, sound is considered detected and the state is set to S = 1, where INT(·) denotes rounding; if the previous state was S = 0 when sound is detected, the start point of the sound is considered found;

if the sound signal is below V0 at INT(0.6N) or more time nodes, no sound is considered detected and the state is set to S = 0; if the previous state was S = 1, the end point of the sound is considered found;

after endpoint detection, the silence at both ends of the sound signal is cut off;
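As a rough illustration, the start/end rule above can be sketched as follows (the function name, the sliding step of one node, and the use of absolute amplitude against V0 are assumptions, not specified by the patent):

```python
import numpy as np

def detect_endpoints(samples, v0, n):
    """Sliding window of N time nodes: sound starts when the signal exceeds
    V0 at INT(0.6N) or more nodes while the state is S=0, and ends when it
    falls below V0 at INT(0.6N) or more nodes while the state is S=1."""
    on = int(0.6 * n)          # INT(0.6N)
    state = 0                  # S = 0: silence, S = 1: sound
    start, end = None, None
    for i in range(len(samples) - n + 1):
        loud = np.count_nonzero(np.abs(samples[i:i + n]) > v0)
        if state == 0 and loud >= on:
            state, start = 1, i            # start point detected
        elif state == 1 and n - loud >= on:
            end = i + n                    # end point detected
            break
    return start, end
```

Trimming the silence at both ends then amounts to keeping `samples[start:end]`.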
2) Fluency detection
The voice is cut into a front part and a back part, and each part is sampled, continuously collecting the sound signals of M time nodes. If the sound signal is below V0 at all M time nodes, fluency is considered problematic, and that portion of the voice is cut out; what remains is the effective voice segment. The lengths of the effective voice segments of the front and back parts are then computed, the smaller of the two is divided by the total length of the voice to be scored, and the quotient is compared with a corresponding threshold: if it is larger than the threshold, the voice is judged fluent; otherwise it is judged not fluent;
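The length-ratio decision above can be condensed into a one-line check; the threshold value of 0.35 here is a placeholder assumption, since the patent does not state the value:

```python
def is_fluent(front_len, back_len, total_len, threshold=0.35):
    """Divide the smaller effective-segment length by the total length of
    the voice to be scored; a quotient above the threshold counts as fluent."""
    return min(front_len, back_len) / total_len > threshold
```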
3) Pre-emphasis
A high-pass filter H(z) = 1 − αz⁻¹ with a pre-emphasis coefficient α = 0.91 is adopted to compensate the attenuation of, and boost, the high-frequency part of the signal. The pre-emphasized signal is then framed, with a frame length of 15 ms, a voice sampling rate of 11025 Hz, a frame length of 256 samples, and a frame shift of 128 samples;

the signal x(n) of each frame is smoothed using a Hamming window.
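Under the stated parameters (α = 0.91, 256-sample frames, 128-sample shift), the pre-emphasis, framing, and Hamming-window steps might look like this sketch (the function name and the handling of the first sample are assumptions):

```python
import numpy as np

def preprocess(signal, alpha=0.91, frame_len=256, frame_shift=128):
    # Pre-emphasis H(z) = 1 - alpha*z^-1, i.e. y[n] = x[n] - alpha*x[n-1]
    emphasized = np.append(signal[0], signal[1:] - alpha * signal[:-1])
    # Split into overlapping frames, each smoothed by a Hamming window
    window = np.hamming(frame_len)
    n_frames = 1 + (len(emphasized) - frame_len) // frame_shift
    return np.stack([
        emphasized[i * frame_shift : i * frame_shift + frame_len] * window
        for i in range(n_frames)
    ])
```

Each row of the returned array is one windowed frame, ready for feature extraction.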
Preferably, the first voice recognition method is as follows:
acquiring characteristic parameters in the voice information; the characteristic parameters include pitch, frequency, rate of frequency change, pitch period, gain, and band pass unvoiced/voiced intensity;
the acquired voice characteristic parameters are passed through the corresponding ANN model to carry out voice recognition and obtain the corresponding words and sentences.
The second voice recognition method comprises the following steps:
1) Acquire the characteristic parameters in the voice information, and compute the probability of generating the observation sequence O given the model Λ.

Define the forward variable α_t(i):

α_t(i) = P{O_1, O_2, …, O_t, q_t = S_i | Λ}

that is, the probability of having generated the partial observation symbol sequence up to time t and being in state S_i at time t, given the model.

Initialization:

α_1(i) = π_i b_i(O_1), 1 ≤ i ≤ N

where π is the initial state distribution, π = {π_i}, π_i = P[q_1 = S_i], 1 ≤ i ≤ N, and B is the observation symbol probability distribution of the states:

B = {b_j(O_k)}, b_j(O_k) = P[observation symbol at time t is O_k | q_t = S_j], 1 ≤ j ≤ N, 1 ≤ k ≤ M.

Iterative calculation:

α_{t+1}(j) = [ Σ_{i=1}^{N} α_t(i) a_{ij} ] b_j(O_{t+1}), 1 ≤ t ≤ T−1, 1 ≤ j ≤ N

Finally calculate:

P(O | Λ) = Σ_{i=1}^{N} α_T(i)

where a_{ij} is an element of the state transition matrix and b_j(O_t) is an element of the observation symbol matrix;
2) The Baum-Welch algorithm finds the optimal solution λ=argmax { P (O|Λ) };
3) The Viterbi algorithm solves the optimal state transition sequence;
4) According to lambda corresponding to the optimal state sequence, candidate syllables or initial consonants are given;
5) Words and sentences are formed by the language model.
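The forward recursion of step 1) can be written compactly with matrix operations. This is a generic sketch of the standard algorithm, with π, A, and B assumed to be NumPy arrays and the observations given as symbol indices:

```python
import numpy as np

def forward(pi, A, B, obs):
    """P(O | Lambda) by the forward algorithm:
    alpha_1(i)     = pi_i * b_i(O_1)                        (initialization)
    alpha_{t+1}(j) = [sum_i alpha_t(i) a_ij] * b_j(O_{t+1}) (iteration)
    P(O | Lambda)  = sum_i alpha_T(i)                       (termination)"""
    alpha = pi * B[:, obs[0]]            # initialization
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]    # one induction step
    return alpha.sum()                   # termination
```

Summed over every possible observation sequence of a fixed length, these probabilities add up to 1, which is a convenient sanity check.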
Preferably, the first speech recognition method is a large vocabulary speech recognition method based on a preset model, and the second speech recognition method is a speech recognition method based on an auxiliary speech data packet.
Preferably, the method further comprises:
a plurality of voice data packets are preset and stored in the electronic device; the electronic device is connected to a processor, and the processor is connected to a server.
Preferably, the specific method for outputting the voice recognition result according to the comparison result comprises the following steps:

S1: Compare the first voice recognition result with the second voice recognition result. The coverage rate is the ratio of exact repetition: starting from the first character, the number of identical characters is compared against the total number of characters. If the coverage rate of the two results is lower than a set threshold, execute the following steps:

judge whether the numbers of characters of the first voice recognition result and the second voice recognition result are the same;

1) If they are the same, match the two results character by character and count the number of matches, then calculate the similarity R = Q(R1, R2)/Max(|R1|, |R2|), where Q(R1, R2) is the number of identical characters of the first and second voice recognition results and Max(|R1|, |R2|) is the larger of the two lengths; then execute S2;

2) If they are not the same, delete the irrelevant characters of both results, namely the stop characters and consecutive identical characters, to obtain a corrected first voice recognition result and a corrected second voice recognition result. Judge whether the two corrected results have the same number of characters; if so, R = Q(R1, R2)/Max(|R1|, |R2|), where Q(R1, R2) is the number of identical characters of the corrected first and second voice recognition results and Max(|R1|, |R2|) is the larger of the two lengths; then execute S2;

if the numbers of characters of the corrected results differ, compare the corrected first voice recognition result with the corrected second voice recognition result aligned from front to back and calculate the similarity RA:

RA = Q1(R1, R2)/Max(|R1|, |R2|);

where Q1(R1, R2) is the number of matching characters when the corrected results are compared from front to back;

then compare the corrected results aligned from back to front and calculate the similarity RB:

RB = Q2(R1, R2)/Max(|R1|, |R2|);

where Q2(R1, R2) is the number of matching characters when the corrected results are compared from back to front;

take R = Max(RA, RB); then execute S2;

S2: If R is smaller than a specified value, discard the recognition result and resample.
The invention has the beneficial effects that:
1) The correct recognition rate can be effectively improved;
2) Misjudgments are effectively prevented, and the output of erroneous results is automatically avoided.
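The S1/S2 comparison can be sketched as follows; the stop-character set and the tie handling are assumptions, since the patent names "stop characters" without listing them:

```python
def similarity(r1, r2, stop_chars=set(" ,.?!")):
    """Similarity R between two recognition results, following S1/S2:
    equal lengths -> position-wise match ratio; unequal lengths -> clean
    both results, then take the best of the front-aligned (RA) and
    back-aligned (RB) match ratios."""
    def clean(s):
        out = []
        for ch in s:
            if ch in stop_chars or (out and out[-1] == ch):
                continue          # drop stop chars and consecutive repeats
            out.append(ch)
        return "".join(out)

    if len(r1) != len(r2):
        r1, r2 = clean(r1), clean(r2)
    m = max(len(r1), len(r2))
    if len(r1) == len(r2):
        return sum(a == b for a, b in zip(r1, r2)) / m
    ra = sum(a == b for a, b in zip(r1, r2)) / m              # front to back
    rb = sum(a == b for a, b in zip(r1[::-1], r2[::-1])) / m  # back to front
    return max(ra, rb)
```

A pair of results whose similarity R falls below the specified value would then be discarded and the audio resampled.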
Detailed Description
Specific embodiments of the present invention will now be described in order to provide a clearer understanding of the technical features, objects and effects of the present invention.
It should be noted that, for simplicity of description, the foregoing method embodiments are described as a series of combinations of actions, but those skilled in the art should understand that the present application is not limited by the described order of actions, as some steps may be performed in another order or simultaneously. Further, those skilled in the art will also appreciate that the embodiments described in the specification are all preferred embodiments, and that the actions and elements involved are not necessarily required by the present application.
In the foregoing embodiments, the descriptions of the embodiments are focused on, and for those portions of one embodiment that are not described in detail, reference may be made to the related descriptions of other embodiments.
Those skilled in the art will appreciate that implementing all or part of the above-described methods in the embodiments may be accomplished by computer programs stored in a computer-readable storage medium, which when executed, may include the steps of the embodiments of the methods described above. Wherein the storage medium may be a magnetic disk, an optical disk, a ROM, a RAM, etc.
The foregoing disclosure is illustrative of the present invention and is not to be construed as limiting the scope of the invention, which is defined by the appended claims.
Claims (4)
1. A method of speech recognition, the method comprising:
acquiring voice information;
recognizing the voice information by using a first voice recognition method to obtain a first voice recognition result, and recognizing the voice information by using a second voice recognition method to obtain a second voice recognition result; comparing the first voice recognition result with the second voice recognition result, outputting the voice recognition result according to the comparison result, and displaying the voice recognition result;
the specific method for outputting the voice recognition result according to the comparison result comprises the following steps:
s1: comparing the first voice recognition result with the second voice recognition result, and if the coverage rate of the first voice recognition result and the second voice recognition result is lower than a set threshold value, executing the following steps, wherein the coverage rate refers to the ratio of complete repetition, the comparison is started from the first character, and the ratio of the same character number to the total character number is compared:
judging whether the character numbers of the first voice recognition result and the second voice recognition result are the same;
1) If the first voice recognition result and the second voice recognition result are the same, matching is carried out, and the matching quantity is counted; and calculates the similarity R: r=q (R1, R2)/Max (|r1|, |r2|); q (R1, R2) represents the same number as R1, R2; i.e. the same number of first speech recognition results as the second speech recognition results; max (|r1|, |r2|) represents the maximum value of R1, R2; s2, executing;
2) If not, deleting irrelevant characters of the first voice recognition result and the second voice recognition result, including: deleting the deactivated characters and the continuous identical characters; obtaining a corrected first voice recognition result and a corrected second voice recognition result; judging whether the character number of the corrected first voice recognition result is the same as that of the corrected second voice recognition result, and if so, judging that R=Q (R1, R2)/Max (|R1|, |R2|); q (R1, R2) represents the same number as R1, R2; namely, the same number of corrected first voice recognition results and corrected second voice recognition results; max (|r1|, |r2|) represents the maximum value of R1, R2; s2, executing;
if the number of characters of the corrected first voice recognition result is different from that of the corrected second voice recognition result, comparing the corrected first voice recognition result with the corrected second voice recognition result from front to back respectively, and calculating the similarity RA:
RA=Q1(R1,R2)/Max(|R1|,|R2|);
q1 (R1, R2) representing the same number of the corrected first speech recognition result and the corrected second speech recognition result compared from front to back; max (|r1|, |r2|) represents the maximum value of R1, R2;
comparing the corrected first voice recognition result with the corrected second voice recognition result from back to front, and calculating the similarity RB:
RB=Q2(R1,R2)/Max(|R1|,|R2|);
q2 (R1, R2) representing the same number of corrected first speech recognition results as corrected second speech recognition results compared from back to front; max (|r1|, |r2|) represents the maximum value of R1, R2;
compare RA, RB, r=max (RA, RB); s2, executing;
s2: if R is smaller than the appointed value, discarding the identification result, and resampling.
2. A method for speech recognition according to claim 1,
after voice information is acquired, preprocessing the voice information;
the preprocessing method comprises fluency detection, endpoint detection, pre-emphasis, framing and windowing;
1) Endpoint detection
The endpoint detection adopts the following mode: set a time threshold T0, a time interval Δt, and a sound threshold V0; acquire the signal through an audio signal acquisition circuit, continuously collecting the sound samples of N time nodes, where N > T0/Δt;

if the sound signal exceeds V0 at INT(0.6N) or more time nodes, sound is considered detected and the state is set to S = 1, where INT(·) denotes rounding; if the previous state was S = 0 when sound is detected, the start point of the sound is considered found;

if the sound signal is below V0 at INT(0.6N) or more time nodes, no sound is considered detected and the state is set to S = 0; if the previous state was S = 1, the end point of the sound is considered found;

after endpoint detection, the silence at both ends of the sound signal is cut off;

2) Fluency detection

The voice is cut into a front part and a back part, and each part is sampled, continuously collecting the sound signals of M time nodes. If the sound signal is below V0 at all M time nodes, fluency is considered problematic, and that portion of the voice is cut out; what remains is the effective voice segment. The lengths of the effective voice segments of the front and back parts are then computed, the smaller of the two is divided by the total length of the voice to be scored, and the quotient is compared with a corresponding threshold: if it is larger than the threshold, the voice is judged fluent; otherwise it is judged not fluent;
3) Pre-emphasis
A high-pass filter H(z) = 1 − αz⁻¹ with a pre-emphasis coefficient α = 0.91 is adopted to compensate the attenuation of, and boost, the high-frequency part of the signal. The pre-emphasized signal is then framed, with a frame length of 15 ms, a voice sampling rate of 11025 Hz, a frame length of 256 samples, and a frame shift of 128 samples;
the signal x (n) of each frame is smoothed using a hamming window.
3. A speech recognition method according to claim 1, wherein the first speech recognition method is a large vocabulary speech recognition method based on a predetermined model, and the second speech recognition method is a speech recognition method based on auxiliary speech data packets.
4. A method of speech recognition according to claim 2, wherein the method further comprises:
a plurality of voice data packets are preset and stored in electronic equipment, the electronic equipment is connected with a processor, and the processor is connected with a server.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title
---|---|---|---
CN201811186096.XA CN109273000B (en) | 2018-10-11 | 2018-10-11 | Speech recognition method
Publications (2)
Publication Number | Publication Date |
---|---|
CN109273000A CN109273000A (en) | 2019-01-25 |
CN109273000B true CN109273000B (en) | 2023-05-12 |
Family
ID=65196556
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201811186096.XA Active CN109273000B (en) | 2018-10-11 | 2018-10-11 | Speech recognition method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109273000B (en) |
Families Citing this family (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110223679A (en) * | 2019-06-14 | 2019-09-10 | 南京机电职业技术学院 | A kind of voice recognition input devices |
CN110322883B (en) * | 2019-06-27 | 2023-02-17 | 上海麦克风文化传媒有限公司 | Voice-to-text effect evaluation optimization method |
CN110705302B (en) * | 2019-10-11 | 2023-12-12 | 掌阅科技股份有限公司 | Named entity identification method, electronic equipment and computer storage medium |
CN110853635B (en) * | 2019-10-14 | 2022-04-01 | 广东美的白色家电技术创新中心有限公司 | Speech recognition method, audio annotation method, computer equipment and storage device |
CN111243607A (en) * | 2020-03-26 | 2020-06-05 | 北京字节跳动网络技术有限公司 | Method, apparatus, electronic device, and medium for generating speaker information |
CN112364876A (en) * | 2020-11-25 | 2021-02-12 | 北京紫光青藤微系统有限公司 | Efficient bar code binarization method and system |
Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105529028A (en) * | 2015-12-09 | 2016-04-27 | 百度在线网络技术(北京)有限公司 | Voice analytical method and apparatus |
Family Cites Families (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP4734155B2 (en) * | 2006-03-24 | 2011-07-27 | 株式会社東芝 | Speech recognition apparatus, speech recognition method, and speech recognition program |
WO2010013371A1 (en) * | 2008-07-28 | 2010-02-04 | 日本電気株式会社 | Dialogue speech recognition system, dialogue speech recognition method, and recording medium for storing dialogue speech recognition program |
JP2011203434A (en) * | 2010-03-25 | 2011-10-13 | Fujitsu Ltd | Voice recognition device and voice recognition method |
CN104143330A (en) * | 2013-05-07 | 2014-11-12 | 佳能株式会社 | Voice recognizing method and voice recognizing system |
JP6277659B2 (en) * | 2013-10-15 | 2018-02-14 | 三菱電機株式会社 | Speech recognition apparatus and speech recognition method |
CN106663421B (en) * | 2014-07-08 | 2018-07-06 | 三菱电机株式会社 | Sound recognition system and sound identification method |
KR20170032096A (en) * | 2015-09-14 | 2017-03-22 | 삼성전자주식회사 | Electronic Device, Driving Methdo of Electronic Device, Voice Recognition Apparatus, Driving Method of Voice Recognition Apparatus, and Computer Readable Recording Medium |
CN106128462A (en) * | 2016-06-21 | 2016-11-16 | 东莞酷派软件技术有限公司 | Audio recognition method and system |
CN106328148B (en) * | 2016-08-19 | 2019-12-31 | 上汽通用汽车有限公司 | Natural voice recognition method, device and system based on local and cloud hybrid recognition |
CN108428382A (en) * | 2018-02-14 | 2018-08-21 | 广东外语外贸大学 | Spoken-language repetition scoring method and system
- 2018-10-11: CN application CN201811186096.XA filed; granted as CN109273000B (status: Active)
Also Published As
Publication number | Publication date |
---|---|
CN109273000A (en) | 2019-01-25 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109273000B (en) | Speech recognition method | |
US11062699B2 (en) | Speech recognition with trained GMM-HMM and LSTM models | |
CN103928023B (en) | A kind of speech assessment method and system | |
CN107221318B (en) | English spoken language pronunciation scoring method and system | |
CN111862954B (en) | Method and device for acquiring voice recognition model | |
US20220262352A1 (en) | Improving custom keyword spotting system accuracy with text-to-speech-based data augmentation | |
KR20230003056A (en) | Speech recognition using non-speech text and speech synthesis | |
CN106548775B (en) | Voice recognition method and system | |
JP2005043666A (en) | Voice recognition device | |
CN111798840B (en) | Voice keyword recognition method and device | |
US11335324B2 (en) | Synthesized data augmentation using voice conversion and speech recognition models | |
Wand et al. | Deep Neural Network Frontend for Continuous EMG-Based Speech Recognition. | |
CN110265063B (en) | Lie detection method based on fixed duration speech emotion recognition sequence analysis | |
JP2001166789A (en) | Method and device for voice recognition of chinese using phoneme similarity vector at beginning or end | |
CN106653002A (en) | Literal live broadcasting method and platform | |
CN114550703A (en) | Training method and device of voice recognition system, and voice recognition method and device | |
CN111785302A (en) | Speaker separation method and device and electronic equipment | |
CN107492373B (en) | Tone recognition method based on feature fusion | |
Chen et al. | An end-to-end speech recognition algorithm based on attention mechanism | |
Sajeer et al. | Novel approach of implementing speech recognition using neural networks for information retrieval | |
KR20210059581A (en) | Method and apparatus for automatic proficiency evaluation of speech | |
Aşlyan | Syllable Based Speech Recognition | |
CN111312216B (en) | Voice marking method containing multiple speakers and computer readable storage medium | |
Yeh et al. | Taiwanese speech recognition based on hybrid deep neural network architecture | |
Vyas et al. | Study of Speech Recognition Technology and its Significance in Human-Machine Interface |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||