CN112562646A - Robot voice recognition method - Google Patents
- Publication number: CN112562646A (application CN202011447106.8A)
- Authority: CN (China)
- Prior art keywords: probability, matching, voice, characteristic information, signal
- Legal status: Granted (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- G10L15/02 — Feature extraction for speech recognition; Selection of recognition unit
- G10L15/142 — Hidden Markov Models [HMMs]
- G10L15/16 — Speech classification or search using artificial neural networks
- G10L15/26 — Speech to text systems
- G10L25/51 — Speech or voice analysis techniques specially adapted for comparison or discrimination
Abstract
The invention discloses a robot voice recognition method comprising the following steps: acquiring a voice signal; extracting static characteristic information and dynamic characteristic information from the voice signal; and performing voice matching on the static and dynamic characteristic information. When the voice signal is matched for the first time, the maximum probability is obtained from the matching output probabilities of all corresponding paths and compared with a confidence probability: when the maximum probability is greater than the confidence probability, the output content of the corresponding path is output; when the maximum probability is smaller than the confidence probability, the acquired voice signal is subjected to tone-removal processing and the tone-removed signal is matched a second time. When the voice signal is matched for the second time, the output content of the corresponding path with the highest matching output probability is output. The invention introduces fuzzy recognition technology, can recognize part of the effective information of a signal even when voice signals have similar pronunciations, and improves recognition accuracy.
Description
Technical Field
The invention relates to the field of robots, in particular to a robot voice recognition method.
Background
With the development of society, voice recognition technology has matured and is now widely applied in daily life. People are accustomed to completing all kinds of tasks through human-computer interaction, which enriches everyday experience and brings great convenience. It can be said that speech recognition technology is ubiquitous in our lives. In most circumstances, speech recognition achieves high accuracy and satisfies the demand for human-computer interaction, but in different environments its accuracy is affected to varying degrees. Traditional speech recognition technology cannot handle every special circumstance, so the data must be optimized during the recognition process.
Existing voice recognition technology mainly preprocesses an input signal, extracts features, matches and compares them against an existing acoustic model, and outputs a final recognition result. In a complex environment, existing speech recognition is strongly affected or even unusable; for example, in a noisy public environment, noise interferes with the signal recognition process. This is a major problem that currently needs to be solved.
Disclosure of Invention
The invention aims to provide a robot voice recognition method that solves the technical problem in the prior art of voice recognition accuracy being reduced by external noise.
To achieve this purpose, the invention adopts the following technical scheme:
A robot voice recognition method includes:
Step 1: acquiring a voice signal;
Step 2: preprocessing the voice signal and extracting static characteristic information;
Step 3: acquiring dynamic characteristic information from the static characteristic information through a difference algorithm;
Step 4: performing voice matching on the static characteristic information and the dynamic characteristic information using a hidden Markov model (HMM), and obtaining the matching output probabilities of all corresponding paths from the HMM through the Viterbi algorithm;
when the voice signal is matched for the first time:
the maximum probability is obtained from the matching output probabilities of all corresponding paths and compared with the confidence probability;
when the maximum probability is greater than the confidence probability, the output content of the corresponding path is output as the recognized content;
when the maximum probability is smaller than the confidence probability, the voice signal obtained in step 1 is subjected to tone-removal processing, and steps 2-4 are performed on the tone-removed voice signal for a second match;
when the voice signal is matched for the second time:
the output content of the corresponding path with the highest matching output probability is output as the recognized content.
Further, the confidence probability in step 4 is calculated as:

$$P_c = \frac{1}{M} \sum_{i=1}^{M} P_i$$

where $P_c$ is the confidence probability, $M$ is the total number of successful first matches, and $P_i$ is the maximum probability value of the i-th successful first match.
Further, preprocessing the voice signal in step 2 includes pre-emphasis, framing, and windowing, performed in sequence.
Further, in step 3, FFT, Mel filtering, and DCT processing are performed in sequence on the static feature information, and difference calculation is then performed on the DCT result to obtain the dynamic feature information.
Further, in step 4, an artificial neural network (ANN) is combined with the hidden Markov model (HMM) to perform speech matching on the static feature information and the dynamic feature information.
Compared with the prior art, the invention has the following beneficial effects:
1. The invention can recognize part of the effective information of a signal even when voice signals have similar pronunciations, improving recognition accuracy to a certain extent.
2. By combining the ANN with the HMM, the interference of noise on voice signals is reduced to a certain extent, and the anti-interference capability of the whole system is improved.
3. After the fuzzy recognition technology is introduced, the success rate of voice recognition in a complex environment can be improved to a certain extent.
Drawings
The features and advantages of the present invention will be more clearly understood by reference to the accompanying drawings, which are illustrative and not to be construed as limiting the invention in any way, and in which:
FIG. 1 is a flow chart of an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
As shown in fig. 1, a specific embodiment of the present invention provides a robot speech recognition method, including:
step S1: acquiring a voice signal;
step S2: carrying out pre-emphasis, framing and windowing on the voice signals in sequence, and extracting static characteristic information from the processed voice signals;
the pre-emphasis part is used for leading the signal to pass through a high-pass filter, the pre-emphasis aims to promote the high-frequency part, so that the frequency spectrum of the signal becomes flat, the signal is kept in the whole frequency band from low frequency to high frequency, and the frequency spectrum can be obtained by the same signal-to-noise ratio. Meanwhile, the method is also used for eliminating the vocal cords and lip effects in the generation process, compensating the high-frequency part of the voice signal which is restrained by the pronunciation system, and highlighting the formants of the high frequency.
Since the subsequent fast Fourier transform requires a stationary signal, the signal must be framed so that each local segment of the speech signal can be considered stationary. However, discontinuities appear at the beginning and end of each frame, so the more frames the signal is divided into, the larger the error relative to the original signal. Windowing solves this problem: it makes the framed signal continuous again, so that each frame exhibits the characteristics of a periodic function. In speech signal processing, a Hamming window is usually applied.
Each frame is multiplied by a Hamming window to increase the continuity of its left and right ends. Assuming the framed signal is $S(n)$, $n = 0, 1, \ldots, N-1$, where $N$ is the frame size, the windowed signal is $S'(n) = S(n) \times W(n)$, with

$$W(n) = (1 - a) - a \cos\left(\frac{2\pi n}{N-1}\right), \qquad 0 \le n \le N-1.$$

Different values of $a$ produce different Hamming windows; typically $a = 0.46$.
Step S3: performing FFT, Mel filtering and DCT processing on the static characteristic information in sequence, and performing differential calculation on the output result of the DCT to obtain dynamic characteristic information;
since the signal is usually difficult to see by the transformation in the time domain, it is usually observed by transforming it into an energy distribution in the frequency domain, and different energy distributions can represent the characteristics of different voices. After multiplication by the hamming window, each frame must also undergo a fast fourier transform to obtain the energy distribution over the spectrum. And carrying out fast Fourier transform on each frame signal subjected to framing and windowing to obtain the frequency spectrum of each frame. And the power spectrum of the voice signal is obtained by taking the modulus square of the frequency spectrum of the voice signal.
The role of the Mel filter bank is mainly to reduce the amplitude of the frequency domain and reduce the redundant part of the spectrum. The amplitude spectrum obtained by the FFT is multiplied by each filter and accumulated; the resulting value is the energy of the frame in the frequency band corresponding to that filter. If the number of filters is 22, then 22 energy values are obtained. Taking the logarithm of these energy values facilitates the subsequent cepstral analysis.
The frequency response of the m-th Mel triangular filter is defined as:

$$H_m(k) = \begin{cases} 0, & k < f(m-1) \\ \dfrac{k - f(m-1)}{f(m) - f(m-1)}, & f(m-1) \le k \le f(m) \\ \dfrac{f(m+1) - k}{f(m+1) - f(m)}, & f(m) \le k \le f(m+1) \\ 0, & k > f(m+1) \end{cases}$$

where $f(m)$ is the center frequency of the m-th filter.
the triangular band pass filter has two main purposes:
the method smoothes the frequency spectrum, eliminates the effect of harmonic wave, and highlights the formant of the original voice, so that the tone or pitch of a section of voice is not presented in the MFCC parameters, in other words, the voice recognition system using MFCC as the characteristic is not influenced by the tone difference of the input voice, and in addition, the operation amount can be reduced.
The DCT is often used in signal and image processing for lossy compression of signals and images, because it has a strong "energy-concentrating" property: for most natural signals, including sound and images, the energy is concentrated in the low-frequency part after the discrete cosine transform. In effect, each frame of data undergoes one round of dimensionality reduction.
The log energies $s(m)$ are substituted into the discrete cosine transform to obtain Mel-scale cepstrum parameters of order $L$:

$$C(n) = \sum_{m=1}^{M} s(m) \cos\left(\frac{\pi n (m - 0.5)}{M}\right), \qquad n = 1, 2, \ldots, L,$$

where $L$ is the order of the MFCC coefficients, usually 12-16, and $M$ is the number of triangular filters.
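The same computation written out directly from the formula above; the order L = 13 is an assumed value within the 12-16 range mentioned:

```python
import numpy as np

def mfcc_from_log_energies(log_energies, n_ceps=13):
    """DCT of the log filter-bank energies, per frame:
    C(n) = sum_m s(m) * cos(pi * n * (m - 0.5) / M)."""
    n_frames, M = log_energies.shape
    n = np.arange(1, n_ceps + 1)[:, None]      # cepstral index n = 1..L
    m = np.arange(1, M + 1)[None, :]           # filter index m = 1..M
    basis = np.cos(np.pi * n * (m - 0.5) / M)  # cosine basis from the formula
    return log_energies @ basis.T              # shape (n_frames, n_ceps)
```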
The standard cepstral parameters (MFCCs) only reflect the static characteristics of the speech; the dynamic characteristics of speech can be described by the differential spectrum of these static features. Experiments show that combining dynamic and static features effectively improves the recognition performance of the system. The difference parameters can be calculated with the following formula:

$$d_t = \begin{cases} C_{t+1} - C_t, & t < K \\[4pt] \dfrac{\sum_{k=1}^{K} k\,(C_{t+k} - C_{t-k})}{\sqrt{2 \sum_{k=1}^{K} k^2}}, & K \le t < Q - K \\[8pt] C_t - C_{t-1}, & t \ge Q - K \end{cases}$$

where $d_t$ denotes the t-th first-order difference, $C_t$ the t-th cepstral coefficient, $Q$ the order of the cepstral coefficients, and $K$ the time difference of the first derivative, which may be 1 or 2. Substituting the result of this expression back into it yields the second-order difference parameters. This completes the extraction of the dynamic characteristic information of the voice signal.
Step S4: combining an artificial neural network ANN with a hidden Markov model HMM to perform voice matching on static characteristic information and dynamic characteristic information, acquiring matching output probabilities of all corresponding paths from the hidden Markov model HMM through a Viterbi algorithm,
when the speech signal is a first match:
and obtaining the maximum probability from the matching output probabilities of all the corresponding paths and comparing the maximum probability with the confidence probability, wherein the calculation formula of the confidence probability is as follows:
wherein, PcIs the confidence probability; m is the total number of successful first matching; piThe maximum probability value of successful first matching of the ith time;
when the maximum probability is greater than the confidence probability, outputting the output content of the corresponding path corresponding to the maximum probability, wherein the content is the identified content;
when the maximum probability is smaller than the confidence probability, performing tone removing processing on the voice signal obtained in the step 1, and performing step 2-4 on the tone removed voice signal for second matching;
when the speech signal is a second match:
and outputting the output content of the corresponding path with the highest probability in the matching output probabilities of all the corresponding paths, wherein the content is the identified content.
For the process of matching the voice information, the invention introduces the concept of a fuzzy algorithm. Due to the influence of external noise, the matching output probability of the corresponding path is generally low when voice signals are matched, and a high matching probability is difficult to achieve. A confidence probability is therefore introduced, calculated as:

$$P_c = \frac{1}{M} \sum_{i=1}^{M} P_i$$

where $P_c$ is the confidence probability, $M$ is the total number of successful first matches, and $P_i$ is the maximum probability value of the i-th successful first match.
The confidence probability is a long-term accumulated value. Its initial value, set at the initial stage of robot voice recognition, is the average of the matching output probabilities of all corresponding paths in the first matching process; thereafter it is updated in real time according to the formula above, so as to adapt to voice recognition in a noisy environment. When the first match of the voice information is unsuccessful, that is, when the maximum probability obtained from the matching output probabilities of all corresponding paths is smaller than the confidence probability, the voice signal is subjected to tone-removal processing, and the tone-removed voice signal goes through steps 2-4 for the second match. External noise may affect information related to the voice signal, including its tone, thereby reducing the matching probability; removing the tone and matching the voice signal again therefore improves the probability and accuracy of matching. When the voice signal is matched for the second time, the output content of the corresponding path with the highest matching output probability is output as the recognized content. Removing the tone only on the second pass, rather than directly on the first, avoids the loss of accuracy that would occur if tone removal lowered the output probability of the best-matching path.
Although the embodiments of the present invention have been described in conjunction with the accompanying drawings, those skilled in the art may make various modifications and variations without departing from the spirit and scope of the invention, and such modifications and variations fall within the scope defined by the appended claims.
Claims (5)
1. A robot voice recognition method, comprising:
Step 1: acquiring a voice signal;
Step 2: preprocessing the voice signal and extracting static characteristic information;
Step 3: acquiring dynamic characteristic information from the static characteristic information through a difference algorithm;
Step 4: performing voice matching on the static characteristic information and the dynamic characteristic information using a hidden Markov model (HMM), and obtaining the matching output probabilities of all corresponding paths from the HMM through the Viterbi algorithm;
when the voice signal is matched for the first time:
obtaining the maximum probability from the matching output probabilities of all corresponding paths and comparing it with the confidence probability;
when the maximum probability is greater than the confidence probability, outputting the output content of the corresponding path as the recognized content;
when the maximum probability is smaller than the confidence probability, performing tone-removal processing on the voice signal obtained in step 1, and performing steps 2-4 on the tone-removed voice signal for a second match;
when the voice signal is matched for the second time:
outputting the output content of the corresponding path with the highest matching output probability as the recognized content.
2. The robot speech recognition method according to claim 1, wherein the confidence probability in step 4 is calculated as:

$$P_c = \frac{1}{M} \sum_{i=1}^{M} P_i$$

where $P_c$ is the confidence probability, $M$ is the total number of successful first matches, and $P_i$ is the maximum probability value of the i-th successful first match.
3. The robot speech recognition method of claim 1, wherein the pre-processing of the speech signal in step 2 comprises pre-emphasis, framing, and windowing in sequence.
4. The robot speech recognition method according to claim 1, wherein in step 3, the static feature information is subjected to FFT, mel filtering, and DCT processing in sequence, and then the processing result of the DCT is subjected to difference calculation to obtain the dynamic feature information.
5. The robot speech recognition method according to claim 1, wherein in step 4, the artificial neural network (ANN) is combined with a hidden Markov model (HMM) to perform speech matching on the static feature information and the dynamic feature information.
Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
---|---|---|---
CN202011447106.8A (CN112562646B) | 2020-12-09 | 2020-12-09 | Robot voice recognition method
Publications (2)
Publication Number | Publication Date |
---|---|
CN112562646A true CN112562646A (en) | 2021-03-26 |
CN112562646B CN112562646B (en) | 2024-08-02 |
Family
ID=75061414
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202011447106.8A Active CN112562646B (en) | 2020-12-09 | 2020-12-09 | Robot voice recognition method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112562646B (en) |
Patent Citations (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN1343350A (en) * | 1999-11-11 | 2002-04-03 | 皇家菲利浦电子有限公司 | Tone features for speech recognition |
US20100179812A1 (en) * | 2009-01-14 | 2010-07-15 | Samsung Electronics Co., Ltd. | Signal processing apparatus and method of recognizing a voice command thereof |
CN103065629A (en) * | 2012-11-20 | 2013-04-24 | 广东工业大学 | Speech recognition system of humanoid robot |
CN102945673A (en) * | 2012-11-24 | 2013-02-27 | 安徽科大讯飞信息科技股份有限公司 | Continuous speech recognition method with speech command range changed dynamically |
KR20170083320A (en) * | 2016-01-08 | 2017-07-18 | 현대자동차주식회사 | Vehicle and control method for the same |
CN108182937A (en) * | 2018-01-17 | 2018-06-19 | 出门问问信息科技有限公司 | Keyword recognition method, device, equipment and storage medium |
CN109036381A (en) * | 2018-08-08 | 2018-12-18 | 平安科技(深圳)有限公司 | Method of speech processing and device, computer installation and readable storage medium storing program for executing |
CN109243460A (en) * | 2018-08-15 | 2019-01-18 | 浙江讯飞智能科技有限公司 | A method of automatically generating news or interrogation record based on the local dialect |
CN109872714A (en) * | 2019-01-25 | 2019-06-11 | 广州富港万嘉智能科技有限公司 | A kind of method, electronic equipment and storage medium improving accuracy of speech recognition |
CN110503952A (en) * | 2019-07-29 | 2019-11-26 | 北京搜狗科技发展有限公司 | A kind of method of speech processing, device and electronic equipment |
Non-Patent Citations (1)

Title |
---|
Zhao Zhendong; Hu Ximei; Tian Jingfeng: "Speaker Recognition System Based on VQ-SVM" (基于VQ-SVM的说话人识别系统), Journal of North China Electric Power University (Natural Science Edition), no. 05, 30 September 2009 (2009-09-30) * |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114689298A (en) * | 2022-03-24 | 2022-07-01 | 三一重型装备有限公司 | Fault detection method and device for walking part of shearer |
Also Published As
Publication number | Publication date |
---|---|
CN112562646B (en) | 2024-08-02 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
EP3933829B1 (en) | Speech processing method and apparatus, electronic device, and computer-readable storage medium | |
KR100908121B1 (en) | Speech feature vector conversion method and apparatus | |
RU2329550C2 (en) | Method and device for enhancement of voice signal in presence of background noise | |
CN108447495B (en) | A Deep Learning Speech Enhancement Method Based on Comprehensive Feature Set | |
JP5230103B2 (en) | Method and system for generating training data for an automatic speech recognizer | |
CN110619885A (en) | Method for generating confrontation network voice enhancement based on deep complete convolution neural network | |
TW201935464A (en) | Method and device for voiceprint recognition based on memorability bottleneck features | |
CN111508498B (en) | Conversational speech recognition method, conversational speech recognition system, electronic device, and storage medium | |
CN109767756B (en) | A Voice Feature Extraction Algorithm Based on Dynamic Segmentation Inverse Discrete Cosine Transform Cepstral Coefficients | |
CN112530410B (en) | Command word recognition method and device | |
Labied et al. | An overview of automatic speech recognition preprocessing techniques | |
CN111798846A (en) | Voice command word recognition method and device, conference terminal and conference terminal system | |
Ali et al. | Speech enhancement using dilated wave-u-net: an experimental analysis | |
CN114283835A (en) | Voice enhancement and detection method suitable for actual communication condition | |
Tu et al. | DNN training based on classic gain function for single-channel speech enhancement and recognition | |
CN114550741A (en) | Semantic recognition method and system | |
CN112562646B (en) | Robot voice recognition method | |
CN111681649B (en) | Speech recognition method, interactive system and performance management system including the system | |
KR20080077874A (en) | Speech feature vector extraction apparatus and method and speech recognition system and method employing same | |
Kaur et al. | Optimizing feature extraction techniques constituting phone based modelling on connected words for Punjabi automatic speech recognition | |
CN111833869B (en) | Voice interaction method and system applied to urban brain | |
Gao et al. | DNN Speech Separation Algorithm Based on Improved Segmented Masking Target | |
Xie et al. | New research on monaural speech segregation based on quality assessment | |
CN119418712A (en) | A noise reduction method for real-time speech at the edge | |
Zilvan et al. | Robust Features with Convolutional Autoencoder Speech Command Recognition |
Legal Events
Date | Code | Title | Description |
---|---|---|---
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
GR01 | Patent grant | |