CN110853629A - Speech recognition digital method based on deep learning - Google Patents
- Publication number
- CN110853629A CN110853629A CN201911149493.4A CN201911149493A CN110853629A CN 110853629 A CN110853629 A CN 110853629A CN 201911149493 A CN201911149493 A CN 201911149493A CN 110853629 A CN110853629 A CN 110853629A
- Authority
- CN
- China
- Prior art keywords
- pinyin
- digital
- chinese
- ctc
- deep learning
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
All classifications fall under G—PHYSICS, G10—MUSICAL INSTRUMENTS; ACOUSTICS, G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING, G10L15/00—Speech recognition:
- G10L15/063: Training (creation of reference templates; training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice)
- G10L15/16: Speech classification or search using artificial neural networks
- G10L15/22: Procedures used during a speech recognition process, e.g. man-machine dialogue
- G10L2015/225: Feedback of the input speech
Abstract
The invention discloses a deep-learning-based method for recognizing digits in speech. Toneless Chinese pinyin is used as the modeling unit of the acoustic model to construct an end-to-end deep neural network from speech to pinyin. The network is modeled with a CNN + CTC structure, and at the CTC decoding stage a digit-pinyin constraint is innovatively added on top of the CTC maximum-decoding algorithm. This greatly reduces the CTC decoding search space, so that spoken digits can be recognized efficiently and accurately.
Description
Technical Field
The invention belongs to the technical field of speech digit recognition, and particularly relates to a deep-learning-based method for recognizing digits in speech.
Background
Speech digit recognition is an important branch of automatic speech recognition (ASR) technology and plays an important role in computer applications such as user identity verification, liveness authentication, and web data capture. In practical scenarios, however, the speech to be recognized may contain accents, dialects, background noise, and other complicating factors, which makes high-accuracy recognition of spoken digit verification codes a considerable challenge.
Chinese patent application CN201910560346.X, entitled "Speech digit recognition method and device", discloses a method for recognizing digital speech data comprising: acquiring the digital speech data to be recognized; extracting a spectral feature vector of the data using a short-time Fourier transform; and recognizing the feature vector with a preset DS2 network model to obtain the recognized digit, where the preset model is obtained by resetting the output of the last fully connected layer of an initial DS2 network to the 10 digits 0 to 9 and training it.
With the rapid development of deep learning (DL), acoustic models based on deep neural networks (DNN) have come to significantly outperform traditional GMM-HMM models.
Disclosure of Invention
Different from the prior art, the invention provides a deep-learning-based digit recognition method that recognizes spoken digits efficiently and accurately on the basis of a CNN + CTC network model.
The invention is mainly realized by the following technical scheme: a deep-learning-based speech digit recognition method uses toneless Chinese pinyin as the modeling unit of the acoustic model and a CNN + CTC structure to construct an end-to-end deep neural network model from speech to pinyin; after the model is trained, a CTC decoding algorithm with a digit-pinyin constraint is used to decode and recognize the spoken digits.
Further, in order to better implement the invention, the method specifically comprises the following steps:
step S100: collecting audio annotation data, and cleaning and preprocessing it to obtain toneless Chinese pinyin and spectrograms;
step S200: taking the toneless Chinese pinyin of step S100 as the modeling unit of the acoustic model, inputting the two-dimensional spectrogram matrix obtained in step S100 into the acoustic model, and training it as a CNN + CTC model;
step S300: based on the acoustic model of step S200, performing maximum decoding with the digit-pinyin-constrained CTC decoding algorithm to recognize the speech to be recognized as digit pinyin;
step S400: obtaining the final Arabic numeral sequence from the correspondence between digit pinyin and Arabic numerals.
In this way, the method takes toneless Chinese pinyin as the modeling unit of the acoustic model, constructs an end-to-end speech-to-pinyin deep neural network modeled with a CNN + CTC structure, and, at the CTC decoding stage, adds a digit-pinyin constraint on top of the CTC maximum-decoding algorithm, which greatly reduces the CTC decoding search space and allows spoken digits to be recognized efficiently and accurately.
Further, when the audio annotation data is collected in step S100, at least 200 hours of standard Chinese speech data are collected. The data is provided by multiple speakers with a balanced male-female ratio, and each speaker's speech consists of multiple audio clips; each clip serves as one sample of the standard Chinese speech data and carries its corresponding annotated Chinese characters.
Further, the total speaking time of each speaker does not exceed 30 minutes, and no single sample of the standard Chinese speech data exceeds 30 seconds; the audio format of every sample is single-channel, 16 kHz sample rate, 16-bit WAV.
Further, cleaning and preprocessing the audio annotation data in step S100 specifically comprises: deleting samples containing non-Chinese symbols; removing punctuation from the annotated Chinese characters and, where Arabic numerals appear, converting them into the corresponding Chinese characters; then uniformly converting the Chinese characters into toneless pinyin; and framing the audio signal of each sample, applying a short-time Fourier transform to each frame, and finally forming a spectrogram.
Further, the architecture of the acoustic model in step S200 is a 10-layer CNN followed by 1 fully connected layer.
Further, in step S300 the digit-pinyin-constrained CTC decoding algorithm performs maximum decoding of the speech to be recognized: the labels decoded at all time steps form paths, the optimal path is generated, and the optimal path sequence is converted into the final digit-pinyin sequence consisting of digit pinyins;
that is, the digit-pinyin constraint reduces the CTC decoding search range from all Chinese pinyins to the digit pinyins only.
Further, the optimal path sequence is converted into the final digit-pinyin sequence according to the following steps:
step S310: if consecutive repeated digit pinyins or BLANKs occur, merge them and then go to step S320; otherwise go directly to step S320;
step S320: remove all BLANKs; if the digit pinyin before and after a BLANK is the same, the repetition is preserved after the BLANK is removed.
The invention has the beneficial effects that:
(1) The invention achieves high-accuracy recognition with only a small amount of annotated speech and needs no dedicated digit-pronunciation data: the speech may contain any Chinese characters, and such data is freely and easily obtained from open-source datasets.
(2) The acoustic model uses deep learning combined with the CTC-based decoding method, so audio features are extracted automatically, saving a large amount of manual feature-extraction work.
(3) The modeling unit of the acoustic model is toneless pinyin, which makes the model highly robust to dialects; digits spoken in various tones can be recognized accurately.
Drawings
FIG. 1 is a schematic diagram of the architecture of the acoustic model of the present invention.
FIG. 2 is a schematic flow diagram of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention clearer, the technical solutions in the embodiments will be clearly and completely described below with reference to the accompanying drawings. The components of embodiments of the present invention generally described and illustrated in the figures herein may be arranged and designed in a wide variety of different configurations. Thus, the following detailed description of the embodiments presented in the figures is not intended to limit the scope of the invention, as claimed, but is merely representative of selected embodiments of the invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Example 1:
A deep-learning-based speech digit recognition method uses toneless Chinese pinyin as the modeling unit of the acoustic model and a CNN + CTC structure to construct an end-to-end deep neural network model from speech to pinyin; after training, a CTC decoding algorithm with a digit-pinyin constraint decodes the output to recognize the spoken digits.
The method specifically comprises the steps S100-S400.
Step S100: and collecting audio annotation data, and cleaning and preprocessing the audio annotation data to obtain the Chinese pinyin and the spectrogram without tones.
The method specifically comprises the following steps:
1. Collect more than 200 hours of annotated Chinese speech from multiple speakers with a balanced male-female ratio, with each speaker's total speaking time not exceeding 30 minutes. Each speaker's speech consists of several clips; each clip is one data sample and lasts at most 30 seconds. The audio format is unified to single-channel, 16 kHz sample rate, 16-bit WAV, and each audio clip has a corresponding text transcript.
2. Delete samples containing non-Chinese symbols such as English, remove punctuation from the transcripts, and convert any Arabic numerals into the corresponding Chinese characters. Finally, uniformly convert the Chinese characters into toneless pinyin.
For example: the following notations are provided:
today is No. 15, and the weather is clear.
The result after conversion is:
jin tian shi shi wu hao tian qi qing lang
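The digit-relevant part of this transcript cleaning can be sketched in Python. The lookup tables below are an illustrative assumption covering only the ten digit characters; a full system would use a complete Chinese pronunciation lexicon (e.g. the pypinyin package) and, as the example above shows, would also need number-word normalization, since "15" in running text reads as "shi wu" rather than digit by digit:

```python
# Minimal sketch (not from the patent) of the digit part of label cleaning:
# Arabic numerals are rewritten as Chinese characters, which are then
# mapped to toneless pinyin. Only the ten digit characters are covered.
DIGIT_TO_HANZI = {"0": "零", "1": "一", "2": "二", "3": "三", "4": "四",
                  "5": "五", "6": "六", "7": "七", "8": "八", "9": "九"}
HANZI_TO_PINYIN = {"零": "ling", "一": "yi", "二": "er", "三": "san",
                   "四": "si", "五": "wu", "六": "liu", "七": "qi",
                   "八": "ba", "九": "jiu"}

def normalize_digits(text):
    """Replace each Arabic numeral with its Chinese-character reading."""
    return "".join(DIGIT_TO_HANZI.get(ch, ch) for ch in text)

def hanzi_to_toneless_pinyin(text):
    """Map characters to toneless pinyin (digit characters only here)."""
    return [HANZI_TO_PINYIN[ch] for ch in text if ch in HANZI_TO_PINYIN]

print(hanzi_to_toneless_pinyin(normalize_digits("502")))  # ['wu', 'ling', 'er']
```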
3. Frame the audio waveform of each sample, apply a short-time Fourier transform to each frame, and assemble the results into a spectrogram. The spectrogram is a time-ordered sequence of vectors, where the vector at each time step is the audio feature at that moment.
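The framing and short-time Fourier transform step can be sketched with NumPy as follows. The 25 ms frame, 10 ms hop, and Hamming window are common defaults assumed for illustration, not values specified by the patent:

```python
import numpy as np

def spectrogram(signal, frame_len=400, hop=160, n_fft=512):
    """Frame the waveform, window each frame, and take the magnitude of a
    short-time Fourier transform. Frame/hop sizes (25 ms / 10 ms at 16 kHz)
    are typical choices, not values fixed by the patent."""
    window = np.hamming(frame_len)
    n_frames = 1 + max(0, (len(signal) - frame_len) // hop)
    frames = np.stack([signal[i * hop: i * hop + frame_len] * window
                       for i in range(n_frames)])
    # Magnitude spectrum per frame: shape (T, n_fft // 2 + 1)
    return np.abs(np.fft.rfft(frames, n=n_fft, axis=1))

sig = np.random.randn(16000)  # one second of audio at 16 kHz
feats = spectrogram(sig)
print(feats.shape)  # (98, 257)
```

The resulting two-dimensional matrix (time frames by frequency bins) is what step S200 feeds into the acoustic model.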
Step S200: taking the toneless Chinese pinyin of step S100 as the modeling unit of the acoustic model, input the two-dimensional spectrogram matrix obtained in step S100 into the acoustic model and train it as a CNN + CTC model.
That is, an acoustic model with toneless pinyin as its modeling unit is trained using the CNN + CTC structure, with the two-dimensional spectrogram matrix of step S100 as its input.
As shown in Fig. 1, the architecture of the acoustic model is 10 CNN layers followed by a fully connected layer, with CTC as the loss function.
For example: suppose the spectrogram feature of the sample is x(i)The corresponding phonetic notation is y(i)。
x(i)And y(i)Belongs to the training set X { (X)(1),y(1)),(x(2),y(2)) ,. for each x(i)Assume that its timing length is T(i)Then the audio characteristic at each moment is
The output of the acoustic model is the probability distribution for all pinyins at each time t, given xWherein the content of the first and second substances,i.e., the set of all pinyin and BLANK marks. Assuming concealment of the l-th layerLayer is hlThe length of the sliding window is c, and the convolution kernel parameter is wlIf f is the nonlinear transformation ReLU, the convolution layer is calculated at time t as follows:
the output of the last convolutional layer is input to a full link layer, WlFor the parameters of the fully connected layer, the calculation of the fully connected layer is as follows:
the final output layer L is a softmax layer, calculated as follows:
Step S300: based on the acoustic model of step S200, perform maximum decoding with the digit-pinyin-constrained CTC decoding algorithm to recognize the speech to be recognized as digit pinyin.
The invention adds the digit-pinyin constraint to the CTC decoding algorithm, reducing the CTC search range from all Chinese pinyins to the digit pinyins. Let the path formed by the labels decoded at all time steps be $l' = (l'_1, l'_2, \ldots, l'_T)$; the optimal path $l^*$ is then given by:
$$l^* = \arg\max_{l'} \prod_{t=1}^{T} p(l'_t \mid x), \quad \text{s.t. } l'_t \in \{\text{ling}, \text{yi}, \text{er}, \text{san}, \text{si}, \text{wu}, \text{liu}, \text{qi}, \text{ba}, \text{jiu}, \text{BLANK}\} \tag{4}$$
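Constraint (4) amounts to taking the per-frame argmax over only the ten digit pinyins plus BLANK. A minimal sketch, with an invented toy vocabulary standing in for the full pinyin set:

```python
import numpy as np

DIGIT_PINYIN = ["ling", "yi", "er", "san", "si",
                "wu", "liu", "qi", "ba", "jiu"]

def constrained_greedy_decode(probs, vocab, allowed):
    """Per-frame maximum decoding restricted to an allowed label subset:
    at every time step the argmax is taken only over digit pinyins and
    BLANK, shrinking the search space as in constraint (4)."""
    allowed_idx = [i for i, lab in enumerate(vocab) if lab in allowed]
    return [vocab[max(allowed_idx, key=lambda i: frame[i])]
            for frame in probs]

# Toy vocabulary: two non-digit pinyins, the ten digits, and BLANK.
vocab = ["tian", "hao"] + DIGIT_PINYIN + ["BLANK"]
allowed = set(DIGIT_PINYIN) | {"BLANK"}
rng = np.random.default_rng(1)
probs = rng.random((4, len(vocab)))  # fake per-frame label probabilities
path = constrained_greedy_decode(probs, vocab, allowed)
print(path)  # four labels, each a digit pinyin or BLANK
```

Even when a non-digit pinyin such as "tian" has the highest score in a frame, it can never appear in the decoded path.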
Step S310: if consecutive repeated pinyins or BLANKs occur, merge them;
Step S320: remove all BLANKs; if the pinyin before and after a BLANK is the same, the repetition is preserved after removal.
For example: the following optimal path sequence:
BLANK san BALNK jiu liu qi qi ba wu BLANK wu
the combined results are:
san jiu liu qi ba wu wu。
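Steps S310 and S320 can be sketched as a short collapse function; applied to the optimal path shown above, it reproduces the example result:

```python
def collapse_ctc_path(path, blank="BLANK"):
    """Merge consecutive duplicates (step S310), then delete BLANK
    (step S320). Repeats separated by a BLANK survive as genuine
    repetitions, e.g. 'wu BLANK wu' becomes 'wu wu'."""
    merged = [path[0]] if path else []
    for label in path[1:]:
        if label != merged[-1]:
            merged.append(label)
    return [label for label in merged if label != blank]

path = ["BLANK", "san", "BLANK", "jiu", "liu", "qi", "qi",
        "ba", "wu", "BLANK", "wu"]
print(collapse_ctc_path(path))  # ['san', 'jiu', 'liu', 'qi', 'ba', 'wu', 'wu']
```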
Step S400: obtain the final Arabic numeral sequence from the correspondence between digit pinyin and Arabic numerals.
For example, the result for "san jiu liu qi ba wu wu" is 3967855.
The invention thus provides a speech digit recognition method that takes toneless Chinese pinyin as the modeling unit of the acoustic model and constructs an end-to-end speech-to-pinyin deep neural network. The model is built with a CNN + CTC structure, and at the CTC decoding stage the invention innovatively adds a digit-pinyin constraint on top of the CTC maximum-decoding algorithm, greatly reducing the CTC decoding search space and recognizing spoken digits efficiently and accurately.
(1) Annotated speech data is costly to produce; the invention achieves high-accuracy recognition with only a small amount of it and needs no dedicated digit-pronunciation recordings: the speech may contain any Chinese characters, and such data is freely and easily obtained from open-source datasets. (2) The acoustic model uses deep learning combined with CTC-based decoding, so audio features are extracted automatically, saving a large amount of manual feature-extraction work. (3) The modeling unit of the acoustic model is toneless pinyin, which makes the model robust to dialects, so digits spoken in various tones are recognized accurately.
The above is only a preferred embodiment of the present invention and does not limit it in any way; any simple modification or equivalent variation of the above embodiment made according to the technical spirit of the present invention falls within the scope of protection of the present invention.
Claims (8)
1. A deep-learning-based speech digit recognition method, characterized in that toneless Chinese pinyin is used as the modeling unit of an acoustic model, a CNN + CTC structure is used to construct an end-to-end deep neural network model from speech to pinyin, and, after the model is trained, a CTC decoding algorithm with a digit-pinyin constraint is used to decode and thereby recognize the spoken digits.
2. The deep-learning-based speech digit recognition method according to claim 1, characterized by comprising the following steps:
step S100, collecting audio annotation data and cleaning and preprocessing it to obtain toneless Chinese pinyin and spectrograms;
step S200, taking the toneless Chinese pinyin of step S100 as the modeling unit of the acoustic model, inputting the two-dimensional spectrogram matrix obtained in step S100 into the acoustic model, and training it as a CNN + CTC model;
step S300, based on the acoustic model of step S200, performing maximum decoding with the digit-pinyin-constrained CTC decoding algorithm to recognize the speech to be recognized as digit pinyin;
and step S400, obtaining the final Arabic numeral sequence from the correspondence between digit pinyin and Arabic numerals.
3. The deep-learning-based speech digit recognition method according to claim 2, characterized in that collecting the audio annotation data in step S100 comprises collecting at least 200 hours of standard Chinese speech data, wherein the data is provided by a plurality of speakers with a balanced male-female ratio and each speaker's speech consists of a plurality of audio clips; each audio clip serves as one sample of the standard Chinese speech data and carries its corresponding annotated Chinese characters.
4. The deep-learning-based speech digit recognition method according to claim 3, characterized in that the total speaking time of each speaker does not exceed 30 minutes and no single sample of the standard Chinese speech data exceeds 30 seconds; the audio format of each sample is single-channel, 16 kHz sample rate, 16-bit WAV.
5. The deep-learning-based speech digit recognition method according to claim 3, characterized in that cleaning and preprocessing the audio annotation data in step S100 specifically comprises: deleting samples containing non-Chinese symbols; removing punctuation from the annotated Chinese characters and, where Arabic numerals appear, converting them into the corresponding Chinese characters; then uniformly converting the Chinese characters into toneless Chinese pinyin; and framing the audio signal of each sample, applying a short-time Fourier transform to each frame, and finally forming a spectrogram.
6. The deep-learning-based speech digit recognition method according to claim 2, characterized in that the acoustic model in step S200 is constructed as a 10-layer CNN convolutional neural network followed by 1 fully connected layer.
7. The deep-learning-based speech digit recognition method according to claim 2, characterized in that in step S300 the digit-pinyin-constrained CTC decoding algorithm performs maximum decoding of the speech to be recognized, the labels decoded at all time steps form paths from which the optimal path is generated, and the optimal path sequence is converted into the final digit-pinyin sequence consisting of digit pinyins;
that is, the digit-pinyin constraint reduces the CTC decoding search range from all Chinese pinyins to the digit pinyins only.
8. The deep-learning-based speech digit recognition method according to claim 7, characterized in that the optimal path sequence is converted into the final digit-pinyin sequence according to the following steps:
step S310, if consecutive repeated digit pinyins or BLANKs occur, merging them and then jumping to step S320; if there are none, jumping directly to step S320;
step S320, removing all BLANKs; if the digit pinyin before and after a BLANK is the same, the repetition is preserved after the BLANK is removed.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201911149493.4A CN110853629A (en) | 2019-11-21 | 2019-11-21 | Speech recognition digital method based on deep learning |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201911149493.4A CN110853629A (en) | 2019-11-21 | 2019-11-21 | Speech recognition digital method based on deep learning |
Publications (1)
Publication Number | Publication Date |
---|---|
CN110853629A true CN110853629A (en) | 2020-02-28 |
Family
ID=69603396
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201911149493.4A Pending CN110853629A (en) | 2019-11-21 | 2019-11-21 | Speech recognition digital method based on deep learning |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110853629A (en) |
Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105869624A (en) * | 2016-03-29 | 2016-08-17 | 腾讯科技(深圳)有限公司 | Method and apparatus for constructing speech decoding network in digital speech recognition |
US20160351188A1 (en) * | 2015-05-26 | 2016-12-01 | Google Inc. | Learning pronunciations from acoustic sequences |
CN107358951A (en) * | 2017-06-29 | 2017-11-17 | 阿里巴巴集团控股有限公司 | A kind of voice awakening method, device and electronic equipment |
CN109065032A (en) * | 2018-07-16 | 2018-12-21 | 杭州电子科技大学 | A kind of external corpus audio recognition method based on depth convolutional neural networks |
CN109272990A (en) * | 2018-09-25 | 2019-01-25 | 江南大学 | Audio recognition method based on convolutional neural networks |
CN110288995A (en) * | 2019-07-19 | 2019-09-27 | 出门问问(苏州)信息科技有限公司 | Exchange method, device, storage medium and electronic equipment based on speech recognition |
CN110299132A (en) * | 2019-06-26 | 2019-10-01 | 京东数字科技控股有限公司 | A kind of speech digit recognition methods and device |
US10468019B1 (en) * | 2017-10-27 | 2019-11-05 | Kadho, Inc. | System and method for automatic speech recognition using selection of speech models based on input characteristics |
2019-11-21: application CN201911149493.4A filed in China; publication CN110853629A; status Pending
Cited By (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111246026A (en) * | 2020-03-11 | 2020-06-05 | 兰州飞天网景信息产业有限公司 | Recording processing method based on convolutional neural network and connectivity time sequence classification |
CN111833869A (en) * | 2020-07-01 | 2020-10-27 | 中关村科学城城市大脑股份有限公司 | Voice interaction method and system applied to urban brain |
CN111833869B (en) * | 2020-07-01 | 2022-02-11 | 中关村科学城城市大脑股份有限公司 | Voice interaction method and system applied to urban brain |
CN111710330A (en) * | 2020-07-29 | 2020-09-25 | 深圳波洛斯科技有限公司 | Environmental noise elimination method and device based on deep neural network and storage medium |
CN112104457A (en) * | 2020-08-28 | 2020-12-18 | 苏州云葫芦信息科技有限公司 | Method and system for generating verification code for converting digital to Chinese character type |
CN112104457B (en) * | 2020-08-28 | 2022-06-17 | 苏州云葫芦信息科技有限公司 | Method and system for generating verification code for converting numbers into Chinese character types |
CN112767923A (en) * | 2021-01-05 | 2021-05-07 | 上海微盟企业发展有限公司 | Voice recognition method and device |
CN113380231A (en) * | 2021-06-15 | 2021-09-10 | 北京一起教育科技有限责任公司 | Voice conversion method and device and electronic equipment |
CN113380231B (en) * | 2021-06-15 | 2023-01-24 | 北京一起教育科技有限责任公司 | Voice conversion method and device and electronic equipment |
CN113506584A (en) * | 2021-07-06 | 2021-10-15 | 腾讯音乐娱乐科技(深圳)有限公司 | Data processing method and device |
CN113506584B (en) * | 2021-07-06 | 2024-05-14 | 腾讯音乐娱乐科技(深圳)有限公司 | Data processing method and device |
Legal Events
Date | Code | Title
---|---|---
2020-02-28 | PB01 | Publication
| SE01 | Entry into force of request for substantive examination
| RJ01 | Rejection of invention patent application after publication