CN110853629A - Speech recognition digital method based on deep learning - Google Patents

Speech recognition digital method based on deep learning Download PDF

Info

Publication number
CN110853629A
CN110853629A CN201911149493.4A
Authority
CN
China
Prior art keywords
pinyin
digital
chinese
ctc
deep learning
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201911149493.4A
Other languages
Chinese (zh)
Inventor
蒋欣辰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhongke Zhiyun Technology Co Ltd
Original Assignee
Zhongke Zhiyun Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhongke Zhiyun Technology Co Ltd filed Critical Zhongke Zhiyun Technology Co Ltd
Priority to CN201911149493.4A priority Critical patent/CN110853629A/en
Publication of CN110853629A publication Critical patent/CN110853629A/en
Pending legal-status Critical Current

Links

Images

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/06 Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
    • G10L15/063 Training
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/08 Speech classification or search
    • G10L15/16 Speech classification or search using artificial neural networks
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L2015/225 Feedback of the input speech

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Machine Translation (AREA)

Abstract

The invention discloses a deep-learning-based method for recognizing digits from speech. Toneless Chinese pinyin is used as the modeling unit of the acoustic model to construct an end-to-end deep neural network model from speech to pinyin. The deep neural network model is built with a CNN + CTC structure, and in the CTC decoding stage a digital pinyin constraint is innovatively added to the CTC maximum decoding algorithm, which greatly reduces the CTC decoding search space so that spoken digits can be recognized efficiently and accurately.

Description

Speech recognition digital method based on deep learning
Technical Field
The invention belongs to the technical field of speech digital recognition, and particularly relates to a method for recognizing numbers by speech based on deep learning.
Background
Speech digit recognition is an important branch of automatic speech recognition (ASR) technology and plays an important role in computer application fields such as user identity recognition, liveness authentication, and network data capture. However, in practical application scenarios, the speech data to be recognized may contain complex factors such as accents, dialects, and background noise interference, which makes high-accuracy recognition of spoken digit verification codes a great challenge.
The Chinese invention patent application CN201910560346.X, titled "Speech digit recognition method and device", discloses a method for recognizing digital voice data, the method comprising: acquiring digital voice data to be recognized; extracting a spectral feature vector of the digital voice data using a short-time Fourier transform; and recognizing the spectral feature vector based on a preset DS2 network model to obtain the recognized digits, wherein the preset DS2 network model is obtained by training an initial DS2 network model whose last fully connected layer is reset to output the 10 digits from 0 to 9.
With the rapid development of Deep Learning (DL) technology, the performance of an acoustic model based on a Deep Neural Network (DNN) is significantly improved compared with that of a traditional GMM-HMM model.
Disclosure of Invention
The invention provides a deep-learning-based method for recognizing spoken digits that, unlike the prior art, uses a CNN + CTC network model to recognize digits efficiently and accurately.
The invention is mainly realized by the following technical scheme: a deep-learning-based method for recognizing spoken digits, which uses toneless Chinese pinyin as the modeling unit of the acoustic model, adopts a CNN + CTC structure to construct an end-to-end deep neural network model from speech to pinyin, and, after model training, decodes with a CTC decoding algorithm subject to a digital pinyin constraint to recognize the spoken digits.
Further, in order to better implement the invention, the method specifically comprises the following steps:
step S100: collecting audio annotation data, and cleaning and preprocessing the audio annotation data to obtain toneless Chinese pinyin labels and spectrograms;
step S200: taking the toneless Chinese pinyin from step S100 as the modeling unit of the acoustic model, inputting the two-dimensional spectrogram matrix obtained in step S100 into the acoustic model, and training the acoustic model with the CNN + CTC model;
step S300: based on the acoustic model of step S200, performing maximum decoding with the CTC decoding algorithm subject to the digital pinyin constraint, recognizing the speech to be recognized as digital pinyin;
step S400: obtaining the final Arabic numeral sequence according to the correspondence between the digit pinyins and the Arabic numerals.
The invention provides a deep-learning-based method for recognizing digits from speech, which takes toneless Chinese pinyin as the modeling unit of the acoustic model and constructs an end-to-end deep neural network model from speech to pinyin. The deep neural network model is built with a CNN + CTC structure, and in the CTC decoding stage a digital pinyin constraint is innovatively added to the CTC maximum decoding algorithm, which greatly reduces the CTC decoding search space so that spoken digits can be recognized efficiently and accurately.
Further, in order to better implement the present invention, when the audio annotation data is collected in step S100, at least 200 hours of annotated Chinese speech data need to be collected, provided by multiple speakers with a balanced male-to-female ratio, the speech of each speaker consisting of multiple audio segments; each audio segment serves as one sample of the Chinese speech data and carries corresponding annotated Chinese characters.
Furthermore, in order to better implement the invention, the total pronunciation time of each speaker does not exceed 30 minutes, and each sample of the Chinese speech data does not exceed 30 seconds; the audio format of each sample is single-channel, 16 kHz sample rate, 16-bit depth WAV.
Further, in order to better implement the present invention, the step S100 of cleaning and preprocessing the audio annotation data specifically includes: deleting samples containing non-Chinese system symbols; removing punctuation marks of the marked Chinese characters, wherein if Arabic numerals exist, the marked Chinese characters need to be converted into corresponding Chinese characters; then, uniformly converting the Chinese characters into Chinese pinyin with tones removed; and framing the audio signal of each sample, performing short-time Fourier transform on each frame, and finally forming a spectrogram.
Further, in order to better implement the present invention, the architecture of the acoustic model in step S200 is a 10-layer CNN convolutional neural network followed by one fully connected layer.
Further, in order to better implement the present invention, in step S300 the CTC decoding algorithm with the digital pinyin constraint performs maximum decoding on the speech to be recognized; the labels decoded at all time steps form paths, from which the optimal path is generated, and the optimal path sequence is converted into the final digital pinyin sequence composed of digit pinyins;
namely, the CTC decoding algorithm with the digital pinyin constraint reduces the search range of CTC decoding from all Chinese pinyins to the digit pinyins only.
Further, in order to better implement the present invention, the process of converting the optimal path sequence of the optimal path into the final digital pinyin sequence consisting of the digital pinyins is performed according to the following steps:
step S310: if continuous repeated digital pinyin or BLANK appears, merging and then jumping to the step S320; if there is no continuous repeated digital pinyin or BLANK, directly jumping to step S320;
step S320: removing all BLANK; if the digital pinyin before and after BLANK is the same, the continuous repetition of the digital pinyin is kept after the BLANK is removed.
The invention has the beneficial effects that:
(1) The invention can achieve high-accuracy recognition with only a small amount of annotated speech data and needs no special digit pronunciation data; the speech can be of any Chinese characters, and such data can easily be obtained free of charge from open-source datasets.
(2) The acoustic model uses deep learning combined with the CTC-based decoding method, realizing automatic extraction of audio features and saving a large amount of manual feature engineering.
(3) The modeling unit of the acoustic model is toneless pinyin, which gives the model strong robustness to dialects, so digits spoken in various tones can be recognized accurately.
Drawings
FIG. 1 is a schematic diagram of the architecture of the acoustic model of the present invention.
FIG. 2 is a schematic flow diagram of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention clearer, the technical solutions in the embodiments will be clearly and completely described below with reference to the accompanying drawings. The components of embodiments of the present invention generally described and illustrated in the figures herein may be arranged and designed in a wide variety of different configurations. Thus, the following detailed description of the embodiments presented in the figures is not intended to limit the scope of the invention, as claimed, but is merely representative of selected embodiments of the invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Example 1:
A deep-learning-based method for recognizing spoken digits: toneless Chinese pinyin is used as the modeling unit of the acoustic model, a CNN + CTC structure is adopted to construct an end-to-end deep neural network model from speech to pinyin, and after model training a CTC decoding algorithm with a digital pinyin constraint is used to decode and recognize the spoken digits.
The method specifically comprises the steps S100-S400.
Step S100: collecting audio annotation data, and cleaning and preprocessing it to obtain toneless Chinese pinyin labels and spectrograms.
The method specifically comprises the following steps:
1. Collect more than 200 hours of annotated Chinese speech data, provided by multiple speakers with a balanced male-to-female ratio; the total pronunciation time of each speaker does not exceed 30 minutes. Each speaker's speech consists of several segments, each segment being one data sample of no more than 30 seconds. The audio format is unified as single-channel, 16 kHz sample rate, 16-bit depth WAV. Each audio segment has a corresponding text annotation.
2. Delete samples containing non-Chinese symbols such as English; remove punctuation marks from the annotated characters; and if Arabic numerals are present, convert them into the corresponding Chinese characters. Finally, uniformly convert the Chinese characters into toneless pinyin.
For example, given the following annotated text:
today is No. 15, and the weather is clear.
The result after conversion is:
jin tian shi shi wu hao tian qi qing lang
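The tone-removal step can be sketched as follows; `strip_tones` is a hypothetical helper (the patent does not specify an implementation) that drops Unicode combining marks from tone-marked pinyin:

```python
import unicodedata

def strip_tones(pinyin: str) -> str:
    """Remove tone diacritics from pinyin by decomposing precomposed
    letters and dropping Unicode combining marks (category Mn).

    Note: this also maps u-umlaut to plain 'u'; real pipelines may map
    it to 'v' instead, a choice the patent leaves open.
    """
    decomposed = unicodedata.normalize("NFD", pinyin)
    return "".join(ch for ch in decomposed
                   if unicodedata.category(ch) != "Mn")

print(strip_tones("jīn tiān shì shí wǔ hào tiān qì qíng lǎng"))
# jin tian shi shi wu hao tian qi qing lang
```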
3. Frame the audio waveform signal of each sample and perform a short-time Fourier transform on each frame, finally forming a spectrogram. The spectrogram is a time-ordered sequence of vectors; the vector at each time step is the feature of the audio at that moment.
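A minimal sketch of the framing and short-time Fourier transform, assuming NumPy; the frame length, hop size, and Hamming window are typical illustrative choices, not values fixed by the patent:

```python
import numpy as np

def spectrogram(signal, frame_len=400, hop=160):
    """Frame the waveform and apply a short-time Fourier transform
    per frame. frame_len=400 / hop=160 correspond to 25 ms / 10 ms
    at a 16 kHz sample rate (illustrative values)."""
    n_frames = 1 + (len(signal) - frame_len) // hop
    window = np.hamming(frame_len)
    frames = np.stack([signal[i * hop : i * hop + frame_len] * window
                       for i in range(n_frames)])
    # Magnitude spectrum of each frame; rows are time steps, columns
    # are frequency bins: the two-dimensional matrix fed to the model.
    return np.abs(np.fft.rfft(frames, axis=1))

sig = np.random.randn(16000)      # one second of 16 kHz audio
spec = spectrogram(sig)
print(spec.shape)                 # (n_frames, frame_len // 2 + 1)
```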
Step S200: taking the toneless Chinese pinyin from step S100 as the modeling unit of the acoustic model, inputting the two-dimensional spectrogram matrix obtained in step S100 into the acoustic model, and training the acoustic model with the CNN + CTC model.
The acoustic model, with toneless pinyin as its modeling unit, is trained using the CNN + CTC model; the input of the model is the two-dimensional audio spectrogram matrix obtained in step S100.
The architecture of the acoustic model, shown in FIG. 1, is 10 CNN layers followed by a fully connected layer, with CTC as the loss function.
For example, suppose the spectrogram feature of a sample is \(x^{(i)}\) and the corresponding pinyin label is \(y^{(i)}\), with \((x^{(i)}, y^{(i)})\) belonging to the training set \(X = \{(x^{(1)}, y^{(1)}), (x^{(2)}, y^{(2)}), \ldots\}\). For each \(x^{(i)}\), assume its time length is \(T^{(i)}\); the audio feature at each time \(t\) is then \(x_t^{(i)}\).
Given \(x\), the output of the acoustic model is the probability distribution over all pinyins at each time \(t\):
\[\hat{y}_t = P(l_t \mid x), \quad l_t \in L\]
where \(L\) is the set of all pinyins together with the BLANK mark. Assuming the hidden layer of the \(l\)-th layer is \(h^l\), the sliding-window length is \(c\), the convolution kernel parameter is \(w^l\), and \(f\) is the nonlinear transformation ReLU, the convolution layer at time \(t\) is computed as:
\[h_t^l = f\big(w^l \cdot h_{t-c:t+c}^{l-1}\big) \tag{1}\]
The output of the last convolutional layer is input to a fully connected layer; with \(W^l\) the parameters of the fully connected layer, its computation is:
\[h_t^l = f\big(W^l h_t^{l-1}\big) \tag{2}\]
The final output layer \(L\) is a softmax layer, computed as:
\[\hat{y}_t = P(l_t = k \mid x) = \frac{\exp\big(w_k^L h_t^{L-1}\big)}{\sum_j \exp\big(w_j^L h_t^{L-1}\big)} \tag{3}\]
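Equations (1) to (3) can be illustrated with a minimal NumPy forward pass; the layer sizes, the single convolutional layer, and the random weights are illustrative assumptions (the patent's model uses 10 CNN layers and trained parameters):

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    return np.maximum(x, 0.0)

def conv_layer(h_prev, w, c):
    """Eq. (1): at each time t, apply kernel w to the window
    h_{t-c..t+c} of the previous layer (zero-padded at the edges)."""
    T, _ = h_prev.shape
    padded = np.pad(h_prev, ((c, c), (0, 0)))
    windows = np.stack([padded[t : t + 2 * c + 1].ravel()
                        for t in range(T)])
    return relu(windows @ w)

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

T, n_freq, n_hidden, n_labels = 98, 201, 32, 11   # 10 digit pinyins + BLANK
x = rng.standard_normal((T, n_freq))              # spectrogram matrix
c = 2
w1 = rng.standard_normal(((2 * c + 1) * n_freq, n_hidden)) * 0.01
W_fc = rng.standard_normal((n_hidden, n_hidden)) * 0.1   # eq. (2)
W_out = rng.standard_normal((n_hidden, n_labels)) * 0.1  # eq. (3)

h = conv_layer(x, w1, c)          # convolutional layer, eq. (1)
h = relu(h @ W_fc)                # fully connected layer, eq. (2)
y_hat = softmax(h @ W_out)        # per-frame pinyin distribution, eq. (3)
print(y_hat.shape)                # (98, 11)
```

Each row of `y_hat` is a probability distribution over the label set at one time step, which is what the CTC loss and the decoder below consume.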
Step S300: based on the acoustic model of step S200, perform maximum decoding using the CTC decoding algorithm with the digital pinyin constraint to recognize the speech to be recognized as digital pinyin.
The invention adds the digital pinyin constraint to the CTC decoding algorithm, so that the search range of CTC decoding is reduced from all Chinese pinyins to the digit pinyins only. Assume the path formed by all decoded time steps is \(l' = (l'_1, \ldots, l'_T)\); then the optimal path \(l'^*\) is given by:
\[l'^* = \arg\max_{l'} \prod_{t=1}^{T} P(l'_t \mid x)\]
\[\text{s.t.}\ l'_t \in \{\text{ling, yi, er, san, si, wu, liu, qi, ba, jiu, BLANK}\} \tag{4}\]
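The constraint of equation (4) amounts to restricting the per-frame argmax to the digit pinyins plus BLANK. A sketch in plain Python, with an illustrative toy vocabulary (the real model outputs all Chinese pinyins):

```python
# Toy vocabulary: the digit pinyins plus BLANK, and a few non-digit
# pinyins standing in for the full Chinese pinyin set (illustrative).
VOCAB = ["BLANK", "ling", "yi", "er", "san", "si", "wu", "liu",
         "qi", "ba", "jiu", "jin", "tian", "hao"]
DIGIT_PINYIN = {"ling", "yi", "er", "san", "si", "wu", "liu",
                "qi", "ba", "jiu", "BLANK"}
ALLOWED = [i for i, p in enumerate(VOCAB) if p in DIGIT_PINYIN]

def constrained_greedy_decode(probs):
    """Maximum decoding per frame, restricted to digit pinyins and
    BLANK as in eq. (4); probs is a list of per-frame probability rows."""
    return [VOCAB[max(ALLOWED, key=lambda i: frame[i])]
            for frame in probs]

# A frame where the unconstrained argmax would pick the non-digit "jin":
frame = [0.0] * len(VOCAB)
frame[VOCAB.index("jin")] = 0.5
frame[VOCAB.index("san")] = 0.4
print(constrained_greedy_decode([frame]))   # ['san']
```

Restricting the argmax to `ALLOWED` is exactly how the search space shrinks from all pinyins to the eleven labels of equation (4).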
The optimal path \(l'^*\) is converted into the final digital pinyin sequence by the following steps:
step S310: if consecutive repeated pinyins or BLANKs appear, merge them;
step S320: remove all BLANKs; if the pinyins before and after a BLANK are the same, they remain consecutively repeated after the BLANK is removed.
For example: the following optimal path sequence:
BLANK san BLANK jiu liu qi qi ba wu BLANK wu
The merged result is:
san jiu liu qi ba wu wu.
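Steps S310 and S320 can be sketched directly in Python; note that repeats separated by a BLANK survive the merge, which is why the trailing "wu BLANK wu" yields two "wu" tokens:

```python
def collapse_path(path):
    """Steps S310-S320: merge consecutive repeats, then drop BLANK."""
    merged = []
    for label in path:
        if not merged or label != merged[-1]:   # S310: merge repeats
            merged.append(label)
    return [p for p in merged if p != "BLANK"]  # S320: remove BLANK

path = "BLANK san BLANK jiu liu qi qi ba wu BLANK wu".split()
print(" ".join(collapse_path(path)))   # san jiu liu qi ba wu wu
```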
Step S400: obtain the final Arabic numeral sequence according to the correspondence between the digit pinyins and the Arabic numerals.
For example: the result of san jiu liu qi ba wu wu is 3967855.
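Step S400 is a direct table lookup; the mapping follows the digit pinyins listed in equation (4):

```python
PINYIN_TO_DIGIT = {"ling": "0", "yi": "1", "er": "2", "san": "3",
                   "si": "4", "wu": "5", "liu": "6", "qi": "7",
                   "ba": "8", "jiu": "9"}

def pinyin_to_number(seq):
    """Step S400: map each digit pinyin to its Arabic numeral."""
    return "".join(PINYIN_TO_DIGIT[p] for p in seq)

print(pinyin_to_number("san jiu liu qi ba wu wu".split()))   # 3967855
```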
The invention provides a method for recognizing spoken digits which takes toneless Chinese pinyin as the modeling unit of the acoustic model and constructs an end-to-end deep neural network model from speech to pinyin. The model is built with a CNN + CTC structure, and in the CTC decoding stage the invention innovatively adds a digital pinyin constraint to the CTC maximum decoding algorithm, greatly reducing the CTC decoding search space and enabling spoken digits to be recognized efficiently and accurately.
The invention has the following beneficial effects:
(1) The cost of annotated speech data is high; the invention achieves high-accuracy recognition with only a small amount of annotated speech data and needs no special digit pronunciation data; the speech can be of any Chinese characters, and such data can easily be obtained free of charge from open-source datasets.
(2) The acoustic model uses deep learning combined with the CTC-based decoding method, realizing automatic extraction of audio features and saving a large amount of manual feature engineering.
(3) The modeling unit of the acoustic model is toneless pinyin, which gives the model strong robustness to dialects, so digits spoken in various tones can be recognized accurately.
The above description is only a preferred embodiment of the present invention, and is not intended to limit the present invention in any way, and all simple modifications and equivalent variations of the above embodiments according to the technical spirit of the present invention are included in the scope of the present invention.

Claims (8)

1. A deep-learning-based method for recognizing spoken digits, characterized in that toneless Chinese pinyin is used as the modeling unit of an acoustic model, a CNN + CTC structure is adopted to construct an end-to-end deep neural network model from speech to pinyin, and after model training a CTC decoding algorithm with a digital pinyin constraint is used for decoding, thereby recognizing the spoken digits.
2. The method for recognizing numbers based on deep learning of the speech according to claim 1, characterized by comprising the following steps:
s100, collecting audio annotation data, and cleaning and preprocessing the audio annotation data to obtain a Chinese pinyin and a spectrogram without tones;
s200, inputting the two-dimensional matrix of the spectrogram obtained in the step S100 into the acoustic model by taking the Chinese pinyin without tones in the step S100 as a modeling unit of the acoustic model and training the acoustic model by using a CNN + CTC model;
step S300, based on the acoustic model of step S200, performing maximum decoding using the CTC decoding algorithm with the digital pinyin constraint, recognizing the speech to be recognized as digital pinyin;
and step S400, obtaining a final Arabic numeral sequence according to the corresponding relation between the digital pinyin and the Arabic numerals.
3. The method for recognizing spoken digits based on deep learning according to claim 2, wherein collecting the audio annotation data in step S100 comprises collecting at least 200 hours of annotated Chinese speech data, provided by a plurality of speakers with a balanced male-to-female ratio, the speech of each speaker consisting of a plurality of audio segments; each audio segment serves as one sample of the Chinese speech data and carries corresponding annotated Chinese characters.
4. The method for recognizing numbers based on deep learning of claim 3, wherein the total pronunciation time of each pronunciation speaker is not more than 30 minutes, and one sample of the Chinese phonetic standard data is not more than 30 seconds; the audio format of each sample is a single channel, 16k sample rate, 16 bit depth WAV format.
5. The method for recognizing numbers based on deep learning of claim 3, wherein the step S100 of cleaning and preprocessing the audio annotation data specifically comprises: deleting samples containing non-Chinese system symbols; removing punctuation marks of the marked Chinese characters, wherein if Arabic numerals exist, the marked Chinese characters need to be converted into corresponding Chinese characters; then, uniformly converting the Chinese characters into Chinese pinyin with tones removed; and framing the audio signal of each sample, performing short-time Fourier transform on each frame, and finally forming a spectrogram.
6. The method for recognizing numbers based on deep learning of claim 2, wherein the acoustic model in step S200 is constructed by adding 1 full-connected layer after a 10-layer CNN convolutional neural network.
7. The method for recognizing spoken digits based on deep learning according to claim 2, wherein in step S300 the CTC decoding algorithm with the digital pinyin constraint performs maximum decoding on the speech to be recognized, the labels decoded at all time steps form paths from which the optimal path is generated, and the optimal path sequence is converted into the final digital pinyin sequence composed of digit pinyins;
namely, the CTC decoding algorithm with the digital pinyin constraint reduces the search range of CTC decoding from all Chinese pinyins to the digit pinyins.
8. The method for recognizing numbers based on deep learning of claim 7, wherein the process of converting the optimal path sequence of the optimal path into the final digital pinyin sequence consisting of digital pinyins is performed according to the following steps:
step S310, if continuous repeated digital pinyin or BLANK appears, merging and then jumping to step S320; if there is no continuous repeated digital pinyin or BLANK, directly jumping to step S320;
step S320, removing all BLANK; if the digital pinyin before and after BLANK is the same, the continuous repetition of the digital pinyin is kept after the BLANK is removed.
CN201911149493.4A 2019-11-21 2019-11-21 Speech recognition digital method based on deep learning Pending CN110853629A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911149493.4A CN110853629A (en) 2019-11-21 2019-11-21 Speech recognition digital method based on deep learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911149493.4A CN110853629A (en) 2019-11-21 2019-11-21 Speech recognition digital method based on deep learning

Publications (1)

Publication Number Publication Date
CN110853629A true CN110853629A (en) 2020-02-28

Family

ID=69603396

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911149493.4A Pending CN110853629A (en) 2019-11-21 2019-11-21 Speech recognition digital method based on deep learning

Country Status (1)

Country Link
CN (1) CN110853629A (en)

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111246026A (en) * 2020-03-11 2020-06-05 兰州飞天网景信息产业有限公司 Recording processing method based on convolutional neural network and connectivity time sequence classification
CN111710330A (en) * 2020-07-29 2020-09-25 深圳波洛斯科技有限公司 Environmental noise elimination method and device based on deep neural network and storage medium
CN111833869A (en) * 2020-07-01 2020-10-27 中关村科学城城市大脑股份有限公司 Voice interaction method and system applied to urban brain
CN112104457A (en) * 2020-08-28 2020-12-18 苏州云葫芦信息科技有限公司 Method and system for generating verification code for converting digital to Chinese character type
CN112767923A (en) * 2021-01-05 2021-05-07 上海微盟企业发展有限公司 Voice recognition method and device
CN113380231A (en) * 2021-06-15 2021-09-10 北京一起教育科技有限责任公司 Voice conversion method and device and electronic equipment
CN113506584A (en) * 2021-07-06 2021-10-15 腾讯音乐娱乐科技(深圳)有限公司 Data processing method and device

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105869624A (en) * 2016-03-29 2016-08-17 腾讯科技(深圳)有限公司 Method and apparatus for constructing speech decoding network in digital speech recognition
US20160351188A1 (en) * 2015-05-26 2016-12-01 Google Inc. Learning pronunciations from acoustic sequences
CN107358951A (en) * 2017-06-29 2017-11-17 阿里巴巴集团控股有限公司 A kind of voice awakening method, device and electronic equipment
CN109065032A (en) * 2018-07-16 2018-12-21 杭州电子科技大学 A kind of external corpus audio recognition method based on depth convolutional neural networks
CN109272990A (en) * 2018-09-25 2019-01-25 江南大学 Audio recognition method based on convolutional neural networks
CN110288995A (en) * 2019-07-19 2019-09-27 出门问问(苏州)信息科技有限公司 Exchange method, device, storage medium and electronic equipment based on speech recognition
CN110299132A (en) * 2019-06-26 2019-10-01 京东数字科技控股有限公司 A kind of speech digit recognition methods and device
US10468019B1 (en) * 2017-10-27 2019-11-05 Kadho, Inc. System and method for automatic speech recognition using selection of speech models based on input characteristics

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20160351188A1 (en) * 2015-05-26 2016-12-01 Google Inc. Learning pronunciations from acoustic sequences
CN105869624A (en) * 2016-03-29 2016-08-17 腾讯科技(深圳)有限公司 Method and apparatus for constructing speech decoding network in digital speech recognition
CN107358951A (en) * 2017-06-29 2017-11-17 阿里巴巴集团控股有限公司 A kind of voice awakening method, device and electronic equipment
US10468019B1 (en) * 2017-10-27 2019-11-05 Kadho, Inc. System and method for automatic speech recognition using selection of speech models based on input characteristics
CN109065032A (en) * 2018-07-16 2018-12-21 杭州电子科技大学 A kind of external corpus audio recognition method based on depth convolutional neural networks
CN109272990A (en) * 2018-09-25 2019-01-25 江南大学 Audio recognition method based on convolutional neural networks
CN110299132A (en) * 2019-06-26 2019-10-01 京东数字科技控股有限公司 A kind of speech digit recognition methods and device
CN110288995A (en) * 2019-07-19 2019-09-27 出门问问(苏州)信息科技有限公司 Exchange method, device, storage medium and electronic equipment based on speech recognition

Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111246026A (en) * 2020-03-11 2020-06-05 兰州飞天网景信息产业有限公司 Recording processing method based on convolutional neural network and connectivity time sequence classification
CN111833869A (en) * 2020-07-01 2020-10-27 中关村科学城城市大脑股份有限公司 Voice interaction method and system applied to urban brain
CN111833869B (en) * 2020-07-01 2022-02-11 中关村科学城城市大脑股份有限公司 Voice interaction method and system applied to urban brain
CN111710330A (en) * 2020-07-29 2020-09-25 深圳波洛斯科技有限公司 Environmental noise elimination method and device based on deep neural network and storage medium
CN112104457A (en) * 2020-08-28 2020-12-18 苏州云葫芦信息科技有限公司 Method and system for generating verification code for converting digital to Chinese character type
CN112104457B (en) * 2020-08-28 2022-06-17 苏州云葫芦信息科技有限公司 Method and system for generating verification code for converting numbers into Chinese character types
CN112767923A (en) * 2021-01-05 2021-05-07 上海微盟企业发展有限公司 Voice recognition method and device
CN113380231A (en) * 2021-06-15 2021-09-10 北京一起教育科技有限责任公司 Voice conversion method and device and electronic equipment
CN113380231B (en) * 2021-06-15 2023-01-24 北京一起教育科技有限责任公司 Voice conversion method and device and electronic equipment
CN113506584A (en) * 2021-07-06 2021-10-15 腾讯音乐娱乐科技(深圳)有限公司 Data processing method and device
CN113506584B (en) * 2021-07-06 2024-05-14 腾讯音乐娱乐科技(深圳)有限公司 Data processing method and device

Similar Documents

Publication Publication Date Title
CN110853629A (en) Speech recognition digital method based on deep learning
CN107680582B (en) Acoustic model training method, voice recognition method, device, equipment and medium
CN110263322B (en) Audio corpus screening method and device for speech recognition and computer equipment
CN109192213B (en) Method and device for real-time transcription of court trial voice, computer equipment and storage medium
WO2018227781A1 (en) Voice recognition method, apparatus, computer device, and storage medium
CN109410914B (en) Method for identifying Jiangxi dialect speech and dialect point
Kelly et al. Deep neural network based forensic automatic speaker recognition in VOCALISE using x-vectors
CN111640418B (en) Prosodic phrase identification method and device and electronic equipment
CN107945805A (en) A kind of intelligent across language voice identification method for transformation
CN106782521A (en) A kind of speech recognition system
CN110767210A (en) Method and device for generating personalized voice
CN109377981B (en) Phoneme alignment method and device
CN102237083A (en) Portable interpretation system based on WinCE platform and language recognition method thereof
CN112489634A (en) Language acoustic model training method and device, electronic equipment and computer medium
Anoop et al. Automatic speech recognition for Sanskrit
CN106297769B (en) A kind of distinctive feature extracting method applied to languages identification
CN113611286B (en) Cross-language speech emotion recognition method and system based on common feature extraction
CN107123419A (en) The optimization method of background noise reduction in the identification of Sphinx word speeds
CN113268989A (en) Polyphone processing method and device
CN114626424B (en) Data enhancement-based silent speech recognition method and device
CN111489745A (en) Chinese speech recognition system applied to artificial intelligence
Lin et al. Self-Supervised Acoustic Word Embedding Learning via Correspondence Transformer Encoder
CN112233668A (en) Voice instruction and identity recognition method based on neural network
Bansod et al. Speaker Recognition using Marathi (Varhadi) Language
Phuong et al. Development of high-performance and large-scale vietnamese automatic speech recognition systems

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication
RJ01 Rejection of invention patent application after publication

Application publication date: 20200228