CN107293290A - Method and apparatus for building a speech acoustic model - Google Patents
Method and apparatus for building a speech acoustic model
- Publication number
- CN107293290A (application CN201710640480.1A)
- Authority
- CN
- China
- Prior art keywords
- data
- result
- audio signal
- spectrogram
- speech
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Links
- 238000000034 method Methods 0.000 title claims abstract description 27
- 230000005236 sound signal Effects 0.000 claims abstract description 40
- 238000000605 extraction Methods 0.000 claims abstract description 12
- 239000011159 matrix material Substances 0.000 claims description 24
- 238000001228 spectrum Methods 0.000 claims description 21
- 238000012545 processing Methods 0.000 claims description 15
- 238000013527 convolutional neural network Methods 0.000 description 13
- 238000010801 machine learning Methods 0.000 description 7
- 238000004458 analytical method Methods 0.000 description 6
- 238000012549 training Methods 0.000 description 6
- 230000008901 benefit Effects 0.000 description 5
- 238000005516 engineering process Methods 0.000 description 5
- 230000008859 change Effects 0.000 description 3
- 230000000694 effects Effects 0.000 description 3
- 238000012216 screening Methods 0.000 description 3
- 238000007405 data analysis Methods 0.000 description 2
- 230000009467 reduction Effects 0.000 description 2
- 238000011160 research Methods 0.000 description 2
- 238000013519 translation Methods 0.000 description 2
- 238000013528 artificial neural network Methods 0.000 description 1
- 210000004556 brain Anatomy 0.000 description 1
- 238000010276 construction Methods 0.000 description 1
- 238000012937 correction Methods 0.000 description 1
- 238000013135 deep learning Methods 0.000 description 1
- 238000013461 design Methods 0.000 description 1
- 238000004880 explosion Methods 0.000 description 1
- 239000002360 explosive Substances 0.000 description 1
- 230000036039 immunity Effects 0.000 description 1
- 230000006872 improvement Effects 0.000 description 1
- 230000010365 information processing Effects 0.000 description 1
- 239000000463 material Substances 0.000 description 1
- 238000012986 modification Methods 0.000 description 1
- 230000004048 modification Effects 0.000 description 1
- 210000002569 neuron Anatomy 0.000 description 1
- 238000011176 pooling Methods 0.000 description 1
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/06—Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
- G10L15/063—Training
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/22—Procedures used during a speech recognition process, e.g. man-machine dialogue
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/03—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
- G10L25/18—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters the extracted parameters being spectral information of each sub-band
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/27—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/06—Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
- G10L15/063—Training
- G10L2015/0631—Creating reference templates; Clustering
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Computational Linguistics (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Acoustics & Sound (AREA)
- Multimedia (AREA)
- Signal Processing (AREA)
- Artificial Intelligence (AREA)
- Spectroscopy & Molecular Physics (AREA)
- Image Analysis (AREA)
Abstract
The invention discloses a method and apparatus for building a speech acoustic model. The method includes: obtaining an audio signal of speech data; performing feature extraction on the audio signal to obtain a spectrogram of the audio signal; performing image recognition on the spectrogram to obtain a recognition result; and building a speech acoustic model according to the recognition result and the actual sound data of the speech data.
Description
Technical field
The present invention relates to the field of information processing, and in particular to a method and apparatus for building a speech acoustic model.
Background technology
Machine learning has become one of the most popular data analysis methods in the information industry. It automates the building of analytical models: algorithms iterate over existing data, continuously optimizing themselves until an optimal model is formed, so that computers acquire a "brain" and can, without explicit programming, see into data hidden deep below the surface. Although a wide variety of machine learning algorithms have existed for a long time, the transition from the information-scarce past to today's era of data explosion means that data volume and data scale in every field are growing exponentially. This explosive growth in data scale brings enormous opportunity and potential for change: the completeness of such data can help every industry make better decisions, and it sets a good example for the turn toward data-driven scientific research. The combination of machine learning and big data has therefore become particularly important as we pursue ever faster computation and ever more accurate models.
Machine learning on big data greatly increases the sample size, so that the classification of many problems is supported by abundant samples; this is the advantage of big data. But the huge data volume also brings difficulties to machine learning: issues such as the relationships among data and the screening of valid data can greatly affect the accuracy of model training and the training time. Mining the rules hidden in data of huge volume and varied structure, together with the information we need, so that the data delivers its maximum value, is thus a core objective of big data technology.
It has been predicted that, over the next few years, searching for information on the Internet will increasingly rely on voice input rather than keyboard input. This marks the rise of machine learning for building speech acoustic models: it is precisely the introduction of deep learning and the help of big data that keep improving the accuracy and intelligence of speech acoustic models. How to build a speech acoustic model of high accuracy is an urgent problem to be solved.
Summary of the invention
To solve the above technical problem, the present invention provides a method for building a speech acoustic model, capable of producing a speech acoustic model of high accuracy.
To achieve the object of the invention, the present invention provides a method for building a speech acoustic model, including:
obtaining an audio signal of speech data;
performing feature extraction on the audio signal to obtain a spectrogram of the audio signal;
performing image recognition on the spectrogram to obtain a recognition result;
building a speech acoustic model according to the recognition result and the actual sound data of the speech data.
Wherein, performing image recognition on the spectrogram to obtain a recognition result includes:
processing the spectrogram successively with multiple convolutional layers of a deep convolutional network to obtain the recognition result.
Wherein, performing image recognition on the spectrogram to obtain a recognition result further includes:
after the convolutional-layer processing, processing the output of the convolutional layers with a pooling layer of the deep convolutional network to obtain the recognition result.
Wherein, before image recognition is performed on the spectrogram to obtain a recognition result, the method further includes: obtaining a weight matrix of the audio signal, where the weight matrix is determined according to the time at which the audio data of the audio signal occurs in the speech and its importance in the speech; and processing the spectrum data with the weight matrix.
Wherein, the method further includes: marking valid data in the audio data of the acoustic model.
The present invention also provides a device for building a speech acoustic model, including:
a signal acquisition module, configured to obtain an audio signal of speech data;
an extraction module, configured to perform feature extraction on the audio signal to obtain a spectrogram of the audio signal;
an identification module, configured to perform image recognition on the spectrogram to obtain a recognition result;
a determining module, configured to build a speech acoustic model according to the recognition result and the actual sound data of the speech data.
Wherein, the identification module is specifically configured to:
process the spectrogram successively with multiple convolutional layers of a deep convolutional network to obtain the recognition result.
Wherein, the identification module is further configured to:
after the convolutional-layer processing, process the output of the convolutional layers with a pooling layer of the deep convolutional network to obtain the recognition result.
Wherein, the device further includes:
a matrix acquisition module, configured to obtain the weight matrix of the audio signal before the convolutional-layer processing, where the weight matrix is determined according to the time at which the audio data of the audio signal occurs in the speech and its importance in the speech;
a processing module, configured to process the spectrum data with the weight matrix.
Wherein, the device further includes:
a marking module, configured to mark valid data in the audio data of the acoustic model.
In the embodiments provided by the present invention, the spectrum information of an audio signal is obtained and image recognition is performed on the spectrogram, so that the audio signal is processed as image data. This characterizes the acoustic information of the sound more accurately and improves the accuracy of the speech acoustic model.
Other features and advantages of the present invention will be set forth in the following description and will in part become apparent from the description or be understood by practicing the invention. The objects and other advantages of the invention can be realized and obtained by the structures particularly pointed out in the description, the claims, and the accompanying drawings.
Brief description of the drawings
The accompanying drawings provide a further understanding of the technical solution of the present invention and constitute a part of the specification. Together with the embodiments of the application, they serve to explain the technical solution of the present invention and do not limit it.
Fig. 1 is a flow chart of the method for building a speech acoustic model provided by the present invention;
Fig. 2 is a schematic flow diagram of building a speech acoustic model provided by the present invention;
Fig. 3 is a schematic flow diagram of the deep convolutional neural network provided by the present invention processing an audio spectrogram image;
Fig. 4 is a structural diagram of the device for building a speech acoustic model provided by the present invention.
Detailed description of the embodiments
To make the objects, technical solutions, and advantages of the present invention clearer, the embodiments of the present invention are described in detail below with reference to the accompanying drawings. It should be noted that, where no conflict arises, the embodiments of the application and the features in the embodiments may be combined with one another.
The steps illustrated in the flow charts of the drawings may be executed in a computer system, for example as a set of computer-executable instructions. Moreover, although a logical order is shown in the flow charts, in some cases the steps shown or described may be performed in an order different from the one given here.
Fig. 1 is a flow chart of the method for building a speech acoustic model provided by the present invention. The method shown in Fig. 1 includes:
Step 101: obtain an audio signal of speech data;
Step 102: perform feature extraction on the audio signal to obtain a spectrogram of the audio signal;
Step 103: perform image recognition on the spectrogram to obtain a recognition result;
Step 104: build a speech acoustic model according to the recognition result and the actual sound data of the speech data.
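For illustration only, the four steps can be sketched end to end as follows. This is a minimal sketch under assumed conditions (16-bit PCM input, a log-magnitude STFT spectrogram, and placeholder stand-ins for the recognition and model-building steps); none of the function names, parameter values, or placeholder computations are specified by the patent.

```python
import numpy as np

def acquire_audio_signal(raw: bytes) -> np.ndarray:
    # Step 101: decode 16-bit PCM speech data into a normalized waveform.
    return np.frombuffer(raw, dtype=np.int16).astype(np.float32) / 32768.0

def extract_spectrogram(signal: np.ndarray, n_fft: int = 512, hop: int = 160) -> np.ndarray:
    # Step 102: windowed framing + FFT -> log-magnitude spectrogram.
    frames = np.stack([signal[i:i + n_fft] * np.hanning(n_fft)
                       for i in range(0, len(signal) - n_fft, hop)])
    return np.log(np.abs(np.fft.rfft(frames, axis=1)) + 1e-6).T  # (freq, time)

def recognize_spectrogram(spec: np.ndarray) -> np.ndarray:
    # Step 103: image recognition on the spectrogram; this placeholder
    # stands in for the deep CNN sketched later in this description.
    return spec.mean(axis=0)

def build_acoustic_model(result: np.ndarray, actual_sound: np.ndarray) -> float:
    # Step 104: compare the recognition result against the actual sound data;
    # a real implementation would update model parameters here.
    n = min(len(result), len(actual_sound))
    return float(np.mean((result[:n] - actual_sound[:n]) ** 2))
```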
In the method embodiment provided by the present invention, the spectrum information of an audio signal is obtained and image recognition is performed on the spectrogram, so that the audio signal is processed as image data. This characterizes the acoustic information of the sound more accurately and improves the accuracy of the speech acoustic model.
The method embodiment provided by the present invention is described further below.
The present invention processes the spectrogram successively with multiple convolutional layers of a deep convolutional network (Deep Convolutional Neural Network, deep CNN) to obtain the recognition result.
By applying the deep convolutional neural network algorithm to the building of a speech acoustic model and treating the spectrum of the speech signal as an image, the invariance of convolution overcomes the inherent diversity of speech signals, and the accuracy of the speech acoustic model can be substantially improved.
Wherein, performing image recognition on the spectrogram to obtain a recognition result further includes:
after the convolutional-layer processing, processing the output of the convolutional layers with a pooling layer of the deep convolutional network to obtain the recognition result.
Using a pooling layer after the convolutional layers reduces the convolution kernel size, so that a deeper convolutional neural network model with better performance can be trained, which in turn raises recognition accuracy.
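A minimal sketch of such a conv-plus-pooling stack is shown below, assuming PyTorch and arbitrary layer sizes; the patent specifies neither a framework nor a concrete architecture, so every dimension here is an illustrative assumption.

```python
import torch
import torch.nn as nn

class DeepCNNAcousticNet(nn.Module):
    """Stacked small-kernel conv layers with pooling, applied to a
    spectrogram treated as a one-channel image."""
    def __init__(self, n_classes: int = 100):
        super().__init__()
        self.features = nn.Sequential(
            # Multiple convolutional layers process the spectrogram in turn.
            nn.Conv2d(1, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 32, kernel_size=3, padding=1), nn.ReLU(),
            # Pooling after the conv block shrinks the feature map, so small
            # kernels suffice and a deeper network becomes trainable.
            nn.MaxPool2d(2),
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
        )
        self.classifier = nn.Linear(64, n_classes)

    def forward(self, spec: torch.Tensor) -> torch.Tensor:
        # spec: (batch, 1, freq_bins, time_frames)
        x = self.features(spec)
        x = x.mean(dim=(2, 3))      # global average pooling over freq and time
        return self.classifier(x)   # per-utterance recognition scores
```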
In practical applications, points at different times and frequencies may differ in importance; for example, the frame at the current time may be somewhat more important than the frames before and after it. A weight matrix is therefore introduced: before each layer performs its convolution operation, its input is first multiplied element-wise by this matrix, which amounts to weighting by importance. The initialization value of each weight is 1.
Specifically, before image recognition is performed on the spectrogram to obtain a recognition result, the method further includes: obtaining the weight matrix of the audio signal, where the weight matrix is determined according to the time at which the audio data of the audio signal occurs in the speech and its importance in the speech; and processing the spectrum data with the weight matrix.
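A sketch of this element-wise weighting, again assuming PyTorch. Treating the matrix as a learnable parameter initialized to 1 is one plausible reading of the description, not something the patent mandates.

```python
import torch
import torch.nn as nn

class WeightedConv2d(nn.Module):
    """Element-wise weight matrix applied to the input before convolution.

    Every weight starts at 1 (i.e. no re-weighting) and can then be adjusted
    so that time-frequency points of higher importance - e.g. the current
    frame versus its neighbours - contribute more.
    """
    def __init__(self, in_ch: int, out_ch: int, freq_bins: int, time_frames: int):
        super().__init__()
        # Initialization value of every weight is 1, as in the description.
        self.weight_matrix = nn.Parameter(torch.ones(freq_bins, time_frames))
        self.conv = nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, channels, freq_bins, time_frames); broadcast multiply,
        # then convolve the importance-weighted input.
        return self.conv(x * self.weight_matrix)
```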
At the big data level, valid data screened out by big data analysis (with or without labels) is used for supervised or unsupervised training of the model, calibrating the model and raising its precision, that is, raising speech recognition accuracy.
In the method embodiment provided by the present invention, deep convolutional neural network technology is applied to acoustic modeling for speech recognition, significantly raising recognition accuracy. Drawing on recent achievements in image recognition and exploiting what speech and images have in common for CNN model training, the error rate is reduced by a relative 10% compared with the industry's existing combination of convolutional neural networks and deep neural networks. Because many algorithms based on small data sets may fail on big data, big data technology must also be used to tune the parameters of the model and calibrate it.
The method embodiment provided by the present invention is described further below.
Fig. 2 is a schematic flow diagram of building a speech acoustic model provided by the present invention. The flow shown in Fig. 2 includes:
Signal processing and feature extraction of the input signal: noise reduction and channel-distortion processing are applied to the original audio signal, the signal is transformed from the time domain to the frequency domain, and feature vectors are extracted for the acoustic model that follows.
The core formula in building a speech acoustic model is as follows; the essence is to find the word sequence W that makes both P(W) and P(X | W) large. P(W) represents the language model built within the speech acoustic model, i.e. how plausible the string of words or characters is in itself; P(X | W) represents the acoustic model built within the speech acoustic model, i.e. how likely those words are to produce this segment of sound. Making both values as large as possible is the core task in raising the accuracy of a speech acoustic model: the decoding search combines the acoustic model score and the language model score and takes the word sequence with the highest overall score as the recognition result.
Building a speech acoustic model means modeling the relationship between the speech signal and the text content. In the usual case, the model is built on the speech spectrum obtained from time-frequency analysis, and the speech time-frequency spectrum has structural features. Improving the accuracy of the speech acoustic model therefore requires overcoming the wide diversity of speech signals themselves (regional dialects, different languages, liaison, voice changes, and so on) and the diversity of the environment (such as noise interference).
Fig. 3 is a schematic flow diagram of the deep convolutional neural network provided by the present invention processing an audio spectrogram image. A specific implementation is as follows.
A convolutional neural network is used because its local connectivity (each neuron does not in fact need to perceive the whole image; it only needs to perceive a local region, and the local information is then aggregated at higher layers to yield the global information) and weight sharing give it good translation invariance. When the idea of the convolutional neural network is applied to the acoustic modeling of a speech acoustic model, the invariance of convolution can overcome much of the inherent diversity of speech signals; at the same time, adding a pooling layer after each convolutional layer and reducing the convolution kernel size lets us train deeper CNN models with better performance. From this point of view, the time-frequency spectrum obtained by analyzing the whole speech signal can be treated like an image and recognized with the deep convolutional networks widely used in image recognition. Meanwhile, at the level of model structure, the deep CNN gives the model good translation invariance in the time domain and therefore better noise immunity.
For the spectrum input to the convolutional layers, points at different times and frequencies may differ in importance (the frame at the current time may be somewhat more important than the frames before and after it), so a weight matrix is introduced, with every weight initialized to 1: before each layer performs its convolution operation, its input is first multiplied element-wise by this matrix, which amounts to weighting by importance.
At the same time, big data's annotation and analysis of the corpus, and its calibration and training of the model, help raise the accuracy of the speech acoustic model built with the deep CNN technique.
As can be seen from the above, through the introduction of the deep CNN algorithm and big data analysis of the training corpus, the deep CNN algorithms widely used in image recognition are applied to the building of a speech acoustic model, the spectrum of the speech signal is treated as an image, and the invariance of convolution overcomes the inherent diversity of speech signals, so that the accuracy of the speech acoustic model can be substantially improved.
The accuracy of the speech acoustic model is raised in two ways. The first is algorithmic: the convolutional neural network technology usually applied to image recognition is applied to building a speech acoustic model, and the spectrum obtained by analyzing the whole speech signal is processed like an image, which can greatly improve the accuracy of the model. The second is at the big data level: valid data screened out by big data analysis (with or without labels) is used for supervised or unsupervised training of the model, calibrating the model and raising its precision, that is, raising the accuracy of the speech acoustic model.
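As a sketch of this data-level prong (the screening flag, the optimizer choice, and the shape of the corpus below are all assumptions for illustration; the patent prescribes none of them):

```python
import torch
import torch.nn as nn

def train_on_screened_data(model: nn.Module, corpus, epochs: int = 5):
    """Supervised training on valid data screened from a big-data corpus.

    `corpus` is assumed to be an iterable of (spectrogram, label, is_valid)
    triples; the `is_valid` flag stands in for whatever big-data analysis
    marks an utterance as valid.
    """
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
    loss_fn = nn.CrossEntropyLoss()
    valid = [(x, y) for x, y, ok in corpus if ok]   # keep screened-valid data
    for _ in range(epochs):
        for spec, label in valid:
            optimizer.zero_grad()
            logits = model(spec.unsqueeze(0))       # spec: (1, freq, time)
            loss = loss_fn(logits, torch.tensor([label]))
            loss.backward()
            optimizer.step()                        # calibrate the model
    return model
```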
Fig. 4 is a structural diagram of the device for building a speech acoustic model provided by the present invention. The device shown in Fig. 4 includes:
a signal acquisition module 401, configured to obtain an audio signal of speech data;
an extraction module 402, configured to perform feature extraction on the audio signal to obtain a spectrogram of the audio signal;
an identification module 403, configured to perform image recognition on the spectrogram to obtain a recognition result;
a building module 404, configured to build a speech acoustic model according to the recognition result and the actual sound data of the speech data.
Wherein, the identification module 403 is specifically configured to:
process the spectrogram successively with multiple convolutional layers of a deep convolutional network to obtain the recognition result.
Wherein, the identification module 403 is further configured to, after the convolutional-layer processing, process the output of the convolutional layers with a pooling layer of the deep convolutional network to obtain the recognition result.
Optionally, the device further includes:
a matrix acquisition module, configured to obtain the weight matrix of the audio signal before the convolutional-layer processing, where the weight matrix is determined according to the time at which the audio data of the audio signal occurs in the speech and its importance in the speech;
a processing module, configured to process the spectrum data with the weight matrix.
Optionally, the device further includes:
a marking module, configured to mark valid data in the audio data of the acoustic model.
In the device embodiment provided by the present invention, the spectrum information of an audio signal is obtained and image recognition is performed on the spectrogram, so that the audio signal is processed as image data. This characterizes the acoustic information of the sound more accurately and improves the accuracy of the speech acoustic model.
Although embodiments are disclosed above, the content described is only an implementation adopted to aid understanding of the present invention and is not intended to limit it. Any person skilled in the art to which the present invention pertains may make modifications and changes in the form and details of implementation without departing from the spirit and scope disclosed by the present invention; however, the scope of patent protection of the present invention shall still be subject to the scope defined by the appended claims.
Claims (10)
1. A method for building a speech acoustic model, characterized by including:
obtaining an audio signal of speech data;
performing feature extraction on the audio signal to obtain a spectrogram of the audio signal;
performing image recognition on the spectrogram to obtain a recognition result;
building a speech acoustic model according to the recognition result and the actual sound data of the speech data.
2. The method according to claim 1, characterized in that performing image recognition on the spectrogram to obtain a recognition result includes:
processing the spectrogram successively with multiple convolutional layers of a deep convolutional network to obtain the recognition result.
3. The method according to claim 2, characterized in that performing image recognition on the spectrogram to obtain a recognition result further includes:
after the convolutional-layer processing, processing the output of the convolutional layers with a pooling layer of the deep convolutional network to obtain the recognition result.
4. The method according to claim 2 or 3, characterized in that, before image recognition is performed on the spectrogram to obtain a recognition result, the method further includes:
obtaining a weight matrix of the audio signal, where the weight matrix is determined according to the time at which the audio data of the audio signal occurs in the speech and its importance in the speech;
processing the spectrum data with the weight matrix.
5. The method according to claim 4, characterized in that the method further includes:
marking valid data in the audio data of the acoustic model.
6. A device for building a speech acoustic model, characterized by including:
a signal acquisition module, configured to obtain an audio signal of speech data;
an extraction module, configured to perform feature extraction on the audio signal to obtain a spectrogram of the audio signal;
an identification module, configured to perform image recognition on the spectrogram to obtain a recognition result;
a determining module, configured to build a speech acoustic model according to the recognition result and the actual sound data of the speech data.
7. The device according to claim 6, characterized in that the identification module is specifically configured to:
process the spectrogram successively with multiple convolutional layers of a deep convolutional network to obtain the recognition result.
8. The device according to claim 7, characterized in that the identification module is further configured to:
after the convolutional-layer processing, process the output of the convolutional layers with a pooling layer of the deep convolutional network to obtain the recognition result.
9. The device according to claim 7 or 8, characterized in that the device further includes:
a matrix acquisition module, configured to obtain the weight matrix of the audio signal before the convolutional-layer processing, where the weight matrix is determined according to the time at which the audio data of the audio signal occurs in the speech and its importance in the speech;
a processing module, configured to process the spectrum data with the weight matrix.
10. The device according to claim 9, characterized in that the device further includes:
a marking module, configured to mark valid data in the audio data of the acoustic model.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710640480.1A CN107293290A (en) | 2017-07-31 | 2017-07-31 | Method and apparatus for building a speech acoustic model
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710640480.1A CN107293290A (en) | 2017-07-31 | 2017-07-31 | Method and apparatus for building a speech acoustic model
Publications (1)
Publication Number | Publication Date |
---|---|
CN107293290A true CN107293290A (en) | 2017-10-24 |
Family
ID=60103935
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201710640480.1A Pending CN107293290A (en) | 2017-07-31 | 2017-07-31 | The method and apparatus for setting up Speech acoustics model |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN107293290A (en) |
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108877783A (en) * | 2018-07-05 | 2018-11-23 | 腾讯音乐娱乐科技(深圳)有限公司 | The method and apparatus for determining the audio types of audio data |
CN111048071A (en) * | 2019-11-11 | 2020-04-21 | 北京海益同展信息科技有限公司 | Voice data processing method and device, computer equipment and storage medium |
CN111768799A (en) * | 2019-03-14 | 2020-10-13 | 富泰华工业(深圳)有限公司 | Voice recognition method, voice recognition apparatus, computer apparatus, and storage medium |
CN112116926A (en) * | 2019-06-19 | 2020-12-22 | 北京猎户星空科技有限公司 | Audio data processing method and device and model training method and device |
CN112363114A (en) * | 2021-01-14 | 2021-02-12 | 杭州兆华电子有限公司 | Public place acoustic event positioning method and system based on distributed noise sensor |
CN113112969A (en) * | 2021-03-23 | 2021-07-13 | 平安科技(深圳)有限公司 | Buddhism music score recording method, device, equipment and medium based on neural network |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20140288928A1 (en) * | 2013-03-25 | 2014-09-25 | Gerald Bradley PENN | System and method for applying a convolutional neural network to speech recognition |
CN104616664A (en) * | 2015-02-02 | 2015-05-13 | 合肥工业大学 | Method for recognizing audio based on spectrogram significance test |
CN106128465A (en) * | 2016-06-23 | 2016-11-16 | 成都启英泰伦科技有限公司 | A kind of Voiceprint Recognition System and method |
CN106782501A (en) * | 2016-12-28 | 2017-05-31 | 百度在线网络技术(北京)有限公司 | Speech feature extraction method and device based on artificial intelligence |
CN106782602A (en) * | 2016-12-01 | 2017-05-31 | 南京邮电大学 | Speech emotion recognition method based on long short-term memory networks and convolutional neural networks |
- 2017-07-31: application CN201710640480.1A filed in CN; published as CN107293290A (en); status: pending
Patent Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20140288928A1 (en) * | 2013-03-25 | 2014-09-25 | Gerald Bradley PENN | System and method for applying a convolutional neural network to speech recognition |
CN104616664A (en) * | 2015-02-02 | 2015-05-13 | 合肥工业大学 | Method for recognizing audio based on spectrogram significance test |
CN106128465A (en) * | 2016-06-23 | 2016-11-16 | 成都启英泰伦科技有限公司 | A kind of Voiceprint Recognition System and method |
CN106782602A (en) * | 2016-12-01 | 2017-05-31 | 南京邮电大学 | Speech emotion recognition method based on long short-term memory networks and convolutional neural networks |
CN106782501A (en) * | 2016-12-28 | 2017-05-31 | 百度在线网络技术(北京)有限公司 | Speech feature extraction method and device based on artificial intelligence |
Non-Patent Citations (1)
Title |
---|
XUMCAS: "神经网络—CNN结构和语音识别应用" (Neural networks: CNN structure and application to speech recognition), https://blog.csdn.net/xmdxcsj/article/details/54695995 *
Cited By (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108877783A (en) * | 2018-07-05 | 2018-11-23 | 腾讯音乐娱乐科技(深圳)有限公司 | The method and apparatus for determining the audio types of audio data |
CN111768799A (en) * | 2019-03-14 | 2020-10-13 | 富泰华工业(深圳)有限公司 | Voice recognition method, voice recognition apparatus, computer apparatus, and storage medium |
CN112116926A (en) * | 2019-06-19 | 2020-12-22 | 北京猎户星空科技有限公司 | Audio data processing method and device and model training method and device |
CN111048071A (en) * | 2019-11-11 | 2020-04-21 | 北京海益同展信息科技有限公司 | Voice data processing method and device, computer equipment and storage medium |
CN112363114A (en) * | 2021-01-14 | 2021-02-12 | 杭州兆华电子有限公司 | Public place acoustic event positioning method and system based on distributed noise sensor |
CN113112969A (en) * | 2021-03-23 | 2021-07-13 | 平安科技(深圳)有限公司 | Buddhism music score recording method, device, equipment and medium based on neural network |
CN113112969B (en) * | 2021-03-23 | 2024-04-05 | 平安科技(深圳)有限公司 | Buddhism music notation method, device, equipment and medium based on neural network |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN107293290A (en) | Method and apparatus for building a speech acoustic model | |
CN104756182B (en) | Auditory attention clue is combined to detect for phone/vowel/syllable boundaries with phoneme posteriority score | |
CN105741832B (en) | Spoken language evaluation method and system based on deep learning | |
CN112466326B (en) | Voice emotion feature extraction method based on transducer model encoder | |
CN109065032B (en) | External corpus speech recognition method based on deep convolutional neural network | |
CN108986798B (en) | Processing method, device and the equipment of voice data | |
CN107818164A (en) | A kind of intelligent answer method and its system | |
CN107633842A (en) | Audio recognition method, device, computer equipment and storage medium | |
CN112487949B (en) | Learner behavior recognition method based on multi-mode data fusion | |
US12067989B2 (en) | Combined learning method and apparatus using deepening neural network based feature enhancement and modified loss function for speaker recognition robust to noisy environments | |
CN112990296A (en) | Image-text matching model compression and acceleration method and system based on orthogonal similarity distillation | |
CN110853680A (en) | double-BiLSTM structure with multi-input multi-fusion strategy for speech emotion recognition | |
CN106935239A (en) | The construction method and device of a kind of pronunciation dictionary | |
CN106297792A (en) | The recognition methods of a kind of voice mouth shape cartoon and device | |
CN106328123B (en) | Method for recognizing middle ear voice in normal voice stream under condition of small database | |
CN110825850B (en) | Natural language theme classification method and device | |
CN108962229A (en) | A kind of target speaker's voice extraction method based on single channel, unsupervised formula | |
Mao et al. | Unsupervised discovery of an extended phoneme set in l2 english speech for mispronunciation detection and diagnosis | |
Sunny et al. | Recognition of speech signals: an experimental comparison of linear predictive coding and discrete wavelet transforms | |
CN116244474A (en) | Learner learning state acquisition method based on multi-mode emotion feature fusion | |
CN114898779A (en) | Multi-mode fused speech emotion recognition method and system | |
CN110348482A (en) | A kind of speech emotion recognition system based on depth model integrated architecture | |
CN116860943A (en) | Multi-round dialogue method and system for dialogue style perception and theme guidance | |
Zhao et al. | Enhancing audio perception in augmented reality: a dynamic vocal information processing framework | |
Anindya et al. | Development of Indonesian speech recognition with deep neural network for robotic command |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| RJ01 | Rejection of invention patent application after publication | Application publication date: 20171024 |