CN107293290A - Method and apparatus for building a speech acoustic model - Google Patents

Method and apparatus for building a speech acoustic model

Info

Publication number
CN107293290A
CN107293290A
Authority
CN
China
Prior art keywords
data
result
audio signal
spectrogram
speech
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201710640480.1A
Other languages
Chinese (zh)
Inventor
吕广杰
刘芮
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhengzhou Yunhai Information Technology Co Ltd
Original Assignee
Zhengzhou Yunhai Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhengzhou Yunhai Information Technology Co Ltd filed Critical Zhengzhou Yunhai Information Technology Co Ltd
Priority to CN201710640480.1A priority Critical patent/CN107293290A/en
Publication of CN107293290A publication Critical patent/CN107293290A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/06 Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
    • G10L15/063 Training
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/03 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
    • G10L25/18 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters the extracted parameters being spectral information of each sub-band
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/27 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/06 Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
    • G10L15/063 Training
    • G10L2015/0631 Creating reference templates; Clustering

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Artificial Intelligence (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a method and apparatus for building a speech acoustic model. The method includes: obtaining the audio signal of speech data; performing feature extraction on the audio signal to obtain the spectrogram of the audio signal; performing image recognition on the spectrogram to obtain a recognition result; and building the speech acoustic model according to the recognition result and the actual sound data of the speech data.

Description

Method and apparatus for building a speech acoustic model
Technical field
The present invention relates to the field of information processing, and in particular to a method and apparatus for building a speech acoustic model.
Background
Machine learning has become one of the most popular data-analysis methods in the information industry. It automates the construction of analysis models: algorithms iterate over existing data and are continuously optimized until an optimal model is formed, giving computers a "brain" that lets them uncover, without explicit programming, the patterns hidden deep in data. Although a wide variety of machine-learning algorithms have existed for a long time, the shift from the information-scarce past to today's era of data explosion means that the data volume and data scale in every field are growing exponentially. This explosive growth in data scale brings enormous opportunity and potential for change: the completeness and other advantages of such data can help every industry make better decisions, and it sets a good example for the turn toward data-driven scientific research. The combination of machine learning with big data has therefore become particularly important, as we pursue ever faster computation and ever more precise and accurate models.
Machine learning under big data greatly increases the available sample size, so the classification of many problems is now supported by abundant samples; this is the advantage of big data. However, huge data volumes also trouble machine learning: issues such as the relationships among the data and the screening of valid data can strongly affect the accuracy and training time of a machine-learning model. Mining the regularities and the needed information hidden in data that is huge in volume and varied in structure, so that the data delivers its maximal value, is thus a core goal of big-data technology.
It has been predicted that, over the next several years, searching for information on the Internet will rely increasingly on voice input rather than keyboard input. This marks the rise of machine learning for building speech acoustic models: precisely because of the introduction of deep learning and the help of big data, the accuracy and intelligence of speech acoustic models keep improving. How to build a speech acoustic model with high accuracy is therefore an urgent problem to be solved.
Summary of the invention
To solve the above technical problem, the invention provides a method for building a speech acoustic model, which can build a speech acoustic model with high accuracy.
To achieve the object of the invention, the invention provides a method for building a speech acoustic model, including:
obtaining the audio signal of speech data;
performing feature extraction on the audio signal to obtain the spectrogram of the audio signal;
performing image recognition on the spectrogram to obtain a recognition result;
building the speech acoustic model according to the recognition result and the actual sound data of the speech data.
Optionally, performing image recognition on the spectrogram to obtain a recognition result includes:
processing the spectrogram successively with multiple convolutional layers of a deep convolutional network to obtain the recognition result.
Optionally, performing image recognition on the spectrogram to obtain a recognition result further includes:
after the convolutional-layer processing, processing the convolutional layers' output with a pooling layer of the deep convolutional network to obtain the recognition result.
Optionally, before performing image recognition on the spectrogram to obtain a recognition result, the method further includes: obtaining a weight matrix for the audio signal, the weight matrix being determined from the time at which the audio data of the audio signal occurs in the speech and its importance within the speech; and processing the spectrum data with the weight matrix.
Optionally, the method further includes: marking the valid data in the audio data of the acoustic model.
A device for building a speech acoustic model, including:
a signal acquisition module, for obtaining the audio signal of speech data;
an extraction module, for performing feature extraction on the audio signal to obtain the spectrogram of the audio signal;
a recognition module, for performing image recognition on the spectrogram to obtain a recognition result;
a building module, for building the speech acoustic model according to the recognition result and the actual sound data of the speech data.
Optionally, the recognition module is specifically configured to:
process the spectrogram successively with multiple convolutional layers of a deep convolutional network to obtain the recognition result.
Optionally, the recognition module is further configured to:
after the convolutional-layer processing, process the convolutional layers' output with a pooling layer of the deep convolutional network to obtain the recognition result.
Optionally, the device further includes:
a matrix acquisition module, for obtaining, before the convolutional-layer processing, the weight matrix of the audio signal, the weight matrix being determined from the time at which the audio data of the audio signal occurs in the speech and its importance within the speech;
a processing module, for processing the spectrum data with the weight matrix.
Optionally, the device further includes:
a marking module, for marking the valid data in the audio data of the acoustic model.
In the embodiments provided by the invention, the spectral information of the audio signal is obtained and image recognition is performed on the spectral image, so that the audio signal is processed as image data. This captures the acoustic information of the sound more accurately and improves the accuracy of the speech acoustic model.
Other features and advantages of the invention will be set forth in the following description and will in part become apparent from the description or be understood by practicing the invention. The objects and other advantages of the invention can be realized and obtained through the structures particularly pointed out in the description, the claims, and the accompanying drawings.
Brief description of the drawings
The accompanying drawings are provided for a further understanding of the technical solution of the invention and constitute a part of the specification. Together with the embodiments of the application they serve to explain the technical solution of the invention; they do not limit it.
Fig. 1 is a flowchart of the method for building a speech acoustic model provided by the invention;
Fig. 2 is a flow diagram of building a speech acoustic model provided by the invention;
Fig. 3 is a flow diagram of the deep convolutional neural network provided by the invention processing an audio-spectrum image;
Fig. 4 is a structural diagram of the device for building a speech acoustic model provided by the invention.
Detailed description
To make the objects, technical solutions, and advantages of the invention clearer, embodiments of the invention are described in detail below with reference to the accompanying drawings. It should be noted that, where there is no conflict, the embodiments in this application and the features in the embodiments may be combined with one another.
The steps illustrated in the flowcharts of the drawings may be performed in a computer system such as a set of computer-executable instructions. Moreover, although a logical order is shown in the flowcharts, in some cases the steps shown or described may be performed in an order different from the one given here.
Fig. 1 is a flowchart of the method for building a speech acoustic model provided by the invention. The method shown in Fig. 1 includes:
Step 101: obtain the audio signal of speech data;
Step 102: perform feature extraction on the audio signal to obtain the spectrogram of the audio signal;
Step 103: perform image recognition on the spectrogram to obtain a recognition result;
Step 104: build the speech acoustic model according to the recognition result and the actual sound data of the speech data.
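The four steps above can be sketched end to end in a few lines. This is only an illustrative outline: the function names (`get_audio_signal`, `extract_spectrogram`, `recognize`, `build_acoustic_model`), the toy frame sizes, and the stand-in recognizer are not from the patent, which leaves these details to the detailed embodiments below.

```python
import numpy as np

def get_audio_signal(speech_data):
    # Step 101: obtain the audio signal (here, already a 1-D sample array).
    return np.asarray(speech_data, dtype=float)

def extract_spectrogram(signal, frame_len=8, hop=4):
    # Step 102: frame the signal and take magnitude FFTs; the result has
    # shape (num_frames, frame_len // 2 + 1) and is treated as an image.
    frames = [signal[i:i + frame_len]
              for i in range(0, len(signal) - frame_len + 1, hop)]
    return np.abs(np.fft.rfft(np.stack(frames), axis=1))

def recognize(spec):
    # Step 103: stand-in for image recognition with a deep CNN;
    # here it just summarizes each frame of the spectrogram.
    return spec.mean(axis=1)

def build_acoustic_model(recognition_result, actual_sound_data):
    # Step 104: pair the recognition result with the ground truth;
    # a real system would fit model parameters from this pairing.
    return {"result": recognition_result, "truth": actual_sound_data}

signal = get_audio_signal(np.sin(2 * np.pi * 2 * np.arange(32) / 8))
spec = extract_spectrogram(signal)
model = build_acoustic_model(recognize(spec), actual_sound_data="ground truth")
```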
In the method embodiments provided by the invention, the spectral information of the audio signal is obtained and image recognition is performed on the spectral image, so that the audio signal is processed as image data. This captures the acoustic information of the sound more accurately and improves the accuracy of the speech acoustic model.
The method embodiments provided by the invention are described further below:
The invention processes the spectrogram successively with multiple convolutional layers of a deep convolutional network (Deep Convolutional Neural Network, deep CNN) to obtain the recognition result.
By applying the deep convolutional neural network algorithm to building the speech acoustic model, the spectrum of the speech signal is treated as an image, and the invariance of convolution overcomes the inherent diversity of the speech signal, which can substantially improve the accuracy of the speech acoustic model.
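The convolution property relied on here can be shown with a minimal, self-contained sketch: when the energy pattern in a spectrogram shifts in time, the convolutional feature map shifts with it rather than changing. All shapes and values below are illustrative, not from the patent, and the hand-rolled loop stands in for a real CNN layer.

```python
import numpy as np

def conv2d_valid(image, kernel):
    # Plain "valid" 2-D cross-correlation, the basic operation of a
    # convolutional layer (no padding, stride 1).
    kh, kw = kernel.shape
    oh, ow = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    out = np.empty((oh, ow))
    for r in range(oh):
        for c in range(ow):
            out[r, c] = np.sum(image[r:r + kh, c:c + kw] * kernel)
    return out

# A spectrogram-like array (time x frequency) with an energy blob,
# and the same blob shifted one frame later in time.
img = np.zeros((6, 6))
img[1:3, 2:4] = 1.0
shifted = np.roll(img, 1, axis=0)

kernel = np.ones((2, 2))
a = conv2d_valid(img, kernel)
b = conv2d_valid(shifted, kernel)

# The feature map shifts with the input instead of changing: this is the
# translation property that lets a spectrogram be treated as an image.
assert np.allclose(np.roll(a, 1, axis=0), b)
```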
Optionally, performing image recognition on the spectrogram to obtain a recognition result further includes:
after the convolutional-layer processing, processing the convolutional layers' output with a pooling layer of the deep convolutional network to obtain the recognition result.
Applying a pooling layer after the convolutional layers reduces the effective kernel size, making it possible to train deeper and better-performing convolutional neural network models and thereby raise recognition accuracy.
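The pooling step can be illustrated with a plain non-overlapping max pool. The 2x2 pool size and the feature-map values are arbitrary examples, not values from the patent; the point is only how pooling shrinks the map between convolutional layers.

```python
import numpy as np

def max_pool2d(x, size=2):
    # Non-overlapping max pooling, as in the pooling layer applied
    # after the convolutional layers: each size x size block of the
    # feature map collapses to its maximum.
    h, w = x.shape[0] // size, x.shape[1] // size
    return x[:h * size, :w * size].reshape(h, size, w, size).max(axis=(1, 3))

fmap = np.array([[1., 3., 2., 0.],
                 [4., 2., 1., 1.],
                 [0., 1., 5., 2.],
                 [2., 0., 1., 3.]])
pooled = max_pool2d(fmap)
# Each 2x2 block keeps its maximum, quartering the map:
# [[4., 2.],
#  [2., 5.]]
```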
In practice, points at different times and frequencies may differ in importance; for example, the frame at the current time may matter somewhat more than the frames shortly before and after it. A weight matrix is therefore introduced: before each layer performs its convolution operation, the input is first multiplied element-wise by this matrix, which amounts to weighting by importance; the weights are initialized to 1.
Specifically, before performing image recognition on the spectrogram to obtain a recognition result, the method further includes:
obtaining the weight matrix of the audio signal, the weight matrix being determined from the time at which the audio data of the audio signal occurs in the speech and its importance within the speech; and processing the spectrum data with the weight matrix.
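A minimal sketch of this weighting step, assuming the weight matrix has the same shape as the spectrogram. Only the all-ones initialization is stated above; the emphasis factor of 2.0 for the "current" frame is an invented illustrative value.

```python
import numpy as np

# A toy spectrogram of shape (frames, freq_bins); the weight matrix
# has the same shape and is initialized to all ones, as stated above.
spec_in = np.arange(12, dtype=float).reshape(4, 3)
weights = np.ones_like(spec_in)

# With the initial all-ones weights, the spectrum is unchanged.
assert np.array_equal(weights * spec_in, spec_in)

# Emphasize the "current" frame (index 2 here) relative to its
# neighbours; the factor 2.0 is an invented illustrative value.
weights[2, :] = 2.0
weighted = weights * spec_in  # element-wise product before convolution
```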
At the big-data level, the valid data screened out by big-data analysis (with or without labels) is used for supervised or unsupervised training of the model, calibrating the model and raising its precision, that is, raising speech-recognition accuracy.
In the method embodiments provided by the invention, deep convolutional neural network techniques are applied to acoustic modeling for speech recognition, significantly raising the accuracy of speech recognition. By drawing on the achievements of image recognition in recent years and exploiting the commonality of CNN model training across speech and images, the error rate is reduced by a relative 10% compared with the industry's existing convolutional-neural-network-plus-deep-neural-network techniques. Because many algorithms designed for smaller datasets may fail on big data, big-data techniques are also used to tune the parameters of the model and correct it.
The method embodiments provided by the invention are described further below:
Fig. 2 is a flow diagram of building a speech acoustic model provided by the invention. The flow shown in Fig. 2 includes:
Signal processing and feature extraction on the input signal: the original audio signal is denoised and corrected for channel distortion, the signal is transformed from the time domain to the frequency domain, and feature vectors are then extracted for the acoustic model.
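The time-to-frequency step can be sketched as a short-time Fourier transform producing the spectrogram "image" used later. The frame length, hop, and Hann window are common illustrative choices, not parameters specified by the patent.

```python
import numpy as np

def spectrogram(signal, frame_len=64, hop=32):
    # Short-time Fourier transform: window each frame, FFT it, and keep
    # the magnitudes, producing a (num_frames, freq_bins) "image".
    window = np.hanning(frame_len)
    frames = [signal[i:i + frame_len] * window
              for i in range(0, len(signal) - frame_len + 1, hop)]
    return np.abs(np.fft.rfft(np.stack(frames), axis=1))

# A pure tone completing 8 cycles per 64 samples lands in FFT bin 8.
n = np.arange(640)
tone = np.sin(2 * np.pi * 8 * n / 64)
mags = spectrogram(tone)

# Energy concentrates in the bin matching the tone's frequency.
assert int(mags[0].argmax()) == 8
```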
The core formula in building the speech acoustic model is as follows: the core task is to find the W that makes both P(W) and P(X | W) large. P(W) represents the language model built within the speech acoustic model, that is, how plausible the string of words or characters is in itself; P(X | W) represents the acoustic model built within the speech acoustic model, that is, how likely those words are to have produced this stretch of sound. Maximizing these two values is the core task in raising the accuracy of the speech acoustic model: the decoder searches for the word sequence whose combined acoustic-model score and language-model score is highest and takes that sequence as the recognition result.
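The decoding criterion just described, choosing the W that maximizes P(W) * P(X | W), can be illustrated with a toy two-candidate decoder. The candidate strings and all probabilities below are invented for illustration; a real system scores with trained language and acoustic models and searches a vast hypothesis space.

```python
import math

# Two candidate transcriptions W for the same audio X.
language_model = {              # P(W): plausibility of the word string
    "recognize speech": 0.6,
    "wreck a nice beach": 0.4,
}
acoustic_model = {              # P(X|W): likelihood W produced the audio
    "recognize speech": 0.7,
    "wreck a nice beach": 0.2,
}

def decode(candidates):
    # Combined score log P(W) + log P(X|W); the highest-scoring word
    # sequence is taken as the recognition result.
    return max(candidates,
               key=lambda w: math.log(language_model[w])
                             + math.log(acoustic_model[w]))

best = decode(list(language_model))
```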
Building the speech acoustic model requires modeling the relationship between the speech signal and the text content. Normally this is based on the speech spectrum obtained after time-frequency analysis, and the speech time-frequency spectrum has structural features. Improving the accuracy of the speech acoustic model means overcoming the wide diversity the speech signal itself presents (regional dialects, different languages, liaison, tone change, and so on) and the diversity of the environment (such as noise interference).
Fig. 3 is a flow diagram of the deep convolutional neural network provided by the invention processing an audio-spectrum image. The specific implementation is as follows:
A convolutional neural network is used because its local connectivity (each neuron does not need to perceive the whole image, only a local region; the local information is then combined at higher layers to obtain the global information) and weight sharing give it good translation invariance. When the idea of convolutional neural networks is applied to the acoustic modeling involved in building the speech acoustic model, the invariance of convolution can overcome much of the inherent diversity of the speech signal; attaching pooling layers after the convolutional layers and reducing the kernel size lets us train deeper, better-performing CNN models. From this viewpoint, the time-frequency spectrum obtained by analyzing the whole speech signal can be treated just like an image and recognized with the deep convolutional networks widely used in image recognition. At the same time, in terms of model structure, the deep CNN helps the model achieve good translation invariance in the time domain, giving it better noise robustness.
For the spectrum input to a convolutional layer, points at different times and frequencies may differ in importance (the frame at the current time may matter somewhat more than the frames shortly before and after it). A weight matrix, initialized to 1, is introduced, and before each layer performs its convolution operation the input is first multiplied element-wise by this matrix, which amounts to weighting by importance.
At the same time, big data's labeling and analysis of the corpus, and its calibration and training of the model, help raise the accuracy of the speech acoustic model built with the deep-CNN technique.
As can be seen from the above, through the introduction of the deep CNN algorithm and the big-data analysis of the training corpus, the deep-CNN algorithms widely used in image recognition are applied to building the speech acoustic model: the spectrum of the speech signal is treated as an image, and the invariance of convolution overcomes the inherent diversity of the speech signal, which can substantially improve the accuracy of the speech acoustic model.
The accuracy of the speech acoustic model is raised in two ways. The first is algorithmic: the convolutional neural network techniques usually applied to image recognition are applied to building the speech acoustic model, and the spectrum obtained by analyzing the whole speech signal is processed just like an image, which can greatly improve the accuracy of the model. The second is at the big-data level: the valid data screened out by big-data analysis (with or without labels) is used for supervised or unsupervised training of the model, calibrating the model and raising its precision, that is, raising the accuracy of the speech acoustic model.
Fig. 4 is a structural diagram of the device for building a speech acoustic model provided by the invention. The device shown in Fig. 4 includes:
a signal acquisition module 401, for obtaining the audio signal of speech data;
an extraction module 402, for performing feature extraction on the audio signal to obtain the spectrogram of the audio signal;
a recognition module 403, for performing image recognition on the spectrogram to obtain a recognition result;
a building module 404, for building the speech acoustic model according to the recognition result and the actual sound data of the speech data.
Optionally, the recognition module 403 is specifically configured to:
process the spectrogram successively with multiple convolutional layers of a deep convolutional network to obtain the recognition result.
Optionally, the recognition module 403 is further configured to: after the convolutional-layer processing, process the convolutional layers' output with a pooling layer of the deep convolutional network to obtain the recognition result.
Optionally, the device further includes:
a matrix acquisition module, for obtaining, before the convolutional-layer processing, the weight matrix of the audio signal, the weight matrix being determined from the time at which the audio data of the audio signal occurs in the speech and its importance within the speech;
a processing module, for processing the spectrum data with the weight matrix.
Optionally, the device further includes:
a marking module, for marking the valid data in the audio data of the acoustic model.
In the device embodiments provided by the invention, the spectral information of the audio signal is obtained and image recognition is performed on the spectral image, so that the audio signal is processed as image data. This captures the acoustic information of the sound more accurately and improves the accuracy of the speech acoustic model.
Although embodiments are disclosed above, the content described is only an implementation adopted to facilitate understanding of the invention and is not intended to limit it. Any person skilled in the art to which the invention pertains may make modifications and changes in the form and details of implementation without departing from the spirit and scope disclosed by the invention; however, the scope of patent protection of the invention shall still be subject to the scope defined by the appended claims.

Claims (10)

1. A method for building a speech acoustic model, characterized by including:
obtaining the audio signal of speech data;
performing feature extraction on the audio signal to obtain the spectrogram of the audio signal;
performing image recognition on the spectrogram to obtain a recognition result;
building the speech acoustic model according to the recognition result and the actual sound data of the speech data.
2. The method according to claim 1, characterized in that performing image recognition on the spectrogram to obtain a recognition result includes:
processing the spectrogram successively with multiple convolutional layers of a deep convolutional network to obtain the recognition result.
3. The method according to claim 2, characterized in that performing image recognition on the spectrogram to obtain a recognition result further includes:
after the convolutional-layer processing, processing the convolutional layers' output with a pooling layer of the deep convolutional network to obtain the recognition result.
4. The method according to claim 2 or 3, characterized in that before performing image recognition on the spectrogram to obtain a recognition result, the method further includes:
obtaining the weight matrix of the audio signal, the weight matrix being determined from the time at which the audio data of the audio signal occurs in the speech and its importance within the speech;
processing the spectrum data with the weight matrix.
5. The method according to claim 4, characterized in that the method further includes:
marking the valid data in the audio data of the acoustic model.
6. A device for building a speech acoustic model, characterized by including:
a signal acquisition module, for obtaining the audio signal of speech data;
an extraction module, for performing feature extraction on the audio signal to obtain the spectrogram of the audio signal;
a recognition module, for performing image recognition on the spectrogram to obtain a recognition result;
a building module, for building the speech acoustic model according to the recognition result and the actual sound data of the speech data.
7. The device according to claim 6, characterized in that the recognition module is specifically configured to:
process the spectrogram successively with multiple convolutional layers of a deep convolutional network to obtain the recognition result.
8. The device according to claim 7, characterized in that the recognition module is further configured to:
after the convolutional-layer processing, process the convolutional layers' output with a pooling layer of the deep convolutional network to obtain the recognition result.
9. The device according to claim 7 or 8, characterized in that the device further includes:
a matrix acquisition module, for obtaining, before the convolutional-layer processing, the weight matrix of the audio signal, the weight matrix being determined from the time at which the audio data of the audio signal occurs in the speech and its importance within the speech;
a processing module, for processing the spectrum data with the weight matrix.
10. The device according to claim 9, characterized in that the device further includes:
a marking module, for marking the valid data in the audio data of the acoustic model.
CN201710640480.1A 2017-07-31 2017-07-31 Method and apparatus for building a speech acoustic model Pending CN107293290A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710640480.1A CN107293290A (en) 2017-07-31 2017-07-31 Method and apparatus for building a speech acoustic model

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710640480.1A CN107293290A (en) 2017-07-31 2017-07-31 Method and apparatus for building a speech acoustic model

Publications (1)

Publication Number Publication Date
CN107293290A true CN107293290A (en) 2017-10-24

Family

ID=60103935

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710640480.1A Pending CN107293290A (en) 2017-07-31 2017-07-31 The method and apparatus for setting up Speech acoustics model

Country Status (1)

Country Link
CN (1) CN107293290A (en)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108877783A (en) * 2018-07-05 2018-11-23 腾讯音乐娱乐科技(深圳)有限公司 The method and apparatus for determining the audio types of audio data
CN111048071A (en) * 2019-11-11 2020-04-21 北京海益同展信息科技有限公司 Voice data processing method and device, computer equipment and storage medium
CN111768799A (en) * 2019-03-14 2020-10-13 富泰华工业(深圳)有限公司 Voice recognition method, voice recognition apparatus, computer apparatus, and storage medium
CN112116926A (en) * 2019-06-19 2020-12-22 北京猎户星空科技有限公司 Audio data processing method and device and model training method and device
CN112363114A (en) * 2021-01-14 2021-02-12 杭州兆华电子有限公司 Public place acoustic event positioning method and system based on distributed noise sensor
CN113112969A (en) * 2021-03-23 2021-07-13 平安科技(深圳)有限公司 Buddhism music score recording method, device, equipment and medium based on neural network

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20140288928A1 (en) * 2013-03-25 2014-09-25 Gerald Bradley PENN System and method for applying a convolutional neural network to speech recognition
CN104616664A (en) * 2015-02-02 2015-05-13 合肥工业大学 Method for recognizing audio based on spectrogram significance test
CN106128465A (en) * 2016-06-23 2016-11-16 成都启英泰伦科技有限公司 A kind of Voiceprint Recognition System and method
CN106782501A (en) * 2016-12-28 2017-05-31 百度在线网络技术(北京)有限公司 Speech Feature Extraction and device based on artificial intelligence
CN106782602A (en) * 2016-12-01 2017-05-31 南京邮电大学 Speech-emotion recognition method based on length time memory network and convolutional neural networks

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20140288928A1 (en) * 2013-03-25 2014-09-25 Gerald Bradley PENN System and method for applying a convolutional neural network to speech recognition
CN104616664A (en) * 2015-02-02 2015-05-13 合肥工业大学 Method for recognizing audio based on spectrogram significance test
CN106128465A (en) * 2016-06-23 2016-11-16 成都启英泰伦科技有限公司 A kind of Voiceprint Recognition System and method
CN106782602A (en) * 2016-12-01 2017-05-31 南京邮电大学 Speech-emotion recognition method based on length time memory network and convolutional neural networks
CN106782501A (en) * 2016-12-28 2017-05-31 百度在线网络技术(北京)有限公司 Speech Feature Extraction and device based on artificial intelligence

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
XUMCAS: "Neural networks: CNN structure and speech-recognition applications" (神经网络—CNN结构和语音识别应用), HTTPS://BLOG.CSDN.NET/XMDXCSJ/ARTICLE/DETAILS/54695995 *

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108877783A (en) * 2018-07-05 2018-11-23 腾讯音乐娱乐科技(深圳)有限公司 The method and apparatus for determining the audio types of audio data
CN111768799A (en) * 2019-03-14 2020-10-13 富泰华工业(深圳)有限公司 Voice recognition method, voice recognition apparatus, computer apparatus, and storage medium
CN112116926A (en) * 2019-06-19 2020-12-22 北京猎户星空科技有限公司 Audio data processing method and device and model training method and device
CN111048071A (en) * 2019-11-11 2020-04-21 北京海益同展信息科技有限公司 Voice data processing method and device, computer equipment and storage medium
CN112363114A (en) * 2021-01-14 2021-02-12 杭州兆华电子有限公司 Public place acoustic event positioning method and system based on distributed noise sensor
CN113112969A (en) * 2021-03-23 2021-07-13 平安科技(深圳)有限公司 Buddhist music transcription method, device, equipment and medium based on neural network
CN113112969B (en) * 2021-03-23 2024-04-05 平安科技(深圳)有限公司 Buddhist music transcription method, device, equipment and medium based on neural network

Similar Documents

Publication Publication Date Title
CN107293290A (en) Method and apparatus for establishing a speech acoustic model
CN104756182B (en) Combining auditory attention cues with phoneme posterior scores for phone/vowel/syllable boundary detection
CN105741832B (en) Spoken language evaluation method and system based on deep learning
CN112466326B (en) Voice emotion feature extraction method based on Transformer model encoder
CN109065032B (en) External corpus speech recognition method based on deep convolutional neural network
CN108986798B (en) Processing method, device and equipment for voice data
CN107818164A (en) Intelligent question answering method and system
CN107633842A (en) Speech recognition method, device, computer equipment and storage medium
CN112487949B (en) Learner behavior recognition method based on multi-mode data fusion
US12067989B2 (en) Combined learning method and apparatus using deepening neural network based feature enhancement and modified loss function for speaker recognition robust to noisy environments
CN112990296A (en) Image-text matching model compression and acceleration method and system based on orthogonal similarity distillation
CN110853680A (en) Double-BiLSTM structure with multi-input multi-fusion strategy for speech emotion recognition
CN106935239A (en) Construction method and device for a pronunciation dictionary
CN106297792A (en) Voice mouth-shape animation recognition method and device
CN106328123B (en) Method for recognizing middle-ear voice in a normal voice stream under small-database conditions
CN110825850B (en) Natural language theme classification method and device
CN108962229A (en) Single-channel, unsupervised target speaker voice extraction method
Mao et al. Unsupervised discovery of an extended phoneme set in l2 english speech for mispronunciation detection and diagnosis
Sunny et al. Recognition of speech signals: an experimental comparison of linear predictive coding and discrete wavelet transforms
CN116244474A (en) Learner learning state acquisition method based on multi-mode emotion feature fusion
CN114898779A (en) Multi-mode fused speech emotion recognition method and system
CN110348482A (en) Speech emotion recognition system based on a deep model ensemble architecture
CN116860943A (en) Multi-round dialogue method and system for dialogue style perception and theme guidance
Zhao et al. Enhancing audio perception in augmented reality: a dynamic vocal information processing framework
Anindya et al. Development of Indonesian speech recognition with deep neural network for robotic command

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 2017-10-24