CN106875942A - Acoustic model adaptation method based on accent bottleneck features - Google Patents


Info

Publication number
CN106875942A
CN106875942A
Authority
CN
China
Prior art keywords
accent
feature
depth
acoustic model
bottleneck
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201611232996.4A
Other languages
Chinese (zh)
Other versions
CN106875942B (en)
Inventor
陶建华
易江燕
温正棋
倪浩
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Institute of Automation of Chinese Academy of Science
Original Assignee
Institute of Automation of Chinese Academy of Science
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Institute of Automation of Chinese Academy of Science filed Critical Institute of Automation of Chinese Academy of Science
Priority to CN201611232996.4A priority Critical patent/CN106875942B/en
Publication of CN106875942A publication Critical patent/CN106875942A/en
Application granted granted Critical
Publication of CN106875942B publication Critical patent/CN106875942B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00: Speech recognition
    • G10L 15/08: Speech classification or search
    • G10L 15/16: Speech classification or search using artificial neural networks
    • G10L 15/02: Feature extraction for speech recognition; Selection of recognition unit
    • G10L 15/06: Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
    • G10L 15/063: Training
    • G10L 15/065: Adaptation
    • G10L 17/00: Speaker identification or verification
    • G10L 17/02: Preprocessing operations, e.g. segment selection; Pattern representation or modelling, e.g. based on linear discriminant analysis [LDA] or principal components; Feature selection or extraction

Abstract

The invention belongs to the technical field of speech recognition, and in particular relates to an acoustic model adaptation method based on accent bottleneck features. In order to customize personalized acoustic models for users with different accents, the method provided by the invention comprises the following steps: S1, based on a first deep neural network, train with the spliced voiceprint features of multiple accented speech data samples to obtain a deep accent bottleneck network model; S2, based on the deep accent bottleneck network, obtain the spliced accent features of the accented speech data; S3, based on a second deep neural network, train with the spliced accent features of multiple accented speech data samples to obtain an accent-independent baseline acoustic model; S4, adjust the parameters of the accent-independent baseline acoustic model with the spliced accent features of speech data of a specific accent to generate an accent-dependent acoustic model. The method of the invention improves the accuracy of accented speech recognition.

Description

Acoustic model adaptation method based on accent bottleneck features
Technical field
The invention belongs to the technical field of speech recognition, and in particular relates to an acoustic model adaptation method based on accent bottleneck features.
Background technology
To date, speech recognition technology has become an important entry point for human-computer interaction, and the number of its users keeps growing. Because these users come from all over the country and their accents vary widely, a general-purpose acoustic model for speech recognition can hardly suit every user. It is therefore necessary to customize a corresponding acoustic model for users with each accent. At present, voiceprint feature extraction techniques are widely used in the field of speaker recognition, and a speaker's voiceprint features are closely tied to the speaker's accent. Although many scholars have previously extracted accent features by means of voiceprint extraction techniques, such techniques cannot characterize accent features at a high level, and a high-level characterization of accent features is crucial for personalized acoustic model customization.
Therefore, a new method is needed in the art to solve the above problems.
Summary of the invention
In order to solve the above problems in the prior art, that is, to customize personalized acoustic models for users with different accents, the invention provides an acoustic model adaptation method based on accent bottleneck features. The method comprises the following steps:
S1: based on a first deep neural network, train with the spliced voiceprint features of multiple accented speech data samples to obtain a deep accent bottleneck network model;
S2: based on the deep accent bottleneck network, obtain the spliced accent features of the accented speech data;
S3: based on a second deep neural network, train with the spliced accent features of multiple accented speech data samples to obtain an accent-independent baseline acoustic model;
S4: adjust the parameters of the accent-independent baseline acoustic model with the spliced accent features of speech data of a specific accent to generate an accent-dependent acoustic model.
Preferably, in step S1, the step of obtaining the spliced voiceprint features includes:
S11: extracting acoustic features from the accented speech data;
S12: extracting the speaker's voiceprint feature vector from the acoustic features;
S13: fusing the voiceprint feature vector with the acoustic features to generate the spliced voiceprint features.
Preferably, in step S1, the first neural network is a deep feedforward neural network model; the deep feedforward neural network model is trained with the spliced voiceprint features of the multiple accented speech data samples to obtain the deep accent bottleneck network.
Preferably, step S2 further includes:
S21: extracting the accent bottleneck features of the accented speech data with the deep accent bottleneck network model;
S22: fusing the accent bottleneck features with the acoustic features to obtain the spliced accent features of the accented speech data.
Preferably, step S21 further includes: taking the spliced voiceprint features of the accented speech data as the input of the deep accent bottleneck network model, and obtaining the accent bottleneck features of the accented speech data by the forward propagation algorithm.
Preferably, in step S3, the second neural network is a deep bidirectional long short-term memory (BLSTM) recurrent neural network; the deep BLSTM recurrent neural network is trained with multiple spliced accent features to obtain an accent-independent deep BLSTM acoustic model;
the accent-independent deep BLSTM acoustic model serves as the accent-independent baseline acoustic model.
Preferably, in step S4, the parameters of the output layer of the accent-independent baseline acoustic model are adjusted with the spliced accent features to produce the accent-dependent acoustic model.
Preferably, in step S4, the parameters of the last output layer of the accent-independent baseline acoustic model are adjusted.
Preferably, the parameters of the output layer of the accent-independent baseline acoustic model are adjusted by the backpropagation algorithm.
The acoustic model adaptation method based on accent bottleneck features of the invention brings the following beneficial effects:
(1) The spliced accent features extracted by the deep accent bottleneck network carry a more abstract and more general representation and can accurately capture a high-level characterization of the accent.
(2) The spliced accent features are used to adapt only the output layer of the accent-independent baseline acoustic model; each accent gets its own output layer while the hidden-layer parameters are shared, which reduces the storage space of the model.
(3) The acoustic model adaptation method based on accent bottleneck features of the invention improves the accuracy of accented speech recognition.
Brief description of the drawings
Fig. 1 is the flow chart of the acoustic model adaptation method based on accent bottleneck features of the invention;
Fig. 2 is the overall flow chart of the embodiment of the invention;
Fig. 3 is the flow chart of generating the spliced voiceprint features in the embodiment of the invention;
Fig. 4 is the flow chart of generating the spliced accent features in the embodiment of the invention.
Specific embodiment
Preferred embodiments of the invention are described below with reference to the accompanying drawings. Those skilled in the art will appreciate that these embodiments are only intended to explain the technical principles of the invention and are not intended to limit its scope.
Referring to Fig. 1, which shows the flow chart of the acoustic model adaptation method based on accent bottleneck features of the invention, the method comprises the following steps:
S1: based on a first neural network model, train with the spliced voiceprint features of multiple accented speech data samples to obtain the deep accent bottleneck network;
S2: based on the deep accent bottleneck network, obtain the spliced accent features of the accented speech data;
S3: based on a second neural network model, train with the spliced accent features of multiple accented speech data samples to obtain the accent-independent baseline acoustic model;
S4: adjust the parameters of the accent-independent baseline acoustic model with the spliced accent features of speech data of a specific accent to generate the accent-dependent acoustic model.
Fig. 2 shows the overall flow chart of the embodiment of the invention. The method of the invention is described in detail below with reference to Fig. 2.
In step S1, the step of obtaining the spliced voiceprint features includes:
S11: extract acoustic features from the accented speech data. Specifically, this step mainly uses Mel spectral features or Mel-frequency cepstral coefficients (MFCCs). Taking MFCCs as an example, the static parameters may be 13-dimensional; appending their first-order and second-order differences gives a final dimensionality of 39, and this 39-dimensional feature is used for the subsequent processing.
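For illustration, a minimal sketch of step S11 follows; the patent does not name a toolkit, so the python_speech_features library and a 16 kHz mono WAV input are assumptions:

```python
import numpy as np
import scipy.io.wavfile as wav
from python_speech_features import mfcc, delta

def extract_39dim_mfcc(wav_path):
    """13 static MFCCs plus first- and second-order deltas = 39 dims per frame."""
    rate, signal = wav.read(wav_path)
    static = mfcc(signal, samplerate=rate, numcep=13)  # (frames, 13)
    d1 = delta(static, 2)                              # first-order differences
    d2 = delta(d1, 2)                                  # second-order differences
    return np.hstack([static, d1, d2])                 # (frames, 39)
```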
S12: extract the speaker's voiceprint feature vector from the acoustic features. Specifically, a Gaussian mixture model-universal background model (GMM-UBM) is trained on the acoustic features, and each speaker's voiceprint feature vector is then extracted from the acoustic features with the GMM-UBM; the voiceprint feature vector is 80-dimensional.
S13: fuse the voiceprint feature vector with the acoustic features to generate the spliced voiceprint features. As shown in Fig. 3, when generating the spliced voiceprint features, the acoustic features extracted in S11 are fused with the voiceprint feature vector extracted in S12. Specifically, each speaker's voiceprint feature vector is appended to the acoustic features of every frame, thereby generating the spliced voiceprint features.
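A heavily simplified sketch of S12 and S13, assuming scikit-learn for the GMM-UBM and a PCA projection to 80 dimensions as a stand-in for a full i-vector extractor (only the 39-dimensional frames and the 80-dimensional target come from the text; the pca object is assumed to have been fitted beforehand with n_components=80 on supervectors from all training speakers):

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def train_ubm(all_frames, n_components=64):
    """Fit the universal background model on frames pooled from all speakers."""
    ubm = GaussianMixture(n_components=n_components, covariance_type='diag')
    ubm.fit(all_frames)
    return ubm

def voiceprint_vector(ubm, pca, frames):
    """S12: posterior-weighted mean offsets -> supervector -> 80-dim projection."""
    post = ubm.predict_proba(frames)               # (T, C) frame posteriors
    counts = post.sum(axis=0)[:, None] + 1e-8      # (C, 1) soft occupancy
    means = post.T @ frames / counts               # (C, 39) adapted means
    supervector = (means - ubm.means_).ravel()     # (C * 39,) offset supervector
    return pca.transform(supervector[None, :])[0]  # (80,) voiceprint vector

def splice_voiceprint(frames, vp):
    """S13: append the speaker's 80-dim voiceprint to every 39-dim frame."""
    return np.hstack([frames, np.tile(vp, (len(frames), 1))])  # (T, 119)
```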
In step S1, the first neural network may be a deep feedforward neural network model, which is trained with the generated spliced voiceprint features to obtain the deep accent bottleneck network. In this embodiment, the last hidden layer of the deep accent bottleneck network (the bottleneck layer) has 60 nodes, fewer than the other hidden layers, which may have 1024 or 2048 nodes each. In this embodiment, the training criterion of the deep feedforward neural network model is cross-entropy, and the training method is the backpropagation algorithm. The activation function of the deep feedforward neural network model may be the sigmoid or the hyperbolic tangent activation function, and the loss function of the network is cross-entropy; these belong to techniques well known in the art and are not described in detail here.
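The bottleneck network itself might look like the following PyTorch sketch; the 119-dimensional spliced input, 1024-node hidden layers, 60-node bottleneck, sigmoid activations and cross-entropy training follow the text, while the number of hidden layers and the output targets (the patent does not state what the network predicts) are assumptions:

```python
import torch
import torch.nn as nn

class AccentBottleneckNet(nn.Module):
    """Deep feedforward network whose last hidden layer is a 60-node bottleneck."""
    def __init__(self, in_dim=119, hidden=1024, bottleneck=60, n_targets=3000):
        super().__init__()
        self.trunk = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.Sigmoid(),
            nn.Linear(hidden, hidden), nn.Sigmoid(),
            nn.Linear(hidden, hidden), nn.Sigmoid(),
            nn.Linear(hidden, bottleneck), nn.Sigmoid(),  # narrow bottleneck layer
        )
        self.out = nn.Linear(bottleneck, n_targets)

    def forward(self, x):
        bn = self.trunk(x)            # 60-dim bottleneck activations
        return self.out(bn), bn

# Cross-entropy training by backpropagation, as the text describes:
#   logits, _ = model(batch)
#   loss = nn.CrossEntropyLoss()(logits, frame_labels)
#   loss.backward(); optimizer.step()
```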
In step S2, the step of obtaining the spliced accent features includes:
S21: extract the accent bottleneck features of the accented speech data with the deep accent bottleneck network;
S22: fuse the accent bottleneck features with the acoustic features to obtain the spliced accent features of the accented speech data.
Specifically, the deep accent bottleneck network obtained in step S1 is regarded as a feature extractor: the spliced voiceprint features generated in step S13 serve as the input of the deep accent bottleneck network, and the accent bottleneck features of the accented speech data are obtained by the forward propagation algorithm. In this embodiment, the accent bottleneck features are 60-dimensional. As shown in Fig. 4, when generating the spliced accent features, the accent bottleneck features extracted in S21 are fused at the frame level with the acoustic features extracted in S11, thereby generating the spliced accent features.
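Continuing the sketches above, S21 and S22 might be implemented as follows (the 60-dimensional bottleneck output and the frame-level splice with the 39-dimensional acoustic features come from the text; the function and model names carry over from the earlier, assumed sketches):

```python
def spliced_accent_features(model, vp_spliced_frames, acoustic_frames):
    """S21 + S22: bottleneck features by forward propagation, then frame-level splice."""
    model.eval()
    with torch.no_grad():
        x = torch.as_tensor(vp_spliced_frames, dtype=torch.float32)
        _, bn = model(x)                      # forward propagation only
    bn = bn.numpy()                           # (T, 60) accent bottleneck features
    return np.hstack([acoustic_frames, bn])   # (T, 99) spliced accent features
```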
In step S3, the second neural network may be a deep bidirectional long short-term memory (BLSTM) recurrent neural network. The deep BLSTM recurrent neural network is trained with the spliced accent features obtained in step S2: the spliced accent features are fed as its input, and the labels of its output layer are syllable initials and finals. This yields the accent-independent deep BLSTM acoustic model, which serves as the accent-independent baseline acoustic model. In this embodiment, the training criterion of the deep BLSTM recurrent neural network is the connectionist temporal classification (CTC) loss, and the training method is the backpropagation algorithm. The deep BLSTM recurrent neural network can both remember the historical information of the input features and exploit their future context; it realizes these memory and prediction functions through three gates, namely the input gate, the forget gate and the output gate. The deep BLSTM recurrent neural network belongs to techniques well known in the art and is not described in detail here.
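A sketch of this baseline model follows; the 99-dimensional spliced input, the bidirectional LSTM structure, initial/final output labels and CTC training follow the text, while the layer count, hidden size and label inventory are assumptions:

```python
class BaselineAcousticModel(nn.Module):
    """Accent-independent deep BLSTM acoustic model trained with CTC."""
    def __init__(self, in_dim=99, hidden=512, layers=4, n_labels=200):
        super().__init__()
        self.blstm = nn.LSTM(in_dim, hidden, num_layers=layers,
                             bidirectional=True, batch_first=True)
        self.out = nn.Linear(2 * hidden, n_labels + 1)  # +1 for the CTC blank

    def forward(self, x):          # x: (batch, T, 99)
        h, _ = self.blstm(x)
        return self.out(h)         # (batch, T, n_labels + 1) frame logits

# CTC training by backpropagation, as the text describes:
#   log_probs = model(feats).log_softmax(-1).transpose(0, 1)  # (T, batch, labels)
#   loss = nn.CTCLoss(blank=n_labels)(log_probs, targets, in_lens, tgt_lens)
```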
In step S4, the parameters of the output layer (generally the last output layer) of the accent-independent baseline acoustic model obtained in step S3 are fine-tuned with the spliced accent features obtained in step S2, producing the accent-dependent acoustic model. Specifically, the spliced accent features corresponding to each accent serve as the input of the accent-independent baseline acoustic model; each accent gets its own accent-dependent output layer, so the hidden layers are shared across accents. Further, the parameter fine-tuning of the accent-independent baseline acoustic model is carried out with the backpropagation algorithm. Since the accent-independent baseline acoustic model is based on the deep BLSTM recurrent neural network model, the finally produced accent-dependent acoustic model is also based on the deep BLSTM recurrent neural network model, and the labels of its output layer are syllable initials and finals; combined with a pronunciation dictionary and a language model, it can recognize the text corresponding to the audio data.
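The output-layer-only adaptation of S4 might look like this sketch, which freezes the shared BLSTM hidden layers and trains a fresh per-accent output layer by backpropagation; the optimizer, learning rate and epoch count are assumptions:

```python
import copy
import torch.optim as optim

def adapt_to_accent(baseline, accent_loader, n_labels=200, lr=1e-4, epochs=5):
    """S4: clone the baseline, freeze shared hidden layers, tune only the output layer."""
    model = copy.deepcopy(baseline)
    for p in model.blstm.parameters():      # shared hidden layers stay fixed
        p.requires_grad = False
    model.out = nn.Linear(model.out.in_features, n_labels + 1)  # accent-specific layer
    opt = optim.Adam(model.out.parameters(), lr=lr)
    ctc = nn.CTCLoss(blank=n_labels)
    for _ in range(epochs):
        for feats, targets, in_lens, tgt_lens in accent_loader:
            log_probs = model(feats).log_softmax(-1).transpose(0, 1)
            loss = ctc(log_probs, targets, in_lens, tgt_lens)
            opt.zero_grad()
            loss.backward()
            opt.step()
    return model
```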
The technical solutions of the invention have thus been described with reference to the preferred embodiments shown in the drawings. However, those skilled in the art will readily appreciate that the protection scope of the invention is obviously not limited to these specific embodiments. Without departing from the principles of the invention, those skilled in the art can make equivalent changes or substitutions to the relevant technical features, and the technical solutions after such changes or substitutions still fall within the protection scope of the invention.

Claims (9)

1. An acoustic model adaptation method based on accent bottleneck features, characterized in that the method comprises the following steps:
S1: based on a first deep neural network, training with the spliced voiceprint features of multiple accented speech data samples to obtain a deep accent bottleneck network model;
S2: based on the deep accent bottleneck network, obtaining the spliced accent features of the accented speech data;
S3: based on a second deep neural network, training with the spliced accent features of multiple accented speech data samples to obtain an accent-independent baseline acoustic model;
S4: adjusting the parameters of the accent-independent baseline acoustic model with the spliced accent features of speech data of a specific accent to generate an accent-dependent acoustic model.
2. The method according to claim 1, characterized in that, in step S1, the step of obtaining the spliced voiceprint features includes:
S11: extracting acoustic features from the accented speech data;
S12: extracting the speaker's voiceprint feature vector from the acoustic features;
S13: fusing the voiceprint feature vector with the acoustic features to generate the spliced voiceprint features.
3. The method according to claim 2, characterized in that, in step S1, the first neural network is a deep feedforward neural network, and the deep feedforward neural network is trained with the spliced voiceprint features of the multiple accented speech data samples to obtain the deep accent bottleneck network.
4. The method according to claim 3, characterized in that step S2 further includes:
S21: extracting the accent bottleneck features of the accented speech data with the deep accent bottleneck network model;
S22: fusing the accent bottleneck features with the acoustic features to obtain the spliced accent features of the accented speech data.
5. The method according to claim 4, characterized in that step S21 further includes:
taking the spliced voiceprint features of the accented speech data as the input of the deep accent bottleneck network model, and obtaining the accent bottleneck features of the accented speech data by the forward propagation algorithm.
6. The method according to claim 5, characterized in that, in step S3, the second neural network is a deep bidirectional long short-term memory (BLSTM) recurrent neural network,
the deep BLSTM recurrent neural network is trained with multiple spliced accent features to obtain an accent-independent deep BLSTM acoustic model;
and the accent-independent deep BLSTM acoustic model serves as the accent-independent baseline acoustic model.
7. The method according to claim 6, characterized in that, in step S4, the parameters of the output layer of the accent-independent baseline acoustic model are adjusted with the spliced accent features to produce the accent-dependent acoustic model.
8. The method according to claim 7, characterized in that, in step S4, the parameters of the last output layer of the accent-independent baseline acoustic model are adjusted.
9. The method according to claim 7 or 8, characterized in that the parameters of the output layer of the accent-independent baseline acoustic model are adjusted by the backpropagation algorithm.
CN201611232996.4A 2016-12-28 2016-12-28 Acoustic model self-adaption method based on accent bottleneck characteristics Active CN106875942B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201611232996.4A CN106875942B (en) 2016-12-28 2016-12-28 Acoustic model self-adaption method based on accent bottleneck characteristics

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201611232996.4A CN106875942B (en) 2016-12-28 2016-12-28 Acoustic model self-adaption method based on accent bottleneck characteristics

Publications (2)

Publication Number Publication Date
CN106875942A (en) 2017-06-20
CN106875942B (en) 2021-01-22

Family

ID=59164199

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201611232996.4A Active CN106875942B (en) 2016-12-28 2016-12-28 Acoustic model self-adaption method based on accent bottleneck characteristics

Country Status (1)

Country Link
CN (1) CN106875942B (en)


Cited By (26)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108074575A (en) * 2017-12-14 2018-05-25 广州势必可赢网络科技有限公司 A kind of auth method and device based on Recognition with Recurrent Neural Network
WO2019154107A1 (en) * 2018-02-12 2019-08-15 阿里巴巴集团控股有限公司 Voiceprint recognition method and device based on memorability bottleneck feature
CN108538285A (en) * 2018-03-05 2018-09-14 清华大学 A kind of various keyword detection method based on multitask neural network
CN108538285B (en) * 2018-03-05 2021-05-04 清华大学 Multi-instance keyword detection method based on multitask neural network
CN108682416A (en) * 2018-04-11 2018-10-19 深圳市卓翼科技股份有限公司 local adaptive voice training method and system
CN108682416B (en) * 2018-04-11 2021-01-01 深圳市卓翼科技股份有限公司 Local adaptive speech training method and system
CN108682417A (en) * 2018-05-14 2018-10-19 中国科学院自动化研究所 Small data Speech acoustics modeling method in speech recognition
CN108922559A (en) * 2018-07-06 2018-11-30 华南理工大学 Recording terminal clustering method based on voice time-frequency conversion feature and integral linear programming
CN109147763A (en) * 2018-07-10 2019-01-04 深圳市感动智能科技有限公司 A kind of audio-video keyword recognition method and device based on neural network and inverse entropy weighting
WO2020014890A1 (en) * 2018-07-18 2020-01-23 深圳魔耳智能声学科技有限公司 Accent-based voice recognition processing method, electronic device and storage medium
CN109074804A (en) * 2018-07-18 2018-12-21 深圳魔耳智能声学科技有限公司 Voice recognition processing method, electronic equipment and storage medium based on accent
CN109074804B (en) * 2018-07-18 2021-04-06 深圳魔耳智能声学科技有限公司 Accent-based speech recognition processing method, electronic device, and storage medium
CN110890085B (en) * 2018-09-10 2023-09-12 阿里巴巴集团控股有限公司 Voice recognition method and system
CN110890085A (en) * 2018-09-10 2020-03-17 阿里巴巴集团控股有限公司 Voice recognition method and system
CN109887497A (en) * 2019-04-12 2019-06-14 北京百度网讯科技有限公司 Modeling method, device and the equipment of speech recognition
CN109887497B (en) * 2019-04-12 2021-01-29 北京百度网讯科技有限公司 Modeling method, device and equipment for speech recognition
CN111833847A (en) * 2019-04-15 2020-10-27 北京百度网讯科技有限公司 Speech processing model training method and device
CN110033760A (en) * 2019-04-15 2019-07-19 北京百度网讯科技有限公司 Modeling method, device and the equipment of speech recognition
CN110033760B (en) * 2019-04-15 2021-01-29 北京百度网讯科技有限公司 Modeling method, device and equipment for speech recognition
US11688391B2 (en) 2019-04-15 2023-06-27 Beijing Baidu Netcom Science And Technology Co. Mandarin and dialect mixed modeling and speech recognition
CN110570858A (en) * 2019-09-19 2019-12-13 芋头科技(杭州)有限公司 Voice awakening method and device, intelligent sound box and computer readable storage medium
CN110930982A (en) * 2019-10-31 2020-03-27 国家计算机网络与信息安全管理中心 Multi-accent acoustic model and multi-accent voice recognition method
CN111370025A (en) * 2020-02-25 2020-07-03 广州酷狗计算机科技有限公司 Audio recognition method and device and computer storage medium
CN111508501B (en) * 2020-07-02 2020-09-29 成都晓多科技有限公司 Voice recognition method and system with accent for telephone robot
CN111508501A (en) * 2020-07-02 2020-08-07 成都晓多科技有限公司 Voice recognition method and system with accent for telephone robot
CN112992126A (en) * 2021-04-22 2021-06-18 北京远鉴信息技术有限公司 Voice authenticity verification method and device, electronic equipment and readable storage medium

Also Published As

Publication number Publication date
CN106875942B (en) 2021-01-22

Similar Documents

Publication Publication Date Title
CN106875942A (en) Acoustic model adaptive approach based on accent bottleneck characteristic
JP7427723B2 (en) Text-to-speech synthesis in target speaker's voice using neural networks
US11837216B2 (en) Speech recognition using unspoken text and speech synthesis
CN106531157B (en) Regularization accent adaptive approach in speech recognition
CN109003601A (en) A kind of across language end-to-end speech recognition methods for low-resource Tujia language
US20200075024A1 (en) Response method and apparatus thereof
CN110223714A (en) A kind of voice-based Emotion identification method
CN107452379B (en) Dialect language identification method and virtual reality teaching method and system
CN108806667A (en) The method for synchronously recognizing of voice and mood based on neural network
CN110491393B (en) Training method of voiceprint representation model and related device
JP2017040919A (en) Speech recognition apparatus, speech recognition method, and speech recognition system
CN106688034A (en) Text-to-speech with emotional content
CN103928023A (en) Voice scoring method and system
CN105760852A (en) Driver emotion real time identification method fusing facial expressions and voices
CN107945790A (en) A kind of emotion identification method and emotion recognition system
CN108172218A (en) A kind of pronunciation modeling method and device
CN107871496A (en) Audio recognition method and device
CN110010136A (en) The training and text analyzing method, apparatus, medium and equipment of prosody prediction model
CN108986798A (en) Processing method, device and the equipment of voice data
CN109493846B (en) English accent recognition system
Sreevidya et al. Sentiment analysis by deep learning approaches
CN109377986A (en) A kind of non-parallel corpus voice personalization conversion method
Peguda et al. Speech to sign language translation for Indian languages
Wu et al. Oral English Speech Recognition Based on Enhanced Temporal Convolutional Network.
CN113470622A (en) Conversion method and device capable of converting any voice into multiple voices

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant