CN106875942A - Acoustic model adaptation method based on accent bottleneck features - Google Patents
- Publication number
- CN106875942A CN106875942A CN201611232996.4A CN201611232996A CN106875942A CN 106875942 A CN106875942 A CN 106875942A CN 201611232996 A CN201611232996 A CN 201611232996A CN 106875942 A CN106875942 A CN 106875942A
- Authority
- CN
- China
- Prior art keywords
- accent
- feature
- depth
- acoustic model
- bottleneck
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G10L15/16—Speech classification or search using artificial neural networks
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/02—Feature extraction for speech recognition; Selection of recognition unit
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/06—Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
- G10L15/063—Training
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/06—Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
- G10L15/065—Adaptation
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L17/00—Speaker identification or verification
- G10L17/02—Preprocessing operations, e.g. segment selection; Pattern representation or modelling, e.g. based on linear discriminant analysis [LDA] or principal components; Feature selection or extraction
Abstract
The invention belongs to the technical field of speech recognition, and in particular relates to an acoustic model adaptation method based on accent bottleneck features. In order to customize a personalized acoustic model for users with different accents, the method provided by the invention comprises the following steps: S1, based on a first deep neural network, taking the voiceprint-spliced features of multiple accented speech data samples as training samples to obtain a deep accent bottleneck network model; S2, based on the deep accent bottleneck network, obtaining the accent-spliced features of the accented speech data; S3, based on a second deep neural network, taking the accent-spliced features of multiple accented speech data samples as training samples to obtain an accent-independent baseline acoustic model; S4, adjusting the parameters of the accent-independent baseline acoustic model with the accent-spliced features of accent-specific speech data to generate an accent-dependent acoustic model. The method of the invention improves the accuracy of accented speech recognition.
Description
Technical field
The invention belongs to the technical field of speech recognition, and in particular relates to an acoustic model adaptation method based on accent bottleneck features.
Background technology
To date, speech recognition technology has become an important entry point for human-computer interaction, and the number of its users keeps growing. Because these users come from all over the country and their accents vary widely, a general-purpose acoustic model for speech recognition can hardly fit all users. Therefore, a corresponding acoustic model needs to be customized for users with each accent. At present, voiceprint feature extraction is widely used in the speaker recognition field, and a speaker's voiceprint features are closely tied to the speaker's accent. Although many researchers have used voiceprint extraction techniques to derive accent features, such techniques cannot characterize accent features at a high level, while a high-level characterization of accent is essential for personalized acoustic model customization.
Therefore, a new method is needed in the art to solve the above problems.
Summary of the invention
In order to solve the above problems in the prior art, namely to customize a personalized acoustic model for users with different accents, the invention provides an acoustic model adaptation method based on accent bottleneck features. The method comprises the following steps:
S1, based on a first deep neural network, taking the voiceprint-spliced features of multiple accented speech data samples as training samples to obtain a deep accent bottleneck network model;
S2, based on the deep accent bottleneck network, obtaining the accent-spliced features of the accented speech data;
S3, based on a second deep neural network, taking the accent-spliced features of multiple accented speech data samples as training samples to obtain an accent-independent baseline acoustic model;
S4, adjusting the parameters of the accent-independent baseline acoustic model with the accent-spliced features of accent-specific speech data to generate an accent-dependent acoustic model.
Preferably, in step S1, the step of obtaining the voiceprint-spliced features comprises:
S11, extracting acoustic features from the accented speech data;
S12, extracting the speaker's voiceprint feature vector using the acoustic features;
S13, fusing the voiceprint feature vector with the acoustic features to generate the voiceprint-spliced features.
Preferably, in step S1, the first neural network is a deep feedforward network model; the deep feedforward network model is trained with the voiceprint-spliced features of the multiple accented speech data samples to obtain the deep accent bottleneck network.
Preferably, step S2 further comprises:
S21, extracting the accent bottleneck features of the accented speech data with the deep accent bottleneck network model;
S22, fusing the accent bottleneck features with the acoustic features to obtain the accent-spliced features of the accented speech data.
Preferably, step S21 further comprises: taking the voiceprint-spliced features of the accented speech data as the input of the deep accent bottleneck network model, and obtaining the accent bottleneck features of the accented speech data by forward propagation.
Preferably, in step S3, the second neural network is a deep bidirectional long short-term memory (BLSTM) recurrent neural network; the deep BLSTM recurrent neural network is trained with multiple accent-spliced features to obtain an accent-independent deep BLSTM acoustic model, which serves as the accent-independent baseline acoustic model.
Preferably, in step S4, the parameters of the output layer of the accent-independent baseline acoustic model are adjusted with the accent-spliced features, generating the accent-dependent acoustic model.
Preferably, in step S4, the parameters of the last output layer of the accent-independent baseline acoustic model are adjusted.
Preferably, the parameters of the output layer of the accent-independent baseline acoustic model are adjusted by the back-propagation algorithm.
The acoustic model adaptation method based on accent bottleneck features of the invention has the following beneficial effects:
(1) The accent-spliced features extracted by the deep accent bottleneck network form a more abstract and more general representation, and can accurately capture a high-level characterization of the accent.
(2) Adapting only the output layer of the accent-independent baseline acoustic model with the accent-spliced features gives each accent its own output layer while the hidden-layer parameters are shared, which reduces the storage footprint of the model.
(3) The acoustic model adaptation method based on accent bottleneck features of the invention improves the accuracy of accented speech recognition.
Brief description of the drawings
Fig. 1 is a flow chart of the acoustic model adaptation method based on accent bottleneck features of the invention;
Fig. 2 is an overall flow chart of an embodiment of the invention;
Fig. 3 is a flow chart of generating the voiceprint-spliced features in an embodiment of the invention;
Fig. 4 is a flow chart of generating the accent-spliced features in an embodiment of the invention.
Detailed description of the embodiments
Preferred embodiments of the invention are described below with reference to the accompanying drawings. It will be apparent to those skilled in the art that these embodiments are only used to explain the technical principles of the invention and are not intended to limit the scope of the invention.
Referring to Fig. 1, which shows the flow chart of the acoustic model adaptation method based on accent bottleneck features of the invention, the method of the invention comprises the following steps:
S1, based on a first neural network model, taking the voiceprint-spliced features of multiple accented speech data samples as training samples to obtain the deep accent bottleneck network;
S2, based on the deep accent bottleneck network, obtaining the accent-spliced features of the accented speech data;
S3, based on a second neural network model, taking the accent-spliced features of multiple accented speech data samples as training samples to obtain an accent-independent baseline acoustic model;
S4, adjusting the parameters of the accent-independent baseline acoustic model with the accent-spliced features of accent-specific speech data to generate an accent-dependent acoustic model.
Fig. 2 shows the overall flow chart of an embodiment of the invention. The method of the invention is described in detail below with reference to Fig. 2.
In step S1, the step of obtaining the voiceprint-spliced features comprises:
S11, extracting acoustic features from the accented speech data. Specifically, this step mainly uses Mel spectrum features or Mel-frequency cepstral features. Taking Mel-frequency cepstral features as an example, the static parameters may be 13-dimensional; first-order and second-order differences are then computed, so that the final parameter vector is 39-dimensional, and this 39-dimensional feature is used in subsequent processing.
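The 13-to-39-dimension expansion described above can be sketched as appending frame-level first- and second-order differences to the static cepstral parameters. This is a minimal illustration only: the delta estimator (`np.gradient`) and the random stand-in input are assumptions, not part of the patent.

```python
import numpy as np

def add_deltas(static):
    """Append first- and second-order differences to per-frame features.

    static: (T, 13) matrix of static cepstral parameters.
    Returns a (T, 39) matrix: [static, delta, delta-delta].
    """
    # Central differences with one-sided differences at the edges.
    delta = np.gradient(static, axis=0)
    delta2 = np.gradient(delta, axis=0)
    return np.concatenate([static, delta, delta2], axis=1)

T = 200                               # number of frames (illustrative)
mfcc = np.random.randn(T, 13)         # stand-in for real 13-dim MFCCs
feats = add_deltas(mfcc)
print(feats.shape)                    # (200, 39)
```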
S12, extracting the speaker's voiceprint feature vector using the acoustic features. Specifically, a Gaussian mixture model-universal background model (GMM-UBM) is trained with the acoustic features, and the GMM-UBM is then used to extract each speaker's voiceprint feature vector from the acoustic features; the dimension of the voiceprint feature vector is 80.
S13, fusing the voiceprint feature vector with the acoustic features to generate the voiceprint-spliced features. As shown in Fig. 3, when generating the voiceprint-spliced features, the acoustic features extracted in S11 are fused with the voiceprint feature vector extracted in S12. Specifically, each speaker's voiceprint feature vector is spliced onto the acoustic features of every frame, thereby generating the voiceprint-spliced features.
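The per-frame splicing of S13 amounts to tiling one speaker-level 80-dimensional voiceprint vector across all frames of an utterance and concatenating it with each 39-dimensional acoustic frame. The dimensions follow the embodiment; the random stand-in data is an assumption for illustration.

```python
import numpy as np

def splice_voiceprint(acoustic, voiceprint):
    """Concatenate a per-speaker voiceprint vector onto every acoustic frame.

    acoustic:   (T, 39) frame-level acoustic features.
    voiceprint: (80,) speaker-level voiceprint vector.
    Returns (T, 119) voiceprint-spliced features.
    """
    tiled = np.tile(voiceprint, (acoustic.shape[0], 1))  # repeat for each frame
    return np.concatenate([acoustic, tiled], axis=1)

acoustic = np.random.randn(150, 39)   # stand-in 39-dim acoustic frames
voiceprint = np.random.randn(80)      # stand-in 80-dim voiceprint vector
spliced = splice_voiceprint(acoustic, voiceprint)
print(spliced.shape)                  # (150, 119)
```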
In step S1, the first neural network may be a deep feedforward network model; this deep feedforward network model is trained with the generated voiceprint-spliced features to obtain the deep accent bottleneck network. In this embodiment, the last hidden layer of the deep accent bottleneck network has 60 nodes, fewer than the other hidden layers, each of which may have 1024 or 2048 nodes. In this embodiment, the training criterion of the deep feedforward network model is cross entropy and the training method is the back-propagation algorithm. The activation function of the deep feedforward network model may be the sigmoid function or the hyperbolic tangent function, and the loss function of the network is cross entropy; these belong to techniques well known in the art and are not described in detail here.
In step S2, the step of obtaining the accent-spliced features comprises:
S21, extracting the accent bottleneck features of the accented speech data with the deep accent bottleneck network;
S22, fusing the accent bottleneck features with the acoustic features to obtain the accent-spliced features of the accented speech data.
Specifically, the deep accent bottleneck network obtained in step S1 is regarded as a feature extractor; the voiceprint-spliced features generated in step S13 are used as the input of the deep accent bottleneck network, and the accent bottleneck features of the accented speech data are obtained by forward propagation. In this embodiment, the accent bottleneck features are 60-dimensional. As shown in Fig. 4, when generating the accent-spliced features, the accent bottleneck features extracted in S21 are fused at the frame level with the acoustic features extracted in S11, thereby generating the accent-spliced features.
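The feature-extractor view of step S2 can be sketched as a plain feedforward pass whose last hidden (bottleneck) layer yields the 60-dimensional accent features, followed by frame-level concatenation with the 39-dimensional acoustic features. The layer widths below are scaled down from the 1024/2048 of the embodiment, and the random weights stand in for a trained network.

```python
import numpy as np

rng = np.random.default_rng(0)

def layer(d_in, d_out):
    """Random stand-in for one trained weight matrix and bias."""
    return rng.standard_normal((d_in, d_out)) * 0.1, np.zeros(d_out)

# Feedforward net: 119 -> 256 -> 256 -> 60 (bottleneck); widths are illustrative.
W1, b1 = layer(119, 256)
W2, b2 = layer(256, 256)
W3, b3 = layer(256, 60)   # last hidden layer: the 60-node bottleneck

def bottleneck_features(vp_spliced):
    """Forward-propagate voiceprint-spliced frames and return the
    activations of the bottleneck (last hidden) layer."""
    h = np.tanh(vp_spliced @ W1 + b1)
    h = np.tanh(h @ W2 + b2)
    return np.tanh(h @ W3 + b3)             # (T, 60) accent bottleneck features

T = 150
vp_spliced = rng.standard_normal((T, 119))  # stand-in voiceprint-spliced input
acoustic = rng.standard_normal((T, 39))     # stand-in 39-dim acoustic features
bn = bottleneck_features(vp_spliced)
accent_spliced = np.concatenate([bn, acoustic], axis=1)  # frame-level fusion
print(accent_spliced.shape)                 # (150, 99)
```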
In step S3, the second neural network may be a deep bidirectional long short-term memory (BLSTM) recurrent neural network, which is trained with the accent-spliced features obtained in step S2: the accent-spliced features obtained in S2 are fed into the deep BLSTM recurrent neural network, and the labels of its output layer are the initials and finals. This yields an accent-independent deep BLSTM acoustic model, which serves as the accent-independent baseline acoustic model. In this embodiment, the training criterion of the deep BLSTM recurrent neural network is the connectionist temporal classification (CTC) loss, and the training method is the back-propagation algorithm. The deep BLSTM recurrent neural network can both memorize the historical information of the input features and exploit their future context; it realizes its memory and prediction functions with three control gates, namely the input gate, the forget gate and the output gate. The deep BLSTM recurrent neural network belongs to techniques well known in the art and is not described in detail here.
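The three control gates mentioned above can be illustrated with a single LSTM cell step in numpy. The dimensions and random parameters are illustrative assumptions; a real deep bidirectional model stacks such cells in both time directions and trains them with CTC.

```python
import numpy as np

rng = np.random.default_rng(1)
d_in, d_h = 99, 32                    # input (accent-spliced) and hidden sizes

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# One weight matrix per gate plus the cell candidate, acting on [x, h_prev].
Wi, Wf, Wo, Wc = (rng.standard_normal((d_in + d_h, d_h)) * 0.1 for _ in range(4))

def lstm_step(x, h_prev, c_prev):
    z = np.concatenate([x, h_prev])
    i = sigmoid(z @ Wi)               # input gate: what to write to memory
    f = sigmoid(z @ Wf)               # forget gate: what history to keep
    o = sigmoid(z @ Wo)               # output gate: what memory to expose
    c = f * c_prev + i * np.tanh(z @ Wc)
    h = o * np.tanh(c)
    return h, c

h = np.zeros(d_h); c = np.zeros(d_h)
for x in rng.standard_normal((10, d_in)):   # 10 stand-in frames
    h, c = lstm_step(x, h, c)
print(h.shape)                              # (32,)
```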
In step S4, the parameters of the output layer (generally the last output layer) of the accent-independent baseline acoustic model obtained in step S3 are fine-tuned with the accent-spliced features obtained in step S2, generating the accent-dependent acoustic model. Specifically, the accent-spliced features of each accent are used as the input of the accent-independent baseline acoustic model; each accent corresponds to its own accent-dependent output layer, while the hidden layers are shared across accents. Further, the parameters are fine-tuned with the back-propagation algorithm. Because the accent-independent baseline acoustic model is based on the deep BLSTM recurrent neural network, the finally generated accent-dependent acoustic model is also based on the deep BLSTM recurrent neural network; the labels of its output layer are the initials and finals, and, combined with a pronunciation dictionary and a language model, it can recognize the text corresponding to the audio data.
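The adaptation scheme of S4 — shared, frozen hidden layers with one fine-tuned output layer per accent — can be sketched as a single softmax output-layer gradient step. The network sizes, the stand-in data, and the learning rate are illustrative assumptions, not values from the patent.

```python
import numpy as np

rng = np.random.default_rng(2)
d_h, n_labels = 32, 10                 # hidden size and number of output labels

W_hidden = rng.standard_normal((99, d_h)) * 0.1   # frozen, shared across accents
W_base = rng.standard_normal((d_h, n_labels)) * 0.1  # baseline output layer

def softmax(z):
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def adapt_output_layer(W_out, feats, labels, lr=0.1):
    """One back-propagation step on the output layer only;
    the shared hidden layer is never touched."""
    h = np.tanh(feats @ W_hidden)                 # shared hidden representation
    p = softmax(h @ W_out)
    p[np.arange(len(labels)), labels] -= 1.0      # softmax cross-entropy grad
    return W_out - lr * (h.T @ p) / len(labels)

# One accent-dependent output layer per accent, all starting from the baseline.
accent_layers = {}
for accent in ["north", "south"]:
    feats = rng.standard_normal((20, 99))         # stand-in accent-spliced data
    labels = rng.integers(0, n_labels, 20)        # stand-in frame labels
    accent_layers[accent] = adapt_output_layer(W_base.copy(), feats, labels)
```

Because only `W_out` changes, the hidden layers can be stored once while each accent keeps its own small output layer, matching the storage-saving argument made above.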
So far, the technical solution of the invention has been described with reference to the preferred embodiments shown in the drawings. However, those skilled in the art will readily understand that the scope of protection of the invention is obviously not limited to these specific embodiments. Without departing from the principles of the invention, those skilled in the art can make equivalent changes or substitutions to the relevant technical features, and the technical solutions after such changes or substitutions all fall within the scope of protection of the invention.
Claims (9)
1. An acoustic model adaptation method based on accent bottleneck features, characterized in that the method comprises the following steps:
S1, based on a first deep neural network, taking the voiceprint-spliced features of multiple accented speech data samples as training samples to obtain a deep accent bottleneck network model;
S2, based on the deep accent bottleneck network, obtaining the accent-spliced features of the accented speech data;
S3, based on a second deep neural network, taking the accent-spliced features of multiple accented speech data samples as training samples to obtain an accent-independent baseline acoustic model;
S4, adjusting the parameters of the accent-independent baseline acoustic model with the accent-spliced features of accent-specific speech data to generate an accent-dependent acoustic model.
2. The method according to claim 1, characterized in that, in step S1, the step of obtaining the voiceprint-spliced features comprises:
S11, extracting acoustic features from the accented speech data;
S12, extracting the speaker's voiceprint feature vector using the acoustic features;
S13, fusing the voiceprint feature vector with the acoustic features to generate the voiceprint-spliced features.
3. The method according to claim 2, characterized in that, in step S1, the first neural network is a deep feedforward neural network, which is trained with the voiceprint-spliced features of the multiple accented speech data samples to obtain the deep accent bottleneck network.
4. The method according to claim 3, characterized in that step S2 further comprises:
S21, extracting the accent bottleneck features of the accented speech data with the deep accent bottleneck network model;
S22, fusing the accent bottleneck features with the acoustic features to obtain the accent-spliced features of the accented speech data.
5. The method according to claim 4, characterized in that step S21 further comprises: taking the voiceprint-spliced features of the accented speech data as the input of the deep accent bottleneck network model, and obtaining the accent bottleneck features of the accented speech data by forward propagation.
6. The method according to claim 5, characterized in that, in step S3, the second neural network is a deep bidirectional long short-term memory recurrent neural network, which is trained with multiple accent-spliced features to obtain an accent-independent deep bidirectional long short-term memory acoustic model; the accent-independent deep bidirectional long short-term memory acoustic model serves as the accent-independent baseline acoustic model.
7. The method according to claim 6, characterized in that, in step S4, the parameters of the output layer of the accent-independent baseline acoustic model are adjusted with the accent-spliced features, generating the accent-dependent acoustic model.
8. The method according to claim 7, characterized in that, in step S4, the parameters of the last output layer of the accent-independent baseline acoustic model are adjusted.
9. The method according to claim 7 or 8, characterized in that the parameters of the output layer of the accent-independent baseline acoustic model are adjusted by the back-propagation algorithm.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201611232996.4A CN106875942B (en) | 2016-12-28 | 2016-12-28 | Acoustic model self-adaption method based on accent bottleneck characteristics |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201611232996.4A CN106875942B (en) | 2016-12-28 | 2016-12-28 | Acoustic model self-adaption method based on accent bottleneck characteristics |
Publications (2)
Publication Number | Publication Date |
---|---|
CN106875942A true CN106875942A (en) | 2017-06-20 |
CN106875942B CN106875942B (en) | 2021-01-22 |
Family
ID=59164199
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201611232996.4A Active CN106875942B (en) | 2016-12-28 | 2016-12-28 | Acoustic model self-adaption method based on accent bottleneck characteristics |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN106875942B (en) |
-
2016
- 2016-12-28 CN CN201611232996.4A patent/CN106875942B/en active Active
Cited By (26)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108074575A (en) * | 2017-12-14 | 2018-05-25 | 广州势必可赢网络科技有限公司 | A kind of auth method and device based on Recognition with Recurrent Neural Network |
WO2019154107A1 (en) * | 2018-02-12 | 2019-08-15 | 阿里巴巴集团控股有限公司 | Voiceprint recognition method and device based on memorability bottleneck feature |
CN108538285A (en) * | 2018-03-05 | 2018-09-14 | 清华大学 | A kind of various keyword detection method based on multitask neural network |
CN108538285B (en) * | 2018-03-05 | 2021-05-04 | 清华大学 | Multi-instance keyword detection method based on multitask neural network |
CN108682416A (en) * | 2018-04-11 | 2018-10-19 | 深圳市卓翼科技股份有限公司 | local adaptive voice training method and system |
CN108682416B (en) * | 2018-04-11 | 2021-01-01 | 深圳市卓翼科技股份有限公司 | Local adaptive speech training method and system |
CN108682417A (en) * | 2018-05-14 | 2018-10-19 | 中国科学院自动化研究所 | Small data Speech acoustics modeling method in speech recognition |
CN108922559A (en) * | 2018-07-06 | 2018-11-30 | 华南理工大学 | Recording terminal clustering method based on voice time-frequency conversion feature and integral linear programming |
CN109147763A (en) * | 2018-07-10 | 2019-01-04 | 深圳市感动智能科技有限公司 | A kind of audio-video keyword recognition method and device based on neural network and inverse entropy weighting |
WO2020014890A1 (en) * | 2018-07-18 | 2020-01-23 | 深圳魔耳智能声学科技有限公司 | Accent-based voice recognition processing method, electronic device and storage medium |
CN109074804A (en) * | 2018-07-18 | 2018-12-21 | 深圳魔耳智能声学科技有限公司 | Voice recognition processing method, electronic equipment and storage medium based on accent |
CN109074804B (en) * | 2018-07-18 | 2021-04-06 | 深圳魔耳智能声学科技有限公司 | Accent-based speech recognition processing method, electronic device, and storage medium |
CN110890085B (en) * | 2018-09-10 | 2023-09-12 | 阿里巴巴集团控股有限公司 | Voice recognition method and system |
CN110890085A (en) * | 2018-09-10 | 2020-03-17 | 阿里巴巴集团控股有限公司 | Voice recognition method and system |
CN109887497A (en) * | 2019-04-12 | 2019-06-14 | 北京百度网讯科技有限公司 | Modeling method, device and the equipment of speech recognition |
CN109887497B (en) * | 2019-04-12 | 2021-01-29 | 北京百度网讯科技有限公司 | Modeling method, device and equipment for speech recognition |
CN111833847A (en) * | 2019-04-15 | 2020-10-27 | 北京百度网讯科技有限公司 | Speech processing model training method and device |
CN110033760A (en) * | 2019-04-15 | 2019-07-19 | 北京百度网讯科技有限公司 | Modeling method, device and the equipment of speech recognition |
CN110033760B (en) * | 2019-04-15 | 2021-01-29 | 北京百度网讯科技有限公司 | Modeling method, device and equipment for speech recognition |
US11688391B2 (en) | 2019-04-15 | 2023-06-27 | Beijing Baidu Netcom Science And Technology Co. | Mandarin and dialect mixed modeling and speech recognition |
CN110570858A (en) * | 2019-09-19 | 2019-12-13 | 芋头科技(杭州)有限公司 | Voice awakening method and device, intelligent sound box and computer readable storage medium |
CN110930982A (en) * | 2019-10-31 | 2020-03-27 | 国家计算机网络与信息安全管理中心 | Multi-accent acoustic model and multi-accent voice recognition method |
CN111370025A (en) * | 2020-02-25 | 2020-07-03 | 广州酷狗计算机科技有限公司 | Audio recognition method and device and computer storage medium |
CN111508501B (en) * | 2020-07-02 | 2020-09-29 | 成都晓多科技有限公司 | Voice recognition method and system with accent for telephone robot |
CN111508501A (en) * | 2020-07-02 | 2020-08-07 | 成都晓多科技有限公司 | Voice recognition method and system with accent for telephone robot |
CN112992126A (en) * | 2021-04-22 | 2021-06-18 | 北京远鉴信息技术有限公司 | Voice authenticity verification method and device, electronic equipment and readable storage medium |
Also Published As
Publication number | Publication date |
---|---|
CN106875942B (en) | 2021-01-22 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN106875942A (en) | Acoustic model adaptive approach based on accent bottleneck characteristic | |
JP7427723B2 (en) | Text-to-speech synthesis in target speaker's voice using neural networks | |
US11837216B2 (en) | Speech recognition using unspoken text and speech synthesis | |
CN106531157B (en) | Regularization accent adaptive approach in speech recognition | |
CN109003601A (en) | A kind of across language end-to-end speech recognition methods for low-resource Tujia language | |
US20200075024A1 (en) | Response method and apparatus thereof | |
CN110223714A (en) | A kind of voice-based Emotion identification method | |
CN107452379B (en) | Dialect language identification method and virtual reality teaching method and system | |
CN108806667A (en) | The method for synchronously recognizing of voice and mood based on neural network | |
CN110491393B (en) | Training method of voiceprint representation model and related device | |
JP2017040919A (en) | Speech recognition apparatus, speech recognition method, and speech recognition system | |
CN106688034A (en) | Text-to-speech with emotional content | |
CN103928023A (en) | Voice scoring method and system | |
CN105760852A (en) | Driver emotion real time identification method fusing facial expressions and voices | |
CN107945790A (en) | A kind of emotion identification method and emotion recognition system | |
CN108172218A (en) | A kind of pronunciation modeling method and device | |
CN107871496A (en) | Audio recognition method and device | |
CN110010136A (en) | The training and text analyzing method, apparatus, medium and equipment of prosody prediction model | |
CN108986798A (en) | Processing method, device and the equipment of voice data | |
CN109493846B (en) | English accent recognition system | |
Sreevidya et al. | Sentiment analysis by deep learning approaches | |
CN109377986A (en) | A kind of non-parallel corpus voice personalization conversion method | |
Peguda et al. | Speech to sign language translation for Indian languages | |
Wu et al. | Oral English Speech Recognition Based on Enhanced Temporal Convolutional Network. | |
CN113470622A (en) | Conversion method and device capable of converting any voice into multiple voices |
Legal Events
Date | Code | Title | Description
---|---|---|---
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| GR01 | Patent grant | |