CN109785824A - Training method and device for a speech translation model - Google Patents


Info

Publication number
CN109785824A
CN109785824A · Application CN201910198404.9A
Authority
CN
China
Prior art keywords
model, text, translation, speech recognition, translation model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910198404.9A
Other languages
Chinese (zh)
Other versions
CN109785824B (en)
Inventor
Ma Zhiqiang (马志强)
Liu Junhua (刘俊华)
Wei Si (魏思)
Hu Guoping (胡国平)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
iFlytek Co Ltd
Original Assignee
iFlytek Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by iFlytek Co Ltd
Priority to CN201910198404.9A
Publication of CN109785824A
Application granted
Publication of CN109785824B
Active legal status
Anticipated expiration

Landscapes

  • Machine Translation (AREA)

Abstract

This application discloses a training method and device for a speech translation model. The method comprises: first obtaining model training data that includes sample speech; then directly translating the obtained sample speech using the current speech translation model to obtain a predicted translation text, while also recognizing the same sample speech using the current speech recognition model to obtain a predicted recognition text; and then updating the parameters of the speech translation model and the speech recognition model according to the obtained predicted translation text and predicted recognition text. Because the speech translation model and the speech recognition model share some model parameters, updating the parameters of the speech recognition model also updates the shared parameters in the speech translation model, making those shared parameters more accurate and thereby improving the translation performance of the speech translation model when it is used for speech translation.

Description

Training method and device for a speech translation model
Technical field
This application relates to the technical field of speech translation, and in particular to a training method and device for a speech translation model.
Background technique
Existing speech translation methods generally comprise two steps: speech recognition followed by text translation. Specifically, a segment of speech is first converted, via speech recognition technology, into text of the same language; that recognized text is then translated into text of another language using text translation technology, thereby completing the speech translation process.
However, combining speech recognition technology and text translation technology for speech translation suffers from error accumulation. For example, suppose speech recognition misrecognizes some word; when text translation is then applied to that word, a wrong translation result will be produced from the wrong word. Errors from the speech recognition stage thus accumulate into the text translation stage, causing inaccurate translation results. In other words, the translation performance of existing speech translation models still needs improvement.
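The error accumulation described above can be made concrete with a deliberately tiny toy cascade. The word lists and the word-for-word translation table below are invented for illustration and are not part of the patent; they only show how one recognition error survives unchanged into the translation stage:

```python
# Toy two-step pipeline: a tiny "ASR result" fed into a word-for-word
# "translation table". Suppose the speaker said "the mail is here",
# but the ASR stage misheard "mail" as "male":
asr_output = ["the", "male", "is", "here"]

# A hypothetical English-to-French lookup standing in for the MT stage.
mt_table = {"the": "le", "mail": "courrier", "male": "mâle",
            "is": "est", "here": "ici"}

translation = [mt_table[w] for w in asr_output]
print(translation)  # the ASR error surfaces verbatim: 'mâle' appears, 'courrier' never does
```

Had the recognition been correct, "mail" would have yielded "courrier"; the cascaded translation stage has no way to recover from the earlier error.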
Summary of the invention
The main purpose of the embodiments of the present application is to provide a training method and device for a speech translation model that can improve the translation performance of the speech translation model.
An embodiment of the present application provides a training method for a speech translation model, comprising:
obtaining model training data, the model training data including sample speech;
directly translating the sample speech using the current speech translation model to obtain a predicted translation text, wherein the speech translation model shares some model parameters with a speech recognition model;
recognizing the sample speech using the current speech recognition model to obtain a predicted recognition text;
updating the parameters of the current speech translation model and speech recognition model according to the predicted translation text and the predicted recognition text.
Optionally, updating the parameters of the current speech translation model and speech recognition model according to the predicted translation text and the predicted recognition text comprises:
obtaining the true translation text and the true recognition text of the sample speech;
updating the parameters of the current speech translation model and speech recognition model according to translation difference information and recognition difference information;
wherein the translation difference information is the difference between the predicted translation text and the true translation text, and the recognition difference information is the difference between the predicted recognition text and the true recognition text.
Optionally, updating the parameters of the current speech translation model and speech recognition model according to the translation difference information and the recognition difference information comprises:
updating the parameters of the speech translation model according to the translation difference information;
updating the parameters of the speech recognition model according to the recognition difference information.
Optionally, the speech recognition model and the speech translation model share one encoder; the speech recognition model includes a recognition decoder, and the speech translation model includes a translation decoder.
An embodiment of the present application also provides a speech translation method, comprising:
obtaining target speech to be translated;
translating the target speech using a speech translation model trained with the above training method for a speech translation model.
An embodiment of the present application also provides a training device for a speech translation model, comprising:
a training data acquiring unit for obtaining model training data, the model training data including sample speech;
a translation text obtaining unit for directly translating the sample speech using the current speech translation model to obtain a predicted translation text, wherein the speech translation model shares some model parameters with a speech recognition model;
a recognition text obtaining unit for recognizing the sample speech using the current speech recognition model to obtain a predicted recognition text;
a model parameter updating unit for updating the parameters of the current speech translation model and speech recognition model according to the predicted translation text and the predicted recognition text.
Optionally, the model parameter updating unit includes:
a true text obtaining subunit for obtaining the true translation text and the true recognition text of the sample speech;
a model parameter updating subunit for updating the parameters of the current speech translation model and speech recognition model according to translation difference information and recognition difference information;
wherein the translation difference information is the difference between the predicted translation text and the true translation text, and the recognition difference information is the difference between the predicted recognition text and the true recognition text.
Optionally, the model parameter updating subunit includes:
a translation model parameter updating subunit for updating the parameters of the speech translation model according to the translation difference information;
a recognition model parameter updating subunit for updating the parameters of the speech recognition model according to the recognition difference information.
Optionally, the speech recognition model and the speech translation model share one encoder; the speech recognition model includes a recognition decoder, and the speech translation model includes a translation decoder.
An embodiment of the present application also provides a speech translation device, comprising:
a target speech acquiring unit for obtaining target speech to be translated;
a target speech translation unit for translating the target speech using a speech translation model trained with the above training device for a speech translation model.
An embodiment of the present application also provides training equipment for a speech translation model, comprising: a processor, a memory, and a system bus;
the processor and the memory are connected by the system bus;
the memory is configured to store one or more programs, the one or more programs including instructions that, when executed by the processor, cause the processor to perform any implementation of the above training method for a speech translation model.
An embodiment of the present application also provides a computer-readable storage medium storing instructions that, when run on a terminal device, cause the terminal device to perform any implementation of the above training method for a speech translation model.
An embodiment of the present application also provides a computer program product that, when run on a terminal device, causes the terminal device to perform any implementation of the above training method for a speech translation model.
An embodiment of the present application also provides speech translation equipment, comprising: a processor, a memory, and a system bus;
the processor and the memory are connected by the system bus;
the memory is configured to store one or more programs, the one or more programs including instructions that, when executed by the processor, cause the processor to perform any implementation of the above speech translation method.
An embodiment of the present application also provides a computer-readable storage medium storing instructions that, when run on a terminal device, cause the terminal device to perform any implementation of the above speech translation method.
An embodiment of the present application also provides a computer program product that, when run on a terminal device, causes the terminal device to perform any implementation of the above speech translation method.
With the training method and device for a speech translation model provided by the embodiments of the present application, when the speech translation model is trained, model training data including sample speech is obtained first; the obtained sample speech is then directly translated using the current speech translation model to obtain a predicted translation text, and is simultaneously recognized using the current speech recognition model to obtain a predicted recognition text; the parameters of the current speech translation model and speech recognition model can then be updated according to the obtained predicted translation text and predicted recognition text. Because the current speech translation model shares some model parameters with the speech recognition model, updating the parameters of the speech recognition model also updates the shared parameters in the speech translation model, making those shared parameters of the trained speech translation model more accurate and thereby improving the translation performance of the speech translation model when it is used for speech translation.
Detailed description of the invention
To explain the technical solutions in the embodiments of the present application or in the prior art more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present application; those of ordinary skill in the art can obtain other drawings from them without creative effort.
Fig. 1 is a first structural schematic diagram of the end-to-end speech translation model provided by embodiments of the present application;
Fig. 2 is a second structural schematic diagram of the end-to-end speech translation model provided by embodiments of the present application;
Fig. 3 is a flow diagram of a training method for a speech translation model provided by embodiments of the present application;
Fig. 4 is a first structural schematic diagram of the speech translation model and the speech recognition model provided by embodiments of the present application;
Fig. 5 is a second structural schematic diagram of the speech translation model and the speech recognition model provided by embodiments of the present application;
Fig. 6 is a flow diagram, provided by embodiments of the present application, of updating the parameters of the current speech translation model and speech recognition model according to the predicted translation text and the predicted recognition text;
Fig. 7 is a flow diagram, provided by embodiments of the present application, of updating the parameters of the current speech translation model and speech recognition model according to the translation difference information and the recognition difference information;
Fig. 8 is a flow diagram of a speech translation method provided by embodiments of the present application;
Fig. 9 is a schematic diagram of the composition of a training device for a speech translation model provided by embodiments of the present application;
Fig. 10 is a schematic diagram of the composition of a speech translation device provided by embodiments of the present application.
Specific embodiment
To make the purposes, technical solutions, and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments are described clearly and completely below in conjunction with the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present application. All other embodiments obtained by those of ordinary skill in the art based on the embodiments in this application without creative effort shall fall within the protection scope of this application.
First embodiment
It should be noted that a traditional speech translation method usually first performs speech recognition on the speech, converting it into text of the same language, and then processes that recognized text: text translation technology translates the recognized text into text of another language, completing the speech translation. However, this traditional speech translation method often suffers from error accumulation; that is, if an error is produced during speech recognition, it accumulates into the subsequent text translation process, in turn making the translation result inaccurate.
Therefore, to address the above drawback, speech translation can be performed with an end-to-end speech translation model as shown in Fig. 1, which includes an encoder, an attention layer (Attention), and a decoder. With this speech translation model, the source-language speech can be translated directly into target-language text without performing speech recognition, realizing direct speech translation — for example, translating Chinese speech directly into English text.
In one optional implementation, the end-to-end speech translation model can use the network structure shown in Fig. 2. Taking this speech translation model as an example, the process of performing speech translation with it is introduced next:
(1) Inputting the audio features of the source-language speech
Audio feature extraction is first performed on the source-language speech that needs to be translated. For example, the Mel filter-bank features (Mel Bank Features) of the source-language speech can be extracted as its audio features. These audio features can be represented in the form of a feature vector, defined here as x_{1..T}, where T denotes the dimension of the audio feature vector of the source-language speech, i.e., the number of vector elements it contains. x_{1..T} can then be fed as input data to the end-to-end speech translation model shown in Fig. 2.
(2) Generating the encoding vector corresponding to the audio features of the source-language speech
As shown in Fig. 2, the encoding part of this end-to-end speech translation model includes two layers of convolutional neural networks (Convolutional Neural Networks, CNN) with max-pooling layers (MaxPooling), one convolutional long short-term memory layer (convolutional Long Short-Term Memory, convolutional LSTM), and three bidirectional long short-term memory layers (Bi-directional Long Short-Term Memory, BiLSTM).
After the audio features x_{1..T} of the source-language speech are input in step (1) above, they can be encoded by one CNN layer and then down-sampled by a MaxPooling layer; this operation is then repeated with another CNN layer and MaxPooling layer, yielding an encoding vector of length L. One convolutional LSTM layer and three BiLSTM layers then process this encoding vector to obtain the final encoding vector, defined as h_{1..L}, where L denotes the dimension of the encoding vector obtained by encoding the audio features of the source-language speech, i.e., the number of vector elements it contains. The formula for h_{1..L} is as follows:
h_{1..L} = enc(W_enc · x_{1..T})    (1)
where enc denotes the entire encoding computation of the model's encoding part, and W_enc denotes all the network parameters of every layer in the encoding part.
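As a rough illustration of how the two CNN + MaxPooling stages shorten the input from T audio frames to an encoding of length L, here is a minimal pure-Python sketch. The convolutions, the convolutional LSTM, and the BiLSTM layers are omitted, and the pooling stride of 2 is an assumption for the example, not taken from the patent:

```python
def max_pool_time(frames, stride=2):
    """Down-sample a sequence of feature vectors along time by taking
    the element-wise max over non-overlapping windows of `stride` frames."""
    pooled = []
    for i in range(0, len(frames) - stride + 1, stride):
        window = frames[i:i + stride]
        pooled.append([max(col) for col in zip(*window)])
    return pooled

# Toy input: T = 8 frames of 3-dimensional "audio features" x_{1..T}.
x = [[float(t + d) for d in range(3)] for t in range(8)]

# Two CNN + MaxPooling stages (the convolutions themselves are omitted;
# only the temporal down-sampling they are paired with is shown).
h = max_pool_time(max_pool_time(x))

print(len(x), len(h))  # 8 2 — T = 8 frames reduced to L = 2 encoded vectors
```

Each pooling stage halves the time axis, so two stages give L = T/4 here; the real encoder applies learned filters before each pooling step.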
(3) Generating the decoding vector corresponding to the encoding vector
As shown in Fig. 2, the decoding part of this end-to-end speech translation model includes four unidirectional long short-term memory (Long Short-Term Memory, LSTM) layers and a softmax classifier.
After the encoding part of the model encodes the audio features of the source-language speech into the encoding vector in step (2) above, an attention operation can first be applied to the encoding vector, so as to attend to the data in the encoding vector that is relevant to generating the decoding vector; the result is then decoded by the four LSTM layers and the softmax classifier to obtain the corresponding decoding vector, which is in turn used to generate the translation text of the source-language speech, defined as y_{1..K}, where K denotes the number of characters (or words) in the translation text.
The formulas for the decoding part are as follows:
c_k = att(s_k, h_{1..L})    (2)
s_k = lstm(y_{k-1}, s_{k-1}, c_{k-1})    (3)
y_k = softmax(W_y [s_k, c_k] + b_y)    (4)
where h_{1..L} is the encoding vector corresponding to the audio features of the source-language speech; c_k is the k-th attention result, att denotes the attention computation, and c_{k-1} is the (k-1)-th attention result; s_k is the k-th hidden-layer vector output by the four-layer LSTM network of the decoding part, lstm denotes the computation of that four-layer LSTM network, and s_{k-1} is its (k-1)-th hidden-layer vector; y_k is the k-th character (or word) in the translation text and y_{k-1} is the (k-1)-th; W_y and b_y are the model parameters of the softmax classifier.
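Formulas (2) and (4) can be made concrete with a minimal sketch of a single decoding step. The dot-product attention and the tiny two-word output layer below are illustrative choices standing in for the patent's att operator and softmax classifier — they are not the actual networks — and the recurrent state update of formula (3) is left out:

```python
import math

def softmax(v):
    m = max(v)
    e = [math.exp(x - m) for x in v]
    s = sum(e)
    return [x / s for x in e]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def attention(s_k, h):
    """c_k = att(s_k, h_{1..L}): dot-product attention over encoder states."""
    weights = softmax([dot(s_k, h_i) for h_i in h])
    dim = len(h[0])
    return [sum(w * h_i[d] for w, h_i in zip(weights, h)) for d in range(dim)]

# Toy encoder output h_{1..L} (L = 2, dim = 2) and a decoder hidden state s_k
# that is strongly aligned with the first encoder state.
h = [[1.0, 0.0], [0.0, 1.0]]
s_k = [5.0, 0.0]

c_k = attention(s_k, h)          # context vector, formula (2)
logits = [dot(c_k, w) for w in [[1.0, 0.0], [0.0, 1.0]]]
probs = softmax(logits)          # distribution over a 2-word vocabulary, formula (4)
print(probs.index(max(probs)))   # y_k: index of the predicted word → 0
```

Because s_k points at the first encoder state, the attention weight there dominates, the context vector c_k leans toward it, and the classifier picks word 0.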
If W_dec denotes all the network parameters of every layer in the model's decoding part, the translation text y_{1..K} of the source-language speech output by the model is computed as follows:
y_{1..K} = dec(W_dec · h_{1..L})    (5)
where dec denotes the entire decoding computation of the model's decoding part, W_dec denotes all the network parameters of every layer in the decoding part, and h_{1..L} is the encoding vector corresponding to the audio features of the source-language speech.
It should be noted that the network structures of the encoder and the decoder in the end-to-end speech translation model shown in Fig. 1 are not unique, and the network structure shown in Fig. 2 is only one example; other network structures or layer counts can also be adopted. For example, the model's encoder could instead use a recurrent neural network (Recurrent Neural Network, RNN) for encoding, and the number of layers can be set according to the actual situation; the embodiments of the present application do not limit this. The layer counts of the CNN, BiLSTM, etc. introduced above or below are only examples; this application does not limit them — they may be the counts mentioned in the embodiments or other counts.
In this embodiment, on the basis of the end-to-end speech translation model shown in Fig. 1, and in order to improve the model's translation performance, the speech translation model can further be trained in a multitask-training manner.
Multitask training is a machine-learning approach in which multiple related tasks are trained together. During training, the models for these related tasks share some model parameters — for example, the parameters of the lower model layers — so that the tasks share what each of them learns. Concretely, multiple related tasks can be learned simultaneously in parallel, and the shared model parameters are adjusted by back-propagating the gradients for all tasks at the same time, so that the related tasks help each other learn and the generalization of the task models improves. Compared with single-task training, multitask training thus yields better model generalization.
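The shared-parameter mechanism at the heart of multitask training can be sketched with scalars: one "encoder" parameter serves two toy tasks, each with its own private head, and a single gradient step moves the shared parameter using both tasks' errors at once. All values here are invented for illustration:

```python
# Toy multitask step: two scalar "tasks" share one parameter w_shared
# (standing in for the shared encoder), plus one private head each.
w_shared, w_task1, w_task2 = 1.0, 1.0, 1.0
x, y1, y2 = 2.0, 8.0, 4.0   # one input, one target per task
lr = 0.01

# Forward: each task's prediction flows through the shared parameter.
p1 = w_task1 * w_shared * x
p2 = w_task2 * w_shared * x

# Squared-error losses; gradients computed before any update is applied.
g_task1 = 2 * (p1 - y1) * w_shared * x
g_task2 = 2 * (p2 - y2) * w_shared * x
g_shared = 2 * (p1 - y1) * w_task1 * x + 2 * (p2 - y2) * w_task2 * x

w_task1 -= lr * g_task1
w_task2 -= lr * g_task2
# The shared parameter receives gradient from BOTH tasks at once.
w_shared -= lr * g_shared
print(w_shared)  # 1.32
```

Each private head moved only under its own task's error, while the shared parameter's step (+0.32) combines both error signals — exactly the "helping each other learn" described above.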
It should be noted that when multitask training is used in this embodiment, the speech translation model and the speech recognition model are concretely trained at the same time, so that after model training the speech translation model has good translation performance.
Next, the training method for a speech translation model provided by this embodiment is introduced in detail with reference to the drawings.
Referring to Fig. 3, which shows a flow diagram of a training method for a speech translation model provided by this embodiment, the method includes the following steps:
S301: Obtain model training data, where the model training data includes sample speech.
In this embodiment, to train the speech translation model and improve its translation performance, a large amount of preparation is needed in advance. First, a large amount of speech data must be collected as sample speech to constitute the model training data. For example, a large amount of recorded audio can be collected in advance — say, the speech of contestants in a reading-aloud competition, or conversation recordings — to serve as sample speech for training the model.
It should be noted that this embodiment does not limit the language of the sample speech: it may be, for example, Chinese speech or English speech. Likewise, this embodiment does not limit the length of the sample speech: it may be one sentence or several sentences.
It should also be noted that this embodiment uses the sample speech to train the speech translation model and the speech recognition model over multiple rounds. Taking the sample speech used in the current round of training as an example, the current round of model training is realized through the subsequent steps S302-S304, described in detail below.
S302: Directly translate the sample speech using the current speech translation model to obtain a predicted translation text.
In this embodiment, after the sample speech is obtained in step S301, the current speech translation model can directly translate it (without speech recognition) to obtain a predicted translation text. For example, if a sample utterance is Chinese speech, the current speech translation model can translate it directly to obtain a predicted English translation text.
Here, the speech translation model shares some model parameters with a speech recognition model. In one optional implementation, the current speech translation model has the network structure shown in Fig. 4: it shares one encoder with a speech recognition model; the speech recognition model includes a recognition decoder for generating the predicted recognition text, while the current speech translation model includes a translation decoder for generating the predicted translation text. It should be noted that the network structures of the recognition decoder of the speech recognition model and the translation decoder of the speech translation model in Fig. 4 may be the same or different; their concrete structures can be set according to the actual situation, and the embodiments of the present application do not limit this.
For example, suppose the content of a sample utterance is the Chinese sentence "end-to-end speech translation system", and the language of its translation text is English. After the current speech translation model shown in Fig. 4 translates it directly, the predicted translation text "The end-to-end speech translation system" can be obtained.
S303: Recognize the sample speech using the current speech recognition model to obtain a predicted recognition text.
In this embodiment, after the sample speech is obtained in step S301, the current speech recognition model can recognize it to obtain a predicted recognition text. For example, if a sample utterance is Chinese speech, the current speech recognition model can recognize it to obtain a predicted Chinese recognition text in the same language.
Here, this speech recognition model can share some model parameters with the speech translation model introduced in step S302. In one optional implementation, the current speech recognition model has the network structure shown in Fig. 4: it shares one encoder with the speech translation model, and includes a recognition decoder for generating the predicted recognition text.
For example, suppose the content of a sample utterance is still the Chinese sentence "end-to-end speech translation system". After the current speech recognition model shown in Fig. 4 recognizes it, the predicted Chinese recognition text "end-to-end speech translation system" can be obtained.
In this embodiment, one optional implementation is that the speech translation model and the speech recognition model use the network structure shown in Fig. 5. Based on this model structure, the concrete process of generating the predicted translation text and the predicted recognition text is as follows:
(1) Inputting the audio features of the sample speech
Audio feature extraction is first performed on the sample speech; for example, the Mel filter-bank features of the sample speech can be extracted as its audio features. This feature vector is defined as x_{1..T} and fed as input data to the encoder shown in Fig. 5.
(2) Generating the encoding vector corresponding to the audio features of the sample speech
As shown in Fig. 5, the encoder shared by the speech translation model and the speech recognition model has the same structure as the encoding part shown in Fig. 2, which is not repeated here. After this encoder encodes the audio features x_{1..T} of the sample speech input in step (1) above, the final encoding vector h_{1..L} can be obtained, where L denotes the dimension of the encoding vector obtained by encoding the audio features of the sample speech, i.e., the number of vector elements it contains. The formula for h_{1..L} is the above formula (1), i.e.:
h_{1..L} = enc(W_enc · x_{1..T})
where enc denotes the entire encoding computation of the encoder in Fig. 5, and W_enc denotes all the network parameters of every layer of that encoder.
(3) Generating the decoding vectors corresponding to the encoding vector
As shown in Fig. 5, assume that the recognition decoder of the speech recognition model and the translation decoder of the speech translation model have identical network structures, each including four LSTM layers and a softmax classifier, but that the two do not share their trainable parameters.
After the encoder encodes the audio features of the sample speech into the encoding vector h_{1..L} in step (2) above, as shown in Fig. 5, an attention operation can first be applied to the encoding vector for each decoder; the four LSTM layers and softmax classifiers of the translation decoder and the recognition decoder then decode the respective attention results to obtain the corresponding decoding vectors. These two decoding vectors are used to generate, respectively, the predicted translation text y_{1..K} of the sample speech and its predicted recognition text z_{1..N}, where N denotes the number of characters (or words) in the predicted recognition text. The formulas for the decoding part are the above formulas (2), (3), and (4), which are not repeated here.
If W_dec denotes all the network parameters of each layer of the translation decoder of the speech translation model in Fig. 5, the predicted translation text y_1...K output by the model is computed as follows:
y_1...K = dec(W_dec h_1...L)    (6)
where dec denotes the entire decoding computation of the translation decoder of the speech translation model in Fig. 5; W_dec denotes all the network parameters of each layer of that translation decoder; and h_1...L denotes the coding vector corresponding to the audio features of the sample speech.
Similarly, if W_asr denotes all the network parameters of each layer of the recognition decoder of the speech recognition model in Fig. 5, the predicted recognition text z_1...N output by the model is computed as follows:
z_1...N = dec(W_asr h_1...L)    (7)
where dec denotes the entire decoding computation of the recognition decoder of the speech recognition model in Fig. 5; W_asr denotes all the network parameters of each layer of that recognition decoder; and h_1...L denotes the coding vector corresponding to the audio features of the sample speech.
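Equations (6) and (7) can be sketched as two unshared decoders reading the same encoder output. In the hypothetical NumPy sketch below, mean-pooling stands in for the attention operation and a single softmax layer stands in for the 4-layer LSTM stack; only the data flow, not the real architecture, is shown:

```python
import numpy as np

rng = np.random.default_rng(1)
L, d_h, V = 5, 4, 10  # number of coding vectors, hidden size, toy vocabulary

h = rng.standard_normal((L, d_h))  # shared encoder output h_{1..L}

# Separate (unshared) decoder parameters, mirroring equations (6) and (7).
W_dec = rng.standard_normal((V, d_h)) * 0.1  # translation-decoder weights
W_asr = rng.standard_normal((V, d_h)) * 0.1  # recognition-decoder weights

def dec(W, h):
    # Stand-in for a decoder step: attention + 4-layer LSTM + softmax.
    # Mean-pooling replaces the attention operation to keep the sketch short.
    context = h.mean(axis=0)
    logits = W @ context
    p = np.exp(logits - logits.max())
    return p / p.sum()  # softmax distribution over the vocabulary

p_y = dec(W_dec, h)  # distribution used to emit the translation text y_{1..K}
p_z = dec(W_asr, h)  # distribution used to emit the recognition text z_{1..N}
```

Because both calls read the same `h`, any improvement to the encoder benefits both output texts, while `W_dec` and `W_asr` stay independent.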
It should be noted that this embodiment does not limit the execution order of S302 and S303: S302 may be executed before S303, S303 may be executed before S302, or S302 and S303 may be executed simultaneously.
S304: updating parameters of the current speech translation model and speech recognition model according to the predicted translation text and the predicted recognition text of the sample speech.
In this embodiment, sample speech items may be extracted one by one from the model training data mentioned in S301 for model training, and the parameters of the current speech translation model and speech recognition model are updated through multiple rounds of training.
Before model training, the model parameters W_enc, W_dec and W_asr of the speech translation model and the speech recognition model may first be randomly initialized. Then, in the first round of training, the model parameters W_enc, W_dec and W_asr are updated through the above steps S302-S303; in the second round of training, steps S302-S303 are performed again on the basis of the parameters updated in the first round to carry out the second round of parameter updates, and so on, until the training ends.
As an example, the objective function used in this embodiment during training is as follows:
Obj = λ log p(y|x) + (1-λ) log p(z|x)    (8)
where λ denotes a weight whose value may be set between 0 and 1 based on experimental results or experience; y denotes the predicted translation text output by the speech translation model; z denotes the predicted recognition text output by the speech recognition model; and x denotes the audio feature data of the sample speech.
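A minimal numeric reading of formula (8), with made-up probabilities p(y|x) and p(z|x) and an assumed weight λ = 0.7 (all values are illustrative, not from the patent):

```python
import math

# Probabilities the two branches assign to the reference texts for one
# sample (made-up values, for illustration only).
p_y_given_x = 0.42  # p(y | x): translation branch
p_z_given_x = 0.65  # p(z | x): recognition branch

lam = 0.7  # the weight λ of formula (8), chosen in (0, 1) empirically

obj = lam * math.log(p_y_given_x) + (1 - lam) * math.log(p_z_given_x)
# Maximizing obj trades off translation likelihood against recognition
# likelihood; λ = 1 would reduce it to pure translation training.
```

Since log probabilities are negative, Obj is a weighted average of two negative terms and always lies between log p(y|x) and log p(z|x).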
Specifically, in one optional implementation, as shown in Fig. 6, the implementation of this step S304 may include steps S601-S602:
S601: obtaining the true translation text and the true recognition text of the sample speech.
In this implementation, while each sample speech item is obtained as model training data, the true translation text and the true recognition text corresponding to each sample speech item may also be obtained. For example, assuming the content of the sample speech is still the Chinese utterance "端到端语音翻译系统" ("end-to-end speech translation system"), its corresponding true translation text is "The end-to-end speech translation system", and its true recognition text is "端到端语音翻译系统".
S602: updating parameters of the current speech translation model and speech recognition model according to translation difference information and recognition difference information.
In this implementation, the translation difference information refers to the difference between the predicted translation text and the true translation text. For example, assuming the predicted translation text is "The end-to-end speech translate system" and the true translation text is "The end-to-end speech translation system", the translation difference information of the two is "translate" versus "translation".
In this implementation, the recognition difference information refers to the difference between the predicted recognition text and the true recognition text. For example, assuming the predicted recognition text is "端到端语言翻译系统" and the true recognition text is "端到端语音翻译系统", the recognition difference information of the two is the character "言" versus "音".
Thus, after the true translation text and the true recognition text of the sample speech are obtained through step S601, the translation difference information between the true translation text and the predicted translation text of the sample speech, as well as the recognition difference information between the true recognition text and the predicted recognition text of the sample speech, can be further obtained; the parameters of the current speech translation model and speech recognition model can then be updated according to this translation difference information and recognition difference information, respectively.
In one implementation, as shown in Fig. 7, the specific implementation of step S602 may include the following steps S701-S702:
S701: updating parameters of the speech translation model according to the translation difference information.
In this embodiment, after the translation difference information corresponding to the sample speech is obtained, the model parameters W_enc and W_dec corresponding to the encoder and the translation decoder of the speech translation model may be updated by backward gradient propagation according to the translation difference information.
S702: updating parameters of the speech recognition model according to the recognition difference information.
In this embodiment, after the recognition difference information corresponding to the sample speech is obtained, the model parameters W_enc and W_asr corresponding to the encoder and the recognition decoder of the speech recognition model may be updated by backward gradient propagation according to the recognition difference information.
It should be noted that this embodiment does not limit the execution order of S701 and S702: S701 may be executed before S702, S702 may be executed before S701, or S701 and S702 may be executed simultaneously.
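The effect of steps S701-S702 on the shared encoder can be illustrated with a scalar toy (hypothetical squared-error losses, not the patent's objective): each branch back-propagates into its own decoder parameter, while the shared encoder parameter accumulates gradients from both branches:

```python
# Scalar toy of steps S701-S702: the translation loss updates (w_enc, w_dec),
# the recognition loss updates (w_enc, w_asr), and the shared encoder
# parameter w_enc accumulates gradients from both branches.
w_enc, w_dec, w_asr = 1.0, 0.5, 0.3
y_true, z_true = 2.0, 1.0  # stand-ins for the reference outputs
lr = 0.1

# Translation branch: prediction w_dec * w_enc, squared-error "difference".
err_mt = w_dec * w_enc - y_true
g_enc_mt = 2 * err_mt * w_dec   # d(loss_mt)/d(w_enc)
g_dec    = 2 * err_mt * w_enc   # d(loss_mt)/d(w_dec)

# Recognition branch: prediction w_asr * w_enc.
err_asr = w_asr * w_enc - z_true
g_enc_asr = 2 * err_asr * w_asr  # d(loss_asr)/d(w_enc)
g_asr     = 2 * err_asr * w_enc  # d(loss_asr)/d(w_asr)

w_dec -= lr * g_dec                    # S701: translation difference -> W_dec
w_asr -= lr * g_asr                    # S702: recognition difference -> W_asr
w_enc -= lr * (g_enc_mt + g_enc_asr)   # shared encoder updated by both
```

The last line is the key point of the scheme: even a recognition-only training example moves `w_enc`, and that movement is inherited by the translation branch because the encoder is shared.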
As it can be seen that using prediction cypher text and Forecasting recognition text, while updating current voiced translation model and language During the parameter of sound identification model, by the training to speech recognition modeling, the model in meeting real-time update encoder is joined Number, so that coding result is more acurrate, as shown in Figure 4 and Figure 5, since speech recognition modeling and voiced translation model sharing one are compiled Code device, in this way, voiced translation model is translated and decoded device in decoding in the more accurate situation of coding result, it can basis More accurate coding result is decoded, so that more accurate decoding result is obtained, it is thus possible to improve voiced translation is accurate Degree, that is, the translation performance of voiced translation model can be promoted.
To sum up, the training method of a kind of voiced translation model provided in this embodiment, is instructed to voiced translation model When practicing, the model training data including each sample voice are obtained first, then, using current voiced translation model to acquisition To sample voice directly translated, obtain prediction cypher text, meanwhile, using current speech recognition modeling to getting Sample voice identified, obtain Forecasting recognition text, then, can be according to obtained prediction cypher text and Forecasting recognition Text updates the parameter of current voiced translation model and speech recognition modeling.Due to current voiced translation model and voice Identification model shares department pattern parameter, so, it, equally can be to voiced translation model when updating the parameter of speech recognition modeling In share the model parameter of part and be updated so that this department pattern parameter of voiced translation model that training obtains is more It is accurate to add, and then when carrying out voiced translation using the voiced translation model, is able to ascend the translation performance of voiced translation model.
Second Embodiment
The above is a specific implementation of the training method of a speech translation model provided by the first embodiment of the present application. Based on the speech translation model obtained by training in the above embodiment, an embodiment of the present application further provides a speech translation method.
Referring to Fig. 8, which shows a flowchart of a speech translation method provided by an embodiment of the present application, as shown in Fig. 8, the method comprises:
S801: obtaining target speech to be translated.
In this embodiment, any speech to be translated using this embodiment is defined as target speech. The target speech is in the same language as the sample speech in the above first embodiment.
It can be understood that the target speech may be obtained according to actual needs by means such as recording; for example, telephone conversation speech or conference recordings in people's daily life may serve as target speech. After the target speech is obtained, it can be translated through the subsequent step S802.
S802: translating the target speech using the speech translation model obtained by training.
In practical applications, after the target speech to be translated is obtained through step S801, the extracted audio features of the target speech (for example, spectrum features such as Mel spectrum features) may further be input into the speech translation model obtained by training in the first embodiment, so as to obtain the translation text corresponding to the target speech and thereby realize the translation of the target speech.
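The inference path of S801-S802 can be sketched as follows (all helper names are hypothetical; the feature extractor is a crude stand-in for Mel-spectrum extraction and the model is a dummy callable, not the trained network):

```python
import numpy as np

def extract_features(waveform, frame=4):
    # Hypothetical stand-in for Mel-spectrum feature extraction: chop the
    # waveform into fixed-size frames and take per-frame magnitudes.
    n = len(waveform) // frame
    return np.abs(waveform[: n * frame].reshape(n, frame))

def translate(model, waveform):
    # End-to-end path of S801-S802: audio features in, translated text out.
    # `model` is assumed to wrap the trained encoder and translation decoder.
    return model(extract_features(waveform))

# Dummy "trained model" mapping each frame to a token id (illustration only).
dummy_model = lambda feats: [int(f.sum() * 10) % 5 for f in feats]

audio = np.linspace(-1.0, 1.0, 16)  # stand-in for the target speech
tokens = translate(dummy_model, audio)  # 4 frames -> 4 token ids
```

The point of the sketch is the single hop from audio features to translated tokens: no intermediate transcript is produced, which is what removes the recognition-error accumulation discussed below.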
In summary, in the speech translation method provided by this embodiment, after the target speech to be translated is obtained, the target speech is translated using the speech translation model obtained by training in the above first embodiment, so that the target speech can be directly translated into text of the corresponding language without performing any speech recognition operation on it. Therefore, compared with the existing speech translation method that first performs speech recognition and then performs text translation, this embodiment can reduce the error accumulation brought by speech recognition and obtain a more accurate speech translation result.
Third Embodiment
This embodiment will introduce a training apparatus of a speech translation model; for related content, reference is made to the above method embodiments.
Referring to Fig. 9, which is a schematic composition diagram of a training apparatus of a speech translation model provided by this embodiment, the apparatus 900 comprises:
a training data obtaining unit 901, configured to obtain model training data, the model training data including each sample speech item;
a translation text obtaining unit 902, configured to directly translate the sample speech using a current speech translation model to obtain a predicted translation text, wherein the speech translation model and a speech recognition model share part of the model parameters;
a recognition text obtaining unit 903, configured to recognize the sample speech using a current speech recognition model to obtain a predicted recognition text;
a model parameter updating unit 904, configured to update parameters of the current speech translation model and speech recognition model according to the predicted translation text and the predicted recognition text.
In one implementation of this embodiment, the model parameter updating unit 904 includes:
a real text obtaining subunit, configured to obtain the true translation text and the true recognition text of the sample speech;
a model parameter updating subunit, configured to update the parameters of the current speech translation model and speech recognition model according to translation difference information and recognition difference information;
wherein the translation difference information is the difference between the predicted translation text and the true translation text, and the recognition difference information is the difference between the predicted recognition text and the true recognition text.
In one implementation of this embodiment, the model parameter updating subunit includes:
a translation model parameter updating subunit, configured to update the parameters of the speech translation model according to the translation difference information;
a recognition model parameter updating subunit, configured to update the parameters of the speech recognition model according to the recognition difference information.
In one implementation of this embodiment, the speech recognition model and the speech translation model share one encoder, the speech recognition model includes a recognition decoder, and the speech translation model includes a translation decoder.
Further, an embodiment of the present application also provides a training device of a speech translation model, comprising: a processor, a memory and a system bus;
the processor and the memory are connected by the system bus;
the memory is configured to store one or more programs, the one or more programs including instructions which, when executed by the processor, cause the processor to execute any implementation of the above training method of a speech translation model.
Further, an embodiment of the present application also provides a computer-readable storage medium, in which instructions are stored; when the instructions are run on a terminal device, the terminal device is caused to execute any implementation of the above training method of a speech translation model.
Further, an embodiment of the present application also provides a computer program product which, when run on a terminal device, causes the terminal device to execute any implementation of the above training method of a speech translation model.
Fourth Embodiment
This embodiment will introduce a speech translation apparatus; for related content, reference is made to the above method embodiments.
Referring to Fig. 10, which is a schematic composition diagram of a speech translation apparatus provided by this embodiment, the apparatus comprises:
a target speech obtaining unit 1001, configured to obtain target speech to be translated;
a target speech translation unit 1002, configured to translate the target speech using a speech translation model obtained by training with the above training apparatus of a speech translation model.
Further, an embodiment of the present application also provides a speech translation device, comprising: a processor, a memory and a system bus;
the processor and the memory are connected by the system bus;
the memory is configured to store one or more programs, the one or more programs including instructions which, when executed by the processor, cause the processor to execute any implementation of the above speech translation method.
Further, an embodiment of the present application also provides a computer-readable storage medium, in which instructions are stored; when the instructions are run on a terminal device, the terminal device is caused to execute any implementation of the above speech translation method.
Further, an embodiment of the present application also provides a computer program product which, when run on a terminal device, causes the terminal device to execute any implementation of the above speech translation method.
As can be seen from the above description of the embodiments, those skilled in the art can clearly understand that all or part of the steps in the methods of the above embodiments can be implemented by means of software plus a necessary general hardware platform. Based on this understanding, the technical solution of the present application, in essence or in the part contributing to the prior art, can be embodied in the form of a software product, which can be stored in a storage medium such as a ROM/RAM, a magnetic disk or an optical disc, and includes several instructions for causing a computer device (which may be a personal computer, a server, or a network communication device such as a media gateway) to execute the methods described in each embodiment or in certain parts of the embodiments of the present application.
It should be noted that the embodiments in this specification are described in a progressive manner; each embodiment focuses on its differences from the other embodiments, and the same or similar parts of the embodiments may be referred to each other. As for the apparatus disclosed in the embodiments, since it corresponds to the method disclosed in the embodiments, its description is relatively simple, and for related points reference is made to the description of the method part.
It should also be noted that, herein, relational terms such as first and second are used merely to distinguish one entity or operation from another, and do not necessarily require or imply any such actual relationship or order between these entities or operations. Moreover, the terms "include", "comprise" or any other variants thereof are intended to cover a non-exclusive inclusion, so that a process, method, article or device including a series of elements includes not only those elements but also other elements not explicitly listed, or further includes elements inherent to such a process, method, article or device. Without further limitation, an element defined by the phrase "including a ..." does not exclude the existence of other identical elements in the process, method, article or device including that element.
The foregoing description of the disclosed embodiments enables those skilled in the art to implement or use the present application. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the general principles defined herein may be implemented in other embodiments without departing from the spirit or scope of the present application. Therefore, the present application is not intended to be limited to the embodiments shown herein, but is to conform to the widest scope consistent with the principles and novel features disclosed herein.

Claims (16)

1. A training method of a speech translation model, characterized by comprising:
obtaining model training data, the model training data including each sample speech item;
directly translating the sample speech using a current speech translation model to obtain a predicted translation text, wherein the speech translation model and a speech recognition model share part of the model parameters;
recognizing the sample speech using a current speech recognition model to obtain a predicted recognition text; and
updating parameters of the current speech translation model and the speech recognition model according to the predicted translation text and the predicted recognition text.
2. The method according to claim 1, wherein the updating parameters of the current speech translation model and the speech recognition model according to the predicted translation text and the predicted recognition text comprises:
obtaining a true translation text and a true recognition text of the sample speech; and
updating the parameters of the current speech translation model and the speech recognition model according to translation difference information and recognition difference information;
wherein the translation difference information is a difference between the predicted translation text and the true translation text, and the recognition difference information is a difference between the predicted recognition text and the true recognition text.
3. The method according to claim 2, wherein the updating the parameters of the current speech translation model and the speech recognition model according to translation difference information and recognition difference information comprises:
updating parameters of the speech translation model according to the translation difference information; and
updating parameters of the speech recognition model according to the recognition difference information.
4. The method according to any one of claims 1 to 3, wherein the speech recognition model and the speech translation model share one encoder, the speech recognition model comprises a recognition decoder, and the speech translation model comprises a translation decoder.
5. A speech translation method, characterized by comprising:
obtaining target speech to be translated; and
translating the target speech using a speech translation model obtained by training with the method according to any one of claims 1 to 4.
6. A training apparatus of a speech translation model, characterized by comprising:
a training data obtaining unit, configured to obtain model training data, the model training data including each sample speech item;
a translation text obtaining unit, configured to directly translate the sample speech using a current speech translation model to obtain a predicted translation text, wherein the speech translation model and a speech recognition model share part of the model parameters;
a recognition text obtaining unit, configured to recognize the sample speech using a current speech recognition model to obtain a predicted recognition text; and
a model parameter updating unit, configured to update parameters of the current speech translation model and the speech recognition model according to the predicted translation text and the predicted recognition text.
7. The apparatus according to claim 6, wherein the model parameter updating unit comprises:
a real text obtaining subunit, configured to obtain a true translation text and a true recognition text of the sample speech; and
a model parameter updating subunit, configured to update the parameters of the current speech translation model and the speech recognition model according to translation difference information and recognition difference information;
wherein the translation difference information is a difference between the predicted translation text and the true translation text, and the recognition difference information is a difference between the predicted recognition text and the true recognition text.
8. The apparatus according to claim 7, wherein the model parameter updating subunit comprises:
a translation model parameter updating subunit, configured to update parameters of the speech translation model according to the translation difference information; and
a recognition model parameter updating subunit, configured to update parameters of the speech recognition model according to the recognition difference information.
9. The apparatus according to any one of claims 6 to 8, wherein the speech recognition model and the speech translation model share one encoder, the speech recognition model comprises a recognition decoder, and the speech translation model comprises a translation decoder.
10. A speech translation apparatus, characterized by comprising:
a target speech obtaining unit, configured to obtain target speech to be translated; and
a target speech translation unit, configured to translate the target speech using a speech translation model obtained by training with the apparatus according to any one of claims 6 to 9.
11. A training device of a speech translation model, characterized by comprising: a processor, a memory, and a system bus;
the processor and the memory being connected by the system bus;
the memory being configured to store one or more programs, the one or more programs including instructions which, when executed by the processor, cause the processor to perform the method according to any one of claims 1 to 4.
12. A computer-readable storage medium, characterized in that instructions are stored in the computer-readable storage medium, and when the instructions are run on a terminal device, the terminal device is caused to perform the method according to any one of claims 1 to 4.
13. A computer program product, characterized in that, when the computer program product is run on a terminal device, the terminal device is caused to perform the method according to any one of claims 1 to 4.
14. A speech translation device, characterized by comprising: a processor, a memory, and a system bus;
the processor and the memory being connected by the system bus;
the memory being configured to store one or more programs, the one or more programs including instructions which, when executed by the processor, cause the processor to perform the method according to claim 5.
15. A computer-readable storage medium, characterized in that instructions are stored in the computer-readable storage medium, and when the instructions are run on a terminal device, the terminal device is caused to perform the method according to claim 5.
16. A computer program product, characterized in that, when the computer program product is run on a terminal device, the terminal device is caused to perform the method according to claim 5.
CN201910198404.9A 2019-03-15 2019-03-15 Training method and device of voice translation model Active CN109785824B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910198404.9A CN109785824B (en) 2019-03-15 2019-03-15 Training method and device of voice translation model


Publications (2)

Publication Number Publication Date
CN109785824A true CN109785824A (en) 2019-05-21
CN109785824B CN109785824B (en) 2021-04-06

Family

ID=66488103


Country Status (1)

Country Link
CN (1) CN109785824B (en)

Cited By (17)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110298046A (en) * 2019-07-03 2019-10-01 科大讯飞股份有限公司 A kind of translation model training method, text interpretation method and relevant apparatus
CN111243576A (en) * 2020-01-16 2020-06-05 腾讯科技(深圳)有限公司 Speech recognition and model training method, device, equipment and storage medium
CN111488486A (en) * 2020-04-20 2020-08-04 武汉大学 Electronic music classification method and system based on multi-sound-source separation
CN111508470A (en) * 2020-04-26 2020-08-07 北京声智科技有限公司 Training method and device of speech synthesis model
CN111862987A (en) * 2020-07-20 2020-10-30 北京百度网讯科技有限公司 Speech recognition method and device
CN112259083A (en) * 2020-10-16 2021-01-22 北京猿力未来科技有限公司 Audio processing method and device
CN112257471A (en) * 2020-11-12 2021-01-22 腾讯科技(深圳)有限公司 Model training method and device, computer equipment and storage medium
CN112528679A (en) * 2020-12-17 2021-03-19 科大讯飞股份有限公司 Intention understanding model training method and device and intention understanding method and device
CN112633017A (en) * 2020-12-24 2021-04-09 北京百度网讯科技有限公司 Translation model training method, translation processing method, translation model training device, translation processing equipment and storage medium
CN112699690A (en) * 2020-12-29 2021-04-23 科大讯飞股份有限公司 Translation model training method, translation method, electronic device, and storage medium
CN113362810A (en) * 2021-05-28 2021-09-07 平安科技(深圳)有限公司 Training method, device and equipment of voice processing model and storage medium
CN113420869A (en) * 2021-06-30 2021-09-21 平安科技(深圳)有限公司 Translation method based on omnidirectional attention and related equipment thereof
CN113674732A (en) * 2021-08-16 2021-11-19 北京百度网讯科技有限公司 Voice confidence detection method and device, electronic equipment and storage medium
CN113763937A (en) * 2021-10-27 2021-12-07 北京百度网讯科技有限公司 Method, device and equipment for generating voice processing model and storage medium
CN116450771A (en) * 2022-12-16 2023-07-18 镁佳(北京)科技有限公司 Multilingual speech translation model construction method and device
CN117275461A (en) * 2023-11-23 2023-12-22 上海蜜度科技股份有限公司 Multitasking audio processing method, system, storage medium and electronic equipment
KR102640315B1 (en) * 2022-10-28 2024-02-22 숭실대학교산학협력단 Deep learning-based voice phishing detection apparatus and method therefor

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR100890404B1 (en) * 2007-07-13 2009-03-26 한국전자통신연구원 Method and Apparatus for auto translation using Speech Recognition
US20180075844A1 (en) * 2016-09-09 2018-03-15 Electronics And Telecommunications Research Institute Speech recognition system and method
CN108595443A (en) * 2018-03-30 2018-09-28 浙江吉利控股集团有限公司 Simultaneous interpreting method, device, intelligent vehicle mounted terminal and storage medium
CN108766414A (en) * 2018-06-29 2018-11-06 北京百度网讯科技有限公司 Method, apparatus, equipment and computer readable storage medium for voiced translation
CN108804427A (en) * 2018-06-12 2018-11-13 深圳市译家智能科技有限公司 Speech robot interpretation method and device
US20180342239A1 (en) * 2017-05-26 2018-11-29 International Business Machines Corporation Closed captioning through language detection
US20180365232A1 (en) * 2017-06-14 2018-12-20 Microsoft Technology Licensing, Llc Customized multi-device translated and transcribed conversations
CN109359309A (en) * 2018-12-11 2019-02-19 成都金山互动娱乐科技有限公司 A kind of interpretation method and device, the training method of translation model and device


Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Hans Krupakar et al.: "A survey of voice translation methodologies — Acoustic dialect decoder", 2016 International Conference on Information Communication and Embedded Systems (ICICES) *
Cui Lei: "Research on Domain Adaptation for Statistical Machine Translation", China Doctoral Dissertations Full-text Database, Information Science and Technology *

Cited By (30)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110298046B (en) * 2019-07-03 2023-04-07 科大讯飞股份有限公司 Translation model training method, text translation method and related device
CN110298046A (en) * 2019-07-03 2019-10-01 科大讯飞股份有限公司 A kind of translation model training method, text interpretation method and relevant apparatus
CN111243576A (en) * 2020-01-16 2020-06-05 腾讯科技(深圳)有限公司 Speech recognition and model training method, device, equipment and storage medium
CN111243576B (en) * 2020-01-16 2022-06-03 腾讯科技(深圳)有限公司 Speech recognition and model training method, device, equipment and storage medium
CN111488486A (en) * 2020-04-20 2020-08-04 武汉大学 Electronic music classification method and system based on multi-sound-source separation
CN111488486B (en) * 2020-04-20 2021-08-17 武汉大学 Electronic music classification method and system based on multi-sound-source separation
CN111508470A (en) * 2020-04-26 2020-08-07 北京声智科技有限公司 Training method and device of speech synthesis model
CN111508470B (en) * 2020-04-26 2024-04-12 北京声智科技有限公司 Training method and device for speech synthesis model
US11735168B2 (en) 2020-07-20 2023-08-22 Beijing Baidu Netcom Science And Technology Co., Ltd. Method and apparatus for recognizing voice
CN111862987A (en) * 2020-07-20 2020-10-30 北京百度网讯科技有限公司 Speech recognition method and device
CN111862987B (en) * 2020-07-20 2021-12-28 北京百度网讯科技有限公司 Speech recognition method and device
CN112259083B (en) * 2020-10-16 2024-02-13 北京猿力未来科技有限公司 Audio processing method and device
CN112259083A (en) * 2020-10-16 2021-01-22 北京猿力未来科技有限公司 Audio processing method and device
CN112257471A (en) * 2020-11-12 2021-01-22 腾讯科技(深圳)有限公司 Model training method and device, computer equipment and storage medium
CN112528679A (en) * 2020-12-17 2021-03-19 科大讯飞股份有限公司 Intention understanding model training method and device and intention understanding method and device
CN112528679B (en) * 2020-12-17 2024-02-13 科大讯飞股份有限公司 Method and device for training intention understanding model, and method and device for intention understanding
CN112633017A (en) * 2020-12-24 2021-04-09 北京百度网讯科技有限公司 Translation model training method, translation processing method, translation model training device, translation processing equipment and storage medium
CN112633017B (en) * 2020-12-24 2023-07-25 北京百度网讯科技有限公司 Translation model training method, translation processing method, translation model training device, translation processing equipment and storage medium
CN112699690A (en) * 2020-12-29 2021-04-23 科大讯飞股份有限公司 Translation model training method, translation method, electronic device, and storage medium
CN112699690B (en) * 2020-12-29 2024-02-13 科大讯飞股份有限公司 Translation model training method, translation method, electronic device and storage medium
CN113362810A (en) * 2021-05-28 2021-09-07 平安科技(深圳)有限公司 Training method, device and equipment of voice processing model and storage medium
CN113362810B (en) * 2021-05-28 2024-02-09 平安科技(深圳)有限公司 Training method, device, equipment and storage medium of voice processing model
CN113420869A (en) * 2021-06-30 2021-09-21 平安科技(深圳)有限公司 Translation method based on omnidirectional attention and related equipment thereof
CN113420869B (en) * 2021-06-30 2024-03-15 平安科技(深圳)有限公司 Translation method based on omnidirectional attention and related equipment thereof
CN113674732A (en) * 2021-08-16 2021-11-19 北京百度网讯科技有限公司 Voice confidence detection method and device, electronic equipment and storage medium
CN113763937A (en) * 2021-10-27 2021-12-07 北京百度网讯科技有限公司 Method, device and equipment for generating voice processing model and storage medium
KR102640315B1 (en) * 2022-10-28 2024-02-22 숭실대학교산학협력단 Deep learning-based voice phishing detection apparatus and method therefor
CN116450771A (en) * 2022-12-16 2023-07-18 镁佳(北京)科技有限公司 Multilingual speech translation model construction method and device
CN117275461A (en) * 2023-11-23 2023-12-22 上海蜜度科技股份有限公司 Multitasking audio processing method, system, storage medium and electronic equipment
CN117275461B (en) * 2023-11-23 2024-03-15 上海蜜度科技股份有限公司 Multitasking audio processing method, system, storage medium and electronic equipment

Also Published As

Publication number Publication date
CN109785824B (en) 2021-04-06

Similar Documents

Publication Publication Date Title
CN109785824A (en) A kind of training method and device of voiced translation model
KR102392094B1 (en) Sequence processing using convolutional neural networks
EP3477633A1 (en) Systems and methods for robust speech recognition using generative adversarial networks
CN109065032B (en) External corpus speech recognition method based on deep convolutional neural network
CN108153913B (en) Training method of reply information generation model, reply information generation method and device
CN107729324A (en) Interpretation method and equipment based on parallel processing
CN104598611B (en) The method and system being ranked up to search entry
CN106875940B (en) Machine self-learning construction knowledge graph training method based on neural network
CN107408111A (en) End-to-end speech recognition
CN108780464A (en) Method and system for handling input inquiry
CN108268441A (en) Sentence similarity computational methods and apparatus and system
CN106897265B (en) Word vector training method and device
CN109871542B (en) Text knowledge extraction method, device, equipment and storage medium
CN113641819B (en) Argumentation mining system and method based on multitasking sparse sharing learning
CN107871496A (en) Audio recognition method and device
CN108630198A (en) Method and apparatus for training acoustic model
Zhou et al. ICRC-HIT: A deep learning based comment sequence labeling system for answer selection challenge
CN111785303B (en) Model training method, imitation sound detection device, equipment and storage medium
CN114913938B (en) Small molecule generation method, equipment and medium based on pharmacophore model
CN108363685B (en) Self-media data text representation method based on recursive variation self-coding model
CN114360502A (en) Processing method of voice recognition model, voice recognition method and device
CN108229677B (en) Method and apparatus for performing recognition and training of a cyclic model using the cyclic model
CN109979461A (en) A kind of voice translation method and device
CN111653270A (en) Voice processing method and device, computer readable storage medium and electronic equipment
CN109933773A (en) A kind of multiple semantic sentence analysis system and method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant