CN114254659A - Translation method and device, computer readable storage medium and electronic device - Google Patents

Info

Publication number
CN114254659A
Authority
CN
China
Prior art keywords: information, translation, translated, text recognition, text
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010998594.5A
Other languages
Chinese (zh)
Inventor
卫林钰
张旭
陈伟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Sogou Technology Development Co Ltd
Original Assignee
Beijing Sogou Technology Development Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Sogou Technology Development Co Ltd filed Critical Beijing Sogou Technology Development Co Ltd
Priority to CN202010998594.5A priority Critical patent/CN114254659A/en
Publication of CN114254659A publication Critical patent/CN114254659A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/40 Processing or translation of natural language
    • G06F40/58 Use of machine translation, e.g. for multi-lingual retrieval, for server-side translation for client devices or for real-time translation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/40 Processing or translation of natural language
    • G06F40/42 Data-driven translation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology

Abstract

The invention provides a translation method and device, a computer readable storage medium, and an electronic device, and relates to the technical field of data processing. The translation method comprises the following steps: acquiring text recognition information corresponding to information to be translated in a source language, wherein the text recognition information comprises a confusion network; and inputting the text recognition information into a translation model to generate a translation result in a target language corresponding to the information to be translated. Because text recognition information comprising a confusion network effectively enriches the text recognition result of the information to be translated, the translation method provided by the invention can improve translation accuracy.

Description

Translation method and device, computer readable storage medium and electronic device
Technical Field
The invention relates to the technical field of data processing, and in particular to a translation method and device, a computer readable storage medium, and an electronic device.
Background
In recent years, with the accelerated development of globalization, translation needs have been emerging in ever greater numbers. Accordingly, translation techniques have attracted increasing interest, particularly speech translation techniques that translate speech in one language (the source language) into text or speech in another language (the target language).
However, existing translation technologies suffer from poor translation accuracy and robustness. Even neural-network-based translation technologies often produce unsatisfactory results because of limitations in training data and model structure.
Disclosure of Invention
The present invention has been made to solve the above-mentioned problems. The embodiment of the invention provides a translation method and device, a computer readable storage medium and electronic equipment.
In a first aspect, an embodiment of the present invention provides a translation method, where the method includes: acquiring text recognition information corresponding to information to be translated in a source language, wherein the text recognition information comprises a confusion network; and inputting the text recognition information into a translation model to generate a translation result in the target language corresponding to the information to be translated.
In an embodiment of the present invention, the text recognition information includes a plurality of candidate text recognition results and their corresponding weight information, where the candidate text recognition results correspond to candidate paths on the confusion network.
In an embodiment of the present invention, each candidate text recognition result in the plurality of candidate text recognition results includes a plurality of candidate text recognition units, and the weight information includes probability information corresponding to each of the plurality of candidate text recognition units.
In an embodiment of the invention, the translation model is obtained by training through a confusion network.
In an embodiment of the present invention, when the information to be translated is speech information to be translated, before the text recognition information is input into the translation model to generate a translation result in the target language corresponding to the information to be translated, the method further includes: extracting acoustic feature information based on the speech information to be translated. Correspondingly, inputting the text recognition information into the translation model to generate the translation result in the target language includes: inputting the text recognition information and the acoustic feature information into the translation model to generate the translation result in the target language corresponding to the information to be translated.
In an embodiment of the present invention, before the translation model is obtained through training on the confusion network, the method further includes: generating text embedding information based on candidate text recognition units in the confusion network; generating word-lattice-based position embedding information based on candidate paths in the confusion network; and constructing and training the translation model based on the text embedding information, the word-lattice-based position embedding information, and probability information corresponding to the candidate text recognition units in the confusion network.
In a second aspect, an embodiment of the present invention provides a translation apparatus, including: an acquisition module, configured to acquire text recognition information corresponding to information to be translated in a source language, where the text recognition information comprises a confusion network; and a translation module, configured to input the text recognition information into a translation model to generate a translation result in the target language corresponding to the information to be translated.
In an embodiment of the present invention, when the information to be translated is speech information to be translated, the translation apparatus further includes a speech synthesis module, configured to synthesize a translation result of the target language into the speech information of the target language.
In a third aspect, an embodiment of the present invention provides a computer-readable storage medium, which stores instructions that, when executed by a processor of an electronic device, enable the electronic device to perform the translation method mentioned in any one of the above embodiments.
In a fourth aspect, an embodiment of the present invention provides an electronic device, including: a processor and a memory for storing the computer-executable instructions. The processor is used for executing the computer-executable instructions to implement the translation method mentioned in any one of the above embodiments.
Because text recognition information comprising a confusion network effectively enriches the text recognition result of the information to be translated, the translation method provided by the embodiment of the present invention improves translation accuracy relative to the prior art.
Drawings
Fig. 1 is a schematic view of an application scenario of a translation method according to an embodiment of the present invention.
Fig. 2 is a schematic flow chart of a translation method according to an embodiment of the present invention.
Fig. 3 is a schematic flow chart of a translation method according to another embodiment of the present invention.
Fig. 4 is a schematic diagram illustrating an actual application of the translation method according to an embodiment of the present invention.
Fig. 5 is a flowchart illustrating a translation method according to another embodiment of the present invention.
Fig. 6 is a flowchart illustrating a method for training a network model according to an embodiment of the present invention.
Fig. 7 is a flowchart illustrating a network model training method according to another embodiment of the present invention.
Fig. 8 is a flowchart illustrating a network model training method according to another embodiment of the present invention.
Fig. 9 is a flowchart illustrating a network model training method according to yet another embodiment of the present invention.
Fig. 10 is a schematic structural diagram of a translation neural network according to an embodiment of the present invention.
Fig. 11 is a schematic structural diagram of a translation apparatus according to an embodiment of the present invention.
Fig. 12 is a schematic structural diagram of a training apparatus for a network model according to an embodiment of the present invention.
Fig. 13 is a schematic structural diagram of an apparatus for a translation method according to an embodiment of the present invention.
Fig. 14 is a schematic structural diagram of a server according to an embodiment of the present invention.
Detailed Description
Hereinafter, example embodiments according to the present invention will be described in detail with reference to the accompanying drawings. It is to be understood that the described embodiments are merely a subset of the embodiments of the invention, not all of them, and that the invention is not limited to the example embodiments described herein.
The technical scheme provided by the invention can be applied to intelligent terminals (such as tablet computers, mobile phones and the like) so that the intelligent terminals have a translation function. Illustratively, the technical scheme provided by the invention can be applied to translation scenes of a single target language, such as scenes of simultaneous interpretation, audio and video subtitle making, voice input method translation and the like. In addition, the technical scheme provided by the invention can also be applied to a multi-target language translation scene, for example, a virtual anchor scene which needs to generate audio information of a plurality of target languages based on the audio information of the source language.
The application scenario of the translation method is briefly described below with reference to fig. 1.
Fig. 1 is a schematic view of an application scenario of a translation method according to an embodiment of the present invention. The scenario shown in fig. 1 includes a server 110 and a client 120 communicatively coupled to the server 110. Specifically, the server 110 may load a translation model that is generated based on the training method of the network model according to the embodiment of the present invention and supports the translation task.
In an actual application process, the client 120 receives voice information (i.e., information to be translated in a source language) from a user and sends it to the server 110. The server 110 computes a translation result in the target language for the received voice information by using the loaded translation model supporting the translation task, and returns the computed translation result to the client 120. The client 120 then presents the received translation result in the target language to the user.
Illustratively, the translation model mentioned above is a deep learning based neural network model.
The following briefly describes the translation method of the present invention with reference to fig. 2 to 5.
Fig. 2 is a schematic flow chart of a translation method according to an embodiment of the present invention. As shown in fig. 2, the translation method provided by the embodiment of the present invention includes the following steps.
Step 210, obtaining text recognition information corresponding to the information to be translated in the source language.
Illustratively, the text recognition information is text information obtained by recognizing the information to be translated. For example, if the information to be translated is speech information, the text recognition information is obtained by recognizing that speech. For another example, if the information to be translated is picture information containing text, the text recognition information is obtained by recognizing the text in that picture.
In one embodiment of the invention, the text recognition information includes a confusion network. Specifically, the text recognition information includes a plurality of candidate text recognition results corresponding to the information to be translated and weight information corresponding to each of the candidate text recognition results, where the candidate text recognition results correspond to a plurality of candidate paths on the confusion network. Illustratively, the weight information is determined based on information such as context and/or specific context associated with the information to be translated.
In an embodiment of the present invention, a candidate text recognition result is a complete recognition result corresponding to the information to be translated. For example, if the information to be translated is the spoken utterance "I love China" (in the original Chinese, "wo ai zhong guo"), the candidate text recognition results may include "I love China" together with acoustically similar alternative transcriptions.
In another embodiment of the present invention, each candidate text recognition result includes a plurality of candidate text recognition units, and the weight information includes probability information corresponding to each of the candidate text recognition units. Any candidate text recognition unit has a corresponding acoustic position. For example, for the spoken utterance "wo ai zhong guo" ("I love China"), the acoustic positions include a first, second, third, and fourth acoustic position, and the candidate text recognition units at each position are characters that sound alike: at the first position, homophones of "wo" (e.g., the characters for "I" and "nest"); at the second, homophones of "ai"; at the third, homophones of "zhong" (e.g., the characters for "middle" and "clock"); and at the fourth, homophones of "guo" (e.g., the characters for "country" and "pot").
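To make the relationship between acoustic positions, candidate text recognition units, and candidate paths concrete, the following Python sketch models a four-position confusion network and enumerates its candidate paths. It is an illustrative aid only, not part of the claimed method; all unit names and probability values are assumptions.

    from itertools import product

    # A confusion network modeled as a list of acoustic positions, each
    # holding candidate text recognition units and their probabilities.
    # The placeholder unit names stand in for the homophone candidates
    # described above.
    confusion_network = [
        {"wo_1": 0.7, "wo_2": 0.2, "wo_3": 0.1},           # first acoustic position
        {"ai_1": 0.6, "ai_2": 0.3, "ai_3": 0.1},           # second acoustic position
        {"zhong_1": 0.8, "zhong_2": 0.1, "zhong_3": 0.1},  # third acoustic position
        {"guo_1": 0.75, "guo_2": 0.15, "guo_3": 0.1},      # fourth acoustic position
    ]

    def candidate_paths(network):
        """Enumerate candidate text recognition results (paths) with weights."""
        for units in product(*(position.items() for position in network)):
            tokens = [unit for unit, _ in units]
            weight = 1.0
            for _, probability in units:
                weight *= probability
            yield tokens, weight

    # The highest-weight path is the 1-best recognition result; the other
    # paths are the enriched alternatives the translation model can exploit.
    best_tokens, best_weight = max(candidate_paths(confusion_network),
                                   key=lambda path: path[1])
    print(best_tokens, best_weight)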
Step 220, inputting the text recognition information into the translation model to generate a translation result of the target language corresponding to the information to be translated.
Illustratively, the translation model is a deep learning based neural network model. It should be understood that the translation model is trained by using training data corresponding to the above-mentioned text recognition information, so that the trained translation model has a function of determining a translation result of a target language corresponding to information to be translated based on the text recognition information.
It should be noted that the source language and the target language are languages of different languages. For example, the source language is Chinese and the target language is English. For another example, the source language is Chinese and the target language is French.
In the actual application process, firstly, the text recognition information corresponding to the information to be translated of the source language is obtained, and then the text recognition information is input into the translation model to generate the translation result of the target language corresponding to the information to be translated.
Because text recognition information comprising a confusion network effectively enriches the text recognition result of the information to be translated, the translation method provided by the embodiment of the present invention improves translation accuracy over existing translation methods by exploiting this enriched recognition result. In addition, compared with a word-graph network, a confusion network effectively reduces redundant information, thereby also improving translation speed.
Fig. 3 is a schematic flow chart of a translation method according to another embodiment of the present invention. The embodiment shown in fig. 3 of the present invention is extended from the embodiment shown in fig. 2 of the present invention, and the differences between the embodiment shown in fig. 3 and the embodiment shown in fig. 2 will be emphasized below, and the descriptions of the same parts will not be repeated.
As shown in fig. 3, in the translation method provided in the embodiment of the present invention, the step of obtaining the text recognition information corresponding to the information to be translated in the source language includes the following steps.
Step 215, inputting the information to be translated into the recognition model to generate the confusion network.
Illustratively, the confusion network is a Directed Acyclic Graph (DAG) that includes weight information. The plurality of candidate text recognition results mentioned above correspond to a plurality of candidate paths on the confusion network. Each of the plurality of candidate text recognition results includes a plurality of candidate text recognition units, and the weight information includes probability information corresponding to each of the plurality of candidate text recognition units.
Illustratively, the recognition model is a deep learning based neural network model. It should be understood that the recognition model is obtained by training with training data corresponding to the above-mentioned information to be translated, so that the trained recognition model has a function of determining a confusion network corresponding to the information to be translated based on the information to be translated.
In addition, in the translation method provided by the embodiment of the present invention, the step of inputting the text recognition information into the translation model to generate the translation result of the target language corresponding to the information to be translated includes the following steps.
Step 225, inputting the confusion network into the translation model to generate a translation result of the target language corresponding to the information to be translated.
In the practical application process, firstly, the information to be translated is input into the recognition model to generate a confusion network corresponding to the information to be translated, and then the confusion network is input into the translation model to generate a translation result of a target language corresponding to the information to be translated.
The translation method provided by the embodiment of the invention further enriches the effective information content contained in the recognition result by means of the confusion network, thereby further improving the translation accuracy and the translation quality.
An application scenario of the translation method mentioned in the embodiment shown in fig. 3 is illustrated in conjunction with fig. 4.
Fig. 4 is a schematic diagram illustrating an actual application of the translation method according to an embodiment of the present invention. Specifically, in the embodiment of the present invention, the translation method mentioned in the embodiment shown in fig. 3 is implemented based on the translation apparatus shown in fig. 4.
As shown in fig. 4, the translation apparatus provided in the embodiment of the present invention includes an automatic speech recognition (ASR) module 410, a neural network translation (NMT) module 420 communicatively connected to the speech recognition module 410, and a speech synthesis (Text-To-Speech, TTS) module 430 communicatively connected to the neural network translation module 420.
The speech recognition module 410 is used to convert the speech information to be translated in the source language into corresponding text recognition information, i.e., into the confusion network 421. The speech recognition module 410 performs a function equivalent to the obtaining module 1110 mentioned in the embodiment shown in fig. 11. The neural network translation module 420 is configured to generate text embedded information, position embedded information based on word lattice (also referred to as word graph), and probability information based on the confusion network 421 converted by the speech recognition module 410, then input the information into the neural network translation encoder 422 based on word lattice to perform encoding operation, and input a corresponding encoding result into the neural network translation decoder 423 based on word lattice to perform decoding operation, so as to finally obtain text information of a target language. The speech synthesis module 430 is used for converting the text information of the target language into the speech information of the target language.
Illustratively, the word-lattice-based neural network translation encoder 422 performs the encoding operation mainly by multiplying the word-lattice-based position embedding information by the probability information. For example, in the actual application process, the absolute position information of each candidate text recognition unit in a candidate text recognition result (i.e., the word-lattice-based position embedding information) is determined from the candidate path information of the confusion network, and the absolute position information of each candidate text recognition unit is then multiplied by the corresponding probability information to determine the actual embedding position of that unit.
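As a rough illustration of the multiply-then-embed operation described above, the following sketch scales a standard sinusoidal position embedding by each unit's probability before adding it to the unit's text embedding. The dimensionality, the sinusoidal formula, and the use of NumPy are assumptions for illustration; the patent does not specify them.

    import numpy as np

    def sinusoidal_embedding(position, d_model=8):
        # Standard Transformer-style sinusoidal position embedding.
        i = np.arange(d_model)
        angle = position / np.power(10000.0, (2 * (i // 2)) / d_model)
        return np.where(i % 2 == 0, np.sin(angle), np.cos(angle))

    # Each candidate text recognition unit carries a text embedding, an
    # absolute position along its candidate path, and a probability taken
    # from the confusion network (all values here are placeholders).
    units = [
        {"text_emb": np.random.randn(8), "position": 0, "prob": 0.7},
        {"text_emb": np.random.randn(8), "position": 1, "prob": 0.6},
    ]

    # Encoder input per unit: text embedding plus the probability-weighted
    # word-lattice position embedding.
    encoder_inputs = np.stack([
        u["text_emb"] + u["prob"] * sinusoidal_embedding(u["position"])
        for u in units
    ])
    print(encoder_inputs.shape)  # (2, 8)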
It should be noted that the above-mentioned steps of generating the text embedding information, the word-lattice-based position embedding information, and the probability information based on the confusion network 421 converted by the speech recognition module 410 may be performed by the neural network translation module 420, or may be performed by a separate front-end module communicatively connected to the neural network translation module 420, the front-end module being used to generate the text embedding information, the word-lattice-based position embedding information, and the probability information based on the confusion network 421.
In addition, it should be noted that, in the confusion network 421, X0 to X9 represent candidate text recognition units, and correspondingly, P0 to P9 represent the probability information of the corresponding candidate text recognition units.
In practical application, the speech information to be translated in the source language is input into the speech recognition module 410 to generate the confusion network 421, then the generated confusion network 421 is input into the neural network translation module 420 to generate the text information in the target language, and finally the generated text information in the target language is input into the speech synthesis module 430 to generate the speech information in the target language.
It should be noted that the translation apparatus may not include the aforementioned speech synthesis module 430. Correspondingly, the translation device directly outputs the text information of the target language corresponding to the to-be-translated voice information of the source language.
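The overall flow of the apparatus in fig. 4 can be summarized as a simple function composition. In the following sketch, asr, nmt, and tts are placeholders standing in for modules 410, 420, and 430; the function signature is an assumption for illustration, not part of the patent.

    def translate_speech(audio, asr, nmt, tts=None):
        """Illustrative end-to-end flow of the translation apparatus."""
        confusion_network = asr(audio)        # module 410: speech -> confusion network
        target_text = nmt(confusion_network)  # module 420: confusion network -> text
        if tts is None:
            # Apparatus without the speech synthesis module 430:
            # output target-language text directly.
            return target_text
        return tts(target_text)               # module 430: text -> target speech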
Fig. 5 is a flowchart illustrating a translation method according to another embodiment of the present invention. The embodiment shown in fig. 5 of the present invention is extended from the embodiment shown in fig. 2 of the present invention, and the differences between the embodiment shown in fig. 5 and the embodiment shown in fig. 2 will be emphasized below, and the descriptions of the same parts will not be repeated.
As shown in fig. 5, in the translation method provided in the embodiment of the present invention, the information to be translated is speech information to be translated.
Optionally, in another embodiment, before the text recognition information is input into the translation model to generate a translation result of the target language corresponding to the information to be translated, the method may further include the following step.
Step 216, extracting acoustic feature information based on the speech information to be translated.
In an embodiment of the present invention, the above-mentioned acoustic feature information includes at least one of linear prediction coefficients (LPC), cepstral coefficients (CEP), Mel-frequency cepstral coefficients (MFCC), and perceptual linear prediction (PLP) features. Illustratively, the acoustic feature information is determined by a related network model. For example, the acoustic feature information may be the output of a certain layer of the recognition model mentioned in the following embodiments.
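As one concrete (and merely illustrative) way to obtain MFCC features of this kind outside the recognition model, a library such as librosa could be used; the library choice, file path, sampling rate, and feature dimension below are all assumptions, not specified by the patent.

    import librosa

    # "speech.wav" is a placeholder path for the speech information to be
    # translated; 16 kHz is a common sampling rate for ASR front ends.
    y, sr = librosa.load("speech.wav", sr=16000)

    # 13-dimensional Mel-frequency cepstral coefficients, one column per frame.
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)
    print(mfcc.shape)  # (13, number_of_frames)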
In addition, in the translation method provided by the embodiment of the present invention, the step of inputting the text recognition information into the translation model to generate the translation result of the target language corresponding to the information to be translated includes the following steps.
Step 226, inputting the text recognition information and the acoustic feature information into the translation model to generate a translation result of the target language corresponding to the information to be translated.
It should be understood that the translation model mentioned in step 226 is obtained by training with training data corresponding to the above-mentioned text recognition information and acoustic feature information, so that the trained translation model has a function of determining a translation result of the target language corresponding to the information to be translated based on the text recognition information and the acoustic feature information.
In the practical application process, text recognition information corresponding to the information to be translated in the source language is first obtained, acoustic feature information is then extracted based on the speech information to be translated included in the information to be translated, and the text recognition information and the acoustic feature information are then input into the translation model to generate a translation result in the target language corresponding to the information to be translated.
According to the translation method provided by the embodiment of the present invention, the text recognition information and the acoustic feature information are both input into the translation model to generate the translation result in the target language, so that the acoustic feature information increases the amount of reference information available to the trained translation model, further improving its translation accuracy.
In an embodiment of the present invention, the translation model mentioned above is obtained through training on a confusion network. Before the translation model is obtained based on the confusion network training, the translation method further includes: generating text embedding information based on candidate text recognition units in the confusion network; generating word-lattice-based position embedding information based on candidate paths in the confusion network; and constructing and training the translation model based on the text embedding information, the word-lattice-based position embedding information, and probability information corresponding to the candidate text recognition units in the confusion network.
It should be noted that, for the text embedding information, the word-lattice-based position embedding information, and the probability information corresponding to the candidate text recognition units mentioned in this embodiment, reference may be made to the embodiment shown in fig. 4; details are not repeated here.
In the practical application process, firstly, information to be translated is input into a recognition model to generate a confusion network corresponding to the information to be translated, then, text embedded information is generated based on candidate text recognition units in the confusion network, word lattice-based position embedded information is generated based on candidate paths in the confusion network, and a translation model is constructed and trained based on the text embedded information, the word lattice-based position embedded information and probability information corresponding to the candidate text recognition units in the confusion network.
According to the embodiment of the invention, the matching degree of the translation model and the confusion network is improved by constructing and training the translation model based on the text embedded information, the position embedded information based on the word lattice and the probability information, so that the translation accuracy of the translation method is improved.
The above describes the embodiment of the translation method of the present invention in detail with reference to fig. 2 to 5, and the following describes the embodiment of the training method of the network model of the present invention in detail with reference to fig. 6 to 9. It is to be understood that the training method of the network model mentioned below is for training the translation model in the above-described embodiment, in other words, in some cases, the steps of the training method of the network model mentioned below may be incorporated into the translation method mentioned in the above-described embodiment. Furthermore, it should be understood that the description of the embodiment of the translation method corresponds to the description of the embodiment of the training method of the network model, and therefore, the parts not described in detail can be referred to the previous embodiment of the translation method.
Fig. 6 is a flowchart illustrating a method for training a network model according to an embodiment of the present invention. As shown in fig. 6, the method for training a network model according to the embodiment of the present invention includes the following steps.
Step 610, obtaining text recognition information corresponding to the sample information to be translated in the source language.
Illustratively, the text recognition information is text information obtained by recognizing the sample information to be translated. For example, if the sample information to be translated is speech information, the text recognition information is obtained by recognizing that speech. For another example, if the sample information to be translated is picture information containing text, the text recognition information is obtained by recognizing the text in that picture.
In one embodiment of the invention, the text recognition information includes a confusion network. Specifically, the text recognition information includes a plurality of candidate text recognition results corresponding to the sample information to be translated and weight information corresponding to each of the candidate text recognition results, where the candidate text recognition results correspond to a plurality of candidate paths on the confusion network. Illustratively, the weight information is determined based on information such as context and/or specific context related to the sample information to be translated.
It should be understood that the sample information to be translated mentioned in step 610 corresponds to the information to be translated mentioned in the above embodiments; for example, the information type of the sample information to be translated is consistent with that of the information to be translated. On this basis, the trained translation model obtained in the embodiment of the present invention can be applied to the translation method mentioned in the above embodiments.
Step 620, training a translation model based on the text recognition information.
In the practical application process, firstly, the text recognition information corresponding to the to-be-translated sample information of the source language is obtained, and then the translation model is trained based on the text recognition information.
Because the text recognition information comprising the confusion network can effectively enrich the text recognition result of the sample information to be translated, compared with the prior art, the embodiment of the invention can improve the robustness and the translation accuracy of the translation model obtained by training.
Fig. 7 is a flowchart illustrating a network model training method according to another embodiment of the present invention. The embodiment shown in fig. 7 of the present invention is extended from the embodiment shown in fig. 6 of the present invention, and the differences between the embodiment shown in fig. 7 and the embodiment shown in fig. 6 will be emphasized below, and the descriptions of the same parts will not be repeated.
As shown in fig. 7, in the training method of a network model provided in the embodiment of the present invention, the step of obtaining text recognition information corresponding to sample information to be translated in a source language includes the following steps.
Step 615, inputting the sample information to be translated into the recognition model to generate a confusion network corresponding to the sample information to be translated.
Illustratively, the generation of the recognition model includes: training a model based on the sample information to be translated to generate the recognition model for determining the confusion network corresponding to the sample information to be translated. The recognition model is a deep-learning-based neural network model.
In addition, in the training method of the network model provided by the embodiment of the present invention, the step of training the translation model based on the text recognition information includes the following steps.
Step 625, train the translation model based on the confusion network.
In the practical application process, the information of the sample to be translated is firstly input into the recognition model to generate a confusion network corresponding to the information of the sample to be translated, and then the translation model is trained based on the confusion network.
The training method of the network model provided by the embodiment of the present invention further improves, by means of the confusion network, the translation accuracy of the trained translation model. In addition, compared with a word-graph network, the confusion network effectively reduces redundant information, thereby also improving the translation speed of the trained translation model.
Fig. 8 is a flowchart illustrating a network model training method according to another embodiment of the present invention. The embodiment shown in fig. 8 of the present invention is extended from the embodiment shown in fig. 7 of the present invention, and the differences between the embodiment shown in fig. 8 and the embodiment shown in fig. 7 will be emphasized below, and the descriptions of the same parts will not be repeated.
As shown in fig. 8, in the training method of a network model provided in the embodiment of the present invention, before training a translation model based on a confusion network, the following steps are further included.
At step 616, text embedding information is generated based on the candidate text recognition units in the confusion network.
Step 617, generate word lattice based position embedded information based on the candidate paths in the confusion network.
And step 618, constructing and training a translation model based on the text embedding information, the word lattice-based position embedding information and probability information corresponding to candidate text recognition units in the confusion network.
It should be noted that, for the text embedding information, the word-lattice-based position embedding information, and the probability information corresponding to the candidate text recognition units mentioned in this embodiment, reference may be made to the embodiment shown in fig. 4; details are not repeated here.
In the practical application process, firstly, the information of the sample to be translated is input into a recognition model to generate a confusion network corresponding to the information of the sample to be translated, then, text embedded information is generated based on candidate text recognition units in the confusion network, word lattice-based position embedded information is generated based on candidate paths in the confusion network, and a translation model is constructed and trained based on the text embedded information, the word lattice-based position embedded information and probability information corresponding to the candidate text recognition units in the confusion network.
According to the embodiment of the invention, the matching degree of the translation model and the confusion network is improved by constructing and training the translation model based on the text embedded information, the position embedded information based on the word lattice and the probability information, so that the translation accuracy of the translation model generated by training is improved.
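A minimal training sketch under stated assumptions: the toy model below combines text embeddings with probability-weighted lattice position embeddings and runs one gradient step. The architecture (a GRU encoder with a linear output layer), all dimensions, and the random stand-in tensors are illustrative assumptions; they are not the patent's actual encoder/decoder.

    import torch
    import torch.nn as nn

    class LatticeTranslationModel(nn.Module):
        """Toy translation model over confusion-network inputs (illustrative)."""

        def __init__(self, src_vocab, tgt_vocab, d_model=64):
            super().__init__()
            self.text_emb = nn.Embedding(src_vocab, d_model)  # text embedding info
            self.encoder = nn.GRU(d_model, d_model, batch_first=True)
            self.out = nn.Linear(d_model, tgt_vocab)

        def forward(self, unit_ids, pos_emb, probs):
            # Text embedding plus probability-weighted lattice position embedding.
            x = self.text_emb(unit_ids) + probs.unsqueeze(-1) * pos_emb
            h, _ = self.encoder(x)
            return self.out(h)

    model = LatticeTranslationModel(src_vocab=1000, tgt_vocab=1000)
    loss_fn = nn.CrossEntropyLoss()
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

    # One toy step on random tensors standing in for a training confusion network.
    unit_ids = torch.randint(0, 1000, (2, 4))  # candidate text recognition units
    pos_emb = torch.randn(2, 4, 64)            # word-lattice position embeddings
    probs = torch.rand(2, 4)                   # unit probability information
    targets = torch.randint(0, 1000, (2, 4))   # target-language token ids

    logits = model(unit_ids, pos_emb, probs)
    loss = loss_fn(logits.reshape(-1, 1000), targets.reshape(-1))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()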
Fig. 9 is a flowchart illustrating a network model training method according to yet another embodiment of the present invention. The embodiment shown in fig. 9 of the present invention is extended from the embodiment shown in fig. 6 of the present invention, and the differences between the embodiment shown in fig. 9 and the embodiment shown in fig. 6 will be emphasized below, and the descriptions of the same parts will not be repeated.
As shown in fig. 9, in the training method of a network model according to the embodiment of the present invention, the sample information to be translated is speech sample information to be translated.
Optionally, in an embodiment of the present invention, before training the translation model based on the text recognition information, the training method for the network model may further include the following steps.
Step 619, extracting acoustic feature information based on the voice sample information to be translated.
In addition, in the training method of the network model provided by the embodiment of the present invention, the step of training the translation model based on the text recognition information includes the following steps.
At step 626, the translation model is trained based on the text recognition information and the acoustic feature information.
In the practical application process, firstly, text recognition information corresponding to-be-translated sample information of a source language is obtained, acoustic feature information is extracted based on to-be-translated voice sample information, and then a translation model is trained based on the text recognition information and the acoustic feature information.
According to the embodiment of the invention, the information quantity of the training data is further enriched by means of the acoustic characteristic information, and the robustness and the translation accuracy of the translation model generated by training are further improved.
The method embodiment of the present invention is described in detail above with reference to fig. 2 to 9, and the model structure embodiment and the apparatus embodiment of the present invention are described in detail below with reference to fig. 10 to 14. It is to be understood that the training means of the network model mentioned below is for training the translation model, in other words, in some cases, the training means of the network model mentioned below may be incorporated into the translation means mentioned below. Furthermore, it is to be understood that the description of the method embodiments corresponds to the description of the model structure embodiments and the device embodiments, and therefore reference may be made to the preceding method embodiments for parts that are not described in detail.
Fig. 10 is a schematic structural diagram of a translation neural network according to an embodiment of the present invention. As shown in fig. 10, a neural network 1000 for translation provided by an embodiment of the present invention includes a translation model 1020 and a recognition model 1010 communicatively coupled to the translation model 1020.
The translation model 1020 is configured to generate a translation result in the target language corresponding to the information to be translated based on the text recognition information corresponding to the information to be translated in the source language, wherein the text recognition information includes a confusion network. The recognition model 1010 is used for generating the confusion network corresponding to the information to be translated based on the information to be translated.
According to the translation neural network provided by the embodiment of the present invention, the rich text recognition information produced by the recognition model is passed to the translation model by cascading the recognition model and the translation model, so that the translation model translates with the help of this text recognition information, thereby improving translation accuracy.
It should be noted that the recognition model 1010 mentioned in the above embodiments is optional. When the translation neural network 1000 does not include the recognition model 1010, the confusion network corresponding to the information to be translated may be generated by other related structures.
Fig. 11 is a schematic structural diagram of a translation apparatus according to an embodiment of the present invention. As shown in fig. 11, the translation apparatus 1100 according to the embodiment of the present invention includes an obtaining module 1110 and a translation module 1120. The obtaining module 1110 is configured to obtain text recognition information corresponding to information to be translated in a source language, where the text recognition information includes a confusion network. The translation module 1120 is configured to input the text recognition information into the translation model to generate a translation result in the target language corresponding to the information to be translated.
Optionally, the obtaining module 1110 is further configured to input information to be translated into the recognition model to generate the confusion network. The translation module 1120 is further configured to input the confusion network into the translation model to generate a translation result of the target language corresponding to the information to be translated.
Optionally, the obtaining module 1110 is further configured to extract acoustic feature information based on the speech information to be translated. The translation module 1120 is further configured to input the text recognition information and the acoustic feature information into a translation model to generate a translation result of a target language corresponding to the information to be translated.
Fig. 12 is a schematic structural diagram of a training apparatus for a network model according to an embodiment of the present invention. As shown in fig. 12, the training apparatus 1200 for the network model provided in the embodiment of the present invention includes an obtaining module 1210 and a training module 1220. The obtaining module 1210 is configured to obtain text recognition information corresponding to sample information to be translated in a source language. The training module 1220 is used to train the translation model based on the text recognition information.
Optionally, the obtaining module 1210 is further configured to input the sample information to be translated into the recognition model to generate a confusion network corresponding to the sample information to be translated. The training module 1220 is also configured to train the translation model based on the confusion network.
Optionally, the training module 1220 is further configured to generate text embedding information based on candidate text recognition units in the confusion network; generating word lattice based position embedding information based on candidate paths in the confusion network; and constructing and training a translation model based on the text embedding information, the word lattice-based position embedding information and probability information corresponding to candidate text recognition units in the confusion network.
Optionally, the obtaining module 1210 is further configured to extract acoustic feature information based on the to-be-translated speech sample information. The training module 1220 is further configured to train the translation model based on the text recognition information and the acoustic feature information.
Fig. 13 is a block diagram illustrating an apparatus 1300 for a translation method in accordance with an example embodiment. For example, apparatus 1300 may be a robot, mobile phone, computer, digital broadcast terminal, messaging device, game console, tablet device, medical device, fitness device, personal digital assistant, and the like.
Referring to fig. 13, the apparatus 1300 may include one or more of the following components: a processing component 1302, a memory 1304, a power component 1306, a multimedia component 1308, an audio component 1310, an input/output (I/O) interface 1312, a sensor component 1314, and a communication component 1316.
The processing component 1302 generally controls overall operation of the device 1300, such as operations associated with display, telephone calls, data communications, camera operations, and recording operations. The processing element 1302 may include one or more processors 1320 to execute instructions to perform all or part of the steps of the method described above. Further, the processing component 1302 can include one or more modules that facilitate interaction between the processing component 1302 and other components. For example, the processing component 1302 may include a multimedia module to facilitate interaction between the multimedia component 1308 and the processing component 1302.
The memory 1304 is configured to store various types of data to support operations at the apparatus 1300. Examples of such data include instructions for any application or method operating on the device 1300, information to be translated, pictures, videos, and so forth. The memory 1304 may be implemented by any type of volatile or non-volatile memory device or a combination thereof, such as static random access memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, or magnetic or optical disks.
Power supply component 1306 provides power to the various components of device 1300. Power components 1306 may include a power management system, one or more power supplies, and other components associated with generating, managing, and distributing power for device 1300.
The multimedia component 1308 includes a screen that provides an output interface between the device 1300 and the user. In some embodiments, the screen may include a Liquid Crystal Display (LCD) and a Touch Panel (TP). If the screen includes a touch panel, the screen may be implemented as a touch screen to receive input signals from the user. The touch panel includes one or more touch sensors to sense touches, slides, and gestures on the touch panel. The touch sensors may not only sense the boundary of a touch or slide action, but also detect the duration and pressure associated with the touch or slide operation. In some embodiments, the multimedia component 1308 includes a front-facing camera and/or a rear-facing camera. The front camera and/or the rear camera may receive external multimedia data when the apparatus 1300 is in an operation mode, such as a photographing mode or a video mode. Each front camera and rear camera may be a fixed optical lens system or have focal length and optical zoom capability.
The audio component 1310 is configured to output and/or input audio signals. For example, the audio component 1310 includes a Microphone (MIC) configured to receive external audio signals when the apparatus 1300 is in an operating mode, such as a call mode, a recording mode, and a voice recognition mode. The received audio signals may further be stored in the memory 1304 or transmitted via the communication component 1316. In some embodiments, the audio component 1310 also includes a speaker for outputting audio signals.
The I/O interface 1312 provides an interface between the processing component 1302 and peripheral interface modules, which may be keyboards, click wheels, buttons, etc. These buttons may include, but are not limited to: a home button, a volume button, a start button, and a lock button.
The sensor assembly 1314 includes one or more sensors for providing various aspects of state assessment for the device 1300. For example, the sensor assembly 1314 may detect the open/closed state of the device 1300 and the relative positioning of components, such as the display and keypad of the device 1300. The sensor assembly 1314 may also detect a change in the position of the device 1300 or of a component of the device 1300, the presence or absence of user contact with the device 1300, the orientation or acceleration/deceleration of the device 1300, and a change in the temperature of the device 1300. The sensor assembly 1314 may include a proximity sensor configured to detect the presence of a nearby object without any physical contact. The sensor assembly 1314 may also include a light sensor, such as a CMOS or CCD image sensor, for use in imaging applications. In some embodiments, the sensor assembly 1314 may also include an acceleration sensor, a gyroscope sensor, a magnetic sensor, a pressure sensor, or a temperature sensor.
The communication component 1316 is configured to facilitate wired or wireless communication between the apparatus 1300 and other devices. The apparatus 1300 may access a wireless network based on a communication standard, such as WiFi, 2G, or 3G, or a combination thereof. In an exemplary embodiment, the communication component 1316 receives broadcast signals or broadcast-related information from an external broadcast management system via a broadcast channel. In an exemplary embodiment, the communication component 1316 also includes a near field communication (NFC) module to facilitate short-range communication. For example, the NFC module may be implemented based on radio frequency identification (RFID) technology, infrared data association (IrDA) technology, ultra-wideband (UWB) technology, Bluetooth (BT) technology, and other technologies.
In an exemplary embodiment, the apparatus 1300 may be implemented by one or more Application Specific Integrated Circuits (ASICs), Digital Signal Processors (DSPs), Digital Signal Processing Devices (DSPDs), Programmable Logic Devices (PLDs), Field Programmable Gate Arrays (FPGAs), controllers, micro-controllers, microprocessors or other electronic components for performing the above-described methods.
In an exemplary embodiment, a non-transitory computer readable storage medium comprising instructions, such as the memory 1304 comprising instructions, executable by the processor 1320 of the apparatus 1300 to perform the method described above is also provided. For example, the non-transitory computer readable storage medium may be a ROM, a Random Access Memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, and the like.
Fig. 14 is a schematic structural diagram of a server in an embodiment of the present invention. The server 1400 may vary widely in configuration or performance, and may include one or more central processing units (CPUs) 1422 (e.g., one or more processors), memory 1432, and one or more storage media 1430 (e.g., one or more mass storage devices) storing applications 1442 or data 1444. The memory 1432 and the storage medium 1430 may be transient or persistent storage. The program stored on the storage medium 1430 may include one or more modules (not shown), each of which may include a series of instruction operations on the server. Still further, the central processing unit 1422 may be configured to communicate with the storage medium 1430 and to execute, on the server 1400, the series of instruction operations stored in the storage medium 1430.
The server 1400 may also include one or more power supplies 1424, one or more wired or wireless network interfaces 1450, one or more input/output interfaces 1458, one or more keyboards 1454, and/or one or more operating systems 1441, such as Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™, and the like.
Those of ordinary skill in the art will understand that all or part of the steps for implementing the above method embodiments may be completed by hardware under the control of program instructions. The program may be stored in a computer readable storage medium and, when executed, performs the steps of the method embodiments. The aforementioned storage medium may be at least one of the following media capable of storing program code: a read-only memory (ROM), a RAM, a magnetic disk, or an optical disk.
It should be noted that, in the present specification, all the embodiments are described in a progressive manner, and the same and similar parts among the embodiments may be referred to each other, and each embodiment focuses on the differences from the other embodiments. In particular, for the apparatus and system embodiments, since they are substantially similar to the method embodiments, they are described in a relatively simple manner, and reference may be made to some of the descriptions of the method embodiments for related points. The above-described embodiments of the apparatus and system are merely illustrative, and the units described as separate parts may or may not be physically separate, and the parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment. One of ordinary skill in the art can understand and implement it without inventive effort.
The above description presents only particular embodiments of the present invention, but the protection scope of the present invention is not limited thereto; any changes or substitutions that can be readily conceived by those skilled in the art within the technical scope disclosed by the present invention shall fall within the protection scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (10)

1. A method of translation, comprising:
acquiring text recognition information corresponding to information to be translated in a source language, wherein the text recognition information comprises a confusion network;
and inputting the text recognition information into a translation model to generate a translation result in a target language corresponding to the information to be translated.
2. The translation method according to claim 1, wherein the text recognition information comprises: a plurality of candidate text recognition results and weight information respectively corresponding thereto, wherein the plurality of candidate text recognition results correspond to a plurality of candidate paths on the confusion network.
3. The translation method according to claim 2, wherein each of the candidate text recognition results comprises a plurality of candidate text recognition units, and the weight information comprises probability information corresponding to each of the candidate text recognition units.
4. The translation method according to claim 1, wherein the translation model is trained based on a confusion network.
5. The translation method according to any one of claims 1 to 4, wherein, when the information to be translated is speech information to be translated, before the inputting of the text recognition information into a translation model to generate a translation result in a target language corresponding to the information to be translated, the method further comprises:
extracting acoustic feature information based on the speech information to be translated;
and wherein the inputting of the text recognition information into a translation model to generate a translation result in a target language corresponding to the information to be translated comprises:
inputting the text recognition information and the acoustic feature information into the translation model to generate a translation result in a target language corresponding to the information to be translated.
6. The translation method according to claim 4, further comprising, before training the translation model based on a confusion network:
generating text embedding information based on candidate text recognition units in the confusion network;
generating word-lattice-based position embedding information based on candidate paths in the confusion network;
and constructing and training the translation model based on the text embedding information, the word-lattice-based position embedding information, and probability information corresponding to candidate text recognition units in the confusion network.
7. A translation apparatus, comprising:
an acquisition module, configured to acquire text recognition information corresponding to information to be translated in a source language, wherein the text recognition information comprises a confusion network;
and a translation module, configured to input the text recognition information into a translation model to generate a translation result in a target language corresponding to the information to be translated.
8. The translation apparatus according to claim 7, wherein, when the information to be translated is speech information to be translated, the translation apparatus further comprises a speech synthesis module configured to synthesize the translation result in the target language into speech information in the target language.
9. A computer-readable storage medium, characterized in that the storage medium stores instructions that, when executed by a processor of an electronic device, enable the electronic device to perform the translation method of any one of claims 1 to 6.
10. An electronic device, characterized in that the electronic device comprises:
a processor;
a memory for storing computer-executable instructions;
wherein the processor is configured to execute the computer-executable instructions to implement the translation method of any one of claims 1 to 6.
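
To make the data flow of claims 1 to 6 concrete, the following is a minimal, hypothetical sketch of a confusion network and of how its contents could be flattened into the three inputs named in claim 6: token ids for the text embedding, word-lattice-based positions, and per-unit probability information. All names here (`Arc`, `ConfusionNetwork`, `build_model_inputs`) and the toy vocabulary are illustrative assumptions, not the implementation disclosed in this patent.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

@dataclass
class Arc:
    """One candidate text recognition unit and its probability (claim 3)."""
    token: str
    prob: float

@dataclass
class ConfusionNetwork:
    """A confusion network as a sequence of slots; each slot holds the
    competing candidate units for one position. Every path that picks one
    arc per slot is a candidate text recognition result (claim 2)."""
    slots: List[List[Arc]]

    def best_path(self) -> List[str]:
        # The 1-best recognition result: highest-probability arc per slot.
        return [max(slot, key=lambda a: a.prob).token for slot in self.slots]

def build_model_inputs(
    cn: ConfusionNetwork, vocab: Dict[str, int]
) -> Tuple[List[int], List[int], List[float]]:
    """Flatten the network into the three inputs of claim 6: token ids
    (for text embedding), word-lattice-based positions (all arcs in the
    same slot share one position), and per-arc probability information."""
    token_ids, positions, probs = [], [], []
    for pos, slot in enumerate(cn.slots):
        for arc in slot:
            token_ids.append(vocab.get(arc.token, 0))  # 0 = <unk>
            positions.append(pos)
            probs.append(arc.prob)
    return token_ids, positions, probs

# Toy example: a Chinese ASR result whose middle unit is ambiguous.
cn = ConfusionNetwork(slots=[
    [Arc("我", 1.0)],
    [Arc("想", 0.6), Arc("像", 0.4)],
    [Arc("回家", 0.9), Arc("会家", 0.1)],
])
vocab = {"我": 1, "想": 2, "像": 3, "回家": 4, "会家": 5}
print(cn.best_path())                 # ['我', '想', '回家']
print(build_model_inputs(cn, vocab))  # ([1, 2, 3, 4, 5], [0, 1, 1, 2, 2], [1.0, 0.6, 0.4, 0.9, 0.1])
```

In a full system, each of the three sequences would typically be embedded and combined before being fed to the translation model; sharing one lattice position among competing arcs in a slot is what lets the model treat them as alternatives rather than as a longer sentence.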
CN202010998594.5A 2020-09-21 2020-09-21 Translation method and device, computer readable storage medium and electronic device Pending CN114254659A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010998594.5A CN114254659A (en) 2020-09-21 2020-09-21 Translation method and device, computer readable storage medium and electronic device

Publications (1)

Publication Number Publication Date
CN114254659A 2022-03-29

Family

ID=80788365

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination