CN108804427B - Voice machine translation method and device - Google Patents


Info

Publication number
CN108804427B
Authority
CN
China
Prior art keywords
corpus
translated
word
vector
language
Prior art date
Legal status
Active
Application number
CN201810598964.9A
Other languages
Chinese (zh)
Other versions
CN108804427A (en)
Inventor
吴严忠
Current Assignee
Shenzhen Yijia Intelligent Technology Co ltd
Original Assignee
Shenzhen Yijia Intelligent Technology Co ltd
Priority date
Filing date
Publication date
Application filed by Shenzhen Yijia Intelligent Technology Co ltd
Priority to CN201810598964.9A
Publication of CN108804427A
Application granted
Publication of CN108804427B

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/40 Processing or translation of natural language
    • G06F40/58 Use of machine translation, e.g. for multi-lingual retrieval, for server-side translation for client devices or for real-time translation
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/26 Speech to text systems


Abstract

The invention provides a speech machine translation method and device, relating to the technical field of deep-learning-based data processing. The method comprises the following steps: collecting voice information and converting the voice information into a corpus to be translated; inputting the corpus to be translated into a trained translation model; converting the corpus to be translated into an intermediate corpus vector; and converting the intermediate corpus vector into a target corpus corresponding to a preset language, wherein the preset language is different from the language of the corpus to be translated. In this scheme, collected voice information is converted into a corpus to be translated, the corpus is then converted into an intermediate corpus vector, and the intermediate corpus vector is converted into a target corpus in the preset language. On one hand, speech can thus be translated directly; on the other hand, building translation models among multiple languages is simplified, which reduces the complexity of the system and the computing resources it consumes during translation.

Description

Voice machine translation method and device
Technical Field
The invention relates to the technical field of data processing based on deep learning, and in particular to a speech machine translation method and device.
Background
In the field of machine language translation, combining deep learning with Natural Language Processing (NLP) has become the common means of realizing machine translation, evolving from early methods based entirely on human-written rules to today's Neural Machine Translation (NMT). Existing NMT technology, however, suffers from high training complexity and poor interpretability. For example, in the prior art only a corpus in text form can be translated, and a given translation model can only translate between two fixed languages; if other languages must be handled, a separate translation model has to be established for each language pair, which increases both the complexity of the system and its consumption of computing resources.
Disclosure of Invention
In order to overcome the defects in the prior art, the invention provides a speech machine translation method and a speech machine translation device.
In order to achieve the above object, the technical solutions provided by the preferred embodiments of the present invention are as follows:
the preferred embodiment of the present invention provides a speech machine translation method, which comprises:
collecting voice information, and converting the voice information into a corpus to be translated;
inputting the corpus to be translated into a trained translation model;
converting the corpus to be translated into an intermediate corpus vector;
and converting the intermediate corpus vector into a target corpus corresponding to a preset language, wherein the preset language is different from the language corresponding to the corpus to be translated.
Optionally, before the step of acquiring the voice information, the method includes:
acquiring a training corpus, which comprises a plurality of training corpora;
for each training corpus, converting each character and/or word in the training corpus into a word vector, wherein each word vector is associated in advance with a character or word of at least one preset language;
and training a preset translation model by using the training corpus and adopting a deep learning algorithm to obtain the trained translation model.
Optionally, the step of converting the corpus to be translated into an intermediate corpus vector includes:
and converting the characters and/or words in the corpus to be translated into corresponding word vectors to be translated, and combining the word vectors to be translated to obtain the intermediate corpus vector.
Optionally, the step of converting the intermediate corpus vector into a target corpus corresponding to a preset language includes:
matching the word vector to be translated with the word vector in the trained translation model to obtain the similarity between the word vector to be translated and the word vector;
taking the character or word in the preset language associated with the word vector of maximum similarity in the training corpus as the character or word corresponding to the word vector to be translated in the preset language;
and arranging and combining the characters or words corresponding to each word vector to be translated to obtain the target corpus.
Optionally, the step of combining the characters or words corresponding to each word vector to be translated to obtain the target corpus includes:
and sorting and combining the characters or words corresponding to each word vector to be translated according to the grammar of the preset language to obtain the target corpus.
Optionally, after the step of converting the intermediate corpus vector into the target corpus corresponding to the preset language, the method further includes:
and playing the target corpus in a voice mode.
An embodiment of the present invention further provides a speech machine translation apparatus, where the speech machine translation apparatus includes:
the acquisition and conversion unit is used for collecting voice information and converting the voice information into a corpus to be translated;
the input unit is used for inputting the corpus to be translated into the trained translation model;
the first conversion unit is used for converting the corpus to be translated into an intermediate corpus vector;
and a second conversion unit, configured to convert the intermediate corpus vector into a target corpus corresponding to a preset language, where the preset language is different from the language corresponding to the corpus to be translated.
Optionally, the speech machine translation device further includes a third conversion unit and a model training unit. Before the acquisition and conversion unit obtains the corpus to be translated, the input unit is further configured to obtain a training corpus, which comprises a plurality of training corpora;
the third conversion unit is configured to convert, for each training corpus, each character and/or word in the training corpus into a word vector, where each word vector is associated in advance with a character or word of at least one preset language;
and the model training unit is used for training a preset translation model by using the training corpus and adopting a deep learning algorithm to obtain the trained translation model.
Optionally, the first conversion unit is further configured to:
and converting the characters and/or words in the corpus to be translated into corresponding word vectors to be translated, and combining the word vectors to be translated to obtain the intermediate corpus vector.
Optionally, the second conversion unit is further configured to:
matching the word vector to be translated with the word vector in the trained translation model to obtain the similarity between the word vector to be translated and the word vector;
taking the character or word in the preset language associated with the word vector of maximum similarity in the training corpus as the character or word corresponding to the word vector to be translated in the preset language;
and arranging and combining the characters or words corresponding to each word vector to be translated to obtain the target corpus.
Compared with the prior art, in the speech machine translation method and device provided by the invention, collected voice information is converted into a corpus to be translated, the corpus to be translated is converted into an intermediate corpus vector, and the intermediate corpus vector is converted into a target corpus in a preset language. As a result, speech can be translated directly, building translation models among multiple languages is simplified, the complexity of the system is reduced, and the computing resources consumed during translation are reduced.
In order to make the aforementioned and other objects, features and advantages of the present invention comprehensible, preferred embodiments accompanied with figures are described in detail below.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings needed in the embodiments are briefly described below. It should be appreciated that the following drawings depict only some embodiments of the invention and are therefore not to be considered limiting of its scope; those skilled in the art can derive other related drawings from these drawings without creative effort.
Fig. 1 is a block diagram of an electronic device according to an embodiment of the present invention.
Fig. 2 is a flowchart of a speech machine translation method according to an embodiment of the present invention.
Fig. 3 is a schematic diagram of language translation provided in this embodiment.
Fig. 4 is a second flowchart of the speech machine translation method according to the embodiment of the present invention.
Fig. 5 is a block diagram of a speech machine translation apparatus according to an embodiment of the present invention.
Icon: 10-an electronic device; 11-a processing unit; 12-a storage unit; 100-speech machine translation means; 110-an acquisition conversion unit; 120-an input unit; 130-a first conversion unit; 140-second conversion unit.
Detailed Description
The technical solution in the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention. It is to be understood that the described embodiments are merely a few embodiments of the invention, and not all embodiments. The components of embodiments of the present invention generally described and illustrated in the figures herein may be arranged and designed in a wide variety of different configurations.
Thus, the following detailed description of the embodiments of the present invention, presented in the figures, is not intended to limit the scope of the invention, as claimed, but is merely representative of selected embodiments of the invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments of the present invention without making any creative effort, shall fall within the protection scope of the present invention.
It should be noted that: like reference numbers and letters refer to like items in the following figures, and thus, once an item is defined in one figure, it need not be further defined and explained in subsequent figures. Furthermore, the terms "first," "second," and the like are used merely to distinguish one description from another, and are not to be construed as indicating or implying relative importance.
Some embodiments of the invention are described in detail below with reference to the accompanying drawings. The embodiments described below and the features of the embodiments can be combined with each other without conflict.
Fig. 1 is a block diagram of an electronic device 10 according to an embodiment of the present invention. The electronic device 10 provided by the embodiment of the present invention may be used to execute the steps of the machine translation method. For example, the electronic device 10 may be used to translate documents in the Chinese language to documents in the English language.
In the present embodiment, the electronic device 10 may be, but is not limited to, a smart phone, a Personal Computer (PC), a tablet PC, a Personal Digital Assistant (PDA), a Mobile Internet Device (MID), and the like.
In this embodiment, the electronic device 10 may include a processing unit 11, a storage unit 12, and a speech machine translation apparatus 100, and the respective elements of the processing unit 11, the storage unit 12, and the speech machine translation apparatus 100 are directly or indirectly electrically connected to implement data transmission or interaction. For example, the components may be electrically connected to each other via one or more communication buses or signal lines.
The processing unit 11 may be a processor. For example, the Processor may be a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), a Network Processor (NP), or the like; but may also be a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, discrete gate or transistor logic device, discrete hardware components. The various methods, steps and logic blocks disclosed in the embodiments of the present invention may be implemented or performed.
The memory unit 12 may be, but is not limited to, a random access memory, a read only memory, a programmable read only memory, an erasable programmable read only memory, an electrically erasable programmable read only memory, and the like. In this embodiment, the storage unit 12 may be used to store a translation model. Of course, the storage unit 12 may also be used for storing a program, which the processing unit 11 executes upon receiving an execution instruction.
Alternatively, the electronic device 10 may include a communication unit configured to establish a communication connection between the electronic device 10 and a server via a network and to transmit and receive data via the network. The network may be, but is not limited to, a wired network or a wireless network. The server may be configured to store a translation model, receive the corpus to be translated from the electronic device 10, and translate the corpus to be translated into the target corpus to be output to the electronic device 10.
Further, the speech machine translation apparatus 100 includes at least one software functional module which can be stored in the storage unit 12 in the form of software or firmware (firmware) or solidified in an Operating System (OS) of the electronic device 10. The processing unit 11 is used for executing executable modules stored in the storage unit 12, such as software functional modules and computer programs included in the speech machine translation apparatus 100.
It is understood that the configuration shown in fig. 1 is only a schematic configuration of the electronic device 10, and that the electronic device 10 may further include more components than those shown in fig. 1. The components shown in fig. 1 may be implemented in hardware, software, or a combination thereof.
Fig. 2 is a schematic flowchart of a speech machine translation method according to an embodiment of the present invention. The method can be applied to the electronic device 10 described above, and the electronic device 10 executes each of its steps. The electronic device 10 may store a preset translation model in advance.
Each step of the machine translation method shown in fig. 2 is described in detail below. In this embodiment, the machine translation method may include the following steps:
step S210, collecting voice information, and converting the voice information into linguistic data to be translated;
Understandably, the voice information may be speech uttered by the user holding the electronic device 10, or speech uttered by other users.
The voice information may be converted into the corpus to be translated as follows: before this step, a correspondence between sound-spectrum features and characters/words is established; then, by recognizing the sound-spectrum features in the voice information, the corpus composed of the characters and words corresponding to the recognized features is taken as the corpus to be translated. The corpus to be translated may be text information.
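The correspondence-based recognition described above can be sketched as follows. This is only an illustrative toy under stated assumptions, not the patent's implementation: real spectral analysis (e.g. MFCC extraction) is stubbed out, and `FEATURE_TABLE` and `extract_features` are hypothetical names.

```python
# Toy sketch: a pre-built correspondence between sound-spectrum feature
# signatures and words, applied to recognized features from speech.
FEATURE_TABLE = {
    ("f1", "f2"): "hello",  # spectrum-feature signature -> word
    ("f3",): "world",
}

def extract_features(audio_frames):
    """Stand-in for real spectral analysis of the voice information."""
    return [("f1", "f2"), ("f3",)]

def speech_to_corpus(audio_frames):
    """Convert voice information into the corpus to be translated."""
    words = [FEATURE_TABLE[sig] for sig in extract_features(audio_frames)
             if sig in FEATURE_TABLE]
    return " ".join(words)
```

With the stubbed features above, `speech_to_corpus([])` yields the text corpus `"hello world"`.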
Step S220, inputting the linguistic data to be translated into the trained translation model;
In this embodiment, the corpus to be translated has a corresponding language classification. The language may be determined from the text; for example, the corpus to be translated may be in Chinese, English, Russian, and so on, which is not limited here.
In this embodiment, the trained translation model can be understood as: the preset translation model stored in the electronic device 10 is a translation model obtained after training through a deep learning algorithm.
Step S230, converting the linguistic data to be translated into intermediate linguistic data vectors;
In this embodiment, the corpus to be translated can be converted into an intermediate corpus vector through the translation model. The intermediate corpus vector can be understood as a combination of the word vectors corresponding to the characters and words in the corpus to be translated, and a word vector can be understood as the meaning of a character or word of the corpus to be translated in the standard language. The standard language may be set according to actual conditions; for example, it may be Chinese or a separately constructed new language, and is not specifically limited here.
Alternatively, step S230 may include: and converting the characters and/or words in the linguistic data to be translated into corresponding word vectors to be translated, and combining the word vectors to be translated to obtain an intermediate linguistic data vector.
Understandably, in step S230 the characters and words in the corpus to be translated may be decomposed to obtain a mapping between each character or word and its meaning in the standard language; this mapping can be understood as a word vector to be translated. The word vectors to be translated are then combined according to the meaning expressed by the corpus to be translated to obtain the intermediate corpus vector; that is, the intermediate corpus vector represents the meaning of the corpus to be translated in the standard language. In step S230, the characters in the corpus to be translated, the words, or both may be converted into corresponding word vectors to be translated, which are then combined into the intermediate corpus vector.
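A minimal sketch of step S230, assuming a hypothetical embedding table (`EMBEDDINGS`) whose vectors anchor each word's meaning in the standard (pivot) language. Combining by stacking the word vectors in order is only one possible way to form the intermediate corpus vector.

```python
import numpy as np

# Hypothetical embedding table: each word of the source corpus maps to a
# word vector whose meaning is anchored in the standard language.
EMBEDDINGS = {
    "did": np.array([0.1, 0.0]),
    "you": np.array([0.0, 1.0]),
    "have": np.array([0.5, 0.5]),
    "lunch": np.array([0.9, 0.1]),
}

def to_intermediate_vector(corpus):
    """Decompose the corpus into words, look up each word vector to be
    translated, and combine them (stacked in order) into the
    intermediate corpus vector."""
    tokens = corpus.lower().rstrip("?").split()
    return np.stack([EMBEDDINGS[t] for t in tokens])
```

For the running example "Did you have lunch?", this produces a 4-by-2 stack of word vectors, one row per word.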
Step S240, converting the intermediate corpus vector into a target corpus corresponding to a preset language, wherein the preset language is different from the language corresponding to the corpus to be translated.
In this embodiment, the preset language is the language of the target corpus. For example, if English needs to be translated into Chinese, Chinese is the preset language. The preset language can be set according to actual conditions, for example Chinese, English, Japanese, or Korean, and it is different from the language of the corpus to be translated.
Alternatively, step S240 may include: matching each word vector to be translated with the word vectors in the trained translation model to obtain their similarities; taking the character or word in the preset language associated with the word vector of maximum similarity in the training corpus as the character or word corresponding to the word vector to be translated in the preset language; and arranging and combining the characters or words corresponding to each word vector to be translated to obtain the target corpus. Selecting the word vector with the maximum similarity helps improve translation accuracy; in addition, during word combination, the words can be combined according to the meaning of the corpus to be translated in the standard language.
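The maximum-similarity matching can be sketched with cosine similarity. The patent does not specify a similarity measure, so cosine similarity is an assumption here, and the target vocabulary and vectors are made up for illustration.

```python
import numpy as np

# Illustrative trained-model side: word vectors, each associated with a
# character/word of the preset (target) language.
TARGET_VOCAB = {
    "had-lunch": np.array([0.9, 0.1]),
    "you": np.array([0.0, 1.0]),
}

def most_similar_word(vec_to_translate):
    """Pick the target word whose trained vector has maximum cosine
    similarity with the word vector to be translated."""
    def cos(a, b):
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))
    return max(TARGET_VOCAB, key=lambda w: cos(vec_to_translate, TARGET_VOCAB[w]))
```

A vector close to `[1.0, 0.0]` would match "had-lunch" rather than "you", since its cosine similarity with that entry is near 1 while it is orthogonal to the other.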
Optionally, the step of combining the characters or words corresponding to each word vector to be translated to obtain the target corpus may include: sorting and combining the characters or words corresponding to each word vector to be translated according to preset rules of the preset language to obtain the target corpus. For example, if English is converted into Chinese, the Chinese words corresponding to the English words can be combined according to Chinese grammar.
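A toy stand-in for the grammar-based sorting: after each word vector is translated, the words are reordered per the preset language's grammar. A fixed subject-verb-object to subject-object-verb rule plays the role of a real grammar rule here; it is a hypothetical simplification, not the patent's method.

```python
# Reorder translated words for a (hypothetical) SOV target language.
def reorder_svo_to_sov(words):
    """Given [subject, verb, object], return [subject, object, verb]."""
    subject, verb, obj = words
    return [subject, obj, verb]
```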
Optionally, after step S240, the speech machine translation method may further include: and playing the target corpus in a voice mode.
Understandably, playing the translated target corpus as speech frees the user's eyes while the target corpus is being taken in, thereby improving the user experience.
In order to facilitate understanding of the present embodiment, the following will describe, by way of example, the steps S230 and S240 of the present embodiment:
Please refer to fig. 3, which is a schematic diagram of language translation provided in this embodiment; the language on the left may be understood as the language of the corpus to be translated, and the language on the right as the preset language. For example, suppose the corpus to be translated is "Did you have lunch?", the standard language is Chinese, and the corpus needs to be translated into a target corpus in Chinese (i.e., the preset language). Then the word in the standard language corresponding to "you" in the corpus to be translated is "you"/"your", and the phrase in the standard language corresponding to the meaning of "Did … have lunch?" is "have (you) eaten lunch?"; thus the standard-language sentence corresponding to "Did you have lunch?", namely "Have you had lunch?", is also the target corpus in Chinese. If the corpus instead needs to be translated into Korean, the Korean corpus can be obtained from the pre-trained correspondence between the intermediate corpus vector and Korean words (the Korean sentence is rendered only as an image in the original document).
Based on this, in a scenario where multiple languages are translated into one another, when the translation model is constructed only the correspondence between each corpus to be translated and the intermediate corpus vector, and between the intermediate corpus vector and each preset language, needs to be trained. Compared with the prior art, in which a model must be trained between the language of the corpus to be translated and every preset language, the method provided by the invention simplifies the steps of model training, requires less training data, and helps reduce the memory occupation of the electronic device 10 and the computing resources of the operating system.
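The saving can be made concrete with a quick count: with direct pairwise translation, every ordered language pair needs its own model, whereas with a shared intermediate representation each language only needs a mapping into and out of the pivot.

```python
# Model count for n mutually translating languages.
def pairwise_models(n):
    """One model per ordered language pair (prior-art approach)."""
    return n * (n - 1)

def pivot_models(n):
    """One mapping into and one out of the intermediate representation
    per language (the scheme described here)."""
    return 2 * n
```

For ten languages this is 90 pairwise models versus 20 pivot mappings, which is the source of the reduced training data and system complexity.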
Fig. 4 is a second flowchart of a machine translation method according to an embodiment of the present invention. In this embodiment, before step S220, the machine translation method may further include a step of training a preset translation model, for example, the machine translation method may further include step S250, step S260, and step S270.
Step S250, a training corpus is obtained, which includes a plurality of training corpora.
In this embodiment, the training corpus may include characters and words of multiple languages; that is, the training corpus may include corpora corresponding to a plurality of languages, and each language includes a plurality of corpora serving as training corpora. The number of training corpora and the number of languages may be set according to the actual situation and are not specifically limited.
Step S260, converting each word and/or word in the corpus into a word vector for each corpus, wherein each word vector is associated with a word or word corresponding to at least one type of preset language in advance.
Understandably, if a training corpus is a complete sentence, step S260 decomposes the complete sentence into characters and words; the meanings of these characters and words are then mapped to characters and words of corresponding meaning in the standard language to form word vectors. Each word vector in the standard language is associated with a character or word of at least one preset language. For example, corresponding weights may be set for the characters and words of a preset language based on word frequency, and those with higher frequency may preferentially be used as the character or word associated with a word vector.
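The word-frequency preference mentioned above can be sketched as follows; the function and variable names are hypothetical, and a simple frequency count stands in for trained weights.

```python
from collections import Counter

# Among several preset-language candidates sharing a word vector's
# meaning, prefer the candidate seen most often in the training corpus.
def pick_associated_word(candidates, training_tokens):
    freq = Counter(training_tokens)
    return max(candidates, key=lambda w: freq[w])
```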
In this embodiment, for each training corpus, each character and/or word in the corpus is converted into a word vector; that is, the conversion may be performed at the character level, at the word level, or at both.
And step S270, training a preset translation model by using a training corpus and adopting a deep learning algorithm to obtain a trained translation model.
In this embodiment, the preset translation model is trained to obtain the correspondence between the training corpus and the word vectors, and between the word vectors and the characters or words of each preset language. Based on these trained correspondences, when a corpus to be translated is input into the trained translation model, the model can analyze it to obtain the word vectors to be translated that make up the intermediate corpus vector, obtain the corresponding characters or words from the intermediate corpus vector, and finally combine the obtained characters or words into the target corpus.
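The two trained correspondences can be pictured as lookup tables; the toy tables below are stand-ins (not the patent's actual model), where a deep learning algorithm would instead fit these correspondences from the training corpus.

```python
# Toy correspondences: source words -> word vectors, and word vectors ->
# preset-language words. All entries are illustrative.
SOURCE_TO_VECTOR = {"lunch": (0.9, 0.1), "you": (0.0, 1.0)}
VECTOR_TO_TARGET = {(0.9, 0.1): "午饭", (0.0, 1.0): "你"}

def translate_word(word):
    """Look up the source word's vector, then the target word for it."""
    return VECTOR_TO_TARGET[SOURCE_TO_VECTOR[word]]
```

Under these toy tables, `translate_word("lunch")` returns the Chinese word associated with the vector `(0.9, 0.1)`.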
Alternatively, the deep learning algorithm may be, but is not limited to, a convolutional neural network algorithm, a recurrent neural network algorithm, a deep neural network algorithm, and the like, which is not specifically limited here.
Based on the design, in the situation of mutual translation among multiple languages, the scheme can reduce the training data of the translation model, simplify the steps of constructing the translation model, and is beneficial to reducing the memory occupation amount of the trained model in the electronic equipment 10, thereby being beneficial to reducing the consumption of system resources.
Fig. 5 is a block diagram of a speech machine translation apparatus 100 according to an embodiment of the present invention. The speech machine translation apparatus 100 can be applied to the electronic device 10 described above, and is configured to execute the steps of the machine translation method described above. In this embodiment, the speech machine translation apparatus 100 may include an acquisition conversion unit 110, an input unit 120, a first conversion unit 130, and a second conversion unit 140.
The collecting and converting unit 110 is configured to collect voice information and convert the voice information into a corpus to be translated. In this embodiment, the acquisition conversion unit 110 may be configured to execute step S210 shown in fig. 2, and the detailed description of step S210 may be referred to for specific executed operation content.
The input unit 120 is configured to input the corpus to be translated into the trained translation model. In the present embodiment, the input unit 120 may be configured to execute step S220 shown in fig. 2, and the detailed description of step S220 may be referred to for specific operation content.
The first converting unit 130 is configured to convert the corpus to be translated into an intermediate corpus vector. In the present embodiment, the first conversion unit 130 may be configured to execute step S230 shown in fig. 2, and the detailed description of step S230 may be referred to for specific operation content.
The second converting unit 140 is configured to convert the intermediate corpus vector into a target corpus corresponding to a preset language, where the preset language is different from the language corresponding to the corpus to be translated. In the present embodiment, the second conversion unit 140 may be configured to execute step S240 shown in fig. 2, and the detailed description of step S240 may be referred to for specific operation content.
Optionally, the speech machine translation apparatus 100 further includes a third conversion unit and a model training unit. Before the acquisition and conversion unit 110 obtains the corpus to be translated, the input unit 120 is further configured to obtain a training corpus, which comprises a plurality of training corpora. In the present embodiment, the input unit 120 may be configured to execute step S250 shown in fig. 4; for the specific operation, refer to the detailed description of step S250.
And the third conversion unit is used for converting each word and/or word in the training corpus into a word vector aiming at each training corpus, and each word vector is associated with a word or word corresponding to at least one type of preset language in advance. In this embodiment, the third converting unit may be configured to execute step S260 shown in fig. 4, and the detailed description of step S260 may be referred to for specific operation content.
And the model training unit is used for training the preset translation model by using the training corpus and adopting a deep learning algorithm to obtain the trained translation model. In this embodiment, the model training unit may be configured to execute step S270 shown in fig. 4, and the detailed operation content of the step S270 may be referred to.
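The training flow described above (steps S250 to S270) can be sketched as follows. This is a minimal illustrative sketch, not the patented implementation: the `embed` function is a deterministic stand-in for word vectors that a real embodiment would learn with a deep learning algorithm, and the parallel word pairs are invented examples.

```python
# Hypothetical sketch of steps S250-S270: build a "trained translation model"
# in which each training-corpus word is turned into a word vector, and each
# word vector is associated in advance with its counterpart in a preset
# language. A real embodiment would learn the vectors via deep learning;
# here a deterministic toy embedding stands in for the learned one.

def embed(token: str, dim: int = 4) -> list:
    """Toy word vector: a deterministic hash of the token's characters."""
    h = sum((i + 1) * ord(c) for i, c in enumerate(token))
    return [((h >> (3 * k)) % 16) / 16.0 for k in range(dim)]

def train_model(parallel_pairs):
    """parallel_pairs: [(source_word, preset_language_word), ...].
    Returns the 'trained model': a list of (word_vector, target_word)."""
    return [(embed(src), tgt) for src, tgt in parallel_pairs]

# Invented two-entry training corpus (pinyin source, English target).
model = train_model([("ni hao", "hello"), ("shi jie", "world")])
```

In an actual embodiment the vector for each character or word would be fitted so that words with similar meanings across languages land near each other, which is what makes the similarity matching of the second conversion unit possible.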
Optionally, the first conversion unit 130 is further configured to convert the characters and/or words in the corpus to be translated into corresponding word vectors to be translated, and to combine the word vectors to be translated to obtain the intermediate corpus vector.
Optionally, the second conversion unit 140 is further configured to: match each word vector to be translated against the word vectors in the trained translation model to obtain the similarity between the word vector to be translated and each model word vector; take the character or word that is associated, in the preset language, with the model word vector having the maximum similarity as the character or word corresponding to the word vector to be translated in the preset language; and arrange and combine the characters or words corresponding to each word vector to be translated to obtain the target corpus.
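The matching step of the second conversion unit 140 can be sketched with cosine similarity. The patent does not name a specific similarity measure, so cosine is an assumption here, and the two-dimensional vectors and English target words are invented for illustration:

```python
import math

def cosine(u, v):
    """Cosine similarity; used here as an assumed similarity measure."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def match(vec, model):
    """Return the preset-language word associated with the model word
    vector that has the maximum similarity to `vec`."""
    return max(model, key=lambda entry: cosine(vec, entry[0]))[1]

# Invented toy model: (word_vector, associated preset-language word).
model = [([1.0, 0.0], "hello"), ([0.0, 1.0], "world")]
vectors_to_translate = [[0.9, 0.1], [0.2, 0.8]]
target_corpus = " ".join(match(v, model) for v in vectors_to_translate)
# -> "hello world"
```

Each word vector to be translated is compared against every model vector, and the word associated with the highest-scoring model vector is emitted; the matched words are then combined into the target corpus.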
Optionally, the second conversion unit 140 is further configured to order and combine the characters or words corresponding to each word vector to be translated according to a preset rule of the preset language to obtain the target corpus.
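The ordering step can be sketched as below. The patent only specifies that words are ordered per a preset rule of the preset language; the grammatical-role tags and the subject-object-verb (SOV) ordering rule are invented for illustration:

```python
# Hypothetical sketch of ordering matched words by a preset rule of the
# target language, e.g. rearranging subject-verb-object (SVO) output into
# the subject-object-verb (SOV) order of a hypothetical target grammar.

def reorder(tagged_words, target_order):
    """tagged_words: [(grammatical_role, word), ...];
    target_order: role sequence prescribed by the preset language."""
    by_role = dict(tagged_words)
    return [by_role[role] for role in target_order if role in by_role]

# Invented example: SVO-tagged words rearranged into SOV order.
tagged = [("S", "watashi"), ("V", "tabemasu"), ("O", "ringo")]
target_corpus = " ".join(reorder(tagged, ["S", "O", "V"]))
# -> "watashi ringo tabemasu"
```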
Optionally, the speech machine translation apparatus 100 further includes an audio playing unit. After the second conversion unit 140 converts the intermediate corpus vector into the target corpus corresponding to the preset language, the audio playing unit is configured to play the target corpus as speech.
In summary, the present invention provides a speech machine translation method and apparatus. The method includes: collecting voice information and converting it into a corpus to be translated; inputting the corpus to be translated into a trained translation model; converting the corpus to be translated into an intermediate corpus vector; and converting the intermediate corpus vector into a target corpus corresponding to a preset language, where the preset language is different from the language corresponding to the corpus to be translated. In this scheme, the collected voice information is converted into the corpus to be translated, the corpus to be translated is converted into the intermediate corpus vector, and the intermediate corpus vector is converted into the target corpus corresponding to the preset language. On one hand, speech can be translated directly; on the other hand, the construction of translation models among multiple languages is simplified, which reduces the complexity of the system and the computing resources the system consumes during translation.
In the embodiments provided in the present invention, it should be understood that the disclosed apparatus and method can be implemented in other ways. The apparatus and method embodiments described above are illustrative only, as the flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions. In addition, the functional modules in the embodiments of the present invention may be integrated together to form an independent part, or each module may exist separately, or two or more modules may be integrated to form an independent part.
Alternatively, the implementation may be, in whole or in part, in software, hardware, firmware, or any combination thereof. When implemented in software, it may take the form, in whole or in part, of a computer program product. The computer program product includes one or more computer instructions. When the computer instructions are loaded and executed on a computer, the processes or functions described in accordance with the embodiments of the invention are performed in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable device. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another, for example, from one website, computer, server, or data center to another website, computer, server, or data center via a wired (e.g., coaxial cable, optical fiber, Digital Subscriber Line (DSL)) or wireless (e.g., infrared, radio, microwave) connection. The computer-readable storage medium can be any available medium that can be accessed by a computer, or a data storage device, such as a server or data center, that incorporates one or more available media. The available medium may be a magnetic medium (e.g., floppy disk, hard disk, magnetic tape), an optical medium (e.g., DVD), or a semiconductor medium (e.g., a Solid State Disk (SSD)), among others.
The above description is only a preferred embodiment of the present invention and is not intended to limit the present invention, and various modifications and changes may be made by those skilled in the art. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (5)

1. A method for speech machine translation, the method comprising:
collecting voice information, and converting the voice information into a corpus to be translated;
inputting the corpus to be translated into a trained translation model;
converting the corpus to be translated into an intermediate corpus vector, wherein the intermediate corpus vector is a combination of word vectors corresponding to characters and words in the corpus to be translated, the word vectors represent meanings of the characters and words in the corpus to be translated in a standard language, and the standard language is different from the language corresponding to the corpus to be translated and a preset language;
converting the intermediate corpus vector into a target corpus corresponding to the preset language, wherein the preset language is different from the language corresponding to the corpus to be translated;
wherein, the step of converting the corpus to be translated into an intermediate corpus vector comprises:
converting characters and/or words in the corpus to be translated into corresponding word vectors to be translated, and arranging and combining the word vectors to be translated to obtain the intermediate corpus vector;
the step of converting the intermediate corpus vector into a target corpus corresponding to a preset language includes: matching the word vector to be translated with the word vector in the trained translation model to obtain the similarity between the word vector to be translated and the word vector in the trained translation model; taking the word or the word associated with the word vector in the trained translation model with the maximum similarity in the preset language as the corresponding word or the word of the word vector to be translated in the preset language; arranging and combining characters or words corresponding to each word vector to be translated to obtain the target corpus;
the step of obtaining the target corpus by arranging and combining the characters or words corresponding to each word vector to be translated includes: and sequencing and combining the characters or words corresponding to each word vector to be translated according to the grammar of the preset language to obtain the target language material.
2. The method of claim 1, wherein the step of collecting voice information is preceded by the method comprising:
acquiring a training corpus comprising a plurality of training corpora;
for each training corpus, converting each word and/or word in the training corpus into a word vector, wherein the word vector of each word and/or word is associated with a word or a word corresponding to at least one type of preset language in advance;
and training a preset translation model by using the training corpus and adopting a deep learning algorithm to obtain the trained translation model.
3. The method according to claim 1, wherein after the step of converting the intermediate corpus vector into the target corpus corresponding to the predetermined language, the method further comprises:
and playing the target corpus in a voice mode.
4. A speech machine translation apparatus, comprising:
an acquisition and conversion unit, configured to collect voice information and convert the voice information into a corpus to be translated;
an input unit, configured to input the corpus to be translated into a trained translation model;
a first conversion unit, configured to convert the corpus to be translated into an intermediate corpus vector, wherein the intermediate corpus vector is a combination of word vectors corresponding to characters and words in the corpus to be translated, the word vectors represent meanings, in a standard language, of the characters and words in the corpus to be translated, and the standard language is different from both the language corresponding to the corpus to be translated and a preset language;
a second conversion unit, configured to convert the intermediate corpus vector into a target corpus corresponding to the preset language, where the preset language is different from the language corresponding to the corpus to be translated;
the first conversion unit is further configured to: converting characters and/or words in the corpus to be translated into corresponding word vectors to be translated, and combining the word vectors to be translated to obtain the intermediate corpus vector; the second conversion unit is further configured to: matching the word vector to be translated with the word vector in the trained translation model to obtain the similarity between the word vector to be translated and the word vector; taking the word or word associated with the word vector with the maximum similarity in the preset language in the training corpus as the corresponding word or word of the word vector to be translated in the preset language; arranging and combining characters or words corresponding to each word vector to be translated to obtain the target corpus;
the second conversion unit is further configured to arrange and combine the words or phrases corresponding to each word vector to be translated according to the grammar of the preset language, so as to obtain the target corpus.
5. The speech machine translation device according to claim 4, further comprising a third conversion unit and a model training unit, wherein before the acquisition and conversion unit obtains the corpus to be translated, the acquisition and conversion unit is further configured to obtain a training corpus, which includes a plurality of training corpora;
the third conversion unit is configured to convert, for each corpus, each word and/or word in the corpus into a word vector, where each word vector is associated with a word or word corresponding to at least one type of the preset language in advance;
and the model training unit is used for training a preset translation model by using the training corpus and adopting a deep learning algorithm to obtain the trained translation model.
CN201810598964.9A 2018-06-12 2018-06-12 Voice machine translation method and device Active CN108804427B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810598964.9A CN108804427B (en) 2018-06-12 2018-06-12 Voice machine translation method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810598964.9A CN108804427B (en) 2018-06-12 2018-06-12 Voice machine translation method and device

Publications (2)

Publication Number Publication Date
CN108804427A CN108804427A (en) 2018-11-13
CN108804427B true CN108804427B (en) 2022-05-31

Family

ID=64085458

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810598964.9A Active CN108804427B (en) 2018-06-12 2018-06-12 Voice machine translation method and device

Country Status (1)

Country Link
CN (1) CN108804427B (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109635305B (en) * 2018-12-17 2022-07-12 北京百度网讯科技有限公司 Voice translation method and device, equipment and storage medium
CN109785824B (en) * 2019-03-15 2021-04-06 科大讯飞股份有限公司 Training method and device of voice translation model
CN112767918B (en) * 2020-12-30 2023-12-01 中国人民解放军战略支援部队信息工程大学 Russian Chinese language translation method, russian Chinese language translation device and storage medium
CN113129925B (en) * 2021-04-20 2023-08-04 深圳追一科技有限公司 VC model-based mouth motion driving model training method and component
CN113241074A (en) * 2021-04-28 2021-08-10 平安科技(深圳)有限公司 Training method, device and equipment of multi-language translation model and readable storage medium

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8805669B2 (en) * 2010-07-13 2014-08-12 Dublin City University Method of and a system for translation
CN103605644B (en) * 2013-12-02 2017-02-01 哈尔滨工业大学 Pivot language translation method and device based on similarity matching
CN104391842A (en) * 2014-12-18 2015-03-04 苏州大学 Translation model establishing method and system
CN105068998B (en) * 2015-07-29 2017-12-15 百度在线网络技术(北京)有限公司 Interpretation method and device based on neural network model

Also Published As

Publication number Publication date
CN108804427A (en) 2018-11-13

Similar Documents

Publication Publication Date Title
CN108804427B (en) Voice machine translation method and device
CN113505205B (en) Man-machine dialogue system and method
CN110349572B (en) Voice keyword recognition method and device, terminal and server
CN108711420B (en) Multilingual hybrid model establishing method, multilingual hybrid model establishing device, multilingual hybrid model data obtaining device and electronic equipment
CN110287461B (en) Text conversion method, device and storage medium
US10592607B2 (en) Iterative alternating neural attention for machine reading
CN110930980B (en) Acoustic recognition method and system for Chinese and English mixed voice
CN110569354B (en) Barrage emotion analysis method and device
CN110942763B (en) Speech recognition method and device
CN111428010A (en) Man-machine intelligent question and answer method and device
CN109117474B (en) Statement similarity calculation method and device and storage medium
US20220358297A1 (en) Method for human-machine dialogue, computing device and computer-readable storage medium
CN112287085B (en) Semantic matching method, system, equipment and storage medium
CN110895656B (en) Text similarity calculation method and device, electronic equipment and storage medium
CN110019691A (en) Conversation message treating method and apparatus
CN110795541A (en) Text query method and device, electronic equipment and computer readable storage medium
JP6782329B1 (en) Emotion estimation device, emotion estimation system, and emotion estimation method
CN112307754A (en) Statement acquisition method and device
CN108874786B (en) Machine translation method and device
CN116913278B (en) Voice processing method, device, equipment and storage medium
CN113868377A (en) Similarity combined model training and semantic matching method and device
CN111444321A (en) Question answering method, device, electronic equipment and storage medium
CN111401070B (en) Word meaning similarity determining method and device, electronic equipment and storage medium
CN109002498B (en) Man-machine conversation method, device, equipment and storage medium
CN114881008B (en) Text generation method and device, electronic equipment and medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant