WO2016165590A1 - Speech translation method and device

Speech translation method and device

Info

Publication number
WO2016165590A1
Authority
WO
WIPO (PCT)
Prior art keywords
language
voiceprint feature
voice data
extracted
module
Application number
PCT/CN2016/078895
Other languages
French (fr)
Chinese (zh)
Inventor
张丽竹
Original Assignee
中兴通讯股份有限公司 (ZTE Corporation)
Application filed by 中兴通讯股份有限公司 (ZTE Corporation)
Publication of WO2016165590A1

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00: Handling natural language data
    • G06F 40/40: Processing or translation of natural language

Definitions

  • The present invention relates to the field of speech translation technology, and in particular to a speech translation method and apparatus.
  • The main purpose of the embodiments of the present invention is to provide a speech translation method and device, aiming to solve the problem that existing speech translation software or devices cannot accurately distinguish different languages through speech recognition, which leads to low communication efficiency.
  • A speech translation method provided by an embodiment of the present invention includes the following steps: extracting a voiceprint feature of first voice data when the first voice data is received; determining a language category corresponding to the extracted voiceprint feature; acquiring a pre-stored second language when the language category corresponding to the extracted voiceprint feature is a first language; and converting the first voice data from the first language into second voice data corresponding to the second language.
  • Preferably, the step of determining the language category corresponding to the extracted voiceprint feature includes:
  • judging whether the extracted voiceprint feature matches a pre-stored voiceprint feature of the first language;
  • when the extracted voiceprint feature matches the pre-stored voiceprint feature of the first language, determining that the language category corresponding to the extracted voiceprint feature is the first language; and
  • when the extracted voiceprint feature does not match the pre-stored voiceprint feature of the first language, determining that the language category corresponding to the extracted voiceprint feature is the second language.
  • Preferably, the step of converting the first voice data from the first language into the second voice data corresponding to the second language includes:
  • converting the first voice data into first text data corresponding to the first language, according to the first language;
  • translating the first text data into second text data corresponding to the second language; and
  • synthesizing the second text data into the second voice data.
  • Preferably, after the converting step, the method further includes: outputting the second voice data.
  • Preferably, before the step of extracting the voiceprint feature of the first voice data when the first voice data is received, the method further includes:
  • receiving a setting instruction for the first language and the second language;
  • providing a language-category selection interface according to the setting instruction, for the user to select the first language and the second language;
  • saving the first language and the second language when the user selects them; and
  • extracting a voiceprint feature of the voice data corresponding to the first language, and saving the voiceprint feature.
  • In addition, an embodiment of the present invention further provides a speech translation apparatus, including:
  • an extraction module, configured to extract a voiceprint feature of first voice data when the first voice data is received;
  • a determination module, configured to determine a language category corresponding to the extracted voiceprint feature;
  • an acquisition module, configured to acquire a pre-stored second language when the language category corresponding to the extracted voiceprint feature is a first language; and
  • a conversion module, configured to convert the first voice data from the first language into second voice data corresponding to the second language.
  • Preferably, the determination module includes a judging unit and a determining unit,
  • wherein the judging unit is configured to judge whether the extracted voiceprint feature matches a pre-stored voiceprint feature of the first language; and
  • the determining unit is configured to determine that the language category corresponding to the extracted voiceprint feature is the first language when the extracted voiceprint feature matches the pre-stored voiceprint feature of the first language, and to determine that the language category is the second language when it does not.
  • Preferably, the conversion module includes a conversion unit, a translation unit, and a synthesis unit,
  • wherein the conversion unit is configured to convert the first voice data into first text data corresponding to the first language, according to the first language;
  • the translation unit is configured to translate the first text data into second text data corresponding to the second language; and
  • the synthesis unit is configured to synthesize the second text data into the second voice data.
  • Preferably, the speech translation apparatus further includes an output module, configured to output the second voice data.
  • Preferably, the speech translation apparatus further includes a receiving module, a providing module, and a saving module,
  • wherein the receiving module is configured to receive a setting instruction for the first language and the second language;
  • the providing module is configured to provide a language-category selection interface according to the setting instruction, for the user to select the first language and the second language;
  • the saving module is configured to save the first language and the second language when the user selects them, and is further configured to save the voiceprint feature of the first language; and
  • the extraction module is further configured to extract a voiceprint feature of the voice data corresponding to the first language.
  • Compared with the prior art, the embodiments of the present invention receive voice data, extract the voiceprint feature corresponding to the voice data, determine the language category corresponding to the extracted voiceprint feature, acquire the pre-stored second language when that language category is the first language, and convert the first voice data from the first language into the second voice data corresponding to the second language. Different languages are thus distinguished accurately, and the speech of one language is automatically converted into the speech of another, improving the effectiveness of communication.
  • FIG. 1 is a schematic flowchart of a first embodiment of the speech translation method of the present invention;
  • FIG. 2 is a detailed flowchart of an embodiment of step S40 in FIG. 1;
  • FIG. 3 is a schematic flowchart of a second embodiment of the speech translation method of the present invention;
  • FIG. 4 is a schematic flowchart of a third embodiment of the speech translation method of the present invention;
  • FIG. 5 is a schematic diagram of the functional modules of a first embodiment of the speech translation apparatus of the present invention;
  • FIG. 6 is a schematic diagram of the detailed functional modules of an embodiment of the determination module in FIG. 5;
  • FIG. 7 is a schematic diagram of the detailed functional modules of an embodiment of the conversion module in FIG. 5;
  • FIG. 8 is a schematic diagram of the functional modules of a second embodiment of the speech translation apparatus of the present invention.
  • The main solution of the embodiments of the present invention is: extracting a voiceprint feature of first voice data when the first voice data is received; determining the language category corresponding to the extracted voiceprint feature; acquiring a pre-stored second language when the language category corresponding to the extracted voiceprint feature is the first language; and converting the first voice data from the first language into second voice data corresponding to the second language.
  • Based on the above problem, the present invention provides a speech translation method.
  • Referring to FIG. 1, FIG. 1 is a schematic flowchart of the first embodiment of the speech translation method of the present invention.
  • In one embodiment, the speech translation method includes:
  • Step S10: extracting a voiceprint feature of the first voice data when the first voice data is received.
  • Voice data is received in real time, and voiceprint features are extracted from it. The voiceprint features may be extracted during the conversation, and the emphasis of extraction may differ according to the languages selected, for example distinguishing dialects within a language or distinguishing Chinese from English; extraction may also focus on features that identify the speaker's accent and manner of pronunciation.
  • The voiceprint feature may be extracted by pre-processing the first voice data: the pre-processing samples, quantizes, pre-emphasizes, and windows the first voice data, converting the original first voice data into an N-dimensional feature vector from which the voiceprint feature of the first voice data is obtained.
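  • For illustration only, the following is a minimal sketch of such pre-processing in Python with NumPy. The frame length, hop size, pre-emphasis coefficient, and the dimension N are illustrative assumptions (the embodiments do not fix them), and the simple log band-energy features stand in for whatever voiceprint features a real implementation extracts.

```python
import numpy as np

def extract_voiceprint(signal: np.ndarray, n_dims: int = 32,
                       frame_len: int = 400, hop: int = 160,
                       pre_emphasis: float = 0.97) -> np.ndarray:
    # Pre-emphasis: y[t] = x[t] - a * x[t-1], boosting high frequencies.
    emphasized = np.append(signal[0], signal[1:] - pre_emphasis * signal[:-1])
    # Pad so that at least one full frame is available.
    emphasized = np.pad(emphasized, (0, max(0, frame_len - len(emphasized))))
    # Split into overlapping frames and apply a Hamming window.
    n_frames = 1 + (len(emphasized) - frame_len) // hop
    window = np.hamming(frame_len)
    frames = np.stack([emphasized[i * hop:i * hop + frame_len] * window
                       for i in range(n_frames)])
    # Per-frame log band energies, averaged over time into a single
    # N-dimensional feature vector.
    spectrum = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    bands = np.array_split(spectrum, n_dims, axis=1)
    log_energies = np.log(np.stack([b.mean(axis=1) for b in bands], axis=1) + 1e-10)
    return log_energies.mean(axis=0)  # shape: (n_dims,)
```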
  • The first voice data may be received through a microphone, a Bluetooth headset, or the like; other receiving modes are not excluded.
  • Step S20: determining the language category corresponding to the extracted voiceprint feature.
  • A voiceprint model is established from the extracted voiceprint features, and it is determined whether this model matches the voiceprint model of a pre-stored language category. The voiceprint feature model may be chosen differently according to the configured languages, appropriately increasing the weight of certain voiceprint features associated with a particular language.
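  • A minimal sketch of this matching step, continuing the sketch above: the pre-stored reference vector, the per-feature weights that raise the proportion of language-specific features, and the decision threshold are all illustrative assumptions rather than values given by the embodiments.

```python
import numpy as np

def weighted_cosine(a: np.ndarray, b: np.ndarray, w: np.ndarray) -> float:
    # Cosine similarity after scaling each feature by its weight.
    aw, bw = a * w, b * w
    return float(np.dot(aw, bw) / (np.linalg.norm(aw) * np.linalg.norm(bw)))

def classify_language(features: np.ndarray,
                      first_lang_ref: np.ndarray,
                      first_lang_weights: np.ndarray,
                      threshold: float = 0.8) -> str:
    # A match with the first language's stored voiceprint means the first
    # language; otherwise the utterance is attributed to the second language.
    score = weighted_cosine(features, first_lang_ref, first_lang_weights)
    return "first_language" if score >= threshold else "second_language"
```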
  • Step S30: acquiring a pre-stored second language when the language category corresponding to the extracted voiceprint feature is the first language.
  • It is judged whether the extracted voiceprint feature matches the pre-stored voiceprint feature of the first language. When they match, the other language in the conversation scene is acquired as the second language; when they do not match, the language category corresponding to the extracted voiceprint feature is determined to be the second language. Taking a Chinese-English conversation scene as an example, where the first language is Chinese and the second language is English: after the voiceprint features of the voice data are extracted, it is judged whether they match the pre-stored Chinese voiceprint features.
  • If the extracted voiceprint feature matches the pre-stored Chinese voiceprint feature, the language category corresponding to the extracted voiceprint feature is determined to be Chinese, and the other language in the conversation scene is therefore English.
  • If the extracted voiceprint feature does not match the pre-stored Chinese voiceprint feature, the language category corresponding to the voiceprint feature is English, and the other language in the conversation scene is therefore Chinese.
  • Step S40: converting the first voice data from the first language into second voice data corresponding to the second language.
  • After the first language and the second language are determined, the first language, the second language, and the first voice data are transmitted to a cloud server, which processes the first voice data and converts it, according to the first language, into second voice data corresponding to the second language.
  • The processing of the received voice data may also be split, partly performed on the cloud server and partly performed locally.
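  • A minimal sketch of handing the conversion off to a cloud server is given below. The endpoint URL and the JSON request and response shapes are hypothetical; the embodiments state only that the two languages and the first voice data are transmitted and that second-language voice data is returned.

```python
import base64

import requests  # third-party HTTP client, assumed available

CLOUD_ENDPOINT = "https://example.com/api/translate-speech"  # hypothetical

def convert_via_cloud(first_voice_data: bytes,
                      first_language: str,
                      second_language: str) -> bytes:
    payload = {
        "source_language": first_language,
        "target_language": second_language,
        "audio": base64.b64encode(first_voice_data).decode("ascii"),
    }
    resp = requests.post(CLOUD_ENDPOINT, json=payload, timeout=30)
    resp.raise_for_status()
    # Assume the server answers with the synthesized second-language audio.
    return base64.b64decode(resp.json()["audio"])
```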
  • Specifically, referring to FIG. 2, the process of converting the first voice data from the first language into the second voice data corresponding to the second language may be:
  • Step S41: converting the first voice data into first text data corresponding to the first language, according to the first language;
  • Step S42: translating the first text data into second text data corresponding to the second language;
  • Step S43: synthesizing the second text data into the second voice data.
  • In this embodiment, taking the case where the first language is Chinese and the second language is English: after Chinese and English are acquired, the Chinese voice data is converted into Chinese text data according to Chinese; the Chinese text data is translated into English text data; the converted Chinese and English text data may be displayed on an interface; and finally the English text data is synthesized into English voice data.
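  • Steps S41 to S43 form a three-stage pipeline. The sketch below shows the data flow only; the recognize, translate, and synthesize callables are placeholders for whatever speech-recognition, machine-translation, and speech-synthesis back ends an implementation plugs in, since none are named by the embodiments.

```python
from typing import Callable

def translate_speech(first_voice_data: bytes,
                     first_language: str,
                     second_language: str,
                     recognize: Callable[[bytes, str], str],
                     translate: Callable[[str, str, str], str],
                     synthesize: Callable[[str, str], bytes]) -> bytes:
    # Step S41: first-language speech -> first-language text.
    first_text = recognize(first_voice_data, first_language)
    # Step S42: first-language text -> second-language text.
    second_text = translate(first_text, first_language, second_language)
    # Step S43: second-language text -> second-language speech.
    return synthesize(second_text, second_language)
```

  • Applied to the example above, the pipeline carries Chinese voice data to Chinese text, Chinese text to English text, and English text to English voice data.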
  • In this embodiment, when the first voice data is received, its voiceprint feature is extracted; the language category corresponding to the extracted voiceprint feature is determined; when that language category is the first language, the pre-stored second language is acquired; and the first voice data is converted from the first language into second voice data corresponding to the second language. Different languages are thus distinguished accurately through speech recognition, and the speech of one language is automatically converted into the speech of another, improving the effectiveness of communication.
  • Referring to FIG. 3, FIG. 3 is a schematic flowchart of the second embodiment of the speech translation method of the present invention. Based on the first method embodiment, step S20 includes:
  • Step S21: judging whether the extracted voiceprint feature matches the pre-stored voiceprint feature of the first language;
  • Step S22: when the extracted voiceprint feature matches the pre-stored voiceprint feature of the first language, determining that the language category corresponding to the extracted voiceprint feature is the first language;
  • Step S23: when the extracted voiceprint feature does not match the pre-stored voiceprint feature of the first language, determining that the language category corresponding to the extracted voiceprint feature is the second language.
  • It is judged whether the extracted voiceprint feature matches the pre-stored voiceprint feature of the first language. If it matches, the language category corresponding to the first voice data is the first language, and the second language is the other language in the conversation scene; otherwise, the language category corresponding to the first voice data is the second language.
  • When the first language and the second language are acquired, they are displayed so that the user can check whether they are correct.
  • The first and second languages may be displayed by a voice announcement of the current languages, by highlighting them, or in other ways, set according to the user's needs and/or the performance of the system.
  • Further, after step S40, the method further includes:
  • Step S50: outputting the second voice data.
  • The second voice data may be output directly through a speaker or through an earphone, set according to the user's needs and/or the performance of the system.
  • This embodiment judges whether the extracted voiceprint feature matches the pre-stored voiceprint feature of the first language, and when it matches, determines that the language category corresponding to the voiceprint feature is the first language.
  • Determining the language category through the voiceprint feature improves the accuracy of recognition and further improves the effectiveness of communication.
  • Referring to FIG. 4, FIG. 4 is a schematic flowchart of the third embodiment of the speech translation method of the present invention. Based on the first method embodiment, before step S10 the method further includes:
  • Step S60: receiving a setting instruction for the first language and the second language;
  • Step S70: providing a language-category selection interface according to the setting instruction, for the user to select the first language and the second language;
  • Step S80: saving the first language and the second language when the user selects them;
  • Step S90: extracting a voiceprint feature of the voice data corresponding to the first language, and saving the voiceprint feature.
  • The setting instruction for the first language and the second language may be received at the beginning of the conversation; when it is received, a language-category selection interface is provided according to the instruction for the user to select the first language and the second language, and the selected languages are saved.
  • The first language and the second language may also be selected by voice, according to the user's needs and/or the performance of the system. After the first and second languages are saved, the first voice data corresponding to the first language is received, its voiceprint feature is extracted, and the voiceprint feature is saved.
  • The first and second languages may be language names such as Chinese or English, or may be set by geographical name, such as Guangdong or Canada; if a geographical name is set, the voiceprint features corresponding to the main local language category may be pre-stored locally.
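  • A minimal sketch of this setup flow (steps S60 to S90), assuming a simple in-memory settings object; the TranslationSettings type and the extract_voiceprint callable are illustrative assumptions, not structures defined by the embodiments.

```python
from dataclasses import dataclass, field
from typing import Callable, List

@dataclass
class TranslationSettings:
    first_language: str = ""
    second_language: str = ""
    first_voiceprint: List[float] = field(default_factory=list)

def on_languages_selected(settings: TranslationSettings,
                          first: str, second: str) -> None:
    # Step S80: save the language pair chosen on the selection interface.
    settings.first_language = first
    settings.second_language = second

def enroll_first_speaker(settings: TranslationSettings,
                         voice_data: bytes,
                         extract_voiceprint: Callable[[bytes], List[float]]) -> None:
    # Step S90: extract the first speaker's voiceprint feature and save it,
    # so that later utterances can be matched against it.
    settings.first_voiceprint = extract_voiceprint(voice_data)
```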
  • In other embodiments of the present invention, the speech translation method may also be applied to a multi-language conference in which, for example, four languages A, B, C, and D are used.
  • In the conference, an interface is provided for each user to select his or her own language; after the user selects a language, the selection is transmitted to the cloud server through Bluetooth or Wi-Fi of a transmission module.
  • The four languages A, B, C, and D, together with the voiceprint features corresponding to each, are pre-stored on the cloud server.
  • When voice data is received, its voiceprint feature is extracted, and it is determined whether the extracted voiceprint feature matches the voiceprint feature of a pre-stored language category.
  • Taking a match with the pre-stored voiceprint feature of language A as an example: when the extracted voiceprint feature matches the pre-stored voiceprint feature of language A, the language category corresponding to the extracted voiceprint feature is determined to be language A.
  • The pre-stored languages B, C, and D are acquired from the cloud server; the received voice data is converted into A-language text data according to language A; the A text data is then translated into B, C, and D text data; the B, C, and D text data are converted into B, C, and D voice data respectively; and the results are finally transmitted, through Bluetooth or Wi-Fi of the transmission module, to the speakers or earphones of the users of languages B, C, and D.
  • In this way, when voice data is received, its voiceprint feature can be extracted, the language category corresponding to the voiceprint feature can be determined from the stored correspondence between voiceprint features and languages, and different languages are distinguished accurately through speech recognition, improving the effectiveness of communication.
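  • A minimal sketch of the conference flow just described: the speaker's language is identified by voiceprint match, the utterance is recognized once, and the result is fanned out to every other conference language. All back-end callables are placeholders, as in the earlier sketches.

```python
from typing import Callable, List

def conference_fanout(voice_data: bytes,
                      languages: List[str],                  # e.g. ["A", "B", "C", "D"]
                      identify: Callable[[bytes], str],      # voiceprint -> language
                      recognize: Callable[[bytes, str], str],
                      translate: Callable[[str, str, str], str],
                      synthesize: Callable[[str, str], bytes],
                      send_to_user: Callable[[str, bytes], None]) -> None:
    spoken = identify(voice_data)            # e.g. language A
    text = recognize(voice_data, spoken)     # A voice -> A text
    for target in languages:
        if target == spoken:
            continue
        target_text = translate(text, spoken, target)    # A text -> B/C/D text
        target_voice = synthesize(target_text, target)   # text -> target voice
        send_to_user(target, target_voice)   # e.g. over Bluetooth or Wi-Fi
```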
  • The execution body of the speech translation method in each of the first to third embodiments above may be a speech translation device, or terminal equipment to which a speech translation device is coupled. Further, the speech translation method may be implemented by a client translation program installed on such a device, including but not limited to a mobile phone, a tablet, a notebook computer, and the like.
  • The present invention further provides a speech translation apparatus.
  • FIG. 5 is a schematic diagram of functional modules of a first embodiment of a speech translation apparatus according to the present invention.
  • In one embodiment, the speech translation apparatus includes an extraction module 10, a determination module 20, an acquisition module 30, and a conversion module 40.
  • The extraction module 10 is configured to extract a voiceprint feature of the first voice data when the first voice data is received.
  • Voice data is received in real time, and voiceprint features are extracted from it. The voiceprint features may be extracted during the conversation, and the emphasis of extraction may differ according to the languages selected, for example distinguishing dialects within a language or distinguishing Chinese from English; extraction may also focus on features that identify the speaker's accent and manner of pronunciation.
  • The voiceprint feature may be extracted by pre-processing the first voice data: the pre-processing samples, quantizes, pre-emphasizes, and windows the first voice data, converting the original first voice data into an N-dimensional feature vector from which the voiceprint feature of the first voice data is obtained.
  • The first voice data may be received through a microphone, a Bluetooth headset, or the like; other receiving modes are not excluded.
  • The determination module 20 is configured to determine the language category corresponding to the extracted voiceprint feature.
  • A voiceprint model is established from the extracted voiceprint features, and it is determined whether this model matches the voiceprint model of a pre-stored language category. The voiceprint feature model may be chosen differently according to the configured languages, appropriately increasing the weight of certain voiceprint features associated with a particular language.
  • Preferably, the determination module 20 includes a judging unit 21 and a determining unit 22,
  • wherein the judging unit 21 is configured to judge whether the extracted voiceprint feature matches the pre-stored voiceprint feature of the first language; and
  • the determining unit 22 is configured to determine that the language category corresponding to the extracted voiceprint feature is the first language when the extracted voiceprint feature matches the pre-stored voiceprint feature of the first language, and to determine that the language category is the second language when it does not.
  • It is judged whether the extracted voiceprint feature matches the pre-stored voiceprint feature of the first language. If it matches, the language category corresponding to the first voice data is the first language, and the second language is the other language in the conversation scene; otherwise, the language category corresponding to the first voice data is the second language.
  • When the first language and the second language are acquired, they are displayed so that the user can check whether they are correct.
  • The first and second languages may be displayed by a voice announcement of the current languages, by highlighting them, or in other ways, set according to the user's needs and/or the performance of the system.
  • The acquisition module 30 is configured to acquire a pre-stored second language when the language category corresponding to the extracted voiceprint feature is the first language.
  • It is judged whether the extracted voiceprint feature matches the pre-stored voiceprint feature of the first language. When they match, the other language in the conversation scene is acquired as the second language; when they do not match, the language category corresponding to the extracted voiceprint feature is determined to be the second language. Taking a Chinese-English conversation scene as an example, where the first language is Chinese and the second language is English: after the voiceprint features of the voice data are extracted, it is judged whether they match the pre-stored Chinese voiceprint features.
  • If the extracted voiceprint feature matches the pre-stored Chinese voiceprint feature, the language category corresponding to the extracted voiceprint feature is determined to be Chinese, and the other language in the conversation scene is therefore English.
  • If the extracted voiceprint feature does not match the pre-stored Chinese voiceprint feature, the language category corresponding to the extracted voiceprint feature is English, and the other language in the conversation scene is therefore Chinese.
  • The conversion module 40 is configured to convert the first voice data from the first language into second voice data corresponding to the second language.
  • The processing of the received voice data may also be partly performed on the cloud server and partly performed locally.
  • Preferably, the conversion module 40 includes a conversion unit 41, a translation unit 42, and a synthesis unit 43,
  • wherein the conversion unit 41 is configured to convert the first voice data into first text data corresponding to the first language, according to the first language;
  • the translation unit 42 is configured to translate the first text data into second text data corresponding to the second language; and
  • the synthesis unit 43 is configured to synthesize the second text data into the second voice data.
  • In this embodiment, taking the case where the first language is Chinese and the second language is English: after Chinese and English are acquired, the Chinese voice data is converted into Chinese text data according to Chinese; the Chinese text data is translated into English text data; the converted Chinese and English text data may be displayed on an interface; and finally the English text data is synthesized into English voice data.
  • In this embodiment, the voiceprint feature of the first voice data is extracted; the language category corresponding to the extracted voiceprint feature is determined; when that language category is the first language, the pre-stored second language is acquired; and the first voice data is converted from the first language into second voice data corresponding to the second language. Different languages are distinguished accurately through speech recognition, improving the effectiveness of communication.
  • FIG. 8 is a schematic diagram of functional modules of a second embodiment of a speech translation apparatus according to the present invention.
  • The speech translation apparatus of this embodiment further includes an output module 50, a receiving module 60, a providing module 70, and a saving module 80.
  • The output module 50 is configured to output the second voice data.
  • The second voice data may be output directly through a speaker or through an earphone, set according to the user's needs and/or the performance of the system.
  • The receiving module 60 is configured to receive a setting instruction for the first language and the second language.
  • The providing module 70 is configured to provide a language-category selection interface according to the setting instruction, for the user to select the first language and the second language.
  • The saving module 80 is configured to save the first language and the second language when the user selects them, and is further configured to save the voiceprint feature of the first language.
  • The extraction module 10 is further configured to extract a voiceprint feature of the voice data corresponding to the first language.
  • The setting instruction for the first language and the second language may be received at the beginning of the conversation; when it is received, a language-category selection interface is provided according to the instruction for the user to select the first language and the second language, and the selected languages are saved.
  • The first language and the second language may also be selected by voice, according to the user's needs and/or the performance of the system. After the first and second languages are saved, the first voice data corresponding to the first language is received, its voiceprint feature is extracted, and the voiceprint feature is saved.
  • The first and second languages may be language names such as Chinese or English, or may be set by geographical name, such as Guangdong or Canada; if a geographical name is set, the voiceprint features corresponding to the main local language category may be pre-stored locally.
  • In other embodiments of the present invention, the speech translation method may also be applied to a multi-language conference in which, for example, four languages A, B, C, and D are used.
  • In the conference, an interface is provided for each user to select his or her own language; after the user selects a language, the selection is transmitted to the cloud server through Bluetooth or Wi-Fi of a transmission module.
  • The four languages A, B, C, and D, together with the voiceprint features corresponding to each, are pre-stored on the cloud server.
  • When voice data is received, its voiceprint feature is extracted, and it is determined whether the extracted voiceprint feature matches the voiceprint feature of a pre-stored language category.
  • Taking a match with the pre-stored voiceprint feature of language A as an example: when the extracted voiceprint feature matches the pre-stored voiceprint feature of language A, the language category corresponding to the extracted voiceprint feature is determined to be language A.
  • The pre-stored languages B, C, and D are acquired from the cloud server; the received voice data is converted into A-language text data according to language A; the A text data is then translated into B, C, and D text data; the B, C, and D text data are converted into B, C, and D voice data respectively; and the results are finally transmitted, through Bluetooth or Wi-Fi of the transmission module, to the speakers or earphones of the users of languages B, C, and D.
  • In this way, when voice data is received, its voiceprint feature can be extracted, the language category corresponding to the voiceprint feature can be determined from the stored correspondence between voiceprint features and languages, and different languages are distinguished accurately, improving the effectiveness of communication.
  • The technical solution of the present invention, in essence or in the part contributing to the prior art, may be embodied in the form of a software product stored in a storage medium (such as a ROM/RAM, a magnetic disk, or an optical disc) that includes a number of instructions for causing a terminal device (which may be a mobile phone, a computer, a server, a network device, or the like) to perform the methods described in the embodiments of the present invention.
  • The foregoing embodiments of the present invention can be applied in the field of speech translation technology. They solve the problem that existing speech translation software or devices cannot accurately distinguish different languages through speech recognition, which leads to low communication efficiency; different languages are distinguished accurately, and the speech of one language is automatically converted into the speech of another, improving the effectiveness of communication.

Abstract

A speech translation method and device. The method comprises the steps of: when first speech data is received, extracting a voiceprint characteristic of the first speech data (S10); determining a language type corresponding to the extracted voiceprint characteristic (S20); when the language type corresponding to the extracted voiceprint characteristic is a first language, acquiring a pre-stored second language (S30); and converting the first speech data from the first language into second speech data corresponding to the second language (S40). In the method, different languages are differentiated by extracting voiceprint characteristics, and the speech of one language is automatically converted into the speech of another language, thereby improving the effectiveness of communication.

Description

Speech translation method and device
Technical field
The present invention relates to the field of speech translation technology, and in particular to a speech translation method and apparatus.
Background art
When communicating with people who speak different languages, speech recognition, translation, and speech synthesis technologies can already be combined to convert the speech of one language into the speech of another for direct and effective communication. Although current speech recognition technology has recognition models for most languages, existing speech translation software and devices require the user to manually switch the source and target languages before communicating so that the corresponding recognition and translation can be performed; they cannot accurately distinguish different languages through speech recognition, which leads to low communication efficiency.
The above content is only intended to assist in understanding the technical solutions of the present invention, and does not constitute an admission that it is prior art.
Summary of the invention
The main purpose of the embodiments of the present invention is to provide a speech translation method and device, aiming to solve the problem that existing speech translation software or devices cannot accurately distinguish different languages through speech recognition, which leads to low communication efficiency.
To achieve the above objective, a speech translation method provided by an embodiment of the present invention includes the following steps:
extracting a voiceprint feature of first voice data when the first voice data is received;
determining a language category corresponding to the extracted voiceprint feature;
acquiring a pre-stored second language when the language category corresponding to the extracted voiceprint feature is a first language; and
converting the first voice data from the first language into second voice data corresponding to the second language.
Preferably, the step of determining the language category corresponding to the extracted voiceprint feature includes:
judging whether the extracted voiceprint feature matches a pre-stored voiceprint feature of the first language;
when the extracted voiceprint feature matches the pre-stored voiceprint feature of the first language, determining that the language category corresponding to the extracted voiceprint feature is the first language; and
when the extracted voiceprint feature does not match the pre-stored voiceprint feature of the first language, determining that the language category corresponding to the extracted voiceprint feature is the second language.
Preferably, the step of converting the first voice data from the first language into the second voice data corresponding to the second language includes:
converting the first voice data into first text data corresponding to the first language, according to the first language;
translating the first text data into second text data corresponding to the second language; and
synthesizing the second text data into the second voice data.
Preferably, after the step of converting the first voice data from the first language into the second voice data corresponding to the second language, the method further includes: outputting the second voice data.
Preferably, before the step of extracting the voiceprint feature of the first voice data when the first voice data is received, the method further includes:
receiving a setting instruction for the first language and the second language;
providing a language-category selection interface according to the setting instruction, for the user to select the first language and the second language;
saving the first language and the second language when the user selects them; and
extracting a voiceprint feature of the voice data corresponding to the first language, and saving the voiceprint feature.
In addition, to achieve the above objective, an embodiment of the present invention further provides a speech translation apparatus, including:
an extraction module, configured to extract a voiceprint feature of first voice data when the first voice data is received;
a determination module, configured to determine a language category corresponding to the extracted voiceprint feature;
an acquisition module, configured to acquire a pre-stored second language when the language category corresponding to the extracted voiceprint feature is a first language; and
a conversion module, configured to convert the first voice data from the first language into second voice data corresponding to the second language.
Preferably, the determination module includes a judging unit and a determining unit,
wherein the judging unit is configured to judge whether the extracted voiceprint feature matches a pre-stored voiceprint feature of the first language; and
the determining unit is configured to determine that the language category corresponding to the extracted voiceprint feature is the first language when the extracted voiceprint feature matches the pre-stored voiceprint feature of the first language, and to determine that the language category is the second language when it does not.
Preferably, the conversion module includes a conversion unit, a translation unit, and a synthesis unit,
wherein the conversion unit is configured to convert the first voice data into first text data corresponding to the first language, according to the first language;
the translation unit is configured to translate the first text data into second text data corresponding to the second language; and
the synthesis unit is configured to synthesize the second text data into the second voice data.
Preferably, the speech translation apparatus further includes an output module, configured to output the second voice data.
Preferably, the speech translation apparatus further includes a receiving module, a providing module, and a saving module,
wherein the receiving module is configured to receive a setting instruction for the first language and the second language;
the providing module is configured to provide a language-category selection interface according to the setting instruction, for the user to select the first language and the second language;
the saving module is configured to save the first language and the second language when the user selects them, and is further configured to save the voiceprint feature of the first language; and
the extraction module is further configured to extract a voiceprint feature of the voice data corresponding to the first language.
Compared with the prior art, the embodiments of the present invention receive voice data, extract the voiceprint feature corresponding to the voice data, determine the language category corresponding to the extracted voiceprint feature, acquire the pre-stored second language when that language category is the first language, and convert the first voice data from the first language into the second voice data corresponding to the second language. Different languages are thus distinguished accurately, and the speech of one language is automatically converted into the speech of another, improving the effectiveness of communication.
Brief description of the drawings
FIG. 1 is a schematic flowchart of a first embodiment of the speech translation method of the present invention;
FIG. 2 is a detailed flowchart of an embodiment of step S40 in FIG. 1;
FIG. 3 is a schematic flowchart of a second embodiment of the speech translation method of the present invention;
FIG. 4 is a schematic flowchart of a third embodiment of the speech translation method of the present invention;
FIG. 5 is a schematic diagram of the functional modules of a first embodiment of the speech translation apparatus of the present invention;
FIG. 6 is a schematic diagram of the detailed functional modules of an embodiment of the determination module in FIG. 5;
FIG. 7 is a schematic diagram of the detailed functional modules of an embodiment of the conversion module in FIG. 5;
FIG. 8 is a schematic diagram of the functional modules of a second embodiment of the speech translation apparatus of the present invention.
The implementation, functional features, and advantages of the present invention will be further described with reference to the accompanying drawings in conjunction with the embodiments.
Detailed description of the embodiments
It should be understood that the specific embodiments described here are merely illustrative of the invention and are not intended to limit it.
The main solution of the embodiments of the present invention is: extracting a voiceprint feature of first voice data when the first voice data is received; determining the language category corresponding to the extracted voiceprint feature; acquiring a pre-stored second language when the language category corresponding to the extracted voiceprint feature is the first language; and converting the first voice data from the first language into second voice data corresponding to the second language. This effectively avoids the problem that existing speech translation software or devices cannot accurately distinguish different languages through speech recognition, which leads to low communication efficiency. Different languages are distinguished accurately through speech recognition, and the speech of one language is automatically converted into the speech of another, improving the effectiveness of communication.
Existing speech translation software and devices cannot accurately distinguish different languages through speech recognition, which leads to low communication efficiency.
Based on the above problem, the present invention provides a speech translation method.
Referring to FIG. 1, FIG. 1 is a schematic flowchart of the first embodiment of the speech translation method of the present invention.
In one embodiment, the speech translation method includes:
Step S10: extracting a voiceprint feature of first voice data when the first voice data is received.
Voice data is received in real time, and voiceprint features are extracted from it. The voiceprint features may be extracted during the conversation, and the emphasis of extraction may differ according to the languages selected, for example distinguishing dialects within a language or distinguishing Chinese from English; extraction may also focus on features that identify the speaker's accent and manner of pronunciation. The voiceprint feature may be extracted by pre-processing the first voice data: the pre-processing samples, quantizes, pre-emphasizes, and windows the first voice data, converting the original first voice data into an N-dimensional feature vector from which the voiceprint feature of the first voice data is obtained. The first voice data may be received through a microphone, a Bluetooth headset, or the like; other receiving modes are not excluded.
Step S20: determining the language category corresponding to the extracted voiceprint feature.
A voiceprint model is established from the extracted voiceprint features, and it is determined whether this model matches the voiceprint model of a pre-stored language category. The voiceprint feature model may be chosen differently according to the configured languages, appropriately increasing the weight of certain voiceprint features associated with a particular language.
Step S30: acquiring a pre-stored second language when the language category corresponding to the extracted voiceprint feature is the first language.
It is judged whether the extracted voiceprint feature matches the pre-stored voiceprint feature of the first language. When they match, the other language in the conversation scene is acquired as the second language; when they do not match, the language category corresponding to the extracted voiceprint feature is determined to be the second language. Taking a Chinese-English conversation scene as an example, where the first language is Chinese and the second language is English: after the voiceprint features of the voice data are extracted, it is judged whether they match the pre-stored Chinese voiceprint features. If the extracted voiceprint feature matches the pre-stored Chinese voiceprint feature, the language category corresponding to the extracted voiceprint feature is determined to be Chinese, and the other language in the conversation scene is therefore English. If the extracted voiceprint feature does not match the pre-stored Chinese voiceprint feature, the language category corresponding to the voiceprint feature is English, and the other language in the conversation scene is therefore Chinese.
Step S40: converting the first voice data from the first language into second voice data corresponding to the second language.
After the first language and the second language are determined, the first language, the second language, and the first voice data are transmitted to a cloud server, which processes the first voice data and converts it, according to the first language, into second voice data corresponding to the second language. The processing of the received voice data may also be split, partly performed on the cloud server and partly performed locally.
Specifically, referring to FIG. 2, the process of converting the first voice data from the first language into the second voice data corresponding to the second language may be:
Step S41: converting the first voice data into first text data corresponding to the first language, according to the first language;
Step S42: translating the first text data into second text data corresponding to the second language;
Step S43: synthesizing the second text data into the second voice data.
In this embodiment, taking the case where the first language is Chinese and the second language is English: after Chinese and English are acquired, the Chinese voice data is converted into Chinese text data according to Chinese; the Chinese text data is translated into English text data; the converted Chinese and English text data may be displayed on an interface; and finally the English text data is synthesized into English voice data.
In this embodiment, when the first voice data is received, its voiceprint feature is extracted; the language category corresponding to the extracted voiceprint feature is determined; when that language category is the first language, the pre-stored second language is acquired; and the first voice data is converted from the first language into second voice data corresponding to the second language. Different languages are thus distinguished accurately through speech recognition, and the speech of one language is automatically converted into the speech of another, improving the effectiveness of communication.
Referring to FIG. 3, FIG. 3 is a schematic flowchart of the second embodiment of the speech translation method of the present invention. Based on the first method embodiment, step S20 includes:
Step S21: judging whether the extracted voiceprint feature matches the pre-stored voiceprint feature of the first language;
Step S22: when the extracted voiceprint feature matches the pre-stored voiceprint feature of the first language, determining that the language category corresponding to the extracted voiceprint feature is the first language;
Step S23: when the extracted voiceprint feature does not match the pre-stored voiceprint feature of the first language, determining that the language category corresponding to the extracted voiceprint feature is the second language.
It is judged whether the extracted voiceprint feature matches the pre-stored voiceprint feature of the first language. If it matches, the language category corresponding to the first voice data is the first language, and the second language is the other language in the conversation scene; otherwise, the language category corresponding to the first voice data is the second language. When the first language and the second language are acquired, they are displayed so that the user can check whether they are correct. The first and second languages may be displayed by a voice announcement of the current languages, by highlighting them, or in other ways, set according to the user's needs and/or the performance of the system. When the user finds that the first language and the second language are wrong, an instruction to reset them is received; a language-category selection interface is provided according to the instruction, for the user to select the first language and the second language; and when the user selects them, the first language and the second language are saved. The first voice data corresponding to the first language is then received, its voiceprint feature is extracted, and the voiceprint feature of the first language is saved. After the voiceprint feature is saved, the original voiceprint feature is adjusted and updated. When voice data is received again, its voiceprint feature is extracted and it is judged whether this feature matches the updated voiceprint feature.
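As an illustration of this adjust-and-update step, the following minimal sketch nudges the stored first-language voiceprint toward each newly saved feature vector using an exponential moving average; the update rate alpha is an illustrative assumption, since the embodiments do not specify how the stored features are updated.

```python
import numpy as np

def update_voiceprint(stored: np.ndarray, new_features: np.ndarray,
                      alpha: float = 0.2) -> np.ndarray:
    # Blend the newly extracted voiceprint into the stored reference so
    # that later matching runs against the updated feature vector.
    return (1.0 - alpha) * stored + alpha * new_features
```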
Further, after step S40, the method further includes:
Step S50: outputting the second voice data.
The second voice data may be output directly through a speaker or through an earphone, set according to the user's needs and/or the performance of the system.
This embodiment judges whether the extracted voiceprint feature matches the pre-stored voiceprint feature of the first language, and when it matches, determines that the language category corresponding to the voiceprint feature is the first language. Determining the language category through the voiceprint feature improves the accuracy of recognition and further improves the effectiveness of communication.
Referring to FIG. 4, FIG. 4 is a schematic flowchart of the third embodiment of the speech translation method of the present invention. Based on the first method embodiment, before step S10 the method further includes:
Step S60: receiving a setting instruction for the first language and the second language;
Step S70: providing a language-category selection interface according to the setting instruction, for the user to select the first language and the second language;
Step S80: saving the first language and the second language when the user selects them;
Step S90: extracting a voiceprint feature of the voice data corresponding to the first language, and saving the voiceprint feature.
The setting instruction for the first language and the second language may be received at the start of a dialogue. When the setting instruction is received, a language-category selection interface is provided according to the instruction, for the user to select the first language and the second language; when the user selects them, the first language and the second language are saved. The two languages may also be selected by voice, set according to the user's needs and/or the performance of the system. After the first language and the second language are saved, the first voice data corresponding to the first language is received, its voiceprint feature is extracted, and the voiceprint feature is saved. The first and second languages may be specific languages such as Chinese or English, or may be set by region name, such as Guangdong or Canada; if a region name is set, voiceprint features corresponding to the main local language of that region may be pre-stored locally.
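A minimal sketch of the setup flow of steps S60 to S90 follows; the `choose` callback standing in for the selection interface and the `extract_voiceprint` helper are hypothetical placeholders, not disclosed interfaces.

```python
settings = {}

def handle_setting_instruction(available_languages, choose):
    # S70: provide a language-category selection interface via `choose`
    first = choose(available_languages, "Select first language")
    second = choose(available_languages, "Select second language")
    # S80: save the selected first and second languages
    settings["first_language"] = first
    settings["second_language"] = second

def enroll_first_language(first_voice_data, extract_voiceprint):
    # S90: extract and save the voiceprint feature of the first language
    settings["first_voiceprint"] = extract_voiceprint(first_voice_data)
```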
In other embodiments of the present invention, the speech translation method may also operate in a multi-language conference. For example, with four languages A, B, C, and D, an interface is provided in the conference for each user to select his or her own language; after a user selects a language, the selection is transmitted to a cloud server through the transmission module, for example over Bluetooth or Wi-Fi. The four languages A, B, C, and D and their corresponding voiceprint features are pre-stored in the cloud server. When voice data is received, its voiceprint feature is extracted, and it is judged whether the extracted feature matches the pre-stored voiceprint feature of a registered language category. Taking a match with the pre-stored voiceprint feature of language A as an example: when the extracted feature matches the stored feature of language A, the language category corresponding to the extracted feature is determined to be language A. The pre-stored languages B, C, and D are obtained from the cloud server; the received voice data is converted into A text data corresponding to language A; the A text data is translated into B, C, and D text data; the B, C, and D text data are synthesized into B, C, and D voice data; and the results are finally transmitted, through Bluetooth or Wi-Fi of the transmission module, to the speakers or earphones of the users of languages B, C, and D. This effectively avoids the problem that existing speech translation software or devices cannot accurately distinguish different languages through speech recognition, which leads to low communication efficiency, and achieves accurate language discrimination through speech recognition together with automatic conversion of speech in one language into speech in another, thereby improving the effectiveness of communication.
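The conference flow above can be sketched as follows; `asr`, `translate`, `tts`, and `send` stand in for whatever recognition, translation, synthesis, and transport backends the cloud server uses, and the nearest-voiceprint matching rule is an assumption for illustration.

```python
import numpy as np

def best_match(feature, voiceprints):
    """Pick the registered language whose stored voiceprint is closest
    to the extracted feature (e.g. language 'A')."""
    return max(voiceprints, key=lambda lang: float(np.dot(feature, voiceprints[lang])))

def route_utterance(voice_data, feature, voiceprints, asr, translate, tts, send):
    source = best_match(feature, voiceprints)       # detected source language
    text = asr(voice_data, language=source)         # speech -> source text
    for target in voiceprints:                      # fan out to B, C, D, ...
        if target != source:
            translated = translate(text, source, target)
            send(target, tts(translated, language=target))
```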
In this embodiment, the first language, the second language, and the voiceprint feature of the first language are pre-stored, so that when voice data is received its voiceprint feature can be extracted and the corresponding language category determined from the correspondence between the first-language voiceprint feature and the first language; different languages are thus distinguished accurately through speech recognition, improving the effectiveness of communication.
The execution body of the speech translation methods of the first to third embodiments above may be a speech translation device, or a translation device in signal connection with a speech translation device. Further, the speech translation method may be implemented by a client translation program installed on the speech translation device, where the speech translation device includes, but is not limited to, a mobile phone, a pad, or a notebook computer.
The present invention further provides a speech translation apparatus.
Referring to FIG. 5, FIG. 5 is a schematic diagram of the functional modules of a first embodiment of the speech translation apparatus of the present invention.
In an embodiment, the speech translation apparatus includes: an extraction module 10, a determination module 20, an acquisition module 30, and a conversion module 40.
The extraction module 10 is configured to extract the voiceprint feature of first voice data when the first voice data is received.
Voice data is received in real time, and voiceprint features are extracted from the received voice data. The extraction may take place during the conversation and may emphasize different features depending on the selected languages, for example distinguishing dialects within a language or Chinese versus English, or focusing on the speaker's accent and manner of pronunciation. The voiceprint feature may be extracted by pre-processing the first voice data, where the pre-processing samples, quantizes, pre-emphasizes, and windows the first voice data, converting the raw first voice data into an N-dimensional feature vector from which the voiceprint feature is obtained. The first voice data may be received through a microphone, a Bluetooth headset, or other receiving means.
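As a hedged illustration of this pre-processing chain (pre-emphasis, framing, windowing, reduction to an N-dimensional feature vector), the sketch below uses a Hamming window, fixed frame sizes, and log-energy bands as assumptions; the embodiment does not fix these choices.

```python
import numpy as np

def extract_voiceprint(samples, frame_len=400, hop=160, n_dims=13):
    """Turn raw (already sampled and quantized) audio into an
    N-dimensional voiceprint feature vector."""
    # Pre-emphasis: boost high frequencies relative to low ones
    emphasized = np.append(samples[0], samples[1:] - 0.97 * samples[:-1])
    window = np.hamming(frame_len)
    features = []
    for start in range(0, len(emphasized) - frame_len + 1, hop):
        frame = emphasized[start:start + frame_len] * window  # windowing
        spectrum = np.abs(np.fft.rfft(frame)) ** 2
        # Collapse the spectrum into n_dims log-energy bands
        bands = np.array_split(spectrum, n_dims)
        features.append(np.log(np.array([b.sum() for b in bands]) + 1e-10))
    # Average over frames to obtain a single N-dimensional vector
    return np.mean(features, axis=0)
```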
The determination module 20 is configured to determine the language category corresponding to the extracted voiceprint feature.
A voiceprint model is established from the extracted voiceprint feature, and it is judged whether that model matches the voiceprint model of a pre-stored language category. Different voiceprint feature models may be selected depending on the configured languages, appropriately increasing the weight of the voiceprint features that are characteristic of a particular language.
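One way to increase the weight of certain voiceprint features for a particular language, as described above, is a per-language weight vector applied before scoring; the weight values and the choice of which dimensions to emphasize are purely hypothetical here.

```python
import numpy as np

def weighted_similarity(feature, model, weights):
    """Cosine similarity after rescaling the dimensions assumed to be
    most informative for the configured language."""
    f, m = feature * weights, model * weights
    return float(np.dot(f, m) / (np.linalg.norm(f) * np.linalg.norm(m)))

# e.g. when Chinese is configured, emphasise tone-related dimensions
# (assumed here to be the last five of a 13-dimensional feature)
chinese_weights = np.array([1.0] * 8 + [1.5] * 5)
```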
Specifically, referring to FIG. 6, the determination module 20 includes a judging unit 21 and a determination unit 22.
The judging unit 21 is configured to judge whether the extracted voiceprint feature matches the pre-stored voiceprint feature of the first language.
The determination unit 22 is configured to determine, when the extracted voiceprint feature matches the pre-stored voiceprint feature of the first language, that the language category corresponding to the extracted voiceprint feature is the first language, and to determine, when it does not match, that the language category corresponding to the extracted voiceprint feature is the second language.
It is judged whether the extracted voiceprint feature matches the pre-stored voiceprint feature of the first language. If it matches, the language category corresponding to the first voice data is the first language, and the second language is the other language of the dialogue scene; otherwise, the language category corresponding to the first voice data is the second language. When the first language and the second language are acquired, they are displayed so that the user can verify whether they are correct; the display may be a voice broadcast of the current first and second languages, highlighting of the current first and second languages, or another manner, set according to the user's needs and/or the performance of the system. When the user determines that the first language and the second language are wrong, an instruction to reset them is received; a language-category selection interface is provided according to the instruction, for the user to select the first language and the second language; and when the user selects them, the two languages are saved. The first voice data corresponding to the first language is then received, its voiceprint feature is extracted, and the voiceprint feature is saved. After the voiceprint feature is saved, the original voiceprint feature is adjusted and updated. When voice data is received again, its voiceprint feature is extracted, and it is judged whether that feature matches the updated voiceprint feature.
The acquisition module 30 is configured to acquire the pre-stored second language when the language category corresponding to the extracted voiceprint feature is the first language.
It is judged whether the extracted voiceprint feature matches the pre-stored voiceprint feature of the first language. When it matches, the other language of the dialogue scene is acquired as the second language; when it does not match, the language category corresponding to the extracted voiceprint feature is determined to be the second language. Taking a Chinese-English dialogue scene as an example, where the first language is Chinese and the second language is English: after the voiceprint feature of the voice data is extracted, it is judged whether the extracted feature matches the pre-stored Chinese voiceprint feature. If it matches, the language category of the extracted feature is Chinese, and the other language of the dialogue scene is English; if it does not match, the language category of the extracted feature is English, and the other language of the dialogue scene is Chinese.
The conversion module 40 is configured to convert the first voice data from the first language into second voice data corresponding to the second language.
After the first language and the second language are determined, the first language, the second language, and the first voice data are transmitted to a cloud server, so that the cloud server processes the first voice data and converts it, according to the first language, into the second voice data corresponding to the second language. The processing of the received voice data may also be performed partly on the cloud server and partly locally.
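A sketch of handing the language pair and the first voice data to a cloud server follows; the endpoint URL, the header names, and the use of the `requests` library are assumptions, and a real deployment may split the processing between the device and the cloud as noted above.

```python
import requests

def convert_in_cloud(first_language, second_language, voice_bytes):
    response = requests.post(
        "https://cloud.example.com/speech-translate",  # hypothetical endpoint
        data=voice_bytes,
        headers={
            "X-First-Language": first_language,
            "X-Second-Language": second_language,
        },
    )
    response.raise_for_status()
    return response.content  # second voice data synthesized by the server
```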
Specifically, referring to FIG. 7, the conversion module 40 includes a conversion unit 41, a translation unit 42, and a synthesis unit 43.
The conversion unit 41 is configured to convert the first voice data into first text data corresponding to the first language, according to the first language.
The translation unit 42 is configured to translate the first text data into second text data corresponding to the second language.
The synthesis unit 43 is configured to synthesize the second text data into the second voice data.
In this embodiment, taking Chinese as the first language and English as the second language: after Chinese and English are acquired, the Chinese voice data is converted into Chinese text data according to Chinese; the Chinese text data is translated into English text data; the converted Chinese and English text data may be displayed on an interface; and finally the English text data is synthesized into English voice data.
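The three-unit chain (conversion unit 41, translation unit 42, synthesis unit 43) reduces to the following sketch; `asr`, `translate`, and `tts` are placeholders for any speech-recognition, machine-translation, and text-to-speech backends, not disclosed APIs.

```python
def convert_first_to_second(voice_data, asr, translate, tts,
                            first_language="zh", second_language="en"):
    first_text = asr(voice_data, language=first_language)            # unit 41
    second_text = translate(first_text, first_language, second_language)  # unit 42
    # The interface may display first_text and second_text at this point
    return tts(second_text, language=second_language)                # unit 43
```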
In this embodiment, when first voice data is received, its voiceprint feature is extracted; the language category corresponding to the extracted feature is determined; when that category is the first language, the pre-stored second language is acquired; and the first voice data is converted from the first language into second voice data corresponding to the second language. Different languages are thus distinguished accurately through speech recognition, improving the effectiveness of communication.
Referring to FIG. 8, FIG. 8 is a schematic diagram of the functional modules of a second embodiment of the speech translation apparatus of the present invention.
Based on the first embodiment above, the speech translation apparatus of this embodiment further includes an output module 50, a receiving module 60, a providing module 70, and a saving module 80.
The output module 50 is configured to output the second voice data.
The second voice data may be output directly through a speaker or an earphone, set according to the user's needs and/or the performance of the system.
The receiving module 60 is configured to receive a setting instruction for the first language and the second language.
The providing module 70 is configured to provide a language-category selection interface according to the setting instruction, for the user to select the first language and the second language.
The saving module 80 is configured to save the first language and the second language when the user selects them, and is further configured to save the voiceprint feature of the first language.
The extraction module 10 is further configured to extract the voiceprint feature of the voice data corresponding to the first language.
The setting instruction for the first language and the second language may be received at the start of a dialogue. When the setting instruction is received, a language-category selection interface is provided according to the instruction, for the user to select the first language and the second language; when the user selects them, the first language and the second language are saved. The two languages may also be selected by voice, set according to the user's needs and/or the performance of the system. After the first language and the second language are saved, the first voice data corresponding to the first language is received, its voiceprint feature is extracted, and the voiceprint feature is saved. The first and second languages may be specific languages such as Chinese or English, or may be set by region name, such as Guangdong or Canada; if a region name is set, voiceprint features corresponding to the main local language of that region may be pre-stored locally.
In other embodiments of the present invention, the speech translation method may also operate in a multi-language conference, as described above for the method embodiments. For example, with four languages A, B, C, and D, an interface is provided in the conference for each user to select his or her own language; after a user selects a language, the selection is transmitted to a cloud server through the transmission module, for example over Bluetooth or Wi-Fi. The four languages A, B, C, and D and their corresponding voiceprint features are pre-stored in the cloud server. When voice data is received, its voiceprint feature is extracted, and it is judged whether the extracted feature matches the pre-stored voiceprint feature of a registered language category. Taking a match with the pre-stored voiceprint feature of language A as an example: when the extracted feature matches the stored feature of language A, the language category corresponding to the extracted feature is determined to be language A. The pre-stored languages B, C, and D are obtained from the cloud server; the received voice data is converted into A text data corresponding to language A; the A text data is translated into B, C, and D text data; the B, C, and D text data are synthesized into B, C, and D voice data; and the results are finally transmitted, through Bluetooth or Wi-Fi of the transmission module, to the speakers or earphones of the users of languages B, C, and D. This effectively avoids the problem that existing speech translation software or devices cannot accurately distinguish different languages through speech recognition, which leads to low communication efficiency, and achieves accurate language discrimination through speech recognition together with automatic conversion of speech in one language into speech in another, thereby improving the effectiveness of communication.
In this embodiment, the first language, the second language, and the voiceprint feature of the first language are pre-stored, so that when voice data is received its voiceprint feature can be extracted and the corresponding language category determined from the correspondence between the first-language voiceprint feature and the first language, accurately distinguishing different languages and improving the effectiveness of communication.
The serial numbers of the above embodiments of the present invention are for description only and do not represent the merits of the embodiments. From the description of the above embodiments, those skilled in the art will clearly understand that the methods of the above embodiments can be implemented by software plus a necessary general-purpose hardware platform, or by hardware, though in many cases the former is the better implementation. Based on this understanding, the part of the technical solution of the present invention that is essential or that contributes to the prior art may be embodied in the form of a software product stored in a storage medium (such as a ROM/RAM, a magnetic disk, or an optical disc), which includes several instructions for causing a terminal device (which may be a mobile phone, a computer, a server, a network device, or the like) to execute the methods described in the embodiments of the present invention.
Optionally, for specific examples in this embodiment, reference may be made to the examples described in the above embodiments and optional implementations, and details are not repeated here.
The above are only preferred embodiments of the present invention and are not intended to limit the present invention; for those skilled in the art, the present invention may have various modifications and variations. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall be included within the protection scope of the present invention.
Industrial Applicability
The above embodiments of the present invention can be applied to the field of speech translation technology. They solve the problem that existing speech translation software or devices cannot accurately distinguish different languages through speech recognition, which leads to low communication efficiency, by accurately distinguishing different languages and automatically converting speech in one language into speech in another language, thereby improving the effectiveness of communication.

Claims (10)

  1. A speech translation method, comprising the steps of:
    when first voice data is received, extracting a voiceprint feature of the first voice data;
    determining a language category corresponding to the extracted voiceprint feature;
    when the language category corresponding to the extracted voiceprint feature is a first language, acquiring a pre-stored second language;
    converting the first voice data from the first language into second voice data corresponding to the second language.
  2. The speech translation method according to claim 1, wherein the step of determining the language category corresponding to the extracted voiceprint feature comprises:
    judging whether the extracted voiceprint feature matches a pre-stored voiceprint feature of the first language;
    when the extracted voiceprint feature matches the pre-stored voiceprint feature of the first language, determining that the language category corresponding to the extracted voiceprint feature is the first language;
    when the extracted voiceprint feature does not match the pre-stored voiceprint feature of the first language, determining that the language category corresponding to the extracted voiceprint feature is the second language.
  3. The speech translation method according to claim 1, wherein the step of converting the first voice data from the first language into the second voice data corresponding to the second language comprises:
    converting the first voice data into first text data corresponding to the first language, according to the first language;
    translating the first text data into second text data corresponding to the second language;
    synthesizing the second text data into the second voice data.
  4. The speech translation method according to claim 3, wherein after the step of converting the first voice data from the first language into the second voice data corresponding to the second language, the method further comprises:
    outputting the second voice data.
  5. The speech translation method according to any one of claims 1 to 4, wherein before the step of extracting the voiceprint feature of the first voice data when the first voice data is received, the method further comprises:
    receiving a setting instruction for the first language and the second language;
    providing a language-category selection interface according to the setting instruction, for a user to select the first language and the second language;
    when the user selects the first language and the second language, saving the first language and the second language;
    extracting the voiceprint feature of voice data corresponding to the first language, and saving the voiceprint feature of the first language.
  6. A speech translation apparatus, comprising:
    an extraction module, configured to extract a voiceprint feature of first voice data when the first voice data is received;
    a determination module, configured to determine a language category corresponding to the extracted voiceprint feature;
    an acquisition module, configured to acquire a pre-stored second language when the language category corresponding to the extracted voiceprint feature is a first language;
    a conversion module, configured to convert the first voice data from the first language into second voice data corresponding to the second language.
  7. The speech translation apparatus according to claim 6, wherein the determination module comprises a judging unit and a determination unit,
    the judging unit being configured to judge whether the extracted voiceprint feature matches a pre-stored voiceprint feature of the first language;
    the determination unit being configured to determine, when the extracted voiceprint feature matches the pre-stored voiceprint feature of the first language, that the language category corresponding to the extracted voiceprint feature is the first language, and to determine, when it does not match, that the language category corresponding to the extracted voiceprint feature is the second language.
  8. The speech translation apparatus according to claim 6, wherein the conversion module comprises a conversion unit, a translation unit, and a synthesis unit,
    the conversion unit being configured to convert the first voice data into first text data corresponding to the first language, according to the first language;
    the translation unit being configured to translate the first text data into second text data corresponding to the second language;
    the synthesis unit being configured to synthesize the second text data into the second voice data.
  9. The speech translation apparatus according to claim 6, further comprising an output module configured to output the second voice data.
  10. The speech translation apparatus according to any one of claims 6 to 9, further comprising a receiving module, a providing module, and a saving module,
    the receiving module being configured to receive a setting instruction for the first language and the second language;
    the providing module being configured to provide a language-category selection interface according to the setting instruction, for a user to select the first language and the second language;
    the saving module being configured to save the first language and the second language when the user selects them, and further configured to save the voiceprint feature of the first language;
    the extraction module being further configured to extract the voiceprint feature of voice data corresponding to the first language.
PCT/CN2016/078895 2015-04-13 2016-04-08 Speech translation method and device WO2016165590A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201510172421.7A CN106156009A (en) 2015-04-13 2015-04-13 Voice translation method and device
CN201510172421.7 2015-04-13

Publications (1)

Publication Number Publication Date
WO2016165590A1 true WO2016165590A1 (en) 2016-10-20

Family

ID=57125556

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2016/078895 WO2016165590A1 (en) 2015-04-13 2016-04-08 Speech translation method and device

Country Status (2)

Country Link
CN (1) CN106156009A (en)
WO (1) WO2016165590A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114239613A (en) * 2022-02-23 2022-03-25 阿里巴巴达摩院(杭州)科技有限公司 Real-time voice translation method, device, equipment and storage medium

Families Citing this family (18)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107315740A (en) * 2017-01-20 2017-11-03 北京分音塔科技有限公司 A kind of real-time voice intertranslation device
CN108733656A (en) * 2017-04-14 2018-11-02 深圳市领芯者科技有限公司 Speech translation apparatus, system and method
CN107749296A (en) * 2017-10-12 2018-03-02 深圳市沃特沃德股份有限公司 Voice translation method and device
CN107731232A (en) * 2017-10-17 2018-02-23 深圳市沃特沃德股份有限公司 Voice translation method and device
CN107910004A (en) * 2017-11-10 2018-04-13 科大讯飞股份有限公司 Voiced translation processing method and processing device
CN107861955B (en) * 2017-11-14 2021-09-28 维沃移动通信有限公司 Translation method and mobile terminal
WO2019104556A1 (en) * 2017-11-29 2019-06-06 深圳市沃特沃德股份有限公司 Translation method and device
CN108281145B (en) * 2018-01-29 2021-07-02 南京地平线机器人技术有限公司 Voice processing method, voice processing device and electronic equipment
CN108447486B (en) * 2018-02-28 2021-12-03 科大讯飞股份有限公司 Voice translation method and device
CN108966066A (en) * 2018-03-07 2018-12-07 深圳市哈尔马科技有限公司 A kind of real time translation interactive system based on wireless headset
CN109121123A (en) * 2018-07-03 2019-01-01 Oppo广东移动通信有限公司 Information processing method and related product
CN109005480A (en) * 2018-07-19 2018-12-14 Oppo广东移动通信有限公司 Information processing method and related product
CN109147769B (en) * 2018-10-17 2020-12-22 北京猎户星空科技有限公司 Language identification method, language identification device, translation machine, medium and equipment
CN109344415A (en) * 2018-12-13 2019-02-15 深圳市友杰智新科技有限公司 E-book intelligent sound reads aloud implementation method
CN110428813B (en) * 2019-07-23 2022-04-22 北京奇艺世纪科技有限公司 Voice understanding method and device, electronic equipment and medium
CN110442881A (en) * 2019-08-06 2019-11-12 上海祥久智能科技有限公司 A kind of information processing method and device of voice conversion
CN110956950A (en) * 2019-12-02 2020-04-03 联想(北京)有限公司 Data processing method and device and electronic equipment
CN112989847A (en) * 2021-03-11 2021-06-18 读书郎教育科技有限公司 Recording translation system and method of scanning pen

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1334532A (en) * 2000-07-13 2002-02-06 白涛 Automatic simultaneous interpretation system between multiple languages for GSM
CN1602483A (en) * 2001-12-17 2005-03-30 内维尼·加雅拉特尼 Real time translator and method of performing real time translation of a plurality of spoken word languages
JP2011128260A (en) * 2009-12-16 2011-06-30 Nec Corp Foreign language conversation support device, method, program and phone terminal device
CN103309854A (en) * 2013-06-08 2013-09-18 开平市中铝实业有限公司 Translator system for taxis
CN103838714A (en) * 2012-11-22 2014-06-04 北大方正集团有限公司 Method and device for converting voice information

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101894548B (en) * 2010-06-23 2012-07-04 清华大学 Modeling method and modeling device for language identification
CN202772966U (en) * 2012-09-03 2013-03-06 上海三旗通信科技股份有限公司 Mobile phone having global barrier-free communication function
CN103117059B (en) * 2012-12-27 2015-05-06 内蒙古科技大学 Voice signal characteristics extracting method based on tensor decomposition
CN103117061B (en) * 2013-02-05 2016-01-20 广东欧珀移动通信有限公司 A kind of voice-based animals recognition method and device


Also Published As

Publication number Publication date
CN106156009A (en) 2016-11-23

Similar Documents

Publication Publication Date Title
WO2016165590A1 (en) Speech translation method and device
US11114091B2 (en) Method and system for processing audio communications over a network
US10079014B2 (en) Name recognition system
US9864745B2 (en) Universal language translator
US9552815B2 (en) Speech understanding method and system
WO2017054122A1 (en) Speech recognition system and method, client device and cloud server
US20140365200A1 (en) System and method for automatic speech translation
KR20180026687A (en) Terminal and handsfree device for servicing handsfree automatic interpretation, and method thereof
US9747282B1 (en) Translation with conversational overlap
JP2016527587A5 (en) Hybrid offline / online speech translation system and method
WO2016101571A1 (en) Voice translation method, communication method and related device
WO2015149359A1 (en) Method for automatically adjusting volume, volume adjustment apparatus and electronic device
US20180286388A1 (en) Conference support system, conference support method, program for conference support device, and program for terminal
CN110827803A (en) Method, device and equipment for constructing dialect pronunciation dictionary and readable storage medium
US20180288109A1 (en) Conference support system, conference support method, program for conference support apparatus, and program for terminal
JP6614080B2 (en) Spoken dialogue system and spoken dialogue method
WO2019075829A1 (en) Voice translation method and apparatus, and translation device
US20180288110A1 (en) Conference support system, conference support method, program for conference support device, and program for terminal
WO2019169686A1 (en) Voice translation method and apparatus, and computer device
CN109559744B (en) Voice data processing method and device and readable storage medium
KR20110132960A (en) Method and apparatus for improving automatic interpretation function by use of mutual communication between portable interpretation terminals
US20190066676A1 (en) Information processing apparatus
TWM515143U (en) Speech translating system and translation processing apparatus
JP2006268710A (en) Translation system
KR102622350B1 (en) Electronic apparatus and control method thereof

Legal Events

Code | Title | Description
121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 16779565; Country of ref document: EP; Kind code of ref document: A1
NENP | Non-entry into the national phase | Ref country code: DE
122 | Ep: pct application non-entry in european phase | Ref document number: 16779565; Country of ref document: EP; Kind code of ref document: A1
Kind code of ref document: A1