WO2019148564A1 - Method and device for translating speech and speech translation apparatus - Google Patents

Method and device for translating speech and speech translation apparatus

Info

Publication number
WO2019148564A1
Authority
WO
WIPO (PCT)
Prior art keywords
engine
translation
speech
combination
feature information
Prior art date
Application number
PCT/CN2018/077452
Other languages
French (fr)
Chinese (zh)
Inventor
郑勇
金志军
王文祺
Original Assignee
深圳市沃特沃德股份有限公司
Priority date
Filing date
Publication date
Application filed by 深圳市沃特沃德股份有限公司
Publication of WO2019148564A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/40 Processing or translation of natural language
    • G06F40/58 Use of machine translation, e.g. for multi-lingual retrieval, for server-side translation for client devices or for real-time translation

Definitions

  • the present invention relates to the field of speech translation technology, and in particular to a method, device and speech translation device for implementing speech translation.
  • the general translation process of a speech translation device is as follows: the device receives the user's original speech information and sends it to a speech translation engine; the engine translates the original speech information into target speech information (that is, from one language into another) and returns it to the device; the device then outputs the target speech information.
  • current speech translation engines mainly include the Google engine, Microsoft engine, IBM engine, Xunfei engine, Baidu engine, Jinshan engine, etc. Each speech translation engine in turn includes a speech recognition engine, a text translation engine, and a speech synthesis engine, and the engines differ in supported languages, billing standards, processing delay, and translation accuracy.
  • the current speech translation device only supports a single engine, for example, only supports the Baidu engine, and implements speech translation through the Baidu engine's speech recognition engine, text translation engine, and speech synthesis engine.
  • the Baidu engine can currently translate only a dozen or so mainstream languages and cannot translate some less widely spoken languages. Other engines may be able to translate those languages, but may compare poorly in usage cost, translation speed, translation accuracy, and so on.
  • the existing speech translation equipment has poor translation flexibility and low translation performance, and cannot provide users with high-quality translation services, and the user experience is not good.
  • the main object of the present invention is to provide a method, device and speech translation device for implementing speech translation, which aims to improve the flexibility of translation and thereby improve translation performance.
  • the embodiment of the present invention provides a method for implementing voice translation, and the method includes the following steps:
  • for each translation service, a speech recognition engine, a text translation engine, and a speech synthesis engine supporting the translation service are selected from the speech recognition engine set, the text translation engine set, and the speech synthesis engine set, respectively, and at least two engine combinations are formed;
  • the engine combination of each translation service is prioritized according to the feature information, and an engine combination selection set of the translation service is generated.
  • An embodiment of the present invention further provides an apparatus for implementing voice translation, where the apparatus includes:
  • a combination module configured to, for each translation service, select a speech recognition engine, a text translation engine, and a speech synthesis engine supporting the translation service from the speech recognition engine set, the text translation engine set, and the speech synthesis engine set, respectively, and form at least two engine combinations;
  • the sorting module is configured to prioritize engine combinations of each translation service according to the feature information, and generate an engine combination selection set of the translation service.
  • Embodiments of the present invention also provide a speech translation device including a memory, a processor, and at least one application stored in the memory and configured to be executed by the processor, the application being configured to perform the foregoing method for implementing speech translation.
  • in the method for implementing speech translation provided by an embodiment of the present invention, for each translation service, different speech recognition engines, text translation engines, and speech synthesis engines are freely combined into a plurality of engine combinations; the engine combinations of each translation service are then prioritized according to their feature information, and an engine combination selection set is generated for use in subsequent translation.
  • FIG. 1 is a flow chart of a first embodiment of a method for implementing voice translation according to the present invention
  • FIG. 2 is a schematic diagram of a speech recognition engine set in an embodiment of the present invention.
  • FIG. 3 is a schematic diagram of a text translation engine set in an embodiment of the present invention.
  • FIG. 4 is a schematic diagram of a speech synthesis engine set in an embodiment of the present invention.
  • FIG. 5 is a schematic diagram of an engine language database in an embodiment of the present invention.
  • FIG. 6 is a schematic diagram of an engine charging list in an embodiment of the present invention.
  • FIG. 7 is a flow chart of a second embodiment of a method for implementing voice translation according to the present invention.
  • FIG. 8 is a schematic block diagram of a first embodiment of an apparatus for implementing voice translation according to the present invention.
  • FIG. 10 is a block diagram of the first acquisition unit of FIG. 9;
  • FIG. 11 is a block diagram of the first statistical unit of FIG. 10;
  • FIG. 12 is a block diagram of the second acquisition unit of FIG. 9;
  • FIG. 13 is a block diagram of the third acquisition unit of FIG. 9;
  • FIG. 14 is a schematic block diagram of a second embodiment of an apparatus for implementing voice translation according to the present invention;
  • FIG. 15 is a block diagram of the screening module of FIG. 14;
  • FIG. 16 is a schematic block diagram of a third embodiment of an apparatus for implementing voice translation according to the present invention;
  • FIG. 17 is a block diagram of the recommendation module of FIG. 16;
  • FIG. 18 is another block diagram of the recommendation module of FIG. 16.
  • the method includes the following steps:
  • the speech translation device collects information on speech recognition engines and establishes a speech recognition engine set, collects information on text translation engines and establishes a text translation engine set, and collects information on speech synthesis engines and establishes a speech synthesis engine set.
  • the information of the speech recognition engine includes the engine name, the type of language supported, and the like.
  • the engine names include Google's speech recognition engine, Microsoft speech recognition engine, IBM speech recognition engine, Nuance speech recognition engine, Baidu speech recognition engine, etc.
  • the languages supported by each engine are different. FIG. 2 shows an example of a speech recognition engine set, where M represents the engine name and X represents the supported language type.
  • the information of the text translation engine includes the engine name, the type of language supported, and the like.
  • the engine name includes Google Translate Engine, Microsoft Translation Engine, IBM Translation Engine, Xunfei Translation Engine, Baidu Translation Engine, Jinshan Translation Engine, etc.
  • the languages supported by each engine are different. FIG. 3 shows an example of a text translation engine set, where N represents the engine name and X represents the supported language type.
  • the speech synthesis engine information includes the engine name and the supported language types.
  • the engine name includes Nuance synthesis engine, Microsoft speech synthesis engine, IBM speech synthesis engine, Baidu speech synthesis engine, etc.
  • the languages supported by each engine are different. FIG. 4 shows an example of a speech synthesis engine set, where K represents the engine name and X represents the supported language type.
  • the speech recognition engine set, the text translation engine set, and the speech synthesis engine set may be combined into one engine language database. FIG. 5 shows an example of the engine language database.
  • For each translation service, a speech recognition engine, a text translation engine, and a speech synthesis engine that support the translation service are selected from the speech recognition engine set, the text translation engine set, and the speech synthesis engine set, respectively, and at least two engine combinations are formed.
  • the translation service is a service of mutual translation between two languages, such as a Chinese-English translation service, Chinese-Japanese translation service, Japanese-English translation service, Japanese-Korean translation service, French-English translation service, Chinese-Bulgarian translation service, etc.
  • for each translation service, at least two engine combinations supporting speech recognition + text translation + speech synthesis for that service are formed; that is, each engine combination includes a speech recognition engine, a text translation engine, and a speech synthesis engine.
  • for the Chinese-Bulgarian translation service, for example, the speech translation device selects the speech recognition engines that support Chinese and Bulgarian from the speech recognition engine set, selects the text translation engines that support Chinese and Bulgarian from the text translation engine set, and selects the speech synthesis engines that support Chinese and Bulgarian from the speech synthesis engine set. It then uses the selected engines to form at least two engine combinations that can provide the Chinese-Bulgarian translation service. Each engine combination includes a speech recognition engine, a text translation engine, and a speech synthesis engine.
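The selection step above can be sketched as a Cartesian product over the engines that support both languages of a service. This is a minimal illustration under stated assumptions, not the patent's implementation: the engine names (M1, N1, K1, ...) and the language codes are hypothetical placeholders in the spirit of FIGS. 2 to 4.

```python
from itertools import product

# Hypothetical engine sets: engine name -> supported languages.
RECOGNITION = {"M1": {"zh", "en", "bg"}, "M2": {"zh", "bg"}, "M3": {"zh", "en"}}
TRANSLATION = {"N1": {"zh", "bg"}, "N2": {"zh", "en", "bg"}}
SYNTHESIS = {"K1": {"zh", "bg"}, "K2": {"en", "ja"}}


def supporting(engine_set, languages):
    """Return the names of engines that support every language in `languages`."""
    return sorted(name for name, langs in engine_set.items() if languages <= langs)


def engine_combinations(service):
    """Form every (recognition, translation, synthesis) triple for a service.

    `service` is a pair of language codes, e.g. ("zh", "bg").
    """
    languages = set(service)
    return list(product(
        supporting(RECOGNITION, languages),
        supporting(TRANSLATION, languages),
        supporting(SYNTHESIS, languages),
    ))


# Chinese-Bulgarian service: two recognizers x two translators x one synthesizer.
combos = engine_combinations(("zh", "bg"))
```

With this sample data the service gets four combinations, which satisfies the "at least two" requirement; a service for which fewer than two combinations exist would need separate handling.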
  • the feature information of the engine combination includes a processing delay, a usage fee, a translation accuracy, and the like, and the voice translation device may acquire at least one of the feature information.
  • the speech translation device uses the engine combination to perform a speech translation test, and the time taken to complete a speech translation test is counted, and the statistical time is taken as the processing delay of the engine combination.
  • the speech translation device can obtain original speech information in each language as a test template, such as "What is the weather today?", "Where are you from?", and "What is your name?", and then perform the speech translation test on that original speech information.
  • the speech translation device separately calculates the first time spent by the speech recognition engine on speech recognition, the second time spent by the text translation engine on text translation, and the third time spent by the speech synthesis engine on speech synthesis; it then calculates the sum of the first time, the second time, and the third time, and takes the result as the time taken to complete one speech translation test.
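The three-part timing described above might be measured as follows. This is only a sketch: the three stage functions are stand-ins (hypothetical lambdas), since the text does not specify how the engines are invoked.

```python
import time


def timed(stage, payload):
    """Run one stage and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = stage(payload)
    return result, time.perf_counter() - start


def processing_delay(recognize, translate, synthesize, audio):
    """Sum the first, second, and third times for one speech translation test."""
    text, first_time = timed(recognize, audio)
    translated, second_time = timed(translate, text)
    _speech, third_time = timed(synthesize, translated)
    return first_time + second_time + third_time


# Stand-in stages; a real test would call the three engines of a combination.
delay = processing_delay(
    recognize=lambda audio: "What is your name?",
    translate=lambda text: text.upper(),
    synthesize=lambda text: text.encode("utf-8"),
    audio=b"\x00\x01",
)
```

Timing each stage separately, rather than the whole pipeline at once, keeps the per-engine times available if they are needed later.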
  • the speech translation device can further filter the engine combinations according to the processing delay, filtering out engine combinations whose processing delay is too long (that is, whose translation speed is too slow), thereby reducing the amount of data in the database, saving storage space, and improving operating efficiency.
  • the voice translation device compares the processing delay of the engine combination with a threshold, and determines whether the processing delay is greater than or equal to the threshold; when the processing delay is greater than or equal to the threshold, the engine combination is discarded.
  • the threshold can be set according to actual needs, or can be customized by the user.
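The threshold comparison can be illustrated with a short filter. The delays and the threshold value below are made-up sample numbers, not figures from the source.

```python
def filter_by_delay(delays, threshold):
    """Keep engine combinations whose processing delay is below `threshold`.

    A combination is discarded when its delay is greater than or equal
    to the threshold, matching the comparison described in the text.
    """
    return {combo: d for combo, d in delays.items() if d < threshold}


# Hypothetical measured delays in seconds.
delays = {
    ("M1", "N1", "K1"): 1.2,
    ("M1", "N2", "K1"): 3.0,
    ("M2", "N1", "K1"): 4.5,
}
kept = filter_by_delay(delays, threshold=3.0)
```

Note that a delay exactly equal to the threshold is discarded, because the condition in the text is "greater than or equal to".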
  • the voice translation device acquires the charging standard of each engine in the engine combination, and the usage fee of the engine combination is counted according to the charging standard.
  • the charging standard includes information such as billing method and billing price.
  • the billing method differs from engine to engine and includes time-based billing and word-based billing, among others. Even when the billing method is the same, the billing prices are not necessarily the same.
  • the speech translation device may collect the charging standards of each engine and establish an engine charging list as shown in FIG. 6, where H represents the charging standard of a speech recognition engine, I represents the charging standard of a text translation engine, and J represents the charging standard of a speech synthesis engine. For each engine combination, the speech translation device queries the engine charging list, obtains the charging standards of the speech recognition engine, the text translation engine, and the speech synthesis engine in the combination, and calculates the usage fee of the engine combination from the charging standards of the three engines.
  • the voice translation device can perform a voice translation test on the original voice information by using the engine combination, and the usage fee of the test is used as the usage fee of the engine combination.
  • the speech translation device may directly calculate the usage fee of the engine combination from the charging prices of the three engines. For example, if the speech recognition engine charges h yuan/minute, the text translation engine charges i yuan/minute, and the speech synthesis engine charges j yuan/minute, then the usage fee of the engine combination formed by these three engines is (h+i+j) yuan/minute.
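The (h+i+j) calculation can be sketched as a lookup in a charging list followed by a sum. The prices below are invented for illustration, and the sketch assumes all three engines bill per minute; engines with different billing methods would first need to be converted to a common unit, which the text does not describe.

```python
from decimal import Decimal

# Hypothetical engine charging list: engine name -> price in yuan per minute.
CHARGING_LIST = {
    "M1": Decimal("0.20"),  # speech recognition engine (h)
    "N1": Decimal("0.10"),  # text translation engine (i)
    "K1": Decimal("0.05"),  # speech synthesis engine (j)
}


def usage_fee(combination, charging_list=CHARGING_LIST):
    """Return h + i + j for one engine combination, in yuan per minute."""
    return sum((charging_list[engine] for engine in combination), Decimal("0"))


fee = usage_fee(("M1", "N1", "K1"))
```

`Decimal` is used instead of `float` so that summed prices stay exact.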
  • the speech translation device collects users' scores for the accuracy of the translation results of the engine combination, computes a statistic over the scores collected within a period of time (e.g., within one month, six months, or one year), such as the average score, and takes the result as the translation accuracy of the engine combination.
  • the speech translation device can also obtain other feature information of the engine combination according to actual needs, which is not enumerated here.
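One way to compute the accuracy statistic is to average the scores that fall inside a time window. The 30-day window, the 1 to 5 score scale, and the sample data are assumptions for illustration; the text only says that scores within a period are counted, for example by averaging.

```python
from datetime import datetime, timedelta
from statistics import mean


def translation_accuracy(scores, now, window=timedelta(days=30)):
    """Average the user scores collected within `window` before `now`.

    `scores` is a list of (timestamp, score) pairs. Returns None when no
    score falls inside the window.
    """
    recent = [score for ts, score in scores if now - window <= ts <= now]
    return mean(recent) if recent else None


now = datetime(2018, 2, 1)
scores = [
    (datetime(2018, 1, 20), 4),
    (datetime(2018, 1, 25), 5),
    (datetime(2017, 6, 1), 1),  # outside the 30-day window, ignored
]
accuracy = translation_accuracy(scores, now)
```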
  • the voice translation device prioritizes the engine combination of the same translation service according to the feature information, and generates an engine combination selection set of the translation service for selection and use in subsequent translation.
  • the speech translation device can prioritize engine combinations based on a single type of feature information. For example, engine combinations may be prioritized in order of processing delay from low to high, in order of usage fee from low to high, or in order of translation accuracy from high to low.
  • the speech translation device may also prioritize engine combinations based on a combination of at least two types of feature information. For example, engine combinations may be prioritized by the criterion of low processing delay and low usage fee, by low processing delay and high translation accuracy, by low usage fee and high translation accuracy, or by low processing delay, low usage fee, and high translation accuracy.
  • the speech translation device stores the above priority queues as the engine combination selection set of the translation service, so that each translation service ends up with its own engine combination selection set. Subsequently, the user can be provided with better translation services according to the engine combination selection set and the user's needs.
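The priority queues described above could be built as sorted lists, one per criterion or criteria combination. The text does not say how several features are weighed against each other, so the lexicographic ordering used here (first criterion decides, later ones break ties) is purely an assumption, as are the feature values and engine names.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Features:
    delay: float     # processing delay in seconds, lower is better
    fee: float       # usage fee in yuan per minute, lower is better
    accuracy: float  # average user score, higher is better


# Each key maps a feature to a value where smaller means higher priority.
SORT_KEYS = {
    "delay": lambda f: f.delay,
    "fee": lambda f: f.fee,
    "accuracy": lambda f: -f.accuracy,
}


def prioritize(features, criteria):
    """Return combinations best-first, comparing criteria in the given order."""
    return sorted(
        features,
        key=lambda combo: tuple(SORT_KEYS[c](features[combo]) for c in criteria),
    )


def selection_set(features):
    """Build one priority queue per single criterion and per combination."""
    names = ("delay", "fee", "accuracy")
    return {
        criteria: prioritize(features, criteria)
        for size in (1, 2, 3)
        for criteria in combinations(names, size)
    }


features = {
    ("M1", "N1", "K1"): Features(delay=1.2, fee=0.35, accuracy=4.5),
    ("M1", "N2", "K1"): Features(delay=0.8, fee=0.50, accuracy=4.0),
    ("M2", "N1", "K1"): Features(delay=0.8, fee=0.30, accuracy=3.5),
}
queues = selection_set(features)
```

A weighted score would be an equally plausible way to combine features; the lexicographic choice simply keeps the example short.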
  • after step S14, the following steps are further included:
  • the voice translation device can provide an option for the translation service for the user to select, and determine the translation service that the user needs according to the user's selection.
  • the voice translation device can also receive the information of the translation service input by the user, and determine the translation service required by the user according to the input information.
  • the voice translation device then obtains a filter condition of the user for the engine combination, and then selects a priority queue of the engine combination that meets the filter condition from the engine combination selection set for the user to select.
  • the speech translation device can provide options for the engine combination filter condition, including: low processing delay; low usage fee; high translation accuracy; low processing delay + low usage fee; low processing delay + high translation accuracy; low usage fee + high translation accuracy; and low processing delay + low usage fee + high translation accuracy.
  • the speech translation device then obtains a filter condition of the user for the engine combination, and then selects the engine combination that most matches the filter condition from the engine combination selection set.
  • for example, when the user selects the low processing delay filter condition, the speech translation device selects the engine combination with the lowest processing delay; when the user selects the low usage fee filter condition, the device selects the engine combination with the lowest usage fee; when the user selects the high translation accuracy filter condition, the device selects the engine combination with the highest translation accuracy; when the user selects the low processing delay + low usage fee filter condition, the device selects the highest-priority engine combination from the priority queue for low processing delay + low usage fee; when the user selects the low processing delay + high translation accuracy filter condition, the device selects the highest-priority engine combination from the priority queue for low processing delay + high translation accuracy; when the user selects the low usage fee + high translation accuracy filter condition, the device selects the highest-priority engine combination from the priority queue for low usage fee + high translation accuracy; and when the user selects the low processing delay + low usage fee + high translation accuracy filter condition, the device selects the highest-priority engine combination from the priority queue for low processing delay + low usage fee + high translation accuracy.
  • the voice translation device may automatically select an optimal engine combination for the user, or automatically select the engine combination that best meets the user's needs according to the user's usage habits or user settings (such as translation fee cap settings).
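The two recommendation modes described above (offering the whole priority queue, or picking the single best match) can be sketched as lookups in a selection set. The condition names, queue contents, and engine names below are hypothetical sample data.

```python
# Hypothetical selection set for one translation service: each filter
# condition maps to a priority queue of engine combinations, best first.
SELECTION_SET = {
    frozenset({"low_delay"}): [("M1", "N2", "K1"), ("M2", "N1", "K1"), ("M1", "N1", "K1")],
    frozenset({"low_fee"}): [("M2", "N1", "K1"), ("M1", "N1", "K1"), ("M1", "N2", "K1")],
    frozenset({"low_delay", "low_fee"}): [("M2", "N1", "K1"), ("M1", "N2", "K1"), ("M1", "N1", "K1")],
}


def recommend_queue(selection_set, conditions):
    """Return the whole priority queue so the user can choose from it."""
    return selection_set[frozenset(conditions)]


def recommend_best(selection_set, conditions):
    """Return only the highest-priority combination for the conditions."""
    return recommend_queue(selection_set, conditions)[0]


queue = recommend_queue(SELECTION_SET, ["low_fee"])
best = recommend_best(SELECTION_SET, ["low_fee", "low_delay"])
```

Keying the queues by a `frozenset` makes the lookup independent of the order in which the user ticks the conditions.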
  • in the method for implementing speech translation of the embodiment of the present invention, for each translation service, different speech recognition engines, text translation engines, and speech synthesis engines are freely combined into a plurality of engine combinations; the engine combinations of each translation service are then prioritized according to their feature information, and an engine combination selection set is generated for use in subsequent translation.
  • the voice translation device in the embodiment of the present invention may be a professional translation machine or a variety of terminal devices, such as mobile terminals such as mobile phones and tablets, computer terminals such as personal computers and notebook computers, and the like.
  • a dedicated application (APP) may be installed on the foregoing translation machine or terminal device, and the method for implementing speech translation of the embodiment of the present invention is carried out by that application.
  • the apparatus includes an establishing module 10, a combination module 20, an acquisition module 30, and a sorting module 40.
  • the establishing module 10 is configured to establish a speech recognition engine set, a text translation engine set, and a speech synthesis engine set; the combination module 20 is configured to, for each translation service, select a speech recognition engine, a text translation engine, and a speech synthesis engine supporting the translation service from the speech recognition engine set, the text translation engine set, and the speech synthesis engine set, respectively, and form at least two engine combinations; the acquisition module 30 is configured to acquire feature information of each engine combination; and the sorting module 40 is configured to prioritize the engine combinations of each translation service according to the feature information and generate an engine combination selection set of the translation service for use in subsequent translation.
  • the establishing module 10 collects information on speech recognition engines and establishes a speech recognition engine set, collects information on text translation engines and establishes a text translation engine set, and collects information on speech synthesis engines and establishes a speech synthesis engine set.
  • the establishing module 10 may combine the speech recognition engine set, the text translation engine set, and the speech synthesis engine set into one engine language database.
  • the feature information of the engine combination includes a processing delay, a usage fee, a translation accuracy, and the like, and the acquisition module 30 may acquire at least one type of the feature information.
  • the acquisition module 30 includes a first acquisition unit 31, which is configured to acquire the processing delay of the engine combination.
  • the first acquisition unit 31 includes a translation test subunit 311 and a first statistical subunit 312, wherein: the translation test subunit 311 is configured to, for each engine combination, perform a speech translation test using the engine combination; and the first statistical subunit 312 is configured to count the time taken to complete one speech translation test and take the counted time as the processing delay of the engine combination.
  • the first statistical subunit 312 includes a first calculation subunit 3121 and a second calculation subunit 3122, wherein: the first calculation subunit 3121 is configured to, during the speech translation test, calculate the first time spent by the speech recognition engine on speech recognition, the second time spent by the text translation engine on text translation, and the third time spent by the speech synthesis engine on speech synthesis; and the second calculation subunit 3122 is configured to calculate the sum of the first time, the second time, and the third time, and take the result as the time taken to complete one speech translation test.
  • the acquisition module 30 further includes a second acquisition unit 32, which is configured to acquire the usage fee of the engine combination.
  • the second acquisition unit 32 includes a charging acquisition subunit 321 and a second statistical subunit 322, wherein: the charging acquisition subunit 321 is configured to, for each engine combination, acquire the charging standard of each engine in the engine combination; and the second statistical subunit 322 is configured to calculate the usage fee of the engine combination according to the charging standard of each engine.
  • the acquisition module 30 further includes a third acquisition unit 33, which is configured to acquire the translation accuracy of the engine combination.
  • the third acquisition unit 33 includes a score collection subunit 331 and a third statistical subunit 332, wherein: the score collection subunit 331 is configured to, for each engine combination, collect users' scores for the accuracy of the translation results of the engine combination; and the third statistical subunit 332 is configured to compute a statistic over the scores collected within a period of time (e.g., within one month, six months, or one year), such as the average score, and take the result as the translation accuracy of the engine combination.
  • the apparatus further includes a screening module 50 configured to further filter the engine combinations according to the processing delay, filtering out engine combinations whose processing delay is too long (that is, whose translation speed is too slow), thereby reducing the amount of data in the database, saving storage space, and improving operating efficiency.
  • the screening module 50 includes a determining unit 51 and a discarding unit 52, wherein: the determining unit 51 is configured to compare the processing delay of the engine combination with a threshold, and determine whether the processing delay of the engine combination is greater than or equal to the threshold.
  • the discarding unit 52 is configured to discard the engine combination when the processing delay is greater than or equal to the threshold.
  • the apparatus further includes a recommendation module 60 configured to recommend a suitable engine combination to the user.
  • the recommendation module 60 includes a determining unit and a recommending unit, wherein: the determining unit is configured to determine the translation service required by the user; and the recommending unit is configured to recommend an engine combination to the user according to the engine combination selection set of the translation service.
  • the recommending unit may include a condition acquisition subunit 61 and a reconciliation subunit 62, as shown in FIG. 17, wherein: the condition acquisition subunit 61 is configured to acquire a filter condition for the engine combination; and the reconciliation subunit 62 is configured to select, from the engine combination selection set, the priority queue of engine combinations that meets the filter condition, for the user to choose from.
  • the condition acquisition subunit 61 may provide options for the engine combination filter condition for the user to select, and acquire the filter condition selected by the user. The filter conditions include: low processing delay; low usage fee; high translation accuracy; low processing delay + low usage fee; low processing delay + high translation accuracy; low usage fee + high translation accuracy; and low processing delay + low usage fee + high translation accuracy.
  • the recommending unit may also include a condition acquisition subunit 61 and a selection subunit 63, as shown in FIG. 18, wherein: the condition acquisition subunit 61 is configured to acquire a filter condition for the engine combination; and the selection subunit 63 is configured to select, from the engine combination selection set, the engine combination that best matches the filter condition.
  • the recommendation module 60 can automatically select an optimal engine combination for the user, or automatically select the engine combination that best meets the user's needs according to the user's usage habits or user settings (such as translation fee cap settings).
  • the apparatus for speech translation and the method of speech translation provided in the above embodiments are all based on the same inventive concept. Therefore, the specific functions of the function modules/units of the specific embodiments in the apparatus for voice translation can be referred to the foregoing method embodiments, and details are not described herein again.
  • the apparatus for implementing speech translation freely combines, for each translation service, different speech recognition engines, text translation engines, and speech synthesis engines into a plurality of engine combinations, then prioritizes the engine combinations of each translation service according to their feature information and generates an engine combination selection set for use in subsequent translation.
  • the invention also proposes a speech translation device comprising a memory, a processor, and at least one application stored in the memory and configured to be executed by the processor, the application being configured to perform the method for implementing speech translation.
  • the method comprises the steps of: establishing a speech recognition engine set, a text translation engine set, and a speech synthesis engine set; for each translation service, selecting a speech recognition engine, a text translation engine, and a speech synthesis engine supporting the translation service from the speech recognition engine set, the text translation engine set, and the speech synthesis engine set, respectively, and forming at least two engine combinations; acquiring feature information of each engine combination; and prioritizing the engine combinations of each translation service according to the feature information and generating an engine combination selection set of the translation service. The method for implementing speech translation described in this embodiment is the method involved in the foregoing embodiments of the present invention, and details are not described herein again.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Machine Translation (AREA)

Abstract

A method and device for translating speech and a speech translation apparatus. The method comprises the following steps: establishing a speech recognition engine set, a text translation engine set, and a speech synthesis engine set (S11); for each translation service, respectively selecting, from the speech recognition engine set, the text translation engine set, and the speech synthesis engine set, a speech recognition engine, a text translation engine, and a speech synthesis engine supporting a given translation service, and forming at least two engine combinations (S12); obtaining feature information of each engine combination (S13); and prioritizing the engine combinations of each translation service according to the feature information, and generating an engine combination selection set of the given translation service (S14). The present invention enables the free combination of a variety of engines, and utilizes the advantages of different engines, thereby improving the flexibility of translation, and greatly improving translation performance.

Description

Method and apparatus for implementing speech translation, and speech translation device
Technical field
The present invention relates to the field of speech translation technology, and in particular to a method and an apparatus for implementing speech translation, and to a speech translation device.
Background art
With rapid economic development, international exchange has become increasingly common, yet for many people the language barrier remains a major obstacle to it. To address this problem, a wide variety of speech translation devices have appeared on the market. Thanks to their powerful translation functions, speech translation devices are popular with people who need language translation, and they are also a useful aid for learning foreign languages. A speech translation device can translate while two parties are talking, so that users who speak different languages can communicate without obstruction.
The general translation flow of a speech translation device is as follows: the speech translation device receives the user's original speech information and sends it to a speech translation engine; the speech translation engine translates the original speech information into target speech information (from one language into another) and returns it to the speech translation device; the speech translation device then outputs the target speech information. Current speech translation engines mainly include the Google engine, the Microsoft engine, the IBM engine, the iFlytek (Xunfei) engine, the Baidu engine, the Kingsoft (Jinshan) engine and so on. Each speech translation engine in turn comprises a speech recognition engine, a text translation engine and a speech synthesis engine, and the engines differ in the languages they support, their billing standards, their processing delay and their translation accuracy.
However, current speech translation devices support only a single engine. For example, a device may support only the Baidu engine and implement speech translation through Baidu's speech recognition engine, text translation engine and speech synthesis engine. The Baidu engine, however, can currently translate only a dozen or so mainstream languages and cannot translate certain minority languages. Some other engines may be able to translate minority languages, but they may be unsatisfactory in terms of usage fee, translation speed or translation accuracy.
It can be seen that existing speech translation devices have poor translation flexibility and low translation performance, cannot provide users with high-quality translation services, and give a poor user experience.
Technical problem
The main object of the present invention is to provide a method and an apparatus for implementing speech translation, and a speech translation device, with the aim of improving translation flexibility and thereby improving translation performance.
Technical solution
To achieve the above object, an embodiment of the present invention provides a method for implementing speech translation, the method comprising the following steps:
establishing a speech recognition engine set, a text translation engine set and a speech synthesis engine set;
for each translation service, selecting from the speech recognition engine set, the text translation engine set and the speech synthesis engine set, respectively, the speech recognition engines, text translation engines and speech synthesis engines that support the translation service, and forming at least two engine combinations;
acquiring feature information of each engine combination;
prioritizing the engine combinations of each translation service according to the feature information, and generating an engine combination selection set for the translation service.
An embodiment of the present invention also provides an apparatus for implementing speech translation, the apparatus comprising:
an establishing module, configured to establish a speech recognition engine set, a text translation engine set and a speech synthesis engine set;
a combination module, configured to, for each translation service, select from the speech recognition engine set, the text translation engine set and the speech synthesis engine set, respectively, the speech recognition engines, text translation engines and speech synthesis engines that support the translation service, and form at least two engine combinations;
an acquisition module, configured to acquire feature information of each engine combination;
a sorting module, configured to prioritize the engine combinations of each translation service according to the feature information and generate an engine combination selection set for the translation service.
An embodiment of the present invention further provides a speech translation device comprising a memory, a processor and at least one application program that is stored in the memory and configured to be executed by the processor, the application program being configured to perform the foregoing method for implementing speech translation.
Beneficial effects
In the method for implementing speech translation provided by the embodiments of the present invention, for each translation service, different speech recognition engines, text translation engines and speech synthesis engines are freely combined into multiple engine combinations; the engine combinations of each translation service are then prioritized according to their feature information to generate an engine combination selection set from which a combination can be chosen during subsequent translation. This allows engines from different providers to be combined freely and the strengths of each to be exploited, which improves translation flexibility and in turn translation performance. It both extends the range of translatable languages and accommodates a user's particular requirements regarding translation fee, translation speed and translation accuracy, so that users can be offered a better translation service and a better user experience.
Description of the drawings
FIG. 1 is a flowchart of a first embodiment of the method for implementing speech translation according to the present invention;
FIG. 2 is a schematic diagram of a speech recognition engine set in an embodiment of the present invention;
FIG. 3 is a schematic diagram of a text translation engine set in an embodiment of the present invention;
FIG. 4 is a schematic diagram of a speech synthesis engine set in an embodiment of the present invention;
FIG. 5 is a schematic diagram of an engine language database in an embodiment of the present invention;
FIG. 6 is a schematic diagram of an engine charging list in an embodiment of the present invention;
FIG. 7 is a flowchart of a second embodiment of the method for implementing speech translation according to the present invention;
FIG. 8 is a block diagram of a first embodiment of the apparatus for implementing speech translation according to the present invention;
FIG. 9 is a block diagram of the acquisition module in FIG. 8;
FIG. 10 is a block diagram of the first acquisition unit in FIG. 9;
FIG. 11 is a block diagram of the first statistics unit in FIG. 10;
FIG. 12 is a block diagram of the second acquisition unit in FIG. 9;
FIG. 13 is a block diagram of the third acquisition unit in FIG. 9;
FIG. 14 is a block diagram of a second embodiment of the apparatus for implementing speech translation according to the present invention;
FIG. 15 is a block diagram of the screening module in FIG. 14;
FIG. 16 is a block diagram of a third embodiment of the apparatus for implementing speech translation according to the present invention;
FIG. 17 is a block diagram of the recommendation module in FIG. 16;
FIG. 18 is another block diagram of the recommendation module in FIG. 16.
The realization of the object, the functional features and the advantages of the present invention are further described below with reference to the embodiments and the accompanying drawings.
Best mode for carrying out the invention
It should be understood that the specific embodiments described here serve only to explain the present invention and are not intended to limit it.
Embodiments of the present invention are described in detail below, and examples of the embodiments are shown in the accompanying drawings, in which the same or similar reference numerals throughout denote the same or similar elements or elements having the same or similar functions. The embodiments described below with reference to the drawings are exemplary; they serve only to explain the present invention and are not to be construed as limiting it.
Referring to FIG. 1, a first embodiment of the method for implementing speech translation according to the present invention is proposed. The method comprises the following steps:
S11. Establish a speech recognition engine set, a text translation engine set and a speech synthesis engine set.
In this embodiment of the present invention, the speech translation device collects information on speech recognition engines and establishes the speech recognition engine set; collects information on text translation engines and establishes the text translation engine set; and collects information on speech synthesis engines and establishes the speech synthesis engine set.
The information on a speech recognition engine includes the engine name, the languages supported and so on. Engine names include the Google speech recognition engine, the Microsoft speech recognition engine, the IBM speech recognition engine, the Nuance speech recognition engine, the Baidu speech recognition engine and others, and the engines differ in the languages they support. FIG. 2 shows an example of a speech recognition engine set, in which M denotes the engine name and X denotes a supported language.
The information on a text translation engine includes the engine name, the languages supported and so on. Engine names include the Google translation engine, the Microsoft translation engine, the IBM translation engine, the iFlytek (Xunfei) translation engine, the Baidu translation engine, the Kingsoft (Jinshan) translation engine and others, and the engines differ in the languages they support. FIG. 3 shows an example of a text translation engine set, in which N denotes the engine name and X denotes a supported language.
The information on a speech synthesis engine includes the engine name, the languages supported and so on. Engine names include the Nuance synthesis engine, the Microsoft speech synthesis engine, the IBM speech synthesis engine, the Baidu speech synthesis engine and others, and the engines differ in the languages they support. FIG. 4 shows an example of a speech synthesis engine set, in which K denotes the engine name and X denotes a supported language.
Optionally, the speech recognition engine set, the text translation engine set and the speech synthesis engine set may be merged into a single engine language database. FIG. 5 shows an example of such an engine language database.
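As an illustration of step S11, the three engine sets and the merged engine language database could be represented as follows. This is only a sketch: the `Engine` class, the `kind` labels, the placeholder engine names (M1, N1, K1, ...) and the language codes are assumptions made for the example and are not taken from FIGS. 2 to 5.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Engine:
    name: str             # engine name (M, N or K in the figures)
    kind: str             # "recognition", "translation" or "synthesis" (assumed labels)
    languages: frozenset  # supported languages (X in the figures)


# Placeholder engine sets; real entries would be collected by the device.
RECOGNITION_SET = [
    Engine("M1", "recognition", frozenset({"zh", "en", "ja"})),
    Engine("M2", "recognition", frozenset({"zh", "bg"})),
]
TRANSLATION_SET = [
    Engine("N1", "translation", frozenset({"zh", "en", "bg"})),
]
SYNTHESIS_SET = [
    Engine("K1", "synthesis", frozenset({"zh", "en"})),
    Engine("K2", "synthesis", frozenset({"zh", "bg"})),
]


def build_language_database(*engine_sets):
    """Merge the engine sets into one table keyed by (kind, name)."""
    return {
        (engine.kind, engine.name): engine.languages
        for engine_set in engine_sets
        for engine in engine_set
    }


database = build_language_database(RECOGNITION_SET, TRANSLATION_SET, SYNTHESIS_SET)
```

Keying the merged table by engine kind and name keeps the three original sets recoverable from the single database.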
S12. For each translation service, select from the speech recognition engine set, the text translation engine set and the speech synthesis engine set, respectively, the speech recognition engines, text translation engines and speech synthesis engines that support the translation service, and form at least two engine combinations.
In this embodiment of the present invention, a translation service is a service of mutual translation between two languages, such as a Chinese-English translation service, a Chinese-Japanese translation service, a Japanese-English translation service, a Japanese-Korean translation service, a French-English translation service or a Chinese-Bulgarian translation service. For each translation service, at least two engine combinations of the form speech recognition + text translation + speech synthesis that support the translation service are assembled; that is, each engine combination contains one speech recognition engine, one text translation engine and one speech synthesis engine.
Taking the Chinese-Bulgarian translation service as an example, the speech translation device selects from the speech recognition engine set the speech recognition engines that support the service, that is, those that support both Chinese and Bulgarian; selects from the text translation engine set the text translation engines that support both Chinese and Bulgarian; and selects from the speech synthesis engine set the speech synthesis engines that support both Chinese and Bulgarian. It then uses the selected engines to form at least two engine combinations capable of providing the Chinese-Bulgarian translation service, each containing one speech recognition engine, one text translation engine and one speech synthesis engine.
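The selection and combination in step S12 amounts to filtering each engine set by language support and taking the Cartesian product of the survivors. The sketch below assumes each engine set is a mapping from an engine name to the set of languages it supports; the engine names and language codes are placeholders, not data from the patent.

```python
from itertools import product

# Hypothetical engine sets: engine name -> supported languages.
recognition_set = {"M1": {"zh", "en", "bg"}, "M2": {"zh", "bg"}, "M3": {"zh", "en"}}
translation_set = {"N1": {"zh", "en", "bg"}, "N2": {"zh", "en"}}
synthesis_set = {"K1": {"zh", "bg"}, "K2": {"zh", "bg", "ja"}}


def engines_supporting(engine_set, service):
    """Names of the engines that support both languages of the service."""
    return [name for name, langs in engine_set.items() if set(service) <= langs]


def build_combinations(service):
    """All (recognition, translation, synthesis) triples for one service."""
    return list(product(
        engines_supporting(recognition_set, service),
        engines_supporting(translation_set, service),
        engines_supporting(synthesis_set, service),
    ))


combos = build_combinations(("zh", "bg"))
# The method calls for at least two combinations per service;
# a caller could check len(combos) >= 2 before going on to step S13.
```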
S13. Acquire feature information of each engine combination.
In this embodiment of the present invention, the feature information of an engine combination includes processing delay, usage fee, translation accuracy and so on, and the speech translation device may acquire at least one of these kinds of feature information.
When acquiring the processing delay, for each engine combination the speech translation device performs a speech translation test with that engine combination, measures the time taken to complete one speech translation test, and takes the measured time as the processing delay of the engine combination.
In a specific implementation, the speech translation device may obtain original speech information in each language to serve as test templates, for example sentences such as "How is the weather today?", "Where are you from?" and "What is your name?", and then perform the speech translation test with this original speech information. During the test, the speech translation device separately measures a first time taken by the speech recognition engine for speech recognition, a second time taken by the text translation engine for text translation, and a third time taken by the speech synthesis engine for speech synthesis, and finally computes the sum of the first, second and third times as the time taken to complete one speech translation test.
Take the testing of an engine combination that supports the Chinese-English translation service as an example. The speech translation device sends Chinese original speech information to the speech recognition engine of the combination, which recognizes it and produces Chinese text information (such as a character string), and the device records the processing time t1. The device sends the Chinese text information to the text translation engine of the combination, which translates it into English text information, and the device records the processing time t2. The device sends the English text information to the speech synthesis engine, which synthesizes English speech information, and the device records the processing time t3. Finally, the device computes the total time for recognition + translation + synthesis, T = t1 + t2 + t3. T is the processing delay of the engine combination.
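The timing procedure described above, T = t1 + t2 + t3, could be sketched as follows. The three engines are modeled as plain callables standing in for real engine calls, which is an assumption of this example; `timed` and `measure_latency` are hypothetical helper names.

```python
import time


def timed(stage, payload):
    """Run one stage and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = stage(payload)
    return result, time.perf_counter() - start


def measure_latency(recognize, translate, synthesize, source_audio):
    """Run recognition, translation and synthesis in turn and time each."""
    source_text, t1 = timed(recognize, source_audio)
    target_text, t2 = timed(translate, source_text)
    target_audio, t3 = timed(synthesize, target_text)
    return target_audio, (t1, t2, t3), t1 + t2 + t3


# Stub engines standing in for real network calls.
audio, parts, total = measure_latency(
    lambda speech: "今天天气怎样",
    lambda text: "How is the weather today?",
    lambda text: b"synthesized-audio",
    b"raw-audio",
)
```

With real engines, the measured times would include network round trips, so averaging over several test sentences would probably give a steadier figure than a single run.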
Further, the speech translation device may also screen the engine combinations by processing delay, filtering out engine combinations whose processing delay is too long, that is, whose translation speed is too slow. This reduces the amount of data in the database, saves storage space and improves operating efficiency.
Specifically, the speech translation device compares the processing delay of an engine combination with a threshold and determines whether the processing delay is greater than or equal to the threshold; when the processing delay is greater than or equal to the threshold, the engine combination is discarded. The threshold may be set according to actual needs or may be customized by the user.
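The threshold screening can be expressed in a few lines. In this sketch the measured delays are held in a dictionary keyed by engine combination; the combination names, delay values and threshold are illustrative only.

```python
def drop_slow_combinations(latencies, threshold):
    """Keep only combinations whose delay is strictly below the threshold.

    A delay greater than or equal to the threshold causes the
    combination to be discarded, as described in the text.
    """
    return {combo: delay for combo, delay in latencies.items() if delay < threshold}


# Hypothetical measured delays in seconds.
measured = {
    ("M1", "N1", "K1"): 1.2,
    ("M1", "N1", "K2"): 3.0,
    ("M2", "N1", "K1"): 4.5,
}
kept = drop_slow_combinations(measured, threshold=3.0)
```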
When acquiring the usage fee, for each engine combination the speech translation device obtains the charging standard of each engine in the combination and computes the usage fee of the combination from those charging standards. A charging standard contains information such as the billing mode and the billing price. Billing modes vary from engine to engine and mainly include billing by time, billing by number of calls and billing by number of characters; even where the billing mode is the same, the billing prices may differ.
In a specific implementation, the speech translation device may collect the charging standard of each engine and build an engine charging list as shown in FIG. 6, in which H denotes the charging standard of a speech recognition engine, I denotes the charging standard of a text translation engine, and J denotes the charging standard of a speech synthesis engine. For each engine combination, the speech translation device queries the engine charging list to obtain the charging standards of the speech recognition engine, the text translation engine and the speech synthesis engine in the combination, and combines the three to compute the usage fee of the combination.
The speech translation device may use the engine combination to perform a speech translation test on original speech information and take the fee incurred by that test as the usage fee of the engine combination. Optionally, when the three engines in a combination share the same billing mode, the speech translation device may also compute the usage fee of the combination directly from the three billing prices. For example, if the speech recognition engine charges h yuan per minute, the text translation engine charges i yuan per minute and the speech synthesis engine charges j yuan per minute, the usage fee of the engine combination formed by these three engines is (h + i + j) yuan per minute.
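Both ways of deriving the usage fee can be sketched as below. The `Rate` class, the billing-mode labels and the prices are assumptions for illustration. The test-run estimate also assumes, for simplicity, that all three engines are billed against the same usage figures for the run, which a real deployment would likely refine.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rate:
    mode: str     # "per_minute", "per_call" or "per_character" (assumed labels)
    price: float  # price per billing unit, in yuan


def combined_rate(h, i, j):
    """Sum the three prices when all engines share one billing mode."""
    if not (h.mode == i.mode == j.mode):
        raise ValueError("billing modes differ; estimate the fee from a test run")
    return Rate(h.mode, h.price + i.price + j.price)


def fee_for_test_run(rates, minutes, calls, characters):
    """Fee of one test run when the engines use different billing modes."""
    usage = {"per_minute": minutes, "per_call": calls, "per_character": characters}
    return sum(rate.price * usage[rate.mode] for rate in rates)


same_mode = combined_rate(
    Rate("per_minute", 0.5), Rate("per_minute", 0.25), Rate("per_minute", 0.25)
)
mixed_fee = fee_for_test_run(
    [Rate("per_minute", 0.5), Rate("per_call", 0.25), Rate("per_character", 0.125)],
    minutes=2, calls=1, characters=8,
)
```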
When acquiring the translation accuracy, for each engine combination the speech translation device collects users' ratings of the accuracy of the translation results produced by the combination, then aggregates the ratings collected over a period of time (for example one month, six months or one year), for instance by computing their average, and takes the aggregated result as the translation accuracy of the engine combination.
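A minimal sketch of this aggregation, assuming each rating is stored as a (timestamp, score) pair and that a plain average is used; the 30-day window, the rating scale and the sample data are all illustrative.

```python
from datetime import datetime, timedelta
from statistics import mean


def translation_accuracy(ratings, now, window=timedelta(days=30)):
    """Average the user scores that fall inside the time window.

    Returns None when no rating was collected in the window.
    """
    recent = [score for when, score in ratings if now - window <= when <= now]
    return mean(recent) if recent else None


# Hypothetical user ratings for one engine combination.
ratings = [
    (datetime(2018, 2, 20), 4),
    (datetime(2018, 2, 25), 5),
    (datetime(2017, 12, 1), 1),  # outside the 30-day window
]
accuracy = translation_accuracy(ratings, now=datetime(2018, 3, 1))
```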
Those skilled in the art will understand that the speech translation device may also acquire other feature information of the engine combinations according to actual needs, which is not enumerated here.
S14. Prioritize the engine combinations of each translation service according to the feature information, and generate an engine combination selection set for the translation service.
In this embodiment of the present invention, the speech translation device prioritizes the engine combinations of the same translation service according to the feature information and generates the engine combination selection set of the translation service, from which a combination can be chosen during subsequent translation.
The speech translation device may prioritize the engine combinations according to a single kind of feature information. For example, it may rank them by processing delay from low to high, by usage fee from low to high, or by translation accuracy from high to low.
The speech translation device may also prioritize the engine combinations according to a combination of at least two kinds of feature information. For example, it may rank them by the criterion of low processing delay and low usage fee, by low processing delay and high translation accuracy, by low usage fee and high translation accuracy, or by low processing delay, low usage fee and high translation accuracy.
The speech translation device stores the above priority queues as the engine combination selection set of the translation service, so that in the end every translation service has its own engine combination selection set. Subsequently, a better translation service can be provided to the user on the basis of this selection set and the user's requirements.
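The prioritization in step S14 could be sketched as below. The text does not say how several kinds of feature information are weighed against one another, so this example assumes a lexicographic ordering (the first criterion decides, later ones break ties); a weighted score would be an equally plausible reading. The `Combo` class and the sample values are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Combo:
    name: str
    latency: float   # processing delay, lower is better
    fee: float       # usage fee, lower is better
    accuracy: float  # translation accuracy, higher is better


# Sort keys; accuracy is negated so that higher accuracy ranks first.
SORT_KEYS = {
    "latency": lambda c: c.latency,
    "fee": lambda c: c.fee,
    "accuracy": lambda c: -c.accuracy,
}


def prioritize(combos, criteria):
    """Order combinations by the given criteria, compared lexicographically."""
    return sorted(combos, key=lambda c: tuple(SORT_KEYS[k](c) for k in criteria))


def build_selection_set(combos, orderings):
    """One priority queue of combination names per requested ordering."""
    return {o: [c.name for c in prioritize(combos, o)] for o in orderings}


combos = [
    Combo("A", latency=1.0, fee=3.0, accuracy=0.90),
    Combo("B", latency=2.0, fee=1.0, accuracy=0.80),
    Combo("C", latency=1.0, fee=2.0, accuracy=0.95),
]
selection_set = build_selection_set(
    combos, [("latency",), ("fee",), ("accuracy",), ("latency", "fee")]
)
```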
Further, as shown in FIG. 7, in a second embodiment of the method for implementing speech translation according to the present invention, the following steps are included after step S14:
S15. Determine the translation service required by the user.
The speech translation device may present translation service options for the user to choose from and determine the required translation service from the user's choice. The speech translation device may also receive information on a translation service entered by the user and determine the required translation service from the entered information.
S16. Recommend an engine combination to the user according to the engine combination selection set of that translation service.
Optionally, the speech translation device then obtains the user's screening condition for engine combinations and retrieves from the engine combination selection set the priority queue of engine combinations that matches the screening condition, for the user to choose from.
The speech translation device may present screening condition options for the user to choose from. The screening conditions include low processing delay, low usage fee, high translation accuracy, low processing delay + low usage fee, low processing delay + high translation accuracy, low usage fee + high translation accuracy, and low processing delay + low usage fee + high translation accuracy.
Optionally, the speech translation device then obtains the user's screening condition for engine combinations and selects from the engine combination selection set the engine combination that best matches the screening condition.
When the user selects the screening condition of low processing delay, the speech translation device selects the engine combination with the lowest processing delay. When the user selects low usage fee, the device selects the engine combination with the lowest usage fee. When the user selects high translation accuracy, the device selects the engine combination with the highest translation accuracy. When the user selects low processing delay + low usage fee, the device selects the highest-priority engine combination from the priority queue for low processing delay + low usage fee. When the user selects low processing delay + high translation accuracy, the device selects the highest-priority engine combination from the priority queue for low processing delay + high translation accuracy. When the user selects low usage fee + high translation accuracy, the device selects the highest-priority engine combination from the priority queue for low usage fee + high translation accuracy. When the user selects low processing delay + low usage fee + high translation accuracy, the device selects the highest-priority engine combination from the priority queue for low processing delay + low usage fee + high translation accuracy.
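The two recommendation variants of step S16 (offering the whole priority queue, or picking its head) can be sketched as a lookup in a stored selection set. The layout used here, a dictionary keyed by the set of chosen screening conditions, and the queue contents are assumptions made for the example.

```python
# Hypothetical selection set for one translation service: each key is the
# set of screening conditions, each value is a priority queue of
# engine combinations, best first.
selection_set = {
    frozenset({"latency"}): ["A", "C", "B"],
    frozenset({"fee"}): ["B", "C", "A"],
    frozenset({"accuracy"}): ["C", "A", "B"],
    frozenset({"latency", "fee"}): ["C", "A", "B"],
}


def recommend_queue(selection_set, conditions):
    """Return the whole priority queue for the user to choose from."""
    return list(selection_set[frozenset(conditions)])


def recommend_best(selection_set, conditions):
    """Return only the highest-priority combination for the conditions."""
    return selection_set[frozenset(conditions)][0]


best = recommend_best(selection_set, ["fee", "latency"])
queue = recommend_queue(selection_set, ["accuracy"])
```

Using a `frozenset` as the key means the order in which the user ticks the conditions does not matter.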
在其它实施例中,语音翻译设备可以自动为用户选取最优的引擎组合,也可以根据用户的使用习惯或用户设置(如翻译费用上限设置等)自动为用户选取最符合用户需求的引擎组合。In other embodiments, the voice translation device may automatically select an optimal engine combination for the user, or automatically select the engine combination that best meets the user's needs according to the user's usage habits or user settings (such as translation fee cap settings).
本发明实施例实现语音翻译的方法,针对每一种翻译服务,将不同的语音识别引擎、文本翻译引擎和语音合成引擎自由组合成多种引擎组合,再根据引擎组合的特征信息对每一种翻译服务的引擎组合进行优先级排序,生成引擎组合选择集,以供后续翻译时选择使用。从而实现了对多种引擎的自由组合,充分利用了不同引擎的优势,提高了翻译的灵活性,进而大大提高了翻译性能,既扩展了可翻译语言的范围,又可以满足用户对翻译费用、翻译速度、翻译准确度等方面的特别需求,能够为用户提供更加优质的翻译服务,极大的提升了用户体验。The embodiment of the present invention implements a method for voice translation. For each translation service, different voice recognition engines, a text translation engine, and a voice synthesis engine are freely combined into a plurality of engine combinations, and then according to the feature information of the engine combination. The translation service's engine combination prioritizes and generates an engine combination selection set for use in subsequent translations. Thereby, the free combination of multiple engines is realized, the advantages of different engines are fully utilized, the flexibility of translation is improved, and the translation performance is greatly improved, which not only expands the range of translatable languages, but also satisfies users' translation costs. The special needs of translation speed, translation accuracy, etc., can provide users with better translation services, greatly improving the user experience.
本发明实施例所述的语音翻译设备,可以是专业的翻译机,也可以各种终端设备,所述终端设备如手机、平板等移动终端,个人电脑、笔记本电脑等计算机终端等。可以在前述翻译机和终端设备上安装特定的应用(APP),通过应用实现本发明实施例的实现语音翻译的方法。The voice translation device in the embodiment of the present invention may be a professional translation machine or a variety of terminal devices, such as mobile terminals such as mobile phones and tablets, computer terminals such as personal computers and notebook computers, and the like. A specific application (APP) may be installed on the foregoing translation machine and the terminal device, and the method for realizing the speech translation of the embodiment of the present invention is implemented by using the application.
Referring to FIG. 8, a first embodiment of the apparatus for implementing speech translation of the present invention is provided. The apparatus includes an establishing module 10, a combination module 20, an acquisition module 30, and a sorting module 40, wherein: the establishing module 10 is configured to establish a speech recognition engine set, a text translation engine set, and a speech synthesis engine set; the combination module 20 is configured to, for each translation service, select from the speech recognition engine set, the text translation engine set, and the speech synthesis engine set, respectively, a speech recognition engine, a text translation engine, and a speech synthesis engine that support the translation service, and to form at least two engine combinations; the acquisition module 30 is configured to acquire feature information of each engine combination; and the sorting module 40 is configured to prioritize the engine combinations of each translation service according to the feature information and to generate an engine combination selection set for the translation service, for use in subsequent translation.
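The work of the combination module can be sketched as follows. This is a minimal illustration, not the patent's implementation: the `Engine` class, the `services` field, the service label `"zh-en"`, and the function name `build_combinations` are all hypothetical, and the sketch assumes each engine simply declares the set of translation services it supports.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Engine:
    """Hypothetical engine record; field names are assumptions."""
    name: str
    kind: str             # "asr", "mt" or "tts"
    services: frozenset   # translation services this engine supports


def build_combinations(asr_set, mt_set, tts_set, service):
    """Return every (ASR, MT, TTS) triple whose engines all support `service`."""
    asr = [e for e in asr_set if service in e.services]
    mt = [e for e in mt_set if service in e.services]
    tts = [e for e in tts_set if service in e.services]
    # The Cartesian product yields all free combinations of the three stages.
    return list(product(asr, mt, tts))
```

With two supporting speech recognition engines, one text translation engine, and one speech synthesis engine, the function returns two combinations, which satisfies the "at least two engine combinations" requirement for that service.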
In this embodiment of the present invention, the establishing module 10 collects information on speech recognition engines to build the speech recognition engine set, collects information on text translation engines to build the text translation engine set, and collects information on speech synthesis engines to build the speech synthesis engine set.
Optionally, the establishing module 10 may merge the speech recognition engine set, the text translation engine set, and the speech synthesis engine set into a single engine-language database.
In this embodiment of the present invention, the feature information of an engine combination includes processing delay, usage fee, translation accuracy, and the like, and the acquisition module 30 may acquire at least one of these kinds of feature information.
As shown in FIG. 9, in this embodiment of the present invention, the acquisition module 30 includes a first acquisition unit 31 configured to acquire the processing delay of an engine combination.
As shown in FIG. 10, the first acquisition unit 31 includes a translation test sub-unit 311 and a first statistical sub-unit 312, wherein: the translation test sub-unit 311 is configured to, for each engine combination, perform a speech translation test using that engine combination; and the first statistical sub-unit 312 is configured to measure the time taken to complete one speech translation test and to use the measured time as the processing delay of the engine combination.
As shown in FIG. 11, the first statistical sub-unit 312 includes a first calculation sub-unit 3121 and a second calculation sub-unit 3122, wherein: the first calculation sub-unit 3121 is configured to calculate, during the speech translation test, a first time taken by the speech recognition engine to perform speech recognition, a second time taken by the text translation engine to perform text translation, and a third time taken by the speech synthesis engine to perform speech synthesis; and the second calculation sub-unit 3122 is configured to calculate the sum of the first time, the second time, and the third time and to use the result as the time taken to complete one speech translation test.
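The delay measurement described above can be sketched as follows. The function name `measure_latency` and the three callables `recognize`, `translate`, and `synthesize` are hypothetical stand-ins for the engines of one combination; the patent does not specify how each stage is timed, so wall-clock timing with `time.perf_counter` is an assumption.

```python
import time


def measure_latency(recognize, translate, synthesize, audio):
    """Run one speech translation test and time each stage separately."""
    t0 = time.perf_counter()
    text = recognize(audio)          # speech recognition -> first time
    t1 = time.perf_counter()
    translated = translate(text)     # text translation   -> second time
    t2 = time.perf_counter()
    synthesize(translated)           # speech synthesis   -> third time
    t3 = time.perf_counter()

    first, second, third = t1 - t0, t2 - t1, t3 - t2
    # The processing delay of the combination is the sum of the three times.
    return {"asr": first, "mt": second, "tts": third,
            "total": first + second + third}
```

In practice one would probably repeat the test several times and aggregate the totals, since a single run is sensitive to network jitter; the source only describes a single test.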
Further, the acquisition module 30 also includes a second acquisition unit 32 configured to acquire the usage fee of an engine combination.
As shown in FIG. 12, the second acquisition unit 32 includes a fee acquisition sub-unit 321 and a second statistical sub-unit 322, wherein: the fee acquisition sub-unit 321 is configured to, for each engine combination, acquire the charging standard of each engine in that engine combination; and the second statistical sub-unit 322 is configured to calculate the usage fee of the engine combination from the charging standard of each engine.
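A usage-fee calculation along these lines might look like the sketch below. The charging units are assumptions chosen for illustration (speech recognition billed per second of audio, text translation and speech synthesis billed per character); the source only says that each engine has its own charging standard. `Decimal` is used to avoid floating-point rounding in monetary sums.

```python
from decimal import Decimal


def combination_fee(asr_per_second, mt_per_char, tts_per_char,
                    audio_seconds, source_chars, target_chars):
    """Sum the per-engine fees for one translation request.

    All rates are hypothetical Decimal prices in the same currency.
    """
    asr_fee = asr_per_second * audio_seconds
    mt_fee = mt_per_char * source_chars
    tts_fee = tts_per_char * target_chars
    return asr_fee + mt_fee + tts_fee
```

Evaluating every combination against the same reference request (same audio length and character counts) keeps the resulting fees comparable when they are later used for ranking.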
Further, the acquisition module 30 also includes a third acquisition unit 33 configured to acquire the translation accuracy of an engine combination.
As shown in FIG. 13, the third acquisition unit 33 includes a rating collection sub-unit 331 and a third statistical sub-unit 332, wherein: the rating collection sub-unit 331 is configured to, for each engine combination, collect users' ratings of the accuracy of that engine combination's translation results; and the third statistical sub-unit 332 is configured to compute a statistic (for example, the average) over the ratings collected within a period of time (for example, within one month, half a year, or one year) and to use the result as the translation accuracy of the engine combination.
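The rating statistic can be sketched as a windowed average. The function name, the `(timestamp, score)` tuple layout, and the 30-day default window are assumptions; the source names the average and periods such as one month only as examples.

```python
from datetime import datetime, timedelta
from statistics import mean


def translation_accuracy(ratings, now, window=timedelta(days=30)):
    """Average the user ratings collected within `window` before `now`.

    `ratings` is an iterable of (timestamp, score) pairs.
    Returns None when no rating falls inside the window.
    """
    recent = [score for ts, score in ratings if now - window <= ts <= now]
    return mean(recent) if recent else None
```

Returning `None` for an empty window lets the caller decide how to rank a combination that has not been rated yet, rather than silently treating it as the worst or best candidate.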
Further, as shown in FIG. 14, in a second embodiment of the apparatus for implementing speech translation of the present invention, the apparatus also includes a screening module 50 configured to further screen the engine combinations according to processing delay, filtering out engine combinations whose processing delay is too long (that is, whose translation speed is too slow), thereby reducing the amount of data in the database, saving storage space, and improving operating efficiency.
As shown in FIG. 15, the screening module 50 includes a judging unit 51 and a discarding unit 52, wherein: the judging unit 51 is configured to compare the processing delay of an engine combination with a threshold and determine whether the processing delay is greater than or equal to the threshold; and the discarding unit 52 is configured to discard the engine combination when its processing delay is greater than or equal to the threshold.
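The screening step reduces to a threshold comparison. In this sketch the mapping from combination identifier to processing delay and the function name `filter_by_latency` are hypothetical; the comparison itself (discard when the delay is greater than or equal to the threshold) follows the text.

```python
def filter_by_latency(latencies, threshold):
    """Keep only combinations whose processing delay is below `threshold`.

    `latencies` maps a combination identifier to its processing delay.
    A delay greater than or equal to the threshold causes a discard.
    """
    return {combo: delay for combo, delay in latencies.items()
            if delay < threshold}
```

Note that a combination whose delay exactly equals the threshold is discarded, matching the "greater than or equal to" wording.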
Further, as shown in FIG. 16, in a third embodiment of the apparatus for implementing speech translation of the present invention, the apparatus also includes a recommendation module 60 configured to recommend a suitable engine combination to the user.
In this embodiment of the present invention, the recommendation module 60 includes a determining unit and a recommendation unit, wherein: the determining unit is configured to determine the translation service the user needs; and the recommendation unit is configured to recommend an engine combination to the user according to the engine combination selection set of that translation service.
As shown in FIG. 17, the recommendation unit may include a condition acquisition sub-unit 61 and a retrieval sub-unit 62, wherein: the condition acquisition sub-unit 61 is configured to acquire a screening condition for engine combinations; and the retrieval sub-unit 62 is configured to retrieve, from the engine combination selection set, the priority queue of engine combinations that meet the screening condition, for the user to choose from.
The condition acquisition sub-unit 61 may present screening-condition options for the user to choose from and acquire the screening condition the user selects. The screening conditions include: low processing delay; low usage fee; high translation accuracy; low processing delay + low usage fee; low processing delay + high translation accuracy; low usage fee + high translation accuracy; and low processing delay + low usage fee + high translation accuracy.
Alternatively, as shown in FIG. 18, the recommendation unit may include the condition acquisition sub-unit 61 and a selection sub-unit 63, wherein: the condition acquisition sub-unit 61 is configured to acquire a screening condition for engine combinations; and the selection sub-unit 63 is configured to select, from the engine combination selection set, the engine combination that best meets the screening condition.
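The two recommendation variants (returning a priority queue, or returning the single best match) can be sketched together. The `Combo` record, the condition labels, and the lexicographic ordering over multiple conditions are assumptions: the source lists combined conditions such as "low processing delay + low usage fee" but does not say how the individual criteria are weighted against each other.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Combo:
    """Hypothetical engine combination with its feature information."""
    name: str
    latency: float
    fee: float
    accuracy: float


# Each condition maps to a sort key where a smaller value means higher priority.
SORT_KEYS = {
    "low_latency": lambda c: c.latency,
    "low_fee": lambda c: c.fee,
    "high_accuracy": lambda c: -c.accuracy,
}


def priority_queue(selection_set, conditions):
    """Order combinations by the given conditions, earlier conditions first."""
    return sorted(selection_set,
                  key=lambda c: tuple(SORT_KEYS[cond](c) for cond in conditions))


def best_match(selection_set, conditions):
    """Return the top-priority combination, or None for an empty set."""
    queue = priority_queue(selection_set, conditions)
    return queue[0] if queue else None
```

A weighted score over normalized features would be an equally plausible way to combine conditions; the lexicographic order is used here only because it needs no tuning parameters.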
In other embodiments, the recommendation module 60 may automatically select the optimal engine combination for the user, or may automatically select the engine combination that best meets the user's needs based on the user's usage habits or user settings (such as a cap on translation fees).
The apparatus for speech translation and the method for speech translation provided in the above embodiments are based on the same inventive concept. For the specific functions of the functional modules and units in each embodiment of the apparatus, reference may therefore be made to the foregoing method embodiments; they are not described again here.
For each translation service, the apparatus for implementing speech translation according to the embodiments of the present invention freely combines different speech recognition engines, text translation engines, and speech synthesis engines into multiple engine combinations, then prioritizes the engine combinations of each translation service according to their feature information and generates an engine combination selection set for use in subsequent translation. This free combination of multiple engines exploits the strengths of the different engines and makes translation more flexible, which in turn improves translation performance: it extends the range of translatable languages and can meet users' particular requirements for translation cost, translation speed, and translation accuracy, providing users with a better translation service and a better user experience.
The present invention also provides a speech translation device that includes a memory, a processor, and at least one application program stored in the memory and configured to be executed by the processor, the application program being configured to perform a method for implementing speech translation. The method includes the following steps: establishing a speech recognition engine set, a text translation engine set, and a speech synthesis engine set; for each translation service, selecting from the speech recognition engine set, the text translation engine set, and the speech synthesis engine set, respectively, a speech recognition engine, a text translation engine, and a speech synthesis engine that support the translation service, and forming at least two engine combinations; acquiring feature information of each engine combination; and prioritizing the engine combinations of each translation service according to the feature information and generating an engine combination selection set for the translation service. The method for implementing speech translation described in this embodiment is the method of the foregoing embodiments of the present invention and is not described again here.
The preferred embodiments of the present invention have been described above with reference to the accompanying drawings; they do not limit the scope of the claims of the present invention. Those skilled in the art can implement the present invention in many variants without departing from its scope and spirit; for example, a feature of one embodiment can be used in another embodiment to obtain a further embodiment. Any modification, equivalent substitution, or improvement made within the technical concept of the present invention shall fall within the scope of the claims of the present invention.

Claims (20)

  1. A method for implementing speech translation, comprising the following steps:
    establishing a speech recognition engine set, a text translation engine set, and a speech synthesis engine set;
    for each translation service, selecting from the speech recognition engine set, the text translation engine set, and the speech synthesis engine set, respectively, a speech recognition engine, a text translation engine, and a speech synthesis engine that support the translation service, and forming at least two engine combinations;
    acquiring feature information of each engine combination;
    prioritizing the engine combinations of each translation service according to the feature information, and generating an engine combination selection set for the translation service.
  2. The method for implementing speech translation according to claim 1, wherein the feature information includes processing delay, and the step of acquiring feature information of each engine combination comprises:
    for each engine combination, performing a speech translation test using the engine combination;
    measuring the time taken to complete one speech translation test, and using the measured time as the processing delay of the engine combination.
  3. The method for implementing speech translation according to claim 1, wherein the feature information includes usage fee, and the step of acquiring feature information of each engine combination comprises:
    for each engine combination, acquiring the charging standard of each engine in the engine combination;
    calculating the usage fee of the engine combination according to the charging standards.
  4. The method for implementing speech translation according to claim 1, wherein the feature information includes translation accuracy, and the step of acquiring feature information of each engine combination comprises:
    for each engine combination, collecting users' ratings of the accuracy of the engine combination's translation results;
    computing a statistic over the ratings, and using the result as the translation accuracy of the engine combination.
  5. The method for implementing speech translation according to claim 1, wherein there are at least two kinds of feature information, and the step of prioritizing the engine combinations of each translation service according to the feature information comprises:
    prioritizing the engine combinations according to a combination of at least two kinds of feature information.
  6. The method for implementing speech translation according to claim 2, wherein the step of measuring the time taken to complete one speech translation test comprises:
    during the speech translation test, calculating a first time taken by the speech recognition engine to perform speech recognition, a second time taken by the text translation engine to perform text translation, and a third time taken by the speech synthesis engine to perform speech synthesis;
    calculating the sum of the first time, the second time, and the third time, and using the result as the time taken to complete one speech translation test.
  7. The method for implementing speech translation according to claim 2, further comprising, after the step of acquiring feature information of each engine combination:
    determining whether the processing delay of the engine combination is greater than or equal to a threshold;
    discarding the engine combination when the processing delay is greater than or equal to the threshold.
  8. The method for implementing speech translation according to claim 1, further comprising, after the step of generating the engine combination selection set for the translation service:
    determining the translation service the user needs;
    recommending an engine combination to the user according to the engine combination selection set of the translation service.
  9. The method for implementing speech translation according to claim 8, wherein the step of recommending an engine combination to the user according to the engine combination selection set of the translation service comprises:
    acquiring a screening condition for engine combinations;
    retrieving, from the engine combination selection set, the priority queue of engine combinations that meet the screening condition, for the user to choose from.
  10. The method for implementing speech translation according to claim 8, wherein the step of recommending an engine combination to the user according to the engine combination selection set of the translation service comprises:
    acquiring a screening condition for engine combinations;
    selecting, from the engine combination selection set, the engine combination that best meets the screening condition.
  11. An apparatus for implementing speech translation, comprising:
    an establishing module configured to establish a speech recognition engine set, a text translation engine set, and a speech synthesis engine set;
    a combination module configured to, for each translation service, select from the speech recognition engine set, the text translation engine set, and the speech synthesis engine set, respectively, a speech recognition engine, a text translation engine, and a speech synthesis engine that support the translation service, and to form at least two engine combinations;
    an acquisition module configured to acquire feature information of each engine combination;
    a sorting module configured to prioritize the engine combinations of each translation service according to the feature information and to generate an engine combination selection set for the translation service.
  12. The apparatus for implementing speech translation according to claim 11, wherein the feature information includes processing delay, the acquisition module includes a first acquisition unit, and the first acquisition unit includes:
    a translation test sub-unit configured to, for each engine combination, perform a speech translation test using the engine combination;
    a first statistical sub-unit configured to measure the time taken to complete one speech translation test and to use the measured time as the processing delay of the engine combination.
  13. The apparatus for implementing speech translation according to claim 11, wherein the feature information includes usage fee, the acquisition module includes a second acquisition unit, and the second acquisition unit includes:
    a fee acquisition sub-unit configured to, for each engine combination, acquire the charging standard of each engine in the engine combination;
    a second statistical sub-unit configured to calculate the usage fee of the engine combination according to the charging standards.
  14. The apparatus for implementing speech translation according to claim 11, wherein the feature information includes translation accuracy, the acquisition module includes a third acquisition unit, and the third acquisition unit includes:
    a rating collection sub-unit configured to, for each engine combination, collect users' ratings of the accuracy of the engine combination's translation results;
    a third statistical sub-unit configured to compute a statistic over the ratings and to use the result as the translation accuracy of the engine combination.
  15. The apparatus for implementing speech translation according to claim 11, wherein there are at least two kinds of feature information, and the sorting module is configured to prioritize the engine combinations according to a combination of at least two kinds of feature information.
  16. The apparatus for implementing speech translation according to claim 12, wherein the first statistical sub-unit includes:
    a first calculation sub-unit configured to calculate, during the speech translation test, a first time taken by the speech recognition engine to perform speech recognition, a second time taken by the text translation engine to perform text translation, and a third time taken by the speech synthesis engine to perform speech synthesis;
    a second calculation sub-unit configured to calculate the sum of the first time, the second time, and the third time and to use the result as the time taken to complete one speech translation test.
  17. The apparatus for implementing speech translation according to claim 12, wherein the apparatus further includes a screening module, and the screening module includes:
    a judging unit configured to determine whether the processing delay of the engine combination is greater than or equal to a threshold;
    a discarding unit configured to discard the engine combination when the processing delay is greater than or equal to the threshold.
  18. The apparatus for implementing speech translation according to claim 11, wherein the apparatus further includes a recommendation module, and the recommendation module includes:
    a determining unit configured to determine the translation service the user needs;
    a recommendation unit configured to recommend an engine combination to the user according to the engine combination selection set of the translation service.
  19. The apparatus for implementing speech translation according to claim 18, wherein the recommendation unit includes:
    a condition acquisition sub-unit configured to acquire a screening condition for engine combinations;
    a retrieval sub-unit configured to retrieve, from the engine combination selection set, the priority queue of engine combinations that meet the screening condition, for the user to choose from.
  20. A speech translation device, comprising a memory, a processor, and at least one application program stored in the memory and configured to be executed by the processor, wherein the application program is configured to perform the method for implementing speech translation according to claim 1.
PCT/CN2018/077452 2018-02-05 2018-02-27 Method and device for translating speech and speech translation apparatus WO2019148564A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201810112285.6 2018-02-05
CN201810112285.6A CN108319591A (en) 2018-02-05 2018-02-05 Realize the method, apparatus and speech translation apparatus of voiced translation

Publications (1)

Publication Number Publication Date
WO2019148564A1 true WO2019148564A1 (en) 2019-08-08

Family

ID=62902880

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2018/077452 WO2019148564A1 (en) 2018-02-05 2018-02-27 Method and device for translating speech and speech translation apparatus

Country Status (2)

Country Link
CN (1) CN108319591A (en)
WO (1) WO2019148564A1 (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109214011A (en) * 2018-09-19 2019-01-15 深圳市合言信息科技有限公司 It is a kind of by user feedback come the cognitive engine selection strategy of self-perfection
CN108986793A (en) * 2018-09-28 2018-12-11 北京百度网讯科技有限公司 translation processing method, device and equipment
CN109286725B (en) 2018-10-15 2021-10-19 华为技术有限公司 Translation method and terminal
CN110287499A (en) * 2019-06-26 2019-09-27 一带科技服务(北京)有限公司 Interpretation method, device and comprehensive platform

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104303177A (en) * 2012-04-25 2015-01-21 寇平公司 Instant translation system
CN106791078A (en) * 2016-12-18 2017-05-31 程在舒 The speech playing method and application of mobile terminal new information and Domestic News
CN106997762A (en) * 2017-03-08 2017-08-01 广东美的制冷设备有限公司 The sound control method and device of household electrical appliance

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR100731761B1 (en) * 2005-05-02 2007-06-22 주식회사 싸일런트뮤직밴드 Music production system and method by using internet
JP2009048003A (en) * 2007-08-21 2009-03-05 Toshiba Corp Voice translation device and method
CN102800313A (en) * 2011-05-25 2012-11-28 上海先先信息科技有限公司 Method for supporting multi-voice recognition engine in Voice extensible markup language (XML) 2.0
CN103838857B (en) * 2014-03-17 2017-02-15 中国科学院软件研究所 Automatic service combination system and method based on semantics
CN106021239B (en) * 2016-04-29 2018-10-26 北京创鑫旅程网络技术有限公司 A kind of translation quality real-time estimating method

Also Published As

Publication number Publication date
CN108319591A (en) 2018-07-24

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 18904072

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 18904072

Country of ref document: EP

Kind code of ref document: A1