WO2019148564A1

WO2019148564A1 - Method and device for translating speech and speech translation apparatus

Info

Publication number: WO2019148564A1
Application number: PCT/CN2018/077452
Authority: WO
Inventors: 郑勇; 金志军; 王文祺
Original assignee: 深圳市沃特沃德股份有限公司
Priority date: 2018-02-05
Filing date: 2018-02-27
Publication date: 2019-08-08
Also published as: CN108319591A

Abstract

A method and device for translating speech and a speech translation apparatus. The method comprises the following steps: establishing a speech recognition engine set, a text translation engine set, and a speech synthesis engine set (S11); for each translation service, respectively selecting, from the speech recognition engine set, the text translation engine set, and the speech synthesis engine set, a speech recognition engine, a text translation engine, and a speech synthesis engine supporting a given translation service, and forming at least two engine combinations (S12); obtaining feature information of each engine combination (S13); and prioritizing the engine combinations of each translation service according to the feature information, and generating an engine combination selection set of the given translation service (S14). The present invention enables the free combination of a variety of engines, and utilizes the advantages of different engines, thereby improving the flexibility of translation, and greatly improving translation performance.

Description

Method, device and speech translation device for realizing speech translation

Technical field

The present invention relates to the field of speech translation technology, and in particular to a method, device and speech translation device for implementing speech translation.

Background art

With the rapid development of the economy, international exchange has become more and more extensive, and for many people language barriers are a major obstacle to it. To solve this problem, various speech translation devices have appeared on the market. With their powerful language translation functions, speech translation devices are well received by people who need translation, and they are also a good aid for learning foreign languages. A speech translation device can translate during a dialogue between two parties, so that users who speak different languages can communicate freely.

The general translation process of a speech translation device is as follows: the speech translation device receives the user's original speech information and sends it to a speech translation engine; the speech translation engine translates the original speech information into target speech information (that is, translates it from one language into another) and returns it to the speech translation device; and the speech translation device outputs the target speech information. Current speech translation engines mainly include the Google engine, Microsoft engine, IBM engine, Xunfei engine, Baidu engine, Jinshan engine, etc. Each speech translation engine includes a speech recognition engine, a text translation engine and a speech synthesis engine, and the engines differ in the languages they support, their billing standards, their processing delays, and their translation accuracy.

However, current speech translation devices support only a single engine. For example, a device may support only the Baidu engine and implement speech translation through the Baidu engine's speech recognition engine, text translation engine, and speech synthesis engine. The Baidu engine can currently translate only a dozen or so mainstream languages and cannot handle some less widely spoken languages. Other engines may be able to translate those languages, but may fall short in usage cost, translation speed, translation accuracy, etc.

It can be seen that the existing speech translation equipment has poor translation flexibility and low translation performance, and cannot provide users with high-quality translation services, and the user experience is not good.

Technical problem

The main object of the present invention is to provide a method, device and speech translation device for implementing speech translation, which aims to improve the flexibility of translation and thereby improve translation performance.

Technical solution

To achieve the above objective, the embodiment of the present invention provides a method for implementing voice translation, and the method includes the following steps:

Establishing a speech recognition engine set, a text translation engine set, and a speech synthesis engine set;

For each translation service, selecting a speech recognition engine, a text translation engine, and a speech synthesis engine supporting the translation service from the speech recognition engine set, the text translation engine set, and the speech synthesis engine set, respectively, and forming at least two engine combinations;

Obtaining feature information of each group of engine combinations;

The engine combination of each translation service is prioritized according to the feature information, and an engine combination selection set of the translation service is generated.

An embodiment of the present invention further provides an apparatus for implementing voice translation, where the apparatus includes:

An establishing module, configured to establish a speech recognition engine set, a text translation engine set, and a speech synthesis engine set;

A combination module, configured to select, for each translation service, a speech recognition engine, a text translation engine, and a speech synthesis engine supporting the translation service from the speech recognition engine set, the text translation engine set, and the speech synthesis engine set, respectively, and to form at least two engine combinations;

An acquisition module, configured to obtain feature information of each engine combination;

A sorting module, configured to prioritize the engine combinations of each translation service according to the feature information, and generate an engine combination selection set of the translation service.

Embodiments of the present invention also provide a speech translation device including a memory, a processor, and at least one application stored in the memory and configured to be executed by the processor, the application being configured to perform the foregoing method of implementing speech translation.

Beneficial effect

In the method for implementing speech translation provided by an embodiment of the present invention, for each translation service, different speech recognition engines, text translation engines, and speech synthesis engines are freely combined into a plurality of engine combinations, and the engine combinations of each translation service are then prioritized according to their feature information to generate an engine combination selection set for use in subsequent translation. The free combination of multiple engines is thereby realized and the advantages of different engines are fully utilized, which improves the flexibility of translation and greatly improves translation performance. This not only expands the range of translatable languages but also satisfies users' particular requirements for translation cost, translation speed, translation accuracy, etc., so that users can be provided with better translation services and the user experience is greatly improved.

DRAWINGS

FIG. 1 is a flow chart of a first embodiment of a method for implementing voice translation according to the present invention;

FIG. 2 is a schematic diagram of a speech recognition engine set in an embodiment of the present invention;

FIG. 3 is a schematic diagram of a text translation engine set in an embodiment of the present invention;

FIG. 4 is a schematic diagram of a speech synthesis engine set in an embodiment of the present invention;

FIG. 5 is a schematic diagram of an engine language database in an embodiment of the present invention;

FIG. 6 is a schematic diagram of an engine charging list in an embodiment of the present invention;

FIG. 7 is a flow chart of a second embodiment of a method for implementing voice translation according to the present invention;

FIG. 8 is a schematic block diagram of a first embodiment of an apparatus for implementing voice translation according to the present invention;

FIG. 9 is a block diagram of the acquisition module of FIG. 8;

FIG. 10 is a block diagram of the first acquisition unit of FIG. 9;

FIG. 11 is a block diagram of the first statistical sub-unit of FIG. 10;

FIG. 12 is a block diagram of the second acquisition unit of FIG. 9;

FIG. 13 is a block diagram of the third acquisition unit of FIG. 9;

FIG. 14 is a schematic block diagram of a second embodiment of an apparatus for implementing voice translation according to the present invention;

FIG. 15 is a block diagram of the screening module of FIG. 14;

FIG. 16 is a schematic block diagram of a third embodiment of an apparatus for implementing voice translation according to the present invention;

FIG. 17 is a block diagram of the recommendation module of FIG. 16;

FIG. 18 is another block diagram of the recommendation module of FIG. 16.

The implementation, functional features, and advantages of the present invention will be further described in conjunction with the embodiments.

BEST MODE FOR CARRYING OUT THE INVENTION

It is understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.

The embodiments of the present invention are described in detail below, and the examples of the embodiments are illustrated in the drawings, wherein the same or similar reference numerals are used to refer to the same or similar elements or elements having the same or similar functions. The embodiments described below with reference to the drawings are intended to be illustrative of the invention and are not to be construed as limiting.

Referring to FIG. 1, a first embodiment of a method for implementing speech translation according to the present invention is proposed. The method includes the following steps:

S11. Establish a speech recognition engine set, a text translation engine set, and a speech synthesis engine set.

In the embodiment of the present invention, the speech translation device collects information on speech recognition engines and establishes a speech recognition engine set, collects information on text translation engines and establishes a text translation engine set, and collects information on speech synthesis engines and establishes a speech synthesis engine set.

The information of a speech recognition engine includes the engine name, the languages supported, and the like. The engines include the Google speech recognition engine, Microsoft speech recognition engine, IBM speech recognition engine, Nuance speech recognition engine, Baidu speech recognition engine, etc., and the languages supported by each engine differ. FIG. 2 shows an example of a speech recognition engine set, where M represents the engine name and X represents a supported language.

The information of a text translation engine includes the engine name, the languages supported, and the like. The engines include the Google translation engine, Microsoft translation engine, IBM translation engine, Xunfei translation engine, Baidu translation engine, Jinshan translation engine, etc., and the languages supported by each engine differ. FIG. 3 shows an example of a text translation engine set, where N represents the engine name and X represents a supported language.

The information of a speech synthesis engine includes the engine name, the languages supported, and the like. The engines include the Nuance synthesis engine, Microsoft speech synthesis engine, IBM speech synthesis engine, Baidu speech synthesis engine, etc., and the languages supported by each engine differ. FIG. 4 shows an example of a speech synthesis engine set, where K represents the engine name and X represents a supported language.

Optionally, the speech recognition engine set, the text translation engine set, and the speech synthesis engine set may be combined into one engine language database. FIG. 5 shows an example of the engine language database.
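As a minimal sketch of such a combined engine language database, the following Python fragment stores every engine in one list and looks up the engines of a given type that support a set of languages. The engine names (M1, N1, K1, ...), the language codes, and the helper `engines_supporting` are hypothetical placeholders chosen for illustration and are not taken from the figures.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Engine:
    name: str             # placeholder engine name, e.g. "M1"
    kind: str             # "asr" (recognition), "mt" (translation) or "tts" (synthesis)
    languages: frozenset  # language codes the engine is assumed to support


# Hypothetical contents; real engines and language lists would be collected by the device.
ENGINE_DB = [
    Engine("M1", "asr", frozenset({"zh", "en", "ja"})),
    Engine("M2", "asr", frozenset({"zh", "en", "bg"})),
    Engine("N1", "mt", frozenset({"zh", "en", "bg"})),
    Engine("N2", "mt", frozenset({"zh", "en"})),
    Engine("K1", "tts", frozenset({"zh", "en"})),
    Engine("K2", "tts", frozenset({"zh", "bg"})),
]


def engines_supporting(kind, languages):
    """Return the engines of the given kind that support every listed language."""
    needed = frozenset(languages)
    return [e for e in ENGINE_DB if e.kind == kind and needed <= e.languages]


zh_bg_recognizers = [e.name for e in engines_supporting("asr", ["zh", "bg"])]
```

Keeping all three engine types in one structure means a single query function can serve recognition, translation, and synthesis lookups.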

S12. For each translation service, select a speech recognition engine, a text translation engine, and a speech synthesis engine that support the translation service from the speech recognition engine set, the text translation engine set, and the speech synthesis engine set, respectively, and form at least two engine combinations.

In the embodiment of the present invention, a translation service is a service of mutual translation between two languages, such as a Chinese-English translation service, Chinese-Japanese translation service, Japanese-English translation service, Japanese-Korean translation service, French-English translation service, Chinese-Bulgarian translation service, etc. For each translation service, at least two engine combinations supporting speech recognition + text translation + speech synthesis for that service are formed; that is, each engine combination includes a speech recognition engine, a text translation engine and a speech synthesis engine.

Taking the Chinese-Bulgarian translation service as an example, the speech translation device selects from the speech recognition engine set the speech recognition engines that support both Chinese and Bulgarian, selects from the text translation engine set the text translation engines that support both Chinese and Bulgarian, and selects from the speech synthesis engine set the speech synthesis engines that support both Chinese and Bulgarian. It then uses the selected engines to form at least two engine combinations that can provide the Chinese-Bulgarian translation service. Each engine combination includes a speech recognition engine, a text translation engine and a speech synthesis engine.
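The selection and combination in step S12 can be sketched as a Cartesian product over the engines that support both languages of the service. The engine names and language coverage below are invented for illustration, and `supporting` and `engine_combinations` are hypothetical helper names.

```python
from itertools import product

# Hypothetical engine sets: engine name -> languages it is assumed to support.
ASR = {"M1": {"zh", "en"}, "M2": {"zh", "en", "bg"}, "M3": {"zh", "bg"}}
MT = {"N1": {"zh", "en", "bg"}, "N2": {"zh", "en"}}
TTS = {"K1": {"zh", "bg"}, "K2": {"zh", "en", "bg"}}


def supporting(engine_set, service):
    """Names of engines in engine_set that cover both languages of the service."""
    return sorted(name for name, langs in engine_set.items() if set(service) <= langs)


def engine_combinations(service):
    """All (recognition, translation, synthesis) triples usable for the service."""
    combos = list(product(supporting(ASR, service),
                          supporting(MT, service),
                          supporting(TTS, service)))
    if len(combos) < 2:
        # The method calls for at least two combinations per translation service.
        raise ValueError(f"fewer than two engine combinations for {service}")
    return combos


zh_bg_combos = engine_combinations(("zh", "bg"))
```

With this sample data, two recognition engines, one translation engine and two synthesis engines cover Chinese and Bulgarian, giving four candidate combinations.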

S13. Acquire feature information of each group of engine combinations.

In the embodiment of the present invention, the feature information of the engine combination includes a processing delay, a usage fee, a translation accuracy, and the like, and the voice translation device may acquire at least one of the feature information.

When processing delays are acquired, for each set of engine combinations, the speech translation device uses the engine combination to perform a speech translation test, and the time taken to complete a speech translation test is counted, and the statistical time is taken as the processing delay of the engine combination.

In a specific implementation, the speech translation device can obtain original speech information in each language as test templates, such as "What is the weather today?", "Where are you from?", and "What is your name?", and then use this original speech information to perform the speech translation test. During the test, the speech translation device separately calculates the first time spent by the speech recognition engine on speech recognition, the second time spent by the text translation engine on text translation, and the third time spent by the speech synthesis engine on speech synthesis. It then calculates the sum of the first time, the second time, and the third time, and takes the result as the time taken to complete one speech translation test.

For example, with an engine combination that supports the Chinese-English translation service: the speech translation device sends the Chinese original speech information to the speech recognition engine in the engine combination, obtains Chinese text information (such as a character string) after the speech recognition engine has recognized it, and calculates the processing time t1; the speech translation device sends the Chinese text information to the text translation engine in the engine combination, obtains English text information after the text translation engine has translated it, and calculates the processing time t2; the speech translation device sends the English text information to the speech synthesis engine, obtains English speech information after the speech synthesis engine has synthesized it, and calculates the processing time t3; finally, the speech translation device calculates the total recognition + translation + synthesis time T = t1 + t2 + t3. T is the processing delay of the engine combination.
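The timing described above, T = t1 + t2 + t3, can be sketched as follows. The three engine functions are stand-in stubs, since the real engines would be remote services whose APIs are not described in this document; `timed` and `measure_delay` are hypothetical helper names.

```python
import time


def timed(fn, arg):
    """Call fn(arg) and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = fn(arg)
    return result, time.perf_counter() - start


def measure_delay(recognize, translate, synthesize, source_audio):
    """Run one speech translation test and return the output, (t1, t2, t3) and T."""
    text, t1 = timed(recognize, source_audio)         # speech recognition
    translated, t2 = timed(translate, text)           # text translation
    target_audio, t3 = timed(synthesize, translated)  # speech synthesis
    return target_audio, (t1, t2, t3), t1 + t2 + t3


# Stub engines used only to make the sketch runnable.
def fake_recognize(audio):
    return "你好"


def fake_translate(text):
    return "hello"


def fake_synthesize(text):
    return b"audio:" + text.encode("utf-8")


audio_out, stage_times, total_delay = measure_delay(
    fake_recognize, fake_translate, fake_synthesize, b"zh-test-utterance")
```

In practice each stage would include network round-trip time, so repeating the test over several templates and averaging would likely give a steadier estimate; the source only requires timing one test.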

Further, the speech translation device can screen the engine combinations according to the processing delay and filter out the engine combinations whose processing delay is too long, that is, whose translation speed is too slow, thereby reducing the data volume of the database, saving storage space, and improving operating efficiency.

Specifically, the voice translation device compares the processing delay of the engine combination with a threshold, and determines whether the processing delay is greater than or equal to the threshold; when the processing delay is greater than or equal to the threshold, the engine combination is discarded. The threshold can be set according to actual needs, or can be customized by the user.
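A minimal sketch of this screening step is shown below. The delay values and the 3.0-second threshold are made-up sample numbers, and `filter_by_delay` is a hypothetical helper name.

```python
def filter_by_delay(delays, threshold):
    """Keep combinations whose delay is below the threshold.

    A combination whose delay is greater than or equal to the threshold is discarded.
    """
    return {combo: delay for combo, delay in delays.items() if delay < threshold}


# Sample processing delays in seconds (illustrative values only).
measured = {
    ("M1", "N1", "K1"): 1.8,
    ("M2", "N1", "K2"): 3.0,
    ("M2", "N2", "K1"): 4.5,
}

kept = filter_by_delay(measured, threshold=3.0)
```

Note that the combination measured at exactly 3.0 seconds is dropped, because the comparison in the text is "greater than or equal to".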

When the usage fee is obtained, for each engine combination, the speech translation device acquires the charging standard of each engine in the engine combination and calculates the usage fee of the engine combination according to those charging standards. A charging standard includes information such as the billing method and the billing price. The billing methods of the engines differ and include billing by time, billing per use, and billing by word count. Even when the billing method is the same, the billing prices are not necessarily the same.

In a specific implementation, the speech translation device may collect the charging standards of each engine and establish an engine charging list as shown in FIG. 6, where H represents the charging standard of a speech recognition engine, I represents the charging standard of a text translation engine, and J represents the charging standard of a speech synthesis engine. For each engine combination, the speech translation device queries the engine charging list, obtains the charging standards of the speech recognition engine, the text translation engine, and the speech synthesis engine in the engine combination, and calculates the usage fee of the engine combination from the charging standards of the three engines.

The speech translation device can perform a speech translation test on the original speech information using the engine combination and take the usage fee of that test as the usage fee of the engine combination. Optionally, when the billing methods of the three engines in the engine combination are the same, the speech translation device may directly calculate the usage fee of the engine combination from the billing prices of the three engines. For example, if the speech recognition engine charges h yuan/minute, the text translation engine charges i yuan/minute, and the speech synthesis engine charges j yuan/minute, then the usage fee of the engine combination formed by these three engines is (h+i+j) yuan/minute.
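The direct (h+i+j) calculation applies only when all three engines use the same billing method. The sketch below assumes each charging standard is represented as a (billing method, price) pair; the method label `"per_minute"`, the prices, and the function name are illustrative assumptions rather than values from the engine charging list.

```python
from decimal import Decimal


def combination_fee_per_minute(charges):
    """Sum per-minute prices for an engine combination.

    charges is a list of (billing_method, price) pairs, one per engine.
    Direct summation is only meaningful when every engine bills per minute.
    """
    methods = {method for method, _ in charges}
    if methods != {"per_minute"}:
        raise ValueError("billing methods differ; measure the fee with a test instead")
    return sum((price for _, price in charges), Decimal("0"))


# Illustrative prices in yuan/minute: h, i and j.
fee = combination_fee_per_minute([
    ("per_minute", Decimal("0.30")),  # speech recognition engine, h
    ("per_minute", Decimal("0.20")),  # text translation engine, i
    ("per_minute", Decimal("0.10")),  # speech synthesis engine, j
])
```

`Decimal` is used here so that monetary sums are not affected by binary floating-point rounding.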

When translation accuracy is obtained, for each engine combination, the speech translation device collects users' scores for the accuracy of the translation results of the engine combination, then computes a statistic over the scores collected within a period of time (e.g., within one month, within six months, or within one year), such as the average of the scores, and takes the statistical result as the translation accuracy of the engine combination.
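As an illustrative sketch, the averaging of user scores within a time window might look like the following. The score scale, the 30-day window, the sample dates, and the function name `translation_accuracy` are assumptions made for the example.

```python
from datetime import datetime, timedelta
from statistics import mean


def translation_accuracy(scores, now, window_days=30):
    """Average the user scores collected within the last window_days days.

    scores is a list of (timestamp, score) pairs; returns None if no score
    falls inside the window.
    """
    cutoff = now - timedelta(days=window_days)
    recent = [score for stamp, score in scores if stamp >= cutoff]
    return mean(recent) if recent else None


# Sample scores on an assumed 1-5 scale.
sample_scores = [
    (datetime(2017, 6, 1), 1),   # too old, ignored
    (datetime(2018, 1, 20), 4),
    (datetime(2018, 2, 1), 5),
]

accuracy = translation_accuracy(sample_scores, now=datetime(2018, 2, 5))
```

Other statistics, such as a median or a recency-weighted mean, would fit the same interface; the text names the average only as one example.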

Those skilled in the art will understand that the speech translation device can also obtain other feature information of the engine combination according to actual needs, which will not be enumerated here.

S14. Prioritize the engine combination of each translation service according to the feature information, and generate an engine combination selection set of the translation service.

In the embodiment of the present invention, the voice translation device prioritizes the engine combination of the same translation service according to the feature information, and generates an engine combination selection set of the translation service for selection and use in subsequent translation.

The speech translation device can prioritize engine combinations based on a single type of feature information. For example, it can prioritize the engine combinations in order of processing delay from low to high, in order of usage fee from low to high, or in order of translation accuracy from high to low.

The speech translation device may also prioritize engine combinations based on a combination of at least two types of feature information. For example, it can prioritize the engine combinations according to the criterion of low processing delay and low usage fee, according to the criterion of low processing delay and high translation accuracy, according to the criterion of low usage fee and high translation accuracy, or according to the criterion of low processing delay, low usage fee, and high translation accuracy.
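The source does not specify how several types of feature information are merged into one ordering. The sketch below assumes a lexicographic sort, in which the first criterion dominates and later criteria break ties; a weighted score would be an equally plausible choice. The combination labels and feature values are invented sample data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Features:
    combo: str       # label of the engine combination
    delay: float     # processing delay in seconds (lower is better)
    fee: float       # usage fee in yuan/minute (lower is better)
    accuracy: float  # average user score (higher is better)


# Each key maps a criterion to a value where smaller means higher priority.
SORT_KEYS = {
    "delay": lambda f: f.delay,
    "fee": lambda f: f.fee,
    "accuracy": lambda f: -f.accuracy,
}


def prioritize(features, criteria):
    """Order combinations by the given criteria (assumed lexicographic)."""
    return sorted(features, key=lambda f: tuple(SORT_KEYS[c](f) for c in criteria))


sample = [
    Features("A", delay=2.0, fee=0.6, accuracy=4.2),
    Features("B", delay=1.5, fee=0.9, accuracy=4.6),
    Features("C", delay=1.5, fee=0.7, accuracy=3.9),
]

by_delay = [f.combo for f in prioritize(sample, ["delay"])]
by_delay_then_fee = [f.combo for f in prioritize(sample, ["delay", "fee"])]
by_accuracy = [f.combo for f in prioritize(sample, ["accuracy"])]
```

Because Python's sort is stable, combinations that tie on every listed criterion keep their original relative order.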

The speech translation device stores the above priority queues as the engine combination selection set of the translation service, so that each translation service finally has its own engine combination selection set. Subsequently, the user can be provided with better translation services according to the engine combination selection set and the user's needs.

Further, as shown in FIG. 7, in the second embodiment of the method for implementing voice translation of the present invention, after step S14, the following steps are further included:

S15. Determine a translation service required by the user.

The voice translation device can provide an option for the translation service for the user to select, and determine the translation service that the user needs according to the user's selection. The voice translation device can also receive the information of the translation service input by the user, and determine the translation service required by the user according to the input information.

S16. Recommend an engine combination to the user according to the engine combination selection set of the translation service.

Optionally, the speech translation device obtains the user's filter condition for the engine combination, and then selects from the engine combination selection set the priority queue of engine combinations that meet the filter condition, for the user to choose from.

The speech translation device can provide options for the engine combination filter condition, including low processing delay, low usage fee, high translation accuracy, low processing delay + low usage fee, low processing delay + high translation accuracy, low usage fee + high translation accuracy, and low processing delay + low usage fee + high translation accuracy.

Optionally, the speech translation device obtains the user's filter condition for the engine combination, and then selects from the engine combination selection set the engine combination that best matches the filter condition.

When the user selects the filter condition of low processing delay, the speech translation device selects the engine combination with the lowest processing delay; when the user selects the filter condition of low usage fee, the speech translation device selects the engine combination with the lowest usage fee; when the user selects the filter condition of high translation accuracy, the speech translation device selects the engine combination with the highest translation accuracy; when the user selects the filter condition of low processing delay + low usage fee, the speech translation device selects the highest-priority engine combination from the priority queue for low processing delay + low usage fee; when the user selects the filter condition of low processing delay + high translation accuracy, the speech translation device selects the highest-priority engine combination from the priority queue for low processing delay + high translation accuracy; when the user selects the filter condition of low usage fee + high translation accuracy, the speech translation device selects the highest-priority engine combination from the priority queue for low usage fee + high translation accuracy; and when the user selects the filter condition of low processing delay + low usage fee + high translation accuracy, the speech translation device selects the highest-priority engine combination from the priority queue for low processing delay + low usage fee + high translation accuracy.
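A sketch of this lookup, assuming the engine combination selection set for one translation service is stored as a mapping from filter condition to priority queue, could be written as follows. The condition labels, the queue contents, and the helpers `candidates` and `best_match` are hypothetical.

```python
# Hypothetical selection set for one translation service: each filter condition
# maps to a priority queue of combination labels, highest priority first.
SELECTION_SET = {
    frozenset({"delay"}): ["B", "C", "A"],
    frozenset({"fee"}): ["A", "C", "B"],
    frozenset({"accuracy"}): ["B", "A", "C"],
    frozenset({"delay", "fee"}): ["C", "B", "A"],
}


def candidates(condition):
    """Return the whole priority queue for the condition, for the user to pick from."""
    key = frozenset(condition)
    if key not in SELECTION_SET:
        raise KeyError(f"no priority queue stored for {sorted(key)}")
    return list(SELECTION_SET[key])


def best_match(condition):
    """Return the highest-priority combination for the condition."""
    return candidates(condition)[0]


recommended = best_match(["fee", "delay"])
```

Using a `frozenset` as the key makes "low delay + low fee" and "low fee + low delay" resolve to the same queue.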

In other embodiments, the voice translation device may automatically select an optimal engine combination for the user, or automatically select the engine combination that best meets the user's needs according to the user's usage habits or user settings (such as translation fee cap settings).

In the method for implementing speech translation according to the embodiment of the present invention, for each translation service, different speech recognition engines, text translation engines, and speech synthesis engines are freely combined into a plurality of engine combinations, and the engine combinations of each translation service are then prioritized according to their feature information to generate an engine combination selection set for use in subsequent translation. The free combination of multiple engines is thereby realized and the advantages of different engines are fully utilized, which improves the flexibility of translation and greatly improves translation performance. This not only expands the range of translatable languages but also satisfies users' particular requirements for translation cost, translation speed, translation accuracy, etc., so that users can be provided with better translation services and the user experience is greatly improved.

The voice translation device in the embodiment of the present invention may be a professional translation machine or a variety of terminal devices, such as mobile terminals such as mobile phones and tablets, computer terminals such as personal computers and notebook computers, and the like. A specific application (APP) may be installed on the foregoing translation machine and the terminal device, and the method for realizing the speech translation of the embodiment of the present invention is implemented by using the application.

Referring to FIG. 8, a first embodiment of an apparatus for implementing speech translation according to the present invention is provided. The apparatus includes an establishing module 10, a combination module 20, an acquisition module 30, and a sorting module 40, wherein: the establishing module 10 is configured to establish a speech recognition engine set, a text translation engine set, and a speech synthesis engine set; the combination module 20 is configured to select, for each translation service, a speech recognition engine, a text translation engine, and a speech synthesis engine supporting the translation service from the speech recognition engine set, the text translation engine set, and the speech synthesis engine set, respectively, and to form at least two engine combinations; the acquisition module 30 is configured to acquire feature information of each engine combination; and the sorting module 40 is configured to prioritize the engine combinations of each translation service according to the feature information and generate an engine combination selection set of the translation service for use in subsequent translation.

In the embodiment of the present invention, the establishing module 10 collects information on speech recognition engines and establishes a speech recognition engine set, collects information on text translation engines and establishes a text translation engine set, and collects information on speech synthesis engines and establishes a speech synthesis engine set.

Optionally, the establishing module 10 may combine the speech recognition engine set, the text translation engine set, and the speech synthesis engine set into one engine language database.

In the embodiment of the present invention, the feature information of the engine combination includes a processing delay, a usage fee, a translation accuracy, and the like, and the acquiring module 30 may obtain at least one of the feature information.

As shown in FIG. 9, in the embodiment of the present invention, the obtaining module 30 includes a first acquiring unit 31, and the first acquiring unit 31 is configured to acquire a processing delay of the engine combination.

As shown in FIG. 10, the first acquiring unit 31 includes a translation test sub-unit 311 and a first statistical sub-unit 312, wherein: the translation test sub-unit 311 is configured to perform, for each engine combination, a speech translation test using the engine combination; and the first statistical sub-unit 312 is configured to count the time taken to complete one speech translation test and take the counted time as the processing delay of the engine combination.

As shown in FIG. 11, the first statistical sub-unit 312 includes a first calculation sub-unit 3121 and a second calculation sub-unit 3122, wherein: the first calculation sub-unit 3121 is configured to calculate, during the speech translation test, the first time spent by the speech recognition engine on speech recognition, the second time spent by the text translation engine on text translation, and the third time spent by the speech synthesis engine on speech synthesis; and the second calculation sub-unit 3122 is configured to calculate the sum of the first time, the second time, and the third time, and take the result as the time taken to complete one speech translation test.

Further, the obtaining module 30 further includes a second obtaining unit 32, which is configured to acquire the usage fee of the engine combination.

As shown in FIG. 12, the second obtaining unit 32 includes a charging acquisition sub-unit 321 and a second statistical sub-unit 322, wherein: the charging acquisition sub-unit 321 is configured to acquire, for each engine combination, the charging standard of each engine in the engine combination; and the second statistical sub-unit 322 is configured to calculate the usage fee of the engine combination according to the charging standard of each engine.

Further, the obtaining module 30 further includes a third obtaining unit 33 configured to acquire translation accuracy of the engine combination.

As shown in FIG. 13, the third obtaining unit 33 includes a score collecting sub-unit 331 and a third statistical sub-unit 332, wherein: the scoring collecting sub-unit 331 is configured to collect a translation of the engine combination for each set of engine combinations. The score of the accuracy of the result; the third statistic sub-unit 332 is configured to perform statistics on the scores collected within a period of time (eg, within one month, within a half year, within one year, etc.) (eg, an average of the calculated scores), and the statistics will be The result is the translation accuracy of the engine combination.

Further, as shown in FIG. 14, in the second embodiment of the apparatus for implementing speech translation according to the present invention, the apparatus further includes a screening module 50 configured to screen the engine combinations according to the processing delay and to filter out the engine combinations whose processing delay is too long, that is, whose translation speed is too slow, thereby reducing the amount of data in the database, saving storage space, and improving operational efficiency.

As shown in FIG. 15, the screening module 50 includes a determining unit 51 and a discarding unit 52, wherein: the determining unit 51 is configured to compare the processing delay of the engine combination with a threshold, and determine whether the processing delay of the engine combination is greater than or equal to the threshold. The discarding unit 52 is configured to discard the engine combination when the processing delay is greater than or equal to the threshold.
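The comparison performed by the determining unit 51 and the discarding unit 52 can be sketched as follows. The function name `screen_combinations` and the mapping from combination name to delay are hypothetical.

```python
from typing import Dict, List


def screen_combinations(delays: Dict[str, float], threshold: float) -> List[str]:
    """Return the names of the engine combinations that are kept.

    A combination is discarded when its processing delay is greater than or
    equal to the threshold, so only delays strictly below it survive.
    """
    return [name for name, delay in delays.items() if delay < threshold]
```

Note that a delay exactly equal to the threshold is discarded, which matches the "greater than or equal to" wording above.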

Further, as shown in FIG. 16, in a third embodiment of the apparatus for implementing speech translation according to the present invention, the apparatus further includes a recommendation module 60 configured to recommend a suitable engine combination to the user.

In the embodiment of the present invention, the recommendation module 60 includes a determining unit and a recommending unit, wherein: the determining unit is configured to determine a translation service required by the user; and the recommending unit is configured to recommend the engine combination to the user according to the engine combination selection set of the translation service.

The recommending unit may include a condition acquisition subunit 61 and a retrieval subunit 62, as shown in FIG. 17, wherein: the condition acquisition subunit 61 is configured to acquire a filter condition of the engine combination; and the retrieval subunit 62 is configured to retrieve, from the engine combination selection set, the priority queue of the engine combinations that meet the filter condition, for the user to select.

The condition acquisition subunit 61 may present options for the filter condition of the engine combination for the user to choose from, and obtain the filter condition selected by the user. The filter conditions include: low processing delay; low usage cost; high translation accuracy; low processing delay + low usage cost; low processing delay + high translation accuracy; low usage cost + high translation accuracy; and low processing delay + low usage cost + high translation accuracy.

The recommending unit may also include a condition acquisition subunit 61 and a selection subunit 63, as shown in FIG. 18, wherein: the condition acquisition subunit 61 is configured to acquire a filter condition of the engine combination; and the selection subunit 63 is configured to select, from the engine combination selection set, the engine combination that best matches the filter condition.
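One possible way to turn a filter condition into a priority queue, and to pick the best match from it, is sketched below. The embodiment does not specify how several criteria are combined; this sketch assumes a lexicographic ordering in which the criteria are compared in the order the user lists them. The class `EngineCombination`, the condition names, and both function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence


@dataclass(frozen=True)
class EngineCombination:
    name: str
    delay: float     # processing delay in seconds, lower is better
    fee: float       # usage fee per request, lower is better
    accuracy: float  # averaged user score, higher is better


# Each filter condition maps to a sort key for which smaller means better.
_SORT_KEYS: Dict[str, Callable[[EngineCombination], float]] = {
    "low_delay": lambda c: c.delay,
    "low_fee": lambda c: c.fee,
    "high_accuracy": lambda c: -c.accuracy,
}


def priority_queue(
    combos: Sequence[EngineCombination], conditions: Sequence[str]
) -> List[EngineCombination]:
    """Order combinations by the selected conditions, best first.

    Assumption: conditions are applied lexicographically in the given order.
    """
    return sorted(combos, key=lambda c: tuple(_SORT_KEYS[k](c) for k in conditions))


def best_match(
    combos: Sequence[EngineCombination], conditions: Sequence[str]
) -> EngineCombination:
    """Return the head of the priority queue (raises IndexError if empty)."""
    return priority_queue(combos, conditions)[0]
```

`priority_queue` corresponds to presenting a ranked list for the user to choose from, while `best_match` corresponds to selecting a single combination automatically. A weighted score over normalized criteria would be an equally plausible alternative to the lexicographic ordering assumed here.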

In other embodiments, the recommendation module 60 can automatically select an optimal engine combination for the user, or automatically select the engine combination that best meets the user's needs according to the user's usage habits or user settings (such as translation fee cap settings).
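Automatic selection under a user setting such as a translation fee cap could be sketched as follows. The sketch assumes the selection set is already ordered by priority and is represented as (name, usage fee) pairs; the function name `auto_select` and this representation are hypothetical.

```python
from typing import List, Optional, Tuple


def auto_select(
    selection_set: List[Tuple[str, float]], fee_cap: Optional[float] = None
) -> Optional[str]:
    """Return the highest-priority combination whose fee respects the cap.

    `selection_set` is assumed to be sorted by priority, best first.
    Returns None when no combination satisfies the cap.
    """
    for name, fee in selection_set:
        if fee_cap is None or fee <= fee_cap:
            return name
    return None
```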

The apparatus for speech translation and the method of speech translation provided in the above embodiments are all based on the same inventive concept. Therefore, the specific functions of the function modules/units of the specific embodiments in the apparatus for voice translation can be referred to the foregoing method embodiments, and details are not described herein again.

The device for implementing speech translation according to an embodiment of the present invention combines, for each translation service, different speech recognition engines, text translation engines, and speech synthesis engines into a plurality of engine combinations, then prioritizes the engine combinations of each translation service according to the feature information of the engine combinations, and generates an engine combination selection set for use in subsequent translation. The free combination of multiple engines is thereby realized, the advantages of different engines are fully utilized, the flexibility of translation is improved, and the translation performance is greatly improved. This not only expands the range of translatable languages but also meets users' particular requirements for translation cost, translation speed, translation accuracy, and the like, so that users can be provided with better translation services and the user experience is improved.

The invention also proposes a speech translation device comprising a memory, a processor, and at least one application stored in the memory and configured to be executed by the processor, the application being configured to perform a method for implementing speech translation. The method comprises the steps of: establishing a speech recognition engine set, a text translation engine set, and a speech synthesis engine set; for each translation service, selecting, from the speech recognition engine set, the text translation engine set, and the speech synthesis engine set respectively, a speech recognition engine, a text translation engine, and a speech synthesis engine that support the translation service, and forming at least two engine combinations; acquiring feature information of each engine combination; and prioritizing the engine combinations of each translation service according to the feature information, and generating an engine combination selection set of the translation service. The method for implementing speech translation described in this embodiment is the method for implementing speech translation involved in the foregoing embodiments of the present invention, and details are not described herein again.
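The step of forming engine combinations for one translation service can be illustrated with a short sketch. It assumes that a translation service is a (source language, target language) pair, that a speech recognition engine supports the service if it handles the source language, that a text translation engine supports it if it handles the pair, and that a speech synthesis engine supports it if it handles the target language. The function name `build_combinations` and the dictionary layout of the engine sets are hypothetical.

```python
from itertools import product
from typing import Dict, List, Set, Tuple

Service = Tuple[str, str]  # (source language, target language)


def build_combinations(
    recognition_set: Dict[str, Set[str]],
    translation_set: Dict[str, Set[Service]],
    synthesis_set: Dict[str, Set[str]],
    service: Service,
) -> List[Tuple[str, str, str]]:
    """Form every (recognition, translation, synthesis) engine combination
    whose three engines all support the given translation service."""
    source, target = service
    recognizers = [n for n, langs in recognition_set.items() if source in langs]
    translators = [n for n, pairs in translation_set.items() if service in pairs]
    synthesizers = [n for n, langs in synthesis_set.items() if target in langs]
    # Cartesian product of the supporting engines from the three sets.
    return list(product(recognizers, translators, synthesizers))
```

The resulting list would then be annotated with feature information and sorted to obtain the engine combination selection set of that translation service.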

The preferred embodiments of the present invention have been described above with reference to the drawings, and are not intended to limit the scope of the invention. A person skilled in the art can implement the invention in various variants without departing from the scope and spirit of the invention. For example, the features of one embodiment can be used in another embodiment to obtain a further embodiment. Any modifications, equivalent substitutions and improvements made within the technical concept of the invention are intended to be included within the scope of the invention.

Claims

A method for implementing speech translation, comprising the following steps:

Establishing a speech recognition engine set, a text translation engine set, and a speech synthesis engine set;

For each translation service, selecting, from the speech recognition engine set, the text translation engine set, and the speech synthesis engine set respectively, a speech recognition engine, a text translation engine, and a speech synthesis engine that support the translation service, and forming at least two sets of engine combinations;

Obtaining feature information of each group of engine combinations;

The engine combination of each translation service is prioritized according to the feature information, and an engine combination selection set of the translation service is generated.
The method for implementing speech translation according to claim 1, wherein the feature information comprises a processing delay, and the step of acquiring feature information of each set of engine combinations comprises:

Performing a speech translation test using the engine combination for each set of engine combinations;

The time taken to complete a speech translation test is counted, and the time counted is used as the processing delay of the engine combination.
The method for implementing speech translation according to claim 1, wherein the feature information comprises a usage fee, and the step of acquiring feature information of each set of engine combinations comprises:

Obtaining a charging standard for each engine in the engine combination for each set of engine combinations;

The usage fee of the engine combination is counted according to the charging standard.
The method for implementing speech translation according to claim 1, wherein the feature information comprises translation accuracy, and the step of acquiring feature information of each set of engine combinations comprises:

For each set of engine combinations, collecting a user's rating of the accuracy of the translation results of the engine combination;

The score is counted, and the statistical result is used as the translation accuracy of the engine combination.
The method for implementing speech translation according to claim 1, wherein the feature information has at least two types, and the step of prioritizing engine combinations of each translation service according to the feature information comprises:

The engine combination is prioritized according to a combination of at least two pieces of feature information.
The method for implementing speech translation according to claim 2, wherein the step of counting the time taken to complete a speech translation test comprises:

In the speech translation test process, calculating the first time spent by the speech recognition engine for speech recognition, calculating the second time spent by the text translation engine for text translation, and calculating the third time spent by the speech synthesis engine for speech synthesis;

The sum of the first time, the second time, and the third time is calculated, and the calculation result is taken as the time taken to complete a speech translation test.
The method for implementing speech translation according to claim 2, wherein the step of acquiring feature information of each set of engine combinations further comprises:

Determining whether the processing delay of the engine combination is greater than or equal to a threshold;

When the processing delay is greater than or equal to a threshold, the engine combination is discarded.
The method for implementing speech translation according to claim 1, wherein, after the step of generating the engine combination selection set of the translation service, the method further comprises:

Determining the translation service required by the user;

Recommending an engine combination to the user according to the engine combination selection set of the translation service.
The method for implementing speech translation according to claim 8, wherein the step of recommending an engine combination to a user according to an engine combination selection set of the translation service comprises:

Acquiring a filter condition of the engine combination;

Retrieving, from the engine combination selection set, a priority queue of the engine combinations that meet the filter condition, for selection by the user.
The method for implementing speech translation according to claim 8, wherein the step of recommending an engine combination to a user according to an engine combination selection set of the translation service comprises:

Acquiring a filter condition of the engine combination;

Selecting, from the engine combination selection set, the engine combination that best meets the filter condition.
A device for implementing speech translation, comprising:

an establishing module, configured to establish a speech recognition engine set, a text translation engine set, and a speech synthesis engine set;

a combination module, configured to select, for each translation service, from the speech recognition engine set, the text translation engine set, and the speech synthesis engine set respectively, a speech recognition engine, a text translation engine, and a speech synthesis engine that support the translation service, and to form at least two sets of engine combinations;

an obtaining module, configured to obtain feature information of each set of engine combinations;

a sorting module, configured to prioritize the engine combinations of each translation service according to the feature information, and generate an engine combination selection set of the translation service.
The apparatus for implementing a speech translation according to claim 11, wherein the feature information includes a processing delay, the obtaining module includes a first acquiring unit, and the first acquiring unit includes:

a translation test sub-unit, configured to perform a speech translation test using the engine combination for each set of engine combinations;

The first statistic subunit is set to count the time taken to complete a speech translation test, and the statistic time is used as the processing delay of the engine combination.
The apparatus for implementing a speech translation according to claim 11, wherein the feature information includes a usage fee, the acquisition module includes a second acquisition unit, and the second acquisition unit includes:

a charging acquisition subunit, configured to acquire a charging standard for each engine in the engine combination for each group of engine combinations;

The second statistic subunit is configured to count the usage fee of the engine combination according to the charging standard.
The apparatus for implementing a speech translation according to claim 11, wherein the feature information includes translation accuracy, the acquisition module includes a third acquisition unit, and the third acquisition unit includes:

a score collection sub-unit, configured to collect, for each set of engine combinations, a user's score of the accuracy of the translation results of the engine combination;

The third statistical subunit is configured to perform statistics on the score, and use the statistical result as the translation accuracy of the engine combination.
The apparatus for implementing speech translation according to claim 11, wherein the feature information has at least two types, and the sorting module is configured to prioritize the engine combinations according to a combination of at least two types of feature information.
The apparatus for implementing speech translation according to claim 12, wherein the first statistical subunit comprises:

a first calculation subunit, configured to calculate, during the speech translation test, a first time spent by the speech recognition engine on speech recognition, a second time spent by the text translation engine on text translation, and a third time spent by the speech synthesis engine on speech synthesis;

The second calculation subunit is configured to calculate the sum of the first time, the second time, and the third time, and use the calculation result as the time taken to complete a speech translation test.
The apparatus for implementing speech translation according to claim 12, wherein the apparatus further comprises a screening module, the screening module comprising:

a determining unit, configured to determine whether a processing delay of the engine combination is greater than or equal to a threshold;

The discarding unit is configured to discard the engine combination when the processing delay is greater than or equal to a threshold.
The apparatus for implementing a speech translation according to claim 11, wherein the apparatus further comprises a recommendation module, the recommendation module comprising:

a determining unit, configured to determine the translation service required by the user;

a recommending unit, configured to recommend an engine combination to the user according to the engine combination selection set of the translation service.
The apparatus for implementing speech translation according to claim 18, wherein the recommending unit comprises:

a condition acquisition subunit, configured to acquire a filter condition of the engine combination;

a retrieval subunit, configured to retrieve, from the engine combination selection set, a priority queue of the engine combinations that meet the filter condition, for the user to select.
A speech translation device comprising a memory, a processor, and at least one application stored in the memory and configured to be executed by the processor, wherein the application is configured to perform the method for implementing speech translation according to claim 1.