CN110047488B - Voice translation method, device, equipment and control equipment - Google Patents


Info

Publication number: CN110047488B (application CN201910154764.9A)
Authority: CN (China)
Prior art keywords: language text, audio data, correction rule, user terminal, control terminal
Legal status: Active (granted; the status listed is an assumption, not a legal conclusion)
Other versions: CN110047488A (Chinese)
Inventor: 易华龙
Assignees (original and current): Guangzhou Caicheng Ming Technology Co ltd; Beijing Caiyun Ring Pacific Technology Co ltd (the listed assignees may be inaccurate)
Application filed by Guangzhou Caicheng Ming Technology Co ltd and Beijing Caiyun Ring Pacific Technology Co ltd, with priority to CN201910154764.9A
Publications: CN110047488A (application), CN110047488B (grant)


Classifications

    • G06F40/109 — Font handling; temporal or kinetic typography
      (G Physics → G06 Computing; calculating or counting → G06F Electric digital data processing → G06F40/00 Handling natural language data → G06F40/10 Text processing → G06F40/103 Formatting, i.e. changing of presentation of documents)
    • G10L15/005 — Language recognition
      (G Physics → G10 Musical instruments; acoustics → G10L Speech analysis or synthesis; speech recognition; speech or voice processing; speech or audio coding or decoding → G10L15/00 Speech recognition)
    • G10L15/26 — Speech to text systems
      (G10L15/00 Speech recognition)

Abstract

The application provides a voice translation method, device, equipment and control equipment. The method comprises the steps of obtaining audio data and a language text converted from the audio data by a voice processing device; sending the audio data and the language text to a control terminal, so that the control terminal verifies the language text against the audio data and generates a correction rule when the verification determines that the language text is incorrect; and receiving the correction rule sent by the control terminal and calibrating the language text according to the correction rule. The voice translation method, device, equipment and control equipment improve the accuracy of the language-text translation result and can calibrate the language text in real time according to the correction rule, without affecting the normal course of a conference.

Description

Voice translation method, device, equipment and control equipment
Technical Field
The present application relates to the field of computer technologies, and in particular, to a method, an apparatus, a device, and a control device for speech translation.
Background
With the development of speech recognition technology, automatic text translation alone no longer meets people's needs; the demand for speech translation keeps growing, and speech translation is now widely used to support conferences.
Speech translation translates recognized speech into text in a target language: while the user speaks, the translation system produces a target-language translation result directly from the user's voice. In the prior art, speech translation generally involves two systems: the user side performs voice input and display, and the voice processing device side performs speech recognition and translation.
Because voice input and user-side display devices vary widely, current speech translation often suffers from inaccurate translation and poor text display, so participants cannot read the translation result properly, which interrupts the conference or distracts the speaker.
Disclosure of Invention
The application provides a voice translation method, a device, equipment and control equipment, which aim to solve the technical problem of inaccurate voice translation in the prior art.
In a first aspect, an embodiment of the present invention provides a speech translation method, including:
acquiring audio data and a language text converted by voice processing equipment based on the audio data;
sending the audio data and the language text to a control terminal, so that the control terminal verifies the language text against the audio data and generates a correction rule when the verification determines that the language text is incorrect;
and receiving the correction rule sent by the control terminal, and calibrating the language text according to the correction rule.
In a second aspect, an embodiment of the present invention provides a speech translation method, including:
receiving audio data sent by a user terminal and a language text obtained by converting the audio data;
verifying the language text against the audio data, and generating a correction rule when the verification determines that the language text is incorrect;
and sending the correction rule to a user terminal so that the user terminal calibrates the language text according to the correction rule.
In a third aspect, an embodiment of the present invention provides a speech translation apparatus, including:
the acquisition module is used for acquiring audio data and a language text converted by the voice processing equipment based on the audio data;
the first sending module is used for sending the audio data and the language text to a control terminal so that the control terminal can check the language text according to the audio data and generate a correction rule when the language text is checked and determined to be incorrect;
and the calibration module is used for receiving the correction rule sent by the control terminal and calibrating the language text according to the correction rule.
In a fourth aspect, an embodiment of the present invention provides a speech translation apparatus, including:
the second receiving module is used for receiving the audio data sent by the user terminal and the language text obtained by conversion based on the audio data;
the verification module is used for verifying the language text against the audio data and generating a correction rule when the verification determines that the language text is incorrect;
and the second sending module is used for sending the correction rule to the user terminal so that the user terminal can calibrate the language text according to the correction rule.
In a fifth aspect, an embodiment of the present invention provides a speech translation apparatus, including a memory, a processor;
a memory: for storing the processor-executable instructions;
wherein the processor is configured to: the executable instructions are executed to implement the method of the first aspect described above.
In a sixth aspect, an embodiment of the present invention provides a control device, including a memory, a processor;
a memory: for storing the processor-executable instructions;
wherein the processor is configured to: the executable instructions are executed to implement the method of the second aspect described above.
In a seventh aspect, the present invention provides a computer-readable storage medium, where computer-executable instructions are stored, and when executed by a processor, the computer-executable instructions are configured to implement the method according to the first aspect, or implement the method according to the second aspect.
In an eighth aspect, an embodiment of the present invention provides a speech translation system, including:
a speech processing apparatus, and the speech translation apparatus of the fifth aspect and the control apparatus of the sixth aspect.
According to the voice translation method and device provided by the embodiments of the invention, audio data and a language text converted from the audio data by a voice processing device are obtained; the audio data and the language text are sent to a control terminal, so that the control terminal verifies the language text against the audio data and generates a correction rule when the verification determines that the language text is incorrect; and the correction rule sent by the control terminal is received and used to calibrate the language text, yielding a calibrated language text. This improves the accuracy of the translation result, and the correction rule can be sent and adjusted in real time so that the language text is calibrated in real time, without affecting the normal course of the conference.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and together with the description, serve to explain the principles of the disclosure.
Fig. 1 is a schematic structural diagram of a speech translation system according to an embodiment of the present invention;
FIG. 2 is a flowchart illustrating a speech translation method according to an embodiment of the present invention;
FIG. 3 is a flowchart illustrating a speech translation method according to another embodiment of the present invention;
FIG. 4 is a flowchart illustrating a speech translation method according to another embodiment of the present invention;
FIG. 5 is a flowchart illustrating a speech translation method according to another embodiment of the present invention;
fig. 6 is an interaction signaling diagram of a speech translation method according to an embodiment of the present invention;
fig. 7 is an interaction signaling diagram of a speech translation method according to another embodiment of the present invention;
FIG. 8 is a functional block diagram of a speech translation apparatus according to an embodiment of the present invention;
fig. 9 is a functional block diagram of a speech translation apparatus according to another embodiment of the present invention;
fig. 10 is a schematic hardware configuration diagram of a speech translation apparatus according to an embodiment of the present invention;
fig. 11 is a schematic diagram of a hardware structure of a control device according to an embodiment of the present invention;
fig. 12 is a schematic structural diagram of a speech translation system according to an embodiment of the present invention.
With the foregoing drawings in mind, certain embodiments of the disclosure have been shown and described in more detail below. These drawings and written description are not intended to limit the scope of the disclosed concepts in any way, but rather to illustrate the concepts of the disclosure to those skilled in the art by reference to specific embodiments.
Detailed Description
Reference will now be made in detail to the exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, like numbers in different drawings represent the same or similar elements unless otherwise indicated. The implementations described in the exemplary embodiments below are not intended to represent all implementations consistent with the present disclosure. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the present disclosure, as detailed in the appended claims.
Furthermore, references to the description of the terms "one embodiment," "some embodiments," "an example," "a specific example," or "some examples" or the like, mean that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the invention. In this specification, the schematic representations of the terms used above are not necessarily intended to refer to the same embodiment or example.
The speech translation method provided by the application is applicable to the architecture schematic diagram of the speech translation system shown in fig. 1. Taking the speech translation system shown in fig. 1 as an example, the speech translation system includes a user terminal 10, a speech processing device 20, and a control terminal 30, where the user terminal 10 may be a mobile phone, a computer, a vehicle-mounted terminal, a smart home device, a robot, and other terminal devices, and is not limited herein. A user can perform business processing such as specifying a translation language, reading and displaying a language text after voice translation and the like through the user terminal 10; the user terminal 10 may have an audio acquisition device for acquiring audio data; the user terminal 10 includes at least one image or text display unit for displaying the language text translated by the speech processing device 20.
The speech processing device 20 is used for performing recognition and translation of audio data.
The control terminal 30 may be a computing device such as a desktop computer, a notebook computer, a handheld computer, or a cloud server, which is not limited herein. Through the control terminal 30, the user can perform service processing such as sending instructions and setting rules.
The control terminal 30 communicates with the user terminal 10 through a network; alternatively, the control terminal 30 may communicate with the voice processing device 20 through a network, and the voice processing device 20 communicates with the user terminal 10 through the network, thereby achieving indirect communication between the control terminal 30 and the user terminal 10. One control terminal 30 may communicate with a plurality of user terminals 10, and one voice processing device 20 may also communicate with a plurality of user terminals 10. The networks above may follow different network standards.
A user inputs speech at the user terminal 10 and sets the language into which recognized speech should be translated. The user terminal 10 sends the captured audio data and the language setting to the voice processing device 20; the voice processing device 20 recognizes and translates the audio data, converts it into a language text of the set language, and feeds the language text back to the user terminal 10. The user terminal 10 then sends the language text and the audio data to the control terminal 30; the control terminal 30 generates a correction rule or a control instruction from the language text and the audio data and sends it to the user terminal 10, which calibrates or adjusts the language text accordingly. This yields a calibrated language text and improves the accuracy of the translation result. Because the control terminal 30 and the user terminal 10 interact in real time, the language text is calibrated in real time; and because one control terminal 30 can interact with a plurality of user terminals 10, the scheme suits scenarios with multiple speakers or multiple cooperating user terminals.
The following describes the technical solutions of the present application and how to solve the above technical problems with specific embodiments. The following several specific embodiments may be combined with each other, and details of the same or similar concepts or processes may not be repeated in some embodiments. Embodiments of the present application will be described below with reference to the accompanying drawings.
Fig. 2 is a flowchart illustrating a speech translation method according to an embodiment of the present invention. The main execution body of this embodiment is the user terminal in the embodiment shown in fig. 1, and as shown in fig. 2, the method includes:
s201, acquiring audio data and a language text converted by the voice processing equipment based on the audio data.
In this embodiment, the audio data comes from either of two sources: an audio acquisition device mounted on the user terminal, or an audio output unit carried by the user terminal.
In one implementation, the user terminal includes at least one audio acquisition device; the user terminal sends a first instruction signal to an audio acquisition device mounted on it, so that the device acquires audio data according to the first instruction signal. The first instruction signal may be specified by a user or sent by the control terminal. Optionally, the audio acquisition device is a microphone.
For clarity of the embodiment, in a possible conference application scenario, the user terminal is a computer disposed on a platform and includes a plurality of microphones and a projector, and the control terminal is another computer on a conference site and is operated by an operator. The voice processing equipment and the user terminal realize network communication through the wireless networking device.
When the conference site is prepared, a wireless microphone used by listeners is marked as the second microphone. If a listener needs to ask a question during the conference, the operator sends the following instruction signal to the user terminal through the control terminal: set the audio acquisition device to the second microphone. The second microphone can then collect the listener's audio data.
In another embodiment, the user terminal is equipped with other audio output units, and the user terminal acquires a second instruction signal sent by the control terminal and acquires audio data from the audio output unit equipped in the user terminal according to the second instruction signal. Alternatively, the second instruction signal may be generated by the user terminal itself.
Specifically, the user terminal takes the audio output of other audio output units as the source of audio data. If the user terminal runs the Windows operating system, it can capture the audio output of other audio output units through the Windows Audio Session API (WASAPI); if it runs the Linux operating system, it can capture the audio output of other programs through PulseAudio. PulseAudio is a sound server: a background process that accepts audio input from one or more sources (processes or input devices) and then redirects the audio to one or more sinks (a sound card, a remote networked PulseAudio server, or another process).
In one possible conference application scenario, the speaker plays an English video without subtitles. The operator sends the following instruction to the user terminal through the control terminal: set the audio input source to capture from the video player. While the video plays, Chinese and English subtitles are then displayed on the user terminal's screen, where Chinese, the target language, was specified by the user at the user terminal. Compared with capturing the sound directly with an audio acquisition device, this scheme introduces less audio distortion and therefore helps improve the accuracy of speech translation.
S202, sending the audio data and the language text to a control terminal so that the control terminal can verify the language text according to the audio data, and generating a correction rule when the verification determines that the language text is wrong.
In this embodiment, the user terminal sends audio data and language text to the control terminal. Alternatively, the user terminal may also transmit the number of the user terminal, the number of the audio collecting device, display data of the language text, size data of the display unit of the user terminal, and the like to the control terminal. Wherein the display data of the language text comprises a final effect of the language text displayed at the user terminal.
Optionally, the correction rule is entered by an operator at the control terminal and returned to the user terminal.
S203, receiving the correction rule sent by the control terminal, and calibrating the language text according to the correction rule.
In this embodiment, the correction rule includes addition, modification, or deletion of language text.
Optionally, the user terminal calibrates the language text according to the correction rule, obtains the calibrated language text, and displays the calibrated language text.
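The patent fixes no concrete format for correction rules, so the following is a hypothetical sketch that models the three operations named above (addition, modification, deletion) and applies them to a language text:

```python
from dataclasses import dataclass

@dataclass
class CorrectionRule:
    """One correction operation on a language text (hypothetical wire format)."""
    op: str            # "add", "modify", or "delete"
    target: str        # substring to modify/delete, or anchor for an insertion
    payload: str = ""  # replacement text, or text inserted after `target`

def apply_rules(text: str, rules: list[CorrectionRule]) -> str:
    """Calibrate `text` by applying each correction rule in order."""
    for rule in rules:
        if rule.op == "modify":
            text = text.replace(rule.target, rule.payload)
        elif rule.op == "delete":
            text = text.replace(rule.target, "")
        elif rule.op == "add":
            text = text.replace(rule.target, rule.target + rule.payload)
        else:
            raise ValueError(f"unknown op: {rule.op}")
    return text
```

For example, `apply_rules("the phone has two colors", [CorrectionRule("modify", "two colors", "a dual camera")])` yields `"the phone has a dual camera"`.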
In one implementation, after the user terminal receives the language text converted by the voice processing device, it calibrates the language text according to its current first correction rule, obtains and displays the calibrated language text, and at the same time sends the calibrated language text to the control terminal, so that the control terminal verifies the calibrated text and generates a second correction rule when the verification determines that the calibrated text is incorrect; the user terminal receives the second correction rule and uses it to calibrate the next converted language text.
In another implementation, after receiving the language text converted by the voice processing device, the user terminal sends it to the control terminal, so that the control terminal checks the language text and generates a correction rule when the check determines that the text is incorrect; the user terminal receives the correction rule, calibrates the language text, and displays it.
In yet another implementation, the flow is the same, except that the user terminal additionally sends the correction rule to the voice processing device, so that the voice processing device calibrates the next converted language text according to the correction rule.
According to the voice translation method provided by this embodiment of the invention, audio data and a language text converted from the audio data by a voice processing device are obtained; the audio data and the language text are sent to a control terminal, so that the control terminal verifies the language text against the audio data and generates a correction rule when the verification determines that the language text is incorrect; and the correction rule sent by the control terminal is received and used to calibrate the language text, yielding a calibrated language text. This improves the accuracy of the translation result; the correction rule can be sent and adjusted in real time, so the language text is calibrated in real time without affecting the normal course of the conference.
In an actual application scenario, a user specifies a language of a language text, and a user terminal sends the language to a voice processing device. The voice processing equipment firstly identifies the audio data to obtain a first language text with the same language as the audio data, and then translates the first language text to obtain a second language text with the same language as the specified language.
And the user terminal acquires and displays a language text obtained by the voice processing equipment based on the audio data conversion, wherein the language text comprises the first language text and the second language text. It should be understood that the language text displayed is a first language text and/or a second language text.
The calibration of the language text can be based on the first language text or the second language text, and the following describes the calibration process of the language text in detail through the embodiment shown in fig. 3.
Fig. 3 is a flowchart illustrating a speech translation method according to another embodiment of the present invention. The language text includes a second language text obtained by translating a first language text, where the first language text is obtained by recognizing the audio data; the correction rule is a recognition calibration rule for calibrating the first language text.
As shown in fig. 3, the method may further include:
s301, calibrating the first language text according to the correction rule to obtain the calibrated first language text. In this embodiment, the modification rule is an identification calibration rule for calibrating the first language text, and the identification calibration rule includes at least one of the following items: addition, modification, and deletion of text in the first language.
S302, the calibrated first language text is sent to the voice processing equipment, so that the voice processing equipment translates the calibrated first language text to obtain the calibrated language text.
S303, receiving the calibrated language text sent by the voice processing equipment.
In this embodiment, the user terminal calibrates the first language text to obtain a calibrated first language text, and the voice processing device translates the calibrated first language text to obtain a final second language text.
Optionally, if the correction rule is a translation calibration rule for calibrating the second language text, the user terminal may calibrate the second language text according to the correction rule to obtain the calibrated language text.
According to the voice translation method provided by the embodiment of the invention, the plurality of user terminals respectively receive and match respective correction rules and carry out calibration independently, so that the workload of the voice processing equipment is greatly reduced, and the efficiency of voice translation calibration is improved; the method can be more timely and accurately suitable for different voice translation scenes; in addition, the calibration of the first language text and the second language text can be respectively carried out, so that the statistics of errors in the speech recognition of the speech processing equipment is facilitated, and the optimization of the subsequent speech recognition is facilitated.
Alternatively, the calibration of the speech text may be performed by the speech processing device. In one embodiment, the user terminal sends the modification rule to the speech processing device, so that the speech processing device calibrates the next converted speech text according to the modification rule.
If the correction rule is an identification calibration rule, the voice processing equipment calibrates the first language text directly identified and obtained by the audio data according to the identification calibration rule in the subsequent voice identification translation, then translates the calibrated first language text to obtain a calibrated second language text, and sends the calibrated first language text and the calibrated second language text to the user terminal. It should be understood that this calibration occurs after speech recognition and before text translation.
And if the correction rule is a translation calibration rule, the speech processing equipment directly calibrates the second language text according to the translation calibration rule in the subsequent speech recognition translation. In this case, the calibration takes place after the text translation and before the transmission to the user terminal.
To illustrate this embodiment clearly, consider a multi-person conference in which the speaker speaks Chinese, the first language text is Chinese, and the second language text is English. The speaker says that the mobile phone has a dual camera ("双摄"), but the user terminal displays "two colors": the voice processing device erred at the speech recognition stage and recognized "双摄" (dual camera) as "双色" (two colors), making the subsequent translation inaccurate. The user terminal sends the displayed language text and the audio data to the control terminal. Seeing the error, the operator judges that it occurred when the first language text was obtained from the audio data, and adds the following recognition calibration rule at the control terminal; the user terminal receives the correction rule and forwards it to the voice processing device for calibration.
Recognition calibration rule: in the next speech processing pass, change "双色" in the first language text to "双摄".
The voice processing device recognizes the audio data to obtain the first language text, calibrates it according to the recognition calibration rule by changing "双色" to "双摄", and translates "双摄" to obtain the second language text "dual camera", which the user terminal obtains and displays. Thereafter, whenever the speaker mentions "双摄", the subtitle shows "dual camera".
Optionally, if the modification rule is a translation calibration rule, the modification is performed after the second language text is acquired.
According to the voice translation method provided by this embodiment of the invention, the voice processing device calibrates the language text, with separate calibration of the first language text and the second language text; errors in speech recognition can thus be counted, which facilitates optimization of subsequent speech recognition.
Fig. 4 is a flowchart of a speech translation method according to another embodiment of the present invention, in this embodiment, a display calibration for a speech text is added on the basis of the above embodiment, for example, on the basis of the embodiment shown in fig. 2, the method may further include:
s401, acquiring display data of the language text on the local computer; wherein the display data comprises a display picture or a display video of the language text on the local computer.
In one implementation, a display picture of the language text is obtained by taking screenshots at timed intervals.
In another implementation, the display video of the language text is obtained as a video stream.
In practice, either manner may be selected according to the available network bandwidth.
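A minimal sketch of choosing between the two manners by bandwidth; the threshold value is an assumption for illustration, not taken from the patent:

```python
def pick_display_capture(bandwidth_kbps: float,
                         threshold_kbps: float = 2000.0) -> str:
    """Choose how to report the on-screen language text to the control terminal.

    Low bandwidth: periodic screenshots; otherwise stream video of the display.
    """
    return "video-stream" if bandwidth_kbps >= threshold_kbps else "timed-screenshot"
```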
S402, sending the display data to a control terminal so that the control terminal generates a display parameter adjustment instruction according to the display data.
In this embodiment, the number and size of the display devices of the user terminal may also be sent to the control terminal.
S403, receiving the display parameter adjustment instruction sent by the control terminal, and adjusting the display of the language text on the local machine according to the display parameter adjustment instruction.
In one embodiment, the user terminal adjusts the window displaying the language text on the local machine according to the display parameter adjustment instruction, where the adjustment of the window includes at least one of the following: the size, position, background color, and background transparency of the window.
In another embodiment, the user terminal adjusts at least one of the following attributes of the language text according to the display parameter adjustment instruction: font size, font, color, transparency, and dwell time.
Optionally, the language text is displayed on top of a semi-transparent window, so that it does not interfere with other presentation programs running on the user terminal while remaining clearly readable.
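One way such an instruction might be applied on the user terminal, sketched with hypothetical field names (the patent does not define a wire format for the instruction):

```python
# Window attributes and language-text attributes named in the embodiments;
# the key spellings are assumptions for this sketch.
WINDOW_KEYS = {"size", "position", "background_color", "background_transparency"}
TEXT_KEYS = {"font_size", "font", "color", "transparency", "dwell_time"}

def apply_display_instruction(window, text_style, instruction):
    """Merge a display parameter adjustment instruction into the current
    window settings and language-text style; unrecognized keys are ignored."""
    for key, value in instruction.items():
        if key in WINDOW_KEYS:
            window[key] = value
        elif key in TEXT_KEYS:
            text_style[key] = value
    return window, text_style
```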
In one possible conference scenario, a listener in a back row reports being unable to read the language text clearly. The operator responds by sending the following display parameter adjustment instruction to the user terminal through the control terminal: adjust the font size of the language text to 32 px. The user terminal receives the instruction and adjusts the font size of the language text to 32 px accordingly, solving the problem that a poor display of the language text disrupts the conference.
The speech translation method provided by this embodiment of the invention addresses the poor display of language text caused by the diversity of user terminal display devices, such as inappropriate text size, unsuitable display area, and distorted text color due to device color differences. Because the display adjustment does not interrupt the progress of the conference, the practical effect of speech translation is greatly improved.
Fig. 5 is a flowchart illustrating a speech translation method according to still another embodiment of the present invention. The execution subject of this embodiment is the control terminal in the embodiment shown in fig. 1, and as shown in fig. 5, the method includes:
S501, receiving audio data sent by a user terminal and a language text obtained by converting the audio data;
S502, verifying the language text according to the audio data, and generating a correction rule when the verification determines that the language text is erroneous;
S503, sending the correction rule to the user terminal so that the user terminal can calibrate the language text according to the correction rule.
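The embodiment leaves the verification method open (in practice an operator on the control terminal listens to the audio). As a sketch under that assumption, a correction rule could be derived by diffing the recognized text against the operator's reference transcript; `difflib` is used here purely for illustration.

```python
import difflib

def generate_correction_rule(language_text, reference_text, rule_type="recognition"):
    """Return a correction rule for the first mismatching fragment between the
    recognized text and the operator's reference transcript, or None when the
    verification finds the language text correct (no rule is needed)."""
    matcher = difflib.SequenceMatcher(None, language_text, reference_text)
    for op, i1, i2, j1, j2 in matcher.get_opcodes():
        if op != "equal":
            return {"type": rule_type,
                    "wrong": language_text[i1:i2],
                    "right": reference_text[j1:j2]}
    return None
```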
For the specific implementation principle and process of the method in this embodiment, reference may be made to any of the embodiments described above, which are not described herein again.
In the speech translation method provided by this embodiment, the control terminal receives audio data sent by a user terminal and the language text converted from that audio data; verifies the language text according to the audio data and generates a correction rule when the verification determines that the language text is erroneous; and sends the correction rule to the user terminal so that the user terminal can calibrate the language text accordingly. This effectively improves the translation accuracy of the language text and enables real-time calibration without disturbing the normal progress of a conference; the method also supports multiple user terminals, making it suitable for scenarios with multiple speakers or cooperating user terminals.
Fig. 6 is an interaction signaling diagram of a speech translation method according to an embodiment of the present invention, as shown in fig. 6, the method may include:
S601, the user terminal acquires audio data.
S602, the user terminal sends the audio data to the voice processing equipment.
S603, the voice processing equipment converts the audio data to obtain a language text.
S604, the voice processing equipment sends the language text to the user terminal.
S605, the user terminal sends the audio data and the language text to the control terminal.
S606, the control terminal checks the language text according to the audio data, and generates a correction rule when the check determines that the language text is erroneous.
S607, the control terminal sends the correction rule to the user terminal.
S608, the user terminal calibrates the language text according to the correction rule.
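The S601-S608 exchange can be mocked end to end with plain function calls standing in for the network messages; the recognizer and the operator check below are hypothetical stand-ins, not the patent's implementation.

```python
class MockSpeechProcessor:
    def convert(self, audio):
        # S603: stand-in for real speech recognition/translation
        return f"text:{audio}"

class MockControlTerminal:
    def check(self, audio, language_text):
        # S606: stand-in for the operator's verification; a rule is
        # generated only when the language text is found to be erroneous
        if "double colors" in language_text:
            return {"wrong": "double colors", "right": "double shot"}
        return None

class UserTerminal:
    def __init__(self, processor, control):
        self.processor, self.control = processor, control

    def handle(self, audio):
        text = self.processor.convert(audio)    # S601-S604
        rule = self.control.check(audio, text)  # S605-S607
        if rule is not None:                    # S608: calibrate locally
            text = text.replace(rule["wrong"], rule["right"])
        return text
```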
For specific implementation of this embodiment, reference may be made to the embodiments shown in fig. 1 and fig. 5, which are not described herein again.
Fig. 7 is an interaction signaling diagram of a speech translation method according to another embodiment of the present invention, as shown in fig. 7, the method may include:
S701, the user terminal obtains audio data.
S702, the user terminal sends the audio data to the voice processing equipment.
S703, the voice processing equipment converts the audio data to obtain a language text.
S704, the voice processing equipment sends the language text to the user terminal.
S705, the user terminal acquires display data of the language text on the local machine.
S706, the user terminal sends the audio data, the language text and the display data to a control terminal.
S707, the control terminal checks the language text according to the audio data, and generates a correction rule when the check determines that the language text is erroneous; meanwhile, the control terminal generates a display parameter adjustment instruction according to the display data.
It should be understood that the control terminal reviews the display data and generates a display parameter adjustment instruction when the display effect is poor.
S708, the control terminal sends the correction rule and the display parameter adjustment instruction to the user terminal.
S709, the user terminal adjusts the display of the language text on the local machine according to the display parameter adjustment instruction.
S710, the user terminal sends the correction rule to the voice processing equipment.
S711, the voice processing equipment calibrates the language text obtained from the next conversion according to the correction rule.
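Steps S710-S711 imply that the voice processing equipment caches received correction rules and applies them to subsequent conversions. A sketch of that behaviour with a mock recognizer (rule format and class name are assumptions):

```python
class RuleCachingProcessor:
    """Stores correction rules received from the user terminal (S710) and
    applies them to every later conversion result (S711)."""

    def __init__(self):
        self.rules = []

    def add_rule(self, rule):
        self.rules.append(rule)

    def convert(self, audio_transcript):
        # Stand-in for real recognition: the raw transcript plays the role
        # of the recognizer's uncorrected output.
        text = audio_transcript
        for rule in self.rules:
            text = text.replace(rule["wrong"], rule["right"])
        return text
```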
The specific implementation of this embodiment can refer to the embodiments shown in fig. 1, fig. 4, and fig. 5, and is not described herein again.
Based on the speech translation method provided by the above embodiments, embodiments of the present invention further provide device embodiments implementing the above method embodiments: one with the user terminal as the execution subject and one with the control terminal as the execution subject.
Fig. 8 is a schematic structural diagram of a speech translation apparatus according to an embodiment of the present invention. The speech translation apparatus 80 is applied to a user terminal, and as shown in fig. 8, the speech translation apparatus includes an obtaining module 810, a first sending module 820, and a calibration module 830.
An obtaining module 810, configured to obtain audio data and a language text converted by the speech processing device based on the audio data.
The first sending module 820 is configured to send the audio data and the language text to a control terminal, so that the control terminal checks the language text according to the audio data, and generates a correction rule when the language text is checked to be incorrect.
The calibration module 830 is configured to receive the correction rule sent by the control terminal, and calibrate the language text according to the correction rule.
In the speech translation apparatus provided by this embodiment of the invention, the obtaining module acquires audio data and a language text converted by the voice processing equipment based on the audio data; the first sending module sends the audio data and the language text to a control terminal so that the control terminal checks the language text according to the audio data and generates a correction rule when the check determines that the language text is erroneous; and the calibration module receives the correction rule sent by the control terminal and calibrates the language text according to it. The calibrated language text improves the accuracy of the translation result, and because correction rules can be sent and applied in real time, the language text can be calibrated without affecting the normal progress of the conference.
Optionally, the language text includes a first language text obtained by performing speech recognition on the audio data; the calibration module 830 is specifically configured to: calibrating the first language text according to the correction rule to obtain a calibrated first language text; sending the calibrated first language text to the voice processing equipment so that the voice processing equipment translates the calibrated first language text to obtain a calibrated language text; and receiving the calibrated language text sent by the voice processing equipment.
Optionally, the language text includes a second language text translated based on a first language text, where the first language text is a text recognized and obtained based on the audio data. The calibration module 830 is further specifically configured to: and calibrating the second language text according to the correction rule to obtain a calibrated language text.
Optionally, the speech translation apparatus further includes a third sending module (not shown in fig. 8) configured to send the correction rule to the voice processing equipment, so that the voice processing equipment calibrates the language text obtained from the next conversion according to the correction rule.
Optionally, the speech translation apparatus further includes a display adjustment module (not shown in fig. 8), specifically configured to: acquire display data of the language text on the local machine, where the display data includes a display picture or a display video of the language text on the local machine; send the display data to a control terminal so that the control terminal generates a display parameter adjustment instruction according to the display data; and receive the display parameter adjustment instruction sent by the control terminal and adjust the display of the language text on the local machine according to the instruction.
The display adjustment module is further specifically configured to adjust, according to the display parameter adjustment instruction, the window on the local machine used for displaying the language text, where the adjustment of the window includes at least one of the following: the size, position, background color, and background transparency of the window; and/or to adjust at least one of the following attributes of the language text according to the display parameter adjustment instruction: font size, font, color, transparency, and dwell time.
Optionally, the obtaining module 810 is specifically configured to send a first instruction signal to an audio acquisition device mounted on the local machine so that the audio acquisition device acquires audio data according to the first instruction signal; or to receive a second instruction signal and acquire audio data from an audio output unit of the local machine according to the second instruction signal, where the second instruction signal originates from the control terminal or the local machine.
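The two acquisition paths of the obtaining module can be sketched as a dispatch on the instruction signal; the capture callables below are placeholders for real microphone and audio-output (loopback) capture, not an API named by the patent.

```python
def acquire_audio(instruction_signal, capture_microphone, capture_audio_output):
    """Acquire audio per the obtaining module's two paths: a first instruction
    signal drives the attached audio acquisition device, while a second
    instruction signal (originating from the control terminal or the local
    machine) taps the local audio output unit instead."""
    if instruction_signal == "first":
        return capture_microphone()
    if instruction_signal == "second":
        return capture_audio_output()
    raise ValueError(f"unknown instruction signal: {instruction_signal!r}")
```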
The speech translation apparatus 80 in the embodiment shown in fig. 8 can be used to execute the technical solution that uses the user terminal as the execution subject in the above method, and this embodiment is not described herein again.
Fig. 9 is a schematic structural diagram of a speech translation apparatus according to another embodiment of the present invention. The speech translation apparatus 90 is applied to a control terminal, and as shown in fig. 9, the speech translation apparatus includes a first receiving module 910, a verifying module 920 and a second sending module 930.
The first receiving module 910 is configured to receive audio data sent by a user terminal and a language text obtained by converting the audio data.
The verification module 920 is configured to verify the language text according to the audio data and generate a correction rule when the verification determines that the language text is erroneous.
A second sending module 930, configured to send the modification rule to the user terminal, so that the user terminal calibrates the language text according to the modification rule.
The speech translation apparatus 90 in the embodiment shown in fig. 9 can be used to implement the technical solution of the method described above that uses the control terminal as the execution subject, and this embodiment is not described herein again.
It should be understood that the division of the modules of the speech translation apparatus shown in figs. 8 and 9 is only a logical division; in an actual implementation, the modules may be wholly or partially integrated into one physical entity or physically separated. These modules may all be implemented as software invoked by a processing element, all as hardware, or partly as software invoked by a processing element and partly as hardware. In addition, the modules may be integrated together or implemented independently. The processing element described herein may be an integrated circuit with signal processing capability. In implementation, each step of the above method, or each module above, may be completed by an integrated logic circuit in hardware within the processor element or by instructions in the form of software.
Fig. 10 is a schematic hardware structure diagram of a speech translation apparatus according to an embodiment of the present invention. As shown in fig. 10, the speech translation apparatus 100 provided in this embodiment includes: at least one memory 101, a processor 102, and a computer program, where the computer program is stored in the memory 101 and configured to be executed by the processor 102 to implement the speech translation method described above with the user terminal as the execution subject. The speech translation apparatus 100 also includes a communication unit. The processor 102, the memory 101, and the communication unit are connected by a bus.
It will be understood by those skilled in the art that fig. 10 is merely an example of a speech translation device and is not intended to be limiting, and that a speech translation device may include more or fewer components than shown, or some components may be combined, or different components, e.g., the speech translation device may also include an input-output device, a network access device, a bus, etc. In this embodiment, the speech translation apparatus includes at least one audio capture apparatus and an image display unit.
Fig. 11 is a schematic diagram of a hardware structure of a control device according to an embodiment of the present invention. As shown in fig. 11, the control device 110 includes: at least one memory 111, a processor 112 and a computer program; wherein the computer program is stored in the memory and configured to be executed by the processor to implement the speech translation method as described above with the control terminal as the execution subject.
Furthermore, an embodiment of the present invention provides a readable storage medium on which a computer program is stored; when executed by a processor, the computer program implements the method described in any of the implementations that take the user terminal as the execution subject, or the method described in any of the implementations that take the control terminal as the execution subject.
The readable storage medium may be implemented by any type or combination of volatile or non-volatile memory devices, such as Static Random Access Memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic or optical disks. Readable storage media can be any available media that can be accessed by a general purpose or special purpose computer.
Fig. 12 is a schematic hardware structure diagram of a speech translation system according to an embodiment of the present invention. As shown in fig. 12, the speech translation system 120 includes a speech translation apparatus 100, a speech processing apparatus 20, and a control apparatus 110. The speech processing device 20 is used for recognizing and translating audio data. The speech translation apparatus 100 may be the speech translation apparatus in the embodiment described above in fig. 10; the control device 110 may be the control device in the embodiment described above with respect to fig. 11.
The parts not described in detail in this embodiment may refer to the relevant description of the corresponding embodiment of the method.
Finally, it should be noted that: the above embodiments are only used to illustrate the technical solution of the present invention, and not to limit the same; while the invention has been described in detail and with reference to the foregoing embodiments, it will be understood by those skilled in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some or all of the technical features may be equivalently replaced; and the modifications or the substitutions do not make the essence of the corresponding technical solutions depart from the scope of the technical solutions of the embodiments of the present invention.

Claims (13)

1. A speech translation method is applied to a user terminal and comprises the following steps:
acquiring audio data and a language text converted by voice processing equipment based on the audio data;
sending the audio data and the language text to a control terminal so that the control terminal checks the language text according to the audio data and generates a correction rule when the check determines that the language text is erroneous;
receiving the correction rule sent by the control terminal, and calibrating the language text according to the correction rule;
after the calibrating the language text according to the correction rule, the method further includes:
and sending the correction rule to the voice processing equipment so that the voice processing equipment can calibrate the language text obtained from the next conversion according to the correction rule.
2. The method according to claim 1, wherein the language text comprises a first language text obtained by speech recognition of the audio data;
the calibrating the language text according to the correction rule comprises:
calibrating the first language text according to the correction rule to obtain a calibrated first language text;
sending the calibrated first language text to the voice processing equipment so that the voice processing equipment translates the calibrated first language text to obtain a calibrated language text;
and receiving the calibrated language text sent by the voice processing equipment.
3. The method of claim 1, wherein the language text comprises a second language text translated based on a first language text, wherein the first language text is a text obtained by recognition based on the audio data;
the calibrating the language text according to the correction rule comprises:
and calibrating the second language text according to the correction rule to obtain a calibrated language text.
4. The method of claim 1, wherein after the obtaining audio data and the language text converted by the speech processing device based on the audio data, the method further comprises:
acquiring display data of the language text on a local computer; wherein the display data comprises a display picture or a display video of the language text on the local computer;
sending the display data to a control terminal so that the control terminal generates a display parameter adjusting instruction according to the display data;
and receiving the display parameter adjusting instruction sent by the control terminal, and adjusting the display data of the language text on the local computer according to the display parameter adjusting instruction.
5. The method of claim 4, wherein adjusting the display data of the language text on the native machine according to the display parameter adjustment instruction comprises:
according to the display parameter adjusting instruction, adjusting a window which is used for displaying the language text on the local machine, wherein the adjustment of the window comprises at least one of the following items: the size, position, background color and background transparency of the window;
and/or
Adjusting at least one of the following language texts according to the display parameter adjusting instruction: font size, font, color, transparency, and dwell time of the language text.
6. The method of claim 1, wherein the obtaining audio data comprises:
sending a first instruction signal to audio acquisition equipment carried on the machine so that the audio acquisition equipment acquires audio data according to the first instruction signal;
or
Acquiring a second instruction signal, and acquiring audio data from an audio output unit carried on the computer according to the second instruction signal; and the source of the second instruction signal is the control terminal or the local machine.
7. A voice translation method is applied to a control terminal and comprises the following steps:
receiving audio data sent by a user terminal and a language text obtained by converting the audio data;
checking the language text according to the audio data, and generating a correction rule when the check determines that the language text is erroneous;
sending the correction rule to a user terminal so that the user terminal calibrates the language text according to the correction rule; the correction rule is also used by the voice processing equipment to calibrate the language text obtained from the next conversion, and is sent to the voice processing equipment after the user terminal finishes calibrating the language text.
8. A speech translation apparatus, applied to a user terminal, comprising:
the acquisition module is used for acquiring audio data and a language text converted by the voice processing equipment based on the audio data;
the first sending module is used for sending the audio data and the language text to a control terminal so that the control terminal can check the language text according to the audio data and generate a correction rule when the language text is checked and determined to be incorrect;
the calibration module is used for receiving the correction rule sent by the control terminal and calibrating the language text according to the correction rule;
the device further includes a third sending module, configured to send the modification rule to the speech processing device, so that the speech processing device calibrates the speech text obtained by the next conversion according to the modification rule.
9. A speech translation apparatus, applied to a control terminal, includes:
the second receiving module is used for receiving the audio data sent by the user terminal and the language text obtained by conversion based on the audio data;
the verification module is used for verifying the language text according to the audio data and generating a correction rule when the language text is verified to be correct;
the second sending module is used for sending the correction rule to a user terminal so that the user terminal can calibrate the language text according to the correction rule; and the correction rule is also used for calibrating the voice text obtained by next conversion by the voice processing equipment, and the correction rule is sent to the voice processing equipment after the user terminal finishes calibrating the language text.
10. A speech translation device comprising a memory, a processor;
a memory: for storing the processor-executable instructions;
wherein the processor is configured to: executing the executable instructions to implement the method of any of claims 1 to 6.
11. A control device comprising a memory, a processor;
a memory: a memory for storing the processor-executable instructions;
wherein the processor is configured to: the executable instructions are executed to implement the method of claim 7.
12. A computer-readable storage medium having computer-executable instructions stored thereon which, when executed by a processor, implement the method of any one of claims 1 to 6 or the method of claim 7.
13. A speech translation system, comprising:
speech processing device, and a speech translation device according to claim 10 and a control device according to claim 11.
CN201910154764.9A 2019-03-01 2019-03-01 Voice translation method, device, equipment and control equipment Active CN110047488B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910154764.9A CN110047488B (en) 2019-03-01 2019-03-01 Voice translation method, device, equipment and control equipment


Publications (2)

Publication Number Publication Date
CN110047488A CN110047488A (en) 2019-07-23
CN110047488B true CN110047488B (en) 2022-04-12

Family

ID=67274331


Country Status (1)

Country Link
CN (1) CN110047488B (en)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110457717B (en) * 2019-08-07 2023-04-07 深圳市博音科技有限公司 Remote translation system and method
CN113591491B (en) * 2020-04-30 2023-12-26 阿里巴巴集团控股有限公司 Speech translation text correction system, method, device and equipment
CN111681643A (en) * 2020-05-29 2020-09-18 标贝(北京)科技有限公司 Speech recognition post-processing method, device, system and storage medium
CN115086753A (en) * 2021-03-16 2022-09-20 北京有竹居网络技术有限公司 Live video stream processing method and device, electronic equipment and storage medium
CN113867665A (en) * 2021-09-17 2021-12-31 珠海格力电器股份有限公司 Display language modification method and device, electrical equipment and terminal equipment
CN113891168B (en) * 2021-10-19 2023-12-19 北京有竹居网络技术有限公司 Subtitle processing method, subtitle processing device, electronic equipment and storage medium

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2008137341A1 (en) * 2007-05-07 2008-11-13 Microsoft Corporation Document translation system
CN104516876A (en) * 2013-09-30 2015-04-15 株式会社东芝 Speech translation system and speech translation method
CN107844481A (en) * 2017-11-21 2018-03-27 新疆科大讯飞信息科技有限责任公司 Text recognition error detection method and device
CN108399166A (en) * 2018-02-07 2018-08-14 深圳壹账通智能科技有限公司 Text interpretation method, device, computer equipment and storage medium
CN108615527A (en) * 2018-05-10 2018-10-02 腾讯科技(深圳)有限公司 Data processing method, device based on simultaneous interpretation and storage medium
CN108710616A (en) * 2018-05-23 2018-10-26 科大讯飞股份有限公司 A kind of voice translation method and device

Family Cites Families (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20070016401A1 (en) * 2004-08-12 2007-01-18 Farzad Ehsani Speech-to-speech translation system with user-modifiable paraphrasing grammars
US7831423B2 (en) * 2006-05-25 2010-11-09 Multimodal Technologies, Inc. Replacing text representing a concept with an alternate written form of the concept
JP2009048003A (en) * 2007-08-21 2009-03-05 Toshiba Corp Voice translation device and method
CN101458681A (en) * 2007-12-10 2009-06-17 株式会社东芝 Voice translation method and voice translation apparatus
KR100911834B1 (en) * 2007-12-11 2009-08-13 한국전자통신연구원 Method and apparatus for correcting of translation error by using error-correction pattern in a translation system
KR101086550B1 (en) * 2009-06-24 2011-11-23 엔에이치엔(주) System and method for recommendding japanese language automatically using tranformatiom of romaji
EP2587388A4 (en) * 2010-06-25 2018-01-03 Rakuten, Inc. Machine translation system and method of machine translation
US20130132079A1 (en) * 2011-11-17 2013-05-23 Microsoft Corporation Interactive speech recognition
JP6226321B2 (en) * 2013-10-23 2017-11-08 株式会社サン・フレア Translation support system, translation support system server, translation support system client, translation support system control method, and program thereof
CN107632982B (en) * 2017-09-12 2021-11-16 郑州科技学院 Method and device for voice-controlled foreign language translation equipment


Also Published As

Publication number Publication date
CN110047488A (en) 2019-07-23

Similar Documents

Publication Publication Date Title
CN110047488B (en) Voice translation method, device, equipment and control equipment
US10885318B2 (en) Performing artificial intelligence sign language translation services in a video relay service environment
US8504375B2 (en) Conference system, information processor, conference supporting method and information processing method
CN107527623B (en) Screen transmission method and device, electronic equipment and computer readable storage medium
CN110728976A (en) Method, device and system for voice recognition
AU2014412434A1 (en) Voice recognition method and system
US20220375225A1 (en) Video Segmentation Method and Apparatus, Device, and Medium
KR20180127136A (en) Double-sided display simultaneous translation device, method and apparatus and electronic device
CN112653902B (en) Speaker recognition method and device and electronic equipment
US10970909B2 (en) Method and apparatus for eye movement synthesis
CN109361527B (en) Voice conference recording method and system
CN108629241B (en) Data processing method and data processing equipment
US20230143452A1 (en) Method and apparatus for generating image, electronic device and storage medium
CN109859759B (en) Display screen color correction method and device and display equipment
WO2021120190A1 (en) Data processing method and apparatus, electronic device, and storage medium
CN111447397A (en) Translation method and translation device based on video conference
CN207854084U (en) A kind of caption display system
US20140180668A1 (en) Service server apparatus, service providing method, and service providing program
KR20160080711A (en) Apparatus, Method and System for Translation based on Communication
US9277051B2 (en) Service server apparatus, service providing method, and service providing program
WO2021057957A1 (en) Video call method and apparatus, computer device and storage medium
US11600279B2 (en) Transcription of communications
US11881224B2 (en) Multilingual speech recognition and translation method and related system for a conference which determines quantity of attendees according to their distances from their microphones
CN113891168B (en) Subtitle processing method, subtitle processing device, electronic equipment and storage medium
US20220139417A1 (en) Performing artificial intelligence sign language translation services in a video relay service environment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant