CN111447397B

CN111447397B - Video conference based translation method, video conference system and translation device

Info

Publication number: CN111447397B
Application number: CN202010230286.8A
Authority: CN
Inventors: 李智彪
Original assignee: Shenzhen Wooask Technology Co ltd
Current assignee: Shenzhen Wooask Technology Co ltd
Priority date: 2020-03-27
Filing date: 2020-03-27
Publication date: 2021-11-23
Anticipated expiration: 2040-03-27
Also published as: CN111447397A

Abstract

The invention discloses a translation method and a translation device based on a video conference. The method comprises: obtaining, by a first translation terminal, first voice information corresponding to a first video conference terminal; translating the first voice information into first translation information based on a preset language type; and sending the first voice information and the first translation information to a second video conference terminal. According to the invention, the first translation terminal receives the first voice information input by the user through the first video conference terminal, automatically translates it, and synchronously sends the first voice information and the first translation information obtained by translation to the second video conference terminal. Automatic translation of voice during the video conference is thereby realized, and because the first voice information is received and forwarded through the first translation terminal, the synchronization with which the second video conference terminal receives the first voice information and the corresponding first translation information is improved.

Description

Video conference based translation method, video conference system and translation device

Technical Field

The invention relates to the technical field of translation, in particular to a translation method, a video conference system and a translation device based on a video conference.

Background

At present, simultaneous interpretation systems generally need to be deployed in conferences, and interpreters provide instant translation services through these systems so that participants can obtain translated speech in real time. However, when a video conference involves multiple languages, multiple interpreters must translate simultaneously. This increases the cost of the video conference and, because the communication quality of the conference also depends on the skill of the interpreters, makes transnational video conferences more difficult.

Disclosure of Invention

The invention provides a translation method, a video conference system and a translation device based on a video conference, aiming at the defects of the prior art.

In order to solve the technical problems, the technical scheme adopted by the invention is as follows:

A translation method based on a video conference is applied to a video conference system, wherein the video conference system comprises a plurality of video conference terminals and a plurality of translation terminals, and the video conference terminals are in one-to-one correspondence with the translation terminals; the method comprises the following steps:

a first translation terminal obtains first voice information corresponding to a first video conference terminal;

the first translation terminal translates the first voice information into first translation information based on a preset language type, wherein the first translation information is character information corresponding to the language type;

and the first translation terminal sends the first voice information and the first translation information to a second video conference terminal.

The video conference based translation method, wherein the sending, by the first translation terminal, the first voice information and the first translation information to the second video conference terminal specifically includes:

the first translation terminal sends the first voice information and the first translation information to a second translation terminal so as to send the first voice information and the first translation information to a second video conference terminal through the second translation terminal, wherein the second translation terminal is connected with the second video conference terminal.

The video conference based translation method, wherein the translating the first voice information into the first translation information by the first translation terminal based on the preset language type specifically includes:

the first translation terminal acquires first character information corresponding to the first voice information;

the first translation terminal translates the first character information into first translation information, wherein the language type corresponding to the first translation information is a preset language type.

The video conference based translation method, wherein the acquiring, by the first translation terminal, first character information corresponding to the first voice information specifically includes:

the first translation terminal sends the first voice information to a cloud end, and first character information corresponding to the first voice information is identified through the cloud end.

The video conference based translation method, wherein when the first translation terminal receives second voice information and second translation information sent by other translation terminals, the method comprises the following steps:

the first translation terminal acquires a target language type corresponding to the first translation terminal and selects target translation information from the second translation information, wherein the target translation information corresponds to the target language type;

and the first translation terminal sends the second voice information and the target translation information to a corresponding first video conference terminal.

The video conference based translation method, wherein the first video conference terminal plays the second voice information in a voice broadcasting mode and displays the target translation information in a subtitle form.

The video conference based translation method, wherein when the target language type comprises a plurality of target language types, the first video conference terminal synchronously displays the target translation information corresponding to each target language type on a display interface of the first video conference terminal.

A video conference system comprises a plurality of video conference terminals and a plurality of translation terminals, wherein the plurality of video conference terminals correspond to the plurality of translation terminals one by one, and each translation terminal is connected with its corresponding video conference terminal; each of the translation terminals is adapted to perform the video conference based translation method as described above.

A computer readable storage medium storing one or more programs, the one or more programs being executable by one or more processors to implement the steps in the video conference based translation method as described in any one of the above.

A translation device, comprising: a processor, a memory, and a communication bus; the memory has stored thereon a computer readable program executable by the processor;

the communication bus realizes connection communication between the processor and the memory;

the processor, when executing the computer readable program, implements the steps in the video conference based translation method as described in any one of the above.

Beneficial effects: compared with the prior art, the invention provides a translation method and a translation device based on a video conference. The method comprises: obtaining, by a first translation terminal, first voice information corresponding to a first video conference terminal; translating the first voice information into first translation information based on a preset language type; and sending the first voice information and the first translation information to a second video conference terminal. According to the invention, the first translation terminal receives the first voice information input by the user through the first video conference terminal, automatically translates it, and synchronously sends the first voice information and the first translation information obtained by translation to the second video conference terminal. Automatic translation of voice during the video conference is thereby realized, and because the first voice information is received and forwarded through the first translation terminal, the synchronization with which the second video conference terminal receives the first voice information and the corresponding first translation information is improved.

Drawings

Fig. 1 is a flowchart of a video conference-based translation method provided in the present invention.

Fig. 2 is a schematic structural diagram of a translation apparatus provided in the present invention.

Detailed Description

The invention provides a translation method, a video conference system and a translation device based on a video conference. In order to make the purpose, technical scheme and effect of the invention clearer, the invention is further described in detail below with reference to the attached drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.

As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms "comprises" and/or "comprising," when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof. It will be understood that when an element is referred to as being "connected" or "coupled" to another element, it can be directly connected or coupled to the other element or intervening elements may also be present. Further, "connected" or "coupled" as used herein may include wirelessly connected or wirelessly coupled. As used herein, the term "and/or" includes any one of, and all combinations of, one or more of the associated listed items.

It will be understood by those skilled in the art that, unless otherwise defined, all terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. It will be further understood that terms, such as those defined in commonly used dictionaries, should be interpreted as having a meaning that is consistent with their meaning in the context of the prior art and will not be interpreted in an idealized or overly formal sense unless expressly so defined herein.

This embodiment provides a translation method based on a video conference, which is applied to a video conference system, wherein the video conference system comprises a plurality of video conference terminals and a plurality of translation terminals, and the video conference terminals are in one-to-one correspondence with the translation terminals. As shown in Fig. 1, the method may include the steps of:

S10, the first translation terminal acquires first voice information corresponding to the first video conference terminal.

Specifically, the first translation terminal is connected with the first video conference terminal. The first translation terminal may be an independent translation box connected with the first video conference terminal by wire or wirelessly; it may be a translation function configured in the first video conference terminal, in which case the first video conference terminal starts the translation function when it starts a video conference; or it may be a translation APP installed in the first video conference terminal, in which case the first video conference terminal likewise starts the translation function when it starts a video conference. Of course, in practical applications, the first translation terminal may also exist in other forms as long as the method described in the present application can be implemented. In a specific implementation manner of this embodiment, the first translation terminal is an independent translation box, and the translation box is connected to the first video conference terminal.

Of course, it should be noted that the several translation terminals in the video conference system may take different forms; for example, some of the translation terminals are independent translation boxes and some are translation APPs installed on the video conference terminals. The translation terminals can be connected to the background server through a network, and the background server forms the translation terminals in the same video conference into a translation terminal cluster according to the conference information sent by the translation terminals, so that each translation terminal can determine the other translation terminals to which it needs to connect.
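The cluster formation described above can be illustrated with a minimal sketch. It assumes, purely for illustration, that the conference information uploaded by each translation terminal reduces to a conference ID; the names `form_clusters`, `peers_of`, and the terminal and conference IDs are hypothetical and do not appear in the original disclosure.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def form_clusters(registrations: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Group translation terminals by the conference ID each one reported."""
    clusters: Dict[str, List[str]] = defaultdict(list)
    for terminal_id, conference_id in registrations:
        clusters[conference_id].append(terminal_id)
    return dict(clusters)


def peers_of(terminal_id: str, conference_id: str,
             clusters: Dict[str, List[str]]) -> List[str]:
    """Return the other translation terminals in the same conference."""
    return [t for t in clusters.get(conference_id, []) if t != terminal_id]


# Hypothetical registrations: (translation terminal ID, conference ID).
registrations = [("box-1", "conf-A"), ("app-2", "conf-A"), ("box-3", "conf-B")]
clusters = form_clusters(registrations)
print(peers_of("box-1", "conf-A", clusters))  # prints ['app-2']
```

In this sketch a translation box and a translation APP end up in the same cluster, which mirrors the statement that terminals of different forms may take part in one video conference.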

Further, when the first translation terminal is a translation box, various audio equipment accessories may be mounted on the translation box, such as sound collectors (e.g., microphones), wireless earphones, and loudspeakers (e.g., sound boxes). It can be understood that the first translation terminal may obtain the first voice information corresponding to the first video conference terminal either by receiving the first voice information sent by the first video conference terminal through a network or a connection line, or by capturing the voice information through an audio equipment accessory configured on the first translation terminal. For example, when a sound pickup is arranged on the first translation terminal and the first video conference terminal plays voice information through a loudspeaker, the first translation terminal receives the voice information through the sound pickup. For another example, when a user carries out a video conference through the first video conference terminal, the first translation terminal collects the user's speech through its own sound pickup to obtain picked-up voice information and translates the picked-up voice information directly. In this way, by the time the first video conference terminal sends the first voice information that it picked up to the first translation terminal, the first translation terminal has already obtained the translation information corresponding to the first voice information, which improves the timeliness of the translation information. Of course, the voice information picked up by the first translation terminal corresponds to the first voice information sent by the first video conference terminal.

Further, in a specific implementation manner of this embodiment, when the first translation terminal is a translation box, the first translation terminal may be provided with one HDMI input port, two HDMI output ports, a network port, an optical port, and an expansion port. The HDMI input port is used for connecting to the HDMI output of the video conference terminal to acquire the speech of the video conference for translation; the two HDMI output ports are used for connecting a television or a screen and outputting the video conference image, sound, and translation information to the television or the screen; the expansion port is used for connecting expansion equipment, such as a far-field pickup array microphone, a loudspeaker, a memory, a wireless earphone transmitter and the like; the network port is used for connecting to a network so as to communicate with the background server through the network; and the optical port is used to transmit audio.

S20, the first translation terminal translates the first voice information into first translation information based on a preset language type, wherein the first translation information is character information corresponding to the language type.

Specifically, the preset language type is used for determining a language type corresponding to the translation information, and it can be understood that the first translation terminal needs to translate the first speech information into translation information corresponding to the preset language type. When the preset language type is one, the first translation information is one, and the first translation information corresponds to the preset language type; when the preset language types are multiple, the first translation information is multiple, and the multiple first translation information corresponds to the multiple preset language types one to one. For example, the preset language type includes a language type a and a language type B, and the first translation information includes translation information a and translation information B, where the translation information a and the translation information B are both translation information corresponding to the first speech information, the language type corresponding to the translation information a is the language type a, and the language type corresponding to the translation information B is the language type B.
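The one-to-one correspondence between preset language types and pieces of first translation information can be sketched as a mapping keyed by language type. This is only an illustration: `build_first_translation_info` and `placeholder_translate` are hypothetical names, and the placeholder merely tags the text instead of performing a real translation.

```python
from typing import Callable, Dict, List


def build_first_translation_info(
    text: str,
    preset_language_types: List[str],
    translate: Callable[[str, str], str],
) -> Dict[str, str]:
    """Produce exactly one piece of translation information per preset language type."""
    return {lang: translate(text, lang) for lang in preset_language_types}


def placeholder_translate(text: str, language_type: str) -> str:
    """Hypothetical stand-in for a real translation engine."""
    return f"[{language_type}] {text}"


# Language types A and B, as in the example in the text above.
first_translation_info = build_first_translation_info(
    "good morning", ["A", "B"], placeholder_translate
)
print(first_translation_info)
```

With a single preset language type the mapping holds one entry; with several it holds one entry per language type, matching the two cases described above.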

Further, the preset language type may be obtained by the first translation terminal through the background server, based on the language types required by all the video conference terminals participating in the video conference. For example, after the video conference terminals are connected, each translation terminal acquires the language type required by its corresponding video conference terminal and uploads it to the background server, and the background server forms a language type list for the video conference from all the language types it receives. The first translation terminal may determine its own preset language type according to the language type list. Of course, it should be noted that the preset language types corresponding to different translation terminals may be different.
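A minimal sketch of how a background server might assemble the language type list follows. The text does not specify how a translation terminal derives its preset language type from the list; the sketch assumes, as one plausible reading, that a terminal translates into the language types required by the other terminals in the conference. The class `BackgroundServer` and its methods are hypothetical.

```python
from typing import Dict, Iterable, List, Set


class BackgroundServer:
    """Hypothetical in-memory model of the background server."""

    def __init__(self) -> None:
        # conference ID -> {translation terminal ID: required language types}
        self._required: Dict[str, Dict[str, Set[str]]] = {}

    def upload(self, conference_id: str, terminal_id: str,
               language_types: Iterable[str]) -> None:
        """Record the language types required by one terminal."""
        self._required.setdefault(conference_id, {})[terminal_id] = set(language_types)

    def language_type_list(self, conference_id: str) -> List[str]:
        """All language types required by any terminal in the conference."""
        langs: Set[str] = set()
        for required in self._required.get(conference_id, {}).values():
            langs |= required
        return sorted(langs)

    def preset_for(self, conference_id: str, terminal_id: str) -> List[str]:
        """Assumption: a terminal translates into what the other terminals require."""
        langs: Set[str] = set()
        for other, required in self._required.get(conference_id, {}).items():
            if other != terminal_id:
                langs |= required
        return sorted(langs)


server = BackgroundServer()
server.upload("conf-A", "terminal-1", ["zh"])
server.upload("conf-A", "terminal-2", ["en", "ja"])
print(server.language_type_list("conf-A"))        # prints ['en', 'ja', 'zh']
print(server.preset_for("conf-A", "terminal-1"))  # prints ['en', 'ja']
```

Under this assumption the two terminals end up with different preset language types, consistent with the remark that preset language types may differ between translation terminals.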

Further, in an implementation manner of this embodiment, the translating, by the first translation terminal, the first voice information into the first translation information based on the preset language type specifically includes:

S21, the first translation terminal acquires first character information corresponding to the first voice information;

S22, the first translation terminal translates the first character information into first translation information, wherein the language type corresponding to the first translation information is a preset language type.

Specifically, after acquiring the first voice information, the first translation terminal performs voice recognition on the first voice information to obtain first character information corresponding to the first voice information. The first translation terminal can recognize the first voice information through a preset voice recognition network, which is a trained deep learning network model that takes the first voice information as input and outputs the corresponding first character information. In addition, the first translation terminal can translate the first voice information locally or through a cloud. That is to say, the voice recognition network model may be configured in the first translation terminal, or may be deployed in the cloud, in which case the first voice information is translated through the cloud. Correspondingly, the acquiring, by the first translation terminal, of the first character information corresponding to the first voice information specifically includes: the first translation terminal sends the first voice information to a cloud, and the first character information corresponding to the first voice information is recognized through the cloud.
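Steps S21 and S22 form a two-stage pipeline in which the recognition stage may run locally or in the cloud. The sketch below illustrates that structure only: the recognizers and the translator are hypothetical placeholders (they decode bytes and tag text) standing in for a trained voice recognition network and a translation engine, neither of which is specified in detail here.

```python
from typing import Callable

Recognizer = Callable[[bytes], str]
Translator = Callable[[str, str], str]


def recognize_then_translate(voice: bytes, language_type: str,
                             recognize: Recognizer,
                             translate: Translator) -> str:
    """S21: voice -> character information; S22: character information -> translation."""
    character_info = recognize(voice)                # S21
    return translate(character_info, language_type)  # S22


def local_recognizer(voice: bytes) -> str:
    """Placeholder for a recognition model configured in the translation terminal."""
    return voice.decode("utf-8")


def cloud_recognizer(voice: bytes) -> str:
    """Placeholder for sending the voice to the cloud and reading back the text."""
    return voice.decode("utf-8")


def placeholder_translator(text: str, language_type: str) -> str:
    """Placeholder for a translation engine."""
    return f"[{language_type}] {text}"


def pick_recognizer(use_cloud: bool) -> Recognizer:
    """Choose where recognition runs; the rest of the pipeline is unchanged."""
    return cloud_recognizer if use_cloud else local_recognizer


result = recognize_then_translate(
    b"hello", "B", pick_recognizer(use_cloud=True), placeholder_translator
)
print(result)  # prints [B] hello
```

Passing the recognizer in as a parameter keeps the translation step independent of whether recognition is local or cloud-based, which is the design point the paragraph above describes.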

S30, the first translation terminal sends the first voice information and the first translation information to a second video conference terminal.

Specifically, the first voice information and the first translation information are synchronously sent to the second video conference terminal, so that the second video conference terminal receives the first voice information and the translation information corresponding to the first voice information at the same time. The second video conference terminal is a terminal device that performs a video conference with the first video conference terminal, and the language types corresponding to the first translation information include the language type that the second video conference terminal needs to acquire.

Further, in an implementation manner of this embodiment, the second video conference terminal is connected to a second translation terminal, the second translation terminal is connected to the first translation terminal (for example, connected through the internet, etc.), and the first translation terminal synchronously sends the first voice information and the first translation information to the second translation terminal, and sends the first voice information and the first translation information to the second video conference terminal through the second translation terminal. Correspondingly, the sending, by the first translation terminal, the first voice information and the first translation information to the second video conference terminal specifically includes: the first translation terminal sends the first voice information and the first translation information to a second translation terminal so as to send the first voice information and the first translation information to a second video conference terminal through the second translation terminal, wherein the second translation terminal is connected with the second video conference terminal.
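The relay path above (first translation terminal, then second translation terminal, then second video conference terminal) can be sketched with the voice and its translations carried in a single message so that they arrive together. All class and field names below are hypothetical, the in-memory method calls stand in for a network connection, and selection of a target language at the receiving side is deliberately omitted from this sketch.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class ConferenceMessage:
    """Voice information bundled with its translations so both arrive together."""
    sender: str
    voice: bytes
    translations: Dict[str, str]  # language type -> translation information


class VideoConferenceTerminal:
    def __init__(self) -> None:
        self.received: List[ConferenceMessage] = []

    def deliver(self, message: ConferenceMessage) -> None:
        self.received.append(message)


class TranslationTerminal:
    def __init__(self, name: str, conference_terminal: VideoConferenceTerminal) -> None:
        self.name = name
        self.conference_terminal = conference_terminal
        self.peers: List["TranslationTerminal"] = []

    def send(self, voice: bytes, translations: Dict[str, str]) -> None:
        """Send voice and translations as one message to every peer terminal."""
        message = ConferenceMessage(self.name, voice, translations)
        for peer in self.peers:
            peer.receive(message)

    def receive(self, message: ConferenceMessage) -> None:
        """Forward a received message to the connected video conference terminal."""
        self.conference_terminal.deliver(message)


first_vc, second_vc = VideoConferenceTerminal(), VideoConferenceTerminal()
first_tt = TranslationTerminal("first", first_vc)
second_tt = TranslationTerminal("second", second_vc)
first_tt.peers.append(second_tt)

first_tt.send(b"<audio>", {"en": "Hello"})
print(second_vc.received[0].translations)  # prints {'en': 'Hello'}
```

Bundling both payloads in one message is one simple way to obtain the synchronous delivery the text calls for; a real system could instead use timestamps or sequence numbers to align separate streams.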

Further, in an implementation manner of this embodiment, after the first translation terminal sends the first voice information, the first translation terminal may further receive second voice information and second translation information sent by other translation terminals in the video conference. Correspondingly, when the first translation terminal receives the second voice information and the second translation information sent by other translation terminals, the method further comprises the following steps:

H10, the first translation terminal acquires a target language type corresponding to the first translation terminal, and selects target translation information from the second translation information, wherein the target translation information corresponds to the target language type;

H20, the first translation terminal sends the second voice information and the target translation information to the corresponding first video conference terminal.

Specifically, the target language type corresponding to the first translation terminal is the language type that the first video conference terminal corresponding to the first translation terminal needs to obtain. The target language type can be sent to the first translation terminal by the first video conference terminal and stored in the first translation terminal, or can be determined and stored by the first translation terminal according to a language type configuration instruction that it receives. There may be one target language type or several. When there are several target language types, the second translation information received by the first translation terminal includes multiple pieces of translation information, the number of these pieces is greater than or equal to the number of target language types, and the language types corresponding to these pieces include all of the target language types, so that the first translation terminal can acquire from the second translation information the target translation information corresponding to each target language type.
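Step H10 amounts to filtering the received second translation information down to the target language types. The following minimal sketch assumes the second translation information is represented as a mapping from language type to text; the function name and the sample data are hypothetical.

```python
from typing import Dict, List


def select_target_translations(second_translation_info: Dict[str, str],
                               target_language_types: List[str]) -> Dict[str, str]:
    """Pick, for each target language type, the matching piece of translation information."""
    missing = [lang for lang in target_language_types
               if lang not in second_translation_info]
    if missing:
        # The text assumes every target language type is covered; fail loudly if not.
        raise KeyError(f"no translation information for language types: {missing}")
    return {lang: second_translation_info[lang] for lang in target_language_types}


second_translation_info = {"en": "Hello", "fr": "Bonjour", "de": "Hallo"}
target = select_target_translations(second_translation_info, ["en", "fr"])
print(target)  # prints {'en': 'Hello', 'fr': 'Bonjour'}
```

The received information holds three pieces and two target language types are requested, which reflects the condition that the number of received pieces is greater than or equal to the number of target language types.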

Further, after receiving the second voice information and the target translation information, the first video conference terminal plays the second voice information in a voice broadcasting mode and displays the target translation information in a subtitle form, so that a user can obtain the subtitle information corresponding to the voice information at the same time as hearing the voice information. Further, when the target language type includes a plurality of target language types, the first video conference terminal synchronously displays the target translation information corresponding to each target language type on a display interface of the first video conference terminal.

Further, in one embodiment, a display device (e.g., a smart television, a display, etc.) may be connected to the first translation terminal. After receiving the second voice information and the target translation information, the first translation terminal transmits the second voice information and/or the target translation information to the display device, which plays the second voice information and displays the target translation information. Of course, when the first video conference terminal is a video conference APP, the translation information may also be displayed to the user through the display device connected to the first translation terminal, where it can be viewed by a plurality of users at the same time, which brings convenience to the users.

Based on the above video conference based translation method, this embodiment provides a video conference system, where the video conference system includes a plurality of video conference terminals and a plurality of translation terminals, the plurality of video conference terminals correspond to the plurality of translation terminals one by one, and each translation terminal is connected to its corresponding video conference terminal; each of the translation terminals may be configured to perform the video conference based translation method according to the above embodiment.

Based on the above-mentioned video conference based translation method, the present embodiment provides a computer-readable storage medium, which stores one or more programs, and the one or more programs can be executed by one or more processors to implement the steps in the video conference based translation method according to the above-mentioned embodiments.

Based on the above-mentioned video conference based translation method, the present embodiment further provides a translation apparatus, as shown in Fig. 2, which includes at least one processor 20, a display screen 21, and a memory 22, and may further include a communication interface 23 and a bus 24. The processor 20, the display screen 21, the memory 22 and the communication interface 23 can communicate with each other through the bus 24. The display screen 21 is configured to display a user guidance interface preset in the initial setting mode. The communication interface 23 may transmit information. The processor 20 may call logic instructions in the memory 22 to perform the methods in the embodiments described above.

Furthermore, the logic instructions in the memory 22 may be implemented in software functional units and stored in a computer readable storage medium when sold or used as a stand-alone product.

The memory 22, which is a computer-readable storage medium, may be configured to store a software program, a computer-executable program, such as program instructions or modules corresponding to the methods in the embodiments of the present disclosure. The processor 20 executes the functional application and data processing, i.e. implements the method in the above-described embodiments, by executing the software program, instructions or modules stored in the memory 22.

The memory 22 may include a storage program area and a storage data area, wherein the storage program area may store an operating system and an application program required for at least one function, and the storage data area may store data created according to the use of the translation apparatus, and the like. Further, the memory 22 may include a high speed random access memory and may also include a non-volatile memory. For example, it may be any of a variety of media that can store program code, such as a USB disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk; it may also be a transient storage medium.

In addition, the specific processes by which the instructions in the storage medium and the translation apparatus are loaded and executed by the processors are described in detail in the method above and are not repeated here. Finally, it should be noted that the above examples are only intended to illustrate the technical solution of the present invention, not to limit it. Although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced, and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims

1. A translation method based on a video conference, characterized in that the method is applied to a video conference system, the video conference system comprises a plurality of video conference terminals and a plurality of translation terminals, and the plurality of video conference terminals correspond to the plurality of translation terminals one by one; when the translation terminals are translation boxes, an HDMI input port, two HDMI output ports, a network port, an optical port and an expansion port are arranged on each translation terminal, the HDMI input port is used for connecting to the HDMI output port of the video conference terminal to obtain voice information of the video conference, the two HDMI output ports are both used for connecting to a television or a screen, the expansion port is used for connecting to expansion equipment, the network port is used for connecting to a network so as to communicate with a background server, and the optical port is used for transmitting audio; the method comprises the following steps:

the first translation terminal translates the first voice information into first translation information based on a preset language type, wherein the first translation information is character information corresponding to the language type;

the first translation terminal sends the first voice information and the first translation information to a second video conference terminal;

when a first translation terminal receives second voice information and second translation information sent by other translation terminals, the method comprises the following steps:

the first translation terminal sends the second voice information and the target translation information to a corresponding first video conference terminal;

when the target language type includes a plurality of target language types, the first video conference terminal synchronously displays the target translation information corresponding to each target language type on a display interface of the first video conference terminal;

the sending, by the first translation terminal, the first voice information and the first translation information to the second video conference terminal specifically includes:

the first translation terminal sends the first voice information and the first translation information to a second translation terminal so as to send the first voice information and the first translation information to a second video conference terminal through the second translation terminal, wherein the second translation terminal is connected with the second video conference terminal;

the translating, by the first translation terminal, the first voice information into first translation information based on a preset language type specifically includes:

the first translation terminal translates the first character information into first translation information, wherein the language type corresponding to the first translation information is a preset language type;

the acquiring, by the first translation terminal, the first text information corresponding to the first voice information specifically includes: the first translation terminal sends the first voice information to a cloud end, and first character information corresponding to the first voice information is identified through the cloud end;

and the first video conference terminal plays the second voice information in a voice broadcasting mode and displays the target translation information in a subtitle form.

2. A video conference system, characterized in that the video conference system comprises a plurality of video conference terminals and a plurality of translation terminals, the video conference terminals correspond to the translation terminals one by one, and each translation terminal is connected with its corresponding video conference terminal, wherein, when the translation terminal is a translation box, the translation terminal is provided with an HDMI input port, two HDMI output ports, a network port, an optical port and an expansion port, the HDMI input port is used for connecting to the HDMI output port of the video conference terminal to acquire the voice information of the video conference, both HDMI output ports are used for connecting to a television or a screen, the expansion port is used for connecting to expansion equipment, the network port is used for connecting to a network so as to communicate with a background server, and the optical port is used for transmitting audio; each of the plurality of translation terminals is operable to perform the video conference based translation method of claim 1.

3. A computer readable storage medium storing one or more programs, the one or more programs being executable by one or more processors to implement the steps in the video conference based translation method and translation apparatus of claim 1.

4. A translation apparatus, comprising: a processor, a memory, and a communication bus; the memory has stored thereon a computer readable program executable by the processor;

the processor, when executing the computer readable program, implements the steps in the video conference based translation method and translation apparatus of claim 1.