CN115278331B - Multi-language subtitle display method, system, equipment and medium based on machine translation - Google Patents


Info

Publication number
CN115278331B
CN115278331B
Authority
CN
China
Prior art keywords
displayed
text information
target
translation result
words
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202210725079.9A
Other languages
Chinese (zh)
Other versions
CN115278331A (en)
Inventor
刘丽梅
高翊
王旭阳
田祥花
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Glabal Tone Communication Technology Co ltd
Original Assignee
Glabal Tone Communication Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Glabal Tone Communication Technology Co ltd filed Critical Glabal Tone Communication Technology Co ltd
Priority to CN202210725079.9A priority Critical patent/CN115278331B/en
Publication of CN115278331A publication Critical patent/CN115278331A/en
Application granted granted Critical
Publication of CN115278331B publication Critical patent/CN115278331B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/431Generation of visual interfaces for content selection or interaction; Content or additional data rendering
    • H04N21/4312Generation of visual interfaces for content selection or interaction; Content or additional data rendering involving specific graphical features, e.g. screen layout, special fonts or colors, blinking icons, highlights or animations
    • H04N21/4316Generation of visual interfaces for content selection or interaction; Content or additional data rendering involving specific graphical features, e.g. screen layout, special fonts or colors, blinking icons, highlights or animations for displaying supplemental content in a region of the screen, e.g. an advertisement in a separate window
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/40Processing or translation of natural language
    • G06F40/58Use of machine translation, e.g. for multi-lingual retrieval, for server-side translation for client devices or for real-time translation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/44Arrangements for executing specific programs
    • G06F9/451Execution arrangements for user interfaces
    • G06F9/454Multi-language systems; Localisation; Internationalisation
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00Speech recognition
    • G10L15/26Speech to text systems
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/439Processing of audio elementary streams
    • H04N21/4394Processing of audio elementary streams involving operations for analysing the audio stream, e.g. detecting features or characteristics in audio streams
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/47End-user applications
    • H04N21/488Data services, e.g. news ticker
    • H04N21/4884Data services, e.g. news ticker for displaying subtitles
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N7/00Television systems
    • H04N7/14Systems for two-way working
    • H04N7/15Conference systems

Abstract

The application provides a multi-language subtitle display method, system, equipment and medium based on machine translation. After a conference starts, a presentation file and a text display control are displayed in real time in a graphical user interface of a field display, and target text information and a target translation result are displayed in the text display control. The field terminal collects voice information in real time through the voice acquisition terminal, performs voice recognition on the voice information through the server to obtain text information to be displayed, translates that text through the server to obtain a translation result to be displayed, and updates both in the text control in real time, so that the text information and the translation result corresponding to the voice information in the conference are displayed on the field display. By adopting this method, conference participants can effectively receive and understand the voice information in the conference.

Description

Multi-language subtitle display method, system, equipment and medium based on machine translation
Technical Field
The application relates to the field of intelligent conference equipment, in particular to a multi-language subtitle display method, a system, equipment and a medium based on machine translation.
Background
When a company holds a conference, files or documents related to the conference content are printed as paper documents and distributed to all participants, who browse them at the conference site while listening to the conference speakers, so as to follow the conference content in real time.
The inventors found in their research that participants cannot effectively receive the voice information in a conference when the loudspeaker broadcasting the conference audio is unclear, when participants are in a noisy environment, or when a participant has poor hearing. In addition, when participants come from different regions, they may not understand each other's language; for example, an American who does not speak Chinese and a Chinese person who does not speak English cannot understand each other, which also prevents participants from effectively understanding the voice information in the conference.
Disclosure of Invention
In view of the above, the present invention aims to provide a multi-language subtitle display method, system, device and medium based on machine translation, which can ensure that participants can effectively receive and understand voice information in a conference.
In a first aspect, an embodiment of the present application provides a machine-translation-based multi-language subtitle display method, which is applied to a machine-translation-based multi-language subtitle display system, where the machine-translation-based multi-language subtitle display system includes a server, a field terminal, a voice acquisition terminal, and a field display, the field terminal is connected with the voice acquisition terminal and the field display through a local area network, and the server and the field terminal are connected through a wide area network, and the method includes:
displaying a presentation file and a text display control in real time in a graphical user interface of the field display after a conference starts, wherein the text display control displays target text information corresponding to voice information acquired by the voice acquisition terminal and a target translation result of the target text information, and the text display control is suspended in the graphical user interface;
responding to an operation instruction of a user on the presentation, and performing corresponding operation on the presentation according to the operation instruction;
updating contents of the target text information and the target translation result in real time through the following steps, wherein the target text information is text information recorded in a first language, and the target translation result is text information recorded in a second language:
The field terminal acquires voice information in real time through the voice acquisition terminal;
the field terminal carries out voice recognition on the voice information through the server to obtain text information to be displayed;
the field terminal translates the text information to be displayed through the server to obtain a translation result to be displayed;
the field terminal judges whether the sum of the number of characters in the current displayed text information and the number of characters in the text information to be displayed and the sum of the number of characters in the current displayed translation result and the number of characters in the translation result to be displayed exceeds a maximum reference value which can be displayed by a preset display control in real time;
if the sum of the number of words in the displayed text information and the number of words in the text information to be displayed and the sum of the number of words in the displayed translation result and the number of words in the translation result to be displayed exceed the maximum reference value, the field end removes at least part of the displayed text information and the translation result to be displayed, merges the rest of the displayed text information and the text information to be displayed into the target text information, and merges the rest of the translation result to be displayed and the translation result to be displayed into the target translation result;
If the sum of the number of words in the displayed text information and the number of words in the text information to be displayed and the sum of the number of words in the displayed translation result and the number of words in the translation result to be displayed do not exceed the maximum reference value, the field terminal merges the displayed text information and the text information to be displayed into target text information and merges the displayed translation result and the translation result to be displayed into target translation result.
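The overflow check and merge described in the steps above can be sketched as follows. This is a minimal illustration, not the patent's implementation: `MAX_CHARS` and the exact trimming policy ("remove at least part of the displayed text") are assumptions, since the patent leaves both unspecified.

```python
MAX_CHARS = 40  # assumed maximum the text display control can show

def merge_with_limit(displayed: str, incoming: str, max_chars: int = MAX_CHARS) -> str:
    """Append incoming text; if the combined length would exceed
    max_chars, drop the oldest displayed characters first."""
    overflow = len(displayed) + len(incoming) - max_chars
    if overflow > 0:
        displayed = displayed[overflow:]  # remove at least part of what is shown
    return displayed + incoming

def update_subtitles(shown_text, new_text, shown_trans, new_trans, max_chars=MAX_CHARS):
    # The source-text stream and the translation stream are checked against
    # the same maximum reference value and merged independently, which also
    # covers the asymmetric cases where only one stream overflows.
    return (merge_with_limit(shown_text, new_text, max_chars),
            merge_with_limit(shown_trans, new_trans, max_chars))
```

Because each stream is trimmed independently, the same helper handles the case where both streams overflow, neither overflows, or only one does.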
Optionally, when presenting the presentation and the text display control in real time in the graphical user interface of the field display, the method includes:
responding to a control instruction issued by the user in the area where the text display control is located, and selecting a target controlled control from the displayed presentation by the field end according to the operation position of the control instruction;
the field terminal operates the target controlled control according to the operation content indicated by the control instruction;
or, alternatively:
responding to a control instruction issued by the user in the area where the text display control is located, and selecting a target controlled control from the displayed text display control by the field end according to the operation position of the control instruction;
And the field terminal operates the target controlled control according to the operation content indicated by the control instruction.
Optionally, when displaying, in the text display control, target text information corresponding to the voice information collected by the voice collection terminal and a target translation result of the target text information, the method includes:
the field terminal obtains a first number of words which can be displayed in the target text information in each row of the word display control;
the field terminal obtains a second number of words which can be displayed in the target translation result in each row of the word display control;
after updating the target text information and the target translation result each time, displaying the updated target text information on the upper half part of the text display control in a mode of displaying the characters of the target number in each row, and displaying the updated target translation result on the lower half part of the text display control in a mode of displaying the characters of the target number in each row, wherein the target number is the minimum number in the first number and the second number.
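The per-row layout rule above (take the smaller of the two per-row character counts) can be illustrated with a short sketch. The pixel-based measurement is an assumption for illustration only; a real implementation would query font metrics for each language.

```python
def chars_per_row(control_width_px: int, char_width_px: int) -> int:
    """How many characters of a given width fit on one row of the control."""
    return control_width_px // char_width_px

def target_per_row(width_px: int, src_char_px: int, trans_char_px: int) -> int:
    first = chars_per_row(width_px, src_char_px)     # fits per row of target text
    second = chars_per_row(width_px, trans_char_px)  # fits per row of translation
    return min(first, second)  # the method takes the minimum of the two counts

def wrap_rows(text: str, per_row: int) -> list:
    """Break text into rows of per_row characters for display."""
    return [text[i:i + per_row] for i in range(0, len(text), per_row)]
```

Using the minimum keeps the upper half (target text) and lower half (translation) aligned to the same number of characters per row.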
Optionally, when displaying, in the text display control, target text information corresponding to the voice information collected by the voice collection terminal and a target translation result of the target text information, the method further includes:
And the field end responds to the target language selection operation of the user and hides the text information which is displayed in the text display control and is recorded by adopting the non-target language, wherein the target language comprises the first language and the second language.
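The language-selection behaviour above amounts to filtering the two displayed streams by the user's chosen target languages. A minimal sketch, with illustrative language codes:

```python
def visible_streams(streams: dict, target_langs: set) -> dict:
    """Return only the streams recorded in a selected target language;
    streams in non-target languages are hidden."""
    return {lang: text for lang, text in streams.items() if lang in target_langs}
```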
Optionally, when displaying, in the text display control, target text information corresponding to the voice information collected by the voice collection terminal and a target translation result of the target text information, the method includes:
the field terminal highlights, in the display, the areas where rarely used words appear in the target text information and the target translation result, and displays the pronunciation of each rarely used word above it.
Optionally, after the field end determines in real time whether the sum of the number of words in the current displayed text information and the number of words in the text information to be displayed and the sum of the number of words in the current displayed translation result and the number of words in the translation result to be displayed exceeds a preset maximum reference value which can be displayed by a display control, the method includes:
if the sum of the number of words in the displayed text information and the number of words in the text information to be displayed exceeds the maximum reference value, but the sum of the number of words in the displayed translation result and the number of words in the translation result to be displayed does not exceed the maximum reference value, the field end removes at least part of the displayed text information, merges the rest of the displayed text information and the text information to be displayed into the target text information, and merges the displayed translation result and the translation result to be displayed into the target translation result;
If the sum of the number of words in the displayed text information and the number of words in the text information to be displayed does not exceed the maximum reference value, but the sum of the number of words in the displayed translation result and the number of words in the translation result to be displayed exceeds the maximum reference value, the field terminal removes at least part of the displayed translation result, merges the displayed text information and the text information to be displayed into the target text information, and merges the rest of the displayed translation result and the translation result to be displayed into the target translation result.
Optionally, when displaying, in the text display control, target text information corresponding to the voice information collected by the voice collection terminal and a target translation result of the target text information, the method includes:
the field terminal determines a target role for speaking each voice message according to the voice characteristics of each voice message;
and the field terminal displays, in the text display control and in different colors, the target text information corresponding to the voice information spoken by different target roles.
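The per-speaker coloring can be sketched as below. Identifying the speaking role from voice features is assumed to happen elsewhere (the patent does not specify a method); here each speaker id simply receives a stable color from an illustrative palette.

```python
from itertools import cycle

_palette = cycle(["#1f77b4", "#ff7f0e", "#2ca02c", "#d62728"])  # assumed colors
_speaker_colors: dict = {}

def color_for(speaker_id: str) -> str:
    """Assign each newly seen speaker the next palette color, so that the
    same role always renders in the same color in the text display control."""
    if speaker_id not in _speaker_colors:
        _speaker_colors[speaker_id] = next(_palette)
    return _speaker_colors[speaker_id]
```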
In a second aspect, an embodiment of the present application provides a machine translation-based multilingual subtitle display system, where the system includes a server, a field terminal, a voice acquisition terminal, and a field display, where the field terminal is connected to the voice acquisition terminal and the field display through a local area network, and the server and the field terminal are connected through a wide area network:
after a conference starts, the field end is used for displaying a presentation and a text display control in real time in a graphical user interface of the field display, wherein the text display control displays target text information corresponding to voice information acquired by the voice acquisition terminal and a target translation result of the target text information, and the text display control is suspended in the graphical user interface;
the field terminal is used for responding to an operation instruction of a user on the presentation, and carrying out corresponding operation on the presentation according to the operation instruction;
the field terminal is used for updating contents of the target text information and the target translation result in real time through the following steps, wherein the target text information is text information recorded in a first language, and the target translation result is text information recorded in a second language:
The field terminal is used for collecting voice information in real time through the voice collection terminal;
the field terminal is used for carrying out voice recognition on the voice information through the server to obtain text information to be displayed;
the field terminal is used for translating the text information to be displayed through the server to obtain a translation result to be displayed;
the field terminal is used for judging in real time whether the sum of the number of characters in the currently displayed text information and the number of characters in the text information to be displayed, and the sum of the number of characters in the currently displayed translation result and the number of characters in the translation result to be displayed, exceed the preset maximum reference value that the display control can show;
if the sum of the number of words in the displayed text information and the number of words in the text information to be displayed and the sum of the number of words in the displayed translation result and the number of words in the translation result to be displayed exceed the maximum reference value, the field end is configured to remove at least part of the displayed text information and the displayed translation result, merge the rest of the displayed text information and the text information to be displayed into the target text information, and merge the rest of the displayed translation result and the translation result to be displayed into the target translation result;
And if the sum of the number of words in the displayed text information and the number of words in the text information to be displayed and the sum of the number of words in the displayed translation result and the number of words in the translation result to be displayed do not exceed the maximum reference value, the field end is used for merging the displayed text information and the text information to be displayed into target text information and merging the displayed translation result and the translation result to be displayed into target translation result.
In a third aspect, an embodiment of the present application provides a computer apparatus, including: a processor, a memory and a bus, the memory storing machine-readable instructions executable by the processor, the processor and the memory in communication via the bus when the computer device is running, the machine-readable instructions when executed by the processor performing the steps of the machine-translation-based multi-language subtitle presentation method of any of the alternative embodiments of the first aspect described above.
In a fourth aspect, an embodiment of the present application provides a computer readable storage medium, where a computer program is stored, where the computer program is executed by a processor to perform the steps of the machine translation based multi-language subtitle presentation method according to any of the alternative embodiments of the first aspect.
The technical solution provided by the application has the following beneficial effects:
displaying a presentation file and a text display control in real time in a graphical user interface of the field display after a conference starts, wherein the text display control displays target text information corresponding to voice information acquired by the voice acquisition terminal and a target translation result of the target text information, and the text display control is suspended in the graphical user interface; responding to an operation instruction of a user on the presentation, and performing corresponding operation on the presentation according to the operation instruction; through the steps, on-site participants can browse text information and translation results corresponding to presentation files and voice information related to the conference through the on-site display.
Updating contents of the target text information and the target translation result in real time through the following steps, wherein the target text information is text information recorded in a first language, and the target translation result is text information recorded in a second language: the field terminal acquires voice information in real time through the voice acquisition terminal; the field terminal carries out voice recognition on the voice information through the server to obtain text information to be displayed; the field terminal translates the text information to be displayed through the server to obtain a translation result to be displayed; through the steps, the voice information in the conference can be identified to obtain the text information, and the text information is translated into information of other languages, so that people from different areas can effectively understand the voice information and the conference content in the conference.
The field terminal judges in real time whether the sum of the number of characters in the currently displayed text information and the number of characters in the text information to be displayed and the sum of the number of characters in the currently displayed translation result and the number of characters in the translation result to be displayed exceed a preset maximum reference value that the display control can show; if both sums exceed the maximum reference value, the field terminal removes at least part of the displayed text information and the displayed translation result, merges the rest of the displayed text information and the text information to be displayed into the target text information, and merges the rest of the displayed translation result and the translation result to be displayed into the target translation result; if neither sum exceeds the maximum reference value, the field terminal merges the displayed text information and the text information to be displayed into the target text information and merges the displayed translation result and the translation result to be displayed into the target translation result. Through these steps, when a speaker speaks, the text information and the translation result corresponding to the voice information in the conference can be displayed on the field display in real time.
By adopting the method, after the text information is obtained through recognition of the voice information in the conference, the text information is translated to obtain the translation result, and the text information and the translation result are synchronously displayed in the on-site display according to the collection process of the voice information, so that the conference content can be displayed in a text form while the presentation is displayed for the participants in the conference site, and the participants can be ensured to effectively receive and understand the voice information in the conference.
In order to make the above objects, features and advantages of the present invention more comprehensible, preferred embodiments accompanied with figures are described in detail below.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings that are needed in the embodiments will be briefly described below, it being understood that the following drawings only illustrate some embodiments of the present invention and therefore should not be considered as limiting the scope, and other related drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flow chart of a multi-language caption presentation method based on machine translation according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of a graphical user interface of a field display provided in accordance with a first embodiment of the present invention;
FIG. 3 is a flowchart of a method for determining the target text information and the target translation result according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of a graphical user interface of another field display provided in accordance with an embodiment of the present invention;
fig. 5 is a schematic structural diagram of a multi-language caption display system based on machine translation according to a second embodiment of the present invention;
fig. 6 shows a schematic structural diagram of a computer device according to a third embodiment of the present invention.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the embodiments of the present invention more apparent, the technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention, and it is apparent that the described embodiments are only some embodiments of the present invention, not all embodiments. The components of the embodiments of the present invention generally described and illustrated in the figures herein may be arranged and designed in a wide variety of different configurations. Thus, the following detailed description of the embodiments of the invention, as presented in the figures, is not intended to limit the scope of the invention, as claimed, but is merely representative of selected embodiments of the invention. All other embodiments, which can be made by a person skilled in the art without making any inventive effort, are intended to be within the scope of the present invention.
Example 1
For the convenience of understanding the present application, the following describes the first embodiment of the present application in detail with reference to the flowchart of the machine translation-based multi-language subtitle display method provided in the first embodiment of the present application shown in fig. 1.
Referring to fig. 1, fig. 1 shows a flowchart of a multi-language caption display method based on machine translation according to an embodiment of the present application, wherein the multi-language caption display method based on machine translation is applied to a multi-language caption display system based on machine translation, the multi-language caption display system based on machine translation includes a server, a field terminal, a voice acquisition terminal and a field display, the field terminal is respectively connected with the voice acquisition terminal and the field display through a local area network, and the server and the field terminal are connected through a wide area network, the method includes steps S101 to S102:
s101: and after the conference starts, displaying the presentation file and the text display control in real time in a graphical user interface of the field display, wherein the text display control displays target text information corresponding to the voice information acquired by the voice acquisition terminal and a target translation result of the target text information, and the text display control is suspended in the graphical user interface.
Specifically, referring to fig. 2, fig. 2 shows a schematic diagram of a graphical user interface of a field display according to the first embodiment of the present invention. In addition to displaying a presentation in real time, files or contents related to the meeting, such as videos, pictures and text boxes, may be displayed in real time in the graphical user interface of the field display, and these contents may be tiled or displayed in the graphical user interface according to a preset ratio. The text display control can be displayed at any position of the graphical user interface according to the needs of the user, and the user can change its display position in the graphical user interface by dragging it.
S102: responding to an operation instruction of a user on the presentation, and performing corresponding operation on the presentation according to the operation instruction.
Specifically, responding to an operation instruction of a user on the presentation file or any content displayed in a graphical user interface, and performing corresponding operation on the presentation file according to the operation instruction.
For example, when the graphical user interface displays a presentation or slides, the user may turn the presentation back by clicking the area where the presentation is located, and turn it forward by double-clicking that area. When the graphical user interface displays a video, the user may pause the currently playing video by clicking the area where the video is located, and enlarge the video by double-clicking that area. When the graphical user interface displays a picture, the user may mark the clicked region in the picture with a single click, and enlarge the picture by double-clicking it.
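The gesture-to-operation mapping described above can be sketched as a small dispatch table; the content-type, gesture and operation names used here are illustrative stand-ins, not identifiers from the original method.

```python
# Hypothetical sketch of the click / double-click dispatch: the operation
# depends on the type of content under the pointer and the gesture used.
ACTIONS = {
    ("presentation", "click"): "turn_back",       # single click turns back
    ("presentation", "double_click"): "turn_forward",
    ("video", "click"): "pause",                  # single click pauses playback
    ("video", "double_click"): "zoom_in",         # double click enlarges the area
    ("picture", "click"): "mark_region",          # single click marks the region
    ("picture", "double_click"): "zoom_in",
}

def dispatch(content_type: str, gesture: str) -> str:
    """Return the operation to perform for a gesture on a content area."""
    return ACTIONS.get((content_type, gesture), "ignore")
```

Unrecognized combinations fall through to a no-op, which keeps the dispatch safe for content types the interface does not handle.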
Referring to fig. 3, fig. 3 shows a flowchart of the method for determining the target text information and the target translation result according to the first embodiment of the present invention, wherein the content of the target text information and the target translation result is updated in real time by the steps of:
S201: The field terminal acquires voice information in real time through the voice acquisition terminal.
Specifically, after the conference starts, the field terminal sends a voice acquisition instruction to the voice acquisition terminal, and the voice acquisition terminal starts to acquire voice information in the conference after receiving the voice acquisition instruction.
S302: and the field terminal carries out voice recognition on the voice information through the server to obtain text information to be displayed.
Specifically, the field terminal performs voice recognition on each piece of voice information through the server. The voice recognition method includes, but is not limited to, methods based on vocal tract models and phonetic knowledge, pattern-matching methods, and artificial-neural-network methods. The result is the text information to be displayed, i.e. recognized text that has not yet been shown on the field display.
S203: and the field terminal translates the text information to be displayed through the server to obtain a translation result to be displayed.
Specifically, after obtaining the text information to be displayed, the field terminal sends it to the server, and the server translates it according to translation rules stored in the server in advance, so as to obtain the translation result to be displayed, i.e. a translation that has not yet been shown on the field display.
S204: and the field terminal judges whether the sum of the number of characters in the current displayed text information and the number of characters in the text information to be displayed and the sum of the number of characters in the current displayed translation result and the number of characters in the translation result to be displayed exceeds the maximum reference value which can be displayed by a preset display control in real time.
Specifically, suppose the maximum reference value that the preset display control can show is 15. If the currently displayed text information contains 13 words and the text information to be displayed contains 5 words, the sum of the two is 18; if the displayed translation result contains 10 words and the translation result to be displayed contains 10 words, the sum of the two is 20. Each of these sums is then compared against the maximum reference value to decide whether it is exceeded.
S205: and if the sum of the number of words in the displayed text information and the number of words in the text information to be displayed and the sum of the number of words in the displayed translation result and the number of words in the translation result to be displayed exceed the maximum reference value, the field end removes at least part of the displayed text information and the translation result to be displayed, merges the rest of the displayed text information and the text information to be displayed into the target text information, and merges the rest of the translation result to be displayed and the translation result to be displayed into the target translation result.
Specifically, continuing the example of step S204: the sum 18 of the number of words in the displayed text information and in the text information to be displayed, and the sum 20 of the number of words in the displayed translation result and in the translation result to be displayed, are both greater than the maximum reference value 15. At least part of the displayed text information (for example, its leading words) is therefore removed, and the remaining displayed text information is merged with the text information to be displayed to form the target text information; the target translation result is determined in the same way.
S206: if the sum of the number of words in the displayed text information and the number of words in the text information to be displayed and the sum of the number of words in the displayed translation result and the number of words in the translation result to be displayed do not exceed the maximum reference value, the field terminal merges the displayed text information and the text information to be displayed into target text information and merges the displayed translation result and the translation result to be displayed into target translation result.
Specifically, when neither the sum of the number of words in the displayed text information and the number of words in the text information to be displayed, nor the sum of the number of words in the displayed translation result and the number of words in the translation result to be displayed, exceeds the maximum reference value, nothing needs to be removed from any of them: the displayed text information and the text information to be displayed are directly merged into the target text information, and the displayed translation result and the translation result to be displayed are merged into the target translation result.
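Steps S204 to S206 amount to a length check followed by a trim-and-merge, applied independently to the source text and to the translation; the same per-stream rule also covers the asymmetric cases of steps S701 and S702 later in this section. A minimal sketch, assuming that the oldest (leading) part of the displayed text is what gets removed:

```python
def merge_for_display(displayed: str, pending: str, max_chars: int) -> str:
    """Merge already-displayed text with text waiting to be displayed.

    If the combined length exceeds max_chars, drop characters from the
    start of the displayed text (the oldest content) until the merged
    string fits, mirroring steps S204-S206. Applied separately to the
    source text stream and the translation stream.
    """
    total = len(displayed) + len(pending)
    if total <= max_chars:
        # Nothing exceeds the maximum reference value: merge directly (S206).
        return displayed + pending
    # Remove at least the overflow from the front of the displayed text (S205).
    overflow = total - max_chars
    return displayed[overflow:] + pending
```

Using the document's numbers: 13 displayed words plus 5 pending words against a maximum of 15 removes the 3 oldest words before merging.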
In one possible embodiment, when presenting the presentation and text display control in real time in the graphical user interface of the field display, the method includes steps S301 to S302 or S303 to S304:
S301: and responding to a control instruction issued by the user in the area where the text display control is located, and selecting a target controlled control from the displayed presentation by the field terminal according to the operation position of the control instruction.
Specifically, when the user issues a control instruction in the area where the presentation is located, the field terminal takes the control displayed in the presentation at the operation position indicated by the instruction as the target controlled control, and operates on that control.
S302: and the field terminal operates the target controlled control according to the operation content indicated by the control instruction.
Specifically, the field terminal operates the target controlled control in the presentation according to the operation content indicated by the control instruction, including but not limited to the following cases: clicking a page-turning control in the presentation to turn its pages; or, when the presentation is a playing video, clicking a pause control in the video to pause the current playback.
S303: and responding to a control instruction issued by the user in the area where the text display control is located, and selecting a target controlled control from the displayed text display control by the field terminal according to the operation position of the control instruction.
Specifically, when a user sends a control instruction in an area where the text display control is located, the field terminal uses the control in the text display control displayed at the operation position indicated by the operation instruction as a target controlled control to operate the target controlled control.
S304: and the field terminal operates the target controlled control according to the operation content indicated by the control instruction.
Specifically, the field terminal operates the target controlled control in the text display control according to the operation content indicated by the operation instruction, including but not limited to the following cases: clicking a font thickening control piece in the text display control to realize thickening display of the text in the text display control; clicking a target color selection control in the text display control to display the text in the text display control in the target color.
In a possible implementation manner, when displaying, in the text display control, target text information corresponding to the voice information collected by the voice collection terminal and a target translation result of the target text information, the method includes steps S401 to S403:
S401: the field terminal obtains a first number of words in the target text information which can be displayed in each row of the word display control.
S402: and the field terminal acquires a second number of words which can be displayed in the target translation result in each row of the word display control.
S403: after updating the target text information and the target translation result each time, displaying the updated target text information on the upper half part of the text display control in a mode of displaying the characters of the target number in each row, and displaying the updated target translation result on the lower half part of the text display control in a mode of displaying the characters of the target number in each row, wherein the target number is the minimum number in the first number and the second number.
Specifically, because characters expressing the same meaning occupy different numbers of positions in different languages, a row of the target text information could otherwise cover different content from the corresponding row of the target translation result. To keep the two aligned, after each update the target text information is displayed in the upper half of the text display control with the target number of words per row, and the target translation result is displayed in the lower half with the same target number of words per row.
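Steps S401 to S403 can be sketched as follows; this assumes simple fixed-width chunking and ignores word boundaries, which the real control may handle differently.

```python
def wrap_rows(text: str, translation: str, first: int, second: int):
    """Wrap source text and its translation so each row of the text
    display control holds the same number of characters: the minimum of
    the per-row capacities of the two languages (steps S401-S403)."""
    target = min(first, second)
    rows = lambda s: [s[i:i + target] for i in range(0, len(s), target)]
    return rows(text), rows(translation)
```

Because both strings are chunked with the same target width, row *k* of the source always lines up with row *k* of the translation.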
Referring to fig. 4, fig. 4 is a schematic diagram of a graphical user interface of another field display provided by the first embodiment of the present invention. With this method, the target text information and the target translation result can be shown as subtitles in the field display at the same time as the presentation is shown in the graphical user interface; to make multilingual text comparison convenient for the user, the target text information can be displayed above the target translation result.
It is noted that the specific form of displaying the target text information and the target translation result in the text display control can be set according to actual requirements.
In a possible implementation manner, when the text display control displays the target text information corresponding to the voice information collected by the voice collection terminal and the target translation result of the target text information, the method further includes step S501:
s501: and the field end responds to the target language selection operation of the user and hides the text information which is displayed in the text display control and is recorded by adopting the non-target language, wherein the target language comprises the first language and the second language.
Specifically, the user can set the subtitle language shown on the field display through the field terminal; when the user selects subtitle content in a target language, the field terminal hides the displayed text that is not in the target language, so that only text in the target language is shown in the text display control.
The user can choose, via the source/translation display buttons on the subtitle setting page, to display the target text information, the target translation result, or both. If only the target text information or only the target translation result is displayed, it is shown in the bottom subtitle display area of the main user interface using the number of subtitle lines configured on the subtitle setting page. For example, when the user sets the number of subtitle lines to 2 and displays only the target text information, the target text information occupies the two lines of the subtitle display area.
If the target text information and the target translation result are displayed at the same time, the configured number of subtitle lines is split in half in the bottom subtitle display area of the main user interface: the target text information is shown using one half of the subtitle lines and the target translation result using the other half. For example, when the user sets the number of subtitle lines to 2 and displays both, one line of target text information and one line of target translation result are shown in the subtitle display area.
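A sketch of the line-allocation rule described in the two paragraphs above, assuming an even split when both the source text and the translation are shown:

```python
def allocate_subtitle_lines(total_lines: int, show_source: bool,
                            show_translation: bool) -> dict:
    """Split the configured number of subtitle lines between the source
    text and the translation, following the subtitle-setting-page logic."""
    if show_source and show_translation:
        half = total_lines // 2            # e.g. 2 lines -> 1 line each
        return {"source": half, "translation": total_lines - half}
    if show_source:
        return {"source": total_lines, "translation": 0}
    if show_translation:
        return {"source": 0, "translation": total_lines}
    return {"source": 0, "translation": 0}
```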
In a possible implementation manner, when displaying, in the text display control, the target text information corresponding to the voice information collected by the voice collection terminal and the target translation result of the target text information, the method includes step S601:
s601: the field terminal highlights the target text information and the area where the rarely used word in the target translation result is located in the display, and displays the pronunciation rules of the rarely used word above the rarely used word.
Specifically, the field terminal highlights each rarely used word in the target text information and the target translation result, or fills a preset range around the word with color to emphasize it, and at the same time shows the word's pronunciation above it or in another text display control in the graphical user interface.
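A minimal sketch of the rare-word annotation in step S601, assuming a lookup table from rare characters to their pronunciations; the table entries here are illustrative, and the real system would draw on a much larger dictionary.

```python
# Illustrative table of rarely used characters and their pinyin readings.
RARE_WORDS = {"龘": "dá", "饕": "tāo"}

def annotate_rare(text: str):
    """Return (spans, ruby): spans lists the character positions to
    highlight, and ruby maps each position to the pronunciation to be
    rendered above the character (step S601)."""
    spans = [i for i, ch in enumerate(text) if ch in RARE_WORDS]
    ruby = {i: RARE_WORDS[text[i]] for i in spans}
    return spans, ruby
```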
In a possible implementation manner, after the field end determines in real time whether the sum of the number of words in the current displayed text information and the number of words in the text information to be displayed and the sum of the number of words in the current displayed translation result and the number of words in the translation result to be displayed exceeds the maximum reference value that can be displayed by the preset display control, the method includes steps S701 to S702:
S701: if the sum of the number of words in the displayed text information and the number of words in the text information to be displayed exceeds the maximum reference value, but the sum of the number of words in the displayed translation result and the number of words in the translation result to be displayed does not exceed the maximum reference value, the field terminal removes at least part of the displayed text information, merges the rest of the displayed text information and the text information to be displayed into the target text information, and merges the displayed translation result and the translation result to be displayed into the target translation result.
Specifically, with reference to the methods in steps S204 and S205, the target text information and the target translation result are determined for the case where the sum of the number of words in the displayed text information and in the text information to be displayed exceeds the maximum reference value, but the sum of the number of words in the displayed translation result and in the translation result to be displayed does not.
S702: if the sum of the number of words in the displayed text information and the number of words in the text information to be displayed does not exceed the maximum reference value, but the sum of the number of words in the displayed translation result and the number of words in the translation result to be displayed exceeds the maximum reference value, the field terminal removes at least part of the displayed translation result, merges the displayed text information and the text information to be displayed into the target text information, and merges the rest of the translation result to be displayed and the translation result to be displayed into the target translation result.
Specifically, with reference to the methods in steps S204 and S205, the target text information and the target translation result are determined for the case where the sum of the number of words in the displayed text information and in the text information to be displayed does not exceed the maximum reference value, but the sum of the number of words in the displayed translation result and in the translation result to be displayed does.
In a possible implementation manner, when displaying, in the text display control, target text information corresponding to the voice information collected by the voice collection terminal and a target translation result of the target text information, the method includes steps S801 to S802:
s801: and the field terminal determines the target role for speaking each voice message according to the voice characteristics of each voice message.
Specifically, the method for extracting the voice characteristics includes, but is not limited to, parametric-model methods such as hidden Markov models and non-parametric-model methods such as vector quantization; the target role speaking each piece of voice information is then determined from the extracted voice characteristics.
S802: and the field terminal displays target text information corresponding to the voice information taught by different target roles in the text display control with different colors.
Specifically, so that the user can see at a glance what each role has said, the text content corresponding to the voice information spoken by different roles is displayed in the text display control in different colors.
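A sketch of the per-role coloring in steps S801 and S802, assuming colors are assigned from a fixed palette in order of first appearance; the palette values are illustrative.

```python
# Illustrative subtitle color palette (hex RGB values are assumptions).
PALETTE = ["#d32f2f", "#1976d2", "#388e3c", "#f57c00"]

def color_for_speaker(speaker_id: str, assigned: dict) -> str:
    """Assign each distinct speaking role a stable subtitle color,
    cycling through the palette (steps S801-S802). `assigned` persists
    the mapping across calls so a role keeps its color."""
    if speaker_id not in assigned:
        assigned[speaker_id] = PALETTE[len(assigned) % len(PALETTE)]
    return assigned[speaker_id]
```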
In the machine translation-based multilingual subtitle presentation system, the target text information and the target translation result can be determined in real time by:
the field terminal acquires voice information in real time through the voice acquisition terminal; the field terminal carries out voice recognition on the voice information through the server to obtain the target text information; the field terminal translates the term text in the target text information through the server according to a first preset translation rule in a cloud term library so as to obtain a first translation result; the field terminal translates the non-term text in the target text information through the server according to a second preset translation rule in a cloud translation library so as to obtain a second translation result; and the field terminal combines the first translation result and the second translation result according to the text sequence in the target text information through the server to obtain the target translation result.
After the field terminal obtains the target translation result by combining the first translation result and the second translation result through the server according to the text order in the target text information, it determines, according to the network delay between the field terminal and the server, whether a local term library and a local translation library, both prestored on the field terminal, need to be enabled for translating the target text information. When the network delay exceeds a first preset threshold, the field terminal translates the term text in the target text information according to a third translation rule in the local term library to obtain a third translation result; the field terminal translates the non-term text in the target text information according to a fourth translation rule in the local translation library to obtain a fourth translation result; the field terminal combines the third translation result and the fourth translation result according to the text order in the target text information to obtain a local translation result; and the field terminal updates the target translation result with the local translation result.
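The two-library translation and the latency-based local fallback can be sketched as follows; the segment representation, the dictionary-style rule tables, and the 500 ms threshold are assumptions for illustration, not details from the original method.

```python
def translate_segments(segments, term_rules, general_rules) -> str:
    """Translate term segments with the term library and the remaining
    segments with the general translation library, then rejoin the
    results in the original text order."""
    out = []
    for text, is_term in segments:          # segments: [(text, is_term), ...]
        table = term_rules if is_term else general_rules
        out.append(table.get(text, text))   # fall back to source text if no rule
    return "".join(out)

def translate_with_fallback(segments, cloud, local, delay_ms,
                            threshold_ms=500) -> str:
    """Use the cloud term/translation libraries normally; when the
    network delay exceeds the threshold, redo the translation with the
    local libraries and use that result instead."""
    result = translate_segments(segments, *cloud)
    if delay_ms > threshold_ms:
        result = translate_segments(segments, *local)
    return result
```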
In the multi-language caption display system based on machine translation, the data and information in the conference can be acquired in the mobile terminal in real time through the following steps:
A real-time image display control is shown in the graphical user interface of the field display, where the image display control displays a conference QR code generated by the server from the product serial number of the field terminal; the conference QR code encodes the access address of a remote screen-casting interface, and the image display control floats above the graphical user interface. After a user scans the conference QR code with a mobile terminal, the mobile terminal shows the remote screen-casting interface on its display. In response to the user's selection of a target language, the field terminal translates the target text information into a target translation through the server, where the target translation is text information recorded in the target language; and the mobile terminal displays the target translation in the remote screen-casting interface.
After the mobile terminal displays the target translation in the remote screen-casting interface, the mobile terminal determines, according to the network delay between the mobile terminal and the field terminal, which of the presentation, the voice information, the target text information and the target translation result to receive. When the network delay is less than or equal to a preset first threshold, the mobile terminal receives the presentation, the voice information, the target text information and the target translation result, and displays them in the remote screen-casting interface. When the network delay is greater than the first threshold and less than or equal to a preset second threshold, the mobile terminal receives the voice information, the target text information and the target translation result, and displays them in the remote screen-casting interface, where the first threshold is smaller than the second threshold. When the network delay is greater than the second threshold, the mobile terminal receives only the target text information and the target translation result, and displays them in the remote screen-casting interface.
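The delay-based tiering for the mobile terminal can be sketched as a simple threshold check; the threshold values are supplied by the caller and the stream names are illustrative.

```python
def content_for_delay(delay_ms: float, t1: float, t2: float) -> list:
    """Decide which streams the mobile terminal receives, given the
    network delay and the two preset thresholds (t1 < t2)."""
    assert t1 < t2, "the first threshold must be smaller than the second"
    if delay_ms <= t1:
        # Low delay: receive everything, including the presentation.
        return ["presentation", "voice", "text", "translation"]
    if delay_ms <= t2:
        # Moderate delay: drop the presentation stream.
        return ["voice", "text", "translation"]
    # High delay: only the lightweight text streams.
    return ["text", "translation"]
```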
In the multi-language caption display system based on machine translation, the system further comprises a transcription client, the transcription client and the field end are connected through a local area network, and the system can update the contents of the target text information and the target translation result through the following steps:
The field terminal acquires voice information in real time through the voice acquisition terminal; the field terminal determines the target role speaking each piece of voice information according to its voice characteristics; the field terminal performs voice recognition on each piece of voice information through the server to obtain the target text information; the field terminal translates the target text information through the server to obtain the target translation result; the transcription client displays the target text information, the target translation result and the target role corresponding to the target text information on a transcription display, and performs voice broadcasting of the target text information and the target translation result through a loudspeaker in the voice of the target role; the transcription client modifies the target translation result in response to a modification instruction issued by the transcriber; and the transcription client transmits the modified target translation result to the field terminal, so that the field terminal updates the target translation result with the modified result transmitted by the transcription client.
After the field terminal translates the target text information through the server to obtain the target translation result, the transcription client displays the presentation in real time in the middle of the graphical user interface of the transcription display, a voice playback control in real time at the top of that interface, a first text display control in real time on its left side, and a second text display control in real time on its right side, where the target text information is displayed in the first text display control and the target translation result in the second text display control. The transcription client breaks the target text information into sentences according to a preset sentence-break rule to obtain at least one target sentence, and displays each target sentence in one of at least one text display sub-control according to the position order of the target sentences in the target text information, where each text display control comprises the at least one sequentially arranged text display sub-control. In response to the user dragging the voice playback control, the transcription client highlights, in the first text display control, the characters of the displayed target text information that correspond to the position of the playback control.
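The preset sentence-break rule is not specified in the source; a minimal stand-in that splits the target text on common Chinese and Western sentence terminators might look like this, with one resulting sentence per text display sub-control.

```python
import re

def break_sentences(text: str) -> list:
    """Split target text into sentences on common Chinese and Western
    sentence terminators. Illustrative stand-in for the preset
    sentence-break rule; a trailing fragment without a terminator is
    kept as its own sentence."""
    parts = re.split(r"(?<=[。！？.!?])", text)
    return [p for p in parts if p.strip()]
```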
In addition, the real-time transcription system based on the multi-language conference system is deployed on the on-site intelligent device. In the system architecture of the intelligent device, the subtitle display functions are implemented by a user interface layer, an interaction logic processing layer, a business logic layer, a service component layer and a data acquisition and processing layer:
The user interface layer consists of a number of activities and is responsible for presenting the user interaction interface, receiving the user's operations and giving timely feedback on them. Each page is one activity, the layout of each activity is configured by a dedicated layout file, and multiple sets of layouts are provided for different resolutions and screen sizes to ensure that the user interface displays correctly in every case.
The interaction logic processing layer processes interaction events between the user and an activity, such as remote-control operations (the up, down, left, right, OK, return and menu keys). According to the user's operation, it drives the corresponding change in the user interface, maps the operation to a user behavior, and executes the corresponding business operation.
The business logic layer processes the user's operation behavior; for example, when the user clicks the settings button, the interface switches to the settings page, and when the user clicks the return button, it switches back to the main interface. The business logic layer consists of a number of sub-modules, including a voice recognition logic module, a machine translation logic module, a scene setting logic module, a video and audio setting logic module, a registration module, a login module, a conference summary module, and so on.
The service component layer processes the instructions of the business logic layer and consists of three sub-modules: a speech recognition engine layer, a machine translation engine layer and an interface open-service layer. When the business logic layer notifies the service component layer to handle the user's start-recognition action, the speech recognition sub-module first initializes the recognition engine currently selected by the user, sets parameters such as the user-selected language on the engine, selects the speech recognition engine corresponding to the language in use, and calls the engine's start-recognition interface; the recognition result or an error message is returned to the business logic layer and passed up layer by layer until the recognition result is finally shown by the user interface layer.
The data acquisition and processing layer consists of a networking framework, a JSON (JavaScript Object Notation) parsing framework and a message push framework; it sends data requests to the server, processes the data returned by the server, and passes the structured data back up layer by layer.
Embodiment Two
Referring to fig. 5, fig. 5 shows a schematic structural diagram of a machine-translation-based multi-language subtitle display system according to a second embodiment of the present invention. As shown in fig. 5, the system includes a server 501, a field terminal 502, a voice acquisition terminal 503 and a field display 504, where the field terminal 502 is connected to the voice acquisition terminal 503 and the field display 504 through a local area network, and the server 501 and the field terminal 502 are connected through a wide area network:
After a conference starts, the field end is used for displaying a presentation and a text display control in real time in a graphical user interface of the field display, wherein the text display control displays target text information corresponding to voice information acquired by the voice acquisition terminal and a target translation result of the target text information, and the text display control is suspended in the graphical user interface;
the field terminal is used for responding to an operation instruction of a user on the presentation, and carrying out corresponding operation on the presentation according to the operation instruction;
the field terminal is used for updating contents of the target text information and the target translation result in real time through the following steps, wherein the target text information is text information recorded in a first language, and the target translation result is text information recorded in a second language:
the field terminal is used for collecting voice information in real time through the voice collection terminal;
the field terminal is used for carrying out voice recognition on the voice information through the server to obtain text information to be displayed;
the field terminal is used for translating the text information to be displayed through the server to obtain a translation result to be displayed;
The field terminal is used for judging in real time whether the sum of the number of words in the currently displayed text information and the number of words in the text information to be displayed, and the sum of the number of words in the currently displayed translation result and the number of words in the translation result to be displayed, exceed the maximum reference value that can be displayed by the preset display control;
if the sum of the number of words in the displayed text information and the number of words in the text information to be displayed and the sum of the number of words in the displayed translation result and the number of words in the translation result to be displayed exceed the maximum reference value, the field end is configured to remove at least part of the displayed text information and the displayed translation result, merge the rest of the displayed text information and the text information to be displayed into the target text information, and merge the rest of the displayed translation result and the translation result to be displayed into the target translation result;
and if the sum of the number of words in the displayed text information and the number of words in the text information to be displayed and the sum of the number of words in the displayed translation result and the number of words in the translation result to be displayed do not exceed the maximum reference value, the field end is used for merging the displayed text information and the text information to be displayed into the target text information and merging the displayed translation result and the translation result to be displayed into the target translation result.
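The two branches above (trim the displayed buffers when both streams would overflow the control, otherwise simply append) can be sketched as follows. This is a minimal sketch under assumptions: `MAX_CHARS` is an illustrative capacity, and trimming the oldest characters first is an assumed policy, not a detail given in the embodiment.

```python
MAX_CHARS = 120  # illustrative maximum the display control can show


def update_caption(displayed_src, new_src, displayed_tgt, new_tgt,
                   max_chars=MAX_CHARS):
    """Merge newly recognized and translated text into the caption buffers.

    When appending would exceed the control's capacity in *both* the
    source stream and the translation stream, at least part of each
    displayed buffer is removed (oldest characters first, an assumed
    policy) before the remainder is merged with the incoming text.
    Otherwise the incoming text is simply appended.
    """
    if (len(displayed_src) + len(new_src) > max_chars and
            len(displayed_tgt) + len(new_tgt) > max_chars):
        keep_src = max(0, max_chars - len(new_src))
        keep_tgt = max(0, max_chars - len(new_tgt))
        displayed_src = displayed_src[-keep_src:] if keep_src else ""
        displayed_tgt = displayed_tgt[-keep_tgt:] if keep_tgt else ""
    return displayed_src + new_src, displayed_tgt + new_tgt
```

Called once per recognition/translation cycle, this keeps each buffer at or below the maximum reference value while preserving as much recent context as fits.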
Example III
Based on the same application concept, referring to fig. 6, fig. 6 shows a schematic structural diagram of a computer device provided in a third embodiment of the present application, where, as shown in fig. 6, a computer device 600 provided in the third embodiment of the present application includes:
the system comprises a processor 601, a memory 602 and a bus 603, wherein the memory 602 stores machine-readable instructions executable by the processor 601; when the computer device 600 is running, the processor 601 and the memory 602 communicate through the bus 603, and the machine-readable instructions, when executed by the processor 601, perform the steps of the machine-translation-based multi-language subtitle display method shown in the first embodiment.
Example IV
Based on the same application concept, the embodiment of the present application further provides a computer readable storage medium, where a computer program is stored, and when the computer program is executed by a processor, the steps of a multi-language subtitle display method based on machine translation are executed.
It will be clear to those skilled in the art that, for convenience and brevity of description, specific working procedures of the above-described system and apparatus may refer to corresponding procedures in the foregoing method embodiments, which are not described herein again.
The computer program product for performing machine translation-based multilingual subtitle presentation according to the embodiment of the present invention includes a computer readable storage medium storing program code, where the program code includes instructions for executing the method described in the foregoing method embodiment, and specific implementation may refer to the method embodiment and will not be described herein.
The multi-language caption display system based on machine translation provided by the embodiment of the invention can be specific hardware on equipment or software or firmware installed on the equipment. The device provided by the embodiment of the present invention has the same implementation principle and technical effects as those of the foregoing method embodiment, and for the sake of brevity, reference may be made to the corresponding content in the foregoing method embodiment where the device embodiment is not mentioned. It will be clear to those skilled in the art that, for convenience and brevity, the specific operation of the system, apparatus and unit described above may refer to the corresponding process in the above method embodiment, which is not described in detail herein.
In the embodiments provided in the present invention, it should be understood that the disclosed apparatus and method may be implemented in other manners. The above-described apparatus embodiments are merely illustrative, for example, the division of the units is merely a logical function division, and there may be other manners of division in actual implementation, and for example, multiple units or components may be combined or integrated into another system, or some features may be omitted, or not performed. Alternatively, the coupling or direct coupling or communication connection shown or discussed with each other may be through some communication interface, device or unit indirect coupling or communication connection, which may be in electrical, mechanical or other form.
The units described as separate units may or may not be physically separate, and units shown as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, each functional unit in the embodiments provided in the present invention may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit.
The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention may be embodied essentially or in a part contributing to the prior art or in a part of the technical solution, in the form of a software product stored in a storage medium, comprising several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to perform all or part of the steps of the method according to the embodiments of the present invention. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a random access Memory (RAM, random Access Memory), a magnetic disk, or an optical disk, or other various media capable of storing program codes.
It should be noted that: like reference numerals and letters in the following figures denote like items, and thus once an item is defined in one figure, no further definition or explanation of it is required in the following figures, and furthermore, the terms "first," "second," "third," etc. are used merely to distinguish one description from another and are not to be construed as indicating or implying relative importance.
Finally, it should be noted that the above examples are only specific embodiments of the present invention, intended to illustrate rather than limit its technical solution, and the protection scope of the present invention is not limited thereto. Although the present invention has been described in detail with reference to the foregoing examples, any person skilled in the art may still modify the technical solutions described in the foregoing embodiments, or substitute equivalents for some of their technical features, within the technical scope of the present disclosure; such modifications, changes or substitutions do not cause the corresponding technical solutions to depart from the spirit and scope of the embodiments, and are intended to be encompassed within the scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (10)

1. The multi-language caption display method based on machine translation is characterized by being applied to a multi-language caption display system based on machine translation, wherein the multi-language caption display system based on machine translation comprises a server, a field terminal, a voice acquisition terminal and a field display, the field terminal is respectively connected with the voice acquisition terminal and the field display through a local area network, and the server and the field terminal are connected through a wide area network, and the method comprises the following steps:
displaying a presentation file and a text display control in real time in a graphical user interface of the field display after a conference starts, wherein the text display control displays target text information corresponding to voice information acquired by the voice acquisition terminal and a target translation result of the target text information, and the text display control floats over the graphical user interface;
responding to an operation instruction of a user on the presentation, and performing corresponding operation on the presentation according to the operation instruction;
updating contents of the target text information and the target translation result in real time through the following steps, wherein the target text information is text information recorded in a first language, and the target translation result is text information recorded in a second language:
The field terminal acquires voice information in real time through the voice acquisition terminal;
the field terminal carries out voice recognition on the voice information through the server to obtain text information to be displayed;
the field terminal translates the text information to be displayed through the server to obtain a translation result to be displayed;
the field terminal judges in real time whether the sum of the number of words in the currently displayed text information and the number of words in the text information to be displayed and the sum of the number of words in the currently displayed translation result and the number of words in the translation result to be displayed exceed a maximum reference value that can be displayed by a preset display control;
if the sum of the number of words in the displayed text information and the number of words in the text information to be displayed and the sum of the number of words in the displayed translation result and the number of words in the translation result to be displayed both exceed the maximum reference value, the field end removes at least part of the displayed text information and the displayed translation result, merges the rest of the displayed text information and the text information to be displayed into the target text information, and merges the rest of the displayed translation result and the translation result to be displayed into the target translation result;
If the sum of the number of words in the displayed text information and the number of words in the text information to be displayed and the sum of the number of words in the displayed translation result and the number of words in the translation result to be displayed do not exceed the maximum reference value, the field terminal merges the displayed text information and the text information to be displayed into the target text information and merges the displayed translation result and the translation result to be displayed into the target translation result.
2. The method of claim 1, wherein when presenting the presentation and text display controls in real time in a graphical user interface of the field display, the method comprises:
responding to a control instruction issued by the user outside the area where the text display control is located, and selecting a target controlled control from the displayed presentation by the field end according to the operation position of the control instruction;
the field terminal operates the target controlled control according to the operation content indicated by the control instruction;
or alternatively,
responding to a control instruction issued by the user in the area where the text display control is located, and selecting a target controlled control from the displayed text display control by the field end according to the operation position of the control instruction;
And the field terminal operates the target controlled control according to the operation content indicated by the control instruction.
3. The method according to claim 1, wherein when displaying target text information corresponding to the voice information collected by the voice collection terminal and a target translation result of the target text information in the text display control, the method comprises:
the field terminal obtains a first number, namely the number of words of the target text information that can be displayed in each row of the text display control;
the field terminal obtains a second number, namely the number of words of the target translation result that can be displayed in each row of the text display control;
after each update of the target text information and the target translation result, displaying the updated target text information in the upper half of the text display control with the target number of words per row, and displaying the updated target translation result in the lower half of the text display control with the target number of words per row, wherein the target number is the minimum of the first number and the second number.
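The layout rule in claim 3 — break both languages into rows of the same per-row count, taken as the minimum of what each language fits — can be sketched as follows; the per-row capacities passed in are hypothetical values, and the character-level row split is an assumed simplification.

```python
def rows_for(text, per_row):
    """Break text into rows of at most per_row characters."""
    return [text[i:i + per_row] for i in range(0, len(text), per_row)]


def layout_caption(target_text, target_translation,
                   first_number, second_number):
    """Lay out the source text (upper half of the control) and the
    translation (lower half) with the same per-row count: the target
    number is min(first_number, second_number), so neither row ever
    overflows its half of the text display control."""
    target_number = min(first_number, second_number)
    return (rows_for(target_text, target_number),
            rows_for(target_translation, target_number))
```

Using the minimum of the two per-row capacities keeps the two halves of the control vertically aligned row for row.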
4. The method according to claim 1, wherein when displaying target text information corresponding to the voice information collected by the voice collection terminal and a target translation result of the target text information in the text display control, the method further comprises:
and the field end responds to the target language selection operation of the user and hides the text information which is displayed in the text display control and is recorded by adopting the non-target language, wherein the target language comprises the first language and the second language.
5. The method according to claim 1, wherein when displaying target text information corresponding to the voice information collected by the voice collection terminal and a target translation result of the target text information in the text display control, the method comprises:
the field terminal highlights, in the display, the areas where rarely used words appear in the target text information and the target translation result, and displays the pronunciation rules of each rarely used word above it.
6. The method according to claim 1, wherein after the field end determines in real time whether the sum of the number of words in the current displayed text information and the number of words in the text information to be displayed and the sum of the number of words in the current displayed translation result and the number of words in the translation result to be displayed exceeds a maximum reference value that can be displayed by a preset display control, the method includes:
If the sum of the number of words in the displayed text information and the number of words in the text information to be displayed exceeds the maximum reference value, but the sum of the number of words in the displayed translation result and the number of words in the translation result to be displayed does not exceed the maximum reference value, the field end removes at least part of the displayed text information, merges the rest of the displayed text information and the text information to be displayed into the target text information, and merges the displayed translation result and the translation result to be displayed into the target translation result;
if the sum of the number of words in the displayed text information and the number of words in the text information to be displayed does not exceed the maximum reference value, but the sum of the number of words in the displayed translation result and the number of words in the translation result to be displayed exceeds the maximum reference value, the field terminal removes at least part of the displayed translation result, merges the displayed text information and the text information to be displayed into the target text information, and merges the rest of the displayed translation result and the translation result to be displayed into the target translation result.
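The asymmetric cases in claim 6 — only one of the two streams overflows, so only that stream is trimmed — amount to checking each stream against the maximum reference value independently. A minimal sketch, assuming a trim-oldest-characters-first policy (the granularity of "at least part" is not specified in the claim):

```python
def merge_stream(displayed, incoming, max_chars):
    """Append incoming text to one caption stream, trimming the oldest
    displayed characters only when this stream alone would overflow."""
    if len(displayed) + len(incoming) > max_chars:
        keep = max(0, max_chars - len(incoming))
        displayed = displayed[-keep:] if keep else ""
    return displayed + incoming


def update_streams(displayed_src, new_src, displayed_tgt, new_tgt,
                   max_chars=120):
    """Check each stream against the maximum reference value on its own,
    so a long source line can be trimmed while a shorter translation
    keeps its full history, and vice versa."""
    return (merge_stream(displayed_src, new_src, max_chars),
            merge_stream(displayed_tgt, new_tgt, max_chars))
```

Because the two languages rarely have equal lengths, this per-stream check avoids discarding translation history just because the source text overflowed.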
7. The method according to claim 1, wherein when displaying target text information corresponding to the voice information collected by the voice collection terminal and a target translation result of the target text information in the text display control, the method comprises:
the field terminal determines a target role for speaking each voice message according to the voice characteristics of each voice message;
and the field terminal displays, in different colors in the text display control, the target text information corresponding to the voice information spoken by different target roles.
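The per-speaker coloring in claim 7 reduces to a stable mapping from a target role (which the server derives from voice features) to a display color. A minimal sketch; the palette values and the role identifiers are assumptions, and the voice-feature-to-role step is taken as given:

```python
from itertools import cycle

# illustrative palette, not colors specified by the embodiment
PALETTE = cycle(["#e6194b", "#3cb44b", "#4363d8", "#f58231"])


class SpeakerColors:
    """Assign a stable color to each target role the first time it speaks,
    so all captions from the same speaker render in the same color."""

    def __init__(self):
        self._colors = {}

    def color_for(self, role_id):
        if role_id not in self._colors:
            self._colors[role_id] = next(PALETTE)
        return self._colors[role_id]
```

With a cycling palette, colors repeat only once the number of distinct roles exceeds the palette size.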
8. The multi-language caption display system based on machine translation is characterized by comprising a server, a field end, a voice acquisition terminal and a field display, wherein the field end is respectively connected with the voice acquisition terminal and the field display through a local area network, and the server is connected with the field end through a wide area network:
after a conference starts, the field end is used for displaying a presentation and a text display control in real time in a graphical user interface of the field display, wherein the text display control displays target text information corresponding to voice information acquired by the voice acquisition terminal and a target translation result of the target text information, and the text display control floats over the graphical user interface;
The field terminal is used for responding to an operation instruction of a user on the presentation, and carrying out corresponding operation on the presentation according to the operation instruction;
the field terminal is used for updating contents of the target text information and the target translation result in real time through the following steps, wherein the target text information is text information recorded in a first language, and the target translation result is text information recorded in a second language:
the field terminal is used for collecting voice information in real time through the voice collection terminal;
the field terminal is used for carrying out voice recognition on the voice information through the server to obtain text information to be displayed;
the field terminal is used for translating the text information to be displayed through the server to obtain a translation result to be displayed;
the field terminal is used for judging in real time whether the sum of the number of words in the currently displayed text information and the number of words in the text information to be displayed, and the sum of the number of words in the currently displayed translation result and the number of words in the translation result to be displayed, exceed the maximum reference value that can be displayed by the preset display control;
if the sum of the number of words in the displayed text information and the number of words in the text information to be displayed and the sum of the number of words in the displayed translation result and the number of words in the translation result to be displayed exceed the maximum reference value, the field end is configured to remove at least part of the displayed text information and the displayed translation result, merge the rest of the displayed text information and the text information to be displayed into the target text information, and merge the rest of the displayed translation result and the translation result to be displayed into the target translation result;
and if the sum of the number of words in the displayed text information and the number of words in the text information to be displayed and the sum of the number of words in the displayed translation result and the number of words in the translation result to be displayed do not exceed the maximum reference value, the field end is used for merging the displayed text information and the text information to be displayed into the target text information and merging the displayed translation result and the translation result to be displayed into the target translation result.
9. A computer device, comprising: a processor, a memory and a bus, the memory storing machine-readable instructions executable by the processor, the processor and the memory in communication via the bus when the computer device is running, the machine-readable instructions when executed by the processor performing the steps of the machine-translation-based multi-language subtitle presentation method according to any one of claims 1 to 7.
10. A computer-readable storage medium, characterized in that the computer-readable storage medium has stored thereon a computer program which, when executed by a processor, performs the steps of the machine-translation-based multilingual subtitle presentation method as claimed in any one of claims 1 to 7.
CN202210725079.9A 2022-06-23 2022-06-23 Multi-language subtitle display method, system, equipment and medium based on machine translation Active CN115278331B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210725079.9A CN115278331B (en) 2022-06-23 2022-06-23 Multi-language subtitle display method, system, equipment and medium based on machine translation

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210725079.9A CN115278331B (en) 2022-06-23 2022-06-23 Multi-language subtitle display method, system, equipment and medium based on machine translation

Publications (2)

Publication Number Publication Date
CN115278331A CN115278331A (en) 2022-11-01
CN115278331B true CN115278331B (en) 2023-10-20

Family

ID=83760908

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210725079.9A Active CN115278331B (en) 2022-06-23 2022-06-23 Multi-language subtitle display method, system, equipment and medium based on machine translation

Country Status (1)

Country Link
CN (1) CN115278331B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115905499A (en) * 2023-01-05 2023-04-04 深圳市北科瑞讯信息技术有限公司 Voice data conversion method and device, electronic equipment and readable storage medium

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106340294A (en) * 2016-09-29 2017-01-18 安徽声讯信息技术有限公司 Synchronous translation-based news live streaming subtitle on-line production system
CN106919559A (en) * 2015-12-25 2017-07-04 松下知识产权经营株式会社 Machine translation method and machine translation system
EP3605357A2 (en) * 2018-08-01 2020-02-05 Disney Enterprises, Inc. Machine translation system for entertainment and media

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR102390187B1 (en) * 2020-05-27 2022-04-25 네이버 주식회사 Method and system for providing translation for conference assistance


Also Published As

Publication number Publication date
CN115278331A (en) 2022-11-01

Similar Documents

Publication Publication Date Title
US10983664B2 (en) Communications interface and a communications method, a corresponding computer program, and a corresponding registration medium
US7124372B2 (en) Interactive communication between a plurality of users
US8823735B2 (en) Information processing apparatus and teleconference system
JP5014449B2 (en) CONFERENCE SYSTEM, INFORMATION PROCESSING DEVICE, CONFERENCE SUPPORT METHOD, INFORMATION PROCESSING METHOD, AND COMPUTER PROGRAM
CN101322097B (en) Flexible display translation
JP2023017956A (en) Multimodal interaction between users, automated assistants, and other computing services
JP2023539820A (en) Interactive information processing methods, devices, equipment, and media
US20110311952A1 (en) Modularized Computer-Aided Language Learning Method and System
JP2010511896A (en) Language learning content provision system using partial images
CN115278331B (en) Multi-language subtitle display method, system, equipment and medium based on machine translation
JP2011065467A (en) Conference relay device and computer program
US20150088485A1 (en) Computerized system for inter-language communication
CN113965813A (en) Video playing method and system in live broadcast room and computer equipment
CN111309207A (en) Translation display method and device, electronic equipment and storage medium
CN114913857A (en) Real-time transcription method, system, equipment and medium based on multi-language conference system
CN115048949A (en) Multilingual text replacement method, system, equipment and medium based on term base
WO2018046007A1 (en) Instant dynamic text inputting method, system, and device
Wakatsuki et al. Development of web-based remote speech-to-text interpretation system captiOnline
KR20150104051A (en) Apparatus and method for providing translating chatting service
CN111249723B (en) Method, device, electronic equipment and storage medium for display control in game
CN115167794A (en) Teleconference synchronization method, system, device and medium based on identity authentication
KR100505346B1 (en) Language studying method using flash
CN115066907A (en) User terminal, broadcasting apparatus, broadcasting system including the same, and control method thereof
CN115066908A (en) User terminal and control method thereof
JP2019200233A (en) Lesson support system, information processing device, lesson support method, and program

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant