CN114913857A - Real-time transcription method, system, equipment and medium based on multi-language conference system


Info

Publication number
CN114913857A
CN114913857A
Authority
CN
China
Prior art keywords
target
transcription
translation result
client
voice
Prior art date
Legal status
Pending
Application number
CN202210726391.XA
Other languages
Chinese (zh)
Inventor
刘丽梅
张凯
田祥花
高翊
Current Assignee
Glabal Tone Communication Technology Co ltd
Original Assignee
Glabal Tone Communication Technology Co ltd
Priority date
Filing date
Publication date
Application filed by Glabal Tone Communication Technology Co ltd filed Critical Glabal Tone Communication Technology Co ltd
Priority to CN202210726391.XA
Publication of CN114913857A
Legal status: Pending


Classifications

    • G10L15/26 Speech to text systems
    • G06F3/0481 Interaction techniques based on graphical user interfaces [GUI] based on specific properties of the displayed interaction object or a metaphor-based environment
    • G06F3/04812 Interaction techniques based on cursor appearance or behaviour, e.g. being affected by the presence of displayed objects
    • G06F3/04842 Selection of displayed objects or displayed text elements
    • G06F3/0486 Drag-and-drop
    • G06F40/58 Use of machine translation, e.g. for multi-lingual retrieval, for server-side translation for client devices or for real-time translation
    • G10L15/142 Hidden Markov Models [HMMs]
    • G10L15/16 Speech classification or search using artificial neural networks
    • G10L17/14 Use of phonemic categorisation or speech recognition prior to speaker recognition or verification
    • H04L67/141 Setup of application sessions

Abstract

The application provides a real-time transcription method, system, equipment and medium based on a multi-language conference system. After a conference starts, a presentation and a text display control are displayed in real time in the graphical user interface of a site display, and target text information and a target translation result are shown in the text display control; the site end determines the target role that spoke each piece of voice information; the transcription client displays the target text information, the target translation result and the target role on a transcription display, and broadcasts the target text information and the target translation result through its speaker; the transcription client modifies the target translation result in response to a modification instruction issued by a transcriber; and the site end updates the target translation result with the modified target translation result transmitted by the transcription client. With this method, the text record of the conference content can be edited and displayed in real time.

Description

Real-time transcription method, system, equipment and medium based on multi-language conference system
Technical Field
The invention relates to the field of intelligent conference equipment, and in particular to a real-time transcription method, system, equipment and medium based on a multi-language conference system.
Background
When a company holds a conference, files or documents related to the conference content are usually printed as paper copies and distributed to all participants, who browse them on site while listening to the speakers in order to follow the conference content in real time. Meanwhile, a conference secretary records the conference content in real time, and the recorded text serves as the conference minutes. The inventors found in research that the content recorded by the conference secretary is only stored as conference minutes, so the text record of the conference content cannot be edited and displayed in real time.
Disclosure of Invention
In view of the above, an object of the present invention is to provide a method, a system, a device and a medium for real-time transcription based on a multi-language conference system, which can edit and display text information of recorded conference contents in real time.
In a first aspect, an embodiment of the present application provides a real-time transcription method based on a multi-language conference system, which is applied to a real-time transcription system based on the multi-language conference system. The real-time transcription system includes a server, a site end, a voice acquisition terminal, a site display and a transcription client; the site end is connected to the transcription client, the voice acquisition terminal and the site display through a local area network, and is connected to the server through a wide area network. The method includes:
after a conference begins, displaying a presentation and a text display control in real time in a graphical user interface of the site display, wherein the text display control displays target text information corresponding to voice information acquired by the voice acquisition terminal and a target translation result of the target text information, and the text display control floats over the graphical user interface;
responding to an operation instruction of a user on the presentation, and performing the corresponding operation on the presentation according to the operation instruction;
determining the contents of the target text information and the target translation result by:
the site end acquires voice information in real time through the voice acquisition terminal;
the site end determines, according to the voice characteristics of each piece of voice information, the target role that spoke it;
the site end performs voice recognition on each piece of voice information through the server to obtain the target text information;
the site end translates the target text information through the server to obtain the target translation result;
the transcription client displays the target text information, the target translation result and the target role corresponding to the target text information on a transcription display, and broadcasts the target text information and the target translation result by voice through its speaker;
the transcription client modifies the target translation result in response to a modification instruction issued by a transcriber;
and the transcription client transmits the modified target translation result to the site end, so that the site end updates the target translation result with the modified target translation result transmitted by the transcription client.
Optionally, after the site terminal translates the target text information through the server to obtain the target translation result, the method further includes:
the transcription client displays the presentation in real time in the middle of a graphical user interface of the transcription display, displays a voice playing control in real time in the upper part of the interface, displays a first text display control in real time on the left side of the interface and a second text display control in real time on the right side of the interface, wherein the target text information is displayed in the first text display control and the target translation result is displayed in the second text display control;
the transcription client breaks the target text information into at least one target sentence according to a preset sentence-breaking rule;
the transcription client displays each target sentence in a respective text display sub-control according to the position of the target sentence in the target text information, wherein the text display control comprises at least one text display sub-control arranged in sequence;
and the transcription client, in response to a drag operation of the user on the voice playing control, highlights in the first text display control the characters of the displayed target text information that correspond to the position of the playing control.
Optionally, after the transcription client transmits the modified target translation result to the site end, the method further includes:
the site end updates the target translation result to be displayed using the modified target translation result;
or,
the site end updates both the displayed target translation result and the target translation result to be displayed using the modified target translation result.
Optionally, when the transcription client displays the target text information, the target translation result, and a target role corresponding to the target text information on a transcription display, the method includes:
the transcription client highlights, in the transcription display, the regions where uncommon words in the target text information are located;
for each highlighted region, the transcription client displays the paraphrase of the uncommon word in that region in response to a click operation by the transcriber on the region.
Optionally, before the transcription client displays the target text information, the target translation result, and the target role corresponding to the target text information on a transcription display, the method further includes:
after the transcriber enters the IP address of a site end in the transcription client, the transcription client identifies, among the currently connectable terminals, the site end having that IP address;
the transcription client sends a session request to the site end;
and the site end establishes a session with the transcription client in response to the session request, so that the target text information, the target translation result and the target role corresponding to the target text information can be transmitted to the transcription client.
Optionally, when the transcription client broadcasts the target text information and the target translation result by voice through its speaker, the method includes:
the transcription client displays a graphic display control in the transcription display in real time, wherein a voice waveform diagram and the broadcast progress point at the current moment are displayed in the graphic display control;
the content of the voice waveform diagram and the position of the broadcast progress point are determined through the following steps:
the transcription client generates the voice waveform diagram of the voice information according to the decibel value of each character in the voice information acquired by the site end through the voice acquisition terminal;
and the transcription client determines the position of the broadcast progress point at the current moment in the voice waveform diagram according to the position, in the target text information, of the character currently being broadcast.
Optionally, after the target text information and the target translation result are broadcast by voice through the speaker of the transcription client, the method further includes:
the transcription client broadcasts the target text information and the target translation result by voice at a target playback speed in response to a target-speed broadcast instruction issued by the transcriber.
In a second aspect, an embodiment of the present application provides a real-time transcription system based on a multi-language conference system. The real-time transcription system includes a server, a site end, a voice acquisition terminal, a site display and a transcription client, where the site end is connected to the transcription client, the voice acquisition terminal and the site display through a local area network, and is connected to the server through a wide area network:
after a conference begins, the site end is used for displaying a presentation and a text display control in real time in a graphical user interface of the site display, wherein the text display control displays target text information corresponding to voice information acquired by the voice acquisition terminal and a target translation result of the target text information, and the text display control is suspended in the graphical user interface;
the site end is used for responding to an operation instruction of a user on the presentation and performing the corresponding operation on the presentation according to the operation instruction;
the site end is used for determining the contents of the target text information and the target translation result through the following steps:
the site end is used for acquiring voice information in real time through the voice acquisition terminal;
the site end is used for determining, according to the voice characteristics of each piece of voice information, the target role that spoke it;
the site end is used for performing voice recognition on each piece of voice information through the server to obtain the target text information;
the site end is used for translating the target text information through the server to obtain a target translation result;
the transcription client is used for displaying the target text information, the target translation result and a target role corresponding to the target text information on a transcription display, and performing voice broadcast on the target text information and the target translation result through a loudspeaker of the transcription client;
the transcription client is used for modifying the target translation result in response to a modification instruction issued by a transcriber;
the transcription client is used for transmitting the modified target translation result to the site end, so that the site end can update the target translation result with the modified target translation result transmitted by the transcription client.
In a third aspect, an embodiment of the present application provides a computer device, including: a processor, a memory and a bus, wherein the memory stores machine-readable instructions executable by the processor, the processor and the memory communicate via the bus when a computer device is running, and the machine-readable instructions are executed by the processor to perform the steps of the real-time transcription method based on the multi-language conference system as described in any one of the optional embodiments of the first aspect.
In a fourth aspect, the present application provides a computer-readable storage medium, where a computer program is stored on the computer-readable storage medium, and when the computer program is executed by a processor, the computer program performs the steps of the real-time transcription method based on the multi-language conference system as described in any optional implementation manner of the first aspect.
The technical solution provided by the present application has, but is not limited to, the following beneficial effects:
After a conference begins, a presentation and a text display control are displayed in real time in the graphical user interface of the site display, wherein the text display control displays target text information corresponding to the voice information acquired by the voice acquisition terminal and a target translation result of the target text information, and the text display control floats over the graphical user interface; corresponding operations are performed on the presentation in response to the user's operation instructions. Through these steps, on-site participants can browse, on the site display, the presentation together with the text information and translation results corresponding to the voice information of the conference.
The contents of the target text information and the target translation result are determined as follows: the site end acquires voice information in real time through the voice acquisition terminal; the site end determines, according to the voice characteristics of each piece of voice information, the target role that spoke it; the site end performs voice recognition on each piece of voice information through the server to obtain the target text information; and the site end translates the target text information through the server to obtain the target translation result. Through these steps, the voice information in the conference is recognized into text information and translated into other languages, so that participants from different regions can effectively understand the speech and the conference content.
The transcription client displays the target text information, the target translation result and the target role corresponding to the target text information on a transcription display, and broadcasts the target text information and the target translation result by voice through its speaker; the transcription client modifies the target translation result in response to a modification instruction issued by a transcriber; and the transcription client transmits the modified target translation result to the site end, so that the site end updates the target translation result with the modified result.
With this method, while the presentation is shown to the participants at the conference site, the conference content is also displayed in real time in text form; the text information corresponding to the voice information of the conference and its translation result can be edited and modified in real time, and the modified content is displayed to the participants in real time.
In order to make the aforementioned and other objects, features and advantages of the present invention comprehensible, preferred embodiments accompanied with figures are described in detail below.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings required in the embodiments will be briefly described below, it should be understood that the following drawings only illustrate some embodiments of the present invention and therefore should not be considered as limiting the scope, and those skilled in the art can also obtain other related drawings based on the drawings without inventive efforts.
Fig. 1 is a flowchart illustrating a real-time transcription method based on a multi-language conference system according to an embodiment of the present invention;
FIG. 2 is a diagram illustrating a graphical user interface of a field display according to an embodiment of the present invention;
fig. 3 is a flowchart of a method for determining the target text information and the target translation result according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of a graphical user interface of a transcription display provided by an embodiment of the invention;
FIG. 5 is a diagram illustrating a graphical user interface in a transcription display provided by an embodiment of the invention;
fig. 6 is a schematic structural diagram illustrating a real-time transcription system based on a multilingual conference system according to a second embodiment of the present invention;
fig. 7 shows a schematic structural diagram of a computer device according to a third embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. The components of embodiments of the present invention generally described and illustrated in the figures herein may be arranged and designed in a wide variety of different configurations. Thus, the following detailed description of the embodiments of the present invention, as presented in the figures, is not intended to limit the scope of the invention, as claimed, but is merely representative of selected embodiments of the invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments of the present invention without making any creative effort, shall fall within the protection scope of the present invention.
Example one
For ease of understanding, an embodiment of the present application is described in detail below with reference to the flowchart of the real-time transcription method based on a multi-language conference system shown in Fig. 1.
Referring to Fig. 1, Fig. 1 is a flowchart of a real-time transcription method based on a multi-language conference system according to an embodiment of the present invention. The method is applied to a real-time transcription system based on the multi-language conference system, which includes a server, a site end, a voice acquisition terminal, a site display and a transcription client; the site end is connected to the transcription client, the voice acquisition terminal and the site display through a local area network, and to the server through a wide area network. The method includes steps S101 to S102:
S101: after the conference begins, a presentation and a text display control are displayed in real time in the graphical user interface of the site display, wherein the text display control displays target text information corresponding to the voice information acquired by the voice acquisition terminal and a target translation result of the target text information, and the text display control floats over the graphical user interface.
Specifically, referring to Fig. 2, Fig. 2 is a schematic diagram of the graphical user interface of the site display according to an embodiment of the present invention. In addition to the presentation, videos, pictures, text boxes and other files or contents related to the conference may be displayed in real time in the graphical user interface of the site display, tiled or shown at a preset scale. The text display control can be displayed at any position of the graphical user interface as required, and the user can change its display position by dragging it.
S102: responding to an operation instruction of a user on the presentation, and performing corresponding operation on the presentation according to the operation instruction.
Specifically, the operation instruction of the user on the presentation or any content displayed in the graphical user interface is responded, and the corresponding operation is performed on the presentation according to the operation instruction.
For example, when the graphical user interface displays a presentation or a slide, the user can page the presentation backward by clicking the area where the presentation is located, and page the presentation forward by double clicking the area where the presentation is located; when the video is displayed on the graphical user interface, the user can pause the currently played video by clicking the area where the video is located, and enlarge and display the double-click area by double-clicking the area where the video is located; when the picture is displayed on the graphical user interface, the user can mark the single click region in the picture by clicking the region where the picture is located, and amplify and display the double click region by double clicking the region where the picture is located.
Referring to Fig. 3, Fig. 3 is a flowchart of the method for determining the target text information and the target translation result according to an embodiment of the present invention. The contents of the target text information and the target translation result are determined through steps S201 to S207:
S201: the site end acquires voice information in real time through the voice acquisition terminal.
Specifically, after the conference starts, the site end sends a voice acquisition instruction to the voice acquisition terminal, and the voice acquisition terminal begins acquiring voice information of the conference after receiving the instruction.
S202: the site end determines the target role of each piece of voice information according to its voice characteristics.
Specifically, the site end extracts the voice characteristics of each piece of voice information; the extraction methods include, but are not limited to, parametric-model methods such as hidden Markov models and non-parametric methods such as vector quantization. The target role that spoke each piece of voice information is then determined from the extracted voice characteristics.
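By way of illustration only, the following Python sketch shows one way the extracted voice characteristics could be matched against enrolled role profiles to pick a target role. The profile vectors, function names and the cosine-similarity rule are assumptions made for this example and are not the claimed matching method.

```python
import math

# Hypothetical enrolled role profiles: role name -> reference voice-feature vector.
# In practice the features would come from a parametric model (e.g. an HMM) or vector quantization.
ROLE_PROFILES = {
    "Speaker A": [0.82, 0.10, 0.33],
    "Speaker B": [0.15, 0.76, 0.41],
}

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def identify_target_role(voice_features):
    """Return the enrolled role whose profile best matches the extracted voice features."""
    return max(ROLE_PROFILES, key=lambda role: cosine_similarity(voice_features, ROLE_PROFILES[role]))

if __name__ == "__main__":
    features = [0.80, 0.12, 0.30]          # features extracted from one piece of voice information
    print(identify_target_role(features))  # -> "Speaker A"
```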
S203: the site end performs voice recognition on each piece of voice information through the server to obtain the target text information.
Specifically, the site end performs voice recognition on each piece of voice information through the server; the voice recognition methods include, but are not limited to, methods based on vocal tract models and speech knowledge, pattern matching methods, and artificial neural network methods, yielding the recognized target text information.
S204: the site end translates the target text information through the server to obtain the target translation result.
Specifically, after obtaining the target text information, the site end sends it to the server, and the server translates it according to translation rules stored in advance to obtain the target translation result; the target text information is text written in a first language, and the target translation result is text written in a second language.
For example, when the target text information is text written in Chinese, the server translates it according to its pre-stored Chinese-to-English rules to obtain a target translation result written in English.
S205: the transcription client displays the target text information, the target translation result and the target role corresponding to the target text information on a transcription display, and broadcasts the target text information and the target translation result by voice through its speaker.
Specifically, referring to Fig. 4, Fig. 4 is a schematic diagram of the graphical user interface of the transcription display according to an embodiment of the present invention. The transcription client receives from the site end the target translation result obtained from the server, together with the target text information and the target role corresponding to it; it displays each piece of target text information in the left text display control of the graphical user interface of the transcription display, displays the corresponding target role in front of each piece of target text, and displays the target translation result in the right text display control.
S206: the transcription client modifies the target translation result in response to a modification instruction issued by the transcriber.
Specifically, when the transcriber modifies the text content in any text display control of the transcription client, the transcription client modifies the text content in that control.
S207: the transcription client transmits the modified target translation result to the site end, so that the site end updates the target translation result with the modified target translation result transmitted by the transcription client.
Specifically, the transcription client transmits the modified target translation result to the site end, and the site end replaces the original target translation result with it.
In a possible embodiment, after the site end translates the target text information through the server to obtain the target translation result, the method further includes steps S301 to S304:
S301: the transcription client displays the presentation in real time in the middle of the graphical user interface of the transcription display, displays a voice playing control in real time in the upper part of the interface, displays a first text display control in real time on the left side of the interface and a second text display control in real time on the right side, wherein the target text information is displayed in the first text display control and the target translation result in the second text display control.
Specifically, when the presentation is displayed in the middle of the graphical user interface in real time, it may be tiled, stretched to fill the entire interface, or scaled according to a preset zoom ratio.
When the voice playing control, the first text display control and the second text display control are displayed, each control can be dragged to any designated position in the graphical user interface according to the user's drag operation. When several controls overlap, the control dragged into position last is displayed on top: the overlapped part of the control that was already at that position is hidden, and only the part that does not overlap with the later-dragged control remains visible.
S302: the transcription client breaks the target text information into at least one target sentence according to a preset sentence-breaking rule.
Specifically, the transcription client segments the target text information according to the preset sentence-breaking rule to obtain at least one target sentence; for example, sentences may be broken at punctuation marks, taking the characters between successive periods, exclamation marks or question marks as one target sentence, as in the sketch below.
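A minimal Python sketch of the punctuation-based sentence-breaking rule follows; the exact delimiter set is an assumption based on the example above (periods, exclamation marks and question marks).

```python
import re

# Sentence-ending punctuation assumed from the example: fullwidth and Western
# periods, exclamation marks and question marks.
SENTENCE_DELIMITERS = r"([。．.!！?？])"

def break_sentences(target_text: str) -> list[str]:
    """Split target text information into target sentences, keeping each delimiter with its sentence."""
    parts = re.split(SENTENCE_DELIMITERS, target_text)
    sentences = []
    for i in range(0, len(parts) - 1, 2):
        sentence = (parts[i] + parts[i + 1]).strip()
        if sentence:
            sentences.append(sentence)
    if parts[-1].strip():  # trailing text without a final delimiter
        sentences.append(parts[-1].strip())
    return sentences

if __name__ == "__main__":
    print(break_sentences("会议现在开始。请看第一页！有问题吗？"))
    # -> ['会议现在开始。', '请看第一页！', '有问题吗？']
```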
S303: the transcription client displays each target sentence in a respective text display sub-control according to the position of the target sentence in the target text information, wherein the text display control comprises at least one text display sub-control arranged in sequence.
Specifically, after the target text information is split into at least one target sentence, each target sentence is numbered in ascending order according to the order in which it was obtained (the number serves as a hidden mark of the sentence and is not shown in the text display control); the text display control comprises at least one text display sub-control, and the sub-controls are displayed from top to bottom on the left side of the graphical user interface in ascending order of the numbers pre-assigned to them.
Referring to Fig. 5, Fig. 5 is a schematic diagram of the graphical user interface of the transcription display according to the first embodiment of the present invention. Each text display sub-control consists of two secondary sub-controls: the one on the left is the first secondary sub-control and the one on the right is the second secondary sub-control. The target sentence whose number matches that of the text display sub-control is shown in its first secondary sub-control, and the translation result of that target sentence is shown in its second secondary sub-control.
S304: the transcription client, in response to the user's drag operation on the voice playing control, highlights in the first text display control the characters of the displayed target text information that correspond to the position of the playing control.
Specifically, when the user drags the voice playing control, the characters in the displayed target text information that correspond to the control's position are highlighted in the first text display control, and at the same time the voice broadcast jumps to those characters.
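As an illustration of this step only, the sketch below maps the dragged position of the voice playing control (taken as a fraction of its track) to a character index of the target text, which could then be used both for highlighting and for resuming the broadcast. The linear mapping and all names are assumptions for this example.

```python
def drag_fraction_to_char_index(drag_fraction: float, target_text: str) -> int:
    """Map the play-control position (0.0 to 1.0 along its track) to a character index."""
    drag_fraction = min(max(drag_fraction, 0.0), 1.0)
    return round(drag_fraction * (len(target_text) - 1)) if target_text else 0

def on_play_control_dragged(drag_fraction: float, target_text: str):
    """Return the span of characters to highlight and the index to resume broadcasting from."""
    index = drag_fraction_to_char_index(drag_fraction, target_text)
    highlight_span = target_text[: index + 1]  # shown highlighted in the first text display control
    return highlight_span, index

if __name__ == "__main__":
    span, resume_at = on_play_control_dragged(0.5, "欢迎各位参加本次会议")
    print(span, resume_at)
```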
In a possible embodiment, after the transcription client transmits the modified target translation result to the site end, the method further includes step S401:
S401: the site end updates the target translation result to be displayed using the modified target translation result; or the site end updates both the displayed target translation result and the target translation result to be displayed using the modified target translation result, updating the target text information and the target translation result each time.
Specifically, after the transcription client transmits the modified target translation result to the site end, the site end may use it to update only the target translation result to be displayed, leaving the displayed target translation result unchanged; or it may update the target translation result to be displayed and, in addition, replace the displayed target translation result with the corresponding content of the modified target translation result.
In a possible embodiment, when the transcription client displays the target text information, the target translation result, and the target role corresponding to the target text information on a transcription display, the method includes steps S501 to S502:
S501: the transcription client highlights, in the transcription display, the regions where uncommon words in the target text information are located.
Specifically, besides highlighting, the transcription client may fill the region where an uncommon word is located with colour, display it in bold, or use any other form of emphasis in the transcription display.
S502: for each highlighted region, the transcription client displays the paraphrase of the uncommon word in that region in response to the transcriber's click on the region.
Specifically, when the transcriber clicks a highlighted region in the text display control, the paraphrase of the uncommon word in that region is looked up in the paraphrase library of the transcription client and displayed in the transcription display; the paraphrase content may be shown below the uncommon word or in a text display control dedicated to paraphrase content in the graphical user interface.
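A minimal sketch of the click-to-paraphrase behaviour is given below, assuming the paraphrase library is a local dictionary keyed by the uncommon word; the dictionary contents and function names are illustrative only.

```python
# Hypothetical local paraphrase library inside the transcription client.
PARAPHRASE_LIBRARY = {
    "同声传译": "simultaneous interpretation: rendering speech into another language in real time",
}

def on_highlight_clicked(uncommon_word: str) -> str:
    """Return the paraphrase content to show below the word or in a dedicated text display control."""
    return PARAPHRASE_LIBRARY.get(uncommon_word, "No paraphrase available for this word.")

if __name__ == "__main__":
    print(on_highlight_clicked("同声传译"))
```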
In a possible embodiment, before the transcription client displays the target text information, the target translation result, and the target role corresponding to the target text information on a transcription display, the method further includes steps S601 to S603:
S601: after the transcriber enters the IP address of a site end in the transcription client, the transcription client identifies, among the currently connectable terminals, the site end having that IP address.
Specifically, before starting transcription the transcriber enters a login account and password in the transcription client for login verification. After the verification succeeds, the transcription client starts listening for WebSocket connections (WebSocket is a protocol providing full-duplex communication over a single TCP connection), and the transcriber enters the IP information in the transcription client, so that the transcription client finds, among all terminals it can connect to, the site end with the same IP information and establishes a connection with it.
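The connection flow could look like the following sketch, which uses the third-party `websockets` package purely as an illustrative transport; the port number, message payload and URI scheme are assumptions, since the text only states that a WebSocket connection is established after the site-end IP is entered.

```python
import asyncio
import websockets  # third-party package, used here only as an illustrative transport

SITE_END_PORT = 8765  # assumed port; the text does not specify one

async def connect_to_site_end(site_end_ip: str):
    """Open a WebSocket session to the site end identified by the IP the transcriber entered."""
    uri = f"ws://{site_end_ip}:{SITE_END_PORT}"
    async with websockets.connect(uri) as ws:
        await ws.send('{"type": "session_request", "client": "transcription"}')
        reply = await ws.recv()  # e.g. target text, translation result and target role messages
        print("site end replied:", reply)

if __name__ == "__main__":
    asyncio.run(connect_to_site_end("192.168.1.20"))
```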
S602: and the transcription client sends a session request to the site terminal.
Specifically, the transcription client sends a session request, or a channel-establishment request, to the site end determined in step S601.
S603: the site end establishes a session with the transcription client in response to the session request sent by the transcription client, so that the target text information, the target translation result and the target role corresponding to the target text information can be transmitted to the transcription client.
Specifically, the site end responds to the session request sent by the transcription client and establishes a session with it, so that the target text information, the target translation result and the target role corresponding to the target text information are transmitted to the transcription client; after receiving this information, the transcription client stores it in its local database.
In a possible embodiment, when the transcription client broadcasts the target text information and the target translation result by voice through its speaker, the method includes step S701:
S701: the transcription client displays a graphic display control in the transcription display in real time, wherein a voice waveform diagram and the broadcast progress point at the current moment are displayed in the graphic display control.
Specifically, the broadcast progress point is overlaid on the voice waveform diagram, so that the voice broadcast progress is indicated by the position of the broadcast progress point within the waveform.
The content of the voice waveform diagram and the position of the broadcast progress point are determined through steps S801 to S802:
S801: the transcription client generates the voice waveform diagram of the voice information according to the decibel value of each character in the voice information acquired by the site end through the voice acquisition terminal.
Specifically, voice information is an information carrier in which the frequency and amplitude of the sound waves of speech, music and sound effects change regularly. Given these characteristics, the voice information can be regarded as a continuously varying analogue signal and represented by a continuous curve whose peaks differ from moment to moment: the higher the decibel value of the voice information at a given moment, the higher the peak of the curve at that moment.
S802: the transcription client determines the position of the broadcast progress point at the current moment in the voice waveform diagram according to the position, in the target text information, of the character currently being broadcast.
Specifically, the relative position of the currently broadcast character between the first and last characters of the target text information is the same as the relative position of the broadcast progress point at the current moment between the first and last points of the voice waveform diagram.
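This relationship amounts to a simple proportional mapping. A sketch is given below, assuming the waveform is addressed by point index; the function name and parameters are illustrative.

```python
def progress_point_position(current_char_index: int, total_chars: int, waveform_length: int) -> int:
    """Place the broadcast progress point so that its relative position in the waveform
    equals the relative position of the currently broadcast character in the target text."""
    if total_chars <= 1 or waveform_length <= 1:
        return 0
    fraction = current_char_index / (total_chars - 1)
    return round(fraction * (waveform_length - 1))

if __name__ == "__main__":
    # 5th character (index 4) of a 20-character sentence over a 1000-point waveform
    print(progress_point_position(4, 20, 1000))  # -> 210
```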
In a possible embodiment, after the target text information and the target translation result are broadcast by voice through the speaker of the transcription client, the method further includes step S901:
S901: the transcription client broadcasts the target text information and the target translation result by voice at a target playback speed in response to a target-speed broadcast instruction issued by the transcriber.
Specifically, the target playback speed may be 0.5x, 0.75x, 1.5x or 2.0x, or the target text information and the target translation result may be broadcast at a speed value entered by the user.
In the real-time transcription system based on the multilingual conference system, the content of the target text information and the target translation result can be updated through the following steps, wherein the target text information is character information recorded in a first language, and the target translation result is character information recorded in a second language:
The site end acquires voice information in real time through the voice acquisition terminal; the site end performs voice recognition on the voice information through the server to obtain text information to be displayed; the site end translates the text information to be displayed through the server to obtain a translation result to be displayed; the site end then judges in real time whether the sum of the currently displayed text information and the text information to be displayed, and the sum of the currently displayed translation result and the translation result to be displayed, exceed the maximum reference value that the preset display control can show. If both sums exceed the maximum reference value, the site end removes at least part of the displayed text information and of the displayed translation result, merges the remaining displayed text information with the text information to be displayed into the target text information, and merges the remaining displayed translation result with the translation result to be displayed into the target translation result. If neither sum exceeds the maximum reference value, the site end merges the displayed text information with the text information to be displayed into the target text information, and merges the displayed translation result with the translation result to be displayed into the target translation result.
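A minimal sketch of this display-capacity check follows, assuming the "maximum reference value" is a character count; the constant and function names are illustrative, and the same rule would apply to the translation result.

```python
MAX_DISPLAY_CHARS = 500  # assumed "maximum reference value" of the display control, in characters

def merge_for_display(displayed: str, to_display: str, max_chars: int = MAX_DISPLAY_CHARS) -> str:
    """Merge already-displayed text with new text, trimming the oldest part if the control would overflow."""
    combined = displayed + to_display
    if len(combined) <= max_chars:
        return combined                       # nothing exceeds the maximum reference value
    overflow = len(combined) - max_chars
    return displayed[overflow:] + to_display  # remove at least part of the displayed text, keep the rest

if __name__ == "__main__":
    target_text = merge_for_display("A" * 480, "B" * 40)
    print(len(target_text))  # -> 500: the oldest 20 characters were removed
```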
In response to a control instruction issued by the user in the area where the text display control is located, the site end selects a target controlled control from the displayed presentation according to the operating position of the control instruction, and operates the target controlled control according to the operation content indicated by the instruction; alternatively, in response to a control instruction issued by the user in the area where the text display control is located, the site end selects a target controlled control from the displayed text display controls according to the operating position of the control instruction, and operates it according to the operation content indicated by the instruction.
In the real-time transcription system based on the multi-language conference system, the target text information and the target translation result can be determined in real time through the following steps:
The site end acquires voice information in real time through the voice acquisition terminal; the site end performs voice recognition on the voice information through the server to obtain the target text information; the site end translates the term text in the target text information through the server according to a first preset translation rule in a cloud term library to obtain a first translation result; the site end translates the non-term text in the target text information through the server according to a second preset translation rule in a cloud translation library to obtain a second translation result; and the site end combines, through the server, the first translation result and the second translation result according to the character order in the target text information to obtain the target translation result.
After the target translation result is obtained in this way, the site end judges, according to the network delay between the site end and the server, whether a local term library and a local translation library need to be started to translate the target text information, the local term library and the local translation library being stored in advance at the site end. When the network delay exceeds a first preset threshold, the site end translates the term text in the target text information according to a third translation rule in the local term library to obtain a third translation result, translates the non-term text according to a fourth translation rule in the local translation library to obtain a fourth translation result, combines the third and fourth translation results according to the character order in the target text information to obtain a local translation result, and updates the target translation result with the local translation result.
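The sketch below illustrates, under simplifying assumptions, how term and non-term translations could be merged and how the local libraries could take over when the network delay exceeds the first threshold. The library contents, the token-by-token replacement and the threshold value are all assumptions for this example.

```python
# Assumed cloud and local libraries; in the described system these live on the server and the site end.
CLOUD_TERM_LIBRARY = {"神经网络": "neural network"}
CLOUD_TRANSLATION = {"模型": "the model", "使用": "uses"}
LOCAL_TERM_LIBRARY = {"神经网络": "neural net"}
LOCAL_TRANSLATION = {"模型": "the model", "使用": "uses"}

DELAY_THRESHOLD_MS = 300  # assumed first preset threshold

def translate(tokens, term_library, translation_library):
    """Translate term tokens with the term library and the rest with the translation library,
    then merge the two results in the original character order."""
    return " ".join(term_library.get(t) or translation_library.get(t, t) for t in tokens)

def get_target_translation(tokens, network_delay_ms: float) -> str:
    result = translate(tokens, CLOUD_TERM_LIBRARY, CLOUD_TRANSLATION)
    if network_delay_ms > DELAY_THRESHOLD_MS:  # start the local libraries and update the result
        result = translate(tokens, LOCAL_TERM_LIBRARY, LOCAL_TRANSLATION)
    return result

if __name__ == "__main__":
    print(get_target_translation(["模型", "使用", "神经网络"], network_delay_ms=450))
```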
In the real-time transcription system based on the multi-language conference system, the data and the information in the conference can be acquired in real time through the following steps:
An image display control is displayed in real time in the graphical user interface of the site display, in which a conference QR code generated by the server according to the product serial number of the site end is shown; the conference QR code is used to access the address of a remote screen-casting interface, and the image display control floats over the graphical user interface. After a user scans the conference QR code with a mobile terminal, the mobile terminal displays the remote screen-casting interface on its own display. In response to the user's selection of a target language, the site end translates the target text information into a target translation through the server, the target translation being text written in the target language, and the mobile terminal displays the target translation in the remote screen-casting interface.
After the mobile terminal displays the target translation in the remote screen-casting interface, it determines, according to the network delay between the mobile terminal and the site end, which of the presentation, the voice information, the target text information and the target translation result it receives. When the network delay is less than or equal to a preset first threshold, the mobile terminal receives the presentation, the voice information, the target text information and the target translation result and displays them in the remote screen-casting interface. When the network delay is greater than the first threshold and less than or equal to a preset second threshold (the first threshold being smaller than the second), the mobile terminal receives the voice information, the target text information and the target translation result and displays them in the remote screen-casting interface. When the network delay is greater than the second threshold, the mobile terminal receives only the target text information and the target translation result and displays them in the remote screen-casting interface.
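A sketch of the threshold logic described above is shown below; the threshold values and names are assumptions, since the text only states that a first and a larger second threshold are preset.

```python
FIRST_THRESHOLD_MS = 200   # assumed preset first threshold
SECOND_THRESHOLD_MS = 800  # assumed preset second threshold (greater than the first)

def content_to_receive(delay_ms: float) -> list[str]:
    """Select which conference content the mobile terminal receives for the remote screen-casting interface."""
    if delay_ms <= FIRST_THRESHOLD_MS:
        return ["presentation", "voice", "target_text", "target_translation"]
    if delay_ms <= SECOND_THRESHOLD_MS:
        return ["voice", "target_text", "target_translation"]
    return ["target_text", "target_translation"]

if __name__ == "__main__":
    for delay in (100, 500, 1200):
        print(delay, content_to_receive(delay))
```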
In addition, the real-time transcription system based on the multi-language conference system is deployed in intelligent equipment at a site end, and in the system structure of the intelligent equipment, the functions of caption display are realized by a user interface layer, an interactive logic processing layer, a service logic layer, a service component layer and a data acquisition and processing layer:
the user interface layer is composed of a plurality of behaviors and is responsible for displaying the user interaction interface, receiving the operation of the user and feeding back the operation of the user in time; each page is an activity, the layout of each activity is configured by a specific layout file, and for different resolutions and screen sizes, multiple sets of layouts are set to ensure the normal display of the user interface at various resolutions and screen sizes.
An interactive logic processing layer for processing interactive events between users and behaviors, such as remote controller operations (including up key, down key, left key, right key, enter key, return key, menu key) and the like; according to different user operations, the user interface is controlled to have different changes, the operations are mapped into user behaviors, and different business operations are executed.
The business logic layer processes the user's operation behaviors; for example, when the user clicks the settings button, the interface switches to the settings page, and when the user clicks the return button, it switches back to the main interface. The business logic layer is composed of a plurality of sub-modules, including a voice recognition logic module, a machine translation logic module, a scene setting logic module, a video and audio setting logic module, a registration module, a login module, a conference summary module, and the like.
The service component layer processes instructions from the business logic layer and is composed of three sub-modules: a speech recognition engine layer, a machine translation engine layer and an interface open service layer. When the business logic layer notifies the service component layer to handle the user's action of starting voice recognition, the voice recognition sub-module first initializes the recognition engine currently selected by the user, sets parameters such as the user-selected language on the voice recognition engine, selects the corresponding voice recognition engine according to the language in use, and calls the engine's start-recognition interface; the voice recognition result or error information is returned to the business logic layer, which passes the information upwards layer by layer until the voice recognition result is finally displayed on the user interface layer.
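The following Java sketch illustrates this engine selection and initialization flow; the interface and class names are invented for the example, and the real engine layers would wrap vendor-specific SDKs.

import java.util.Map;

public class SpeechRecognitionComponent {

    // Minimal engine abstraction standing in for the speech recognition engine layer.
    public interface RecognitionEngine {
        void initialize(String languageCode);
        String startRecognition(byte[] audio) throws Exception;
    }

    // Result or error information passed back up to the business logic layer.
    public static final class RecognitionOutcome {
        public final String text;   // null when an error occurred
        public final String error;  // null on success
        RecognitionOutcome(String text, String error) { this.text = text; this.error = error; }
    }

    private final Map<String, RecognitionEngine> enginesByLanguage;

    public SpeechRecognitionComponent(Map<String, RecognitionEngine> enginesByLanguage) {
        this.enginesByLanguage = enginesByLanguage;
    }

    // Called when the business logic layer asks the component layer to start voice recognition.
    public RecognitionOutcome recognize(String languageCode, byte[] audio) {
        RecognitionEngine engine = enginesByLanguage.get(languageCode);
        if (engine == null) {
            return new RecognitionOutcome(null, "no recognition engine registered for " + languageCode);
        }
        try {
            engine.initialize(languageCode);  // set the user-selected language and other parameters
            return new RecognitionOutcome(engine.startRecognition(audio), null);
        } catch (Exception e) {
            return new RecognitionOutcome(null, e.getMessage());
        }
    }
}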
The data acquisition and processing layer consists of a networking framework, a JSON (JavaScript Object Notation) parsing framework and a message push framework. It is responsible for sending data requests to the server, processing the data returned by the server, and returning the structured data upwards layer by layer.
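Purely as an illustration of the networking part of this layer (the URL and error handling are assumptions), a request to the server could look like the Java sketch below, after which a JSON parsing framework would turn the returned body into structured objects for the upper layers.

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class DataAccessLayer {

    private final HttpClient httpClient = HttpClient.newHttpClient();

    // Sends a data request to the server and returns the raw JSON body.
    public String requestData(String url) throws Exception {
        HttpRequest request = HttpRequest.newBuilder(URI.create(url))
                .header("Accept", "application/json")
                .GET()
                .build();
        HttpResponse<String> response = httpClient.send(request, HttpResponse.BodyHandlers.ofString());
        if (response.statusCode() != 200) {
            throw new IllegalStateException("server returned status " + response.statusCode());
        }
        return response.body();
    }
}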
That is to say, the real-time transcription system based on the multi-language conference system is deployed in a client architecture at the site end; the client architecture comprises a user interface layer, a business logic layer, a basic component layer and a data link layer, wherein steps S101 and S102 are executed in the user interface layer; steps S201, S202, S203, S204 and S401 are executed in the business logic layer; and step S603 is executed in the data link layer.
Example two
Referring to the drawings, fig. 6 is a schematic structural diagram of a real-time transcription system based on a multi-language conference system according to the second embodiment of the present invention. As shown in fig. 6, the real-time transcription system based on the multi-language conference system according to the second embodiment of the present invention includes a server 601, a site end 602, a voice acquisition terminal 603, a site display 604 and a transcription client 605, where the site end 602 is connected to the transcription client 605, the voice acquisition terminal 603 and the site display 604 through a local area network, and the site end 602 is connected to the server 601 through a wide area network:
after a conference begins, the site end is used for displaying a presentation and a text display control in real time in a graphical user interface of the site display, wherein the text display control displays target text information corresponding to voice information acquired by the voice acquisition terminal and a target translation result of the target text information, and the text display control is suspended in the graphical user interface;
the site end is used for responding to an operation instruction of a user on the presentation and performing a corresponding operation on the presentation according to the operation instruction;
the site end is used for determining the contents of the target text information and the target translation result through the following steps:
the site end is used for acquiring voice information in real time through the voice acquisition terminal;
the site end is used for determining, according to the voice characteristics of each piece of voice information, the target role that speaks the voice information;
the site end is used for performing voice recognition on each piece of voice information through the server to obtain the target text information;
the site end is used for translating the target text information through the server to obtain the target translation result;
the transcription client is used for displaying the target text information, the target translation result and the target role corresponding to the target text information on a transcription display, and for broadcasting the target text information and the target translation result by voice through a loudspeaker of the transcription client;
the transcription client is used for modifying the target translation result in response to a modification instruction issued by a transcriber;
the transcription client is used for transmitting the modified target translation result to the site end, so that the site end updates the target translation result with the modified target translation result transmitted by the transcription client.
Example three
Based on the same application concept, referring to fig. 7, fig. 7 is a schematic structural diagram of a computer device according to the third embodiment of the present invention. As shown in fig. 7, the computer device 700 according to the third embodiment of the present invention includes:
a processor 701, a memory 702 and a bus 703, wherein the memory 702 stores machine-readable instructions executable by the processor 701; when the computer device 700 is running, the processor 701 and the memory 702 communicate with each other through the bus 703, and the machine-readable instructions, when executed by the processor 701, perform the steps of the real-time transcription method based on the multi-language conference system according to the first embodiment.
Example four
Based on the same application concept, the embodiments of the present application further provide a computer-readable storage medium on which a computer program is stored, and the computer program, when executed by a processor, performs the steps of the real-time transcription method based on the multi-language conference system described in any of the above embodiments.
It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the system and the apparatus described above may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
The computer program product for performing real-time transcription based on the multi-language conference system provided by the embodiment of the present invention includes a computer readable storage medium storing a program code, where instructions included in the program code may be used to execute the method described in the foregoing method embodiment, and specific implementation may refer to the method embodiment, which is not described herein again.
The real-time transcription system based on the multi-language conference system provided by the embodiment of the invention may be specific hardware on a device, or software or firmware installed on a device, and the like. The device provided by the embodiment of the present invention has the same implementation principle and technical effect as the foregoing method embodiments; for the sake of brevity, for any point not mentioned in the device embodiments, reference may be made to the corresponding contents in the method embodiments. It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the foregoing systems, apparatuses and units may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
In the embodiments provided in the present invention, it should be understood that the disclosed apparatus and method may be implemented in other ways. The above-described embodiments of the apparatus are merely illustrative, and for example, the division of the units is only one logical division, and there may be other divisions when actually implemented, and for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection of devices or units through some communication interfaces, and may be in an electrical, mechanical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments provided by the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit.
The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and other various media capable of storing program codes.
It should be noted that: like reference numbers and letters refer to like items in the following figures, and thus once an item is defined in one figure, it need not be further defined and explained in subsequent figures, and moreover, the terms "first", "second", "third", etc. are used merely to distinguish one description from another and are not to be construed as indicating or implying relative importance.
Finally, it should be noted that the above-mentioned embodiments are only specific embodiments of the present invention, which are used to illustrate the technical solutions of the present invention rather than to limit them, and the protection scope of the present invention is not limited thereto. Although the present invention has been described in detail with reference to the foregoing embodiments, those skilled in the art should understand that any person skilled in the art can still modify the technical solutions described in the foregoing embodiments, or readily conceive of changes to them, or make equivalent substitutions for some of their technical features, within the technical scope disclosed by the present invention; such modifications, changes or substitutions do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present invention, and shall all be covered within the protection scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (10)

1. A real-time transcription method based on a multi-language conference system, characterized in that the method is applied to a real-time transcription system based on the multi-language conference system, the real-time transcription system based on the multi-language conference system comprises a server, a site end, a voice acquisition terminal, a site display and a transcription client, the site end is respectively connected with the transcription client, the voice acquisition terminal and the site display through a local area network, and the site end is connected with the server through a wide area network, and the method comprises the following steps:
after a conference begins, displaying a presentation and a text display control in real time in a graphical user interface of the site display, wherein the text display control displays target text information corresponding to voice information acquired by the voice acquisition terminal and a target translation result of the target text information, and the text display control is suspended in the graphical user interface;
responding to an operation instruction of a user on the presentation, and performing a corresponding operation on the presentation according to the operation instruction;
determining the contents of the target text information and the target translation result by:
the site end collects voice information in real time through the voice acquisition terminal;
the site end determines, according to the voice characteristics of each piece of voice information, a target role that speaks the voice information;
the site end performs voice recognition on each piece of voice information through the server to obtain the target text information;
the site end translates the target text information through the server to obtain the target translation result;
the transcription client displays the target text information, the target translation result and the target role corresponding to the target text information on a transcription display, and broadcasts the target text information and the target translation result by voice through a loudspeaker of the transcription client;
the transcription client modifies the target translation result in response to a modification instruction issued by a transcriber;
and the transcription client transmits the modified target translation result to the site end, so that the site end updates the target translation result with the modified target translation result transmitted by the transcription client.
2. The method according to claim 1, wherein after the site end translates the target text information through the server to obtain the target translation result, the method further comprises:
the transcription client displays the presentation in real time in the middle of a graphical user interface of the transcription display, displays a voice playing control in real time in the upper part of the graphical user interface, displays a first text display control in real time on the left side of the graphical user interface, and displays a second text display control in real time on the right side of the graphical user interface, wherein the target text information is displayed in the first text display control, and the target translation result is displayed in the second text display control;
the transcription client segments the target text information according to a preset sentence-breaking rule to obtain at least one target sentence;
the transcription client displays each target sentence in a respective text display sub-control of at least one text display sub-control according to the position sequence of each target sentence in the target text information, wherein the text display control comprises the at least one text display sub-control arranged in sequence;
and the transcription client, in response to a dragging operation of the user on the voice playing control, highlights, in the first text display control, the characters in the displayed target text information that correspond to the voice playing control.
3. The method of claim 1, wherein after the transcription client transmits the modified target translation result to the site end, the method further comprises:
the site end updates the target translation result to be displayed by using the modified target translation result;
or,
the site end updates both the displayed target translation result and the target translation result to be displayed by using the modified target translation result.
4. The method of claim 1, wherein when the transcription client displays the target text information, the target translation result and the target role corresponding to the target text information on the transcription display, the method comprises:
the transcription client highlights, in corresponding areas of the transcription display, the uncommon words in the target text information;
and for each highlighted area, the transcription client, in response to a click operation of the transcriber on the highlighted area, displays the paraphrase content of the uncommon word in the highlighted area.
5. The method of claim 1, wherein before the transcription client displays the target text information, the target translation result and the target role corresponding to the target text information on the transcription display, the method further comprises:
after the transcriber inputs a site-end IP address into the transcription client, the transcription client determines, according to the IP address, the site end having that IP address from among the currently connectable terminals;
the transcription client sends a session request to the site end;
and the site end, in response to the session request sent by the transcription client, establishes a session with the transcription client so as to transmit, with the transcription client, the target text information, the target translation result and the target role corresponding to the target text information.
6. The method of claim 1, wherein when the transcription client broadcasts the target text information and the target translation result by voice through its loudspeaker, the method comprises:
the transcription client displays a graphic display control in the transcription display in real time, wherein a voice waveform diagram and a broadcast progress point for the current moment are displayed in the graphic display control;
the content of the voice waveform diagram and the position of the broadcast progress point are determined through the following steps:
the transcription client generates the voice waveform diagram of the voice information according to the decibel value of each character in the voice information acquired by the site end through the voice acquisition terminal;
and the transcription client determines the position, in the voice waveform diagram, of the broadcast progress point for the current moment according to the position, in the target text information, of the character currently being broadcast by voice.
7. The method according to claim 1, wherein after the target text information and the target translation result are broadcast by voice through a loudspeaker of the transcription client, the method further comprises:
the transcription client, in response to a target playback-speed broadcast instruction issued by the transcriber, broadcasts the target text information and the target translation result by voice at the target playback speed.
8. A real-time transcription system based on a multi-language conference system, characterized by comprising a server, a site end, a voice acquisition terminal, a site display and a transcription client, wherein the site end is respectively connected with the transcription client, the voice acquisition terminal and the site display through a local area network, and the site end is connected with the server through a wide area network:
after a conference begins, the site end is used for displaying a presentation and a text display control in real time in a graphical user interface of the site display, wherein the text display control displays target text information corresponding to voice information acquired by the voice acquisition terminal and a target translation result of the target text information, and the text display control is suspended in the graphical user interface;
the site end is used for responding to an operation instruction of a user on the presentation and performing a corresponding operation on the presentation according to the operation instruction;
the site end is used for determining the contents of the target text information and the target translation result through the following steps:
the site end is used for acquiring voice information in real time through the voice acquisition terminal;
the site end is used for determining, according to the voice characteristics of each piece of voice information, a target role that speaks the voice information;
the site end is used for performing voice recognition on each piece of voice information through the server to obtain the target text information;
the site end is used for translating the target text information through the server to obtain the target translation result;
the transcription client is used for displaying the target text information, the target translation result and the target role corresponding to the target text information on a transcription display, and for broadcasting the target text information and the target translation result by voice through a loudspeaker of the transcription client;
the transcription client is used for modifying the target translation result in response to a modification instruction issued by a transcriber;
and the transcription client is used for transmitting the modified target translation result to the site end, so that the site end updates the target translation result with the modified target translation result transmitted by the transcription client.
9. A computer device, comprising: a processor, a memory and a bus, wherein the memory stores machine-readable instructions executable by the processor, the processor and the memory communicate via the bus when the computer device is running, and the machine-readable instructions, when executed by the processor, perform the steps of the real-time transcription method based on the multi-language conference system according to any one of claims 1 to 7.
10. A computer-readable storage medium having stored thereon a computer program which, when executed by a processor, carries out the steps of the real-time transcription method based on the multi-language conference system according to any one of claims 1 to 7.
CN202210726391.XA 2022-06-23 2022-06-23 Real-time transcription method, system, equipment and medium based on multi-language conference system Pending CN114913857A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210726391.XA CN114913857A (en) 2022-06-23 2022-06-23 Real-time transcription method, system, equipment and medium based on multi-language conference system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210726391.XA CN114913857A (en) 2022-06-23 2022-06-23 Real-time transcription method, system, equipment and medium based on multi-language conference system

Publications (1)

Publication Number Publication Date
CN114913857A true CN114913857A (en) 2022-08-16

Family

ID=82772691

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210726391.XA Pending CN114913857A (en) 2022-06-23 2022-06-23 Real-time transcription method, system, equipment and medium based on multi-language conference system

Country Status (1)

Country Link
CN (1) CN114913857A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115662437A (en) * 2022-12-28 2023-01-31 广州市保伦电子有限公司 Voice transcription method under scene of simultaneous use of multiple microphones
CN115662437B (en) * 2022-12-28 2023-04-18 广东保伦电子股份有限公司 Voice transcription method under scene of simultaneous use of multiple microphones

Similar Documents

Publication Publication Date Title
US10872535B2 (en) Facilitating facial recognition, augmented reality, and virtual reality in online teaching groups
CN108028042B (en) Transcription of verbal communications
US11247134B2 (en) Message push method and apparatus, device, and storage medium
CN108847214B (en) Voice processing method, client, device, terminal, server and storage medium
US7124372B2 (en) Interactive communication between a plurality of users
JP2023539820A (en) Interactive information processing methods, devices, equipment, and media
CN106462573A (en) In-call translation
CN106648535A (en) Live client voice input method and terminal device
US20150334142A1 (en) Systems and methods for creating and publishing customizable images from within online events
CN106464768A (en) In-call translation
CN112423081B (en) Video data processing method, device and equipment and readable storage medium
CN110136032B (en) Classroom interaction data processing method based on courseware and computer storage medium
CN111294606B (en) Live broadcast processing method and device, live broadcast client and medium
US20220414349A1 (en) Systems, methods, and apparatus for determining an official transcription and speaker language from a plurality of transcripts of text in different languages
CN114913857A (en) Real-time transcription method, system, equipment and medium based on multi-language conference system
CN115278331B (en) Multi-language subtitle display method, system, equipment and medium based on machine translation
CN115048949A (en) Multilingual text replacement method, system, equipment and medium based on term base
CN111818279A (en) Subtitle generating method, display method and interaction method
EP1193685B1 (en) Information presentation
JP7417272B2 (en) Terminal device, server device, distribution method, learning device acquisition method, and program
WO2018046007A1 (en) Instant dynamic text inputting method, system, and device
CN106547731B (en) Method and device for speaking in live broadcast room
CN111249723B (en) Method, device, electronic equipment and storage medium for display control in game
US20130104165A1 (en) Method and apparatus for receiving augmented broadcasting content, method and apparatus for providing augmented content, and system for providing augmented content
US11017684B2 (en) Information processing apparatus, information processing system, and non-transitory computer readable medium

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination