CN112735430A - Multilingual online simultaneous interpretation system - Google Patents

Multilingual online simultaneous interpretation system

Publication number
CN112735430A
Authority
CN
China
Prior art keywords
audio
audience
original
target audio
video
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011582495.5A
Other languages
Chinese (zh)
Inventor
彭川
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Transn Iol Technology Co ltd
Original Assignee
Transn Iol Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Transn Iol Technology Co ltd filed Critical Transn Iol Technology Co ltd
Priority to CN202011582495.5A priority Critical patent/CN112735430A/en
Publication of CN112735430A publication Critical patent/CN112735430A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/26 Speech to text systems
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/005 Language recognition
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43 Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/4302 Content synchronisation processes, e.g. decoder synchronisation
    • H04N21/4307 Synchronising the rendering of multiple content streams or additional data on devices, e.g. synchronisation of audio on a mobile phone with the video output on the TV screen
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/47 End-user applications
    • H04N21/488 Data services, e.g. news ticker
    • H04N21/4884 Data services, e.g. news ticker for displaying subtitles
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N7/00 Television systems
    • H04N7/14 Systems for two-way working
    • H04N7/15 Conference systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)

Abstract

The invention provides a multilingual online simultaneous interpretation system comprising an audio and video acquisition end, a voice recognition end, an interpreter end and an audience end. The audio and video acquisition end is used for acquiring an original video of a speaker's speech and sending the original video to the interpreter end and the audience end. The interpreter end is used for recording a target audio interpreted by an interpreter from the original video and sending the target audio to the voice recognition end and the audience end, wherein the language selected by the audience at the audience end is the same as the language of the target audio. The voice recognition end is used for recognizing the target audio, obtaining the text corresponding to the target audio and sending that text to the audience end. The audience end is used for playing the target audio and the original video and displaying the text corresponding to the target audio. The invention enables interpreters to perform simultaneous interpretation remotely, saving time and cost and improving conference quality.

Description

Multilingual online simultaneous interpretation system
Technical Field
The invention relates to the technical field of simultaneous interpretation, in particular to a multilingual online simultaneous interpretation system.
Background
With the development of science, technology and society, the demand for translation is increasing, especially the demand for simultaneous interpretation. First, in a tightly scheduled conference, simultaneous interpretation saves time and improves efficiency. Second, some conferences involve more than two foreign languages; in that case consecutive interpretation is clearly impractical and relay simultaneous interpretation is needed.
Simultaneous interpretation is the most difficult of all interpretation activities and is currently a widely used mode of interpretation. Its defining feature is that the speaker speaks continuously while the interpreter interprets while listening; the average lag between the original speech and the interpreted speech is three to four seconds, and at most a little over ten seconds. Listening, watching, note-taking and speaking take place almost at the same time, and the interpreter has only the brief gap between two adjacent sentences of the speaker to complete the interpretation, so the demands on the practitioner are very high.
At present, in a conference that needs simultaneous interpretation, the interpreter must be present at the conference site to deliver high-quality simultaneous interpretation, and usually has to arrive in the city where the conference is held one day in advance to prepare, which incurs a high cost.
Disclosure of Invention
The invention provides a multilingual online simultaneous interpretation system that addresses the drawbacks of the prior art, in which a simultaneous interpreter has to travel to the conference site and prepare in advance, which is time-consuming, laborious and costly. With the invention, the interpreter can perform simultaneous interpretation remotely.
The invention provides a multilingual online simultaneous interpretation system, which comprises an audio and video acquisition end, a voice recognition end, an interpreter end and an audience end;
the audio and video acquisition end is used for acquiring an original video of a speaker's speech and sending the original video to the interpreter end and the audience end;
the interpreter end is used for recording a target audio interpreted by an interpreter from the original video and sending the target audio to the voice recognition end and the audience end; wherein the language selected by the audience at the audience end is the same as the language of the target audio;
the voice recognition end is used for recognizing the target audio, obtaining the text corresponding to the target audio and sending the text corresponding to the target audio to the audience end;
and the audience end is used for playing the target audio and the original video and displaying the text corresponding to the target audio.
According to the multilingual online simultaneous interpretation system provided by the invention, the audio and video acquisition end is further used for acquiring the original audio of the speaker's speech and sending the original audio to the voice recognition end;
the voice recognition end is further used for recognizing the original audio, obtaining the text corresponding to the original audio and sending the text corresponding to the original audio to the audience end;
and the audience end is used for displaying the text corresponding to the original audio.
According to the multilingual online simultaneous interpretation system provided by the invention, the audio and video acquisition end is further used for acquiring the original audio of the speaker's speech and sending the original audio to the audience end;
and the audience end is used for playing the original audio.
According to the multilingual online simultaneous interpretation system provided by the invention, there are a plurality of interpreter ends, and each interpreter end is used for recording the target audio that its interpreter produces from a corresponding segment of the original video; the target audio of all segments recorded at the interpreter ends together forms the target audio of the whole original video.
According to the multilingual online simultaneous interpretation system provided by the invention, the audio and video acquisition end is further used for obtaining the language of the speaker's speech as input by a user, and for switching the language in which the audio and video acquisition end acquires the original video according to that language.
According to the multilingual online simultaneous interpretation system provided by the invention, the voice recognition end is further used for uploading the text corresponding to the target audio and the text corresponding to the original audio to a cloud, so that the audience end can obtain both texts from the cloud and display them.
According to the multilingual online simultaneous interpretation system provided by the invention, the audio and video acquisition end acquires the original video of the speaker's speech and sends it to the interpreter end, and the interpreter interprets the original video at the remote interpreter end and records the interpreted audio, so that the interpreter can perform simultaneous interpretation remotely, saving time and cost. Through the audience end, the audience can hear the interpreted audio and also see its subtitles, which reduces the barrier felt by audience members who do not understand the foreign language, improves conference quality, and requires no additional equipment to be worn at the conference site.
Drawings
In order to more clearly illustrate the technical solutions of the present invention or the prior art, the drawings needed for the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and those skilled in the art can also obtain other drawings according to the drawings without creative efforts.
FIG. 1 is a schematic diagram of a framework of a multilingual online simultaneous interpretation system provided by the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention clearer, the technical solutions of the present invention will be clearly and completely described below with reference to the accompanying drawings, and it is obvious that the described embodiments are some, but not all embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The multilingual online simultaneous interpretation system of the present invention is described below with reference to FIG. 1. The system includes an audio and video acquisition end, a voice recognition end, an interpreter end and an audience end. The audio and video acquisition end is used for acquiring an original video of a speaker's speech and sending the original video to the interpreter end and the audience end.
The multilingual online simultaneous interpretation system in this embodiment comprises four ends: the audio and video acquisition end, the voice recognition end, the interpreter end and the audience end. The speaker's speech may take place in an on-site conference, an online conference, a live webcast or a similar setting that requires simultaneous interpretation.
The audio and video acquisition end runs on Windows and is mainly responsible for acquiring audio and video. It transmits the acquired audio and video data to the interpreter end and the audience end through a live stream. The audio and video acquisition end is the starting point of the whole multilingual online simultaneous interpretation system. The video acquired by the audio and video acquisition end is taken as the original video, and the audio it acquires is taken as the original audio.
The interpreter end is used for recording a target audio interpreted by an interpreter from the original video and sending the target audio to the voice recognition end and the audience end, wherein the language selected by the audience at the audience end is the same as the language of the target audio.
The interpreter end is available in Windows and Mac versions and is mainly used for recording and sending the target audio that the interpreter produces from the original video. The interpreter can work normally anywhere a computer is available, without travelling to the conference site. The interpreter can see or hear the conference site, or see a remotely shared screen, with low delay, which assists the interpretation work. Conventional simultaneous interpretation equipment is not required.
The target audio is audio in the language the audience requires. Because different audience members understand different languages, each audience member can select the required language at the audience end. Target audio in one or more languages is recorded at the interpreter end as needed. The target audio in the corresponding language is then sent to each audience end according to that audience member's selection. This embodiment thus enables multilingual simultaneous interpretation.
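The language selection described above can be sketched as a small routing table. This is a minimal illustration under stated assumptions, not the patented implementation: the class `AudioRouter`, its method names and the language codes are hypothetical, and the method returns the deliveries instead of performing any network transfer.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class AudioRouter:
    """Hypothetical router that forwards target audio by selected language."""

    # language code -> ids of the audience ends that selected that language
    subscribers: dict = field(default_factory=lambda: defaultdict(set))

    def select_language(self, audience_id: str, language: str) -> None:
        # An audience end listens to one language at a time,
        # so any earlier selection is dropped first.
        for ids in self.subscribers.values():
            ids.discard(audience_id)
        self.subscribers[language].add(audience_id)

    def route(self, language: str, chunk: bytes) -> dict:
        # Only audience ends whose selected language matches the language
        # of the target audio receive the chunk.
        return {aid: chunk for aid in sorted(self.subscribers.get(language, ()))}
```

For example, after `select_language("a1", "en")` and `select_language("a2", "fr")`, a call to `route("en", chunk)` delivers the chunk to `a1` only.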
The voice recognition end is used for recognizing the target audio, obtaining the text corresponding to the target audio and sending the text corresponding to the target audio to the audience end.
The voice recognition end runs on Android, is mainly responsible for recognizing audio, and pushes the recognized text to the audience end through Instant Messaging (IM).
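One way to picture the text pushed over IM is as a small serialized message. The source does not specify a message format, so the `SubtitleMessage` fields and the JSON encoding below are assumptions made purely for illustration.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class SubtitleMessage:
    """Hypothetical payload for one recognized subtitle line."""

    language: str   # language of the recognized audio, e.g. "en"
    source: str     # "target" (interpreter audio) or "original" (speaker audio)
    sequence: int   # increasing counter so the receiver can order lines
    text: str       # the recognized text


def encode(message: SubtitleMessage) -> str:
    # ensure_ascii=False keeps non-Latin subtitle text readable in the payload.
    return json.dumps(asdict(message), ensure_ascii=False)


def decode(payload: str) -> SubtitleMessage:
    return SubtitleMessage(**json.loads(payload))
```

The `sequence` field is a design choice of this sketch: it lets an audience end discard lines that arrive late or twice.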
The audience end is used for playing the target audio and the original video and displaying the text corresponding to the target audio.
The audience end is an application (APP) used by the end user. Through the APP, audience members can hear the target audio in the language they selected, see the text of the target audio, and watch the original video of the speaker's speech.
In this embodiment, the audio and video acquisition end acquires the original video of the speaker's speech and sends it to the interpreter end, and the interpreter interprets the original video at the remote interpreter end and records the interpreted audio, so that the interpreter can perform simultaneous interpretation remotely, saving time and cost. Through the audience end, the audience can hear the interpreted audio and also see its subtitles, which reduces the barrier felt by audience members who do not understand the foreign language, improves conference quality, and requires no additional equipment to be worn at the conference site.
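The data flow of this embodiment can be sketched as follows. This is a minimal illustration, not the patented implementation: the function `deliver_to_audience` and the `interpret` and `recognize` callables are hypothetical stand-ins for the interpreter end and the voice recognition end, and plain strings stand in for media streams.

```python
from typing import Callable, Dict


def deliver_to_audience(
    original_video: str,
    interpret: Callable[[str], str],
    recognize: Callable[[str], str],
) -> Dict[str, str]:
    """Trace one piece of content through the four ends (hypothetical sketch)."""
    # Acquisition end: the original video goes to the interpreter end
    # and, unchanged, to the audience end.
    target_audio = interpret(original_video)  # interpreter end records target audio
    subtitle = recognize(target_audio)        # voice recognition end produces text
    # Audience end: plays the video and the target audio, displays the text.
    return {"video": original_video, "audio": target_audio, "subtitle": subtitle}
```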
On the basis of the above embodiment, in this embodiment the audio and video acquisition end is further used for acquiring the original audio of the speaker's speech and sending the original audio to the voice recognition end; the voice recognition end is further used for recognizing the original audio, obtaining the text corresponding to the original audio and sending the text corresponding to the original audio to the audience end; and the audience end is used for displaying the text corresponding to the original audio.
Specifically, both the interpreted target audio and the acquired original audio are sent to the voice recognition end, which recognizes the corresponding text. The audience end displays the text of the original audio alongside the text of the target audio, providing real-time subtitles for both.
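A sketch of how an audience end might keep the two subtitle tracks in step is given below. The class `SubtitleView`, the track names and the sequence numbers are assumptions of this example; the source only states that both texts are displayed in real time.

```python
from dataclasses import dataclass, field


@dataclass
class SubtitleView:
    """Hypothetical audience-end view holding the newest line per track."""

    # track name -> (sequence, text) of the newest line seen on that track
    latest: dict = field(default_factory=dict)

    def update(self, track: str, sequence: int, text: str) -> bool:
        current = self.latest.get(track)
        if current is not None and sequence <= current[0]:
            return False  # stale or duplicate line, keep what is shown
        self.latest[track] = (sequence, text)
        return True

    def render(self) -> str:
        # Original-language line first, interpreted line below it.
        shown = [self.latest[t][1] for t in ("original", "target") if t in self.latest]
        return "\n".join(shown)
```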
On the basis of the above embodiment, the audio and video acquisition end in this embodiment is further used for acquiring the original audio of the speaker's speech and sending the original audio to the audience end; and the audience end is used for playing the original audio.
Specifically, the original video or the original audio is sent to the audience end, so that audience members can remotely watch the original video or hear the original audio of the conference through the audience end.
On the basis of the above embodiments, in this embodiment there are a plurality of interpreter ends, and each interpreter end is used for recording the target audio that its interpreter produces from a corresponding segment of the original video; the target audio of all segments recorded at the interpreter ends together forms the target audio of the whole original video.
Specifically, because simultaneous interpretation is highly demanding work, several interpreters work together online through their respective interpreter ends when the original video is interpreted into target audio of a given language. For example, two interpreters take turns interpreting.
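The idea that segments recorded by several interpreters together form the target audio of the whole video can be sketched as below. The `Segment` type, the use of start and end times in seconds, and the overlap check are assumptions of this example; the source does not say how segments are delimited or joined.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Segment:
    """Hypothetical target-audio segment recorded at one interpreter end."""

    interpreter_id: str
    start_s: float  # start of the covered span of the original video
    end_s: float    # end of the covered span
    audio: bytes


def assemble(segments: Iterable[Segment]) -> bytes:
    """Join segments in time order into one target-audio track."""
    ordered = sorted(segments, key=lambda s: s.start_s)
    for prev, nxt in zip(ordered, ordered[1:]):
        # Interpreters take turns, so two segments should never overlap.
        if nxt.start_s < prev.end_s:
            raise ValueError(
                f"segments from {prev.interpreter_id} and {nxt.interpreter_id} overlap"
            )
    return b"".join(s.audio for s in ordered)
```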
On the basis of the foregoing embodiments, in this embodiment the audio and video acquisition end is further used for obtaining the language of the speaker's speech as input by a user, and for switching the language in which the audio and video acquisition end acquires the original video according to that language.
Specifically, the audio and video acquisition end cannot identify the language used on site by itself. Therefore, the language used on site is determined manually and then entered. The audio and video acquisition end is then switched to that language for audio and video acquisition.
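The manual language switch can be sketched as follows. The source does not say what switching involves internally; this example assumes, purely for illustration, that the acquisition end tags each outgoing audio chunk with the entered language. The class `AcquisitionEnd`, its methods and the supported language codes are all hypothetical.

```python
class AcquisitionEnd:
    """Hypothetical acquisition end that tags outgoing audio with a language."""

    # Assumed set of languages; the source does not list any.
    SUPPORTED = frozenset({"zh", "en", "fr"})

    def __init__(self, language: str = "zh") -> None:
        self.language = language

    def set_source_language(self, language: str) -> None:
        # The language is entered by a person; it is not detected automatically.
        if language not in self.SUPPORTED:
            raise ValueError(f"unsupported language: {language}")
        self.language = language

    def capture(self, audio: bytes) -> dict:
        # Tag each chunk so downstream ends know which language was spoken.
        return {"language": self.language, "audio": audio}
```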
On the basis of the foregoing embodiment, in this embodiment the voice recognition end is further used for uploading the text corresponding to the target audio and the text corresponding to the original audio to a cloud, so that the audience end can obtain both texts from the cloud and display them.
Specifically, this embodiment synchronously uploads the real-time subtitles of the conference to the cloud. After the conference ends, all text corresponding to the audio of the speaker and the interpreters in that conference can be reviewed, so that a conference summary can be generated quickly.
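The cloud-side transcript can be sketched with an in-memory stand-in. The class `TranscriptStore`, its method names and the line format are assumptions of this example; a real deployment would use an actual cloud storage service, which the source does not name.

```python
class TranscriptStore:
    """Hypothetical in-memory stand-in for the cloud subtitle store."""

    def __init__(self) -> None:
        self._lines = []  # (timestamp in seconds, role, text)

    def upload(self, timestamp_s: float, role: str, text: str) -> None:
        # role is "speaker" for original-audio text, "interpreter" for target-audio text.
        self._lines.append((timestamp_s, role, text))

    def full_transcript(self) -> str:
        """Return every uploaded line in time order, for post-conference review."""
        rows = []
        # sorted() is stable, so lines with equal timestamps keep upload order.
        for timestamp_s, role, text in sorted(self._lines, key=lambda line: line[0]):
            minutes, seconds = divmod(int(timestamp_s), 60)
            rows.append(f"[{minutes:02d}:{seconds:02d}] {role}: {text}")
        return "\n".join(rows)
```

The ordered transcript returned by `full_transcript` is the kind of input from which a conference summary could then be drafted.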
Finally, it should be noted that: the above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims (6)

1. A multilingual online simultaneous interpretation system, characterized by comprising an audio and video acquisition end, a voice recognition end, an interpreter end and an audience end;
the audio and video acquisition end is used for acquiring an original video of a speaker's speech and sending the original video to the interpreter end and the audience end;
the interpreter end is used for recording a target audio interpreted by an interpreter from the original video and sending the target audio to the voice recognition end and the audience end; wherein the language selected by the audience at the audience end is the same as the language of the target audio;
the voice recognition end is used for recognizing the target audio, obtaining the text corresponding to the target audio and sending the text corresponding to the target audio to the audience end;
and the audience end is used for playing the target audio and the original video and displaying the text corresponding to the target audio.
2. The multilingual online simultaneous interpretation system according to claim 1, wherein the audio and video acquisition end is further used for acquiring an original audio of the speaker's speech and sending the original audio to the voice recognition end;
the voice recognition end is further used for recognizing the original audio, obtaining the text corresponding to the original audio and sending the text corresponding to the original audio to the audience end;
and the audience end is used for displaying the text corresponding to the original audio.
3. The multilingual online simultaneous interpretation system according to claim 1, wherein the audio and video acquisition end is further used for acquiring an original audio of the speaker's speech and sending the original audio to the audience end;
and the audience end is used for playing the original audio.
4. The multilingual online simultaneous interpretation system according to any one of claims 1 to 3, wherein there are a plurality of said interpreter ends, each of said interpreter ends being used for recording the target audio that its interpreter produces from a corresponding segment of said original video; and the target audio of all segments recorded at the interpreter ends together forms the target audio of the whole original video.
5. The multilingual online simultaneous interpretation system according to any one of claims 1 to 3, wherein the audio and video acquisition end is further used for obtaining the language of the speaker's speech as input by a user, and for switching the language in which the audio and video acquisition end acquires the original video according to that language.
6. The multilingual online simultaneous interpretation system according to claim 2, wherein the voice recognition end is further used for uploading the text corresponding to the target audio and the text corresponding to the original audio to a cloud, so that the audience end can obtain the text corresponding to the target audio and the text corresponding to the original audio from the cloud and display them.
CN202011582495.5A 2020-12-28 2020-12-28 Multilingual online simultaneous interpretation system Pending CN112735430A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011582495.5A CN112735430A (en) 2020-12-28 2020-12-28 Multilingual online simultaneous interpretation system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011582495.5A CN112735430A (en) 2020-12-28 2020-12-28 Multilingual online simultaneous interpretation system

Publications (1)

Publication Number Publication Date
CN112735430A true CN112735430A (en) 2021-04-30

Family

ID=75606597

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011582495.5A Pending CN112735430A (en) 2020-12-28 2020-12-28 Multilingual online simultaneous interpretation system

Country Status (1)

Country Link
CN (1) CN112735430A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115460371A (en) * 2021-06-09 2022-12-09 苏州译牛智能科技有限公司 Simultaneous interpretation method in video conference, server and readable storage medium
WO2023219556A1 (en) * 2022-05-13 2023-11-16 Song Peng A system and method to manage a plurality of language audio streams

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108650484A (en) * 2018-06-29 2018-10-12 中译语通科技股份有限公司 A kind of method and device of the remote synchronous translation based on audio/video communication
CN110677406A (en) * 2019-09-26 2020-01-10 上海译牛科技有限公司 Simultaneous interpretation method and system based on network

Similar Documents

Publication Publication Date Title
US10176366B1 (en) Video relay service, communication system, and related methods for performing artificial intelligence sign language translation services in a video relay service environment
JP5564459B2 (en) Method and system for adding translation to a video conference
CN102006453B (en) Superposition method and device for auxiliary information of video signals
US20200294525A1 (en) Generating visual closed caption for sign language
CN111739553A (en) Conference sound acquisition method, conference recording method, conference record presentation method and device
CN110166729B (en) Cloud video conference method, device, system, medium and computing equipment
CN112653902B (en) Speaker recognition method and device and electronic equipment
CN112601101B (en) Subtitle display method and device, electronic equipment and storage medium
CN112735430A (en) Multilingual online simultaneous interpretation system
CN113225577B (en) Live stream processing method, device and system, electronic equipment and storage medium
CN110933485A (en) Video subtitle generating method, system, device and storage medium
CN112601102A (en) Method and device for determining simultaneous interpretation of subtitles, electronic equipment and storage medium
CN112764549B (en) Translation method, translation device, translation medium and near-to-eye display equipment
CN207854084U (en) A kind of caption display system
CN109743529A (en) A kind of Multifunctional video conferencing system
CN112738446B (en) Simultaneous interpretation method and system based on online conference
US20240233745A1 (en) Performing artificial intelligence sign language translation services in a video relay service environment
US20220264193A1 (en) Program production apparatus, program production method, and recording medium
CN107566863A (en) A kind of exchange of information methods of exhibiting, device and equipment, set top box
CN109905756B (en) Television caption dynamic generation method based on artificial intelligence and related equipment
CN116527840A (en) Live conference intelligent subtitle display method and system based on cloud edge collaboration
CN111160051A (en) Data processing method and device, electronic equipment and storage medium
CN115412702A (en) Conference terminal and video wall integrated equipment and system
CN115643424A (en) Live broadcast data processing method and system
CN205726152U (en) A kind of internet video interaction systems

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20210430