CN116933806A - Concurrent translation system and concurrent translation terminal

Concurrent translation system and concurrent translation terminal

Info

Publication number: CN116933806A
Application number: CN202311024945.2A
Authority: CN (China)
Prior art keywords: translation, quality, module, voice, text
Priority / filing date: 2023-08-15
Publication date: 2023-10-24
Legal status: Pending
Other languages: Chinese (zh)
Inventors: 黄发洋, 李艳雄, 席艺涵
Current and original assignee: Ningbo Yilian Technology Co., Ltd.
Application filed by Ningbo Yilian Technology Co., Ltd.
Classifications

    • G06F 40/58 — Use of machine translation, e.g. for multi-lingual retrieval, for server-side translation for client devices or for real-time translation
    • G06F 40/194 — Calculation of difference between files
    • G06F 40/295 — Named entity recognition
    • G06F 40/30 — Semantic analysis
    • G10L 13/08 — Text analysis or generation of parameters for speech synthesis out of text, e.g. grapheme-to-phoneme translation, prosody generation or stress or intonation determination
    • G10L 25/51 — Speech or voice analysis techniques specially adapted for comparison or discrimination
    • H04L 43/0829 — Monitoring or testing based on specific metrics: packet loss


Abstract

The application discloses a concurrent (simultaneous interpretation) translation system and a concurrent translation terminal in the technical field of translation systems. A quality evaluation module comprehensively analyzes understanding data and translation data based on a quality analysis model and evaluates whether the current translation quality is qualified. When the evaluation result is that the current translation quality is unqualified, a regulation and control module wakes a secondary optimization module, which selects other translators to retranslate the acquired content; if the retranslation quality is unqualified more than twice in a row, the speaker is prompted to re-input the voice, and once the translation quality is qualified the translation data are sent to a text synthesis module. When the evaluation result is that the current translation quality is qualified, the regulation and control module sends the translation data to the text synthesis module directly. When the translation system translates a video conference, the translation quality can thus be evaluated and handled in real time, effectively ensuring translation accuracy and the stable progress of the conference.

Description

Concurrent translation system and concurrent translation terminal
Technical Field
The application relates to the technical field of translation systems, in particular to a concurrent translation system and a concurrent translation terminal.
Background
The simultaneous interpretation system, also called a "simultaneous transmission" system, is a technical tool designed for real-time interpretation. Its aim is to achieve instant and accurate language translation on occasions such as lectures, conferences and negotiations, so that people with different language backgrounds can communicate effectively and understand each other.
By the middle of the 20th century, multilingual communication had become increasingly common at international conferences and other events, and language barriers had become a significant problem: traditional consecutive interpretation takes a long time and easily interrupts communication. Simultaneous interpretation systems emerged to solve this problem.
The prior art has the following defect:
when an emergency occurs (for example a serious corporate incident) and an enterprise holds an emergency video conference, interpreters cannot always be arranged in time, so the conference must rely on a concurrent translation system for auxiliary translation. However, a conventional concurrent translation system performing real-time video-conference translation does not evaluate the translation quality, which easily leads to inaccurate or wrong translation results and affects the conference.
Disclosure of Invention
The application aims to provide a concurrent translation system and a concurrent translation terminal that remedy the defect described in the background art.
To achieve the above object, the present application provides the following technical solution: a concurrent translation system comprising a voice input module, a voice recognition module, a semantic understanding module, a translation module, a quality evaluation module, a regulation and control module, a secondary optimization module, a text synthesis module, a voice output module and a user interface module;
voice input module: converts the speech input of the speaker into digitized voice data;
voice recognition module: converts the voice data into text form;
semantic understanding module: performs semantic analysis and understanding on the recognized text to obtain the speaker's intention and expressed content;
translation module: converts the text of the source language into the text of the target language, performing the language translation;
quality evaluation module: comprehensively analyzes the understanding data and the translation data based on a quality analysis model and evaluates whether the current translation quality is qualified;
regulation and control module: wakes the secondary optimization module when the evaluation result is that the current translation quality is unqualified, and sends the translation data to the text synthesis module when it is qualified;
secondary optimization module: retranslates the acquired content; if the retranslation quality is unqualified more than twice in a row, prompts the speaker to re-input the voice, and once the translation quality is qualified sends the translation data to the text synthesis module;
text synthesis module: converts the translated target-language text into voice data;
voice output module: delivers the synthesized voice data to the listeners through an audio output device;
user interface module: displays prompt information to the user.
Preferably, the understanding data comprises a voice correct recognition rate, and the translation data comprises a translation result similarity index, word level matching degree and a network packet loss rate during translation.
Preferably, establishing the quality analysis model includes the following steps:
the quality coefficient zlx is obtained by comprehensively calculating the voice correct recognition rate, the translation result similarity index, the word level matching degree and the network packet loss rate during translation,
where zqy is the voice correct recognition rate, xsf is the translation result similarity index, jpf is the word level matching degree, dbw is the network packet loss rate during translation, and α, β, γ and δ are the respective proportionality coefficients of these four quantities, all greater than 0;
after the value of the quality coefficient zlx is obtained, it is compared with a quality threshold, which completes the establishment of the quality analysis model.
Preferably, after the quality evaluation module obtains the voice correct recognition rate, the translation result similarity index, the word level matching degree and the network packet loss rate during translation, evaluating whether the current translation quality is qualified by analyzing these four quantities with the quality analysis model includes the following steps:
substituting the voice correct recognition rate, the translation result similarity index, the word level matching degree and the network packet loss rate during translation into the quality coefficient formula to calculate the quality coefficient zlx;
if the quality coefficient zlx is greater than or equal to the quality threshold, evaluating the current translation quality as qualified;
if the quality coefficient zlx is less than the quality threshold, evaluating the current translation quality as unqualified.
Preferably, the voice correct recognition rate zqy is calculated from zq, the number of correctly recognized words, lj, the number of words recognized through speech understanding, and cw, the number of unrecognizable words.
Preferably, the translation result similarity index is calculated as:
xsf = (1/N) * Σ_{i=1..N} [ (1/M) * Σ_{j=1..M} sim(g_i, g'_j) ]
where g_i is the i-th n-gram in the candidate text c, g'_j is the j-th n-gram in a reference text, sim(g_i, g'_j) is the similarity between the two n-grams (typically calculated using BLEU or another n-gram similarity measure), M is the number of reference texts and N is the total number of n-grams in the candidate text.
Preferably, the word level matching degree is calculated by the following expression:
jpf=(1-τ)*P+τ*R*F
where P represents the exact match rate, R represents the recall rate, F represents the F1 score, and τ is a parameter that balances the exact match rate and the recall rate.
Preferably, the calculation expression of the F1 score F is:
F = 2 * P * R / (P + R)
where P is the exact match rate and R is the recall rate; the exact match rate is the ratio of the number of correctly matched words in the machine translation result to the total number of words in the machine translation result, and the recall rate is the ratio of the number of correctly matched words in the machine translation result to the total number of words in the reference translation.
Preferably, the calculation expression of the network packet loss rate during translation is:
dbw = dsb / zfb
where dsb is the number of lost data packets and zfb is the total number of data packets sent.
The application also provides a concurrent translation terminal that runs the concurrent translation system.
The technical effects and advantages of the application are as follows:
1. The semantic understanding module performs semantic analysis and understanding on the recognized text to obtain the speaker's intention and expressed content, and the translation module converts the text of the source language into the text of the target language, realizing the language translation function. The quality evaluation module comprehensively analyzes the understanding data and the translation data based on the quality analysis model and evaluates whether the current translation quality is qualified. When the evaluation result is that the current translation quality is unqualified, the regulation and control module wakes the secondary optimization module, which selects other translators to retranslate the acquired content; if the retranslation quality is unqualified more than twice in a row, the speaker is prompted to re-input the voice, and once the translation quality is qualified the translation data are sent to the text synthesis module. When the evaluation result is that the current translation quality is qualified, the regulation and control module sends the translation data to the text synthesis module directly. When the translation system translates a video conference, it can thus evaluate and handle the translation quality in real time, effectively ensuring translation accuracy and the stable progress of the conference.
Drawings
To illustrate the embodiments of the present application or the technical solutions of the prior art more clearly, the drawings required for the embodiments are briefly described below. The drawings in the following description show only some embodiments of the present application; a person of ordinary skill in the art may obtain other drawings from them.
FIG. 1 is a block diagram of a system according to the present application.
Detailed Description
To make the objects, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions are described below clearly and completely with reference to the accompanying drawings. The described embodiments are only some, not all, of the embodiments of the present application. All other embodiments obtained by those skilled in the art on the basis of these embodiments without inventive effort fall within the scope of the application.
Example 1: referring to fig. 1, the concurrent translation system of this embodiment includes a voice input module, a voice recognition module, a semantic understanding module, a translation module, a quality evaluation module, a regulation and control module, a secondary optimization module, a text synthesis module, a voice output module and a user interface module;
voice input module: this module converts the speech input of the speaker into digitized voice data, typically through a microphone or another voice input device, and sends the data to the voice recognition module.
Voice recognition module: this module converts the voice data into text form, i.e. converts the speaker's voice data into the corresponding characters, and sends the recognized text to the semantic understanding module;
preprocessing: the collected audio signal may contain noise, echo and other interference, so preprocessing such as denoising and audio enhancement is needed to improve the accuracy of voice recognition;
feature extraction: converting the audio signal into a mathematical feature representation is a key step of speech recognition; the audio signal is typically converted into a sequence of feature vectors using techniques such as mel-frequency cepstral coefficients (MFCCs);
acoustic model: the acoustic model is an important component of speech recognition; it is a trained model that maps feature vectors to text units at the phoneme or subword level; common acoustic models include hidden Markov models (HMMs) and deep learning models such as convolutional neural networks (CNNs) and recurrent neural networks (RNNs);
decoding: in the decoding stage, the speech recognition system searches for the most probable text sequence using the probability distribution generated by the acoustic model with the assistance of a language model; the decoding process typically uses techniques such as the Viterbi algorithm;
post-processing: the decoded text sequence may contain erroneous or unnatural parts, so a post-processing step such as spelling correction or grammar correction can further improve the recognition result;
text output: finally, the voice recognition module outputs the decoded text sequence as the textual representation of the speaker's speech content (a feature-extraction sketch follows this list).
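The following is a minimal sketch of the feature-extraction step only, assuming the librosa library (not named in the original); the file name and parameter values are illustrative.

    # Minimal sketch of the MFCC feature-extraction step; librosa is an
    # assumed stand-in, and the file name and parameters are illustrative.
    import librosa

    def extract_mfcc(audio_path: str, n_mfcc: int = 13):
        # Load the (possibly preprocessed) audio; sr=16000 resamples to a
        # rate commonly used for speech recognition.
        signal, sample_rate = librosa.load(audio_path, sr=16000)
        # Convert the signal into a sequence of MFCC feature vectors, the
        # representation handed to the acoustic model.
        mfcc = librosa.feature.mfcc(y=signal, sr=sample_rate, n_mfcc=n_mfcc)
        return mfcc.T  # shape (frames, n_mfcc): one feature vector per frame

    features = extract_mfcc("speaker_input.wav")  # hypothetical file name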
Semantic understanding module: this module performs semantic analysis and understanding on the recognized text to obtain the speaker's intention and expressed content; the acquired content is sent to the translation module and the secondary optimization module, and the understanding data is sent to the quality evaluation module;
lexical analysis: divide the recognized text into words or phrases and determine each word's part of speech, morphology and basic attributes;
syntactic analysis: analyze the sentence structure and determine the grammatical relations and hierarchical structure among the words; this helps to understand subject-predicate relations, modifier relations and so on;
semantic role labeling: mark each word in a sentence with a semantic role, such as subject, action or object; this helps capture the semantic relations and logical structure of the sentence;
named entity recognition: identify named entities in the text, such as person names, place names and organization names, to help understand the specific information in the sentence;
dependency analysis: analyze the dependency relations between words and determine how each word relates to the other words in the sentence; this helps to understand the structure and meaning of the sentence;
semantic parsing: convert the sentence into a semantic representation, modeling the relations between the words and the sentence meaning; this helps capture the semantic information of the sentence;
intent analysis: infer the speaker's intention and purpose from the semantic representation of the sentence; this may involve operations, actions or requests extracted from the sentence;
emotion analysis: in some cases the semantic understanding module also needs sentiment analysis to determine the emotional coloring expressed in a sentence and better understand the speaker's attitude (a sketch combining several of these steps follows this list).
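A minimal sketch of the lexical-analysis, dependency-analysis and named-entity-recognition steps, assuming the spaCy toolkit (not named in the original); the model name and example sentence are illustrative.

    # Minimal sketch of several semantic-understanding steps using spaCy,
    # an assumed stand-in (requires: python -m spacy download en_core_web_sm).
    import spacy

    nlp = spacy.load("en_core_web_sm")  # small English pipeline

    def understand(text: str) -> dict:
        doc = nlp(text)
        return {
            # lexical analysis: word, part of speech, lemma
            "tokens": [(t.text, t.pos_, t.lemma_) for t in doc],
            # dependency analysis: relation of each word to its head word
            "dependencies": [(t.text, t.dep_, t.head.text) for t in doc],
            # named entity recognition: person, place, organization names
            "entities": [(e.text, e.label_) for e in doc.ents],
        }

    print(understand("Our Shanghai office signs the contract tomorrow."))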
Translation module: this module converts the text of the source language into the text of the target language, realizing the language translation function; it can use machine translation technology such as statistical machine translation (SMT) or neural machine translation (NMT), and the translation data is sent to the quality evaluation module;
preprocessing: before machine translation, the source-language text needs preprocessing, including word segmentation, punctuation removal, lower-casing and so on; these steps help provide better input data to the translation model;
feature extraction (for SMT): in statistical machine translation the source-language text is converted into a feature representation; this typically involves vocabularies, phrase tables and language models;
encoding (for NMT): in neural machine translation the source-language text is encoded as continuous vector representations, for example using a recurrent neural network (RNN) or a Transformer encoder;
decoding: decoding converts the feature or encoded representation into target-language text; statistical machine translation decodes with phrase translation tables and language models, while neural machine translation generates the target-language text with a decoder;
translation result generation: during decoding, text of the target language is generated at the word, phrase or subword level;
post-processing: the generated target-language text may need post-processing, such as re-segmentation or case handling, to obtain a more natural translation result;
target-language text output: finally, the translation module outputs the generated target-language text as the translation result (a sketch follows this list).
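A minimal sketch of the encode/decode path with a pretrained neural machine translation model; the Hugging Face transformers library and the public model name are illustrative assumptions, not named in the original.

    # Minimal NMT sketch; the library and model name are assumed stand-ins
    # (the model weights are downloaded on first use).
    from transformers import pipeline

    # A Chinese-to-English pipeline: the model encodes the source text and
    # its decoder generates the target-language text.
    translator = pipeline("translation", model="Helsinki-NLP/opus-mt-zh-en")

    source_text = "今天的会议改到下午三点。"
    result = translator(source_text, max_length=128)
    print(result[0]["translation_text"])  # generated target-language text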
Quality evaluation module: comprehensively analyzes the understanding data and the translation data based on the quality analysis model, evaluates whether the current translation quality is qualified, and sends the evaluation result to the regulation and control module.
Regulation and control module: wakes the secondary optimization module when the evaluation result is that the current translation quality is unqualified, and sends the translation data to the text synthesis module when it is qualified.
Secondary optimization module: selects other translators to retranslate the acquired content; if the retranslation quality is unqualified more than twice in a row, prompts the speaker to re-input the voice; once the translation quality is qualified, sends the translation data to the text synthesis module and the prompt information to the user interface module.
Text synthesis module: this module converts the translated target-language text into voice data to be delivered to the listeners, typically using text-to-speech synthesis;
text analysis: first analyze the translated target-language text to understand its content, mood and emotion; this helps to determine the proper pronunciation, intonation and speed of speech;
speech synthesis engine selection: select a suitable speech synthesis engine; different engines generate natural and fluent speech based on different technologies and models;
acoustic model generation: the selected speech synthesis engine uses an acoustic model that maps text to sound; the acoustic model may be statistical or neural-network based;
pronunciation rules and voice library: text-to-speech synthesis must consider pronunciation rules and a voice library to ensure the synthesized speech is pronounced accurately and naturally; pronunciation rules may include pronunciation specifications for specific vocabulary, phonemes, accents and so on;
synthesis parameter setting: set synthesis parameters such as speed, pitch and emotion; these parameters can be adjusted to the specific scene so that the synthesized voice better meets the needs of the audience;
speech synthesis generation: input the target-language translation text into the speech synthesis engine, which generates the corresponding speech according to the acoustic model, the pronunciation rules and the parameters;
post-processing: the synthesized speech may need post-processing, such as audio smoothing and denoising, to improve its quality and naturalness;
voice data output: finally, the text synthesis module outputs the synthesized voice data, which can be transmitted to the listeners to play the translated text aloud (a sketch follows this list).
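A minimal sketch of the parameter-setting and generation steps, assuming the offline pyttsx3 engine (not named in the original); the rate and volume values are illustrative.

    # Minimal text-to-speech sketch; pyttsx3 is an assumed stand-in and the
    # parameter values are illustrative.
    import pyttsx3

    def synthesize(text: str, out_path: str = "translation.wav") -> None:
        engine = pyttsx3.init()
        engine.setProperty("rate", 160)    # speech speed (words per minute)
        engine.setProperty("volume", 0.9)  # output volume, 0.0 to 1.0
        engine.save_to_file(text, out_path)  # queue synthesis to a file
        engine.runAndWait()                  # block until synthesis finishes

    synthesize("The meeting has been moved to 3 p.m.")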
Voice output module: this module delivers the synthesized voice data to the listeners through a speaker or another audio output device so that they can hear the translation result;
audio transmission: the synthesized voice data is sent to the speaker or audio output device through a suitable transmission method, wired or wireless, such as an audio cable, Bluetooth or Wi-Fi;
audio playback device: select an appropriate playback device, such as speakers or headphones, to ensure the listeners can hear the resulting speech;
audio playback control: control starting, pausing and stopping of audio playback to ensure the synthesized speech is played at the proper time;
volume control: control the audio volume so that the translation result reaches the audience at a suitable level, avoiding volume that is too high or too low;
sound quality optimization: in some special scenarios the sound quality may need optimization, such as noise removal or timbre adjustment, to provide a better listening experience.
User interface module: displays prompt information to the user and provides a friendly interface through which the user can operate and control the functions of the system; the user interface can be a graphical interface, a voice interaction interface or another form;
interface design: design a user-friendly interface, considering elements such as layout, colors and icons, so that the user intuitively understands its functions and operations;
interaction design: design the way the user interacts with the interface, including interactive elements such as buttons, text boxes and sliders, so the user can conveniently interact with the system;
graphical interface: if a graphical interface is adopted, its visual presentation and user interaction must be implemented; the user interacts with the system by clicking buttons, filling in text and so on;
voice interaction interface: in some scenarios the user may prefer to interact through speech; a voice interaction interface receives the user's voice commands, recognizes the user's intent and executes the corresponding operations;
feedback and prompts: the user interface must provide timely feedback and prompts, informing the user whether the system is processing and whether an operation succeeded;
function control: the user interface allows the user to control the functions of the system, such as starting speech recognition, starting translation and adjusting the volume;
language selection: in a multilingual environment, the user interface can provide a language selection function so the user can choose the source and target languages;
setting options: provide configurable setting options so the user can adjust the parameters of the system as needed.
According to the application, the semantic understanding module performs semantic analysis and understanding on the recognized text to obtain the speaker's intention and expressed content, and the translation module converts the text of the source language into the text of the target language, realizing the language translation function. The quality evaluation module comprehensively analyzes the understanding data and the translation data based on the quality analysis model and evaluates whether the current translation quality is qualified. When the evaluation result is that the current translation quality is unqualified, the regulation and control module wakes the secondary optimization module, which selects other translators to retranslate the acquired content; if the retranslation quality is unqualified more than twice in a row, the speaker is prompted to re-input the voice, and once the translation quality is qualified the translation data are sent to the text synthesis module. When the evaluation result is that the current translation quality is qualified, the regulation and control module sends the translation data to the text synthesis module directly. When the translation system translates a video conference, it can thus evaluate and handle the translation quality in real time, effectively ensuring translation accuracy and the stable progress of the conference.
Example 2: the quality evaluation module comprehensively analyzes the understanding data and the translation data based on the quality analysis model, evaluates whether the current translation quality is qualified, and sends the evaluation result to the regulation and control module.
The understanding data comprise the voice correct recognition rate, and the translation data comprise the translation result similarity index, the word level matching degree and the network packet loss rate during translation.
Establishing the quality analysis model comprises the following steps:
the quality coefficient zlx is obtained by comprehensively calculating the voice correct recognition rate, the translation result similarity index, the word level matching degree and the network packet loss rate during translation,
where zqy is the voice correct recognition rate, xsf is the translation result similarity index, jpf is the word level matching degree, dbw is the network packet loss rate during translation, and α, β, γ and δ are the respective proportionality coefficients of these four quantities, all greater than 0.
After the value of the quality coefficient zlx is obtained, it is compared with a quality threshold, which completes the establishment of the quality analysis model.
After the quality evaluation module obtains the voice correct recognition rate, the translation result similarity index, the word level matching degree and the network packet loss rate during translation, analyzing these four quantities with the quality analysis model to evaluate whether the current translation quality is qualified includes the following steps:
substituting the voice correct recognition rate, the translation result similarity index, the word level matching degree and the network packet loss rate during translation into the quality coefficient formula to calculate the quality coefficient zlx;
if the quality coefficient zlx is greater than or equal to the quality threshold, evaluating the current translation quality as qualified;
if the quality coefficient zlx is less than the quality threshold, evaluating the current translation quality as unqualified.
Because the evaluation rests on all four quantities at once, the analysis is more comprehensive and data processing efficiency is effectively improved (a sketch of the quality model follows).
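The combination formula for zlx is not reproduced in the text above, so the sketch below assumes a weighted linear form in which the three quality indicators raise the coefficient and the packet loss rate lowers it; the weight values and the quality threshold are illustrative assumptions.

    # Sketch of the quality analysis model. The exact combination formula is
    # not reproduced in the text, so a weighted linear form is ASSUMED:
    # the three quality indicators raise zlx, the packet loss rate lowers it.
    # Weights (all > 0, per the text) and the threshold are illustrative.

    def quality_coefficient(zqy: float, xsf: float, jpf: float, dbw: float,
                            alpha: float = 0.3, beta: float = 0.3,
                            gamma: float = 0.3, delta: float = 0.3) -> float:
        # zqy: voice correct recognition rate, xsf: translation result
        # similarity index, jpf: word level matching degree, dbw: network
        # packet loss rate during translation.
        return alpha * zqy + beta * xsf + gamma * jpf - delta * dbw

    def is_qualified(zqy, xsf, jpf, dbw, quality_threshold: float = 0.75) -> bool:
        zlx = quality_coefficient(zqy, xsf, jpf, dbw)
        return zlx >= quality_threshold  # zlx >= threshold -> qualified

    print(is_qualified(zqy=0.96, xsf=0.88, jpf=0.85, dbw=0.02))  # True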
In the application:
the voice correct recognition rate zqy is calculated from zq, the number of correctly recognized words, lj, the number of words recognized through speech understanding, and cw, the number of unrecognizable words. The greater the voice correct recognition rate, the more accurately the translation system recognizes the speaker's speech, which means (see the sketch after this list):
1) a more accurate translation basis: the translation system accurately converts the speaker's words into text, providing accurate input for the subsequent translation steps;
2) more accurate translation: accurate speech recognition helps the translation system better understand the speaker's intent and content, thereby generating more accurate translation results;
3) fewer misunderstandings and ambiguities: high speech recognition accuracy reduces the misunderstandings and ambiguities caused by wrong recognition results, so the translated content accurately conveys the speaker's meaning;
4) higher translation efficiency: high-accuracy speech recognition reduces the correction and revision work of the translator, thereby improving translation efficiency.
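The original expression for zqy is not reproduced in the text, so the sketch below ASSUMES the rate is the share of correctly recognized words among all words (correct + understanding-recognized + unrecognizable); the counts are illustrative.

    # Sketch of the voice correct recognition rate. The exact expression is
    # not reproduced in the text; the rate is ASSUMED to be the number of
    # correctly recognized words (zq) over the total of correct (zq),
    # understanding-recognized (lj) and unrecognizable (cw) words.

    def recognition_rate(zq: int, lj: int, cw: int) -> float:
        total = zq + lj + cw  # total words in the speaker's utterance
        return zq / total if total else 0.0

    print(recognition_rate(zq=92, lj=5, cw=3))  # 0.92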
The calculation expression of the translation result similarity index is:
xsf = (1/N) * Σ_{i=1..N} [ (1/M) * Σ_{j=1..M} sim(g_i, g'_j) ]
where g_i is the i-th n-gram in the candidate text c, g'_j is the j-th n-gram in a reference text, sim(g_i, g'_j) is the similarity between the two n-grams (typically calculated using BLEU or another n-gram similarity measure), M is the number of reference texts and N is the total number of n-grams in the candidate text;
the specific logic is as follows: for each n-gram in the candidate text, calculate its similarity to the n-grams of all the reference texts, sum and average these similarities, and divide the overall sum by the total number of n-grams in the candidate text;
the larger the translation result similarity index, the higher the similarity between the system's output and the reference texts, i.e. the better the consistency between the candidate translation and the multiple reference translations, as illustrated in the sketch below.
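A minimal sketch of the averaging logic just described, assuming exact n-gram matching as the similarity measure sim (the text allows BLEU or any other n-gram measure); whitespace tokenization is an illustrative simplification.

    # Sketch of the translation result similarity index: each candidate
    # n-gram is scored against every reference text and the scores are
    # averaged. Exact n-gram matching is ASSUMED as sim().

    def ngrams(words, n):
        return [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]

    def similarity_index(candidate: str, references: list, n: int = 2) -> float:
        cand = ngrams(candidate.split(), n)                      # N n-grams
        refs = [set(ngrams(r.split(), n)) for r in references]  # M references
        if not cand or not refs:
            return 0.0
        total = 0.0
        for gram in cand:
            # average similarity of this n-gram over the M reference texts
            total += sum(1.0 if gram in r else 0.0 for r in refs) / len(refs)
        return total / len(cand)  # divide by the candidate n-gram count N

    refs = ["the meeting starts at three pm", "the meeting begins at three pm"]
    print(similarity_index("the meeting starts at three pm", refs, n=2))  # 0.8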
The word level matching degree is calculated by the following expression:
jpf=(1-τ)*P+τ*R*F
wherein P represents the exact match rate, R represents the recall rate, F represents the F1 score, and τ is a parameter for balancing the exact match rate and the recall rate;
wherein, the calculation expression of the F1 score F is:
F = 2 * P * R / (P + R)
where P is the exact match rate and R is the recall rate; the exact match rate is the ratio of the number of correctly matched words in the machine translation result to the total number of words in the machine translation result, and the recall rate is the ratio of the number of correctly matched words in the machine translation result to the total number of words in the reference translation;
τ is a parameter for balancing the exact match rate and the recall rate and typically ranges from 0 to 1: when τ is 0, only the exact match rate is considered; when τ is 1, only the recall rate is considered; by adjusting the value of τ, the importance of the exact match rate and the recall rate can be balanced according to the specific requirements and scene;
the greater the word level matching degree, the higher the translation quality of the translation system and the better it performs in word matching, fluency, semantic consistency and the like, as shown in the sketch below.
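A minimal sketch of the word level matching degree jpf = (1 - τ)*P + τ*R*F, taking F as the standard F1 score 2PR/(P + R); counting correctly matched words by multiset word overlap is an illustrative assumption.

    # Sketch of the word level matching degree; the multiset word overlap
    # used to count correctly matched words is an ASSUMED simplification.
    from collections import Counter

    def word_match_degree(hypothesis: str, reference: str, tau: float = 0.5) -> float:
        hyp, ref = hypothesis.split(), reference.split()
        matched = sum((Counter(hyp) & Counter(ref)).values())  # matched words
        p = matched / len(hyp) if hyp else 0.0                 # exact match rate
        r = matched / len(ref) if ref else 0.0                 # recall rate
        f = 2 * p * r / (p + r) if (p + r) else 0.0            # F1 score
        return (1 - tau) * p + tau * r * f                     # jpf

    print(word_match_degree("the meeting starts at three",
                            "the meeting begins at three pm"))  # ~0.64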
The calculation expression of the network packet loss rate during translation is:
dbw = dsb / zfb
where dsb is the number of lost data packets, i.e. packets that fail to reach the destination during transmission, and zfb is the total number of data packets sent during transmission. The network packet loss rate during translation represents the proportion of data packets lost during data transmission; a high packet loss rate causes interruption and distortion of the voice transmission and thus degrades the translation quality (a monitoring sketch follows).
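A minimal sketch of dbw = dsb / zfb; estimating lost packets from gaps in packet sequence numbers is a common monitoring technique and an assumption here, not something specified in the text.

    # Sketch of the packet loss rate dbw = dsb / zfb; inferring dsb from
    # missing sequence numbers is an ASSUMED monitoring technique.

    def packet_loss_rate(received_seq_nums, total_sent: int) -> float:
        dsb = total_sent - len(set(received_seq_nums))  # packets never seen
        return dsb / total_sent if total_sent else 0.0

    # 100 packets sent; packets 17 and 42 never arrived -> dbw = 0.02
    received = [n for n in range(100) if n not in (17, 42)]
    print(packet_loss_rate(received, total_sent=100))  # 0.02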
When the evaluation result is that the current translation quality is unqualified, the regulation and control module wakes the secondary optimization module; when the evaluation result is that it is qualified, the regulation and control module sends the translation data to the text synthesis module.
The secondary optimization module selects other translators to retranslate the acquired content and evaluates whether the retranslation quality is qualified based on the quality analysis model. If the retranslation quality is unqualified more than twice in a row, this indicates a probable voice input error or network influence, so the speaker is prompted to re-input the voice; once the translation quality is qualified, the translation data are sent to the text synthesis module and the prompt information to the user interface module (a control-flow sketch follows).
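A minimal control-flow sketch of the regulation and secondary-optimization behavior just described; the function names, translator list and callbacks are illustrative assumptions.

    # Control-flow sketch: retranslate with alternative translators; after
    # more than two consecutive failed retranslations, prompt re-input.
    # All names below are illustrative assumptions.

    def regulate(content, translators, is_qualified, prompt_reinput, synthesize):
        failures = 0
        for translate in translators:        # other translators, in turn
            translation = translate(content) # retranslate acquired content
            if is_qualified(translation):    # quality analysis model verdict
                synthesize(translation)      # hand off to text synthesis
                return translation
            failures += 1                    # consecutive, since success returns
            if failures > 2:                 # unqualified more than twice in a row
                break                        # likely input error or network issue
        prompt_reinput()                     # ask the speaker to re-input voice
        return None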
The concurrent translation terminal is used for operating the concurrent translation system.
The above formulas are all dimensionless and operate on numerical values; each formula was obtained by software simulation over a large amount of collected data so as to approximate the real situation, and the preset parameters in the formulas are set by those skilled in the art according to the actual situation.
In the description of the present specification, the descriptions of the terms "one embodiment," "example," "specific example," and the like, mean that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the present application. In this specification, schematic representations of the above terms do not necessarily refer to the same embodiments or examples. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.
The preferred embodiments of the application disclosed above are intended only to assist in the explanation of the application. The preferred embodiments are not intended to be exhaustive or to limit the application to the precise form disclosed. Obviously, many modifications and variations are possible in light of the above teaching. The embodiments were chosen and described in order to best explain the principles of the application and the practical application, to thereby enable others skilled in the art to best understand and utilize the application. The application is limited only by the claims and the full scope and equivalents thereof.

Claims (10)

1. A concurrent translation system, characterized in that it comprises a voice input module, a voice recognition module, a semantic understanding module, a translation module, a quality evaluation module, a regulation and control module, a secondary optimization module, a text synthesis module, a voice output module and a user interface module;
voice input module: converts the speech input of the speaker into digitized voice data;
voice recognition module: converts the voice data into text form;
semantic understanding module: performs semantic analysis and understanding on the recognized text to obtain the speaker's intention and expressed content;
translation module: converts the text of the source language into the text of the target language, performing the language translation;
quality evaluation module: comprehensively analyzes the understanding data and the translation data based on a quality analysis model and evaluates whether the current translation quality is qualified;
regulation and control module: wakes the secondary optimization module when the evaluation result is that the current translation quality is unqualified, and sends the translation data to the text synthesis module when it is qualified;
secondary optimization module: retranslates the acquired content; if the retranslation quality is unqualified more than twice in a row, prompts the speaker to re-input the voice, and once the translation quality is qualified sends the translation data to the text synthesis module;
text synthesis module: converts the translated target-language text into voice data;
voice output module: delivers the synthesized voice data to the listeners through an audio output device;
user interface module: displays prompt information to the user.
2. The concurrent translation system according to claim 1, wherein: the understanding data comprise voice correct recognition rate, and the translation data comprise translation result similarity indexes, word level matching degree and network packet loss rate during translation.
3. The concurrent translation system according to claim 2, characterized in that establishing the quality analysis model comprises the following steps:
the quality coefficient zlx is obtained by comprehensively calculating the voice correct recognition rate, the translation result similarity index, the word level matching degree and the network packet loss rate during translation,
where zqy is the voice correct recognition rate, xsf is the translation result similarity index, jpf is the word level matching degree, dbw is the network packet loss rate during translation, and α, β, γ and δ are the respective proportionality coefficients of these four quantities, all greater than 0;
after the value of the quality coefficient zlx is obtained, it is compared with a quality threshold, which completes the establishment of the quality analysis model.
4. The concurrent translation system according to claim 3, characterized in that, after the quality evaluation module obtains the voice correct recognition rate, the translation result similarity index, the word level matching degree and the network packet loss rate during translation, analyzing these four quantities with the quality analysis model to evaluate whether the current translation quality is qualified comprises the following steps:
substituting the voice correct recognition rate, the translation result similarity index, the word level matching degree and the network packet loss rate during translation into the quality coefficient formula to calculate the quality coefficient zlx;
if the quality coefficient zlx is greater than or equal to the quality threshold, evaluating the current translation quality as qualified;
if the quality coefficient zlx is less than the quality threshold, evaluating the current translation quality as unqualified.
5. The concurrent translation system according to claim 4, characterized in that the voice correct recognition rate zqy is calculated from zq, the number of correctly recognized words, lj, the number of words recognized through speech understanding, and cw, the number of unrecognizable words.
6. The concurrent translation system according to claim 5, characterized in that the translation result similarity index is calculated as:
xsf = (1/N) * Σ_{i=1..N} [ (1/M) * Σ_{j=1..M} sim(g_i, g'_j) ]
where g_i is the i-th n-gram in the candidate text c, g'_j is the j-th n-gram in a reference text, sim(g_i, g'_j) is the similarity between the two n-grams (typically calculated using BLEU or another n-gram similarity measure), M is the number of reference texts and N is the total number of n-grams in the candidate text.
7. The concurrent translation system according to claim 6, wherein: the word level matching degree is calculated by the following expression:
jpf=(1-τ)*P+τ*R*F
where P represents the exact match rate, R represents the recall rate, F represents the F1 score, and τ is a parameter that balances the exact match rate and the recall rate.
8. The concurrent translation system according to claim 7, characterized in that the calculation expression of the F1 score F is:
F = 2 * P * R / (P + R)
where P is the exact match rate and R is the recall rate; the exact match rate is the ratio of the number of correctly matched words in the machine translation result to the total number of words in the machine translation result, and the recall rate is the ratio of the number of correctly matched words in the machine translation result to the total number of words in the reference translation.
9. The concurrent translation system according to claim 8, characterized in that the calculation expression of the network packet loss rate during translation is:
dbw = dsb / zfb
where dsb is the number of lost data packets and zfb is the total number of data packets sent.
10. A concurrent translation terminal, characterized in that it runs the concurrent translation system according to any one of claims 1 to 9.
CN116933806A (en) — Concurrent translation system and concurrent translation terminal — Application CN202311024945.2A, filed 2023-08-15 (priority 2023-08-15), status Pending

Priority Applications (1)

Application Number: CN202311024945.2A — Priority/Filing Date: 2023-08-15 — Title: Concurrent translation system and concurrent translation terminal

Publications (1)

Publication Number: CN116933806A — Publication Date: 2023-10-24

Family ID: 88375390

Family Applications (1)

Application Number: CN202311024945.2A — Title: Concurrent translation system and concurrent translation terminal

Country Status (1)

Country: CN — Document: CN116933806A (en)

Cited By (2)

* Cited by examiner, † Cited by third party

CN117275455A — published 2023-12-22 — 深圳市阳日电子有限公司 — Sound cloning method for translation earphone
CN117275455B — published 2024-02-13 — 深圳市阳日电子有限公司 — Sound cloning method for translation earphone


Legal Events

Code: PB01 — Publication
Code: SE01 — Entry into force of request for substantive examination