CN116933806A - Concurrent translation system and concurrent translation terminal

Concurrent translation system and concurrent translation terminal

Info

Publication number: CN116933806A
Application number: CN202311024945.2A
Authority: CN (China)
Prior art keywords: translation, quality, module, voice, text
Priority / filing date: 2023-08-15
Publication date: 2023-10-24
Legal status: Pending
Other languages: Chinese (zh)
Inventors: 黄发洋, 李艳雄, 席艺涵
Current and original assignee: Ningbo Yilian Technology Co., Ltd.
Application filed by Ningbo Yilian Technology Co., Ltd.
Classifications

    • G06F 40/58 — Use of machine translation, e.g. for multi-lingual retrieval, for server-side translation for client devices or for real-time translation
    • G06F 40/194 — Calculation of difference between files
    • G06F 40/295 — Named entity recognition
    • G06F 40/30 — Semantic analysis
    • G10L 13/08 — Text analysis or generation of parameters for speech synthesis out of text, e.g. grapheme-to-phoneme translation, prosody generation or stress or intonation determination
    • G10L 25/51 — Speech or voice analysis techniques specially adapted for comparison or discrimination
    • H04L 43/0829 — Monitoring or testing based on specific metrics: packet loss


Abstract

The application discloses a concurrent (simultaneous interpretation) translation system and a concurrent translation terminal in the technical field of translation systems. A quality evaluation module comprehensively analyzes understanding data and translation data based on a quality analysis model and evaluates whether the current translation quality is qualified. When the evaluation result is that the current translation quality is unqualified, a regulation and control module wakes a secondary optimization module, which selects other translators to retranslate the acquired content; if the retranslation quality is unqualified more than twice in a row, the speaker is prompted to re-input the voice, and once the translation quality is qualified the translation data are sent to a text synthesis module. When the evaluation result is that the current translation quality is qualified, the regulation and control module sends the translation data to the text synthesis module directly. When the translation system translates a video conference, the translation quality can thus be evaluated and handled in real time, effectively ensuring translation accuracy and the stable progress of the conference.

Description

Concurrent translation system and concurrent translation terminal
Technical Field
The application relates to the technical field of translation systems, in particular to a concurrent translation system and a concurrent translation terminal.
Background
The simultaneous interpretation system, also called a "simultaneous transmission" system, is a technical tool designed for real-time interpretation. Its aim is to achieve instant and accurate language translation on occasions such as lectures, conferences and negotiations, so that people with different language backgrounds can communicate effectively and understand each other.
By the middle of the 20th century, multilingual communication had become increasingly common at international conferences and other events, and language barriers had become a significant problem: traditional consecutive interpretation takes a long time and easily interrupts communication. Simultaneous interpretation systems emerged to solve this problem.
The prior art has the following defect:
when an emergency occurs (for example a serious corporate incident) and an enterprise holds an emergency video conference, interpreters cannot always be arranged in time, so the conference must rely on a concurrent translation system for auxiliary translation. However, a conventional concurrent translation system performing real-time video-conference translation does not evaluate the translation quality, which easily leads to inaccurate or wrong translation results and affects the conference.
Disclosure of Invention
The application aims to provide a concurrent translation system and a concurrent translation terminal that remedy the defect described in the background art.
To achieve the above object, the present application provides the following technical solution: a concurrent translation system comprising a voice input module, a voice recognition module, a semantic understanding module, a translation module, a quality evaluation module, a regulation and control module, a secondary optimization module, a text synthesis module, a voice output module and a user interface module;
voice input module: converts the speech input of the speaker into digitized voice data;
voice recognition module: converts the voice data into text form;
semantic understanding module: performs semantic analysis and understanding on the recognized text to obtain the speaker's intention and expressed content;
translation module: converts the text of the source language into the text of the target language, performing the language translation;
quality evaluation module: comprehensively analyzes the understanding data and the translation data based on a quality analysis model and evaluates whether the current translation quality is qualified;
regulation and control module: wakes the secondary optimization module when the evaluation result is that the current translation quality is unqualified, and sends the translation data to the text synthesis module when it is qualified;
secondary optimization module: retranslates the acquired content; if the retranslation quality is unqualified more than twice in a row, prompts the speaker to re-input the voice, and once the translation quality is qualified sends the translation data to the text synthesis module;
text synthesis module: converts the translated target-language text into voice data;
voice output module: delivers the synthesized voice data to the listeners through an audio output device;
user interface module: displays prompt information to the user.
Preferably, the understanding data comprises a voice correct recognition rate, and the translation data comprises a translation result similarity index, word level matching degree and a network packet loss rate during translation.
Preferably, establishing the quality analysis model includes the following steps:
the quality coefficient zlx is obtained by comprehensively calculating the voice correct recognition rate, the translation result similarity index, the word level matching degree and the network packet loss rate during translation,
where zqy is the voice correct recognition rate, xsf is the translation result similarity index, jpf is the word level matching degree, dbw is the network packet loss rate during translation, and α, β, γ and δ are the respective proportionality coefficients of these four quantities, all greater than 0;
after the value of the quality coefficient zlx is obtained, it is compared with a quality threshold, which completes the establishment of the quality analysis model.
Preferably, after the quality evaluation module obtains the voice correct recognition rate, the translation result similarity index, the word level matching degree and the network packet loss rate during translation, evaluating whether the current translation quality is qualified by analyzing these four quantities with the quality analysis model includes the following steps:
substituting the voice correct recognition rate, the translation result similarity index, the word level matching degree and the network packet loss rate during translation into the quality coefficient formula to calculate the quality coefficient zlx;
if the quality coefficient zlx is greater than or equal to the quality threshold, evaluating the current translation quality as qualified;
if the quality coefficient zlx is less than the quality threshold, evaluating the current translation quality as unqualified.
Preferably, the voice correct recognition rate zqy is calculated from zq, the number of correctly recognized words, lj, the number of words recognized through speech understanding, and cw, the number of unrecognizable words.
Preferably, the translation result similarity index is calculated as:
xsf = (1/N) * Σ_{i=1..N} [ (1/M) * Σ_{j=1..M} sim(g_i, g'_j) ]
where g_i is the i-th n-gram in the candidate text c, g'_j is the j-th n-gram in a reference text, sim(g_i, g'_j) is the similarity between the two n-grams (typically calculated using BLEU or another n-gram similarity measure), M is the number of reference texts and N is the total number of n-grams in the candidate text.
Preferably, the word level matching degree is calculated by the following expression:
jpf=(1-τ)*P+τ*R*F
where P represents the exact match rate, R represents the recall rate, F represents the F1 score, and τ is a parameter that balances the exact match rate and the recall rate.
Preferably, the calculation expression of the F1 score F is:
F = 2 * P * R / (P + R)
where P is the exact match rate and R is the recall rate; the exact match rate is the ratio of the number of correctly matched words in the machine translation result to the total number of words in the machine translation result, and the recall rate is the ratio of the number of correctly matched words in the machine translation result to the total number of words in the reference translation.
Preferably, the calculation expression of the network packet loss rate during translation is:
dbw = dsb / zfb
where dsb is the number of lost data packets and zfb is the total number of data packets sent.
The application also provides a concurrent translation terminal that runs the concurrent translation system.
The technical effects and advantages of the application are as follows:
1. The semantic understanding module performs semantic analysis and understanding on the recognized text to obtain the speaker's intention and expressed content, and the translation module converts the text of the source language into the text of the target language, realizing the language translation function. The quality evaluation module comprehensively analyzes the understanding data and the translation data based on the quality analysis model and evaluates whether the current translation quality is qualified. When the evaluation result is that the current translation quality is unqualified, the regulation and control module wakes the secondary optimization module, which selects other translators to retranslate the acquired content; if the retranslation quality is unqualified more than twice in a row, the speaker is prompted to re-input the voice, and once the translation quality is qualified the translation data are sent to the text synthesis module. When the evaluation result is that the current translation quality is qualified, the regulation and control module sends the translation data to the text synthesis module directly. When the translation system translates a video conference, it can thus evaluate and handle the translation quality in real time, effectively ensuring translation accuracy and the stable progress of the conference.
Drawings
To illustrate the embodiments of the present application or the technical solutions of the prior art more clearly, the drawings required for the embodiments are briefly described below. The drawings in the following description show only some embodiments of the present application; a person of ordinary skill in the art may obtain other drawings from them.
FIG. 1 is a block diagram of a system according to the present application.
Detailed Description
To make the objects, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions are described below clearly and completely with reference to the accompanying drawings. The described embodiments are only some, not all, of the embodiments of the present application. All other embodiments obtained by those skilled in the art on the basis of these embodiments without inventive effort fall within the scope of the application.
Example 1: referring to fig. 1, the concurrent translation system of this embodiment includes a voice input module, a voice recognition module, a semantic understanding module, a translation module, a quality evaluation module, a regulation and control module, a secondary optimization module, a text synthesis module, a voice output module and a user interface module;
voice input module: this module converts the speech input of the speaker into digitized voice data, typically through a microphone or another voice input device, and sends the data to the voice recognition module.
Voice recognition module: this module converts the voice data into text form, i.e. converts the speaker's voice data into the corresponding characters, and sends the recognized text to the semantic understanding module;
preprocessing: the collected audio signal may contain noise, echo and other interference, so preprocessing such as denoising and audio enhancement is needed to improve the accuracy of voice recognition;
feature extraction: converting the audio signal into a mathematical feature representation is a key step of speech recognition; the audio signal is typically converted into a sequence of feature vectors using techniques such as mel-frequency cepstral coefficients (MFCCs);
acoustic model: the acoustic model is an important component of speech recognition; it is a trained model that maps feature vectors to text units at the phoneme or subword level; common acoustic models include hidden Markov models (HMMs) and deep learning models such as convolutional neural networks (CNNs) and recurrent neural networks (RNNs);
decoding: in the decoding stage, the speech recognition system searches for the most probable text sequence using the probability distribution generated by the acoustic model with the assistance of a language model; the decoding process typically uses techniques such as the Viterbi algorithm;
post-processing: the decoded text sequence may contain erroneous or unnatural parts, so a post-processing step such as spelling correction or grammar correction can further improve the recognition result;
text output: finally, the voice recognition module outputs the decoded text sequence as the textual representation of the speaker's speech content (a feature-extraction sketch follows this list).
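The following is a minimal sketch of the feature-extraction step only, assuming the librosa library (not named in the original); the file name and parameter values are illustrative.

    # Minimal sketch of the MFCC feature-extraction step; librosa is an
    # assumed stand-in, and the file name and parameters are illustrative.
    import librosa

    def extract_mfcc(audio_path: str, n_mfcc: int = 13):
        # Load the (possibly preprocessed) audio; sr=16000 resamples to a
        # rate commonly used for speech recognition.
        signal, sample_rate = librosa.load(audio_path, sr=16000)
        # Convert the signal into a sequence of MFCC feature vectors, the
        # representation handed to the acoustic model.
        mfcc = librosa.feature.mfcc(y=signal, sr=sample_rate, n_mfcc=n_mfcc)
        return mfcc.T  # shape (frames, n_mfcc): one feature vector per frame

    features = extract_mfcc("speaker_input.wav")  # hypothetical file name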
Semantic understanding module: this module performs semantic analysis and understanding on the recognized text to obtain the speaker's intention and expressed content; the acquired content is sent to the translation module and the secondary optimization module, and the understanding data is sent to the quality evaluation module;
lexical analysis: divide the recognized text into words or phrases and determine each word's part of speech, morphology and basic attributes;
syntactic analysis: analyze the sentence structure and determine the grammatical relations and hierarchical structure among the words; this helps to understand subject-predicate relations, modifier relations and so on;
semantic role labeling: mark each word in a sentence with a semantic role, such as subject, action or object; this helps capture the semantic relations and logical structure of the sentence;
named entity recognition: identify named entities in the text, such as person names, place names and organization names, to help understand the specific information in the sentence;
dependency analysis: analyze the dependency relations between words and determine how each word relates to the other words in the sentence; this helps to understand the structure and meaning of the sentence;
semantic parsing: convert the sentence into a semantic representation, modeling the relations between the words and the sentence meaning; this helps capture the semantic information of the sentence;
intent analysis: infer the speaker's intention and purpose from the semantic representation of the sentence; this may involve operations, actions or requests extracted from the sentence;
emotion analysis: in some cases the semantic understanding module also needs sentiment analysis to determine the emotional coloring expressed in a sentence and better understand the speaker's attitude (a sketch combining several of these steps follows this list).
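A minimal sketch of the lexical-analysis, dependency-analysis and named-entity-recognition steps, assuming the spaCy toolkit (not named in the original); the model name and example sentence are illustrative.

    # Minimal sketch of several semantic-understanding steps using spaCy,
    # an assumed stand-in (requires: python -m spacy download en_core_web_sm).
    import spacy

    nlp = spacy.load("en_core_web_sm")  # small English pipeline

    def understand(text: str) -> dict:
        doc = nlp(text)
        return {
            # lexical analysis: word, part of speech, lemma
            "tokens": [(t.text, t.pos_, t.lemma_) for t in doc],
            # dependency analysis: relation of each word to its head word
            "dependencies": [(t.text, t.dep_, t.head.text) for t in doc],
            # named entity recognition: person, place, organization names
            "entities": [(e.text, e.label_) for e in doc.ents],
        }

    print(understand("Our Shanghai office signs the contract tomorrow."))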
Translation module: this module converts the text of the source language into the text of the target language, realizing the language translation function; it can use machine translation technology such as statistical machine translation (SMT) or neural machine translation (NMT), and the translation data is sent to the quality evaluation module;
preprocessing: before machine translation, the source-language text needs preprocessing, including word segmentation, punctuation removal, lower-casing and so on; these steps help provide better input data to the translation model;
feature extraction (for SMT): in statistical machine translation the source-language text is converted into a feature representation; this typically involves vocabularies, phrase tables and language models;
encoding (for NMT): in neural machine translation the source-language text is encoded as continuous vector representations, for example using a recurrent neural network (RNN) or a Transformer encoder;
decoding: decoding converts the feature or encoded representation into target-language text; statistical machine translation decodes with phrase translation tables and language models, while neural machine translation generates the target-language text with a decoder;
translation result generation: during decoding, text of the target language is generated at the word, phrase or subword level;
post-processing: the generated target-language text may need post-processing, such as re-segmentation or case handling, to obtain a more natural translation result;
target-language text output: finally, the translation module outputs the generated target-language text as the translation result (a sketch follows this list).
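A minimal sketch of the encode/decode path with a pretrained neural machine translation model; the Hugging Face transformers library and the public model name are illustrative assumptions, not named in the original.

    # Minimal NMT sketch; the library and model name are assumed stand-ins
    # (the model weights are downloaded on first use).
    from transformers import pipeline

    # A Chinese-to-English pipeline: the model encodes the source text and
    # its decoder generates the target-language text.
    translator = pipeline("translation", model="Helsinki-NLP/opus-mt-zh-en")

    source_text = "今天的会议改到下午三点。"
    result = translator(source_text, max_length=128)
    print(result[0]["translation_text"])  # generated target-language text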
Quality evaluation module: comprehensively analyzes the understanding data and the translation data based on the quality analysis model, evaluates whether the current translation quality is qualified, and sends the evaluation result to the regulation and control module.
Regulation and control module: wakes the secondary optimization module when the evaluation result is that the current translation quality is unqualified, and sends the translation data to the text synthesis module when it is qualified.
Secondary optimization module: selects other translators to retranslate the acquired content; if the retranslation quality is unqualified more than twice in a row, prompts the speaker to re-input the voice; once the translation quality is qualified, sends the translation data to the text synthesis module and the prompt information to the user interface module.
Text synthesis module: this module converts the translated target-language text into voice data to be delivered to the listeners, typically using text-to-speech synthesis;
text analysis: first analyze the translated target-language text to understand its content, mood and emotion; this helps to determine the proper pronunciation, intonation and speed of speech;
speech synthesis engine selection: select a suitable speech synthesis engine; different engines generate natural and fluent speech based on different technologies and models;
acoustic model generation: the selected speech synthesis engine uses an acoustic model that maps text to sound; the acoustic model may be statistical or neural-network based;
pronunciation rules and voice library: text-to-speech synthesis must consider pronunciation rules and a voice library to ensure the synthesized speech is pronounced accurately and naturally; pronunciation rules may include pronunciation specifications for specific vocabulary, phonemes, accents and so on;
synthesis parameter setting: set synthesis parameters such as speed, pitch and emotion; these parameters can be adjusted to the specific scene so that the synthesized voice better meets the needs of the audience;
speech synthesis generation: input the target-language translation text into the speech synthesis engine, which generates the corresponding speech according to the acoustic model, the pronunciation rules and the parameters;
post-processing: the synthesized speech may need post-processing, such as audio smoothing and denoising, to improve its quality and naturalness;
voice data output: finally, the text synthesis module outputs the synthesized voice data, which can be transmitted to the listeners to play the translated text aloud (a sketch follows this list).
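A minimal sketch of the parameter-setting and generation steps, assuming the offline pyttsx3 engine (not named in the original); the rate and volume values are illustrative.

    # Minimal text-to-speech sketch; pyttsx3 is an assumed stand-in and the
    # parameter values are illustrative.
    import pyttsx3

    def synthesize(text: str, out_path: str = "translation.wav") -> None:
        engine = pyttsx3.init()
        engine.setProperty("rate", 160)    # speech speed (words per minute)
        engine.setProperty("volume", 0.9)  # output volume, 0.0 to 1.0
        engine.save_to_file(text, out_path)  # queue synthesis to a file
        engine.runAndWait()                  # block until synthesis finishes

    synthesize("The meeting has been moved to 3 p.m.")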
Voice output module: this module delivers the synthesized voice data to the listeners through a speaker or another audio output device so that they can hear the translation result;
audio transmission: the synthesized voice data is sent to the speaker or audio output device through a suitable transmission method, wired or wireless, such as an audio cable, Bluetooth or Wi-Fi;
audio playback device: select an appropriate playback device, such as speakers or headphones, to ensure the listeners can hear the resulting speech;
audio playback control: control starting, pausing and stopping of audio playback to ensure the synthesized speech is played at the proper time;
volume control: control the audio volume so that the translation result reaches the audience at a suitable level, avoiding volume that is too high or too low;
sound quality optimization: in some special scenarios the sound quality may need optimization, such as noise removal or timbre adjustment, to provide a better listening experience.
User interface module: displays prompt information to the user and provides a friendly interface through which the user can operate and control the functions of the system; the user interface can be a graphical interface, a voice interaction interface or another form;
interface design: design a user-friendly interface, considering elements such as layout, colors and icons, so that the user intuitively understands its functions and operations;
interaction design: design the way the user interacts with the interface, including interactive elements such as buttons, text boxes and sliders, so the user can conveniently interact with the system;
graphical interface: if a graphical interface is adopted, its visual presentation and user interaction must be implemented; the user interacts with the system by clicking buttons, filling in text and so on;
voice interaction interface: in some scenarios the user may prefer to interact through speech; a voice interaction interface receives the user's voice commands, recognizes the user's intent and executes the corresponding operations;
feedback and prompts: the user interface must provide timely feedback and prompts, informing the user whether the system is processing and whether an operation succeeded;
function control: the user interface allows the user to control the functions of the system, such as starting speech recognition, starting translation and adjusting the volume;
language selection: in a multilingual environment, the user interface can provide a language selection function so the user can choose the source and target languages;
setting options: provide configurable setting options so the user can adjust the parameters of the system as needed.
According to the application, the semantic understanding module performs semantic analysis and understanding on the recognized text to obtain the speaker's intention and expressed content, and the translation module converts the text of the source language into the text of the target language, realizing the language translation function. The quality evaluation module comprehensively analyzes the understanding data and the translation data based on the quality analysis model and evaluates whether the current translation quality is qualified. When the evaluation result is that the current translation quality is unqualified, the regulation and control module wakes the secondary optimization module, which selects other translators to retranslate the acquired content; if the retranslation quality is unqualified more than twice in a row, the speaker is prompted to re-input the voice, and once the translation quality is qualified the translation data are sent to the text synthesis module. When the evaluation result is that the current translation quality is qualified, the regulation and control module sends the translation data to the text synthesis module directly. When the translation system translates a video conference, it can thus evaluate and handle the translation quality in real time, effectively ensuring translation accuracy and the stable progress of the conference.
Example 2: the quality evaluation module comprehensively analyzes the understanding data and the translation data based on the quality analysis model, evaluates whether the current translation quality is qualified, and sends the evaluation result to the regulation and control module.
The understanding data comprise the voice correct recognition rate, and the translation data comprise the translation result similarity index, the word level matching degree and the network packet loss rate during translation.
Establishing the quality analysis model comprises the following steps:
the quality coefficient zlx is obtained by comprehensively calculating the voice correct recognition rate, the translation result similarity index, the word level matching degree and the network packet loss rate during translation,
where zqy is the voice correct recognition rate, xsf is the translation result similarity index, jpf is the word level matching degree, dbw is the network packet loss rate during translation, and α, β, γ and δ are the respective proportionality coefficients of these four quantities, all greater than 0.
After the value of the quality coefficient zlx is obtained, it is compared with a quality threshold, which completes the establishment of the quality analysis model.
After the quality evaluation module obtains the voice correct recognition rate, the translation result similarity index, the word level matching degree and the network packet loss rate during translation, analyzing these four quantities with the quality analysis model to evaluate whether the current translation quality is qualified includes the following steps:
substituting the voice correct recognition rate, the translation result similarity index, the word level matching degree and the network packet loss rate during translation into the quality coefficient formula to calculate the quality coefficient zlx;
if the quality coefficient zlx is greater than or equal to the quality threshold, evaluating the current translation quality as qualified;
if the quality coefficient zlx is less than the quality threshold, evaluating the current translation quality as unqualified.
Because the evaluation rests on all four quantities at once, the analysis is more comprehensive and data processing efficiency is effectively improved (a sketch of the quality model follows).
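The combination formula for zlx is not reproduced in the text above, so the sketch below assumes a weighted linear form in which the three quality indicators raise the coefficient and the packet loss rate lowers it; the weight values and the quality threshold are illustrative assumptions.

    # Sketch of the quality analysis model. The exact combination formula is
    # not reproduced in the text, so a weighted linear form is ASSUMED:
    # the three quality indicators raise zlx, the packet loss rate lowers it.
    # Weights (all > 0, per the text) and the threshold are illustrative.

    def quality_coefficient(zqy: float, xsf: float, jpf: float, dbw: float,
                            alpha: float = 0.3, beta: float = 0.3,
                            gamma: float = 0.3, delta: float = 0.3) -> float:
        # zqy: voice correct recognition rate, xsf: translation result
        # similarity index, jpf: word level matching degree, dbw: network
        # packet loss rate during translation.
        return alpha * zqy + beta * xsf + gamma * jpf - delta * dbw

    def is_qualified(zqy, xsf, jpf, dbw, quality_threshold: float = 0.75) -> bool:
        zlx = quality_coefficient(zqy, xsf, jpf, dbw)
        return zlx >= quality_threshold  # zlx >= threshold -> qualified

    print(is_qualified(zqy=0.96, xsf=0.88, jpf=0.85, dbw=0.02))  # True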
In the application:
the voice correct recognition rate zqy is calculated from zq, the number of correctly recognized words, lj, the number of words recognized through speech understanding, and cw, the number of unrecognizable words. The greater the voice correct recognition rate, the more accurately the translation system recognizes the speaker's speech, which means (see the sketch after this list):
1) a more accurate translation basis: the translation system accurately converts the speaker's words into text, providing accurate input for the subsequent translation steps;
2) more accurate translation: accurate speech recognition helps the translation system better understand the speaker's intent and content, thereby generating more accurate translation results;
3) fewer misunderstandings and ambiguities: high speech recognition accuracy reduces the misunderstandings and ambiguities caused by wrong recognition results, so the translated content accurately conveys the speaker's meaning;
4) higher translation efficiency: high-accuracy speech recognition reduces the correction and revision work of the translator, thereby improving translation efficiency.
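The original expression for zqy is not reproduced in the text, so the sketch below ASSUMES the rate is the share of correctly recognized words among all words (correct + understanding-recognized + unrecognizable); the counts are illustrative.

    # Sketch of the voice correct recognition rate. The exact expression is
    # not reproduced in the text; the rate is ASSUMED to be the number of
    # correctly recognized words (zq) over the total of correct (zq),
    # understanding-recognized (lj) and unrecognizable (cw) words.

    def recognition_rate(zq: int, lj: int, cw: int) -> float:
        total = zq + lj + cw  # total words in the speaker's utterance
        return zq / total if total else 0.0

    print(recognition_rate(zq=92, lj=5, cw=3))  # 0.92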
The calculation expression of the translation result similarity index is:
xsf = (1/N) * Σ_{i=1..N} [ (1/M) * Σ_{j=1..M} sim(g_i, g'_j) ]
where g_i is the i-th n-gram in the candidate text c, g'_j is the j-th n-gram in a reference text, sim(g_i, g'_j) is the similarity between the two n-grams (typically calculated using BLEU or another n-gram similarity measure), M is the number of reference texts and N is the total number of n-grams in the candidate text;
the specific logic is as follows: for each n-gram in the candidate text, calculate its similarity to the n-grams of all the reference texts, sum and average these similarities, and divide the overall sum by the total number of n-grams in the candidate text;
the larger the translation result similarity index, the higher the similarity between the system's output and the reference texts, i.e. the better the consistency between the candidate translation and the multiple reference translations, as illustrated in the sketch below.
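A minimal sketch of the averaging logic just described, assuming exact n-gram matching as the similarity measure sim (the text allows BLEU or any other n-gram measure); whitespace tokenization is an illustrative simplification.

    # Sketch of the translation result similarity index: each candidate
    # n-gram is scored against every reference text and the scores are
    # averaged. Exact n-gram matching is ASSUMED as sim().

    def ngrams(words, n):
        return [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]

    def similarity_index(candidate: str, references: list, n: int = 2) -> float:
        cand = ngrams(candidate.split(), n)                      # N n-grams
        refs = [set(ngrams(r.split(), n)) for r in references]  # M references
        if not cand or not refs:
            return 0.0
        total = 0.0
        for gram in cand:
            # average similarity of this n-gram over the M reference texts
            total += sum(1.0 if gram in r else 0.0 for r in refs) / len(refs)
        return total / len(cand)  # divide by the candidate n-gram count N

    refs = ["the meeting starts at three pm", "the meeting begins at three pm"]
    print(similarity_index("the meeting starts at three pm", refs, n=2))  # 0.8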
The word level matching degree is calculated by the following expression:
jpf=(1-τ)*P+τ*R*F
wherein P represents the exact match rate, R represents the recall rate, F represents the F1 score, and τ is a parameter for balancing the exact match rate and the recall rate;
wherein, the calculation expression of the F1 score F is:
F = 2 * P * R / (P + R)
where P is the exact match rate and R is the recall rate; the exact match rate is the ratio of the number of correctly matched words in the machine translation result to the total number of words in the machine translation result, and the recall rate is the ratio of the number of correctly matched words in the machine translation result to the total number of words in the reference translation;
τ is a parameter for balancing the exact match rate and the recall rate and typically ranges from 0 to 1: when τ is 0, only the exact match rate is considered; when τ is 1, only the recall rate is considered; by adjusting the value of τ, the importance of the exact match rate and the recall rate can be balanced according to the specific requirements and scene;
the greater the word level matching degree, the higher the translation quality of the translation system and the better it performs in word matching, fluency, semantic consistency and the like, as shown in the sketch below.
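A minimal sketch of the word level matching degree jpf = (1 - τ)*P + τ*R*F, taking F as the standard F1 score 2PR/(P + R); counting correctly matched words by multiset word overlap is an illustrative assumption.

    # Sketch of the word level matching degree; the multiset word overlap
    # used to count correctly matched words is an ASSUMED simplification.
    from collections import Counter

    def word_match_degree(hypothesis: str, reference: str, tau: float = 0.5) -> float:
        hyp, ref = hypothesis.split(), reference.split()
        matched = sum((Counter(hyp) & Counter(ref)).values())  # matched words
        p = matched / len(hyp) if hyp else 0.0                 # exact match rate
        r = matched / len(ref) if ref else 0.0                 # recall rate
        f = 2 * p * r / (p + r) if (p + r) else 0.0            # F1 score
        return (1 - tau) * p + tau * r * f                     # jpf

    print(word_match_degree("the meeting starts at three",
                            "the meeting begins at three pm"))  # ~0.64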
The calculation expression of the network packet loss rate during translation is:
dbw = dsb / zfb
where dsb is the number of lost data packets, i.e. packets that fail to reach the destination during transmission, and zfb is the total number of data packets sent during transmission. The network packet loss rate during translation represents the proportion of data packets lost during data transmission; a high packet loss rate causes interruption and distortion of the voice transmission and thus degrades the translation quality (a monitoring sketch follows).
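A minimal sketch of dbw = dsb / zfb; estimating lost packets from gaps in packet sequence numbers is a common monitoring technique and an assumption here, not something specified in the text.

    # Sketch of the packet loss rate dbw = dsb / zfb; inferring dsb from
    # missing sequence numbers is an ASSUMED monitoring technique.

    def packet_loss_rate(received_seq_nums, total_sent: int) -> float:
        dsb = total_sent - len(set(received_seq_nums))  # packets never seen
        return dsb / total_sent if total_sent else 0.0

    # 100 packets sent; packets 17 and 42 never arrived -> dbw = 0.02
    received = [n for n in range(100) if n not in (17, 42)]
    print(packet_loss_rate(received, total_sent=100))  # 0.02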
When the evaluation result is that the current translation quality is unqualified, the regulation and control module wakes the secondary optimization module; when the evaluation result is that it is qualified, the regulation and control module sends the translation data to the text synthesis module.
The secondary optimization module selects other translators to retranslate the acquired content and evaluates whether the retranslation quality is qualified based on the quality analysis model. If the retranslation quality is unqualified more than twice in a row, this indicates a probable voice input error or network influence, so the speaker is prompted to re-input the voice; once the translation quality is qualified, the translation data are sent to the text synthesis module and the prompt information to the user interface module (a control-flow sketch follows).
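A minimal control-flow sketch of the regulation and secondary-optimization behavior just described; the function names, translator list and callbacks are illustrative assumptions.

    # Control-flow sketch: retranslate with alternative translators; after
    # more than two consecutive failed retranslations, prompt re-input.
    # All names below are illustrative assumptions.

    def regulate(content, translators, is_qualified, prompt_reinput, synthesize):
        failures = 0
        for translate in translators:        # other translators, in turn
            translation = translate(content) # retranslate acquired content
            if is_qualified(translation):    # quality analysis model verdict
                synthesize(translation)      # hand off to text synthesis
                return translation
            failures += 1                    # consecutive, since success returns
            if failures > 2:                 # unqualified more than twice in a row
                break                        # likely input error or network issue
        prompt_reinput()                     # ask the speaker to re-input voice
        return None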
The concurrent translation terminal is used for operating the concurrent translation system.
The above formulas are all dimensionless and operate on numerical values; each formula was obtained by software simulation over a large amount of collected data so as to approximate the real situation, and the preset parameters in the formulas are set by those skilled in the art according to the actual situation.
In the description of the present specification, the descriptions of the terms "one embodiment," "example," "specific example," and the like, mean that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the present application. In this specification, schematic representations of the above terms do not necessarily refer to the same embodiments or examples. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.
The preferred embodiments of the application disclosed above are intended only to assist in the explanation of the application. The preferred embodiments are not intended to be exhaustive or to limit the application to the precise form disclosed. Obviously, many modifications and variations are possible in light of the above teaching. The embodiments were chosen and described in order to best explain the principles of the application and the practical application, to thereby enable others skilled in the art to best understand and utilize the application. The application is limited only by the claims and the full scope and equivalents thereof.

Claims (10)

1. A concurrent translation system, characterized in that it comprises a voice input module, a voice recognition module, a semantic understanding module, a translation module, a quality evaluation module, a regulation and control module, a secondary optimization module, a text synthesis module, a voice output module and a user interface module;
voice input module: converts the speech input of the speaker into digitized voice data;
voice recognition module: converts the voice data into text form;
semantic understanding module: performs semantic analysis and understanding on the recognized text to obtain the speaker's intention and expressed content;
translation module: converts the text of the source language into the text of the target language, performing the language translation;
quality evaluation module: comprehensively analyzes the understanding data and the translation data based on a quality analysis model and evaluates whether the current translation quality is qualified;
regulation and control module: wakes the secondary optimization module when the evaluation result is that the current translation quality is unqualified, and sends the translation data to the text synthesis module when it is qualified;
secondary optimization module: retranslates the acquired content; if the retranslation quality is unqualified more than twice in a row, prompts the speaker to re-input the voice, and once the translation quality is qualified sends the translation data to the text synthesis module;
text synthesis module: converts the translated target-language text into voice data;
voice output module: delivers the synthesized voice data to the listeners through an audio output device;
user interface module: displays prompt information to the user.
2. The concurrent translation system according to claim 1, wherein: the understanding data comprise voice correct recognition rate, and the translation data comprise translation result similarity indexes, word level matching degree and network packet loss rate during translation.
3. The concurrent translation system according to claim 2, characterized in that establishing the quality analysis model comprises the following steps:
the quality coefficient zlx is obtained by comprehensively calculating the voice correct recognition rate, the translation result similarity index, the word level matching degree and the network packet loss rate during translation,
where zqy is the voice correct recognition rate, xsf is the translation result similarity index, jpf is the word level matching degree, dbw is the network packet loss rate during translation, and α, β, γ and δ are the respective proportionality coefficients of these four quantities, all greater than 0;
after the value of the quality coefficient zlx is obtained, it is compared with a quality threshold, which completes the establishment of the quality analysis model.
4. The concurrent translation system according to claim 3, characterized in that, after the quality evaluation module obtains the voice correct recognition rate, the translation result similarity index, the word level matching degree and the network packet loss rate during translation, analyzing these four quantities with the quality analysis model to evaluate whether the current translation quality is qualified comprises the following steps:
substituting the voice correct recognition rate, the translation result similarity index, the word level matching degree and the network packet loss rate during translation into the quality coefficient formula to calculate the quality coefficient zlx;
if the quality coefficient zlx is greater than or equal to the quality threshold, evaluating the current translation quality as qualified;
if the quality coefficient zlx is less than the quality threshold, evaluating the current translation quality as unqualified.
5. The concurrent translation system according to claim 4, characterized in that the voice correct recognition rate zqy is calculated from zq, the number of correctly recognized words, lj, the number of words recognized through speech understanding, and cw, the number of unrecognizable words.
6. The concurrent translation system according to claim 5, characterized in that the translation result similarity index is calculated as:
xsf = (1/N) * Σ_{i=1..N} [ (1/M) * Σ_{j=1..M} sim(g_i, g'_j) ]
where g_i is the i-th n-gram in the candidate text c, g'_j is the j-th n-gram in a reference text, sim(g_i, g'_j) is the similarity between the two n-grams (typically calculated using BLEU or another n-gram similarity measure), M is the number of reference texts and N is the total number of n-grams in the candidate text.
7. The concurrent translation system according to claim 6, wherein: the word level matching degree is calculated by the following expression:
jpf=(1-τ)*P+τ*R*F
where P represents the exact match rate, R represents the recall rate, F represents the F1 score, and τ is a parameter that balances the exact match rate and the recall rate.
8. The concurrent translation system according to claim 7, characterized in that the calculation expression of the F1 score F is:
F = 2 * P * R / (P + R)
where P is the exact match rate and R is the recall rate; the exact match rate is the ratio of the number of correctly matched words in the machine translation result to the total number of words in the machine translation result, and the recall rate is the ratio of the number of correctly matched words in the machine translation result to the total number of words in the reference translation.
9. The concurrent translation system according to claim 8, characterized in that the calculation expression of the network packet loss rate during translation is:
dbw = dsb / zfb
where dsb is the number of lost data packets and zfb is the total number of data packets sent.
10. A concurrent translation terminal, characterized in that it runs the concurrent translation system according to any one of claims 1 to 9.
CN116933806A (en) — Concurrent translation system and concurrent translation terminal — Application CN202311024945.2A, filed 2023-08-15 (priority 2023-08-15), status Pending

Priority Applications (1)

Application Number: CN202311024945.2A — Priority/Filing Date: 2023-08-15 — Title: Concurrent translation system and concurrent translation terminal

Publications (1)

Publication Number: CN116933806A — Publication Date: 2023-10-24

Family ID: 88375390

Family Applications (1)

Application Number: CN202311024945.2A — Title: Concurrent translation system and concurrent translation terminal

Country Status (1)

Country: CN — Document: CN116933806A (en)

Cited By (2)

* Cited by examiner, † Cited by third party

CN117275455A — published 2023-12-22 — 深圳市阳日电子有限公司 — Sound cloning method for translation earphone
CN117275455B — published 2024-02-13 — 深圳市阳日电子有限公司 — Sound cloning method for translation earphone


Legal Events

Code: PB01 — Publication
Code: SE01 — Entry into force of request for substantive examination