CN111368556B

CN111368556B - Performance determination method and confidence determination method and device of translation model

Info

Publication number: CN111368556B
Application number: CN202010148193.0A
Authority: CN
Inventors: 涂兆鹏; 史树明
Original assignee: Shenzhen Tencent Computer Systems Co Ltd
Current assignee: Shenzhen Tencent Computer Systems Co Ltd
Priority date: 2020-03-05
Filing date: 2020-03-05
Publication date: 2024-03-26
Anticipated expiration: 2040-03-05
Also published as: CN111368556A

Abstract

The application discloses a performance determining method, a confidence determining method, a device, equipment and a storage medium of a translation model, and belongs to the technical field of artificial intelligence. The method comprises the steps of obtaining a translation output by a translation model and confidence degrees of all words in the translation, wherein the confidence degrees are used for indicating the probability that the corresponding words are correct translation results; based on the value of the confidence, dividing each confidence into a plurality of groups; determining the average confidence coefficient corresponding to each group of confidence coefficients and the average accuracy of words corresponding to each group of confidence coefficients; a confidence error of the translation model is determined based on the average confidence and the average accuracy of each set of confidence levels, the confidence error being indicative of performance of the translation model. By the technical scheme, the confidence coefficient error of the translation model can be accurately obtained, the performance of the translation model can be more accurately analyzed based on the confidence coefficient error, and the translation model can be conveniently improved.

Description

Performance determination method and confidence determination method and device of translation model

Technical Field

The present application relates to the field of artificial intelligence technologies, and in particular, to a method, a device, equipment, and a storage medium for determining performance and confidence of a translation model.

Background

With the development of artificial intelligence technology, deep neural networks are applied in increasingly diverse ways. For example, in a computer-aided translation scenario, a machine translation model can be constructed based on a deep neural network. The machine translation model processes an input text to obtain a translation corresponding to the input text and the confidence of each phrase in the translation, where the confidence represents the probability that a phrase is a correct translation result. When the confidence of the machine translation model for a certain phrase is low, this information can be fed back to the translator, so that the translator can accurately locate the inaccurately translated phrase in the translation and edit the translation again.

However, in practical applications, the confidence output by the model may contain an error, and this error may affect the accuracy of the model's output. As a result, it is difficult for a developer to accurately grasp the performance of the model, and in turn difficult to improve the model. How to determine the confidence error of the model is therefore an important research direction.

Disclosure of Invention

The embodiment of the application provides a method, a device, equipment and a storage medium for determining the performance of a translation model, wherein the confidence error of the translation model can be obtained, so that the performance of the model is determined. The technical scheme is as follows:

In one aspect, a method for determining performance of a translation model is provided, the method comprising:

obtaining a translation output by a translation model and confidence degrees of all words in the translation, wherein the confidence degrees are used for indicating the probability that the corresponding words are correct translation results;

based on the value of the confidence, dividing each confidence into a plurality of groups;

determining the average confidence coefficient corresponding to each group of confidence coefficients and the average accuracy of words corresponding to each group of confidence coefficients;

based on the average confidence and the average accuracy of the respective sets of confidence, a confidence error of the translation model is determined, the confidence error being indicative of performance of the translation model.

In one aspect, a method for determining confidence of a translation model is provided, the method comprising:

acquiring initial confidence coefficient of each word in the translation output by the translation model, wherein the initial confidence coefficient is used for indicating the probability that the word output by the translation model is a correct translation result;

determining a smoothing factor corresponding to each initial confidence coefficient based on the value of each initial confidence coefficient, wherein the smoothing factor is used for adjusting the value of each initial confidence coefficient;

and determining the target confidence corresponding to each word based on the value of each initial confidence and the smoothing factor corresponding to each initial confidence.

In one aspect, there is provided a performance determining apparatus of a translation model, the apparatus comprising:

the obtaining module is used for obtaining the translation output by the translation model and the confidence coefficient of each word in the translation, wherein the confidence coefficient is used for indicating the probability that the corresponding word is a correct translation result;

a grouping module, configured to divide each confidence level into a plurality of groups based on the value of the confidence level;

the determining module is used for determining the average confidence coefficient corresponding to each group of confidence coefficient and the average accuracy of the word corresponding to each group of confidence coefficient; based on the average confidence and the average accuracy of the respective sets of confidence, a confidence error of the translation model is determined, the confidence error being indicative of performance of the translation model.

In one aspect, there is provided a confidence determining apparatus for a translation model, the apparatus comprising:

the obtaining module is used for obtaining initial confidence coefficient of each word in the translation output by the translation model, wherein the initial confidence coefficient is used for indicating probability that the word output by the translation model is a correct translation result;

the determining module is used for determining a smoothing factor corresponding to each initial confidence coefficient based on the value of each initial confidence coefficient, and the smoothing factor is used for adjusting the value of each initial confidence coefficient; and determining the target confidence corresponding to each word based on the value of each initial confidence and the smoothing factor corresponding to each initial confidence.

In one possible implementation, the determining module is configured to:

determining target intervals to which the values of the initial confidence coefficients belong, wherein different target intervals correspond to different smoothing factors;

and determining a smoothing factor corresponding to each initial confidence coefficient based on the target interval to which the value of each initial confidence coefficient belongs.

In one aspect, a computer device is provided, comprising one or more processors and one or more memories, the one or more memories storing at least one piece of program code that is loaded and executed by the one or more processors to implement the operations performed by the performance determination method of the translation model and the confidence determination method of the translation model.

In one aspect, a computer-readable storage medium is provided, the storage medium storing at least one piece of program code that is loaded and executed by a processor to implement the operations performed by the performance determination method of the translation model and the confidence determination method of the translation model.

According to the technical scheme provided by the embodiment of the application, the translation output by the translation model and the confidence coefficient of each word in the translation are obtained, and the confidence coefficient is used for indicating the probability that the corresponding word is a correct translation result; based on the value of the confidence, dividing each confidence into a plurality of groups; determining the average confidence coefficient corresponding to each group of confidence coefficients and the average accuracy of words corresponding to each group of confidence coefficients; a confidence error of the translation model is determined based on the average confidence and the average accuracy of each set of confidence levels, the confidence error being indicative of performance of the translation model. By the technical scheme, the confidence coefficient error of the translation model can be accurately obtained, the performance of the translation model can be more accurately analyzed based on the confidence coefficient error, and the translation model can be conveniently improved.

Drawings

In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are needed in the description of the embodiments will be briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present application, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.

FIG. 1 is a schematic diagram of an implementation environment of a method for determining performance of a translation model according to an embodiment of the present application;

FIG. 2 is a flow chart of a method for determining performance of a translation model provided in an embodiment of the present application;

FIG. 3 is a schematic diagram of a translation annotation according to an embodiment of the present application;

FIG. 4 is a flowchart of a method for determining confidence of a translation model according to an embodiment of the present application;

FIG. 5 is a schematic structural diagram of a device for determining performance of a translation model according to an embodiment of the present application;

FIG. 6 is a schematic structural diagram of a confidence determining device for a translation model according to an embodiment of the present application;

FIG. 7 is a schematic structural diagram of a terminal according to an embodiment of the present application;

FIG. 8 is a schematic structural diagram of a server according to an embodiment of the present application.

Detailed Description

For the purposes of making the objects, technical solutions and advantages of the present application more apparent, the embodiments of the present application will be described in further detail below with reference to the accompanying drawings, and it is apparent that the described embodiments are some, but not all, embodiments of the present application. All other embodiments, which can be made by one of ordinary skill in the art without undue burden from the present disclosure, are within the scope of the present disclosure.

The terms "first," "second," and the like in this application are used to distinguish between identical or similar items that have substantially the same function and effect. It should be understood that there is no logical or chronological dependency among "first," "second," and "nth," and that these terms do not limit the number of items or the order of execution.

Artificial intelligence (AI) is the theory, method, technology, and application system that uses a digital computer or a machine controlled by a digital computer to simulate, extend, and expand human intelligence, perceive the environment, acquire knowledge, and use that knowledge to obtain optimal results. In other words, artificial intelligence is a comprehensive technology of computer science that attempts to understand the essence of intelligence and to produce a new kind of intelligent machine that can react in a way similar to human intelligence. Artificial intelligence studies the design principles and implementation methods of various intelligent machines, so that the machines have the functions of perception, reasoning, and decision-making.

Artificial intelligence technology is a comprehensive discipline that covers a wide range of fields, including both hardware-level and software-level technologies. Basic artificial intelligence technologies generally include sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing, operation/interaction systems, mechatronics, and the like. Artificial intelligence software technologies mainly include computer vision, speech processing, natural language processing, and machine learning/deep learning. The embodiments of the present application relate to natural language processing technology and deep learning technology.

Among them, natural language processing (Natural Language Processing, NLP) studies the theories and methods that enable effective communication between humans and computers in natural language. Natural language processing is a science that integrates linguistics, computer science, and mathematics. Research in this field involves natural language, that is, the language people use every day, so it is closely related to research in linguistics. Natural language processing techniques typically include text processing, semantic understanding, machine translation, question answering by robots, knowledge graphs, and the like. The embodiments of the present application mainly relate to machine translation technology in natural language processing, and the technical solution provided by the embodiments is applied to determine the performance of a translation model.

The terms referred to in this application are explained below:

BLEU (Bilingual Evaluation Understudy): a standard method for evaluating machine translation. BLEU can be used to indicate the model performance of the translation model, with a larger BLEU value indicating better model performance. The BLEU of the translation model may be determined using an N-gram matching rule, that is, by comparing how often groups of N consecutive words co-occur in the translation output by the translation model and in the correct translation, where N is a positive integer. The specific calculation method of BLEU is not limited in the embodiments of the present application.

ECE (Expected Calibration Error): the confidence error, a quantitative measure of how far the model's confidence is from being calibrated. The smaller the ECE value, the better the performance of the model.

FIG. 1 is a schematic diagram of an implementation environment of a method for determining performance of a translation model according to an embodiment of the present application. Referring to FIG. 1, the implementation environment may include a server 101 and a terminal 102. The server 101 may have a translation model deployed therein that can convert one natural language into another. The server 101 may include at least one of a single server, a plurality of servers, a cloud computing platform, and a virtualization center. The terminal 102 may be a terminal used by a developer; the terminal may obtain a translation output by the translation model and the confidence corresponding to each word in the translation, calculate the confidence error of the translation model, and thereby determine the performance of the translation model. Of course, the step of calculating the confidence error may also be performed by the server 101, which is not limited in the embodiments of the present application. The terminal 102 may be a smart phone, tablet computer, e-book reader, MP3 (Moving Picture Experts Group Audio Layer III) player, MP4 (Moving Picture Experts Group Audio Layer IV) player, laptop computer, desktop computer, or the like.

The terminals and the servers can be connected through a wired network or a wireless network, so that data interaction can be performed between the terminals and the servers.

Those skilled in the art will recognize that the number of terminals may be greater or smaller. For example, there may be only one terminal, or there may be tens or hundreds of terminals, or more. The embodiments of the present application do not limit the number of terminals or the type of device.

Fig. 2 is a flowchart of a method for determining performance of a translation model according to an embodiment of the present application. The method may be applied to the above terminal or the server, and both the terminal and the server may be regarded as a computer device, so in the embodiment of the present application, the performance determining method of the translation model is described with the computer device as an execution body, referring to fig. 2, and this embodiment may specifically include the following steps:

201. The computer device obtains the translation output by the translation model and the confidence of each word in the translation.

The translation model may be a model constructed based on a deep neural network; for example, the translation model may be an NMT (Neural Machine Translation) model. In the embodiments of the present application, the translation model may translate based on input text, images, speech, and the like, and output a translation together with the confidence corresponding to each word in the translation, where the confidence indicates the probability that the corresponding word is a correct translation result. It should be noted that the specific structure of the translation model is not limited in the embodiments of the present application. In the embodiments of the present application, a word refers to the smallest sentence-forming unit composed of morphemes, for example a word in Chinese, and the translation output by the translation model may be composed of a plurality of words.

In this embodiment of the present application, the translation model may be deployed in a server, and after the translation model completes one translation, the server may send an output result of the translation model to the computer device, where the output result includes a translation and a confidence level corresponding to each word in the translation, and the computer device executes the following confidence level error determining step based on the output result of one translation task. Of course, the server may also perform the step of sending the translation result based on the target period, that is, sending the output result of the translation model in one target period to the computer device, where the computer device performs the following confidence error determining step based on the output result generated by multiple translation tasks, which is not limited in detail in the embodiment of the present application. The target period may be set by a developer, which is not limited in the embodiments of the present application.

202. The computer device classifies the respective confidence levels into a plurality of groups based on the value of the confidence level.

In one possible implementation, the computer device may divide the value range of the confidence evenly into a plurality of sub-ranges and, based on the value of each confidence, place the confidences whose values fall within the same sub-range into one group. For example, the value range of the confidence is typically [0, 1]. The computer device may divide this range evenly into M sub-ranges, taking the confidences with values in [0, 1/M) as the first group, denoted B_1, the confidences with values in [1/M, 2/M) as the second group, denoted B_2, and so on, to obtain M groups of confidences {B_1, B_2, …, B_M}. M is a positive integer whose specific value may be set by a developer, which is not limited in the embodiments of the present application. In the embodiments of the present application, the computer device may also determine the value range dynamically from the confidences output by the translation model, that is, from the largest and the smallest confidence values. For example, when the largest confidence output by the translation model is 0.9 and the smallest is 0.4, the computer device may determine the value range to be [0.4, 0.9] and perform the grouping step above on that range. Determining the value range dynamically avoids groups that contain no confidences, reduces the number of groups, and thereby reduces the amount of computation when the confidence error is calculated from the groups.
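The grouping step can be illustrated with a minimal Python sketch. The function name `group_confidences` and the `dynamic_range` flag are hypothetical names introduced here for illustration; the embodiment does not prescribe a particular implementation. The sketch returns, for each sub-range, the positions of the words whose confidence falls in it.

```python
from typing import List


def group_confidences(confidences: List[float], num_groups: int,
                      dynamic_range: bool = False) -> List[List[int]]:
    """Split confidences into num_groups equal-width sub-ranges.

    Returns one list per group holding the indices of the confidences that
    fall into that group. With dynamic_range=True the value range is
    [min, max] of the observed confidences instead of [0, 1].
    """
    if num_groups <= 0:
        raise ValueError("num_groups must be a positive integer")
    groups: List[List[int]] = [[] for _ in range(num_groups)]
    if not confidences:
        return groups
    if dynamic_range:
        low, high = min(confidences), max(confidences)
    else:
        low, high = 0.0, 1.0
    width = (high - low) / num_groups
    for index, value in enumerate(confidences):
        if width == 0:
            position = 0  # all confidences are identical
        else:
            # Clamp so that the upper bound falls into the last group.
            position = min(int((value - low) / width), num_groups - 1)
        groups[position].append(index)
    return groups


if __name__ == "__main__":
    print(group_confidences([0.05, 0.15, 0.95, 1.0], num_groups=10))
```

Because the sub-ranges are half-open, the sketch assigns a confidence equal to the upper bound of the value range to the last group; this boundary handling is an assumption, as the embodiment does not state it.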

203. The computer device determines an average confidence level for each set of confidence levels and an average accuracy rate for words for each set of confidence levels.

The average confidence is the mean of the confidence values within a group and reflects the accuracy of the words as predicted by the model; the average accuracy is the mean of the actual accuracy of the words in the group and can be used to measure whether the confidence output by the translation model is accurate.

In the embodiment of the application, the computer device may determine the average confidence coefficient corresponding to each set of confidence coefficients based on the number of confidence coefficients included in each set of confidence coefficients and the value of each confidence coefficient. That is, the values of the respective confidence levels in the set of confidence levels are summed and averaged.

In the embodiment of the application, the computer device may determine the average accuracy corresponding to each set of confidence degrees based on the word corresponding to each set of confidence degrees and the correct translation result. In one possible implementation manner, the method for determining the average accuracy rate specifically may include the following steps:

First, the computer device may compare each word with the correct translation result; if a word is the same as the correct translation result, a first tag is added to that word, and if a word differs from the correct translation result, a second tag is added to that word. In one possible implementation, the computer device may label each word in the translation using a TER (Translation Edit Rate) tool. Referring to FIG. 3, which is a schematic diagram of translation labeling provided in an embodiment of the present application, the TER tool may label each word in the translation 302 against the correct translation result 301 using four kinds of tags: C (correct), S (mistranslation), I (over-translation), and D (missed translation). The C tag may be used as the first tag and the S tag as the second tag; alternatively, the S, I, and D tags may all be used as the second tag, which is not limited in the embodiments of the present application. The above description of the method for adding a tag to each word is merely exemplary, and the embodiments of the present application do not limit which tagging method is used.

The computer device may then determine a number of first tags and a number of second tags corresponding to each set of confidence levels based on the tags of each word corresponding to each set of confidence levels, and determine a total number of tags corresponding to each set of confidence levels based on the number of first tags and the number of second tags.

Finally, the computer device may take as the average accuracy the ratio of the number of the first tags to the total number of tags, i.e. the ratio of the number of words for which the translation is correct to the total number of words.

In the embodiment of the application, whether the word translation is wrong or not is indicated by adding different types of labels to each word, and the average accuracy corresponding to each group of confidence coefficient can be accurately obtained based on the number of the labels of each type, so that the actual translation accuracy of the translation model can be determined. It should be noted that the above description of the method for determining the average accuracy is merely an exemplary description, and the embodiment of the present application does not limit what method is specifically adopted to determine the average accuracy corresponding to each set of confidence coefficients.
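As a minimal sketch of the tag-counting step, the Python function below computes the average accuracy of one group from the tags of its words. It assumes the variant in which C is the first tag and every other tag is treated as the second tag; the function name `average_accuracy` is hypothetical, and the tags are assumed to have been produced beforehand by a labeling tool.

```python
from typing import Iterable

CORRECT_TAG = "C"  # first tag; any other tag (e.g. "S", "I", "D") counts as the second tag


def average_accuracy(tags: Iterable[str]) -> float:
    """Ratio of first tags to all tags for the words of one confidence group."""
    tag_list = list(tags)
    if not tag_list:
        return 0.0  # assumption: an empty group contributes an accuracy of 0
    first_tag_count = sum(1 for tag in tag_list if tag == CORRECT_TAG)
    return first_tag_count / len(tag_list)


if __name__ == "__main__":
    # Two correctly translated words out of four in this group.
    print(average_accuracy(["C", "C", "S", "I"]))
```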

In the above technical solution, the average confidence and the average accuracy of the translation model are obtained in the inference phase of the translation model, that is, while the translation model performs the forward computation on the input text to obtain the translation, so the performance of the model can be measured from two dimensions: the accuracy predicted by the model and the true translation accuracy.

204. The computer device determines a confidence error for the translation model based on the average confidence and the average accuracy for each set of confidence.

Wherein the confidence error may be used to indicate the performance of the translation model, the smaller the value of the confidence error, the better the performance of the translation model.

In the embodiment of the application, the computer device may determine, as the intermediate error, an absolute value of a difference between the average confidence level and the average accuracy of each set of confidence levels; and carrying out weighted average on each intermediate error based on the number of the confidence coefficient contained in each group of confidence coefficient to obtain the confidence coefficient error of the translation model. In one possible implementation, the method for calculating the confidence error may be expressed as the following formula (1):

$$\mathrm{ECE} = \sum_{m=1}^{M} \frac{|B_m|}{N} \left| \mathrm{acc}(B_m) - \mathrm{conf}(B_m) \right| \tag{1}$$

where ECE represents the confidence error of the translation model, M represents the number of confidence groups that participate in the calculation, m represents the index of a group, B_m represents the m-th group of confidences, |B_m| represents the number of samples, that is, confidences, contained in the m-th group, N represents the total number of samples, that is, confidences, across all groups, acc(B_m) represents the average accuracy of B_m, and conf(B_m) represents the average confidence of B_m.
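The calculation of the confidence error as a weighted average of the per-group differences between average accuracy and average confidence can be sketched in Python as follows. This is an illustrative sketch rather than the implementation of the embodiment: the function name `expected_calibration_error` is hypothetical, the value range is assumed to be [0, 1] divided into equal-width groups, and word correctness is assumed to be available as booleans.

```python
from typing import Sequence


def expected_calibration_error(confidences: Sequence[float],
                               is_correct: Sequence[bool],
                               num_groups: int = 10) -> float:
    """Weighted average over groups of |average accuracy - average confidence|."""
    if len(confidences) != len(is_correct):
        raise ValueError("confidences and is_correct must have the same length")
    total = len(confidences)
    if total == 0:
        return 0.0
    confidence_sums = [0.0] * num_groups
    correct_counts = [0] * num_groups
    counts = [0] * num_groups
    for confidence, correct in zip(confidences, is_correct):
        # Equal-width grouping on [0, 1]; a confidence of 1.0 joins the last group.
        m = min(int(confidence * num_groups), num_groups - 1)
        confidence_sums[m] += confidence
        correct_counts[m] += 1 if correct else 0
        counts[m] += 1
    ece = 0.0
    for m in range(num_groups):
        if counts[m] == 0:
            continue  # empty groups carry zero weight
        avg_confidence = confidence_sums[m] / counts[m]
        avg_accuracy = correct_counts[m] / counts[m]
        ece += (counts[m] / total) * abs(avg_accuracy - avg_confidence)
    return ece


if __name__ == "__main__":
    print(expected_calibration_error([0.95, 0.95, 0.15, 0.15],
                                     [True, True, False, False]))
```

In the usage example, the high-confidence group has average confidence 0.95 and accuracy 1, the low-confidence group has average confidence 0.15 and accuracy 0, and each group holds half of the samples, so the result is approximately 0.5 × 0.05 + 0.5 × 0.15 = 0.10.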

In embodiments of the present application, the performance of the model may be determined based on confidence errors of the translation model, including interpretability, understanding capabilities, etc. of the translation model. For example, when the confidence error of the translation model on a certain article or a certain phrase is low, it can be determined that the word and the grammar structure contained in the certain article or the certain phrase are simpler for the translation model; when the confidence error of the translation model on a certain article or a certain phrase is high, it can be determined that the word and the grammar structure contained in the certain article or the certain phrase are difficult for the translation model, and a developer can improve the parameters, the model structure and the like of the translation model based on the confidence error.

Any combination of the above optional solutions may be adopted to form an optional embodiment of the present application, which is not described herein in detail.

In the embodiments of the present application, based on the above technical solution, the confidence errors of the translation model in English-to-Japanese, English-to-German, and Chinese-to-English translation tasks can be determined, and the correlation value between the model performance of the translation model and the confidence error can then be determined. In general, the smaller the confidence error of the translation model, the better the model performs; the correlation value between the confidence error and the model performance is therefore negative, and the smaller the correlation value, the better the performance of the translation model. Table 1 shows the correlation values of the translation model in the training phase and the inference phase.

TABLE 1

Correlation value	English-to-Japanese	English-to-German	Chinese-to-English	Average
Training phase	-0.10	-0.23	-0.44	-0.26
Inference phase	-0.57	-0.80	-0.85	-0.74

From the data in Table 1, it can be seen that the correlation value between the confidence error and the model performance is smaller (more strongly negative) in the inference phase than in the training phase, so the data obtained in the inference phase better reflect the performance of the model. Calculating the confidence error in the inference phase is therefore of great significance for evaluating model performance.
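The embodiment does not state how the correlation value is computed. As one possible reading, the sketch below uses the Pearson correlation coefficient between a series of confidence errors and a series of performance scores (for example, BLEU values). Both the choice of Pearson correlation and the function name `pearson_correlation` are assumptions made for illustration, and the numbers in the usage example are invented placeholders, not results from the application.

```python
import math
from typing import Sequence


def pearson_correlation(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equally long series."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two series of equal length with at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance_x = sum((x - mean_x) ** 2 for x in xs)
    variance_y = sum((y - mean_y) ** 2 for y in ys)
    if variance_x == 0 or variance_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return covariance / math.sqrt(variance_x * variance_y)


if __name__ == "__main__":
    # Placeholder values only: confidence error falls while the score rises,
    # so the correlation is negative.
    confidence_errors = [0.10, 0.08, 0.05]
    performance_scores = [20.0, 24.0, 30.0]
    print(pearson_correlation(confidence_errors, performance_scores))
```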

In the embodiment of the application, the words in the translated text can be further classified into a better calibration (well-calibrated) type and a worse calibration (mis-calibrated) type based on the confidence errors corresponding to the words. For example, words with confidence errors greater than a target threshold may be categorized into a poor calibration category, and words with confidence errors less than a target threshold may be categorized into a good calibration category, where the value of the target threshold may be set by a developer, which is not limited in the embodiments of the present application. In the embodiment of the application, the computer equipment can determine the correlation values between each correct sample and each error sample in the translation and the better calibration and the worse calibration respectively so as to analyze the translation model. Referring to table 2, table 2 shows correlation values between the output correct sample and the output error sample and the better calibration and the worse calibration respectively in three different translation tasks of the translation model.

TABLE 2

From the data in Table 2, it can be seen that the correlation between the correct sample and the better calibration is higher, and the correlation between the wrong sample and the worse calibration is higher, in three different translation tasks. Based on statistics in table 2, it can be known that applying the above technical solution, a confidence error with finer granularity is obtained by taking words as units, which is helpful for better analysis of the translation model.

In the embodiment of the application, the correlation values between different translation error types and the confidence errors can be determined based on the confidence errors of the words and the labels corresponding to the words. Table 3 shows the correlation values between different translation error types and the confidence errors in three different translation tasks.

TABLE 3

From the data in table 3, it can be seen that, across the different translation tasks, the confidence output by the translation model is generally higher than the actual accuracy for over-translation and wrong-translation errors, and lower than the actual accuracy for missed-translation errors. Therefore, determining the confidence errors with the technical scheme provided by the embodiment of the application makes it convenient for developers to analyze and improve the model for different error types.

It should be noted that the technical solution provided in the embodiment of the present application may also be applied to other models capable of executing classification tasks; the embodiment of the present application only uses the performance determination of a translation model as an example.

The above embodiments mainly describe obtaining the confidence error of the translation model and analyzing the performance of the translation model based on the confidence error. In the embodiment of the present application, the computer device may further apply a segmented label smoothing method (graduated label smoothing) to calibrate the confidence of each word in the translation and obtain a target confidence corresponding to each word, where the target confidence is closer to the true translation accuracy. Fig. 4 is a flowchart of a method for determining confidence of a translation model according to an embodiment of the present application. Referring to fig. 4, in one possible implementation, the method may specifically include the following steps:

401. The computer device obtains an initial confidence level for each word in the translation output by the translation model.

Wherein the initial confidence level may be used to indicate the probability that the word output by the translation model is the correct translation result, i.e., the probability predicted by the translation model.

402. The computer device determines a smoothing factor corresponding to each initial confidence level based on the value of each initial confidence level.

Wherein the smoothing factor may be used to adjust the value of each of the initial confidence levels.

In one possible implementation, the computer device may determine a target interval to which each of the initial confidence values belongs, different target intervals corresponding to different smoothing factors; and determining a smoothing factor corresponding to each initial confidence coefficient based on the target interval to which the value of each initial confidence coefficient belongs. The values of each target interval and the smoothing factor corresponding to each target interval may be set by a developer, which is not limited in the embodiment of the present application. For example, a smoothing factor of 0.3 may be applied for samples with initial confidence above 0.7, a smoothing factor of 0.1 may be applied for samples with initial confidence between 0.3 and 0.7, and a smoothing factor of 0 may be applied for samples with initial confidence below 0.3.

403. The computer device determines a target confidence level corresponding to each word based on the value of each initial confidence level and the smoothing factor corresponding to each initial confidence level.

In one possible implementation, the computer device may add each initial confidence level to its corresponding smoothing factor to obtain the target confidence level. The target confidence level can more accurately represent the probability that the word output by the model is a correct translation result.

In the embodiment of the application, based on the segmented label smoothing method, different smoothing factors are applied to initial confidence levels belonging to different value intervals to obtain the target confidence levels, and this confidence calibration method can improve the accuracy of the output of the translation model.
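As a minimal sketch, steps 401 to 403 may be illustrated as follows, using the example intervals and smoothing factors given above. The function names are hypothetical, and treating the boundary values 0.3 and 0.7 as part of the middle interval is an assumption, since the example does not state which interval the boundaries belong to. The sketch returns the plain sum described in step 403 and leaves open how a sum above 1 should be handled.

```python
def smoothing_factor(confidence, low=0.3, high=0.7):
    """Pick the smoothing factor for one initial confidence (step 402)."""
    if confidence > high:
        return 0.3
    # Assumption: the boundaries 0.3 and 0.7 belong to the middle interval.
    if confidence >= low:
        return 0.1
    return 0.0


def target_confidences(initial_confidences):
    """Add each initial confidence to its smoothing factor (step 403)."""
    # The sum is returned as-is; whether it should be bounded to [0, 1]
    # is not specified, so no clipping is applied here.
    return [c + smoothing_factor(c) for c in initial_confidences]


# Hypothetical initial confidences for three words (step 401).
initial = [0.2, 0.5, 0.9]
targets = target_confidences(initial)
```

Keeping the interval bounds as parameters reflects that the intervals and their factors may be set by a developer.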

By applying the segmented label smoothing method, the confidence error of the translation model in the reasoning stage can be significantly reduced. Table 4 shows the performance of the translation model in search spaces of different sizes and in different translation tasks, including the model performance (BLEU) value of the translation model, the confidence error, the probability of the output confidence being higher than the actual accuracy, and the probability of the output confidence being lower than the actual accuracy.

TABLE 4

According to the data in table 4, the above technical scheme is applied to adjust the initial confidence coefficient output by the translation model, so that the confidence coefficient error of the translation model in the reasoning stage can be effectively reduced, the model performance of the translation model can be improved, and the quality of the translation output by the translation model can be improved.

In this embodiment of the present application, after determining the target confidence levels corresponding to the words in the translation, the computer device may display the target confidence levels and modify the words in the translation based on an editing instruction for the translation. For example, the computer device may display the translation on a target page, display the target confidence corresponding to each word in the area below that word, and re-edit the words with lower target confidence. The target page may be a translation editing page or the like, which is not limited in the embodiment of the present application. In one possible implementation, the computer device may also highlight words with lower target confidence. For example, when the target confidence corresponding to any word is less than a confidence threshold, the computer device may display that word in a target color to prompt the user to re-edit it. The confidence threshold and the target color may be set by a developer, which is not limited in the embodiment of the present application. By displaying the target confidence corresponding to each word, this technical scheme makes it convenient for the user to judge the translation accuracy of each part of the translation; by highlighting the words with lower target confidence, it makes it convenient for the user to locate and modify the parts of the translation with lower accuracy.
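As a minimal sketch, the thresholding behind this display may be illustrated as follows. The function names, the default threshold of 0.5, and the plain-text rendering (asterisks standing in for the target color) are assumptions for illustration and do not describe an actual page layout.

```python
def annotate(words, confidences, threshold=0.5):
    """Pair each word with its target confidence and a needs-review flag."""
    return [(w, c, c < threshold) for w, c in zip(words, confidences)]


def render(annotated):
    """Text-only stand-in for the page: flagged words are wrapped in asterisks."""
    parts = []
    for word, conf, flagged in annotated:
        shown = f"*{word}*" if flagged else word
        parts.append(f"{shown}({conf:.2f})")
    return " ".join(parts)


# Hypothetical translation and target confidences.
line = render(annotate(["the", "cat"], [0.9, 0.3]))
```

A real page would replace the asterisks with the configured target color and place the confidence below each word.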

Fig. 5 is a schematic structural diagram of a device for determining performance of a translation model according to an embodiment of the present application, referring to fig. 5, the device includes:

the obtaining module 501 is configured to obtain a translation output by the translation model and a confidence coefficient of each word in the translation, where the confidence coefficient is used to indicate a probability that a word corresponding to the confidence coefficient is a correct translation result;

a grouping module 502, configured to divide each confidence level into a plurality of groups based on the value of the confidence level;

a determining module 503, configured to determine an average confidence coefficient corresponding to each set of confidence coefficients and an average accuracy rate of a word corresponding to each set of confidence coefficients; based on the average confidence and the average accuracy of the respective sets of confidence, a confidence error of the translation model is determined, the confidence error being indicative of performance of the translation model.

In one possible implementation, the grouping module 502 is configured to:

evenly dividing the value range of the confidence into a plurality of sub-ranges;

based on the value of each confidence, the confidence with the value within the same sub-range is divided into a group.
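As a minimal sketch, the grouping performed by the grouping module 502 may be illustrated as follows. In line with the claims, the value range is taken from the smallest to the largest observed confidence. The function name, the number of groups, and the choice to place the maximum value in the last group are assumptions.

```python
def group_by_confidence(confidences, num_groups):
    """Split confidences into equal-width sub-ranges between min and max."""
    lo, hi = min(confidences), max(confidences)
    width = (hi - lo) / num_groups
    groups = [[] for _ in range(num_groups)]
    for c in confidences:
        if width == 0:
            idx = 0  # all values identical: everything lands in one group
        else:
            # Assumption: the maximum value is assigned to the last group.
            idx = min(int((c - lo) / width), num_groups - 1)
        groups[idx].append(c)
    return groups


# Hypothetical word-level confidences.
groups = group_by_confidence([0.0, 0.1, 0.5, 0.9, 1.0], num_groups=2)
```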

In one possible implementation, the determining module 503 is configured to:

determining the average confidence corresponding to each set of confidence based on the number of the confidence contained in each set of confidence and the value of each confidence;

and determining the average accuracy corresponding to each set of confidence based on the word corresponding to each set of confidence and the correct translation result.

In one possible implementation, the determining module 503 is configured to:

comparing each word with each correct translation result;

if any word is the same as the correct translation result, adding a first label to the word;

if the word is different from the correct translation result, adding a second label to the word;

determining the total number of the labels corresponding to each group of confidence degrees based on the number of the first labels and the number of the second labels;

and taking the ratio of the number of the first tags to the total number of the tags as the average accuracy.
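As a minimal sketch, the labelling and the ratio described above may be illustrated for a single group as follows. The label values, the function names, and the position-by-position comparison of each word with the correct translation result are assumptions, since the alignment between words and correct results is not detailed here.

```python
FIRST_LABEL = 1   # word is the same as the correct translation result
SECOND_LABEL = 0  # word differs from the correct translation result


def label_words(words, references):
    """Attach a first or second label to each word of one group."""
    # Assumption: words and correct results are aligned position by position.
    return [FIRST_LABEL if w == r else SECOND_LABEL for w, r in zip(words, references)]


def average_accuracy(labels):
    """Ratio of first labels to the total number of labels in the group."""
    total = len(labels)
    if total == 0:
        return 0.0
    return labels.count(FIRST_LABEL) / total


# Hypothetical group of four words and their correct translation results.
labels = label_words(["a", "b", "c", "d"], ["a", "x", "c", "d"])
accuracy = average_accuracy(labels)
```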

In one possible implementation, the determining module 503 is configured to:

determining the absolute value of the difference between the average confidence and the average accuracy of the confidence of each group as an intermediate error;

and carrying out weighted average on each intermediate error based on the number of the confidence coefficient contained in each group of confidence coefficient to obtain the confidence coefficient error of the translation model.
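As a minimal sketch, the intermediate errors and their weighted average may be illustrated as follows. Each group is assumed to be a list of (confidence, is_correct) pairs. The function name and the sample data are hypothetical, and the weight of each group is taken to be its share of all confidences.

```python
def confidence_error(groups):
    """Weighted average of |average confidence - average accuracy| over groups."""
    total = sum(len(g) for g in groups)
    error = 0.0
    for g in groups:
        if not g:
            continue  # empty groups carry no weight
        avg_conf = sum(c for c, _ in g) / len(g)
        avg_acc = sum(1 for _, ok in g if ok) / len(g)
        intermediate = abs(avg_conf - avg_acc)
        # Weight each intermediate error by the group's share of all words.
        error += (len(g) / total) * intermediate
    return error


# Hypothetical groups of (confidence, is_correct) pairs.
groups = [
    [(0.2, False), (0.4, True)],
    [(0.8, True), (0.9, True)],
]
err = confidence_error(groups)
```

In this example the two groups contribute intermediate errors of 0.2 and 0.15 with equal weight, so a lower result indicates that the output confidences track the actual accuracy more closely.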

According to the device provided by the embodiment of the application, the translation output by the translation model and the confidence coefficient of each word in the translation are obtained, and the confidence coefficient is used for indicating the probability that the corresponding word is a correct translation result; based on the value of the confidence, dividing each confidence into a plurality of groups; determining the average confidence coefficient corresponding to each group of confidence coefficients and the average accuracy of words corresponding to each group of confidence coefficients; a confidence error of the translation model is determined based on the average confidence and the average accuracy of each set of confidence levels, the confidence error being indicative of performance of the translation model. By the device, the confidence coefficient error of the translation model can be accurately obtained, the performance of the translation model can be more accurately analyzed based on the confidence coefficient error, and the translation model can be conveniently improved.

It should be noted that: the device for determining the performance of the translation model provided in the above embodiment is only exemplified by the division of the above functional modules when determining the performance of the translation model, and in practical application, the above functional allocation may be performed by different functional modules according to needs, that is, the internal structure of the device is divided into different functional modules to perform all or part of the functions described above. In addition, the device for determining the performance of the translation model provided in the above embodiment belongs to the same concept as the embodiment of the method for determining the performance of the translation model, and detailed implementation processes of the device are shown in the method embodiment, and are not repeated here.

Fig. 6 is a schematic structural diagram of a confidence determining device for a translation model according to an embodiment of the present application, referring to fig. 6, the device includes:

the obtaining module 601 is configured to obtain an initial confidence coefficient of each word in the translation output by the translation model, where the initial confidence coefficient is used to indicate a probability that the word output by the translation model is a correct translation result;

a determining module 602, configured to determine, based on the value of each initial confidence coefficient, a smoothing factor corresponding to each initial confidence coefficient, where the smoothing factor is used to adjust the value of each initial confidence coefficient; and determining the target confidence corresponding to each word based on the value of each initial confidence and the smoothing factor corresponding to each initial confidence.

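In one possible implementation, the determining module 602 is configured to:

determining a target interval to which the value of each initial confidence coefficient belongs, where different target intervals correspond to different smoothing factors;

and determining the smoothing factor corresponding to each initial confidence coefficient based on the target interval to which the value of each initial confidence coefficient belongs.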

It should be noted that: in the confidence determining apparatus for a translation model provided in the above embodiment, only the division of the above functional modules is used for illustration when determining the confidence, and in practical application, the above functional allocation may be performed by different functional modules according to needs, that is, the internal structure of the apparatus is divided into different functional modules to perform all or part of the functions described above. In addition, the confidence determining device of the translation model provided in the above embodiment belongs to the same concept as the confidence determining method embodiment of the translation model, and detailed implementation process of the confidence determining device is referred to the method embodiment, and is not repeated here.

The computer device provided by the above technical solution may be implemented as a terminal or a server. For example, fig. 7 is a schematic structural diagram of a terminal provided in an embodiment of the present application. The terminal 700 may be: a smart phone, a tablet computer, an MP3 (Moving Picture Experts Group Audio Layer III) player, an MP4 (Moving Picture Experts Group Audio Layer IV) player, a notebook computer, or a desktop computer. Terminal 700 may also be referred to by other names, such as user equipment, portable terminal, laptop terminal, or desktop terminal.

In general, the terminal 700 includes: one or more processors 701, and one or more memories 702.

Processor 701 may include one or more processing cores, such as a 4-core processor, an 8-core processor, and the like. The processor 701 may be implemented in at least one hardware form of DSP (Digital Signal Processing), FPGA (Field-Programmable Gate Array), and PLA (Programmable Logic Array). The processor 701 may also include a main processor and a coprocessor: the main processor, also referred to as a CPU (Central Processing Unit), is a processor for processing data in an awake state; the coprocessor is a low-power processor for processing data in a standby state. In some embodiments, the processor 701 may integrate a GPU (Graphics Processing Unit) for rendering and drawing the content to be displayed by the display screen. In some embodiments, the processor 701 may also include an AI (Artificial Intelligence) processor for processing computing operations related to machine learning.

Memory 702 may include one or more computer-readable storage media, which may be non-transitory. The memory 702 may also include high-speed random access memory, as well as non-volatile memory, such as one or more magnetic disk storage devices, flash memory storage devices. In some embodiments, a non-transitory computer readable storage medium in memory 702 is used to store at least one program code for execution by processor 701 to implement the method for determining the performance of a translation model and the method for determining the confidence of a translation model provided by the method embodiments in the present application.

In some embodiments, the terminal 700 may further optionally include: a peripheral interface 703 and at least one peripheral device. The processor 701, the memory 702, and the peripheral interface 703 may be connected by a bus or signal lines. Each peripheral device may be connected to the peripheral interface 703 via a bus, a signal line, or a circuit board. Specifically, the peripheral device includes at least one of: radio frequency circuitry 704, a display screen 705, a camera assembly 706, audio circuitry 707, and a power supply 708.

The peripheral interface 703 may be used to connect at least one I/O (Input/Output) related peripheral device to the processor 701 and the memory 702. In some embodiments, the processor 701, the memory 702, and the peripheral interface 703 are integrated on the same chip or circuit board; in some other embodiments, any one or two of the processor 701, the memory 702, and the peripheral interface 703 may be implemented on a separate chip or circuit board, which is not limited in this embodiment.

The radio frequency circuit 704 is configured to receive and transmit RF (Radio Frequency) signals, also referred to as electromagnetic signals. The radio frequency circuit 704 communicates with a communication network and other communication devices via electromagnetic signals. The radio frequency circuit 704 converts an electrical signal into an electromagnetic signal for transmission, or converts a received electromagnetic signal into an electrical signal. Optionally, the radio frequency circuit 704 includes: an antenna system, an RF transceiver, one or more amplifiers, a tuner, an oscillator, a digital signal processor, a codec chipset, a subscriber identity module card, and so forth. The radio frequency circuit 704 may communicate with other terminals via at least one wireless communication protocol. The wireless communication protocol includes, but is not limited to: metropolitan area networks, various generations of mobile communication networks (2G, 3G, 4G, and 5G), wireless local area networks, and/or WiFi (Wireless Fidelity) networks. In some embodiments, the radio frequency circuit 704 may also include NFC (Near Field Communication) related circuitry, which is not limited in this application.

The display screen 705 is used to display a UI (User Interface). The UI may include graphics, text, icons, video, and any combination thereof. When the display screen 705 is a touch display screen, the display screen 705 also has the ability to collect touch signals at or above its surface. The touch signal may be input to the processor 701 as a control signal for processing. In this case, the display screen 705 may also be used to provide virtual buttons and/or virtual keyboards, also referred to as soft buttons and/or soft keyboards. In some embodiments, there may be one display screen 705, disposed on the front panel of the terminal 700; in other embodiments, there may be at least two display screens 705, respectively disposed on different surfaces of the terminal 700 or in a folded design; in some embodiments, the display screen 705 may be a flexible display screen disposed on a curved surface or a folded surface of the terminal 700. The display screen 705 may even be arranged in a non-rectangular irregular shape, i.e. a shaped screen. The display screen 705 may be made of materials such as LCD (Liquid Crystal Display) or OLED (Organic Light-Emitting Diode).

The camera assembly 706 is used to capture images or video. Optionally, the camera assembly 706 includes a front camera and a rear camera. Typically, the front camera is disposed on the front panel of the terminal and the rear camera is disposed on the rear surface of the terminal. In some embodiments, there are at least two rear cameras, each being any one of a main camera, a depth camera, a wide-angle camera, and a telephoto camera, so that the main camera and the depth camera can be fused to realize a background blurring function, and the main camera and the wide-angle camera can be fused to realize panoramic shooting, Virtual Reality (VR) shooting, or other fusion shooting functions. In some embodiments, the camera assembly 706 may also include a flash. The flash may be a single color temperature flash or a dual color temperature flash. A dual color temperature flash is a combination of a warm light flash and a cold light flash, and can be used for light compensation under different color temperatures.

The audio circuit 707 may include a microphone and a speaker. The microphone is used for collecting sound waves of users and environments, converting the sound waves into electric signals, and inputting the electric signals to the processor 701 for processing, or inputting the electric signals to the radio frequency circuit 704 for voice communication. For the purpose of stereo acquisition or noise reduction, a plurality of microphones may be respectively disposed at different portions of the terminal 700. The microphone may also be an array microphone or an omni-directional pickup microphone. The speaker is used to convert electrical signals from the processor 701 or the radio frequency circuit 704 into sound waves. The speaker may be a conventional thin film speaker or a piezoelectric ceramic speaker. When the speaker is a piezoelectric ceramic speaker, not only the electric signal can be converted into a sound wave audible to humans, but also the electric signal can be converted into a sound wave inaudible to humans for ranging and other purposes. In some embodiments, the audio circuit 707 may also include a headphone jack.

The power supply 708 is used to power the various components in the terminal 700. The power supply 708 may be an alternating current, a direct current, a disposable battery, or a rechargeable battery. When the power supply 708 comprises a rechargeable battery, the rechargeable battery may support wired or wireless charging. The rechargeable battery may also support fast charge technology.

In some embodiments, the terminal 700 further includes one or more sensors 709. The one or more sensors 709 include, but are not limited to: acceleration sensor 710, gyro sensor 711, pressure sensor 712, optical sensor 713, and proximity sensor 714.

The acceleration sensor 710 may detect the magnitudes of accelerations on three coordinate axes of a coordinate system established with the terminal 700. For example, the acceleration sensor 710 may be used to detect components of gravitational acceleration in three coordinate axes. The processor 701 may control the display screen 705 to display a user interface in a landscape view or a portrait view according to the gravitational acceleration signal acquired by the acceleration sensor 710. Acceleration sensor 710 may also be used for the acquisition of motion data of a game or user.

The gyro sensor 711 may detect a body direction and a rotation angle of the terminal 700, and the gyro sensor 711 may collect a 3D motion of the user on the terminal 700 in cooperation with the acceleration sensor 710. The processor 701 may implement the following functions according to the data collected by the gyro sensor 711: motion sensing (e.g., changing UI according to a tilting operation by a user), image stabilization at shooting, game control, and inertial navigation.

The pressure sensor 712 may be disposed at a side frame of the terminal 700 and/or at a lower layer of the display screen 705. When the pressure sensor 712 is disposed at a side frame of the terminal 700, a grip signal of the user to the terminal 700 may be detected, and the processor 701 performs a left-right hand recognition or a shortcut operation according to the grip signal collected by the pressure sensor 712. When the pressure sensor 712 is disposed at the lower layer of the display screen 705, the processor 701 controls the operability control on the UI interface according to the pressure operation of the user on the display screen 705. The operability controls include at least one of a button control, a scroll bar control, an icon control, and a menu control.

The optical sensor 713 is used to collect the intensity of ambient light. In one embodiment, the processor 701 may control the display brightness of the display screen 705 based on the ambient light intensity collected by the optical sensor 713. Specifically, when the intensity of the ambient light is high, the display brightness of the display screen 705 is turned up; when the ambient light intensity is low, the display brightness of the display screen 705 is turned down. In another embodiment, the processor 701 may also dynamically adjust the shooting parameters of the camera assembly 706 based on the ambient light intensity collected by the optical sensor 713.

The proximity sensor 714, also known as a distance sensor, is typically provided on the front panel of the terminal 700. The proximity sensor 714 is used to collect the distance between the user and the front surface of the terminal 700. In one embodiment, when the proximity sensor 714 detects that the distance between the user and the front surface of the terminal 700 gradually decreases, the processor 701 controls the display screen 705 to switch from the screen-on state to the screen-off state; when the proximity sensor 714 detects that the distance between the user and the front surface of the terminal 700 gradually increases, the processor 701 controls the display screen 705 to switch from the screen-off state to the screen-on state.

Those skilled in the art will appreciate that the structure shown in fig. 7 is not limiting of the terminal 700 and may include more or fewer components than shown, or may combine certain components, or may employ a different arrangement of components.

Fig. 8 is a schematic structural diagram of a server according to an embodiment of the present application, where the server 800 may include one or more processors (Central Processing Units, CPU) 801 and one or more memories 802, where the one or more memories 802 store at least one program code, and the at least one program code is loaded and executed by the one or more processors 801 to implement the methods provided in the foregoing method embodiments. Of course, the server 800 may also have a wired or wireless network interface, a keyboard, an input/output interface, and other components for implementing the functions of the device, which are not described herein.

In an exemplary embodiment, a computer readable storage medium is also provided, such as a memory comprising at least one program code, the at least one program code being executable by a processor to perform the performance determination method of the translation model and the confidence determination method of the translation model in the above embodiments. For example, the computer readable storage medium may be a Read-Only Memory (ROM), a Random Access Memory (RAM), a Compact Disc Read-Only Memory (CD-ROM), a magnetic tape, a floppy disk, an optical data storage device, and the like.

It will be appreciated by those of ordinary skill in the art that all or part of the steps of the above-described embodiments may be implemented by hardware, or by a program instructing the relevant hardware, where the program may be stored in a computer readable storage medium, and the storage medium may be a read-only memory, a magnetic disk, an optical disk, or the like.

The foregoing is merely illustrative of optional embodiments of the present application and is not intended to limit the present application; any modification, equivalent replacement, or improvement made within the spirit and principles of the present application shall fall within the scope of protection of the present application.

Claims

1. A method for determining performance of a translation model, the method comprising:

acquiring initial confidence coefficient of each word in the translation output by the translation model, wherein the initial confidence coefficient is used for indicating probability that the word output by the translation model is a correct translation result;

determining target intervals to which the values of the initial confidence coefficients belong, wherein different target intervals correspond to different smoothing factors, and the smoothing factors are used for adjusting the values of the initial confidence coefficients;

determining smoothing factors corresponding to the initial confidence coefficients based on target intervals to which the values of the initial confidence coefficients belong;

adding the value of each initial confidence coefficient and the smoothing factor corresponding to each initial confidence coefficient to obtain the confidence coefficient corresponding to each word, wherein the confidence coefficient is used for indicating the probability that the word corresponding to the confidence coefficient is a correct translation result;

determining a value range of the confidence coefficient based on the confidence coefficient with the maximum value and the confidence coefficient with the minimum value in the confidence coefficient of each word in the translation;

dividing each confidence into a plurality of groups based on the value range of the confidence;

determining an average confidence coefficient corresponding to each set of confidence coefficients and an average accuracy rate of a word corresponding to each set of confidence coefficients, wherein the average confidence coefficient is used for reflecting the accuracy rate of the word from a model prediction dimension, the average accuracy rate is determined by a label of the word corresponding to each set of confidence coefficients, the label is used for representing the translation condition of the corresponding word, the translation condition comprises correct translation, incorrect translation, over translation and missed translation, and the average accuracy rate is used for reflecting the accuracy rate of the word from a real condition;

determining a confidence error of the translation model based on the average confidence and the average accuracy of the respective sets of confidence, the confidence error being indicative of performance of the translation model;

classifying the words with the confidence errors larger than the target threshold value into poor calibration categories, and classifying the words with the confidence errors smaller than the target threshold value into good calibration categories;

determining correlation values between each correct sample and each incorrect sample in the translation and the better calibration category and the worse calibration category respectively;

based on the correlation values, the performance of the translation model is analyzed.

2. The method of claim 1, wherein the grouping each of the confidence levels into a plurality of groups based on the range of values of the confidence levels comprises:

and classifying the confidence degrees with values within the same sub-range into a group based on the values of the confidence degrees.

3. The method of claim 1, wherein determining the average confidence level for each set of confidence levels and the average accuracy of the word for which each set of confidence levels corresponds comprises:

determining the average confidence degrees corresponding to the confidence degrees of each group based on the number of the confidence degrees contained in the confidence degrees of each group and the value of each confidence degree;

and determining the average accuracy corresponding to each group of confidence coefficients based on the labels of the words corresponding to each group of confidence coefficients and the correct translation result.

4. The method of claim 3, wherein the determining the average accuracy rate for each set of confidence levels based on the labels of the words for which each set of confidence levels corresponds and the correct translation results comprises:

comparing each word with each correct translation result;

if any word is the same as the correct translation result, adding a first label to that word;

if any word is different from the correct translation result, adding a second label to that word;
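The labelling step might look like the sketch below. It assumes the translation and the correct translation result are compared position by position, which the claim does not specify; the constants `FIRST_LABEL` and `SECOND_LABEL` and the function name are hypothetical.

```python
from typing import List

FIRST_LABEL = 1   # word matches the correct translation result (assumed encoding)
SECOND_LABEL = 0  # word differs from the correct translation result (assumed encoding)

def label_words(words: List[str], reference_words: List[str]) -> List[int]:
    """Label each translated word by comparing it with the reference word at the same position."""
    labels = []
    for position, word in enumerate(words):
        # Assumption: a word with no counterpart in the reference counts as different.
        reference = reference_words[position] if position < len(reference_words) else None
        labels.append(FIRST_LABEL if word == reference else SECOND_LABEL)
    return labels

labels = label_words(["the", "cat", "sat"], ["the", "dog", "sat"])
```

With this encoding, the average accuracy of a group is simply the mean of the labels of its words.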

5. The method of claim 1, wherein the determining a confidence error for the translation model based on the average confidence and the average accuracy for the respective set of confidence coefficients comprises:

determining the absolute value of the difference between the average confidence and the average accuracy of each group of confidences as an intermediate error;
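The intermediate error can be sketched as below. How the intermediate errors are combined is not stated in the text above, so the second function is an assumption: it uses the size-weighted sum familiar from the expected calibration error, not necessarily the aggregation the claim intends. All names are hypothetical.

```python
from typing import List, Tuple

# One entry per group: (average confidence, average accuracy, number of words).
GroupStats = Tuple[float, float, int]

def intermediate_errors(stats: List[GroupStats]) -> List[float]:
    """Absolute gap between average confidence and average accuracy, per group."""
    return [abs(conf - acc) for conf, acc, _ in stats]

def weighted_confidence_error(stats: List[GroupStats]) -> float:
    """Assumed aggregation: weight each group's gap by its share of all words."""
    total = sum(count for _, _, count in stats)
    return sum(count / total * abs(conf - acc) for conf, acc, count in stats)

stats = [(0.75, 0.5, 2), (0.25, 0.25, 2)]
errors = intermediate_errors(stats)
overall = weighted_confidence_error(stats)
```

A perfectly calibrated group contributes zero, so the overall value grows only where confidence and accuracy disagree.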

6. A performance determining apparatus for a translation model, the apparatus comprising:

the translation module is used for outputting translation results of the words, and the translation results are used for judging whether the translation results are correct or not; determining target intervals to which the values of the initial confidence coefficients belong, wherein different target intervals correspond to different smoothing factors, and the smoothing factors are used for adjusting the values of the initial confidence coefficients; determining smoothing factors corresponding to the initial confidence coefficients based on target intervals to which the values of the initial confidence coefficients belong; adding the value of each initial confidence coefficient and the smoothing factor corresponding to each initial confidence coefficient to obtain the confidence coefficient corresponding to each word, wherein the confidence coefficient is used for indicating the probability that the word corresponding to the confidence coefficient is a correct translation result;

the grouping module is used for determining a value range of the confidence based on the confidence with the maximum value and the confidence with the minimum value among the confidences of the words in the translation; and dividing each confidence into a plurality of groups based on the value range of the confidence;

the determining module is used for determining the average confidence corresponding to each group of confidences and the average accuracy of the words corresponding to each group of confidences, wherein the average confidence reflects the accuracy of the words from the model-prediction perspective, the average accuracy is determined from the labels of the words corresponding to each group of confidences, each label represents the translation status of the corresponding word, the translation status comprises correct translation, incorrect translation, over-translation and missed translation, and the average accuracy reflects the accuracy of the words as it actually is; determining a confidence error of the translation model based on the average confidence and the average accuracy of each group of confidences, the confidence error being indicative of the performance of the translation model; classifying words whose confidence error is larger than a target threshold into a poor calibration category, and classifying words whose confidence error is smaller than the target threshold into a good calibration category; determining correlation values between each correct sample and each incorrect sample in the translation and, respectively, the good calibration category and the poor calibration category; and analyzing the performance of the translation model based on the correlation values.
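The interval-dependent smoothing performed by the translation module can be sketched as follows. The interval boundaries and smoothing factors in `SMOOTHING_TABLE` are invented purely for illustration; the claim only states that different target intervals correspond to different smoothing factors and that the factor is added to the initial confidence.

```python
from typing import List, Tuple

# Hypothetical table: (upper bound of the target interval, smoothing factor).
# Intervals are consecutive, sorted by upper bound, and start at 0.0.
SMOOTHING_TABLE: List[Tuple[float, float]] = [
    (0.5, 0.125),   # initial confidences in [0.0, 0.5] are raised
    (1.0, -0.125),  # initial confidences in (0.5, 1.0] are lowered
]

def smoothing_factor(initial_confidence: float) -> float:
    """Look up the smoothing factor of the target interval containing the value."""
    for upper_bound, factor in SMOOTHING_TABLE:
        if initial_confidence <= upper_bound:
            return factor
    raise ValueError("initial confidence lies outside every target interval")

def smooth_confidences(initial_confidences: List[float]) -> List[float]:
    """Add each initial confidence to its interval's smoothing factor."""
    return [value + smoothing_factor(value) for value in initial_confidences]

smoothed = smooth_confidences([0.25, 0.75])
```

The direction of the adjustment depends entirely on the factors chosen for each interval; the signs used here are an arbitrary example.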

7. The apparatus of claim 6, wherein the grouping module is configured to:

8. The apparatus of claim 6, wherein the determining module is configured to:

9. The apparatus of claim 8, wherein the determining module is configured to:

comparing each word with each correct translation result;

10. The apparatus of claim 6, wherein the determining module is configured to:

11. A computer device comprising one or more processors and one or more memories, the one or more memories having stored therein at least one piece of program code that is loaded and executed by the one or more processors to implement the operations performed by the method of determining the performance of a translation model as claimed in any of claims 1 to 5.

12. A computer readable storage medium having stored therein at least one program code loaded and executed by a processor to implement the operations performed by the method for determining the performance of a translation model according to any one of claims 1 to 5.