CN113836946B - Method, device, terminal and storage medium for training scoring model - Google Patents

Method, device, terminal and storage medium for training scoring model

Info

Publication number
CN113836946B
CN113836946B (Application CN202111069233.3A)
Authority
CN
China
Prior art keywords
sample
translation
score
vector
text
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202111069233.3A
Other languages
Chinese (zh)
Other versions
CN113836946A (en)
Inventor
徐金安
黄辉
狄慧
刘健
陈钰枫
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Jiaotong University
Toshiba China Co Ltd
Original Assignee
Beijing Jiaotong University
Toshiba China Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Jiaotong University and Toshiba China Co Ltd
Priority to CN202111069233.3A priority Critical patent/CN113836946B/en
Publication of CN113836946A publication Critical patent/CN113836946A/en
Application granted granted Critical
Publication of CN113836946B publication Critical patent/CN113836946B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00 - Handling natural language data
    • G06F 40/40 - Processing or translation of natural language
    • G06F 40/51 - Translation evaluation
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00 - Handling natural language data
    • G06F 40/30 - Semantic analysis
    • Y - GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 - TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D - CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D 10/00 - Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Machine Translation (AREA)

Abstract

The application discloses a method, a device, a terminal and a storage medium for training a scoring model, and belongs to the technical field of Internet. The method comprises the following steps: acquiring a sample original text, a first sample translation and at least one second sample translation, wherein the semantics of the first sample translation are the same as the semantics corresponding to the sample original text, and the semantics of the second sample translation are different from the semantics of the first sample translation; inputting the sample original text and the first sample translation into a scoring model to obtain a first sample score corresponding to the first sample translation, and inputting the sample original text and each second sample translation into the scoring model to obtain a second sample score corresponding to each second sample translation; determining loss information based on the first sample score and the at least one second sample score; the scoring model is adjusted based on the loss information. Therefore, the embodiment of the application solves the problem that the scoring model cannot be trained without the reference score corresponding to the sample translation.

Description

Method, device, terminal and storage medium for training scoring model
Technical Field
The present application relates to the field of internet technologies, and in particular, to a method, an apparatus, a terminal, and a storage medium for training a scoring model.
Background
With the development of science and technology, it has become particularly important to evaluate the translations produced by machine translation.
In the related art, an original text and a corresponding translation are input into a trained scoring model to obtain a score corresponding to the translation, and the translation is then evaluated based on that score. The training process of that scoring model is as follows: a training sample set is acquired, where each training sample comprises a sample original text, a corresponding sample translation and a pre-annotated reference score. The sample original text and the sample translation in a training sample are input into the scoring model to obtain a prediction score corresponding to the sample translation. The scoring model is then trained and adjusted based on the prediction score and the reference score.
In the above process, the reference score is obtained by having a professional translator score the sample translation against the sample original text. Once training samples lack the corresponding reference scores, the scoring model cannot be trained.
Disclosure of Invention
The embodiment of the application provides a method, a device, a terminal and a storage medium for training a scoring model, which solve the problem that the scoring model cannot be trained without a reference score corresponding to a sample translation. The technical scheme is as follows:
In a first aspect, an embodiment of the present application provides a method for training a scoring model, including:
acquiring a sample original text, a first sample translation and at least one second sample translation, wherein the semantics of the first sample translation are the same as those of the sample original text, and the semantics of the second sample translation are different from those of the first sample translation;
inputting the sample original text and the first sample translation into a scoring model to obtain a first sample score corresponding to the first sample translation, and inputting the sample original text and each second sample translation into the scoring model to obtain a second sample score corresponding to each second sample translation;
determining loss information based on the first sample score and at least one second sample score;
and adjusting the scoring model based on the loss information.
Optionally, before the obtaining the sample original text, the first sample translation and the at least one second sample translation, the method further includes:
acquiring a first sample vector corresponding to the first sample translation and a Gaussian noise vector;
adding the first sample vector and the Gaussian noise vector to obtain a first sample vector after noise addition;
And inputting the first sample vector after noise addition and the first sample vector into a pre-trained denoising self-encoder to obtain the second sample translation.
Optionally, before the obtaining the sample original text, the first sample translation and the at least one second sample translation, the method further includes:
acquiring a first sample text vector corresponding to the first sample translation;
randomly destroying the first sample translation to obtain a destroyed first sample translation;
determining a second sample text vector corresponding to the corrupted first sample translation;
and inputting the first sample text vector and the second sample text vector into a pre-trained denoising self-encoder to obtain the second sample translation.
Optionally, the determining loss information based on the first sample score and at least one second sample score includes:
determining the loss information based on the first sample score, the at least one second sample score, and a first preset formula;
the first preset formula is L = Σ_{x∈D} -(p_x × log(W_x × h(x)) + (1 - p_x) × log(1 - W_x × h(x)));
wherein L is the loss information, D is a sample translation set consisting of the first sample translation and the at least one second sample translation, x is any sample translation in the sample translation set D, h(x) is the score corresponding to the sample translation x, W_x is a preset coefficient, and p_x is a preset constant whose numerical range is (0, 1).
Optionally, the determining loss information based on the first sample score and at least one second sample score includes:
determining the loss information based on the first sample score, the at least one second sample score, and a second preset formula;
the second preset formula is a margin-based ranking loss computed over the first sample score and each second sample score;
wherein L is the loss information, D is a sample translation set composed of the first sample translation and the at least one second sample translation, s is the first sample translation, h(s) is the first sample score corresponding to the first sample translation, x is any sample translation in the sample translation set D, h(x) is the score corresponding to the sample translation x, and margin is a preset constant.
Optionally, the method further comprises:
and inputting the target original text and the target translation into a pre-trained scoring model to obtain a target score corresponding to the target translation.
Optionally, the scoring model includes a text preprocessing module, a feature extraction module, and a scoring module;
inputting the sample original text and the first sample translation into a scoring model to obtain a first sample score corresponding to the first sample translation, wherein the step of obtaining the first sample score comprises the following steps:
Inputting the sample original text and the first sample translated text into a text preprocessing module to obtain a sample character sequence;
inputting the sample character sequence into a feature extraction module to obtain sample feature information;
and inputting the sample characteristic information into a scoring module to obtain a first sample score corresponding to the first sample translation.
In a second aspect, an embodiment of the present application provides an apparatus for training a scoring model, where the apparatus includes:
the first acquisition module is configured to acquire a sample original text, a first sample translation and at least one second sample translation, wherein the semantics of the first sample translation are the same as those of the sample original text, and the semantics of the second sample translation are different from those of the first sample translation;
the input module is configured to input the sample original text and the first sample translation into a scoring model to obtain a first sample score corresponding to the first sample translation, and input the sample original text and each second sample translation into the scoring model to obtain a second sample score corresponding to each second sample translation;
a determining module configured to determine loss information based on the first sample score and at least one second sample score;
And the adjustment module is configured to adjust the scoring model based on the loss information.
Optionally, the apparatus further includes a second acquisition module configured to:
acquiring a first sample vector corresponding to the first sample translation and a Gaussian noise vector;
adding the first sample vector and the Gaussian noise vector to obtain a first sample vector after noise addition;
and inputting the first sample vector after noise addition and the first sample vector into a pre-trained denoising self-encoder to obtain the second sample translation.
Optionally, the apparatus further includes a third acquisition module configured to:
acquiring a first sample text vector corresponding to the first sample translation;
randomly destroying the first sample translation to obtain a destroyed first sample translation;
determining a second sample text vector corresponding to the corrupted first sample translation;
and inputting the first sample text vector and the second sample text vector into a pre-trained denoising self-encoder to obtain the second sample translation.
Optionally, the determining module is configured to:
Determining the loss information based on the first sample score, the at least one second sample score, and a first preset formula;
the first preset formula is L = Σ_{x∈D} -(p_x × log(W_x × h(x)) + (1 - p_x) × log(1 - W_x × h(x)));
wherein L is the loss information, D is a sample translation set consisting of the first sample translation and the at least one second sample translation, x is any sample translation in the sample translation set D, h(x) is the score corresponding to the sample translation x, W_x is a preset coefficient, and p_x is a preset constant whose numerical range is (0, 1).
Optionally, the determining module is configured to:
determining the loss information based on the first sample score, the at least one second sample score, and a second preset formula;
the second preset formula is a margin-based ranking loss computed over the first sample score and each second sample score;
wherein L is the loss information, D is a sample translation set composed of the first sample translation and the at least one second sample translation, s is the first sample translation, h(s) is the first sample score corresponding to the first sample translation, x is any sample translation in the sample translation set D, h(x) is the score corresponding to the sample translation x, and margin is a preset constant.
Optionally, the apparatus further comprises a use module configured to:
And inputting the target original text and the target translation into a pre-trained scoring model to obtain a target score corresponding to the target translation.
Optionally, the scoring model includes a text preprocessing module, a feature extraction module, and a scoring module;
the input module is configured to:
inputting the sample original text and the first sample translated text into a text preprocessing module to obtain a sample character sequence;
inputting the sample character sequence into a feature extraction module to obtain sample feature information;
and inputting the sample characteristic information into a scoring module to obtain a first sample score corresponding to the first sample translation.
In a third aspect, an embodiment of the present application provides a terminal, where the terminal includes a processor and a memory, where the memory stores at least one program code, and the at least one program code is loaded and executed by the processor to implement the method for training a scoring model described above.
In a fourth aspect, embodiments of the present application provide a computer readable storage medium having stored therein at least one program code loaded and executed by a processor to implement the method of training a scoring model described above.
In a fifth aspect, embodiments of the present application provide a computer program product or computer program comprising computer program code stored in a computer readable storage medium, the computer program code being read from the computer readable storage medium by a processor of a computer device, the computer program code being executed by the processor, causing the computer device to perform the method of training a scoring model as described above.
In the embodiment of the application, a first sample score corresponding to a first sample translation whose semantics are the same as those of the sample original text is obtained, along with a second sample score corresponding to a second sample translation whose semantics differ from those of the sample original text. Loss information is determined based on the first sample score and the second sample score, and the scoring model is adjusted based on the loss information. Therefore, there is no need to acquire a reference score for the sample translation, which solves the problem in the prior art that the scoring model cannot be trained without a reference score corresponding to the sample translation.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings required for the description of the embodiments will be briefly described below, and it is apparent that the drawings in the following description are only some embodiments of the present application, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a schematic diagram of an implementation environment of a method for training a scoring model according to an embodiment of the present application;
FIG. 2 is a flow chart of a method of training a scoring model provided by an embodiment of the present application;
FIG. 3A is a schematic diagram of a method for training a scoring model according to an embodiment of the present application;
FIG. 3B is a schematic diagram of a method for training a scoring model according to an embodiment of the present application;
FIG. 4A is a schematic diagram of a method for training a scoring model according to an embodiment of the present application;
FIG. 4B is a schematic diagram of a method for training a scoring model according to an embodiment of the present application;
FIG. 5A is a schematic diagram of a method for training a scoring model according to an embodiment of the present application;
FIG. 5B is a schematic diagram of a method for training a scoring model according to an embodiment of the present application;
FIG. 6 is a schematic structural diagram of an apparatus for training scoring models according to an embodiment of the present application;
fig. 7 is a schematic structural diagram of a terminal according to an embodiment of the present application;
fig. 8 is a schematic structural diagram of a server according to an embodiment of the present application.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the present application more apparent, the embodiments of the present application will be described in further detail with reference to the accompanying drawings.
FIG. 1 is a schematic diagram of an implementation environment of a method for training a scoring model according to an embodiment of the present application. As shown in fig. 1, the method may be implemented by the terminal 101 or the server 102.
The terminal 101 may include a processor, memory, and the like. The processor, which may be a CPU (Central Processing Unit) or the like, may be configured to obtain a sample original text, a first sample translation and at least one second sample translation, input the sample original text and the first sample translation into a scoring model to obtain a first sample score corresponding to the first sample translation, input the sample original text and each second sample translation into the scoring model to obtain a second sample score corresponding to each second sample translation, determine loss information based on the first sample score and the at least one second sample score, and adjust the scoring model based on the loss information. The memory, which may be RAM (Random Access Memory), Flash memory or the like, may be used to store the sample original text, the first sample translation, the at least one second sample translation, and the like. The terminal 101 may further include a transceiver, an image detection component, a screen, an audio output component, an audio input component, and the like. The audio output component may be a speaker, a headset, or the like, and the audio input component may be a microphone or the like.
The server 102 may include a processor, memory, and the like. The processor, which may be a CPU (Central Processing Unit) or the like, may be configured to obtain a sample original text, a first sample translation and at least one second sample translation, input the sample original text and the first sample translation into a scoring model to obtain a first sample score corresponding to the first sample translation, input the sample original text and each second sample translation into the scoring model to obtain a second sample score corresponding to each second sample translation, determine loss information based on the first sample score and the at least one second sample score, and adjust the scoring model based on the loss information. The memory, which may be RAM (Random Access Memory), Flash memory or the like, may be used to store the sample original text, the first sample translation, the at least one second sample translation, and the like.
Fig. 2 is a flowchart of a method for training a scoring model according to an embodiment of the present application. Referring to fig. 2, this embodiment includes:
step 201, obtaining a sample original, a first sample translation and at least one second sample translation.
The semantics of the first sample translation are the same as the semantics corresponding to the sample original text, and the semantics of the second sample translation are different from the semantics of the first sample translation, that is, the semantics of the second sample translation are different from the semantics of the sample original text. The sample original text, the first sample translation, second sample translation 1 and second sample translation 2 may be as shown in Table 1 below:
TABLE 1
Optionally, the second sample translation may be any translation semantically different from the first sample translation, but if the deviation between the second sample translation and the first sample translation is large, the trained scoring model may perform poorly. Therefore, in the embodiment of the application the deviation between the second sample translation and the first sample translation is kept small. The embodiment of the application provides a plurality of methods for obtaining a second sample translation with a small deviation from the first sample translation; the specific methods for obtaining the second sample translation are as follows:
In the first method, a first sample vector corresponding to the first sample translation and a Gaussian noise vector are obtained. The first sample vector and the Gaussian noise vector are added to obtain the first sample vector after noise addition. The first sample vector after noise addition and the first sample vector are input into a pre-trained denoising self-encoder to obtain the second sample translation.
A text vector is the vector form corresponding to a text, and the number of numerical positions contained in the first sample vector is the same as the number of numerical positions contained in the Gaussian noise vector. The Gaussian noise vector is a randomly generated noise vector drawn from a multidimensional Gaussian distribution with mean 0 and variance 1; the method for generating the noise vector is prior art and is not repeated in the embodiment of the application.
In implementation, the vector form corresponding to the first sample translation is obtained through a word embedding algorithm to obtain the first sample vector, and a Gaussian noise vector is randomly generated using the prior art. The first sample vector and the Gaussian noise vector are added to obtain the first sample vector after noise addition. The first sample vector after noise addition and the first sample vector are input into the pre-trained denoising self-encoder to obtain the second sample translation.
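The short sketch below illustrates this noise-addition step in Python. It is not the patent's code; the array shape, the numpy calls and the function name are illustrative assumptions, and only the described operation (adding a mean-0, variance-1 Gaussian noise vector of the same size to the embedding of the first sample translation) is taken from the text above.

```python
# Illustrative sketch: add a randomly generated Gaussian noise vector to the
# first sample vector. Shapes and names are assumptions for illustration only.
import numpy as np

def add_gaussian_noise(first_sample_vector: np.ndarray) -> np.ndarray:
    """Return the first sample vector after noise addition."""
    # The noise vector has the same number of numerical positions as the text vector.
    gaussian_noise = np.random.normal(loc=0.0, scale=1.0, size=first_sample_vector.shape)
    return first_sample_vector + gaussian_noise

# Example: a (sequence_length, embedding_dim) matrix of word embeddings.
first_sample_vector = np.random.rand(12, 512).astype(np.float32)
noised_vector = add_gaussian_noise(first_sample_vector)
```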
The training process of the denoising self-encoder is as follows: a sample text is obtained and its vector form is obtained, thereby obtaining a sample text vector corresponding to the sample text. The sample text vector and a randomly generated Gaussian noise vector are added to obtain the sample text vector after noise addition. The sample text vector after noise addition and the sample text vector are input into the denoising self-encoder to obtain a predicted text. Loss information is obtained based on the predicted text and the sample text, and the parameters of the denoising self-encoder are adjusted based on the loss information to obtain a parameter-adjusted denoising self-encoder. The parameter-adjusted denoising self-encoder is then trained and adjusted with other sample texts until it converges, so as to obtain the pre-trained denoising self-encoder.
The specific structure of the denoising self-encoder is shown in fig. 3A. A first sample vector corresponding to the first sample translation and a randomly generated Gaussian noise vector are obtained, and the two are added to obtain a first sample vector containing noise. The first sample vector containing noise and the first sample vector are input into the denoising self-encoder to obtain the second sample translation.
After the first sample vector containing noise is input into the denoising self-encoder, the denoising self-encoder performs linear mapping on it to obtain a linearly mapped first vector. The first vector is input into a multi-head self-attention layer to obtain a second vector. The first vector and the second vector are residually connected, and the residually connected vectors are normalized to obtain a third vector. The third vector is input into a feedforward layer to obtain a fourth vector. The third vector and the fourth vector are residually connected and normalized to obtain a fifth vector. Meanwhile, after the first sample vector is input into the denoising self-encoder, the denoising self-encoder performs linear mapping on the first sample vector to obtain a linearly mapped sixth vector. The sixth vector is input into a masked multi-head self-attention layer to obtain a seventh vector. The sixth vector and the seventh vector are residually connected and normalized to obtain an eighth vector. The eighth vector and the fifth vector are input into a multi-head mutual attention layer to obtain a ninth vector. The eighth vector and the ninth vector are residually connected and normalized to obtain a tenth vector. The tenth vector is input into the feedforward layer to obtain an eleventh vector. The tenth vector and the eleventh vector are residually connected and normalized to obtain a twelfth vector. The twelfth vector is input into a linear layer and Softmax processing is performed to obtain the second sample translation.
This denoising self-encoder is a single-encoder/single-decoder architecture, and only one multi-head mutual attention layer is arranged in it, which is used for realizing information interaction between the encoder and the decoder. The multi-head self-attention layer, the masked multi-head self-attention layer, the multi-head mutual attention layer and the feedforward layers involved in the above process are all neural networks.
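A compact sketch of this single-encoder/single-decoder arrangement, built from PyTorch's stock Transformer blocks, is given below. It approximates the layer arrangement described above rather than reproducing it exactly; the vocabulary size, model width, projection names and greedy Softmax output are illustrative assumptions, not the patent's implementation.

```python
import torch
import torch.nn as nn

class DenoisingAutoencoder(nn.Module):
    """Single encoder / single decoder denoising self-encoder (sketch)."""
    def __init__(self, vocab_size=32000, d_model=512, nhead=8):
        super().__init__()
        self.noisy_proj = nn.Linear(d_model, d_model)   # linear mapping of the noise-added vector
        self.clean_proj = nn.Linear(d_model, d_model)   # linear mapping of the clean vector
        self.transformer = nn.Transformer(
            d_model=d_model, nhead=nhead,
            num_encoder_layers=1, num_decoder_layers=1, batch_first=True)
        self.output_layer = nn.Linear(d_model, vocab_size)  # linear layer before Softmax

    def forward(self, noised_first_sample_vec, first_sample_vec):
        # The encoder consumes the first sample vector after noise addition; the
        # decoder consumes the clean first sample vector and attends to the
        # encoder output through the (single) multi-head mutual attention layer.
        memory_in = self.noisy_proj(noised_first_sample_vec)
        target_in = self.clean_proj(first_sample_vec)
        tgt_mask = self.transformer.generate_square_subsequent_mask(target_in.size(1))
        decoded = self.transformer(memory_in, target_in, tgt_mask=tgt_mask)
        return torch.softmax(self.output_layer(decoded), dim=-1)  # token probabilities
```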
In actual use, if the first sample translation is directly noised based on rules, the obtained second sample translation tends to have obvious grammatical errors, stiff sentence patterns, poor diversity and similar characteristics, and such characteristics are easily captured by a neural network, which is not conducive to training the scoring model. A second sample translation constructed with a translation model, on the other hand, often has a fixed syntactic pattern and cannot be guaranteed to contain semantic errors, which is likewise not conducive to training the scoring model.
In the embodiment of the application, in order to construct a second sample translation whose semantics deviate only slightly from the first sample translation, the first sample vector after noise addition is first obtained, and then the first sample vector after noise addition and the first sample vector are input into the pre-trained denoising self-encoder to obtain the second sample translation. Although the pre-trained denoising self-encoder is intended to correct the semantics of the noise-added first sample translation, in actual use it cannot do so completely; that is, it can only remove part of the noise in the first sample vector after noise addition, and the second sample translation is obtained from a first sample vector that still contains part of the noise. The retained noise causes a small deviation between the semantics of the first sample translation and the semantics of the second sample translation. The scoring model is trained based on the first sample translation and a second sample translation whose semantics deviate only slightly from it, so the scoring model can capture finer-grained features during training and the training effect is better.
In the second method, a first sample text vector corresponding to the first sample translation is obtained. The first sample translation is randomly corrupted to obtain a corrupted first sample translation. A second sample text vector corresponding to the corrupted first sample translation is determined. The first sample text vector and the second sample text vector are input into a pre-trained denoising self-encoder to obtain the second sample translation.
In implementation, the first sample translation is randomly corrupted to obtain the corrupted first sample translation. The second sample text vector corresponding to the corrupted first sample translation is acquired. The first sample text vector and the second sample text vector are then input into the pre-trained denoising self-encoder to obtain the second sample translation.
It should be noted that random corruption of text includes random masking, random substitution, random deletion, random insertion, and the like. Random masking masks part of the words at random using a [MASK] token, random substitution replaces part of the words with other random words, random deletion deletes part of the words in the text, and random insertion inserts random words at random positions. The methods for randomly corrupting text are prior art and are not repeated in the embodiment of the application. For example, the results of randomly corrupting "I am Chinese, I love China." may be as shown in Table 2.
TABLE 2
Source statement: I am Chinese, I love China.
Random masking: I am [MASK], I [MASK] China.
Random substitution: I am Chinese, residual love China.
Random deletion: I am, I love China.
Random insertion: I am Chinese, I monitoring love China.
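The sketch below illustrates the four corruption operations listed in Table 2. It is illustrative only: the token-level treatment, the corruption probability and the tiny replacement vocabulary are assumptions, not values fixed by the patent.

```python
import random

def randomly_corrupt(tokens, vocabulary, p=0.15):
    """Apply one of the four corruption operations to roughly p of the tokens."""
    corrupted = []
    for token in tokens:
        if random.random() >= p:
            corrupted.append(token)
            continue
        operation = random.choice(["mask", "substitute", "delete", "insert"])
        if operation == "mask":
            corrupted.append("[MASK]")                            # random masking
        elif operation == "substitute":
            corrupted.append(random.choice(vocabulary))           # random substitution
        elif operation == "insert":
            corrupted.extend([token, random.choice(vocabulary)])  # random insertion
        # operation == "delete": the token is simply dropped      # random deletion
    return corrupted

print(randomly_corrupt("I am Chinese , I love China .".split(),
                       vocabulary=["residual", "monitoring", "love"]))
```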
In the third method, a Gaussian noise vector is added to the sample vector of the first sample translation and the first sample translation is also randomly corrupted. The specific steps are as follows: the first sample translation is randomly corrupted to obtain a randomly corrupted first sample translation. A sample text vector corresponding to the randomly corrupted first sample translation and a randomly generated Gaussian noise vector are obtained and added to obtain an added vector. The added vector and the first sample vector are then input into a pre-trained denoising self-encoder to obtain the second sample translation. The denoising self-encoder involved in this method is the same as the denoising self-encoder involved in the first and second methods.
In the fourth method, a third sample text vector corresponding to the sample original text, a first sample vector corresponding to the first sample translation, a first Gaussian noise vector and a second Gaussian noise vector are obtained. The third sample text vector and the first Gaussian noise vector are added to obtain the third sample text vector after noise addition. The first sample vector and the second Gaussian noise vector are added to obtain the first sample vector after noise addition. The first sample vector after noise addition, the third sample text vector after noise addition and the first sample vector are input into a pre-trained denoising self-encoder to obtain the second sample translation. The first Gaussian noise vector and the second Gaussian noise vector may be the same vector or different vectors.
The training process of the denoising self-encoder is as follows: and acquiring the sample original text and a first sample translation corresponding to the sample original text, and inputting the first sample text vector after noise addition, the third sample text vector after noise addition and the first sample text vector into a denoising self-encoder to obtain a predicted text. And inputting the predicted text and the first sample translation into a loss function to obtain loss information, and adjusting the denoising self-encoder based on the loss information to obtain the adjusted denoising self-encoder. And continuously adjusting the adjusted denoising self-encoder by using other sample texts and first sample translations corresponding to the other sample texts, and obtaining the pre-trained denoising self-encoder when the adjusted denoising self-encoder converges.
The structure of this denoising self-encoder is different from that of the denoising self-encoder involved in the first three methods: it is a dual-encoder/single-decoder architecture, and its specific composition is shown in fig. 4. A third sample text vector corresponding to the sample original text is obtained through a word embedding algorithm. The third sample text vector and the randomly generated first Gaussian noise vector are added to obtain the third sample text vector after noise addition. The third sample text vector after noise addition is linearly mapped to obtain a thirteenth vector, and the thirteenth vector is input into a multi-head self-attention layer to obtain a fourteenth vector. The thirteenth vector and the fourteenth vector are residually connected and normalized to obtain a fifteenth vector. The fifteenth vector is input into a feedforward layer to obtain a sixteenth vector. The fifteenth vector and the sixteenth vector are residually connected and normalized to obtain a seventeenth vector. Similarly, a first sample vector corresponding to the first sample translation is obtained through the word embedding algorithm. The first sample vector and the randomly generated second Gaussian noise vector are added to obtain the first sample vector after noise addition. The first sample vector after noise addition is linearly mapped to obtain an eighteenth vector, and the eighteenth vector is input into a multi-head self-attention layer to obtain a nineteenth vector. The eighteenth vector and the nineteenth vector are residually connected and normalized to obtain a twentieth vector. The twentieth vector is input into a feedforward layer to obtain a twenty-first vector. The twentieth vector and the twenty-first vector are residually connected and normalized to obtain a twenty-second vector. Likewise, the first sample vector is linearly mapped to obtain a twenty-third vector, and the twenty-third vector is input into a masked multi-head self-attention layer to obtain a twenty-fourth vector. The twenty-third vector and the twenty-fourth vector are residually connected and normalized to obtain a twenty-fifth vector. The twenty-fifth vector and the seventeenth vector are input into a multi-head mutual attention layer to obtain a twenty-sixth vector. The twenty-sixth vector and the twenty-fifth vector are residually connected and normalized to obtain a twenty-seventh vector. The twenty-seventh vector and the twenty-second vector are input into a multi-head mutual attention layer to obtain a twenty-eighth vector. The twenty-eighth vector and the twenty-seventh vector are residually connected and normalized to obtain a twenty-ninth vector. The twenty-ninth vector is input into a feedforward layer to obtain a thirtieth vector.
The twenty-ninth vector and the thirtieth vector are residually connected and normalized to obtain a thirty-first vector. The thirty-first vector is input into the linear layer and Softmax processing is performed to obtain the second sample translation.
This denoising self-encoder comprises two multi-head mutual attention layers, which interact with the two encoders respectively so as to complete decoding. The multi-head self-attention layers, the feedforward layers, the masked multi-head self-attention layer and the multi-head mutual attention layers in the denoising self-encoder are all neural networks. A multi-head self-attention layer projects the feature vectors input to the encoder through several linear transformations to obtain query, key and value triplets, then computes attention weights between the queries and the keys and multiplies the attention weights by the values to obtain the feature representation of the encoder input. A multi-head mutual attention layer projects the feature vectors of the encoder and the decoder through several linear transformations to obtain query, key and value triplets, then computes attention weights between the queries and the keys and multiplies the attention weights by the values to obtain the feature representation after information interaction between the encoder and the decoder. The feedforward (fully connected) layer maps the input feature representation twice to increase its representation capability. Residual connections connect the input vectors to avoid the gradient vanishing problem. Layer normalization normalizes the neurons of the same layer to the same distribution, which keeps training stable.
In the fifth method, the sample original text is randomly corrupted to obtain a randomly corrupted sample original text, and the first sample translation is randomly corrupted to obtain a randomly corrupted first sample translation. A fourth sample text vector corresponding to the corrupted sample original text and a second sample text vector corresponding to the corrupted first sample translation are obtained. The fourth sample text vector, the second sample text vector and the first sample text vector are input into a pre-trained denoising self-encoder to obtain the second sample translation.
And a sixth method, wherein the first sample translation is randomly destroyed, so as to obtain the randomly destroyed first sample translation. And obtaining a sample text vector corresponding to the first sample translation after random destruction and a third Gaussian noise vector generated randomly, and adding the two vectors to obtain a first vector. And carrying out random destruction on the sample original text to obtain the sample original text after random destruction. And obtaining a sample text vector corresponding to the sample text after random destruction and a fourth Gaussian noise vector generated randomly, and adding the two vectors to obtain a second vector. The first vector, the second vector and the first sample vector are input into a pre-trained denoising self-encoder to obtain a second sample translation.
The third Gaussian noise vector and the fourth Gaussian noise vector may be the same vector or different noise vectors.
In the seventh method, one of the sample original text and the first sample translation is randomly corrupted, and the text vector corresponding to the randomly corrupted text is obtained. A sample text vector corresponding to the other of the first sample translation and the sample original text and a randomly generated Gaussian noise vector are obtained and added to obtain an added vector. The two vectors thus obtained and the first sample vector are input into the denoising self-encoder to obtain the second sample translation.
It should be noted that, the training process and the structure of the denoising self-encoder are the same in the fourth method, the fifth method, the sixth method and the seventh method, and are not described here again.
Step 202, inputting the sample original text and the first sample translation into a scoring model to obtain a first sample score corresponding to the first sample translation, and inputting the sample original text and each second sample translation into the scoring model to obtain a second sample score corresponding to each second sample translation.
Optionally, the scoring model in the embodiment of the application comprises a text preprocessing module, a feature extraction module and a scoring module. The method for obtaining the first sample score corresponding to the first sample translation comprises the specific steps of: and inputting the sample original text and the first sample translation into a text preprocessing module to obtain a sample character sequence. And inputting the sample character sequence into a feature extraction module to obtain sample feature information. And inputting the sample characteristic information into a scoring module to obtain a first sample score corresponding to the first sample translation.
The text preprocessing module is an algorithm model and mainly used for preprocessing the text of an input sample original text and a first sample translation to obtain a sample text after text preprocessing and a first sample translation after text preprocessing, and splicing the sample text and the first sample translation to obtain a sample character sequence.
The text preprocessing includes word segmentation, subword segmentation, special character processing and truncation. Word segmentation separates punctuation from the text; subword segmentation further splits single words according to the frequency of consecutive letter sequences; special character processing deletes non-printing characters and handles escape characters; truncation cuts the input sequence according to the upper limit of the sentence length that the model can process.
It should be noted that, since the text preprocessing includes subword segmentation, the sample original text and the first sample translation may be split into a plurality of subwords. For example, after the sample original text "I eat apples." and the first sample translation "I drink an apple." go through a BERT-style preprocessing flow, the preprocessed sample original text is "[CLS] I eat apples." and the preprocessed first sample translation is "[SEP] I drink an app ##le [SEP]". Splicing the two yields "[CLS] I eat apples. [SEP] I drink an app ##le [SEP]".
In the above sequence, the word apple is split into two parts, app and ##le. This helps to reduce the size of the vocabulary and reduce the computational overhead.
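A minimal sketch of this splicing step follows. The placeholder subword mapping simply reproduces the app/##le split from the example above; a real preprocessing module would use a learned WordPiece/BPE vocabulary, so the function name, mapping and length limit are illustrative assumptions.

```python
def preprocess(sample_original, first_sample_translation, max_length=128):
    """Splice an original/translation pair into one character sequence."""
    def to_subwords(text):
        # Placeholder subword step; a real module would use a learned
        # WordPiece/BPE vocabulary built from letter-sequence frequencies.
        mapping = {"apple.": ["app", "##le", "."]}
        tokens = []
        for word in text.split():
            tokens.extend(mapping.get(word.lower(), [word]))
        return tokens

    sequence = (["[CLS]"] + to_subwords(sample_original) + ["[SEP]"]
                + to_subwords(first_sample_translation) + ["[SEP]"])
    return sequence[:max_length]  # truncation to the model's length limit

print(preprocess("I eat apples.", "I drink an apple."))
# ['[CLS]', 'I', 'eat', 'apples.', '[SEP]', 'I', 'drink', 'an', 'app', '##le', '.', '[SEP]']
```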
The feature extraction module is a neural network model mainly used for extracting features from the text vectors to obtain sample feature information. The specific processing procedure of the feature extraction module is as follows: each character in the character sequence is first converted into a text vector. The text vectors are then sent to the encoder in the feature extraction module. Each layer of the Transformer encodes the text vectors layer by layer into feature information, so that the feature information corresponding to each word fuses the contextual information of that word.
For example, although the words "bank" are included in both "I am fishing on the bank" and "I went to bank to deposit money," bank "in both sentences is a different meaning. "bank" in the first sentence should be translated into "river bank", and "bank" in the second sentence should be translated into "bank". The feature extraction module may distinguish between different meanings of the two words based on the context information, thereby giving different representation vectors for the same word.
The scoring module is also a neural network model; its structure consists of one layer of fully-connected network. The scoring module maps the feature information into a continuous real value, namely a score, which serves as the quality evaluation result of the sample original text and the first sample translation.
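The sketch below condenses the three modules into one model: embedded characters pass through a Transformer-encoder feature extractor and a single fully-connected scoring layer produces one real-valued score. The sizes, layer counts and the [CLS]-style pooling are assumptions for illustration, not the patent's exact configuration.

```python
import torch
import torch.nn as nn

class ScoringModel(nn.Module):
    def __init__(self, vocab_size=32000, d_model=512, nhead=8, num_layers=6):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, d_model)
        encoder_layer = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
        self.feature_extractor = nn.TransformerEncoder(encoder_layer, num_layers)
        self.scoring = nn.Linear(d_model, 1)  # one fully-connected scoring layer

    def forward(self, character_ids: torch.Tensor) -> torch.Tensor:
        features = self.feature_extractor(self.embedding(character_ids))
        pooled = features[:, 0]                   # e.g. the representation of [CLS]
        return self.scoring(pooled).squeeze(-1)   # one real-valued score per pair
```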
Step 203, determining loss information based on the first sample score and the at least one second sample score.
The second sample translation is obtained by adding noise to the first sample translation, so the second sample score corresponding to the second sample translation should be lower than the first sample score corresponding to the first sample translation. Loss information can therefore be obtained based on a comparison between the first sample score and the second sample score. Based on this principle, the embodiment of the application provides two forms of comparative training. The first is comparative classification, as shown in FIG. 5A, and the other is comparative ranking, as shown in FIG. 5B; the goal of both loss functions is to make the first sample score higher than the second sample score. The two specific methods are as follows.
The first method determines loss information based on a first sample score, at least one second sample score, and a first predetermined formula.
The first preset formula is L = Σ_{x∈D} -(p_x × log(W_x × h(x)) + (1 - p_x) × log(1 - W_x × h(x)));
where L is the loss information, D is a sample translation set consisting of the first sample translation and the at least one second sample translation, x is any sample translation in the sample translation set D, h(x) is the score corresponding to the sample translation x, W_x is a preset coefficient, and p_x is a preset constant whose numerical range is (0, 1).
In FIG. 5A, S is the sample original text, T_0 is the first sample translation, T_1'~T_n' are the second sample translations, l_0 is the first sample score, and l_1'~l_n' are the second sample scores. The sample original text S and the first sample translation T_0 are input into the scoring model to obtain the first sample score l_0. The sample original text S and the second sample translation T_1' are input into the scoring model to obtain the second sample score l_1', and so on, until the sample original text S and the second sample translation T_n' are input into the scoring model to obtain the second sample score l_n'. Then l_0 and l_1'~l_n' are compared and sorted using the first preset formula.
Note that, in fig. 5A, the number of scoring models may be one or a plurality of scoring models. But when there are multiple scoring models, the parameters used by each scoring model are shared. Thus, there is only one scoring model actually trained. By the aid of the parameter sharing mode, training efficiency of the neural network is improved, and occupied space of a scoring model is reduced.
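The sketch below illustrates the comparative-classification loss of the first preset formula: every pair is scored, and a binary-cross-entropy-style term pushes W_x·h(x) toward a p_x close to 1 for the first sample translation and toward a p_x close to 0 for the second sample translations. The concrete values of W_x and p_x here are illustrative choices, not values fixed by the patent.

```python
import torch

def comparative_classification_loss(first_score, second_scores,
                                    w=1.0, p_positive=0.9, p_negative=0.1):
    scores = torch.cat([first_score.view(1), second_scores.view(-1)])
    targets = torch.tensor([p_positive] + [p_negative] * second_scores.numel())
    probs = (w * scores).clamp(1e-6, 1 - 1e-6)      # keep log() well defined
    return torch.sum(-(targets * torch.log(probs)
                       + (1 - targets) * torch.log(1 - probs)))
```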
In the second method, loss information is determined based on the first sample score, the at least one second sample score, and a second preset formula.
The second preset formula is a margin-based ranking loss computed over the first sample score and each second sample score;
where L is the loss information, D is a sample translation set composed of the first sample translation and the at least one second sample translation, s is the first sample translation, h(s) is the first sample score corresponding to the first sample translation, x is any sample translation in the sample translation set D, h(x) is the score corresponding to the sample translation x, and margin is a preset constant used to enlarge the difference between the first sample score and the second sample score.
As shown in fig. 5B, S is the sample original text, T is the first sample translation, T' is a second sample translation, l is the first sample score, and l' is the second sample score, where the scoring models in fig. 5B are the same model or are multiple models that share parameters. The sample original text S and the first sample translation T are input into the scoring model to obtain the first sample score l. The sample original text S and the second sample translation T' are input into the scoring model to obtain the second sample score l'. The margin loss of l and l' is then calculated through the second preset formula.
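Since the image of the second preset formula is not reproduced in the text, the sketch below uses the conventional margin ranking loss that matches the description (penalising any second sample score that comes within `margin` of the first sample score); it is an illustrative stand-in, not the patent's exact formula.

```python
import torch

def margin_ranking_loss(first_score, second_scores, margin=0.5):
    # max(0, margin - (h(s) - h(x))) summed over the second sample translations
    return torch.clamp(margin - (first_score - second_scores), min=0).sum()

# Usage sketch: l = scoring_model(original, first_translation)
#               l_prime = scoring_model(original, second_translation)
#               loss = margin_ranking_loss(l, l_prime)
```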
And 204, adjusting the scoring model based on the loss information.
In an implementation, parameters in the scoring model are adjusted based on the loss information to obtain an adjusted scoring model. And training and adjusting the scoring model based on the other sample texts, the corresponding first sample translation and the corresponding second sample translation.
After the loss information is obtained, gradient back-propagation and parameter updates are performed on the scoring model using the deep-learning back-propagation algorithm. In one training pass, the parameters of the feature extraction module and the scoring module in the scoring model are updated with the same learning rate. Meanwhile, during training, once the scoring model has been trained a preset number of times, a verification process may be performed on the scoring model. The verification process is similar to the training process in the prior art, except that the parameters are not adjusted based on the comparative loss information above; instead, the prediction score output by the scoring model is compared with a reference score to obtain loss information, the scoring model is trained and adjusted based on that loss information, and at the same time the accuracy of the scoring model is calculated based on the prediction score and the reference score. When the calculated accuracy no longer improves, the trained scoring model is obtained.
Here the loss information is calculated based on the prediction score, the reference score and a third formula, where L is the loss information, h(s) is the prediction score output by the scoring model, hter_s is the reference score, W_s is a preset coefficient, and sigmoid(x) is a mapping function used for mapping x into the numerical range 0 to 1.
Therefore, when the scoring model is trained and adjusted, the scoring model can be trained based on the prediction score and the reference score, so that the output result of the trained scoring model is more accurate.
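The third formula itself is not legible in the text above, so the sketch below is only an illustrative stand-in that uses exactly the terms it names (prediction h(s), preset coefficient W_s, reference score hter_s, and a sigmoid mapping into 0-1): it measures the squared gap between the mapped prediction and the reference score.

```python
import torch

def reference_score_loss(prediction, reference_hter, w_s=1.0):
    mapped = torch.sigmoid(w_s * prediction)   # map the raw score into (0, 1)
    return (mapped - reference_hter) ** 2
```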
In the embodiment of the application, a sample original text, a first sample translation and at least one second sample translation are obtained, wherein the semantics of the first sample translation are the same as the semantics corresponding to the sample original text, and the semantics of the second sample translation are different from the semantics of the first sample translation. Inputting the sample original text and the first sample translation into a scoring model to obtain a first sample score corresponding to the first sample translation, and inputting the sample original text and each second sample translation into the scoring model to obtain a second sample score corresponding to each second sample translation; determining loss information based on the first sample score and at least one second sample score; and adjusting the scoring model based on the loss information. Therefore, the embodiment of the application can train the scoring model on the premise of not depending on the reference score.
In the related art, before the scoring model is trained, a professional translator or a native speaker is required to evaluate each sample translation against the sample original text, score it from a plurality of different aspects such as accuracy and fluency, and then combine the multiple evaluation scores of the sample translation to obtain the final reference score, as shown in Table 3:
TABLE 3
Sample original text: I are Chinese. | I eat apples.
Sample translation: I am Chinese. | I drink an apple.
Result of manual evaluation 1: 1.0 | 0.2
Result of manual evaluation 2: 0.9 | 0.4
Result of manual evaluation 3: 1.0 | 0.35
Final manual evaluation result: 0.9667 | 0.3167
In Table 3, the manual evaluation results corresponding to the sample translation "I am Chinese." are 1.0, 0.9 and 1.0, respectively, and the final manual evaluation result is the average of the three, 0.9667, which is taken as the reference score. The manual evaluation results corresponding to the sample translation "I drink an apple." are 0.2, 0.4 and 0.35, respectively, and the final manual evaluation result is the average of the three, 0.3167, which is taken as the reference score.
Because the manual evaluation process is time-consuming and laborious, a large number of professional translators have to be involved to obtain relatively objective evaluation scores. In addition, since the error distributions of different languages, different domains and different machine translation systems differ, an existing scoring model cannot be used directly when quality evaluation is performed for a specific language, domain or machine translation system; the scoring model has to be trained and adjusted again for that specific language, domain or machine translation system, which is again time-consuming and labor-intensive.
In the actual use process of the scoring model, the target original text and the target translation are input into a pre-trained scoring model, and target scores corresponding to the target translation are obtained.
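A short usage sketch of this inference step is given below. `preprocess`, `ScoringModel` and the `vocab` lookup refer to the illustrative sketches earlier in this section and are assumptions, not the patent's interfaces.

```python
import torch

def evaluate_translation(scoring_model, vocab, target_original, target_translation):
    tokens = preprocess(target_original, target_translation)
    ids = torch.tensor([[vocab.get(t, 0) for t in tokens]])  # 1 x sequence_length
    with torch.no_grad():
        target_score = scoring_model(ids)
    return float(target_score)  # target score corresponding to the target translation
```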
Fig. 6 is a schematic structural diagram of an apparatus for training a scoring model according to an embodiment of the present application, referring to fig. 6, the apparatus includes:
a first obtaining module 610 configured to obtain a sample original, a first sample translation, and at least one second sample translation, wherein the semantics of the first sample translation are the same as the semantics of the sample original, and the semantics of the second sample translation are different from the semantics of the first sample translation;
the input module 620 is configured to input the sample original text and the first sample translation into a scoring model to obtain a first sample score corresponding to the first sample translation, and input the sample original text and each second sample translation into a scoring model to obtain a second sample score corresponding to each second sample translation;
a determination module 630 configured to determine loss information based on the first sample score and at least one second sample score;
an adjustment module 640 configured to adjust the scoring model based on the loss information.
Optionally, the apparatus further includes a second acquisition module configured to:
acquiring a first sample text vector corresponding to the first sample translation and a Gaussian noise vector;
adding the first sample text vector and the Gaussian noise vector to obtain a noise-added first sample text vector;
and inputting the noise-added first sample text vector and the first sample text vector into a pre-trained denoising autoencoder to obtain the second sample translation.
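A minimal Python sketch of this noise-based generation path follows; the embedding function embed and the dae.generate interface of the pre-trained denoising autoencoder are assumed names for illustration only.

```python
import torch

def make_noisy_negative(first_translation, embed, dae, noise_std=0.1):
    """Generate a second sample translation by perturbing the first sample
    translation's text vector with Gaussian noise and decoding it."""
    first_vector = embed(first_translation)               # first sample text vector
    gaussian_noise = torch.randn_like(first_vector) * noise_std
    noisy_vector = first_vector + gaussian_noise           # noise-added first sample text vector
    # Both the noise-added vector and the clean vector are fed to the
    # pre-trained denoising autoencoder, which decodes a translation whose
    # semantics drift away from the first sample translation.
    return dae.generate(noisy_vector, first_vector)
```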
Optionally, the apparatus further includes a third acquisition module configured to:
acquiring a first sample text vector corresponding to the first sample translation;
randomly corrupting the first sample translation to obtain a corrupted first sample translation;
determining a second sample text vector corresponding to the corrupted first sample translation;
and inputting the first sample text vector and the second sample text vector into a pre-trained denoising autoencoder to obtain the second sample translation.
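The patent does not spell out the corruption operations; the Python sketch below assumes token dropping and adjacent swaps as two simple possibilities for the random corruption step.

```python
import random

def corrupt_translation(first_translation, drop_prob=0.2, swap_prob=0.1):
    """Randomly corrupt the first sample translation before its text vector is
    computed and passed, with the clean vector, to the denoising autoencoder."""
    tokens = first_translation.split()
    # Randomly drop tokens (keep at least one so the result is non-empty).
    kept = [t for t in tokens if random.random() > drop_prob] or tokens[:1]
    # Randomly swap adjacent tokens.
    for i in range(len(kept) - 1):
        if random.random() < swap_prob:
            kept[i], kept[i + 1] = kept[i + 1], kept[i]
    return " ".join(kept)
```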
Optionally, the determining module 630 is configured to:
determining the loss information based on the first sample score, the at least one second sample score, and a first preset formula;
The first preset formula is: L = Σ_{x∈D} -(p_x × log(W_x × h(x)) + (1 - p_x) × log(1 - W_x × h(x)));
wherein L is the loss information, D is the sample translation set consisting of the first sample translation and the at least one second sample translation, x is any sample translation in the sample translation set D, h(x) is the score corresponding to the sample translation x, W_x is a preset coefficient, and p_x is a preset constant whose value lies in the range (0, 1).
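The first preset formula is a weighted binary cross-entropy over the sample translation set D. A direct Python transcription is given below; setting p_x to 1 for the first sample translation and 0 for the second sample translations is one natural choice, but the patent only calls p_x a preset constant.

```python
import torch

def first_formula_loss(scores, p, w):
    """L = sum over x in D of -( p_x*log(W_x*h(x)) + (1-p_x)*log(1 - W_x*h(x)) ).
    scores: tensor of h(x) for every sample translation x in D
    p:      tensor of preset constants p_x
    w:      tensor of preset coefficients W_x"""
    eps = 1e-8                                  # numerical guard, not part of the formula
    wx_hx = (w * scores).clamp(eps, 1 - eps)
    return -(p * torch.log(wx_hx) + (1 - p) * torch.log(1 - wx_hx)).sum()
```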
Optionally, the determining module 630 is configured to:
determining the loss information based on the first sample score, the at least one second sample score, and a second preset formula;
The second preset formula is:
wherein L is the loss information, D is the sample translation set consisting of the first sample translation and the at least one second sample translation, s is the first sample translation, h(s) is the first sample score corresponding to the first sample translation, x is any sample translation in the sample translation set D, h(x) is the score corresponding to the sample translation x, and margin is a preset constant used to increase the difference between the first sample score and the second sample score.
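The formula itself is not reproduced in the text above, but the definitions suggest a margin (hinge) ranking loss that pushes the first sample score above every other sample score by at least margin; the Python sketch below follows that reading and should be taken as an interpretation, not the exact claimed formula.

```python
import torch

def second_formula_loss(first_score, all_scores, margin=0.5):
    """Margin-based reading: every sample translation x in D whose score h(x)
    comes within `margin` of the first sample score h(s) contributes
    margin - h(s) + h(x) to the loss."""
    return torch.clamp(margin - first_score + all_scores, min=0).sum()
```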
Optionally, the apparatus further comprises a use module configured to:
and inputting the target original text and the target translation into a pre-trained scoring model to obtain a target score corresponding to the target translation.
Optionally, the scoring model includes a text preprocessing module, a feature extraction module, and a scoring module;
the input module 620 is configured to:
inputting the sample original text and the first sample translation into the text preprocessing module to obtain a sample character sequence;
inputting the sample character sequence into a feature extraction module to obtain sample feature information;
and inputting the sample characteristic information into a scoring module to obtain a first sample score corresponding to the first sample translation.
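The three sub-modules form a simple pipeline. The Python sketch below assumes a Hugging Face-style tokenizer and encoder as stand-ins for the text preprocessing and feature extraction modules; the patent does not name a specific encoder, so these are illustrative choices.

```python
import torch
from torch import nn

class ScoringModel(nn.Module):
    """Text preprocessing -> feature extraction -> scoring."""
    def __init__(self, tokenizer, encoder, hidden_size=768):
        super().__init__()
        self.tokenizer = tokenizer        # text preprocessing module
        self.encoder = encoder            # feature extraction module
        self.scorer = nn.Sequential(      # scoring module: maps features to a score in (0, 1)
            nn.Linear(hidden_size, 1),
            nn.Sigmoid(),
        )

    def forward(self, source_text, translation):
        # Text preprocessing: build one token sequence from the original text and the translation.
        inputs = self.tokenizer(source_text, translation, return_tensors="pt",
                                truncation=True, padding=True)
        # Feature extraction: encode the sequence and take the first token's representation.
        features = self.encoder(**inputs).last_hidden_state[:, 0]
        # Scoring: a single scalar score for the translation.
        return self.scorer(features).squeeze(-1)
```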
It should be noted that the division into the above functional modules in the apparatus for training a scoring model provided in the above embodiment is merely illustrative; in practical applications, the above functions may be allocated to different functional modules as required, that is, the internal structure of the apparatus may be divided into different functional modules to complete all or part of the functions described above. In addition, the apparatus for training a scoring model provided in the above embodiment belongs to the same concept as the method embodiment for training a scoring model; the specific implementation process is detailed in the method embodiment and is not repeated here.
Fig. 7 shows a block diagram of a terminal 700 according to an exemplary embodiment of the present application. The terminal 700 may be: a smart phone, a tablet computer, an MP3 (Moving Picture Experts Group Audio Layer III) player, an MP4 (Moving Picture Experts Group Audio Layer IV) player, a notebook computer, or a desktop computer. The terminal 700 may also be referred to by other names such as user equipment, portable terminal, laptop terminal, or desktop terminal.
In general, the terminal 700 includes: a processor 701 and a memory 702.
Processor 701 may include one or more processing cores, such as a 4-core processor, an 8-core processor, and the like. The processor 701 may be implemented in at least one hardware form of DSP (Digital Signal Processing ), FPGA (Field-Programmable Gate Array, field programmable gate array), PLA (Programmable Logic Array ). The processor 701 may also include a main processor, which is a processor for processing data in an awake state, also referred to as a CPU (Central Processing Unit ); a coprocessor is a low-power processor for processing data in a standby state. In some embodiments, the processor 701 may be integrated with a GPU (Graphics Processing Unit, image processor) for taking care of rendering and drawing of content that the display screen is required to display. In some embodiments, the processor 701 may also include an AI (Artificial Intelligence ) processor for processing computing operations related to machine learning.
Memory 702 may include one or more computer-readable storage media, which may be non-transitory. The memory 702 may also include high-speed random access memory, as well as non-volatile memory, such as one or more magnetic disk storage devices, flash memory storage devices. In some embodiments, a non-transitory computer readable storage medium in memory 702 is used to store at least one program code for execution by processor 701 to implement the method of training a scoring model provided by a method embodiment of the present application.
In some embodiments, the terminal 700 may further optionally include: a peripheral interface 703 and at least one peripheral. The processor 701, the memory 702, and the peripheral interface 703 may be connected by a bus or signal lines. The individual peripheral devices may be connected to the peripheral device interface 703 via buses, signal lines or a circuit board. Specifically, the peripheral device includes: at least one of radio frequency circuitry 704, a display 705, a camera assembly 706, audio circuitry 707, a positioning assembly 708, and a power supply 709.
The peripheral interface 703 may be used to connect at least one I/O (Input/Output) related peripheral to the processor 701 and the memory 702. In some embodiments, the processor 701, the memory 702, and the peripheral interface 703 are integrated on the same chip or circuit board; in some other embodiments, any one or two of the processor 701, the memory 702, and the peripheral interface 703 may be implemented on a separate chip or circuit board, which is not limited in this embodiment.
The Radio Frequency circuit 704 is configured to receive and transmit RF (Radio Frequency) signals, also referred to as electromagnetic signals. The radio frequency circuitry 704 communicates with a communication network and other communication devices via electromagnetic signals. The radio frequency circuit 704 converts an electrical signal into an electromagnetic signal for transmission, or converts a received electromagnetic signal into an electrical signal. Optionally, the radio frequency circuit 704 includes: antenna systems, RF transceivers, one or more amplifiers, tuners, oscillators, digital signal processors, codec chipsets, subscriber identity module cards, and so forth. The radio frequency circuitry 704 may communicate with other terminals via at least one wireless communication protocol. The wireless communication protocol includes, but is not limited to: metropolitan area networks, various generations of mobile communication networks (2G, 3G, 4G, and 5G), wireless local area networks, and/or WiFi (Wireless Fidelity ) networks. In some embodiments, the radio frequency circuitry 704 may also include NFC (Near Field Communication ) related circuitry, which is not limiting of the application.
The display screen 705 is used to display a UI (User Interface). The UI may include graphics, text, icons, video, and any combination thereof. When the display 705 is a touch display, the display 705 also has the ability to collect touch signals at or above its surface. The touch signal may be input to the processor 701 as a control signal for processing. At this time, the display 705 may also be used to provide virtual buttons and/or a virtual keyboard, also referred to as soft buttons and/or a soft keyboard. In some embodiments, there may be one display 705, disposed on the front panel of the terminal 700; in other embodiments, there may be at least two displays 705, respectively disposed on different surfaces of the terminal 700 or in a folded design; in still other embodiments, the display 705 may be a flexible display disposed on a curved surface or a folded surface of the terminal 700. The display 705 may even be arranged in a non-rectangular irregular pattern, that is, an irregularly shaped screen. The display 705 may be made of LCD (Liquid Crystal Display), OLED (Organic Light-Emitting Diode) or other materials.
The camera assembly 706 is used to capture images or video. Optionally, the camera assembly 706 includes a front camera and a rear camera. Typically, the front camera is disposed on the front panel of the terminal and the rear camera is disposed on the rear surface of the terminal. In some embodiments, there are at least two rear cameras, each being any one of a main camera, a depth camera, a wide-angle camera and a telephoto camera, so that the main camera and the depth camera can be fused to realize a background blurring function, or the main camera and the wide-angle camera can be fused to realize panoramic shooting, Virtual Reality (VR) shooting, or other fused shooting functions. In some embodiments, the camera assembly 706 may also include a flash. The flash may be a single-color-temperature flash or a dual-color-temperature flash. A dual-color-temperature flash is a combination of a warm-light flash and a cold-light flash and can be used for light compensation at different color temperatures.
The audio circuit 707 may include a microphone and a speaker. The microphone is used for collecting sound waves of users and environments, converting the sound waves into electric signals, and inputting the electric signals to the processor 701 for processing, or inputting the electric signals to the radio frequency circuit 704 for voice communication. For the purpose of stereo acquisition or noise reduction, a plurality of microphones may be respectively disposed at different portions of the terminal 700. The microphone may also be an array microphone or an omni-directional pickup microphone. The speaker is used to convert electrical signals from the processor 701 or the radio frequency circuit 704 into sound waves. The speaker may be a conventional thin film speaker or a piezoelectric ceramic speaker. When the speaker is a piezoelectric ceramic speaker, not only the electric signal can be converted into a sound wave audible to humans, but also the electric signal can be converted into a sound wave inaudible to humans for ranging and other purposes. In some embodiments, the audio circuit 707 may also include a headphone jack.
The positioning component 708 is used to locate the current geographic position of the terminal 700 for navigation or LBS (Location Based Service). The positioning component 708 may be a positioning component based on the GPS (Global Positioning System) of the United States, the BeiDou system of China, the GLONASS system of Russia, or the Galileo system of the European Union.
A power supply 709 is used to power the various components in the terminal 700. The power supply 709 may be an alternating current, a direct current, a disposable battery, or a rechargeable battery. When the power supply 709 includes a rechargeable battery, the rechargeable battery may support wired or wireless charging. The rechargeable battery may also be used to support fast charge technology.
In some embodiments, the terminal 700 further includes one or more sensors 710. The one or more sensors 710 include, but are not limited to: acceleration sensor 711, gyroscope sensor 712, pressure sensor 713, fingerprint sensor 714, optical sensor 715, and proximity sensor 716.
The acceleration sensor 711 can detect the magnitudes of accelerations on three coordinate axes of the coordinate system established with the terminal 700. For example, the acceleration sensor 711 may be used to detect the components of the gravitational acceleration in three coordinate axes. The processor 701 may control the display screen 705 to display a user interface in a landscape view or a portrait view based on the gravitational acceleration signal acquired by the acceleration sensor 711. The acceleration sensor 711 may also be used for the acquisition of motion data of a game or a user.
The gyro sensor 712 may detect a body direction and a rotation angle of the terminal 700, and the gyro sensor 712 may collect a 3D motion of the user to the terminal 700 in cooperation with the acceleration sensor 711. The processor 701 may implement the following functions based on the data collected by the gyro sensor 712: motion sensing (e.g., changing UI according to a tilting operation by a user), image stabilization at shooting, game control, and inertial navigation.
The pressure sensor 713 may be disposed at a side frame of the terminal 700 and/or at a lower layer of the display screen 705. When the pressure sensor 713 is disposed at a side frame of the terminal 700, a grip signal of the user to the terminal 700 may be detected, and the processor 701 performs left-right hand recognition or quick operation according to the grip signal collected by the pressure sensor 713. When the pressure sensor 713 is disposed at the lower layer of the display screen 705, the processor 701 controls the operability control on the UI interface according to the pressure operation of the user on the display screen 705. The operability controls include at least one of a button control, a scroll bar control, an icon control, and a menu control.
The fingerprint sensor 714 is used to collect a fingerprint of the user, and the processor 701 identifies the identity of the user according to the fingerprint collected by the fingerprint sensor 714, or the fingerprint sensor 714 identifies the identity of the user according to the collected fingerprint. Upon recognizing that the user's identity is a trusted identity, the processor 701 authorizes the user to perform relevant sensitive operations including unlocking the screen, viewing encrypted information, downloading software, paying for and changing settings, etc. The fingerprint sensor 714 may be provided on the front, back, or side of the terminal 700. When a physical key or vendor Logo is provided on the terminal 700, the fingerprint sensor 714 may be integrated with the physical key or vendor Logo.
The optical sensor 715 is used to collect the ambient light intensity. In one embodiment, the processor 701 may control the display brightness of the display screen 705 based on the ambient light intensity collected by the optical sensor 715. Specifically, when the intensity of the ambient light is high, the display brightness of the display screen 705 is turned up; when the ambient light intensity is low, the display brightness of the display screen 705 is turned down. In another embodiment, the processor 701 may also dynamically adjust the shooting parameters of the camera assembly 706 based on the ambient light intensity collected by the optical sensor 715.
A proximity sensor 716, also referred to as a distance sensor, is typically provided on the front panel of the terminal 700. The proximity sensor 716 is used to collect the distance between the user and the front of the terminal 700. In one embodiment, when the proximity sensor 716 detects that the distance between the user and the front face of the terminal 700 gradually decreases, the processor 701 controls the display 705 to switch from the bright screen state to the off screen state; when the proximity sensor 716 detects that the distance between the user and the front surface of the terminal 700 gradually increases, the processor 701 controls the display screen 705 to switch from the off-screen state to the on-screen state.
Those skilled in the art will appreciate that the structure shown in fig. 7 is not limiting of the terminal 700 and may include more or fewer components than shown, or may combine certain components, or may employ a different arrangement of components.
The computer device provided by the embodiment of the application can be provided as a server. Fig. 8 is a schematic structural diagram of a server according to an embodiment of the present application. The server 800 may vary considerably in configuration or performance, and may include one or more processors (central processing units, CPU) 801 and one or more memories 802, where at least one program code is stored in the memory 802 and is loaded and executed by the processor 801 to implement the method for training a scoring model provided by the above method embodiments. Of course, the server may also have a wired or wireless network interface, a keyboard, an input/output interface, and other components for implementing the functions of the device, which are not described herein.
In an exemplary embodiment, a computer readable storage medium, for example a memory comprising program code, is also provided, the program code being executable by a processor in a terminal or server to perform the method of training the scoring model in the above embodiments. For example, the computer readable storage medium may be a read-only memory (ROM), a random access memory (RAM), a compact disc read-only memory (CD-ROM), a magnetic tape, a floppy disk, an optical data storage device, or the like.
It will be understood by those skilled in the art that all or part of the steps for implementing the above embodiments may be implemented by hardware, or by a program instructing relevant hardware, where the program may be stored in a computer readable storage medium, and the storage medium mentioned above may be a read-only memory, a magnetic disk, an optical disk, or the like.
The foregoing description is only of preferred embodiments of the present application and is not intended to limit the application; the scope of protection of the application is defined by the appended claims.

Claims (8)

1. A method of training a scoring model, the method comprising:
acquiring a sample original text, a first sample translation and at least one second sample translation, wherein the semantics of the first sample translation are the same as those of the sample original text, and the semantics of the second sample translation are different from those of the first sample translation;
inputting the sample original text and the first sample translation into a scoring model to obtain a first sample score corresponding to the first sample translation, and inputting the sample original text and each second sample translation into the scoring model to obtain a second sample score corresponding to each second sample translation;
Determining loss information based on the first sample score and at least one second sample score;
adjusting a scoring model based on the loss information;
wherein the determining loss information based on the first sample score and at least one second sample score comprises at least one of:
determining the loss information based on the first sample score, the at least one second sample score, and a first preset formula;
the first preset formula is: L = Σ_{x∈D} -(p_x × log(W_x × h(x)) + (1 - p_x) × log(1 - W_x × h(x)));
wherein L is the loss information, D is a sample translation set consisting of the first sample translation and the at least one second sample translation, x is any sample translation in the sample translation set D, h(x) is the score corresponding to the sample translation x, W_x is a preset coefficient, and p_x is a preset constant whose value lies in the range (0, 1);
or alternatively,
determining the loss information based on the first sample score, the at least one second sample score, and a second preset formula;
the second preset formula is:
wherein L is the loss information, D is a sample translation set consisting of the first sample translation and the at least one second sample translation, s is the first sample translation, h(s) is the first sample score corresponding to the first sample translation, x is any sample translation in the sample translation set D, h(x) is the score corresponding to the sample translation x, and margin is a preset constant for increasing the difference between the first sample score and the second sample score.
2. The method of claim 1, wherein prior to obtaining the sample text, the first sample translation, and the at least one second sample translation, the method further comprises:
acquiring a first sample text vector corresponding to the first sample translation and a Gaussian noise vector;
adding the first sample text vector and the Gaussian noise vector to obtain a noise-added first sample text vector;
and inputting the noise-added first sample text vector and the first sample text vector into a pre-trained denoising autoencoder to obtain the second sample translation.
3. The method of claim 1, wherein prior to obtaining the sample text, the first sample translation, and the at least one second sample translation, the method further comprises:
acquiring a first sample text vector corresponding to the first sample translation;
randomly corrupting the first sample translation to obtain a corrupted first sample translation;
determining a second sample text vector corresponding to the corrupted first sample translation;
and inputting the first sample text vector and the second sample text vector into a pre-trained denoising autoencoder to obtain the second sample translation.
4. The method according to claim 1, wherein the method further comprises:
and inputting the target original text and the target translation into a pre-trained scoring model to obtain a target score corresponding to the target translation.
5. The method of claim 1, wherein the scoring model comprises a text preprocessing module, a feature extraction module, and a scoring module;
inputting the sample original text and the first sample translation into a scoring model to obtain a first sample score corresponding to the first sample translation, wherein the step of obtaining the first sample score comprises the following steps:
inputting the sample original text and the first sample translation into the text preprocessing module to obtain a sample character sequence;
inputting the sample character sequence into a feature extraction module to obtain sample feature information;
and inputting the sample characteristic information into a scoring module to obtain a first sample score corresponding to the first sample translation.
6. An apparatus for training a scoring model, the apparatus comprising:
the first acquisition module is configured to acquire a sample original text, a first sample translation and at least one second sample translation, wherein the semantics of the first sample translation are the same as those of the sample original text, and the semantics of the second sample translation are different from those of the first sample translation;
the input module is configured to input the sample original text and the first sample translation into a scoring model to obtain a first sample score corresponding to the first sample translation, and input the sample original text and each second sample translation into the scoring model to obtain a second sample score corresponding to each second sample translation;
a determining module configured to determine loss information based on the first sample score and at least one second sample score;
an adjustment module configured to adjust a scoring model based on the loss information;
the determining module is specifically configured to perform at least one of the following:
determining the loss information based on the first sample score, the at least one second sample score, and a first preset formula;
the first preset formula is: L = Σ_{x∈D} -(p_x × log(W_x × h(x)) + (1 - p_x) × log(1 - W_x × h(x)));
wherein L is the loss information, D is a sample translation set consisting of the first sample translation and the at least one second sample translation, x is any sample translation in the sample translation set D, h(x) is the score corresponding to the sample translation x, W_x is a preset coefficient, and p_x is a preset constant whose value lies in the range (0, 1);
or alternatively,
determining the loss information based on the first sample score, the at least one second sample score, and a second preset formula;
the second preset formula is:
wherein L is the loss information, D is a sample translation set consisting of the first sample translation and the at least one second sample translation, s is the first sample translation, h(s) is the first sample score corresponding to the first sample translation, x is any sample translation in the sample translation set D, h(x) is the score corresponding to the sample translation x, and margin is a preset constant for increasing the difference between the first sample score and the second sample score.
7. A terminal comprising a processor and a memory, the memory having stored therein at least one program code that is loaded and executed by the processor to perform the operations performed by the method of training a scoring model according to any one of claims 1 to 5.
8. A computer readable storage medium having stored therein at least one program code loaded and executed by a processor to perform the operations performed by the method of training a scoring model of any one of claims 1 to 5.
CN202111069233.3A 2021-09-13 2021-09-13 Method, device, terminal and storage medium for training scoring model Active CN113836946B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111069233.3A CN113836946B (en) 2021-09-13 2021-09-13 Method, device, terminal and storage medium for training scoring model

Publications (2)

Publication Number Publication Date
CN113836946A CN113836946A (en) 2021-12-24
CN113836946B true CN113836946B (en) 2023-11-14

Family

ID=78959021

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111069233.3A Active CN113836946B (en) 2021-09-13 2021-09-13 Method, device, terminal and storage medium for training scoring model

Country Status (1)

Country Link
CN (1) CN113836946B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114417794B (en) * 2022-03-29 2022-09-09 北京大学 Training method and device for scale problem generation model and computer equipment
CN114997188A (en) * 2022-06-01 2022-09-02 阿里巴巴(中国)有限公司 Translation evaluation method, translation evaluation model training method and electronic equipment

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP3477497A1 (en) * 2017-10-24 2019-05-01 Televic Education NV A revision system and method for revising translated texts with adaptive scores
CN110162800A (en) * 2019-05-08 2019-08-23 北京百度网讯科技有限公司 The training method and device of translation model
WO2021051513A1 (en) * 2019-09-19 2021-03-25 平安科技(深圳)有限公司 Chinese-english translation method based on neural network, and related devices thereof
CN111046679A (en) * 2020-03-13 2020-04-21 腾讯科技(深圳)有限公司 Quality information acquisition method and device of translation model and computer equipment
CN112347795A (en) * 2020-10-04 2021-02-09 北京交通大学 Machine translation quality evaluation method, device, equipment and medium

Also Published As

Publication number Publication date
CN113836946A (en) 2021-12-24

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant