CN115662392A - Transliteration method based on phoneme memory, electronic equipment and storage medium - Google Patents

Transliteration method based on phoneme memory, electronic equipment and storage medium

Info

Publication number
CN115662392A
CN115662392A (application CN202211595293.3A)
Authority
CN
China
Prior art keywords
phoneme
letter
layer
vector
formula
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202211595293.3A
Other languages
Chinese (zh)
Other versions
CN115662392B (en)
Inventor
宋彦 (Song Yan)
田元贺 (Tian Yuanhe)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
University of Science and Technology of China USTC
Original Assignee
University of Science and Technology of China USTC
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by University of Science and Technology of China USTC filed Critical University of Science and Technology of China USTC
Priority to CN202211595293.3A priority Critical patent/CN115662392B/en
Publication of CN115662392A publication Critical patent/CN115662392A/en
Application granted granted Critical
Publication of CN115662392B publication Critical patent/CN115662392B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Landscapes

  • Machine Translation (AREA)

Abstract

The invention discloses a transliteration method based on phoneme memory, an electronic device and a storage medium. The transliteration method comprises the following steps: 1. extracting the transliterated words and splitting them into letters; 2. constructing a phoneme library and extracting the phoneme features associated with each letter; 3. constructing an L-layer encoder and encoding the letters to obtain the letter encoding vector of each letter at each layer; 4. establishing an L-layer phoneme memory network to model the letter encoding vectors together with the phoneme features and obtain a letter encoding matrix; 5. inputting the letter encoding matrix and the target letters output by the classifier before time t into the L-layer decoder, and sending the letter prediction vector output by the decoder at time t into the classifier to obtain the predicted target letter at time t; 6. assigning t+1 to t and repeating step 5 until time T, thereby obtaining the predicted letter sequence. The invention aims to fuse the phoneme features into a standard text generation process, thereby improving the transliteration quality and effect.

Description

Transliteration method based on phoneme memory, electronic equipment and storage medium
Technical Field
The invention belongs to the field of natural language processing, and particularly relates to a transliteration method based on phoneme memory, electronic equipment and a storage medium.
Background
Transliteration refers to translating a person's name in a source language into text in a target language without changing the pronunciation of the name in the source language. For example, the name "Smith" in the source language English is transliterated into "史密斯" in the target language Chinese.
Existing methods mostly regard this task as a sequence-to-sequence generation task and adopt an encoder and a decoder to generate the name transliteration in the target language, but they lack utilization of the phonetic features, especially the phoneme features, of the source and target languages, so that the words generated by transliteration lose the pronunciation features of the source language and the accuracy of transliteration is reduced.
Disclosure of Invention
The present invention is directed to overcoming the above-mentioned shortcomings of the prior art, and provides a transliteration method based on phoneme memory, an electronic device and a storage medium, so as to integrate phoneme features into a standard text generation process, thereby improving the transliteration quality and effect.
In order to achieve the purpose, the invention adopts the following technical scheme:
the invention relates to a transliteration method based on phoneme memory, which is characterized by comprising the following steps:
Step 1, extracting a plurality of transliterated words from a source language corpus, and splitting each word into letters; wherein the letter sequence obtained by splitting the i-th word $X_i$ is recorded as $\{x_{i,1}, \ldots, x_{i,j}, \ldots, x_{i,n_i}\}$, $x_{i,j}$ represents the j-th letter in the i-th word $X_i$, and $n_i$ represents the total number of letters in the i-th word $X_i$;
Step 2, selecting from the phoneme library the m phoneme features associated with the j-th letter $x_{i,j}$ to form the phoneme feature set $S_{i,j} = \{s_{i,j,1}, \ldots, s_{i,j,u}, \ldots, s_{i,j,m}\}$, wherein $s_{i,j,u}$ is the u-th phoneme feature associated with the j-th letter $x_{i,j}$ and m is the total number of associated phoneme features;
Step 3, constructing a transliteration network, comprising: an L-layer encoder, an L-layer phoneme memory network, an L-layer decoder and a classifier;
Step 3.1, processing by the encoder:
The j-th letter $x_{i,j}$ is converted into the j-th letter vector $\mathbf{e}_{i,j}$ and then input into the encoder, where it is processed sequentially by L multi-head self-attention layers to obtain L letter encoding vectors $\{\mathbf{h}^{(l)}_{i,j} \mid l=1,2,\ldots,L\}$; wherein $\mathbf{h}^{(l)}_{i,j}$ denotes the j-th letter encoding vector output by the multi-head self-attention layer of the l-th layer;
Step 3.2, processing by the phoneme memory network:
The phoneme feature set $S_{i,j}$ is converted into the phoneme vector set $\{\mathbf{e}^{s}_{i,j,u} \mid u=1,2,\ldots,m\}$ and then, together with $\{\mathbf{h}^{(l)}_{i,j} \mid l=1,2,\ldots,L\}$, input into the phoneme memory network for processing, so as to obtain the enhanced letter encoding vectors $\{\widetilde{\mathbf{h}}^{(l)}_{i,j} \mid l=1,2,\ldots,L;\ j=1,2,\ldots,n_i\}$ of the $n_i$ letters, which are recorded as the letter encoding matrix $H_i$ of the i-th word $X_i$; wherein $\mathbf{e}^{s}_{i,j,u}$ denotes the phoneme vector of the u-th phoneme $s_{i,j,u}$, and $\widetilde{\mathbf{h}}^{(l)}_{i,j}$ denotes an enhanced j-th letter encoding vector;
Step 3.3, processing by the decoder:
The letter encoding matrix $H_i$ and the target letters output by the classifier before time t are input into the L-layer decoder to obtain the letter prediction vector $h_{i,t}$ output by the decoder at time t; when t=1, the letters output by the classifier before time t are set to be empty;
Step 3.4, processing by the classifier:
The classifier uses a fully connected layer to process the letter prediction vector $h_{i,t}$ output by the decoder at time t, so as to obtain the target letter $y_{i,t}$ predicted for the i-th word $X_i$ at the current time t;
Step 3.5, after assigning t+1 to t, returning to step 3.3 and executing sequentially until time T, thereby obtaining the predicted letter sequence $\{y_{i,1}, \ldots, y_{i,t}, \ldots, y_{i,T}\}$ of the i-th word $X_i$.
The transliteration method based on phoneme memory of the invention is also characterized in that the step 2 comprises the following steps:
Step 2.1, calculating the pointwise mutual information $\mathrm{PMI}(x_{i,j}, s_q)$ between the j-th letter $x_{i,j}$ and the q-th phoneme feature $s_q$ in the phoneme library using formula (1), so as to obtain the pointwise mutual information $\{\mathrm{PMI}(x_{i,j}, s_q) \mid 1 \le q \le M\}$ between the j-th letter $x_{i,j}$ and all M phoneme features, where M represents the number of all phoneme features in the phoneme library:

$$\mathrm{PMI}(x_{i,j}, s_q) = \log \frac{p(x_{i,j}, s_q)}{p(x_{i,j})\, p(s_q)} \tag{1}$$

In formula (1), $p(x_{i,j}, s_q)$ represents the probability that the j-th letter $x_{i,j}$ and the q-th phoneme feature $s_q$ co-occur; $p(x_{i,j})$ represents the probability that the j-th letter $x_{i,j}$ appears in the i-th word $X_i$; $p(s_q)$ represents the probability that the q-th phoneme feature $s_q$ appears in the pronunciation of the i-th word $X_i$;
Step 2.2, selecting from the pointwise mutual information $\{\mathrm{PMI}(x_{i,j}, s_q) \mid 1 \le q \le M\}$ the m phoneme features corresponding to the highest pointwise mutual information, and forming the phoneme feature set $S_{i,j} = \{s_{i,j,1}, \ldots, s_{i,j,u}, \ldots, s_{i,j,m}\}$.
Said step 3.2 comprises:
Step 3.2.1, converting the u-th phoneme $s_{i,j,u}$ into the u-th phoneme vector $\mathbf{e}^{s}_{i,j,u}$ and then inputting it, together with $\mathbf{h}^{(l)}_{i,j}$, into the phoneme memory network of the l-th layer; the phoneme memory network of the l-th layer maps $\mathbf{e}^{s}_{i,j,u}$ using formula (2) and formula (3) to obtain the u-th phoneme key vector $\mathbf{k}^{(l)}_{i,j,u}$ of the l-th layer and the u-th phoneme value vector $\mathbf{v}^{(l)}_{i,j,u}$ of the l-th layer:

$$\mathbf{k}^{(l)}_{i,j,u} = \mathrm{ReLU}\bigl(\mathbf{W}^{(l)}_{K} \cdot \mathbf{e}^{s}_{i,j,u}\bigr) \tag{2}$$

$$\mathbf{v}^{(l)}_{i,j,u} = \mathrm{ReLU}\bigl(\mathbf{W}^{(l)}_{V} \cdot \mathbf{e}^{s}_{i,j,u}\bigr) \tag{3}$$

In formula (2) and formula (3), $\mathbf{W}^{(l)}_{K}$ denotes the key matrix of the l-th layer and $\mathbf{W}^{(l)}_{V}$ denotes the value matrix of the l-th layer; ReLU denotes the activation function; "·" denotes the multiplication of a matrix and a vector;
Step 3.2.2, the phoneme memory network of the l-th layer calculates the u-th phoneme weight $a^{(l)}_{i,j,u}$ of the l-th layer using formula (4):

$$a^{(l)}_{i,j,u} = \frac{\exp\bigl(\mathbf{h}^{(l)}_{i,j} \cdot \mathbf{k}^{(l)}_{i,j,u}\bigr)}{\sum_{u'=1}^{m} \exp\bigl(\mathbf{h}^{(l)}_{i,j} \cdot \mathbf{k}^{(l)}_{i,j,u'}\bigr)} \tag{4}$$

In formula (4), "·" denotes the vector inner product;
Step 3.2.3, the phoneme memory network of the l-th layer calculates the weighted average vector $\mathbf{o}^{(l)}_{i,j}$ using formula (5):

$$\mathbf{o}^{(l)}_{i,j} = \sum_{u=1}^{m} a^{(l)}_{i,j,u}\, \mathbf{v}^{(l)}_{i,j,u} \tag{5}$$

Step 3.2.4, the phoneme memory network of the l-th layer obtains the j-th letter reset vector $\mathbf{g}^{(l)}_{i,j}$ of the l-th layer using formula (6):

$$\mathbf{g}^{(l)}_{i,j} = \mathrm{sigmoid}\bigl(\mathbf{W}^{(l)}_{r,1} \cdot \mathbf{h}^{(l)}_{i,j} + \mathbf{W}^{(l)}_{r,2} \cdot \mathbf{o}^{(l)}_{i,j} + \mathbf{b}^{(l)}_{r}\bigr) \tag{6}$$

In formula (6), sigmoid denotes the activation function, $\mathbf{W}^{(l)}_{r,1}$ and $\mathbf{W}^{(l)}_{r,2}$ respectively denote the first reset matrix and the second reset matrix of the l-th layer, and $\mathbf{b}^{(l)}_{r}$ denotes the reset offset vector of the l-th layer;
Step 3.2.5, the phoneme memory network of the l-th layer obtains the enhanced j-th letter encoding vector $\widetilde{\mathbf{h}}^{(l)}_{i,j}$ of the l-th layer using formula (7), so that the L-layer phoneme memory network outputs the enhanced j-th letter encoding vectors $\{\widetilde{\mathbf{h}}^{(l)}_{i,j} \mid l=1,2,\ldots,L\}$ and thus the enhanced letter encoding vectors $\{\widetilde{\mathbf{h}}^{(l)}_{i,j} \mid l=1,2,\ldots,L;\ j=1,2,\ldots,n_i\}$ of the $n_i$ letters, which are recorded as the letter encoding matrix $H_i$ of the i-th word $X_i$:

$$\widetilde{\mathbf{h}}^{(l)}_{i,j} = \bigl(\mathbf{g}^{(l)}_{i,j} \odot \mathbf{h}^{(l)}_{i,j}\bigr) \oplus \bigl((\mathbf{1} - \mathbf{g}^{(l)}_{i,j}) \odot \mathbf{o}^{(l)}_{i,j}\bigr) \tag{7}$$

In formula (7), $\odot$ denotes the Hadamard product, $\oplus$ denotes vector concatenation, and $\mathbf{1}$ denotes a vector whose every dimension is 1.
The electronic device of the invention comprises a memory and a processor, and is characterized in that the memory is used for storing a program that supports the processor in executing the transliteration method, and the processor is configured to execute the program stored in the memory.
The computer-readable storage medium of the invention has a computer program stored thereon, and is characterized in that the computer program, when executed by a processor, performs the steps of the transliteration method.
Compared with the prior art, the invention has the beneficial effects that:
1. For each letter in the input, the method enhances the representation of that letter with its associated phoneme features through the L-layer phoneme memory network, which strengthens the model's grasp of the pronunciation features of the target language, so that the transliterated text generated by the model retains the phonetic features of the source language as much as possible and the transliteration performance of the model is improved.
2. By weighting the different phoneme features, the method identifies and exploits the importance of each phoneme feature, effectively avoiding the influence of potential noise in the phoneme features on model performance.
Drawings
FIG. 1 is a flow chart of the transliteration method of the present invention.
Detailed Description
In this embodiment, a transliteration method based on phoneme memory is performed as shown in fig. 1, and includes the following steps:
Step 1, extracting a plurality of transliterated words from a source language corpus, and splitting each word into letters; wherein the letter sequence obtained by splitting the i-th word $X_i$ is recorded as $\{x_{i,1}, \ldots, x_{i,j}, \ldots, x_{i,n_i}\}$, $x_{i,j}$ represents the j-th letter in the i-th word $X_i$, and $n_i$ represents the total number of letters in the i-th word $X_i$. For example, if the 4 transliterated words extracted from an English source language corpus are {Tom, Smith, Bob, Cook}, then after splitting, the letter sequence of the 2nd word "Smith" is {"S", "m", "i", "t", "h"}, in which there are 5 letters in total and the 3rd letter is "i".
Step 2, selecting from the phoneme library the m phoneme features associated with the j-th letter $x_{i,j}$ to form the phoneme feature set $S_{i,j} = \{s_{i,j,1}, \ldots, s_{i,j,u}, \ldots, s_{i,j,m}\}$, wherein $s_{i,j,u}$ is the u-th phoneme feature associated with the j-th letter $x_{i,j}$ and m is the total number of associated phoneme features. For example, the phoneme library is the set containing all international phonetic symbols {"a", "e", "o", "t", "g", "k", "i", "i:", "ɪ", …}. When m=3, the phoneme feature set extracted from it and associated with the 3rd letter "i" is {"i:", "ɪ", "i"}, and the 1st phoneme feature associated with the 3rd letter is "i:".
Step 2.1, calculating the pointwise mutual information $\mathrm{PMI}(x_{i,j}, s_q)$ between the j-th letter $x_{i,j}$ and the q-th phoneme feature $s_q$ in the phoneme library using formula (1), so as to obtain the pointwise mutual information $\{\mathrm{PMI}(x_{i,j}, s_q) \mid 1 \le q \le M\}$ between the j-th letter $x_{i,j}$ and all M phoneme features, where M represents the number of all phoneme features in the phoneme library:

$$\mathrm{PMI}(x_{i,j}, s_q) = \log \frac{p(x_{i,j}, s_q)}{p(x_{i,j})\, p(s_q)} \tag{1}$$

In formula (1), $p(x_{i,j}, s_q)$ represents the probability that the j-th letter $x_{i,j}$ and the q-th phoneme feature $s_q$ co-occur; $p(x_{i,j})$ represents the probability that the j-th letter $x_{i,j}$ appears in the i-th word $X_i$; $p(s_q)$ represents the probability that the q-th phoneme feature $s_q$ appears in the pronunciation of the i-th word $X_i$. For example, the calculation for the 3rd letter "i" and the 8th phoneme feature "i:" in the phoneme library proceeds as follows: the probability that the 3rd letter "i" co-occurs with the 8th phoneme feature "i:" is 0.6, the probability that the 3rd letter "i" appears in the 2nd word "Smith" is 0.3, and the probability that the 8th phoneme feature "i:" appears in the pronunciation of "Smith" is 0.5. Using formula (1) with a base-2 logarithm, the pointwise mutual information of the 3rd letter "i" and the 8th phoneme feature "i:" is $\log_2\bigl(0.6/(0.3 \times 0.5)\bigr) = 2$. By the same method, the pointwise mutual information between the 3rd letter "i" and all the phoneme features in the phoneme library can be calculated.
Step 2.2, selecting from the pointwise mutual information $\{\mathrm{PMI}(x_{i,j}, s_q) \mid 1 \le q \le M\}$ the m phoneme features corresponding to the highest pointwise mutual information, and forming the phoneme feature set $S_{i,j} = \{s_{i,j,1}, \ldots, s_{i,j,u}, \ldots, s_{i,j,m}\}$. For example, the 3 highest-scoring phoneme features of the 3rd letter "i" are {"i:", "ɪ", "i"}.
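For illustration only, the following minimal Python sketch computes formula (1) and performs the top-m selection of step 2.2; the probability dictionaries, the helper names and the base-2 logarithm are assumptions made for the sketch (the worked example above is consistent with a base-2 logarithm), not a definitive implementation of the claimed method.

```python
import math

def pmi(p_joint: float, p_letter: float, p_phoneme: float) -> float:
    """Pointwise mutual information, formula (1), using a base-2 logarithm."""
    return math.log2(p_joint / (p_letter * p_phoneme))

def select_phoneme_features(letter: str, phoneme_library: list[str],
                            p_joint: dict, p_letter: dict, p_phoneme: dict,
                            m: int = 3) -> list[str]:
    """Score every phoneme in the library against one letter (step 2.1) and keep
    the m features with the highest PMI (step 2.2)."""
    scores = {
        s: pmi(p_joint[(letter, s)], p_letter[letter], p_phoneme[s])
        for s in phoneme_library
        if p_joint.get((letter, s), 0.0) > 0.0   # skip phonemes that never co-occur with the letter
    }
    return sorted(scores, key=scores.get, reverse=True)[:m]

# Worked example from the text: p("i", "i:") = 0.6, p("i") = 0.3, p("i:") = 0.5 -> PMI = 2
assert abs(pmi(0.6, 0.3, 0.5) - 2.0) < 1e-9
```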
Step 3, building a transliteration network, comprising: an L-layer encoder, an L-layer phoneme memory network, an L-layer decoder and a classifier;
Step 3.1, processing by the encoder:
The j-th letter $x_{i,j}$ is converted into the j-th letter vector $\mathbf{e}_{i,j}$ and then input into the encoder, where it is processed sequentially by L multi-head self-attention layers to obtain L letter encoding vectors $\{\mathbf{h}^{(l)}_{i,j} \mid l=1,2,\ldots,L\}$; wherein $\mathbf{h}^{(l)}_{i,j}$ denotes the j-th letter encoding vector output by the multi-head self-attention layer of the l-th layer. For example, when L=6, the 3rd letter "i" is first converted into a letter vector, and then, after processing by the 6 multi-head self-attention layers, 6 letter encoding vectors of the 3rd letter "i" are obtained.
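For illustration only, the following PyTorch sketch shows how an L-layer stack of multi-head self-attention layers can produce one letter encoding vector per layer, as described in step 3.1; the dimensions, layer count and module names are assumptions made for the sketch, not a definitive implementation of the claimed encoder.

```python
import torch
import torch.nn as nn

class LetterEncoder(nn.Module):
    """L stacked multi-head self-attention layers; the output of every layer is
    kept so that each layer can later be paired with its own phoneme memory
    network (step 3.2). All dimensions below are illustrative assumptions."""
    def __init__(self, vocab_size: int, d_model: int = 256, num_layers: int = 6, num_heads: int = 8):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)  # letter -> letter vector
        self.layers = nn.ModuleList(
            [nn.MultiheadAttention(d_model, num_heads, batch_first=True) for _ in range(num_layers)]
        )

    def forward(self, letter_ids: torch.Tensor) -> list[torch.Tensor]:
        h = self.embed(letter_ids)              # (batch, n_i, d_model)
        per_layer = []
        for attn in self.layers:
            h, _ = attn(h, h, h)                # multi-head self-attention: query = key = value
            per_layer.append(h)                 # l-th layer letter encoding vectors
        return per_layer                        # list of length L
```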
Step 3.2, processing by the phoneme memory network:
The phoneme feature set $S_{i,j}$ is converted into the phoneme vector set $\{\mathbf{e}^{s}_{i,j,u} \mid u=1,2,\ldots,m\}$ and then, together with $\{\mathbf{h}^{(l)}_{i,j} \mid l=1,2,\ldots,L\}$, input into the phoneme memory network for processing, so as to obtain the enhanced letter encoding vectors $\{\widetilde{\mathbf{h}}^{(l)}_{i,j} \mid l=1,2,\ldots,L;\ j=1,2,\ldots,n_i\}$ of the $n_i$ letters, which are recorded as the letter encoding matrix $H_i$ of the i-th word $X_i$; wherein $\mathbf{e}^{s}_{i,j,u}$ denotes the phoneme vector of the u-th phoneme $s_{i,j,u}$, and $\widetilde{\mathbf{h}}^{(l)}_{i,j}$ denotes an enhanced j-th letter encoding vector. For example, the phoneme feature set {"i:", "ɪ", "i"} of the 3rd letter "i" is converted into a set of 3 phoneme vectors, and these 3 phoneme vectors are input into the phoneme memory network together with the 6 letter encoding vectors of the 3rd letter "i", resulting in enhanced 3rd letter encoding vectors; the same operation is then carried out on all the letters to obtain the letter encoding matrix $H_2$ of the 2nd word "Smith".
Step 3.2.1, converting the u-th phoneme feature $s_{i,j,u}$ into the u-th phoneme vector $\mathbf{e}^{s}_{i,j,u}$ and then inputting it, together with $\mathbf{h}^{(l)}_{i,j}$, into the phoneme memory network of the l-th layer; the phoneme memory network of the l-th layer maps $\mathbf{e}^{s}_{i,j,u}$ using formula (2) and formula (3) to obtain the u-th phoneme key vector $\mathbf{k}^{(l)}_{i,j,u}$ of the l-th layer and the u-th phoneme value vector $\mathbf{v}^{(l)}_{i,j,u}$ of the l-th layer:

$$\mathbf{k}^{(l)}_{i,j,u} = \mathrm{ReLU}\bigl(\mathbf{W}^{(l)}_{K} \cdot \mathbf{e}^{s}_{i,j,u}\bigr) \tag{2}$$

$$\mathbf{v}^{(l)}_{i,j,u} = \mathrm{ReLU}\bigl(\mathbf{W}^{(l)}_{V} \cdot \mathbf{e}^{s}_{i,j,u}\bigr) \tag{3}$$

In formula (2) and formula (3), $\mathbf{W}^{(l)}_{K}$ denotes the key matrix of the l-th layer and $\mathbf{W}^{(l)}_{V}$ denotes the value matrix of the l-th layer; ReLU denotes the activation function; "·" denotes the multiplication of a matrix and a vector. For example, the 1st phoneme feature "i:" is converted into the 1st phoneme vector, and the 1st phoneme vector is input, together with the 4th-layer letter encoding vector of the 3rd letter "i", into the phoneme memory network of the 4th layer, so as to obtain the 1st phoneme key vector of the 4th layer and the 1st phoneme value vector of the 4th layer.
Step 3.2.2, the phoneme memory network of the l-th layer calculates the u-th phoneme weight $a^{(l)}_{i,j,u}$ of the l-th layer using formula (4):

$$a^{(l)}_{i,j,u} = \frac{\exp\bigl(\mathbf{h}^{(l)}_{i,j} \cdot \mathbf{k}^{(l)}_{i,j,u}\bigr)}{\sum_{u'=1}^{m} \exp\bigl(\mathbf{h}^{(l)}_{i,j} \cdot \mathbf{k}^{(l)}_{i,j,u'}\bigr)} \tag{4}$$

In formula (4), "·" denotes the vector inner product. For example, the 3rd letter "i" has three phoneme weights at the 4th layer, of which the first is 0.5, the second is 0.3, and the third is 0.2.
Step 3.2.3, the phoneme memory network of the l-th layer calculates the weighted average vector $\mathbf{o}^{(l)}_{i,j}$ using formula (5):

$$\mathbf{o}^{(l)}_{i,j} = \sum_{u=1}^{m} a^{(l)}_{i,j,u}\, \mathbf{v}^{(l)}_{i,j,u} \tag{5}$$

For example, the 4th-layer weighted average vector of the 3rd letter "i" is the weighted average of the 4th-layer phoneme value vectors of the 3rd letter "i", with weights 0.5, 0.3 and 0.2 in turn.
Step 3.2.4, the phoneme memory network of the l-th layer obtains the j-th letter reset vector $\mathbf{g}^{(l)}_{i,j}$ of the l-th layer using formula (6):

$$\mathbf{g}^{(l)}_{i,j} = \mathrm{sigmoid}\bigl(\mathbf{W}^{(l)}_{r,1} \cdot \mathbf{h}^{(l)}_{i,j} + \mathbf{W}^{(l)}_{r,2} \cdot \mathbf{o}^{(l)}_{i,j} + \mathbf{b}^{(l)}_{r}\bigr) \tag{6}$$

In formula (6), sigmoid denotes the activation function, $\mathbf{W}^{(l)}_{r,1}$ and $\mathbf{W}^{(l)}_{r,2}$ respectively denote the first reset matrix and the second reset matrix of the l-th layer, and $\mathbf{b}^{(l)}_{r}$ denotes the reset offset vector of the l-th layer. Due to the nature of the sigmoid activation function, the value of each dimension of the l-th layer's j-th letter reset vector $\mathbf{g}^{(l)}_{i,j}$ lies between 0 and 1, representing the reset weight for that dimension of the vector.
Step 3.2.5, the phoneme memory network of the l-th layer obtains the enhanced j-th letter encoding vector $\widetilde{\mathbf{h}}^{(l)}_{i,j}$ of the l-th layer using formula (7), so that the L-layer phoneme memory network outputs the enhanced j-th letter encoding vectors $\{\widetilde{\mathbf{h}}^{(l)}_{i,j} \mid l=1,2,\ldots,L\}$ and thus the enhanced letter encoding vectors $\{\widetilde{\mathbf{h}}^{(l)}_{i,j} \mid l=1,2,\ldots,L;\ j=1,2,\ldots,n_i\}$ of the $n_i$ letters, which are recorded as the letter encoding matrix $H_i$ of the i-th word $X_i$:

$$\widetilde{\mathbf{h}}^{(l)}_{i,j} = \bigl(\mathbf{g}^{(l)}_{i,j} \odot \mathbf{h}^{(l)}_{i,j}\bigr) \oplus \bigl((\mathbf{1} - \mathbf{g}^{(l)}_{i,j}) \odot \mathbf{o}^{(l)}_{i,j}\bigr) \tag{7}$$

In formula (7), $\odot$ denotes the Hadamard product, $\oplus$ denotes vector concatenation, and $\mathbf{1}$ denotes a vector whose every dimension is 1. $\mathbf{g}^{(l)}_{i,j}$ and $\mathbf{1} - \mathbf{g}^{(l)}_{i,j}$ serve to weight, dimension by dimension, the contributions of the j-th letter encoding vector $\mathbf{h}^{(l)}_{i,j}$ of the l-th layer and the j-th weighted average vector $\mathbf{o}^{(l)}_{i,j}$ of the l-th layer, respectively.
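For illustration only, the following PyTorch sketch implements one layer of the phoneme memory network along the lines of formulas (2)-(7); the dimensions, the softmax normalisation of the phoneme weights, and the final projection applied after the concatenation are assumptions made for the sketch, not a definitive implementation of the claimed network.

```python
import torch
import torch.nn as nn

class PhonemeMemoryLayer(nn.Module):
    """One layer of the phoneme memory network, following formulas (2)-(7).
    The softmax in formula (4) and the output projection after the concatenation
    in formula (7) are assumptions made for this sketch."""
    def __init__(self, d_model: int):
        super().__init__()
        self.key_proj = nn.Linear(d_model, d_model, bias=False)    # key matrix W_K^(l), formula (2)
        self.value_proj = nn.Linear(d_model, d_model, bias=False)  # value matrix W_V^(l), formula (3)
        self.reset1 = nn.Linear(d_model, d_model, bias=False)      # first reset matrix, formula (6)
        self.reset2 = nn.Linear(d_model, d_model, bias=False)      # second reset matrix, formula (6)
        self.reset_bias = nn.Parameter(torch.zeros(d_model))       # reset offset vector, formula (6)
        self.out_proj = nn.Linear(2 * d_model, d_model)            # maps the concatenation back to d_model (assumption)

    def forward(self, h: torch.Tensor, phoneme_vecs: torch.Tensor) -> torch.Tensor:
        # h: (n_i, d_model) letter encoding vectors of this layer
        # phoneme_vecs: (n_i, m, d_model) phoneme vectors associated with each letter
        k = torch.relu(self.key_proj(phoneme_vecs))                           # formula (2)
        v = torch.relu(self.value_proj(phoneme_vecs))                         # formula (3)
        weights = torch.softmax(torch.einsum("nd,nmd->nm", h, k), dim=-1)     # formula (4): phoneme weights
        o = torch.einsum("nm,nmd->nd", weights, v)                            # formula (5): weighted average vector
        g = torch.sigmoid(self.reset1(h) + self.reset2(o) + self.reset_bias)  # formula (6): reset vector
        enhanced = torch.cat([g * h, (1.0 - g) * o], dim=-1)                  # formula (7): gated concatenation
        return self.out_proj(enhanced)                                        # enhanced letter encoding vectors
```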
Step 3.3, processing by the decoder:
The letter encoding matrix $H_i$ and the target letters output by the classifier before time t are input into the L-layer decoder to obtain the letter prediction vector $h_{i,t}$ output by the decoder at time t; when t=1, the letters output by the classifier before time t are set to be empty. For example, when t=1, the input of the decoder is the letter encoding matrix $H_i$ and the special letter {"<s>"} indicating an empty letter; when t=3, the input of the decoder is the letter encoding matrix $H_i$ and the target letters {"史", "密"} that the classifier has already output.
Step 3.4, processing by the classifier:
The classifier uses a fully connected layer to process the letter prediction vector $h_{i,t}$ output by the decoder at time t, so as to obtain the target letter $y_{i,t}$ predicted for the i-th word $X_i$ at the current time t. For example, when t=1, the target letter predicted for the 2nd word "Smith" at time t is "史"; when t=3, the target letter predicted for the 2nd word "Smith" at time t is "斯".
Step 3.5, after assigning t+1 to t, returning to step 3.3 and executing sequentially until time T, thereby obtaining the predicted letter sequence $\{y_{i,1}, \ldots, y_{i,t}, \ldots, y_{i,T}\}$ of the i-th word $X_i$. Specifically, time T is determined by the condition that the letter predicted for the i-th word $X_i$ at time T+1 is "</s>". For example, if the letter predicted for the 2nd word at time t=4 is "</s>", then T=3 and the predicted letter sequence {"史", "密", "斯"} of the 2nd word "Smith" is obtained.
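For illustration only, the following Python sketch shows the greedy decoding loop of steps 3.3-3.5; `decoder` and `classifier` stand for the trained L-layer decoder and the fully connected classifier, and the token ids, tensor shapes and maximum step count are assumptions made for the sketch.

```python
import torch

def greedy_transliterate(decoder, classifier, letter_matrix: torch.Tensor,
                         bos_id: int, eos_id: int, max_steps: int = 32) -> list[int]:
    """Greedy decoding for steps 3.3-3.5: feed the letter encoding matrix H_i and
    the previously predicted target letters back into the decoder until the
    end-of-sequence letter "</s>" is produced."""
    predicted = [bos_id]                        # "<s>" stands for the empty history at t = 1
    for _ in range(max_steps):
        prev = torch.tensor([predicted])        # target letters output so far
        h_t = decoder(prev, letter_matrix)      # letter prediction vectors, shape (1, t, d_model) assumed
        y_t = int(classifier(h_t[:, -1]).argmax(dim=-1))   # predicted target letter y_{i,t}
        if y_t == eos_id:                       # time T reached: the next prediction is "</s>"
            break
        predicted.append(y_t)
    return predicted[1:]                        # e.g. the ids of "史", "密", "斯" for "Smith"
```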
In this embodiment, an electronic device includes a memory for storing a program that supports a processor to execute the transliteration method described above, and a processor configured to execute the program stored in the memory.
In this embodiment, a computer-readable storage medium stores a computer program, and the computer program, when executed by a processor, performs the steps of the transliteration method.

Claims (5)

1. A transliteration method based on phoneme memory, characterized by comprising the following steps:
Step 1, extracting a plurality of transliterated words from a source language corpus, and splitting each word into letters; wherein the letter sequence obtained by splitting the i-th word $X_i$ is recorded as $\{x_{i,1}, \ldots, x_{i,j}, \ldots, x_{i,n_i}\}$, $x_{i,j}$ represents the j-th letter in the i-th word $X_i$, and $n_i$ represents the total number of letters in the i-th word $X_i$;
Step 2, selecting from the phoneme library the m phoneme features associated with the j-th letter $x_{i,j}$ to form the phoneme feature set $S_{i,j} = \{s_{i,j,1}, \ldots, s_{i,j,u}, \ldots, s_{i,j,m}\}$, wherein $s_{i,j,u}$ is the u-th phoneme feature associated with the j-th letter $x_{i,j}$ and m is the total number of associated phoneme features;
Step 3, building a transliteration network, comprising: an L-layer encoder, an L-layer phoneme memory network, an L-layer decoder and a classifier;
Step 3.1, processing by the encoder:
The j-th letter $x_{i,j}$ is converted into the j-th letter vector $\mathbf{e}_{i,j}$ and then input into the encoder, where it is processed sequentially by L multi-head self-attention layers to obtain L letter encoding vectors $\{\mathbf{h}^{(l)}_{i,j} \mid l=1,2,\ldots,L\}$; wherein $\mathbf{h}^{(l)}_{i,j}$ denotes the j-th letter encoding vector output by the multi-head self-attention layer of the l-th layer;
Step 3.2, processing by the phoneme memory network:
The phoneme feature set $S_{i,j}$ is converted into the phoneme vector set $\{\mathbf{e}^{s}_{i,j,u} \mid u=1,2,\ldots,m\}$ and then, together with $\{\mathbf{h}^{(l)}_{i,j} \mid l=1,2,\ldots,L\}$, input into the phoneme memory network for processing, so as to obtain the enhanced letter encoding vectors $\{\widetilde{\mathbf{h}}^{(l)}_{i,j} \mid l=1,2,\ldots,L;\ j=1,2,\ldots,n_i\}$ of the $n_i$ letters, which are recorded as the letter encoding matrix $H_i$ of the i-th word $X_i$; wherein $\mathbf{e}^{s}_{i,j,u}$ denotes the phoneme vector of the u-th phoneme $s_{i,j,u}$, and $\widetilde{\mathbf{h}}^{(l)}_{i,j}$ denotes an enhanced j-th letter encoding vector;
Step 3.3, processing by the decoder:
The letter encoding matrix $H_i$ and the target letters output by the classifier before time t are input into the L-layer decoder to obtain the letter prediction vector $h_{i,t}$ output by the decoder at time t; when t=1, the letters output by the classifier before time t are set to be empty;
Step 3.4, processing by the classifier:
The classifier uses a fully connected layer to process the letter prediction vector $h_{i,t}$ output by the decoder at time t, so as to obtain the target letter $y_{i,t}$ predicted for the i-th word $X_i$ at the current time t;
Step 3.5, after assigning t+1 to t, returning to step 3.3 and executing sequentially until time T, thereby obtaining the predicted letter sequence $\{y_{i,1}, \ldots, y_{i,t}, \ldots, y_{i,T}\}$ of the i-th word $X_i$.
2. The transliteration method based on phoneme memory as claimed in claim 1, wherein said step 2 comprises:
Step 2.1, calculating the pointwise mutual information $\mathrm{PMI}(x_{i,j}, s_q)$ between the j-th letter $x_{i,j}$ and the q-th phoneme feature $s_q$ in the phoneme library using formula (1), so as to obtain the pointwise mutual information $\{\mathrm{PMI}(x_{i,j}, s_q) \mid 1 \le q \le M\}$ between the j-th letter $x_{i,j}$ and all M phoneme features, where M represents the number of all phoneme features in the phoneme library:

$$\mathrm{PMI}(x_{i,j}, s_q) = \log \frac{p(x_{i,j}, s_q)}{p(x_{i,j})\, p(s_q)} \tag{1}$$

In formula (1), $p(x_{i,j}, s_q)$ represents the probability that the j-th letter $x_{i,j}$ and the q-th phoneme feature $s_q$ co-occur; $p(x_{i,j})$ represents the probability that the j-th letter $x_{i,j}$ appears in the i-th word $X_i$; $p(s_q)$ represents the probability that the q-th phoneme feature $s_q$ appears in the pronunciation of the i-th word $X_i$;
Step 2.2, selecting from the pointwise mutual information $\{\mathrm{PMI}(x_{i,j}, s_q) \mid 1 \le q \le M\}$ the m phoneme features corresponding to the highest pointwise mutual information, and forming the phoneme feature set $S_{i,j} = \{s_{i,j,1}, \ldots, s_{i,j,u}, \ldots, s_{i,j,m}\}$.
3. The transliteration method based on phoneme memory according to claim 1, characterized in that said step 3.2 comprises:
Step 3.2.1, converting the u-th phoneme $s_{i,j,u}$ into the u-th phoneme vector $\mathbf{e}^{s}_{i,j,u}$ and then inputting it, together with $\mathbf{h}^{(l)}_{i,j}$, into the phoneme memory network of the l-th layer; the phoneme memory network of the l-th layer maps $\mathbf{e}^{s}_{i,j,u}$ using formula (2) and formula (3) to obtain the u-th phoneme key vector $\mathbf{k}^{(l)}_{i,j,u}$ of the l-th layer and the u-th phoneme value vector $\mathbf{v}^{(l)}_{i,j,u}$ of the l-th layer:

$$\mathbf{k}^{(l)}_{i,j,u} = \mathrm{ReLU}\bigl(\mathbf{W}^{(l)}_{K} \cdot \mathbf{e}^{s}_{i,j,u}\bigr) \tag{2}$$

$$\mathbf{v}^{(l)}_{i,j,u} = \mathrm{ReLU}\bigl(\mathbf{W}^{(l)}_{V} \cdot \mathbf{e}^{s}_{i,j,u}\bigr) \tag{3}$$

In formula (2) and formula (3), $\mathbf{W}^{(l)}_{K}$ denotes the key matrix of the l-th layer and $\mathbf{W}^{(l)}_{V}$ denotes the value matrix of the l-th layer; ReLU denotes the activation function; "·" denotes the multiplication of a matrix and a vector;
Step 3.2.2, the phoneme memory network of the l-th layer calculates the u-th phoneme weight $a^{(l)}_{i,j,u}$ of the l-th layer using formula (4):

$$a^{(l)}_{i,j,u} = \frac{\exp\bigl(\mathbf{h}^{(l)}_{i,j} \cdot \mathbf{k}^{(l)}_{i,j,u}\bigr)}{\sum_{u'=1}^{m} \exp\bigl(\mathbf{h}^{(l)}_{i,j} \cdot \mathbf{k}^{(l)}_{i,j,u'}\bigr)} \tag{4}$$

In formula (4), "·" denotes the vector inner product;
Step 3.2.3, the phoneme memory network of the l-th layer calculates the weighted average vector $\mathbf{o}^{(l)}_{i,j}$ using formula (5):

$$\mathbf{o}^{(l)}_{i,j} = \sum_{u=1}^{m} a^{(l)}_{i,j,u}\, \mathbf{v}^{(l)}_{i,j,u} \tag{5}$$

Step 3.2.4, the phoneme memory network of the l-th layer obtains the j-th letter reset vector $\mathbf{g}^{(l)}_{i,j}$ of the l-th layer using formula (6):

$$\mathbf{g}^{(l)}_{i,j} = \mathrm{sigmoid}\bigl(\mathbf{W}^{(l)}_{r,1} \cdot \mathbf{h}^{(l)}_{i,j} + \mathbf{W}^{(l)}_{r,2} \cdot \mathbf{o}^{(l)}_{i,j} + \mathbf{b}^{(l)}_{r}\bigr) \tag{6}$$

In formula (6), sigmoid denotes the activation function, $\mathbf{W}^{(l)}_{r,1}$ and $\mathbf{W}^{(l)}_{r,2}$ respectively denote the first reset matrix and the second reset matrix of the l-th layer, and $\mathbf{b}^{(l)}_{r}$ denotes the reset offset vector of the l-th layer;
Step 3.2.5, the phoneme memory network of the l-th layer obtains the enhanced j-th letter encoding vector $\widetilde{\mathbf{h}}^{(l)}_{i,j}$ of the l-th layer using formula (7), so that the L-layer phoneme memory network outputs the enhanced j-th letter encoding vectors $\{\widetilde{\mathbf{h}}^{(l)}_{i,j} \mid l=1,2,\ldots,L\}$ and thus the enhanced letter encoding vectors $\{\widetilde{\mathbf{h}}^{(l)}_{i,j} \mid l=1,2,\ldots,L;\ j=1,2,\ldots,n_i\}$ of the $n_i$ letters, which are recorded as the letter encoding matrix $H_i$ of the i-th word $X_i$:

$$\widetilde{\mathbf{h}}^{(l)}_{i,j} = \bigl(\mathbf{g}^{(l)}_{i,j} \odot \mathbf{h}^{(l)}_{i,j}\bigr) \oplus \bigl((\mathbf{1} - \mathbf{g}^{(l)}_{i,j}) \odot \mathbf{o}^{(l)}_{i,j}\bigr) \tag{7}$$

In formula (7), $\odot$ denotes the Hadamard product, $\oplus$ denotes vector concatenation, and $\mathbf{1}$ denotes a vector whose every dimension is 1.
4. An electronic device comprising a memory and a processor, wherein the memory is configured to store a program that enables the processor to perform the transliteration method of any of claims 1-3, and wherein the processor is configured to execute the program stored in the memory.
5. A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the steps of the transliteration method according to any one of claims 1 to 3.
CN202211595293.3A 2022-12-13 2022-12-13 Transliteration method based on phoneme memory, electronic equipment and storage medium Active CN115662392B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211595293.3A CN115662392B (en) 2022-12-13 2022-12-13 Transliteration method based on phoneme memory, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211595293.3A CN115662392B (en) 2022-12-13 2022-12-13 Transliteration method based on phoneme memory, electronic equipment and storage medium

Publications (2)

Publication Number Publication Date
CN115662392A true CN115662392A (en) 2023-01-31
CN115662392B CN115662392B (en) 2023-04-25

Family

ID=85019419

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211595293.3A Active CN115662392B (en) 2022-12-13 2022-12-13 Transliteration method based on phoneme memory, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN115662392B (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
GB1522851A (en) * 1974-07-17 1978-08-31 Threshold Tech Apparatus and method for recognizing words from among continuous speech
CN103020046A (en) * 2012-12-24 2013-04-03 哈尔滨工业大学 Name transliteration method on the basis of classification of name origin

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
GB1522851A (en) * 1974-07-17 1978-08-31 Threshold Tech Apparatus and method for recognizing words from among continuous speech
CN103020046A (en) * 2012-12-24 2013-04-03 哈尔滨工业大学 Name transliteration method on the basis of classification of name origin

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Song Yan: "A language identification method based on sample and feature selection" *
Jin Ma et al.: "A language identification system based on convolutional neural networks" *

Also Published As

Publication number Publication date
CN115662392B (en) 2023-04-25

Similar Documents

Publication Publication Date Title
CN111897949B (en) Guided text abstract generation method based on Transformer
CN111368565B (en) Text translation method, text translation device, storage medium and computer equipment
CN110059188B (en) Chinese emotion analysis method based on bidirectional time convolution network
CN111460807B (en) Sequence labeling method, device, computer equipment and storage medium
CN111859978A (en) Emotion text generation method based on deep learning
CN108460028B (en) Domain adaptation method for integrating sentence weight into neural machine translation
CN111291534A (en) Global coding method for automatic summarization of Chinese long text
CN112348911B (en) Semantic constraint-based method and system for generating fine-grained image by stacking texts
CN110362797B (en) Research report generation method and related equipment
CN110619127A (en) Mongolian Chinese machine translation method based on neural network turing machine
WO2023226292A1 (en) Method for extracting relation from text, relation extraction model, and medium
CN112561718A (en) Case microblog evaluation object emotion tendency analysis method based on BilSTM weight sharing
CN113128206A (en) Question generation method based on word importance weighting
CN114168754A (en) Relation extraction method based on syntactic dependency and fusion information
CN111401037A (en) Natural language generation method and device, electronic equipment and storage medium
WO2020040255A1 (en) Word coding device, analysis device, language model learning device, method, and program
CN113191150B (en) Multi-feature fusion Chinese medical text named entity identification method
CN109979461A (en) A kind of voice translation method and device
CN114528944B (en) Medical text coding method, device, equipment and readable storage medium
CN115662392A (en) Transliteration method based on phoneme memory, electronic equipment and storage medium
CN115270792A (en) Medical entity identification method and device
CN114580376A (en) Chinese abstract generating method based on component sentence method analysis
CN111428509B (en) Latin letter-based Uygur language processing method and system
CN112818688A (en) Text processing method, device, equipment and storage medium
CN110825869A (en) Text abstract generating method of variation generation decoder based on copying mechanism

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant