CN115662392B - Transliteration method based on phoneme memory, electronic equipment and storage medium - Google Patents

Transliteration method based on phoneme memory, electronic equipment and storage medium

Info

Publication number
CN115662392B
CN115662392B
Authority
CN
China
Prior art keywords
phoneme, letter, layer, vector, representing
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202211595293.3A
Other languages
Chinese (zh)
Other versions
CN115662392A (en)
Inventor
宋彦 (Song Yan)
田元贺 (Tian Yuanhe)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
University of Science and Technology of China USTC
Original Assignee
University of Science and Technology of China USTC
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by University of Science and Technology of China USTC filed Critical University of Science and Technology of China USTC
Priority to CN202211595293.3A
Publication of CN115662392A
Application granted
Publication of CN115662392B
Legal status: Active

Abstract

The invention discloses a transliteration method based on phoneme memory, electronic equipment and a storage medium, comprising the following steps: 1. extracting transliterated words and splitting them into letters; 2. constructing a phoneme library and extracting the phoneme features associated with each letter; 3. constructing an L-layer encoder and encoding the letters to obtain a letter code vector at each layer for each letter; 4. establishing an L-layer phoneme memory network that models the letter code vectors together with the phoneme features to obtain a letter code matrix; 5. inputting the letter code matrix and the target letters output by the classifier before time t into an L-layer decoder, and feeding the letter prediction vector output by the decoder at time t into the classifier to obtain the target letter predicted at time t; 6. assigning t+1 to t and repeating step 5 until time T, thereby obtaining the predicted letter sequence. The invention fuses phoneme features into the standard text generation process, so that the quality and effect of transliteration can be improved.

Description

Transliteration method based on phoneme memory, electronic equipment and storage medium
Technical Field
The invention belongs to the field of natural language processing, and particularly relates to a transliteration method based on phoneme memory, electronic equipment and a storage medium.
Background
Transliteration refers to translating a person's name in a source language into text in a target language without changing the name's pronunciation in the source language. For example, the English name "Smith" in the source language is transliterated into the Chinese "史密斯".
Most existing methods treat this task as a sequence-to-sequence generation task and use advanced encoders and decoders to generate name transliterations in the target language, but they make little use of the speech features, in particular the phoneme features, of the source and target languages. As a result, the words generated by transliteration lose the pronunciation characteristics of the source language, reducing transliteration accuracy.
Disclosure of Invention
The invention aims to overcome the defects of the prior art by providing a transliteration method based on phoneme memory, electronic equipment and a storage medium, so that phoneme features can be fused into the standard text generation process and the quality and effect of transliteration can be improved.
In order to achieve the aim of the invention, the invention adopts the following technical scheme:
the transliteration method based on phoneme memory is characterized by comprising the following steps:
step 1, extracting a plurality of transliterated words from a source language corpus, and splitting each word into letters; wherein the letter sequence obtained by splitting the i-th word X_i is recorded as {x_{i,1}, …, x_{i,j}, …, x_{i,n_i}}, x_{i,j} represents the j-th letter of the i-th word X_i, and n_i represents the total number of letters in the i-th word X_i;
step 2, selecting from the phoneme library the m phoneme features associated with the j-th letter x_{i,j} and forming the phoneme feature set S_{i,j} = {s_{i,j,1}, …, s_{i,j,u}, …, s_{i,j,m}}, wherein s_{i,j,u} is the u-th phoneme feature associated with the j-th letter x_{i,j} and m is the total number of associated phoneme features;
step 3, constructing a transliteration network comprising: an L-layer encoder, an L-layer phoneme memory network, an L-layer decoder, and a classifier;
step 3.1, processing of an encoder:
the j-th letter x_{i,j} is converted into the j-th letter vector e_{i,j} and input into the encoder; after processing by the multi-head self-attention layers of the L layers, L letter code vectors {h^l_{i,j} | l = 1, 2, …, L} are obtained from the L layers respectively, wherein h^l_{i,j} represents the j-th letter code vector output by the multi-head self-attention layer of the l-th layer;
step 3.2, processing of a phoneme memory network:
the phoneme feature set S_{i,j} is converted into the phoneme vector set {e_{i,j,u} | u = 1, 2, …, m} and input together with {h^l_{i,j} | l = 1, 2, …, L} into the phoneme memory network for processing, so as to obtain the n_i enhanced letter code vectors {h̃^l_{i,j} | l = 1, 2, …, L; j = 1, 2, …, n_i}, which are recorded as the letter code matrix H_i of the i-th word X_i; wherein e_{i,j,u} represents the phoneme vector of the u-th phoneme feature s_{i,j,u}, and h̃^l_{i,j} represents the enhanced j-th letter code vector;
step 3.3, processing of a decoder:
the letter code matrix H_i and the target letters output by the classifier at the time steps before t are input together into the L-layer decoder to obtain the letter prediction vector h_{i,t} output by the decoder at time t; when t = 1, the letters output by the classifier before time t are taken to be empty;
step 3.4, processing of a classifier:
the classifier uses a fully connected layer to process the letter prediction vector h_{i,t} output by the decoder at time t, so as to obtain the target letter y_{i,t} predicted for the i-th word X_i at the current time t;
Step 3.5, after assigning t+1 to t, returning to step 3.3 and executing in sequence until time T, thereby obtaining the predicted letter sequence {y_{i,1}, …, y_{i,t}, …, y_{i,T}} of the i-th word X_i.
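The loop in steps 3.3-3.5 can be sketched as a greedy decoding procedure. The `decoder` and `classifier` callables and the end symbol below are stand-ins for the trained L-layer decoder, the fully connected classifier, and the special end-of-sequence letter (all assumptions for illustration):

```python
def transliterate(H, decoder, classifier, end_symbol="</s>", max_len=20):
    """Greedy decoding loop of steps 3.3-3.5 (a sketch, not the trained model).

    H          -- letter code matrix of the source word (from step 3.2)
    decoder    -- maps (H, letters so far) to the letter prediction vector h_t
    classifier -- maps h_t to the predicted target letter y_t
    """
    outputs = []                       # empty at t = 1, as required by step 3.3
    for _ in range(max_len):
        h_t = decoder(H, outputs)      # step 3.3: letter prediction vector at time t
        y_t = classifier(h_t)          # step 3.4: predicted target letter
        if y_t == end_symbol:          # time T reached: stop before the end symbol
            break
        outputs.append(y_t)            # step 3.5: assign t + 1 to t
    return outputs

# Toy stand-ins that spell out "史密斯" for the word "Smith"
target = ["史", "密", "斯"]
toy_decoder = lambda H, prev: len(prev)                  # "vector" = time index
toy_classifier = lambda t: target[t] if t < len(target) else "</s>"
print(transliterate(None, toy_decoder, toy_classifier))  # ['史', '密', '斯']
```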
The transliteration method based on phoneme memory is also characterized in that the step 2 comprises the following steps:
step 2.1, calculating the point-wise mutual information PMI(x_{i,j}, s_q) between the j-th letter x_{i,j} and the q-th phoneme feature s_q in the phoneme library using formula (1), thereby obtaining the point-wise mutual information {PMI(x_{i,j}, s_q) | 1 ≤ q ≤ M} between the j-th letter x_{i,j} and all M phoneme features, where M represents the number of phoneme features in the phoneme library;

PMI(x_{i,j}, s_q) = log( p(x_{i,j}, s_q) / ( p(x_{i,j}) · p(s_q) ) )   (1)

In formula (1), p(x_{i,j}, s_q) represents the probability that the j-th letter x_{i,j} and the q-th phoneme feature s_q co-occur; p(x_{i,j}) represents the probability that the j-th letter x_{i,j} appears in the i-th word X_i; p(s_q) represents the probability that the q-th phoneme feature s_q appears in the pronunciation of the i-th word X_i;
step 2.2, from the point-wise mutual information {PMI(x_{i,j}, s_q) | 1 ≤ q ≤ M}, selecting the phoneme features corresponding to the m highest values and forming the phoneme feature set S_{i,j} = {s_{i,j,1}, …, s_{i,j,u}, …, s_{i,j,m}}.
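Steps 2.1 and 2.2 amount to scoring each phoneme feature by PMI and keeping the top m. A minimal sketch, assuming a base-2 logarithm (consistent with the worked example later in the description) and hypothetical probability estimates:

```python
import math

def pmi(p_joint, p_letter, p_phoneme):
    """Formula (1), point-wise mutual information; the base-2 logarithm is an
    assumption consistent with the worked example in the description."""
    return math.log2(p_joint / (p_letter * p_phoneme))

def top_m_phonemes(letter, phoneme_stats, m=3):
    """Step 2.2: keep the m phoneme features with the highest PMI for a letter.
    phoneme_stats maps phoneme -> (p_joint, p_letter, p_phoneme); these
    probabilities are hypothetical, not taken from the patent."""
    scored = {s: pmi(*probs) for s, probs in phoneme_stats.items()}
    return sorted(scored, key=scored.get, reverse=True)[:m]

# Hypothetical statistics for the letter "i" in "Smith"
stats = {
    "i:": (0.6, 0.3, 0.5),    # PMI = log2(0.6 / 0.15) = 2.0
    "i":  (0.3, 0.3, 0.4),    # PMI = log2(2.5) ≈ 1.32
    "e":  (0.05, 0.3, 0.3),   # negative PMI
}
print(top_m_phonemes("i", stats, m=2))  # ['i:', 'i']
```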
The step 3.2 comprises:
step 3.2.1, the u-th phoneme feature s_{i,j,u} is converted into the u-th phoneme vector e_{i,j,u} and input together with h^l_{i,j} into the phoneme memory network of the l-th layer; the l-th-layer phoneme memory network maps e_{i,j,u} using formulas (2) and (3) to obtain the u-th phoneme key vector k^l_u of the l-th layer and the u-th phoneme value vector v^l_u of the l-th layer:

k^l_u = ReLU( W^l_K · e_{i,j,u} )   (2)
v^l_u = ReLU( W^l_V · e_{i,j,u} )   (3)

In formulas (2) and (3), W^l_K represents the key matrix of the l-th layer and W^l_V represents the value matrix of the l-th layer; ReLU represents an activation function; "·" represents multiplication of a matrix and a vector;
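Formulas (2) and (3) are two ReLU-activated linear maps applied to the same phoneme vector. A dependency-free sketch, with small random matrices standing in for the learned key and value matrices:

```python
import random

def relu_matvec(W, x):
    """ReLU(W · x): matrix-vector product followed by element-wise ReLU."""
    return [max(0.0, sum(w * xv for w, xv in zip(row, x))) for row in W]

def map_phoneme(e_u, W_K, W_V):
    """Formulas (2) and (3): map the phoneme vector e_u to its key and value."""
    return relu_matvec(W_K, e_u), relu_matvec(W_V, e_u)

random.seed(0)
d = 4                                             # hidden size (illustrative)
W_K = [[random.gauss(0, 1) for _ in range(d)] for _ in range(d)]  # key matrix
W_V = [[random.gauss(0, 1) for _ in range(d)] for _ in range(d)]  # value matrix
e = [random.gauss(0, 1) for _ in range(d)]        # phoneme vector e_u
k, v = map_phoneme(e, W_K, W_V)
assert all(x >= 0.0 for x in k + v)               # ReLU keeps entries non-negative
```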
step 3.2.2, the phoneme memory network of the l-th layer calculates the u-th phoneme weight w^l_u of the l-th layer using formula (4):

w^l_u = exp( h^l_{i,j} · k^l_u ) / Σ_{u'=1}^{m} exp( h^l_{i,j} · k^l_{u'} )   (4)

In formula (4), "·" represents the vector inner product;
step 3.2.3, the phoneme memory network of the l-th layer calculates the weighted average vector o^l_j using formula (5):

o^l_j = Σ_{u=1}^{m} w^l_u · v^l_u   (5)
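Formulas (4) and (5) can be read as attention over the phoneme memory: weights from the inner products of the letter code vector with the keys, then a weighted average of the values. The softmax normalisation below is an assumption; the patent only specifies that "·" is the vector inner product:

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def phoneme_attention(h, keys, values):
    """Formulas (4) and (5): phoneme weights from inner products h · k_u
    (softmax normalisation assumed), then the weighted average of values."""
    scores = [dot(h, k) for k in keys]
    mx = max(scores)
    exps = [math.exp(s - mx) for s in scores]     # stabilised softmax
    z = sum(exps)
    w = [e / z for e in exps]                     # phoneme weights, sum to 1
    o = [sum(w_u * v_u[d] for w_u, v_u in zip(w, values))
         for d in range(len(values[0]))]          # weighted average vector
    return w, o

# Toy letter code vector, keys, and values
h = [1.0, 0.0]
keys = [[2.0, 0.0], [0.0, 2.0]]
values = [[1.0, 1.0], [3.0, 3.0]]
w, o = phoneme_attention(h, keys, values)
assert abs(sum(w) - 1.0) < 1e-9                   # a valid weighting
```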
Step 3.2.4, the firstlThe phoneme memory network of the layer is obtained by using the method (6)lLayer j letter reset vector
Figure SMS_24
Figure SMS_25
(6)
In the formula (6), sigmoid represents an activation function,
Figure SMS_26
and->
Figure SMS_27
Respectively represent the firstlFirst and second reset matrix of layer, < > in>
Figure SMS_28
Represent the firstlA reset offset vector for a layer;
step 3.2.5, the phoneme memory network of the l-th layer obtains the enhanced j-th letter code vector h̃^l_{i,j} of the l-th layer using formula (7), so that the enhanced j-th letter code vectors {h̃^l_{i,j} | l = 1, 2, …, L} are output by the L-layer phoneme memory network, thereby obtaining the n_i enhanced letter code vectors {h̃^l_{i,j} | l = 1, 2, …, L; j = 1, 2, …, n_i}, which are recorded as the letter code matrix H_i of the i-th word X_i:

h̃^l_{i,j} = g^l_j ⊙ h^l_{i,j} + (1 − g^l_j) ⊙ o^l_j   (7)

In formula (7), ⊙ represents the Hadamard (element-wise) product and 1 represents a vector whose dimensions all have the value 1.
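Formulas (6) and (7) form a per-dimension gate, similar in spirit to the reset and update gates of a GRU. A sketch under the reconstruction above (the gated sum in (7) is inferred from the stated role of the reset vector, so treat it as an assumption):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def reset_and_fuse(h, o, W1, W2, b):
    """Formulas (6) and (7): per-dimension reset gate g = Sigmoid(W1·h + W2·o + b),
    then the Hadamard mix g ⊙ h + (1 − g) ⊙ o (a reconstruction, not verbatim)."""
    matvec = lambda W, x: [sum(w * xv for w, xv in zip(row, x)) for row in W]
    g = [sigmoid(a + c + bb)
         for a, c, bb in zip(matvec(W1, h), matvec(W2, o), b)]   # formula (6)
    return [gd * hd + (1.0 - gd) * od
            for gd, hd, od in zip(g, h, o)]                      # formula (7)

# With zero reset matrices and bias, every gate is 0.5, so the result
# is the element-wise mean of h and o:
Z = [[0.0, 0.0], [0.0, 0.0]]
print(reset_and_fuse([2.0, 4.0], [0.0, 0.0], Z, Z, [0.0, 0.0]))  # [1.0, 2.0]
```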
The invention provides an electronic device comprising a memory and a processor, wherein the memory is used for storing a program for supporting the processor to execute the transliteration method, and the processor is configured to execute the program stored in the memory.
The invention relates to a computer readable storage medium, on which a computer program is stored, characterized in that the computer program, when being executed by a processor, performs the steps of the transliteration method.
Compared with the prior art, the invention has the beneficial effects that:
1. For each letter in the input, the invention enhances the letter's representation with its associated phoneme features through the L-layer phoneme memory network, strengthening the model's understanding of the pronunciation characteristics of the target language; the transliterated text generated by the model therefore preserves the phonetic characteristics of the source language as much as possible, improving the model's transliteration performance.
2. According to the invention, by weighting different phoneme features, the importance of different phoneme features is identified and utilized, and the influence of potential noise in the phoneme features on the model performance is effectively avoided.
Drawings
FIG. 1 is a flow chart of the transliteration method of the present invention.
Detailed Description
In this embodiment, a transliteration method based on phoneme memory, as shown in fig. 1, is performed according to the following steps:
step 1, extracting a plurality of transliterated words from a source language corpus, and splitting each word into letters; wherein the letter sequence obtained by splitting the i-th word X_i is recorded as {x_{i,1}, …, x_{i,j}, …, x_{i,n_i}}, x_{i,j} represents the j-th letter of the i-th word X_i, and n_i represents the total number of letters in the i-th word X_i. For example, if the 4 transliterated words extracted from an English source corpus are {Tom, Smith, Bob, Cook}, then after splitting, the letter sequence of the 2nd word "Smith" is {"S", "m", "i", "t", "h"}, with 5 letters in total, the 3rd letter being "i".
step 2, selecting from the phoneme library the m phoneme features associated with the j-th letter x_{i,j} and forming the phoneme feature set S_{i,j} = {s_{i,j,1}, …, s_{i,j,u}, …, s_{i,j,m}}, wherein s_{i,j,u} is the u-th phoneme feature associated with the j-th letter x_{i,j} and m is the total number of associated phoneme features. For example, the phoneme library may be the set of all international phonetic symbols {"a", "e", "o", "t", "g", "k", "i", "i:", "ɪ", …}. When m = 3, the set of phoneme features associated with the 3rd letter "i" is {"i:", "ɪ", "i"}, and the 1st phoneme feature associated with the 3rd letter is "i:".
Step 2.1, calculating the j-th letter using the formula (1)x i,j And the q-th phoneme feature in the phoneme librarys q Point-by-point mutual information PMIx i,j ,s q ) Thereby obtaining the j-th letterx i,j Point-to-point mutual information { PMI } with all M phoneme featuresx i,j ,s q )|1≤q≤M };MRepresenting the number of all the phoneme features in the phoneme library;
Figure SMS_36
(1)
in the formula (1), p is%x i,j ,s q ) Representing the j-th letterx i,j And the q-th phoneme features q Probability of co-occurrence; p%x i,j ) Representing the j-th letterx i,j Appear in the ith word X i Probability of (a); p%s q ) Representing the q-th phoneme features q Appear in the ith word X i Is a probability in pronunciation of (a). For example, the process of calculating the 3 rd letter "i" and the 8 th phoneme feature "i:" in the phoneme library is. The probability of co-occurrence of the 3 rd letter "i" with the 8 th phoneme feature "i:" in the phoneme library is calculated to be 0.6, the probability of occurrence of the 3 rd letter "i" in the 2 nd word "Smith" is calculated to be 0.3, and the probability of occurrence of the 8 th phoneme feature "i:" in the pronunciation of "Smith" is calculated to be 0.5. And (3) calculating the point-to-point mutual information of the 3 rd letter 'i' and the 8 th phoneme characteristic 'i:' in the phoneme library by using the formula (1) to obtain 2. By adopting the same method, the point-by-point mutual information of the 3 rd letter 'i' and all the phoneme features in the phoneme library can be calculated.
Step 2.2 from the point-by-point mutual information { PMI }x i,j ,s q ) The M highest point-by-point mutual information pairs are selected from the I1, q and MThe corresponding phoneme features and forming a phoneme feature setS i,j ={s i,j,1 s i,j,u ,…s i,j,m }. For example, the 3 rd phoneme feature with the highest score of the 3 rd letter "i" is { "i:", ". I ”, “i”}。
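The numbers in the worked example are consistent with formula (1) taken with a base-2 logarithm:

```python
import math

# From the worked example: p(joint) = 0.6, p(letter) = 0.3, p(phoneme) = 0.5
pmi_value = math.log2(0.6 / (0.3 * 0.5))   # = log2(4)
print(pmi_value)  # 2.0
```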
Step 3, constructing a transliteration network comprising: an L-layer encoder, an L-layer phoneme memory network, an L-layer decoder, and a classifier;
step 3.1, processing of an encoder:
the j-th letter x_{i,j} is converted into the j-th letter vector e_{i,j} and input into the encoder; after processing by the multi-head self-attention layers of the L layers, L letter code vectors {h^l_{i,j} | l = 1, 2, …, L} are obtained from the L layers respectively, wherein h^l_{i,j} represents the j-th letter code vector output by the multi-head self-attention layer of the l-th layer. For example, when L = 6, the 3rd letter "i" is first converted into a letter vector and then, after processing by the 6 multi-head self-attention layers, the 6 letter code vectors of the 3rd letter "i" are obtained.
Step 3.2, processing of a phoneme memory network:
the phoneme feature set S_{i,j} is converted into the phoneme vector set {e_{i,j,u} | u = 1, 2, …, m} and input together with {h^l_{i,j} | l = 1, 2, …, L} into the phoneme memory network for processing, so as to obtain the n_i enhanced letter code vectors {h̃^l_{i,j} | l = 1, 2, …, L; j = 1, 2, …, n_i}, which are recorded as the letter code matrix H_i of the i-th word X_i; wherein e_{i,j,u} represents the phoneme vector of the u-th phoneme feature s_{i,j,u}, and h̃^l_{i,j} represents the enhanced j-th letter code vector. For example, the phoneme feature set {"i:", "ɪ", "i"} of the 3rd letter "i" is converted into a set of 3 phoneme vectors, which are input into the phoneme memory network together with the 6 letter code vectors of the 3rd letter "i" to obtain the enhanced 3rd letter code vectors; the same operation is then carried out on all letters to obtain the letter code matrix H_2 of the 2nd word "Smith".
Step 3.2.1, the u-th phoneme feature s_{i,j,u} is converted into the u-th phoneme vector e_{i,j,u} and input together with h^l_{i,j} into the phoneme memory network of the l-th layer; the l-th-layer phoneme memory network maps e_{i,j,u} using formulas (2) and (3) to obtain the u-th phoneme key vector k^l_u of the l-th layer and the u-th phoneme value vector v^l_u of the l-th layer:

k^l_u = ReLU( W^l_K · e_{i,j,u} )   (2)
v^l_u = ReLU( W^l_V · e_{i,j,u} )   (3)

In formulas (2) and (3), W^l_K represents the key matrix of the l-th layer and W^l_V represents the value matrix of the l-th layer; ReLU represents an activation function; "·" represents multiplication of a matrix and a vector. For example, the 1st phoneme feature "i:" is converted into the 1st phoneme vector and input, together with the code vector of the 3rd letter "i", into the phoneme memory network of the 4th layer to obtain the 1st phoneme key vector of the 4th layer and the 1st phoneme value vector of the 4th layer.
Step 3.2.2, the phoneme memory network of the l-th layer calculates the u-th phoneme weight w^l_u of the l-th layer using formula (4):

w^l_u = exp( h^l_{i,j} · k^l_u ) / Σ_{u'=1}^{m} exp( h^l_{i,j} · k^l_{u'} )   (4)

In formula (4), "·" represents the vector inner product. For example, the 4th layer of the 3rd letter "i" has three phoneme weights in total: the first is 0.5, the second 0.3, and the third 0.2.
Step 3.2.3, the phoneme memory network of the l-th layer calculates the weighted average vector o^l_j using formula (5):

o^l_j = Σ_{u=1}^{m} w^l_u · v^l_u   (5)

For example, the weighted average vector of the 4th layer of the 3rd letter "i" is the weighted average of the value vectors of the 4th layer of the 3rd letter "i", with weights 0.5, 0.3 and 0.2 in turn.
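Taking the example weights 0.5, 0.3, 0.2 with toy two-dimensional value vectors (the vectors themselves are illustrative, not from the patent), formula (5) gives:

```python
weights = [0.5, 0.3, 0.2]                        # layer-4 phoneme weights from the example
values = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]    # toy 2-dimensional value vectors
o = [sum(w * v[d] for w, v in zip(weights, values)) for d in range(2)]
print([round(x, 10) for x in o])  # [0.7, 0.5]
```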
Step 3.2.4, the phoneme memory network of the l-th layer obtains the j-th letter reset vector g^l_j of the l-th layer using formula (6):

g^l_j = Sigmoid( W^l_1 · h^l_{i,j} + W^l_2 · o^l_j + b^l )   (6)

In formula (6), Sigmoid represents an activation function, W^l_1 and W^l_2 respectively represent the first and second reset matrices of the l-th layer, and b^l represents the reset offset vector of the l-th layer. Owing to the nature of the Sigmoid activation function, the value of each dimension of the j-th letter reset vector g^l_j of the l-th layer lies between 0 and 1, representing the reset weight of that dimension of the vector.
Step 3.2.5, the phoneme memory network of the l-th layer obtains the enhanced j-th letter code vector h̃^l_{i,j} of the l-th layer using formula (7), so that the enhanced j-th letter code vectors {h̃^l_{i,j} | l = 1, 2, …, L} are output by the L-layer phoneme memory network, thereby obtaining the n_i enhanced letter code vectors {h̃^l_{i,j} | l = 1, 2, …, L; j = 1, 2, …, n_i}, which are recorded as the letter code matrix H_i of the i-th word X_i:

h̃^l_{i,j} = g^l_j ⊙ h^l_{i,j} + (1 − g^l_j) ⊙ o^l_j   (7)

In formula (7), ⊙ represents the Hadamard (element-wise) product and 1 represents a vector whose dimensions all have the value 1. The factors g^l_j and (1 − g^l_j) weight, dimension by dimension, the contributions of the j-th letter code vector h^l_{i,j} of the l-th layer and the j-th weighted average vector o^l_j of the l-th layer.
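Steps 3.2.1 through 3.2.5 compose into a single forward function for one phoneme-memory layer. This sketch relies on the reconstructions above (softmax in (4), gated sum in (7)); a degenerate all-zero parameterisation makes every gate 0.5, which is easy to check by hand:

```python
import math

def phoneme_memory_layer(h, phoneme_vecs, W_K, W_V, W1, W2, b):
    """One layer of the phoneme memory network, steps 3.2.1-3.2.5, as a
    self-contained sketch (softmax in (4) and the gated sum in (7) are
    reconstructions of the image-only formulas)."""
    matvec = lambda W, x: [sum(w * xv for w, xv in zip(row, x)) for row in W]
    relu = lambda xs: [max(0.0, x) for x in xs]
    sigm = lambda x: 1.0 / (1.0 + math.exp(-x))

    keys = [relu(matvec(W_K, e)) for e in phoneme_vecs]            # formula (2)
    vals = [relu(matvec(W_V, e)) for e in phoneme_vecs]            # formula (3)
    scores = [sum(hx * kx for hx, kx in zip(h, k)) for k in keys]
    mx = max(scores)
    exps = [math.exp(s - mx) for s in scores]
    w = [e / sum(exps) for e in exps]                              # formula (4)
    o = [sum(wu * v[d] for wu, v in zip(w, vals))
         for d in range(len(h))]                                   # formula (5)
    g = [sigm(a + c + bb)
         for a, c, bb in zip(matvec(W1, h), matvec(W2, o), b)]     # formula (6)
    return [gd * hd + (1.0 - gd) * od
            for gd, hd, od in zip(g, h, o)]                        # formula (7)

# Degenerate check: zero matrices give uniform weights, a zero weighted
# average, and gates of 0.5, so the output is 0.5 * h.
Z2 = [[0.0, 0.0], [0.0, 0.0]]
out = phoneme_memory_layer([2.0, -2.0], [[1.0, 0.0], [0.0, 1.0]], Z2, Z2, Z2, Z2, [0.0, 0.0])
print(out)  # [1.0, -1.0]
```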
Step 3.3, processing of a decoder:
the letter code matrix H_i and the target letters output by the classifier at the time steps before t are input together into the L-layer decoder to obtain the letter prediction vector h_{i,t} output by the decoder at time t; when t = 1, the letters output by the classifier before time t are taken to be empty. For example, when t = 1, the input to the decoder is the letter code matrix H_i and a special symbol representing the empty letter; when t = 3, the input to the decoder is the letter code matrix H_i and the target letters {"史", "密"} already output by the classifier.
Step 3.4, processing of a classifier:
the classifier uses a fully connected layer to process the letter prediction vector h_{i,t} output by the decoder at time t, so as to obtain the target letter y_{i,t} predicted for the i-th word X_i at the current time t. For example, when t = 1, the target letter predicted for the 2nd word "Smith" at time t is "史"; when t = 3, the target letter predicted for the 2nd word "Smith" at time t is "斯".
Step 3.5, after assigning t+1 to t, returning to step 3.3 and executing in sequence until time T, thereby obtaining the predicted letter sequence {y_{i,1}, …, y_{i,t}, …, y_{i,T}} of the i-th word X_i. The criterion for determining the time T is that the letter predicted for the i-th word X_i at time T+1 is a special end symbol. For example, if the letter predicted for the 2nd word at time t = 4 is the end symbol, then T = 3, and the predicted letter sequence {"史", "密", "斯"} of the 2nd word "Smith" is obtained.
In this embodiment, an electronic device includes a memory for storing a program for supporting the processor to execute the transliteration method described above, and a processor configured to execute the program stored in the memory.
In this embodiment, a computer readable storage medium stores a computer program, which when executed by a processor, causes the steps of the transliteration method to be performed.

Claims (5)

1. A transliteration method based on phoneme memory is characterized by comprising the following steps:
step 1, extracting a plurality of transliterated words from a source language corpus, and splitting each word into letters; wherein the letter sequence obtained by splitting the i-th word X_i is recorded as {x_{i,1}, …, x_{i,j}, …, x_{i,n_i}}, x_{i,j} represents the j-th letter of the i-th word X_i, and n_i represents the total number of letters in the i-th word X_i;
step 2, selecting from the phoneme library the m phoneme features associated with the j-th letter x_{i,j} and forming the phoneme feature set S_{i,j} = {s_{i,j,1}, …, s_{i,j,u}, …, s_{i,j,m}}, wherein s_{i,j,u} is the u-th phoneme feature associated with the j-th letter x_{i,j} and m is the total number of associated phoneme features;
step 3, constructing a transliteration network comprising: an L-layer encoder, an L-layer phoneme memory network, an L-layer decoder, and a classifier;
step 3.1, processing of an encoder:
the j-th letter x_{i,j} is converted into the j-th letter vector e_{i,j} and input into the encoder; after processing by the multi-head self-attention layers of the L layers, L letter code vectors {h^l_{i,j} | l = 1, 2, …, L} are obtained from the L layers respectively, wherein h^l_{i,j} represents the j-th letter code vector output by the multi-head self-attention layer of the l-th layer;
step 3.2, processing of a phoneme memory network:
the phoneme feature set S_{i,j} is converted into the phoneme vector set {e_{i,j,u} | u = 1, 2, …, m} and input together with {h^l_{i,j} | l = 1, 2, …, L} into the phoneme memory network for processing, so as to obtain the n_i enhanced letter code vectors {h̃^l_{i,j} | l = 1, 2, …, L; j = 1, 2, …, n_i}, which are recorded as the letter code matrix H_i of the i-th word X_i; wherein e_{i,j,u} represents the phoneme vector of the u-th phoneme feature s_{i,j,u}, and h̃^l_{i,j} represents the enhanced j-th letter code vector;
step 3.3, processing of a decoder:
the letter code matrix H_i and the target letters output by the classifier at the time steps before t are input together into the L-layer decoder to obtain the letter prediction vector h_{i,t} output by the decoder at time t; when t = 1, the letters output by the classifier before time t are taken to be empty;
step 3.4, processing of a classifier:
the classifier uses a fully connected layer to process the letter prediction vector h_{i,t} output by the decoder at time t, so as to obtain the target letter y_{i,t} predicted for the i-th word X_i at the current time t;
Step 3.5, after assigning t+1 to t, returning to step 3.3 and executing in sequence until time T, thereby obtaining the predicted letter sequence {y_{i,1}, …, y_{i,t}, …, y_{i,T}} of the i-th word X_i.
2. The transliteration method based on phoneme memory as claimed in claim 1, wherein said step 2 comprises:
step 2.1, calculating the point-wise mutual information PMI(x_{i,j}, s_q) between the j-th letter x_{i,j} and the q-th phoneme feature s_q in the phoneme library using formula (1), thereby obtaining the point-wise mutual information {PMI(x_{i,j}, s_q) | 1 ≤ q ≤ M} between the j-th letter x_{i,j} and all M phoneme features, where M represents the number of phoneme features in the phoneme library;

PMI(x_{i,j}, s_q) = log( p(x_{i,j}, s_q) / ( p(x_{i,j}) · p(s_q) ) )   (1)

In formula (1), p(x_{i,j}, s_q) represents the probability that the j-th letter x_{i,j} and the q-th phoneme feature s_q co-occur; p(x_{i,j}) represents the probability that the j-th letter x_{i,j} appears in the i-th word X_i; p(s_q) represents the probability that the q-th phoneme feature s_q appears in the pronunciation of the i-th word X_i;
step 2.2, from the point-wise mutual information {PMI(x_{i,j}, s_q) | 1 ≤ q ≤ M}, selecting the phoneme features corresponding to the m highest values and forming the phoneme feature set S_{i,j} = {s_{i,j,1}, …, s_{i,j,u}, …, s_{i,j,m}}.
3. The transliteration method based on phoneme memory as claimed in claim 1, wherein the step 3.2 comprises:
step 3.2.1, the u-th phoneme feature s_{i,j,u} is converted into the u-th phoneme vector e_{i,j,u} and input together with h^l_{i,j} into the phoneme memory network of the l-th layer; the l-th-layer phoneme memory network maps e_{i,j,u} using formulas (2) and (3) to obtain the u-th phoneme key vector k^l_u of the l-th layer and the u-th phoneme value vector v^l_u of the l-th layer:

k^l_u = ReLU( W^l_K · e_{i,j,u} )   (2)
v^l_u = ReLU( W^l_V · e_{i,j,u} )   (3)

In formulas (2) and (3), W^l_K represents the key matrix of the l-th layer and W^l_V represents the value matrix of the l-th layer; ReLU represents an activation function; "·" represents multiplication of a matrix and a vector;
step 3.2.2, the phoneme memory network of the l-th layer calculates the u-th phoneme weight w^l_u of the l-th layer using formula (4):

w^l_u = exp( h^l_{i,j} · k^l_u ) / Σ_{u'=1}^{m} exp( h^l_{i,j} · k^l_{u'} )   (4)

In formula (4), "·" represents the vector inner product;
step 3.2.3, the phoneme memory network of the l-th layer calculates the weighted average vector o^l_j using formula (5):

o^l_j = Σ_{u=1}^{m} w^l_u · v^l_u   (5)
Step 3.2.4, the firstlThe phoneme memory network of the layer is obtained by using the method (6)lLayer j letter reset vector
Figure QLYQS_24
Figure QLYQS_25
(6)
In the formula (6), sigmoid represents an activation function,
Figure QLYQS_26
and->
Figure QLYQS_27
Respectively represent the firstlFirst and second reset matrix of layer, < > in>
Figure QLYQS_28
Represent the firstlA reset offset vector for a layer;
step 3.2.5, the phoneme memory network of the l-th layer obtains the enhanced j-th letter code vector h̃^l_{i,j} of the l-th layer using formula (7), so that the enhanced j-th letter code vectors {h̃^l_{i,j} | l = 1, 2, …, L} are output by the L-layer phoneme memory network, thereby obtaining the n_i enhanced letter code vectors {h̃^l_{i,j} | l = 1, 2, …, L; j = 1, 2, …, n_i}, which are recorded as the letter code matrix H_i of the i-th word X_i:

h̃^l_{i,j} = g^l_j ⊙ h^l_{i,j} + (1 − g^l_j) ⊙ o^l_j   (7)

In formula (7), ⊙ represents the Hadamard (element-wise) product and 1 represents a vector whose dimensions all have the value 1.
4. An electronic device comprising a memory and a processor, wherein the memory is configured to store a program that supports the processor to perform the transliteration method of any one of claims 1-3, the processor being configured to execute the program stored in the memory.
5. A computer readable storage medium having a computer program stored thereon, characterized in that the computer program when executed by a processor performs the steps of the transliteration method of any of claims 1 to 3.
CN202211595293.3A 2022-12-13 2022-12-13 Transliteration method based on phoneme memory, electronic equipment and storage medium Active CN115662392B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211595293.3A CN115662392B (en) 2022-12-13 2022-12-13 Transliteration method based on phoneme memory, electronic equipment and storage medium


Publications (2)

Publication Number Publication Date
CN115662392A (en) 2023-01-31
CN115662392B (en) 2023-04-25

Family

ID=85019419

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211595293.3A Active CN115662392B (en) 2022-12-13 2022-12-13 Transliteration method based on phoneme memory, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN115662392B (en)

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US3943295A (en) * 1974-07-17 1976-03-09 Threshold Technology, Inc. Apparatus and method for recognizing words from among continuous speech
CN103020046B (en) * 2012-12-24 2016-04-20 哈尔滨工业大学 Based on the name transliteration method of name origin classification

Also Published As

Publication number Publication date
CN115662392A (en) 2023-01-31


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant