CN112148856B - Method and device for establishing punctuation prediction model - Google Patents

Method and device for establishing punctuation prediction model

Info

Publication number: CN112148856B (grant publication); application number: CN202011000536.5A; application publication: CN112148856A
Authority: CN (China)
Prior art keywords: punctuation, training, decoding, sequence, result
Legal status: Active (granted)
Original language: Chinese (zh)
Inventors: 梁鸣心 (Liang Mingxin), 付晓寅 (Fu Xiaoyin), 张辽 (Zhang Liao)
Original and current assignee: Beijing Baidu Netcom Science and Technology Co., Ltd.
Application filed by Beijing Baidu Netcom Science and Technology Co., Ltd.; priority to CN202011000536.5A

Classifications

    • G06F 16/3329: Natural language query formulation or dialogue systems
    • G06F 16/3344: Query execution using natural language analysis
    • G06N 3/044: Recurrent networks, e.g. Hopfield networks
    • G06N 3/045: Combinations of networks
    • G06N 3/047: Probabilistic or stochastic networks
    • G06N 3/084: Backpropagation, e.g. using gradient descent
    • Y02D 10/00: Energy efficient computing, e.g. low power processors, power management or thermal management


Abstract

The application discloses a method and a device for establishing a punctuation prediction model, and relates to the technical fields of deep learning and natural language processing. The implementation scheme adopted in establishing the punctuation prediction model is as follows: acquiring training data; inputting the punctuation-free text into the encoding component in a Transformer model to obtain the encoding result output by the encoding component; inputting the punctuation marking result corresponding to the punctuation-free text and the encoding result into the decoding component in the Transformer model to obtain the decoding result output by the decoding component; determining the predicted sequence corresponding to the decoding result, and obtaining a training tag sequence according to the predicted sequence and the punctuation marking result corresponding to the punctuation-free text; and training the decoding component according to the encoding result, the predicted sequence and the training tag sequence, and taking the encoding component and the decoding component obtained by training as the punctuation prediction model. The application can improve the accuracy of the punctuation prediction model in punctuation prediction.

Description

Method and device for establishing punctuation prediction model
Technical Field
The present application relates to the field of deep learning technologies, and in particular, to a method, an apparatus, an electronic device, and a readable storage medium for establishing a punctuation prediction model in the field of natural language processing technologies.
Background
Automatic speech recognition systems are an important component of human-computer interaction today, and are also a common way for users to input on intelligent terminals. However, speech recognition systems typically output punctuation-free text, which has the following drawbacks: 1) in human-computer interaction scenarios, the interaction system generally needs to transcribe the voice instruction issued by the user into text and perform natural language processing on the text in order to interact; such non-standard text usually has ambiguity problems, which seriously affect the performance of natural language processing tasks and the fluency of the interaction system; 2) in speech transcription scenarios, punctuation-free text does not allow users to quickly divide semantic units, the start and end positions of sentences cannot be determined, and readability is poor. As human-computer interaction scenarios become richer and semantics become more and more complex, the drawbacks of serious ambiguity and poor readability in punctuation-free text become more obvious.
The prior art typically predicts the punctuation contained in punctuation-free text based on a Transformer model. However, when punctuation is predicted based on the Transformer model, punctuation prediction errors occur, and the resulting change in the lengths of the input and output sequences increases the mismatch between training and prediction in the Transformer model, thereby reducing the accuracy of punctuation prediction.
Disclosure of Invention
The technical scheme adopted by the application to solve the technical problem is to provide a method for establishing a punctuation prediction model, comprising the following steps: acquiring training data, wherein the training data comprises a plurality of punctuation-free texts and punctuation marking results corresponding to the punctuation-free texts; inputting the punctuation-free text into a coding component in a Transformer model to obtain a coding result output by the coding component; inputting the punctuation marking result corresponding to the punctuation-free text and the coding result into a decoding component in the Transformer model to obtain a decoding result output by the decoding component; determining a predicted sequence corresponding to the decoding result, and obtaining a training tag sequence according to the predicted sequence and the punctuation marking result corresponding to the punctuation-free text; and training the decoding component according to the coding result, the predicted sequence and the training tag sequence, and taking the coding component and the decoding component obtained by training as the punctuation prediction model.
The technical scheme adopted by the application to solve the technical problem is to provide a device for establishing a punctuation prediction model, comprising: an acquisition unit, used for acquiring training data, wherein the training data comprises a plurality of punctuation-free texts and punctuation marking results corresponding to the punctuation-free texts; a coding unit, used for inputting the punctuation-free text into a coding component in a Transformer model to obtain a coding result output by the coding component; a decoding unit, used for inputting the punctuation marking result corresponding to the punctuation-free text and the coding result into a decoding component in the Transformer model to obtain a decoding result output by the decoding component; a determining unit, used for determining a predicted sequence corresponding to the decoding result, and obtaining a training tag sequence according to the predicted sequence and the punctuation marking result corresponding to the punctuation-free text; and a training unit, used for training the decoding component according to the coding result, the predicted sequence and the training tag sequence, and taking the coding component and the decoding component obtained by training as the punctuation prediction model.
An electronic device, comprising: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method described above.
A non-transitory computer readable storage medium storing computer instructions for causing a computer to perform the above method.
A computer program product comprising a computer program which, when executed by a processor, implements the method described above.
One embodiment of the above application has the following advantages or benefits: the accuracy of punctuation prediction by the punctuation prediction model obtained based on the Transformer model can be improved, and continuous punctuation prediction can be realized, giving better extensibility. Because the decoding component in the Transformer model performs two passes of processing with the corresponding inputs during training, the technical problem in the prior art that punctuation prediction errors increase the mismatch between training and prediction in the Transformer model is solved, and the technical effect of improving the accuracy of the punctuation prediction model in punctuation prediction is achieved.
Other effects of the above alternatives will be described below in connection with specific embodiments.
Drawings
The drawings are for better understanding of the present solution and do not constitute a limitation of the present application. Wherein:
FIG. 1 is a schematic diagram according to a first embodiment of the present application;
FIG. 2 is a schematic diagram according to a second embodiment of the present application;
FIG. 3 is a schematic diagram according to a third embodiment of the present application;
FIG. 4 is a schematic diagram according to a fourth embodiment of the present application;
FIG. 5 is a schematic diagram according to a fifth embodiment of the present application;
FIG. 6 is a block diagram of an electronic device for implementing a method of building punctuation predictive models in accordance with an embodiment of the present application.
Detailed Description
Exemplary embodiments of the present application are described below in conjunction with the accompanying drawings, which include various details of the embodiments of the present application to facilitate understanding, and should be considered as merely exemplary. Accordingly, one of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present application. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
Fig. 1 is a schematic diagram according to a first embodiment of the present application. As shown in fig. 1, the method for establishing a punctuation prediction model of the present embodiment may specifically include the following steps:
S101, acquiring training data, wherein the training data comprises a plurality of punctuation-free texts and punctuation marking results corresponding to the punctuation-free texts;
S102, inputting the punctuation-free text into the encoding component in the Transformer model to obtain the encoding result output by the encoding component;
S103, inputting the punctuation marking result corresponding to the punctuation-free text and the encoding result into the decoding component in the Transformer model to obtain the decoding result output by the decoding component;
S104, determining the predicted sequence corresponding to the decoding result, and obtaining a training tag sequence according to the predicted sequence and the punctuation marking result corresponding to the punctuation-free text;
S105, training the decoding component according to the encoding result, the predicted sequence and the training tag sequence, and taking the encoding component and the decoding component obtained by training as the punctuation prediction model.
According to the method for establishing a punctuation prediction model of this embodiment, the decoding component in the Transformer model performs two passes of processing with the corresponding inputs during training, which reduces the mismatch between training and prediction in the original Transformer model and improves the punctuation prediction accuracy of the trained punctuation prediction model; and because the Encoder-Decoder structure of the original Transformer model is retained, the input and output of the punctuation prediction model are not limited in length, continuous punctuation prediction can be realized, and the punctuation prediction model has better extensibility.
The punctuation marking result corresponding to a punctuation-free text obtained in S101 is the text containing punctuation obtained after labeling the punctuation in the punctuation-free text. For example, if the obtained punctuation-free text is "hello do you need help", the punctuation marking result corresponding to the punctuation-free text is "hello, do you need help?".
In this embodiment, after S101 is executed to obtain training data including a plurality of punctuation-free texts and the punctuation marking results corresponding to the punctuation-free texts, S102, S103, S104 and S105 may be executed for each punctuation-free text and its corresponding punctuation marking result in the training data, so as to train the Transformer model according to the obtained training data and thereby obtain the punctuation prediction model.
In S102, the punctuation-free text is taken as the input of the encoding component in the Transformer model, and the encoding result output by the encoding component is obtained. The encoding result obtained in this embodiment is a vector sequence corresponding to the punctuation-free text.
The specific process of executing S102 to obtain the encoding result corresponding to the punctuation-free text is: performing embedding on each word in the punctuation-free text, followed by positional encoding; the positional encoding result of each word is then input into the encoding component comprising a plurality of encoders, where each encoder comprises a self-attention layer and a feed-forward network layer, and the vector sequence obtained after processing by all the encoders in the encoding component is taken as the encoding result.
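To make this flow concrete, the following is a minimal sketch of such an encoding component, assuming PyTorch; the class name and hyperparameters are illustrative, not from the patent:

```python
import math
import torch
import torch.nn as nn

class EncodingComponent(nn.Module):
    """Sketch of the encoding component: embedding plus positional encoding,
    followed by a stack of encoders (self-attention + feed-forward)."""
    def __init__(self, vocab_size, d_model=256, n_heads=4, n_layers=6):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        self.d_model = d_model
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoders = nn.TransformerEncoder(layer, n_layers)

    def positional_encoding(self, length):
        pos = torch.arange(length).unsqueeze(1)
        div = torch.exp(torch.arange(0, self.d_model, 2)
                        * (-math.log(10000.0) / self.d_model))
        pe = torch.zeros(length, self.d_model)
        pe[:, 0::2] = torch.sin(pos * div)
        pe[:, 1::2] = torch.cos(pos * div)
        return pe

    def forward(self, token_ids):                  # (batch, seq_len)
        x = self.embed(token_ids) * math.sqrt(self.d_model)
        x = x + self.positional_encoding(token_ids.size(1))
        return self.encoders(x)                    # encoding result: vector sequence
```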
After S102 obtains the encoding result output by the encoding component, S103 is executed to take the punctuation marking result corresponding to the punctuation-free text and the encoding result of the punctuation-free text as the input of the decoding component in the Transformer model, so as to obtain the decoding result output by the decoding component.
The decoding result obtained in this embodiment is the predicted tag sequence corresponding to the punctuation-free text, where the predicted tag sequence includes specific tags and punctuation, and each specific tag indicates that the corresponding character is not followed by punctuation.
For example, if the punctuation-free text is "hello do you need help", the decoding component may output the decoding result "O,OOOO?</s>", where "</s>" is the end-of-sentence symbol, "O" is a specific tag, and "," and "?" are punctuation.
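As an illustration, the correspondence between a punctuation marking result and such a tag sequence can be sketched as follows; the word-level tokens and the punctuation set are assumptions made for this sketch, not the patent's code:

```python
# Assumed punctuation inventory for this sketch.
PUNCT = {",", ".", "?", "!"}

def to_tag_sequence(labeled_tokens, eos="</s>"):
    """Map a tokenized punctuation marking result to a tag sequence:
    words become the specific tag "O", punctuation is kept as-is.

    to_tag_sequence(['hello', ',', 'do', 'you', 'need', 'help', '?'])
    -> ['O', ',', 'O', 'O', 'O', 'O', '?', '</s>']
    """
    return [tok if tok in PUNCT else "O" for tok in labeled_tokens] + [eos]
```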
The specific process of executing S103 to obtain the decoding result according to the punctuation marking result and the encoding result corresponding to the punctuation-free text is: performing embedding on each word and punctuation mark in the punctuation marking result corresponding to the punctuation-free text, followed by positional encoding; the encoding result and the positional encoding results of each word and each punctuation mark are input into the decoding component comprising a plurality of decoders, where each decoder comprises a masked self-attention layer, a self-attention layer and a feed-forward network layer, and the predicted tag sequence obtained after processing by all the decoders in the decoding component is taken as the decoding result.
It can be understood that, in this embodiment, when the punctuation marking result corresponding to the punctuation-free text is input into the decoding component, the punctuation marking result shifted to the right by one position is used as the input of the decoding component; that is, "<s> hello, do you need help?" is input into the decoding component, where "<s>" is the start-of-sentence symbol.
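A minimal sketch of the decoding component along these lines, again assuming PyTorch with illustrative names (positional encoding omitted for brevity, as it mirrors the encoder sketch above):

```python
import torch
import torch.nn as nn

def shift_right(labeled_ids, sos_id):
    """Prepend the start-of-sentence symbol and drop the last token, so that
    position t of the input is used to predict token t of the target."""
    return torch.cat([torch.full_like(labeled_ids[:, :1], sos_id),
                      labeled_ids[:, :-1]], dim=1)

class DecodingComponent(nn.Module):
    """Sketch of the decoding component: a stack of decoders, each with a
    masked self-attention layer, an attention layer over the encoding
    result, and a feed-forward network layer."""
    def __init__(self, vocab_size, tag_vocab_size, d_model=256, n_heads=4, n_layers=6):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        layer = nn.TransformerDecoderLayer(d_model, n_heads, batch_first=True)
        self.decoders = nn.TransformerDecoder(layer, n_layers)
        self.out = nn.Linear(d_model, tag_vocab_size)  # Linear (+ Softmax) head

    def forward(self, shifted_ids, memory, tgt_mask=None):
        x = self.embed(shifted_ids)        # (+ positional encoding, as in the encoder)
        h = self.decoders(x, memory, tgt_mask=tgt_mask)
        return self.out(h)                 # logits over tags and punctuation
```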
After the decoding result output by the decoding component is obtained in S103, S104 is executed to determine the predicted sequence corresponding to the decoding result, and to obtain a training tag sequence according to the predicted sequence and the punctuation marking result corresponding to the punctuation-free text, where the predicted sequence includes characters and punctuation.
In the prior art, punctuation prediction errors occur when punctuation is predicted based on the Transformer model, and the predicted sequence obtained from such a misprediction differs in length from the input sequence; the resulting change in the lengths of the input and output sequences increases the mismatch between training and prediction in the Transformer model, thereby reducing the accuracy of punctuation prediction.
In order to solve this problem, this embodiment proposes a sampling method for variable-length sequences, which changes the length of the sequences used in training the decoding component according to the punctuation insertion and punctuation deletion errors that may occur during model prediction, so as to reduce the gap between model training and prediction and improve the prediction performance of the model.
Specifically, when executing S104 to obtain the training tag sequence according to the predicted sequence and the punctuation marking result corresponding to the punctuation-free text, an optional implementation is: comparing the determined predicted sequence with the punctuation marking result corresponding to the punctuation-free text to determine the prediction error type; and processing the decoding result by using a sampling method corresponding to the determined prediction error type to obtain the training tag sequence, where the obtained training tag sequence includes specific tags and punctuation, and each specific tag indicates that the corresponding character is not followed by punctuation.
It can be understood that the decoding result output by the decoding component in this embodiment may include the predicted sequence in addition to the predicted tag sequence; that is, while outputting the predicted tag sequence "O,OOOO?</s>", the decoding component may also output the predicted sequence "<s> hello, do you need help?</s>". If the decoding result output by the decoding component does not include the predicted sequence, this embodiment may also obtain the predicted sequence corresponding to the decoding result from the characters in the punctuation-free text in combination with the position information.
In this embodiment, when executing S104, processing the decoding result by using the sampling method corresponding to the determined prediction error type to obtain the training tag sequence includes: in response to the determined prediction error type being punctuation deletion, replacing the specific tag at the position corresponding to the deleted punctuation in the decoding result with the deleted punctuation to obtain the training tag sequence.
For example, if the punctuation marking result corresponding to the punctuation-free text is "<s> hello, do you need help?" and the predicted tag sequence in the decoding result is "OOOOO?</s>", the predicted sequence corresponding to the predicted tag sequence is "<s> hello do you need help?</s>", and the prediction error that occurred is punctuation deletion; the deleted comma then replaces the "O" at the second position in the predicted tag sequence, resulting in the training tag sequence "O,OOO?</s>".
In this embodiment, when executing S104, processing the decoding result by using the sampling method corresponding to the determined prediction error type to obtain the training tag sequence also includes: in response to the determined prediction error type being punctuation insertion, replacing the punctuation added in the decoding result with a specific tag to obtain the training tag sequence.
For example, if the punctuation marking result corresponding to the punctuation-free text is "<s> hello, do you need help?" and the predicted tag sequence in the decoding result is "O,OO,OO?</s>", the predicted sequence corresponding to the predicted tag sequence is "<s> hello, do you, need help?</s>", and the prediction error that occurred is punctuation insertion; the specific tag "O" then replaces the added comma in the predicted tag sequence, resulting in the training tag sequence "O,OOOOO?</s>".
That is, in this embodiment, the prediction error type is determined by comparing the predicted sequence with the punctuation marking result of the punctuation-free text, and the predicted tag sequence is then modified according to the determined prediction error type to obtain the training tag sequence, so that the training tag sequence used in the training process has the same length as the predicted sequence, thereby improving the prediction accuracy of the punctuation prediction model.
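A minimal sketch of this variable-length sampling step is given below; the helper name is an assumption, and it handles one error at a time (overlapping errors would need a full alignment, e.g. edit distance):

```python
def build_training_tags(pred, gold):
    """Derive a training tag sequence with the same length as the predicted
    tag sequence `pred`, repairing punctuation deletions and insertions
    relative to the gold tag sequence `gold`.

    build_training_tags(['O', 'O', 'O', 'O', 'O', '?', '</s>'],       # comma deleted
                        ['O', ',', 'O', 'O', 'O', 'O', '?', '</s>'])
    -> ['O', ',', 'O', 'O', 'O', '?', '</s>']
    """
    train, j = [], 0                       # j walks the gold tag sequence
    for p in pred:
        if j < len(gold) and gold[j] not in ("O", "</s>") and p == "O":
            train.append(gold[j])          # punctuation deletion: restore it and
            j += 2                         # skip one gold "O" (pred is shorter)
        elif p not in ("O", "</s>") and (j >= len(gold) or gold[j] == "O"):
            train.append("O")              # punctuation insertion: replace with "O";
                                           # do not advance j (no gold slot consumed)
        else:
            train.append(gold[j] if j < len(gold) else p)
            j += 1
    return train
```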
After the training tag sequence is determined in S104, S105 is executed to train the decoding component according to the encoding result, the predicted sequence and the training tag sequence, so that the encoding component and the decoding component obtained by training serve as the punctuation prediction model.
Specifically, when executing S105 to train the decoding component according to the encoding result, the predicted sequence and the training tag sequence, an optional implementation is: taking the encoding result and the predicted sequence as the input of the decoding component to obtain the output of the decoding component; and adjusting the parameters in the decoding component, such as the weights corresponding to each decoder, by means of gradients according to the training tag sequence and the output of the decoding component, until the decoding component converges.
In this embodiment, a loss function can be calculated according to the training tag sequence and the output of the decoding component, and the parameters of the decoding component are then adjusted according to the calculated loss function until the decoding component converges, thereby completing the training of the decoding component.
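Putting the two passes together, one training step might be sketched as follows; model.encode, model.decode and the vocab maps are assumed interfaces (not the patent's actual API), and build_training_tags is the sketch given above:

```python
import torch
import torch.nn.functional as F

def train_step(model, optimizer, src_ids, gold_input_ids, gold_tags, vocab):
    with torch.no_grad():                              # pass 1 produces targets only
        memory = model.encode(src_ids)                 # encoding result
        logits = model.decode(gold_input_ids, memory)  # teacher-forced pass
        pred_ids = logits.argmax(dim=-1)               # stands in for the predicted sequence
    pred_tags = [vocab.id_to_token[i] for i in pred_ids[0].tolist()]
    train_tags = build_training_tags(pred_tags, gold_tags)
    target = torch.tensor([[vocab.token_to_id[t] for t in train_tags]])
    optimizer.zero_grad()
    logits2 = model.decode(pred_ids, memory)           # pass 2: prediction as input
    loss = F.cross_entropy(logits2.transpose(1, 2), target)
    loss.backward()                                    # adjust decoding-component parameters
    optimizer.step()
    return loss.item()
```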
That is, when training is performed on one punctuation-free text and its corresponding punctuation marking result in the training data, the decoding component in the Transformer model performs two passes of computation with different inputs, and the parameters in the decoding component are adjusted during the second pass, which improves the training precision of the decoding component, so that the punctuation prediction model built from the trained decoding component has higher prediction accuracy.
After the training of the decoding component is completed, the encoding component and the trained decoding component form the punctuation prediction model, which can output the predicted tag sequence corresponding to an input punctuation-free text, so that the punctuation in the unlabeled text can be labeled according to the predicted tag sequence.
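For illustration, applying a predicted tag sequence back to an unlabeled text can be sketched as follows (word-level tokens assumed; not the patent's code):

```python
def apply_punctuation(words, tags):
    """Merge a punctuation-free token list with a predicted tag sequence:
    each "O" consumes the next word, punctuation tags are emitted as-is."""
    out, it = [], iter(words)
    for t in tags:
        if t == "</s>":
            break
        out.append(next(it) if t == "O" else t)
    return out

# apply_punctuation(["hello", "do", "you", "need", "help"],
#                   ["O", ",", "O", "O", "O", "O", "?", "</s>"])
# -> ["hello", ",", "do", "you", "need", "help", "?"]
```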
According to the method, the decoding component in the Transformer model performs two passes of processing with the corresponding inputs during training, which reduces the mismatch between training and prediction in the original Transformer model and improves the punctuation prediction accuracy of the trained punctuation prediction model; and because the Encoder-Decoder structure of the original Transformer model is retained, the input and output of the punctuation prediction model are not limited in length, continuous punctuation prediction can be realized, and the punctuation prediction model has better extensibility.
Fig. 2 is a schematic diagram according to a second embodiment of the present application, showing the process of training the decoding component in the Transformer model. In fig. 2, "<s> hello, do you need help?" is the punctuation marking result corresponding to the punctuation-free text; "Encoder output" is the output of the encoding component; "OOOOO?</s>" is the predicted tag sequence; "<s> hello do you need help" is the predicted sequence obtained by scheduled sampling ("Schedule Sampling"); "O,OOO?</s>" is the training tag sequence; the remaining labels denote output embedding ("Output Embedding"), positional encoding ("Positional Encoding"), the masked multi-head attention layer ("Masked Multi-Head Attention"), residual connection and normalization ("Add & Norm"), the multi-head attention layer ("Multi-Head Attention"), the feed-forward layer ("Feed Forward"), linear transformation ("Linear"), and the softmax function ("Softmax").
Fig. 3 is a schematic diagram according to a third embodiment of the present application, showing the internal structure of the Transformer model used in this application. The Transformer model comprises an encoding component containing N encoders and a decoding component containing N decoders. Each encoder comprises two layers, namely a multi-head self-attention layer ("Multi-Head Attention") and a feed-forward layer ("Feed Forward"); each decoder comprises three layers, namely a masked multi-head attention layer ("Masked Multi-Head Attention"), a multi-head attention layer and a feed-forward layer. The remaining labels denote the decoder ("Decoder"), the encoder ("Encoder"), input embedding ("Input Embedding"), output embedding ("Output Embedding"), positional encoding ("Positional Encoding"), residual connection and normalization ("Add & Norm"), linear transformation ("Linear"), and the softmax function ("Softmax").
Fig. 4 is a schematic diagram according to a fourth embodiment of the present application. As shown in fig. 4, the method for establishing a punctuation prediction model of the present embodiment may specifically include the following steps:
s401, setting a first matrix corresponding to the masking self-attention layer in each decoder contained in the decoding component, and setting a second matrix corresponding to the self-attention layer in each decoder contained in the decoding component;
s402, completing self-attention calculation of the mask self-attention layer in each decoder by combining the first matrix, and completing self-attention calculation of the self-attention layer in each decoder by combining the second matrix.
That is, in this embodiment, the first matrix and the second matrix are set for the masked self-attention layer and the self-attention layer respectively, so that when self-attention is calculated with the set matrices, information too far ahead in the time sequence can be excluded from the information currently used for prediction; the information used for prediction is thus controlled within a limited time-sequence length, streaming decoding of the decoder is realized, punctuation prediction no longer needs to wait until the recognition transcription is finished, the latency of the model in punctuation prediction is reduced, and the efficiency of punctuation prediction by the model is improved.
In this embodiment, S401 may obtain the second matrix according to a set first parameter, where the first parameter is used to limit the amount of subsequent information the self-attention layer may use. For example, if the first parameter is L1, the self-attention layer uses at most the information of the L1 words after the current word for prediction.
In the prior art, self-attention is generally calculated with the following formula:

$$\mathrm{Attention}(Q,K,V)=\mathrm{softmax}\!\left(\frac{QK^{T}}{\sqrt{d_k}}\right)V$$

where Q, K and V represent the vectors input to the masked self-attention layer or the self-attention layer, and $d_k$ represents the dimension of the input vectors.
In this embodiment, the calculation of the masked self-attention layer and the self-attention layer in the decoder uses the formula:

$$\mathrm{Attention}(Q,K,V)=\mathrm{softmax}\!\left(\frac{QK^{T}}{\sqrt{d_k}}+M\right)V$$

where Q, K and V represent the vectors input to the masked self-attention layer or the self-attention layer, and $M$ (with entries $M_{ij}$) is either the first matrix or the second matrix.
The first matrix $M^{1}_{ij}$ used by the decoder in this embodiment is:

$$M^{1}_{ij}=\begin{cases}0, & j\le i\\ -\infty, & j>i\end{cases}$$

where i represents the position index of the current word, and j represents the position index of the word seen by the attention mechanism.
The second matrix $M^{2}_{ij}$ used by the decoder in this embodiment is:

$$M^{2}_{ij}=\begin{cases}0, & j\le i-p_i+L_1\\ -\infty, & \text{otherwise}\end{cases}$$

where i represents the position index of the current word; $L_1$ is the first parameter, indicating that the attention mechanism uses at most the information of the $L_1$ words after the current word for prediction; and $p_i$ represents the total number of punctuation marks before the current decoder input i.
In addition to using the self-attention calculation formula with the first matrix or the second matrix added to realize the calculation of the masked self-attention layer and the self-attention layer in the decoder, this embodiment can use the self-attention calculation formula with a third matrix added, obtained from a set second parameter, to realize the calculation of the self-attention layer in the encoder.
The third matrix $M^{3}_{ij}$ used by the encoder in this embodiment is:

$$M^{3}_{ij}=\begin{cases}0, & j\le i+L_2\\ -\infty, & \text{otherwise}\end{cases}$$

where i represents the position index of the current word; $L_2$ is the second parameter, indicating that the attention mechanism uses at most the information of the $L_2$ words after the current word for prediction; and j represents the position index of the word seen by the attention mechanism.
That is, this embodiment modifies, by means of the corresponding matrices, the calculation formula used by the self-attention layer in the encoder and the calculation formulas used by the masked self-attention layer and the self-attention layer in the decoder, so that each attention layer limits the amount of subsequent information used in self-attention calculation, thereby enabling the trained punctuation prediction model to perform streaming punctuation prediction.
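A minimal sketch of the three matrices, assuming PyTorch and the forms given above (the exact bound in the second matrix is reconstructed from the stated definitions of L1 and p_i, so it is an assumption):

```python
import torch

NEG_INF = float("-inf")

def first_matrix(n):
    """M1: causal mask for the masked self-attention layer (0 where j <= i)."""
    m = torch.zeros(n, n)
    m[torch.triu(torch.ones(n, n, dtype=torch.bool), diagonal=1)] = NEG_INF
    return m

def second_matrix(n_dec, n_enc, L1, punct_counts):
    """M2: decoder position i may see positions j <= i - p_i + L1, where
    punct_counts[i] = p_i is the number of punctuation marks before
    decoder input i (bound reconstructed, an assumption)."""
    m = torch.full((n_dec, n_enc), NEG_INF)
    for i in range(n_dec):
        limit = i - punct_counts[i] + L1
        m[i, : max(0, min(limit + 1, n_enc))] = 0.0
    return m

def third_matrix(n, L2):
    """M3: encoder position i may see positions j <= i + L2."""
    m = torch.full((n, n), NEG_INF)
    for i in range(n):
        m[i, : min(i + L2 + 1, n)] = 0.0
    return m

# Applied as: softmax(Q @ K.transpose(-2, -1) / sqrt(d_k) + M) @ V
```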
Fig. 5 is a schematic diagram according to a fifth embodiment of the present application. As shown in fig. 5, the apparatus for establishing a punctuation prediction model of this embodiment includes:
an obtaining unit 501, configured to obtain training data, where the training data includes a plurality of punctuation-free texts and punctuation marking results corresponding to the punctuation-free texts;
an encoding unit 502, configured to input the punctuation-free text into the encoding component in the Transformer model to obtain the encoding result output by the encoding component;
a decoding unit 503, configured to input the punctuation marking result corresponding to the punctuation-free text and the encoding result into the decoding component in the Transformer model to obtain the decoding result output by the decoding component;
a determining unit 504, configured to determine the predicted sequence corresponding to the decoding result, and obtain a training tag sequence according to the predicted sequence and the punctuation marking result corresponding to the punctuation-free text;
a training unit 505, configured to train the decoding component according to the encoding result, the predicted sequence and the training tag sequence, and to take the encoding component and the decoding component obtained by training as the punctuation prediction model.
The punctuation marking result corresponding to a punctuation-free text obtained by the obtaining unit 501 is the text containing punctuation obtained after labeling the punctuation in the punctuation-free text.
After the obtaining unit 501 acquires the training data including a plurality of punctuation-free texts and the punctuation marking results corresponding to the punctuation-free texts, the encoding unit 502, the decoding unit 503, the determining unit 504 and the training unit 505 may process each punctuation-free text and its corresponding punctuation marking result in the training data, so as to complete the training of the Transformer model according to the acquired training data and thereby obtain the punctuation prediction model.
The encoding unit 502 takes the punctuation-free text as the input of the encoding component in the Transformer model, and obtains the encoding result output by the encoding component. The encoding result obtained by the encoding unit 502 is a vector sequence corresponding to the punctuation-free text.
After the encoding unit 502 obtains the encoding result output by the encoding component, the decoding unit 503 takes the punctuation marking result corresponding to the punctuation-free text and the encoding result of the punctuation-free text as the input of the decoding component in the Transformer model, and obtains the decoding result output by the decoding component.
The decoding result obtained by the decoding unit 503 is the predicted tag sequence corresponding to the punctuation-free text, where the predicted tag sequence includes specific tags and punctuation, and each specific tag indicates that the corresponding character is not followed by punctuation.
It can be understood that, when inputting the punctuation marking result corresponding to the punctuation-free text into the decoding component, the decoding unit 503 takes the punctuation marking result shifted to the right by one position as the input of the decoding component; that is, "<s> hello, do you need help?" is input into the decoding component, where "<s>" is the start-of-sentence symbol.
After the decoding unit 503 obtains the decoding result output by the decoding component, the determining unit 504 determines the predicted sequence corresponding to the decoding result, and obtains a training tag sequence according to the predicted sequence and the punctuation marking result corresponding to the punctuation-free text, where the predicted sequence includes characters and punctuation.
Specifically, when the determining unit 504 obtains the training tag sequence according to the predicted sequence and the punctuation marking result corresponding to the punctuation-free text, an optional implementation is: comparing the determined predicted sequence with the punctuation marking result corresponding to the punctuation-free text to determine the prediction error type; and processing the decoding result by using a sampling method corresponding to the determined prediction error type to obtain the training tag sequence, where the obtained training tag sequence includes specific tags and punctuation, and each specific tag indicates that the corresponding character is not followed by punctuation.
It will be appreciated that the decoding result output by the decoding component in the decoding unit 503 may include the predicted sequence in addition to the predicted tag sequence; that is, while outputting the predicted tag sequence "O,OOOO?</s>", the decoding component may also output the predicted sequence "<s> hello, do you need help?</s>". If the decoding result output by the decoding component does not include the predicted sequence, the determining unit 504 may also obtain the predicted sequence corresponding to the decoding result from the characters in the punctuation-free text in combination with the position information.
The determining unit 504 processes the decoding result by using the sampling method corresponding to the determined prediction error type to obtain the training tag sequence as follows: in response to the determined prediction error type being punctuation deletion, replacing the specific tag at the position corresponding to the deleted punctuation in the decoding result with the deleted punctuation to obtain the training tag sequence; and in response to the determined prediction error type being punctuation insertion, replacing the punctuation added in the decoding result with a specific tag to obtain the training tag sequence.
That is, the determining unit 504 determines the prediction error type by comparing the predicted sequence with the punctuation marking result corresponding to the punctuation-free text, and then modifies the predicted tag sequence according to the determined prediction error type to obtain the training tag sequence, so that the training tag sequence used in the training process has the same length as the predicted sequence, thereby improving the prediction accuracy of the punctuation prediction model.
After determining the training tag sequence, the training unit 505 trains the decoding component according to the encoding result, the predicted sequence and the training tag sequence, so that the encoding component and the decoding component obtained by training are used as the punctuation prediction model.
Specifically, when the training unit 505 trains the decoding component according to the encoding result, the predicted sequence and the training tag sequence, an optional implementation is: taking the encoding result and the predicted sequence as the input of the decoding component to obtain the output of the decoding component; and adjusting the parameters in the decoding component, such as the weights corresponding to each decoder, by means of gradients according to the training tag sequence and the output of the decoding component, until the decoding component converges.
The training unit 505 may calculate a loss function according to the training tag sequence and the output of the decoding component, and further adjust parameters of the decoding component according to the calculated loss function until the decoding component converges, so as to complete training of the decoding component.
That is, when training is performed on one punctuation-free text and its corresponding punctuation marking result in the training data, the training unit 505 runs the decoding component in the Transformer model twice with different inputs, and adjusts the parameters in the decoding component during the second computation, which improves the training precision of the decoding component, so that the punctuation prediction model built from the trained decoding component has higher prediction accuracy.
After training the decoding component, the training unit 505 forms the punctuation prediction model from the encoding component and the decoding component obtained by training; the model can output the predicted tag sequence corresponding to an input punctuation-free text, so that the punctuation in the unlabeled text can be labeled according to the predicted tag sequence.
In addition, the present embodiment may further include a setting unit 506 configured to perform: setting a first matrix corresponding to the masking self-attention layer in each decoder included in the decoding component, and setting a second matrix corresponding to the self-attention layer in each decoder included in the decoding component; the self-attention calculation of the masking self-attention layer in each decoder is done in combination with the first matrix and the self-attention calculation of the self-attention layer in each decoder is done in combination with the second matrix.
That is, the setting unit 506 sets the first matrix and the second matrix for the masked self-attention layer and the self-attention layer respectively, so that when self-attention is calculated with the set matrices, information too far ahead in the time sequence can be excluded from the information currently used for prediction; the information used for prediction is thus controlled within a limited time-sequence length, streaming decoding of the decoder is realized, punctuation prediction no longer needs to wait until the recognition transcription is finished, the latency of the model in punctuation prediction is reduced, and the efficiency of punctuation prediction by the model is improved.
The setting unit 506 may obtain the second matrix according to a set first parameter, where the first parameter is used to limit the amount of subsequent information the self-attention layer may use.
In addition to the first matrix and the second matrix, the setting unit 506 can set a third matrix, obtained according to a set second parameter, to realize the calculation of the self-attention layer in the encoder.
According to embodiments of the present application, there is also provided an electronic device, a computer-readable storage medium, and a computer program product.
Fig. 6 is a block diagram of an electronic device for the method of establishing a punctuation prediction model based on a Transformer model according to an embodiment of the present application. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital assistants, cellular telephones, smartphones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions are meant to be exemplary only, and are not meant to limit implementations of the application described and/or claimed herein.
As shown in fig. 6, the electronic device includes: one or more processors 601, a memory 602, and interfaces for connecting the components, including high-speed interfaces and low-speed interfaces. The various components are interconnected using different buses and may be mounted on a common motherboard or in other manners as desired. The processor may process instructions executed within the electronic device, including instructions stored in or on the memory to display graphical information of a GUI on an external input/output device, such as a display device coupled to the interface. In other embodiments, multiple processors and/or multiple buses may be used, if desired, along with multiple memories. Also, multiple electronic devices may be connected, each providing a portion of the necessary operations (e.g., as a server array, a set of blade servers, or a multiprocessor system). One processor 601 is illustrated in fig. 6.
The memory 602 is a non-transitory computer-readable storage medium provided herein. The memory stores instructions executable by the at least one processor to cause the at least one processor to perform the method for establishing a punctuation prediction model based on a Transformer model provided herein. The non-transitory computer-readable storage medium of the present application stores computer instructions for causing a computer to perform the method for establishing a punctuation prediction model based on a Transformer model provided by the present application.
The memory 602, as a non-transitory computer-readable storage medium, may be used to store non-transitory software programs, non-transitory computer-executable programs, and modules, such as the program instructions/modules corresponding to the method for establishing a punctuation prediction model in the embodiments of the present application (e.g., the acquisition unit 501, the encoding unit 502, the decoding unit 503, the determining unit 504, the training unit 505, and the setting unit 506 shown in fig. 5). The processor 601 executes various functional applications of the server and data processing by running the non-transitory software programs, instructions and modules stored in the memory 602, i.e., implements the method of establishing a punctuation prediction model based on a Transformer model in the above method embodiment.
The memory 602 may include a storage program area and a storage data area, wherein the storage program area may store an operating system and at least one application program required for a function, and the storage data area may store data created according to the use of the electronic device, etc. In addition, the memory 602 may include high-speed random access memory, and may also include non-transitory memory, such as at least one magnetic disk storage device, a flash memory device, or another non-transitory solid-state storage device. In some embodiments, the memory 602 optionally includes memory remotely located with respect to the processor 601, which may be connected via a network to the electronic device of the method of establishing a punctuation prediction model based on the Transformer model. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The electronic device of the method for establishing a punctuation prediction model based on a Transformer model may further include: an input device 603 and an output device 604. The processor 601, the memory 602, the input device 603 and the output device 604 may be connected by a bus or otherwise, as exemplified by the bus connection in fig. 6.
The input device 603 may receive input numeric or character information and generate key signal inputs related to user settings and function control of the electronic device for the method of establishing a punctuation prediction model based on a Transformer model, and may be, for example, a touch screen, a keypad, a mouse, a track pad, a touch pad, a pointing stick, one or more mouse buttons, a track ball, a joystick, or the like. The output device 604 may include a display device, auxiliary lighting devices (e.g., LEDs), tactile feedback devices (e.g., vibration motors), and the like. The display device may include, but is not limited to, a liquid crystal display (LCD), a light-emitting diode (LED) display, and a plasma display. In some implementations, the display device may be a touch screen.
Various implementations of the systems and techniques described here can be realized in digital electronic circuitry, integrated circuitry, application-specific integrated circuits (ASICs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include: implementation in one or more computer programs that may be executed and/or interpreted on a programmable system including at least one programmable processor, which may be a special-purpose or general-purpose programmable processor, that may receive data and instructions from, and transmit data and instructions to, a storage system, at least one input device, and at least one output device.
These computing programs (also referred to as programs, software applications, or code) include machine instructions for a programmable processor, and may be implemented in a high-level procedural and/or object-oriented programming language, and/or in assembly/machine language. As used herein, the terms "machine-readable medium" and "computer-readable medium" refer to any computer program product, apparatus, and/or device (e.g., magnetic discs, optical disks, memory, programmable Logic Devices (PLDs)) used to provide machine instructions and/or data to a programmable processor, including a machine-readable medium that receives machine instructions as a machine-readable signal. The term "machine-readable signal" refers to any signal used to provide machine instructions and/or data to a programmable processor.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and pointing device (e.g., a mouse or trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user may be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic input, speech input, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a background component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such background, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: local Area Networks (LANs), wide Area Networks (WANs), and the internet.
The computer system may include a client and a server. The client and server are typically remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server can be a cloud server, also called a cloud computing server or a cloud host, which is a host product in the cloud computing service system and overcomes the defects of high management difficulty and weak service expansibility in traditional physical hosts and VPS ("Virtual Private Server") services.
According to the technical scheme of the embodiments of the application, the decoding component in the Transformer model performs two passes of processing with the corresponding inputs during training, which reduces the mismatch between training and prediction in the original Transformer model and improves the punctuation prediction accuracy of the trained punctuation prediction model; and because the Encoder-Decoder structure of the original Transformer model is retained, the input and output of the punctuation prediction model are not limited in length, continuous punctuation prediction can be realized, and the punctuation prediction model has better extensibility.
It should be appreciated that various forms of the flows shown above may be used to reorder, add, or delete steps. For example, the steps described in the present application may be performed in parallel, sequentially, or in a different order, provided that the desired results of the technical solutions disclosed in the present application can be achieved, and are not limited herein.
The above embodiments do not limit the scope of the application. It will be apparent to those skilled in the art that various modifications, combinations, sub-combinations and alternatives are possible, depending on design requirements and other factors. Any modifications, equivalent substitutions and improvements made within the spirit and principles of the present application are intended to be included within the scope of the present application.

Claims (12)

1. A method of building a punctuation predictive model, comprising:
acquiring training data, wherein the training data comprises a plurality of punctuation-free texts and punctuation marking results corresponding to the punctuation-free texts;
inputting the punctuation-free text into a coding component in a Transformer model to obtain a coding result output by the coding component;
inputting a punctuation marking result corresponding to the punctuation-free text and the coding result into a decoding component in the Transformer model to obtain a decoding result output by the decoding component, wherein the decoding result is a predicted tag sequence corresponding to the punctuation-free text, the predicted tag sequence comprises specific tags and punctuation, and each specific tag indicates that the corresponding character is not followed by punctuation;
determining a predicted sequence corresponding to the decoding result, and obtaining a training tag sequence according to the predicted sequence and the punctuation marking result corresponding to the punctuation-free text, wherein the predicted sequence comprises characters and punctuation, the training tag sequence comprises specific tags and punctuation, and each specific tag indicates that the corresponding character is not followed by punctuation;
training the decoding component according to the coding result, the prediction sequence and the training label sequence, and taking the coding component and the decoding component obtained by training as punctuation prediction models;
the obtaining a training label sequence according to the punctuation marking result of the prediction sequence and the punctuation-free text comprises the following steps:
comparing the prediction sequence with a punctuation marking result corresponding to the punctuation-free text to determine a prediction error type;
processing the decoding result by using a sampling method corresponding to the prediction error type to obtain the training tag sequence;
and wherein the training the decoding component according to the coding result, the prediction sequence and the training tag sequence comprises:
taking the coding result and the prediction sequence as the input of the decoding component to obtain the output of the decoding component;
and adjusting parameters in the decoding component according to the training tag sequence and the output of the decoding component until the decoding component converges.
2. The method of claim 1, wherein the processing the decoding result using the sampling method corresponding to the prediction error type to obtain the training tag sequence comprises:
and in response to the prediction error type being punctuation deletion, replacing the specific tag at the position corresponding to the deleted punctuation in the decoding result with the deleted punctuation, so as to obtain the training tag sequence.
3. The method of claim 1, wherein the processing the decoding result using the sampling method corresponding to the prediction error type to obtain the training tag sequence comprises:
and in response to the prediction error type being punctuation insertion, replacing the inserted punctuation in the decoding result with a specific tag, so as to obtain the training tag sequence.
4. The method of claim 1, further comprising,
setting a first matrix corresponding to a masking self-attention layer in each decoder included in the decoding component, and setting a second matrix corresponding to a self-attention layer in each decoder included in the decoding component;
and completing the self-attention calculation of the masking self-attention layer in each decoder in combination with the first matrix, and the self-attention calculation of the self-attention layer in each decoder in combination with the second matrix.
5. The method of claim 4, further comprising,
setting a third matrix corresponding to a self-attention layer in each encoder contained in the coding component;
and completing the self-attention calculation of the self-attention layer in each encoder in combination with the third matrix.
6. An apparatus for building a punctuation prediction model, comprising:
an acquisition unit configured to acquire training data, wherein the training data comprises a plurality of punctuation-free texts and punctuation marking results corresponding to the punctuation-free texts;
a coding unit configured to input the punctuation-free text into a coding component in a Transformer model to obtain a coding result output by the coding component;
a decoding unit configured to input a punctuation marking result corresponding to the punctuation-free text and the coding result into a decoding component in the Transformer model to obtain a decoding result output by the decoding component, wherein the decoding result is a predicted tag sequence corresponding to the punctuation-free text, the predicted tag sequence comprises specific tags and punctuation, and each specific tag indicates that its corresponding character is not followed by punctuation;
a determining unit configured to determine a prediction sequence corresponding to the decoding result, and to obtain a training tag sequence according to the prediction sequence and the punctuation marking result corresponding to the punctuation-free text, wherein the prediction sequence comprises characters and punctuation, the training tag sequence comprises specific tags and punctuation, and each specific tag indicates that its corresponding character is not followed by punctuation;
a training unit configured to train the decoding component according to the coding result, the prediction sequence and the training tag sequence, and to take the coding component and the decoding component obtained by training as the punctuation prediction model;
wherein, when obtaining the training tag sequence according to the prediction sequence and the punctuation marking result corresponding to the punctuation-free text, the determining unit specifically performs:
comparing the prediction sequence with a punctuation marking result corresponding to the punctuation-free text to determine a prediction error type;
processing the decoding result by using a sampling method corresponding to the prediction error type to obtain the training tag sequence;
and wherein, when training the decoding component according to the coding result, the prediction sequence and the training tag sequence, the training unit specifically performs:
taking the coding result and the prediction sequence as the input of the decoding component to obtain the output of the decoding component;
and adjusting parameters in the decoding component according to the training tag sequence and the output of the decoding component until the decoding component converges.
7. The apparatus of claim 6, wherein the determining unit, when processing the decoding result by using a sampling method corresponding to the prediction error type, specifically performs:
and in response to the prediction error type being punctuation deletion, replacing the specific tag at the position corresponding to the deleted punctuation in the decoding result with the deleted punctuation, so as to obtain the training tag sequence.
8. The apparatus of claim 6, wherein the determining unit, when processing the decoding result by using a sampling method corresponding to the prediction error type, specifically performs:
and in response to the prediction error type being punctuation insertion, replacing the inserted punctuation in the decoding result with a specific tag, so as to obtain the training tag sequence.
9. The apparatus of claim 6, further comprising a setting unit configured to:
set a first matrix corresponding to the masking self-attention layer in each decoder included in the decoding component, and set a second matrix corresponding to the self-attention layer in each decoder included in the decoding component;
wherein the self-attention calculation of the masking self-attention layer in each decoder is completed in combination with the first matrix, and the self-attention calculation of the self-attention layer in each decoder is completed in combination with the second matrix.
10. The apparatus of claim 9, wherein the setting unit is further configured to:
set a third matrix corresponding to the self-attention layer in each encoder included in the coding component;
wherein the self-attention calculation of the self-attention layer in each encoder is completed in combination with the third matrix.
11. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-5.
12. A non-transitory computer readable storage medium storing computer instructions for causing the computer to perform the method of any one of claims 1-5.
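Claims 4 and 5 (and their apparatus counterparts, claims 9 and 10) describe completing each self-attention calculation "in combination with" a preset matrix. A standard way to combine such a matrix is as an additive bias on the attention scores before the softmax; a minimal sketch follows. The concrete matrix contents (a causal matrix for the masking self-attention layer, an all-zeros pass-through matrix for the remaining layers) are assumptions for illustration, since the claims do not fix the matrix values.

import math
import torch

def self_attention(q, k, v, matrix):
    # the preset matrix is added to the scaled scores, so -inf entries forbid
    # attention between those positions and zero entries leave them unchanged
    scores = q @ k.transpose(-2, -1) / math.sqrt(q.size(-1)) + matrix
    return torch.softmax(scores, dim=-1) @ v

seq_len, dim = 8, 32
# first matrix (masking self-attention layer in each decoder): causal, so each
# position may only attend to itself and earlier positions
first_matrix = torch.triu(torch.full((seq_len, seq_len), float("-inf")), diagonal=1)
# second/third matrix (remaining self-attention layers): unrestricted here
second_matrix = torch.zeros(seq_len, seq_len)

x = torch.randn(seq_len, dim)
masked_out = self_attention(x, x, x, first_matrix)
plain_out = self_attention(x, x, x, second_matrix)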
CN202011000536.5A 2020-09-22 2020-09-22 Method and device for establishing punctuation prediction model Active CN112148856B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011000536.5A CN112148856B (en) 2020-09-22 2020-09-22 Method and device for establishing punctuation prediction model

Publications (2)

Publication Number Publication Date
CN112148856A (en) 2020-12-29
CN112148856B (en) 2024-01-23

Family

ID=73892642

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011000536.5A Method and device for establishing punctuation prediction model 2020-09-22 2020-09-22 (Active)

Country Status (1)

Country Link
CN (1) CN112148856B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113888601B * 2021-10-26 2022-05-24 Beijing Yihang Yuanzhi Technology Co., Ltd. Target trajectory prediction method, electronic device, and storage medium
CN114004996A * 2021-10-29 2022-02-01 Beijing Baidu Netcom Science and Technology Co., Ltd. Abnormal sound detection method, abnormal sound detection device, electronic equipment and medium
CN114528850B * 2022-02-16 2023-08-04 Mashang Consumer Finance Co., Ltd. Punctuation prediction model training method, punctuation adding method and punctuation adding device
JP7489501B1 2023-02-02 2024-05-23 NTT Comware Corporation Text generation device, text generation method, and program

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2001083987A * 1999-08-30 2001-03-30 International Business Machines Corp (IBM) Mark insertion device and its method
WO2020146873A1 * 2019-01-11 2020-07-16 Applications Technology (AppTek), LLC System and method for direct speech translation system
CN110674629A * 2019-09-27 2020-01-10 Shanghai Zhizhen Intelligent Network Technology Co., Ltd. Punctuation mark model and its training method, equipment and storage medium
CN111222317A * 2019-10-16 2020-06-02 Ping An Technology (Shenzhen) Co., Ltd. Sequence labeling method, system and computer equipment
CN110852040A * 2019-11-05 2020-02-28 CETC Big Data Research Institute Co., Ltd. Punctuation prediction model training method and text punctuation determination method
CN111444311A * 2020-02-26 2020-07-24 Ping An Technology (Shenzhen) Co., Ltd. Semantic understanding model training method and device, computer equipment and storage medium
CN111428479A * 2020-03-23 2020-07-17 Beijing Mininglamp Software System Co., Ltd. Method and device for predicting punctuation in text

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
"Controllable Time-Delay Transformer for Real-Time Punctuation Prediction and Disfluency Detection";Qian Chen 等;《IEEE》;全文 *
基于Transformer增强架构的中文语法纠错方法;王辰成;杨麟儿;王莹莹;杜永萍;杨尔弘;;中文信息学报(06);全文 *
基于改进的多层BLSTM的中文分词和标点预测;李雅昆;潘晴;Everett X.WANG;;计算机应用(05);全文 *

Similar Documents

Publication Publication Date Title
CN112148856B (en) Method and device for establishing punctuation prediction model
CN112329465B (en) Named entity recognition method, named entity recognition device and computer readable storage medium
CN111709248B (en) Training method and device for text generation model and electronic equipment
CN111241832B (en) Core entity labeling method and device and electronic equipment
CN111950292B (en) Training method of text error correction model, text error correction processing method and device
CN111274764B (en) Language generation method and device, computer equipment and storage medium
CN111539227B (en) Method, apparatus, device and computer storage medium for training semantic representation model
US20220028376A1 (en) Method for semantic recognition, electronic device, and storage medium
EP3879427A2 (en) Information extraction method, extraction model training method, apparatus and electronic device
CN112633017B (en) Translation model training method, translation processing method, translation model training device, translation processing equipment and storage medium
CN110797005B (en) Prosody prediction method, apparatus, device, and medium
CN110807331B (en) Polyphone pronunciation prediction method and device and electronic equipment
CN111143561A (en) Intention recognition model training method and device and electronic equipment
CN112001169B (en) Text error correction method and device, electronic equipment and readable storage medium
CN112489637A (en) Speech recognition method and device
CN112528669B (en) Training method and device for multilingual model, electronic equipment and readable storage medium
CN113160822B (en) Speech recognition processing method, device, electronic equipment and storage medium
CN111709234A (en) Training method and device of text processing model and electronic equipment
CN112507101A (en) Method and device for establishing pre-training language model
US11704326B2 (en) Generalization processing method, apparatus, device and computer storage medium
KR20210129605A (en) Text key information extracting method, apparatus, electronic device and storage medium
CN111079449B (en) Method and device for acquiring parallel corpus data, electronic equipment and storage medium
CN112397050B (en) Prosody prediction method, training device, electronic equipment and medium
CN112232089B (en) Pre-training method, device and storage medium of semantic representation model
CN111339314A (en) Method and device for generating triple-group data and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant