CN111832248A - Text normalization method and device, electronic equipment and storage medium - Google Patents


Info

Publication number
CN111832248A
CN111832248A (application CN202010731385.4A)
Authority
CN
China
Prior art keywords
text
participle
structured
sample
normalization
Prior art date
Legal status: Pending (assumed status, not a legal conclusion)
Application number
CN202010731385.4A
Other languages
Chinese (zh)
Inventor
戚婷 (Qi Ting)
万根顺 (Wan Genshun)
高建清 (Gao Jianqing)
王智国 (Wang Zhiguo)
胡国平 (Hu Guoping)
Current Assignee (the listed assignees may be inaccurate)
University of Science and Technology of China USTC
iFlytek Co Ltd
Original Assignee
iFlytek Co Ltd
Priority date (assumed, not a legal conclusion)
Filing date
Publication date
Application filed by iFlytek Co Ltd
Priority application: CN202010731385.4A
Publication: CN111832248A

Classifications

    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06F — ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 — Handling natural language data
    • G06F40/103 — Formatting, i.e. changing of presentation of documents
    • G06F40/253 — Grammatical analysis; Style critique
    • G06F40/279 — Recognition of textual entities
    • G06F40/289 — Phrasal analysis, e.g. finite state techniques or chunking
    • G06F40/30 — Semantic analysis

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Machine Translation (AREA)

Abstract

An embodiment of the invention provides a text normalization method and apparatus, an electronic device and a storage medium. The method comprises: determining a text to be normalized; and inputting the text to be normalized into a text normalization model to obtain the corresponding normalized text output by the model. The text normalization model is trained on sample texts to be normalized, the corresponding sample normalized texts, and the sample edit type of each word segment in the sample texts to be normalized. The model determines the edit type of each word segment in the text to be normalized, determines the normalization mode based on whether the text contains insertion word segments whose edit type is an insertion type, and normalizes the text accordingly. The method, apparatus, electronic device and storage medium provided by the embodiments of the invention improve both the accuracy and the efficiency of text normalization.

Description

Text normalization method and device, electronic equipment and storage medium
Technical Field
The invention relates to the technical field of natural language processing, in particular to a text normalization method, a text normalization device, electronic equipment and a storage medium.
Background
Text normalization refers to deleting words without substantial meaning, repeated descriptions with similar semantics, and spoken-language or filler words irrelevant to the theme, and to adjusting the word order of the original text, while keeping the text semantics essentially unchanged, so that the normalized text is more written, neat and concise.
Current text normalization methods typically convert text into normalized text with an encoder-decoder model. However, the encoder-decoder model is difficult to train, yields a poor normalization effect, requires heavy computation, and is inefficient.
Disclosure of Invention
Embodiments of the invention provide a text normalization method and apparatus, an electronic device and a storage medium, to remedy the poor accuracy and low efficiency of existing text normalization methods.
An embodiment of the present invention provides a text normalization method, including:
determining a text to be normalized;
inputting the text to be normalized into a text normalization model to obtain the corresponding normalized text output by the text normalization model;
wherein the text normalization model is trained on a sample text to be normalized, a sample normalized text, and the sample edit type of each word segment in the sample text to be normalized;
and the text normalization model is used to determine the edit type of each word segment in the text to be normalized, determine the normalization mode of the text based on whether it contains insertion word segments whose edit type is an insertion type, and normalize the text according to that mode.
According to the text normalization method of an embodiment of the invention, inputting the text to be normalized into a text normalization model to obtain the corresponding normalized text output by the model specifically comprises:
inputting the text to be normalized into an edit type determination layer of the text normalization model to obtain the edit type of each word segment in the text to be normalized output by that layer;
inputting each non-insertion word segment, whose edit type is a non-insertion type, together with its edit type into a text editing layer of the text normalization model to obtain a candidate normalized text output by the text editing layer;
if the text to be normalized contains insertion word segments whose edit type is an insertion type, inputting each insertion word segment and the candidate normalized text into an insertion-segment normalization layer of the text normalization model to obtain the normalization result of each insertion word segment output by that layer;
and inputting the candidate normalized text, or the candidate normalized text together with the normalization result of each insertion word segment, into a text arrangement layer of the text normalization model to obtain the normalized text output by the text arrangement layer.
According to the text normalization method of an embodiment of the invention, inputting the text to be normalized into the edit type determination layer to obtain the edit type of each word segment output by that layer specifically comprises:
inputting the text to be normalized into a semantic feature extraction layer of the edit type determination layer to obtain the semantic features of each word segment in the text to be normalized output by the semantic feature extraction layer;
and inputting the semantic features of each word segment into a sequence labeling layer of the edit type determination layer to obtain the edit type of each word segment output by the sequence labeling layer.
According to the text normalization method of an embodiment of the invention, inputting each insertion word segment and the candidate normalized text into the insertion-segment normalization layer to obtain the normalization result of each insertion word segment output by that layer specifically comprises:
inputting the semantic features of any insertion word segment and the text vector of the candidate normalized text into the insertion-segment normalization layer to obtain the normalization result of that insertion word segment output by the layer.
According to the text normalization method of an embodiment of the invention, inputting the semantic features of any insertion word segment and the text vector of the candidate normalized text into the insertion-segment normalization layer specifically comprises:
if the edit type of the insertion word segment is insert-after-keep, inputting its semantic features and text vector, together with the text vectors of the word segments in the candidate normalized text preceding its corresponding position, into the insertion-segment normalization layer to obtain its normalization result;
and if the edit type of the insertion word segment is insert-after-delete, inputting its semantic features, together with the text vectors of the word segments in the candidate normalized text preceding its corresponding position, into the insertion-segment normalization layer to obtain its normalization result.
According to the text normalization method of an embodiment of the invention, the sample edit type of each word segment in the sample text to be normalized is obtained as follows:
aligning the sample text to be normalized with the sample normalized text to obtain a sample-aligned text to be normalized and a sample-aligned normalized text;
and comparing the sample-aligned text to be normalized with the sample-aligned normalized text to obtain the sample edit type of each word segment in the sample text to be normalized.
According to the text normalization method of an embodiment of the invention, this comparison specifically comprises:
comparing the sample-aligned text to be normalized with the sample-aligned normalized text to obtain the sample edit type of each word segment in the sample-aligned text to be normalized;
and merging the sample edit type of each blank word segment in the sample-aligned text to be normalized, whose sample edit type is an insertion, with that of the word segment preceding the blank segment, to obtain the sample edit type of each word segment in the sample text to be normalized.
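The align-compare-merge procedure above can be sketched with a standard sequence alignment. The sketch below is illustrative only, since the patent does not disclose a concrete algorithm: the tag names (KEEP, DELETE, INSERT_AFTER_*) and the use of Python's difflib are assumptions, and insertions at the very start of the text are ignored for brevity.

```python
from difflib import SequenceMatcher

def edit_types(source_tokens, target_tokens):
    """Label each source token with an edit type by aligning source to target.

    Insertions are merged into the preceding token as a composite
    "insert-after-*" tag, mirroring the blank-segment merge step above.
    """
    tags = []
    sm = SequenceMatcher(a=source_tokens, b=target_tokens, autojunk=False)
    for op, i1, i2, j1, j2 in sm.get_opcodes():
        if op == "equal":
            tags.extend(["KEEP"] * (i2 - i1))
        elif op == "delete":
            tags.extend(["DELETE"] * (i2 - i1))
        elif op == "replace":
            # treat as delete + insert: the last deleted token carries the insertion
            tags.extend(["DELETE"] * (i2 - i1 - 1))
            tags.append("INSERT_AFTER_DELETE")
        elif op == "insert":
            # pure insertion: attach it to the previous source token
            if tags and tags[-1] == "KEEP":
                tags[-1] = "INSERT_AFTER_KEEP"
            elif tags:
                tags[-1] = "INSERT_AFTER_DELETE"
    return tags

src = ["um", "it", "is", "useless"]
tgt = ["it", "is", "useless"]
print(edit_types(src, tgt))  # ['DELETE', 'KEEP', 'KEEP', 'KEEP']
```

A composite tag such as INSERT_AFTER_KEEP marks a kept token after which new words must be generated, which is exactly the information the insertion-segment normalization layer consumes at inference time.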
An embodiment of the invention further provides a text normalization apparatus, comprising:
a text determination unit for determining a text to be normalized; and
a text normalization unit for inputting the text to be normalized into a text normalization model to obtain the corresponding normalized text output by the model;
wherein the text normalization model is trained on a sample text to be normalized, a sample normalized text, and the sample edit type of each word segment in the sample text to be normalized;
and the text normalization model is used to determine the edit type of each word segment in the text to be normalized, determine the normalization mode based on whether the text contains insertion word segments whose edit type is an insertion type, and normalize the text according to that mode.
An embodiment of the invention further provides an electronic device comprising a memory, a processor, and a computer program stored in the memory and runnable on the processor, wherein the processor, when executing the program, implements the steps of any of the above text normalization methods.
An embodiment of the invention further provides a non-transitory computer-readable storage medium storing a computer program which, when executed by a processor, implements the steps of any of the above text normalization methods.
With the text normalization method and apparatus, electronic device and storage medium provided by the embodiments of the invention, the edit type of each word segment in the text to be normalized is determined, and the normalization mode is chosen based on whether the text contains insertion word segments whose edit type is an insertion type; normalizing the text according to this mode improves both the accuracy and the efficiency of text normalization.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and those skilled in the art can also obtain other drawings according to the drawings without creative efforts.
Fig. 1 is a schematic flow chart of a text normalization method according to an embodiment of the present invention;
Fig. 2 is a schematic flow chart of the operating method of the text normalization model according to an embodiment of the present invention;
Fig. 3 is a schematic flow chart of the operating method of the edit type determination layer according to an embodiment of the present invention;
Fig. 4 is a schematic flow chart of the sample edit type obtaining method according to an embodiment of the present invention;
Fig. 5 is a schematic structural diagram of the text normalization model according to an embodiment of the present invention;
Fig. 6 is a schematic structural diagram of a text normalization apparatus according to an embodiment of the present invention;
Fig. 7 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
With the rapid development of internet technology, people are exposed to massive amounts of text every day. Such text usually contains a great deal of noise: words without substantial meaning, repeated descriptions with similar semantics, and so on. This is especially true of text obtained by speech recognition, where disfluent speech and everyday spoken style produce many repetitions and highly colloquial expressions. Because of this noise, the text reads loosely and cannot be used directly on formal occasions; for example, the text obtained by speech recognition of a reporter's interview recording usually contains large amounts of interjections, backchannel words, pet phrases and repeated descriptions, and cannot be edited directly into a publishable draft.
To remove the noise and make the text more complete and written, text normalization is required. Current text normalization methods typically convert text into normalized text with an encoder-decoder model. The encoder is usually a multi-layer long short-term memory (LSTM) recurrent neural network or a Transformer comprising an input layer, hidden layers and an output layer; the output of the encoder is passed through an attention transformation and fed into the decoder, which is usually also an LSTM or a Transformer, and the decoder then outputs the normalized target sentence word by word.
However, during training the encoder-decoder model must learn the mapping between the input sample text to be normalized and the target sample normalized text, which is hard to learn and leads to poor performance of the trained model in practice. In addition, attention interaction with the encoder output is needed at every decoding step, so the decoder cannot run in parallel and can only decode and output word by word; this demands substantial computing power, is time-consuming, and is inefficient.
In view of this, an embodiment of the invention provides a text normalization method. Fig. 1 is a schematic flow chart of the method; as shown in Fig. 1, it comprises:
Step 110, determining a text to be normalized.
Specifically, the text to be normalized is a text on which text normalization needs to be performed. It may be text directly input by a user or collected from a network, or text obtained by speech recognition of voice data input by a user; the embodiment of the invention does not specifically limit this.
Step 120, inputting the text to be normalized into the text normalization model to obtain the corresponding normalized text output by the model;
wherein the text normalization model is trained on a sample text to be normalized, a sample normalized text, and the sample edit type of each word segment in the sample text to be normalized;
and the text normalization model is used to determine the edit type of each word segment in the text to be normalized, determine the normalization mode based on whether the text contains insertion word segments whose edit type is an insertion type, and normalize the text according to that mode.
Specifically, the text normalization model identifies the edit type of each word segment in the input text to be normalized. The edit type of a word segment indicates what kind of edit must be applied to it during normalization so that the edited result is written, neat and concise. Edit types may include delete, keep, insert and invert (word-order adjustment), as well as composite types such as insert-after-keep and insert-after-delete; the invention does not specifically limit them.
The text to be normalized may contain non-insertion word segments, whose edit type is a non-insertion type, and insertion word segments, whose edit type is an insertion type. Insertion types include insert-after-keep and insert-after-delete; non-insertion types are all other edit types, such as delete, keep and invert.
Insertion and non-insertion word segments are normalized differently. A non-insertion word segment whose edit type is delete can simply be deleted; one whose edit type is keep is left unchanged; and one whose edit type is invert has its word order adjusted together with the other word segments involved in the same inversion. For an insertion word segment, besides normalizing the segment itself, the new word or words to be inserted at its position must also be determined.
Therefore, after the text normalization model obtains the edit type of each word segment in the text to be normalized, it can check whether the text contains insertion word segments, choose the corresponding normalization mode accordingly, and normalize the text in that mode. If there are no insertion word segments, the normalization mode may simply apply the edits to the non-insertion word segments and output the result as the normalized text. If there are insertion word segments, the normalization mode must, in addition to editing the non-insertion word segments, also normalize the insertion word segments; in particular, it must predict the word or word sequence to be inserted at each insertion position in order to obtain the normalized text.
Before step 120 is executed, the text normalization model may be obtained by pre-training, specifically as follows: first, a large number of sample texts to be normalized are collected, and the sample edit type of each word segment in each sample text, together with the corresponding sample normalized text, is determined; then an initial model is trained on the sample texts to be normalized, the sample normalized texts, and the sample edit types, yielding the text normalization model.
In the embodiment of the invention, the text normalization model exploits the fact that normalization only modifies local positions of a text, and recasts the normalization problem as a text sequence editing problem. Instead of taking the sample normalized text directly as the learning target and learning the mapping from the sample text to be normalized to the sample normalized text, the model learns to identify the edit type of each word segment in the sample text to be normalized. Each word segment can then be adjusted locally according to its edit type, which lowers the learning difficulty of the model and improves the accuracy of text normalization.
In addition, the model divides the word segments of the text to be normalized into non-insertion and insertion segments and treats them differently depending on whether insertion segments are present. Non-insertion segments can be normalized directly according to their edit types, without the decoder used by the traditional method, which raises the operating efficiency of the model and thus the efficiency of text normalization. For example, suppose the text to be normalized is "um it is useless" and the corresponding normalized text is "it is useless": once the model labels the edit types of the four word segments as delete, keep, keep, keep, the first segment can simply be deleted without any subsequent costly decoding computation.
With the method provided by the embodiment of the invention, the edit type of each word segment in the text to be normalized is determined, the normalization mode is chosen based on whether the text contains insertion word segments whose edit type is an insertion type, and the text is normalized in that mode, improving both the accuracy and the efficiency of text normalization.
Based on the foregoing embodiment, Fig. 2 is a schematic flow chart of the operation of the text normalization model; as shown in Fig. 2, step 120 specifically comprises:
Step 121, inputting the text to be normalized into the edit type determination layer of the text normalization model to obtain the edit type of each word segment in the text to be normalized output by that layer.
Specifically, the edit type determination layer determines the edit type of each word segment from the text to be normalized. It may be built on a sequence labeling model, such as a conditional random field (CRF) model, or a combination of a CRF with a recurrent neural network (RNN) or one of its variants; the embodiment of the invention does not specifically limit this.
Step 122, inputting each non-insertion word segment, whose edit type is a non-insertion type, together with its edit type into the text editing layer of the text normalization model to obtain the candidate normalized text output by that layer.
Specifically, the non-insertion word segments are filtered out of the text to be normalized and input, with their edit types, into the text editing layer, which edits them according to those types to obtain the candidate normalized text. For example, a non-insertion segment whose edit type is delete can be deleted directly; one whose edit type is keep needs no processing; and one whose edit type is invert has its word order adjusted together with the other segments involved in the same inversion.
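The deterministic editing of step 122 can be sketched as follows. This is a hypothetical illustration with assumed tag names: inversion edits are omitted for brevity, and insertion positions are merely recorded as slots for the insertion-segment normalization layer to fill later.

```python
def apply_edit_types(tokens, tags):
    """Sketch of the text editing layer: build the candidate normalized text
    from the non-insertion edits alone. Composite insert-after-* tags keep
    (or drop) the token itself and leave a slot index where the
    insertion-segment normalization layer will later predict new words."""
    out, slots = [], []
    for tok, tag in zip(tokens, tags):
        if tag == "KEEP":
            out.append(tok)
        elif tag == "DELETE":
            pass  # dropped outright, no decoding needed
        elif tag == "INSERT_AFTER_KEEP":
            out.append(tok)
            slots.append(len(out))  # insertion point after the kept token
        elif tag == "INSERT_AFTER_DELETE":
            slots.append(len(out))  # token dropped, insertion point remains

    return out, slots

tokens = ["um", "it", "is", "useless"]
tags = ["DELETE", "KEEP", "KEEP", "KEEP"]
print(apply_edit_types(tokens, tags))  # (['it', 'is', 'useless'], [])
```

Note that every token is handled by a constant-time rule, which is why this path needs no autoregressive decoder.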
Step 123, if the text to be normalized contains insertion word segments whose edit type is an insertion type, inputting each insertion word segment and the candidate normalized text into the insertion-segment normalization layer of the text normalization model to obtain the normalization result of each insertion word segment output by that layer.
Specifically, an insertion word segment indicates that a new word absent from the original text must be inserted at its position. Each insertion word segment and the candidate normalized text are therefore input into the insertion-segment normalization layer, which predicts the word or words to be inserted at each insertion position from the insertion segment and the candidate normalized text, producing the normalization result of each insertion segment; the result of an insertion segment comprises the word(s) to be inserted at its position. The insertion-segment normalization layer may be built on a single-layer decoder, such as a single-layer LSTM (long short-term memory) model or a single-layer Transformer, to reduce the number of model parameters and the structural complexity and further raise the operating efficiency of the text normalization model.
Step 124, inputting the candidate normalized text, or the candidate normalized text together with the normalization result of each insertion word segment, into the text arrangement layer of the text normalization model to obtain the normalized text output by that layer.
Specifically, if the text to be normalized contains no insertion word segments, the editing in step 122 has already normalized every word segment, so the candidate normalized text can be used directly as the normalized text and the insertion-segment normalization layer needs to do nothing; the text arrangement layer then simply outputs the candidate normalized text as the normalized text. Otherwise, the candidate normalized text and the normalization result of each insertion word segment are input into the text arrangement layer, which arranges them in order to obtain and output the normalized text.
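The arranging behaviour of the text arrangement layer can be sketched as a simple splice. The dictionary-of-insertions representation (slot position mapped to predicted tokens) is an assumption made for illustration.

```python
def arrange(candidate, insertions):
    """Sketch of the text arrangement layer: splice predicted insertion
    results (position -> list of new tokens) into the candidate normalized
    text; with no insertions the candidate passes through unchanged."""
    out = []
    for i, tok in enumerate(candidate):
        out.extend(insertions.get(i, []))  # words predicted before position i
        out.append(tok)
    out.extend(insertions.get(len(candidate), []))  # trailing insertions
    return out

print(arrange(["it", "is", "useless"], {}))  # ['it', 'is', 'useless']
```

With an empty insertion dictionary the function is the identity, matching the fast path where step 123 is skipped entirely.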
Thus, when the text to be normalized contains insertion word segments, the normalization mode executes steps 122, 123 and 124 in sequence; when it does not, only steps 122 and 124 are needed and step 123 is skipped, which improves the efficiency of text normalization.
With the method provided by the embodiment of the invention, non-insertion word segments are normalized directly from their edit types, and when insertion segments are present the insertion-segment normalization layer only has to predict the words at the insertion positions; this reduces the complexity and the computation of the model and improves the efficiency of text normalization.
Based on any of the above embodiments, fig. 3 is a schematic flowchart of the operation of the edit type determination layer provided by an embodiment of the present invention. As shown in fig. 3, step 121 specifically includes:
step 1211, inputting the text to be normalized to the semantic feature extraction layer of the edit type determination layer to obtain the semantic features of each participle in the text to be normalized output by the semantic feature extraction layer;
step 1212, inputting the semantic features of each participle to the sequence labeling layer of the edit type determination layer to obtain the edit type of each participle output by the sequence labeling layer.
Specifically, word segmentation may first be performed on the text to be normalized, and a special symbol, such as <s>, may be inserted at the beginning of the text to mark the start character. The semantic feature extraction layer then extracts the semantic features of each participle from the text vector of the text to be normalized. The text vector comprises the word vector of each participle, and the semantic features of any participle may comprise both the semantic information of that participle and the contextual semantic information related to it in the text to be normalized. When determining the semantic features of each participle, in order to avoid interference from unrelated participles and to capture the semantic information of distant but related participles, a self-attention transformation may be applied to the text vector of the text to be normalized, yielding the semantic features of each participle. Here, the semantic feature extraction layer may be constructed based on an LSTM (Long Short-Term Memory) model or a BiLSTM (Bi-directional Long Short-Term Memory) model. The semantic features of each participle are then input to the sequence labeling layer, which determines the edit type of each participle based on those features. It should be noted that the semantic feature extraction layer can output the semantic features of all participles in parallel, so the sequence labeling layer can likewise output the edit types of all participles in parallel, further improving the operating efficiency of the text normalization model.
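The self-attention transformation mentioned above can be sketched in a few lines of numpy. This is a deliberately minimal single-head form without learned query/key/value projections (a trained layer would add them); dimensions are toy values chosen for illustration.

```python
import numpy as np

def self_attention(X):
    """Contextualize each participle's word vector by attending over the
    whole sequence: distant but similar tokens still contribute, while
    the softmax keeps unrelated tokens' weights small.
    X: (n_tokens, d) word vectors; returns (n_tokens, d) features h_i."""
    d = X.shape[1]
    scores = X @ X.T / np.sqrt(d)                    # pairwise similarities
    w = np.exp(scores - scores.max(axis=1, keepdims=True))
    w /= w.sum(axis=1, keepdims=True)                # row-wise softmax
    return w @ X

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 8))        # 5 participles, 8-dim word vectors
H = self_attention(X)
print(H.shape)                     # (5, 8): one semantic feature per token
```

Each row of `H` is a weighted mixture of all word vectors, which is why a participle's semantic feature can already carry the context needed later by the insertion participle normalization layer.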
Based on any of the above embodiments, step 123 specifically includes:
and inputting the semantic features of any insertion participle and the text vector of the candidate normalized text into the insertion participle normalization layer to obtain the normalization result of that insertion participle output by the insertion participle normalization layer.
Specifically, when determining the participles to be inserted at any insertion position, the insertion participle normalization layer only needs to predict the words that may be missing at that position. Because the semantic features of the insertion participle already contain the contextual semantic information related to it in the text to be normalized, prediction can rely on those semantic features together with the semantic information in the text vector of the candidate normalized text, without considering the semantic features of the other participles and without invoking an attention mechanism at prediction time to compute the semantic features of the most relevant participles. This reduces the complexity and computation of the text normalization model and improves the efficiency of text normalization. Here, the semantic features of the insertion participle are obtained from the semantic feature extraction layer of the edit type determination layer, so they can be input directly into the insertion participle normalization layer.
It should be noted that, if other insertion participles precede the current insertion participle, the text vectors of their normalization results may be input to the insertion participle normalization layer together with the semantic features of the current insertion participle and the text vector of the candidate normalized text, for use in prediction.
Further, considering that a participle missing at an insertion position is generally more related to the participles preceding it, only the text vectors of the participles before the insertion position in the candidate normalized text, together with the semantic features of the insertion participle, may be input into the insertion participle normalization layer for prediction.
According to the method provided by the embodiment of the invention, the insertion participle normalization layer only needs to predict the inserted words based on the semantic features of any insertion participle and the text vector of the candidate normalized text, which reduces the complexity and computation of the text normalization model and improves the efficiency of text normalization.
Based on any of the above embodiments, inputting the semantic features of any insertion participle and the text vector of the candidate normalized text into the insertion participle normalization layer to obtain the normalization result of that insertion participle specifically includes:
if the edit type of any insertion participle is insert-after-retention, inputting the semantic features and text vector of the insertion participle, together with the text vectors of the participles in the candidate normalized text before the position corresponding to the insertion participle, into the insertion participle normalization layer to obtain the normalization result of the insertion participle;
and if the edit type of any insertion participle is insert-after-deletion, inputting the semantic features of the insertion participle, together with the text vectors of the participles in the candidate normalized text before the position corresponding to the insertion participle, into the insertion participle normalization layer to obtain the normalization result of the insertion participle.
Specifically, the insertion type can be further divided into insert-after-retention and insert-after-deletion. If the edit type of an insertion participle is insert-after-retention, the participle itself needs no processing and one or more participles only need to be inserted after it; if the edit type is insert-after-deletion, the participle must first be deleted and one or more participles then inserted in its place.
For an insertion participle whose edit type is insert-after-retention, the semantic information of the participle itself must be considered in addition to its related context. Therefore, the semantic features and text vector of the insertion participle, together with the text vectors of the participles in the candidate normalized text before the corresponding position, are input into the insertion participle normalization layer to obtain its normalization result. In this case the normalization result comprises the insertion participle itself together with the predicted participles to be inserted at that position.
For an insertion participle whose edit type is insert-after-deletion, its own semantic information need not be considered, so only its semantic features and the text vectors of the participles in the candidate normalized text before the corresponding position are input into the insertion participle normalization layer for prediction. In this case the normalization result comprises only the predicted participles to be inserted at that position.
Alternatively, the missing participles at any insertion position can be decoded with a single-layer decoder, such as a single-layer LSTM model or a single-layer Transformer model. For example, when the edit type of the insertion participle is insert-after-retention, decoding can use the following formula:
W_{i,j} = f_de(h_i, [y_1, y_2, ..., y_i, W_{i,1}, W_{i,2}, ..., W_{i,j-1}])
where W_{i,j} denotes the j-th word to be inserted at the position of the i-th participle of the text to be normalized, i.e. the word decoded by the insertion participle normalization layer at the current step, the i-th participle being an insertion participle whose edit type is insert-after-retention; W_{i,1}, W_{i,2}, ..., W_{i,j-1} denote the words decoded by the insertion participle normalization layer before the current step; h_i denotes the semantic features of the i-th participle; y_1, y_2, ..., y_{i-1} denote the text vectors of the participles in the candidate normalized text before the position corresponding to the i-th participle; y_i is the text vector of the i-th participle; and f_de(·) is the single-layer decoder.
When the edit type of the insertion participle is insert-after-deletion, decoding can use the following formula:
W_{i,j} = f_de(h_i, [y_1, y_2, ..., y_{i-1}, W_{i,1}, W_{i,2}, ..., W_{i,j-1}])
where W_{i,j} denotes the j-th word to be inserted at the position of the i-th participle of the text to be normalized, i.e. the word decoded by the insertion participle normalization layer at the current step, the i-th participle being an insertion participle whose edit type is insert-after-deletion; W_{i,1}, W_{i,2}, ..., W_{i,j-1} denote the words decoded by the insertion participle normalization layer before the current step; h_i denotes the semantic features of the i-th participle; y_1, y_2, ..., y_{i-1} denote the text vectors of the participles in the candidate normalized text before the position corresponding to the i-th participle; and f_de(·) is the single-layer decoder.
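The two formulas differ only in whether y_i (the insertion participle's own text vector) is part of the decoder input. A minimal sketch of the greedy decoding loop they describe follows; `f_de` here is a toy stub standing in for the learned single-layer decoder, and the `</w>` end-of-insertion symbol is an assumption of this sketch.

```python
def decode_insertions(f_de, h_i, context, max_words=5, eos="</w>"):
    """Greedily decode the words to insert at one insertion position.
    `context` is [y_1, ..., y_i] for insert-after-retention or
    [y_1, ..., y_{i-1}] for insert-after-deletion. Each step feeds the
    previously decoded words W_{i,1..j-1} back into the decoder, as in
    the formulas above."""
    decoded = []
    for _ in range(max_words):
        w = f_de(h_i, context + decoded)
        if w == eos:                 # decoder signals end of insertion
            break
        decoded.append(w)
    return decoded

# Toy decoder: proposes the word "is" once, then ends the insertion.
toy = lambda h, inputs: "is" if "is" not in inputs else "</w>"
print(decode_insertions(toy, "h_3", ["y_1", "y_2", "y_3"]))  # ['is']
```

Because previously decoded words are appended to the context, the loop terminates as soon as the decoder emits the end symbol, bounding the number of inserted participles per position.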
Based on any of the above embodiments, fig. 4 is a schematic flowchart of a sample editing type obtaining method provided by an embodiment of the present invention, and as shown in fig. 4, the method includes:
step 410, aligning the sample text to be normalized with the sample normalized text to obtain the sample-aligned text to be normalized and the sample-aligned normalized text;
step 420, comparing the sample-aligned text to be normalized with the sample-aligned normalized text to obtain the sample edit type of each participle in the sample text to be normalized.
Specifically, since the sample text to be normalized and the sample normalized text may differ in length, the two texts need to be aligned in order to correctly determine which type of editing each participle of the sample text to be normalized undergoes to obtain the sample normalized text. Alignment may use the minimum-edit-distance dynamic programming algorithm employed in speech recognition. The sample-aligned text to be normalized and the sample-aligned normalized text obtained in this way have the same length.
Then, the sample-aligned text to be normalized and the sample-aligned normalized text are compared word by word from left to right to determine which editing operation applied to each participle of the sample-aligned text to be normalized yields the sample-aligned normalized text; the sample edit type of each participle in the sample text to be normalized is then determined from the correspondence between the participles of the aligned text and those of the original sample text.
For example, the sample text to be normalized "we/we/tomorrow/meet/bar" (containing a repeated "we" and the filler "bar") corresponds to the sample normalized text "we/tomorrow/meet". After aligning the two, the sample-aligned text to be normalized and the sample-aligned normalized text are as shown in the following table:

| Sample-aligned text to be normalized | we | we | tomorrow | meet | bar |
| Sample-aligned normalized text | (blank) | we | tomorrow | meet | (blank) |
According to the word-by-word comparison of the two rows, the sample edit types of the participles in the sample text to be normalized are finally determined as D, K, K, K and D, where D denotes deletion and K denotes retention.
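The alignment step can be sketched with the classic minimum-edit-distance dynamic program. This is an illustrative pure-Python version (the patent does not fix a cost scheme, so unit costs are assumed), using `""` as the blank participle; it reproduces the table above.

```python
def align(src, tgt, blank=""):
    """Align two participle sequences by minimum edit distance. Returns
    two equal-length lists in which a blank marks a deletion slot (blank
    in tgt) or an insertion slot (blank in src)."""
    n, m = len(src), len(tgt)
    # dp[i][j] = minimum edits aligning src[:i] with tgt[:j]
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        dp[i][0] = i
    for j in range(m + 1):
        dp[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if src[i - 1] == tgt[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j - 1] + cost,   # match / substitute
                           dp[i - 1][j] + 1,          # delete from src
                           dp[i][j - 1] + 1)          # insert from tgt
    a_src, a_tgt = [], []
    i, j = n, m
    while i > 0 or j > 0:                             # backtrace
        if (i > 0 and j > 0 and
                dp[i][j] == dp[i - 1][j - 1] + (0 if src[i - 1] == tgt[j - 1] else 1)):
            a_src.append(src[i - 1]); a_tgt.append(tgt[j - 1]); i -= 1; j -= 1
        elif i > 0 and dp[i][j] == dp[i - 1][j] + 1:
            a_src.append(src[i - 1]); a_tgt.append(blank); i -= 1
        else:
            a_src.append(blank); a_tgt.append(tgt[j - 1]); j -= 1
    return a_src[::-1], a_tgt[::-1]

src = ["we", "we", "tomorrow", "meet", "bar"]
tgt = ["we", "tomorrow", "meet"]
print(align(src, tgt))
# (['we', 'we', 'tomorrow', 'meet', 'bar'], ['', 'we', 'tomorrow', 'meet', ''])
```

Both returned sequences have the same length, matching the statement above that the two aligned texts are equally long.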
Based on any of the above embodiments, step 420 specifically includes:
comparing the sample-aligned text to be normalized with the sample-aligned normalized text to obtain the sample edit type of each participle in the sample-aligned text to be normalized;
and merging the sample edit type of each inserted blank participle in the sample-aligned text to be normalized with that of the participle before it, to obtain the sample edit type of each participle in the sample text to be normalized.
Specifically, the sample-aligned text to be normalized and the sample-aligned normalized text are compared word by word from left to right to obtain the sample edit type of each participle in the sample-aligned text to be normalized. If the i-th participle of the sample-aligned normalized text is the same as the i-th participle of the sample-aligned text to be normalized, the edit type of the i-th participle of the sample-aligned text to be normalized is considered to be retention. If the two differ, it is judged whether the i-th participle of the sample-aligned normalized text is empty: if it is empty, the edit type of the i-th participle of the sample-aligned text to be normalized is considered to be deletion; otherwise, if the i-th participle of the sample-aligned text to be normalized is empty, the edit type is considered to be insertion, and the participle to be inserted is the i-th participle of the sample-aligned normalized text.
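Given the aligned sequences, this word-by-word comparison reduces to a few lines. The labels `"K"`/`"D"` and the `("I", word)` pair are this sketch's notation, not the patent's; the example reuses the earlier "we/tomorrow/meet" alignment.

```python
def edit_types(aligned_src, aligned_tgt, blank=""):
    """Word-by-word comparison of two equal-length aligned sequences,
    yielding per-slot sample edit types: K (keep) when the participles
    match, D (delete) when the normalized side is blank, and ('I', word)
    when the source side is blank and `word` must be inserted."""
    labels = []
    for s, t in zip(aligned_src, aligned_tgt):
        if s == t:
            labels.append("K")
        elif t == blank:
            labels.append("D")        # participle present in source only
        else:                         # s == blank: participle missing
            labels.append(("I", t))
    return labels

print(edit_types(["we", "we", "tomorrow", "meet", "bar"],
                 ["", "we", "tomorrow", "meet", ""]))
# ['D', 'K', 'K', 'K', 'D']
```

The result matches the D, K, K, K, D labeling derived in the example above.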
Because the sample text to be normalized and the sample normalized text may differ in length, if a participle is missing from the sample text to be normalized, a new participle must be inserted during normalization; after alignment, a blank participle then appears in the sample-aligned text to be normalized. For example, in the sample text to be normalized "this/hospital/once/Li/dean/served/this/hospital" a participle is missing between "once" and "Li", and the corresponding sample normalized text is "this/hospital/once/is/Li/dean/served/hospital". After aligning the two, the sample-aligned texts are as shown in the following table:

| Sample-aligned text to be normalized | this | hospital | once | (blank) | Li | dean | served | this | hospital |
| Sample-aligned normalized text | this | hospital | once | is | Li | dean | served | (blank) | hospital |

It can be seen that in the sample-aligned text to be normalized, a blank participle appears between the participle "once" and the participle "Li". Comparing the two aligned texts yields the sample edit types K, K, K, I|is, K, K, K, D and K for the participles of the sample-aligned text to be normalized, where "I|is" means the sample edit type is insertion and the participle to be inserted is "is".
When new participles need to be inserted into the sample text to be normalized, the sample text to be normalized and the sample-aligned text to be normalized differ in length. To determine the sample edit type of each participle of the sample text to be normalized, the insertion edit type of each blank participle in the sample-aligned text to be normalized must be merged with the sample edit type of the participle before it, yielding a sample edit type sequence whose length matches the sample text to be normalized, and thus the sample edit type of each of its participles. If the sample edit type of the participle before the blank participle is retention, the merged sample edit type is insert-after-retention; if it is deletion, the merged sample edit type is insert-after-deletion. If there are several consecutive blank participles, they are all merged with the nearest non-blank participle before them.
Taking the above table as an example again, after merging the insertion edit type of the blank participle with the edit type of the participle before it, namely "once", the sample edit types of the participles in the sample text to be normalized are K, K, K|I, K, K, K, D and K, where the participle inserted at K|I is "is".
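The merging step can be sketched as follows. The tokens are English stand-ins for the patent's Chinese example, the "K|I"/"D|I" compound labels are this sketch's notation, and the sequence is assumed to start with a non-blank participle (which the <s> start symbol guarantees in practice).

```python
def merge_blanks(aligned_src, labels, blank=""):
    """Fold ('I', word) labels at blank source slots into the nearest
    non-blank participle to their left, producing one label per participle
    of the original text to be normalized: 'K' keep, 'D' delete,
    'K|I' insert-after-retention, 'D|I' insert-after-deletion. The words
    to insert ride along with the merged label."""
    merged = []                                  # list of [label, inserted_words]
    for s, lab in zip(aligned_src, labels):
        if s != blank:
            merged.append([lab, []])
        else:                                    # attach insertion leftwards
            prev = merged[-1]
            prev[0] = {"K": "K|I", "D": "D|I"}.get(prev[0], prev[0])
            prev[1].append(lab[1])
    return [(lab, words) for lab, words in merged]

src = ["this", "hospital", "once", "", "Li", "dean", "served", "this", "hospital"]
labs = ["K", "K", "K", ("I", "is"), "K", "K", "K", "D", "K"]
print(merge_blanks(src, labs))
```

Consecutive blank slots all attach to the same preceding participle, matching the rule above, and the output has one entry per participle of the original sample text.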
Based on any of the above embodiments, it should be considered that the inversion phenomenon can likewise leave a blank participle in the sample-aligned text to be normalized after alignment, and that blank participle would then be judged an insertion when sample edit types are determined. The inversion phenomenon is a reversal of the order of two consecutive word sequences, and may occur at any position in the text. In this case, normalization only needs to adjust the word order; no new participle needs to be inserted. For example, the sample text to be normalized "well/they/come/back/from/their/hometown" corresponds to the sample normalized text "they/from/their/hometown/come/back". After aligning the two, the aligned texts are as shown in the following table:

| Sample-aligned text to be normalized | well | they | come | back | from | their | hometown | (blank) | (blank) |
| Sample-aligned normalized text | (blank) | they | (blank) | (blank) | from | their | hometown | come | back |

Comparing the aligned texts and merging the blank participles as described above yields the sample edit types D, K, D, D, K, K and K|I for the participles of the sample text to be normalized, where the participles inserted at K|I are "come" and "back".
However, when text normalization is actually performed, no new participles need to be inserted after "hometown"; it suffices to swap the order of "come/back" and "from/their/hometown".
Therefore, after the insertion edit types of the blank participles in the sample-aligned text to be normalized have been merged with the edit types of the participles before them, it is necessary to check whether the inversion phenomenon exists. If several consecutive participles have the sample edit type deletion, followed by one or more retained participles and then a participle whose sample edit type is insert-after-retention, and the deleted participles are identical to the participles to be inserted at that insert-after-retention position, the inversion phenomenon can be confirmed. In the example above, the deleted participles are "come" and "back", and the participles to be inserted after the insert-after-retention participle "hometown" are likewise "come" and "back", indicating inversion.
To mark the participles involved in the inversion, the positions of the two reversed word sequences in the text to be normalized are identified, for example the starting and ending participles of the first word sequence and of the second word sequence. Since the ending participle of the first sequence and the starting participle of the second sequence are adjacent, only one of them needs to be marked. Specifically, the sample edit type of the first participle of the deleted run is converted to inversion, the sample edit types of the first retained participle after the run and of the insert-after-retention participle are converted to inversion, and the sample edit types of the other participles are set to retention. In the example above, the sample edit types of "come", "from" and "hometown" are converted to inversion, and those of "back" and "their" are set to retention, so the sample edit types of the participles in the sample text to be normalized are finally D, K, E, K, E, K and E, where E denotes inversion.
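The inversion check and re-marking can be sketched as follows. Abstract tokens are used so the sketch does not depend on any particular example; the label pairs are the `(edit_type, inserted_words)` notation of the earlier merging sketch, and the marking rule (first deleted participle, first retained participle after the run, and the insertion carrier become 'E') follows the description above.

```python
def mark_flips(tokens, labels):
    """Detect the inversion pattern: a run of deleted participles that
    reappears verbatim as the words inserted after a later retained
    participle. On a match, the first deleted participle, the first
    retained participle after the run, and the 'K|I' carrier are
    re-tagged 'E'; other involved participles become plain keeps and the
    insertion is dropped."""
    out = [[lab, list(words)] for lab, words in labels]
    n = len(out)
    for i in range(n):
        if out[i][0] != "D":
            continue
        j = i
        while j < n and out[j][0] == "D":
            j += 1                      # deleted run is tokens[i:j]
        k = j
        while k < n and out[k][0] == "K":
            k += 1                      # skip retained participles
        if k < n and out[k][0] == "K|I" and out[k][1] == tokens[i:j]:
            for t in range(i, k + 1):   # involved span defaults to keep
                out[t] = ["K", []]
            out[i][0] = "E"             # start of the moved sequence
            if j < k:
                out[j][0] = "E"         # start of the second sequence
            out[k][0] = "E"             # end of the second sequence
    return [lab for lab, _ in out]

tokens = ["w1", "w2", "w3", "w4", "w5", "w6", "w7"]
labels = [("D", []), ("K", []), ("D", []), ("D", []),
          ("K", []), ("K", []), ("K|I", ["w3", "w4"])]
print(mark_flips(tokens, labels))  # ['D', 'K', 'E', 'K', 'E', 'K', 'E']
```

Note that the unrelated leading deletion (`w1`) keeps its D label: only a run whose words exactly match a later insertion is converted to the inversion marking.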
Based on any of the above embodiments, the method for constructing the text normalization model includes the following steps:
first, a large number of samples are collected for text warping. During collection, a large amount of spoken voice data, such as a large amount of interview voice data, can be collected, corresponding sample texts to be structured are obtained after voice recognition is performed on the spoken voice data, and texts containing non-standard expressions such as spoken language descriptions and the like, such as spoken language dialogue texts of a novel figure and the like, can be directly collected. Then, a sample structured text corresponding to the sample text to be structured is labeled, and the sample editing type of each word segmentation in the sample text to be structured is obtained by using the sample editing type obtaining method in any embodiment. The sample editing type comprises a retaining type, a deleting type, an inverting type, an inserting type after the retaining type and an inserting type after the deleting type, the retaining type, the deleting type and the inverting type are non-inserting types, and the inserting type after the retaining type and the inserting type after the deleting type are inserting types. In addition, the text to be structured and the structured text can be subjected to word segmentation processing, and a special symbol, such as < s >, is inserted at the starting position of the text to represent the starting character.
Then, the structure of the text normalization model is determined. Fig. 5 is a schematic structural diagram of a text normalization model according to an embodiment of the present invention. As shown in fig. 5, the text normalization model includes an edit type determination layer, a text editing layer, an insertion participle normalization layer and a text arrangement layer, where the edit type determination layer includes a semantic feature extraction layer and a sequence labeling layer. The semantic feature extraction layer determines the semantic features h_1, h_2, ..., h_n of the participles in the text to be normalized, where n is the total number of participles; the sequence labeling layer determines the edit type of each participle from the semantic features h_1, h_2, ..., h_n; the text editing layer determines the candidate normalized text from the non-insertion participles, whose edit types are non-insertion types, and their edit types; the insertion participle normalization layer is executed only when the text to be normalized contains insertion participles whose edit type is an insertion type, and determines the normalization result of each insertion participle from its semantic features and the text vector of the candidate normalized text; the text arrangement layer determines the normalized text from the candidate normalized text and the normalization result of each insertion participle when insertion participles exist, as shown by the solid lines in fig. 5, and otherwise from the candidate normalized text alone, as shown by the dotted lines in fig. 5.
The parameters of the text normalization model are then trained on the sample texts to be normalized, the sample normalized texts and the sample edit types of the participles in the sample texts to be normalized. The model can be updated with a general training algorithm, such as gradient descent.
The following describes a text-normalization device provided in an embodiment of the present invention, and the text-normalization device described below and the text-normalization method described above may be referred to correspondingly.
Based on any of the above embodiments, fig. 6 is a schematic structural diagram of a text-normalization device according to an embodiment of the present invention, and as shown in fig. 6, the device includes a text-determining unit 610 and a text-normalization unit 620.
The text determining unit 610 is configured to determine a text to be structured;
the text normalization unit 620 is configured to input the text to be normalized to the text normalization model, and obtain a normalized text corresponding to the text to be normalized output by the text normalization model;
the text normalization model is obtained by training based on a text to be normalized by a sample, a text normalized by the sample and the sample editing type of each word in the text to be normalized by the sample;
the text normalization model is used for determining the editing type of each participle in the text to be normalized, determining the normalization mode of the text to be normalized based on whether the text to be normalized contains the inserted participle of which the editing type is the insertion type, and normalizing the text to be normalized based on the normalization mode.
The device provided by the embodiment of the invention determines the editing type of each participle in the text to be structured and determines the formatting mode of the text to be structured based on whether the text to be structured contains the inserted participle of which the editing type is the insertion type, so that the text to be structured is structured based on the formatting mode, the accuracy of text formatting is improved, and the efficiency of text formatting is improved.
Based on any of the above embodiments, the text normalization unit 620 specifically includes:
the editing type determining unit is used for inputting the text to be structured to the editing type determining layer of the text structuring model to obtain the editing type of each word segmentation in the text to be structured output by the editing type determining layer;
the text editing unit is used for inputting each non-insertion participle whose edit type is a non-insertion type, together with its edit type, into the text editing layer of the text normalization model to obtain the candidate normalized text output by the text editing layer;
the inserting word segmentation normalization unit is used for inputting each inserting word and the candidate normalization text into an inserting word segmentation normalization layer of the text normalization model if the inserting word with the editing type of the inserting type exists in the text to be normalized, and obtaining a normalization result of each inserting word output by the inserting word segmentation normalization layer;
and the text arrangement unit is used for inputting the candidate normalized text, or the candidate normalized text together with the normalization result of each insertion participle, into the text arrangement layer of the text normalization model to obtain the normalized text output by the text arrangement layer.
The device provided by the embodiment of the invention can be used for carrying out text normalization on non-inserted participles based on the editing type of each participle, and if the inserted participle exists in the text to be normalized, the inserted participle normalization layer only needs to carry out inserted word prediction at the position of the inserted participle, so that the complexity and the calculated amount of a model are reduced, and the efficiency of text normalization is improved.
Based on any of the above embodiments, the editing-type determining unit specifically includes:
the semantic feature extraction unit is used for inputting the text to be structured to a semantic feature extraction layer of the editing type determination layer to obtain the semantic feature of each participle in the text to be structured output by the semantic feature extraction layer;
and the sequence labeling unit is used for inputting the semantic features of each participle into the sequence labeling layer of the edit type determination layer to obtain the edit type of each participle output by the sequence labeling layer.
Based on any of the embodiments above, the insertion participle regularization unit is specifically configured to:
and inputting the semantic features of any inserted participle and the text vector of the candidate regular text into the inserted participle regular layer to obtain a regular result of the inserted participle output by the inserted participle regular layer.
According to the device provided by the embodiment of the invention, the inserted word regulation layer only needs to predict the inserted words based on the semantic features of any inserted word and the text vectors of the candidate regulated texts, so that the complexity and the calculation amount of a text regulation model can be reduced, and the efficiency of text regulation is improved.
Based on any of the above embodiments, inputting the semantic features of any insertion participle and the text vector of the candidate normalized text into the insertion participle normalization layer to obtain the normalization result of that insertion participle specifically includes:
if the edit type of any insertion participle is insert-after-retention, inputting the semantic features and text vector of the insertion participle, together with the text vectors of the participles in the candidate normalized text before the position corresponding to the insertion participle, into the insertion participle normalization layer to obtain the normalization result of the insertion participle;
and if the edit type of any insertion participle is insert-after-deletion, inputting the semantic features of the insertion participle, together with the text vectors of the participles in the candidate normalized text before the position corresponding to the insertion participle, into the insertion participle normalization layer to obtain the normalization result of the insertion participle.
Based on any of the above embodiments, the apparatus further includes a sample editing type obtaining unit, which specifically includes:
the alignment unit is used for aligning the sample text to be normalized with the sample normalized text to obtain a sample-aligned text to be normalized and a sample-aligned normalized text;
and the sample editing type determining unit is used for comparing the sample-aligned text to be normalized with the sample-aligned normalized text to obtain the sample editing type of each participle in the sample text to be normalized.
Based on any of the above embodiments, the sample editing type determining unit specifically includes:
the comparison unit is used for comparing the sample-aligned text to be normalized with the sample-aligned normalized text to obtain a sample editing type of each participle in the sample-aligned text to be normalized;
and the merging unit is used for merging the sample editing type of each blank participle whose sample editing type is an insertion type in the sample-aligned text to be normalized with that of the participle preceding it, to obtain the sample editing type of each participle in the sample text to be normalized.
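The align-compare-merge procedure for deriving sample editing types can be sketched with a generic sequence aligner. The sketch below uses Python's `difflib` purely as a stand-in aligner, and the label names are illustrative assumptions, not the patent's actual labels:

```python
# Hedged sketch: derive per-participle sample editing types by aligning the
# sample text to be normalized (src) with the sample normalized text (tgt),
# then merging insertions into the preceding participle's editing type.

import difflib
from typing import List, Tuple


def derive_edit_types(src: List[str], tgt: List[str]) -> List[Tuple[str, str]]:
    """Return (participle, edit_type) for each participle in src."""
    matcher = difflib.SequenceMatcher(a=src, b=tgt, autojunk=False)
    labels = []  # one [participle, edit_type] entry per src participle
    for op, i1, i2, j1, j2 in matcher.get_opcodes():
        if op == "equal":
            labels += [[w, "keep"] for w in src[i1:i2]]
        elif op == "delete":
            labels += [[w, "delete"] for w in src[i1:i2]]
        elif op == "replace":
            # treat as deletion plus an insertion after the last deleted one
            labels += [[w, "delete"] for w in src[i1:i2]]
            labels[-1][1] = "insert_after_deletion"
        elif op == "insert" and labels:
            # merge the inserted blank participle into the preceding
            # participle's editing type, as in the merging unit above
            prev = labels[-1][1]
            labels[-1][1] = ("insert_after_retention" if prev == "keep"
                             else "insert_after_deletion")
    return [tuple(x) for x in labels]
```

For instance, aligning a spoken transcript against its cleaned form labels filler words for deletion and marks the participle before an inserted punctuation mark with an insertion type.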
Fig. 7 illustrates a schematic diagram of the physical structure of an electronic device. As shown in Fig. 7, the electronic device may include: a processor 710, a communications interface 720, a memory 730, and a communication bus 740, wherein the processor 710, the communications interface 720, and the memory 730 communicate with each other via the communication bus 740. The processor 710 may invoke logic instructions in the memory 730 to perform a text normalization method comprising: determining a text to be normalized; inputting the text to be normalized into a text normalization model to obtain a normalized text corresponding to the text to be normalized and output by the text normalization model; wherein the text normalization model is trained based on a sample text to be normalized, a sample normalized text, and a sample editing type of each participle in the sample text to be normalized; the text normalization model is used for determining the editing type of each participle in the text to be normalized, determining a normalization mode for the text to be normalized based on whether the text to be normalized contains inserted participles whose editing type is an insertion type, and normalizing the text to be normalized based on the normalization mode.
In addition, the logic instructions in the memory 730 may be implemented in the form of software functional units and, when sold or used as an independent product, stored in a computer-readable storage medium. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.
In another aspect, an embodiment of the present invention further provides a computer program product, the computer program product comprising a computer program stored on a non-transitory computer-readable storage medium, the computer program comprising program instructions which, when executed by a computer, cause the computer to execute the text normalization method provided by the above method embodiments, the method comprising: determining a text to be normalized; inputting the text to be normalized into a text normalization model to obtain a normalized text corresponding to the text to be normalized and output by the text normalization model; wherein the text normalization model is trained based on a sample text to be normalized, a sample normalized text, and a sample editing type of each participle in the sample text to be normalized; the text normalization model is used for determining the editing type of each participle in the text to be normalized, determining a normalization mode for the text to be normalized based on whether the text to be normalized contains inserted participles whose editing type is an insertion type, and normalizing the text to be normalized based on the normalization mode.
In another aspect, an embodiment of the present invention further provides a non-transitory computer-readable storage medium on which a computer program is stored, the computer program, when executed by a processor, implementing the text normalization method provided by the foregoing embodiments, the method comprising: determining a text to be normalized; inputting the text to be normalized into a text normalization model to obtain a normalized text corresponding to the text to be normalized and output by the text normalization model; wherein the text normalization model is trained based on a sample text to be normalized, a sample normalized text, and a sample editing type of each participle in the sample text to be normalized; the text normalization model is used for determining the editing type of each participle in the text to be normalized, determining a normalization mode for the text to be normalized based on whether the text to be normalized contains inserted participles whose editing type is an insertion type, and normalizing the text to be normalized based on the normalization mode.
The above-described embodiments of the apparatus are merely illustrative, and the units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment. One of ordinary skill in the art can understand and implement it without inventive effort.
Through the above description of the embodiments, those skilled in the art will clearly understand that each embodiment can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware. With this understanding in mind, the above-described technical solutions may be embodied in the form of a software product, which can be stored in a computer-readable storage medium such as ROM/RAM, magnetic disk, optical disk, etc., and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the methods described in the embodiments or some parts of the embodiments.
Finally, it should be noted that: the above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims (10)

1. A method for text normalization, comprising:
determining a text to be normalized;
inputting the text to be normalized into a text normalization model to obtain a normalized text corresponding to the text to be normalized and output by the text normalization model;
wherein the text normalization model is trained based on a sample text to be normalized, a sample normalized text, and a sample editing type of each participle in the sample text to be normalized;
the text normalization model is used for determining the editing type of each participle in the text to be normalized, determining a normalization mode for the text to be normalized based on whether the text to be normalized contains inserted participles whose editing type is an insertion type, and normalizing the text to be normalized based on the normalization mode.
2. The text normalization method according to claim 1, wherein the inputting the text to be normalized into a text normalization model to obtain a normalized text corresponding to the text to be normalized and output by the text normalization model specifically comprises:
inputting the text to be normalized into an editing type determining layer of the text normalization model to obtain the editing type of each participle in the text to be normalized output by the editing type determining layer;
inputting each non-inserted participle whose editing type is a non-insertion type, together with its editing type, into a text editing layer of the text normalization model to obtain a candidate normalized text output by the text editing layer;
if the text to be normalized contains inserted participles whose editing type is an insertion type, inputting each inserted participle and the candidate normalized text into an inserted participle normalization layer of the text normalization model to obtain a normalization result of each inserted participle output by the inserted participle normalization layer;
and inputting the candidate normalized text, or the candidate normalized text and the normalization result of each inserted participle, into a text arrangement layer of the text normalization model to obtain the normalized text output by the text arrangement layer.
3. The text normalization method according to claim 2, wherein the inputting the text to be normalized into an editing type determining layer of the text normalization model to obtain the editing type of each participle in the text to be normalized output by the editing type determining layer specifically comprises:
inputting the text to be normalized into a semantic feature extraction layer of the editing type determining layer to obtain semantic features of each participle in the text to be normalized output by the semantic feature extraction layer;
and inputting the semantic features of each participle into a sequence labeling layer of the editing type determining layer to obtain the editing type of each participle output by the sequence labeling layer.
4. The text normalization method according to claim 3, wherein the inputting each inserted participle and the candidate normalized text into an inserted participle normalization layer of the text normalization model to obtain a normalization result of each inserted participle output by the inserted participle normalization layer specifically comprises:
inputting the semantic features of any inserted participle and the text vector of the candidate normalized text into the inserted participle normalization layer to obtain a normalization result of the inserted participle output by the inserted participle normalization layer.
5. The text normalization method according to claim 4, wherein the inputting the semantic features of any inserted participle and the text vector of the candidate normalized text into the inserted participle normalization layer to obtain a normalization result of the inserted participle output by the inserted participle normalization layer specifically comprises:
if the editing type of the inserted participle is insertion after retention, inputting the semantic features and text vector of the inserted participle, together with the text vector of each participle in the candidate normalized text before the position corresponding to the inserted participle, into the inserted participle normalization layer to obtain a normalization result of the inserted participle output by the inserted participle normalization layer;
and if the editing type of the inserted participle is insertion after deletion, inputting the semantic features of the inserted participle and the text vector of each participle in the candidate normalized text before the position corresponding to the inserted participle into the inserted participle normalization layer to obtain a normalization result of the inserted participle output by the inserted participle normalization layer.
6. The text normalization method according to any one of claims 1 to 5, wherein the sample editing type of each participle in the sample text to be normalized is obtained by:
aligning the sample text to be normalized with the sample normalized text to obtain a sample-aligned text to be normalized and a sample-aligned normalized text;
and comparing the sample-aligned text to be normalized with the sample-aligned normalized text to obtain the sample editing type of each participle in the sample text to be normalized.
7. The text normalization method according to claim 6, wherein the comparing the sample-aligned text to be normalized with the sample-aligned normalized text to obtain the sample editing type of each participle in the sample text to be normalized specifically comprises:
comparing the sample-aligned text to be normalized with the sample-aligned normalized text to obtain a sample editing type of each participle in the sample-aligned text to be normalized;
and merging the sample editing type of each blank participle whose sample editing type is an insertion type in the sample-aligned text to be normalized with that of the participle preceding it, to obtain the sample editing type of each participle in the sample text to be normalized.
8. A text normalization apparatus, comprising:
the text determining unit is used for determining a text to be normalized;
the text normalization unit is used for inputting the text to be normalized into a text normalization model to obtain a normalized text corresponding to the text to be normalized and output by the text normalization model;
wherein the text normalization model is trained based on a sample text to be normalized, a sample normalized text, and a sample editing type of each participle in the sample text to be normalized;
the text normalization model is used for determining the editing type of each participle in the text to be normalized, determining a normalization mode for the text to be normalized based on whether the text to be normalized contains inserted participles whose editing type is an insertion type, and normalizing the text to be normalized based on the normalization mode.
9. An electronic device comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor, when executing the program, implements the steps of the text normalization method according to any one of claims 1 to 7.
10. A non-transitory computer-readable storage medium on which a computer program is stored, wherein the computer program, when executed by a processor, implements the steps of the text normalization method according to any one of claims 1 to 7.
CN202010731385.4A 2020-07-27 2020-07-27 Text normalization method and device, electronic equipment and storage medium Pending CN111832248A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010731385.4A CN111832248A (en) 2020-07-27 2020-07-27 Text normalization method and device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010731385.4A CN111832248A (en) 2020-07-27 2020-07-27 Text normalization method and device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN111832248A true CN111832248A (en) 2020-10-27

Family

ID=72925566

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010731385.4A Pending CN111832248A (en) 2020-07-27 2020-07-27 Text normalization method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN111832248A (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20160092421A1 (en) * 2013-09-11 2016-03-31 Huawei Technologies Co., Ltd. Text Editing Method and Apparatus, and Server
CN108536654A (en) * 2018-04-13 2018-09-14 科大讯飞股份有限公司 Identify textual presentation method and device
CN111435595A (en) * 2019-01-10 2020-07-21 北京搜狗科技发展有限公司 Text normalization method and device

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
"A typed applicative system for a language and text processing engineering", JOURNAL OF INNOVATION IN DIGITAL ECOSYSTEMS *
FENG Guoming; ZHANG Xiaodong; LIU Suhui: "DBLC word segmentation model for domain-specific texts based on autonomous learning", Data Analysis and Knowledge Discovery, no. 05 *

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112464656A (en) * 2020-11-30 2021-03-09 科大讯飞股份有限公司 Keyword extraction method and device, electronic equipment and storage medium
CN112464656B (en) * 2020-11-30 2024-02-13 中国科学技术大学 Keyword extraction method, keyword extraction device, electronic equipment and storage medium
WO2022141855A1 (en) * 2020-12-31 2022-07-07 平安科技(深圳)有限公司 Text regularization method and apparatus, and electronic device and storage medium
CN116975301A (en) * 2023-09-22 2023-10-31 腾讯科技(深圳)有限公司 Text clustering method, text clustering device, electronic equipment and computer readable storage medium

Similar Documents

Publication Publication Date Title
WO2021068321A1 (en) Information pushing method and apparatus based on human-computer interaction, and computer device
US11941366B2 (en) Context-based multi-turn dialogue method and storage medium
WO2019085779A1 (en) Machine processing and text correction method and device, computing equipment and storage media
WO2018153265A1 (en) Keyword extraction method, computer device, and storage medium
CN108536654B (en) Method and device for displaying identification text
CN111832248A (en) Text normalization method and device, electronic equipment and storage medium
CN111428485B (en) Judicial document paragraph classifying method, device, computer equipment and storage medium
CN113128203A (en) Attention mechanism-based relationship extraction method, system, equipment and storage medium
CN115599901B (en) Machine question-answering method, device, equipment and storage medium based on semantic prompt
CN111462752B (en) Attention mechanism, feature embedding and BI-LSTM (business-to-business) based customer intention recognition method
CN115759119B (en) Financial text emotion analysis method, system, medium and equipment
CN112632244A (en) Man-machine conversation optimization method and device, computer equipment and storage medium
CN112016271A (en) Language style conversion model training method, text processing method and device
WO2023093525A1 (en) Model training method, chinese text error correction method, electronic device, and storage medium
CN114580428A (en) Judicial domain deep event extraction method integrating multitask and multi-label learning
CN111859950A (en) Method for automatically generating lecture notes
CN113569021B (en) Method for classifying users, computer device and readable storage medium
CN114254587A (en) Topic paragraph dividing method and device, electronic equipment and storage medium
Wang et al. Data augmentation for internet of things dialog system
CN111159405B (en) Irony detection method based on background knowledge
CN116257601A (en) Illegal word stock construction method and system based on deep learning
CN115858776A (en) Variant text classification recognition method, system, storage medium and electronic equipment
CN109344388A (en) A kind of comment spam recognition methods, device and computer readable storage medium
CN114281948A (en) Summary determination method and related equipment thereof
CN112668343A (en) Text rewriting method, electronic device and storage device

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right
TA01 Transfer of patent application right

Effective date of registration: 20230418

Address after: 230026 No. 96, Jinzhai Road, Hefei, Anhui

Applicant after: University of Science and Technology of China

Applicant after: IFLYTEK Co.,Ltd.

Address before: 230088 666 Wangjiang West Road, Hefei hi tech Development Zone, Anhui

Applicant before: IFLYTEK Co.,Ltd.