CN117113941A

CN117113941A - Punctuation mark recovery method and device, electronic equipment and storage medium

Info

Publication number: CN117113941A
Application number: CN202311375497.0A
Authority: CN
Inventors: 周月辉; 赵雷; 田维政
Original assignee: Xinsheng Technology Shenzhen Co ltd; Shenzhen Peoples Hospital
Current assignee: Xinsheng Technology Shenzhen Co ltd; Shenzhen Peoples Hospital
Priority date: 2023-10-23
Filing date: 2023-10-23
Publication date: 2023-11-24
Anticipated expiration: 2043-10-23
Also published as: CN117113941B

Abstract

The application relates to a punctuation mark recovery method, a punctuation mark recovery device, electronic equipment and a storage medium, wherein the punctuation mark recovery method comprises the following steps: obtaining a text to be recovered, and performing word segmentation processing on the text to be recovered to obtain a word segmentation text containing a plurality of words; the text to be restored does not have punctuation marks; inserting a first identifier between every two words in the word segmentation text to obtain an inserted text; inputting the inserted text into a punctuation prediction model trained in advance, and obtaining a first prediction result of the punctuation prediction model aiming at each first identifier; performing punctuation recovery processing on the inserted text according to the first prediction result of each first identifier to obtain a target text; the target text has punctuation marks. Therefore, the efficiency and the accuracy of punctuation recovery can be improved.

Description

Punctuation mark recovery method and device, electronic equipment and storage medium

Technical Field

The present application relates to the field of computer technologies, and in particular, to a punctuation mark recovery method, apparatus, electronic device, and storage medium.

Background

With the continuous development of the field of speech recognition, the accuracy of speech recognition systems keeps increasing. However, the text output by most speech recognition systems today does not include punctuation marks, which makes it hard for users to read; punctuation recovery of the recognized speech text is therefore essential to speech recognition systems.

In the prior art, punctuation recovery for a speech text usually predicts, for each token (word element) in the speech text, whether a punctuation mark needs to be added after it; if so, the punctuation mark is added, and if not, prediction continues with the next token. Punctuation prediction in the prior art is thus performed token by token. This approach is inefficient, ignores the relationships between words, and is prone to prediction errors, so both the efficiency and the accuracy of punctuation recovery for a speech text are low.

Disclosure of Invention

The application provides a punctuation mark recovery method, a punctuation mark recovery device, electronic equipment and a storage medium, which are used for solving the technical problems of low efficiency and low accuracy of punctuation mark recovery for a voice text in the prior art.

In a first aspect, the present application provides a punctuation recovery method, the method comprising:

obtaining a text to be recovered, and performing word segmentation processing on the text to be recovered to obtain a word segmentation text containing a plurality of words; the text to be restored does not have punctuation marks;

inserting a first identifier between every two words in the word segmentation text to obtain an inserted text;

inputting the inserted text into a punctuation prediction model trained in advance, and obtaining a first prediction result of the punctuation prediction model for each first identifier;

performing punctuation recovery processing on the inserted text according to the first prediction result of each first identifier to obtain a target text; the target text has punctuation marks.

As one possible implementation manner, the inputting the inserted text into a pre-trained punctuation prediction model, obtaining a first prediction result of the punctuation prediction model for each of the first identifiers includes:

extracting vector features of the inserted text through the punctuation prediction model to obtain first vector features corresponding to each first identifier in the inserted text;

performing dimension reduction processing on the first vector features corresponding to each first identifier to obtain second vector features corresponding to the first identifiers;

normalizing the second vector feature to obtain a punctuation mark prediction set corresponding to the first identifier; the punctuation mark prediction set comprises at least one punctuation mark, and one punctuation mark corresponds to one probability value;

and determining the punctuation mark with the maximum probability value in the punctuation mark prediction set as the first prediction result of the first identifier.

As a possible implementation manner, the punctuation recovery processing is performed on the inserted text according to the first prediction result of each first identifier, including:

determining whether a probability value corresponding to the first prediction result is larger than a preset probability threshold value;

determining whether the first prediction result is a null punctuation mark or not under the condition that the probability value corresponding to the first prediction result is larger than the preset probability threshold value;

if the first prediction result is the empty punctuation mark, deleting a first identifier corresponding to the first prediction result in the inserted text;

and if the first prediction result is not the empty punctuation mark, restoring the corresponding first identifier into the first prediction result in the inserted text;

determining that the first identifier is a special identifier under the condition that the probability value corresponding to the first prediction result is smaller than or equal to the preset probability threshold value;

determining whether a preset special punctuation mark exists before the special identifier;

carrying out emotion analysis on a text between the special punctuation mark and the special identifier under the condition that the special punctuation mark exists before the special identifier, and determining the special punctuation mark corresponding to the special identifier;

and carrying out emotion analysis on all texts before the special identifier under the condition that the special punctuation mark does not exist before the special identifier, and determining the special punctuation mark corresponding to the special identifier.
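The threshold branch and the choice of text span for emotion analysis described above can be sketched as follows. This is a minimal illustration, not the claimed implementation: the function names, the threshold value 0.5, the label `other` for the empty punctuation mark, and the set of special punctuation marks are all assumptions, and the emotion analysis itself is left out because the text does not specify it.

```python
from typing import List, Set


def decide_action(label: str, probability: float,
                  threshold: float = 0.5, null_label: str = "other") -> str:
    """Classify one first identifier as 'delete', 'restore', or 'special'.

    threshold and null_label are assumed values chosen for illustration.
    """
    if probability > threshold:
        # Confident prediction: drop the identifier or turn it into punctuation.
        return "delete" if label == null_label else "restore"
    # Probability <= threshold: treat the identifier as a special identifier.
    return "special"


def emotion_analysis_span(tokens: List[str], position: int,
                          special_marks: Set[str]) -> List[str]:
    """Return the tokens to analyze for the special identifier at tokens[position].

    If a special punctuation mark occurs before the identifier, only the text
    between the nearest such mark and the identifier is returned; otherwise
    all text before the identifier is returned.
    """
    for index in range(position - 1, -1, -1):
        if tokens[index] in special_marks:
            return tokens[index + 1:position]
    return tokens[:position]


marks = {"!", "?"}  # assumed set of preset special punctuation marks
span_with_mark = emotion_analysis_span(
    ["what", "?", "so", "great", "[mask]", "yes"], 4, marks)
span_without_mark = emotion_analysis_span(["so", "great", "[mask]"], 2, marks)
```

The returned span would then be passed to whatever emotion analysis component is used in order to choose the special punctuation mark.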

As a possible implementation manner, before the inserted text is input into the pre-trained punctuation prediction model, the method further includes:

acquiring a training text containing punctuation marks; the training text contains at least one word;

replacing random texts at any positions in the training texts through a second identifier to obtain replaced texts, wherein the random texts comprise words and/or punctuations;

inputting the replacement text into an initial recognition model containing initial parameters, and acquiring a second prediction result of the initial recognition model for each second identifier;

determining a predicted loss value of the initial recognition model according to the second prediction result of each second identifier;

and when the predicted loss value does not meet a preset convergence condition, iteratively updating the initial parameters of the initial recognition model until the predicted loss value meets the preset convergence condition, and recording the converged initial recognition model as the punctuation prediction model.
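The replacement step used to build training samples can be sketched as follows. The sketch assumes that the training text is already split into word and punctuation tokens, that the second identifier is the string `[mask]`, and that roughly 15% of positions are replaced; none of these values is stated in the text, and the function name is hypothetical.

```python
import random
from typing import List, Optional, Tuple


def mask_training_text(tokens: List[str], ratio: float = 0.15,
                       marker: str = "[mask]",
                       seed: Optional[int] = None) -> Tuple[List[str], List[Tuple[int, str]]]:
    """Replace randomly chosen tokens (words and/or punctuation) with `marker`.

    Returns the replaced text and a list of (position, original token) pairs
    that serve as the targets for the second prediction results.
    ratio and marker are assumptions made for illustration.
    """
    if not tokens:
        raise ValueError("the training text must contain at least one word")
    rng = random.Random(seed)
    count = min(len(tokens), max(1, round(len(tokens) * ratio)))
    positions = sorted(rng.sample(range(len(tokens)), count))
    replaced = list(tokens)
    targets = []
    for position in positions:
        targets.append((position, tokens[position]))
        replaced[position] = marker
    return replaced, targets


training_tokens = ["the", "patient", ",", "has", "fever", "."]
replaced_tokens, targets = mask_training_text(training_tokens, seed=0)
```

The initial recognition model would then be asked to predict the original token at each replaced position, and the predicted loss value would be computed from those predictions.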

As one possible implementation manner, the determining a predicted loss value of the initial recognition model according to the second prediction result of each second identifier includes:

determining a cross entropy loss value and a contrastive learning loss value of the initial recognition model according to the second prediction result of each second identifier;

and carrying out weighted summation calculation on the cross entropy loss value and the contrastive learning loss value to obtain a predicted loss value of the initial recognition model.
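As a minimal sketch of the weighted summation, the following computes a mean cross entropy from per-identifier probability sets and combines it with a contrastive learning loss value. The weights 1.0 and 0.1 and the function names are assumptions; the text does not give concrete weights.

```python
import math
from typing import Dict, List


def cross_entropy(prediction_sets: List[Dict[str, float]],
                  true_labels: List[str]) -> float:
    """Mean negative log-probability assigned to the true label of each identifier."""
    losses = [-math.log(probs[label])
              for probs, label in zip(prediction_sets, true_labels)]
    return sum(losses) / len(losses)


def predicted_loss(ce_loss: float, contrastive_loss: float,
                   ce_weight: float = 1.0,
                   contrastive_weight: float = 0.1) -> float:
    """Weighted sum of the two loss values (the weights are assumed)."""
    return ce_weight * ce_loss + contrastive_weight * contrastive_loss


ce_value = cross_entropy([{"comma": 0.5, "period": 0.5}], ["comma"])
total_loss = predicted_loss(ce_value, 2.0)
```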

As one possible implementation, determining a contrastive learning loss value of the initial recognition model according to the second prediction result of each of the second identifiers includes:

acquiring the initial vector features corresponding to each second prediction result in the initial recognition model; the initial vector features are obtained by extracting vector features of the initial recognition model on the second identifier;

for each second prediction result, determining a same prediction result set and a difference prediction result set from the plurality of second prediction results;

performing similarity calculation on the initial vector features corresponding to the second predicted results and the initial vector features of each second predicted result in the same predicted result set to obtain first similarity of the second predicted result relative to the same predicted result set;

performing similarity calculation on the initial vector features corresponding to the second predicted results and the initial vector features of each second predicted result in the difference predicted result set to obtain second similarity of the second predicted result relative to the difference predicted result set;

and obtaining a contrastive learning loss value of the initial recognition model according to the first similarity and the second similarity.
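One way to turn the first and second similarities into a loss value is sketched below. The text does not give a formula, so everything specific here is an assumption: cosine similarity as the similarity measure, averaging over each set, the temperature 0.5, and the softmax-style form of the loss. The sketch only illustrates the idea that the loss should fall when features of identical predictions are close and features of differing predictions are far apart.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two vectors (0.0 if either vector is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def mean_similarity(anchor: Sequence[float],
                    others: List[Sequence[float]]) -> float:
    """Average cosine similarity between the anchor and every vector in `others`."""
    return sum(cosine(anchor, other) for other in others) / len(others)


def contrastive_loss(features: List[Sequence[float]], labels: List[str],
                     temperature: float = 0.5) -> float:
    """Assumed loss form: -log(exp(s1/t) / (exp(s1/t) + exp(s2/t))), averaged.

    s1 is the similarity to the same prediction result set and s2 the
    similarity to the difference prediction result set.
    """
    losses = []
    for i, anchor in enumerate(features):
        same = [f for j, f in enumerate(features)
                if j != i and labels[j] == labels[i]]
        diff = [f for j, f in enumerate(features) if labels[j] != labels[i]]
        if not same or not diff:
            continue  # nothing to contrast against for this prediction
        first = math.exp(mean_similarity(anchor, same) / temperature)
        second = math.exp(mean_similarity(anchor, diff) / temperature)
        losses.append(-math.log(first / (first + second)))
    return sum(losses) / len(losses) if losses else 0.0


predicted = ["comma", "comma", "period", "period"]
separated = contrastive_loss([[1, 0], [1, 0], [0, 1], [0, 1]], predicted)
mixed = contrastive_loss([[1, 0], [0, 1], [1, 0], [0, 1]], predicted)
```

With these toy features, `separated` is smaller than `mixed`, which is the behavior a contrastive term is meant to reward.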

In a second aspect, an embodiment of the present application provides a punctuation mark recovery apparatus, the apparatus including:

the acquisition module is used for acquiring a text to be restored and performing word segmentation processing on the text to be restored to obtain a word segmentation text containing a plurality of words; the text to be restored does not have punctuation marks;

the inserting module is used for inserting a first identifier between every two words in the word segmentation text to obtain an inserted text;

the input module is used for inputting the inserted text into a punctuation prediction model trained in advance, and obtaining a first prediction result of the punctuation prediction model aiming at each first identifier;

the processing module is used for performing punctuation recovery processing on the inserted text according to the first prediction result of each first identifier to obtain a target text; the target text has punctuation marks.

In a third aspect, an embodiment of the present application provides an electronic device, including: the system comprises a processor and a memory, wherein the processor is used for executing a punctuation recovery program stored in the memory so as to realize the punctuation recovery method in the first aspect.

In a fourth aspect, embodiments of the present application provide a storage medium storing one or more programs executable by one or more processors to implement the punctuation recovery method of any of the first aspects.

According to the technical scheme provided by the embodiment of the application, the text to be restored is obtained, word segmentation processing is carried out on the text to be restored to obtain the word segmentation text containing a plurality of words, the text to be restored does not have punctuation marks, first identifiers are inserted between every two words in the word segmentation text to obtain the inserted text, the inserted text is input into a pre-trained punctuation prediction model to obtain a first prediction result of the punctuation prediction model aiming at each first identifier, punctuation restoration processing is carried out on the inserted text according to the first prediction result of each first identifier to obtain the target text, and the target text has the punctuation marks.

According to the technical scheme, the first identifier is inserted between any two words in the text to be recovered by taking the words as granularity, and the first identifier is predicted by the punctuation prediction model trained in advance, so that the specific punctuation mark of the first identifier is determined to recover the punctuation mark in the text to be recovered, and compared with the classification of each word element in the text to be recovered, the punctuation mark can be recovered more efficiently and accurately, and the efficiency and the accuracy of punctuation mark recovery are improved.

Drawings

The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the invention and together with the description, serve to explain the principles of the invention.

In order to more clearly illustrate the embodiments of the invention or the technical solutions of the prior art, the drawings which are used in the description of the embodiments or the prior art will be briefly described, and it will be obvious to a person skilled in the art that other drawings can be obtained from these drawings without inventive effort.

One or more embodiments are illustrated by way of example and not limitation in the figures of the accompanying drawings, in which like references indicate similar elements, and in which the figures of the drawings are not to be taken in a limiting sense, unless otherwise indicated.

FIG. 1 is a flowchart of an embodiment of a punctuation recovery method according to an embodiment of the present application;

FIG. 2 is a flowchart of obtaining a first prediction result of each first identifier through a punctuation prediction model according to an embodiment of the present application;

FIG. 3 is a flowchart of another embodiment of a punctuation recovery method according to an embodiment of the present application;

FIG. 4 is a flowchart of obtaining a second prediction result of each second identifier through an initial recognition model according to an embodiment of the present application;

FIG. 5 is a flowchart of an embodiment of a method for determining a predicted loss value according to an embodiment of the present application;

FIG. 6 is a block diagram of an embodiment of a punctuation mark recovery device according to an embodiment of the present application;

FIG. 7 is a schematic structural diagram of an electronic device according to an embodiment of the present application.

Detailed Description

For the purpose of making the objects, technical solutions and advantages of the embodiments of the present application more apparent, the technical solutions of the embodiments of the present application will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present application, and it is apparent that the described embodiments are some embodiments of the present application, but not all embodiments of the present application. All other embodiments, which can be made by those skilled in the art based on the embodiments of the application without making any inventive effort, are intended to be within the scope of the application.

The following disclosure provides many different embodiments, or examples, for implementing different structures of the application. In order to simplify the present disclosure, components and arrangements of specific examples are described below. They are, of course, merely examples and are not intended to limit the application. Furthermore, the present application may repeat reference numerals and/or letters in the various examples. This repetition is for the purpose of simplicity and clarity and does not in itself dictate a relationship between the various embodiments and/or configurations discussed.

In the prior art, punctuation recovery processes a speech text without punctuation marks into individual tokens (word elements) through a word segmentation device and predicts, by probability, whether a punctuation mark follows each token; that is, every token in the speech text is classified, which is inefficient and error-prone. To solve this technical problem, the present application provides a punctuation mark recovery method.

The punctuation recovery method provided by the application is further explained by the specific embodiments with reference to the drawings, and the embodiments do not limit the embodiments of the application.

Referring to fig. 1, a flowchart of an embodiment of a punctuation recovery method is provided in an embodiment of the present application. As shown in fig. 1, the process may include the steps of:

step 101, obtaining a text to be restored, and performing word segmentation processing on the text to be restored to obtain a word segmentation text containing a plurality of words, wherein the text to be restored does not have punctuation marks.

The punctuation marks mentioned above refer to the punctuation marks used in everyday text, which may include, but are not limited to, the comma, period, semicolon, and enumeration comma (pause mark). The text to be restored refers to a text whose punctuation marks need to be restored, that is, a text without punctuation marks. A word refers to a unit capable of completely expressing semantics; it may be a single character or may consist of two or more characters, which is not limited in the embodiment of the present application. The word segmentation text refers to the text obtained by performing word segmentation processing on the text to be restored, and it contains a plurality of words.

In an embodiment, the text to be restored may be a pre-stored text without punctuation marks. Based on the above, the execution body of the embodiment of the application can acquire the text to be restored without punctuation marks from the preset storage medium. In another embodiment, the text to be recovered may be a voice text recognized by the voice recognition system and not including any punctuation marks. Based on the above, the execution subject of the embodiment of the present application may obtain the recognized voice text from the voice recognition system, and use the voice text as the text to be restored.

In practical applications, a word can express complete semantics, whereas a single character used as the granularity often cannot. Therefore, after obtaining the text to be restored, the execution body of the embodiment of the present application may perform word segmentation processing on it, so as to determine the different words it contains more accurately and obtain a word segmentation text containing a plurality of words. As an exemplary embodiment, a word segmentation tool may be used to perform word segmentation on the text to be restored. In this embodiment, the word segmentation tool is preferably LTP. Before the word segmentation task is executed, the word segmentation tool may be trained for a specific application scenario so that it automatically recognizes the vocabulary commonly used in that scenario, which avoids word segmentation errors and further improves the accuracy of punctuation mark recovery.

Step 102, inserting a first identifier between every two words in the word segmentation text to obtain an inserted text.

The first identifier marks a location where a punctuation mark may be present; that is, if the first identifier is inserted between two words, a punctuation mark may be present between those two words. The first identifier may be a preset special symbol (e.g., [mask]) or an English letter, which is not limited in the embodiment of the present application.

In the embodiment of the present application, when punctuation recovery is performed on the text to be restored, words may be used as the granularity to improve recovery efficiency: a first identifier is inserted between every two words of the word segmentation text to obtain the inserted text. The punctuation mark between the two words on either side of each first identifier can then be determined by determining the specific punctuation mark of that first identifier. How the specific punctuation mark of a first identifier is determined is described below and is not detailed here.
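The insertion step can be illustrated with a short sketch. It assumes that the word segmentation text is available as a list of words and that the first identifier is the string `[mask]`; the function name is hypothetical.

```python
from typing import List

FIRST_IDENTIFIER = "[mask]"  # assumed marker; any preset symbol would do


def insert_identifiers(words: List[str],
                       marker: str = FIRST_IDENTIFIER) -> List[str]:
    """Return the word sequence with `marker` placed between every two adjacent words."""
    inserted: List[str] = []
    for index, word in enumerate(words):
        if index > 0:
            inserted.append(marker)
        inserted.append(word)
    return inserted


segmented = ["today", "weather", "good", "we", "go", "out"]
inserted_text = insert_identifiers(segmented)
```

For a text of n words this produces n - 1 identifiers, each of which is later resolved either to a punctuation mark or to nothing.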

In addition, to reduce the length of the inserted text and improve punctuation recovery efficiency, emotion analysis may be performed on each pair of adjacent words in the word segmentation text to determine the degree of strong association between the two adjacent words.

Optionally, if the degree of association between two adjacent words is high (for example, greater than a preset association threshold), the first identifier is not inserted between them. Conversely, if the degree of association is low (for example, less than or equal to the preset association threshold), the first identifier may be inserted between them.

For example, assuming that two adjacent words are "dangerous" and "industry", word analysis can determine that the association between the two words is relatively strong, so no first identifier needs to be inserted between them. Assuming instead that two adjacent words are "food" and "sun", word analysis can determine that the association between them is weak, so a first identifier can be inserted. Word analysis judges the semantic relationship and the part-of-speech relationship of two words. Because semantic and part-of-speech relationships hold between consecutive words, and words always appear sequentially in the text, only pairs of adjacent words need to be judged when determining the strong association relationship; words that are far apart need not be judged.
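The threshold rule for skipping strongly associated word pairs can be sketched as follows. The association scorer is passed in as a callable because the text does not specify how the score is computed; the toy lookup table, the threshold 0.5, and the function names are assumptions made for illustration only.

```python
from typing import Callable, List


def insert_identifiers_selectively(words: List[str],
                                   association: Callable[[str, str], float],
                                   threshold: float = 0.5,
                                   marker: str = "[mask]") -> List[str]:
    """Insert `marker` only between adjacent words whose association is <= threshold."""
    inserted: List[str] = []
    for index, word in enumerate(words):
        if index > 0 and association(words[index - 1], word) <= threshold:
            inserted.append(marker)
        inserted.append(word)
    return inserted


def toy_association(left: str, right: str) -> float:
    """Hypothetical stand-in for the word analysis: a fixed lookup table."""
    strong_pairs = {("dangerous", "industry"): 0.9}
    return strong_pairs.get((left, right), 0.1)


selective_text = insert_identifiers_selectively(
    ["dangerous", "industry", "food", "sun"], toy_association)
```

Here no identifier is placed inside "dangerous industry", while identifiers are placed at the two weakly associated boundaries, which shortens the input passed to the prediction model.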

Step 103, inputting the inserted text into a pre-trained punctuation prediction model to obtain a first prediction result of the punctuation prediction model for each first identifier.

The punctuation prediction model is a model which is obtained in advance and used for predicting punctuation marks corresponding to the first identifier.

The first prediction result refers to a prediction result corresponding to each first identifier, which is used for predicting a punctuation mark corresponding to the first identifier, where the first prediction results corresponding to the first identifiers inserted into different positions in the text may be the same or different, which is not limited in the embodiment of the present application.

It may be appreciated that the location of the first identifier may or may not have punctuation, and when the location of the first identifier has punctuation, the first prediction result may be punctuation; when the punctuation mark does not exist in the location of the first identifier, the first prediction result may be a null punctuation mark, for example, "null", which is not limited in this embodiment of the present application.

As can be seen from step 102, in the embodiment of the present application, by inserting a first identifier between every two words in the segmented text and determining the specific content of each first identifier, it is determined whether or not there is a punctuation mark between the two words inserted with the first identifier, and in particular which punctuation mark.

Based on this, in an embodiment, the execution subject of the embodiment of the present application obtains a predicted result (hereinafter referred to as a first predicted result for convenience of description) of each first identifier in the inserted text output by the punctuation prediction model by inputting the inserted text into the punctuation prediction model trained in advance. The specific training of the punctuation prediction model is described below by the flow shown in fig. 3, and will not be described in detail here.

As an exemplary embodiment, the punctuation prediction model may include a feature vector layer, a linear layer, and a normalized exponential function. The feature vector layer can be used for carrying out vector conversion on the input features to obtain vector features with preset dimensions; the linear layer can be used for reducing the dimension of the vector feature obtained by the feature vector layer; and predicting the vector features with reduced dimensionality by the normalized exponential function to obtain probability values of the input features corresponding to each preset value.

Based on this, the first prediction result corresponding to each first identifier may be obtained through the flow shown in fig. 2. Referring to fig. 2, a flowchart of obtaining a first prediction result of each first identifier through a punctuation prediction model is provided in an embodiment of the present application. As shown in fig. 2, the process may include the steps of:

Step 201, extracting vector features of the inserted text through a punctuation prediction model to obtain first vector features corresponding to each first identifier in the inserted text.

The first vector feature refers to a vector feature of a preset dimension extracted for each first identifier in the inserted text. As can be seen from the above description, the feature vector layer may be used to perform vector conversion on the input features to obtain feature vectors with preset dimensions. Based on this, in the embodiment of the present application, after the inserted text is input into the punctuation prediction model, the punctuation prediction model may perform vector feature extraction on each first identifier of the inserted text through the feature vector layer, so as to obtain a first vector feature corresponding to each first identifier.

Step 202, performing dimension reduction processing on the first vector features corresponding to each first identifier to obtain second vector features corresponding to the first identifiers.

The second vector feature refers to a preset dimension vector feature after the first vector feature is subjected to the dimension reduction processing, that is, a vector dimension corresponding to the second vector feature is smaller than a vector dimension corresponding to the first vector feature.

In an embodiment, the feature dimension of the second vector feature may be consistent with the number of candidate punctuation prediction results, and each dimension may correspond to one candidate. For example, if the candidates are the punctuation set [comma, period, question, other], representing the comma, period, question mark, and other (non-punctuation) respectively, then the feature dimension of the second vector feature may be 4.

From the above description, the punctuation prediction model may include a linear layer that may be used to reduce the dimensionality of the vector features obtained by the feature vector layer. Based on this, in the embodiment of the present application, the first vector feature corresponding to each first identifier may be subjected to the dimension reduction processing through the linear layer of the punctuation prediction model, so as to obtain the second vector feature corresponding to each first identifier.
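The dimension reduction performed by the linear layer can be illustrated with a small pure-Python sketch. The 6-dimensional input, the weight matrix, and the bias are toy values chosen for the example and are not taken from the text; in a trained model they would be learned parameters and the input dimension would be much larger.

```python
from typing import List, Sequence


def linear_layer(feature: Sequence[float],
                 weights: Sequence[Sequence[float]],
                 bias: Sequence[float]) -> List[float]:
    """Project `feature` to len(weights) dimensions: out[j] = sum_i w[j][i] * x[i] + b[j]."""
    return [sum(w * x for w, x in zip(row, feature)) + b
            for row, b in zip(weights, bias)]


# Toy first vector feature (6 dimensions) and a 4 x 6 projection, one output
# dimension per candidate in [comma, period, question, other].
first_vector = [1.0, 2.0, 0.0, -1.0, 0.5, 0.0]
toy_weights = [
    [1.0, 0.0, 0.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0, 2.0, 0.0],
]
toy_bias = [0.0, 0.5, 0.0, 0.0]
second_vector = linear_layer(first_vector, toy_weights, toy_bias)
```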

Step 203, performing normalization processing on the second vector feature to obtain a punctuation mark prediction set corresponding to the first identifier, where the punctuation mark prediction set includes at least one punctuation mark, and one punctuation mark corresponds to one probability value.

The punctuation mark prediction set refers to the predicted probability of the first identifier for each of the preset punctuation marks, so the punctuation mark prediction set may include at least one punctuation mark, with each punctuation mark corresponding to one probability value. It will be appreciated that the preset punctuation marks may be set differently for different fields. For example, in the Chinese medical field the commonly used punctuation marks include the comma, enumeration comma (pause mark), semicolon, and period, so the preset punctuation marks may include only these few marks to improve efficiency.

For example, assume that the preset punctuation marks are comma, period, question, and other, which represent the comma, period, question mark, and other (non-punctuation), respectively, and that the predicted probability values are 0.2, 0.5, 0.1, and 0.2. The punctuation mark prediction set may then be {comma: 0.2, period: 0.5, question: 0.1, other: 0.2}.

It should be noted that, each of the first identifiers may correspond to one punctuation mark prediction set, that is, when predicting each of the first identifiers, the probabilities that the first identifier belongs to the four types of punctuation marks may be predicted, so as to form the punctuation mark prediction set corresponding to the first identifier.

As can be seen from the above description, the punctuation prediction model may include a normalized exponential function, where the normalized exponential function predicts the vector feature after the dimensionality reduction to obtain a probability value of the input feature corresponding to each preset value. Based on this, in the embodiment of the present application, the normalization function of the punctuation prediction model may normalize the second vector feature corresponding to each first identifier to obtain the punctuation prediction set corresponding to each first identifier.
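The normalization step corresponds to applying the normalized exponential function (softmax) to the second vector feature. A minimal sketch follows; the label names and the input scores are illustrative assumptions.

```python
import math
from typing import Dict, List, Sequence

LABELS = ["comma", "period", "question", "other"]  # assumed preset punctuation marks


def softmax(scores: Sequence[float]) -> List[float]:
    """Normalized exponential function; subtracting the maximum avoids overflow."""
    peak = max(scores)
    exps = [math.exp(score - peak) for score in scores]
    total = sum(exps)
    return [value / total for value in exps]


def prediction_set(second_vector: Sequence[float],
                   labels: Sequence[str] = LABELS) -> Dict[str, float]:
    """Map each preset punctuation mark to its probability for one first identifier."""
    return dict(zip(labels, softmax(second_vector)))


example_set = prediction_set([1.0, 2.5, -1.0, 1.0])
```

The probabilities in `example_set` sum to 1, and the largest input score (here the one for `period`) receives the largest probability.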

Step 204, determining the punctuation mark with the maximum probability value in the punctuation mark prediction set as a first prediction result of the first identifier.

As can be seen from the above description, the punctuation prediction set corresponding to each first identifier may include a probability value corresponding to each preset punctuation of the first identifier. Thus, the punctuation mark having the largest probability value in the punctuation mark prediction set may be determined as the first prediction result of the first identifier.

Illustratively, assume that the punctuation mark prediction set corresponding to one first identifier is {comma: 0.2, period: 0.5, question: 0.1, other: 0.2}. Since the probability value corresponding to the period is the largest, the period can be determined as the first prediction result corresponding to that first identifier. Assume further that the punctuation mark prediction set corresponding to another first identifier is {comma: 0.2, period: 0.1, question: 0.1, other: 0.6}. Since the probability value corresponding to other (non-punctuation) is the largest, a null punctuation mark (e.g., a space or null) can be determined as the first prediction result corresponding to that first identifier.

Further, if at least two maximum probability values exist, determining whether a probability value corresponding to the non-punctuation mark exists in the at least two maximum probability values, and if the probability value corresponding to the non-punctuation mark exists, directly determining the probability value corresponding to the non-punctuation mark as a final maximum probability value; if the probability value corresponding to the non-punctuation mark does not exist, a maximum probability value can be randomly selected as a final maximum probability value, and after the punctuation mark corresponding to the maximum probability value is determined to be a first predicted result, the first predicted result is marked, so that a user can further judge and correct the first predicted result.
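The selection rule, including the handling of tied maximum probability values, can be sketched as follows. The function name and the label `other` for the non-punctuation case are assumptions, and the sketch assumes that only a randomly chosen result is marked for user review, which is one reading of the description above.

```python
import random
from typing import Dict, Optional, Tuple


def pick_first_prediction(prediction_set: Dict[str, float],
                          null_label: str = "other",
                          rng: Optional[random.Random] = None) -> Tuple[str, float, bool]:
    """Return (label, probability, flagged_for_review) for one first identifier.

    Ties are detected by exact equality of probability values, which is
    adequate for this illustration but may need a tolerance in practice.
    """
    rng = rng or random.Random()
    best = max(prediction_set.values())
    tied = [label for label, value in prediction_set.items() if value == best]
    if len(tied) == 1:
        return tied[0], best, False
    if null_label in tied:
        # A tie that involves the non-punctuation label resolves to it.
        return null_label, best, False
    # Otherwise choose one tied mark at random and mark it for the user.
    return rng.choice(tied), best, True


clear_case = pick_first_prediction(
    {"comma": 0.2, "period": 0.5, "question": 0.1, "other": 0.2})
tie_with_null = pick_first_prediction(
    {"comma": 0.4, "period": 0.1, "question": 0.1, "other": 0.4})
tie_without_null = pick_first_prediction(
    {"comma": 0.4, "period": 0.4, "question": 0.1, "other": 0.1},
    rng=random.Random(0))
```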

The description of the flow shown in fig. 2 is completed so far.

Step 104, performing punctuation recovery processing on the inserted text according to the first prediction result of each first identifier to obtain a target text, wherein the target text is provided with punctuation marks.

The target text refers to a text with punctuation marks after punctuation recovery processing is carried out on the inserted text, namely, the target text has the same text content as the text to be recovered, but has the punctuation marks, and has strong readability.

In the embodiment of the application, after the first prediction result of each first identifier in the inserted text is obtained, punctuation recovery processing can be carried out on the inserted text according to the first prediction result of each first identifier, so that the target text after the punctuation is recovered is obtained.

As an exemplary embodiment, it may be determined whether the first prediction result is a null punctuation mark, and if the first prediction result is a null punctuation mark, deleting a first identifier corresponding to the first prediction result in the inserted text; and if the first predicted result is not the empty punctuation mark, restoring the corresponding first identifier into the first predicted result in the inserted text.
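A minimal sketch of this restoration step is given below. The identifier string `[P]`, the label-to-mark table `MARKS`, and the representation of the inserted text as a token list are assumptions made for illustration only.

```python
FIRST_ID = "[P]"  # hypothetical first identifier
MARKS = {"comma": ",", "period": ".", "query": "?"}  # "other" means empty punctuation

def restore(inserted_tokens, predictions):
    """Replace each first identifier with its predicted mark, or delete it."""
    remaining = iter(predictions)
    target = []
    for token in inserted_tokens:
        if token != FIRST_ID:
            target.append(token)
            continue
        mark = MARKS.get(next(remaining))  # None for the empty punctuation mark
        if mark is not None:
            target.append(mark)
    return target
```

For instance, `restore(["how", "[P]", "are", "[P]", "you", "[P]"], ["other", "other", "query"])` deletes the first two identifiers and restores the third one as a question mark.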

Further, in practical applications, special situations may make punctuation recovery inaccurate. For example, the punctuation mark that actually belongs at the position of a first identifier may not be in the preset punctuation prediction set; in this case, the above method would still determine the first prediction result to be one of the punctuation marks in the punctuation prediction set.

For example, assume that the punctuation prediction set corresponding to a first identifier is { comma:0.3, period:0.2, query: 0.25, other:0.25}. Because the punctuation mark corresponding to the maximum probability value is taken as the first prediction result, the first prediction result corresponding to the first identifier is a comma. If, in practical application, the punctuation mark that belongs at the position of the first identifier is an exclamation mark, the punctuation recovered for the inserted text is inaccurate.

Based on this, in order to avoid the above situation and improve the accuracy of punctuation recovery, in the embodiment of the present application, a probability threshold may be set, and if the probability value corresponding to the first prediction result is smaller than the probability threshold, it is indicated that the punctuation corresponding to the first identifier is not the first prediction result.

Optionally, since the probability values in the punctuation mark prediction set output by the punctuation prediction model for each first identifier sum to 1, a basic probability threshold may be obtained by dividing 1 by the number of punctuation marks in the punctuation mark prediction set, and the probability threshold may be determined based on the basic probability threshold. For example, if the number of punctuation marks in the punctuation mark prediction set is 4, the basic probability threshold is 0.25, so the probability threshold can be set to 0.3, 0.35, etc.

Based on this, in an embodiment, it may be determined whether the probability value corresponding to the first prediction result is greater than the preset probability threshold.

Optionally, if it is determined that the probability value corresponding to the first predicted result is greater than the probability threshold, determining whether the first predicted result is a null punctuation mark, and if the first predicted result is a null punctuation mark, deleting a first identifier corresponding to the first predicted result in the inserted text; and if the first predicted result is not the empty punctuation mark, restoring the corresponding first identifier into the first predicted result in the inserted text.

In contrast, in the case where it is determined that the probability value corresponding to the first prediction result is less than or equal to the preset probability threshold, the location where the first identifier is located may be determined to be a special punctuation mark (i.e., the punctuation mark at the location of the first identifier is not any punctuation mark in the punctuation mark prediction set at this time), and thus the first identifier may be determined to be a special identifier.
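The threshold check can be sketched as follows. The `margin` added to the basic probability threshold and both function names are assumptions; the text only states that the threshold is derived from the basic threshold (for example 0.3 or 0.35 for a basic threshold of 0.25).

```python
def probability_threshold(num_labels, margin=0.05):
    """Basic threshold 1/num_labels plus an assumed margin."""
    return 1.0 / num_labels + margin

def classify_identifier(prediction_set, threshold):
    """Return ("normal", label) or ("special", None) for one first identifier."""
    label = max(prediction_set, key=prediction_set.get)
    if prediction_set[label] > threshold:
        return "normal", label
    # Probability less than or equal to the threshold: treat as a special identifier.
    return "special", None
```

With a threshold of 0.35, the set { comma:0.3, period:0.2, query: 0.25, other:0.25} yields a special identifier, while a set whose period probability is 0.5 is handled by the normal restoration path.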

Thereafter, it may be determined whether a preset special punctuation mark (i.e., a punctuation mark that is not in the punctuation mark prediction set) is present before the special identifier.

Optionally, in the case that a special punctuation mark exists before the special identifier, emotion analysis can be performed on the text between that special punctuation mark and the special identifier, that is, on the text after the special punctuation mark and before the special identifier, so as to determine the special punctuation mark (such as an exclamation mark or a question mark) corresponding to the special identifier.

Conversely, in the case where no special punctuation mark exists before a special identifier, emotion analysis may be performed on all text before the special identifier, thereby determining a special punctuation mark (e.g., an exclamation mark or question mark) corresponding to the special identifier.
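The choice of the text span handed to emotion analysis in the two cases above can be sketched as follows. The emotion analysis itself is not shown; the set `SPECIAL_MARKS`, the function name, and the token-list representation are assumptions for illustration.

```python
SPECIAL_MARKS = {"!", ";"}  # assumed marks outside the punctuation prediction set

def emotion_context(tokens, special_index):
    """Return the tokens to analyse for the special identifier at special_index.

    If a special punctuation mark occurs earlier, only the text after the
    nearest such mark is returned; otherwise all preceding text is returned.
    """
    start = 0
    for i in range(special_index - 1, -1, -1):
        if tokens[i] in SPECIAL_MARKS:
            start = i + 1
            break
    return tokens[start:special_index]
```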

According to the technical scheme provided by the embodiment of the application, the text to be restored is obtained, word segmentation processing is carried out on the text to be restored to obtain the word segmentation text containing a plurality of words, the text to be restored does not have punctuation marks, first identifiers are inserted between every two words in the word segmentation text to obtain the inserted text, the inserted text is input into a pre-trained punctuation prediction model to obtain a first prediction result of the punctuation prediction model aiming at each first identifier, punctuation restoration processing is carried out on the inserted text according to the first prediction result of each first identifier to obtain the target text, and the target text has the punctuation marks. According to the technical scheme, the first identifier is inserted between any two words in the text to be recovered by taking the words as granularity, and the first identifier is predicted by the punctuation prediction model trained in advance, so that the specific punctuation mark of the first identifier is determined to recover the punctuation mark in the text to be recovered, and compared with the classification of each word element in the text to be recovered, the punctuation mark can be recovered more efficiently and accurately, and the efficiency and the accuracy of punctuation mark recovery are improved.

Referring to fig. 3, a flowchart of an embodiment of another punctuation recovery method according to an embodiment of the present application is provided. The flow shown in fig. 3 describes how the punctuation predictive model is trained specifically on the basis of the flow shown in fig. 1. As shown in fig. 3, the process may include the steps of:

Step 301, obtaining training text containing punctuation marks, wherein the training text contains at least one word.

The training text may be a text containing punctuation marks, and the training text may be a normal text or a text obtained by performing speech recognition conversion on a daily session, which is not limited in the embodiment of the present application. It should be noted that the training text may include at least one word, and punctuation marks included in the training text are all correct punctuation marks.

In one implementation, the training text may be pre-stored text that includes the correct punctuation marks. Based on the above, the execution body of the embodiment of the application can obtain the training text containing the correct punctuation marks from the preset storage medium.

In another embodiment, the training text may be a voice text recognized by the voice recognition system and correctly marked with punctuation marks by the user. Based on this, the execution subject of the embodiment of the present application may obtain the recognized voice text from the voice recognition system, and after the user correctly marks the punctuation on the voice text, use the voice text marked with the punctuation as the training text.

It should be noted that, if the punctuation prediction model to be trained is intended for a specific scene, for example, a medical scene or a government scene, text from the corresponding scene may be collected as the training text; if no specific scene is required, the training text can be collected from any source.

Step 302, replacing random texts at any positions in the training texts through the second identifiers to obtain replaced texts, wherein the random texts comprise words and/or punctuation marks.

The random text refers to text at any position in the training text, which may be a word or a punctuation mark; the embodiment of the present application is not limited thereto. The second identifier is used for replacing random text at any position in the training text. The second identifier may be a predetermined special symbol (e.g., [ mask ]), or may be an English letter, which is not limited in the embodiment of the present application.

In the embodiment of the application, in order to train the punctuation prediction model so that it can correctly predict whether each position in the input text is a punctuation mark and, if so, which punctuation mark, the execution subject of the embodiment of the application can replace the random text at any position in the training text with the second identifier to obtain a replacement text, and the punctuation prediction model is then trained according to the replacement text.

For example, assume that a training text is "Chinese medicine treatment takes tonifying spleen and lung and replenishing qi as a method, and the recipe is modified by the decoction for tonifying middle-jiao and Qi, and the recipe comprises 15g of radix pseudostellariae, 15g of bighead atractylodes rhizome, 10g of dried orange peel, 10g of roughhaired holly root and 15g of rhizoma pinellinae praeparata." Assuming that the second identifier is [ mask ], after random replacement with [ mask ] according to the method in this step, the training text may be converted into the replacement text "a Chinese medicine recipe for tonifying [ mask ] and invigorating qi and opening sound, wherein the Chinese medicine recipe for tonifying middle-jiao and Qi decoction is modified by adding and subtracting the [ mask ] recipe, and the Chinese medicine recipe comprises the following components of radix pseudostellariae [ mask ], bighead atractylodes rhizome 15g [ mask ] dried orange peel 10g, roughhaired holly 10g and rhizoma pinellinae praeparata 15g.".
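The replacement of step 302 can be sketched as follows, assuming the training text has already been split into word and punctuation tokens. The replacement `ratio`, the function name, and the returned `labels` mapping (position to original token, kept for later loss calculation) are assumptions and are not specified in the text.

```python
import random

SECOND_ID = "[mask]"  # second identifier used in the example above

def mask_training_text(tokens, ratio=0.3, rng=None):
    """Replace randomly chosen tokens with the second identifier.

    Returns the replacement text and a mapping from each replaced
    position to the original word or punctuation mark.
    """
    rng = rng or random.Random()
    count = min(len(tokens), max(1, round(len(tokens) * ratio)))
    positions = sorted(rng.sample(range(len(tokens)), count))
    replaced = list(tokens)
    labels = {}
    for pos in positions:
        labels[pos] = tokens[pos]
        replaced[pos] = SECOND_ID
    return replaced, labels
```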

Step 303, inputting the replacement text into an initial recognition model containing initial parameters, and obtaining a second prediction result of the initial recognition model for each second identifier.

Step 304, determining a predicted loss value of the initial recognition model according to the second predicted result of each second identifier.

Step 305, determining whether the predicted loss value is greater than a preset convergence condition, if yes, executing step 306; if not, step 307 is performed.

Step 306, iteratively updating the initial parameters of the initial recognition model, and returning to step 303.

Step 307, recording the converged initial recognition model as a punctuation prediction model.

The following collectively describes steps 303 to 307:

The initial recognition model refers to a preset model that contains initial parameters and has not yet been trained. It will be appreciated that the initial recognition model may be derived as follows. A pre-training model RoBERTa-WWM (A Robustly Optimized BERT Pre-training Approach-Whole Word Masking) is obtained in advance and used to initialize the teacher model, i.e., a PMP model, of a conventional teacher-student model. Several coding layers are randomly selected from the teacher model and used to initialize the student model. Relevant training texts are collected and processed by identifier replacement in a similar way, and the processed texts are input into both the teacher model and the student model, so that each model outputs a prediction result for the text. The prediction result output by the teacher model is taken as the real label and the prediction result output by the student model as the prediction label; the loss between the real label and the prediction label is calculated through a mean square error function, and the student model is adjusted accordingly, so that the hidden-layer knowledge of the teacher model is distilled into the student model. In this way, the overall scale of the pre-training model or the teacher model can be reduced, and the finally obtained student model is the initial recognition model.
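The mean square error between the teacher output (real label) and the student output (prediction label) can be sketched as follows. Representing each output as a flat list of numbers is an assumption; the text does not specify the output shape.

```python
def mse_loss(teacher_output, student_output):
    """Mean square error between teacher and student outputs of equal length."""
    pairs = list(zip(teacher_output, student_output))
    return sum((t - s) ** 2 for t, s in pairs) / len(pairs)
```

The loss is 0 when the student reproduces the teacher exactly and grows as the two outputs diverge.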

The second prediction result refers to the result predicted for each second identifier in the replacement text and indicates the specific content of the second identifier, for example, whether the second identifier is a punctuation mark and, if so, which punctuation mark. It may be appreciated that the second prediction results corresponding to second identifiers at different positions in the replacement text may be the same or different, which is not limited in the embodiment of the present application.

The convergence condition refers to the criterion for judging, during training, whether the initial recognition model can be used as the punctuation prediction model. Optionally, the convergence condition may be that the predicted loss value is smaller than a loss value threshold, or that the number of training iterations reaches a preset threshold, which is not limited in the embodiment of the present application. In the embodiment of the application, the preset initial recognition model can be trained through a plurality of replacement texts to obtain the punctuation prediction model. That is, after the training of the initial recognition model on one replacement text reaches the preset convergence condition, another text can be selected and processed into a new replacement text, or the second identifiers can be placed at different positions in the same training text to form a new replacement text.

Specifically, the replacement text may be input into the initial recognition model containing the initial parameters, and the second prediction result of each second identifier in the replacement text output by the initial recognition model is obtained. Then, the predicted loss value of the initial recognition model is determined according to the second prediction result of each second identifier, and it is determined whether the predicted loss value meets the preset convergence condition. If it does, the currently converged initial recognition model is recorded as the punctuation prediction model; if it does not, the initial parameters of the current initial recognition model are iteratively updated using the training text until the predicted loss value meets the preset convergence condition, and the converged initial recognition model is recorded as the punctuation prediction model.
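The loop formed by steps 303 to 307 can be sketched as follows. The callables `compute_loss` and `update_parameters` stand in for the model's forward pass and parameter update and are hypothetical, as are the default threshold values; both convergence conditions named above (loss threshold and number of training iterations) are shown.

```python
def train_until_converged(compute_loss, update_parameters,
                          loss_threshold=0.05, max_rounds=1000):
    """Repeat steps 303 to 306 until a convergence condition is met.

    Returns the number of rounds performed and the final predicted loss value.
    """
    loss = compute_loss()              # steps 303 and 304
    rounds = 1
    while loss >= loss_threshold and rounds < max_rounds:  # step 305
        update_parameters()            # step 306
        loss = compute_loss()
        rounds += 1
    return rounds, loss                # step 307: model is considered converged

# Toy usage: a single parameter w whose "loss" is w squared.
state = {"w": 4.0}
rounds, final_loss = train_until_converged(
    lambda: state["w"] ** 2,
    lambda: state.update(w=state["w"] * 0.5),
)
```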

How the predicted loss value of the initial recognition model is determined based on the second prediction result is described below with reference to the flow shown in fig. 5 and is not detailed here.

As an exemplary embodiment, the initial recognition model may include a basic feature vector layer, a basic linear layer, and a basic normalized exponential function (softmax). The basic feature vector layer can be used to perform vector conversion on the input features to obtain basic vector features with preset dimensions; the basic linear layer can be used to reduce the dimensionality of the basic vector features obtained by the basic feature vector layer; and the basic normalized exponential function predicts on the dimension-reduced basic vector features to obtain the probability value of the input feature for each preset value.

Based on this, the second prediction result of the initial recognition model for each second identifier can be obtained through the flow shown in fig. 4. Referring to fig. 4, a flowchart is provided for obtaining a second prediction result of each second identifier through an initial recognition model according to an embodiment of the present application. As shown in fig. 4, the process may include the steps of:

Step 401, extracting vector features of the replacement text through the initial recognition model to obtain the first basic vector feature corresponding to each second identifier in the replacement text.

The first basic vector feature refers to a vector feature of a preset dimension extracted for each second identifier in the replacement text.

As can be seen from the above description, the basic feature vector layer may be used to perform vector conversion on the input features to obtain basic vector features with preset dimensions. Based on this, in the embodiment of the present application, after the replacement text is input into the initial recognition model, the initial recognition model may perform vector feature processing on each second identifier of the replacement text through the basic feature vector layer, so as to obtain the first basic vector feature corresponding to each second identifier.

Step 402, performing dimension reduction processing on the first basic vector feature corresponding to each second identifier to obtain the second basic vector feature corresponding to the second identifier.

The second basic vector feature refers to a vector feature of a preset dimension after the first basic vector feature is subjected to the dimension reduction processing, that is, a vector dimension corresponding to the second basic vector feature is smaller than a vector dimension corresponding to the first basic vector feature.

In an embodiment, the feature dimension of the second basic vector feature may be consistent with the number of punctuation prediction values, with each dimension corresponding to one prediction value. For example, if the prediction values are the punctuation set [ comma, period, query, other ], representing comma, period, question mark, and other (non-punctuation), the feature dimension of the second basic vector feature may be 4.

From the above description, the initial recognition model may include a basic linear layer that may be used to reduce the dimensionality of the basic vector features obtained by the basic feature vector layer. Based on this, in the embodiment of the present application, the first basic vector feature corresponding to each second identifier may be subjected to dimension reduction processing through the basic linear layer of the initial recognition model, so as to obtain the second basic vector feature corresponding to each second identifier.

Step 403, performing normalization processing on the second basic vector feature to obtain a basic punctuation mark prediction set corresponding to the second identifier, where the basic punctuation mark prediction set includes at least one punctuation mark, and one punctuation mark corresponds to one basic probability value.

The above-mentioned basic punctuation mark prediction set refers to the predicted values of the second identifier for each of the preset punctuation marks, so the basic punctuation mark prediction set may include at least one punctuation mark, and one punctuation mark corresponds to one basic probability value. It will be appreciated that the preset punctuation marks may be set according to the field. For example, in the Chinese medical field, the commonly used punctuation marks include the comma, the enumeration comma, the semicolon, the period, etc., so the preset punctuation marks may include only these punctuation marks in order to improve efficiency.

For example, assuming that the preset punctuation marks are comma, period, query and other, which represent commas, periods, question marks, and others (non-punctuation), respectively, and further assuming that the predicted probability values for these punctuation marks are 0.2, 0.5, 0.1, and 0.2, the basic punctuation mark prediction set may be { comma:0.2, period:0.5, query: 0.1, other:0.2}.

It should be noted that, each of the second identifiers may correspond to one base punctuation mark prediction set, that is, when predicting each of the second identifiers, the probabilities that the second identifier belongs to the four types of base punctuation marks may be predicted, so as to form the base punctuation mark prediction set corresponding to the second identifier.

As can be seen from the above description, the initial recognition model may include a basic normalized exponential function, which predicts on the dimension-reduced basic vector features to obtain the probability value of the input feature for each preset value. Based on this, in the embodiment of the present application, the second basic vector feature corresponding to each second identifier may be normalized by the basic normalized exponential function of the initial recognition model, so as to obtain the basic punctuation mark prediction set corresponding to each second identifier.
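Steps 402 and 403 can be sketched in plain Python as follows: a linear projection reduces the first basic vector feature to one score per preset punctuation mark, and the normalized exponential function turns the scores into probabilities. The weights, the bias, and the function names are placeholders chosen for illustration; in a trained model they would be learned parameters.

```python
import math

LABELS = ["comma", "period", "query", "other"]

def linear(features, weights, bias):
    """Project a feature vector to len(bias) dimensions (dimension reduction)."""
    return [sum(w * x for w, x in zip(row, features)) + b
            for row, b in zip(weights, bias)]

def softmax(scores):
    """Normalized exponential function; subtracting the maximum avoids overflow."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def prediction_set(features, weights, bias):
    """Return the basic punctuation mark prediction set for one second identifier."""
    return dict(zip(LABELS, softmax(linear(features, weights, bias))))
```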

Step 404, determining the punctuation mark with the largest basic probability value in the basic punctuation mark prediction set as the second prediction result of the second identifier.

As can be seen from the above description, the base punctuation mark prediction set corresponding to each second identifier may include a base probability value corresponding to each preset punctuation mark of the second identifier. Thus, the punctuation with the greatest base probability value in the base punctuation prediction set may be determined as the second prediction result of the second identifier.

For example, assume that the basic punctuation mark prediction set corresponding to one second identifier is { comma:0.2, period:0.5, query: 0.1, other:0.2}. Since the probability value corresponding to the period is the largest, the period can be determined as the second prediction result corresponding to that second identifier. Assume further that the basic punctuation mark prediction set corresponding to another second identifier is { comma:0.2, period:0.1, query: 0.1, other:0.6}. Since the probability value corresponding to other (non-punctuation) is the largest, a space can be determined as the second prediction result corresponding to that second identifier.

Further, if there are at least two maximum probability values, in order to train the initial recognition model more accurately, the punctuation marks corresponding to the at least two maximum probability values may each be used as a second prediction result, that is, the second identifier may correspond to at least two second prediction results.

According to the technical scheme provided by the embodiment of the application, a training text containing punctuation marks is obtained, the training text containing at least one word; random text at any position in the training text is replaced by the second identifier to obtain a replacement text, the random text comprising words and/or punctuation marks; the replacement text is input into an initial recognition model containing initial parameters to obtain a second prediction result of the initial recognition model for each second identifier; a predicted loss value of the initial recognition model is determined according to the second prediction result of each second identifier; and it is determined whether the predicted loss value is greater than a preset convergence condition. If yes, the initial parameters of the initial recognition model are iteratively updated and the process returns to continue determining the predicted loss value; if not, the converged initial recognition model is recorded as the punctuation prediction model. According to this technical scheme, the initial recognition model containing the initial parameters is trained with training text containing correct punctuation marks, and its initial parameters are continuously updated according to the predicted loss value, so that a more accurate punctuation prediction model is obtained, thereby improving the efficiency and accuracy of punctuation mark recovery.

Referring to fig. 5, a flowchart of an embodiment of a method for determining a predicted loss value is provided in an embodiment of the present application. The flow shown in fig. 5 describes how the predicted loss value of the initial recognition model is determined based on the second prediction result of each second identifier, specifically, on the basis of the flow shown in fig. 3. As shown in fig. 5, the process may include the steps of:

Step 501, determining a cross entropy loss value and a contrast learning loss value of the initial recognition model according to the second prediction result of each second identifier.

The cross entropy loss value refers to a loss value determined by a cross entropy function.

The contrast learning loss value is a loss value determined by contrast learning. Contrast learning is a new machine learning method that is divided into unsupervised contrast learning and supervised contrast learning according to whether the training set has labels. The supervised contrast learning used in the present application pulls the feature vector representations of data of the same class closer together in vector space and pushes the feature vector representations of data of different classes apart, so that a subsequent classifier can classify the data more easily.

In the embodiment of the application, after the second prediction result of each second identifier is obtained, punctuation recovery can be performed on the replacement text according to the second prediction results to form a recovered text. At this time, the recovered text has punctuation marks, although they are not necessarily accurate. Therefore, the cross entropy loss value and the contrast learning loss value can be determined according to the recovered text and the training text. That is, the cross entropy loss value and the contrast learning loss value of the initial recognition model are determined by checking whether the word or punctuation mark at each position in the recovered text that was originally replaced by a second identifier is the same as the word or punctuation mark at the corresponding position in the training text.

In an embodiment, the cross entropy loss value of the initial recognition model may be determined according to the second prediction result and the training text corresponding to each second identifier.

As an exemplary implementation manner, the execution body of the embodiment of the present application may perform punctuation recovery on the replacement text according to the second prediction result corresponding to each second identifier, so as to obtain a recovered text corresponding to the replacement text. The recovered text is the text obtained after recognition by the initial recognition model, and its punctuation marks may be the same as or different from those in the training text. It can be understood that when the punctuation marks in the recovered text are identical or mostly identical to the punctuation marks in the training text, the prediction result of the initial recognition model is relatively accurate. Accordingly, the recovered text and the training text can be input into a preset cross entropy function to determine the cross entropy loss value of the initial recognition model for this round of training.
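As a hedged illustration, the sketch below uses the standard definition of cross entropy: the mean negative log-probability that the model assigns to the correct label at each replaced position. Computing it from the basic punctuation mark prediction sets rather than from the recovered text directly is an assumption made here, as are the function name and the small constant `eps` that guards against log(0).

```python
import math

def cross_entropy(prediction_sets, true_labels, eps=1e-12):
    """Mean negative log-probability assigned to the correct labels."""
    losses = [-math.log(max(ps[label], eps))
              for ps, label in zip(prediction_sets, true_labels)]
    return sum(losses) / len(losses)
```

A prediction set that puts more probability on the correct punctuation mark gives a lower loss value.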

In one embodiment, a contrast learning loss value of the initial recognition model may be determined based on the second prediction result of each second identifier.

As an exemplary embodiment, the contrast learning loss value of the initial recognition model may be determined in the following manner:

First, since the calculation of the contrast learning loss value is based on vector features, the first basic vector feature corresponding to the second prediction result of each second identifier may be obtained, where the first basic vector feature is the first basic vector feature determined in the flow shown in fig. 4.

Thereafter, for each second prediction result (hereinafter, for convenience of description, the second prediction result currently being aimed at will be referred to as a target prediction result), the same prediction result set and the difference prediction result set may be determined from among the plurality of second prediction results. The same prediction result set refers to a set formed by a second prediction result in which the second prediction result and the target prediction result are the same punctuation mark, and the difference prediction result set refers to a set formed by a second prediction result in which the second prediction result and the target prediction result are different punctuation marks.

Based on the above, similarity calculation can be performed on the first basic vector feature corresponding to the target prediction result and the first basic vector feature of each second prediction result in the same prediction result set, so as to obtain the first similarity of the target prediction result relative to the same prediction result set.

As an exemplary embodiment, similarity calculation may be performed between the first basic vector feature corresponding to the target prediction result and the first basic vector feature of each second prediction result in the same prediction result set, so as to obtain a plurality of third similarities. A weighted summation is then performed on the third similarities to obtain the first similarity.

Meanwhile, similarity calculation is performed between the first basic vector feature corresponding to the target prediction result and the first basic vector feature of each second prediction result in the difference prediction result set, so as to obtain the second similarity of the target prediction result relative to the difference prediction result set.

As an exemplary embodiment, similarity calculation may be performed between the first basic vector feature corresponding to the target prediction result and the first basic vector feature of each second prediction result in the difference prediction result set, so as to obtain a plurality of fourth similarities. Then, a weighted summation may be performed on the plurality of fourth similarities to obtain the second similarity.

Finally, the first similarity and the second similarity corresponding to each second prediction result are determined according to the above method, and the sum of the first similarities of all second prediction results is divided by the sum of the first similarities and the second similarities of all second prediction results, so as to obtain the contrast learning loss value of the initial recognition model for this round of training.

For example, assuming that there are n second prediction results, that the first similarities corresponding to the second prediction results are $A_1, A_2, \ldots, A_n$, and that the second similarities corresponding to the second prediction results are $B_1, B_2, \ldots, B_n$, the contrastive learning loss value $L_{CL}$ of the initial recognition model for this round of training can be obtained by the following equation (1):

$$L_{CL} = \frac{\sum_{i=1}^{n} A_i}{\sum_{i=1}^{n} A_i + \sum_{i=1}^{n} B_i} \tag{1}$$
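The calculation described above can be sketched as follows. This is a minimal illustration, not the patented implementation: cosine similarity and uniform weights are assumptions, since this passage names neither the similarity measure nor the weights, and the function names `cosine`, `set_similarity`, and `contrastive_loss` are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity between two vectors (assumed similarity measure)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def set_similarity(target, others, weights=None):
    """Weighted sum of the pairwise similarities between the target feature
    and every feature in one prediction result set."""
    if not others:
        return 0.0
    if weights is None:
        # Assumption: uniform weights when none are supplied.
        weights = [1.0 / len(others)] * len(others)
    return sum(w * cosine(target, o) for w, o in zip(weights, others))

def contrastive_loss(first_sims, second_sims):
    """Sum of first similarities divided by the sum of first and second
    similarities, as described in the text."""
    numerator = sum(first_sims)
    return numerator / (numerator + sum(second_sims))

# Usage: A_i from the same set, B_i from the difference set.
a1 = set_similarity([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0]])
loss = contrastive_loss([0.9, 0.8], [0.1, 0.2])
```

With cosine similarity the individual terms can be negative, so a real implementation would likely map similarities to positive values before forming the ratio; that step is outside what the text describes.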

Step 502, performing a weighted summation of the cross-entropy loss value and the contrastive learning loss value to obtain the prediction loss value of the initial recognition model.

In this embodiment of the application, once the cross-entropy loss value and the contrastive learning loss value of the initial recognition model have been determined, a weighted summation of the two is performed to obtain the loss value for this round of training.

For example, the cross-entropy loss value and the contrastive learning loss value may be substituted into the following equation (2) to obtain the loss value $L$ for this round of training:

$$L = \alpha \cdot L_{CE} + (1 - \alpha) \cdot L_{CL} \tag{2}$$

where $\alpha$ is the weight coefficient corresponding to the cross-entropy loss value, $L_{CE}$ is the cross-entropy loss value, and $L_{CL}$ is the contrastive learning loss value.
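A small sketch of this weighted combination is given below. It rests on stated assumptions: the contrastive term is given the complementary weight `1 - ce_weight`, the cross-entropy is averaged over identifiers, and the names `cross_entropy` and `prediction_loss` are hypothetical rather than taken from the application.

```python
import math

def cross_entropy(probs, gold):
    """Mean negative log-probability of the gold label at each identifier.
    probs: one probability distribution (list of floats) per identifier.
    gold:  index of the correct punctuation label per identifier."""
    return -sum(math.log(p[g]) for p, g in zip(probs, gold)) / len(gold)

def prediction_loss(ce_loss, cl_loss, ce_weight=0.5):
    """Weighted sum of the cross-entropy and contrastive learning losses.
    Assumption: the contrastive term receives the complementary weight."""
    return ce_weight * ce_loss + (1.0 - ce_weight) * cl_loss

# Usage with two identifiers and three candidate labels each.
ce = cross_entropy([[0.7, 0.2, 0.1], [0.1, 0.8, 0.1]], [0, 1])
total = prediction_loss(ce, 0.85, ce_weight=0.75)
```

A larger `ce_weight` makes training follow the per-identifier classification signal more closely, while a smaller value gives the contrastive term more influence.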

According to the technical solution provided by this embodiment of the application, the cross-entropy loss value and the contrastive learning loss value of the initial recognition model are determined according to the second prediction result of each second identifier, and a weighted summation of the two yields the prediction loss value of the initial recognition model. Because the prediction loss value used to train the punctuation prediction model combines both loss values, the punctuation prediction model obtained through training is more accurate.

Referring to fig. 6, a block diagram of an embodiment of a punctuation mark recovery device according to an embodiment of the present application is provided. As shown in fig. 6, the apparatus may include:

the obtaining module 61 is configured to obtain a text to be recovered, and perform word segmentation processing on the text to be recovered to obtain a word segmentation text containing a plurality of words; the text to be restored does not have punctuation marks;

an inserting module 62, configured to insert a first identifier between every two words in the word segmentation text, to obtain an inserted text;

an input module 63, configured to input the inserted text into a punctuation prediction model trained in advance, and obtain a first prediction result of the punctuation prediction model for each of the first identifiers;

a processing module 64, configured to perform punctuation recovery processing on the inserted text according to a first prediction result of each of the first identifiers, to obtain a target text; the target text has punctuation marks.
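The cooperation of the four modules can be sketched as follows. This is an illustrative outline under stated assumptions: whitespace splitting stands in for a real word segmenter, the identifier string `[P]` is hypothetical, and the model is any callable that returns one punctuation string (or an empty string for no punctuation) per identifier.

```python
from typing import Callable, List

MARK = "[P]"  # hypothetical first identifier

def segment(text: str) -> List[str]:
    """Obtaining module 61: word segmentation (assumed whitespace split)."""
    return text.split()

def insert_identifiers(words: List[str]) -> List[str]:
    """Inserting module 62: place one identifier between every two words."""
    tokens: List[str] = []
    for i, word in enumerate(words):
        tokens.append(word)
        if i < len(words) - 1:
            tokens.append(MARK)
    return tokens

def recover(tokens: List[str], predictions: List[str]) -> str:
    """Processing module 64: replace each identifier with its predicted
    punctuation mark, or drop it when the prediction is empty."""
    preds = iter(predictions)
    result = ""
    for tok in tokens:
        if tok == MARK:
            result += next(preds)
        else:
            result += (" " if result else "") + tok
    return result

def restore_punctuation(text: str,
                        model: Callable[[List[str]], List[str]]) -> str:
    tokens = insert_identifiers(segment(text))
    predictions = model(tokens)  # input module 63: one result per identifier
    return recover(tokens, predictions)

# Usage with a stub model that predicts a comma at the first identifier only.
stub = lambda tokens: [",", ""]
restored = restore_punctuation("hello world again", stub)
```

Keeping the model behind a plain callable lets the same pipeline be exercised with a stub, as here, or with a trained punctuation prediction model.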

Fig. 7 is a schematic structural diagram of an electronic device according to an embodiment of the present application. The electronic device includes a processor 71, a communication interface 72, a memory 73, and a communication bus 74, where the processor 71, the communication interface 72, and the memory 73 communicate with one another through the communication bus 74.

The memory 73 is configured to store a computer program.

In one embodiment of the present application, the processor 71 is configured to implement the punctuation mark recovery method provided in any one of the foregoing method embodiments when executing the program stored in the memory 73, where the method includes:

The embodiment of the application also provides a storage medium, on which a computer program is stored, which when executed by a processor implements the steps of the punctuation recovery method provided by any of the method embodiments described above.

The apparatus embodiments described above are merely illustrative. The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed over a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment.

From the above description of embodiments, it will be apparent to those skilled in the art that the embodiments may be implemented by means of software plus a general purpose hardware platform, or may be implemented by hardware. Based on such understanding, the foregoing technical solution may be embodied essentially or in a part contributing to the related art in the form of a software product, which may be stored in a computer readable storage medium, such as ROM/RAM, a magnetic disk, an optical disk, etc., including several instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to perform the method described in the respective embodiments or some parts of the embodiments.

It is to be understood that the terminology used herein is for the purpose of describing particular example embodiments only, and is not intended to be limiting. As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. The terms "comprises," "comprising," "includes," "including," and "having" are inclusive and therefore specify the presence of stated features, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, steps, operations, elements, components, and/or groups thereof. The method steps, processes, and operations described herein are not to be construed as necessarily requiring their performance in the particular order described or illustrated, unless an order of performance is explicitly stated. It should also be appreciated that additional or alternative steps may be used.

The foregoing is only a specific embodiment of the invention to enable those skilled in the art to understand or practice the invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the invention. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims

1. A punctuation recovery method, the method comprising:

2. The method of claim 1, wherein said inputting the inserted text into a pre-trained punctuation prediction model, obtaining a first prediction result of the punctuation prediction model for each of the first identifiers, comprises:

3. The method according to claim 2, wherein the punctuation recovery processing of the inserted text according to the first prediction result of each of the first identifiers includes:

4. The method according to claim 2, wherein the punctuation recovery processing of the inserted text according to the first prediction result of each of the first identifiers includes:

5. The method of claim 1, further comprising, prior to said inputting the inserted text into the pre-trained punctuation prediction model:

6. The method of claim 5, wherein said determining a prediction loss value of said initial recognition model based on said second prediction result of each said second identifier comprises:

7. The method of claim 6, wherein determining a contrastive learning loss value of the initial recognition model based on the second prediction result of each of the second identifiers comprises:

8. A punctuation mark recovery device, said device comprising:

9. An electronic device, comprising a processor and a memory, wherein the processor is configured to execute a punctuation recovery program stored in the memory so as to implement the punctuation recovery method of any one of claims 1-7.

10. A storage medium storing one or more programs executable by one or more processors to implement the punctuation recovery method of any one of claims 1-7.