CN111027291A - Method and device for adding punctuation marks in text and training model and electronic equipment - Google Patents
- Publication number
- CN111027291A (application CN201911182421.XA)
- Authority
- CN
- China
- Prior art keywords
- text
- word
- training
- character
- added
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/049—Temporal neural networks, e.g. delay elements, oscillating neurons or pulsed inputs
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- General Health & Medical Sciences (AREA)
- Biomedical Technology (AREA)
- Biophysics (AREA)
- Computational Linguistics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Evolutionary Computation (AREA)
- Artificial Intelligence (AREA)
- Molecular Biology (AREA)
- Computing Systems (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Health & Medical Sciences (AREA)
- Machine Translation (AREA)
Abstract
The embodiment of the invention provides a method, a device and electronic equipment for adding punctuation marks in text and for training the corresponding model. The method comprises the following steps: performing word segmentation processing and part-of-speech recognition on the text to be added, performing normalization processing, and determining character/word vectors; splicing the part-of-speech information, the word segmentation boundary information and the character/word vectors to obtain feature vectors; inputting the feature vectors into a trained seq2seq model to obtain a plurality of candidate text sequences with punctuation marks added, forming a candidate text sequence set; filtering out the text sequences in the candidate text sequence set that do not meet the conditions; and, among the remaining text sequences of the set, outputting the text sequence that has the highest joint probability and meets the punctuation specification, then performing the normalization-reduction operation on the output text sequence. This solves well the problem of adding several punctuation marks after a character and improves the accuracy of punctuation addition.
Description
Technical Field
The embodiments of the present invention relate to language processing technology, and in particular to a method and device for adding punctuation marks in text and training a model, and to electronic equipment.
Background
With the rapid development of society and high technology, natural language processing applications such as smart home control, automatic question answering and voice assistants are receiving more and more attention. Because spoken dialogue carries no punctuation, sentence boundaries and standard language structures cannot be distinguished, which makes punctuation prediction an extremely important natural language processing task. In intelligent telephone service scenarios, the user's speech is converted by speech recognition into raw text without punctuation; such text cannot be used directly, so punctuation prediction is performed on it in order to add the punctuation marks.
In the related art, different technical solutions have been developed for automatically adding punctuation marks; at present they fall mainly into two categories: methods based on speech information, which judge the positions of punctuation marks from the duration of speech pauses, and methods based on text sequence information, which judge the insertion positions of punctuation marks from the text itself. The former is limited because it can only judge from pause duration, so it handles fast speech and long texts with pauses in the middle poorly. The latter can judge the position at which to add a punctuation mark from context features. However, in some scenarios several punctuation marks must be added after a single character, and the current methods do not yet address the need to insert multiple punctuation marks after a character.
Disclosure of Invention
The embodiment of the invention provides a method and a device for adding punctuation marks in text and training a model, and electronic equipment, which solve well the problem of adding several punctuation marks after a character and improve the accuracy of punctuation addition.
In a first aspect, an embodiment of the present invention provides a method for adding a punctuation mark in a text, including:
acquiring a text to be added without punctuations, performing word segmentation processing and part-of-speech recognition on the text to be added, and performing normalization processing on set words in the processed text to be added;
acquiring a character/word vector corresponding to each character in the processed text to be added through a pre-trained character/word vector model;
splicing the part-of-speech information, the word segmentation boundary information and the character/word vectors corresponding to each character in the text to be added to obtain a feature vector;
inputting the feature vector into a trained seq2seq model to obtain a plurality of candidate text sequences with punctuation marks added, forming a candidate text sequence set;
filtering text sequences which are inconsistent with characters/words in the text to be added in the candidate text sequence set, and filtering text sequences which are not in accordance with punctuation mark specifications in the candidate text sequence set;
and outputting the text sequence which has the highest joint probability and meets the punctuation mark specification in the remaining text sequences of the candidate text sequence set, and performing normalized reduction operation on the output text sequence.
In a second aspect, an embodiment of the present invention further provides a training method for a model for adding punctuation marks in text, including:
removing punctuation marks from a plurality of original texts with punctuation marks to obtain training texts;
performing word segmentation processing and part-of-speech recognition on the training text, and performing normalization processing on the set words of the training text after processing;
acquiring a character/word vector corresponding to each character in the processed training text through a pre-trained character/word vector model;
splicing the part-of-speech information, the word segmentation boundary information and the character/word vectors corresponding to each character in the training text to obtain a feature vector;
and inputting the feature vector into a seq2seq model, and training the seq2seq model to obtain the trained seq2seq model.
In a third aspect, an embodiment of the present invention further provides a device for adding a punctuation mark in a text, including:
the processing module is used for acquiring a text to be added without punctuations, performing word segmentation processing and part-of-speech recognition on the text to be added, and performing normalization processing on set words in the processed text to be added;
the character/word vector obtaining module is used for obtaining a character/word vector corresponding to each character in the processed text to be added through a pre-trained character/word vector model;
the splicing module is used for splicing the part-of-speech information, the word segmentation boundary information and the character/word vector corresponding to each character in the text to be added to obtain a feature vector;
a candidate text sequence set obtaining module, configured to input the feature vector into a trained seq2seq model, obtain a plurality of candidate text sequences to which punctuations are added, and form a candidate text sequence set;
the filtering module is used for filtering text sequences which are inconsistent with characters/words in the text to be added in the candidate text sequence set and filtering text sequences which are inconsistent with punctuation mark specifications in the candidate text sequence set;
and the output/reduction module is used for outputting the text sequence which has the highest joint probability and meets the punctuation mark specification in the residual text sequences of the candidate text sequence set, and carrying out normalized reduction operation on the output text sequence.
In a fourth aspect, an embodiment of the present invention provides a training apparatus for a punctuation mark adding model in a text, including:
the training text obtaining module is used for removing punctuation marks from a plurality of original texts with punctuation marks to obtain training texts;
the processing module is used for performing word segmentation processing and part-of-speech recognition on the training text and performing normalization processing on the set words of the training text after processing;
the character/word vector obtaining module is used for obtaining a character/word vector corresponding to each character in the processed training text through a pre-trained character/word vector model;
the splicing module is used for splicing the part-of-speech information, the word segmentation boundary information and the character/word vectors corresponding to each character in the training text to obtain a feature vector;
and the training module is used for inputting the feature vector into a seq2seq model, and training the seq2seq model to obtain the trained seq2seq model.
In a fifth aspect, an embodiment of the present invention further provides an electronic device, including:
one or more processors;
a storage device for storing one or more programs,
when the one or more programs are executed by the one or more processors, the one or more processors implement the method for adding punctuation marks in text provided by the embodiment of the present invention, or the method for training a model for adding punctuation marks in text provided by the embodiment of the present invention.
In a sixth aspect, an embodiment of the present invention further provides a computer-readable storage medium on which a computer program is stored; when executed by a processor, the computer program implements the method for adding punctuation marks in text, or the method for training a model for adding punctuation marks in text, according to an embodiment of the present invention.
According to the technical scheme provided by the embodiment of the invention, word segmentation processing, part-of-speech recognition and normalization processing are performed on the text to be added without punctuation marks, and a character/word vector corresponding to each character in the text to be added is obtained. The character/word vectors, the part-of-speech information and the word segmentation boundary information are spliced to obtain feature vectors, which are input into a trained seq2seq model to obtain a plurality of candidate text sequences with punctuation marks added, forming a candidate text sequence set. The candidate texts that do not meet the conditions are filtered out, the text sequence that has the highest joint probability and meets the punctuation specification is output, and the normalization-reduction operation is performed on the output text sequence. In this way the problem of adding several punctuation marks after a character is solved well, and the accuracy of punctuation addition is improved.
Drawings
Fig. 1a is a flowchart of a method for adding a punctuation mark in a text according to an embodiment of the present invention;
FIG. 1b is a schematic structural diagram of a seq2seq model provided by an embodiment of the present invention;
FIG. 1c is a flowchart of a method for seq2seq model training according to an embodiment of the present invention;
fig. 2a is a flowchart of a method for adding a punctuation mark in a text according to an embodiment of the present invention;
fig. 2b is a flowchart of a method for adding a punctuation mark in a text according to an embodiment of the present invention;
fig. 3 is a block diagram of a device for adding a punctuation mark in a text according to an embodiment of the present invention;
FIG. 4 is a block diagram of a training apparatus for a model for adding punctuation marks in text according to an embodiment of the present invention;
fig. 5 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
The present invention will be described in further detail with reference to the accompanying drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the invention and are not limiting of the invention. It should be further noted that, for the convenience of description, only some of the structures related to the present invention are shown in the drawings, not all of the structures.
Fig. 1a is a flowchart of a method for adding punctuation marks in a text according to an embodiment of the present invention, where the method may be executed by a device for adding punctuation marks in a text, where the device may be implemented by software and/or hardware, the device may be configured in an electronic device such as a terminal or a server, and the method may be applied in a scenario where punctuation marks are added to a text without punctuation marks, and optionally, in a scenario where multiple punctuation marks are added after words in a text.
As shown in fig. 1a, the technical solution provided by the embodiment of the present invention includes:
S110: obtaining a text to be added without punctuation marks, performing word segmentation processing and part-of-speech recognition on the text to be added, and performing normalization processing on set words in the processed text to be added.
In the embodiment of the present invention, the text to be added has no punctuation marks. The text to be added may be text converted from speech, for example text converted from a customer service dialogue or by a voice assistant, or any other text without punctuation marks.
In the embodiment of the present invention, the set words may be words such as numbers, and such words can be normalized. Part-of-speech recognition analyses the attribute of each character/word in the text to be added, and the part-of-speech information obtained from the recognition can be labelled.
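As a purely illustrative sketch (not part of the original disclosure), normalization of set words such as numbers might look as follows in Python; the placeholder token, the function name and the regular expression are assumptions for the example.

```python
import re

NUM_TOKEN = "<num>"  # hypothetical placeholder mark; the patent does not name one

def normalize_numbers(text):
    """Replace each digit run with a placeholder and record the originals,
    so the text can be restored after punctuation has been added."""
    originals = re.findall(r"\d+", text)
    normalized = re.sub(r"\d+", NUM_TOKEN, text)
    return normalized, originals
```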
S120: and acquiring a word/word vector corresponding to each character in the processed text to be added through a pre-trained word/word vector model.
In the embodiment of the invention, the normalized text to be added can be input into the pre-trained character/word vector model to obtain the character/word vector corresponding to each character in the text to be added. The pre-trained word/word vector model may refer to a model in the related art.
S130: and splicing the part-of-speech information, the word segmentation boundary information and the word/word vectors corresponding to each character in the file to be added to obtain the characteristic vectors.
In the embodiment of the present invention, the order in which the part-of-speech information, the word segmentation boundary information and the character/word vectors are spliced is not limited. The part-of-speech information indicates, for each character, whether it belongs to a noun, a verb or another part of speech. The word segmentation boundary information indicates, for example, whether a character is the first or the last character of a word.
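The following minimal sketch (an illustration, not the patented implementation) shows how such a feature vector could be assembled for one character; the tag inventories, the vector dimension and the splicing order are assumptions, since the patent fixes none of them.

```python
import numpy as np

POS_TAGS = ["noun", "verb", "other"]                            # illustrative tag set
BOUNDARY = ["word_start", "word_middle", "word_end", "single"]  # illustrative boundary tags

def feature_vector(char_vec, pos_tag, boundary_tag):
    """Splice a pre-trained character/word vector with one-hot
    part-of-speech and word-boundary indicators."""
    pos = np.eye(len(POS_TAGS))[POS_TAGS.index(pos_tag)]
    bnd = np.eye(len(BOUNDARY))[BOUNDARY.index(boundary_tag)]
    return np.concatenate([char_vec, pos, bnd])                 # the order is not prescribed

# e.g. a 128-dim character vector yields a 135-dim feature vector
vec = feature_vector(np.zeros(128), "noun", "word_start")
```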
S140: and inputting the characteristic vector into a trained seq2seq model to obtain a plurality of candidate text sequences added with punctuations and form a candidate text sequence set.
In the embodiment of the present invention, the feature vectors are input into a trained sequence-to-sequence (seq2seq) model.
The seq2seq model can be based on a bidirectional long short-term memory model and an attention mechanism; such a model can exploit the contextual semantic information and improve the accuracy of punctuation addition. The seq2seq model may include an embedding layer, an encoder, an attention layer and a decoder; for the specific structure of the model, reference may be made to fig. 1b. The encoder includes a bidirectional long short-term memory unit (bidirectional LSTM). Since punctuation marks segment semantic groups, the insertion position of a punctuation mark must be judged from the preceding and the following semantic information together. The encoder therefore uses a bidirectional LSTM to compute 2 independent hidden states for the input vectors, in forward and in reverse order, so as to capture past and future information respectively, and then combines the 2 hidden states as the final output.
The attention layer adopts a standard attention implementation. The encoding layer produces forward and backward hidden states, and the context vector c output by the attention layer is obtained by weighting the 2 hidden states output by the encoding layer with the attention weights; the weights reflect the relation between the current hidden state and the context.
The decoder comprises a bidirectional long short-term memory unit, which generates a probability distribution over the characters at each time step through a softmax function, based on the output of the attention layer and the vectors output by the embedding layer; the characters at each time step are selected according to their probability, forming at least one text sequence with punctuation marks added. The output of the attention layer may be the context vector. Specifically, N characters are selected by the decoding layer at each time step according to the probability, where N is a positive integer, and at least one punctuation-added text sequence is formed from the characters selected at the respective time steps. At each time step there is at least one predicted character, each with a certain probability; the characters whose probability exceeds a set threshold can be selected as predicted characters, and at least one punctuation-added text sequence can be assembled from the characters selected at each time step, yielding at least one candidate text sequence and forming the candidate text sequence set.
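One plausible reading of the embedding/encoder/attention/decoder stack of fig. 1b is sketched below in PyTorch. The dimensions, the single global context vector and the unidirectional decoder LSTM are simplifications made for brevity (the patent specifies a bidirectional unit in the decoder); this is an assumption-laden illustration, not the patented implementation.

```python
import torch
import torch.nn as nn

class PunctSeq2Seq(nn.Module):
    """Sketch of a BiLSTM + attention seq2seq model for punctuation addition."""

    def __init__(self, vocab_size, feat_dim=160, hidden=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, feat_dim)             # embedding layer
        self.encoder = nn.LSTM(feat_dim, hidden, batch_first=True,
                               bidirectional=True)                  # forward + backward hidden states
        self.attn = nn.Linear(2 * hidden, 1)                        # attention scoring
        self.decoder = nn.LSTM(feat_dim + 2 * hidden, hidden, batch_first=True)
        self.out = nn.Linear(hidden, vocab_size)                    # softmax gives the per-step distribution

    def forward(self, src_feats, tgt_tokens):
        # src_feats: (B, T, feat_dim) spliced feature vectors of the source text
        enc_out, _ = self.encoder(src_feats)                        # (B, T, 2H)
        scores = self.attn(enc_out).squeeze(-1)                     # (B, T)
        weights = torch.softmax(scores, dim=-1).unsqueeze(1)        # attention weights
        context = weights @ enc_out                                 # context vector c: (B, 1, 2H)
        tgt_emb = self.embed(tgt_tokens)                            # (B, S, feat_dim)
        ctx = context.expand(-1, tgt_emb.size(1), -1)               # reuse c at every decoding step
        dec_out, _ = self.decoder(torch.cat([tgt_emb, ctx], dim=-1))
        return self.out(dec_out)                                    # (B, S, vocab) logits
```

A beam search over the softmax outputs, keeping the top-N characters at each time step, would then yield the multiple candidate sequences described above.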
S150: and filtering text sequences which are inconsistent with characters/words in the text to be added in the candidate text sequence set, and filtering text sequences which are not in accordance with the punctuation mark specification in the candidate text sequence set.
In the embodiment of the present invention, a text sequence in the candidate text sequence set may be inconsistent with the characters and/or words of the text to be added. For example, if one text sequence in the candidate set is 'ABC, XX' while the text to be added is 'ABBXX', the characters do not match, so 'ABC, XX' is filtered out of the candidate set.
In the embodiment of the present invention, a text sequence in the candidate text sequence set may contain punctuation that does not meet the punctuation specification, for example consecutive commas, mismatched left and right parentheses, mismatched left and right title marks, or misused ellipses; the text sequences whose punctuation does not meet the specification are filtered out.
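Both filtering steps can be expressed as one predicate over a candidate, as in the sketch below; the punctuation inventory and the rules are illustrative examples of such specification checks, not an exhaustive list taken from the patent.

```python
import re

PUNCT = "，。？！、：；“”《》（）…"  # illustrative punctuation inventory

def passes_filters(candidate, source_text):
    """Keep a candidate only if its non-punctuation characters equal the
    source text and its punctuation obeys some basic usage rules."""
    if re.sub(f"[{re.escape(PUNCT)}]", "", candidate) != source_text:
        return False                                    # characters/words inconsistent
    if "，，" in candidate:                              # consecutive commas
        return False
    if candidate.count("（") != candidate.count("）"):   # unmatched parentheses
        return False
    if candidate.count("《") != candidate.count("》"):   # unmatched title marks
        return False
    return True
```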
S160: and outputting the text sequence which has the highest joint probability and meets the punctuation mark specification in the remaining text sequences of the candidate text sequence set, and performing normalized reduction operation on the output text sequence.
In the embodiment of the invention, for each of the remaining text sequences of the candidate text sequence set, every character carries the probability predicted by the seq2seq model, and the text sequence with the highest joint probability is selected for output. The joint probability of a text sequence is determined from the probabilities of its characters; it may be, for example, the sum of the probabilities of the characters, their average, or the sum of each character's probability multiplied by a corresponding weight.
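For instance, taking the sum of per-character log-probabilities (which ranks candidates identically to the product of their probabilities) as the joint probability — one of the several combination rules the paragraph above allows — selection might look like this sketch:

```python
import math

def joint_log_prob(char_probs):
    """Joint score of one candidate from its per-character probabilities."""
    return sum(math.log(p) for p in char_probs)

def best_candidate(candidates):
    """candidates: (text_sequence, per_char_probabilities) pairs that
    survived filtering; returns the highest-scoring text sequence."""
    return max(candidates, key=lambda c: joint_log_prob(c[1]))[0]
```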
In the embodiment of the present invention, the normalization-reduction operation on the output text sequence may specifically restore the words, such as numbers, in the output text. By filtering the text sequences output by the seq2seq model, unreasonable candidates and candidates violating the punctuation specification are removed, so the text sequence can be predicted accurately and the accuracy of punctuation addition is improved.
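Continuing the earlier normalization sketch, the reduction (restore) operation could simply put the recorded values back in place of the placeholder tokens:

```python
def restore_numbers(text, originals):
    """Undo normalize_numbers: re-insert the recorded digit strings in
    place of the placeholder tokens, in their original order."""
    for value in originals:
        text = text.replace(NUM_TOKEN, value, 1)
    return text
```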
For the specific process of the method for adding punctuation marks in text, reference may be made to fig. 1c. It should be noted that the method provided by the embodiment of the present invention may run as an online process.
In the related art, methods based on text sequence information judge the insertion position of a punctuation mark from text information, using context features; this can be done by sequence labelling. In some scenes, however, several punctuation marks must be added after one character. For example, in a dictation sentence that ends with a quoted work title, both the closing title mark and the sentence-final period must be inserted after the last character of the title, i.e. two punctuation marks follow the same character. The method provided by the embodiment of the invention processes the text to be added without punctuation marks to obtain feature vectors, inputs the feature vectors into a trained seq2seq model to obtain candidate text sequences with punctuation marks added, and filters them to obtain the final text sequence. Because the seq2seq model maps an input sequence to an output sequence, which is then filtered and screened, the problem of adding several punctuation marks after a character is solved well and the accuracy of punctuation addition is improved.
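Schematically, with abstract tokens w1…w5 standing in for the source characters (the tokens are purely illustrative), the difference can be pictured as follows:

```python
# Source tokens (no punctuation):   w1 w2 w3 w4 w5
# A per-token sequence tagger attaches at most one label per token,
# so it cannot place two marks after w5.
# A seq2seq decoder may emit several punctuation tokens after a single
# source token:
#   output sequence:                w1 w2 ， w3 w4 《 w5 》 。
# Here a closing title mark AND a period both follow w5.
```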
According to the technical scheme provided by the embodiment of the invention, word segmentation processing, part-of-speech recognition and normalization processing are performed on the text to be added without punctuation marks, and a character/word vector corresponding to each character in the text to be added is obtained. The character/word vectors, the part-of-speech information and the word segmentation boundary information are spliced to obtain feature vectors, which are input into a trained seq2seq model to obtain a plurality of candidate text sequences with punctuation marks added, forming a candidate text sequence set. The candidate texts that do not meet the conditions are filtered out, the text sequence that has the highest joint probability and meets the punctuation specification is output, and the normalization-reduction operation is performed on the output text sequence. In this way the problem of adding several punctuation marks after a character is solved well, and the accuracy of punctuation addition is improved.
Fig. 2a is a flowchart of a method for adding punctuation marks in text according to an embodiment of the present invention. On the basis of the above embodiment, this embodiment adds the process of training the seq2seq model; the training process of the seq2seq model may be an offline process.
As shown in fig. 2a, the technical solution provided by the embodiment of the present invention includes:
S210: removing punctuation marks from a plurality of original texts with punctuation marks to obtain training texts; performing word segmentation processing and part-of-speech recognition on the training texts, and performing normalization processing on the set words of the processed training texts.
The normalization processing of the training text may normalize the numbers and the like in the training text; for example, numbers may be replaced with set marks, so as to increase the processing speed of the model.
The word segmentation processing and the part-of-speech recognition in the training text are the same as those in the above embodiment.
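As a small illustrative sketch, one training pair could be built as follows; the punctuation inventory reuses the illustrative PUNCT set defined earlier and is an assumption rather than the patent's list.

```python
import re

def make_training_pair(original):
    """The target is the original punctuated text; the input is the same
    text with all punctuation removed."""
    stripped = re.sub(f"[{re.escape(PUNCT)}]", "", original)
    return stripped, original
```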
S220: and acquiring a word/word vector corresponding to each character in the processed training text through a pre-trained word/word vector model.
S230: and splicing the part-of-speech information, the word segmentation boundary information and the word/word vectors corresponding to each character in the training text to obtain the feature vectors.
In the embodiment of the invention, the semantic information of the current input state can be obtained through the character vectors and/or word vectors together with the part-of-speech features. Introducing features such as word segmentation and part of speech also provides the boundary information of the current input character, so that knowledge of word boundaries can be learned during training, which prevents a punctuation mark from splitting the content of a segmented word in the prediction stage.
S240: and inputting the characteristic vector into a seq2seq model, and training the seq2seq model to obtain the trained seq2seq model.
The seq2seq model can be based on a bidirectional long short-term memory model and an attention mechanism; such a model can exploit the contextual semantic information and improve the accuracy of punctuation addition. The seq2seq model may include an embedding layer, an encoder, an attention layer and a decoder; for the specific structure of the model, reference may be made to fig. 1b. The encoder includes a bidirectional long short-term memory unit (bidirectional LSTM). Since punctuation marks segment semantic groups, the insertion position of a punctuation mark must be judged from the preceding and the following semantic information together. The encoder therefore uses a bidirectional LSTM to compute 2 independent hidden states for the input vectors, in forward and in reverse order, so as to capture past and future information respectively, and then combines the 2 hidden states as the final output.
The attention layer adopts a standard attention implementation. The encoding layer produces forward and backward hidden states, and the context vector c output by the attention layer is obtained by weighting the 2 hidden states output by the encoding layer with the attention weights; the weights reflect the relation between the current hidden state and the context.
The decoder comprises a bidirectional long short-term memory unit, which generates a probability distribution over the characters at each time step through a softmax function, based on the output of the attention layer and the vectors output by the embedding layer; the characters at each time step are selected according to their probability, forming at least one text sequence with punctuation marks added. The output of the attention layer may be the context vector. Specifically, N characters are selected by the decoding layer at each time step according to the probability, where N is a positive integer, and at least one punctuation-added text sequence is formed from the characters selected at the respective time steps. At each time step there is at least one predicted character, each with a certain probability; the characters whose probability exceeds a set threshold can be selected as predicted characters, and at least one punctuation-added text sequence can be assembled from the characters selected at each time step. The text sequences output by the decoding layer are matched against the original texts, and the seq2seq model is adjusted accordingly; this completes the training of the seq2seq model and yields the trained seq2seq model. For the specific process of training the seq2seq model, reference may be made to fig. 2b. A seq2seq model trained in this way fully uses the character vectors and word vectors of the context for its judgment and fully considers the state-transition relations among punctuation marks, so a better prediction effect can be obtained.
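A training-loop sketch for the PunctSeq2Seq model sketched earlier is given below; the optimizer, the teacher-forced targets and the `loader` that yields prepared batches are assumptions for the example, not details fixed by the patent.

```python
import torch
import torch.nn as nn

model = PunctSeq2Seq(vocab_size=6000)                 # vocabulary size is illustrative
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

for src_feats, tgt_in, tgt_out in loader:             # hypothetical batch iterator
    logits = model(src_feats, tgt_in)                 # (B, S, vocab)
    # match the decoder output against the original punctuated text
    loss = loss_fn(logits.reshape(-1, logits.size(-1)), tgt_out.reshape(-1))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```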
S250: obtaining a text to be added without punctuation marks, performing word segmentation processing and part-of-speech recognition on the text to be added, and performing normalization processing on set words in the processed text to be added.
S260: and acquiring a word/word vector corresponding to each character in the processed text to be added through a pre-trained word/word vector model.
S270: and splicing the part-of-speech information, the word segmentation boundary information and the word/word vectors corresponding to each character in the file to be added to obtain the characteristic vectors.
S280: and inputting the characteristic vector into a trained seq2seq model to obtain a plurality of candidate text sequences added with punctuations and form a candidate text sequence set.
S290: and filtering text sequences which are inconsistent with characters/words in the text to be added in the candidate text sequence set, and filtering text sequences which are not in accordance with the punctuation mark specification in the candidate text sequence set.
S291: and outputting the text sequence which has the highest joint probability and meets the punctuation mark specification in the remaining text sequences of the candidate text sequence set, and performing normalized reduction operation on the output text sequence.
Fig. 3 is a flowchart of a method for training a text punctuation mark adding model, which may be performed by a device for training a text punctuation mark adding model, where the device may be implemented by software and/or hardware.
As shown in fig. 3, the technical solution provided by the embodiment of the present invention includes:
S310: removing punctuation marks from a plurality of original texts with punctuation marks to obtain the training texts.
S320: performing word segmentation processing and part-of-speech recognition on the training text, and performing normalization processing on the set words of the processed training text.
S330: acquiring a character/word vector corresponding to each character in the processed training text through a pre-trained character/word vector model.
S340: splicing the part-of-speech information, the word segmentation boundary information and the character/word vectors corresponding to each character in the training text to obtain the feature vectors.
S350: and inputting the characteristic vector into a seq2seq model, and training the seq2seq model to obtain the trained seq2seq model.
In this embodiment, for the descriptions of S310 to S350, reference may be made to the descriptions of S210 to S240 in the above embodiments.
The embodiment of the invention can acquire the semantic information of the current input state through the character vectors and/or word vectors together with the part-of-speech features. Introducing features such as word segmentation and part of speech also provides the boundary information of the current input character, so that knowledge of word boundaries can be learned during training, which prevents punctuation marks from splitting the content of segmented words in the prediction stage.
Fig. 3 is a block diagram of a device for adding punctuation marks in text according to an embodiment of the present invention. As shown in fig. 3, the device includes: a processing module 310, a character/word vector obtaining module 320, a splicing module 330, a candidate text sequence set obtaining module 340, a filtering module 350, and an output/reduction module 360.
The processing module 310 is configured to obtain a to-be-added text without punctuation marks, perform word segmentation processing and part-of-speech recognition on the to-be-added text, and perform normalization processing on set words in the processed to-be-added text;
the character/word vector obtaining module 320 is configured to obtain, through a pre-trained character/word vector model, a character/word vector corresponding to each character in the processed text to be added;
the splicing module 330 is configured to splice the part-of-speech information, the word segmentation boundary information and the character/word vectors corresponding to each character in the text to be added to obtain a feature vector;
a candidate text sequence set obtaining module 340, configured to input the feature vector into the trained seq2seq model, obtain a plurality of candidate text sequences to which punctuations are added, and form a candidate text sequence set;
a filtering module 350, configured to filter text sequences in the candidate text sequence set that are inconsistent with the characters/words in the text to be added, and filter text sequences in the candidate text sequence set that are inconsistent with the punctuation mark specification;
and the output/reduction module 360 is configured to output, from the remaining text sequences in the candidate text sequence set, a text sequence that has the highest joint probability and meets the punctuation specification, and perform a reduction operation of normalization on the output text sequence.
Optionally, the seq2seq model is a seq2seq model based on a bidirectional long-short term memory model and an attention mechanism.
Optionally, the seq2seq model comprises an embedded layer, an encoder, an attention layer and a decoder;
the encoder comprises a bidirectional long-short term memory unit; the attention layer includes long and short term memory cells;
the decoder comprises a bidirectional long-short term memory unit, and the bidirectional long-short term memory unit is used for generating probability distribution of characters at each moment through a softmax function based on the output result of the attention layer and the output result of the embedding layer, selecting the characters at each moment based on the probability size, and forming at least one text sequence added with punctuation marks.
Optionally, the apparatus further comprises a training module, configured to:
removing punctuation marks from a plurality of original texts with punctuation marks to obtain training texts; performing word segmentation processing and part-of-speech recognition on the training text, and performing normalization processing on the set words of the training text after processing;
acquiring a character/word vector corresponding to each character in the processed training text through a pre-trained character/word vector model;
splicing the part-of-speech information, the word segmentation boundary information and the character/word vectors corresponding to each character in the training text to obtain a feature vector;
and inputting the characteristic vector into a seq2seq model, and training the seq2seq model to obtain the trained seq2seq model.
Optionally, the setting words comprise numbers.
The device can execute the method provided by any embodiment of the invention, and has the corresponding functional modules and beneficial effects of the execution method.
Fig. 4 is a block diagram of a training apparatus for a model for adding punctuation marks in text according to an embodiment of the present invention. As shown in fig. 4, the apparatus includes: a training text obtaining module 410, a processing module 420, a character/word vector obtaining module 430, a splicing module 440, and a training module 450.
The training text obtaining module 410 is configured to remove punctuation marks from a plurality of original texts with punctuation marks to obtain training texts;
the processing module 420 is configured to perform word segmentation processing and part-of-speech recognition on the training text, and perform normalization processing on the set words of the training text after processing;
the character/word vector obtaining module 430 is configured to obtain, through a pre-trained character/word vector model, a character/word vector corresponding to each character in the processed training text;
the splicing module 440 is configured to splice the part-of-speech information, the word segmentation boundary information and the character/word vectors corresponding to each character in the training text to obtain a feature vector;
the training module 450 is configured to input the feature vector into a seq2seq model, and train the seq2seq model to obtain a trained seq2seq model.
The device can execute the method provided by any embodiment of the invention, and has the corresponding functional modules and beneficial effects of the execution method.
Fig. 5 is a schematic structural diagram of an electronic device according to an embodiment of the present invention, and as shown in fig. 5, the electronic device includes:
one or more processors 510, one processor 510 being illustrated in FIG. 5;
a memory 520;
the apparatus may further include: an input device 530 and an output device 540.
The processor 510, the memory 520, the input device 530 and the output device 540 of the apparatus may be connected by a bus or other means, and fig. 5 illustrates the connection by a bus as an example.
The memory 520, as a non-transitory computer-readable storage medium, may be used to store software programs, computer-executable programs and modules, such as the program instructions/modules corresponding to the method for adding punctuation marks in text in the embodiment of the present invention (for example, the processing module 310, the character/word vector obtaining module 320, the splicing module 330, the candidate text sequence set obtaining module 340, the filtering module 350 and the output/reduction module 360 shown in fig. 3), or the program instructions/modules corresponding to the training method for the model for adding punctuation marks in text in the embodiment of the present invention (for example, the training text obtaining module 410, the processing module 420, the character/word vector obtaining module 430, the splicing module 440 and the training module 450 shown in fig. 4).
By running the software programs, instructions and modules stored in the memory 520, the processor 510 executes the various functional applications and data processing of the computer device, that is, it implements the method for adding punctuation marks in text of the above method embodiment, namely:
acquiring a text to be added without punctuations, performing word segmentation processing and part-of-speech recognition on the text to be added, and performing normalization processing on set words in the processed text to be added;
acquiring a character/word vector corresponding to each character in the processed text to be added through a pre-trained character/word vector model;
splicing the part-of-speech information, the word segmentation boundary information and the character/word vectors corresponding to each character in the text to be added to obtain a feature vector;
inputting the feature vector into a trained seq2seq model to obtain a plurality of candidate text sequences with punctuation marks added, forming a candidate text sequence set;
filtering text sequences which are inconsistent with characters/words in the text to be added in the candidate text sequence set, and filtering text sequences which are not in accordance with punctuation mark specifications in the candidate text sequence set;
and outputting the text sequence which has the highest joint probability and meets the punctuation mark specification in the remaining text sequences of the candidate text sequence set, and performing normalized reduction operation on the output text sequence.
Or, implementing the method for training the punctuation mark adding model in the text provided by the embodiment of the invention, namely:
removing punctuation marks from a plurality of original texts with punctuation marks to obtain training texts;
performing word segmentation processing and part-of-speech recognition on the training text, and performing normalization processing on the set words of the training text after processing;
acquiring a character/word vector corresponding to each character in the processed training text through a pre-trained character/word vector model;
splicing the part-of-speech information, the word segmentation boundary information and the character/word vectors corresponding to each character in the training text to obtain a feature vector;
and inputting the feature vector into a seq2seq model, and training the seq2seq model to obtain the trained seq2seq model.
The memory 520 may include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required for at least one function; the storage data area may store data created according to use of the computer device, and the like. Further, the memory 520 may include high speed random access memory, and may also include non-transitory memory, such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid state storage device. In some embodiments, memory 520 may optionally include memory located remotely from processor 510, which may be connected to a terminal device via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The input device 530 may be used to receive input numeric or character information and generate key signal inputs related to user settings and function control of the computer apparatus. The output device 540 may include a display device such as a display screen.
An embodiment of the present invention provides a computer-readable storage medium on which a computer program is stored; when executed by a processor, the computer program implements the method for adding punctuation marks in text, namely:
acquiring a text to be added without punctuations, performing word segmentation processing and part-of-speech recognition on the text to be added, and performing normalization processing on set words in the processed text to be added;
acquiring a character/word vector corresponding to each character in the processed text to be added through a pre-trained character/word vector model;
splicing the part-of-speech information, the word segmentation boundary information and the character/word vectors corresponding to each character in the text to be added to obtain a feature vector;
inputting the feature vector into a trained seq2seq model to obtain a plurality of candidate text sequences with punctuation marks added, forming a candidate text sequence set;
filtering text sequences which are inconsistent with characters/words in the text to be added in the candidate text sequence set, and filtering text sequences which are not in accordance with punctuation mark specifications in the candidate text sequence set;
and outputting the text sequence which has the highest joint probability and meets the punctuation mark specification in the remaining text sequences of the candidate text sequence set, and performing normalized reduction operation on the output text sequence.
Or, implementing the method for training the punctuation mark adding model in the text provided by the embodiment of the invention, namely:
removing punctuation marks from a plurality of original texts with punctuation marks to obtain training texts;
performing word segmentation processing and part-of-speech recognition on the training text, and performing normalization processing on the set words of the training text after processing;
acquiring a character/word vector corresponding to each character in the processed training text through a pre-trained character/word vector model;
splicing the part-of-speech information, the word segmentation boundary information and the character/word vectors corresponding to each character in the training text to obtain a feature vector;
and inputting the feature vector into a seq2seq model, and training the seq2seq model to obtain the trained seq2seq model.
Any combination of one or more computer-readable media may be employed. The computer readable medium may be a computer readable signal medium or a computer readable storage medium. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples (a non-exhaustive list) of the computer readable storage medium would include the following: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this document, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.
A computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.
Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
Computer program code for carrying out operations for aspects of the present invention may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, Smalltalk, C++ or the like and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).
It is to be noted that the foregoing is only illustrative of the preferred embodiments of the present invention and the technical principles employed. It will be understood by those skilled in the art that the present invention is not limited to the particular embodiments described herein, but is capable of various obvious changes, rearrangements and substitutions as will now become apparent to those skilled in the art without departing from the scope of the invention. Therefore, although the present invention has been described in greater detail by the above embodiments, the present invention is not limited to the above embodiments, and may include other equivalent embodiments without departing from the spirit of the present invention, and the scope of the present invention is determined by the scope of the appended claims.
Claims (10)
1. A method for adding punctuation marks in text, characterized by comprising the following steps:
acquiring a text to be added without punctuations, performing word segmentation processing and part-of-speech recognition on the text to be added, and performing normalization processing on set words in the processed text to be added;
acquiring a character/word vector corresponding to each character in the processed text to be added through a pre-trained character/word vector model;
splicing the part-of-speech information, the word segmentation boundary information and the character/word vectors corresponding to each character in the text to be added to obtain a feature vector;
inputting the feature vector into a trained seq2seq model to obtain a plurality of candidate text sequences with punctuation marks added, forming a candidate text sequence set;
filtering text sequences which are inconsistent with characters/words in the text to be added in the candidate text sequence set, and filtering text sequences which are not in accordance with punctuation mark specifications in the candidate text sequence set;
and outputting the text sequence which has the highest joint probability and meets the punctuation mark specification in the remaining text sequences of the candidate text sequence set, and performing normalized reduction operation on the output text sequence.
2. The method of claim 1, wherein the seq2seq model is a seq2seq model based on a two-way long-short term memory model and an attention mechanism.
3. The method of claim 1 or 2, wherein the seq2seq model comprises an embedded layer, an encoder, an attention layer and a decoder;
the encoder comprises a bidirectional long-short term memory unit; the attention layer includes long and short term memory cells;
the decoder comprises a bidirectional long-short term memory unit, and the bidirectional long-short term memory unit is used for generating probability distribution of characters at each moment through a softmax function based on the output result of the attention layer and the output result of the embedding layer, selecting the characters at each moment based on the probability size, and forming at least one text sequence added with punctuation marks.
4. The method of claim 1, further comprising:
removing punctuation marks from a plurality of original texts with punctuation marks to obtain training texts; performing word segmentation processing and part-of-speech recognition on the training text, and performing normalization processing on the set words of the training text after processing;
acquiring a character/word vector corresponding to each character in the processed training text through a pre-trained character/word vector model;
splicing the part-of-speech information, the word segmentation boundary information and the character/word vectors corresponding to each character in the training text to obtain a feature vector;
and inputting the feature vector into a seq2seq model, and training the seq2seq model to obtain the trained seq2seq model.
5. The method of claim 1, wherein the set word comprises a number.
6. A training method for a model for adding punctuation marks in text, characterized by comprising the following steps:
removing punctuation marks from a plurality of original texts with punctuation marks to obtain training texts;
performing word segmentation processing and part-of-speech recognition on the training text, and performing normalization processing on the set words of the training text after processing;
acquiring a character/word vector corresponding to each character in the processed training text through a pre-trained character/word vector model;
splicing the part-of-speech information, the word segmentation boundary information and the character/word vectors corresponding to each character in the training text to obtain a feature vector;
and inputting the feature vector into a seq2seq model, and training the seq2seq model to obtain the trained seq2seq model.
7. A device for adding a punctuation mark in a text, comprising:
the processing module is used for acquiring a text to be added without punctuations, performing word segmentation processing and part-of-speech recognition on the text to be added, and performing normalization processing on set words in the processed text to be added;
the character/word vector obtaining module is used for obtaining a character/word vector corresponding to each character in the processed text to be added through a pre-trained character/word vector model;
the splicing module is used for splicing the part-of-speech information, the word segmentation boundary information and the character/word vectors corresponding to each character in the text to be added to obtain a feature vector;
a candidate text sequence set obtaining module, configured to input the feature vector into a trained seq2seq model, obtain a plurality of candidate text sequences to which punctuations are added, and form a candidate text sequence set;
the filtering module is used for filtering text sequences which are inconsistent with characters/words in the text to be added in the candidate text sequence set and filtering text sequences which are inconsistent with punctuation mark specifications in the candidate text sequence set;
and the output/reduction module is used for outputting the text sequence which has the highest joint probability and meets the punctuation mark specification in the residual text sequences of the candidate text sequence set, and carrying out normalized reduction operation on the output text sequence.
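Illustration (not part of the claims): the two filters applied by the filtering module. The consistency check is exactly as claimed; the punctuation-specification rules shown are examples only, since the patent does not enumerate them.

```python
import re

PUNCT = "，。！？；：、"  # assumed inventory, as above

def passes_filters(candidate, source):
    # Filter 1: stripping punctuation must give back the input text.
    if re.sub(f"[{PUNCT}]", "", candidate) != source:
        return False
    # Filter 2 (example rules): no consecutive marks, sentence-final end.
    if re.search(f"[{PUNCT}]{{2,}}", candidate):
        return False
    return candidate.endswith(("。", "！", "？"))

print(passes_filters("今天下雨，记得带伞。", "今天下雨记得带伞"))  # True
```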
8. A device for training a model for adding punctuation marks in text, comprising:
the training text obtaining module is used for removing punctuation marks from a plurality of original texts with punctuation marks to obtain training texts;
the processing module is used for performing word segmentation processing and part-of-speech recognition on the training text, and performing normalization processing on the set words of the processed training text;
the character/word vector obtaining module is used for obtaining a character/word vector corresponding to each character in the processed training text through a pre-trained character/word vector model;
the splicing module is used for splicing the part-of-speech information, the word segmentation boundary information and the character/word vector corresponding to each character in the training text to obtain a feature vector;
and the training module is used for inputting the feature vector into a seq2seq model and training the seq2seq model to obtain the trained seq2seq model.
9. An electronic device, comprising:
one or more processors;
a storage device for storing one or more programs,
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method for adding punctuation marks in text according to any one of claims 1-5, or the method for training a model for adding punctuation marks in text according to claim 6.
10. A computer-readable storage medium, on which a computer program is stored, which, when executed by a processor, implements the method for adding punctuation marks in text as claimed in any one of claims 1 to 5, or the method for training a model for adding punctuation marks in text as claimed in claim 6.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201911182421.XA CN111027291B (en) | 2019-11-27 | 2019-11-27 | Method and device for adding punctuation marks in text, method and device for training a model, and electronic equipment |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201911182421.XA CN111027291B (en) | 2019-11-27 | 2019-11-27 | Method and device for adding punctuation marks in text, method and device for training a model, and electronic equipment |
Publications (2)
Publication Number | Publication Date |
---|---|
CN111027291A true CN111027291A (en) | 2020-04-17 |
CN111027291B CN111027291B (en) | 2024-03-26 |
Family
ID=70207202
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201911182421.XA Active CN111027291B (en) | 2019-11-27 | 2019-11-27 | Method and device for adding punctuation marks in text, method and device for training a model, and electronic equipment |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111027291B (en) |
Patent Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107767870A (en) * | 2017-09-29 | 2018-03-06 | 百度在线网络技术(北京)有限公司 | Adding method, device and the computer equipment of punctuation mark |
WO2019174422A1 (en) * | 2018-03-16 | 2019-09-19 | 北京国双科技有限公司 | Method for analyzing entity association relationship, and related apparatus |
CN108932226A (en) * | 2018-05-29 | 2018-12-04 | 华东师范大学 | A kind of pair of method without punctuate text addition punctuation mark |
CN109918666A (en) * | 2019-03-06 | 2019-06-21 | 北京工商大学 | A kind of Chinese punctuation mark adding method neural network based |
Non-Patent Citations (1)
Title |
---|
任智慧; 徐浩煜; 封松林; 周晗; 施俊: "Sequence-labeling Chinese word segmentation based on LSTM networks" (in Chinese) * |
Cited By (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111950256A (en) * | 2020-06-23 | 2020-11-17 | 北京百度网讯科技有限公司 | Sentence break processing method and device, electronic equipment and computer storage medium |
CN111753524A (en) * | 2020-07-01 | 2020-10-09 | 携程计算机技术(上海)有限公司 | Text sentence break position identification method and system, electronic device and storage medium |
CN112001167A (en) * | 2020-08-26 | 2020-11-27 | 四川云从天府人工智能科技有限公司 | Punctuation mark adding method, system, equipment and medium |
CN112199927A (en) * | 2020-10-19 | 2021-01-08 | 古联(北京)数字传媒科技有限公司 | Ancient book mark point filling method and device |
WO2021213155A1 (en) * | 2020-11-25 | 2021-10-28 | 平安科技(深圳)有限公司 | Method, apparatus, medium, and electronic device for adding punctuation to text |
CN113609850A (en) * | 2021-07-02 | 2021-11-05 | 北京达佳互联信息技术有限公司 | Word segmentation processing method and device, electronic equipment and storage medium |
CN113609850B (en) * | 2021-07-02 | 2024-05-17 | 北京达佳互联信息技术有限公司 | Word segmentation processing method and device, electronic equipment and storage medium |
CN115394298A (en) * | 2022-08-26 | 2022-11-25 | 思必驰科技股份有限公司 | Training method and prediction method of speech recognition text punctuation prediction model |
Also Published As
Publication number | Publication date |
---|---|
CN111027291B (en) | 2024-03-26 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN111027291B (en) | Method and device for adding punctuation marks in text, method and device for training a model, and electronic equipment | |
CN105931644B (en) | A kind of audio recognition method and mobile terminal | |
CN108985358B (en) | Emotion recognition method, device, equipment and storage medium | |
CN112528637B (en) | Text processing model training method, device, computer equipment and storage medium | |
CN108922564B (en) | Emotion recognition method and device, computer equipment and storage medium | |
CN108710704B (en) | Method and device for determining conversation state, electronic equipment and storage medium | |
CN116884391B (en) | Multimode fusion audio generation method and device based on diffusion model | |
CN113469298B (en) | Model training method and resource recommendation method | |
CN112036122B (en) | Text recognition method, electronic device and computer readable medium | |
CN115587598A (en) | Multi-turn dialogue rewriting method, equipment and medium | |
CN116434752A (en) | Speech recognition error correction method and device | |
CN113160820B (en) | Speech recognition method, training method, device and equipment of speech recognition model | |
CN116913278B (en) | Voice processing method, device, equipment and storage medium | |
CN113032538A (en) | Topic transfer method based on knowledge graph, controller and storage medium | |
CN117093864A (en) | Text generation model training method and device | |
CN112836476B (en) | Summary generation method, device, equipment and medium | |
CN115527520A (en) | Anomaly detection method, device, electronic equipment and computer readable storage medium | |
CN115496734A (en) | Quality evaluation method of video content, network training method and device | |
CN112002325B (en) | Multi-language voice interaction method and device | |
CN115346520A (en) | Method, apparatus, electronic device and medium for speech recognition | |
CN114297409A (en) | Model training method, information extraction method and device, electronic device and medium | |
CN114358019A (en) | Method and system for training intention prediction model | |
CN114239601A (en) | Statement processing method and device and electronic equipment | |
US20240127812A1 (en) | Method and system for auto-correction of an ongoing speech command | |
CN115563962A (en) | Method, device and related equipment for detecting wrongly written text characters in parallel voice data |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
CB02 | Change of applicant information |
Country or region after: China
Address after: Room 501, 502, 503, No. 66 Boxia Road, China (Shanghai) Pilot Free Trade Zone, Pudong New Area, Shanghai, 201203
Applicant after: Daguan Data Co.,Ltd.
Address before: Room 301, 303 and 304, Block B, 112 Liangxiu Road, China (Shanghai) Pilot Free Trade Zone, Pudong New Area, Shanghai, 201203
Applicant before: DATAGRAND INFORMATION TECHNOLOGY (SHANGHAI) Co.,Ltd.
Country or region before: China
CB02 | Change of applicant information | ||
GR01 | Patent grant | ||
GR01 | Patent grant |