CN115796125B - Text generation method, model training method and device

Text generation method, model training method and device

Info

Publication number
CN115796125B
Authority
CN
China
Prior art keywords
text, word, value, inputting, output text
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202310078329.9A
Other languages
Chinese (zh)
Other versions
CN115796125A (en)
Inventor
耿瑞莹
石翔
李亮
黎槟华
李永彬
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Alibaba Damo Institute Hangzhou Technology Co Ltd
Original Assignee
Alibaba Damo Institute Hangzhou Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Alibaba Damo Institute Hangzhou Technology Co Ltd filed Critical Alibaba Damo Institute Hangzhou Technology Co Ltd
Priority to CN202310078329.9A priority Critical patent/CN115796125B/en
Publication of CN115796125A publication Critical patent/CN115796125A/en
Application granted granted Critical
Publication of CN115796125B publication Critical patent/CN115796125B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Compression, Expansion, Code Conversion, And Decoders (AREA)

Abstract

The application provides a text generation method, a model training method and a device. The method comprises: acquiring a table to be processed; generating a plurality of character groups based on the table to be processed, the plurality of character groups comprising text groups, each text group being composed, in a preset order, of a key text in the table to be processed, one value word corresponding to the key text, the forward-order position of the value word in the value text it belongs to, and the reverse-order position of the value word in that value text; inputting the plurality of character groups into an encoder for encoding to obtain a plurality of encoding vectors; inputting the plurality of encoding vectors into a text content extraction model for text content extraction to obtain a first output text; and inputting the plurality of encoding vectors and the first output text into a text splicing model for text splicing to obtain a target output text corresponding to the table to be processed. The method and the device can generate fluent text based on the content of the table to be processed.

Description

Text generation method, model training method and device
Technical Field
The present disclosure relates to the field of computer technologies, and in particular, to a text generating method, a model training method, and a device.
Background
Table-to-text generation is a technology that generates a natural-language text description from the information elements in a given structured table. It can help a user quickly acquire the key information in a table, and has wide application in scenarios such as biography generation, weather broadcasting and news event reporting. Therefore, how to convert a table into high-quality text is an important subject worthy of research.
In the related art, text is generated from a table by an end-to-end framework, but the fluency of the generated text cannot be guaranteed.
Disclosure of Invention
Various aspects of the application provide a text generation method, a model training method and a device, so as to improve the fluency of the text generated from a table.
A first aspect of an embodiment of the present application provides a text generation method, comprising: acquiring a table to be processed; generating a plurality of character groups based on the table to be processed, the plurality of character groups comprising text groups, each text group being composed, in a preset order, of a key text in the table to be processed, one value word corresponding to the key text, the forward-order position of the value word in the value text it belongs to, and the reverse-order position of the value word in that value text; inputting the plurality of character groups into an encoder for encoding to obtain a plurality of encoding vectors, the encoding vectors corresponding one to one with the character groups; inputting the plurality of encoding vectors into a text content extraction model for text content extraction to obtain a first output text, the first output text comprising at least one value word in the table to be processed; and inputting the plurality of encoding vectors and the first output text into a text splicing model for text splicing to obtain a target output text corresponding to the table to be processed, the target output text comprising the first output text and characters in a preset word stock.
A second aspect of the embodiments of the present application provides a text generation method applied to a terminal device, the text generation method including: acquiring a table to be processed; sending the table to be processed to a server; and receiving a target output text sent by the server, the target output text being determined by the server according to the text generation method of the first aspect.
A third aspect of the embodiments of the present application provides a text generating apparatus, including:
the acquisition module is used for acquiring the form to be processed;
the generating module is used for generating a plurality of character groups based on the table to be processed, the plurality of character groups comprising text groups, each text group being composed, in a preset order, of a key text in the table to be processed, one value word corresponding to the key text, the forward-order position of the value word in the value text it belongs to, and the reverse-order position of the value word in that value text;
the coding module is used for inputting a plurality of character groups into the coder to carry out coding processing to obtain a plurality of coding vectors, wherein the coding vectors correspond to the character groups one by one;
the extraction module is used for inputting the plurality of encoding vectors into the text content extraction model for text content extraction to obtain a first output text, the first output text comprising at least one value word in the table to be processed;
and the splicing module is used for inputting the plurality of encoding vectors and the first output text into a text splicing model for text splicing, to obtain a target output text corresponding to the table to be processed, the target output text comprising the first output text and characters in a preset word stock.
A fourth aspect of the present application provides an electronic device, comprising: a processor, a memory and a computer program stored on the memory and executable on the processor, the processor implementing the text generation method of the first aspect or the second aspect when executing the computer program.
A fifth aspect of the embodiments of the present application provides a computer-readable storage medium storing a computer program which, when executed by a processor, causes the processor to implement the text generation method of the first aspect or the second aspect.
The method and the device are applied to scenarios in which descriptive content is generated from tables: a table to be processed is acquired; a plurality of character groups are generated based on the table to be processed, the plurality of character groups comprising text groups, each text group being composed, in a preset order, of a key text in the table to be processed, one value word corresponding to the key text, the forward-order position of the value word in the value text it belongs to, and the reverse-order position of the value word in that value text; the plurality of character groups are input into an encoder for encoding to obtain a plurality of encoding vectors, the encoding vectors corresponding one to one with the character groups; the plurality of encoding vectors are input into a text content extraction model for text content extraction to obtain a first output text, the first output text comprising at least one value word in the table to be processed; and the plurality of encoding vectors and the first output text are input into a text splicing model for text splicing to obtain a target output text corresponding to the table to be processed, the target output text comprising the first output text and characters in a preset word stock, so that fluent text based on the content of the table to be processed can be generated.
Drawings
The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this application, illustrate embodiments of the application and together with the description serve to explain the application and do not constitute an undue limitation to the application. In the drawings:
fig. 1 is an application scenario diagram provided in an exemplary embodiment of the present application;
FIG. 2 is a flowchart illustrating steps of a method for generating text according to an exemplary embodiment of the present application;
FIG. 3 is a schematic diagram of a plurality of character sets provided in an exemplary embodiment of the present application;
FIG. 4 is a flowchart of steps of another text generation method provided by an exemplary embodiment of the present application;
fig. 5 is a flowchart of a text generating method according to an exemplary embodiment of the present application;
FIG. 6 is a schematic diagram of a second word predictor provided in an exemplary embodiment of the present application;
FIG. 7 is a flowchart illustrating steps of a model training method according to an exemplary embodiment of the present application;
fig. 8 is a block diagram of a text generating apparatus according to an exemplary embodiment of the present application;
fig. 9 is a schematic structural diagram of an electronic device according to an exemplary embodiment of the present application.
Detailed Description
For the purposes, technical solutions and advantages of the present application, the technical solutions of the present application will be clearly and completely described below with reference to specific embodiments of the present application and corresponding drawings. It will be apparent that the described embodiments are only some, but not all, of the embodiments of the present application. All other embodiments, which can be made by one of ordinary skill in the art without undue burden from the present disclosure, are within the scope of the present disclosure.
In the related art, text is generated end-to-end based on an encoder-decoder framework. A two-stage method has low generation efficiency and is difficult to apply in practical scenarios; adopting a non-autoregressive method can greatly improve decoding efficiency, but because all words are predicted in parallel, the coherence and consistency of the generated text are difficult to guarantee.
Based on the above problems, the application proposes a two-stage non-autoregressive prediction model (a text content extraction model and a text splicing model), so that the second-stage prediction of the text splicing model is built on the first-stage prediction result; the semantic dependencies among the words predicted by the first-stage text content extraction model are thereby captured, which greatly improves the fluency of the predicted target output text.
In the present embodiment, the execution device of the text generation method is not limited. Optionally, the text generation method may be implemented by means of a cloud computing system. For example, the text generation method may run on a cloud server, using cloud resources to run the various models; alternatively, it may run on server-side devices such as a conventional server, a cloud server or a server array.
In addition, referring to fig. 1, an application scenario diagram of the present application is shown. The terminal device 11 sends the table to be processed to the server 12, which hosts the models; the table to be processed is processed by the models in the server 12 to obtain a target output text describing the data in the table, such as "Li Wen is an actor and model" in fig. 1, and the target output text is then sent to the terminal device 11 to be provided to the user.
Fig. 1 is only an exemplary application scenario of the present application; the present application may also be applied to other text generation scenarios, which is not limited herein.
The following describes in detail the technical solutions provided by the embodiments of the present application with reference to the accompanying drawings.
Fig. 2 is a flowchart of steps of a text generating method according to an exemplary embodiment of the present application. The text generation method as shown in fig. 2 specifically comprises the following steps:
s201, a to-be-processed form is acquired.
The table to be processed may be sent by the terminal device. A table is a means of visual communication and a means of organizing data. The personal information table shown in fig. 1 is a table to be processed.
S202, generating a plurality of character groups based on the table to be processed.
Wherein the plurality of character groups include text groups, each text group being composed, in a preset order, of a key text in the table to be processed, one value word corresponding to the key text, the forward-order position of the value word in the value text it belongs to, and the reverse-order position of the value word in that value text. In this application, a key text and a value word form a key-value pair; the key text is the key of the pair, in text form, and the value word is the value of the pair, in word form.
In the embodiment of the application, each text group is a character group consisting, in order, of a value word w, a key text f, a forward-order position P+ and a reverse-order position P-. The value word may be a word or a single character. Each text group has a sequence number, e.g. r_i, and the table T to be processed may be represented as T = {r1, r2, r3, …, rn}, where n is a positive integer and each r_i = {w_i, f_i, P_i^+, P_i^-}.
Referring to fig. 3, the plurality of character groups in fig. 3 are generated based on the table to be processed in fig. 1 and include 5 text groups, specifically r1 = {Li Wen, name, 1, 1}, r2 = {Wang Wu, spouse, 1, 1}, r3 = {actor, occupation, 1, 2}, r4 = {model, occupation, 2, 1} and r5 = {22, age, 1, 1}. Take the text groups r3 and r4 as an example. In the table to be processed, the key text is occupation, and the value text corresponding to this key text is {actor, model}. The value word "actor" is first in forward order and second in reverse order in the value text, so its forward-order position is 1 and its reverse-order position is 2; the value word "model" is second in forward order and first in reverse order, so its forward-order position is 2 and its reverse-order position is 1.
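As an illustration only, the construction of these tuples can be sketched as follows; the dictionary-based table layout and the function name are assumptions for the example, not part of the patent:

```python
def build_character_groups(table):
    """Build (value word, key text, forward position, reverse position) tuples.

    `table` maps each key text to its value text, given as a list of value
    words, e.g. {"occupation": ["actor", "model"]}.
    """
    groups = []
    for key_text, value_words in table.items():
        n = len(value_words)
        for i, word in enumerate(value_words):
            forward_pos = i + 1   # 1-based position from the start of the value text
            reverse_pos = n - i   # 1-based position from the end of the value text
            groups.append((word, key_text, forward_pos, reverse_pos))
    return groups

# The personal-information table of fig. 1:
table = {"name": ["Li Wen"], "spouse": ["Wang Wu"],
         "occupation": ["actor", "model"], "age": ["22"]}
print(build_character_groups(table))
# [('Li Wen', 'name', 1, 1), ('Wang Wu', 'spouse', 1, 1),
#  ('actor', 'occupation', 1, 2), ('model', 'occupation', 2, 1),
#  ('22', 'age', 1, 1)]
```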
S203, inputting the plurality of character groups into an encoder for encoding processing to obtain a plurality of encoding vectors.
Wherein the encoding vectors and the character groups correspond one to one.
In the embodiment of the application, in the encoder, four trained embedding matrices are used to convert the text group r_i: one embedding matrix converts the value word w_i into a value vector e_wi, one converts the key text f_i into a key vector e_fi, one converts the forward-order position P_i^+ into a forward-order vector e_Pi+, and one converts the reverse-order position P_i^- into a reverse-order vector e_Pi-. These embedding vectors are then concatenated to obtain the concatenated vector [e_wi; e_fi; e_Pi+; e_Pi-], where ";" denotes the concatenation of embedding vectors, and the concatenated vector [e_wi; e_fi; e_Pi+; e_Pi-] is linearly projected to obtain the projection vector e_i of each text group. Specifically, the projection vector e_i is given by the following formula (1):

e_i = W · [e_wi; e_fi; e_Pi+; e_Pi-] + b    Formula (1)

where W and b in formula (1) are parameters trained in advance. Further, Transformer (a neural network model) encoding is used in the encoder to encode the projection vectors into a context sequence representation H^e = {h_1^e, …, h_n^e}, where H^e denotes the plurality of encoding vectors and h_i^e is one of the encoding vectors; the projection vector e_i and the encoding vector h_i^e correspond one to one, i.e. the encoding vector h_i^e corresponds one to one with the character group r_i.
S204, inputting the plurality of coding vectors into a text content extraction model to extract text content, and obtaining a first output text.
The first output text comprises at least one value word in the table to be processed, and the value words appear in the first output text in a certain order.
In the embodiment of the present application, the text content extraction model is a pre-trained non-autoregressive model. The text content extraction model extracts the value words that need to form the target output text from the table to be processed. For example, referring to fig. 1 or 3, the value words of the table to be processed include: Li Wen, Wang Wu, actor, model and 22. If the text content extraction model is trained for describing occupation, inputting the plurality of encoding vectors into the text content extraction model for text content extraction gives the first output text {Li Wen, actor, model}. If the text content extraction model is trained for describing the spouse, the first output text is {Li Wen, Wang Wu}. If the text content extraction model is trained for describing age, the first output text is {Li Wen, 22}; and if the text content extraction model is trained for describing spouse, occupation and age, the first output text is {Li Wen, Wang Wu, actor, model, 22}.
In the embodiment of the application, it can be understood that the text content extraction model extracts value words from the table to be processed and sorts the extracted value words in a certain order to obtain the first output text; the first output text thus remains consistent with the text in the table to be processed.
S205, inputting the plurality of encoding vectors and the first output text into a text splicing model for text splicing, to obtain a target output text corresponding to the table to be processed.
The target output text comprises the first output text and characters in a preset word stock. In the embodiment of the present application, the text splicing model is also a pre-trained non-autoregressive model. The text splicing model predicts the connecting characters between any two value words in the first output text, where the connecting characters are characters in the preset word stock.
Illustratively, if the first output text is {Li Wen, actor, model}, the target output text is {Li Wen is an actor and model}. The words "is", "an" and "and" in the target output text are predicted by the text splicing model; combining them with the first output text yields the target output text.
In the embodiment of the application, performing text content extraction in the text content extraction model yields a first output text that remains consistent with the data in the table to be processed. In addition, processing the first output text with the text splicing model improves the fluency of the obtained target output text.
Referring to fig. 4, a flowchart of steps of another text generation method according to an exemplary embodiment of the present application is provided. The text generation method as shown in fig. 4 specifically includes the following steps:
S401, a table to be processed is acquired.
The specific implementation process of this step refers to S201, and will not be described here again.
S402, generating a plurality of character groups based on the table to be processed.
The specific implementation process of this step refers to S202, and will not be described here again.
S403, inputting the plurality of character groups into an encoder for encoding processing to obtain a plurality of encoding vectors.
For the specific implementation of this step, refer to S203. It should be noted that the plurality of character groups further include a first identification character group and a second identification character group, each of which is a character group with the same format as the text groups. Referring to fig. 3, the first identification character group may be set to r0 = {[BOS], [BOS]} and the second identification character group to r(n+1) = {[EOS], [EOS]}.
In addition, the first identification character group is spliced before the plurality of text groups and the second identification character group is spliced after them, so the representation of the table to be processed is T = {r0, r1, r2, r3, …, rn, r(n+1)}. T = {r0, r1, r2, r3, …, rn, r(n+1)} is then subjected to the encoding of formula (1) and the Transformer; the projection vector corresponding to the first identification character group is e_0 and the projection vector corresponding to the second identification character group is e_(n+1), and the context sequence of the plurality of encoding vectors is represented as H^e = {h_0^e, h_1^e, …, h_n^e, h_(n+1)^e}, where h_0^e is the encoding vector corresponding to the first identification character group and h_(n+1)^e is the encoding vector corresponding to the second identification character group.
S404, inputting the coded vectors corresponding to the first identification character set and the second identification character set into a decoder for decoding processing to obtain corresponding identification decoding data.
Wherein, referring to fig. 5, the text content extraction model includes: a decoder, a placeholder predictor and a first word predictor. In fig. 5, the same name denotes the same module, i.e. all decoders shown in fig. 5 are one and the same decoder.
In an embodiment of the present application, the encoding vector h_0^e of the first identification character group and the encoding vector h_(n+1)^e of the second identification character group are combined to obtain the identification vector {h_0^e, h_(n+1)^e}, and the identification vector is input into the decoder to obtain the corresponding identification decoding data, where the identification decoding data include the decoding data d_bos of the first identification character (corresponding to the encoding vector h_0^e) and the decoding data d_eos of the second identification character (corresponding to the encoding vector h_(n+1)^e).
S405, inputting the identification decoding data into a placeholder predictor, predicting the first number of placeholders between the first identification character and the second identification character in the placeholder predictor, and adding the first number of placeholders between the first identification character and the second identification character to obtain a placeholder sequence.
Specifically, the first placeholder number is determined in the placeholder predictor using the following formula (2):

π^l = softmax(W_l · [d_bos; d_eos])    Formula (2)

where softmax is a logistic regression function, W_l is a pre-trained projection matrix, π^l represents the probability distribution of the number of placeholders between the first identification character [BOS] and the second identification character [EOS], and l, the number of placeholders with the highest probability, is the first placeholder number. Then l first placeholders [PLH] are inserted between the first identification character [BOS] and the second identification character [EOS] to obtain the placeholder sequence y_1.

Illustratively, referring to fig. 5, after the identification vector {h_0^e, h_(n+1)^e} is processed by the decoder and the placeholder predictor, the first placeholder number is 3, and the placeholder sequence y_1 = {[BOS][PLH][PLH][PLH][EOS]} is obtained.
S406, inputting the placeholder sequences and the plurality of encoding vectors into a decoder for decoding processing, and obtaining the placeholder decoding data of the first placeholder.
Wherein the plurality of encoding vectors H^e and the placeholder sequence y_1 are input into the decoder for processing, and the placeholder decoding data of each first placeholder is obtained. Illustratively, for y_1 = {[BOS][PLH][PLH][PLH][EOS]} in fig. 5, the placeholder decoding data corresponding to the first [PLH] is denoted h_1, the placeholder decoding data corresponding to the second [PLH] is denoted h_2, and the placeholder decoding data corresponding to the third [PLH] is denoted h_3.
S407, inputting the placeholder decoding data and the plurality of coding vectors into a first word predictor, determining confidence scores between the placeholders and the coding vectors in the first word predictor, and replacing the first placeholders in the placeholder sequence by words with the highest confidence scores for the first placeholders to obtain a second output text.
Specifically, in the first word predictor, the confidence score between each first placeholder and each encoding vector is first determined using the following formula (3):

π^p_i = softmax(W_p · h_i)    Formula (3)

where softmax is a logistic regression function, W_p is a pre-trained projection matrix, and π^p_{i,j}, the j-th component of π^p_i, represents the confidence probability between the i-th first placeholder [PLH] and the j-th value word; the confidence with the highest probability is taken as the confidence between the first placeholder [PLH] and the j-th value word.

Illustratively, referring to fig. 5, there are 3 first placeholders and 5 value words (with corresponding encoding vectors h_1^e to h_5^e), and the confidence between each first placeholder and each value word is calculated; the calculation process and results are referred to in table 1.

TABLE 1

(Table 1 is shown as an image in the original; it lists the confidence score of each of the 3 first placeholders against each of the 5 value words.)

Further, in the embodiment of the application, after the confidence of each first placeholder with each value word is determined, each first placeholder in the placeholder sequence is replaced by the value word with the highest confidence score for that placeholder, so that the second output text is obtained.

Illustratively, referring to table 1, for the first first placeholder [PLH], the highest confidence corresponds to the value word "Li Wen", so the first placeholder is replaced with "Li Wen". For the second first placeholder [PLH], the highest confidence corresponds to the value word "actor", so the second first placeholder is replaced with "actor". For the third first placeholder [PLH], the highest confidence corresponds to the value word "model", so the third first placeholder is replaced with "model". The second output text y_2 = {[BOS][Li Wen][actor][model][EOS]} is thus obtained. Referring to fig. 5, the placeholder sequence y_1 = {[BOS][PLH][PLH][PLH][EOS]} yields the second output text y_2 = {[BOS][Li Wen][actor][model][EOS]} after being processed by the decoder and the first word predictor.
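A minimal sketch of the first word predictor of formula (3), assuming PyTorch; scoring each placeholder state against the tuple encoding vectors with a dot product after the projection W_p is an assumption about the exact form:

```python
import torch
import torch.nn as nn

class WordPredictor(nn.Module):
    """Fill each [PLH] with the value word whose confidence is highest."""
    def __init__(self, dim):
        super().__init__()
        self.proj = nn.Linear(dim, dim)  # W_p of formula (3)

    def forward(self, plh_states, enc_vectors, value_words):
        # plh_states: (num_plh, dim); enc_vectors: (num_words, dim)
        scores = self.proj(plh_states) @ enc_vectors.T   # (num_plh, num_words)
        pi = torch.softmax(scores, dim=-1)               # confidence per value word
        best = pi.argmax(dim=-1)                         # highest-confidence word index
        return [value_words[j] for j in best.tolist()]
```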
S408, determining the first output text according to the second output text.
In an alternative embodiment, the second output text is directly determined as the first output text for processing of the subsequent text splicing model.
In another alternative embodiment, the text content extraction model further includes a second word predictor, and determining the first output text from the second output text includes: determining the embedding vectors of the words in the second output text; inputting the embedding vectors, the placeholder decoding data and the plurality of encoding vectors into the decoder for decoding to obtain first decoding data of the words in the second output text; inputting the first decoding data and the plurality of encoding vectors into the second word predictor, determining, in the second word predictor, confidence scores between each word position in the second output text and the encoding vectors, and replacing each word in the second output text with the value word that has the highest confidence score for that position, to obtain the first output text.
Referring to fig. 6, suppose the placeholder decoding data corresponding to the first [PLH] is h_1, that corresponding to the second [PLH] is h_2, that corresponding to the third [PLH] is h_3, and the second output text obtained by the first word predictor is y_2 = {[BOS][Li Wen][actor][actor][EOS]}; it can be seen that there is a duplicate value word "actor". The embedding vectors (e_1, e_3 and e_3) of the words in the second output text are then determined; the embedding vectors (e_1, e_3 and e_3), the placeholder decoding data (h_1, h_2, h_3) and the plurality of encoding vectors H^e are input into the decoder for decoding, obtaining the first decoding data (h'1, h'2 and h'3) of the words in the second output text.

Further, in the embodiment of the present application, the text output by the second word predictor may be used as the first output text. For example, the text output by the second word predictor in fig. 6 is y'_2 = {[BOS][Li Wen][actor][model][EOS]}; this text y'_2 = {[BOS][Li Wen][actor][model][EOS]} may be used as the first output text.
In the embodiment of the application, the prediction mode of the second word predictor is the same as that of the first word predictor, except that the projection matrices are different. In addition, the second word predictor is added to calibrate the output result of the first word predictor, so that a more accurate first output text can be output.
In another alternative embodiment, the text content extraction model further comprises a word remover, and determining the first output text from the second output text comprises: inputting the second output text and the plurality of encoding vectors into the decoder to obtain second decoding data corresponding to the second output text; and inputting the second decoding data into the word remover, determining in the word remover the accuracy of each word in the second output text, and removing words whose accuracy is less than a first threshold from the second output text to obtain the first output text.
In the embodiment of the present application, referring to fig. 5, the second output text output by the first word predictor may be input into the word remover for processing; alternatively, the text output by the second word predictor of fig. 6 may be input into the word remover for processing.
Specifically, the second output text and the plurality of encoding vectors are input into the decoder to obtain the second decoding data corresponding to the second output text, i.e. the second decoding data corresponding to each value word in the second output text, denoted h''_i. For example, the second decoding data corresponding to "Li Wen" is denoted h''_1, the second decoding data corresponding to "actor" is denoted h''_2, and the second decoding data corresponding to "model" is denoted h''_3.
Specifically, in the word remover, the following formula (4) is used to determine the accuracy:

π^d_i = softmax(W_d · h''_i)    Formula (4)

where softmax is a logistic regression function, W_d is a pre-trained projection matrix, π^d_i represents the probability distribution of the accuracy of the i-th value word, and d_i, the accuracy with the highest probability, is taken as the accuracy of the i-th value word.

Further, the threshold is set to 0.5, and value words whose accuracy is smaller than 0.5 are deleted from the second output text. Illustratively, referring to fig. 5, the second output text y_2 = {[BOS][Li Wen][actor][model][EOS]} is input into the word remover; since the accuracies of the value words [Li Wen], [actor] and [model] in the second output text are all greater than 0.5, none of the value words in the second output text is deleted, and the first output text y_3 = {[BOS][Li Wen][actor][model][EOS]} output by the word remover is the same as the second output text.
In the embodiment of the application, adding the word remover can further improve the prediction accuracy of the first output text.
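A minimal sketch of the word remover of formula (4), assuming PyTorch; reading a two-class (delete/keep) output as the accuracy d_i is an assumption about the exact scoring:

```python
import torch
import torch.nn as nn

class WordRemover(nn.Module):
    """Delete words whose predicted accuracy falls below a threshold."""
    def __init__(self, dim):
        super().__init__()
        self.proj = nn.Linear(dim, 2)  # W_d of formula (4): [delete, keep]

    def forward(self, word_states, words, threshold=0.5):
        pi = torch.softmax(self.proj(word_states), dim=-1)
        keep_prob = pi[:, 1]           # accuracy d_i of each value word
        return [w for w, p in zip(words, keep_prob.tolist()) if p >= threshold]
```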
S409, inputting the first output text and the plurality of encoding vectors into a decoder for decoding processing to obtain third decoded data.
In an embodiment of the present application, a text splicing model includes: a decoder, a placeholder predictor, and a third word predictor. The text splicing model and the text content extraction model share the decoder, the placeholder predictor and the word remover, so that the calculation cost and the memory occupation can be reduced.
In addition, the decoding principle of the decoder is referred to the above, and will not be described herein.
S410, inputting the third decoded data into a placeholder predictor, predicting the number of second placeholders between adjacent value words in the first output text in the placeholder predictor, and adding second placeholders corresponding to the number of second placeholders between the adjacent value words to obtain a character sequence.
In the embodiment of the application, the first identification character [BOS] and the second identification character [EOS] are also treated as value words, and the second placeholders are predicted for them together with the other value words. The second placeholder may be identical to the first placeholder.
Illustratively, referring to fig. 5, for the first output text y_3 = {[BOS][Li Wen][actor][model][EOS]}, the placeholder predictor predicts the number of second placeholders between [BOS] and [Li Wen] as, e.g., 0; between [Li Wen] and [actor] as, e.g., 3; between [actor] and [model] as, e.g., 1; and between [model] and [EOS] as 0. The corresponding placeholders are filled in at the corresponding positions to obtain the character sequence y_4 = {[BOS][Li Wen][PLH][PLH][PLH][actor][PLH][model][EOS]}.
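The insertion of second placeholders in step S410 can be illustrated as follows (a plain-Python sketch; the list-based representation is assumed for the example):

```python
def insert_placeholders(words, counts):
    """Insert [PLH] tokens between adjacent words of the first output text.

    counts[i] is the predicted number of second placeholders between
    words[i] and words[i + 1].
    """
    seq = [words[0]]
    for word, count in zip(words[1:], counts):
        seq.extend(["[PLH]"] * count)
        seq.append(word)
    return seq

words = ["[BOS]", "[Li Wen]", "[actor]", "[model]", "[EOS]"]
print(insert_placeholders(words, [0, 3, 1, 0]))
# ['[BOS]', '[Li Wen]', '[PLH]', '[PLH]', '[PLH]', '[actor]', '[PLH]', '[model]', '[EOS]']
```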
S411, inputting the character sequence and the plurality of coding vectors into a decoder for decoding processing to obtain fourth decoding data.
S412, inputting the fourth decoding data into a third word predictor, predicting in the third word predictor the filling characters corresponding to the second placeholders, and replacing the corresponding second placeholders in the character sequence with the filling characters to obtain a third output text.
The third word predictor is pre-trained and can predict the filling character for each placeholder. Referring to fig. 5, for example, among the placeholders between [Li Wen] and [actor], the character for the first placeholder is "is", the character for the second is "one" and the character for the third is "a" (the example is translated from Chinese, where the fillers are 是, 一 and 个); the character corresponding to the placeholder between [actor] and [model] is "and". The third output text output by the third word predictor is then y_5 = {[BOS][Li Wen][is][one][a][actor][and][model][EOS]}.
S413, determining a target output text according to the third output text.
In one embodiment, the third output text may be directly determined as the target output text; the target output text is then y_5 = {[BOS][Li Wen][is][one][a][actor][and][model][EOS]}.
In another embodiment, the text splicing model further comprises a word remover, and determining the target output text from the third output text comprises: inputting the third output text and the plurality of encoding vectors into the decoder to obtain fifth decoding data corresponding to the third output text; and inputting the fifth decoding data into the word remover, determining in the word remover the accuracy of each value word in the third output text, and removing words whose accuracy is less than the second threshold from the third output text to obtain the target output text.
For details of the word remover, refer to the description under S408, which is not repeated here.
Illustratively, referring to fig. 5, the target output text output by the word remover in the text splicing model is y_6 = {[BOS][Li Wen][is][a][actor][and][model][EOS]}, i.e. "Li Wen is an actor and model". That is, the accuracy corresponding to "one" is determined in the word remover to be less than 0.5, so the word "one" is removed in the word remover.
In the embodiment of the application, first, the text content extraction model and the text splicing model share some modules, which reduces the computing resources required by the models. In addition, the text content extraction model and the text splicing model are both non-autoregressive, so the quality of the target output text can be improved. Moreover, the first output text extracted by the text content extraction model keeps consistency with the table to be processed, and processing the first output text with the text splicing model to obtain the target output text improves the fluency of the target output text, so that a consistent and fluent target output text can be obtained.
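Putting the stages together, the overall flow of figs. 2 and 4 can be sketched as follows; the callables and their interfaces are assumptions for illustration, not the patent's API:

```python
def generate_text(table, encoder, extractor, splicer):
    """End-to-end flow: table -> tuples -> encoding -> extraction -> splicing."""
    groups = build_character_groups(table)          # step S202 (see earlier sketch)
    enc_vectors = encoder(groups)                   # step S203: one vector per tuple
    first_text = extractor(enc_vectors)             # stage 1: pick and order value words
    target_text = splicer(enc_vectors, first_text)  # stage 2: predict connecting words
    return target_text
```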
Referring to fig. 7, a model training method provided in the present application specifically includes the following steps:
S701, acquiring a training sample and a label text corresponding to the training sample.
Wherein the training samples comprise a plurality of sample character groups, the plurality of sample character groups comprising text groups, each text group being composed, in a preset order, of a sample key text in a sample table, one sample value word corresponding to the sample key text, the forward-order position of the sample value word in the sample value text it belongs to, and the reverse-order position of the sample value word in that sample value text.
In this embodiment of the present application, a sample table may be obtained first, and the sample table is then processed to obtain the sample character groups; the format of the sample character groups is described in the above embodiments and not repeated here. The label text is a consistent and fluent natural-language text corresponding to the training sample.
S702, inputting a plurality of sample character sets into an encoder for encoding processing to obtain a plurality of sample encoding vectors.
Wherein the sample encoding vectors and the sample character sets are in one-to-one correspondence. In addition, the encoding mode of the encoder refers to the above embodiment, and will not be described herein.
S703, inputting the plurality of sample coding vectors into a text content extraction model to extract text content, thereby obtaining a first predicted text.
S704, determining the characters which are the same as the sample value words in the label text, and obtaining the intermediate text.
For example, if the label text is "Li Wen is an actor and model" and the sample value words include Li Wen, Wang Wu, actor, model and 22, then Li Wen, actor and model are determined to be the intermediate text.
S705, adjusting model parameters of the text content extraction model according to the first loss values of the first predicted text and the intermediate text.
In the embodiment of the application, the model parameters of the text content extraction model may be adjusted according to the first loss value between the first predicted text and the intermediate text, until the first loss value is smaller than a first loss value threshold.
S706, inputting the plurality of sample coding vectors and the first predicted text into a text splicing model for text splicing, and obtaining a target predicted text.
And S707, adjusting model parameters of the text content extraction model and model parameters of the text splicing model according to the second loss values of the target predicted text and the label text.
In the embodiment of the application, the text content extraction model and the text splicing model can be trained simultaneously in a multi-task mode. Wherein the loss function is expressed as formula (5):
L = λL_1 + L_2    Formula (5)

where L is the total loss value, λ is a preset coefficient, L_1 is the loss value corresponding to the text content extraction model, and L_2 is the loss value corresponding to the text splicing model.

Further, L_1 = L'_1 + L'_2 + L'_3 and L_2 = L''_1 + L''_2 + L''_3, where L'_1 represents the loss value corresponding to the placeholder predictor during processing by the text content extraction model, L'_2 the loss value corresponding to the first word predictor, and L'_3 the loss value corresponding to the word remover; L''_1 represents the loss value corresponding to the placeholder predictor during processing by the text splicing model, L''_2 the loss value corresponding to the third word predictor, and L''_3 the loss value corresponding to the word remover.
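A minimal sketch of the multi-task loss of formula (5); the value of λ and the dictionary layout are assumptions for the example:

```python
def total_loss(extraction_losses, splicing_losses, lam=0.5):
    """Combine the multi-task losses of formula (5): L = lambda * L1 + L2."""
    l1 = sum(extraction_losses.values())   # L1 = L'1 + L'2 + L'3
    l2 = sum(splicing_losses.values())     # L2 = L''1 + L''2 + L''3
    return lam * l1 + l2

loss = total_loss(
    {"placeholder": 0.7, "word": 1.2, "remover": 0.3},
    {"placeholder": 0.5, "word": 1.0, "remover": 0.2},
)
```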
Specifically, the loss values are calculated with reference to formulas that appear as images in the original and are therefore only described here: L'_1 is a loss over the number of placeholders predicted by the placeholder predictor, L'_2 a loss over the words predicted by the first word predictor, and L'_3 a loss over the deletions predicted by the word remover. In these formulas, T' represents the sample table, P represents the label text, and {h_0^e, h_(n+1)^e} represents the combined vector of the first identification character group and the second identification character group corresponding to the label text, corresponding to d_bos and d_eos above. The number of words in the label text may be a set number.

Further, for each value word whose accuracy d_i predicted by the word remover is lower than the threshold, the corresponding value word is deleted, and the loss value L'_3 is calculated between the obtained result and the intermediate text.
Illustratively, if the label text contains 5 words, the target placeholder number l* is 5. The identification vector {h_0^e, h_(n+1)^e} is input into the placeholder predictor to predict the number of placeholders, and the parameters (projection matrix) of the placeholder predictor are adjusted according to the loss between the predicted number and l*.

Further, l* placeholders are filled between the first identification character and the second identification character, resulting in, for example, y* = {[BOS][PLH][PLH][PLH][PLH][PLH][EOS]}, which is input into the first word predictor to determine the confidence between each placeholder and each value word in the label sample, where π^p_{i,j} represents the confidence of the i-th placeholder with the corresponding encoding vector. For example, if the value word corresponding to the first placeholder [PLH] is [Li Wen], the true value of the confidence between the first placeholder [PLH] and [Li Wen] is set to 1 and the confidences with the other value words are set to 0, and the first word predictor is then adjusted according to the prediction result and the true values.
In the embodiment of the present application, the determination of L''_1, L''_2 and L''_3 refers to the foregoing. In addition, for the part of the training process that obtains the prediction results, refer to the process of determining the target output text in the foregoing embodiments, which is not repeated here.
In addition, the application also provides a text generation method applied to a terminal device, comprising: acquiring a table to be processed; sending the table to be processed to a server; and receiving a target output text sent by the server, the target output text being determined by the server according to the text generation method of the above embodiments.
According to the embodiment of the application, the terminal device can acquire the table to be processed and send it to the server, so that a descriptive text of the table to be processed can be obtained.
In addition, the application also provides a text generation system, which comprises:
the cloud server is provided with a pre-trained text content extraction model;
the terminal equipment is used for acquiring the form to be processed and sending the form to be processed to the server;
the cloud server is used for acquiring the table to be processed; generating a plurality of character groups based on the table to be processed, the plurality of character groups comprising text groups, each text group being composed, in a preset order, of a key text in the table to be processed, one value word corresponding to the key text, the forward-order position of the value word in the value text it belongs to, and the reverse-order position of the value word in that value text; inputting the plurality of character groups into an encoder for encoding to obtain a plurality of encoding vectors, the encoding vectors corresponding one to one with the character groups; inputting the plurality of encoding vectors into the text content extraction model for text content extraction to obtain a first output text, the first output text comprising at least one value word in the table to be processed; and inputting the plurality of encoding vectors and the first output text into a text splicing model for text splicing to obtain a target output text corresponding to the table to be processed, the target output text comprising the first output text and characters in a preset word stock;
And the terminal equipment is used for receiving the target output text sent by the server.
The specific implementation process refers to the above embodiment, and will not be described herein.
In the embodiment of the present application, in addition to providing a text generating method, a text generating apparatus is provided, as shown in fig. 8, the text generating apparatus 80 includes:
an obtaining module 81, configured to obtain a table to be processed;
a generating module 82, configured to generate a plurality of character groups based on the table to be processed, where the plurality of character groups include text groups, each text group being composed, in a preset order, of a key text in the table to be processed, one value word corresponding to the key text, the forward-order position of the value word in the value text it belongs to, and the reverse-order position of the value word in that value text;
the encoding module 83 is configured to input a plurality of character sets into the encoder to perform encoding processing, so as to obtain a plurality of encoding vectors, where the encoding vectors correspond to the character sets one by one;
the extracting module 84 is configured to input the plurality of encoding vectors into the text content extracting model to extract text content, so as to obtain a first output text, where the first output text includes at least one value word in the to-be-processed table;
and the splicing module 85 is configured to input the plurality of encoding vectors and the first output text into the text splicing model for text splicing, to obtain a target output text corresponding to the table to be processed, where the target output text includes the first output text and characters in a preset word stock.
In an alternative embodiment, the plurality of character sets further comprises: the text content extraction model comprises: the decoder, the placeholder predictor and the first word predictor, the extraction module 84 is specifically configured to: inputting the coded vectors corresponding to the first identification character group and the second identification character group into a decoder for decoding to obtain corresponding identification decoding data, wherein the identification decoding data comprises decoding data of the first identification character and decoding data of the second identification character; inputting the identification decoding data into a placeholder predictor, predicting the first number of placeholders between a first identification character and a second identification character in the placeholder predictor, and adding the first number of placeholders between the first identification character and the second identification character to obtain a placeholder sequence; inputting the placeholder sequences and the plurality of coding vectors into a decoder for decoding processing to obtain placeholder decoding data of the first placeholder; inputting the placeholder decoding data and a plurality of coding vectors into a first word predictor, determining confidence scores between the placeholders and the coding vectors in the first word predictor, and replacing the first placeholders in the placeholder sequence by using a value word with the highest confidence score for the first placeholders to obtain a second output text; the first output text is determined from the second output text.
In an alternative embodiment, the text content extraction model further includes a second word predictor, and the extraction module 84 is configured to determine the first output text based on the second output text, specifically: determining an embedded vector of the word in the second output text; inputting the embedded vectors, the placeholder decoding data and the plurality of encoding vectors into a decoder for decoding processing to obtain first decoding data of the words in the second output text; inputting the first decoding data and a plurality of coding vectors into a second word predictor, determining confidence scores between value words in a second output text and the coding vectors in the second word predictor, and replacing corresponding words in the second output text by using the value word with the highest confidence score aiming at the value words in the second output text to obtain a first output text.
In an alternative embodiment, the text content extraction model further comprises: the word remover, the extraction module 84 is specifically configured to determine the first output text from the second output text: inputting the second output text and the plurality of encoding vectors into a decoder to obtain second decoding data corresponding to the second output text; and inputting the second decoding data into a word remover, determining the accuracy rate of the words in the second output text in the word remover, and removing the words with the accuracy rate smaller than a first threshold value in the second output text to obtain the first output text.
In an alternative embodiment, the text splice model includes: decoder, placeholder predictor and third word predictor, splice module 85, specifically for: inputting the first output text and the plurality of encoding vectors into a decoder for decoding processing to obtain third decoding data; inputting the third decoded data into a placeholder predictor, predicting the number of second placeholders between adjacent value words in the first output text in the placeholder predictor, and adding second placeholders corresponding to the number of second placeholders between the adjacent value words to obtain a character sequence; inputting the character sequence and the plurality of coding vectors into a decoder for decoding processing to obtain fourth decoding data; inputting the fourth decoded data into a third word predictor, predicting filling characters corresponding to the placeholders in the word predictor, and replacing the corresponding second placeholders in the character sequence by the filling characters to obtain a third output text; and determining target output text according to the third output text.
In an alternative embodiment, the text splicing model further comprises a word remover, and the splicing module 85, in determining the target output text according to the third output text, is specifically configured to: input the third output text and the plurality of encoding vectors into the decoder to obtain fifth decoding data corresponding to the third output text; and input the fifth decoding data into the word remover, determine in the word remover the accuracy of each word in the third output text, and remove words whose accuracy is less than a second threshold from the third output text to obtain the target output text.
In an alternative embodiment, the device further comprises a training module (not shown), specifically configured to train the text content extraction model in the following manner: acquiring a training sample and a label text corresponding to the training sample, where the training sample comprises a plurality of sample character groups, the plurality of sample character groups comprising text groups, each text group being composed, in a preset order, of a sample key text in a sample table, one sample value word corresponding to the sample key text, the forward-order position of the sample value word in the sample value text it belongs to, and the reverse-order position of the sample value word in that sample value text; inputting the plurality of sample character groups into the encoder for encoding to obtain a plurality of sample encoding vectors, the sample encoding vectors corresponding one to one with the sample character groups; inputting the plurality of sample encoding vectors into the text content extraction model for text content extraction to obtain a first predicted text; determining the characters in the label text that are the same as the sample value words, to obtain an intermediate text; and adjusting the model parameters of the text content extraction model according to the first loss value between the first predicted text and the intermediate text.
In an alternative embodiment, the training module (not shown) is further configured to train the text content extraction model and the text splicing model in the following manner: inputting the plurality of sample encoding vectors and the first predicted text into the text splicing model for text splicing to obtain a target predicted text; and adjusting the model parameters of the text content extraction model and the model parameters of the text splicing model according to a second loss value between the target predicted text and the label text.
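Continuing the sketch above, a plausible joint update routes the second loss value through a single optimizer built over the parameters of both models; the optimizer choice and learning rate are assumptions.

```python
import itertools

import torch
import torch.nn.functional as F

def make_joint_optimizer(extraction_model: torch.nn.Module,
                         splice_model: torch.nn.Module,
                         lr: float = 1e-4) -> torch.optim.Optimizer:
    # one optimizer over both parameter sets, so the second loss value
    # adjusts the extraction model and the splicing model together
    params = itertools.chain(extraction_model.parameters(),
                             splice_model.parameters())
    return torch.optim.Adam(params, lr=lr)

def joint_step(optimizer: torch.optim.Optimizer,
               splice_logits: torch.Tensor,
               label_ids: torch.Tensor) -> torch.Tensor:
    loss2 = F.cross_entropy(splice_logits, label_ids)  # second loss value
    optimizer.zero_grad()
    loss2.backward()  # gradients flow back through both models
    optimizer.step()
    return loss2
```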
The text generation device provided by the embodiments of the present application can improve the fluency of text generated from table data. For the specific implementation process, reference is made to the above method embodiments, which is not repeated here.
In addition, some of the flows described in the above embodiments and drawings include a plurality of operations that appear in a particular order, but it should be clearly understood that these operations may be performed out of the order in which they appear herein, or in parallel; the sequence numbers merely distinguish the various operations and do not by themselves represent any order of execution. In addition, the flows may include more or fewer operations, and these operations may be performed sequentially or in parallel. It should be noted that the descriptions of "first", "second" and the like herein are used to distinguish different messages, devices, modules, etc.; they do not represent a sequence, nor do they require that the "first" and the "second" be of different types.
Fig. 9 is a schematic structural diagram of an electronic device according to an exemplary embodiment of the present application. The electronic device is configured to run the text generation methods and the model training method described above. As shown in Fig. 9, the electronic device includes: a memory 94 and a processor 95.
Memory 94 is used to store computer programs and may be configured to store various other data to support operations on the electronic device. The memory 94 may be an object store (Object Storage Service, OSS).
The memory 94 may be implemented by any type or combination of volatile or nonvolatile memory devices such as Static Random Access Memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic or optical disk.
A processor 95, coupled to the memory 94, is configured to execute the computer program in the memory 94 to: acquire a table to be processed; generate a plurality of character groups based on the table to be processed, wherein each character group is composed of, in a preset order, a key text in the table to be processed, one value word corresponding to the key text, the forward-order position of the value word in the value text to which it belongs, and the reverse-order position of the value word in the value text to which it belongs; input the plurality of character groups into an encoder for encoding to obtain a plurality of encoding vectors, wherein the encoding vectors correspond to the character groups one by one; input the plurality of encoding vectors into a text content extraction model to extract text content, so as to obtain a first output text, wherein the first output text comprises at least one value word in the table to be processed; and input the plurality of encoding vectors and the first output text into a text splicing model for text splicing to obtain a target output text corresponding to the table to be processed, wherein the target output text comprises the first output text and characters from a preset word stock.
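As a concrete illustration of the character groups, the sketch below flattens a key/value table into (key text, value word, forward-order position, reverse-order position) tuples; whitespace tokenisation and 1-based positions are assumptions, since the disclosure fixes neither.

```python
from typing import Dict, List, Tuple

# (key text, value word, forward-order position, reverse-order position)
CharGroup = Tuple[str, str, int, int]

def build_character_groups(table: Dict[str, str]) -> List[CharGroup]:
    groups: List[CharGroup] = []
    for key_text, value_text in table.items():
        words = value_text.split()  # assumed tokenisation
        n = len(words)
        for i, word in enumerate(words):
            groups.append((key_text, word, i + 1, n - i))  # 1-based
    return groups

# e.g. build_character_groups({"name": "Walter Extra"}) ->
#   [("name", "Walter", 1, 2), ("name", "Extra", 2, 1)]
```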
Further optionally, the plurality of character groups further includes a first identification character group and a second identification character group, the text content extraction model includes a decoder, a placeholder predictor and a first word predictor, and the processor 95 is specifically configured to: input the encoding vectors corresponding to the first identification character group and the second identification character group into the decoder for decoding to obtain corresponding identification decoding data, wherein the identification decoding data includes decoding data of the first identification character and decoding data of the second identification character; input the identification decoding data into the placeholder predictor, predict in the placeholder predictor a first number of placeholders between the first identification character and the second identification character, and add that first number of first placeholders between the first identification character and the second identification character to obtain a placeholder sequence; input the placeholder sequence and the plurality of encoding vectors into the decoder for decoding to obtain placeholder decoding data of the first placeholders; input the placeholder decoding data and the plurality of encoding vectors into the first word predictor, determine in the first word predictor confidence scores between the placeholders and the encoding vectors, and replace, for each first placeholder, the first placeholder in the placeholder sequence with the value word having the highest confidence score, so as to obtain a second output text; and determine the first output text according to the second output text.
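The first word predictor acts like a copy mechanism: each placeholder's decoding state is scored against every encoding vector, and the placeholder takes the value word of the best-scoring character group. The sketch below assumes dot-product scores as the confidence measure, which the disclosure does not specify; the second word predictor of the following embodiment can reuse the same scoring over the already-copied words to refine them.

```python
from typing import List

import torch

def copy_value_words(plh_states: torch.Tensor,
                     enc_vectors: torch.Tensor,
                     value_words: List[str]) -> List[str]:
    """plh_states:  (num_placeholders, d) placeholder decoding data
    enc_vectors: (num_groups, d), one vector per character group
    Returns the highest-confidence value word for each placeholder."""
    scores = plh_states @ enc_vectors.T  # assumed confidence: dot product
    best = scores.argmax(dim=-1)         # best-scoring character group
    return [value_words[i] for i in best.tolist()]
```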
In an alternative embodiment, the text content extraction model further includes a second word predictor, and the processor 95 is configured to determine the first output text from the second output text by: determining embedded vectors of the words in the second output text; inputting the embedded vectors, the placeholder decoding data and the plurality of encoding vectors into the decoder for decoding to obtain first decoding data of the words in the second output text; and inputting the first decoding data and the plurality of encoding vectors into the second word predictor, determining in the second word predictor confidence scores between the value words in the second output text and the encoding vectors, and replacing, for each value word in the second output text, the corresponding word with the value word having the highest confidence score, so as to obtain the first output text.
In an alternative embodiment, the text content extraction model further comprises a word remover, and the processor 95, when determining the first output text from the second output text, is specifically configured to: input the second output text and the plurality of encoding vectors into the decoder to obtain second decoding data corresponding to the second output text; and input the second decoding data into the word remover, determine in the word remover the accuracy rate of each word in the second output text, and remove from the second output text the words whose accuracy rate is smaller than the first threshold value, so as to obtain the first output text.
In an alternative embodiment, the text splicing model includes a decoder, a placeholder predictor and a third word predictor, and the processor 95 is specifically configured to: input the first output text and the plurality of encoding vectors into the decoder for decoding to obtain third decoding data; input the third decoding data into the placeholder predictor, predict in the placeholder predictor the number of second placeholders between adjacent value words in the first output text, and add the corresponding number of second placeholders between the adjacent value words to obtain a character sequence; input the character sequence and the plurality of encoding vectors into the decoder for decoding to obtain fourth decoding data; input the fourth decoding data into the third word predictor, predict in the third word predictor the filling characters corresponding to the second placeholders, and replace the corresponding second placeholders in the character sequence with the filling characters to obtain a third output text; and determine the target output text according to the third output text.
In an alternative embodiment, the text splicing model further comprises a word remover, and the processor 95, when determining the target output text according to the third output text, is specifically configured to: input the third output text and the plurality of encoding vectors into the decoder to obtain fifth decoding data corresponding to the third output text; and input the fifth decoding data into the word remover, determine in the word remover the accuracy rate of each word in the third output text, and remove from the third output text the words whose accuracy rate is smaller than the second threshold value, so as to obtain the target output text.
In an alternative embodiment, the electronic device further comprises a training module (not shown), specifically configured to train the text content extraction model in the following manner: acquiring a training sample and a label text corresponding to the training sample, wherein the training sample comprises a plurality of sample character groups, and each sample character group is composed of, in a preset order, a sample key text in a sample table, one sample value word corresponding to the sample key text, the forward-order position of the sample value word in the sample value text to which it belongs, and the reverse-order position of the sample value word in the sample value text to which it belongs; inputting the plurality of sample character groups into the encoder for encoding to obtain a plurality of sample encoding vectors, wherein the sample encoding vectors correspond to the sample character groups one by one; inputting the plurality of sample encoding vectors into the text content extraction model to extract text content, so as to obtain a first predicted text; determining the characters in the label text that are the same as the sample value words, so as to obtain an intermediate text; and adjusting the model parameters of the text content extraction model according to a first loss value between the first predicted text and the intermediate text.
In an alternative embodiment, the training module (not shown) is further configured to train the text content extraction model and the text splicing model in the following manner: inputting the plurality of sample encoding vectors and the first predicted text into the text splicing model for text splicing to obtain a target predicted text; and adjusting the model parameters of the text content extraction model and the model parameters of the text splicing model according to a second loss value between the target predicted text and the label text.
In an alternative embodiment, the processor 95, coupled to the memory 94, is further configured to execute the computer program in the memory 94 to: acquire a table to be processed; send the table to be processed to a server; and receive a target output text sent by the server, wherein the target output text is determined by the server according to the text generation method of the above embodiments.
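A terminal-side sketch of this flow might look as follows; the HTTP transport, endpoint path and JSON field names are all hypothetical, as the embodiment only states that the table is sent to a server and the target output text is received back.

```python
import requests  # assumed transport; the disclosure does not specify one

def generate_text(server_url: str, table: dict) -> str:
    """Send the table to be processed to the server and return the target
    output text it produces. Endpoint and payload shape are hypothetical."""
    resp = requests.post(f"{server_url}/table-to-text", json={"table": table})
    resp.raise_for_status()
    return resp.json()["target_output_text"]
```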
Further, as shown in Fig. 9, the electronic device further includes: a firewall 91, a load balancer 92, a communication component 96, a power supply component 93, and other components. Only some of the components are schematically shown in Fig. 9, which does not mean that the electronic device comprises only the components shown in Fig. 9.
Accordingly, embodiments of the present application also provide a computer-readable storage medium storing a computer program, which when executed by a processor causes the processor to implement the steps in the methods shown above.
Accordingly, embodiments of the present application also provide a computer program product comprising a computer program/instructions which, when executed by a processor, cause the processor to carry out the steps of the method shown above.
The communication component of Fig. 9 is configured to facilitate wired or wireless communication between the device in which it is located and other devices. That device can access a wireless network based on a communication standard, such as WiFi, a 2G, 3G, 4G/LTE or 5G mobile communication network, or a combination thereof. In one exemplary embodiment, the communication component receives a broadcast signal or broadcast-related information from an external broadcast management system via a broadcast channel. In one exemplary embodiment, the communication component further includes a Near Field Communication (NFC) module to facilitate short-range communication. For example, the NFC module may be implemented based on Radio Frequency Identification (RFID) technology, Infrared Data Association (IrDA) technology, Ultra Wideband (UWB) technology, Bluetooth (BT) technology, and other technologies.
The power supply component shown in Fig. 9 provides power to the various components of the device in which it is located. The power supply component may include a power management system, one or more power sources, and other components associated with generating, managing, and distributing power for that device.
It will be appreciated by those skilled in the art that embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer-implemented process, such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
In one typical configuration, a computing device includes one or more processors (CPUs and/or GPUs), input/output interfaces, network interfaces, and memory.
The memory may include volatile memory in a computer-readable medium, such as Random Access Memory (RAM), and/or nonvolatile memory, such as Read-Only Memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.
Computer-readable media, including both permanent and non-permanent, removable and non-removable media, may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, modules of a program, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium which can be used to store information that can be accessed by a computing device. As defined herein, computer-readable media do not include transitory computer-readable media (transmission media), such as modulated data signals and carrier waves.
It should also be noted that the terms "comprises", "comprising", or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other like elements in a process, method, article or apparatus that comprises the element.
The foregoing is merely exemplary of the present application and is not intended to limit the present application. Various modifications and changes may be made to the present application by those skilled in the art. Any modifications, equivalent substitutions, improvements, etc. which are within the spirit and principles of the present application are intended to be included within the scope of the claims of the present application.

Claims (11)

1. A text generation method, comprising:
acquiring a table to be processed;
generating a plurality of character groups based on the table to be processed, wherein each character group is composed of, in a preset order, a key text in the table to be processed, one value word corresponding to the key text, the forward-order position of the value word in the value text to which it belongs, and the reverse-order position of the value word in the value text to which it belongs;
inputting the plurality of character groups into an encoder for encoding processing to obtain a plurality of encoding vectors, wherein the encoding vectors correspond to the character groups one by one;
inputting the plurality of encoding vectors into a text content extraction model to extract text content to obtain a first output text, wherein the first output text comprises at least one value word in the table to be processed;
and inputting the plurality of encoding vectors and the first output text into a text splicing model for text splicing to obtain a target output text corresponding to the table to be processed, wherein the target output text comprises the first output text and characters from a preset word stock, and the text content extraction model and the text splicing model are both non-autoregressive prediction models.
2. The text generation method according to claim 1, wherein the plurality of character groups further comprises: a first identification character group and a second identification character group, the text content extraction model comprises: a decoder, a placeholder predictor and a first word predictor, and the inputting of the plurality of encoding vectors into the pre-trained text content extraction model to extract text content to obtain a first output text comprises the following steps:
inputting the encoding vectors corresponding to the first identification character group and the second identification character group into a decoder for decoding processing to obtain corresponding identification decoding data, wherein the identification decoding data comprises decoding data of the first identification character and decoding data of the second identification character;
inputting the identification decoding data into the placeholder predictor, predicting, in the placeholder predictor, a first number of placeholders between the first identification character and the second identification character, and adding the first number of first placeholders between the first identification character and the second identification character to obtain a placeholder sequence;
inputting the placeholder sequence and the plurality of encoding vectors into the decoder for decoding processing to obtain placeholder decoding data of a first placeholder;
inputting the placeholder decoding data and the plurality of encoding vectors into the first word predictor, determining, in the first word predictor, confidence scores between the placeholders and the encoding vectors, and replacing, for each first placeholder, the first placeholder in the placeholder sequence with the value word having the highest confidence score, to obtain a second output text;
and determining the first output text according to the second output text.
3. The text generation method of claim 2, wherein the text content extraction model further comprises a second word predictor, the determining the first output text from the second output text comprising:
determining an embedded vector of the word in the second output text;
inputting the embedded vector, the placeholder decoding data and the plurality of encoding vectors into the decoder for decoding processing to obtain first decoding data of the word in the second output text;
inputting the first decoding data and the plurality of encoding vectors into the second word predictor, determining, in the second word predictor, confidence scores between the value words in the second output text and the encoding vectors, and replacing, for each value word in the second output text, the corresponding word with the value word having the highest confidence score, to obtain the first output text.
4. The text generation method according to claim 2, wherein the text content extraction model further comprises a word remover, and the determining the first output text from the second output text comprises:
inputting the second output text and the plurality of encoding vectors into the decoder to obtain second decoding data corresponding to the second output text;
and inputting the second decoding data into the word remover, determining the accuracy rate of the words in the second output text in the word remover, and deleting the words with the accuracy rate smaller than a first threshold value in the second output text to obtain the first output text.
5. The text generation method of claim 4, wherein the text splicing model comprises: the decoder, the placeholder predictor, and a third word predictor, and the inputting of the plurality of encoding vectors and the first output text into the pre-trained text splicing model for text splicing to obtain a target output text comprises:
inputting the first output text and the plurality of encoding vectors into the decoder for decoding processing to obtain third decoded data;
inputting the third decoded data into the placeholder predictor, predicting, in the placeholder predictor, the number of second placeholders between adjacent value words in the first output text, and adding the corresponding number of second placeholders between adjacent value words to obtain a character sequence;
inputting the character sequence and the plurality of encoding vectors into the decoder for decoding processing to obtain fourth decoding data;
inputting the fourth decoded data into the third word predictor, predicting, in the third word predictor, filling characters corresponding to the second placeholders, and replacing the corresponding second placeholders in the character sequence with the filling characters to obtain a third output text;
and determining the target output text according to the third output text.
6. The text generation method of claim 5, wherein the text splicing model further comprises: the word remover, the determining the target output text according to the third output text, includes:
inputting the third output text and the plurality of encoding vectors into the decoder to obtain fifth decoding data corresponding to the third output text;
and inputting the fifth decoded data into the word remover, determining the accuracy rate of the words in the third output text in the word remover, and deleting the words with the accuracy rate smaller than a second threshold value in the third output text to obtain the target output text.
7. A method of model training, comprising:
acquiring a training sample and a label text corresponding to the training sample, wherein the training sample comprises a plurality of sample character groups, and each sample character group is composed of, in a preset order, a sample key text in a sample table, one sample value word corresponding to the sample key text, the forward-order position of the sample value word in the sample value text to which it belongs, and the reverse-order position of the sample value word in the sample value text to which it belongs;
inputting the plurality of sample character groups into an encoder for encoding processing to obtain a plurality of sample encoding vectors, wherein the sample encoding vectors correspond to the sample character groups one by one;
inputting the plurality of sample encoding vectors into a text content extraction model to extract text content, so as to obtain a first predicted text;
determining the characters in the label text which are the same as the sample value words, to obtain an intermediate text;
adjusting model parameters of the text content extraction model according to a first loss value between the first predicted text and the intermediate text;
inputting the plurality of sample encoding vectors and the first predicted text into a text splicing model for text splicing to obtain a target predicted text;
and adjusting model parameters of the text content extraction model and model parameters of the text splicing model according to a second loss value between the target predicted text and the label text, wherein the text content extraction model and the text splicing model are non-autoregressive prediction models.
8. A text generation method, characterized by being applied to a terminal device, comprising:
acquiring a table to be processed;
sending the table to be processed to a server;
receiving target output text sent by the server, wherein the target output text is determined by the server according to the text generation method of any one of claims 1 to 6.
9. A text generating apparatus, comprising:
the acquisition module is used for acquiring the form to be processed;
the generating module is used for generating a plurality of character groups based on the table to be processed, wherein each character group is composed of, in a preset order, a key text in the table to be processed, one value word corresponding to the key text, the forward-order position of the value word in the value text to which it belongs, and the reverse-order position of the value word in the value text to which it belongs;
the coding module is used for inputting the plurality of character groups into the encoder for encoding processing to obtain a plurality of encoding vectors, wherein the encoding vectors correspond to the character groups one by one;
the extraction module is used for inputting the plurality of encoding vectors into a text content extraction model to extract text content to obtain a first output text, wherein the first output text comprises at least one value word in the table to be processed;
and the splicing module is used for inputting the plurality of encoding vectors and the first output text into a text splicing model for text splicing to obtain a target output text corresponding to the table to be processed, wherein the target output text comprises the first output text and characters from a preset word stock, and the text content extraction model and the text splicing model are both non-autoregressive prediction models.
10. A text generation system, comprising:
the cloud server is provided with a pre-trained text content extraction model;
the terminal device is used for acquiring a table to be processed and sending the table to be processed to the cloud server;
the cloud server is used for acquiring the table to be processed; generating a plurality of character groups based on the table to be processed, wherein each character group is composed of, in a preset order, a key text in the table to be processed, one value word corresponding to the key text, the forward-order position of the value word in the value text to which it belongs, and the reverse-order position of the value word in the value text to which it belongs; inputting the plurality of character groups into an encoder for encoding processing to obtain a plurality of encoding vectors, wherein the encoding vectors correspond to the character groups one by one; inputting the plurality of encoding vectors into a text content extraction model to extract text content to obtain a first output text, wherein the first output text comprises at least one value word in the table to be processed; and inputting the plurality of encoding vectors and the first output text into a text splicing model for text splicing to obtain a target output text corresponding to the table to be processed, wherein the target output text comprises the first output text and characters from a preset word stock, and the text content extraction model and the text splicing model are both non-autoregressive prediction models; and
the terminal device is further used for receiving the target output text sent by the cloud server.
11. An electronic device, comprising: a processor, a memory, and a computer program stored on the memory and executable on the processor, wherein the processor, when executing the computer program, implements the text generation method of any one of claims 1 to 6 and 8, and/or the model training method of claim 7.