CN116595969A

CN116595969A - Text generation method and device, storage medium and electronic equipment

Info

Publication number: CN116595969A
Application number: CN202310162725.XA
Authority: CN
Inventors: 李怀松; 张天翼; 黄涛; 贾娟; 刘昶
Original assignee: Alipay Hangzhou Information Technology Co Ltd
Current assignee: Alipay Hangzhou Information Technology Co Ltd
Priority date: 2023-02-23
Filing date: 2023-02-23
Publication date: 2023-08-15

Abstract

The specification discloses a text generation method and device, a storage medium and electronic equipment. Service data is acquired and input into a preset text generation model, so that the text generation model performs statistics on the service data according to different statistical modes to obtain statistical results, and encodes the statistical results to obtain the coding features corresponding to each statistical result. Through the text generation model, a risk identification result for the service data is determined based on the obtained basic features and the coding features corresponding to each statistical result, and a description text describing the service risk represented by the service data is generated according to the risk identification result, the basic features and the coding features corresponding to each statistical result.

Description

Text generation method and device, storage medium and electronic equipment

Technical Field

The present disclosure relates to the field of computer technologies, and in particular, to a method and apparatus for generating text, a storage medium, and an electronic device.

Background

With the rapid development of internet technology, text generation is widely applied to various business fields. The prior art often generates text content through a text generation model, and the general process is as follows: after input data is input into the text generation model, the text generation model outputs text content associated with the input data based on the input data.

In the risk identification field, service data can be input into a text generation model, which performs risk identification on the service data and generates a text describing the service risk represented by the service data, for reference by risk control personnel.

However, the text content generated at present lacks logical coherence in its semantic expression, and the risk identification result obtained is not accurate enough. How to generate text content whose logic is expressed clearly and accurately is therefore a problem to be solved urgently.

Disclosure of Invention

The specification provides a text generation method, a device, a storage medium and electronic equipment, which are used for solving the problem that text content with clear and accurate logic expression cannot be generated in the prior art.

The technical scheme adopted in the specification is as follows:

the present specification provides a method of text generation, the method comprising:

acquiring service data;

inputting the service data into a preset text generation model, so that the text generation model carries out statistics on the service data according to different statistical modes to obtain statistical results, coding the statistical results to obtain coding features corresponding to the statistical results, and determining basic features corresponding to the service data through the text generation model;

Determining a risk identification result for the service data based on the basic features and the coding features corresponding to each statistical result through the text generation model;

and generating a description text for describing the business risk represented by the business data through the text generation model according to the risk identification result, the basic feature and the coding feature corresponding to each statistical result.

Optionally, inputting the service data into a preset text generation model, so that the text generation model performs statistics on the service data according to different statistical modes to obtain statistical results, specifically includes:

inputting the service data into a preset text generation model, so that the text generation model, for each statistical mode, performs statistics on the service data according to that statistical mode to obtain a statistical result to be processed, determines the identification character corresponding to each numerical value in the statistical result to be processed under that statistical mode, and determines the statistical result corresponding to that statistical mode according to the identification characters.

Optionally, the statistical modes include a first statistical mode, which is used for counting the number of times each piece of data contained in the service data occurs under the data category to which it belongs;

inputting the service data into a preset text generation model, so that the text generation model, for each statistical mode, performs statistics on the service data according to that statistical mode to obtain a statistical result to be processed, determines the identification character corresponding to each numerical value in the statistical result to be processed under that statistical mode, and determines the statistical result corresponding to that statistical mode according to the identification characters, specifically includes:

inputting the service data into a preset text generation model, so that, when adopting the first statistical mode, the text generation model determines the number of times each piece of data contained in the service data occurs under the data category to which it belongs, determines the identification character corresponding to each piece of data under each data category according to the identification characters corresponding to the values of the numbers of times under the first statistical mode, and determines the statistical result corresponding to the first statistical mode according to the identification characters corresponding to each piece of data under each data category.

Optionally, the statistical modes include a second statistical mode, which is used for determining the data form adopted by the data under each data category in the service data;

inputting the service data into a preset text generation model, so that the text generation model, for each statistical mode, performs statistics on the service data according to that statistical mode to obtain a statistical result to be processed, determines the identification character corresponding to each numerical value in the statistical result to be processed under that statistical mode, and determines the statistical result corresponding to that statistical mode according to the identification characters, specifically includes:

inputting the service data into a preset text generation model, so that, when adopting the second statistical mode, the text generation model determines the data form adopted by the data under each data category in the service data, determines the identification character corresponding to each piece of data under each data category according to the identification characters corresponding to the different data forms under the second statistical mode, and determines the statistical result corresponding to the second statistical mode according to the identification characters corresponding to each piece of data under each data category.

Optionally, the statistical modes include a third statistical mode, which is used for sorting each piece of data contained in the service data within the data category to which it belongs according to a preset arrangement order;

inputting the service data into a preset text generation model, so that, when adopting the third statistical mode, the text generation model sorts each piece of data contained in the service data within the data category to which it belongs according to the preset arrangement order, determines the identification character corresponding to each piece of data under each data category according to the identification characters corresponding to the different sorting positions under the third statistical mode, and determines the statistical result corresponding to the third statistical mode according to the identification characters corresponding to each piece of data under each data category.

Optionally, generating a description text for describing the business risk represented by the business data according to the risk identification result, the basic feature and the coding feature corresponding to each statistical result, which specifically includes:

Determining the weight corresponding to each coding feature according to the basic feature;

generating each word according to the risk identification result, the weight and the coding features;

and generating the description text according to the generated words.

Optionally, generating each word according to the risk identification result, the weight and each coding feature specifically includes:

determining a weighting vector according to the coding feature corresponding to each statistical result and the weight corresponding to each coding feature;

obtaining a predicted word vector based on the weighted vector through the text generation model, and determining the word corresponding to the predicted word vector according to the corresponding relation between each word and each word vector recorded in a preset dictionary;

re-inputting the word corresponding to the predicted word vector into the text generation model to generate the next word, and re-inputting each generated next word into the text generation model until all the words are determined.

Optionally, training the text generation model specifically includes:

acquiring sample service data and a preset standard text for describing service risks represented by the sample service data;

Inputting the sample service data into the text generation model, so that the text generation model carries out statistics on the sample service data according to different statistical modes to obtain statistical results corresponding to the sample service data, coding the statistical results corresponding to the sample service data to obtain coding characteristics of each statistical result corresponding to the sample service data, and determining basic characteristics corresponding to the sample service data through the text generation model;

determining a risk identification result for the sample service data based on basic features corresponding to the sample service data and coding features of each statistical result corresponding to the sample service data through the text generation model;

generating a description text for describing the service risk represented by the sample service data through the text generation model according to the risk identification result aiming at the sample service data, the basic characteristic corresponding to the sample service data and the coding characteristic of each statistical result corresponding to the sample service data, and taking the description text as the description text corresponding to the sample service data;

and training the text generation model with the optimization target of minimizing the difference between the description text corresponding to the sample service data and the standard text.

The present specification provides an apparatus for text generation, the apparatus comprising:

the acquisition module is used for acquiring service data;

the input module is used for inputting the service data into a preset text generation model so that the text generation model can count the service data according to different statistical modes to obtain statistical results, coding the statistical results to obtain coding features corresponding to the statistical results, and determining basic features corresponding to the service data through the text generation model;

the determining module is used for determining a risk identification result aiming at the service data based on the basic characteristics and the coding characteristics corresponding to each statistical result through the text generation model;

and the generation module is used for generating a description text for describing the business risk represented by the business data through the text generation model according to the risk identification result, the basic characteristic and the coding characteristic corresponding to each statistical result.

Optionally, the input module is specifically configured to input the service data into a preset text generation model, so that the text generation model counts the service data according to each statistical mode, obtain a statistical result to be processed, determine an identification character corresponding to each numerical value in the statistical result to be processed in the statistical mode, and determine a statistical result corresponding to the statistical mode according to the identification character.

Optionally, the input module is specifically configured to input the service data into a preset text generation model, so that, when adopting the first statistical mode, the text generation model determines the number of times each piece of data contained in the service data occurs under the data category to which it belongs, determines the identification character corresponding to each piece of data under each data category according to the identification characters corresponding to the values of the numbers of times under the first statistical mode, and determines the statistical result corresponding to the first statistical mode according to the identification characters corresponding to each piece of data under each data category.

the input module is specifically configured to input the service data into a preset text generation model, so that when the text generation model adopts the second statistical mode, a data form adopted by data in each data category in the service data is respectively determined, identification characters corresponding to each data in each data category are determined according to identification characters corresponding to different data forms in the second statistical mode, and a statistical result corresponding to the second statistical mode is determined according to the identification characters corresponding to each data in each data category.

the input module is specifically configured to input the service data into a preset text generation model, so that when the text generation model adopts the third statistical mode, sort each data included in the service data in a data category where each data is located according to a preset arrangement sequence, determine, according to identification characters corresponding to different sorting positions under the third statistical mode, identification characters corresponding to each data in each data category, and determine, according to identification characters corresponding to each data in each data category, a statistical result corresponding to the third statistical mode.

Optionally, the generating module is specifically configured to determine, according to the basic feature, a weight corresponding to each coding feature; generate each word according to the risk identification result, the weights and the coding features; and generate the description text according to the generated words.

Optionally, the generating module is specifically configured to determine a weighted vector according to the coding feature corresponding to each statistical result and the weight corresponding to each coding feature; obtain a predicted word vector based on the weighted vector through the text generation model, and determine the word corresponding to the predicted word vector according to the correspondence between words and word vectors recorded in a preset dictionary; and re-input the word corresponding to the predicted word vector into the text generation model to generate the next word, re-inputting each generated next word into the text generation model until all the words are determined.

Optionally, the apparatus further comprises:

the training module is used for training the text generation model, wherein:

the training module is specifically used for: acquiring sample service data and a preset standard text for describing the service risk represented by the sample service data; inputting the sample service data into the text generation model, so that the text generation model performs statistics on the sample service data according to different statistical modes to obtain statistical results corresponding to the sample service data, encodes these statistical results to obtain the coding feature of each statistical result corresponding to the sample service data, and determines the basic features corresponding to the sample service data; determining, through the text generation model, a risk identification result for the sample service data based on the basic features corresponding to the sample service data and the coding feature of each statistical result corresponding to the sample service data; generating, through the text generation model, a description text for describing the service risk represented by the sample service data according to the risk identification result for the sample service data, the basic features corresponding to the sample service data and the coding feature of each statistical result corresponding to the sample service data, and taking it as the description text corresponding to the sample service data; and training the text generation model with the optimization target of minimizing the difference between the description text corresponding to the sample service data and the standard text.

The present specification provides a computer readable storage medium storing a computer program which when executed by a processor implements the method of text generation described above.

The present specification provides an electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the method of text generation described above when executing the program.

At least one of the above technical solutions adopted in this specification can achieve the following beneficial effects:

in the text generation method provided by the specification, business data are input into a preset text generation model, so that the text generation model carries out statistics on the business data according to different statistical modes to obtain statistical results, and the statistical results are encoded to obtain encoding characteristics corresponding to the statistical results; and determining a risk identification result aiming at the service data based on the obtained basic features and the coding features corresponding to each statistical result through a text generation model, and generating a description text for describing the service risk represented by the service data through the text generation model according to the risk identification result, the basic features and the coding features corresponding to each statistical result.

As can be seen from the method, the text generation model performs statistics on the service data in different statistical modes, and the statistical results obtained under the different statistical modes each reflect some characteristics associated with the data contained in the service data. The text generation model can therefore use the different statistical modes to mine logical relations that may be hidden among the data, encode the obtained statistical results, and determine the coding feature corresponding to each statistical result, which helps the text generation model subsequently generate description text whose logic is expressed clearly and accurately.

Drawings

The accompanying drawings are included to provide a further understanding of the specification. The exemplary embodiments of the present specification and their description illustrate and explain the specification and are not intended to limit it unduly.

In the drawings:

FIG. 1 is a flow diagram of a method of text generation provided in the present specification;

FIG. 2 is a schematic diagram of a business data table provided in the present specification;

FIG. 3 is a schematic diagram of an apparatus for text generation provided in the present specification;

FIG. 4 is a schematic diagram of an electronic device corresponding to FIG. 1 provided in the present specification.

Detailed Description

For the purposes of making the objects, technical solutions and advantages of the present specification more apparent, the technical solutions of the present specification will be clearly and completely described below with reference to specific embodiments of the present specification and corresponding drawings. It will be apparent that the described embodiments are only some, but not all, of the embodiments of the present specification. All other embodiments, which can be made by one of ordinary skill in the art without undue burden from the present disclosure, are intended to be within the scope of the present disclosure.

The following describes in detail the technical solutions provided by the embodiments of the present specification with reference to the accompanying drawings.

Fig. 1 is a flow chart of a method for generating text provided in the present specification, including the following steps:

S100: acquiring service data.

With the rapid development of internet technology, users are now faced with many scenarios for performing payment services, such as face-to-face payment with merchants after shopping is completed, or payment operations performed by users on shopping software. After the user executes the payment service, the server may acquire a service data table generated by the payment behavior of the user in a period of time, and analyze the service data table to determine whether the payment behavior of the user includes a service risk.

The service data may include a service record table generated by the same user executing the service in a past period of time, or may include a service record table of different users executing the same service in a past period of time.

The execution subject of the present application may be a server, or may be an electronic device such as a tablet computer, a notebook computer, a desktop computer, or the like. For convenience of explanation, the text generation method provided by the present application will be explained below with only the server as the execution subject.

It should be noted that, the acquired service data may actually be a plurality of pieces of service data, and when the acquired service data is input into the text generation model in the subsequent process, the service data may actually be input into the text generation model in the form of a service data table.

S102: inputting the service data into a preset text generation model, so that the text generation model carries out statistics on the service data according to different statistical modes to obtain statistical results, coding the statistical results to obtain coding features corresponding to the statistical results, and determining basic features corresponding to the service data through the text generation model.

Fig. 2 is a schematic diagram of service data provided in the present specification, and the service data acquired by the server may be a service data table similar to that of fig. 2, and for convenience of explanation, the service data table in fig. 2 will be taken as an example.

The server may sequentially input data in the service data table into a preset text generation model by rows, and input an input terminator "CLS" into the preset text generation model. Wherein the input terminator indicates that the input of the service data is completed. After all the data in the service data table are input into the preset text generation model, the text generation model can start to count the data in the service data table according to different statistical modes, wherein the statistical results obtained by the different statistical modes are different.
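The row-by-row input described above can be sketched as follows. This is only an illustration, assuming each table cell is treated as one input token; the function name `serialize_table`, the column names and the two example rows are hypothetical, and only the terminator "CLS" comes from the text.

```python
from typing import Dict, List

INPUT_TERMINATOR = "CLS"  # terminator named in the text


def serialize_table(rows: List[Dict[str, str]], columns: List[str]) -> List[str]:
    """Flatten a service data table row by row, then append the terminator."""
    tokens: List[str] = []
    for row in rows:
        for column in columns:
            tokens.append(str(row[column]))
    tokens.append(INPUT_TERMINATOR)
    return tokens


# Hypothetical rows for illustration; FIG. 2 is not reproduced here.
rows = [
    {"time": "1 month and 1 day", "place": "A", "item": "200"},
    {"time": "1 month and 2 days", "place": "B", "item": "300"},
]
tokens = serialize_table(rows, ["time", "place", "item"])
```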

Specifically, the text generation model may use a first statistical manner, a second statistical manner, and a third statistical manner to perform statistics on the service data. The first statistical method is used for counting the occurrence times of each data contained in the service data under the data category of each data.

For example, when the text generation model uses the first statistical mode to perform statistics on the service data, the number of times each piece of data contained in the service data occurs under its data category can be used as the statistical result to be processed. Specifically, as shown in FIG. 2, under the data category "time", the data "1 month and 1 day" appears 2 times, the data "1 month and 2 days" appears 2 times, and the data "1 month and 3 days" appears 1 time. For the data "1 month and 1 day", the number of occurrences under the data category "time" is therefore 2, and the text generation model can take 2 as the statistical result to be processed, which facilitates subsequent processing of the data "1 month and 1 day".

Further, after determining the number of times of occurrence of each data included in the service data under the data category where each data is located by adopting the first statistical mode, the text generation model may further determine the identification character corresponding to each data under each data category according to the identification character corresponding to the number of times (i.e. the statistical result to be processed) under the first statistical mode, so as to determine the statistical result corresponding to the first statistical mode.

It should be noted that, the server may preset the identification character corresponding to the statistical result to be processed of each data in the first statistical mode. For example, when the text generation model uses the first statistical mode to count each data, if the statistical result to be processed is 1, the identification character corresponding to the result to be processed is "c_1", if the statistical result to be processed is 2, the identification character corresponding to the result to be processed is "c_2", and so on.

For the data "1 month and 1 day" in FIG. 2, the number of occurrences under the data category "time" is 2, that is, the statistical result to be processed is 2. The identification character corresponding to this statistical result to be processed is therefore "c_2", and the text generation model can use "c_2" as the statistical result corresponding to the data "1 month and 1 day" in the first statistical mode. Similarly, for the data "B" in FIG. 2, the number of occurrences under the data category "place" is 1, that is, the statistical result to be processed is 1, so the corresponding identification character is "c_1", and the text generation model can use "c_1" as the statistical result corresponding to the data "B" in the first statistical mode.
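The first statistical mode can be illustrated with a short sketch. It assumes that one column of the service data table is available as a list of strings; the function name `count_tokens` and the row order of the example column are assumptions, while the "c_N" naming follows the text.

```python
from collections import Counter
from typing import List


def count_tokens(column: List[str]) -> List[str]:
    """Map each value to "c_<n>", where n is how often it occurs in its column."""
    counts = Counter(column)
    return [f"c_{counts[value]}" for value in column]


# Values of the "time" category as described for FIG. 2 (row order assumed).
time_column = [
    "1 month and 1 day",
    "1 month and 1 day",
    "1 month and 2 days",
    "1 month and 2 days",
    "1 month and 3 days",
]
time_tokens = count_tokens(time_column)
# time_tokens == ["c_2", "c_2", "c_2", "c_2", "c_1"]
```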

In addition, the text generation model can also adopt a second statistical mode to carry out statistics on the service data, and the data form adopted by the data under each data category in the determined service data is used as a statistical result to be processed. The data form adopted by the data under each data category in the service data can comprise date, integer, percentage, letter and the like, and the specification does not limit the data form adopted by the data.

In order to facilitate the processing of the data, the server may further preset an identification character corresponding to a data form (i.e. a statistical result to be processed) adopted by the data under each data category in the service data in the second statistical manner. For example, when the text generation model uses the second statistical mode to count each data, if the data format adopted by the data is "date", the identification character corresponding to the statistical result to be processed is "ca_1", if the data format adopted by the data is "integer", the identification character corresponding to the statistical result to be processed is "ca_2", if the data format adopted by the data is "letter", the identification character corresponding to the statistical result to be processed is "ca_3", and so on.

Specifically, as shown in FIG. 2, the data under the data category "time" all take the form of dates. When the text generation model uses the second statistical mode, the statistical result to be processed for all data under the data category "time" is "date", the corresponding identification character is "ca_1", and the text generation model can use "ca_1" as the statistical result corresponding to all data under the data category "time" in the second statistical mode. Similarly, in FIG. 2, the statistical result to be processed for all data under the data category "item" is "integer", the corresponding identification character is "ca_2", and the text generation model may use "ca_2" as the statistical result corresponding to all data under the data category "item" in the second statistical mode.
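A possible sketch of the second statistical mode follows. The mapping from data form to "ca_1", "ca_2" and "ca_3" is taken from the text; how a data form is detected is not specified there, so the regular expressions below (including the month/day date pattern) and the names `data_form` and `form_tokens` are assumptions made for illustration.

```python
import re
from typing import List

# Mapping stated in the text: date -> ca_1, integer -> ca_2, letter -> ca_3.
FORM_TOKENS = {"date": "ca_1", "integer": "ca_2", "letter": "ca_3"}

# Assumed detection rules; the text does not say how forms are recognised.
_DATE = re.compile(r"\d{1,2}/\d{1,2}")  # hypothetical month/day notation
_INTEGER = re.compile(r"[0-9]+")
_LETTER = re.compile(r"[A-Za-z]+")


def data_form(value: str) -> str:
    """Return the data form of a single value under the assumed rules."""
    if _DATE.fullmatch(value):
        return "date"
    if _INTEGER.fullmatch(value):
        return "integer"
    if _LETTER.fullmatch(value):
        return "letter"
    raise ValueError(f"unrecognised data form: {value!r}")


def form_tokens(column: List[str]) -> List[str]:
    """Map every value in a column to the identification character of its form."""
    return [FORM_TOKENS[data_form(value)] for value in column]


item_tokens = form_tokens(["200", "300", "360"])
```

Other data forms mentioned in the text, such as percentages, would need further rules and identification characters that the text leaves open.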

In the present specification, the text generation model may further use a third statistical manner to perform statistics on service data, specifically, sort each data included in the service data in a data category where each data is located according to a preset arrangement sequence, and use different sorting positions of each data as a statistical result corresponding to the third statistical manner. The preset arrangement sequence may refer to the order of sizes of the data values in the respective data categories, including the order from large to small or the order from small to large.

For example, consider the data "200, 300, 360, 200, 360" under the data category "item" in FIG. 2, which indicate the number of items purchased by the user. When the text generation model uses the third statistical mode and sorts these values in ascending order, the sorting positions of the data are 1, 3, 4, 2, 5. The text generation model can take the sorting positions of all data under the data category "item" in ascending order as the statistical result, or it can take the sorting positions in descending order as the statistical result.
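The sorting positions in this example can be reproduced with the sketch below. It assumes that equal values are ranked in their original row order, which is consistent with the positions given in the text; the function name `rank_positions` is hypothetical.

```python
from typing import List


def rank_positions(values: List[int], descending: bool = False) -> List[int]:
    """Return the 1-based sorting position of each value; ties keep row order."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=descending)
    positions = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        positions[index] = rank
    return positions


item_column = [200, 300, 360, 200, 360]
ascending_positions = rank_positions(item_column)
# ascending_positions == [1, 3, 4, 2, 5]
```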

It should be noted that the text generation model may perform statistics on the service data in various statistical modes; these include, but are not limited to, the first, second and third statistical modes described above.

After statistics have been performed on the service data according to the different statistical modes and the statistical results have been obtained, the text generation model can encode the statistical results in a preset encoding mode to obtain the coding feature corresponding to each statistical result, so that the server can subsequently process the obtained coding features through the text generation model.
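The text does not specify the preset encoding mode. One common choice, shown here purely as an assumption, is a lookup table that maps each identification character to a fixed-length vector. The class name `TokenEncoder`, the vector dimension and the random initialisation are all hypothetical; in a trained model the table entries would be learned.

```python
import random
from typing import Dict, List


class TokenEncoder:
    """Hypothetical lookup-table encoder for identification characters."""

    def __init__(self, dim: int = 4, seed: int = 0) -> None:
        self.dim = dim
        self._rng = random.Random(seed)
        self._table: Dict[str, List[float]] = {}

    def encode(self, token: str) -> List[float]:
        # Create a vector the first time a token is seen, then reuse it.
        if token not in self._table:
            self._table[token] = [self._rng.uniform(-1.0, 1.0) for _ in range(self.dim)]
        return self._table[token]


encoder = TokenEncoder(dim=4)
coding_features = [encoder.encode(token) for token in ["c_2", "ca_1", "c_2"]]
```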

The server also needs to determine basic characteristics corresponding to the service data through a text generation model, namely, the server inputs the service data into the text generation model, and the text generation model obtains the basic characteristics according to the semantics in the service data.

S104: Determine, through the text generation model, a risk identification result for the service data based on the basic features and the coding feature corresponding to each statistical result.

S106: Generate, through the text generation model, a description text describing the business risk represented by the business data according to the risk identification result, the basic features, and the coding feature corresponding to each statistical result.

After obtaining the coding features corresponding to each statistical result in different statistical modes, the server can perform risk identification on the service data based on the obtained basic features and the coding features through a text generation model so as to obtain a risk identification result.

The server may, through the text generation model, determine the weight corresponding to each coding feature according to the basic features, and then compute the weighted sum of the coding features using those weights to obtain the weighted vector.
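A minimal sketch of this weighting step follows. The specification does not say how the weights are derived from the basic features, so the dot-product scoring and softmax normalization used here are assumptions, as are the names `softmax` and `weighted_vector`.

```python
import math
from typing import List

Vector = List[float]


def softmax(scores: List[float]) -> List[float]:
    """Normalize scores into positive weights that sum to 1."""
    m = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def weighted_vector(basic_feature: Vector, coding_features: List[Vector]) -> Vector:
    """Weight each coding feature by its affinity to the basic feature and sum."""
    # Assumption: the weight of a coding feature comes from its dot product
    # with the basic feature; the specification leaves this choice open.
    scores = [sum(b * c for b, c in zip(basic_feature, feat)) for feat in coding_features]
    weights = softmax(scores)
    dim = len(coding_features[0])
    # Weighted sum of the coding features, computed dimension by dimension.
    return [sum(w * feat[d] for w, feat in zip(weights, coding_features)) for d in range(dim)]


# Both coding features score 0 against the basic feature, so each gets weight 0.5.
print(weighted_vector([1.0, 0.0], [[0.0, 2.0], [0.0, 4.0]]))
```

With equal weights the result is simply the average of the coding features; a learned model would produce unequal weights that emphasize the statistics most relevant to the input.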

The server may then obtain a predicted word vector from the weighted vector through the text generation model. After the text generation model obtains the predicted word vector, the word segment corresponding to it can be determined from a preset dictionary, which records the correspondence between each word segment and its word vector.
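The dictionary lookup can be illustrated as a nearest-vector search. The specification only states that the dictionary records word-to-vector correspondences; matching by cosine similarity, the function names, and the toy two-dimensional vectors below are all assumptions for illustration.

```python
import math
from typing import Dict, List


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def lookup_word(predicted: List[float], dictionary: Dict[str, List[float]]) -> str:
    """Return the dictionary word whose vector is most similar to the prediction."""
    return max(dictionary, key=lambda word: cosine(predicted, dictionary[word]))


# Hypothetical preset dictionary with toy word vectors.
preset_dictionary = {
    "risk": [1.0, 0.0],
    "normal": [0.0, 1.0],
    "group": [0.7, 0.7],
}
print(lookup_word([0.9, 0.1], preset_dictionary))  # risk
```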

The text generation model can then output the word segment corresponding to the predicted word vector as the next word segment. After the next word segment is obtained, the server can feed it back into the text generation model as input, so that the text generation model continues to generate the following word segment from it, and so on until all the word segments are determined; the descriptive text is then generated from all the determined word segments.
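This feed-back loop can be sketched as follows. The callable `next_word` stands in for the text generation model, and the end marker `END` and the `max_words` cap are assumptions: the specification only says generation continues until all word segments are determined.

```python
from typing import Callable, List

END = "<end>"  # hypothetical marker signalling that all word segments are determined


def generate_text(next_word: Callable[[List[str]], str], max_words: int = 50) -> str:
    """Repeatedly feed the generated word segments back in to get the next one."""
    words: List[str] = []
    while len(words) < max_words:
        word = next_word(words)  # the model sees everything generated so far
        if word == END:
            break
        words.append(word)
    return " ".join(words)


def toy_next_word(previous: List[str]) -> str:
    """Stand-in for the model: replays a fixed script, one word per call."""
    script = ["users", "share", "one", "address", END]
    return script[len(previous)] if len(previous) < len(script) else END


print(generate_text(toy_next_word))  # users share one address
```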

The description text is used for describing the business risk characterized by the business data. After the service data is acquired, the text generation model can analyze it using the different statistical manners to mine the implicit logical relations among the data, so that the generated description text reflects the information related to the data contained in the service data. On this basis, it can further be judged whether the service behavior of the user carries service risk.

For example, suppose the business data records data about a plurality of users executing a certain business. The text generation model can mine the implicit logical relations within the business data: if some users purchase certain virtual items in similar quantities, and the IP addresses or delivery addresses of these users are also highly similar, the descriptive text generated by the text generation model can reflect this information. It can thereby be determined that these users may belong to a risk user group, and further that the businesses executed by these users carry a certain risk.

It is noted that the server may train the text generation model before entering business data into the text generation model to obtain descriptive text.

Specifically, the server may obtain sample service data and a preset standard text for describing service risk represented by the sample service data, and input the sample service data into a text generation model, where the text generation model may perform statistics on the sample service data according to different statistics modes, encode the statistics results to obtain coding features corresponding to each statistics result, and obtain basic features corresponding to the sample service data through the text generation model.

After the coding feature of each statistical result corresponding to the sample service data and the basic features corresponding to the sample service data are obtained, the text generation model can determine a risk identification result for the sample service data based on them. Then, according to the risk identification result for the sample service data, the basic features corresponding to the sample service data, and the coding feature of each statistical result corresponding to the sample service data, the text generation model generates a description text describing the service risk represented by the sample service data, which serves as the description text corresponding to the sample service data.

To further improve the accuracy of the generated text, the server may train the text generation model with the optimization objective of minimizing the difference between the descriptive text corresponding to the sample business data and the standard text.
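One common way to quantify such a difference is a per-word cross-entropy between the model's predicted word distributions and the standard text. The specification does not name a loss function, so the choice of cross-entropy, the function name, and the toy probabilities below are assumptions for illustration only.

```python
import math
from typing import Dict, List


def cross_entropy(predicted: List[Dict[str, float]], standard: List[str]) -> float:
    """Average negative log-probability assigned to each word of the standard text.

    predicted[i] maps candidate words to the probability the model gives them
    at position i; standard[i] is the word of the standard text at position i.
    """
    eps = 1e-12  # avoids log(0) when the standard word received no probability
    total = 0.0
    for dist, target in zip(predicted, standard):
        total += -math.log(dist.get(target, 0.0) + eps)
    return total / len(standard)


standard_text = ["high", "risk"]
close = [{"high": 0.9, "low": 0.1}, {"risk": 0.8, "safe": 0.2}]
far = [{"high": 0.1, "low": 0.9}, {"risk": 0.2, "safe": 0.8}]
# A model whose output is closer to the standard text gets a smaller loss.
print(cross_entropy(close, standard_text) < cross_entropy(far, standard_text))  # True
```

Training would then adjust the model parameters to reduce this value over the sample service data.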

It should be noted that a coding layer and a decoding layer may be provided in the text generation model. The server may input the service data into the coding layer to obtain the basic features and the coding feature corresponding to each statistical result. The obtained coding features and basic features can then be input into the decoding layer to obtain the risk identification result, and the descriptive text is generated through the decoding layer.

In the process of generating the descriptive text, after a word segment is obtained through the decoding layer, it can be input into the decoding layer again so that the decoding layer outputs the next word segment, and so on, until the decoding layer has output all the word segments. The decoding layer mentioned in the present specification may be, for example, GPT-3 (Generative Pre-trained Transformer 3); the decoding layer may also take other forms, and the present specification does not limit this.

The above is the text generation method provided by one or more embodiments of the present specification. Based on the same idea, the present specification further provides a corresponding text generation apparatus, as shown in Fig. 3.

Fig. 3 is a schematic diagram of an apparatus for text generation provided in the present specification, where the apparatus includes:

an acquisition module 300, configured to acquire service data;

the input module 302 is configured to input the service data into a preset text generation model, so that the text generation model performs statistics on the service data according to different statistical manners, obtain each statistical result, encode each statistical result, obtain a coding feature corresponding to each statistical result, and determine, through the text generation model, a basic feature corresponding to the service data;

a determining module 304, configured to determine, by using the text generation model, a risk identification result for the service data based on the basic feature and the coding feature corresponding to each statistical result;

and the generating module 306 is configured to generate, according to the risk identification result, the basic feature, and the coding feature corresponding to each statistical result, a description text for describing the business risk represented by the business data through the text generating model.

Optionally, the input module 302 is specifically configured to input the service data into a preset text generation model, so that, for each statistical manner, the text generation model performs statistics on the service data according to that statistical manner to obtain a statistical result to be processed, determines the identification character corresponding to each numerical value in the statistical result to be processed under that statistical manner, and determines the statistical result corresponding to that statistical manner according to the identification characters.

The input module 302 is specifically configured to input the service data into a preset text generation model, so that, when adopting the first statistical manner, the text generation model determines the number of times each piece of data contained in the service data occurs within its data category, determines the identification character corresponding to each piece of data in each data category according to the identification characters that correspond to the numerical values of those counts under the first statistical manner, and determines the statistical result corresponding to the first statistical manner according to the identification character corresponding to each piece of data in each data category.

The input module 302 is specifically configured to input the service data into a preset text generation model, so that, when adopting the second statistical manner, the text generation model determines the data form adopted by the data in each data category of the service data, determines the identification character corresponding to each piece of data in each data category according to the identification characters that correspond to the different data forms under the second statistical manner, and determines the statistical result corresponding to the second statistical manner according to the identification character corresponding to each piece of data in each data category.

The input module 302 is specifically configured to input the service data into a preset text generation model, so that, when adopting the third statistical manner, the text generation model sorts the data contained in the service data within their respective data categories according to a preset arrangement order, determines the identification character corresponding to each piece of data in each data category according to the identification characters that correspond to the different sorting positions under the third statistical manner, and determines the statistical result corresponding to the third statistical manner according to the identification character corresponding to each piece of data in each data category.
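The mapping from statistics to identification characters under the three statistical manners can be sketched as below. The concrete identification characters (`C2`, `F_DIGIT`, `R1`, and so on), the three data forms, and the function names are hypothetical; the specification says only that each count, data form, or sorting position corresponds to some identification character.

```python
from collections import Counter
from typing import List


def count_ids(column: List[str]) -> List[str]:
    """First manner: identification character for each datum's occurrence count."""
    counts = Counter(column)
    return [f"C{counts[value]}" for value in column]


def form_ids(column: List[str]) -> List[str]:
    """Second manner: identification character for each datum's data form."""
    def form(value: str) -> str:
        # Hypothetical data forms: all digits, all letters, or anything else.
        if value.isdigit():
            return "F_DIGIT"
        if value.isalpha():
            return "F_ALPHA"
        return "F_MIXED"
    return [form(value) for value in column]


def rank_ids(column: List[str]) -> List[str]:
    """Third manner: identification character for each datum's sorting position."""
    # Ascending order; the stable sort keeps equal values in original order.
    order = sorted(range(len(column)), key=lambda i: float(column[i]))
    ids = [""] * len(column)
    for position, idx in enumerate(order, start=1):
        ids[idx] = f"R{position}"
    return ids


items = ["200", "300", "360", "200", "360"]  # the "item" category
print(count_ids(items))  # ['C2', 'C1', 'C2', 'C2', 'C2']
print(form_ids(items))   # ['F_DIGIT', 'F_DIGIT', 'F_DIGIT', 'F_DIGIT', 'F_DIGIT']
print(rank_ids(items))   # ['R1', 'R3', 'R4', 'R2', 'R5']
```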

Optionally, the generating module 306 is specifically configured to determine, according to the basic features, the weight corresponding to each coding feature; generate each word segment according to the risk identification result, the weights, and the coding features; and generate the descriptive text according to the generated word segments.

Optionally, the generating module 306 is specifically configured to determine a weighted vector according to the coding feature corresponding to each statistical result and the weight corresponding to each coding feature; obtain a predicted word vector based on the weighted vector through the text generation model, and determine the word segment corresponding to the predicted word vector according to the correspondence between each word segment and each word vector recorded in a preset dictionary; and re-input the word segment corresponding to the predicted word vector into the text generation model to generate the next word segment, re-inputting each newly generated word segment into the text generation model until all the word segments are determined.

Optionally, the apparatus further comprises:

a training module 308, configured to train the text generation model, where:

the training module 308 is specifically configured to: obtain sample service data and a preset standard text describing the service risk represented by the sample service data; input the sample service data into the text generation model, so that the text generation model performs statistics on the sample service data according to different statistical manners to obtain the statistical results corresponding to the sample service data and encodes those statistical results to obtain the coding feature of each statistical result corresponding to the sample service data, and determine, through the text generation model, the basic features corresponding to the sample service data; determine, through the text generation model, a risk identification result for the sample service data based on the basic features corresponding to the sample service data and the coding feature of each statistical result corresponding to the sample service data; generate, through the text generation model, a description text describing the service risk represented by the sample service data according to the risk identification result for the sample service data, the basic features corresponding to the sample service data, and the coding feature of each statistical result corresponding to the sample service data, as the description text corresponding to the sample service data; and train the text generation model with the optimization objective of minimizing the difference between the description text corresponding to the sample service data and the standard text.

The present specification also provides a computer readable storage medium storing a computer program operable to perform a method of text generation as provided in fig. 1 above.

The present specification also provides a schematic structural diagram, shown in Fig. 4, of an electronic device corresponding to Fig. 1. At the hardware level, as shown in Fig. 4, the electronic device includes a processor, an internal bus, a network interface, a memory, and a non-volatile storage, and may of course also include hardware required by other services. The processor reads the corresponding computer program from the non-volatile storage into the memory and then runs it to implement the text generation method described above with respect to Fig. 1. Of course, in addition to a software implementation, the present specification does not exclude other implementations, such as logic devices or combinations of hardware and software; that is, the execution subject of the processing flows is not limited to individual logic units and may also be hardware or logic devices.

In the 1990s, an improvement to a technology could be clearly distinguished as either a hardware improvement (e.g., an improvement to a circuit structure such as a diode, transistor, or switch) or a software improvement (an improvement to a method flow). However, with the development of technology, many improvements to method flows today can be regarded as direct improvements to hardware circuit structures. Designers almost always obtain the corresponding hardware circuit structure by programming the improved method flow into a hardware circuit. Therefore, it cannot be said that an improvement to a method flow cannot be realized by a hardware entity module. For example, a programmable logic device (PLD), such as a field programmable gate array (FPGA), is an integrated circuit whose logic function is determined by the user's programming of the device. A designer programs a digital system to "integrate" it onto a single PLD, without needing a chip manufacturer to design and fabricate a dedicated integrated circuit chip. Moreover, nowadays, instead of manually fabricating integrated circuit chips, such programming is mostly implemented with "logic compiler" software, which is similar to the software compiler used in program development; the source code to be compiled must also be written in a specific programming language, called a hardware description language (HDL). There is not just one HDL but many, such as ABEL (Advanced Boolean Expression Language), AHDL (Altera Hardware Description Language), Confluence, CUPL (Cornell University Programming Language), HDCal, JHDL (Java Hardware Description Language), Lava, Lola, MyHDL, PALASM, and RHDL (Ruby Hardware Description Language); VHDL (Very-High-Speed Integrated Circuit Hardware Description Language) and Verilog are the most commonly used at present.
It will also be apparent to those skilled in the art that a hardware circuit implementing the logical method flow can readily be obtained merely by slightly logic-programming the method flow in one of the hardware description languages described above and programming it into an integrated circuit.

The controller may be implemented in any suitable manner. For example, the controller may take the form of a microprocessor or processor together with a computer-readable medium storing computer-readable program code (e.g., software or firmware) executable by the (micro)processor, logic gates, switches, an application-specific integrated circuit (ASIC), a programmable logic controller, or an embedded microcontroller. Examples of controllers include, but are not limited to, the following microcontrollers: ARC 625D, Atmel AT91SAM, Microchip PIC18F26K20, and Silicon Labs C8051F320. A memory controller may also be implemented as part of the control logic of the memory. Those skilled in the art will also appreciate that, in addition to implementing the controller purely as computer-readable program code, it is entirely possible to logically program the method steps so that the controller achieves the same functionality in the form of logic gates, switches, application-specific integrated circuits, programmable logic controllers, embedded microcontrollers, and the like. Such a controller may thus be regarded as a hardware component, and the means included therein for performing various functions may also be regarded as structures within the hardware component. Or the means for performing the various functions may even be regarded both as software modules implementing the method and as structures within the hardware component.

The system, apparatus, module or unit set forth in the above embodiments may be implemented in particular by a computer chip or entity, or by a product having a certain function. One typical implementation is a computer. In particular, the computer may be, for example, a personal computer, a laptop computer, a cellular telephone, a camera phone, a smart phone, a personal digital assistant, a media player, a navigation device, an email device, a game console, a tablet computer, a wearable device, or a combination of any of these devices.

For convenience of description, the above devices are described as being functionally divided into various units, respectively. Of course, the functions of each element may be implemented in one or more software and/or hardware elements when implemented in the present specification.

It will be appreciated by those skilled in the art that embodiments of the present description may be provided as a method, system, or computer program product. Accordingly, the present specification may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present description can take the form of a computer program product on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, etc.) having computer-usable program code embodied therein.

The present description is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the specification. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

In one typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.

The memory may include volatile memory in a computer-readable medium, in forms such as random access memory (RAM), and/or non-volatile memory, such as read-only memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.

Computer-readable media include both permanent and non-permanent, removable and non-removable media, and may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, modules of a program, or other data. Examples of computer storage media include, but are not limited to, phase change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape or magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory computer-readable media (transitory media), such as modulated data signals and carrier waves.

It should also be noted that the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of additional identical elements in the process, method, article, or apparatus that comprises the element.

The description may be described in the general context of computer-executable instructions, such as program modules, being executed by a computer. Generally, program modules include routines, programs, objects, components, data structures, etc. that perform particular tasks or implement particular abstract data types. The specification may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed computing environment, program modules may be located in both local and remote computer storage media including memory storage devices.

In this specification, the embodiments are described in a progressive manner; for identical or similar parts, the embodiments may be referred to one another, and each embodiment focuses on its differences from the others. In particular, since the system embodiments are substantially similar to the method embodiments, their description is relatively brief; for relevant details, refer to the corresponding parts of the description of the method embodiments.

The foregoing is merely an example of the present specification and is not intended to limit the present specification. Various modifications and alterations to this specification will become apparent to those skilled in the art. Any modifications, equivalent substitutions, improvements, or the like, which are within the spirit and principles of the present description, are intended to be included within the scope of the claims of the present description.

Claims

1. A method of text generation, the method comprising:

acquiring service data;

2. The method of claim 1, wherein the business data table is input into a preset text generation model, so that the text generation model performs statistics on data in the business data table according to different statistical modes to obtain each statistical result, and the method specifically comprises the following steps:

3. The method of claim 2, the statistical means comprising: the first statistical mode is used for counting the occurrence times of each data contained in the service data under the data category of each data;

4. The method of claim 2, the statistical means comprising: the second statistical mode is used for determining a data form adopted by the data under each data category in the service data;

5. The method of claim 2, the statistical means comprising: the third statistical mode is used for sequencing all data contained in the service data in the data category of each data according to a preset arrangement sequence;

6. The method of claim 1, generating a description text for describing the business risk represented by the business data according to the risk identification result, the basic feature and the coding feature corresponding to each statistical result, specifically including:

7. The method of claim 6, generating each word segment according to the risk identification result, the weight and each coding feature, specifically comprising:

8. The method of claim 1, training the text generation model, comprising:

9. An apparatus for text generation, the apparatus comprising:

the acquisition module is used for acquiring service data;

10. The apparatus of claim 9, wherein the input module is specifically configured to input the service data into a preset text generation model, so that the text generation model counts the service data according to each statistical mode, obtain a statistical result to be processed, determine an identification character corresponding to each numerical value in the statistical result to be processed in the statistical mode, and determine a statistical result corresponding to the statistical mode according to the identification character.

11. The apparatus of claim 10, the statistical means comprising: the first statistical mode is used for counting the occurrence times of each data contained in the service data under the data category of each data;

the input module is specifically configured to input the service data into a preset text generation model, so that when the text generation model adopts the first statistical mode, the number of occurrences of each data included in the service data in each data category is respectively determined, the identification character corresponding to each data in each data category is determined according to the identification character corresponding to the number of occurrences in the first statistical mode, and the statistical result corresponding to the first statistical mode is determined according to the identification character corresponding to each data in each data category.

12. The apparatus of claim 10, the statistical means comprising: the second statistical mode is used for determining a data form adopted by the data under each data category in the service data;

13. The apparatus of claim 10, the statistical means comprising: the third statistical mode is used for sequencing all data contained in the service data in the data category of each data according to a preset arrangement sequence;

14. The apparatus of claim 9, wherein the generating module is specifically configured to determine, according to the basic feature, a weight corresponding to each coding feature; generate each word segment according to the risk identification result, the weight and the coding features; and generate the descriptive text according to the generated word segments.

15. The apparatus of claim 14, wherein the generating module is specifically configured to determine a weighting vector according to the coding feature corresponding to each statistical result and the weight corresponding to each coding feature; obtaining a predicted word vector based on the weighted vector through the text generation model, and determining the word corresponding to the predicted word vector according to the corresponding relation between each word and each word vector recorded in a preset dictionary; re-inputting the word segment corresponding to the predicted word vector into the text generation model to generate the next word segment, and re-inputting the generated next word segment into the text generation model until all the word segments are determined.

16. The apparatus of claim 9, the apparatus further comprising:

the training module is used for training the text generation model, wherein:

17. A computer readable storage medium storing a computer program which, when executed by a processor, implements the method of any of the preceding claims 1-8.

18. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the method of any of the preceding claims 1-8 when the program is executed.