CN115983227A - File generation method, device, equipment and storage medium - Google Patents

File generation method, device, equipment and storage medium

Info

Publication number
CN115983227A
Authority
CN
China
Prior art keywords
commodity
text
attribute
target
picture
Prior art date
Legal status
Pending
Application number
CN202211684920.0A
Other languages
Chinese (zh)
Inventor
付炜
李媛媛
Current Assignee
Shanghai Mobvoi Information Technology Co ltd
Original Assignee
Shanghai Mobvoi Information Technology Co ltd
Priority date
Filing date
Publication date
Application filed by Shanghai Mobvoi Information Technology Co., Ltd.
Priority to CN202211684920.0A
Publication of CN115983227A

Classifications

    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02P: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
    • Y02P 90/00: Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
    • Y02P 90/30: Computing systems specially adapted for manufacturing

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present disclosure provides a method, an apparatus, a device and a storage medium for generating a document, wherein the method comprises: acquiring title information, attribute information and a commodity picture of a target commodity; extracting description text features for the target commodity based on the title information and the attribute information; extracting picture features of the commodity picture; splicing the picture features, the description text features and prompt template features corresponding to the target commodity to obtain target splicing features; and inputting the target splicing features into a pre-trained pattern generation model to obtain a description pattern corresponding to the target commodity. With this method, the description pattern of a commodity is generated by the pattern generation model, which greatly reduces the human resources consumed in generating commodity patterns, improves the novelty of the generated pattern, and enables the generated description pattern to better reflect the characteristics of the commodity.

Description

File generation method, device, equipment and storage medium
Technical Field
The present disclosure relates to the field of deep learning technologies, and in particular, to a method, an apparatus, a device, and a storage medium for generating a document.
Background
In recent years, displaying and selling commodities through network live broadcasting has become increasingly common. During a live broadcast, the live-broadcast file displayed for a commodity influences how viewers perceive the characteristics of the commodity. Therefore, how to configure a suitable live-broadcast file for a commodity is crucial.
At present, live-broadcast files are mainly written manually or generated from a file template. However, because of the wide variety of commodities, manually writing a live-broadcast file for each commodity consumes a large amount of time and human resources, while a live-broadcast file generated from a file template is rigid and lacks novelty, and using a live-broadcast file that lacks novelty also affects how viewers perceive the characteristics of the commodity.
Therefore, how to improve the novelty of the generated file while reducing the consumption of human resources has become an urgent problem to be solved.
Disclosure of Invention
The present disclosure provides a document generation method, apparatus, device and storage medium to at least solve the above technical problems in the prior art.
According to a first aspect of the present disclosure, there is provided a document generation method, the method comprising:
acquiring title information, attribute information and commodity pictures of a target commodity;
extracting descriptive text features for the target commodity based on the title information and the attribute information;
extracting picture characteristics of the commodity picture;
splicing the picture features, the description text features and prompt template features corresponding to the target commodity to obtain target splicing features;
inputting the target splicing characteristics into a pre-trained pattern generation model to obtain a description pattern corresponding to the target commodity; the pattern generation model is obtained by training a neural network to be trained in advance according to the splicing characteristics and the standard patterns corresponding to the sample commodities.
In an implementation manner, after the splicing feature is input into a pre-trained pattern generation model to obtain a target pattern corresponding to the target commodity, the method further includes:
extracting a predicted attribute text representing the attribute information of the target commodity from the description file;
comparing the predicted attribute text with the attribute information, and determining whether the description file has missing attribute text and/or redundant predicted attribute text and/or wrong predicted attribute text;
if the description file has missing attribute text, acquiring the missing attribute text from the attribute information and adding the missing attribute text into the description file, and/or if the description file has redundant predicted attribute text, removing the redundant predicted attribute text from the description file, and/or if the description file has wrong predicted attribute text, replacing the wrong attribute text with the attribute information corresponding to the wrong attribute text in the attribute information to obtain a modified description file.
In an embodiment, the extracting, from the description document, a predicted attribute text of attribute information representing the target product includes:
and inputting the description file into a pre-trained attribute extraction model to obtain a prediction attribute text representing the attribute information of the target commodity.
In an embodiment, the training method of the pattern generation model includes:
inputting the splicing characteristics corresponding to the sample commodity into a deep learning model to be trained to obtain a prediction case; the splicing characteristics are characteristics obtained by splicing description text characteristics of the sample commodities, picture characteristics of commodity pictures of the sample commodities and prompt template characteristics;
calculating a cross entropy loss function value of the deep learning model to be trained based on the standard case and the prediction case corresponding to the sample commodity;
adjusting parameters of a deep learning model to be trained based on the cross entropy loss function value;
when the iteration times of the model reach the preset iteration times, finishing training, and determining the model with the minimum value of the corresponding cross entropy loss function in the stored deep learning model as the pattern generation model;
and when the iteration times of the model do not reach the preset iteration times, returning to the step of inputting the splicing characteristics corresponding to the sample commodity into the deep learning model to be trained.
In an implementation manner, the calculating the cross entropy loss function value of the deep learning model to be trained based on the standard case and the prediction case corresponding to the sample commodity comprises:
calculating a cross entropy loss function value of the deep learning model to be trained based on the standard case and the prediction case corresponding to the sample commodity by adopting the following formula:
L = -∑_{i=1}^{n} p(x_i) log q(x_i)
where L represents the cross entropy loss value corresponding to a single character position in the prediction case, p(x_i) represents the true probability, determined from the standard case, that the character at the current position is the i-th character in the preset word list, q(x_i) represents the predicted probability, determined from the prediction case, that the character at the current position is the i-th character in the preset word list, and n represents the number of characters in the preset word list.
In an embodiment, the extracting the descriptive text feature for the target product based on the title information and the attribute information includes:
processing the title information according to a preset title template to obtain a target commodity title text;
converting the attribute information into an unstructured text based on a preset attribute template to obtain a target commodity attribute text;
and inputting a target splicing text obtained by splicing the target commodity title text and the target commodity attribute text into a pre-trained text feature extraction module to obtain description text features aiming at the target commodity.
In an implementation manner, the extracting the picture feature of the commodity picture includes:
carrying out normalization processing on the commodity picture to obtain a commodity picture after normalization processing;
and inputting the commodity picture after the normalization processing into a pre-trained picture visual characteristic extraction model to obtain the picture characteristic of the commodity picture.
According to a second aspect of the present disclosure, there is provided a document generation apparatus, the apparatus comprising:
the commodity information acquisition module is used for acquiring the title information, the attribute information and the commodity picture of the target commodity;
a text feature extraction module for extracting a description text feature for the target commodity based on the title information and the attribute information;
the picture feature extraction module is used for extracting the picture features of the commodity picture;
the feature fusion module is used for splicing the picture features, the description text features and prompt template features corresponding to the target commodity to obtain target splicing features;
the pattern generation module is used for inputting the target splicing characteristics into a pre-trained pattern generation model to obtain a description pattern corresponding to the target commodity; the pattern generation model is obtained by training a neural network to be trained in advance according to the splicing characteristics and the standard patterns corresponding to the plurality of sample commodities.
According to a third aspect of the present disclosure, there is provided an electronic device comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the methods of the present disclosure.
According to a fourth aspect of the present disclosure, there is provided a non-transitory computer readable storage medium having stored thereon computer instructions for causing the computer to perform the method of the present disclosure.
The title information, the attribute information and the commodity picture of the target commodity are acquired by the file generation method, the file generation device, the equipment and the storage medium; extracting description text features for the target commodity based on the title information and the attribute information; extracting picture characteristics of the commodity picture; splicing the picture characteristics, the description text characteristics and the prompt template characteristics corresponding to the target commodity to obtain target splicing characteristics; and inputting the target splicing characteristics into a pre-trained pattern generation model to obtain a description pattern corresponding to the target commodity. According to the method, the splicing characteristics and the standard case corresponding to a plurality of sample commodities are utilized, the neural network to be trained is trained in advance to obtain the case generation model, the description case information of the commodities is generated by utilizing the case generation model, and the labor resource consumption for generating the commodity case is reduced to a great extent. In addition, the title information, the attribute information and the commodity picture of the commodity are used for generating the description file, so that the generated description file comprehensively contains various information of the commodity, the novelty of the generated file is improved by adopting the method, and the generated description file can better reflect the characteristics of the commodity.
It should be understood that the statements in this section do not necessarily identify key or critical features of the embodiments of the present disclosure, nor do they limit the scope of the present disclosure. Other features of the present disclosure will become apparent from the following description.
Drawings
The above and other objects, features and advantages of exemplary embodiments of the present disclosure will become readily apparent from the following detailed description read in conjunction with the accompanying drawings. Several embodiments of the present disclosure are illustrated by way of example, and not by way of limitation, in the figures of the accompanying drawings and in which:
in the drawings, the same or corresponding reference numerals indicate the same or corresponding parts.
FIG. 1 is a schematic diagram illustrating an implementation flow of a method for generating a document provided by an embodiment of the present disclosure;
FIG. 2 illustrates a flow chart for extracting descriptive textual features provided by the present disclosure;
FIG. 3 illustrates a schematic diagram of attribute information for an article provided by the present disclosure;
FIG. 4 is a schematic diagram illustrating a method for determining an attribute text of a commodity based on a preset attribute template according to the present disclosure;
FIG. 5 shows a flow chart of an image feature extraction method provided by the present disclosure;
FIG. 6 illustrates a schematic diagram of generating a descriptive case provided by the present disclosure;
fig. 7 shows a block diagram of a picture encoder provided by the present disclosure;
FIG. 8 illustrates a schematic diagram of a splice feature provided by the present disclosure;
FIG. 9 illustrates a training flow diagram of a pattern generation model provided by the present disclosure;
FIG. 10 is a schematic diagram of a document generation provided by the present disclosure;
FIG. 11 is a flow chart of a correction description case provided by the present disclosure;
FIG. 12 is a schematic structural diagram of a document generation apparatus provided in an embodiment of the present disclosure;
fig. 13 is a schematic diagram illustrating a composition structure of an electronic device according to an embodiment of the disclosure.
Detailed Description
In order to make the objects, features and advantages of the present disclosure more apparent and understandable, the technical solutions in the embodiments of the present disclosure will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present disclosure, and it is apparent that the described embodiments are only a part of the embodiments of the present disclosure, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments disclosed herein without making any creative effort, shall fall within the protection scope of the present disclosure.
The existing approach of manually writing live-broadcast files consumes a large amount of time and human resources, while generating live-broadcast files from a file template leaves the generated files lacking novelty, which in turn affects how viewers perceive the characteristics of the commodity. Therefore, in order to improve the novelty of the generated file while reducing the consumption of human resources, the present disclosure provides a file generation method, apparatus, device and storage medium. The method provided by the present disclosure can be applied to electronic devices such as mobile phones, personal computers and servers.
The file generation method, the file generation device, the file generation equipment and the storage medium can be applied to application scenes of displaying commodities and selling commodities through network live broadcast, and the commodities can be any products which can be sold through network live broadcast, such as clothes, digital products, electric appliances and the like.
The technical solutions of the embodiments of the present disclosure will be described below with reference to the drawings in the embodiments of the present disclosure.
Fig. 1 shows a schematic flow chart of an implementation of a document generation method provided by an embodiment of the present disclosure, and as shown in fig. 1, the method includes:
s101, title information, attribute information and commodity pictures of the target commodities are obtained.
The target commodity refers to a commodity which needs to be displayed or sold through live broadcasting.
In the present disclosure, the name of the target commodity may be obtained as the title information of the target commodity from a commodity specification provided by the commodity manufacturer, and the attribute information and the commodity picture of the target commodity may be obtained directly from the commodity specification. For example, the commodity specification of target commodity A includes the text "product: XX longuette, pattern: solid color, color: avocado green, material composition: polyester fiber, sleeve length: long sleeves", and a front view of commodity XX. The title information of target commodity A obtained from the commodity specification is: XX longuette; the attribute information of target commodity A is: the pattern is solid color, the color is avocado green, the material composition is polyester fiber, and the sleeve length is long sleeves; the commodity picture is the front view of commodity XX.
In the present disclosure, the title information, the attribute information, and the product picture of the target product may be directly acquired from each e-commerce website.
And S102, extracting the description text characteristics aiming at the target commodity based on the title information and the attribute information.
In an implementation manner, fig. 2 shows a flowchart of extracting descriptive text features provided by the present disclosure, and as shown in fig. 2, the extracting descriptive text features for the target product based on the title information and the attribute information may include:
s201, processing the title information according to a preset title template to obtain a title text of the target commodity.
In this disclosure, the preset title template may be "product is title" or "commodity is title", and the text obtained after replacing "title" in the preset title template with the title information of the target commodity is used as the target commodity title text. For example, if the preset title template is "product is title" and the title information of target commodity A is "XX longuette", the title information of target commodity A replaces "title" in the preset title template, and the target commodity title text of target commodity A is "product is XX longuette".
S202, converting the attribute information into an unstructured text based on a preset attribute template to obtain a target commodity attribute text.
Fig. 3 illustrates a schematic diagram of attribute information of an article provided by the present disclosure, and as shown in fig. 3, the attribute information of the article is structured data of (key, value), where key represents an attribute type, and value represents a specific attribute value, and as shown in fig. 3, the attribute type includes "pattern", "collar" ... "skirt length", and the text on the right side of each attribute type is the specific attribute value corresponding to the attribute type.
In the present disclosure, the preset property template is a template that can convert a structured text into an unstructured text, for example, the preset property template may be "a is B" and "a specific value is B". And replacing A in the preset attribute template with the attribute type text in the attribute information of the target commodity, and replacing B in the preset attribute template with the specific attribute value text in the attribute information of the target commodity to obtain the target commodity attribute text corresponding to the target commodity.
Fig. 4 shows a schematic diagram of determining a commodity attribute text based on a preset attribute template. As shown in Fig. 4, the left box is the structured attribute information of the target commodity, and the preset attribute template is "A is B"; A in "A is B" is replaced with the attribute type text in the attribute information of the target commodity, B in "A is B" is replaced with the specific attribute value text in the attribute information of the target commodity, and the resulting unstructured natural language text in the right box is the target commodity attribute text corresponding to the target commodity.
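As a non-limiting illustration of the title and attribute templates described above, the conversion from a commodity title and structured (key, value) attribute information into unstructured text might be sketched as follows (Python; the function names, template strings and example values are illustrative assumptions, not part of the disclosure):

```python
# Hypothetical helpers illustrating the preset title template ("product is <title>")
# and preset attribute template ("<key> is <value>") described above.

def build_title_text(title: str, template: str = "The product is {title}") -> str:
    """Fill the preset title template with the commodity title."""
    return template.format(title=title)

def build_attribute_text(attributes: dict, template: str = "{key} is {value}") -> str:
    """Convert structured (key, value) attributes into one unstructured sentence."""
    clauses = [template.format(key=k, value=v) for k, v in attributes.items()]
    return ", ".join(clauses)

# Example roughly following target commodity A described earlier
title_text = build_title_text("XX longuette")
attr_text = build_attribute_text({
    "pattern": "solid color",
    "color": "avocado green",
    "material composition": "polyester fiber",
    "sleeve length": "long sleeves",
})
target_spliced_text = title_text + ", " + attr_text  # the target spliced text
print(target_spliced_text)
```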
S203, splicing the title text of the target commodity and the attribute text of the target commodity to obtain a target spliced text, and extracting description text features aiming at the target commodity based on the target spliced text.
In the present disclosure, the target commodity title text and the target commodity attribute text can be spliced directly in sequence to obtain an unstructured natural language text as the target spliced text. For example, if the target commodity title text of target commodity B is "the product is XY pants", and the target commodity attribute text of target commodity B is "the pattern is solid color, the waist type is high waist, and the trouser length is cropped (seven-tenths length)", then the target spliced text obtained by splicing the target commodity title text and the target commodity attribute text of target commodity B is "the product is XY pants, the pattern is solid color, the waist type is high waist, and the trouser length is cropped (seven-tenths length)".
In the present disclosure, a token refers to each word segment obtained after the text is processed by a word segmentation (tokenization) model. If the target spliced text can be split into n tokens by the tokenization model, the target spliced text is input into the pre-trained text feature extraction module, and the dimension of the obtained description text feature for the target commodity is [n, M]; that is, each token corresponds to an M-dimensional vector. The value of M depends on the structure of the text feature extraction model; for example, for the BERT (Bidirectional Encoder Representations from Transformers) base model, the value of M is 768.
In the present disclosure, the embedding layer of GPT-2 (Generative Pre-trained Transformer 2) may be used to extract the description text features for the target commodity from the target spliced text.
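A minimal sketch of extracting the description text features with a GPT-2 embedding layer, assuming the Hugging Face transformers implementation (the disclosure names the GPT-2 embedding layer but no particular library; the checkpoint name and example text are illustrative):

```python
import torch
from transformers import GPT2Tokenizer, GPT2LMHeadModel

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")

target_spliced_text = "The product is XY pants, the pattern is solid color, the waist type is high waist"
tokens = tokenizer(target_spliced_text, return_tensors="pt")     # n tokens
with torch.no_grad():
    # look up each token in GPT-2's word token embedding table (wte)
    text_features = model.transformer.wte(tokens["input_ids"])   # [1, n, M], M = 768 for the GPT-2 base model
```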
In another possible implementation, the extracting the descriptive text feature for the target product based on the title information and the attribute information may include the following steps A1 to A3:
step A1, determining title information as a target commodity title text, and converting the attribute information into an unstructured text as a target commodity attribute text;
and step A2, inputting the title text of the target commodity into a pre-trained text feature extraction module to obtain title text features, and inputting the attribute text of the target commodity into the pre-trained text feature extraction module to obtain attribute text features.
And step A3, splicing the title text features and the attribute text features to obtain description text features for the target commodity.
The pre-trained text feature extraction module is an embedding layer, which may be the embedding layer of GPT-2 (Generative Pre-trained Transformer 2).
And S103, extracting the picture characteristics of the commodity picture.
In an implementation manner, fig. 5 shows a flowchart of an image feature extraction method provided by the present disclosure, and as shown in fig. 5, the extracting a picture feature of the commodity picture may include:
s501, carrying out normalization processing on the commodity picture to obtain the commodity picture after the normalization processing.
In the present disclosure, the commodity picture may be converted into RGB format and then cropped to a preset size, where the preset size may be set according to actual requirements, for example, 214 × 214. The cropped commodity picture is then normalized; specifically, the mean of the normalization may be set to [0.5, 0.5, 0.5] and the std (standard deviation) may be set to [0.5, 0.5, 0.5], and the normalized pixel value of each pixel of the cropped commodity picture is calculated using the following formula:
X* = (X - mean) / std
where X* is the normalized pixel value of the pixel and X is the original pixel value of the pixel.
Finally, the cropped commodity picture is processed into a feature tensor of dimension [3, z1, z2], where z1 and z2 are the preset sizes; when z1 and z2 are both 214, the cropped commodity picture is processed into a feature tensor of dimension [3, 214, 214].
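A minimal sketch of the picture preprocessing described above (RGB conversion, cropping to the preset size, and per-pixel normalization X* = (X - mean) / std), assuming torchvision; the per-channel mean and std of 0.5 follow the values given above, and the file path is illustrative:

```python
from PIL import Image
from torchvision import transforms

preprocess = transforms.Compose([
    transforms.Resize(214),
    transforms.CenterCrop(214),                     # preset size z1 = z2 = 214
    transforms.ToTensor(),                          # [3, 214, 214], values in [0, 1]
    transforms.Normalize(mean=[0.5, 0.5, 0.5], std=[0.5, 0.5, 0.5]),
])

image = Image.open("product.jpg").convert("RGB")    # illustrative path
pixel_tensor = preprocess(image)                    # feature tensor of dimension [3, 214, 214]
```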
S502, inputting the commodity picture after the normalization processing into a pre-trained picture visual characteristic extraction model to obtain the picture characteristic of the commodity picture.
In the present disclosure, the pre-trained picture visual feature extraction model may include a picture encoder, an MLP (Multilayer Perceptron) and a Resize module. The picture encoder may employ a pre-trained CLIP (Contrastive Language-Image Pre-training) model.
The image encoder can perform feature extraction on the commodity image after the normalization processing to obtain an image feature vector which is mapped to a vector space by the commodity image after the normalization processing, and the image feature vector comprises visual features of the image.
Because the picture and the text are not in the same semantic space, in order to better generate the description text, the MLP and Resize modules are then used to map the picture feature vector into the text semantic space, obtaining a feature vector of dimension [m, M] as the picture feature of the commodity picture. If the value of M is 768, the picture feature of the commodity picture is a feature vector formed by m tokens of 768 dimensions.
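A minimal sketch of mapping a CLIP image embedding into m visual tokens in the text semantic space through an MLP and a reshape, as described above; the CLIP checkpoint, the MLP widths and the value of m are illustrative assumptions:

```python
import torch
import torch.nn as nn
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

m, M = 10, 768                                      # assumed number of visual tokens / embedding dimension

clip = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

mlp = nn.Sequential(                                # maps the 512-dimensional CLIP feature to m x M values
    nn.Linear(512, m * M),
    nn.Tanh(),
    nn.Linear(m * M, m * M),
)

image = Image.open("product.jpg")                   # illustrative path
inputs = processor(images=image, return_tensors="pt")
with torch.no_grad():
    clip_feature = clip.get_image_features(**inputs)        # [1, 512] picture feature vector
picture_features = mlp(clip_feature).view(1, m, M)          # reshape: [1, m, M] visual tokens
```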
And S104, splicing the picture features, the description text features and the prompt template features corresponding to the target commodity to obtain target splicing features.
In the present disclosure, a prompt (prompt template feature) is defined in advance for each type of product, and is used as a prompt for generating a document. The prompt vector corresponding to each type of commodity is initialized randomly at first and then updated along with the iteration of the pattern generation model.
The prompt template feature corresponding to the target commodity consists of vectors of k tokens, each token corresponding to an M-dimensional vector; that is, when M is 768, the prompt template feature corresponding to the target commodity is a feature vector of dimension [k, 768].
For example, if the picture feature of the target commodity is a vector of dimension [m, M], the description text feature of the target commodity is a vector of dimension [n, M], and the prompt template feature corresponding to the target commodity is a vector of dimension [k, M], then splicing the picture feature, the description text feature and the prompt template feature corresponding to the target commodity yields a target splicing feature of dimension [k + n + m, M].
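A minimal sketch of the splicing step, concatenating the prompt template feature, the description text feature and the picture feature along the token dimension (the sizes k, n and m are illustrative):

```python
import torch

k, n, m, M = 8, 32, 10, 768                          # assumed sizes for illustration
prompt_features = torch.randn(1, k, M)               # learnable prompt for this commodity category
text_features = torch.randn(1, n, M)                 # description text features
picture_features = torch.randn(1, m, M)              # picture features

target_features = torch.cat(
    [prompt_features, text_features, picture_features], dim=1
)                                                     # [1, k + n + m, M] target splicing feature
```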
And S105, inputting the target splicing characteristics into a pre-trained pattern generation model to obtain a description pattern corresponding to the target commodity.
The pattern generation model is obtained by training a neural network to be trained in advance according to the splicing characteristics and the standard patterns corresponding to the sample commodities.
In the present disclosure, a trained GPT-2 decoder can be used as the pattern generation model, and the target splicing features are input into the GPT-2 decoder to obtain the description pattern corresponding to the target commodity.
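A minimal sketch of decoding the description from the target splicing features with a GPT-2 decoder using greedy decoding over input embeddings, assuming the Hugging Face implementation; the maximum length and the random placeholder features are illustrative:

```python
import torch
from transformers import GPT2Tokenizer, GPT2LMHeadModel

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

target_features = torch.randn(1, 50, 768)             # [1, k + n + m, M] from the feature fusion step
generated_ids = []

embeds = target_features
with torch.no_grad():
    for _ in range(60):                                # assumed maximum description length
        logits = model(inputs_embeds=embeds).logits    # [1, seq_len, vocab]
        next_id = logits[:, -1, :].argmax(dim=-1)      # greedy choice of the next token
        if next_id.item() == tokenizer.eos_token_id:
            break
        generated_ids.append(next_id.item())
        next_embed = model.transformer.wte(next_id).unsqueeze(1)
        embeds = torch.cat([embeds, next_embed], dim=1)

print(tokenizer.decode(generated_ids))                 # the generated description
```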
Fig. 6 shows a schematic diagram of generating a description document provided by the present disclosure, and as shown in fig. 6, a description text feature for a target product is determined based on acquired title information and attribute information 601 of the target product; then, extracting picture characteristics of a commodity picture 602 of the target commodity; splicing the picture characteristics, the description text characteristics and the prompt template characteristics corresponding to the target commodity to obtain target splicing characteristics; inputting the target splicing characteristics into a pre-trained pattern generation model to obtain a description pattern 603 corresponding to the target commodity.
By adopting the method provided by the disclosure, title information, attribute information and commodity pictures of the target commodity are obtained; extracting description text features for the target commodity based on the title information and the attribute information; extracting picture characteristics of the commodity picture; splicing the picture characteristics, the description text characteristics and the prompt template characteristics corresponding to the target commodity to obtain target splicing characteristics; and inputting the target splicing characteristics into a pre-trained pattern generation model to obtain a description pattern corresponding to the target commodity. According to the method, the splicing characteristics and the standard case corresponding to a plurality of sample commodities are utilized, the neural network to be trained is trained in advance to obtain the case generation model, the description case information of the commodities is generated by utilizing the case generation model, and the labor resource consumption for generating the commodity case is reduced to a great extent. In addition, the title information, the attribute information and the commodity picture of the commodity are used for generating the description file, so that the generated description file comprehensively contains various information of the commodity, the novelty of the generated file is improved by adopting the method, and the generated description file can better reflect the characteristics of the commodity.
In the present disclosure, the pattern generation model may be trained in advance. Before training the pattern generation model, preparation steps such as data acquisition, data preprocessing, graphic coding, text coding, prompt initialization, feature fusion and the like need to be carried out. See in particular the processing steps of the first to sixth steps below.
First, data acquisition refers to acquiring commodity information corresponding to sample commodities used for training to obtain a pattern generation model. Specifically, the commodity information of various types of sample commodities can be crawled from various E-commerce websites. The commodity information includes: the commodity title, the commodity attribute information, the commodity picture and the commodity propaganda information.
The commodity propaganda information can be used to determine the standard case corresponding to the sample commodity. Specifically, information that is not contained in the commodity title, the commodity attribute information or the commodity picture can be filtered out of the commodity propaganda information, so that the commodity propaganda information of the commodity is consistent with the information in the commodity title, the commodity attribute information and the commodity picture.
Specifically, the commodity propaganda information can be divided into short sentences according to punctuation marks. A portion of the sample commodities may then be selected, and the short sentences in the commodity propaganda information of these sample commodities labeled manually: short sentences unrelated to the commodity title, the commodity attribute information and the commodity information in the commodity picture are labeled 0, and related short sentences are labeled 1.
For example, the commodity propaganda information of sample commodity A is "An elegant skirt that flatters the figure with no effort at all, a contrast-color ruffled trim, youthful and full of vitality. A waisted small-A-line cut, slimming and hip-covering, flattering on anyone. It sells particularly well in our shop, so friends who need it should place an order soon."; the commodity title of sample commodity A is "xy longuette", and the commodity attribute information of sample commodity A is the information shown in Fig. 3.
The commodity propaganda information of sample commodity A can be divided into short sentences according to punctuation marks, namely "An elegant skirt that flatters the figure with no effort at all", "a contrast-color ruffled trim", "youthful and full of vitality", "a waisted small-A-line cut", "slimming and hip-covering, flattering on anyone", "It sells particularly well in our shop" and "friends who need it should place an order soon".
Short sentences that are unrelated to the commodity title, the commodity attribute information and the commodity information in the commodity picture, such as "It sells particularly well in our shop" and "friends who need it should place an order soon" in the commodity propaganda information of sample commodity A, can be labeled 0, and the other short sentences, which are related to the commodity information and correct, can be labeled 1. The labeled commodity propaganda information can then be used to train a BERT (Bidirectional Encoder Representations from Transformers) model. Next, the trained BERT model is used to predict the short sentences of the remaining unlabeled commodity propaganda information; the short sentences predicted as 1 are retained, the short sentences predicted as 0 are filtered out, and finally the retained short sentences are spliced together as the standard case corresponding to the sample commodity.
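A minimal sketch of filtering the short sentences of the commodity propaganda information with a binary BERT classifier to assemble the standard case, assuming a Hugging Face sequence-classification head fine-tuned on the manually labeled short sentences; the checkpoint name, example text and punctuation set are illustrative:

```python
import re
import torch
from transformers import BertTokenizer, BertForSequenceClassification

tokenizer = BertTokenizer.from_pretrained("bert-base-chinese")
classifier = BertForSequenceClassification.from_pretrained(
    "bert-base-chinese", num_labels=2               # assumed to be fine-tuned on the labeled short sentences
)
classifier.eval()

promo_text = "An elegant skirt that flatters the figure. It sells particularly well in our shop, place an order soon."
phrases = [p.strip() for p in re.split(r"[,.;!?]", promo_text) if p.strip()]

kept = []
with torch.no_grad():
    for phrase in phrases:
        inputs = tokenizer(phrase, return_tensors="pt")
        label = classifier(**inputs).logits.argmax(dim=-1).item()
        if label == 1:                               # 1 = related to title / attributes / picture information
            kept.append(phrase)

standard_case = ", ".join(kept)                      # standard case for this sample commodity
```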
And secondly, after the commodity title, the commodity attribute information and the commodity picture of the sample commodity are obtained, data preprocessing needs to be carried out on the commodity title, the commodity attribute information and the commodity picture.
The method specifically comprises the step of converting structured commodity attribute information into an unstructured natural language text aiming at the preprocessing of the commodity attribute information of the sample commodity.
The attribute information comprehensively depicts the sample commodity from different dimensions. However, because the attribute information of a sample commodity is varied and not every item of attribute information is valuable, the attribute information of the sample commodity needs to be screened: attribute information that reflects the characteristics of the sample commodity is retained, and other irrelevant attribute information is filtered out. Specifically, important attributes may be retained by setting keywords. For example, keywords such as "style", "type", "collar", "material composition", "sleeve length", "clothing length" and the like may be set, and only these attributes are retained while other attributes are filtered out. Unimportant attributes can also be screened out manually.
And then format conversion is performed on the retained commodity attribute information. Specifically, since the attributes of the commodity are all (key, value) structured data, it needs to be converted into unstructured natural language text before being input into the model. Therefore, a preset attribute template, for example, a preset attribute template with "key is value" is adopted to convert all attribute information (key, value) into unstructured texts and concatenate them to be used as an attribute text of a sample commodity, which may be specifically shown in fig. 4.
Preprocessing of the product title for the sample product: the commodity title usually contains rich commodity information and can supplement the missing commodity information in the commodity attribute information, so the commodity title also has very important function on generating the description file of the sample commodity. For example, the title of a clothing article usually contains information about the style, model, size, thickness, and style of the clothing.
Specifically, a preset title template may be adopted to convert the product title of the sample product into a title text. For example, if the preset title template is "product is title", the text obtained after replacing "title" in the template with a specific commodity title is the title text corresponding to the commodity title of the sample commodity.
Preprocessing of the commodity picture of the sample commodity: the commodity picture may be converted into RGB format and then cropped to a preset size, where the preset size may be set according to actual requirements, for example, 214 × 214. The cropped commodity picture is then normalized; specifically, the mean of the normalization may be set to [0.5, 0.5, 0.5] and the std (standard deviation) may be set to [0.5, 0.5, 0.5], and the normalized pixel value of each pixel of the cropped commodity picture is calculated using the following formula:
X* = (X - mean) / std
where X* is the normalized pixel value of the pixel and X is the original pixel value of the pixel.
Finally, the cropped commodity picture is processed into a feature tensor of dimension [3, z1, z2], where z1 and z2 are the preset sizes; when z1 and z2 are both 214, the cropped commodity picture is processed into a feature tensor of dimension [3, 214, 214].
And thirdly, the graphic coding refers to training a picture visual characteristic extraction model by utilizing the preprocessed commodity picture.
The picture visual feature extraction model may be a picture encoder. In the present disclosure, feature extraction can be performed on the commodity pictures of the sample commodities by constructing a picture encoder. Specifically, Fig. 7 shows a structural diagram of the picture encoder provided by the present disclosure. As shown in Fig. 7, the picture visual feature extraction model may include a pre-trained CLIP (Contrastive Language-Image Pre-training) model as the picture encoder, an MLP module and a Resize module. The picture encoder can encode each commodity picture into a 512-dimensional picture feature vector that contains the visual features of the commodity. Because the picture and the text are not in the same semantic space, in order to better generate the description text, the MLP and Resize modules are then used to map the picture feature vector into the text semantic space, obtaining a feature vector of dimension [m, M] as the picture feature of the commodity picture. If the value of M is 768, the picture feature of the commodity picture is a feature vector formed by m tokens of 768 dimensions.
Fourth, text coding refers to extracting the description text features of the sample commodity using the title text and the attribute text of the sample commodity.
Specifically, taking the embedding layer of GPT-2 as the text encoder as an example, the embedding layer of GPT-2 may be used to extract a text feature vector from the spliced text of the attribute text and the title text of the sample commodity; this text feature vector is used as the description text feature of the sample commodity and contains the semantic features of the sample commodity. Assuming the spliced text contains n tokens in total, the dimension of the resulting text feature vector is [n, M], each text token corresponding to an M-dimensional vector, and the value of M may be 768.
The order of the third step and the fourth step is not limited.
Fifthly, performing Prompt initialization: in the disclosure, a learnable prompt may be defined for each type of commodity as a prompt for generating a document, where the prompt is composed of k tokens, each token corresponds to an M-dimensional vector, and then the dimension of the prompt feature vector is [ k, M ]. The prompt feature vector is initialized randomly and updated continuously with training of the pattern generation model.
Sixthly, feature fusion: splicing the image feature vector, the text feature vector and the prompt feature vector obtained through the above steps in a token dimension for feature fusion, wherein fig. 8 shows a splicing feature schematic diagram provided by the present disclosure, as shown in fig. 8, the dimension of the prompt feature vector is [ k,768], the dimension of the text feature vector is [ n,768], the dimension of the image feature vector is [ m,768], and the dimension of the splicing feature is [ k + n + m,768].
After the processing steps of the first to sixth steps, the splicing features and the standard patterns corresponding to the sample commodities can be used to train the pattern generation model. Specifically, in an implementation manner, Fig. 9 shows a training flowchart of the pattern generation model provided by the present disclosure. As shown in Fig. 9, the training of the pattern generation model includes:
and S901, inputting the splicing characteristics corresponding to the sample commodity into a deep learning model to be trained to obtain a prediction pattern.
And the splicing characteristics are characteristics obtained by splicing the description text characteristics of the sample commodity, the picture characteristics of the sample commodity picture and the prompt template characteristics.
The prompt template features are prompt features, the prompt features are initially randomly initialized prompt feature vectors, and the prompt features can be updated with training of the pattern generation model.
Specifically, the deep learning model to be trained may be a GPT2 decoder. Fig. 10 is a schematic diagram of generating a pattern provided by the present disclosure, and as shown in fig. 10, after the splicing features corresponding to the sample commodity are input into the deep learning model to be trained, the features of the correspondingly generated pattern can be obtained as "caption tokens" in fig. 10.
And S902, calculating a cross entropy loss function value of the deep learning model to be trained based on the standard pattern corresponding to the sample commodity and the prediction pattern.
Specifically, the following formula may be adopted in the present disclosure, and based on the standard pattern and the prediction pattern corresponding to the sample commodity, the cross entropy loss function value of the deep learning model to be trained is calculated:
L = -∑_{i=1}^{n} p(x_i) log q(x_i)
where L represents the cross entropy loss value corresponding to a single character position in the prediction pattern, p(x_i) represents the true probability, determined from the standard pattern, that the character at the current position is the i-th character in the preset word list, q(x_i) represents the predicted probability, determined from the prediction pattern, that the character at the current position is the i-th character in the preset word list, and n represents the number of characters in the preset word list.
And S903, adjusting the parameters of the deep learning model to be trained based on the cross entropy loss function value.
The principle of adjusting the parameters of the deep learning model to be trained based on the cross entropy loss function value is as follows: the values of the parameters in the deep learning model to be trained that are related to the cross entropy loss function value are adjusted so that the cross entropy loss function value decreases.
And S904, when the iteration times of the model reach the preset iteration times, finishing the training, and determining the model with the minimum value of the corresponding cross entropy loss function in the stored deep learning model as the pattern generation model.
The preset iteration number may be set according to an actual application scenario, for example, set to 500 or 1000, and is not specifically limited herein.
In the disclosure, the deep learning model to be trained obtained after each training may be stored, and then the value of the cross entropy loss function corresponding to each stored deep learning model to be trained may be calculated by using the validation set. And then, when the iteration times of the model reach the preset iteration times, finishing the training, and determining the model with the minimum value of the corresponding cross entropy loss function in the stored deep learning model as the pattern generation model.
And S905, when the iteration frequency of the model does not reach the preset iteration frequency, returning to the step of inputting the splicing characteristics corresponding to the sample commodity into the deep learning model to be trained.
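A minimal sketch of the training procedure in S901 to S905, assuming a Hugging Face GPT-2 decoder; the data, hyper-parameters and checkpoint selection are simplified illustrations (in particular, the loss used here for model selection is the training loss, whereas the disclosure selects the stored model with the minimum cross entropy loss on a validation set, and the prompt vectors are also trainable parameters in the full method):

```python
import copy
import torch
import torch.nn.functional as F
from transformers import GPT2Tokenizer, GPT2LMHeadModel

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)

max_iterations = 1000                                 # preset number of iterations
best_loss, best_state = float("inf"), None

for step in range(max_iterations):
    spliced = torch.randn(1, 50, 768)                 # [1, k+n+m, M] splicing feature of a sample commodity (placeholder)
    target_ids = torch.randint(0, tokenizer.vocab_size, (1, 20))  # standard pattern token ids (placeholder)

    target_embeds = model.transformer.wte(target_ids)
    inputs_embeds = torch.cat([spliced, target_embeds], dim=1)
    logits = model(inputs_embeds=inputs_embeds).logits

    # cross entropy over the pattern positions only (teacher forcing, shifted by one position)
    pattern_logits = logits[:, spliced.size(1) - 1:-1, :]
    loss = F.cross_entropy(pattern_logits.reshape(-1, pattern_logits.size(-1)),
                           target_ids.reshape(-1))

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

    if loss.item() < best_loss:                       # keep the model with the lowest loss
        best_loss, best_state = loss.item(), copy.deepcopy(model.state_dict())

model.load_state_dict(best_state)                     # final pattern generation model
```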
By adopting the method provided by the disclosure, the splicing characteristics and the standard case corresponding to a plurality of sample commodities are utilized, the neural network to be trained is trained in advance to obtain the case generation model, the description case information of the commodities is generated by utilizing the case generation model, and the consumption of manual resources for generating the commodity cases is reduced to a greater extent. In addition, the title information, the attribute information and the commodity picture of the commodity are used for generating the description file, so that the generated description file comprehensively contains various information of the commodity, the novelty of the generated file is improved by adopting the method, and the generated description file can better reflect the characteristics of the commodity.
In an implementation manner, because the pattern generation model is not fully controllable, the generated description pattern may lose key information or contain information that does not accord with the actual situation. In order to solve these problems, the generated description pattern needs to be corrected.
Specifically, fig. 11 is a flowchart of correcting a description document provided by the present disclosure, and as shown in fig. 11, after the splicing feature is input into a pre-trained document generation model to obtain a target document corresponding to the target commodity, the method further includes:
and S1101, extracting a prediction attribute text representing the attribute information of the target commodity from the description file.
In the present disclosure, the predicted attribute text of the attribute information of the target product may be extracted by an attribute extraction model. Specifically, the description file may be input into a pre-trained attribute extraction model to obtain a predicted attribute text representing attribute information of the target commodity.
In the present disclosure, the attribute extraction model may be trained in advance using the commodity propaganda information of the sample commodities. Specifically, the attribute text describing the sample attributes in the commodity propaganda information of a sample commodity may be labeled as entities using the BIO (Begin-Inside-Outside) tagging scheme, with the other characters labeled "O", so as to obtain a standard labeling sequence for the commodity propaganda text. Then the NER (Named Entity Recognition) model is trained using the commodity propaganda information and the standard labeling sequence; the NER model adopted may be a BERT-BiLSTM-CRF model.
Specifically, the process of training the BERT-BiLSTM-CRF model may include: inputting the commodity propaganda information and the standard labeling sequence of the sample commodity into the BERT-BiLSTM-CRF model to be trained, calculating the negative log-likelihood loss with the final CRF module, and then updating the model parameters with this loss. Training stops when the loss value of the model on the validation set no longer decreases.
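A simplified sketch of the BERT-BiLSTM-CRF attribute extraction model, assuming the Hugging Face BERT implementation and the third-party pytorch-crf package; the tag set size, hidden size and checkpoint are illustrative assumptions:

```python
import torch
import torch.nn as nn
from transformers import BertModel
from torchcrf import CRF

class AttributeExtractor(nn.Module):
    def __init__(self, num_tags: int, hidden: int = 256):
        super().__init__()
        self.bert = BertModel.from_pretrained("bert-base-chinese")
        self.lstm = nn.LSTM(768, hidden, batch_first=True, bidirectional=True)
        self.emit = nn.Linear(2 * hidden, num_tags)          # per-token emission scores
        self.crf = CRF(num_tags, batch_first=True)           # transition scores over BIO tags

    def forward(self, input_ids, attention_mask, tags=None):
        hidden = self.bert(input_ids, attention_mask=attention_mask).last_hidden_state
        emissions = self.emit(self.lstm(hidden)[0])
        mask = attention_mask.bool()
        if tags is not None:                                  # training: negative log-likelihood loss
            return -self.crf(emissions, tags, mask=mask)
        return self.crf.decode(emissions, mask=mask)          # inference: best BIO tag path per sentence

extractor = AttributeExtractor(num_tags=5)                    # e.g. O, B-attr, I-attr, B-value, I-value (illustrative)
```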
In the present disclosure, a prediction attribute text representing the attribute information of the target commodity may also be extracted from the description document by manual screening.
S1102, comparing the predicted attribute text with the attribute information, and determining whether the description file has missing attribute text and/or redundant predicted attribute text and/or wrong predicted attribute text.
S1103, if the description file has missing attribute texts, acquiring the missing attribute texts from the attribute information and adding the missing attribute texts into the description file, and/or if the description file has redundant predicted attribute texts, removing the redundant predicted attribute texts from the description file, and/or if the description file has wrong predicted attribute texts, replacing the wrong attribute texts with the attribute information corresponding to the wrong attribute texts in the attribute information, so as to obtain a modified description file.
And comparing the predicted attribute text with the attribute information, and if the description file has the missing attribute text, namely the attribute text appears in the attribute information of the target commodity but does not appear in the description file, adding the missing attribute text into the description file through a preset attribute template.
If a redundant predicted attribute text exists in the description file, that is, an attribute text appears in the description file but does not appear in the attribute information of the target commodity, the attribute text is content generated out of thin air; therefore, in order to ensure the reliability of the generated description file, the attribute text can be deleted from the description file.
If a wrong predicted attribute text exists in the description file, that is, the attribute appears in both the description file and the attribute information of the target commodity but the attribute values conflict, the attribute information corresponding to the wrong attribute text in the attribute information is used to replace the wrong attribute text, so as to ensure the accuracy of the generated description file.
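A minimal sketch of the correction step in S1102 to S1103, comparing the predicted attribute text with the given attribute information and handling missing, redundant and wrong attributes; the function name, template and example values are illustrative:

```python
def correct_description(description: str, predicted: dict, attributes: dict,
                        template: str = "{key} is {value}") -> str:
    corrected = description
    for key, value in predicted.items():
        if key not in attributes:
            # redundant predicted attribute: generated out of thin air, remove it
            corrected = corrected.replace(template.format(key=key, value=value), "")
        elif value != attributes[key]:
            # wrong predicted attribute: replace with the true attribute value
            corrected = corrected.replace(value, attributes[key])
    for key, value in attributes.items():
        if key not in predicted:
            # missing attribute: append it via the preset attribute template
            corrected = corrected.rstrip(", ") + ", " + template.format(key=key, value=value)
    return corrected

corrected = correct_description(
    "The product is an XX longuette, color is black",
    predicted={"color": "black"},
    attributes={"color": "avocado green", "sleeve length": "long sleeves"},
)
```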
The corrected description file is used as a modified description file and is used as live broadcast content to be displayed to a user when commodities are displayed or sold through network live broadcast, so that the user can know the characteristics of the commodities.
By adopting the method provided by the disclosure, the description case generated by the case generation model can be corrected, and the accuracy of the generated description case is further ensured.
Based on the same inventive concept, according to the method for generating a document provided by the above embodiment of the present disclosure, correspondingly, another embodiment of the present disclosure further provides an apparatus for generating a document, a schematic structural diagram of which is shown in fig. 12, and specifically includes:
a commodity information obtaining module 1201, configured to obtain title information, attribute information, and a commodity picture of a target commodity;
a text feature extraction module 1202, configured to extract a description text feature for the target product based on the title information and the attribute information;
a picture feature extraction module 1203, configured to extract picture features of the commodity picture;
the feature fusion module 1204 is configured to splice the picture features, the description text features, and the prompt template features corresponding to the target commodity to obtain target splicing features;
the pattern generation module 1205 is configured to input the target splicing characteristics into a pre-trained pattern generation model to obtain a description pattern corresponding to the target commodity; the pattern generation model is obtained by training a neural network to be trained in advance according to the splicing characteristics and the standard patterns corresponding to the plurality of sample commodities.
By adopting the device provided by the disclosure, title information, attribute information and commodity pictures of the target commodity are obtained; determining a description text feature for the target commodity based on the title information and the attribute information; extracting picture characteristics of the commodity picture; splicing the picture characteristics, the description text characteristics and the prompt template characteristics corresponding to the target commodity to obtain target splicing characteristics; and inputting the target splicing characteristics into a pre-trained pattern generation model to obtain a description pattern corresponding to the target commodity. According to the method, the splicing characteristics and the standard case corresponding to a plurality of sample commodities are utilized, the neural network to be trained is trained in advance to obtain the case generation model, the description case information of the commodities is generated by utilizing the case generation model, and the labor resource consumption for generating the commodity case is reduced to a great extent. In addition, the title information, the attribute information and the commodity picture of the commodity are used for generating the description file, so that the generated description file comprehensively contains various information of the commodity, the novelty of the generated file is improved by adopting the device disclosed by the invention, and the generated description file can better reflect the characteristics of the commodity.
In one embodiment, the apparatus further comprises:
a pattern correction module (not shown in the figure) for extracting a predicted attribute text representing the attribute information of the target commodity from the description pattern; comparing the predicted attribute text with the attribute information to determine whether the description file has missing attribute text and/or redundant predicted attribute text and/or wrong predicted attribute text; if the description file has the missing attribute text, acquiring the missing attribute text from the attribute information and adding the missing attribute text into the description file, and/or if the description file has redundant prediction attribute text, removing the redundant prediction attribute text from the description file, and/or if the description file has wrong prediction attribute text, replacing the wrong attribute text with the attribute information corresponding to the wrong attribute text in the attribute information to obtain the modified description file.
In one embodiment, the pattern correction module is specifically configured to input the description pattern into a pre-trained attribute extraction model to obtain a predicted attribute text representing the attribute information of the target commodity.
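As a concrete illustration of the comparison-and-correction step, a minimal sketch is given below. It assumes the predicted attribute text and the source attribute information have both been reduced to key-value pairs; the function name `correct_copy` and the simple string-replacement strategy are assumptions for illustration, not the disclosed implementation.

```python
def correct_copy(description: str,
                 predicted_attrs: dict,
                 source_attrs: dict) -> str:
    """Post-process the generated description pattern against the source
    attribute information, following the three cases described above."""
    corrected = description
    # 1) Missing attributes: present in the source but absent from the copy.
    for key, value in source_attrs.items():
        if key not in predicted_attrs:
            corrected += f" {key}: {value}."
    # 2) Redundant predicted attributes: not present in the source at all.
    for key, value in predicted_attrs.items():
        if key not in source_attrs:
            corrected = corrected.replace(value, "")
    # 3) Wrong predicted attributes: key exists but the value disagrees.
    for key, value in predicted_attrs.items():
        if key in source_attrs and value != source_attrs[key]:
            corrected = corrected.replace(value, source_attrs[key])
    return corrected

# Example: the colour is wrong, "winter" is redundant, and the material is missing.
source = {"colour": "navy blue", "material": "cotton"}
predicted = {"colour": "black", "season": "winter"}
print(correct_copy("A black winter jacket.", predicted, source))
```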
In one embodiment, the apparatus further comprises:
a model training module (not shown in the figure), configured to input the splicing features corresponding to a sample commodity into a deep learning model to be trained to obtain a predicted pattern, where the splicing features are obtained by splicing the description text features of the sample commodity, the picture features of the commodity picture of the sample commodity, and the prompt template features; calculate a cross entropy loss function value of the deep learning model to be trained based on the standard pattern and the predicted pattern corresponding to the sample commodity; adjust parameters of the deep learning model to be trained based on the cross entropy loss function value; when the number of model iterations reaches a preset number of iterations, end the training and determine, among the stored deep learning models, the model with the minimum cross entropy loss function value as the pattern generation model; and when the number of model iterations has not reached the preset number of iterations, return to the step of inputting the splicing features corresponding to the sample commodity into the deep learning model to be trained.
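The following is a minimal sketch of such a training loop in PyTorch. The model interface (splicing features in, per-position vocabulary logits out), the Adam optimizer, the learning rate and the batching are assumptions; the disclosure only specifies the cross entropy objective, the preset iteration count, and keeping the stored model with the smallest loss.

```python
import copy
import torch
import torch.nn as nn

def train_pattern_generator(model: nn.Module,
                            batches,            # list of (splicing_features, target_token_ids)
                            max_iterations: int,
                            lr: float = 1e-4) -> nn.Module:
    """Train until the preset iteration count is reached and return the stored
    checkpoint whose cross entropy loss was smallest."""
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    best_loss, best_state = float("inf"), None

    iteration = 0
    while iteration < max_iterations:
        for splicing_features, target_ids in batches:
            logits = model(splicing_features)        # (seq_len, vocab_size)
            loss = loss_fn(logits, target_ids)       # standard pattern vs. predicted pattern
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
            if loss.item() < best_loss:              # remember the best checkpoint so far
                best_loss = loss.item()
                best_state = copy.deepcopy(model.state_dict())
            iteration += 1
            if iteration >= max_iterations:
                break
    model.load_state_dict(best_state)
    return model
```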
In one embodiment, the model training module is specifically configured to calculate the cross entropy loss function value of the deep learning model to be trained, based on a preset word list and the predicted pattern, using the following formula:
L = -∑_{i=1}^{n} p(x_i) · log q(x_i)

wherein L represents the cross entropy loss function value corresponding to a single character position in the predicted pattern, p(x_i) represents the true probability, determined based on the standard pattern, that the i-th character of the preset word list appears at the current position, q(x_i) represents the predicted probability, determined based on the predicted pattern, that the i-th character of the preset word list appears at the current position, and n represents the number of characters in the preset word list.
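As a toy illustration of this per-position loss (assuming a four-character preset word list, purely for demonstration):

```python
import math

# True distribution p: the standard pattern places character 2 at this position.
p = [0.0, 1.0, 0.0, 0.0]
# Predicted distribution q at the same position, produced by the model.
q = [0.1, 0.7, 0.1, 0.1]

loss = -sum(p_i * math.log(q_i) for p_i, q_i in zip(p, q))
print(round(loss, 4))  # 0.3567, i.e. -log(0.7)
```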
In one embodiment, the text feature extraction module 1202 is specifically configured to process the title information according to a preset title template to obtain a target commodity title text; convert the attribute information into unstructured text based on a preset attribute template to obtain a target commodity attribute text; and input a target splicing text, obtained by splicing the target commodity title text and the target commodity attribute text, into a pre-trained text feature extraction module to obtain the description text feature for the target commodity.
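A minimal sketch of this text-side preprocessing is shown below; the concrete wording of the title template and the attribute template is an assumption, since the disclosure only states that preset templates are used.

```python
def build_splice_text(title: str, attributes: dict) -> str:
    """Apply hypothetical title/attribute templates and concatenate the results."""
    # Assumed title template.
    title_text = f"Commodity title: {title}."
    # Assumed attribute template: flatten key-value pairs into unstructured text.
    attr_text = " ".join(f"{k} is {v}." for k, v in attributes.items())
    return f"{title_text} {attr_text}"

splice_text = build_splice_text("Lightweight down jacket",
                                {"colour": "navy blue", "filling": "90% down"})
print(splice_text)
# The resulting target splicing text is then fed to the pre-trained text
# feature extraction module (e.g. an encoder-style language model) to obtain
# the description text feature.
```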
In one embodiment, the picture feature extraction module 1203 is specifically configured to perform normalization processing on the commodity picture to obtain a normalized commodity picture, and input the normalized commodity picture into a pre-trained picture visual feature extraction model to obtain the picture features of the commodity picture.
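A minimal sketch of this picture-side preprocessing follows, assuming an ImageNet-style resize-and-normalize pipeline and an arbitrary pre-trained visual encoder; the input resolution and normalization statistics are assumptions rather than values fixed by the disclosure.

```python
import torch
from torchvision import transforms
from PIL import Image

# Assumed normalization pipeline; the disclosure only states that the picture
# is normalized before being fed to a pre-trained visual feature extraction model.
preprocess = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

def extract_picture_features(image_path: str,
                             visual_encoder: torch.nn.Module) -> torch.Tensor:
    """Normalize the commodity picture and run it through a visual encoder."""
    image = Image.open(image_path).convert("RGB")
    pixel_values = preprocess(image).unsqueeze(0)   # (1, 3, 224, 224)
    with torch.no_grad():
        return visual_encoder(pixel_values)          # picture features
```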
With the device provided by the disclosure, the neural network to be trained is trained in advance on the splicing features and standard patterns corresponding to a plurality of sample commodities to obtain the pattern generation model, and the description pattern of a commodity is generated by that model, so that the labor cost of writing commodity copy is greatly reduced. In addition, because the title information, the attribute information and the commodity picture are all used to generate the description pattern, the generated description pattern covers multiple kinds of commodity information, is more novel, and better reflects the characteristics of the commodity. Furthermore, the description pattern generated by the pattern generation model can be corrected, further ensuring the accuracy of the generated description pattern.
The present disclosure also provides an electronic device and a readable storage medium according to an embodiment of the present disclosure.
Fig. 13 shows a schematic block diagram of an example electronic device 1300 that can be used to implement embodiments of the present disclosure. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital assistants, cellular phones, smart phones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions are meant to be examples only and are not meant to limit implementations of the disclosure described and/or claimed herein.
As shown in fig. 13, the device 1300 includes a computing unit 1301 that can perform various appropriate actions and processes according to a computer program stored in a Read Only Memory (ROM) 1302 or a computer program loaded from a storage unit 1308 into a Random Access Memory (RAM) 1303. The RAM 1303 can also store various programs and data necessary for the operation of the device 1300. The computing unit 1301, the ROM 1302, and the RAM 1303 are connected to each other via a bus 1304. An input/output (I/O) interface 1305 is also connected to the bus 1304.
A number of components in the device 1300 are connected to the I/O interface 1305, including: an input unit 1306 such as a keyboard, a mouse, or the like; an output unit 1307 such as various types of displays, speakers, and the like; a storage unit 1308 such as a magnetic disk, optical disk, or the like; and a communication unit 1309 such as a network card, modem, wireless communication transceiver, or the like. The communication unit 1309 allows the device 1300 to exchange information/data with other devices via a computer network, such as the Internet, and/or various telecommunication networks.
The computing unit 1301 may be any of various general-purpose and/or special-purpose processing components with processing and computing capabilities. Some examples of the computing unit 1301 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various specialized Artificial Intelligence (AI) computing chips, various computing units running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, or microcontroller. The computing unit 1301 performs the respective methods and processes described above, such as the document generation method. For example, in some embodiments, the document generation method may be implemented as a computer software program tangibly embodied on a machine-readable medium, such as the storage unit 1308. In some embodiments, part or all of the computer program may be loaded onto and/or installed onto the device 1300 via the ROM 1302 and/or the communication unit 1309. When the computer program is loaded into the RAM 1303 and executed by the computing unit 1301, one or more steps of the document generation method described above may be performed. Alternatively, in other embodiments, the computing unit 1301 may be configured to perform the document generation method in any other suitable manner (e.g., by means of firmware).
Various implementations of the systems and techniques described above may be implemented in digital electronic circuitry, integrated circuitry, Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), Systems on a Chip (SOCs), Complex Programmable Logic Devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special or general purpose, receiving data and instructions from, and transmitting data and instructions to, a storage system, at least one input device, and at least one output device.
Program code for implementing the methods of the present disclosure may be written in any combination of one or more programming languages. These program codes may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus, such that the program codes, when executed by the processor or controller, cause the functions/operations specified in the flowchart and/or block diagram to be performed. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package partly on the machine and partly on a remote machine, or entirely on the remote machine or server.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic, speech, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include Local Area Networks (LANs), Wide Area Networks (WANs), and the Internet.
The computer system may include clients and servers. A client and a server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server may be a cloud server, a server of a distributed system, or a server combined with a blockchain.
It should be understood that various forms of the flows shown above may be used, with steps reordered, added, or deleted. For example, the steps described in the present disclosure may be executed in parallel, sequentially, or in different orders, as long as the desired results of the technical solutions disclosed in the present disclosure can be achieved, and the present disclosure is not limited herein.
Furthermore, the terms "first" and "second" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include at least one such feature. In the description of the present disclosure, "a plurality" means two or more unless specifically limited otherwise.
The above description covers only specific embodiments of the present disclosure, but the scope of the present disclosure is not limited thereto. Any changes or substitutions that a person skilled in the art could readily conceive of within the technical scope of the present disclosure shall be covered by the scope of the present disclosure. Therefore, the protection scope of the present disclosure shall be subject to the protection scope of the claims.

Claims (10)

1. A method for generating a document, the method comprising:
acquiring title information, attribute information and commodity pictures of a target commodity;
extracting descriptive text features for the target commodity based on the title information and the attribute information;
extracting picture characteristics of the commodity picture;
splicing the picture features, the description text features and prompt template features corresponding to the target commodity to obtain target splicing features;
inputting the target splicing features into a pre-trained pattern generation model to obtain a description pattern corresponding to the target commodity; wherein the pattern generation model is obtained by training a neural network to be trained in advance on splicing features and standard patterns corresponding to a plurality of sample commodities.
2. The method according to claim 1, wherein after the target splicing features are input into the pre-trained pattern generation model to obtain the description pattern corresponding to the target commodity, the method further comprises:
extracting a predicted attribute text representing the attribute information of the target commodity from the description file;
comparing the predicted attribute text with the attribute information, and determining whether the description file has missing attribute text and/or redundant predicted attribute text and/or wrong predicted attribute text;
if the description file has a missing attribute text, acquiring the missing attribute text from the attribute information and adding it to the description file; and/or, if the description file has a redundant predicted attribute text, removing the redundant predicted attribute text from the description file; and/or, if the description file has a wrong predicted attribute text, replacing the wrong predicted attribute text with the corresponding item of the attribute information, to obtain the modified description file.
3. The method according to claim 2, wherein the extracting of the predicted attribute text representing the attribute information of the target product from the description document comprises:
and inputting the description file into a pre-trained attribute extraction model to obtain a predicted attribute text representing the attribute information of the target commodity.
4. The method of claim 1, wherein the training of the pattern generation model comprises:
inputting the splicing characteristics corresponding to the sample commodity into a deep learning model to be trained to obtain a prediction case; the splicing characteristics are characteristics obtained by splicing description text characteristics of the sample commodities, picture characteristics of commodity pictures of the sample commodities and prompt template characteristics;
calculating a cross entropy loss function value of the deep learning model to be trained based on the standard case and the prediction case corresponding to the sample commodity;
adjusting parameters of a deep learning model to be trained based on the cross entropy loss function value;
when the number of model iterations reaches a preset number of iterations, ending the training, and determining, among the stored deep learning models, the model with the minimum cross entropy loss function value as the pattern generation model;
and when the number of model iterations has not reached the preset number of iterations, returning to the step of inputting the splicing characteristics corresponding to the sample commodity into the deep learning model to be trained.
5. The method of claim 4, wherein the calculating the cross-entropy loss function value of the deep learning model to be trained based on the standard pattern and the prediction pattern corresponding to the sample commodity comprises:
calculating a cross entropy loss function value of the deep learning model to be trained based on the standard case and the prediction case corresponding to the sample commodity by adopting the following formula:
L = -∑_{i=1}^{n} p(x_i) · log q(x_i)

wherein L represents the cross entropy loss function value corresponding to a single character position in the prediction case, p(x_i) represents the true probability, determined based on the standard case, that the i-th character of the preset word list appears at the current position, q(x_i) represents the predicted probability, determined based on the prediction case, that the i-th character of the preset word list appears at the current position, and n represents the number of characters in the preset word list.
6. The method of claim 1, wherein the extracting descriptive text features for the target product based on the title information and the attribute information comprises:
processing the title information according to a preset title template to obtain a target commodity title text;
converting the attribute information into an unstructured text based on a preset attribute template to obtain a target commodity attribute text;
and inputting a target splicing text obtained by splicing the target commodity title text and the target commodity attribute text into a pre-trained text feature extraction module to obtain a description text feature for the target commodity.
7. The method according to claim 1, wherein the extracting the picture feature of the commodity picture comprises:
carrying out normalization processing on the commodity picture to obtain a commodity picture after normalization processing;
and inputting the commodity picture after the normalization processing into a pre-trained picture visual characteristic extraction model to obtain the picture characteristic of the commodity picture.
8. A document creation apparatus, comprising:
the commodity information acquisition module is used for acquiring the title information, the attribute information and the commodity picture of the target commodity;
a text feature extraction module for extracting a description text feature for the target commodity based on the title information and the attribute information;
the picture feature extraction module is used for extracting the picture features of the commodity picture;
the feature fusion module is used for splicing the picture features, the description text features and the prompt template features corresponding to the target commodities to obtain target splicing features;
the pattern generation module is used for inputting the target splicing characteristics into a pre-trained pattern generation model to obtain a description pattern corresponding to the target commodity; the pattern generation model is obtained by training a neural network to be trained in advance according to the splicing characteristics and the standard patterns corresponding to the plurality of sample commodities.
9. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein the content of the first and second substances,
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-7.
10. A non-transitory computer readable storage medium having stored thereon computer instructions for causing the computer to perform the method according to any one of claims 1-7.
CN202211684920.0A 2022-12-27 2022-12-27 File generation method, device, equipment and storage medium Pending CN115983227A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211684920.0A CN115983227A (en) 2022-12-27 2022-12-27 File generation method, device, equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211684920.0A CN115983227A (en) 2022-12-27 2022-12-27 File generation method, device, equipment and storage medium

Publications (1)

Publication Number Publication Date
CN115983227A true CN115983227A (en) 2023-04-18

Family

ID=85969576

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211684920.0A Pending CN115983227A (en) 2022-12-27 2022-12-27 File generation method, device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN115983227A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116702737A (en) * 2023-08-07 2023-09-05 腾讯科技(深圳)有限公司 Document generation method, device, equipment, storage medium and product
CN116702737B (en) * 2023-08-07 2023-12-01 腾讯科技(深圳)有限公司 Document generation method, device, equipment, storage medium and product
CN116824278A (en) * 2023-08-29 2023-09-29 腾讯科技(深圳)有限公司 Image content analysis method, device, equipment and medium
CN116824278B (en) * 2023-08-29 2023-12-19 腾讯科技(深圳)有限公司 Image content analysis method, device, equipment and medium

Similar Documents

Publication Publication Date Title
CN113378833B (en) Image recognition model training method, image recognition device and electronic equipment
CN114155543B (en) Neural network training method, document image understanding method, device and equipment
CN115983227A (en) File generation method, device, equipment and storage medium
CN111488931A (en) Article quality evaluation method, article recommendation method and corresponding devices
JP6462970B1 (en) Classification device, classification method, generation method, classification program, and generation program
CN113011420B (en) Character recognition method, model training method, related device and electronic equipment
CN107958078A (en) Information generating method and device
CN113177449B (en) Face recognition method, device, computer equipment and storage medium
CN113591566A (en) Training method and device of image recognition model, electronic equipment and storage medium
CN115640520B (en) Pre-training method, device and storage medium of cross-language cross-modal model
CN113128241A (en) Text recognition method, device and equipment
CN114218889A (en) Document processing method, document model training method, document processing device, document model training equipment and storage medium
CN113657395A (en) Text recognition method, and training method and device of visual feature extraction model
CN114973286A (en) Document element extraction method, device, equipment and storage medium
CN113947189A (en) Training method and device for image generation model, electronic equipment and storage medium
CN113868519A (en) Information searching method and device, electronic equipment and storage medium
CN113313066A (en) Image recognition method, image recognition device, storage medium and terminal
CN114429106B (en) Page information processing method and device, electronic equipment and storage medium
CN114970666B (en) Spoken language processing method and device, electronic equipment and storage medium
CN115565186A (en) Method and device for training character recognition model, electronic equipment and storage medium
CN115809325A (en) Document processing model training method, document processing method, device and equipment
CN114937277A (en) Image-based text acquisition method and device, electronic equipment and storage medium
CN114661904A (en) Method, apparatus, device, storage medium, and program for training document processing model
CN113822275A (en) Image language identification method and related equipment thereof
CN112580620A (en) Sign picture processing method, device, equipment and medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination