CN113256007B - Multi-mode-oriented new product sales forecasting method and device

Info

Publication number: CN113256007B
Application number: CN202110593370.0A
Authority: CN
Inventors: 朱海洋; 陈为; 周俊; 严凡; 钱中昊; 夏祯锋
Original assignee: Zhongda Group Co ltd; Zhejiang University ZJU
Current assignee: Zhongda Group Co ltd; Zhejiang University ZJU
Priority date: 2021-05-28
Filing date: 2021-05-28
Publication date: 2022-02-25
Anticipated expiration: 2041-05-28
Also published as: CN113256007A

Abstract

The embodiment of the specification provides a new product sales forecasting method and device oriented to multiple modes. The method comprises the following steps: firstly, acquiring product attributes and product images of target products and multidimensional time characteristics of target sale time; inputting the acquired content into a sales prediction device for sales prediction, wherein the sales prediction device comprises a plurality of coding layers, a fusion layer and a decoder, and the coding layers comprise an attribute coding layer, an image coding layer and a time coding layer; the sales prediction includes: determining an attribute coding vector corresponding to the product attribute through the attribute coding layer; determining an image coding vector corresponding to the product image through the image coding layer; determining a time coding vector corresponding to the multidimensional time characteristic through the time coding layer; performing fusion processing according to a plurality of encoding vectors determined by a plurality of encoding layers through the fusion layer to obtain a fusion vector; and outputting the predicted sales volume of the target product at the target sales time through the decoder according to the fusion vector.

Description

Multi-mode-oriented new product sales forecasting method and device

Technical Field

One or more embodiments of the present disclosure relate to the field of computer technologies, and in particular, to a method and an apparatus for predicting new product sales in a multi-modal environment.

Background

Predicting the sales volume of a new product is crucial to merchants. Many merchants, for example, accumulate a large inventory backlog because of inaccurate predictions; to maintain brand value, they may then have to cut prices, run promotions, or even destroy inventory, and the backlog itself also incurs substantial cost.

For a new product, no historical sales data is available for analysis; how to predict its sales volume more accurately is the problem to be solved by the invention.

Disclosure of Invention

One or more embodiments of the present specification describe a multi-mode-oriented method and apparatus for predicting sales of new products, which can realize accurate prediction of sales of new products without historical sales data.

According to a first aspect, a multi-mode-oriented new product sales forecasting method is provided, comprising the following steps: acquiring product attributes and a product image of a target product and multidimensional time characteristics of a target sale time; inputting the acquired content into a sales prediction device for sales prediction, wherein the sales prediction device comprises a plurality of coding layers, a fusion layer and a decoder, and the coding layers comprise an attribute coding layer, an image coding layer and a time coding layer. The sales prediction comprises: determining an attribute coding vector corresponding to the product attributes through the attribute coding layer; determining an image coding vector corresponding to the product image through the image coding layer; determining a time coding vector corresponding to the multidimensional time characteristics through the time coding layer; performing fusion processing, through the fusion layer, on the plurality of coding vectors determined by the plurality of coding layers to obtain a fusion vector; and outputting, through the decoder, the predicted sales volume of the target product at the target sale time according to the fusion vector.

In one embodiment, the product attributes include at least one of: product name, product color, product category, product style, sales price, product volume, product weight, manufacturer information.

In one embodiment, the multi-dimensional time characteristics include at least one of: whether the target sale time overlaps with a weekend, the proportion of its overlap with a weekend, whether it overlaps with a holiday, the proportion of its overlap with a holiday, whether it overlaps with the period of a specific event, the proportion of its overlap with the period of a specific event, and its time difference from a specific event.

In one embodiment, the multidimensional time characteristics are arranged in a predetermined order to form a time feature sequence, and the time coding layer is a time sequence network; determining the time coding vector corresponding to the multidimensional time characteristics through the time coding layer includes: processing the time feature sequence through the time sequence network to obtain a time sequence coding vector as the time coding vector.

In a specific embodiment, the time sequence network is implemented as a recurrent neural network (RNN), a long short-term memory network (LSTM), or a gated recurrent unit (GRU).

In one embodiment, the obtained content further includes weather information and/or event information corresponding to the target sale time; the plurality of coding layers further comprise a weather coding layer and/or an event coding layer; the sales prediction further comprises: determining a weather coding vector corresponding to the weather information through the weather coding layer; and/or determining an event coding vector corresponding to the event information through the event coding layer.

In one embodiment, performing fusion processing on the plurality of coding vectors determined by the plurality of coding layers to obtain a fusion vector includes: for each vector element of the plurality of vector elements contained in each coding vector, determining a plurality of attention weights that the vector element assigns to the plurality of vector elements; weighting and summing the plurality of vector elements using the plurality of attention weights to obtain a reconstruction vector element corresponding to the vector element, the reconstruction vector elements corresponding to the plurality of vector elements forming a reconstruction vector; and performing the fusion processing on the plurality of coding vectors and the plurality of reconstruction vectors corresponding to them to determine the fusion vector.

In one embodiment, performing fusion processing on the plurality of coding vectors determined by the plurality of coding layers to obtain a fusion vector includes: for each coding vector, determining a plurality of attention weights that the coding vector assigns to the plurality of coding vectors; weighting and summing the plurality of coding vectors using the plurality of attention weights to obtain a corresponding reconstruction vector; and performing the fusion processing on the plurality of coding vectors and the plurality of reconstruction vectors corresponding to them to obtain the fusion vector.

In one embodiment, the fusion processing includes: a concatenation process, an element-wise multiplication process, or a weighted summation process.

According to a second aspect, a multi-mode-oriented new product sales forecasting device is provided, comprising: an input layer, used for acquiring product attributes and a product image of a target product and multidimensional time characteristics of a target sale time; an attribute coding layer, used for determining an attribute coding vector corresponding to the product attributes; an image coding layer, used for determining an image coding vector corresponding to the product image; a time coding layer, used for determining a time coding vector corresponding to the multidimensional time characteristics; a fusion layer, used for performing fusion processing on the plurality of coding vectors determined by the plurality of coding layers to obtain a fusion vector; and a decoder, used for outputting the predicted sales volume of the target product at the target sale time according to the fusion vector.

According to a third aspect, there is provided a computer readable storage medium having stored thereon a computer program which, when executed in a computer, causes the computer to perform the method of the first aspect.

According to a fourth aspect, there is provided a computing device comprising a memory having stored therein executable code and a processor that, when executing the executable code, implements the method of the first aspect.

According to the method and the device provided by the embodiment of the specification, for a new product lacking historical sales data, the accuracy of predicting the sales volume of the product can be effectively improved by utilizing multi-modal data such as product attributes and product pictures and introducing the multi-dimensional time characteristics of target sales time. Furthermore, after a plurality of coding vectors are obtained on a plurality of coding layers, the coding vectors are further transformed and fused on a fusion layer, so that semantic information represented by the fusion vectors is more accurate, and the accuracy of predicting the product sales is more effectively improved.

Drawings

To illustrate the technical solutions of the embodiments of the present invention more clearly, the drawings used in the description of the embodiments are briefly introduced below. The drawings described below show only some embodiments of the present invention; those skilled in the art can obtain other drawings from them without creative effort.

FIG. 1 illustrates a multi-modal oriented new product sales forecasting device architecture diagram, in accordance with one embodiment;

FIG. 2 illustrates a flow diagram of a multi-modal oriented new product sales forecasting method, according to one embodiment;

FIG. 3 illustrates a block diagram of a multi-modal oriented new product sales forecasting apparatus, according to one embodiment.

Detailed Description

The scheme provided by the specification is described below with reference to the accompanying drawings.

As mentioned above, for a new product, there is no historical sales data, and how to predict it more accurately is the problem to be solved by the present invention.

The invention provides a multi-mode-oriented new product sales prediction scheme, which predicts product sales accurately by introducing multi-modal features, such as product attributes and images, together with time features of the sale. FIG. 1 shows an architecture diagram of a multi-modal oriented new product sales prediction device. As shown in FIG. 1, the prediction device as a whole has an encoder-decoder architecture. The input to the encoder includes at least three parts: product attributes, usually text, including textual descriptions of the product name, color, style, etc.; a product image, such as a design drawing or a picture of the physical product; and multi-dimensional time characteristics, such as whether the sale time falls on a weekend or a holiday. The encoder encodes each part of the input separately and then uses the fusion layer to fuse the resulting coding vectors into a fusion vector. The decoder decodes the fusion vector obtained by the encoder to obtain the predicted sales volume.

The method for predicting the product sales disclosed in the embodiments of the present disclosure is described below with reference to fig. 1.

Fig. 2 shows a flow diagram of a multi-modal oriented new product sales forecasting method according to an embodiment. The method can be executed by any device, server or equipment cluster with computing and processing capabilities. As shown in fig. 2, the method comprises the following steps:

Step S210, acquiring product attributes and a product image of a target product and multidimensional time characteristics of a target sale time. Step S220, inputting the acquired content into a sales prediction device for sales prediction, wherein the sales prediction device comprises a plurality of coding layers, the plurality of coding layers comprise an attribute coding layer 11, an image coding layer 12 and a time coding layer 13, and the sales prediction device further comprises a fusion layer 14 and a decoder 15. The sales prediction comprises: step S221, determining an attribute coding vector corresponding to the product attributes through the attribute coding layer 11; step S222, determining an image coding vector corresponding to the product image through the image coding layer 12; step S223, determining a time coding vector corresponding to the multi-dimensional time characteristics through the time coding layer 13; step S224, performing fusion processing, through the fusion layer 14, on the plurality of coding vectors determined by the plurality of coding layers to obtain a fusion vector; step S225, outputting, through the decoder 15, the predicted sales volume of the target product at the target sale time according to the fusion vector.

The steps are as follows:

First, in step S210, product attributes and a product image of a target product and multi-dimensional time characteristics of a target sale time are acquired. It should be noted that the target product may be a physical product, such as steel, clothing, or an automobile, or a virtual product, such as an electronic book or a game.

In one embodiment, the product attributes may include product name, product color, product category, product style, sales price, product volume, product weight, and manufacturer information. It is to be understood that product attributes are typically text. In one embodiment, the product image may be a design drawing of the product, a picture of the physical product, a picture of the product interface, etc.

In one embodiment, the granularity of the target sale time may be set according to actual demand. In one example, it may be a certain time period, such as 8 a.m. to 10 a.m. on January 2; in another example, it may be a certain date, such as January 1, 2021; in yet another example, it may be a certain weekend, a certain month, and so on. In one embodiment, the target sale time may be observed from multiple time perspectives to determine the corresponding multi-dimensional time characteristics. The time perspectives may include whether the time falls on a weekend, whether it falls on a holiday, whether it coincides with a specific event, whether it is before or after a specific event, and so on. Accordingly, the determined multi-dimensional time characteristics may include: whether the target sale time overlaps with a weekend, the proportion of its overlap with a weekend, whether it overlaps with a holiday, the proportion of its overlap with a holiday, whether it overlaps with the period of a specific event, the proportion of its overlap with the period of a specific event, and its time difference from a specific event. In one specific embodiment, the specific event may include a promotional activity, such as the 618 shopping festival or Double Eleven, or a major unexpected event, such as a rectification document issued by a government department for the industry to which the product belongs.
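The multi-dimensional time characteristics listed above can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `time_features`, the day-level granularity, the single event window, and the choice to measure the time difference in days from the start of the sale window to the start of the event are all assumptions made for the example.

```python
from datetime import date, timedelta

def time_features(start, end, holidays, event_start, event_end):
    """Return illustrative time features for the inclusive date range [start, end]."""
    days = [start + timedelta(days=i) for i in range((end - start).days + 1)]
    n = len(days)
    weekend = sum(1 for d in days if d.weekday() >= 5)  # Saturday=5, Sunday=6
    holiday = sum(1 for d in days if d in holidays)
    event = sum(1 for d in days if event_start <= d <= event_end)
    # Assumed definition: signed gap in days from sale start to event start.
    days_to_event = (event_start - start).days
    return [
        float(weekend > 0), weekend / n,   # overlaps a weekend? / proportion
        float(holiday > 0), holiday / n,   # overlaps a holiday? / proportion
        float(event > 0), event / n,       # overlaps the event? / proportion
        float(days_to_event),
    ]

features = time_features(
    start=date(2021, 6, 14), end=date(2021, 6, 20),
    holidays={date(2021, 6, 14)},                                # hypothetical holiday calendar
    event_start=date(2021, 6, 18), event_end=date(2021, 6, 18),  # hypothetical one-day event
)
```

The resulting list can be fed to the time coding layer either as a flat feature vector or, when a time sequence network is used, as an ordered sequence.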

It is to be understood that the category of the target product is not limited. In one example, the target product may be a metal product, such as steel, iron ore, or the like. In another example, the target product may be an energy product, such as crude oil, refined oil, fuel oil, and the like. In yet another example, the target product may be a chemical product, such as a liquid chemical, a pharmaceutical intermediate, a tire, a cable, a textile, a garment, or the like. In yet another example, the target product may be a vehicle such as an automobile. In yet another example, the target product may be a commodity, such as corn, soybeans, supplies, and the like. In this way, multi-modal features of the target product, such as product attributes and product images, and multi-dimensional time characteristics obtained by observing the target sale time from multiple angles can be obtained.

Thereafter, in step S220, the acquired content is input to the sales prediction device for sales prediction. The sales prediction device includes a plurality of coding layers, including at least the attribute coding layer 11, the image coding layer 12, and the time coding layer 13 shown in fig. 1, as well as the fusion layer 14 and the decoder 15 shown in fig. 1.

The sales prediction performed by the sales prediction device may include:

In step S221, an attribute coding vector corresponding to the product attributes is determined through the attribute coding layer 11. In one embodiment, the attribute coding layer 11 may be implemented using a word embedding algorithm. In one embodiment, the attribute coding layer 11 may be implemented as a deep neural network DNN (Deep Neural Network) or a recurrent neural network RNN (Recurrent Neural Network). Thus, by using the product attributes of the target product as the input to the attribute coding layer 11, the corresponding attribute coding vector x_a can be obtained.
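As a rough illustration of a word-embedding based attribute coding layer, the sketch below looks up an embedding for each attribute token and averages them into a single vector x_a. The embedding table `EMBEDDINGS`, its three dimensions, the zero vector for unknown tokens, and the mean pooling are assumptions for the example; the text above only states that a word embedding algorithm, a DNN, or an RNN may be used.

```python
# Hypothetical 3-dimensional word embeddings; in practice these are learned.
EMBEDDINGS = {
    "red": [1.0, 0.0, 0.0],
    "cotton": [0.0, 1.0, 0.0],
    "jacket": [0.0, 0.0, 1.0],
}
DIM = 3

def encode_attributes(tokens):
    """Mean-pool the embeddings of the attribute tokens into one vector x_a."""
    vectors = [EMBEDDINGS.get(token, [0.0] * DIM) for token in tokens]
    if not vectors:
        return [0.0] * DIM
    # zip(*vectors) iterates over embedding dimensions.
    return [sum(column) / len(vectors) for column in zip(*vectors)]

x_a = encode_attributes(["red", "jacket"])
```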

In step S222, an image coding vector corresponding to the product image is determined through the image coding layer 12. In one embodiment, the image coding layer 12 may be implemented as a convolutional neural network CNN (Convolutional Neural Network), Fast CNN, Inception, or ResNet. In a particular embodiment, the attribute coding vector x_a may also be used as an input to the image coding layer 12 to help generate a better coding vector for the image. In this way, with the product image of the target product as the input to the image coding layer 12, or with the product image and the attribute coding vector x_a together as the input to the image coding layer 12, an image coding vector x_f can be obtained.

In step S223, the time coding vector corresponding to the multi-dimensional time characteristics is determined through the time coding layer 13. In one embodiment, the time coding layer 13 is implemented using a DNN network or a CNN network, and accordingly, the multi-dimensional time characteristics may be input to the time coding layer 13 to obtain the corresponding time coding vector. In another embodiment, the multi-dimensional time characteristics are arranged in a predetermined order to form a time feature sequence, and accordingly, the time feature sequence may be input into the time coding layer 13 implemented as a time sequence network to obtain a time sequence coding vector as the time coding vector. In a specific embodiment, the time sequence network may be an RNN network, a long short-term memory network LSTM (Long Short-Term Memory), a gated recurrent unit GRU (Gated Recurrent Unit), or the like. In this way, the time coding vector x_t can be obtained using the multi-dimensional time characteristics as the input to the time coding layer 13.
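To make the time sequence network variant concrete, the following sketch runs a plain single-layer recurrent network over a short time feature sequence and uses the final hidden state as x_t. The function name `rnn_encode`, the scalar input at each step, the two-unit hidden state, and the fixed weights are illustrative assumptions; a trained LSTM or GRU would take the place of this toy recurrence.

```python
import math

def rnn_encode(sequence, w_x, w_h, b):
    """Run a single-layer Elman RNN over scalar inputs; return the final hidden state."""
    hidden = [0.0] * len(b)
    for x in sequence:
        # h_t[i] = tanh(w_x[i] * x_t + sum_j w_h[i][j] * h_{t-1}[j] + b[i])
        hidden = [
            math.tanh(
                w_x[i] * x
                + sum(w_h[i][j] * hidden[j] for j in range(len(hidden)))
                + b[i]
            )
            for i in range(len(b))
        ]
    return hidden

# Hypothetical fixed weights for a 2-unit hidden state; in practice these are learned.
w_x = [0.5, -0.3]
w_h = [[0.1, 0.0], [0.0, 0.1]]
b = [0.0, 0.0]

# Time feature sequence in a predetermined order (values are made up).
x_t = rnn_encode([1.0, 2 / 7, 0.0, 0.0], w_x, w_h, b)
```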

From the above, the attribute coding vector x_a, the image coding vector x_f, and the time coding vector x_t can be obtained through the attribute coding layer 11, the image coding layer 12, and the time coding layer 13 among the plurality of coding layers, respectively.

In one embodiment, the content acquired in step S210 may further include weather information corresponding to the target sale time, such as temperature, humidity, air pressure, cloud conditions, and the like. Correspondingly, step S220 may further include: inputting the weather information into a weather coding layer included in the plurality of coding layers to obtain a weather coding vector. In a specific embodiment, the weather coding layer may be implemented using any word embedding algorithm. In a particular embodiment, the weather coding layer may be implemented as a DNN network or a CNN network. In another specific embodiment, the weather coding layer may perform one-hot coding on the weather information and then perform word embedding processing to obtain the weather coding vector. In this way, weather information can be introduced at the coding stage.

In another embodiment, the content acquired in step S210 may include event information corresponding to the target sale time. In a particular embodiment, the event information may indicate an event that coincides with the target sale time, or indicate that no event coincides with the target sale time. Correspondingly, step S220 may further include: inputting the event information into an event coding layer included in the plurality of coding layers to obtain an event coding vector. In a specific embodiment, the event coding layer may be implemented using any word embedding algorithm. In a particular embodiment, the event coding layer may be implemented as a DNN network or a CNN network. In another specific embodiment, the event coding layer may perform one-hot coding on the event information and then perform word embedding processing to obtain the event coding vector. In this way, event information can be introduced at the coding stage.
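The one-hot coding followed by embedding described for the weather and event coding layers can be sketched as below. The event vocabulary `EVENTS`, the two-dimensional embedding matrix `EVENT_MATRIX`, and the helper names are hypothetical; multiplying a one-hot vector by an embedding matrix simply selects the row for that category.

```python
EVENTS = ["none", "618_festival", "policy_notice"]  # hypothetical event vocabulary

def one_hot(label, vocabulary):
    """Encode a categorical label as a one-hot vector over the vocabulary."""
    return [1.0 if item == label else 0.0 for item in vocabulary]

def embed(one_hot_vector, matrix):
    """Multiply a one-hot row vector by an embedding matrix (one row per category)."""
    dim = len(matrix[0])
    return [
        sum(one_hot_vector[k] * matrix[k][d] for k in range(len(matrix)))
        for d in range(dim)
    ]

# Hypothetical embedding matrix; its entries would be learned in practice.
EVENT_MATRIX = [[0.0, 0.0], [0.8, 0.1], [0.9, 0.3]]

x_e = embed(one_hot("618_festival", EVENTS), EVENT_MATRIX)
```

The same pattern applies to weather information once it has been discretized into categories, which is an additional assumption not spelled out in the text.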

In the above manner, a plurality of corresponding coding vectors can be obtained through the plurality of coding layers. Further, the execution order of steps S221, S222, and S223 is not limited.

Thereafter, in step S224, the fusion layer 14 performs fusion processing on the basis of the plurality of coding vectors to obtain a fusion vector. In one embodiment, the plurality of coding vectors may be fused directly to obtain the fusion vector. In another embodiment, the plurality of coding vectors may be transformed first, and then the transformed vectors and the plurality of coding vectors may be fused to obtain the fusion vector.

In a specific embodiment, an attention mechanism may be introduced to perform transformation processing on the plurality of code vectors.

In a more specific embodiment, a self-attention mechanism may be applied to each coding vector separately to obtain a corresponding reconstruction vector. Specifically, for any first vector element among the plurality of vector elements included in a coding vector, a plurality of attention weights assigned by the first vector element to the plurality of vector elements are determined; the plurality of vector elements are weighted and summed using the attention weights to obtain a reconstruction element corresponding to the first vector element; and the reconstruction elements corresponding to the plurality of vector elements form the reconstruction vector. In one example, determining the plurality of attention weights may comprise: calculating the product of the first vector element with each of the plurality of vector elements to obtain a plurality of product values, and then normalizing the product values to obtain the plurality of attention weights.
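The element-level procedure just described (products, normalization, weighted sum) might look like the sketch below. Softmax is used for the normalization step as an assumption, since the text does not name a specific normalization; the function names are likewise illustrative.

```python
import math

def softmax(values):
    """Normalize scores into weights that sum to 1 (numerically stabilized)."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

def reconstruct_elements(vector):
    """Element-level self-attention within a single coding vector."""
    reconstructed = []
    for query in vector:
        # Product of the current element with every element, then normalization.
        weights = softmax([query * key for key in vector])
        # Weighted sum of all elements gives the reconstruction element.
        reconstructed.append(sum(w * value for w, value in zip(weights, vector)))
    return reconstructed

r = reconstruct_elements([1.0, 0.0, -1.0])
```

Note that an element equal to zero produces equal product values, so it attends uniformly to all elements and its reconstruction element is the plain mean of the vector.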

In another more specific embodiment, any two coding vectors may be spliced to obtain spliced vectors, and then a self-attention mechanism is introduced for each spliced vector to perform processing to obtain a reconstruction vector corresponding to each spliced vector.

In yet another more specific embodiment, the plurality of coding vectors may be taken as a whole and processed with a self-attention mechanism to obtain a plurality of reconstruction vectors corresponding to the plurality of coding vectors. Specifically, for each coding vector, a plurality of attention weights assigned by that coding vector to the plurality of coding vectors are determined, and the plurality of coding vectors are weighted and summed using the attention weights to obtain the reconstruction vector corresponding to that coding vector. In this way, a plurality of reconstruction vectors corresponding to the plurality of coding vectors can be obtained.
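A vector-level version of the same idea is sketched below: each coding vector attends to all coding vectors, and the weighted sum of the coding vectors is its reconstruction vector. The dot-product score, the softmax normalization, and the requirement that all coding vectors share one dimensionality are assumptions made so the example is runnable; the text does not fix how the attention weights are computed at this level.

```python
import math

def softmax(values):
    """Normalize scores into weights that sum to 1 (numerically stabilized)."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def reconstruct_vectors(vectors):
    """Vector-level self-attention over a list of equal-length coding vectors."""
    dim = len(vectors[0])
    reconstructed = []
    for query in vectors:
        # Assumed scoring: dot product between the query vector and each coding vector.
        weights = softmax([dot(query, key) for key in vectors])
        reconstructed.append(
            [sum(w * v[d] for w, v in zip(weights, vectors)) for d in range(dim)]
        )
    return reconstructed

# Toy 2-dimensional coding vectors standing in for x_a, x_f and x_t.
x_a, x_f, x_t = [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]
recon = reconstruct_vectors([x_a, x_f, x_t])
```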

In another specific embodiment, the multiple coded vectors may be transformed by using a feature crossing technique, a feature combination technique, or the like, to obtain a feature crossing vector or a feature combination vector.

Therefore, by introducing an attention mechanism, a feature crossing technique, a feature combination technique, or the like, the plurality of coding vectors can be transformed, and the vectors obtained by the transformation can then be fused with the plurality of coding vectors to obtain the fusion vector. In one embodiment, the fusion processing may include a concatenation process, an element-wise multiplication process, or a weighted summation process. In one example, the weights used in the weighted summation process may be predetermined, such as setting the weights of the respective vectors to be equal and to sum to 1. In another example, the weights used in the weighted summation process may be learnable parameters in the fusion layer 14.
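The three fusion options named above can be illustrated with a few small helpers. The function names are hypothetical, and element-wise multiplication and weighted summation assume that all input vectors have the same length. When no weights are passed, `weighted_sum` falls back to equal weights that sum to 1, matching the predetermined-weight example in the text.

```python
def concatenate(vectors):
    """Join the vectors end to end into one longer vector."""
    return [value for vector in vectors for value in vector]

def elementwise_product(vectors):
    """Multiply the vectors position by position (all must share one length)."""
    result = list(vectors[0])
    for vector in vectors[1:]:
        result = [a * b for a, b in zip(result, vector)]
    return result

def weighted_sum(vectors, weights=None):
    """Weighted sum of the vectors; defaults to equal weights summing to 1."""
    if weights is None:
        weights = [1.0 / len(vectors)] * len(vectors)
    dim = len(vectors[0])
    return [sum(w * v[d] for w, v in zip(weights, vectors)) for d in range(dim)]

vectors = [[1.0, 2.0], [3.0, 4.0]]
fused_concat = concatenate(vectors)
fused_product = elementwise_product(vectors)
fused_sum = weighted_sum(vectors)
```

Concatenation preserves every input dimension at the cost of a longer fusion vector, while the other two options keep the original dimensionality.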

After the fused vector is obtained, in step S225, the decoder 15 outputs the predicted sales amount of the target product at the target sales time based on the fused vector. In one embodiment, the decoder 15 may be implemented using a DNN network or an LSTM network, etc.
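For a DNN-style decoder, a minimal sketch is a small feed-forward network that maps the fusion vector to a single predicted sales value. The layer sizes, the ReLU activation, the parameter dictionary layout, and the hand-picked weights below are assumptions for illustration only.

```python
def relu(values):
    return [max(0.0, v) for v in values]

def linear(inputs, weights, bias):
    """Fully connected layer: one weight row and one bias per output unit."""
    return [sum(w * x for w, x in zip(row, inputs)) + b for row, b in zip(weights, bias)]

def decode(fusion_vector, params):
    """Map a fusion vector to one predicted sales value via a hidden ReLU layer."""
    hidden = relu(linear(fusion_vector, params["w1"], params["b1"]))
    return linear(hidden, params["w2"], params["b2"])[0]

# Hypothetical parameters; in practice they are learned during training.
params = {
    "w1": [[1.0, 0.0], [0.0, -1.0]], "b1": [0.0, 0.0],
    "w2": [[2.0, 3.0]], "b2": [1.0],
}
predicted = decode([2.0, 3.0], params)
```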

In summary, by using the product sales prediction method disclosed in the embodiments of the present specification, for a new product lacking historical sales data, the accuracy of product sales prediction can be effectively improved by using multi-modal data such as product attributes and product pictures and introducing multi-dimensional time characteristics of target sales time. Furthermore, after a plurality of coding vectors are obtained on a plurality of coding layers, the coding vectors are further transformed and fused on a fusion layer, so that semantic information represented by the fusion vectors is more accurate, and the accuracy of predicting the product sales is more effectively improved.

It should be noted that the foregoing embodiment describes the use of the sales prediction apparatus, and the training of the sales prediction apparatus is similar to the use of the sales prediction apparatus, and mainly differs in that the product data used in the training stage is labeled with the sales, and after the predicted value of the sales is obtained, the training loss is determined by using the predicted value and the label value, and then the model parameters in the sales prediction apparatus are adjusted based on the training loss. Therefore, it will not be described in detail.
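The training procedure outlined above (predict, compare with the labeled sales, adjust parameters from the loss) can be sketched in a heavily simplified form. Here the fusion vectors are treated as fixed inputs, only a linear decoder is trained, the loss is squared error, and the optimizer is plain stochastic gradient descent; none of these choices is specified in the text, and the sample data is made up.

```python
def predict(weights, bias, fusion_vector):
    return sum(w * x for w, x in zip(weights, fusion_vector)) + bias

def train(samples, learning_rate=0.1, epochs=2000):
    """Fit a linear decoder by stochastic gradient descent on squared error."""
    weights, bias = [0.0] * len(samples[0][0]), 0.0
    for _ in range(epochs):
        for fusion_vector, label in samples:
            # Gradient of 0.5 * (prediction - label)^2 with respect to the prediction.
            error = predict(weights, bias, fusion_vector) - label
            weights = [w - learning_rate * error * x for w, x in zip(weights, fusion_vector)]
            bias -= learning_rate * error
    return weights, bias

def mse(samples, weights, bias):
    """Mean squared error between predicted and labeled sales."""
    return sum((predict(weights, bias, x) - y) ** 2 for x, y in samples) / len(samples)

# Made-up (fusion vector, labeled sales) pairs following 2*x1 + 4*x2 + 1.
samples = [([1.0, 0.0], 3.0), ([0.0, 1.0], 5.0), ([1.0, 1.0], 7.0)]
weights, bias = train(samples)
```

In the full device the same loss would be backpropagated through the fusion layer and the coding layers as well, so that all model parameters are adjusted jointly.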

Corresponding to the prediction method, the embodiments of the specification also disclose a prediction apparatus. Fig. 3 shows a structure diagram of a multi-mode-oriented new product sales forecasting apparatus according to an embodiment. As shown in fig. 3, the forecasting apparatus 300 includes:

an input layer 310, used for acquiring product attributes and a product image of a target product and multidimensional time characteristics of a target sale time; an encoding layer 320, wherein the encoding layer 320 comprises: an attribute coding layer 321, configured to determine an attribute coding vector corresponding to the product attributes; an image coding layer 322, configured to determine an image coding vector corresponding to the product image; and a time coding layer 323, configured to determine a time coding vector corresponding to the multi-dimensional time characteristics; a fusion layer 330, configured to perform fusion processing on the plurality of coding vectors determined by the plurality of coding layers to obtain a fusion vector; and a decoder 340, configured to output the predicted sales volume of the target product at the target sale time according to the fusion vector.

In one embodiment, the multidimensional time characteristics are arranged in a predetermined order to form a time feature sequence, and the time coding layer 323 is a time sequence network; the time coding layer 323 is specifically configured to process the time feature sequence through the time sequence network to obtain a time sequence coding vector as the time coding vector.

In one embodiment, the obtained content further includes weather information corresponding to the target sale time, and the encoding layer 320 further includes a weather encoding layer 324 for determining a weather encoding vector corresponding to the weather information.

In one embodiment, the obtained content further includes event information corresponding to the target sale time, and the encoding layer 320 further includes an event encoding layer 325 for determining an event encoding vector corresponding to the event information.

In one embodiment, the fusion layer 330 is specifically used to: for each vector element of the plurality of vector elements contained in each coding vector, determine a plurality of attention weights that the vector element assigns to the plurality of vector elements; weight and sum the plurality of vector elements using the plurality of attention weights to obtain a reconstruction vector element corresponding to the vector element, the reconstruction vector elements corresponding to the plurality of vector elements forming a reconstruction vector; and perform fusion processing on the plurality of coding vectors and the plurality of reconstruction vectors corresponding to them to determine the fusion vector.

In one embodiment, the fusion layer 330 is specifically used to: for each coding vector, determine a plurality of attention weights that the coding vector assigns to the plurality of coding vectors; weight and sum the plurality of coding vectors using the plurality of attention weights to obtain a corresponding reconstruction vector; and perform fusion processing on the plurality of coding vectors and the plurality of reconstruction vectors corresponding to them to obtain the fusion vector.

In summary, with the product sales prediction apparatus disclosed in the embodiments of the present specification, for a new product lacking historical sales data, the accuracy of product sales prediction can be effectively improved by using multi-modal data such as product attributes and product pictures and introducing multi-dimensional time characteristics of target sales time. Furthermore, after a plurality of coding vectors are obtained on a plurality of coding layers, the coding vectors are further transformed and fused on a fusion layer, so that semantic information represented by the fusion vectors is more accurate, and the accuracy of predicting the product sales is more effectively improved.

According to an embodiment of another aspect, there is also provided a computer-readable storage medium having stored thereon a computer program which, when executed in a computer, causes the computer to perform the method described in connection with fig. 2.

According to an embodiment of yet another aspect, there is also provided a computing device comprising a memory having stored therein executable code, and a processor that, when executing the executable code, implements the method described in connection with fig. 2.

Those skilled in the art will recognize that, in one or more of the examples described above, the functions described in this invention may be implemented in hardware, software, firmware, or any combination thereof. When implemented in software, the functions may be stored on or transmitted over as one or more instructions or code on a computer-readable medium.

The embodiments above describe the objects, technical solutions, and advantages of the present invention in further detail. It should be understood that they are only exemplary embodiments of the present invention and are not intended to limit its scope; any modification, equivalent substitution, or improvement made on the basis of the technical solutions of the present invention shall fall within the scope of the present invention.

Claims

1. A training method for a multi-modal-oriented new product sales forecasting device comprises the following steps:

acquiring an attribute description text and a product image of a target product and multidimensional time characteristics of target sale time; the multi-dimensional temporal features include at least one of: whether the target sale time overlaps a weekend, a proportion of the target sale time that overlaps a weekend, whether the target sale time overlaps a holiday, a proportion of the target sale time that overlaps a holiday, whether the target sale time overlaps a period of a specific event, a proportion of the target sale time that overlaps a period of a specific event, and a time difference between specific events;

inputting the obtained content into a sales prediction device for sales prediction, wherein the sales prediction device comprises a plurality of coding layers, a fusion layer and a decoder, and the coding layers comprise an attribute coding layer, an image coding layer and a time coding layer; the sales prediction comprises:

processing the attribute description text in a word embedding mode through the attribute coding layer, and determining an attribute coding vector corresponding to the attribute description text;

processing the product image through the image coding layer, and determining an image coding vector corresponding to the product image;

processing a time characteristic sequence formed by the multi-dimensional time characteristics through the time coding layer realized as a time sequence network, and determining a time coding vector corresponding to the multi-dimensional time characteristics;

performing fusion processing according to the plurality of encoding vectors determined by the plurality of encoding layers through the fusion layer to obtain a fusion vector;

outputting, by the decoder, a predicted sales volume of the target product at the target sales time based on the fused vector;

and determining a training loss based on the predicted sales amount and the sales amount label of the target product at the target sales time, and adjusting the model parameters in the sales amount prediction device based on the training loss.
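The multidimensional time characteristics recited at the start of claim 1 can be sketched as follows. The sketch assumes that the target sale time is a closed range of calendar days, that holidays are supplied as a set of dates, and that the time difference is measured in days from the start of the sale window to the start of a specific event. The function name, the feature keys, and the sample dates are hypothetical.

```python
from datetime import date, timedelta

def time_features(start, end, holidays, event_start, event_end):
    """Compute overlap flags and overlap proportions for a sale window [start, end]."""
    days = [start + timedelta(days=i) for i in range((end - start).days + 1)]
    total = len(days)
    weekend_days = sum(1 for d in days if d.weekday() >= 5)  # Saturday=5, Sunday=6
    holiday_days = sum(1 for d in days if d in holidays)
    event_days = sum(1 for d in days if event_start <= d <= event_end)
    return {
        "overlaps_weekend": weekend_days > 0,
        "weekend_ratio": weekend_days / total,
        "overlaps_holiday": holiday_days > 0,
        "holiday_ratio": holiday_days / total,
        "overlaps_event": event_days > 0,
        "event_ratio": event_days / total,
        # Assumed reading of the time difference: days from window start to event start.
        "days_to_event": (event_start - start).days,
    }

# Hypothetical one-week sale window, one holiday, and a one-day event.
features = time_features(
    start=date(2021, 5, 24),
    end=date(2021, 5, 30),
    holidays={date(2021, 5, 26)},
    event_start=date(2021, 6, 18),
    event_end=date(2021, 6, 18),
)
```

The resulting values can be arranged into the time characteristic sequence that the time coding layer consumes; how they are ordered in that sequence is left open here.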

2. The method of claim 1, wherein the attribute description text comprises at least one of: product name, product color, product category, product style, sales price, product volume, product weight, manufacturer information.

3. The method of claim 1, wherein the time sequence network is implemented as a Recurrent Neural Network (RNN), a Long Short-Term Memory (LSTM) network, or a Gated Recurrent Unit (GRU).
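As an illustration of the simplest option in claim 3, the sketch below runs a plain recurrent cell with a tanh activation over a time characteristic sequence and returns the final hidden state as the time coding vector. It assumes one feature vector per step; the weights, the dimensions, and the use of the last hidden state as the encoding are illustrative choices that the claim does not fix.

```python
import math

def rnn_encode(sequence, w_in, w_rec, bias):
    """Run a plain tanh RNN over the sequence and return the final hidden state."""
    hidden = [0.0] * len(bias)
    for x in sequence:
        # Each new hidden unit mixes the current input with the previous hidden state.
        hidden = [
            math.tanh(
                sum(w_in[j][i] * x[i] for i in range(len(x)))
                + sum(w_rec[j][k] * hidden[k] for k in range(len(hidden)))
                + bias[j]
            )
            for j in range(len(bias))
        ]
    return hidden

# Hypothetical sequence: one (weekend flag, holiday flag) pair per day.
time_sequence = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 0.0]]
w_in = [[0.5, -0.3], [0.1, 0.8]]   # input-to-hidden weights (2 hidden units)
w_rec = [[0.2, 0.0], [-0.4, 0.3]]  # hidden-to-hidden weights
bias = [0.0, 0.1]
time_encoding = rnn_encode(time_sequence, w_in, w_rec, bias)
```

An LSTM or GRU would replace the single tanh update with gated updates; in practice a deep learning framework would supply those cells and learn the weights during training.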

4. The method of claim 1, wherein the obtained content further comprises weather information and/or event information corresponding to the target sale time; the plurality of coding layers further comprise a weather coding layer and/or an event coding layer; the sales prediction further comprises:

determining a weather coding vector corresponding to the weather information through the weather coding layer; and/or

determining an event coding vector corresponding to the event information through the event coding layer.

5. The method of claim 1, wherein performing a fusion process on the plurality of encoded vectors determined by the plurality of encoding layers to obtain a fused vector comprises:

for each vector element among the plurality of vector elements contained in each encoding vector, determining a plurality of attention weights that the vector element assigns to the plurality of vector elements; computing a weighted sum of the plurality of vector elements using the plurality of attention weights to obtain a reconstructed vector element corresponding to that vector element; wherein the plurality of reconstructed vector elements corresponding to the plurality of vector elements form a reconstruction vector;

and performing fusion processing on the plurality of encoding vectors and a plurality of reconstruction vectors corresponding to the plurality of encoding vectors to determine the fusion vector.

6. The method of claim 1, wherein performing a fusion process on the plurality of encoded vectors determined by the plurality of encoding layers to obtain a fused vector comprises:

for each encoding vector, determining a plurality of attention weights that the encoding vector assigns to the plurality of encoding vectors; computing a weighted sum of the plurality of encoding vectors using the plurality of attention weights to obtain a corresponding reconstruction vector;

and performing the fusion processing on the plurality of encoding vectors and a plurality of reconstruction vectors corresponding to the plurality of encoding vectors to obtain the fusion vector.

7. The method of any of claims 1-6, wherein the fusion process comprises: a concatenation process, an element-wise multiplication process, or a weighted summation process.
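The three fusion options of claim 7 (joining the vectors end to end, multiplying them position by position, or summing them with weights) can be compared in a short sketch. The function names and the fixed summation weights are hypothetical; in a trained model the weights would typically be learned parameters.

```python
def fuse_concat(vectors):
    """Join the vectors end to end; the output length is the sum of the input lengths."""
    return [x for v in vectors for x in v]

def fuse_multiply(vectors):
    """Multiply the vectors position by position; all inputs must share one length."""
    fused = list(vectors[0])
    for v in vectors[1:]:
        fused = [a * b for a, b in zip(fused, v)]
    return fused

def fuse_weighted_sum(vectors, weights):
    """Sum the vectors position by position, scaling each vector by its weight."""
    return [
        sum(w * v[i] for w, v in zip(weights, vectors))
        for i in range(len(vectors[0]))
    ]

vectors = [[1.0, 2.0], [3.0, 4.0]]
concatenated = fuse_concat(vectors)                # [1.0, 2.0, 3.0, 4.0]
multiplied = fuse_multiply(vectors)                # [3.0, 8.0]
summed = fuse_weighted_sum(vectors, [0.25, 0.75])  # [2.5, 3.5]
```

Concatenation preserves every input value at the cost of a longer fusion vector, whereas the other two options keep the original dimension but require the inputs to have equal lengths.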

8. A training system for a multi-mode-oriented new product sales forecasting device comprises the sales forecasting device and a training unit, wherein the sales forecasting device comprises:

the input layer is used for acquiring the attribute description text and the product image of the target product and the multidimensional time characteristic of the target sale time; the multi-dimensional temporal features include at least one of: whether the target sale time overlaps a weekend, a proportion of the target sale time that overlaps a weekend, whether the target sale time overlaps a holiday, a proportion of the target sale time that overlaps a holiday, whether the target sale time overlaps a period of a specific event, a proportion of the target sale time that overlaps a period of a specific event, and a time difference between specific events;

the attribute coding layer is used for processing the attribute description text in a word embedding mode and determining an attribute coding vector corresponding to the attribute description text; the image coding layer is used for processing the product image and determining an image coding vector corresponding to the product image; the time coding layer is realized as a time sequence network and is used for processing a time characteristic sequence formed by the multi-dimensional time characteristics and determining a time coding vector corresponding to the multi-dimensional time characteristics;

the fusion layer is used for carrying out fusion processing according to the plurality of encoding vectors determined by the plurality of encoding layers to obtain fusion vectors;

a decoder for outputting the predicted sales amount of the target product at the target sales time according to the fusion vector;

the training unit is configured to: and determining a training loss based on the predicted sales volume and the sales volume label of the target product at the target sales time, and adjusting the model parameters in the sales volume prediction device based on the training loss.
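The training unit's behavior can be illustrated with a single parameter update. The sketch assumes a linear decoder over the fusion vector, a squared-error training loss, and plain gradient descent; the claim names none of these, and a full implementation would also propagate the loss back into the coding and fusion layers. All names and values are hypothetical.

```python
def predict(weights, bias, fusion_vector):
    """Assumed linear decoder: map a fusion vector to a predicted sales volume."""
    return sum(w * x for w, x in zip(weights, fusion_vector)) + bias

def train_step(weights, bias, fusion_vector, sales_label, learning_rate=0.01):
    """Compute a squared-error loss and take one gradient-descent step."""
    error = predict(weights, bias, fusion_vector) - sales_label
    loss = error ** 2
    # Gradient of the squared error with respect to each decoder parameter.
    new_weights = [w - learning_rate * 2 * error * x for w, x in zip(weights, fusion_vector)]
    new_bias = bias - learning_rate * 2 * error
    return new_weights, new_bias, loss

# Hypothetical fusion vector and sales label for one training sample.
fusion_vector = [0.5, -0.2, 0.1]
sales_label = 3.0
weights, bias = [0.0, 0.0, 0.0], 0.0

weights, bias, loss_before = train_step(weights, bias, fusion_vector, sales_label)
_, _, loss_after = train_step(weights, bias, fusion_vector, sales_label)
```

Repeating the step over many labeled products and sale windows is what adjusts the model parameters toward lower prediction error.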