CN113537623B

CN113537623B - Attention mechanism and multi-mode based service demand dynamic prediction method and system

Info

Publication number: CN113537623B
Application number: CN202110872257.6A
Authority: CN
Inventors: 刘志中; 海燕; 宋宗珀; 丰凯
Original assignee: Yantai University
Current assignee: Yantai University
Priority date: 2021-07-30
Filing date: 2021-07-30
Publication date: 2023-08-18
Anticipated expiration: 2041-07-30
Also published as: CN113537623A

Abstract

The present disclosure provides a method and a system for dynamically predicting service demand based on an attention mechanism and multimodal data, comprising: acquiring text data and image data generated during service usage; extracting features from the text data and the image data respectively; and inputting the extracted features into a pre-trained prediction model based on soft attention and multimodal machine learning to predict the user's service demand at the next moment. The prediction model operates as follows: the multimodal data features are fused based on a feature sharing mechanism; the fused features are processed with a soft attention mechanism, and the result is input into a pre-trained GRU network to obtain a feature vector representation of the user's service interest; and, based on the user information features and the service interest feature vector representation, the user's service demand at the next moment is predicted through a fully connected layer.

Description

Attention mechanism and multi-mode based service demand dynamic prediction method and system

Technical Field

The disclosure belongs to the technical field of service preference prediction, and particularly relates to a method and a system for dynamically predicting service demands based on an attention mechanism and multiple modes.

Background

The statements in this section merely provide background information related to the present disclosure and may not necessarily constitute prior art.

In recent years, with the rapid development and maturing of new computing paradigms such as service computing, cloud computing and mobile edge computing, a large number of services from different fields have become available on the network. Meanwhile, with the wide adoption of mobile networks and intelligent terminals, users at large scale can access services anytime and anywhere, which greatly facilitates their life and work. However, finding the service a user needs among a large number of candidate services remains challenging, and this affects both service utilization and user satisfaction. To address this problem, researchers have carried out a great deal of work on service recommendation and obtained rich results, which solve the service discovery problem to a certain extent. However, the inventors found that existing service recommendation methods mostly recommend services the user may be interested in by mining information between similar users or similar services, without considering the user's actual service demand, so the recommendation accuracy still needs to be improved.

Service demand prediction is an important basis for improving service recommendation accuracy. At present, scholars in China and abroad have conducted preliminary research on service demand prediction and obtained some results. Existing service demand prediction methods mainly include methods based on collaborative filtering (CF), machine learning (ML) and deep learning (DL). Specifically, Guo et al. propose a residual spatio-temporal network for short-term travel demand prediction, which captures the spatial, temporal and dependency relationships among travel demands and performs well in travel demand prediction. Liu et al. propose a deep ensemble network model based on an attention mechanism, which models the inter-channel, spatial and positional relationships of feature maps respectively and predicts the service demands of users. Zheng et al., addressing spatio-temporal demand prediction and competitive supply, propose a demand-aware path planning algorithm that considers both spatio-temporal prediction and supply-demand conditions, and construct a spatio-temporal graph convolution sequence prediction model that predicts user service requests by location and time. To help service providers pre-assign service origins and reduce customer waiting time, Chu et al. propose a multi-scale convolutional long short-term memory network model that considers temporal and spatial correlations simultaneously to predict future user demand. Lu et al., noting that existing service demand prediction does not consider the privacy of mobile users, propose a user collaborative filtering method that incorporates the intensity of privacy concern and predicts service demand by taking privacy-related factors into account. Gardino et al. learn the associations between views using a multi-view approach to solve the problem of demand forecasting between cross-industry retailers and wholesalers. Rob et al. use a deep learning method to relieve the complexity brought by an artificial network model, take travel search intensity as the sole input indicator, and provide travel service personnel with demand predictions between tourists and destinations. Xu et al. use historical water demand over time as data to provide effective water demand forecasts for municipal water supply. Although existing service demand prediction studies have improved the accuracy of service recommendation to some extent, most of them are based on single-modality data and do not consider service demand prediction on multimodal data.

Disclosure of Invention

In order to solve the problems, the present disclosure provides a method and a system for dynamic prediction of service demand based on an attention mechanism and multiple modes, where the solution considers text data and image data generated in a service usage process, and utilizes a prediction model based on soft attention and multiple mode machine learning to implement accurate prediction of user service demand.

According to a first aspect of an embodiment of the present disclosure, there is provided a method for dynamically predicting a service demand based on an attention mechanism and multiple modes, including:

acquiring text data and image data generated in the service using process;

respectively extracting the characteristics of the text data and the image data; inputting the extracted features into a pre-trained prediction model based on soft attention and multi-mode machine learning, so as to realize the prediction of the service demand of the user at the next moment;

the prediction model based on soft attention and multi-mode machine learning specifically comprises the following steps: based on a feature sharing mechanism, the fusion of the multi-mode data features is realized; processing the fused features by using a soft attention mechanism, and inputting the obtained result into a pre-trained GRU network to obtain service interest feature vector representation of a user; based on the user information characteristics and the service interest characteristic vector representation thereof, the prediction of the service demand of the user at the next moment is realized through the full connection layer.

Furthermore, the fusion of the multimodal data features based on the feature sharing mechanism is specifically: the extracted text features and the extracted image features are input into a text feature network and an image feature network respectively; the text features are logically added to the output of each fully connected layer of the image feature network, and the image features are logically added to the output of each fully connected layer of the text feature network; finally, the outputs of the text feature network and the image feature network are passed through one fully connected layer to obtain the fusion result.

Further, processing the fused features with the soft attention mechanism specifically includes: calculating the weights of the fused feature information based on the soft attention mechanism, and obtaining diversified service interest expression vectors.

Further, inputting the obtained result into the pre-trained GRU network to obtain the service interest feature vector representation of the user is specifically: the GRU network learns the service used by the user at each moment and the influence of the services used at past moments on the service used at the current moment, stores the learning result in the hidden state vector of each moment, and outputs a hidden state vector at each moment to represent the learned service interest information, thereby obtaining the user's service usage interest at each moment.

Further, an auxiliary loss function is introduced into the GRU network, and the difference between the hidden state of each moment of the GRU and the service feature fusion vector of the next moment is calculated through the auxiliary loss function.

According to a second aspect of the embodiments of the present disclosure, there is provided a system for dynamic prediction of service demand based on an attention mechanism and multiple modalities, comprising:

the data acquisition unit is used for acquiring text data and image data generated in the service use process;

the demand prediction unit is used for extracting characteristics of the text data and the image data respectively; inputting the extracted features into a pre-trained prediction model based on soft attention and multi-mode machine learning, so as to realize the prediction of the service demand of the user at the next moment;

According to a third aspect of the disclosed embodiments, there is provided an electronic device, including a memory, a processor and a computer program running on the memory, where the processor implements the dynamic prediction method of service demand based on an attention mechanism and multiple modes when executing the program.

According to a fourth aspect of embodiments of the present disclosure, there is provided a non-transitory computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements the described dynamic prediction method of service demand based on an attention mechanism and multiple modalities.

Compared with the prior art, the beneficial effects of the present disclosure are:

(1) The scheme of the present disclosure considers the text data and image data generated during service usage and, based on a soft attention and multimodal machine learning (Soft Attention and Multimodal Machine Learning, SAMML) model, provides a method for dynamically predicting service demand based on an attention mechanism and multimodal data, realizing accurate prediction of user service demand.

(2) The soft attention and multi-mode machine learning (Soft Attention and Multimodal Machine Learning, SAMML) model is provided in the scheme, firstly, feature vectors are respectively extracted from text data and image data, feature sharing is carried out, fusion of multi-mode data features is realized, and the expression capability of a user and service association is improved; then, a Soft Attention (Soft Attention) mechanism is applied to process the fused characteristic data, and the obtained result is input into the GRU network, so that the GRU network can learn the service use interests of the user better; and finally, training a SAMML model based on the user characteristics and the service characteristic data, and using the trained SAMML model to realize accurate prediction of the service demands of the users.

Additional aspects of the disclosure will be set forth in part in the description which follows and, in part, will be obvious from the description, or may be learned by practice of the disclosure.

Drawings

The accompanying drawings, which are included to provide a further understanding of the disclosure, illustrate and explain the exemplary embodiments of the disclosure and together with the description serve to explain the disclosure, and do not constitute an undue limitation on the disclosure.

FIG. 1 is a schematic diagram of a structure of a SAMML model according to the first embodiment of the present disclosure;

FIG. 2 is a schematic diagram of a multi-modal feature fusion structure according to a first embodiment of the disclosure;

FIG. 3 is a schematic diagram of a GRU neuron structure according to the first embodiment of the disclosure;

FIG. 4 (a) shows the model loss of the SAMML model at a learning rate of 1e-2 according to the first embodiment of the present disclosure;

FIG. 4 (b) shows the model loss of the SAMML model at a learning rate of 1e-3 according to the first embodiment of the present disclosure;

FIG. 4 (c) shows the model loss of the SAMML model at a learning rate of 1e-4 according to the first embodiment of the present disclosure;

FIG. 4 (d) shows the model loss of the SAMML model at a learning rate of 1e-5 according to the first embodiment of the present disclosure.

Detailed Description

The disclosure is further described below with reference to the drawings and examples.

It should be noted that the following detailed description is illustrative and is intended to provide further explanation of the present disclosure. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this disclosure belongs.

It is noted that the terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of exemplary embodiments in accordance with the present disclosure. As used herein, the singular is also intended to include the plural unless the context clearly indicates otherwise, and furthermore, it is to be understood that the terms "comprises" and/or "comprising" when used in this specification are taken to specify the presence of stated features, steps, operations, devices, components, and/or combinations thereof.

Embodiments of the present disclosure and features of embodiments may be combined with each other without conflict.

Embodiment one:

an objective of the present embodiment is to provide a method for dynamically predicting service demands based on an attention mechanism and multiple modes.

A service demand dynamic prediction method based on an attention mechanism and multiple modes comprises the following steps:

acquiring text data and image data generated in the service using process;

The user information features refer to information such as gender, age, occupation, economic income and the like of the user.

Further, the text feature network and the image feature network are both composed of a plurality of full connection layers.

In particular, for easy understanding, the following detailed description of the embodiments of the present disclosure will be given with reference to the accompanying drawings:

the present disclosure provides a service demand dynamic prediction method based on soft attention and multi-modal machine learning (Soft Attention and Multimodal Machine Learning, SAMML) model, considering text data and image data generated by a service usage process. Firstly, extracting feature vectors from text data and image data respectively and carrying out feature sharing to realize the fusion of multi-mode data features and improve the expression capacity of the association of a user and services; then, a Soft Attention (Soft Attention) mechanism is applied to process the fused characteristic data, and the obtained result is input into the GRU network, so that the GRU network can learn the service use interests of the user better; and finally, training a SAMML model based on the user characteristics and the service characteristic data, and using the trained SAMML model to realize accurate prediction of the service demands of the users.

FIG. 1 shows the network structure of the SAMML model, which is composed of a multimodal data feature sharing module, a service interest extraction module and a service demand prediction module.

In the SAMML model, feature vectors are first extracted from the text data and image data of the user's service information using a Doc2Vec model and a ResNet model respectively, and the extracted feature vectors are fused by the feature sharing module; then, for the user's different service usage data, weights are learned with the soft attention mechanism; next, the GRU network processes the resulting feature data to learn the user's service usage expression vectors; finally, the SAMML model is trained on the user features and the service feature vectors to predict the user's service demand. Specifically:

(I) Multimodal data feature sharing module

For the service demand prediction problem, let the training data set be $T = [(X_1, Y_1), (X_2, Y_2), \ldots, (X_m, Y_m), \ldots, (X_n, Y_n)]$, where $n$ is the size of the training data set. Here $X_m$ denotes the $m$-th training item and $Y_m$ denotes the user's service demand corresponding to $X_m$. Each training item comprises the user features, which include the gender, age, occupation, etc. of the user, and the service feature information. Each piece of service feature information includes text data and image data related to the service application; that is, the feature data of the $k$-th service item consists of the text data feature and the image data feature of that service item.

In order to effectively fuse the features of the different modalities, a feature sharing mechanism is adopted to associate the multimodal data features, thereby improving the accuracy of service demand prediction. Specifically, in the SAMML model, the feature fusion part consists of a text feature network $M_{txt}$ and an image feature network $M_{img}$, each of which is a fully connected network. Taking a service item $k$ of a user as an example, the text feature sequence and the image feature sequence of service item $k$ are input into $M_{txt}$ and $M_{img}$ respectively; the text features are logically added to the output $P_{img}$ of each dense layer of the image feature network $M_{img}$, and the image features are logically added to the output $P_{txt}$ of each dense layer of the text feature network $M_{txt}$. FIG. 2 is a schematic diagram of the network structure of the sharing module.

Let the number of input-layer nodes of the feature sharing module be $a$ and the number of layers be $c$. When the feature vectors pass through the $l$-th layer, the text feature network and the image feature network each produce an output, where $l \in [1, c]$. Taking $M_{txt}$ as an example, the propagation through the $l$-th layer is shown in formulas (1) and (2):

In the above formulas, $\cdot$ denotes the dot product; the input to the $l$-th layer is the text feature vector of $M_{txt}$ after feature sharing at layer $l-1$; the activation function of $M_{txt}$ is ReLU; and the $l$-th layer of $M_{txt}$ has its own weight matrix and bias value. Finally, the feature sharing vectors output by the two networks pass through one fully connected layer, which outputs the feature sharing expression vector (Feature share, Fs) of the user's service. The calculation of $Fs^k$ is shown in formula (3):

where $\sigma_1$ denotes the ReLU activation function of the fully connected network, $W_1$ denotes its weight matrix, $b_1$ denotes its bias value, and the two feature sharing vectors are joined by the vector concatenation operator.
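The feature sharing step can be illustrated with a minimal pure-Python sketch. It is only one plausible reading of the description, since formulas (1)-(3) are not reproduced in this text: "logical addition" is taken to be element-wise addition, the text and image feature vectors are assumed to have the same width `a`, and the weights are random placeholders rather than trained values. The helper names (`dense`, `feature_share`, and so on) are hypothetical.

```python
import random

def relu(v):
    return [max(0.0, x) for x in v]

def dense(v, weights, bias):
    """One fully connected layer: ReLU(W v + b)."""
    return relu([sum(w * x for w, x in zip(row, v)) + b
                 for row, b in zip(weights, bias)])

def add(u, v):
    """Element-wise addition (assumed meaning of 'logical addition')."""
    return [p + q for p, q in zip(u, v)]

def init_layer(rng, n_out, n_in):
    """Random placeholder weights and zero biases (not trained values)."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out

def feature_share(txt, img, c=3, seed=0):
    """Fuse one service item's text and image features through c shared layers."""
    rng = random.Random(seed)
    a = len(txt)                      # input width; txt and img assumed equal in size
    p_txt, p_img = list(txt), list(img)
    for _ in range(c):
        w_t, b_t = init_layer(rng, a, a)
        w_i, b_i = init_layer(rng, a, a)
        shared = add(p_txt, p_img)    # each network sees the other modality's output
        p_txt = dense(shared, w_t, b_t)
        p_img = dense(shared, w_i, b_i)
    # Final layer: concatenate both network outputs and project to width a.
    w_out, b_out = init_layer(rng, a, 2 * a)
    return dense(p_txt + p_img, w_out, b_out)

fs_k = feature_share([0.2, 0.1, 0.7, 0.4], [0.5, 0.3, 0.0, 0.9])
```

In a real model the per-layer weights would be trainable parameters created once, not re-drawn on each call; the sketch only shows the data flow between the two networks.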

(II) Feature weight acquisition module based on the Soft Attention mechanism

The Soft Attention (SA) mechanism selectively ignores part of the information and performs a re-weighted aggregation over the rest. All information is adaptively re-weighted before being aggregated, so that important information can be separated out and interference from unimportant information avoided, thereby improving accuracy. The present disclosure uses the SA mechanism to obtain the weights of the feature information, so that feature vectors are learned during model training and the relevance of the expression vectors is enhanced. After the diversified service interest expression vectors are obtained, the model adjusts, during learning, the weight with which each of the user's diversified service interests influences the user's service demand, based on the user's service sequence. The weights are then multiplied by the user's diversified service interest expression vectors, and the products are input into the GRU network to dynamically model how the user's diversified service interests change. The SAMML model takes the feature sharing vector Fs as the input of the soft attention mechanism, computes the weighted average of the different service feature vectors by the following operations, and thereby makes the relative proportions of the different service vectors directly visible. The SA-based weight acquisition steps are as follows:

step 1: initializing. Defining an attention variable z to represent an index value to be queried; z epsilon [1, N]N represents the total amount of user service characteristic items; when z=k, it is shown that the feature sharing vector Fs of the kth term is selected ^k 。

Step 2: after determining the query vector q and the feature sharing vector Fs ^k Then, for the query vector q and the query key Fs ^k The similarity is calculated and compared, and the probability alpha of the feature sharing vector of the kth term is calculated according to the formula (4) _k And carrying out normalization adjustment.

Step 3: a weighted average is performed. In the attention distribution alpha _k And when the vector q is inquired, the association degree of the characteristic shared information of the kth item in Fs and the query vector q is obtained, and the value of Soft attribute is obtained as shown in a formula (6).

Step 4: after the relevance of the different feature sharing information is calculated, processing is carried out according to the result of the formula (6) for the different feature sharing information, and the output is taken as the result and sequentially input into the GRU network for the next operation.

Wherein alpha is _k Is a probability vector representing the attention distribution, S ^k Is a scoring function of attention, the present disclosure uses a dot product model as the scoring function, and the calculation formula is shown in formula (5).

S ^k ＝s(Fs ^k ,q)＝(Fs ^k ) ^T q (5)
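Steps 1 to 4 can be sketched in a few lines of pure Python. The dot-product score follows formula (5); formulas (4) and (6) are not reproduced in this text, so the softmax normalization and the weighted sum below are assumptions based on the usual form of soft attention. The function name `soft_attention` and the sample vectors are illustrative only.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def soft_attention(fs, q):
    """Return (alpha, weighted, context) for feature sharing vectors fs and query q."""
    scores = [dot(f, q) for f in fs]              # formula (5): S^k = (Fs^k)^T q
    m = max(scores)                               # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    alpha = [e / total for e in exps]             # assumed softmax normalization
    # Each vector scaled by its weight: the sequence fed to the GRU in step 4.
    weighted = [[a * x for x in f] for a, f in zip(alpha, fs)]
    # Weighted average over all items (assumed form of the Soft Attention value).
    context = [sum(col) for col in zip(*weighted)]
    return alpha, weighted, context

fs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
q = [1.0, 0.0]
alpha, weighted, context = soft_attention(fs, q)
```

Items whose feature sharing vector aligns more closely with the query receive a larger weight; here the first and third items score equally and both outweigh the second.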

(III) GRU-based service interest extraction module

In recent years, GRU (Gated Recurrent Unit) neural networks, a variant of LSTM (Long Short-Term Memory) neural networks, have been widely used in NLP and time series data processing. Compared with LSTM, the GRU network has a simpler structure and is faster to compute. For this reason, the present disclosure uses a GRU network to learn the service feature sharing vectors and extract the user's service usage interests. The structure of the GRU network is shown in FIG. 3, where $r_t$ and $z_t$ denote the reset gate and the update gate respectively. The update gate controls the extent to which the service state information of the previous moment is retained in the current state: the larger the value of $z_t$, the more service information from the previous moment is retained in the current state. The reset gate controls how much service information from the previous moment is written into the current candidate state: the smaller the value of $r_t$, the less information is written. The data processed by the SA mechanism is used as the input $x$ of the GRU network.

The processing procedure of the service feature sharing information in the GRU network is as follows:

step 1: according to the current state x _t Hidden state h from last moment _t-1 Through z _t Output [0,1 ]]The specific operation of the unit value is shown in the formula (7):

step 2: according to x _t Hidden state h from last moment _t-1 Through r _t Output [0,1 ]]The unit values in between, while the function tanh creates a candidate vector of values at this pointThe specific operation formula is shown as formula (8) and formula (9):

step 3: from z _t As weight vector, the candidate vector and the output vector at the previous moment are weighted and averaged to obtain the output h of the GRU network _t . The specific operation formula is shown as (10):

for the above formula, the expression point multiplication operation, sigma is a sigmoid function, tanh is a tanh activation function, x _t For time t stateInput of (time t is input of the kth service sequence after SA processing), h _t-1 R is the state function of the hidden layer at the last moment _t For resetting the output of the gate, mapping the result between 0 and 1 through a sigmoid function, wherein the information is easier to be reserved when the result is closer to 1; z _t Mapping the result to between 0 and 1 through a sigmoid function for updating the output of the gate;representing candidate activation states at time t, by a new input x _t Front state h _t-1 And weight W ^h Calculating and updating the value; h is a _t Representing the activation state at time t, representing the t-th hidden state vector in the GRU network, according to the new z _t State h of the previous time of (2) _t-1 And->To obtain a new output value of the GRU. W (W) ^u ，W ^r ，W ^h And U ^u ，U ^r ，U ^h Weight matrix representing update gate and reset gate, b ^u ，b ^r ，b ^h Representing the offset values of the update gate and the reset gate, respectively.
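As a rough illustration of steps 1 to 3, the following pure-Python sketch computes one GRU step. Formulas (7)-(10) are not reproduced in this text, so the sketch assumes the standard GRU equations and uses the convention described above, in which a larger $z_t$ keeps more of the previous state: $h_t = z_t \cdot h_{t-1} + (1 - z_t) \cdot \tilde{h}_t$. The dictionary keys mirror the symbols $W^u, U^u, b^u$ and so on; the function names are hypothetical.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def matvec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def gru_step(x_t, h_prev, p):
    """One GRU step; p maps 'Wu','Uu','bu','Wr','Ur','br','Wh','Uh','bh' to parameters."""
    def layer(name, h_in, act):
        wx = matvec(p["W" + name], x_t)
        uh = matvec(p["U" + name], h_in)
        return [act(a + b + c) for a, b, c in zip(wx, uh, p["b" + name])]

    z = layer("u", h_prev, sigmoid)                     # update gate
    r = layer("r", h_prev, sigmoid)                     # reset gate
    gated = [ri * hi for ri, hi in zip(r, h_prev)]      # reset gate limits what is written
    cand = layer("h", gated, math.tanh)                 # candidate activation state
    # Larger z keeps more of the previous state (convention assumed from the text).
    return [zi * hp + (1.0 - zi) * ci for zi, hp, ci in zip(z, h_prev, cand)]

# With all-zero parameters both gates equal 0.5 and the candidate is 0,
# so the new state is half of the previous state.
zeros = [[0.0, 0.0], [0.0, 0.0]]
params = {k + n: zeros for k in "WU" for n in "urh"}
params.update({"b" + n: [0.0, 0.0] for n in "urh"})
h_t = gru_step([0.3, 0.7], [1.0, -2.0], params)
```

Running the step over the SA-weighted service sequence, one item per time step, yields the hidden state vectors that the text interprets as the user's service usage interest at each moment.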

The GRU network can learn the service used by the user at each moment and the influence of the services used at past moments on the service used at the current moment. It stores the learning result in the hidden state vector of each moment and outputs a hidden state vector at each moment to represent the learned service interest information, so the hidden state vector $h_t$ of each moment in the GRU network represents the user's service usage interest at that moment. To improve how well the GRU extracts service usage interest, the present disclosure introduces an auxiliary loss function $L_{tf}$ into the GRU network (as shown in formula (11)), which calculates the gap between the hidden state of the GRU at each moment and the service feature fusion vector of the next moment.

(IV) SAMML-based service demand prediction

After the service interest feature expression vector $h_t$ is obtained, the user's service demand at the next moment is predicted based on $h_t$ and the user information feature vector. When training the service demand prediction module, the input data $I_i$ is defined as the combination of the user information feature vector and the user's final service interest expression vector, and the model output $y_i$ represents the user's service demand at the next moment. The prediction function of the service demand prediction module is shown in formula (12):

where $\sigma_1$ denotes the ReLU activation function, $W$ denotes the weight matrix, $I_i$ denotes the input data, and $b$ denotes the bias value.
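A minimal sketch of the prediction layer follows. Formula (12) is not reproduced in this text; the sketch assumes the form $y_i = \sigma_1(W I_i + b)$ with a scalar output, and assumes that $I_i$ is the concatenation of the user information feature vector and the final service interest vector. The function name and the sample weights are illustrative, not trained values.

```python
def predict_demand(user_vec, interest_vec, weights, bias):
    """Fully connected prediction layer with ReLU, applied to the combined input I_i."""
    inputs = list(user_vec) + list(interest_vec)        # assumed concatenation
    pre_activation = sum(w * x for w, x in zip(weights, inputs)) + bias
    return max(0.0, pre_activation)                     # ReLU

y_next = predict_demand([1.0, 0.0], [0.5, -1.0], [0.2, 0.1, 0.4, 0.1], 0.05)
```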

In the SAMML model, predicting the user's service demand at the next moment from the sequence of services the user has used, based on multimodal machine learning, is a regression problem in machine learning. For regression problems, a commonly used loss function is the mean absolute error (MAE), which is the average distance between the predicted value of the service demand prediction model and the true label value $y$. Assuming that the number of training samples is $n$, the calculation formula of MAE is shown in formula (13).

The total loss function $L$ of the SAMML model consists of two parts: the service demand prediction loss function $L_{tag}$ and the auxiliary loss function $L_{tf}$. Both $L_{tag}$ and $L_{tf}$ use the MAE loss function and differ only in their inputs. The calculation formula of the total loss function $L$ is shown in formula (14):

$L = L_{tag} + \alpha \cdot L_{tf}$  (14)

where $\alpha$ is a hyper-parameter that balances the expression of user service interest against the prediction of the model. The present disclosure uses the Adam optimization algorithm. The service demand prediction method based on the SAMML model is shown in Algorithm 1.
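The loss combination in formula (14) can be sketched as follows. The MAE here is the standard mean absolute error; the exact form of the auxiliary loss in formula (11) is not reproduced in this text, so the sketch assumes it averages the MAE between each hidden state $h_t$ and the fused service vector of the next moment, and that both vectors have the same dimension. The function names and the value `alpha=0.5` are placeholders.

```python
def mae(pred, target):
    """Mean absolute error between two equal-length sequences."""
    return sum(abs(p - t) for p, t in zip(pred, target)) / len(pred)

def auxiliary_loss(hidden_states, fused_vectors):
    """Average MAE between h_t and the fused service vector at time t+1 (assumed form)."""
    pairs = list(zip(hidden_states[:-1], fused_vectors[1:]))
    return sum(mae(h, f) for h, f in pairs) / len(pairs)

def total_loss(y_pred, y_true, hidden_states, fused_vectors, alpha=0.5):
    """Formula (14): L = L_tag + alpha * L_tf."""
    l_tag = mae(y_pred, y_true)
    l_tf = auxiliary_loss(hidden_states, fused_vectors)
    return l_tag + alpha * l_tf

loss = total_loss(
    y_pred=[1.0, 2.0], y_true=[1.5, 2.0],
    hidden_states=[[0.0, 0.0], [1.0, 1.0]],
    fused_vectors=[[9.0, 9.0], [0.5, 0.5]],
)
```

In this example $L_{tag} = 0.25$ and $L_{tf} = 0.5$, so the total loss is $0.25 + 0.5 \times 0.5 = 0.5$.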

------------------------------------------------

Algorithm 1: service demand dynamic prediction algorithm based on SAMML

Stage 1: training of SAMML model

Input: Data // data set for model training

1. Initialize the parameters of the model;

2. FOR i = 1 TO N DO // N is the number of data batches

3. Input the training data item $(X_i, Y_i)$;

4. According to formula (1), perform the feature sharing operation between the output of layer $l-1$ ($1 \le l \le c$) of the text feature learning network $M_{txt}$ and the picture feature vector;

5. According to formula (2), obtain the output of layer $l-1$ ($1 \le l \le c$) of the text feature learning network $M_{txt}$;

6. Repeat steps 4 and 5 to fuse the picture features with the text features and obtain the output of layer $l-1$ ($1 \le l \le c$) of the picture feature network;

7. According to formula (3), obtain the user's service usage expression vector $Fs^k$;

8. According to formulas (4) and (5), compute the soft attention over $Fs^k$ and obtain the attention distribution of the service vectors;

9. According to formula (6), perform a weighted average over $Fs^k$ to obtain the degree of association between different services;

10. According to formulas (7) and (8), compute the outputs of the update gate and the reset gate of the GRU network;

11. According to formulas (9) and (10), calculate the user's service usage expression vector $h_t$;

12. According to formulas (11), (12), (13) and (14), calculate the auxiliary loss value, the prediction function value, the loss function value and the total loss;

13. Update the parameters of the SAMML model;

14. END FOR;

15. UNTIL the model training end condition is met;

Stage 2: Service demand prediction

16. Input the data $I_i$ and run the SAMML model;

17. Output: the service demand of the user;

----------------------------------------------------

Further, to demonstrate the effectiveness of the scheme of the present disclosure, the following experiments were performed:

(1) Experimental environment and experimental data

To verify the effectiveness of the proposed method, the present disclosure evaluated it experimentally on the Debiasing dataset provided by Alibaba Cloud Tianchi. The dataset file is stored in CSV format with UTF-8 encoding. The dataset contains more than one hundred thousand records, mainly comprising user features, commodity features and labels, where each commodity is mapped to a service. The user features include user ID, age, gender, etc.; the service features include the text feature txt_vec and the image feature img_vec; and the item ID item_id serves as the label. The data fields of the CSV file are listed in Table 1:

TABLE 1 Debiasing data information

The experimental environment is as follows: 64-bit special version of Windows 10, CPU Intel i7 5500U, RAM 4+4GB. The SAMML model was implemented using Python and TensorFlow 2.0. The present disclosure uses the mean absolute error (MAE), mean square error (MSE), root mean square error (RMSE) and R² as metrics to evaluate SAMML performance. MAE is calculated by formula (13); the formulas for MSE, RMSE and R² are shown in formulas (15)-(17):

wherein the smaller the values of MAE, MSE and RMSE, the higher the prediction accuracy of the model; the larger the value of R², the higher the prediction accuracy of the service demand prediction model.
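For reference, the four metrics can be computed as in the following minimal sketch. It assumes the standard textbook definitions (R² as one minus the ratio of the residual sum of squares to the total sum of squares), since formulas (13) and (15)-(17) are not reproduced here. The function names are illustrative only.

```python
import math

def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def mse(y_true, y_pred):
    """Mean square error."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    """Root mean square error: the square root of MSE."""
    return math.sqrt(mse(y_true, y_pred))

def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot
```

A perfect prediction gives MAE = MSE = RMSE = 0 and R² = 1, which matches the interpretation given above.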

(2) Model parameter setting

In the SAMML model, the purpose of feature sharing is to fuse the data feature vectors of the two modalities and to improve both the association between a user and the services used and the expressive power of the feature vectors. The number of network layers M of this module has a certain influence on model accuracy. To give the SAMML model good predictive ability, experiments were carried out with different numbers of network layers. In this experiment, the initial learning rate was set to 0.001 and 0.0001, respectively, and different numbers of network layers M were then set to observe the evaluation indices of the SAMML model (MAE, MSE, RMSE, R²) and thereby determine the optimal number of network layers in the feature sharing module. The experimental results are shown in Tables 2 and 3. Tables 2 and 3 show that increasing the number of layers helps to improve the prediction accuracy of the SAMML model up to a point: as the number of network layers increases, the accuracy follows a roughly bell-shaped trend, first rising and then falling.

Table 2 Effect of the number of network layers M on the SAMML model at learning rate = 0.0001

Table 3 Effect of the number of network layers M on the SAMML model at learning rate = 0.00001

However, when the number of network layers of the feature sharing module is increased, more parameters need to be learned, training takes longer, and the risk of overfitting rises. According to the experimental results, when the number of network layers is 3, every index is comparatively optimal and stable, so the number of network layers of the feature sharing module is set to 3 and the learning rate is set to 0.0001. The number of neuron nodes in each network layer of the feature sharing module also has a certain influence on model prediction. To give the service demand prediction model higher prediction accuracy, the number of neuron nodes in each network layer of the feature sharing module was set to 16, 32, 64, 128 and 256 in turn, and the optimal value was determined experimentally. The experimental results are shown in Table 4.

Table 4 Influence of the number of neuron nodes in the feature sharing module on the SAMML model

As can be seen from Table 4, the evaluation indices MAE, MSE, RMSE and R² are relatively optimal when the number of neuron nodes is 16 or 64, and the predictive ability of the model varies only slightly as the number of neuron nodes increases. Meanwhile, too few neuron nodes in each network layer easily leads to underfitting of the data, and too many neuron nodes increases the risk of overfitting. Based on the experimental results, the present disclosure sets the number of nodes per network layer of the feature sharing module of the SAMML model to 64.

In the SAMML model, the Adam algorithm is used to tune the parameters of the model. The learning rate of the Adam algorithm has a great influence on the stability and learning ability of the SAMML model. To give the model strong predictive ability, the learning rate was set to 1e-2, 1e-3, 1e-4 and 1e-5 in turn, the model was trained, and the experimental results were recorded; they are shown in fig. 4 (a) to 4 (d).

From fig. 4 (a) to 4 (d), it can be seen that when the learning rate is 1e-4, the model has already fitted by about 100 epochs, and the loss on the test set no longer decreases as the number of training epochs increases further. When the learning rate is 1e-5, the loss on the test set keeps decreasing as the number of training epochs increases, which indicates underfitting, and it only levels off at about 300 epochs. Combined with the figures, a learning rate of 1e-4 reaches a good fit in fewer epochs, and the experimental results also show higher accuracy. Based on the above analysis, the present disclosure sets the learning rate to 1e-4, i.e., 0.0001.
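The parameter setting procedure in this subsection tunes one hyperparameter at a time (number of layers, then nodes per layer, then learning rate). It can be sketched as a simple sequential search. This is a minimal sketch under stated assumptions: `evaluate` is a hypothetical callback that would train the model with a given configuration and return a test error such as MAE, and the candidate layer counts in `SEARCH_SPACE` are assumed, since the text does not list them. The node counts and learning rates are the candidates named above.

```python
def sequential_search(evaluate, search_space, initial):
    """Tune hyperparameters one at a time, keeping the best value found so far.

    evaluate:     callable taking a config dict and returning an error (lower is better)
    search_space: dict mapping hyperparameter name -> list of candidate values,
                  visited in insertion order
    initial:      dict of starting values for every hyperparameter
    """
    best = dict(initial)
    for name, candidates in search_space.items():
        errors = {}
        for value in candidates:
            trial = dict(best)
            trial[name] = value
            errors[value] = evaluate(trial)
        # Keep the candidate with the lowest error before moving on.
        best[name] = min(errors, key=errors.get)
    return best

# Candidate values; the layer counts are an assumption for illustration.
SEARCH_SPACE = {
    "layers": [1, 2, 3, 4, 5],
    "units": [16, 32, 64, 128, 256],
    "lr": [1e-2, 1e-3, 1e-4, 1e-5],
}
INITIAL = {"layers": 1, "units": 16, "lr": 1e-3}
```

A call such as `sequential_search(evaluate, SEARCH_SPACE, INITIAL)` returns the selected configuration as a dict. Tuning one parameter at a time is cheaper than a full grid but can miss interactions between parameters.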

(3) Model performance comparison

To verify the performance of the proposed prediction model, four prediction models based on multi-modal machine learning were selected for comparison with the proposed method. The four typical prediction models are: RBMI (Recommendation Based on Multimodal Information), Multimodal IRIS (Interest-Related Item Similarity Model Based on Multimoda), SDML (Scalable deep Multimodal learning) and IMMML (Improved Multimodal Machine Learning). The experiment used 80% of the dataset as training data and 20% as test data. The performance of each model was evaluated according to the evaluation indices, and the experimental results are shown in Table 5.
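The 80%/20% partition can be reproduced with a sketch such as the following. The text does not state whether the split was random or chronological; a seeded random shuffle is assumed here, and the function name and `seed` parameter are illustrative.

```python
import random

def split_dataset(records, train_fraction=0.8, seed=42):
    """Shuffle records with a fixed seed and split them into (train, test).

    Assumption: a random split; the source does not specify the split strategy.
    """
    rng = random.Random(seed)
    shuffled = list(records)  # copy so the caller's data is left untouched
    rng.shuffle(shuffled)
    cut = int(round(len(shuffled) * train_fraction))
    return shuffled[:cut], shuffled[cut:]
```

Fixing the seed keeps the partition identical across runs, so that all compared models are trained and tested on the same records.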

Table 5 evaluation of the performance of different models on datasets

As can be seen from Table 5, the SAMML model is superior to the other comparison models on all of the evaluation indices MAE, MSE, RMSE and R². On R², the SAMML model exceeds the best result among the other comparison models by 3.1%; on MAE, MSE and RMSE, it leads the second-best results by 2.18%, 2.63% and 2.73%, respectively. The comparison shows that the SAMML model provided by the present disclosure, by introducing a soft attention mechanism, reduces the difference in feature vector expression between the modalities and improves the prediction accuracy of user service demand.

To better predict the service demands of users, the present disclosure proposes a dynamic service demand prediction method based on soft attention and multi-modal machine learning. The method first fuses the multidimensional features of the services used by the user through a feature sharing module, strengthening the association among those services. It then introduces a soft attention mechanism so that the model can dynamically adjust the weights and thereby the influence of each service on the user's service demand. Finally, it predicts the service demand of the user through a fully connected network according to the user information and the multi-modal feature expression vector of the services. Extensive experiments on a large body of real data, together with comparison against other typical multi-modal models, verified the superiority of the proposed method.

Embodiment two:

It is an object of the present embodiment to provide a system for dynamic prediction of service demand based on an attention mechanism and multiple modes.

A system for dynamic prediction of service demand based on an attention mechanism and multiple modalities, comprising:

In further embodiments, there is also provided:

An electronic device comprising a memory, a processor, and computer instructions stored on the memory and run on the processor, wherein the computer instructions, when executed by the processor, perform the method of embodiment one. For brevity, the description is omitted here.

It should be understood that in this embodiment, the processor may be a central processing unit CPU, and the processor may also be other general purpose processors, digital signal processors DSP, application specific integrated circuits ASIC, off-the-shelf programmable gate array FPGA or other programmable logic device, discrete gate or transistor logic devices, discrete hardware components, or the like. A general purpose processor may be a microprocessor or the processor may be any conventional processor or the like.

The memory may include read only memory and random access memory and provide instructions and data to the processor, and a portion of the memory may also include non-volatile random access memory. For example, the memory may also store information of the device type.

A computer readable storage medium storing computer instructions which, when executed by a processor, perform the method of embodiment one.

The method in the first embodiment may be executed directly by a hardware processor, or by a combination of hardware and software modules in the processor. The software modules may be located in a storage medium well known in the art, such as random access memory, flash memory, read-only memory, programmable read-only memory, electrically erasable programmable memory, or registers. The storage medium is located in a memory, and the processor reads the information in the memory and, in combination with its hardware, performs the steps of the above method. To avoid repetition, a detailed description is not provided herein.

The method and the system for dynamically predicting the service demand based on the attention mechanism and the multiple modes can be realized, and have wide application prospects.

The foregoing describes only preferred embodiments of the present disclosure and is not intended to limit the present disclosure; various modifications and changes may be made to the present disclosure by those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present disclosure should be included in the protection scope of the present disclosure.

Claims

1. A method for dynamic prediction of service demand based on an attention mechanism and multiple modes, comprising:

acquiring text data and image data generated in the service using process;

the prediction model based on soft attention and multi-mode machine learning specifically comprises the following steps: based on a feature sharing mechanism, the fusion of the multi-mode data features is realized; processing the fused features by using a soft attention mechanism, calculating the weight of the fused feature information based on the soft attention mechanism, and obtaining diversified service interest expression vectors; inputting the obtained result into a pre-trained GRU network to obtain service interest feature vector representation of the user; based on the user information characteristics and the service interest characteristic vector representation thereof, the prediction of the service demand of the user at the next moment is realized through a full connection layer;

inputting the obtained result into a pre-trained GRU network to obtain service interest feature vector representation of the user, wherein the service interest feature vector representation comprises the following specific steps: and learning the service used by each moment of the user and the influence of the service used by the past moment on the service used by the current moment through the GRU network, storing a learning result in a hidden state vector of each moment, and outputting a hidden state vector at each moment to represent the learned service interest information so as to obtain the service use interest of each moment of the user.

2. The method for dynamically predicting service demands based on an attention mechanism and multiple modes as recited in claim 1, wherein the feature sharing mechanism is used for realizing the fusion of the characteristics of the multiple modes, specifically: respectively inputting the extracted text features and the extracted image features into a text feature network and an image feature network, and logically adding the text features and the output of each full-connection layer of the image feature network; and carrying out logic addition on the output of each full-connection layer of the image characteristic network and the text characteristic network, and finally obtaining a fusion result through the output of the text characteristic network and the image characteristic network through one full-connection layer.

3. The method for dynamic prediction of service demand based on an attention mechanism and multiple modes according to claim 2, wherein the text feature network and the image feature network are each composed of a plurality of fully connected layers.

4. The method for dynamically predicting service demand based on an attention mechanism and multiple modes as recited in claim 1, wherein an auxiliary loss function is introduced into said GRU network, and a gap between a hidden state of each moment of said GRU and a service feature fusion vector of a next moment is calculated through said auxiliary loss function.

5. A system for dynamic prediction of service demand based on an attention mechanism and multiple modalities, comprising:

6. The system for dynamically predicting service demands based on an attention mechanism and multiple modes as recited in claim 5, wherein the feature sharing mechanism is used for realizing the fusion of the characteristics of the multiple modes, specifically: respectively inputting the extracted text features and the extracted image features into a text feature network and an image feature network, and logically adding the text features and the output of each full-connection layer of the image feature network; and carrying out logic addition on the output of each full-connection layer of the image characteristic network and the text characteristic network, and finally obtaining a fusion result through the output of the text characteristic network and the image characteristic network through one full-connection layer.

7. An electronic device comprising a memory, a processor and a computer program stored for execution on the memory, wherein the processor implements a dynamic attention-based and multi-modal service demand prediction method as claimed in any one of claims 1 to 4 when executing the program.

8. A non-transitory computer readable storage medium having stored thereon a computer program, which when executed by a processor implements a method for dynamic prediction of service demand based on an attention mechanism and multiple modalities as claimed in any one of claims 1-4.