CN113537623A - Attention mechanism and multi-mode based dynamic service demand prediction method and system - Google Patents

Publication number
CN113537623A
Authority
CN
China
Prior art keywords
service
network
feature
user
features
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110872257.6A
Other languages
Chinese (zh)
Other versions
CN113537623B (en)
Inventor
刘志中
海燕
宋宗珀
丰凯
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Yantai University
Original Assignee
Yantai University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Yantai University
Priority to CN202110872257.6A
Publication of CN113537623A
Application granted
Publication of CN113537623B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/04 Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • General Physics & Mathematics (AREA)
  • Business, Economics & Management (AREA)
  • Strategic Management (AREA)
  • General Engineering & Computer Science (AREA)
  • Computing Systems (AREA)
  • Evolutionary Computation (AREA)
  • Data Mining & Analysis (AREA)
  • Artificial Intelligence (AREA)
  • Human Resources & Organizations (AREA)
  • Mathematical Physics (AREA)
  • Economics (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Tourism & Hospitality (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Quality & Reliability (AREA)
  • Operations Research (AREA)
  • General Business, Economics & Management (AREA)
  • Molecular Biology (AREA)
  • Marketing (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Game Theory and Decision Science (AREA)
  • Development Economics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Medical Informatics (AREA)
  • Image Analysis (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The present disclosure provides a method and system for dynamic prediction of service demand based on an attention mechanism and multimodal data, comprising: acquiring text data and image data generated during service use; extracting features from the text data and the image data respectively; and inputting the extracted features into a pre-trained prediction model based on soft attention and multimodal machine learning to predict the user's service demand at the next moment. The prediction model specifically: fuses the multimodal data features through a feature sharing mechanism; processes the fused features with a soft attention mechanism and inputs the result into a pre-trained GRU network to obtain a service interest feature vector representation of the user; and predicts the user's service demand at the next moment through a fully connected layer, based on the user information features and the service interest feature vector representation.

Description

Attention mechanism and multi-mode based dynamic service demand prediction method and system
Technical Field
The disclosure belongs to the technical field of service preference prediction, and particularly relates to a dynamic prediction method and system for service demand based on an attention mechanism and multiple modes.
Background
The statements in this section merely provide background information related to the present disclosure and may not necessarily constitute prior art.
In recent years, with the rapid development and maturity of new computing modes such as service computing, cloud computing and mobile edge computing, a great number of services from different fields have become available on the network. Meanwhile, with the wide adoption of mobile networks and intelligent terminals, large numbers of users can access services anytime and anywhere, which greatly facilitates their life and work. However, it is challenging for a user to find the service they need among a large number of candidate services, and this difficulty affects both the utilization rate of services and user satisfaction. To address this problem, researchers have carried out a great deal of work on service recommendation and obtained rich results, solving the problem of service discovery to a certain extent. However, the inventors found that most existing service recommendation methods recommend services that a user is likely to use or be interested in by mining information among similar users or similar services, without considering the user's actual service requirements, so the recommendation accuracy still needs to be improved.
Service demand prediction is an important basis for improving the accuracy of service recommendation. Scholars at home and abroad have carried out preliminary research on service demand prediction and obtained some results. Existing service demand prediction methods are mainly based on Collaborative Filtering (CF), Machine Learning (ML) and Deep Learning (DL) techniques. Specifically, Guo et al. propose a residual space-time network for short-term trip demand prediction, which captures the spatial, temporal and dependency relationships among trip demands and performs well in trip demand prediction. Liu et al. propose a deep integration network model based on an attention mechanism, which models the inter-channel, spatial and positional relationships of a feature map and predicts the service requirements of users. For spatio-temporal demand prediction and competitive supply problems, Zheng et al. propose a demand-aware path planning algorithm that considers both spatio-temporal prediction and supply-demand states, and construct a spatio-temporal graph convolution sequential prediction model that predicts user service requests by location and time. To help service providers pre-assign service starting points and reduce customer waiting time, Chu et al. propose a multi-scale convolutional long short-term memory network model that predicts future user demand while taking both temporal and spatial correlations into account. Noting that existing service demand prediction does not consider the privacy of users on the move, Lu et al. propose a user collaborative filtering method that incorporates the strength of privacy concern and considers factors related to user privacy when predicting service demand. Gardino et al. use a multi-view approach to learn the relationships between views in order to address demand forecasting between inter-industry retailers and wholesalers.
Rob et al. use a deep learning approach to mitigate the complexity brought by artificial network models, take the travel search strength as the only input index, and provide travel business personnel with demand prediction between tourists and destinations. Xu et al. use historical water demand as input data to provide effective water demand prediction for municipal water supply systems. Although existing service demand prediction studies improve the accuracy of service recommendation to some extent, most of them are based on single-modal data and do not consider service demand prediction under multi-modal data.
Disclosure of Invention
To solve the above problems, the present disclosure provides a dynamic prediction method and system for service demand based on an attention mechanism and multi-modal data, which considers the text data and image data generated during service use and uses a prediction model based on soft attention and multi-modal machine learning to predict user service demand accurately.
According to a first aspect of the embodiments of the present disclosure, there is provided a method for dynamic prediction of service demand based on attention mechanism and multi-modal, including:
acquiring text data and image data generated in the service use process;
respectively extracting the characteristics of the text data and the image data; inputting the extracted features into a pre-trained prediction model based on soft attention and multi-modal machine learning to realize prediction of service requirements of the user at the next moment;
the prediction model based on soft attention and multi-modal machine learning specifically comprises: realizing the fusion of multi-modal data features based on a feature sharing mechanism; processing the fused features by using a soft attention mechanism, and inputting an obtained result into a pre-trained GRU network to obtain service interest feature vector representation of a user; and based on the user information characteristics and the service interest characteristic vector representation thereof, the service demand of the user at the next moment is predicted through the full connection layer.
Further, the fusion of the multi-modal data features is realized based on a feature sharing mechanism, which specifically comprises the following steps: respectively inputting the extracted text features and image features into a text feature network and an image feature network, and logically adding the text features and the output of each full-connection layer of the image feature network; and logically adding the image characteristic and the output of each full-connection layer of the text characteristic network, and finally passing the output of the text characteristic network and the image characteristic network through one full-connection layer to obtain a fusion result.
Further, processing the fused features by using a soft attention mechanism specifically includes: calculating the weights of the fused feature information based on the soft attention mechanism to obtain diversified service interest expression vectors.
Further, the step of inputting the obtained result into a pre-trained GRU network to obtain a service interest feature vector representation of the user specifically includes: the GRU network learns the service used by the user at each moment and the influence of the service used at the past moment on the service used at the current moment, the learning result is stored in the hidden state vector at each moment, and a hidden state vector is output at each moment to represent the learned service interest information, so that the service use interest of the user at each moment is obtained.
Furthermore, an auxiliary loss function is introduced into the GRU network, and the difference between the hidden state of the GRU at each moment and the service feature fusion vector at the next moment is calculated through the auxiliary loss function.
According to a second aspect of the embodiments of the present disclosure, there is provided an attention mechanism and multimodal service demand dynamic prediction system, including:
a data acquisition unit for acquiring text data and image data generated during service use;
a demand prediction unit for performing feature extraction on the text data and the image data, respectively; inputting the extracted features into a pre-trained prediction model based on soft attention and multi-modal machine learning to realize prediction of service requirements of the user at the next moment;
the prediction model based on soft attention and multi-modal machine learning specifically comprises: realizing the fusion of multi-modal data features based on a feature sharing mechanism; processing the fused features by using a soft attention mechanism, and inputting an obtained result into a pre-trained GRU network to obtain service interest feature vector representation of a user; and based on the user information characteristics and the service interest characteristic vector representation thereof, the service demand of the user at the next moment is predicted through the full connection layer.
According to a third aspect of the embodiments of the present disclosure, there is provided an electronic device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor implements the method for dynamic prediction of service demand based on attention mechanism and multi-modal when executing the program.
According to a fourth aspect of the embodiments of the present disclosure, there is provided a non-transitory computer-readable storage medium having stored thereon a computer program, which when executed by a processor implements the method for dynamic prediction of service demand based on an attentional mechanism and multiple modalities.
Compared with the prior art, the beneficial effect of this disclosure is:
(1) the scheme disclosed by the invention considers text data and image data generated in the service using process, provides a dynamic service demand prediction method based on a Soft Attention and multi-modal Machine Learning (SAMML) model and realizes accurate prediction of user service demand.
(2) According to a Soft Attention and multi-modal Machine Learning (SAMML) model provided in the scheme, firstly, feature vectors are respectively extracted from text data and image data and feature sharing is carried out, fusion of multi-modal data features is realized, and the expression capability of a user related to a service is improved; then, processing the fused characteristic data by using a Soft Attention (Soft Attention) mechanism, and inputting the obtained result into the GRU network, so that the GRU network can better learn the service use interest of the user; and finally, training the SAMML model based on the user characteristics and the service characteristic data, and realizing accurate prediction of the service requirements of the user by using the trained SAMML model.
Advantages of additional aspects of the disclosure will be set forth in part in the description which follows, and in part will be obvious from the description, or may be learned by practice of the disclosure.
Drawings
The accompanying drawings, which are included to provide a further understanding of the disclosure, illustrate embodiments of the disclosure and together with the description serve to explain the disclosure and are not to limit the disclosure.
FIG. 1 is a schematic diagram of a SAMML model according to a first embodiment of the present disclosure;
FIG. 2 is a schematic diagram of a multi-modal feature fusion structure according to a first embodiment of the disclosure;
FIG. 3 is a schematic structural diagram of a GRU neuron according to a first embodiment of the present disclosure;
FIG. 4(a) shows the model loss value when the learning rate of the SAMML model is 1e-2 in the first embodiment of the disclosure;
FIG. 4(b) shows the model loss value when the learning rate of the SAMML model is 1e-3 in the first embodiment of the disclosure;
FIG. 4(c) shows the model loss value when the learning rate of the SAMML model is 1e-4 in the first embodiment of the disclosure;
FIG. 4(d) shows the model loss value when the learning rate of the SAMML model is 1e-5 in the first embodiment of the disclosure.
Detailed Description
The present disclosure is further described with reference to the following drawings and examples.
It should be noted that the following detailed description is exemplary and is intended to provide further explanation of the disclosure. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this disclosure belongs.
It is noted that the terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of example embodiments according to the present disclosure. As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that when the terms "comprises" and/or "comprising" are used in this specification, they specify the presence of the stated features, steps, operations, devices, components, and/or combinations thereof.
The embodiments and features of the embodiments in the present disclosure may be combined with each other without conflict.
The first embodiment is as follows:
the embodiment aims to provide a dynamic prediction method for service demand based on attention mechanism and multi-mode.
A dynamic prediction method for service demand based on attention mechanism and multi-mode comprises the following steps:
acquiring text data and image data generated in the service use process;
respectively extracting the characteristics of the text data and the image data; inputting the extracted features into a pre-trained prediction model based on soft attention and multi-modal machine learning to realize prediction of service requirements of the user at the next moment;
the prediction model based on soft attention and multi-modal machine learning specifically comprises: realizing the fusion of multi-modal data features based on a feature sharing mechanism; processing the fused features by using a soft attention mechanism, and inputting an obtained result into a pre-trained GRU network to obtain service interest feature vector representation of a user; and based on the user information characteristics and the service interest characteristic vector representation thereof, the service demand of the user at the next moment is predicted through the full connection layer.
The user information characteristics refer to information such as gender, age, occupation and economic income of the user.
Further, the fusion of the multi-modal data features is realized based on a feature sharing mechanism, which specifically comprises the following steps: respectively inputting the extracted text features and image features into a text feature network and an image feature network, and logically adding the text features and the output of each full-connection layer of the image feature network; and logically adding the image characteristic and the output of each full-connection layer of the text characteristic network, and finally passing the output of the text characteristic network and the image characteristic network through one full-connection layer to obtain a fusion result.
Furthermore, the text feature network and the image feature network are both composed of a plurality of fully connected layers.
Further, processing the fused features by using a soft attention mechanism specifically includes: calculating the weights of the fused feature information based on the soft attention mechanism to obtain diversified service interest expression vectors.
Further, the step of inputting the obtained result into a pre-trained GRU network to obtain a service interest feature vector representation of the user specifically includes: the GRU network learns the service used by the user at each moment and the influence of the service used at the past moment on the service used at the current moment, the learning result is stored in the hidden state vector at each moment, and a hidden state vector is output at each moment to represent the learned service interest information, so that the service use interest of the user at each moment is obtained.
Furthermore, an auxiliary loss function is introduced into the GRU network, and the difference between the hidden state of the GRU at each moment and the service feature fusion vector at the next moment is calculated through the auxiliary loss function.
Specifically, for ease of understanding, the embodiments of the present disclosure are described in detail below with reference to the accompanying drawings:
the present disclosure provides a dynamic prediction method of service demand based on a Soft Attention and Multimodal Machine Learning (SAMML) model, considering text data and image data generated during service usage. Firstly, extracting feature vectors from text data and image data respectively and performing feature sharing, realizing fusion of multi-modal data features, and improving the expression capacity of user and service association; then, processing the fused characteristic data by using a Soft Attention (Soft Attention) mechanism, and inputting the obtained result into the GRU network, so that the GRU network can better learn the service use interest of the user; and finally, training the SAMML model based on the user characteristics and the service characteristic data, and realizing accurate prediction of the service requirements of the user by using the trained SAMML model.
As shown in FIG. 1, the network structure of the SAMML model consists of a multi-modal data feature sharing module, a service interest extraction module and a service demand prediction module.
In the SAMML model, feature vectors are first extracted from the text data and image data of the user's service information with a Doc2Vec model and a ResNet model respectively, and the extracted feature vectors are fused by the feature sharing module; next, weights are learned with a soft attention mechanism for the user's different service usage data; the GRU network then processes the resulting feature data to learn the expression vector of the user's service use; finally, the SAMML model is trained on the user features and the service feature vectors to predict the user's service demand. Specifically:
(I) Multimodal data feature sharing module
For the service demand prediction problem, let the training data set be T = [(X_1, Y_1), (X_2, Y_2), …, (X_m, Y_m), …, (X_n, Y_n)], where n is the size of the training data set. X_m denotes the m-th item of training data, and Y_m denotes the service requirement of the user corresponding to X_m. Each X_m consists of the features of the user, including the gender, age, occupation, etc. of the user, and the service feature information. Each item of service feature information includes text data and image data related to a service application: the feature data of the k-th service item comprises the text data feature of the k-th service item and the image data feature of the k-th service item.
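The layout of one training pair (X_m, Y_m) can be illustrated with a small data structure. This is a minimal sketch, not the patent's implementation: the class names `ServiceItem` and `TrainingSample`, the field names, and the encoding of Y_m as an integer service index are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ServiceItem:
    """Features of one service item (hypothetical container)."""
    text_features: List[float]   # e.g. a Doc2Vec vector of the service text
    image_features: List[float]  # e.g. a ResNet vector of the service image


@dataclass
class TrainingSample:
    """One pair (X_m, Y_m) of the training data set T."""
    user_features: List[float]   # encoded gender, age, occupation, ...
    services: List[ServiceItem]  # service items in the order they were used
    label: int                   # Y_m, assumed here to be a service index


def dataset_size(dataset: List[TrainingSample]) -> int:
    """Return n, the size of the training data set."""
    return len(dataset)


# A toy sample with two service items; all values are made up.
sample = TrainingSample(
    user_features=[1.0, 0.35, 0.0],
    services=[ServiceItem([0.1, 0.2], [0.3, 0.4]),
              ServiceItem([0.5, 0.6], [0.7, 0.8])],
    label=2,
)
dataset = [sample]
```

Keeping the text and image features of each service item side by side makes it straightforward to feed them to the two feature networks item by item.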
To fuse the features of the different modalities effectively, a feature sharing mechanism is adopted to associate the multi-modal data features, thereby improving the accuracy of service demand prediction. Specifically, in the SAMML model, feature fusion is performed by a text feature network M_txt and an image feature network M_img, each of which is a fully connected network. Taking service item k of the user as an example, the text feature sequence and the image feature sequence of service item k are input to M_txt and M_img respectively; the text feature is logically added to the output P_img of each dense layer of the image feature network M_img, and the image feature is logically added to the output P_txt of each dense layer of the text feature network M_txt. FIG. 2 is a schematic diagram of the network structure of the sharing module.
Let the number of nodes of the input layer of the feature sharing module be a and the number of layers be c. When a feature vector passes through the l-th layer, the text feature network and the image feature network each produce an output, where l ∈ [1, c]. After the l-th layer operation, taking M_txt as an example, the feature fusion formulas are shown as formula (1) and formula (2):
Figure BDA00031892025000000811
Figure BDA00031892025000000812
In the above formulas, · denotes a dot product, and the remaining terms denote, for the l-th layer of M_txt, the text feature vector after feature sharing at layer l-1, the ReLU activation function, the weight matrix, and the bias value. Finally, the two feature sharing vectors are passed through one fully connected layer to output the feature sharing expression vector Feature share (Fs) of the user service. Fs_k is calculated as shown in formula (3):
Figure BDA00031892025000000819
where σ_1 denotes the ReLU activation function in the fully connected network, W_1 denotes the weight matrix, b_1 denotes the bias value, and the remaining operator denotes vector splicing (concatenation).
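The feature sharing module can be sketched in plain Python as follows. This is one possible reading under stated assumptions, not the patent's exact formulas: "logical addition" is taken to mean element-wise addition, every dense layer is assumed to have the same width a as the input so that the addition is well defined, and the helper names (`dense`, `feature_share`, `init_layer`) are hypothetical.

```python
import random


def relu(v):
    return [max(0.0, x) for x in v]


def dense(W, b, x):
    """One fully connected layer with ReLU: relu(W . x + b)."""
    return relu([sum(w * xi for w, xi in zip(row, x)) + bi
                 for row, bi in zip(W, b)])


def add(u, v):
    """Element-wise addition (assumed meaning of 'logical addition')."""
    return [a + b for a, b in zip(u, v)]


def init_layer(n_out, n_in, rng):
    """Random weights and zero bias for a hypothetical dense layer."""
    W = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    return W, [0.0] * n_out


def feature_share(text_feat, img_feat, txt_layers, img_layers, fuse_layer):
    """Cross-share features between M_txt and M_img, then fuse into Fs_k."""
    t_shared, i_shared = text_feat, img_feat
    for (Wt, bt), (Wi, bi) in zip(txt_layers, img_layers):
        p_txt = dense(Wt, bt, t_shared)   # output P_txt of this M_txt layer
        p_img = dense(Wi, bi, i_shared)   # output P_img of this M_img layer
        t_shared = add(text_feat, p_img)  # text feature + P_img
        i_shared = add(img_feat, p_txt)   # image feature + P_txt
    W1, b1 = fuse_layer
    # Concatenate both shared vectors and apply one dense layer.
    return dense(W1, b1, t_shared + i_shared)


rng = random.Random(0)
a, c, out_dim = 4, 2, 3                   # layer width, depth, size of Fs_k
txt_layers = [init_layer(a, a, rng) for _ in range(c)]
img_layers = [init_layer(a, a, rng) for _ in range(c)]
fuse_layer = init_layer(out_dim, 2 * a, rng)
fs_k = feature_share([0.2, 0.1, 0.4, 0.3], [0.5, 0.0, 0.1, 0.9],
                     txt_layers, img_layers, fuse_layer)
```

The final call concatenates the two shared vectors before the last dense layer, mirroring the vector splicing step described for formula (3).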
(II) feature weight acquisition module based on Soft Attention mechanism
The Soft Attention (SA) mechanism selectively ignores part of the information and aggregates the rest after re-weighting it; because all information is adaptively re-weighted before aggregation, important information can be separated out and shielded from interference by unimportant information, which improves accuracy. The present disclosure uses the SA mechanism to obtain the weights of the feature information, which ensures that the feature vectors are learned during model training and strengthens the relevance of the expression vectors. After the diversified service interest expression vectors are obtained, the model adjusts, during learning, the weights with which the user's diversified service interests influence the user's service demand, according to the sequence of services the user has used. The weights are then multiplied by the expression vectors of the user's diversified service interests and input into the GRU network, which dynamically models how those interests change. The SAMML model takes the feature sharing vector Fs as the input of the soft attention mechanism, computes the weighted average of the different service feature vectors through the operations below, and thereby shows the proportion contributed by each service vector. The SA-based weight acquisition steps are as follows:
Step 1: Initialization. Define an attention variable z to represent the index to be queried, with z ∈ [1, N], where N is the total number of the user's service feature items; z = k indicates that the feature sharing vector Fs_k of the k-th item is selected.
Step 2: After the query vector q and the feature sharing vector Fs_k are determined, the similarity between the query vector q and the key Fs_k is calculated, and the probability α_k of the feature sharing vector of the k-th item is calculated and normalized according to formula (4).
Figure BDA0003189202500000091
Step 3: Weighted averaging. Given the attention distribution α_k, which expresses the degree of correlation between the feature sharing information of the k-th item in Fs and the query vector q, the value of Soft Attention is obtained as shown in formula (6).
Step 4: After the relevance of the different items of feature sharing information is calculated, each item is processed according to the result of formula (6), and the outputs are input in sequence into the GRU network for the next operation.
Here the probability vector formed by α_k is the attention distribution, and S_k is the attention scoring function. The present disclosure adopts a dot product model as the scoring function, calculated as shown in formula (5).
S_k = s(Fs_k, q) = (Fs_k)^T q (5)
Figure BDA0003189202500000101
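The four steps above can be sketched in a few lines of Python. The dot-product score follows formula (5); the softmax normalization used for formula (4) and the weighted sum used for formula (6) are assumptions based on the usual form of soft attention, and the function names are hypothetical.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def softmax(scores):
    """Normalize scores into a probability distribution (assumed formula (4))."""
    m = max(scores)                        # subtract the max for stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def soft_attention(fs, q):
    """Return (alpha, weighted items, weighted average) for items fs and query q."""
    scores = [dot(fs_k, q) for fs_k in fs]  # S_k = (Fs_k)^T q, formula (5)
    alpha = softmax(scores)                 # attention distribution alpha_k
    # Each item scaled by its weight; these are the vectors fed to the GRU.
    weighted = [[a * x for x in fs_k] for a, fs_k in zip(alpha, fs)]
    # Weighted average over all items (assumed form of formula (6)).
    context = [sum(col) for col in zip(*weighted)]
    return alpha, weighted, context


# Two toy feature sharing vectors and a query aligned with the first one.
fs = [[1.0, 0.0], [0.0, 1.0]]
q = [1.0, 0.0]
alpha, weighted, context = soft_attention(fs, q)
```

Because the query points along the first item, that item receives the larger weight, which is the behaviour the scoring function is meant to capture.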
(III) GRU-based service interest extraction module
In recent years, the GRU (Gated Recurrent Unit) neural network, a variant of the LSTM (Long Short-Term Memory) neural network, has been widely and successfully applied to NLP and time series data processing. Compared with the LSTM, the GRU network has a simpler structure and a higher computation speed. For this reason, the present disclosure uses a GRU network to learn the service feature sharing vectors and extract the user's service usage interest. The structure of the GRU network is shown in FIG. 3, where r_t and z_t denote the reset gate and the update gate respectively. The update gate controls the extent to which the service state information of the previous moment is retained in the current state: the larger the value of z_t, the more service information of the previous moment is retained in the current state. The reset gate controls how much service information is written into the current candidate state: the smaller the value of r_t, the less information is written. The data processed by the SA mechanism is used as the input x of the GRU network.
The processing process of the service characteristic shared information in the GRU network is as follows:
step 1: according to the current state xtHidden state h from the last momentt-1Through ztOutput [0,1 ]]The specific operation is shown in formula (7):
Figure BDA0003189202500000103
Step 2: Based on $x_t$ and the hidden state $h_{t-1}$ from the previous moment, the reset gate $r_t$ outputs a value in [0, 1], while the tanh function creates the vector of candidate values $\tilde{h}_t$ for the current moment. The specific operations are shown in formulas (8) and (9):
Figure BDA0003189202500000105
Figure BDA0003189202500000106
Step 3: With $z_t$ as the weight vector, the candidate vector and the output vector of the previous moment are combined by weighted averaging to obtain the output $h_t$ of the GRU network. The specific operation is shown in formula (10):
Figure BDA0003189202500000107
In the above formulas, $\odot$ represents the dot product (element-wise) operation and $\sigma$ is the sigmoid function. $x_t$ is the input at time t (the k-th service sequence after SA processing), and $h_{t-1}$ is the hidden-layer state at the previous moment. $r_t$ is the output of the reset gate, mapped to between 0 and 1 by the sigmoid function; the closer the value is to 1, the more readily the information is retained. $z_t$ is the output of the update gate, likewise mapped to between 0 and 1 by the sigmoid function. $\tilde{h}_t$ denotes the candidate activation state at time t, computed from the new input $x_t$, the previous state $h_{t-1}$, and the weight $W_h$. $h_t$ denotes the activation state at time t, i.e., the t-th hidden state vector in the GRU network; it is obtained from the new $z_t$, the previous state $h_{t-1}$, and $\tilde{h}_t$, and is the new output value of the GRU. $W_u, W_r, W_h$ and $U_u, U_r, U_h$ are the weight matrices, and $b_u, b_r, b_h$ the offset values, of the update gate, the reset gate, and the candidate state, respectively.
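Because formulas (7) to (10) are not reproduced in the text, the following Python sketch uses the standard GRU equations with the parameter names given in the description ($W_u, U_u, b_u$ and so on). Two points are assumptions rather than statements of the disclosed formulas: the reset gate is applied to $h_{t-1}$ before multiplication by $U_h$, and the output uses the convention in which a larger $z_t$ keeps more of $h_{t-1}$, chosen to match the description of the update gate. The parameter values are arbitrary toy numbers.

```python
import math

def sigmoid(v):
    """Element-wise sigmoid of a vector."""
    return [1.0 / (1.0 + math.exp(-x)) for x in v]

def matvec(M, v):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def add(*vs):
    """Element-wise sum of several equal-length vectors."""
    return [sum(xs) for xs in zip(*vs)]

def gru_step(x_t, h_prev, p):
    """One GRU step; p holds hypothetical parameters W_*, U_*, b_*."""
    # Update gate z_t (assumed standard form of formula (7)).
    z = sigmoid(add(matvec(p["W_u"], x_t), matvec(p["U_u"], h_prev), p["b_u"]))
    # Reset gate r_t (assumed standard form of formula (8)).
    r = sigmoid(add(matvec(p["W_r"], x_t), matvec(p["U_r"], h_prev), p["b_r"]))
    # Candidate state (assumed form of formula (9)): r_t scales h_{t-1}.
    rh = [ri * hi for ri, hi in zip(r, h_prev)]
    h_cand = [math.tanh(a) for a in
              add(matvec(p["W_h"], x_t), matvec(p["U_h"], rh), p["b_h"])]
    # Output (assumed form of formula (10)): larger z_t keeps more of h_{t-1}.
    h_t = [zi * hp + (1.0 - zi) * hc for zi, hp, hc in zip(z, h_prev, h_cand)]
    return h_t, z, r, h_cand

# Toy parameters: input dimension 2, hidden dimension 2.
params = {
    "W_u": [[0.5, -0.2], [0.1, 0.3]], "U_u": [[0.2, 0.0], [0.0, 0.2]], "b_u": [0.0, 0.0],
    "W_r": [[0.3, 0.1], [-0.4, 0.2]], "U_r": [[0.1, 0.0], [0.0, 0.1]], "b_r": [0.0, 0.0],
    "W_h": [[0.6, -0.1], [0.2, 0.4]], "U_h": [[0.3, 0.0], [0.0, 0.3]], "b_h": [0.0, 0.0],
}

# Run the cell over a short sequence of attention-processed inputs.
h = [0.0, 0.0]
for x in [[1.0, 0.5], [0.2, -0.1]]:
    h, z, r, h_cand = gru_step(x, h, params)
```

Each component of the new hidden state is a convex combination of the previous state and the candidate state, which is what lets the update gate trade off remembered information against new information.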
The GRU network learns the service used by the user at each moment and the influence of services used at past moments on the service used at the current moment. It stores the learning result in the hidden state vector at each moment and outputs one hidden state vector per moment to represent the learned service interest information, so the hidden state vector $h_t$ at each moment expresses the user's service usage interest at that moment. To improve the GRU's extraction of service usage interest, the present disclosure introduces an auxiliary loss function $L_{tf}$ (shown in formula (11)) into the GRU network to calculate the gap between the hidden state of the GRU at each moment and the service feature fusion vector at the next moment.
Figure BDA0003189202500000113
(IV) SAMML-based service demand forecasting
After the service interest feature expression vector $h_t$ is obtained, the service requirement of the user at the next moment is predicted from the user's service interest feature expression vector $h_t$ and the user information feature vector. When training the service demand prediction module, the input data is defined as
Figure BDA0003189202500000114
where
Figure BDA0003189202500000115
a feature vector representing the information of the user,
Figure BDA0003189202500000116
represents the final service interest expression vector of the user, and $y_i$ is the model value representing the service requirement of the user at the next moment. The prediction function of the service demand prediction module is shown in formula (12):
Figure BDA0003189202500000117
where $\sigma_1$ denotes the ReLU activation function, $W$ the weight matrix, $I_i$ the input data, and $b$ the offset value.
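A minimal sketch of such a prediction function is given below. It assumes that formula (12) has the usual fully connected form $\hat{y}_i = \sigma_1(W I_i + b)$ and that the input $I_i$ is the concatenation of the user information feature vector and the service interest expression vector; both points are assumptions, and the function name `predict` and all numeric values are hypothetical.

```python
def relu(x):
    """ReLU activation for a scalar."""
    return x if x > 0.0 else 0.0

def predict(user_vec, interest_vec, W, b):
    """Fully connected layer with ReLU applied to the concatenated input."""
    # Assumed input construction: user features followed by interest features.
    I = list(user_vec) + list(interest_vec)
    return [relu(sum(w * x for w, x in zip(row, I)) + b_j)
            for row, b_j in zip(W, b)]

# Toy example: 2 user features, 2 interest features, 2 output units.
user_vec = [0.2, 0.4]
interest_vec = [0.1, -0.3]
W = [[1.0, 0.5, 2.0, 1.0],
     [-1.0, -1.0, 0.0, 0.0]]
b = [0.1, 0.0]
y_hat = predict(user_vec, interest_vec, W, b)
```

The second output unit has a negative pre-activation in this toy case, so ReLU clips it to zero.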
In the SAMML model, predicting the service demand of a user at the next moment from the sequence of services the user has used, based on multi-modal machine learning, is a regression problem in machine learning. For regression problems, a commonly used loss function is the mean absolute error (MAE), which is the average distance between the predicted value $\hat{y}$ of the service demand prediction model and the true label value $y$. Assuming that the number of samples of the training data is n, the calculation formula of MAE is shown in formula (13).
$$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|y_i - \hat{y}_i\right| \tag{13}$$
The total loss function L of the SAMML model consists of two parts: the service demand prediction loss $L_{tag}$ and the auxiliary loss function $L_{tf}$. Both $L_{tag}$ and $L_{tf}$ use the MAE loss function; only the inputs to the MAE differ. The total loss function L is calculated as shown in formula (14):
$$L = L_{tag} + \alpha \cdot L_{tf} \tag{14}$$
where $\alpha$ is a hyper-parameter used to balance the expression of user service interest against the prediction of the model. The present disclosure uses the Adam optimization algorithm to train the model. The service demand prediction method based on the SAMML model is shown in Algorithm 1.
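The total loss of formula (14) can be sketched in Python as follows. The sketch assumes that both terms are plain MAE values, and, for brevity, it flattens the GRU hidden states and the next-moment service feature fusion vectors into flat lists of numbers. The function names, the value of `alpha`, and the toy data are hypothetical.

```python
def mae(y_true, y_pred):
    """Mean absolute error between two equal-length sequences."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def total_loss(y_true, y_pred, next_fused, hidden, alpha):
    """L = L_tag + alpha * L_tf, following formula (14)."""
    l_tag = mae(y_true, y_pred)    # service demand prediction loss
    l_tf = mae(next_fused, hidden)  # auxiliary loss: hidden state vs next fused vector
    return l_tag + alpha * l_tf

# Toy values only; they do not come from the experiments.
loss = total_loss(
    y_true=[1.0, 2.0], y_pred=[1.5, 1.5],
    next_fused=[0.5, 1.0], hidden=[0.0, 1.0],
    alpha=0.4,
)
```

A larger `alpha` puts more weight on making each hidden state resemble the next service vector, at the expense of the direct prediction error.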
------------------------------------------------
Algorithm 1: SAMML-based dynamic service demand prediction algorithm
Stage 1: Training of the SAMML model
Input: Data // the data set used to train the model
1. Initialize the parameters of the model;
2. FOR i = 1 TO N DO // N is the number of data batches
3. Input the training data item $(X_i, Y_i)$;
4. According to equation (1), perform the feature sharing operation between the output of layer l-1 (1 ≤ l ≤ c) of the text feature learning network $M_{txt}$
Figure BDA0003189202500000123
and the picture feature vector
Figure BDA0003189202500000124
5. According to equation (2), obtain the output of layer l-1 (1 ≤ l ≤ c) of the text feature learning network $M_{txt}$
Figure BDA0003189202500000125
6. Repeat steps 4 and 5 to fuse the picture features with the text features and obtain the output of layer l-1 (1 ≤ l ≤ c) of the picture feature network
Figure BDA0003189202500000126
7. Obtain the user service expression vector $Fs_k$ according to equation (3);
8. According to equations (4) and (5), apply the soft attention mechanism to $Fs_k$ to obtain the attention distribution over the service vectors;
9. According to equation (6), compute the weighted average of $Fs_k$ to obtain the degree of association between different services;
10. Compute the update gate and the reset gate of the GRU network according to equations (7) and (8);
11. Compute the expression vector $h_t$ of the services used by the user according to equations (9) and (10);
12. Compute the auxiliary loss value, the prediction function value, the loss function value, and the total loss according to equations (11), (12), (13), and (14);
13. Update the SAMML model parameters;
14. END FOR;
15. UNTIL the model training end condition is met;
Stage 2: Service demand prediction
16. Input the data $I_i$ and run the SAMML model;
17. Output: the service requirement of the user;
----------------------------------------------------
Further, to demonstrate the effectiveness of the scheme of the present disclosure, the following experiments were performed:
(1) Experimental environment and experimental data
To verify the effectiveness of the proposed method, it is validated experimentally on the Debiasing dataset provided by Alibaba Tianchi. The dataset files are stored in CSV format with UTF-8 encoding. The dataset contains more than ten million records, mainly comprising user features, commodity features and labels, where commodities are mapped to services. The user features include user ID, age, gender, etc.; the service features comprise a text feature txt_vec and an image feature img_vec; and the commodity ID serves as the label item_id. The data fields of the user CSV file are listed in Table 1:
TABLE 1 Debiasing data information
Figure BDA0003189202500000131
Figure BDA0003189202500000141
The experimental environment is as follows: 64-bit Windows 10 Professional operating system, Intel i7-5500U CPU, 4+4 GB RAM. The SAMML model was implemented in Python with TensorFlow 2.0. The present disclosure uses the mean absolute error (MAE), mean square error (MSE), root mean square error (RMSE) and $R^2$ indices to evaluate the performance of SAMML. The formula for MAE is given in formula (13); the formulas for MSE, RMSE and $R^2$ are given in formulas (15) to (17):
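$$\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2 \tag{15}$$
$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2} \tag{16}$$
$$R^2 = 1 - \frac{\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}{\sum_{i=1}^{n}\left(y_i - \bar{y}\right)^2} \tag{17}$$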
Here, the smaller the values of MAE, MSE and RMSE, the higher the prediction accuracy of the model; a larger value of $R^2$ likewise indicates higher prediction accuracy of the service demand prediction model.
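For reference, the four evaluation indices can be computed with a few lines of Python. This is a sketch using the standard textbook definitions of MAE, MSE, RMSE and $R^2$, which are assumed to match formulas (13) and (15) to (17); the function name `regression_metrics` and the sample values are hypothetical.

```python
import math

def regression_metrics(y_true, y_pred):
    """Return MAE, MSE, RMSE and R^2 for two equal-length sequences.

    Assumes y_true is not constant; otherwise R^2 is undefined.
    """
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n
    mse = sum(e * e for e in errors) / n
    rmse = math.sqrt(mse)
    mean_y = sum(y_true) / n
    ss_res = sum(e * e for e in errors)              # residual sum of squares
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)  # total sum of squares
    r2 = 1.0 - ss_res / ss_tot
    return {"MAE": mae, "MSE": mse, "RMSE": rmse, "R2": r2}

# Toy values only; they do not come from the experiments.
metrics = regression_metrics([3.0, 5.0, 7.0, 9.0], [2.5, 5.5, 7.0, 8.0])
```

Lower MAE, MSE and RMSE and a higher $R^2$ all point in the same direction, so a model that improves on one index usually improves on the others, although the squared-error indices penalize large individual errors more heavily.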
(2) Model parameter setting
In the SAMML model, the purpose of feature sharing is to merge the data feature vectors of the two modalities and improve the relevance and expressive power of users and services. The number of network layers M of the feature sharing module has a certain influence on model accuracy, so experiments were run with different numbers of network layers to give the SAMML model better prediction capability. In this experiment, the initial learning rates were set to 0.001 and 0.0001, respectively, and different numbers of network layers M were then set to observe the evaluation indexes (MAE, MSE, RMSE, $R^2$) of the SAMML model and determine the optimal number of network layers in the feature sharing module. The results of the experiment are shown in Tables 2 and 3. They show that increasing the number of layers helps improve the prediction accuracy of the SAMML model up to a point: as the number of network layers increases, the accuracy follows a normal-distribution-like trend, first rising and then falling.
Table 2 Results of the SAMML model for different numbers of network layers M at a learning rate of 0.0001
Figure BDA0003189202500000151
Table 3 Results of the SAMML model for different numbers of network layers M at a learning rate of 0.00001
Figure BDA0003189202500000152
However, as the number of network layers in the feature sharing module increases, more parameters must be learned, training takes longer, and the risk of overfitting rises. According to the experimental results, every index is relatively optimal and stable when the number of network layers is 3, so the number of network layers of the feature sharing module is set to 3 and the learning rate to 0.0001. The number of neuron nodes in each network layer of the feature sharing module also has a certain influence on model prediction. To give the service demand prediction model higher prediction accuracy, the number of neuron nodes per layer was set to 16, 32, 64, 128 and 256 in turn, and the optimal value was determined experimentally; the results are shown in Table 4.
TABLE 4 Influence of the number of neuron nodes in the feature sharing module on the SAMML model
Figure BDA0003189202500000161
Table 4 shows that the evaluation indexes MAE, MSE, RMSE and $R^2$ are relatively optimal when the number of neuron nodes is 16 or 64, and that the prediction accuracy of the model varies only slightly as the number of neuron nodes increases. Too few neuron nodes per layer easily leads to under-fitting of the data, whereas too many increases the risk of overfitting. Based on a comprehensive comparison of the experimental results, the number of nodes in each network layer of the feature sharing module of the SAMML model is set to 64.
In the SAMML model, the parameters of the model are optimized with the Adam algorithm. The learning rate of the Adam algorithm has a large influence on the stability and learning capability of the SAMML model. To give the model strong prediction capability, the learning rate was set to 1e-2, 1e-3, 1e-4 and 1e-5 in turn, the model was trained, and the experimental results were recorded; they are shown in FIGS. 4(a) to 4(d).
FIGS. 4(a) to 4(d) show that with a learning rate of 1e-4 the model has already fitted within 100 epochs, after which the loss on the test set no longer decreases as the number of training epochs increases. With a learning rate of 1e-5, the loss on the test set keeps decreasing as the number of training epochs increases, which indicates under-fitting; the loss only flattens out and stops decreasing at around 300 epochs. Taken together, the graphs show that a learning rate of 1e-4 achieves a good fit with fewer epochs, and the experimental results also show higher accuracy. Based on this analysis, the present disclosure sets the learning rate to 1e-4, i.e., 0.0001.
(3) Model performance comparison
To verify the performance of the proposed prediction model, the present disclosure compares it with four typical prediction models based on multi-modal machine learning: RBMI (Recommendation Based on Multimode Information), Multimode IRIS (Interest-Related Item library Model Based on Multimode), SDML (Scalable Deep Multimode Learning), and IMMML (advanced Multimode Machine learning). The experiment used 80% of the data set as training data and the remaining 20% as test data. The performance of each model was evaluated according to the evaluation indexes, and the experimental results are shown in Table 5.
TABLE 5 Performance evaluation of different models on datasets
Figure BDA0003189202500000171
Table 5 shows that the SAMML model outperforms the other comparison models on the evaluation indexes MAE, MSE, RMSE and $R^2$. On $R^2$, the SAMML model exceeds the best result among the comparison models by 3.1%; on MAE, MSE and RMSE, it leads the second-best results by 2.18%, 2.63% and 2.73%, respectively. These comparisons show that, by introducing a soft attention mechanism, the SAMML model provided by the disclosure reduces the difference in feature vector expression among modalities and improves the prediction accuracy of user service requirements.
To better predict the service demand of a user, the disclosure provides a dynamic service demand prediction method based on soft attention and multi-modal machine learning. The method first fuses the multidimensional service features of the user's services through a feature sharing module, which strengthens the relevance of the user's services. It then introduces a soft attention mechanism so that the model can change the weights dynamically, and thereby their influence on the user's service requirement. Finally, it predicts the service requirement of the user through a fully connected network based on the user information and the multi-modal feature expression vector of the services. A number of experimental tests were performed based on a number of real data sets, verifying the superiority of the method proposed by the present disclosure compared to other typical multi-modal models.
Embodiment two:
The present embodiment provides a dynamic service demand prediction system based on an attention mechanism and multi-modality.
An attention mechanism and multi-modal based dynamic prediction system for service demand, comprising:
a data acquisition unit for acquiring text data and image data generated during service use;
a demand prediction unit for performing feature extraction on the text data and the image data, respectively; inputting the extracted features into a pre-trained prediction model based on soft attention and multi-modal machine learning to realize prediction of service requirements of the user at the next moment;
the prediction model based on soft attention and multi-modal machine learning specifically comprises: realizing the fusion of multi-modal data features based on a feature sharing mechanism; processing the fused features by using a soft attention mechanism, and inputting an obtained result into a pre-trained GRU network to obtain service interest feature vector representation of a user; and based on the user information characteristics and the service interest characteristic vector representation thereof, the service demand of the user at the next moment is predicted through the full connection layer.
In further embodiments, there is also provided:
an electronic device comprising a memory and a processor, and computer instructions stored on the memory and executed on the processor, the computer instructions when executed by the processor performing the method of embodiment one. For brevity, no further description is provided herein.
It should be understood that in this embodiment the processor may be a central processing unit (CPU), or another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, and so on. A general-purpose processor may be a microprocessor or any conventional processor.
The memory may include both read-only memory and random access memory, and may provide instructions and data to the processor, and a portion of the memory may also include non-volatile random access memory. For example, the memory may also store device type information.
A computer readable storage medium storing computer instructions which, when executed by a processor, perform the method of embodiment one.
The method in embodiment one may be implemented directly by a hardware processor, or by a combination of hardware and software modules in the processor. The software modules may be located in RAM, flash memory, ROM, PROM, EPROM, registers, or other storage media well known in the art. The storage medium is located in a memory; the processor reads the information in the memory and completes the steps of the method in combination with its hardware. To avoid repetition, this is not described in detail here.
The dynamic service demand prediction method and system based on the attention mechanism and the multi-mode can be realized, and have wide application prospects.
The above description is only a preferred embodiment of the present disclosure and is not intended to limit the present disclosure, and various modifications and changes may be made to the present disclosure by those skilled in the art. Any modification, equivalent replacement, improvement and the like made within the spirit and principle of the present disclosure should be included in the protection scope of the present disclosure.

Claims (10)

1. A dynamic prediction method for service demand based on attention mechanism and multi-mode is characterized by comprising the following steps:
acquiring text data and image data generated in the service use process;
respectively extracting the characteristics of the text data and the image data; inputting the extracted features into a pre-trained prediction model based on soft attention and multi-modal machine learning to realize prediction of service requirements of the user at the next moment;
the prediction model based on soft attention and multi-modal machine learning specifically comprises: realizing the fusion of multi-modal data features based on a feature sharing mechanism; processing the fused features by using a soft attention mechanism, and inputting an obtained result into a pre-trained GRU network to obtain service interest feature vector representation of a user; and based on the user information characteristics and the service interest characteristic vector representation thereof, the service demand of the user at the next moment is predicted through the full connection layer.
2. The method according to claim 1, wherein the feature-sharing mechanism is used to implement fusion of multi-modal data features, specifically: respectively inputting the extracted text features and image features into a text feature network and an image feature network, and logically adding the text features and the output of each full-connection layer of the image feature network; and logically adding the image characteristic and the output of each full-connection layer of the text characteristic network, and finally passing the output of the text characteristic network and the image characteristic network through one full-connection layer to obtain a fusion result.
3. The method as claimed in claim 1, wherein the text feature network and the image feature network are composed of a plurality of fully connected layers.
4. The method according to claim 1, wherein the fused features are processed by using a soft attention mechanism, specifically: and calculating the weight of the fused feature information based on a soft attention mechanism, and obtaining diversified service interest expression vectors.
5. The method according to claim 1, wherein the obtained result is input to a pre-trained GRU network to obtain a service interest feature vector representation of the user, specifically: the GRU network learns the service used by the user at each moment and the influence of the service used at the past moment on the service used at the current moment, the learning result is stored in the hidden state vector at each moment, and a hidden state vector is output at each moment to represent the learned service interest information, so that the service use interest of the user at each moment is obtained.
6. The method as claimed in claim 1, wherein an auxiliary loss function is introduced into the GRU network, and the difference between the hidden state of the GRU at each time and the service feature fusion vector at the next time is calculated by the auxiliary loss function.
7. An attention-based and multi-modal dynamic prediction system for service demand, comprising:
a data acquisition unit for acquiring text data and image data generated during service use;
a demand prediction unit for performing feature extraction on the text data and the image data, respectively; inputting the extracted features into a pre-trained prediction model based on soft attention and multi-modal machine learning to realize prediction of service requirements of the user at the next moment;
the prediction model based on soft attention and multi-modal machine learning specifically comprises: realizing the fusion of multi-modal data features based on a feature sharing mechanism; processing the fused features by using a soft attention mechanism, and inputting an obtained result into a pre-trained GRU network to obtain service interest feature vector representation of a user; and based on the user information characteristics and the service interest characteristic vector representation thereof, the service demand of the user at the next moment is predicted through the full connection layer.
8. The system according to claim 7, wherein the feature-sharing mechanism is used to implement fusion of multi-modal data features, specifically: respectively inputting the extracted text features and image features into a text feature network and an image feature network, and logically adding the text features and the output of each full-connection layer of the image feature network; and logically adding the image characteristic and the output of each full-connection layer of the text characteristic network, and finally passing the output of the text characteristic network and the image characteristic network through one full-connection layer to obtain a fusion result.
9. An electronic device comprising a memory, a processor and a computer program stored and executed on the memory, wherein the processor implements a method for dynamic prediction of service demand based on attention mechanism and multi-modality as claimed in any one of claims 1 to 6 when executing the program.
10. A non-transitory computer readable storage medium having stored thereon a computer program, wherein the program when executed by a processor implements a method for dynamic prediction of service demand based on attentional mechanisms and multi-modalities according to any of claims 1-6.
CN202110872257.6A 2021-07-30 2021-07-30 Attention mechanism and multi-mode based service demand dynamic prediction method and system Active CN113537623B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110872257.6A CN113537623B (en) 2021-07-30 2021-07-30 Attention mechanism and multi-mode based service demand dynamic prediction method and system

Publications (2)

Publication Number Publication Date
CN113537623A true CN113537623A (en) 2021-10-22
CN113537623B CN113537623B (en) 2023-08-18

Family

ID=78121626

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110872257.6A Active CN113537623B (en) 2021-07-30 2021-07-30 Attention mechanism and multi-mode based service demand dynamic prediction method and system

Country Status (1)

Country Link
CN (1) CN113537623B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170270546A1 (en) * 2016-03-21 2017-09-21 Tata Motors Limited Service churn model
CN112529638A (en) * 2020-12-22 2021-03-19 烟台大学 Service demand dynamic prediction method and system based on user classification and deep learning
CN112529637A (en) * 2020-12-22 2021-03-19 烟台大学 Service demand dynamic prediction method and system based on context awareness
CN113128671A (en) * 2021-04-19 2021-07-16 烟台大学 Service demand dynamic prediction method and system based on multi-mode machine learning

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113987261A (en) * 2021-11-08 2022-01-28 烟台大学 Video recommendation method and system based on dynamic trust perception
CN114330866A (en) * 2021-12-24 2022-04-12 江苏微皓智能科技有限公司 Data processing method and device, electronic equipment and computer readable storage medium
CN114330866B (en) * 2021-12-24 2023-11-24 江苏微皓智能科技有限公司 Data processing method, device, electronic equipment and computer readable storage medium

Also Published As

Publication number Publication date
CN113537623B (en) 2023-08-18

Similar Documents

Publication Publication Date Title
CN112667714B (en) User portrait optimization method and device based on deep learning and storage medium
Chen et al. Deep reinforcement learning in recommender systems: A survey and new perspectives
CN111104595B (en) Deep reinforcement learning interactive recommendation method and system based on text information
CN112221159B (en) Virtual item recommendation method and device and computer readable storage medium
CN113537623B (en) Attention mechanism and multi-mode based service demand dynamic prediction method and system
CN113128671B (en) Service demand dynamic prediction method and system based on multi-mode machine learning
CN112417289A (en) Information intelligent recommendation method based on deep clustering
CN116594748A (en) Model customization processing method, device, equipment and medium for task
Tong et al. A deep discriminative and robust nonnegative matrix factorization network method with soft label constraint
CN115718826A (en) Method, system, device and medium for classifying target nodes in graph structure data
CN114817508A (en) Sparse graph and multi-hop attention fused session recommendation system
CN115879508A (en) Data processing method and related device
Zhang et al. NAS4FBP: Facial beauty prediction based on neural architecture search
CN117408735A (en) Client management method and system based on Internet of things
WO2023174064A1 (en) Automatic search method, automatic-search performance prediction model training method and apparatus
CN116910357A (en) Data processing method and related device
CN115422369B (en) Knowledge graph completion method and device based on improved TextRank
CN116208399A (en) Network malicious behavior detection method and device based on metagraph
Zhang et al. Hybrid structural graph attention network for POI recommendation
Zhou et al. Improving indoor visual navigation generalization with scene priors and Markov relational reasoning
Christoforidis et al. Recommending points of interest in LBSNs using deep learning techniques
CN117194966A (en) Training method and related device for object classification model
Liu POI recommendation model using multi-head attention in location-based social network big data
CN113822291A (en) Image processing method, device, equipment and storage medium
Zhang et al. Online bionic visual siamese tracking based on mixed time-event triggering mechanism

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant