CN113537623A - Attention mechanism and multi-mode based dynamic service demand prediction method and system - Google Patents
Attention mechanism and multi-mode based dynamic service demand prediction method and system
- Publication number
- Publication number: CN113537623A (application number CN202110872257.6A)
- Authority
- CN
- China
- Prior art keywords
- service
- network
- feature
- user
- features
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/04—Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N20/00—Machine learning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
Abstract
The present disclosure provides an attention-mechanism and multi-modal based dynamic service demand prediction method and system, comprising: acquiring text data and image data generated during service use; extracting features from the text data and the image data respectively; and inputting the extracted features into a pre-trained prediction model based on soft attention and multi-modal machine learning, so as to predict the user's service demand at the next moment. In the prediction model, fusion of the multi-modal data features is realized through a feature sharing mechanism; the fused features are processed with a soft attention mechanism and the result is input into a pre-trained GRU network to obtain the user's service interest feature vector representation; and, based on the user information features and the service interest feature vector representation, the user's service demand at the next moment is predicted through a fully connected layer.
Description
Technical Field
The disclosure belongs to the technical field of service preference prediction, and particularly relates to a dynamic prediction method and system for service demand based on an attention mechanism and multiple modes.
Background
The statements in this section merely provide background information related to the present disclosure and may not necessarily constitute prior art.
In recent years, with the rapid development and maturation of new computing paradigms such as service computing, cloud computing and mobile edge computing, a large number of available services from different fields have appeared on the network. At the same time, with the wide adoption of mobile networks and intelligent terminals, large numbers of users can access services anytime and anywhere, which greatly facilitates their life and work. However, how a user finds the services he or she needs among a huge number of candidate services remains challenging, and this affects both service utilization and user satisfaction. To address this problem, researchers have carried out a great deal of work on service recommendation and obtained rich results, which solves the service discovery problem to a certain extent. However, the inventors have found that most existing service recommendation methods recommend services that users are likely to use and be interested in by mining information among similar users or similar services, without considering the users' actual service demands, so the recommendation accuracy still needs to be improved.
Service demand prediction is an important basis for improving the accuracy of service recommendation. At present, researchers in China and abroad have carried out preliminary studies on service demand prediction and achieved certain results. Existing service demand prediction methods mainly rely on Collaborative Filtering (CF), Machine Learning (ML) and Deep Learning (DL) techniques. Specifically, Guo et al. propose a residual spatio-temporal network for short-term trip demand prediction, which captures the spatial, temporal and dependency relationships among trip demands and performs well in trip demand prediction. Liu et al. propose a deep integration network model based on an attention mechanism, which models the inter-channel, spatial and positional relationships of feature maps to predict users' service demands. Zheng et al. propose a demand-aware path planning algorithm that considers both spatio-temporal prediction and supply-demand states for the problems of spatio-temporal demand prediction and competitive supply, and construct a spatio-temporal graph convolution sequential prediction model that can predict user service requests by location and time. To help service providers pre-assign service starting points and reduce customer waiting time, Chu et al. propose a multi-scale convolutional long short-term memory network model that predicts future user demand while taking both temporal and spatial correlations into account. Lu et al., addressing the lack of consideration of user privacy in existing mobility-related service demand prediction, propose a user collaborative filtering method that incorporates privacy concern strength and takes user-privacy factors into account when predicting service demand. Gardino et al. use a multi-view approach that learns the relationships between views to address the demand forecasting problem between inter-industry retailers and wholesalers. Rob et al. use a deep learning approach to mitigate the complexity of artificial network models, taking travel search intensity as the only input index and providing travel practitioners with demand predictions between tourists and destinations. Xu et al. use historical water demand time series to provide effective water demand prediction for municipal water supply systems. Although existing service demand prediction studies improve the accuracy of service recommendation to some extent, most of this work is based on single-modal data and does not consider service demand prediction under multi-modal data.
Disclosure of Invention
In order to solve the above problems, the present disclosure provides a dynamic prediction method and system for service demand based on attention mechanism and multi-mode, which considers text data and image data generated during service usage and uses a prediction model based on soft attention and multi-mode machine learning to realize accurate prediction of user service demand.
According to a first aspect of the embodiments of the present disclosure, there is provided a method for dynamic prediction of service demand based on attention mechanism and multi-modal, including:
acquiring text data and image data generated in the service use process;
respectively extracting the characteristics of the text data and the image data; inputting the extracted features into a pre-trained prediction model based on soft attention and multi-modal machine learning to realize prediction of service requirements of the user at the next moment;
the prediction model based on soft attention and multi-modal machine learning specifically comprises: realizing the fusion of multi-modal data features based on a feature sharing mechanism; processing the fused features by using a soft attention mechanism, and inputting an obtained result into a pre-trained GRU network to obtain service interest feature vector representation of a user; and based on the user information characteristics and the service interest characteristic vector representation thereof, the service demand of the user at the next moment is predicted through the full connection layer.
Further, the fusion of multi-modal data features is realized based on a feature sharing mechanism, specifically: the extracted text features and image features are input into a text feature network and an image feature network respectively; the text features are logically added to the output of each fully connected layer of the image feature network, and the image features are logically added to the output of each fully connected layer of the text feature network; finally, the outputs of the text feature network and the image feature network are passed through one fully connected layer to obtain the fusion result.
Further, processing the fused features with a soft attention mechanism specifically comprises: calculating the weights of the fused feature information based on a soft attention mechanism and obtaining diversified service interest expression vectors.
Further, inputting the obtained result into a pre-trained GRU network to obtain the user's service interest feature vector representation specifically comprises: the GRU network learns the service used by the user at each moment and the influence of services used at past moments on the service used at the current moment, stores the learning result in the hidden state vector of each moment, and outputs one hidden state vector per moment to represent the learned service interest information, thereby obtaining the user's service usage interest at each moment.
Furthermore, an auxiliary loss function is introduced into the GRU network, and the difference between the hidden state of the GRU at each moment and the service feature fusion vector at the next moment is calculated through the auxiliary loss function.
According to a second aspect of the embodiments of the present disclosure, there is provided an attention mechanism and multimodal service demand dynamic prediction system, including:
a data acquisition unit for acquiring text data and image data generated during service use;
a demand prediction unit for performing feature extraction on the text data and the image data, respectively; inputting the extracted features into a pre-trained prediction model based on soft attention and multi-modal machine learning to realize prediction of service requirements of the user at the next moment;
the prediction model based on soft attention and multi-modal machine learning specifically comprises: realizing the fusion of multi-modal data features based on a feature sharing mechanism; processing the fused features by using a soft attention mechanism, and inputting an obtained result into a pre-trained GRU network to obtain service interest feature vector representation of a user; and based on the user information characteristics and the service interest characteristic vector representation thereof, the service demand of the user at the next moment is predicted through the full connection layer.
According to a third aspect of the embodiments of the present disclosure, there is provided an electronic device, including a memory, a processor, and a computer program stored in the memory and runnable on the processor, wherein the processor, when executing the program, implements the attention-mechanism and multi-modal based dynamic service demand prediction method.
According to a fourth aspect of the embodiments of the present disclosure, there is provided a non-transitory computer-readable storage medium having stored thereon a computer program, which when executed by a processor implements the method for dynamic prediction of service demand based on an attentional mechanism and multiple modalities.
Compared with the prior art, the beneficial effects of this disclosure are:
(1) The disclosed scheme considers the text data and image data generated during service use and provides a dynamic service demand prediction method based on a Soft Attention and Multi-modal Machine Learning (SAMML) model, realizing accurate prediction of users' service demands.
(2) In the Soft Attention and Multi-modal Machine Learning (SAMML) model provided in this scheme, feature vectors are first extracted from the text data and image data respectively and shared, realizing the fusion of multi-modal data features and improving the ability to express the association between users and services; the fused feature data are then processed with a Soft Attention mechanism and the result is input into the GRU network, so that the GRU network can better learn the user's service usage interest; finally, the SAMML model is trained on the user features and service feature data, and the trained SAMML model is used to accurately predict the user's service demands.
Advantages of additional aspects of the disclosure will be set forth in part in the description which follows, and in part will be obvious from the description, or may be learned by practice of the disclosure.
Drawings
The accompanying drawings, which are included to provide a further understanding of the disclosure, illustrate embodiments of the disclosure and together with the description serve to explain the disclosure and are not to limit the disclosure.
FIG. 1 is a schematic diagram of a SAMML model according to a first embodiment of the present disclosure;
FIG. 2 is a schematic diagram of a multi-modal feature fusion structure according to a first embodiment of the disclosure;
fig. 3 is a schematic structural diagram of a GRU neuron according to a first embodiment of the present disclosure;
FIG. 4(a) shows the model loss values when the learning rate of the SAMML model is 1e-2 in the first embodiment of the disclosure;
FIG. 4(b) shows the model loss values when the learning rate of the SAMML model is 1e-3 in the first embodiment of the disclosure;
FIG. 4(c) shows the model loss values when the learning rate of the SAMML model is 1e-4 in the first embodiment of the disclosure;
FIG. 4(d) shows the model loss values when the learning rate of the SAMML model is 1e-5 in the first embodiment of the disclosure.
Detailed Description
The present disclosure is further described with reference to the following drawings and examples.
It should be noted that the following detailed description is exemplary and is intended to provide further explanation of the disclosure. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this disclosure belongs.
It is noted that the terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of example embodiments according to the present disclosure. As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, and it should be understood that when the terms "comprises" and/or "comprising" are used in this specification, they specify the presence of stated features, steps, operations, devices, components, and/or combinations thereof, unless the context clearly indicates otherwise.
The embodiments and features of the embodiments in the present disclosure may be combined with each other without conflict.
The first embodiment is as follows:
the embodiment aims to provide a dynamic prediction method for service demand based on attention mechanism and multi-mode.
A dynamic prediction method for service demand based on attention mechanism and multi-mode comprises the following steps:
acquiring text data and image data generated in the service use process;
respectively extracting the characteristics of the text data and the image data; inputting the extracted features into a pre-trained prediction model based on soft attention and multi-modal machine learning to realize prediction of service requirements of the user at the next moment;
the prediction model based on soft attention and multi-modal machine learning specifically comprises: realizing the fusion of multi-modal data features based on a feature sharing mechanism; processing the fused features by using a soft attention mechanism, and inputting an obtained result into a pre-trained GRU network to obtain service interest feature vector representation of a user; and based on the user information characteristics and the service interest characteristic vector representation thereof, the service demand of the user at the next moment is predicted through the full connection layer.
The user information characteristics refer to information such as gender, age, occupation and economic income of the user.
Further, the fusion of the multi-modal data features is realized based on a feature sharing mechanism, which specifically comprises the following steps: respectively inputting the extracted text features and image features into a text feature network and an image feature network, and logically adding the text features and the output of each full-connection layer of the image feature network; and logically adding the image characteristic and the output of each full-connection layer of the text characteristic network, and finally passing the output of the text characteristic network and the image characteristic network through one full-connection layer to obtain a fusion result.
Furthermore, the text feature network and the image feature network are both composed of a plurality of fully connected layers.
Further, the processing the fused features by using a soft attention mechanism specifically includes: and calculating the weight of the fused feature information based on a soft attention mechanism, and obtaining diversified service interest expression vectors.
Further, the step of inputting the obtained result into a pre-trained GRU network to obtain a service interest feature vector representation of the user specifically includes: the GRU network learns the service used by the user at each moment and the influence of the service used at the past moment on the service used at the current moment, the learning result is stored in the hidden state vector at each moment, and a hidden state vector is output at each moment to represent the learned service interest information, so that the service use interest of the user at each moment is obtained.
Furthermore, an auxiliary loss function is introduced into the GRU network, and the difference between the hidden state of the GRU at each moment and the service feature fusion vector at the next moment is calculated through the auxiliary loss function.
Specifically, for ease of understanding, the embodiments of the present disclosure are described in detail below with reference to the accompanying drawings:
the present disclosure provides a dynamic prediction method of service demand based on a Soft Attention and Multimodal Machine Learning (SAMML) model, considering text data and image data generated during service usage. Firstly, extracting feature vectors from text data and image data respectively and performing feature sharing, realizing fusion of multi-modal data features, and improving the expression capacity of user and service association; then, processing the fused characteristic data by using a Soft Attention (Soft Attention) mechanism, and inputting the obtained result into the GRU network, so that the GRU network can better learn the service use interest of the user; and finally, training the SAMML model based on the user characteristics and the service characteristic data, and realizing accurate prediction of the service requirements of the user by using the trained SAMML model.
As shown in fig. 1, the network structure of the SAMML model consists of a multi-modal data feature sharing module, a service interest extraction module and a service demand prediction module.
In the SAMML model, feature vectors are first extracted from the text data and image data of the user service information using a Doc2Vec model and a ResNet model, and the extracted feature vectors are fused through the feature sharing module; then, for the user's different items of service usage data, the weights are learned with a soft attention mechanism; the user's service usage expression vector is learned from the feature data processed by the GRU network; finally, the SAMML model is trained on the user features and the service feature vectors and used to predict the user's service demand. Specifically, the method comprises the following steps:
(I) Multi-modal data feature sharing module
For the service demand prediction problem, let the training data set be T = [(X_1, Y_1), (X_2, Y_2), …, (X_m, Y_m), …, (X_n, Y_n)], where n is the size of the training data set, X_m denotes the m-th item of training data, and Y_m denotes the service demand of the user corresponding to X_m. X_m contains the user's features (gender, age, occupation, etc.) and the service feature information. Each item of service feature information consists of text data and image data related to the service application; for service item k, the feature data comprise a text data feature and an image data feature.
In order to achieve effective fusion of the data features of different modalities, a feature sharing mechanism is adopted to establish the association between the multi-modal data features, thereby improving the accuracy of service demand prediction. Specifically, in the SAMML model, feature fusion is performed by a text feature network M_txt and an image feature network M_img, both of which consist of fully connected layers. Taking the user's service item k as an example, the text feature sequence and the image feature sequence of service item k are input into M_txt and M_img respectively; the text feature is logically added to the output P_img of each dense layer of the image feature network M_img, and the image feature is logically added to the output P_txt of each dense layer of the text feature network M_txt. Fig. 2 is a schematic diagram of the network structure of the sharing module.
Let the number of input-layer nodes of the feature sharing module be a and the number of layers be c. When a feature vector passes through the l-th layer, the text feature network outputs P_txt^l and the image feature network outputs P_img^l, where l ∈ [1, c]. Taking M_txt as an example, after the l-th layer operation the feature fusion is given by formulas (1) and (2):

F_txt^(l-1) = P_txt^(l-1) + P_img^(l-1)   (1)

P_txt^l = σ_txt^l(W_txt^l · F_txt^(l-1) + b_txt^l)   (2)

In the above formulas, · denotes the dot product, F_txt^(l-1) is the text feature vector of M_txt after (l−1) layers of feature sharing, σ_txt^l denotes the ReLU activation function of the l-th layer of M_txt, W_txt^l denotes the weight matrix of the l-th layer of M_txt, and b_txt^l denotes the bias value of the l-th layer of M_txt. Finally, the shared vectors P_txt^c and P_img^c are passed through one fully connected layer to output the Feature share (Fs) expression vector of the user service; Fs_k is calculated as shown in formula (3):

Fs_k = σ_1(W_1 · (P_txt^c ⊕ P_img^c) + b_1)   (3)

where σ_1 denotes the ReLU activation function of the fully connected network, W_1 denotes the weight matrix, b_1 denotes the bias value, and ⊕ denotes the vector concatenation operator.
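A minimal Keras sketch of the feature-sharing structure of Fig. 2 is given below. It assumes the "logical addition" is an element-wise addition between the two branches and that each branch's dense layer is applied to its own previous output before the cross-branch addition; three hidden layers of 64 nodes follow the values used in the later experiments, and the class name and initial projection layers are illustrative, not part of the disclosure.

```python
import tensorflow as tf
from tensorflow.keras import layers

class FeatureSharingFusion(tf.keras.layers.Layer):
    """Feature-sharing fusion of a text vector and an image vector (Fig. 2)."""
    def __init__(self, hidden_units=64, num_layers=3, fused_dim=64):
        super().__init__()
        # project both modalities to a common width so they can be added
        self.txt_in = layers.Dense(hidden_units, activation="relu")
        self.img_in = layers.Dense(hidden_units, activation="relu")
        self.txt_layers = [layers.Dense(hidden_units, activation="relu")
                           for _ in range(num_layers)]
        self.img_layers = [layers.Dense(hidden_units, activation="relu")
                           for _ in range(num_layers)]
        self.out_layer = layers.Dense(fused_dim, activation="relu")

    def call(self, txt_feat, img_feat):
        p_txt = self.txt_in(txt_feat)
        p_img = self.img_in(img_feat)
        for d_txt, d_img in zip(self.txt_layers, self.img_layers):
            # cross-modal sharing: each branch's dense output absorbs the
            # other branch's previous representation ("logical addition")
            new_txt = d_txt(p_txt) + p_img
            new_img = d_img(p_img) + p_txt
            p_txt, p_img = new_txt, new_img
        # Fs_k: concatenate both branches and pass through one dense layer
        return self.out_layer(tf.concat([p_txt, p_img], axis=-1))
```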
(II) feature weight acquisition module based on Soft Attention mechanism
The Soft Attention (SA) mechanism selectively ignores part of the information and performs a re-weighted aggregation over the rest; all information is adaptively re-weighted before aggregation, so important information can be separated out and protected from interference by unimportant information, which improves accuracy. This disclosure uses the SA mechanism to obtain the weights of the feature information, ensuring that the feature vectors are learned while the model is trained and strengthening the relevance of the expression vectors. After the diversified service interest expression vectors are obtained, the model adjusts, through the service sequences used by the user during learning, the weights with which the user's diversified service interests influence the user's service demand; the weights are then multiplied by the expression vectors of the user's diversified service interests and the results are input into the GRU network, dynamically modelling how the user's diversified service interests change. The SAMML model takes the feature sharing vector Fs as the input of the soft attention mechanism, finally computes the weighted average of the different service feature vectors by the following operations, and intuitively analyses the proportion of the different service vectors. The SA-based weight acquisition steps are as follows:
step 1: and (5) initializing. Defining an attention variable z to represent an index value needing to be queried; z is equal to [1, N ]]N represents the total amount of the user service characteristic items; when z is k, it indicates that the feature sharing vector Fs of the k-th item is selectedk。
Step 2: after determining query vector q and feature sharing vector FskThen, the query vector q and the query key Fs are comparedkThe similarity is calculated and compared, and the probability alpha of the feature sharing vector of the kth item is calculated according to a formula (4)kAnd carrying out normalization adjustment.
Step 3: a weighted average is performed. In the attention distribution alphakWhen the vector q is queried, the correlation degree between the feature sharing information of the kth item in the Fs and the query vector q obtains the value of Soft Attention, as shown in formula (6).
Step 4: after calculating the relevance of the different feature sharing information, respectively processing the different feature sharing information according to the result of the formula (6), and sequentially inputting the output as the result into the GRU network to perform the next operation.
Wherein alpha iskThe probability vector of (A) represents the attention distribution, SkIs a scoring function of attention, the present disclosure adopts a dot product model as the scoring function, and the calculation formula thereof is shown as formula (5).
Sk=s(Fsk,q)=(Fsk)Tq (5)
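The SA step above can be sketched as follows. The query vector q is assumed here to be a trainable parameter of the model (the disclosure does not state how q is obtained), and the re-weighted items are returned as a sequence so they can be fed into the GRU.

```python
import tensorflow as tf

class SoftAttention(tf.keras.layers.Layer):
    """Dot-product soft attention over the feature-sharing vectors Fs_1..Fs_N."""
    def __init__(self, feature_dim):
        super().__init__()
        # the query vector q is assumed to be a trainable parameter
        self.q = self.add_weight(name="query", shape=(feature_dim,),
                                 initializer="glorot_uniform", trainable=True)

    def call(self, fs):
        # fs: (batch, N, feature_dim)
        scores = tf.einsum("bnd,d->bn", fs, self.q)   # S_k = (Fs_k)^T q, eq. (5)
        alpha = tf.nn.softmax(scores, axis=-1)        # attention distribution, eq. (4)
        weighted = fs * alpha[..., tf.newaxis]        # item-wise re-weighting fed to the GRU
        return weighted, alpha
```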
(III) GRU-based service interest extraction module
In recent years, the GRU (Gated Recurrent Unit) neural network, a variant of the LSTM (Long Short-Term Memory) neural network, has been widely and successfully applied to NLP and time-series data processing. Compared with the LSTM, the GRU network has a simpler structure and a higher computation speed. For this reason, this disclosure uses a GRU network to learn the service feature sharing vectors and extract the user's service usage interest. The structure of a GRU neuron is shown in fig. 3, where r_t and z_t denote the reset gate and the update gate, respectively. The update gate controls the extent to which the service state information of the previous moment is retained in the current state: the larger the value of z_t, the more service information of the previous moment is kept in the current state. The reset gate controls how much service information is written into the current candidate state h̃_t: the smaller the value of r_t, the less information is written. The data processed by the SA mechanism are used as the input x of the GRU network.
The service feature sharing information is processed in the GRU network as follows:
Step 1: According to the current input x_t and the hidden state h_{t-1} of the previous moment, the update gate z_t outputs a value in [0, 1], as shown in formula (7):

z_t = σ(W_u · x_t + U_u · h_{t-1} + b_u)   (7)

Step 2: According to x_t and the hidden state h_{t-1} of the previous moment, the reset gate r_t outputs a value in [0, 1], while the tanh function creates the candidate vector h̃_t of the current moment, as shown in formulas (8) and (9):

r_t = σ(W_r · x_t + U_r · h_{t-1} + b_r)   (8)

h̃_t = tanh(W_h · x_t + U_h · (r_t ⊙ h_{t-1}) + b_h)   (9)

Step 3: Using z_t as the weight vector, the candidate vector and the output vector of the previous moment are combined by weighted averaging to obtain the output h_t of the GRU network, as shown in formula (10):

h_t = (1 − z_t) ⊙ h_{t-1} + z_t ⊙ h̃_t   (10)

In the above formulas, ⊙ denotes the element-wise product; x_t is the state input at moment t (the k-th service item of the sequence after SA processing); h_{t-1} is the hidden-layer state of the previous moment; r_t is the output of the reset gate, mapped to [0, 1] by the sigmoid function σ (the closer to 1, the more information is preserved); z_t is the output of the update gate, likewise mapped to [0, 1] by the sigmoid function; h̃_t denotes the candidate activation state at moment t, computed from the new input x_t, the previous state h_{t-1} and the weight W_h; h_t denotes the activation state at moment t, i.e. the t-th hidden state vector of the GRU network, obtained from the new z_t, the previous state h_{t-1} and h̃_t. W_u, W_r, W_h and U_u, U_r, U_h denote the weight matrices of the update gate, the reset gate and the candidate state, and b_u, b_r, b_h denote the corresponding bias values.
The GRU network learns the service used by the user at each moment and the influence of services used at past moments on the service used at the current moment, stores the learning result in the hidden state vector of each moment, and outputs one hidden state vector per moment to represent the learned service interest information; thus the hidden state vector h_t at each moment of the GRU network expresses the user's service usage interest at that moment. To improve the GRU's extraction of service usage interest, this disclosure introduces an auxiliary loss function L_lf into the GRU network, shown in (11), which measures the gap between the GRU hidden state at each moment and the service feature fusion vector of the next moment:

L_lf = (1 / (N − 1)) Σ_{t=1}^{N−1} |h_t − Fs_{t+1}|   (11)
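A minimal sketch of the GRU interest-extraction step and the auxiliary loss L_lf is shown below. It assumes the GRU width equals the fused feature dimension so that h_t and Fs_{t+1} are directly comparable, and uses tf.keras.layers.GRU in place of a hand-written GRU cell.

```python
import tensorflow as tf

# GRU over the re-weighted service sequence; return_sequences=True exposes the
# hidden state h_t at every moment. A width of 64 is assumed to match the fused dim.
gru = tf.keras.layers.GRU(units=64, return_sequences=True)

def interest_states(weighted_fs):
    """weighted_fs: (batch, N, 64) output of the soft-attention step."""
    return gru(weighted_fs)                      # (batch, N, 64), all h_t

def auxiliary_loss(hidden_states, fs):
    """L_lf: MAE between h_t and the next-moment fusion vector Fs_{t+1}."""
    return tf.reduce_mean(tf.abs(hidden_states[:, :-1, :] - fs[:, 1:, :]))
```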
(IV) SAMML-based service demand forecasting
After the service interest feature expression vector h_t is obtained, the user's service demand at the next moment is predicted based on the user's service interest feature expression vector h_t and the user information feature vector. When training the service demand prediction module, the input data are defined as I_i, consisting of the feature vector of the user information and the user's final service interest expression vector; y_i denotes the user's service demand at the next moment. The prediction function of the service demand prediction module is shown in formula (12):

ŷ_i = σ_1(W · I_i + b)   (12)

where σ_1 denotes the ReLU activation function, W denotes the weight matrix, I_i denotes the input data, and b denotes the bias value.
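A sketch of the prediction head of formula (12) follows, assuming the user information features and the final interest vector are concatenated and the demand is a single regression value (the output dimensionality is not specified in the disclosure).

```python
import tensorflow as tf

# One fully connected layer with ReLU, following formula (12); the output size
# of 1 is an assumption of this sketch.
predict_head = tf.keras.layers.Dense(1, activation="relu")

def predict_demand(user_feat, interest_vec):
    """user_feat: (batch, d_u) user information features;
    interest_vec: (batch, d_h) final service interest vector h_t."""
    i = tf.concat([user_feat, interest_vec], axis=-1)   # I_i
    return predict_head(i)                              # predicted demand
```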
In the SAMML model, predicting the user's service demand at the next moment from the user's service usage sequence, based on multi-modal machine learning, is a regression problem in machine learning. A commonly used loss function for regression problems is the mean absolute error (MAE), i.e. the average distance between the prediction ŷ of the service demand prediction module and the true label value y. Assuming the number of training samples is n, MAE is calculated as shown in (13):

MAE = (1/n) Σ_{i=1}^{n} |y_i − ŷ_i|   (13)

The total loss function L of the SAMML model consists of two parts: the service demand prediction loss L_tag and the auxiliary loss L_lf. Both L_tag and L_lf use the MAE loss function; only their inputs differ. The overall loss function L is calculated as shown in (14):

L = L_tag + α · L_lf   (14)

where α is a hyper-parameter used to balance the expression of the user's service interest against the prediction of the model. This disclosure employs the Adam optimization algorithm. The SAMML-based service demand prediction method is shown in Algorithm 1.
------------------------------------------------
Algorithm 1: SAMML-based service demand dynamic prediction algorithm
Stage 1: Training of the SAMML model
Input: Data  // the data set used to train the model
1.  Initialize the parameters of the model;
2.  FOR i = 1 TO N DO   // N is the number of batches of the data volume
3.      Input the training data item (X_i, Y_i);
4.      Perform the feature sharing operation between the output of layer l−1 (1 ≤ l ≤ c) of the text feature network M_txt and the picture feature vector according to formula (1);
5.      Obtain the output P_txt^l of layer l (1 ≤ l ≤ c) of the text feature network M_txt according to formula (2);
6.      Repeat steps 4 and 5 to fuse the picture features and the text features and obtain the output P_img^l of layer l (1 ≤ l ≤ c) of the picture network;
7.      Obtain the user service expression vector Fs_k according to formula (3);
8.      Calculate the soft attention over Fs_k according to formulas (4) and (5) and obtain the attention distribution of the service vectors;
9.      Perform the weighted averaging over Fs_k according to formula (6) to obtain the degree of association between the different services;
10.     Output the update gate and the reset gate of the GRU network according to formulas (7) and (8);
11.     Calculate the expression vector h_t of the services used by the user according to formulas (9) and (10);
12.     Calculate the auxiliary loss value, the prediction loss value, the prediction function value and the total loss according to formulas (11), (12), (13) and (14);
13.     Update the SAMML model parameters;
14. END FOR;
15. UNTIL the model training end condition is met;
Stage 2: Service demand prediction
16. Input the data I_i and run the SAMML model;
17. Output: the service demand of the user;
----------------------------------------------------
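For illustration, a single training step implementing the total loss of formula (14) with Adam might look as follows; the model interface (returning the prediction, the GRU hidden states and the fusion vectors) and the value of α are assumptions of this sketch, not part of the disclosure.

```python
import tensorflow as tf

mae = tf.keras.losses.MeanAbsoluteError()
optimizer = tf.keras.optimizers.Adam(learning_rate=1e-4)  # rate chosen in the experiments
ALPHA = 0.5   # balancing hyper-parameter alpha; its value is assumed here

@tf.function
def train_step(model, user_feat, txt_seq, img_seq, y_true):
    # model is assumed to return the prediction, all GRU hidden states and the
    # per-item fusion vectors Fs, matching the modules sketched above.
    with tf.GradientTape() as tape:
        y_pred, hidden_states, fs = model(user_feat, txt_seq, img_seq)
        l_tag = mae(y_true, y_pred)                                            # eq. (13)
        l_lf = tf.reduce_mean(tf.abs(hidden_states[:, :-1, :] - fs[:, 1:, :]))  # eq. (11)
        loss = l_tag + ALPHA * l_lf                                            # eq. (14)
    grads = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(grads, model.trainable_variables))
    return loss
```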
Further, in order to demonstrate the effectiveness of the disclosed scheme, the following experiments were carried out:
(1) Experimental environment and experimental data
To verify the effectiveness of the proposed method, the method provided by the present disclosure was evaluated experimentally on the Debiasing Dataset 1 provided by the Alibaba Cloud Tianchi platform. The data set file is stored in CSV format with UTF-8 encoding. The data set contains more than ten million records, mainly comprising user features, commodity features and labels, where commodities are mapped to services. The user features include user ID, age, gender, etc.; the service features comprise a text feature txt_vec and an image feature img_vec, and the commodity id is recorded as the label item_id. The user data information of the user CSV file is shown in Table 1:
TABLE 1 Debiasing data information
The experimental environment is as follows: 64-bit Windows 10 Professional operating system, Intel i7-5500U CPU, 4+4 GB RAM; the SAMML model was implemented in Python with TensorFlow 2.0. The present disclosure uses the mean absolute error (MAE), mean square error (MSE), root mean square error (RMSE) and R² indices to evaluate the performance of SAMML. The formula for MAE is given in formula (13); MSE, RMSE and R² are calculated as shown in formulas (15) to (17):

MSE = (1/n) Σ_{i=1}^{n} (y_i − ŷ_i)²   (15)

RMSE = sqrt((1/n) Σ_{i=1}^{n} (y_i − ŷ_i)²)   (16)

R² = 1 − Σ_{i=1}^{n} (y_i − ŷ_i)² / Σ_{i=1}^{n} (y_i − ȳ)²   (17)

The smaller the values of MAE, MSE and RMSE, the higher the prediction accuracy of the model; a larger R² value likewise indicates higher prediction accuracy of the service demand prediction model.
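The four metrics can be computed directly, for example with NumPy, as in the following sketch:

```python
import numpy as np

def evaluate(y_true, y_pred):
    """MAE, MSE, RMSE and R^2 as used in the experiments (formulas (13), (15)-(17))."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    err = y_true - y_pred
    mae = np.mean(np.abs(err))
    mse = np.mean(err ** 2)
    rmse = np.sqrt(mse)
    r2 = 1.0 - np.sum(err ** 2) / np.sum((y_true - np.mean(y_true)) ** 2)
    return {"MAE": mae, "MSE": mse, "RMSE": rmse, "R2": r2}
```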
(2) Model parameter setting
In the SAMML model, the purpose of feature sharing is to merge the data feature vectors of the two modalities and improve the relevance and expressive power of users and services. The number of network layers M of this module has a certain influence on model accuracy; to give the SAMML model better prediction capability, experiments were carried out with different numbers of network layers. In this experiment, the initial learning rates were set to 0.001 and 0.0001 respectively, and different numbers of network layers M were then tried while observing the evaluation indices of the SAMML model (MAE, MSE, RMSE, R²) to determine the optimal number of network layers in the feature sharing module. The results are shown in Tables 2 and 3. As can be seen from Tables 2 and 3, increasing the number of layers helps to improve the prediction accuracy of the SAMML model, and as the number of network layers increases the accuracy of the model follows a roughly normal-distribution-like trend.
Table 2 results of the SAMML model for the network layer M when the learning rate is 0.0001
Table 3 results of the SAMML model for the network layer M when the learning rate is 0.00001
However, when the number of network layers in the feature sharing module increases, more parameters need to be learned, training takes longer, and the risk of over-fitting increases. According to the experimental results, when the number of network layers is 3 each index is relatively optimal and stable, so the number of network layers of the feature sharing module is set to 3 and the learning rate is set to 0.0001. In the SAMML model, the number of neuron nodes in each layer of the feature sharing module also influences model prediction; to give the service demand prediction model higher prediction accuracy, the number of neuron nodes per layer was set to 16, 32, 64, 128 and 256 respectively and the optimal value was determined experimentally. The results are shown in Table 4.
TABLE 4 influence of the number of neuronal nodes in the feature sharing Module on the SAMML model
As can be seen from Table 4, when the number of neuron nodes is 16 and 64, the evaluation indices MAE, MSE, RMSE and R² are relatively optimal, and as the number of neuron nodes increases the model's predictive ability varies only slightly. Meanwhile, too few neuron nodes per layer easily leads to under-fitting of the data, while too many increases the risk of over-fitting the model. Based on the experimental results and a comprehensive comparison, the number of nodes in each layer of the feature sharing module of the SAMML model is set to 64.
In the SAMML model, the parameters are optimized with the Adam algorithm. The learning rate of the Adam algorithm has a large influence on the stability and learning ability of the SAMML model. To give the model strong prediction capability, learning rates of 1e-2, 1e-3, 1e-4 and 1e-5 were set in the SAMML model, the model was trained for each, and the results were recorded, as shown in Figs. 4(a) to 4(d).
As can be seen from Figs. 4(a) to 4(d), when the learning rate is 1e-4 the model is already well fitted within 100 epochs, and the loss on the test set no longer decreases as the number of training epochs increases. When the learning rate is 1e-5, the loss on the test set keeps decreasing as the number of training epochs increases, indicating under-fitting; the loss does not level off and stop decreasing until about 300 epochs. Combining the analysis of the figures, the fitting effect is clearly better when fewer epochs are needed, and the experimental results also show higher accuracy. Based on the above analysis, the present disclosure sets the learning rate to 1e-4, i.e. 0.0001.
(3) Model performance comparison
In order to verify the performance of the proposed prediction model, this disclosure selects four prediction models based on multi-modal machine learning for comparison with the proposed method. The four typical prediction models are: RBMI (Recommendation Based on Multimodal Information), Multimodal IRIS (an interest-related item model based on multimodal data), SDML (Scalable Deep Multimodal Learning), and IMMML (advanced multimodal machine learning). In the experiment, 80% of the data set was used as training data and 20% as test data. The performance of each model was evaluated on the evaluation indices, and the results are shown in Table 5.
TABLE 5 Performance evaluation of different models on datasets
As can be seen from Table 5, the SAMML model outperforms the other comparison models on the evaluation indices MAE, MSE, RMSE and R². On R², the SAMML model exceeds the best result of the other comparison models by 3.1%; on MAE, MSE and RMSE it leads the second-best results by 2.18%, 2.63% and 2.73%, respectively. The comparison shows that the SAMML model provided by this disclosure, by introducing a soft attention mechanism, reduces the difference in feature vector expression between modalities and improves the prediction accuracy of user service demand.
In order to better predict users' service demand, this disclosure proposes a dynamic service demand prediction method based on soft attention and multi-modal machine learning. The method first fuses the multi-dimensional service features of the user's services through a feature sharing module, strengthening the association between users and services; it then introduces a Soft-Attention mechanism so that the model can dynamically adjust the weights and thus their influence on the user's service demand; finally, it predicts the user's service demand through a fully connected network from the user information and the multi-modal feature expression vectors of the services. Extensive experimental tests on real data sets verify the superiority of the proposed method over other typical multi-modal models.
Example two:
the present embodiment is directed to a system for dynamic prediction of service demand based on attention mechanism and multi-modal.
An attention mechanism and multi-modal based dynamic prediction system for service demand, comprising:
a data acquisition unit for acquiring text data and image data generated during service use;
a demand prediction unit for performing feature extraction on the text data and the image data, respectively; inputting the extracted features into a pre-trained prediction model based on soft attention and multi-modal machine learning to realize prediction of service requirements of the user at the next moment;
the prediction model based on soft attention and multi-modal machine learning specifically comprises: realizing the fusion of multi-modal data features based on a feature sharing mechanism; processing the fused features by using a soft attention mechanism, and inputting an obtained result into a pre-trained GRU network to obtain service interest feature vector representation of a user; and based on the user information characteristics and the service interest characteristic vector representation thereof, the service demand of the user at the next moment is predicted through the full connection layer.
In further embodiments, there is also provided:
an electronic device comprising a memory and a processor, and computer instructions stored on the memory and executed on the processor, the computer instructions when executed by the processor performing the method of embodiment one. For brevity, no further description is provided herein.
It should be understood that in this embodiment the processor may be a central processing unit (CPU), or another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, and so on. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor.
The memory may include both read-only memory and random access memory, and may provide instructions and data to the processor, and a portion of the memory may also include non-volatile random access memory. For example, the memory may also store device type information.
A computer readable storage medium storing computer instructions which, when executed by a processor, perform the method of embodiment one.
The method in the first embodiment may be implemented directly by a hardware processor, or by a combination of hardware and software modules in the processor. The software modules may be located in RAM, flash memory, ROM, PROM or EPROM, registers, or other storage media well known in the art. The storage medium is located in a memory, and the processor reads the information in the memory and completes the steps of the method in combination with its hardware. To avoid repetition, details are not repeated here.
The attention-mechanism and multi-modal based dynamic service demand prediction method and system described above can be implemented and have broad application prospects.
The above description is only a preferred embodiment of the present disclosure and is not intended to limit the present disclosure, and various modifications and changes may be made to the present disclosure by those skilled in the art. Any modification, equivalent replacement, improvement and the like made within the spirit and principle of the present disclosure should be included in the protection scope of the present disclosure.
Claims (10)
1. A dynamic prediction method for service demand based on attention mechanism and multi-mode is characterized by comprising the following steps:
acquiring text data and image data generated in the service use process;
respectively extracting the characteristics of the text data and the image data; inputting the extracted features into a pre-trained prediction model based on soft attention and multi-modal machine learning to realize prediction of service requirements of the user at the next moment;
the prediction model based on soft attention and multi-modal machine learning specifically comprises: realizing the fusion of multi-modal data features based on a feature sharing mechanism; processing the fused features by using a soft attention mechanism, and inputting an obtained result into a pre-trained GRU network to obtain service interest feature vector representation of a user; and based on the user information characteristics and the service interest characteristic vector representation thereof, the service demand of the user at the next moment is predicted through the full connection layer.
2. The method according to claim 1, wherein the feature-sharing mechanism is used to implement fusion of multi-modal data features, specifically: respectively inputting the extracted text features and image features into a text feature network and an image feature network, and logically adding the text features and the output of each full-connection layer of the image feature network; and logically adding the image characteristic and the output of each full-connection layer of the text characteristic network, and finally passing the output of the text characteristic network and the image characteristic network through one full-connection layer to obtain a fusion result.
3. The method as claimed in claim 1, wherein the text feature network and the image feature network are composed of a plurality of fully connected layers.
4. The method according to claim 1, wherein the fused features are processed by using a soft attention mechanism, specifically: and calculating the weight of the fused feature information based on a soft attention mechanism, and obtaining diversified service interest expression vectors.
5. The method according to claim 1, wherein the obtained result is input to a pre-trained GRU network to obtain a service interest feature vector representation of the user, specifically: the GRU network learns the service used by the user at each moment and the influence of the service used at the past moment on the service used at the current moment, the learning result is stored in the hidden state vector at each moment, and a hidden state vector is output at each moment to represent the learned service interest information, so that the service use interest of the user at each moment is obtained.
6. The method as claimed in claim 1, wherein an auxiliary penalty function is introduced into the GRU network, and the difference between the hidden state of the GRU at each time and the service feature fusion vector at the next time is calculated by the auxiliary penalty function.
7. An attention-based and multi-modal dynamic prediction system for service demand, comprising:
a data acquisition unit for acquiring text data and image data generated during service use;
a demand prediction unit for performing feature extraction on the text data and the image data, respectively; inputting the extracted features into a pre-trained prediction model based on soft attention and multi-modal machine learning to realize prediction of service requirements of the user at the next moment;
the prediction model based on soft attention and multi-modal machine learning specifically comprises: realizing the fusion of multi-modal data features based on a feature sharing mechanism; processing the fused features by using a soft attention mechanism, and inputting an obtained result into a pre-trained GRU network to obtain service interest feature vector representation of a user; and based on the user information characteristics and the service interest characteristic vector representation thereof, the service demand of the user at the next moment is predicted through the full connection layer.
8. The system according to claim 7, wherein the feature-sharing mechanism is used to implement fusion of multi-modal data features, specifically: respectively inputting the extracted text features and image features into a text feature network and an image feature network, and logically adding the text features and the output of each full-connection layer of the image feature network; and logically adding the image characteristic and the output of each full-connection layer of the text characteristic network, and finally passing the output of the text characteristic network and the image characteristic network through one full-connection layer to obtain a fusion result.
9. An electronic device comprising a memory, a processor and a computer program stored in the memory and runnable on the processor, wherein the processor, when executing the program, implements the method for dynamic prediction of service demand based on attention mechanism and multi-modality according to any one of claims 1 to 6.
10. A non-transitory computer readable storage medium having stored thereon a computer program, wherein the program when executed by a processor implements a method for dynamic prediction of service demand based on attentional mechanisms and multi-modalities according to any of claims 1-6.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110872257.6A CN113537623B (en) | 2021-07-30 | 2021-07-30 | Attention mechanism and multi-mode based service demand dynamic prediction method and system |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113537623A true CN113537623A (en) | 2021-10-22 |
CN113537623B CN113537623B (en) | 2023-08-18 |
Family
ID=78121626
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110872257.6A Active CN113537623B (en) | 2021-07-30 | 2021-07-30 | Attention mechanism and multi-mode based service demand dynamic prediction method and system |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113537623B (en) |
Patent Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20170270546A1 (en) * | 2016-03-21 | 2017-09-21 | Tata Motors Limited | Service churn model |
CN112529637A (en) * | 2020-12-22 | 2021-03-19 | 烟台大学 | Service demand dynamic prediction method and system based on context awareness |
CN112529638A (en) * | 2020-12-22 | 2021-03-19 | 烟台大学 | Service demand dynamic prediction method and system based on user classification and deep learning |
CN113128671A (en) * | 2021-04-19 | 2021-07-16 | 烟台大学 | Service demand dynamic prediction method and system based on multi-mode machine learning |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113987261A (en) * | 2021-11-08 | 2022-01-28 | 烟台大学 | Video recommendation method and system based on dynamic trust perception |
CN114330866A (en) * | 2021-12-24 | 2022-04-12 | 江苏微皓智能科技有限公司 | Data processing method and device, electronic equipment and computer readable storage medium |
CN114330866B (en) * | 2021-12-24 | 2023-11-24 | 江苏微皓智能科技有限公司 | Data processing method, device, electronic equipment and computer readable storage medium |
Also Published As
Publication number | Publication date |
---|---|
CN113537623B (en) | 2023-08-18 |
Legal Events
Date | Code | Title | Description |
---|---|---|---
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
GR01 | Patent grant | |