CN113537623A

CN113537623A - Method and system for dynamic prediction of service demand based on attention mechanism and multimodality

Info

Publication number: CN113537623A
Application number: CN202110872257.6A
Authority: CN
Inventors: 刘志中; 海燕; 宋宗珀; 丰凯
Original assignee: Yantai University
Current assignee: Yantai University
Priority date: 2021-07-30
Filing date: 2021-07-30
Publication date: 2021-10-22
Anticipated expiration: 2041-07-30
Also published as: CN113537623B

Abstract

The present disclosure provides a method and system for dynamic prediction of service demand based on an attention mechanism and multimodality. The method includes: acquiring text data and image data generated during service use; performing feature extraction on the text data and the image data respectively; and inputting the extracted features into a pre-trained prediction model based on soft attention and multimodal machine learning to predict the user's service demand at the next moment. The prediction model fuses the multimodal data features through a feature sharing mechanism; processes the fused features with a soft attention mechanism and inputs the result into a pre-trained GRU network to obtain a feature vector representation of the user's service interest; and, based on the user information features and the service interest feature vector representation, predicts the user's service demand at the next moment through a fully connected layer.

Description

Method and system for dynamic prediction of service demand based on attention mechanism and multimodality

Technical Field

The present disclosure belongs to the technical field of service preference prediction, and in particular relates to a method and system for dynamic prediction of service demand based on an attention mechanism and multimodality.

Background

The statements in this section merely provide background information related to the present disclosure and do not necessarily constitute prior art.

In recent years, with the rapid development and maturation of new computing paradigms such as service computing, cloud computing, and mobile edge computing, a large number of available services from different fields have appeared on the network. At the same time, the widespread adoption of mobile networks and smart terminals allows users at large scale to access services anytime and anywhere, which greatly facilitates their life and work. However, it is challenging for users to find the services they need among a massive number of candidate services, which affects service utilization and user satisfaction. To address this problem, researchers have carried out extensive work on service recommendation and achieved substantial results, which solve the service discovery problem to a certain extent. However, the inventors found that most existing service recommendation methods recommend services that users may be interested in by mining information between similar users or similar services, without considering users' actual service needs, so recommendation accuracy still needs to be improved.

Service demand prediction is an important basis for improving the accuracy of service recommendation. Scholars at home and abroad have carried out preliminary research on service demand prediction and achieved certain results. Existing service demand prediction methods mainly include methods based on collaborative filtering (CF), machine learning (ML), and deep learning (DL). Specifically, Guo et al. proposed a residual spatio-temporal network for short-term travel demand forecasting, which captures the spatial and temporal dependencies of travel demand as well as the dependencies among travel demands, and performs well in travel demand forecasting. Liu et al. proposed a deep ensemble network model based on an attention mechanism, which models the inter-channel, spatial, and positional relationships of the feature map and predicts users' service demand. For spatio-temporal demand forecasting and competitive supply, Zheng et al. proposed a demand-aware route planning algorithm that considers both spatio-temporal prediction and the supply-demand state, and constructed a spatio-temporal graph convolutional sequential prediction model that predicts user service requests from location and time. To help service providers pre-allocate service starting points and reduce customer waiting time, Chu et al. proposed a multi-scale convolutional long short-term memory network model that considers temporal and spatial correlations simultaneously to predict future user demand. Noting that existing service demand prediction neglects the privacy of users on the move, Lu et al. proposed a user collaborative filtering method that incorporates the strength of privacy concern and takes privacy-related factors into account when predicting service demand. Gardino et al. used a multi-view approach to learn the connections between views in order to solve the demand forecasting problem between retailers and wholesalers across industries. Rob et al. used deep learning to reduce the complexity introduced by artificial network models and, taking tourism search intensity as the only input indicator, provided tourism practitioners with demand forecasts between tourists and destinations. Xu et al. used historical water demand as data to provide effective water demand prediction for urban water supply systems. Although existing research on service demand prediction has improved the accuracy of service recommendation to a certain extent, most of it is based on single-modality data and does not consider service demand prediction with multimodal data.

Summary of the Invention

To solve the above problems, the present disclosure provides a method and system for dynamic prediction of service demand based on an attention mechanism and multimodality. The scheme takes into account the text data and image data generated during service use and uses a prediction model based on soft attention and multimodal machine learning to predict user service demand accurately.

According to a first aspect of the embodiments of the present disclosure, a method for dynamic prediction of service demand based on an attention mechanism and multimodality is provided, including:

acquiring text data and image data generated during use of the service;

performing feature extraction on the text data and the image data respectively, and inputting the extracted features into a pre-trained prediction model based on soft attention and multimodal machine learning to predict the user's service demand at the next moment;

wherein the prediction model based on soft attention and multimodal machine learning specifically: fuses the multimodal data features through a feature sharing mechanism; processes the fused features with a soft attention mechanism and inputs the result into a pre-trained GRU network to obtain a feature vector representation of the user's service interest; and, based on the user information features and the service interest feature vector representation, predicts the user's service demand at the next moment through a fully connected layer.

Further, the fusion of multimodal data features through the feature sharing mechanism is specifically: the extracted text features and image features are input into a text feature network and an image feature network respectively; the text features are logically added to the output of each fully connected layer of the image feature network; the image features are logically added to the output of each fully connected layer of the text feature network; and finally the outputs of the text feature network and the image feature network are passed through one fully connected layer to obtain the fusion result.

Further, processing the fused features with the soft attention mechanism is specifically: calculating the weights of the fused feature information based on the soft attention mechanism and obtaining diversified service interest expression vectors.

Further, inputting the obtained result into the pre-trained GRU network to obtain the feature vector representation of the user's service interest is specifically: the GRU network learns the service used by the user at each moment and the influence of services used at past moments on the service used at the current moment; the learning result is stored in the hidden state vector of each moment, and at each moment a hidden state vector is output to represent the learned service interest information, thereby obtaining the user's service usage interest at each moment.

Further, an auxiliary loss function is introduced into the GRU network; it calculates the gap between the hidden state of the GRU at each moment and the service feature fusion vector at the next moment.

According to a second aspect of the embodiments of the present disclosure, a system for dynamic prediction of service demand based on an attention mechanism and multimodality is provided, including:

a data acquisition unit, configured to acquire text data and image data generated during use of the service;

a demand prediction unit, configured to perform feature extraction on the text data and the image data respectively, and to input the extracted features into a pre-trained prediction model based on soft attention and multimodal machine learning to predict the user's service demand at the next moment.

According to a third aspect of the embodiments of the present disclosure, an electronic device is provided, including a memory, a processor, and a computer program stored in the memory and runnable on the processor; when the processor executes the program, it implements the described method for dynamic prediction of service demand based on an attention mechanism and multimodality.

According to a fourth aspect of the embodiments of the present disclosure, a non-transitory computer-readable storage medium is provided, on which a computer program is stored; when the program is executed by a processor, it implements the described method for dynamic prediction of service demand based on an attention mechanism and multimodality.

Compared with the prior art, the beneficial effects of the present disclosure are:

(1) The scheme of the present disclosure uses the text data and image data generated during service use and, based on a proposed Soft Attention and Multimodal Machine Learning (SAMML) model, provides a method for dynamic prediction of service demand based on an attention mechanism and multimodality, achieving accurate prediction of user service demand.

(2) The proposed SAMML model first extracts feature vectors from the text data and the image data respectively and performs feature sharing, fusing the multimodal data features and improving the ability to express the association between users and services. It then applies the soft attention mechanism to the fused feature data and inputs the result into the GRU network, so that the GRU network can better learn the user's service usage interest. Finally, the SAMML model is trained on user feature and service feature data, and the trained model is used to predict the user's service demand accurately.

Advantages of additional aspects of the present disclosure will be set forth in part in the description that follows, and in part will become apparent from that description or be learned through practice of the present disclosure.

Description of the Drawings

The accompanying drawings, which form part of the present disclosure, are provided for further understanding of the present disclosure; the exemplary embodiments of the present disclosure and their descriptions are used to explain the present disclosure and do not unduly limit it.

FIG. 1 is a schematic structural diagram of the SAMML model described in Embodiment 1 of the present disclosure;

FIG. 2 is a schematic diagram of the multimodal feature fusion structure described in Embodiment 1 of the present disclosure;

FIG. 3 is a schematic structural diagram of the GRU neuron described in Embodiment 1 of the present disclosure;

FIG. 4(a) shows the model loss of the SAMML model described in Embodiment 1 of the present disclosure at a learning rate of 1e-2;

FIG. 4(b) shows the model loss of the SAMML model described in Embodiment 1 of the present disclosure at a learning rate of 1e-3;

FIG. 4(c) shows the model loss of the SAMML model described in Embodiment 1 of the present disclosure at a learning rate of 1e-4;

FIG. 4(d) shows the model loss of the SAMML model described in Embodiment 1 of the present disclosure at a learning rate of 1e-5.

Detailed Description

The present disclosure is further described below with reference to the accompanying drawings and embodiments.

It should be noted that the following detailed description is exemplary and intended to provide further explanation of the present disclosure. Unless otherwise specified, all technical and scientific terms used in this disclosure have the same meaning as commonly understood by one of ordinary skill in the art to which this disclosure belongs.

It should be noted that the terminology used herein is for the purpose of describing specific embodiments only and is not intended to limit the exemplary embodiments according to the present disclosure. As used herein, unless the context clearly indicates otherwise, the singular forms are intended to include the plural forms as well. Furthermore, when the terms "comprising" and/or "including" are used in this specification, they indicate the presence of features, steps, operations, devices, components, and/or combinations thereof.

Where no conflict arises, the embodiments of the present disclosure and the features of the embodiments may be combined with each other.

Embodiment 1:

The purpose of this embodiment is to provide a method for dynamic prediction of service demand based on an attention mechanism and multimodality.

The method for dynamic prediction of service demand based on an attention mechanism and multimodality comprises the steps described in the first aspect above.

The user information features refer to information such as the user's gender, age, occupation, and income.

Further, the text feature network and the image feature network each consist of several fully connected layers.

Specifically, for ease of understanding, the scheme of the present disclosure is described in detail below with reference to the accompanying drawings:

The present disclosure takes into account the text data and image data generated during service use and provides a method for dynamic prediction of service demand based on a Soft Attention and Multimodal Machine Learning (SAMML) model. The method first extracts feature vectors from the text data and the image data respectively and performs feature sharing, fusing the multimodal data features and improving the ability to express the association between users and services. It then applies the soft attention mechanism to the fused feature data and inputs the result into the GRU network, so that the GRU network can better learn the user's service usage interest. Finally, the SAMML model is trained on user feature and service feature data, and the trained model is used to predict the user's service demand accurately.

FIG. 1 shows the network structure of the SAMML model, which consists of a multimodal data feature sharing module, a service interest extraction module, and a service demand prediction module.

In the SAMML model, feature vectors are first extracted from the text data and the image data of the user's service information using the Doc2Vec model and the ResNet model respectively, and the extracted feature vectors are fused by the feature sharing module. The soft attention mechanism is then used to learn weights for the user's different service usage data. The resulting feature data are processed by the GRU network to learn the user's service application expression vector. Finally, the SAMML model is trained on the user features and service feature vectors, and the user's service demand is predicted. Specifically:

(1) Multimodal data feature sharing module

For the service demand prediction problem, let the training data set be T = [(X₁, Y₁), (X₂, Y₂), …, (X_m, Y_m), …, (X_n, Y_n)], where n is the size of the training data set. X_m denotes the m-th training sample, and Y_m denotes the service demand of the user corresponding to X_m. Each training sample comprises the user features, including the user's gender, age, occupation, etc., and the service feature information. Each piece of service feature information includes the text data and image data related to the service application: the feature data of service item k consist of the text data features of service item k and the image data features of service item k.
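As a reading aid, the structure of one training pair (X_m, Y_m) can be sketched as plain Python data classes. The class and field names (`ServiceItem`, `Sample`, `text_features`, and so on) are hypothetical, and representing Y_m as an integer service index is an assumption; the source only states that X_m combines user features with per-service text and image features.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ServiceItem:
    """Features of one service item k (names are illustrative)."""
    text_features: List[float]   # text data features of service item k
    image_features: List[float]  # image data features of service item k


@dataclass
class Sample:
    """One training pair (X_m, Y_m)."""
    user_features: List[float]   # encoded gender, age, occupation, ...
    services: List[ServiceItem]  # service feature information, in usage order
    next_service: int            # Y_m, assumed here to be a service index


# Toy data set T with n = 2 samples; all numbers are made up.
T = [
    Sample([1.0, 0.3], [ServiceItem([0.1, 0.2], [0.5, 0.4])], next_service=2),
    Sample([0.0, 0.7], [ServiceItem([0.3, 0.1], [0.2, 0.9]),
                        ServiceItem([0.6, 0.0], [0.1, 0.1])], next_service=0),
]
n = len(T)
```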

To fuse the features of different modalities effectively, a feature sharing mechanism is adopted to associate the multimodal data features with one another and thereby improve the accuracy of service demand prediction. Specifically, in the SAMML model, feature fusion is performed by a text feature network M_txt and an image feature network M_img, each of which is a fully connected network. Taking the user's service item k as an example, the text feature sequence and the image feature sequence of service item k are input to M_txt and M_img respectively; the text features are logically added to the output P_img of each dense layer in the image feature network M_img, and the image features are logically added to the output P_txt of each dense layer in the text feature network M_txt. FIG. 2 shows the network structure of the feature sharing module.

Let the number of nodes in the input layer of the feature sharing module be a and the number of layers be c. When the feature vectors pass through layer l, where l ∈ [1, c], the text feature network and the image feature network each produce an output. Taking propagation through the M_txt network as an example, the feature fusion after the operation of layer l is given by formulas (1) and (2).

In formulas (1) and (2), · denotes the dot product, and the remaining symbols denote, respectively, the text feature vector in M_txt after l-1 layers of feature sharing, the ReLU activation function of layer l in M_txt, the weight matrix of layer l in M_txt, and the bias of layer l in M_txt. Finally, the feature sharing vectors output by the two networks are passed through one fully connected layer to produce the feature sharing expression vector of the user's service, Feature share (Fs). Fs^k is computed as shown in formula (3).

In formula (3), σ₁ denotes the ReLU activation function of the fully connected network, W₁ the weight matrix, and b₁ the bias; the operator joining the two feature sharing vectors denotes vector concatenation.
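The feature sharing module described above can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: "logical addition" is interpreted as element-wise addition, every layer is assumed to have the same width a so that the addition is well defined, the weights are random, and all function names (`dense`, `feature_share`, `init_layer`) are hypothetical.

```python
import random


def dense(v, W, b):
    """Fully connected layer with ReLU: ReLU(W . v + b), W given as a list of rows."""
    return [max(0.0, sum(w * x for w, x in zip(row, v)) + bi)
            for row, bi in zip(W, b)]


def add(u, v):
    """Element-wise addition (assumed meaning of 'logical addition')."""
    return [x + y for x, y in zip(u, v)]


def init_layer(n_out, n_in, rng):
    """Random weights and zero bias, for illustration only."""
    W = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return W, [0.0] * n_out


def feature_share(f_txt, f_img, txt_layers, img_layers, out_layer):
    """Cross-modal sharing over c dense layers, then concatenation and one dense layer."""
    s_txt, s_img = f_txt, f_img
    for (W_t, b_t), (W_i, b_i) in zip(txt_layers, img_layers):
        p_txt = dense(s_txt, W_t, b_t)  # output P_txt of a dense layer in M_txt
        p_img = dense(s_img, W_i, b_i)  # output P_img of a dense layer in M_img
        s_txt = add(p_txt, f_img)       # image features added to P_txt
        s_img = add(p_img, f_txt)       # text features added to P_img
    W1, b1 = out_layer
    return dense(s_txt + s_img, W1, b1)  # list '+' concatenates the two vectors


rng = random.Random(0)
a, c = 4, 2  # layer width a and number of layers c
txt_layers = [init_layer(a, a, rng) for _ in range(c)]
img_layers = [init_layer(a, a, rng) for _ in range(c)]
out_layer = init_layer(a, 2 * a, rng)
fs_k = feature_share([0.1, 0.2, 0.3, 0.4], [0.5, 0.1, 0.0, 0.2],
                     txt_layers, img_layers, out_layer)
```

Here `fs_k` plays the role of the feature sharing vector Fs^k for one service item; in a trained model the weights would be learned rather than sampled.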

(2) Feature weight acquisition module based on the Soft Attention mechanism

The Soft Attention (SA) mechanism performs a re-weighted aggregation over the information by selectively ignoring part of it. All information is re-weighted adaptively before being aggregated, which separates out the important information and keeps it from being disturbed by unimportant information, thereby improving accuracy. The present disclosure uses the SA mechanism to obtain the weights of the feature information, ensuring that all feature vectors are learned during model training and thereby strengthening the association between expression vectors. After the diversified service interest expression vectors are obtained, the model adjusts, during learning and based on the user's service usage sequence, the weight with which each of the user's diversified service interests influences the user's service demand. The weights are then multiplied by the user's diversified service interest expression vectors and input into the GRU network, which dynamically models how the user's diversified service interests change. The SAMML model takes the feature sharing vector Fs as the input of the soft attention mechanism and, through the following operations, computes the weighted average of the different service feature vectors, from which the relative share of each service vector can be read off directly. The SA-based weight acquisition steps are as follows:

Step 1: Initialization. Define the attention variable z as the index to be queried, with z ∈ [1, N], where N is the total number of the user's service feature items. z = k indicates that the feature sharing vector Fs^k of the k-th item is selected.

Step 2: Once the query vector q and the feature sharing vector Fs^k are determined, compute the similarity between the query vector q and the key Fs^k, calculate the probability α_k of the feature sharing vector of the k-th item according to formula (4), and normalize it.

Step 3: Weighted average. The attention distribution α_k expresses, for the query vector q, how strongly the feature sharing information of the k-th item in Fs is related to q; the weighted average then yields the Soft Attention value, as shown in formula (6).

Step 4: After the degree of association of the different feature sharing information has been calculated, each piece of feature sharing information is processed according to the result of formula (6), and the outputs are input into the GRU network in sequence for the next operation.

Here the probability vector formed by α_k represents the attention distribution, and S^k is the attention scoring function. The present disclosure adopts the dot-product model as the scoring function, calculated as shown in formula (5).

S^k = s(Fs^k, q) = (Fs^k)^T q (5)
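The four steps above can be sketched in plain Python. The dot-product score follows formula (5) as printed; the softmax normalisation for formula (4) and the weighted sum for formula (6) are assumptions based on the usual soft attention formulation, since those formulas are not legible in this text. The function names and the toy vectors are hypothetical.

```python
import math


def dot(u, v):
    """Dot-product scoring function s(Fs^k, q) = (Fs^k)^T q, as in formula (5)."""
    return sum(x * y for x, y in zip(u, v))


def softmax(scores):
    """Normalise scores into a probability distribution (assumed form of formula (4))."""
    m = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def soft_attention(fs, q):
    """Return the attention distribution, the re-weighted vectors, and their sum."""
    alpha = softmax([dot(f, q) for f in fs])
    # Each feature sharing vector scaled by its weight; these are fed to the GRU in order.
    weighted = [[a * x for x in f] for a, f in zip(alpha, fs)]
    # Weighted average over all items (assumed form of formula (6)).
    context = [sum(col) for col in zip(*weighted)]
    return alpha, weighted, context


# Three toy feature sharing vectors Fs^1..Fs^3 and a toy query vector q.
fs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
q = [1.0, 0.0]
alpha, weighted, context = soft_attention(fs, q)
```

With this query, items 1 and 3 receive equal scores and therefore equal weights, both larger than the weight of item 2.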

(3) GRU-based service interest extraction module

In recent years, the GRU (Gated Recurrent Unit) neural network, a variant of the LSTM (Long Short-Term Memory) neural network, has been applied widely and successfully in NLP and time series data processing. Compared with LSTM, the GRU network has a simpler structure and computes faster. The present disclosure therefore uses a GRU network to learn from the service feature sharing vectors and extract the user's service usage interest. The structure of the GRU network is shown in FIG. 3, where r_t and z_t denote the reset gate and the update gate respectively. The update gate controls how much of the service state information from the previous moment is retained in the current state: the larger the value of z_t, the more of the previous moment's service information remains in the current state. The reset gate controls how much of the previous moment's service information is written into the current candidate state: the smaller the value of r_t, the less information is written. The data processed by the SA mechanism serve as the input x of the GRU network.

The service feature sharing information is processed in the GRU network as follows:

Step 1: From the current input x_t and the hidden state h_{t-1} of the previous moment, the update gate z_t outputs a value in [0, 1], as shown in formula (7).

Step 2: From x_t and the hidden state h_{t-1} of the previous moment, the reset gate r_t outputs a value in [0, 1]; at the same time, the tanh function creates the candidate value vector for the current moment. The operations are shown in formulas (8) and (9).

Step 3: With z_t as the weight vector, the candidate vector and the output vector of the previous moment are combined by weighted averaging to obtain the output h_t of the GRU network, as shown in formula (10).

In the above formulas, · denotes element-wise multiplication, σ is the sigmoid function, and tanh is the tanh activation function. x_t is the input at time t (the input of the k-th service sequence after SA processing), and h_{t-1} is the hidden-layer state at the previous moment. r_t is the output of the reset gate; the sigmoid function maps it to the range 0 to 1, and the closer it is to 1, the more readily information is retained. z_t is the output of the update gate, likewise mapped to the range 0 to 1 by the sigmoid function. h̃_t denotes the candidate activation state at time t, computed from the new input x_t, the previous state h_{t-1}, and the weight W^h. h_t denotes the activation state at time t, that is, the t-th hidden state vector in the GRU network; it is the new GRU output, obtained from z_t, the previous state h_{t-1}, and h̃_t. W^u, W^r, W^h and U^u, U^r, U^h are the weight matrices of the update gate, the reset gate, and the candidate state, respectively, and b^u, b^r, b^h are the corresponding bias values.
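
Formulas (7) to (10) are not reproduced in this text, so the following is only a minimal sketch of one GRU step under stated assumptions: it takes the widely used GRU form, uses the symbol names defined above (W^u, U^u, b^u and so on, written here as dictionary keys), and assumes that z_t weights the candidate vector while (1 - z_t) weights the previous state. The helper names `matvec`, `vec_add`, and `gru_step` are hypothetical.

```python
import math

def sigmoid(v):
    """Element-wise logistic function, mapping each entry into (0, 1)."""
    return [1.0 / (1.0 + math.exp(-a)) for a in v]

def matvec(M, v):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [sum(m * a for m, a in zip(row, v)) for row in M]

def vec_add(*vs):
    """Element-wise sum of several equal-length vectors."""
    return [sum(items) for items in zip(*vs)]

def gru_step(x_t, h_prev, p):
    """One GRU step; p holds the weights Wu, Uu, bu, Wr, Ur, br, Wh, Uh, bh."""
    # Update gate z_t (assumed form of formula (7)).
    z_t = sigmoid(vec_add(matvec(p["Wu"], x_t), matvec(p["Uu"], h_prev), p["bu"]))
    # Reset gate r_t (assumed form of formula (8)).
    r_t = sigmoid(vec_add(matvec(p["Wr"], x_t), matvec(p["Ur"], h_prev), p["br"]))
    # Candidate state: the reset gate scales the previous state (assumed formula (9)).
    gated = [r * h for r, h in zip(r_t, h_prev)]
    pre = vec_add(matvec(p["Wh"], x_t), matvec(p["Uh"], gated), p["bh"])
    h_cand = [math.tanh(a) for a in pre]
    # Weighted average of previous state and candidate (assumed formula (10)).
    return [(1.0 - z) * h + z * c for z, h, c in zip(z_t, h_prev, h_cand)]
```

With all weights and biases set to zero, both gates evaluate to 0.5 and the candidate is the zero vector, so the step simply halves the previous hidden state; this is a convenient sanity check for the weighted-average step.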

The GRU network learns both the service the user uses at each moment and the influence of services used in the past on the service used at the current moment. It stores what it has learned in the hidden state vector of each moment and outputs one hidden state vector per moment to represent the learned service interest information, so that the hidden state vector h_t at each moment represents the user's service usage interest at that moment. To improve how well the GRU extracts service usage interest, the present disclosure adds an auxiliary loss function L_lf (shown in formula (11)) to the GRU network; it measures the gap between the GRU hidden state at each moment and the fused service feature vector of the next moment.
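
Formula (11) is not reproduced in this text. The sketch below illustrates one way such an auxiliary loss could be computed, assuming it is the mean absolute error (the description later states that both loss terms use MAE) and assuming hidden states and fused feature vectors have the same dimension. The function name `auxiliary_loss` and its argument names are hypothetical.

```python
def auxiliary_loss(hidden_states, fused_features):
    """Mean absolute gap between h_t and the fused feature vector at t+1.

    hidden_states[t] and fused_features[t] are equal-length vectors for
    moment t; the pairing (h_t, fused_{t+1}) follows the description.
    """
    total, count = 0.0, 0
    for h_t, f_next in zip(hidden_states[:-1], fused_features[1:]):
        for a, b in zip(h_t, f_next):
            total += abs(a - b)
            count += 1
    if count == 0:
        raise ValueError("need at least two moments to form (h_t, f_{t+1}) pairs")
    return total / count
```

Each hidden state is compared with the feature vector of the following moment, so a sequence of T moments contributes T - 1 pairs.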

(4) Service demand prediction based on SAMML

After the service interest feature vector h_t has been obtained, the user's service demand at the next moment is predicted from h_t and the user information feature vector. When the service demand prediction module is trained, the input data I_i are defined in terms of the user information feature vector, the user's final service interest expression vector, and y_i, where y_i denotes the target value of the model, namely the user's service demand at the next moment. The prediction function of the service demand prediction module is shown in formula (12):

where σ₁ denotes the ReLU activation function, W the weight matrix, I_i the input data, and b the bias value.
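
Formula (12) is not reproduced in this text. As a hedged illustration, the sketch below assumes it has the usual fully connected form y = σ₁(W · I_i + b) and assumes that I_i is the concatenation of the user information feature vector and the service interest vector; the names `relu`, `predict`, `user_vec`, and `interest_vec`, and all numeric values, are hypothetical.

```python
def relu(v):
    """Element-wise ReLU: negative entries become 0."""
    return [max(0.0, a) for a in v]

def predict(W, I, b):
    """Fully connected layer followed by ReLU (assumed form of formula (12))."""
    return relu([sum(w * x for w, x in zip(row, I)) + bias
                 for row, bias in zip(W, b)])

# Illustrative values only; the concatenation order is an assumption.
user_vec = [0.5, 1.0]          # user information feature vector
interest_vec = [2.0]           # final service interest vector from the GRU
I = user_vec + interest_vec    # input I_i to the prediction layer
W = [[1.0, 0.0, 1.0],
     [-1.0, -1.0, 0.0]]
b = [0.0, 0.5]
y = predict(W, I, b)           # [2.5, 0.0]: the second unit is clipped by ReLU
```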

In the SAMML model, predicting the user's service demand at the next moment from the sequence of services the user has used, by means of multimodal machine learning, is a regression problem. A loss function commonly used for regression problems is the mean absolute error (MAE), which is the average distance between the predicted value ŷ of the service demand prediction model and the true label value y. Assuming the training data contain n samples, MAE is computed as shown in formula (13).
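
Formula (13) is not reproduced in this text; the following short sketch uses the standard definition of mean absolute error, MAE = (1/n) Σ |y_i - ŷ_i|. The function name `mae` is chosen here for illustration.

```python
def mae(y_true, y_pred):
    """Mean absolute error between true labels and predictions."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(y - p) for y, p in zip(y_true, y_pred)) / len(y_true)

# Absolute errors are 1, 0 and 2, so the mean is 1.0.
example = mae([1.0, 2.0, 3.0], [2.0, 2.0, 5.0])
```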

The total loss function L of the SAMML model consists of two parts: the service demand prediction loss L_tag and the auxiliary loss L_tf. Both L_tag and L_tf use the MAE loss function and differ only in their inputs. The total loss L is computed as shown in formula (14):

L = L_tag + α · L_tf    (14)

Here α is a hyperparameter that balances the expression of the user's service interest against the prediction of the model. The present disclosure uses the Adam optimization algorithm. The SAMML-based service demand prediction method is given in Algorithm 1.
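
To make the role of α in formula (14) concrete, the sketch below combines two MAE terms with illustrative numbers. The inputs to the auxiliary term (flattened hidden states and next-moment fused features) and the value α = 0.5 are assumptions made only for this example; the function names are hypothetical.

```python
def mae(a, b):
    """Mean absolute error between two equal-length sequences."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)

def total_loss(y_true, y_pred, hidden_flat, fused_next_flat, alpha):
    """L = L_tag + alpha * L_tf, with both terms computed as MAE."""
    l_tag = mae(y_true, y_pred)                # prediction loss
    l_tf = mae(hidden_flat, fused_next_flat)   # auxiliary loss
    return l_tag + alpha * l_tf

# L_tag = 0.5 and L_tf = 0.5, so L = 0.5 + 0.5 * 0.5 = 0.75.
loss = total_loss([1.0, 2.0], [1.5, 2.5], [0.0, 1.0], [1.0, 1.0], alpha=0.5)
```

A larger α pushes training toward hidden states that track the next-moment service features more closely, while a smaller α lets the prediction error dominate.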

------------------------------------------------------------------------------------------------

Algorithm 1: SAMML-based dynamic prediction of service demand

Stage 1: Training of the SAMML model

Input: Data // the dataset used for model training

1. Initialize the model parameters;

2. FOR i = 1 TO N DO // N is the number of data batches

3. Input the training data item (X_i, Y_i);

4. According to formula (1), perform the feature sharing operation between the output of layer l-1 (1 ≤ l ≤ c) of the text feature learning network M_txt and the image feature vector;

5. According to formula (2), obtain the output of layer l-1 (1 ≤ l ≤ c) of the text feature learning network M_txt;

6. Repeat steps 4 and 5 to fuse the image features with the text features and obtain the output of layer l-1 (1 ≤ l ≤ c) on the image side;

7. According to formula (3), obtain the user's service usage expression vector Fs^k;

8. According to formulas (4) and (5), compute the scoring function of the soft attention mechanism for Fs^k and obtain the attention distribution over the service vectors;

9. According to formula (6), take the weighted average of Fs^k to obtain the degree of association between different services;

10. According to formulas (7) and (8), compute the update gate and the reset gate of the GRU network;

11. According to formulas (9) and (10), compute the expression vector h_t of the services used by the user;

12. According to formulas (11), (12), (13), and (14), compute the auxiliary loss, the predicted value, the prediction loss, and the total loss;

13. Update the SAMML model parameters;

14. END FOR;

15. UNTIL (the end condition of model training is met);

Stage 2: Service demand prediction

16. Input the data I_i and run the SAMML model;

17. Output: the user's service demand;

------------------------------------------------------------------------------------------------
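
Steps 8 and 9 of Algorithm 1 (score, attention distribution, weighted average) can be sketched as follows. Formulas (4) to (6) are not reproduced in this text, so the dot-product scoring against a query vector is an assumption standing in for the actual scoring function, and the names `softmax`, `soft_attention`, and `query` are hypothetical.

```python
import math

def softmax(scores):
    """Turn raw scores into a probability distribution (numerically stable)."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def soft_attention(service_vectors, query):
    """Return attention weights and the attention-weighted average vector."""
    # Scoring: dot product of each service vector with the query (assumed).
    scores = [sum(a * b for a, b in zip(v, query)) for v in service_vectors]
    # Attention distribution over the service vectors.
    weights = softmax(scores)
    # Weighted average of the service vectors.
    dim = len(service_vectors[0])
    context = [sum(w * v[d] for w, v in zip(weights, service_vectors))
               for d in range(dim)]
    return weights, context
```

Because the weights come from a softmax, they are positive and sum to 1, so every service vector contributes to the result; this is what distinguishes soft attention from a hard selection of a single vector.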

Further, to demonstrate the effectiveness of the scheme described in the present disclosure, the following experiments were carried out:

(1) Experimental environment and experimental data

To verify the effectiveness of the proposed method, the present disclosure evaluates it experimentally on the Debiasing dataset provided by Alibaba Cloud Tianchi. The dataset files are stored in CSV format with UTF-8 encoding. The dataset contains more than one million records, consisting mainly of user features, product features, and labels; here, products are mapped to services. The user features include user ID, age, gender, and so on; the service features include the text feature txt_vec and the image feature img_vec; the product id is used as the label item_id. The user data contained in the user CSV file are listed in Table 1:

Table 1 Debiasing data information

The experimental environment is: operating system Windows 10 Professional 64-bit, CPU Intel i7 5500U, RAM 4+4 GB; the SAMML model is implemented with Python and TensorFlow 2.0. The present disclosure uses the mean absolute error (MAE), mean squared error (MSE), root mean squared error (RMSE), and R² to evaluate the performance of SAMML. MAE is computed as in formula (13); MSE, RMSE, and R² are computed as in formulas (15) to (17):

The smaller the values of MAE, MSE, and RMSE, the higher the prediction accuracy of the model; the larger the value of R², the higher the prediction accuracy of the service demand prediction model.
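
Formulas (15) to (17) are not reproduced in this text. The sketch below uses the standard definitions of the three metrics, with R² taken as 1 minus the ratio of the residual sum of squares to the total sum of squares; the function names are chosen here for illustration.

```python
import math

def mse(y_true, y_pred):
    """Mean squared error."""
    return sum((y - p) ** 2 for y, p in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    """Root mean squared error: the square root of MSE."""
    return math.sqrt(mse(y_true, y_pred))

def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))
    ss_tot = sum((y - mean_y) ** 2 for y in y_true)
    return 1.0 - ss_res / ss_tot

# One prediction is off by 1: MSE = 1/3, SS_res = 1, SS_tot = 2, so R² = 0.5.
y_true, y_pred = [1.0, 2.0, 3.0], [1.0, 2.0, 4.0]
scores = (mse(y_true, y_pred), rmse(y_true, y_pred), r2(y_true, y_pred))
```

Note that R² is undefined when all true labels are identical (SS_tot is zero), so the sketch would raise a division error in that case.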

(2) Model parameter settings

In the SAMML model, the purpose of feature sharing is to fuse the feature vectors of the two data modalities, strengthening the association between users and services and the expressiveness of the features. The number of network layers M in this module affects the accuracy of the model. To give the SAMML model good predictive ability, this experiment first varies the number of network layers. The initial learning rate is set to 0.001 and 0.0001 respectively, and for each setting different numbers of network layers M are tried while observing the evaluation metrics of the SAMML model (MAE, MSE, RMSE, R²), in order to determine the best number of layers for the feature sharing module. The results are shown in Tables 2 and 3. Tables 2 and 3 show that adding layers helps the prediction accuracy of the SAMML model up to a point: as the number of layers increases, the accuracy first rises and then falls, following a bell-shaped trend.

Table 2 Results of the SAMML model for different numbers of network layers M at learning rate = 0.0001

Table 3 Results of the SAMML model for different numbers of network layers M at learning rate = 0.00001

However, adding layers to the feature sharing module means that more parameters must be learned, training takes longer, and the risk of overfitting increases. According to the experimental results, with 3 network layers every metric is relatively optimal and stable; the present disclosure therefore sets the number of network layers of the feature sharing module to 3 and the learning rate to 0.0001. The number of neuron nodes in each layer of the feature sharing module also affects the prediction of the model. To give the service demand prediction model high accuracy, the number of neuron nodes per layer in the feature sharing module is set to 16, 32, 64, 128, and 256 in turn, and the best value is determined experimentally. The results are shown in Table 4.

Table 4 Influence of the number of neuron nodes in the feature sharing module on the SAMML model

Table 4 shows that the evaluation metrics MAE, MSE, RMSE, and R² are relatively optimal when the number of neuron nodes is 16 or 64, and that the predictive performance of the model varies slightly as the number of nodes increases. Too few neuron nodes per layer tends to underfit the data, while too many increases the risk of overfitting. Weighing the experimental results together, the present disclosure sets the number of nodes per layer in the feature sharing module of the SAMML model to 64.

In the SAMML model, the Adam algorithm is used to tune the model parameters. The learning rate of the Adam algorithm has a considerable influence on the stability and learning ability of the SAMML model. To give the model strong predictive ability, the learning rate is set to 1e-2, 1e-3, 1e-4, and 1e-5 in turn; the model is trained and the results are saved for each setting. The results are shown in Figures 4(a) to 4(d).

Figures 4(a) to 4(d) show that with a learning rate of 1e-4 the model has already fitted by 100 epochs, and further training epochs do not lower the loss on the test set. With a learning rate of 1e-5, the loss on the test set is still decreasing as the training epochs increase, which means the model is underfitted; the curve only flattens out at around 300 epochs. Taking the figures and tables together, a setting that reaches a good fit in fewer epochs is preferable, and the experimental results also show higher accuracy for it. Based on this analysis, the present disclosure sets the learning rate to 1e-4, that is, 0.0001.

(3) Model performance comparison

To verify the performance of the proposed prediction model, the present disclosure compares it with four typical prediction models based on multimodal machine learning: RBMI (Recommendation Based on Multimodal Information), Multimodal IRIS (Interest-Related Item Similarity Model Based on Multimodal), SDML (Scalable Deep Multimodal Learning), and IMMML (Improved Multimodal Machine Learning). The experiment uses 80% of the dataset as training data and 20% as test data. The performance of each model is assessed on the evaluation metrics; the results are shown in Table 5.
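
The 80%/20% partition can be sketched as follows. The description does not say how the records are assigned to the two sets, so the random shuffle with a fixed seed is an assumption made for reproducibility; the function name `train_test_split` and its parameters are hypothetical.

```python
import random

def train_test_split(records, train_fraction=0.8, seed=0):
    """Shuffle the records and split them into training and test parts."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)   # seeded, so the split is repeatable
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

# Ten records yield eight for training and two for testing.
train, test = train_test_split(range(10))
```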

Table 5 Performance evaluation of different models on the dataset

Table 5 shows that the SAMML model outperforms the other comparison models on the evaluation metrics MAE, MSE, RMSE, and R². On R², the SAMML model exceeds the best result among the comparison models by 3.1%; on MAE, MSE, and RMSE, it leads the second-best result by 2.18%, 2.63%, and 2.73%, respectively. The comparison indicates that, by introducing the soft attention mechanism, the proposed SAMML model reduces the differences in feature vector expression between modalities and improves the accuracy of user service demand prediction.

To better predict the service demand of users, the present disclosure proposes a dynamic prediction method for service demand based on soft attention and multimodal machine learning. The method first fuses the multi-dimensional features of the services used by the user through the feature sharing module, strengthening the association between those services. It then introduces the Soft-Attention mechanism so that the model can adjust its weights dynamically and thereby change their influence on the user's service demand. Finally, it predicts the user's service demand through a fully connected network from the user information and the multimodal feature expression vectors of the services. Extensive experiments on a large real-world dataset, with comparison against other typical multimodal models, verify the superiority of the proposed method.

Embodiment 2:

The purpose of this embodiment is to provide a system for dynamic prediction of service demand based on an attention mechanism and multimodality.

A system for dynamic prediction of service demand based on an attention mechanism and multimodality, comprising:

In further embodiments, there is also provided:

An electronic device comprising a memory, a processor, and computer instructions stored in the memory and run on the processor, wherein the computer instructions, when run by the processor, carry out the method described in Embodiment 1. For brevity, details are not repeated here.

It should be understood that, in this embodiment, the processor may be a central processing unit (CPU), or another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, and so on. A general-purpose processor may be a microprocessor or any conventional processor.

The memory may include read-only memory and random access memory, and provides instructions and data to the processor. A portion of the memory may also include non-volatile random access memory. For example, the memory may also store information on the device type.

A computer-readable storage medium for storing computer instructions, wherein the computer instructions, when executed by a processor, carry out the method described in Embodiment 1.

The method in Embodiment 1 may be carried out directly by a hardware processor, or by a combination of hardware and software modules in the processor. The software modules may reside in a storage medium well established in the art, such as random access memory, flash memory, read-only memory, programmable read-only memory, electrically erasable programmable memory, or registers. The storage medium is located in the memory; the processor reads the information in the memory and completes the steps of the above method in combination with its hardware. To avoid repetition, this is not described in detail here.

The method and system for dynamic prediction of service demand based on an attention mechanism and multimodality provided by the above embodiments can be implemented and have broad application prospects.

The above are only preferred embodiments of the present disclosure and are not intended to limit it; for those skilled in the art, the present disclosure may be modified and varied in many ways. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present disclosure shall fall within its scope of protection.

Claims

1. A method for dynamic prediction of service demand based on an attention mechanism and multi-modality, characterized by comprising the following steps:

acquiring text data and image data generated in the service use process;

respectively extracting the characteristics of the text data and the image data; inputting the extracted features into a pre-trained prediction model based on soft attention and multi-modal machine learning to realize prediction of service requirements of the user at the next moment;

the prediction model based on soft attention and multi-modal machine learning specifically comprises: realizing the fusion of multi-modal data features based on a feature sharing mechanism; processing the fused features by using a soft attention mechanism, and inputting the obtained result into a pre-trained GRU network to obtain a service interest feature vector representation of the user; and, based on the user information features and the service interest feature vector representation, predicting the service demand of the user at the next moment through a fully connected layer.

2. The method according to claim 1, wherein the feature sharing mechanism is used to implement fusion of multi-modal data features, specifically: respectively inputting the extracted text features and image features into a text feature network and an image feature network, and logically adding the text features and the output of each fully connected layer of the image feature network; and logically adding the image features and the output of each fully connected layer of the text feature network, and finally passing the outputs of the text feature network and the image feature network through one fully connected layer to obtain a fusion result.

3. The method as claimed in claim 1, wherein the text feature network and the image feature network are composed of a plurality of fully connected layers.

4. The method according to claim 1, wherein the fused features are processed by using a soft attention mechanism, specifically: calculating the weight of the fused feature information based on the soft attention mechanism, and obtaining diversified service interest expression vectors.

5. The method according to claim 1, wherein the obtained result is input to a pre-trained GRU network to obtain a service interest feature vector representation of the user, specifically: the GRU network learns the service used by the user at each moment and the influence of the service used at the past moment on the service used at the current moment, the learning result is stored in the hidden state vector at each moment, and a hidden state vector is output at each moment to represent the learned service interest information, so that the service use interest of the user at each moment is obtained.

6. The method as claimed in claim 1, wherein an auxiliary loss function is introduced into the GRU network, and the difference between the hidden state of the GRU at each moment and the service feature fusion vector at the next moment is calculated by the auxiliary loss function.

7. An attention-based and multi-modal dynamic prediction system for service demand, comprising:

a data acquisition unit for acquiring text data and image data generated during service use;

a demand prediction unit for performing feature extraction on the text data and the image data, respectively; inputting the extracted features into a pre-trained prediction model based on soft attention and multi-modal machine learning to realize prediction of service requirements of the user at the next moment;

8. The system according to claim 7, wherein the feature sharing mechanism is used to implement fusion of multi-modal data features, specifically: respectively inputting the extracted text features and image features into a text feature network and an image feature network, and logically adding the text features and the output of each fully connected layer of the image feature network; and logically adding the image features and the output of each fully connected layer of the text feature network, and finally passing the outputs of the text feature network and the image feature network through one fully connected layer to obtain a fusion result.

9. An electronic device comprising a memory, a processor and a computer program stored and executed on the memory, wherein the processor implements a method for dynamic prediction of service demand based on attention mechanism and multi-modality as claimed in any one of claims 1 to 6 when executing the program.

10. A non-transitory computer readable storage medium having stored thereon a computer program, wherein the program when executed by a processor implements a method for dynamic prediction of service demand based on attention mechanism and multi-modality according to any of claims 1-6.