CN114647787A

CN114647787A - A user-personalized recommendation method based on multimodal data

Info

Publication number: CN114647787A
Application number: CN202210322829.8A
Authority: CN
Inventors: 郭楠; 傅章鹏; 高天寒
Original assignee: Northeastern University China
Current assignee: Northeastern University China
Priority date: 2022-03-30
Filing date: 2022-03-30
Publication date: 2022-06-21
Anticipated expiration: 2042-03-30
Also published as: CN114647787B

Abstract

The invention provides a user-personalized recommendation method based on multimodal data and relates to the technical field of network recommendation. After obtaining the historical behavior records and impression logs that the user has permitted to be collected, the method extracts the features of every object involved in those records and of every object to be recommended, and maps them into the same multidimensional space, thereby integrating the multimodal data. A reinforcement learning model then serves as the recommendation system agent: the agent is trained on the collected user records, and the trained agent is used for recommendation, achieving user-personalized recommendation over multimodal data. The user historical behavior records, impression logs, and set of objects to be recommended that the method handles can all contain objects from multiple fields (such as text, pictures, and videos). Integrating them after feature extraction blurs the differences between modalities, which solves the problem that a traditional recommendation system can only recommend within a single field.

Description

A user-personalized recommendation method based on multimodal data

Technical Field

The invention relates to the technical field of network recommendation, and in particular to a user-personalized recommendation method based on multimodal data.

Background

In recent years, with the rapid development of the Internet, modern society has become an information-based, digital society. Data floods the world, and information explosion has become the norm. Faced with this volume of data, however, users make less effective use of information, which is the problem of information overload. Recommendation systems are one of the key technologies for addressing information overload. In fact, with the rapid development of the Internet, the Internet of Things, and cloud computing, personalized recommendation systems have become a standard feature of Internet products, and the information that users encounter in e-commerce, video, news, music, and other fields is closely tied to recommendation systems.

A recommendation system makes personalized recommendations by acquiring a user's historical behavior data, such as web browsing data, purchase history, or item ratings. Solutions to the recommendation problem include content-based filtering, collaborative filtering, and hybrid recommendation systems. With the rise of deep learning, recommendation algorithms built on neural networks, such as recurrent neural networks, convolutional neural networks, and generative adversarial networks, are also widely used. In addition, there are recommendation techniques based on reinforcement learning and on heterogeneous networks.

Most current recommendation systems only solve the single-modality personalized recommendation problem: they learn from the user's historical data in a single field (such as news, video, or music), so they capture the user's interests and preferences only in that field and recommend accordingly, which greatly limits the diversity of the recommendation results.

Summary of the Invention

The technical problem to be solved by the present invention is to address the above deficiencies of the prior art by providing a user-personalized recommendation method based on multimodal data. The method learns user preferences from the historical data that the user generates in multiple fields (such as news, video, and music) and forms a user portrait, so that it can recommend multimodal data and broaden the diversity of the recommendation results.

To solve the above technical problem, the present invention adopts the following technical solution:

A user-personalized recommendation method based on multimodal data, comprising the following steps:

Step 1: Obtain the total object set I and the set of objects to be recommended L₀, and determine which types of multimodal information the method needs to handle.

Step 2: For each modality handled, apply a corresponding feature extraction algorithm to extract features from every object in the total object set and map them into the same mathematical multidimensional space S.

Step 3: Obtain and accumulate user historical behavior records and impression logs. A historical behavior record is the user's click history before the impression, that is, the list of objects already clicked. An impression log is the list of objects displayed to the user together with the user's click behavior on each of them, where 1 means clicked and 0 means not clicked.

Step 4: Initialize the recommendation system: input the multidimensional space S obtained in step 2 and the user historical behavior records and impression logs obtained in step 3 into the recommendation system, and set the parameters of the recommendation system agent and the reinforcement learning environment.

Step 5: Train the recommendation system agent.

Step 6: Use the trained recommendation system agent to process the set of objects to be recommended L₀: the recommendation system simulates the user's interaction with L₀ and generates the corresponding impression log D.

Step 7: Extract from the impression log D the set of objects that the user interacted with, that is, the set of personalized interaction objects that the recommendation system agent predicts for this user, as the total recommendation list L.

Step 8: Process the total recommendation list L according to the requirements at hand to generate a multimodal recommendation list.

Step 9: Present the generated multimodal recommendation list to the user and obtain the actual user interaction results, which produce new impression logs. Each time the logs accumulate to a given number n of entries, further train and update the agent.
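The two record types described in step 3 can be sketched as a small data structure. This is a minimal illustration, not the format the method prescribes: the class name `BehaviorRecord`, the helper `parse_record`, and the `id-label` string encoding of displayed objects are all assumptions chosen for the example.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class BehaviorRecord:
    """One user's data as described in step 3 (field names are hypothetical)."""
    user_id: str
    history: List[str]            # objects the user clicked before the display
    log: List[Tuple[str, int]]    # (displayed object id, 1 = clicked / 0 = not clicked)


def parse_record(user_id: str, history: str, shown: str) -> BehaviorRecord:
    """Parse a space-separated click history and 'id-label' display tokens.

    The 'id-label' text encoding is an assumption made for this sketch.
    """
    log = []
    for token in shown.split():
        obj_id, label = token.rsplit("-", 1)
        log.append((obj_id, int(label)))
    return BehaviorRecord(user_id, history.split(), log)


record = parse_record("U1", "N1 N2 N3", "N4-1 N5-0 N6-0")
```

Here `record.history` holds the already-clicked objects and `record.log` pairs each displayed object with its 0/1 click label.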

The beneficial effects of the above technical solution are as follows. After obtaining the historical behavior records and impression logs that the user has permitted to be collected, the user-personalized recommendation method based on multimodal data provided by the present invention extracts the features of every object involved in them and of every object to be recommended, and maps them into the same multidimensional space, thereby integrating the multimodal data. It then uses a reinforcement learning model as the recommendation system agent, trains the agent on the collected user records, and uses the trained agent for recommendation, achieving user-personalized recommendation over multimodal data. Compared with the prior art, the user historical behavior records, impression logs, and set of objects to be recommended that the method handles can all contain objects from multiple fields (such as text, pictures, and videos). Integrating them after feature extraction blurs the distinction between modalities, which solves the problem that a traditional recommendation system can only recommend within a single field.

Description of Drawings

FIG. 1 is a flowchart of the recommendation method provided by an embodiment of the present invention;

FIG. 2 is a flowchart of the training method for the agent (Rainbow model) provided by an embodiment of the present invention.

Detailed Description

Specific embodiments of the present invention are described in further detail below with reference to the accompanying drawings and embodiments. The following embodiments illustrate the present invention and are not intended to limit its scope.

As shown in FIG. 1, the user-personalized recommendation method based on multimodal data in this embodiment proceeds as follows.

Step 2: For each modality handled, apply a corresponding feature extraction algorithm to extract features from every object in the total object set and map them into the same mathematical multidimensional space S. For example, image features are extracted with a neural network such as a CNN or RNN; text features are extracted with the TF-IDF algorithm or a CNN; video features are extracted with a P3D (Pseudo-3D Residual Networks) residual network; and audio features are extracted with the discrete wavelet transform (DWT) or the perceptual linear prediction (PLP) algorithm.

Step 5: Train the recommendation system agent. As shown in FIG. 2, the process is as follows:

Step 5.1: The agent requests the records of one user from the user historical behavior records and impression logs obtained in step 3.

Step 5.2: The memory returns the records and, using the result of step 2, replaces each object with its feature representation.

Step 5.3: The agent forms a user portrait from the user's historical behavior records and, based on that portrait, works through the impressions in order. If the agent's action matches the real log, the agent receives a reward; if it does not match, the agent receives no reward or receives a penalty.

Step 5.4: Once every entry in this user's impression log has been processed, one reinforcement learning frame is considered complete. If the training termination condition in the reinforcement learning environment parameters set in step 4 is met, agent training is complete and the method proceeds to step 6; otherwise it returns to step 5.1.
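The reward rule of steps 5.1 to 5.4 can be sketched as a replay of one user's logged entries. This is a simplified sketch under stated assumptions, not the patent's training procedure: the user portrait is taken to be the mean of the clicked objects' feature vectors, the policy is a fixed threshold on a dot product, the reward values of +1 and -1 are illustrative, and the learning update of the agent is omitted entirely. The names `build_profile`, `run_episode`, and `dot_policy` are hypothetical.

```python
from typing import Callable, List, Sequence, Tuple

Vector = Sequence[float]


def build_profile(history: List[Vector]) -> List[float]:
    """Hypothetical user portrait: the mean of the clicked objects' features."""
    dim = len(history[0])
    return [sum(v[i] for v in history) / len(history) for i in range(dim)]


def run_episode(
    history: List[Vector],
    log: List[Tuple[Vector, int]],
    policy: Callable[[List[float], Vector], int],
    reward_hit: float = 1.0,     # assumed reward when the action matches the log
    reward_miss: float = -1.0,   # assumed penalty when it does not
) -> float:
    """Replay one user's logged entries in order and total the reward."""
    profile = build_profile(history)
    total = 0.0
    for features, clicked in log:
        action = policy(profile, features)   # 1 = predict click, 0 = predict no click
        total += reward_hit if action == clicked else reward_miss
    return total


def dot_policy(profile: List[float], features: Vector) -> int:
    """Stand-in for the agent: predict a click when the dot product exceeds 0.5."""
    score = sum(p * f for p, f in zip(profile, features))
    return 1 if score > 0.5 else 0


history = [[1.0, 0.0], [0.8, 0.2]]
log = [([1.0, 0.0], 1), ([0.0, 1.0], 0), ([0.9, 0.1], 0)]
episode_reward = run_episode(history, log, dot_policy)
```

In a real agent the per-entry reward would feed a reinforcement learning update, and the outer loop would repeat over users until the termination condition of step 4 is met.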

Step 8: Process the total recommendation list L according to the requirements at hand to generate a multimodal recommendation list:

When the goal is to uncover the user's broad interests across multiple fields, or to diversify the modalities of the recommendation results, the total recommendation list L can be simply sorted according to the needs of the specific scenario and output as the multimodal recommendation list L₁.

When the goal is to recommend objects from other fields based on a given object (for example, choosing a picture for a news article, or recommending text hyperlinks for a movie), the objects of other modalities that lie closest to the current object in the multidimensional space S can be selected by the nearest-neighbor principle to generate the multimodal recommendation list L₂.

When the goal is to combine knowledge representations at different levels for more accurate and effective recommendation, a multimodal fusion algorithm is used to fuse the features of different modalities, and the agent is trained on the fused features to generate a recommendation list L₃ in which the same object has a multimodal presentation, such as a news headline paired with a news cover image.
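The nearest-neighbor selection used for the list L₂ can be sketched as follows. This is a minimal brute-force illustration that assumes Euclidean distance and a small in-memory dictionary standing in for the space S; the function name `nearest_other_modality`, the object ids, and the toy coordinates are hypothetical, and the source does not specify the distance metric.

```python
import math
from typing import Dict, List, Sequence, Tuple

# Hypothetical layout of the shared space S: object id -> (modality, vector).
Space = Dict[str, Tuple[str, Sequence[float]]]


def nearest_other_modality(query_id: str, space: Space, k: int = 1) -> List[str]:
    """Return the k objects of a different modality closest to the query object."""
    query_modality, query_vec = space[query_id]
    candidates = [
        (math.dist(query_vec, vec), obj_id)   # Euclidean distance is an assumption
        for obj_id, (modality, vec) in space.items()
        if modality != query_modality
    ]
    candidates.sort()
    return [obj_id for _, obj_id in candidates[:k]]


space: Space = {
    "news_1": ("text", [0.1, 0.9]),
    "news_2": ("text", [0.8, 0.2]),
    "img_1": ("image", [0.2, 0.8]),
    "img_2": ("image", [0.9, 0.1]),
}
l2 = nearest_other_modality("news_1", space, k=1)
```

For a large object set, an approximate nearest-neighbor index would likely replace the linear scan, but the selection rule stays the same.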

This embodiment applies the method of the present invention step by step to the Microsoft News Dataset (MIND):

Step 1: The MIND dataset contains only news text and therefore does not meet the needs of a multimodal task. Using the news links provided in the MIND dataset, the pictures contained in each news article are crawled with the spiderFlow crawler tool (https://github.com/chenyuansgit/spiderFlow) to form a picture dataset that corresponds to the news text. The total object set I is therefore all news texts and news pictures; the set of objects to be recommended L₀ is the news texts in the impression logs of the test set together with their corresponding pictures; and the multimodal information types to be handled are text and images.

Step 2: A corresponding feature extraction algorithm is applied to each of the two modalities, text and images. For news text, base text features are extracted by TF-IDF term weighting, and feature selection is then performed with the chi-square test to reduce the dimensionality of the base text features, which improves the model's generalization, reduces overfitting, and makes the relationship between features and feature values easier to understand. News picture features are extracted with the pre-trained VGG16 neural network in the Keras library. Finally, the UMAP algorithm (https://arxiv.org/abs/1802.03426) maps the extracted features of every object in the total object set into the same mathematical multidimensional space S.
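The TF-IDF term weighting that step 2 uses for news text can be sketched in plain Python. This shows one common unsmoothed variant (term frequency times the logarithm of inverse document frequency) on toy tokenized documents; the source does not state which variant or library was used, and library implementations often apply smoothing and normalization. Chi-square feature selection, VGG16 extraction, and the UMAP projection are not shown. The function name `tf_idf` and the sample documents are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List


def tf_idf(docs: List[List[str]]) -> List[Dict[str, float]]:
    """Weight each term by (count / doc length) * log(N / document frequency)."""
    n_docs = len(docs)
    # Document frequency: in how many documents each term appears at least once.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


docs = [
    ["stock", "market", "falls"],
    ["market", "rally", "continues"],
    ["team", "wins", "final"],
]
weights = tf_idf(docs)
```

In the first document, "stock" appears in only one document while "market" appears in two, so "stock" receives the larger weight.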

Step 3: Read the user historical behavior records and impression logs already accumulated in the dataset.

Step 4: Initialize the recommendation system: input the multidimensional space S obtained in step 2 and the user historical behavior records and impression logs obtained in step 3 into the recommendation system, and set the parameters of the recommendation system agent and the reinforcement learning environment (taking the Rainbow model as an example). The action space is the integers from 0 to 10; the action divided by 10 is the predicted probability that the object is clicked. The state space is a multidimensional matrix reflecting the user portrait and the features of the object currently being judged. The reward is determined by the AUC metric computed from the predictions output for the impression log.

Step 5.1: The agent requests the records of one user from the set obtained in step 3;

Step 5.2: The memory returns the records and, using the result of step 2, replaces each object with its feature representation;

Step 5.3: Based on the user's history, the agent uses the Mean-Shift algorithm to cluster and separate the user's preference features, forming a user portrait. Using this portrait, it predicts, entry by entry, the user's click-through rate for each object in the impression log, and is rewarded according to the AUC metric computed from the predictions and the real records;

Step 5.4: Once every entry in this user's impression log has been processed, one reinforcement learning frame is considered complete. If the training termination condition in the reinforcement learning environment parameters set in step 4 is met, agent training is complete; otherwise the method returns to step 5.1.
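The action-to-probability mapping of step 4 and the AUC-based reward of step 5.3 can be sketched as below. This is an illustration under assumptions: AUC is computed by direct pairwise comparison with ties counted as one half, and the raw AUC value is used as the reward, whereas the source only says the reward is determined by the AUC metric. The function names and sample values are hypothetical.

```python
from typing import List


def action_to_probability(action: int) -> float:
    """Map a discrete action in 0..10 to a click probability, as in step 4."""
    if not 0 <= action <= 10:
        raise ValueError("action must be an integer between 0 and 10")
    return action / 10


def auc(labels: List[int], scores: List[float]) -> float:
    """Fraction of (clicked, unclicked) pairs ranked correctly; ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one clicked and one unclicked object")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


actions = [9, 2, 6, 6]   # agent outputs for four displayed objects
labels = [1, 0, 1, 0]    # real click labels from the log
probs = [action_to_probability(a) for a in actions]
reward = auc(labels, probs)   # assumption: the AUC value itself is the reward
```

With these values, three of the four clicked/unclicked pairs are ordered correctly and one is tied, giving an AUC of 0.875.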

Step 6: Use the trained recommendation system agent to process the impression logs in the test set: predict the user's click probability for each object from the user historical behavior records in the test set, compute the AUC metric, and record the prediction results D.

Step 7: Take the objects with high click probability in the prediction results D as the total recommendation list L.

Step 8: The total recommendation list L can be processed according to the requirements at hand to generate a multimodal recommendation list. One such requirement is handled as follows:

When the goal is to recommend objects from other fields based on a given object (for example, choosing a picture for a news article, or recommending news hyperlinks for a picture), the objects of other modalities that lie closest to the current object in the multidimensional space S can be selected by the nearest-neighbor principle to generate the multimodal recommendation list L₂.

Step 9: Present the generated multimodal recommendation list to the user and obtain the actual user interaction results, which produce new impression logs. Each time the logs accumulate to a given number n of entries, the agent can be further trained and updated.
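The update trigger in step 9 can be sketched as a buffer that hands a batch of new log entries to a retraining callback each time n entries have accumulated. This is a hypothetical sketch: the class name `LogBuffer`, the entry layout, and the callback interface are assumptions, and the retraining itself is represented only by a placeholder that collects batches.

```python
from typing import Callable, List, Tuple

Entry = Tuple[str, str, int]   # hypothetical layout: (user id, object id, click label)


class LogBuffer:
    """Collect new log entries and trigger retraining every n entries."""

    def __init__(self, n: int, retrain: Callable[[List[Entry]], None]) -> None:
        if n <= 0:
            raise ValueError("n must be positive")
        self.n = n
        self.retrain = retrain
        self.pending: List[Entry] = []

    def add(self, entry: Entry) -> None:
        self.pending.append(entry)
        if len(self.pending) >= self.n:
            # Hand over the full batch and start collecting a fresh one.
            batch, self.pending = self.pending, []
            self.retrain(batch)


batches: List[List[Entry]] = []   # placeholder for the agent's update routine
buffer = LogBuffer(n=2, retrain=batches.append)
for entry in [("U1", "N4", 1), ("U1", "N5", 0), ("U2", "N4", 0)]:
    buffer.add(entry)
```

After the three entries above, one batch of two has been passed to the callback and one entry is still waiting in `buffer.pending`.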

Through the above steps, the experimental results obtained are shown in Table 1 (step 5 was run with different recommendation models).

Table 1. Experimental results of various recommendation models

The method of the present invention, which uses a reinforcement learning model as the recommendation system agent, was tested on the Microsoft News Dataset (MIND). The results show that the reinforcement learning agent achieves performance metrics close to those of traditional recommendation models, that is, it is capable of the recommendation task. The MIND dataset was then extended with picture data by crawling, turning it into a mixed text-and-picture dataset, and further experiments were carried out. Judging from these results, the metrics of the present invention also satisfy the needs of user-personalized recommendation over multimodal data and meet the requirements of a recommendation system.

Finally, it should be noted that the above embodiments only illustrate the technical solutions of the present invention and do not limit them. Although the present invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that the technical solutions described in those embodiments can still be modified, and that some or all of their technical features can be replaced by equivalents; such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the scope defined by the claims of the present invention.

Claims

1. A user-personalized recommendation method based on multimodal data, characterized in that the method comprises the following steps:

Step 1: Obtain the total object set I and the set of objects to be recommended L₀, and determine which types of multimodal information the method needs to handle;

Step 2: For each modality handled, apply a corresponding feature extraction algorithm to extract features from every object in the total object set and map them into the same mathematical multidimensional space S;

Step 3: Obtain and accumulate user historical behavior records and impression logs;

Step 4: Initialize the recommendation system: input the multidimensional space S obtained in step 2 and the user historical behavior records and impression logs obtained in step 3 into the recommendation system, and set the parameters of the recommendation system agent and the reinforcement learning environment;

Step 5: Train the recommendation system agent;

Step 6: Use the trained recommendation system agent to process the set of objects to be recommended L₀: the recommendation system simulates the user's interaction with L₀ and generates the corresponding impression log D;

Step 7: Extract from the impression log D the set of objects that the user interacted with, that is, the set of personalized interaction objects that the recommendation system agent predicts for this user, as the total recommendation list L;

Step 8: Process the total recommendation list L according to the requirements at hand to generate a multimodal recommendation list;

Step 9: Present the generated multimodal recommendation list to the user and obtain the actual user interaction results, which produce new impression logs; each time the logs accumulate to a given number n of entries, further train and update the agent.

2. The user-personalized recommendation method based on multimodal data according to claim 1, characterized in that: the historical behavior record in step 3 is the user's click history before the impression, that is, the list of objects already clicked; the impression log is the list of objects displayed to the user together with the user's click behavior on each of them, where 1 means clicked and 0 means not clicked.

3. The user-personalized recommendation method based on multimodal data according to claim 1, wherein the specific process of the step 5 is:

Step 5.1: The agent requests the records of one user from the user historical behavior records and impression logs obtained in step 3;

Step 5.2: The memory returns the records and, using the result of step 2, replaces each object with its feature representation;

Step 5.3: The agent forms a user portrait from the user's historical behavior records and, based on that portrait, works through the impressions in order; if the agent's action matches the real log, the agent receives a reward, and if it does not match, the agent receives no reward or receives a penalty;

Step 5.4: Once every entry in this user's impression log has been processed, one reinforcement learning frame is considered complete; if the training termination condition in the reinforcement learning environment parameters set in step 4 is met, agent training is complete and the method proceeds to step 6; otherwise it returns to step 5.1.

4. The user-personalized recommendation method based on multimodal data according to any one of claims 1-3, characterized in that: in step 2, image features are extracted with a CNN; text features are extracted with the TF-IDF algorithm or a CNN; video features are extracted with a P3D residual network; and audio features are extracted with the discrete wavelet transform or the perceptual linear prediction algorithm.

5. The user-personalized recommendation method based on multimodal data according to any one of claims 1-3, characterized in that: in step 8, when the goal is to uncover the user's broad interests across multiple fields, or to diversify the modalities of the recommendation results, the total recommendation list L is simply sorted according to the needs of the specific scenario and output as the multimodal recommendation list L₁.

6. The user-personalized recommendation method based on multimodal data according to any one of claims 1-3, characterized in that: in step 8, when the goal is to recommend objects from other fields to the user based on a given object, the objects of other modalities that lie closest to the current object in the multidimensional space S are selected by the nearest-neighbor principle to generate the multimodal recommendation list L₂.

7. The user-personalized recommendation method based on multimodal data according to any one of claims 1-3, characterized in that: in step 8, when the goal is to combine knowledge representations at different levels for more accurate and effective recommendation, a multimodal fusion algorithm is used to fuse the features of different modalities, and the agent is trained on the fused features to generate a recommendation list L₃ in which the same object has a multimodal presentation.