CN116932896A

CN116932896A - Attention mechanism-based multimode fusion personalized recommendation architecture

Info

Publication number: CN116932896A
Application number: CN202310796244.4A
Authority: CN
Inventors: 张林超
Original assignee: Yangtze River Delta Research Institute of UESTC Huzhou
Current assignee: Yangtze River Delta Research Institute of UESTC Huzhou
Priority date: 2023-07-02
Filing date: 2023-07-02
Publication date: 2023-10-24

Abstract

The invention relates to the technical field of recommendation algorithm models and discloses a multi-model fusion architecture based on the attention mechanism. A multi-model result fusion method is constructed to provide recommendations across multiple take-out scenes and to improve the diversity of the recommendation results. The multi-scene recommendation function is verified through three scenes of a take-out platform: food recommendation, store recommendation and advertisement recommendation. The accuracy of the recommendation results is improved by fusing the recommendation results of an AutoInt model and a Transformer model. The proposed model is implemented and performance-tested, and the effectiveness of the recommendation results is analyzed. The results show that the proposed multi-model fusion recommendation architecture can improve recommendation and distribution efficiency in a take-out scene.

Description

Attention mechanism-based multimode fusion personalized recommendation architecture

Technical Field

The invention belongs to the technical field of personalized recommendation, and particularly relates to a personalized recommendation scheme based on a multi-model fusion framework. The multi-model fusion architecture improves the accuracy of recommendation results across multiple scenes and is suitable as an architecture for multi-scene recommendation services.

Background

With the rapid development of mobile internet technology, recommendation systems have become a product of the new information technology era. They select products or content of interest to users, which increases user stickiness to a certain extent, reduces user churn and brings considerable benefits to enterprises. The current O2O-based service mode overcomes the limits on two-way selection between individual merchants and users, offering users a wide range of choices and merchants a wider customer base. However, this consumption mode is still not sufficient to provide users with diversified choices. For example, food recommendations consider only a single factor and do not fully take into account the dietary characteristics of different users; existing delivery systems therefore still lack personalization. Inefficient order delivery during rush hours also adversely affects the user experience.

Recommendation systems have gradually moved from the initial collaborative filtering and content-based algorithms to new recommendation algorithms based on deep learning. Deep learning can quickly and efficiently generate a recommendation model from a recommendation algorithm and data, and repeated iteration of the model and the data makes multi-scene recommendation tasks possible. An efficient and reliable recommendation system has therefore become a powerful tool for improving the platform economy.

Disclosure of Invention

The invention aims to improve user satisfaction from the perspective of the take-out platform serving its users, and addresses the problem of recommending the users' favorite food, shops and related advertisements in the platform economy. To implement the service plan efficiently and effectively improve the ordering satisfaction of target users, the invention provides a personalized recommendation service based on a multi-model fusion framework and applies it to diversified scenes. The invention therefore focuses on the detailed analysis and construction of recommendation models for food, restaurants and advertisements. The following is a summary of the invention.

The invention designs a multi-model fusion architecture method and constructs a multi-model algorithm fusion based on the attention mechanism. The fusion algorithm mainly addresses the following problems.

1. Efficiency of modeling of data feature sequences.

The most critical problem faced by a recommendation algorithm is the problem of data features. Feature extraction and relevance assessment of the data are critical to the efficiency of modeling data feature behavior. The invention classifies and extracts the training data to divide it into semantic types and provide meaningful recommendations for users. An advantage of fusing models of a similar kind is that the data sequence processing and extraction processes are identical across the models, which reduces system load and improves the efficiency of feature sequence modeling.

2. Optimal solution recommendation based on multimodal service attributes.

The data attributes are identified and decomposed, and sequence sets with a high degree of association are classified, for example the attribute identifiers and attribute sets of dish data. The user attribute set mainly comprises attributes such as gender, preferences, order history and location. The relation between users and menus is extracted through model training, and a recommendation model is generated to obtain an optimal solution.

3. Design of a multi-model fusion recommendation model.

The invention is applied to a take-out recommendation platform, where the primary concern of the recommendation system is how to produce satisfactory recommendations according to user preferences. The recommendation algorithm generates recommendation results based on the fit between data features, so enhancing feature performance is an important issue. Fusing the results of multiple models can improve the accuracy of the recommendation model. The idea of cross fusion of results is to use different training sets and insert the results of different recommendation models into the recommendation results. The method can integrate multiple recommendation models quickly and conveniently and improves extensibility under a multi-model architecture.

4. Verification mechanism for the experimental results of the multi-model fusion technique.

The invention verifies the feasibility and effectiveness of the fusion method in order to determine whether it has a positive influence on a take-out platform. The research focuses on integrating multiple model algorithms to improve recommendation efficiency for target users, and analyzes the classification of data feature attributes for the multi-model algorithm. The main goal of the personalized, service-oriented recommendation framework is to provide a better user experience by improving the performance of existing take-out recommendation service models. In view of the foregoing, the invention has important theoretical and practical significance.

The multi-model fusion architecture method is suited to recommendation scenes in which multiple results must be displayed simultaneously and efficiently. Each recommendation model receives different feature inputs, and the relevant features are extracted for fusion, which diversifies the recommendation features. The process comprises the following steps:

(1) First, the data is divided into a train_set and a test_set, and the train_set is then further divided into a train_set and a valid_set;

(2) The multiple models in the first layer fusion model structure may be of the same type or of different types;

(3) The models from step (2) are trained on the train_set and then used to predict the valid_set and the test_set, yielding a valid_result and a prediction_result for each model. The valid_result outputs of the models are concatenated to form new training data, and the prediction_result outputs are concatenated to form new prediction data;

(4) Then the second layer takes the formed data as a training set to carry out model learning;

(5) The secondary test set (i.e., the new prediction data set) is then predicted using the trained second-layer model, and the resulting output is taken as the output for the entire test set.
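Steps (1) to (5) above can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the two first-layer models are toy single-feature threshold classifiers standing in for AutoInt and the Transformer, the second-layer model is a toy lookup table standing in for multi-head self-attention, the data is synthetic, and the 80/20 and 70/30 split ratios are taken from the description of fig. 3. The names `split`, `ThresholdModel`, `TableMetaModel` and `stack` are hypothetical.

```python
import random
from collections import Counter, defaultdict
from statistics import mean

def split(rows, percent, rng):
    """Shuffle a copy of rows and cut it at the given integer percentage."""
    rows = list(rows)
    rng.shuffle(rows)
    cut = len(rows) * percent // 100
    return rows[:cut], rows[cut:]

class ThresholdModel:
    """Toy first-layer model (assumption): thresholds one feature.
    Assumes larger feature values indicate the positive class."""
    def __init__(self, index):
        self.index = index
        self.threshold = 0.5
    def fit(self, rows):
        pos = [x[self.index] for x, y in rows if y == 1]
        neg = [x[self.index] for x, y in rows if y == 0]
        if pos and neg:
            self.threshold = (mean(pos) + mean(neg)) / 2
        return self
    def predict(self, x):
        return 1 if x[self.index] > self.threshold else 0

class TableMetaModel:
    """Toy second-layer model (assumption): majority label per tuple of
    first-layer outputs."""
    def fit(self, rows):
        votes = defaultdict(Counter)
        for outputs, y in rows:
            votes[outputs][y] += 1
        self.table = {k: c.most_common(1)[0][0] for k, c in votes.items()}
        return self
    def predict(self, outputs):
        # Unseen combination: fall back to a majority vote of the outputs.
        return self.table.get(outputs, int(sum(outputs) * 2 >= len(outputs)))

def stack(data, seed=0):
    rng = random.Random(seed)
    # Step (1): train_set / test_set, then train_set / valid_set.
    train_valid, test_set = split(data, 80, rng)
    train_set, valid_set = split(train_valid, 70, rng)
    # Steps (2)-(3): train the first-layer models and predict both sets.
    models = [ThresholdModel(0).fit(train_set), ThresholdModel(1).fit(train_set)]
    valid_result = [(tuple(m.predict(x) for m in models), y) for x, y in valid_set]
    prediction_result = [tuple(m.predict(x) for m in models) for x, _ in test_set]
    # Step (4): the second layer learns from the concatenated valid results.
    meta = TableMetaModel().fit(valid_result)
    # Step (5): the second layer predicts the new prediction data.
    return [meta.predict(o) for o in prediction_result], test_set

# Synthetic data: label is 1 when the two features sum to more than 1.
_rng = random.Random(1)
data = []
for _ in range(200):
    a, b = _rng.random(), _rng.random()
    data.append(((a, b), 1 if a + b > 1 else 0))

final, test_set = stack(data)
```

The point of the sketch is the data flow: the second-layer model never sees the raw features, only the concatenated outputs of the first-layer models.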

The Attention extension algorithms are applied in the fusion model as shown in fig. 2. Two algorithms are used, AutoInt and Transformer, which are improved models based on self-attention and multi-head attention respectively. Since the features in the data set are typically high-dimensional sparse features, AutoInt is chosen because it is suited to handling high-order cross features. The Transformer is suited to processing unordered sequence features, so that food, stores and advertisements can be recommended in the take-out scenario. Reuse of data features between the recommendation algorithms and a high degree of data fusion between the models improve the efficiency of the recommendation system. The flow of the fusion architecture in the take-out application scene is as follows:

(1) Evaluating the data to generate a data set, constructing a multidimensional knowledge graph data set, and carrying out feature processing and feature sequence vectorization on the data set;

(2) The feature sequence vectors are input into two recommendation algorithms, which process the vector sequences asynchronously. According to the defined knowledge graph, the advertisement recommendation algorithm uses an AutoInt model for semantic analysis. Food and restaurant recommendation use a Transformer recommendation algorithm to train the required feature vectors. Each recommendation scene is served by its primary recommendation model, and the outputs of the other recommendation models are provided as additional service results. When the fusion prediction scores of recommendation results are the same, the recommendation result of the scene's primary model is used preferentially;

(3) Once the two recommendation algorithms produce output, they are considered to be responsible for different multi-scenario recommendation tasks;

(4) When the multi-model recommendation algorithm is fused, each model outputs a feature sequence with the same dimension, so that the result can be spliced from the complete data set;

(5) The model trains the recommendation models of different scenes according to different recommendation scenes and outputs recommendation results.
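The tie-breaking rule in step (2), under which the scene's primary model wins when fusion prediction scores are equal, can be sketched as below. The function name `fuse_rankings`, the dictionary layout, and the choice to keep the primary model's score when both models return the same item are illustrative assumptions; the text specifies only the preference on equal scores.

```python
def fuse_rankings(primary, secondary, top_k=5):
    """Merge two {item: score} results into one ranked list.

    On equal scores the item from the scene's primary model ranks first.
    If both models return the same item, the primary model's score is kept
    (an assumption made for this sketch).
    """
    candidates = []
    for item, score in primary.items():
        candidates.append((score, 0, item))      # 0 marks the primary model
    for item, score in secondary.items():
        if item not in primary:
            candidates.append((score, 1, item))  # 1 marks the other model
    # Sort by score descending, then primary before secondary, then by name.
    candidates.sort(key=lambda c: (-c[0], c[1], c[2]))
    return [item for _, _, item in candidates[:top_k]]

primary = {"noodles": 0.9, "dumplings": 0.7}
secondary = {"burger": 0.7, "salad": 0.95, "noodles": 0.4}
ranked = fuse_rankings(primary, secondary, top_k=4)
# ranked == ["salad", "noodles", "dumplings", "burger"]
```

Here "dumplings" and "burger" both score 0.7, and "dumplings" is placed first because it comes from the primary model.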

The specific method of applying the multi-recommendation fusion architecture to a meal delivery service is shown in fig. 3. For the feature results generated by the primary trainer, the data is divided into a training data set, a test data set and a verification data set, which improves the recommendation accuracy of the fusion model and the service efficiency in various scenes. The following sections explain the data set processing, the AutoInt model, the Transformer model and GA, and finally analyze the application process and the fusion process of the multi-model fusion. According to the proposed fusion method, a recommendation fusion flow chart for the take-out scene is constructed. The input layer divides the data into an 80% training set and a 20% test set, and the training set is further divided into a 70% training data set and a 30% verification data set. The training data set is input into the recommendation models (AutoInt, Transformer) for training. The verification data set is used to verify each model, and the verification results of the models are combined to obtain a verification result set. The prediction data set is used to obtain the predictions of each model, and these outputs are concatenated to obtain a prediction result set. The verification result set is used as the new training data set and the prediction result set as the new prediction data set, and both are input into the multi-head self-attention model. The fused result is then re-fitted to output the final result.
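As a worked example of the split ratios described above, the following sketch computes the partition sizes for a data set of a given size. The function name `partition_sizes` and the example size of 1000 records are assumptions for illustration; integer arithmetic is used to avoid floating-point rounding.

```python
def partition_sizes(n_records):
    """Return (train, validation, test) sizes for the 80/20 then 70/30 split."""
    train_valid = n_records * 80 // 100   # 80% for training plus validation
    test = n_records - train_valid        # remaining 20% for testing
    train = train_valid * 70 // 100       # 70% of the 80% for training
    valid = train_valid - train           # remaining 30% for validation
    return train, valid, test

sizes = partition_sizes(1000)
# sizes == (560, 240, 200)
```

For 1000 records this gives 560 training, 240 verification and 200 test records, so the first-layer models are fitted on 56% of the data overall.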

The main method of fusion in the take-out recommendation scene, as shown in fig. 4, is to cross the recommendation results of the two recommendation models under the fusion model architecture so as to ensure the diversity of the results. A characteristic of the fusion model is that recommendation results can be shared. The multi-model structure uses an encoder-decoder structure to encode and decode features. The model's algorithm is a Multi-Head Attention structure built on the Self-Attention structure and a series of extensions of the Attention mechanism. The invention uses an AutoInt model, a Transformer model and an attention-based genetic algorithm, and applies the models to three scenes: advertisement recommendation, food recommendation and restaurant recommendation. The advertisement recommendation scene is essentially click-through rate (CTR) prediction for recommended items, and an AutoInt model is adopted for feature training. Food and restaurant recommendation are applied to package matching and personalized restaurant recommendation scenarios, using different fields of the data set to build different features and thereby divide the data set into different parts.

The proposed recommendation model implements the AutoInt model, whose data features are user behavior features and click features. The user behavior features are constructed from the users' browsing records, shopping lists and collection lists; the click features are built from the users' clicking actions (clicking a button, a tab or a page). The model outputs the CTR for the page position clicked by the user, predicts the user's clicking behavior and places different advertisements accordingly.

The Transformer model is used to recommend foods and restaurants. The input data features are user features (gender, age, home address, favorite food, historical orders), food features (price, category, ingredients, food score) and restaurant features (restaurant type, restaurant score, restaurant location); the output is the food recommendation sequence and the restaurant recommendation sequence fitted by the Transformer. The features are trained with the Transformer model, the recommendation results output by the models are shared, and cross fusion diversifies the recommendation results and thereby improves the recommendation effect.

In the Multi-Head Attention mechanism, the flow and calculation of the features in the model are shown in fig. 5. The features are divided into Q, K and V. Each layer's Q, K and V map the input embedding to a 6-dimensional space, and each layer has 2 self-attention heads, producing the model network structure shown. For the Q, K and V vector mapping in self-attention, the inputs are embedded as follows. For discrete features, such as the items in the user's historical click behavior and the single clicked item, the corresponding embeddings are looked up and averaged. For continuous features, the corresponding embedding is taken from the embedding matrix and multiplied by the value of the continuous feature. After both kinds of feature data have been embedded, embedding vectors of equal length are obtained, and these vectors are concatenated to form the input of the Interacting Layer. The feature data then passes through several successive layers of feature iteration by the self-attention algorithm; finally the output is mapped through a fully connected layer and the model parameters are updated using the cross-entropy loss, which completes the model iteration over Q, K and V and outputs a recommendation result.
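The embedding step described above can be sketched as follows: a discrete feature with several values is embedded by averaging the looked-up rows, and a continuous feature is embedded by scaling its embedding vector with the feature value. The helper names, the 2-dimensional embeddings and all numeric values are hypothetical and chosen only to keep the arithmetic easy to follow.

```python
def embed_discrete(ids, table):
    """Average the embedding rows looked up for a discrete feature's ids."""
    rows = [table[i] for i in ids]
    dim = len(rows[0])
    return [sum(row[d] for row in rows) / len(rows) for d in range(dim)]

def embed_continuous(value, vector):
    """Scale a continuous feature's embedding vector by the feature value."""
    return [value * v for v in vector]

# Hypothetical embedding table for item ids (embedding size 2).
item_table = {0: [1.0, 0.0], 1: [0.0, 1.0], 2: [1.0, 1.0]}
# Hypothetical embedding vector for one continuous feature.
price_vector = [0.5, -1.0]

clicked_items = embed_discrete([0, 2], item_table)   # [1.0, 0.5]
price = embed_continuous(3.0, price_vector)          # [1.5, -3.0]

# Equal-length embeddings, one row per feature, form the Interacting Layer input.
interacting_input = [clicked_items, price]
```

Because every feature ends up as a vector of the same length, discrete and continuous features can be fed to the same self-attention layers.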

The working principle of the fusion of the AutoInt model and the Transformer model is shown in fig. 6. The data set is first cleaned and screened. In the model's input data, <Label> indicates whether the advertisement was clicked, <Integer Feature> represents the continuous features, of which 13 are selected in total, and <Categorical Feature> represents the categorical (discrete) features, of which there are 26 in total. The advertisement model uses the CTR (click-through rate) metric to guide product pushes and advertisement placement. CTR prediction estimates the click behavior for each advertisement and predicts whether the user will click. The CTR estimation model takes multiple factors and features into account, is trained on a large amount of historical data and ultimately improves the recommendation service of the advertisement model.

The Transformer model is trained on the user behavior click sequence together with the click predictions output by the AutoInt model. Each item in the user behavior sequence is represented by a combination of sequence item features and a Location feature. The sequence item features comprise item_id and category_id, and the Location feature is used to capture order information in the sequence. The characteristics of the customer can be screened at the same time: based on information such as the user's gender, preferences and location, a personalized layer is added to the model so that features that are not salient for the user are filtered out and the weights of salient features are increased, ensuring that the output sequence features more accurately match the user's preferences. The inputs to the model are store set data and menu set data. After the two kinds of data are cleaned, the input data forms a data sequence in SKU units.

In the initial state the model is cold-started: nearby foods and stores are recommended by location analysis, and after a period of time the initial data collected is used to update the model's recommendations. The output of the Transformer model includes a food recommendation sequence and a restaurant recommendation sequence. The restaurant recommendation sequence contains rich information about each restaurant, such as restaurant location, restaurant type and restaurant preferences.

The working principle of the Transformer model is shown in fig. 7. To better represent the dependency information between the menu and the dishes in the data set, an Encoder-Decoder model based on the Transformer structure is used. The Encoder extracts the semantic information of the dishes with a hierarchical Attention structure consisting of two parts: bottom-level dish-level Attention and inter-dish Attention. The decoder section extracts user behavior data and SKU unit data of the food/store to input into the encoder. For dish-level Attention, the semantic vector of the dish name is obtained through word-level Multi-Head Attention, and Multi-Head Attention is also used to obtain the semantic vector of the dish label. For the transaction attributes of dishes, a multi-layer fully connected network is used to extract the semantic vector of the transaction features. The dish name semantic vector, the dish label semantic vector and the transaction feature semantic vector are then concatenated and passed through a feed-forward layer to obtain the dish semantic vector. For the inter-dish Attention layer, multi-layer Multi-Head Attention is applied to the restaurant's list of dish semantic vectors to obtain the restaurant's menu-level semantic vector.

The decoder portion of the model also decodes with Multi-Head Attention. Its inputs include contextual information such as user preference information, the decoding inputs from earlier steps and price constraints. At each step the model outputs the probability distribution over the dishes in the merchant menu. During decoding, Multi-Head Attention is performed over the user preference information and the merchant's menu-level semantic vectors. The decoder scores and matches the target menu against the candidate menus and sorts them by score.

A decoder ordinarily relies on one or more fixed dictionaries as the candidate set during decoding and outputs, at each step, the probability distribution over the words in the candidate set together with the selected word. For the recommendation network, the candidate set used by the decoder comes from the encoder's input item list rather than from a fixed-dimension external menu vocabulary. The same applies to restaurant recommendation. The output of the Transformer model is also added to the output of the AutoInt model to ensure the diversity of the model results.
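The decoding behavior described above, in which the candidate set is the merchant's own item list and each step yields a probability distribution subject to a price constraint, can be sketched as below. This is a simplified illustration under several assumptions: dot-product scoring replaces Multi-Head Attention, the query vector stays fixed across steps instead of being updated from earlier decoding inputs, selection is greedy, and the dish vectors, prices and budget are invented.

```python
import math

def decode_step(query, dish_vectors, prices, chosen, budget):
    """Return a probability distribution over the merchant's remaining dishes."""
    scores = {
        dish: sum(q * v for q, v in zip(query, vec))
        for dish, vec in dish_vectors.items()
        # Mask dishes already selected or above the remaining budget.
        if dish not in chosen and prices[dish] <= budget
    }
    if not scores:
        return {}
    peak = max(scores.values())  # subtract the peak for numerical stability
    weights = {dish: math.exp(s - peak) for dish, s in scores.items()}
    total = sum(weights.values())
    return {dish: w / total for dish, w in weights.items()}

def greedy_decode(query, dish_vectors, prices, budget, max_steps=3):
    """Pick the most probable dish at each step until nothing affordable remains."""
    chosen = []
    for _ in range(max_steps):
        probs = decode_step(query, dish_vectors, prices, chosen, budget)
        if not probs:
            break
        best = max(probs, key=probs.get)
        chosen.append(best)
        budget -= prices[best]
    return chosen

# Hypothetical encoder outputs for one merchant's menu.
dish_vectors = {"rice": [1.0, 0.0], "soup": [0.0, 1.0], "steak": [1.0, 1.0]}
prices = {"rice": 5, "soup": 8, "steak": 30}

first_step = decode_step([1.0, 0.5], dish_vectors, prices, [], 20)
combo = greedy_decode([1.0, 0.5], dish_vectors, prices, budget=20)
# combo == ["rice", "soup"]; "steak" never fits the budget of 20
```

Because the candidates are taken from the encoder input, the output distribution has one entry per dish of the current merchant rather than one per word of a fixed vocabulary.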

The working principle of the AutoInt model is shown in fig. 8. The model selects items with a higher CTR as commodity advertisement recommendations. In the single-head Self-Attention process, assume there are four features, each with an embedding size of 3, so that a 4×3 input matrix is obtained. Three projection matrices for Q, K and V then map the original input into new spaces to obtain the Query matrix, the Key matrix and the Value matrix. Next, the attention between each feature and every other feature is calculated: the Query matrix is multiplied by the transpose of the Key matrix, and the product is passed through a SoftMax layer to obtain the Attention matrix. The Attention matrix is a 4×4 matrix in which each element a_ij represents the importance of the j-th key to the i-th query, and each row sums to 1. Taking the light blue dot in the figure as an example, it represents the attention between the second feature in the Query and the third feature in the Key. Finally, the Attention matrix is multiplied by the Value matrix to obtain the self-attention result.
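The single-head self-attention computation described above can be sketched in plain Python for four features with embedding size 3. The input values and the use of identity matrices as the Q, K and V projections are assumptions made to keep the example small; in a trained model the projection matrices are learned. No scaling factor is applied because the text does not mention one.

```python
import math

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def softmax_rows(m):
    """Apply SoftMax to each row so that every row sums to 1."""
    out = []
    for row in m:
        peak = max(row)
        exps = [math.exp(v - peak) for v in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out

def self_attention(x, w_q, w_k, w_v):
    """Return the attention matrix and the self-attention result for input x."""
    q, k, v = matmul(x, w_q), matmul(x, w_k), matmul(x, w_v)
    k_t = [list(col) for col in zip(*k)]       # transpose of the Key matrix
    attention = softmax_rows(matmul(q, k_t))   # one row per query feature
    return attention, matmul(attention, v)

# Four features, each with embedding size 3 (illustrative values).
x = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
    [1.0, 1.0, 0.0],
]
# Identity projections (assumption) so that Q = K = V = x.
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]

attention, output = self_attention(x, identity, identity, identity)
```

With this input, `attention` has 4 rows and 4 columns, and `attention[1][2]` is the weight the second feature's query assigns to the third feature's key, which corresponds to the kind of element highlighted in the figure.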

The recommendation effect of the fusion recommendation model is shown in fig. 9, which compares the feature classification output of the fusion model with that of reference models. Three features were selected for the classification test. The Logistic Regression classifier divides the three features evenly into three regions; however, the classification is incomplete, and some features overlap and cannot be separated. The random forest classifier and the RBF-kernel SVM give similar feature classification results. The proposed fusion model separates the three features into their respective feature regions to the greatest extent. The proposed fusion model can therefore classify item features to the greatest extent in the recommendation scene, which improves recommendation efficiency.

The application-level performance comparison of the fusion model is shown in fig. 10. The results of the proposed multi-model fusion model are compared with those of KDMRA and the MultiCofusion model on three performance indexes. The comparison indicates that the proposed fusion model is better suited to recommendation in take-out platform scenes.

Drawings

FIG. 1 is a diagram of the multi-model fusion technique architecture.

Fig. 2 is a proposed multi-model fusion recommendation system architecture.

FIG. 3 is a method of data usage in the multi-model fusion recommendation technique.

Fig. 4 is a generic feature fusion process in the multi-model technique.

Fig. 5 is a flow of a layer of network structure in the proposed Self-Attention mechanism.

FIG. 6 is a flow chart of a detailed multi-model fusion architecture of a recommendation system.

Fig. 7 is the recommendation process of the encoder-decoder based Transformer model.

Fig. 8 is an AutoInt-based advertisement recommendation model recommendation process.

Fig. 9 is a comparative analysis of the features of the proposed fusion model and the baseline classifiers.

Fig. 10 is a comparative analysis of the proposed fusion model with other fusion models.

FIG. 11 is a restaurant recommendation result of the fusion recommendation model.

Fig. 12 is the result of the scoring phase of the AutoInt model training data.

Fig. 13 is a score distribution of the AutoInt model in advertisement features.

Detailed Description

Fig. 11 shows a commodity recommendation function test of the proposed fusion architecture. Two recommendation tests, (a) and (b), were performed. When customer_id = 'ZZV76GY', 5 restaurants are selected from the recommendation list as the recommendation results issued by the index. The recommended restaurants carry information such as the restaurant prediction score, the restaurant score and the restaurant category vendor_tag_name.

Fig. 12 and 13 show the results of an advertisement recommendation test of the proposed fusion architecture. Advertisement recommendation is tested as CTR (click-through rate) prediction on products: the model is trained on the training data set, the training set results are output, and the result score for label=1 is output. After the other parameters of the fusion model are set, the Multi-Head Self-Attention algorithm produces the attention result of each feature, and an Importance Result Map is output once the AutoInt structure has finished. It can be seen that the cross feature of I2 and I3 has a higher importance. The higher the importance of a cross feature, the higher the sequence relevance of the food and restaurant recommendations. Features of high importance are extracted through the position features, and the results are sorted and output by feature score, so that advertisements with high feature scores are obtained and commodity advertisements are recommended according to the feature scores.

Claims

1. A multi-model fusion personalized recommendation model based on an attention mechanism, characterized in that: (1) a multi-model fusion recommendation architecture is provided; (2) a personalized recommendation service for a take-out platform is provided using the fusion architecture; (3) the fusion model mainly provides the take-out platform with multiple scenes such as food recommendation, merchant recommendation and advertisement recommendation; (4) the proposed fusion model provides a multi-scene recommendation service and improves the accuracy of the recommendation model's results across the multiple scenes.

2. The multi-model architecture according to claim (1), as shown in fig. 1, characterized in that a feature-oriented multi-model result fusion architecture is used, in which the primary trainer can in theory be composed of N models in parallel; the output results of the primary models are selected according to the requirements of the service scene, and the results that meet the scene's feature requirements are combined into a new training data set, which is input into a secondary trainer; the secondary trainer is usually composed of one model; and the models in the primary trainer and the model in the secondary trainer are usually similar semantic models, so as to ensure the consistency of the model features of the secondary trainer.

3. The personalized recommendation service using a fusion architecture according to claim (2), as shown in fig. 2, 3 and 4, wherein the models in the primary trainer are an AutoInt model and a Transformer, and the secondary trainer is a Multi-Head Attention algorithm. As shown in fig. 5, after each model in the primary trainer outputs its feature training results, the feature vectors are activated, feature vectors of the same type are selected, and a weighted average over the feature weights is taken to obtain a new feature vector; the new feature vector data is input as a training data set into the secondary trainer (i.e., Multi-Head Attention), and the final feature result is output.

4. The 4 recommendation services provided for the take-out platform according to claim (3), as shown in fig. 6, 7 and 8, wherein after the fusion model processing of claim (2) and claim (3), recommendation results for multiple scenes (fig. 6) can be output simultaneously, the multiple scenes being food and merchant recommendation (fig. 7) and advertisement recommendation (fig. 8).

5. The accuracy of the results of the fusion model's multi-scene recommendation service according to claim (4), as shown in fig. 9, wherein the accuracy of the recommendation results in the multiple scenes is obtained by analyzing the algorithm effectiveness and the model performance of the fusion model respectively and by derivation and calculation with mathematical formulas.