CN115203585A - Automatic architecture searching method of collaborative filtering model - Google Patents
- Publication number
- CN115203585A (application CN202211119443.3A)
- Authority
- CN
- China
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/953—Querying, e.g. by the use of web search engines
- G06F16/9536—Search customisation based on social or collaborative filtering
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
Abstract
The invention discloses an automated architecture search method for collaborative filtering models. The method comprises the following steps: S1, construct a supernetwork according to a preset search space, in which the parameters are shared by all sub-architectures; S2, supernetwork training: randomly sample sub-architectures from the supernetwork and train the supernetwork by training the sampled sub-architectures; and S3, after the supernetwork training of S2 is finished, search the supernetwork for high-performing collaborative filtering models using an evolutionary algorithm; the best model found is the result of the architecture search. Compared with the prior art, the method further expands the diversity of the search space and improves the expressive power of the collaborative filtering model, and the weight-sharing-based supernetwork training and architecture search greatly improve the efficiency of the search algorithm.
Description
Technical Field
The invention relates to the fields of artificial intelligence, recommendation algorithms, and automated machine learning, and in particular to an automated architecture search method for collaborative filtering models.
Background
Personalized recommendation is ubiquitous and has been applied to many online services, such as e-commerce, advertising, and social media. At its heart is estimating the likelihood that a user will adopt an item based on historical interactions such as purchases and clicks. Collaborative Filtering (CF) addresses this problem by assuming that users with similar behavior will exhibit similar preferences for items. To realize this assumption, a common paradigm is to parameterize users and items so as to reconstruct the historical interactions, and to predict a user's preferences from these parameters. Generally, such a learnable model has two key parts:
(1) Embedding representation, which converts users and items into low-dimensional vector representations. The vector representations of all users and items are typically stored in an embedding matrix, each row of which is the vector representation of a specific user or item.
(2) An interaction function, which reconstructs the historical interactions based on the embeddings. For example, Matrix Factorization (MF) directly arranges the interactions of user/item IDs into a matrix and then factorizes it to produce hidden-factor matrices representing the users and items, that is, an embedding vector for each user and each item, and models the interaction between a user and an item with the inner product. The rise of deep learning in recent years has inspired researchers to replace the traditional inner-product interaction model with more powerful deep neural networks: the neural collaborative filtering model replaces the inner-product interaction function of matrix factorization with a non-linear neural network, translation-based CF models use Euclidean distance as the interaction function, and so on.
The invention is mainly concerned with interaction-function modeling in collaborative filtering models. Neural Collaborative Filtering (NCF) introduced a deep neural network model for interaction modeling: the inputs from the user side and the item side are concatenated and fed into the neural network, achieving stronger and more effective feature crossing and enriching the implicit feature interactions.
The deep neural network adopted in NCF is mainly a multi-layer perceptron (MLP), with a different number of neurons in each layer. However, the number of neurons per layer usually depends on the experience of human experts; a recommendation system serves a great number of business scenarios, and a single set of hyper-parameters can hardly fit all of them. If the hyper-parameters are poorly designed, model performance suffers greatly.
Therefore, some work has attempted to apply automated machine learning (AutoML) techniques to interaction functions in order to search for an appropriate interaction function for a particular data set. SIF observes that the optimal interaction function in collaborative filtering differs across data sets, and therefore searches for the interaction function with a NAS method. However, the search space of SIF involves only simple mathematical operations, such as addition, subtraction, multiplication, division, maximum/minimum, and inner product, so the expressive capability of the resulting models is limited.
Motivated by these shortcomings of NCF and SIF, the invention proposes an automated architecture search method for collaborative filtering models, i.e., a NAS algorithm that automatically searches for the interaction function of a collaborative filtering model. A NAS algorithm is generally designed along three dimensions:
(1) Search space: defines the set of neural network structures that can be searched, i.e., the solution space.
(2) Search strategy: defines how to find the optimal network structure within the search space.
(3) Evaluation method: defines how to evaluate the performance of a searched network structure.
Disclosure of Invention
The invention aims to: in view of the problems and shortcomings of the prior art, provide an automated architecture search method for collaborative filtering models, solving the problems that the search space of current automated architecture search for collaborative filtering models is limited and the expressive capability of the resulting model architectures is limited.
The technical scheme is as follows: to achieve the above object, the technical solution adopted by the present invention is an automated architecture search method for collaborative filtering models, comprising the following steps:
S1, construct a supernetwork according to a preset search space, in which the parameters are shared by all sub-architectures;
S2, supernetwork training: randomly sample sub-architectures from the supernetwork and train the supernetwork by training the sampled sub-architectures;
S3, after the supernetwork training of S2 is finished, search the supernetwork for high-performing collaborative filtering models using an evolutionary algorithm; the best model found is the result of the architecture search.
Further, the specific method for constructing the supernetwork in step S1 is:
s11, firstly, inducing and abstracting the existing collaborative filtering model to form a unified framework of the collaborative filtering model, representing the unified framework as a mathematical model, and specifically, abstracting the collaborative filtering model into two parts by analyzing the existing collaborative filtering model: embedding representation, converting users and articles into low-dimensional vector representation; an interaction function that reconstructs historical interactions of the user and the item based on the embedding; the following is a mathematical representation of the collaborative filtering abstraction framework: first define the userEmbedded vector ofArticle of manufactureEmbedded vector ofWhereinA set of real numbers is represented by,kis the dimension of the embedding vector; user' sIs embedded in the watchArticle of manufactureIs inserted in the watchIn whichmThe number of the users is the number of the users,nthe number of the articles;for the userTo the articleActual rating of (a);to train the data for the set of users and items that have had actual interaction,in order to be a function of the loss,for the regularization coefficient, the optimization objective of the collaborative filtering model is defined as:
whereinIn order to be a function of the interaction,a frobenius norm of the matrix is represented,for regularization terms, for alleviating the over-fitting problem in model optimization, by minimizing a functionTo optimize the co-ordinationThe same filtering model is used;
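As an illustrative sketch (not the patent's implementation; the squared-error choice for the loss and all names are assumptions), the objective above can be written in Python as:

```python
import numpy as np

def cf_loss(P, Q, interactions, f, lam=0.01):
    """Sum of per-interaction squared errors plus Frobenius-norm
    regularization, mirroring the collaborative-filtering objective."""
    loss = 0.0
    for u, i, y_ui in interactions:
        loss += (f(P[u], Q[i]) - y_ui) ** 2            # L(f(p_u, q_i), y_ui)
    loss += lam * (np.sum(P ** 2) + np.sum(Q ** 2))    # lam * (||P||_F^2 + ||Q||_F^2)
    return loss

# Classic choice: inner-product interaction function
inner = lambda p, q: float(p @ q)

rng = np.random.default_rng(0)
P = rng.normal(size=(4, 8))    # m = 4 users, k = 8
Q = rng.normal(size=(5, 8))    # n = 5 items
data = [(0, 1, 1.0), (2, 3, 0.0)]
loss = cf_loss(P, Q, data, inner)
```

Swapping `inner` for any other callable `f(p_u, q_i)` is exactly the degree of freedom the architecture search exploits.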
s12, on the basis of the unified framework of the collaborative filtering model obtained in the S11, searching key interaction functions in collaborative filtering by using a framework searching technologyWhich is represented in the collaborative filtering model in the form of a specific architecture, the mathematical definition of the architecture search technique is:
whereinTo verify the interaction data of the user and the item on the set,Aas a function of interactionThe search space of (a), i.e. the hyper-network to be built,the optimal interaction function which needs to be searched is a double-layer optimization problem, wherein the inner layer is a framework for optimizing a collaborative filtering model on training data, and the outer layer is a framework for optimizing the interaction function on verification data;
s13, finding an optimal interaction function architecture to enable the collaborative filtering model to achieve optimal performance on given data, and further analyzing the commonly used interaction function architecture in the conventional collaborative filtering model to provide an interaction function architecture search space, wherein the search space comprises commonly used operations in various recommendation system models, the search space is designed into a parameter sharing-based hyper-network, and the number of layers isNEach layer comprisingMDifferent blocks are seeded; wherein each block corresponds to onePerforming seed operation; obtaining a candidate architecture by sampling a block at each layer of the super-network; the operations in the search space are defined as follows:
(1) Element-wise operations, i.e., EW operations: bitwise addition, bitwise maximum, bitwise minimum, bitwise average, and inner product, making up 5 different blocks in total;
(2) Multi-layer perceptron operations, i.e., MLP operations: each MLP operation contains only three layers; the input and output layers are given fixed dimensions so that dimensions are aligned across the search space, while the hidden layer uses different numbers of neurons to realize different MLP operations. The hidden-layer sizes are 16, 32, 64, 128, 256, 512, and 1024, making up 7 different blocks in total;
(3) Cross network operations, i.e., CN operations: composed of multiple cross layers, where the output of a cross layer is computed by

$$\mathbf{x}_{l+1} = \mathbf{x}_0 \mathbf{x}_l^{\top} \mathbf{w}_l + \mathbf{b}_l + \mathbf{x}_l$$

where $\mathbf{x}_0$ is the initial input of the cross network, $\mathbf{x}_l$ is the output of the $l$-th cross layer, $\mathbf{w}_l$ and $\mathbf{b}_l$ are the weight and bias parameters of the $l$-th cross layer, and $\mathbf{x}_{l+1}$ is the output of the $(l+1)$-th cross layer.
The output of the cross network contains all cross terms of the input from order 1 up to order $l+1$, where $d_{CN}$ is the dimension of the cross-network output and $x_{i,j}$ denotes the $j$-th component of the vector $\mathbf{x}_i$.
The number of cross layers determines the maximum order of the cross terms; 4 kinds of cross networks are used, with 1, 2, 3, and 4 cross layers, making up 4 different blocks in total;
(4) Self-attention operations, i.e., SA operations: the matrices $Q$, $K$, and $V_{SA}$ all come from the same input; the product $QK^{\top}$ is divided by a scale $\sqrt{d_k}$ to prevent the result from becoming too large, normalized into a probability distribution with the Softmax operation, and then multiplied by the matrix $V_{SA}$ to obtain a weighted-sum representation. Using different $Q$, $K$, $V_{SA}$ matrices gives a multi-head mechanism that learns different attention representations; different numbers of heads form different self-attention modules. Self-attention modules with 1, 2, 3, and 4 heads are used, making up 4 different blocks in total. The attention operation is computed as:

$$\mathrm{Attention}(Q, K, V_{SA}) = \mathrm{Softmax}\!\left(\frac{QK^{\top}}{\sqrt{d_k}}\right) V_{SA}$$
(5) Factorization machine operations, i.e., FM operations: the factorization machine operation is expressed as:

$$y_{FM}(\mathbf{x}) = w_0 + \sum_{i=1}^{d_{FM}} w_i x_i + \sum_{i=1}^{d_{FM}} \sum_{j=i+1}^{d_{FM}} \langle \mathbf{v}_i, \mathbf{v}_j \rangle\, x_i x_j$$

where $w_0$ and $w_i$ are the parameters of the factorization machine, $\mathbf{V}_{FM}$ is the parameter matrix, $\mathbf{x}$ is the input vector, $d_{FM}$ is the dimension of the input vector, $\mathbf{v}_i$ denotes the $i$-th row of the parameter matrix $\mathbf{V}_{FM}$, $\mathbf{v}_j$ its $j$-th row, $x_i$ the $i$-th component of the vector $\mathbf{x}$, and $x_j$ its $j$-th component.
Further, the specific method of supernetwork training in step S2 is: on the basis of the supernetwork constructed in S1, sample sub-architectures from the supernetwork; a sampled sub-architecture inherits the parameters of the supernetwork, and the supernetwork is trained by training the sampled sub-architectures. Denote by $W_A$ all the weight parameters of the supernetwork $A$, by $W_A(a)$ the network parameters of sub-architecture $a$ in the supernetwork $A$, by $\mathcal{N}(a, W_A(a))$ a specific sampled sub-architecture together with its network parameters, and by $\Gamma(A)$ the prior distribution for sampling sub-architecture $a$. The goal of training the supernetwork is:

$$W_A^* = \mathop{\arg\min}_{W_A} \mathbb{E}_{a \sim \Gamma(A)}\big[\mathcal{L}_{train}\big(\mathcal{N}(a, W_A(a))\big)\big]$$

where $\mathbb{E}$ denotes expectation and $\mathcal{L}_{train}$ the loss of an architecture on the training data; that is, the expected performance of sub-architectures randomly sampled from the supernetwork according to the prior distribution is optimized;
s21, randomly initializing parameters of the hyper-network: initializing parameters of a hyper-network through a standard network parameter initialization technology;
s22, training the super network, iteratively executing the training process of the super network, and designating the number of training roundsEach iteration comprises the following three steps:
(1) Randomly sample a sub-architecture from the supernetwork; the sub-architecture inherits the shared parameters of the supernetwork;
(2) Randomly sample a batch of data from the training data;
(3) Train the sampled sub-architecture on the batch using gradient descent, thereby updating the corresponding parameters in the supernetwork.
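The three steps above can be sketched as a toy, hedged illustration (blocks are scalar weights and the "model" merely sums them, standing in for real neural blocks; the squared-error loss and all names are assumptions, not the patent's code):

```python
import random

def sample_subarch(num_layers, num_blocks):
    """Prior Gamma(A): uniform single-path sampling, one block per layer."""
    return [random.randrange(num_blocks) for _ in range(num_layers)]

def train_supernet(weights, targets, rounds, lr=0.05):
    """One round = sample a path (1), sample data (2), gradient step on
    only the shared weights that path uses (3)."""
    n_layers, n_blocks = len(weights), len(weights[0])
    for _ in range(rounds):
        arch = sample_subarch(n_layers, n_blocks)        # (1)
        y = random.choice(targets)                       # (2) a "batch" of one
        out = sum(weights[l][arch[l]] for l in range(n_layers))
        grad = 2.0 * (out - y)                           # d/dw of (out - y)^2
        for l in range(n_layers):                        # (3) update shared weights
            weights[l][arch[l]] -= lr * grad
    return weights

random.seed(0)
w = [[0.0] * 3 for _ in range(2)]        # N = 2 layers, M = 3 blocks each
train_supernet(w, targets=[1.0], rounds=2000)
```

After training, every samplable path already fits the toy target without further training, which is exactly the weight-sharing property the evolutionary search in S3 exploits.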
Further, the specific method of the evolutionary search algorithm in step S3 is:
S31. Initialize a population $P_0$ of $P$ randomly sampled sub-architectures; each individual in the population is a candidate sub-architecture, and the evolutionary algorithm iterates step by step through selection, crossover, and mutation to find the optimal solution;
S32. Compute fitness: the fitness of each individual in the population is the performance of the corresponding sub-architecture; thanks to the parameter-sharing technique, a sub-architecture need not be trained but directly inherits the parameters of the supernetwork and is evaluated to obtain the individual's fitness;
S33. Execute an evolution step: independently select $P/2$ individuals from the population according to individual fitness, one individual at a time, with selection probability proportional to fitness; the selected individuals serve as parents. Apply crossover and mutation to the parents: each block of a selected individual mutates with probability 0.1, and crossover exchanges one randomly chosen block between two randomly selected individuals. Add the offspring produced by crossover and mutation to the population, then select the $P$ individuals with the highest fitness from the enlarged population to form the new population.
S34. After the evolutionary algorithm reaches the predefined number of iterations, select the $t$ sub-architectures with the highest fitness from the final population, train and evaluate them, and choose the best sub-architecture as the search result.
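Steps S31-S33 can be sketched with a toy fitness function standing in for supernetwork evaluation (every hyper-parameter except the 0.1 mutation rate is an illustrative assumption):

```python
import random

def evolve(fitness, num_layers, num_blocks, pop_size=20, gens=30, mut=0.1):
    """S31: random initial population; S32: fitness evaluation; S33:
    fitness-proportional selection of P/2 parents, one-block crossover,
    per-block mutation with probability 0.1, keep the fittest P.
    S34 (retraining the top-t survivors) is left to the caller."""
    pop = [[random.randrange(num_blocks) for _ in range(num_layers)]
           for _ in range(pop_size)]
    for _ in range(gens):
        fits = [fitness(a) for a in pop]                        # S32
        parents = random.choices(pop, weights=fits, k=pop_size // 2)
        children = []
        for i in range(0, len(parents) - 1, 2):                 # S33
            a, b = parents[i][:], parents[i + 1][:]
            cut = random.randrange(num_layers)                  # swap one block
            a[cut], b[cut] = b[cut], a[cut]
            for child in (a, b):
                for l in range(num_layers):
                    if random.random() < mut:                   # mutate, p = 0.1
                        child[l] = random.randrange(num_blocks)
                children.append(child)
        pop = sorted(pop + children, key=fitness, reverse=True)[:pop_size]
    return pop

random.seed(1)
toy_fitness = lambda a: 1 + sum(b == 0 for b in a)   # stand-in for validation score
final_pop = evolve(toy_fitness, num_layers=5, num_blocks=4)
best = max(final_pop, key=toy_fitness)
```

Because survivors inherit supernetwork weights, `fitness` in the real method is a cheap forward evaluation rather than a training run.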
Beneficial effects:
The invention provides automated architecture search for collaborative filtering models, introducing a NAS algorithm to search for the interaction function of the collaborative filtering model for a specific data set, thereby improving the effectiveness of the collaborative filtering model. Compared with other work applying NAS to collaborative filtering models, the search space of the method is compact and efficient and contains the operations and modules commonly used in various recommendation-system models. The search space is a parameter-sharing supernetwork, and randomly sampled sub-architectures can be trained efficiently so as to train the whole supernetwork. The search strategy is a two-stage method: first, sub-architectures sampled from the supernetwork are trained to learn the supernetwork's parameters; then an evolutionary algorithm searches for excellent sub-architectures within the supernetwork. Experiments on public data sets show clear improvements over manually designed methods and other NAS algorithms.
Drawings
FIG. 1 is a general framework of the present invention;
FIG. 2 is an overall schematic diagram of the search space and search algorithm of the present invention. In FIG. 2, (a) represents the search space and (b) represents a solution of the search space, obtained by sampling one block at each layer.
Detailed Description
The present invention is further illustrated by the following figures and specific examples, which are to be understood as merely illustrative and not limiting the scope of the invention; after reading this specification, equivalent modifications made by those skilled in the art all fall within the scope defined by the appended claims.
The invention discloses an automated architecture search method for collaborative filtering models, comprising the following steps:
S1. Construct a supernetwork according to a preset search space, in which the parameters are shared by all sub-architectures;
s11, firstly, inducing and abstracting the existing collaborative filtering model to form a unified framework of the collaborative filtering model, and expressing the unified framework as a mathematical model. By analyzing the existing collaborative filtering model, the collaborative filtering model can be abstracted into two parts: embedding representation, converting users and articles into low-dimensional vector representation; and an interaction function that reconstructs historical interactions of the user and the item based on the embedding. The following is a mathematical representation of the collaborative filtering abstraction framework: s12, on the basis of the unified framework of the collaborative filtering model obtained in the S11, searching for key interactive functions in collaborative filtering by using a framework searching technologyWhich is represented in the collaborative filtering model in the form of a specific framework, the mathematical definition of the framework search technique is:
whereinTo verify the interaction data of the user and the item on the set,Aas a function of interactionThe search space of (a), i.e. the hyper-network to be built,i.e. the optimal interaction function to be searched, which is a two-layer optimization problem, wherein the inner layerTo optimize the collaborative filtering model on the training data, the outer layer is the framework that optimizes the interaction function on the validation data.
It can be seen that the first half of the formula computes the difference between the similarity predicted by the model and the actual similarity between a user and an item: the predicted similarity is obtained as the inner product of the user's and the item's embedding vectors, while the actual similarity usually comes from the user's explicit actions on the item, such as the rating the user gives a movie (1 to 5 stars). The loss function $\mathcal{L}$ is usually the mean squared error (MSE), so the users' ratings generally need to be normalized or standardized accordingly. The second half of the formula computes the norms of the user and item embedding tables as a penalty that suppresses over-fitting. Traditional collaborative filtering models typically compute similarity in this inner-product form. However, the inner product does not perform optimally on all data sets, and different data sets may require different forms of interaction. The idea of AutoML can therefore be introduced, using a NAS algorithm to search for a specific interaction function for a specific data set.
The optimization objective of the collaborative filtering model is then defined as:

$$\min_{\mathbf{P},\mathbf{Q}} \sum_{(u,i) \in \mathcal{D}_{train}} \mathcal{L}\big(f(\mathbf{p}_u, \mathbf{q}_i),\, y_{ui}\big) + \lambda \big(\|\mathbf{P}\|_F^2 + \|\mathbf{Q}\|_F^2\big)$$

where $f$ is the interaction function, $\|\cdot\|_F$ denotes the Frobenius norm of a matrix, and the last term is the regularization term. Unlike the previous formula, the inner product used to compute similarity is replaced by the abstract interaction function $f$, a learnable neural network architecture. This definition is also convenient to extend: for example, in NCF $f$ is a multi-layer perceptron, and it can likewise represent the different mathematical interaction operations in SIF.
S12. On the basis of the abstract framework obtained in step S11, the invention proposes to use architecture search techniques to search for the key interaction function $f$ in collaborative filtering, which is represented in the collaborative filtering model in the form of a concrete architecture. The invention aims to obtain a more appropriate interaction-function architecture through architecture search and thereby improve the performance of the collaborative filtering model. The mathematical definition of this architecture search problem is:

$$f^* = \mathop{\arg\min}_{f \in A} \sum_{(u,i) \in \mathcal{D}_{val}} \mathcal{L}\big(f(\mathbf{p}_u^*, \mathbf{q}_i^*),\, y_{ui}\big) \quad \text{s.t.} \quad (\mathbf{P}^*, \mathbf{Q}^*) = \mathop{\arg\min}_{\mathbf{P},\mathbf{Q}} \mathcal{L}_{train}(f, \mathbf{P}, \mathbf{Q})$$

where $\mathcal{D}_{val}$ is the interaction data of users and items in the validation set, $A$ is the search space of the interaction function $f$, i.e., the supernetwork to be built, and $f^*$ is the optimal interaction function to be searched for. This is a bi-level optimization problem: the inner level optimizes the collaborative filtering model on the training data, and the outer level optimizes the architecture of the interaction function on the validation data. Overall, the aim is to find an optimal interaction-function architecture so that the collaborative filtering model achieves optimal performance on the given data.
S13. To solve the search problem posed in S12, the invention proposes a compact and efficient interaction-function architecture search space, obtained by further analyzing the interaction-function architectures commonly used in existing collaborative filtering models. The search space contains a variety of operations common in recommendation-system models. The whole search space is designed as a parameter-sharing supernetwork, as shown in FIG. 2: it has $N$ layers, each layer containing $M$ different blocks, and each block processes data in a different way. During the subsequent training of sub-architectures and the search over sub-architectures, only one specific block among all the blocks of each layer is activated; that is, as shown in FIG. 2, exactly one of the dashed blocks in each layer is selected to form the actually sampled or searched architecture. Note that the original inputs (the user embedding and the item embedding) are fed to every layer as part of its input, while the output of the $i$-th layer serves as input to the $(i+1)$-th layer. Because feature crossing is very important in recommendation-system models, this design enriches the implicit crossing of low-order features (including the original input) with high-order features and improves the expressive power of the model. At the same time, this design resembles the Skip-Connection design in ResNet; borrowing the idea of residual learning, the original input can more easily reach the deep part of the network, and gradients propagate better to shallower layers during back-propagation, making the model easier to train.
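A minimal sketch of how a sampled path could be evaluated under this design, using hypothetical linear blocks and a fixed block output dimension (the real blocks are the EW/MLP/CN/SA/FM operations described below; all names and the zero-vector initialization are assumptions):

```python
import numpy as np

def make_block(in_dim, out_dim, seed):
    """Hypothetical linear block; deterministic seeding stands in for the
    supernetwork's shared parameter table."""
    W = np.random.default_rng(seed).normal(size=(in_dim, out_dim)) * 0.1
    return lambda z: z @ W

def forward_path(p_u, q_i, arch, num_blocks=21, out_dim=8):
    """Every layer sees the original user/item embeddings concatenated with
    the previous layer's output (the skip-connection-like design above)."""
    k = p_u.shape[0]
    x = np.zeros(out_dim)                  # no previous output at layer 1
    for layer, choice in enumerate(arch):
        block = make_block(2 * k + out_dim, out_dim,
                           seed=layer * num_blocks + choice)
        x = block(np.concatenate([p_u, q_i, x]))
    return x

out = forward_path(np.ones(4), np.ones(4), arch=[3, 0, 7])
```

Because weights are looked up by (layer, block) rather than owned by the path, two paths that pick the same block in a layer reuse the same parameters.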
Each layer contains $M$ different blocks, all of which are operations and modules commonly used in recommendation-system models. They fall into 5 broad categories, and each category of block has some hyper-parameters; the 5 categories of operations are listed below:
(1) Element-Wise (EW) operations: element-level mathematical operations, comprising bitwise addition (sum), bitwise maximum (max), bitwise minimum (min), bitwise average (avg), and inner product, making up 5 different blocks in total.
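The five element-wise blocks can be sketched directly with NumPy (a hedged illustration; the dictionary keys are assumed names):

```python
import numpy as np

# The five EW blocks; each takes the user and item embeddings element-wise.
EW_BLOCKS = {
    "sum":   lambda p, q: p + q,
    "max":   np.maximum,
    "min":   np.minimum,
    "avg":   lambda p, q: (p + q) / 2.0,
    "inner": lambda p, q: np.inner(p, q),   # scalar output
}

p, q = np.array([1.0, 3.0]), np.array([3.0, 1.0])
results = {name: op(p, q) for name, op in EW_BLOCKS.items()}
```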
(2) Multi-layer perceptron (MLP) operations: simple feed-forward neural network layers. For the diversity of the search space, each MLP operation defined by the invention contains only three layers; the input and output layers are given fixed dimensions so that dimensions are aligned across the search space, while the hidden layer uses different numbers of neurons to realize different MLP operations. The hidden-layer sizes are 16, 32, 64, 128, 256, 512, and 1024, making up 7 different blocks in total.
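A sketch of one such three-layer MLP block, assuming a ReLU hidden activation (the text does not specify one) and illustrative random initialization:

```python
import numpy as np

def mlp_block(x, hidden, out_dim=8, seed=0):
    """Three-layer MLP block: fixed input/output dims for alignment across
    the search space; only the hidden width (16..1024) varies per block.
    ReLU is an assumption -- the activation is not specified in the text."""
    rng = np.random.default_rng(seed)
    d = x.shape[-1]
    W1, b1 = rng.normal(size=(d, hidden)) * 0.1, np.zeros(hidden)
    W2, b2 = rng.normal(size=(hidden, out_dim)) * 0.1, np.zeros(out_dim)
    h = np.maximum(0.0, x @ W1 + b1)     # hidden layer with ReLU
    return h @ W2 + b2

# The seven MLP blocks differ only in hidden width:
outputs = [mlp_block(np.ones(8), hidden=h)
           for h in (16, 32, 64, 128, 256, 512, 1024)]
```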
(3) Cross Network (CN) operations: composed of multiple cross layers, where the output of a cross layer is computed by

$$\mathbf{x}_{l+1} = \mathbf{x}_0 \mathbf{x}_l^{\top} \mathbf{w}_l + \mathbf{b}_l + \mathbf{x}_l$$

where $\mathbf{x}_0$ is the initial input of the cross network, $\mathbf{x}_l$ is the output of the $l$-th cross layer, $\mathbf{w}_l$ and $\mathbf{b}_l$ are the weight and bias parameters of the $l$-th cross layer, and $\mathbf{x}_{l+1}$ is the output of the $(l+1)$-th cross layer.
The output of the cross network contains all cross terms of the input from order 1 up to order $l+1$, where $d_{CN}$ is the dimension of the cross-network output and $x_{i,j}$ denotes the $j$-th component of the vector $\mathbf{x}_i$.
the number of cross layers determines the maximum order of the cross terms, and this embodiment uses 4 types of cross networks, with the number of cross layers being 1, 2, 3 and 4, making up 4 different blocks in total.
(4) Self-Attention (SA) operation: the self-attention mechanism is a special case of the attention mechanism in which the three matrices Q (Query), K (Key) and $V_{SA}$ (Value) all come from the same input. The product $QK^{\top}$ is divided by a scale factor $\sqrt{d_k}$ to prevent the result from becoming too large, normalized into a probability distribution by a Softmax operation, and then multiplied by the matrix $V_{SA}$ to obtain a weighted-sum representation. Using different $Q$, $K$, $V_{SA}$ matrices yields a multi-head mechanism that can learn different attention representations; different head counts form different self-attention modules. The invention uses self-attention modules with 1, 2, 3 and 4 heads, i.e., 4 different blocks in total. The attention operation is computed as follows:

$$\mathrm{Attention}(Q, K, V_{SA}) = \mathrm{softmax}\!\left(\frac{QK^{\top}}{\sqrt{d_k}}\right) V_{SA}$$
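A sketch of the scaled dot-product attention above for a single head, with matrices represented as lists of rows (a multi-head variant would run several copies with different $Q$, $K$, $V_{SA}$ projections and concatenate the results):

```python
import math

def softmax(v):
    m = max(v)  # subtract the max for numerical stability
    e = [math.exp(x - m) for x in v]
    s = sum(e)
    return [x / s for x in e]

def attention(Q, K, V):
    """softmax(Q K^T / sqrt(d_k)) V, one output row per query row."""
    d_k = len(K[0])
    out = []
    for q in Q:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in K]
        p = softmax(scores)                       # probability distribution
        out.append([sum(pi * row[j] for pi, row in zip(p, V))
                    for j in range(len(V[0]))])   # weighted sum of value rows
    return out
```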
(5) Factorization Machine (FM) operation: the factorization machine operation may be represented by the following equation:

$$y_{FM}(x) = w_0 + \sum_{i=1}^{d_{FM}} w_i x_i + \sum_{i=1}^{d_{FM}} \sum_{j=i+1}^{d_{FM}} \langle v_i, v_j \rangle x_i x_j$$

where $w_0$ and $w_i$ are the parameters of the factorization machine, $V_{FM}$ is the parameter matrix, $x$ is the input vector, $d_{FM}$ is the dimension of the input vector, $v_i$ denotes the $i$-th row of the parameter matrix $V_{FM}$, $v_j$ denotes the $j$-th row, and $x_i$ and $x_j$ denote the $i$-th and $j$-th components of the vector $x$.
The output of the factorization machine is the sum of the first-order and second-order feature interactions. Since both inputs have already been converted to low-dimensional embeddings, the invention computes the first-order and second-order feature interactions on a single tensor obtained by concatenating the two inputs.
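A sketch of the FM equation using the standard identity $\sum_{i<j} \langle v_i, v_j\rangle x_i x_j = \tfrac12 \sum_f \big[(\sum_i v_{if} x_i)^2 - \sum_i v_{if}^2 x_i^2\big]$, which evaluates the second-order term in $O(k \cdot d_{FM})$ time (parameter values are illustrative):

```python
def fm(x, w0, w, V):
    """Factorization machine: w0 + sum_i w_i x_i + sum_{i<j} <v_i, v_j> x_i x_j.

    V is the parameter matrix V_FM, whose i-th row v_i embeds feature i.
    """
    first = w0 + sum(wi * xi for wi, xi in zip(w, x))
    second = 0.0
    for f in range(len(V[0])):  # one factor dimension at a time
        s = sum(V[i][f] * x[i] for i in range(len(x)))
        sq = sum((V[i][f] * x[i]) ** 2 for i in range(len(x)))
        second += 0.5 * (s * s - sq)
    return first + second
```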
Thus, each layer in the search space of the invention contains M = 5 + 7 + 4 + 4 + 1 = 21 different blocks in total. For parameter sharing, and so that different types of blocks can conveniently process multiple inputs, the output of every block is fixed to a specified dimension, denoted $d$.
S2, super network training: randomly sample sub-architectures from the super network, and train the super network by training the sampled sub-architectures;
on the basis of the super network constructed in S1, a sub-architecture is sampled from the super network; the sampled sub-architecture inherits the parameters of the super network, and the super network is trained by training the sampled sub-architecture. Denote by $W_A$ all the parameters of the super network $A$, by $w(a)$ the network parameters of sub-architecture $a$ within $A$, by $N(a, w(a))$ a particular sampled sub-architecture together with its network parameters, and by $\Gamma(A)$ the prior distribution for sampling sub-architectures $a$. The objective of training the super network is:

$$W_A^* = \arg\min_{W_A} \; \mathbb{E}_{a \sim \Gamma(A)} \left[ \mathcal{L}_{train}\big(N(a, w(a))\big) \right]$$

where $\mathbb{E}$ denotes expectation and $\mathcal{L}_{train}$ denotes the loss of an architecture on the training data; that is, the expected performance of sub-architectures randomly sampled from the super network according to the prior distribution $\Gamma(A)$ is optimized. As the formula shows, the architecture parameters are decoupled into the prior distribution $\Gamma(A)$, so training the super-network parameters does not involve optimizing architecture parameters. The prior distribution $\Gamma(A)$ therefore plays an important role. Recent work has found that pure random search is very competitive with several state-of-the-art NAS methods, and experience from previous work shows that uniform sampling already gives good results. The invention therefore adopts uniform sampling for $\Gamma(A)$; the distribution is fixed and does not need to be learned while training the super network. The sub-architecture illustrated in FIG. 2 may be viewed as one randomly sampled sub-architecture.
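The fixed uniform prior amounts to picking one of the M blocks independently and uniformly at each of the N layers; a sketch, with an architecture encoded as a list of block indices:

```python
import random

def sample_uniform(num_layers, num_blocks, rng=random):
    """Draw one sub-architecture from the uniform prior over the search space."""
    return [rng.randrange(num_blocks) for _ in range(num_layers)]
```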
S21, randomly initializing parameters of the hyper-network: parameters of the hyper-network are initialized by standard network parameter initialization techniques.
And S22, training the super network. The number of training rounds is set to $T$: if computing resources and time are ample, $T$ can be set larger; if they are limited, $T$ can be set smaller, the difference being that a small $T$ may leave the super-network parameters insufficiently trained. Each iteration comprises the following three steps:
(1) Randomly sampling a sub-architecture from the super network, wherein the sub-architecture inherits and shares parameters in the super network.
(2) A batch of data is randomly sampled from the training data.
(3) The sampled sub-architecture is trained with training data using gradient descent while corresponding parameters in the hyper-network are updated.
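The three steps above can be sketched as one training loop (`train_step`, which performs the gradient-descent update on the shared parameters touched by the sampled sub-architecture, is a hypothetical caller-supplied function):

```python
import random

def train_supernet(T, num_layers, num_blocks, train_data, batch_size, train_step,
                   rng=None):
    """Run T training rounds of the super network."""
    rng = rng or random.Random(0)
    for _ in range(T):
        # (1) randomly sample a sub-architecture (it shares the super-net params)
        arch = [rng.randrange(num_blocks) for _ in range(num_layers)]
        # (2) randomly sample a batch of training data
        batch = rng.sample(train_data, min(batch_size, len(train_data)))
        # (3) gradient-descent update of the shared parameters used by `arch`
        train_step(arch, batch)
```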
And S3, after the super-network training in S2 is finished, search the super network with an evolutionary algorithm for collaborative filtering models with excellent performance; the optimal model is the result of the architecture search. The advantage is that, thanks to the parameter-sharing technique, each searched candidate sub-architecture obtains its accuracy by direct inference, without any training, which greatly improves the efficiency of the architecture search.
And S31, the evolutionary algorithm finds the optimal solution by iterating selection, crossover and mutation over a population, where each individual in the population is a candidate sub-architecture. First, a population $P_0$ containing $P$ individuals is initialized by randomly sampling sub-architectures.
And S32, computing fitness. The fitness of each individual in the population is the performance of the corresponding sub-architecture. Using the parameter-sharing technique, the sub-architecture needs no training: it directly inherits the parameters of the super network and is evaluated to obtain the individual's fitness.
S33, executing an evolution step: independently select $P/2$ individuals from the population according to fitness, one individual at a time, with selection probability proportional to fitness; the selected individuals serve as parents. Crossover and mutation are applied to the parents: each block of a selected individual mutates with probability 0.1, and crossover randomly exchanges one block between two randomly selected individuals. The crossed-over and mutated offspring are added to the population, and the $P$ individuals with the highest fitness are selected from the enlarged population to form the new population.
S34, after the evolutionary algorithm reaches the predefined number of iterations, select the $t$ sub-architectures with the highest fitness from the final population, train and evaluate them, and choose the optimal sub-architecture as the search result.
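Steps S31-S34 can be sketched as below; `fitness` stands in for evaluating a sub-architecture with weights inherited from the trained super network (no retraining), and all numeric settings except the 0.1 mutation probability are illustrative:

```python
import random

def _roulette(fits, total, rng):
    """Fitness-proportional (roulette-wheel) index selection."""
    r = rng.uniform(0.0, total)
    acc = 0.0
    for i, f in enumerate(fits):
        acc += f
        if acc >= r:
            return i
    return len(fits) - 1

def evolve(pop_size, num_layers, num_blocks, fitness, generations,
           mut_prob=0.1, rng=None):
    """Evolutionary architecture search sketch (S31-S34).

    Individuals are block-index lists; fitness must be non-negative and is
    assumed to score an architecture using inherited super-network weights.
    """
    rng = rng or random.Random(0)
    pop = [[rng.randrange(num_blocks) for _ in range(num_layers)]
           for _ in range(pop_size)]                    # S31: init population P_0
    for _ in range(generations):
        fits = [fitness(a) for a in pop]                # S32: fitness per individual
        total = sum(fits) or 1.0
        parents = [pop[_roulette(fits, total, rng)]     # S33: select P/2 parents
                   for _ in range(max(1, pop_size // 2))]
        children = []
        for p in parents:
            child = list(p)
            mate = rng.choice(parents)
            i = rng.randrange(num_layers)
            child[i] = mate[i]                          # crossover: swap one block
            for j in range(num_layers):
                if rng.random() < mut_prob:             # mutation per block
                    child[j] = rng.randrange(num_blocks)
            children.append(child)
        pop = sorted(pop + children, key=fitness, reverse=True)[:pop_size]
    return max(pop, key=fitness)                        # S34: best candidate
```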
Table 1 lists the mean-square-error comparison of the invention with other methods; it can be seen that the invention improves significantly over the other methods.
TABLE 1 comparison of mean square error of the invention with other methods
Claims (4)
1. An automatic architecture searching method of a collaborative filtering model is characterized by comprising the following steps:
s1, constructing a super network according to a preset search space, wherein parameters in the super network are shared by all sub-architectures;
s2, super network training: randomly sampling sub-architectures from the super network, and training the super network by training the sampled sub-architectures;
and S3, after the super-network training in S2 is finished, searching the super network with an evolutionary algorithm for collaborative filtering models with excellent performance, the optimal model being the result of the architecture search.
2. The automatic architecture searching method of a collaborative filtering model according to claim 1, wherein the specific method for constructing the super network in step S1 is:
S11, first, summarizing and abstracting existing collaborative filtering models into a unified collaborative filtering framework expressed as a mathematical model; specifically, by analyzing existing collaborative filtering models, a collaborative filtering model is abstracted into two parts: an embedding representation, which converts users and items into low-dimensional vector representations; and an interaction function, which reconstructs the historical interactions of users and items from the embeddings; the mathematical representation of the abstract collaborative filtering framework is as follows: first define the embedding vector $p_u \in \mathbb{R}^k$ of user $u$ and the embedding vector $q_i \in \mathbb{R}^k$ of item $i$, where $\mathbb{R}$ denotes the set of real numbers and $k$ is the dimension of the embedding vectors; the user embedding table is $P \in \mathbb{R}^{m \times k}$ and the item embedding table is $Q \in \mathbb{R}^{n \times k}$, where $m$ is the number of users and $n$ is the number of items; $R_{ui}$ is the actual score of user $u$ on item $i$; $D$ is the set of user-item pairs in the training set that have actually interacted; $\mathcal{L}$ is a loss function and $\lambda$ is the regularization coefficient; the optimization objective of the collaborative filtering model is defined as:

$$\mathcal{L}_{CF} = \sum_{(u,i) \in D} \mathcal{L}\big(f(p_u, q_i), R_{ui}\big) + \lambda \left( \|P\|_F^2 + \|Q\|_F^2 \right)$$

where $f$ is the interaction function, $\|\cdot\|_F$ denotes the Frobenius norm of a matrix, and $\lambda \left( \|P\|_F^2 + \|Q\|_F^2 \right)$ is the regularization term, used to alleviate the over-fitting problem in model optimization; the collaborative filtering model is optimized by minimizing the function $\mathcal{L}_{CF}$;
S12, on the basis of the unified collaborative filtering framework obtained in S11, using an architecture search technique to search for the key interaction function $f$ in collaborative filtering, which appears in the collaborative filtering model in the form of a specific architecture; the mathematical definition of the architecture search is:

$$f^* = \arg\min_{f \in A} \mathcal{L}_{val}\big(f, P^*, Q^*\big) \quad \text{s.t.} \quad (P^*, Q^*) = \arg\min_{P, Q} \mathcal{L}_{CF}(f, P, Q)$$

where $\mathcal{L}_{val}$ is the loss on $D_{val}$, the user-item interaction data of the validation set; $A$ is the search space of the interaction function $f$, i.e., the super network to be built; and $f^*$ is the optimal interaction function to be searched; this is a bilevel optimization problem, where the inner level optimizes the parameters of the collaborative filtering model on the training data and the outer level optimizes the architecture of the interaction function on the validation data;
s13, finding an optimal interaction function architecture to enable the collaborative filtering model to achieve optimal performance on given data, and further analyzing the commonly used interaction function architecture in the conventional collaborative filtering model to provide an interaction function architecture search space, wherein the search space comprises commonly used operations in various recommendation system models, the search space is designed into a parameter sharing-based hyper-network, and the number of layers isNEach layer comprisingMDifferent blocks are seeded; wherein each block corresponds to an operation; obtaining a candidate architecture by sampling a block at each layer of the super network; the operations in the search space are defined as follows:
(1) Element-Wise operation, i.e., EW operation: comprises element-wise addition, element-wise maximum, element-wise minimum, element-wise average and inner product, forming 5 different blocks in total;
(2) multilayer perceptron operation, i.e., MLP operation: each defined MLP operation contains only three layers; the input and output layers have assigned dimensions so that dimensions stay aligned across the search space, while the hidden layer uses different neuron counts to realize different MLP operations; the hidden-layer neuron counts are 16, 32, 64, 128, 256, 512 and 1024, forming 7 different blocks in total;
(3) cross network operation, i.e., CN operation: composed of multiple cross layers; the output of a cross layer is computed by the following formula

$$x_{l+1} = x_0 x_l^{\top} w_l + b_l + x_l$$

where $x_0$ is the initial input to the cross network, $x_l$ is the output of the $l$-th cross layer, $w_l$ and $b_l$ are respectively the weight and bias parameters of the $l$-th cross layer, and $x_{l+1}$ is the output of the $(l+1)$-th cross layer;

the output of the cross network contains all cross terms $x_1^{\alpha_1} x_2^{\alpha_2} \cdots x_{d_{CN}}^{\alpha_{d_{CN}}}$ of order 1 to $l+1$, where $d_{CN}$ is the dimension of the cross-network output and $x_j$ denotes the $j$-th component of the vector $x$;
the number of cross layers determines the maximum order of the cross terms; 4 types of cross networks are used, with 1, 2, 3 and 4 cross layers respectively, forming 4 different blocks in total;
(4) self-attention operation, i.e., SA operation: the three matrices $Q$, $K$ and $V_{SA}$ all come from the same input; the product $QK^{\top}$ is divided by a scale factor $\sqrt{d_k}$ to prevent the result from becoming too large, normalized into a probability distribution by a Softmax operation, and then multiplied by the matrix $V_{SA}$ to obtain a weighted-sum representation; using different $Q$, $K$, $V_{SA}$ matrices yields a multi-head mechanism that learns different attention representations, and different head counts form different self-attention modules; self-attention modules with 1, 2, 3 and 4 heads are used, i.e., 4 different blocks in total; the attention operation is computed as:

$$\mathrm{Attention}(Q, K, V_{SA}) = \mathrm{softmax}\!\left(\frac{QK^{\top}}{\sqrt{d_k}}\right) V_{SA}$$
(5) factorization machine operation, i.e., FM operation: the factorization machine operation is represented by the following equation:

$$y_{FM}(x) = w_0 + \sum_{i=1}^{d_{FM}} w_i x_i + \sum_{i=1}^{d_{FM}} \sum_{j=i+1}^{d_{FM}} \langle v_i, v_j \rangle x_i x_j$$

where $w_0$ and $w_i$ are the parameters of the factorization machine, $V_{FM}$ is the parameter matrix, $x$ is the input vector, $d_{FM}$ is the dimension of the input vector, $v_i$ denotes the $i$-th row of the parameter matrix $V_{FM}$, $v_j$ denotes the $j$-th row, and $x_i$ and $x_j$ denote the $i$-th and $j$-th components of the vector $x$.
3. The automatic architecture searching method of a collaborative filtering model according to claim 1, wherein the specific method for training the super network in step S2 is: on the basis of the super network constructed in S1, sampling a sub-architecture from the super network, wherein the sampled sub-architecture inherits the parameters of the super network, and the super network is trained by training the sampled sub-architecture; denote by $W_A$ all the parameters of the super network $A$, by $w(a)$ the network parameters of sub-architecture $a$ within $A$, by $N(a, w(a))$ a particular sampled sub-architecture together with its network parameters, and by $\Gamma(A)$ the prior distribution for sampling sub-architectures $a$; the objective of training the super network is:

$$W_A^* = \arg\min_{W_A} \; \mathbb{E}_{a \sim \Gamma(A)} \left[ \mathcal{L}_{train}\big(N(a, w(a))\big) \right]$$

where $\mathbb{E}$ denotes expectation and $\mathcal{L}_{train}$ denotes the loss of an architecture on the training data; that is, the expected performance of sub-architectures randomly sampled from the super network according to the prior distribution $\Gamma(A)$ is optimized;
s21, randomly initializing parameters of the hyper-network: initializing parameters of a hyper-network through a standard network parameter initialization technology;
S22, training the super network: iteratively executing the training process of the super network for a specified number of training rounds $T$, each iteration comprising the following three steps:
(1) Randomly sampling a sub-architecture from the super network, wherein the sub-architecture inherits and shares parameters in the super network;
(2) Randomly sampling a batch of data from the training data;
(3) The sampled sub-architecture is trained with training data using gradient descent while corresponding parameters in the super-network are updated.
4. The automatic architecture searching method of a collaborative filtering model according to claim 1, wherein the specific method of the evolutionary search algorithm in step S3 is:
S31, initializing a population $P_0$ containing $P$ individuals by randomly sampling sub-architectures, where each individual in the population is a candidate sub-architecture; the evolutionary algorithm finds the optimal solution by iterating selection, crossover and mutation over the population;
S32, computing fitness: the fitness of each individual in the population is the performance of the corresponding sub-architecture; using the parameter-sharing technique, the sub-architecture needs no training and directly inherits the parameters of the super network for evaluation to obtain the individual's fitness;
S33, executing an evolution step: independently selecting $P/2$ individuals from the population according to fitness, one individual at a time, with selection probability proportional to fitness, the selected individuals serving as parents; applying crossover and mutation to the parents, where each block of a selected individual mutates with probability 0.1 and crossover randomly exchanges one block between two randomly selected individuals; adding the crossed-over and mutated offspring to the population, and selecting the $P$ individuals with the highest fitness from the new population to form the new population;
S34, after the evolutionary algorithm reaches the predefined number of iterations, selecting the $t$ sub-architectures with the highest fitness from the final population, training and evaluating them, and choosing the optimal sub-architecture as the search result.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211119443.3A CN115203585B (en) | 2022-09-15 | 2022-09-15 | Automatic architecture searching method of collaborative filtering model |
Publications (2)
Publication Number | Publication Date |
---|---|
CN115203585A true CN115203585A (en) | 2022-10-18 |
CN115203585B CN115203585B (en) | 2022-12-27 |
Family
ID=83572187
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202211119443.3A Active CN115203585B (en) | 2022-09-15 | 2022-09-15 | Automatic architecture searching method of collaborative filtering model |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN115203585B (en) |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110288444A (en) * | 2019-06-28 | 2019-09-27 | 第四范式(北京)技术有限公司 | Realize the method and system of user's associated recommendation |
CN112464579A (en) * | 2021-02-02 | 2021-03-09 | 四川大学 | Identification modeling method for searching esophageal cancer lesion area based on evolutionary neural network structure |
CN114419389A (en) * | 2021-12-14 | 2022-04-29 | 上海悠络客电子科技股份有限公司 | Target detection model construction method based on neural network architecture search |
Legal Events

Date | Code | Title | Description
---|---|---|---
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||