CN115203585A - Automatic architecture searching method of collaborative filtering model - Google Patents

Publication number
CN115203585A
Authority
CN
China
Prior art keywords
network
sub
architecture
collaborative filtering
training
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202211119443.3A
Other languages
Chinese (zh)
Other versions
CN115203585B (en)
Inventor
黄宜华
朱光辉
程锋
蒋申
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Jiangsu Hongcheng Big Data Technology And Application Research Institute Co ltd
Original Assignee
Jiangsu Hongcheng Big Data Technology And Application Research Institute Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Jiangsu Hongcheng Big Data Technology And Application Research Institute Co ltd
Priority to CN202211119443.3A
Publication of CN115203585A
Application granted
Publication of CN115203585B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9536 Search customisation based on social or collaborative filtering
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Biomedical Technology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses an automated architecture search method for a collaborative filtering model. The method comprises the following steps: S1, constructing a supernet according to a preset search space, wherein the parameters of the supernet are shared by all sub-architectures; S2, supernet training: randomly sampling a sub-architecture from the supernet and training the supernet by training the sampled sub-architecture; and S3, after the supernet training in S2 is finished, searching the supernet with an evolutionary algorithm for a collaborative filtering model with excellent performance, the optimal model being the result of the architecture search. Compared with the prior art, the method further expands the diversity of the search space and improves the expressive power of the collaborative filtering model; weight-sharing-based supernet training and architecture search greatly improve the efficiency of the search algorithm.

Description

Automatic architecture searching method of collaborative filtering model
Technical Field
The invention relates to the fields of artificial intelligence, recommendation algorithms and automated machine learning, and in particular to an automated architecture search method for a collaborative filtering model.
Background
Personalized recommendation is ubiquitous and has been applied to many online services, such as e-commerce, advertising and social media. Its core task is to estimate the likelihood that a user will adopt an item based on historical interactions such as purchases and clicks. Collaborative Filtering (CF) addresses this task by assuming that users who behave similarly will exhibit similar preferences for items. To implement this assumption, a common paradigm is to parameterize users and items, reconstruct the historical interactions, and predict user preferences from the parameters. Generally, a learnable CF model has two key parts:
(1) An embedding representation, which converts users and items into low-dimensional vector representations. The vector representations of all users and items are typically stored in an embedding matrix, each row of which is the vector representation of a specific user or item.
(2) An interaction function, which reconstructs the historical interactions from the embeddings. For example, Matrix Factorization (MF) models the interactions of user and item IDs directly as a matrix and then factorizes that matrix into latent factor matrices for users and for items, that is, one embedding vector per user and per item; the interaction between a user and an item is then modeled by the inner product. The rise of deep learning in recent years has inspired researchers to replace the traditional inner-product interaction with more powerful deep neural networks. The neural collaborative filtering model replaces the inner-product interaction function of matrix factorization with a non-linear neural network, the translation-based CF model uses a Euclidean distance metric as the interaction function, and so on.
The invention is mainly concerned with modeling the interaction function in a collaborative filtering model. Neural Collaborative Filtering (NCF) introduces a deep neural network for interaction modeling: the user-side and item-side inputs are concatenated and fed into the neural network, which yields stronger and more effective feature crossing and enriches the crossing of implicit features.
The deep neural network adopted in NCF is mainly a multi-layer perceptron (MLP) in which each layer has a different number of neurons. However, the number of neurons in each layer usually depends on the experience of human experts. A recommendation system covers a great number of business scenarios, and a single set of hyper-parameters is difficult to apply to all of them; if the hyper-parameters are poorly chosen, the model effect suffers greatly.
Therefore, some work attempts to apply automated machine learning (AutoML) techniques to the interaction function, so as to search for an appropriate interaction function for a particular data set. SIF holds that the optimal interaction function in collaborative filtering differs from data set to data set, and therefore searches for the interaction function with a neural architecture search (NAS) method. However, the search space of SIF only involves some simple mathematical operations, such as addition, subtraction, multiplication, division, maximum/minimum and inner product, so the expressive power of the model is limited.
In view of the shortcomings of NCF and SIF, the invention provides an automated architecture search method for a collaborative filtering model, that is, a NAS algorithm that automatically searches for the interaction function of the collaborative filtering model. A NAS algorithm is generally designed along three aspects:
(1) Search space: defines the set of neural network structures that can be searched, i.e., the solution space.
(2) Search strategy: defines how to find the optimal network structure in the search space.
(3) Evaluation method: defines how to evaluate the performance of a searched network structure.
Disclosure of Invention
Object of the invention: in view of the problems and shortcomings of the prior art, the invention aims to provide an automated architecture search method for a collaborative filtering model, which solves the problems that the search space of current automated architecture search for collaborative filtering models is limited and that the expressive power of the resulting model architectures is limited.
Technical solution: to achieve the above object, the technical solution adopted by the invention is an automated architecture search method for a collaborative filtering model, comprising the following steps:
S1, constructing a supernet according to a preset search space, wherein the parameters of the supernet are shared by all sub-architectures;
S2, supernet training: randomly sampling a sub-architecture from the supernet and training the supernet by training the sampled sub-architecture;
and S3, after the supernet training in S2 is finished, searching the supernet with an evolutionary algorithm for a collaborative filtering model with excellent performance, the optimal model being the result of the architecture search.
Further, the supernet in step S1 is constructed as follows:
S11, first, existing collaborative filtering models are summarized and abstracted into a unified framework, which is expressed as a mathematical model. Specifically, analysis of existing collaborative filtering models shows that a collaborative filtering model can be abstracted into two parts: an embedding representation, which converts users and items into low-dimensional vectors; and an interaction function, which reconstructs the historical interactions between users and items from the embeddings. The unified framework is expressed mathematically as follows. Define the embedding vector of user $u$ as $p_u \in \mathbb{R}^k$ and the embedding vector of item $i$ as $q_i \in \mathbb{R}^k$, where $\mathbb{R}$ denotes the set of real numbers and $k$ is the dimension of the embedding vectors. The user embedding table is $P \in \mathbb{R}^{m \times k}$ and the item embedding table is $Q \in \mathbb{R}^{n \times k}$, where $m$ is the number of users and $n$ is the number of items; $r_{ui}$ is the actual rating of item $i$ by user $u$; $\mathcal{D}$ is the training data, i.e., the set of user-item pairs with actual interactions; $\ell$ is the loss function; and $\lambda$ is the regularization coefficient. The optimization objective of the collaborative filtering model is defined as:
$$\min_{P, Q} F(P, Q; f) = \sum_{(u,i) \in \mathcal{D}} \ell\big(f(p_u, q_i), r_{ui}\big) + \lambda\big(\|P\|_F^2 + \|Q\|_F^2\big)$$
where $f$ is the interaction function, $\|\cdot\|_F$ denotes the Frobenius norm of a matrix, and $\lambda(\|P\|_F^2 + \|Q\|_F^2)$ is the regularization term, which alleviates overfitting during model optimization; the collaborative filtering model is optimized by minimizing the function $F(P, Q; f)$;
S12, on the basis of the unified framework of the collaborative filtering model obtained in S11, an architecture search technique is used to search for the key interaction function $f$ in collaborative filtering, which appears in the collaborative filtering model in the form of a specific architecture. The architecture search problem is defined mathematically as:
$$f^{*} = \arg\min_{f \in A} \sum_{(u,i) \in \mathcal{D}_{val}} \ell\big(f(p_u^{*}, q_i^{*}), r_{ui}\big)$$
$$\text{s.t.} \quad (P^{*}, Q^{*}) = \arg\min_{P, Q} F(P, Q; f)$$
where $\mathcal{D}_{val}$ is the user-item interaction data of the validation set, $A$ is the search space of the interaction function $f$, i.e., the supernet to be constructed, and $f^{*}$ is the optimal interaction function to be searched. This is a bilevel optimization problem: the inner level optimizes the collaborative filtering model on the training data, and the outer level optimizes the architecture of the interaction function on the validation data;
S13, an optimal interaction function architecture is to be found so that the collaborative filtering model achieves optimal performance on the given data. By further analyzing the interaction function architectures commonly used in existing collaborative filtering models, an interaction function architecture search space is provided, which contains operations commonly used in various recommendation system models. The search space is designed as a parameter-sharing supernet with $N$ layers, each layer containing $M$ different blocks, where each block corresponds to one operation; a candidate architecture is obtained by sampling one block at each layer of the supernet. The operations in the search space are defined as follows:
(1) Element-wise operation, i.e., EW operation: includes element-wise addition, element-wise maximum, element-wise minimum, element-wise average and inner product, forming 5 different blocks in total;
(2) Multi-layer perceptron operation, i.e., MLP operation: each defined MLP operation contains only three layers; the dimensions of the input layer and the output layer are fixed so that dimensions are aligned across the search space, and the hidden layer uses different numbers of neurons to realize different MLP operations; the hidden-layer sizes are 16, 32, 64, 128, 256, 512 and 1024, forming 7 different blocks in total;
(3) Cross network operation, i.e., CN operation: composed of multiple cross layers, where the output of a cross layer is computed by
$$x_{l+1} = x_0 x_l^{\top} w_l + b_l + x_l$$
where $x_0$ is the initial input of the cross network, $x_l$ is the output of the $l$-th cross layer, $w_l$ and $b_l$ are the weight and bias parameters of the $l$-th cross layer, and $x_{l+1}$ is the output of the $(l+1)$-th cross layer;
the output of the cross network contains all cross terms of order 1 to $l+1$,
$$x_1^{a_1} x_2^{a_2} \cdots x_{d_{CN}}^{a_{d_{CN}}}$$
where $d_{CN}$ is the dimension of the cross network output and $x_j^{a_j}$ denotes the $j$-th component of the vector $x$ raised to the power $a_j$;
the number of cross layers determines the maximum order of the cross terms; 4 kinds of cross networks are used, with 1, 2, 3 and 4 cross layers, forming 4 different blocks in total;
(4) Self-attention operation, i.e., SA operation: the three matrices $Q$, $K$ and $V_{SA}$ all come from the same input. The product of $Q$ and $K^{\top}$ is divided by the scale $\sqrt{d_k}$, where $d_k$ is the dimension of the key vectors, to prevent the result from becoming too large; it is then normalized into a probability distribution by the Softmax operation and multiplied by the matrix $V_{SA}$ to obtain a weighted-sum representation. Using different $Q$, $K$, $V_{SA}$ matrices constitutes a multi-head mechanism, which learns different attention representations. Different numbers of heads form different self-attention modules; self-attention modules with 1, 2, 3 and 4 heads are used, forming 4 different blocks in total. The attention operation is computed as:
$$\mathrm{Attention}(Q, K, V_{SA}) = \mathrm{softmax}\!\left(\frac{Q K^{\top}}{\sqrt{d_k}}\right) V_{SA}$$
(5) Factorization machine operation, i.e., FM operation: the factorization machine operation is expressed by the following equation:
$$\hat{y}(x) = w_0 + \sum_{i=1}^{d_{FM}} w_i x_i + \sum_{i=1}^{d_{FM}} \sum_{j=i+1}^{d_{FM}} \langle v_i, v_j \rangle x_i x_j$$
where $w_0$ and $w$ are the parameters of the factorization machine, $V_{FM}$ is the parameter matrix, $x$ is the input vector, $d_{FM}$ is the dimension of the input vector, $v_i$ denotes the $i$-th row of the parameter matrix $V_{FM}$, $v_j$ denotes the $j$-th row of $V_{FM}$, $x_i$ denotes the $i$-th component of the vector $x$, and $x_j$ denotes the $j$-th component of $x$.
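As an illustration of two of the operations defined above, the following is a minimal pure-Python sketch of a single cross layer and of a second-order factorization machine. It assumes the standard forms $x_{l+1} = x_0 x_l^{\top} w_l + b_l + x_l$ and $w_0 + \sum_i w_i x_i + \sum_{i<j} \langle v_i, v_j \rangle x_i x_j$; the function names and the list-based vector type are hypothetical and are not taken from the patent.

```python
from typing import List

Vector = List[float]


def cross_layer(x0: Vector, xl: Vector, w: Vector, b: Vector) -> Vector:
    """One cross layer: x_{l+1} = x_0 * (x_l . w_l) + b_l + x_l."""
    # x_0 x_l^T w_l equals x_0 scaled by the scalar x_l^T w_l.
    scale = sum(xi * wi for xi, wi in zip(xl, w))
    return [x0_i * scale + b_i + xl_i for x0_i, b_i, xl_i in zip(x0, b, xl)]


def factorization_machine(x: Vector, w0: float, w: Vector, v: List[Vector]) -> float:
    """Second-order FM: bias + linear term + pairwise interaction term."""
    linear = sum(wi * xi for wi, xi in zip(w, x))
    pairwise = 0.0
    for i in range(len(x)):
        for j in range(i + 1, len(x)):
            # Inner product of the i-th and j-th rows of the parameter matrix.
            dot = sum(a * c for a, c in zip(v[i], v[j]))
            pairwise += dot * x[i] * x[j]
    return w0 + linear + pairwise
```

Stacking `cross_layer` one to four times, each time with its own `w` and `b`, would correspond to the four CN blocks described above.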
Further, the supernet training in step S2 proceeds as follows: on the basis of the supernet constructed in S1, a sub-architecture is sampled from the supernet; the sampled sub-architecture inherits the parameters of the supernet, and the supernet is trained by training the sampled sub-architecture. Let $W_A$ denote all parameter weights of the supernet $A$, $W_A(a)$ the network parameters of sub-architecture $a$ in the supernet $A$, $\mathcal{N}(a, W_A(a))$ a specific sampled sub-architecture together with its network parameters, and $\Gamma(A)$ the prior distribution from which sub-architectures $a$ are sampled. The supernet training objective is:
$$W_A^{*} = \arg\min_{W_A} \mathbb{E}_{a \sim \Gamma(A)} \big[ \mathcal{L}_{train}\big(\mathcal{N}(a, W_A(a))\big) \big]$$
where $\mathbb{E}$ denotes expectation and $\mathcal{L}_{train}(\cdot)$ denotes the loss of an architecture on the training data; that is, the expected performance of sub-architectures randomly sampled from the supernet according to the prior distribution $\Gamma(A)$ is optimized;
S21, randomly initializing the parameters of the supernet: the supernet parameters are initialized with a standard network parameter initialization technique;
S22, training the supernet: the supernet training process is executed iteratively for a specified number of training rounds $T$, and each iteration comprises the following three steps:
(1) Randomly sample a sub-architecture from the supernet; the sub-architecture inherits and shares the parameters of the supernet;
(2) Randomly sample a batch of data from the training data;
(3) Train the sampled sub-architecture on the sampled batch by gradient descent, thereby updating the corresponding parameters in the supernet.
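The three steps of each iteration can be sketched as follows. This is a toy illustration only, assuming uniform sampling of sub-architectures and a deliberately trivial supernet in which every block holds a single shared scalar parameter that it adds to its input; the function names, the synthetic data and the hyper-parameter values are all hypothetical.

```python
import random
from typing import List, Tuple


def sample_architecture(num_layers: int, num_blocks: int, rng: random.Random) -> List[int]:
    """Uniformly sample one block index per layer (a single-path sub-architecture)."""
    return [rng.randrange(num_blocks) for _ in range(num_layers)]


def forward(shared: List[List[float]], arch: List[int], x: float) -> float:
    """Toy sub-network: each chosen block adds its shared scalar parameter."""
    return x + sum(shared[layer][block] for layer, block in enumerate(arch))


def train_supernet(num_layers: int = 2, num_blocks: int = 3, steps: int = 500,
                   batch_size: int = 8, lr: float = 0.1,
                   seed: int = 0) -> Tuple[List[List[float]], List[Tuple[float, float]]]:
    rng = random.Random(seed)
    # Shared parameter table: one scalar per (layer, block), reused by every sub-architecture.
    shared = [[0.0] * num_blocks for _ in range(num_layers)]
    # Synthetic training data (assumption): the target is the input shifted by 3.
    data = [(x, x + 3.0) for x in (rng.uniform(-1.0, 1.0) for _ in range(64))]
    for _ in range(steps):
        arch = sample_architecture(num_layers, num_blocks, rng)  # step (1)
        batch = rng.sample(data, batch_size)                     # step (2)
        # Step (3): gradient of the mean squared error with respect to each sampled
        # block's parameter (the output depends on each of them with derivative 1).
        grad = sum(2.0 * (forward(shared, arch, x) - y) for x, y in batch) / batch_size
        for layer, block in enumerate(arch):
            shared[layer][block] -= lr * grad  # only the sampled blocks are updated
    return shared, data
```

Because the update writes directly into the shared table, every sub-architecture that later reuses one of these blocks inherits the trained value, which is the weight-sharing property the training procedure relies on.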
Further, the evolutionary search algorithm in step S3 proceeds as follows:
S31, initializing the population $P_0$ with randomly sampled sub-architectures; the population contains $P$ individuals, each of which is a candidate sub-architecture, and the evolutionary algorithm iterates through population selection, crossover and mutation to find the optimal solution;
S32, calculating the fitness: the fitness of each individual in the population is the performance of the corresponding sub-architecture; thanks to parameter sharing, the sub-architecture does not need to be trained, and it is evaluated directly with the parameters inherited from the supernet to obtain the fitness of the individual;
S33, executing an evolution step: $P/2$ independent selections are made from the population according to individual fitness, one individual per selection, with the selection probability of an individual proportional to its fitness; the selected individuals serve as parents. Crossover and mutation are applied to the parents: each block in a selected individual mutates with probability 0.1, and crossover exchanges one randomly chosen block between two randomly selected individuals. The offspring produced by crossover and mutation are added to the population, and the $P$ individuals with the highest fitness in the new population form the next population.
S34, after the evolutionary algorithm reaches the predefined number of iterations, the $t$ sub-architectures with the highest fitness are selected from the final population, trained and evaluated, and the best of them is taken as the search result.
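Steps S31 to S34 can be sketched as below. This is a minimal illustration under stated assumptions rather than the patented implementation: the `fitness` callable stands in for evaluating a sub-architecture with inherited supernet parameters and must return positive values, the number of crossover pairs per generation is an assumption (the text does not fix it), and all names and default values are hypothetical.

```python
import random
from typing import Callable, List, Tuple

Arch = Tuple[int, ...]


def evolutionary_search(fitness: Callable[[Arch], float], num_layers: int,
                        num_blocks: int, pop_size: int = 20, generations: int = 30,
                        top_t: int = 3, mutation_prob: float = 0.1,
                        seed: int = 0) -> Tuple[List[Arch], List[float]]:
    rng = random.Random(seed)
    # S31: initialise the population with randomly sampled sub-architectures.
    population = [tuple(rng.randrange(num_blocks) for _ in range(num_layers))
                  for _ in range(pop_size)]
    best_history: List[float] = []
    for _ in range(generations):
        # S32: fitness of every individual (hypothetical evaluator, positive values).
        scores = [fitness(a) for a in population]
        best_history.append(max(scores))
        # S33: P/2 independent fitness-proportional selections of parents.
        parents = rng.choices(population, weights=scores, k=pop_size // 2)
        offspring: List[Arch] = []
        for _ in range(len(parents) // 2):  # assumed number of crossover pairs
            p1, p2 = (list(p) for p in rng.sample(parents, 2))
            pos = rng.randrange(num_layers)  # crossover: exchange one block
            p1[pos], p2[pos] = p2[pos], p1[pos]
            for child in (p1, p2):
                for k in range(num_layers):  # mutation: each block with prob. 0.1
                    if rng.random() < mutation_prob:
                        child[k] = rng.randrange(num_blocks)
                offspring.append(tuple(child))
        merged = population + offspring
        merged.sort(key=fitness, reverse=True)
        population = merged[:pop_size]  # keep the P fittest individuals
    population.sort(key=fitness, reverse=True)
    best_history.append(fitness(population[0]))
    # S34: the top-t candidates would then be trained and evaluated in full.
    return population[:top_t], best_history
```

Since survivors are chosen from the union of the old population and the offspring, the best fitness in the population never decreases from one generation to the next.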
Beneficial effects:
The invention provides an automated architecture search for collaborative filtering models. A NAS algorithm is introduced to search, for a specific data set, the interaction function of the corresponding collaborative filtering model, thereby improving the effect of the model. Compared with other work that applies NAS to collaborative filtering models, the search space of the invention is compact and efficient and contains the operations and modules commonly used in various recommendation system models. The search space is a parameter-sharing supernet: sub-architectures can be randomly sampled and trained efficiently, thereby training the whole supernet. The search strategy of the invention is a two-stage method: first, sub-architectures are sampled from the supernet to train the supernet parameters; then an evolutionary algorithm searches the supernet for excellent sub-architectures. In experiments on public data sets, the method of the invention shows clear improvements over manually designed methods and other NAS algorithms.
Drawings
FIG. 1 is a general framework of the present invention;
FIG. 2 is an overall schematic diagram of the search space and search algorithm of the present invention. In FIG. 2, (a) shows the search space and (b) shows one solution in the search space, obtained by sampling one block at each layer.
Detailed Description
The present invention is further illustrated by the following figures and specific examples, which are to be understood as illustrative only and not as limiting the scope of the invention, which is to be given the full breadth of the appended claims and any and all equivalent modifications thereof which may occur to those skilled in the art upon reading the present specification.
The invention discloses a collaborative filtering model automatic architecture searching method, which comprises the following steps:
S1, constructing a supernet according to a preset search space, wherein the parameters of the supernet are shared by all sub-architectures;
S11, first, existing collaborative filtering models are summarized and abstracted into a unified framework, which is expressed as a mathematical model. Analysis of existing collaborative filtering models shows that a collaborative filtering model can be abstracted into two parts: an embedding representation, which converts users and items into low-dimensional vectors; and an interaction function, which reconstructs the historical interactions between users and items from the embeddings. With user embedding vectors $p_u$, item embedding vectors $q_i$, embedding tables $P$ and $Q$, actual ratings $r_{ui}$, training data $\mathcal{D}$, loss function $\ell$ and regularization coefficient $\lambda$, the conventional collaborative filtering objective is expressed mathematically as:
$$\min_{P, Q} \sum_{(u,i) \in \mathcal{D}} \ell\big(\langle p_u, q_i \rangle, r_{ui}\big) + \lambda\big(\|P\|_F^2 + \|Q\|_F^2\big)$$
The first half of the formula measures the difference between the user-item similarity computed by the model and the actual user-item similarity. The similarity computed by the model is the inner product of the embedding vectors of the user and the item, while the actual similarity is usually derived from an explicit action of the user on the item, such as the rating a user gives to a movie (1 to 5 stars). $\ell$ is the loss function, for which the mean squared error (MSE) is usually used, so the user's ratings of items generally need to be normalized or standardized accordingly. The second half of the formula computes the norms of the user and item embedding tables, which acts as a penalty and suppresses overfitting. Conventional collaborative filtering models typically compute the similarity in inner-product form, $\langle p_u, q_i \rangle$. However, the inner product does not perform optimally on all data sets, and different data sets may require different forms of interaction. The idea of AutoML can therefore be introduced, using a NAS algorithm to search for a specific interaction function for a specific data set.
The optimization objective of the collaborative filtering model is defined as:
$$\min_{P, Q} F(P, Q; f) = \sum_{(u,i) \in \mathcal{D}} \ell\big(f(p_u, q_i), r_{ui}\big) + \lambda\big(\|P\|_F^2 + \|Q\|_F^2\big)$$
where $f$ is the interaction function, $\|\cdot\|_F$ denotes the Frobenius norm of a matrix, $\lambda(\|P\|_F^2 + \|Q\|_F^2)$ is the regularization term, and $F(P, Q; f)$ is the function defined by the above formula. Unlike the previous formula, the inner product $\langle p_u, q_i \rangle$ used to compute the similarity is replaced by the abstract interaction function $f$, which is a learnable neural network architecture. This definition is also easy to extend: for example, in NCF $f$ is a multi-layer perceptron, while in SIF $f$ represents different mathematical interaction operations.
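To make the role of the pluggable interaction function concrete, the following is a minimal sketch of evaluating the objective for a given choice of $f$. It assumes a squared-error loss and a regularizer of the form $\lambda(\|P\|_F^2 + \|Q\|_F^2)$ without a factor of 1/2; the function names and the `(user, item, rating)` triple format of the data are hypothetical.

```python
from typing import Callable, List, Sequence, Tuple

Vector = Sequence[float]
Interaction = Callable[[Vector, Vector], float]


def inner_product(p: Vector, q: Vector) -> float:
    """Conventional interaction function: inner product of the two embeddings."""
    return sum(a * b for a, b in zip(p, q))


def squared_error(pred: float, target: float) -> float:
    return (pred - target) ** 2


def cf_objective(P: List[Vector], Q: List[Vector],
                 data: List[Tuple[int, int, float]],
                 f: Interaction = inner_product,
                 loss: Callable[[float, float], float] = squared_error,
                 lam: float = 0.01) -> float:
    """Data-fit term over observed (user, item, rating) triples plus Frobenius penalty."""
    fit = sum(loss(f(P[u], Q[i]), r) for u, i, r in data)
    reg = lam * (sum(x * x for row in P for x in row)
                 + sum(x * x for row in Q for x in row))
    return fit + reg
```

Passing a different callable as `f`, for example an element-wise maximum followed by a sum, changes the interaction function without touching the rest of the objective, which is the extension point that the architecture search exploits.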
S12, on the basis of the abstract framework obtained in step S11, the invention uses an architecture search technique to search for the key interaction function $f$ in collaborative filtering, which appears in the collaborative filtering model in the form of a specific architecture. The aim is to obtain a more suitable interaction function architecture through architecture search and thereby improve the performance of the collaborative filtering model. The architecture search problem is defined mathematically as:
$$f^{*} = \arg\min_{f \in A} \sum_{(u,i) \in \mathcal{D}_{val}} \ell\big(f(p_u^{*}, q_i^{*}), r_{ui}\big)$$
$$\text{s.t.} \quad (P^{*}, Q^{*}) = \arg\min_{P, Q} F(P, Q; f)$$
where $\mathcal{D}_{val}$ is the user-item interaction data of the validation set, $A$ is the search space of the interaction function $f$, i.e., the supernet to be constructed, and $f^{*}$ is the optimal interaction function to be searched. This is a bilevel optimization problem: the inner level optimizes the collaborative filtering model on the training data, and the outer level optimizes the architecture of the interaction function on the validation data. Overall, the goal is to find an optimal interaction function architecture so that the collaborative filtering model achieves optimal performance on the given data.
S13, to solve the search problem posed in S12, the invention provides a compact and efficient interaction function architecture search space, obtained by further analyzing the interaction function architectures commonly used in existing collaborative filtering models. The search space contains a variety of operations that are common in recommendation system models. The whole search space is designed as a parameter-sharing supernet, as shown in FIG. 2, with $N$ layers, each layer containing $M$ different blocks, each of which processes data in a different manner. In the subsequent sub-architecture training and search, only one specific block is activated in each layer; that is, as shown in FIG. 2, only one of the dashed blocks in each layer is selected for the architecture actually sampled or searched. Note that the original inputs (the user embedding and the item embedding) are fed to every layer as input, while the output of layer $i$ is passed on as input to layer $i+1$. Because feature crossing is very important in recommendation system models, this design enriches the implicit crossing of low-order features (including the original inputs) with high-order features and improves the expressive power of the model. The design is also similar to the skip connections in ResNet: borrowing the idea of residual learning, it lets the original input propagate more easily to the deeper parts of the network and lets gradients flow back more easily to the shallower layers during backpropagation, so the model is easier to train.
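Sampling one candidate architecture from such a supernet can be sketched as follows. The block names follow the five operation families enumerated in this description; treating the factorization machine as a single block and choosing three layers in the example are assumptions made only for illustration, and the function name is hypothetical.

```python
import random
from typing import List

# Candidate blocks available in every layer of the supernet.
BLOCKS: List[str] = (
    [f"EW-{op}" for op in ("sum", "max", "min", "avg", "inner")]  # 5 element-wise blocks
    + [f"MLP-{h}" for h in (16, 32, 64, 128, 256, 512, 1024)]     # 7 hidden sizes
    + [f"CN-{n}" for n in (1, 2, 3, 4)]                           # 1 to 4 cross layers
    + [f"SA-{h}" for h in (1, 2, 3, 4)]                           # 1 to 4 attention heads
    + ["FM"]                                                      # assumption: one FM block
)


def sample_sub_architecture(num_layers: int, rng: random.Random) -> List[str]:
    """Activate exactly one block per layer, chosen uniformly at random."""
    return [rng.choice(BLOCKS) for _ in range(num_layers)]


# Example: one candidate for a three-layer supernet (the layer count is an assumption).
candidate = sample_sub_architecture(3, random.Random(0))
```

Under these assumptions each layer offers 21 blocks, so a three-layer supernet would contain 21^3 = 9261 candidate architectures, all sharing the same parameter set.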
Each layer contains $M$ different blocks, all of which are operations and modules commonly used in recommendation system models. They fall into 5 broad categories, and each category of blocks has its own hyper-parameters. The 5 categories of operations are listed below:
(1) Element-Wise (EW) operation: element-level mathematical operations, including element-wise addition (sum), element-wise maximum (max), element-wise minimum (min), element-wise average (avg) and inner product, forming 5 different blocks in total.
(2) Multilayer perceptron (MLP) operation: a simple feed-forward neural network. For the sake of search space diversity, each MLP operation defined by the invention contains only three layers; the dimensions of the input layer and the output layer are fixed so that dimensions are aligned across the search space, and the hidden layer uses different numbers of neurons to realize different MLP operations. The hidden-layer sizes are 16, 32, 64, 128, 256, 512 and 1024, forming 7 different blocks in total.
(3) Cross Network (CN) operation: composed of multiple cross layers, where the output of a cross layer is computed by
$$x_{l+1} = x_0 x_l^{\top} w_l + b_l + x_l$$
where $x_0$ is the initial input of the cross network, $x_l$ is the output of the $l$-th cross layer, $w_l$ and $b_l$ are the weight and bias parameters of the $l$-th cross layer, and $x_{l+1}$ is the output of the $(l+1)$-th cross layer;
the output of the cross network contains all cross terms of order 1 to $l+1$,
$$x_1^{a_1} x_2^{a_2} \cdots x_{d_{CN}}^{a_{d_{CN}}}$$
where $d_{CN}$ is the dimension of the cross network output and $x_j^{a_j}$ denotes the $j$-th component of the vector $x$ raised to the power $a_j$;
the number of cross layers determines the maximum order of the cross terms, and this embodiment uses 4 kinds of cross networks, with 1, 2, 3 and 4 cross layers, forming 4 different blocks in total.
(4) Self-Attention (SA) operation: the self-attention mechanism is a special case of the attention mechanism in which the three matrices Q (Query), K (Key) and V_SA (Value) all come from the same input. The result computed from the matrices is normalized into a probability distribution by the Softmax operation and then multiplied by the matrix V_SA to obtain a weighted-sum representation; to prevent the result from becoming too large, it is divided by a scale
Figure 048789DEST_PATH_IMAGE026
. Using several different sets of Q, K and V_SA matrices gives the multi-head mechanism, which can learn different attention representations. Different numbers of heads form different self-attention modules; the invention uses self-attention modules with 1, 2, 3 and 4 heads, constituting 4 different blocks in total. The attention operation is calculated as follows:
Figure 136831DEST_PATH_IMAGE027
(5) Factorization Machine (FM) operation: the factorization machine operation can be expressed by the following formula:
Figure 997340DEST_PATH_IMAGE028
where
Figure 258557DEST_PATH_IMAGE029
are the parameters of the factorization machine, V_FM is the parameter matrix, x is the input vector, d_FM is the dimension of the input vector,
Figure 989752DEST_PATH_IMAGE030
denotes the i-th row of the parameter matrix V_FM,
Figure 045433DEST_PATH_IMAGE031
denotes the j-th row of the matrix V_FM,
Figure 127659DEST_PATH_IMAGE032
denotes the i-th component of the vector x, and
Figure 130250DEST_PATH_IMAGE033
denotes the j-th component of the vector x.
The output of the factorization machine is the sum of the first-order and second-order feature interactions. Since both inputs have already been converted to low-dimensional embeddings, the invention concatenates the inputs into a single tensor and computes the first-order and second-order feature interactions on it.
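The formula images for the cross layer, the self-attention operation and the factorization machine are not reproduced in this text. As a reading aid only, and assuming the patent follows the standard published forms of these three operations (an assumption, since the images are unavailable), they would read roughly as below. The symbols d_k (key dimension), w_0, w_i and v_i (the i-th row of V_FM) are conventional names chosen for this sketch and may differ from those in the figures.

```latex
% Assumed standard forms; not recovered from the patent figures.
\begin{align}
x_{l+1} &= x_0 \, x_l^{\top} w_l + b_l + x_l
  && \text{(cross layer)} \\
\mathrm{Attention}(Q, K, V_{SA}) &=
  \mathrm{softmax}\!\left(\frac{Q K^{\top}}{\sqrt{d_k}}\right) V_{SA}
  && \text{(self-attention)} \\
\hat{y}(x) &= w_0 + \sum_{i=1}^{d_{FM}} w_i x_i
  + \sum_{i=1}^{d_{FM}} \sum_{j=i+1}^{d_{FM}} \langle v_i, v_j \rangle \, x_i x_j
  && \text{(factorization machine)}
\end{align}
```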
Thus, each layer in the search space of the invention has a total of
Figure 515619DEST_PATH_IMAGE049
different blocks. To enable parameter sharing and to make it easier for different types of blocks to process multiple inputs, the output of each block is fixed to a specified dimension, denoted
Figure 211043DEST_PATH_IMAGE050
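The per-layer block catalogue and the way a candidate architecture is drawn from it can be sketched as follows. This is a minimal illustration, not the patented implementation: the names `BLOCKS`, `CANDIDATES`, `sample_sub_architecture` and `search_space_size` are hypothetical, the depth `N = 3` is an arbitrary example value, and the factorization machine is assumed to contribute exactly one block because the text lists no hyper-parameter for it. Under that assumption the five categories give 5 + 7 + 4 + 4 + 1 = 21 blocks per layer.

```python
import random
from typing import Dict, List, Tuple

# Block catalogue per layer, taken from the five categories in the text.
# Assumption: FM has no listed hyper-parameter, so it counts as one block.
BLOCKS: Dict[str, List[object]] = {
    "EW": ["sum", "max", "min", "avg", "inner_product"],
    "MLP": [16, 32, 64, 128, 256, 512, 1024],  # hidden-layer sizes
    "CN": [1, 2, 3, 4],                         # number of cross layers
    "SA": [1, 2, 3, 4],                         # number of attention heads
    "FM": ["fm"],
}

# Flatten into the M candidate blocks available at every layer.
CANDIDATES: List[Tuple[str, object]] = [
    (category, option)
    for category, options in BLOCKS.items()
    for option in options
]
M = len(CANDIDATES)


def sample_sub_architecture(num_layers: int, rng: random.Random) -> List[Tuple[str, object]]:
    """Pick one block uniformly at random for each of the N layers."""
    return [rng.choice(CANDIDATES) for _ in range(num_layers)]


def search_space_size(num_layers: int) -> int:
    """Number of distinct sub-architectures: M choices at each of N layers."""
    return M ** num_layers


if __name__ == "__main__":
    N = 3  # hypothetical depth; the text does not fix N here
    arch = sample_sub_architecture(N, random.Random(0))
    print(M, search_space_size(N), arch)
```

Because the count grows as M to the power N, even a shallow super-network holds far more candidates than could be trained one by one, which is the motivation for sharing parameters across sub-architectures.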
S2, super-network training: randomly sample a sub-architecture from the super-network, and train the super-network by training the sampled sub-architecture;
On the basis of the super-network constructed in S1, a sub-architecture is sampled from the super-network; the sampled sub-architecture inherits the parameters of the super-network, and the super-network is trained by training the sampled sub-architecture. Denote by
Figure 514985DEST_PATH_IMAGE035
the network parameters of sub-architecture a in the super-network A,
Figure 383584DEST_PATH_IMAGE036
denotes a specific sampled sub-architecture together with its network parameters, and
Figure 620530DEST_PATH_IMAGE037
is the prior distribution for sampling sub-architecture a. The objective of training the super-network is:
Figure 814751DEST_PATH_IMAGE038
where
Figure 524432DEST_PATH_IMAGE039
denotes the expectation and
Figure 665563DEST_PATH_IMAGE040
denotes the loss of the architecture on the training data; that is, the objective optimizes the expected performance of sub-architectures randomly sampled from the super-network according to the prior distribution
Figure 225857DEST_PATH_IMAGE037
. As the formula shows, the architecture parameters are decoupled into the prior distribution
Figure 325400DEST_PATH_IMAGE037
, so training the parameters of the super-network does not involve optimizing architecture parameters. The prior distribution
Figure 541618DEST_PATH_IMAGE037
therefore plays a more important role. Recent work has found that pure random search is very competitive with several state-of-the-art NAS methods, and, according to the experience of some previous work, setting
Figure 486440DEST_PATH_IMAGE037
to uniform sampling gives good results. In the present invention, therefore,
Figure 169750DEST_PATH_IMAGE037
is a uniform distribution that is fixed and does not need to be learned while the super-network is trained. The sub-architecture illustrated in fig. 2 can be viewed as one randomly sampled sub-architecture.
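The objective formula is an image that is not reproduced in this text. Assuming it follows the usual single-path one-shot formulation, it could be written as below. The symbols W (all super-network weights), W(a) (the weights inherited by sub-architecture a), N(a, W(a)) (the sampled sub-architecture with its weights), Γ(A) (the prior over sub-architectures) and L_train (the training loss) are notation chosen for this sketch rather than symbols confirmed by the figures.

```latex
% Assumed notation; the patent's own symbols are in figures not shown here.
\begin{equation}
W_A^{*} = \operatorname*{arg\,min}_{W}\;
  \mathbb{E}_{a \sim \Gamma(A)}
  \left[ \mathcal{L}_{train}\big(\mathcal{N}(a, W(a))\big) \right],
\qquad
\Gamma(A) = \mathrm{Uniform}(A)
\end{equation}
```

With a fixed uniform prior, the expectation is approximated in practice by drawing one sub-architecture per training step.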
S21, randomly initialize the parameters of the super-network: the parameters of the super-network are initialized by a standard network parameter initialization technique.
S22, train the super-network. When training the super-network, the number of training rounds is specified as
Figure 174615DEST_PATH_IMAGE051
If time and resources are sufficient,
Figure 878129DEST_PATH_IMAGE052
can be set larger; if time and resources are limited,
Figure 626642DEST_PATH_IMAGE051
can be set smaller, at the cost that the parameters of the super-network may not be sufficiently trained. Each iteration comprises the following three steps:
(1) Randomly sample a sub-architecture from the super-network; the sub-architecture inherits and shares the parameters in the super-network.
(2) Randomly sample a batch of data from the training data.
(3) Train the sampled sub-architecture on the sampled batch using gradient descent, which updates the corresponding parameters in the super-network.
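The three steps of each iteration can be sketched with a deliberately tiny stand-in for the super-network. This is an illustration under stated assumptions, not the patented model: every block is reduced to a single scalar weight that multiplies its input, the loss is squared error, and all function names (`init_supernet`, `forward`, `train_step`, `train_supernet`) are hypothetical. The point it shows is that only the weights on the sampled path are updated, while they remain stored in, and shared through, the super-network.

```python
import random
from typing import List, Tuple

Weights = List[List[float]]  # weights[layer][block]


def init_supernet(num_layers: int, num_blocks: int, rng: random.Random) -> Weights:
    """One scalar weight per (layer, block); a stand-in for real block parameters."""
    return [[rng.uniform(0.5, 1.5) for _ in range(num_blocks)] for _ in range(num_layers)]


def forward(weights: Weights, arch: List[int], x: float) -> List[float]:
    """Run the sampled path and return the activations h_0 .. h_N."""
    acts = [x]
    for layer, block in enumerate(arch):
        acts.append(weights[layer][block] * acts[-1])
    return acts


def train_step(weights: Weights, arch: List[int],
               batch: List[Tuple[float, float]], lr: float) -> float:
    """One gradient-descent step on the sampled sub-architecture; returns the batch loss."""
    grads = [0.0] * len(arch)
    loss = 0.0
    for x, y in batch:
        acts = forward(weights, arch, x)
        err = acts[-1] - y
        loss += err * err
        grad_h = 2.0 * err  # dLoss/dh_N
        for layer in reversed(range(len(arch))):
            grads[layer] += grad_h * acts[layer]   # dLoss/dw_layer
            grad_h *= weights[layer][arch[layer]]  # dLoss/dh_layer
    for layer, block in enumerate(arch):
        # Only the sampled block of each layer is touched.
        weights[layer][block] -= lr * grads[layer] / len(batch)
    return loss / len(batch)


def train_supernet(weights: Weights, data: List[Tuple[float, float]],
                   num_rounds: int, batch_size: int, lr: float,
                   rng: random.Random) -> None:
    num_blocks = len(weights[0])
    for _ in range(num_rounds):
        arch = [rng.randrange(num_blocks) for _ in weights]  # step (1): uniform sampling
        batch = rng.sample(data, batch_size)                 # step (2): random batch
        train_step(weights, arch, batch, lr)                 # step (3): update shared weights
```

A real implementation would replace the scalar blocks with the EW, MLP, CN, SA and FM modules and rely on an automatic-differentiation framework for the gradients.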
S3, after the training of the super-network in S2 is finished, an evolutionary algorithm is used to search the super-network for the best-performing collaborative filtering model; this optimal model is the result of the architecture search. Thanks to parameter sharing, the accuracy of each candidate sub-architecture can be obtained by direct inference, without training it. Weight sharing thus greatly improves the efficiency of architecture search.
S31, the evolutionary algorithm finds the optimal solution by iteratively applying selection, crossover and mutation to a population. Each individual in the population is a candidate sub-architecture. First, a population P_0 containing P individuals is initialized with randomly sampled sub-architectures.
S32, calculate the fitness. The fitness of each individual in the population is the performance of the corresponding sub-architecture. With parameter sharing, the sub-architecture does not need to be trained; it directly inherits the parameters of the super-network and is evaluated to obtain the fitness of the individual.
S33, execute an evolution step: independently select P/2 individuals from the population according to fitness, one at a time, where the probability that an individual is selected is proportional to its fitness; the selected individuals serve as parents. Crossover and mutation are applied to the parents: each block in a selected individual mutates with probability 0.1, and for crossover one block is randomly selected in two randomly selected individuals and crossed. The offspring produced by crossover and mutation are added to the population, and the P individuals with the highest fitness in the new population form the new population.
S34, after the evolutionary algorithm reaches the predefined number of iterations, the t sub-architectures with the highest fitness are selected from the final population, trained and evaluated, and the best sub-architecture is selected as the search result.
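Steps S31 to S33 can be sketched as the loop below. It is a minimal illustration under several assumptions: a sub-architecture is encoded as a list with one block index per layer, `fitness_fn` is a hypothetical callable that stands in for evaluating a sub-architecture with inherited super-network weights and must return a positive score where higher is better, and crossover is interpreted as swapping the block at one randomly chosen layer between two parents. All function names are illustrative.

```python
import random
from typing import Callable, List, Tuple

Arch = List[int]  # one block index per layer


def roulette_select(pop: List[Arch], fitness: List[float], k: int,
                    rng: random.Random) -> List[Arch]:
    """Draw k parents independently, each with probability proportional to fitness."""
    return [list(p) for p in rng.choices(pop, weights=fitness, k=k)]


def mutate(arch: Arch, num_blocks: int, p_mut: float, rng: random.Random) -> Arch:
    """Resample each block independently with probability p_mut (0.1 in the text)."""
    return [rng.randrange(num_blocks) if rng.random() < p_mut else b for b in arch]


def crossover(a: Arch, b: Arch, rng: random.Random) -> Tuple[Arch, Arch]:
    """Swap the block at one randomly chosen layer between two parents (assumed reading)."""
    i = rng.randrange(len(a))
    child_a, child_b = list(a), list(b)
    child_a[i], child_b[i] = b[i], a[i]
    return child_a, child_b


def evolve(fitness_fn: Callable[[Arch], float], num_layers: int, num_blocks: int,
           pop_size: int, generations: int, rng: random.Random,
           p_mut: float = 0.1) -> List[Arch]:
    # S31: initialise the population with randomly sampled sub-architectures.
    pop = [[rng.randrange(num_blocks) for _ in range(num_layers)]
           for _ in range(pop_size)]
    for _ in range(generations):
        fit = [fitness_fn(a) for a in pop]                        # S32: fitness
        parents = roulette_select(pop, fit, pop_size // 2, rng)   # S33: selection
        offspring = [mutate(p, num_blocks, p_mut, rng) for p in parents]
        if len(parents) >= 2:
            pa, pb = rng.sample(parents, 2)
            offspring.extend(crossover(pa, pb, rng))
        # Keep the P fittest individuals of parents plus offspring.
        pop = sorted(pop + offspring, key=fitness_fn, reverse=True)[:pop_size]
    return pop
```

For S34, a caller would take the first t entries of the returned population (it is sorted by fitness whenever at least one generation has run), train each of them fully, and keep the best. If the evaluation metric is an error such as mean squared error, it has to be mapped to a positive, higher-is-better score before it is used as a selection weight.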
Table 1 compares the mean squared error of the present invention with that of other methods; it can be seen that the present invention is significantly improved over the other methods.
TABLE 1 Comparison of the mean squared error of the invention with other methods
Figure 895949DEST_PATH_IMAGE053

Claims (4)

1. An automatic architecture searching method of a collaborative filtering model is characterized by comprising the following steps:
S1, constructing a super-network according to a preset search space, wherein the parameters in the super-network are shared by all sub-architectures;
S2, super-network training: randomly sampling a sub-architecture from the super-network, and training the super-network by training the sampled sub-architecture;
S3, after the training of the super-network in S2 is finished, searching the super-network for the best-performing collaborative filtering model by using an evolutionary algorithm, wherein the optimal model is the result of the architecture search.
2. The automatic architecture searching method of a collaborative filtering model according to claim 1, wherein the specific method for constructing the super-network in step S1 is:
S11, first summarizing and abstracting existing collaborative filtering models to form a unified framework of the collaborative filtering model, and expressing the unified framework as a mathematical model; specifically, by analyzing existing collaborative filtering models, the collaborative filtering model is abstracted into two parts: an embedding representation, which converts users and items into low-dimensional vector representations; and an interaction function, which reconstructs the historical interactions of users and items based on the embeddings; the mathematical representation of the abstract collaborative filtering framework is as follows: first define, for user
Figure 828852DEST_PATH_IMAGE001
the embedding vector
Figure 273740DEST_PATH_IMAGE002
and, for item
Figure 486547DEST_PATH_IMAGE003
the embedding vector
Figure 067701DEST_PATH_IMAGE004
where
Figure 337620DEST_PATH_IMAGE005
denotes the set of real numbers and k is the dimension of the embedding vectors; the embedding table for user
Figure 004225DEST_PATH_IMAGE001
is
Figure 286302DEST_PATH_IMAGE006
and the embedding table for item
Figure 456383DEST_PATH_IMAGE003
is
Figure 431292DEST_PATH_IMAGE007
where m is the number of users and n is the number of items;
Figure 319614DEST_PATH_IMAGE008
is the actual rating given by user
Figure 139802DEST_PATH_IMAGE009
to item
Figure 429969DEST_PATH_IMAGE003
;
Figure 310201DEST_PATH_IMAGE010
is the data in the training set in which users and items have actually interacted,
Figure 682889DEST_PATH_IMAGE011
is the loss function, and
Figure 306768DEST_PATH_IMAGE012
is the regularization coefficient; the optimization objective of the collaborative filtering model is defined as:
Figure 716021DEST_PATH_IMAGE013
where
Figure 502574DEST_PATH_IMAGE014
is the interaction function,
Figure 896647DEST_PATH_IMAGE015
denotes the Frobenius norm of a matrix, and
Figure 793058DEST_PATH_IMAGE016
is the regularization term, used to alleviate the over-fitting problem in model optimization; the collaborative filtering model is optimized by minimizing the function
Figure 323397DEST_PATH_IMAGE017
;
S12, on the basis of the unified framework of the collaborative filtering model obtained in S11, using an architecture search technique to search for the key interaction function
Figure 279852DEST_PATH_IMAGE014
in collaborative filtering, which is represented in the collaborative filtering model in the form of a specific architecture; the mathematical definition of the architecture search technique is:
Figure 892711DEST_PATH_IMAGE018
Figure 592814DEST_PATH_IMAGE019
where
Figure 712079DEST_PATH_IMAGE020
is the interaction data of users and items on the validation set, A is the search space of the interaction function
Figure 573856DEST_PATH_IMAGE021
, that is, the super-network to be built, and
Figure 411362DEST_PATH_IMAGE022
is the optimal interaction function to be searched for; this is a bi-level optimization problem, in which the inner level optimizes the collaborative filtering model on the training data and the outer level optimizes the architecture of the interaction function on the validation data;
S13, finding an optimal interaction function architecture so that the collaborative filtering model achieves optimal performance on the given data; the interaction function architectures commonly used in existing collaborative filtering models are further analyzed to provide an interaction function architecture search space, wherein the search space comprises operations commonly used in various recommendation system models; the search space is designed as a parameter-sharing super-network with N layers, each layer comprising M different blocks, wherein each block corresponds to one operation; a candidate architecture is obtained by sampling one block at each layer of the super-network; the operations in the search space are defined as follows:
(1) Element-Wise operation, i.e., EW operation: including bitwise addition, bitwise maximum, bitwise minimum, bitwise average and inner product, forming 5 different blocks in total;
(2) multi-layer perceptron operation, i.e., MLP operation: each defined MLP operation comprises only three layers; the input layer and the output layer have specified dimensions so that dimensions are aligned across the search space; the hidden layer uses different numbers of neurons to form different MLP operations, the hidden-layer sizes being 16, 32, 64, 128, 256, 512 and 1024, forming 7 different blocks in total;
(3) cross network operation, i.e., CN operation: composed of multiple cross layers, where the output of a cross layer is calculated by the following formula:
Figure 915156DEST_PATH_IMAGE023
where x_0 is the initial input to the cross network, x_l is the output of the l-th cross layer, w_l and b_l are respectively the weight parameters and bias parameters of the l-th cross layer, and x_{l+1} is the output of the (l+1)-th cross layer;
the output of the cross network contains all cross terms from order 1 to order l+1:
Figure 420087DEST_PATH_IMAGE024
where d_CN is the dimension of the cross network output, and
Figure 249502DEST_PATH_IMAGE025
denotes the a_j-th component of the vector x_i;
the number of cross layers determines the maximum order of the cross terms; 4 types of cross networks are used, with 1, 2, 3 and 4 cross layers, forming 4 different blocks in total;
(4) self-attention operation, i.e., SA operation: the three matrices Q, K and V_SA all come from the same input; the result computed from the matrices is normalized into a probability distribution by the Softmax operation and then multiplied by the matrix V_SA to obtain a weighted-sum representation; to prevent the result from becoming too large, it is divided by a scale
Figure 329233DEST_PATH_IMAGE026
; using several different sets of Q, K and V_SA matrices gives the multi-head mechanism, so that different attention representations are learned; different numbers of heads form different self-attention modules, and self-attention modules with 1, 2, 3 and 4 heads are used, forming 4 different blocks in total; the attention operation is calculated as follows:
Figure 636718DEST_PATH_IMAGE027
(5) factorization machine operation, i.e., FM operation: the factorization machine operation is expressed by the following formula:
Figure 730576DEST_PATH_IMAGE028
where
Figure 199734DEST_PATH_IMAGE029
are the parameters of the factorization machine, V_FM is the parameter matrix, x is the input vector, d_FM is the dimension of the input vector,
Figure 746253DEST_PATH_IMAGE030
denotes the i-th row of the parameter matrix V_FM,
Figure 857429DEST_PATH_IMAGE031
denotes the j-th row of the matrix V_FM,
Figure 805793DEST_PATH_IMAGE032
denotes the i-th component of the vector x, and
Figure 711432DEST_PATH_IMAGE033
denotes the j-th component of the vector x.
3. The automatic architecture searching method of a collaborative filtering model according to claim 1, wherein the specific method for training the super-network in step S2 is: on the basis of the super-network constructed in S1, sampling a sub-architecture from the super-network, wherein the sampled sub-architecture inherits the parameters of the super-network, and training the super-network by training the sampled sub-architecture; denoting
Figure 742317DEST_PATH_IMAGE035
as all the weight parameters contained in the super-network A,
Figure 657184DEST_PATH_IMAGE037
as the network parameters of sub-architecture a in the super-network A,
Figure 460055DEST_PATH_IMAGE039
as a specific sampled sub-architecture together with its network parameters, and
Figure 536595DEST_PATH_IMAGE041
as the prior distribution for sampling sub-architecture a, the objective of training the super-network is:
Figure 323286DEST_PATH_IMAGE043
wherein
Figure 510684DEST_PATH_IMAGE045
denotes the expectation and
Figure 433641DEST_PATH_IMAGE047
denotes the loss of the architecture on the training data, that is, the objective optimizes the expected performance of sub-architectures randomly sampled from the super-network according to the prior distribution
Figure 415504DEST_PATH_IMAGE041
;
S21, randomly initializing the parameters of the super-network: initializing the parameters of the super-network by a standard network parameter initialization technique;
S22, training the super-network: iteratively executing the training process of the super-network, with the number of training rounds specified as
Figure 689490DEST_PATH_IMAGE049
; each iteration comprises the following three steps:
(1) Randomly sampling a sub-architecture from the super network, wherein the sub-architecture inherits and shares parameters in the super network;
(2) Randomly sampling a batch of data from the training data;
(3) The sampled sub-architecture is trained with training data using gradient descent while corresponding parameters in the super-network are updated.
4. The automatic architecture searching method of a collaborative filtering model according to claim 1, wherein the specific method of the evolutionary search algorithm in step S3 is:
S31, initializing a population P_0 containing P individuals with randomly sampled sub-architectures, wherein each individual in the population is a candidate sub-architecture, and the evolutionary algorithm finds the optimal solution by iteratively applying selection, crossover and mutation to the population;
S32, calculating the fitness: the fitness of each individual in the population is the performance of the corresponding sub-architecture; with parameter sharing, the sub-architecture does not need to be trained, and directly inherits the parameters of the super-network for evaluation to obtain the fitness of the individual;
S33, executing an evolution step: independently selecting P/2 individuals from the population according to fitness, one at a time, wherein the probability that an individual is selected is proportional to its fitness, and the selected individuals serve as parents; performing crossover and mutation operations on the parents, wherein each block in a selected individual mutates with probability 0.1, and for crossover one block is randomly selected in two randomly selected individuals and crossed; adding the offspring produced by crossover and mutation to the population, and selecting the P individuals with the highest fitness from the new population to form the new population;
S34, after the evolutionary algorithm reaches the predefined number of iterations, selecting the t sub-architectures with the highest fitness from the final population, training and evaluating them, and selecting the best sub-architecture as the search result.
CN202211119443.3A 2022-09-15 2022-09-15 Automatic architecture searching method of collaborative filtering model Active CN115203585B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211119443.3A CN115203585B (en) 2022-09-15 2022-09-15 Automatic architecture searching method of collaborative filtering model

Publications (2)

Publication Number Publication Date
CN115203585A true CN115203585A (en) 2022-10-18
CN115203585B CN115203585B (en) 2022-12-27

Family

ID=83572187

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211119443.3A Active CN115203585B (en) 2022-09-15 2022-09-15 Automatic architecture searching method of collaborative filtering model

Country Status (1)

Country Link
CN (1) CN115203585B (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110288444A (en) * 2019-06-28 2019-09-27 第四范式(北京)技术有限公司 Realize the method and system of user's associated recommendation
CN112464579A (en) * 2021-02-02 2021-03-09 四川大学 Identification modeling method for searching esophageal cancer lesion area based on evolutionary neural network structure
CN114419389A (en) * 2021-12-14 2022-04-29 上海悠络客电子科技股份有限公司 Target detection model construction method based on neural network architecture search

Also Published As

Publication number Publication date
CN115203585B (en) 2022-12-27

Similar Documents

Publication Publication Date Title
CN111428147B (en) Social recommendation method of heterogeneous graph volume network combining social and interest information
CN111611472B (en) Binding recommendation method and system based on graph convolution neural network
CN109785062B (en) Hybrid neural network recommendation system based on collaborative filtering model
Wang et al. A mobile recommendation system based on logistic regression and gradient boosting decision trees
CN111310063B (en) Neural network-based article recommendation method for memory perception gated factorization machine
Li et al. Deep probabilistic matrix factorization framework for online collaborative filtering
CN114357067B (en) Personalized federal element learning method aiming at data isomerism
CN111881363B (en) Recommendation method based on graph interaction network
CN112905648B (en) Multi-target recommendation method and system based on multi-task learning
CN109582864A (en) Course recommended method and system based on big data science and changeable weight adjustment
CN112287166B (en) Movie recommendation method and system based on improved deep belief network
CN113918832B (en) Graph convolution collaborative filtering recommendation system based on social relationship
CN113918833B (en) Product recommendation method realized through graph convolution collaborative filtering of social network relationship
CN114357312B (en) Community discovery method and personality recommendation method based on graph neural network automatic modeling
CN113918834B (en) Graph convolution collaborative filtering recommendation method fusing social relations
CN111292197A (en) Community discovery method based on convolutional neural network and self-encoder
CN115577283A (en) Entity classification method and device, electronic equipment and storage medium
CN116910375A (en) Cross-domain recommendation method and system based on user preference diversity
CN115525819A (en) Cross-domain recommendation method for information cocoon room
CN113409157A (en) Cross-social network user alignment method and device
CN115203585B (en) Automatic architecture searching method of collaborative filtering model
CN116955810A (en) Optimization method of knowledge collaborative recommendation algorithm based on graph convolution network
CN116664253A (en) Project recommendation method based on generalized matrix decomposition and attention shielding
CN116385077A (en) Multi-behavior recommendation system based on behavior perception fusion graph convolution network
Mu et al. AD-link: An adaptive approach for user identity linkage

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant