CN115203585A - Automatic architecture searching method of collaborative filtering model - Google Patents
- Publication number
- CN115203585A (application CN202211119443.3A)
- Authority
- CN
- China
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/953—Querying, e.g. by the use of web search engines
- G06F16/9536—Search customisation based on social or collaborative filtering
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
Abstract
The invention discloses an automated architecture search method for collaborative filtering models. The method comprises the following steps: S1, construct a supernetwork according to a preset search space, in which the parameters are shared by all sub-architectures; S2, supernetwork training: randomly sample sub-architectures from the supernetwork and train the supernetwork by training the sampled sub-architectures; and S3, after the supernetwork training of S2 is finished, search the supernetwork for high-performing collaborative filtering models using an evolutionary algorithm; the best model found is the result of the architecture search. Compared with the prior art, the method further expands the diversity of the search space and improves the expressive power of the collaborative filtering model, and the weight-sharing-based supernetwork training and architecture search greatly improve the efficiency of the search algorithm.
Description
Technical Field
The invention relates to the fields of artificial intelligence, recommendation algorithms, and automated machine learning, and in particular to an automated architecture search method for collaborative filtering models.
Background
Personalized recommendation is ubiquitous and has been applied to many online services, such as e-commerce, advertising, and social media. At its heart is estimating the likelihood that a user will adopt an item based on historical interactions such as purchases and clicks. Collaborative Filtering (CF) addresses this problem by assuming that users with similar behavior will exhibit similar preferences for items. To realize this assumption, a common paradigm is to parameterize users and items so as to reconstruct the historical interactions, and to predict a user's preferences from these parameters. Generally, such a learnable model has two key parts:
(1) Embedding representation, which converts users and items into low-dimensional vector representations. The vector representations of all users and items are typically stored in an embedding matrix, each row of which is the vector representation of a specific user or item.
(2) An interaction function, which reconstructs the historical interactions based on the embeddings. For example, Matrix Factorization (MF) directly arranges the interactions of user/item IDs into a matrix and then factorizes it to produce hidden-factor matrices representing the users and items, that is, an embedding vector for each user and each item, and models the interaction between a user and an item with the inner product. The rise of deep learning in recent years has inspired researchers to replace the traditional inner-product interaction model with more powerful deep neural networks: the neural collaborative filtering model replaces the inner-product interaction function of matrix factorization with a non-linear neural network, translation-based CF models use Euclidean distance as the interaction function, and so on.
The invention is mainly concerned with interaction-function modeling in collaborative filtering models. Neural Collaborative Filtering (NCF) introduced a deep neural network model for interaction modeling: the inputs from the user side and the item side are concatenated and fed into the neural network, achieving stronger and more effective feature crossing and enriching the implicit feature interactions.
The deep neural network adopted in NCF is mainly a multi-layer perceptron (MLP), with a different number of neurons in each layer. However, the number of neurons per layer usually depends on the experience of human experts; a recommendation system serves a great number of business scenarios, and a single set of hyper-parameters can hardly fit all of them. If the hyper-parameters are poorly designed, model performance suffers greatly.
Therefore, some work has attempted to apply automated machine learning (AutoML) techniques to interaction functions in order to search for an appropriate interaction function for a particular data set. SIF observes that the optimal interaction function in collaborative filtering differs across data sets, and therefore searches for the interaction function with a NAS method. However, the search space of SIF involves only simple mathematical operations, such as addition, subtraction, multiplication, division, maximum/minimum, and inner product, so the expressive capability of the resulting models is limited.
Motivated by these shortcomings of NCF and SIF, the invention proposes an automated architecture search method for collaborative filtering models, i.e., a NAS algorithm that automatically searches for the interaction function of a collaborative filtering model. A NAS algorithm is generally designed along three dimensions:
(1) Search space: defines the set of neural network structures that can be searched, i.e., the solution space.
(2) Search strategy: defines how to find the optimal network structure within the search space.
(3) Evaluation method: defines how to evaluate the performance of a searched network structure.
Disclosure of Invention
The invention aims to: in view of the problems and shortcomings of the prior art, provide an automated architecture search method for collaborative filtering models, solving the problems that the search space of current automated architecture search for collaborative filtering models is limited and the expressive capability of the resulting model architectures is limited.
The technical scheme is as follows: to achieve the above object, the technical solution adopted by the present invention is an automated architecture search method for collaborative filtering models, comprising the following steps:
S1, construct a supernetwork according to a preset search space, in which the parameters are shared by all sub-architectures;
S2, supernetwork training: randomly sample sub-architectures from the supernetwork and train the supernetwork by training the sampled sub-architectures;
S3, after the supernetwork training of S2 is finished, search the supernetwork for high-performing collaborative filtering models using an evolutionary algorithm; the best model found is the result of the architecture search.
Further, the specific method for constructing the supernetwork in step S1 is:
s11, firstly, inducing and abstracting the existing collaborative filtering model to form a unified framework of the collaborative filtering model, representing the unified framework as a mathematical model, and specifically, abstracting the collaborative filtering model into two parts by analyzing the existing collaborative filtering model: embedding representation, converting users and articles into low-dimensional vector representation; an interaction function that reconstructs historical interactions of the user and the item based on the embedding; the following is a mathematical representation of the collaborative filtering abstraction framework: first define the userEmbedded vector ofArticle of manufactureEmbedded vector ofWhereinA set of real numbers is represented by,kis the dimension of the embedding vector; user' sIs embedded in the watchArticle of manufactureIs inserted in the watchIn whichmThe number of the users is the number of the users,nthe number of the articles;for the userTo the articleActual rating of (a);to train the data for the set of users and items that have had actual interaction,in order to be a function of the loss,for the regularization coefficient, the optimization objective of the collaborative filtering model is defined as:
whereinIn order to be a function of the interaction,a frobenius norm of the matrix is represented,for regularization terms, for alleviating the over-fitting problem in model optimization, by minimizing a functionTo optimize the co-ordinationThe same filtering model is used;
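As an illustrative sketch (not the patent's implementation; the squared-error choice for the loss and all names are assumptions), the objective above can be written in Python as:

```python
import numpy as np

def cf_loss(P, Q, interactions, f, lam=0.01):
    """Sum of per-interaction squared errors plus Frobenius-norm
    regularization, mirroring the collaborative-filtering objective."""
    loss = 0.0
    for u, i, y_ui in interactions:
        loss += (f(P[u], Q[i]) - y_ui) ** 2            # L(f(p_u, q_i), y_ui)
    loss += lam * (np.sum(P ** 2) + np.sum(Q ** 2))    # lam * (||P||_F^2 + ||Q||_F^2)
    return loss

# Classic choice: inner-product interaction function
inner = lambda p, q: float(p @ q)

rng = np.random.default_rng(0)
P = rng.normal(size=(4, 8))    # m = 4 users, k = 8
Q = rng.normal(size=(5, 8))    # n = 5 items
data = [(0, 1, 1.0), (2, 3, 0.0)]
loss = cf_loss(P, Q, data, inner)
```

Swapping `inner` for any other callable `f(p_u, q_i)` is exactly the degree of freedom the architecture search exploits.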
s12, on the basis of the unified framework of the collaborative filtering model obtained in the S11, searching key interaction functions in collaborative filtering by using a framework searching technologyWhich is represented in the collaborative filtering model in the form of a specific architecture, the mathematical definition of the architecture search technique is:
whereinTo verify the interaction data of the user and the item on the set,Aas a function of interactionThe search space of (a), i.e. the hyper-network to be built,the optimal interaction function which needs to be searched is a double-layer optimization problem, wherein the inner layer is a framework for optimizing a collaborative filtering model on training data, and the outer layer is a framework for optimizing the interaction function on verification data;
s13, finding an optimal interaction function architecture to enable the collaborative filtering model to achieve optimal performance on given data, and further analyzing the commonly used interaction function architecture in the conventional collaborative filtering model to provide an interaction function architecture search space, wherein the search space comprises commonly used operations in various recommendation system models, the search space is designed into a parameter sharing-based hyper-network, and the number of layers isNEach layer comprisingMDifferent blocks are seeded; wherein each block corresponds to onePerforming seed operation; obtaining a candidate architecture by sampling a block at each layer of the super-network; the operations in the search space are defined as follows:
(1) Element-wise operations, i.e., EW operations: bitwise addition, bitwise maximum, bitwise minimum, bitwise average, and inner product, making up 5 different blocks in total;
(2) Multi-layer perceptron operations, i.e., MLP operations: each MLP operation contains only three layers; the input and output layers are given fixed dimensions so that dimensions are aligned across the search space, while the hidden layer uses different numbers of neurons to realize different MLP operations. The hidden-layer sizes are 16, 32, 64, 128, 256, 512, and 1024, making up 7 different blocks in total;
(3) Cross network operations, i.e., CN operations: composed of multiple cross layers, where the output of a cross layer is computed by

$$\mathbf{x}_{l+1} = \mathbf{x}_0 \mathbf{x}_l^{\top} \mathbf{w}_l + \mathbf{b}_l + \mathbf{x}_l$$

where $\mathbf{x}_0$ is the initial input of the cross network, $\mathbf{x}_l$ is the output of the $l$-th cross layer, $\mathbf{w}_l$ and $\mathbf{b}_l$ are the weight and bias parameters of the $l$-th cross layer, and $\mathbf{x}_{l+1}$ is the output of the $(l+1)$-th cross layer.
The output of the cross network contains all cross terms of the input from order 1 up to order $l+1$, where $d_{CN}$ is the dimension of the cross-network output and $x_{i,j}$ denotes the $j$-th component of the vector $\mathbf{x}_i$.
The number of cross layers determines the maximum order of the cross terms; 4 kinds of cross networks are used, with 1, 2, 3, and 4 cross layers, making up 4 different blocks in total;
(4) Self-attention operations, i.e., SA operations: the matrices $Q$, $K$, and $V_{SA}$ all come from the same input; the product $QK^{\top}$ is divided by a scale $\sqrt{d_k}$ to prevent the result from becoming too large, normalized into a probability distribution with the Softmax operation, and then multiplied by the matrix $V_{SA}$ to obtain a weighted-sum representation. Using different $Q$, $K$, $V_{SA}$ matrices gives a multi-head mechanism that learns different attention representations; different numbers of heads form different self-attention modules. Self-attention modules with 1, 2, 3, and 4 heads are used, making up 4 different blocks in total. The attention operation is computed as:

$$\mathrm{Attention}(Q, K, V_{SA}) = \mathrm{Softmax}\!\left(\frac{QK^{\top}}{\sqrt{d_k}}\right) V_{SA}$$
(5) Factorization machine operations, i.e., FM operations: the factorization machine operation is expressed as:

$$y_{FM}(\mathbf{x}) = w_0 + \sum_{i=1}^{d_{FM}} w_i x_i + \sum_{i=1}^{d_{FM}} \sum_{j=i+1}^{d_{FM}} \langle \mathbf{v}_i, \mathbf{v}_j \rangle\, x_i x_j$$

where $w_0$ and $w_i$ are the parameters of the factorization machine, $\mathbf{V}_{FM}$ is the parameter matrix, $\mathbf{x}$ is the input vector, $d_{FM}$ is the dimension of the input vector, $\mathbf{v}_i$ denotes the $i$-th row of the parameter matrix $\mathbf{V}_{FM}$, $\mathbf{v}_j$ its $j$-th row, $x_i$ the $i$-th component of the vector $\mathbf{x}$, and $x_j$ its $j$-th component.
Further, the specific method of supernetwork training in step S2 is: on the basis of the supernetwork constructed in S1, sample sub-architectures from the supernetwork; a sampled sub-architecture inherits the parameters of the supernetwork, and the supernetwork is trained by training the sampled sub-architectures. Denote by $W_A$ all the weight parameters of the supernetwork $A$, by $W_A(a)$ the network parameters of sub-architecture $a$ in the supernetwork $A$, by $\mathcal{N}(a, W_A(a))$ a specific sampled sub-architecture together with its network parameters, and by $\Gamma(A)$ the prior distribution for sampling sub-architecture $a$. The goal of training the supernetwork is:

$$W_A^* = \mathop{\arg\min}_{W_A} \mathbb{E}_{a \sim \Gamma(A)}\big[\mathcal{L}_{train}\big(\mathcal{N}(a, W_A(a))\big)\big]$$

where $\mathbb{E}$ denotes expectation and $\mathcal{L}_{train}$ the loss of an architecture on the training data; that is, the expected performance of sub-architectures randomly sampled from the supernetwork according to the prior distribution is optimized;
s21, randomly initializing parameters of the hyper-network: initializing parameters of a hyper-network through a standard network parameter initialization technology;
s22, training the super network, iteratively executing the training process of the super network, and designating the number of training roundsEach iteration comprises the following three steps:
(1) Randomly sample a sub-architecture from the supernetwork; the sub-architecture inherits the shared parameters of the supernetwork;
(2) Randomly sample a batch of data from the training data;
(3) Train the sampled sub-architecture on the batch using gradient descent, thereby updating the corresponding parameters in the supernetwork.
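The three steps above can be sketched as a toy, hedged illustration (blocks are scalar weights and the "model" merely sums them, standing in for real neural blocks; the squared-error loss and all names are assumptions, not the patent's code):

```python
import random

def sample_subarch(num_layers, num_blocks):
    """Prior Gamma(A): uniform single-path sampling, one block per layer."""
    return [random.randrange(num_blocks) for _ in range(num_layers)]

def train_supernet(weights, targets, rounds, lr=0.05):
    """One round = sample a path (1), sample data (2), gradient step on
    only the shared weights that path uses (3)."""
    n_layers, n_blocks = len(weights), len(weights[0])
    for _ in range(rounds):
        arch = sample_subarch(n_layers, n_blocks)        # (1)
        y = random.choice(targets)                       # (2) a "batch" of one
        out = sum(weights[l][arch[l]] for l in range(n_layers))
        grad = 2.0 * (out - y)                           # d/dw of (out - y)^2
        for l in range(n_layers):                        # (3) update shared weights
            weights[l][arch[l]] -= lr * grad
    return weights

random.seed(0)
w = [[0.0] * 3 for _ in range(2)]        # N = 2 layers, M = 3 blocks each
train_supernet(w, targets=[1.0], rounds=2000)
```

After training, every samplable path already fits the toy target without further training, which is exactly the weight-sharing property the evolutionary search in S3 exploits.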
Further, the specific method of the evolutionary search algorithm in step S3 is:
S31. Initialize a population $P_0$ of $P$ randomly sampled sub-architectures; each individual in the population is a candidate sub-architecture, and the evolutionary algorithm iterates step by step through selection, crossover, and mutation to find the optimal solution;
S32. Compute fitness: the fitness of each individual in the population is the performance of the corresponding sub-architecture; thanks to the parameter-sharing technique, a sub-architecture need not be trained but directly inherits the parameters of the supernetwork and is evaluated to obtain the individual's fitness;
S33. Execute an evolution step: independently select $P/2$ individuals from the population according to individual fitness, one individual at a time, with selection probability proportional to fitness; the selected individuals serve as parents. Apply crossover and mutation to the parents: each block of a selected individual mutates with probability 0.1, and crossover exchanges one randomly chosen block between two randomly selected individuals. Add the offspring produced by crossover and mutation to the population, then select the $P$ individuals with the highest fitness from the enlarged population to form the new population.
S34. After the evolutionary algorithm reaches the predefined number of iterations, select the $t$ sub-architectures with the highest fitness from the final population, train and evaluate them, and choose the best sub-architecture as the search result.
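Steps S31-S33 can be sketched with a toy fitness function standing in for supernetwork evaluation (every hyper-parameter except the 0.1 mutation rate is an illustrative assumption):

```python
import random

def evolve(fitness, num_layers, num_blocks, pop_size=20, gens=30, mut=0.1):
    """S31: random initial population; S32: fitness evaluation; S33:
    fitness-proportional selection of P/2 parents, one-block crossover,
    per-block mutation with probability 0.1, keep the fittest P.
    S34 (retraining the top-t survivors) is left to the caller."""
    pop = [[random.randrange(num_blocks) for _ in range(num_layers)]
           for _ in range(pop_size)]
    for _ in range(gens):
        fits = [fitness(a) for a in pop]                        # S32
        parents = random.choices(pop, weights=fits, k=pop_size // 2)
        children = []
        for i in range(0, len(parents) - 1, 2):                 # S33
            a, b = parents[i][:], parents[i + 1][:]
            cut = random.randrange(num_layers)                  # swap one block
            a[cut], b[cut] = b[cut], a[cut]
            for child in (a, b):
                for l in range(num_layers):
                    if random.random() < mut:                   # mutate, p = 0.1
                        child[l] = random.randrange(num_blocks)
                children.append(child)
        pop = sorted(pop + children, key=fitness, reverse=True)[:pop_size]
    return pop

random.seed(1)
toy_fitness = lambda a: 1 + sum(b == 0 for b in a)   # stand-in for validation score
final_pop = evolve(toy_fitness, num_layers=5, num_blocks=4)
best = max(final_pop, key=toy_fitness)
```

Because survivors inherit supernetwork weights, `fitness` in the real method is a cheap forward evaluation rather than a training run.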
Beneficial effects:
The invention provides automated architecture search for collaborative filtering models, introducing a NAS algorithm to search for the interaction function of the collaborative filtering model for a specific data set, thereby improving the effectiveness of the collaborative filtering model. Compared with other work applying NAS to collaborative filtering models, the search space of the method is compact and efficient and contains the operations and modules commonly used in various recommendation-system models. The search space is a parameter-sharing supernetwork, and randomly sampled sub-architectures can be trained efficiently so as to train the whole supernetwork. The search strategy is a two-stage method: first, sub-architectures sampled from the supernetwork are trained to learn the supernetwork's parameters; then an evolutionary algorithm searches for excellent sub-architectures within the supernetwork. Experiments on public data sets show clear improvements over manually designed methods and other NAS algorithms.
Drawings
FIG. 1 is a general framework of the present invention;
FIG. 2 is an overall schematic diagram of the search space and search algorithm of the present invention. In FIG. 2, (a) represents the search space and (b) represents a solution of the search space, obtained by sampling one block at each layer.
Detailed Description
The present invention is further illustrated by the following figures and specific examples, which are to be understood as merely illustrative and not limiting the scope of the invention; after reading this specification, equivalent modifications made by those skilled in the art all fall within the scope defined by the appended claims.
The invention discloses an automated architecture search method for collaborative filtering models, comprising the following steps:
S1. Construct a supernetwork according to a preset search space, in which the parameters are shared by all sub-architectures;
s11, firstly, inducing and abstracting the existing collaborative filtering model to form a unified framework of the collaborative filtering model, and expressing the unified framework as a mathematical model. By analyzing the existing collaborative filtering model, the collaborative filtering model can be abstracted into two parts: embedding representation, converting users and articles into low-dimensional vector representation; and an interaction function that reconstructs historical interactions of the user and the item based on the embedding. The following is a mathematical representation of the collaborative filtering abstraction framework: s12, on the basis of the unified framework of the collaborative filtering model obtained in the S11, searching for key interactive functions in collaborative filtering by using a framework searching technologyWhich is represented in the collaborative filtering model in the form of a specific framework, the mathematical definition of the framework search technique is:
whereinTo verify the interaction data of the user and the item on the set,Aas a function of interactionThe search space of (a), i.e. the hyper-network to be built,i.e. the optimal interaction function to be searched, which is a two-layer optimization problem, wherein the inner layerTo optimize the collaborative filtering model on the training data, the outer layer is the framework that optimizes the interaction function on the validation data.
It can be seen that the first half of the formula computes the difference between the similarity predicted by the model and the actual similarity between a user and an item: the predicted similarity is obtained as the inner product of the user's and the item's embedding vectors, while the actual similarity usually comes from the user's explicit actions on the item, such as the rating the user gives a movie (1 to 5 stars). The loss function $\mathcal{L}$ is usually the mean squared error (MSE), so the users' ratings generally need to be normalized or standardized accordingly. The second half of the formula computes the norms of the user and item embedding tables as a penalty that suppresses over-fitting. Traditional collaborative filtering models typically compute similarity in this inner-product form. However, the inner product does not perform optimally on all data sets, and different data sets may require different forms of interaction. The idea of AutoML can therefore be introduced, using a NAS algorithm to search for a specific interaction function for a specific data set.
The optimization objective of the collaborative filtering model is then defined as:

$$\min_{\mathbf{P},\mathbf{Q}} \sum_{(u,i) \in \mathcal{D}_{train}} \mathcal{L}\big(f(\mathbf{p}_u, \mathbf{q}_i),\, y_{ui}\big) + \lambda \big(\|\mathbf{P}\|_F^2 + \|\mathbf{Q}\|_F^2\big)$$

where $f$ is the interaction function, $\|\cdot\|_F$ denotes the Frobenius norm of a matrix, and the last term is the regularization term. Unlike the previous formula, the inner product used to compute similarity is replaced by the abstract interaction function $f$, a learnable neural network architecture. This definition is also convenient to extend: for example, in NCF $f$ is a multi-layer perceptron, and it can likewise represent the different mathematical interaction operations in SIF.
S12. On the basis of the abstract framework obtained in step S11, the invention proposes to use architecture search techniques to search for the key interaction function $f$ in collaborative filtering, which is represented in the collaborative filtering model in the form of a concrete architecture. The invention aims to obtain a more appropriate interaction-function architecture through architecture search and thereby improve the performance of the collaborative filtering model. The mathematical definition of this architecture search problem is:

$$f^* = \mathop{\arg\min}_{f \in A} \sum_{(u,i) \in \mathcal{D}_{val}} \mathcal{L}\big(f(\mathbf{p}_u^*, \mathbf{q}_i^*),\, y_{ui}\big) \quad \text{s.t.} \quad (\mathbf{P}^*, \mathbf{Q}^*) = \mathop{\arg\min}_{\mathbf{P},\mathbf{Q}} \mathcal{L}_{train}(f, \mathbf{P}, \mathbf{Q})$$

where $\mathcal{D}_{val}$ is the interaction data of users and items in the validation set, $A$ is the search space of the interaction function $f$, i.e., the supernetwork to be built, and $f^*$ is the optimal interaction function to be searched for. This is a bi-level optimization problem: the inner level optimizes the collaborative filtering model on the training data, and the outer level optimizes the architecture of the interaction function on the validation data. Overall, the aim is to find an optimal interaction-function architecture so that the collaborative filtering model achieves optimal performance on the given data.
S13. To solve the search problem posed in S12, the invention proposes a compact and efficient interaction-function architecture search space, obtained by further analyzing the interaction-function architectures commonly used in existing collaborative filtering models. The search space contains a variety of operations common in recommendation-system models. The whole search space is designed as a parameter-sharing supernetwork, as shown in FIG. 2: it has $N$ layers, each layer containing $M$ different blocks, and each block processes data in a different way. During the subsequent training of sub-architectures and the search over sub-architectures, only one specific block among all the blocks of each layer is activated; that is, as shown in FIG. 2, exactly one of the dashed blocks in each layer is selected to form the actually sampled or searched architecture. Note that the original inputs (the user embedding and the item embedding) are fed to every layer as part of its input, while the output of the $i$-th layer serves as input to the $(i+1)$-th layer. Because feature crossing is very important in recommendation-system models, this design enriches the implicit crossing of low-order features (including the original input) with high-order features and improves the expressive power of the model. At the same time, this design resembles the Skip-Connection design in ResNet; borrowing the idea of residual learning, the original input can more easily reach the deep part of the network, and gradients propagate better to shallower layers during back-propagation, making the model easier to train.
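A minimal sketch of how a sampled path could be evaluated under this design, using hypothetical linear blocks and a fixed block output dimension (the real blocks are the EW/MLP/CN/SA/FM operations described below; all names and the zero-vector initialization are assumptions):

```python
import numpy as np

def make_block(in_dim, out_dim, seed):
    """Hypothetical linear block; deterministic seeding stands in for the
    supernetwork's shared parameter table."""
    W = np.random.default_rng(seed).normal(size=(in_dim, out_dim)) * 0.1
    return lambda z: z @ W

def forward_path(p_u, q_i, arch, num_blocks=21, out_dim=8):
    """Every layer sees the original user/item embeddings concatenated with
    the previous layer's output (the skip-connection-like design above)."""
    k = p_u.shape[0]
    x = np.zeros(out_dim)                  # no previous output at layer 1
    for layer, choice in enumerate(arch):
        block = make_block(2 * k + out_dim, out_dim,
                           seed=layer * num_blocks + choice)
        x = block(np.concatenate([p_u, q_i, x]))
    return x

out = forward_path(np.ones(4), np.ones(4), arch=[3, 0, 7])
```

Because weights are looked up by (layer, block) rather than owned by the path, two paths that pick the same block in a layer reuse the same parameters.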
Each layer contains $M$ different blocks, all of which are operations and modules commonly used in recommendation-system models. They fall into 5 broad categories, and each category of block has some hyper-parameters; the 5 categories of operations are listed below:
(1) Element-Wise (EW) operations: element-level mathematical operations, comprising bitwise addition (sum), bitwise maximum (max), bitwise minimum (min), bitwise average (avg), and inner product, making up 5 different blocks in total.
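The five element-wise blocks can be sketched directly with NumPy (a hedged illustration; the dictionary keys are assumed names):

```python
import numpy as np

# The five EW blocks; each takes the user and item embeddings element-wise.
EW_BLOCKS = {
    "sum":   lambda p, q: p + q,
    "max":   np.maximum,
    "min":   np.minimum,
    "avg":   lambda p, q: (p + q) / 2.0,
    "inner": lambda p, q: np.inner(p, q),   # scalar output
}

p, q = np.array([1.0, 3.0]), np.array([3.0, 1.0])
results = {name: op(p, q) for name, op in EW_BLOCKS.items()}
```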
(2) Multi-layer perceptron (MLP) operations: simple feed-forward neural network layers. For the diversity of the search space, each MLP operation defined by the invention contains only three layers; the input and output layers are given fixed dimensions so that dimensions are aligned across the search space, while the hidden layer uses different numbers of neurons to realize different MLP operations. The hidden-layer sizes are 16, 32, 64, 128, 256, 512, and 1024, making up 7 different blocks in total.
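A sketch of one such three-layer MLP block, assuming a ReLU hidden activation (the text does not specify one) and illustrative random initialization:

```python
import numpy as np

def mlp_block(x, hidden, out_dim=8, seed=0):
    """Three-layer MLP block: fixed input/output dims for alignment across
    the search space; only the hidden width (16..1024) varies per block.
    ReLU is an assumption -- the activation is not specified in the text."""
    rng = np.random.default_rng(seed)
    d = x.shape[-1]
    W1, b1 = rng.normal(size=(d, hidden)) * 0.1, np.zeros(hidden)
    W2, b2 = rng.normal(size=(hidden, out_dim)) * 0.1, np.zeros(out_dim)
    h = np.maximum(0.0, x @ W1 + b1)     # hidden layer with ReLU
    return h @ W2 + b2

# The seven MLP blocks differ only in hidden width:
outputs = [mlp_block(np.ones(8), hidden=h)
           for h in (16, 32, 64, 128, 256, 512, 1024)]
```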
(3) Cross Network (CN) operations: composed of multiple cross layers, where the output of a cross layer is computed by

$$\mathbf{x}_{l+1} = \mathbf{x}_0 \mathbf{x}_l^{\top} \mathbf{w}_l + \mathbf{b}_l + \mathbf{x}_l$$

where $\mathbf{x}_0$ is the initial input of the cross network, $\mathbf{x}_l$ is the output of the $l$-th cross layer, $\mathbf{w}_l$ and $\mathbf{b}_l$ are the weight and bias parameters of the $l$-th cross layer, and $\mathbf{x}_{l+1}$ is the output of the $(l+1)$-th cross layer.
The output of the cross network contains all cross terms of the input from order 1 up to order $l+1$, where $d_{CN}$ is the dimension of the cross-network output and $x_{i,j}$ denotes the $j$-th component of the vector $\mathbf{x}_i$.
the number of cross layers determines the maximum order of the cross terms, and this embodiment uses 4 types of cross networks, with the number of cross layers being 1, 2, 3 and 4, making up 4 different blocks in total.
(4) Self-Attention (SA) operation: the self-attention mechanism is a special case of the attention mechanism in which the three matrices Q (Query), K (Key) and $V_{SA}$ (Value) all come from the same input. The product $QK^{\top}$ is divided by a scale factor $\sqrt{d_k}$ to prevent the result from becoming too large, normalized into a probability distribution by a Softmax operation, and then multiplied by the matrix $V_{SA}$ to obtain a weighted-sum representation. Using different $Q$, $K$, $V_{SA}$ matrices yields a multi-head mechanism that can learn different attention representations; different head counts form different self-attention modules. The invention uses self-attention modules with 1, 2, 3 and 4 heads, i.e., 4 different blocks in total. The attention operation is computed as follows:

$$\mathrm{Attention}(Q, K, V_{SA}) = \mathrm{softmax}\!\left(\frac{QK^{\top}}{\sqrt{d_k}}\right) V_{SA}$$
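A sketch of the scaled dot-product attention above for a single head, with matrices represented as lists of rows (a multi-head variant would run several copies with different $Q$, $K$, $V_{SA}$ projections and concatenate the results):

```python
import math

def softmax(v):
    m = max(v)  # subtract the max for numerical stability
    e = [math.exp(x - m) for x in v]
    s = sum(e)
    return [x / s for x in e]

def attention(Q, K, V):
    """softmax(Q K^T / sqrt(d_k)) V, one output row per query row."""
    d_k = len(K[0])
    out = []
    for q in Q:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in K]
        p = softmax(scores)                       # probability distribution
        out.append([sum(pi * row[j] for pi, row in zip(p, V))
                    for j in range(len(V[0]))])   # weighted sum of value rows
    return out
```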
(5) Factorization Machine (FM) operation: the factorization machine operation may be represented by the following equation:

$$y_{FM}(x) = w_0 + \sum_{i=1}^{d_{FM}} w_i x_i + \sum_{i=1}^{d_{FM}} \sum_{j=i+1}^{d_{FM}} \langle v_i, v_j \rangle x_i x_j$$

where $w_0$ and $w_i$ are the parameters of the factorization machine, $V_{FM}$ is the parameter matrix, $x$ is the input vector, $d_{FM}$ is the dimension of the input vector, $v_i$ denotes the $i$-th row of the parameter matrix $V_{FM}$, $v_j$ denotes the $j$-th row, and $x_i$ and $x_j$ denote the $i$-th and $j$-th components of the vector $x$.
The output of the factorization machine is the sum of the first-order and second-order feature interactions. Since both inputs have already been converted to low-dimensional embeddings, the invention computes the first-order and second-order feature interactions on a single tensor obtained by concatenating the two inputs.
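A sketch of the FM equation using the standard identity $\sum_{i<j} \langle v_i, v_j\rangle x_i x_j = \tfrac12 \sum_f \big[(\sum_i v_{if} x_i)^2 - \sum_i v_{if}^2 x_i^2\big]$, which evaluates the second-order term in $O(k \cdot d_{FM})$ time (parameter values are illustrative):

```python
def fm(x, w0, w, V):
    """Factorization machine: w0 + sum_i w_i x_i + sum_{i<j} <v_i, v_j> x_i x_j.

    V is the parameter matrix V_FM, whose i-th row v_i embeds feature i.
    """
    first = w0 + sum(wi * xi for wi, xi in zip(w, x))
    second = 0.0
    for f in range(len(V[0])):  # one factor dimension at a time
        s = sum(V[i][f] * x[i] for i in range(len(x)))
        sq = sum((V[i][f] * x[i]) ** 2 for i in range(len(x)))
        second += 0.5 * (s * s - sq)
    return first + second
```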
Thus, each layer in the search space of the invention contains M = 5 + 7 + 4 + 4 + 1 = 21 different blocks in total. For parameter sharing, and so that different types of blocks can conveniently process multiple inputs, the output of every block is fixed to a specified dimension, denoted $d$.
S2, super network training: randomly sample sub-architectures from the super network, and train the super network by training the sampled sub-architectures;
on the basis of the super network constructed in S1, a sub-architecture is sampled from the super network; the sampled sub-architecture inherits the parameters of the super network, and the super network is trained by training the sampled sub-architecture. Denote by $W_A$ all the parameters of the super network $A$, by $w(a)$ the network parameters of sub-architecture $a$ within $A$, by $N(a, w(a))$ a particular sampled sub-architecture together with its network parameters, and by $\Gamma(A)$ the prior distribution for sampling sub-architectures $a$. The objective of training the super network is:

$$W_A^* = \arg\min_{W_A} \; \mathbb{E}_{a \sim \Gamma(A)} \left[ \mathcal{L}_{train}\big(N(a, w(a))\big) \right]$$

where $\mathbb{E}$ denotes expectation and $\mathcal{L}_{train}$ denotes the loss of an architecture on the training data; that is, the expected performance of sub-architectures randomly sampled from the super network according to the prior distribution $\Gamma(A)$ is optimized. As the formula shows, the architecture parameters are decoupled into the prior distribution $\Gamma(A)$, so training the super-network parameters does not involve optimizing architecture parameters. The prior distribution $\Gamma(A)$ therefore plays an important role. Recent work has found that pure random search is very competitive with several state-of-the-art NAS methods, and experience from previous work shows that uniform sampling already gives good results. The invention therefore adopts uniform sampling for $\Gamma(A)$; the distribution is fixed and does not need to be learned while training the super network. The sub-architecture illustrated in FIG. 2 may be viewed as one randomly sampled sub-architecture.
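The fixed uniform prior amounts to picking one of the M blocks independently and uniformly at each of the N layers; a sketch, with an architecture encoded as a list of block indices:

```python
import random

def sample_uniform(num_layers, num_blocks, rng=random):
    """Draw one sub-architecture from the uniform prior over the search space."""
    return [rng.randrange(num_blocks) for _ in range(num_layers)]
```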
S21, randomly initializing parameters of the hyper-network: parameters of the hyper-network are initialized by standard network parameter initialization techniques.
And S22, training the super network. The number of training rounds is set to $T$: if computing resources and time are ample, $T$ can be set larger; if they are limited, $T$ can be set smaller, the difference being that a small $T$ may leave the super-network parameters insufficiently trained. Each iteration comprises the following three steps:
(1) Randomly sampling a sub-architecture from the super network, wherein the sub-architecture inherits and shares parameters in the super network.
(2) A batch of data is randomly sampled from the training data.
(3) The sampled sub-architecture is trained with training data using gradient descent while corresponding parameters in the hyper-network are updated.
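The three steps above can be sketched as one training loop (`train_step`, which performs the gradient-descent update on the shared parameters touched by the sampled sub-architecture, is a hypothetical caller-supplied function):

```python
import random

def train_supernet(T, num_layers, num_blocks, train_data, batch_size, train_step,
                   rng=None):
    """Run T training rounds of the super network."""
    rng = rng or random.Random(0)
    for _ in range(T):
        # (1) randomly sample a sub-architecture (it shares the super-net params)
        arch = [rng.randrange(num_blocks) for _ in range(num_layers)]
        # (2) randomly sample a batch of training data
        batch = rng.sample(train_data, min(batch_size, len(train_data)))
        # (3) gradient-descent update of the shared parameters used by `arch`
        train_step(arch, batch)
```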
And S3, after the super-network training in S2 is finished, search the super network with an evolutionary algorithm for collaborative filtering models with excellent performance; the optimal model is the result of the architecture search. The advantage is that, thanks to the parameter-sharing technique, each searched candidate sub-architecture obtains its accuracy by direct inference, without any training, which greatly improves the efficiency of the architecture search.
And S31, the evolutionary algorithm finds the optimal solution by iterating selection, crossover and mutation over a population, where each individual in the population is a candidate sub-architecture. First, a population $P_0$ containing $P$ individuals is initialized by randomly sampling sub-architectures.
And S32, computing fitness. The fitness of each individual in the population is the performance of the corresponding sub-architecture. Using the parameter-sharing technique, the sub-architecture needs no training: it directly inherits the parameters of the super network and is evaluated to obtain the individual's fitness.
S33, executing an evolution step: independently select $P/2$ individuals from the population according to fitness, one individual at a time, with selection probability proportional to fitness; the selected individuals serve as parents. Crossover and mutation are applied to the parents: each block of a selected individual mutates with probability 0.1, and crossover randomly exchanges one block between two randomly selected individuals. The crossed-over and mutated offspring are added to the population, and the $P$ individuals with the highest fitness are selected from the enlarged population to form the new population.
S34, after the evolutionary algorithm reaches the predefined number of iterations, select the $t$ sub-architectures with the highest fitness from the final population, train and evaluate them, and choose the optimal sub-architecture as the search result.
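Steps S31-S34 can be sketched as below; `fitness` stands in for evaluating a sub-architecture with weights inherited from the trained super network (no retraining), and all numeric settings except the 0.1 mutation probability are illustrative:

```python
import random

def _roulette(fits, total, rng):
    """Fitness-proportional (roulette-wheel) index selection."""
    r = rng.uniform(0.0, total)
    acc = 0.0
    for i, f in enumerate(fits):
        acc += f
        if acc >= r:
            return i
    return len(fits) - 1

def evolve(pop_size, num_layers, num_blocks, fitness, generations,
           mut_prob=0.1, rng=None):
    """Evolutionary architecture search sketch (S31-S34).

    Individuals are block-index lists; fitness must be non-negative and is
    assumed to score an architecture using inherited super-network weights.
    """
    rng = rng or random.Random(0)
    pop = [[rng.randrange(num_blocks) for _ in range(num_layers)]
           for _ in range(pop_size)]                    # S31: init population P_0
    for _ in range(generations):
        fits = [fitness(a) for a in pop]                # S32: fitness per individual
        total = sum(fits) or 1.0
        parents = [pop[_roulette(fits, total, rng)]     # S33: select P/2 parents
                   for _ in range(max(1, pop_size // 2))]
        children = []
        for p in parents:
            child = list(p)
            mate = rng.choice(parents)
            i = rng.randrange(num_layers)
            child[i] = mate[i]                          # crossover: swap one block
            for j in range(num_layers):
                if rng.random() < mut_prob:             # mutation per block
                    child[j] = rng.randrange(num_blocks)
            children.append(child)
        pop = sorted(pop + children, key=fitness, reverse=True)[:pop_size]
    return max(pop, key=fitness)                        # S34: best candidate
```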
Table 1 lists the mean-square-error comparison of the invention with other methods; it can be seen that the invention improves significantly over the other methods.
TABLE 1 comparison of mean square error of the invention with other methods
Claims (4)
1. An automatic architecture searching method of a collaborative filtering model is characterized by comprising the following steps:
s1, constructing a super network according to a preset search space, wherein parameters in the super network are shared by all sub-architectures;
s2, super network training: randomly sampling sub-architectures from the super network, and training the super network by training the sampled sub-architectures;
and S3, after the super-network training in S2 is finished, searching the super network with an evolutionary algorithm for collaborative filtering models with excellent performance, the optimal model being the result of the architecture search.
2. The automatic architecture searching method of a collaborative filtering model according to claim 1, wherein the specific method for constructing the super network in step S1 is:
S11, first, summarizing and abstracting existing collaborative filtering models into a unified collaborative filtering framework expressed as a mathematical model; specifically, by analyzing existing collaborative filtering models, a collaborative filtering model is abstracted into two parts: an embedding representation, which converts users and items into low-dimensional vector representations; and an interaction function, which reconstructs the historical interactions of users and items from the embeddings; the mathematical representation of the abstract collaborative filtering framework is as follows: first define the embedding vector $p_u \in \mathbb{R}^k$ of user $u$ and the embedding vector $q_i \in \mathbb{R}^k$ of item $i$, where $\mathbb{R}$ denotes the set of real numbers and $k$ is the dimension of the embedding vectors; the user embedding table is $P \in \mathbb{R}^{m \times k}$ and the item embedding table is $Q \in \mathbb{R}^{n \times k}$, where $m$ is the number of users and $n$ is the number of items; $R_{ui}$ is the actual score of user $u$ on item $i$; $D$ is the set of user-item pairs in the training set that have actually interacted; $\mathcal{L}$ is a loss function and $\lambda$ is the regularization coefficient; the optimization objective of the collaborative filtering model is defined as:

$$\mathcal{L}_{CF} = \sum_{(u,i) \in D} \mathcal{L}\big(f(p_u, q_i), R_{ui}\big) + \lambda \left( \|P\|_F^2 + \|Q\|_F^2 \right)$$

where $f$ is the interaction function, $\|\cdot\|_F$ denotes the Frobenius norm of a matrix, and $\lambda \left( \|P\|_F^2 + \|Q\|_F^2 \right)$ is the regularization term, used to alleviate the over-fitting problem in model optimization; the collaborative filtering model is optimized by minimizing the function $\mathcal{L}_{CF}$;
S12, on the basis of the unified collaborative filtering framework obtained in S11, using an architecture search technique to search for the key interaction function $f$ in collaborative filtering, which appears in the collaborative filtering model in the form of a specific architecture; the mathematical definition of the architecture search is:

$$f^* = \arg\min_{f \in A} \mathcal{L}_{val}\big(f, P^*, Q^*\big) \quad \text{s.t.} \quad (P^*, Q^*) = \arg\min_{P, Q} \mathcal{L}_{CF}(f, P, Q)$$

where $\mathcal{L}_{val}$ is the loss on $D_{val}$, the user-item interaction data of the validation set; $A$ is the search space of the interaction function $f$, i.e., the super network to be built; and $f^*$ is the optimal interaction function to be searched; this is a bilevel optimization problem, where the inner level optimizes the parameters of the collaborative filtering model on the training data and the outer level optimizes the architecture of the interaction function on the validation data;
s13, finding an optimal interaction function architecture to enable the collaborative filtering model to achieve optimal performance on given data, and further analyzing the commonly used interaction function architecture in the conventional collaborative filtering model to provide an interaction function architecture search space, wherein the search space comprises commonly used operations in various recommendation system models, the search space is designed into a parameter sharing-based hyper-network, and the number of layers isNEach layer comprisingMDifferent blocks are seeded; wherein each block corresponds to an operation; obtaining a candidate architecture by sampling a block at each layer of the super network; the operations in the search space are defined as follows:
(1) Element-Wise operation, i.e., EW operation: comprises element-wise addition, element-wise maximum, element-wise minimum, element-wise average and inner product, forming 5 different blocks in total;
(2) multilayer perceptron operation, i.e., MLP operation: each defined MLP operation contains only three layers; the input and output layers have assigned dimensions so that dimensions stay aligned across the search space, while the hidden layer uses different neuron counts to realize different MLP operations; the hidden-layer neuron counts are 16, 32, 64, 128, 256, 512 and 1024, forming 7 different blocks in total;
(3) cross network operation, i.e., CN operation: composed of multiple cross layers; the output of a cross layer is computed by the following formula

$$x_{l+1} = x_0 x_l^{\top} w_l + b_l + x_l$$

where $x_0$ is the initial input to the cross network, $x_l$ is the output of the $l$-th cross layer, $w_l$ and $b_l$ are respectively the weight and bias parameters of the $l$-th cross layer, and $x_{l+1}$ is the output of the $(l+1)$-th cross layer;

the output of the cross network contains all cross terms $x_1^{\alpha_1} x_2^{\alpha_2} \cdots x_{d_{CN}}^{\alpha_{d_{CN}}}$ of order 1 to $l+1$, where $d_{CN}$ is the dimension of the cross-network output and $x_j$ denotes the $j$-th component of the vector $x$;
the number of cross layers determines the maximum order of the cross terms; 4 types of cross networks are used, with 1, 2, 3 and 4 cross layers respectively, forming 4 different blocks in total;
(4) self-attention operation, i.e., SA operation: the three matrices $Q$, $K$ and $V_{SA}$ all come from the same input; the product $QK^{\top}$ is divided by a scale factor $\sqrt{d_k}$ to prevent the result from becoming too large, normalized into a probability distribution by a Softmax operation, and then multiplied by the matrix $V_{SA}$ to obtain a weighted-sum representation; using different $Q$, $K$, $V_{SA}$ matrices yields a multi-head mechanism that learns different attention representations, and different head counts form different self-attention modules; self-attention modules with 1, 2, 3 and 4 heads are used, i.e., 4 different blocks in total; the attention operation is computed as:

$$\mathrm{Attention}(Q, K, V_{SA}) = \mathrm{softmax}\!\left(\frac{QK^{\top}}{\sqrt{d_k}}\right) V_{SA}$$
(5) factorization machine operation, i.e., FM operation: the factorization machine operation is represented by the following equation:

$$y_{FM}(x) = w_0 + \sum_{i=1}^{d_{FM}} w_i x_i + \sum_{i=1}^{d_{FM}} \sum_{j=i+1}^{d_{FM}} \langle v_i, v_j \rangle x_i x_j$$

where $w_0$ and $w_i$ are the parameters of the factorization machine, $V_{FM}$ is the parameter matrix, $x$ is the input vector, $d_{FM}$ is the dimension of the input vector, $v_i$ denotes the $i$-th row of the parameter matrix $V_{FM}$, $v_j$ denotes the $j$-th row, and $x_i$ and $x_j$ denote the $i$-th and $j$-th components of the vector $x$.
3. The automatic architecture searching method of a collaborative filtering model according to claim 1, wherein the specific method for training the super network in step S2 is: on the basis of the super network constructed in S1, sampling a sub-architecture from the super network, wherein the sampled sub-architecture inherits the parameters of the super network, and the super network is trained by training the sampled sub-architecture; denote by $W_A$ all the parameters of the super network $A$, by $w(a)$ the network parameters of sub-architecture $a$ within $A$, by $N(a, w(a))$ a particular sampled sub-architecture together with its network parameters, and by $\Gamma(A)$ the prior distribution for sampling sub-architectures $a$; the objective of training the super network is:

$$W_A^* = \arg\min_{W_A} \; \mathbb{E}_{a \sim \Gamma(A)} \left[ \mathcal{L}_{train}\big(N(a, w(a))\big) \right]$$

where $\mathbb{E}$ denotes expectation and $\mathcal{L}_{train}$ denotes the loss of an architecture on the training data; that is, the expected performance of sub-architectures randomly sampled from the super network according to the prior distribution $\Gamma(A)$ is optimized;
s21, randomly initializing parameters of the hyper-network: initializing parameters of a hyper-network through a standard network parameter initialization technology;
S22, training the super network: iteratively executing the training process of the super network for a specified number of training rounds $T$, each iteration comprising the following three steps:
(1) Randomly sampling a sub-architecture from the super network, wherein the sub-architecture inherits and shares parameters in the super network;
(2) Randomly sampling a batch of data from the training data;
(3) The sampled sub-architecture is trained with training data using gradient descent while corresponding parameters in the super-network are updated.
4. The automatic architecture searching method of a collaborative filtering model according to claim 1, wherein the specific method of the evolutionary search algorithm in step S3 is:
S31, initializing a population $P_0$ containing $P$ individuals by randomly sampling sub-architectures, where each individual in the population is a candidate sub-architecture; the evolutionary algorithm finds the optimal solution by iterating selection, crossover and mutation over the population;
S32, computing fitness: the fitness of each individual in the population is the performance of the corresponding sub-architecture; using the parameter-sharing technique, the sub-architecture needs no training and directly inherits the parameters of the super network for evaluation to obtain the individual's fitness;
S33, executing an evolution step: independently selecting $P/2$ individuals from the population according to fitness, one individual at a time, with selection probability proportional to fitness, the selected individuals serving as parents; applying crossover and mutation to the parents, where each block of a selected individual mutates with probability 0.1 and crossover randomly exchanges one block between two randomly selected individuals; adding the crossed-over and mutated offspring to the population, and selecting the $P$ individuals with the highest fitness from the new population to form the new population;
S34, after the evolutionary algorithm reaches the predefined number of iterations, selecting the $t$ sub-architectures with the highest fitness from the final population, training and evaluating them, and choosing the optimal sub-architecture as the search result.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211119443.3A CN115203585B (en) | 2022-09-15 | 2022-09-15 | Automatic architecture searching method of collaborative filtering model |
Publications (2)
Publication Number | Publication Date |
---|---|
CN115203585A true CN115203585A (en) | 2022-10-18 |
CN115203585B CN115203585B (en) | 2022-12-27 |
Family
ID=83572187
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202211119443.3A Active CN115203585B (en) | 2022-09-15 | 2022-09-15 | Automatic architecture searching method of collaborative filtering model |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN115203585B (en) |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110288444A (en) * | 2019-06-28 | 2019-09-27 | 第四范式(北京)技术有限公司 | Realize the method and system of user's associated recommendation |
CN112464579A (en) * | 2021-02-02 | 2021-03-09 | 四川大学 | Identification modeling method for searching esophageal cancer lesion area based on evolutionary neural network structure |
CN114419389A (en) * | 2021-12-14 | 2022-04-29 | 上海悠络客电子科技股份有限公司 | Target detection model construction method based on neural network architecture search |
Legal Events

Date | Code | Title | Description
---|---|---|---
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||