CN112035747B

CN112035747B - Information recommendation method and device

Info

Publication number: CN112035747B
Application number: CN202010913452.4A
Authority: CN
Inventors: 卢建东 (Lu Jiandong)
Original assignee: Tencent Technology Shenzhen Co Ltd
Current assignee: Tencent Technology Shenzhen Co Ltd
Priority date: 2020-09-03
Filing date: 2020-09-03
Publication date: 2023-09-29
Anticipated expiration: 2040-09-03
Also published as: CN112035747A

Abstract

The application provides an information recommendation method, an information recommendation device, an electronic device, and a computer-readable storage medium. The method comprises: acquiring a historical information sequence of a user and a recommended information set; determining a correlation factor of the historical information sequence corresponding to each piece of recommended information in the recommended information set; determining, according to these correlation factors, a behavior feature of the user corresponding to each piece of recommended information; performing multiple iterations of feature extraction on the behavior feature of each piece of recommended information, and determining the click rate of each piece of recommended information based on its feature extraction result; and performing a recommendation operation based on the click rate of each piece of recommended information. The application can improve recommendation accuracy.

Description

Information recommendation method and device

Technical Field

The present application relates to artificial intelligence technology, and in particular, to an information recommendation method, apparatus, electronic device, and computer readable storage medium.

Background

Artificial intelligence (AI) is the theory, method, technology, and application system that uses a digital computer or a machine controlled by a digital computer to simulate, extend, and expand human intelligence, perceive the environment, acquire knowledge, and use that knowledge to obtain optimal results.

Information recommendation is an important application of artificial intelligence. The ranking stage of a recommendation system generally uses a machine learning model to predict click rates and rank candidates, recommending the highest-scoring objects first. In the related art, to improve the click rate prediction accuracy of the machine learning model, a large amount of feature data is constructed in the feature engineering stage so that the model can learn fully. However, in implementing the embodiments of the application, the applicant found that this way of using feature data lacks pertinence and distinctiveness and can hardly describe the various interests of users, which reduces click rate prediction accuracy and, in turn, information recommendation accuracy.

Disclosure of Invention

The embodiment of the application provides an information recommendation method, an information recommendation device, electronic equipment and a computer readable storage medium, which can improve recommendation accuracy.

The technical scheme of the embodiment of the application is realized as follows:

An embodiment of the application provides an information recommendation method, which comprises the following steps:

acquiring a historical information sequence and a recommendation information set of a user;

determining a correlation factor of the historical information sequence corresponding to each piece of recommended information in the recommended information set;

determining a behavior feature of the user corresponding to each piece of recommended information according to the correlation factor of the historical information sequence corresponding to that piece of recommended information;

performing multiple iterations of feature extraction on the behavior feature of each piece of recommended information, and determining the click rate of each piece of recommended information based on its feature extraction result;

and performing a recommendation operation based on the click rate of each piece of recommended information.

An embodiment of the application provides an information recommendation device, which comprises:

an acquisition module, configured to acquire a historical information sequence of a user and a recommended information set;

a correlation factor determining module, configured to determine the correlation factor of the historical information sequence corresponding to each piece of recommended information in the recommended information set;

a behavior feature determining module, configured to determine the behavior feature of the user corresponding to each piece of recommended information according to the correlation factor of the historical information sequence corresponding to that piece of recommended information;

a click rate determining module, configured to perform multiple iterations of feature extraction on the behavior feature of each piece of recommended information and to determine the click rate of each piece of recommended information based on its feature extraction result;

and a recommendation module, configured to perform a recommendation operation based on the click rate of each piece of recommended information.

In the above scheme, the correlation factor of the historical information sequence corresponding to each piece of recommended information includes the correlation factor of each piece of historical information in the historical information sequence with respect to that piece of recommended information;

the correlation factor determining module is further configured to:

determining the features of each piece of historical information in the historical information sequence;

for any piece of recommended information in the recommended information set and any piece of historical information in the historical information sequence, performing the following processing:

acquiring the features of the recommended information;

subtracting the features of the historical information from the features of the recommended information to obtain the corresponding difference features;

splicing the features of the recommended information, the features of the historical information, and the corresponding difference features to obtain the spliced features of the historical information;

and performing fully connected processing on the spliced features of the historical information to obtain the correlation factor of the historical information corresponding to the recommended information.
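As a minimal Python sketch of these steps (an illustration, not the applicant's implementation), the code below assumes every feature is a dense vector of the same length d, reads "splicing" as concatenation, and reduces the fully connected processing to a single linear unit with hypothetical parameters `weights` (length 3d) and `bias`. A real attention unit would more likely stack several layers with nonlinear activations.

```python
from typing import List, Sequence

Vector = List[float]


def correlation_factor(rec: Sequence[float], hist: Sequence[float],
                       weights: Sequence[float], bias: float = 0.0) -> float:
    """Correlation factor of one history item with respect to one candidate.

    Assumption: the fully connected processing is a single linear unit.
    """
    if len(rec) != len(hist):
        raise ValueError("candidate and history features must have the same length")
    # Subtraction step: element-wise difference between the two feature vectors.
    diff = [r - h for r, h in zip(rec, hist)]
    # Splicing step: concatenate [candidate, history, difference] -> length 3d.
    spliced = list(rec) + list(hist) + diff
    if len(weights) != len(spliced):
        raise ValueError("weights must have length 3 * d")
    # Fully connected step reduced to one output unit: w . x + b.
    return sum(w * x for w, x in zip(weights, spliced)) + bias


def correlation_factors(rec: Sequence[float], history: Sequence[Sequence[float]],
                        weights: Sequence[float], bias: float = 0.0) -> Vector:
    """One correlation factor per item in the history sequence."""
    return [correlation_factor(rec, h, weights, bias) for h in history]
```

Because the difference features are part of the input, a history item identical to the candidate contributes nothing through the difference weights, which is one simple way the unit can distinguish close matches from distant ones.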

In the above solution, the behavior feature determining module is further configured to:

the following processing is performed for each piece of recommended information in the recommended information set:

using the correlation factor of each piece of historical information corresponding to the recommended information as its weight, computing a weighted sum of the features of the pieces of historical information to obtain the behavior feature of the user for that recommended information.
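Continuing the same hedged sketch, the weighting step can be read as a weighted sum over the history sequence. `behavior_feature` is a hypothetical helper; it uses the correlation factors exactly as produced, since the text says nothing about normalizing them (a softmax over the factors would be one possible variant).

```python
from typing import List, Sequence


def behavior_feature(history: Sequence[Sequence[float]],
                     factors: Sequence[float]) -> List[float]:
    """Weight each history feature by its correlation factor and sum them.

    Assumption: the factors are used as-is, without normalization.
    """
    if not history:
        raise ValueError("history sequence is empty")
    if len(history) != len(factors):
        raise ValueError("need exactly one correlation factor per history item")
    dim = len(history[0])
    pooled = [0.0] * dim
    for feature, weight in zip(history, factors):
        if len(feature) != dim:
            raise ValueError("all history features must have the same length")
        for i, value in enumerate(feature):
            pooled[i] += weight * value
    return pooled
```

Because the factors depend on the candidate, calling this once per piece of recommended information yields a different behavior feature for each candidate from the same history sequence.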

In the above solution, the click rate determining module is further configured to:

determining the data features of the user, the recommendation environment features of the user, and the features of each piece of recommended information in the recommended information set;

splicing the behavior features, the data features, the recommendation environment features, and the features of the recommended information;

and performing iterative feature extraction on the splicing result.
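One plausible layout for the splicing result, assumed here rather than stated in this passage, is a d × m matrix whose row r collects embedding dimension r from each of the m feature groups. This matches the later description of FIG. 3C, where convolution runs over the same embedding dimension of different features. The helper name `splice_features` is hypothetical, and each group is assumed to be a single d-dimensional vector.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def splice_features(*groups: Sequence[float]) -> Matrix:
    """Stack equal-length feature vectors into a d x m matrix.

    Row r holds embedding dimension r of every feature group, so a row-wise
    1-D convolution later mixes the same dimension across different features.
    """
    if not groups:
        raise ValueError("at least one feature group is required")
    d = len(groups[0])
    if any(len(g) != d for g in groups):
        raise ValueError("all feature groups must share the embedding dimension d")
    return [[float(g[r]) for g in groups] for r in range(d)]
```

With the four groups named above, the call would be `splice_features(behavior, user_data, environment, item)`, giving a d × 4 matrix.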

performing, through the nth neural network model among N cascaded neural network models, feature extraction on the input of the nth neural network model, and

transmitting the nth feature extraction result output by the nth neural network model to the (n+1)th neural network model, which continues the feature extraction;

where n is an integer whose value increases from 1 and satisfies 1 ≤ n ≤ N-1, and N is an integer greater than or equal to 2; when n = 1, the input of the nth neural network model is the splicing result, and when 2 ≤ n ≤ N-1, the input of the nth neural network model is the feature extraction result of the (n-1)th neural network model.
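The cascade amounts to a loop in which each model's output becomes the next model's input. The sketch below treats each cascaded model as an opaque function from matrix to matrix; `run_cascade` and the `Stage` type are illustrative names only.

```python
from typing import Callable, List

Matrix = List[List[float]]
Stage = Callable[[Matrix], Matrix]


def run_cascade(spliced: Matrix, stages: List[Stage]) -> Matrix:
    """Feed the splicing result through N cascaded models (N >= 2).

    Model 1 receives the splicing result; every later model receives the
    feature extraction result of the model before it.
    """
    if len(stages) < 2:
        raise ValueError("the scheme requires N >= 2 cascaded models")
    x = spliced
    for stage in stages:
        x = stage(x)
    return x
```

Order matters: swapping two stages generally changes the result, which is why the text fixes the input of each model to the output of its predecessor.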

In the above scheme, when 1 ≤ n ≤ N-1, the nth neural network model comprises a one-dimensional convolution layer and a max pooling layer;

the click rate determining module is further configured to:

convolving the input of the nth neural network model with the parameters of the one-dimensional convolution layer of the nth neural network model to obtain the nth convolution layer result;

performing max pooling on the nth convolution layer result through the max pooling layer of the nth neural network model to obtain the nth feature extraction result output by the nth neural network model;

when n = N-1, the (n+1)th neural network model comprises the one-dimensional convolution layer, a folding layer, and the max pooling layer;

the click rate determining module is further configured to:

convolving the nth feature extraction result with the parameters of the one-dimensional convolution layer of the (n+1)th neural network model to obtain the (n+1)th convolution layer result;

adding, through the folding layer, the convolution feature values of adjacent dimensions in the (n+1)th convolution layer result pairwise and element by element to obtain a folding result;

and performing max pooling on the folding result through the max pooling layer of the (n+1)th neural network model to obtain the (n+1)th feature extraction result output by the (n+1)th neural network model.
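The folding step can be sketched as follows, assuming the d × m layout in which each row is one embedding dimension, so that "adjacent dimensions" are consecutive rows. This resembles the folding layer of dynamic convolutional networks for text, which likewise halves the number of rows by summing neighboring pairs. `fold` is a hypothetical helper and assumes an even row count.

```python
from typing import List

Matrix = List[List[float]]


def fold(conv_out: Matrix) -> Matrix:
    """Folding layer: add rows 2i and 2i+1 element-wise, halving the row count.

    Assumption: the number of rows (dimensions) is even.
    """
    if len(conv_out) % 2:
        raise ValueError("folding needs an even number of dimensions")
    return [[a + b for a, b in zip(conv_out[i], conv_out[i + 1])]
            for i in range(0, len(conv_out), 2)]
```

Folding adds no parameters; it only lets each output row combine information from two embedding dimensions before the final pooling.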

convolving the feature values of each dimension of the input of the nth neural network model with the one-dimensional convolution layer parameters to obtain the convolution feature values of each dimension;

and splicing the convolution feature values of all dimensions to obtain the nth convolution layer result based on the one-dimensional convolution layer parameters.
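A minimal sketch of the per-dimension convolution, assuming one kernel per dimension (row) and the "wide" convolution mentioned in the description of FIG. 3C, i.e. zero padding of w-1 on both sides so that a row of length n yields n+w-1 values. The function names and the one-kernel-per-row choice are assumptions for illustration.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def wide_conv1d(row: Sequence[float], kernel: Sequence[float]) -> List[float]:
    """Wide 1-D convolution: zero-pad w-1 on both sides, output n+w-1 values.

    Like most deep learning frameworks, this computes cross-correlation
    (the kernel is not flipped).
    """
    w = len(kernel)
    if w == 0:
        raise ValueError("kernel must not be empty")
    padded = [0.0] * (w - 1) + list(row) + [0.0] * (w - 1)
    return [sum(k * padded[i + j] for j, k in enumerate(kernel))
            for i in range(len(row) + w - 1)]


def conv_layer(x: Matrix, kernels: Sequence[Sequence[float]]) -> Matrix:
    """Convolve each dimension (row) with its own kernel, then splice the rows."""
    if len(x) != len(kernels):
        raise ValueError("one kernel per dimension is assumed")
    return [wide_conv1d(row, k) for row, k in zip(x, kernels)]
```

The wide variant lets every feature position, including those at the edges, be covered by every kernel offset, which is why the output grows rather than shrinks.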

performing the following processing for the feature of each dimension in the nth convolution layer result:

acquiring the convolution values of the dimension and sorting them in descending order;

taking the top-ranked convolution values in the descending order as the max pooling result of the feature of that dimension;

and splicing the max pooling results of the features of all dimensions to obtain the nth feature extraction result output by the nth neural network model.
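The pooling steps describe k-max pooling. In the sketch below, the sort-and-keep-top-k part follows the text directly, while restoring the kept values to their original left-to-right order is taken from the description of FIG. 3C, which says the relative positions of the k values are retained. `k` is a hypothetical hyperparameter fixed per layer.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def k_max_pool_row(row: Sequence[float], k: int) -> List[float]:
    """Keep the k largest values of one dimension, in their original order."""
    if not 0 < k <= len(row):
        raise ValueError("k must be between 1 and the row length")
    # Descending sort of positions by value, as the text describes ...
    top_positions = sorted(range(len(row)), key=lambda i: row[i], reverse=True)[:k]
    # ... then restore left-to-right order so relative positions survive.
    return [row[i] for i in sorted(top_positions)]


def k_max_pool(x: Matrix, k: int) -> Matrix:
    """Apply k-max pooling to every dimension and splice the results."""
    return [k_max_pool_row(row, k) for row in x]
```

For example, keeping 3 of `[3, 1, 5, 2, 4]` returns `[3, 5, 4]` rather than the sorted `[5, 4, 3]`.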

performing fully connected processing on the feature extraction result of each piece of recommended information, and performing maximum likelihood processing on the fully connected result to obtain the click rate corresponding to each piece of recommended information;
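The text does not define "maximum likelihood processing". A common reading, assumed here, is a sigmoid output whose parameters are trained by maximizing the Bernoulli likelihood of observed clicks (equivalently, minimizing log loss). The sketch flattens the final feature map, applies one linear unit with hypothetical `weights` and `bias`, and includes the per-example loss for reference.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_click_rate(features: Matrix, weights: Sequence[float],
                       bias: float = 0.0) -> float:
    """Flatten the final feature map, apply one linear unit, squash to (0, 1).

    Assumption: a sigmoid output stands in for "maximum likelihood processing".
    """
    flat = [v for row in features for v in row]
    if len(flat) != len(weights):
        raise ValueError("one weight per flattened feature value is required")
    logit = sum(w * v for w, v in zip(weights, flat)) + bias
    return sigmoid(logit)


def log_loss(p: float, clicked: bool) -> float:
    """Negative log-likelihood of one observation under predicted rate p."""
    eps = 1e-12
    p = min(max(p, eps), 1.0 - eps)
    return -math.log(p) if clicked else -math.log(1.0 - p)
```

The two-branch `sigmoid` avoids overflow in `math.exp` for large negative logits.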

the recommendation module is further configured to:

and sorting the recommended information in the recommended information set in descending order of click rate, and performing the recommendation operation based on the top-ranked pieces of recommended information.
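Ranking and truncation can be sketched with the standard library's `heapq.nlargest`, which returns the n largest items without sorting the whole candidate set. `top_recommendations` is a hypothetical helper; item identifiers and scores are assumed to come from the click rate prediction step.

```python
import heapq
from typing import Dict, List, Tuple


def top_recommendations(click_rates: Dict[str, float], n: int) -> List[Tuple[str, float]]:
    """Return the n items with the highest predicted click rate, best first.

    heapq.nlargest is equivalent to sorted(..., reverse=True)[:n] but avoids
    a full sort of a large candidate set.
    """
    if n < 0:
        raise ValueError("n must be non-negative")
    return heapq.nlargest(n, click_rates.items(), key=lambda item: item[1])
```

For the example given for FIG. 1, where the head information is the top 200 pieces, the call would be `top_recommendations(scores, 200)`.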

An embodiment of the present application provides an electronic device, including:

a memory for storing executable instructions;

and a processor, configured to implement the information recommendation method provided by the embodiments of the application when executing the executable instructions stored in the memory.

An embodiment of the application provides a computer-readable storage medium storing executable instructions that, when executed by a processor, implement the information recommendation method provided by the embodiments of the application.

The embodiment of the application has the following beneficial effects:

Based on the correlation factors between the same historical information sequence and different pieces of recommended information, behavior features representing the user's interests are characterized specifically for each piece of recommended information. This achieves bidirectional, targeted feature characterization between recommended information and users and effectively captures users' diverse interests, which ensures the accuracy of recommendations made from click rates predicted with these behavior features, avoids ineffective recommendations, and saves the server computing resources spent on recommendation logic.

Drawings

FIG. 1 is a schematic diagram of an architecture of an information recommendation system based on artificial intelligence according to an embodiment of the present application;

FIG. 2 is a schematic structural diagram of a server 200 applying an artificial intelligence based information recommendation method according to an embodiment of the present application;

FIG. 3A is a diagram of an overall model structure of an artificial intelligence based information recommendation method provided by an embodiment of the present application;

FIG. 3B is a schematic diagram of an attention module of an information recommendation method based on artificial intelligence according to an embodiment of the present application;

FIG. 3C is a schematic diagram of a deep convolutional neural network module of an information recommendation method based on artificial intelligence according to an embodiment of the present application;

FIGS. 4A-4D are schematic flow diagrams of an information recommendation method based on artificial intelligence according to an embodiment of the present application;

FIG. 5 is an overall architecture diagram of an artificial intelligence based information recommendation method provided by an embodiment of the present application;

FIG. 6 is a schematic diagram of feature compression of an artificial intelligence based information recommendation method according to an embodiment of the present application;

FIG. 7 is a schematic feature cross diagram of an artificial intelligence based information recommendation method according to an embodiment of the present application;

FIGS. 8A-8B are schematic diagrams of attention mechanism models in the related art.

Detailed Description

To make the objectives, technical solutions, and advantages of the present application clearer, the application is described in further detail below with reference to the accompanying drawings. The described embodiments should not be construed as limiting the application; all other embodiments obtained by those skilled in the art without creative effort fall within the scope of protection of the application.

In the following description, reference is made to "some embodiments" which describe a subset of all possible embodiments, but it is to be understood that "some embodiments" can be the same subset or different subsets of all possible embodiments and can be combined with one another without conflict.

Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs. The terminology used herein is for the purpose of describing embodiments of the application only and is not intended to be limiting of the application.

Before the embodiments of the application are described in further detail, the terms involved in the embodiments are explained; the following explanations apply to these terms throughout.

1) Feature compression (Embedding): discrete features are mapped to a dense vector space, such as feature "man" to [0.3,0.4, …,0.9].

2) Predicted click-through rate (Pctr): the click probability of a piece of information (for example, an advertisement) predicted under given conditions before the information is displayed. It can be applied in a recommendation system, where candidate recommendation objects are ranked by predicted click probability and the top-ranked objects are recommended to the user.

3) Predicted conversion rate (Pcvr): the conversion rate of recommended information can be predicted either by building a model directly from impression (display) data, or by using impression data and conversion data to obtain click data, predicting the conversion rate from the click data and conversion data, and multiplying the predicted click rate by this predicted conversion rate to obtain the final predicted conversion rate.

4) Convolutional neural network (CNN): a feedforward neural network that involves convolution computations and has a deep structure. It is one of the representative algorithms of deep learning, has representation learning capability, and can perform shift-invariant classification of input information according to its hierarchical structure, so it is also called a "shift-invariant artificial neural network".

5) Logistic regression (LR): assumes that the data follows a Bernoulli distribution and classifies the data by using gradient descent to solve for the parameters that maximize the likelihood function.

6) Factorization machine (FM): a machine learning algorithm based on matrix factorization whose most notable property is its strong ability to learn from sparse data.

7) Click-through rate (CTR): the ratio of the number of actual clicks on a piece of information to the number of times it is displayed.

8) Deep neural network (DNN): a neural network with many hidden layers, sometimes called a multi-layer perceptron. By position, the layers inside a DNN fall into three types: the input layer, the hidden layers, and the output layer.

9) Attention mechanism (Attention): focusing on the important points while ignoring other, unimportant factors, where what counts as important depends on the application scenario. By application scenario, attention is divided into spatial attention and temporal attention; the former is used for image processing and the latter for natural language processing.

10) Revenue per thousand impressions (eCPM, effective cost per mille): the advertising revenue obtained for every thousand impressions. The impressions may be web page views, and by default eCPM is computed per thousand page views. It is a parameter that reflects the profitability of a website and does not by itself represent its actual revenue.

11) Behavior features: features of the user's historical behavior. The historical behavior includes the user's interactions with historical information, such as purchasing, clicking, and commenting. Behavior features are vectorized representations of historical behavior data, which is the data generated by aggregating the user's interactions with historical information.

12) Data features of the user: vectorized representations of the personal attribute data in the user profile. The personal attribute data, such as the user's age, gender, marital status, and occupation, is discrete, and vectorizing it yields the user's data features.
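Several of the definitions above (items 3, 7, and 10) are simple arithmetic. The sketch below restates them as hypothetical helper functions; the numbers in the usage line are made up for illustration.

```python
def click_through_rate(clicks: int, impressions: int) -> float:
    """Item 7: actual clicks divided by impressions."""
    if impressions <= 0:
        raise ValueError("impressions must be positive")
    return clicks / impressions


def impression_conversion_rate(pctr: float, pcvr: float) -> float:
    """Item 3: pCTR (impression -> click) times pCVR (click -> conversion)."""
    for p in (pctr, pcvr):
        if not 0.0 <= p <= 1.0:
            raise ValueError("rates must lie in [0, 1]")
    return pctr * pcvr


def ecpm(revenue: float, impressions: int) -> float:
    """Item 10: revenue earned per thousand impressions."""
    if impressions <= 0:
        raise ValueError("impressions must be positive")
    return revenue / impressions * 1000.0
```

For example, 3 clicks on 100 impressions gives a CTR of 0.03; a 5% click rate followed by a 20% post-click conversion rate gives 1% conversions per impression; and 25.0 units of revenue over 10,000 impressions is an eCPM of 2.5.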

In related-art solutions, CTR prediction models are built on the LR model, the FM model, and the DNN model. An LR-based CTR prediction model requires cross features to be constructed manually, and because it has no embedding layer, it generalizes poorly. Compared with the LR model, the FM model learns second-order combined features, but it forces features to cross only in pairs, so it cannot learn higher-order combined features. Compared with the LR model, the DNN model introduces an embedding layer and learns high-order combined features through a multi-layer perceptron, but it cannot characterize users' diverse interests.
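To make the second-order limitation of FM concrete, the sketch below scores one example with the standard factorization machine formula, ŷ = w0 + Σ w_i x_i + Σ_{i<j} ⟨v_i, v_j⟩ x_i x_j, computing the pairwise term in O(kn) with the well-known reformulation and checking it against explicit pair enumeration. Function names and parameter values are arbitrary illustrations.

```python
from typing import Sequence


def fm_predict(x: Sequence[float], w0: float, w: Sequence[float],
               v: Sequence[Sequence[float]]) -> float:
    """Factorization machine score with pairwise terms in O(k * n).

    Uses sum_{i<j} <v_i, v_j> x_i x_j
         = 0.5 * sum_f [(sum_i v_if x_i)^2 - sum_i (v_if x_i)^2].
    """
    n = len(x)
    if len(w) != n or len(v) != n:
        raise ValueError("w and v need one entry per feature")
    k = len(v[0]) if n else 0
    linear = w0 + sum(wi * xi for wi, xi in zip(w, x))
    pairwise = 0.0
    for f in range(k):
        s = sum(v[i][f] * x[i] for i in range(n))
        s_sq = sum((v[i][f] * x[i]) ** 2 for i in range(n))
        pairwise += 0.5 * (s * s - s_sq)
    return linear + pairwise


def fm_predict_naive(x: Sequence[float], w0: float, w: Sequence[float],
                     v: Sequence[Sequence[float]]) -> float:
    """Same score by explicit enumeration of feature pairs (O(k * n^2))."""
    n = len(x)
    score = w0 + sum(wi * xi for wi, xi in zip(w, x))
    for i in range(n):
        for j in range(i + 1, n):
            dot = sum(a * b for a, b in zip(v[i], v[j]))
            score += dot * x[i] * x[j]
    return score
```

Every interaction term is a product of exactly two features, which is why FM alone does not learn higher-order combinations.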

To solve the above problems, the embodiments of the application provide an information recommendation method, apparatus, electronic device, and computer-readable storage medium. They characterize users' diverse interests through an attention mechanism and learn expressive high-order combined features through multiple layers of pooling, achieving bidirectional, targeted feature characterization between recommended information and users. This improves the representational capability of the model and effectively characterizes users' diverse interests, ensuring the accuracy of recommendations made from click rates predicted with the behavior features.

The server may be an independent physical server, a server cluster or distributed system composed of multiple physical servers, or a cloud server that provides basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, big data, and artificial intelligence platforms. The server may be connected to the terminal directly or indirectly through wired or wireless communication.

Artificial intelligence cloud services are also commonly referred to as AIaaS (AI as a Service). This is currently the mainstream service model for artificial intelligence platforms: an AIaaS platform splits out several common AI services and provides them in the cloud, either independently or as packages. This service model resembles an AI-themed mall: any developer can access one or more of the platform's artificial intelligence services through an API, and some advanced developers can use the AI framework and AI infrastructure provided by the platform to deploy, operate, and maintain their own dedicated cloud artificial intelligence services.

Referring to FIG. 1, FIG. 1 is a schematic architecture diagram of an information recommendation system based on artificial intelligence according to an embodiment of the present application. The information recommendation system can support various information recommendation scenarios, such as recommending news, merchandise, or videos. Depending on the scenario, the information may be news, videos, articles, image-and-text content, and so on, or information related to products (for example, physical objects such as clothing, or virtual items such as game props). While the user uses the client, the terminal 400 reports the collected interactions of the user with information to the server 200; these interactions serve as training sample data and are also used to build the user profile and user features of that user. The training sample data is the behavior data of different users reported by each terminal and is used to train a click rate prediction model; the user profile and user features are fed back by the terminal of the particular user. The click rate prediction model determines the click rate of information based on user features, information features, and environment features. All recalled information is sorted in descending order of click rate, and the head information is determined from the sorted result; for example, the head information may be the top 200 pieces of information in the recommended information set, and the recommendation operation is performed on these 200 pieces in their ranked order.

A specific architecture of the information recommendation system is described below. The terminal 400 is connected to the server 200 through a network 300, which may be a wide area network, a local area network, or a combination of both. The functions of the server 200 can be abstracted into the click rate prediction model and model training. When the server 200 receives a recommendation request from the terminal 400, the request triggers the click rate prediction model, which determines the click rate of each piece of information recalled from the information database 500 based on the logs reported by the terminal 400 that contain information exposures, clicks, and other data. The pieces of information are sorted in descending order of click rate, and the top-ranked pieces are returned to the terminal 400 for presentation. The terminal 400 reports logs containing information exposures, clicks, and other data to the server 200 in real time; these logs serve as training samples for generating real-time user features and real-time information features, which are used to train the click rate prediction model.

Referring to FIG. 2, FIG. 2 is a schematic structural diagram of a server 200 applying an information recommendation method based on artificial intelligence according to an embodiment of the present application. The server 200 shown in FIG. 2 includes at least one processor 210, a memory 250, and at least one network interface 220. The components of the server 200 are coupled together by a bus system 240, which enables connection and communication among them. In addition to a data bus, the bus system 240 includes a power bus, a control bus, and a status signal bus; for clarity, however, all of these buses are labeled as the bus system 240 in FIG. 2.

The processor 210 may be an integrated circuit chip with signal processing capabilities, such as a general-purpose processor (for example, a microprocessor or any conventional processor), a digital signal processor (DSP), another programmable logic device, discrete gate or transistor logic, or discrete hardware components.

The memory 250 may be removable, non-removable, or a combination thereof. Exemplary hardware devices include solid state memory, hard drives, optical drives, and the like. Memory 250 optionally includes one or more storage devices physically located remote from processor 210.

Memory 250 includes volatile memory or nonvolatile memory, and may also include both volatile and nonvolatile memory. The nonvolatile memory may be read only memory (ROM, Read Only Memory) and the volatile memory may be random access memory (RAM, Random Access Memory). The memory 250 described in embodiments of the present application is intended to comprise any suitable type of memory.

In some embodiments, memory 250 is capable of storing data to support various operations, examples of which include programs, modules and data structures, or subsets or supersets thereof, as exemplified below.

An operating system 251 including system programs for handling various basic system services and performing hardware-related tasks, such as a framework layer, a core library layer, a driver layer, etc., for implementing various basic services and handling hardware-based tasks;

a network communication module 252, configured to reach other computing devices via one or more (wired or wireless) network interfaces 220; exemplary network interfaces 220 include Bluetooth, WiFi, and Universal Serial Bus (USB).

In some embodiments, the information recommendation apparatus based on artificial intelligence provided by the embodiments of the present application may be implemented in software. FIG. 2 shows an information recommendation apparatus 255 based on artificial intelligence stored in the memory 250. The apparatus includes a plurality of modules, which may be software in the form of programs, plug-ins, and the like, namely the following software modules: the acquisition module 2551, the correlation factor determination module 2552, the behavior feature determination module 2553, the click rate determination module 2554, and the recommendation module 2555. These modules are logical and can therefore be arbitrarily combined or further split according to the functions implemented. The functions of each module are described below.

The information recommendation method based on artificial intelligence provided by the embodiments of the application is described below with reference to exemplary applications and implementations of the information recommendation system provided by the embodiments of the application. The information recommendation system involves a training stage and an application stage; the models used in the method and the training of each model are described first.

The information recommendation system provided by the embodiment of the application relates to a click rate prediction model, wherein the click rate prediction model comprises an attention module and a deep convolutional neural network module.

Referring to FIG. 3A, FIG. 3A is a structural diagram of the overall click rate prediction model of an artificial intelligence based information recommendation method according to an embodiment of the present application. The overall model includes an attention module and a deep convolutional neural network module. For each piece of recommended information in the recommended information set, the attention module receives the user behavior data and determines from it the user's behavior feature for that piece of recommended information; the deep convolutional neural network module then performs iterative feature extraction on the behavior feature; finally, the click rate prediction model outputs the click rate of each piece of recommended information in the set, and the recommendation operation is performed according to the result of sorting the click rates in descending order.

Referring to fig. 3B, fig. 3B is a block diagram of the attention module of the artificial-intelligence-based information recommendation method according to an embodiment of the present application. The attention module includes a feature splicing layer, which splices the features input to it: the features of the history information, the features of the recommendation information, and the difference features between the two. The attention module further includes a fully connected layer, which maps the spliced features obtained by the splicing operation to the weight of each piece of history information with respect to the recommendation information.

Referring to fig. 3C, fig. 3C is a schematic structural diagram of the deep convolutional neural network module of the artificial-intelligence-based information recommendation method according to an embodiment of the present application. The deep convolutional neural network module includes multiple groups of network structures and an output layer. Each group of network structures includes a one-dimensional convolution layer and a pooling layer, and the last group includes a one-dimensional convolution layer, a folding layer, and a pooling layer. The one-dimensional convolution layer is a wide convolution applied over the same embedding dimension of different features, so the interaction relations among different features can be learned. On data with local correlation, a convolution kernel has feature extraction capability; features on the information side have no local correlation, but after successive one-dimensional convolution and pooling, each value on the final feature map is equivalent to a very high-order feature combination and perceives a very wide range of feature combinations. The pooling layer extracts the k strongest features (those with the largest values) from the features output by the corresponding convolution layer. Retaining the largest k values, on the one hand, preserves the relative positions of these k values and, on the other hand, extracts several pieces of important information at once. The output layer takes the features from the last pooling layer as input and outputs the probability that the information is actually clicked, that is, the click rate.

In some embodiments, the click rate is determined by calling the click rate prediction model, which is trained as follows. Training data (history information sequences, user portraits, environment data, and recommendation information) are propagated forward through the model. After the features obtained by the last pooling layer are passed to the output layer, the output layer uses a maximum likelihood function to obtain the probability that each piece of recommendation information is clicked, that is, the predicted click rate. The cross-entropy loss function of the model follows from maximizing the likelihood, the model parameters are learned by minimizing this loss function, and stochastic gradient descent is used for training.
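As an illustration of the training procedure above, the following is a minimal Python sketch under simplifying assumptions: the whole model is reduced to a single logistic output unit acting on a fixed feature vector (for two classes, this is equivalent to a two-class softmax), and only that unit's weights are learned. In the described model the gradients would also flow into the convolution and embedding parameters. All function names and the toy data are hypothetical.

```python
import math
import random
from typing import List, Sequence, Tuple

# (feature vector from the last pooling layer, click label 0/1); hypothetical layout
Sample = Tuple[List[float], int]


def sigmoid(z: float) -> float:
    """Click probability of one logistic output unit (two-class softmax equivalent)."""
    return 1.0 / (1.0 + math.exp(-z))


def predict(w: Sequence[float], b: float, x: Sequence[float]) -> float:
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def cross_entropy(p: float, y: int, eps: float = 1e-12) -> float:
    """Negative log-likelihood of label y under predicted click rate p."""
    p = min(max(p, eps), 1.0 - eps)
    return -(y * math.log(p) + (1 - y) * math.log(1.0 - p))


def sgd_train(samples: List[Sample], lr: float = 0.1, epochs: int = 50,
              seed: int = 0) -> Tuple[List[float], float]:
    """Stochastic gradient descent: one parameter update per shuffled sample."""
    rng = random.Random(seed)
    w, b = [0.0] * len(samples[0][0]), 0.0
    data = list(samples)
    for _ in range(epochs):
        rng.shuffle(data)
        for x, y in data:
            grad = predict(w, b, x) - y  # d(cross-entropy)/d(logit) = p - y
            w = [wi - lr * grad * xi for wi, xi in zip(w, x)]
            b -= lr * grad
    return w, b


def mean_loss(w: Sequence[float], b: float, samples: List[Sample]) -> float:
    return sum(cross_entropy(predict(w, b, x), y) for x, y in samples) / len(samples)
```

Minimizing the cross-entropy is the same as maximizing the likelihood of the observed click labels, which is why the update only needs the residual `p - y`.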

Next, the application of the model in the artificial-intelligence-based information recommendation method provided by the embodiments of the present application is described. Referring to fig. 4A, fig. 4A is a schematic flowchart of the artificial-intelligence-based information recommendation method according to an embodiment of the present application; the description refers to steps 101-105 shown in fig. 4A.

In step 101, the history information sequence and the recommendation information set of the user are acquired.

As an example, the history information sequence is a sequence of information on which user behavior was generated, for example, a sequence of information clicked by the user within a time window or a sequence of information browsed within a time window; when the information is commodity information, it may also be a sequence of commodities purchased by the user within a time window. The recommendation information set is the set of information obtained in the recall stage of the recommendation system. In practical applications, recall is performed over millions of pieces of information; for example, collaborative filtering may be used to recall information matching the user portrait to form the recommendation information set, on which the remaining recommendation process is carried out by executing steps 102-105.

In step 102, the correlation factor of the history information sequence corresponding to each piece of recommendation information in the recommendation information set is determined.

As an example, the correlation factor of the history information sequence corresponding to a piece of recommendation information consists of the correlation factors of each piece of history information in the sequence corresponding to that recommendation information. For example, the history information sequence consists of the identifiers of commodities purchased by the user (skirt, lipstick, milk powder, book, and snack), and the recommendation information set includes recommended information such as "lipstick" and "trousers". In the process of implementing the embodiments of the present application, the applicant found that the interaction behaviors on the five pieces of history information (skirt, lipstick, milk powder, book, and snack) carry different reference significance for a given piece of recommended information; for example, for the recommended "lipstick", the historical behavior on "lipstick" is more relevant than the behaviors on the other items. Therefore, to faithfully characterize the influence of the user's historical behavior on the recommended information, the correlation factor of each piece of information in the history information sequence with respect to the recommended information needs to be determined. For example, the correlation factors of skirt, lipstick, milk powder, book, and snack with respect to the recommended "lipstick" are determined, and together they form the correlation factor of the history information sequence corresponding to "lipstick". The correlation factors of the five pieces of history information with respect to "trousers" are determined in the same way.

Referring to fig. 4B, fig. 4B is a schematic flowchart of the artificial-intelligence-based information recommendation method according to an embodiment of the present application. Determining, in step 102, the correlation factor of the history information sequence corresponding to each piece of recommendation information in the recommendation information set may be implemented by steps 1021-1022, where step 1022 includes steps 10221-10224.

In step 1021, the feature of each piece of history information in the history information sequence is determined.

In step 1022, the following processing is performed for any piece of recommendation information in the recommendation information set and any piece of history information in the history information sequence:

in step 10221, the features of the recommendation information are obtained;

as an example, relevant data of the information are first acquired, such as the identifier (ID) and the category of the information. These data are expressed as discrete sparse features; to facilitate subsequent processing, the discrete features are mapped to dense features through an embedding layer, and the dense features serve as the features of the corresponding recommendation information (the features of history information are obtained in the same way). A sparse feature is a feature vector whose number of non-zero values is far smaller than its dimension (length); a sparse feature threshold may be set here, so that a feature vector whose number of non-zero values is smaller than the threshold is regarded as sparse. Correspondingly, a dense feature is the vectorized representation of such a sparse feature, whose number of zero values is smaller than a dense feature threshold. Different dimensions of a dense feature may be correlated, so a model that describes information based on these correlations can have strong generalization capability.
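The sparse-to-dense mapping above can be illustrated with a minimal Python sketch. It assumes a hypothetical, randomly initialised embedding table of shape (vocabulary size, K), with K = 10 as stated later in this description, and shows that multiplying a one-hot (sparse) vector by the table gives exactly the same dense feature as reading one row of the table, which is what an embedding layer does in practice. The vocabulary size and function names are illustrative only.

```python
import random
from typing import List


def make_embedding_table(vocab_size: int, dim: int, seed: int = 0) -> List[List[float]]:
    """Hypothetical randomly initialised embedding table E, shape (vocab_size, dim)."""
    rng = random.Random(seed)
    return [[rng.uniform(-0.1, 0.1) for _ in range(dim)] for _ in range(vocab_size)]


def one_hot(index: int, size: int) -> List[float]:
    """Sparse representation: a single non-zero entry out of `size` values."""
    vec = [0.0] * size
    vec[index] = 1.0
    return vec


def dense_via_matmul(sparse: List[float], table: List[List[float]]) -> List[float]:
    """x^T E: multiply the one-hot vector by the embedding table."""
    dim = len(table[0])
    return [sum(sparse[v] * table[v][j] for v in range(len(sparse))) for j in range(dim)]


def embedding_lookup(index: int, table: List[List[float]]) -> List[float]:
    """Equivalent and much cheaper: read row `index` of the table directly."""
    return list(table[index])
```

For an item ID such as index 7 in a 1000-entry vocabulary, the sparse input has 1 non-zero value out of 1000, while the dense output has 10 generally non-zero values.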

In step 10222, the features of the history information are subtracted from the features of the recommendation information to obtain the corresponding difference features;

in step 10223, the features of the recommendation information, the features of the history information, and the corresponding difference features are spliced to obtain the spliced features corresponding to the history information;

in step 10224, the spliced features of the history information are processed by a fully connected layer to obtain the correlation factor of the history information with respect to the recommendation information.

As an example, the features of the recommendation information and the features of the history information are subtracted to obtain the corresponding difference features; for example, the features of "skirt" and the features of "lipstick" are subtracted to obtain difference features. Then the features of "skirt", the features of "lipstick", and the difference features are spliced (concatenation operation, also called merging operation) to obtain spliced features, and the spliced features are input into the fully connected layer to obtain the correlation factor of the history information "skirt" with respect to the recommendation information "lipstick".
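Steps 10222-10224 can be sketched in Python as follows. This is a minimal sketch, assuming a single fully connected layer with no activation that maps the 3K spliced values to one scalar, the difference taken as recommendation minus history, and unnormalized correlation factors; the description does not fix these details, and `fc_weights` / `fc_bias` are hypothetical learned parameters.

```python
from typing import List, Sequence


def correlation_factor(rec: Sequence[float], hist: Sequence[float],
                       fc_weights: Sequence[float], fc_bias: float) -> float:
    """Correlation factor of one history item for one recommendation item."""
    if len(rec) != len(hist):
        raise ValueError("rec and hist must have the same embedding dimension")
    diff = [r - h for r, h in zip(rec, hist)]    # step 10222: subtraction
    spliced = list(rec) + list(hist) + diff      # step 10223: splicing (3K values)
    if len(spliced) != len(fc_weights):
        raise ValueError("fc_weights must have length 3 * K")
    # step 10224: fully connected layer producing a single scalar
    return sum(w * z for w, z in zip(fc_weights, spliced)) + fc_bias


def sequence_factors(rec: Sequence[float], history: Sequence[Sequence[float]],
                     fc_weights: Sequence[float], fc_bias: float) -> List[float]:
    """Factors of every history item (e.g. skirt, lipstick, ...) for one recommendation."""
    return [correlation_factor(rec, h, fc_weights, fc_bias) for h in history]
```

Including the explicit difference feature lets even a single linear layer respond to how close a history item is to the candidate, which a plain concatenation of the two embeddings would have to learn indirectly.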

As an example, if the behavior feature of the user is formed by averaging the features of all of the user's historical behavior objects (the features of the history information) through an average pooling layer, the diversity of interests is lost: each user has diverse interests, and the historical behavior objects may be entirely unrelated, such as "book" and "milk powder" in the above example. When making an actual recommendation, for instance predicting the click rate of the recommendation information "diapers", the preference previously expressed by the user's historical behavior on "book" need not be given much weight, and the historical behaviors on "milk powder" and on "book" differ in importance. In other words, different historical behaviors deserve different attention when predicting the click rate of a given piece of recommendation information, and the above embodiments generate a correlation factor of each user behavior for the recommendation information.

In step 103, the behavior feature of the user corresponding to each piece of recommendation information is determined according to the correlation factor of the history information sequence corresponding to that piece of recommendation information in the recommendation information set.

Referring to fig. 4C, fig. 4C is a schematic flowchart of the artificial-intelligence-based information recommendation method according to an embodiment of the present application. Determining, in step 103, the behavior feature of the user corresponding to each piece of recommendation information according to the correlation factor of the history information sequence corresponding to that recommendation information may be implemented through steps 1031-1032.

In step 1031, the feature of each piece of history information in the history information sequence is determined.

In step 1032, the following processing is performed for each piece of recommendation information in the recommendation information set: the features of the multiple pieces of history information are weighted, using the correlation factor of each piece of history information with respect to the recommendation information as its weight, to obtain the behavior feature that characterizes the user with respect to the recommendation information.

As an example, continuing the above example, for the recommendation information "lipstick", the correlation factors of the history information "skirt", "lipstick", "milk powder", "book", and "snack" with respect to "lipstick" are g1, g2, g3, g4, and g5, respectively. The features of the history information "skirt" are multiplied by the corresponding correlation factor g1, the other history information is processed in the same way, and the products are added; this completes the weighting and yields the behavior feature of the user for the recommendation information.
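The weighting in step 1032 amounts to the sum of g_i times the history feature h_i over the sequence. The Python sketch below is illustrative (function names are hypothetical); it also contrasts the result with plain average pooling, which is the special case in which every factor equals 1/(number of history items).

```python
from typing import List, Sequence


def behavior_feature(history_feats: Sequence[Sequence[float]],
                     factors: Sequence[float]) -> List[float]:
    """Weighted sum of history features, using the correlation factors as weights."""
    if len(history_feats) != len(factors):
        raise ValueError("one correlation factor per history item is required")
    dim = len(history_feats[0])
    out = [0.0] * dim
    for h, g in zip(history_feats, factors):
        for j in range(dim):
            out[j] += g * h[j]
    return out


def average_pooling(history_feats: Sequence[Sequence[float]]) -> List[float]:
    """Baseline: every history item gets the same weight 1 / len(history)."""
    n = len(history_feats)
    return behavior_feature(history_feats, [1.0 / n] * n)
```

With history features [[1, 0], [0, 1], [2, 2]] and factors [0.5, 0.25, 1.0], the behavior feature is [2.5, 2.25], dominated by the third item, whereas average pooling would weight all three equally.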

In step 104, iterative feature extraction is performed multiple times on the behavior feature for each piece of recommendation information, and the click rate of each piece of recommendation information is determined based on its feature extraction result.

Referring to fig. 4D, fig. 4D is a schematic flowchart of the artificial-intelligence-based information recommendation method according to an embodiment of the present application. The iterative feature extraction on the behavior feature of each piece of recommendation information in step 104 may be implemented by steps 1041-1043.

In step 1041, the data features of the user, the recommendation environment features of the user, and the features of each piece of recommendation information in the recommendation information set are determined.

As an example, the data features may come from the user portrait, such as the user's age, occupation, and geographic features; the recommendation environment features may be the network the user is on, the type of client served by the recommendation system, and so on; and the features of each piece of recommendation information may be its identifier (ID), its category, and so on.

In step 1042, the behavior feature, the data feature, the recommended environment feature, and the feature of the recommended information are spliced.

As an example, the splicing (concatenation operation) may merge multiple feature vectors into a matrix. The objects of the splicing are the behavior feature and the other features output by the embedding layer, which are not limited to the data features, recommendation environment features, and recommendation information features described above.
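A minimal Python sketch of the splicing step, assuming every feature (behavior feature, user data feature, environment feature, recommendation feature, ...) has already been embedded to the same dimension K. The layout chosen here is an assumption: column i holds feature i and row j holds embedding dimension j of all features, so each row collects the values of different features in the same embedding dimension, which is what the later one-dimensional convolution slides along.

```python
from typing import List, Sequence


def splice_features(*features: Sequence[float]) -> List[List[float]]:
    """Merge m feature vectors of dimension K into a K x m matrix (one column per feature)."""
    if not features:
        raise ValueError("at least one feature vector is required")
    dim = len(features[0])
    if any(len(f) != dim for f in features):
        raise ValueError("all features must share the same embedding dimension K")
    # row j = embedding dimension j of every feature, in input order
    return [[f[j] for f in features] for j in range(dim)]
```

For example, splicing a behavior feature [1, 2], a user feature [3, 4], an environment feature [5, 6], and a recommendation feature [7, 8] yields [[1, 3, 5, 7], [2, 4, 6, 8]].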

In step 1043, iterative feature extraction is performed on the splicing result.

In some embodiments, the iterative feature extraction on the splicing result in step 1043 may be implemented by the following technical solution: feature extraction is performed on the input of the n-th neural network model through the n-th of N cascaded neural network models, and the n-th feature extraction result output by the n-th model is transmitted to the (n+1)-th model to continue feature extraction; here n is an integer increasing from 1 and satisfying 1 ≤ n ≤ N-1, and N is an integer greater than or equal to 2. When n is 1, the input of the n-th model is the splicing result; when 2 ≤ n ≤ N-1, the input of the n-th model is the feature extraction result of the (n-1)-th model.

As an example, a network formed by cascading multiple neural network models performs iterative feature extraction on the splicing result: the output of the previous neural network model is the input of the current one, and the output of the current one is the input of the next.
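The cascade can be sketched in Python as a simple loop in which each model is treated as an opaque function from a feature matrix to a feature matrix. The `Block` type and the toy blocks in the usage example are hypothetical stand-ins for the convolution/pooling models described in this section.

```python
from typing import Callable, List, Sequence

Matrix = List[List[float]]
Block = Callable[[Matrix], Matrix]


def cascade(splice_result: Matrix, blocks: Sequence[Block]) -> Matrix:
    """Run N >= 2 cascaded models; the output of model n is the input of model n+1."""
    if len(blocks) < 2:
        raise ValueError("N must be an integer >= 2")
    out = splice_result          # the input of model 1 is the splicing result
    for block in blocks:         # models 1..N, strictly in order
        out = block(out)
    return out
```

Because each model consumes its predecessor's output, the order of the models matters: with a "double" block and an "add one" block, [[1, 2]] becomes [[3, 5]] in one order and [[4, 6]] in the other.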

In some embodiments, when 1 ≤ n ≤ N-1, the n-th neural network model includes a one-dimensional convolution layer and a maximum pooling layer, and performing feature extraction on the input of the n-th model through the n-th of the N cascaded models may be implemented as follows: the input of the n-th model is convolved with the one-dimensional convolution layer parameters of the n-th model's one-dimensional convolution layer to obtain the n-th convolution layer processing result corresponding to the splicing result; and the n-th convolution layer processing result is max-pooled by the maximum pooling layer of the n-th model to obtain the n-th feature extraction result output by the n-th model. When n is N-1, the (n+1)-th neural network model includes the one-dimensional convolution layer, a folding layer, and the maximum pooling layer, and transmitting the n-th feature extraction result to the (n+1)-th model to continue feature extraction may be implemented as follows: the n-th feature extraction result is convolved with the one-dimensional convolution layer parameters of the (n+1)-th model's one-dimensional convolution layer to obtain the (n+1)-th convolution layer processing result corresponding to the n-th feature extraction result; the folding layer adds the convolution feature values of adjacent dimensions in the (n+1)-th convolution layer processing result pairwise, element by element, to obtain a folding result; and the folding result is max-pooled by the maximum pooling layer of the (n+1)-th model to obtain the (n+1)-th feature extraction result output by the (n+1)-th model.

As an example, between a pooling layer and the next convolution layer, the features are multiplied by weight parameters and an offset (bias) parameter is added, finally yielding multiple feature maps, which ensures the diversity of the extracted features. The multiple feature maps are obtained through multiple convolution kernels (the one-dimensional convolution layer parameters). A convolution kernel is applied to each row of the feature matrix, that is, to each dimension of the vector representation, and the operations on different rows are mutually independent. The dependency between two adjacent rows can be captured by the folding operation, which adds the vectors of two adjacent rows element by element and thereby halves the dimension of the vector representation. The folding operation adds no parameters, but it takes the association between rows of the feature matrix into account before the final fully connected layer.
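The folding operation described above can be written in a few lines of Python. This is a minimal sketch assuming the number of rows is even; how an odd number of rows would be handled is not specified here, so the sketch simply rejects it.

```python
from typing import List, Sequence


def fold(feature_matrix: Sequence[Sequence[float]]) -> List[List[float]]:
    """Folding layer: add rows 2i and 2i+1 element-wise, halving the number of rows.

    No parameters are introduced; each output row mixes two adjacent embedding
    dimensions that the per-row convolutions had processed independently.
    """
    rows = len(feature_matrix)
    if rows % 2 != 0:
        raise ValueError("this sketch assumes an even number of rows")
    return [[a + b for a, b in zip(feature_matrix[i], feature_matrix[i + 1])]
            for i in range(0, rows, 2)]
```

A 4 x 2 input [[1, 2], [3, 4], [5, 6], [7, 8]] folds to the 2 x 2 matrix [[4, 6], [12, 14]].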

In some embodiments, convolving the input of the n-th neural network model with the one-dimensional convolution layer parameters of the n-th model's one-dimensional convolution layer to obtain the n-th convolution layer processing result corresponding to the splicing result may be implemented as follows: the feature values in each dimension of the input of the n-th model are convolved with the one-dimensional convolution layer parameters to obtain the convolution feature values of each dimension; and the convolution feature values of all dimensions are spliced to obtain the n-th convolution layer processing result based on the one-dimensional convolution layer parameters.

As an example, the convolution is performed per dimension, that is, separately in each dimension of the feature vectors. Instead of applying one convolution kernel to the whole input (the whole sentence, in the original text setting) as a multi-dimensional convolution, a convolution kernel of size [w, 1] is applied in each dimension; each kernel moves only laterally to convolve a single dimension, which is why the operation is a one-dimensional convolution, and different useful information can be captured from different dimensions.

As an example, an ordinary (narrow) convolution shortens the input: the output length is L-w+1, where L is the input length and w is the convolution kernel width. The convolution in the information recommendation method of the embodiments of the present application is a wide convolution, whose output length L+w-1 is greater than the input length, because the window of the wide convolution need not lie entirely within the input values; positions without values are padded with 0, so edge information is not lost.
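The per-dimension wide convolution of the preceding paragraphs can be sketched in Python as follows. Assumptions: each row (embedding dimension) has its own kernel, w-1 zeros are padded on both sides, and the kernel is applied as a sliding dot product without flipping (the usual deep-learning convention; for learned kernels the orientation makes no difference). Function names are hypothetical.

```python
from typing import List, Sequence


def conv1d(row: Sequence[float], kernel: Sequence[float], wide: bool = True) -> List[float]:
    """One-dimensional convolution of a single row.

    wide=True pads w-1 zeros on each side -> output length n + w - 1 (edges kept);
    wide=False uses only complete windows  -> output length n - w + 1.
    """
    w = len(kernel)
    pad = [0.0] * (w - 1) if wide else []
    padded = pad + list(row) + pad
    return [sum(kernel[t] * padded[j + t] for t in range(w))
            for j in range(len(padded) - w + 1)]


def conv_rows(matrix: Sequence[Sequence[float]],
              kernels: Sequence[Sequence[float]]) -> List[List[float]]:
    """Convolve each embedding dimension (row) with its own kernel, then splice the rows."""
    if len(matrix) != len(kernels):
        raise ValueError("one kernel per row (embedding dimension) is expected")
    return [conv1d(row, k) for row, k in zip(matrix, kernels)]
```

For the row [1, 2, 3] and kernel [1, 1], the wide convolution gives [1, 3, 5, 3] (length 3 + 2 - 1 = 4), while the narrow one gives only [3, 5] (length 3 - 2 + 1 = 2) and loses the contributions of the edge values.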

In some embodiments, max-pooling the n-th convolution layer processing result through the maximum pooling layer of the n-th neural network model to obtain the n-th feature extraction result output by the n-th model may be implemented as follows: the following processing is performed for the feature of each dimension in the n-th convolution layer processing result: the multiple convolution values of the dimension are acquired and sorted in descending order, and the top-ranked convolution values in the descending sort result are taken as the maximum pooling result of that dimension; the maximum pooling results of all dimensions are then spliced to obtain the n-th feature extraction result output by the n-th model.
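The per-dimension max pooling can be sketched in Python as k-max pooling. The descending sort selects the k largest values; the sketch then emits them in their original order, assuming (as stated elsewhere in this description) that the relative positions of the retained values are preserved. Function names are hypothetical.

```python
from typing import List, Sequence


def k_max_pool_row(row: Sequence[float], k: int) -> List[float]:
    """Keep the k largest values of one dimension, in their original order."""
    if k >= len(row):
        return list(row)
    # descending sort of positions by value, keep the first k positions
    top = sorted(range(len(row)), key=lambda i: row[i], reverse=True)[:k]
    # re-sort the kept positions so relative order is preserved
    return [row[i] for i in sorted(top)]


def k_max_pool(matrix: Sequence[Sequence[float]], k: int) -> List[List[float]]:
    """Apply k-max pooling to every dimension and splice the per-dimension results."""
    return [k_max_pool_row(row, k) for row in matrix]
```

For the row [3, 1, 5, 2, 4] with k = 3, the values 5, 4, and 3 are selected and returned as [3, 5, 4], matching their positions in the input rather than their sorted order.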

As an example, the neural network model closest to the output layer includes a convolution layer, a folding layer, and a pooling layer, while the other neural network models include a convolution layer and a pooling layer. The pooling layer parameters of the models may be the same or different. The number k of features selected by the pooling layer may simply be set to the same value in every model, so that each pooling selects the k largest values as input to the next layer; or, in another embodiment, the pooling layer parameters differ across models, and the k value is determined by the position of the pooling layer, which implements dynamic pooling. The corresponding relation is $k_l = \max\left(k_{top}, \left\lceil \frac{L-l}{L}\, S \right\rceil\right)$, where l is the index of the current convolution layer, L is the total number of convolution layers in the model, $k_{top}$ is the k value of the pooling operation of the topmost convolution layer, and S is the total number of features. The significance of dynamic k-max pooling is that a correspondingly sized amount of semantic feature information is extracted from inputs (sentences, in the original text setting) of different lengths, ensuring the uniformity of the subsequent convolution layers.
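The dynamic choice of k can be illustrated with a small Python function. It assumes the relation k_l = max(k_top, ceil((L - l) / L * S)), which is the dynamic k-max pooling rule that the variables l, L, k_top, and S defined above correspond to; the function name and the example values are hypothetical. The ceiling is computed with integer arithmetic to avoid floating-point rounding.

```python
def dynamic_k(l: int, total_layers: int, k_top: int, total_features: int) -> int:
    """k for the pooling layer after convolution layer l (1-based).

    Computes max(k_top, ceil((L - l) / L * S)) with L = total_layers, S = total_features.
    """
    if not 1 <= l <= total_layers:
        raise ValueError("l must satisfy 1 <= l <= L")
    # integer ceiling of (L - l) * S / L
    scaled = -(-(total_layers - l) * total_features // total_layers)
    return max(k_top, scaled)
```

With L = 3 convolution layers, S = 20 features, and k_top = 4, the pooling layers keep 14, 7, and 4 values respectively: lower layers keep more information, and the topmost layer always keeps exactly k_top.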

In some embodiments, determining the click rate of each piece of recommendation information based on its feature extraction result may be implemented as follows: the feature extraction result of each piece of recommendation information is processed by a fully connected layer, and maximum likelihood processing is applied to the fully connected output to obtain the click rate corresponding to each piece of recommendation information.

As an example, operations such as convolution and pooling map the raw data into a hidden feature space, while the fully connected layer maps the learned "distributed feature representation" to the sample label space. In practice, the fully connected layer can be implemented by convolution: a fully connected layer whose preceding layer is also fully connected can be converted into a convolution with a 1x1 kernel, and a fully connected layer whose preceding layer is a convolution layer can be converted into a global convolution with an h x w kernel, where h and w are the height and width of the preceding layer's convolution output. The maximum likelihood processing may apply a softmax function to the output of the fully connected layer to obtain the predicted click rate of the corresponding recommendation information.
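The output stage can be sketched in Python as flatten, fully connected, softmax. This is a minimal sketch, assuming the fully connected layer produces two logits (index 0 = not clicked, index 1 = clicked) and that the softmax probability of index 1 is reported as the click rate; the weight layout is hypothetical.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax: subtract the largest logit before exponentiating."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def fully_connected(x: Sequence[float], weights: Sequence[Sequence[float]],
                    biases: Sequence[float]) -> List[float]:
    """One dense layer: output[o] = sum_j weights[o][j] * x[j] + biases[o]."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, biases)]


def click_rate(features: Sequence[Sequence[float]], weights: Sequence[Sequence[float]],
               biases: Sequence[float]) -> float:
    """Flatten the pooled feature matrix, map it to two logits, and return P(clicked)."""
    flat = [v for row in features for v in row]
    return softmax(fully_connected(flat, weights, biases))[1]
```

If both logits are equal the predicted click rate is 0.5; raising the "clicked" logit by ln 3 relative to the other gives 0.75.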

In step 105, the recommendation operation is performed based on the click rate of each piece of recommendation information.

In some embodiments, performing the recommendation operation based on the click rate of each piece of recommendation information in step 105 may be implemented as follows: the recommendation information in the recommendation information set is sorted in descending order of click rate, and the recommendation operation is performed based on the top-ranked pieces of recommendation information.
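The ranking step reduces to a descending sort by predicted click rate followed by a cut-off. A minimal Python sketch follows; the item names reuse the examples of this description, while the click rates and the cut-off n are hypothetical values.

```python
from typing import Dict, List


def top_n_by_click_rate(click_rates: Dict[str, float], n: int) -> List[str]:
    """Sort recommendation information by predicted click rate (descending), keep the first n."""
    ranked = sorted(click_rates.items(), key=lambda item: item[1], reverse=True)
    return [info for info, _ in ranked[:n]]
```

For example, with predicted click rates {"lipstick": 0.31, "trousers": 0.12, "diapers": 0.27} and n = 2, the items pushed (or passed on to re-ranking) are "lipstick" and "diapers".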

As an example, performing the recommendation operation based on the top-ranked pieces of recommendation information may mean pushing them directly to the user terminal for presentation, or further re-ranking them and performing the recommendation operation based on the re-ranking result.

An exemplary application of the information recommendation method provided by the embodiment of the present application in an actual application scenario will be described below.

Referring to fig. 5, fig. 5 is an overall architecture diagram of the artificial-intelligence-based information recommendation method provided by the embodiments of the present application. Taking an advertisement recommendation system as an example, the recommendation system includes a recall (Candidate Generation) stage and a ranking (Ranking) stage. The recall stage is a coarse ranking stage that selects several hundred pieces of candidate recommendation information that the user may click. Because of latency constraints, the recall stage generally uses simple strategies, such as collaborative filtering, an LR model, or a recall strategy based on the location service of the user equipment, and usually combines several recall strategies to obtain candidate recommendation information that the user may click. Since hundreds of pieces of recommendation information are recalled, the click rate prediction model used in the recall stage is relatively simple and does not need high accuracy; commonly used feature sources are the user portrait and the user's historical behavior data, and the recall model is typically based on vector representation learning. The ranking (Ranking) stage is a fine ranking stage that selects the top N pieces of information the user is most likely to click, and it requires a high-accuracy click rate prediction model. In bidding advertising, part of the ranking inputs cannot be controlled by the platform side, so the platform side needs to predict the click rate from user features and advertisement features, and the CTR is used for advertisement ranking. Ranking is the core of bidding advertising, so click rate prediction is one of its core technologies. The click rate prediction model based on the deep attention and deep convolutional neural network is mainly applied in the fine ranking stage of various recommendation systems, for example a news recommendation system: the attention module is introduced to describe the diverse interests of users, and high-order combined features with characterization capability are learned through multi-layer pooling, thereby improving the accuracy of click rate prediction.

In the field of natural language processing, CNN models are used for sentiment classification of text. Such a model can extract important semantic information from a sentence through word combinations, in a manner similar to syntactic parsing, as shown in fig. 7. Fig. 7 is a feature crossing diagram of the artificial-intelligence-based information recommendation method provided by the embodiments of the present application. The hierarchical feature tree plays a role similar to a syntactic parse tree: the model is a process of modeling sentence semantics in which information from the bottom layer is passed upward by combining adjacent words, and the upper layers combine new phrase information, so that even words far apart in a sentence can interact (or have some semantic relation), for example the combination of "cat" and "sitting". Within the network, the extraction of combined features is performed by the one-dimensional convolution layers and the k-max pooling layers; intuitively, the model extracts important semantic information through word combinations and finally builds something like a syntactic parse tree to complete the sentiment classification of the sentence.

In view of the excellent performance of CNN networks in sentiment analysis for natural language processing, a DCNN-based click rate prediction model is proposed and then improved by using the attention module to model the diverse interests of the user, yielding a click rate prediction model based on a deep attention and deep convolutional neural network. Referring to fig. 3C, the deep convolutional neural network model includes an input layer, multiple network combinations (one-dimensional convolution layer and pooling layer), and a fully connected layer. Referring to fig. 6, fig. 6 is a feature compression schematic diagram of the artificial-intelligence-based information recommendation method provided by the embodiments of the present application. Advertisement click features are almost all discrete; after one-hot encoding they are sparse and high-dimensional, so the sparse feature space needs to be mapped into a dense feature space through an embedding layer to resolve this. The output of each feature after the embedding layer is $e_i = \mathrm{Embedding}(x_i)$, where $x_i$ represents the i-th input feature and $e_i$ the output vector of the i-th feature after the embedding layer. The overall features of each sample after the embedding layer are expressed as $a^{(0)} = [e_1, e_2, \ldots, e_m]$, where $a^{(0)}$ represents the input layer of the deep convolutional neural network, $e_i$ the embedded feature of the i-th feature, and m the number of features; the dimension K of $e_i$ is 10.

The convolution layer of the DCNN model performs one-dimensional convolution. Let $w_i \in \mathbb{R}^{w}$, $s_i \in \mathbb{R}^{n}$ and $r_i \in \mathbb{R}^{n+w-1}$, where $w_i$ is the parameter of the one-dimensional convolution (a convolution kernel with window size $w$), $s_i$ is the column vector formed by the same embedding dimension of the different features on the embedding layer, and $r_i$ is the result of the one-dimensional convolution. The one-dimensional convolution is calculated as:

$$r_{i,j} = w_i^{\top} s_{i,\,j-w+1:j}, \qquad j = 1, \ldots, n+w-1,$$

where entries of $s_i$ whose index falls outside $1, \ldots, n$ are taken as zero. The convolution used in the model is a one-dimensional wide convolution applied along the same embedding dimension of different features, so it can learn the interactions between different features. On data with local correlation, a convolution kernel is able to extract features. Advertisement features have no such local correlation, but after the DCNN applies successive one-dimensional convolution and pooling operations, the features on the final feature map are equivalent to high-order feature combinations and cover feature combinations over a wider range.
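The wide convolution above can be sketched in a few lines of Python. This assumes the DCNN convention of applying the kernel without flipping and zero-padding $w-1$ positions on each side, which is what makes the output length $n+w-1$; the kernel values are hypothetical.

```python
from typing import List, Sequence


def wide_conv1d(s: Sequence[float], w: Sequence[float]) -> List[float]:
    """One-dimensional wide convolution: s in R^n, w in R^w -> r in R^(n+w-1).

    r_j = w^T s_{j-w+1:j}; positions of s outside 1..n are treated as zero.
    Following DCNN, the kernel is applied without flipping.
    """
    n, width = len(s), len(w)
    padded = [0.0] * (width - 1) + list(s) + [0.0] * (width - 1)
    return [
        sum(w[t] * padded[j + t] for t in range(width))
        for j in range(n + width - 1)
    ]


# s: one embedding dimension read across m = 4 features; w: hypothetical kernel.
s = [1.0, 2.0, 3.0, 4.0]
w = [0.5, -1.0, 2.0]
r = wide_conv1d(s, w)
print(r)  # [2.0, 3.0, 4.5, 6.0, -2.5, 2.0]
```

Because the boundary positions are padded, every feature takes part in the same number of kernel windows, which is the usual reason for preferring wide over narrow convolution here.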

The pooling layer of the DCNN model is a k-max pooling layer, which selects the k largest values from all the feature values. On the one hand, this keeps the relative position information of the k selected values; on the other hand, it extracts several important pieces of information (k values) at the same time. This pooling mode also adapts to inputs of different lengths, because exactly k values are extracted and passed on regardless of the input length. After the features obtained by the last pooling layer are passed to the output layer, the output layer uses the softmax function to compute, for each sample, the probability that the information is clicked, that is, the click rate: $\hat{y} = \mathrm{softmax}(y_{\mathrm{DCNN}})$, where $y_{\mathrm{DCNN}}$ is the result obtained through the fully connected processing. Maximizing the likelihood of the click rate prediction model over the training data is equivalent to minimizing its cross entropy loss function, see formula (1):

$$J(\theta) = -\sum_{(x_i, y_i) \in T} \log p(y_i \mid x_i; \theta) \tag{1}$$
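A minimal Python sketch of k-max pooling as described above: it keeps the k largest values but returns them in their original order, so their relative positions survive, and the output length is always k. Breaking ties by earlier position is an assumption; the text does not specify it.

```python
from typing import List, Sequence


def k_max_pooling(values: Sequence[float], k: int) -> List[float]:
    """Return the k largest values, kept in their original relative order."""
    if k >= len(values):
        return list(values)
    # Indices of the k largest values (ties broken by earlier position).
    top = sorted(range(len(values)), key=lambda i: (-values[i], i))[:k]
    # Re-sorting the indices restores positional order, preserving relative positions.
    return [values[i] for i in sorted(top)]


r = [2.0, 3.0, 4.5, 6.0, -2.5, 2.0]
print(k_max_pooling(r, 3))  # [3.0, 4.5, 6.0]
# The output length is k for any input length >= k.
print(len(k_max_pooling([0.1] * 50, 3)))  # 3
```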

Wherein $T$ is the training data set, $y_i$ is the true class of the $i$-th sample, $x_i$ is the features of the $i$-th sample, $\theta$ denotes the parameters of the model, and $J(\theta)$ is the cross entropy loss function. The model parameters are learned by minimizing the loss function, and training uses stochastic gradient descent.
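To make formula (1) and the training step concrete, the Python below computes the cross entropy loss from predicted class probabilities and performs one stochastic gradient descent update. A linear softmax classifier stands in for the full model parameters $\theta$; this is an assumption for illustration, not the model described here.

```python
import math
from typing import List, Sequence


def softmax(z: Sequence[float]) -> List[float]:
    m = max(z)  # subtract the maximum for numerical stability
    e = [math.exp(v - m) for v in z]
    total = sum(e)
    return [v / total for v in e]


def predict(W: Sequence[Sequence[float]], x: Sequence[float]) -> List[float]:
    """Linear layer followed by softmax: class probabilities for one sample."""
    logits = [sum(x[r] * W[r][c] for r in range(len(x))) for c in range(len(W[0]))]
    return softmax(logits)


def cross_entropy(probs: Sequence[Sequence[float]], labels: Sequence[int]) -> float:
    """Formula (1): J(theta) = -sum over the training set T of log p(y_i | x_i; theta)."""
    return -sum(math.log(p[y]) for p, y in zip(probs, labels))


def sgd_step(W: Sequence[Sequence[float]], x: Sequence[float], y: int,
             lr: float = 0.1) -> List[List[float]]:
    """One stochastic gradient descent update on a single sample (x, y)."""
    p = predict(W, x)
    grad = [p[c] - (1.0 if c == y else 0.0) for c in range(len(p))]  # dJ/dlogits
    return [[W[r][c] - lr * x[r] * grad[c] for c in range(len(p))] for r in range(len(x))]


print(round(cross_entropy([[0.9, 0.1], [0.2, 0.8]], [0, 1]), 4))  # 0.3285

W = [[0.0, 0.0], [0.0, 0.0]]  # hypothetical: 2 features -> 2 classes (1 = clicked)
x, y = [1.0, 2.0], 1
before = cross_entropy([predict(W, x)], [y])
W = sgd_step(W, x, y)
after = cross_entropy([predict(W, x)], [y])
print(before > after)  # True
```

For softmax outputs, the gradient of $-\log p_y$ with respect to the logits is simply $p - \mathrm{onehot}(y)$, which keeps the update cheap.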

The DCNN-based click rate prediction model can finely model the interests of users by learning high-order combination features, but it cannot model the diversified interests of users. The model is therefore restructured on the basis of the DCNN model: an attention module is introduced before the convolution layer to describe the diversified interests of users, giving the click rate prediction model based on the depth-attention-depth convolutional neural network.

The attention mechanism originates from the field of machine translation. Taking machine translation as an example, fig. 8A is a schematic diagram of an attention mechanism model in the related art. The text content (A, B, C) is read in by a recurrent neural network encoder to obtain a text content vector w (the last hidden state of the recurrent neural network), and another recurrent neural network decoder then takes this hidden state as its initial state and generates each word of the target (X, Y, Z) in turn. The drawback of this model is that, however long the preceding text content is and however much information it contains, everything is compressed into a final state vector of a few hundred dimensions. The longer the text content, the more information the final state vector loses, and the decoder's translation becomes significantly worse as the input sentence grows longer. Because the entire text content is already known at input time, the applicant found, when implementing the embodiment of the present application, that the model achieves better results when it uses all the information of the text content during decoding rather than only the last state vector. This is the core idea of the attention module.

Fig. 8B is a schematic diagram of the attention mechanism model in the related art. First, when generating the states on the target side (h7 … h9), all text content vectors (h1 … h5) are used as input. Second, not all of the text content influences the generation of the next state; for example, when translating an English article, attention is paid to the part currently being translated rather than to the whole article. "Attention" means selecting the appropriate text content and using it to generate the next state. The attention is a weight vector (usually the output of a softmax) whose dimension equals the length of the text content, and a larger weight indicates that the text content at the corresponding position is more important. Accordingly, the applicant found that, as the article to be recommended changes, the user's historical browsing data should play different roles in characterizing the user vector, and a weight vector is learned to characterize these differences in importance.
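The attention idea above can be sketched in Python as follows. Scoring each text content vector by its dot product with the current decoder state is an assumption (the text does not fix a scoring function); the softmax weights have one entry per position of the text content, and the context vector is a weighted sum of all content vectors rather than only the last one.

```python
import math
import random
from typing import List, Sequence, Tuple


def softmax(z: Sequence[float]) -> List[float]:
    m = max(z)
    e = [math.exp(v - m) for v in z]
    total = sum(e)
    return [v / total for v in e]


def attend(state: Sequence[float],
           contents: Sequence[Sequence[float]]) -> Tuple[List[float], List[float]]:
    """Attention over all text content vectors h_1..h_T.

    Dot-product scoring is an assumed choice. The returned weights have
    length T (the text content length) and sum to 1.
    """
    scores = [sum(a * b for a, b in zip(state, h)) for h in contents]
    weights = softmax(scores)
    dim = len(contents[0])
    context = [sum(wt * h[d] for wt, h in zip(weights, contents)) for d in range(dim)]
    return weights, context


rng = random.Random(0)
H = [[rng.gauss(0, 1) for _ in range(4)] for _ in range(5)]  # toy h_1..h_5
state = [rng.gauss(0, 1) for _ in range(4)]                  # toy decoder state
weights, context = attend(state, H)
print(len(weights), len(context))  # 5 4
```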

The embedded feature of the user's behavioral interest is constructed from all of the user's historical data, and summation or averaging is usually applied directly to obtain the user's embedded feature; for example, all the features within a feature class (field) are embedded and then simply accumulated. This loses much detailed information, because the user's historical behavior covers many aspects and the user's interests are diverse. For example, the history information (browsing record) of a young mother may include cosmetics, women's clothing, milk powder, infant products and so on. When the information to be recommended is women's clothing, the user's historical clicks on clothing products play the major role in constructing the user's embedded feature, while the historically browsed infant products play essentially no role. The attention module is therefore used to model the diversified interests of the user.

The click rate prediction model based on the depth-attention-depth convolutional neural network can capture diversified user interests from rich historical user click data. In the recommendation data set, the number of features contained in each sample is not fixed, so before modeling with a deep network the sparse identification-type features of a user must be mapped to embedded feature vectors of fixed length; obtaining such fixed-length vectors by average pooling causes considerable information loss. The structure of the model is therefore decomposed into two modules: an attention module, which maps the sparse identification (advertisement) features to dense embedded feature vectors; and a DCNN module, which performs cross modeling on the embedded feature vectors output by the embedding layer to learn high-order cross combination features.

Because the data contain both user data and recommended information data, the attention module must take the recommended information into account when learning the user representation; that is, different user vectors are obtained for different recommended information. Before the attention module is used to learn the user's embedded features, representing the user by one fixed embedded feature carries a strong mathematical assumption. For example, suppose user U clicks both product A and product B, the embedded feature vector of user U is Vu, and the embedded feature vectors of products A and B are Ua and Ub. Because the user clicks both A and B, both <Vu, Ua> and <Vu, Ub> must be large. Under this constraint on Vu, that is, without the attention module, the embedded feature vectors of the different products clicked by the user are forced to be very close to each other. This is unreasonable because a user's historical click behaviors are diverse (for example, a user who clicks a skirt may also click items of an entirely different kind). The attention module therefore first computes, for each piece of historical information, a weight representing its correlation with the recommended information (from the features of the recommended information, the features of the historical information and their difference), and then represents the user's behavior feature as the weighted sum of the historical features, with N being the number of pieces of historical information, see formula (2):

$$V_u = \sum_{i=1}^{N} g(V_i, V_a)\, V_i \tag{2}$$

Where $V_i$ represents the embedded feature vector of item $i$ that the user interacted with (e.g. clicked or purchased), such as the ID of the item clicked by the user or the category ID of that item; $V_u$ is the user feature vector learned by the attention network from the user's historical behavior data; and $g(V_i, V_a)$ is the correlation, learned by the attention neural network, between the representation $V_a$ of the item to be recommended and the representation $V_i$ of historical item $i$. Attention learning takes the user's diversified interests into account when producing the vector representation of the user's behavior features: the correlations are computed by the attention network, and the user's behavior feature is obtained as the weighted accumulation of the features of each piece of historical information, so that different behavior feature representations $V_u$ of the user are obtained for different items $V_a$ to be recommended.
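As a minimal sketch of the weighted accumulation $V_u = \sum_i g(V_i, V_a) V_i$, the Python below recomputes the user vector for each candidate. The learned attention network $g$ is replaced by a hypothetical stand-in (a sigmoid of a dot product), and the 2-d embeddings are toy values chosen only to echo the clothing and infant-product example.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = Sequence[float]


def user_interest(history: Sequence[Vector], v_a: Vector,
                  g: Callable[[Vector, Vector], float]) -> Tuple[List[float], List[float]]:
    """V_u = sum_i g(V_i, V_a) * V_i, recomputed for every candidate V_a."""
    weights = [g(v_i, v_a) for v_i in history]
    dim = len(history[0])
    v_u = [sum(wt * v_i[d] for wt, v_i in zip(weights, history)) for d in range(dim)]
    return v_u, weights


def toy_g(v_i: Vector, v_a: Vector) -> float:
    """Hypothetical stand-in for the learned attention network g."""
    return 1.0 / (1.0 + math.exp(-sum(a * b for a, b in zip(v_i, v_a))))


# Toy 2-d embeddings (illustrative only): axis 0 ~ clothing, axis 1 ~ infant products.
history = [[3.0, 0.0],   # a clothing item the user clicked
           [0.0, 3.0]]   # an infant product the user clicked
v_u_clothing, _ = user_interest(history, [1.0, 0.0], toy_g)
v_u_infant, _ = user_interest(history, [0.0, 1.0], toy_g)
print(v_u_clothing)  # clothing component is the larger one
print(v_u_infant)    # infant-product component is the larger one
```

The same history therefore yields different user vectors depending on which item is being scored, which is the behavior the passage describes.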

The click rate prediction model based on the depth-attention-depth convolutional neural network uses the attention module to model the user's diversified click data, obtaining user behavior features that change with the item to be recommended, and uses the deep convolutional neural network to model the extracted embedded features and learn high-order cross combination features. The experimental results of the information recommendation method provided by the embodiment of the application, compared with the LR model, the FM model and the depth-average-DCNN model in terms of metric improvement, are shown in the following table (1):

Table (1): Comparison of model ranking metrics

Compared with the LR model, the FM model improves the ranking metric (AUC) by 1.04%, showing that the embedding layer it introduces models advertisement data better and improves the generalization capability of the model. For the multi-hot data type, DCNN improves AUC by 3.29% over the FM model, which demonstrates the high-order combination feature learning capability of the DCNN network and shows that it improves model accuracy. After the attention network is used, AUC improves by 4.07% over the FM model, which shows that the attention mechanism takes the diversity of the user's historical interests into account and models the data in finer detail.

The click rate prediction model based on the depth-attention-depth convolutional neural network introduces the attention module to depict the diversified interests of users and learns high-order combination features with strong characterization capability through multi-layer pooling, further improving the characterization capability of the model. In addition, an LR model is used alongside the depth-attention-depth convolutional neural network: strong features are modeled by the LR model and high-order combination features are learned by the depth-attention-depth convolutional neural network, which ensures both the memorization and the generalization of the final model.
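The text does not say how the LR output and the deep network output are merged. One common choice, sketched below in Python purely as an assumption in the style of Wide & Deep, is to add the LR logit computed on the strong features to the logit of the deep branch and pass the sum through a sigmoid. All weights and feature values are hypothetical.

```python
import math
from typing import Sequence


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def combined_ctr(x: Sequence[float], lr_weights: Sequence[float], lr_bias: float,
                 deep_logit: float) -> float:
    """Hypothetical joint output: LR logit on the strong (raw) features plus the
    logit from the deep attention-DCNN branch, squashed into a click probability."""
    lr_logit = sum(w * xi for w, xi in zip(lr_weights, x)) + lr_bias
    return sigmoid(lr_logit + deep_logit)


x = [1.0, 0.0, 1.0]  # e.g. binary strong features (illustrative)
p = combined_ctr(x, [0.5, 2.0, -0.5], 0.1, deep_logit=0.4)
print(round(p, 4))  # 0.6225
```

Summing logits lets the LR part memorize specific strong feature values while the deep part contributes generalized, high-order interactions.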

Continuing with the description below of an exemplary structure of the information recommendation device 255 implemented as a software module provided by an embodiment of the present application, in some embodiments, as shown in fig. 2, the software module stored in the information recommendation device 255 of the memory 250 may include: an obtaining module 2551, configured to obtain a historical information sequence and a recommendation information set of a user; a correlation factor determining module 2552, configured to determine a correlation factor of each recommendation information in the recommendation information set corresponding to the historical information sequence; the behavior feature determining module 2553 is configured to determine, according to the correlation factor of the historical information sequence corresponding to each recommendation information in the recommendation information set, a behavior feature of the user corresponding to each recommendation information; the click rate determining module 2554 is configured to perform feature extraction processing on the behavioral features of each recommendation information for multiple iterations, and determine the click rate of each recommendation information based on the feature extraction result of each recommendation information; the recommendation module 2555 is configured to perform a recommendation operation based on the click rate of each recommendation information.

In the above scheme, the correlation factor of the history information sequence corresponding to each recommendation information includes the correlation factor of each history information in the history information sequence corresponding to that recommendation information. The correlation factor determining module 2552 is further configured to: determine the features of each history information in the history information sequence; and perform the following processing for any recommendation information in the recommendation information set and any history information in the history information sequence: acquire the features of the recommendation information; subtract the features of the history information from the features of the recommendation information to obtain the corresponding difference features; splice the features of the recommendation information, the features of the history information and the corresponding difference features to obtain the spliced features of the history information; and perform full connection processing on the spliced features of the history information to obtain the correlation factor of the history information corresponding to the recommendation information.
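The steps of the correlation factor determining module (subtract, splice, fully connect) and the subsequent weighting can be sketched in Python as follows. A single fully connected layer with hypothetical random weights is assumed; the embodiment does not specify the layer sizes or activations.

```python
import random
from typing import List, Sequence


class CorrelationFactor:
    """Correlation factor of one history item with respect to one recommendation item:
    subtract -> splice [rec, hist, rec - hist] -> full connection -> scalar."""

    def __init__(self, k: int, seed: int = 0):
        rng = random.Random(seed)
        # Hypothetical randomly initialised weights of a single fully connected layer.
        self.weights = [rng.gauss(0.0, 0.1) for _ in range(3 * k)]
        self.bias = 0.0

    def __call__(self, rec: Sequence[float], hist: Sequence[float]) -> float:
        diff = [a - b for a, b in zip(rec, hist)]   # difference feature
        spliced = list(rec) + list(hist) + diff      # spliced feature, length 3K
        return sum(w * x for w, x in zip(self.weights, spliced)) + self.bias


def behavior_feature(rec: Sequence[float], history: Sequence[Sequence[float]],
                     factor: CorrelationFactor) -> List[float]:
    """Weight each history feature by its correlation factor and sum them."""
    weights = [factor(rec, h) for h in history]
    return [sum(wt * h[d] for wt, h in zip(weights, history)) for d in range(len(rec))]


K = 4
factor = CorrelationFactor(K)
rec = [0.2, -0.1, 0.4, 0.0]
history = [[0.1, 0.3, -0.2, 0.5], [0.0, 0.2, 0.1, -0.3], [0.4, 0.0, 0.0, 0.1]]
user_vec = behavior_feature(rec, history, factor)
print(len(user_vec))  # 4
```

With only one linear layer the difference term is linearly redundant with the other two parts; a practical attention unit would typically insert a hidden layer with a nonlinearity so that the difference feature adds expressive power.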

In the above solution, the behavior feature determining module 2553 is further configured to: determine the features of each history information in the history information sequence; and perform the following processing for each recommendation information in the recommendation information set: weight the features of the plurality of history information, using the correlation factor of each history information corresponding to the recommendation information as its weight, to obtain the behavior feature characterizing the user with respect to the recommendation information.

In the above solution, the click rate determination module 2554 is further configured to: determine the data features of the user, the recommendation environment features of the user and the features of each recommendation information in the recommendation information set; splice the behavior features, the data features, the recommendation environment features and the features of the recommendation information; and perform iterative feature extraction processing on the splicing processing result.

In the above solution, the click rate determination module 2554 is further configured to: perform, through the nth neural network model among N cascaded neural network models, feature extraction processing on the input of the nth neural network model, and pass the nth feature extraction result output by the nth neural network model to the (n+1)th neural network model to continue the feature extraction processing; wherein n is an integer that increases from 1 and satisfies 1 ≤ n ≤ N-1, and N is an integer greater than or equal to 2; when n is 1, the input of the nth neural network model is the splicing processing result, and when 2 ≤ n ≤ N-1, the input of the nth neural network model is the feature extraction result of the (n-1)th neural network model.

In the above solution, when 1 ≤ n ≤ N-1, the nth neural network model comprises a one-dimensional convolution layer and a maximum pooling layer, and the click rate determination module 2554 is further configured to: perform convolution processing on the input of the nth neural network model with the one-dimensional convolution layer parameters of the one-dimensional convolution layer of the nth neural network model, to obtain the nth convolution layer processing result corresponding to the splicing processing result; and perform maximum pooling processing on the nth convolution layer processing result through the maximum pooling layer of the nth neural network model, to obtain the nth feature extraction result output by the nth neural network model. When n is N-1, the (n+1)th neural network model comprises the one-dimensional convolution layer, a folding layer and the maximum pooling layer, and the click rate determination module 2554 is further configured to: perform convolution processing on the nth feature extraction result with the one-dimensional convolution layer parameters of the one-dimensional convolution layer of the (n+1)th neural network model, to obtain the (n+1)th convolution layer processing result corresponding to the nth feature extraction result; through the folding layer, add the convolution feature values of each pair of adjacent dimensions in the (n+1)th convolution layer processing result element by element, to obtain a folding processing result; and perform maximum pooling processing on the folding processing result through the maximum pooling layer of the (n+1)th neural network model, to obtain the (n+1)th feature extraction result output by the (n+1)th neural network model.
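The cascade described above can be sketched in Python with N = 2 stages: the first stage applies per-dimension wide convolution and pooling, and the last stage adds the folding layer in between. This sketch assumes one hypothetical kernel per embedding dimension, reuses the same kernels in both stages, and uses order-preserving k-max pooling for the maximum pooling layer; the sizes and values are illustrative only.

```python
from typing import List, Sequence

Columns = List[List[float]]  # one list per embedding dimension


def wide_conv1d(s: Sequence[float], w: Sequence[float]) -> List[float]:
    """Wide convolution (no kernel flip): length n input -> length n + w - 1 output."""
    pad = [0.0] * (len(w) - 1)
    p = pad + list(s) + pad
    return [sum(w[t] * p[j + t] for t in range(len(w))) for j in range(len(s) + len(w) - 1)]


def k_max(values: Sequence[float], k: int) -> List[float]:
    """k largest values in their original order."""
    top = sorted(range(len(values)), key=lambda i: (-values[i], i))[:k]
    return [values[i] for i in sorted(top)]


def fold(cols: Columns) -> Columns:
    """Folding layer: add adjacent dimensions (0+1, 2+3, ...) element by element."""
    if len(cols) % 2:
        raise ValueError("folding expects an even number of dimensions")
    return [[a + b for a, b in zip(cols[d], cols[d + 1])] for d in range(0, len(cols), 2)]


def stage(cols: Columns, kernels: Sequence[Sequence[float]], k: int, folding: bool) -> Columns:
    """One cascaded model: per-dimension convolution -> optional folding -> k-max pooling."""
    conv = [wide_conv1d(c, w) for c, w in zip(cols, kernels)]  # spliced per-dimension results
    if folding:
        conv = fold(conv)
    return [k_max(c, k) for c in conv]


# Hypothetical sizes: m = 5 features, K = 4 dimensions, window 3, N = 2 cascaded models.
x = [[float(i + d) for i in range(5)] for d in range(4)]  # toy spliced input, 4 columns
kernels = [[1.0, 0.0, -1.0], [0.5, 0.5, 0.5], [1.0, 1.0, 0.0], [0.0, 1.0, 0.0]]
out1 = stage(x, kernels, k=4, folding=False)    # n = 1: convolution + pooling
out2 = stage(out1, kernels, k=2, folding=True)  # n + 1 = N: convolution + folding + pooling
print(len(out1), len(out1[0]))  # 4 4
print(len(out2), len(out2[0]))  # 2 2
```

Folding halves the number of embedding dimensions without adding parameters, letting the final stage combine information from neighbouring dimensions before pooling.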

In the above solution, the click rate determination module 2554 is further configured to: perform convolution processing on the feature values of each dimension of the input of the nth neural network model with the one-dimensional convolution layer parameters, to obtain the convolution feature values of each dimension; and splice the convolution feature values of all dimensions to obtain the nth convolution layer processing result based on the one-dimensional convolution layer parameters.

In the above solution, the click rate determination module 2554 is further configured to: perform the following processing for the feature of each dimension in the nth convolution layer processing result: acquire the plurality of convolution calculated values of the dimension and sort them in descending order, and determine the top-ranked convolution calculated values in the descending sorting result as the maximum pooling result of the feature of that dimension; and splice the maximum pooling results of the features of all dimensions to obtain the nth feature extraction result output by the nth neural network model.

In the above solution, the click rate determination module 2554 is further configured to: perform full connection processing on the feature extraction result of each recommendation information, and apply softmax processing to the full connection processing result to obtain the click rate of each recommendation information. The recommendation module 2555 is further configured to: sort the recommendation information in the recommendation information set in descending order of click rate, and perform the recommendation operation based on the top-ranked recommendation information.
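A minimal Python sketch of this last step, assuming a fully connected layer that outputs two logits (not clicked, clicked) followed by a softmax, then a descending sort by click rate. The weights, candidate names and feature extraction results are hypothetical.

```python
import math
from typing import List, Sequence


def click_rate(feature: Sequence[float], W: Sequence[Sequence[float]],
               b: Sequence[float]) -> float:
    """Full connection to two logits (not clicked, clicked), then softmax; returns P(clicked)."""
    logits = [sum(f * W[r][c] for r, f in enumerate(feature)) + b[c] for c in range(2)]
    m = max(logits)  # subtract the maximum for numerical stability
    e = [math.exp(z - m) for z in logits]
    return e[1] / sum(e)


def recommend(items: Sequence[str], ctrs: Sequence[float], top_n: int) -> List[str]:
    """Sort candidates by click rate in descending order and keep the top_n."""
    order = sorted(range(len(items)), key=lambda i: -ctrs[i])
    return [items[i] for i in order[:top_n]]


# Hypothetical weights (d = 2 -> 2 classes) and feature extraction results of 3 candidates.
W = [[0.0, 1.0], [0.0, 2.0]]
b = [0.0, 0.0]
features = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}
ctrs = [click_rate(f, W, b) for f in features.values()]
print(recommend(list(features), ctrs, top_n=2))  # ['c', 'b']
```

With two classes, the softmax probability of "clicked" reduces to the sigmoid of the difference between the two logits.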

Embodiments of the present application provide a computer program product or computer program comprising computer instructions stored in a computer readable storage medium. The processor of the computer device reads the computer instructions from the computer readable storage medium, and the processor executes the computer instructions, so that the computer device executes the information recommendation method according to the embodiment of the present application.

Embodiments of the present application provide a computer readable storage medium having stored therein executable instructions that, when executed by a processor, cause the processor to perform a method provided by embodiments of the present application, for example, an information recommendation method as shown in fig. 4A-4D.

In some embodiments, the computer readable storage medium may be FRAM, ROM, PROM, EPROM, EEPROM, flash memory, magnetic surface memory, optical disk, or CD-ROM; but may be a variety of devices including one or any combination of the above memories.

In some embodiments, the executable instructions may be in the form of programs, software modules, scripts, or code, written in any form of programming language (including compiled or interpreted languages, or declarative or procedural languages), and they may be deployed in any form, including as stand-alone programs or as modules, components, subroutines, or other units suitable for use in a computing environment.

As an example, the executable instructions may, but need not, correspond to files in a file system, may be stored as part of a file that holds other programs or data, for example, in one or more scripts stored in a HyperText Markup Language (HTML) document, in a single file dedicated to the program in question, or in multiple coordinated files (e.g., files that store one or more modules, sub-programs, or portions of code).

As an example, executable instructions may be deployed to be executed on one computing device or on multiple computing devices located at one site or, alternatively, distributed across multiple sites and interconnected by a communication network.

In summary, according to the embodiment of the application, based on the correlation factors of the same historical information sequence corresponding to different recommendation information, the behavior features representing the interests of the user are purposefully depicted for different recommendation information, so that the bidirectional targeted feature depiction between the recommendation information and the user is realized, and the diversified interests of the user are effectively depicted. Therefore, the information recommendation precision of information recommendation based on the click rate predicted by the behavior characteristics is ensured, invalid recommendation is effectively avoided, and computing resources related to recommendation logic in a server are further saved.

The foregoing is merely exemplary embodiments of the present application and is not intended to limit the scope of the present application. Any modification, equivalent replacement, improvement, etc. made within the spirit and scope of the present application are included in the protection scope of the present application.

Claims

1. An information recommendation method, comprising:

acquiring characteristics of the recommended information;

performing full connection processing on the spliced characteristics of the history information to obtain a correlation factor of the history information corresponding to the recommendation information;

weighting the characteristics of a plurality of historical information by taking the correlation factor of the historical information corresponding to the recommendation information as a weight to obtain behavior characteristics of the user aiming at the recommendation information;

2. The method according to claim 1, wherein the performing feature extraction processing on the behavior feature of each recommended information for a plurality of iterations includes:

3. The method according to claim 2, wherein iteratively performing feature extraction processing on the splice processing results comprises:

4. The method of claim 3, wherein

when 1 ≤ n ≤ N-1, the nth neural network model comprises a one-dimensional convolution layer and a maximum pooling layer, and performing, through the nth neural network model among the N cascaded neural network models, feature extraction processing on the input of the nth neural network model comprises:

when n is N-1, the (n+1)th neural network model comprises the one-dimensional convolution layer, the folding layer and the maximum pooling layer, and passing the nth feature extraction result output by the nth neural network model to the (n+1)th neural network model to continue the feature extraction processing comprises:

5. The method of claim 4, wherein

performing convolution processing on the input of the nth neural network model with the one-dimensional convolution layer parameters of the one-dimensional convolution layer of the nth neural network model, to obtain the nth convolution layer processing result corresponding to the splicing processing result, comprises:

6. The method of claim 4, wherein performing, through the maximum pooling layer of the nth neural network model, maximum pooling processing on the nth convolution layer processing result to obtain the nth feature extraction result output by the nth neural network model comprises:

7. The method of claim 1, wherein determining the click rate of each of the recommended information based on the feature extraction result of each of the recommended information comprises:

the executing the recommendation operation based on the click rate of each recommendation information comprises the following steps:

and performing descending order sorting processing based on click rate on the recommendation information in the recommendation information set, and executing recommendation operation based on a plurality of recommendation information with top sorting.

8. An information recommendation device, characterized by comprising:

the correlation factor determining module is used for executing the following processing on any one of the recommended information in the recommended information set and any one of the history information in the history information sequence:

acquiring characteristics of the recommended information;

a behavior feature determining module, configured to perform the following processing for each recommendation information in the recommendation information set:

9. An information recommendation device, characterized by comprising:

a memory for storing executable instructions;

a processor for implementing the information recommendation method according to any one of claims 1 to 7 when executing executable instructions stored in said memory.

10. A computer readable storage medium storing executable instructions for causing a processor to implement the information recommendation method of any one of claims 1 to 7 when executed.