CN117009912A - Information recommendation method and training method for neural network model for information recommendation - Google Patents

Information recommendation method and training method for neural network model for information recommendation

Info

Publication number
CN117009912A
CN117009912A
Authority
CN
China
Prior art keywords
display
information
neural network
network model
display strategy
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211222984.9A
Other languages
Chinese (zh)
Inventor
喻先哲
李敏丽
杨建博
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Shanghai Co Ltd
Original Assignee
Tencent Technology Shanghai Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology Shanghai Co Ltd filed Critical Tencent Technology Shanghai Co Ltd
Priority to CN202211222984.9A priority Critical patent/CN117009912A/en
Publication of CN117009912A publication Critical patent/CN117009912A/en
Pending legal-status Critical Current

Classifications

    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06N — COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 — Computing arrangements based on biological models
    • G06N3/02 — Neural networks
    • G06N3/08 — Learning methods
    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06Q — INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 — Commerce
    • G06Q30/02 — Marketing; Price estimation or determination; Fundraising
    • G06Q30/0241 — Advertisements
    • G06Q30/0251 — Targeted advertisements

Abstract

The application provides an information recommendation method and a training method for a neural network model for information recommendation. The information recommendation method comprises the following steps: acquiring at least one piece of multi-mode information to be sent to an object, wherein each piece of multi-mode information comprises a plurality of display strategies; acquiring display strategy feature information corresponding to each display strategy and object feature information of the object; based on the feature information of each display strategy and the object feature information, invoking a neural network model to perform prediction processing to obtain a ranking parameter corresponding to each display strategy, wherein the neural network model is trained based on ranking results of a plurality of display strategy samples that have been applied; determining a target display strategy corresponding to each piece of multi-mode information; and performing a recommendation operation for the object based on the at least one piece of multi-mode information to which the corresponding target display strategy is respectively applied. The method and the device can optimize the recommendation effect and improve the utilization rate of recommendation resources.

Description

Information recommendation method and training method for neural network model for information recommendation
Technical Field
The application relates to the technical field of artificial intelligence and big data, in particular to an information recommendation method and a training method for a neural network model for information recommendation.
Background
Artificial Intelligence (AI) is a theory, method, technique, and application system that uses a digital computer, or a machine controlled by a digital computer, to simulate and extend human intelligence, perceive the environment, acquire knowledge, and use that knowledge to obtain optimal results. In other words, artificial intelligence is a comprehensive branch of computer science that attempts to understand the essence of intelligence and to produce a new kind of intelligent machine that can react in a way similar to human intelligence.
Information recommendation is an important branch of artificial intelligence that also involves the big data processing of cloud technology; it mainly studies how to recommend appropriate information to a specific object. For multi-mode information that includes multiple display strategies, such as multi-creative advertisements that include multiple creatives, the schemes provided by the related art typically apply a random policy, i.e., all creatives included in one multi-creative advertisement are recommended with equal probability. However, this approach tends to produce a poor recommendation effect and thus a low utilization rate of recommended resources.
Disclosure of Invention
The embodiment of the application provides an information recommendation method, a training method and device for a neural network model for information recommendation, electronic equipment, a computer readable storage medium and a computer program product, which can optimize the recommendation effect and improve the utilization rate of recommendation resources.
The technical scheme of the embodiment of the application is realized as follows:
the embodiment of the application provides an information recommendation method, which comprises the following steps:
acquiring at least one piece of multi-mode information to be sent to an object, wherein each piece of multi-mode information comprises a plurality of display strategies;
acquiring display strategy characteristic information and object characteristic information of the object, wherein the display strategy characteristic information and the object characteristic information correspond to the display strategies respectively;
based on the feature information of each display strategy and the object feature information, invoking a neural network model to perform prediction processing to obtain a ranking parameter corresponding to each display strategy, wherein the neural network model is trained based on ranking results of a plurality of display strategy samples that have been applied;
determining a target display strategy corresponding to each piece of multi-mode information, wherein the target display strategy is the display strategy with the largest ranking parameter in the multi-mode information;
And performing a recommendation operation for the object based on the at least one piece of multi-mode information to which the corresponding target presentation policy is respectively applied.
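The selection flow in the steps above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the scoring function is a hypothetical stand-in for the trained neural network model, and the feature names ("tags", "interests") are invented for the example.

```python
# Minimal sketch of target-display-strategy selection: score every strategy
# for the object, then apply the one with the largest ranking parameter.
# The scoring function here is a hypothetical stand-in for the neural network.

def select_target_strategy(strategies, object_features, score_fn):
    """Return the display strategy with the largest ranking parameter."""
    return max(strategies, key=lambda s: score_fn(s, object_features))

# Toy score: overlap between a strategy's tags and the object's interest tags.
def toy_score(strategy, obj):
    return len(strategy["tags"] & obj["interests"])

obj = {"interests": {"sports", "travel"}}
strategies = [
    {"id": "creative_a", "tags": {"food"}},
    {"id": "creative_b", "tags": {"sports", "travel"}},
]
target = select_target_strategy(strategies, obj, toy_score)
```

In the real system the score would come from the model's prediction processing over the display strategy feature information and object feature information.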
The embodiment of the application provides an information recommendation device, which comprises:
the system comprises an acquisition module, a display module and a display module, wherein the acquisition module is used for acquiring at least one piece of multi-mode information to be sent to an object, and each piece of multi-mode information comprises a plurality of display strategies;
the acquisition module is further used for acquiring display strategy characteristic information and object characteristic information of the object, wherein the display strategy characteristic information and the object characteristic information correspond to the display strategies respectively;
the prediction module is used for invoking a neural network model to perform prediction processing based on the feature information of each display strategy and the object feature information, to obtain a ranking parameter corresponding to each display strategy, wherein the neural network model is trained based on ranking results of a plurality of display strategy samples that have been applied;
the determining module is used for determining a target display strategy corresponding to each piece of multi-mode information, wherein the target display strategy is the display strategy with the largest ranking parameter in the multi-mode information;
and the recommending module is used for executing recommending operation for the object based on the at least one piece of multi-mode information to which the corresponding target showing strategy is respectively applied.
The embodiment of the application provides a training method for a neural network model for information recommendation, which comprises the following steps:
acquiring display strategy sequencing results respectively corresponding to a plurality of multi-mode information samples, wherein the display strategy sequencing results are obtained by sequencing based on recommended parameters of the display strategy samples in the multi-mode information samples;
sampling based on the display strategy sequencing result to obtain a plurality of display strategy samples;
invoking the initialized neural network model to execute a training task based on the plurality of presentation policy samples to update parameters of the neural network model;
and generating the trained neural network model based on the updated parameters.
The embodiment of the application provides a training device for a neural network model for information recommendation, which comprises the following components:
the system comprises an acquisition module, a display strategy sorting module and a display strategy sorting module, wherein the acquisition module is used for acquiring display strategy sorting results respectively corresponding to a plurality of multimode information samples, and the display strategy sorting results are obtained by sorting based on recommended parameters of the display strategy samples in the multimode information samples;
the sampling module is used for sampling to obtain a plurality of presentation strategy samples based on the presentation strategy sequencing result;
The execution module is used for calling the initialized neural network model to execute a training task based on the plurality of presentation strategy samples so as to update parameters of the neural network model;
and the generating module is used for generating the trained neural network model based on the updated parameters.
An embodiment of the present application provides an electronic device, including:
a memory for storing executable instructions;
and the processor is used for realizing the information recommendation method provided by the embodiment of the application or the training method of the neural network model for information recommendation when executing the executable instructions stored in the memory.
The embodiment of the application provides a computer readable storage medium which stores computer executable instructions for realizing the information recommendation method provided by the embodiment of the application or the training method of a neural network model for information recommendation when being executed by a processor.
The embodiment of the application provides a computer program product, which comprises a computer program or computer executable instructions and is used for realizing the information recommendation method provided by the embodiment of the application or the training method of a neural network model for information recommendation when being executed by a processor.
The embodiment of the application has the following beneficial effects:
the neural network model is trained based on the ranking results of a plurality of display strategy samples that have been applied, so the neural network model can learn, from those ranking results, the ability to accurately predict the recommendation parameters of multiple display strategies. Therefore, for at least one piece of multi-mode information to be sent, the optimal display strategy can be screened out by the neural network model and applied to each piece of multi-mode information, which enhances the pertinence and accuracy of the recommendation, enables the finally screened display strategy to better meet the needs of objects, and improves the utilization rate of recommended resources.
Drawings
FIG. 1 is a schematic diagram of an advertisement management system according to an embodiment of the present application;
FIG. 2 is a schematic diagram of an advertisement creative preference strategy based on MAB according to an embodiment of the present application;
FIG. 3 is a schematic diagram of an architecture of an information recommendation system 100 according to an embodiment of the present application;
fig. 4A is a schematic structural diagram of a server 200 according to an embodiment of the present application;
fig. 4B is a schematic structural diagram of a server 200 according to an embodiment of the present application;
FIG. 5 is a flowchart of a training method for a neural network model for information recommendation according to an embodiment of the present application;
Fig. 6 is a flowchart of an information recommendation method according to an embodiment of the present application;
fig. 7 is a flowchart of an information recommendation method according to an embodiment of the present application;
FIG. 8A is a schematic diagram of a training phase and a prediction phase of a neural network model according to an embodiment of the present application;
FIG. 8B is a schematic diagram of a training phase and a prediction phase of a neural network model according to an embodiment of the present application;
fig. 9 is a flowchart of an information recommendation method according to an embodiment of the present application;
FIG. 10 is a schematic diagram of an advertisement management system according to an embodiment of the present application;
FIG. 11A is a schematic diagram of a tLTR model according to an embodiment of the present application;
FIG. 11B is a schematic diagram of a tLTR model according to an embodiment of the present application;
fig. 12 is a schematic diagram of a word embedding process according to an embodiment of the present application.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the present application more apparent, the present application will be described in further detail with reference to the accompanying drawings. The described embodiments should not be construed as limiting the present application, and all other embodiments obtained by those skilled in the art without making any inventive effort are within the scope of the present application.
In the following description, reference is made to "some embodiments" which describe a subset of all possible embodiments, but it is to be understood that "some embodiments" can be the same subset or different subsets of all possible embodiments and can be combined with one another without conflict.
It will be appreciated that in the embodiments of the present application, related data such as user information is involved, and when the embodiments of the present application are applied to specific products or technologies, user permissions or agreements need to be obtained, and the collection, use and processing of related data need to comply with relevant laws and regulations and standards of relevant countries and regions.
In the following description, the terms "first" and "second" are merely used to distinguish similar objects and do not represent a specific ordering of objects. It should be understood that "first" and "second" may, where permitted, be interchanged in a specific order or sequence, so that the embodiments of the application described herein can be practiced in an order other than that illustrated or described herein.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs. The terminology used herein is for the purpose of describing embodiments of the application only and is not intended to be limiting of the application.
Before describing embodiments of the present application in further detail, the terms and terminology involved in the embodiments of the present application will be described, and the terms and terminology involved in the embodiments of the present application will be used in the following explanation.
1) Multi-mode information: information that includes multiple display strategies, such as an advertisement that includes multiple creatives, where a creative refers to a display strategy for advertising elements, including the elements employed (e.g., title, description, picture, video); it may also include display parameters such as the display position and size of each element and the number of playback loops of a dynamic element (e.g., a GIF); and it may further include a human-computer interaction mode, such as jumping to a landing-page link when a text element is clicked, or displaying a magnified image when a picture element is clicked.
In addition, for advertisements comprising multiple creatives, a distinction is drawn between Multi-Creative (MC) advertisements, whose creatives are all manually uploaded by users (e.g., advertisers), and Dynamic Creative (DC) advertisements, whose creatives are obtained by the advertisement management system (AMS, Advertising Management System) combining multiple advertisement elements according to the display requirements; in each advertisement display, the best creative is preferred from among the combined creatives. For example, if the user uploads elements in four dimensions, such as title, description, picture and video, the AMS can automatically cross-multiply them and store the resulting creatives. If, say, the user uploads 3 pictures and 3 titles, i.e., 6 elements in total, the AMS can cross-multiply them into 3×3=9 combinations, i.e., 9 creatives will be stored. It can be seen that MC advertisements differ from DC advertisements in that for MC advertisements the creatives are uploaded after the user has custom-combined the elements, whereas for DC advertisements the creatives are automatically cross-multiplied by the AMS after the user uploads elements in multiple dimensions.
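The cross-multiplication of uploaded elements into candidate creatives can be sketched as a Cartesian product. The element names below are illustrative placeholders, not values from this document:

```python
from itertools import product

# Sketch of the AMS cross-multiplication for DC advertisements: element lists
# in several dimensions are combined into candidate creatives to be stored.
pictures = ["pic_1", "pic_2", "pic_3"]
titles = ["title_1", "title_2", "title_3"]
creatives = [{"picture": p, "title": t} for p, t in product(pictures, titles)]
# 3 pictures x 3 titles yield 3*3 = 9 creatives.
```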
2) Neural Network (NN) model: a complex network system formed by a large number of simple processing units (also called neurons) that are widely interconnected; it reflects many basic features of human brain function and is a highly complex nonlinear dynamic learning system. The neural network model has massively parallel, distributed storage and processing, self-organizing, self-adaptive and self-learning capabilities, and is particularly suitable for handling imprecise and fuzzy information-processing problems that need to consider many factors and conditions simultaneously.
3) Learning To Rank (LTR): a method of training a model by machine learning when handling ranking problems; it can be applied to information retrieval, natural language processing, data mining, and the like. For each given query-candidate result, features are extracted, and ground-truth annotations are obtained by log mining or manual annotation; a ranking model is then used to obtain the most relevant results. For example, in a search engine, the model learns the order of web pages in the search results; in e-commerce recommendation, the model learns the order of the merchandise recommendation list; in the context of advertisement coarse ranking, the model may learn the order of advertisements in the fine ranking. Common learning-to-rank methods can be divided into three types: the single-document method (PointWise), the document-pair method (PairWise), and the document-list method (ListWise).
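As a concrete illustration of the PairWise idea, the following is a generic RankNet-style pairwise loss. This is a standard LTR formulation used here for explanation, not code or a formula taken from this document:

```python
import math

# PairWise loss sketch: given model scores s_i, s_j for a pair where item i
# should rank above item j, the loss is -log sigmoid(s_i - s_j). Training
# pushes s_i above s_j, so the model learns the correct ordering.

def pairwise_logistic_loss(s_i, s_j):
    return math.log(1.0 + math.exp(-(s_i - s_j)))

# The loss shrinks as the correctly ordered margin s_i - s_j grows.
```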
4) In response to: used to indicate the condition or state on which a performed operation depends; when the condition or state on which the operation depends is satisfied, the one or more operations performed may be executed in real time or with a set delay. Unless otherwise specified, no limitation is placed on the execution order of the multiple operations performed.
5) Object: the recipient of the multi-mode information; it may be an actual user or a virtual account, such as a user account registered on an e-commerce platform.
6) One-Hot encoding (One-Hot): also known as one-bit effective encoding, this method encodes N states using an N-bit state register; each state has its own register bit, and only one bit is active at any time. For example, for six states, the corresponding one-hot encoded vectors are: 000001, 000010, 000100, 001000, 010000 and 100000.
7) Word Embedding (Word Embedding): a general term for language models and representation-learning techniques in natural language processing. Conceptually, it refers to embedding a high-dimensional space, whose dimension is the number of all words, into a continuous vector space of much lower dimension, with each word or phrase mapped to a vector over the real numbers. Word embedding methods include artificial neural networks (such as Word2Vec), dimensionality reduction of the word co-occurrence matrix, probability models, explicit representations of a word's context, and the like. For example, a text contains words such as "cat" and "dog"; these words are mapped into a vector space, where the vector corresponding to "cat" is (0.1, 0.2, 0.3) and the vector corresponding to "dog" is (0.2, 0.2, 0.4). This process of mapping text into a multidimensional vector space is called word embedding.
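A minimal embedding lookup illustrating the mapping just described; the 3-dimensional vectors are illustrative values (real embeddings such as Word2Vec are learned, not hand-written):

```python
# Toy word-embedding lookup: each token maps to a low-dimensional real vector;
# tokens not in the table fall back to a zero vector. Values are illustrative.

embedding_table = {
    "cat": (0.1, 0.2, 0.3),
    "dog": (0.2, 0.2, 0.4),
}

def embed(tokens, table, dim=3):
    return [table.get(token, (0.0,) * dim) for token in tokens]

vectors = embed(["cat", "dog", "fish"], embedding_table)
```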
8) Recall (Match): triggering as many correct results as possible from the full-scale information set, and returning the results to the user, wherein various recall modes are available, including collaborative filtering, topic model, content recall, hot spot recall and the like, for example, in a search system, information matched with keywords can be recalled from the full-scale information set according to keywords input by the user; in the recommendation system, since the user does not have an explicit search term input, content which the user may be interested in can be recommended to the user according to the interest tags, browsing records and the like of the user.
9) Coarse ranking: a stage after recall in the advertisement management system. Its main aim is to complete a rough selection of information, from the tens-of-thousands order of magnitude down to the hundreds, through a combination of strategies and models; the strategies may include a large number of diversity, relevance, cold-start and exploration strategies, and the models may include the LTR and LiteCXR models, where the LiteCXR model is one of the most important models in the coarse-ranking stage and screens information via eCPM.
10) Fine ranking: the stage following coarse ranking in the advertisement management system. Its main objective is to perform accurate personalized ranking of information, refining from the hundreds order of magnitude down to the units, ranking the pieces of information obtained in the coarse-ranking stage through a Click-Through Rate (CTR) prediction model.
With the evolution of internet technology, the alternation of product forms and the actual demands of advertisers, advertisements comprising multiple creatives, such as dynamic creative advertisements and multi-creative advertisements, have gradually evolved from the original single-creative advertisement form. For dynamic creative advertisements, advertisers only need to upload the original materials, and the advertisement management system automatically combines creatives of different forms, greatly lowering the threshold for advertisers to produce and deliver advertisements and giving advertisers more opportunities for trial and error. Multi-creative advertisements help advertisers out of the predicament that new advertisements cannot gain traction, improving the exposure opportunities of new creatives by reusing the data of already-stabilized advertisements. At the same time, these two new advertisement forms also present new technical challenges to the advertisement management system.
To balance computing performance and advertising effectiveness, and subject to the constraints of computing resources and response delays (e.g., on the order of a hundred milliseconds), advertisement management systems typically employ a staged, funnel-like system architecture. As shown in FIG. 1, when a user request arrives at the advertisement management system, the system first filters tens of thousands of advertisements out of a million-scale advertisement library according to rules, including, for example, crowd targeting conditions selected by advertisers, behavioral interest targeting, scene targeting, and application (APP) installation targeting; this stage is called the recall stage. The advertisements passing the recall stage enter the coarse-ranking stage, whose purpose is to perform efficient, lightweight sorting and truncation of the recall queue while mining high-quality advertisements with exploratory potential, so as to rapidly pass a small number of relatively high-quality advertisements to the later stage, i.e., the fine-ranking stage. The fine-ranking stage determines the advertisements that are finally exposed to the user; its main task is to rank the value of the small number of high-quality candidate advertisements by effective cost per mille (eCPM, Effective Cost Per Mille), and select the advertisement with the highest eCPM to display to the user.
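The eCPM ranking at the end of the funnel can be sketched as follows. A common eCPM formula for click-billed advertisements is bid × predicted CTR × 1000; the exact formula used by the system in this document is not specified, so this formula and the candidate numbers are assumptions for illustration only:

```python
# Sketch of fine-ranking by eCPM (assumed formula: bid * predicted CTR * 1000).

def ecpm(bid, pctr):
    return bid * pctr * 1000.0

# (ad_id, bid, predicted CTR) -- hypothetical candidates from the coarse stage.
candidates = [("ad_1", 2.0, 0.01), ("ad_2", 1.0, 0.03)]
winner = max(candidates, key=lambda ad: ecpm(ad[1], ad[2]))
```

Note that a higher bid does not guarantee the win: ad_2 bids less but its higher predicted CTR gives it the larger eCPM.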
When the advertisements in an advertisement management system switch from a purely single-creative form to single-creative advertisements (i.e., advertisements comprising only one creative) coexisting with advertisements comprising multiple creatives, e.g., DC advertisements and MC advertisements, the advertisement management system must, also taking into account the requirements on computing resources and response delays, choose between single-creative advertisements and advertisements comprising multiple creatives in the coarse-ranking stage; i.e., for an advertisement comprising multiple creatives, it must decide in the coarse-ranking stage which creative to expose. The related art mainly includes two schemes, namely the creative carousel mode and the Multi-Armed Bandit (MAB) based advertisement creative preference policy, which are described below.
First, the creative carousel mode is described. In carousel mode, for an advertisement that includes multiple creatives, such as an MC advertisement, the advertisement management system does not consider that the different creatives within the same advertisement may differ significantly in Click-Through Rate (CTR), Conversion Rate (CVR), and so on. When a user request reaches the advertisement management system, for a recalled DC advertisement or MC advertisement, the system randomly selects one creative from the multiple creatives belonging to that advertisement, or plays the different creatives according to some other rule, such as different time periods, to select one creative to expose to the later fine-ranking stage. The rules of carousel mode are simple, the modification to the existing advertisement management system is minimal, and almost no delay is added, but its effect is obviously the least guaranteed.
The following continues with a description of the MAB-based advertisement creative preference policy. As shown in FIG. 2, this method balances exploration and exploitation based on the multi-armed bandit strategy; the information used internally includes exposures, clicks, conversions, cost attainment, random exploration factors, and the like. For each online user request, the advertisement management system samples, for each advertisement containing multiple creatives, a competing creative (tid*) and a play creative (tid') based on the probability distribution D of the MAB policy. In the fine-ranking stage, the competing creative represents the advertisement in ranking competition, and when the advertisement wins, the play creative is played. The specific flow is as follows: 1. extract creative effect data; 2. compute creative probabilities from the historical data of each creative using the Upper Confidence Bound (UCB) algorithm; 3. sample a creative using the play probability. However, because different creatives face different competitive environments, the actual number of plays will not match the play probability; thus the play creative and the competing creative need to be sampled independently, competing with the competing creative but exposing the play creative.
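The UCB step in that flow can be sketched as follows: each creative is scored by its empirical CTR plus a confidence bonus that favors under-explored creatives, and the highest score is preferred. The counts and the exploration constant below are hypothetical, and this is a generic UCB sketch rather than the exact algorithm of the described system:

```python
import math

# UCB sketch for creative preference: empirical CTR + confidence bonus.
# Creatives with few impressions get a larger bonus, encouraging exploration.

def ucb_score(clicks, impressions, total_impressions, c=2.0):
    mean_ctr = clicks / impressions
    bonus = math.sqrt(c * math.log(total_impressions) / impressions)
    return mean_ctr + bonus

# creative id -> (clicks, impressions); hypothetical historical data.
creatives = {"tid_1": (5, 100), "tid_2": (1, 10)}
total = sum(imp for _, imp in creatives.values())
chosen = max(creatives, key=lambda t: ucb_score(*creatives[t], total))
```

Here tid_2 is chosen despite similar empirical CTRs because its ten impressions leave much more uncertainty, which is exactly the exploration behavior the policy relies on.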
It can be seen that the creative carousel mode cannot well distinguish a creative with a good effect from one with a poor effect, bringing a certain amount of budget waste to the advertiser. The MAB-based advertisement creative preference policy has a complex flow because the competing creative and the play creative are inconsistent; it has a poor creative-preference effect because user information is not considered; and it is difficult to promote and improve algorithmically, giving it poor extensibility.
In view of this, the embodiments of the present application provide an information recommendation method, a training method for a neural network model for information recommendation, an apparatus, an electronic device, a computer readable storage medium, and a computer program product, which can optimize a recommendation effect and improve a utilization rate of recommended resources. An exemplary application of the electronic device provided by the embodiment of the present application is described below, where the electronic device provided by the embodiment of the present application may be implemented as a terminal device, may be implemented as a server, or may be implemented cooperatively by the terminal device and the server.
The information recommendation method provided by the embodiment of the application is singly implemented by the server for example.
Referring to fig. 3, fig. 3 is a schematic architecture diagram of an information recommendation system 100 according to an embodiment of the present application, in order to support an application capable of optimizing a recommendation effect and improving a utilization rate of a recommendation resource, as shown in fig. 3, the information recommendation system 100 includes: the server 200, the network 300 and the terminal device 400, wherein the terminal device 400 is provided with a client 410, the client 410 can be various types of clients, such as an instant messaging client, a news information class reading client, an e-commerce shopping client, a browser and the like, and the network 300 can be a wide area network or a local area network or a combination of the two.
In some embodiments, the server 200 may first obtain at least one piece of multi-mode information sent to the object (for example, user 1) from a database (not shown in fig. 3), and then, for each piece of multi-mode information obtained, the server 200 may obtain presentation policy feature information (for example, an element type corresponding to the presentation policy, a tag of the presentation policy, etc.) respectively corresponding to a plurality of presentation policies included in the multi-mode information, and object feature information (for example, an interest tag of user 1, a behavior sequence, etc.) of the object; then, the server 200 may invoke the neural network model to perform prediction processing based on the feature information and the object feature information of each display policy, so as to obtain a sorting parameter corresponding to each display policy, and determine a target display policy corresponding to each piece of multi-mode information based on the sorting parameter; finally, the server 200 may transmit at least one piece of multi-mode information to which the corresponding target presentation policy is respectively applied to the terminal device 400 through the network 300.
In other embodiments, the embodiments of the present application may also be implemented by means of Cloud Technology, which refers to a hosting technology that unifies a series of resources such as hardware, software, and networks in a wide area network or a local area network, so as to implement the computation, storage, processing, and sharing of data.
Cloud technology is a general term for network technology, information technology, integration technology, management platform technology, application technology, and the like that are based on the cloud computing business model; these technologies can form a resource pool that is used on demand in a flexible and convenient manner. Cloud computing technology will become an important support, since the background services of technical network systems require a large amount of computing and storage resources.
By way of example, the server 200 in fig. 3 may be a stand-alone physical server, a server cluster or a distributed system formed by a plurality of physical servers, or a cloud server providing basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, content delivery networks (CDN, Content Delivery Network), and big data and artificial intelligence platforms. The terminal device 400 may be, but is not limited to, a smart phone, a tablet computer, a notebook computer, a desktop computer, a smart speaker, a smart watch, a vehicle-mounted terminal, etc. The terminal device 400 and the server 200 may be directly or indirectly connected through wired or wireless communication, which is not limited in the embodiment of the present application.
Continuing to describe the structure of the server 200 shown in fig. 3, referring to fig. 4A, fig. 4A is a schematic structural diagram of the server 200 provided in the embodiment of the present application, and the server 200 shown in fig. 4A includes: at least one processor 210, a memory 240, at least one network interface 220. The various components in server 200 are coupled together by bus system 230. It is understood that the bus system 230 is used to enable connected communications between these components. The bus system 230 includes a power bus, a control bus, and a status signal bus in addition to a data bus. But for clarity of illustration the various buses are labeled as bus system 230 in fig. 4A.
The processor 210 may be an integrated circuit chip with signal processing capabilities, such as a general purpose processor (for example, a microprocessor or any conventional processor), a digital signal processor (DSP, Digital Signal Processor), another programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like.
The memory 240 may be removable, non-removable, or a combination thereof. Exemplary hardware devices include solid state memory, hard drives, optical drives, and the like. Memory 240 optionally includes one or more storage devices that are physically located remote from processor 210.
Memory 240 includes volatile memory or nonvolatile memory, and may include both volatile and nonvolatile memory. The nonvolatile Memory may be a Read Only Memory (ROM), and the volatile Memory may be a random access Memory (RAM, random Access Memory). The memory 240 described in embodiments of the present application is intended to comprise any suitable type of memory.
In some embodiments, memory 240 is capable of storing data to support various operations, examples of which include programs, modules and data structures, or subsets or supersets thereof, as exemplified below.
An operating system 241 including system programs for handling various basic system services and performing hardware-related tasks, such as a framework layer, a core library layer, a driver layer, etc., for implementing various basic services and handling hardware-based tasks;
a network communication module 242 for reaching other computing devices via one or more (wired or wireless) network interfaces 220; exemplary network interfaces 220 include Bluetooth, Wireless Fidelity (Wi-Fi), Universal Serial Bus (USB), and the like;
in some embodiments, the apparatus provided in the embodiments of the present application may be implemented in software. Fig. 4A shows an information recommending apparatus 243 stored in the memory 240, which may be software in the form of a program, a plug-in, or the like, and includes the following software modules: an acquisition module 2431, a prediction module 2432, a determination module 2433, a recommendation module 2434, a transfer module 2435, a training module 2436, a generation module 2437, a setting module 2438, and an ordering module 2439. These modules are logical, and can therefore be arbitrarily combined or further split according to the functions to be implemented. It should be noted that all of the above modules are shown at once in fig. 4A for convenience of presentation; this should not be taken to exclude an implementation of the information recommendation device 243 that includes only the acquisition module 2431, the prediction module 2432, the determination module 2433, and the recommendation module 2434. The function of each module will be described below.
In other embodiments, as shown in fig. 4B, the memory 240 may further store a training device 244 for a neural network model for information recommendation, which may be software in the form of a program, a plug-in, or the like, and includes the following software modules: an acquisition module 2441, a sampling module 2442, an execution module 2443, and a generation module 2444. The function of each module will be described below.
The information recommendation method and the training method for the neural network model for information recommendation provided by the embodiment of the application are specifically described below in connection with exemplary applications and implementations of the server provided by the embodiment of the application.
Before explaining the information recommendation method provided by the embodiment of the present application, the structure of the neural network model provided by the embodiment of the present application is first explained. The structure may be the double-tower structure shown in fig. 11A or the single-tower structure shown in fig. 11B, and the server may select between the two according to the data size of the at least one piece of multi-mode information to be sent. For example, when the data size is greater than a data size threshold, the server may call the double-tower structure shown in fig. 11A to perform the prediction processing so as to increase the calculation speed; when the data size is smaller than the data size threshold, the server may call the single-tower structure shown in fig. 11B to perform the prediction processing.
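This structure-selection step can be sketched as follows; the threshold value and the model placeholders are illustrative assumptions rather than values from the embodiment.

```python
DATA_SIZE_THRESHOLD = 10_000  # assumed threshold; the embodiment does not fix a value

def select_model_structure(num_items: int, double_tower, single_tower):
    """Pick the double-tower structure for large batches of multi-mode
    information (to increase calculation speed), otherwise the single tower."""
    if num_items > DATA_SIZE_THRESHOLD:
        return double_tower
    return single_tower
```

In practice, `double_tower` and `single_tower` would be the loaded models corresponding to figs. 11A and 11B.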
In order to facilitate understanding of the information recommendation method provided by the embodiment of the present application, before explaining the information recommendation method provided by the embodiment of the present application, an explanation is first given to the training method of the neural network model for information recommendation provided by the embodiment of the present application.
Referring to fig. 5, fig. 5 is a flowchart of a training method for a neural network model for information recommendation according to an embodiment of the present application, and will be described with reference to the steps shown in fig. 5.
In step 101, a display policy ranking result corresponding to each of the plurality of multimode information samples is obtained.
Here, the presentation policy ranking result may be ranked based on recommended parameters of the presentation policy samples (i.e., parameters for quantifying recommended effects of the presentation policy samples) in the multimodal information samples.
In some embodiments, the server may obtain a history recommendation record for the sample object and obtain a plurality of multi-mode information samples from the history recommendation record (for example, a plurality of pieces of multi-mode information recommended to the sample object in the past month). For each multi-mode information sample, the server may then obtain the corresponding display policy ranking result. The display policy ranking result may be the ranking result of the plurality of display policy samples in the fine ranking stage, that is, the ranking result of the plurality of display policy samples that have already been applied; the fine ranking stage is the stage that most decisively determines whether a display policy sample can be exposed, and can therefore accurately reflect the recommended parameters of the display policy samples.
In step 102, a plurality of presentation strategy samples are sampled based on the presentation strategy ranking results.
In some embodiments, taking a multi-mode information sample A as an example, assume that it includes 10 presentation policy samples, namely presentation policy sample 1, presentation policy sample 2, presentation policy samples 3, ..., and presentation policy sample 10, and that a presentation policy ranking result of these 10 presentation policy samples is available. The server may sample the ranking result; for example, the server may read, from the ranking result, the identifiers of the presentation policy samples conforming to the sampling proportion and use the corresponding presentation policy samples as the sampling result. Assuming that the server reads the identifiers of presentation policy sample 2, presentation policy sample 4, and presentation policy sample 5 from the ranking result of the 10 presentation policy samples, the server may take presentation policy sample 2, presentation policy sample 4, and presentation policy sample 5 as the sampling result, that is, the server may perform subsequent processing on these three sampled presentation policy samples.
In step 103, the initialized neural network model is invoked to perform a training task based on the plurality of presentation policy samples to update parameters of the neural network model.
In some embodiments, the server may invoke the initialized neural network model to perform the training task based on the plurality of presentation policy samples, so as to update the parameters of the neural network model, in the following manner: combine the plurality of presentation policy samples in pairs to obtain a plurality of presentation policy sample pairs; compare the recommended parameters of the two presentation policy samples included in each presentation policy sample pair, and generate the tag data of each presentation policy sample pair according to the comparison result; and invoke the initialized neural network model to execute a first training task based on the plurality of presentation policy sample pairs and the corresponding tag data. The first training task includes: invoking the initialized neural network model to perform prediction processing based on the presentation policy sample pairs, and updating the parameters of the neural network model based on the difference between the predicted comparison result and the tag data.
It should be noted that, in addition to training the neural network model with presentation policy sample pairs, a single presentation policy sample may also be used. For example, after a plurality of presentation policy samples are obtained by sampling, the tag data corresponding to each presentation policy sample may be determined, and the initialized neural network model may then be invoked to perform a second training task based on the plurality of presentation policy samples and the corresponding tag data. The second training task includes: invoking the initialized neural network model to perform prediction processing based on each individual presentation policy sample, and updating the parameters of the neural network model based on the difference between the predicted result and the tag data.
In step 104, a trained neural network model is generated based on the updated parameters.
In some embodiments, after the server invokes the initialized neural network model to perform the training task based on the plurality of presentation policy samples and updates the parameters of the neural network model, the trained neural network model may be generated based on the updated parameters. The server may then invoke the trained neural network model to score the plurality of display policies included in at least one piece of multi-mode information to be sent, thereby screening out the optimal display policy for application.
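A minimal sketch of this scoring-and-screening step, where `score_fn` is an assumed stand-in for the trained model's prediction call:

```python
def select_target_display_policy(display_policies, score_fn):
    """Score every display policy of a piece of multi-mode information with
    the trained model and return the one with the highest sorting parameter."""
    return max(display_policies, key=score_fn)
```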
The information recommendation method provided by the embodiment of the application will be described with reference to fig. 6.
Referring to fig. 6, fig. 6 is a flowchart of an information recommendation method according to an embodiment of the present application, and will be described with reference to the steps shown in fig. 6.
In step 201, at least one piece of multimodal information to be sent to an object is acquired.
Here, the object may refer to a user of various web applications, such as a user of an e-commerce shopping application or a user of a news information reading application. Each piece of multi-mode information includes a plurality of presentation policies, and the elements adopted by different presentation policies are not exactly the same. For example, taking the multi-mode information as a multi-creative advertisement including a plurality of creatives, assume that multi-creative advertisement 1 includes 4 creatives, namely creative 1, creative 2, creative 3, and creative 4, where creative 1 includes title 1 and picture 1, creative 2 includes title 2 and picture 1, creative 3 includes title 1 and picture 2, and creative 4 includes title 2 and picture 2.
In some embodiments, the server may obtain the at least one piece of multi-mode information to be sent to the object as follows: acquire multi-mode information satisfying at least one of the following conditions: it matches the object feature information of the object (e.g., the user's interest tags, behavior sequence, browsing records, etc.); its similarity (e.g., cosine similarity) with the object feature vector of the object is less than a similarity threshold. Then sort the acquired pieces of multi-mode information based on the predicted click rate, and screen at least one top-ranked piece of multi-mode information from the sorting result.
For example, taking the multi-mode information as multi-creative advertisements, the server may first recall, from the advertisement library, multi-creative advertisements matching the text of the tag or keyword based on the user's interest tag or keyword, such as multi-creative advertisements matching the region, gender, age, etc. in the advertisement targeting conditions. Of course, the server may also select effective original features for word embedding processing; for example, vector representations of the user and the advertisement may be learned separately through a double-tower structure similar to that shown in fig. 11A, and the vector similarity may then be calculated using a dot product, cosine similarity, Euclidean distance, or the like, so as to recall, from the advertisement library, the multi-creative advertisements whose similarity with the feature vector of the user is smaller than the similarity threshold. After recalling a plurality of multi-creative advertisements from the advertisement library, the server may further coarsely rank them; for example, the recalled multi-creative advertisements may be ranked based on the predicted click rate, and at least one top-ranked multi-creative advertisement may be screened from the ranking result.
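The vector-similarity measures mentioned above (dot product, cosine similarity, Euclidean distance) can be sketched in plain Python; the embedding vectors are assumed to come from the double-tower encoders.

```python
import math

def dot_product(u, v):
    return sum(a * b for a, b in zip(u, v))

def cosine_similarity(u, v):
    # dot product normalized by the magnitudes of the two vectors
    return dot_product(u, v) / (math.sqrt(dot_product(u, u)) * math.sqrt(dot_product(v, v)))

def euclidean_distance(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))
```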
In the following, the generation process of a plurality of creatives included in a multi-creative advertisement is described by taking multi-mode information as an example, and when the multi-mode information is the multi-creative advertisement, the object may be a target user who is an advertiser to deliver the advertisement. In some embodiments, the server may also perform any of the following processes prior to performing step 201: acquiring a material corresponding to a plurality of parts to be filled of the multi-mode information respectively, and filling each material into the corresponding part to be filled to obtain a display strategy of the multi-mode information; and acquiring a plurality of materials respectively corresponding to the plurality of parts to be filled of the multi-mode information, randomly selecting the materials corresponding to each part to be filled, and filling the selected materials into the corresponding parts to be filled to obtain a display strategy of the multi-mode information.
The embodiment of the application provides two modes for acquiring the multi-mode information. The first mode is to acquire one material corresponding to each of a plurality of parts to be filled of the multi-mode information, and fill each material into the corresponding part to be filled to obtain a display strategy of the multi-mode information. Taking multi-mode information as an example of the multi-creative advertisement, assuming that a part to be filled of the multi-creative advertisement comprises a picture part and a document part, and assuming that a picture material 1 uploaded by an advertiser is acquired for the picture part; for the document part, assuming that the document material 1 uploaded by the advertiser is acquired, the server can fill the picture material 1 into the picture part and fill the document material 1 into the document part, so that one creative of the multi-creative advertisement is obtained. The advertiser can continue to make the next creative of the multi-creative ad by uploading the material.
The second mode is to acquire a plurality of materials for each of the plurality of parts to be filled of the multi-mode information, randomly select a material for each part to be filled, and fill the selected material into the corresponding part to obtain a display strategy of the multi-mode information. For example, for the picture portion of an advertisement, assume that picture material 1 and picture material 2 uploaded by the advertiser are acquired; for the document portion, assume that document material 1 and document material 2 are acquired. The server may then fill a randomly selected picture material (for example, picture material 1) into the picture portion and a randomly selected document material (for example, document material 1) into the document portion, thereby obtaining one creative of the advertisement. It should be noted that the combinations of the materials corresponding to the different parts to be filled may be exhausted until all possible creatives are obtained; for example, 4 different creatives can be obtained, namely (picture material 1, document material 1), (picture material 1, document material 2), (picture material 2, document material 1), and (picture material 2, document material 2). This increases the number of creatives obtained and further improves the flexibility of generating multi-mode information.
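Exhausting the material combinations described above amounts to a Cartesian product over the parts to be filled; a sketch (the field names are illustrative):

```python
from itertools import product

def exhaust_creatives(picture_materials, document_materials):
    """Combine every picture material with every document material,
    yielding all possible creatives of the advertisement."""
    return [{"picture": p, "document": d}
            for p, d in product(picture_materials, document_materials)]
```

For two picture materials and two document materials this yields the 4 creatives listed above.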
It should be noted that the multi-creative advertisement is only one specific form of multi-mode information; the multi-mode information may also be news including a plurality of presentation strategies, animated pictures (GIFs), or the like.
In step 202, display policy feature information and object feature information of an object corresponding to each of the plurality of display policies are acquired.
In some embodiments, for each piece of multimodal information acquired in step 201, the server may acquire presentation policy feature information corresponding to each of a plurality of presentation policies included in the multimodal information, for example, including a basic attribute of the presentation policy (for example, an element type corresponding to the presentation policy), a label of the presentation policy (for characterizing a domain to which the multimodal information belongs), a statistical feature of the presentation policy (for example, a predicted click rate of the presentation policy), and so on; in addition, the server may also obtain object feature information of the object, including, for example, an interest tag, a behavior sequence, a browsing record, and the like of the object.
By way of example, assuming that the multi-creative ad 1 includes 4 creatives, namely creative 1, creative 2, creative 3, and creative 4, the server may obtain creative feature information corresponding to each creative, respectively, including, for example, creative base attributes (e.g., the number and types of elements employed by the creative), creative tags (used to characterize the type of ad to which the creative belongs, i.e., which domain of ads), statistical features (e.g., the predicted click-through rate of the creative), and so forth.
In step 203, based on the feature information of each display policy and the feature information of the object, a neural network model is called to perform prediction processing, so as to obtain a ranking parameter corresponding to each display policy.
Here, the neural network model is trained based on the sorting result of the applied plurality of presentation policy samples, and may be, for example, a neural network model trained through steps 101 to 104 shown in fig. 5.
In some embodiments, the neural network model may be trained using presentation policy sample pairs. Before performing step 203 shown in fig. 6, the server may also perform steps 206 through 210 shown in fig. 7, which are described below in connection with the steps shown in fig. 7.
In step 206, a display policy ranking result corresponding to each of the plurality of multimode information samples is obtained.
Here, the display policy ranking result is obtained by ranking based on recommended parameters of the display policy samples in the multimode information samples, for example, a plurality of display policy samples in the fine ranking stage may be ranked to obtain the display policy ranking result.
In step 207, a plurality of presentation strategy samples are sampled based on the presentation strategy sample ordering result.
In some embodiments, the server may implement step 207 as follows: determine a sampling proportion according to the training speed of the neural network model, where the proportion is inversely related to the training speed; then sample the display policy ranking result corresponding to each multi-mode information sample according to the proportion to obtain a plurality of display policy samples. For example, the server may read, from the display policy ranking result, the identifiers (such as IDs) of the display policy samples conforming to the proportion as the sampling result (for example, if the server reads the identifier of display policy sample 1 from the display policy ranking result, display policy sample 1 may be taken as the sampling result); the reading may be sequential or random.
For example, to increase the training speed of the neural network model, the sampling proportion may be reduced appropriately; for instance, only 20% of the presentation strategy samples may be sampled from the ranking result for subsequent model training. Conversely, to improve the prediction accuracy of the neural network model (at the cost of a slower training speed), the sampling proportion may be increased appropriately; for instance, 50% of the presentation strategy samples may be sampled from the ranking result. Those skilled in the art may determine the sampling proportion according to actual requirements, which is not limited in the embodiment of the present application.
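The proportional sampling of a ranking result can be sketched as follows; whether identifiers are read sequentially or randomly is a configuration choice, as noted above.

```python
import random

def sample_ranking_result(ranked_ids, proportion, sequential=True):
    """Read the identifiers of display policy samples conforming to the
    sampling proportion from a ranking result (best-ranked first)."""
    k = max(1, int(len(ranked_ids) * proportion))
    if sequential:
        return ranked_ids[:k]            # sequential reading
    return random.sample(ranked_ids, k)  # random reading
```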
In step 208, the plurality of presentation policy samples are combined two by two to obtain a plurality of presentation policy sample pairs.
In some embodiments, after sampling the sequencing results of the multiple presentation policy samples in each multimode information sample to obtain multiple presentation policy samples, the server may further perform pairwise combination processing on the multiple presentation policy samples obtained by sampling to obtain multiple presentation policy sample pairs.
Taking the multi-mode information sample a as an example, assume that a server samples the sequencing results of a plurality of presentation policy samples included in the multi-mode information sample a to obtain 4 presentation policy samples, which are respectively a presentation policy sample 1, a presentation policy sample 2, a presentation policy sample 3 and a presentation policy sample 4, and then the server can perform two-by-two combination processing on the 4 presentation policy samples to obtain 6 presentation policy sample pairs, which are respectively: (show policy sample 1, show policy sample 2), (show policy sample 1, show policy sample 3), (show policy sample 1, show policy sample 4), (show policy sample 2, show policy sample 3), (show policy sample 2, show policy sample 4), and (show policy sample 3, show policy sample 4).
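The pairwise combination above is exactly an unordered 2-combination, e.g. via Python's `itertools`:

```python
from itertools import combinations

samples = ["display policy sample 1", "display policy sample 2",
           "display policy sample 3", "display policy sample 4"]
sample_pairs = list(combinations(samples, 2))  # C(4, 2) = 6 pairs
```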
In step 209, recommended parameters of two presentation policy samples included in each presentation policy sample pair are compared, and tag data of each presentation policy sample pair is generated according to the comparison result.
Here, the recommended parameter refers to a parameter for quantifying a recommended effect.
In some embodiments, after comparing the recommended parameters of the two presentation policy samples included in each presentation policy sample pair, the server may generate the tag data of each presentation policy sample pair according to the comparison result as follows. For each presentation policy sample pair: when the recommended parameter of the first presentation policy sample in the pair is greater than that of the second presentation policy sample, determine the tag data of the pair as 1; when the recommended parameter of the first presentation policy sample is smaller than that of the second presentation policy sample, determine the tag data of the pair as 0.
Taking presentation policy sample pair A as an example, assume that it includes presentation policy sample 1 and presentation policy sample 2. When the recommended parameter of presentation policy sample 1 is greater than that of presentation policy sample 2 (for example, presentation policy sample 1 is ranked higher than presentation policy sample 2 in the ranking result, that is, its recommendation effect is better), the server may determine the tag data of pair A as 1; when the recommended parameter of presentation policy sample 1 is smaller than that of presentation policy sample 2 (for example, presentation policy sample 1 is ranked lower, that is, the recommendation effect of presentation policy sample 2 is better), the server may determine the tag data of pair A as 0.
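The label rule reduces to a single comparison; ties are not specified by the embodiment, so here they are assumed to fall into the 0 branch:

```python
def make_pair_label(recommended_param_first, recommended_param_second):
    """Tag data is 1 when the first display policy sample of the pair has
    the larger recommended parameter, and 0 otherwise."""
    return 1 if recommended_param_first > recommended_param_second else 0
```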
In step 210, a neural network model is trained based on a plurality of presentation policy sample pairs and corresponding tag data.
In some embodiments, when the neural network model includes a plurality of hidden layers that are sequentially cascaded to form a tower network, the server may further perform the following processing before training the neural network model based on the plurality of presentation policy sample pairs and the corresponding tag data: set a mirror network of the tower network in the neural network model, wherein the tower network is used for receiving the display policy feature information of one display policy sample in a display policy sample pair, and the mirror network is used for receiving the display policy feature information of the other display policy sample in the pair. It should be noted that the structure and parameters of the mirror network are identical to those of the tower network; that is, the mirror network and the tower network are two identical networks.
Taking display strategy sample pair A as an example, assume that it includes display strategy sample 1 and display strategy sample 2. In the training stage of the neural network model, the server may first set a mirror network of the tower network in the neural network model. The server may then input a first spliced vector (i.e., a vector obtained by splicing the display strategy feature vector of display strategy sample 1 and the object feature vector) into the tower network to obtain the sorting parameter of display strategy sample 1, and input a second spliced vector (i.e., a vector obtained by splicing the display strategy feature vector of display strategy sample 2 and the object feature vector) into the mirror network to obtain the sorting parameter of display strategy sample 2. Next, the server may determine the difference between the sorting parameter of display strategy sample 1 and that of display strategy sample 2, substitute the difference and the tag data of pair A into the loss function to determine the corresponding error, determine the gradient of the neural network model according to the error, and update the parameters of the neural network model based on the gradient. After training the neural network model based on the plurality of display strategy sample pairs and the corresponding tag data, the server may further remove the mirror network from the neural network model. That is, in the prediction stage, the server may simply encode the spliced vector (i.e., the vector obtained by splicing the display policy feature vector of a display policy and the object feature vector) through the tower network alone, so as to obtain the sorting parameter corresponding to each display policy.
For example, referring to fig. 8A, fig. 8A is a schematic diagram of the training stage and the prediction stage of the neural network model provided by an embodiment of the present application. As shown in fig. 8A, in the training stage, the server may set a mirror network 802 of the tower network 801 in the neural network model and encode the display policy feature vectors of the two display policy samples in a display policy sample pair through the tower network 801 and the mirror network 802, respectively. After training the neural network model based on the plurality of display policy sample pairs and the corresponding tag data, the server may remove the mirror network 802 from the neural network model. It can be seen that the structures of the neural network model in the training stage and the prediction stage differ considerably: in the training stage, input is performed based on display policy sample pairs, while in the prediction stage, input is not performed based on pairs and only a single display policy feature vector is input; that is, the model used in the prediction stage is reduced by half. Further, the loss function shown in fig. 8A may be various types of loss functions, such as a cross entropy loss function or a hinge loss function.
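One gradient step of this pairwise training can be sketched with a linear stand-in for the tower. Because the mirror network shares the tower's parameters exactly, a single weight vector scores both samples of a pair; the linear form, sigmoid-plus-cross-entropy loss, and learning rate are illustrative assumptions, not the embodiment's exact model.

```python
import math

def tower_score(weights, features):
    # the tower and its mirror are identical, so one weight vector serves both
    return sum(w * x for w, x in zip(weights, features))

def pairwise_update(weights, feat_1, feat_2, label, lr=0.1):
    """A sigmoid of the score difference estimates P(sample 1 ranks above
    sample 2); cross-entropy against the pair label (1 or 0) gives the
    gradient applied to the shared weights."""
    diff = tower_score(weights, feat_1) - tower_score(weights, feat_2)
    prob = 1.0 / (1.0 + math.exp(-diff))
    grad = prob - label  # derivative of cross-entropy w.r.t. the score difference
    return [w - lr * grad * (x1 - x2)
            for w, x1, x2 in zip(weights, feat_1, feat_2)]
```

After repeated updates with label 1, the shared weights score the first sample of the pair above the second, which is the ordering behavior the loss in fig. 8A enforces.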
In other embodiments, when the plurality of hidden layers included in the neural network model are divided into two groups, with the plurality of hidden layers of the first group sequentially cascaded to form a first tower network and the plurality of hidden layers of the second group sequentially cascaded to form a second tower network, the server may further perform the following processing before training the neural network model based on the plurality of display strategy sample pairs and the corresponding tag data: setting a mirror network of the second tower network in the neural network model, wherein the second tower network is used for receiving the display strategy feature information of one display strategy sample in a display strategy sample pair, and the mirror network is used for receiving the display strategy feature information of the other display strategy sample in the display strategy sample pair. It should be noted that the structure and parameters of the mirror network are identical to those of the second tower network, i.e., the mirror network and the second tower network are identical networks.
Taking the display strategy sample pair a as an example, assume that the display strategy sample pair a includes a display strategy sample 1 and a display strategy sample 2. In the training stage of the neural network model, the server may first set a mirror network of the second tower network in the neural network model, then input the display strategy feature information of display strategy sample 1 into the second tower network to obtain the sorting parameter of display strategy sample 1, and input the display strategy feature information of display strategy sample 2 into the mirror network to obtain the sorting parameter of display strategy sample 2. The server may then determine the difference value between the sorting parameter of display strategy sample 1 and the sorting parameter of display strategy sample 2, substitute the difference value and the label data of display strategy sample pair a into a loss function to determine the corresponding error, finally determine the gradient of the neural network model according to the error, and update the parameters of the neural network model based on the gradient. After training the neural network model based on the plurality of display strategy sample pairs and the corresponding tag data, the server may further remove the mirror network from the neural network model; that is, in the prediction stage, the server may simply encode the display strategy feature vector of each display strategy through the second tower network, so as to obtain the sorting parameter corresponding to each display strategy.
For example, referring to fig. 8B, fig. 8B is a schematic diagram of the training stage and the prediction stage of a neural network model according to an embodiment of the present application. As shown in fig. 8B, in the training stage of the neural network model, the server may set a mirror network 804 of the second tower network 803 in the neural network model, and encode, through the second tower network 803 and the mirror network 804 respectively, the display strategy feature vectors of the two display strategy samples included in a display strategy sample pair. After training the neural network model based on a plurality of display strategy sample pairs and the corresponding tag data, the server may remove the mirror network 804 from the neural network model; that is, in the prediction stage, half of the training-stage model is used.
It should be noted that, besides training the neural network model by using display strategy sample pairs, single display strategy samples may also be used to train the neural network model. For example, the neural network model may be trained based on a plurality of display strategy samples and the tag data corresponding to each display strategy sample, where the tag data of each display strategy sample is used to characterize the rank of that display strategy sample in the sorting result: the higher a display strategy sample is ranked, the larger the value of its corresponding tag data. For example, assuming that display strategy sample 2 is ranked at the 2nd position in the sorting result, the value of the corresponding tag data may be 0.9; assuming that display strategy sample 3 is ranked at the 5th position in the sorting result, the value of the corresponding tag data may be 0.7. That is, when training is performed by using single display strategy samples, no additional mirror network needs to be set in the neural network model in the training stage, i.e., the structure of the neural network model is the same in the training stage and the prediction stage. The embodiment of the present application does not specifically limit the training mode of the neural network model.
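The single-sample (pointwise) variant described above can be sketched as fitting a scorer directly to rank-derived labels. Everything below — the feature dimensions, the label values, and the linear scorer solved in closed form as a stand-in for gradient training of the full network — is assumed for illustration only.

```python
import numpy as np

rng = np.random.default_rng(1)

# Four display strategy samples with rank-derived scalar labels
# (higher rank -> larger label, echoing the 0.9 / 0.7 example above).
X = rng.normal(size=(4, 3))          # feature vectors (dimensions assumed)
y = np.array([1.0, 0.9, 0.8, 0.7])  # tag data characterizing the ranks

# Fit a linear scorer with a bias term by least squares; no mirror network
# is needed because each sample is scored and supervised on its own.
A = np.hstack([X, np.ones((4, 1))])
theta, *_ = np.linalg.lstsq(A, y, rcond=None)
pred = A @ theta

mse = float(((pred - y) ** 2).mean())
```

With four samples and four parameters the fit is (generically) exact, so the predicted scores reproduce the rank ordering encoded in the labels — which is all the pointwise tag data is meant to convey.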
In other embodiments, the neural network model may include an input layer, at least one hidden layer, and an output layer, and then step 203 shown in fig. 6 may be implemented by steps 2031 to 2033 shown in fig. 9, which will be described in connection with the steps shown in fig. 9.
For each presentation policy feature information, invoking a neural network model to perform the following processing:
in step 2031, vectorization processing is performed on the display policy feature information and the object feature information, so as to obtain a display policy feature vector of the display policy and an object feature vector of the object.
In some embodiments, for each presentation policy feature information, the server may invoke an input layer of the neural network model to perform vectorization processing based on the presentation policy feature information and the object feature information, to obtain a presentation policy feature vector of the presentation policy and an object feature vector of the object.
For example, the server may perform vectorization processing on the presentation policy feature information to obtain a presentation policy feature vector of the presentation policy in the following manner: invoking an input layer of a neural network model to perform One-Hot encoding (One-Hot) processing on the display strategy characteristic information to obtain a first One-Hot encoding vector; after obtaining the first one-hot encoded vector, the server may then invoke an input layer of the neural network model to perform word embedding processing on the first one-hot encoded vector, and determine the obtained first word embedded vector as a presentation policy feature vector of the presentation policy.
For example, the server may perform vectorization processing on the object feature information to obtain the object feature vector of the object in the following manner: invoking the input layer of the neural network model to perform one-hot encoding processing on the object feature information to obtain a second one-hot encoded vector; after obtaining the second one-hot encoded vector, the server may continue to invoke the input layer of the neural network model to perform word embedding processing on the second one-hot encoded vector, and determine the obtained second word embedding vector as the object feature vector of the object.
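The one-hot-then-embed step can be made concrete with a small sketch; the vocabulary, feature value, and embedding dimension below are all assumed for illustration, and the real input layer would hold one randomly initialized table per feature field.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical vocabulary for one categorical object feature.
vocab = ["male", "female", "unknown"]

def one_hot(value, vocab):
    v = np.zeros(len(vocab))
    v[vocab.index(value)] = 1.0
    return v

embedding_table = rng.normal(size=(len(vocab), 4))  # randomly initialized

x = one_hot("female", vocab)          # the "second one-hot encoded vector"
feature_vector = x @ embedding_table  # word embedding = the selected table row
```

Multiplying a one-hot vector by the embedding table simply selects one row, so in practice the lookup is done by indexing rather than a full matrix product.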
In other embodiments, the input layer of the neural network model may include sub-input layers corresponding to a plurality of fields (including, for example, a first sub-input layer corresponding to the education industry, a second sub-input layer corresponding to the e-commerce industry, etc.), and the server may implement the foregoing process of one-hot encoding the object feature information to obtain the second one-hot encoded vector by: determining the field to which the object belongs; and invoking the sub-input layer corresponding to the field to perform one-hot encoding processing on the object feature information to obtain the second one-hot encoded vector. For example, assuming that the field to which the user belongs is determined to be the education industry, the server may invoke the first sub-input layer corresponding to the education industry to perform one-hot encoding processing on the feature information of the user to obtain the second one-hot encoded vector.
It should be noted that, in the embodiment of the present application, a new sub-input layer corresponding to the domain may be added to the input layer of the neural network model according to the characteristics of the traffic scenario, for example, when multi-mode information needs to be recommended to a user in the manufacturing industry, a third sub-input layer corresponding to the manufacturing industry may be added to the input layer of the neural network model, which is highly scalable.
In step 2032, the presentation policy feature vector and the object feature vector are encoded to obtain a hidden layer feature vector.
In some embodiments, the encoding processing may be implemented by invoking at least one hidden layer (Hidden Layer) of the neural network model; when the number of the at least one hidden layer is multiple, the multiple hidden layers may be cascaded in turn to form a tower network. The server may implement step 2032 described above in the following manner: the server first performs concatenation (Concat) processing on the display strategy feature vector and the object feature vector to obtain a spliced vector. For example, assume that the display strategy feature vector is E = (e1, e2, e3) and the object feature vector is D = (d1, d2); then the display strategy feature vector E and the object feature vector D are spliced, and the obtained spliced vector is (e1, e2, e3, d1, d2). After the spliced vector is obtained, the server may invoke the tower network to encode the spliced vector to obtain the hidden layer feature vector.
For example, the server may invoke the tower network to encode the spliced vector to obtain the hidden layer feature vector in the following manner: encoding the spliced vector through the first hidden layer of the tower network; inputting the encoding result output by each hidden layer into the subsequently cascaded hidden layer for further encoding processing, until the last hidden layer of the tower network is reached; and determining the encoding result output by the last hidden layer as the hidden layer feature vector. Taking the number of the plurality of hidden layers being 4 as an example, namely Hidden Layer 1 (Hidden_Layer_1), Hidden Layer 2 (Hidden_Layer_2), Hidden Layer 3 (Hidden_Layer_3) and Hidden Layer 4 (Hidden_Layer_4), the 4 hidden layers may be sequentially cascaded to form a tower network. The server may first encode the spliced vector through hidden layer 1 of the tower network, then input the encoding result output by hidden layer 1 into hidden layer 2, so that hidden layer 2 encodes the encoding result output by hidden layer 1; then input the encoding result output by hidden layer 2 into hidden layer 3, so that hidden layer 3 encodes the encoding result output by hidden layer 2; finally input the encoding result output by hidden layer 3 into hidden layer 4, so that hidden layer 4 encodes the encoding result output by hidden layer 3, and determine the encoding result output by hidden layer 4 as the hidden layer feature vector.
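The single-tower forward pass above can be sketched as a loop over cascaded layers. The layer dimensions, random weights, and ReLU activation below are placeholders for the trained network, chosen only to make the cascade concrete.

```python
import numpy as np

rng = np.random.default_rng(3)

def relu(z):
    return np.maximum(z, 0.0)

# Hypothetical tower: input dim 5 (the spliced vector), then 4 cascaded
# hidden layers with arbitrarily chosen widths.
dims = [5, 8, 8, 4, 4]
layers = [(rng.normal(size=(dims[i], dims[i + 1])), np.zeros(dims[i + 1]))
          for i in range(len(dims) - 1)]

E = rng.normal(size=3)      # display strategy feature vector (assumed dim)
D = rng.normal(size=2)      # object feature vector (assumed dim)
h = np.concatenate([E, D])  # spliced vector (e1, e2, e3, d1, d2)

for W, b in layers:         # hidden layer 1 -> 2 -> 3 -> 4
    h = relu(h @ W + b)     # each layer encodes the previous layer's output

hidden_feature = h          # encoding result of the last hidden layer
```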
In other embodiments, before encoding the display strategy feature vector and the object feature vector to obtain the hidden layer feature vector, the server may further perform the following processing: acquiring the data volume of the at least one piece of multi-mode information; and in response to the data volume being smaller than a data volume threshold, proceeding to the step of invoking the tower network to perform encoding processing based on the display strategy feature vector and the object feature vector.
For example, the server may determine whether to invoke the single-tower network or the double-tower network to perform encoding processing according to characteristics (such as data volume, time consumption, etc.) of different traffic scenarios, for example, when the server determines that the data volume of at least one piece of multi-mode information to be sent is smaller than a data volume threshold, the server may perform splicing processing on the display policy feature vector and the object feature vector, and invoke the tower network (i.e. the single-tower network) to perform encoding processing on the spliced vector obtained by splicing, so that interaction between the display policy feature vector and the object feature vector can be fully implemented on the basis of ensuring time consumption, thereby obtaining more accurate ordering parameters.
In other embodiments, the encoding processing may be implemented by invoking at least one hidden layer of the neural network model; when the number of the at least one hidden layer is multiple, the multiple hidden layers may be divided into two groups, with the multiple hidden layers of the first group cascaded in turn to form a first tower network and the multiple hidden layers of the second group cascaded in turn to form a second tower network. The server may implement step 2032 described above by performing the following processing for each display strategy feature vector: invoking the first tower network to encode the object feature vector to obtain a first sub-hidden-layer feature vector; invoking the second tower network to encode the display strategy feature vector to obtain a second sub-hidden-layer feature vector; and performing dot multiplication (Dot) processing on the first sub-hidden-layer feature vector and the second sub-hidden-layer feature vector to obtain the hidden layer feature vector. Here, dot multiplication, also called the dot product or scalar product, refers to a binary operation that accepts two vectors over the real numbers R and returns a real scalar, and is the standard inner product of Euclidean space. For example, for two vectors a = [a1, a2, …, an] and b = [b1, b2, …, bn], the dot product is defined as: a·b = a1b1 + a2b2 + … + anbn.
Taking the at least one hidden layer being 6 hidden layers as an example, namely hidden layer 1 to hidden layer 6, the 6 hidden layers may be divided into two groups, for example, hidden layers 1 to 3 as one group and hidden layers 4 to 6 as the other group, where the multiple hidden layers of the first group (i.e., hidden layers 1 to 3) may be sequentially cascaded to form a first tower network, and the multiple hidden layers of the second group (i.e., hidden layers 4 to 6) may be sequentially cascaded to form a second tower network. The server may then invoke the first tower network to encode the object feature vector: for example, the server may first encode the object feature vector through hidden layer 1 of the first tower network, then input the encoding result output by hidden layer 1 into hidden layer 2 so that hidden layer 2 encodes the encoding result output by hidden layer 1, then input the encoding result output by hidden layer 2 into hidden layer 3 so that hidden layer 3 encodes the encoding result output by hidden layer 2, and determine the encoding result output by hidden layer 3 as the first sub-hidden-layer feature vector. For each display strategy feature vector, the server may invoke the second tower network to encode the display strategy feature vector: for example, the server may first encode the display strategy feature vector through hidden layer 4 of the second tower network, input the encoding result output by hidden layer 4 into hidden layer 5 so that hidden layer 5 encodes the encoding result output by hidden layer 4, then input the encoding result output by hidden layer 5 into hidden layer 6 so that hidden layer 6 encodes the encoding result output by hidden layer 5, and determine the encoding result output by hidden layer 6 as the second sub-hidden-layer feature vector. After obtaining the first sub-hidden-layer feature vector (i.e., the hidden layer feature vector obtained by encoding the object feature vector) and the second sub-hidden-layer feature vector (i.e., the hidden layer feature vector obtained by encoding the display strategy feature vector), the server may perform dot multiplication processing on the first sub-hidden-layer feature vector and the second sub-hidden-layer feature vector, and determine the dot multiplication result as the hidden layer feature vector.
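The two-tower arrangement can be sketched as two independent cascades whose outputs meet in a dot product. The tower widths and random weights below are assumed; the only structural requirement illustrated is that both towers end in the same dimension so the dot product is defined.

```python
import numpy as np

rng = np.random.default_rng(4)

def relu(z):
    return np.maximum(z, 0.0)

def make_tower(dims, rng):
    return [(rng.normal(size=(dims[i], dims[i + 1])), np.zeros(dims[i + 1]))
            for i in range(len(dims) - 1)]

def run_tower(x, layers):
    for W, b in layers:  # cascade: each layer encodes the previous output
        x = relu(x @ W + b)
    return x

# Hypothetical dimensions; both towers end in dimension 4.
first_tower = make_tower([2, 6, 6, 4], rng)   # encodes the object feature vector
second_tower = make_tower([3, 6, 6, 4], rng)  # encodes a display strategy feature vector

D = rng.normal(size=2)         # object feature vector
E = rng.normal(size=3)         # display strategy feature vector

u = run_tower(D, first_tower)  # first sub-hidden-layer feature vector
v = run_tower(E, second_tower) # second sub-hidden-layer feature vector
score = float(u @ v)           # dot product: u·v = sum_i u_i * v_i
```

Note the efficiency benefit the text describes: `u` depends only on the object, so it can be computed once per request and reused against every display strategy's `v`.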
In other embodiments, before encoding the display strategy feature vector and the object feature vector to obtain the hidden layer feature vector, the server may further perform the following processing: acquiring the data volume of the at least one piece of multi-mode information; and in response to the data volume being greater than or equal to the data volume threshold, proceeding to the step of invoking the first tower network and the second tower network respectively to perform encoding processing based on the object feature vector and the display strategy feature vector.
For example, the server may determine whether to invoke the single-tower network or the double-tower network to perform encoding processing according to characteristics (such as data volume, time consumption, etc.) of different traffic scenarios, for example, when the server determines that the data volume of at least one piece of multi-mode information to be sent is greater than a data volume threshold, the server may invoke the double-tower network (i.e., the first tower network and the second tower network described above) to perform encoding processing on the object feature information and the presentation policy feature information respectively, so as to improve the calculation efficiency, and save the time required to be consumed in the step of scoring the presentation policy.
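The routing decision described in the preceding paragraphs reduces to a simple threshold check. The threshold value and function names below are assumptions made for illustration; the embodiment does not prescribe concrete values.

```python
# Hypothetical threshold on the number of multimodal items in a request.
DATA_VOLUME_THRESHOLD = 100

def choose_architecture(num_multimodal_items: int) -> str:
    # Small request -> single tower: the spliced vector allows full interaction
    # between display strategy features and object features.
    # Large request -> double tower: towers are computed independently, which
    # is cheaper and saves time in the strategy-scoring step.
    if num_multimodal_items < DATA_VOLUME_THRESHOLD:
        return "single_tower"
    return "double_tower"
```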
In step 2033, activation processing is performed on the hidden layer feature vector, so as to obtain the ranking parameters of the display policy.
In some embodiments, after obtaining the hidden layer feature vector, the server may invoke an output layer of the neural network model to perform activation processing based on the hidden layer feature vector, to obtain a ranking parameter of the presentation policy (a parameter for quantifying a recommended effect of the presentation policy). For example, the output layer of the neural network model may include various types of nonlinear activation functions (e.g., reLU, sigmoid, tanh functions, etc.), that is, after obtaining the hidden layer feature vector, the server may call the activation function included in the output layer of the neural network model to activate the hidden layer feature vector to obtain the ranking parameter (e.g., score) of the presentation policy.
With continued reference to FIG. 6, in step 204, a target presentation policy corresponding to each piece of multimodal information is determined.
Here, the target presentation policy is a presentation policy having the largest sorting parameter among the multimodal information.
In some embodiments, taking the number of at least one piece of multi-mode information as 3 as an example, assume multi-mode information 1, multi-mode information 2 and multi-mode information 3, respectively, wherein multi-mode information 1 includes 3 presentation policies, namely presentation policy 1, presentation policy 2 and presentation policy 3, respectively; the multi-mode information 2 comprises 2 display strategies, namely a display strategy 4 and a display strategy 5; the multi-mode information 3 includes 4 display policies, namely a display policy 6, a display policy 7, a display policy 8 and a display policy 9, respectively, meanwhile, assuming that the display policy with the largest sorting parameter in the multi-mode information 1 is a display policy 1, the display policy with the largest sorting parameter in the multi-mode information 2 is a display policy 4, and the display policy with the largest sorting parameter in the multi-mode information 3 is a display policy 9, the display policy 1 may be determined as a target display policy of the multi-mode information 1 (i.e. the multi-mode information 1 will be displayed using the display policy 1), the display policy 4 may be determined as a target display policy of the multi-mode information 2 (i.e. the multi-mode information 2 will be displayed using the display policy 4), and the display policy 9 may be determined as a target display policy of the multi-mode information 3 (i.e. the multi-mode information 3 will be displayed using the display policy 9).
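The selection in step 204 is a per-item argmax over ranking parameters. The sketch below mirrors the worked example above with invented score values; the policy/info names are those of the example, not real identifiers.

```python
# Hypothetical ranking parameters mirroring the example: three pieces of
# multimodal information, each with its own candidate display strategies.
ranked = {
    "info_1": {"policy_1": 0.92, "policy_2": 0.40, "policy_3": 0.31},
    "info_2": {"policy_4": 0.77, "policy_5": 0.12},
    "info_3": {"policy_6": 0.20, "policy_7": 0.55,
               "policy_8": 0.61, "policy_9": 0.88},
}

# Step 204: the target display strategy of each piece of information is the
# strategy with the largest ranking parameter within that piece.
target_policies = {info: max(scores, key=scores.get)
                   for info, scores in ranked.items()}
```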
In step 205, a recommendation operation for the object is performed based on at least one piece of multimodal information to which the corresponding target presentation policy is respectively applied.
In some embodiments, when the number of the at least one piece of multi-mode information is a plurality of pieces, the server may further perform the following process before performing the recommended operation for the object based on the at least one piece of multi-mode information to which the corresponding target presentation policy is respectively applied: sorting the pieces of multi-mode information to which the corresponding target presentation policies are applied, respectively, according to any one of the following sorting policies: sorting the pieces of multi-mode information respectively applied with the corresponding target display strategies according to the information types; randomly sequencing a plurality of pieces of multi-mode information to which corresponding target display strategies are respectively applied; and ordering the pieces of multi-mode information respectively applied with the corresponding target display strategies according to the operation rules.
By way of example, taking the multi-mode information as multi-creative advertisements, after the optimal creative (i.e., the target display strategy) in each multi-creative advertisement is determined, the multiple multi-creative advertisements may be further sorted. For example, when the multiple multi-creative advertisements include advertisements of different types such as e-commerce and education, the multi-creative advertisements may be sorted by information type for an attractive visual effect, so that advertisements of the same type are arranged together, which makes them easier for the user to find and improves the user's visual experience. Of course, in order to explore new interests of the user, the multiple multi-creative advertisements may also be randomly sorted and then pushed to the user. In addition, the multiple multi-creative advertisements may be sorted according to operation rules; for example, in order to promote product A, the multi-creative advertisements for product A may be ranked in front, thereby greatly increasing the exposure rate of the advertisements for product A.
According to the information recommendation method provided by the embodiment of the application, the neural network model is trained based on the sequencing results of the applied display strategy samples, so that the trained neural network model can score a plurality of display strategies included in the multi-mode information to be sent, and the optimal display strategy is screened out and recommended to the user, thus the pertinence and the accuracy of recommendation are enhanced, the finally screened display strategy can meet the requirements of objects, and the utilization rate of recommended resources is improved.
In the following, an exemplary application of an embodiment of the present application in a practical application scenario will be described by taking multi-mode information as an advertisement (e.g., DC advertisement, MC advertisement) including a plurality of creatives.
The embodiment of the present application provides an information recommendation method that adopts a learning-to-rank approach, enabling the neural network model to learn the eCPM ranking result of multiple creatives produced in the fine ranking stage. Meanwhile, a Group mode is adopted so that one advertisement forms one group: in the creative preference task, the tLTR model (corresponding to the neural network model above) only needs to rank the different creatives within one advertisement, without attending to the ranking between different advertisements. Thus, compared with the MAB-based advertising creative preference strategy provided by the related art, the scheme provided by the embodiment of the present application takes the information of the requesting user into account and is an online, real-time, personalized creative preference model, achieving a better preference effect than the MAB-based strategy. In addition, the method provided by the embodiment of the present application can be adapted and improved for different scenarios: for example, since different traffic scenarios differ greatly and have different audience distributions, the model can be split by site (i.e., the input layer of the model can be split into sub-input layers corresponding to the respective fields), and different features can be added to the model according to the traffic characteristics (i.e., a new sub-input layer corresponding to a new field can be added to the input layer of the model), so the method is highly scalable.
The information recommendation method provided by the embodiment of the application is specifically described below.
DC advertisements and MC advertisements are the most important advertisement forms of advertisement products in the current advertisement management system and occupy a large share of overall consumption, so improving the placement effect of DC advertisements and MC advertisements can noticeably improve both the advertiser's placement experience and the exposure experience of the advertisement audience. The information recommendation method provided by the embodiment of the present application can be applied to DC advertisements and MC advertisements, can determine the optimal creative in DC advertisements and MC advertisements in a personalized and automatic manner, and automatically realizes traffic tilting.
In some embodiments, referring to fig. 10, fig. 10 is a schematic diagram of the architecture of an advertisement management system according to an embodiment of the present application. As shown in fig. 10, the advertisement creative ranking policy is updated from the original MAB-based creative preference policy to online tLTR (tidwise Learning To Rank) model scoring plus creative preference, where the tLTR model is located after the coarse ranking stage and before the fine ranking stage. For each advertisement that includes multiple creatives, such as an MC advertisement, the model individually selects an optimal creative per request and sends it to the subsequent fine ranking stage. For example, assuming there are 200 advertisements in the advertisement queue resulting from the coarse ranking stage of one user request, of which 50 are multi-creative advertisements, the tLTR model may select the corresponding best creative for each of the 50 multi-creative advertisements. For example, the tLTR model may fit the ranking results of the creatives in the fine ranking stage to estimate scores, and rank the creatives within an advertisement using a coarse ranking algorithm based on the fine ranking results. The reasons for learning the ranking results of the creatives in the fine ranking stage are: 1. the fine ranking stage is the most accurate recommendation link of the advertisement management system; 2. the fine ranking stage is the last link that determines whether an advertisement can be exposed, so the coarse ranking stage only needs to fit the ranking result of the fine ranking stage to increase the consistency of the overall ranking and the overall effect. The training samples, model structure, and loss function of the model are described separately below.
In some embodiments, the tLTR model may be trained in a pairwise fashion to learn the eCPM order relationship between creatives. Thus, the training samples of the model may be derived from the creatives in the fine ranking queue of each sampled request. Meanwhile, considering that the creative set includes a large number of creatives and training is limited by the training speed of the model, a certain amount of sampling can be performed, where the sampling proportion can be determined according to the training speed of the model. For example, the Top1 creative can be sampled from the sorting result, 5 creatives can be randomly sampled from Top2 to Top20, and 9 creatives can be sampled from beyond Top20, i.e., 15 creatives are sampled in total; these can then be combined pairwise to obtain 105 partial order pairs (i.e., training sample pairs, corresponding to the display strategy sample pairs described above). For example, taking partial order pair 1 as an example, assume that partial order pair 1 includes creative 1 and creative 2; when the eCPM1 of creative 1 is greater than the eCPM2 of creative 2, the tag (label) of partial order pair 1 is set to 1, otherwise the tag of partial order pair 1 is set to 0.
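The sampling and labeling recipe above can be sketched directly. The fine-ranking list, eCPM values, and random seed below are hypothetical; the sketch only demonstrates the Top1 + 5 + 9 sampling and the pairwise label construction (C(15, 2) = 105 pairs).

```python
import itertools
import random

random.seed(0)

# Hypothetical fine-ranking result: 100 creatives with strictly decreasing eCPM.
ranking = [f"creative_{i}" for i in range(1, 101)]
ecpm = {c: 1000.0 - i for i, c in enumerate(ranking)}

# Sampling as described: Top1, plus 5 random creatives from Top2..Top20,
# plus 9 from the remainder -> 15 creatives in total.
sampled = ([ranking[0]]
           + random.sample(ranking[1:20], 5)
           + random.sample(ranking[20:], 9))

# Pairwise combination, labeled 1 when the first creative's eCPM is larger.
pairs = [(a, b, 1 if ecpm[a] > ecpm[b] else 0)
         for a, b in itertools.combinations(sampled, 2)]
```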
In other embodiments, according to the characteristics of different traffic scenarios (e.g., data volume, time consumption, etc.), the tLTR model structure may be further divided into the double-tower structure shown in fig. 11A and the non-double-tower structure (i.e., single-tower structure) shown in fig. 11B. The double-tower structure feeds the feature vector of the user and the feature vector of the creative into two network towers for separate computation, and finally takes the dot product of the encoding result of the user's feature vector and the encoding result of the creative's feature vector to obtain the final prediction score of the creative (corresponding to the sorting parameter above); the non-double-tower structure is a single-tower network, in which the feature vector of the user and the feature vector of the creative are spliced, and the spliced vector obtained by splicing is input into one tower for computation to obtain the final prediction score of the creative. The overall structure of the tLTR model can be divided into a feature input layer, hidden layers, and an output layer; the input layer, the hidden layers, and the loss function of the tLTR model are described below in turn.
1) Feature input layer
In some embodiments, the feature information input to the tLTR model includes feature information of the user (e.g., the user's interest tags, behavior sequence, browsing records, etc.) and feature information of the creative (e.g., the creative's basic attributes, creative tags, statistical features, etc.). Since the feature information of the user and the feature information of the creative include a large number of categorical features and character-type features, the feature information needs to be vectorized first to facilitate subsequent model computation. For example, the word embedding vector of each piece of feature information may be constructed by querying a word embedding (embedding) table, where the embedding table may be a randomly initialized matrix whose shape depends on the number of features and the set embedding dimension.
For example, referring to fig. 12, fig. 12 is a schematic diagram of a word embedding process provided by an embodiment of the present application. Fig. 12 shows an illustration of an embedding table; for example, the embedding table may be a 5×3 matrix W. If the one-hot encoding vector corresponding to a certain feature value is x = (0, 1, 0, 0, 0), word embedding processing is performed on the one-hot encoding vector, and the resulting word embedding vector is xW = (10, 12, 19).
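As a minimal sketch of the embedding lookup described above (all matrix values are illustrative, not taken from fig. 12 beyond the selected row), multiplying a one-hot vector by the embedding table simply selects one row of the table:

```python
import numpy as np

# Hypothetical 5x3 embedding table: 5 feature values, 3 embedding dimensions.
W = np.array([
    [4, 7, 2],
    [10, 12, 19],   # row selected by the one-hot vector below
    [3, 8, 6],
    [5, 1, 9],
    [11, 4, 7],
])

# One-hot vector for a feature value (here, the second of five values).
x = np.array([0, 1, 0, 0, 0])

# The product x @ W selects the matching row of W, which is why
# embedding lookups are implemented as a table index in practice.
embedding = x @ W
print(embedding)  # [10 12 19]
```

In practice frameworks skip the matrix product and index the table directly, but the two operations are mathematically equivalent.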
2) Hidden layer
After all feature information is vectorized through the input layer of the tLTR model to obtain a feature vector X, the feature vector X may be fed into a hidden layer of the tLTR model, so that the hidden layer encodes the feature vector X to obtain a deep representation H^(1) of the features (i.e., a hidden layer feature vector), as shown in the following equation:

H^(1) = σ_1(XW^(1) + b^(1))   Equation 1

where X represents the feature vector obtained through the input layer of the tLTR model, W^(1) represents the weight matrix of the first hidden layer, b^(1) represents the bias matrix of the first hidden layer, σ_1 represents the nonlinear activation function of the first hidden layer (e.g., a ReLU, sigmoid, or tanh function), and H^(1) represents the encoding result output by the first hidden layer.
In other embodiments, the tLTR model may learn deeper representations of the features by stacking more hidden layers, as shown in the following equation:

H^(2) = σ_2(H^(1)W^(2) + b^(2))   Equation 2

where W^(2) represents the weight matrix of the second hidden layer, b^(2) represents the bias matrix of the second hidden layer, σ_2 represents the nonlinear activation function of the second hidden layer, and H^(2) represents the encoding result output by the second hidden layer.
In some embodiments, the elements (H_1^(n), H_2^(n), …, H_n^(n)) of the encoding result output by the last hidden layer, i.e., the vector H^(n), may be summed to obtain a scalar z, and the scalar z may be activated by a sigmoid function to obtain the final prediction score s_i of creative i, as shown in the following equations:

z = H_1^(n) + H_2^(n) + … + H_n^(n)   Equation 3

s_i = sigmoid(z)   Equation 4
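The forward pass of equations 1 through 4 can be sketched in numpy as follows; the layer shapes and random weights are purely illustrative, and ReLU is used here as the per-layer activation σ_k:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def relu(z):
    return np.maximum(0.0, z)

def predict_score(x, layers):
    """Equations 1-2: H^(k) = sigma_k(H^(k-1) W^(k) + b^(k)),
    then equations 3-4: sum the last hidden vector and apply sigmoid."""
    h = x
    for W, b in layers:
        h = relu(h @ W + b)   # one hidden layer (ReLU as sigma_k)
    z = h.sum()               # equation 3: scalar z
    return sigmoid(z)         # equation 4: prediction score s_i

# Illustrative shapes: 6-dim input feature vector, two stacked hidden layers.
rng = np.random.default_rng(0)
layers = [(rng.normal(size=(6, 4)), np.zeros(4)),
          (rng.normal(size=(4, 4)), np.zeros(4))]
x = rng.normal(size=6)
s = predict_score(x, layers)
print(0.0 < s < 1.0)  # True: the sigmoid output is a valid score
```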
3) Loss function
In some embodiments, adaptive training of the tLTR model may be achieved by designing a loss function and applying backward gradient propagation, where the loss function may be a pairwise loss function: simply put, the difference (s_i − s_j) of the prediction scores of a pair of samples is passed through a sigmoid function, and the result, together with the label of the pair of samples, is substituted into a loss function (e.g., a cross-entropy loss function) for calculation. The loss function is as follows:

L = −[s_ij · log σ(s_i − s_j) + (1 − s_ij) · log(1 − σ(s_i − s_j))]

When s_ij = 1:

L = −log σ(s_i − s_j)

When s_ij = 0:

L = −log(1 − σ(s_i − s_j))

where s_i and s_j represent the prediction scores of creative i and creative j obtained through the sigmoid function, σ represents the sigmoid function, and the label s_ij = 1 when s_i > s_j, and s_ij = 0 when s_i < s_j.
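A minimal numpy sketch of the pairwise cross-entropy described above (scores and labels are illustrative values, not from the embodiment):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def pairwise_loss(s_i, s_j, label):
    """Cross-entropy on sigma(s_i - s_j), with label 1 when creative i
    should rank above creative j and label 0 otherwise."""
    p = sigmoid(s_i - s_j)
    return -(label * np.log(p) + (1 - label) * np.log(1 - p))

# A correctly ordered pair (s_i > s_j, label 1) yields a small loss;
# the same score pair with label 0 is penalised more heavily.
low = pairwise_loss(0.9, 0.2, 1)
high = pairwise_loss(0.9, 0.2, 0)
print(low < high)  # True
```

Minimising this loss pushes σ(s_i − s_j) toward the label, i.e., toward the correct relative ordering of the two creatives.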
In some embodiments, considering that an increase in the number of creatives in the future would reduce the performance of the model, a binarization method may be adopted to improve the inference speed of the model; binarization brings a small loss of model effect but greatly reduces machine cost. In addition, listwise training of the model may be considered. Meanwhile, since interaction between features is not fully considered in the double-tower structure of fig. 11A, the model structure can be explored further, for example by introducing an attention mechanism to learn the importance of features.
According to the information recommendation method provided by the embodiment of the present application, on the basis of the coarse ranking result of advertisements, high-quality creatives are selected from multi-creative advertisements for exposure and display. A tLTR model is proposed based on the idea of LTR + Group; the model fits the eCPM or pCTCVR order of each creative in the fine ranking stage, thereby creating a personalized, extensible, and automatic creative selection and optimization strategy, realizing score prediction and ranking of multiple creatives within one advertisement, optimizing the recommendation effect of the advertisement, and further improving the utilization rate of advertisement resources.
Continuing with the description below of an exemplary structure of the information recommendation device 243 provided by the embodiments of the present application implemented as a software module, in some embodiments, as shown in fig. 4A, the software module stored in the information recommendation device 243 of the memory 240 may include: an acquisition module 2431, a prediction module 2432, a determination module 2433, and a recommendation module 2434.
An obtaining module 2431, configured to obtain at least one piece of multi-mode information to be sent to the object, where each piece of multi-mode information includes a plurality of presentation policies; the obtaining module 2431 is further configured to obtain display policy feature information and object feature information of an object corresponding to each of the plurality of display policies; the prediction module 2432 is configured to invoke a neural network model to perform prediction processing based on the feature information of each display policy and the feature information of the object, so as to obtain a ranking parameter corresponding to each display policy, where the neural network model is obtained by training based on ranking results of a plurality of applied display policy samples; a determining module 2433, configured to determine a target display policy corresponding to each piece of multi-mode information, where the target display policy is a display policy with a largest sorting parameter in the multi-mode information; the recommendation module 2434 is configured to perform a recommendation operation for the object based on at least one piece of multi-mode information to which the corresponding target presentation policy is respectively applied.
In some embodiments, the prediction module 2432 is further configured to, for each piece of display policy feature information, invoke the neural network model to perform the following processing: vectorizing the display policy feature information and the object feature information to correspondingly obtain a display policy feature vector of the display policy and an object feature vector of the object; encoding the display policy feature vector and the object feature vector to obtain a hidden layer feature vector; and activating the hidden layer feature vector to obtain the ranking parameter of the display policy.
In some embodiments, the encoding process is implemented by invoking at least one hidden layer of the neural network model, where the plurality of hidden layers are cascaded in turn to form a tower network when the number of at least one hidden layer is multiple; the prediction module 2432 is further configured to splice the display policy feature vector and the object feature vector to obtain a spliced vector; and the method is used for calling the tower network to encode the spliced vector so as to obtain the hidden layer feature vector.
In some embodiments, the acquiring module 2431 is further configured to acquire a data amount of at least one piece of multi-mode information; the information recommending apparatus 243 further includes a shift module 2435 for shifting to execute a step of calling the tower network to perform the encoding process based on the presentation policy feature vector and the object feature vector in response to the data amount being smaller than the data amount threshold.
In some embodiments, the prediction module 2432 is further configured to encode the spliced vector through a first hidden layer of the tower network; inputting the coding result output by the first hidden layer into the hidden layer of the subsequent cascade connection, and continuing to carry out coding processing through the hidden layer of the subsequent cascade connection until the last hidden layer; and determining the coding result output by the last hidden layer as a hidden layer characteristic vector.
In some embodiments, the encoding processing is implemented by invoking at least one hidden layer of the neural network model, and when the number of the at least one hidden layer is multiple, the plurality of hidden layers is divided into two groups: the plurality of hidden layers of the first group are sequentially cascaded to form a first tower network, and the plurality of hidden layers of the second group are sequentially cascaded to form a second tower network. The prediction module 2432 is further configured to perform the following processing for each display policy feature vector: invoking the first tower network to encode the object feature vector to obtain a first sub-hidden-layer feature vector; invoking the second tower network to encode the display policy feature vector to obtain a second sub-hidden-layer feature vector; and performing dot product processing on the first sub-hidden-layer feature vector and the second sub-hidden-layer feature vector to obtain the hidden layer feature vector.
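The two structures of fig. 11A and 11B can be contrasted in a short numpy sketch; all shapes and weights below are illustrative assumptions, not values from the embodiment:

```python
import numpy as np

def tower(x, weights):
    """One tower: a stack of ReLU hidden layers, as in the hidden-layer equations."""
    h = x
    for W, b in weights:
        h = np.maximum(0.0, h @ W + b)
    return h

rng = np.random.default_rng(1)
user_vec = rng.normal(size=8)      # object (user) feature vector
creative_vec = rng.normal(size=8)  # display-policy (creative) feature vector

user_tower = [(rng.normal(size=(8, 4)), np.zeros(4))]
creative_tower = [(rng.normal(size=(8, 4)), np.zeros(4))]

# Double-tower (fig. 11A): encode each side separately, then take the dot
# product of the two encodings to get the score.
score_two_tower = tower(user_vec, user_tower) @ tower(creative_vec, creative_tower)

# Single-tower (fig. 11B): splice the two vectors first, then encode the
# spliced vector with one tower and reduce it to a scalar score.
single_tower = [(rng.normal(size=(16, 4)), np.zeros(4))]
score_single = tower(np.concatenate([user_vec, creative_vec]), single_tower).sum()

print(round(float(score_two_tower), 4), round(float(score_single), 4))
```

The double-tower form allows creative-side encodings to be precomputed and cached, at the cost of weaker feature interaction between the two sides.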
In some embodiments, the acquiring module 2431 is further configured to acquire the data amount of the at least one piece of multi-mode information; the shift module 2435 is further configured to shift to execute the steps of respectively calling the first tower network and the second tower network to perform the encoding processing based on the object feature vector and the display policy feature vector, in response to the data amount being greater than or equal to the data amount threshold.
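The data-volume routing described above (single tower below the threshold, double tower at or above it) amounts to a simple dispatch; the threshold value and names here are hypothetical:

```python
DATA_VOLUME_THRESHOLD = 10_000  # hypothetical threshold

def choose_structure(num_items: int) -> str:
    """Route small batches to the single-tower network (richer feature
    interaction) and large batches to the double-tower network (faster,
    since per-side encodings can be computed independently)."""
    return "single_tower" if num_items < DATA_VOLUME_THRESHOLD else "double_tower"

print(choose_structure(500))     # single_tower
print(choose_structure(50_000))  # double_tower
```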
In some embodiments, the prediction module 2432 is further configured to perform one-hot encoding processing on the display policy feature information to obtain a first one-hot encoding vector, and to perform word embedding processing on the first one-hot encoding vector to obtain a display policy feature vector of the display policy; the prediction module 2432 is further configured to perform one-hot encoding processing on the object feature information to obtain a second one-hot encoding vector, and to perform word embedding processing on the second one-hot encoding vector to obtain an object feature vector of the object.
In some embodiments, the input layer of the neural network model includes a plurality of sub-input layers corresponding to a plurality of domains, respectively; a determining module 2433, configured to determine a domain to which the object belongs; the prediction module 2432 is further configured to invoke a sub-input layer corresponding to the domain to perform one-hot encoding processing on the object feature information, so as to obtain a second one-hot encoding vector.
In some embodiments, the obtaining module 2431 is further configured to obtain a display policy ranking result corresponding to each of the plurality of multimode information samples, where the display policy ranking result is obtained by ranking based on recommended parameters of the display policy samples in the multimode information samples; the information recommending apparatus 243 further includes a training module 2436 configured to sample a plurality of presentation strategy samples based on the presentation strategy sequencing result; carrying out combination treatment on the plurality of display strategy samples in pairs to obtain a plurality of display strategy sample pairs; comparing recommended parameters of two display strategy samples included in each display strategy sample pair; generating label data of each showing strategy sample pair according to the comparison result; the neural network model is trained based on the plurality of presentation policy sample pairs and the corresponding tag data.
In some embodiments, the information recommendation apparatus 243 further includes a generation module 2437 for performing, for each presentation policy sample pair, the following: when the recommended parameters of the first showing strategy sample in the showing strategy sample pair are larger than those of the second showing strategy sample, determining the label data of the showing strategy sample pair as 1; when the recommended parameter of the first display strategy sample in the display strategy sample pair is smaller than that of the second display strategy sample, the label data of the display strategy sample pair is determined to be 0.
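The pairwise label generation described above can be sketched as follows; the sample records and field names are hypothetical, with "param" standing in for the recommended parameter (e.g. eCPM) of each display policy sample:

```python
def make_pair_labels(samples):
    """Pair up display-policy samples and label each pair by comparing
    their recommended parameters: 1 if the first sample's parameter is
    larger, 0 if it is smaller."""
    pairs = []
    for i in range(len(samples)):
        for j in range(i + 1, len(samples)):
            pi, pj = samples[i]["param"], samples[j]["param"]
            if pi == pj:
                continue  # ties carry no ranking signal, so skip them
            pairs.append((samples[i]["id"], samples[j]["id"], 1 if pi > pj else 0))
    return pairs

samples = [{"id": "a", "param": 3.2}, {"id": "b", "param": 1.5}, {"id": "c", "param": 2.8}]
print(make_pair_labels(samples))
# [('a', 'b', 1), ('a', 'c', 1), ('b', 'c', 0)]
```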
In some embodiments, the determining module 2433 is further configured to determine a proportion of the samples based on a training speed of the neural network model, wherein the proportion is inversely related to the training speed; the training module 2436 is further configured to sample the display policy ranking result corresponding to each multimode information sample according to a proportion, so as to obtain a plurality of display policy samples.
In some embodiments, the information recommendation device 243 further includes a setting module 2438 for setting a mirror network of a tower network in the neural network model prior to training the neural network model based on the plurality of presentation policy sample pairs and the corresponding tag data, wherein the tower network is for receiving presentation policy feature information of one presentation policy sample in the presentation policy sample pairs, and the mirror network is for receiving presentation policy feature information of another presentation policy sample in the presentation policy sample pairs; the setup module 2438 is further configured to remove the mirrored network from the neural network model after training the neural network model based on the plurality of presentation policy sample pairs and the corresponding tag data.
In some embodiments, the setting module 2438 is further configured to set a mirror network of a second tower network in the neural network model prior to training the neural network model based on the plurality of presentation policy sample pairs and the corresponding tag data, wherein the second tower network is configured to receive presentation policy feature information of one presentation policy sample in the presentation policy sample pairs, and the mirror network is configured to receive presentation policy feature information of another presentation policy sample in the presentation policy sample pairs; and for removing the mirrored network from the neural network model after training the neural network model based on the plurality of presentation policy sample pairs and the corresponding tag data.
In some embodiments, the obtaining module 2431 is further configured to obtain multi-mode information that satisfies at least one of the following conditions: matching with object characteristic information of the object; the similarity between the object feature vector and the object is smaller than a similarity threshold; sorting the pieces of multi-mode information based on the predicted click rate; and screening at least one piece of multi-mode information with the top ranking from the ranking result.
In some embodiments, when the number of at least one piece of multi-mode information is a plurality of pieces, the information recommending apparatus 243 further includes a ranking module 2439 for ranking the plurality of pieces of multi-mode information to which the corresponding target presentation policy is applied, respectively, according to any one of the following ranking policies: sorting the pieces of multi-mode information respectively applied with the corresponding target display strategies according to the information types; randomly sequencing a plurality of pieces of multi-mode information to which corresponding target display strategies are respectively applied; and ordering the pieces of multi-mode information respectively applied with the corresponding target display strategies according to the operation rules.
It should be noted that, the description of the apparatus according to the embodiment of the present application is similar to the description of the embodiment of the method described above, and has similar beneficial effects as the embodiment of the method, so that a detailed description is omitted. The technical details of the information recommendation device provided in the embodiment of the present application may be understood according to the description of any one of fig. 6, fig. 7, or fig. 9.
Continuing with the description below of an exemplary architecture of the training apparatus 244 for information recommendation neural network models provided by embodiments of the present application implemented as software modules, in some embodiments, as shown in fig. 4B, the software modules stored in the training apparatus 244 for information recommendation neural network models of the memory 240 may include: an acquisition module 2441, a sampling module 2442, an execution module 2443, and a generation module 2444.
The acquiring module 2441 is configured to acquire display policy ranking results corresponding to the multiple multimode information samples, where the display policy ranking results are obtained by ranking based on recommended parameters of the display policy samples in the multimode information samples; a sampling module 2442, configured to sample a plurality of presentation policy samples based on the presentation policy ranking result; an execution module 2443 to invoke the initialized neural network model to perform a training task based on the plurality of presentation policy samples to update parameters of the neural network model; a generation module 2444 is configured to generate a trained neural network model based on the updated parameters.
In some embodiments, the execution module 2443 is further configured to perform pairwise combination processing on the plurality of presentation policy samples to obtain a plurality of presentation policy sample pairs; to compare the recommended parameters of the two presentation policy samples included in each presentation policy sample pair and generate label data of each presentation policy sample pair according to the comparison result; and to invoke the initialized neural network model to execute a first training task based on the plurality of presentation policy sample pairs and the corresponding label data, where the first training task includes: invoking the initialized neural network model to perform prediction processing based on the presentation policy samples, and updating the parameters of the neural network model based on the difference between the predicted comparison result and the label data.
It should be noted that, the description of the apparatus according to the embodiment of the present application is similar to the description of the embodiment of the method described above, and has similar beneficial effects as the embodiment of the method, so that a detailed description is omitted. The technical details of the training device for neural network model for information recommendation provided in the embodiment of the present application may be understood from the description of fig. 5.
Embodiments of the present application provide a computer program product comprising a computer program or computer-executable instructions stored in a computer-readable storage medium. The processor of the computer device reads the computer-executable instructions from the computer-readable storage medium, and executes the computer-executable instructions, so that the computer device performs the information recommendation method or the training method for the neural network model for information recommendation according to the embodiment of the present application.
An embodiment of the present application provides a computer-readable storage medium storing computer-executable instructions which, when executed by a processor, cause the processor to perform the information recommendation method provided by the embodiments of the present application, for example, the information recommendation method shown in fig. 6, fig. 7, or fig. 9, or to perform the training method for a neural network model for information recommendation provided by the embodiments of the present application, for example, the training method for a neural network model for information recommendation shown in fig. 5.
In some embodiments, the computer-readable storage medium may be an FRAM, ROM, PROM, EPROM, EEPROM, flash memory, magnetic surface memory, optical disc, or CD-ROM, or may be various devices including one of or any combination of the foregoing memories.
In some embodiments, the executable instructions may be in the form of programs, software modules, scripts, or code, written in any form of programming language (including compiled or interpreted languages, or declarative or procedural languages), and they may be deployed in any form, including as stand-alone programs or as modules, components, subroutines, or other units suitable for use in a computing environment.
As an example, executable instructions may be deployed to be executed on one electronic device or on multiple electronic devices located at one site or, alternatively, on multiple electronic devices distributed across multiple sites and interconnected by a communication network.
The foregoing is merely exemplary embodiments of the present application and is not intended to limit the scope of the present application. Any modification, equivalent replacement, improvement, etc. made within the spirit and scope of the present application are included in the protection scope of the present application.

Claims (23)

1. An information recommendation method, the method comprising:
acquiring at least one piece of multi-mode information to be sent to an object, wherein each piece of multi-mode information comprises a plurality of display strategies;
acquiring display strategy characteristic information and object characteristic information of the object, wherein the display strategy characteristic information and the object characteristic information correspond to the display strategies respectively;
based on the feature information of each display strategy and the feature information of the object, invoking a neural network model to conduct prediction processing to obtain a sequencing parameter corresponding to each display strategy, wherein the neural network model is obtained by training based on sequencing results of a plurality of applied display strategy samples;
determining a target display strategy corresponding to each piece of multi-mode information, wherein the target display strategy is the display strategy with the largest sorting parameter in the multi-mode information;
and performing a recommendation operation for the object based on the at least one piece of multi-mode information to which the corresponding target presentation policy is respectively applied.
2. The method of claim 1, wherein the invoking a neural network model to perform prediction processing based on the feature information of each display strategy and the feature information of the object to obtain a sequencing parameter corresponding to each display strategy comprises:
And for each presentation strategy characteristic information, calling the neural network model to execute the following processing:
vectorizing the display strategy feature information and the object feature information respectively to correspondingly obtain a display strategy feature vector of the display strategy and an object feature vector of the object;
encoding the display strategy feature vector and the object feature vector to obtain a hidden layer feature vector;
and activating the hidden layer feature vector to obtain the ordering parameters of the display strategy.
3. The method of claim 2, wherein
the coding processing is realized by calling at least one hidden layer of the neural network model, and when the number of the at least one hidden layer is a plurality of hidden layers, the hidden layers are sequentially cascaded to form a tower network;
the encoding processing is performed on the display strategy feature vector and the object feature vector to obtain a hidden layer feature vector, which comprises the following steps:
performing splicing processing on the display strategy feature vector and the object feature vector to obtain a spliced vector;
and calling the tower network to encode the spliced vector to obtain a hidden layer feature vector.
4. A method according to claim 3, wherein before encoding the presentation policy feature vector and the object feature vector to obtain a hidden layer feature vector, the method further comprises:
acquiring the data volume of the at least one piece of multi-mode information;
and responding to the data quantity being smaller than a data quantity threshold, and turning to the step of calling the tower network to carry out coding processing based on the display strategy feature vector and the object feature vector.
5. A method according to claim 3, wherein said invoking said tower network to encode said stitched vector results in a hidden layer feature vector comprises:
encoding the spliced vector through a first hidden layer of the tower network;
inputting the coding result output by the first hidden layer to a hidden layer of a subsequent cascade connection, and continuing coding processing through the hidden layer of the subsequent cascade connection until the last hidden layer;
and determining the coding result output by the last hidden layer as the hidden layer feature vector.
6. The method of claim 2, wherein
The coding processing is realized by calling at least one hidden layer of the neural network model, when the number of the at least one hidden layer is multiple, the hidden layers are divided into two groups, the hidden layers of the first group are sequentially cascaded to form a first tower network, and the hidden layers of the second group are sequentially cascaded to form a second tower network;
the encoding processing is performed on the display strategy feature vector and the object feature vector to obtain a hidden layer feature vector, which comprises the following steps:
the following processing is performed for each of the presentation policy feature vectors:
invoking the first tower network to encode the object feature vector to obtain a first sub-hidden layer feature vector;
invoking the second tower network to encode the display strategy feature vector to obtain a second sub-hidden layer feature vector;
and carrying out point multiplication processing on the first sub-hidden layer feature vector and the second sub-hidden layer feature vector to obtain a hidden layer feature vector.
7. The method of claim 6, wherein prior to encoding the presentation policy feature vector and the object feature vector to obtain a hidden layer feature vector, the method further comprises:
Acquiring the data volume of the at least one piece of multi-mode information;
and responding to the data quantity being greater than or equal to a data quantity threshold, switching to execute a step of respectively calling the first tower network and the second tower network to carry out coding processing based on the object feature vector and the display strategy feature vector.
8. The method of claim 2, wherein
the vectorizing processing is performed on the display policy feature information and the object feature information respectively, so as to obtain a display policy feature vector of the display policy and an object feature vector of the object correspondingly, which comprise:
performing one-hot encoding processing on the display strategy characteristic information to obtain a first one-hot encoding vector; and performing word embedding processing on the first one-hot encoding vector to obtain a display strategy feature vector of the display strategy;
performing one-hot encoding processing on the object characteristic information to obtain a second one-hot encoding vector; and performing word embedding processing on the second one-hot encoding vector to obtain an object feature vector of the object.
9. The method of claim 8, wherein
the input layer of the neural network model comprises a plurality of sub-input layers respectively corresponding to a plurality of fields;
The performing the one-hot encoding processing on the object feature information to obtain a second one-hot encoding vector, including:
determining the domain to which the object belongs;
and calling the sub-input layer corresponding to the field to perform one-hot coding processing on the object characteristic information to obtain a second one-hot coding vector.
10. The method of any of claims 1-9, wherein prior to invoking the neural network model for predictive processing, the method further comprises:
acquiring display strategy sequencing results respectively corresponding to a plurality of multi-mode information samples, wherein the display strategy sequencing results are obtained by sequencing based on recommended parameters of the display strategy samples in the multi-mode information samples;
sampling based on the display strategy sequencing result to obtain a plurality of display strategy samples;
carrying out combination treatment on the plurality of display strategy samples in pairs to obtain a plurality of display strategy sample pairs;
comparing recommended parameters of the two display strategy samples included in each display strategy sample pair, and generating label data of each display strategy sample pair according to a comparison result;
training the neural network model based on the plurality of presentation strategy sample pairs and the corresponding tag data.
11. The method of claim 10, wherein generating tag data for each of the presentation policy sample pairs based on the comparison result comprises:
for each of the presentation policy sample pairs, performing the following:
when the recommended parameters of a first display strategy sample in the display strategy sample pair are larger than those of a second display strategy sample, determining the label data of the display strategy sample pair as 1;
and when the recommended parameters of the first display strategy sample in the display strategy sample pair are smaller than those of the second display strategy sample, determining the label data of the display strategy sample pair as 0.
12. The method of claim 10, wherein the sampling a plurality of presentation policy samples based on the presentation policy ranking results comprises:
determining a sampling proportion according to the training speed of the neural network model, wherein the proportion is inversely related to the training speed;
and sampling the display strategy sequencing results corresponding to each multimode information sample according to the proportion to obtain a plurality of display strategy samples.
13. The method of claim 10, wherein
When the neural network model includes a plurality of hidden layers and a plurality of hidden layers are sequentially cascaded to form a tower network, before training the neural network model based on the plurality of presentation policy sample pairs and the corresponding tag data, the method further includes:
setting a mirror network of the tower network in the neural network model, wherein the tower network is used for receiving the display policy characteristic information of one display policy sample in the display policy sample pair, and the mirror network is used for receiving the display policy characteristic information of the other display policy sample in the display policy sample pair;
after training the neural network model based on the plurality of presentation strategy sample pairs and the corresponding tag data, the method further comprises:
the mirror network is removed from the neural network model.
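Claims 13 and 14 set up a siamese-style arrangement: a mirror of the tower network is added for pairwise training, the tower and its mirror each receive one sample of the pair, and the mirror is removed after training. A toy sketch with scalar "layers" — the class names and the weight-sharing mechanism are illustrative assumptions, not the claimed implementation:

```python
class Tower:
    """Sequentially cascaded hidden layers, reduced here to scalar ops."""
    def __init__(self, weights):
        self.weights = weights           # per-layer weights (shared list)
    def forward(self, x):
        for w in self.weights:
            x = max(0.0, w * x)          # toy scalar "layer" with ReLU
        return x

class PairwiseModel:
    def __init__(self, layer_weights):
        self.tower = Tower(layer_weights)
        self.mirror = Tower(layer_weights)  # mirror shares the same weights
    def score_pair(self, feat_a, feat_b):
        # tower receives one sample of the pair, the mirror the other
        return self.tower.forward(feat_a) - self.mirror.forward(feat_b)
    def strip_mirror(self):
        # after training, the mirror is removed; inference keeps the tower
        self.mirror = None
        return self.tower
```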
14. The method of claim 10, wherein when the plurality of hidden layers included in the neural network model are divided into two groups, the plurality of hidden layers of the first group are sequentially cascaded to form a first tower network, and the plurality of hidden layers of the second group are sequentially cascaded to form a second tower network, before training the neural network model based on the plurality of display strategy sample pairs and the corresponding label data, the method further includes:
setting a mirror network of the second tower network in the neural network model, wherein the second tower network is used for receiving the display policy characteristic information of one display strategy sample in the display strategy sample pair, and the mirror network is used for receiving the display policy characteristic information of the other display strategy sample in the display strategy sample pair;
after training the neural network model based on the plurality of presentation strategy sample pairs and the corresponding tag data, the method further comprises:
the mirror network is removed from the neural network model.
15. The method according to any one of claims 1 to 9, wherein the obtaining at least one piece of multi-mode information to be sent to the object comprises:
acquiring multi-mode information satisfying at least one of the following conditions: the multi-mode information matches the object feature information of the object; the similarity between the feature vector of the multi-mode information and the object is smaller than a similarity threshold;
and sorting the pieces of multi-mode information based on predicted click rates, and selecting at least one top-ranked piece of multi-mode information from the sorting result.
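Claim 15 filters candidates by two alternative conditions and then ranks by predicted click rate. A sketch under assumed field names (`matches`, `similarity`, `ctr` are illustrative, not from the claim):

```python
def select_candidates(items, sim_threshold, top_k):
    """Keep multi-mode items meeting either condition (feature match, or
    similarity below the threshold), then sort by predicted click rate
    and return the top-ranked ones."""
    eligible = [item for item in items
                if item["matches"] or item["similarity"] < sim_threshold]
    return sorted(eligible, key=lambda item: item["ctr"], reverse=True)[:top_k]
```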
16. The method according to any one of claims 1 to 9, wherein when the number of the at least one piece of multi-mode information is a plurality of pieces, before performing a recommendation operation for the object based on the at least one piece of multi-mode information to which the corresponding target presentation policy is respectively applied, the method further includes:
sorting the pieces of multi-mode information to which the corresponding target presentation policies are applied, respectively, according to any one of the following sorting policies:
sorting a plurality of pieces of multi-mode information to which the corresponding target display strategies are respectively applied according to information types;
randomly sequencing a plurality of pieces of multi-mode information respectively applied with the corresponding target display strategies;
and sorting the pieces of multi-mode information respectively applied with the corresponding target display strategies according to operation rules.
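Claim 16's three alternative ordering policies can be sketched as a single dispatch; the field names `info_type` and `op_priority` are assumptions, and what the "operation rules" actually compare is left to the operator:

```python
import random

def order_items(items, policy, seed=None):
    """Order multi-mode information per one of the three claimed policies:
    by information type, randomly, or by operation rules."""
    if policy == "by_type":
        return sorted(items, key=lambda item: item["info_type"])
    if policy == "random":
        shuffled = list(items)
        random.Random(seed).shuffle(shuffled)
        return shuffled
    if policy == "operation_rules":
        return sorted(items, key=lambda item: item["op_priority"])
    raise ValueError(f"unknown sorting policy: {policy}")
```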
17. A method for training a neural network model for information recommendation, the method comprising:
acquiring display strategy sorting results respectively corresponding to a plurality of multi-mode information samples, wherein the display strategy sorting results are obtained by sorting based on recommended parameters of the display strategy samples in the multi-mode information samples;
sampling based on the display strategy sorting results to obtain a plurality of display strategy samples;
invoking the initialized neural network model to execute a training task based on the plurality of display strategy samples to update parameters of the neural network model;
and generating the trained neural network model based on the updated parameters.
18. The method of claim 17, wherein invoking the initialized neural network model to execute the training task based on the plurality of display strategy samples to update the parameters of the neural network model comprises:
combining the plurality of display strategy samples in pairs to obtain a plurality of display strategy sample pairs;
comparing the recommended parameters of the two display strategy samples included in each display strategy sample pair, and generating label data of each display strategy sample pair according to the comparison result;
invoking the initialized neural network model to execute a first training task based on the plurality of display strategy sample pairs and the corresponding label data; wherein the first training task comprises: invoking the initialized neural network model to perform prediction processing based on the display strategy sample pairs, and updating the parameters of the neural network model based on the difference between the predicted comparison result and the label data.
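The first training task of claim 18 — predict a comparison result per pair, then update from the difference to the label — can be sketched with a toy linear scorer. Both the model (score = w · feature) and the loss (binary cross-entropy on the sigmoid of the score difference, a common RankNet-style choice) are assumptions; the claim mandates neither:

```python
import math

def pairwise_update(w, labeled_pairs, lr=0.1):
    """One pass over labeled pairs: predict the comparison result and
    nudge the single parameter w toward the label."""
    for (feat_a, feat_b), label in labeled_pairs:
        diff = w * feat_a - w * feat_b           # predicted score gap
        p = 1.0 / (1.0 + math.exp(-diff))        # P(sample a ranks above b)
        grad = (p - label) * (feat_a - feat_b)   # d(BCE)/dw
        w -= lr * grad
    return w
```

With label 1 and a larger first feature, w moves up so the model ranks the first sample higher; with label 0 it moves down.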
19. An information recommendation device, characterized in that the device comprises:
the acquisition module is used for acquiring at least one piece of multi-mode information to be sent to an object, wherein each piece of multi-mode information includes a plurality of display strategies;
the acquisition module is further used for acquiring display strategy characteristic information respectively corresponding to the display strategies, and object characteristic information of the object;
the prediction module is used for calling a neural network model to perform prediction processing based on the characteristic information of each display strategy and the object characteristic information to obtain a sorting parameter corresponding to each display strategy, wherein the neural network model is trained based on sorting results of a plurality of applied display strategy samples;
the determining module is used for determining a target display strategy corresponding to each piece of multi-mode information, wherein the target display strategy is the display strategy with the largest sorting parameter in the multi-mode information;
and the recommending module is used for executing a recommendation operation for the object based on the at least one piece of multi-mode information to which the corresponding target display strategy is respectively applied.
20. A training apparatus for a neural network model for information recommendation, the apparatus comprising:
the acquisition module is used for acquiring display strategy sorting results respectively corresponding to a plurality of multi-mode information samples, wherein the display strategy sorting results are obtained by sorting based on recommended parameters of the display strategy samples in the multi-mode information samples;
the sampling module is used for sampling to obtain a plurality of display strategy samples based on the display strategy sorting results;
the execution module is used for calling the initialized neural network model to execute a training task based on the plurality of display strategy samples so as to update parameters of the neural network model;
and the generating module is used for generating the trained neural network model based on the updated parameters.
21. An electronic device, the electronic device comprising:
a memory for storing executable instructions;
a processor for implementing the information recommendation method of any one of claims 1 to 16 or the training method of the neural network model for information recommendation of any one of claims 17 to 18 when executing the executable instructions stored in the memory.
22. A computer readable storage medium storing computer executable instructions which, when executed by a processor, implement the method of information recommendation according to any one of claims 1 to 16 or the method of training a neural network model for information recommendation according to any one of claims 17 to 18.
23. A computer program product comprising a computer program or computer executable instructions which, when executed by a processor, implement the information recommendation method of any one of claims 1 to 16 or the training method of the neural network model for information recommendation of any one of claims 17 to 18.
CN202211222984.9A 2022-10-08 2022-10-08 Information recommendation method and training method for neural network model for information recommendation Pending CN117009912A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211222984.9A CN117009912A (en) 2022-10-08 2022-10-08 Information recommendation method and training method for neural network model for information recommendation

Publications (1)

Publication Number Publication Date
CN117009912A true CN117009912A (en) 2023-11-07

Family

ID=88566083


Similar Documents

Publication Publication Date Title
WO2021159776A1 (en) Artificial intelligence-based recommendation method and apparatus, electronic device, and storage medium
CN111538912B (en) Content recommendation method, device, equipment and readable storage medium
Sun et al. Conversational recommender system
CN110941740B (en) Video recommendation method and computer-readable storage medium
CN110781321B (en) Multimedia content recommendation method and device
CN111475730A (en) Information recommendation method and device based on artificial intelligence and electronic equipment
CN111966914B (en) Content recommendation method and device based on artificial intelligence and computer equipment
WO2014160344A1 (en) Scoring concept terms using a deep network
CN111651692A (en) Information recommendation method and device based on artificial intelligence and electronic equipment
CN111241394B (en) Data processing method, data processing device, computer readable storage medium and electronic equipment
CN112508609A (en) Crowd expansion prediction method, device, equipment and storage medium
Grolman et al. Utilizing transfer learning for in-domain collaborative filtering
CN112269943B (en) Information recommendation system and method
WO2024067779A1 (en) Data processing method and related apparatus
CN114817692A (en) Method, device and equipment for determining recommended object and computer storage medium
CN116956183A (en) Multimedia resource recommendation method, model training method, device and storage medium
CN117009912A (en) Information recommendation method and training method for neural network model for information recommendation
CN108038739A (en) A kind of method and system that extending user is determined according to the statistics degree of association
CN114996435A (en) Information recommendation method, device, equipment and storage medium based on artificial intelligence
CN114282528A (en) Keyword extraction method, device, equipment and storage medium
CN112035740A (en) Project use duration prediction method, device, equipment and storage medium
CN112749335B (en) Lifecycle state prediction method, lifecycle state prediction apparatus, computer device, and storage medium
CN113538030B (en) Content pushing method and device and computer storage medium
Izquierdo Enfedaque Deep Reinforcement Learning in Recommender Systems
CN116150464A (en) Media information recommendation method, device, apparatus, program and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
REG Reference to a national code

Ref country code: HK

Ref legal event code: DE

Ref document number: 40099001

Country of ref document: HK