CN114519593A - Resource recall model updating method and device, electronic equipment and storage medium


Info

Publication number
CN114519593A
Authority
CN
China
Prior art keywords
resource
click
user
sample
resources
Prior art date
Legal status
Pending
Application number
CN202011288307.8A
Other languages
Chinese (zh)
Inventor
肖严
赵惜墨
李俊杰
Current Assignee
Beijing Dajia Internet Information Technology Co Ltd
Original Assignee
Beijing Dajia Internet Information Technology Co Ltd
Application filed by Beijing Dajia Internet Information Technology Co Ltd
Priority to CN202011288307.8A
Publication of CN114519593A

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06Q: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q 30/00: Commerce
    • G06Q 30/02: Marketing; Price estimation or determination; Fundraising
    • G06Q 30/0241: Advertisements
    • G06Q 30/0251: Targeted advertisements
    • G06Q 30/0255: Targeted advertisements based on user history
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00: Pattern recognition
    • G06F 18/20: Analysing
    • G06F 18/21: Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/214: Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06F 18/22: Matching criteria, e.g. proximity measures
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/04: Architecture, e.g. interconnection topology
    • G06N 3/045: Combinations of networks
    • G06N 3/08: Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Business, Economics & Management (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Accounting & Taxation (AREA)
  • Development Economics (AREA)
  • Strategic Management (AREA)
  • Finance (AREA)
  • Game Theory and Decision Science (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Economics (AREA)
  • Marketing (AREA)
  • General Business, Economics & Management (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The present disclosure relates to a resource recall model updating method and apparatus, an electronic device, and a storage medium. The method includes: acquiring a resource click log corresponding to the current model updating period; determining a click resource pool according to the resource click log, where a click resource represents a resource clicked by a user in the time period between the current model updating period and the previous model updating period; selecting a first click resource in the click resource pool, taking the first click resource as a positive sample resource, and determining a corresponding negative sample resource from the click resources other than the first click resource; generating training samples in a training sample set according to the positive sample resource, the corresponding negative sample resource, and the user; and optimizing parameters in the resource recall model by using a preset loss function according to the training sample set to obtain an updated resource recall model. The method and the apparatus solve the problem of poor resource recall accuracy caused by the resource recall model being unable to sufficiently learn, from the training data, useful information matched with the resource recall process.

Description

Resource recall model updating method and device, electronic equipment and storage medium
Technical Field
The present disclosure relates to the field of computer technologies, and in particular, to a method and an apparatus for updating a resource recall model, an electronic device, and a storage medium.
Background
At present, the business processing flow of the advertisement system of each media platform generally includes a targeting stage, a recall stage, a sorting stage, and a display stage. In the targeting stage, the advertisement system receives targeting information set by an advertiser for the delivered advertisements, such as crowd targeting, user information, and region targeting; the targeting information helps the advertiser better define its target users. In the recall stage, the advertisement system performs a preliminary screening of the advertisements in the advertisement library according to the advertisement push request. In the sorting stage, the advertisement system sorts the preliminarily screened advertisements and further screens out the advertisements to be finally delivered according to the sorting result. In the display stage, the advertisement system issues the screening results of the sorting stage to the user client for advertisement exposure. It can be seen from the business processing flow of the advertisement system that the recall stage serves the subsequent sorting stage: the recall accuracy directly affects the accuracy of the candidate set to be sorted, and thus critically affects the accuracy of the advertisements finally pushed to the user.
In the related art, the advertisement system usually uses a trained advertisement recall model to recall advertisements in the recall stage, and the advertisement recall model generally adopts online streaming training. In the training data used, a positive sample is an advertisement clicked by the user, and a negative sample is an advertisement displayed to the user but not clicked. Under normal conditions, the number of negative samples is far greater than the number of positive samples, so the proportion of positive and negative samples is unbalanced and the feedback of the positive samples is easily submerged in the feedback of the negative samples. As a result, the advertisement recall model cannot fully learn the useful information in the training data, and the accuracy of advertisement recall is poor.
Disclosure of Invention
The present disclosure provides a resource recall model updating method and apparatus, an electronic device, and a storage medium, to at least solve the problem in the related art that a resource recall model for resources such as advertisements cannot sufficiently learn useful information in the training data, resulting in poor resource recall accuracy. The technical solutions of the present disclosure are as follows:
according to a first aspect of the embodiments of the present disclosure, there is provided a resource recall model updating method, including:
acquiring a resource click log corresponding to the current model updating period;
determining a click resource pool according to the resource click log, wherein the click resource pool comprises at least one click resource, the click resource represents a resource clicked by a user in a first historical time period, and the first historical time period is a time period between a current model updating period and a previous model updating period;
selecting a first click resource in the click resource pool, taking the first click resource as a positive sample resource, and determining a negative sample resource corresponding to the positive sample resource from the remaining click resources; the first click resource is any click resource in the click resource pool, and the rest click resources are click resources except the first click resource in the click resource pool;
generating a training sample in a training sample set according to a positive sample resource, a negative sample resource corresponding to the positive sample resource and a user clicking the positive sample resource;
optimizing parameters in the resource recall model by using a preset loss function according to the training sample set to obtain an updated resource recall model; the preset loss function is determined based on a degree of matching between a user and a positive sample resource in the training sample and a degree of matching between the user and a negative sample resource.
In an exemplary embodiment, the generating a training sample in a training sample set according to a positive sample resource, a negative sample resource corresponding to the positive sample resource, and a user clicking the positive sample resource includes:
determining a user clicking the positive sample resource, and acquiring the user characteristics of the user;
determining a positive sample resource characteristic of the positive sample resource;
determining negative sample resource characteristics of negative sample resources corresponding to the positive sample resources;
and forming training samples in a training sample set by using the user characteristics, the positive sample resource characteristics and the negative sample resource characteristics.
In an exemplary embodiment, the optimizing the parameters in the resource recall model using the preset loss function according to the training sample set includes:
for each training sample in the training sample set, inputting the user characteristic, the positive sample resource characteristic and the negative sample resource characteristic included in the training sample into the resource recall model;
obtaining a user feature vector according to the user features through a user neural network of the resource recall model;
respectively obtaining a positive sample resource vector and a negative sample resource vector according to the positive sample resource characteristics and the negative sample resource characteristics through a resource neural network of the resource recall model;
determining a first degree of match between the user feature vector and the positive sample resource vector, and determining a second degree of match between the user feature vector and the negative sample resource vector;
and optimizing parameters in the resource recall model by a preset loss function based on the first matching degree and the second matching degree.
In an exemplary embodiment, the preset loss function includes a first difference term, a second difference term, and a maximum-value term;
the optimizing, by the preset loss function, the parameters in the resource recall model based on the first matching degree and the second matching degree includes:
determining, through the first difference term, a first difference between the second matching degree and the first matching degree;
determining, through the second difference term, a second difference between the first difference and a preset fixed value;
taking, through the maximum-value term, the maximum of the second difference and zero as the function value of the preset loss function;
and adjusting parameters in the resource recall model according to the direction of minimizing the function value until a training end condition is met.
In an exemplary embodiment, the method further comprises:
responding to a resource pushing request of a target user, and determining a candidate resource set matched with the target user according to the targeting information of each resource in a resource library;
determining the target user characteristics of the target user and the candidate resource characteristics of each candidate resource in the candidate resource set;
inputting the target user characteristics and the candidate resource characteristics of the candidate resources into the updated resource recall model to obtain the matching degree between the target user and the candidate resources;
determining at least one target candidate resource from the candidate resource set according to the matching degree between the target user and each candidate resource;
and determining a return result corresponding to the resource pushing request according to the at least one target candidate resource, wherein the return result comprises the target resource to be displayed.
In an exemplary embodiment, after determining a return result corresponding to the resource pushing request according to the at least one target candidate resource, the method further includes:
responding to the click operation of the target user on the target resource, and acquiring a user identifier of the target user and a resource identifier of the target resource;
and generating a resource click log corresponding to the next model updating period according to the user identifier of the target user and the resource identifier of the target resource.
According to a second aspect of the embodiments of the present disclosure, there is provided a resource recall model updating apparatus including:
the click log obtaining unit is configured to execute obtaining of a resource click log corresponding to the current model updating period;
a first determining unit configured to perform determining a click resource pool according to the resource click log, where the click resource pool includes at least one click resource, the click resource represents a resource clicked by a user within a first historical time period, and the first historical time period is a time period between a current model update cycle and a previous model update cycle;
the negative sampling unit is configured to select a first click resource in the click resource pool, determine a negative sample resource corresponding to the positive sample resource from the remaining click resources by taking the first click resource as the positive sample resource; the first click resource is any click resource in the click resource pool, and the rest click resources are click resources except the first click resource in the click resource pool;
the training data generation unit is configured to execute generation of training samples in a training sample set according to positive sample resources, negative sample resources corresponding to the positive sample resources and users clicking the positive sample resources;
the model updating unit is configured to optimize parameters in the resource recall model by using a preset loss function according to the training sample set to obtain an updated resource recall model; the preset loss function is determined based on a degree of matching between a user and a positive sample resource in the training sample and a degree of matching between the user and a negative sample resource.
In an exemplary embodiment, the training data generating unit includes:
a first determining unit configured to perform determining a user clicking the positive sample resource, and obtain a user characteristic of the user;
a second determination unit configured to perform determining a positive sample resource characteristic of the positive sample resource;
a third determining unit configured to perform determining a negative sample resource characteristic of a negative sample resource corresponding to the positive sample resource;
a generating subunit configured to perform forming a training sample in a training sample set with the user feature, the positive sample resource feature, and the negative sample resource feature.
In an exemplary embodiment, the model updating unit includes:
an input unit configured to perform, for each training sample in the set of training samples, inputting a user characteristic, a positive sample resource characteristic, and a negative sample resource characteristic included in the training sample into a resource recall model;
a first network unit configured to obtain a user feature vector according to the user characteristic through the user neural network of the resource recall model;
a second network unit configured to obtain a positive sample resource vector and a negative sample resource vector according to the positive sample resource characteristic and the negative sample resource characteristic, respectively, through the resource neural network of the resource recall model;
a matching unit configured to perform determining a first degree of matching between the user feature vector and the positive sample resource vector, and determining a second degree of matching between the user feature vector and the negative sample resource vector;
an optimization unit configured to optimize, through the preset loss function, the parameters in the resource recall model based on the first matching degree and the second matching degree.
In an exemplary embodiment, the preset loss function includes a first difference term, a second difference term, and a maximum-value term; the optimization unit includes:
a fourth determination unit configured to perform determination of a first difference between the second matching degree and the first matching degree by the first difference term;
a fifth determination unit configured to perform determination of a second difference between the first difference and a preset fixed value by the second difference term;
a sixth determining unit configured to perform taking a maximum value of the second difference value and a value zero by the maximum value term, and taking the maximum value as a function value of the preset loss function;
and the parameter adjusting unit is configured to adjust the parameters in the resource recall model according to the direction of minimizing the function value until a training end condition is met.
In an exemplary embodiment, the apparatus further comprises:
the request response unit is configured to respond to a resource pushing request of a target user, and determine a candidate resource set matched with the target user according to the targeting information of each resource in the resource library;
a seventh determining unit configured to perform determining a target user characteristic of the target user and a candidate resource characteristic of each candidate resource in the candidate resource set;
a matching degree prediction unit configured to input the target user characteristics and the candidate resource characteristics of the candidate resources into the updated resource recall model to obtain matching degrees between the target user and the candidate resources;
an eighth determining unit configured to perform determining at least one target candidate resource from the candidate resource set according to a matching degree between the target user and each candidate resource;
a ninth determining unit, configured to perform determining, according to the at least one target candidate resource, a return result corresponding to the resource pushing request, where the return result includes a target resource to be displayed.
In an exemplary embodiment, the apparatus further comprises:
the acquisition unit is configured to respond to a click operation of the target user on the target resource, and acquire a user identifier of the target user and a resource identifier of the target resource;
and the click log generation unit is configured to generate a resource click log corresponding to the next model updating period according to the user identifier of the target user and the resource identifier of the target resource.
According to a third aspect of the embodiments of the present disclosure, there is provided an electronic apparatus including:
a processor;
a memory for storing the processor-executable instructions;
wherein the processor is configured to execute the instructions to implement the resource recall model updating method according to any of the embodiments described above.
According to a fourth aspect of the embodiments of the present disclosure, there is provided a storage medium, where instructions, when executed by a processor of an electronic device, enable the electronic device to perform the resource recall model updating method according to any one of the foregoing embodiments.
According to a fifth aspect of embodiments of the present disclosure, there is provided a computer program product including computer instructions stored in a computer-readable storage medium. A processor of an electronic device reads the computer instructions from the computer-readable storage medium and executes them, so that the electronic device performs the resource recall model updating method provided in any one of the above embodiments.
the technical scheme provided by the embodiment of the disclosure at least brings the following beneficial effects:
all the resources clicked by the user in the time period between the current model updating period and the last model updating period form a clicked resource pool, the clicked resource pool is sampled negatively, any clicked resource in the clicked resource pool is taken as a positive sample resource, a negative sample resource corresponding to the positive sample resource is determined from the rest clicked resources, and combines the user clicking the positive sample resource to generate the training sample in the training sample set, so that the proportion of the positive sample and the proportion of the negative sample used for training are balanced, and then parameters in the resource recall model are optimized by using a preset loss function according to the training sample set to obtain an updated resource recall model, the resource recall model can fully learn useful information which is matched with the resource recall process in the training data, and therefore the accuracy of resource recall based on the resource recall model is improved.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and, together with the description, serve to explain the principles of the disclosure and are not to be construed as limiting the disclosure.
FIG. 1 is a diagram of an application environment illustrating a resource recall model update method in accordance with an exemplary embodiment;
FIG. 2 is a flowchart illustrating a resource recall model update method in accordance with an exemplary embodiment;
FIG. 3 is a schematic diagram illustrating a training update to a resource recall model in accordance with an exemplary embodiment;
FIG. 4 is a flowchart illustrating another resource recall model update method in accordance with an exemplary embodiment;
FIG. 5 is a block diagram illustrating a resource recall model update apparatus in accordance with an exemplary embodiment;
FIG. 6 is a block diagram illustrating an electronic device in accordance with an example embodiment.
Detailed Description
In order to make the technical solutions of the present disclosure better understood, the technical solutions in the embodiments of the present disclosure will be clearly and completely described below with reference to the accompanying drawings.
It should be noted that the terms "first," "second," and the like in the description and claims of the present disclosure and in the above-described drawings are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the disclosure described herein are capable of operation in sequences other than those illustrated or otherwise described herein. The implementations described in the exemplary embodiments below are not intended to represent all implementations consistent with the present disclosure. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the present disclosure, as detailed in the appended claims.
Referring to fig. 1, a diagram of an application environment of a resource recall model updating method according to an exemplary embodiment is shown, where the application environment may include a server 110 and a plurality of terminals 120, and the server 110 and the plurality of terminals 120 may be connected through a wired network or a wireless network.
The plurality of terminals 120 may be, but are not limited to, smart phones, tablet computers, notebook computers, desktop computers, and the like. The terminal 120 may have installed therein client software, such as an Application program (App for short), which provides corresponding service functions, and the service functions may include, but are not limited to, a live broadcast function and a short video broadcast function, that is, the Application program may include, but is not limited to, a live App and a short video App. The user of the terminal 120 may log into the application through pre-registered user information, which may include an account number and a password.
The server 110 may be a server that provides a background service for the application program in the terminal 120, may also be another server that is connected and communicated with the background server of the application program, may be one server, or may be a server cluster composed of multiple servers. The server 110 may provide a resource pushing service, for example, when a user logs in to an application or refreshes a page of the application, the server 110 may push resources such as advertisements to the user.
In one application scenario, the server 110 may be a server in an advertisement system for providing an advertisement recall service. When providing the advertisement recall service, the server 110 may recall advertisements in the advertisement system based on an advertisement recall model, and the advertisement recall model may be updated by using an online streaming training manner according to an embodiment of the present disclosure. A model update period may be set for the online streaming training update process, and when the model update period is reached, the server 110 performs a training update on the advertisement recall model. For example, the server 110 may perform a training update on the advertisement recall model every 15 minutes. Taking the time period from 9:30 to 10:00 as an example, the server 110 performs a training update on the advertisement recall model at 9:30, and the next model update period is 9:45; when the time reaches 9:45, the server 110 performs a training update on the advertisement recall model that was updated at 9:30 in the previous model update period, and similarly, when the time reaches 10:00, the server 110 performs a training update on the advertisement recall model that was updated at 9:45 in the previous model update period.
It is understood that the resources in the embodiments of the present disclosure are not limited to advertisements, and may also include other resources that can be pushed to the user of the application, for example, news, videos, and the like, and the embodiments of the present disclosure are not limited in this respect.
The resource recall model updating method according to the embodiment of the present disclosure is described in detail below by taking a model updating period, i.e., a current model updating period, as an example.
FIG. 2 is a flowchart illustrating a resource recall model updating method according to an exemplary embodiment. As shown in FIG. 2, the method is applied to the server 110 shown in FIG. 1 and includes the following steps.
In step S210, a resource click log corresponding to the current model update period is obtained.
In practical applications, after the server pushes resources to a user of an application program and the resources are displayed through the corresponding terminal, the user can choose to click or not click the displayed resources according to his or her interest in them. When the user clicks a displayed resource, the server can respond to the click operation of the user on the resource, acquire the user identifier of the user and the resource identifier of the clicked resource, and generate a resource click log based on the user identifier and the resource identifier of the clicked resource.
The server can store the generated resource click logs. Since the server trains and updates the resource recall model according to the set model update period in the embodiment of the present disclosure, the server can store the resource click logs by time period, with the resource click logs received in the time period between two adjacent model update periods stored as one batch. For example, if the time period between two adjacent model update periods is 9:30 to 9:45, the server stores the resource click logs received between 9:30 and 9:45 as one batch. When the current model update period is reached, for example 9:45, the server may obtain the resource click log corresponding to the current model update period, that is, the resource click log of the time period between the current model update period and the previous model update period, for example 9:30 to 9:45.
In a specific implementation, the server may set an update timer, and trigger the server to perform training update on the resource recall model when the current model update period is reached through a timing function of the update timer, for example, the update timer may be set to trigger the server to execute the above-mentioned action of acquiring the resource click log corresponding to the current model update period every 15 minutes, so as to start training update on the resource recall model.
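For illustration only, and not as part of the disclosed embodiments, the periodic trigger described above can be sketched in Python roughly as follows; the 15-minute period is the example value from the text, and the helper functions fetch_click_logs_for_current_period and update_resource_recall_model are hypothetical placeholders:

    import threading

    UPDATE_PERIOD_SECONDS = 15 * 60  # example 15-minute model update period

    def on_model_update_period():
        # Acquire the resource click logs of the elapsed period and start a training update.
        click_logs = fetch_click_logs_for_current_period()  # hypothetical helper
        update_resource_recall_model(click_logs)            # hypothetical helper
        # Re-arm the timer so the update runs again at the next model update period.
        threading.Timer(UPDATE_PERIOD_SECONDS, on_model_update_period).start()

    # Start the first cycle.
    threading.Timer(UPDATE_PERIOD_SECONDS, on_model_update_period).start()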
In step S220, a clicked resource pool is determined according to the resource click log.
The click resource pool comprises at least one click resource, the click resource represents a resource clicked by a user in a first historical time period, and the first historical time period is a time period between a current model updating period and a last model updating period.
Specifically, the server may obtain the resource identifier in the resource click log, search for the resource in the resource library matching the resource identifier, and place the resource in the resource library matching the resource identifier as a click resource in the click resource pool. All the resources to be pushed are stored in the resource library, and the resource library can be located locally in the server or in the distributed database system.
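As a minimal sketch of this step (assuming, for illustration, that each log record is a dictionary with a user identifier and a resource identifier and that the resource library is a mapping from resource identifier to resource record), the click resource pool might be assembled as follows:

    def build_click_resource_pool(click_logs, resource_library):
        # click_logs: iterable of records like {"user_id": ..., "resource_id": ...} (assumed layout)
        # resource_library: mapping resource_id -> resource record (assumed layout)
        click_resource_pool = []
        for log in click_logs:
            resource = resource_library.get(log["resource_id"])
            if resource is not None:
                # keep the clicking user together with the clicked resource for later sampling
                click_resource_pool.append({"user_id": log["user_id"], "resource": resource})
        return click_resource_pool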
In step S230, a first click resource in the click resource pool is selected, and a negative sample resource corresponding to the positive sample resource is determined from the remaining click resources by using the first click resource as the positive sample resource.
The first click resource is any click resource in the click resource pool, and the rest click resources are click resources except the first click resource in the click resource pool.
Take a click resource pool (i_1, i_2, ..., i_N) as an example, where i denotes a click resource. For any click resource i_k (1 <= k <= N), the server takes the click resource i_k as a positive sample resource i_k^+ and determines, from the remaining click resources (i_1, ..., i_{k-1}, i_{k+1}, ..., i_N), a negative sample resource i_k^- corresponding to the positive sample resource i_k^+. Specifically, one of the remaining click resources (i_1, ..., i_{k-1}, i_{k+1}, ..., i_N) may be randomly selected as the negative sample resource i_k^- corresponding to the positive sample resource i_k^+, thereby obtaining a positive sample resource-negative sample resource pair (i_k^+, i_k^-), where i_k^+ = i_k and i_k^- belongs to {i_1, ..., i_{k-1}, i_{k+1}, ..., i_N}. Based on the above sampling process, the following pairs can then be obtained for the click resource pool (i_1, i_2, ..., i_N): (i_1^+, i_1^-), (i_2^+, i_2^-), ..., (i_N^+, i_N^-).
therefore, the proportion of the positive sample resource and the negative sample resource obtained after sampling treatment is balanced, the positive sample resource and the negative sample resource are both resources clicked by a user in the model updating period, and the situation that the feedback of the positive sample is submerged by the feedback of the negative sample in the subsequent training can be avoided.
In step S240, a training sample in the training sample set is generated according to the positive sample resource, the negative sample resource corresponding to the positive sample resource, and the user clicking the positive sample resource.
In one possible embodiment, the step of generating, by the server, the training samples in the training sample set according to the user, the positive sample resource clicked by the user, and the negative sample resource corresponding to the positive sample resource may include the following steps:
(1) and determining the user clicking the positive sample resource, and acquiring the user characteristics of the user.
Specifically, the server may obtain the user identifier corresponding to each click resource from the resource click log, where the user identifier is used to identify the user who clicked the click resource, and obtain a user portrait through the user identifier. The user portrait is used to outline the user at the data level and may specifically be composed of a plurality of user tags; in the context of the big data era, user information is abundant on the network, and the user portrait abstracts this information into tags. The server may directly determine the user portrait corresponding to the user clicking the positive sample resource as the user characteristic of the user, or may determine one or more user tags in the user portrait as the user characteristic of the user.
(2) And determining the positive sample resource characteristics of the positive sample resources.
(3) And determining the negative sample resource characteristics of the negative sample resources corresponding to the positive sample resources.
In the embodiment of the present disclosure, the resource characteristics of each resource may include, but are not limited to, a resource category, resource orientation information, a resource price, a resource pushing record, and the like, and the server may correspondingly store and update the resource characteristics of each resource in real time for each resource in the resource library. The server may determine one or more of resource features (such as resource category, resource orientation information, resource price, resource pushing record, and the like) corresponding to the positive sample resource as the positive sample resource feature of the positive sample resource; similarly, the server may determine one or more of the resource characteristics (such as resource category, resource pushing condition, resource price, resource pushing record, and the like) corresponding to the negative sample resource as the negative sample resource characteristics.
(4) And forming a training sample in the training sample set by using the user characteristics, the positive sample resource characteristics and the corresponding negative sample resource characteristics.
In particular, with ukRepresenting click positive sample resources
Figure BDA0002783080530000099
User of (1), F (u)k) Representing user ukThe user ukClicked positive sample resource
Figure BDA0002783080530000101
Is represented as
Figure BDA0002783080530000102
The positive sample resource
Figure BDA0002783080530000103
Corresponding negative sample resources
Figure BDA0002783080530000104
Is represented as
Figure BDA0002783080530000105
The server may compose a training sample
Figure BDA0002783080530000106
The training sample is a triplet. Then, for the click resource pool (i)1,i2,...,iN) The generated training sample set may be represented as:
Figure BDA0002783080530000107
according to the embodiment of the disclosure, for each positive sample resource, the user characteristics of the user clicking the positive sample resource are obtained, the positive sample resource characteristics of the positive sample resource and the negative sample resource characteristics of the negative sample resource corresponding to the positive sample resource construct the training sample with a triple structure, so that the useful information in the training sample set can be fully learned during subsequent model training, and the model parameters can be adjusted more accurately.
In step S250, parameters in the resource recall model are optimized by using a preset loss function according to the training sample set, so as to obtain an updated resource recall model.
The preset loss function is determined based on the matching degree between the user and the positive sample resource in the training sample and the matching degree between the user and the negative sample resource. The preset loss function can optimize the order between the positive sample resource and the negative sample resource; that is, the updated resource recall model obtained by optimizing the parameters in the resource recall model with the preset loss function according to the training sample set can learn the order relationship in the training samples. Since such an order relationship is also involved in resource recall, the updated resource recall model better matches the substantive process of resource recall, which helps the resource recall model fully learn the useful information in the training data that matches the actual recall process.
In the embodiment of the present disclosure, as shown in fig. 3, the resource recall model may include a user neural network, a resource neural network and a matching node, where the user neural network is configured to perform vector expression on user characteristics, the resource neural network is configured to perform vector expression on positive sample resource characteristics and negative sample resource characteristics, and the positive sample resource characteristics and the negative sample resource characteristics share one resource neural network. The output ends of the user neural network and the resource neural network are respectively connected with a matching node, and the matching node is used for calculating the matching degree between the user and the positive sample resource and the matching degree between the user and the negative sample resource.
The user neural network and the resource neural network can map the input features into the same k-dimensional embedding space. Illustratively, the user neural network and the resource neural network can be DNN (Deep Neural Network) networks; each DNN network includes an input layer, hidden layers and an output layer, and the layers are fully connected, that is, any neuron of the i-th layer is connected to any neuron of the (i+1)-th layer.
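The two-tower structure of FIG. 3 (a user neural network and a shared resource neural network, each a fully connected DNN mapping its input into the same k-dimensional embedding space, with a matching node that may use cosine similarity) can be sketched, for example, with PyTorch. The layer widths, the embedding dimension of 64 and the ReLU activation are illustrative assumptions, not values from the disclosure:

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class ResourceRecallModel(nn.Module):
        def __init__(self, user_dim, resource_dim, embed_dim=64):  # embed_dim = k (assumed value)
            super().__init__()
            # user neural network: maps the user characteristic to a k-dimensional user feature vector q_k
            self.user_net = nn.Sequential(
                nn.Linear(user_dim, 128), nn.ReLU(),
                nn.Linear(128, embed_dim),
            )
            # resource neural network: shared by positive and negative sample resource characteristics
            self.resource_net = nn.Sequential(
                nn.Linear(resource_dim, 128), nn.ReLU(),
                nn.Linear(128, embed_dim),
            )

        def forward(self, user_feat, pos_feat, neg_feat):
            q = self.user_net(user_feat)          # user feature vector
            v_pos = self.resource_net(pos_feat)   # positive sample resource vector
            v_neg = self.resource_net(neg_feat)   # negative sample resource vector
            # matching node: cosine similarity between the user vector and each resource vector
            first_match = F.cosine_similarity(q, v_pos, dim=-1)
            second_match = F.cosine_similarity(q, v_neg, dim=-1)
            return first_match, second_match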
Based on this, the server, when optimizing the parameters in the resource recall model using the preset loss function according to the training sample set, may include the following steps:
(1) and inputting the user characteristics, the positive sample resource characteristics and the negative sample resource characteristics included by the training samples into the resource recall model aiming at each training sample in the training sample set.
(2) And obtaining a user feature vector according to the user features through a user neural network of the resource recall model.
(3) And respectively obtaining a positive sample resource vector and a negative sample resource vector according to the positive sample resource characteristics and the negative sample resource characteristics through a resource neural network of the resource recall model.
(4) And determining a first matching degree between the user characteristic vector and the positive sample resource vector and a second matching degree between the user characteristic vector and the negative sample resource vector through a matching neural network of the resource recall model.
Taking the training sample (F(u_k), F(i_k^+), F(i_k^-)) as an example, and continuing to refer to FIG. 3, after the resource recall model receives the input training sample, the user characteristic F(u_k) in the training sample is input into the user neural network, and the positive sample resource characteristic F(i_k^+) and the negative sample resource characteristic F(i_k^-) in the training sample are input into the resource neural network. The vector expression of the user characteristic F(u_k), namely the user feature vector q_k, is obtained through the user neural network; the vector expression of the positive sample resource characteristic F(i_k^+), namely the positive sample resource vector v_k^+, and the vector expression of the negative sample resource characteristic F(i_k^-), namely the negative sample resource vector v_k^-, are obtained through the resource neural network. The above vector expression process can be expressed as:

q_k = FC(F(u_k); θ_1)

v_k^+ = FC(F(i_k^+); θ_2)

v_k^- = FC(F(i_k^-); θ_2)

where FC denotes a fully connected layer, θ_1 denotes the parameters to be optimized in the user neural network, and θ_2 denotes the parameters to be optimized in the resource neural network.
The user feature vector q_k, the positive sample resource vector v_k^+ and the negative sample resource vector v_k^- serve as inputs to the matching node, and the matching node calculates a first matching degree between the user feature vector q_k and the positive sample resource vector v_k^+ and a second matching degree between the user feature vector q_k and the negative sample resource vector v_k^-. The matching degree may be characterized, for example, by a cosine similarity.
(5) And the preset loss function optimizes parameters in the resource recall model based on the first matching degree and the second matching degree.
In the embodiment of the present disclosure, the parameters to be optimized in the resource recall model include the parameter θ_1 to be optimized in the user neural network and the parameter θ_2 to be optimized in the resource neural network. Through the vector expression of the user characteristic by the user neural network and the vector expression of the positive sample resource characteristic and the negative sample resource characteristic by the resource neural network, a first matching degree representing the matching degree between the user and the positive sample resource and a second matching degree representing the matching degree between the user and the negative sample resource are obtained based on the vector expression results. The preset loss function can therefore optimize the parameters of the user neural network and the resource neural network in the resource recall model based on the first matching degree and the second matching degree to obtain the updated resource recall model.
Specifically, the preset loss function may include a first difference term, a second difference term, and a maximum-value term. When the preset loss function optimizes the parameters in the resource recall model based on the first matching degree and the second matching degree, a first difference between the second matching degree and the first matching degree is calculated through the first difference term, a second difference between the first difference and a preset fixed value is calculated through the second difference term, and the maximum of the second difference and zero is taken through the maximum-value term and used as the function value of the preset loss function; the parameters in the resource recall model are then adjusted in the direction of minimizing the function value until a training end condition is satisfied. For example, denoting the first matching degree by s_k^+ and the second matching degree by s_k^-, the preset loss function may be expressed as:

Loss(θ) = max(0, (s_k^- - s_k^+) - m)

where θ denotes the parameters of the resource recall model, including θ_1 and θ_2 defined above; s_k^+ denotes the first matching degree; s_k^- denotes the second matching degree; s_k^- - s_k^+ denotes the first difference; (s_k^- - s_k^+) - m denotes the second difference; and m denotes the preset fixed value, whose specific value can be set according to actual needs, for example a value in the range of 0.2 to 0.3. The preset loss function Loss allows the error between the positive sample resource and the negative sample resource to be controlled between 0 and m. By optimizing the parameters in the resource recall model based on the first matching degree and the second matching degree through the preset loss function constructed from the first difference term, the second difference term and the maximum-value term, the resource recall model can more fully learn the order relationship among the sample resources in the training sample set during updating, so that the updated resource recall model has better resource recall accuracy.
Illustratively, a gradient descent method may be employed in adjusting the parameters in the resource recall model in a direction that minimizes the function value. The training end condition may be, but is not limited to, the number of iterations reaching a preset number threshold, for example, the preset number threshold may be 100 times.
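A short PyTorch training sketch consistent with the description above: the hinge-style loss takes the maximum of zero and the difference built from the two matching degrees and the preset fixed value m, and the parameters are adjusted by gradient descent. Because the original expression appears only as an image in the publication, the sign convention inside max(0, ·) follows the textual description and should be treated as an assumption; m = 0.25 is an assumed value within the stated 0.2 to 0.3 range, and 100 iterations is the example threshold mentioned in the text.

    import torch

    def preset_loss(first_match, second_match, m=0.25):
        # first difference: second matching degree minus first matching degree
        first_diff = second_match - first_match
        # second difference: first difference minus the preset fixed value m
        second_diff = first_diff - m
        # maximum-value term: max of the second difference and zero
        return torch.clamp(second_diff, min=0).mean()

    def train_update(model, samples, epochs=100, lr=0.01):
        optimizer = torch.optim.SGD(model.parameters(), lr=lr)  # plain gradient descent
        for _ in range(epochs):  # training end condition: iteration count threshold
            for user_feat, pos_feat, neg_feat in samples:
                first_match, second_match = model(user_feat, pos_feat, neg_feat)
                loss = preset_loss(first_match, second_match)
                optimizer.zero_grad()
                loss.backward()
                optimizer.step()
        return model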
In the embodiment of the present disclosure, all the resources clicked by the user in the time period between the current model updating period and the previous model updating period form a click resource pool, and negative sampling is performed on the click resource pool: any click resource in the click resource pool is taken as a positive sample resource, a negative sample resource corresponding to the positive sample resource is determined from the remaining click resources, and a training sample in the training sample set is generated in combination with the user clicking the positive sample resource, so that the proportion of positive and negative samples used for training is balanced. Parameters in the resource recall model are then optimized by using the preset loss function according to the training sample set, which prevents the feedback of the positive samples from being submerged by the feedback of the negative samples during training, improves the training of the resource recall model, and enables the resource recall model to fully learn the useful information in the training data that matches the resource recall process, thereby further improving the accuracy of resource recall by the updated resource recall model.
Fig. 4 is a flowchart illustrating another resource recall model updating method according to an exemplary embodiment, where as shown in fig. 4, after the parameters in the resource recall model are optimized by using a preset loss function according to the training sample set in step S250 to obtain an updated resource recall model, the method may further include:
in step S410, in response to the resource pushing request of the target user, a candidate resource set matching the target user is determined according to the orientation information of each resource in the resource library.
The targeting information refers to the resource pushing condition corresponding to a resource; that is, a resource is used as a resource to be recalled only when the resource pushing condition set for it is met. Taking advertisements as an example, the targeting information of an advertisement may include user age, user gender, region information, and the like. The targeting information of a resource can be set by the resource delivering party, and after acquiring the targeting information set by the resource delivering party, the resource pushing system stores the targeting information of each resource accordingly. The target user may be any user who triggers a resource pushing request.
Specifically, the target user may trigger the resource pushing instruction to send the resource pushing request to the server when logging in the application program or refreshing the page of the application program. Correspondingly, the server receives the resource pushing request, responds to the resource pushing request, searches resources matched with the target user according to the directional information of the resources in the resource library, and the resources matched with the target user form a candidate resource set.
In step S420, a target user characteristic of the target user and a candidate resource characteristic of each candidate resource in the candidate resource set are determined.
Specifically, the resource pushing request may carry a user identifier of a target user, the server may obtain a target user representation of the target user based on the user identifier of the target user, and determine a target user characteristic of the target user based on one or more user tags included in the target user representation, where the user tags may include, but are not limited to, age, gender, region, interest resource category, members, and the like. For example, the server may directly determine the target user representation as a target user characteristic of the target user, or may determine one or more user tags in the target user representation as a target user characteristic of the target user.
The server may determine a corresponding candidate resource characteristic based on the resource characteristics of each candidate resource in the candidate resource set, where the candidate resource characteristic may include, but is not limited to, a resource category, resource orientation information, a resource price, a resource push record, and the like.
In step S430, the target user characteristic and the candidate resource characteristic of each candidate resource are input into the updated resource recall model, so as to obtain the matching degree between the target user and each candidate resource.
The matching degree can represent the probability that the target user clicks the corresponding candidate resource, generally, the larger the matching degree is, the larger the probability that the target user clicks the corresponding candidate resource is, and otherwise, the smaller the matching degree is, the smaller the probability that the target user clicks the corresponding candidate resource is.
In step S440, at least one target candidate resource is determined from the candidate resource set according to the matching degree between the target user and each candidate resource.
Specifically, the candidate resources in the candidate resource set may be sorted in a descending order according to the matching degree, and the N top-ranked candidate resources may be recalled as target candidate resources.
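At serving time, steps S420 to S440 amount to scoring each candidate resource against the target user with the updated model and keeping the N best-matching ones. The following sketch reuses the assumed two-tower model from the earlier block; the default n = 50 and the tensor layouts are illustrative assumptions:

    import torch
    import torch.nn.functional as F

    def recall_top_n(model, target_user_feat, candidate_feats, candidate_ids, n=50):
        # candidate_feats: tensor of features of the candidate resources matched by the targeting information
        with torch.no_grad():
            q = model.user_net(target_user_feat.unsqueeze(0))  # target user feature vector
            v = model.resource_net(candidate_feats)            # candidate resource vectors
            match = F.cosine_similarity(q, v, dim=-1)          # matching degree per candidate
        # descending sort by matching degree and keep the top-N target candidate resources
        top = torch.topk(match, k=min(n, len(candidate_ids)))
        return [candidate_ids[i] for i in top.indices.tolist()]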
In step S450, a return result corresponding to the resource pushing request is determined according to the at least one target candidate resource, where the return result includes a target resource to be displayed.
Specifically, in step S450, the recalled target candidate resources may be further screened in combination with other screening conditions to obtain the finally pushed target resource, and the other screening conditions may be set as needed in practical applications. Taking advertisements as an example, another screening condition may be the product of the matching degree of a target candidate resource and the advertiser's bid for the advertisement, and the target candidate resource with the largest product is returned to the target user as the target resource to be displayed.
The updated resource recall model fully learns the useful information which is matched with the resource recall process in the training data in the last model updating period, so that the accuracy of the updated resource recall model on follow-up resource recall is improved, and the accuracy of the resources pushed to the user is improved.
In a specific embodiment, as shown in fig. 4, after determining, by the server, a return result corresponding to the resource pushing request according to the at least one target candidate resource, the method may further include:
and S460, responding to the click operation of the target user on the target resource, and acquiring the user identifier of the target user and the resource identifier of the target resource.
S470, according to the user identification of the target user and the resource identification of the target resource, generating a resource click log corresponding to the next model updating period.
Specifically, after the server pushes the target resource to the target user and the target resource is displayed through the corresponding terminal, the target user can choose to click or not click the target resource according to his or her interest in it. After the target user clicks the target resource, the server can respond to the click operation of the target user on the target resource, obtain the user identifier of the target user and the resource identifier of the target resource, and generate a resource click log corresponding to the next model updating period based on the user identifier of the target user and the resource identifier of the target resource. When the next model update period arrives, the server may repeat the foregoing steps S210 to S250 of the embodiment of the present disclosure according to the resource click log generated as described above, so as to update the resource recall model again.
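Finally, the click feedback that feeds the next model update period could be recorded with a sketch like the following; the log layout matches the earlier sketches, and the click_log_store backend is a hypothetical abstraction:

    import time

    def on_target_resource_clicked(target_user_id, target_resource_id, click_log_store):
        # Build a resource click log entry from the user identifier and the resource identifier.
        entry = {
            "user_id": target_user_id,
            "resource_id": target_resource_id,
            "timestamp": time.time(),  # used to assign the entry to the next model update period
        }
        click_log_store.append(entry)  # hypothetical store read back at the next update period
        return entry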
FIG. 5 is a block diagram illustrating a resource recall model updating apparatus in accordance with an exemplary embodiment. Referring to fig. 5, the apparatus includes a click log obtaining unit 510, a first determining unit 520, a negative sampling unit 530, a training data generating unit 540, and a model updating unit 550.
The click log obtaining unit 510 is configured to perform obtaining of a resource click log corresponding to the current model update cycle;
the first determining unit 520 is configured to perform determining a click resource pool according to the resource click log, where the click resource pool includes at least one click resource, the click resource represents a resource clicked by a user within a first historical time period, and the first historical time period is a time period between a current model update cycle and a previous model update cycle;
the negative sampling unit 530 is configured to select a first click resource in the click resource pool, determine a negative sample resource corresponding to the positive sample resource from the remaining click resources by using the first click resource as the positive sample resource; the first click resource is any click resource in the click resource pool, and the rest click resources are click resources except the first click resource in the click resource pool;
the training data generating unit 540 is configured to execute generating a training sample in a training sample set according to the positive sample resource, the negative sample resource corresponding to the positive sample resource, and the user clicking the positive sample resource;
the model updating unit 550 is configured to perform optimization on parameters in the resource recall model by using a preset loss function according to the training sample set, so as to obtain an updated resource recall model; the preset loss function is determined based on the matching degree between the user and the positive sample resource in the training sample and the matching degree between the user and the negative sample resource.
In an exemplary embodiment, the training data generating unit 540 may include:
a first determining unit configured to determine the user who clicked the positive sample resource, and acquire the user characteristics of the user;
a second determining unit configured to perform determination of a positive sample resource characteristic of the positive sample resource;
a third determining unit configured to perform determining a negative sample resource characteristic of a negative sample resource corresponding to the positive sample resource;
and a generating subunit configured to form a training sample in the training sample set from the user characteristics, the positive sample resource characteristics, and the negative sample resource characteristics.
In an exemplary embodiment, the model updating unit 550 may include:
an input unit configured to perform, for each training sample in the training sample set, inputting a user feature, a positive sample resource feature, and a negative sample resource feature included in the training sample into the resource recall model;
the first network unit is configured to execute a user neural network passing through the resource recall model, and a user feature vector is obtained according to the user features;
the second network unit is configured to execute the resource neural network passing through the resource recall model, and obtain a positive sample resource vector and a negative sample resource vector according to the positive sample resource characteristics and the negative sample resource characteristics respectively;
a matching unit configured to perform determining a first degree of matching between the user feature vector and the positive sample resource vector, and determining a second degree of matching between the user feature vector and the negative sample resource vector;
and the optimization unit is configured to optimize the parameters in the resource recall model by using the preset loss function based on the first matching degree and the second matching degree.
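The units above describe a two-tower scoring step: a user neural network embeds the user features, a resource neural network embeds the positive and negative resource features, and the two matching degrees are computed between the user vector and each resource vector. The PyTorch sketch below is one possible realization; the layer sizes and the use of an inner product as the matching degree are assumptions, since the disclosure does not fix either.

```python
# Hedged sketch of the two-tower scoring: the user network produces the user feature
# vector, the resource network produces the positive and negative resource vectors,
# and the matching degree is taken here as the inner product of user and resource vectors.
import torch
import torch.nn as nn


class TwoTowerRecallModel(nn.Module):
    def __init__(self, user_dim: int, resource_dim: int, embed_dim: int = 64):
        super().__init__()
        self.user_net = nn.Sequential(nn.Linear(user_dim, 128), nn.ReLU(), nn.Linear(128, embed_dim))
        self.resource_net = nn.Sequential(nn.Linear(resource_dim, 128), nn.ReLU(), nn.Linear(128, embed_dim))

    def forward(self, user_feat, pos_feat, neg_feat):
        user_vec = self.user_net(user_feat)              # user feature vector
        pos_vec = self.resource_net(pos_feat)            # positive sample resource vector
        neg_vec = self.resource_net(neg_feat)            # negative sample resource vector
        first_match = (user_vec * pos_vec).sum(dim=-1)   # first matching degree (positive)
        second_match = (user_vec * neg_vec).sum(dim=-1)  # second matching degree (negative)
        return first_match, second_match
```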
In an exemplary embodiment, the preset loss function includes a first difference term, a second difference term, and a maximum value term; the optimization unit may include:
a fourth determination unit configured to perform determination of a first difference between the second matching degree and the first matching degree by the first difference term;
a fifth determining unit configured to perform determining a second difference between the first difference and a preset fixed value by the second difference term;
a sixth determining unit configured to take, through the maximum value term, the maximum of the second difference and zero, and use the maximum as the function value of the preset loss function;
and the parameter adjusting unit is configured to adjust the parameters in the resource recall model in the direction that minimizes the function value until a training end condition is met.
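Read literally, the four units compute a hinge-style function value: max((second matching degree − first matching degree) − c, 0) for a preset fixed value c, minimized over the training samples. The sketch below follows that literal reading; the batch averaging and the sign convention of the fixed value are assumptions (with a negative c the expression reduces to the familiar pairwise margin loss max(0, margin + s_neg − s_pos)).

```python
# Hedged sketch of the preset loss function, following the literal wording above:
# first difference  = second_match - first_match
# second difference = first difference - fixed_value
# function value    = max(second difference, 0), averaged over the batch here.
import torch


def preset_loss(first_match: torch.Tensor, second_match: torch.Tensor, fixed_value: float) -> torch.Tensor:
    first_difference = second_match - first_match           # first difference term
    second_difference = first_difference - fixed_value      # second difference term
    return torch.clamp(second_difference, min=0.0).mean()   # maximum value term


# Illustrative training step: minimize the function value with any optimizer, e.g.
#   loss = preset_loss(first_match, second_match, fixed_value=-0.2)
#   loss.backward(); optimizer.step()
```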
In an exemplary embodiment, the apparatus may further include:
the request response unit is configured to execute a resource pushing request responding to a target user, and determine a candidate resource set matched with the target user according to the orientation information of each resource in the resource library;
a seventh determining unit configured to perform determining a target user characteristic of the target user and a candidate resource characteristic of each candidate resource in the candidate resource set;
the matching degree prediction unit is configured to input the target user characteristics and the candidate resource characteristics of each candidate resource into the updated resource recall model to obtain the matching degree between the target user and each candidate resource;
an eighth determining unit configured to perform determining at least one target candidate resource from the candidate resource set according to a matching degree between the target user and each candidate resource;
and the ninth determining unit is configured to determine a return result corresponding to the resource pushing request according to the at least one target candidate resource, where the return result includes a target resource to be displayed.
In an exemplary embodiment, the apparatus may further include:
the acquisition unit is configured to execute click operation of a target user on a target resource, and acquire a user identifier of the target user and a resource identifier of the target resource;
and the click log generation unit is configured to execute resource click log generation corresponding to the next model updating period according to the user identification of the target user and the resource identification of the target resource.
With regard to the apparatus in the above-described embodiment, the specific manner in which each module performs the operation has been described in detail in the embodiment related to the method, and will not be elaborated here.
In an exemplary embodiment, there is also provided an electronic device, comprising a processor; a memory for storing processor-executable instructions; wherein the processor is configured to implement the steps of any of the resource recall model update methods described above when executing instructions stored on the memory.
The electronic device may be a terminal, a server, or a similar computing device. Taking a server as an example, fig. 6 is a block diagram of an electronic device for resource recall model updating according to an exemplary embodiment. The electronic device 600 may vary considerably in configuration or performance, and may include one or more Central Processing Units (CPUs) 610 (each processor 610 may include, but is not limited to, a processing device such as a microprocessor MCU or a programmable logic device FPGA), a memory 630 for storing data, and one or more storage media 620 (e.g., one or more mass storage devices) for storing application programs 623 or data 622. The memory 630 and the storage medium 620 may be transient or persistent storage. The program stored in the storage medium 620 may include one or more modules, each of which may include a series of instruction operations for the electronic device. Further, the central processing unit 610 may be configured to communicate with the storage medium 620 to execute the series of instruction operations in the storage medium 620 on the electronic device 600. The electronic device 600 may also include one or more power supplies 660, one or more wired or wireless network interfaces 650, one or more input/output interfaces 640, and/or one or more operating systems 621, such as Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™, and the like.
The input/output interface 640 may be used to receive or transmit data via a network. A specific example of such a network is a wireless network provided by a communication provider of the electronic device 600. In one example, the input/output interface 640 includes a network interface controller (NIC) that may be connected to other network devices via a base station so as to communicate with the Internet. In an exemplary embodiment, the input/output interface 640 may be a Radio Frequency (RF) module for communicating with the Internet wirelessly.
It will be understood by those skilled in the art that the structure shown in fig. 6 is merely illustrative and is not intended to limit the structure of the electronic device. For example, electronic device 600 may also include more or fewer components than shown in FIG. 6, or have a different configuration than shown in FIG. 6.
In an exemplary embodiment, there is also provided a storage medium, wherein instructions of the storage medium, when executed by a processor of an electronic device, enable the electronic device to perform the steps of any of the resource recall model update methods of the above embodiments.
In an exemplary embodiment, a computer program product is also provided that includes computer instructions stored in a computer readable storage medium. The processor of the electronic device reads the computer instructions from the computer-readable storage medium, and the processor executes the computer instructions, so that the electronic device executes the resource recall model updating method provided in any one of the above embodiments.
It will be understood by those skilled in the art that all or part of the processes of the methods in the above embodiments may be implemented by a computer program instructing relevant hardware; the computer program may be stored in a non-volatile computer-readable storage medium and, when executed, may include the processes of the above method embodiments. Any reference to memory, storage, a database, or another medium used in the embodiments provided herein may include non-volatile and/or volatile memory. Non-volatile memory may include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory may include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).
Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure disclosed herein. This application is intended to cover any variations, uses, or adaptations of the disclosure following, in general, the principles of the disclosure and including such departures from the present disclosure as come within known or customary practice within the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the disclosure being indicated by the following claims.
It will be understood that the present disclosure is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the present disclosure is limited only by the appended claims.

Claims (10)

1. A resource recall model update method, comprising:
acquiring a resource click log corresponding to the current model updating period;
determining a click resource pool according to the resource click log, wherein the click resource pool comprises at least one click resource, the click resource represents a resource clicked by a user in a first historical time period, and the first historical time period is a time period between a current model updating period and a previous model updating period;
selecting a first click resource in the click resource pool, taking the first click resource as a positive sample resource, and determining a negative sample resource corresponding to the positive sample resource from the rest click resources; the first click resource is any click resource in the click resource pool, and the rest click resources are click resources except the first click resource in the click resource pool;
generating a training sample in a training sample set according to a positive sample resource, a negative sample resource corresponding to the positive sample resource and a user clicking the positive sample resource;
optimizing parameters in the resource recall model by using a preset loss function according to the training sample set to obtain an updated resource recall model; the preset loss function is determined based on a degree of matching between a user and a positive sample resource in the training sample and a degree of matching between the user and a negative sample resource.
2. The resource recall model update method according to claim 1, wherein the generating a training sample in a training sample set according to the positive sample resource, the negative sample resource corresponding to the positive sample resource, and the user clicking the positive sample resource comprises:
determining a user clicking the positive sample resource, and acquiring the user characteristics of the user;
determining a positive sample resource characteristic of the positive sample resource;
determining negative sample resource characteristics of negative sample resources corresponding to the positive sample resources;
and forming training samples in a training sample set by using the user characteristics, the positive sample resource characteristics and the negative sample resource characteristics.
3. The resource recall model update method according to claim 2, wherein the optimizing parameters in the resource recall model by using a preset loss function according to the training sample set comprises:
inputting user characteristics, positive sample resource characteristics and negative sample resource characteristics included by the training samples into a resource recall model aiming at each training sample in the training sample set;
obtaining a user feature vector according to the user features through a user neural network of the resource recall model;
respectively obtaining a positive sample resource vector and a negative sample resource vector according to the positive sample resource characteristics and the negative sample resource characteristics through a resource neural network of the resource recall model;
determining a first degree of match between the user feature vector and the positive sample resource vector, and determining a second degree of match between the user feature vector and the negative sample resource vector;
and optimizing parameters in the resource recall model by a preset loss function based on the first matching degree and the second matching degree.
4. The resource recall model update method of claim 3, wherein the preset loss function comprises a first difference term, a second difference term, and a maximum value term;
the optimizing, by the preset loss function, the parameters in the resource recall model based on the first matching degree and the second matching degree includes:
determining a first difference between the second matching degree and the first matching degree through the first difference item;
determining a second difference value between the first difference value and a preset fixed value through the second difference value term;
taking, through the maximum value term, the maximum of the second difference and zero, and using the maximum as the function value of the preset loss function;
and adjusting parameters in the resource recall model in the direction that minimizes the function value until a training end condition is met.
5. The resource recall model update method of claim 1 wherein the method further comprises:
responding to a resource pushing request of a target user, and determining a candidate resource set matched with the target user according to the targeting information of each resource in a resource library;
determining the target user characteristics of the target user and the candidate resource characteristics of each candidate resource in the candidate resource set;
inputting the target user characteristics and the candidate resource characteristics of each candidate resource into the updated resource recall model to obtain the matching degree between the target user and each candidate resource;
determining at least one target candidate resource from the candidate resource set according to the matching degree between the target user and each candidate resource;
and determining a return result corresponding to the resource pushing request according to the at least one target candidate resource, wherein the return result comprises the target resource to be displayed.
6. The resource recall model update method according to claim 5, wherein after determining the return result corresponding to the resource pushing request according to the at least one target candidate resource, the method further comprises:
responding to the click operation of the target user on the target resource, and acquiring a user identifier of the target user and a resource identifier of the target resource;
and generating a resource click log corresponding to the next model updating period according to the user identifier of the target user and the resource identifier of the target resource.
7. A resource recall model update apparatus, comprising:
the click log obtaining unit is configured to execute obtaining of a resource click log corresponding to the current model updating period;
a first determining unit configured to perform determining a click resource pool according to the resource click log, where the click resource pool includes at least one click resource, the click resource represents a resource clicked by a user within a first historical time period, and the first historical time period is a time period between a current model update cycle and a previous model update cycle;
the negative sampling unit is configured to select a first click resource in the click resource pool, take the first click resource as a positive sample resource, and determine a negative sample resource corresponding to the positive sample resource from the remaining click resources; the first click resource is any click resource in the click resource pool, and the remaining click resources are the click resources in the click resource pool other than the first click resource;
the training data generation unit is configured to execute generation of training samples in a training sample set according to positive sample resources, negative sample resources corresponding to the positive sample resources and users clicking the positive sample resources;
the model updating unit is configured to optimize parameters in the resource recall model by using a preset loss function according to the training sample set to obtain an updated resource recall model; the preset loss function is determined based on a degree of matching between a user and a positive sample resource in the training sample and a degree of matching between the user and a negative sample resource.
8. The resource recall model update apparatus of claim 7, wherein the apparatus further comprises:
the request response unit is configured to, in response to a resource pushing request of a target user, determine a candidate resource set matched with the target user according to the targeting information of each resource in the resource library;
a seventh determining unit configured to perform determining a target user characteristic of the target user and a candidate resource characteristic of each candidate resource in the candidate resource set;
a matching degree prediction unit configured to input the target user characteristics and the candidate resource characteristics of each candidate resource into the updated resource recall model to obtain the matching degree between the target user and each candidate resource;
an eighth determining unit, configured to perform determining at least one target candidate resource from the candidate resource set according to a matching degree between the target user and each candidate resource;
a ninth determining unit, configured to perform determining, according to the at least one target candidate resource, a return result corresponding to the resource pushing request, where the return result includes a target resource to be displayed.
9. An electronic device, comprising:
a processor;
a memory for storing the processor-executable instructions;
wherein the processor is configured to execute the instructions to implement the resource recall model update method of any of claims 1 to 6.
10. A storage medium having instructions that, when executed by a processor of an electronic device, enable the electronic device to perform the resource recall model update method of any of claims 1-6.
CN202011288307.8A 2020-11-17 2020-11-17 Resource recall model updating method and device, electronic equipment and storage medium Pending CN114519593A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011288307.8A CN114519593A (en) 2020-11-17 2020-11-17 Resource recall model updating method and device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011288307.8A CN114519593A (en) 2020-11-17 2020-11-17 Resource recall model updating method and device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN114519593A true CN114519593A (en) 2022-05-20

Family

ID=81594187

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011288307.8A Pending CN114519593A (en) 2020-11-17 2020-11-17 Resource recall model updating method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN114519593A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115374360A (en) * 2022-08-26 2022-11-22 腾讯科技(深圳)有限公司 Media resource recall method and media resource recall model training method
CN115374360B (en) * 2022-08-26 2024-02-13 腾讯科技(深圳)有限公司 Media resource recall method and training method of media resource recall model

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination