CN114817692A - Method, device and equipment for determining recommended object and computer storage medium - Google Patents

Method, device and equipment for determining recommended object and computer storage medium

Info

Publication number
CN114817692A
Authority
CN
China
Prior art keywords
sequence
feature vector
vector
recommendation
behavior
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110072684.6A
Other languages
Chinese (zh)
Inventor
戴蔚群
钟俊葳
陈凯
夏锋
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd filed Critical Tencent Technology Shenzhen Co Ltd
Priority to CN202110072684.6A priority Critical patent/CN114817692A/en
Publication of CN114817692A publication Critical patent/CN114817692A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/90 Details of database functions independent of the retrieved data types
    • G06F 16/95 Retrieval from the web
    • G06F 16/953 Querying, e.g. by the use of web search engines
    • G06F 16/9535 Search customisation based on user profiles and personalisation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q 30/00 Commerce
    • G06Q 30/06 Buying, selling or leasing transactions
    • G06Q 30/0601 Electronic shopping [e-shopping]
    • G06Q 30/0631 Item recommendations

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Business, Economics & Management (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Accounting & Taxation (AREA)
  • Finance (AREA)
  • Physics & Mathematics (AREA)
  • Economics (AREA)
  • General Business, Economics & Management (AREA)
  • Strategic Management (AREA)
  • Marketing (AREA)
  • Development Economics (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application discloses a method, an apparatus, a device and a computer storage medium for determining a recommended object, relating to the field of computer technology and intended to improve the accuracy of recommending objects to users. The method includes: obtaining a plurality of candidate recommendation objects based on an object recommendation request triggered by a target account; obtaining, according to the historical behavior sequences corresponding to a plurality of operation behaviors associated with the target account, a probability value set corresponding to each of the candidate recommendation objects, where the historical behavior sequence corresponding to each operation behavior includes at least one object on which the target account performed that operation behavior, and the probability value set of each candidate recommendation object includes the probability values of the target account performing each operation behavior on that candidate recommendation object; obtaining the recommendation degree of each candidate recommendation object based on its probability value set; and determining at least one target recommendation object from the plurality of candidate recommendation objects based on the obtained recommendation degrees.

Description

Method, device and equipment for determining recommended object and computer storage medium
Technical Field
The present application relates to the field of computer technologies, and in particular, to the field of Artificial Intelligence (AI) technologies, and provides a method, an apparatus, and a device for determining a recommended object, and a computer storage medium.
Background
At present, as network products become increasingly diverse, users can obtain a wide variety of multimedia content on the network, and the ways in which users interact with that content are often very rich. For example, a user can click a favorite video to watch it, like it, comment on it, share it with friends or to the friends feed, or add it to favorites, and can give negative feedback on a disliked article.
In actual scenarios, each user's preferences differ. For example, a user who is keen on sharing will certainly perform sharing actions more frequently within a given period than other users, and a user who watches or shares square-dance videos within a certain time window can be inferred to have some preference for square-dance videos. A large amount of user preference and behavior tendency is therefore hidden in users' interactive behaviors and in the objects of those behaviors in the system. Mining users' implicit interests and behavior habits helps to recommend objects to users accurately and improves their browsing experience.
Therefore, how to mine users' implicit interests and behavior habits so as to improve the accuracy of object recommendation is a problem that urgently needs to be solved.
Disclosure of Invention
The embodiments of the present application provide a method, an apparatus, a device and a computer storage medium for determining a recommended object, which are used to improve the accuracy of recommending objects to users.
In one aspect, a method for determining a recommended object is provided, the method including:
obtaining a plurality of candidate recommendation objects based on an object recommendation request triggered by a target account;
according to the historical behavior sequence corresponding to each of the multiple operation behaviors associated with the target account, obtaining a probability value set corresponding to each of the multiple candidate recommendation objects, wherein the historical behavior sequence corresponding to each operation behavior comprises at least: at least one object on which the target account performed the corresponding operation behavior within a historical time period, and the probability value set corresponding to each candidate recommendation object comprises: the probability values of the target account performing each operation behavior on the corresponding candidate recommendation object;
respectively obtaining the recommendation degree of each candidate recommendation object based on each obtained probability value set;
and determining at least one target recommendation object from the plurality of candidate recommendation objects based on the obtained recommendation degrees.
In one aspect, an apparatus for determining a recommended object is provided, the apparatus including:
the candidate object obtaining unit is used for obtaining a plurality of candidate recommended objects based on an object recommendation request triggered by a target account;
a probability prediction unit, configured to obtain, according to the historical behavior sequence corresponding to each of the multiple operation behaviors associated with the target account, a probability value set corresponding to each of the multiple candidate recommendation objects, where the historical behavior sequence corresponding to each operation behavior comprises at least: at least one object on which the target account performed the corresponding operation behavior within a historical time period, and the probability value set corresponding to each candidate recommendation object comprises: the probability values of the target account performing each operation behavior on the corresponding candidate recommendation object;
a recommendation degree obtaining unit, configured to obtain the recommendation degree of each candidate recommendation object based on the obtained probability value sets;
and a recommended object determining unit, configured to determine at least one target recommendation object from the candidate recommendation objects based on the obtained recommendation degrees.
Optionally, the probability prediction unit is specifically configured to:
obtaining the content feature vector and the time feature vector of each object in the historical behavior sequence, where the time feature vector characterizes how far the time at which the target account operated on the object is from the current time;
obtaining the object feature vector of each object according to its content feature vector and time feature vector;
and obtaining the sequence feature vector of the historical behavior sequence based on the obtained object feature vectors.
Optionally, the probability prediction unit is specifically configured to:
pooling the object feature vectors to obtain the sequence feature vector of the historical behavior sequence; or,
obtaining the sequence feature vector of the historical behavior sequence according to each object feature vector and its corresponding object weight parameter value, where the object weight parameter value characterizes the importance of the object corresponding to each object feature vector within the historical behavior sequence; or,
performing serialized feature extraction on the object feature vectors to obtain the sequence feature vector of the historical behavior sequence.
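The three alternatives above can be illustrated with a minimal, non-authoritative sketch. The tensor shapes, the concatenation of the content and time feature vectors, and the helper names below are assumptions rather than details taken from this application, and the third option (serialized feature extraction, e.g. a recurrent network over the object vectors) is only noted in a comment.

```python
import numpy as np

def object_feature(content_vec: np.ndarray, time_vec: np.ndarray) -> np.ndarray:
    # Assumed combination: concatenate the content and time feature vectors.
    return np.concatenate([content_vec, time_vec])

def mean_pool_sequence(object_vecs: np.ndarray) -> np.ndarray:
    # Option 1: pool the object feature vectors (mean pooling), shape (seq_len, dim) -> (dim,).
    return object_vecs.mean(axis=0)

def weighted_sequence(object_vecs: np.ndarray, object_weights: np.ndarray) -> np.ndarray:
    # Option 2: weight each object vector by its importance in the sequence and sum.
    w = object_weights / object_weights.sum()
    return (w[:, None] * object_vecs).sum(axis=0)

# Option 3 (serialized feature extraction, e.g. an RNN/GRU over the object vectors) is omitted here.

# Toy click sequence of 4 objects, each with an 8-dim content vector and a 2-dim time vector.
objects = np.stack([object_feature(np.random.rand(8), np.random.rand(2)) for _ in range(4)])
print(mean_pool_sequence(objects).shape)                                  # (10,)
print(weighted_sequence(objects, np.array([0.1, 0.2, 0.3, 0.4])).shape)   # (10,)
```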
Optionally, the probability prediction unit is specifically configured to:
obtaining, through a set regression function, the sequence weight parameter value of each sequence feature vector with respect to one sequence feature vector, where the obtained sequence weight parameter values sum to 1 and each sequence weight parameter value characterizes the degree of influence of the operation behavior corresponding to each sequence feature vector on the operation behavior corresponding to that one sequence feature vector;
and obtaining the comprehensive feature vector of the historical behavior sequence according to the sequence feature vector of each historical behavior sequence and the corresponding sequence weight parameter value.
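A rough sketch of this weighting step is given below (not this application's exact formulation): a softmax over pairwise scores is one regression function whose outputs sum to 1, and the bilinear scoring matrix used here is an assumption introduced only for illustration.

```python
import numpy as np

def softmax(x: np.ndarray) -> np.ndarray:
    e = np.exp(x - x.max())
    return e / e.sum()

def comprehensive_vector(seq_vecs: np.ndarray, anchor: int, scoring: np.ndarray) -> np.ndarray:
    """seq_vecs: (num_behaviors, dim), one sequence feature vector per operation behavior.
    Scores the influence of every behavior sequence on the behavior at index `anchor`,
    normalizes the scores so they sum to 1, and returns the weighted combination."""
    scores = seq_vecs @ scoring @ seq_vecs[anchor]   # assumed bilinear scoring function
    alphas = softmax(scores)                         # sequence weight parameter values, sum to 1
    return alphas @ seq_vecs

dim = 8
seq_vecs = np.random.rand(3, dim)                    # e.g. click, share and like sequence vectors
scoring = np.random.rand(dim, dim)                   # assumed learnable parameter
print(comprehensive_vector(seq_vecs, anchor=0, scoring=scoring).shape)   # (8,)
```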
Optionally, the probability prediction unit is specifically configured to:
obtaining at least one weight representation vector corresponding to each sequence feature vector according to that sequence feature vector and the at least one attention weight matrix corresponding to it, where each weight representation vector corresponds to one attention weight matrix;
and obtaining the comprehensive feature vector corresponding to each sequence feature vector according to the at least one weight representation vector corresponding to that sequence feature vector.
Optionally, the at least one attention weight matrix includes a query vector weight matrix, a key vector weight matrix, and a value vector weight matrix; the probability prediction unit is specifically configured to:
obtaining the corresponding query vector, key vector and value vector according to each sequence feature vector and the query vector weight matrix, the key vector weight matrix and the value vector weight matrix;
obtaining the attention weight value corresponding to each sequence feature vector according to the query vector corresponding to one sequence feature vector and the key vector corresponding to each sequence feature vector, where the attention weight value characterizes the degree of influence of the operation behavior corresponding to each sequence feature vector on the operation behavior corresponding to that one sequence feature vector;
and obtaining the comprehensive feature vector corresponding to that sequence feature vector according to the value vector corresponding to each sequence feature vector and the attention weight value corresponding to each sequence feature vector.
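This is the familiar query/key/value attention pattern applied across the behavior-sequence feature vectors. A minimal sketch follows; the scaling by the square root of the key dimension is standard attention practice and an assumption here, not something stated above.

```python
import numpy as np

def softmax(x: np.ndarray) -> np.ndarray:
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def multi_sequence_attention(seq_vecs, Wq, Wk, Wv):
    """seq_vecs: (num_behaviors, dim). Wq, Wk, Wv are the query/key/value weight matrices.
    Returns one comprehensive feature vector per behavior sequence."""
    Q, K, V = seq_vecs @ Wq, seq_vecs @ Wk, seq_vecs @ Wv
    attn = softmax(Q @ K.T / np.sqrt(K.shape[-1]))   # influence of each sequence on each other sequence
    return attn @ V                                   # (num_behaviors, dim)

dim = 8
seq_vecs = np.random.rand(3, dim)                     # e.g. click, share and like sequence vectors
Wq, Wk, Wv = (np.random.rand(dim, dim) for _ in range(3))
print(multi_sequence_attention(seq_vecs, Wq, Wk, Wv).shape)   # (3, 8)
```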
Optionally, the probability prediction unit is specifically configured to:
obtaining the probability value set corresponding to each candidate recommendation object according to the auxiliary feature vector and the obtained comprehensive feature vectors;
wherein the auxiliary feature vector comprises at least one or more of: the account feature vector of the target account, a device environment feature vector characterizing the device environment in which the target account initiated the object recommendation request, and the object feature vector of each candidate recommendation object.
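One possible way to wire up this prediction step is sketched below; the concatenation of the comprehensive and auxiliary feature vectors and the per-behavior linear heads are illustrative assumptions, since the application does not fix the network layout at this point.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def predict_probability_set(comprehensive_vecs, account_vec, device_vec, candidate_vec, heads):
    """Concatenates the comprehensive feature vectors with the auxiliary feature vectors and
    applies one linear prediction head per operation behavior, producing the probability value set."""
    x = np.concatenate([comprehensive_vecs.ravel(), account_vec, device_vec, candidate_vec])
    return {behavior: float(sigmoid(x @ w + b)) for behavior, (w, b) in heads.items()}

dim = 8
comprehensive = np.random.rand(3, dim)                 # one comprehensive vector per behavior sequence
account, device, candidate = (np.random.rand(dim) for _ in range(3))
in_dim = comprehensive.size + 3 * dim
heads = {b: (np.random.rand(in_dim), 0.0) for b in ("click", "share", "like")}
print(predict_probability_set(comprehensive, account, device, candidate, heads))
```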
In one aspect, a computer device is provided, comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the steps of any of the above methods when executing the computer program.
In one aspect, a computer storage medium is provided having computer program instructions stored thereon that, when executed by a processor, implement the steps of any of the above-described methods.
In one aspect, a computer program product or computer program is provided that includes computer instructions stored in a computer-readable storage medium. The computer instructions are read by a processor of a computer device from a computer-readable storage medium, and the computer instructions are executed by the processor to cause the computer device to perform the steps of any of the methods described above.
In the embodiments of the present application, when the target account triggers an object recommendation request, a probability value set covering the various operation behaviors that the target account may perform on each candidate recommendation object can be obtained from the historical behavior sequences corresponding to the various operation behaviors associated with the target account, and the recommendation degree of each candidate recommendation object is then obtained from its probability value set. The historical behavior sequence of each operation behavior hides the target account's preference with respect to that behavior, and the operation behaviors are correlated with one another. Obtaining the probability values of the operation behaviors from all of these historical behavior sequences therefore brings the predictions closer to the target account's real behavior, makes the subsequently obtained recommendation degrees more accurate, makes the selected recommendation objects better match the user's preferences, and thus improves the accuracy of object recommendation.
Drawings
To more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below show only embodiments of the present application, and those skilled in the art can derive other drawings from them without creative effort.
Fig. 1 is a schematic view of a scenario provided in an embodiment of the present application;
fig. 2 is a schematic flowchart of a method for determining a recommended object according to an embodiment of the present application;
FIG. 3 is a schematic diagram of a recommendation page provided by an embodiment of the present application;
FIG. 4 is a diagram of an example of behavior sequence statistics provided by an embodiment of the present application;
FIG. 5 is a diagram of another example of behavior sequence statistics provided by an embodiment of the present application;
FIG. 6 is a schematic flow chart of an overall scheme provided by an embodiment of the present application;
FIG. 7 is a schematic flow chart of model training provided in an embodiment of the present application;
FIG. 8 is a schematic diagram of a model structure of a probabilistic predictive model according to an embodiment of the present disclosure;
FIG. 9 is a schematic process flow diagram of a probabilistic predictive model according to an embodiment of the present disclosure;
fig. 10 is a schematic structural diagram of an apparatus for determining a recommended object according to an embodiment of the present application;
fig. 11 is a schematic structural diagram of a computer device according to an embodiment of the present application.
Detailed Description
To make the objects, technical solutions and advantages of the present application clearer, the technical solutions in the embodiments of the present application are described below clearly and completely with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present application. All other embodiments derived by those skilled in the art from the embodiments given herein without creative effort shall fall within the protection scope of the present application. In the present application, the embodiments and the features of the embodiments may be combined with one another arbitrarily provided there is no conflict. Also, although a logical order is shown in the flowcharts, in some cases the steps shown or described may be performed in an order different from the one given here.
For the convenience of understanding the technical solutions provided by the embodiments of the present application, some key terms used in the embodiments of the present application are explained first:
Object: content such as a picture, an article, audio or a video that can be recommended to the user as an object.
Information flow recommendation: a recommendation mode in which the user continuously obtains recommended objects such as pictures, articles or videos by pulling up, pulling down or refreshing the screen, and interacts with the content through buttons such as share, like, comment and favorite on those pictures, articles and videos. Interaction rates such as click-through rate, share rate and like rate, together with the user's dwell time on the content, are the core metrics for measuring the recommendation effect.
Multi-objective ranking: when a multi-objective ranking recommendation system ranks the candidate set, it refers not only to a single objective (such as click-through rate) but also considers multiple objectives (such as share rate, like rate and favorite rate) at the same time, striking the balance that is most beneficial to the product ecology.
Deep Neural Network (DNN): a neural network model structure having a plurality of hidden layers.
Historical behavior sequence: a record of a specific behavior (such as clicking, sharing or liking) performed by a user in the system, including the time at which the behavior occurred and the object involved, with the resulting sequence sorted by a certain attribute (such as the behavior occurrence time). Taking click behavior as an example, if the user historically played the videos v1, v2, v3 and v4 in that order, the historical behavior sequence corresponding to the user's click behavior may be {v1, v2, v3, v4}; a minimal data-structure sketch of such sequences is given after these term explanations.
Objective function: a function used in machine learning to measure, and to be optimized with respect to, the distance between the model's prediction and the true sample label.
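As a minimal, non-authoritative sketch of how such per-behavior sequences might be represented (the class, field and variable names below are illustrative assumptions, not taken from this application):

```python
from dataclasses import dataclass

@dataclass
class BehaviorRecord:
    object_id: str   # e.g. a video or article identifier
    timestamp: int   # when the behavior occurred (Unix time)

# One historical behavior sequence per operation behavior, sorted by occurrence time.
click_sequence = [
    BehaviorRecord("v1", 1628000000),
    BehaviorRecord("v2", 1628100000),
    BehaviorRecord("v3", 1628200000),
    BehaviorRecord("v4", 1628300000),
]
share_sequence = [BehaviorRecord("v2", 1628150000)]
user_sequences = {"click": click_sequence, "share": share_sequence}
```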
In actual scenarios, a large amount of user preference and behavior tendency is hidden in a user's interactive behaviors and in the objects of those behaviors in the system; mining the user's implicit interests and behavior habits helps to recommend objects to the user accurately and improves the user's browsing experience.
At present, ranking models in information flow recommendation have developed considerably, and vendors keep iterating in different directions in combination with their actual services. The most common way to express user interest is to use a user portrait, which usually serves as the foundation of a recommendation system: the user's basic information and behavior statistics are used as the user's representation, and the output data is an important input of the ranking model. However, the user-portrait approach needs the user's basic information or long- and short-term behavior feedback as its input. Its limitation is that, for a new user, accurate information is lacking, and because of update delays the new user's preferences cannot be captured in time from their transient behavior. In addition, the portrait work and the ranking model are usually independent, which makes it difficult to learn the portrait and the ranking model jointly.
Current information flow recommendation does not consider the relationship between sequences and multiple objectives, nor the relationships among multiple behavior sequences, so the basis of the recommendation information is incomplete, which affects the final ranking and the recommendation effect.
In view of this, in the method provided by the embodiments of the present application, when a target account triggers an object recommendation request, a probability value set covering the various operation behaviors that the target account may perform on each candidate recommendation object can be obtained from the historical behavior sequences corresponding to the various operation behaviors associated with the target account, and the recommendation degree of each candidate recommendation object is then obtained from its probability value set. In other words, multiple behavior sequences of the user are constructed and mined to discover the user's implicit interests, multiple core objectives of the recommendation system are estimated, and a comprehensive ranking is performed. The historical behavior sequence of each operation behavior of the target account hides the account's preference with respect to that behavior, and the operation behaviors have certain correlations with one another.
In addition, a deep neural network is used to train several core effectiveness objectives of the recommendation service within one model. At the same time, the user's records under the various behaviors are collected and organized, the user's interests are modeled and mined, and this is added to the model training to obtain a multi-objective prediction and ranking model for information flow recommendation that integrates multiple user behavior sequences. Furthermore, when performing online ranking and recommendation, the current behavior sequences of the target account can be obtained, the candidate objects can be scored along multiple dimensions, and the objects with higher scores are then recommended to the target account.
After the idea of the embodiment of the present application is introduced, the following describes the main techniques related to the embodiment of the present application.
Artificial Intelligence (AI) is a theory, method, technique and application system that uses a digital computer or a machine controlled by a digital computer to simulate, extend and expand human Intelligence, perceive the environment, acquire knowledge and use the knowledge to obtain the best results. In other words, artificial intelligence is a comprehensive technique of computer science that attempts to understand the essence of intelligence and produce a new intelligent machine that can react in a manner similar to human intelligence. Artificial intelligence is the research of the design principle and the realization method of various intelligent machines, so that the machines have the functions of perception, reasoning and decision making.
Artificial intelligence technology is a comprehensive discipline covering a wide range of fields, including both hardware-level and software-level technologies. Basic artificial intelligence technologies generally include sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing, operation/interaction systems, mechatronics and the like. Artificial intelligence software technologies mainly include computer vision, speech processing, natural language processing and machine learning/deep learning.
Machine Learning (ML) is a multi-field interdisciplinary subject involving probability theory, statistics, approximation theory, convex analysis, algorithm complexity theory and other disciplines. It specializes in studying how a computer simulates or realizes human learning behavior to acquire new knowledge or skills and to reorganize existing knowledge structures so as to continuously improve its performance. Machine learning is the core of artificial intelligence and the fundamental way to make computers intelligent, and it is applied in all fields of artificial intelligence. Machine learning and deep learning generally include techniques such as artificial neural networks, belief networks, reinforcement learning, transfer learning, inductive learning and teaching learning.
The solution provided by the embodiments of the present application mainly relates to machine learning/deep learning technology within the field of artificial intelligence. Specifically, a probability prediction model is obtained through machine learning by the method provided in the present application, and this probability prediction model can then be used to predict the various operation behaviors of the target account, as explained in the following embodiments.
Some brief descriptions are given below to application scenarios to which the technical solution of the embodiment of the present application can be applied, and it should be noted that the application scenarios described below are only used for describing the embodiment of the present application and are not limited. In a specific implementation process, the technical scheme provided by the embodiment of the application can be flexibly applied according to actual needs.
The solution provided by the embodiments of the present application can be applied to most information flow recommendation scenarios, such as video recommendation, short-video recommendation, audio recommendation, text recommendation (for example news) and commodity recommendation.
As shown in fig. 1, a schematic view of a scenario provided in the embodiment of the present application may include a terminal device 101 and a server 102.
The terminal device 101 may be, for example, a mobile phone, a tablet computer (PAD), a Personal Computer (PC), a wearable device, and the like. The terminal device 101 may be installed with a software client capable of browsing objects, such as a browser, a video client, an audio client, or a news client, and a user may log in an account of the user on the client to browse the objects included in the client. Note that, even when the user does not log in to the account, the server corresponding to the client generally identifies the user, and may identify the user through a terminal used by the user, for example, so that the identification may be understood as the account of the user.
The server 102 may be a background server corresponding to a client installed on the terminal device 101, for example, an independent physical server, a server cluster or a distributed system formed by a plurality of physical servers, or a cloud server providing basic cloud computing services such as a cloud service, a cloud database, cloud computing, a cloud function, cloud storage, a network service, cloud communication, a middleware service, a domain name service, a security service, a CDN, and a big data and artificial intelligence platform, but is not limited thereto.
The server 102 may include one or more processors 1021, a memory 1022 and an I/O interface 1023 for interacting with the terminal, among other components. In addition, the server 102 may be configured with a database 1024, which may be used to store the basic information and historical behavior sequence information of each account, trained model parameters, and the like. The memory 1022 of the server 102 may further store program instructions of the method for determining a recommended object provided in the embodiments of the present application; when executed by the processor 1021, these program instructions implement the steps of that method, so as to obtain at least one object to be recommended to the target account and push the determined objects to the target account, whose object information may then be displayed on the terminal device 101.
Specifically, when a user opens the client on the terminal device 101, the objects recommended for the user at that moment need to be displayed on the client, so an object recommendation request may be initiated to the server 102. Based on this request, the server 102 may obtain the user's behavior data and extract from it the historical behavior sequences corresponding to the user's various operation behaviors, such as a click behavior sequence, a sharing behavior sequence or a forwarding behavior sequence. It may then, based on these historical behavior sequences, predict the probabilities of the various operation behaviors for each candidate recommendation object in the recommendation pool corresponding to the user, obtain a recommendation score for each candidate recommendation object, rank the objects by recommendation score, and select the target recommendation objects to recommend to the user.
Terminal device 101 and server 102 may be communicatively coupled directly or indirectly through one or more networks 103. The network 103 may be a wired network or a Wireless network, for example, the Wireless network may be a mobile cellular network, or may be a Wireless-Fidelity (WIFI) network, or may also be other possible networks, which is not limited in this embodiment of the present invention.
In actual application, the server 102 and the background server of the client may also be different servers, so the server 102 may be a dedicated ranking server, when the user triggers an object recommendation request, the background server of the client may request the server 102 for a recommendation object, and after the server 102 determines a target recommendation object, the target recommendation object is pushed to the client through the background server of the client.
Of course, the method provided in the embodiment of the present application is not limited to be used in the application scenario shown in fig. 1, and may also be used in other possible application scenarios, and the embodiment of the present application is not limited. The functions that can be implemented by the devices of the application scenario shown in fig. 1 will be described together in the following method embodiments.
Referring to fig. 2, a flowchart of a method for determining a recommended object according to an embodiment of the present application is schematically illustrated, and the method may be executed by the server 102 in fig. 1, where the flowchart of the method is described as follows.
Step 201: Obtain a plurality of candidate recommendation objects based on the object recommendation request triggered by the target account.
In the embodiments of the present application, after the target user performs a certain operation on the client on the terminal device, the client may be triggered to send an object recommendation request to the server through the currently logged-in target account. The operation may be, for example, entering a page containing one or more objects on the client, or pulling up, pulling down or refreshing a page; of course, other possible operations may also be included, which is not limited in the embodiments of the present application.
Since the recommendation process is similar for every user or account, one user or account, namely the target user or target account mentioned above, is taken as an example for description. The target account is the account used by the target user to browse objects, and the target user may be any user rather than a specific one, so the target account may likewise be any account.
Since the number of objects in the system is large, ranking all of them would obviously consume a large amount of computing resources. Therefore, when the server receives the object recommendation request of the target account, it can first screen out a plurality of candidate recommendation objects for the target account based on the request, to serve as the target account's recommendation pool, and the subsequent ranking process is then performed on this recommendation pool.
Specifically, objects may be screened according to the page from which the object recommendation request was initiated and the user portrait of the target account. For example, when the user enters a down-jacket shopping page, a plurality of down-jacket goods may be screened for the target account in combination with its user portrait; when the user enters the recommendation page of a news client, news items that the target user may like, such as articles or videos, may be screened for the target account in combination with its user portrait.
Step 202: Obtain, according to the historical behavior sequences corresponding to the operation behaviors associated with the target account, the probability value set corresponding to each candidate recommendation object.
In the embodiments of the present application, a great deal of user preference and behavior tendency is hidden in the user's interactive behaviors and behavior objects in the system, so the target user's historical operation behaviors and behavior objects can be taken into account when making recommendations.
Specifically, for each candidate recommendation object, for example candidate recommendation object A, its probability value set may be obtained according to the historical behavior sequences corresponding to the multiple operation behaviors associated with the target account. The historical behavior sequence corresponding to each operation behavior includes at least one object on which the target account performed that operation behavior within a historical time period, and the probability value set corresponding to candidate recommendation object A includes the probability values of the target account performing each operation behavior on candidate recommendation object A.
In the embodiments of the present application, the operation behaviors may include any operation behavior that can be performed on a candidate recommendation object on the platform. Fig. 3 shows a schematic diagram of a recommendation page on which a plurality of objects, such as video A and article B shown in Fig. 3, are recommended to the user. The operations the user can perform on video A and article B may include clicking, commenting, liking and sharing, so in this recommendation system the operation behaviors may include click behavior, comment behavior, like behavior and share behavior.
Because the user's recent historical behavior sequences better reflect the user's recent preferences, when an object recommendation request of the target account is received, the objects operated on within a historical time period before the current time may be obtained, or the most recent objects may be selected according to the number of objects a historical behavior sequence contains. For example, when each historical behavior sequence contains 30 objects, the 30 objects on which the target account most recently performed the operation behavior may be selected, such as the 30 most recently clicked videos or the 30 most recently shared videos.
In practical applications, to make it easy to fetch the behavior sequences at recommendation time, the user's behavior sequences may be compiled in advance along the timeline of the user's operations; then, when the historical behavior sequence corresponding to each operation behavior of the target account is needed, the target account's behavior sequence information can simply be selected from the compiled behavior sequence information.
Fig. 4 shows an example of the behavior sequence statistics of user A. The compiled behavior sequence information may include user A's basic information, such as the age and gender shown in Fig. 4, and may be sorted along the timeline of the user's operations: as shown in Fig. 4, user A shared video A at 17:17 on August 2, 2019, shared video B at 07:34 on August 6, 2019, clicked video C at 20:18 on August 9, 2019, and so on.
Fig. 5 shows another example of user A's behavior sequence statistics. The behavior sequence information may be compiled separately for different operation behaviors, with each sequence ordered by time; for example, in Fig. 5, a click sequence and a sharing sequence are obtained from the click behavior and the sharing behavior, respectively.
In the embodiments of the present application, when obtaining the probability value set corresponding to candidate recommendation object A, the similarity between candidate recommendation object A and each object contained in the historical behavior sequence of each operation behavior of the target account may be computed, and the probability value set of candidate recommendation object A may then be obtained from these similarities. For example, for the click behavior, the similarity between each object in each sequence and candidate recommendation object A may be combined, according to the weight of each sequence or of each object, into a similarity between the target account and candidate recommendation object A, from which the target account's click probability is obtained; naturally, when predicting click behavior, the click sequence may be weighted somewhat higher than the other sequences.
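A rough sketch of this similarity-based estimate is shown below; the cosine similarity and the particular weighting scheme are illustrative assumptions rather than a formula taken from this application.

```python
import numpy as np

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9))

def behavior_probability(candidate_vec, sequences, sequence_weights):
    """Estimates the probability of one operation behavior (e.g. click) for a candidate object
    as a weighted average of its similarity to the objects in each historical behavior sequence.
    sequences maps behavior name -> (num_objects, dim) array; sequence_weights maps name -> weight."""
    score, total = 0.0, 0.0
    for name, vecs in sequences.items():
        w = sequence_weights.get(name, 1.0)
        score += w * np.mean([cosine(candidate_vec, v) for v in vecs])
        total += w
    return score / total if total else 0.0

dim = 8
sequences = {"click": np.random.rand(5, dim), "share": np.random.rand(3, dim)}
candidate = np.random.rand(dim)
# When predicting click probability, the click sequence may be weighted somewhat higher.
print(behavior_probability(candidate, sequences, {"click": 2.0, "share": 1.0}))
```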
Alternatively, a probability prediction model may be constructed and trained in advance, and the trained probability prediction model may then be used to obtain the probability value set corresponding to candidate recommendation object A from the historical behavior sequences corresponding to the various operation behaviors. The training and processing of the probability prediction model are introduced later and are therefore not described in detail here.
Likewise taking candidate recommendation object A as an example, its probability value set includes the probability values of the target account performing each operation behavior on candidate recommendation object A, that is, the predicted click-through rate, share rate, like rate, forwarding rate or favorite rate of the target account for candidate recommendation object A if it were pushed to the target account.
In the embodiments of the present application, besides using the historical behavior sequences corresponding to the multiple operation behaviors of the target account, the probability prediction may also take other information into account, so as to improve its accuracy. The other information may include, for example, one or more of the following:
(1) basic information of the target account, such as user name, age and gender;
(2) candidate recommendation object information;
(3) browsing context information of the target account, such as the objects the user browsed or operated on before the current pull-down refresh;
(4) device environment features of the device environment in which the target account initiated the object recommendation request, such as location, time and network. For example, when the user is shopping and the object recommendation request is initiated from Sichuan, local specialties related to Sichuan can be recommended to the user.
Step 203: Obtain the recommendation degree of each candidate recommendation object based on the obtained probability value sets.
In the embodiments of the present application, after the probability value sets corresponding to the candidate recommendation objects are obtained, the recommendation degree of each candidate recommendation object can be obtained based on its probability value set.
Again taking candidate recommendation object A as an example, its recommendation degree may be obtained by averaging its probability value set; or, considering that the operation behaviors may differ in importance, a weight may be assigned to each operation behavior according to its importance, and the recommendation degree of candidate recommendation object A is then obtained as the weighted sum of the probability values and the behavior weights.
Step 204: Determine at least one target recommendation object from the plurality of candidate recommendation objects based on the obtained recommendation degrees.
In the embodiments of the present application, the candidate recommendation objects can be ranked by recommendation degree, and at least one candidate recommendation object with a higher recommendation degree is then selected as a target recommendation object. For example, after ranking, the candidate recommendation objects whose recommendation degrees rank in the top ten may be selected as target recommendation objects, and the object information of each determined target recommendation object is pushed to the target account, so that it can be viewed in the client logged in with the target account.
Specifically, the object information of the target recommendation objects may be displayed in the client according to recommendation degree; for example, target recommendation objects with higher recommendation degrees may be displayed preferentially.
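The recommendation-degree computation in step 203 and the ranking in step 204 can be sketched together as follows; the behavior weights and the top-N cut-off below are illustrative assumptions, not values prescribed by this application.

```python
def recommendation_degree(prob_set, behavior_weights):
    """Weighted sum of the probability value set; the weights reflect the assumed
    importance of each operation behavior to the business."""
    return sum(behavior_weights.get(b, 1.0) * p for b, p in prob_set.items())

def select_targets(candidates, behavior_weights, top_n=10):
    """candidates maps object id -> probability value set; returns the top_n objects
    ranked by recommendation degree."""
    ranked = sorted(candidates.items(),
                    key=lambda kv: recommendation_degree(kv[1], behavior_weights),
                    reverse=True)
    return [obj_id for obj_id, _ in ranked[:top_n]]

candidates = {
    "video_A": {"click": 0.62, "share": 0.10, "like": 0.30},
    "article_B": {"click": 0.45, "share": 0.25, "like": 0.20},
}
weights = {"click": 1.0, "share": 2.0, "like": 1.5}   # assumed business weights
print(select_targets(candidates, weights, top_n=1))
```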
In this embodiment of the application, the process of step 202 may be implemented by using a probability prediction model, as shown in fig. 6, which is a schematic flow chart of an overall scheme for recommending an object by using the probability prediction model. The scheme mainly comprises an online process and an offline process, wherein the online process can comprise an online recommendation process and a user interface display process, and the offline process can comprise an offline log collection process and an offline model training process, which are respectively described below.
1. Log collection
The log collection process is mainly used for constructing a sample, and the sample construction is one of key steps in an offline process and mainly comprises feature extraction and label definition.
(1) Feature extraction
Specifically, the system logs contain data such as the user's basic information, the objects operated on, the operation behaviors on those objects and the object information. The user-portrait data and object data contained in the logs collected from the system can therefore be processed, for example filtered, cleaned and joined, and turned into sample files for training the ranking model according to a set standardized format; the purpose of these sample files is to abstract the raw logs into a data representation containing the complete information. Particular attention is paid to recording the user's various operation behaviors, which may include positive-feedback behaviors such as clicking, sharing and liking as well as negative-feedback behaviors such as complaining or ignoring, together with the time each behavior occurred, the object involved and the corresponding attribute information, and the various user behavior sequences are generated in the order in which the behaviors occurred.
(2) Label definitions
Label definition means calibrating samples as positive or negative according to the learning objective. In practical applications, the metric the business cares about most can be defined as the primary metric, and a single label is used to distinguish positive from negative samples. For example, for an object pushed to a user, if the user clicks the object, the label of the click objective for that object may be 1, and otherwise 0; similarly, if the user shares the object, the label of the share objective for that object may be 1, and otherwise 0.
In the multi-objective scenario of the embodiments of the present application, several metrics the business cares about, together with the primary metric, are taken as learning objectives, and each metric has its own label. After the features are extracted and the labels are calibrated, training samples usable in the training process are obtained.
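A minimal sketch of this multi-objective labeling is given below; the log field names are assumptions introduced only for illustration.

```python
def build_labels(exposure_log):
    """Turns one exposure record into a multi-objective label set:
    each business metric (click, share, like, ...) gets its own binary label."""
    return {
        "click": 1 if exposure_log.get("clicked") else 0,
        "share": 1 if exposure_log.get("shared") else 0,
        "like": 1 if exposure_log.get("liked") else 0,
    }

print(build_labels({"clicked": True, "shared": False, "liked": True}))
# {'click': 1, 'share': 0, 'like': 1}
```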
2. Off-line training
After the training samples are constructed, model training can be performed. Specifically, in the embodiments of the present application, the user's implicit interest preferences and behavior habits are mined through the historical behavior sequences corresponding to the user's multiple operation behaviors, the mined behavior representations are added as inputs to the multi-objective model learning, and the probabilities of the user's multiple operation behaviors are then predicted.
The process of model training will be briefly described below. Fig. 7 is a schematic diagram of a model training process.
Step 701: a training sample set is obtained.
Specifically, the training samples can be collected through the process described in the log collection section, so as to form a training sample set for model training.
In particular, the training sample set may contain a plurality of training samples, and each training sample may include at least: the historical behavior sequences corresponding to the multiple operation behaviors associated with the triggering account before the triggering moment of an object recommendation request, at least one object recommended to that account based on the request, and a label assigned to each recommended object based on the account's actual operation behavior on it.
For example, if account A initiates an object recommendation request at time B and the server recommends 10 objects to account A in response, a training sample formed from account A may include the historical behavior sequences of account A's various operation behaviors before time B, such as a click sequence or a share sequence, together with one or more of the 10 objects and the labels assigned according to the user's actual operation behavior on each of them.
In the embodiment of the present application, the lengths of the sample behavior sequences of different operation behaviors may be the same or different. In a specific implementation process, in order to reduce the processing difficulty of the model, the lengths of all the sample behavior sequences may be unified, that is, the lengths of all the obtained sample behavior sequences are the same.
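The following sketch shows one possible in-memory layout for such a training sample; the class and field names are illustrative assumptions, not taken from this application.

```python
from dataclasses import dataclass
from typing import Dict, List

@dataclass
class TrainingSample:
    """One training sample, per the description above."""
    # Historical behavior sequences of the triggering account before the request time,
    # one object-id list per operation behavior, truncated or padded to a unified length.
    behavior_sequences: Dict[str, List[str]]
    # An object that was recommended to the account for this request.
    candidate_object_id: str
    # One binary label per operation behavior, from the account's actual feedback.
    labels: Dict[str, int]

sample = TrainingSample(
    behavior_sequences={"click": ["v1", "v2", "v3"], "share": ["v2"]},
    candidate_object_id="v9",
    labels={"click": 1, "share": 0},
)
```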
Step 702: Predict the probabilities of the various operation behaviors for each selected training sample, obtaining the probability value set corresponding to each training sample.
In the embodiment of the application, the prediction process of the operation behavior probability in the training process is the same as the prediction process actually applied to the recommendation process, so the process will be described in detail in the subsequent recommendation process, and will not be described in detail herein.
In practical application, during each training, probability prediction may be performed on all training samples in the training sample set, or of course, a part of the training samples may be randomly selected according to a certain selection probability to perform probability prediction.
When each training sample contains a single candidate object, the probability value set consists of that object's probability values for each operation behavior; when each training sample contains multiple candidate objects, the probability value set contains a probability value subset for each object, each subset consisting of that object's probability values for each operation behavior.
In the embodiments of the present application, the probability prediction model can roughly comprise two parts. One part is a feature extraction part, which extracts features from the candidate objects and from the user's recent behavior sequences updated in real time. The other part is a multi-objective prediction part, a multi-objective prediction model in which each objective corresponds to one operation behavior, so that from the extracted features the multi-objective prediction part can obtain a probability value for each objective, i.e. for each operation behavior.
Step 703: Obtain the loss value of the probability prediction model according to the probability value set and the label set of each training sample.
Step 704: it is determined whether the probabilistic predictive model satisfies a convergence condition.
In the embodiment of the application, as described above, the predicted probability value set may be compared with the tag set labeled according to the actual behavior, and then the loss value of the model may be determined according to the comparison result, and then whether the probability prediction model satisfies the convergence condition may be determined according to the loss value.
Similarly, when each training sample contains a single candidate object, the label set consists of that object's labels for each operation behavior; when each training sample contains multiple candidate objects, the label set contains a label subset for each object, each subset consisting of that object's labels for each operation behavior.
Specifically, when the loss value is not greater than a preset loss threshold, the difference between the predicted probabilities and the labeled values is small enough, meaning that the accuracy of the probability prediction model meets the requirement, so the probability prediction model can be determined to have converged; otherwise, it has not converged.
Any possible loss function may be selected to calculate the loss value, which is not limited in the embodiment of the present application.
Step 705: if the result of step 704 is negative, then the parameters of the probabilistic predictive model are adjusted.
When the probability prediction model does not meet the convergence condition, its parameters are adjusted according to the loss value, and the next training round is entered with the adjusted probability prediction model, i.e. the process returns to step 702.
If the result of step 704 is yes, that is, when the probabilistic predictive model satisfies the convergence condition, the training is finished, and after the training of the model is finished, a model file of the probabilistic predictive model may be obtained, where the model file includes model parameters of each part of the model.
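The training flow of steps 702 to 705 might be sketched as below. It uses PyTorch and a toy shared-bottom model with one prediction head per operation behavior purely for illustration; the layer sizes, the summed per-behavior loss and the loss threshold used as the convergence condition are all assumptions rather than details fixed by this application.

```python
import torch
import torch.nn as nn

class MultiObjectiveModel(nn.Module):
    """A minimal stand-in for the probability prediction model: a shared bottom MLP
    plus one prediction head (tower) per operation behavior."""
    def __init__(self, in_dim=64, behaviors=("click", "share")):
        super().__init__()
        self.shared = nn.Sequential(nn.Linear(in_dim, 32), nn.ReLU())
        self.heads = nn.ModuleDict({b: nn.Linear(32, 1) for b in behaviors})

    def forward(self, x):
        h = self.shared(x)
        return {b: torch.sigmoid(head(h)).squeeze(-1) for b, head in self.heads.items()}

model = MultiObjectiveModel()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.BCELoss()

# Toy batch: 16 samples with 64-dimensional extracted features and per-behavior labels.
x = torch.rand(16, 64)
labels = {"click": torch.randint(0, 2, (16,)).float(),
          "share": torch.randint(0, 2, (16,)).float()}

for step in range(100):                        # iterate until the convergence condition is met
    preds = model(x)
    # Total loss: sum of the per-behavior losses computed from predictions and labels.
    loss = sum(loss_fn(preds[b], labels[b]) for b in preds)
    if loss.item() < 0.1:                      # assumed loss threshold as convergence condition
        break
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```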
3. Online recommendation
In the embodiments of the present application, the trained probability prediction model can be used for online recommendation. When an object recommendation request of the target account is received online, the candidate recommendation object set of the target account is comprehensively scored using the model trained offline. In this process, the ranking service extracts features from the target account initiating the object recommendation request, the candidate recommendation objects to be scored and the user's recent behavior sequences updated in real time, feeds them into the trained multi-objective model to obtain the probability values of the multiple objectives, ranks the objects by combining these probability values, and finally returns the N candidate recommendation objects with the highest scores in the ranking result to the target account, so that the pushed target recommendation objects are displayed on the user interface. Meanwhile, the user may give operation feedback on the exposed target recommendation objects, such as clicking, sharing, liking and commenting; the logs collect this feedback as well, and the collected content can be used to retrain the probability prediction model, thereby optimizing it and improving its accuracy.
In the following, a specific model structure is taken as an example to describe the probability prediction process. Fig. 8 schematically shows the model structure of the probability prediction model, which includes an input layer, a vectorization (embedding) layer, a multi-layer network layer and a multi-sequence processing layer. It should be noted that Fig. 8 shows a model structure with two objectives, that is, two operation behaviors, as an example; in practical applications it can be extended to more operation behaviors, and the embodiments of the present application do not limit the types of operation behaviors.
Next, a process of performing multi-objective probability prediction by using multiple historical behavior sequences is introduced in combination with a structure of a probability prediction model, and fig. 9 is a schematic processing flow diagram of the probability prediction model.
Step 901: Obtain the sequence feature vector corresponding to each historical behavior sequence.
In the embodiment of the application, as shown in fig. 8, information such as the user information, recommendation context information, device environment feature information and candidate object information of the target account, together with the historical behavior sequences corresponding to the various operation behaviors of the target account (such as the click behavior sequence and the sharing behavior sequence shown in fig. 8), is fed into the input layer of the probability prediction model.
Furthermore, the information from the input layer is vectorized by the vectorization layer of the probability prediction model. Specifically, the embedding process mainly aims to reduce the dimensionality of sparse features, and the vectorization layer can be implemented as a special fully connected layer.
In the embodiment of the application, although the learning tasks of the multiple targets differ, the embedding structure of the features and the fully connected network structure of the bottom layer are shared. Through this shared layer, the learned feature parameters remain consistent across the different tasks, and the problems of insufficient training data and sparse features that arise when a single target is optimized independently are avoided.
For the target account's user information, recommendation context information, device environment feature information, candidate recommendation object information and the like, the vectorization layer produces the corresponding feature vectors, namely the account feature vector of the target account, the context feature vector, the device environment feature vector representing the device environment when the target account initiates the object recommendation request, and the object feature vector of each candidate recommendation object.
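A minimal PyTorch sketch of the vectorization (embedding) layer described above, with hypothetical vocabulary sizes and embedding dimension; mapping sparse IDs to dense vectors through nn.Embedding is equivalent to a special fully connected layer over one-hot inputs:

```python
import torch
import torch.nn as nn

class VectorizationLayer(nn.Module):
    def __init__(self, num_users=10_000, num_items=50_000, dim=32):
        super().__init__()
        self.user_emb = nn.Embedding(num_users, dim)   # account-side sparse features
        self.item_emb = nn.Embedding(num_items, dim)   # candidate objects / sequence objects

    def forward(self, user_id, item_ids):
        return self.user_emb(user_id), self.item_emb(item_ids)

layer = VectorizationLayer()
u, v = layer(torch.tensor([3]), torch.tensor([[11, 42, 7]]))  # one account, a 3-object sequence
print(u.shape, v.shape)   # torch.Size([1, 32]) torch.Size([1, 3, 32])
```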
The historical behavior sequences fed in at the input layer, on the other hand, are processed by the multi-sequence processing layer, shown in the right part of the structure in fig. 8.
The historical behavior sequence of each operation behavior takes the behavior object as its carrier. For example, if user A clicks video V at time t, video V is added to user A's click behavior sequence, together with information such as the relevant attributes of V (identification or category) and the behavior time at which V was clicked. Each operation behavior corresponds to its own historical behavior sequence, each behavior record corresponds to one carrier, and whether repeated carriers within the same behavior are de-duplicated can be decided according to the service. For example, in a video service, if user A clicks video V twice, the second click may be accidental, so the two records of video V can be de-duplicated; in a shopping service, by contrast, if the user clicks commodity B twice, this may indicate a preference for commodity B, so both records of commodity B can be kept. A sketch of this construction is given below.
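The sketch below illustrates, in plain Python and with a hypothetical record format, how one behavior sequence could be built with service-dependent de-duplication; it is an illustration of the rule above, not the claimed construction:

```python
def build_sequence(records, dedup=True):
    """records: [(object_id, behavior_time), ...] for one account and one operation behavior."""
    records = sorted(records, key=lambda r: r[1])   # order by behavior time
    if not dedup:                                   # e.g. shopping: repeats may signal preference
        return records
    seen, kept = set(), []                          # e.g. video: repeats may be accidental clicks
    for obj, ts in records:
        if obj not in seen:
            seen.add(obj)
            kept.append((obj, ts))
    return kept

print(build_sequence([("V", 100), ("W", 90), ("V", 120)]))        # video-style de-duplication
print(build_sequence([("B", 10), ("B", 20)], dedup=False))        # shopping keeps both records
```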
Since the processing procedure of the historical behavior sequence corresponding to each operation behavior is similar, one historical behavior sequence, for example the click behavior sequence, is described here as an example.
The constructed click behavior sequence is ordered from the earliest to the most recent behavior according to the timestamps of the clicked objects and then embedded; the click behavior sequence correspondingly outputs a vector representation of length d, where d denotes the number of objects.
Since the click behavior sequence is composed of a plurality of objects, each object in the click behavior sequence can be vectorized by the vectorization layer to obtain a content feature vector of each object. For example, for an object, the object information input by the input layer may include features of each feature dimension of the object, for example, the feature dimension includes a publisher, a type, specific content, and the like, and the vectorization layer may perform dimension reduction processing on the features of each feature dimension to obtain a content feature vector that may characterize content related information of the object.
In the embodiment of the present application, for a click behavior sequence, each object may be characterized by a content feature vector, and also needs to characterize its Position feature in the click behavior sequence, so that each object in the click behavior sequence may also be Position-coded by using a Position Encoder (Position Encoder) layer shown in fig. 8. Position coding is a common operation in Natural Language Processing (NLP) for long sequences, primarily to preserve the structural information of the sequence.
Specifically, one position coding method encodes the position of the object within the sequence, for example the fact that the object is the fourth item in the click behavior sequence, to obtain a time feature vector. Since the objects are arranged in the sequence in chronological order, the position of an object in the sequence also reflects how far its operation time is from the current time, so the resulting time feature vector of each object indirectly reflects that distance.
Another position coding method encodes the position according to how far the occurrence time of the object's operation behavior is from the current time, for example by learning an embedding of the time difference between the behavior occurrence time and the current request time. This distance, which is what distinguishes long-term from short-term interests, yields a time feature vector representing how far the target account's operation time on each object is from the current time.
After the content feature vector and the time feature vector of each object in one click behavior sequence are obtained, the object feature vector of each object can be obtained from them; for example, the time feature vector may be superimposed onto the content feature vector to obtain the corresponding object feature vector.
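A minimal PyTorch sketch of the second position-coding option combined with the superposition step: the time gap between each behavior and the current request is bucketized (the bucketing scheme and sizes are assumptions), embedded as a time feature vector, and added onto the content feature vector to give the object feature vector:

```python
import torch
import torch.nn as nn

class TimePositionEncoder(nn.Module):
    def __init__(self, num_buckets=32, dim=32):
        super().__init__()
        self.time_emb = nn.Embedding(num_buckets, dim)
        self.num_buckets = num_buckets

    def forward(self, content_vecs, behavior_ts, request_ts):
        gap_hours = (request_ts - behavior_ts).clamp(min=0) // 3600       # distance from current time
        buckets = torch.clamp(gap_hours, max=self.num_buckets - 1).long()
        return content_vecs + self.time_emb(buckets)                      # object feature vectors

encoder = TimePositionEncoder()
content = torch.randn(1, 3, 32)                       # 3 objects in one click behavior sequence
behavior_ts = torch.tensor([[1_000, 5_000, 9_000]])   # behavior occurrence times (seconds)
print(encoder(content, behavior_ts, request_ts=torch.tensor(10_000)).shape)  # torch.Size([1, 3, 32])
```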
After the Position Encoder processing, the object feature vectors of the click behavior sequence enter the sequence feature extraction layer, which produces the sequence feature vector of the click behavior sequence.
Specifically, the sequence feature extraction layer may obtain the sequence feature vector of the click behavior sequence from the obtained object feature vectors in any of the following ways (a sketch follows option (3) below).
(1) Pooling (Pooling) treatment
Namely, pooling processing is carried out on each object feature vector of the click behavior sequence to obtain a sequence feature vector of a historical behavior sequence. The pooling treatment may be performed by sum pooling (sum pooling), mean pooling (mean pooling), or maximum pooling (max pooling).
(2) Attention (attention) mechanism processing
Through the model training process, an object weight parameter value can be learned for each object in the sequence; it represents the importance of the object corresponding to each object feature vector within the historical behavior sequence. The sequence feature vector of the historical behavior sequence can then be obtained from each object feature vector and its corresponding object weight parameter value, for example by weighted summation.
Specifically, the object weight parameter values may be derived from attention weight matrices for each object, which may include a query vector (query) weight matrix, a key vector (key) weight matrix and a value vector (value) weight matrix. The query vector, key vector and value vector are obtained from the query, key and value weight matrices respectively; the object weight parameter value of each object is then obtained from its query vector and key vector, and the sequence feature vector is obtained from the object weight parameter values and the value vectors.
(3) Recurrent Neural Network (RNN)
Specifically, the RNN is used for carrying out serialized feature extraction on each object feature vector to obtain a sequence feature vector of the click behavior sequence. The RNN network may specifically be any network structure capable of performing serialized feature extraction, which is not limited in this embodiment of the present application.
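The sketch below (PyTorch) illustrates two of the three options above for collapsing the object feature vectors of one behavior sequence into a single sequence feature vector: mean pooling and an attention-weighted sum; the learned query parameter is an assumption used only to make the attention variant concrete:

```python
import torch
import torch.nn as nn

class AttentionPooling(nn.Module):
    def __init__(self, dim=32):
        super().__init__()
        self.query = nn.Parameter(torch.randn(dim))      # learned query for scoring objects

    def forward(self, obj_vecs):                         # obj_vecs: [batch, seq_len, dim]
        scores = obj_vecs @ self.query                   # [batch, seq_len]
        weights = torch.softmax(scores, dim=-1)          # object weight parameter values
        return (weights.unsqueeze(-1) * obj_vecs).sum(dim=1)

obj_vecs = torch.randn(2, 5, 32)                         # 2 accounts, 5 objects per sequence
mean_pooled = obj_vecs.mean(dim=1)                       # option (1): mean pooling
attn_pooled = AttentionPooling()(obj_vecs)               # option (2): attention weighting
print(mean_pooled.shape, attn_pooled.shape)              # both torch.Size([2, 32])
```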
Through the above process, each historical behavior sequence may output a sequence feature vector correspondingly, as shown in fig. 8, the click behavior sequence and the sharing behavior sequence correspond to the click sequence output and the sharing sequence output, respectively.
Step 902: and performing sequence feature combination on each sequence feature vector to obtain a comprehensive feature vector corresponding to one sequence feature vector.
In the embodiment of the present application, in order to account for the relationships between multiple behaviors and multiple tasks and to make a tendency-based selection among the historical behavior sequences of the various operation behaviors for the different tasks, a sequence combiner is further added to the multi-sequence processing layer, as shown in fig. 8. The input of the sequence combiner is the encoding result of each historical behavior sequence, that is, the sequence feature vector of each historical behavior sequence, so the number of input vectors equals the number of historical behavior sequences. The sequence combiner outputs the combined result of the multiple historical behavior sequences for each task, so the number of output vectors equals the number of tasks. As shown in fig. 8, the tasks include a click task and a sharing task, and the output of the sequence combiner is therefore one comprehensive feature vector for the click task and one for the sharing task.
In practical applications, as shown in fig. 8, the number of tasks may equal the number of operation behaviors, but it may also differ, so that the combined result of each task can be learned from several operation behaviors; for example, the input sequences may include a click behavior sequence, a sharing behavior sequence and a comment behavior sequence, while the tasks include only a click task and a sharing task.
In the embodiment of the application, the purpose of the sequence combiner is to give the outputs of the historical behavior sequences different tendencies for different tasks; for example, in the sharing task the output of the sharing sequence may be weighted more heavily than that of the click sequence. The outputs of the historical behavior sequences are then combined according to these tendencies to obtain the comprehensive result of the task.
Specifically, the following operations are respectively performed for each obtained sequence feature vector:
and aiming at one sequence feature vector, performing sequence feature combination on each sequence feature vector according to the influence degree of various operation behaviors on the operation behavior corresponding to the sequence feature vector respectively to obtain a comprehensive feature vector corresponding to the sequence feature vector.
Since the process of obtaining the corresponding comprehensive feature vector for each task is similar, a click task is still described here as an example. Then, for the sequence feature vector of the click behavior sequence corresponding to the click task, according to the degree of influence of various operation behaviors on the click behavior, the sequence feature vectors can be combined to obtain a comprehensive feature vector corresponding to the sequence feature vector of the click behavior sequence, that is, a comprehensive feature vector of the click task.
In the embodiment of the present application, when obtaining the comprehensive feature vector of each task, the following two ways may be adopted, and the two ways are still introduced below by taking a click task as an example.
(1) First mode
In this mode, the sequence weight parameter value of each operation behavior's sequence feature vector with respect to the sequence feature vector of the click behavior can be obtained through a set regression function, and the comprehensive feature vector of the historical behavior sequence is then obtained from the sequence feature vectors of the historical behavior sequences and the corresponding sequence weight parameter values. The obtained sequence weight parameter values sum to 1 and represent the degree of influence of the operation behavior corresponding to each sequence feature vector on the operation behavior corresponding to the one sequence feature vector.
For example, the set regression function may be a softmax function. Specifically, similar to the gating network of MMoE (Modeling Task Relationships in Multi-task Learning with Multi-gate Mixture-of-Experts), softmax can be computed N times over the outputs of the multiple historical behavior sequences, with the value of N corresponding to the number of tasks.
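A minimal PyTorch sketch of this first mode: one softmax gate per task assigns weights (summing to 1) to the sequence feature vectors, and the weighted sum is the task's comprehensive feature vector; the gate structure is an assumption inspired by MMoE-style gating, not the patented network:

```python
import torch
import torch.nn as nn

class SoftmaxSequenceCombiner(nn.Module):
    def __init__(self, num_sequences=2, num_tasks=2, dim=32):
        super().__init__()
        # one gate per task over the concatenated sequence feature vectors
        self.gates = nn.ModuleList([nn.Linear(num_sequences * dim, num_sequences)
                                    for _ in range(num_tasks)])

    def forward(self, seq_vecs):                          # seq_vecs: [batch, num_sequences, dim]
        flat = seq_vecs.flatten(1)
        outputs = []
        for gate in self.gates:
            w = torch.softmax(gate(flat), dim=-1)         # sequence weight parameter values, sum to 1
            outputs.append((w.unsqueeze(-1) * seq_vecs).sum(dim=1))
        return outputs                                    # one comprehensive feature vector per task

click_vec, share_vec = torch.randn(4, 32), torch.randn(4, 32)
combined = SoftmaxSequenceCombiner()(torch.stack([click_vec, share_vec], dim=1))
print(len(combined), combined[0].shape)                   # 2 torch.Size([4, 32])
```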
(2) Second mode
In this mode, at least one weight representation vector corresponding to each sequence feature vector is obtained from each sequence feature vector and at least one attention weight matrix corresponding to the one sequence feature vector, and the comprehensive feature vector corresponding to the one sequence feature vector is then obtained from the at least one weight representation vector of each sequence feature vector. Each weight representation vector corresponds to one attention weight matrix.
That is, in the specific processing, self-attention is computed N times among the sequence feature vectors of the plurality of historical behavior sequences, so that the comprehensive feature vectors corresponding to the N tasks are obtained respectively.
Specifically, at least one attention weight matrix includes a query weight matrix, a key weight matrix, and a value weight matrix, and for convenience of description, the click task is still used as an example for description, and the processes of other tasks are similar to the process of the click task, so that reference may be made to the following description.
The click task has its own query weight matrix, key weight matrix and value weight matrix. From these three weight matrices and the sequence feature vectors of the various operation behaviors, the query vector, key vector and value vector corresponding to each sequence feature vector can be obtained. Then, from the query vector corresponding to the sequence feature vector of the click behavior sequence and the key vector corresponding to each sequence feature vector, the attention weight value corresponding to each sequence feature vector is obtained; the attention weight value represents the degree of influence of the operation behavior corresponding to each sequence feature vector on the operation behavior corresponding to the one sequence feature vector.
Specifically, taking only the click behavior and the sharing behavior as an example, query vector 1, key vector 1 and value vector 1 corresponding to the click behavior sequence and query vector 2, key vector 2 and value vector 2 corresponding to the sharing behavior sequence are obtained. The similarity between query vector 1 and key vector 1 and the similarity between query vector 1 and key vector 2 can then be calculated, and the attention weight values corresponding to the click behavior sequence and the sharing behavior sequence are obtained from these similarities respectively.
Furthermore, a comprehensive feature vector corresponding to the click task may be obtained according to the value vector corresponding to each sequence feature vector and the attention weight value corresponding to each sequence feature vector.
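A minimal PyTorch sketch of this second mode for a single task (here the click task): Q/K/V projections over the sequence feature vectors, attention weights from the task's query against every key, and a weighted sum of the values; the scaling factor and layer sizes are assumptions:

```python
import torch
import torch.nn as nn

class TaskSelfAttentionCombiner(nn.Module):
    def __init__(self, dim=32):
        super().__init__()
        self.q = nn.Linear(dim, dim)   # query weight matrix of this task
        self.k = nn.Linear(dim, dim)   # key weight matrix
        self.v = nn.Linear(dim, dim)   # value weight matrix

    def forward(self, seq_vecs, task_index=0):            # seq_vecs: [batch, num_sequences, dim]
        q = self.q(seq_vecs[:, task_index])               # query of the task's own sequence
        k, v = self.k(seq_vecs), self.v(seq_vecs)
        scores = (k @ q.unsqueeze(-1)).squeeze(-1) / k.size(-1) ** 0.5
        weights = torch.softmax(scores, dim=-1)           # influence of each behavior on this task
        return (weights.unsqueeze(-1) * v).sum(dim=1)     # comprehensive feature vector

seq_vecs = torch.randn(4, 2, 32)                          # click and sharing sequence outputs
print(TaskSelfAttentionCombiner()(seq_vecs, task_index=0).shape)  # torch.Size([4, 32])
```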
In both methods, different weights are given to each historical behavior sequence in each task, and the final weighted sum result is the output result of the historical behavior sequence to the task.
Through the above process, the comprehensive feature vector corresponding to each task is obtained. As shown in fig. 8, each comprehensive feature vector is then connected (concatenated) to the bottom of the corresponding task sub-network for joint multi-target learning; for example, the comprehensive feature vector of the click task is connected to the bottom of the click-task sub-network for training and prediction of the click target.
Step 903: and respectively obtaining probability value sets corresponding to the candidate recommendation objects according to the obtained comprehensive characteristic vectors.
In the multi-task network layer, in order to let each task fit its data better according to its own label distribution, each task has its own independent neural network structure. As shown in fig. 8, the click task corresponds to one neural network whose output is the click rate, and the sharing task corresponds to another neural network whose output is the sharing rate. The number of layers and the number of nodes per layer are preset; the neural network structures of different tasks can be the same or different, but because the comprehensive feature vectors of the tasks have the same dimensionality, the first-layer nodes of each neural network structure can be the same.
In one possible approach, the layers in each task's neural network may be constructed in a fully connected manner.
In the embodiment of the application, in addition to the comprehensive feature vector of each task, probability prediction may also draw on other auxiliary information, which may include the user information of the target account, recommendation context information, device environment feature information, candidate recommendation object information and the like. As shown in fig. 8, after processing by the vectorization layer, the corresponding feature vectors are obtained, namely the account feature vector and context feature vector of the target account, the device environment feature vector representing the device environment when the target account initiates the object recommendation request, and the object feature vector of each candidate recommendation object. The comprehensive feature vector of each task and these auxiliary vectors are then concatenated at the connection layer and fed into the neural network structures of the multi-task network layer for probability prediction.
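A minimal PyTorch sketch of the multi-task network layer just described, with assumed layer sizes: each task concatenates its comprehensive feature vector with the shared auxiliary vectors and feeds the result through its own fully connected tower:

```python
import torch
import torch.nn as nn

def make_tower(in_dim, hidden=64):
    return nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU(),
                         nn.Linear(hidden, 1), nn.Sigmoid())

dim = 32
aux = torch.randn(4, 3 * dim)        # account + device environment + candidate object vectors
click_vec = torch.randn(4, dim)      # comprehensive feature vector of the click task
share_vec = torch.randn(4, dim)      # comprehensive feature vector of the sharing task

click_tower = make_tower(dim + 3 * dim)
share_tower = make_tower(dim + 3 * dim)

click_rate = click_tower(torch.cat([click_vec, aux], dim=-1))   # predicted click probability
share_rate = share_tower(torch.cat([share_vec, aux], dim=-1))   # predicted sharing probability
print(click_rate.shape, share_rate.shape)                       # torch.Size([4, 1]) each
```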
The model adopted by the embodiment of the application is a multi-sequence, multi-target deep neural network. Modeling and learning multiple service indexes simultaneously in a single multi-target model reduces the model size, the bandwidth needed to fetch model parameters online, and the separate computation time that multiple models would require. Modeling multiple behavior sequences of the user simultaneously mines the user's implicit interests; since time has a large influence on this mining, the Position Encoder is added to distinguish the user's long-term and short-term interests, and the learned sequence interests are added to the multi-target model. In addition, because user behavior sequences are used and updated in real time, user behavior can be captured promptly during online recommendation, so changes in user interest are captured quickly; this is particularly suitable for warm-starting a new service scenario by migrating the interests of existing users.
In summary, a deep neural network model with multiple user behavior sequences and multi-target output is constructed; several core indexes of the recommendation system are optimized simultaneously and fused in online ranking, model training is reduced, additional information is incorporated, and the overall recommendation effect across indexes is improved, which suits most recommendation service scenarios. The approach improves the benefits of multiple indexes in multiple business scenarios at the same time and can quickly capture users' real-time interests.
Referring to fig. 10, based on the same inventive concept, an embodiment of the present application further provides an apparatus 100 for determining a recommended object, including:
a candidate object obtaining unit 1001 configured to obtain a plurality of candidate recommended objects based on an object recommendation request triggered by a target account;
a probability prediction unit 1002, configured to obtain, according to a historical behavior sequence corresponding to each of multiple operation behaviors associated with the target account, a probability value set corresponding to each of multiple candidate recommendation objects, where the historical behavior sequence corresponding to each operation behavior at least includes: at least one object on which the target account performed the corresponding operation behavior within a historical time period, and the probability value set corresponding to each candidate recommended object includes: the probability value of the target account performing each operation behavior for the corresponding candidate recommendation object;
a recommendation degree obtaining unit 1003, configured to obtain recommendation degrees of the candidate recommendation objects based on the obtained probability value sets respectively;
a recommended object determining unit 1004 for determining at least one target recommended object from the plurality of candidate recommended objects based on the obtained respective recommendation degrees.
The probability prediction unit 1002 is specifically configured to:
respectively obtaining probability value sets corresponding to a plurality of candidate recommendation objects according to historical behavior sequences corresponding to a plurality of operation behaviors through a trained probability prediction model;
the probability prediction model is obtained by training a plurality of training samples, and each training sample at least comprises: before the triggering moment of an object recommendation request, a history behavior sequence corresponding to each of a plurality of operation behaviors associated with a corresponding trigger account, at least one object recommended to the corresponding trigger account based on the object recommendation request, and a label labeled for each object based on the actual operation behavior of each recommended object by the corresponding trigger account.
Optionally, the probability prediction unit 1002 is specifically configured to:
respectively obtaining sequence characteristic vectors corresponding to the historical behavior sequences;
for each obtained sequence feature vector, the following operations are respectively executed: aiming at one sequence feature vector, according to the influence degree of various operation behaviors on the operation behavior corresponding to the sequence feature vector, performing sequence feature combination on each sequence feature vector to obtain a comprehensive feature vector corresponding to the sequence feature vector;
and respectively obtaining probability value sets corresponding to the candidate recommendation objects according to the obtained comprehensive characteristic vectors.
Optionally, the probability prediction unit 1002 is specifically configured to:
obtaining content characteristic vectors and time characteristic vectors of all objects in a historical behavior sequence; the time characteristic vector is used for representing the degree of the operation time of the target account on each object from the current time;
respectively obtaining object feature vectors of each object according to the content feature vectors and the time feature vectors of each object;
and obtaining a sequence feature vector of a historical behavior sequence based on the obtained feature vectors of the objects.
Optionally, the probability prediction unit 1002 is specifically configured to:
pooling each object feature vector to obtain a sequence feature vector of a historical behavior sequence; or,
obtaining a sequence feature vector of a historical behavior sequence according to each object feature vector and each corresponding object weight parameter value; the object weight parameter value is used for representing the importance degree of the object corresponding to each object feature vector in a historical behavior sequence; or,
and performing serialized feature extraction on the feature vectors of the objects to obtain a sequence feature vector of a historical behavior sequence.
Optionally, the probability prediction unit 1002 is specifically configured to:
obtaining sequence weight parameter values of each sequence feature vector to one sequence feature vector through a set regression function; the sum of the obtained sequence weight parameter values is 1, and the sequence weight parameter values are used for representing the influence degree of the operation behavior corresponding to each sequence feature vector on the operation behavior corresponding to one sequence feature vector;
and obtaining a comprehensive characteristic vector of the historical behavior sequence according to the sequence characteristic vector of each historical behavior sequence and the corresponding sequence weight parameter value.
Optionally, the probability prediction unit 1002 is specifically configured to:
respectively obtaining at least one weight expression vector corresponding to each sequence feature vector according to each sequence feature vector and at least one attention weight matrix corresponding to one sequence feature vector; wherein each weight representation vector corresponds to an attention weight matrix;
and obtaining a comprehensive characteristic vector corresponding to the sequence characteristic vector according to at least one weight representation vector corresponding to each sequence characteristic vector.
Optionally, the at least one attention weight matrix comprises a query vector weight matrix, a key vector weight matrix, and a value vector weight matrix; the probability prediction unit 1002 is specifically configured to:
respectively obtaining corresponding query vectors, key vectors and value vectors according to the sequence feature vectors and the query vector weight matrix, the key vector weight matrix and the value vector weight matrix;
respectively obtaining the attention weight value corresponding to each sequence feature vector according to the query vector corresponding to one sequence feature vector and the key vector corresponding to each sequence feature vector; the attention weight value is used for representing the influence degree of the operation behavior corresponding to each sequence feature vector on the operation behavior corresponding to one sequence feature vector;
and obtaining a comprehensive characteristic vector corresponding to the sequence characteristic vector according to the value vector corresponding to each sequence characteristic vector and the attention weight value corresponding to each sequence characteristic vector.
Optionally, the probability prediction unit 1002 is specifically configured to:
respectively obtaining probability value sets corresponding to the candidate recommendation objects according to the auxiliary feature vectors and the obtained comprehensive feature vectors;
wherein the auxiliary feature vector at least includes: one or more of an account feature vector of the target account, a device environment feature vector characterizing a device environment when the target account initiates the object recommendation request, and an object feature vector of each candidate recommendation object.
The apparatus may be configured to execute the methods shown in the embodiments shown in fig. 2 to fig. 9, and therefore, for functions and the like that can be realized by each functional module of the apparatus, reference may be made to the description of the embodiments shown in fig. 2 to fig. 9, which is not repeated here.
Referring to fig. 11, based on the same technical concept, an embodiment of the present application further provides a computer device 110, which may include a memory 1101 and a processor 1102.
The memory 1101 is used for storing computer programs executed by the processor 1102. The memory 1101 may mainly include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required for at least one function, and the like; the storage data area may store data created according to use of the computer device, and the like. The processor 1102 may be a Central Processing Unit (CPU), a digital processing unit, or the like. The specific connection medium between the memory 1101 and the processor 1102 is not limited in this embodiment. In the embodiment of the present application, the memory 1101 and the processor 1102 are connected by a bus 1103 in fig. 11, the bus 1103 is indicated by a thick line in fig. 11, and the connection manner between other components is merely illustrative and not limited thereto. The bus 1103 may be divided into an address bus, a data bus, a control bus, and the like. For ease of illustration, only one thick line is shown in FIG. 11, but this is not intended to represent only one bus or type of bus.
The memory 1101 may be a volatile memory (volatile memory), such as a random-access memory (RAM); the memory 1101 may also be a non-volatile memory (non-volatile memory) such as, but not limited to, a read-only memory (ROM), a flash memory (flash memory), a Hard Disk Drive (HDD) or a solid-state drive (SSD), or the memory 1101 may be any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer. The memory 1101 may be a combination of the above memories.
A processor 1102 for executing the method performed by the apparatus in the embodiments shown in fig. 2-9 when invoking the computer program stored in the memory 1101.
In some possible embodiments, various aspects of the methods provided by the present application may also be implemented in the form of a program product including program code for causing a computer device to perform the steps of the methods according to various exemplary embodiments of the present application described above in this specification when the program product is run on the computer device, for example, the computer device may perform the methods performed by the devices in the embodiments shown in fig. 2-9.
The program product may employ any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. A readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples (a non-exhaustive list) of the readable storage medium include: an electrical connection having one or more wires, a portable disk, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
While the preferred embodiments of the present application have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. Therefore, it is intended that the appended claims be interpreted as including preferred embodiments and all alterations and modifications as fall within the scope of the application.
It will be apparent to those skilled in the art that various changes and modifications may be made in the present application without departing from the spirit and scope of the application. Thus, if such modifications and variations of the present application fall within the scope of the claims of the present application and their equivalents, the present application is intended to include such modifications and variations as well.

Claims (14)

1. A method of determining a recommended object, the method comprising:
obtaining a plurality of candidate recommendation objects based on an object recommendation request triggered by a target account;
according to the historical behavior sequence corresponding to each of a plurality of operation behaviors associated with the target account, respectively obtaining a probability value set corresponding to each of the plurality of candidate recommendation objects, wherein the historical behavior sequence corresponding to each operation behavior at least comprises: at least one object on which the target account executed the corresponding operation behavior within a historical time period, and the probability value set corresponding to each candidate recommendation object comprises: the probability value of the target account executing each operation behavior for the corresponding candidate recommendation object;
respectively obtaining the recommendation degree of each candidate recommendation object based on each obtained probability value set;
and determining at least one target recommendation object from the plurality of candidate recommendation objects based on the obtained recommendation degrees.
2. The method of claim 1, wherein obtaining probability value sets corresponding to the candidate recommendation objects respectively according to historical behavior sequences corresponding to a plurality of operation behaviors associated with the target account respectively comprises:
respectively obtaining probability value sets corresponding to the candidate recommendation objects according to historical behavior sequences corresponding to the operation behaviors through a trained probability prediction model;
wherein, the probability prediction model is obtained by training a plurality of training samples, and each training sample at least comprises: before the triggering moment of an object recommendation request, a history behavior sequence corresponding to each of a plurality of operation behaviors associated with a corresponding trigger account, at least one object recommended to the corresponding trigger account based on the object recommendation request, and a label labeled for each object based on the actual operation behavior of each recommended object by the corresponding trigger account.
3. The method of claim 2, wherein obtaining, through a trained probabilistic predictive model, probability value sets corresponding to the candidate recommended objects according to historical behavior sequences corresponding to the operation behaviors respectively comprises:
respectively obtaining sequence characteristic vectors corresponding to the historical behavior sequences;
for each obtained sequence feature vector, the following operations are respectively executed: aiming at one sequence feature vector, according to the influence degree of the various operation behaviors on the operation behavior corresponding to the sequence feature vector, performing sequence feature combination on each sequence feature vector to obtain a comprehensive feature vector corresponding to the sequence feature vector;
and respectively obtaining probability value sets corresponding to the candidate recommendation objects according to the obtained comprehensive characteristic vectors.
4. The method of claim 3, wherein obtaining sequence feature vectors for a sequence of historical behaviors comprises:
obtaining content characteristic vectors and time characteristic vectors of all objects in the historical behavior sequence; the time characteristic vector is used for representing the degree of the distance between the operation time of the target account on each object and the current time;
respectively obtaining the object feature vector of each object according to the content feature vector and the time feature vector of each object;
and obtaining a sequence feature vector of the historical behavior sequence based on the obtained feature vectors of the objects.
5. The method of claim 4, wherein deriving the sequence feature vector for the one sequence of historical behaviors based on the obtained respective object feature vectors comprises:
pooling the characteristic vectors of the objects to obtain a sequence characteristic vector of the historical behavior sequence; or,
obtaining a sequence feature vector of the historical behavior sequence according to the object feature vectors and the corresponding object weight parameter values; the object weight parameter value is used for representing the importance degree of the object corresponding to each object feature vector in the historical behavior sequence; or,
and performing serialized feature extraction on the feature vectors of the objects to obtain a sequence feature vector of the historical behavior sequence.
6. The method according to claim 3, wherein for one sequence feature vector, performing sequence feature combination on each sequence feature vector according to the degree of influence of the plurality of operation behaviors on the operation behavior corresponding to the one sequence feature vector, respectively, to obtain a comprehensive feature vector corresponding to the one sequence feature vector, includes:
obtaining the sequence weight parameter value of each sequence feature vector to the sequence feature vector through a set regression function; the sum of the obtained sequence weight parameter values is 1, and the sequence weight parameter values are used for representing the influence degree of the operation behavior corresponding to each sequence feature vector on the operation behavior corresponding to one sequence feature vector;
and obtaining a comprehensive characteristic vector of the historical behavior sequence according to the sequence characteristic vector of each historical behavior sequence and the corresponding sequence weight parameter value.
7. The method according to claim 3, wherein for one sequence feature vector, performing sequence feature combination on each sequence feature vector according to the influence degree of each of the plurality of operation behaviors on the operation behavior corresponding to the one sequence feature vector, to obtain a comprehensive feature vector corresponding to the one sequence feature vector, includes:
respectively obtaining at least one weight representation vector corresponding to each sequence feature vector according to each sequence feature vector and at least one attention weight matrix corresponding to the sequence feature vector; wherein each weight representation vector corresponds to an attention weight matrix;
and obtaining a comprehensive characteristic vector corresponding to the sequence characteristic vector according to at least one weight representation vector corresponding to each sequence characteristic vector.
8. The method of claim 7, wherein the at least one attention weight matrix comprises a query vector weight matrix, a key vector weight matrix, and a value vector weight matrix;
respectively obtaining at least one weight representation vector corresponding to each sequence feature vector according to each sequence feature vector and at least one attention weight matrix corresponding to the sequence feature vector, including:
obtaining corresponding query vectors, key vectors and value vectors according to the sequence feature vectors, the query vector weight matrix, the key vector weight matrix and the value vector weight matrix;
obtaining a comprehensive feature vector corresponding to the sequence feature vector according to at least one weight representation vector corresponding to each sequence feature vector, including:
respectively obtaining the attention weight value corresponding to each sequence feature vector according to the query vector corresponding to the sequence feature vector and the key vector corresponding to each sequence feature vector; the attention weight value is used for representing the influence degree of the operation behavior corresponding to each sequence feature vector on the operation behavior corresponding to one sequence feature vector;
and obtaining a comprehensive characteristic vector corresponding to the sequence characteristic vector according to the value vector corresponding to each sequence characteristic vector and the attention weight value corresponding to each sequence characteristic vector.
9. The method of claim 3, wherein obtaining probability value sets corresponding to the candidate recommended objects respectively according to the obtained comprehensive feature vectors comprises:
respectively obtaining probability value sets corresponding to the candidate recommendation objects according to the auxiliary feature vectors and the obtained comprehensive feature vectors;
wherein the auxiliary feature vector comprises at least: one or more of the account feature vector of the target account, a device environment feature vector characterizing a device environment when the target account initiates the object recommendation request, and an object feature vector of each candidate recommended object.
10. An apparatus for determining a recommended object, the apparatus comprising:
the candidate object obtaining unit is used for obtaining a plurality of candidate recommended objects based on an object recommendation request triggered by the target account;
a probability prediction unit, configured to obtain, according to a historical behavior sequence corresponding to each of a plurality of operation behaviors associated with the target account, a probability value set corresponding to each of the plurality of candidate recommendation objects, wherein the historical behavior sequence corresponding to each operation behavior at least comprises: at least one object on which the target account executed the corresponding operation behavior within a historical time period, and the probability value set corresponding to each candidate recommendation object comprises: the probability value of the target account executing each operation behavior for the corresponding candidate recommendation object;
a recommendation degree obtaining unit, configured to obtain recommendation degrees of the candidate recommendation objects based on the obtained probability value sets;
and the recommended object determining unit is used for determining at least one target recommended object from the candidate recommended objects based on the obtained recommendation degrees.
11. The apparatus as claimed in claim 10, wherein the probability prediction unit is specifically configured to:
respectively obtaining probability value sets corresponding to the candidate recommendation objects according to historical behavior sequences corresponding to the operation behaviors through a trained probability prediction model;
wherein, the probability prediction model is obtained by training a plurality of training samples, and each training sample at least comprises: before the triggering moment of an object recommendation request, a history behavior sequence corresponding to each of a plurality of operation behaviors associated with a corresponding trigger account, at least one object recommended to the corresponding trigger account based on the object recommendation request, and a label labeled for each object based on the actual operation behavior of each recommended object by the corresponding trigger account.
12. The apparatus as claimed in claim 11, wherein the probability prediction unit is specifically configured to:
respectively obtaining sequence characteristic vectors corresponding to the historical behavior sequences;
for each obtained sequence feature vector, the following operations are respectively executed: aiming at one sequence feature vector, according to the influence degree of the various operation behaviors on the operation behavior corresponding to the sequence feature vector, performing sequence feature combination on each sequence feature vector to obtain a comprehensive feature vector corresponding to the sequence feature vector;
and respectively obtaining probability value sets corresponding to the candidate recommendation objects according to the obtained comprehensive characteristic vectors.
13. A computer device comprising a memory, a processor, and a computer program stored on the memory and executable on the processor,
the processor when executing the computer program realizes the steps of the method of any of claims 1 to 9.
14. A computer storage medium having computer program instructions stored thereon, wherein,
the computer program instructions, when executed by a processor, implement the steps of the method of any one of claims 1 to 9.
CN202110072684.6A 2021-01-20 2021-01-20 Method, device and equipment for determining recommended object and computer storage medium Pending CN114817692A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110072684.6A CN114817692A (en) 2021-01-20 2021-01-20 Method, device and equipment for determining recommended object and computer storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110072684.6A CN114817692A (en) 2021-01-20 2021-01-20 Method, device and equipment for determining recommended object and computer storage medium

Publications (1)

Publication Number Publication Date
CN114817692A true CN114817692A (en) 2022-07-29

Family

ID=82524016

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110072684.6A Pending CN114817692A (en) 2021-01-20 2021-01-20 Method, device and equipment for determining recommended object and computer storage medium

Country Status (1)

Country Link
CN (1) CN114817692A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115689616A (en) * 2022-12-20 2023-02-03 陕西长锦网络科技有限公司 Cloud content pushing method and system based on big data characteristic analysis
CN115689616B (en) * 2022-12-20 2023-11-17 北京国联视讯信息技术股份有限公司 Cloud content pushing method and system based on big data feature analysis
CN116312741A (en) * 2023-01-28 2023-06-23 深圳太力生物技术有限责任公司 Culture medium formula optimization method, device, computer equipment and storage medium

Similar Documents

Publication Publication Date Title
CN111444428B (en) Information recommendation method and device based on artificial intelligence, electronic equipment and storage medium
CN111177575B (en) Content recommendation method and device, electronic equipment and storage medium
CN113626719B (en) Information recommendation method, device, equipment, storage medium and computer program product
CN111538912B (en) Content recommendation method, device, equipment and readable storage medium
CN110309427B (en) Object recommendation method and device and storage medium
CN110012356B (en) Video recommendation method, device and equipment and computer storage medium
CN111339415B (en) Click rate prediction method and device based on multi-interactive attention network
CN111966914B (en) Content recommendation method and device based on artificial intelligence and computer equipment
CN110263244B (en) Content recommendation method, device, storage medium and computer equipment
CN110717098B (en) Meta-path-based context-aware user modeling method and sequence recommendation method
WO2022016522A1 (en) Recommendation model training method and apparatus, recommendation method and apparatus, and computer-readable medium
CN111090756B (en) Artificial intelligence-based multi-target recommendation model training method and device
CN111400603A (en) Information pushing method, device and equipment and computer readable storage medium
CN112528147B (en) Content recommendation method and device, training method, computing device and storage medium
CN111859149A (en) Information recommendation method and device, electronic equipment and storage medium
CN112989212B (en) Media content recommendation method, device and equipment and computer storage medium
CN112749330B (en) Information pushing method, device, computer equipment and storage medium
CN111858969B (en) Multimedia data recommendation method, device, computer equipment and storage medium
US20230237093A1 (en) Video recommender system by knowledge based multi-modal graph neural networks
CN115510313A (en) Information recommendation method and device, storage medium and computer equipment
CN116452263A (en) Information recommendation method, device, equipment, storage medium and program product
CN115618024A (en) Multimedia recommendation method and device and electronic equipment
CN114817692A (en) Method, device and equipment for determining recommended object and computer storage medium
CN116628345B (en) Content recommendation method and device, electronic equipment and storage medium
CN116484105A (en) Service processing method, device, computer equipment, storage medium and program product

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination