CN115146152A - Recommendation system training method, recommendation device, electronic equipment and storage medium - Google Patents


Info

Publication number
CN115146152A
CN115146152A
Authority
CN
China
Prior art keywords
data
account
recommendation
recommendation system
index information
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210640273.7A
Other languages
Chinese (zh)
Inventor
黄睿
李阔
郑凯
张晨斌
宋洋
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Dajia Internet Information Technology Co Ltd
Original Assignee
Beijing Dajia Internet Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Dajia Internet Information Technology Co Ltd filed Critical Beijing Dajia Internet Information Technology Co Ltd
Priority to CN202210640273.7A
Publication of CN115146152A
Legal status: Pending

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90Details of database functions independent of the retrieved data types
    • G06F16/95Retrieval from the web
    • G06F16/953Querying, e.g. by the use of web search engines
    • G06F16/9535Search customisation based on user profiles and personalisation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90Details of database functions independent of the retrieved data types
    • G06F16/95Retrieval from the web
    • G06F16/953Querying, e.g. by the use of web search engines
    • G06F16/9538Presentation of query results

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The present disclosure relates to a recommendation system training method, a recommendation apparatus, an electronic device, and a storage medium. The method includes: determining an initial recommendation object corresponding to a recommendation system based on initial account state data and initial account behavior data; determining preset index information, transitional account state data and transitional account behavior data of the recommendation system in a preset time period according to the initial account state data, the initial account behavior data and the initial recommendation object; performing sparse-to-dense conversion on the preset index information in the preset time period to obtain preset index information for each time step of the recommendation system in the preset time period; and training the recommendation system based on the preset index information, the transitional account state data and the transitional account behavior data of each time step to obtain a target recommendation system. With the method and the device, a sparse long-term index can be decomposed into improved dense long-term indexes, so that a recommendation system trained on the long-term index meets the index requirements.

Description

Recommendation system training method, recommendation device, electronic equipment and storage medium
Technical Field
The present disclosure relates to the field of internet technologies, and in particular, to a recommendation system training method, a recommendation apparatus, an electronic device, and a storage medium.
Background
With the rapid development of the mobile internet, the recall, ranking and strategy stages of a recommendation system have become a standard paradigm. A personalized recommendation service needs to screen and score a large number of items in a candidate set and recommend high-quality objects with good content to the user, thereby improving user satisfaction and retention and, in turn, realizing the comprehensive value and sustainable development of the recommendation system.
Existing recommendation methods usually calculate scores from predicted values of users' instant feedback and rank objects based on those scores. This essentially improves short-term user indexes, such as click-through rate and watch rate, and only indirectly influences long-term indexes such as retention rate; the long-term indexes are therefore inaccurate, and a recommendation system trained on such long-term indexes fails to meet the index requirements.
Disclosure of Invention
The disclosure provides a recommendation system training method, a recommendation device, an electronic device and a storage medium, and the technical scheme of the disclosure is as follows:
according to a first aspect of the embodiments of the present disclosure, there is provided a recommendation system training method, including:
determining initial account state data and initial account behavior data of a recommendation system;
determining an initial recommendation object corresponding to the recommendation system based on the initial account state data and the initial account behavior data;
determining preset index information, transitional account state data and transitional account behavior data of the recommendation system in a preset time period according to the initial account state data, the initial account behavior data and the initial recommendation object; the transitional account state data characterizes improved data for the initial account state data based on the initial recommendation object; the transitional account behavior data characterizes improved data for the initial account behavior data based on the initial recommendation object;
performing sparse-to-dense conversion on the preset index information in the preset time period to obtain the preset index information of each time step of the recommendation system in the preset time period;
training the recommendation system based on the preset index information, transitional account state data and transitional account behavior data of each time step to obtain a target recommendation system; wherein the preset index information corresponding to the target recommendation system meets a preset index condition.
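For illustration only, the training step above can be sketched as a policy-gradient-style update in which the densified per-time-step index information serves as the reward signal. The linear-softmax recommender, the function name, and the data layout below are hypothetical stand-ins, not the patented implementation:

```python
import numpy as np

def train_recommender(W, steps, lr=0.05):
    """One policy-gradient-style update of a linear-softmax recommender.

    W:     (n_actions, d) policy weights (hypothetical parameterization).
    steps: list of (x, a, r) tuples, where x is a (d,) account state/behavior
           feature vector, a the index of the recommended object, and r the
           densified per-time-step index information used as the reward.
    """
    grad = np.zeros_like(W)
    for x, a, r in steps:
        logits = W @ x
        p = np.exp(logits - logits.max())
        p /= p.sum()                          # softmax over candidate objects
        one_hot = np.zeros_like(p)
        one_hot[a] = 1.0
        grad += r * np.outer(one_hot - p, x)  # r-weighted grad of log pi(a|x)
    return W + lr * grad

rng = np.random.default_rng(0)
W0 = np.zeros((3, 4))
steps = [(rng.normal(size=4), int(rng.integers(3)), 1.0) for _ in range(10)]
W1 = train_recommender(W0, steps)
```

Any policy-style learner that consumes a per-step reward could be substituted here; the point is that the dense per-step index makes such an update well-defined at every time step.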
In some possible embodiments, determining the preset index information, the transitional account state data and the transitional account behavior data of the recommendation system over a preset time period according to the initial account state data, the initial account behavior data and the initial recommendation object includes:
inputting the initial account state data, the initial account behavior data and the initial recommendation object into a data simulator to obtain first account state data, first account behavior data and a first time step of a recommendation system;
determining a first recommendation object corresponding to the recommendation system based on the first account state data and the first account behavior data; and repeating the following loop: inputting the first account state data, the first account behavior data and the first recommendation object into the data simulator to obtain second account state data, second account behavior data and a second time step of the recommendation system; until the preset index information, transitional account state data and transitional account behavior data of the recommendation system in the preset time period are obtained;
the preset time period consists of the plurality of time steps generated by the loop, and the plurality of time steps include the first time step and the second time step.
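The recommend-then-simulate loop described above can be sketched as follows. The `ToyDataSimulator`, its transition rule, and the lambda recommender are invented stand-ins for the trained data simulator and the recommendation system:

```python
class ToyDataSimulator:
    """Stand-in for the trained data simulator: given the current account
    state, account behavior, and recommended object, it returns the account
    state and behavior data for the next time step."""
    def step(self, state, behavior, rec_obj):
        next_state = 0.9 * state + 0.1 * rec_obj        # state drifts toward the recommendation
        next_behavior = behavior + int(rec_obj > state)  # e.g. a running click count
        return next_state, next_behavior

def rollout(simulator, recommend, state, behavior, n_steps):
    """The loop of the embodiment: recommend -> simulate -> repeat,
    collecting the transitional state/behavior data at every time step."""
    trajectory = []
    for t in range(n_steps):
        rec_obj = recommend(state, behavior)
        state, behavior = simulator.step(state, behavior, rec_obj)
        trajectory.append((t, state, behavior))
    return trajectory

traj = rollout(ToyDataSimulator(), lambda s, b: s + 0.5, state=0.0, behavior=0, n_steps=5)
```

Each loop iteration yields one time step, so the preset time period is simply the collected trajectory.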
In some possible embodiments, performing sparse-to-dense conversion on the preset index information in the preset time period to obtain the preset index information of each time step in the preset time period includes:
determining recommendation system state data corresponding to each time step based on the account state data and account behavior data corresponding to each time step in the preset time period;
determining recommendation system behavior data corresponding to each time step based on the recommendation object corresponding to each time step in the preset time period;
and inputting the preset time period, the preset index information in the preset time period, and the recommendation system state data and recommendation system behavior data corresponding to each time step into the trained index information decomposer to obtain the preset index information of the recommendation system at each time step.
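One plausible reading of this sparse-to-dense conversion is a trained scorer that splits the period-level index across time steps. The softmax-weighted linear scorer below is an assumed, illustrative form; the patent does not specify the decomposer's architecture:

```python
import numpy as np

def decompose_index(total_index, step_states, step_behaviors, w):
    """Splits a sparse, period-level index value into dense per-time-step
    index information. Each time step is scored from the recommendation
    system state and behavior data, and the total is divided according to
    a softmax over those scores, so the per-step values sum to the total."""
    feats = np.concatenate([step_states, step_behaviors], axis=1)  # (T, d)
    scores = feats @ w
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()          # softmax over the T time steps
    return total_index * weights      # dense per-step index information

T = 6
rng = np.random.default_rng(1)
per_step = decompose_index(
    total_index=10.0,
    step_states=rng.normal(size=(T, 2)),
    step_behaviors=rng.normal(size=(T, 2)),
    w=rng.normal(size=4),
)
```

The softmax form guarantees by construction that the dense per-step values accumulate back to the sparse period-level index.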
In some possible embodiments, the method further comprises:
constructing an original information decomposer;
determining reference index information corresponding to each time step based on a preset time period and preset index information on the preset time period;
inputting the recommendation system state data and behavior data corresponding to each time step into the original information decomposer to obtain prediction index information corresponding to each time step;
training the original information decomposer based on the reference index information corresponding to each time step and the prediction index information corresponding to each time step;
and obtaining the index information decomposer under the condition of meeting the iteration termination condition.
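A minimal sketch of such decomposer training, assuming a linear decomposer trained with a loss that combines per-step supervision against the reference index information with a penalty on the accumulated prediction (the linear model, learning rate, and iteration count are all assumptions):

```python
import numpy as np

def train_decomposer(w, feats, ref_per_step, total_index, lr=0.01, iters=200):
    """Gradient descent on a two-part loss: (a) mean squared error of the
    per-step predictions against the reference index information, and
    (b) a penalty keeping the accumulated prediction close to the known
    period-level preset index information."""
    for _ in range(iters):
        pred = feats @ w
        per_step_err = pred - ref_per_step        # drives loss (a)
        cum_err = pred.sum() - total_index        # drives loss (b)
        grad = feats.T @ per_step_err / len(pred) + cum_err * feats.sum(axis=0)
        w = w - lr * grad
    return w

rng = np.random.default_rng(2)
T, d = 8, 3
feats = rng.normal(size=(T, d))   # per-step system state/behavior features
ref = rng.random(T)               # reference per-step index information
w = train_decomposer(np.zeros(d), feats, ref, total_index=float(ref.sum()))
```

The two loss terms mirror the two conditions of the termination criterion in the next embodiment: per-step accuracy and cumulative accuracy.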
In some possible embodiments, obtaining the index information decomposer in the case that the iteration termination condition is satisfied includes:
terminating the training of the original information decomposer under the condition that, for each time step, the difference between the prediction index information and the reference index information is smaller than or equal to a first preset difference, and the difference between the accumulated index information and the preset index information in the preset time period is smaller than or equal to a second preset difference;
and determining the trained original information decomposer as an index information decomposer.
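The two-part termination condition can be expressed directly; the function name and argument order below are illustrative:

```python
def should_terminate(pred_per_step, ref_per_step, total_index, eps_step, eps_total):
    """True when every per-step prediction is within eps_step of its
    reference index information AND the accumulated prediction is within
    eps_total of the period-level preset index information."""
    step_ok = all(abs(p - r) <= eps_step
                  for p, r in zip(pred_per_step, ref_per_step))
    total_ok = abs(sum(pred_per_step) - total_index) <= eps_total
    return step_ok and total_ok
```

Training stops only when both tolerances hold simultaneously, at which point the trained model is taken as the index information decomposer.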
In some possible embodiments, the method further comprises:
acquiring a sample data set and initial data of an original generator;
training an original generator and a discriminator based on the sample data set and the initial data to obtain a target generator; the target generator includes a data simulator.
In some possible embodiments, obtaining the sample data set and the initial data of the primitive generator includes:
acquiring historical offline data of a recommendation system;
dividing the historical offline data into a plurality of sample data based on visit rounds; the number of sample data is the same as the number of visit rounds; each of the plurality of sample data comprises sample account state data, sample account behavior data, and sample system behavior data;
and sampling historical offline data of the recommendation system to obtain initial data of the original generator.
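Splitting the historical offline data by visit round might look like the following; the record fields (`round`, `account_state`, and so on) are hypothetical names, not fields defined by the patent:

```python
def split_by_visit_round(history):
    """Groups historical offline records into one sample per visit round,
    so the number of samples equals the number of visit rounds."""
    samples = {}
    for rec in history:
        samples.setdefault(rec["round"], []).append(
            (rec["account_state"], rec["account_behavior"], rec["system_behavior"])
        )
    return [samples[r] for r in sorted(samples)]

log = [
    {"round": 1, "account_state": "s1", "account_behavior": "b1", "system_behavior": "a1"},
    {"round": 2, "account_state": "s2", "account_behavior": "b2", "system_behavior": "a2"},
    {"round": 1, "account_state": "s1b", "account_behavior": "b1b", "system_behavior": "a1b"},
]
samples = split_by_visit_round(log)
```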
In some possible embodiments, training the original generator and the discriminator based on the sample data set and the initial data to obtain the target generator includes:
inputting the sample data corresponding to the first visit round and the initial data into a discriminator to obtain a first discrimination result of the sample data corresponding to the first visit round and a second discrimination result of the initial data;
determining a target loss of the discriminator based on the label information of the sample data corresponding to the first visit round, first label information of the initial data corresponding to the discriminator, the first discrimination result and the second discrimination result;
training the discriminator based on the target loss of the discriminator;
determining a target loss of the original generator based on the second discrimination result and the first label information of the initial data corresponding to the original generator;
training the original generator based on the target loss of the original generator;
generating first generated data based on the original generator and the initial data;
inputting the sample data corresponding to the second visit round and the first generated data into the discriminator to obtain a first discrimination result of the sample data corresponding to the second visit round and a second discrimination result of the first generated data; and repeating the loop: determining the target loss of the discriminator based on the label information of the sample data corresponding to the second visit round, the first label information of the first generated data corresponding to the discriminator, the first discrimination result and the second discrimination result;
and obtaining the target generator under the condition that the iteration termination condition is met.
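These alternating discriminator/generator updates follow the usual adversarial training pattern: real samples are labelled 1 and generated samples 0 for the discriminator's loss, while the generator is trained against label 1 so its output fools the discriminator. The one-dimensional toy models below only illustrate that pattern; they are not the patented generator or discriminator:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gan_round(disc_w, gen_shift, real_batch, init_batch, lr=0.1):
    """One visit round of alternating updates. Discriminator sigmoid(w*x)
    is trained with real samples labelled 1 and generated samples labelled
    0; the generator (here just 'shift the initial data by gen_shift') is
    then trained against label 1."""
    fake = init_batch + gen_shift
    # discriminator step (gradient of binary cross-entropy w.r.t. w)
    d_grad = (-(1.0 - sigmoid(disc_w * real_batch)) * real_batch).mean() \
             + (sigmoid(disc_w * fake) * fake).mean()
    disc_w = disc_w - lr * d_grad
    # generator step (non-saturating loss, gradient w.r.t. gen_shift)
    g_grad = (-(1.0 - sigmoid(disc_w * fake)) * disc_w).mean()
    gen_shift = gen_shift - lr * g_grad
    return disc_w, gen_shift

rng = np.random.default_rng(3)
real = rng.normal(loc=2.0, size=64)   # "sample data" from historical logs
init = rng.normal(loc=0.0, size=64)   # initial data fed to the generator
disc_w, gen_shift = 0.0, 0.0
for _ in range(50):
    disc_w, gen_shift = gan_round(disc_w, gen_shift, real, init)
```

Over the rounds the generator's shift moves its output toward the real data, which is the behavior the claim's iteration loop relies on.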
In some possible embodiments, the original generator includes an original simulator and an original recommender; generating the first generated data based on the original generator and the initial data includes:
inputting the sample account state data and sample account behavior data in the initial data into the original recommender to obtain first system behavior data;
inputting the first system behavior data and the sample account state data and sample account behavior data in the initial data into the original simulator to obtain first account state data and first account behavior data;
first generation data is determined based on the first account status data, the first account behavior data, and the first system behavior data.
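The generator forward pass described above composes the original recommender and the original simulator; a schematic version with stand-in callables (the lambdas are toy placeholders, not the patented models):

```python
def generate_first_data(recommender, simulator, init_state, init_behavior):
    """Forward pass of the original generator: the recommender maps sample
    account state/behavior data to system behavior data, the simulator maps
    (system behavior, state, behavior) to the first account state/behavior
    data, and the resulting triple is the first generated data."""
    sys_behavior = recommender(init_state, init_behavior)
    state1, behavior1 = simulator(sys_behavior, init_state, init_behavior)
    return state1, behavior1, sys_behavior

gen = generate_first_data(
    recommender=lambda s, b: s + b,            # toy stand-in recommender
    simulator=lambda a, s, b: (s + a, b + 1),  # toy stand-in simulator
    init_state=1.0,
    init_behavior=2.0,
)
```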
In some possible embodiments, determining the preset index information, the transitional account state data and the transitional account behavior data of the recommendation system over a preset time period according to the initial account state data, the initial account behavior data and the initial recommendation object includes:
and determining preset index information, transitional account state data and transitional account behavior data corresponding to the retention rate and/or the account preference degree of the recommendation system in a preset time period according to the initial account state data, the initial account behavior data and the initial recommendation object.
According to a second aspect of the embodiments of the present disclosure, there is provided a recommendation method including:
acquiring account state data and account execution data of a target account;
inputting the account state data and the account execution data of the target account into the target recommendation system trained according to the recommendation system training method of any one of claims 1 to 10, to obtain the target recommendation object.
According to a third aspect of the embodiments of the present disclosure, there is provided a recommendation system training apparatus including:
a first data determination module configured to determine initial account state data and initial account behavior data of a recommendation system;
an object determination module configured to determine an initial recommendation object corresponding to the recommendation system based on the initial account state data and the initial account behavior data;
a second data determination module configured to determine the preset index information, transitional account state data and transitional account behavior data of the recommendation system in a preset time period according to the initial account state data, the initial account behavior data and the initial recommendation object; the transitional account state data characterizes improved data for the initial account state data based on the initial recommendation object; the transitional account behavior data characterizes improved data for the initial account behavior data based on the initial recommendation object;
the information conversion module is configured to perform sparse-to-dense conversion on the preset index information in the preset time period to obtain the preset index information of each time step of the recommendation system in the preset time period;
a training module configured to train the recommendation system based on the preset index information, transitional account state data and transitional account behavior data of each time step to obtain a target recommendation system; wherein the preset index information corresponding to the target recommendation system meets a preset index condition.
In some possible embodiments, the second data determination module is configured to perform:
inputting the initial account state data, the initial account behavior data and the initial recommendation object into a data simulator to obtain first account state data, first account behavior data and a first time step of a recommendation system;
determining a first recommendation object corresponding to the recommendation system based on the first account state data and the first account behavior data; and repeating the following loop: inputting the first account state data, the first account behavior data and the first recommendation object into the data simulator to obtain second account state data, second account behavior data and a second time step of the recommendation system; until the preset index information, transitional account state data and transitional account behavior data of the recommendation system in the preset time period are obtained;
the preset time period consists of the plurality of time steps generated by the loop, and the plurality of time steps include the first time step and the second time step.
In some possible embodiments, the information conversion module is configured to perform:
determining recommendation system state data corresponding to each time step based on the account state data and account behavior data corresponding to each time step in the preset time period;
determining recommendation system behavior data corresponding to each time step based on the recommendation object corresponding to each time step in the preset time period;
and inputting the preset time period, the preset index information in the preset time period, and the recommendation system state data and recommendation system behavior data corresponding to each time step into the trained index information decomposer to obtain the preset index information of the recommendation system at each time step.
In some possible embodiments, the apparatus further comprises a decomposer training module configured to perform:
constructing an original information decomposer;
determining reference index information corresponding to each time step based on a preset time period and preset index information on the preset time period;
inputting the recommendation system state data and behavior data corresponding to each time step into the original information decomposer to obtain prediction index information corresponding to each time step;
training the original information decomposer based on the reference index information corresponding to each time step and the prediction index information corresponding to each time step;
and obtaining the index information decomposer under the condition of meeting the iteration termination condition.
In some possible embodiments, the decomposer training module is configured to perform:
terminating the training of the original information decomposer under the condition that, for each time step, the difference between the prediction index information and the reference index information is smaller than or equal to a first preset difference, and the difference between the accumulated index information and the preset index information in the preset time period is smaller than or equal to a second preset difference;
and determining the trained original information decomposer as an index information decomposer.
In some possible embodiments, the apparatus further comprises a target generator determination module configured to perform:
acquiring a sample data set and initial data of an original generator;
training an original generator and a discriminator based on the sample data set and the initial data to obtain a target generator; the target generator includes a data simulator.
In some possible embodiments, the target generator determination module is configured to perform:
acquiring historical offline data of a recommendation system;
dividing the historical offline data into a plurality of sample data based on visit rounds; the number of sample data is the same as the number of visit rounds; each of the plurality of sample data comprises sample account state data, sample account behavior data, and sample system behavior data;
and sampling historical offline data of the recommendation system to obtain initial data of the original generator.
In some possible embodiments, the target generator determination module is configured to perform:
inputting the sample data corresponding to the first visit round and the initial data into a discriminator to obtain a first discrimination result of the sample data corresponding to the first visit round and a second discrimination result of the initial data;
determining a target loss of the discriminator based on the label information of the sample data corresponding to the first visit round, first label information of the initial data corresponding to the discriminator, the first discrimination result and the second discrimination result;
training the discriminator based on the target loss of the discriminator;
determining a target loss of the original generator based on the second discrimination result and the first label information of the initial data corresponding to the original generator;
training the original generator based on the target loss of the original generator;
generating first generated data based on the original generator and the initial data;
inputting the sample data corresponding to the second visit round and the first generated data into the discriminator to obtain a first discrimination result of the sample data corresponding to the second visit round and a second discrimination result of the first generated data; and repeating the loop: determining the target loss of the discriminator based on the label information of the sample data corresponding to the second visit round, the first label information of the first generated data corresponding to the discriminator, the first discrimination result and the second discrimination result;
and obtaining the target generator under the condition that the iteration termination condition is met.
In some possible embodiments, the original generator comprises an original simulator and an original recommender; the target generator determination module is configured to perform:
inputting the sample account state data and sample account behavior data in the initial data into the original recommender to obtain first system behavior data;
inputting the first system behavior data and the sample account state data and sample account behavior data in the initial data into the original simulator to obtain first account state data and first account behavior data;
first generation data is determined based on the first account status data, the first account behavior data, and the first system behavior data.
In some possible embodiments, the second data determination module is configured to perform:
and determining preset index information, transitional account state data and transitional account behavior data corresponding to the retention rate and/or the account preference degree of the recommendation system in a preset time period according to the initial account state data, the initial account behavior data and the initial recommendation object.
According to a fourth aspect of the embodiments of the present disclosure, there is provided a recommendation apparatus including:
the data acquisition module is configured to acquire account state data and account execution data of the target account;
and an object recommendation module configured to input the account state data and the account execution data of the target account into the target recommendation system obtained through training by the recommendation system training apparatus, to obtain the target recommendation object.
According to a fifth aspect of the embodiments of the present disclosure, there is provided an electronic device including: a processor; and a memory for storing instructions executable by the processor; wherein the processor is configured to execute the instructions to implement the method of any one of the first or second aspects described above.
According to a sixth aspect of embodiments of the present disclosure, there is provided a computer-readable storage medium, wherein instructions, when executed by a processor of an electronic device, enable the electronic device to perform the method of any one of the first or second aspects of the embodiments of the present disclosure.
According to a seventh aspect of embodiments of the present disclosure, there is provided a computer program product comprising a computer program, the computer program being stored in a readable storage medium, from which at least one processor of a computer device reads and executes the computer program, causing the computer device to perform the method of any one of the first or second aspects of embodiments of the present disclosure.
The technical scheme provided by the embodiment of the disclosure at least brings the following beneficial effects:
Determining initial account state data and initial account behavior data of a recommendation system; determining an initial recommendation object corresponding to the recommendation system based on the initial account state data and the initial account behavior data; determining preset index information, transitional account state data and transitional account behavior data of the recommendation system in a preset time period according to the initial account state data, the initial account behavior data and the initial recommendation object, where the transitional account state data represents improved data of the initial account state data based on the initial recommendation object, and the transitional account behavior data represents improved data of the initial account behavior data based on the initial recommendation object; performing sparse-to-dense conversion on the preset index information in the preset time period to obtain preset index information for each time step of the recommendation system in the preset time period; and training the recommendation system based on the preset index information, transitional account state data and transitional account behavior data of each time step to obtain a target recommendation system whose preset index information meets a preset index condition. In this way, a sparse long-term index can be decomposed into improved dense long-term indexes, so that a recommendation system trained on the long-term index meets the index requirements.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and, together with the description, serve to explain the principles of the disclosure and are not to be construed as limiting the disclosure.
FIG. 1 is a schematic diagram illustrating an application environment in accordance with an illustrative embodiment;
FIG. 2 is a flow diagram illustrating a recommendation system training method in accordance with an exemplary embodiment;
FIG. 3 is a flowchart illustrating the acquisition of sample data sets and initial data of a raw generator in accordance with an exemplary embodiment;
FIG. 4 is a flow diagram illustrating training of a generator and a discriminator in accordance with an exemplary embodiment;
FIG. 5 is a block diagram illustrating a generator and a discriminator according to an exemplary embodiment;
FIG. 6 is a flow diagram illustrating a data simulator application in accordance with an exemplary embodiment;
FIG. 7 is a flow chart illustrating a method for obtaining preset index information over a preset time period in accordance with an exemplary embodiment;
FIG. 8 is a flowchart illustrating a method for obtaining preset index information for each time step according to an exemplary embodiment;
FIG. 9 is a flow diagram illustrating training of an index information decomposer, according to an exemplary embodiment;
FIG. 10 is a schematic diagram illustrating a recommendation system training architecture in accordance with an exemplary embodiment;
FIG. 11 is a flow chart illustrating a recommendation method in accordance with an exemplary embodiment;
FIG. 12 is a block diagram illustrating a recommendation system training device in accordance with an exemplary embodiment;
FIG. 13 is a block diagram illustrating a recommendation system training apparatus in accordance with an exemplary embodiment;
FIG. 14 is a block diagram illustrating an electronic device for recommendation system training or recommendation in accordance with an exemplary embodiment.
Detailed Description
In order to make the technical solutions of the present disclosure better understood, the technical solutions in the embodiments of the present disclosure will be clearly and completely described below with reference to the accompanying drawings.
It should be noted that the terms "first," "second," and the like in the description and claims of the present disclosure and in the foregoing drawings are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the disclosure described herein are capable of operation in sequences other than those illustrated or otherwise described herein. The implementations described in the exemplary embodiments below are not intended to represent all implementations consistent with the present disclosure. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the present disclosure, as detailed in the appended claims.
It should be noted that, the user information (including but not limited to user device information, user personal information, etc.) and data (including but not limited to data for presentation, analyzed data, etc.) referred to in the present disclosure are information and data authorized by the user or sufficiently authorized by each party.
Referring to fig. 1, fig. 1 is a schematic diagram illustrating an application environment of a recommendation system training method according to an exemplary embodiment, and as shown in fig. 1, the application environment may include a recommendation system training client 01 and a server 02.
In the embodiment of the application, the client 01 may obtain the sample data set through interaction with the server 02.
Alternatively, client 01 may include, but is not limited to, smart phones, desktop computers, tablet computers, laptop computers, smart speakers, digital assistants, augmented reality (AR)/virtual reality (VR) devices, smart wearable devices, and the like. The client 01 may also be software running on such a device, such as an application or an applet. Optionally, the operating system running on the device may include, but is not limited to, an Android system, an iOS system, Linux, Windows, Unix, and the like.
Optionally, the server 02 determines primary account status data and primary account behavior data of the recommendation system, and determines an initial recommendation object corresponding to the recommendation system based on the primary account status data and the primary account behavior data. The server 02 then determines preset index information, transitional account status data, and transitional account behavior data of the recommendation system over a preset time period according to the primary account status data, the primary account behavior data, and the initial recommendation object, where the transitional account status data characterizes data improved from the primary account status data based on the initial recommendation object, and the transitional account behavior data characterizes data improved from the primary account behavior data based on the initial recommendation object. The server 02 performs sparse-to-dense conversion on the preset index information over the preset time period to obtain preset index information for each time step of the recommendation system within the preset time period, and trains the recommendation system based on the preset index information of each time step, the transitional account status data, and the transitional account behavior data to obtain a target recommendation system, where the preset index information corresponding to the target recommendation system meets a preset index condition. After obtaining the target recommendation system, the server 02 obtains account status data and account execution data of a target account, and inputs the account status data and account execution data of the target account into the target recommendation system to obtain a target recommendation object.
The server 02 may be an independent physical server, a server cluster or distributed system formed by a plurality of physical servers, or a cloud server providing basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, CDN (Content Delivery Network), and big data and artificial intelligence platforms. The operating system running on the server may include, but is not limited to, an Android system, an iOS system, Linux, Windows, Unix, and the like.
In addition, it should be noted that fig. 1 shows only one application environment of the recommendation system training method provided by the present disclosure, and in practical applications, other application environments may also be included.
Fig. 2 is a flowchart illustrating a recommendation system training method according to an exemplary embodiment, where as shown in fig. 2, the recommendation system training method may be applied to a server or a client, and includes the following steps:
in step S201, primary account status data and primary account behavior data of the recommendation system are determined.
In the embodiment of the application, the server can determine the primary account status data and the primary account behavior data of the recommendation system. Optionally, the primary account status data includes a tag list of the objects (videos) clicked by the user among all objects recommended by the recommendation system, to the account information, in the corresponding access round; the primary account behavior data includes the viewing feedback characterizations, by the client corresponding to the account information, of all objects recommended by the recommendation system in the corresponding access round, and the interval time between the access round and the next access round.
In the embodiment of the present application, the recommendation system may be an object recommendation system, and the object may include video, music, information, and the like.
In an alternative embodiment, the primary account status data and the primary account behavior data of the recommendation system may be the primary account status data and the primary account behavior data of one account, or may be the primary account status data and the primary account behavior data of a plurality of accounts.
In step S203, an initial recommendation object corresponding to the recommendation system is determined based on the primary account status data and the primary account behavior data.
In the embodiment of the application, the server may determine the initial recommendation object corresponding to the recommendation system based on the primary account state data and the primary account behavior data.
Optionally, the recommendation system at this time is not trained, so the server may input the primary account state data and the primary account behavior data into the recommendation system that is not trained, and obtain an initial recommendation object output by the recommendation system.
In step S205, preset index information, transitional account status data, and transitional account behavior data of the recommendation system over a preset time period are determined according to the primary account status data, the primary account behavior data, and the initial recommendation object; the transitional account status data characterizes data improved from the primary account status data based on the initial recommendation object; the transitional account behavior data characterizes data improved from the primary account behavior data based on the initial recommendation object.
Optionally, the transitional account status data includes the tag list of the objects (videos) clicked by the user, among all objects recommended by the recommendation system in the corresponding access round, after the primary account status data has been improved based on the initial recommendation object; the transitional account behavior data includes the viewing feedback characterizations, by the client corresponding to the account information, of all objects recommended by the recommendation system in the corresponding access round, and the interval time between the access round and the next access round, after the primary account behavior data has been improved based on the initial recommendation object.
In an optional embodiment, the server may place the whole process on the platform bearing the recommendation system. That is, the server obtains the primary account status data and primary account behavior data of the recommendation system, inputs them into the untrained recommendation system to obtain the initial recommendation object output by the recommendation system, determines the preset index information of the recommendation system over the preset time period based on the user's feedback on the initial recommendation object (such as clicking to view), and obtains the new transitional account status data and transitional account behavior data.
In an alternative embodiment, the server may use the data simulator to mimic the user's feedback, speeding up the entire training process.
An embodiment for obtaining the data simulator is described below. In an alternative embodiment, the data simulator is part of a target generator obtained by training an original generator.
In the embodiment of the application, the server can acquire the sample data set and the initial data of the original generator. Then, the server can train an original generator and a discriminator based on the sample data set and the initial data to obtain a target generator; wherein the target generator comprises a data simulator.
Alternatively, the original generator may be a newly constructed generator without any training, or a previously constructed generator that has undergone some training but has not yet completed training.
In an embodiment of the present application, the sample data set may include a plurality of sample data. Optionally, the plurality of sample data may be sample data corresponding to one account information. Optionally, the multiple sample data may be sample data corresponding to multiple account information, that is, each piece of account information in the multiple pieces of account information corresponds to multiple sample data, and the multiple sample data corresponding to each piece of account information form a sample data set.
An implementation method for acquiring sample data sets and initial data of an original generator is described below. FIG. 3 is a flowchart illustrating the acquisition of sample data sets and initial data of a raw generator, as shown in FIG. 3, according to an exemplary embodiment, including:
in step S301, historical offline data of the recommendation system is acquired.
In the embodiment of the present application, the recommendation system may be an object recommendation system, and the object may include video, music, information, and the like. The following description will be given by taking the recommendation system as a video recommendation system.
In step S303, dividing the historical offline data into a plurality of sample data based on the access round; the number of the sample data is the same as the numerical value of the access round; each sample data of the plurality of sample data includes sample account status data, sample account behavior data, and sample system behavior data.
In the embodiment of the application, the server can acquire one account information or historical offline data corresponding to a plurality of account information from the recommendation system.
In an optional embodiment, taking historical offline data as data corresponding to one piece of account information as an example, in the embodiment of the present application, the server may obtain the historical offline data of the account information in a period of time from the recommendation system. Alternatively, the period of time may be any duration of time, such as a week, a month, a quarter, and so forth.
In this embodiment of the application, the historical offline data may be a historical record that is reserved after the recommendation system recommends a video for the account information when the client corresponding to the account information accesses the recommendation system. Alternatively, the server may divide the historical offline data into a plurality of sample data based on the round of access.
Optionally, an access round may include a process from the client corresponding to the account information to start the application corresponding to the recommendation system to close the application, and in this process, a process in which the application is hung in the background of the client may be included.
In this way, the server may divide the historical offline data in the period corresponding to the account information into a number of sample data equal to the number of access rounds. For example, assuming that the client corresponding to the account information accessed the recommendation system 25 times (i.e., in 25 access rounds) within the period, the server may divide the historical offline data in the period into 25 sample data.
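The per-round division can be sketched as follows — a minimal illustration assuming each raw log record carries an access-round index (the field names here are hypothetical, not from the disclosure):

```python
from collections import defaultdict

def split_into_samples(history):
    """Group raw log records into one sample per access round.

    Each record is a dict with hypothetical keys:
    'round' - index of the access round the record belongs to;
    'status', 'behavior', 'system' - per-round data fields.
    """
    rounds = defaultdict(list)
    for record in history:
        rounds[record["round"]].append(record)
    # One sample per access round, ordered by round index.
    return [rounds[r] for r in sorted(rounds)]

history = [
    {"round": 0, "status": "s0", "behavior": "b0", "system": "a0"},
    {"round": 1, "status": "s1", "behavior": "b1", "system": "a1"},
    {"round": 0, "status": "s0b", "behavior": "b0b", "system": "a0b"},
]
samples = split_into_samples(history)
# Two access rounds in the log -> two samples; round 0 holds two records.
```

An account with 25 access rounds in the log would yield 25 samples under this scheme.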
In another optional embodiment, taking the historical offline data as data corresponding to the plurality of pieces of account information as an example, in the embodiment of the present application, the server may obtain, from the recommendation system, historical offline data of each piece of account information in the plurality of pieces of account information within a period of time. Alternatively, the period of time may be any length of time, such as a week, a month, a quarter, and so forth.
In the embodiment of the application, the historical offline data may be a historical record which is reserved after the recommendation system recommends a video for the account information when the client corresponding to each account information accesses the recommendation system. Alternatively, the server may divide the historical offline data into a plurality of sample data based on the visit round.
In this way, the server may divide the historical offline data within the period corresponding to each account information into a number of sample data equal to that account's number of access rounds. For example, if the client corresponding to the first account information accessed the recommendation system 25 times within the period, the server may divide the historical offline data of the first account information within the period into 25 sample data; if the client corresponding to the second account information accessed the recommendation system 20 times within the period, the server may divide the historical offline data of the second account information within the period into 20 sample data; if the client corresponding to the third account information accessed the recommendation system 18 times within the period, the server may divide the historical offline data of the third account information within the period into 18 sample data; and so on.
In an embodiment of the present application, each sample data may include sample account status data, sample account behavior data, and sample system behavior data in a corresponding access round.
Optionally, the sample account status data includes a tag list of the objects (videos) clicked by the user among all objects recommended by the recommendation system, to the account information, in the corresponding access round:

c_t^u = (c_1, c_2, ......, c_t)

Each tag c_k in the tag list represents the tag of one object clicked by the user. For example, if the tag of an object clicked by the user is the sports class, the tag c_k of the object may be represented by "0"; of course, this "0" is an optional representation and does not limit the actual representation. The superscript u is associated with the account, and t is the number of interactions between the account and the recommendation system with respect to objects in the corresponding access round, which may be associated with time t.

Since each tag c_k is a discrete number, to facilitate subsequent processing the server may use an embedding model to convert each c_k into a high-dimensional continuous vector e_k. As such, the sample account status data in each sample data may be represented as:

s_t^u = (e_1, e_2, ......, e_t) ...... equation (1)
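The embedding step can be sketched as follows — a minimal illustration using a random lookup table (the table size, embedding dimension, and tag indices are assumptions for illustration, not values from the disclosure):

```python
import numpy as np

rng = np.random.default_rng(0)
NUM_TAGS, EMB_DIM = 50, 8                      # illustrative sizes
embedding = rng.normal(size=(NUM_TAGS, EMB_DIM))  # one row per discrete tag

def account_status(tag_list):
    """Map the clicked-tag list (c_1 ... c_t) to vectors (e_1 ... e_t)."""
    return np.stack([embedding[c] for c in tag_list])

s_t = account_status([0, 3, 7])  # user has clicked three objects so far
# s_t holds one continuous embedding vector per clicked object
```

In a real system the embedding table would be a trained model parameter rather than random values; the lookup mechanics are the same.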
Optionally, the sample account behavior data includes the viewing feedback characterizations, by the client corresponding to the account information, of all objects recommended by the recommendation system in the corresponding access round, and the interval time δ_t between the access round and the next access round. Assuming that the recommendation system recommends a total of N objects in the access round, the viewing feedback characterizations of the N objects can be represented by (f_1, f_2, ......, f_N), in which the viewing feedback characterization f_k of each object may be represented by 0/1, e.g., "0" for viewed and "1" for not viewed, k = 1, 2, ......, N.

As such, the sample account behavior data in each sample data may be represented as:

b_t^u = (f_1, f_2, ......, f_N, δ_t) ...... equation (2)
Optionally, the sample system behavior data includes representations of all objects recommended by the recommendation system to the account information in the corresponding access round. Assuming that the object is a video, the representation of the video may include a tag representation (such as a travel class, a game class, or a fitness class), a content representation (the elements contained in each frame of the video, such as houses, food, plants, etc.), and statistical features (a viewed quantity, a liked quantity, a forwarded quantity, etc.). In the embodiment of the application, the sample system behavior data may be denoted as a_t.
To facilitate subsequent training and to establish a link between the recommendation system and the account information, the server may construct sample system state data based on the sample account status data and the sample account behavior data, i.e., the combination of the two at time t, expressed as:

S_t = (s_t^u, b_t^u) ...... equation (3)
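A minimal sketch of assembling the behavior data and system state data, assuming the status data is a stack of embedding vectors and the feedback is a 0/1 list (all shapes are illustrative assumptions):

```python
import numpy as np

def account_behavior(feedback, interval):
    """b_t = (f_1 ... f_N, delta_t): 0/1 viewing feedback plus revisit gap."""
    return np.concatenate([np.asarray(feedback, dtype=float), [interval]])

def system_state(s_t, b_t):
    """S_t: combination of the account status and behavior data at time t,
    realized here as a flat concatenation."""
    return np.concatenate([np.ravel(s_t), b_t])

s_t = np.ones((3, 8))                               # three clicked-object embeddings
b_t = account_behavior([0, 1, 0, 0], interval=2.5)  # N = 4 recommended objects
S_t = system_state(s_t, b_t)
```

Concatenation is just one way to realize the "combination" in equation form; a real model could instead keep the two parts as separate inputs.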
in step S305, historical offline data of the recommendation system is sampled, and initial data of the raw generator is obtained.
In an alternative embodiment, the server may sample historical offline data, for example, sample data at time t is used as initial data of the original generator, or partial sample data at different times is combined to obtain the initial data of the original generator.
Therefore, the server can acquire the sample data set and the initial data of the primitive generator and provide effective training data for subsequent training of the primitive generator and the discriminator.
In the embodiment of the application, the server can train the original generator and the discriminator based on the sample data set corresponding to one account information, and can also train the original generator and the discriminator based on the sample data sets corresponding to a plurality of account information to obtain a trained target generator and discriminator, so as to obtain the data simulator in the target generator.
The following is described by taking a sample data set corresponding to one account information as an example, and the application of the sample data set corresponding to multiple account information may refer to the application of the sample data set corresponding to one account information, which is not described herein again.
In order for the data simulator in the trained target generator to follow the time sequence when simulating data, the plurality of sample data in the sample data set may be sorted in chronological order when the sample data set and the initial data are used to train the original generator and the discriminator. The sorted sample data set may be expressed as:

D = (τ_1, τ_2, ......, τ_n) ...... equation (4)

where each sample data in the sample data set may be expressed as:

τ_i = (s_i^u, b_i^u, a_i) ...... equation (5)
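The chronological ordering can be sketched as follows, assuming each sample carries the start time of its access round (the `start` field is a hypothetical name):

```python
# Hypothetical samples tagged with the start time of their access round.
trajectories = [
    {"start": 30.0, "data": ("s3", "b3", "a3")},
    {"start": 10.0, "data": ("s1", "b1", "a1")},
    {"start": 20.0, "data": ("s2", "b2", "a2")},
]

# D = (tau_1, tau_2, ..., tau_n), ordered chronologically as in equation (4);
# each tau_i bundles the status, behavior, and system behavior data.
D = [t["data"] for t in sorted(trajectories, key=lambda t: t["start"])]
```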
FIG. 4 is a flowchart illustrating training of a generator and a discriminator, as shown in FIG. 4, according to an exemplary embodiment, including:
in step S401, the sample data and the initial data corresponding to the first visit round are input to the arbiter, so as to obtain a first determination result of the sample data corresponding to the first visit round and a second determination result of the initial data.
In the embodiment of the application, the raw generator can be constructed based on a deep neural network.
FIG. 5 is a block diagram illustrating a generator and arbiter, according to an exemplary embodiment, as shown in FIG. 5, including: the device comprises an offline data module, a raw generator and a discriminator, wherein the raw generator comprises a raw recommender and a raw simulator. The training process is explained below in conjunction with fig. 5.
In this embodiment, the server may input sample data corresponding to the first visit round output by the offline data module into the arbiter, and the original generator may input the initial data into the arbiter to obtain a first determination result of the sample data corresponding to the first visit round and a second determination result of the initial data.
Alternatively, the arbiter usually represents the probability of data being true or false by 0 to 1. In practice, since the initial data is sampled or synthesized, and the original generator is not trained yet, the second discrimination result of the initial data is more toward 0 (closer to 0 indicates more false), and the first discrimination result of the sample data corresponding to the first visit round is more toward 1 (closer to 1 indicates more true). For example, assume that the first determination result is 0.85 and the second determination result is 0.25.
In step S402, a target loss of the discriminator is determined based on the label information of the sample data corresponding to the first access round, the first label information of the initial data corresponding to the discriminator, the first discrimination result, and the second discrimination result.
In an optional embodiment, the label information of the sample data corresponding to the first access round and the first label information of the initial data corresponding to the discriminator may be obtained first; both are preset according to the actual situation. Since the initial data is false with respect to the discriminator, the first label information of the initial data corresponding to the discriminator is 0; since the sample data corresponding to the first access round is true with respect to the discriminator, its label information is 1.
Optionally, a first discrimination loss may be determined based on the first discrimination result and the label information of the sample data corresponding to the first visit round, a second discrimination loss may be determined based on the second discrimination result and the first label information of the initial data corresponding to the discriminator, and the target loss of the discriminator may be determined according to the target loss function of the discriminator, the first discrimination loss, and the second discrimination loss.
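One common way to realize these two discrimination losses is binary cross-entropy, sketched below with the example outputs from above (0.85 for the real sample, 0.25 for the initial data); summing the two is one plausible target loss function, not necessarily the exact one used in the disclosure:

```python
import math

def bce(prediction, label):
    """Binary cross-entropy between one discriminator output and its label."""
    eps = 1e-12  # guard against log(0)
    return -(label * math.log(prediction + eps)
             + (1 - label) * math.log(1 - prediction + eps))

# Assumed discrimination results from the example above.
first_result, second_result = 0.85, 0.25

first_loss = bce(first_result, 1.0)    # real sample data, label 1
second_loss = bce(second_result, 0.0)  # initial (generated) data, label 0
target_loss = first_loss + second_loss  # combined discriminator loss
```

Both terms shrink as the discriminator pushes real data toward 1 and generated data toward 0.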
In step S403, the discriminator is trained based on the target loss of the discriminator.
In this manner, the server may train the discriminators based on their target loss, completing a first round of training of the discriminators.
Then, the discriminator may return feedback to the offline data module and the original generator to inform them that the discriminator has completed a round of training, so that the offline data module sends the sample data corresponding to the second access round to the discriminator and the original generator can then complete its own round of training.
In step S404, a target loss of the original generator is determined based on the second discrimination result and the first annotation information of the original generator corresponding to the initial data.
In the embodiment of the present application, since the initial data is true with respect to the original generator, the first annotation information of the initial data corresponding to the original generator is 1. The server may substitute the second discrimination result and this first annotation information into the target loss function of the original generator to obtain the target loss of the original generator.
In step S405, the raw generator is trained based on its target loss.
In this manner, the server may train the raw generator based on the target loss of the raw generator, completing a first round of training of the raw generator.
In step S406, first generated data is generated based on the raw generator and the initial data.
After the raw generator completes the first round of training, first generated data may be generated based on the raw generator and the initial data.
In the embodiment of the application, after a round of training is completed, the server can input the sample account status data and sample account behavior data in the initial data into the original recommender, and the original recommender can learn the recommendation behavior of the recommendation system and recommend N objects, thereby obtaining the first system behavior data, which includes the tag representation, the content representation, and the statistical representation of each of the N newly recommended objects. That is, the first system behavior data at time t+1 is generated using the sample account status data and the sample account behavior data at time t, expressed as follows, where G_rec denotes the original recommender:

a_{t+1} = G_rec(s_t^u, b_t^u) ...... equation (6)
Then, the server combines the sample account status data and the sample account behavior data in the initial data into sample system state data, and inputs the first system behavior data and the sample system state data into the original simulator to obtain the first account status data and the first account behavior data, expressed as follows, where G_sim denotes the original simulator:

(s_{t+1}^u, b_{t+1}^u) = G_sim(a_{t+1}, S_t) ...... equation (7)
optionally, the raw generator determines the first generation data based on the first account status data, the first account behavior data, and the first system behavior data. Therefore, the interactive data of the user recommendation system, namely the generated data corresponding to the sample data, is generated through iterative interaction of the original recommender and the original simulator on a time sequence.
In step S407, the sample data corresponding to the second access round and the first generated data are input into the discriminator to obtain a first discrimination result of the sample data corresponding to the second access round and a second discrimination result of the first generated data; then, as a cycle of the above steps, the target loss of the discriminator is determined based on the label information of the sample data corresponding to the second access round, the first label information of the first generated data corresponding to the discriminator, the first discrimination result, and the second discrimination result.
In the embodiment of the application, after the server obtains the first generated data, the server may input the sample data corresponding to the second access round output by the offline data module into the arbiter, and input the first generated data into the arbiter to obtain the first discrimination result of the sample data corresponding to the second access round and the second discrimination result of the first generated data. Then referring to the above specific process, the target loss of the discriminator is determined based on the label information of the sample data corresponding to the second visit round, the first label information of the first generated data corresponding to the discriminator, the first discrimination result of the sample data corresponding to the second visit round, and the second discrimination result of the first generated data, and the discriminator is trained based on the target loss of the discriminator, so that the second round of discriminator training is completed.
Then, the server can determine the target loss of the original generator according to the second discrimination result of the first generated data and the first annotation information of the first generated data corresponding to the original generator, and train the original generator based on that target loss, thereby completing the second round of training of the original generator; subsequent rounds proceed in the same way.
In step S408, in the case where the iteration end condition is satisfied, a target generator is obtained.
Optionally, after the server completes a preset number of iteration rounds or convergence of the discriminator, the iteration is terminated to obtain the target generator.
After the training of the discriminator and the original generator is finished for a plurality of times in reference to the second round of training, under the condition that the iteration termination condition is met, the trained original generator, namely the target generator, can be obtained, and the trained discriminator can also be obtained.
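The alternating procedure can be sketched as a loop, with the per-round update functions left as hypothetical callables and a simple loss-convergence check as the iteration termination condition:

```python
def train_gan(real_rounds, d_step, g_step, tol=1e-6):
    """Alternately train the discriminator then the generator, one access
    round per iteration, terminating when the discriminator loss stops
    changing (convergence) or the real data is exhausted."""
    prev = None
    rounds_done = 0
    for real in real_rounds:
        d_loss = d_step(real)  # discriminator round on real + generated data
        g_step(real)           # then a generator round against the discriminator
        rounds_done += 1
        if prev is not None and abs(d_loss - prev) < tol:
            break              # iteration termination: discriminator converged
        prev = d_loss
    return rounds_done

# Dummy steps whose discriminator loss settles to a fixed point quickly.
losses = iter([0.9, 0.5, 0.5, 0.5, 0.5])
n = train_gan(range(5), d_step=lambda r: next(losses), g_step=lambda r: None)
```

A preset maximum number of rounds, as mentioned above, can be enforced simply by bounding `real_rounds`.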
And because the original generator is trained, the original recommender and the original simulator contained in the original generator are also trained, so that the trained object recommender corresponding to the original recommender and the trained data simulator corresponding to the original simulator are obtained.
Alternatively, the target loss function of the discriminator may be expressed as the standard adversarial objective:

min_G max_D E_{τ~p_data}[log D(τ)] + E_{τ̂~G}[log(1 − D(τ̂))] ...... equation (8)

where τ is the sample data, τ̂ is the generated data, and D(·) is the discrimination result.
In an alternative embodiment, the generator and the discriminator interact iteratively, so that each iteration completes one round of training of both the discriminator and the generator. The generator may be trained first and then the discriminator, or, as described above, the discriminator may be trained first and then the generator.
Optionally, in order to verify that the training process is proceeding as expected, the server may, after completing several rounds of training, verify the discriminator and the target generator with a verification data set, so as to verify the capability of the data simulator.
As can be seen from the above, the account status data and account behavior data of the next round can be obtained by inputting the system behavior data and system state data of the previous round into the data simulator, where the account behavior data may include the viewing feedback characterization of each object and the interval time δ_t between the current access round and the next access round. Thus, the present application can utilize the trained data simulator to simulate the process of the account to be analyzed accessing the recommendation system.
As can be seen from the above, the primary account status data and primary account behavior data of the recommendation system may be those of one account or those of a plurality of accounts. Optionally, if a personal index of a certain account to be analyzed, such as the number of revisits in a month, is to be obtained through the data simulator, the primary account status data and primary account behavior data may be those of one account. Alternatively, if the next-day retention rate or the monthly retention rate of the whole recommendation system is to be obtained through the data simulator, the primary account status data and primary account behavior data may be those of a plurality of accounts.
In the present embodiment, suppose that long-term indicators such as the next-day retention rate and/or account preference of the recommendation system are to be analyzed. FIG. 6 is a flowchart illustrating a data simulator application according to an exemplary embodiment. The server may obtain the primary account status data and primary account behavior data of the recommendation system from the sample data set.
As shown in fig. 6, in some possible embodiments, the server may determine an initial recommendation object corresponding to the recommendation system based on the primary account status data and the primary account behavior data. Specifically, the server may determine a plurality of initial recommendation objects corresponding to each account based on the primary account status data and the primary account behavior data of each account.
Fig. 7 is a flowchart illustrating a method for obtaining preset index information over a preset time period, such as the monthly retention rate corresponding to one month, according to an exemplary embodiment, where the method includes:
in step S701, the primary account status data, the primary account behavior data, and the primary recommendation object are input into the data simulator, so as to obtain the first account status data, the first account behavior data, and the first time step of the recommendation system.
As shown in fig. 6, the server inputs the primary account status data, the primary account behavior data, and the primary recommendation object into the data simulator, and obtains the first account status data, the first account behavior data, and the first time step of the recommendation system.
The first time step may be a return visit time t1 in fig. 6, i.e., an interval between the first visit round and the next visit round.
In this way, the server uses the data simulator to simulate the first round of access to the recommendation system.
In step S703, determining a first recommendation object corresponding to the recommendation system based on the first account status data and the first account behavior data; then cycling the following step: inputting the first account state data, the first account behavior data and the first recommendation object into the data simulator to obtain second account state data, second account behavior data and a second time step of the recommendation system; until preset index information, transitional account state data and transitional account behavior data of the recommendation system over a preset time period are obtained; the preset time period consists of the plurality of time steps in the cycle, and the plurality of time steps comprise the first time step and the second time step.
Since the new account status information and the new account behavior information, that is, the first account status data and the first account behavior data, are obtained in step S701, the server may determine one or more first recommendation objects corresponding to the recommendation system based on the first account status data and the first account behavior data.
Subsequently, the server may input the first account status data, the first account behavior data, and the first recommendation object into the data simulator, to obtain second account status data, second account behavior data, and a second time step of the recommendation system.
The second time step may be a return visit time t2 in fig. 6, that is, an interval between the second visit round and the next visit round.
In this way, the server simulates the second round of access to the recommendation system by using the data simulator. Referring to the above process, the server may obtain, through the data simulator, a time step (return visit time), account state information and account behavior information corresponding to each cycle, and end the cyclic simulation process once the total duration of the time steps reaches the preset time period, for example, the one month corresponding to the monthly retention rate. At this time, the server also obtains the transitional account state data and transitional account behavior data corresponding to the end of the preset time period.
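As a rough sketch, the cyclic simulation of steps S701–S703 can be expressed in Python. The `StubSimulator`, the `recommend` function, and the 30-day period are illustrative assumptions standing in for the trained data simulator and recommendation model, not the patent's actual implementation:

```python
import random

random.seed(0)
PRESET_PERIOD = 30.0  # one month, in days (for a monthly retention rate)

class StubSimulator:
    """Illustrative stand-in for the trained data simulator."""
    def step(self, state, behavior, recommended_obj):
        next_state = state + [recommended_obj]        # clicked-object label list
        next_behavior = {"watch_feedback": random.random()}
        return_time = random.uniform(0.5, 3.0)        # return-visit time step t_i
        return next_state, next_behavior, return_time

def recommend(state, behavior):
    """Trivial stand-in for determining the next recommendation object."""
    return len(state)

def simulate_access(sim, state, behavior):
    """Cycle simulator rounds until the time steps sum to the preset period."""
    elapsed, time_steps = 0.0, []
    while elapsed < PRESET_PERIOD:
        obj = recommend(state, behavior)
        state, behavior, t = sim.step(state, behavior, obj)
        elapsed += t
        time_steps.append(t)
    # state/behavior are now the transitional account data at the period's end
    return state, behavior, time_steps

final_state, final_behavior, steps = simulate_access(StubSimulator(), [], {})
```

The loop mirrors fig. 6: each iteration consumes the previous round's state, behavior and recommendation object, produces the next round's, plus a return-visit interval, and the cycle ends once the intervals accumulate to the preset time period.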
In the embodiment of the present application, regarding the retention rate, during the simulation by the data simulator, the accounts of the recommendation system (for example, 10000 accounts) may fall into two parts: accounts whose simulated return time is 0 because the preset time period has not been reached (for example, 6500 accounts), and accounts whose simulated return time reaches the preset time period (for example, 3500 accounts). At this time, the server may determine that the monthly retention rate is 35%.
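The retention figure in this example is simply the ratio of retained accounts to simulated accounts; as a minimal check with the numbers above:

```python
# Numbers from the example above: 10000 simulated accounts, of which 3500
# have a return time reaching the preset time period.
total_accounts = 10000
retained_accounts = 3500
not_retained_accounts = 6500

monthly_retention = retained_accounts / total_accounts
print(monthly_retention)  # 0.35
```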
Suppose the sum of the return visit times t_1, t_2, …, t_i is the preset time period; the preset time period may be regarded as the continuous run of time steps from one occurrence of a non-zero value (such as a monthly retention rate) to the next occurrence of a non-zero value, as follows:

(s_0, a_0, r_0), (s_1, a_1, r_1), …, (s_{T_i}, a_{T_i}, r_{T_i})  (9)

where r_k = 0 for k ≠ T_i, and only

r_{T_i} ≠ 0,

where s refers to the recommended system state data, a refers to the recommended system behavior data, and the subscripts of s, a and r in equation (9) indicate the time step.
Taking the monthly retention rate as an example, the preset time period is one month, and among the values r_k at the time steps within the preset time period, the non-zero value is the quantity expected to appear, namely the monthly retention rate. Generally, the monthly retention rate can be obtained only at the end of the month, i.e. only r_{T_i} in the formula is non-zero, e.g. 35%, while the other r_k are all zero and no retention value appears at those steps.
Therefore, the access process of the recommendation system can be rapidly simulated through the data simulator; compared with collecting real data through the recommendation system, the completion time can be greatly shortened, which improves efficiency for subsequent data analysis.
In step S207, sparse-to-dense conversion is performed on the preset index information in the preset time period to obtain the preset index information of each time step of the recommendation system in the preset time period.
In some possible embodiments, long-term index information such as the monthly retention rate may be obtained only after the preset time period has elapsed. Therefore, preset index information such as the monthly retention rate is difficult to acquire at an individual time step within the preset time period, and the preset index information at such a time step cannot be used as data to improve and update the recommendation system. Based on this, the preset index information over the preset time period can be subjected to sparse-to-dense conversion to obtain the preset index information of the recommendation system at each time step within the preset time period.
Fig. 8 is a flowchart illustrating a method for obtaining preset index information for each time step according to an exemplary embodiment, where the method includes:
in step S801, recommended system state data corresponding to each time step is determined based on account state data and account behavior data corresponding to each time step in a preset time period.
In step S803, recommended system behavior data corresponding to each time step is determined based on the recommended object corresponding to each time step in the preset time period.
In step S805, the preset time period, the preset index information over the preset time period, and the recommended system state data and recommended system behavior data corresponding to each time step are input into the trained index information decomposer, so as to obtain the preset index information of the recommendation system at each time step.
The present application further provides a method for training an index information decomposer. Fig. 9 is a flowchart illustrating a method for training an index information decomposer according to an exemplary embodiment, which includes:
In step S901, an original information decomposer is constructed.
In the embodiment of the present application, the original information decomposer may be constructed based on a deep neural network.
In step S902, reference index information corresponding to each time step is determined based on a preset time period and preset index information over the preset time period.
In the embodiment of the application, the server can use a uniform strategy to distribute the sparse value

r_{T_i}

evenly over the time steps preceding it, obtaining a smooth non-zero value at every time step, which is represented by:

r̃_k = r_{T_i} / T_i, k = 1, …, T_i
For example, the monthly retention rate of 35% above is distributed evenly over the time steps, resulting in a smooth non-zero value at each time step.
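A minimal sketch of this uniform smoothing (the 30-step month and the 35% value are the illustrative numbers from above):

```python
def uniform_smooth(sparse_rewards):
    """Spread the total of a sparse per-step sequence evenly over all steps."""
    total = sum(sparse_rewards)       # equals the terminal value r_{T_i}
    n = len(sparse_rewards)
    return [total / n] * n

sparse = [0.0] * 29 + [0.35]          # monthly retention observed only at month end
smooth = uniform_smooth(sparse)
print(round(smooth[0], 6))  # 0.011667
```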
In step S903, the recommended system state data and behavior data corresponding to each time step are input into the original information decomposer to obtain prediction index information corresponding to each time step.
The original information decomposer R_θ: S × A → R is initialized, where S and A respectively represent the recommended system state data and the recommended system behavior data in reinforcement learning. The server inputs the combination of the recommended system state data and behavior data corresponding to each time step into the original information decomposer, which assigns one piece of prediction index information R_θ(s_t, a_t) to each time step.
In step S904, the original information decomposer is trained based on the reference index information corresponding to each time step and the prediction index information corresponding to each time step.
That is, the server may train the original information decomposer using the reference index information corresponding to each time step and the prediction index information corresponding to each time step.
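A hedged, minimal stand-in for this training step: the patent's decomposer is a deep neural network, but the same fitting idea can be shown with a linear model over a scalar stand-in feature φ(s_t, a_t), trained by gradient descent toward the smoothed per-step reference values. The feature map, learning rate, and epoch count are illustrative assumptions:

```python
def train_decomposer(features, targets, lr=0.1, epochs=1000):
    """Fit a linear stand-in for R_theta by plain gradient descent on MSE."""
    w, b = 0.0, 0.0
    n = len(features)
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, r in zip(features, targets):
            err = (w * x + b) - r        # prediction error R_theta - r_tilde
            grad_w += 2 * err * x / n
            grad_b += 2 * err / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

features = [t / 30 for t in range(30)]   # stand-in phi(s_t, a_t) per time step
targets = [0.35 / 30] * 30               # uniformly smoothed reference values
w, b = train_decomposer(features, targets)
predictions = [w * x + b for x in features]
```

With constant targets the fit is essentially exact; a real decomposer would instead learn per-step values that depend on the state and behavior data.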
In step S905, if the iteration termination condition is satisfied, the index information decomposer is obtained.
Therefore, when the server determines that the difference between the prediction index information and the reference index information corresponding to any one of the time steps is less than or equal to a first preset difference, and that the difference between the accumulated index information and the preset index information over the preset time period is less than or equal to a second preset difference, it terminates the training of the original information decomposer and determines the trained original information decomposer as the index information decomposer. In this way, the sum of the decomposed preset index information over the time steps is kept as close as possible to the value of the preset index information over the preset time period, so that, with the total value guaranteed to be unchanged, the preset index information is easier to improve in the subsequent improvement process.
The server determines that the difference between the prediction index information corresponding to any one of the time steps and the reference index information is less than or equal to the first preset difference because the training objective is to make the prediction index information at each time step approach the smooth non-zero value of that time step as closely as possible, which is specifically implemented by the following formula:

min_θ Σ_t ( R_θ(s_t, a_t) − r̃_t )²
The server determines that the difference between the accumulated index information and the preset index information over the preset time period is less than or equal to the second preset difference because the accumulated value over the preset time period must remain unchanged before and after the conversion. Optionally, the value at each time step is scaled so that the accumulated value over the preset time period is consistent after the conversion, namely:

Σ_{t=1}^{T_i} r̂_t = r_{T_i}

where the scaling transformation is as follows:

r̂_t = R_θ(s_t, a_t) · r_{T_i} / Σ_{k=1}^{T_i} R_θ(s_k, a_k)
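The sum-preserving scaling step can be sketched as follows; the raw per-step predictions are illustrative values, not the output of an actual decomposer:

```python
def rescale(predictions, sparse_total):
    """Scale per-step predictions so they sum exactly to the sparse total."""
    s = sum(predictions)
    return [p * sparse_total / s for p in predictions]

raw = [0.010, 0.014, 0.012, 0.011]   # illustrative R_theta(s_t, a_t) values
dense = rescale(raw, 0.35)
print(round(sum(dense), 6))  # 0.35
```

The relative proportions between time steps are preserved; only the overall scale changes so that the accumulated value matches the observed sparse value.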
Thus, the trained index information decomposer can obtain the monthly retention rate corresponding to each time step:

r̂_1, r̂_2, …, r̂_{T_i}

where r̂_t is the monthly retention rate at time step t.
Therefore, the index information decomposer can be used for decomposing the sparse numerical value into dense numerical values that are easy to improve, effectively reducing the difficulty of improving the long-term index.
In addition, compared with a common supervised learning model or a non-explicit user model modeling approach, the learned representations of user feedback and behavior have stronger generalization and can represent the user's preferences at different times and under different behaviors.
In step S209, training the recommendation system based on the preset index information, the transitional account status data, and the transitional account behavior data of each time step to obtain a target recommendation system; and the preset index information corresponding to the target recommendation system meets the preset index condition.
In an alternative embodiment, fig. 10 is a schematic diagram of a training structure of a recommendation system according to an exemplary embodiment, and as shown in fig. 10, a server may train the recommendation system through a data simulator and the recommendation system trained before, by using preset index information, transition account state data, and transition account behavior data of each time step, until preset index information corresponding to the recommendation system meets a preset index condition, stop training, and obtain a target recommendation system.
As can be seen from the above, simulating user behavior over the preset time period with the data simulator yields the preset index information, transitional account state data and transitional account behavior data corresponding to the first round over the preset time period. Then, because the preset index information over the whole preset time period is difficult to improve, the server may use the index decomposer to decompose it into the easier-to-improve preset index information of each time step within the preset time period. Next, the preset index information, transitional account state data and transitional account behavior data of each time step are input into the recommendation system, and the recommendation system is trained to obtain the transitional recommendation object of this round. In this way, the first round of training of the recommendation system is completed.
Next, the server may input the transitional account state data, transitional account behavior data and transitional recommendation object corresponding to the first round into the data simulator, and obtain the preset index information, transitional account state data and transitional account behavior data corresponding to the second round over the preset time period according to the data simulator application flow shown in fig. 6. Subsequently, the server may use the index decomposer to decompose the preset index information over the preset time period corresponding to the second round into the easier-to-improve preset index information of each time step within the preset time period, input the preset index information, transitional account state data and transitional account behavior data of each time step into the recommendation system, and train the recommendation system to obtain the transitional recommendation object of this round. In this way, the second round of training of the recommendation system is completed.
Referring to the first round and the second round of training processes, the recommendation system can be continuously trained until the preset index information corresponding to the target recommendation system meets the preset index condition, for example, the monthly retention rate reaches 50%, the training of the recommendation system is stopped, and the target recommendation system is obtained.
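The round-by-round process can be sketched with all components replaced by stubs; the stop condition (monthly retention reaching 50%) follows the example above, while the stub dynamics, names, and numbers are purely illustrative assumptions:

```python
TARGET_RETENTION_PCT = 50   # preset index condition: monthly retention >= 50%

def simulate_round(policy_quality):
    """Stub for one simulator round: a better policy yields higher retention."""
    retention_pct = min(35 + 3 * policy_quality, 100)
    dense = [retention_pct / 100 / 30] * 30   # stub per-step decomposition
    return retention_pct, dense

def train_recommender():
    """Repeat simulate -> decompose -> update until the condition is met."""
    policy_quality, rounds = 0, 0
    while True:
        retention_pct, dense = simulate_round(policy_quality)
        rounds += 1
        if retention_pct >= TARGET_RETENTION_PCT:
            return retention_pct, rounds
        policy_quality += 1                   # stub recommender update step

final_retention, training_rounds = train_recommender()
print(final_retention, training_rounds)  # 50 6
```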
In another optional embodiment, the server may perform training on the recommendation system by using the actual platform feedback system and the recommendation system and using the preset index information, the transition account state data, and the transition account behavior data of each time step until the preset index information corresponding to the recommendation system meets the preset index condition, and stop the training to obtain the target recommendation system. Optionally, in this embodiment, the server may use an actual platform feedback system to replace the data simulator in the previous embodiment, so as to implement training of the target recommendation system.
Thus, compared with the previous embodiment, since the feedback of the actual platform feedback system is slower than that of the data simulator, the server needs more time to obtain the trained target recommendation system. However, compared with the data simulator, the actual platform feedback system can provide more accurate transitional account state data, transitional account behavior data and preset index information over the preset time period, so a target recommendation system that better meets the requirements can be obtained.
In the above, the primary account status data, first account status data, second account status data and transitional account status data (of the first round, the second round, and so on) each include a label list of the objects (e.g., videos) clicked by the account among all objects recommended by the recommendation system in the corresponding visit round. The primary account behavior data, first account behavior data, second account behavior data and transitional account behavior data (of the first round, the second round, and so on) each include the viewing feedback characterization, by the client corresponding to the account information, of each object recommended by the recommendation system in the corresponding visit round, as well as the interval time from that visit round to the next visit round. In each case the account state data represents the same kind of data; different names are used only to distinguish the cycle rounds for convenience of writing.
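For concreteness, one account's data for a single visit round might look like the following; all field names are assumptions chosen for readability, not the schema used in this application:

```python
# One account's data for a single visit round (illustrative shapes only).
account_status = {
    # labels of the objects (videos) the account clicked, out of all
    # objects recommended in this visit round
    "clicked_labels": [102, 517, 884],
}
account_behavior = {
    # viewing feedback characterization for every recommended object
    "watch_feedback": {102: 0.9, 517: 0.2, 884: 0.6, 315: 0.0},
    # interval time from this visit round to the next visit round (days)
    "revisit_interval": 1.5,
}
```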
Based on this, the trained target recommendation system can be used for recommending an object to a certain account.
Fig. 11 is a flowchart illustrating a recommendation method according to an exemplary embodiment; as shown in fig. 11, the recommendation method may be applied to a server or a client, and includes the following steps:
in step S1101, account status data and account execution data of the target account are acquired.
In step S1103, the account status data of the target account and the account execution data are input into the target recommendation system obtained by training, so as to obtain a target recommendation object.
The server can obtain the account state data and the account execution data of the target account, and input the account state data and the account execution data of the target account into the trained target recommendation system to obtain the target recommendation object.
FIG. 12 is a block diagram illustrating a recommendation system training device according to an example embodiment. Referring to fig. 12, the apparatus includes:
a first data determination module 1201 configured to perform determining primary account status data and primary account behavior data of a recommendation system;
an object determination module 1202 configured to perform determining an initial recommendation object corresponding to a recommendation system based on the primary account status data and the primary account behavior data;
a second data determining module 1203, configured to perform determining, according to the primary account status data, the primary account behavior data, and the primary recommendation object, preset index information, transition account status data, and transition account behavior data of the recommendation system over a preset time period; the transitional account status data characterizes improved data for the primary account status data based on the primary recommendation object; the transitional account behavior data characterizes improved data for the primary account behavior data based on the primary recommendation object;
the information conversion module 1204 is configured to perform sparse-to-dense conversion on preset index information in a preset time period to obtain preset index information of each time step of the recommendation system in the preset time period;
the training module 1205 is configured to perform training on the recommendation system based on the preset index information, the transition account state data and the transition account behavior data of each time step to obtain a target recommendation system; and the preset index information corresponding to the target recommendation system meets the preset index condition.
In some possible embodiments, the second data determination module is configured to perform:
inputting the initial account state data, the initial account behavior data and the initial recommendation object into a data simulator to obtain first account state data, first account behavior data and a first time step of a recommendation system;
determining a first recommendation object corresponding to the recommendation system based on the first account state data and the first account behavior data; then cycling the following step: inputting the first account state data, the first account behavior data and the first recommendation object into the data simulator to obtain second account state data, second account behavior data and a second time step of the recommendation system; until preset index information, transitional account state data and transitional account behavior data of the recommendation system over a preset time period are obtained;
the preset time period consists of a plurality of time steps in the cycle, and the plurality of time steps comprise a first time step and a second time step.
In some possible embodiments, the information conversion module is configured to perform:
determining recommended system state data corresponding to each time step based on account state data and account behavior data corresponding to each time step in a preset time period;
determining recommended system behavior data corresponding to each time step based on a recommended object corresponding to each time step in a preset time period;
and inputting the preset time period, the preset index information in the preset time period, the recommended system state data and the recommended system behavior data corresponding to each time step into the trained index information decomposer to obtain the preset index information of the recommended system in each time step.
In some possible embodiments, the apparatus further comprises a resolver training module configured to perform:
constructing an original information decomposer;
determining reference index information corresponding to each time step based on a preset time period and preset index information on the preset time period;
inputting the recommended system state data and behavior data corresponding to each time step into the original information decomposer to obtain prediction index information corresponding to each time step;
training the original information decomposer based on the reference index information corresponding to each time step and the prediction index information corresponding to each time step;
and obtaining the index information decomposer under the condition of meeting the iteration termination condition.
In some possible embodiments, the resolver training module is configured to perform:
under the condition that the difference value between the prediction index information corresponding to any time step in each time step and the reference index information is smaller than or equal to a first preset difference value, and the difference value between the accumulated index information and the preset index information in a preset time period is smaller than or equal to a second preset difference value, terminating the training of the original information decomposer;
and determining the trained original information decomposer as an index information decomposer.
In some possible embodiments, the apparatus further comprises a target generator determination module configured to perform:
acquiring sample data set and initial data of an original generator;
training an original generator and a discriminator based on the sample data set and the initial data to obtain a target generator; the target generator includes a data simulator.
In some possible embodiments, the target generator determination module is configured to perform:
acquiring historical offline data of a recommendation system;
dividing historical offline data into a plurality of sample data based on the visit round; the number of the sample data is the same as the numerical value of the access round; each sample data of the plurality of sample data comprises sample account status data, sample account behavior data, and sample system behavior data;
and sampling historical offline data of the recommendation system to obtain initial data of the original generator.
In some possible embodiments, the target generator determination module is configured to perform:
inputting the sample data and the initial data corresponding to the first access round into a discriminator to obtain a first discrimination result of the sample data corresponding to the first access round and a second discrimination result of the initial data;
determining target loss of the discriminator based on the label information of the sample data corresponding to the first visit round, the first label information of the initial data corresponding to the discriminator, the first discrimination result and the second discrimination result;
training a discriminator based on the target loss of the discriminator;
determining target loss of the original generator based on the second judgment result and the first marking information of the original generator corresponding to the initial data;
training a raw generator based on a target loss of the raw generator;
generating first generated data based on the raw generator and the initial data;
inputting the sample data corresponding to the second visit round and the first generated data into the discriminator to obtain a first discrimination result of the sample data corresponding to the second visit round and a second discrimination result of the first generated data; then cycling the following step: determining the target loss of the discriminator based on the label information of the sample data corresponding to the second visit round, the first label information of the first generated data corresponding to the discriminator, the first discrimination result and the second discrimination result;
and obtaining the target generator under the condition that the iteration termination condition is met.
In some possible embodiments, the raw generator includes a raw simulator and a raw recommender; a target generator determination module configured to perform:
inputting sample account state data and sample account behavior data in the initial data into an original recommender to obtain first system behavior data;
inputting the first system behavior data, sample account state data in the initial data and sample account behavior data into an original simulator to obtain first account state data and first account behavior data;
first generation data is determined based on the first account status data, the first account behavior data, and the first system behavior data.
In some possible embodiments, the second data determination module is configured to perform:
and determining preset index information, transitional account state data and transitional account behavior data corresponding to the retention rate and/or the account preference degree of the recommendation system in a preset time period according to the initial account state data, the initial account behavior data and the initial recommendation object.
With regard to the apparatus in the above embodiment, the specific manner in which each module performs the operation has been described in detail in the embodiment related to the method, and will not be described in detail here.
FIG. 13 is a block diagram illustrating a recommendation system training apparatus according to an exemplary embodiment. Referring to fig. 13, the apparatus includes:
a data acquisition module 1301 configured to perform acquiring account status data and account execution data of the target account;
the object recommending module 1302 is configured to input the account state data and the account execution data of the target account into the target recommending system obtained by training of the recommending system training device to obtain the target recommending object.
With regard to the apparatus in the above embodiment, the specific manner in which each module performs the operation has been described in detail in the embodiment related to the method, and will not be described in detail here.
FIG. 14 is a block diagram illustrating an electronic device 2000 for recommendation system training or recommendation, according to an example embodiment. For example, the apparatus 2000 may be a mobile phone, a computer, a digital broadcast terminal, a messaging device, a game console, a tablet device, a medical device, an exercise device, a personal digital assistant, and the like.
Referring to fig. 14, the apparatus 2000 may include one or more of the following components: a processing component 2002, a memory 2004, a power component 2006, a multimedia component 2008, an audio component 2010, an input/output (I/O) interface 2012, a sensor component 2014, and a communications component 2016.
The processing component 2002 generally controls the overall operation of the device 2000, such as operations associated with display, telephone calls, data communications, camera operations, and recording operations. The processing component 2002 may include one or more processors 2020 to execute instructions to perform all or a portion of the steps of the methods described above. Further, the processing component 2002 can include one or more modules that facilitate interaction between the processing component 2002 and other components. For example, the processing component 2002 may include a multimedia module to facilitate interaction between the multimedia component 2008 and the processing component 2002.
The memory 2004 is configured to store various types of data to support operation at the device 2000. Examples of such data include instructions for any application or method operating on device 2000, contact data, phonebook data, messages, pictures, videos, and so forth. The memory 2004 may be implemented by any type or combination of volatile or non-volatile memory devices, such as Static Random Access Memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic or optical disks.
The power supply component 2006 provides power to the various components of the device 2000. The power supply components 2006 may include a power management system, one or more power supplies, and other components associated with generating, managing, and distributing power for the device 2000.
The multimedia component 2008 includes a screen providing an output interface between the device 2000 and a user. In some embodiments, the screen may include a Liquid Crystal Display (LCD) and a Touch Panel (TP). If the screen includes a touch panel, the screen may be implemented as a touch screen to receive an input signal from a user. The touch panel includes one or more touch sensors to sense touch, slide, and gestures on the touch panel. The touch sensor may not only sense the boundary of a touch or slide action, but also detect the duration and pressure associated with the touch or slide operation. In some embodiments, the multimedia component 2008 includes a front camera and/or a rear camera. The front-facing camera and/or the rear-facing camera may receive external multimedia data when the device 2000 is in an operating mode, such as a shooting mode or a video mode. Each front camera and rear camera may be a fixed optical lens system or have a focal length and optical zoom capability.
The audio component 2010 is configured to output and/or input audio signals. For example, the audio component 2010 includes a Microphone (MIC) configured to receive external audio signals when the device 2000 is in an operating mode, such as a call mode, a recording mode, or a voice recognition mode. The received audio signals may further be stored in the memory 2004 or transmitted via the communication component 2016. In some embodiments, the audio component 2010 also includes a speaker for outputting audio signals.
The I/O interface 2012 provides an interface between the processing component 2002 and peripheral interface modules, which can be keyboards, click wheels, buttons, etc. These buttons may include, but are not limited to: a home button, a volume button, a start button, and a lock button.
The sensor assembly 2014 includes one or more sensors for providing state assessments of various aspects of the device 2000. For example, the sensor assembly 2014 may detect an open/closed state of the device 2000, the relative positioning of components (such as the display and keypad of the device 2000), a change in position of the device 2000 or a component thereof, the presence or absence of user contact with the device 2000, the orientation or acceleration/deceleration of the device 2000, and a change in temperature of the device 2000. The sensor assembly 2014 may include a proximity sensor configured to detect the presence of a nearby object without any physical contact. The sensor assembly 2014 may also include a light sensor, such as a CMOS or CCD image sensor, for use in imaging applications. In some embodiments, the sensor assembly 2014 may also include an acceleration sensor, a gyroscope sensor, a magnetic sensor, a pressure sensor, or a temperature sensor.
The communication component 2016 is configured to facilitate wired or wireless communication between the device 2000 and other devices. The device 2000 may access a wireless network based on a communication standard, such as WiFi, an operator network (such as 2G, 3G, 4G, or 5G), or a combination thereof. In an exemplary embodiment, the communication component 2016 receives broadcast signals or broadcast related information from an external broadcast management system via a broadcast channel. In an exemplary embodiment, the communication component 2016 further includes a Near Field Communication (NFC) module to facilitate short-range communications. For example, the NFC module may be implemented based on Radio Frequency Identification (RFID) technology, Infrared Data Association (IrDA) technology, Ultra-Wideband (UWB) technology, Bluetooth (BT) technology, and other technologies.
In an exemplary embodiment, the apparatus 2000 may be implemented by one or more Application Specific Integrated Circuits (ASICs), Digital Signal Processors (DSPs), Digital Signal Processing Devices (DSPDs), Programmable Logic Devices (PLDs), Field Programmable Gate Arrays (FPGAs), controllers, micro-controllers, microprocessors, or other electronic components for performing the above-described methods.
In an exemplary embodiment, a storage medium comprising instructions, such as the memory 2004 comprising instructions, executable by the processor 2020 of the apparatus 2000 to perform the above-described method is also provided. Alternatively, the storage medium may be a non-transitory computer readable storage medium, which may be, for example, a ROM, a Random Access Memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, and the like.
It should be noted that the order of the above embodiments of the present invention is for description only and does not indicate the relative merits of the embodiments. Specific embodiments have been described above; other embodiments are within the scope of the following claims. In some cases, the actions or steps recited in the claims may be performed in a different order than in the embodiments and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing may also be possible or advantageous.
The embodiments in this specification are described in a progressive manner; for identical or similar parts among the embodiments, reference may be made to one another, and each embodiment focuses on its differences from the other embodiments. In particular, the apparatus embodiment is described relatively simply because it is substantially similar to the method embodiment; for relevant details, reference may be made to the corresponding description of the method embodiment.
It will be understood by those skilled in the art that all or part of the steps for implementing the above embodiments may be implemented by hardware, or may be implemented by a program instructing relevant hardware, where the program may be stored in a computer-readable storage medium, and the storage medium may be a read-only memory, a magnetic disk or an optical disk.
The above description covers only preferred embodiments of the present invention and is not intended to limit the invention; any modifications, equivalent replacements, improvements, and the like made within the spirit and principle of the present invention shall fall within its scope of protection.

Claims (16)

1. A recommendation system training method, comprising:
determining initial account state data and initial account behavior data of a recommendation system;
determining an initial recommendation object corresponding to the recommendation system based on the initial account state data and the initial account behavior data;
determining preset index information, transitional account state data and transitional account behavior data of the recommendation system in a preset time period according to the initial account state data, the initial account behavior data and the initial recommendation object; wherein the transitional account state data characterizes improvement data for the initial account state data based on the initial recommendation object, and the transitional account behavior data characterizes improvement data for the initial account behavior data based on the initial recommendation object;
performing sparse-to-dense conversion on the preset index information in the preset time period to obtain the preset index information of each time step of the recommendation system in the preset time period;
training the recommendation system based on the preset index information of each time step, the transitional account state data and the transitional account behavior data to obtain a target recommendation system; wherein the preset index information corresponding to the target recommendation system meets a preset index condition.
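As a rough illustration of the final training step above, the sketch below assumes the recommendation model can be reduced to a single linear scoring weight; the function name `train_on_dense_rewards`, its features, and the learning rate are illustrative assumptions, not the patented method.

```python
# Minimal sketch (assumed names, toy model): once sparse period-level
# metrics have been decomposed into per-step values, the recommender can
# be updated with an ordinary per-step, reward-weighted objective.

def train_on_dense_rewards(weight, transitions, lr=0.1):
    """Update a one-parameter linear scorer from
    (feature, per-step reward) pairs."""
    for feature, reward in transitions:
        # reward-weighted gradient step: a larger per-step index value
        # pushes the weight further along that step's feature
        weight += lr * reward * feature
    return weight

w = train_on_dense_rewards(0.0, [(1.0, 0.5), (2.0, 0.25)])
```

A real system would replace the scalar weight with the parameters of the recommendation model, but the shape of the update per time step is the same.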
2. The recommendation system training method according to claim 1, wherein the determining, according to the initial account state data, the initial account behavior data and the initial recommendation object, the preset index information, the transitional account state data and the transitional account behavior data of the recommendation system in a preset time period comprises:
inputting the initial account state data, the initial account behavior data and the initial recommendation object into a data simulator to obtain first account state data, first account behavior data and a first time step of the recommendation system;
determining a first recommendation object corresponding to the recommendation system based on the first account state data and the first account behavior data; and repeating the following step in a loop: inputting the first account state data, the first account behavior data and the first recommendation object into the data simulator to obtain second account state data, second account behavior data and a second time step of the recommendation system; until the preset index information, the transitional account state data and the transitional account behavior data of the recommendation system in the preset time period are obtained;
wherein the preset time period consists of the plurality of time steps of the loop, and the plurality of time steps includes the first time step and the second time step.
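The loop of claim 2 can be sketched as a rollout through the data simulator. The function name, the callable stand-ins, and the toy numeric state below are assumptions for illustration only:

```python
# Illustrative rollout of claim 2's loop: the recommender picks an
# object from the current account state/behavior, the data simulator
# returns the next state/behavior, and this repeats for a preset
# number of time steps.

def rollout(recommender, simulator, state, behavior, num_steps):
    """Collect (state, behavior, recommendation) transitions over a
    preset time period of num_steps time steps."""
    trajectory = []
    for _ in range(num_steps):
        rec = recommender(state, behavior)                 # pick a recommendation
        state, behavior = simulator(state, behavior, rec)  # simulated account response
        trajectory.append((state, behavior, rec))
    return trajectory

# toy stand-ins: state counts steps, behavior accumulates recommendations
traj = rollout(lambda s, b: s + b,
               lambda s, b, r: (s + 1, b + r),
               0, 0, 3)
```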
3. The recommendation system training method according to claim 2, wherein the performing sparse-to-dense conversion on the preset index information in the preset time period to obtain the preset index information of each time step of the recommendation system in the preset time period comprises:
determining recommendation system state data corresponding to each time step based on the account state data and the account behavior data corresponding to each time step in the preset time period;
determining recommendation system behavior data corresponding to each time step based on the recommendation object corresponding to each time step in the preset time period;
and inputting the preset time period, the preset index information in the preset time period, and the recommendation system state data and the recommendation system behavior data corresponding to each time step into a trained index information decomposer to obtain the preset index information of the recommendation system at each time step.
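The claim uses a trained decomposer for the sparse-to-dense conversion; as a hand-rolled stand-in, the sketch below spreads one period-level metric across time steps with an engagement-weighted split. The weighting rule and all names are assumptions, not the patent's learned model:

```python
# Hand-rolled stand-in for the sparse-to-dense conversion of claim 3:
# one sparse, period-level metric is distributed over the time steps so
# that the dense per-step values sum back to the original metric.

def densify_metric(period_metric, step_engagements):
    """Split one period-level metric across time steps, weighting
    each step by its relative engagement."""
    total = sum(step_engagements)
    if total == 0:
        # no per-step signal: fall back to a uniform split
        n = len(step_engagements)
        return [period_metric / n] * n
    return [period_metric * e / total for e in step_engagements]

per_step = densify_metric(10.0, [1.0, 3.0, 1.0])  # [2.0, 6.0, 2.0]
```

The conservation property (the dense values sum to the sparse metric) is the point of the conversion: each time step receives a usable training signal without inventing new total reward.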
4. The recommendation system training method according to claim 3, further comprising:
constructing an original information decomposer;
determining reference index information corresponding to each time step based on the preset time period and the preset index information over the preset time period;
inputting the recommendation system state data and the recommendation system behavior data corresponding to each time step into the original information decomposer to obtain prediction index information corresponding to each time step;
training the original information decomposer based on the reference index information corresponding to each time step and the prediction index information corresponding to each time step;
and obtaining the index information decomposer under the condition of meeting the iteration termination condition.
5. The recommendation system training method according to claim 4, wherein the obtaining the index information decomposer in case of satisfying an iteration termination condition comprises:
terminating the training of the original information decomposer when the difference between the prediction index information corresponding to any one of the time steps and the corresponding reference index information is smaller than or equal to a first preset difference, and the difference between the accumulated index information and the preset index information in the preset time period is smaller than or equal to a second preset difference;
and determining the trained original information decomposer as the index information decomposer.
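One direct reading of claim 5's two-part stopping rule (interpreting "any one" as "every", and with all threshold names as assumptions) is the following check:

```python
# Sketch of claim 5's termination test: every per-step prediction must
# be within a first tolerance of its reference value, AND the
# accumulated predictions must be within a second tolerance of the
# period-level metric.

def should_terminate(pred, ref, period_metric, eps_step, eps_total):
    step_ok = all(abs(p - r) <= eps_step for p, r in zip(pred, ref))
    total_ok = abs(sum(pred) - period_metric) <= eps_total
    return step_ok and total_ok

done = should_terminate([1.0, 2.0], [1.05, 1.95], 3.0, 0.1, 0.1)  # True
```

The second condition is what keeps the decomposed (dense) values consistent with the original sparse metric; the first keeps each step's value individually accurate.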
6. The recommendation system training method according to claim 2, further comprising:
acquiring a sample data set and initial data of an original generator;
training the original generator and a discriminator based on the sample data set and the initial data to obtain a target generator, wherein the target generator comprises the data simulator.
7. The recommendation system training method according to claim 6, wherein the acquiring a sample data set and initial data of the original generator comprises:
acquiring historical offline data of the recommendation system;
dividing the historical offline data into a plurality of sample data based on visit rounds, wherein the number of sample data items is the same as the number of visit rounds, and each of the plurality of sample data comprises sample account state data, sample account behavior data, and sample system behavior data;
and sampling historical offline data of the recommendation system to obtain initial data of the original generator.
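The split in claim 7 amounts to grouping a flat offline log by visit round; the record layout (dictionaries with a `"round"` key) is an assumption for illustration:

```python
# Sketch of claim 7's data split: a flat offline log is grouped by
# visit round, so the number of resulting samples equals the number
# of rounds observed in the log.

def split_by_round(history):
    rounds = {}
    for record in history:
        rounds.setdefault(record["round"], []).append(record)
    # one sample per visit round, in round order
    return [rounds[r] for r in sorted(rounds)]

log = [{"round": 1, "state": "a"},
       {"round": 2, "state": "b"},
       {"round": 1, "state": "c"}]
samples = split_by_round(log)  # 2 samples: round 1 has 2 records, round 2 has 1
```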
8. The recommendation system training method according to claim 7, wherein the training the original generator and the discriminator based on the sample data set and the initial data to obtain a target generator comprises:
inputting sample data corresponding to a first visit round and the initial data into the discriminator to obtain a first discrimination result of the sample data corresponding to the first visit round and a second discrimination result of the initial data;
determining target loss of the discriminator based on label information of sample data corresponding to the first visit round, first label information of the initial data corresponding to the discriminator, the first discrimination result and the second discrimination result;
training the discriminator based on a target loss of the discriminator;
determining a target loss of the original generator based on the second discrimination result and first label information of the initial data corresponding to the original generator;
training the original generator based on the target loss of the original generator;
generating first generated data based on the original generator and the initial data;
inputting the sample data corresponding to a second visit round and the first generated data into the discriminator to obtain a first discrimination result of the sample data corresponding to the second visit round and a second discrimination result of the first generated data; and repeating the following step in a loop: determining the target loss of the discriminator based on the label information of the sample data corresponding to the second visit round, the first label information of the first generated data corresponding to the discriminator, the first discrimination result and the second discrimination result;
and obtaining the target generator under the condition that an iteration termination condition is met.
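The per-round alternation of claim 8 can be sketched as follows. The `ToyDiscriminator`/`ToyGenerator` classes (numeric scores, counted updates) are stand-ins for real models; only the alternation structure follows the claim:

```python
# Skeleton of claim 8's adversarial loop: per visit round, the
# discriminator scores real samples against generated data, both
# models are updated, and the generator's output seeds the next round.

class ToyDiscriminator:
    def __init__(self):
        self.updates = 0
    def score(self, data):
        return sum(data) / len(data)
    def update(self, real_score, fake_score):
        self.updates += 1          # a real model would minimize its loss here

class ToyGenerator:
    def __init__(self):
        self.updates = 0
    def update(self, fake_score):
        self.updates += 1          # a real model would minimize its loss here
    def generate(self, data):
        return [x + 1 for x in data]   # stand-in for generating next-round data

def train_adversarially(samples_by_round, init_data, disc, gen):
    data = init_data
    for round_samples in samples_by_round:
        d_real = disc.score(round_samples)   # first discrimination result
        d_fake = disc.score(data)            # second discrimination result
        disc.update(d_real, d_fake)          # train the discriminator
        gen.update(d_fake)                   # train the generator to fool it
        data = gen.generate(data)            # generated data for the next round
    return gen, data

gen, last = train_adversarially([[1.0], [2.0]], [0.0],
                                ToyDiscriminator(), ToyGenerator())
```

The structural point is that each visit round pairs one batch of real offline samples with the generator's output from the previous round, rather than sampling real and fake data independently.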
9. The recommendation system training method according to claim 8, wherein the original generator comprises an original simulator and an original recommender, and the generating first generated data based on the original generator and the initial data comprises:
inputting sample account state data and sample account behavior data in the initial data into the original recommender to obtain first system behavior data;
inputting the first system behavior data, sample account state data and sample account behavior data in the initial data into the original simulator to obtain first account state data and first account behavior data;
determining the first generation data based on the first account status data, the first account behavior data, and the first system behavior data.
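The composition in claim 9, with callables standing in for the original recommender and original simulator (the toy lambdas below are assumptions), looks like this:

```python
# Sketch of claim 9's generator composition: the recommender proposes
# system behavior from the account state/behavior, the simulator
# predicts the resulting account state and behavior, and the triple
# forms one generated sample.

def generate_sample(recommender, simulator, acc_state, acc_behavior):
    sys_behavior = recommender(acc_state, acc_behavior)
    new_state, new_behavior = simulator(sys_behavior, acc_state, acc_behavior)
    return new_state, new_behavior, sys_behavior

sample = generate_sample(lambda s, b: s * b,
                         lambda sys, s, b: (s + sys, b + 1),
                         2, 3)
```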
10. The recommendation system training method according to any one of claims 1 to 9, wherein the determining, according to the initial account state data, the initial account behavior data and the initial recommendation object, the preset index information, the transitional account state data and the transitional account behavior data of the recommendation system in a preset time period comprises:
and determining preset index information, transitional account state data and transitional account behavior data corresponding to retention rate and/or account preference of the recommendation system in a preset time period according to the initial account state data, the initial account behavior data and the initial recommendation object.
11. A recommendation method, comprising:
acquiring account state data and account execution data of a target account;
inputting the account state data and the account execution data of the target account into a target recommendation system obtained by training according to the recommendation system training method of any one of claims 1 to 10, to obtain a target recommendation object.
12. A recommendation system training device, comprising:
a first data determination module configured to determine initial account state data and initial account behavior data of a recommendation system;
an object determination module configured to determine an initial recommendation object corresponding to the recommendation system based on the initial account state data and the initial account behavior data;
a second data determination module configured to determine preset index information, transitional account state data and transitional account behavior data of the recommendation system in a preset time period according to the initial account state data, the initial account behavior data and the initial recommendation object; wherein the transitional account state data characterizes improvement data for the initial account state data based on the initial recommendation object, and the transitional account behavior data characterizes improvement data for the initial account behavior data based on the initial recommendation object;
an information conversion module configured to perform sparse-to-dense conversion on the preset index information in the preset time period to obtain the preset index information of each time step of the recommendation system in the preset time period;
and a training module configured to train the recommendation system based on the preset index information of each time step, the transitional account state data and the transitional account behavior data to obtain a target recommendation system; wherein the preset index information corresponding to the target recommendation system meets a preset index condition.
13. A recommendation device, comprising:
a data acquisition module configured to acquire account state data and account execution data of a target account;
and an object recommendation module configured to input the account state data and the account execution data of the target account into a target recommendation system trained by the recommendation system training device according to claim 12 to obtain a target recommendation object.
14. An electronic device, comprising:
a processor;
a memory for storing the processor-executable instructions;
wherein the processor is configured to execute the instructions to implement the recommendation system training method of any of claims 1 to 10 or the recommendation method of claim 11.
15. A computer-readable storage medium, wherein instructions in the computer-readable storage medium, when executed by a processor of an electronic device, enable the electronic device to perform the recommendation system training method of any of claims 1-10 or the recommendation method of claim 11.
16. A computer program product, characterized in that the computer program product comprises a computer program stored in a readable storage medium, from which at least one processor of a computer device reads and executes the computer program, causing the computer device to perform the recommendation system training method of any one of claims 1 to 10 or the recommendation method of claim 11.
CN202210640273.7A 2022-06-07 2022-06-07 Recommendation system training method, recommendation device, electronic equipment and storage medium Pending CN115146152A (en)

Publications (1)

Publication Number Publication Date
CN115146152A 2022-10-04


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination