CN111275205A - Virtual sample generation method, terminal device and storage medium - Google Patents



Publication number
CN111275205A
CN111275205A
Authority
CN
China
Prior art keywords
virtual
model
information
preset
behavior information
Prior art date
Legal status
Granted
Application number
CN202010032925.XA
Other languages
Chinese (zh)
Other versions
CN111275205B (en)
Inventor
谢宜廷
李延平
Current Assignee
Ud Network Co ltd
Original Assignee
Ud Network Co ltd
Priority date
Filing date
Publication date
Application filed by Ud Network Co ltd
Priority to CN202010032925.XA
Publication of CN111275205A
Application granted
Publication of CN111275205B
Active legal status
Anticipated expiration

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 20/00: Machine learning
    • G06N 20/20: Ensemble learning
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00: Pattern recognition
    • G06F 18/20: Analysing
    • G06F 18/21: Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/214: Generating training patterns; Bootstrap methods, e.g. bagging or boosting


Abstract

The application is applicable to the technical field of computers, and provides a virtual sample generation method, which includes the following steps: acquiring virtual user information, inputting the virtual user information into a first machine learning model, and obtaining search behavior information of a virtual user; inputting the search behavior information into a second machine learning model to obtain selection behavior information of the virtual user; and combining the search behavior information and the selection behavior information to generate a virtual interaction trajectory corresponding to the virtual user information, and taking the virtual interaction trajectory as a virtual sample. Training samples are thereby provided to a reinforcement-learning-based item recommendation model, so that the model can be trained offline rather than online, reducing the cost of online model training.

Description

Virtual sample generation method, terminal device and storage medium
Technical Field
The application belongs to the technical field of computers, and particularly relates to a virtual sample generation method and a terminal device.
Background
With the development of artificial intelligence, existing recommendation systems can incorporate machine learning models to improve their recommendation capability. At present, most machine learning models applied to recommendation systems are deep learning models, which can be trained offline on data annotated with various labels. Besides deep learning models, a recommendation system can also incorporate a reinforcement learning model, whose agent learns online through continuous trial and error and parameter updates. However, online learning is a real-time interaction between the model and users; at that stage the model is not yet intelligent enough and easily recommends wrong information to users, which greatly harms user experience and makes model training very costly.
Disclosure of Invention
The embodiment of the application provides a virtual sample generation method, terminal equipment and a storage medium, and can solve the problem of high model training cost of a reinforcement learning model applied to a recommendation system.
In a first aspect, an embodiment of the present application provides a method for generating a virtual sample, applied to a virtual interaction environment that includes a first machine learning model and a second machine learning model, the method including:
acquiring virtual user information, inputting the virtual user information into a first machine learning model, and acquiring search behavior information of a virtual user;
inputting the search behavior information into a second machine learning model to obtain the selection behavior information of the virtual user;
and combining the search behavior information and the selection behavior information to generate a virtual interaction track corresponding to the virtual user information, and taking the virtual interaction track as a virtual sample.
In this way, the virtual interaction environment simulates the user's item search behavior on the recommendation system and the user's selection behavior toward items, and combines the search behavior and the selection behavior into a virtual interaction trajectory that serves as a virtual sample. Training samples for model training are thereby provided to the reinforcement-learning-based item recommendation model, so that the item recommendation model can be trained offline, no online model training is needed, and the cost of online model training is reduced.
In a second aspect, an embodiment of the present application provides a method for generating a virtual interactive environment, where the virtual interactive environment includes a first machine learning model and a second machine learning model, the method includes:
acquiring virtual user information, inputting the virtual user information into a first preset model, and acquiring search behavior information of a virtual user;
inputting the search behavior information into a second preset model to obtain the selection behavior information of the virtual user;
combining the search behavior information and the selection behavior information to generate a virtual interaction track corresponding to the virtual user information;
updating model parameters respectively corresponding to the first preset model and the second preset model according to a comparison result between the virtual interaction track and a preset real interaction track;
and taking the first preset model with the updated model parameters as the first machine learning model, and taking the second preset model with the updated model parameters as the second machine learning model.
In a third aspect, an embodiment of the present application provides a method for generating an item recommendation model based on a virtual interaction environment, the method including:
obtaining a plurality of virtual samples output by the virtual interactive environment;
and inputting each virtual sample into an article recommendation model to update parameters of the article recommendation model.
In a fourth aspect, an embodiment of the present application provides a terminal device, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor implements a method for generating a virtual sample, a method for generating a virtual interactive environment, or a method for generating an item recommendation model when executing the computer program.
In a fifth aspect, the present application provides a computer-readable storage medium, which stores a computer program, where the computer program is executed by a processor to implement a method for generating a virtual sample, a method for generating a virtual interactive environment, or a method for generating an item recommendation model.
In a sixth aspect, the present application provides a computer program product, which when running on a terminal device, causes the terminal device to execute the steps of the above method for generating a virtual sample, the method for generating a virtual interactive environment, or the method for generating an item recommendation model.
It is understood that the beneficial effects of the second to sixth aspects can be seen from the description of the first aspect, and are not described herein again.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings needed to be used in the embodiments or the prior art descriptions will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present application, and it is obvious for those skilled in the art to obtain other drawings based on these drawings without inventive exercise.
FIG. 1 is a schematic illustration of a virtual interactive environment provided by an embodiment of the present application;
fig. 2 is a schematic flowchart of a method for generating a virtual sample according to an embodiment of the present disclosure;
fig. 3 is a schematic flowchart of a method for generating a virtual sample according to another embodiment of the present application;
fig. 4 is a schematic flowchart of a method for generating a virtual sample according to another embodiment of the present application;
FIG. 5 is a flowchart illustrating a method for generating a virtual interactive environment according to an embodiment of the present application;
FIG. 6 is a flowchart illustrating a method for generating an item recommendation model according to an embodiment of the present application;
fig. 7 is a schematic structural diagram of a terminal device according to an embodiment of the present application.
Detailed Description
In the following description, for purposes of explanation and not limitation, specific details are set forth, such as particular system structures, techniques, etc. in order to provide a thorough understanding of the embodiments of the present application. It will be apparent, however, to one skilled in the art that the present application may be practiced in other embodiments that depart from these specific details. In other instances, detailed descriptions of well-known systems, devices, circuits, and methods are omitted so as not to obscure the description of the present application with unnecessary detail.
It will be understood that the terms "comprises" and/or "comprising," when used in this specification and the appended claims, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
It should also be understood that the term "and/or" as used in this specification and the appended claims refers to and includes any and all possible combinations of one or more of the associated listed items.
As used in this specification and the appended claims, the term "if" may be interpreted contextually as "when", "upon", "in response to determining", or "in response to detecting". Similarly, the phrase "if it is determined" or "if [a described condition or event] is detected" may be interpreted contextually to mean "upon determining", "in response to determining", "upon detecting [the described condition or event]", or "in response to detecting [the described condition or event]".
Furthermore, in the description of the present application and the appended claims, the terms "first," "second," "third," and the like are used for distinguishing between descriptions and not necessarily for describing or implying relative importance.
Reference throughout this specification to "one embodiment" or "some embodiments," or the like, means that a particular feature, structure, or characteristic described in connection with the embodiment is included in one or more embodiments of the present application. Thus, appearances of the phrases "in one embodiment," "in some embodiments," "in other embodiments," or the like, in various places throughout this specification are not necessarily all referring to the same embodiment, but rather "one or more but not all embodiments" unless specifically stated otherwise. The terms "comprising," "including," "having," and variations thereof mean "including, but not limited to," unless expressly specified otherwise.
As introduced in the background, a recommendation system may make item recommendations based on a reinforcement learning model, in which the agent's behavior is guided by rewards gained through interaction with the environment, emphasizing how to act based on the environment so as to maximize the expected benefit. Given the requirements of reinforcement learning, training a reinforcement learning model in a real environment is very challenging: a large amount of sampling and trial and error is needed, so training the model in a real environment incurs high cost, and the trial and error can even cause losses that are difficult to estimate.
Therefore, an embodiment of the present application provides a virtual sample generation method, in which a virtual interaction environment simulates a user's item search behavior on a recommendation system and the user's selection behavior toward items, and combines the search behavior and the selection behavior into a virtual interaction trajectory that serves as a virtual sample. Training samples for model training are thereby provided to the reinforcement-learning-based item recommendation model, so that the item recommendation model can be trained offline without online model training, reducing the cost of online model training.
Fig. 1 shows a schematic diagram of a virtual interaction environment provided by an embodiment of the present application. The virtual interaction environment 100 includes a user generation model 101, a first machine learning model 102 and a second machine learning model 103. The user generation model 101 is mainly used for generating virtual user information, and may consist of an activation function, a fully connected layer and a softmax function. The first machine learning model 102 is a search behavior generation model, mainly used for simulating a user's search behavior on a search engine, and may consist of an activation function and a fully connected layer. The second machine learning model 103 is a selection behavior generation model, mainly used for simulating the user's behavior of selecting among searched items, and may consist of an activation function, a fully connected layer and a softmax function.
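As a rough, non-authoritative sketch of these three model structures (the tanh activation, layer sizes, and weight initialization below are illustrative assumptions; the patent only names an activation function, a fully connected layer, and a softmax function):

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

class ActivationDense:
    """Activation function -> fully connected layer (the structure
    described for the search behavior generation model 102)."""
    def __init__(self, in_dim, out_dim, seed=0):
        rng = np.random.default_rng(seed)
        self.W = rng.normal(scale=0.1, size=(in_dim, out_dim))
        self.b = np.zeros(out_dim)

    def __call__(self, x):
        return np.tanh(x) @ self.W + self.b  # assumed tanh activation, then dense layer

class ActivationDenseSoftmax(ActivationDense):
    """Activation -> fully connected layer -> softmax (the structure
    described for models 101 and 103)."""
    def __call__(self, x):
        return softmax(super().__call__(x))

user_gen = ActivationDenseSoftmax(8, 16)    # 101: noise -> virtual user info
search_gen = ActivationDense(16, 32)        # 102: user info -> item info
select_gen = ActivationDenseSoftmax(48, 4)  # 103: search behavior -> selection probabilities
```

The softmax outputs of models 101 and 103 form probability distributions, which matches their use for sampling user information and selection behaviors.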
The method for generating a virtual sample provided in the embodiment of the present application may be applied to a terminal device, where the terminal device may be a mobile phone, a tablet computer, a notebook computer, an ultra-mobile personal computer (UMPC), a netbook, a server, and the like.
Fig. 2 shows a schematic flow chart of a method for generating a virtual sample provided in the present application, which may be applied to the terminal device on which the virtual interactive environment is built, by way of example and not limitation.
S201, acquiring virtual user information, inputting the virtual user information into a first machine learning model, and acquiring search behavior information of a virtual user;
In S201, the virtual user information is user information simulated by the system, and may include the user's gender, occupation, browsing records on the internet, and the like. The first machine learning model recommends item information to the virtual user according to the virtual user information, so that the user's search behavior can be simulated by combining the virtual user information with the item information. The search behavior information therefore includes both virtual user information and item information; the item information may be, for example, commodity information or news information.
By way of example and not limitation, when a real user uses a shopping platform to shop, information of goods to be purchased is generally searched, and the shopping platform returns an item recommendation list to a user terminal according to search information input by the real user. In this embodiment, the first machine learning model predicts commodity information that the virtual user may search according to the virtual user information, and then obtains an item recommendation list according to the commodity information, thereby simulating a search behavior of a real user on a shopping platform. Specifically, the virtual user information input into the first machine learning model may be activated through an activation function, feature extraction may be performed through a full connection layer, and commodity information that may be searched by the virtual user may be output.
S202, inputting the search behavior information into a second machine learning model to obtain the selection behavior information of the virtual user;
in S202 described above, the second machine learning model is a model that predicts selections made by the virtual user for the item. By way of example and not limitation, when the shopping platform returns an item recommendation list to the user terminal of the real user, the real user may have multiple selections for the item, such as selecting to click on the item, purchasing the item, clicking on the item and then leaving the item page, browsing the next item recommendation list, and so on. In this embodiment, the second machine learning model predicts a probability of each selection made by the virtual user for the item based on the search behavior information, and determines the selection made by the virtual user for the item based on the probabilities. Specifically, the search behavior information input into the second machine learning model may be activated through an activation function, feature extraction may be performed through a full connection layer, a probability value of each selection made by the virtual user on the item may be output through a softmax function, and the selection behavior information may be determined according to the probability value.
S203, combining the search behavior information and the selection behavior information to generate a virtual interaction track corresponding to the virtual user information, and taking the virtual interaction track as a virtual sample.
In S203, the virtual interaction trajectory is a simulated interaction trajectory between a user and the item recommendation system. Optionally, the search behavior information and the selection behavior information may be concatenated in the chronological order in which the search and selection behaviors occur. It should be understood that the same virtual user may have multiple pieces of search behavior information and multiple pieces of selection behavior information.
In the embodiments of the present application, the interaction trajectory between the user and the item recommendation system is simulated through the virtual interaction environment, and this trajectory is used as a training sample for training the item recommendation model, so that the item recommendation model can be trained under offline conditions.
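A minimal end-to-end sketch of the S201-S203 flow, with hypothetical stub functions standing in for the two trained machine learning models (the behavior labels, probabilities, and vector sizes are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)

def first_model(user_info):
    """Stub for S201: virtual user info -> item info the user might search."""
    return rng.standard_normal(4)

def second_model(search_behavior):
    """Stub for S202: search behavior -> the most probable selection."""
    probs = np.array([0.1, 0.6, 0.2, 0.1])  # click / purchase / leave / next list
    labels = ["click", "purchase", "leave_page", "next_list"]
    return labels[int(probs.argmax())]

def generate_virtual_sample(user_info):
    item_info = first_model(user_info)                        # S201
    search_behavior = {"user": user_info, "item": item_info}
    selection = second_model(search_behavior)                 # S202
    # S203: combine the two behaviors in the order they occurred
    return [("search", search_behavior), ("select", selection)]

trajectory = generate_virtual_sample(rng.standard_normal(8))
```

Each trajectory produced this way is one virtual sample; repeating the call with freshly generated user information yields a training set.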
On the basis of the embodiment shown in fig. 2, the present application provides another embodiment of a virtual sample generation method. The acquiring of the virtual user information in step S201 described above includes steps S2011 and S2012. It should be noted that the steps that are the same as those in the embodiment of fig. 2 are not repeated herein, please refer to the foregoing description.
S2011, acquiring preset Gaussian noise, and collecting a plurality of target vectors with preset dimensions from the Gaussian noise;
In S2011, the Gaussian noise is based on a normal distribution, and the preset dimension is some dimension dim, such as 2 or 3. Randomly sampling from the Gaussian noise yields m dim-dimensional target vectors z1, z2, …, zm.
S2012, inputting the target vector into a user generating model to simulate user information, and obtaining the virtual user information corresponding to the target vector.
In S2012 above, the user generation model is a model that generates simulated user information. Optionally, the target vectors z1, z2, …, zm are input into the user generation model to obtain the corresponding virtual user information.
According to the embodiment of the application, the virtual user information is randomly generated according to the Gaussian noise, so that specific information does not need to be input by a user, user operation is reduced, and the virtual user information generated by the Gaussian noise based on normal distribution is more stable.
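The sampling in S2011 can be sketched as follows (the function name and the use of a standard normal distribution are assumptions consistent with the description of normally distributed Gaussian noise):

```python
import numpy as np

def sample_target_vectors(m, dim, seed=None):
    """Randomly sample m target vectors z1..zm of the preset dimension
    `dim` from Gaussian (normally distributed) noise."""
    rng = np.random.default_rng(seed)
    return rng.standard_normal((m, dim))

z = sample_target_vectors(m=5, dim=3, seed=42)  # five 3-dimensional target vectors
```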
On the basis of the embodiment shown in fig. 2, fig. 3 shows a flowchart of another virtual sample generation method provided in the embodiment of the present application. As shown in fig. 3, the step S201 of inputting the virtual user information into the first machine learning model to obtain the search behavior information of the virtual user specifically includes steps S301 and S302. It should be noted that the steps that are the same as those in the embodiment of fig. 2 are not repeated herein, please refer to the foregoing description.
S301, extracting first characteristic information of the virtual user information, and acquiring a corresponding item recommendation list according to the first characteristic information, wherein the item recommendation list comprises the item information;
In S301, the virtual user information may first be activated through an activation function, which introduces non-linear factors and avoids the problem of the first machine learning model being linearly inseparable. Feature extraction is then performed on the activated virtual user information through a fully connected network, the item information that the virtual user may search is determined from the extracted feature information, and the item recommendation list corresponding to that item information is obtained.
S302, combining the item information on the item recommendation list with the virtual user information to generate search behavior information of the virtual user.
In S302, since the item information and the virtual user information belong to static data, in order to generate environment interaction, the item information and the virtual user information are combined to generate search behavior information of the virtual user searching for an item, thereby generating data interacting with the environment.
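One plausible realization of the combination in S302 is concatenating the user vector with each item vector on the recommendation list (the concatenation operator is an assumption; the text does not fix how the two are combined):

```python
import numpy as np

def combine_search_behavior(user_vec, recommendation_list):
    """Pair the static virtual user info with each item on the item
    recommendation list, yielding one search behavior record per item."""
    return [np.concatenate([user_vec, item_vec]) for item_vec in recommendation_list]

user = np.zeros(4)
items = [np.ones(3), 2.0 * np.ones(3)]
records = combine_search_behavior(user, items)
```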
On the basis of the embodiment shown in fig. 2, fig. 4 shows a flowchart of another virtual sample generation method provided in the embodiment of the present application. As shown in fig. 4, the step S202 specifically includes steps S401 and S402. It should be noted that the steps that are the same as those in the embodiment of fig. 2 are not repeated herein, please refer to the foregoing description.
S401, extracting second characteristic information of the search behavior information, classifying the second characteristic information through a preset classifier, and obtaining a probability value of each selection made by the virtual user to an article;
In S401, the search behavior information may first be activated through an activation function, which introduces non-linear factors and avoids the problem of the second machine learning model being linearly inseparable. Feature extraction is then performed on the activated search behavior information through a fully connected network, and the extracted feature information is classified by a preset classifier (e.g., softmax) to obtain the probability value of each selection the virtual user may make regarding the item.
S402, determining the selection behavior information of the virtual user according to the probability values.
In the above S402, each selection made by the virtual user for the item corresponds to a probability value, and optionally, the selection with the highest probability value is used as the selection behavior information of the virtual user.
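S401-S402 thus reduce to a softmax over the possible selections followed by taking the highest-probability class; a sketch (the selection labels are illustrative assumptions):

```python
import numpy as np

SELECTIONS = ["click", "purchase", "leave_page", "next_list"]  # assumed labels

def softmax(x):
    e = np.exp(x - np.max(x))
    return e / e.sum()

def select_behavior(logits):
    """Classify extracted features into selection probabilities (S401) and
    take the selection with the highest probability value (S402)."""
    probs = softmax(np.asarray(logits, dtype=float))
    return SELECTIONS[int(np.argmax(probs))], probs

choice, probs = select_behavior([0.2, 1.5, 0.1, -0.3])
```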
Fig. 5 shows a schematic flowchart of a method for generating a virtual interaction environment provided by the present application, which, by way of example and not limitation, may be applied to the terminal device on which the virtual interaction environment is built.
S501, acquiring virtual user information, inputting the virtual user information into a first preset model, and acquiring search behavior information of a virtual user;
s502, inputting the search behavior information into a second preset model to obtain the selection behavior information of the virtual user;
s503, combining the search behavior information and the selection behavior information to generate a virtual interaction track corresponding to the virtual user information;
In S501 to S503, the first preset model is an initialization of the first machine learning model, and the second preset model is an initialization of the second machine learning model. For brevity, see S201 to S203 for the remaining details.
S504, updating model parameters respectively corresponding to the first preset model and the second preset model according to a comparison result between the virtual interaction track and a preset real interaction track;
and S505, taking the first preset model with the updated model parameters as the first machine learning model, and taking the second preset model with the updated model parameters as the second machine learning model.
In the above S504 and S505, a preset discriminator is used to discriminate a comparison result between the virtual interaction trajectory and the real interaction trajectory, the comparison result is used as a reward value of the first preset model and the second preset model, and the model parameters respectively corresponding to the first preset model and the second preset model are updated according to the search behavior information, the selection behavior information and the reward value.
On the basis of the embodiment shown in fig. 5, the present application provides another embodiment of a method for generating a virtual interactive environment. Step S5011 is also included before step S501. It should be noted that the steps that are the same as those in the embodiment of fig. 5 are not repeated herein, please refer to the foregoing description.
S5011, generating a user generation model by adopting a generative adversarial network algorithm, wherein the user generation model is used for generating the virtual user information.
In S5011 above, the generative adversarial network algorithm is a deep learning algorithm comprising a generator and a discriminator. Specifically: Step A, collect a plurality of target vectors of a preset dimension from Gaussian noise, and input the target vectors into a preset generator to obtain first virtual user data. Step B, obtain the maximum deviation value between the first virtual user data and preset real user data through a preset discriminator, and update the parameters of the preset discriminator according to the maximum deviation value. Step C, collect a plurality of target vectors of the preset dimension from Gaussian noise, and input the target vectors into the preset generator to obtain second virtual user data. Step D, obtain the minimum deviation value between the second virtual user data and the real user data through the updated preset discriminator, and update the parameters of the preset generator according to the minimum deviation value. Steps A to D are repeated until the minimum deviation value reaches a second preset value, and the preset generator is then taken as the user generation model.
In step B, the real user data may be a data set obtained by one-hot encoding user information such as gender, occupation and browsing history into user feature vectors. In order to distinguish real data from virtual data as much as possible, the distributional distance between the two must be maximized, optionally using the Wasserstein distance measure, resulting in the formula:
max_D E_{x~P_data}[D(x)] - E_{z~P_z}[D(G(z))],
the parameter updating formula is as follows:
θ ← θ + η·∇_θ(E_{x~P_data}[D(x)] - E_{z~P_z}[D(G(z))]), with θ clipped to [-c, c],
where D(x) is the discriminator's output on real data, G(z) is the generator's output, and θ is the discriminator parameter; θ is constrained not to exceed a constant c in order to prevent θ from growing without bound.
In step D above, in contrast to step B, minimizing the Wasserstein distance yields the formula:
min_G -E_{z~P_z}[D(G(z))];
the parameter updating formula is as follows:
w ← w - η·∇_w(-E_{z~P_z}[D(G(z))]), where w denotes the generator parameters.
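The alternating updates of steps B and D under the Wasserstein objective can be sketched with a deliberately simplified linear critic D(x) = x·θ and an elementwise linear generator G(z) = z⊙w (both simplifications, along with the learning rate and clipping constant, are assumptions for illustration only):

```python
import numpy as np

c = 0.01   # clipping constant keeping |theta| <= c, as in the text
lr = 0.05  # assumed learning rate

def critic_step(theta, w, real, z):
    """Step B: gradient ascent on E[D(x)] - E[D(G(z))], then clip theta."""
    fake = z * w                                   # generator output G(z)
    grad = real.mean(axis=0) - fake.mean(axis=0)   # d/dtheta of the objective
    return np.clip(theta + lr * grad, -c, c)

def generator_step(theta, w, z):
    """Step D: gradient descent on -E[D(G(z))]."""
    grad = -(z * theta).mean(axis=0)               # d/dw of -E[D(z * w)]
    return w - lr * grad

rng = np.random.default_rng(0)
theta, w = np.zeros(3), np.ones(3)
real, z = rng.standard_normal((64, 3)), rng.standard_normal((64, 3))
theta = critic_step(theta, w, real, z)
w = generator_step(theta, w, z)
```

In practice the critic and generator would be the multi-layer networks described earlier, with gradients obtained by backpropagation; the clipping step corresponds to the constraint on θ stated in the text.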
on the basis of the embodiment shown in fig. 5, the present application provides another embodiment of a method for generating a virtual interactive environment. The above step S504 includes steps S5041 and S5042. It should be noted that the steps that are the same as those in the embodiment of fig. 5 are not repeated herein, please refer to the foregoing description.
S5041, discriminating a comparison result between the virtual interaction trajectory and the real interaction trajectory through a preset discriminator, and taking the comparison result as the reward value for the first preset model and the second preset model;
In S5041, the preset discriminator distinguishes the virtually generated interaction trajectory from the real interaction trajectory, i.e., it enlarges the distance between them, and the preset discriminator parameter θ is updated using the original adversarial network formula, that is, the following expression is maximized over θ:
E_{Tg}[log(D_θ(s, a))] + E_{Tc}[log(1 - D_θ(s, a))], where s is the search behavior information and a is the selection behavior information.
The θ corresponding to the maximum of this expression is obtained, and the model parameters of the preset discriminator are updated accordingly, for example by stochastic gradient descent. The preset discriminator with updated model parameters then identifies the minimum deviation value between the virtual interaction trajectory and the real interaction trajectory, and this minimum deviation value is taken as the reward value.
And S5042, updating model parameters respectively corresponding to the first preset model and the second preset model according to the search behavior information, the selection behavior information and the reward value.
In S5042, the virtual user information in the search behavior information is used as the state s1, the item in the search behavior information is used as the new policy a1, and the reward value is used as the reward for adopting policy a1 in state s1 at a given time; the model parameters of the first preset model are then updated by the DQN algorithm.
Similarly, the search behavior information is used as the state s2, the selection behavior information as the policy a2, and the reward value as the reward for adopting policy a2 in state s2 at a given time; the model parameters of the second preset model are updated by the DQN algorithm.
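The two updates above can be sketched in tabular form, since tabular Q-learning is the rule that DQN approximates with a network. The states s1/s2, actions a1/a2, discount, learning rate, and reward value below are hypothetical placeholders, not values from the patent.

```python
gamma, lr = 0.9, 0.5             # illustrative discount factor and learning rate

def q_update(Q, s, a, r, s_next, actions):
    # Q-learning target: r + gamma * max_a' Q(s', a'); move Q(s, a) toward it.
    best_next = max((Q.get((s_next, an), 0.0) for an in actions), default=0.0)
    old = Q.get((s, a), 0.0)
    Q[(s, a)] = old + lr * (r + gamma * best_next - old)

reward = -0.1                    # hypothetical discriminator-derived reward value

Q1 = {}   # first preset model: state s1 = virtual user info, policy a1 = item
q_update(Q1, "user_0", "item_3", reward, "user_0_next", ["item_1", "item_3"])

Q2 = {}   # second preset model: state s2 = search info, policy a2 = selection
q_update(Q2, "search_0", "click", reward, "search_1", ["click", "skip"])
```

With empty tables the target reduces to the reward itself, so each entry moves halfway toward −0.1 (i.e. to −0.05) after one update.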
Fig. 6 shows a schematic flow chart of a method for generating an item recommendation model provided in the present application. By way of example and not limitation, the method may be applied to the above terminal device, on which the virtual interactive environment has been built, and the method is based on that virtual interactive environment.
S601, acquiring a plurality of virtual samples output by the virtual interactive environment;
and S602, inputting each virtual sample into an article recommendation model to update parameters of the article recommendation model.
In S601 and S602, the virtual samples output by the virtual interactive environment are used as training samples for the item recommendation model. The item recommendation model therefore does not need to acquire real-time data from the item recommendation system as training samples: the reinforcement-learning-based item recommendation model is trained offline, which reduces both the training cost and the training risk of the item recommendation model.
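The offline loop of S601/S602 can be sketched as follows. `VirtualEnv` and `RecModel` are hypothetical stand-ins for the virtual interactive environment and the item recommendation model, and the update step is a counter rather than a real parameter update.

```python
class VirtualEnv:
    """Stand-in for the virtual interactive environment of the patent."""
    def __init__(self):
        self._n = 0

    def sample(self):
        # Each virtual sample pairs search behavior with selection behavior.
        self._n += 1
        return {"search": f"s{self._n}", "select": f"a{self._n}"}

class RecModel:
    """Stand-in for the reinforcement-learning item recommendation model."""
    def __init__(self):
        self.updates = 0

    def update(self, sample):
        self.updates += 1        # placeholder for a real parameter update

env, model = VirtualEnv(), RecModel()
for _ in range(10):              # fully offline: no live user data required
    model.update(env.sample())
```

The point of the sketch is the data flow: every training sample comes from the simulated environment, so no call ever touches a live recommendation system.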
It should be understood that, the sequence numbers of the steps in the foregoing embodiments do not imply an execution sequence, and the execution sequence of each process should be determined by its function and inherent logic, and should not constitute any limitation to the implementation process of the embodiments of the present application.
Fig. 7 is a schematic structural diagram of a terminal device according to an embodiment of the present application. As shown in fig. 7, the terminal device 7 of this embodiment includes: at least one processor 70 (only one shown in fig. 7), a memory 71, and a computer program 72 stored in the memory 71 and executable on the at least one processor 70, wherein the processor 70 executes the computer program 72 to implement the steps in any of the above-mentioned embodiments of the method for generating a virtual sample, the method for generating a virtual interactive environment, or the method for generating an item recommendation model.
It should be noted that the terminal device 7 can implement the steps in the embodiments of the virtual sample generation method, the virtual interactive environment generation method, and the item recommendation model generation method. These steps may all be implemented on the same terminal device, or the steps of the different methods may be implemented on different terminal devices.
The terminal device 7 may be a desktop computer, a notebook, a palmtop computer, a cloud server, or another computing device. The terminal device may include, but is not limited to, the processor 70 and the memory 71. Those skilled in the art will appreciate that fig. 7 is only an example of the terminal device 7 and does not constitute a limitation on it; the terminal device may include more or fewer components than those shown, combine some components, or use different components, and may further include input/output devices, network access devices, and the like.
The processor 70 may be a Central Processing Unit (CPU), or another general-purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor.
The memory 71 may in some embodiments be an internal storage unit of the terminal device 7, such as a hard disk or a memory of the terminal device 7. In other embodiments, the memory 71 may also be an external storage device of the terminal device 7, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, or a flash card provided on the terminal device 7. Further, the memory 71 may include both an internal storage unit and an external storage device of the terminal device 7. The memory 71 is used for storing an operating system, application programs, a boot loader, data, and other programs, such as the program code of the computer program. The memory 71 may also be used to temporarily store data that has been output or is to be output.
The embodiments of the present application further provide a computer-readable storage medium, where a computer program is stored, and when the computer program is executed by a processor, the computer program implements the steps in the above-mentioned method embodiments.
The embodiments of the present application further provide a computer program product which, when run on a mobile terminal, causes the mobile terminal to implement the steps in the above method embodiments.
The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, all or part of the processes in the methods of the embodiments described above may be implemented by a computer program, which may be stored in a computer-readable storage medium and which, when executed by a processor, implements the steps of the method embodiments described above. The computer program comprises computer program code, which may be in the form of source code, object code, an executable file, some intermediate form, or the like. The computer readable medium may include at least: any entity or device capable of carrying the computer program code to the photographing apparatus/terminal apparatus, a recording medium, a computer memory, a Read-Only Memory (ROM), a Random Access Memory (RAM), an electrical carrier signal, a telecommunications signal, and a software distribution medium, for example a USB flash disk, a removable hard disk, a magnetic disk, or an optical disk. In certain jurisdictions, in accordance with legislation and patent practice, computer readable media may not include electrical carrier signals or telecommunications signals.
In the above embodiments, the descriptions of the respective embodiments have respective emphasis, and reference may be made to the related descriptions of other embodiments for parts that are not described or illustrated in a certain embodiment.
Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
In the embodiments provided in the present application, it should be understood that the disclosed apparatus/network device and method may be implemented in other ways. For example, the apparatus/network device embodiments described above are merely illustrative; the division into modules or units is only one logical division, and there may be other divisions in actual implementation, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted or not implemented. In addition, the shown or discussed mutual coupling, direct coupling, or communication connection may be an indirect coupling or communication connection through some interfaces, devices, or units, and may be electrical, mechanical, or in other forms.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
The above-mentioned embodiments are only used for illustrating the technical solutions of the present application, and not for limiting the same; although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; such modifications and substitutions do not substantially depart from the spirit and scope of the embodiments of the present application and are intended to be included within the scope of the present application.

Claims (10)

1. A method for generating a virtual sample, based on a virtual interactive environment, wherein the virtual interactive environment comprises a first machine learning model and a second machine learning model, the method comprising:
acquiring virtual user information, inputting the virtual user information into a first machine learning model, and acquiring search behavior information of a virtual user;
inputting the search behavior information into a second machine learning model to obtain the selection behavior information of the virtual user;
and combining the search behavior information and the selection behavior information to generate a virtual interaction track corresponding to the virtual user information, and taking the virtual interaction track as a virtual sample.
2. The method for generating a virtual sample according to claim 1, wherein the acquiring virtual user information includes:
acquiring preset Gaussian noise, and acquiring a plurality of target vectors with preset dimensions from the Gaussian noise;
and inputting the target vector into a user generation model to simulate user information, and acquiring the virtual user information corresponding to the target vector.
3. The method for generating virtual samples according to claim 1 or 2, wherein the inputting the virtual user information into the first machine learning model to obtain the search behavior information of the virtual user comprises:
extracting first characteristic information of the virtual user information through the first machine learning model, and acquiring a corresponding article recommendation list according to the first characteristic information, wherein the article recommendation list comprises article information;
and combining the item information on the item recommendation list with the virtual user information to generate the search behavior information of the virtual user.
4. The method for generating virtual samples according to claim 1 or 2, wherein the inputting the search behavior information into a second machine learning model to obtain the selection behavior information of the virtual user comprises:
extracting second characteristic information of the search behavior information through the second machine learning model, and classifying the second characteristic information through a preset classifier to obtain a probability value of each selection made by the virtual user to an article;
and determining the selection behavior information of the virtual user according to the probability values.
5. A method for generating a virtual interactive environment, wherein the virtual interactive environment comprises a first machine learning model and a second machine learning model, the method comprising:
acquiring virtual user information, inputting the virtual user information into a first preset model, and acquiring search behavior information of a virtual user;
inputting the search behavior information into a second preset model to obtain the selection behavior information of the virtual user;
combining the search behavior information and the selection behavior information to generate a virtual interaction track corresponding to the virtual user information;
updating model parameters respectively corresponding to the first preset model and the second preset model according to a comparison result between the virtual interaction track and a preset real interaction track;
and taking the first preset model with the updated model parameters as the first machine learning model, and taking the second preset model with the updated model parameters as the second machine learning model.
6. The method for generating a virtual interactive environment according to claim 5, wherein before said obtaining the virtual user information, further comprising:
and generating a user generation model by adopting a countermeasure generation network algorithm, wherein the user generation model is used for generating the virtual user information.
7. The method for generating a virtual interactive environment according to claim 5, wherein the updating the model parameters respectively corresponding to the first preset model and the second preset model according to the comparison result between the virtual interactive trajectory and the preset real interactive trajectory includes:
identifying a comparison result between the virtual interaction track and the real interaction track through a preset discriminator, and taking the comparison result as reward values of the first preset model and the second preset model;
and updating model parameters respectively corresponding to the first preset model and the second preset model according to the searching behavior information, the selecting behavior information and the reward value.
8. A method for generating an item recommendation model based on a virtual interactive environment is characterized by comprising the following steps:
obtaining a plurality of virtual samples output by the virtual interactive environment;
and inputting each virtual sample into an article recommendation model to update parameters of the article recommendation model.
9. A terminal device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, wherein the processor implements the method of any one of claims 1 to 4, or 5 to 7, or 8 when executing the computer program.
10. A computer-readable storage medium, in which a computer program is stored which, when being executed by a processor, carries out the method according to any one of claims 1 to 4, or 5 to 7, or 8.
CN202010032925.XA 2020-01-13 2020-01-13 Virtual sample generation method, terminal equipment and storage medium Active CN111275205B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010032925.XA CN111275205B (en) 2020-01-13 2020-01-13 Virtual sample generation method, terminal equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010032925.XA CN111275205B (en) 2020-01-13 2020-01-13 Virtual sample generation method, terminal equipment and storage medium

Publications (2)

Publication Number Publication Date
CN111275205A true CN111275205A (en) 2020-06-12
CN111275205B CN111275205B (en) 2024-03-22

Family

ID=71002997

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010032925.XA Active CN111275205B (en) 2020-01-13 2020-01-13 Virtual sample generation method, terminal equipment and storage medium

Country Status (1)

Country Link
CN (1) CN111275205B (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105447730A (en) * 2015-12-25 2016-03-30 腾讯科技(深圳)有限公司 Target user orientation method and device
CN108764489A (en) * 2018-06-05 2018-11-06 北京百度网讯科技有限公司 Model training method based on virtual sample and equipment
US20190019570A1 (en) * 2017-07-12 2019-01-17 Fresenius Medical Care Holdings, Inc. Techniques for conducting virtual clinical trials
CN109345302A (en) * 2018-09-27 2019-02-15 腾讯科技(深圳)有限公司 Machine learning model training method, device, storage medium and computer equipment
CN109509056A (en) * 2018-10-16 2019-03-22 平安科技(深圳)有限公司 Method of Commodity Recommendation, electronic device and storage medium based on confrontation network
CN110046952A (en) * 2019-01-30 2019-07-23 阿里巴巴集团控股有限公司 A kind of training method and device, a kind of recommended method and device of recommended models
CN110674408A (en) * 2019-09-30 2020-01-10 北京三快在线科技有限公司 Service platform, and real-time generation method and device of training sample

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
GUILLAUME COQUERET: "Approximate NORTA simulations for virtual sample generation", EXPERT SYSTEMS WITH APPLICATIONS, 21 December 2016 (2016-12-21), pages 69 - 81, XP029885699, DOI: 10.1016/j.eswa.2016.12.027 *

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112016439A (en) * 2020-08-26 2020-12-01 上海松鼠课堂人工智能科技有限公司 Game learning environment creation method and system based on antagonistic neural network
CN112016439B (en) * 2020-08-26 2021-06-29 上海松鼠课堂人工智能科技有限公司 Game learning environment creation method and system based on antagonistic neural network
CN112337097A (en) * 2020-10-27 2021-02-09 网易(杭州)网络有限公司 Game simulation method and device
CN112720504A (en) * 2021-01-20 2021-04-30 清华大学 Method and device for controlling learning of hand and object interactive motion from RGBD video
CN112720504B (en) * 2021-01-20 2023-03-28 清华大学 Method and device for controlling learning of hand and object interactive motion from RGBD video
CN114404977A (en) * 2022-01-25 2022-04-29 腾讯科技(深圳)有限公司 Training method of behavior model and training method of structure expansion model
CN114493781A (en) * 2022-01-25 2022-05-13 工银科技有限公司 User behavior prediction method and device, electronic equipment and storage medium
CN114404977B (en) * 2022-01-25 2024-04-16 腾讯科技(深圳)有限公司 Training method of behavior model and training method of structure capacity expansion model

Also Published As

Publication number Publication date
CN111275205B (en) 2024-03-22

Similar Documents

Publication Publication Date Title
US10958748B2 (en) Resource push method and apparatus
CN112632385B (en) Course recommendation method, course recommendation device, computer equipment and medium
CN111275205B (en) Virtual sample generation method, terminal equipment and storage medium
CN107613022B (en) Content pushing method and device and computer equipment
AU2016225947B2 (en) System and method for multimedia document summarization
CN110598845B (en) Data processing method, data processing device, computer equipment and storage medium
CN109471978B (en) Electronic resource recommendation method and device
CN112529663B (en) Commodity recommendation method, commodity recommendation device, terminal equipment and storage medium
CN111506820B (en) Recommendation model, recommendation method, recommendation device, recommendation equipment and recommendation storage medium
CN110413888B (en) Book recommendation method and device
CN107403311B (en) Account use identification method and device
CN108780521A (en) It is associated with shot and long term Memory Neural Networks layer
WO2023000491A1 (en) Application recommendation method, apparatus and device, and computer-readable storage medium
CN110598120A (en) Behavior data based financing recommendation method, device and equipment
CN113191896A (en) Recommendation method and device for bidding information and computer equipment
CN113656699B (en) User feature vector determining method, related equipment and medium
CN110516033A (en) A kind of method and apparatus calculating user preference
CN110929526A (en) Sample generation method and device and electronic equipment
CN111159558B (en) Recommendation list generation method and device and electronic equipment
CN112328881B (en) Article recommendation method, device, terminal equipment and storage medium
CN117795527A (en) Evaluation of output sequences using autoregressive language model neural networks
CN118043802A (en) Recommendation model training method and device
CN113515701A (en) Information recommendation method and device
CN112784032A (en) Conversation corpus recommendation evaluation method and device, storage medium and electronic equipment
CN112257812A (en) Method and device for determining labeled sample, machine readable medium and equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant