CN110135951B - Game commodity recommendation method and device and readable storage medium - Google Patents

Game commodity recommendation method and device and readable storage medium

Info

Publication number
CN110135951B
CN110135951B (application CN201910406926.3A; published as CN110135951A)
Authority
CN
China
Prior art keywords
player
game
attribute
commodity
learning algorithm
Prior art date
Legal status (assumption, not a legal conclusion)
Active
Application number
CN201910406926.3A
Other languages
Chinese (zh)
Other versions
CN110135951A (en)
Inventor
杜鑫
Current Assignee (listed assignee may be inaccurate)
Netease Hangzhou Network Co Ltd
Original Assignee
Netease Hangzhou Network Co Ltd
Priority date (assumption, not a legal conclusion)
Filing date
Publication date
Application filed by Netease Hangzhou Network Co Ltd
Priority to CN201910406926.3A
Publication of CN110135951A
Application granted
Publication of CN110135951B
Legal status: Active
Anticipated expiration

Classifications

    • G06F18/22 Physics > Computing > Electric digital data processing > Pattern recognition > Analysing > Matching criteria, e.g. proximity measures
    • G06F18/23 Physics > Computing > Electric digital data processing > Pattern recognition > Analysing > Clustering techniques
    • G06Q30/0631 Physics > Computing > ICT specially adapted for administrative, commercial, financial, managerial or supervisory purposes > Commerce > Buying, selling or leasing transactions > Electronic shopping [e-shopping] > Item recommendations

Abstract

In the game commodity recommendation method, device, and readable storage medium of this disclosure, the player's current state set is formed by acquiring the attribute feature vectors of the game commodity currently browsed by the player and the player's own feature vector. The current state set is input into a reinforcement learning algorithm model, which calls the attribute prediction matrix set corresponding to the player's feature vector and outputs a prediction feature vector for each attribute. The game commodities matched with the attribute prediction feature vectors are then recommended. Because the reinforcement learning algorithm model jointly considers the historical game commodities browsed by the player and the currently browsed game commodity, the recommended game commodities can meet the player's real needs.

Description

Game commodity recommendation method and device and readable storage medium
Technical Field
Embodiments of this application relate to the technical field of computers, and in particular to a game commodity recommendation method and device and a readable storage medium.
Background
With the development of computer technology, data analysis makes it possible to provide users with more accurate commodity recommendation services. In the field of games in particular, and unlike general commodities, the attributes of game commodities are highly diversified, which makes it difficult to recommend game commodities to players precisely.
In the prior art, game commodities are generally recommended to players based on a clustering algorithm: a distance algorithm measures the distance between the game commodity the player is currently browsing and every other game commodity, and the game commodities most similar to the current one are recommended.
However, although clustering keeps the overall attributes of the recommended game commodities highly similar to those of the currently browsed commodity, game commodity attributes are diversified and a player often cares most about one particular sub-attribute. That is, clustering-based recommendation does not take into account the sub-attribute of the current commodity that the player is actually interested in, so the recommended game commodities may not match the player's real needs.
Disclosure of Invention
To solve the above problems, the present invention provides a game commodity recommendation method, device, and readable storage medium.
In one aspect, the present invention provides a method for recommending game merchandise, including:
acquiring attribute feature vectors of current game commodities browsed by a player and feature vectors of the player to form a current state set of the player;
inputting the current state set of the player into a reinforcement learning algorithm model, so that the reinforcement learning algorithm model calls an attribute prediction matrix set corresponding to the current state set of the player and outputs each attribute prediction feature vector; wherein the attribute prediction matrix set is determined by the reinforcement learning algorithm model from each attribute feature vector of historical game commodities browsed by the player;
and taking the game commodity matched with each attribute prediction feature vector as a recommended game commodity and recommending the recommended game commodity.
In an alternative embodiment, before inputting the current state set of the player into the reinforcement learning algorithm model, the method further includes:
judging whether the player triggers a recommendation request for game commodities;
and if so, executing the step of inputting the current state set of the player into a reinforcement learning algorithm model.
In an optional implementation, when the player has not triggered a recommendation request for game commodities, the method further includes:
acquiring the player's behavior toward the current game commodity, and calling the player's previous state set, where the previous state set includes the attribute feature vectors of the previous game commodity browsed by the player;
inputting the player's previous state set and current state set into the reinforcement learning algorithm model, so that the model uses the behavior toward the current game commodity as a model reward and updates the attribute prediction matrix set corresponding to the player.
In an alternative embodiment, the inputting the previous state set and the current state set of the player into a reinforcement learning algorithm model, so that the reinforcement learning algorithm model uses the behavior of the current game commodity as a model reward, and updating the attribute prediction matrix set corresponding to the player in the reinforcement learning algorithm model includes:
determining a corresponding reward value of the behavior of the current game commodity in a preset reward function;
updating the probability matrix of each attribute in the attribute prediction matrix set corresponding to the player using the update formula

Q_new(s, α) = (1 - lr) · Q(s, α) + lr · [R + γ · max Q(α, α′)]

where Q_new(s, α) is the updated probability value when the feature vector of the previous game commodity is s and the feature vector of the current game commodity is α; Q(s, α) is the current probability value for that pair; max Q(α, α′) is the largest probability value, taken over the candidate next attribute feature vectors α′, in the row of the probability matrix Q indexed by α; lr is a preset algorithm learning rate; R is the reward value; and γ is a preset discount factor.
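As a minimal, hypothetical sketch (states and actions are reduced to integer row/column indices, which the patent does not specify), the update formula above can be written as:

```python
# Hypothetical tabular sketch of the update formula; indices stand in for
# the attribute feature vectors s and α.
def update_q(Q, s, a, reward, lr=0.1, gamma=0.9):
    """Q_new(s, a) = (1 - lr) * Q(s, a) + lr * (R + gamma * max Q(a, a'))."""
    Q[s][a] = (1 - lr) * Q[s][a] + lr * (reward + gamma * max(Q[a]))
    return Q

Q = [[0.0] * 3 for _ in range(3)]
update_q(Q, s=0, a=1, reward=1.0)
# Q[0][1] -> 0.1  (0.9 * 0 + 0.1 * (1.0 + 0.9 * 0))
```

Note that, as in the formula, the bootstrap term maxes over the row indexed by α (the current commodity's feature vector) rather than over a separate next state.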
In an optional implementation manner, the inputting the current state set of the player into a reinforcement learning algorithm model, so that the reinforcement learning algorithm model invokes a set of attribute prediction matrices corresponding to the current state set of the player, and outputs each attribute prediction feature vector includes:
calling the corresponding attribute prediction matrix set according to the player's feature vector in the current state set, where the attribute prediction matrix set includes a probability matrix for each attribute;
and aiming at each attribute feature vector of the current game commodity, performing prediction processing by using a corresponding probability matrix to obtain each attribute prediction feature vector.
In one optional implementation, recommending the game commodity matched with each attribute prediction feature vector as a recommended game commodity includes:
and taking the attribute prediction feature vectors as constraint conditions, and obtaining recommended game commodities in a preset game commodity library by utilizing the constraint conditions so as to recommend the recommended game commodities.
In one optional implementation, taking the attribute prediction feature vectors as constraint conditions and obtaining recommended game commodities from a preset game commodity library using the constraint conditions includes:
taking the attribute prediction feature vectors as constraint conditions and acquiring the weight of each prediction feature vector;
and obtaining recommended game commodities in a preset game commodity library according to each constraint condition and the corresponding weight.
In still another aspect, the present invention provides a game commodity recommendation apparatus, including:
the interactive module is used for acquiring attribute feature vectors of current game commodities browsed by the player and feature vectors of the player to form a current state set of the player;
the processing module is used for inputting the player's current state set into a reinforcement learning algorithm model, so that the model calls the attribute prediction matrix set corresponding to the current state set and outputs each attribute prediction feature vector; wherein the attribute prediction matrix set is determined by the reinforcement learning algorithm model from the attribute feature vectors of historical game commodities browsed by the player;
the interaction module is also used for taking the game commodity matched with each attribute prediction feature vector as a recommended game commodity and recommending the recommended game commodity.
In one optional implementation, before inputting the player's current state set into the reinforcement learning algorithm model, the processing module is further configured to execute that input step when the player triggers a recommendation request for game commodities;
when the player has not triggered a recommendation request, the processing module acquires the player's behavior toward the current game commodity and calls the player's previous state set, where the previous state set includes the attribute feature vectors of the previous game commodity browsed by the player; it then inputs the previous state set and the current state set into the reinforcement learning algorithm model, so that the model uses the behavior toward the current game commodity as a model reward and updates the attribute prediction matrix set corresponding to the player.
In still another aspect, the present invention provides a game commodity recommendation apparatus, including: a memory, a processor, and a computer program;
wherein the computer program is stored in the memory and configured to be executed by the processor to implement the method of any of the preceding claims.
In a final aspect, the invention provides a readable storage medium having stored thereon a computer program that is executed by a processor to perform the method of any of the preceding claims.
In the game commodity recommendation method, device, and readable storage medium of this disclosure, the player's current state set is formed by acquiring the attribute feature vectors of the game commodity currently browsed by the player and the player's own feature vector; the current state set is input into a reinforcement learning algorithm model, which calls the attribute prediction matrix set corresponding to the player's current state set and outputs each attribute prediction feature vector, the matrix set having been determined by the model from the attribute feature vectors of the historical game commodities browsed by the player; and the game commodities matched with the attribute prediction feature vectors are recommended. Because the reinforcement learning algorithm model jointly considers the historical game commodities browsed by the player and the currently browsed game commodity, it recommends game commodities that can meet the player's real needs.
Drawings
Certain embodiments of the disclosure are shown in the accompanying drawings and described in more detail below. The drawings and written description are not intended to limit the scope of the disclosed concepts in any way, but rather to illustrate the concepts of the disclosure to those skilled in the art by reference to specific embodiments.
FIG. 1 is a schematic diagram of a network architecture on which the present invention is based;
FIG. 2 is a schematic flowchart of a game commodity recommendation method according to an embodiment of the present invention;
FIG. 3 is a schematic flowchart of a game commodity recommendation method according to a second embodiment of the present invention;
FIG. 4 is a schematic structural diagram of a game commodity recommendation device according to a third embodiment of the present invention;
FIG. 5 is a hardware schematic diagram of a game commodity recommendation device according to a fourth embodiment of the present invention.
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and together with the description, serve to explain the principles of the disclosure.
Detailed Description
Embodiments of the present application will be described in more detail below with reference to the accompanying drawings. While certain embodiments of the present application are shown in the drawings, it should be understood that the present application may be embodied in various forms and should not be construed as limited to the embodiments set forth herein, but rather are provided for a more thorough and complete understanding of the present application. It should be understood that the drawings and embodiments of the present application are for illustration purposes only and are not intended to limit the scope of the present application.
The terms "first," "second," "third," "fourth," and the like in the description and in the claims of the embodiments of the application and in the drawings described above, if any, are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the application described herein are capable of operation in sequences other than those illustrated or otherwise described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
With the development of computer technology, data analysis makes it possible to provide users with more accurate commodity recommendation services. In the field of games in particular, and unlike general commodities, the attributes of game commodities are highly diversified, which makes it difficult to recommend game commodities to players precisely.
In the prior art, game commodities are generally recommended to players based on a clustering algorithm: a distance algorithm measures the distance between the game commodity the player is currently browsing and every other game commodity, and the game commodities most similar to the current one are recommended.
However, although clustering keeps the overall attributes of the recommended game commodities highly similar to those of the currently browsed commodity, game commodity attributes are diversified and a player often cares most about one particular sub-attribute. That is, clustering-based recommendation does not take into account the sub-attribute of the current commodity that the player is actually interested in, so the recommended game commodities may not match the player's real needs.
Of course, in other prior art, a model may be built from the player's historical browsing records to directly predict the attributes of game commodities that may interest the player and thereby determine the recommended commodities. However, because game commodity attributes are diversified and the recommendation system interacts with the player's browsing behavior, such a historical-browsing model cannot be updated well over time, and the recommendation effect is poor.
To solve the above problems, the present invention provides a game commodity recommendation method, device, and readable storage medium. FIG. 1 is a schematic diagram of the network architecture on which the invention is based; as shown in FIG. 1, the architecture includes at least a game commodity recommending device 1 and a terminal 2.
The game commodity recommending device 1 may be a server or a server cluster deployed in the cloud, used to store data and to process it according to preset processing logic.
The terminal 2 may specifically be a hardware device on which a player can play games, such as a smartphone, tablet computer, desktop computer, or smart game console. A game client may be installed on the terminal 2, or a game interface may be provided on it, so that the player can trigger game operations on the client or interface, including but not limited to controlling a game character, browsing game commodities, and purchasing game commodities.
The game commodity recommending apparatus 1 and the terminal 2 can be connected by wireless communication or wired communication to perform data interaction.
Fig. 2 is a schematic flow chart of a method for recommending game commodities, according to an embodiment of the present invention, as shown in fig. 2, the method for recommending game commodities includes:
Step 101: acquiring the attribute feature vectors of the game commodity currently browsed by the player and the player's own feature vector to form the player's current state set;
Step 102: inputting the player's current state set into a reinforcement learning algorithm model, so that the model calls the attribute prediction matrix set corresponding to the current state set and outputs each attribute prediction feature vector;
wherein the attribute prediction matrix set is determined by the reinforcement learning algorithm model from the attribute feature vectors of historical game commodities browsed by the player;
Step 103: recommending the game commodities matched with each attribute prediction feature vector as recommended game commodities.
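The three steps above can be sketched end to end as follows; the stub model, field names, and library layout are invented for illustration and are not specified by the patent:

```python
# Minimal end-to-end sketch of steps 101-103 (all names are illustrative).
class StubModel:
    """Stands in for the reinforcement learning algorithm model: it returns
    a stored attribute prediction for the player's type."""
    def __init__(self, predictions):
        self.predictions = predictions

    def predict(self, state):
        return self.predictions[tuple(state["player"])]

def recommend_commodities(player_vec, commodity_attrs, model, library):
    state = {"player": player_vec, "commodity": commodity_attrs}   # step 101
    predicted = model.predict(state)                               # step 102
    return [c for c in library                                     # step 103
            if all(c["attrs"].get(k) == v for k, v in predicted.items())]

model = StubModel({(1, 0): {"damage": (20, 25)}})
library = [{"id": 1, "attrs": {"damage": (20, 25)}},
           {"id": 2, "attrs": {"damage": (26, 50)}}]
recs = recommend_commodities([1, 0], {"damage": (26, 50)}, model, library)
# recs -> [{"id": 1, "attrs": {"damage": (20, 25)}}]
```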
The method of this embodiment is executed by the game commodity recommending apparatus shown in FIG. 1. The reinforcement learning algorithm model described in this embodiment may take various forms; in particular, an improved Q-learning algorithm model may be used.
Specifically, this embodiment provides the player with a more accurate game commodity recommendation service. First, the game commodity recommending device acquires the attribute feature vectors of the game commodity currently browsed by the player and the player's own feature vector, forming the player's current state set.
Data about the player and the commodity currently browsed are collected by the terminal; that is, the terminal records game data such as the player's ID, identity, and in-game behavior, and sends it to the game commodity recommending device for analysis.
For the obtained game data, the game commodity recommending device analyzes the player's profile with a user portrait system to obtain the player's own feature vector. Through portrait analysis the player is labeled, so that each player can be described by a number of tags. These tags may reflect the player's basic characteristics, such as sex, age group, and in-game consumption level; the player's interests or personality, such as favorite game types and favorite commodity types; or behavior characteristics related to gameplay, such as the player's operation level, game attitude, and operation style. That is, portrait analysis of the player's game data yields portrait tags in different feature dimensions, from which the player's own feature vector is obtained. The portrait analysis itself can be realized with a conventional portrait analysis algorithm, which the invention does not limit.
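A minimal sketch of turning portrait tags into a player feature vector might look like this; the tag vocabulary and the multi-hot encoding are illustrative assumptions, not the patent's method:

```python
# Illustrative tag vocabulary; the patent does not fix one.
TAG_VOCAB = ["male", "female", "age_18_25", "age_26_35",
             "high_spender", "likes_weapons", "likes_fashion"]

def player_feature_vector(tags):
    """Multi-hot encoding of a player's portrait tags over a fixed vocabulary."""
    return [1 if t in tags else 0 for t in TAG_VOCAB]

vec = player_feature_vector({"male", "age_18_25", "likes_weapons"})
# vec -> [1, 0, 1, 0, 0, 1, 0]
```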
In addition, the game commodity recommending device analyzes the game data according to the player's browsing behavior to obtain each attribute feature vector of the currently browsed game commodity.
Here, game commodities generally refer to in-game items such as character weapons, armor, potions, and fashion outfits. Different game commodities generally have different attributes. For a weapon, the attributes are typically damage value, attack range, special skills, element attributes, cooling duration, and pricing information; for fashion outfits, they are typically applicable location, applicable duration, price information, applicable character position, and applicable character occupation; for a potion, they may be a damage or recovery value, duration, element attribute, drug-resistance attribute, and so on.
That is, because game commodities are diverse, the attributes of each commodity differ. In the present invention, the game commodity recommending device acquires data about the commodity the player is currently browsing and derives its attribute feature vectors. For a weapon, for example, the damage value, attack range, special skill, element attribute, cooling duration, and price information are distinct attributes, and the attribute information of each can be described by a feature vector; for instance, a damage value of 20 to 25 can be represented as [20, 25]. Note that this is only one implementation provided by the invention; those skilled in the art may choose the specific attributes and their vector representations according to the game commodity and game type, which the invention does not limit.
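For illustration only (every attribute name and value beyond the damage example is an assumption), the per-attribute feature vectors and the resulting state set might be represented as:

```python
# Hypothetical per-attribute feature vectors for a weapon commodity.
weapon = {
    "damage": [20, 25],      # damage value range 20-25, as in the text
    "attack_range": [5],     # illustrative
    "cooldown": [1.5],       # illustrative, seconds
    "price": [300],          # illustrative
}
# The current state set pairs these with the player's own feature vector.
state = {"player": [1, 0, 1, 0], "commodity": weapon}
```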
When the game commodity recommending device has obtained the player's feature vector and each attribute feature vector of the currently browsed game commodity, it constructs the player's current state set from these vectors so that the reinforcement learning algorithm model can process it.
And then, inputting the current state set of the player into a reinforcement learning algorithm model so that the reinforcement learning algorithm model calls an attribute prediction matrix set corresponding to the current state set of the player and outputs each attribute prediction characteristic vector.
In the recommendation method adopted by the invention, an improved Q-learning algorithm model is used to process the current state set and output the prediction feature vector of each attribute of game commodities the player may be interested in.
Specifically, in the improved Q-learning algorithm model, different attribute prediction matrix sets can be preset for different types of players. Player types are distinguished by the players' own feature vectors: for any two players, if their own feature vectors are the same, their types are the same, and the same attribute prediction matrix set is used for both. Meanwhile, the attribute prediction matrix set is determined by the reinforcement learning algorithm model from the attribute feature vectors of the historical game commodities browsed by the player, and reflects the player's preference for each attribute while browsing game commodities over a period of time.
Therefore, according to the player's own feature vector, the game commodity recommending device calls the attribute prediction matrix set matched with or associated with that feature vector from a number of pre-stored attribute prediction matrix sets, and uses it to process each attribute feature vector in the player's current state set. That is: the corresponding attribute prediction matrix set is called according to the player's feature vector, where the set contains a probability matrix for each attribute; then, for each attribute feature vector of the current game commodity, prediction is performed with the corresponding probability matrix to obtain each attribute prediction feature vector.
It should be noted that any attribute prediction matrix set may consist of the probability matrices of all attributes, each probability matrix corresponding to one attribute of a game commodity; for example, the damage-value probability matrix is used to compute the attribute prediction feature vector of the damage-value attribute.
In addition, the probability matrix in this embodiment may be a two-dimensional matrix: the coordinate in one direction indexes the attribute feature vector of the current game commodity, the coordinate in the other direction indexes the predicted or recommended attribute prediction feature vector, and the element at each coordinate pair is a probability value.
Table 1 shows a damage-value probability matrix. As shown in Table 1 below, when the damage value of the current weapon is [20,25], the probability that the player next triggers effective browsing behavior on a commodity with damage value [20,25] is 0.6, with damage value [1,24] it is 0.1, and with damage value [26,50] it is 0.3.
TABLE 1
Current damage value | Next: [20,25] | Next: [1,24] | Next: [26,50]
[20,25]              | 0.6           | 0.1          | 0.3
[1,24]               | 0.2           | 0.8          | —
By using each probability matrix, the attribute prediction feature vector with the highest probability value can be output for each attribute. At this point, the game commodity recommending device has obtained the attribute prediction feature vector corresponding to each attribute of the current game commodity.
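The lookup-and-argmax step described above can be sketched as follows. This is an illustrative Python sketch: the dict-of-dicts layout, bucket labels, and function name are assumptions rather than anything specified in the patent, while the probabilities in the [20,25] row come from Table 1 and the [1,24] row from the worked example later in the text.

```python
# Illustrative damage value probability matrix stored as a dict of rows.
# Each row maps the current attribute bucket to the probability of each
# candidate next bucket, as in Table 1.
damage_matrix = {
    "[20,25]": {"[20,25]": 0.6, "[1,24]": 0.1, "[26,50]": 0.3},
    "[1,24]":  {"[20,25]": 0.2, "[1,24]": 0.8},
}

def predict_attribute(matrix, current_bucket):
    """Return the next-attribute bucket with the highest probability,
    together with that probability value (the argmax of the row)."""
    row = matrix[current_bucket]
    best = max(row, key=row.get)
    return best, row[best]

bucket, prob = predict_attribute(damage_matrix, "[20,25]")
# For the Table 1 row above this selects "[20,25]" with probability 0.6.
```

The same lookup would be repeated once per attribute (damage value, element, price, ...), yielding one attribute prediction feature vector per probability matrix in the set.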
Then, the game commodity recommending device takes the attribute prediction feature vectors as constraint conditions, and obtains recommended game commodities from a preset game commodity library by using the constraint conditions so as to recommend the recommended game commodities. Specifically, in the game commodity library, each game commodity can be classified, screened and searched according to the attribute. Therefore, the obtained attribute prediction feature vectors can be used as constraint conditions, namely query conditions, so that the game commodity recommending device can query and filter in the game commodity library to obtain recommended game commodities meeting the constraint conditions.
In a preferred embodiment, the game commodity library may not contain a game commodity that satisfies all of the constraint conditions. In this case, the game commodity recommending device may take the attribute prediction feature vectors as constraint conditions, obtain a weight for each attribute prediction feature vector, and obtain a recommended game commodity from the preset game commodity library according to each constraint condition and its corresponding weight. Specifically, if the probability value corresponding to the attribute prediction feature vector of the damage value is 0.2, the probability value corresponding to the attribute prediction feature vector of the element attribute is 0.9, and the probability value corresponding to the attribute prediction feature vector of the price information is 0.8, the weight of the damage value attribute prediction feature vector, which has the lower probability value, may be reduced or set to zero, so that the recommended game commodity satisfies the attribute prediction feature vectors corresponding to the attributes with higher probability values, that is, higher weights.
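The weighted-constraint fallback can be sketched as below. The library items, attribute names, and scoring rule (summing the weights of satisfied constraints and keeping the best-scoring item) are illustrative assumptions; only the weights 0.2 / 0.9 / 0.8 mirror the probability values in the example above.

```python
# Hypothetical commodity library; the attribute values are invented for
# illustration only.
library = [
    {"name": "sword_a", "damage": "[20,25]", "element": "fire", "price": "low"},
    {"name": "sword_b", "damage": "[26,50]", "element": "fire", "price": "low"},
]

# Each constraint pairs the predicted attribute value with its weight,
# here taken directly from the prediction probabilities in the text.
constraints = {
    "damage":  ("[20,25]", 0.2),
    "element": ("fire",    0.9),
    "price":   ("low",     0.8),
}

def recommend(library, constraints):
    """Return the item whose satisfied constraints carry the most weight,
    so an item missing only a low-weight constraint can still win."""
    def score(item):
        return sum(w for attr, (want, w) in constraints.items()
                   if item[attr] == want)
    return max(library, key=score)
```

Here "sword_a" satisfies all three constraints (score 1.9) while "sword_b" misses only the low-weight damage constraint (score 1.7), so the low-probability damage prediction does not veto an otherwise good match.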
In the method for recommending game commodities provided by this embodiment, each attribute feature vector of the current game commodity browsed by the player and the feature vector of the player are acquired to form the current state set of the player; the current state set of the player is input into a reinforcement learning algorithm model, so that the reinforcement learning algorithm model calls the attribute prediction matrix set corresponding to the feature vector of the player and outputs each attribute prediction feature vector, the attribute prediction matrix set being determined by the reinforcement learning algorithm model according to each attribute feature vector of the historical game commodities browsed by the player; and the game commodity matched with each attribute prediction feature vector is taken as a recommended game commodity and recommended. In this way, when recommending a game commodity for the player, the reinforcement learning algorithm model comprehensively considers both the historical game commodities browsed by the player and the currently browsed game commodity, so that game commodities meeting the player's real needs can be recommended.
Fig. 3 is a schematic flow chart of a method for recommending game commodities, according to a second embodiment of the present invention, as shown in fig. 3, the method for recommending game commodities includes:
step 201, acquiring attribute feature vectors of current game commodities browsed by a player and feature vectors of the player to form a current state set of the player;
step 202, judging whether the player triggers a recommendation request for game commodities;
if yes, go to step 203, otherwise go to step 205.
Step 203, inputting the current state set of the player into a reinforcement learning algorithm model, so that the reinforcement learning algorithm model calls the attribute prediction matrix set corresponding to the feature vector of the player and outputs each attribute prediction feature vector;
the attribute prediction matrix set is determined by the reinforcement learning algorithm model according to each attribute feature vector of historical game commodities browsed by a player;
step 204, taking the game commodity matched with each attribute prediction feature vector as a recommended game commodity and recommending the recommended game commodity;
step 205, acquiring the behavior of a player on the current game commodity, and calling the last state set of the player; wherein, the previous state set comprises the attribute feature vectors of the previous game commodity browsed by the player;
and step 206, inputting the last state set and the current state set of the player into a reinforcement learning algorithm model, so that the reinforcement learning algorithm model takes the behavior of the current game commodity as a model reward, and updates an attribute prediction matrix set corresponding to the characteristic vector of the player in the reinforcement learning algorithm model.
The execution subject of the method for recommending a game commodity according to the present embodiment is the game commodity recommending apparatus 1 shown in fig. 1.
First, the game commodity recommending apparatus 1 acquires each attribute feature vector of the current game commodity browsed by the player and the feature vector of the player, and forms the current state set of the player. The specific implementation is similar to that of the foregoing embodiment and is not repeated here.
Unlike the previous embodiment, in the second embodiment, it is further determined whether the player triggers a recommendation request for game merchandise.
That is, if and only if the player sends a recommendation request for game commodities to the game commodity recommending device through a terminal does the device input the current state set of the player into the reinforcement learning algorithm model, so that the reinforcement learning algorithm model calls the attribute prediction matrix set corresponding to the feature vector of the player and outputs each attribute prediction feature vector; the game commodity matched with each attribute prediction feature vector is then taken as a recommended game commodity and recommended. The recommendation process in the second embodiment is similar to that of the foregoing embodiment and is not repeated here.
Further, unlike the foregoing embodiment, when the player does not trigger a recommendation request for a game commodity, the game commodity recommendation device acquires the behavior of the player for the current game commodity and invokes the last state set of the player; wherein, the previous state set comprises the attribute feature vectors of the previous game commodity browsed by the player. Specifically, similar to the current state set, the attribute feature vectors of the last game item browsed by the player are included in the last state set of the player.
And the game commodity recommending device inputs the last state set and the current state set of the player into a reinforcement learning algorithm model, so that the reinforcement learning algorithm model takes the behavior of the current game commodity as model reward, and updates an attribute prediction matrix set corresponding to the characteristic vector of the player in the reinforcement learning algorithm model.
Specifically, the Q-learning algorithm model is a reinforcement learning algorithm based on a reward mechanism. The player is regarded as the environment in which the model acts: if the player clicks on or purchases a game commodity recommended by the recommending device, the algorithm model of the recommending device receives a reward. The goal of the recommending device is to optimize the recommendation strategy of the Q-learning algorithm model so as to obtain the maximum cumulative reward.
Further, a reward function may be defined for each attribute of the game commodity, for example:
f_reward(S) = 100, if the recommended game commodity is ordered by the player;
f_reward(S) = 1, if the recommended game commodity is only viewed for hover details.
In the reward function, f_reward(S) is the reward value of attribute S; for example, the reward value is 100 when the recommended game commodity is ordered by the player, and 1 when the recommended game commodity is only viewed for hover details.
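A minimal sketch of such a per-attribute reward function, assuming string labels for the behaviors and a default reward of 0 for behaviors the text does not enumerate:

```python
def f_reward(behavior):
    """Reward value for a player behavior on a recommended game commodity.
    The values 100 (order placed) and 1 (hover details viewed) follow the
    text; the 0 default for other behaviors is an assumption."""
    rewards = {
        "place_order": 100,        # player ordered the recommended commodity
        "view_hover_details": 1,   # player only viewed the hover details
    }
    return rewards.get(behavior, 0)
```

A separate function of this shape could be defined per attribute if different attributes should be rewarded differently.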
Therefore, the obtained previous state set, current state set and reward function of the player can be used for updating the attribute prediction matrix set corresponding to the player in the reinforcement learning algorithm model.
In an optional implementation manner, inputting the previous state set and the current state set of the player into the reinforcement learning algorithm model, so that the reinforcement learning algorithm model takes the behavior on the current game commodity as a model reward and updates the attribute prediction matrix set corresponding to the feature vector of the player in the reinforcement learning algorithm model, may be carried out as follows:
First, the reward value corresponding to the behavior on the current game commodity is determined from the preset reward function described above, which is not repeated here. The behavior on the current game commodity may specifically include viewing hover details, viewing recommended item details, placing an order, and other cases.
Updating the probability matrix of each attribute in the attribute prediction matrix set in the reinforcement learning algorithm model by using an updating formula, wherein the updating formula is as follows:
Q_new(s, α) = (1 - lr)·Q(s, α) + lr·[R + γ·maxQ(α, α')];
wherein Q_new(s, α) represents the updated probability value when the attribute feature vector of the previous game commodity is s and the attribute feature vector of the current game commodity is α, Q(s, α) represents the probability value before updating for the same pair, maxQ(α, α') represents the maximum probability value in the probability matrix Q among the probability values of the candidate attribute prediction feature vectors α' of the next game commodity, lr is a preset algorithm learning rate, R is the reward value, and γ is a preset discount factor.
Taking the probability matrix shown in table 1 as an example, if the attribute feature vector in the current state set is [20,25], and the attribute feature vector in the previous state set is [1,24], then:
Q_new([1,24], [20,25]) = (1 - lr)·Q([1,24], [20,25]) + lr·[R + γ·Q([1,24], [1,24])];
namely, Q_new([1,24], [20,25]) = (1 - lr)·0.2 + lr·[R + γ·0.8];
That is, in the updated Table 1, the probability value at the position where the attribute feature vector is [1,24] and the attribute prediction feature vector is [20,25] becomes (1 - lr)·0.2 + lr·(R + γ·0.8).
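The update can be checked numerically with a short sketch. The values Q([1,24],[20,25]) = 0.2 and the bracketed 0.8 come from the worked example above; lr = 0.1, R = 1 and γ = 0.9 are illustrative choices, since the text leaves these parameters free.

```python
def q_update(q_sa, max_q_next, reward, lr, gamma):
    """Q_new(s, a) = (1 - lr) * Q(s, a) + lr * (reward + gamma * max_q_next),
    the standard Q-learning update used in the text."""
    return (1 - lr) * q_sa + lr * (reward + gamma * max_q_next)

q_new = q_update(q_sa=0.2, max_q_next=0.8, reward=1, lr=0.1, gamma=0.9)
# (1 - 0.1) * 0.2 + 0.1 * (1 + 0.9 * 0.8) = 0.18 + 0.172 = 0.352
```

With lr = 0, the probability value is left unchanged; with lr = 1, it is replaced entirely by the reward-plus-discounted-estimate term, matching the usual role of the learning rate.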
In this way, the probability values in the probability matrix can be updated rapidly, so that the recommending device can recommend more accurate game commodities to the player.
In the method for recommending game commodities provided by this embodiment, each attribute feature vector of the current game commodity browsed by the player and the feature vector of the player are acquired to form the current state set of the player; the current state set of the player is input into a reinforcement learning algorithm model, so that the reinforcement learning algorithm model calls the attribute prediction matrix set corresponding to the feature vector of the player and outputs each attribute prediction feature vector, the attribute prediction matrix set being determined by the reinforcement learning algorithm model according to each attribute feature vector of the historical game commodities browsed by the player; and the game commodity matched with each attribute prediction feature vector is taken as a recommended game commodity and recommended. In this way, when recommending a game commodity for the player, the reinforcement learning algorithm model comprehensively considers both the historical game commodities browsed by the player and the currently browsed game commodity, so that game commodities meeting the player's real needs can be recommended.
Fig. 4 is a schematic structural diagram of a game commodity recommendation device according to a third embodiment of the present invention, and as shown in fig. 4, the game commodity recommendation device includes:
the interactive module 10 is configured to obtain attribute feature vectors of current game commodities browsed by a player and feature vectors of the player, and form a current state set of the player;
the processing module 20 is configured to input the current state set of the player into a reinforcement learning algorithm model, so that the reinforcement learning algorithm model calls the attribute prediction matrix set corresponding to the feature vector of the player and outputs each attribute prediction feature vector;
the interaction module 10 is further configured to take the game commodity matched with each attribute prediction feature vector as a recommended game commodity and recommend the recommended game commodity.
In an optional implementation manner, the apparatus for recommending game commodities further includes a determining module, configured to determine whether the player triggers a request for recommending game commodities before inputting the current state set of the player into the reinforcement learning algorithm model;
if so, processing module 20 performs the step of inputting the set of current states of the player into a reinforcement learning algorithm model.
In an alternative embodiment, when the player does not trigger a recommendation request for game merchandise, the processing module 20 is further configured to: acquiring the behavior of a player on current game commodities, and calling a last state set of the player; wherein, the previous state set comprises the attribute feature vectors of the previous game commodity browsed by the player; inputting the last state set and the current state set of the player into a reinforcement learning algorithm model so that the reinforcement learning algorithm model takes the behavior of the current game commodity as a model reward, and updating an attribute prediction matrix set corresponding to the characteristic vector of the player in the reinforcement learning algorithm model.
In an optional implementation manner, the processing module 20 is specifically configured to: determine the reward value corresponding to the behavior on the current game commodity in a preset reward function; and update the probability matrix of each attribute in the attribute prediction matrix set corresponding to the player by using an update formula, the update formula being Q_new(s, α) = (1 - lr)·Q(s, α) + lr·[R + γ·maxQ(α, α')]; wherein Q_new(s, α) represents the updated probability value when the attribute feature vector of the previous game commodity is s and the attribute feature vector of the current game commodity is α, Q(s, α) represents the probability value before updating for the same pair, maxQ(α, α') represents the maximum probability value in the probability matrix Q among the probability values of the candidate attribute prediction feature vectors α' of the next game commodity, lr is a preset algorithm learning rate, R is the reward value, and γ is a preset discount factor.
In an optional implementation manner, the processing module 20 is specifically configured to: calling a corresponding attribute prediction matrix set according to the characteristic vector of the player; wherein, the attribute prediction matrix set comprises a probability matrix of each attribute; and aiming at each attribute feature vector of the current game commodity, performing prediction processing by using a corresponding probability matrix to obtain each attribute prediction feature vector.
In an optional implementation manner, the processing module 20 is further configured to use the attribute prediction feature vectors as constraints, and obtain recommended game commodities in a preset game commodity library by using the constraints, so that the recommended game commodities are recommended by the interaction module 10.
In an optional implementation manner, the processing module 20 is specifically configured to use the attribute prediction feature vectors as constraint conditions, and obtain a weight of each prediction feature vector; and obtaining recommended game commodities in a preset game commodity library according to each constraint condition and the corresponding weight.
The invention provides a game commodity recommending device, which is characterized in that a current state set of a player is formed by acquiring attribute feature vectors of current game commodities browsed by the player and feature vectors of the player; inputting the current state set of the player into a reinforcement learning algorithm model so as to enable the reinforcement learning algorithm model to call an attribute prediction matrix set corresponding to the characteristic vector of the player and output each attribute prediction characteristic vector; and recommending the game commodity matched with each attribute prediction feature vector as a recommended game commodity, so that the characteristics of the player and the attributes of the current game commodity browsed by the player are fully considered when the game commodity is recommended for the player, and the game commodity meeting the current requirement can be accurately recommended for the player.
Fig. 5 is a hardware schematic diagram of a game commodity recommendation device according to a fourth embodiment of the present invention. As shown in fig. 5, the game commodity recommendation device includes: a memory 41, a processor 42, and a computer program stored on the memory 41 and executable on the processor 42; the processor 42 performs the method of the above embodiments when executing the computer program.
The present invention also provides a readable storage medium comprising a program which, when run on a terminal, causes the terminal to perform the method of any of the above embodiments.
The functions described herein above may be performed, at least in part, by one or more hardware logic components. For example, without limitation, exemplary types of hardware logic components that may be used include: a Field Programmable Gate Array (FPGA), an Application Specific Integrated Circuit (ASIC), an Application Specific Standard Product (ASSP), a system on a chip (SOC), a complex programmable logic device (CPLD), and the like.
Program code for implementing the methods of the present disclosure may be written in any combination of one or more programming languages. These program codes may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus, such that the program codes, when executed by the processor or controller, cause the functions/operations specified in the flowchart and/or block diagram to be performed. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package partly on the machine and partly on a remote machine or entirely on the remote machine or server.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
Further, while operations are depicted in a particular order, this should not be understood as requiring that such operations be performed in the particular order shown or in sequential order, or that all illustrated operations be performed, to achieve desirable results. Under certain circumstances, multitasking and parallel processing may be advantageous. Likewise, while several specific implementation details are included in the above discussion, these should not be construed as limitations on the scope of the disclosure. Certain features that are described in the context of separate embodiments can also be implemented in combination in a single implementation. Conversely, various features that are described in the context of a single implementation can also be implemented in multiple implementations separately or in any suitable subcombination.
Although the subject matter has been described in language specific to structural features and/or methodological acts, it is to be understood that the subject matter defined in the appended claims is not necessarily limited to the specific features or acts described above. Rather, the specific features and acts described above are disclosed as example forms of implementing the claims.

Claims (10)

1. A method for recommending game merchandise, comprising:
acquiring attribute feature vectors of current game commodities browsed by a player and feature vectors of the player to form a current state set of the player;
inputting the current state set of the player into a reinforcement learning algorithm model so that the reinforcement learning algorithm model calls an attribute prediction matrix set corresponding to the current state set of the player and outputs attribute prediction characteristic vectors; the attribute prediction matrix set is determined by the reinforcement learning algorithm model according to each attribute feature vector of historical game commodities browsed by a player;
taking the game commodity matched with each attribute prediction feature vector as a recommended game commodity and recommending;
inputting the current state set of the player into a reinforcement learning algorithm model, so that the reinforcement learning algorithm model calls an attribute prediction matrix set corresponding to the current state set of the player, and outputs each attribute prediction feature vector, wherein the method comprises the following steps:
calling a corresponding attribute prediction matrix set according to the characteristic vector of the player in the current state set; wherein, the attribute prediction matrix set comprises a probability matrix of each attribute;
and aiming at each attribute feature vector of the current game commodity, performing prediction processing by using a corresponding probability matrix to obtain each attribute prediction feature vector.
2. The method for recommending game commodities according to claim 1, wherein before said inputting the current state set of the player into a reinforcement learning algorithm model, the method further comprises:
judging whether the player triggers a recommendation request for game commodities;
and if so, executing the step of inputting the current state set of the player into a reinforcement learning algorithm model.
3. The method of recommending game merchandise according to claim 2, wherein when said player does not trigger a request for recommending game merchandise, said method of recommending game merchandise further comprises:
acquiring the behavior of a player on current game commodities, and calling a last state set of the player; wherein, the previous state set comprises the attribute feature vectors of the previous game commodity browsed by the player;
inputting the last state set and the current state set of the player into a reinforcement learning algorithm model so that the reinforcement learning algorithm model takes the behavior of the current game commodity as a model reward, and updating an attribute prediction matrix set corresponding to the player in the reinforcement learning algorithm model.
4. A method for recommending game items according to claim 3, wherein said inputting a previous state set and a current state set of said player into a reinforcement learning algorithm model, so that said reinforcement learning algorithm model uses said behavior on said current game items as a model reward, and updating a set of attribute prediction matrices corresponding to said player in said reinforcement learning algorithm model, comprises:
determining a corresponding reward value of the behavior of the current game commodity in a preset reward function;
updating the probability matrix of each attribute in the attribute prediction matrix set corresponding to the player by using an updating formula, wherein the updating formula is Q_new(s, α) = (1 - lr)·Q(s, α) + lr·[R + γ·maxQ(α, α')];
wherein Q_new(s, α) represents the updated probability value when the attribute feature vector of the previous game commodity is s and the attribute feature vector of the current game commodity is α, Q(s, α) represents the probability value before updating for the same pair, maxQ(α, α') represents the maximum probability value in the probability matrix Q among the probability values of the candidate attribute prediction feature vectors α' of the next game commodity, lr is a preset algorithm learning rate, R is the reward value, and γ is a preset discount factor.
5. The method of recommending a game commodity according to claim 1, wherein said recommending a game commodity that matches each of the attribute prediction feature vectors as a recommended game commodity comprises:
and taking the attribute prediction feature vectors as constraint conditions, and obtaining recommended game commodities in a preset game commodity library by utilizing the constraint conditions so as to recommend the recommended game commodities.
6. The method of claim 5, wherein the step of obtaining a recommended game commodity from a preset game commodity library by using the attribute prediction feature vectors as constraints comprises:
taking the attribute prediction characteristic vectors as constraint conditions, and acquiring the weight of each prediction characteristic vector;
and obtaining recommended game commodities in a preset game commodity library according to each constraint condition and the corresponding weight.
7. A game item recommendation device, comprising:
the interactive module is used for acquiring attribute feature vectors of current game commodities browsed by the player and feature vectors of the player to form a current state set of the player;
the processing module is used for inputting the current state set of the player into a reinforcement learning algorithm model so that the reinforcement learning algorithm model calls an attribute prediction matrix set corresponding to the current state set of the player and outputs attribute prediction characteristic vectors; the attribute prediction matrix set is determined by the reinforcement learning algorithm model according to each attribute feature vector of historical game commodities browsed by a player;
the interaction module is also used for taking the game commodity matched with each attribute prediction feature vector as a recommended game commodity and recommending the recommended game commodity;
the processing module is specifically used for calling a corresponding attribute prediction matrix set according to the characteristic vector of the player in the current state set; wherein, the attribute prediction matrix set comprises a probability matrix of each attribute;
and aiming at each attribute feature vector of the current game commodity, performing prediction processing by using a corresponding probability matrix to obtain each attribute prediction feature vector.
8. The game commodity recommendation device according to claim 7, wherein before the current state set of the player is input into the reinforcement learning algorithm model, the processing module is further configured to judge whether the player triggers a recommendation request for game commodities, and to perform the step of inputting the current state set of the player into the reinforcement learning algorithm model when the player triggers the recommendation request;
the processing module acquires the behavior of a player on the current game commodity when the player does not trigger a recommendation request for the game commodity, and calls a last state set of the player; wherein, the previous state set comprises the attribute feature vectors of the previous game commodity browsed by the player; inputting the last state set and the current state set of the player into a reinforcement learning algorithm model so that the reinforcement learning algorithm model takes the behavior of the current game commodity as a model reward, and updating an attribute prediction matrix set corresponding to the player in the reinforcement learning algorithm model.
9. A game item recommendation device, comprising: a memory, a processor, and a computer program;
wherein the computer program is stored in the memory and configured to be executed by the processor to implement the method of any one of claims 1-6.
10. A readable storage medium, having stored thereon a computer program which, when executed by a processor, implements the method according to any one of claims 1-6.
CN201910406926.3A 2019-05-15 2019-05-15 Game commodity recommendation method and device and readable storage medium Active CN110135951B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910406926.3A CN110135951B (en) 2019-05-15 2019-05-15 Game commodity recommendation method and device and readable storage medium

Publications (2)

Publication Number Publication Date
CN110135951A CN110135951A (en) 2019-08-16
CN110135951B true CN110135951B (en) 2021-07-27

Family

ID=67574516

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910406926.3A Active CN110135951B (en) 2019-05-15 2019-05-15 Game commodity recommendation method and device and readable storage medium

Country Status (1)

Country Link
CN (1) CN110135951B (en)

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110570287B (en) * 2019-09-27 2022-02-08 网易(杭州)网络有限公司 Virtual commodity recommendation method, device, system and server
CN110782288A (en) * 2019-10-25 2020-02-11 广州凌鑫达实业有限公司 Cloud computing aggregate advertisement data processing method, device, equipment and medium
CN112712161B (en) * 2019-10-25 2023-02-24 上海哔哩哔哩科技有限公司 Data generation method and system
CN113440859B (en) * 2021-07-04 2023-05-16 王禹豪 Game item value generation and detection method, device and storage medium
CN113509727B (en) * 2021-07-09 2024-06-04 网易(杭州)网络有限公司 Method and device for displaying props in game, electronic equipment and medium
CN113763038A (en) * 2021-08-23 2021-12-07 广州快批信息科技有限公司 Method, device and system for promotion management of service
CN114612126A (en) * 2021-12-08 2022-06-10 江苏众亿国链大数据科技有限公司 Information pushing method based on big data

Family Cites Families (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102201026A (en) * 2010-03-23 2011-09-28 上海美你德软件有限公司 Method and system for recommending information to players in virtual environment
CN102902691B (en) * 2011-07-28 2016-09-07 上海拉手信息技术有限公司 Recommend method and system
CN105469263A (en) * 2014-09-24 2016-04-06 阿里巴巴集团控股有限公司 Commodity recommendation method and device
CN108230057A (en) * 2016-12-09 2018-06-29 阿里巴巴集团控股有限公司 A kind of intelligent recommendation method and system
CN107145506B (en) * 2017-03-22 2020-11-06 无锡中科富农物联科技有限公司 Improved content-based agricultural commodity recommendation method
CN107515909B (en) * 2017-08-11 2020-05-19 深圳市云网拜特科技有限公司 Video recommendation method and system
CN108304440B (en) * 2017-11-01 2021-08-31 腾讯科技(深圳)有限公司 Game pushing method and device, computer equipment and storage medium
CN109471963A (en) * 2018-09-13 2019-03-15 广州丰石科技有限公司 A kind of proposed algorithm based on deeply study
CN109543840B (en) * 2018-11-09 2023-01-10 北京理工大学 Dynamic recommendation system design method based on multidimensional classification reinforcement learning

Also Published As

Publication number Publication date
CN110135951A (en) 2019-08-16

Similar Documents

Publication Publication Date Title
CN110135951B (en) Game commodity recommendation method and device and readable storage medium
US9082086B2 (en) Adaptively learning a similarity model
CN110008973B (en) Model training method, method and device for determining target user based on model
CN110008397B (en) Recommendation model training method and device
US20120296776A1 (en) Adaptive interactive search
US10672055B2 (en) Method and system for presenting personalized products based on digital signage for electronic commerce
CN111260449B (en) Model training method, commodity recommendation device and storage medium
JP7130991B2 (en) ADVERTISING DISPLAY SYSTEM, DISPLAY DEVICE, ADVERTISING OUTPUT DEVICE, PROGRAM AND ADVERTISING DISPLAY METHOD
CN110598120A (en) Behavior data based financing recommendation method, device and equipment
US20150235264A1 (en) Automatic entity detection and presentation of related content
CN111651669A (en) Information recommendation method and device, electronic equipment and computer-readable storage medium
CN113077317A (en) Item recommendation method, device and equipment based on user data and storage medium
CN109961351A (en) Information recommendation method, device, storage medium and computer equipment
CN111861605A (en) Business object recommendation method
CN115907868A (en) Advertisement delivery analysis method and device
CN114820123A (en) Group purchase commodity recommendation method, device, equipment and storage medium
CN111651679A (en) Recommendation method and device based on reinforcement learning
CN111915414A (en) Method and device for displaying target object sequence to target user
CN109299378B (en) Search result display method and device, terminal and storage medium
US10740815B2 (en) Searching device, searching method, recording medium, and program
CN110851708A (en) Negative sample extraction method and device, computer equipment and storage medium
CN112015970A (en) Product recommendation method, related equipment and computer storage medium
US20220207584A1 (en) Learning device, computer-readable information storage medium, and learning method
US20130146654A1 (en) Fireworks information systems and methods
KR102354982B1 (en) Method and apparatus for providing clothing platform service based on big data

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant