CN110659947A - Commodity recommendation method and device - Google Patents

Commodity recommendation method and device

Info

Publication number
CN110659947A
CN110659947A
Authority
CN
China
Prior art keywords
state information
target object
information
recommended
probability
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910962727.0A
Other languages
Chinese (zh)
Inventor
赵国海
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
SHENYANG NE-CARES Co Ltd
Original Assignee
SHENYANG NE-CARES Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by SHENYANG NE-CARES Co Ltd filed Critical SHENYANG NE-CARES Co Ltd
Priority to CN201910962727.0A priority Critical patent/CN110659947A/en
Publication of CN110659947A publication Critical patent/CN110659947A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00Commerce
    • G06Q30/02Marketing; Price estimation or determination; Fundraising
    • G06Q30/0241Advertisements
    • G06Q30/0251Targeted advertisements
    • G06Q30/0255Targeted advertisements based on user history
    • G06Q30/0261Targeted advertisements based on user location
    • G06Q30/0264Targeted advertisements based upon schedule

Abstract

The invention provides a commodity recommendation method and a commodity recommendation device, which relate to the field of electronic commerce. The commodity recommendation method comprises the following steps: acquiring current state information of a target object; inputting the state information into a pre-trained recommendation model to obtain an action value corresponding to the state information; determining probability identifications corresponding to the commodities to be recommended respectively according to the action values; and selecting a preset number of commodities to be recommended to the target object according to the probability identifications. By applying the commodity recommendation method provided by the invention, commodities can be recommended to the user according to the current state information of the target object, the requirements of the user can be comprehensively sensed, and the shopping experience of the user is improved.

Description

Commodity recommendation method and device
Technical Field
The invention relates to the field of electronic commerce, in particular to a commodity recommendation method and device.
Background
With the development of science and technology, the internet has developed greatly. Its appearance has greatly changed the way people work and live, making both richer and more convenient. In particular, in the field of e-commerce, people can shop online through an online shopping platform, purchasing various commodities such as clothes, food and cosmetics, which brings great convenience to people.
The research of the inventor finds that, in the e-commerce field, commodities are generally recommended to users through a commodity recommendation engine. Existing recommendation engine technology generally applies content-based recommendation algorithms, the DeepFM algorithm and the like, which can analyze the commodities a user is interested in according to the user's online behaviors and then recommend those commodities to the user. However, an existing recommendation engine can only recommend commodities according to the user's online behaviors and cannot comprehensively perceive the user's current requirements, so the commodity recommendation effect is poor.
Disclosure of Invention
The technical problem to be solved by the invention is to provide a commodity recommendation method, which can recommend commodities to a user according to the current state information of a target object, can comprehensively sense the requirements of the user, and can improve the shopping experience of the user.
The invention also provides a commodity recommendation device used for ensuring the realization and the application of the method in practice.
A method of merchandise recommendation, comprising:
acquiring current state information of a target object;
inputting the state information into a pre-trained recommendation model to obtain an action value corresponding to the state information;
determining probability identifications corresponding to the commodities to be recommended respectively according to the action values;
and selecting a preset number of commodities to be recommended to the target object according to the probability identifications.
Optionally, the obtaining current state information of the target object includes:
acquiring current time information, and acquiring object data of the target object in each pre-established message queue;
analyzing the current time information and the object data to obtain the current environment information of the target object and the travel information of the target object;
and obtaining the current state information of the target object according to the environment information and the travel information.
Optionally, in the above method, inputting the state information into a pre-trained recommendation model to obtain an action value corresponding to the state information includes:
when the state information is input into a pre-trained recommendation model, acquiring a preset action space, wherein the action space comprises each preset action value;
determining network parameters corresponding to the action values;
calculating the state information according to each network parameter to obtain a score value corresponding to each action value;
and determining the action value corresponding to the score value with the maximum value as the action value corresponding to the state information.
Optionally, in the method, selecting a preset number of to-be-recommended commodities to recommend to the target object according to each probability identifier includes:
determining the magnitude of a recommended probability value represented by each probability identifier;
sequencing the commodities to be recommended according to the size of each recommendation probability value, and selecting a preset number of commodities to be recommended according to the sequence of the recommendation probability values represented by the probability identifications from large to small;
composing the selected commodities to be recommended into recommendation information;
recommending the recommendation information to the target object.
The above method, optionally, further includes:
acquiring operation information of the target object;
generating an award value corresponding to the state information based on the operation information;
and updating the network parameters of the recommendation model according to the state information, the reward value and the action value.
An article recommendation device comprising:
the first acquisition unit is used for acquiring the current state information of the target object;
the input unit is used for inputting the state information into a pre-trained recommendation model to obtain an action value corresponding to the state information;
the determining unit is used for determining probability identifications corresponding to the commodities to be recommended respectively according to the action values;
and the recommending unit is used for selecting a preset number of commodities to be recommended to recommend to the target object according to the probability identifications.
The above apparatus, optionally, the first obtaining unit includes:
the acquisition subunit is used for acquiring current time information and acquiring object data of the target object in each pre-established message queue;
the analysis subunit is configured to analyze the current time information and the object data to obtain current environment information of the target object and trip information of the target object;
and the generating subunit is used for obtaining the current state information of the target object according to the environment information and the travel information.
The above apparatus, optionally, the input unit, includes:
the input subunit is used for acquiring a preset action space when the state information is input into a pre-trained recommendation model, wherein the action space comprises each preset action value;
the first determining subunit is configured to determine a network parameter corresponding to each action value;
the operation subunit is used for calculating the state information according to each network parameter to obtain a score value corresponding to each action value;
and the second determining subunit is used for determining the action value corresponding to the score value with the maximum value as the action value corresponding to the state information.
The above apparatus, optionally, the recommending unit includes:
the third determining subunit is used for determining the size of the recommended probability value represented by each probability identifier;
the sorting subunit is used for sorting the commodities to be recommended according to the size of the recommended probability value and selecting a preset number of commodities to be recommended according to the sequence of the recommended probability value represented by the probability identifier from large to small;
the execution subunit is used for forming recommendation information by the selected commodities to be recommended;
and the recommending subunit is used for recommending the recommending information to the target object.
The above apparatus, optionally, further comprises:
a second acquisition unit configured to acquire operation information of the target object;
the generating unit is used for generating an award value corresponding to the state information according to the operation information;
and the updating unit is used for updating the network parameters of the recommendation model according to the state information, the reward value and the action value.
According to the scheme, the invention provides a commodity recommendation method and device, wherein the current state information of a target object is acquired; inputting the state information into a pre-trained recommendation model to obtain an action value corresponding to the state information; determining probability identifications corresponding to the commodities to be recommended respectively according to the action values; and selecting a preset number of commodities to be recommended to the target object according to the probability identifications. By applying the commodity recommendation method provided by the invention, commodities can be recommended for the user according to the current state information of the target object, the requirements of the user can be comprehensively sensed, and the shopping experience of the user is improved.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present invention, the drawings needed to be used in the description of the embodiments will be briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art to obtain other drawings based on these drawings without inventive labor.
FIG. 1 is a flow chart of a method for recommending merchandise according to the present invention;
FIG. 2 is a flowchart of another method of a merchandise recommendation method according to the present invention;
FIG. 3 is a flowchart of another method of a merchandise recommendation method according to the present invention;
FIG. 4 is a flowchart of another method of a merchandise recommendation method according to the present invention;
FIG. 5 is a diagram illustrating an exemplary method for recommending merchandise according to the present invention;
FIG. 6 is a schematic structural diagram of a merchandise recommendation device according to the present invention;
fig. 7 is a schematic structural diagram of an electronic device provided in the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The invention is operational with numerous general purpose or special purpose computing device environments or configurations. For example: personal computers, server computers, hand-held or portable devices, tablet-type devices, multi-processor apparatus, distributed computing environments that include any of the above devices or equipment, and the like.
The embodiment of the invention provides a commodity recommendation method, which can be applied to various system platforms. The execution subject of the method may be a computer terminal or a processor of various mobile devices. A flow chart of the method is shown in fig. 1, and the method specifically comprises the following steps:
s101: and acquiring the current state information of the target object.
In the method provided by the embodiment of the present invention, the current state information of the target object may include current environment information of the target object and trip information of the target object.
In particular, the target object may be a passenger user of an airline.
S102: and inputting the state information into a pre-trained recommendation model to obtain an action value corresponding to the state information.
In the method provided by the embodiment of the invention, the recommendation model may be a Markov decision process (MDP) model; by inputting the current state information of the target object into the recommendation model, the action value corresponding to the current state information of the target object can be obtained, and this action value is the action value with the maximum score value in the action space of the recommendation model.
S103: and determining recommendation probability identifications corresponding to the commodities to be recommended respectively according to the action values.
In the method provided by the embodiment of the invention, each probability identification represents the recommendation probability value of the commodity to be recommended to which it belongs; the recommendation probability value of a commodity to be recommended is mapped into a recommendation probability identification by applying a One-Hot coding mode. The action value is a vector formed by the probability identifications of the commodities to be recommended, and the probability identification of each commodity to be recommended can be obtained by analyzing the action value, wherein the probability identification of each commodity is determined by the recommendation probability value of that commodity.
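For illustration only, the following Python sketch shows one possible reading of this mapping: the action value is treated as a vector with one recommendation probability per commodity to be recommended, and each probability is bucketed into a One-Hot style probability identification. The bucket count and the example probabilities are assumptions introduced for the sketch and are not specified by the text above.

```python
import numpy as np

def probability_identifications(action_value, num_buckets=4):
    """Map each commodity's recommendation probability to a One-Hot identification.

    action_value: 1-D array of recommendation probabilities, one per commodity
                  to be recommended (an assumed interpretation of the action value).
    Returns a dict {commodity index: one-hot vector over probability buckets}.
    """
    action_value = np.asarray(action_value, dtype=float)
    # Assign each probability to a bucket, e.g. [0, 0.25), [0.25, 0.5), ...
    buckets = np.minimum((action_value * num_buckets).astype(int), num_buckets - 1)
    identifications = {}
    for idx, bucket in enumerate(buckets):
        one_hot = np.zeros(num_buckets, dtype=int)
        one_hot[bucket] = 1
        identifications[idx] = one_hot
    return identifications

# Example: an action value covering four candidate commodities
print(probability_identifications([0.10, 0.55, 0.80, 0.30]))
```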
S104: and selecting a preset number of commodities to be recommended to the target object according to the recommendation probability identifications.
In the method provided by the embodiment of the invention, a preset number of commodities to be recommended are selected based on the recommendation probability values represented by the recommendation probability identifications; and generating recommendation information according to the selected commodities to be recommended, and displaying the recommendation information on the target object.
The embodiment of the invention provides a commodity recommendation method, which comprises the following steps: acquiring current state information of a target object; inputting the state information into a pre-trained recommendation model to obtain an action value corresponding to the state information; determining probability identifications corresponding to the commodities to be recommended respectively according to the action values; and selecting a preset number of commodities to be recommended to the target object according to the probability identifications. By applying the commodity recommendation method provided by the invention, commodities can be recommended for the user according to the current state information of the target object, the requirements of the user can be comprehensively sensed, and the shopping experience of the user is improved.
In the method for recommending a commodity according to the embodiment of the present invention, based on the above implementation process, the obtaining of the current state information of the target object specifically includes, as shown in fig. 2:
s201: and acquiring current time information, and acquiring object data of the target object in each pre-established message queue.
In the method provided by the embodiment of the invention, the message queue may be a Kafka message queue, and the message queue is used for transmitting object data acquired from a data source; the object data may be in JSON format. For example, flight data of the target object is acquired, and the flight data format may be: {"flightNo": "flight number", "DeparturePlace": "origin", "Destination": "destination"}.
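As a hedged illustration of how such a JSON message might be handled once it is pulled from the message queue, the Python sketch below parses one flight record. The DepartureTime field, the concrete values and the field renaming are assumptions added for the example; the text above only specifies the flightNo/DeparturePlace/Destination fields.

```python
import json
from datetime import datetime

# A raw object-data message as it might arrive on the queue (illustrative payload only).
raw_message = ('{"flightNo": "CA1234", "DeparturePlace": "origin", '
               '"Destination": "destination", "DepartureTime": "2019-10-11 12:30"}')

def parse_object_data(message: str) -> dict:
    """Parse one JSON object-data message pulled from the message queue."""
    data = json.loads(message)
    return {
        "flight_no": data.get("flightNo"),
        "departure": data.get("DeparturePlace"),
        "destination": data.get("Destination"),
        # Assumed extra field used later to compute time to departure.
        "departure_time": datetime.strptime(data["DepartureTime"], "%Y-%m-%d %H:%M"),
    }

print(parse_object_data(raw_message))
```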
S202: analyzing the current time information and the object data to obtain the current environment information of the target object and the travel information of the target object.
In the method provided by the embodiment of the invention, the object data may be analyzed first, and the current time information and the analyzed object data are calculated to obtain the current environment information and the current travel information of the target object. For example, if the expected takeoff time is 12:30 and the current time is 11:00, the value of the not-yet-boarded state of the target object is 90 minutes until takeoff. The check-in state of the user may be represented by s1, the security-check-area state of the user by s2, and the dining-interval state of the user by s3, where s1: check-in (0,1), s2: security check area (0,1), s3: dining interval (1,2,3). If the user has checked in, s1 takes the value 1; if the user is located in the security check area, s2 takes the value 1; if the current moment is breakfast time, s3 takes the value 1; if the current moment is lunch time, s3 takes the value 2; and if the current moment is dinner time, s3 takes the value 3.
S203: and obtaining the current state information of the target object according to the current environment information of the target object and the travel information of the target object.
In the method provided by the embodiment of the invention, the current state information of the target object is formed by the current environment information and the current travel information of the target object, and the state information can be a matrix.
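A minimal sketch of this state assembly, under the assumption that the environment and travel features above are packed into a single vector [s1, s2, s3, minutes-to-departure], is shown below; the meal-interval hour boundaries and the vector layout are illustrative choices, not prescribed by the text.

```python
from datetime import datetime

def build_state(checked_in: bool, in_security_area: bool,
                now: datetime, departure: datetime) -> list:
    """Assemble a simple state vector [s1, s2, s3, minutes_to_departure].

    s1: check-in state (0/1), s2: security-check-area state (0/1),
    s3: dining interval (1 = breakfast, 2 = lunch, 3 = dinner).
    """
    s1 = 1 if checked_in else 0
    s2 = 1 if in_security_area else 0
    hour = now.hour
    if hour < 10:
        s3 = 1          # breakfast interval (assumed boundary)
    elif hour < 16:
        s3 = 2          # lunch interval (assumed boundary)
    else:
        s3 = 3          # dinner interval
    minutes_to_departure = int((departure - now).total_seconds() // 60)
    return [s1, s2, s3, minutes_to_departure]

# Example: current time 11:00, expected takeoff 12:30
print(build_state(True, False,
                  datetime(2019, 10, 11, 11, 0),
                  datetime(2019, 10, 11, 12, 30)))   # -> [1, 0, 2, 90]
```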
By applying the method provided by the embodiment of the invention, the state information of the user can be obtained based on the current environment information and the travel information of the user, the requirement of the user can be accurately sensed, and further, the accurate commodity recommendation can be realized.
In the method for recommending a commodity according to an embodiment of the present invention, based on the implementation process, inputting the state information into a pre-trained recommendation model to obtain an action value corresponding to the state information specifically includes, as shown in fig. 3:
s301: and when the state information is input into a pre-trained recommendation model, acquiring a preset action space, wherein the action space comprises each preset action value.
In the method provided by the embodiment of the invention, the action space comprises a plurality of preset action values, and each action value can be a vector formed by probability identifications of commodities to be recommended; the probability identifications of the various recommended items in the different action values may be different.
S302: and determining the network parameters corresponding to the action values.
In the method provided by the embodiment of the invention, each action value corresponds to one network parameter, and the network parameter corresponding to each action value is a network parameter obtained by pre-training.
S303: and calculating the state information according to each network parameter to obtain a score value corresponding to each action value.
In the method provided by the embodiment of the invention, the state information is calculated according to each network parameter, so that a probability distribution over the action values can be obtained; the probability value of each action value may be used as the score value corresponding to that action value.
S304: and determining the action value corresponding to the score value with the maximum value as the action value corresponding to the state information.
In the method provided by the embodiment of the invention, the score values are respectively compared to obtain the action value with the maximum score value, and the action value corresponding to the score value with the maximum value is determined as the action value corresponding to the state information.
By applying the method provided by the embodiment of the invention, the action value which is most worth to be executed can be selected based on the score value of each action value, so that the selected action value is the action value which is most matched with the current state information of the user, and the satisfactory recommendation information of the user can be recommended for the user.
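By way of illustration only (the patent does not fix the network form), the sketch below treats the network parameter of each action value as a simple linear weight vector, scores the state by a dot product, normalizes the scores into a probability distribution over the action space, and selects the action value with the maximum score. All names, dimensions and the linear scoring form are assumptions for the example.

```python
import numpy as np

def select_action(state, action_params):
    """Score each preset action value against the state and pick the best one.

    state:         1-D state vector.
    action_params: dict {action_id: weight vector}, the per-action network
                   parameters (assumed here to be simple linear weights).
    Returns (best_action_id, scores), where scores is a probability
    distribution over the action space.
    """
    state = np.asarray(state, dtype=float)
    action_ids = list(action_params)
    logits = np.array([np.dot(action_params[a], state) for a in action_ids])
    scores = np.exp(logits - logits.max())
    scores /= scores.sum()                      # probability distribution over action values
    best = action_ids[int(np.argmax(scores))]   # action value with the maximum score value
    return best, dict(zip(action_ids, scores))

# Example with three preset action values and a 4-dimensional state
rng = np.random.default_rng(0)
params = {a: rng.normal(size=4) for a in ("a0", "a1", "a2")}
print(select_action([1, 0, 2, 90], params))
```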
In the method for recommending commodities provided in an embodiment of the present invention, based on the implementation process, selecting a preset number of commodities to be recommended to the target object according to each of the probability identifiers specifically includes, as shown in fig. 4:
s401: and sequencing the commodities to be recommended according to the recommendation probability values represented by the probability identifications, and selecting a preset number of commodities to be recommended according to the sequence of the recommendation probability values represented by the probability identifications from large to small.
In the method provided by the embodiment of the invention, the recommendation probability values represented by the probability identifications are determined, and the commodities to be recommended are sorted according to the recommendation probability values represented by the probability identifications.
S402: and forming recommendation information by the selected commodities to be recommended.
In the method provided by the embodiment of the invention, the selected commodities to be recommended are arranged according to the recommendation probability value represented by the probability identifier to form recommendation information, so that a user can preferentially see the commodities with larger recommendation probability values in the recommendation information.
S403: recommending the recommendation information to the target object.
In the method provided by the embodiment of the invention, the recommendation information is displayed to the user through a preset visual page, and may also be recommended to the user in the form of a notification message or a commodity video introduction.
By applying the method provided by the embodiment of the invention, the preset number of commodities to be recommended can be selected based on the magnitude of each recommendation probability value, so that a user can quickly find the interested commodities.
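A short sketch of this top-N selection follows; the candidate commodities, their probability values and the preset number are invented for the example.

```python
def compose_recommendation(probability_values, preset_number=3):
    """Sort commodities by recommendation probability and keep the top ones.

    probability_values: dict {commodity: recommendation probability value
                        represented by its probability identification}.
    Returns the recommendation information as an ordered list of commodities,
    largest recommendation probability first.
    """
    ranked = sorted(probability_values.items(), key=lambda kv: kv[1], reverse=True)
    return [commodity for commodity, _ in ranked[:preset_number]]

# Example: five candidate commodities, recommend the top three
candidates = {"duty-free perfume": 0.82, "neck pillow": 0.35,
              "lounge pass": 0.77, "set meal": 0.64, "umbrella": 0.12}
print(compose_recommendation(candidates))   # -> ['duty-free perfume', 'lounge pass', 'set meal']
```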
In the commodity recommendation method provided in the embodiment of the present invention, based on the implementation process, the method specifically further includes:
acquiring operation information of the target object;
generating an award value corresponding to the state information based on the operation information;
and updating the network parameters of the recommendation model according to the state information, the reward value and the action value.
In the method provided by the embodiment of the present invention, the operation information of the target object is acquired. The operation information may be null, or may include at least any one of browsing operation information, collection operation information and purchase operation information of the target object; that is, before the next state information of the user arrives, the operation information of the target object on each recommended commodity is recorded, for example, browsing operation information generated by browsing a commodity in the recommendation information, collection operation information generated by collecting a commodity in the recommendation information, and purchase operation information generated by purchasing a commodity in the recommendation information. A reward value corresponding to the state information is generated based on the operation information of the target object. Optionally, the reward value may be increased by 1 each time the target object browses any commodity in the recommendation information, increased by 5 each time the target object collects any commodity in the recommendation information, and increased by 100 each time the target object purchases any commodity in the recommendation information. It should be noted that the calculation rule of the reward value may be set by a technician according to the actual situation, and is not limited here.
In the method provided by the embodiment of the invention, the operation information of the user can be acquired according to the preset acquisition period through the log acquisition system.
By applying the method provided by the embodiment of the invention, the parameters of the network are adjusted through the reward value, so that the recommendation model can carry out commodity recommendation by combining the behavior habits on the subscriber line and the state information of the user, the requirements of the user can be comprehensively sensed, the commodity recommendation is carried out for the user, and the commodity recommendation effect is good.
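The optional reward rule above (browse +1, collect +5, purchase +100) can be written down directly; the record format in the sketch below is an assumption, and the weights are only the optional values given in the text.

```python
def reward_from_operations(operations):
    """Accumulate a reward value from the recorded operation information.

    operations: list of operation records such as
                [{"type": "browse"}, {"type": "collect"}, {"type": "purchase"}];
                an empty list (no operation information) yields a reward of 0.
    """
    weights = {"browse": 1, "collect": 5, "purchase": 100}   # optional weights from the text
    return sum(weights.get(op.get("type"), 0) for op in operations)

# Example: the user browsed two commodities and purchased one
print(reward_from_operations([{"type": "browse"}, {"type": "browse"},
                              {"type": "purchase"}]))   # -> 102
```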
In the method provided by the embodiment of the present invention, based on the implementation process, specifically, the training process of the recommendation model may be:
randomly acquiring a plurality of pre-stored four-element groups from a pre-selected storage space;
and calling a preset Bellman equation to calculate each four-element group so as to update the network parameters of the recommendation model.
The storage process of each four-element group may be: inputting training state information at a first moment into a first policy sub-network of the recommendation model to generate a training action value; a first evaluation sub-network of the recommendation model generates a quality evaluation value of the action according to the training state information at the first moment, the training action value and the reward value corresponding to the training action value; and acquiring training state information at a second moment, and storing the first-moment state information, the second-moment state information, the training action value and the quality evaluation value into a preset storage space in the form of a four-element group.
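A minimal sketch of such a storage space and its four-element groups is given below, using the (s_t, a_t, r_t, s_{t+1}) form that appears in step a2 later; the fixed capacity and uniform random sampling are assumptions consistent with a standard memory replay mechanism.

```python
import random
from collections import deque

class ReplayMemory:
    """Storage space R holding four-element groups, with uniform random sampling."""

    def __init__(self, capacity=10000):
        self.buffer = deque(maxlen=capacity)   # oldest groups are dropped when full

    def store(self, state, action, reward, next_state):
        """Store one four-element group (s_t, a_t, r_t, s_{t+1})."""
        self.buffer.append((state, action, reward, next_state))

    def sample(self, batch_size):
        """Randomly select a small batch of four-element groups."""
        return random.sample(list(self.buffer), min(batch_size, len(self.buffer)))

# Example
memory = ReplayMemory()
memory.store([1, 0, 2, 90], "a1", 5, [1, 1, 2, 60])
print(memory.sample(1))
```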
In an embodiment of the present invention, as shown in fig. 5, the recommendation model includes an online network and a target network; the online network comprises a first strategy sub-network and a first evaluation sub-network, and the target network comprises a second strategy sub-network and a second evaluation sub-network; the construction process of the recommendation model may be:
and step a1, initializing a recommendation model.
In the process of executing step a1, a first policy sub-network μ(s|θ^μ) and a first evaluation sub-network Q(s,a|θ^Q) are established by means of random initialization, where θ^μ is the network parameter of the first policy sub-network and θ^Q is the network parameter of the first evaluation sub-network. The policy sub-network mainly aims to update the policy parameters in the direction that increases the Q value of the value function: an action is selected through the policy sub-network according to the current state, while the evaluation network is the network that calculates the Q value by the Q function. The second policy sub-network μ'(s|θ^μ'), the network parameter θ^μ' of the second policy sub-network, the second evaluation sub-network Q'(s,a|θ^Q') and the network parameter θ^Q' of the second evaluation sub-network are initialized in the same way. A memory space R is opened up for the memory replay mechanism (memory replay).
Step a2, generating each four-element group, and storing each four-element group.
During the execution of step a2, the state s_t is initialized and an iterative solution is carried out. In each iteration, an action a_t = μ(s_t|θ^μ) + N_t is selected by adding Gaussian disturbance to the current first policy sub-network so as to explore; after the action a_t is executed, a corresponding reward r_t is received and the next state s_{t+1} is obtained, and the four-element group (s_t, a_t, r_t, s_{t+1}) formed in the current iteration process is stored in the space R.
And step a3, updating the network. A small batch of four-element groups is randomly selected from R; the current first evaluation sub-network gives the estimate Q(s_i, a_i|θ^Q), and the Bellman equation is used to estimate Q. Assuming the obtained result is expressed by y, then:
y_i = r_i + γQ'(s_{i+1}, μ'(s_{i+1}|θ^μ')|θ^Q')
where γ is the discount factor.
Step a4, updating network parameters of the first evaluation sub-network and updating network parameters of the first policy sub-network.
In the process of executing step a4, the network parameters of the first evaluation sub-network are updated by minimizing a loss function; the loss function may be the mean squared error between the target y_i and the current estimate:
L = (1/N) Σ_i (y_i − Q(s_i, a_i|θ^Q))^2
After the first evaluation sub-network is updated, the first policy sub-network is updated, mainly by means of the policy gradient:
∇_{θ^μ} J ≈ (1/N) Σ_i ∇_a Q(s, a|θ^Q)|_{s=s_i, a=μ(s_i)} ∇_{θ^μ} μ(s|θ^μ)|_{s=s_i}
that is, the gradient is evaluated at each of the N sampled states i = 1, …, N and averaged over the batch.
Step a5, updating the network parameters of the second evaluation sub-network and the network parameters of the second policy sub-network.
In the process of executing step a5, the expected reward is maximized: after the policy gradient is obtained, gradient ascent is used. Finally, the target network is updated with the updated online network. The parameters of the evaluation target network are updated as θ^Q' ← τθ^Q + (1−τ)θ^Q', and the parameters of the policy target network are updated as θ^μ' ← τθ^μ + (1−τ)θ^μ'.
In the specific application process, the recommendation model continuously executes tasks to learn the recommendation strategy based on the quality-function learning equation of the Bellman iterative formula in deep reinforcement learning. Each time an action value is generated, four-element groups are extracted from the buffer with uniform probability for updating the state-action value. According to the statistical consistency of the Bellman formula, the state-action value function gradually approaches the true value as the number of updates increases, so that the recommendation model can measure the value of each action value for the target object under the state information at each moment; the action value corresponding to the maximum quality value can then be selected through a greedy algorithm, so that the commodities required by the target object can be recommended to it.
In the method provided by the embodiment of the present invention, the first policy sub-network, the second policy sub-network, the first evaluation sub-network and the second evaluation sub-network may be shallow neural networks, specifically neural networks containing one hidden layer, where the input of the first policy sub-network is the state information s, and the input of the first evaluation sub-network may include the state information s, the action value a and the reward value r.
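The training loop of steps a1 to a5 can be condensed into a short PyTorch sketch; this is an illustrative assumption rather than the patent's implementation: the layer sizes, learning rates, γ and τ values are invented, and the action value is treated as a continuous vector.

```python
import torch
import torch.nn as nn

STATE_DIM, ACTION_DIM, HIDDEN = 4, 8, 64
GAMMA, TAU = 0.99, 0.005            # discount factor and soft-update rate (assumed values)

def policy_net():
    # Shallow policy sub-network: one hidden layer, outputs an action value vector.
    return nn.Sequential(nn.Linear(STATE_DIM, HIDDEN), nn.ReLU(),
                         nn.Linear(HIDDEN, ACTION_DIM), nn.Tanh())

def eval_net():
    # Shallow evaluation sub-network: takes state and action, outputs a quality (Q) value.
    return nn.Sequential(nn.Linear(STATE_DIM + ACTION_DIM, HIDDEN), nn.ReLU(),
                         nn.Linear(HIDDEN, 1))

mu, q = policy_net(), eval_net()      # online (first) sub-networks
mu_t, q_t = policy_net(), eval_net()  # target (second) sub-networks
mu_t.load_state_dict(mu.state_dict())
q_t.load_state_dict(q.state_dict())
opt_mu = torch.optim.Adam(mu.parameters(), lr=1e-4)
opt_q = torch.optim.Adam(q.parameters(), lr=1e-3)

def update(batch):
    """One training update on a small batch of four-element groups."""
    s, a, r, s_next = batch
    # Step a3: Bellman target y_i = r_i + gamma * Q'(s_{i+1}, mu'(s_{i+1}))
    with torch.no_grad():
        y = r + GAMMA * q_t(torch.cat([s_next, mu_t(s_next)], dim=1))
    # Step a4: minimize the mean squared loss of the first evaluation sub-network
    q_loss = ((y - q(torch.cat([s, a], dim=1))) ** 2).mean()
    opt_q.zero_grad()
    q_loss.backward()
    opt_q.step()
    # Policy gradient: ascend Q by descending -Q with respect to the policy parameters
    mu_loss = -q(torch.cat([s, mu(s)], dim=1)).mean()
    opt_mu.zero_grad()
    mu_loss.backward()
    opt_mu.step()
    # Step a5: soft update of the target networks, theta' <- tau*theta + (1-tau)*theta'
    for online, target in ((mu, mu_t), (q, q_t)):
        for p, p_t in zip(online.parameters(), target.parameters()):
            p_t.data.mul_(1 - TAU).add_(TAU * p.data)
    return q_loss.item(), mu_loss.item()

# Example with a synthetic batch of 32 four-element groups
s = torch.randn(32, STATE_DIM)
batch = (s, torch.randn(32, ACTION_DIM), torch.randn(32, 1), torch.randn(32, STATE_DIM))
print(update(batch))
```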
The above specific implementations and the derivation processes of the implementations are all within the scope of the present invention.
Corresponding to the method described in fig. 1, an embodiment of the present invention further provides a commodity recommendation device for implementing the method in fig. 1. The commodity recommendation device provided in the embodiment of the present invention may be applied to a computer terminal or various mobile devices; its schematic structural diagram is shown in fig. 6, and it specifically includes:
a first obtaining unit 501, configured to obtain current state information of a target object;
an input unit 502, configured to input the state information into a pre-trained recommendation model, so as to obtain an action value corresponding to the state information;
a determining unit 503, configured to determine, according to the action value, probability identifiers corresponding to the respective commodities to be recommended;
and the recommending unit 504 is configured to select a preset number of commodities to be recommended to recommend to the target object according to each probability identifier.
In an embodiment of the present invention, based on the foregoing solution, the first obtaining unit 501 includes:
the acquisition subunit is used for acquiring current time information and acquiring object data of the target object in each pre-established message queue;
the analysis subunit is configured to analyze the current time information and the object data to obtain current environment information of the target object and trip information of the target object;
and the generating subunit is used for obtaining the current state information of the target object according to the environment information and the travel information.
In an embodiment of the present invention, based on the foregoing scheme, the input unit 502 includes:
the input subunit is used for acquiring a preset action space when the state information is input into a pre-trained recommendation model, wherein the action space comprises each preset action value;
the first determining subunit is configured to determine a network parameter corresponding to each action value;
the operation subunit is used for calculating the state information according to each network parameter to obtain a score value corresponding to each action value;
and the second determining subunit is used for determining the action value corresponding to the score value with the maximum value as the action value corresponding to the state information.
In an embodiment of the present invention, based on the foregoing solution, the recommending unit 504 includes:
the third determining subunit is used for determining the size of the recommended probability value represented by each probability identifier;
the sorting subunit is used for sorting the commodities to be recommended according to the size of the recommended probability value and selecting a preset number of commodities to be recommended according to the sequence of the recommended probability value represented by the probability identifier from large to small;
the execution subunit is used for forming recommendation information by the selected commodities to be recommended;
and the recommending subunit is used for recommending the recommending information to the target object.
In an embodiment of the present invention, based on the foregoing solution, the article recommendation device further includes:
a second acquisition unit configured to acquire operation information of the target object;
the generating unit is used for generating an award value corresponding to the state information according to the operation information;
and the updating unit is used for updating the network parameters of the recommendation model according to the state information, the reward value and the action value.
The embodiment of the invention also provides a storage medium, which comprises stored instructions, wherein when the instructions run, the device where the storage medium is located is controlled to execute the commodity recommendation method described above.
An embodiment of the present invention further provides an electronic device, the structural diagram of which is shown in fig. 7. The electronic device specifically includes a memory 601 and one or more instructions 602, where the one or more instructions 602 are stored in the memory 601 and configured to be executed by one or more processors 603 to perform the following operations:
acquiring current state information of a target object;
inputting the state information into a pre-trained recommendation model to obtain an action value corresponding to the state information;
determining probability identifications corresponding to the commodities to be recommended respectively according to the action values;
and selecting a preset number of commodities to be recommended to the target object according to the probability identifications.
It should be noted that, in the present specification, the embodiments are all described in a progressive manner, each embodiment focuses on differences from other embodiments, and the same and similar parts among the embodiments may be referred to each other. For the device-like embodiment, since it is basically similar to the method embodiment, the description is simple, and for the relevant points, reference may be made to the partial description of the method embodiment.
Finally, it should also be noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.
For convenience of description, the above devices are described as being divided into various units by function, and are described separately. Of course, the functions of the units may be implemented in the same software and/or hardware or in a plurality of software and/or hardware when implementing the invention.
From the above description of the embodiments, it is clear to those skilled in the art that the present invention can be implemented by software plus necessary general hardware platform. Based on such understanding, the technical solutions of the present invention may be embodied in the form of a software product, which may be stored in a storage medium, such as ROM/RAM, magnetic disk, optical disk, etc., and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the method according to the embodiments or some parts of the embodiments.
The above detailed description is provided for the commodity recommendation method and apparatus provided by the present invention, and the principle and the implementation of the present invention are explained in the present document by applying specific examples, and the description of the above examples is only used to help understanding the method and the core idea of the present invention; meanwhile, for a person skilled in the art, according to the idea of the present invention, there may be variations in the specific embodiments and the application scope, and in summary, the content of the present specification should not be construed as a limitation to the present invention.

Claims (10)

1. A method for recommending an article, comprising:
acquiring current state information of a target object;
inputting the state information into a pre-trained recommendation model to obtain an action value corresponding to the state information;
determining probability identifications corresponding to the commodities to be recommended respectively according to the action values;
and selecting a preset number of commodities to be recommended to the target object according to the probability identifications.
2. The method of claim 1, wherein the obtaining current state information of the target object comprises:
acquiring current time information, and acquiring object data of the target object in each pre-established message queue;
analyzing the current time information and the object data to obtain the current environment information of the target object and the travel information of the target object;
and obtaining the current state information of the target object according to the environment information and the travel information.
3. The method of claim 1, wherein the inputting the state information into a pre-trained recommendation model to obtain an action value corresponding to the state information comprises:
when the state information is input into a pre-trained recommendation model, acquiring a preset action space, wherein the action space comprises each preset action value;
determining network parameters corresponding to the action values;
calculating the state information according to each network parameter to obtain a score value corresponding to each action value;
and determining the action value corresponding to the score value with the maximum value as the action value corresponding to the state information.
4. The method according to claim 1, wherein the selecting a preset number of the to-be-recommended commodities to recommend to the target object according to each probability identifier comprises:
determining the magnitude of a recommended probability value represented by each probability identifier;
sequencing the commodities to be recommended according to the size of each recommendation probability value, and selecting a preset number of commodities to be recommended according to the sequence of the recommendation probability values represented by the probability identifications from large to small;
composing the selected commodities to be recommended into recommendation information;
recommending the recommendation information to the target object.
5. The method of claim 1, further comprising:
acquiring operation information of the target object;
generating an award value corresponding to the state information based on the operation information;
and updating the network parameters of the recommendation model according to the state information, the reward value and the action value.
6. An article recommendation device, comprising:
the first acquisition unit is used for acquiring the current state information of the target object;
the input unit is used for inputting the state information into a pre-trained recommendation model to obtain an action value corresponding to the state information;
the determining unit is used for determining probability identifications corresponding to the commodities to be recommended respectively according to the action values;
and the recommending unit is used for selecting a preset number of commodities to be recommended to recommend to the target object according to the probability identifications.
7. The apparatus of claim 6, wherein the first obtaining unit comprises:
the acquisition subunit is used for acquiring current time information and acquiring object data of the target object in each pre-established message queue;
the analysis subunit is configured to analyze the current time information and the object data to obtain current environment information of the target object and trip information of the target object;
and the generating subunit is used for obtaining the current state information of the target object according to the environment information and the travel information.
8. The apparatus of claim 6, wherein the input unit comprises:
the input subunit is used for acquiring a preset action space when the state information is input into a pre-trained recommendation model, wherein the action space comprises each preset action value;
the first determining subunit is configured to determine a network parameter corresponding to each action value;
the operation subunit is used for calculating the state information according to each network parameter to obtain a score value corresponding to each action value;
and the second determining subunit is used for determining the action value corresponding to the score value with the maximum value as the action value corresponding to the state information.
9. The apparatus of claim 6, wherein the recommending unit comprises:
the third determining subunit is used for determining the size of the recommended probability value represented by each probability identifier;
the sorting subunit is used for sorting the commodities to be recommended according to the size of the recommended probability value and selecting a preset number of commodities to be recommended according to the sequence of the recommended probability value represented by the probability identifier from large to small;
the execution subunit is used for forming recommendation information by the selected commodities to be recommended;
and the recommending subunit is used for recommending the recommending information to the target object.
10. The apparatus of claim 6, further comprising:
a second acquisition unit configured to acquire operation information of the target object;
the generating unit is used for generating an award value corresponding to the state information according to the operation information;
and the updating unit is used for updating the network parameters of the recommendation model according to the state information, the reward value and the action value.
CN201910962727.0A 2019-10-11 2019-10-11 Commodity recommendation method and device Pending CN110659947A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910962727.0A CN110659947A (en) 2019-10-11 2019-10-11 Commodity recommendation method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910962727.0A CN110659947A (en) 2019-10-11 2019-10-11 Commodity recommendation method and device

Publications (1)

Publication Number Publication Date
CN110659947A true CN110659947A (en) 2020-01-07

Family

ID=69040518

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910962727.0A Pending CN110659947A (en) 2019-10-11 2019-10-11 Commodity recommendation method and device

Country Status (1)

Country Link
CN (1) CN110659947A (en)

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107515909A (en) * 2017-08-11 2017-12-26 深圳市耐飞科技有限公司 A kind of video recommendation method and system
CN107798608A (en) * 2017-10-19 2018-03-13 深圳市耐飞科技有限公司 A kind of investment product combined recommendation method and system
CN108154420A (en) * 2017-12-26 2018-06-12 泰康保险集团股份有限公司 Products Show method and device, storage medium, electronic equipment
CN108230058A (en) * 2016-12-09 2018-06-29 阿里巴巴集团控股有限公司 Products Show method and system
CN108563680A (en) * 2018-03-07 2018-09-21 阿里巴巴集团控股有限公司 Resource recommendation method and device
CN108596645A (en) * 2018-03-13 2018-09-28 阿里巴巴集团控股有限公司 A kind of method, apparatus and equipment of information recommendation
CN109451038A (en) * 2018-12-06 2019-03-08 北京达佳互联信息技术有限公司 A kind of information-pushing method, device, server and computer readable storage medium
CN109471963A (en) * 2018-09-13 2019-03-15 广州丰石科技有限公司 A kind of proposed algorithm based on deeply study
CN109493200A (en) * 2019-01-24 2019-03-19 深圳市活力天汇科技股份有限公司 A kind of recommended method of air ticket trip commodity
CN110069699A (en) * 2018-07-27 2019-07-30 阿里巴巴集团控股有限公司 Order models training method and device

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2021169218A1 (en) * 2020-02-26 2021-09-02 平安科技(深圳)有限公司 Data pushing method and system, electronic device and storage medium
CN112307342A (en) * 2020-11-02 2021-02-02 沈阳民航东北凯亚有限公司 Flight recommendation method and device
CN112307342B (en) * 2020-11-02 2023-10-10 沈阳民航东北凯亚有限公司 Flight recommendation method and device

Similar Documents

Publication Publication Date Title
US11551239B2 (en) Characterizing and modifying user experience of computing environments based on behavior logs
CN108268934A (en) Recommendation method and apparatus, electronic equipment, medium, program based on deep learning
US11481818B2 (en) Automated valuation model using a siamese network
US10430825B2 (en) Recommending advertisements using ranking functions
US20180158163A1 (en) Inferring appropriate courses for recommendation based on member characteristics
CN107113466A (en) To user's recommended project
US20150278970A1 (en) Inference Model for Traveler Classification
CN108305181B (en) Social influence determination method and device, information delivery method and device, equipment and storage medium
CN115917535A (en) Recommendation model training method, recommendation device and computer readable medium
CN110659947A (en) Commodity recommendation method and device
CN108446297A (en) A kind of recommendation method and device, electronic equipment
CN111292168B (en) Data processing method, device and equipment
Chen et al. Purchase behavior prediction in e-commerce with factorization machines
JP2015095120A (en) Purchase prediction device, method and program
US10600099B2 (en) Inferring service providers
CN111340522A (en) Resource recommendation method, device, server and storage medium
CN111368195B (en) Model training method, device, equipment and storage medium
CN108647986A (en) A kind of target user determines method, apparatus and electronic equipment
US20140039983A1 (en) Location evaluation
CN110874432B (en) Sorting method, information recommendation method, system and device
CN110263959B (en) Click rate estimation method, click rate estimation device, machine equipment and computer readable storage medium
CN111026973A (en) Commodity interest degree prediction method and device and electronic equipment
CN116579803A (en) Multi-class joint demand prediction method and device based on substitution and association effects
CN107077475A (en) According to product/user tag and the system and method for common installation diagram recommended products bundle
CN111325372A (en) Method for establishing prediction model, prediction method, device, medium and equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20200107