CN113158086B - Personalized customer recommendation system and method based on deep reinforcement learning


Info

Publication number
CN113158086B
Authority
CN
China
Prior art keywords
user
data acquisition
acquisition module
scenic spot
processing terminal
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110365717.6A
Other languages
Chinese (zh)
Other versions
CN113158086A (en)
Inventor
陈建平
傅启明
黄泽天
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang Beierxiong Technology Co ltd
Original Assignee
Zhejiang Beierxiong Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang Beierxiong Technology Co ltd filed Critical Zhejiang Beierxiong Technology Co ltd
Priority to CN202110365717.6A
Publication of CN113158086A
Application granted
Publication of CN113158086B

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90Details of database functions independent of the retrieved data types
    • G06F16/95Retrieval from the web
    • G06F16/953Querying, e.g. by the use of web search engines
    • G06F16/9537Spatial or temporal dependent retrieval, e.g. spatiotemporal queries
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/50Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F16/58Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/583Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/50Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F16/58Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/587Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using geographical or spatial information, e.g. location
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/29Graphical models, e.g. Bayesian networks
    • G06F18/295Markov models or related models, e.g. semi-Markov models; Markov random fields; Networks embedding Markov models
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/10Services
    • G06Q50/14Travel agencies

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Business, Economics & Management (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Tourism & Hospitality (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Library & Information Science (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Software Systems (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Economics (AREA)
  • Human Resources & Organizations (AREA)
  • Marketing (AREA)
  • Primary Health Care (AREA)
  • Strategic Management (AREA)
  • General Business, Economics & Management (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Image Analysis (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses a personalized customer recommendation system and method based on deep reinforcement learning. The system comprises an image data acquisition module, a mobile data acquisition module, an online data module and a processing terminal, wherein the image data acquisition module is in communication connection with the processing terminal through a first communication module, the mobile data acquisition module through a second communication module, and the online data module through a third communication module.

Description

Personalized customer recommendation system and method based on deep reinforcement learning
Technical Field
The invention relates to the field of recommendation systems of tourist routes and scenic spots, in particular to a personalized customer recommendation system and method based on deep reinforcement learning.
Background
With economic development, the tourism industry has grown rapidly, people's willingness to travel has strengthened, and travel demands have diversified, which objectively makes it harder to provide travellers with the information they need. Personalized, customised recommendation of tourist attractions is also increasingly popular with travellers, yet most current recommendation systems lack precision when recommending attractions to individual users and fail to balance profit against user preference during recommendation: the recommended attractions are either profitable but disliked by users, or liked by users but unprofitable.
With the development of deep reinforcement learning, attraction recommendation that exploits image information and auxiliary information for different users is becoming increasingly important. Current user-oriented recommendation systems rely only on user images or a small set of user labels; this has some value in generality, but cannot achieve high accuracy or truly personalized recommendation, and therefore cannot satisfy people's demands for personalized and precise recommendation.
Disclosure of Invention
The technical problem solved by the invention is to provide a personalized customer recommendation system, based on deep reinforcement learning, that can achieve accurate recommendation.
The technical scheme adopted for solving the technical problems is as follows: the personalized customer recommendation system based on deep reinforcement learning comprises an image data acquisition module, a mobile data acquisition module, an online data module and a processing terminal, wherein the image data acquisition module is in communication connection with the processing terminal through a first communication module, the mobile data acquisition module is in communication connection with the processing terminal through a second communication module, and the online data module is in communication connection with the processing terminal through a third communication module;
the image data acquisition module is used for acquiring facial image information of users in the store;
the mobile data acquisition module is used for acquiring scenic spots selected by users and user information;
the online data module is a database stored in a server, and user history information is stored in the database;
the processing terminal is used for establishing an environment model for the received information and giving an optimal scenic spot recommendation scheme according to the environment model.
Further: the mobile data acquisition module is a mobile terminal.
Further: the image data acquisition module comprises an electronic camera, and the electronic camera is in communication connection with the first communication module through a USB interface.
Further: the first communication module and the second communication module are WI-FI modules.
The invention also discloses a personalized customer recommendation method based on deep reinforcement learning, which comprises the following steps:
s1: the image data acquisition module acquires user information in real time;
s2: the image data acquisition module communicates with the processing terminal through a USB interface and transmits the acquired image information to the data processing terminal;
s3: after the data processing terminal receives the image information, the modeling unit establishes an environment model for the received data information; the decision unit provides an optimal scenic spot type recommendation scheme according to the environment model and transmits the scheme to the mobile data acquisition module;
s4: the mobile data acquisition module transmits the scenic spot selected by the user and the user information to the data processing terminal;
s5: after the data processing terminal receives the information, it acquires user history information through the online data module, combines profit data through feature matching, derives the final recommended scenic spots from the user history information and user habits, and transmits them to the mobile data acquisition module.
Further: the building of the environment model in step S3 comprises the following steps:
s31: modeling the scenic spot recommendation problem as a Markov decision process model, and modeling the states, actions and immediate reward functions therein;
s32: establishing a return value function model;
s33: and solving an optimal adjustment scheme by using a DQN deep reinforcement learning algorithm.
Further: the user history information in step S5 is obtained by querying the database in the server with the user's identity card; the final recommended scenic spots are given by selecting on the comprehensive scores of the user history information and profit data, and ranking based on the user's habits.
Further: the specific modeling and deep reinforcement learning algorithm is as follows:
step one: modeling the scenic spot recommendation problem as an MDP model, and defining states, actions and immediate rewards functions therein;
(a) The state, denoted s: if image analysis of a user entering the store at a given moment yields gender Sex, age Age, consumption level Co and geographic position Pos, the user state at that moment can be expressed as:
s=(Sex,Age,Co,Pos)
where the consumption level is inferred from the user's clothing (an illustrative encoding of s is sketched after item (c) below);
(b) The action, denoted a: a scenic spot type is marked 1 if selected and 0 otherwise. Suppose the system selects among n different scenic spot types, each type being represented by its highest-rated scenic spot according to an internal score based on playability; the set of actions the system can take is:
a = {[0,1], ……, [0,1]}, with n copies of [0,1], n being a natural number;
(c) The immediate reward function, denoted r, indicates the user's selection preference in the system. The reward is expressed as:
r=r1+1.5*r2
r1 = I·(10 + 0.01·R1), where I is an indicator function equal to 1 when the user clicks and 0 otherwise, and R1 is the average profit of the corresponding scenic spot type;
r2 = I·(100 + 0.01·R2), where R2 is the average profit of the type of the selected scenic spot;
here r1 represents the reward for the user clicking the recommended scenic spot type, and r2 represents the reward for the scenic spot finally selected by the user;
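By way of illustration only, the state s=(Sex,Age,Co,Pos) could be encoded as a numeric feature vector for the value network; the field ranges and normalisation below are assumptions, not details fixed by the patent:

```python
import numpy as np

# Hypothetical encoding of the user state s = (Sex, Age, Co, Pos) as a
# feature vector; ranges and normalisation are illustrative assumptions.
def encode_state(sex: int, age: int, co: int, pos: tuple) -> np.ndarray:
    """sex: 0/1; age in years; co: consumption level (0=low..2=high);
    pos: (latitude, longitude)."""
    return np.array([
        float(sex),
        age / 100.0,     # rough normalisation to about [0, 1]
        co / 2.0,
        pos[0] / 90.0,   # latitude
        pos[1] / 180.0,  # longitude
    ], dtype=np.float32)
```

Likewise, a minimal sketch of the reward defined in (c); the argument names are illustrative, the two boolean flags play the role of the indicator I, and the constants follow the definitions above:

```python
# Sketch of the immediate reward r = r1 + 1.5*r2 as defined above.
# type_profit and spot_profit are the average profits R1 and R2.
def immediate_reward(clicked_type: bool, type_profit: float,
                     selected_spot: bool, spot_profit: float) -> float:
    r1 = (10 + 0.01 * type_profit) if clicked_type else 0.0
    r2 = (100 + 0.01 * spot_profit) if selected_spot else 0.0
    return r1 + 1.5 * r2
```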
step two: establishing a value function return model;
Let R(s,a) denote the return of taking action a in state s; the value function Q(s,a) is the expectation of R(s,a): Q(s,a) = E[R(s,a)];
step three: solving an optimal strategy by using a DQN deep reinforcement learning algorithm;
(1) Initializing the replay memory unit, with capacity N, for storing training samples;
(2) Initializing the current value network and randomly initializing its weight parameters ω;
(3) Initializing the target value network, with the same structure and initial weights as the current value network;
(4) Obtaining Q(s,a) in any state s through the current value network; after the value function is calculated, selecting an action a with an ε-greedy strategy, recording each state transition as a time step t, and storing the data (s,a,r,s′) obtained at each time step in the replay memory unit;
(5) Defining the loss function L(ω):
L(ω) = E[(r + γ·max_a′ Q(s′, a′; ω⁻) − Q(s, a; ω))²]
(6) Randomly sampling a transition (s,a,r,s′) from the replay memory unit, feeding s to the current value network and s′ to the target value network, updating ω by gradient descent on L(ω), and solving the optimal strategy; the method for updating the value function with the DQN algorithm is:
Q(s,a) ← Q(s,a) + α[r + γ·max_a′ Q(s′, a′; ω⁻) − Q(s,a)]
s ← s′
a ← a′
where γ is the discount factor, chosen according to the actual convergence behaviour;
(7) Every N iterations, updating the parameters of the target value network to the parameters of the current value network;
Step four: transmitting the m scenic spot types with the highest probability among the n types output by the DQN network to the mobile data acquisition module for selection by the user.
Further: step five: based on the scenic spot types selected by the user, the chosen travel distance, the user's history information and the user's location, feature matching is performed over all scenic spots; the X highest-scoring scenic spots, excluding those the user has already travelled, are selected under the evaluation system, ordered based on the user's habits, and transmitted to the mobile data acquisition module for the user to select; the evaluation system P is:
P=p1+0.01*R’+10*I’
where p1 is the internal score given to the scenic spot based on its playability, R′ is the profit of the scenic spot, and I′ is 1 if the user has travelled scenic spots of this type and 0 otherwise;
the user habits are obtained from experiments on randomly selected groups of people: it was found that, among several options of similar scenic spots, users select the middle options with higher probability, so the preferred spots are placed in the middle when ordering.
The beneficial effects of the invention are as follows: the invention can recommend scenic spots accurately according to the characteristics and preferences of different users while taking the travel agency's profit into account, which can greatly improve user satisfaction and also raise the travel agency's profit to a certain extent.
Drawings
FIG. 1 is a schematic diagram of an intelligent scenic spot recommendation system.
Fig. 2 is a schematic diagram of a data processing terminal structure of the intelligent scenic spot recommendation system of the invention.
FIG. 3 is a schematic diagram of a training process of the DQN deep reinforcement learning algorithm.
FIG. 4 is a flow chart of a method of the intelligent scenic spot recommendation system of the invention.
Detailed Description
In order that the above objects, features and advantages of the invention may be readily understood, a more particular description of the invention is given below with reference to the appended drawings. Numerous specific details are set forth in the following description in order to provide a thorough understanding of the present invention. The invention may, however, be embodied in many forms other than those described herein, and those skilled in the art can make similar modifications without departing from its spirit; the invention is therefore not limited to the specific embodiments disclosed below.
It will be understood that when an element is referred to as being "fixed to" another element, it can be directly on the other element or intervening elements may also be present. When an element is referred to as being "connected" to another element, it can be directly connected to the other element or intervening elements may also be present.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. The terminology used herein in the description of the invention is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. The term "and/or" as used herein includes any and all combinations of one or more of the associated listed items.
A personalized customer recommendation system based on deep reinforcement learning as shown in fig. 1, wherein: the system comprises an image data acquisition module, a mobile data acquisition module, an online data module and a processing terminal, wherein the image data acquisition module is in communication connection with the processing terminal through a first communication module, the mobile data acquisition module is in communication connection with the processing terminal through a second communication module, and the online data module is in communication connection with the processing terminal through a third communication module;
the image data acquisition module is used for acquiring facial image information of users in the store and may specifically comprise two acquisition cameras, one aimed at the gate and the other at the hall;
the mobile data acquisition module is used for acquiring scenic spots selected by users and user information;
the online data module is a database stored in a server, and user history information is stored in the database;
the processing terminal is used for establishing an environment model for the received information and giving an optimal scenic spot recommendation scheme according to the environment model;
In operation, every 4 frames of acquired user images are grouped and passed through a deep neural network, and the result is transmitted to the processing terminal. The processing terminal collects the data, constructs a Markov decision process by a reinforcement learning method, and solves for the optimal strategy, i.e., selects the best-matching scenic spot types from a characteristic scenic-spot-type pool according to the user images. The selected best-matching types are transmitted to the mobile-device APP of the travel agency staff, and the user picks some favoured types from them; the user's favoured scenic spot types and selected travel distance (short, medium or long) are transmitted to the processing terminal through the mobile device. Combining the acquired information with the user database on the online server, the processing terminal obtains the scenic spots the user has already travelled, then, comprehensively considering user preference and profit, selects from a scenic-spot pool limited by the travel agency's location and the chosen types (excluding the spots the user has travelled), and finally gives the optimal recommended scenic spots.
Specifically, the mobile data acquisition module is a mobile terminal; the image data acquisition module comprises an electronic camera in communication connection with the first communication module through a USB interface; the first communication module and the second communication module are WI-FI modules, of model SKW77.
In operation, the system performs the following steps:
s1: the image data acquisition module acquires user information in real time;
s2: the image data acquisition module communicates with the processing terminal through a USB interface and transmits the acquired image information to the data processing terminal;
s3: after the data processing terminal receives the image information, the modeling unit establishes an environment model for the received data information; the decision unit provides an optimal scenic spot type recommendation scheme according to the environment model and transmits the scheme to the mobile data acquisition module;
s4: the mobile data acquisition module transmits the scenic spot selected by the user and the user information to the data processing terminal;
s5: after receiving the information, the data processing terminal acquires user history information through the online data module, combines profit data through feature matching, derives the final recommended scenic spots from the user history information and user habits, and transmits them to the mobile data acquisition module; the user history information is obtained by querying the database in the server with the user's identity card, and the final recommended scenic spots are given by feature matching: selecting on the comprehensive scores of the user history information and profit data, and ranking based on the user's habits.
Wherein: the building of the environment model in step S3 includes the following steps:
s31: modeling the scenic spot recommendation problem as a Markov decision process model, and modeling the states, actions and immediate reward functions therein;
s32: establishing a return value function model;
s33: and solving an optimal adjustment scheme by using a DQN deep reinforcement learning algorithm.
The specific modeling and deep reinforcement learning algorithm is as follows:
step one: modeling the scenic spot recommendation problem as an MDP model, and defining states, actions and immediate rewards functions therein;
(a) The state, denoted s: if image analysis of a user entering the store at a given moment yields gender Sex, age Age, consumption level Co and geographic position Pos, the user state at that moment can be expressed as:
s=(Sex,Age,Co,Pos)
where the consumption level is inferred from the user's clothing;
(b) The action, denoted a: a scenic spot type is marked 1 if selected and 0 otherwise. Suppose the system selects among n different scenic spot types, each type being represented by its highest-rated scenic spot according to an internal score based on playability; the set of actions the system can take is:
a = {[0,1], ……, [0,1]}, with n copies of [0,1];
(c) The immediate reward function, denoted r, indicates the user's selection preference in the system. The reward is expressed as:
r=r1+1.5*r2
r1 = I·(10 + 0.01·R1), where I is an indicator function equal to 1 when the user clicks and 0 otherwise, and R1 is the average profit of the corresponding scenic spot type;
r2 = I·(100 + 0.01·R2), where R2 is the average profit of the type of the selected scenic spot;
here r1 represents the reward for the user clicking the recommended scenic spot type, and r2 represents the reward for the scenic spot finally selected by the user;
step two: establishing a value function return model;
Let R(s,a) denote the return of taking action a in state s; the value function Q(s,a) is the expectation of R(s,a): Q(s,a) = E[R(s,a)];
step three: solving an optimal strategy by using a DQN deep reinforcement learning algorithm;
(1) Initializing the replay memory unit, with capacity N, for storing training samples;
(2) Initializing the current value network and randomly initializing its weight parameters ω;
(3) Initializing the target value network, with the same structure and initial weights as the current value network;
(4) Obtaining Q(s,a) in any state s through the current value network; after the value function is calculated, selecting an action a with an ε-greedy strategy, recording each state transition as a time step t, and storing the data (s,a,r,s′) obtained at each time step in the replay memory unit;
(5) Defining the loss function L(ω):
L(ω) = E[(r + γ·max_a′ Q(s′, a′; ω⁻) − Q(s, a; ω))²]
(6) Randomly sampling a transition (s,a,r,s′) from the replay memory unit, feeding s to the current value network and s′ to the target value network, updating ω by gradient descent on L(ω), and solving the optimal strategy; the method for updating the value function with the DQN algorithm is:
Q(s,a) ← Q(s,a) + α[r + γ·max_a′ Q(s′, a′; ω⁻) − Q(s,a)]
s ← s′
a ← a′
where γ is the discount factor, chosen according to the actual convergence behaviour;
(7) Every N iterations, updating the parameters of the target value network to the parameters of the current value network.
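A compact sketch of steps (1)-(7) in PyTorch is given below for illustration; the network sizes, hyper-parameters and batch handling are assumptions, not details fixed by the patent, and transitions (s,a,r,s′) collected as in step (4) are assumed to be appended to `replay`:

```python
import random
from collections import deque

import torch
import torch.nn as nn

STATE_DIM, N_TYPES = 5, 30          # state features; n scenic-spot types (assumed)
GAMMA, ALPHA, EPS = 0.9, 1e-3, 0.1  # discount, learning rate, epsilon (assumed)
CAPACITY, BATCH, SYNC_EVERY = 10_000, 32, 100

def make_net():
    return nn.Sequential(nn.Linear(STATE_DIM, 64), nn.ReLU(),
                         nn.Linear(64, N_TYPES))

replay = deque(maxlen=CAPACITY)                 # (1) replay memory, capacity N
q_net = make_net()                              # (2) current value network, random ω
target_net = make_net()                         # (3) target value network,
target_net.load_state_dict(q_net.state_dict())  #     same structure and weights
opt = torch.optim.SGD(q_net.parameters(), lr=ALPHA)

def select_action(state) -> int:                # (4) ε-greedy on Q(s, ·)
    if random.random() < EPS:
        return random.randrange(N_TYPES)
    with torch.no_grad():
        return int(q_net(torch.as_tensor(state, dtype=torch.float32)).argmax())

def train_step(step: int) -> None:
    if len(replay) < BATCH:
        return
    s, a, r, s2 = map(torch.as_tensor, zip(*random.sample(replay, BATCH)))
    q_sa = q_net(s.float()).gather(1, a.long().unsqueeze(1)).squeeze(1)
    with torch.no_grad():                        # (5) target r + γ·max_a′ Q(s′,a′;ω⁻)
        target = r.float() + GAMMA * target_net(s2.float()).max(1).values
    loss = ((target - q_sa) ** 2).mean()         #     L(ω), squared TD error
    opt.zero_grad(); loss.backward(); opt.step() # (6) gradient descent on L(ω)
    if step % SYNC_EVERY == 0:                   # (7) sync target network
        target_net.load_state_dict(q_net.state_dict())
```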
Step four: the m scenic spots with highest probability output by the DQN network in the n scenic spot types are transmitted to the mobile data acquisition module for selection by a user, and n and m are natural numbers and can be set arbitrarily according to actual conditions, such as: the 10 scenic spots with highest probability output by the DQN network in the 30 scenic spot types are transmitted to a mobile data acquisition module for selection by a user, and 1-5 scenic spot types selected by the user and travel distance selected by the user are received;
Step five: based on the scenic spot types selected by the user, the chosen travel distance, the user's history information (obtained by querying through the online data module) and the user's location, feature matching is performed over all scenic spots, and the X highest-scoring scenic spots, excluding those the user has already travelled, are selected under the evaluation system; X is a natural number that can be set according to the actual situation, e.g. 3, 5 or 6. The X scenic spots are ordered based on the user's habits and then transmitted to the mobile data acquisition module for the user to select. The evaluation system P is:
P=p1+0.01*R’+10*I’
where p1 is the internal score given to the scenic spot based on its playability, R′ is the profit of the scenic spot, and I′ is 1 if the user has travelled scenic spots of this type and 0 otherwise;
The user habits are obtained from experiments on randomly selected groups of people: it was found that, among options of several similar scenic spots, users select certain positions with higher probability, and the spots to be favoured are placed in those positions when ordering. For example, 1000 persons were randomly selected for an experiment, and among 5 similar scenic spot options users chose the middle three with higher probability; therefore, when ordering 5 scenic spots, the three with the highest profit are placed in the middle.
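A sketch of the evaluation system and the habit-based ordering follows; the reading of p1 (playability score), R′ (profit) and I′ (travelled-type indicator) matches the reconstruction above and is an assumption:

```python
# Evaluation P = p1 + 0.01*R' + 10*I' and "best spots in the middle"
# ordering, per the 5-spot example above; symbol reading is an assumption.
def evaluate(p1: float, profit: float, travelled_type: bool) -> float:
    return p1 + 0.01 * profit + 10.0 * float(travelled_type)

def order_best_in_middle(spots):
    """spots: list of (name, score); places the highest-scoring spots in
    the middle positions of the returned list."""
    ranked = sorted(spots, key=lambda x: x[1], reverse=True)
    out = [None] * len(ranked)
    mid = len(ranked) // 2
    # fill positions in order of their distance from the middle slot
    positions = sorted(range(len(ranked)), key=lambda i: abs(i - mid))
    for pos, item in zip(positions, ranked):
        out[pos] = item
    return out
```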
Deep reinforcement learning introduces convolutional neural networks on top of traditional reinforcement learning, thereby alleviating the curse of dimensionality that arises when reinforcement learning computes and stores state-action values one by one in a high-dimensional state space.
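As background, a value network in the original DQN style, taking the 4-frame image groups mentioned earlier stacked as 4 input channels, might look as follows; the 84×84 input and layer sizes are assumptions:

```python
import torch.nn as nn

# Hypothetical convolutional Q-network for stacked 4-frame image input,
# in the style of the original DQN; all sizes are illustrative.
class ConvQNet(nn.Module):
    def __init__(self, n_actions: int = 30):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(4, 16, kernel_size=8, stride=4), nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=4, stride=2), nn.ReLU(),
            nn.Flatten())
        self.head = nn.Sequential(nn.Linear(32 * 9 * 9, 256), nn.ReLU(),
                                  nn.Linear(256, n_actions))

    def forward(self, x):  # x: (batch, 4, 84, 84) image stack
        return self.head(self.features(x))
```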
The foregoing merely describes embodiments of the present invention and is not intended to limit its scope; any modifications, equivalent substitutions or improvements made within the spirit and principles of the invention fall within the scope of protection of the invention.

Claims (6)

1. A personalized customer recommendation method based on deep reinforcement learning is characterized in that: the system comprises an image data acquisition module, a mobile data acquisition module, an online data module and a processing terminal, wherein the image data acquisition module is in communication connection with the processing terminal through a first communication module, the mobile data acquisition module is in communication connection with the processing terminal through a second communication module, and the online data module is in communication connection with the processing terminal through a third communication module;
the image data acquisition module is used for acquiring facial image information of users in the store, wherein the facial image information comprises the user's gender, age, consumption level and geographic position;
the mobile data acquisition module is used for acquiring scenic spots selected by a user and user information, wherein the user information comprises whether the user selects the scenic spots or not;
the online data module is a database stored in the server, wherein the database stores user history information, the user history information being the scenic spots the user has travelled;
the processing terminal is used for establishing a scenic spot recommendation model for the received information and giving an optimal scenic spot recommendation scheme according to the scenic spot recommendation model;
the method comprises the following steps:
s1: the image data acquisition module acquires user information in real time;
s2: the image data acquisition module communicates with the processing terminal through a USB interface and transmits the acquired image information to the data processing terminal;
s3: after the data processing terminal receives the image information, the modeling unit establishes an environment model for the received data information; the decision unit provides an optimal scenic spot type recommendation scheme according to the environment model and transmits the scheme to the mobile data acquisition module;
s4: the mobile data acquisition module transmits the scenic spot selected by the user and the user information to the data processing terminal;
s5: after the data processing terminal receives the information, acquiring user history information through the online data module, combining profit data through feature matching, deriving the final recommended scenic spots from the user history information and user habits, and transmitting them to the mobile data acquisition module;
the building of the environment model in step S3 includes the following steps:
s31: modeling the scenic spot recommendation problem as a Markov decision process model, and modeling the states, actions and immediate reward functions therein;
s32: establishing a return value function model;
s33: solving an optimal adjustment scheme by using a DQN deep reinforcement learning algorithm;
the specific modeling and deep reinforcement learning algorithm is as follows:
step one: modeling the scenic spot recommendation problem as an MDP model, and defining states, actions and immediate rewards functions therein;
(a) The state, denoted s: if image analysis of a user entering the store at a given moment yields gender Sex, age Age, consumption level Co and geographic position Pos, the user state at that moment can be expressed as:
s=(Sex,Age,Co,Pos)
where the consumption level is inferred from the user's clothing;
(b) The action, denoted a: a scenic spot type is marked 1 if selected and 0 otherwise. Suppose the system selects among n different scenic spot types, each type being represented by its highest-rated scenic spot according to an internal score based on playability; the set of actions the system can take is:
a = {[0,1], ……, [0,1]}, with n copies of [0,1], n being a natural number;
(c) The immediate reward function, denoted r, indicates the user's selection preference in the system. The reward is expressed as:
r=r1+1.5*r2;
r1 = I·(10 + 0.01·R1), where I is an indicator function equal to 1 when the user clicks and 0 otherwise, and R1 is the average profit of the corresponding scenic spot type;
r2 = I·(100 + 0.01·R2), where R2 is the average profit of the type of the selected scenic spot;
here r1 represents the reward for the user clicking the recommended scenic spot type, and r2 represents the reward for the scenic spot finally selected by the user;
step two: establishing a value function return model;
Let R(s,a) denote the return of taking action a in state s; the value function Q(s,a) is the expectation of R(s,a): Q(s,a) = E[R(s,a)];
step three: solving an optimal strategy by using a DQN deep reinforcement learning algorithm;
(1) Initializing the replay memory unit, with capacity N, for storing training samples;
(2) Initializing the current value network and randomly initializing its weight parameters ω;
(3) Initializing the target value network, with the same structure and initial weights as the current value network;
(4) Obtaining Q(s,a) in any state s through the current value network; after the value function is calculated, selecting an action a with an ε-greedy strategy, recording each state transition as a time step t, and storing the data (s,a,r,s′) obtained at each time step in the replay memory unit;
(5) Defining the loss function L(ω):
L(ω) = E[(r + γ·max_a′ Q(s′, a′; ω⁻) − Q(s, a; ω))²]
(6) Randomly sampling a transition (s,a,r,s′) from the replay memory unit, feeding s to the current value network and s′ to the target value network, updating ω by gradient descent on L(ω), and solving the optimal strategy; the method for updating the value function with the DQN algorithm is:
Q(s,a) ← Q(s,a) + α[r + γ·max_a′ Q(s′, a′; ω⁻) − Q(s,a)]
s ← s′
a ← a′
where γ is the discount factor, chosen according to the actual convergence behaviour;
(7) Every N iterations, updating the parameters of the target value network to the parameters of the current value network;
Step four: transmitting the m scenic spot types with the highest probability among the n types output by the DQN network to the mobile data acquisition module for user selection, wherein n and m are natural numbers and n is greater than m.
2. The personalized customer recommendation method based on deep reinforcement learning of claim 1, wherein: the user history information in step S5 is obtained by querying the database in the server with the user's identity card; the final recommended scenic spots are given by selecting on the comprehensive scores of the user history information and profit data, and ranking based on the user's habits.
3. The personalized customer recommendation method based on deep reinforcement learning of claim 2, wherein:
Step five: based on the scenic spot types selected by the user, the chosen travel distance, the user's history information and the user's location, feature matching is performed over all scenic spots; the X highest-scoring scenic spots, excluding those the user has already travelled, are selected under the evaluation system, ordered based on the user's habits, and transmitted to the mobile data acquisition module for the user to select; the evaluation system P is:
P=p1+0.01*R’+10*I’
where p1 is the internal score given to the scenic spot based on its playability, R′ is the profit of the scenic spot, and I′ is 1 if the user has travelled scenic spots of this type and 0 otherwise;
the user habits are obtained from experiments on randomly selected groups of people: it was found that, among several options of similar scenic spots, users select the middle options with higher probability, so the preferred spots are placed in the middle when ordering.
4. The personalized customer recommendation method based on deep reinforcement learning of claim 1, wherein: the mobile data acquisition module is a mobile terminal.
5. The personalized customer recommendation method based on deep reinforcement learning of claim 1, wherein: the image data acquisition module comprises an electronic camera, and the electronic camera is in communication connection with the first communication module through a USB interface.
6. The personalized customer recommendation method based on deep reinforcement learning of claim 1, wherein: the first communication module and the second communication module are WI-FI modules.
CN202110365717.6A 2021-04-06 2021-04-06 Personalized customer recommendation system and method based on deep reinforcement learning Active CN113158086B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110365717.6A CN113158086B (en) 2021-04-06 2021-04-06 Personalized customer recommendation system and method based on deep reinforcement learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110365717.6A CN113158086B (en) 2021-04-06 2021-04-06 Personalized customer recommendation system and method based on deep reinforcement learning

Publications (2)

Publication Number Publication Date
CN113158086A CN113158086A (en) 2021-07-23
CN113158086B (en) 2023-05-05

Family

ID=76888787

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110365717.6A Active CN113158086B (en) 2021-04-06 2021-04-06 Personalized customer recommendation system and method based on deep reinforcement learning

Country Status (1)

Country Link
CN (1) CN113158086B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114254837B (en) * 2021-12-28 2024-07-02 西安交通大学 Travel route customization method and system based on deep reinforcement learning

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111199458A (en) * 2019-12-30 2020-05-26 北京航空航天大学 Recommendation system based on meta-learning and reinforcement learning

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP3079116A1 (en) * 2015-04-10 2016-10-12 Tata Consultancy Services Limited System and method for generating recommendations
CN110874780B (en) * 2018-09-01 2023-11-14 昆山炫生活信息技术股份有限公司 Scenic spot playing system and recommendation method based on big data statistics
CN110263256B (en) * 2019-06-21 2022-12-02 西安电子科技大学 Personalized recommendation method based on multi-mode heterogeneous information
CN111415198B (en) * 2020-03-19 2023-04-28 桂林电子科技大学 Tourist behavior preference modeling method based on reverse reinforcement learning
CN112182398B (en) * 2020-10-13 2022-05-10 福州大学 Scenic spot recommendation method and system considering long-term preference and short-term preference of user

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111199458A (en) * 2019-12-30 2020-05-26 北京航空航天大学 Recommendation system based on meta-learning and reinforcement learning

Also Published As

Publication number Publication date
CN113158086A (en) 2021-07-23

Similar Documents

Publication Publication Date Title
US8577962B2 (en) Server apparatus, client apparatus, content recommendation method, and program
CN107291888B (en) Machine learning statistical model-based living recommendation system method near living hotel
CN106682035A (en) Individualized learning recommendation method and device
CN108665083A (en) A kind of method and system for advertisement recommendation for dynamic trajectory model of being drawn a portrait based on user
Renjith et al. A personalized mobile travel recommender system using hybrid algorithm
CN109872664A (en) A kind of wisdom tour guide device
Perić et al. Determinants of active sport event tourists’ expenditure–the case of mountain bikers and trail runners
CN113158086B (en) Personalized customer recommendation system and method based on deep reinforcement learning
US20180197149A1 (en) Automated methods and systems to schedule activities
CN110781256B (en) Method and device for determining POI matched with Wi-Fi based on sending position data
CN116823534A (en) Intelligent service virtual man system for text travel industry based on multi-mode large model
CN114386664A (en) Personalized travel route recommendation method based on reinforcement learning
CN113515693A (en) City scenic spot wisdom recommendation system
CN116503209A (en) Digital twin system based on artificial intelligence and data driving
CN104915365B (en) Activity flow recommendation method and device
Celdir et al. Popularity bias in online dating platforms: Theory and empirical evidence
CN116226537A (en) Layout and display method, device, equipment and medium of page modules in page
CN115438871A (en) Ice and snow scenic spot recommendation method and system integrating preference and eliminating popularity deviation
CN114721572A (en) Visual display method, device, medium, equipment and system for dream
CN102959560A (en) Automatic appeal measurement method
CN110809489B (en) Information processing apparatus, information processing method, and storage medium
US20170169490A1 (en) Automated Personalized Product Specification
Papadakis et al. Visit planner: A personalized mobile trip design application based on a hybrid recommendation model
CN117273869B (en) Intelligent agricultural product pushing method, system, device and medium based on user data
CN113537548B (en) Recommendation method, device and equipment for driving route

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant