CN113158086B - Personalized customer recommendation system and method based on deep reinforcement learning - Google Patents
- Publication number: CN113158086B (application CN202110365717.6A)
- Authority
- CN
- China
- Prior art keywords
- user
- data acquisition
- acquisition module
- scenic spot
- processing terminal
- Prior art date
- Legal status: Active (an assumption, not a legal conclusion; Google has not performed a legal analysis)
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/953—Querying, e.g. by the use of web search engines
- G06F16/9537—Spatial or temporal dependent retrieval, e.g. spatiotemporal queries
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/50—Information retrieval; Database structures therefor; File system structures therefor of still image data
- G06F16/58—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
- G06F16/583—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/50—Information retrieval; Database structures therefor; File system structures therefor of still image data
- G06F16/58—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
- G06F16/587—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using geographical or spatial information, e.g. location
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/29—Graphical models, e.g. Bayesian networks
- G06F18/295—Markov models or related models, e.g. semi-Markov models; Markov random fields; Networks embedding Markov models
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q50/00—Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
- G06Q50/10—Services
- G06Q50/14—Travel agencies
Abstract
The invention discloses a personalized customer recommendation system and method based on deep reinforcement learning. The system comprises an image data acquisition module, a mobile data acquisition module, an online data module and a processing terminal; the image data acquisition module is in communication connection with the processing terminal through a first communication module, the mobile data acquisition module through a second communication module, and the online data module through a third communication module.
Description
Technical Field
The invention relates to the field of recommendation systems of tourist routes and scenic spots, in particular to a personalized customer recommendation system and method based on deep reinforcement learning.
Background
With the development of the economy, the travel industry has grown rapidly and people's desire to travel has strengthened; at the same time, travel demands have diversified, which objectively increases the difficulty of providing travellers with the information they need. Personalized, customized tourist attraction recommendation is increasingly popular with travellers, yet most current tourist attraction recommendation systems lack precision in personalized recommendation and fail to balance profit against user preference during recommendation: the recommended scenic spots are either highly profitable but disliked by the user, or liked by the user but unprofitable.
With the development of deep reinforcement learning technology, scenic spot recommendation that combines image information with auxiliary information for different users is becoming increasingly important. Current recommendation systems based on user characteristics rely only on user images or a small set of user labels; while generally applicable, they cannot achieve high accuracy or personalization, and thus fail to meet users' needs for personalized, precise recommendation.
Disclosure of Invention
The invention solves the technical problem of providing a personalized customer recommendation system, based on deep reinforcement learning, that achieves accurate recommendation.
The technical scheme adopted for solving the technical problems is as follows: the personalized customer recommendation system based on deep reinforcement learning comprises an image data acquisition module, a mobile data acquisition module, an online data module and a processing terminal, wherein the image data acquisition module is in communication connection with the processing terminal through a first communication module, the mobile data acquisition module is in communication connection with the processing terminal through a second communication module, and the online data module is in communication connection with the processing terminal through a third communication module;
the image data acquisition module is used for acquiring facial image information of a user in the store;
the mobile data acquisition module is used for acquiring scenic spots selected by users and user information;
the online data module is a database stored in a server, and user history information is stored in the database;
the processing terminal is used for establishing an environment model for the received information and giving an optimal scenic spot recommendation scheme according to the environment model.
Further is: the mobile data acquisition module is a mobile terminal.
Further is: the image data acquisition module comprises an electronic camera, and the electronic camera is in communication connection with the first communication module through a USB interface.
Further is: the first communication module and the second communication module are WI-FI modules.
The invention also discloses a personalized customer recommendation method based on deep reinforcement learning, which comprises the following steps:
s1: the image data acquisition module acquires user information in real time;
s2: the image data acquisition module communicates with the processing terminal through a USB interface and transmits the acquired image information to the data processing terminal;
s3: after the data processing terminal receives the image information, the modeling unit establishes an environment model for the received data information; the decision unit provides an optimal scenic spot type recommendation scheme according to the environment model and transmits the scheme to the mobile data acquisition module;
s4: the mobile data acquisition module transmits the scenic spot selected by the user and the user information to the data processing terminal;
and S5, after the data processing terminal receives the information, it acquires the user history information through the online data module, combines profit data through feature matching, derives the final recommended scenic spots from the user history information and user habits, and transmits them to the mobile data acquisition module.
Further is: the building of the environment model in step S3 includes the following steps:
s31: modeling the scenic spot recommendation problem as a Markov decision process model, and modeling the states, actions and immediate reward functions therein;
s32: establishing a return value function model;
s33: and solving an optimal adjustment scheme by using a DQN deep reinforcement learning algorithm.
Further is: the user history information in step S5 is obtained by inquiring a database in the server through the user identity card, the final recommended scenic spot is given, specifically, the comprehensive scores of the user history information and profit data are utilized for selection, and the ranking is carried out based on the habit of the user.
Further is: the specific modeling and deep reinforcement learning algorithm is as follows:
step one: Modeling the scenic spot recommendation problem as an MDP model, and defining the states, actions and immediate reward functions therein;
(a) State, denoted by s. If image analysis of a user entering the store at a certain moment yields gender Sex, age Age, consumption level Co and geographic position Pos, the user state at that moment can be expressed as:
s = (Sex, Age, Co, Pos)
wherein the consumption level is inferred from the user's clothing;
(b) Action, denoted by a. A scenic spot type that is selected is recorded as 1 and one that is not is recorded as 0. Let the system choose among n different scenic spot types, each represented by the scenic spot with the highest internal score (given on the basis of scenic spot playability) within that subdivided type; the set of actions the system can take is:
a = {[0,1], ……, [0,1]}, with [0,1] appearing n times, n being a natural number;
(c) Immediate reward function, denoted r, indicating the user's selection preference in the system; the reward is expressed as:
r = r1 + 1.5 * r2
r1 = I * (10 + 0.01 * R), where I is an indicator function equal to 1 when the user clicks and 0 otherwise, and R is the average profit of the corresponding scenic spot type;
r2 = I * (100 + 0.01 * R), where R is the average profit of the type of the finally selected scenic spot;
here r1 represents the reward for the user clicking the selected scenic spot type, and r2 represents the reward for the scenic spot finally selected by the user;
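The binary action set and the immediate reward above can be sketched as follows; the function names and the way the indicator flags and average profits are supplied are illustrative assumptions, not part of the patent.

```python
from itertools import product

def action_space(n):
    """All 0/1 selection vectors over the n scenic spot types: {[0,1], ..., [0,1]}."""
    return list(product([0, 1], repeat=n))

def immediate_reward(clicked, avg_profit_type, selected, avg_profit_selected):
    """r = r1 + 1.5*r2 with r1 = I*(10 + 0.01*R) and r2 = I*(100 + 0.01*R).

    `clicked`/`selected` play the role of the indicator I for the clicked
    scenic spot type and the finally selected scenic spot; the average
    profits R are assumed to come from the travel agency's own data.
    """
    r1 = (10 + 0.01 * avg_profit_type) if clicked else 0.0
    r2 = (100 + 0.01 * avg_profit_selected) if selected else 0.0
    return r1 + 1.5 * r2
```

With a click on a type of average profit 100 and a final selection of average profit 200, the reward is 11 + 1.5 * 102.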
step two: establishing a value function return model;
Let R(s, a) denote the return of taking action a in state s; the value function Q(s, a) is the expectation of R(s, a): Q(s, a) = E[R(s, a)];
step three: solving an optimal strategy by using a DQN deep reinforcement learning algorithm;
(1) Initializing a replay memory unit with capacity N, used to store training samples;
(2) Initializing the current value network and randomly initializing its weight parameter ω;
(3) Initializing the target value network, whose structure and initial weights are the same as those of the current value network;
(4) For any state s obtained from the acquired photograph, computing Q(s, a) through the current value network, selecting an action a with an ε-greedy strategy, recording each state-transition step as a time step t, and storing the data (s, a, r, s′) obtained at each time step in the replay memory unit.
(5) Defining a loss function L(ω):
L(ω) = E[(r + γ max_a′ Q(s′, a′; ω⁻) - Q(s, a; ω))²]
(6) Randomly sampling a transition (s, a, r, s′) from the replay memory unit, feeding it to the current value network, the target value network and L(ω), updating ω by gradient descent on L(ω), and solving the optimal policy; the DQN value-function update is:
Q(s, a) ← Q(s, a) + α[r + γ max_a′ Q(s′, a′; ω⁻) - Q(s, a)]
s ← s′
a ← a′
where γ is the discount factor, chosen according to actual convergence;
(7) Every N iterations, updating the parameters of the target value network with the parameters of the current value network;
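The training procedure in steps (1)-(7) can be sketched end to end; a linear approximator stands in for the current and target value networks, and all dimensions, hyperparameters and function names here are illustrative assumptions rather than values from the patent.

```python
import random
from collections import deque

import numpy as np

# Illustrative dimensions: 4 state features (Sex, Age, Co, Pos), n action types.
STATE_DIM, N_ACTIONS = 4, 8
GAMMA, ALPHA, EPSILON, SYNC_EVERY, CAPACITY = 0.9, 0.01, 0.1, 50, 1000

rng = np.random.default_rng(0)
w_current = rng.normal(size=(N_ACTIONS, STATE_DIM)) * 0.01  # (2) current value network (linear sketch)
w_target = w_current.copy()                                 # (3) target network, same structure and weights
replay = deque(maxlen=CAPACITY)                             # (1) replay memory unit

def q_values(w, s):
    """Q(s, ·) for a linear approximator: one value per action."""
    return w @ s

def select_action(s):
    """(4) ε-greedy action selection on the current value network."""
    if random.random() < EPSILON:
        return random.randrange(N_ACTIONS)
    return int(np.argmax(q_values(w_current, s)))

def train_step(step):
    """(5)-(7) one gradient step on the squared TD error, with periodic sync."""
    global w_target
    if not replay:
        return
    s, a, r, s_next = random.choice(replay)             # (6) random sample from replay
    target = r + GAMMA * float(np.max(q_values(w_target, s_next)))
    td_error = target - float(q_values(w_current, s)[a])
    w_current[a] += ALPHA * td_error * s                # descend the loss L(ω)
    if step % SYNC_EVERY == 0:
        w_target = w_current.copy()                     # (7) update target network
```

A real deployment would replace the linear weights with the patent's neural networks; the control flow (replay, ε-greedy, target sync) is the same.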
step four: The m scenic spots with the highest probability output by the DQN network among the n scenic spot types are transmitted to the mobile data acquisition module for the user to choose.
Further is: step five: Based on the scenic spot types selected by the user, the travel distance, the user's history information and the user's location are feature-matched against all scenic spots; the X highest-scoring scenic spots under the evaluation system, excluding scenic spots the user has already visited, are selected, ranked according to user habit and then transmitted to the mobile data acquisition module for the user to choose. The evaluation system P is:
P = p1 + 0.01 * R' + 10 * I'
where p1 is the internal score given to the scenic spot on the basis of its playability, R' is the profit of the scenic spot, and I' is 1 if the user has visited this type of scenic spot, 0 otherwise;
wherein the user habit is obtained from experiments on randomly selected groups of people: it was found that among options of several similar scenic spots, users select the middle options with higher probability, so the higher-scoring options are placed in the middle when ranking.
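The evaluation system and the habit-based ordering in step five can be sketched as follows; reading p1 as the playability score, R' as the scenic spot's profit and I' as the visited-type indicator is an interpretation of the garbled original, and all function names are illustrative.

```python
def evaluation_score(playability, profit, visited_type):
    """P = p1 + 0.01*R' + 10*I' (interpretation assumed, see lead-in)."""
    return playability + 0.01 * profit + 10.0 * (1.0 if visited_type else 0.0)

def rank_with_habit(spots, x):
    """Keep the x best-scoring spots and place the highest-scoring ones in
    the middle positions, following the observed user habit.

    `spots` is a list of (name, playability, profit, visited_type) tuples.
    """
    best = sorted(spots, key=lambda t: evaluation_score(*t[1:]), reverse=True)[:x]
    mid = len(best) // 2
    # Positions ordered by closeness to the middle of the list.
    positions = sorted(range(len(best)), key=lambda i: abs(i - mid))
    ordered = [None] * len(best)
    for rank, pos in enumerate(positions):
        ordered[pos] = best[rank][0]
    return ordered
```

With five candidates, the top-scoring spots land in the middle three slots, matching the described ordering habit.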
The beneficial effects of the invention are as follows: the invention performs accurate scenic spot recommendation according to the characteristics and preferences of different users while taking the travel agency's profit into account, which can greatly improve user satisfaction and, to a certain extent, the travel agency's profit.
Drawings
FIG. 1 is a schematic diagram of an intelligent scenic spot recommendation system.
Fig. 2 is a schematic diagram of a data processing terminal structure of the intelligent scenic spot recommendation system of the invention.
FIG. 3 is a schematic diagram of a training process of the DQN deep reinforcement learning algorithm.
FIG. 4 is a flow chart of a method of the intelligent scenic spot recommendation system of the invention.
Detailed Description
In order that the above objects, features and advantages of the invention may be readily understood, a more particular description is given below with reference to the accompanying drawings. Numerous specific details are set forth in order to provide a thorough understanding of the invention; the invention may, however, be embodied in many other forms, and those skilled in the art may make similar modifications without departing from its spirit, so the invention is not limited to the specific embodiments disclosed below.
It will be understood that when an element is referred to as being "fixed to" another element, it can be directly on the other element or intervening elements may also be present. When an element is referred to as being "connected" to another element, it can be directly connected to the other element or intervening elements may also be present.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. The terminology used herein in the description of the invention is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. The term "and/or" as used herein includes any and all combinations of one or more of the associated listed items.
A personalized customer recommendation system based on deep reinforcement learning as shown in fig. 1, wherein: the system comprises an image data acquisition module, a mobile data acquisition module, an online data module and a processing terminal, wherein the image data acquisition module is in communication connection with the processing terminal through a first communication module, the mobile data acquisition module is in communication connection with the processing terminal through a second communication module, and the online data module is in communication connection with the processing terminal through a third communication module;
the image data acquisition module is used for acquiring facial image information of users in the store, and may specifically comprise two acquisition cameras, one aimed at the entrance gate and the other at the hall;
the mobile data acquisition module is used for acquiring scenic spots selected by users and user information;
the online data module is a database stored in a server, and user history information is stored in the database;
the processing terminal is used for establishing an environment model for the received information and giving an optimal scenic spot recommendation scheme according to the environment model;
in operation, every 4 frames of acquired user images are grouped and passed through a deep neural network, and the result is transmitted to the processing terminal. The processing terminal gathers the data, constructs a Markov decision process by a reinforcement learning method and solves the optimal policy, i.e. it selects the best-matching scenic spot types from a feature scenic spot type pool according to the user images. The selected scenic spot types are sent to the mobile device APP of the travel agency staff, and the user selects the scenic spot types he or she likes; the user's favourite scenic spot types and the selected travel distance (short, medium or long) are transmitted to the processing terminal through the mobile device. Combining this information with the scenic spots the user has already visited, obtained from the user database on the online server, the processing terminal then selects from the travel agency's location and a restricted pool of the chosen types (excluding scenic spots the user has visited), comprehensively considering user preference and profit, and finally gives the optimal recommended scenic spots.
Specifically, the mobile data acquisition module is a mobile terminal; the image data acquisition module comprises an electronic camera in communication connection with the first communication module through a USB interface; the first communication module and the second communication module are WI-FI modules of model SKW77.
The system comprises the following steps in operation:
s1: the image data acquisition module acquires user information in real time;
s2: the image data acquisition module communicates with the processing terminal through a USB interface and transmits the acquired image information to the data processing terminal;
s3: after the data processing terminal receives the image information, the modeling unit establishes an environment model for the received data information; the decision unit provides an optimal scenic spot type recommendation scheme according to the environment model and transmits the scheme to the mobile data acquisition module;
s4: the mobile data acquisition module transmits the scenic spot selected by the user and the user information to the data processing terminal;
and S5, after receiving the information, the data processing terminal acquires the user history information through the online data module, combines profit data through feature matching, derives the final recommended scenic spots from the user history information and user habits, and transmits them to the mobile data acquisition module; the user history information is obtained by querying the database in the server with the user's identity card, and the final recommended scenic spots are selected by the combined score of user history information and profit data and ranked according to user habit.
Wherein: the building of the environment model in step S3 includes the following steps:
s31: modeling the scenic spot recommendation problem as a Markov decision process model, and modeling the states, actions and immediate reward functions therein;
s32: establishing a return value function model;
s33: and solving an optimal adjustment scheme by using a DQN deep reinforcement learning algorithm.
The specific modeling and deep reinforcement learning algorithm is as follows:
step one: Modeling the scenic spot recommendation problem as an MDP model, and defining the states, actions and immediate reward functions therein;
(a) State, denoted by s. If image analysis of a user entering the store at a certain moment yields gender Sex, age Age, consumption level Co and geographic position Pos, the user state at that moment can be expressed as:
s = (Sex, Age, Co, Pos)
wherein the consumption level is inferred from the user's clothing;
(b) Action, denoted by a. A scenic spot type that is selected is recorded as 1 and one that is not is recorded as 0. Let the system choose among n different scenic spot types, each represented by the scenic spot with the highest internal score (given on the basis of scenic spot playability) within that subdivided type; the set of actions the system can take is:
a = {[0,1], ……, [0,1]}, with [0,1] appearing n times;
(c) Immediate reward function, denoted r, indicating the user's selection preference in the system; the reward is expressed as:
r = r1 + 1.5 * r2
r1 = I * (10 + 0.01 * R), where I is an indicator function equal to 1 when the user clicks and 0 otherwise, and R is the average profit of the corresponding scenic spot type;
r2 = I * (100 + 0.01 * R), where R is the average profit of the type of the finally selected scenic spot;
here r1 represents the reward for the user clicking the selected scenic spot type, and r2 represents the reward for the scenic spot finally selected by the user;
step two: establishing a value function return model;
Let R(s, a) denote the return of taking action a in state s; the value function Q(s, a) is the expectation of R(s, a): Q(s, a) = E[R(s, a)];
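As a minimal illustration, the expectation can be approximated by a Monte Carlo average over sampled returns for a fixed state-action pair; the function name is illustrative.

```python
import statistics

def q_estimate(sampled_returns):
    """Monte Carlo estimate of Q(s, a) = E[R(s, a)]: the empirical mean of
    returns R(s, a) observed for a fixed state-action pair (s, a)."""
    return statistics.fmean(sampled_returns)
```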
step three: solving an optimal strategy by using a DQN deep reinforcement learning algorithm;
(1) Initializing a replay memory unit with capacity N, used to store training samples;
(2) Initializing the current value network and randomly initializing its weight parameter ω;
(3) Initializing the target value network, whose structure and initial weights are the same as those of the current value network;
(4) For any state s obtained from the acquired photograph, computing Q(s, a) through the current value network, selecting an action a with an ε-greedy strategy, recording each state-transition step as a time step t, and storing the data (s, a, r, s′) obtained at each time step in the replay memory unit.
(5) Defining a loss function L(ω):
L(ω) = E[(r + γ max_a′ Q(s′, a′; ω⁻) - Q(s, a; ω))²]
(6) Randomly sampling a transition (s, a, r, s′) from the replay memory unit, feeding it to the current value network, the target value network and L(ω), updating ω by gradient descent on L(ω), and solving the optimal policy; the DQN value-function update is:
Q(s, a) ← Q(s, a) + α[r + γ max_a′ Q(s′, a′; ω⁻) - Q(s, a)]
s ← s′
a ← a′
where γ is the discount factor, chosen according to actual convergence;
(7) Every N iterations, updating the parameters of the target value network with the parameters of the current value network.
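The loss L(ω) from step (5) can be computed over a sampled minibatch as in the sketch below; the callables standing in for the current and target value networks, and the batch format, are illustrative assumptions.

```python
import numpy as np

def dqn_loss(batch, q_current, q_target, gamma=0.9):
    """Mean squared TD error: E[(r + gamma * max_a' Q_target(s', a') - Q_current(s, a))^2].

    `batch` holds (s, a, r, s_next) transitions; `q_current` / `q_target`
    are callables mapping a state to a vector of action values.
    """
    errors = []
    for s, a, r, s_next in batch:
        target = r + gamma * float(np.max(q_target(s_next)))   # target network term
        errors.append((target - float(q_current(s)[a])) ** 2)  # squared TD error
    return float(np.mean(errors))
```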
Step four: the m scenic spots with highest probability output by the DQN network in the n scenic spot types are transmitted to the mobile data acquisition module for selection by a user, and n and m are natural numbers and can be set arbitrarily according to actual conditions, such as: the 10 scenic spots with highest probability output by the DQN network in the 30 scenic spot types are transmitted to a mobile data acquisition module for selection by a user, and 1-5 scenic spot types selected by the user and travel distance selected by the user are received;
step five: Based on the scenic spot types selected by the user, the travel distance, the user's history information (obtained by querying the online data module) and the user's location are feature-matched against all scenic spots; the X highest-scoring scenic spots under the evaluation system, excluding scenic spots the user has already visited, are selected, where X is a natural number set according to the actual situation (e.g. 3, 5 or 6); the X scenic spots are ranked according to user habit and then transmitted to the mobile data acquisition module for the user to choose. The evaluation system P is:
P = p1 + 0.01 * R' + 10 * I'
where p1 is the internal score given to the scenic spot on the basis of its playability, R' is the profit of the scenic spot, and I' is 1 if the user has visited this type of scenic spot, 0 otherwise;
wherein the user habit is obtained from experiments on randomly selected groups of people: it was found that among options of several similar scenic spots, users select the middle options with higher probability, so the higher-scoring options are placed in the middle when ranking; for example, in an experiment with 1000 randomly selected persons, users chose the middle three of 5 similar scenic spot options with higher probability, so the three scenic spots with the highest profit are placed in the middle when 5 scenic spots are ordered.
Deep reinforcement learning introduces a convolutional neural network on the basis of traditional reinforcement learning, which overcomes the curse of dimensionality that arises when reinforcement learning computes and stores state-action values one by one in a high-dimensional state space.
While the foregoing describes embodiments of the present invention, it should be understood that the description is merely illustrative and does not limit the scope of the invention; any modifications, equivalent substitutions or improvements made within the spirit and principles of the invention fall within its scope of protection.
Claims (6)
1. A personalized customer recommendation method based on deep reinforcement learning is characterized in that: the system comprises an image data acquisition module, a mobile data acquisition module, an online data module and a processing terminal, wherein the image data acquisition module is in communication connection with the processing terminal through a first communication module, the mobile data acquisition module is in communication connection with the processing terminal through a second communication module, and the online data module is in communication connection with the processing terminal through a third communication module;
the image data acquisition module is used for acquiring user head portrait information of a store, wherein the user head portrait information comprises the gender, age, consumption level and geographic position of a user;
the mobile data acquisition module is used for acquiring scenic spots selected by a user and user information, wherein the user information comprises whether the user selects the scenic spots or not;
the online data module is a database stored in the server, wherein the database stores user history information, and the user history information is scenic spots travelled by the user;
the processing terminal is used for establishing a scenic spot recommendation model for the received information and giving an optimal scenic spot recommendation scheme according to the scenic spot recommendation model;
the method comprises the following steps:
s1: the image data acquisition module acquires user information in real time;
s2: the image data acquisition module communicates with the processing terminal through a USB interface and transmits the acquired image information to the data processing terminal;
s3: after the data processing terminal receives the image information, the modeling unit establishes an environment model for the received data information; the decision unit provides an optimal scenic spot type recommendation scheme according to the environment model and transmits the scheme to the mobile data acquisition module;
s4: the mobile data acquisition module transmits the scenic spot selected by the user and the user information to the data processing terminal;
s5: after the data processing terminal receives the information, it acquires user history information through the online data module, combines profit data through feature matching, gives the final recommended scenic spot based on the user history information and user habits, and transmits the final recommended scenic spot to the mobile data acquisition module;
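The S1-S5 flow above can be sketched as a single function; the function and callback names below are illustrative assumptions, not part of the claim, and the final combination rule is a placeholder:

```python
def recommend(image_info, acquire_choice, query_history, model):
    """Hypothetical sketch of the claimed S1-S5 data flow.

    image_info     -- user attributes from the image data acquisition module (S1/S2)
    model          -- scenic-spot recommendation model of the processing terminal (S3)
    acquire_choice -- stand-in for the mobile data acquisition module (S4)
    query_history  -- stand-in for the online data module (S5)
    """
    types = model(image_info)       # S3: terminal recommends scenic spot types
    choice = acquire_choice(types)  # S4: user selects on the mobile terminal
    history = query_history()       # S5: fetch user history from the server
    # S5: final recommendation combines the selection with history
    # (placeholder rule: drop spots the user has already travelled)
    return [spot for spot in choice if spot not in history]
```

A usage example: with a model that proposes one type, a user who picks three spots, and a history containing one of them, the function returns the two unvisited spots.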
the building of the environment model in step S3 includes the following steps:
s31: modeling scenic spot recommendation problems as a Markov decision process model, and modeling states, actions and immediate rewarding functions in the scenic spot recommendation problems;
s32: establishing a return value function model;
s33: solving an optimal adjustment scheme by using a DQN deep reinforcement learning algorithm;
the specific modeling and deep reinforcement learning algorithm is as follows:
step one: modeling the scenic spot recommendation problem as an MDP model, and defining states, actions and immediate rewards functions therein;
(a) The state, denoted by s: if image analysis of a user entering the store at a certain moment yields gender Sex, age Age, consumption level Co and geographic position Pos, the user state at that moment can be expressed as:
s=(Sex,Age,Co,Pos)
wherein the consumption level is derived from the user's clothing;
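As an illustrative sketch of the state tuple s = (Sex, Age, Co, Pos) (the field names and example values are assumptions, not part of the claim):

```python
from collections import namedtuple

# Hypothetical encoding of the claimed state s = (Sex, Age, Co, Pos).
State = namedtuple("State", ["sex", "age", "co", "pos"])

def encode_state(sex, age, co, pos):
    """Pack the image-analysis attributes into the MDP state tuple."""
    return State(sex=sex, age=age, co=co, pos=pos)

# example: a 35-year-old user at an assumed latitude/longitude
s = encode_state(sex=1, age=35, co=2, pos=(30.25, 120.16))
```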
(b) The action, denoted by a: a scenic spot type being selected is recorded as 1 and not selected as 0. The system selects among n different types of scenic spots, each represented by the highest-scoring spot within its type according to an internal score based on scenic spot playability; the set of actions the system can take is:
a = {[0,1], …, [0,1]}, containing n entries of [0,1], where n is a natural number;
(c) The immediate reward function, denoted by r, indicates the user's selection preference in the system; the reward is expressed as:
r=r1+1.5*r2;
r1 = I·(10 + 0.01·R), where I is an indicator function that equals 1 when the user clicks and 0 otherwise, and R is the average profit of the corresponding scenic spot type;
r2 = I·(100 + 0.01·R), where R is the average profit of the type of the finally selected scenic spot;
where r1 represents the user clicking on the prize for the selected attraction type and r2 represents the prize for the attraction ultimately selected by the user;
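The claimed reward r = r1 + 1.5·r2 can be sketched directly; the parameter names are assumptions, and the indicator I and average profit R are passed in separately for the click and the final selection:

```python
def immediate_reward(clicked, selected, avg_profit_clicked, avg_profit_selected):
    """Sketch of the claimed reward r = r1 + 1.5*r2.

    clicked/selected  -- the indicator I (1 or 0) for the click / final selection
    avg_profit_*      -- the average profit R of the corresponding spot type
    """
    r1 = clicked * (10 + 0.01 * avg_profit_clicked)    # click reward
    r2 = selected * (100 + 0.01 * avg_profit_selected)  # final-selection reward
    return r1 + 1.5 * r2

# e.g. user clicks a type with average profit 500 and selects a spot of a
# type with average profit 800: r1 = 15, r2 = 108, r = 15 + 1.5*108 = 177
r = immediate_reward(1, 1, 500, 800)
```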
step two: establishing a value function return model;
let R (s, a) denote the return value of action a in state s, the value function Q (s, a) being the expectation of R (s, a), Q (s, a) =e [ R (s, a) ];
step three: solving an optimal strategy by using a DQN deep reinforcement learning algorithm;
(1) Initializing a memory playback unit, the capacity being N, for storing samples of training;
(2) Initializing a current value network and randomly initializing a weight parameter omega;
(3) Initializing a target value network, wherein the structure and the initialization weight are the same as those of the current value network;
(4) Obtain Q(s, a) for any state s through the current value network; after the value function is computed, select action a using an ε-greedy strategy. Each state transition is recorded as a time step t, and the data (s, a, r, s′) obtained at each time step are stored in the replay memory unit;
(5) Defining a loss function L (ω):
L(ω) = E[(r + γ·max_a′ Q(s′, a′; ω⁻) − Q(s, a; ω))²]
(6) Randomly sample a tuple (s, a, r, s′) from the replay memory unit, feed Q(s, a), s′ and r to the current value network, the target value network and L(ω), update ω by applying gradient descent to L(ω), and solve the optimal strategy; the DQN value-function update is:
Q(s,a) ← Q(s,a) + α[r + γ·max_a′ Q(s′, a′; ω⁻) − Q(s,a)]
s←s′
a←a′
wherein γ is the discount factor, chosen according to the actual convergence behaviour;
(7) Every N iterations, update the parameters of the target value network to those of the current value network;
step four: transmitting the m scenic spots with the highest probability output by the DQN network among the n scenic spot types to the mobile data acquisition module for the user to select, wherein n and m are natural numbers and n > m.
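Steps (1)-(7) of the DQN loop can be sketched compactly. As a loud caveat: the claim specifies a deep network, while the sketch below substitutes a linear value function so it stays self-contained; the class name, dimensions and hyperparameters are all assumptions:

```python
import random
import numpy as np

class DQNSketch:
    """Minimal linear-Q sketch of the claimed DQN loop, steps (1)-(7)."""

    def __init__(self, state_dim, n_actions, capacity=1000,
                 gamma=0.9, alpha=0.01, epsilon=0.1, sync_every=50):
        self.memory = []                       # (1) replay memory, capacity N
        self.capacity = capacity
        self.w = np.random.randn(n_actions, state_dim) * 0.01  # (2) current net ω
        self.w_target = self.w.copy()          # (3) target net, same init as ω
        self.gamma, self.alpha, self.epsilon = gamma, alpha, epsilon
        self.sync_every, self.steps = sync_every, 0
        self.n_actions = n_actions

    def q(self, s, w):
        return w @ s                           # Q(s, ·) for all actions

    def act(self, s):                          # (4) ε-greedy action selection
        if random.random() < self.epsilon:
            return random.randrange(self.n_actions)
        return int(np.argmax(self.q(s, self.w)))

    def store(self, transition):               # (4) store (s, a, r, s')
        if len(self.memory) >= self.capacity:
            self.memory.pop(0)
        self.memory.append(transition)

    def learn(self):                           # (5)-(6) sampled gradient step
        s, a, r, s2 = random.choice(self.memory)
        target = r + self.gamma * np.max(self.q(s2, self.w_target))
        td = target - self.q(s, self.w)[a]     # TD error inside the loss L(ω)
        self.w[a] += self.alpha * td * s       # gradient descent on L(ω)
        self.steps += 1
        if self.steps % self.sync_every == 0:  # (7) sync target every N steps
            self.w_target = self.w.copy()
```

Usage: construct the agent, act on a state, store the resulting transition and call `learn()`; in the patented system the argmax over Q scores would rank the n spot types, of which the top m are sent to the mobile terminal.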
2. The personalized customer recommendation method based on deep reinforcement learning of claim 1, wherein: the user history information in step S5 is obtained by querying the database in the server with the user's identity card; the final recommended scenic spot is selected using a comprehensive score of the user history information and profit data, and ranked according to the user's habits.
3. The personalized customer recommendation method based on deep reinforcement learning of claim 2, wherein:
step five: based on the scenic spot type selected by the user, feature matching is performed over all scenic spots using the travel distance, the user history information and the user's location; the X highest-scoring scenic spots, excluding those the user has already travelled, are selected according to the evaluation system; the X scenic spots are sorted according to the user's habits and then transmitted to the mobile data acquisition module for the user to select; the evaluation system P is:
P=p1+0.01*R’+10*I’
wherein p1 is the internal score of the scenic spot based on playability, R' is the profit of the scenic spot, and I' is 1 if the user has travelled this type of scenic spot and 0 otherwise;
wherein the user habit is obtained from experiments on randomly selected participants: when a user chooses among several similar scenic spots, the options found to have a higher selection probability are placed in the middle of the sorted list.
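The score P = p1 + 0.01·R' + 10·I' of claim 3 can be sketched as follows; the parameter names, and the reading of R' as profit and I' as the travelled-type indicator, follow the interpretation given in the claim text and are assumptions:

```python
def evaluation_score(p1, profit, travelled_type):
    """Sketch of the claimed evaluation system P = p1 + 0.01*R' + 10*I'.

    p1             -- internal playability score of the scenic spot
    profit         -- profit R' of the scenic spot (assumed reading)
    travelled_type -- indicator I': 1 if the user has travelled this
                      scenic spot type, else 0 (assumed reading)
    """
    return p1 + 0.01 * profit + 10 * travelled_type

# e.g. playability score 50, profit 800, user has travelled this type:
# P = 50 + 8 + 10 = 68
score = evaluation_score(50, 800, 1)
```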
4. The personalized customer recommendation method based on deep reinforcement learning of claim 1, wherein: the mobile data acquisition module is a mobile terminal.
5. The personalized customer recommendation method based on deep reinforcement learning of claim 1, wherein: the image data acquisition module comprises an electronic camera, and the electronic camera is in communication connection with the first communication module through a USB interface.
6. The personalized customer recommendation method based on deep reinforcement learning of claim 1, wherein: the first communication module and the second communication module are WI-FI modules.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110365717.6A CN113158086B (en) | 2021-04-06 | 2021-04-06 | Personalized customer recommendation system and method based on deep reinforcement learning |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113158086A CN113158086A (en) | 2021-07-23 |
CN113158086B true CN113158086B (en) | 2023-05-05 |
Family
ID=76888787
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110365717.6A Active CN113158086B (en) | 2021-04-06 | 2021-04-06 | Personalized customer recommendation system and method based on deep reinforcement learning |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113158086B (en) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114254837B (en) * | 2021-12-28 | 2024-07-02 | 西安交通大学 | Travel route customization method and system based on deep reinforcement learning |
Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111199458A (en) * | 2019-12-30 | 2020-05-26 | 北京航空航天大学 | Recommendation system based on meta-learning and reinforcement learning |
Family Cites Families (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
EP3079116A1 (en) * | 2015-04-10 | 2016-10-12 | Tata Consultancy Services Limited | System and method for generating recommendations |
CN110874780B (en) * | 2018-09-01 | 2023-11-14 | 昆山炫生活信息技术股份有限公司 | Scenic spot playing system and recommendation method based on big data statistics |
CN110263256B (en) * | 2019-06-21 | 2022-12-02 | 西安电子科技大学 | Personalized recommendation method based on multi-mode heterogeneous information |
CN111415198B (en) * | 2020-03-19 | 2023-04-28 | 桂林电子科技大学 | Tourist behavior preference modeling method based on reverse reinforcement learning |
CN112182398B (en) * | 2020-10-13 | 2022-05-10 | 福州大学 | Scenic spot recommendation method and system considering long-term preference and short-term preference of user |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||