CN113158086B - Personalized customer recommendation system and method based on deep reinforcement learning


Info

Publication number
CN113158086B
Authority
CN
China
Prior art keywords
user
data acquisition
acquisition module
scenic spot
processing terminal
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110365717.6A
Other languages
Chinese (zh)
Other versions
CN113158086A (en)
Inventor
陈建平
傅启明
黄泽天
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang Beierxiong Technology Co ltd
Original Assignee
Zhejiang Beierxiong Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang Beierxiong Technology Co ltd filed Critical Zhejiang Beierxiong Technology Co ltd
Priority to CN202110365717.6A
Publication of CN113158086A
Application granted
Publication of CN113158086B

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90Details of database functions independent of the retrieved data types
    • G06F16/95Retrieval from the web
    • G06F16/953Querying, e.g. by the use of web search engines
    • G06F16/9537Spatial or temporal dependent retrieval, e.g. spatiotemporal queries
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/50Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F16/58Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/583Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/50Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F16/58Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/587Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using geographical or spatial information, e.g. location
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/29Graphical models, e.g. Bayesian networks
    • G06F18/295Markov models or related models, e.g. semi-Markov models; Markov random fields; Networks embedding Markov models
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/10Services
    • G06Q50/14Travel agencies

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Business, Economics & Management (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Tourism & Hospitality (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Library & Information Science (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Software Systems (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Economics (AREA)
  • Human Resources & Organizations (AREA)
  • Marketing (AREA)
  • Primary Health Care (AREA)
  • Strategic Management (AREA)
  • General Business, Economics & Management (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Image Analysis (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses a personalized customer recommendation system and method based on deep reinforcement learning. The system comprises an image data acquisition module, a mobile data acquisition module, an online data module and a processing terminal, wherein the image data acquisition module is in communication connection with the processing terminal through a first communication module, the mobile data acquisition module through a second communication module, and the online data module through a third communication module.

Description

Personalized customer recommendation system and method based on deep reinforcement learning
Technical Field
The invention relates to the field of recommendation systems of tourist routes and scenic spots, in particular to a personalized customer recommendation system and method based on deep reinforcement learning.
Background
With economic development, the tourism industry has grown rapidly, people's willingness to travel has strengthened, and travel demands have diversified, which objectively makes it harder to provide travellers with the information they need. Personalized, customised recommendation of tourist attractions is also increasingly popular with travellers, yet most current recommendation systems lack precision when recommending attractions to individual users and fail to balance profit against user preference during recommendation: the recommended attractions are either profitable but disliked by users, or liked by users but unprofitable.
With the development of deep reinforcement learning, attraction recommendation that exploits image information and auxiliary information for different users is becoming increasingly important. Current user-oriented recommendation systems rely only on user images or a small set of user labels; this has some value in generality, but cannot achieve high accuracy or truly personalized recommendation, and therefore cannot satisfy people's demands for personalized and precise recommendation.
Disclosure of Invention
The technical problem solved by the invention is to provide a personalized customer recommendation system, based on deep reinforcement learning, that can achieve accurate recommendation.
The technical scheme adopted for solving the technical problems is as follows: the personalized customer recommendation system based on deep reinforcement learning comprises an image data acquisition module, a mobile data acquisition module, an online data module and a processing terminal, wherein the image data acquisition module is in communication connection with the processing terminal through a first communication module, the mobile data acquisition module is in communication connection with the processing terminal through a second communication module, and the online data module is in communication connection with the processing terminal through a third communication module;
the image data acquisition module is used for acquiring facial image information of users in the store;
the mobile data acquisition module is used for acquiring scenic spots selected by users and user information;
the online data module is a database stored in a server, and user history information is stored in the database;
the processing terminal is used for establishing an environment model for the received information and giving an optimal scenic spot recommendation scheme according to the environment model.
Further: the mobile data acquisition module is a mobile terminal.
Further: the image data acquisition module comprises an electronic camera, and the electronic camera is in communication connection with the first communication module through a USB interface.
Further: the first communication module and the second communication module are WI-FI modules.
The invention also discloses a personalized customer recommendation method based on deep reinforcement learning, which comprises the following steps:
s1: the image data acquisition module acquires user information in real time;
s2: the image data acquisition module communicates with the processing terminal through a USB interface and transmits the acquired image information to the data processing terminal;
s3: after the data processing terminal receives the image information, the modeling unit establishes an environment model for the received data information; the decision unit provides an optimal scenic spot type recommendation scheme according to the environment model and transmits the scheme to the mobile data acquisition module;
s4: the mobile data acquisition module transmits the scenic spot selected by the user and the user information to the data processing terminal;
s5: after the data processing terminal receives the information, it acquires user history information through the online data module, combines profit data through feature matching, derives the final recommended scenic spots from the user history information and user habits, and transmits them to the mobile data acquisition module.
Further: the building of the environment model in step S3 comprises the following steps:
s31: modeling the scenic spot recommendation problem as a Markov decision process model, and modeling the states, actions and immediate reward functions therein;
s32: establishing a return value function model;
s33: and solving an optimal adjustment scheme by using a DQN deep reinforcement learning algorithm.
Further: the user history information in step S5 is obtained by querying the database in the server with the user's identity card; the final recommended scenic spots are given by selecting on the comprehensive scores of the user history information and profit data, and ranking based on the user's habits.
Further: the specific modeling and deep reinforcement learning algorithm is as follows:
step one: modeling the scenic spot recommendation problem as an MDP model, and defining states, actions and immediate rewards functions therein;
(a) The state, denoted s: if image analysis of a user entering the store at a given moment yields gender Sex, age Age, consumption level Co and geographic position Pos, the user state at that moment can be expressed as:
s=(Sex,Age,Co,Pos)
where the consumption level is inferred from the user's clothing (an illustrative encoding of s is sketched after item (c) below);
(b) The action, denoted a: a scenic spot type is marked 1 if selected and 0 otherwise. Suppose the system selects among n different scenic spot types, each type being represented by its highest-rated scenic spot according to an internal score based on playability; the set of actions the system can take is:
a = {[0,1], ……, [0,1]}, with n copies of [0,1], n being a natural number;
(c) The immediate reward function, denoted r, indicates the user's selection preference in the system. The reward is expressed as:
r=r1+1.5*r2
r1 = I·(10 + 0.01·R1), where I is an indicator function equal to 1 when the user clicks and 0 otherwise, and R1 is the average profit of the corresponding scenic spot type;
r2 = I·(100 + 0.01·R2), where R2 is the average profit of the type of the selected scenic spot;
here r1 represents the reward for the user clicking the recommended scenic spot type, and r2 represents the reward for the scenic spot finally selected by the user;
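By way of illustration only, the state s=(Sex,Age,Co,Pos) could be encoded as a numeric feature vector for the value network; the field ranges and normalisation below are assumptions, not details fixed by the patent:

```python
import numpy as np

# Hypothetical encoding of the user state s = (Sex, Age, Co, Pos) as a
# feature vector; ranges and normalisation are illustrative assumptions.
def encode_state(sex: int, age: int, co: int, pos: tuple) -> np.ndarray:
    """sex: 0/1; age in years; co: consumption level (0=low..2=high);
    pos: (latitude, longitude)."""
    return np.array([
        float(sex),
        age / 100.0,     # rough normalisation to about [0, 1]
        co / 2.0,
        pos[0] / 90.0,   # latitude
        pos[1] / 180.0,  # longitude
    ], dtype=np.float32)
```

Likewise, a minimal sketch of the reward defined in (c); the argument names are illustrative, the two boolean flags play the role of the indicator I, and the constants follow the definitions above:

```python
# Sketch of the immediate reward r = r1 + 1.5*r2 as defined above.
# type_profit and spot_profit are the average profits R1 and R2.
def immediate_reward(clicked_type: bool, type_profit: float,
                     selected_spot: bool, spot_profit: float) -> float:
    r1 = (10 + 0.01 * type_profit) if clicked_type else 0.0
    r2 = (100 + 0.01 * spot_profit) if selected_spot else 0.0
    return r1 + 1.5 * r2
```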
step two: establishing a value function return model;
Let R(s,a) denote the return of taking action a in state s; the value function Q(s,a) is the expectation of R(s,a): Q(s,a) = E[R(s,a)];
step three: solving an optimal strategy by using a DQN deep reinforcement learning algorithm;
(1) Initializing the replay memory unit, with capacity N, for storing training samples;
(2) Initializing the current value network and randomly initializing its weight parameters ω;
(3) Initializing the target value network, with the same structure and initial weights as the current value network;
(4) Obtaining Q(s,a) in any state s through the current value network; after the value function is calculated, selecting an action a with an ε-greedy strategy, recording each state transition as a time step t, and storing the data (s,a,r,s′) obtained at each time step in the replay memory unit;
(5) Defining the loss function L(ω):
L(ω) = E[(r + γ·max_a′ Q(s′, a′; ω⁻) − Q(s, a; ω))²]
(6) Randomly sampling a transition (s,a,r,s′) from the replay memory unit, feeding s to the current value network and s′ to the target value network, updating ω by gradient descent on L(ω), and solving the optimal strategy; the method for updating the value function with the DQN algorithm is:
Q(s,a) ← Q(s,a) + α[r + γ·max_a′ Q(s′, a′; ω⁻) − Q(s,a)]
s ← s′
a ← a′
where γ is the discount factor, chosen according to the actual convergence behaviour;
(7) Every N iterations, updating the parameters of the target value network to the parameters of the current value network;
Step four: transmitting the m scenic spot types with the highest probability among the n types output by the DQN network to the mobile data acquisition module for selection by the user.
Further: step five: based on the scenic spot types selected by the user, the chosen travel distance, the user's history information and the user's location, feature matching is performed over all scenic spots; the X highest-scoring scenic spots, excluding those the user has already travelled, are selected under the evaluation system, ordered based on the user's habits, and transmitted to the mobile data acquisition module for the user to select; the evaluation system P is:
P=p1+0.01*R’+10*I’
where p1 is the internal score given to the scenic spot based on its playability, R′ is the profit of the scenic spot, and I′ is 1 if the user has travelled scenic spots of this type and 0 otherwise;
the user habits are obtained from experiments on randomly selected groups of people: it was found that, among several options of similar scenic spots, users select the middle options with higher probability, so the preferred spots are placed in the middle when ordering.
The beneficial effects of the invention are as follows: the invention can recommend scenic spots accurately according to the characteristics and preferences of different users while taking the travel agency's profit into account, which can greatly improve user satisfaction and also raise the travel agency's profit to a certain extent.
Drawings
FIG. 1 is a schematic diagram of an intelligent scenic spot recommendation system.
Fig. 2 is a schematic diagram of a data processing terminal structure of the intelligent scenic spot recommendation system of the invention.
FIG. 3 is a schematic diagram of a training process of the DQN deep reinforcement learning algorithm.
FIG. 4 is a flow chart of a method of the intelligent scenic spot recommendation system of the invention.
Detailed Description
In order that the above objects, features and advantages of the invention may be readily understood, a more particular description of the invention is given below with reference to the appended drawings. Numerous specific details are set forth in the following description in order to provide a thorough understanding of the present invention. The invention may, however, be embodied in many forms other than those described herein, and those skilled in the art can make similar modifications without departing from its spirit; the invention is therefore not limited to the specific embodiments disclosed below.
It will be understood that when an element is referred to as being "fixed to" another element, it can be directly on the other element or intervening elements may also be present. When an element is referred to as being "connected" to another element, it can be directly connected to the other element or intervening elements may also be present.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. The terminology used herein in the description of the invention is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. The term "and/or" as used herein includes any and all combinations of one or more of the associated listed items.
A personalized customer recommendation system based on deep reinforcement learning as shown in fig. 1, wherein: the system comprises an image data acquisition module, a mobile data acquisition module, an online data module and a processing terminal, wherein the image data acquisition module is in communication connection with the processing terminal through a first communication module, the mobile data acquisition module is in communication connection with the processing terminal through a second communication module, and the online data module is in communication connection with the processing terminal through a third communication module;
the image data acquisition module is used for acquiring facial image information of users in the store and may specifically comprise two acquisition cameras, one aimed at the gate and the other at the hall;
the mobile data acquisition module is used for acquiring scenic spots selected by users and user information;
the online data module is a database stored in a server, and user history information is stored in the database;
the processing terminal is used for establishing an environment model for the received information and giving an optimal scenic spot recommendation scheme according to the environment model;
In operation, every 4 frames of acquired user images are grouped and passed through a deep neural network, and the result is transmitted to the processing terminal. The processing terminal collects the data, constructs a Markov decision process by a reinforcement learning method, and solves for the optimal strategy, i.e., selects the best-matching scenic spot types from a characteristic scenic-spot-type pool according to the user images. The selected best-matching types are transmitted to the mobile-device APP of the travel agency staff, and the user picks some favoured types from them; the user's favoured scenic spot types and selected travel distance (short, medium or long) are transmitted to the processing terminal through the mobile device. Combining the acquired information with the user database on the online server, the processing terminal obtains the scenic spots the user has already travelled, then, comprehensively considering user preference and profit, selects from a scenic-spot pool limited by the travel agency's location and the chosen types (excluding the spots the user has travelled), and finally gives the optimal recommended scenic spots.
Specifically, the mobile data acquisition module is a mobile terminal; the image data acquisition module comprises an electronic camera in communication connection with the first communication module through a USB interface; the first communication module and the second communication module are WI-FI modules, of model SKW77.
In operation, the system performs the following steps:
s1: the image data acquisition module acquires user information in real time;
s2: the image data acquisition module communicates with the processing terminal through a USB interface and transmits the acquired image information to the data processing terminal;
s3: after the data processing terminal receives the image information, the modeling unit establishes an environment model for the received data information; the decision unit provides an optimal scenic spot type recommendation scheme according to the environment model and transmits the scheme to the mobile data acquisition module;
s4: the mobile data acquisition module transmits the scenic spot selected by the user and the user information to the data processing terminal;
s5: after receiving the information, the data processing terminal acquires user history information through the online data module, combines profit data through feature matching, derives the final recommended scenic spots from the user history information and user habits, and transmits them to the mobile data acquisition module; the user history information is obtained by querying the database in the server with the user's identity card, and the final recommended scenic spots are given by feature matching: selecting on the comprehensive scores of the user history information and profit data, and ranking based on the user's habits.
Wherein: the building of the environment model in step S3 includes the following steps:
s31: modeling the scenic spot recommendation problem as a Markov decision process model, and modeling the states, actions and immediate reward functions therein;
s32: establishing a return value function model;
s33: and solving an optimal adjustment scheme by using a DQN deep reinforcement learning algorithm.
The specific modeling and deep reinforcement learning algorithm is as follows:
step one: modeling the scenic spot recommendation problem as an MDP model, and defining states, actions and immediate rewards functions therein;
(a) The state, denoted s: if image analysis of a user entering the store at a given moment yields gender Sex, age Age, consumption level Co and geographic position Pos, the user state at that moment can be expressed as:
s=(Sex,Age,Co,Pos)
where the consumption level is inferred from the user's clothing;
(b) The action, denoted a: a scenic spot type is marked 1 if selected and 0 otherwise. Suppose the system selects among n different scenic spot types, each type being represented by its highest-rated scenic spot according to an internal score based on playability; the set of actions the system can take is:
a = {[0,1], ……, [0,1]}, with n copies of [0,1];
(c) The immediate reward function, denoted r, indicates the user's selection preference in the system. The reward is expressed as:
r=r1+1.5*r2
r1 = I·(10 + 0.01·R1), where I is an indicator function equal to 1 when the user clicks and 0 otherwise, and R1 is the average profit of the corresponding scenic spot type;
r2 = I·(100 + 0.01·R2), where R2 is the average profit of the type of the selected scenic spot;
here r1 represents the reward for the user clicking the recommended scenic spot type, and r2 represents the reward for the scenic spot finally selected by the user;
step two: establishing a value function return model;
Let R(s,a) denote the return of taking action a in state s; the value function Q(s,a) is the expectation of R(s,a): Q(s,a) = E[R(s,a)];
step three: solving an optimal strategy by using a DQN deep reinforcement learning algorithm;
(1) Initializing the replay memory unit, with capacity N, for storing training samples;
(2) Initializing the current value network and randomly initializing its weight parameters ω;
(3) Initializing the target value network, with the same structure and initial weights as the current value network;
(4) Obtaining Q(s,a) in any state s through the current value network; after the value function is calculated, selecting an action a with an ε-greedy strategy, recording each state transition as a time step t, and storing the data (s,a,r,s′) obtained at each time step in the replay memory unit;
(5) Defining the loss function L(ω):
L(ω) = E[(r + γ·max_a′ Q(s′, a′; ω⁻) − Q(s, a; ω))²]
(6) Randomly sampling a transition (s,a,r,s′) from the replay memory unit, feeding s to the current value network and s′ to the target value network, updating ω by gradient descent on L(ω), and solving the optimal strategy; the method for updating the value function with the DQN algorithm is:
Q(s,a) ← Q(s,a) + α[r + γ·max_a′ Q(s′, a′; ω⁻) − Q(s,a)]
s ← s′
a ← a′
where γ is the discount factor, chosen according to the actual convergence behaviour;
(7) Every N iterations, updating the parameters of the target value network to the parameters of the current value network.
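A compact sketch of steps (1)-(7) in PyTorch is given below for illustration; the network sizes, hyper-parameters and batch handling are assumptions, not details fixed by the patent, and transitions (s,a,r,s′) collected as in step (4) are assumed to be appended to `replay`:

```python
import random
from collections import deque

import torch
import torch.nn as nn

STATE_DIM, N_TYPES = 5, 30          # state features; n scenic-spot types (assumed)
GAMMA, ALPHA, EPS = 0.9, 1e-3, 0.1  # discount, learning rate, epsilon (assumed)
CAPACITY, BATCH, SYNC_EVERY = 10_000, 32, 100

def make_net():
    return nn.Sequential(nn.Linear(STATE_DIM, 64), nn.ReLU(),
                         nn.Linear(64, N_TYPES))

replay = deque(maxlen=CAPACITY)                 # (1) replay memory, capacity N
q_net = make_net()                              # (2) current value network, random ω
target_net = make_net()                         # (3) target value network,
target_net.load_state_dict(q_net.state_dict())  #     same structure and weights
opt = torch.optim.SGD(q_net.parameters(), lr=ALPHA)

def select_action(state) -> int:                # (4) ε-greedy on Q(s, ·)
    if random.random() < EPS:
        return random.randrange(N_TYPES)
    with torch.no_grad():
        return int(q_net(torch.as_tensor(state, dtype=torch.float32)).argmax())

def train_step(step: int) -> None:
    if len(replay) < BATCH:
        return
    s, a, r, s2 = map(torch.as_tensor, zip(*random.sample(replay, BATCH)))
    q_sa = q_net(s.float()).gather(1, a.long().unsqueeze(1)).squeeze(1)
    with torch.no_grad():                        # (5) target r + γ·max_a′ Q(s′,a′;ω⁻)
        target = r.float() + GAMMA * target_net(s2.float()).max(1).values
    loss = ((target - q_sa) ** 2).mean()         #     L(ω), squared TD error
    opt.zero_grad(); loss.backward(); opt.step() # (6) gradient descent on L(ω)
    if step % SYNC_EVERY == 0:                   # (7) sync target network
        target_net.load_state_dict(q_net.state_dict())
```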
Step four: the m scenic spots with highest probability output by the DQN network in the n scenic spot types are transmitted to the mobile data acquisition module for selection by a user, and n and m are natural numbers and can be set arbitrarily according to actual conditions, such as: the 10 scenic spots with highest probability output by the DQN network in the 30 scenic spot types are transmitted to a mobile data acquisition module for selection by a user, and 1-5 scenic spot types selected by the user and travel distance selected by the user are received;
Step five: based on the scenic spot types selected by the user, the chosen travel distance, the user's history information (obtained by querying through the online data module) and the user's location, feature matching is performed over all scenic spots, and the X highest-scoring scenic spots, excluding those the user has already travelled, are selected under the evaluation system; X is a natural number that can be set according to the actual situation, e.g. 3, 5 or 6. The X scenic spots are ordered based on the user's habits and then transmitted to the mobile data acquisition module for the user to select. The evaluation system P is:
P=p1+0.01*R’+10*I’
where p1 is the internal score given to the scenic spot based on its playability, R′ is the profit of the scenic spot, and I′ is 1 if the user has travelled scenic spots of this type and 0 otherwise;
The user habits are obtained from experiments on randomly selected groups of people: it was found that, among options of several similar scenic spots, users select certain positions with higher probability, and the spots to be favoured are placed in those positions when ordering. For example, 1000 persons were randomly selected for an experiment, and among 5 similar scenic spot options users chose the middle three with higher probability; therefore, when ordering 5 scenic spots, the three with the highest profit are placed in the middle.
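A sketch of the evaluation system and the habit-based ordering follows; the reading of p1 (playability score), R′ (profit) and I′ (travelled-type indicator) matches the reconstruction above and is an assumption:

```python
# Evaluation P = p1 + 0.01*R' + 10*I' and "best spots in the middle"
# ordering, per the 5-spot example above; symbol reading is an assumption.
def evaluate(p1: float, profit: float, travelled_type: bool) -> float:
    return p1 + 0.01 * profit + 10.0 * float(travelled_type)

def order_best_in_middle(spots):
    """spots: list of (name, score); places the highest-scoring spots in
    the middle positions of the returned list."""
    ranked = sorted(spots, key=lambda x: x[1], reverse=True)
    out = [None] * len(ranked)
    mid = len(ranked) // 2
    # fill positions in order of their distance from the middle slot
    positions = sorted(range(len(ranked)), key=lambda i: abs(i - mid))
    for pos, item in zip(positions, ranked):
        out[pos] = item
    return out
```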
Deep reinforcement learning introduces convolutional neural networks on top of traditional reinforcement learning, thereby alleviating the curse of dimensionality that arises when reinforcement learning computes and stores state-action values one by one in a high-dimensional state space.
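As background, a value network in the original DQN style, taking the 4-frame image groups mentioned earlier stacked as 4 input channels, might look as follows; the 84×84 input and layer sizes are assumptions:

```python
import torch.nn as nn

# Hypothetical convolutional Q-network for stacked 4-frame image input,
# in the style of the original DQN; all sizes are illustrative.
class ConvQNet(nn.Module):
    def __init__(self, n_actions: int = 30):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(4, 16, kernel_size=8, stride=4), nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=4, stride=2), nn.ReLU(),
            nn.Flatten())
        self.head = nn.Sequential(nn.Linear(32 * 9 * 9, 256), nn.ReLU(),
                                  nn.Linear(256, n_actions))

    def forward(self, x):  # x: (batch, 4, 84, 84) image stack
        return self.head(self.features(x))
```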
The foregoing merely describes embodiments of the present invention and is not intended to limit its scope; any modifications, equivalent substitutions or improvements made within the spirit and principles of the invention fall within the scope of protection of the invention.

Claims (6)

1. A personalized customer recommendation method based on deep reinforcement learning is characterized in that: the system comprises an image data acquisition module, a mobile data acquisition module, an online data module and a processing terminal, wherein the image data acquisition module is in communication connection with the processing terminal through a first communication module, the mobile data acquisition module is in communication connection with the processing terminal through a second communication module, and the online data module is in communication connection with the processing terminal through a third communication module;
the image data acquisition module is used for acquiring facial image information of users in the store, wherein the facial image information comprises the user's gender, age, consumption level and geographic position;
the mobile data acquisition module is used for acquiring scenic spots selected by a user and user information, wherein the user information comprises whether the user selects the scenic spots or not;
the online data module is a database stored in the server, wherein the database stores user history information, the user history information being the scenic spots the user has travelled;
the processing terminal is used for establishing a scenic spot recommendation model for the received information and giving an optimal scenic spot recommendation scheme according to the scenic spot recommendation model;
the method comprises the following steps:
s1: the image data acquisition module acquires user information in real time;
s2: the image data acquisition module communicates with the processing terminal through a USB interface and transmits the acquired image information to the data processing terminal;
s3: after the data processing terminal receives the image information, the modeling unit establishes an environment model for the received data information; the decision unit provides an optimal scenic spot type recommendation scheme according to the environment model and transmits the scheme to the mobile data acquisition module;
s4: the mobile data acquisition module transmits the scenic spot selected by the user and the user information to the data processing terminal;
s5: after the data processing terminal receives the information, acquiring user history information through the online data module, combining profit data through feature matching, deriving the final recommended scenic spots from the user history information and user habits, and transmitting them to the mobile data acquisition module;
the building of the environment model in step S3 includes the following steps:
s31: modeling the scenic spot recommendation problem as a Markov decision process model, and modeling the states, actions and immediate reward functions therein;
s32: establishing a return value function model;
s33: solving an optimal adjustment scheme by using a DQN deep reinforcement learning algorithm;
the specific modeling and deep reinforcement learning algorithm is as follows:
step one: modeling the scenic spot recommendation problem as an MDP model, and defining states, actions and immediate rewards functions therein;
(a) The state, denoted s: if image analysis of a user entering the store at a given moment yields gender Sex, age Age, consumption level Co and geographic position Pos, the user state at that moment can be expressed as:
s=(Sex,Age,Co,Pos)
where the consumption level is inferred from the user's clothing;
(b) The action, denoted a: a scenic spot type is marked 1 if selected and 0 otherwise. Suppose the system selects among n different scenic spot types, each type being represented by its highest-rated scenic spot according to an internal score based on playability; the set of actions the system can take is:
a = {[0,1], ……, [0,1]}, with n copies of [0,1], n being a natural number;
(c) The immediate reward function, denoted r, indicates the user's selection preference in the system. The reward is expressed as:
r=r1+1.5*r2;
r1 = I·(10 + 0.01·R1), where I is an indicator function equal to 1 when the user clicks and 0 otherwise, and R1 is the average profit of the corresponding scenic spot type;
r2 = I·(100 + 0.01·R2), where R2 is the average profit of the type of the selected scenic spot;
here r1 represents the reward for the user clicking the recommended scenic spot type, and r2 represents the reward for the scenic spot finally selected by the user;
step two: establishing a value function return model;
Let R(s,a) denote the return of taking action a in state s; the value function Q(s,a) is the expectation of R(s,a): Q(s,a) = E[R(s,a)];
step three: solving an optimal strategy by using a DQN deep reinforcement learning algorithm;
(1) Initializing the replay memory unit, with capacity N, for storing training samples;
(2) Initializing the current value network and randomly initializing its weight parameters ω;
(3) Initializing the target value network, with the same structure and initial weights as the current value network;
(4) Obtaining Q(s,a) in any state s through the current value network; after the value function is calculated, selecting an action a with an ε-greedy strategy, recording each state transition as a time step t, and storing the data (s,a,r,s′) obtained at each time step in the replay memory unit;
(5) Defining the loss function L(ω):
L(ω) = E[(r + γ·max_a′ Q(s′, a′; ω⁻) − Q(s, a; ω))²]
(6) Randomly sampling a transition (s,a,r,s′) from the replay memory unit, feeding s to the current value network and s′ to the target value network, updating ω by gradient descent on L(ω), and solving the optimal strategy; the method for updating the value function with the DQN algorithm is:
Q(s,a) ← Q(s,a) + α[r + γ·max_a′ Q(s′, a′; ω⁻) − Q(s,a)]
s ← s′
a ← a′
where γ is the discount factor, chosen according to the actual convergence behaviour;
(7) Every N iterations, updating the parameters of the target value network to the parameters of the current value network;
Step four: transmitting the m scenic spot types with the highest probability among the n types output by the DQN network to the mobile data acquisition module for user selection, wherein n and m are natural numbers and n is greater than m.
2. The personalized customer recommendation method based on deep reinforcement learning of claim 1, wherein: the user history information in step S5 is obtained by querying the database in the server with the user's identity card; the final recommended scenic spots are given by selecting on the comprehensive scores of the user history information and profit data, and ranking based on the user's habits.
3. The personalized customer recommendation method based on deep reinforcement learning of claim 2, wherein:
Step five: based on the scenic spot types selected by the user, the chosen travel distance, the user's history information and the user's location, feature matching is performed over all scenic spots; the X highest-scoring scenic spots, excluding those the user has already travelled, are selected under the evaluation system, ordered based on the user's habits, and transmitted to the mobile data acquisition module for the user to select; the evaluation system P is:
P=p1+0.01*R’+10*I’
where p1 is the internal score given to the scenic spot based on its playability, R′ is the profit of the scenic spot, and I′ is 1 if the user has travelled scenic spots of this type and 0 otherwise;
the user habits are obtained from experiments on randomly selected groups of people: it was found that, among several options of similar scenic spots, users select the middle options with higher probability, so the preferred spots are placed in the middle when ordering.
4. The personalized customer recommendation method based on deep reinforcement learning of claim 1, wherein: the mobile data acquisition module is a mobile terminal.
5. The personalized customer recommendation method based on deep reinforcement learning of claim 1, wherein: the image data acquisition module comprises an electronic camera, and the electronic camera is in communication connection with the first communication module through a USB interface.
6. The personalized customer recommendation method based on deep reinforcement learning of claim 1, wherein: the first communication module and the second communication module are WI-FI modules.
CN202110365717.6A 2021-04-06 2021-04-06 Personalized customer recommendation system and method based on deep reinforcement learning Active CN113158086B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110365717.6A CN113158086B (en) 2021-04-06 2021-04-06 Personalized customer recommendation system and method based on deep reinforcement learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110365717.6A CN113158086B (en) 2021-04-06 2021-04-06 Personalized customer recommendation system and method based on deep reinforcement learning

Publications (2)

Publication Number Publication Date
CN113158086A CN113158086A (en) 2021-07-23
CN113158086B (en) 2023-05-05

Family

ID=76888787

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110365717.6A Active CN113158086B (en) 2021-04-06 2021-04-06 Personalized customer recommendation system and method based on deep reinforcement learning

Country Status (1)

Country Link
CN (1) CN113158086B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114254837B (en) * 2021-12-28 2024-07-02 西安交通大学 Travel route customization method and system based on deep reinforcement learning

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111199458A (en) * 2019-12-30 2020-05-26 北京航空航天大学 Recommendation system based on meta-learning and reinforcement learning

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP3079116A1 (en) * 2015-04-10 2016-10-12 Tata Consultancy Services Limited System and method for generating recommendations
CN110874780B (en) * 2018-09-01 2023-11-14 昆山炫生活信息技术股份有限公司 Scenic spot playing system and recommendation method based on big data statistics
CN110263256B (en) * 2019-06-21 2022-12-02 西安电子科技大学 Personalized recommendation method based on multi-mode heterogeneous information
CN111415198B (en) * 2020-03-19 2023-04-28 桂林电子科技大学 Tourist behavior preference modeling method based on reverse reinforcement learning
CN112182398B (en) * 2020-10-13 2022-05-10 福州大学 Scenic spot recommendation method and system considering long-term preference and short-term preference of user

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111199458A (en) * 2019-12-30 2020-05-26 北京航空航天大学 Recommendation system based on meta-learning and reinforcement learning

Also Published As

Publication number Publication date
CN113158086A (en) 2021-07-23

Similar Documents

Publication Publication Date Title
US8577962B2 (en) Server apparatus, client apparatus, content recommendation method, and program
CN107291888B (en) Machine learning statistical model-based living recommendation system method near living hotel
CN106682035A (en) Individualized learning recommendation method and device
CN108665083A (en) A kind of method and system for advertisement recommendation for dynamic trajectory model of being drawn a portrait based on user
Renjith et al. A personalized mobile travel recommender system using hybrid algorithm
CN109872664A (en) A kind of wisdom tour guide device
Perić et al. Determinants of active sport event tourists’ expenditure–the case of mountain bikers and trail runners
CN113158086B (en) Personalized customer recommendation system and method based on deep reinforcement learning
US20180197149A1 (en) Automated methods and systems to schedule activities
CN110781256B (en) Method and device for determining POI matched with Wi-Fi based on sending position data
CN116823534A (en) Intelligent service virtual man system for text travel industry based on multi-mode large model
CN114386664A (en) Personalized travel route recommendation method based on reinforcement learning
CN113515693A (en) City scenic spot wisdom recommendation system
CN116503209A (en) Digital twin system based on artificial intelligence and data driving
CN104915365B (en) Activity flow recommendation method and device
Celdir et al. Popularity bias in online dating platforms: Theory and empirical evidence
CN116226537A (en) Layout and display method, device, equipment and medium of page modules in page
CN115438871A (en) Ice and snow scenic spot recommendation method and system integrating preference and eliminating popularity deviation
CN114721572A (en) Visual display method, device, medium, equipment and system for dream
CN102959560A (en) Automatic appeal measurement method
CN110809489B (en) Information processing apparatus, information processing method, and storage medium
US20170169490A1 (en) Automated Personalized Product Specification
Papadakis et al. Visit planner: A personalized mobile trip design application based on a hybrid recommendation model
CN117273869B (en) Intelligent agricultural product pushing method, system, device and medium based on user data
CN113537548B (en) Recommendation method, device and equipment for driving route

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant