CN113129108B - Product recommendation method and device based on Double DQN algorithm - Google Patents

Product recommendation method and device based on Double DQN algorithm

Info

Publication number
CN113129108B
CN113129108B CN202110452994.0A CN202110452994A CN113129108B CN 113129108 B CN113129108 B CN 113129108B CN 202110452994 A CN202110452994 A CN 202110452994A CN 113129108 B CN113129108 B CN 113129108B
Authority
CN
China
Prior art keywords
product
historical
basic information
feature
target user
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110452994.0A
Other languages
Chinese (zh)
Other versions
CN113129108A (en)
Inventor
王光臣
张衡
张盼盼
王宇
潘宇光
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shandong University
Original Assignee
Shandong University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shandong University filed Critical Shandong University
Priority to CN202110452994.0A priority Critical patent/CN113129108B/en
Publication of CN113129108A publication Critical patent/CN113129108A/en
Application granted granted Critical
Publication of CN113129108B publication Critical patent/CN113129108B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00Commerce
    • G06Q30/06Buying, selling or leasing transactions
    • G06Q30/0601Electronic shopping [e-shopping]
    • G06Q30/0631Item recommendations
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Biomedical Technology (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Mathematical Physics (AREA)
  • Biophysics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Accounting & Taxation (AREA)
  • Finance (AREA)
  • Development Economics (AREA)
  • Economics (AREA)
  • Marketing (AREA)
  • Strategic Management (AREA)
  • General Business, Economics & Management (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses a product recommendation method and system based on the Double DQN algorithm, comprising the following steps: obtaining basic information of a target user; inputting the basic information of the target user into a trained Double DQN model, which outputs a predicted satisfaction degree for each product; and sorting the products in descending order of predicted satisfaction and recommending the sorted products to the target user. Not only the user's personal information, such as personal risk preference and income, but also information about the product itself, such as its historical purchase data and purchase satisfaction, is fully analyzed, so that the most suitable products are recommended to the user.

Description

Product recommendation method and device based on Double DQN algorithm
Technical Field
The invention relates to the technical field of product recommendation, in particular to a product recommendation method and device based on a Double DQN algorithm.
Background
The statements in this section merely relate to the background of the present disclosure and may not necessarily constitute prior art.
In recent years, with the rapid development of internet technology, product recommendation systems have rapidly developed, and are now widely used in various services such as e-commerce services and financial product recommendation services.
Current product recommendation methods are generally based on user information: they analyze data such as the user's risk preference to obtain a similarity between users and products, and then recommend products according to that similarity. However, existing methods do not fully analyze information about the products a user purchases, such as a product's historical purchase data and price changes, and therefore cannot recommend products accurately to the clients who need them.
Therefore, the product recommendation methods and devices in the prior art are not well designed, cannot meet users' requirements, and cannot provide a satisfactory user experience.
Disclosure of Invention
In order to solve the defects in the prior art, the invention provides a product recommendation method and device based on a Double DQN algorithm.
In a first aspect, the invention provides a product recommendation method based on a Double DQN algorithm;
a product recommendation method based on Double DQN algorithm comprises the following steps:
basic information of a target user is obtained;
processing the basic information of the target user and extracting its features;
inputting the features representing the basic information of the target user into a trained deep reinforcement learning model to obtain the predicted satisfaction degree of each product;
sorting the products in descending order of predicted satisfaction degree, and recommending the sorted products to the target user;
the deep reinforcement learning model refers to a Double DQN algorithm.
In a second aspect, the invention provides a product recommendation device based on a Double DQN algorithm;
product recommendation device based on Double DQN algorithm includes:
an acquisition module configured to: obtain basic information of a target user;
a feature extraction module configured to: process the basic information of the target user and extract its features;
a prediction module configured to: input the features representing the basic information of the target user into a trained deep reinforcement learning model to obtain the predicted satisfaction degree of each product;
a recommendation module configured to: sort the products in descending order of predicted satisfaction degree and recommend the sorted products to the target user;
the deep reinforcement learning model refers to a Double DQN algorithm.
In a third aspect, the present invention also provides an electronic device, including: one or more processors, one or more memories, and one or more computer programs; wherein the processor is coupled to the memory, the one or more computer programs being stored in the memory, the processor executing the one or more computer programs stored in the memory when the electronic device is running, to cause the electronic device to perform the method of the first aspect.
In a fourth aspect, the present invention also provides a computer readable storage medium storing computer instructions which, when executed by a processor, perform the method of the first aspect.
Compared with the prior art, the invention has the following beneficial effects: not only the user's personal information, such as personal risk preference and income, but also information about the product itself, such as its historical purchase data and purchase satisfaction, is utilized, so that the most suitable products are recommended to the user.
The invention applies the Double DQN algorithm (a double Q-learning algorithm in deep reinforcement learning) to product recommendation and uses it to fully analyze product data, thereby recommending products with higher user satisfaction.
Additional aspects of the invention will be set forth in part in the description which follows and, in part, will be obvious from the description, or may be learned by practice of the invention.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description serve to explain the invention.
FIG. 1 is a flow chart of an implementation of a product recommendation method based on a Double DQN algorithm provided by the invention;
FIG. 2 is a reinforcement learning framework diagram of one embodiment of the present invention.
Detailed Description
It should be noted that the following detailed description is exemplary and is intended to provide further explanation of the invention. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs.
It is noted that the terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of exemplary embodiments according to the present invention. As used herein, unless the context clearly indicates otherwise, the singular forms also are intended to include the plural forms, and furthermore, it is to be understood that the terms "comprises" and "comprising" and any variations thereof are intended to cover non-exclusive inclusions, such as, for example, processes, methods, systems, products or devices that comprise a series of steps or units, are not necessarily limited to those steps or units that are expressly listed, but may include other steps or units that are not expressly listed or inherent to such processes, methods, products or devices.
Embodiments of the invention and features of the embodiments may be combined with each other without conflict.
As shown in fig. 1, the product recommendation method based on Double DQN algorithm includes:
s101: basic information of a target user is obtained;
s102: processing the basic information of the target user and extracting its features;
s103: inputting the features representing the basic information of the target user into a trained deep reinforcement learning model to obtain the predicted satisfaction degree of each product;
s104: sorting the products in descending order of predicted satisfaction degree, and recommending the sorted products to the target user;
the deep reinforcement learning model refers to a Double DQN algorithm.
Further, the step S101: basic information of a target user is obtained; the method specifically comprises the following steps:
the monthly average income of the target user, the times of purchasing historical products, the frequency of purchasing the historical products, the risk level of purchasing the historical products and the price fluctuation data of purchasing the historical products are obtained.
Further, the step S102: processing basic information of a target user and extracting characteristics of the basic information; the method specifically comprises the following steps:
feature extraction is performed by convolutional neural networks.
Further, the step S103: inputting the characteristics representing the basic information of the target user into the trained deep reinforcement learning model to obtain the predicted satisfaction degree of each product; the training steps comprise:
constructing a training set, wherein the training set consists of basic information of users whose historical purchase satisfaction with products is known;
preprocessing the basic information of the users in the training set, taking the state features of the users' basic information obtained after preprocessing together with the known historical purchase satisfaction of the products as input values of the deep reinforcement learning model, and training the model to obtain the trained deep reinforcement learning model.
Further, the preprocessing the basic information of the users in the training set specifically includes:
dividing the users' average monthly income, historical product purchase counts, historical product purchase frequency, risk level of historically purchased products, and price fluctuation data in the training set into N time units to obtain a plurality of segmented data s_t. The time units may be chosen according to the time dimension of the data, for example one time unit may be set to one month; the subscript t denotes a time point, so that the time interval of the data represented by the state is recorded;
all the data in the same time unit after segmentation are subjected to feature extraction through a convolutional neural network CNN to obtain a month average income feature, a historical product purchase frequency feature, a risk level feature of a historical purchased product and a price fluctuation data feature;
the month average income feature, the historical product purchase frequency feature, the risk level feature of the historically purchased products and the price fluctuation data feature are concatenated in series to obtain the state feature χ(s_t) corresponding to that time unit, and the state features under all time units are obtained in the same way.
It should be appreciated that, because the users' monthly average income, historical product purchase counts, historical product purchase frequency, risk level of purchased products, and price fluctuation data in the training set are very large in quantity and variety, the various input data must be preprocessed to extract the features of all the data and reduce its dimensionality. The segmented data s_t are passed through a deep neural network for feature extraction, and the extracted features are χ(s_t). There are a variety of feature extraction networks; for example, a suitable feature extraction network can be chosen so that the state feature χ(s_t) extracted from each s_t is a multidimensional vector. If several kinds of historical data are considered, the extracted features may take the form of a combination of multiple vectors, such as a matrix.
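Purely by way of illustration and not limitation, the preprocessing described above can be sketched in Python with PyTorch: each kind of data within one time unit is passed through a small one-dimensional CNN, and the resulting feature vectors are concatenated into the state feature χ(s_t). The layer sizes, channel counts, and the number of raw samples per time unit are assumptions made for the sketch, not values specified by the invention.

import torch
import torch.nn as nn

class FieldEncoder(nn.Module):
    """Small 1-D CNN mapping one kind of raw data within a time unit to a feature vector."""
    def __init__(self, feat_dim=16):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv1d(1, 8, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool1d(1),   # collapse the samples inside the time unit
        )
        self.proj = nn.Linear(8, feat_dim)

    def forward(self, x):              # x: (batch, 1, samples_in_unit)
        h = self.conv(x).squeeze(-1)   # (batch, 8)
        return self.proj(h)            # (batch, feat_dim)

def state_feature(fields, encoders):
    """Concatenate the per-field CNN features into the state feature chi(s_t)."""
    feats = [enc(x) for enc, x in zip(encoders, fields)]
    return torch.cat(feats, dim=-1)

# One encoder per kind of data: income, purchase count, purchase frequency,
# risk level, price fluctuation (hypothetical ordering)
encoders = nn.ModuleList([FieldEncoder() for _ in range(5)])
fields = [torch.randn(1, 1, 30) for _ in range(5)]   # 30 raw samples per time unit (assumed)
chi_s_t = state_feature(fields, encoders)            # shape (1, 80): the state feature chi(s_t)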
Further, the step S103: inputting the features representing the basic information of the target user into the trained deep reinforcement learning model to obtain the predicted satisfaction degree of each product; the method specifically comprises the following steps:
inputting the features representing the basic information of the target user into the trained deep reinforcement learning model to obtain the predicted satisfaction degree of the product, wherein the predicted satisfaction degree of the product is a value obtained through the optimal Q-value function of the Double DQN algorithm.
The training principle of the Double DQN algorithm of deep reinforcement learning is described in detail below, including how the optimal Q-value function Q*(χ(s_t), a) is obtained for all state features χ(s_t):
As shown in fig. 2, at each time point t the state in which the agent is currently located is characterized as χ(s_t); the agent then performs operation a_t, obtains a reward r_t from the environment, and observes a new state feature χ(s_{t+1}).
The goal of agent learning is to select a strategy π that maximizes the expected total reward, where π is the sequence of actions a_t taken at each time t, i.e. π = {a_t, a_{t+1}, a_{t+2}, …, a_T}, where T is the set terminal time;
maximizing the expected return means maximizing the future cumulative discounted reward, i.e. maximizing:
r_t + γ r_{t+1} + γ^2 r_{t+2} + … + γ^{T−t} r_T, where 0 ≤ γ ≤ 1 is the discount rate.
The value of taking action a under the state feature χ(s) and thereafter following policy π is denoted:
Q^π(χ(s), a) = E[r_t + γ r_{t+1} + γ^2 r_{t+2} + … + γ^{T−t} r_T | χ(s_t) = χ(s), a_t = a],
which represents the expected total reward of all possible decision sequences made according to strategy π after performing operation a, starting from the state feature χ(s).
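Purely as an illustrative aid (not part of the claimed method), the cumulative discounted reward defined above can be computed as follows in Python; the reward values and the discount rate are arbitrary example numbers.

def discounted_return(rewards, gamma=0.9):
    """Compute r_t + gamma*r_{t+1} + gamma^2*r_{t+2} + ... for a finite reward sequence."""
    return sum(gamma ** k * r for k, r in enumerate(rewards))

# Example: satisfaction rewards over four remaining time points
print(discounted_return([1.0, 0.5, 0.0, 2.0], gamma=0.9))  # 1.0 + 0.45 + 0.0 + 1.458 = 2.908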
An optimal Q-value function is also defined:
Q*(χ(s), a) = max_π Q^π(χ(s), a) = max_π E[r_t + γ r_{t+1} + γ^2 r_{t+2} + … + γ^{T−t} r_T | χ(s_t) = χ(s), a_t = a],
which represents the expected total reward obtained by performing operation a under the state feature χ(s) and thereafter deciding according to the optimal strategy.
The optimal Q-value function Q*(χ(s), a) under each state feature χ(s) is obtained iteratively:
From the Bellman equation:
Q*(χ(s), a) = E[r_t + γ max_{a'} Q*(χ(s'), a') | χ(s_t) = χ(s), a_t = a].
Accordingly, Q*(χ(s), a) is estimated by a function approximator Q(χ(s), a; θ), and θ is iterated by stochastic gradient descent (SGD) using the Double DQN target
y_t = r_t + γ Q(χ(s_{t+1}), argmax_{a'} Q(χ(s_{t+1}), a'; θ_t); θ_t^-)
and the update
θ_{t+1} = θ_t + α (y_t − Q(χ(s_t), a_t; θ_t)) ∇_{θ_t} Q(χ(s_t), a_t; θ_t),
where α is the learning rate and θ^- is updated every k steps, i.e. every k steps
θ_t^- = θ_t,
and at all other steps θ^- remains unchanged.
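Purely by way of illustration, the update described above can be sketched in Python with PyTorch. The network architecture, learning rate, action set size, and transition format below are assumptions made for the sketch, not values prescribed by the invention; the online network holds θ and the target network holds θ^-, which is synchronized every k steps.

import copy
import torch
import torch.nn as nn

class QNet(nn.Module):
    """Online Q-network Q(chi(s), a; theta); outputs one Q-value per allowed operation."""
    def __init__(self, state_dim, num_actions):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, 64), nn.ReLU(),
            nn.Linear(64, num_actions),
        )

    def forward(self, chi_s):
        return self.net(chi_s)

state_dim, num_actions, gamma, k = 80, 3, 0.95, 100   # assumed sizes; 3 operations: buy / hold / sell
q_net = QNet(state_dim, num_actions)                  # parameters theta
target_net = copy.deepcopy(q_net)                     # parameters theta^-
optimizer = torch.optim.SGD(q_net.parameters(), lr=1e-3)

def double_dqn_step(step, chi_s, a, r, chi_s_next, done):
    """One SGD update of theta on a batch of transitions (chi(s_t), a_t, r_t, chi(s_{t+1}))."""
    with torch.no_grad():
        # action selected by the online network theta, evaluated by the target network theta^-
        best_a = q_net(chi_s_next).argmax(dim=1, keepdim=True)
        target_q = target_net(chi_s_next).gather(1, best_a).squeeze(1)
        y = r + gamma * (1.0 - done) * target_q        # y_t as in the Double DQN target
    q_sa = q_net(chi_s).gather(1, a.unsqueeze(1)).squeeze(1)
    loss = nn.functional.mse_loss(q_sa, y)             # squared TD error minimized by SGD
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    if step % k == 0:                                   # theta^- <- theta every k steps
        target_net.load_state_dict(q_net.state_dict())
    return loss.item()

# Example call with a random batch of 4 transitions (shapes only, illustrative)
loss = double_dqn_step(step=1,
                       chi_s=torch.randn(4, state_dim),
                       a=torch.randint(0, num_actions, (4,)),
                       r=torch.rand(4),
                       chi_s_next=torch.randn(4, state_dim),
                       done=torch.zeros(4))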
Thus, a set of allowed operations is given; in this embodiment, the set of allowed operations a for a product may include, but is not limited to, purchasing the product, not purchasing the product, selling an already owned product, and the like. The rewards in the deep reinforcement learning model are the satisfaction levels obtained for each operation, and they can be set in a number of ways, for example according to the user's personal information such as risk preference.
Following the training principle of the deep reinforcement learning model, the features extracted from the data in the training set are used to iterate to the final θ*, and the corresponding Q(χ(s), a; θ*) = Q*(χ(s), a) is the optimal Q-value function.
Once the optimal Q-value function Q*(χ(s_t), a) corresponding to each state feature χ(s_t) is obtained, selecting under the state feature χ(s_t) the operation a with the largest value of Q*(χ(s_t), a) gives the optimal operation a* that maximizes the product's future cumulative satisfaction in that state. Adopting the optimal operation a* in every state χ(s_t) maximizes the future satisfaction of the product, and the totality of all optimal operations, called the optimal strategy π*, is denoted
π* = {a_t*, a_{t+1}*, …, a_T*}, where a_t* = argmax_a Q*(χ(s_t), a).
After the optimal strategy π* is obtained, the predicted maximum satisfaction of a product can be obtained by simulating the execution of π* on the product:
obtaining the optimal strategy π* means obtaining, for each state feature χ(s_t), the corresponding optimal operation
a_t* = argmax_a Q*(χ(s_t), a).
The operation process of each product is then simulated: for each state s_t in the simulated transaction, the optimal transaction operation a_t* corresponding to χ(s_t) is adopted, and the predicted maximum satisfaction of the product is obtained. During the simulated transaction, a deadline T, operation times, the number of operations and the like can also be set for the product; for example, the deadline T may be set to six months, i.e. the cumulative total predicted satisfaction of the product over six months is simulated. It can also be specified that an operation is performed once every five days, with the data features of the five days before the operation day serving as the current state feature χ(s_t); at each operation on state s_t, the optimal transaction operation a_t* corresponding to χ(s_t) is adopted. The predicted maximum satisfaction obtained in this way is the predicted satisfaction of the product output in step S103.
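Purely by way of illustration, the simulation of the optimal strategy can be sketched as a greedy rollout in Python; q_net is the trained network from the previous sketch, and simulate_step stands in for the embodiment's transaction simulation (its interface is an assumption, not a defined API).

import torch

def predicted_max_satisfaction(q_net, initial_state, simulate_step, horizon_steps=36):
    """Roll out the greedy (optimal) strategy pi* and accumulate the simulated satisfaction.

    horizon_steps = 36 corresponds to a six-month deadline with one operation every five days.
    """
    total, chi_s = 0.0, initial_state
    for _ in range(horizon_steps):
        with torch.no_grad():
            a_star = int(q_net(chi_s).argmax(dim=1))         # a* = argmax_a Q*(chi(s_t), a)
        reward, chi_s, done = simulate_step(chi_s, a_star)   # placeholder environment call
        total += reward
        if done:
            break
    return total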
Further, step S104, sorting the products in descending order of predicted satisfaction degree and recommending the sorted products to the target user, specifically comprises:
The ranking may be performed by directly comparing the predicted satisfaction degrees obtained in S103, or by the relative recommendation rate of each product. There are many ways to calculate the relative recommendation rate from the simulated maximum satisfaction of each product obtained in S103. For example, in this embodiment, assume that the predicted satisfaction degrees of products 1, 2, and 3 are the constants 18, 15, and 13, respectively. The relative recommendation rate may be calculated by taking the lowest simulated product satisfaction as the standard, i.e. the simulated satisfaction of product 3 is the standard with a recommendation rate of 1; the relative recommendation rate of product 1 is then 18 ÷ 13 ≈ 1.38, and the relative recommendation rate of product 2 is 15 ÷ 13 ≈ 1.15. The calculation is analogous when there are more products.
The invention aims to overcome the above-mentioned drawbacks and provides a product recommendation method and device based on the Double DQN algorithm. A deep reinforcement learning model is trained using the users' basic information, the products' historical data, and the Double DQN algorithm. Simulated operations obtained through large-scale data analysis are often more reliable and stable than operations found through manual experience, and the Double DQN algorithm overcomes the tendency of the Q-learning algorithm, the DQN algorithm and the like to overestimate. The optimal strategy found is used to simulate the predicted satisfaction of each product, and the products are ranked by predicted satisfaction, so that products with relatively high satisfaction are recommended to the user.
Example two
The invention provides a product recommendation device based on a Double DQN algorithm;
product recommendation device based on Double DQN algorithm includes:
an acquisition module configured to: obtain basic information of a target user;
a feature extraction module configured to: process the basic information of the target user and extract its features;
a prediction module configured to: input the features representing the basic information of the target user into a trained deep reinforcement learning model to obtain the predicted satisfaction degree of each product;
a recommendation module configured to: sort the products in descending order of predicted satisfaction degree and recommend the sorted products to the target user;
the deep reinforcement learning model refers to a Double DQN algorithm.
Here, it should be noted that the acquisition module, feature extraction module, prediction module, and recommendation module described above correspond to steps S101 to S104 in the first embodiment; the examples and application scenarios implemented by these modules are the same as those of the corresponding steps, but are not limited to the disclosure of the first embodiment. It should be noted that the modules described above may be implemented as part of a system in a computer system, for example as a set of computer-executable instructions.
The foregoing description covers several embodiments; for details not described in one embodiment, reference may be made to the related description of another embodiment.
The proposed system may be implemented in other ways. The system embodiments described above are merely illustrative; for example, the division into the modules described above is merely a division by logical function, and other divisions are possible in practice: multiple modules may be combined or integrated into another system, or some features may be omitted or not performed.
Example III
The embodiment also provides an electronic device, including: one or more processors, one or more memories, and one or more computer programs; wherein the processor is coupled to the memory, the one or more computer programs being stored in the memory, the processor executing the one or more computer programs stored in the memory when the electronic device is running, to cause the electronic device to perform the method of the first embodiment.
It should be understood that in this embodiment the processor may be a central processing unit (CPU), or another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor.
The memory may include read only memory and random access memory and provide instructions and data to the processor, and a portion of the memory may also include non-volatile random access memory. For example, the memory may also store information of the device type.
In implementation, the steps of the above method may be performed by integrated logic circuits of hardware in a processor or by instructions in the form of software.
The method in the first embodiment may be implemented directly by a hardware processor, or by a combination of hardware in the processor and software modules. The software modules may be located in a storage medium well known in the art, such as random access memory, flash memory, read-only memory, programmable read-only memory, electrically erasable programmable memory, or a register. The storage medium is located in the memory, and the processor reads the information in the memory and, in combination with its hardware, performs the steps of the above method. To avoid repetition, a detailed description is not provided here.
Those of ordinary skill in the art will appreciate that the elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
Example IV
The present embodiment also provides a computer-readable storage medium storing computer instructions that, when executed by a processor, perform the method of embodiment one.
The above description is only of the preferred embodiments of the present invention and is not intended to limit the present invention, but various modifications and variations can be made to the present invention by those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (7)

1. The product recommendation method based on Double DQN algorithm is characterized by comprising the following steps:
basic information of a target user is obtained;
processing basic information of a target user and extracting characteristics of the basic information;
inputting the characteristics representing the basic information of the target user into a trained deep reinforcement learning model to obtain the predicted satisfaction degree of each product, wherein the predicted satisfaction degree of the product is a value obtained through an optimal Q-value function of a Double DQN algorithm;
in particular, the method comprises the steps of,
at each time point t, the state in which the agent is currently located is characterized as χ(s_t); the agent then performs operation a_t, obtains a reward r_t from the environment, and observes a new state feature χ(s_{t+1});
the goal of agent learning is to select a strategy π that maximizes the expected total reward, where π is the sequence of actions a_t taken at each time t, i.e. π = {a_t, a_{t+1}, a_{t+2}, …, a_T}, where T is the set terminal time;
maximizing the expected return means maximizing the future cumulative discounted reward, i.e. maximizing:
r_t + γ r_{t+1} + γ^2 r_{t+2} + … + γ^{T−t} r_T, where 0 ≤ γ ≤ 1 is the discount rate;
the value of taking action a under the state feature χ(s) and thereafter following policy π is denoted:
Q^π(χ(s), a) = E[r_t + γ r_{t+1} + γ^2 r_{t+2} + … + γ^{T−t} r_T | χ(s_t) = χ(s), a_t = a],
which represents the expected total reward of all possible decision sequences made according to strategy π after performing operation a, starting from the state feature χ(s);
an optimal Q-value function is also defined:
Q*(χ(s), a) = max_π Q^π(χ(s), a) = max_π E[r_t + γ r_{t+1} + γ^2 r_{t+2} + … + γ^{T−t} r_T | χ(s_t) = χ(s), a_t = a],
which represents the expected total reward obtained by performing operation a under the state feature χ(s) and thereafter deciding according to the optimal strategy;
the optimal Q-value function Q*(χ(s), a) under each state feature χ(s) is obtained iteratively:
from the Bellman equation:
Q*(χ(s), a) = E[r_t + γ max_{a'} Q*(χ(s'), a') | χ(s_t) = χ(s), a_t = a];
accordingly, Q*(χ(s), a) is estimated by a function approximator Q(χ(s), a; θ), and θ is iterated by stochastic gradient descent (SGD) using the Double DQN target
y_t = r_t + γ Q(χ(s_{t+1}), argmax_{a'} Q(χ(s_{t+1}), a'; θ_t); θ_t^-)
and the update
θ_{t+1} = θ_t + α (y_t − Q(χ(s_t), a_t; θ_t)) ∇_{θ_t} Q(χ(s_t), a_t; θ_t),
where α is the learning rate and θ^- is updated every k steps, i.e. every k steps
θ_t^- = θ_t,
and at all other steps θ^- remains unchanged;
sorting products according to the order of the predicted satisfaction degree from large to small, and recommending the sorted products to a target user;
the deep reinforcement learning model refers to a Double DQN algorithm;
preprocessing basic information of users in a training set, which specifically comprises the following steps:
dividing the users' average monthly income, historical product purchase counts, historical product purchase frequency, risk level of historically purchased products, and price fluctuation data in the training set into N time units to obtain a plurality of segmented data s_t, where the subscript t denotes a time point, so that the time interval of the data represented by the state is recorded;
all the data in the same time unit after segmentation are subjected to feature extraction through a convolutional neural network CNN to obtain a month average income feature, a historical product purchase frequency feature, a risk level feature of a historical purchased product and a price fluctuation data feature;
the month average income feature, the historical product purchase frequency feature, the risk level feature of the historically purchased products and the price fluctuation data feature are concatenated in series to obtain the state feature χ(s_t) corresponding to the same time unit, and the state features under all time units are obtained in the same way.
2. The product recommendation method based on Double DQN algorithm as claimed in claim 1, wherein the basic information of the target user is obtained; the method specifically comprises the following steps:
the monthly average income of the target user, the times of purchasing historical products, the frequency of purchasing the historical products, the risk level of purchasing the historical products and the price fluctuation data of purchasing the historical products are obtained.
3. The product recommendation method based on Double DQN algorithm as claimed in claim 1, wherein the basic information of the target user is processed and the characteristics thereof are extracted; the method specifically comprises the following steps:
feature extraction is performed by convolutional neural networks.
4. The product recommendation method based on Double DQN algorithm as claimed in claim 1, wherein the features representing the basic information of the target user are input into the trained deep reinforcement learning model to obtain the predicted satisfaction degree of each product; the training steps comprise:
constructing a training set, wherein the training set is user basic information of known historical purchase satisfaction of products;
preprocessing the basic information of the user in the training set, taking the state characteristics of the basic information of the user obtained after preprocessing and the historical purchase satisfaction degree of the known product as input values of a deep reinforcement learning model, and training the model to obtain the trained deep reinforcement learning model.
5. Product recommendation device based on Double DQN algorithm, characterized by including:
an acquisition module configured to: basic information of a target user is obtained;
a feature extraction module configured to: processing basic information of a target user and extracting characteristics of the basic information;
a prediction module configured to: input the characteristics representing the basic information of the target user into a trained deep reinforcement learning model to obtain the predicted satisfaction degree of each product, wherein the predicted satisfaction degree of the product is a value obtained through an optimal Q-value function of a Double DQN algorithm;
in particular, the method comprises the steps of,
at each time point t, the state in which the agent is currently located is characterized as χ(s_t); the agent then performs operation a_t, obtains a reward r_t from the environment, and observes a new state feature χ(s_{t+1});
the goal of agent learning is to select a strategy π that maximizes the expected total reward, where π is the sequence of actions a_t taken at each time t, i.e. π = {a_t, a_{t+1}, a_{t+2}, …, a_T}, where T is the set terminal time;
maximizing the expected return means maximizing the future cumulative discounted reward, i.e. maximizing:
r_t + γ r_{t+1} + γ^2 r_{t+2} + … + γ^{T−t} r_T, where 0 ≤ γ ≤ 1 is the discount rate;
the value of taking action a under the state feature χ(s) and thereafter following policy π is denoted:
Q^π(χ(s), a) = E[r_t + γ r_{t+1} + γ^2 r_{t+2} + … + γ^{T−t} r_T | χ(s_t) = χ(s), a_t = a],
which represents the expected total reward of all possible decision sequences made according to strategy π after performing operation a, starting from the state feature χ(s);
an optimal Q-value function is also defined:
Q*(χ(s), a) = max_π Q^π(χ(s), a) = max_π E[r_t + γ r_{t+1} + γ^2 r_{t+2} + … + γ^{T−t} r_T | χ(s_t) = χ(s), a_t = a],
which represents the expected total reward obtained by performing operation a under the state feature χ(s) and thereafter deciding according to the optimal strategy;
the optimal Q-value function Q*(χ(s), a) under each state feature χ(s) is obtained iteratively:
from the Bellman equation:
Q*(χ(s), a) = E[r_t + γ max_{a'} Q*(χ(s'), a') | χ(s_t) = χ(s), a_t = a];
accordingly, Q*(χ(s), a) is estimated by a function approximator Q(χ(s), a; θ), and θ is iterated by stochastic gradient descent (SGD) using the Double DQN target
y_t = r_t + γ Q(χ(s_{t+1}), argmax_{a'} Q(χ(s_{t+1}), a'; θ_t); θ_t^-)
and the update
θ_{t+1} = θ_t + α (y_t − Q(χ(s_t), a_t; θ_t)) ∇_{θ_t} Q(χ(s_t), a_t; θ_t),
where α is the learning rate and θ^- is updated every k steps, i.e. every k steps
θ_t^- = θ_t,
and at all other steps θ^- remains unchanged;
a recommendation module configured to: sorting products according to the order of the predicted satisfaction degree from large to small, and recommending the sorted products to a target user;
the deep reinforcement learning model refers to a Double DQN algorithm;
preprocessing basic information of users in a training set, which specifically comprises the following steps:
dividing the users' average monthly income, historical product purchase counts, historical product purchase frequency, risk level of historically purchased products, and price fluctuation data in the training set into N time units to obtain a plurality of segmented data s_t, where the subscript t denotes a time point, so that the time interval of the data represented by the state is recorded;
all the data in the same time unit after segmentation are subjected to feature extraction through a convolutional neural network CNN to obtain a month average income feature, a historical product purchase frequency feature, a risk level feature of a historical purchased product and a price fluctuation data feature;
the month average income feature, the historical product purchase frequency feature, the risk level feature of the historically purchased products and the price fluctuation data feature are concatenated in series to obtain the state feature χ(s_t) corresponding to the same time unit, and the state features under all time units are obtained in the same way.
6. An electronic device, comprising: one or more processors, one or more memories, and one or more computer programs; wherein the processor is coupled to the memory, the one or more computer programs being stored in the memory, the processor executing the one or more computer programs stored in the memory when the electronic device is running, to cause the electronic device to perform the method of any of claims 1-4.
7. A computer readable storage medium storing computer instructions which, when executed by a processor, perform the method of any of claims 1-4.
CN202110452994.0A 2021-04-26 2021-04-26 Product recommendation method and device based on Double DQN algorithm Active CN113129108B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110452994.0A CN113129108B (en) 2021-04-26 2021-04-26 Product recommendation method and device based on Double DQN algorithm

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110452994.0A CN113129108B (en) 2021-04-26 2021-04-26 Product recommendation method and device based on Double DQN algorithm

Publications (2)

Publication Number Publication Date
CN113129108A CN113129108A (en) 2021-07-16
CN113129108B true CN113129108B (en) 2023-05-30

Family

ID=76780002

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110452994.0A Active CN113129108B (en) 2021-04-26 2021-04-26 Product recommendation method and device based on Double DQN algorithm

Country Status (1)

Country Link
CN (1) CN113129108B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114581249B (en) * 2022-03-22 2024-05-31 山东大学 Financial product recommendation method and system based on investment risk bearing capacity assessment

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112045680A (en) * 2020-09-02 2020-12-08 山东大学 Cloth stacking robot control system and control method based on behavior cloning
CN112291284A (en) * 2019-07-22 2021-01-29 中国移动通信有限公司研究院 Content pushing method and device and computer readable storage medium

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109711871B (en) * 2018-12-13 2021-03-12 北京达佳互联信息技术有限公司 Potential customer determination method, device, server and readable storage medium
CN110263244B (en) * 2019-02-14 2024-02-13 深圳市雅阅科技有限公司 Content recommendation method, device, storage medium and computer equipment
CN110598120A (en) * 2019-10-16 2019-12-20 信雅达系统工程股份有限公司 Behavior data based financing recommendation method, device and equipment
CN110866791A (en) * 2019-11-25 2020-03-06 恩亿科(北京)数据科技有限公司 Commodity pushing method and device, storage medium and electronic equipment
CN111898032B (en) * 2020-08-13 2024-04-30 腾讯科技(深圳)有限公司 Information recommendation method and device based on artificial intelligence, electronic equipment and storage medium

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112291284A (en) * 2019-07-22 2021-01-29 中国移动通信有限公司研究院 Content pushing method and device and computer readable storage medium
CN112045680A (en) * 2020-09-02 2020-12-08 山东大学 Cloth stacking robot control system and control method based on behavior cloning

Also Published As

Publication number Publication date
CN113129108A (en) 2021-07-16


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant