CN111738787A - Information pushing method and device - Google Patents

Information pushing method and device

Info

Publication number
CN111738787A
Authority
CN
China
Prior art keywords
commodity
user
information
model
neural network
Prior art date
Legal status
Pending
Application number
CN201910512050.0A
Other languages
Chinese (zh)
Inventor
周东
雷章明
汤桢伟
兰华勇
古川
Current Assignee
Beijing Jingdong Century Trading Co Ltd
Beijing Jingdong Shangke Information Technology Co Ltd
Original Assignee
Beijing Jingdong Century Trading Co Ltd
Beijing Jingdong Shangke Information Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing Jingdong Century Trading Co Ltd, Beijing Jingdong Shangke Information Technology Co Ltd filed Critical Beijing Jingdong Century Trading Co Ltd
Priority to CN201910512050.0A priority Critical patent/CN111738787A/en
Publication of CN111738787A publication Critical patent/CN111738787A/en

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00Commerce
    • G06Q30/06Buying, selling or leasing transactions
    • G06Q30/0601Electronic shopping [e-shopping]
    • G06Q30/0631Item recommendations

Abstract

The invention discloses an information pushing method and device. The method comprises: acquiring user information, commodity information, and commodity sequence information of users' historical behaviors; training a user-commodity interest model with a reinforcement learning method on the acquired information, and training a user-commodity repurchase cycle model with a deep learning method; when a current user accesses online, looking up the current user's relevant information and obtaining, from the user-commodity interest model and the user-commodity repurchase cycle model, the current user's repurchase cycle for each commodity in a commodity list; and comparing the current user's historical purchase record time with the repurchase cycle of each commodity in the commodity list, filtering out the commodities not due for repurchase, and thereby obtaining the current user's recommended commodity list. With the method and device, the recommendation strategy can be adjusted dynamically according to user feedback, and repeated recommendation of commodities within their non-repurchase period is avoided.

Description

Information pushing method and device
Technical Field
The invention relates to the technical field of electronic commerce, in particular to an information pushing method and device.
Background
In the field of personalized commodity recommendation, current recommendation algorithms include:
1. Popularity-based recommendation, where popularity is defined by browsing volume, sales volume, social hotspots, and the like; it effectively addresses the cold-start problem.
2. Collaborative-filtering-based recommendation, which relies only on user behavior and needs no deep understanding of commodity features or metadata. It has high coverage, effectively addresses the long-tail problem, and can pleasantly surprise users with serendipitous results.
3. Content/rule/knowledge-based recommendation, which relies on a complete, well-categorized content/rule/knowledge architecture and structured data features. It is highly timely, gives reasonable explanations for its recommendations, and earns high user trust.
4. Combinations of the above algorithms, whose complementary strengths address cold start, data sparsity, and unstructured data, giving a wide range of application.
In all of these prior approaches, including popularity-based recommendation, collaborative filtering, content/rule/knowledge-based recommendation, and their combinations, the recommendation result is simply output and displayed to the user, and the recommendation strategy is not dynamically adjusted according to user feedback (such as whether the user clicked or purchased), so the recommendation effect is less than ideal. In addition, because the repurchase cycle specific to each commodity is not considered (the repurchase cycle of durable goods is generally longer than that of fast-moving consumer goods), commodities are repeatedly recommended within their non-repurchase period, wasting display slots and harming the user experience (causing annoyance, distrust, and the like).
Disclosure of Invention
The invention aims to provide an information pushing method and device that can dynamically adjust the recommendation strategy according to user feedback and avoid repeated commodity recommendation within the non-repurchase period.
In order to achieve the above object, the invention provides an information pushing method, comprising:
acquiring user information, commodity information and commodity sequence information of users' historical behaviors;
training a user-commodity interest model by a reinforcement learning method according to the acquired information, and training a user-commodity repurchase cycle model by a deep learning method;
when a current user accesses online, looking up the current user's user information, commodity information and commodity sequence information of historical behaviors, and obtaining the current user's repurchase cycle for each commodity in a commodity list according to the user-commodity interest model and the user-commodity repurchase cycle model;
and comparing the current user's historical purchase record time with the repurchase cycle of each commodity in the commodity list, and filtering out the commodities not due for repurchase to obtain the current user's recommended commodity list.
In order to achieve the above object, the present invention further provides an information pushing apparatus, including:
the data management module, used for acquiring user information, commodity information and commodity sequence information of users' historical behaviors;
the model training module, used for training a user-commodity interest model by a reinforcement learning method according to the information acquired from the data management module, and for training a user-commodity repurchase cycle model by a deep learning method;
and the online recommendation module, used for looking up the current user's user information, commodity information and commodity sequence information of historical behaviors when the current user accesses online, obtaining the current user's repurchase cycle for each commodity in the commodity list according to the user-commodity interest model and the user-commodity repurchase cycle model, comparing the current user's historical purchase record time with the repurchase cycle of each commodity, and filtering out the commodities not due for repurchase to obtain the current user's recommended commodity list.
In summary, the invention provides an improved information pushing method and device based on the combination of reinforcement learning and deep learning: user information, commodity information and commodity sequence information of users' historical behaviors are acquired; a user-commodity interest model is trained by a reinforcement learning method on the acquired information, and a user-commodity repurchase cycle model is trained by a deep learning method; when the current user accesses online, the user's relevant information is looked up, and the user's repurchase cycle for each commodity in the commodity list is obtained from the two models; the current user's historical purchase record time is then compared with the repurchase cycle of each commodity in the commodity list, and commodities not due for repurchase are filtered out to obtain the current user's recommended commodity list. This scheme solves the two problems of the prior art identified above: that the recommendation strategy could not be adjusted dynamically according to user feedback, and that commodities were pushed repeatedly within their non-repurchase period.
Drawings
Fig. 1 is a flowchart illustrating an information push method according to an embodiment of the present invention.
Fig. 2 is a schematic structural diagram of an information pushing apparatus according to an embodiment of the present invention.
Fig. 3 is a schematic structural diagram of an electronic device according to an embodiment of the invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the embodiments of the present invention will be described in detail with reference to the accompanying drawings.
The core idea of the invention is to first train two models offline, a user-commodity interest model and a user-commodity repurchase cycle model. The user-commodity interest model is trained by a reinforcement learning method on user information, commodity information and commodity sequence information of users' historical behaviors, and outputs a commodity list for each user; the user-commodity repurchase cycle model is trained by a deep learning method on user information and the information of the commodities each user has bought, and outputs each user's commodity repurchase cycles. When a user accesses online, commodities not due for repurchase are filtered out according to the two models, yielding the current user's recommended commodity list.
Fig. 1 is a schematic flow chart of an information push method according to the present invention, where the method includes:
and 11, acquiring user information, commodity information and commodity sequence information of user historical behaviors.
In this step, the acquired user information, commodity information, and commodity sequence information of the user historical behaviors are obtained by processing the acquired information. Specifically, user information, commodity information, user behavior information and user order information of each user are collected, and then the collected information is sequentially subjected to extraction, cleaning, characterization and labeling to obtain the processed user information, commodity information and user historical behavior commodity sequence information.
The user portrait information obtained by processing the collected user information includes: user rating, gender, age, marital status, education, occupation, car ownership, first clothing purchase within the year, last year's shopping, days from the first order to the present, days from the last purchase to the present, days from the last login to the present, days from the first order of the year to the present, average order value over the last month, last two months and last three months, mother-and-baby badge level, personal-care-and-cosmetics badge level, wine badge level, user activity model, purchasing-power segment, life cycle, RFM full-category quality score, POP clothing RFM grouping and standardized score, POP supermarket RFM grouping and standardized score, large-appliance RFM grouping and standardized score, user value grouping, and user value standard score.
The commodity sequence information of user historical behaviors, obtained by processing the collected user behavior information and user order information, includes: the sequence of commodity IDs the user has historically browsed, the sequence of commodity IDs the user has historically purchased, the sequence of commodity IDs historically exposed to the user, and the like.
Processing the collected commodity information yields the commodity price, size, brand, sales volume, category, and the like. A commodity here is one that appears in the commodity sequence information of user historical behaviors.
Step 12: train a user-commodity interest model by a reinforcement learning method according to the acquired information, and train a user-commodity repurchase cycle model by a deep learning method.
Training the user-commodity interest model with the reinforcement learning method specifically comprises the following steps:
s121, constructing two identical neural networks, namely an eval neural network and a target neural network, wherein the eval neural network is used for obtaining a recommended commodity list and a predicted value of the quality degree of a recommended result, and the target neural network is used for updating parameters of the eval neural network;
in this step, the eval neural network is used to obtain a recommended goods list (action) and a predicted value Q of the quality degree of the recommended result.
S122, initializing eval neural network parameters thetaQAnd thetaμ(ii) a Initializing target neural network parameters θQ′=θQ,θμ′=θμ
S123, using the user information, commodity information and the commodity sequence information of the historical behaviors of the user as a piece of training data Si(ii) a Will(s)i,ai,ri,si+1) The ith sample is taken as the N samples of the reinforcement learning training set; wherein, aiList of recommended items for ith sample, riUser feedback for the ith sample, si+1The commodity sequence information is the commodity sequence information of the user information, commodity information and user historical behaviors of the next state of the ith sample;
wherein(s)i,ai,ri,si+1) As a sample, byiBehavioral influence is given by si+1. By analogy, the i +1 th sample is(s)i+1,ai+1,ri+1,si+2)。
S124, recommending knots according to the ith sampleConstructing a first loss function according to the target value of the result quality degree and the predicted value of the recommended result quality degree; in the eval neural network, the first loss function is optimized by taking a small value, and the parameter theta of the eval neural network is updatedQ
And calculating the gradient of the first loss function by adopting a gradient descending method, and selecting the direction with the fastest gradient descending so as to minimize the first loss function.
S125, in the eval neural network, optimizing the expectation function fed back by the user by taking a small value, and updating the parameter theta of the eval neural networkμ
S126, according to thetaQAnd thetaμUpdating the parameter of the target neural network to be thetaQ′And thetaμ′And obtaining the trained user-commodity interest model.
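These steps follow the general pattern of actor-critic reinforcement learning with separate eval and target networks (DDPG-style, matching the Actor/Critic embodiment described later in this description). The following is a minimal sketch, assuming PyTorch; the network sizes, learning rates, `gamma`, and `tau` are illustrative assumptions, not values fixed by the specification.

```python
# Minimal sketch of S121-S126, assuming PyTorch (DDPG-style actor-critic).
# All dimensions and hyperparameters below are illustrative assumptions.
import copy
import torch
import torch.nn as nn

state_dim, action_dim = 568, 52   # assumed sizes of s_i and a_i
gamma, tau = 0.99, 0.005          # assumed discount factor and soft-update rate

# S121: two identical network pairs; eval nets are trained, target nets track them.
actor_eval = nn.Sequential(nn.Linear(state_dim, 128), nn.ReLU(),
                           nn.Linear(128, action_dim))
critic_eval = nn.Sequential(nn.Linear(state_dim + action_dim, 128), nn.ReLU(),
                            nn.Linear(128, 1))
# S122: target parameters initialized equal to the eval parameters.
actor_target = copy.deepcopy(actor_eval)
critic_target = copy.deepcopy(critic_eval)

actor_opt = torch.optim.Adam(actor_eval.parameters(), lr=1e-4)
critic_opt = torch.optim.Adam(critic_eval.parameters(), lr=1e-3)

def train_step(s, a, r, s_next):
    """One update from a batch of samples (s_i, a_i, r_i, s_{i+1}); r has shape (batch, 1)."""
    # S124: first loss = MSE between target value y_i and prediction Q(s_i, a_i).
    with torch.no_grad():
        a_next = actor_target(s_next)                 # a_{i+1} = mu'(s_{i+1} | theta^mu')
        y = r + gamma * critic_target(torch.cat([s_next, a_next], dim=1))
    q = critic_eval(torch.cat([s, a], dim=1))
    critic_loss = nn.functional.mse_loss(q, y)
    critic_opt.zero_grad(); critic_loss.backward(); critic_opt.step()

    # S125: update the actor so its actions score higher under the critic
    # (gradient ascent on the expected user feedback, i.e. descent on its negative).
    actor_loss = -critic_eval(torch.cat([s, actor_eval(s)], dim=1)).mean()
    actor_opt.zero_grad(); actor_loss.backward(); actor_opt.step()

    # S126: soft-update the target networks from the eval networks.
    with torch.no_grad():
        for net, tgt in ((actor_eval, actor_target), (critic_eval, critic_target)):
            for p, p_t in zip(net.parameters(), tgt.parameters()):
                p_t.mul_(1 - tau).add_(tau * p)
```

In these terms, `actor_eval` plays the role of the policy $\mu(s \mid \theta^\mu)$ producing the recommendation action, and `critic_eval` the role of $Q(s, a \mid \theta^Q)$ scoring it.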
Training the user-commodity repurchase cycle model with the deep learning method specifically comprises the following steps (a code sketch follows below):
SS121, taking each user's user information and the information of the commodities the user has bought as the $i$-th of the $M$ samples in a deep learning training set, wherein $i \in \{1, \dots, M\}$ and $M$ is a natural number;
SS122, inputting the user information and purchased-commodity information of the $i$-th sample into the user-commodity repurchase cycle model to obtain a training value of the commodity repurchase cycle;
SS123, constructing a second loss function from the repurchase cycle training value and the true repurchase cycle value of the $i$-th sample, minimizing the second loss function, and updating the network weight parameters to obtain the trained user-commodity repurchase cycle model.
The true repurchase cycle value of each sample is determined from actual purchase record times. When the training values approach the true values, the second loss function reaches its minimum, the resulting network weight parameters are optimal, and model training is finished; the trained user-commodity repurchase cycle model can then be used in step 13 below to determine, for the currently visiting user, the repurchase cycle of each commodity in the commodity list.
The gradient of the second loss function is computed by gradient descent, following the direction of steepest descent so that the second loss function is minimized. After training, the user-commodity repurchase cycle model corresponds to a set of optimal network weight parameters.
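As a concrete illustration of SS121 to SS123, the sketch below (again assuming PyTorch) trains a small fully connected regression network with a mean-squared-error second loss minimized by gradient descent. The 308-dimensional input matches the 256-dimensional user vector plus 52-dimensional commodity vector of the embodiment described later; the layer widths and learning rate are assumptions.

```python
import torch
import torch.nn as nn

model = nn.Sequential(                 # three fully connected layers
    nn.Linear(308, 64), nn.ReLU(),
    nn.Linear(64, 32), nn.ReLU(),
    nn.Linear(32, 1),                  # predicted repurchase cycle (e.g. in days)
)
opt = torch.optim.SGD(model.parameters(), lr=1e-3)

def train_step(x, y_true):
    """x: (batch, 308) user+commodity features; y_true: (batch, 1) true cycles (SS123)."""
    y_pred = model(x)                              # SS122: repurchase cycle training value
    loss = nn.functional.mse_loss(y_pred, y_true)  # SS123: second loss function
    opt.zero_grad(); loss.backward(); opt.step()   # gradient descent on the weights
    return loss.item()
```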
Step 13: when the current user accesses online, look up the current user's user information, commodity information and commodity sequence information of historical behaviors, and obtain the current user's repurchase cycle for each commodity in the commodity list according to the user-commodity interest model and the user-commodity repurchase cycle model.
The specific steps are as follows: when the current user accesses, obtain the user information, commodity information and commodity sequence information of the user's historical behaviors according to the current user's identifier;
input the user information, commodity information and commodity sequence information of the user's historical behaviors into the user-commodity interest model to obtain the current user's commodity list;
and input the current user's information together with the information of each commodity in the commodity list into the user-commodity repurchase cycle model to obtain the current user's repurchase cycle for each commodity in the list.
Step 14: compare the current user's historical purchase record time with the repurchase cycle of each commodity in the commodity list, and filter out the commodities not due for repurchase to obtain the current user's recommended commodity list. A sketch of this online path follows below.
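Putting the lookups of step 13 together, the following is a hedged sketch of the online path; the data-store and model interfaces are assumptions passed in as callables, since the specification does not fix them (the step 14 filter is sketched later, in the embodiment section):

```python
from typing import Callable, Dict, List, Tuple

def online_recommend(
    user_id: str,
    lookup: Callable[[str], Tuple[dict, dict, list]],        # -> (user info, commodity info, behavior sequences)
    interest_model: Callable[[dict, dict, list], List[str]], # -> commodity list (user-commodity interest model)
    cycle_model: Callable[[dict, str], float],               # -> repurchase cycle of one commodity
) -> Dict[str, float]:
    """Step 13: fetch the current user's data by identifier, obtain the commodity
    list from the user-commodity interest model, then query the repurchase cycle
    model once per commodity in that list."""
    user_info, item_info, history = lookup(user_id)
    candidates = interest_model(user_info, item_info, history)
    return {item: cycle_model(user_info, item) for item in candidates}
```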
Thus, the information push method of the present invention is completed.
Based on the same inventive concept, the invention also discloses an information pushing apparatus, a schematic structural diagram of which is shown in Fig. 2. The apparatus comprises:
the data management module 201 is used for acquiring user information, commodity information and commodity sequence information of user historical behaviors;
the model training module 202 is used for training a user-commodity interest model by adopting a reinforcement learning method according to the information acquired from the data management module; training a user-commodity repurchase period model by adopting a deep learning method;
the online recommendation module 203 is used for searching relevant information of the current user when the current user performs online access, and obtaining a repurchase cycle of each commodity in the commodity list by the current user according to the user-commodity interest model and the user-commodity repurchase cycle model; and comparing the historical purchase recording time of the current user with the re-purchasing period of each commodity in the commodity list, and filtering the non-re-purchased commodities in the commodity list to obtain the recommended commodity list of the current user.
The model training module 202, when training the user-commodity interest model with the reinforcement learning method, is specifically configured to:
construct two identical neural networks, an eval neural network and a target neural network, wherein the eval neural network is used to obtain the recommended commodity list and the predicted value of the recommendation result quality, and the target neural network is used to update the parameters of the eval neural network;
initialize the eval network parameters $\theta^Q$ and $\theta^\mu$, and initialize the target network parameters $\theta^{Q'} = \theta^Q$, $\theta^{\mu'} = \theta^\mu$;
take the user information, commodity information and commodity sequence information of each user's historical behaviors as one piece of training data $s_i$, and take $(s_i, a_i, r_i, s_{i+1})$ as the $i$-th of the $N$ samples in the reinforcement learning training set, wherein $a_i$ is the recommended commodity list of the $i$-th sample, $r_i$ is the user feedback of the $i$-th sample, and $s_{i+1}$ is the user information, commodity information and commodity sequence information of user historical behaviors in the next state;
construct a first loss function from the target value and the predicted value of the recommendation result quality of the $i$-th sample; in the eval neural network, minimize the first loss function and update the eval network parameter $\theta^Q$;
in the eval neural network, optimize the expectation function of the user feedback and update the eval network parameter $\theta^\mu$;
and update the target network parameters $\theta^{Q'}$ and $\theta^{\mu'}$ from $\theta^Q$ and $\theta^\mu$ to obtain the trained user-commodity interest model.
The model training module 202, when training the user-commodity repurchase cycle model with the deep learning method, is specifically configured to:
take each user's user information and the information of the commodities the user has bought as the $i$-th of the $M$ samples in a deep learning training set, wherein $i \in \{1, \dots, M\}$ and $M$ is a natural number;
input the user information and purchased-commodity information of the $i$-th sample into the user-commodity repurchase cycle model to obtain a commodity repurchase cycle training value;
and construct a second loss function from the commodity repurchase cycle training value and the true repurchase cycle value of the $i$-th sample, minimize the second loss function, and update the network weight parameters to obtain the trained user-commodity repurchase cycle model.
When the current user accesses online, the online recommendation module 203 looks up the current user's user information, commodity information and commodity sequence information of historical behaviors, and obtains the current user's repurchase cycle for each commodity in the commodity list according to the user-commodity interest model and the user-commodity repurchase cycle model. Specifically, it is configured to:
when the current user accesses, obtain the user information, commodity information and commodity sequence information of the user's historical behaviors according to the current user's identifier;
input the user information, commodity information and commodity sequence information of the user's historical behaviors into the user-commodity interest model to obtain the current user's commodity list;
and input the current user's information together with the information of each commodity in the commodity list into the user-commodity repurchase cycle model to obtain the current user's repurchase cycle for each commodity in the list.
In one embodiment, specific scenarios are described below to illustrate the invention clearly.
1. The data management module collects, processes and stores information by user dimension, for later use by the model training module and the online recommendation module.
2. Training the user-commodity interest model with the reinforcement learning method
1) First, the two identical neural networks, the eval network and the target network, are constructed, and the eval network parameters $\theta^Q$ and $\theta^\mu$ are initialized; the target network parameters are initialized as $\theta^{Q'} \leftarrow \theta^Q$, $\theta^{\mu'} \leftarrow \theta^\mu$. At the same time, a random process noise $\mathcal{N}$ is initialized for action exploration.
It should be noted that both the Actor network and the Critic network have a target-net and an eval-net; only the eval-net parameters are trained during training, while the target-net parameters are periodically copied from the corresponding eval-nets. The purpose of separating target and eval networks is to avoid updating parameters on consecutive, correlated states: the states before and after an update are correlated, so the training data are no longer independent and identically distributed, the neural network sees only a narrow slice of the problem, and the overall training may even fail to converge. The Actor eval-net outputs an action according to the state, and the Critic eval-net outputs a Q value according to the state and the action.
2) A training-data state $s_i$ fetched from the data management module is input into the Actor eval-net, and the Actor selects an action according to the current policy and exploration noise:

$$a_i = \mu(s_i \mid \theta^\mu) + \mathcal{N}_i$$

In state $s_i$, the action $a_i$ acts on the environment, which returns the user feedback (reward) $r_i$ and a new state $s_{i+1}$.
Take $(s_i, a_i, r_i, s_{i+1})$ as one of the $N$ samples in the reinforcement learning training set.
3) The Critic eval-net obtains, from $s_i$ and $a_i$, the predicted value $Q(s_i, a_i)$ of the recommendation result quality.
4) From the next state $s_{i+1}$, the Actor target-net computes the next (future) action: $a_{i+1} = \mu'(s_{i+1} \mid \theta^{\mu'})$.
5) The Critic target-net computes a new Q value: $Q' = Q'(s_{i+1}, a_{i+1} \mid \theta^{Q'})$.
6) The current user feedback $r_i$ is combined with the discounted Q value to form the target Q value, which is the target value of the recommendation result quality: $y_i = r_i + \gamma Q'$, i.e. $y_i = r_i + \gamma Q'(s_{i+1}, \mu'(s_{i+1} \mid \theta^{\mu'}) \mid \theta^{Q'})$, where $\gamma$ denotes the discount factor.
7) Calculate the gradient of the Critic eval-net.
The first loss function of the Critic eval-net is defined in a manner similar to supervised learning: from the target value $y_i$ of the recommendation result quality of the $i$-th sample and the predicted value $Q(s_i, a_i)$, construct

$$L = \frac{1}{N} \sum_{i} \left( y_i - Q(s_i, a_i \mid \theta^Q) \right)^2$$

where $y_i$ can be regarded as the label, $y_i = r_i + \gamma Q'(s_{i+1}, \mu'(s_{i+1} \mid \theta^{\mu'}) \mid \theta^{Q'})$, $\gamma$ denotes the discount factor, and $N$ is the number of samples.
The Critic eval-net is updated by minimizing $L$, thereby updating $\theta^Q$.
8) Calculate the gradient of the Actor eval-net.
The expectation function of the user feedback for the Actor eval-net is defined as

$$J(\theta^\mu) = \mathbb{E}\left[ Q(s, a \mid \theta^Q) \,\big|\, s = s_i,\, a = \mu(s_i \mid \theta^\mu) \right]$$

The Actor eval-net parameter $\theta^\mu$ is computed and updated through the sampled policy gradient:

$$\nabla_{\theta^\mu} J \approx \frac{1}{N} \sum_{i} \nabla_a Q(s, a \mid \theta^Q) \big|_{s = s_i,\, a = \mu(s_i)} \; \nabla_{\theta^\mu} \mu(s \mid \theta^\mu) \big|_{s = s_i}$$
9) Update the target network parameters from $\theta^Q$ and $\theta^\mu$ by soft update, where $\tau$ is the update rate (see the sketch below):

$$\theta^{Q'} \leftarrow \tau \theta^Q + (1 - \tau)\, \theta^{Q'}$$

$$\theta^{\mu'} \leftarrow \tau \theta^\mu + (1 - \tau)\, \theta^{\mu'}$$
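In code, the soft update of step 9) is a per-parameter exponential moving average; a minimal numpy sketch, where the value of $\tau$ is an assumption:

```python
import numpy as np

tau = 0.01  # assumed soft-update rate

def soft_update(theta_eval: np.ndarray, theta_target: np.ndarray) -> np.ndarray:
    """theta' <- tau * theta + (1 - tau) * theta', applied per parameter array."""
    return tau * theta_eval + (1 - tau) * theta_target
```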
In conclusion, through this training process the user-commodity interest model can be updated dynamically, adjusting the strategy of the recommended commodity list in real time. Given a user's user information, commodity information and commodity sequence information as input, the user-commodity interest model outputs the commodity list corresponding to that user, containing the commodity identifier of each commodity.
3. Training the user-commodity repurchase cycle model with the deep learning method
The user-commodity repurchase cycle model mainly adopts a three-layer fully connected network, and the neuron weight parameters of each layer, collectively the weight parameter $\theta$, are initialized first.
The collected and processed user information of each user and the information of the commodities the user has bought are taken as the $i$-th sample of the deep learning training set of size $M$, where $i \in \{1, \dots, M\}$;
the user information and purchased-commodity information of the $i$-th sample are input into the user-commodity repurchase cycle model to obtain the commodity repurchase cycle training value $h_\theta(x^{(i)})$, where $h_\theta(x) = \theta^T x = \theta_0 x_0 + \dots + \theta_n x_n$, $\theta$ denotes the network weight parameters, $x$ the input feature vector, and $n$ the feature dimension;
the second loss function is constructed from the commodity repurchase cycle training value and the true repurchase cycle value $y^{(i)}$ of the $i$-th sample:

$$J(\theta) = \frac{1}{2M} \sum_{i=1}^{M} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2$$

and minimized, updating the network weight parameter $\theta$ to obtain the trained user-commodity repurchase cycle model. The gradient of the second loss function is computed by gradient descent, following the direction of steepest descent so that the loss function $J(\theta)$ is minimized.
In addition, $y^{(i)}$ is input to the user-commodity repurchase cycle model as label data for gradient training. The label data is obtained by collecting each user's real purchase record times within a preset time window, for example each user's purchase records within one year, and computing the repurchase time of the corresponding commodities. Briefly, if a user bought a mobile phone twice within a year, the true repurchase cycle value of that user for the phone is one year / 2 = 6 months; a sketch of this label computation follows below.
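The following is a hedged sketch of deriving these labels from a window of purchase records; the record format and the handling of single purchases (which yield no repurchase label here) are assumptions, since the specification does not fix them:

```python
from collections import defaultdict

def repurchase_cycle_labels(purchases, window_days=365):
    """purchases: iterable of (user_id, commodity_id) pairs observed in the window.
    Returns {(user_id, commodity_id): true repurchase cycle in days}; e.g. a phone
    bought twice in one year gives 365 / 2, i.e. about 6 months. Commodities bought
    only once yield no label (an assumption; the patent does not specify this case)."""
    counts = defaultdict(int)
    for user_id, commodity_id in purchases:
        counts[(user_id, commodity_id)] += 1
    return {key: window_days / n for key, n in counts.items() if n >= 2}
```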
In summary, given a user's information and the information of the commodities the user has bought, the user-commodity repurchase cycle model outputs the user's repurchase cycle for each corresponding commodity. With this trained model, commodities not due for repurchase can be filtered out.
4. Online recommendation process
When the current user accesses, the following are fetched from the data management module according to the user ID: the user portrait information, the user's historically browsed commodity ID sequence, historically purchased commodity ID sequence and historically exposed commodity ID sequence, and each commodity's price, size, brand, sales volume and category.
The information acquired from the data management module is input into the user-commodity interest model to obtain the current user's commodity list. For example, the commodity list of current user ID1 contains commodity ID1, commodity ID2 and commodity ID3.
For user ID1, the user portrait information and the commodity information corresponding to each of commodity ID1, commodity ID2 and commodity ID3 are then fetched from the data management module.
The user portrait information of current user ID1 and the commodity information of commodity ID1 are input into the user-commodity repurchase cycle model to obtain user ID1's repurchase cycle T1 for commodity ID1;
the user portrait information of current user ID1 and the commodity information of commodity ID2 are input into the user-commodity repurchase cycle model to obtain user ID1's repurchase cycle T2 for commodity ID2;
and the user portrait information of current user ID1 and the commodity information of commodity ID3 are input into the user-commodity repurchase cycle model to obtain user ID1's repurchase cycle T3 for commodity ID3.
For each commodity in the commodity list, the current user's historical purchase record time t (held in the data management module) is compared with the commodity's repurchase cycle T, and the commodity is written into the recommended commodity list if t > T. For example, if user ID1 last purchased commodity ID1 100 days ago, and 100 days is greater than the repurchase cycle T1, then commodity ID1 is available for repurchase and is written into the recommended commodity list, as sketched below.
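This comparison reduces to a simple filter: keep a commodity only when the time since the user last bought it exceeds its predicted repurchase cycle. A minimal sketch (treating never-purchased commodities as passing the filter, which is an assumption):

```python
def filter_repurchasable(candidates, days_since_purchase, cycles):
    """candidates: commodity IDs from the interest model;
    days_since_purchase[c]: days since the user last bought commodity c (t);
    cycles[c]: the user's predicted repurchase cycle for c (T).
    Keeps commodities with t > T; commodities with no purchase record pass
    through unfiltered (an assumption)."""
    return [c for c in candidates
            if c not in days_since_purchase
            or days_since_purchase[c] > cycles[c]]
```

For the example above, `filter_repurchasable(["ID1"], {"ID1": 100}, {"ID1": 60})` keeps commodity ID1, since 100 days exceeds its 60-day cycle.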
In another embodiment, when the current user accesses online, recommended commodities are obtained through the user-commodity interest model and the user-commodity repurchase cycle model as follows.
After the collected information is processed, the current user's user information (such as gender and age) forms a 256-dimensional user vector, and the user's historical browsing sequence contains the commodity ID, price and monthly sales of each browsed commodity. For example, suppose the current user browsed 3 commodities, a Huawei phone, a MacBook and a Sony camera, priced 5000, 15000 and 20000, with monthly sales of 3000, 2000 and 1000, respectively; the historical browsing sequence is then represented as [Huawei phone, 5000, 3000, MacBook, 15000, 2000, Sony camera, 20000, 1000], a 312-dimensional vector in total.
Inputting this 256 + 312 = 568-dimensional vector into the user-commodity interest model yields the commodity list corresponding to the user.
Inputting the 256-dimensional user information together with 52-dimensional commodity information, a 256 + 52 = 308-dimensional vector, into the user-commodity repurchase cycle model yields the repurchase cycle of each commodity in the user's commodity list (the assembly of these input vectors is sketched at the end of this embodiment).
Finally, the current user's historical purchase record time is compared with the repurchase cycle of each commodity in the commodity list, and the commodities not due for repurchase are filtered out to obtain the current user's recommended commodity list.
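The following sketch shows how the input vectors of this example might be assembled. The overall dimensions (256, 312, 568) come from the text; the split of the 312 browsing dimensions into a 102-dimensional ID embedding plus price and monthly sales per commodity (3 × 104 = 312), and the pseudo-embedding itself, are assumptions for illustration:

```python
import numpy as np

def embed_commodity_id(commodity_id: str, dim: int = 102) -> np.ndarray:
    """Assumed placeholder: a pseudo-embedding of the commodity ID
    (stable within one process; a real system would use learned embeddings)."""
    seed = abs(hash(commodity_id)) % (2**32)
    return np.random.default_rng(seed).standard_normal(dim)

user_vec = np.random.default_rng(0).standard_normal(256)  # assumed 256-d user vector

browsed = [("Huawei phone", 5000, 3000),
           ("MacBook", 15000, 2000),
           ("Sony camera", 20000, 1000)]
history = np.concatenate([np.concatenate([embed_commodity_id(name), [price, sales]])
                          for name, price, sales in browsed])  # 3 * (102+2) = 312 dims

interest_input = np.concatenate([user_vec, history])           # 256 + 312 = 568 dims
assert interest_input.shape == (568,)
```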
The modules of the above embodiments may be integrated into one module or deployed separately, and a module may be combined with others into one module or further split into multiple sub-modules.
In addition, an embodiment of the present application further provides an electronic device, a schematic structural diagram of which is shown in Fig. 3, comprising a memory 301, a processor 302, and a computer program stored on the memory and executable on the processor, wherein the processor implements the steps of the above information pushing method when executing the program.
Furthermore, an embodiment of the present application provides a computer-readable storage medium on which a computer program is stored, wherein the program, when executed by a processor, implements the steps of the above information pushing method.
In conclusion, the scheme of the invention trains the user-commodity interest model by a reinforcement learning method, so that user feedback can be perceived and the model's recommendation strategy dynamically adjusted during training, and trains the user-commodity repurchase cycle model by a deep learning method, so that commodities not due for repurchase can be filtered out, thereby forming an accurate and objective commodity recommendation list.
The above description is only a preferred embodiment of the present invention, and is not intended to limit the scope of the present invention. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (10)

1. An information pushing method, characterized in that the method comprises:
acquiring user information, commodity information and commodity sequence information of user historical behaviors;
training a user-commodity interest model by a reinforcement learning method according to the acquired information; training a user-commodity repurchase cycle model by a deep learning method;
when a current user accesses online, looking up the current user's user information, commodity information and commodity sequence information of historical behaviors, and obtaining the current user's repurchase cycle for each commodity in a commodity list according to the user-commodity interest model and the user-commodity repurchase cycle model;
and comparing the current user's historical purchase record time with the repurchase cycle of each commodity in the commodity list, and filtering out the commodities not due for repurchase to obtain the current user's recommended commodity list.
2. The method of claim 1, wherein training the user-commodity interest model using reinforcement learning specifically comprises:
constructing two identical neural networks, an eval neural network and a target neural network, wherein the eval neural network is used to obtain a recommended commodity list and a predicted value of the recommendation result quality, and the target neural network is used to update the parameters of the eval neural network;
initializing the eval network parameters $\theta^Q$ and $\theta^\mu$, and initializing the target network parameters $\theta^{Q'} = \theta^Q$, $\theta^{\mu'} = \theta^\mu$;
taking the user information, commodity information and commodity sequence information of each user's historical behaviors as one piece of training data $s_i$, and taking $(s_i, a_i, r_i, s_{i+1})$ as the $i$-th of the $N$ samples in the reinforcement learning training set, wherein $a_i$ is the recommended commodity list of the $i$-th sample, $r_i$ is the user feedback of the $i$-th sample, and $s_{i+1}$ is the user information, commodity information and commodity sequence information of user historical behaviors in the next state;
constructing a first loss function from the target value and the predicted value of the recommendation result quality of the $i$-th sample; in the eval neural network, minimizing the first loss function and updating the eval network parameter $\theta^Q$;
in the eval neural network, optimizing the expectation function of the user feedback and updating the eval network parameter $\theta^\mu$;
and updating the target network parameters $\theta^{Q'}$ and $\theta^{\mu'}$ from $\theta^Q$ and $\theta^\mu$ to obtain the trained user-commodity interest model.
3. The method of claim 1, wherein training the user-commodity repurchase cycle model using the deep learning method specifically comprises:
taking each user's user information and the information of the commodities the user has bought as the $i$-th of the $M$ samples in a deep learning training set, wherein $i \in \{1, \dots, M\}$ and $M$ is a natural number;
inputting the user information and purchased-commodity information of the $i$-th sample into the user-commodity repurchase cycle model to obtain a commodity repurchase cycle training value;
and constructing a second loss function from the commodity repurchase cycle training value and the true repurchase cycle value of the $i$-th sample, minimizing the second loss function, and updating the network weight parameters to obtain a trained user-commodity repurchase cycle model.
4. The method according to claim 1, wherein looking up the current user's user information, commodity information and commodity sequence information of historical behaviors when the current user accesses online, and obtaining the current user's repurchase cycle for each commodity in the commodity list according to the user-commodity interest model and the user-commodity repurchase cycle model, specifically comprises:
when the current user accesses, obtaining the user information, commodity information and commodity sequence information of the user's historical behaviors according to the current user's identifier;
inputting the user information, commodity information and commodity sequence information of the user's historical behaviors into the user-commodity interest model to obtain the current user's commodity list;
and inputting the current user's information together with the information of each commodity in the commodity list into the user-commodity repurchase cycle model to obtain the current user's repurchase cycle for each commodity in the list.
5. An information pushing apparatus, comprising:
the data management module is used for acquiring user information, commodity information and commodity sequence information of historical behaviors of the user;
the model training module is used for training a user-commodity interest model by a reinforcement learning method according to the information acquired from the data management module, and for training a user-commodity repurchase cycle model by a deep learning method;
and the online recommendation module is used for looking up the current user's user information, commodity information and commodity sequence information of historical behaviors when the current user accesses online, obtaining the current user's repurchase cycle for each commodity in the commodity list according to the user-commodity interest model and the user-commodity repurchase cycle model, comparing the current user's historical purchase record time with the repurchase cycle of each commodity in the commodity list, and filtering out the commodities not due for repurchase to obtain the current user's recommended commodity list.
6. The apparatus of claim 5, wherein the model training module, when employing reinforcement learning to train the user-commodity interest model, is specifically configured to,
construct two identical neural networks, an eval neural network and a target neural network, wherein the eval neural network is used to obtain a recommended commodity list and a predicted value of the recommendation result quality, and the target neural network is used to update the parameters of the eval neural network;
initialize the eval network parameters $\theta^Q$ and $\theta^\mu$, and initialize the target network parameters $\theta^{Q'} = \theta^Q$, $\theta^{\mu'} = \theta^\mu$;
take the user information, commodity information and commodity sequence information of each user's historical behaviors as one piece of training data $s_i$, and take $(s_i, a_i, r_i, s_{i+1})$ as the $i$-th of the $N$ samples in the reinforcement learning training set, wherein $a_i$ is the recommended commodity list of the $i$-th sample, $r_i$ is the user feedback of the $i$-th sample, and $s_{i+1}$ is the user information, commodity information and commodity sequence information of user historical behaviors in the next state;
construct a first loss function from the target value and the predicted value of the recommendation result quality of the $i$-th sample; in the eval neural network, minimize the first loss function and update the eval network parameter $\theta^Q$;
in the eval neural network, optimize the expectation function of the user feedback and update the eval network parameter $\theta^\mu$;
and update the target network parameters $\theta^{Q'}$ and $\theta^{\mu'}$ from $\theta^Q$ and $\theta^\mu$ to obtain the trained user-commodity interest model.
7. The apparatus according to claim 5, wherein the model training module, when training the user-commodity repurchase cycle model using the deep learning method, is specifically configured to:
take each user's user information and the information of the commodities the user has bought as the $i$-th of the $M$ samples in a deep learning training set, wherein $i \in \{1, \dots, M\}$ and $M$ is a natural number;
input the user information and purchased-commodity information of the $i$-th sample into the user-commodity repurchase cycle model to obtain a commodity repurchase cycle training value;
and construct a second loss function from the commodity repurchase cycle training value and the true repurchase cycle value of the $i$-th sample, minimize the second loss function, and update the network weight parameters to obtain a trained user-commodity repurchase cycle model.
8. The apparatus according to claim 5, wherein the online recommendation module, when the current user accesses online, looks up the current user's user information, commodity information and commodity sequence information of historical behaviors and obtains the current user's repurchase cycle for each commodity in the commodity list according to the user-commodity interest model and the user-commodity repurchase cycle model, and is specifically configured to:
when the current user accesses, obtain the user information, commodity information and commodity sequence information of the user's historical behaviors according to the current user's identifier;
input the user information, commodity information and commodity sequence information of the user's historical behaviors into the user-commodity interest model to obtain the current user's commodity list;
and input the current user's information together with the information of each commodity in the commodity list into the user-commodity repurchase cycle model to obtain the current user's repurchase cycle for each commodity in the list.
9. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the method of any of claims 1 to 4 when executing the program.
10. A computer readable medium having a computer program stored thereon, wherein the program when executed by a processor implements the method of any of claims 1-4.
CN201910512050.0A 2019-06-13 2019-06-13 Information pushing method and device Pending CN111738787A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910512050.0A CN111738787A (en) 2019-06-13 2019-06-13 Information pushing method and device


Publications (1)

Publication Number Publication Date
CN111738787A true CN111738787A (en) 2020-10-02

Family

ID=72646334

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910512050.0A Pending CN111738787A (en) 2019-06-13 2019-06-13 Information pushing method and device

Country Status (1)

Country Link
CN (1) CN111738787A (en)

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106157097A (en) * 2016-08-22 2016-11-23 北京京东尚科信息技术有限公司 Method of Commodity Recommendation and system
WO2018069817A1 (en) * 2016-10-10 2018-04-19 Tata Consultancy Services Limited System and method for predicting repeat behavior of customers
CN108038545A (en) * 2017-12-06 2018-05-15 湖北工业大学 Fast learning algorithm based on Actor-Critic neutral net continuous controls
CN109471963A (en) * 2018-09-13 2019-03-15 广州丰石科技有限公司 A kind of proposed algorithm based on deeply study
CN109872220A (en) * 2019-01-24 2019-06-11 上海朝朝晤网络科技有限公司 A kind of commercial product recommending list method for pushing and commercial product recommending list supplying system

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
曾宪宇; 刘淇; 赵洪科; 徐童; 王怡君; 陈恩红: "User Online Purchase Prediction: A Method Based on User Operation Sequences and Choice Models" (用户在线购买预测：一种基于用户操作序列和选择模型的方法), 计算机研究与发展 (Journal of Computer Research and Development), no. 08, 15 August 2016 *
李裕礞; 练绪宝; 徐博; 王健; 林鸿飞: "Next-Basket Recommendation Based on Users' Implicit Feedback Behavior" (基于用户隐性反馈行为的下一个购物篮推荐), 中文信息学报 (Journal of Chinese Information Processing), no. 05, 15 September 2017 *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113538141A (en) * 2021-07-14 2021-10-22 中数通信息有限公司 Product recommendation method based on customer information
CN115345718A (en) * 2022-10-19 2022-11-15 易商惠众(北京)科技有限公司 Exclusive-based commodity recommendation method and system

Similar Documents

Publication Publication Date Title
CN109102127B (en) Commodity recommendation method and device
CN107563841B (en) Recommendation system based on user score decomposition
US11651381B2 (en) Machine learning for marketing of branded consumer products
CN109087178B (en) Commodity recommendation method and device
CN112487278A (en) Training method of recommendation model, and method and device for predicting selection probability
Chinnamgari R Machine Learning Projects: Implement supervised, unsupervised, and reinforcement learning techniques using R 3.5
CN113508378A (en) Recommendation model training method, recommendation device and computer readable medium
CN107784390A (en) Recognition methods, device, electronic equipment and the storage medium of subscriber lifecycle
CN111242729A (en) Serialization recommendation method based on long-term and short-term interests
Rao A strategist’s guide to artificial intelligence
CN112508613A (en) Commodity recommendation method and device, electronic equipment and readable storage medium
CN113191838B (en) Shopping recommendation method and system based on heterogeneous graph neural network
CN113158024B (en) Causal reasoning method for correcting popularity deviation of recommendation system
CN111949887A (en) Item recommendation method and device and computer-readable storage medium
KR102422408B1 (en) Method and apparatus for recommending item based on collaborative filtering neural network
CN111738787A (en) Information pushing method and device
Ahamed et al. A recommender system based on deep neural network and matrix factorization for collaborative filtering
CN111680213B (en) Information recommendation method, data processing method and device
CN113850654A (en) Training method of item recommendation model, item screening method, device and equipment
CN103425791A (en) Article recommending method and device based on ant colony collaborative filtering algorithm
CN107169830A (en) A kind of personalized recommendation method based on cluster PU matrix decompositions
CN115344794A (en) Scenic spot recommendation method based on knowledge map semantic embedding
CN111681081B (en) Interactive product configuration method and system, computer readable storage medium and terminal
CN113761388A (en) Recommendation method and device, electronic equipment and storage medium
JP6686208B1 (en) Information processing device, information processing method, and program

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination