CN112801743B - Commodity recommendation method and device, electronic equipment and storage medium - Google Patents

Commodity recommendation method and device, electronic equipment and storage medium

Info

Publication number
CN112801743B
Authority
CN
China
Prior art keywords
commodity
vector
determining
recommended
exploration
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110137907.2A
Other languages
Chinese (zh)
Other versions
CN112801743A (en)
Inventor
王成庆
成建勇
赵巍
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhuhai Necessary Industrial Technology Co ltd
Original Assignee
Zhuhai Necessary Industrial Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhuhai Necessary Industrial Technology Co ltd
Publication of CN112801743A
Application granted
Publication of CN112801743B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/06 Buying, selling or leasing transactions
    • G06Q30/0601 Electronic shopping [e-shopping]
    • G06Q30/0631 Item recommendations
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/06 Buying, selling or leasing transactions
    • G06Q30/0601 Electronic shopping [e-shopping]
    • G06Q30/0623 Item investigation
    • G06Q30/0625 Directed, with specific intent or strategy
    • G06Q30/0629 Directed, with specific intent or strategy for generating comparisons
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/06 Buying, selling or leasing transactions
    • G06Q30/0601 Electronic shopping [e-shopping]
    • G06Q30/0633 Lists, e.g. purchase orders, compilation or processing
    • G06Q30/0635 Processing of requisition or of purchase orders

Abstract

The embodiment of the invention discloses a commodity recommendation method, a commodity recommendation device, electronic equipment and a storage medium, wherein the method comprises the following steps: determining an inherent characteristic matrix of a current period of the commodity, a recommended exploration commodity score matrix of the current period and a reinforcement learning parameter vector of the current period; determining a commodity initial sequencing result vector of the current period based on the inherent characteristic matrix of the current period, the recommended exploration commodity score matrix of the current period and the reinforcement learning parameter vector of the current period; determining a commodity sequencing intermediate result vector of the current period based on the commodity initial sequencing result vector of the current period and the commodity actual sales quantity sequencing result; and determining the commodity recommendation result of the current period based on the commodity sorting intermediate result vector of the current period. The technical scheme provided by the embodiment of the invention has the advantages of high response speed to high-quality commodities or hot commodities, high recommended commodity attraction and high overall conversion rate.

Description

Commodity recommendation method and device, electronic equipment and storage medium
Technical Field
The embodiment of the invention relates to a data processing technology, in particular to a commodity recommendation method, a commodity recommendation device, electronic equipment and a storage medium.
Background
With the continuous expansion of e-commerce, the number of commodity types and the number of commodities of each type are increasing rapidly. Because the amount of information is large, a user has to spend a great deal of time browsing irrelevant commodity information to find the commodity to purchase, so the platform needs to provide good commodity recommendation results to meet the user's purchasing needs. In the C2M (Customer-to-Manufacturer) mode, not only does the e-commerce platform recommend commodities to the user, but the manufacturer, i.e., the merchant itself, also recommends commodities to the user. However, commodity recommendation methods in the prior art are inaccurate and inefficient.
Disclosure of Invention
The embodiment of the invention provides a commodity recommendation method, a commodity recommendation device, electronic equipment and a storage medium, which have higher response speed to high-quality commodities or hot commodities, enable a commodity recommendation result to be associated with the actual commodity sales volume, are beneficial to purchasing commodities by users, and have high commodity attractiveness and high overall conversion rate.
In a first aspect, an embodiment of the present invention provides a method for recommending a commodity, including:
determining an inherent characteristic matrix of a current period of the commodity, a recommended exploration commodity score matrix of the current period and a reinforcement learning parameter vector of the current period;
determining a commodity initial sequencing result vector of the current period based on the inherent characteristic matrix of the current period, the recommended and explored commodity score matrix of the current period and the reinforcement learning parameter vector of the current period;
determining a commodity sequencing intermediate result vector of the current period based on the commodity initial sequencing result vector of the current period and the commodity actual sales quantity sequencing result;
and determining the commodity recommendation result of the current period based on the commodity sorting intermediate result vector of the current period.
In a second aspect, an embodiment of the present invention further provides a commodity recommendation device, including:
a first determining module, configured to determine an inherent characteristic matrix of a current period of a commodity, a recommended exploration commodity score matrix of the current period and a reinforcement learning parameter vector of the current period;
the reinforcement learning module is used for determining a commodity initial sequencing result vector of the current period based on the inherent characteristic matrix of the current period, the recommended exploration commodity score matrix of the current period and the reinforcement learning parameter vector of the current period;
the double feedback loop module is used for determining a commodity sequencing intermediate result vector of the current period based on the commodity initial sequencing result vector of the current period and the commodity actual sales quantity sequencing result;
and the second determining module is used for determining the commodity recommendation result of the current period based on the commodity sequencing intermediate result vector of the current period.
In a third aspect, an embodiment of the present invention further provides a merchant article recommendation device applied to the C2M model, where the merchant article recommendation device includes an article recommendation device according to the second aspect of the present invention.
In a fourth aspect, the embodiment of the present invention further provides a platform commodity recommendation device applied to the C2M model, wherein the platform commodity recommendation device includes a commodity recommendation device according to the second aspect of the present invention as described above.
In a fifth aspect, an embodiment of the present invention provides an electronic device, including:
one or more processors;
a storage device for storing one or more programs,
when the one or more programs are executed by the one or more processors, the one or more processors are caused to implement the methods provided by the embodiments of the present invention.
In a sixth aspect, the present invention provides a computer-readable storage medium, on which a computer program is stored, where the computer program is executed by a processor to implement the method provided by the present invention.
According to the technical scheme provided by the embodiment of the invention, the initial commodity sequencing result vector of the current period is determined through the intrinsic characteristic matrix based on the current period, the recommended exploration commodity score matrix of the current period and the reinforcement learning parameter vector of the current period, and the intermediate commodity sequencing result vector of the current period is determined through the initial commodity sequencing result vector based on the current period and the actual commodity sales sequencing result; and the commodity recommendation result of the current period is determined through the commodity sorting intermediate result vector of the current period, so that the commodity recommendation method has higher response speed on high-quality commodities or hot commodities, the commodity recommendation result has smaller difference with the actual commodity sales volume, the recommendation accuracy and the recommendation efficiency are improved, the commodity selection of a user is facilitated, the recommended commodity attraction force is high, and the overall conversion rate is high.
Drawings
FIG. 1a is a flow chart of a method for recommending merchandise according to an embodiment of the present invention;
FIG. 1b is an exemplary diagram of aggregation processing performed on real-time message data according to an embodiment of the present invention;
FIG. 2a is a flowchart of a commodity recommendation method according to an embodiment of the present invention;
FIG. 2b is a flowchart of a method for recommending a commodity according to an embodiment of the present invention;
FIG. 3a is a block diagram of a structure of a commodity recommending apparatus according to an embodiment of the present invention;
FIG. 3b is a block diagram of a merchandise recommendation device according to an embodiment of the present invention;
FIG. 4 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
The present invention will be described in further detail with reference to the accompanying drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the invention and are not limiting of the invention. It should be further noted that, for the convenience of description, only some of the structures related to the present invention are shown in the drawings, not all of the structures.
Fig. 1a is a flowchart of a commodity recommendation method according to an embodiment of the present invention. The method may be executed by a commodity recommendation apparatus, which may be implemented in software and/or hardware and may be configured in an electronic device such as a computer or a server. Optionally, the method may be applied to scenarios of recommending commodities to a new user, for example recommending commodities to a new user on an e-commerce platform, recommending commodities to a new user on a merchant platform, or the platform of an e-commerce platform making new-user commodity recommendations to a merchant, which is not specifically limited.
As shown in fig. 1a, the technical solution provided by the embodiment of the present invention includes:
S110: Determine an inherent characteristic matrix of the current period of the commodity, a recommended exploration commodity score matrix of the current period and a reinforcement learning parameter vector of the current period.
In the embodiment of the invention, the electronic device may take a fixed length of time as one period and may acquire the real-time message data of the commodities in each period from an e-commerce website or a server, where the real-time message data of a commodity includes commodity exposure real-time message data, commodity click real-time message data and commodity order real-time message data. The real-time message data of the commodities in each period is then processed to obtain the inherent characteristic matrix of the commodities. One period may be ten minutes or another length of time.
In an implementation manner of the embodiment of the present invention, optionally, determining the inherent characteristic matrix includes:
aggregating all the commodity exposure real-time message data, commodity click real-time message data and commodity order real-time message data in a period to obtain a real-time performance matrix of the commodities; and determining the inherent characteristic matrix of the commodities based on the real-time performance matrix. The inherent characteristic matrix of the current period is obtained by aggregating the exposure real-time message data, click real-time message data and order real-time message data of all commodities in the current period.
The commodity exposure real-time message data may refer to data on the commodity being displayed to users through the platform, converted into real-time message queue data, for example the number of times the commodity is displayed by the platform. The commodity click real-time message data may refer to behavior data of users clicking the commodity after it is displayed through the platform, converted into real-time message queue data, for example data for clicking to browse the commodity or clicking to review it. The commodity order real-time message data may refer to data on users purchasing the commodity after it is displayed through the platform, converted into real-time message queue data, for example order data of the commodity.
In an implementation manner of the embodiment of the present invention, optionally, determining the inherent characteristic matrix of the commodities based on the real-time performance matrix includes: determining a click conversion rate vector, an exposure conversion rate vector, a favorable review count vector and a discount index vector of the commodities based on the real-time performance matrix; and forming the inherent characteristic matrix of the commodities based on the click conversion rate vector, the exposure conversion rate vector, the favorable review count vector and the discount index vector;
wherein the click conversion rates of the commodities form the click conversion rate vector; the exposure conversion rates of the commodities form the exposure conversion rate vector; the numbers of favorable reviews of the commodities form the favorable review count vector; and the discount indices of the commodities form the discount index vector.
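The aggregation of one period of messages into per-commodity exposure, click and order counts can be sketched as follows. This is a minimal illustration only: the source does not specify a message format, so the `(commodity_id, event_type)` tuple layout, the event type names and the function name `aggregate_period` are all assumptions.

```python
from collections import Counter

# Hypothetical event type names; the source only distinguishes
# exposure, click and order messages.
EVENT_TYPES = ("exposure", "click", "order")


def aggregate_period(events, commodity_ids):
    """Aggregate one period of (commodity_id, event_type) messages.

    Returns a dict with one count vector per event type, ordered like
    `commodity_ids` (corresponding to d_e, d_c and d_p).
    """
    counts = Counter(events)  # counts each (commodity_id, event_type) pair
    return {
        etype: [counts[(cid, etype)] for cid in commodity_ids]
        for etype in EVENT_TYPES
    }


events = [
    ("a", "exposure"), ("a", "exposure"), ("a", "click"),
    ("b", "exposure"), ("b", "click"), ("b", "order"),
]
D = aggregate_period(events, ["a", "b"])
```

A commodity with no messages of a given type simply gets a count of zero, because `Counter` returns 0 for missing keys.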
In this embodiment, after all the commodity exposure real-time message data, commodity click real-time message data and commodity order real-time message data in one period are aggregated (for an example of aggregating the real-time message data, refer to fig. 1b), the real-time performance matrix D of the commodities is output by commodity dimension and satisfies the following data format:
$D = \{d_e, d_c, d_p\}$
$d_e = \{d_{e1}, d_{e2}, \ldots, d_{en}\}$
$d_c = \{d_{c1}, d_{c2}, \ldots, d_{cn}\}$
$d_p = \{d_{p1}, d_{p2}, \ldots, d_{pn}\}$
where $d_e$ is the real-time exposure value vector of the commodities; $d_{en}$ is the real-time exposure value of commodity n; $d_c$ is the real-time click value vector of the commodities; $d_{cn}$ is the real-time click value of commodity n; $d_p$ is the real-time sales value vector of the commodities; and $d_{pn}$ is the real-time purchase value of commodity n. The inherent characteristic matrix F of the commodities satisfies the following data format:
$F = \{f_{cvr}, f_{ctcvr}, f_{com}, f_{dis}\}$
$f_{cvr} = \{f_{cvr1}, f_{cvr2}, \ldots, f_{cvrn}\}$
$f_{ctcvr} = \{f_{ctcvr1}, f_{ctcvr2}, \ldots, f_{ctcvrn}\}$
$f_{com} = \{f_{com1}, f_{com2}, \ldots, f_{comn}\}$
$f_{dis} = \{f_{dis1}, f_{dis2}, \ldots, f_{disn}\}$
where $f_{cvrn} = d_{cn}/d_{en}$ and $f_{ctcvrn} = d_{pn}/d_{en}$;
wherein,
Figure GDA0003562845290000061
$d_n = (p_n - p'_n)/p_n$
where $d_n$ is the discount rate of commodity n; $\alpha_1$, $\alpha_2$, $\alpha_3$ are the weight values of the discount index; th1, th2, th3 are the thresholds of the price intervals; $p_n$ is the price of commodity n, and $p'_n$ is the discounted price of commodity n;
where $f_{cvr}$ is the click conversion rate vector of the commodities; $f_{cvrn}$ is the click conversion rate of commodity n; $f_{ctcvr}$ is the exposure conversion rate vector of the commodities; $f_{ctcvrn}$ is the exposure conversion rate of commodity n; $f_{com}$ is the favorable review count vector of the commodities; $f_{comn}$ is the number of favorable reviews of commodity n; $f_{dis}$ is the discount index vector of the commodities; and $f_{disn}$ is the discount index of commodity n.
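The two conversion rates and the discount rate defined above are simple element-wise ratios, which the following sketch works through. The function names and the zero-exposure handling are assumptions; the discount index itself is given only as a formula image in the source and is therefore not reproduced here.

```python
def conversion_features(d_e, d_c, d_p):
    """Return (f_cvr, f_ctcvr) where, per commodity n,
    f_cvrn = d_cn / d_en and f_ctcvrn = d_pn / d_en."""
    f_cvr, f_ctcvr = [], []
    for exposures, clicks, purchases in zip(d_e, d_c, d_p):
        if exposures == 0:
            # Assumption: a commodity with no exposure gets zero rates;
            # the source does not say how this case is handled.
            f_cvr.append(0.0)
            f_ctcvr.append(0.0)
        else:
            f_cvr.append(clicks / exposures)
            f_ctcvr.append(purchases / exposures)
    return f_cvr, f_ctcvr


def discount_rate(price, discounted_price):
    """d_n = (p_n - p'_n) / p_n."""
    return (price - discounted_price) / price


f_cvr, f_ctcvr = conversion_features(d_e=[100, 50, 0], d_c=[20, 5, 0], d_p=[5, 1, 0])
d_1 = discount_rate(price=200.0, discounted_price=150.0)
```

For the first commodity, 20 clicks and 5 purchases over 100 exposures give a click conversion rate of 0.2 and an exposure conversion rate of 0.05; a price cut from 200 to 150 gives a discount rate of 0.25.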
In the embodiment of the present invention, the reinforcement learning parameter vector may be set manually, or may be determined by other data.
In an implementation manner of the embodiment of the present invention, optionally, determining the reinforcement learning parameter vector of the current period includes: determining the reinforcement learning parameter vector of the current period based on the inherent characteristic matrix of the previous period of the commodity, the recommended exploration commodity score matrix of the previous period and the reinforcement learning score vector of the previous period. The inherent characteristic matrix of the previous period may be determined in the same way as the inherent characteristic matrix described above. Each period corresponds to one recommended exploration commodity score matrix and one reinforcement learning score vector.
In an implementation manner of the embodiment of the present invention, optionally, the recommended explored commodity score matrix is determined based on the following manner: forming a recommended exploration commodity vector based on the recommended exploration score value of each commodity; forming a continuous action vector of the recommended exploration commodity based on the recommended exploration continuous action score value of each commodity; and determining a score matrix of the recommended exploration commodities based on the vector of the recommended exploration commodities and the continuous action vector of the recommended exploration commodities.
Optionally, the recommended exploration score value is determined based on the following formula:
Figure GDA0003562845290000071
where $e_{en}$ is the recommended exploration score value of commodity n; $\phi$ is a first sales threshold of the commodity; $f_{cvrn}$ is the exposure conversion rate of commodity n; and $d_{pn}$ is the purchase quantity value of commodity n;
wherein,
Figure GDA0003562845290000081
where $dis_n$ is the exploration value score of commodity n; $\beta_1$, $\beta_2$, $\gamma_1$, $\gamma_2$ are linear coefficients of the exploration value score; $p_{th1}$, $p_{th2}$, $p_{th3}$ are the second, third and fourth sales thresholds of the commodity, respectively; and $\varepsilon$ is a minimal set value, which may be taken as 0.001.
wherein
Figure GDA0003562845290000082
is the increment of the sales quantity of commodity n in the m-th period relative to the previous period;
Figure GDA0003562845290000083
is the mean of the increment vector;
Figure GDA0003562845290000084
is the variance of the increment vector; and the increment vector is $diff_{pur} = \{diff_{pur1}, diff_{pur2}, \ldots, diff_{purn}\}$;
wherein,
Figure GDA0003562845290000085
Figure GDA0003562845290000086
wherein the recommended exploration sustained action score value is determined based on the following formula:
Figure GDA0003562845290000087
where $\rho$ is a time decay coefficient with $0 < \rho < 1$;
Figure GDA0003562845290000088
is the recommended exploration score value of commodity n in the i-th period; $th_e$ is a preset recommended exploration score value; $e_{remn}$ is the recommended exploration sustained action score value of commodity n; and t is the maximum number of periods over which the recommended exploration of a commodity has a sustained effect on the score.
In this embodiment, optionally, forming the recommended exploration commodity vector based on the recommended exploration score value of each commodity includes: determining the recommended exploration commodity vector based on the following formula:
$E_e = \{e_{e1}, e_{e2}, \ldots, e_{en}\}$
where $E_e$ is the recommended exploration commodity vector;
correspondingly, forming the recommended exploration commodity sustained action vector based on the recommended exploration sustained action score value of each commodity includes: determining the recommended exploration commodity sustained action vector based on the following formula:
$E_{rem} = \{e_{rem1}, e_{rem2}, \ldots, e_{remn}\}$
where $E_{rem}$ is the recommended exploration commodity sustained action vector;
correspondingly, determining the recommended exploration commodity score matrix based on the recommended exploration commodity vector and the recommended exploration commodity sustained action vector includes: determining the recommended exploration commodity score matrix based on the following formula:
$E = \{E_e, E_{rem}\}$
where E is the recommended exploration commodity score matrix.
For the method for determining the recommended search commodity score matrix in each period, the above method may be referred to.
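The formula for the sustained action score survives only as an image reference in this text, so the sketch below substitutes an assumed decay rule purely for illustration: it takes the most recent exploration score above the preset value within the look-back window and multiplies it by the decay coefficient once per elapsed period. The function names and this rule are assumptions, not the patent's formula. Only the last step, pairing the two vectors into the score matrix, follows directly from the definition above.

```python
def sustained_action_score(history, rho, th_e, max_periods):
    """Assumed decay rule (NOT the source's formula, which is not legible).

    `history` holds one commodity's exploration scores, oldest first,
    with the current period last. Looks back over at most `max_periods`
    periods for the most recent score above `th_e` and decays it by
    `rho` for every period elapsed since then.
    """
    recent = history[-max_periods:]
    for age, score in enumerate(reversed(recent)):
        if score > th_e:
            return (rho ** age) * score
    return 0.0


def exploration_matrix(e_e, e_rem):
    """E = {E_e, E_rem}: one [e_en, e_remn] row per commodity."""
    return [list(pair) for pair in zip(e_e, e_rem)]


histories = [[0.0, 0.8, 0.0], [0.0, 0.0, 0.0]]
e_e = [h[-1] for h in histories]
e_rem = [sustained_action_score(h, rho=0.5, th_e=0.5, max_periods=3) for h in histories]
E = exploration_matrix(e_e, e_rem)
```

In this toy run the first commodity scored 0.8 one period ago, so its assumed sustained score is 0.5 × 0.8 = 0.4 even though its current exploration score is zero; the second commodity never exceeded the preset value and stays at zero.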
In an implementation manner of the embodiment of the present invention, optionally, the reinforcement learning score vector is determined based on the following formula:
Figure GDA0003562845290000091
where R is the reinforcement learning score vector;
where $N_m = d_c + d_p$; $d_c$ is the real-time click value vector of the commodities; $d_p$ is the real-time sales value vector of the commodities;
wherein,
Figure GDA0003562845290000092
where $Q_1$ is the initial reward score value obtained by reinforcement learning in the 1st period; $Q_m$ is the reward score value obtained by reinforcement learning in the m-th period; $d_{ei}$ is the exposure value of the commodity in the i-th period;
wherein,
Figure GDA0003562845290000093
where $d_{cmn}$ indicates whether commodity n is clicked in the m-th period: $d_{cmn} = 0$ indicates that commodity n is not clicked in the m-th period, and $d_{cmn} = 1$ indicates that commodity n is clicked in the m-th period;
$d_{pmn}$ indicates whether commodity n is purchased in the m-th period: $d_{pmn} = 0$ indicates that commodity n is not purchased in the m-th period, and $d_{pmn} = 1$ indicates that commodity n is purchased in the m-th period; bonus1 is the reward score value after the commodity is clicked; bonus2 is the reward score value after the commodity is purchased. The Bandit function is the function used in reinforcement learning to evaluate the reward scores of click and purchase actions.
In an implementation manner of the embodiment of the present invention, optionally, determining the reinforcement learning parameter vector of the current period based on the commodity inherent characteristic matrix of the previous period, the recommended exploration commodity score matrix of the previous period, and the reinforcement learning score vector of the previous period includes:
adjusting the reinforcement learning parameter vector based on the following formula:
Figure GDA0003562845290000101
wherein,
Figure GDA0003562845290000102
where $\theta = \{R, f_{cvr}, f_{ctcvr}, f_{com}, f_{dis}, E_e, E_{rem}\}$;
where $L(\theta) = L_{ctcvr}(\theta) + L_{comment}(\theta)$;
where $W = \{W_R, W_F, W_E\}$, $W_F = \{W_{cvr}, W_{ctcvr}, W_{com}, W_{dis}\}$, $W_E = \{W_{Ee}, W_{Erem}\}$;
where $W_i$ is the weight value of the i-th dimension and is an element of the reinforcement learning parameter vector W; $c_i$ is the boundary constraint condition of the i-th dimension; $\theta$ is the input feature vector; R is the reinforcement learning score vector; $f_{cvr}$ is the click conversion rate vector of the commodities; $f_{ctcvr}$ is the exposure conversion rate vector of the commodities; $f_{com}$ is the favorable review count vector of the commodities; $f_{dis}$ is the discount index vector of the commodities; $E_e$ is the recommended exploration commodity vector; $E_{rem}$ is the recommended exploration commodity sustained action vector; $L(\theta)$ is the multi-objective loss function; $L_{ctcvr}(\theta)$ is the exposure conversion rate loss function; and $L_{comment}(\theta)$ is the total favorable review count loss function;
where $W_R$ is the weight value corresponding to the reinforcement learning score vector R; $W_F$ is the weight vector corresponding to the inherent characteristic matrix F; $W_E$ is the weight vector corresponding to the recommended exploration commodity score matrix E; $W_{cvr}$ is the weight value corresponding to the click conversion rate vector; $W_{ctcvr}$ is the weight value corresponding to the exposure conversion rate vector; $W_{com}$ is the weight value corresponding to the favorable review count vector; and $W_{dis}$ is the weight value corresponding to the discount index vector.
The commodity inherent characteristic matrix F of the previous period, the recommended exploration commodity score matrix E of the previous period and the reinforcement learning score vector R of the previous period are input into the automatic parameter adjusting module, which calculates the reinforcement learning parameter vector W. In this way the weight values of the matrices and vectors used in the reinforcement learning algorithm can be determined, so that the initial commodity sorting scores in the subsequently output commodity initial sorting result vector are as close as possible to the direction of high actual revenue; adjusting the weight values of multiple factors in the automatic parameter adjusting module is intended to maximize the revenue generated by commodity recommendation.
The automatic parameter adjusting module may adjust the parameters through a multi-objective supervised learning algorithm with two objectives, improving the exposure conversion rate and improving the total number of favorable reviews of the commodities, which is implemented by constructing a loss function. Alternatively, the parameters may be adjusted by a multi-objective evolutionary algorithm, such as the vector evaluated genetic algorithm (VEGA), the multi-objective genetic algorithm (MOGA), the non-dominated sorting genetic algorithm (NSGA), or the non-dominated sorting genetic algorithm with an elitist strategy (NSGA-II).
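As a rough illustration of tuning weights against a two-term loss under per-dimension bounds, the sketch below uses projected gradient descent on a linear score. Everything specific here is an assumption: the source's loss and update formulas are only available as images, so both loss terms are taken to be mean squared errors against hypothetical target vectors, and the boundary constraint is taken to mean clipping each weight to $[-c_i, c_i]$.

```python
def predict(X, W):
    """Linear score per commodity: dot product of each feature row with W."""
    return [sum(x * w for x, w in zip(row, W)) for row in X]


def multi_objective_loss(X, W, y_ctcvr, y_comment):
    """Assumed L = L_ctcvr + L_comment, each a mean squared error."""
    preds = predict(X, W)
    n = len(X)
    l_ctcvr = sum((p - y) ** 2 for p, y in zip(preds, y_ctcvr)) / n
    l_comment = sum((p - y) ** 2 for p, y in zip(preds, y_comment)) / n
    return l_ctcvr + l_comment


def tune_weights(X, y_ctcvr, y_comment, bounds, lr=0.05, steps=200):
    """Projected gradient descent: step downhill, then clip W_i to [-c_i, c_i]."""
    n, k = len(X), len(X[0])
    W = [0.0] * k
    for _ in range(steps):
        preds = predict(X, W)
        grad = [0.0] * k
        for row, p, y1, y2 in zip(X, preds, y_ctcvr, y_comment):
            residual = 2 * (p - y1) + 2 * (p - y2)  # derivative of both terms
            for j in range(k):
                grad[j] += residual * row[j] / n
        W = [max(-c, min(c, w - lr * g)) for w, g, c in zip(W, grad, bounds)]
    return W


# Toy data: three commodities, two features, hypothetical targets.
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
y_ctcvr = [0.2, 0.4, 0.6]
y_comment = [0.4, 0.6, 1.0]
W = tune_weights(X, y_ctcvr, y_comment, bounds=[1.0, 1.0])
```

Because the two targets disagree, the tuned weights settle on a compromise between them rather than fitting either one exactly, which is the usual behavior of a summed multi-objective loss.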
S120: Determine the commodity initial sorting result vector of the current period based on the inherent characteristic matrix of the current period, the recommended exploration commodity score matrix of the current period and the reinforcement learning parameter vector of the current period.
In this embodiment, the inherent characteristic matrix F of the current period, the recommended exploration commodity score matrix E of the current period, and the reinforcement learning parameter vector W of the current period may be input to the reinforcement learning module, and the reinforcement learning module may calculate the commodity initial sorting result vector R1 through a multi-armed bandit algorithm; the algorithm in the reinforcement learning module may also be an ε-greedy multi-armed bandit method, a policy gradient reinforcement learning method, a Monte Carlo tree search method, or the like.
In an implementation manner of the embodiment of the present invention, optionally, the determining the initial commodity ranking result vector of the current period based on the commodity intrinsic feature matrix of the current period, the recommended and explored commodity score matrix of the current period, and the reinforcement learning parameter vector of the current period includes:
determining the initial commodity ordering result vector based on the following formula:
R1=concat(R,F,E)·W
wherein, R1 is the vector of the initial commodity ordering result; f is the inherent characteristic matrix, and W is the reinforcement learning parameter vector. The concat function is a function that combines vectors with the same dimension into a matrix according to columns.
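The scoring step R1 = concat(R, F, E)·W amounts to stacking the per-commodity vectors as columns and taking a weighted sum of each row. A minimal sketch, assuming F and E are passed as sequences of column vectors and using made-up numbers; the helper names are hypothetical:

```python
def concat_columns(*vectors):
    """Combine equal-length vectors column-wise into an n x k matrix (list of rows)."""
    n = len(vectors[0])
    if any(len(v) != n for v in vectors):
        raise ValueError("all vectors must have the same length")
    return [list(row) for row in zip(*vectors)]


def initial_ranking_scores(R, F, E, W):
    """R1 = concat(R, F, E) . W, giving one score per commodity.

    R is a vector; F and E are sequences of column vectors.
    """
    matrix = concat_columns(R, *F, *E)
    if len(W) != len(matrix[0]):
        raise ValueError("W needs one weight per column")
    return [sum(x * w for x, w in zip(row, W)) for row in matrix]


# Two commodities; columns: R, f_cvr, f_ctcvr, f_com, f_dis, E_e, E_rem.
R = [1.0, 2.0]
F = [[0.2, 0.1], [0.05, 0.02], [10.0, 4.0], [0.5, 0.0]]
E = [[0.0, 1.0], [0.0, 0.5]]
W = [1.0, 1.0, 1.0, 0.1, 1.0, 1.0, 1.0]
R1 = initial_ranking_scores(R, F, E, W)
# Commodity indices sorted by descending score give the initial ranking.
order = sorted(range(len(R1)), key=lambda i: R1[i], reverse=True)
```

With these illustrative values the second commodity scores higher, mainly because of its exploration columns, and is ranked first.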
The reinforcement learning score vector may be calculated by the reinforcement learning module, and the calculation process may be as follows:
wherein,
Figure GDA0003562845290000121
where R is the reinforcement learning score vector;
where $N_m = d_c + d_p$; $d_c$ is the real-time click value vector of the commodities; $d_p$ is the real-time sales value vector of the commodities;
wherein,
Figure GDA0003562845290000122
where $Q_1$ is the initial reward score value obtained by reinforcement learning in the 1st period; $Q_m$ is the reward score value obtained by reinforcement learning in the m-th period; $d_{ei}$ is the exposure value of the commodity in the i-th period; the m-th period may be the current period;
wherein,
Figure GDA0003562845290000123
where $d_{cmn}$ indicates whether commodity n is clicked in the m-th period: $d_{cmn} = 0$ indicates that commodity n is not clicked in the m-th period, and $d_{cmn} = 1$ indicates that commodity n is clicked in the m-th period;
$d_{pmn}$ indicates whether commodity n is purchased in the m-th period: $d_{pmn} = 0$ indicates that commodity n is not purchased in the m-th period, and $d_{pmn} = 1$ indicates that commodity n is purchased in the m-th period; bonus1 is the reward score value after the commodity is clicked; bonus2 is the reward score value after the commodity is purchased.
It should be noted that determining the recommended exploration commodity score matrix makes it possible to find the high-quality commodities of the current period, so that they rise quickly to the head of the initial sorting result. At the same time, in consideration of how long a high-quality commodity should remain effective, the recommended exploration sustained action score is introduced so that a high-quality commodity keeps its effect for several periods. In addition, since the high-quality commodities that are to rise quickly should not include hot commodities already at the head of the initial sorting result, an exploration value score is introduced as a parameter for evaluating the exploration value of a commodity.
S130: determining a commodity sorting intermediate result vector of the current period based on the commodity initial sorting result vector of the current period and the commodity actual sales sorting result.
In the embodiment of the invention, the commodity actual sales sorting result DF may be the actual sales sorting result of all commodities in the current period. Alternatively, the real-time performance matrix D of the current period may be input into the data screening processing module, which sorts the commodities in decreasing order of sales, selects the top X commodities and their corresponding sales, and outputs the actual sales sorting result of these top X commodities as the commodity actual sales sorting result DF.
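The top-X screening described above can be sketched as follows. This is a minimal illustration assuming sales are held in a dictionary keyed by commodity identifier; the function name and the sample data are hypothetical.

```python
def top_x_by_sales(sales, x):
    """Return the x best-selling commodities as (commodity_id, sales) pairs,
    in decreasing order of sales."""
    ranked = sorted(sales.items(), key=lambda item: item[1], reverse=True)
    return ranked[:x]


sales = {"a": 30, "b": 120, "c": 75, "d": 5}
top = top_x_by_sales(sales, 2)
```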
In the embodiment of the invention, the commodity initial sorting result vector R1 and the commodity actual sales sorting result DF of the current period are input into the double feedback loop module, which outputs the commodity sorting intermediate result vector R2. Specifically, commodities with low actual sales that are ranked near the front of R1 may be down-weighted (their feedback gain coefficient is decreased), and commodities with high actual sales that are ranked toward the back of R1 may be up-weighted (their feedback gain coefficient is increased), so that commodities with high actual sales are moved to the head of the ranking and the head does not contain commodities with low sales.
In an implementation manner of the embodiment of the present invention, optionally, the determining a commodity sorting intermediate result vector in the current cycle based on the commodity initial sorting result vector in the current cycle and the commodity actual sales sorting result includes: determining a feedback gain coefficient of the commodities based on the actual sales volume sorting result of the commodities; and determining a commodity sequencing intermediate result vector of the current period based on the feedback gain coefficient of each commodity and the commodity initial sequencing result vector.
In an implementation manner of the embodiment of the present invention, optionally, the feedback gain coefficient of the commodity is determined based on the following formula:
Figure GDA0003562845290000141
wherein buffer_n is the feedback gain coefficient of commodity n; R1_{th1} and R1_{th2} are the segmentation thresholds in R1;
wherein DF_1, DF_2 and DF_3 are the commodity sets segmented according to commodity sales; R1_n is the initial sorting result score value of commodity n; d_{pn} is the sales value of commodity n; wherein
DF_1 = {d_{p11}, ..., d_{p1r}}, d_{pn} > d_{pth1}
DF_2 = {d_{p21}, ..., d_{p2s}}, d_{pth1} ≥ d_{pn} > d_{pth2}
DF_3 = {d_{p31}, ..., d_{p3u}}, d_{pth2} ≥ d_{pn} > d_{pth3}
DF = {DF_1, DF_2, DF_3};
r + s + u = X
wherein d_{pth1}, d_{pth2} and d_{pth3} are the commodity sales thresholds; r, s and u are the numbers of commodities in the three commodity sets, respectively; and X is the number of commodities in the commodity actual sales sorting result.
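The segmentation of the actual sales sorting result into three sets by sales thresholds can be sketched as follows. The sketch assumes the three thresholds are strictly decreasing and that the third set covers sales at or below the second threshold and above the third; the function name and the sample values are hypothetical.

```python
def segment_by_sales(sales, th1, th2, th3):
    """Split commodities into three sets by sales value.

    DF1: sales > th1
    DF2: th1 >= sales > th2
    DF3: th2 >= sales > th3
    Commodities with sales <= th3 fall outside all three sets.
    """
    if not th1 > th2 > th3:
        raise ValueError("thresholds must satisfy th1 > th2 > th3")
    df1, df2, df3 = {}, {}, {}
    for commodity, value in sales.items():
        if value > th1:
            df1[commodity] = value
        elif value > th2:
            df2[commodity] = value
        elif value > th3:
            df3[commodity] = value
    return df1, df2, df3


df1, df2, df3 = segment_by_sales({"a": 120, "b": 60, "c": 15, "d": 3}, 100, 50, 10)
```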
In this embodiment, optionally, the determining a commodity sorting intermediate result vector in the current period based on the feedback gain coefficient of each commodity and the commodity initial sorting result vector includes:
determining the commodity ordering intermediate result vector based on the following formula:
R2 = {R2_1, R2_2, ..., R2_n}
R2_n = R1_n × buffer_n
wherein R2 is the commodity sorting intermediate result vector; R2_n is the sorting intermediate result score value of commodity n.
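The multiplication of each initial score by its feedback gain coefficient can be sketched as follows. The piecewise formula for the gain coefficient is given only in a figure that is not reproduced in this text, so the rule in feedback_gain below (boost top-selling commodities with a low initial score, damp commodities outside the sales result with a high initial score) and the values 1.5 and 0.5 are purely hypothetical stand-ins, as are the function names.

```python
def feedback_gain(score, sales_tier, r1_th1, r1_th2):
    """Hypothetical gain rule, NOT the formula of the embodiment.

    sales_tier is 1, 2 or 3 for commodities in the corresponding sales set,
    or None for commodities outside the actual sales sorting result.
    """
    if sales_tier == 1 and score < r1_th2:
        return 1.5   # high sales but low initial score: move up
    if sales_tier is None and score >= r1_th1:
        return 0.5   # low sales but high initial score: move down
    return 1.0       # otherwise leave the score unchanged


def intermediate_scores(r1, tiers, r1_th1, r1_th2):
    """Apply the per-commodity product: intermediate score = initial score * gain."""
    return {
        commodity: score * feedback_gain(score, tiers.get(commodity), r1_th1, r1_th2)
        for commodity, score in r1.items()
    }


r1 = {"a": 0.2, "b": 0.9, "c": 0.5}
tiers = {"a": 1, "c": 2}   # "b" is not among the top sellers
r2 = intermediate_scores(r1, tiers, r1_th1=0.8, r1_th2=0.3)
```

In this example the top seller "a" overtakes nothing by itself, but the over-ranked commodity "b" drops below "c", which is the qualitative effect the double feedback loop is described as having.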
S140: determining the commodity recommendation result of the current period based on the commodity sorting intermediate result vector of the current period.
In an implementation manner of this embodiment, optionally, determining the commodity recommendation result of the current period based on the commodity sorting intermediate result vector of the current period includes: forming a commodity sorting result matrix according to the applicable gender of the commodities based on the commodity sorting intermediate result vector; and determining the commodity recommendation result of the current period based on the commodity sorting result matrix and the gender of the user.
In this embodiment of the present invention, optionally, forming a commodity sorting result matrix according to the commodity applicable gender based on the commodity sorting intermediate result vector includes: determining a commodity ordering result matrix based on the following formula:
R3 = {R3_{female}, R3_{male}, R3_{common}}
wherein R3 is the commodity sorting result matrix;
wherein
Figure GDA0003562845290000151
p_i = 1 or p_i = -1,
Figure GDA0003562845290000152
wherein
Figure GDA0003562845290000153
p_i = 0 or p_i = -1,
Figure GDA0003562845290000154
wherein
Figure GDA0003562845290000155
wherein
Figure GDA0003562845290000156
Figure GDA0003562845290000157
Figure GDA0003562845290000158
wherein
Figure GDA0003562845290000161
wherein p_i is the applicable gender variable of the commodity;
Figure GDA0003562845290000162
is the applicable gender descriptor of the commodity;
wherein male, female and common denote that the applicable gender is male, female and general, respectively;
R3_{female}, R3_{male} and R3_{common} are the commodity sorting result vectors applicable to women, men and general use, respectively;
Figure GDA0003562845290000163
are the sorting result score values of commodity n applicable to women, men and general use, respectively.
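One possible reading of the grouping by applicable gender is sketched below. The formulas themselves are in figures not reproduced here, so the encoding of the applicable gender variable (1 for female, 0 for male, -1 for general) and the rule that general-purpose commodities appear in every list are assumptions inferred from the conditions on p_i above; the function name and data are hypothetical.

```python
FEMALE, MALE, COMMON = 1, 0, -1   # assumed encoding of the applicable gender variable


def split_by_gender(r2, p):
    """Group intermediate scores into female, male and general lists.

    r2 maps commodity id -> intermediate score; p maps commodity id -> gender code.
    Each returned list holds commodity ids in decreasing order of score.
    """
    def ranked(allowed):
        ids = [c for c in r2 if p[c] in allowed]
        return sorted(ids, key=lambda c: r2[c], reverse=True)

    return {
        "female": ranked({FEMALE, COMMON}),
        "male": ranked({MALE, COMMON}),
        "common": ranked({COMMON}),
    }


r2 = {"dress": 0.9, "razor": 0.7, "mug": 0.4}
p = {"dress": FEMALE, "razor": MALE, "mug": COMMON}
r3 = split_by_gender(r2, p)
```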
In an implementation manner of the embodiment of the present invention, optionally, the determining a gender vector of the user based on the physiological gender of the user and the shopping tendency gender of the user includes: determining the user gender vector based on the following formula:
Figure GDA0003562845290000164
wherein Usex is the user gender vector;
wherein
Figure GDA0003562845290000165
wherein
Figure GDA0003562845290000166
is the physiological gender variable of user j;
Figure GDA0003562845290000167
is the shopping tendency gender variable of user j;
wherein
Figure GDA0003562845290000168
Figure GDA0003562845290000169
wherein usersex_j is the physiological gender descriptor of user j;
Figure GDA00035628452900001610
is the total number of times user j has clicked on commodities;
Figure GDA00035628452900001611
is the number of times user j has clicked on commodities whose applicable gender is female;
Figure GDA00035628452900001612
is the total number of times user j has purchased commodities;
Figure GDA00035628452900001613
is the number of times user j has purchased commodities whose applicable gender is female; γ_1, γ_2, δ_1 and δ_2 are proportional coefficients;
correspondingly, determining the commodity recommendation result of the current period based on the commodity sorting result matrix and the user gender vector includes: determining the commodity recommendation result based on the following formula:
Figure GDA0003562845290000171
wherein output is the set formed by the commodity recommendation results; th_{female} is the threshold for commodities applicable to women; th_{male} is the threshold for commodities applicable to men; and other denotes all other cases. th_{female} and th_{male} may be preset manually.
In the related art, after acquiring a new user's information, a platform generally needs to recommend commodities to the new user by combining commodity features with user needs. However, new users are easily lost because they exhibit few behaviors, have no clear shopping intention, and are unfamiliar with the functions of the e-commerce platform. It is therefore necessary to make suitable commodity recommendations for new users about whom little information is available. Current cold-start commodity recommendation solutions for new users generally fall into the following three types:
The first is to count the recent sales of all commodities on the whole platform to obtain a sales ranking list of all commodities, then screen the commodities at the head of the ranking list according to their applicable gender and recommend them to male users, female users, or users of unknown gender.
The second is to linearly integrate multiple commodity features. Specifically, the commodity features include sales volume, click conversion rate, exposure conversion rate, commodity category, favorable rating, and the like. The linear integration formula is:
Figure GDA0003562845290000172
wherein S_n is the final sorting score of commodity n, k is the number of feature categories of the commodity,
Figure GDA0003562845290000173
is the i-th feature score of commodity n, and α_i is the feature weight value corresponding to feature i. Typically, the feature weight values are calculated by a machine learning method (such as a neural network or a support vector machine).
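The linear integration of commodity features described above amounts to a weighted sum of k feature scores. A minimal sketch follows; the function name, the feature values and the weights are hypothetical, and the exact symbol used for the feature score in the original formula is not reproduced here.

```python
def linear_sorting_score(feature_scores, weights):
    """Weighted sum of the k feature scores of one commodity."""
    if len(feature_scores) != len(weights):
        raise ValueError("one weight is required per feature")
    return sum(alpha * f for alpha, f in zip(weights, feature_scores))


# Illustrative features: normalized sales, click conversion rate, exposure conversion rate.
features = [0.8, 0.1, 0.05]
alphas = [0.5, 0.3, 0.2]
score = linear_sorting_score(features, alphas)
```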
The third is to calculate the commodity ranking using a reinforcement learning method, such as a multi-armed bandit method or a policy gradient method. A reinforcement learning method can learn in real time which commodities are more popular on the platform and move those commodities forward in the ranking.
However, the above solutions have poor real-time performance. The first and second schemes generally need to compute statistics on commodity performance and features on a daily basis, so they cannot respond quickly to commodity purchasing hotspots that arise in real time. For example, if at 18:00 on a given day a commodity A on the platform attracts a great deal of attention and its click rate and purchase rate rise sharply, the first and second schemes, which compute statistics by day, cannot track this hotspot quickly. In the technical scheme provided by the embodiment of the invention, the reinforcement learning algorithm in the reinforcement learning module computes the commodity ranking in real time, so that the current purchasing hotspot is tracked quickly and the ranking of the commodity rises quickly.
In the technical scheme of the related art, the commodity recommendation result is not controlled, and it can differ greatly from the actual commodity sales of the preceding days. The third scheme above has this drawback. If a commodity receives a large number of clicks in a short time but its actual sales are unsatisfactory or it has many negative reviews, it is still likely to be promoted to the head of the sorting result, which hinders users in selecting commodities and may reduce the platform conversion rate. In the embodiment of the invention, the commodity sorting intermediate result vector is determined from the commodity initial sorting result vector and the commodity actual sales sorting result, and the commodity recommendation result is determined from the commodity sorting intermediate result vector. That is, the reinforcement learning result is fed back and optimized using actual sales: commodities with high actual sales but a low position in the reinforcement learning sorting result are moved up, and in the opposite case commodities are moved down, so that hot or high-quality commodities are recommended to the user.
All three schemes in the related art respond slowly to high-quality or hot commodities. Specifically, when a commodity attracts the attention of a large number of users within a short time, either when newly listed or after having been on the shelf for some time, and its click rate and sales rate rise sharply, it can be regarded as a high-quality or hot commodity. The three schemes react slowly to such commodities; in particular, they cannot quickly move up the ranking of commodities that become high-quality or hot on the same day. In the embodiment of the invention, the recommended exploration commodity score matrix is determined, and the commodity initial sorting result vector is determined based on this matrix and other parameters, from which the commodity recommendation result is obtained. That is, recommendation exploration is performed during commodity recommendation, so that high-quality or hot commodities can be found and quickly raised to the head of the sorting result, which solves the problem of slow response to such commodities.
The first and second schemes in the related art also adapt poorly. The needs of users vary widely, and the types of hot commodities change continuously with the season, the weather, and the dominant age and gender of the customer base. The first and second schemes, which rely on manually set parameters, have difficulty tracking large changes in demand over long time spans. In the embodiment of the invention, the reinforcement learning parameter vector of the current period is determined from the intrinsic feature matrix, the recommended exploration commodity score matrix and the reinforcement learning score vector of the previous period. That is, the reinforcement learning parameters (the weights of the matrices or vectors) are adjusted in every period, which gives fast response and strong adaptability to the environment, so that high user attraction and conversion rate can be maintained across recommendation scenarios under different conditions.
According to the technical scheme provided by the embodiment of the invention, the commodity initial sorting result vector of the current period is determined from the intrinsic feature matrix of the current period, the recommended exploration commodity score matrix of the current period and the reinforcement learning parameter vector of the current period; the commodity sorting intermediate result vector of the current period is determined from the commodity initial sorting result vector of the current period and the commodity actual sales sorting result; and the commodity recommendation result of the current period is determined from the commodity sorting intermediate result vector of the current period. As a result, the commodity recommendation method responds faster to high-quality or hot commodities, the recommendation result differs less from actual commodity sales, users can purchase commodities more easily, the recommended commodities are more attractive, and the overall conversion rate is higher.
Fig. 2a is a flowchart of a commodity recommendation method according to an embodiment of the present invention. In this embodiment, the determination of the reinforcement learning parameter vector is optimized. As shown in fig. 2a, the technical solution according to the embodiment of the present invention includes:
S210: determining the intrinsic feature matrix of the current period of the commodity.
S220: determining the reinforcement learning parameter vector of the current period based on the intrinsic feature matrix of the previous period of the commodity, the recommended exploration commodity score matrix of the previous period and the reinforcement learning score vector of the previous period.
S230: determining the recommended exploration commodity score matrix of the current period.
S240: determining the commodity initial sorting result vector of the current period based on the intrinsic feature matrix of the current period, the recommended exploration commodity score matrix of the current period and the reinforcement learning parameter vector of the current period.
S250: determining the commodity sorting intermediate result vector of the current period based on the commodity initial sorting result vector of the current period and the commodity actual sales sorting result.
S260: forming a commodity sorting result matrix according to the applicable gender of the commodities based on the commodity sorting intermediate result vector.
S270: determining the commodity recommendation result of the current period based on the commodity sorting result matrix and the gender of the user.
Reference is made to the above embodiments for the introduction of S210-S270.
In order to describe the technical solution of the present invention in more detail, as shown in fig. 2b, the technical solution provided by the embodiment of the present invention includes the following steps:
Step 1: Aggregate all commodity exposure real-time message data, commodity click real-time message data and commodity order real-time message data in the previous period T, and output the result by commodity dimension.
Step 2: Judge whether the real-time data processing of Step 1 is finished. If so, proceed to the next step; if not, the processing is abnormal, and the process is interrupted and exits.
Real-time data processing of the commodities is judged to be finished when:
the exposure real-time message data, click real-time message data and order real-time message data of all commodities in the previous period T have been processed and computed with no data missing; and the processed data conforms to the following data format:
D = {d_e, d_c, d_p}
d_e = {d_{e1}, d_{e2}, ..., d_{en}}
d_c = {d_{c1}, d_{c2}, ..., d_{cn}}
d_p = {d_{p1}, d_{p2}, ..., d_{pn}}
If these conditions are not met, the real-time message data processing is judged to be unfinished, and the processing is interrupted. The processed real-time data forms the real-time performance matrix D, and the intrinsic feature matrix F of the previous period is formed based on the real-time performance matrix.
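The format check in Step 2 can be sketched as follows, assuming the real-time performance matrix is held as a dictionary with one list per signal (exposure, click, order) and one entry per commodity. The key names, the function name and the additional requirement that values be non-negative numbers are assumptions made for illustration.

```python
def is_valid_realtime_matrix(D):
    """Check that D has exactly the vectors de, dc and dp, all of the same
    non-zero length, holding only non-negative numbers."""
    required = ("de", "dc", "dp")
    if not isinstance(D, dict) or set(D) != set(required):
        return False
    if not all(isinstance(D[key], (list, tuple)) for key in required):
        return False
    lengths = {len(D[key]) for key in required}
    if len(lengths) != 1 or 0 in lengths:
        return False   # vectors must cover the same n commodities
    return all(
        isinstance(value, (int, float)) and value >= 0
        for key in required
        for value in D[key]
    )


D = {"de": [100, 40, 7], "dc": [12, 3, 0], "dp": [2, 1, 0]}
ok = is_valid_realtime_matrix(D)
```

A caller would interrupt the pipeline when the function returns False, matching the abnormal exit described in Step 2.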
Step 3: Input the intrinsic feature matrix F of the previous period, the recommended exploration commodity score matrix E calculated in the previous period and the reinforcement learning score vector R of the previous period into the automatic parameter adjusting module, which automatically calculates the reinforcement learning parameter vector W.
Step 4: The recommendation exploration module calculates the recommended exploration commodity score matrix E of the current period from the real-time performance matrix D of each commodity in the current period.
Step 5: Input the intrinsic feature matrix F of the commodities, the reinforcement learning parameter vector W and the recommended exploration commodity score matrix E of the current period into the reinforcement learning module, which calculates the commodity initial sorting result vector R1.
Step 6: Form the real-time performance matrix D from the real-time data processed in the current period and input it into the data screening processing module, which sorts the commodities in decreasing order of sales, selects the top X commodities by sales and their corresponding sales, and outputs the actual sales sorting result DF of these top X commodities.
Step 7: Judge whether the data screening is finished. The conditions for judging that the data screening is finished include: (1) the real-time performance matrices D of all commodities have been screened; (2) the screened actual sales sorting result DF complies with the following data format:
DF = {DF_1, DF_2, DF_3}
DF_1 = {d_{p11}, ..., d_{p1r}}, d_{pn} > d_{pth1}
DF_2 = {d_{p21}, ..., d_{p2s}}, d_{pth1} ≥ d_{pn} > d_{pth2}
DF_3 = {d_{p31}, ..., d_{p3u}}, d_{pth2} ≥ d_{pn} > d_{pth3}
If these conditions are not met, the data screening is judged to be unfinished, and the processing is interrupted.
Step 8: Input the commodity initial sorting result vector R1 and the commodity actual sales sorting result DF into the double feedback loop module for sorting, and output the commodity sorting intermediate result vector R2.
Step 9: Arrange the commodity sorting intermediate result vector R2 into the commodity sorting result matrix R3 according to the applicable gender of the commodities.
Step 10: Calculate the user gender vector Usex according to the physiological gender and the shopping tendency gender of the user.
Step 11: Output the final commodity recommendation result for the specific user according to the user gender vector Usex and the commodity sorting result matrix R3.
Step 12: After the calculation of the current period is finished, record the intermediate variables, namely the recommended exploration commodity score matrix E and the reinforcement learning score vector R, to provide the data basis for the calculation of the next period.
Fig. 3a is a structural block diagram of a commodity recommendation device according to an embodiment of the present invention. As shown in fig. 3a, the device according to the embodiment of the present invention includes: a first determining module 310, a reinforcement learning module 320, a double feedback loop module 330, and a second determining module 340.
The first determining module 310 is configured to determine the intrinsic feature matrix of the current period of the commodity, the recommended exploration commodity score matrix of the current period, and the reinforcement learning parameter vector of the current period;
the reinforcement learning module 320 is configured to determine the commodity initial sorting result vector of the current period based on the intrinsic feature matrix of the current period, the recommended exploration commodity score matrix of the current period, and the reinforcement learning parameter vector of the current period;
the double feedback loop module 330 is configured to determine the commodity sorting intermediate result vector of the current period based on the commodity initial sorting result vector of the current period and the commodity actual sales sorting result;
and the second determining module 340 is configured to determine the commodity recommendation result of the current period based on the commodity sorting intermediate result vector of the current period.
Here, the commodity recommendation device may be a user device or a server capable of implementing commodity recommendation, or may be a device in which the user device and the server are integrated. In particular embodiments, user devices include, but are not limited to, smart phones, tablets, wearable devices, devices capable of voice interaction, and other electronic products. In particular embodiments, the server includes, but is not limited to, implementations such as a network host, a single network server, a set of network servers, or a cloud-computing-based collection of computers. Here, the cloud is made up of a large number of hosts or network servers based on cloud computing, where cloud computing is a type of distributed computing: a super virtual computer consisting of a collection of loosely coupled computers. It will be understood by those skilled in the art that the above commodity recommendation device is merely exemplary, and other existing or future commodity recommendation devices that may be adapted to the present invention are also included within the scope of the present invention and are hereby incorporated by reference. Here, the commodity recommendation device includes an electronic device capable of automatically performing numerical calculation and information processing according to instructions set or stored in advance, and its hardware includes, but is not limited to, a microprocessor, an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA), a digital signal processor (DSP), an embedded device, and the like.
As shown in fig. 3b, based on the above embodiment, optionally, the first determining module 310 includes an automatic parameter adjusting module 350, configured to determine the reinforcement learning parameter vector of the current period based on the intrinsic feature matrix of the previous period of the commodity, the recommended exploration commodity score matrix of the previous period, and the reinforcement learning score vector of the previous period.
Optionally, the second determining module 340 includes an output sorting module 341 and a recommendation result output module 342;
the output sorting module 341 is configured to form a commodity sorting result matrix according to the applicable gender of the commodities based on the commodity sorting intermediate result vector;
and the recommendation result output module 342 is configured to determine the commodity recommendation result of the current period based on the commodity sorting result matrix and the gender of the user.
Optionally, the first determining module 310 includes a data processing module 360, configured to: aggregate all commodity exposure real-time message data, commodity click real-time message data and commodity order real-time message data in one period to obtain the real-time performance matrix of the commodity;
and determine the intrinsic feature matrix of the commodity based on the real-time performance matrix of the commodity.
Optionally, determining the intrinsic feature matrix of the commodity based on the real-time performance matrix of the commodity includes:
determining a click conversion rate vector of the commodity, an exposure conversion rate vector of the commodity, a favorable rating vector of the commodity and a discount index vector of the commodity based on the real-time performance matrix of the commodity;
forming the intrinsic feature matrix of the commodity based on the click conversion rate vector, the exposure conversion rate vector, the favorable rating vector and the discount index vector;
wherein the click conversion rates of the commodities form the click conversion rate vector;
the exposure conversion rates of the commodities form the exposure conversion rate vector;
the favorable ratings of the commodities form the favorable rating vector;
and the discount indices of the commodities form the discount index vector.
Optionally, the first determining module 310 includes a recommendation exploration module 370, configured to:
forming a recommended exploration commodity vector based on the recommended exploration score value of each commodity;
forming a continuous action vector of the recommended exploration commodity based on the recommended exploration continuous action score value of each commodity;
and determining a score matrix of the recommended exploration commodities based on the vector of the recommended exploration commodities and the continuous action vector of the recommended exploration commodities.
Optionally, the recommended exploration score value is determined based on the following formula:
Figure GDA0003562845290000251
wherein e_{en} is the recommended exploration score value of commodity n; φ is the first sales threshold of the commodity; f_{cvrn} is the exposure conversion rate of commodity n; d_{pn} is the purchase quantity value of commodity n;
wherein
Figure GDA0003562845290000252
wherein dis_n is the exploration value score value of commodity n; β_1, β_2, γ_1 and γ_2 are the linear coefficients of the exploration value score value; p_{th1}, p_{th2} and p_{th3} are the second, third and fourth sales thresholds of the commodity, respectively;
wherein
Figure GDA0003562845290000253
is the increment vector of the sales volume of commodity n in the m-th period relative to the previous period;
Figure GDA0003562845290000254
is the mean of the increment vector;
Figure GDA0003562845290000255
is the variance of the increment vector; wherein the increment vector diff_{pur} = {diff_{pur1}, diff_{pur2}, ..., diff_{purn}};
wherein the recommended exploration continuous action score value is determined based on the following formula:
Figure GDA0003562845290000261
wherein ρ is the time decay coefficient, 0 < ρ < 1;
Figure GDA0003562845290000262
is the recommended exploration score value of commodity n in the i-th period; th_e is the preset recommended exploration score value; e_{remn} is the recommended exploration continuous action score value of commodity n; t is the maximum number of periods over which a recommended exploration commodity continues to affect the score.
Optionally, the forming a recommended exploration commodity vector based on the recommended exploration score value of each commodity includes:
determining the recommended exploration commodity vector based on the following formula:
$E_e = \{e_{e1}, e_{e2}, \ldots, e_{en}\}$
wherein $E_e$ is the recommended exploration commodity vector;
correspondingly, forming the recommended exploration commodity continuous action vector based on the recommended exploration sustained action score value of each commodity comprises:
determining the recommended exploration commodity continuous action vector based on the following formula:
$E_{rem} = \{e_{rem1}, e_{rem2}, \ldots, e_{remn}\}$
wherein $E_{rem}$ is the recommended exploration commodity continuous action vector;
correspondingly, determining the recommended exploration commodity score matrix based on the recommended exploration commodity vector and the recommended exploration commodity continuous action vector comprises:
determining the recommended exploration commodity score matrix based on the following formula:
$E = \{E_e, E_{rem}\}$
wherein E is the recommended exploration commodity score matrix.
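By way of illustration only, assembling the score matrix E from the two vectors may be sketched in Python as follows. The function name `build_exploration_matrix`, the two-row list layout and the sample values are assumptions made for this sketch; the embodiment does not prescribe a data layout.

```python
def build_exploration_matrix(e_e, e_rem):
    """Stack E_e and E_rem into E = {E_e, E_rem}.

    Assumption: both vectors hold one entry per commodity, in the same order.
    """
    if len(e_e) != len(e_rem):
        raise ValueError("E_e and E_rem must cover the same commodities")
    return [list(e_e), list(e_rem)]


# Hypothetical score values for three commodities.
E_e = [0.8, 0.0, 0.3]
E_rem = [0.4, 0.1, 0.0]
E = build_exploration_matrix(E_e, E_rem)
```

Each row of E can then be weighted separately, which matches the separate weights defined for the two vectors in the parameter vector W.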
Optionally, the reinforcement learning score vector is determined based on the following formula:
Figure GDA0003562845290000263
wherein R is the reinforcement learning score vector;
wherein $N_m = d_c + d_p$; $d_c$ is the click value vector of the commodities; $d_p$ is the sales volume value vector of the commodities;
wherein,
Figure GDA0003562845290000271
wherein $Q_1$ is the initial reward score value obtained by reinforcement learning in the 1st period; $Q_m$ is the reward score value obtained by reinforcement learning in the m-th period; $d_{ei}$ is the exposure value of the commodity in the i-th period;
wherein,
Figure GDA0003562845290000272
wherein $d_{cmn}$ indicates whether commodity n is clicked in the m-th period: $d_{cmn} = 0$ indicates that commodity n is not clicked in the m-th period, and $d_{cmn} = 1$ indicates that commodity n is clicked in the m-th period;
wherein $d_{pmn}$ indicates whether commodity n is purchased in the m-th period: $d_{pmn} = 0$ indicates that commodity n is not purchased in the m-th period, and $d_{pmn} = 1$ indicates that commodity n is purchased in the m-th period;
wherein bonus1 is the reward score value after the commodity is clicked; bonus2 is the reward score value after the commodity is purchased.
Optionally, the automation parameter adjustment module 350 is configured to:
adjusting the reinforcement learning parameter vector based on the following formula:
Figure GDA0003562845290000273
wherein,
Figure GDA0003562845290000274
wherein $\theta = \{R, f_{cvr}, f_{ctcvr}, f_{com}, f_{dis}, E_e, E_{rem}\}$;
wherein $L(\theta) = L_{ctcvr}(\theta) + L_{comment}(\theta)$;
wherein $W = \{W_R, W_F, W_E\}$, $W_F = \{W_{cvr}, W_{ctcvr}, W_{com}, W_{dis}\}$, $W_E = \{W_{Ee}, W_{Erem}\}$;
wherein $W_i$ is the weight value of the i-th dimension and is an element of the reinforcement learning parameter vector W; $c_i$ is the boundary constraint condition of the i-th dimension; $\theta$ is the input feature vector; R is the reinforcement learning score vector; $f_{cvr}$ is the click conversion rate vector of the commodity; $f_{ctcvr}$ is the exposure conversion rate vector of the commodity; $f_{com}$ is the favorable review count vector of the commodity; $f_{dis}$ is the discount index vector of the commodity; $E_e$ is the recommended exploration commodity vector; $E_{rem}$ is the recommended exploration commodity continuous action vector; $L(\theta)$ is the multi-objective loss function; $L_{ctcvr}(\theta)$ is the exposure conversion rate loss function; $L_{comment}(\theta)$ is the total favorable review count loss function;
wherein $W_R$ is the weight value corresponding to the reinforcement learning score vector R; $W_F$ is the weight vector corresponding to the inherent characteristic matrix F; $W_E$ is the weight vector corresponding to the recommended exploration commodity score matrix E; $W_{cvr}$ is the weight value corresponding to the click conversion rate vector; $W_{ctcvr}$ is the weight value corresponding to the exposure conversion rate vector; $W_{com}$ is the weight value corresponding to the favorable review count vector; $W_{dis}$ is the weight value corresponding to the discount index vector.
Optionally, the reinforcement learning module 320 is configured to:
determining the initial commodity ordering result vector based on the following formula:
$R1 = \mathrm{concat}(R, F, E) \cdot W$
wherein R1 is the commodity initial sorting result vector; F is the inherent characteristic matrix, and W is the reinforcement learning parameter vector.
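As a non-limiting sketch, the weighted combination R1 = concat(R, F, E) · W may be written in Python as below. The row-per-feature layout, the function name `initial_ranking` and all sample numbers are assumptions of this sketch; it only relies on W holding one weight per concatenated row, in the order W_R, W_cvr, W_ctcvr, W_com, W_dis, W_Ee, W_Erem.

```python
def initial_ranking(R, F, E, W):
    """Compute R1 = concat(R, F, E) . W with features stored as rows.

    Assumptions: R is one row of n scores, F has four rows
    (f_cvr, f_ctcvr, f_com, f_dis), E has two rows (E_e, E_rem),
    and W holds one scalar weight per row in the same order.
    """
    rows = [list(R)] + [list(r) for r in F] + [list(r) for r in E]
    if len(rows) != len(W):
        raise ValueError("one weight per feature row is required")
    n = len(rows[0])
    if any(len(r) != n for r in rows):
        raise ValueError("every row must cover the same commodities")
    # Score of commodity i is the weighted sum of its feature values.
    return [sum(w * row[i] for w, row in zip(W, rows)) for i in range(n)]


# Hypothetical values for two commodities.
R = [1.0, 2.0]
F = [[0.1, 0.2], [0.0, 0.1], [3.0, 1.0], [0.5, 0.5]]
E = [[0.0, 1.0], [0.0, 0.5]]
W = [1.0, 2.0, 0.0, 0.5, 0.0, 1.0, 0.0]
R1 = initial_ranking(R, F, E, W)
```

A weight of zero removes a feature from the ranking without changing the shape of the inputs, which is convenient when individual weights are tuned between periods.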
Optionally, the apparatus further includes a data filtering module 390 for determining the actual sales volume sorting result of the commodities;
optionally, the dual feedback loop module 330 is configured to determine a feedback gain coefficient of the commodity based on the actual sales volume sorting result of the commodity;
and determining a commodity sequencing intermediate result vector of the current period based on the feedback gain coefficient of each commodity and the commodity initial sequencing result vector.
Optionally, the determining a feedback gain coefficient of the commodity based on the actual sales volume sorting result of the commodity includes:
determining a feedback gain factor for the commodity based on the following equation:
Figure GDA0003562845290000291
wherein $buffer_n$ is the feedback gain coefficient of commodity n; $R1_{th1}$ and $R1_{th2}$ are segmentation thresholds in R1; $DF_1$, $DF_2$ and $DF_3$ are respectively commodity sets segmented according to commodity sales volume; $R1_n$ is the initial sorting result score value of commodity n; $d_{pn}$ is the sales volume value of commodity n;
correspondingly, determining a commodity sorting intermediate result vector of the current period based on the feedback gain coefficient of each commodity and the commodity initial sorting result vector, including:
determining the commodity ordering intermediate result vector based on the following formula:
$R2 = \{R2_1, R2_2, \ldots, R2_n\}$
$R2_n = R1_n \times buffer_n$
wherein R2 is the commodity sorting intermediate result vector; $R2_n$ is the sorting intermediate result score value of commodity n.
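For illustration, the element-wise relation R2_n = R1_n × buffer_n may be sketched in Python as follows. The feedback gain coefficients are taken as already computed, because their piecewise definition is given only in the formula figure; the function name `apply_feedback_gain` and the sample values are hypothetical.

```python
def apply_feedback_gain(r1, buffer):
    """Return R2 where R2_n = R1_n * buffer_n for every commodity n.

    Assumption: buffer holds one precomputed gain per commodity,
    in the same order as r1.
    """
    if len(r1) != len(buffer):
        raise ValueError("R1 and buffer must have the same length")
    return [score * gain for score, gain in zip(r1, buffer)]


# Hypothetical initial scores and gains for three commodities.
R1 = [2.0, 4.0, 1.0]
buffer = [1.5, 1.0, 0.5]
R2 = apply_feedback_gain(R1, buffer)
```

A gain above 1 raises a commodity relative to its initial score and a gain below 1 lowers it, so the actual sales ranking corrects the initial sorting result.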
Optionally, forming a commodity sorting result matrix according to commodity applicable gender based on the commodity sorting intermediate result vector, including:
determining a commodity ordering result matrix based on the following formula:
$R3 = \{R3_{female}, R3_{male}, R3_{common}\}$
wherein R3 is the commodity sorting result matrix;
wherein,
Figure GDA0003562845290000301
$p_i = 1$ or $p_i = -1$,
Figure GDA0003562845290000302
wherein,
Figure GDA0003562845290000303
$p_i = 0$ or $p_i = -1$,
Figure GDA0003562845290000304
wherein,
Figure GDA0003562845290000305
wherein,
Figure GDA0003562845290000306
Figure GDA0003562845290000307
Figure GDA0003562845290000308
wherein,
Figure GDA0003562845290000309
wherein $p_i$ is the applicable gender variable of the commodity;
Figure GDA00035628452900003010
is the applicable gender descriptor of the commodity;
wherein male, female and common indicate that the applicable gender is male, female and general, respectively;
$R3_{female}$, $R3_{male}$ and $R3_{common}$ are respectively the commodity sorting result vectors applicable to women, men and general use;
Figure GDA00035628452900003011
are respectively the sorting result score values of commodity n applicable to women, men and general use.
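One possible reading of the gender split may be sketched in Python as below. Several points are assumptions of this sketch, since the defining formulas are available only as figures: the encoding $p_i = 1$ for female, $p_i = 0$ for male and $p_i = -1$ for general is inferred from the conditions stated above; a score of 0.0 for commodities outside a group and the rule that $R3_{common}$ keeps only general commodities are illustrative choices; the function name `split_by_gender` is hypothetical.

```python
FEMALE, MALE, COMMON = 1, 0, -1  # assumed encoding of the gender variable p_i


def split_by_gender(r2, p):
    """Build R3 = {R3_female, R3_male, R3_common} from R2 and p.

    Assumption: a commodity outside a group receives 0.0 in that group's
    vector; the source figures may define this case differently.
    """
    if len(r2) != len(p):
        raise ValueError("R2 and p must have the same length")
    pairs = list(zip(r2, p))
    return {
        "female": [s if g in (FEMALE, COMMON) else 0.0 for s, g in pairs],
        "male": [s if g in (MALE, COMMON) else 0.0 for s, g in pairs],
        "common": [s if g == COMMON else 0.0 for s, g in pairs],
    }


# Hypothetical intermediate scores and gender variables for three commodities.
R2 = [3.0, 4.0, 0.5]
p = [1, 0, -1]
R3 = split_by_gender(R2, p)
```

Under this reading a general commodity appears in all three vectors, so it can be recommended regardless of the user gender vector.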
Optionally, the apparatus further comprises a user gender matching module 380 for determining a user gender vector based on the user physiological gender and the user shopping tendency gender;
and a recommendation result output module 342, configured to determine a recommendation result of the commodity in the current period based on the commodity sorting result matrix and the user gender vector.
Optionally, the user gender matching module 380 is configured to:
determining the user gender vector based on the following formula:
Figure GDA0003562845290000311
wherein,
Figure GDA0003562845290000312
wherein,
Figure GDA0003562845290000313
is the physiological gender variable of user j;
Figure GDA0003562845290000314
is the shopping tendency gender variable of user j;
wherein,
Figure GDA0003562845290000315
Figure GDA0003562845290000316
wherein $usersex_j$ is the physiological gender descriptor of user j;
Figure GDA0003562845290000317
is the total number of times user j clicks on commodities;
Figure GDA0003562845290000318
is the number of times user j clicks on commodities whose applicable gender is female;
Figure GDA0003562845290000319
is the total number of times user j purchases commodities;
Figure GDA00035628452900003110
is the number of times user j purchases commodities whose applicable gender is female; $\gamma_1$, $\gamma_2$, $\delta_1$ and $\delta_2$ are respectively proportional coefficients;
correspondingly, the determining the commodity recommendation result of the current cycle based on the commodity sorting result matrix and the user gender vector includes:
determining a commodity recommendation based on the following formula:
Figure GDA00035628452900003111
wherein output is the set formed by the commodity recommendation results; $th_{female}$ is the threshold for commodities applicable to women; $th_{male}$ is the threshold for commodities applicable to men; other denotes all other cases.
The device can execute the method provided by any embodiment of the invention, and has the corresponding functional modules and beneficial effects of the execution method.
The embodiment of the invention provides a platform commodity recommending device applied to a C2M mode, wherein the platform commodity recommending device comprises the commodity recommending device provided by the embodiment of the invention.
The embodiment of the invention provides a merchant commodity recommending device applied to a C2M mode, wherein the merchant commodity recommending device comprises the commodity recommending device provided by the embodiment of the invention.
Fig. 4 is a schematic structural diagram of an electronic device according to an embodiment of the present invention, and as shown in fig. 4, the electronic device includes:
one or more processors 410, one processor 410 being illustrated in FIG. 4;
a memory 420;
the apparatus may further include: an input device 430 and an output device 440.
The processor 410, the memory 420, the input device 430 and the output device 440 of the apparatus may be connected by a bus or by other means; connection by a bus is taken as an example in FIG. 4.
The memory 420 serves as a non-transitory computer-readable storage medium and may be used for storing software programs, computer-executable programs, and modules, such as program instructions/modules corresponding to a method for recommending merchandise according to an embodiment of the present invention (e.g., the first determining module 310, the reinforcement learning module 320, the dual feedback loop module 330, and the second determining module 340 shown in fig. 3). The processor 410 executes software programs, instructions and modules stored in the memory 420 to execute various functional applications and data processing of the computer device, namely, to implement a product recommendation method of the above method embodiment, that is:
determining an inherent characteristic matrix of a current period of the commodity, a recommended exploration commodity score matrix of the current period and a reinforcement learning parameter vector of the current period;
determining a commodity initial sequencing result vector of the current period based on the inherent characteristic matrix of the current period, the recommended and explored commodity score matrix of the current period and the reinforcement learning parameter vector of the current period;
determining a commodity sequencing intermediate result vector of the current period based on the commodity initial sequencing result vector of the current period and the commodity actual sales quantity sequencing result;
and determining the commodity recommendation result of the current period based on the commodity sorting intermediate result vector of the current period.
The memory 420 may include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required for at least one function; the storage data area may store data created according to use of the computer device, and the like. Further, the memory 420 may include high speed random access memory, and may also include non-transitory memory, such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid state storage device. In some embodiments, memory 420 may optionally include memory located remotely from processor 410, which may be connected to the terminal device via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
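The determining steps executed by the processor 410 may be sketched, purely for illustration, as a single Python function covering the second to fourth steps once the first step has produced its outputs. The function name `recommend`, the row-per-feature layout, the precomputed gains and the `top_k` cut-off are assumptions of this sketch; in particular, selecting the highest-scoring commodities stands in for the gender-based selection described in the embodiments.

```python
def recommend(feature_rows, weights, gains, top_k):
    """Rank commodities for one period and return the top_k indices.

    feature_rows: rows of per-commodity values determined in the first step
        (inherent characteristics and recommended exploration scores).
    weights: one reinforcement learning parameter per row.
    gains: per-commodity feedback gain derived from the actual sales
        ranking (assumed precomputed).
    top_k: hypothetical cut-off used here to form the recommendation result.
    """
    n = len(gains)
    # Second step: initial sorting result as a weighted sum of the rows.
    r1 = [sum(w * row[i] for w, row in zip(weights, feature_rows))
          for i in range(n)]
    # Third step: intermediate sorting result after the sales feedback.
    r2 = [score * gain for score, gain in zip(r1, gains)]
    # Fourth step: recommend the commodities with the highest scores.
    order = sorted(range(n), key=lambda i: r2[i], reverse=True)
    return order[:top_k]


# Hypothetical data for three commodities and two feature rows.
result = recommend(
    feature_rows=[[1.0, 2.0, 3.0], [1.0, 0.0, 0.0]],
    weights=[1.0, 2.0],
    gains=[1.0, 2.0, 0.5],
    top_k=2,
)
```

In this example the second commodity has a lower initial score than the other two, yet its feedback gain moves it to the first position, which shows how the sales feedback can reorder the initial result.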
The input device 430 may be used to receive input numeric or character information and generate key signal inputs related to user settings and function control of the computer apparatus. The output device 440 may include a display device such as a display screen.
An embodiment of the present invention provides a computer-readable storage medium, on which a computer program is stored, which, when executed by a processor, implements a method for recommending an article according to an embodiment of the present invention:
determining an inherent characteristic matrix of a current period of the commodity, a recommended exploration commodity score matrix of the current period and a reinforcement learning parameter vector of the current period;
determining a commodity initial sequencing result vector of the current period based on the inherent characteristic matrix of the current period, the recommended and explored commodity score matrix of the current period and the reinforcement learning parameter vector of the current period;
determining a commodity sequencing intermediate result vector of the current period based on the commodity initial sequencing result vector of the current period and the commodity actual sales quantity sequencing result;
and determining the commodity recommendation result of the current period based on the commodity sorting intermediate result vector of the current period.
Any combination of one or more computer-readable media may be employed. The computer readable medium may be a computer readable signal medium or a computer readable storage medium. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples (a non-exhaustive list) of the computer readable storage medium would include the following: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this document, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.
A computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.
Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
Computer program code for carrying out operations for aspects of the present invention may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, Smalltalk, C++ or the like and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).
It is to be noted that the foregoing is only illustrative of the preferred embodiments of the present invention and the technical principles employed. It will be understood by those skilled in the art that the present invention is not limited to the particular embodiments described herein, but is capable of various obvious changes, rearrangements and substitutions as will now become apparent to those skilled in the art without departing from the scope of the invention. Therefore, although the present invention has been described in greater detail by the above embodiments, the present invention is not limited to the above embodiments, and may include other equivalent embodiments without departing from the spirit of the present invention, and the scope of the present invention is determined by the scope of the appended claims.

Claims (20)

1. A commodity recommendation method, comprising:
determining an inherent characteristic matrix of a current period of the commodity, a recommended exploration commodity score matrix of the current period and a reinforcement learning parameter vector of the current period;
determining a commodity initial sequencing result vector of the current period based on the inherent characteristic matrix of the current period, the recommended and explored commodity score matrix of the current period and the reinforcement learning parameter vector of the current period;
determining a commodity sequencing intermediate result vector of the current period based on the commodity initial sequencing result vector of the current period and the commodity actual sales quantity sequencing result;
determining a commodity recommendation result of the current period based on the commodity sorting intermediate result vector of the current period;
wherein the recommended exploration commodity score matrix is determined as follows:
forming a recommended exploration commodity vector based on the recommended exploration score value of each commodity;
forming a recommended exploration commodity continuous action vector based on the recommended exploration sustained action score value of each commodity;
determining the recommended exploration commodity score matrix based on the recommended exploration commodity vector and the recommended exploration commodity continuous action vector;
the recommended exploration score value is determined based on the following formula:
Figure FDA0003562845280000011
wherein $e_{en}$ is the recommended exploration score value of commodity n; $\phi$ is a first sales threshold of the commodity; $f_{cvrn}$ is the exposure conversion rate of commodity n; $d_{pn}$ is the purchase quantity value of commodity n; $dis_n$ is the exploration value score value of commodity n;
Figure FDA0003562845280000012
is the increment vector of the sales volume of commodity n in the m-th period relative to the previous period;
Figure FDA0003562845280000013
is the mean of the increment vector;
Figure FDA0003562845280000014
is the variance of the increment vector; wherein the increment vector is $diff_{pur} = \{diff_{pur1}, diff_{pur2}, \ldots, diff_{purn}\}$.
2. The method of claim 1, wherein determining a reinforcement learning parameter vector for a current cycle comprises:
and determining a reinforcement learning parameter vector of the current period based on the intrinsic characteristic matrix of the last period of the commodity, the recommended exploration commodity score matrix of the last period and the reinforcement learning score vector of the last period.
3. The method of claim 2, wherein the determining the commodity recommendation result for the current cycle based on the commodity ranking intermediate result vector for the current cycle comprises:
forming a commodity sorting result matrix according to commodity applicable gender based on the commodity sorting intermediate result vector;
and determining the commodity recommendation result of the current cycle based on the commodity sequencing result matrix and the gender of the user.
4. The method of any of claims 1-3, wherein determining the intrinsic feature matrix comprises:
aggregating all the commodity exposure real-time message data, commodity click real-time message data and commodity order real-time message data in a period to obtain a real-time performance matrix of the commodity;
determining an intrinsic characteristic matrix of the commodity based on the real-time performance matrix of the commodity.
5. The method of claim 4, wherein determining the intrinsic characteristic matrix of the good based on the real-time performance matrix of the good comprises:
determining a click conversion rate vector of the commodity, an exposure conversion rate vector of the commodity, a favorable review count vector of the commodity and a discount index vector of the commodity based on the real-time performance matrix of the commodity;
forming an intrinsic feature matrix of the commodity based on the click conversion rate vector, the exposure conversion rate vector, the favorable review count vector and the discount index vector;
the click conversion rate of each commodity forms the click conversion rate vector of the commodity;
the exposure conversion rate of each commodity forms the exposure conversion rate vector of the commodity;
the favorable review count of each commodity forms the favorable review count vector of the commodity;
the discount indices for each item form a discount index vector for the item.
6. The method of claim 1, wherein the value of the exploratory value score for commodity n is determined based on the following formula:
Figure FDA0003562845280000031
wherein $\beta_1$, $\beta_2$, $\gamma_1$ and $\gamma_2$ are respectively linear coefficients of the exploration value score value; $p_{th1}$, $p_{th2}$ and $p_{th3}$ are respectively a second sales threshold, a third sales threshold and a fourth sales threshold of the commodity; $\epsilon$ is a set very small value;
wherein the recommended exploration sustained action score value is determined based on the following formula:
Figure FDA0003562845280000032
wherein $\rho$ is the time decay coefficient, with $0 < \rho < 1$;
Figure FDA0003562845280000033
is the recommended exploration score value of commodity n in the i-th period; $th_e$ is a preset recommended exploration score value; $e_{remn}$ is the recommended exploration sustained action score value of commodity n; T is the maximum number of periods over which a recommended exploration commodity has a sustained effect on the score.
7. The method of claim 6, wherein forming a recommended exploration commodity vector based on the recommended exploration score values for each commodity comprises:
determining the recommended exploration commodity vector based on the following formula:
$E_e = \{e_{e1}, e_{e2}, \ldots, e_{en}\}$
wherein $E_e$ is the recommended exploration commodity vector;
wherein forming the recommended exploration commodity continuous action vector based on the recommended exploration sustained action score value of each commodity comprises:
determining the recommended exploration commodity continuous action vector based on the following formula:
$E_{rem} = \{e_{rem1}, e_{rem2}, \ldots, e_{remn}\}$
wherein $E_{rem}$ is the recommended exploration commodity continuous action vector;
wherein the determining the recommended exploration commodity score matrix based on the recommended exploration commodity vector and the recommended exploration commodity continuous action vector comprises:
determining the recommended exploration commodity score matrix based on the following formula:
$E = \{E_e, E_{rem}\}$
wherein E is the recommended exploration commodity score matrix.
8. The method of claim 7, wherein the reinforcement learning score vector is determined based on the following formula:
Figure FDA0003562845280000041
wherein R is the reinforcement learning score vector;
wherein $N_m = d_c + d_p$; $d_c$ is the click value vector of the commodities; $d_p$ is the sales volume value vector of the commodities;
wherein,
Figure FDA0003562845280000042
wherein $Q_1$ is the initial reward score value obtained by reinforcement learning in the 1st period; $Q_m$ is the reward score value obtained by reinforcement learning in the m-th period; $d_{ei}$ is the exposure value of the commodity in the i-th period;
wherein,
Figure FDA0003562845280000043
wherein $d_{cmn}$ indicates whether commodity n is clicked in the m-th period: $d_{cmn} = 0$ indicates that commodity n is not clicked in the m-th period, and $d_{cmn} = 1$ indicates that commodity n is clicked in the m-th period;
wherein $d_{pmn}$ indicates whether commodity n is purchased in the m-th period: $d_{pmn} = 0$ indicates that commodity n is not purchased in the m-th period, and $d_{pmn} = 1$ indicates that commodity n is purchased in the m-th period;
wherein bonus1 is the reward score value after the commodity is clicked; bonus2 is the reward score value after the commodity is purchased.
9. The method of claim 8, wherein determining the reinforcement learning parameter vector of the current cycle based on the commodity intrinsic characteristic matrix of the previous cycle, the recommended explored commodity score matrix of the previous cycle, and the reinforcement learning score vector of the previous cycle comprises:
adjusting the reinforcement learning parameter vector based on the following formula:
Figure FDA0003562845280000051
wherein,
Figure FDA0003562845280000052
wherein $\theta = \{R, f_{cvr}, f_{ctcvr}, f_{com}, f_{dis}, E_e, E_{rem}\}$;
wherein $L(\theta) = L_{ctcvr}(\theta) + L_{comment}(\theta)$;
wherein $W = \{W_R, W_F, W_E\}$, $W_F = \{W_{cvr}, W_{ctcvr}, W_{com}, W_{dis}\}$, $W_E = \{W_{Ee}, W_{Erem}\}$;
wherein $W_i$ is the weight value of the i-th dimension and is an element of the reinforcement learning parameter vector W; $c_i$ is the boundary constraint condition of the i-th dimension; $\theta$ is the input feature vector; R is the reinforcement learning score vector; $f_{cvr}$ is the click conversion rate vector of the commodity; $f_{ctcvr}$ is the exposure conversion rate vector of the commodity; $f_{com}$ is the favorable review count vector of the commodity; $f_{dis}$ is the discount index vector of the commodity; $E_e$ is the recommended exploration commodity vector; $E_{rem}$ is the recommended exploration commodity continuous action vector; $L(\theta)$ is the multi-objective loss function; $L_{ctcvr}(\theta)$ is the exposure conversion rate loss function; $L_{comment}(\theta)$ is the total favorable review count loss function;
wherein $W_R$ is the weight value corresponding to the reinforcement learning score vector R; $W_F$ is the weight vector corresponding to the inherent characteristic matrix F; $W_E$ is the weight vector corresponding to the recommended exploration commodity score matrix E; $W_{cvr}$ is the weight value corresponding to the click conversion rate vector; $W_{ctcvr}$ is the weight value corresponding to the exposure conversion rate vector; $W_{com}$ is the weight value corresponding to the favorable review count vector; $W_{dis}$ is the weight value corresponding to the discount index vector.
10. The method of claim 9, wherein the determining the initial commodity ranking result vector of the current cycle based on the commodity intrinsic characteristic matrix of the current cycle, the recommended search commodity score matrix of the current cycle, and the reinforcement learning parameter vector of the current cycle comprises:
determining the initial commodity ordering result vector based on the following formula:
$R1 = \mathrm{concat}(R, F, E) \cdot W$
wherein R1 is the commodity initial sorting result vector; F is the inherent characteristic matrix, and W is the reinforcement learning parameter vector.
11. The method of claim 3, wherein the determining a commodity ranking intermediate result vector of a current cycle based on the commodity initial ranking result vector of the current cycle and the commodity actual sales ranking result comprises:
determining a feedback gain coefficient of the commodities based on the actual sales volume sorting result of the commodities;
and determining a commodity sequencing intermediate result vector of the current period based on the feedback gain coefficient of each commodity and the initial commodity sequencing result vector.
12. The method of claim 11, wherein determining a feedback gain factor for the commodity based on the actual sales volume ranking result comprises:
determining a feedback gain factor for the commodity based on the following equation:
Figure FDA0003562845280000071
wherein $buffer_n$ is the feedback gain coefficient of commodity n; $R1_{th1}$ and $R1_{th2}$ are segmentation thresholds in R1;
wherein $DF_1$, $DF_2$ and $DF_3$ are respectively commodity sets segmented according to commodity sales volume; $R1_n$ is the initial sorting result score value of commodity n; $d_{pn}$ is the purchase quantity value of commodity n;
determining a commodity sequencing intermediate result vector of a current period based on the feedback gain coefficient of each commodity and the commodity initial sequencing result vector, wherein the method comprises the following steps:
determining the commodity ordering intermediate result vector based on the following formula:
$R2 = \{R2_1, R2_2, \ldots, R2_n\}$
$R2_n = R1_n \times buffer_n$
wherein R2 is the commodity sorting intermediate result vector; $R2_n$ is the sorting intermediate result score value of commodity n; R1 is the commodity initial sorting result vector.
13. The method of claim 12, wherein forming a commodity ranking result matrix according to commodity applicable gender based on the commodity ranking intermediate result vector comprises:
determining a commodity ordering result matrix based on the following formula:
$R3 = \{R3_{female}, R3_{male}, R3_{common}\}$
wherein R3 is the commodity sorting result matrix;
wherein,
Figure FDA0003562845280000081
or
Figure FDA0003562845280000082
wherein,
Figure FDA0003562845280000083
or
Figure FDA0003562845280000084
wherein,
Figure FDA0003562845280000085
wherein,
Figure FDA0003562845280000086
Figure FDA0003562845280000087
Figure FDA0003562845280000088
wherein,
Figure FDA0003562845280000089
wherein $p_i$ is the applicable gender variable of the commodity;
Figure FDA00035628452800000810
is the applicable gender descriptor of the commodity;
wherein male, female and common indicate that the applicable gender is male, female and general, respectively;
$R3_{female}$, $R3_{male}$ and $R3_{common}$ are respectively the commodity sorting result vectors applicable to women, men and general use;
Figure FDA00035628452800000811
are respectively the sorting result score values of commodity n applicable to women, men and general use.
14. The method of claim 13, wherein determining the commodity recommendation result for the current cycle based on the commodity sorting result matrix and the gender of the user comprises:
determining a user gender vector based on a user physiological gender and the user shopping tendency gender;
and determining a commodity recommendation result of the current period based on the commodity sequencing result matrix and the user gender vector.
15. The method of claim 14, wherein determining the user gender vector based on the user physiological gender and the user shopping tendency gender comprises:
determining the user gender vector based on the following formula:
Figure FDA0003562845280000091
wherein
Figure FDA0003562845280000092
wherein
Figure FDA0003562845280000093
is the physiological gender variable of user j;
Figure FDA0003562845280000094
is the shopping tendency gender variable of user j;
wherein
Figure FDA0003562845280000095
Figure FDA0003562845280000096
wherein usersex_j is the physiological gender descriptor of user j;
Figure FDA0003562845280000097
is the total number of times user j has clicked on commodities;
Figure FDA0003562845280000098
is the number of times user j has clicked on commodities whose applicable gender is female;
Figure FDA0003562845280000099
is the total number of times user j has purchased commodities;
Figure FDA00035628452800000910
is the number of times user j has purchased commodities whose applicable gender is female; and γ1, γ2, δ1 and δ2 are proportionality coefficients;
wherein determining the commodity recommendation result of the current period based on the commodity ranking result matrix and the user gender vector comprises:
determining the commodity recommendation result based on the following formula:
Figure FDA00035628452800000911
wherein output is the set formed by the commodity recommendation results; th_female is the threshold for commodities suitable for women; th_male is the threshold for commodities suitable for men; and other denotes all other cases.
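One way to read claims 14 and 15 is sketched below in Python. The actual formulas are in figure images that are not reproduced here, so every combination rule in this sketch is an assumption: γ1 and γ2 are taken to weight the female click ratio and the female purchase ratio, δ1 and δ2 are taken to weight the physiological gender and the shopping tendency, and the two thresholds are applied to the resulting single score. All function names are hypothetical.

```python
from typing import List, Tuple


def shopping_tendency(clicks_total: int, clicks_female: int,
                      buys_total: int, buys_female: int,
                      gamma: Tuple[float, float]) -> float:
    """Assumed form: gamma1 * female click ratio + gamma2 * female purchase ratio."""
    click_ratio = clicks_female / clicks_total if clicks_total else 0.0
    buy_ratio = buys_female / buys_total if buys_total else 0.0
    return gamma[0] * click_ratio + gamma[1] * buy_ratio


def user_gender_score(is_female: bool, tendency: float,
                      delta: Tuple[float, float]) -> float:
    """Assumed form: delta1 * physiological variable + delta2 * shopping tendency."""
    physio = 1.0 if is_female else 0.0
    return delta[0] * physio + delta[1] * tendency


def select_groups(score: float, th_female: float, th_male: float) -> List[str]:
    """Pick which ranking vectors of R3 feed the recommendation output.

    Assumption: a high score selects female and common commodities, a low
    score selects male and common commodities, and anything in between
    (the "other" case) selects all three groups.
    """
    if score >= th_female:
        return ["female", "common"]
    if score <= th_male:
        return ["male", "common"]
    return ["female", "male", "common"]


if __name__ == "__main__":
    t = shopping_tendency(10, 8, 5, 4, gamma=(0.5, 0.5))
    s = user_gender_score(True, t, delta=(0.5, 0.5))
    print(select_groups(s, th_female=0.7, th_male=0.3))
```

Guarding the ratios against zero totals keeps the sketch usable for new users with no click or purchase history; how the patented method treats that case is not stated in this text.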
16. A commodity recommendation device, comprising:
a first determining module, configured to determine an inherent characteristic matrix of commodities for a current period, a recommended exploration commodity score matrix for the current period, and a reinforcement learning parameter vector for the current period;
a reinforcement learning module, configured to determine a commodity initial ranking result vector of the current period based on the inherent characteristic matrix of the current period, the recommended exploration commodity score matrix of the current period, and the reinforcement learning parameter vector of the current period;
a double feedback loop module, configured to determine a commodity ranking intermediate result vector of the current period based on the commodity initial ranking result vector of the current period and an actual commodity sales ranking result;
a second determining module, configured to determine a commodity recommendation result of the current period based on the commodity ranking intermediate result vector of the current period;
wherein the first determining module includes a recommendation exploration module configured to perform:
forming a recommended exploration commodity vector based on the recommended exploration score value of each commodity;
forming a continuous action vector of the recommended exploration commodity based on the recommended exploration continuous action score value of each commodity;
determining a score matrix of the recommended exploration commodity based on the recommended exploration commodity vector and the continuous action vector of the recommended exploration commodity;
the recommended exploration score value is determined based on the following formula:
Figure FDA0003562845280000101
wherein e_en is the recommended exploration score value of commodity n; φ is a first sales threshold of the commodity; f_cvrn is the exposure conversion rate of commodity n; dp_n is the purchase quantity value of commodity n; dis_n is the exploration value score of commodity n;
Figure FDA0003562845280000102
is the increment vector of the sales volume of commodity n in the m-th period relative to the previous period;
Figure FDA0003562845280000103
is the mean of the increment vector;
Figure FDA0003562845280000104
is the variance of the increment vector; wherein the increment vector diffpur = {diffpur_1, diffpur_2, ..., diffpur_n}.
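The sales increment statistics named in claim 16 can be illustrated with a short Python sketch. It assumes that each increment is the difference between the sales of one period and the period before it, and it uses the population variance; the source does not say whether population or sample variance is intended, and the function name `sales_increment_stats` is hypothetical.

```python
from statistics import fmean, pvariance
from typing import List, Tuple


def sales_increment_stats(sales: List[float]) -> Tuple[List[float], float, float]:
    """Return (increment vector, its mean, its variance) for one commodity.

    sales[m] is the sales volume in period m. Each increment is
    sales[m] - sales[m - 1]. Population variance is an assumption.
    """
    if len(sales) < 2:
        raise ValueError("at least two periods are needed to form an increment")
    diffpur = [float(cur) - float(prev) for prev, cur in zip(sales, sales[1:])]
    return diffpur, fmean(diffpur), pvariance(diffpur)


if __name__ == "__main__":
    diffs, mean, var = sales_increment_stats([10, 12, 15, 15, 20])
    print(diffs, mean, var)
```

How these statistics enter the recommended exploration score is defined by the formula image of the claim and is not reconstructed here.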
17. A platform commodity recommendation device for use in a C2M mode, wherein the platform commodity recommendation device comprises the commodity recommendation device of claim 16.
18. A merchant commodity recommendation device for use in a C2M mode, wherein the merchant commodity recommendation device comprises the commodity recommendation device of claim 16.
19. An electronic device, comprising:
one or more processors;
a storage device for storing one or more programs,
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method of any one of claims 1-15.
20. A computer-readable storage medium on which a computer program is stored, wherein the computer program, when executed by a processor, implements the method of any one of claims 1-15.
CN202110137907.2A 2020-12-23 2021-02-01 Commodity recommendation method and device, electronic equipment and storage medium Active CN112801743B (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN2020115414363 2020-12-23
CN202011541436 2020-12-23

Publications (2)

Publication Number Publication Date
CN112801743A CN112801743A (en) 2021-05-14
CN112801743B (en) 2022-05-31

Family

ID=75813453

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110137907.2A Active CN112801743B (en) 2020-12-23 2021-02-01 Commodity recommendation method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN112801743B (en)

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10776854B2 (en) * 2015-03-16 2020-09-15 Fujifilm Corporation Merchandise recommendation device, merchandise recommendation method, and program
CN108491377A (en) * 2018-03-06 2018-09-04 中国计量大学 A kind of electric business product comprehensive score method based on multi-dimension information fusion
CN110363617A (en) * 2019-06-03 2019-10-22 北京三快在线科技有限公司 A kind of recommended method, device, electronic equipment and readable storage medium storing program for executing
CN111325609A (en) * 2020-02-28 2020-06-23 京东数字科技控股有限公司 Commodity recommendation list determining method and device, electronic equipment and storage medium
CN111815413A (en) * 2020-07-09 2020-10-23 湖南数客星球信息技术有限公司 Big data commodity prediction system and method based on hot event
CN112036987A (en) * 2020-09-11 2020-12-04 杭州海康威视数字技术股份有限公司 Method and device for determining recommended commodities
CN112767069A (en) * 2020-12-31 2021-05-07 青岛海尔科技有限公司 Commodity recommendation method and device, storage medium and electronic device

Also Published As

Publication number Publication date
CN112801743A (en) 2021-05-14

Similar Documents

Publication Publication Date Title
CN111080413A (en) E-commerce platform commodity recommendation method and device, server and storage medium
EP4181026A1 (en) Recommendation model training method and apparatus, recommendation method and apparatus, and computer-readable medium
CN105159910A (en) Information recommendation method and device
CN109903086B (en) Similar crowd expansion method and device and electronic equipment
WO2019233077A1 (en) Ranking of business object
CN112508256B (en) User demand active prediction method and system based on crowdsourcing
CN112258260A (en) Page display method, device, medium and electronic equipment based on user characteristics
CN111949887A (en) Item recommendation method and device and computer-readable storage medium
CN113744017A (en) E-commerce search recommendation method and device, equipment and storage medium
US20220309562A1 (en) Intelligent listing creation for a for sale object
WO2023142520A1 (en) Information recommendation method and apparatus
CN111310038A (en) Information recommendation method and device, electronic equipment and computer-readable storage medium
CN111429214B (en) Transaction data-based buyer and seller matching method and device
CN113763019A (en) User information management method and device
CN113327151A (en) Commodity object recommendation method and device, computer equipment and storage medium
CN113763089A (en) Article recommendation method and device and computer-readable storage medium
CN112801743B (en) Commodity recommendation method and device, electronic equipment and storage medium
CN113495991A (en) Recommendation method and device
CN107357847B (en) Data processing method and device
CN112948701B (en) Information recommendation device, method, equipment and storage medium
CN113516496B (en) Advertisement conversion rate estimation model construction method, device, equipment and medium thereof
CN115204943A (en) Advertisement recall method, device, equipment and storage medium
CN113781134A (en) Item recommendation method and device and computer-readable storage medium
CN111626805B (en) Information display method and device
CN111666481A (en) Data mining method and device, computer readable medium and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant