CN112801743B

CN112801743B - Commodity recommendation method and device, electronic equipment and storage medium

Info

Publication number: CN112801743B
Application number: CN202110137907.2A
Authority: CN
Inventors: 王成庆; 成建勇; 赵巍
Original assignee: Zhuhai Necessary Industrial Technology Co ltd
Current assignee: Zhuhai Necessary Industrial Technology Co ltd
Priority date: 2020-12-23
Filing date: 2021-02-01
Publication date: 2022-05-31
Anticipated expiration: 2041-02-01
Also published as: CN112801743A

Abstract

The embodiments of the invention disclose a commodity recommendation method, a commodity recommendation device, electronic equipment and a storage medium. The method comprises: determining an inherent characteristic matrix of the commodities for the current period, a recommended exploration commodity score matrix of the current period and a reinforcement learning parameter vector of the current period; determining a commodity initial sequencing result vector of the current period based on these three inputs; determining a commodity sequencing intermediate result vector of the current period based on the commodity initial sequencing result vector of the current period and the ranking of actual commodity sales; and determining the commodity recommendation result of the current period based on the commodity sequencing intermediate result vector of the current period. The technical solution responds quickly to high-quality or trending commodities, makes the recommended commodities more attractive, and achieves a high overall conversion rate.

Description

Commodity recommendation method and device, electronic equipment and storage medium

Technical Field

The embodiment of the invention relates to a data processing technology, in particular to a commodity recommendation method, a commodity recommendation device, electronic equipment and a storage medium.

Background

With the continuous expansion of e-commerce, the number of commodity types and the number of commodities of each type have increased rapidly. Because of the large amount of information, a user has to spend a great deal of time browsing irrelevant commodity information to find the commodity he or she wants to buy, so the platform needs to provide good commodity recommendations that meet the user's purchasing needs. In the C2M (Customer-to-Manufacturer) mode, commodities are recommended to users not only by the e-commerce platform but also by the manufacturer, i.e., the merchant itself. However, prior-art commodity recommendation methods are inaccurate and inefficient.

Disclosure of Invention

The embodiments of the invention provide a commodity recommendation method, a commodity recommendation device, electronic equipment and a storage medium, which respond faster to high-quality or trending commodities, tie the commodity recommendation result to actual commodity sales volume, help users purchase commodities, and make the recommended commodities more attractive with a higher overall conversion rate.

In a first aspect, an embodiment of the present invention provides a method for recommending a commodity, including:

determining an inherent characteristic matrix of a current period of the commodity, a recommended exploration commodity score matrix of the current period and a reinforcement learning parameter vector of the current period;

determining a commodity initial sequencing result vector of the current period based on the inherent characteristic matrix of the current period, the recommended exploration commodity score matrix of the current period and the reinforcement learning parameter vector of the current period;

determining a commodity sequencing intermediate result vector of the current period based on the commodity initial sequencing result vector of the current period and the commodity actual sales quantity sequencing result;

and determining the commodity recommendation result of the current period based on the commodity sorting intermediate result vector of the current period.

In a second aspect, an embodiment of the present invention further provides a commodity recommendation device, including:

a first determining module, configured to determine an inherent characteristic matrix of a current period of a commodity, a recommended exploration commodity score matrix of the current period and a reinforcement learning parameter vector of the current period;

the reinforcement learning module is used for determining a commodity initial sequencing result vector of the current period based on the inherent characteristic matrix of the current period, the recommended exploration commodity score matrix of the current period and the reinforcement learning parameter vector of the current period;

the double feedback loop module is used for determining a commodity sequencing intermediate result vector of the current period based on the commodity initial sequencing result vector of the current period and the commodity actual sales quantity sequencing result;

and the second determining module is used for determining the commodity recommendation result of the current period based on the commodity sequencing intermediate result vector of the current period.

In a third aspect, an embodiment of the present invention further provides a merchant commodity recommendation device applied to the C2M mode, which includes the commodity recommendation device according to the second aspect of the present invention.

In a fourth aspect, an embodiment of the present invention further provides a platform commodity recommendation device applied to the C2M mode, which includes the commodity recommendation device according to the second aspect of the present invention.

In a fifth aspect, an embodiment of the present invention provides an electronic device, including:

one or more processors;

a storage device configured to store one or more programs;

when the one or more programs are executed by the one or more processors, the one or more processors implement the method provided by the embodiments of the present invention.

In a sixth aspect, an embodiment of the present invention provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the method provided by the embodiments of the present invention.

According to the technical solution provided by the embodiments of the invention, the commodity initial sequencing result vector of the current period is determined based on the intrinsic characteristic matrix, the recommended exploration commodity score matrix and the reinforcement learning parameter vector of the current period; the commodity sequencing intermediate result vector of the current period is determined based on that initial result vector and the ranking of actual commodity sales; and the commodity recommendation result of the current period is determined from the intermediate result vector. As a result, the method responds faster to high-quality or trending commodities, the recommendation result deviates less from actual sales volume, recommendation accuracy and efficiency are improved, users find it easier to choose commodities, the recommended commodities are more attractive, and the overall conversion rate is higher.

Drawings

FIG. 1a is a flow chart of a method for recommending merchandise according to an embodiment of the present invention;

fig. 1b is an exemplary diagram of aggregation processing performed on real-time message data according to an embodiment of the present invention;

fig. 2a is a flowchart of a commodity recommendation method according to an embodiment of the present invention;

fig. 2b is a flowchart of a method for recommending a commodity according to an embodiment of the present invention;

fig. 3a is a block diagram of a structure of a commodity recommending apparatus according to an embodiment of the present invention;

FIG. 3b is a block diagram of a merchandise recommendation device according to an embodiment of the present invention;

fig. 4 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.

Detailed Description

The present invention will be described in further detail with reference to the accompanying drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the invention and are not limiting of the invention. It should be further noted that, for the convenience of description, only some of the structures related to the present invention are shown in the drawings, not all of the structures.

Fig. 1a is a flowchart of a commodity recommendation method according to an embodiment of the present invention. The method may be executed by a commodity recommendation apparatus, which may be implemented in software and/or hardware and configured in an electronic device such as a computer or a server. Optionally, the method may be applied to recommending commodities to new users, for example on an e-commerce platform, on a merchant platform, or when an e-commerce platform makes new-user commodity recommendations for a merchant; this is not specifically limited.

As shown in fig. 1a, the technical solution provided by the embodiment of the present invention includes:

S110: Determine an inherent characteristic matrix of the commodities for the current period, a recommended exploration commodity score matrix of the current period and a reinforcement learning parameter vector of the current period.

In the embodiment of the invention, the electronic device may take a preset length of time as one period and acquire the real-time message data of the commodities in each period from an e-commerce website or server. The real-time message data of a commodity include commodity exposure real-time message data, commodity click real-time message data and commodity order real-time message data. The real-time message data of each period are then processed to obtain the inherent characteristic matrix of the commodities. A period may be, for example, ten minutes or another length of time.

In an implementation of the embodiment of the present invention, optionally, determining the inherent characteristic matrix includes:

aggregating all the commodity exposure, commodity click and commodity order real-time message data in a period to obtain a real-time performance matrix of the commodities; and determining the intrinsic characteristic matrix of the commodities based on the real-time performance matrix. The intrinsic characteristic matrix of the current period is therefore obtained by aggregating the exposure, click and order real-time message data of all commodities in the current period.
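A minimal Python sketch of this aggregation step follows. It assumes, purely for illustration, that each real-time message has already been decoded into a `(commodity_id, event_type)` pair with `event_type` one of `"exposure"`, `"click"` or `"order"`, and that each order message counts as one unit of sales. The function name `aggregate_period` and this message shape are hypothetical, not taken from the patent.

```python
from collections import defaultdict

# Hypothetical mapping from event type to the field of D it increments.
EVENT_FIELDS = {"exposure": "d_e", "click": "d_c", "order": "d_p"}


def aggregate_period(messages, commodity_ids):
    """Aggregate one period's messages into D = {d_e, d_c, d_p}.

    messages: iterable of (commodity_id, event_type) pairs (assumed format).
    commodity_ids: fixes the order of commodities, so index k of d_e,
    d_c and d_p always refers to the same commodity.
    """
    counts = defaultdict(lambda: {"d_e": 0, "d_c": 0, "d_p": 0})
    for commodity_id, event_type in messages:
        field = EVENT_FIELDS.get(event_type)
        if field is None:
            continue  # ignore event types this sketch does not model
        counts[commodity_id][field] += 1
    # Commodities with no messages in the period get zeros.
    return {
        field: [counts[cid][field] for cid in commodity_ids]
        for field in ("d_e", "d_c", "d_p")
    }
```

Counting per commodity rather than per message keeps D aligned by commodity, which is the "commodity dimension" output the text describes.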

The commodity exposure real-time message data may refer to data about commodities displayed to users by the platform, converted into real-time message queue data, for example the number of times a commodity is displayed. The commodity click real-time message data may refer to data about user clicks after a commodity is displayed, converted into real-time message queue data, for example clicks to browse the commodity or to view its reviews. The commodity order real-time message data may refer to data about user purchases after a commodity is displayed, converted into real-time message queue data, for example the order data of the commodity.

In an implementation of the embodiment of the present invention, optionally, determining the inherent characteristic matrix of the commodities based on the real-time performance matrix includes: determining a click conversion rate vector, an exposure conversion rate vector, a good review count vector and a discount index vector of the commodities based on the real-time performance matrix; and forming the intrinsic feature matrix of the commodities from the click conversion rate vector, the exposure conversion rate vector, the good review count vector and the discount index vector;

where the click conversion rates of all commodities form the click conversion rate vector; the exposure conversion rates of all commodities form the exposure conversion rate vector; the good review counts of all commodities form the good review count vector; and the discount indices of all commodities form the discount index vector.

In this embodiment, after all the commodity exposure, click and order real-time message data in one period are aggregated (see fig. 1b for an example of the aggregation), the real-time performance matrix D is output per commodity and has the following data format:

D＝{d_e，d_c，d_p}

d_e＝{d_e1，d_e2，......，d_en}

d_c＝{d_c1，d_c2，......，d_cn}

d_p＝{d_p1，d_p2，......，d_pn}

where d_e is the real-time exposure value vector of the commodities and d_en is the real-time exposure value of commodity n; d_c is the real-time click value vector of the commodities and d_cn is the real-time click value of commodity n; d_p is the real-time sales value vector of the commodities and d_pn is the real-time purchase value of commodity n. The inherent characteristic matrix F of the commodities has the following data format:

F＝{f_cvr，f_ctcvr，f_com，f_dis}

f_cvr＝{f_cvr1，f_cvr2，......，f_cvrn}

f_ctcvr＝{f_ctcvr1，f_ctcvr2，......，f_ctcvrn}

f_com＝{f_com1，f_com2，......，f_comn}

f_dis＝{f_dis1，f_dis2，......，f_disn}

where f_cvrn = d_cn / d_en and f_ctcvrn = d_pn / d_en;

The discount index f_disn is computed from the discount rate d_n by a formula that is not reproduced in this text, where

d_n＝(p_n-p′_n)/p_n

where d_n is the discount rate of commodity n; α1, α2 and α3 are the weight values of the discount index; th1, th2 and th3 are the thresholds of the price intervals; p_n is the price of commodity n; and p′_n is the discounted price of commodity n;

and where f_cvr is the click conversion rate vector of the commodities and f_cvrn is the click conversion rate of commodity n; f_ctcvr is the exposure conversion rate vector and f_ctcvrn is the exposure conversion rate of commodity n; f_com is the good review count vector and f_comn is the number of good reviews of commodity n; f_dis is the discount index vector and f_disn is the discount index of commodity n.
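The two conversion-rate formulas and the discount rate d_n can be sketched as follows. This is a minimal illustration under stated assumptions: a commodity with zero exposure is given a rate of 0.0 (the text does not say how d_en = 0 is handled), and f_dis itself is omitted because its formula is not reproduced here. The function name `intrinsic_features` is hypothetical.

```python
def intrinsic_features(D, good_reviews, prices, discounted_prices):
    """Compute f_cvr, f_ctcvr and f_com plus the discount rate d_n per commodity.

    D is {"d_e": [...], "d_c": [...], "d_p": [...]} aligned by commodity.
    Returns (F_partial, discount_rates); f_dis is not computed because the
    mapping from d_n to the discount index is not given in the text.
    """
    f_cvr, f_ctcvr = [], []
    for d_e, d_c, d_p in zip(D["d_e"], D["d_c"], D["d_p"]):
        # Assumption: a commodity with no exposure gets a rate of 0.0.
        f_cvr.append(d_c / d_e if d_e else 0.0)    # f_cvrn = d_cn / d_en
        f_ctcvr.append(d_p / d_e if d_e else 0.0)  # f_ctcvrn = d_pn / d_en
    # d_n = (p_n - p'_n) / p_n, where p'_n is the discounted price.
    discount_rates = [(p - q) / p for p, q in zip(prices, discounted_prices)]
    F_partial = {"f_cvr": f_cvr, "f_ctcvr": f_ctcvr, "f_com": list(good_reviews)}
    return F_partial, discount_rates
```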

In the embodiment of the present invention, the reinforcement learning parameter vector may be set manually or determined from other data.

In an implementation of the embodiment of the present invention, optionally, determining the reinforcement learning parameter vector of the current period includes: determining it based on the intrinsic characteristic matrix of the commodities, the recommended exploration commodity score matrix and the reinforcement learning score vector of the previous period. The intrinsic characteristic matrix of the previous period may be determined in the same way as described above. Each period has its own recommended exploration commodity score matrix and its own reinforcement learning score vector.

In an implementation of the embodiment of the present invention, optionally, the recommended exploration commodity score matrix is determined as follows: forming a recommended exploration commodity vector from the recommended exploration score value of each commodity; forming a recommended exploration commodity continuous action vector from the recommended exploration continuous action score value of each commodity; and determining the recommended exploration commodity score matrix from these two vectors.

Optionally, the recommended exploration score value is determined by a formula that is not reproduced in this text;

where e_en is the recommended exploration score value of commodity n; φ is a first sales threshold of the commodity; f_cvrn is the exposure conversion rate of commodity n; and d_pn is the purchase quantity value of commodity n;

(Formula not reproduced in this text.)

where dis_n is the exploration value score of commodity n; β1, β2, γ1 and γ2 are the linear coefficients of the exploration value score; p_th1, p_th2 and p_th3 are the second, third and fourth sales thresholds of the commodity, respectively; and ε is a very small preset value, which may be taken as 0.001.

(Formula not reproduced in this text.)

where diff_purn is the increment of the sales quantity of commodity n in the m-th period relative to the previous period, and the formula also uses the mean and the variance of the increment vector diff_pur = {diff_pur1, diff_pur2, ..., diff_purn}.
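The increment vector and its mean and variance can be computed as below. The text does not say whether the population or the sample variance is meant; this sketch assumes the population variance, and the function name `sales_increments` is hypothetical.

```python
def sales_increments(prev_sales, curr_sales):
    """Return diff_pur (current minus previous period sales per commodity)
    together with its mean and population variance (assumed)."""
    diff_pur = [c - p for p, c in zip(prev_sales, curr_sales)]
    n = len(diff_pur)
    mean = sum(diff_pur) / n
    variance = sum((x - mean) ** 2 for x in diff_pur) / n
    return diff_pur, mean, variance
```

For sales of [10, 20, 30] in the previous period and [12, 20, 40] in the current one, diff_pur is [2, 0, 10] with mean 4.0.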

The recommended exploration sustained action score value is determined by a formula that is not reproduced in this text,

where ρ is a time decay coefficient with 0 < ρ < 1;

the formula uses the recommended exploration score value of commodity n in the i-th period; th_e is a preset recommended exploration score value; e_remn is the recommended exploration sustained action score value of commodity n; and T is the maximum number of periods over which the score of a recommended exploration commodity has a sustained effect.
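The formula for e_remn is not reproduced in this text. Purely to illustrate how the listed parameters (ρ, th_e, T) could interact, the sketch below assumes that e_remn is a ρ-decayed sum of the commodity's recommended exploration scores over the last T periods, counting only periods whose score exceeds th_e. This functional form and the name `sustained_score` are assumptions, not the patent's formula.

```python
def sustained_score(history, rho, th_e, T):
    """ASSUMED form of e_remn (the patent's own formula is not reproduced).

    history: recommended exploration scores of one commodity, oldest first,
    ending with the current period. Only the last T periods are considered,
    each weighted by rho ** age (age 0 = current period), and only periods
    whose score exceeds th_e contribute.
    """
    if not 0 < rho < 1:
        raise ValueError("rho must satisfy 0 < rho < 1")
    if T < 1:
        raise ValueError("T must be at least 1")  # history[-0:] would be the whole list
    total = 0.0
    for age, score in enumerate(reversed(history[-T:])):
        if score > th_e:
            total += rho ** age * score
    return total
```

Under this assumption, a commodity that scored highly a few periods ago keeps a shrinking boost until it falls outside the T-period window.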

In this embodiment, optionally, forming the recommended exploration commodity vector based on the recommended exploration score value of each commodity includes: determining the recommended exploration commodity vector based on the following formula:

E_e＝{e_e1，e_e2，......，e_en}

where E_e is the recommended exploration commodity vector;

correspondingly, forming the recommended exploration commodity continuous action vector based on the recommended exploration continuous action score value of each commodity includes: determining the recommended exploration commodity continuous action vector based on the following formula:

E_rem＝{e_rem1，e_rem2，......，e_remn}

where E_rem is the recommended exploration commodity continuous action vector;

correspondingly, the determining a score matrix of recommended exploration commodities based on the vector of recommended exploration commodities and the continuous action vector of recommended exploration commodities comprises: determining the recommended exploration commodity score matrix based on the following formula:

E＝{E_e，E_rem}

wherein E is the recommended exploration commodity score matrix.

The recommended exploration commodity score matrix of every period is determined in the same way as described above.

In an implementation of the embodiment of the present invention, optionally, the reinforcement learning score vector R is determined by a formula that is not reproduced in this text, in which N_m = d_c + d_p, d_c is the real-time click value vector of the commodities, and d_p is the real-time sales value vector of the commodities;

(Formula not reproduced in this text.)

where Q_1 is the initial reward value obtained by reinforcement learning in the 1st period; Q_m is the reward value obtained by reinforcement learning in the m-th period; and d_ei is the exposure value of the commodities in the i-th period;

(Formula not reproduced in this text.)

where d_cmn indicates whether commodity n was clicked in the m-th period: d_cmn = 0 means commodity n was not clicked in the m-th period, and d_cmn = 1 means it was clicked;

d_pmn indicates whether commodity n was purchased in the m-th period: d_pmn = 0 means commodity n was not purchased in the m-th period, and d_pmn = 1 means it was purchased; bonus1 is the reward value granted when the commodity is clicked; and bonus2 is the reward value granted when the commodity is purchased. The Bandit function is used in reinforcement learning to evaluate the reward values of click and purchase actions.

In an implementation of the embodiment of the present invention, optionally, determining the reinforcement learning parameter vector of the current period based on the commodity intrinsic feature matrix of the previous period, the recommended exploration commodity score matrix of the previous period and the reinforcement learning score vector of the previous period includes:

adjusting the reinforcement learning parameter vector according to formulas that are not reproduced in this text,

where θ = {R, f_cvr, f_ctcvr, f_com, f_dis, E_e, E_rem};

L(θ) = L_ctcvr(θ) + L_comment(θ);

W = {W_R, W_F, W_E}, W_F = {W_cvr, W_ctcvr, W_com, W_dis}, W_E = {W_Ee, W_Erem};

where W_i is the weight value of the i-th dimension and an element of the reinforcement learning parameter vector W; c_i is the boundary constraint of the i-th dimension; θ is the input feature vector; R is the reinforcement learning score vector; f_cvr is the click conversion rate vector of the commodities; f_ctcvr is the exposure conversion rate vector of the commodities; f_com is the good review count vector of the commodities; f_dis is the discount index vector of the commodities; E_e is the recommended exploration commodity vector; E_rem is the recommended exploration commodity continuous action vector; L(θ) is the multi-objective loss function; L_ctcvr(θ) is the exposure conversion loss function; and L_comment(θ) is the total good review loss function;

W_R is the weight value corresponding to the reinforcement learning score vector R; W_F is the weight vector corresponding to the inherent characteristic matrix F; W_E is the weight vector corresponding to the recommended exploration commodity score matrix E; and W_cvr, W_ctcvr, W_com and W_dis are the weight values corresponding to the click conversion rate vector, the exposure conversion rate vector, the good review count vector and the discount index vector, respectively.

The commodity intrinsic feature matrix F, the recommended exploration commodity score matrix E and the reinforcement learning score vector R of the previous period are input into the automatic parameter tuning module, which calculates the reinforcement learning parameter vector W, i.e., the weight values of all the matrices and vectors used in the reinforcement learning algorithm. By tuning the weights of these factors automatically, the initial ranking score values in the subsequently output commodity initial sorting result vector are pushed as close as possible toward high actual revenue, so that commodity recommendation yields the highest revenue.

The automatic parameter tuning module may tune the parameters with a multi-objective supervised learning algorithm whose two objectives, increasing the exposure conversion rate and increasing the total number of good reviews of the commodities, are realized by constructing a loss function. Alternatively, the module may tune the parameters with a multi-objective evolutionary algorithm such as the vector evaluated genetic algorithm (VEGA), the multi-objective genetic algorithm (MOGA), the non-dominated sorting genetic algorithm (NSGA), or the non-dominated sorting genetic algorithm with an elitist strategy (NSGA-II).

S120: Determine the commodity initial sorting result vector of the current period based on the intrinsic characteristic matrix, the recommended exploration commodity score matrix and the reinforcement learning parameter vector of the current period.

In this embodiment, the intrinsic feature matrix F, the recommended exploration commodity score matrix E and the reinforcement learning parameter vector W of the current period may be input into the reinforcement learning module, which calculates the commodity initial sorting result vector R1 with a multi-armed bandit algorithm. The reinforcement learning module may alternatively use an ε-greedy multi-armed bandit method, a policy-gradient reinforcement learning method, Monte Carlo tree search, or a similar method.

In an implementation of the embodiment of the present invention, optionally, determining the commodity initial sorting result vector of the current period based on the commodity intrinsic feature matrix, the recommended exploration commodity score matrix and the reinforcement learning parameter vector of the current period includes:

determining the initial commodity ordering result vector based on the following formula:

R1＝concat(R，F，E)·W

where R1 is the commodity initial sorting result vector; R is the reinforcement learning score vector; F is the inherent characteristic matrix; E is the recommended exploration commodity score matrix; and W is the reinforcement learning parameter vector. The concat function combines vectors of the same dimension column-wise into a matrix.
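As a minimal sketch of R1 = concat(R, F, E)·W in plain Python (no matrix library assumed), the vectors are stacked column-wise into an n × 7 matrix and multiplied by W. The column order R, f_cvr, f_ctcvr, f_com, f_dis, E_e, E_rem follows the ordering of θ and W = {W_R, W_F, W_E} given for the parameter-tuning step; the function names are illustrative.

```python
def concat_columns(*vectors):
    """Combine equal-length vectors column-wise into an n x k matrix (list of rows)."""
    length = len(vectors[0])
    if any(len(v) != length for v in vectors):
        raise ValueError("all vectors must have the same dimension")
    return [list(row) for row in zip(*vectors)]


def initial_ranking(R, F, E, W):
    """R1 = concat(R, F, E) . W

    R: reinforcement learning score vector (length n)
    F: [f_cvr, f_ctcvr, f_com, f_dis]; E: [E_e, E_rem]
    W: [W_R, W_cvr, W_ctcvr, W_com, W_dis, W_Ee, W_Erem]
    """
    matrix = concat_columns(R, *F, *E)
    if len(W) != len(matrix[0]):
        raise ValueError("W must have one weight per column")
    # Each commodity's score is the weighted sum of its row.
    return [sum(x * w for x, w in zip(row, W)) for row in matrix]
```

Sorting commodities by their R1 value in descending order then gives the initial ranking.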

The reinforcement learning score vector R may be calculated by the reinforcement learning module using formulas that are not reproduced in this text;

(Formula not reproduced in this text.)

where Q_1 is the initial reward value obtained by reinforcement learning in the 1st period; Q_m is the reward value obtained by reinforcement learning in the m-th period; d_ei is the exposure value of the commodities in the i-th period; and the m-th period may be the current period;

(Formula not reproduced in this text.)

where d_cmn indicates whether commodity n was clicked in the m-th period: d_cmn = 0 means commodity n was not clicked in the m-th period, and d_cmn = 1 means it was clicked;

d_pmn indicates whether commodity n was purchased in the m-th period: d_pmn = 0 means commodity n was not purchased in the m-th period, and d_pmn = 1 means it was purchased; bonus1 is the reward value granted when the commodity is clicked; and bonus2 is the reward value granted when the commodity is purchased.

It should be noted that determining the recommended exploration commodity score matrix makes it possible to discover the high-quality commodities of the current period, so that they quickly rise to the head of the initial sorting result. Because a high-quality commodity should keep its effect for some time, the recommended exploration continuous action score is introduced so that the boost lasts for several periods. In addition, since the commodities that rise quickly should not include trending commodities that are already at the head of the initial sorting result, an exploration value score is introduced as a parameter for evaluating the exploration value of each commodity.
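The Bandit reward formula is not reproduced in this text either. A minimal sketch of the per-period click and purchase reward, assuming it is simply bonus1 when a commodity was clicked plus bonus2 when it was purchased, with the indicators d_cmn and d_pmn derived from the click and sales counts of the period; this additive form and the name `bandit_rewards` are assumptions.

```python
def bandit_rewards(d_c, d_p, bonus1, bonus2):
    """ASSUMED additive reward per commodity for one period.

    d_c, d_p: click and sales counts per commodity in the period.
    d_cmn / d_pmn are 1 if the commodity was clicked / purchased, else 0.
    """
    rewards = []
    for clicks, purchases in zip(d_c, d_p):
        d_cmn = 1 if clicks > 0 else 0
        d_pmn = 1 if purchases > 0 else 0
        rewards.append(bonus1 * d_cmn + bonus2 * d_pmn)
    return rewards
```

Choosing bonus2 larger than bonus1 would reward purchases more than clicks, which matches the text's ordering of the two actions but is not a value the source specifies.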

S130: Determine the commodity sequencing intermediate result vector of the current period based on the commodity initial sequencing result vector of the current period and the ranking of actual commodity sales.

In the embodiment of the invention, the actual sales ranking of all commodities in the current period may be used as the commodity actual sales ranking result DF. Alternatively, the real-time performance matrix D of the current period may be input into the data screening module, which sorts the commodities by sales volume in descending order, selects the top X commodities with their sales volumes, and outputs this top-X actual sales ranking as the commodity actual sales ranking result DF.
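The top-X selection performed by the data screening module can be sketched as follows; the function name `top_x_by_sales` is hypothetical. Python's sort is stable, so commodities with equal sales keep their input order.

```python
def top_x_by_sales(commodity_ids, d_p, X):
    """Return the X best-selling (commodity_id, sales) pairs, highest first."""
    ranked = sorted(zip(commodity_ids, d_p), key=lambda pair: pair[1], reverse=True)
    return ranked[:X]
```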

In the embodiment of the invention, the commodity initial sorting result vector R1 and the commodity actual sales ranking result DF of the current period are input into the double feedback loop module, which outputs the commodity sorting intermediate result vector R2. Specifically, commodities with low actual sales that rank near the top of R1 may be demoted (their feedback gain coefficient is decreased), and commodities with high actual sales that rank lower in R1 may be promoted (their feedback gain coefficient is increased), so that commodities with high actual sales move to the head of the ranking and the head no longer contains low-sales commodities.

In an implementation manner of the embodiment of the present invention, optionally, the determining a commodity sorting intermediate result vector in the current cycle based on the commodity initial sorting result vector in the current cycle and the commodity actual sales sorting result includes: determining a feedback gain coefficient of the commodities based on the actual sales volume sorting result of the commodities; and determining a commodity sequencing intermediate result vector of the current period based on the feedback gain coefficient of each commodity and the commodity initial sequencing result vector.

In an implementation of the embodiment of the present invention, optionally, the feedback gain coefficient of each commodity is determined by a formula that is not reproduced in this text,

where buffer_n is the feedback gain coefficient of commodity n; R1_th1 and R1_th2 are segmentation thresholds in R1;

DF1, DF2 and DF3 are the commodity sets segmented by commodity sales volume; R1_n is the initial sorting result score value of commodity n; d_pn is the sales value of commodity n; and

$$DF_1 = \{d_{p11}, \ldots, d_{p1r}\},\quad d_{pn} > d_{pth1}$$

$$DF_2 = \{d_{p21}, \ldots, d_{p2s}\},\quad d_{pth1} \ge d_{pn} > d_{pth2}$$

$$DF_3 = \{d_{p31}, \ldots, d_{p3u}\},\quad d_{pth2} \ge d_{pn} > d_{pth3}$$

$$DF = \{DF_1, DF_2, DF_3\}$$

$$r + s + u = X$$

where d_pth1, d_pth2 and d_pth3 are commodity sales thresholds; r, s and u are the numbers of commodities in the three sets, respectively; and X is the number of commodities in the commodity actual sales sorting result.
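A minimal sketch of splitting the top-X result into DF₁, DF₂ and DF₃ under these threshold conditions, assuming d_pth1 > d_pth2 > d_pth3, that the DF₃ condition reads d_pth2 ≥ d_pn > d_pth3, and that the input is a list of (commodity, sales) pairs. `split_sales_sets` is a hypothetical name.

```python
from typing import List, Tuple


def split_sales_sets(top_x: List[Tuple[str, int]], th1: float, th2: float,
                     th3: float) -> Tuple[List[str], List[str], List[str]]:
    """Partition commodities into DF1, DF2, DF3 by sales thresholds.

    DF1: d_pn > th1; DF2: th1 >= d_pn > th2; DF3: th2 >= d_pn > th3.
    Commodities with d_pn <= th3 fall into none of the three sets.
    """
    df1: List[str] = []
    df2: List[str] = []
    df3: List[str] = []
    for item, sales in top_x:
        if sales > th1:
            df1.append(item)
        elif sales > th2:
            df2.append(item)
        elif sales > th3:
            df3.append(item)
    return df1, df2, df3
```

When every top-X commodity exceeds d_pth3, the set sizes satisfy r + s + u = X.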

In this embodiment, optionally, determining the commodity sorting intermediate result vector of the current period based on the feedback gain coefficient of each commodity and the commodity initial sorting result vector includes:

determining the commodity sorting intermediate result vector based on the following formula:

$$R2 = \{R2_1, R2_2, \ldots, R2_n\}$$

$$R2_n = R1_n \times buffer_n$$

where R2 is the commodity sorting intermediate result vector and R2_n is the intermediate sorting result score of commodity n.
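Since the formula for buffer_n is not spelled out in the prose, the sketch below uses a hypothetical gain rule that only follows the qualitative behaviour described earlier: boost commodities in DF₁ whose R1 score is below R1_th2, and damp commodities outside DF₁ and DF₂ whose R1 score is above R1_th1. The gain values 1.5 and 0.5, the assumption R1_th1 > R1_th2, and the function names are all illustrative. The second function applies R2_n = R1_n × buffer_n.

```python
from typing import Dict, Set


def feedback_gain(item: str, r1_n: float, df1: Set[str], df2: Set[str],
                  r1_th1: float, r1_th2: float,
                  up: float = 1.5, down: float = 0.5) -> float:
    """Hypothetical feedback gain buffer_n (assumes r1_th1 > r1_th2)."""
    if item in df1 and r1_n < r1_th2:
        return up    # high actual sales but low initial score: boost
    if item not in df1 and item not in df2 and r1_n > r1_th1:
        return down  # high initial score but low actual sales: damp
    return 1.0       # otherwise keep the initial score unchanged


def intermediate_scores(r1: Dict[str, float], df1: Set[str], df2: Set[str],
                        r1_th1: float, r1_th2: float) -> Dict[str, float]:
    """R2_n = R1_n * buffer_n for every commodity n in R1."""
    return {item: score * feedback_gain(item, score, df1, df2, r1_th1, r1_th2)
            for item, score in r1.items()}
```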

S140: Determine the commodity recommendation result of the current period based on the commodity sorting intermediate result vector of the current period.

In one implementation of this embodiment, optionally, determining the commodity recommendation result of the current period based on the commodity sorting intermediate result vector of the current period includes: forming a commodity sorting result matrix by commodity applicable gender based on the commodity sorting intermediate result vector; and determining the commodity recommendation result of the current period based on the commodity sorting result matrix and the gender of the user.

In this embodiment of the present invention, optionally, forming the commodity sorting result matrix by commodity applicable gender based on the commodity sorting intermediate result vector includes: determining the commodity sorting result matrix based on the following formula:

$$R3 = \{R3_{female}, R3_{male}, R3_{common}\}$$

where R3 is the commodity sorting result matrix; the definition of R3_female uses the selection condition p_i = 1 or p_i = -1, and the definition of R3_male uses the selection condition p_i = 0 or p_i = -1;

p_i is the applicable-gender variable of the commodity, determined from the commodity's applicable-gender descriptor;

female, male and common denote the applicable genders female, male and common (unisex), respectively;

R3_female, R3_male and R3_common are the commodity sorting result vectors applicable to women, men and common use, respectively, and their elements are the sorting result scores of commodity n for women, men and common use, respectively.
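A sketch of building R3 from R2, assuming the encoding p_i = 1 for female, 0 for male and -1 for common commodities. This encoding is consistent with the selection conditions p_i = 1 or p_i = -1 and p_i = 0 or p_i = -1, but it is an assumption here, as is restricting the common list to unisex items. Each list is sorted by descending R2 score; the names are hypothetical.

```python
from typing import Dict, List, Tuple

FEMALE, MALE, COMMON = 1, 0, -1  # assumed encoding of the gender variable p_i


def gender_sorted_lists(r2: Dict[str, float],
                        p: Dict[str, int]) -> Dict[str, List[Tuple[str, float]]]:
    """Split R2 into R3_female, R3_male and R3_common, sorted by descending score."""
    def ranked(allowed: set) -> List[Tuple[str, float]]:
        rows = [(item, score) for item, score in r2.items() if p[item] in allowed]
        return sorted(rows, key=lambda row: (-row[1], row[0]))

    return {
        "female": ranked({FEMALE, COMMON}),  # p_i = 1 or p_i = -1
        "male": ranked({MALE, COMMON}),      # p_i = 0 or p_i = -1
        "common": ranked({COMMON}),          # assumption: unisex items only
    }
```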

In one implementation of the embodiment of the present invention, optionally, determining the user gender vector based on the user's physiological gender and the user's shopping-tendency gender includes: determining the user gender vector based on the following formula:

where U_sex is the user gender vector;

the formula uses the physiological gender variable of user j and the shopping-tendency gender variable of user j;

user_sexj is the physiological sex descriptor of user j;

the remaining quantities are the total number of times user j clicks on commodities, the number of times user j clicks on commodities whose applicable gender is female, the total number of purchases by user j, and the number of times user j purchases commodities whose applicable gender is female; γ₁, γ₂, δ₁ and δ₂ are proportional coefficients;

correspondingly, determining the commodity recommendation result of the current period based on the commodity sorting result matrix and the user gender vector includes: determining the commodity recommendation result based on the following formula:

where output is the set formed by the commodity recommendation results; th_female is the threshold for female-applicable commodities; th_male is the threshold for male-applicable commodities; and other denotes all other cases. th_female and th_male can be preset manually.
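The user-side formulas are likewise not written out in the prose, so the sketch below is hypothetical throughout. It assumes the physiological gender variable is 1 for female and 0 for male, computes the shopping-tendency variable as γ₁ × (female-commodity clicks / total clicks) + γ₂ × (female-commodity purchases / total purchases), combines the two with δ₁ and δ₂, and then selects R3_female when the score exceeds th_female, R3_male when it is below th_male, and R3_common otherwise. All coefficient and threshold defaults are placeholders.

```python
from typing import Dict, List


def user_gender_score(phys: float, clicks: int, female_clicks: int,
                      purchases: int, female_purchases: int,
                      gamma1: float = 0.3, gamma2: float = 0.7,
                      delta1: float = 0.5, delta2: float = 0.5) -> float:
    """Hypothetical scalar gender score: higher means more female-leaning."""
    # With no history, fall back to a neutral ratio of 0.5 (assumption).
    click_ratio = female_clicks / clicks if clicks else 0.5
    buy_ratio = female_purchases / purchases if purchases else 0.5
    tendency = gamma1 * click_ratio + gamma2 * buy_ratio
    return delta1 * phys + delta2 * tendency


def pick_recommendation(score: float, r3: Dict[str, List[str]],
                        th_female: float = 0.7, th_male: float = 0.3) -> List[str]:
    """Choose which R3 vector to output (assumes th_female > th_male)."""
    if score > th_female:
        return r3["female"]
    if score < th_male:
        return r3["male"]
    return r3["common"]  # the "other" case
```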

In the related art, after acquiring information about a new user, a platform generally needs to recommend commodities to that user by combining commodity characteristics with user requirements. However, new users are easily lost because they have few behavior records, no clear shopping intent, and little familiarity with the functions of the e-commerce platform. It is therefore necessary to make appropriate recommendations for new users about whom little is known. Current cold-start solutions for recommending commodities to new users generally fall into the following three types:

The first is to collect the recent sales of all commodities across the platform to obtain a sales ranking list, then filter the commodities at the head of the list by applicable gender and recommend them to male, female or gender-unknown users.

The second is to linearly combine multiple commodity features, such as sales volume, click conversion rate, exposure conversion rate, commodity category and positive-review rate. The linear combination formula is:

$$S_n = \sum_{i=1}^{k} \alpha_i f_{n,i}$$

where S_n is the final ranking score of commodity n, k is the number of feature categories, f_{n,i} is the i-th feature score of commodity n, and α_i is the weight of feature i. The feature weights are typically obtained by a learning method (such as a neural network or a support vector machine).
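A minimal sketch of this linear combination; the feature values and weights are arbitrary inputs, and `linear_scores` is a hypothetical name.

```python
from typing import Dict, List


def linear_scores(features: Dict[str, List[float]],
                  alpha: List[float]) -> Dict[str, float]:
    """S_n = sum_i alpha_i * f_{n,i} for each commodity n, with k = len(alpha)."""
    scores: Dict[str, float] = {}
    for item, f in features.items():
        if len(f) != len(alpha):
            raise ValueError(f"{item}: expected {len(alpha)} features, got {len(f)}")
        scores[item] = sum(a * x for a, x in zip(alpha, f))
    return scores
```

The weights α_i stay fixed between retraining runs, which is why this scheme reacts slowly to intraday changes.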

The third is to compute the commodity ranking with a reinforcement learning method, such as a multi-armed bandit method or a policy gradient method. Reinforcement learning can learn in real time which commodities are more popular on the platform and move them up in the ranking.
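For readers unfamiliar with the multi-armed bandit family, the sketch below shows UCB1, one standard bandit selection rule, treating each commodity as an arm whose reward is a click and whose pulls are exposures. It illustrates the general technique only and is not a description of any particular related-art system; `ucb1_select` is a hypothetical name.

```python
import math
from typing import List


def ucb1_select(rewards: List[float], pulls: List[int]) -> int:
    """Return the index of the arm (commodity) with the highest UCB1 index.

    rewards[i] is the total reward (e.g. clicks) of arm i and pulls[i] its
    number of exposures. Arms that have never been shown are tried first.
    """
    for i, n in enumerate(pulls):
        if n == 0:
            return i
    total = sum(pulls)

    def index(i: int) -> float:
        # Mean reward plus an exploration bonus that shrinks as pulls grow.
        return rewards[i] / pulls[i] + math.sqrt(2 * math.log(total) / pulls[i])

    return max(range(len(pulls)), key=index)
```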

However, these solutions have shortcomings. The first and second solutions have poor real-time performance: they usually compute the performance and characteristics of commodities on a daily basis and cannot respond quickly to purchasing hotspots that emerge in real time. For example, if commodity A on the platform suddenly attracts a lot of attention at 18:00 on a given day and its click rate and purchase rate rise sharply, the first and second solutions, which compute statistics by day, cannot track this hotspot quickly. In the technical solution provided by the embodiment of the invention, the reinforcement learning algorithm in the reinforcement learning module can quickly track the current purchasing hotspot and raise the commodity's rank; because the ranking is computed in real time, the response is fast.

In addition, some related-art solutions do not constrain the recommendation result, so it can differ greatly from the commodities' sales over the past day; the third solution above has this drawback. If a commodity receives a large number of clicks in a short time but its actual sales are poor or it has many negative reviews, it is likely to be promoted to the head of the ranking, which does not help users choose commodities and may lower the platform's conversion rate. In the embodiment of the invention, the commodity sorting intermediate result vector is determined from the commodity initial sorting result vector and the commodity actual sales sorting result, and the recommendation result is then determined from that intermediate vector. In other words, actual sales are fed back to correct the reinforcement learning result: commodities with high actual sales but a low reinforcement learning rank move up, and commodities in the opposite situation move down, so that hot or high-quality commodities are recommended to users.

Furthermore, all three related-art solutions respond slowly to high-quality or hot commodities. Specifically, when a commodity, whether newly launched or on the shelf for some time, attracts the attention of a large number of users in a short time, its click rate and sales rate increase greatly, and it can be regarded as a high-quality or hot commodity. The three solutions react slowly to such commodities; in particular, they cannot quickly move up commodities that become high-quality or hot on the same day. In the embodiment of the invention, a recommended exploration commodity score matrix is determined, and the commodity initial sorting result vector is determined from this matrix together with the other parameters to obtain the recommendation result. That is, commodities are explored during the recommendation process, so high-quality or hot commodities can be discovered and quickly raised to the head of the ranking, which solves the problem of slow response to such commodities.

In addition, the first and second related-art solutions adapt poorly. User needs vary widely, and the types of hot commodities change continuously with the season, the weather, and the dominant age and gender of the customer base. With manually set parameters, the first and second solutions have difficulty tracking large-scale changes in demand over long time spans. In the embodiment of the invention, the reinforcement learning parameter vector of the current period is determined in every period from the intrinsic characteristic matrix, the recommended exploration commodity score matrix and the reinforcement learning score vector of the previous period. By adjusting the reinforcement learning parameters (the weights applied to these matrices and vectors) every period, the method responds quickly and adapts well to its environment, maintaining high user attraction and conversion rates across different recommendation scenarios.

In the technical solution provided by the embodiment of the invention, the commodity initial sorting result vector of the current period is determined from the intrinsic characteristic matrix, the recommended exploration commodity score matrix and the reinforcement learning parameter vector of the current period; the commodity sorting intermediate result vector of the current period is determined from the initial sorting result vector and the commodity actual sales sorting result; and the commodity recommendation result of the current period is determined from the intermediate result vector. As a result, the method responds faster to high-quality or hot commodities, its recommendations differ less from actual commodity sales, which helps users choose what to buy, and the recommended commodities are attractive and yield a high overall conversion rate.

Fig. 2a is a flowchart of a commodity recommendation method according to an embodiment of the present invention. This embodiment refines the determination of the reinforcement learning parameter vector. As shown in Fig. 2a, the technical solution of this embodiment includes:

S210: Determine the intrinsic characteristic matrix of the current period of the commodity.

S220: Determine the reinforcement learning parameter vector of the current period based on the intrinsic characteristic matrix of the previous period of the commodity, the recommended exploration commodity score matrix of the previous period and the reinforcement learning score vector of the previous period.

S230: Determine the recommended exploration commodity score matrix of the current period.

S240: Determine the commodity initial sorting result vector of the current period based on the intrinsic characteristic matrix, the recommended exploration commodity score matrix and the reinforcement learning parameter vector of the current period.

S250: Determine the commodity sorting intermediate result vector of the current period based on the commodity initial sorting result vector of the current period and the commodity actual sales sorting result.

S260: Form a commodity sorting result matrix by commodity applicable gender based on the commodity sorting intermediate result vector.

S270: Determine the commodity recommendation result of the current period based on the commodity sorting result matrix and the gender of the user.

Reference is made to the above embodiments for the introduction of S210-S270.

In order to describe the technical solution of the present invention in more detail, as shown in fig. 2b, the technical solution provided by the embodiment of the present invention includes the following steps:

Step 1: Aggregate all commodity exposure, click and order real-time message data in the previous period T, and output the result by commodity.

Step 2: Judge whether the real-time data processing in Step 1 is complete. If so, proceed to the next step; if not, the processing is abnormal, so interrupt and exit.

The real-time data processing of the commodities is judged complete if:

the exposure, click and order real-time message data of all commodities in the previous period T have been processed and computed with no data missing, and the processed data conforms to the following data format:

$$D = \{d_e, d_c, d_p\}$$

$$d_e = \{d_{e1}, d_{e2}, \ldots, d_{en}\}$$

$$d_c = \{d_{c1}, d_{c2}, \ldots, d_{cn}\}$$

$$d_p = \{d_{p1}, d_{p2}, \ldots, d_{pn}\}$$

If these conditions are not met, the real-time message data processing is judged incomplete and the processing is interrupted. The processed real-time data forms the real-time performance matrix D, and the intrinsic characteristic matrix F of the previous period is formed based on the real-time performance matrix.
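A minimal sketch of Steps 1 and 2, assuming each real-time message has already been reduced to a hypothetical (commodity_id, event_type) pair with event types "exposure", "click" and "order", and that each order message counts as one unit of sales. The completeness check only verifies the D = {d_e, d_c, d_p} format, not that no messages were lost upstream; all names are illustrative.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple

# Hypothetical mapping from message type to the row of D it feeds.
EVENT_KEYS = {"exposure": "d_e", "click": "d_c", "order": "d_p"}


def build_realtime_matrix(events: Iterable[Tuple[str, str]],
                          items: List[str]) -> Dict[str, List[int]]:
    """Aggregate (commodity_id, event_type) messages into D = {d_e, d_c, d_p}."""
    counts = Counter(events)
    return {key: [counts[(item, event)] for item in items]
            for event, key in EVENT_KEYS.items()}


def is_complete(d: Dict[str, List[int]], n_items: int) -> bool:
    """Format check: exactly d_e, d_c, d_p, each with one entry per commodity."""
    return (set(d) == set(EVENT_KEYS.values())
            and all(len(v) == n_items for v in d.values()))
```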

Step 3: Input the intrinsic characteristic matrix F of the commodities in the previous period, the recommended exploration commodity score matrix E from the previous period's calculation result and the reinforcement learning score vector R of the previous period into the automatic parameter adjustment module, which automatically calculates the reinforcement learning parameter vector W.

Step 4: The recommendation exploration module calculates the recommended exploration commodity score matrix E of the current period from the real-time performance matrix D of the current period of each commodity.

Step 5: Input the intrinsic characteristic matrix F of the commodities, the reinforcement learning parameter vector W and the recommended exploration commodity score matrix E of the current period into the reinforcement learning module, which calculates the commodity initial sorting result vector R1.

Step 6: Form the real-time performance matrix D from the real-time data processed in the current period and input it into the data screening processing module, which sorts the commodities in decreasing order of sales volume, selects the top X commodities and their sales volumes, and outputs the actual sales sorting result DF of these top X commodities.

Step 7: Judge whether the data screening is complete. The conditions for completion are: (1) the real-time performance matrices D of all commodities have been screened; and (2) the screened actual sales sorting result DF conforms to the following data format:

$$DF = \{DF_1, DF_2, DF_3\}$$

$$DF_1 = \{d_{p11}, \ldots, d_{p1r}\},\quad d_{pn} > d_{pth1}$$

$$DF_2 = \{d_{p21}, \ldots, d_{p2s}\},\quad d_{pth1} \ge d_{pn} > d_{pth2}$$

$$DF_3 = \{d_{p31}, \ldots, d_{p3u}\},\quad d_{pth2} \ge d_{pn} > d_{pth3}$$

If these conditions are not met, the data screening is incomplete and the processing is interrupted.

Step 8: Input the commodity initial sorting result vector R1 and the commodity actual sales sorting result DF into the double feedback loop module for sorting, which outputs the commodity sorting intermediate result vector R2.

Step 9: Sort the commodity sorting intermediate result vector R2 into the commodity sorting result matrix R3 by commodity applicable gender.

Step 10: Calculate the user gender vector U_sex from the user's physiological gender and shopping-tendency gender.

Step 11: Output the final commodity recommendation result for the specific user according to the user gender vector U_sex and the commodity sorting result matrix R3.

Step 12: After the calculation of the current period is finished, record the intermediate variables, namely the recommended exploration commodity score matrix E and the reinforcement learning score vector R, to provide the data basis for the next period's calculation.

Fig. 3a is a structural block diagram of a commodity recommendation device according to an embodiment of the present invention. As shown in Fig. 3a, the device includes a first determination module 310, a reinforcement learning module 320, a double feedback loop module 330 and a second determination module 340.

The first determining module 310 is configured to determine an inherent feature matrix of a current period of a commodity, a recommended exploration commodity score matrix of the current period, and a reinforcement learning parameter vector of the current period;

the reinforcement learning module 320 is configured to determine a commodity initial sequencing result vector of the current period based on the inherent feature matrix of the current period, the recommended exploration commodity score matrix of the current period, and the reinforcement learning parameter vector of the current period;

a double feedback loop module 330, configured to determine a commodity sorting intermediate result vector in the current period based on the commodity initial sorting result vector in the current period and the commodity actual sales sorting result;

and the second determining module 340 is configured to determine the commodity recommendation result in the current period based on the commodity sorting intermediate result vector in the current period.

Here, the commodity recommendation device may be a user device or a server capable of implementing commodity recommendation, or a device integrating both. In specific embodiments, user devices include, but are not limited to, smartphones, tablets, wearable devices, voice-interactive devices and other electronic products. The server includes, but is not limited to, a network host, a single network server, a cluster of network servers, or a cloud-computing-based cluster of computers. Here, the cloud consists of a large number of hosts or network servers based on cloud computing, a type of distributed computing in which a collection of loosely coupled computers forms a single virtual supercomputer. Those skilled in the art will understand that the above commodity recommendation devices are merely examples; other existing or future commodity recommendation devices, if applicable to the present invention, are also included within its scope and are incorporated herein by reference. The commodity recommendation device includes an electronic device capable of automatically performing numerical calculation and information processing according to preset or stored instructions, whose hardware includes, but is not limited to, a microprocessor, an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA), a digital signal processor (DSP), an embedded device, and the like.

As shown in Fig. 3b, on the basis of the above embodiment, optionally, the first determination module 310 includes an automatic parameter adjustment module 350, configured to determine the reinforcement learning parameter vector of the current period based on the intrinsic characteristic matrix of the previous period of the commodity, the recommended exploration commodity score matrix of the previous period and the reinforcement learning score vector of the previous period.

Optionally, the second determining module 340 includes an output sorting module 341 and a recommendation result output module 342;

the output sorting module 341 is configured to form a commodity sorting result matrix according to the commodity applicable gender based on the commodity sorting intermediate result vector;

and the recommendation result output module 342 is configured to determine the commodity recommendation result of the current period based on the commodity sorting result matrix and the user gender.

Optionally, the first determining module 310 includes a data processing module 360, configured to aggregate all of the commodity exposure real-time message data, the commodity click real-time message data, and the commodity order real-time message data in one period to obtain a real-time expression matrix of the commodity;

determining an intrinsic characteristic matrix of the commodity based on the real-time performance matrix of the commodity.

Optionally, the determining an inherent characteristic matrix of the commodity based on the real-time performance matrix of the commodity includes:

determining a click conversion rate vector, an exposure conversion rate vector, a positive-review count vector and a discount index vector of the commodities based on the real-time performance matrix of the commodities;

forming the intrinsic characteristic matrix of the commodities from the click conversion rate vector, the exposure conversion rate vector, the positive-review count vector and the discount index vector;

the click conversion rates of the commodities form the click conversion rate vector;

the exposure conversion rates of the commodities form the exposure conversion rate vector;

the positive-review counts of the commodities form the positive-review count vector;

the discount indices of the commodities form the discount index vector.
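A minimal sketch of forming the intrinsic characteristic matrix, assuming the common definitions click conversion rate = purchases / clicks and exposure conversion rate = purchases / exposures (the text does not spell these out), and assuming positive-review counts and discount indices are supplied separately. Returning 0.0 on division by zero is an illustrative choice; the function names are hypothetical.

```python
from typing import Dict, List, Sequence


def _safe_div(a: float, b: float) -> float:
    """Divide, returning 0.0 when the denominator is zero."""
    return a / b if b else 0.0


def intrinsic_features(d_e: Sequence[int], d_c: Sequence[int], d_p: Sequence[int],
                       good_reviews: Sequence[int],
                       discount: Sequence[float]) -> Dict[str, List[float]]:
    """Build F = {f_cvr, f_ctcvr, f_com, f_dis}, one entry per commodity."""
    return {
        "f_cvr": [_safe_div(p, c) for p, c in zip(d_p, d_c)],    # purchases / clicks
        "f_ctcvr": [_safe_div(p, e) for p, e in zip(d_p, d_e)],  # purchases / exposures
        "f_com": [float(g) for g in good_reviews],
        "f_dis": [float(x) for x in discount],
    }
```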

Optionally, the first determining module 310 includes a recommendation exploration module 370, configured to:

forming a recommended exploration commodity vector based on the recommended exploration score value of each commodity;

forming a continuous action vector of the recommended exploration commodity based on the recommended exploration continuous action score value of each commodity;

and determining a score matrix of the recommended exploration commodities based on the vector of the recommended exploration commodities and the continuous action vector of the recommended exploration commodities.

where e_en is the recommended exploration score of commodity n; φ is the first sales threshold of the commodity; f_cvrn is the click conversion rate of commodity n; d_pn is the purchase quantity of commodity n;

dis_n is the exploration value score of commodity n; β₁, β₂, γ₁ and γ₂ are linear coefficients of the exploration value score; p_th1, p_th2 and p_th3 are the second, third and fourth sales thresholds of the commodity, respectively;

the formulas further use the sales increment of commodity n in the m-th period relative to the previous period, the mean of the increment vector, and the variance of the increment vector, where the increment vector is diff_pur = {diff_pur1, diff_pur2, ..., diff_purn};

they also use the recommended exploration score of commodity n in the i-th period; th_e is a preset recommended exploration score threshold; e_remn is the recommended exploration continuous action score of commodity n; and t is the maximum number of periods over which the score of a recommended exploration commodity continues to take effect.
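The increment statistics can be sketched as below, assuming each increment is the current period's sales minus the previous period's sales for the same commodity. The text does not say whether the population or the sample variance is meant, so the use of `statistics.pvariance` (population variance) is an assumption; `sales_increment_stats` is a hypothetical name.

```python
from statistics import mean, pvariance
from typing import List, Sequence, Tuple


def sales_increment_stats(prev: Sequence[int],
                          curr: Sequence[int]) -> Tuple[List[int], float, float]:
    """Return diff_pur = curr - prev (per commodity), its mean and its variance."""
    diff = [c - p for p, c in zip(prev, curr)]
    return diff, mean(diff), pvariance(diff)
```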

Optionally, forming the recommended exploration commodity vector based on the recommended exploration score of each commodity includes:

determining the recommended exploration commodity vector based on the following formula:

$$E_e = \{e_{e1}, e_{e2}, \ldots, e_{en}\}$$

where E_e is the recommended exploration commodity vector;

correspondingly, forming the recommended exploration commodity continuous action vector based on the recommended exploration continuous action score of each commodity includes:

determining the recommended exploration commodity continuous action vector based on the following formula:

$$E_{rem} = \{e_{rem1}, e_{rem2}, \ldots, e_{remn}\}$$

correspondingly, determining the recommended exploration commodity score matrix based on the recommended exploration commodity vector and the recommended exploration commodity continuous action vector includes:

determining the recommended exploration commodity score matrix based on the following formula:

$$E = \{E_e, E_{rem}\}$$

where E is the recommended exploration commodity score matrix.

Optionally, the reinforcement learning score vector is determined based on the following formula:

where R is the reinforcement learning score vector;

N_m = d_c + d_p, where d_c is the click count vector of the commodities and d_p is the sales quantity vector of the commodities;

Q_1 is the initial reward value obtained by reinforcement learning in the 1st period; Q_m is the reward value obtained by reinforcement learning in the m-th period; d_ei is the exposure value of the commodity in the i-th period;

d_qmn indicates whether commodity n is purchased in the m-th period: d_qmn = 0 means that commodity n is not purchased in the m-th period, and otherwise commodity n is purchased in the m-th period;

bonus1 is the reward value obtained when the commodity is clicked, and bonus2 is the reward value obtained when the commodity is purchased.
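The only relation in this passage stated in prose is N_m = d_c + d_p; the reward formulas are not. The second function below is therefore a hypothetical stand-in that credits bonus1 per click and bonus2 per purchase, with placeholder default values, and should not be read as the method's reward definition.

```python
from typing import List, Sequence


def interaction_counts(d_c: Sequence[int], d_p: Sequence[int]) -> List[int]:
    """N_m = d_c + d_p, element-wise per commodity."""
    return [c + p for c, p in zip(d_c, d_p)]


def period_reward(d_c: Sequence[int], d_p: Sequence[int],
                  bonus1: float = 1.0, bonus2: float = 5.0) -> List[float]:
    """Hypothetical per-commodity reward: bonus1 per click plus bonus2 per purchase."""
    return [bonus1 * c + bonus2 * p for c, p in zip(d_c, d_p)]
```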

Optionally, the automatic parameter adjustment module 350 is configured to:

$$\theta = \{R, f_{cvr}, f_{ctcvr}, f_{com}, f_{dis}, E_e, E_{rem}\}$$

$$L(\theta) = L_{ctcvr}(\theta) + L_{comment}(\theta)$$

$$W = \{W_R, W_F, W_E\},\quad W_F = \{W_{cvr}, W_{ctcvr}, W_{com}, W_{dis}\},\quad W_E = \{W_{Ee}, W_{Erem}\}$$

where W_i is the weight of the i-th dimension; w_i is an element of the reinforcement learning parameter vector W; c_i is the boundary constraint of the i-th dimension; θ is the input feature vector; R is the reinforcement learning score vector; f_cvr is the click conversion rate vector of the commodities; f_ctcvr is the exposure conversion rate vector; f_com is the positive-review count vector; f_dis is the discount index vector; E_e is the recommended exploration commodity vector; E_rem is the recommended exploration commodity continuous action vector; L(θ) is the multi-objective loss function; L_ctcvr(θ) is the exposure conversion loss function; and L_comment(θ) is the total positive-review loss function;

W_R is the weight corresponding to the reinforcement learning score vector R; W_F is the weight vector corresponding to the intrinsic characteristic matrix F; W_E is the weight vector corresponding to the recommended exploration commodity score matrix E; and W_cvr, W_ctcvr, W_com and W_dis are the weights corresponding to the click conversion rate vector, the exposure conversion rate vector, the positive-review count vector and the discount index vector, respectively.
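The text states that each weight has a boundary constraint c_i but not how W is optimised. One common way to respect box constraints is projected gradient descent: take a gradient step on L(θ), then clip each weight back into its interval. The sketch below assumes that scheme and that each c_i is an interval (lo_i, hi_i); both are assumptions, the gradient is supplied by the caller, and the function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def project(w: Sequence[float], bounds: Sequence[Tuple[float, float]]) -> List[float]:
    """Clip each weight w_i into its assumed box constraint c_i = (lo_i, hi_i)."""
    return [min(max(x, lo), hi) for x, (lo, hi) in zip(w, bounds)]


def projected_step(w: Sequence[float], grad: Sequence[float], lr: float,
                   bounds: Sequence[Tuple[float, float]]) -> List[float]:
    """One projected gradient descent step on the loss L(theta)."""
    return project([x - lr * g for x, g in zip(w, grad)], bounds)
```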

Optionally, the reinforcement learning module 320 is configured to:

$$R1 = \mathrm{concat}(R, F, E) \cdot W$$

where R1 is the commodity initial sorting result vector; R is the reinforcement learning score vector; F is the intrinsic characteristic matrix; E is the recommended exploration commodity score matrix; and W is the reinforcement learning parameter vector.
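A dependency-free sketch of R1 = concat(R, F, E)·W, assuming the column order follows θ and W as listed for the automatic parameter adjustment module: R, then f_cvr, f_ctcvr, f_com, f_dis, then E_e, E_rem, with one scalar weight per column. Each commodity's row of seven values is dotted with W; `initial_ranking_scores` is a hypothetical name.

```python
from typing import List, Sequence


def initial_ranking_scores(R: Sequence[float], F: Sequence[Sequence[float]],
                           E: Sequence[Sequence[float]],
                           W: Sequence[float]) -> List[float]:
    """R1 = concat(R, F, E) . W, computed row by row.

    R: length-n score vector; F: 4 vectors (f_cvr, f_ctcvr, f_com, f_dis);
    E: 2 vectors (E_e, E_rem); W: 7 weights in the same column order.
    """
    columns = [list(R), *map(list, F), *map(list, E)]
    if len(columns) != len(W):
        raise ValueError("one weight per feature column is required")
    n = len(R)
    if any(len(col) != n for col in columns):
        raise ValueError("all feature vectors must have the same length")
    # Dot product of commodity i's feature row with the weight vector W.
    return [sum(w * col[i] for w, col in zip(W, columns)) for i in range(n)]
```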

Optionally, the apparatus further includes a data filtering module 390 for determining the actual sales volume sorting result of the commodities;

optionally, the double feedback loop module 330 is configured to determine a feedback gain coefficient of the commodity based on the actual sales quantity sorting result of the commodity;

and determining a commodity sequencing intermediate result vector of the current period based on the feedback gain coefficient of each commodity and the commodity initial sequencing result vector.

Optionally, determining the feedback gain coefficient of the commodity based on the commodity actual sales sorting result includes:

determining the feedback gain coefficient of the commodity based on the following formula:

where buffer_n is the feedback gain coefficient of commodity n; R1_th1 and R1_th2 are the segmentation thresholds in R1; DF₁, DF₂ and DF₃ are the commodity sets segmented by commodity sales volume; R1_n is the initial sorting result score of commodity n; and d_pn is the sales value of commodity n;

correspondingly, determining the commodity sorting intermediate result vector of the current period based on the feedback gain coefficient of each commodity and the commodity initial sorting result vector includes:

$$R2 = \{R2_1, R2_2, \ldots, R2_n\}$$

$$R2_n = R1_n \times buffer_n$$

Optionally, forming a commodity sorting result matrix according to commodity applicable gender based on the commodity sorting intermediate result vector, including:

determining a commodity ordering result matrix based on the following formula:

R3＝{R3_female，R3_male，R3_common}

wherein, R3 is the commodity ordering result matrix;

where

p_i = 1 or p_i = -1,

p_i = 0 or p_i = -1,

and where p_i is the applicable gender variable of commodity i;

is the applicable gender descriptor of the commodity;

R3_female, R3_male and R3_common are the commodity ranking result vectors for commodities suitable for women, for men and for both (common), respectively;

are the ranking result scores of the n-th commodity suitable for women, men and both, respectively.
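One reading of the conditions "p_i = 1 or p_i = -1" and "p_i = 0 or p_i = -1" is that commodities marked common appear in both the female and the male vectors. The sketch below assumes that p_i = 1 means female, p_i = 0 means male and p_i = -1 means common; that R3_common holds only common commodities; and that excluded positions are set to 0.0. None of these conventions is confirmed by the text, and the function name is hypothetical.

```python
from typing import Dict, List, Sequence

# Assumed encoding of the applicable gender variable p_i (not confirmed by the source).
FEMALE, MALE, COMMON = 1, 0, -1


def split_by_gender(R2: Sequence[float], p: Sequence[int]) -> Dict[str, List[float]]:
    """Build hypothetical R3_female, R3_male and R3_common vectors from R2.

    A commodity keeps its R2 score in a vector when its p_i is allowed there,
    and gets 0.0 otherwise (an assumed convention).
    """
    if len(R2) != len(p):
        raise ValueError("R2 and p must have one entry per commodity")

    def keep(allowed: set) -> List[float]:
        return [score if p_i in allowed else 0.0 for score, p_i in zip(R2, p)]

    return {
        "female": keep({FEMALE, COMMON}),  # p_i = 1 or p_i = -1
        "male": keep({MALE, COMMON}),      # p_i = 0 or p_i = -1
        "common": keep({COMMON}),          # p_i = -1 only (assumed)
    }
```

With `R2 = [0.9, 0.5, 0.7]` and `p = [1, 0, -1]`, the common commodity's score 0.7 appears in all three vectors, while 0.9 and 0.5 each appear in only one.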

Optionally, the apparatus further comprises a user gender matching module 380, configured to determine a user gender vector based on the user's physiological gender and shopping-propensity gender;

and a recommendation result output module 342, configured to determine the commodity recommendation result of the current period based on the commodity ranking result matrix and the user gender vector.

Optionally, the user gender matching module 380 is configured to:

determining the user gender vector based on the following formula:

where

is the physiological gender variable of user j;

is the shopping-propensity gender variable of user j;

where user_sex_j is the physiological gender descriptor of user j;

is the total number of commodity clicks made by user j;

is the total number of commodity purchases made by user j;

Correspondingly, determining the commodity recommendation result of the current period based on the commodity ranking result matrix and the user gender vector includes:

determining the commodity recommendation result based on the following formula:

where output is the set of commodity recommendation results; th_female is the threshold for commodities suitable for women; th_male is the threshold for commodities suitable for men; and other denotes all remaining cases.
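The threshold rule can be illustrated as choosing one of the three R3 vectors and returning its best-scoring commodities. The sketch below is hypothetical throughout. It assumes the user gender vector has already been reduced to a single score in which larger values lean female, that th_male < th_female, that the "other" case falls back to R3_common, and that zero-score commodities are not recommended. The function name and the top-k cut-off k are illustrative additions.

```python
from typing import Dict, List, Sequence, Tuple


def recommend(
    R3: Dict[str, Sequence[float]],
    user_score: float,
    th_female: float,
    th_male: float,
    k: int,
) -> Tuple[str, List[int]]:
    """Pick one R3 vector by threshold and return the top-k commodity indices.

    Assumptions: user_score is a scalar summary of the user gender vector in
    which larger values lean female; th_male < th_female; "other" falls back
    to R3["common"]; commodities with a zero score are not recommended.
    """
    if th_male >= th_female:
        raise ValueError("expected th_male < th_female")
    if user_score >= th_female:
        key = "female"
    elif user_score <= th_male:
        key = "male"
    else:
        key = "common"  # the "other" case
    scores = R3[key]
    ranked = sorted(range(len(scores)), key=lambda n: scores[n], reverse=True)
    return key, [n for n in ranked[:k] if scores[n] > 0]
```

A score between the two thresholds lands in the "other" branch, so a user with an ambiguous gender signal only sees commodities marked common under these assumptions.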

The device can execute the method provided by any embodiment of the invention and has the functional modules and beneficial effects corresponding to that method.

An embodiment of the invention provides a platform commodity recommendation device applied to the C2M mode, which comprises the commodity recommendation device provided by the embodiments of the invention.

An embodiment of the invention provides a merchant commodity recommendation device applied to the C2M mode, which comprises the commodity recommendation device provided by the embodiments of the invention.

Fig. 4 is a schematic structural diagram of an electronic device according to an embodiment of the present invention, and as shown in fig. 4, the electronic device includes:

one or more processors 410, one processor 410 being illustrated in FIG. 4;

a memory 420;

the apparatus may further include: an input device 430 and an output device 440.

The processor 410, the memory 420, the input device 430 and the output device 440 may be connected by a bus or by other means; Fig. 4 takes a bus connection as an example.

The memory 420, as a non-transitory computer-readable storage medium, may be used to store software programs, computer-executable programs and modules, such as the program instructions/modules corresponding to the commodity recommendation method in the embodiments of the present invention (e.g., the first determining module 310, the reinforcement learning module 320, the dual feedback loop module 330 and the second determining module 340 shown in Fig. 3). By running the software programs, instructions and modules stored in the memory 420, the processor 410 executes the various functional applications and data processing of the computer device, that is, implements the commodity recommendation method of the above method embodiment, namely:

determining the intrinsic feature matrix of the current period of the commodities, the recommended exploration commodity score matrix of the current period and the reinforcement learning parameter vector of the current period;

determining the commodity initial ranking result vector of the current period based on the intrinsic feature matrix, the recommended exploration commodity score matrix and the reinforcement learning parameter vector of the current period;

determining the commodity ranking intermediate result vector of the current period based on the commodity initial ranking result vector of the current period and the actual sales ranking result of the commodities; and

determining the commodity recommendation result of the current period based on the commodity ranking intermediate result vector of the current period.

The memory 420 may include a program storage area and a data storage area, where the program storage area may store an operating system and an application program required by at least one function, and the data storage area may store data created through use of the computer device. In addition, the memory 420 may include high-speed random access memory and may also include non-transitory memory, such as at least one magnetic disk storage device, flash memory device or other non-transitory solid-state storage device. In some embodiments, the memory 420 may optionally include memory located remotely from the processor 410, which may be connected to the terminal device via a network. Examples of such networks include, but are not limited to, the Internet, intranets, local area networks, mobile communication networks and combinations thereof.

The input device 430 may be used to receive input numeric or character information and generate key signal inputs related to user settings and function control of the computer apparatus. The output device 440 may include a display device such as a display screen.

An embodiment of the present invention provides a computer-readable storage medium on which a computer program is stored; when executed by a processor, the program implements the commodity recommendation method provided by the embodiments of the present invention, namely:

determining the intrinsic feature matrix of the current period of the commodities, the recommended exploration commodity score matrix of the current period and the reinforcement learning parameter vector of the current period;

determining the commodity initial ranking result vector of the current period based on the intrinsic feature matrix, the recommended exploration commodity score matrix and the reinforcement learning parameter vector of the current period;

determining the commodity ranking intermediate result vector of the current period based on the commodity initial ranking result vector of the current period and the actual sales ranking result of the commodities; and

determining the commodity recommendation result of the current period based on the commodity ranking intermediate result vector of the current period.

Any combination of one or more computer-readable media may be employed. The computer-readable medium may be a computer-readable signal medium or a computer-readable storage medium. A computer-readable storage medium may be, for example, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared or semiconductor system, apparatus or device, or any combination of the foregoing. More specific examples (a non-exhaustive list) of the computer-readable storage medium include: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this document, a computer-readable storage medium may be any tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus or device.

A computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.

Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.

Computer program code for carrying out operations of the present invention may be written in any combination of one or more programming languages, including object-oriented programming languages such as Java, Smalltalk, C++ or the like, and conventional procedural programming languages such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).

It is to be noted that the foregoing is only illustrative of the preferred embodiments of the present invention and the technical principles employed. It will be understood by those skilled in the art that the present invention is not limited to the particular embodiments described herein, but is capable of various obvious changes, rearrangements and substitutions as will now become apparent to those skilled in the art without departing from the scope of the invention. Therefore, although the present invention has been described in greater detail by the above embodiments, the present invention is not limited to the above embodiments, and may include other equivalent embodiments without departing from the spirit of the present invention, and the scope of the present invention is determined by the scope of the appended claims.

Claims

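1. A commodity recommendation method, comprising:

determining an intrinsic feature matrix of the current period of the commodities, a recommended exploration commodity score matrix of the current period and a reinforcement learning parameter vector of the current period;

determining a commodity initial ranking result vector of the current period based on the intrinsic feature matrix, the recommended exploration commodity score matrix and the reinforcement learning parameter vector of the current period;

determining a commodity ranking intermediate result vector of the current period based on the commodity initial ranking result vector of the current period and the actual sales ranking result of the commodities;

and determining a commodity recommendation result of the current period based on the commodity ranking intermediate result vector of the current period;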

wherein the recommended exploration commodity score matrix is determined as follows:

forming a recommended exploration commodity vector based on the recommended exploration score value of each commodity; forming a recommended exploration continuous action vector based on the recommended exploration continuous action score value of each commodity; and determining the recommended exploration commodity score matrix based on the recommended exploration commodity vector and the recommended exploration continuous action vector;

the recommended exploration score value is determined based on the following formula:

where e_en is the recommended exploration score value of commodity n; φ is the first sales threshold of the commodity; f_cvrn is the exposure conversion rate of commodity n; d_pn is the purchase quantity of commodity n; dis_n is the exploration value score of commodity n;

is the mean of the increment vector;

is the variance of the increment vector, where the increment vector diff_pur = {diff_pur_1, diff_pur_2, ..., diff_pur_n}.
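The mean and variance of the increment vector diff_pur can be computed with the standard library, as in the sketch below. It assumes that diff_pur_n is the change in purchase quantity of commodity n from the previous period to the current one and that the population variance is intended. The text defines neither point explicitly, and the function name is hypothetical.

```python
import statistics
from typing import List, Sequence, Tuple


def increment_statistics(
    purchases_now: Sequence[int], purchases_prev: Sequence[int]
) -> Tuple[List[int], float, float]:
    """Return diff_pur together with its mean and population variance.

    Assumption: diff_pur_n is the change in purchase quantity of commodity n
    between the previous period and the current one.
    """
    if len(purchases_now) != len(purchases_prev) or not purchases_now:
        raise ValueError("need two non-empty vectors of equal length")
    diff_pur = [now - prev for now, prev in zip(purchases_now, purchases_prev)]
    return diff_pur, statistics.fmean(diff_pur), statistics.pvariance(diff_pur)
```

For purchases `[5, 3, 10]` this period and `[3, 3, 6]` last period, the increments are `[2, 0, 4]`, with mean 2.0 and population variance 8/3.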

2. The method of claim 1, wherein determining a reinforcement learning parameter vector for a current cycle comprises:

determining the reinforcement learning parameter vector of the current period based on the intrinsic feature matrix of the previous period, the recommended exploration commodity score matrix of the previous period and the reinforcement learning score vector of the previous period.

3. The method of claim 2, wherein the determining the commodity recommendation result for the current cycle based on the commodity ranking intermediate result vector for the current cycle comprises:

forming a commodity sorting result matrix according to commodity applicable gender based on the commodity sorting intermediate result vector;

and determining the commodity recommendation result of the current period based on the commodity ranking result matrix and the gender of the user.

4. The method of any of claims 1-3, wherein determining the intrinsic feature matrix comprises:

aggregating all commodity exposure, commodity click and commodity order real-time message data within a period to obtain a real-time performance matrix of the commodities; and determining the intrinsic feature matrix of the commodities based on the real-time performance matrix.

5. The method of claim 4, wherein determining the intrinsic feature matrix of the commodities based on the real-time performance matrix comprises:

forming the intrinsic feature matrix of the commodities based on the click conversion rate vector, the exposure conversion rate vector, the positive review count vector and the discount index vector;

wherein the discount indices of the commodities form the discount index vector.
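Claims 4 and 5 aggregate each period's exposure, click and order messages and derive conversion-rate vectors from them. The sketch below assumes that each message has been reduced to a (commodity_id, event_type) pair. It uses the common definitions click conversion rate = orders / clicks and exposure conversion rate = orders / exposures, which the claims do not spell out. The message format and function name are hypothetical, and the positive review and discount vectors are omitted.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def conversion_rates(events: Iterable[Tuple[str, str]]) -> Dict[str, Tuple[float, float]]:
    """Aggregate one period of (commodity_id, event_type) messages.

    Returns {commodity_id: (click conversion rate, exposure conversion rate)},
    computed as orders / clicks and orders / exposures respectively.
    """
    counts = {"exposure": Counter(), "click": Counter(), "order": Counter()}
    for commodity_id, event_type in events:
        if event_type not in counts:
            raise ValueError(f"unknown event type: {event_type!r}")
        counts[event_type][commodity_id] += 1

    exposures, clicks, orders = counts["exposure"], counts["click"], counts["order"]
    rates = {}
    for cid in exposures.keys() | clicks.keys() | orders.keys():
        # Counter returns 0 for missing keys, so division is guarded explicitly.
        cvr = orders[cid] / clicks[cid] if clicks[cid] else 0.0
        ctcvr = orders[cid] / exposures[cid] if exposures[cid] else 0.0
        rates[cid] = (cvr, ctcvr)
    return rates
```

A commodity with 4 exposures, 2 clicks and 1 order therefore gets (0.5, 0.25), while a commodity that was only exposed gets (0.0, 0.0).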

6. The method of claim 1, wherein the exploration value score of commodity n is determined based on the following formula:

where β₁, β₂, γ₁ and γ₂ are the linear coefficients of the exploration value score; p_th1, p_th2 and p_th3 are the second, third and fourth sales thresholds of the commodity, respectively; ε is a preset minimum value; and the recommended exploration continuous action score value is determined based on the following formula:

where ρ is the time decay factor, with 0 < ρ < 1;

is the recommended exploration score value of commodity n in the i-th period; th_e is a preset recommended exploration score value; e_remn is the recommended exploration continuous action score value of commodity n; and T is the maximum number of periods over which a recommended exploration commodity continues to affect the score.
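The time decay factor ρ, the preset value th_e and the horizon T suggest the following mechanism: a commodity's past exploration scores keep lifting its score for up to T periods, with geometrically shrinking weight. The sketch below assumes exactly that form, Σ ρ^i · e over past scores e ≥ th_e from the last T periods, where i is the age of the score in periods. The claim's own formula may differ, so treat the function (a hypothetical name) as an illustration of time decay only.

```python
from typing import Sequence


def continuous_action_score(
    past_scores: Sequence[float], rho: float, th_e: float, T: int
) -> float:
    """Hypothetical e_rem for one commodity.

    past_scores[0] is the recommended exploration score from the most recent
    past period, past_scores[1] the one before it, and so on. Only the last T
    periods count, only scores reaching th_e contribute, and a score that is
    i periods old is discounted by rho ** i (assumed form).
    """
    if not 0 < rho < 1:
        raise ValueError("rho must lie strictly between 0 and 1")
    if T < 0:
        raise ValueError("T must be non-negative")
    total = 0.0
    for age, score in enumerate(past_scores[:T], start=1):
        if score >= th_e:
            total += rho ** age * score
    return total
```

With ρ = 0.5 and th_e = 1.5, the history `[2.0, 1.0, 4.0]` yields 1.0 for T = 2 (only the first score qualifies) and 1.5 for T = 3 (the oldest score adds 0.125 × 4.0).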

7. The method of claim 6, wherein forming a recommended exploration commodity vector based on the recommended exploration score values for each commodity comprises:

E_e = {e_e1, e_e2, ..., e_en}

where E_e is the recommended exploration commodity vector;

wherein forming the recommended exploration continuous action vector based on the recommended exploration continuous action score value of each commodity comprises:

E_rem = {e_rem1, e_rem2, ..., e_remn}

wherein determining the recommended exploration commodity score matrix based on the recommended exploration commodity vector and the recommended exploration continuous action vector comprises:

E = {E_e, E_rem}

wherein E is the recommended exploration commodity score matrix.

8. The method of claim 7, wherein the reinforcement learning score vector is determined based on the following formula:

where R is the reinforcement learning score vector;

N_m = d_c + d_p; d_c is the click count vector of the commodities; d_p is the sales quantity vector of the commodities;

and bonus1 is the reward score granted after a commodity is clicked; bonus2 is the reward score granted after a commodity is purchased.
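The score vector R combines click and purchase counts with the rewards bonus1 and bonus2, normalized by N_m = d_c + d_p. As a hypothetical illustration only, the sketch below treats R_n as the average reward per interaction: (bonus1 · d_c,n + bonus2 · d_p,n) / N_m,n, with R_n = 0 for commodities with no interactions. This exact form is an assumption, as is the function name.

```python
from typing import List, Sequence


def reinforcement_scores(
    d_c: Sequence[int], d_p: Sequence[int], bonus1: float, bonus2: float
) -> List[float]:
    """Hypothetical R: average reward per interaction for each commodity.

    Assumed form: R_n = (bonus1 * d_c[n] + bonus2 * d_p[n]) / N_m[n],
    with N_m = d_c + d_p and R_n = 0.0 when a commodity has no interactions.
    """
    if len(d_c) != len(d_p):
        raise ValueError("d_c and d_p must have one entry per commodity")
    R = []
    for clicks, purchases in zip(d_c, d_p):
        n_m = clicks + purchases  # N_m = d_c + d_p for this commodity
        reward = bonus1 * clicks + bonus2 * purchases
        R.append(reward / n_m if n_m else 0.0)
    return R
```

With bonus1 = 1 and bonus2 = 5, a commodity with 3 clicks and 1 purchase scores (3 + 5) / 4 = 2.0, and a commodity with no activity scores 0.0.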

9. The method of claim 8, wherein determining the reinforcement learning parameter vector of the current cycle based on the commodity intrinsic characteristic matrix of the previous cycle, the recommended explored commodity score matrix of the previous cycle, and the reinforcement learning score vector of the previous cycle comprises:

where

θ = {R, f_cvr, f_ctcvr, f_com, f_dis, E_e, E_rem};

L(θ) = L_ctcvr(θ) + L_comment(θ);

W = {W_R, W_F, W_E}, W_F = {W_cvr, W_ctcvr, W_com, W_dis}, W_E = {W_Ee, W_Erem};

where W_i is the weight of the i-th dimension, i.e. an element of the reinforcement learning parameter vector W; C_i is the boundary constraint of the i-th dimension; θ is the input feature vector; R is the reinforcement learning score vector; f_cvr is the click conversion rate vector of the commodities; f_ctcvr is the exposure conversion rate vector of the commodities; f_com is the positive review count vector of the commodities; f_dis is the discount index vector of the commodities; E_e is the recommended exploration commodity vector; E_rem is the recommended exploration continuous action vector; L(θ) is the multi-objective loss function; L_ctcvr(θ) is the exposure conversion loss function; and L_comment(θ) is the total review loss function;

W_R is the weight value corresponding to the reinforcement learning score vector R; W_F is the weight vector corresponding to the intrinsic feature matrix F; W_E is the weight vector corresponding to the recommended exploration commodity score matrix E; W_cvr is the weight value corresponding to the click conversion rate vector; W_ctcvr is the weight value corresponding to the exposure conversion rate vector; W_com is the weight value corresponding to the positive review count vector; and W_dis is the weight value corresponding to the discount index vector.
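Claim 9 learns W by minimizing a multi-objective loss L(θ) = L_ctcvr(θ) + L_comment(θ) subject to per-dimension bounds C_i. The specific loss terms and optimizer are not spelled out above. The sketch below therefore substitutes mean squared error for both terms and uses projected gradient descent with the box constraint |W_i| ≤ C_i. Treat it as a generic illustration of constrained multi-objective fitting, not the patented procedure. The function name and the hyperparameters lr and steps are hypothetical.

```python
from typing import List, Sequence


def fit_parameters(
    X: Sequence[Sequence[float]],
    y_ctcvr: Sequence[float],
    y_comment: Sequence[float],
    C: Sequence[float],
    lr: float = 0.1,
    steps: int = 500,
) -> List[float]:
    """Projected gradient descent on a stand-in for L(θ) = L_ctcvr + L_comment.

    Assumptions: each row of X is one commodity's feature row θ, both losses
    are mean squared errors of X·W against their targets, and the boundary
    constraint is |W_i| <= C_i.
    """
    n, d = len(X), len(C)
    if n == 0 or not (len(y_ctcvr) == len(y_comment) == n):
        raise ValueError("X, y_ctcvr and y_comment must be non-empty and aligned")
    if any(len(row) != d for row in X):
        raise ValueError("every row of X must have len(C) features")
    W = [0.0] * d
    for _ in range(steps):
        grad = [0.0] * d
        for row, t1, t2 in zip(X, y_ctcvr, y_comment):
            pred = sum(x * w for x, w in zip(row, W))
            # Derivative of (pred - t1)^2 + (pred - t2)^2 with respect to pred.
            dpred = 2.0 * (pred - t1) + 2.0 * (pred - t2)
            for i in range(d):
                grad[i] += dpred * row[i] / n
        # Gradient step, then projection onto the box [-C_i, C_i].
        W = [max(-c, min(c, w - lr * g)) for w, g, c in zip(W, grad, C)]
    return W
```

When the bound is loose (C = [10.0]), a single weight fitting targets of 1.0 converges to about 1.0. When the bound is tight (C = [0.5]), the projection pins it at 0.5. Summing the two losses with equal weight is itself an assumption; a weighted sum would follow the same pattern.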

10. The method of claim 9, wherein determining the commodity initial ranking result vector of the current period based on the intrinsic feature matrix of the current period, the recommended exploration commodity score matrix of the current period and the reinforcement learning parameter vector of the current period comprises:

R1 = concat(R, F, E) · W

11. The method of claim 3, wherein the determining a commodity ranking intermediate result vector of a current cycle based on the commodity initial ranking result vector of the current cycle and the commodity actual sales ranking result comprises:

determining a feedback gain coefficient of the commodities based on the actual sales volume sorting result of the commodities;

and determining a commodity sequencing intermediate result vector of the current period based on the feedback gain coefficient of each commodity and the initial commodity sequencing result vector.

12. The method of claim 11, wherein determining a feedback gain factor for the commodity based on the actual sales volume ranking result comprises:

where buffer_n is the feedback gain coefficient of commodity n; R1_th1 and R1_th2 are segmentation thresholds in R1; DF₁, DF₂ and DF₃ are the commodity sets segmented by sales volume; R1_n is the initial ranking score of commodity n; d_pn is the purchase quantity of commodity n;

and determining the commodity ranking intermediate result vector of the current period based on the feedback gain coefficient of each commodity and the commodity initial ranking result vector comprises:

R2 = {R2_1, R2_2, ..., R2_n}

R2_n = R1_n × buffer_n

where R2 is the commodity ranking intermediate result vector; R2_n is the intermediate ranking score of commodity n; and R1 is the commodity initial ranking result vector.

13. The method of claim 12, wherein forming a commodity ranking result matrix according to commodity applicable gender based on the commodity ranking intermediate result vector comprises:

determining a commodity ordering result matrix based on the following formula:

R3 = {R3_female, R3_male, R3_common}

where R3 is the commodity ranking result matrix;

where

p_i = 1 or p_i = -1,

p_i = 0 or p_i = -1,

and where p_i is the applicable gender variable of commodity i;

is the applicable gender descriptor of the commodity;

and male, female and common denote commodities whose applicable gender is male, female and common (both), respectively.

14. The method of claim 13, wherein determining the commodity recommendation result of the current period based on the commodity ranking result matrix and the gender of the user comprises:

determining a user gender vector based on the user's physiological gender and shopping-propensity gender;

and determining the commodity recommendation result of the current period based on the commodity ranking result matrix and the user gender vector.

15. The method of claim 14, wherein determining a user gender vector based on a user physiological gender and a user shopping propensity gender comprises:

determining the user gender vector based on the following formula:

where

is the physiological gender variable of user j;

is the shopping-propensity gender variable of user j;

where user_sex_j is the physiological gender descriptor of user j;

is the total number of commodity clicks made by user j;

is the total number of commodity purchases made by user j;

wherein the determining of the commodity recommendation result of the current cycle based on the commodity sorting result matrix and the user gender vector comprises:

determining a commodity recommendation based on the following formula:

where output is the set of commodity recommendation results; th_female is the threshold for commodities suitable for women; th_male is the threshold for commodities suitable for men; and other denotes all remaining cases.

16. A commodity recommendation device, comprising:

a second determining module, configured to determine the commodity recommendation result of the current period based on the commodity ranking intermediate result vector of the current period;

wherein the first determining module includes a recommendation exploration module configured for:

determining the recommended exploration commodity score matrix based on the recommended exploration commodity vector and the recommended exploration continuous action vector;

where e_en is the recommended exploration score value of commodity n; φ is the first sales threshold of the commodity; f_cvrn is the exposure conversion rate of commodity n; d_pn is the purchase quantity of commodity n; dis_n is the exploration value score of commodity n;

is the mean of the increment vector;

17. A platform commodity recommendation device for use in a C2M mode, wherein the platform commodity recommendation device comprises the commodity recommendation device of claim 16.

18. A merchant commodity recommendation device for use in the C2M mode, comprising the commodity recommendation device of claim 16.

19. An electronic device, comprising:

one or more processors;

a storage device for storing one or more programs,

wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method of any one of claims 1-15.

20. A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the method according to any one of claims 1 to 15.