CN112328918B

CN112328918B - Commodity sorting method, computing device and computer-readable storage medium

Info

Publication number: CN112328918B
Application number: CN202110012784.XA
Authority: CN
Inventors: 温国华; 温艳鸿
Original assignee: Zhongzhi Guanaitong Shanghai Technology Co ltd; Zhongzhi Aiyoutong Nanjing Information Technology Co ltd
Current assignee: Zhongzhi Guanaitong Shanghai Technology Co ltd; Zhongzhi Aiyoutong Nanjing Information Technology Co ltd
Priority date: 2021-01-06
Filing date: 2021-01-06
Publication date: 2021-03-23
Anticipated expiration: 2041-01-06
Also published as: CN112328918A

Abstract

The invention provides a commodity ordering method, a computing device and a computer readable storage medium. The method comprises the following steps: acquiring a commodity data set based on historical search behaviors of a user; determining a plurality of data characteristics of the commodity based on the commodity data set; training a linear regression model based on the plurality of data features to obtain a convergence parameter for the linear regression model, the convergence parameter comprising a convergence weight for each of the plurality of data features and a convergence intercept of the linear regression model; determining a score for each of a plurality of items in a user search result based on the plurality of data features and the respective convergence weights of the plurality of data features; and ranking the plurality of items based on the score for each item.

Description

Commodity sorting method, computing device and computer-readable storage medium

Technical Field

The present invention relates generally to the field of machine learning, and more particularly to a method, computing device, and computer-readable storage medium for merchandise sorting.

Background

With the rapid development of electronic commerce, online shopping has become deeply embedded in every aspect of people's lives. When a user searches for a commodity on an e-commerce platform by keyword, the display position of each commodity in the search results has an important influence on the user's shopping experience and on the likelihood of a purchase. Various methods have therefore been proposed for commodity ranking. Simple commodity ranking methods use a fixed ordering mode, such as ordering by a single factor such as commodity price, sales volume, or rating. More complex commodity ranking methods combine several ordering modes, for example by jointly considering at least two of price, sales volume, rating, and similar factors, or by also considering personal characteristics of the user.

Even so, owing to constraints such as the number of samples, the extraction of sample features, and the extraction of user features, the ranking of search results often fails to fully reflect what the user actually wants, which leads to a poor user experience and a low conversion rate.

Disclosure of Invention

In order to solve the above problems, the present invention provides a commodity ranking method in which a large amount of commodity data based on users' historical behaviors is integrated to construct a sample set that is relatively complete for feature extraction, and a linear regression model is trained and used to score search results, so that more accurate ranking results can be obtained at a lower computational cost.

According to one aspect of the invention, a method of ordering items is provided. The method comprises the following steps: acquiring a commodity data set based on historical search behaviors of a user; determining a plurality of data characteristics of the commodity based on the commodity data set; training a linear regression model based on the plurality of data features to obtain a convergence parameter for the linear regression model, the convergence parameter comprising a convergence weight for each of the plurality of data features and a convergence intercept of the linear regression model; determining a score for each of a plurality of items in a user search result based on the plurality of data features and the respective convergence weights of the plurality of data features; and ranking the plurality of items based on the score for each item.

According to another aspect of the invention, a computing device is provided. The computing device includes: at least one processor; and at least one memory coupled to the at least one processor and storing instructions for execution by the at least one processor, the instructions when executed by the at least one processor causing the computing device to perform steps according to the above-described method.

According to yet another aspect of the present invention, a computer-readable storage medium is provided, having stored thereon computer program code, which when executed performs the method as described above.

In some embodiments, the item data set includes keyword data, item data corresponding to keywords, browsing data of an item, and category data of the item, wherein the keyword data includes a user identifier, a search time, and a search keyword, the item data corresponding to keywords includes the search keyword, an item identifier, an item name, and an item number, the browsing data of the item includes the user identifier, the item identifier, and the browsing time, and the category data of the item includes the item identifier and category information to which the item belongs; and obtaining a data set of goods based on the user historical search behavior comprises: acquiring a first commodity data set based on the search keyword, the keyword data and commodity data corresponding to the keyword; integrating the first commodity data set and browsing data of the commodities based on the user identifier, the commodity identifier and the difference between the browsing time and the searching time to obtain a second commodity data set; and acquiring the commodity data set based on the commodity identifier, the second commodity data set and the commodity class data of the commodity.

In some embodiments, the merchandise data set further includes purchase data for an article, the purchase data for the article including the user identifier, the article identifier, and a time of purchase, wherein obtaining the merchandise data set based on the article identifier, the second merchandise data set, and the item class data for the article further comprises: integrating the second commodity data set and the purchase data of the commodities based on the user identifier, the commodity identifier and the difference between the purchase time and the browsing time to obtain a third commodity data set; and the obtaining the item data set based on the item identifier, the second item data set, and the item class data for the item comprises: the item data set is obtained based on the item identifier, the third item data set, and the item class data of the item.

In some embodiments, the merchandise data set further comprises user data comprising a user age and a user gender, wherein obtaining the merchandise data set based on the merchandise identifier, the second merchandise data set, and the category data of the merchandise further comprises: acquiring a fourth commodity data set based on the commodity identifier, the third commodity data set and the commodity class data of the commodity; and integrating the fourth commodity data set and the user data based on the user identifier to obtain the commodity data set.

In some embodiments, the plurality of data features includes a click rate feature of the good, a conversion rate feature of the good, a gender proportion feature of the good, and an age proportion feature of the good, and wherein determining the plurality of data features of the good based on the data set of the good includes: determining click rate characteristics of the commodities corresponding to the search keywords based on the click times of the commodities corresponding to the search keywords in the commodity data set and the times of searching the search keywords; determining conversion rate characteristics of the commodities based on the click times of the commodities corresponding to one search keyword and the purchase times of the commodities in the commodity data set; determining gender ratio characteristics of the commodities based on the number of clicks of the commodities corresponding to one search keyword in the commodity data set and the number of clicks of the commodities by users of the same gender; and determining the age ratio characteristic of the commodity based on the click times of the commodity corresponding to one search keyword in the commodity data set and the click times of the user of the specified age interval of the commodity.

In some embodiments, the age-proportion feature comprises a first age-proportion feature, a second age-proportion feature, a third age-proportion feature, and a fourth age-proportion feature, wherein determining the age-proportion feature of the item comprises: determining a first age proportion characteristic of the commodity based on the number of times that the commodity corresponding to one search keyword in the commodity data set is clicked by users in a first age interval and the number of times that the commodity is clicked by all the users; determining a second age proportion characteristic of the commodity based on the number of times that the commodity corresponding to one search keyword is clicked by users in a second age interval and the number of times that the commodity is clicked by all the users in the commodity data set, wherein the second age interval is larger than the first age interval; determining a third age proportion characteristic of the commodity based on the number of times that the commodity corresponding to one search keyword is clicked by users in a third age interval and the number of times that the commodity is clicked by all the users in the commodity data set, wherein the third age interval is larger than the second age interval; and determining a fourth age proportion characteristic of the commodity based on the number of times that the commodity corresponding to one search keyword is clicked by users in a fourth age interval and the number of times that the commodity is clicked by all the users in the commodity data set, wherein the fourth age interval is larger than the third age interval.

In some embodiments, training a linear regression model based on the plurality of data features to obtain a convergence parameter for the linear regression model comprises: setting a weight parameter of each of the plurality of data features, an intercept parameter of the linear regression model, and a learning step size of the linear regression model; determining a predicted value of the linear regression model based on the plurality of data features and the weight parameter; calculating the sum of the squares of the average errors between the predicted values and the true values as a loss function of the linear regression model; determining partial derivatives of the loss function with respect to the weight parameter for each of the data features and the intercept parameter of the linear regression model; updating the weight parameter of each data feature and the intercept parameter of the linear regression model based on the partial derivatives and the learning step size; determining whether an updated value of the weight parameter is less than a predetermined value; and if the updated value is less than the predetermined value, determining the weight parameter of each data feature as the convergence weight and determining an updated intercept parameter as the convergence intercept.

In some embodiments, the plurality of data features further includes a category click rate, and determining the plurality of data features of the commodity based on the commodity data set further includes: determining the category click rate of the commodity based on the number of times that the category of the commodity corresponding to a search keyword is clicked and the total number of times that all categories corresponding to the search keyword are clicked in the commodity data set; wherein determining a score for each of a plurality of commodities in a user search result based on the plurality of data features and the respective convergence weights of the plurality of data features further comprises: modifying the score of each commodity based on the category click rate of the commodity, and ranking the plurality of commodities based on the score of each commodity further comprises: ranking the plurality of commodities based on the modified score of each commodity.

Drawings

The invention will be better understood and other objects, details, features and advantages thereof will become more apparent from the following description of specific embodiments of the invention given with reference to the accompanying drawings.

FIG. 1 shows a schematic diagram of a system for implementing a method of ordering items according to an embodiment of the invention.

FIG. 2 illustrates a flow diagram of a method of ordering items according to some embodiments of the invention.

FIG. 3 shows a flowchart of steps for obtaining a data set of items based on a user's historical search behavior, in accordance with an embodiment of the invention.

FIG. 4 is a flowchart illustrating steps for training a linear regression model according to an embodiment of the present invention.

FIG. 5 illustrates a block diagram of a computing device suitable for implementing embodiments of the present invention.

Detailed Description

Preferred embodiments of the present invention will be described in more detail below with reference to the accompanying drawings. While the preferred embodiments of the present invention are shown in the drawings, it should be understood that the present invention may be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the invention to those skilled in the art.

In the following description, for the purposes of illustrating various inventive embodiments, certain specific details are set forth in order to provide a thorough understanding of the various inventive embodiments. One skilled in the relevant art will recognize, however, that the embodiments may be practiced without one or more of the specific details. In other instances, well-known devices, structures and techniques associated with this application may not be shown or described in detail to avoid unnecessarily obscuring the description of the embodiments.

Throughout the specification and claims, the word "comprise" and variations thereof, such as "comprises" and "comprising," are to be understood as an open, inclusive meaning, i.e., as being interpreted to mean "including, but not limited to," unless the context requires otherwise.

Reference throughout this specification to "one embodiment" or "some embodiments" means that a particular feature, structure, or characteristic described in connection with the embodiments is included in at least one embodiment. Thus, the appearances of the phrases "in one embodiment" or "in some embodiments" in various places throughout this specification are not necessarily all referring to the same embodiment. Furthermore, the particular features, structures, or characteristics may be combined in any suitable manner in one or more embodiments.

Furthermore, the terms "first", "second", and the like used in the description and the claims are used to distinguish objects for clarity, and do not limit the size, order, or other properties of the described objects.

Fig. 1 shows a schematic view of a system 1 for implementing a method of ordering goods according to an embodiment of the invention. As shown in fig. 1, the system 1 includes a user terminal 10, a computing device 20, a server 30, and a network 40. The user terminal 10, computing device 20, and server 30 may exchange data via the network 40. Here, each user terminal 10 may be a mobile or fixed terminal of an end user, such as a mobile phone, a tablet computer, or a desktop computer. The user terminal 10 may communicate with a server 30 of an e-commerce enterprise, for example through the enterprise's application or a specific search engine installed on the terminal, to send information to the server 30 and/or receive information from the server 30. The computing device 20 performs corresponding operations based on data from the user terminal 10 and/or the server 30. The computing device 20 may include at least one processor 210 and at least one memory 220 coupled to the at least one processor 210, the memory 220 storing instructions 230 executable by the at least one processor 210; the instructions 230, when executed by the at least one processor 210, perform at least a portion of the method 100 described below. Note that the computing device 20 may be part of the server 30 or may be separate from it. The specific structure of the computing device 20 or the server 30 may be, for example, as described below in connection with FIG. 5.

FIG. 2 illustrates a flow diagram of a method 100 of merchandise sorting according to some embodiments of the invention. The method 100 may be performed, for example, by the computing device 20 or the server 30 in the system 1 shown in fig. 1. The method 100 is described below in conjunction with fig. 1-5, with an example being performed in the computing device 20.

As shown in FIG. 2, method 100 includes step 110, in which computing device 20 obtains a commodity data set based on a user's historical search behavior. Here, the user's historical search behavior may include the user's commodity search behavior, browsing behavior based on search results, purchasing behavior based on the browsing behavior, and the like. The commodity data based on the user's historical search behavior refers to the commodity data involved in each of these behaviors. In a typical system design, separate data is generated for each kind of historical behavior, and each kind is stored separately in a different table or database area. Therefore, before the data can be used, it must first be selectively integrated.

In some embodiments, the commodity data set based on the user historical search behavior may include at least keyword data, commodity data corresponding to the keywords, browsing data of the commodities, and category data of the commodities.

One piece of keyword data includes a user identifier (user ID), a search time, and a search keyword. The search keyword is a keyword input by a user when the user performs a search operation.

The commodity data corresponding to one keyword includes the search keyword, a commodity identifier (commodity ID, which may be, for example, a number or code of the commodity), a commodity name, and a commodity serial number. Here, the commodity serial number refers to the natural ordinal position of each commodity in the search results generated for the search keyword.

The browsing data of one article includes a user ID, an article ID, and a browsing time.

The category data of one commodity includes a commodity ID and the category information to which the commodity belongs. In an e-commerce system, a commodity usually belongs to a tree-structured, multi-level category hierarchy in which the highest-level category is called the root category and the lowest-level category is called the final-level (leaf) category. In the present invention, the category information to which the commodity belongs refers to the final-level category of the commodity.

Based on the various data described above, the computing device 20 integrates the data in stages to produce the desired data set of the good at step 110.

FIG. 3 shows a flowchart of step 110 of obtaining a data set of items based on a user's historical search behavior, according to an embodiment of the invention.

As shown in fig. 3, step 110 may include a substep 112 in which computing device 20 obtains a first set of merchandise data based on the search keyword, the keyword data, and the merchandise data corresponding to the keyword. As previously discussed, both the keyword data and the item data corresponding to the keyword include a search keyword, and thus in sub-step 112, the computing device 20 may index the search keyword and integrate the keyword data and the item data corresponding to the keyword together to form a first item data set. Each piece of data of the first commodity data set includes a search keyword, a user ID, search time, a commodity ID, a commodity name, and a commodity number.

Next, at substep 114, computing device 20 may integrate the first commodity data set and the browsing data of the commodities based on the user ID, the commodity ID, and the difference between the browsing time and the search time to obtain a second commodity data set. As described above, both the first commodity data set and the browsing data include the user ID and the commodity ID, so the two can be integrated on those fields to generate the second commodity data set. More specifically, the commodity data generated by a user's search behavior and browsing behavior is, statistically, generally considered correlated only when the time difference between the two behaviors is within a predetermined time (a first predetermined time, for example, several minutes), and only in that case is the corresponding commodity data integrated. Conversely, if the time difference between the search behavior and the browsing behavior exceeds the predetermined time, the commodity data generated by the two behaviors is not considered correlated and is not used to train the subsequent linear regression model.
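The time-windowed integration of substep 114 can be sketched as follows. This is only a minimal illustration: the field names (`user_id`, `commodity_id`, `search_time`, `browse_time`, `position`), the five-minute window, and the assumption that a browse must follow the search are all hypothetical choices, not details given in the text.

```python
from datetime import datetime, timedelta

# Hypothetical window; the text only says "for example, several minutes".
SEARCH_TO_BROWSE_WINDOW = timedelta(minutes=5)


def join_search_and_browse(first_set, browse_rows, window=SEARCH_TO_BROWSE_WINDOW):
    """Pair each row of the first data set with browse events by the same user
    on the same commodity that occur within `window` after the search."""
    # Index browse events by (user_id, commodity_id) for fast lookup.
    browse_index = {}
    for b in browse_rows:
        browse_index.setdefault((b["user_id"], b["commodity_id"]), []).append(b)

    second_set = []
    for row in first_set:
        for b in browse_index.get((row["user_id"], row["commodity_id"]), []):
            delta = b["browse_time"] - row["search_time"]
            # Assumption: only browses that follow the search within the window count.
            if timedelta(0) <= delta <= window:
                second_set.append({**row, "browse_time": b["browse_time"]})
    return second_set


first_set = [
    {"keyword": "mug", "user_id": "u1", "search_time": datetime(2021, 1, 6, 10, 0),
     "commodity_id": "c1", "commodity_name": "Ceramic mug", "position": 1},
    {"keyword": "mug", "user_id": "u1", "search_time": datetime(2021, 1, 6, 10, 0),
     "commodity_id": "c2", "commodity_name": "Travel mug", "position": 2},
]
browse_rows = [
    {"user_id": "u1", "commodity_id": "c1", "browse_time": datetime(2021, 1, 6, 10, 2)},
    {"user_id": "u1", "commodity_id": "c2", "browse_time": datetime(2021, 1, 6, 11, 30)},
]
second_set = join_search_and_browse(first_set, browse_rows)
```

In this toy input, the browse of `c1` two minutes after the search is kept, while the browse of `c2` ninety minutes later falls outside the window and is discarded.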

In some embodiments, the commodity data set further includes purchase data of the commodities, the purchase data including a user ID, a commodity ID, and a purchase time. In this case, step 110 may include sub-step 116, in which computing device 20 integrates the second commodity data set and the purchase data of the commodities based on the user ID, the commodity ID, and the difference between the purchase time and the browsing time to obtain a third commodity data set. Similarly to the above, the commodity data generated by a user's browsing behavior and purchasing behavior is generally considered correlated only when the time difference between the two behaviors is within a predetermined time (a second predetermined time, for example, several minutes or several tens of minutes), and only in that case is the corresponding commodity data integrated. Conversely, if the time difference between the browsing behavior and the purchasing behavior exceeds the predetermined time, the commodity data generated by the two behaviors is not considered correlated and is not used to train the subsequent linear regression model.

Next, in sub-step 118, computing device 20 obtains the commodity data set based on the commodity ID, the third commodity data set, and the category data of the commodities. As described above, the category data of a commodity includes the commodity ID and the category information to which the commodity belongs. Thus, in sub-step 118, computing device 20 may integrate the third commodity data set and the category data of the commodities, indexed by the commodity ID, to form the desired commodity data set.

In some embodiments, the purchase data of the commodities may not be included in the commodity data set. In this case, step 110 may omit sub-step 116 described above, and in sub-step 118, computing device 20 obtains the commodity data set based on the commodity ID, the second commodity data set obtained in sub-step 114 (instead of the third commodity data set obtained in sub-step 116), and the category data of the commodities.

In some embodiments, the merchandise data set may also include user data including a user age and a user gender.

In this case, the merchandise data set generated in step 110 may also be integrated with user data. Specifically, the commodity data set obtained in sub-step 118 of the above-described embodiment is not the final desired commodity data set, but is an intermediate product of the final commodity data set. In the present embodiment, the commodity data set generated in the above-described substep 118 is referred to as a fourth commodity data set. Then, the fourth commodity data set and the user data are integrated based on the user ID to obtain a finally required commodity data set.
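The last two integration stages are plain key lookups: category data is attached by commodity ID and user data by user ID. A minimal sketch is given below, assuming hypothetical field names and assuming that rows with no matching category or user are dropped (the text does not say how such rows are handled).

```python
def attach_category_and_user(rows, category_by_commodity, user_by_id):
    """Add the final-level category (keyed by commodity ID) and the user's age
    and gender (keyed by user ID) to each integrated row."""
    result = []
    for row in rows:
        category = category_by_commodity.get(row["commodity_id"])
        user = user_by_id.get(row["user_id"])
        if category is None or user is None:
            # Assumption: rows without a matching category or user are dropped.
            continue
        result.append({**row, "category": category,
                       "age": user["age"], "gender": user["gender"]})
    return result


# Hypothetical rows standing in for the third commodity data set.
third_set = [
    {"keyword": "mug", "user_id": "u1", "commodity_id": "c1", "purchased": True},
    {"keyword": "mug", "user_id": "u2", "commodity_id": "c9", "purchased": False},
]
category_by_commodity = {"c1": "drinkware"}
user_by_id = {"u1": {"age": 34, "gender": "F"}, "u2": {"age": 27, "gender": "M"}}

commodity_data_set = attach_category_and_user(third_set, category_by_commodity, user_by_id)
```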

Continuing with FIG. 2, at step 120, computing device 20 may determine a plurality of data characteristics of the item based on the item data set acquired at step 110.

The plurality of data features of the good may include at least some of a click rate feature of the good, a conversion rate feature of the good, a user gender proportion feature of the good, a user age proportion feature of the good, and a textual similarity between the search keyword and the name of the good.

Specifically, the click rate characteristics of the item corresponding to a search keyword may be determined based on the number of clicks of the item corresponding to the search keyword and the number of times the search keyword is searched in the item data set. For example, the click rate characteristic of a good may be determined by the following equation (1):

$$x_1 = \frac{n_1}{n_2} \tag{1}$$

where $x_1$ denotes the click rate feature of the commodity, $n_1$ denotes the number of clicks on the commodity corresponding to a search keyword, and $n_2$ denotes the number of times the search keyword was searched; $n_1$ and $n_2$ can be obtained by counting the corresponding entries in the commodity data set.

Specifically, the conversion rate characteristic of the commodity may be determined based on the number of clicks of the commodity corresponding to one search keyword and the number of purchases of the commodity in the commodity data set. For example, the conversion characteristics of a commercial product can be determined by the following equation (2):

$$x_2 = \frac{n_3}{n_1} \tag{2}$$

where $x_2$ denotes the conversion rate feature of the commodity, $n_1$ denotes the number of clicks on the commodity corresponding to a search keyword, and $n_3$ denotes the number of purchases of the commodity; $n_1$ and $n_3$ can be obtained by counting the corresponding entries in the commodity data set.
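As a small worked example, the click rate and conversion rate features can be computed from the three counts as sketched below. The ratio forms used here (clicks per search, purchases per click) follow from the definitions of the counts and are assumptions to that extent; the function name and the handling of zero denominators are illustrative choices only.

```python
def click_and_conversion_rate(search_count, click_count, purchase_count):
    """Return (x1, x2) for one (search keyword, commodity) pair.

    Assumed forms: x1 = n1 / n2 (clicks per search) and
    x2 = n3 / n1 (purchases per click). A zero denominator yields 0.0,
    which is a choice made here and not taken from the source.
    """
    x1 = click_count / search_count if search_count else 0.0
    x2 = purchase_count / click_count if click_count else 0.0
    return x1, x2


# Hypothetical counts: keyword searched 200 times, commodity clicked 50 times, bought 5 times.
x1, x2 = click_and_conversion_rate(search_count=200, click_count=50, purchase_count=5)
```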

Specifically, the gender ratio characteristic of the commodity may be determined based on the number of times that the commodity corresponding to one search keyword is clicked by users of the same gender and the number of times that the commodity is clicked by all the users in the commodity data set. For example, the gender ratio characteristic of the merchandise may be determined by the following equation (3):

$$x_3 = \frac{n_4}{n_1} \tag{3}$$

where $x_3$ denotes the gender proportion feature of the commodity, $n_1$ denotes the number of clicks on the commodity corresponding to a search keyword, and $n_4$ denotes the number of times the commodity was clicked by users of the same gender (male or female); $n_1$ and $n_4$ can be obtained by counting the corresponding entries in the commodity data set.

Specifically, the age proportion feature of the commodity may be determined based on the number of times the commodity corresponding to one search keyword is clicked by users in a specified age interval and the number of times it is clicked by all users in the commodity data set. For example, the age proportion feature of the commodity can be determined by the following formula (4):

$$x_4 = \frac{n_5}{n_1} \tag{4}$$

where $x_4$ denotes the age proportion feature of the commodity, $n_1$ denotes the number of clicks on the commodity corresponding to a search keyword, and $n_5$ denotes the number of times the commodity was clicked by users in the specified age interval; $n_1$ and $n_5$ can be obtained by counting the corresponding entries in the commodity data set.

In one embodiment, because users in different age intervals behave differently toward a commodity, a plurality of age proportion features may be determined for different age intervals when training the linear regression model. For example, user ages may be divided into a first age interval (e.g., [0, 30]), a second age interval (e.g., [30, 40]), a third age interval (e.g., [40, 50]), and a fourth age interval (e.g., [50, z], where z is a predetermined maximum age value, e.g., 150).

Specifically, the first age proportion feature of the commodity may be determined based on the number of clicks on the commodity corresponding to one search keyword and the number of clicks on the commodity by users in the first age interval in the commodity data set. For example, the first age proportion feature may be determined by the following formula (5):

$$x_5 = \frac{n_6}{n_1} \tag{5}$$

where $x_5$ denotes the first age proportion feature, $n_1$ denotes the number of clicks on the commodity corresponding to a search keyword, and $n_6$ denotes the number of times the commodity was clicked by users in the first age interval; $n_1$ and $n_6$ can be obtained by counting the corresponding entries in the commodity data set.

Similarly, the second age proportion feature of the commodity may be determined based on the number of clicks on the commodity corresponding to one search keyword in the commodity data set and the number of clicks on the commodity by users in the second age interval. Here, as indicated above, the second age interval covers greater ages than the first age interval. For example, the second age proportion feature may be determined by the following formula (6):

$$x_6 = \frac{n_7}{n_1} \tag{6}$$

where $x_6$ denotes the second age proportion feature, $n_1$ denotes the number of clicks on the commodity corresponding to a search keyword, and $n_7$ denotes the number of times the commodity was clicked by users in the second age interval; $n_1$ and $n_7$ can be obtained by counting the corresponding entries in the commodity data set.

Similarly, the third age proportion feature of the commodity may be determined based on the number of clicks on the commodity corresponding to one search keyword in the commodity data set and the number of clicks on the commodity by users in the third age interval. Here, as described above, the third age interval covers greater ages than the second age interval. For example, the third age proportion feature may be determined by the following formula (7):

$$x_7 = \frac{n_8}{n_1} \tag{7}$$

where $x_7$ denotes the third age proportion feature, $n_1$ denotes the number of clicks on the commodity corresponding to a search keyword, and $n_8$ denotes the number of times the commodity was clicked by users in the third age interval; $n_1$ and $n_8$ can be obtained by counting the corresponding entries in the commodity data set.

Similarly, the fourth age proportion feature of the commodity may be determined based on the number of clicks on the commodity corresponding to one search keyword in the commodity data set and the number of clicks on the commodity by users in the fourth age interval. Here, as described above, the fourth age interval covers greater ages than the third age interval. For example, the fourth age proportion feature can be determined by the following formula (8):

$$x_8 = \frac{n_9}{n_1} \tag{8}$$

where $x_8$ denotes the fourth age proportion feature, $n_1$ denotes the number of clicks on the commodity corresponding to a search keyword, and $n_9$ denotes the number of times the commodity was clicked by users in the fourth age interval; $n_1$ and $n_9$ can be obtained by counting the corresponding entries in the commodity data set.
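The gender and age proportion features all share the same denominator, the total number of clicks on the commodity, so they can be computed in one pass over the click records. The sketch below assumes each click is available as a hypothetical `(gender, age)` pair and treats the example intervals as half-open ([0, 30), [30, 40), [40, 50), [50, z]) so that a boundary age falls into exactly one interval; the source intervals share endpoints and do not settle this.

```python
from bisect import bisect_right

# Lower bounds of the second, third and fourth example age intervals.
AGE_BOUNDARIES = [30, 40, 50]


def demographic_features(clicks, gender="F", boundaries=AGE_BOUNDARIES):
    """clicks: list of (gender, age) tuples, one per click on the commodity.

    Returns (gender_share, age_shares), where age_shares holds one proportion
    per age interval, i.e. the four age proportion features."""
    n1 = len(clicks)
    if n1 == 0:
        return 0.0, [0.0] * (len(boundaries) + 1)
    same_gender = sum(1 for g, _ in clicks if g == gender)
    bucket_counts = [0] * (len(boundaries) + 1)
    for _, age in clicks:
        # bisect_right places a boundary age (e.g. 30) in the higher interval.
        bucket_counts[bisect_right(boundaries, age)] += 1
    return same_gender / n1, [c / n1 for c in bucket_counts]


# Hypothetical click records for one commodity under one search keyword.
clicks = [("F", 25), ("F", 34), ("M", 41), ("F", 62)]
gender_share, age_shares = demographic_features(clicks)
```

Because every click falls into exactly one age interval under this assumption, the four age proportions sum to 1.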

The text similarity refers to the similarity between a search keyword input by a user and the product name in the product data corresponding to that keyword. It may be calculated, for example, by a similarity algorithm such as TF-IDF (term frequency-inverse document frequency) or BM25, which is not described in detail herein.
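As an illustration of one of the algorithms named above, the following Python sketch scores product names against a search keyword with Okapi BM25. The whitespace tokenization, the parameter defaults `k1` and `b`, and the sample product names are assumptions made for this example; Chinese text would need a word segmenter instead of `split()`.

```python
import math
from collections import Counter


def bm25_scores(query, names, k1=1.5, b=0.75):
    """Score each product name against the query with Okapi BM25."""
    docs = [name.lower().split() for name in names]
    n_docs = len(docs)
    avg_len = sum(len(doc) for doc in docs) / n_docs
    # Document frequency: number of product names containing each term.
    df = Counter(term for doc in docs for term in set(doc))
    scores = []
    for doc in docs:
        tf = Counter(doc)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue  # a term missing from the name contributes nothing
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(doc) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


names = ["wireless bluetooth headphones", "wired headphones", "bluetooth speaker"]
scores = bm25_scores("bluetooth headphones", names)
```

Here the first name matches both query terms and therefore receives the highest score; the resulting value can be used directly as the text similarity feature.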

Continuing with FIG. 2, next, at step 130, computing device 20 trains a linear regression model based on the plurality of data features determined at step 120 to obtain convergence parameters for the linear regression model. The convergence parameter includes respective convergence weights for the plurality of data features and a convergence intercept of the linear regression model.

A linear regression model is a classical artificial intelligence algorithm model. It is a form of regression analysis that models the relationship between one or more independent variables and a dependent variable using a least squares function called the linear regression equation. The linear regression model can be simply expressed as:

y = XW+b （9），

where y represents the output value, X represents the input value, W represents the weight of the input value, and b represents the intercept of the model.

When the input includes a plurality of values, X is an input matrix (of size N × m) composed of N m-dimensional input values, and W is a weight matrix (an m × 1 vector) composed of the weights for each dimension of the m-dimensional input values. The weight matrix W and the intercept b form the model parameters of the linear regression model, and training the linear regression model means training W and b to obtain the convergence weights and the convergence intercept.

More specifically, for each commodity in the commodity data set obtained in step 110, a plurality of data features (assumed to be m) of the commodity may be obtained in step 120, and the plurality of data features of each commodity constitute an input value, so that for a commodity data set containing N commodities, the input matrix is an N × m matrix (i.e., a matrix formed by N m-dimensional input values), and a linear regression model may be trained based on the input matrix.
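To make the shapes concrete, the following Python sketch evaluates y = XW + b for a small input matrix. The feature values, the weights, and the function name `predict` are made up for illustration and do not come from the commodity data set.

```python
def predict(X, W, b):
    """Compute y = XW + b for an N x m matrix X and an m-vector W."""
    return [sum(x_j * w_j for x_j, w_j in zip(row, W)) + b for row in X]


# N = 3 commodities, each described by m = 4 data features.
X = [
    [0.30, 0.10, 0.60, 0.25],
    [0.12, 0.05, 0.40, 0.50],
    [0.45, 0.20, 0.55, 0.10],
]
W = [1.0, 2.0, 0.5, 0.0]  # one weight per data feature
b = 0.1                   # intercept of the model

y = predict(X, W, b)      # one output value per commodity
```

Each row of X yields exactly one output value, so an N × m input matrix always produces N outputs.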

FIG. 4 shows a flowchart of the step 130 of training the linear regression model according to an embodiment of the present invention.

As shown in FIG. 4, step 130 may include a sub-step 131 in which computing device 20 sets a weight parameter for each of the plurality of data features, an intercept parameter of the linear regression model, and a learning step size of the linear regression model. As mentioned above, the plurality of data features obtained in step 120 include at least some of the click rate characteristic x₁ of the commodity, the conversion rate characteristic x₂ of the commodity, the gender ratio characteristic x₃ of the commodity, the age ratio characteristic x₄ of the commodity, and the text similarity. More specifically, the age ratio characteristic may include one or more of a first age ratio characteristic x₅, a second age ratio characteristic x₆, a third age ratio characteristic x₇, and a fourth age ratio characteristic x₈; the following description uses a single age ratio characteristic x₄ as an example. It is assumed here that the click rate characteristic x₁, the conversion rate characteristic x₂, the gender ratio characteristic x₃, and the age ratio characteristic x₄ are selected in step 130 for training the linear regression model.

Initially, computing device 20 may set initial weight parameters w₁, w₂, w₃, w₄ for these data features, for example all set to 0. The intercept parameter b is set to 0, for example. The learning step size α determines the convergence speed of the model parameters and is set, for example, to α = 0.01. Those skilled in the art will understand that the data features used in training the linear regression model in step 130 may be a subset of the data features obtained in step 120 rather than all of them.

Next, at sub-step 132, computing device 20 may determine a predicted value y_i' of the linear regression model based on the plurality of data features x₁, x₂, x₃, x₄ and the weight parameters w₁, w₂, w₃, w₄:

$$y_i' = \sum_{j=1}^{m} w_j x_{ij} + b \tag{10}$$

where x_ij is the j-th data feature of the i-th input value x_i (the i-th commodity) of the input matrix X described above (here it is assumed that 4 data features are used, i.e., m = 4), w_j represents the weight parameter of the j-th data feature, b is the intercept parameter, and i = 1, 2, …, N.

Next, in sub-step 133, the sum of the squares of the average errors between the predicted values y_i' determined in sub-step 132 and the true values y_i is calculated as the loss function of the linear regression model.

The loss function can be expressed as:

（11），

where the true value y_i = log_n(i + q),

i is the sequence number of the input value, n is the base of the logarithm, q is an additional offset, and n and q are empirically chosen parameters; for example, both n and q take the value 2.
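The true value depends only on the position i of the input value, so the training labels can be generated directly. A minimal Python sketch, assuming that i counts the input values from 1 and using the example values n = 2 and q = 2 (the function name `true_value` is hypothetical):

```python
import math


def true_value(i, n=2, q=2):
    """True value y_i = log_n(i + q) for the i-th input value (i >= 1)."""
    return math.log(i + q, n)


# Labels for the first four input values: log2(3), log2(4), log2(5), log2(6).
labels = [true_value(i) for i in range(1, 5)]
```

With these parameters the labels grow slowly and monotonically with i, which keeps the target values of neighbouring input values close together.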

Next, in sub-step 134, the partial derivatives of the loss function loss with respect to the weight parameter w_j of each data feature and the intercept parameter b of the linear regression model are determined:

（12），

where j = 1, 2, …, m, and

（13）。
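For orientation, one common form of such a loss and its partial derivatives is given below. It assumes that the loss is the mean of the squared errors over the N input values; the 1/N normalization is an assumption, and a different constant factor would only rescale the derivatives.

```latex
\begin{aligned}
\mathit{loss} &= \frac{1}{N}\sum_{i=1}^{N}\left(y_i' - y_i\right)^2,
  \qquad y_i' = \sum_{j=1}^{m} w_j x_{ij} + b \\
\frac{\partial\,\mathit{loss}}{\partial w_j}
  &= \frac{2}{N}\sum_{i=1}^{N}\left(y_i' - y_i\right)x_{ij},
  \qquad j = 1, 2, \dots, m \\
\frac{\partial\,\mathit{loss}}{\partial b}
  &= \frac{2}{N}\sum_{i=1}^{N}\left(y_i' - y_i\right)
\end{aligned}
```

Both derivatives follow from the chain rule, since the predicted value y_i' depends on w_j through the factor x_ij and on b through the factor 1.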

Next, in sub-step 135, the weight parameter w_j of each data feature and the intercept parameter b of the linear regression model may be updated based on the partial derivatives obtained in sub-step 134 and the learning step size α. Specifically, the updated weight parameter w_j' and intercept parameter b' may be determined as follows:

$$w_j' = w_j - \alpha \frac{\partial\,\mathit{loss}}{\partial w_j} \tag{14}$$

$$b' = b - \alpha \frac{\partial\,\mathit{loss}}{\partial b} \tag{15}$$

In sub-step 136, it is determined whether the update amount of the weight parameter, i.e., w_j' − w_j, is less than a predetermined value. Here, the predetermined value is a threshold for judging whether the weight parameter has converged, and may be set on the order of 0.01 based on experience.

If it is determined in sub-step 136 that the update amount is less than the predetermined value, that is, the weight parameters of the linear regression model have converged, then in sub-step 137 the weight parameter w_j' of each data feature at that time is determined as the convergence weight of the linear regression model, and the updated intercept parameter b' is determined as the convergence intercept.

On the other hand, if it is determined in sub-step 136 that the update amount is greater than or equal to the predetermined value, i.e., the weight parameters of the linear regression model have not converged, then step 130 may repeat the above sub-steps 131 to 136 with the weight parameter w_j' and the intercept parameter b' at this time.
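Sub-steps 131 to 137 amount to a gradient descent loop. The following Python sketch shows one way to arrange it under stated assumptions: the loss is taken as the mean squared error with a 1/N factor, the convergence test compares the absolute change of every weight with the threshold, and `max_iter` is a hypothetical safety cap that the description does not mention. The feature values are invented for the example.

```python
import math


def train_linear_regression(X, y, alpha=0.01, tol=0.01, max_iter=10_000):
    """Gradient descent loop mirroring sub-steps 131 to 137 (MSE loss assumed)."""
    n_samples, m = len(X), len(X[0])
    w, b = [0.0] * m, 0.0  # sub-step 131: initial weights and intercept
    for _ in range(max_iter):
        # Sub-step 132: predicted value for every input value.
        preds = [sum(wj * xj for wj, xj in zip(w, row)) + b for row in X]
        errors = [p - t for p, t in zip(preds, y)]
        # Sub-step 134: partial derivatives of (1/N) * sum(error^2).
        grad_w = [
            2 / n_samples * sum(e * row[j] for e, row in zip(errors, X))
            for j in range(m)
        ]
        grad_b = 2 / n_samples * sum(errors)
        # Sub-step 135: step against the gradient, scaled by alpha.
        new_w = [wj - alpha * g for wj, g in zip(w, grad_w)]
        new_b = b - alpha * grad_b
        # Sub-step 136: every weight changed by less than the threshold?
        converged = all(abs(nw - wj) < tol for nw, wj in zip(new_w, w))
        w, b = new_w, new_b
        if converged:
            break  # sub-step 137: keep w and b as the convergence parameters
    return w, b


X = [
    [0.30, 0.10, 0.60, 0.25],
    [0.12, 0.05, 0.40, 0.50],
    [0.45, 0.20, 0.55, 0.10],
    [0.08, 0.02, 0.35, 0.40],
]
y = [math.log(i + 2, 2) for i in range(1, len(X) + 1)]  # true values, n = q = 2
w, b = train_linear_regression(X, y)
```

With α = 0.01 and a threshold of 0.01, the loop stops as soon as every partial derivative with respect to a weight has magnitude below 1, so a smaller threshold gives a tighter fit at the cost of more iterations.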

Continuing with FIG. 2, at step 140, computing device 20 may determine a score for each of the plurality of items in the user search results based on the plurality of data features obtained at step 120 and the respective convergence weights of those data features obtained at step 130. At step 150, the items are ranked based on the score for each item, e.g., by ranking the scores from high to low.

Specifically, for example, the plurality of data features of each item in the user search results may be weighted by their respective convergence weights and summed to obtain the score s of the item:

$$s = \sum_{i=1}^{m} w_i x_i + b \tag{16}$$

where x_i represents the i-th data feature of the commodity, w_i represents the convergence weight of the i-th data feature, and b represents the convergence intercept.
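Steps 140 and 150 can be sketched in a few lines of Python. The item identifiers, feature values, weights, and function names below are hypothetical; in practice the weights and intercept would be the convergence parameters obtained in step 130.

```python
def score(features, weights, intercept):
    """Weighted sum of the data features plus the intercept."""
    return sum(w * x for w, x in zip(weights, features)) + intercept


def rank_items(items, weights, intercept):
    """Return item ids ordered from highest to lowest score, plus the scores."""
    scores = {
        item_id: score(features, weights, intercept)
        for item_id, features in items.items()
    }
    order = sorted(scores, key=scores.get, reverse=True)
    return order, scores


weights = [2.0, 1.0, 0.5, 0.5]  # stand-ins for the convergence weights
intercept = 0.1                 # stand-in for the convergence intercept
items = {
    "A": [0.30, 0.10, 0.60, 0.25],
    "B": [0.12, 0.05, 0.40, 0.50],
    "C": [0.45, 0.20, 0.55, 0.10],
}
order, scores = rank_items(items, weights, intercept)
```

Because the intercept is added to every score alike, it shifts all scores by the same amount and leaves the resulting order unchanged.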

In some cases, the category of a commodity has a large impact on the search results. For example, where the same search term corresponds to a plurality of commodities belonging to a plurality of categories, how often the commodities in the different categories are clicked is also significant for the ordering of the commodity list finally presented to the user. To this end, in some embodiments, a category click rate may also be determined as a data feature in step 120.

Specifically, in step 120, computing device 20 may determine the category click rate of a commodity based on the number of times that the commodity category corresponding to one search keyword is clicked and the total number of times that all commodities corresponding to the search keyword are clicked in the commodity data set. For example, the category click rate of a commodity may be determined by the following formula (17):

$$x_9 = \frac{n_{10}}{n_{11}} \tag{17}$$

wherein x₉ indicates the category click rate characteristic of the commodity, n₁₀ indicates the number of times that a commodity category corresponding to one search keyword is clicked, and n₁₁ indicates the total number of times that all commodities corresponding to the search keyword are clicked. n₁₀ and n₁₁ can be obtained by counting the corresponding data in the commodity data set.

In this case, in step 140 described above, the score of each item may be corrected based on the category click rate obtained for the item, and in step 150 the items may be sorted based on the corrected score of each item.

Specifically, the corrected score s' of each commodity can be obtained according to the following formula (18):

（18）。
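The following Python sketch illustrates the idea of the correction. The category click rate is computed as the share of a keyword's clicks that falls on the commodity's category. The correction itself, a multiplicative boost s·(1 + x₉), is purely a hypothetical stand-in chosen for illustration and is not taken from formula (18); the category names, scores, and function names are likewise invented.

```python
def category_click_rate(category_clicks, category):
    """Share of all clicks for one search keyword that fall on a category."""
    total = sum(category_clicks.values())  # clicks on all matching commodities
    return category_clicks.get(category, 0) / total if total else 0.0


def corrected_score(s, x9):
    # Hypothetical correction used only as a stand-in for illustration:
    # boost the score in proportion to the category click rate.
    return s * (1.0 + x9)


# Clicks per category for one search keyword (made-up numbers).
category_clicks = {"phones": 60, "cases": 30, "chargers": 10}
# Item id -> (uncorrected score s, category of the item).
items = {"A": (1.0, "phones"), "B": (1.2, "cases"), "C": (1.3, "chargers")}

corrected = {
    item_id: corrected_score(s, category_click_rate(category_clicks, cat))
    for item_id, (s, cat) in items.items()
}
order = sorted(corrected, key=corrected.get, reverse=True)
```

In this example item A has the lowest uncorrected score but belongs to the most-clicked category, so the stand-in correction moves it to the top of the list.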

FIG. 5 illustrates a block diagram of a computing device 500 suitable for implementing embodiments of the present invention. Computing device 500 may be, for example, computing device 20 or server 30 as described above.

As shown in fig. 5, computing device 500 may include one or more Central Processing Units (CPUs) 510 (only one shown schematically) that may perform various appropriate actions and processes in accordance with computer program instructions stored in Read Only Memory (ROM) 520 or loaded from storage unit 580 into Random Access Memory (RAM) 530. In the RAM 530, various programs and data required for the operation of the computing device 500 may also be stored. The CPU 510, ROM 520, and RAM 530 are connected to each other by a bus 540. An input/output (I/O) interface 550 is also connected to bus 540.

A number of components in computing device 500 are connected to I/O interface 550, including: an input unit 560 such as a keyboard, a mouse, etc.; an output unit 570 such as various types of displays, speakers, and the like; a storage unit 580 such as a magnetic disk, an optical disk, or the like; and a communication unit 590 such as a network card, a modem, a wireless communication transceiver, etc. The communication unit 590 allows the computing device 500 to exchange information/data with other devices via a computer network such as the internet and/or various telecommunication networks.

The method 100 described above may be performed, for example, by the CPU 510 of a computing device 500, such as computing device 20 or server 30. For example, in some embodiments, the method 100 may be implemented as a computer software program tangibly embodied in a machine-readable medium, such as the storage unit 580. In some embodiments, part or all of the computer program may be loaded and/or installed onto computing device 500 via ROM 520 and/or communications unit 590. When the computer program is loaded into RAM 530 and executed by CPU 510, one or more of the operations of method 100 described above may be performed. Further, the communication unit 590 may support wired or wireless communication functions.

Those skilled in the art will appreciate that the computing device 500 illustrated in FIG. 5 is merely illustrative. In some embodiments, computing device 20 or server 30 may contain more or fewer components than computing device 500.

The method 100 for ordering items and the computing device 500 that may be used as the computing device 20 or the server 30 in accordance with the present invention are described above in connection with the figures. However, it will be appreciated by those skilled in the art that the performance of the steps of the method 100 is not limited to the order shown in the figures and described above, but may be performed in any other reasonable order. Further, the computing device 500 also need not include all of the components shown in FIG. 5, it may include only some of the components necessary to perform the functions described in the present disclosure, and the manner in which these components are connected is not limited to the form shown in the figures.

The present invention may be methods, apparatus, systems and/or computer program products. The computer program product may include a computer-readable storage medium having computer-readable program instructions embodied therein for carrying out aspects of the present invention.

In one or more exemplary designs, the functions described herein may be implemented in hardware, software, firmware, or any combination thereof. For example, if implemented in software, the functions may be stored on or transmitted over as one or more instructions or code on a computer-readable medium.

The units of the apparatus disclosed herein may be implemented using discrete hardware components, or may be integrally implemented on a single hardware component, such as a processor. For example, the various illustrative logical blocks, modules, and circuits described in connection with the invention may be implemented or performed with a general purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, discrete gate or transistor logic, discrete hardware components, or any combination thereof designed to perform the functions described herein.

Those of skill would further appreciate that the various illustrative logical blocks, modules, circuits, and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or combinations of both.

The previous description of the invention is provided to enable any person skilled in the art to make or use the invention. Various modifications to the disclosure will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other variations without departing from the spirit or scope of the disclosure. Thus, the present invention is not intended to be limited to the examples and designs described herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims

1. A method of ordering items, comprising:

acquiring a commodity data set based on historical search behaviors of a user, wherein the commodity data set comprises keyword data, commodity data corresponding to the keywords, browsing data of commodities and category data of the commodities;

determining a plurality of data characteristics of the commodity based on the commodity data set;

training a linear regression model based on the plurality of data features to obtain a convergence parameter for the linear regression model, the convergence parameter comprising a convergence weight for each of the plurality of data features and a convergence intercept of the linear regression model;

determining a score for each of a plurality of items in a user search result based on the plurality of data features and the respective convergence weights of the plurality of data features; and

ranking the plurality of items based on the score for each item;

wherein training a linear regression model based on the plurality of data features to obtain a convergence parameter for the linear regression model comprises:

setting a weight parameter of each of the plurality of data features, an intercept parameter of the linear regression model, and a learning step size of the linear regression model;

determining a predicted value of the linear regression model based on the plurality of data features and the weight parameter;

calculating the sum of the squares of the average errors between the predicted values and the true values as a loss function of the linear regression model;

determining partial derivatives of the loss function with respect to the weight parameter for each of the data features and the intercept parameter of the linear regression model;

updating the weight parameter of each data feature and the intercept parameter of the linear regression model based on the partial derivatives and the learning step size;

determining whether an updated value of the weight parameter is less than a predetermined value; and

if the updated value is less than the predetermined value, determining a weight parameter for each of the data features as the convergence weight and determining an updated intercept parameter as the convergence intercept.

2. The method of claim 1, wherein

the keyword data includes a user identifier, a search time and a search keyword,

the commodity data corresponding to the keywords comprises the search keywords, commodity identifiers, commodity names and commodity serial numbers,

the browsing data of the article includes the user identifier, the article identifier, and a browsing time,

the category data of the commodity comprises the commodity identifier and information on the category to which the commodity belongs;

and obtaining a data set of goods based on the user historical search behavior comprises:

acquiring a first commodity data set based on the search keyword, the keyword data and commodity data corresponding to the keyword;

integrating the first commodity data set and browsing data of the commodities based on the user identifier, the commodity identifier and the difference between the browsing time and the searching time to obtain a second commodity data set; and

obtaining the commodity data set based on the commodity identifier, the second commodity data set, and the category data of the commodity.

3. The method of claim 2, wherein the merchandise data set further comprises purchase data for merchandise, the purchase data for merchandise comprising the user identifier, the merchandise identifier, and a time of purchase, wherein obtaining the merchandise data set based on the merchandise identifier, the second merchandise data set, and the category data for the merchandise further comprises:

integrating the second commodity data set and the purchase data of the commodities based on the user identifier, the commodity identifier and the difference between the purchase time and the browsing time to obtain a third commodity data set; and the obtaining the item data set based on the item identifier, the second item data set, and the item class data for the item comprises:

obtaining the commodity data set based on the commodity identifier, the third commodity data set, and the category data of the commodity.

4. The method of claim 3, wherein the merchandise data set further comprises user data comprising a user age and a user gender, wherein obtaining the merchandise data set based on the merchandise identifier, the second merchandise data set, and the item data for the merchandise further comprises:

acquiring a fourth commodity data set based on the commodity identifier, the third commodity data set and the commodity class data of the commodity; and

integrating the fourth commodity data set and the user data based on the user identifier to obtain the commodity data set.

5. The method of claim 1, wherein the plurality of data features includes at least some of a click-through rate feature of the good, a conversion rate feature of the good, a gender ratio feature of the good, an age ratio feature of the good, and a textual similarity between the search keyword and the name of the good, and wherein determining the plurality of data features of the good based on the data set of the good includes:

determining click rate characteristics of the commodities corresponding to the search keywords based on the click times of the commodities corresponding to the search keywords in the commodity data set and the times of searching the search keywords;

determining conversion rate characteristics of the commodities based on the click times of the commodities corresponding to one search keyword and the purchase times of the commodities in the commodity data set;

determining gender ratio characteristics of the commodities based on the number of clicks of the commodities corresponding to one search keyword in the commodity data set and the number of clicks of the commodities by users of the same gender;

determining the age ratio characteristic of the commodity based on the click times of the commodity corresponding to one search keyword in the commodity data set and the click times of the user of the specified age interval of the commodity; and

determining the text similarity based on the commodity name in the commodity data set, a search keyword and the commodity data corresponding to the search keyword.

6. The method of claim 5, wherein the age-proportion feature comprises a first age-proportion feature, a second age-proportion feature, a third age-proportion feature, and a fourth age-proportion feature, wherein determining the age-proportion feature of the item comprises:

determining a first age proportion characteristic of the commodity based on the number of times that the commodity corresponding to one search keyword in the commodity data set is clicked by users in a first age interval and the number of times that the commodity is clicked by all the users;

determining a second age proportion characteristic of the commodity based on the number of times that the commodity corresponding to one search keyword is clicked by users in a second age interval and the number of times that the commodity is clicked by all the users in the commodity data set, wherein the second age interval is larger than the first age interval;

determining a third age proportion characteristic of the commodity based on the number of times that the commodity corresponding to one search keyword is clicked by users in a third age interval and the number of times that the commodity is clicked by all the users in the commodity data set, wherein the third age interval is larger than the second age interval; and

determining a fourth age proportion characteristic of the commodity based on the number of times that the commodity corresponding to one search keyword in the commodity data set is clicked by users in a fourth age interval and the number of times that the commodity is clicked by all the users, wherein the fourth age interval is larger than the third age interval.

7. The method of claim 1, wherein the plurality of data features further comprises a category click rate, and determining a plurality of data features for a good based on the good data set further comprises:

determining the category click rate of the commodity based on the number of times that the commodity category corresponding to a search keyword is clicked and the total number of times that all commodities corresponding to the search keyword are clicked in the commodity data set;

wherein determining a score for each of a plurality of items in a user search result based on the plurality of data features and the respective convergence weights of the plurality of data features further comprises:

modifying a score for each item based on the category click rate for the item, and ranking the plurality of items based on the score for each item further comprises:

ranking the plurality of items based on the revised score for each item.

8. A computing device, comprising:

at least one processor; and

at least one memory coupled to the at least one processor and storing instructions for execution by the at least one processor, the instructions when executed by the at least one processor causing the computing device to perform the steps of the method of any of claims 1-7.

9. A computer-readable storage medium having stored thereon computer program code which, when executed, performs the method of any of claims 1 to 7.