CN108960719B - Method and device for selecting products and computer readable storage medium - Google Patents


Info

Publication number
CN108960719B
Authority
CN
China
Prior art keywords
classification
feature
commodity
online
data
Prior art date
Legal status
Active
Application number
CN201810693419.8A
Other languages
Chinese (zh)
Other versions
CN108960719A (en)
Inventor
李建星
Current Assignee
Beijing Jingdong Century Trading Co Ltd
Beijing Jingdong Shangke Information Technology Co Ltd
Original Assignee
Beijing Jingdong Century Trading Co Ltd
Beijing Jingdong Shangke Information Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing Jingdong Century Trading Co Ltd and Beijing Jingdong Shangke Information Technology Co Ltd
Priority to CN201810693419.8A
Publication of CN108960719A
Application granted
Publication of CN108960719B
Legal status: Active
Anticipated expiration


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00Administration; Management
    • G06Q10/08Logistics, e.g. warehousing, loading or distribution; Inventory or stock management
    • G06Q10/087Inventory or stock management, e.g. order filling, procurement or balancing against orders
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00Commerce
    • G06Q30/02Marketing; Price estimation or determination; Fundraising
    • G06Q30/0201Market modelling; Market analysis; Collecting market data
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00Commerce
    • G06Q30/02Marketing; Price estimation or determination; Fundraising
    • G06Q30/0201Market modelling; Market analysis; Collecting market data
    • G06Q30/0203Market surveys; Market polls

Abstract

The invention discloses a product selection method and device and a computer-readable storage medium, and relates to the field of data processing. The product selection method comprises the following steps: determining a feature value of a candidate commodity according to the online commodity data of the candidate commodity and the user behavior data of online users whose offline distance from the store to be selected is smaller than a preset value; inputting the feature value of the candidate commodity into a classification model, wherein the classification model is trained by taking the feature values of training sample data as input and taking sales information as label values; and determining whether to select the candidate commodity for the store to be selected according to the classification result of the classification model. In this way, the commodities to be sold in an offline store can be quickly determined from a large number of online commodities, improving both the efficiency and the accuracy of product selection.

Description

Method and device for selecting products and computer readable storage medium
Technical Field
The present invention relates to the field of data processing, and in particular, to a method and an apparatus for selecting a product, and a computer-readable storage medium.
Background
With the diversification of consumer demands, e-commerce is shifting from purely online sales to online-offline convergence, which is gradually becoming a mainstream form of e-commerce. Offline stores are mainly supplied through the online platform, so specific commodities need to be selected from a very large number of commodities.
At present, commodity selection mainly relies on manual selection by procurement and sales staff: based on their business experience, supplier selection, commodity review and the like are carried out so as to investigate all aspects of the commodities on the platform and to pick commodities of the appropriate categories that fit the store's positioning.
Disclosure of Invention
The inventor's analysis shows that, because a large e-commerce platform currently carries millions of Stock Keeping Units (SKUs), the workload of manual selection is huge. In addition, because manual selection is subject to subjective bias and lacks comprehensive evaluation indexes, it is error-prone, which in turn leads to poor sales performance.
One technical problem to be solved by the embodiments of the invention is: how to improve the efficiency and accuracy of product selection for offline stores.
According to a first aspect of some embodiments of the present invention, there is provided a product selection method, comprising: determining a feature value of a candidate commodity according to the online commodity data of the candidate commodity and the user behavior data of online users whose offline distance from the store to be selected is smaller than a preset value; inputting the feature value of the candidate commodity into a classification model, wherein the classification model is trained by taking the feature values of training sample data as input and taking sales information as label values; and determining whether to select the candidate commodity for the store to be selected according to the classification result of the classification model.
In some embodiments, the method of selecting further comprises: acquiring online commodity data and user behavior data for training; performing feature fusion on the acquired online commodity data and the user behavior data according to the commodity identification to generate training sample data, wherein the training sample data comprises a feature value and sales volume information corresponding to each commodity identification; and training the classification model by using the characteristic value of the training sample data as input and using corresponding sales information as a mark value.
In some embodiments, the classification model is a classification decision tree, the classification nodes in the classification decision tree are features of the commodity, and the leaf nodes represent classification results.
In some embodiments, the method of selecting further comprises: acquiring online commodity data and user behavior data for training; performing feature fusion on the acquired online commodity data and the user behavior data according to the commodity identification to generate training sample data, wherein the training sample data comprises a feature value and sales volume information corresponding to each commodity identification; training a classification model, comprising: calculating the information gain of the features which are not added into the classification decision tree according to the training sample data corresponding to the positions of the nodes to be added; adding the characteristics with the maximum information gain to the positions of the nodes to be added as classification nodes; and repeating the steps of calculating information gain and adding the features until all the features are added into the classification decision tree.
In some embodiments, the method of selecting further comprises: acquiring online commodity data and user behavior data for training; performing feature fusion on the acquired online commodity data and the user behavior data according to the commodity identification to generate training sample data, wherein the training sample data comprises a feature value and sales volume information corresponding to each commodity identification; training a classification model, comprising: calculating the information gain of the features which are not added into the classification decision tree according to the training sample data corresponding to the positions of the nodes to be added; adding leaf nodes representing classification results to the positions of the nodes to be added under the condition that the information gain of each feature which is not added into the classification decision tree is smaller than a preset information gain threshold, wherein the classification results are determined according to the mark values corresponding to the positions of the nodes to be added; under the condition that the information gains of the features which are not added into the classification decision tree are not smaller than a preset information gain threshold, adding the features with the maximum information gains to the positions of the nodes to be added to serve as classification nodes; and repeating the steps of calculating information gain and adding the features until all the features are added into the classification decision tree.
In some embodiments, the method of selecting further comprises: inputting the characteristic value of the tested commodity into a classification decision tree; determining the prediction accuracy of the classification decision tree according to the classification result of the classification decision tree and the marking value of the test commodity; in response to the prediction accuracy being below a preset accuracy threshold, retraining the classification decision tree after performing at least one of: modifying the information gain threshold; updating the characteristics; and updating training sample data.
In some embodiments, the method of selecting further comprises: obtaining a plurality of training sample data subsets, wherein commodities corresponding to each training sample data subset belong to the same category; respectively training a classification model by adopting each training sample data subset to obtain a plurality of sub-classification models; and determining the weight of each sub-classification model according to the prediction accuracy of each sub-classification model so as to determine whether to select the alternative commodity for the store to be selected according to the weighted calculation result of the classification result of each sub-classification model.
In some embodiments, the selection method further comprises performing at least one of the following feature screening methods: for any two different features whose correlation with each other is greater than a preset value, screening out one of them; and screening out features whose correlation with the sales information is smaller than a preset value.
In some embodiments, the online merchandise data includes at least one of: an online purchase feature, an online browsing feature, an online search feature, an online add-to-shopping-cart feature, an online brand sales feature, an online review feature, an online basic attribute feature, and an online purchase feature of users within a preset offline range; the user behavior data includes at least one of: a user location feature, a user purchase feature, a user browsing feature, a user search feature, a user add-to-shopping-cart feature, a user brand purchase feature, and a user basic attribute feature.
According to a second aspect of some embodiments of the present invention, there is provided a product selection apparatus comprising: a candidate commodity feature value determining module configured to determine a feature value of a candidate commodity according to the online commodity data of the candidate commodity and the user behavior data of online users whose offline distance from the store to be selected is smaller than a preset value; a candidate commodity feature value input module configured to input the feature value of the candidate commodity into a classification model, wherein the classification model is trained by taking the feature values of training sample data as input and taking sales information as label values; and a selection determining module configured to determine whether to select the candidate commodity for the store to be selected according to the classification result of the classification model.
In some embodiments, the selection device further comprises: a classification model training module configured to acquire online commodity data and user behavior data for training; performing feature fusion on the acquired online commodity data and the user behavior data according to the commodity identification to generate training sample data, wherein the training sample data comprises a feature value and sales volume information corresponding to each commodity identification; and training the classification model by using the characteristic value of the training sample data as input and using corresponding sales information as a mark value.
In some embodiments, the classification model is a classification decision tree, the classification nodes in the classification decision tree are features of the commodity, and the leaf nodes represent classification results.
In some embodiments, the selection device further comprises: a classification decision tree training module configured to acquire online commodity data and user behavior data for training; performing feature fusion on the acquired online commodity data and the user behavior data according to the commodity identification to generate training sample data, wherein the training sample data comprises a feature value and sales volume information corresponding to each commodity identification; training a classification model, comprising: and calculating the information gain of the features which are not added into the classification decision tree according to the training sample data corresponding to the positions of the nodes to be added, adding the features with the maximum information gain to the positions of the nodes to be added to serve as classification nodes, and repeating the steps of calculating the information gain and adding the features until all the features are added into the classification decision tree.
In some embodiments, the selection device further comprises: a classification decision tree training module configured to acquire online commodity data and user behavior data for training; performing feature fusion on the acquired online commodity data and the user behavior data according to the commodity identification to generate training sample data, wherein the training sample data comprises a feature value and sales volume information corresponding to each commodity identification; training a classification model, comprising: calculating information gains of the features which are not added into the classification decision tree according to training sample data corresponding to the positions of the nodes to be added, adding leaf nodes representing classification results to the positions of the nodes to be added under the condition that the information gains of the features which are not added into the classification decision tree are all smaller than a preset information gain threshold, adding the features with the maximum information gains to the positions of the nodes to be added as classification nodes under the condition that the information gains of the features which are not added into the classification decision tree are not all smaller than the preset information gain threshold, and repeating the steps of calculating the information gains and adding the features until all the features are added into the classification decision tree; and determining the classification result according to the mark value corresponding to the position of the node to be added.
In some embodiments, the selection device further comprises: the classification model test adjustment module is configured to input the characteristic value of the test commodity into a classification decision tree; determining the prediction accuracy of the classification decision tree according to the classification result of the classification decision tree and the marking value of the test commodity; in response to the prediction accuracy being below a preset accuracy threshold, retraining the classification decision tree after performing at least one of: modifying the information gain threshold; updating the characteristics; and updating training sample data.
In some embodiments, the selection device further comprises: the multi-model training module is configured to acquire a plurality of training sample data subsets, wherein commodities corresponding to each training sample data subset belong to the same category; respectively training a classification model by adopting each training sample data subset to obtain a plurality of sub-classification models; and determining the weight of each sub-classification model according to the prediction accuracy of each sub-classification model so as to determine whether to select the alternative commodity for the store to be selected according to the weighted calculation result of the classification result of each sub-classification model.
In some embodiments, the selection device further comprises: a feature screening module configured to perform at least one of the following feature screening methods: for any two different features whose correlation with each other is greater than a preset value, screening out one of them; and screening out features whose correlation with the sales information is smaller than a preset value.
In some embodiments, the online merchandise data includes at least one of: an online purchase feature, an online browsing feature, an online search feature, an online add-to-shopping-cart feature, an online brand sales feature, an online review feature, an online basic attribute feature, and an online purchase feature of users within a preset offline range; the user behavior data includes at least one of: a user location feature, a user purchase feature, a user browsing feature, a user search feature, a user add-to-shopping-cart feature, a user brand purchase feature, and a user basic attribute feature.
According to a third aspect of some embodiments of the present invention, there is provided a product selection apparatus comprising: a memory; and a processor coupled to the memory, the processor configured to perform any of the aforementioned product selection methods based on instructions stored in the memory.
According to a fourth aspect of some embodiments of the present invention, there is provided a computer-readable storage medium having a computer program stored thereon, wherein the program, when executed by a processor, implements any one of the aforementioned methods of selecting.
Some embodiments of the above invention have the following advantages or benefits: according to the embodiment of the invention, the characteristic value of the candidate commodity can be determined through the online commodity data and the data of the users around the offline candidate store, and the candidate commodity is classified by adopting the pre-trained classification model to determine whether to select the candidate commodity for the candidate store, so that the commodity needing to be sold in the offline store can be rapidly determined from a large amount of online commodities, and the commodity selecting efficiency and accuracy are improved.
Other features of the present invention and advantages thereof will become apparent from the following detailed description of exemplary embodiments thereof, which proceeds with reference to the accompanying drawings.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to these drawings without creative efforts.
Fig. 1 is an exemplary flow chart of a product selection method according to some embodiments of the invention.
Fig. 2A and 2B are exemplary flow diagrams of feature screening methods according to some embodiments of the invention.
FIG. 3 is an exemplary flow diagram of a classification model training method according to some embodiments of the invention.
FIG. 4 is a diagram of a classification decision tree in some embodiments of the invention.
Fig. 5A and 5B are exemplary flow diagrams of classification decision tree training methods according to some embodiments of the invention.
FIG. 6 is an exemplary flow chart of a classification decision tree training method according to further embodiments of the present invention.
FIG. 7 is an exemplary flow chart of a classification model adjustment method according to some embodiments of the invention.
Fig. 8 is an exemplary block diagram of a product selection device according to some embodiments of the invention.
Fig. 9 is an exemplary block diagram of a product selection device according to further embodiments of the invention.
Fig. 10 is an exemplary block diagram of a product selection device according to further embodiments of the invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. The following description of at least one exemplary embodiment is merely illustrative in nature and is in no way intended to limit the invention, its application, or uses. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The relative arrangement of the components and steps, the numerical expressions and numerical values set forth in these embodiments do not limit the scope of the present invention unless specifically stated otherwise.
Meanwhile, it should be understood that the sizes of the respective portions shown in the drawings are not drawn in an actual proportional relationship for the convenience of description.
Techniques, methods, and apparatus known to those of ordinary skill in the relevant art may not be discussed in detail but are intended to be part of the specification where appropriate.
In all examples shown and discussed herein, any particular value should be construed as merely illustrative, and not limiting. Thus, other examples of the exemplary embodiments may have different values.
It should be noted that: like reference numbers and letters refer to like items in the following figures, and thus, once an item is defined in one figure, further discussion thereof is not required in subsequent figures.
Fig. 1 is an exemplary flow chart of a product selection method according to some embodiments of the invention. As shown in fig. 1, the product selection method of this embodiment includes steps S102 to S106.
In step S102, a feature value of the candidate commodity is determined according to the online commodity data of the candidate commodity and the user behavior data of online users whose offline distance from the store to be selected is smaller than a preset value. The store to be selected is the offline store for which commodities are to be selected.
The online commodity data includes sales data, traffic data, after-sales data, supply chain data, user profile data of online users, and the like. The user behavior data describes the crowd of online users around the offline store; the identities of this crowd can be obtained by collecting the offline locations at which users perform their online operations.
The users around the offline store may be defined as users who, within the coverage area whose offline distance from the store to be selected is smaller than the preset value, have performed a preset operation at least a preset number of times within a preset period. For example, a user who has placed orders more than 5 times within 3 kilometers of the store to be selected, or who has browsed the website or used the application more than 20 times in the last 3 months, may be regarded as a user around the offline store.
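The rule in the preceding example can be expressed as a short filter. The sketch below is illustrative only; the record type and field names (distance_km, orders_3m, visits_3m) are hypothetical and are not defined in the patent.

```python
from dataclasses import dataclass
from typing import Iterable, List

@dataclass
class UserActivity:
    user_id: str
    distance_km: float   # offline distance from the store to be selected
    orders_3m: int       # orders placed in the last 3 months
    visits_3m: int       # website browses / application uses in the last 3 months

def nearby_users(activities: Iterable[UserActivity],
                 max_km: float = 3.0,
                 min_orders: int = 5,
                 min_visits: int = 20) -> List[str]:
    """Users within the preset distance who performed a preset operation often enough."""
    return [a.user_id for a in activities
            if a.distance_km < max_km
            and (a.orders_3m > min_orders or a.visits_3m > min_visits)]
```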
In step S104, the feature value of the candidate commodity is input into a classification model, which is trained with the feature values of the training sample data as input and the sales information as label values. The classification result of the classification model is either to select the candidate commodity for the store to be selected or not to select it.
The characteristics used in the training phase and the prediction phase are consistent, and the characteristic value is determined according to the condition of each commodity. When the characteristics are selected, the characteristics in the online commodity data and the user behavior data can be subjected to characteristic fusion to obtain characteristics used in training and prediction, so that the characteristics have online and offline attributes simultaneously, and online commodities can be selected for offline shops better.
In step S106, it is determined whether to select an alternative product for the store to be selected according to the classification result of the classification model.
By the method of the embodiment, the characteristic value of the candidate commodity can be determined through the online commodity data and the data of the users around the offline candidate store, and the candidate commodity is classified by adopting the pre-trained classification model to determine whether the candidate commodity is selected for the candidate store, so that the commodity needing to be sold in the offline store can be rapidly determined from a large amount of online commodities, and the commodity selecting efficiency and accuracy are improved.
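To make the flow of steps S102-S106 concrete, the following Python sketch shows one possible shape of the prediction phase. It assumes a scikit-learn-style classifier exposing predict() and a simple dictionary-based feature fusion; none of these names come from the patent itself.

```python
from typing import Dict, List

def build_feature_vector(online_data: Dict[str, float],
                         user_behavior: Dict[str, float]) -> List[float]:
    # Feature fusion by commodity identifier: concatenate online commodity
    # features with the behavior features of nearby offline users.
    return ([online_data[k] for k in sorted(online_data)]
            + [user_behavior[k] for k in sorted(user_behavior)])

def select_items(candidate_items: Dict[str, Dict[str, float]],
                 nearby_user_behavior: Dict[str, Dict[str, float]],
                 model) -> List[str]:
    """Sketch of steps S102-S106 for a pre-trained classifier with predict()."""
    selected = []
    for sku, online_data in candidate_items.items():
        features = build_feature_vector(online_data,
                                        nearby_user_behavior.get(sku, {}))   # S102
        label = model.predict([features])[0]                                 # S104
        if label == 1:  # 1 = select the candidate commodity for the store   # S106
            selected.append(sku)
    return selected
```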
In some embodiments, the specific contents of the online commodity data and the user behavior data can refer to table 1. One or more of the subclasses of data in the table below may be selected as desired by those skilled in the art, or data other than table 1 may be selected.
TABLE 1 (reproduced as an image in the original publication)
After the features in the online commodity data and the user behavior data are obtained, the different features can be fused according to the commodity identifier. One example of the fused features is shown in table 2. In table 2, the last column, "whether it sells well", is the label value, and each column other than the first column and the last column represents one feature.
TABLE 2 (reproduced as an image in the original publication)
In some embodiments, the features may be further filtered after the features are initially determined. An embodiment of the feature screening method of the present invention is described below with reference to fig. 2A and 2B.
Fig. 2A is an exemplary flow diagram of a feature screening method according to some embodiments of the invention. As shown in fig. 2A, the feature screening method of this embodiment includes steps S2012 to S2014.
In step S2012, correlations between different features are calculated.
The method of one embodiment of the correlation calculation is to calculate the covariance between the two variables. If the changes of the two variables tend to be consistent, the covariance of the two variables is positive; if the variation trends of the two variables are opposite, the covariance is a negative value; if the two variables are independent, then the covariance is 0. Equation (1) is an example of a covariance calculation method.
$$\mathrm{Cov}(X, Y) = \frac{1}{n}\sum_{i=1}^{n}\bigl(X_i - \bar{X}\bigr)\bigl(Y_i - \bar{Y}\bigr) \qquad (1)$$
In formula (1), $n$ represents the total amount of data and $i$ indexes the data points; $X$ and $Y$ represent two different variables, e.g. two features; $X_i$ and $Y_i$ represent the $i$-th values of the two variables, and $\bar{X}$ and $\bar{Y}$ represent the means of the two variables; $\mathrm{Cov}(X, Y)$ represents the covariance of the two variables.
In step S2014, for any two different features whose correlation is greater than the preset value, one of them is screened out. For example, the main features may be retained according to business requirements.
Thus, only one feature can be retained among a plurality of similar features, and the calculation efficiency is improved.
FIG. 2B is an exemplary flow chart of a feature screening method according to other embodiments of the present invention. As shown in fig. 2B, the feature screening method of this embodiment includes steps S2022 to S2024.
In step S2022, a correlation between the feature and the sales amount information is calculated. The calculation method of the correlation can still refer to formula (1). At this time, one of the two variables X and Y of the formula (1) is a feature, and the other is sales information, i.e., a tag value.
In step S2024, features whose correlation with the sales information is smaller than a preset value are screened out.
Therefore, the characteristics with small influence on the result can be removed, and the classification accuracy is improved.
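The two screening methods of Fig. 2A and Fig. 2B can be sketched together as follows. For brevity this illustration thresholds the normalized (Pearson) correlation computed by numpy rather than the raw covariance of formula (1); the threshold values are placeholders.

```python
import numpy as np

def screen_features(X, feature_names, sales,
                    pair_threshold=0.9, sales_threshold=0.05):
    """Feature screening sketch combining Fig. 2A and Fig. 2B."""
    X = np.asarray(X, dtype=float)
    sales = np.asarray(sales, dtype=float)
    keep = list(range(X.shape[1]))

    # Fig. 2A: among any two features whose correlation exceeds the preset
    # value, keep only one of them.
    for i in list(keep):
        if i not in keep:
            continue
        for j in list(keep):
            if j <= i or j not in keep:
                continue
            if abs(np.corrcoef(X[:, i], X[:, j])[0, 1]) > pair_threshold:
                keep.remove(j)

    # Fig. 2B: drop features whose correlation with the sales information
    # falls below the preset value.
    keep = [i for i in keep
            if abs(np.corrcoef(X[:, i], sales)[0, 1]) >= sales_threshold]

    return [feature_names[i] for i in keep]
```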
An embodiment of the classification model training method is described below with reference to fig. 3.
FIG. 3 is an exemplary flow diagram of a classification model training method according to some embodiments of the invention. As shown in fig. 3, the classification model training method of this embodiment includes steps S302 to S306.
In step S302, online commodity data and user behavior data for training are acquired.
In step S304, feature fusion is performed on the acquired online commodity data and the user behavior data according to the commodity identifier, and training sample data is generated, where the training sample data includes a feature value and sales information corresponding to each commodity identifier.
The sales information can be data such as sales quantity or sales amount; it can also be a label of two or more categories such as "sells well" and "does not sell well". For example, a SKU with a daily sales amount of 1,000 or more may be preset to have the label value "sells well".
The sales information may be pure offline data, pure online data, or both, and may be selected by one skilled in the art as desired. Therefore, whether or not the product is selected for the off-line shop can be determined according to whether or not the product is hot sold.
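As a minimal illustration of the labeling rule mentioned above (the 1,000 daily-sales threshold is the example value from the text, not a required setting):

```python
def label_from_daily_sales(daily_sales_amount: float, threshold: float = 1000.0) -> str:
    """Binary label value derived from sales information, per the example above."""
    return "sells well" if daily_sales_amount >= threshold else "does not sell well"
```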
In step S306, the classification model is trained using the feature values of the training sample data as input and the corresponding sales information as the label value. The classification model may be a decision tree, neural network, logistic regression, or the like.
Taking the decision tree as an example, the classification nodes in the trained classification decision tree can be features of the commodity, and the leaf nodes can represent the classification results. FIG. 4 is a diagram of a classification decision tree in some embodiments of the invention. As shown in fig. 4, if the collection amount of a commodity is larger than a preset value x and the commodity is an imported commodity, it is classified as "select this commodity"; if the collection amount of a commodity is larger than the preset value x, the commodity is not imported, and the preference of the offline crowd around the store to be selected for it is low, it is classified as "do not select this commodity".
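The tree of Fig. 4 can be written as nested conditions. This is only a sketch of the two branches described in the text; the threshold x and the 'low preference' cutoff are placeholders, and branches not described above are left open.

```python
def classify_example(collection_amount: float,
                     is_imported: bool,
                     nearby_crowd_preference: float,
                     x: float = 100.0,
                     low_preference: float = 0.3) -> str:
    """The two branches of Fig. 4 described in the text; x and low_preference
    are illustrative placeholders, not values disclosed in the patent."""
    if collection_amount > x:
        if is_imported:
            return "select this commodity"
        if nearby_crowd_preference < low_preference:
            return "do not select this commodity"
    return "not covered by the branches described in the text"
```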
An embodiment of the classification decision tree training method of the present invention is described below with reference to FIG. 5A.
FIG. 5A is an exemplary flow diagram of a classification decision tree training method according to some embodiments of the invention. As shown in FIG. 5A, the classification decision tree training method of this embodiment includes steps S502-S506.
In step S502, the information gain of the features not added to the classification decision tree is calculated according to the training sample data corresponding to the node position to be added. The training sample data corresponding to the node positions to be added refers to data left when all training sample data are classified to the current position by adopting a classification decision tree.
When the node to be added is the root node, the training sample data corresponding to the root node is all the training sample data adopted by the training classification decision tree because no classification is carried out before the classification of the root node.
When the node to be added is not the root node, the training sample data corresponding to the position of the node to be added is the training sample data classified by adopting the characteristics of the father node in the training sample data corresponding to the father node of the node to be added. For example, the parent node corresponds to the data A, B, C, D, and the parent node is characterized by "whether the inventory is greater than the preset value y", then the data A, B with the inventory greater than y may be the data corresponding to the first child node position of the parent node, and the data C, D with the inventory less than or equal to y may be the data corresponding to the second child node position of the parent node.
In step S504, the feature with the largest information gain is added to the node position to be added.
Then, the training sample data corresponding to the newly added node positions can be classified according to the newly added features. Therefore, training sample data corresponding to the positions of the nodes to be added in the next cycle are obtained.
In step S506, it is determined whether there are features that have not yet been added to the decision tree. If yes, the process returns to step S502; if not, the training process ends and leaf nodes representing classification results are added to the current classification decision tree. The classification result represented by a leaf node is determined according to the label values of the training sample data corresponding to that node position: if, among that data, the commodity data whose label value is A is the larger or more numerous portion, the classification result of the leaf node is set to A.
In some embodiments, no feature may be added for the node to be added when the information gain of the feature is small. An embodiment of the classification decision tree training method of the present invention is described below with reference to FIG. 5B.
FIG. 5B is an exemplary flow chart of a classification decision tree training method according to further embodiments of the present invention. As shown in FIG. 5B, the classification decision tree training method of the embodiment includes steps S512-S520.
In step S512, the information gain of the features not added to the classification decision tree is calculated according to the training sample data corresponding to the node position to be added.
In step S514, it is determined whether the information gain of each feature not added to the classification decision tree is smaller than a preset information gain threshold. If yes, go to step S516; if not, step S518 is performed.
In step S516, a leaf node representing a classification result is added to the node position to be added, where the classification result is determined according to the label value corresponding to the node position to be added.
In step S518, the feature with the largest information gain is added to the node position to be added as a classification node.
In step S520, it is determined whether there are more features currently not added to the decision tree. If yes, go back to step S512; if not, the training process is ended, and leaf nodes representing the classification result are added to the classification nodes of the current classification decision tree.
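A compact sketch of the training loop of Fig. 5A and Fig. 5B is given below. It uses plain information gain and a majority-label leaf rule under the stated threshold condition; it is an illustration of the described procedure, not the patent's reference implementation.

```python
import math
from collections import Counter

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(samples, labels, feature):
    # Entropy of the full set minus the entropy after splitting on `feature`.
    groups = {}
    for s, y in zip(samples, labels):
        groups.setdefault(s[feature], []).append(y)
    conditional = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return entropy(labels) - conditional

def build_tree(samples, labels, features, gain_threshold=0.0):
    """The feature with the largest information gain becomes the classification
    node; a leaf holding the majority label is added when no feature remains or
    every remaining gain is below the preset threshold (step S516 of Fig. 5B)."""
    if not features or len(set(labels)) == 1:
        return Counter(labels).most_common(1)[0][0]                 # leaf node
    gains = {f: information_gain(samples, labels, f) for f in features}
    best = max(gains, key=gains.get)
    if gains[best] < gain_threshold:
        return Counter(labels).most_common(1)[0][0]                 # leaf node
    children = {}
    remaining = [f for f in features if f != best]
    for value in {s[best] for s in samples}:
        subset = [(s, y) for s, y in zip(samples, labels) if s[best] == value]
        sub_samples, sub_labels = map(list, zip(*subset))
        children[value] = build_tree(sub_samples, sub_labels, remaining, gain_threshold)
    return {"feature": best, "children": children}
```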
A calculation method of the information gain is exemplarily described below. Assume that the training sample data set S contains X positive samples and Y negative samples, where positive and negative samples represent different final classification results, e.g., good-selling and non-good-selling commodities respectively. The data set S is partitioned by a feature T; for example, let T represent "collection amount": among the samples whose collection amount is "high" there are X1 positive samples and Y1 negative samples, and among the samples whose collection amount is "low" there are X2 positive samples and Y2 negative samples. The information gain ratio of the feature T is then calculated as follows.
First, the information entropy Info(S) of the data set S is calculated using formula (2):
$$\mathrm{Info}(S) = -\frac{X}{X+Y}\log_2\frac{X}{X+Y} - \frac{Y}{X+Y}\log_2\frac{Y}{X+Y} \qquad (2)$$
Then, the information entropy Info(T) of the data set S partitioned by the feature T is calculated using formula (3), where $S_1$ (with $X_1$ positive and $Y_1$ negative samples) and $S_2$ (with $X_2$ positive and $Y_2$ negative samples) are the subsets whose value of T is "high" and "low" respectively:
$$\mathrm{Info}(T) = \frac{X_1+Y_1}{X+Y}\,\mathrm{Info}(S_1) + \frac{X_2+Y_2}{X+Y}\,\mathrm{Info}(S_2) \qquad (3)$$
Finally, the information gain ratio Gain(T) of the feature T is calculated using formulas (4) and (5), where SplitInfo(T) is an intermediate variable:
$$\mathrm{SplitInfo}(T) = -\frac{X_1+Y_1}{X+Y}\log_2\frac{X_1+Y_1}{X+Y} - \frac{X_2+Y_2}{X+Y}\log_2\frac{X_2+Y_2}{X+Y} \qquad (4)$$
$$\mathrm{Gain}(T) = \frac{\mathrm{Info}(S) - \mathrm{Info}(T)}{\mathrm{SplitInfo}(T)} \qquad (5)$$
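The following sketch evaluates formulas (2) through (5) for the two-way split described above, assuming the standard C4.5-style gain-ratio definitions used in the reconstruction; the example counts at the end are made up purely for illustration.

```python
import math

def info(pos, neg):
    """Formula (2): entropy of a set with `pos` positive and `neg` negative samples."""
    total = pos + neg
    result = 0.0
    for c in (pos, neg):
        if c:
            p = c / total
            result -= p * math.log2(p)
    return result

def gain_ratio(X, Y, X1, Y1, X2, Y2):
    """Formulas (2)-(5) for a feature T splitting S into a 'high' subset
    (X1 positive, Y1 negative) and a 'low' subset (X2 positive, Y2 negative)."""
    total = X + Y
    info_S = info(X, Y)                                           # formula (2)
    info_T = ((X1 + Y1) / total * info(X1, Y1)
              + (X2 + Y2) / total * info(X2, Y2))                 # formula (3)
    split_info = info(X1 + Y1, X2 + Y2)                           # formula (4)
    return (info_S - info_T) / split_info if split_info else 0.0  # formula (5)

# Example: 6 good-selling and 4 non-good-selling SKUs; splitting on "collection
# amount" gives 5 positives / 1 negative in 'high' and 1 / 3 in 'low'.
print(round(gain_ratio(6, 4, 5, 1, 1, 3), 3))
```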
By the method, the nodes with large influence on the classification result can be preferentially added into the classification decision tree, so that the trained classification model is more accurate, and the accuracy of the selected product is improved.
In some embodiments, multiple models may be trained in advance. An embodiment of the classification decision tree training method of the present invention is described below with reference to FIG. 6.
FIG. 6 is an exemplary flow chart of a classification decision tree training method according to further embodiments of the present invention. As shown in FIG. 6, the classification decision tree training method of this embodiment includes steps S602-S606.
In step S602, a plurality of training sample data subsets are obtained, where the commodities corresponding to each training sample data subset belong to the same category. For example, data in the same subset of training sample data may belong to the same category, the same region category, and so on.
In step S604, each training sample data subset is used to train a classification model, and a plurality of sub-classification models are obtained. For a specific training method, reference may be made to the foregoing embodiments, which are not described herein again.
In step S606, the weight of each sub-classification model is determined according to the prediction accuracy of each sub-classification model, so as to determine whether to select an alternative product for the store to be selected according to the weighted calculation result of the classification result of each sub-classification model. The weight of the sub-classification model can be positively correlated with the prediction accuracy.
Therefore, the multiple models can be trained in advance according to the characteristics of different types of commodities, the prediction results can be determined comprehensively according to the prediction results of the multiple models, and the accuracy of product selection is improved.
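One possible form of the weighted combination in step S606 is sketched below; the predict() interface, the 0/1 vote encoding and the 0.5 decision threshold are assumptions for illustration.

```python
from typing import List, Sequence

def accuracy_weights(accuracies: Sequence[float]) -> List[float]:
    # Weights positively correlated with each sub-model's prediction accuracy.
    total = sum(accuracies)
    return [a / total for a in accuracies]

def weighted_decision(sub_models, weights, feature_vector, threshold=0.5) -> bool:
    """Combine the 0/1 votes of the sub-classification models with their weights
    and select the candidate commodity when the weighted score passes the threshold."""
    votes = [model.predict([feature_vector])[0] for model in sub_models]
    score = sum(w * v for w, v in zip(weights, votes)) / sum(weights)
    return score >= threshold
```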
In some embodiments, the pre-collected labeled data may be partitioned into a training set and a test set. After training the models with the training set, the classification models can be tested with the test set. An embodiment of the classification model adaptation method of the present invention is described below with reference to fig. 7.
FIG. 7 is an exemplary flow chart of a classification model adjustment method according to some embodiments of the invention. As shown in fig. 7, the classification model adjustment method of this embodiment includes steps S702 to S706.
In step S702, the feature values of the test product are input into the classification decision tree.
In step S704, a prediction accuracy of the classification decision tree is determined according to the classification result of the classification decision tree and the label value of the test product.
In step S706, in response to the prediction accuracy being lower than the preset accuracy threshold, retraining the classification decision tree after performing at least one of the following operations: modifying the information gain threshold; updating the characteristics; and updating training sample data.
Because the information gain threshold determines whether a feature node or a node representing a classification result is added at the current position, modifying the information gain threshold adjusts the structure of the decision tree. Updating the features may mean adding, removing or replacing features, so that predictions can be made based on different data characteristics. Updating the training sample data may mean adding, removing or replacing data, so that extreme data can be removed and data that better represents the preferences of users around the store can be retained.
By testing and adjusting the trained classification model, the prediction accuracy of the model can be improved, and the accuracy and the efficiency of product selection are further improved.
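A minimal sketch of the test-and-adjust loop of Fig. 7, assuming a callable that returns the tree's classification for one feature vector; the 0.8 accuracy threshold is a placeholder.

```python
from typing import Callable, Sequence, Tuple

def evaluate_and_adjust(tree_predict: Callable, test_features: Sequence,
                        test_labels: Sequence,
                        accuracy_threshold: float = 0.8) -> Tuple[float, bool]:
    """Measure prediction accuracy on the test commodities (steps S702-S704) and
    report whether retraining is needed (step S706)."""
    correct = sum(1 for x, y in zip(test_features, test_labels)
                  if tree_predict(x) == y)
    accuracy = correct / len(test_labels)
    needs_retraining = accuracy < accuracy_threshold
    # Before retraining, one may modify the information gain threshold, update
    # the features, or update the training sample data, as described above.
    return accuracy, needs_retraining
```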
An embodiment of the selection device of the present invention is described below with reference to fig. 8.
Fig. 8 is an exemplary block diagram of a product selection device according to some embodiments of the invention. As shown in fig. 8, the selecting device 80 of this embodiment includes: a candidate commodity feature value determining module 810 configured to determine a feature value of a candidate commodity according to the online commodity data of the candidate commodity and the user behavior data of online users whose offline distance from the store to be selected is smaller than a preset value; a candidate commodity feature value input module 820 configured to input the feature value of the candidate commodity into a classification model, wherein the classification model is trained by taking the feature values of training sample data as input and taking sales information as label values; and a selection determining module 830 configured to determine whether to select the candidate commodity for the store to be selected according to the classification result of the classification model.
In some embodiments, the selecting device 80 further includes: a classification model training module 840 configured to obtain online commodity data and user behavior data for training; performing feature fusion on the acquired online commodity data and the user behavior data according to the commodity identification to generate training sample data, wherein the training sample data comprises a feature value and sales volume information corresponding to each commodity identification; and training the classification model by using the characteristic value of the training sample data as input and using corresponding sales information as a mark value.
In some embodiments, the classification model is a classification decision tree, the classification nodes in the classification decision tree are features of the commodity, and the leaf nodes represent classification results.
In some embodiments, the selecting device 80 further includes: a classification decision tree training module 850 configured to obtain online commodity data and user behavior data for training; performing feature fusion on the acquired online commodity data and the user behavior data according to the commodity identification to generate training sample data, wherein the training sample data comprises a feature value and sales volume information corresponding to each commodity identification; training a classification model, comprising: and calculating the information gain of the features which are not added into the classification decision tree according to the training sample data corresponding to the positions of the nodes to be added, adding the features with the maximum information gain to the positions of the nodes to be added to serve as classification nodes, and repeating the steps of calculating the information gain and adding the features until all the features are added into the classification decision tree.
In some embodiments, the selecting device 80 further includes: a classification decision tree training module 850 configured to obtain online commodity data and user behavior data for training; performing feature fusion on the acquired online commodity data and the user behavior data according to the commodity identification to generate training sample data, wherein the training sample data comprises a feature value and sales volume information corresponding to each commodity identification; training a classification model, comprising: calculating information gains of the features which are not added into the classification decision tree according to training sample data corresponding to the positions of the nodes to be added, adding leaf nodes representing classification results to the positions of the nodes to be added under the condition that the information gains of the features which are not added into the classification decision tree are all smaller than a preset information gain threshold, adding the features with the maximum information gains to the positions of the nodes to be added as classification nodes under the condition that the information gains of the features which are not added into the classification decision tree are not all smaller than the preset information gain threshold, and repeating the steps of calculating the information gains and adding the features until all the features are added into the classification decision tree; and determining the classification result according to the mark value corresponding to the position of the node to be added.
In some embodiments, the selecting device 80 further includes: a classification model test adjustment module 860 configured to input the feature values of the test goods into a classification decision tree; determining the prediction accuracy of the classification decision tree according to the classification result of the classification decision tree and the marking value of the test commodity; in response to the prediction accuracy being below a preset accuracy threshold, retraining the classification decision tree after performing at least one of: modifying the information gain threshold; updating the characteristics; and updating training sample data.
In some embodiments, the selecting device 80 further includes: a multi-model training module 870 configured to obtain a plurality of training sample data subsets, wherein the commodities corresponding to each training sample data subset belong to the same category; respectively training a classification model by adopting each training sample data subset to obtain a plurality of sub-classification models; and determining the weight of each sub-classification model according to the prediction accuracy of each sub-classification model so as to determine whether to select the alternative commodity for the store to be selected according to the weighted calculation result of the classification result of each sub-classification model.
In some embodiments, the selecting device 80 further includes: a feature screening module 880 configured to perform at least one of the following feature screening methods: for any two different features whose correlation with each other is greater than a preset value, screening out one of them; and screening out features whose correlation with the sales information is smaller than a preset value.
In some embodiments, the online merchandise data includes at least one of: an online purchase feature, an online browsing feature, an online search feature, an online add-to-shopping-cart feature, an online brand sales feature, an online review feature, an online basic attribute feature, and an online purchase feature of users within a preset offline range; the user behavior data includes at least one of: a user location feature, a user purchase feature, a user browsing feature, a user search feature, a user add-to-shopping-cart feature, a user brand purchase feature, and a user basic attribute feature.
Fig. 9 is an exemplary block diagram of a product selection device according to further embodiments of the invention. As shown in fig. 9, the selecting apparatus 900 of this embodiment includes: a memory 910 and a processor 920 coupled to the memory 910, the processor 920 being configured to execute the product selection method of any of the embodiments described above based on instructions stored in the memory 910.
Memory 910 may include, for example, system memory, fixed non-volatile storage media, and the like. The system memory stores, for example, an operating system, an application program, a Boot Loader (Boot Loader), and other programs.
Fig. 10 is an exemplary block diagram of a product selection device according to further embodiments of the invention. As shown in fig. 10, the selection device 1000 of this embodiment includes: the memory 1010 and the processor 1020, and may further include an input/output interface 1030, a network interface 1040, a storage interface 1050, and the like. These interfaces 1030, 1040, 1050 and the memory 1010 and the processor 1020 may be connected via a bus 1060, for example. The input/output interface 1030 provides a connection interface for input/output devices such as a display, a mouse, a keyboard, and a touch screen. The network interface 1040 provides a connection interface for various networking devices. The storage interface 1050 provides a connection interface for external storage devices such as an SD card and a USB flash drive.
An embodiment of the present invention further provides a computer-readable storage medium, on which a computer program is stored, wherein the computer program is configured to implement any one of the aforementioned methods when executed by a processor.
As will be appreciated by one skilled in the art, embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable non-transitory storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents, improvements and the like that fall within the spirit and principle of the present invention are intended to be included therein.

Claims (20)

1. An item selection method, comprising:
determining a characteristic value of the candidate commodity according to the online commodity data of the candidate commodity and the user behavior data of the online user of which the offline distance from the candidate store is smaller than a preset value, wherein the candidate store is an offline store;
inputting the characteristic value of the candidate commodity into a classification model, wherein the classification model is trained by taking the characteristic value of training sample data as input and taking sales information as a marking value;
and determining whether to select the alternative commodity for the shop to be selected according to the classification result of the classification model.
2. The method of selecting items of claim 1, further comprising:
acquiring online commodity data and user behavior data for training;
performing feature fusion on the acquired online commodity data and the user behavior data according to the commodity identification to generate training sample data, wherein the training sample data comprises a feature value and sales volume information corresponding to each commodity identification;
and training the classification model by using the characteristic value of the training sample data as input and using corresponding sales information as a mark value.
3. The method of selecting a commodity according to claim 1, wherein the classification model is a classification decision tree, classification nodes in the classification decision tree are features of the commodity, and leaf nodes represent classification results.
4. The item selection method of claim 3, further comprising:
acquiring online commodity data and user behavior data for training;
performing feature fusion on the acquired online commodity data and user behavior data according to commodity identifiers to generate training sample data, wherein the training sample data comprises a feature value and sales volume information corresponding to each commodity identifier;
and training the classification model, comprising:
calculating the information gain of each feature not yet added to the classification decision tree according to the training sample data corresponding to the position of the node to be added;
adding the feature with the maximum information gain at the position of the node to be added as a classification node;
and repeating the steps of calculating information gain and adding features until all features are added to the classification decision tree.
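Claims 4 and 5 describe ID3-style tree construction driven by information gain. The sketch below shows one conventional way to compute that gain for a discrete feature; it illustrates the general technique and is not the patent's specific implementation.

```python
import math
from collections import Counter
from typing import List, Sequence

def entropy(labels: Sequence[int]) -> float:
    """Shannon entropy of a label sequence."""
    if not labels:
        return 0.0
    counts = Counter(labels)
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def information_gain(rows: List[Sequence[float]],
                     labels: Sequence[int],
                     feature_index: int) -> float:
    """Entropy of the labels minus the weighted entropy after splitting on one discrete feature."""
    groups: dict = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[feature_index], []).append(label)
    remainder = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder

def best_feature(rows: List[Sequence[float]],
                 labels: Sequence[int],
                 remaining: List[int]) -> int:
    """Among features not yet added to the tree, pick the index with maximum gain."""
    return max(remaining, key=lambda i: information_gain(rows, labels, i))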
5. The item selection method of claim 3, further comprising:
acquiring online commodity data and user behavior data for training;
performing feature fusion on the acquired online commodity data and user behavior data according to commodity identifiers to generate training sample data, wherein the training sample data comprises a feature value and sales volume information corresponding to each commodity identifier;
and training the classification model, comprising:
calculating the information gain of each feature not yet added to the classification decision tree according to the training sample data corresponding to the position of the node to be added;
adding a leaf node representing a classification result at the position of the node to be added when the information gain of every feature not yet added to the classification decision tree is smaller than a preset information gain threshold, wherein the classification result is determined according to the label values corresponding to the position of the node to be added;
adding the feature with the maximum information gain at the position of the node to be added as a classification node when the information gains of the features not yet added to the classification decision tree are not all smaller than the preset information gain threshold;
and repeating the steps of calculating information gain and adding features until all features are added to the classification decision tree.
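Continuing the illustration, the sketch below assembles a tree with the claim-5 stopping rule: when the best remaining gain falls below the threshold (equivalently, when every remaining gain is below it), a leaf holding the majority label at that node is added instead of a classification node. The gain argument is assumed to be a function such as the information_gain helper sketched after claim 4, and the default threshold of 0.01 is an arbitrary placeholder.

```python
from collections import Counter
from typing import Callable, Dict, List, Sequence, Union

# A leaf is just a class label; an internal node records the feature it splits on.
Tree = Union[int, Dict]
GainFn = Callable[[List[Sequence[float]], Sequence[int], int], float]

def build_tree(rows: List[Sequence[float]],
               labels: Sequence[int],
               remaining: List[int],
               gain: GainFn,
               gain_threshold: float = 0.01) -> Tree:
    """ID3-style construction that adds a leaf when every remaining gain is below threshold."""
    if not remaining or len(set(labels)) == 1:
        return Counter(labels).most_common(1)[0][0]   # leaf: majority (or only) label
    best = max(remaining, key=lambda i: gain(rows, labels, i))
    if gain(rows, labels, best) < gain_threshold:     # best gain below threshold
        return Counter(labels).most_common(1)[0][0]   # => all gains below; add a leaf
    node: Dict = {"feature": best, "children": {}}
    for value in {row[best] for row in rows}:
        keep = [(r, l) for r, l in zip(rows, labels) if r[best] == value]
        node["children"][value] = build_tree(
            [r for r, _ in keep], [l for _, l in keep],
            [i for i in remaining if i != best], gain, gain_threshold)
    return node
```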
6. The item selection method of claim 4, further comprising:
inputting the feature value of a test commodity into the classification decision tree;
determining the prediction accuracy of the classification decision tree according to the classification result of the classification decision tree and the label value of the test commodity;
and, in response to the prediction accuracy being below a preset accuracy threshold, retraining the classification decision tree after performing at least one of: modifying the information gain threshold, updating the features, and updating the training sample data.
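A small sketch of the test-and-retrain loop of claim 6, assuming a hypothetical retrain callback that rebuilds the tree after the gain threshold, feature set, or training samples have been adjusted elsewhere; the 0.8 accuracy threshold is illustrative only.

```python
from typing import Callable, List, Sequence, Tuple

Classifier = Callable[[Sequence[float]], int]

def prediction_accuracy(classify: Classifier,
                        test_set: List[Tuple[Sequence[float], int]]) -> float:
    """Fraction of test commodities whose predicted class matches their label value."""
    if not test_set:
        return 0.0
    hits = sum(1 for feats, label in test_set if classify(feats) == label)
    return hits / len(test_set)

def maybe_retrain(classify: Classifier,
                  test_set: List[Tuple[Sequence[float], int]],
                  retrain: Callable[[], Classifier],
                  accuracy_threshold: float = 0.8) -> Classifier:
    """Keep the current tree unless accuracy falls below the preset threshold; otherwise
    invoke the retrain callback (after thresholds, features, or samples were adjusted)."""
    if prediction_accuracy(classify, test_set) < accuracy_threshold:
        return retrain()
    return classify
```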
7. The item selection method of claim 1, further comprising:
obtaining a plurality of training sample data subsets, wherein the commodities corresponding to each training sample data subset belong to the same category;
training a classification model with each training sample data subset respectively to obtain a plurality of sub-classification models;
and determining a weight for each sub-classification model according to its prediction accuracy, so as to determine whether to select the candidate commodity for the candidate store according to the weighted result of the classification results of the sub-classification models.
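One plausible reading of the weighted combination in claim 7 is an accuracy-weighted vote over the per-category sub-models, sketched below; the weight normalization and the 0.5 decision threshold are assumptions rather than anything the claim specifies.

```python
from typing import Callable, List, Sequence, Tuple

Classifier = Callable[[Sequence[float]], int]

def weighted_decision(feature_value: Sequence[float],
                      sub_models: List[Tuple[Classifier, float]],
                      decision_threshold: float = 0.5) -> bool:
    """Each per-category sub-model votes 0 or 1; votes are weighted by that model's
    prediction accuracy, and the commodity is selected when the weighted vote
    clears the decision threshold."""
    if not sub_models:
        return False
    total_weight = sum(weight for _, weight in sub_models)
    score = sum(weight * model(feature_value) for model, weight in sub_models) / total_weight
    return score >= decision_threshold
```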
8. The item selection method according to any one of claims 1 to 7, further comprising performing at least one of the following feature screening methods:
screening out one of two features whose correlation with each other is greater than a preset value;
and screening out features whose correlation with the sales volume information is less than a preset value.
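The two screening rules of claim 8 can be illustrated with Pearson correlation, as in the sketch below: features that nearly duplicate an already-kept feature are dropped, as are features only weakly correlated with sales volume. The 0.9 and 0.1 cutoffs are placeholders, not values from the patent.

```python
import math
from typing import Dict, List, Sequence

def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient; returns 0.0 when either series is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0

def screen_features(columns: Dict[str, List[float]],
                    sales_volume: List[float],
                    redundancy_cutoff: float = 0.9,
                    relevance_cutoff: float = 0.1) -> List[str]:
    """Keep features that correlate with sales volume above the relevance cutoff and
    that are not near-duplicates (high mutual correlation) of a feature already kept."""
    kept: List[str] = []
    for name, values in columns.items():
        if abs(pearson(values, sales_volume)) < relevance_cutoff:
            continue  # weak relation to sales volume: screen out
        if any(abs(pearson(values, columns[other])) > redundancy_cutoff for other in kept):
            continue  # near-duplicate of a kept feature: screen out one of the pair
        kept.append(name)
    return kept
```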
9. The item selection method according to any one of claims 1 to 7, wherein
the online commodity data comprises at least one of: an online purchase feature, an online browsing feature, an online search feature, an online add-to-cart feature, an online brand sales feature, an online evaluation feature, an online basic attribute feature, and an online purchase feature of users within a preset offline range;
and the user behavior data comprises at least one of: a user location feature, a user purchase feature, a user browsing feature, a user search feature, a user add-to-cart feature, a user purchased-brand feature, and a user basic attribute feature.
10. An item selection apparatus, comprising:
a candidate commodity feature value determining module configured to determine a feature value of a candidate commodity according to online commodity data of the candidate commodity and user behavior data of online users whose offline distance from a candidate store is smaller than a preset value, wherein the candidate store is an offline store;
a candidate commodity feature value input module configured to input the feature value of the candidate commodity into a classification model, wherein the classification model is trained by taking feature values of training sample data as input and taking sales volume information as label values;
and a selection determining module configured to determine whether to select the candidate commodity for the candidate store according to the classification result of the classification model.
11. The item selection apparatus of claim 10, further comprising:
a classification model training module configured to acquire online commodity data and user behavior data for training; perform feature fusion on the acquired online commodity data and user behavior data according to commodity identifiers to generate training sample data, wherein the training sample data comprises a feature value and sales volume information corresponding to each commodity identifier; and train the classification model by taking the feature values of the training sample data as input and taking the corresponding sales volume information as label values.
12. The item selection apparatus of claim 10, wherein the classification model is a classification decision tree, classification nodes in the classification decision tree correspond to commodity features, and leaf nodes represent classification results.
13. The item selection apparatus of claim 12, further comprising:
a classification decision tree training module configured to acquire online commodity data and user behavior data for training; perform feature fusion on the acquired online commodity data and user behavior data according to commodity identifiers to generate training sample data, wherein the training sample data comprises a feature value and sales volume information corresponding to each commodity identifier; and train the classification model by: calculating the information gain of each feature not yet added to the classification decision tree according to the training sample data corresponding to the position of the node to be added, adding the feature with the maximum information gain at the position of the node to be added as a classification node, and repeating the steps of calculating information gain and adding features until all features are added to the classification decision tree.
14. The item selection apparatus of claim 12, further comprising:
a classification decision tree training module configured to acquire online commodity data and user behavior data for training; perform feature fusion on the acquired online commodity data and user behavior data according to commodity identifiers to generate training sample data, wherein the training sample data comprises a feature value and sales volume information corresponding to each commodity identifier; and train the classification model by: calculating the information gain of each feature not yet added to the classification decision tree according to the training sample data corresponding to the position of the node to be added, adding a leaf node representing a classification result at the position of the node to be added when the information gain of every feature not yet added to the classification decision tree is smaller than a preset information gain threshold, adding the feature with the maximum information gain at the position of the node to be added as a classification node when the information gains of the features not yet added to the classification decision tree are not all smaller than the preset information gain threshold, and repeating the steps of calculating information gain and adding features until all features are added to the classification decision tree;
wherein the classification result is determined according to the label values corresponding to the position of the node to be added.
15. The item selection apparatus of claim 13, further comprising:
a classification model test adjustment module configured to input the feature value of a test commodity into the classification decision tree; determine the prediction accuracy of the classification decision tree according to the classification result of the classification decision tree and the label value of the test commodity; and, in response to the prediction accuracy being below a preset accuracy threshold, retrain the classification decision tree after performing at least one of: modifying the information gain threshold, updating the features, and updating the training sample data.
16. The item selection apparatus of claim 10, further comprising:
a multi-model training module configured to obtain a plurality of training sample data subsets, wherein the commodities corresponding to each training sample data subset belong to the same category; train a classification model with each training sample data subset respectively to obtain a plurality of sub-classification models; and determine a weight for each sub-classification model according to its prediction accuracy, so as to determine whether to select the candidate commodity for the candidate store according to the weighted result of the classification results of the sub-classification models.
17. The item selection apparatus according to any one of claims 10 to 16, further comprising:
a feature screening module configured to perform at least one of the following feature screening methods: screening out one of two features whose correlation with each other is greater than a preset value; and screening out features whose correlation with the sales volume information is less than a preset value.
18. The item selection apparatus according to any one of claims 10 to 16, wherein
the online commodity data comprises at least one of: an online purchase feature, an online browsing feature, an online search feature, an online add-to-cart feature, an online brand sales feature, an online evaluation feature, an online basic attribute feature, and an online purchase feature of users within a preset offline range;
and the user behavior data comprises at least one of: a user location feature, a user purchase feature, a user browsing feature, a user search feature, a user add-to-cart feature, a user purchased-brand feature, and a user basic attribute feature.
19. An item selection apparatus comprising:
a memory; and
a processor coupled to the memory, the processor being configured to perform the item selection method of any one of claims 1 to 9 based on instructions stored in the memory.
20. A computer-readable storage medium on which a computer program is stored, wherein the computer program, when executed by a processor, implements the item selection method of any one of claims 1 to 9.
CN201810693419.8A 2018-06-29 2018-06-29 Method and device for selecting products and computer readable storage medium Active CN108960719B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810693419.8A CN108960719B (en) 2018-06-29 2018-06-29 Method and device for selecting products and computer readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810693419.8A CN108960719B (en) 2018-06-29 2018-06-29 Method and device for selecting products and computer readable storage medium

Publications (2)

Publication Number Publication Date
CN108960719A CN108960719A (en) 2018-12-07
CN108960719B true CN108960719B (en) 2021-10-15

Family

ID=64487825

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810693419.8A Active CN108960719B (en) 2018-06-29 2018-06-29 Method and device for selecting products and computer readable storage medium

Country Status (1)

Country Link
CN (1) CN108960719B (en)

Families Citing this family (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111340520A (en) * 2018-12-19 2020-06-26 北京京东尚科信息技术有限公司 Method and apparatus for generating information
CN112016949A (en) * 2019-05-31 2020-12-01 口碑(上海)信息技术有限公司 Matching method and device for verification and cancellation area, storage medium and electronic device
CN111768139B (en) * 2019-06-27 2023-04-07 北京沃东天骏信息技术有限公司 Stock processing method, apparatus, device and storage medium
CN112465533A (en) * 2019-09-09 2021-03-09 中国移动通信集团河北有限公司 Intelligent product selection method and device and computing equipment
CN112906723B (en) * 2019-11-19 2024-01-16 北京京邦达贸易有限公司 Feature selection method and device
CN111445134B (en) * 2020-03-26 2021-05-28 珠海必要工业科技股份有限公司 Commodity sales prediction method, commodity sales prediction apparatus, computer device, and storage medium
CN111860299B (en) * 2020-07-17 2023-09-08 北京奇艺世纪科技有限公司 Method and device for determining grade of target object, electronic equipment and storage medium
CN112613953A (en) * 2020-12-29 2021-04-06 北京环球国广媒体科技有限公司 Commodity selection method, system and computer readable storage medium
CN112967109A (en) * 2021-03-15 2021-06-15 杭州邻汇网络科技有限公司 Intelligent flash shop product selection method
CN113269607B (en) * 2021-05-21 2022-03-01 深圳市方圆展示制品有限公司 Online and offline experience smart display system
CN113434611A (en) * 2021-07-08 2021-09-24 宁波权智科技有限公司 Sales classification method and device based on artificial intelligence
CN113507623B (en) * 2021-07-09 2022-03-22 广东蕾特恩科技发展有限公司 Intelligent commodity selection system and method based on data analysis
CN114117245B (en) * 2022-01-26 2022-05-20 深圳市云中鹤科技股份有限公司 Product screening method and device based on big data

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2016060307A1 (en) * 2014-10-17 2016-04-21 주식회사 스타트업팩토리 Purchase service system using product unique code, and method therefor
CN107578332A (en) * 2017-09-22 2018-01-12 深圳乐信软件技术有限公司 A kind of method, apparatus, equipment and storage medium for recommending cash commodity
CN107767103A (en) * 2017-11-03 2018-03-06 李良浩 The method and purchase system of on-line off-line quick shopping are realized based on mobile Internet

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106485562B (en) * 2015-09-01 2020-12-04 苏宁云计算有限公司 Commodity information recommendation method and system based on user historical behaviors
CN107292666A (en) * 2017-06-20 2017-10-24 北京京东尚科信息技术有限公司 Sales potential determination methods and device
CN107403345A (en) * 2017-09-22 2017-11-28 北京京东尚科信息技术有限公司 Best-selling product Forecasting Methodology and system, storage medium and electric terminal
CN107944913B (en) * 2017-11-21 2022-03-22 重庆邮电大学 High-potential user purchase intention prediction method based on big data user behavior analysis

Also Published As

Publication number Publication date
CN108960719A (en) 2018-12-07

Similar Documents

Publication Publication Date Title
CN108960719B (en) Method and device for selecting products and computer readable storage medium
CN111444334B (en) Data processing method, text recognition device and computer equipment
CN109902708B (en) Recommendation model training method and related device
CN106651542B (en) Article recommendation method and device
CN106485562B (en) Commodity information recommendation method and system based on user historical behaviors
CN106651519B (en) Personalized recommendation method and system based on label information
CN106251174A (en) Information recommendation method and device
CN101661487B (en) Method and system for searching information items
CN110288484B (en) Insurance classification user recommendation method and system based on big data platform
KR101571041B1 (en) System for determining Harmonized System(HS) classification
CN108665333A (en) Method of Commodity Recommendation, device, electronic equipment and storage medium
CN110069699B (en) Ranking model training method and device
CN110610193A (en) Method and device for processing labeled data
CN104820879A (en) User behavior information analysis method and device thereof
US8793201B1 (en) System and method for seeding rule-based machine learning models
CN110580489B (en) Data object classification system, method and equipment
CN109410001B (en) Commodity recommendation method and system, electronic equipment and storage medium
CN103886486A (en) Electronic commerce recommending method based on support vector machine (SVM)
CN112765230B (en) Payment big data analysis method and big data analysis system based on internet finance
JP6308339B1 (en) Clustering system, method and program, and recommendation system
CN110019563B (en) Portrait modeling method and device based on multi-dimensional data
CN112288453A (en) Label selection method and device
CN111340566B (en) Commodity classification method and device, electronic equipment and storage medium
CN111523315B (en) Data processing method, text recognition device and computer equipment
CN110851694A (en) Personalized recommendation system based on user memory network and tree structure depth model

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant