CN111832782A - Method and device for determining physical distribution attribute of article - Google Patents


Info

Publication number
CN111832782A
Authority
CN
China
Prior art keywords
attribute
data set
source data
logistics
article
Legal status
Pending
Application number
CN201910307387.8A
Other languages
Chinese (zh)
Inventor
李垚男
陈士亮
李伟伟
Current Assignee
Beijing Jingdong Zhenshi Information Technology Co Ltd
Original Assignee
Beijing Jingdong Zhenshi Information Technology Co Ltd
Application filed by Beijing Jingdong Zhenshi Information Technology Co Ltd
Priority to CN201910307387.8A
Publication of CN111832782A
Legal status: Pending (current)


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/04 Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
    • G06Q10/08 Logistics, e.g. warehousing, loading or distribution; Inventory or stock management
    • G06Q10/083 Shipping

Landscapes

  • Business, Economics & Management (AREA)
  • Engineering & Computer Science (AREA)
  • Economics (AREA)
  • Human Resources & Organizations (AREA)
  • Strategic Management (AREA)
  • Physics & Mathematics (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Marketing (AREA)
  • Operations Research (AREA)
  • Quality & Reliability (AREA)
  • Tourism & Hospitality (AREA)
  • Development Economics (AREA)
  • General Business, Economics & Management (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Game Theory and Decision Science (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses a method and a device for determining physical distribution (logistics) attributes of articles, and relates to the technical field of computers. One embodiment of the method comprises: acquiring name information and logistics attribute information of articles to obtain a source data set; determining, based on the source data set, a first matrix corresponding to the name information of an article and an attribute label corresponding to its logistics attribute information; training with the first matrix and the attribute label as training samples to obtain an attribute prediction model; and determining the logistics attribute information of an article to be predicted according to the attribute prediction model and the name information of that article. The embodiment uses collected name information and logistics attribute information to build a text neural regression network for predicting commodity logistics attributes, thereby improving prediction accuracy and precision.

Description

Method and device for determining physical distribution attribute of article
Technical Field
The invention relates to the technical field of computers, and in particular to a method and a device for determining logistics attributes of articles.
Background
With the development of e-commerce, order volumes have grown rapidly and logistics has developed quickly along with them. In logistics, express delivery freight is generally calculated from the volume or weight of the package, which is usually measured on site by a worker with a tool before the charge is calculated. In this process, negligence or carelessness by workers can cause too much or too little freight to be charged, which can harm customers' economic interests and degrade the user experience.
An existing method works as follows: the logistics attributes of a package are predicted from the data in a data set (the logistics attributes are the attributes of the packaged goods during logistics circulation, including the longest edge, the middle edge, the shortest edge, and the weight), and the freight of the package is then estimated from these logistics attributes. The data set is compiled in advance, and each record in it consists of a short text (the commodity name information) and the corresponding numerical attributes (the logistics attributes). Specifically, the compiled data are first grouped into categories, for example televisions in one category and large fitness equipment in another, and the logistics attributes of the commodities in each category are then summarized, for example by their maximum and minimum values. When a new package needs to be predicted, it is assigned to a category according to the commodity name, and a statistic of that category's logistics attributes, such as the mean or the median, is used as the logistics attributes of the package.
In the process of implementing the invention, the inventors found at least the following problems in the prior art: the method turns the problem of mapping text to continuous numerical values (that is, mapping commodity names to commodity logistics attributes, a text regression problem) into a classification problem, which ignores the relationships between categories and leads to low prediction precision. Moreover, because commodities are highly varied, compiling commodity categories takes a great deal of time, and complex rules must be designed to decide the category of a package to be predicted. The prediction precision therefore depends on both the data set and the decision rules: if the data set is not rich enough or the rules are imperfect, the precision is low.
Disclosure of Invention
In view of this, embodiments of the present invention provide a method and an apparatus for determining logistics attributes of an article, which use collected name information and logistics attribute information to build a text neural regression network and thereby solve the problem of predicting commodity logistics attributes.
To achieve the above object, according to one aspect of the embodiments of the present invention, a method for determining logistics attributes of an article is provided, including:
acquiring name information and logistics attribute information of articles to obtain a source data set;
determining, based on the source data set, a first matrix corresponding to the name information of the article and an attribute label corresponding to the logistics attribute information;
training with the first matrix and the attribute label as training samples to obtain an attribute prediction model;
and determining the logistics attribute information of the article to be predicted according to the attribute prediction model and the name information of the article to be predicted.
Optionally, determining, based on the source data set, a first matrix corresponding to the name information of the item includes:
generating an ordered dictionary according to the source data set;
mapping the name information in the source data set into a first vector according to the ordered dictionary;
initializing each word of the ordered dictionary into a second vector to obtain a second matrix corresponding to the ordered dictionary;
and obtaining a first matrix corresponding to the name information of the article according to the first vector and the second matrix.
Optionally, the ordered dictionary comprises words and numbers of the words;
mapping name information in the source data set to a first vector according to the ordered dictionary comprises:
determining the number of each word in the name information in the ordered dictionary;
mapping the name information in the source data set to a first vector according to the following formula (1):
M_k = \begin{cases} p, & \text{if the word numbered } k \text{ in the ordered dictionary occurs in the name information} \\ q, & \text{otherwise} \end{cases} \quad (k = 1, 2, \ldots, v) \qquad (1)
where v is the total number of words in the ordered dictionary, M_k is the k-th element of the first vector to which the name information is mapped, and p and q are distinct natural numbers.
Optionally, determining the attribute labels based on the source data set includes: normalizing each logistics attribute of the article to obtain the attribute label of that logistics attribute.
Optionally, normalizing each logistics attribute of the article to obtain its attribute label includes:
for each logistics attribute of the article, determining the mean and the standard deviation of that logistics attribute in the source data set;
and determining the attribute label of the logistics attribute of the article according to the mean and the standard deviation.
Optionally, the attribute label of the logistics attribute of the article is determined according to the following formula (2):
z_{i,j} = \frac{y_{i,j} - \mu_j}{\sigma_j}, \quad \mu_j = \frac{1}{N}\sum_{i=1}^{N} y_{i,j}, \quad \sigma_j = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(y_{i,j} - \mu_j\right)^2} \qquad (2)
where N is the number of articles in the source data set, y_{i,j} is the j-th logistics attribute of the i-th article, \mu_j is the mean of the j-th logistics attribute in the source data set, \sigma_j is the standard deviation of the j-th logistics attribute in the source data set, and z_{i,j} is the attribute label of the j-th logistics attribute of the i-th article.
Optionally, the attribute prediction model is obtained by training with the loss function shown in the following formula (3):
Loss = \frac{\theta}{N}\sum_{i=1}^{N}\sum_{j}\left(\hat{z}_{i,j} - z_{i,j}\right)^2 + \lambda\left(\sum_{g,h} W_{g,h}^2 + \sum_{k} b_k^2\right) \qquad (3)
where Loss is the loss function, \lambda is a preset first weight, W_{g,h} is the connection weight from the g-th neuron to the h-th neuron, b_k is the bias of the k-th neuron that has a bias, \theta is a preset second weight, \hat{z}_{i,j} is the predicted attribute label, \lambda\left(\sum_{g,h} W_{g,h}^2 + \sum_{k} b_k^2\right) is the L_2 regularization term, and the first term is the weighted mean square error.
Optionally, before the first matrix corresponding to the name information of the article and the attribute label corresponding to the logistics attribute information are determined based on the source data set, the method further includes: cleaning the source data set to remove useless content; and performing word segmentation and de-duplication on the cleaned source data set.
To achieve the above object, according to another aspect of the embodiments of the present invention, an apparatus for determining logistics attributes of an article is provided, including: a data acquisition module, configured to acquire name information and logistics attribute information of articles to obtain a source data set; a data processing module, configured to determine, based on the source data set, a first matrix corresponding to the name information of the article and an attribute label corresponding to the logistics attribute information; a model training module, configured to train with the first matrix and the attribute labels as training samples to obtain an attribute prediction model; and an attribute determining module, configured to determine the logistics attribute information of an article to be predicted according to the attribute prediction model and the name information of that article.
Optionally, the data processing module is further configured to: generate an ordered dictionary from the source data set; map the name information in the source data set to a first vector according to the ordered dictionary; initialize each word of the ordered dictionary as a second vector to obtain a second matrix corresponding to the ordered dictionary; and obtain the first matrix corresponding to the name information of the article according to the first vector and the second matrix.
Optionally, the ordered dictionary comprises words and numbers of the words;
the data processing module is further configured to: determine the number in the ordered dictionary of each word in the name information; and map the name information in the source data set to a first vector according to the following formula (1):
M_k = \begin{cases} p, & \text{if the word numbered } k \text{ in the ordered dictionary occurs in the name information} \\ q, & \text{otherwise} \end{cases} \quad (k = 1, 2, \ldots, v) \qquad (1)
where v is the total number of words in the ordered dictionary, M_k is the k-th element of the first vector to which the name information is mapped, and p and q are distinct natural numbers.
Optionally, the data processing module is further configured to normalize each logistics attribute of the article to obtain the attribute label of that logistics attribute.
Optionally, the data processing module is further configured to: for each logistics attribute of the article, determine the mean and the standard deviation of that logistics attribute in the source data set; and determine the attribute label of the logistics attribute of the article according to the mean and the standard deviation.
Optionally, the data processing module is further configured to determine the attribute label of the logistics attribute of the article according to the following formula (2):
z_{i,j} = \frac{y_{i,j} - \mu_j}{\sigma_j}, \quad \mu_j = \frac{1}{N}\sum_{i=1}^{N} y_{i,j}, \quad \sigma_j = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(y_{i,j} - \mu_j\right)^2} \qquad (2)
where N is the number of articles in the source data set, y_{i,j} is the j-th logistics attribute of the i-th article, \mu_j is the mean of the j-th logistics attribute in the source data set, \sigma_j is the standard deviation of the j-th logistics attribute in the source data set, and z_{i,j} is the attribute label of the j-th logistics attribute of the i-th article.
Optionally, the attribute prediction model is obtained by training according to a loss function shown in the following formula (3):
Loss = \frac{\theta}{N}\sum_{i=1}^{N}\sum_{j}\left(\hat{z}_{i,j} - z_{i,j}\right)^2 + \lambda\left(\sum_{g,h} W_{g,h}^2 + \sum_{k} b_k^2\right) \qquad (3)
where Loss is the loss function, \lambda is a preset first weight, W_{g,h} is the connection weight from the g-th neuron to the h-th neuron, b_k is the bias of the k-th neuron that has a bias, \theta is a preset second weight, \hat{z}_{i,j} is the predicted attribute label, \lambda\left(\sum_{g,h} W_{g,h}^2 + \sum_{k} b_k^2\right) is the L_2 regularization term, and the first term is the weighted mean square error.
Optionally, the apparatus further includes a data filtering module, configured to: clean the source data set to remove useless content; and perform word segmentation and de-duplication on the cleaned source data set.
To achieve the above object, according to another aspect of the embodiments of the present invention, an electronic device is provided, including: one or more processors; and a storage device for storing one or more programs which, when executed by the one or more processors, cause the one or more processors to implement the method for determining logistics attributes of an article according to the embodiments of the present invention.
To achieve the above object, according to another aspect of the embodiments of the present invention, a computer-readable medium is provided, on which a computer program is stored; when executed by a processor, the program implements the method for determining logistics attributes of an article according to the embodiments of the present invention.
One embodiment of the above invention has the following advantages or benefits: name information and logistics attribute information of articles are acquired to obtain a source data set; a first matrix corresponding to the name information of each article and an attribute label corresponding to its logistics attribute information are determined based on the source data set; the first matrices and attribute labels are used as training samples to train an attribute prediction model; and the logistics attribute information of an article to be predicted is determined according to the attribute prediction model and its name information. A text neural regression network can thus be built from collected name information and logistics attribute information to predict commodity logistics attributes, improving prediction accuracy and precision.
Further effects of the above optional, non-conventional implementations are described below in connection with the embodiments.
Drawings
The drawings are provided for a better understanding of the invention and do not unduly limit it. In the drawings:
FIG. 1 is a schematic diagram of the main flow of a method for determining logistics attributes of an article according to an embodiment of the invention;
FIG. 2 is a schematic diagram of a sub-flow of the method for determining logistics attributes of an article according to an embodiment of the invention;
FIG. 3 is a structural diagram of the neural network model used in the method for determining logistics attributes of an article according to an embodiment of the invention;
FIG. 4 is a schematic diagram of the main modules of an apparatus for determining logistics attributes of an article according to an embodiment of the invention;
FIG. 5 is a diagram of an exemplary system architecture to which embodiments of the invention can be applied;
FIG. 6 is a schematic structural diagram of a computer system suitable for implementing a terminal device or server of an embodiment of the invention.
Detailed Description
Exemplary embodiments of the present invention are described below with reference to the accompanying drawings, in which various details of embodiments of the invention are included to assist understanding, and which are to be considered as merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the invention. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
FIG. 1 shows a method for determining logistics attributes of an article according to an embodiment of the invention. As shown in FIG. 1, the method includes:
Step S101: acquire name information and logistics attribute information of articles to obtain a source data set.
In this step, the logistics attribute information of an article refers to the attributes of the packaged article during logistics circulation, such as the longest edge, the middle edge, the shortest edge, or the weight.
Step S102: determine, based on the source data set, a first matrix corresponding to the name information of the article and an attribute label corresponding to the logistics attribute information.
In an alternative embodiment, as shown in fig. 2, the process of determining the first matrix corresponding to the name information of the item based on the source data set includes:
Step S201: generate an ordered dictionary from the source data set;
Step S202: map the name information in the source data set to a first vector according to the ordered dictionary;
Step S203: initialize each word of the ordered dictionary as a second vector to obtain a second matrix corresponding to the ordered dictionary;
Step S204: obtain the first matrix corresponding to the name information of the article according to the first vector and the second matrix.
For step S201, before generating the ordered dictionary according to the source data set, data in the source data set may be preprocessed or filtered, which may specifically include:
cleaning the source data set to remove useless contents;
and performing word segmentation and de-duplication processing on the cleaned source data set.
Cleaning the source data set may include removing stop words from the name information of the articles, that is, removing typical text that does not contribute to the result. As a specific example, typical stop words include symbols such as + - ( ) ? 【 】 “ ” ! , . ？ 、 @ # …… & （ ）. In this embodiment, a stop word list may also be maintained, and any entry of the list that appears in name information in the source data set is deleted from that name information. For example, if the name information in the source data set is "[4K] tv set" or "high definition-tv set", the "[", "]" and "-" in the text are deleted, giving "4K tv set" and "high definition tv set".
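The cleaning step can be sketched as follows. This is a minimal illustration, assuming the stop list consists only of the symbols listed above plus the ASCII square brackets from the example; the name `STOP_SYMBOLS` and the choice to replace each removed symbol with a space (so English words joined by a hyphen stay separate) are assumptions of this sketch, not details from the text.

```python
# Hypothetical stop-symbol list, based on the example symbols in the text.
# A real system would maintain its own stop word list.
STOP_SYMBOLS = set("+-()?【】“”!,.？、@#…&（）[]")


def clean_name(name: str) -> str:
    """Replace each stop symbol with a space, then collapse runs of whitespace."""
    replaced = "".join(" " if ch in STOP_SYMBOLS else ch for ch in name)
    return " ".join(replaced.split())


print(clean_name("[4K] tv set"))             # 4K tv set
print(clean_name("high definition-tv set"))  # high definition tv set
```

For Chinese names the inserted spaces are harmless, because the next step segments the text into space-separated words anyway.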
After the useless content has been removed, the text in the source data set is segmented into words, for example with existing segmentation tools such as jieba or FoolNLTK. After segmentation, the words can be separated by spaces.
After word segmentation, the obtained words need to be subjected to de-duplication processing to remove duplicated words.
After the text in the source data set has been cleaned, segmented, and de-duplicated in turn, the remaining words are used to generate an ordered dictionary in which each word is given a unique number. For example, if the remaining words include high definition, television, refrigerator, 4K, Hami melon, watermelon, strawberry, and milk, the generated ordered dictionary may be as shown in Table 1 below:
Table 1:
Number  Word              Number  Word
1       High definition   2       Television
3       Refrigerator      4       4K
5       Hami melon        6       Watermelon
7       Strawberry        8       Milk
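De-duplication and dictionary numbering can be combined in one pass, as in the sketch below. It assumes the names have already been segmented into word lists (in practice a segmenter such as jieba, whose `jieba.lcut(text)` returns a list of words, would produce them; English stand-ins are used here) and that words are numbered in order of first appearance. The text only requires each word to receive a unique number, so that ordering and the function name are assumptions.

```python
def build_ordered_dictionary(segmented_names):
    """Number each distinct word from 1 in order of first appearance.

    segmented_names: an iterable of word lists, one per cleaned name.
    Skipping words already seen performs the de-duplication step.
    """
    dictionary = {}
    for words in segmented_names:
        for word in words:
            if word not in dictionary:
                dictionary[word] = len(dictionary) + 1
    return dictionary


names = [
    ["High definition", "Television"],
    ["Refrigerator"],
    ["4K", "High definition", "Television"],
    ["Hami melon"], ["Watermelon"], ["Strawberry"], ["Milk"],
]
print(build_ordered_dictionary(names))
```

With these inputs the numbering reproduces Table 1.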
Step S202 specifically includes the following process:
determining the number in the ordered dictionary of each word in the name information;
mapping the name information in the source data set to a first vector according to the following formula (1):
M_k = \begin{cases} p, & \text{if the word numbered } k \text{ in the ordered dictionary occurs in the name information} \\ q, & \text{otherwise} \end{cases} \quad (k = 1, 2, \ldots, v) \qquad (1)
where v is the total number of words in the ordered dictionary, M_k is the k-th element of the first vector to which the name information is mapped, and p and q are distinct natural numbers, for example p = 1 and q = 0.
As a specific example, if the name information is "4K high definition television", where "4K" is numbered 4, "high definition" is numbered 1, and "television" is numbered 2 in the ordered dictionary, the name information is mapped to the first vector T = (1, 1, 0, 1, 0, 0, 0, 0).
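The mapping to the first vector can be sketched in a few lines. This assumes, as the worked example suggests, that element k takes the value p when the word numbered k occurs in the name and q otherwise; ignoring words that are missing from the dictionary is an additional assumption, and the function name is hypothetical.

```python
def name_to_first_vector(words, dictionary, p=1, q=0):
    """Map a segmented name to a v-length vector.

    Element k (1-based) is p if the word numbered k occurs in the name, else q.
    Words missing from the dictionary are ignored (an assumption of this sketch).
    """
    present = {dictionary[w] for w in words if w in dictionary}
    return [p if k in present else q for k in range(1, len(dictionary) + 1)]


# Ordered dictionary from Table 1.
TABLE_1 = {"High definition": 1, "Television": 2, "Refrigerator": 3, "4K": 4,
           "Hami melon": 5, "Watermelon": 6, "Strawberry": 7, "Milk": 8}

print(name_to_first_vector(["4K", "High definition", "Television"], TABLE_1))
# [1, 1, 0, 1, 0, 0, 0, 0]
```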
For step S203, each word in the ordered dictionary may be initialized as a multi-dimensional vector, for example a 128-dimensional vector, using a common initialization method or a pre-training method, such as Kaiming He initialization, a uniform distribution U[0, 1) on the interval from 0 to 1, or word2vec. After each word is initialized as an R-dimensional vector, the entire ordered dictionary corresponds to a V × R second matrix W, where V is the number of words in the ordered dictionary. Word2vec (word to vector) is a family of related models used to generate word vectors; these models are shallow, two-layer neural networks trained to reconstruct the linguistic contexts of words.
The second matrix W is then multiplied by the first vector T obtained above to obtain the first matrix M = W × T corresponding to the name information.
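The formula-(1) vector has one entry per dictionary word, while the embedding layer described below consumes a matrix for a sequence of n words. One common way to obtain such a matrix, assumed here rather than stated in the text, is to treat each word as its own one-hot row: multiplying that row by W simply selects row k of W, so the first matrix becomes the n × R stack of the selected rows. The sketch initializes W from U[0, 1), one of the options listed above; the function names and the fixed seed are hypothetical.

```python
import random


def init_second_matrix(v, r, seed=0):
    """Build a v x R second matrix W with entries drawn from U[0, 1)."""
    rng = random.Random(seed)
    return [[rng.random() for _ in range(r)] for _ in range(v)]


def first_matrix(word_numbers, w):
    """Return the n x R first matrix for a name, given its words' dictionary numbers.

    Multiplying a one-hot row for word number k by W selects row k of W,
    so this lookup equals n one-hot products. Numbers are 1-based, as in the
    ordered dictionary, and the word order of the name is preserved.
    """
    return [list(w[k - 1]) for k in word_numbers]


W = init_second_matrix(v=8, r=128)
M = first_matrix([4, 1, 2], W)   # "4K high definition television"
print(len(M), len(M[0]))         # 3 128
```

Keeping the word order matters here because the convolution kernels described next slide over windows of consecutive words.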
After the first matrix corresponding to the name information of each article is obtained, an attribute tag corresponding to the logistics attribute information of each article needs to be determined. Specifically, the following processes may be included:
for each logistics attribute of the article, the logistics attribute is normalized to obtain its attribute label.
Specifically, this includes the following steps:
for each logistics attribute of the article, determining the mean of that logistics attribute in the source data set
\mu_j = \frac{1}{N}\sum_{i=1}^{N} y_{i,j}
and its standard deviation
\sigma_j = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(y_{i,j} - \mu_j\right)^2};
determining the attribute label of the logistics attribute of the article according to the mean \mu_j and the standard deviation \sigma_j:
z_{i,j} = \frac{y_{i,j} - \mu_j}{\sigma_j}
where N is the number of articles in the source data set, y_{i,j} is the j-th logistics attribute of the i-th article, \mu_j is the mean of the j-th logistics attribute in the source data set, \sigma_j is the standard deviation of the j-th logistics attribute in the source data set, and z_{i,j} is the attribute label of the j-th logistics attribute of the i-th article.
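The label computation is a per-attribute z-score. The sketch below is a minimal version, assuming the population standard deviation (dividing by N) and an N × J list-of-lists layout; the function name and the toy data are illustrative only.

```python
import math


def attribute_labels(y):
    """Standardize each column of y into attribute labels.

    y: N rows (articles) by J columns (logistics attributes).
    Returns (z, mu, sigma). Uses the population standard deviation (divide by N);
    a column with zero spread would raise ZeroDivisionError.
    """
    n, num_attrs = len(y), len(y[0])
    mu = [sum(row[j] for row in y) / n for j in range(num_attrs)]
    sigma = [math.sqrt(sum((row[j] - mu[j]) ** 2 for row in y) / n)
             for j in range(num_attrs)]
    z = [[(row[j] - mu[j]) / sigma[j] for j in range(num_attrs)] for row in y]
    return z, mu, sigma


# Toy data: (longest edge in cm, weight in kg) for three articles.
z, mu, sigma = attribute_labels([[10.0, 1.0], [20.0, 3.0], [30.0, 5.0]])
print(mu)  # [20.0, 3.0]
```

The returned mu and sigma must be stored with the model, because the prediction step maps the model output back to physical units with them.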
Once the first matrix corresponding to the name information of each article and the attribute label corresponding to its logistics attribute information have been obtained through the above steps, the neural network model can be constructed. This is step S103: train with the first matrix and the attribute labels as training samples to obtain an attribute prediction model.
Specifically, as shown in fig. 3, the constructed neural network model may include:
(1) Embedding layer (the first layer of the neural network model): the embedding layer determines the first matrix of each input sample, that is, the first matrix corresponding to a sequence of n words.
(2) Convolutional layer: as a specific example, this layer has convolution kernels of three sizes, (128, 3), (128, 4), and (128, 5), with 64 kernels of each size, and uses valid convolution, that is, no padding. A ReLU activation function is applied after the convolution. Activation functions play an important role in enabling artificial neural network models to learn very complex, nonlinear functions, because they introduce nonlinearity into the network. In this embodiment, the Rectified Linear Unit (ReLU) is preferably used as the activation function of the neurons; the sparsity that ReLU induces helps the model mine relevant features and fit the training data. ReLU has the following advantages over other activation functions: compared with linear functions, it has stronger expressive power, which is especially evident in deep networks; compared with other nonlinear functions, its gradient is constant on the non-negative interval, so it does not suffer from the vanishing gradient problem, and the convergence speed of the model remains stable.
(3) Pooling layer: the pooling layer uses max pooling.
(4) Output layer: in this embodiment, the output layer is fully connected, and the number of output nodes equals the number of logistics attributes.
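The four layers can be illustrated with an untrained, pure-Python forward pass. This is a sketch under several assumptions not spelled out in the text: max pooling is taken over all window positions of each kernel (so each kernel yields one feature), the output layer is a plain linear map producing one standardized label per attribute, there are four attributes (longest, middle, and shortest edge plus weight), and the weight initialization range is arbitrary. All function names are hypothetical.

```python
import random


def relu(x):
    return x if x > 0.0 else 0.0


def conv_relu_maxpool(m, kernel, bias):
    """Valid (unpadded) convolution of one kernel over the n x R matrix m,
    then ReLU, then max pooling over the n - h + 1 window positions.
    The kernel is stored as h rows of R weights (the (128, h) kernel transposed)."""
    h, n = len(kernel), len(m)
    if n < h:
        raise ValueError("valid convolution needs at least h words")
    activations = []
    for start in range(n - h + 1):
        s = bias
        for dh in range(h):
            s += sum(a * b for a, b in zip(m[start + dh], kernel[dh]))
        activations.append(relu(s))
    return max(activations)


def text_cnn_forward(m, filters, out_w, out_b):
    """filters: list of (kernel, bias); out_w: one weight row per logistics attribute."""
    features = [conv_relu_maxpool(m, k, b) for k, b in filters]
    return [sum(w * f for w, f in zip(row, features)) + b
            for row, b in zip(out_w, out_b)]


def random_params(r=128, sizes=(3, 4, 5), per_size=64, num_attributes=4, seed=0):
    """Small random weights; 3 sizes x 64 kernels = 192 pooled features."""
    rng = random.Random(seed)
    filters = [([[rng.uniform(-0.05, 0.05) for _ in range(r)] for _ in range(h)], 0.0)
               for h in sizes for _ in range(per_size)]
    out_w = [[rng.uniform(-0.05, 0.05) for _ in range(len(filters))]
             for _ in range(num_attributes)]
    return filters, out_w, [0.0] * num_attributes


rng = random.Random(1)
M = [[rng.random() for _ in range(128)] for _ in range(6)]  # a 6-word name
filters, out_w, out_b = random_params()
print(len(filters), len(text_cnn_forward(M, filters, out_w, out_b)))  # 192 4
```

Note that valid convolution with a width-5 kernel requires names of at least five words, so shorter names would need padding or a fallback in practice.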
After the neural network model is constructed, it must be trained with a loss function and a preset optimization method to obtain the final attribute prediction model.
Specifically, in this step, the error back-propagation algorithm and the Adam optimizer can be used to train and optimize the model; when the accuracy of the model stops improving, the network parameters with the highest accuracy are saved as the model parameters. The Adam optimizer was proposed by Diederik Kingma and Jimmy Ba in December 2014 and combines the advantages of two optimization algorithms, AdaGrad and RMSProp: it considers both the first moment estimate (the mean) and the second moment estimate (the uncentered variance) of the gradient to compute the update step. Back propagation (BP), short for "backward propagation of errors", is a common method used together with an optimization method (such as gradient descent) to train artificial neural networks. It computes the gradient of the loss function with respect to all weights in the network, and this gradient is fed to the optimization method, which updates the weights to minimize the loss function.
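The Adam update described above can be shown on a one-parameter problem. In this minimal sketch the gradient is supplied analytically instead of by back propagation; beta1 = 0.9, beta2 = 0.999, and eps = 1e-8 are the defaults suggested in the Adam paper, while the learning rate of 0.1 and the toy objective are assumptions chosen so the example converges quickly.

```python
import math


def adam_minimize(grad, x0, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8, steps=500):
    """Minimize a 1-D function with Adam, given its gradient function."""
    x, m, v = x0, 0.0, 0.0
    for t in range(1, steps + 1):
        g = grad(x)
        m = beta1 * m + (1 - beta1) * g      # first moment: running mean of gradients
        v = beta2 * v + (1 - beta2) * g * g  # second moment: running uncentered variance
        m_hat = m / (1 - beta1 ** t)         # bias corrections for the zero initialization
        v_hat = v / (1 - beta2 ** t)
        x -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return x


# Minimize (x - 3)^2, whose gradient is 2 * (x - 3).
print(adam_minimize(lambda x: 2.0 * (x - 3.0), x0=0.0))
```

In a real training loop the same update is applied element-wise to every weight and bias, with the gradients coming from back propagation.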
In an alternative embodiment, the loss function is given by the following formula (3):
Loss = \frac{\theta}{N}\sum_{i=1}^{N}\sum_{j}\left(\hat{z}_{i,j} - z_{i,j}\right)^2 + \lambda\left(\sum_{g,h} W_{g,h}^2 + \sum_{k} b_k^2\right) \qquad (3)
where Loss is the loss function, \lambda is a preset first weight, W_{g,h} is the connection weight from the g-th neuron to the h-th neuron, b_k is the bias of the k-th neuron that has a bias, \theta is a preset second weight, \hat{z}_{i,j} is the predicted attribute label, \lambda\left(\sum_{g,h} W_{g,h}^2 + \sum_{k} b_k^2\right) is the L_2 regularization term, and the first term is the weighted mean square error.
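The loss can be computed as in the sketch below. Because the exact form of the weighted mean square error did not survive extraction, this sketch assumes it is θ times the squared label error summed over attributes and averaged over the N articles, i.e. θ·(1/N)·Σ_i Σ_j (ẑ_{i,j} − z_{i,j})²; the flat lists of weights and biases and the function name are also assumptions.

```python
def weighted_mse_l2_loss(z_pred, z_true, weights, biases, lam, theta):
    """theta * (1/N) * sum_i sum_j (z_pred - z_true)^2 + lam * (sum W^2 + sum b^2).

    z_pred, z_true: N rows of J standardized labels.
    weights, biases: flat lists of all connection weights and biases.
    """
    n = len(z_true)
    mse = sum((p - t) ** 2
              for row_p, row_t in zip(z_pred, z_true)
              for p, t in zip(row_p, row_t)) / n
    l2 = lam * (sum(w * w for w in weights) + sum(b * b for b in biases))
    return theta * mse + l2


print(weighted_mse_l2_loss([[1.0, 0.0]], [[0.0, 0.0]], [1.0, 2.0], [1.0],
                           lam=0.5, theta=2.0))
# 5.0
```

Here the error term contributes 2.0 and the penalty 0.5 × (1 + 4 + 1) = 3.0; λ trades fit against weight size, and θ scales the fit term.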
After the final attribute prediction model is obtained, it can be used to determine the logistics attribute information of an article to be predicted. This is step S104: determine the logistics attribute information of the article to be predicted according to the attribute prediction model and the name information of the article to be predicted.
Specifically, this may include the following process:
1) Clean the name information of the article to be predicted, segment it into words, and input it into the attribute prediction model.
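2) After the output of the attribute prediction model is obtained, the logistics attribute information of the article to be predicted is recovered by inverse normalization, namely
y_j = \hat{z}_j \sigma_j + \mu_j
where y_j is the j-th logistics attribute of the article to be predicted, \hat{z}_j is the j-th output of the attribute prediction model, and \mu_j and \sigma_j are the mean and the standard deviation of the j-th logistics attribute in the source data set.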
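Inverse normalization is a per-attribute affine map. The sketch assumes the model output is a list of standardized labels, one per attribute, and that mu and sigma are the per-attribute statistics saved from the source data set; the function name and the numbers are illustrative only.

```python
def inverse_normalize(z_hat, mu, sigma):
    """Recover physical values y_j = z_hat_j * sigma_j + mu_j for one article."""
    return [z * s + m for z, m, s in zip(z_hat, mu, sigma)]


# Hypothetical model output for one article, with illustrative per-attribute
# mean and standard deviation from a training set.
print(inverse_normalize([-1.0, 0.5], mu=[20.0, 3.0], sigma=[10.0, 2.0]))  # [10.0, 4.0]
```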
In the method for determining logistics attributes of an article provided by the embodiment of the invention, a source data set is obtained by acquiring name information and logistics attribute information of articles; a first matrix corresponding to the name information of each article and an attribute label corresponding to its logistics attribute information are determined based on the source data set; and an attribute prediction model is trained with the first matrices and attribute labels as training samples, that is, a text neural regression network is built from collected name information and logistics attribute information. The logistics attribute information of an article to be predicted is then determined according to the attribute prediction model and that article's name information, which solves the problem of commodity logistics attribute prediction and improves prediction accuracy and precision.
In the above embodiments, the values of the logistics attributes are all continuous. In an alternative embodiment, the values of a logistics attribute may be discretized. For example, if the length of an article ranges from 0 to 100 cm, it may be discretized at intervals of 5 cm, so that the label corresponding to that length is z_i = 5 × i, where i = 1, 2, …, 20. After the values of the logistics attributes are discretized, the loss function used to train the model also needs to change, for example to cross entropy, specifically:
Figure BDA0002030315130000123
where
Figure BDA0002030315130000124
is the predicted label.
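A sketch of the discretized variant, assuming bins of width 5 cm over 0 to 100 cm (bin numbers i = 1 … 20, bin i covering lengths up to 5 × i cm) and the standard categorical cross-entropy, since the document's cross-entropy formula appears only as an image. `discretize` and `cross_entropy` are hypothetical helper names.

```python
import math


def discretize(value, low=0.0, high=100.0, width=5.0):
    """Map a continuous value to a bin number i in 1..n.

    Bin i covers (low + (i - 1) * width, low + i * width]; values at or below `low`
    fall into bin 1 and values above `high` are clamped to bin n.
    """
    n = int(round((high - low) / width))  # 20 bins for 0-100 cm with 5 cm width
    i = math.ceil((value - low) / width)
    return min(max(i, 1), n)


def cross_entropy(prob_rows, bin_numbers):
    """Mean categorical cross-entropy: -log of the probability given to the true bin."""
    eps = 1e-12  # avoids log(0) for a zero probability
    losses = [-math.log(max(probs[i - 1], eps)) for probs, i in zip(prob_rows, bin_numbers)]
    return sum(losses) / len(losses)


lengths_cm = [0.0, 5.0, 5.1, 37.0, 100.0]
print([discretize(x) for x in lengths_cm])  # [1, 1, 2, 8, 20]
```

A model that spreads probability uniformly over the 20 bins would score log(20) under this loss, which is a handy sanity check during training.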
Fig. 4 is a schematic diagram of the main modules of an apparatus 400 for determining logistics attributes of an article according to an embodiment of the present invention. As shown in fig. 4, the apparatus 400 includes:
a data obtaining module 401, configured to obtain name information and logistics attribute information of an article to obtain a source data set;
a data processing module 402, configured to determine, based on the source data set, a first matrix corresponding to the name information of the article and an attribute label corresponding to the logistics attribute information;
a model training module 403, configured to train the first matrix and the attribute labels as training samples to obtain an attribute prediction model;
and an attribute determining module 404, configured to determine logistics attribute information of the to-be-predicted item according to the attribute prediction model and name information of the to-be-predicted item.
Optionally, the data processing module 402 is further configured to:
generating an ordered dictionary according to the source data set;
mapping the name information in the source data set into a first vector according to the ordered dictionary;
initializing each word of the ordered dictionary into a second vector to obtain a second matrix corresponding to the ordered dictionary;
and obtaining a first matrix corresponding to the name information of the article according to the first vector and the second matrix.
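The ordered-dictionary and second-matrix steps might look like the sketch below. The ordering rule (descending word frequency, ties broken alphabetically), numbering from 1, and uniform random initialization in [-0.1, 0.1] are assumptions; the document does not specify them. The input is assumed to be already-segmented names.

```python
import random
from collections import Counter


def build_ordered_dictionary(tokenized_names):
    """Number every distinct word from 1, most frequent first, ties broken alphabetically."""
    counts = Counter(word for tokens in tokenized_names for word in tokens)
    ordered = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return {word: number for number, (word, _) in enumerate(ordered, start=1)}


def init_second_matrix(v, dim, seed=0, scale=0.1):
    """Second matrix: one random dim-length 'second vector' per word; row p-1 is word number p."""
    rng = random.Random(seed)
    return [[rng.uniform(-scale, scale) for _ in range(dim)] for _ in range(v)]


names = [["red", "ceramic", "mug"], ["blue", "ceramic", "plate"]]
dictionary = build_ordered_dictionary(names)
print(dictionary)  # {'ceramic': 1, 'blue': 2, 'mug': 3, 'plate': 4, 'red': 5}
second_matrix = init_second_matrix(len(dictionary), dim=4)
```

Seeding the generator keeps the initial second matrix reproducible across training runs; the rows are then updated as ordinary trainable parameters.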
The ordered dictionary comprises words and the number assigned to each word. The data processing module 402 is further configured to:
determining the number of each word in the name information in the ordered dictionary;
mapping the name information in the source data set to a first vector according to the following formula (1):
Figure BDA0002030315130000131
where v denotes the total number of words in the ordered dictionary, p denotes the number in the ordered dictionary of each word in the name information, $M_k$ denotes the k-th element of the first vector to which the name information is mapped, and p and q are distinct natural numbers.
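One plausible reading of formula (1), whose image is not reproduced here, is a multi-hot vector of length v: $M_k = 1$ when k equals the number p of some word in the name, and $M_k = 0$ for any other number q. The sketch below assumes that reading, and further assumes the first matrix is formed by keeping the rows of the second matrix whose $M_k$ equals 1. Both function names are hypothetical.

```python
def first_vector(name_tokens, dictionary):
    """Assumed reading of formula (1): M_k = 1 if k is the number p of a word in the name, else 0."""
    v = len(dictionary)
    present = {dictionary[w] for w in name_tokens if w in dictionary}
    return [1 if k in present else 0 for k in range(1, v + 1)]


def first_matrix(vector, second_matrix):
    """Assumed combination: keep the rows of the second matrix whose element M_k is 1."""
    return [row for m_k, row in zip(vector, second_matrix) if m_k == 1]


dictionary = {"ceramic": 1, "blue": 2, "mug": 3, "plate": 4, "red": 5}
m = first_vector(["red", "mug"], dictionary)
print(m)  # [0, 0, 1, 0, 1]
toy_second_matrix = [[float(k), -float(k)] for k in range(1, 6)]  # row k-1 belongs to word k
print(first_matrix(m, toy_second_matrix))  # [[3.0, -3.0], [5.0, -5.0]]
```

Under this reading the first matrix loses word order and repeated words; mapping a name to the sequence of its word numbers instead would preserve both, so the choice depends on how the text network consumes its input.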
Optionally, the data processing module 402 is further configured to: for each logistics attribute of the article, normalize the logistics attribute to obtain an attribute label of that logistics attribute.
Optionally, the data processing module 402 is further configured to: for each logistics attribute of the article, determine the mean and the mean square error (standard deviation) of that logistics attribute in the source data set, and determine the attribute label of the logistics attribute of the article according to the mean and mean square error.
Optionally, the data processing module 402 is further configured to determine the attribute label of the logistics attribute of the article according to the following formula (2):
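$$\mu_j = \frac{1}{N}\sum_{i=1}^{N} y_{i,j}, \qquad \sigma_j = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(y_{i,j} - \mu_j\right)^2}, \qquad z_{i,j} = \frac{y_{i,j} - \mu_j}{\sigma_j} \tag{2}$$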
where N denotes the number of articles in the source data set, $y_{i,j}$ denotes the j-th logistics attribute of the i-th article, $\mu_j$ denotes the mean of the j-th logistics attribute in the source data set, $\sigma_j$ denotes the mean square error of the j-th logistics attribute in the source data set, and $z_{i,j}$ denotes the attribute label of the j-th logistics attribute of the i-th article.
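A sketch of the per-attribute normalization, assuming formula (2) is the usual z-score: $\mu_j$ is the mean over the N articles, $\sigma_j$ is the population standard deviation (the "mean square error" above), and $z_{i,j} = (y_{i,j} - \mu_j)/\sigma_j$. The guard for $\sigma_j = 0$ and the function name are added assumptions.

```python
import math


def normalize_attributes(rows):
    """Per attribute j: mu_j, population std sigma_j, and z_ij = (y_ij - mu_j) / sigma_j."""
    n = len(rows)
    n_attrs = len(rows[0])
    mu = [sum(row[j] for row in rows) / n for j in range(n_attrs)]
    sigma = [math.sqrt(sum((row[j] - mu[j]) ** 2 for row in rows) / n) for j in range(n_attrs)]
    # Assumption: a constant attribute (sigma_j == 0) is left unscaled to avoid division by zero.
    safe_sigma = [s if s > 0 else 1.0 for s in sigma]
    labels = [[(row[j] - mu[j]) / safe_sigma[j] for j in range(n_attrs)] for row in rows]
    return labels, mu, sigma


# Two articles with attributes (length in cm, weight in kg).
labels, mu, sigma = normalize_attributes([[10.0, 1.0], [20.0, 3.0]])
print(labels, mu, sigma)  # [[-1.0, -1.0], [1.0, 1.0]] [15.0, 2.0] [5.0, 1.0]
```

The returned `mu` and `sigma` need to be stored with the model, because the same values are used to map predictions back to physical units at prediction time.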
Optionally, the apparatus further comprises a data filtering module, configured to: cleaning the source data set to remove useless contents; and performing word segmentation and de-duplication processing on the cleaned source data set.
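A sketch of this optional preprocessing. What counts as "useless content" is not specified, so this assumes HTML tags and punctuation; whitespace splitting stands in for a real word segmenter, and de-duplication is assumed to drop repeated (name, attributes) records. All function names are hypothetical.

```python
import re


def clean(name):
    """Assumed 'useless content': HTML tags and punctuation; text is also lowercased."""
    no_tags = re.sub(r"<[^>]+>", " ", name)
    return re.sub(r"[^\w\s]", " ", no_tags).lower()


def segment(text):
    """Whitespace split; a real system would use a word segmenter for Chinese names."""
    return text.split()


def preprocess(records):
    """Clean and segment each (name, attributes) record, dropping empty and duplicate records."""
    seen = set()
    result = []
    for name, attrs in records:
        tokens = tuple(segment(clean(name)))
        key = (tokens, tuple(attrs))
        if tokens and key not in seen:
            seen.add(key)
            result.append((list(tokens), list(attrs)))
    return result


records = [("<b>Red</b> Mug!", [10.0]), ("red mug", [10.0]), ("Plate", [20.0]), ("!!!", [1.0])]
print(preprocess(records))  # [(['red', 'mug'], [10.0]), (['plate'], [20.0])]
```

Deduplicating after cleaning, rather than on the raw strings, lets names that differ only in markup or punctuation collapse into a single training sample.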
The apparatus can execute the method provided by the embodiments of the present invention, and has the functional modules and beneficial effects corresponding to that method. For technical details not described in this embodiment, refer to the method provided by the embodiments of the present invention.
Fig. 5 illustrates an exemplary system architecture 500 to which the method or apparatus for determining logistics attributes of an article according to embodiments of the present invention may be applied.
As shown in fig. 5, the system architecture 500 may include terminal devices 501, 502, 503, a network 504, and a server 505. The network 504 serves to provide a medium for communication links between the terminal devices 501, 502, 503 and the server 505. Network 504 may include various connection types, such as wired, wireless communication links, or fiber optic cables, to name a few.
The user may use the terminal devices 501, 502, 503 to interact with a server 505 over a network 504 to receive or send messages or the like. The terminal devices 501, 502, 503 may have various communication client applications installed thereon, such as a shopping application, a web browser application, a search application, an instant messaging tool, a mailbox client, social platform software, and the like.
The terminal devices 501, 502, 503 may be various electronic devices having a display screen and supporting web browsing, including but not limited to smart phones, tablet computers, laptop portable computers, desktop computers, and the like.
The server 505 may be a server that provides various services, such as a background management server that supports shopping websites browsed by users using the terminal devices 501, 502, 503. The background management server may analyze and perform other processing on the received data such as the product information query request, and feed back a processing result (e.g., target push information and product information) to the terminal device.
It should be noted that the method for determining logistics attributes of an article provided by the embodiments of the present invention is generally executed by the server 505; accordingly, the apparatus for determining logistics attributes of an article is generally disposed in the server 505.
It should be understood that the number of terminal devices, networks, and servers in fig. 5 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.
Referring now to FIG. 6, a block diagram of a computer system 600 suitable for use with a terminal device implementing an embodiment of the invention is shown. The terminal device shown in fig. 6 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiments of the present invention.
As shown in fig. 6, the computer system 600 includes a Central Processing Unit (CPU) 601 that can perform various appropriate actions and processes according to a program stored in a Read Only Memory (ROM) 602 or a program loaded from a storage section 608 into a Random Access Memory (RAM) 603. The RAM 603 also stores various programs and data necessary for the operation of the system 600. The CPU 601, ROM 602, and RAM 603 are connected to each other via a bus 604. An input/output (I/O) interface 605 is also connected to the bus 604.
The following components are connected to the I/O interface 605: an input portion 606 including a keyboard, a mouse, and the like; an output portion 607 including a display such as a Cathode Ray Tube (CRT) or a Liquid Crystal Display (LCD), a speaker, and the like; a storage section 608 including a hard disk and the like; and a communication section 609 including a network interface card such as a LAN card or a modem. The communication section 609 performs communication processing via a network such as the Internet. A drive 610 is also connected to the I/O interface 605 as needed. A removable medium 611, such as a magnetic disk, an optical disk, a magneto-optical disk, or a semiconductor memory, is mounted on the drive 610 as necessary, so that a computer program read from it can be installed into the storage section 608 as needed.
In particular, according to the embodiments of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method illustrated in the flow chart. In such an embodiment, the computer program may be downloaded and installed from a network through the communication section 609, and/or installed from the removable medium 611. The computer program performs the above-described functions defined in the system of the present invention when executed by the Central Processing Unit (CPU) 601.
It should be noted that the computer readable medium shown in the present invention can be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present invention, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In the present invention, however, a computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wire, fiber optic cable, RF, etc., or any suitable combination of the foregoing.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams or flowchart illustration, and combinations of blocks in the block diagrams or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The modules described in the embodiments of the present invention may be implemented by software or hardware. The described modules may also be provided in a processor, which may be described as: a processor includes a data obtaining module, a data processing module, a model training module, and an attribute determining module. The names of these modules do not, in some cases, constitute a limitation on the modules themselves; for example, the data obtaining module may also be described as a "module that obtains name information and logistics attribute information of articles to obtain a source data set".
As another aspect, the present invention also provides a computer-readable medium, which may be included in the apparatus described in the above embodiments or may exist separately without being assembled into the apparatus. The computer-readable medium carries one or more programs which, when executed by a device, cause the device to perform the following:
acquiring name information and logistics attribute information of an article to acquire a source data set;
determining a first matrix corresponding to the name information of the article and an attribute label corresponding to the logistics attribute information based on the source data set;
training the first matrix and the attribute label as training samples to obtain an attribute prediction model;
and determining the logistics attribute information of the article to be predicted according to the attribute prediction model and the name information of the article to be predicted.
According to the technical solution of the embodiments of the present invention, a text neural regression network is generated from the collected name information and logistics attribute information to solve the problem of predicting commodity logistics attributes, thereby improving prediction accuracy and precision.
The above-described embodiments should not be construed as limiting the scope of the invention. Those skilled in the art will appreciate that various modifications, combinations, sub-combinations, and substitutions can occur, depending on design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (18)

1. A method for determining logistics attributes of an article, comprising:
acquiring name information and logistics attribute information of an article to acquire a source data set;
determining a first matrix corresponding to the name information of the article and an attribute label corresponding to the logistics attribute information based on the source data set;
training the first matrix and the attribute label as training samples to obtain an attribute prediction model;
and determining the logistics attribute information of the article to be predicted according to the attribute prediction model and the name information of the article to be predicted.
2. The method of claim 1, wherein determining, based on the source data set, a first matrix corresponding to name information for the item comprises:
generating an ordered dictionary according to the source data set;
mapping the name information in the source data set into a first vector according to the ordered dictionary;
initializing each word of the ordered dictionary into a second vector to obtain a second matrix corresponding to the ordered dictionary;
and obtaining a first matrix corresponding to the name information of the article according to the first vector and the second matrix.
3. The method of claim 2, wherein the ordered dictionary comprises words and numbers of the words;
mapping name information in the source data set to a first vector according to the ordered dictionary comprises:
determining the number of each word in the name information in the ordered dictionary;
mapping the name information in the source data set to a first vector according to the following formula (1):
Figure FDA0002030315120000011
where v denotes the total number of words in the ordered dictionary, p denotes the number in the ordered dictionary of each word in the name information, $M_k$ denotes the k-th element of the first vector to which the name information is mapped, and p and q are distinct natural numbers.
4. The method of claim 1, wherein determining the attribute label based on the source data set comprises:
for each logistics attribute of the article, regularizing the logistics attribute to obtain an attribute label of the logistics attribute of the article.
5. The method of claim 4, wherein for each logistics attribute of the item, regularizing the logistics attribute, and obtaining an attribute label for the logistics attribute of the item comprises:
for each logistics attribute of the article, determining a mean and a mean square error of the logistics attribute in the source data set;
and determining an attribute label of the logistics attribute of the article according to the mean and mean square error.
6. The method of claim 5, wherein an attribute label for the logistics attribute of the item is determined according to equation (2) below:
Figure FDA0002030315120000021
where N denotes the number of articles in the source data set, $y_{i,j}$ denotes the j-th logistics attribute of the i-th article, $\mu_j$ denotes the mean of the j-th logistics attribute in the source data set, $\sigma_j$ denotes the mean square error of the j-th logistics attribute in the source data set, and $z_{i,j}$ denotes the attribute label of the j-th logistics attribute of the i-th article.
7. The method of claim 6, wherein the attribute prediction model is trained according to a loss function represented by the following equation (3):
Figure FDA0002030315120000022
wherein Loss denotes the loss function, λ denotes a preset first weight, $W_{g,h}$ denotes the connection weight from the g-th neuron to the h-th neuron, b denotes the bias (offset) value of a connection, k denotes the number of a neuron having a bias value, θ denotes a preset second weight,
Figure FDA0002030315120000031
denotes the predicted attribute label, $\lambda\left(\sum_{g,h} W_{g,h}^2 + \sum_k b_k^2\right)$ denotes the L2 regularization term, and
Figure FDA0002030315120000032
denotes the weighted mean square error.
8. The method of any of claims 1-7, wherein prior to determining the first matrix corresponding to the name information of the item and the attribute label corresponding to the logistics attribute information based on the source data set, the method further comprises:
cleaning the source data set to remove useless contents;
and performing word segmentation and de-duplication processing on the cleaned source data set.
9. An apparatus for determining logistics attributes of an article, comprising:
the data acquisition module is used for acquiring the name information and the logistics attribute information of the article to acquire a source data set;
the data processing module is used for determining a first matrix corresponding to the name information of the article and an attribute label corresponding to the logistics attribute information based on the source data set;
the model training module is used for training the first matrix and the attribute labels as training samples to obtain an attribute prediction model;
and the attribute determining module is used for determining the logistics attribute information of the to-be-predicted article according to the attribute prediction model and the name information of the to-be-predicted article.
10. The apparatus of claim 9, wherein the data processing module is further configured to:
generating an ordered dictionary according to the source data set;
mapping the name information in the source data set into a first vector according to the ordered dictionary;
initializing each word of the ordered dictionary into a second vector to obtain a second matrix corresponding to the ordered dictionary;
and obtaining a first matrix corresponding to the name information of the article according to the first vector and the second matrix.
11. The apparatus of claim 10, wherein the ordered dictionary comprises words and numbers of the words;
the data processing module is further configured to:
determining the number of each word in the name information in the ordered dictionary;
mapping the name information in the source data set to a first vector according to the following formula (1):
Figure FDA0002030315120000041
where v denotes the total number of words in the ordered dictionary, p denotes the number in the ordered dictionary of each word in the name information, $M_k$ denotes the k-th element of the first vector to which the name information is mapped, and p and q are distinct natural numbers.
12. The apparatus of claim 9, wherein the data processing module is further configured to:
for each logistics attribute of the article, regularizing the logistics attribute to obtain an attribute label of the logistics attribute of the article.
13. The apparatus of claim 12, wherein the data processing module is further configured to:
for each logistics attribute of the article, determining a mean and a mean square error of the logistics attribute in the source data set;
and determining an attribute label of the logistics attribute of the article according to the mean and mean square error.
14. The apparatus of claim 13, wherein the data processing module is further configured to determine an attribute label for the logistics attribute of the article according to equation (2) below:
Figure FDA0002030315120000042
where N denotes the number of articles in the source data set, $y_{i,j}$ denotes the j-th logistics attribute of the i-th article, $\mu_j$ denotes the mean of the j-th logistics attribute in the source data set, $\sigma_j$ denotes the mean square error of the j-th logistics attribute in the source data set, and $z_{i,j}$ denotes the attribute label of the j-th logistics attribute of the i-th article.
15. The apparatus of claim 14, wherein the attribute prediction model is trained according to a loss function represented by the following equation (3):
Figure FDA0002030315120000051
wherein Loss denotes the loss function, λ denotes a preset first weight, $W_{g,h}$ denotes the connection weight from the g-th neuron to the h-th neuron, b denotes the bias (offset) value of a connection, k denotes the number of a neuron having a bias value, θ denotes a preset second weight,
Figure FDA0002030315120000052
denotes the predicted attribute label, $\lambda\left(\sum_{g,h} W_{g,h}^2 + \sum_k b_k^2\right)$ denotes the L2 regularization term, and
Figure FDA0002030315120000053
denotes the weighted mean square error.
16. The apparatus according to any one of claims 9-15, further comprising a data screening module configured to:
cleaning the source data set to remove useless contents;
and performing word segmentation and de-duplication processing on the cleaned source data set.
17. An electronic device, comprising:
one or more processors;
a storage device for storing one or more programs,
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method of any one of claims 1-8.
18. A computer-readable medium, on which a computer program is stored, which, when being executed by a processor, carries out the method according to any one of claims 1-8.
CN201910307387.8A 2019-04-17 2019-04-17 Method and device for determining physical distribution attribute of article Pending CN111832782A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910307387.8A CN111832782A (en) 2019-04-17 2019-04-17 Method and device for determining physical distribution attribute of article

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910307387.8A CN111832782A (en) 2019-04-17 2019-04-17 Method and device for determining physical distribution attribute of article

Publications (1)

Publication Number Publication Date
CN111832782A true CN111832782A (en) 2020-10-27

Family

ID=72915430

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910307387.8A Pending CN111832782A (en) 2019-04-17 2019-04-17 Method and device for determining physical distribution attribute of article

Country Status (1)

Country Link
CN (1) CN111832782A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113761853A (en) * 2020-11-11 2021-12-07 Beijing Wodong Tianjun Information Technology Co Ltd Data screening method and device
WO2023155425A1 (en) * 2022-02-17 2023-08-24 Beijing Jingdong Zhenshi Information Technology Co Ltd Goods transfer method and apparatus, electronic device, and computer-readable medium

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20160155067A1 (en) * 2014-11-20 2016-06-02 Shlomo Dubnov Mapping Documents to Associated Outcome based on Sequential Evolution of Their Contents
CN106126597A (en) * 2016-06-20 2016-11-16 乐视控股(北京)有限公司 User property Forecasting Methodology and device
CN107145977A (en) * 2017-04-28 2017-09-08 电子科技大学 A kind of method that structured attributes deduction is carried out to online social network user
CN108829763A (en) * 2018-05-28 2018-11-16 电子科技大学 A kind of attribute forecast method of the film review website user based on deep neural network

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
CHAOHONG SONG ET AL: "A Bayesian Dynamic Forecast Model Based on Neural Network", Proceedings of the 2008 International Symposium on Intelligent Information Technology Application Workshops, pages 130 - 132 *
Zhang Shijun et al.: "Personalized Recommendation System Based on Matrix Factorization", Journal of Chinese Information Processing, vol. 31, no. 3, pages 134 - 139 *

Similar Documents

Publication Publication Date Title
CN107729937B (en) Method and device for determining user interest tag
US11062089B2 (en) Method and apparatus for generating information
CN109840730B (en) Method and device for data prediction
CN112200601B (en) Item recommendation method, device and readable storage medium
WO2023016173A1 (en) Inventory adjustment method and apparatus, electronic device, and computer readable medium
CN109063935A (en) A kind of method, apparatus and storage medium of prediction task processing time
CN111046170A (en) Method and apparatus for outputting information
CN113743971A (en) Data processing method and device
CN112861895B (en) Abnormal article detection method and device
CN111832782A (en) Method and device for determining physical distribution attribute of article
CN113763019A (en) User information management method and device
CN112749323A (en) Method and device for constructing user portrait
CN114663015A (en) Replenishment method and device
CN113779380A (en) Cross-domain recommendation method, device and equipment, and content recommendation method, device and equipment
CN109255563B (en) Method and device for determining storage area of article
CN110619400A (en) Method and device for generating order information
CN111428486B (en) Article information data processing method, device, medium and electronic equipment
CN112783468A (en) Target object sorting method and device
CN107357847B (en) Data processing method and device
CN110766431A (en) Method and device for judging whether user is sensitive to coupon
CN113379173B (en) Method and device for marking warehouse goods with labels
CN110827101A (en) Shop recommendation method and device
CN114677174A (en) Method and device for calculating sales volume of unladen articles
CN111274383B (en) Object classifying method and device applied to quotation
CN108255880A (en) Data processing method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination