CN112529646A - Commodity classification method and device - Google Patents

Info

Publication number
CN112529646A
Authority
CN
China
Prior art keywords
commodity
classification
category
description information
training data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910881276.8A
Other languages
Chinese (zh)
Inventor
陈宏申
赵佳枢
殷大伟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Jingdong Century Trading Co Ltd
Beijing Jingdong Shangke Information Technology Co Ltd
Original Assignee
Beijing Jingdong Century Trading Co Ltd
Beijing Jingdong Shangke Information Technology Co Ltd

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00Commerce
    • G06Q30/06Buying, selling or leasing transactions
    • G06Q30/0601Electronic shopping [e-shopping]
    • G06Q30/0603Catalogue ordering
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90Details of database functions independent of the retrieved data types
    • G06F16/906Clustering; Classification
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00Commerce
    • G06Q30/06Buying, selling or leasing transactions
    • G06Q30/0601Electronic shopping [e-shopping]
    • G06Q30/0621Item configuration or customization

Abstract

The invention discloses a commodity classification method and device, and relates to the technical field of computers. One embodiment of the method comprises: acquiring commodity classification training data; training a neural network with the commodity classification training data to obtain a commodity classification model; using the commodity classification model, according to the commodity description information of a commodity to be classified, to predict both the probability that the commodity belongs to each commodity category in the commodity classification table and the probability that each word in the commodity description information serves as a commodity category; and determining the commodity category to which the commodity belongs according to the predicted probabilities ranked from high to low. This embodiment ensures the validity and reliability of the commodity classification training data, realizes automatic classification of commodities, and can mine new words that may serve as commodity categories.

Description

Commodity classification method and device
Technical Field
The invention relates to the technical field of computers, in particular to a commodity classification method and device.
Background
As the variety of commodities keeps growing, commodities need to be classified so that they can be managed easily and so that users can quickly find, among a large number of commodities, those they are interested in or intend to purchase.
At present, a common approach is to classify commodities manually using a manually defined hierarchical classification structure. For example, a flat-panel television is classified under household appliances, large household appliances and flat-panel televisions, where household appliances is the top-level category, large household appliances is the second-level category, and flat-panel televisions is the lowest-level category.
In the process of implementing the invention, the inventors found at least the following problems in the prior art: because commodities are so varied, manual classification requires a large amount of labor and is inefficient; the same commodity can fall into several categories, and manual classification has difficulty finding the most suitable one; and the manually defined hierarchical classification structure is itself limited, is inconvenient to adjust, and adapts poorly to newly emerging commodity categories.
Disclosure of Invention
In view of this, the present invention provides a commodity classification method and device that not only classify commodities automatically but also mine possible new commodity categories on the basis of the existing ones.
To achieve the above object, according to a first aspect of the present invention, there is provided a method of classifying commodities, including:
acquiring commodity classification training data, wherein the commodity classification training data comprises commodity description information and corresponding commodity classes, the commodity description information is acquired according to commodity identification, and the commodity classes are defined in a predefined commodity classification table;
training to obtain a commodity classification model for commodity classification based on a neural network by using the commodity classification training data;
using the commodity classification model, according to the commodity description information of a commodity to be classified, to predict the probability that the commodity belongs to each commodity category in the commodity classification table and the probability that each word in the commodity description information serves as a commodity category; and determining the commodity category to which the commodity to be classified belongs according to the predicted probabilities ranked from high to low.
Optionally, when a word in the commodity description information of the commodity to be classified is determined to be the commodity category to which the commodity belongs, or when the probability of the word serving as a commodity category is greater than a threshold probability, the word is added to the commodity classification table as a commodity category.
Optionally, the acquiring of the commodity classification training data includes:
acquiring commodity click data, wherein the commodity click data comprises one or more search keywords for searching commodities and commodity identifications of commodities selected according to the search keywords;
and under the condition that the search keyword is the commodity category defined in the commodity classification table, determining commodity description information corresponding to the commodity identification, and constructing the commodity category and the commodity description information corresponding to the search keyword into commodity classification training data.
Optionally, the method further comprises: when a commodity is selected via one or more search keywords that are commodity categories, constructing commodity classification training data from the commodity description information of the commodity and the commodity category corresponding to the search keyword under which the commodity was selected the most times.
Optionally, the method further comprises: when a search keyword that is a commodity category corresponds to one or more commodity identifications, determining a predefined proportion of those commodity identifications as the commodity identifications corresponding to the commodity category, and constructing commodity classification training data from the commodity category and the commodity description information corresponding to that predefined proportion of commodity identifications.
Optionally, the neural network comprises a word segmentation processing layer, a commodity category identification layer and a commodity category prediction layer; the training of the commodity classification training data based on the neural network to obtain a commodity classification model for commodity classification comprises the following steps:
processing one or more words in the commodity description information into word vectors by using the word segmentation processing layer;
calculating, by using the commodity category identification layer, the probability that the word corresponding to each word vector serves as a commodity category, and calculating a commodity description vector from the word vectors and those probabilities;
predicting, by using the commodity category prediction layer, the probability that the commodity belongs to each commodity category in the commodity classification table according to the commodity description vector;
and adjusting the neural network until the commodity category corresponding to the predicted maximum probability value is the commodity category corresponding to the commodity description information in the commodity classification training data.
To achieve the above object, according to a second aspect of the present invention, there is provided an apparatus for classifying commodities, comprising: a commodity classification training data acquisition module, a commodity classification model determination module and a commodity category prediction module; wherein
the commodity classification training data acquisition module is used for acquiring commodity classification training data, wherein the commodity classification training data comprises commodity description information and corresponding commodity classes, the commodity description information is acquired according to commodity identification, and the commodity classes are defined in a predefined commodity classification table;
the commodity classification model determining module is used for training to obtain a commodity classification model for commodity classification based on a neural network by using the commodity classification training data;
the commodity category prediction module is used for predicting the probability that the commodity to be classified belongs to the commodity category in the commodity classification table and the probability that the word in the commodity description information of the commodity to be classified is taken as the commodity category by using the commodity classification model according to the description information of the commodity to be classified; and determining the commodity category to which the commodity to be classified belongs according to the sequence of the predicted probabilities from high to low.
Optionally, the commodity classification training data acquisition module is further configured to add a word to the commodity classification table as a commodity category when the word, which appears in the description information of the commodity to be classified, is determined to be the commodity category to which the commodity belongs, or when the probability of the word serving as a commodity category is greater than a threshold probability.
Optionally, the acquiring of the commodity classification training data includes:
acquiring commodity click data, wherein the commodity click data comprises one or more search keywords for searching commodities and commodity identifications of commodities selected according to the search keywords;
and under the condition that the search keyword is the commodity category defined in the commodity classification table, determining commodity description information corresponding to the commodity identification, and constructing the commodity category and the commodity description information corresponding to the search keyword into commodity classification training data.
Optionally, the method further comprises: when the commodity is selected according to one or more search keywords as the commodity category, constructing the commodity category corresponding to the search keyword which causes the commodity to be selected the most times and the commodity description information of the commodity as commodity classification training data.
Optionally, the method further comprises: when the search keyword as the commodity category corresponds to one or more commodity identifications, determining the commodity identifications in a predefined proportion as the commodity identifications corresponding to the commodity category, and constructing the commodity description information corresponding to the commodity category and the commodity identifications in the predefined proportion as commodity classification training data.
Optionally, the neural network comprises a word segmentation processing layer, a commodity category identification layer and a commodity category prediction layer; the training of the commodity classification training data based on the neural network to obtain a commodity classification model for commodity classification comprises the following steps:
processing one or more words in the commodity description information into word vectors by using the word segmentation processing layer;
calculating, by using the commodity category identification layer, the probability that the word corresponding to each word vector serves as a commodity category, and calculating a commodity description vector from the word vectors and those probabilities;
predicting the probability of the commodities belonging to the commodity category in the commodity classification table according to the commodity description vector by using the commodity category prediction layer;
and adjusting the neural network until the commodity category corresponding to the predicted maximum probability value is the commodity category corresponding to the commodity description information in the commodity classification training data.
To achieve the above object, according to a third aspect of the present invention, there is provided a server for classifying commodities, comprising: one or more processors; and a storage device for storing one or more programs which, when executed by the one or more processors, cause the one or more processors to implement any one of the commodity classification methods described above.
To achieve the above object, according to a fourth aspect of the present invention, there is provided a computer-readable medium having stored thereon a computer program which, when executed by a processor, implements any one of the commodity classification methods described above.
One embodiment of the above invention has the following advantages or benefits: based on a predefined commodity classification table and massive commodity click data, training data for determining a commodity classification model is obtained, and comprehensiveness and effectiveness of the training data are guaranteed; meanwhile, the commodity description information is processed through the commodity classification model, so that the attribution of commodities in the predefined commodity classification table can be automatically predicted, new words which can be used for commodity classification can be mined, the predefined commodity classification table can be expanded, and the commodity classification accuracy is improved.
Further effects of the optional implementations mentioned above are described below in connection with the specific embodiments.
Drawings
The drawings are included to provide a better understanding of the invention and are not to be construed as unduly limiting the invention. Wherein:
FIG. 1 is a schematic diagram of the main flow of a commodity classification method according to an embodiment of the present invention;
FIG. 2a is a schematic diagram of the main structure of a neural network according to an embodiment of the present invention;
FIG. 2b is a schematic diagram of the main structure of a word segmentation processing layer according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of the main structure of a commodity classification apparatus according to an embodiment of the present invention;
FIG. 4 is an exemplary system architecture diagram in which embodiments of the present invention may be employed;
FIG. 5 is a schematic block diagram of a computer system suitable for use in implementing a terminal device or server of an embodiment of the invention.
Detailed Description
Exemplary embodiments of the present invention are described below with reference to the accompanying drawings, in which various details of embodiments of the invention are included to assist understanding, and which are to be considered as merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the invention. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
As shown in fig. 1, an embodiment of the present invention provides a method for classifying a commodity, which may specifically include the following steps:
step S101, commodity classification training data is obtained, wherein the commodity classification training data comprises commodity description information and corresponding commodity categories, the commodity description information is obtained according to commodity identification, and the commodity categories are defined in a predefined commodity classification table.
The commodity identification is any information which can be used for identifying the commodity, such as a commodity ID and the like, and commodity description information corresponding to the commodity can be acquired from various channels such as an e-commerce platform, a commodity information base, a database and the like according to the commodity identification, such as a commodity title, a commodity model and the like; the predefined commodity classification table defines commodity categories, such as a manually set commodity three-level classification table and the like.
In an optional embodiment, the obtaining of the commodity classification training data includes: acquiring commodity click data, wherein the commodity click data comprises one or more search keywords for searching commodities and commodity identifications of commodities selected according to the search keywords; and under the condition that the search keyword is the commodity category defined in the commodity classification table, determining commodity description information corresponding to the commodity identification, and constructing the commodity category and the commodity description information corresponding to the search keyword into commodity classification training data.
It can be understood that there may be one or more commodities selected (e.g., selected in a click manner) after being searched by using the same search keyword, there may also be one or more corresponding commodity identifiers, and the same commodity may also be selected multiple times, so that the corresponding same commodity identifier may appear multiple times, each time corresponds to a selection record, and therefore, the number of times that the corresponding commodity is selected may be calculated according to the number of times that the commodity identifier appears.
Specifically, referring to Table 1, the search keywords included in the commodity click data are search keyword 1, search keyword 2, search keyword 3, and so on. Taking search keyword 1 as an example, after a search with search keyword 1, the commodity identifications corresponding to the commodities clicked or selected by the searching user are: commodity identification 1, commodity identification 2, commodity identification 3 and commodity identification 1. Commodity identification 1 thus corresponds to 2 records, that is, the number of selections for commodity identification 1 is 2. One or more identical commodities may also be found with different search keywords, so different search keywords may correspond to the same commodity identification, such as commodity identification 1, which corresponds to both search keyword 1 and search keyword 2 in Table 1 below.
On this basis, because search keywords are set or entered by users, many of them do not match the target commodity or cannot retrieve it well. To ensure the validity of the commodity classification training data, the obtained commodity click data can therefore be screened by determining whether search keyword 1, search keyword 2 and search keyword 3 are commodity categories in the predefined commodity classification table. Taking search keyword 1 as an example again: if search keyword 1 is not a commodity category, search keyword 1 and its corresponding commodity identifications are filtered out; if search keyword 1 is a commodity category, the corresponding commodity description information is acquired according to commodity identification 1, commodity identification 2 and commodity identification 3 corresponding to the search keyword, and search keyword 1 and that commodity description information then form commodity classification training data.
Table 1. Example of commodity click data
[Table 1 is an image in the original publication and is not reproduced here.]
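The screening described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation of the invention: the click records, category names, commodity IDs, descriptions and the function name build_training_data are all hypothetical, and the records only mirror the pattern described for Table 1 (one keyword selects ID 1 twice, another selects it once).

```python
from collections import Counter

# Hypothetical click log: one (search keyword, commodity ID) pair per selection record.
CLICK_RECORDS = [
    ("sofa", "id1"), ("sofa", "id2"), ("sofa", "id3"), ("sofa", "id1"),
    ("television", "id1"),
    ("gift for mom", "id2"),  # free-form keyword that is not a commodity category
]
# Hypothetical predefined commodity classification table.
CATEGORY_TABLE = {"sofa", "television", "watch"}
# Hypothetical description lookup keyed by commodity ID.
DESCRIPTIONS = {
    "id1": "home green sofa",
    "id2": "leather corner sofa",
    "id3": "fabric sofa bed",
}


def build_training_data(records, categories, descriptions):
    """Keep records whose keyword is a known category and count selections per pair."""
    counts = Counter((kw, cid) for kw, cid in records if kw in categories)
    return [
        {"description": descriptions[cid], "category": kw, "selections": n}
        for (kw, cid), n in counts.items()
    ]


training_data = build_training_data(CLICK_RECORDS, CATEGORY_TABLE, DESCRIPTIONS)
```

The selection count is kept on each row because the optional refinements described next both rely on it.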
In an optional embodiment, when a commodity is selected via one or more search keywords that are commodity categories, commodity classification training data is constructed from the commodity description information of the commodity and the commodity category corresponding to the search keyword under which the commodity was selected the most times. Specifically, referring to Table 1, suppose search keyword 1 and search keyword 2 are both commodity categories. The commodity corresponding to commodity identification 1 is selected under both search keywords, but it is selected 2 times under search keyword 1 and only 1 time under search keyword 2. Commodity identification 1 is therefore judged more likely to belong to the commodity category corresponding to search keyword 1, so that commodity category and the commodity description information corresponding to commodity identification 1 form commodity classification training data, while the record pairing search keyword 2 with commodity identification 1 is discarded.
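One possible way to express this rule in Python is shown below. The function name pick_dominant_category and the input layout (a mapping from keyword and commodity ID pairs to selection counts) are assumptions made for illustration. The text does not say how ties are resolved; this sketch simply keeps the first keyword seen.

```python
def pick_dominant_category(selection_counts):
    """Map each commodity ID to the category keyword under which it was selected most.

    selection_counts: {(keyword, commodity_id): times_selected}
    Ties keep the first keyword encountered (an assumption, not from the source).
    """
    best = {}
    for (keyword, commodity_id), times in selection_counts.items():
        if commodity_id not in best or times > best[commodity_id][1]:
            best[commodity_id] = (keyword, times)
    return {cid: kw for cid, (kw, _) in best.items()}


# Counts matching the example in the text: ID 1 is selected twice under
# keyword 1 and once under keyword 2.
counts = {("keyword1", "id1"): 2, ("keyword2", "id1"): 1, ("keyword1", "id2"): 1}
labels = pick_dominant_category(counts)
```

Here the pairing of keyword 2 with ID 1 is dropped, as in the example above.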
In an optional embodiment, when a search keyword that is a commodity category corresponds to one or more commodity identifications, a predefined proportion of those commodity identifications is determined as the commodity identifications corresponding to the commodity category, and commodity classification training data is constructed from the commodity category and the commodity description information corresponding to that predefined proportion of commodity identifications. The predefined proportion may be any proportion set according to actual requirements, such as 20%, 25% or 50%. Specifically, still referring to Table 1 and taking search keyword 1 as a commodity category, a search with search keyword 1 yields 3 distinct selected commodity identifications: commodity identification 1, commodity identification 2 and commodity identification 3. Commodity identification 1 occurs 2 times, commodity identification 2 occurs 1 time, and commodity identification 3 occurs 1 time. The more often a commodity identification occurs, the more often the corresponding commodity was selected, that is, the more likely the commodity belongs to search keyword 1. A certain proportion of commodity identifications can therefore be selected in descending order of occurrence frequency. For example, if 50% of the commodity identifications are selected, the commodity identifications retained for search keyword 1 are commodity identification 1 and commodity identification 2. On this basis, search keyword 1 and the commodity description information corresponding to commodity identification 1 and commodity identification 2 form commodity classification training data.
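The proportion-based selection can be sketched as follows. The function name select_top_proportion is hypothetical, and rounding the cut-off up is an assumption: the text does not state a rounding rule, but rounding up reproduces its example, in which 50% of 3 identifications retains 2.

```python
import math
from collections import Counter


def select_top_proportion(selected_ids, proportion):
    """Keep the most frequently selected commodity IDs for one category keyword.

    selected_ids: commodity IDs, one entry per selection record.
    proportion:   fraction of distinct IDs to keep (rounded up; an assumption).
    """
    counts = Counter(selected_ids)
    keep = math.ceil(len(counts) * proportion)
    # most_common orders equal counts by first occurrence.
    return [cid for cid, _ in counts.most_common(keep)]


# Selection records for search keyword 1 as described in the text.
kept = select_top_proportion(["id1", "id2", "id3", "id1"], 0.5)
```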
Step S102, a commodity classification model for commodity classification is trained based on a neural network by using the commodity classification training data.
That is, the neural network is trained with the commodity description information and the commodity categories included in the commodity classification training data. Specifically, take the commodity description information "home green sofa" and the commodity category "sofa" as an example. The commodity description information is input into a preset neural network, such as a convolutional neural network or a spiral residual neural network, which processes it and predicts the corresponding commodity category. The preset neural network is adjusted according to whether the predicted commodity category is consistent with "sofa"; once the commodity category predicted by the neural network is "sofa", the current neural network is taken as the commodity classification model for commodity classification.
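The "adjust until the prediction matches the label" loop can be illustrated with a deliberately simple stand-in. The sketch below uses a single-example softmax regression over a bag-of-words vector, not the convolutional or spiral residual networks named in the text; the vocabulary, category list, learning rate and function names are all assumptions.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def train_until_match(features, label_index, num_classes, lr=0.5, max_steps=100):
    """Adjust weights until the top-scoring class equals the labeled class."""
    weights = [[0.0] * len(features) for _ in range(num_classes)]
    for step in range(max_steps + 1):
        scores = [sum(w * x for w, x in zip(row, features)) for row in weights]
        probs = softmax(scores)
        # On ties, max() returns the lowest index.
        predicted = max(range(num_classes), key=probs.__getitem__)
        if predicted == label_index:
            return weights, step
        # Gradient step on the cross-entropy loss for this single example.
        for c in range(num_classes):
            target = 1.0 if c == label_index else 0.0
            for i, x in enumerate(features):
                weights[c][i] += lr * (target - probs[c]) * x
    raise RuntimeError("prediction did not match the label within max_steps")


CATEGORIES = ["television", "sofa", "watch"]  # hypothetical classification table
VOCABULARY = ["home", "green", "sofa"]        # hypothetical vocabulary
features = [1.0, 1.0, 1.0]                    # bag-of-words vector for "home green sofa"
weights, steps = train_until_match(features, CATEGORIES.index("sofa"), len(CATEGORIES))
```

A practical system would train over many examples and stop on a validation criterion; the single-example stopping rule here only mirrors the wording of the paragraph above.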
Referring to FIG. 2a, in an optional embodiment, the neural network includes a word segmentation processing layer, a commodity category identification layer and a commodity category prediction layer. Training, based on the neural network and using the commodity classification training data, to obtain a commodity classification model for commodity classification comprises: processing one or more words in the commodity description information into word vectors by using the word segmentation processing layer; calculating, by using the commodity category identification layer, the probability that the word corresponding to each word vector serves as a commodity category, and calculating a commodity description vector from the word vectors and those probabilities; predicting, by using the commodity category prediction layer, the probability that the commodity belongs to each commodity category in the commodity classification table according to the commodity description vector; and adjusting the neural network until the commodity category corresponding to the predicted maximum probability value is the commodity category corresponding to the commodity description information in the commodity classification training data. It is understood that the word segmentation processing layer, the commodity category identification layer and the commodity category prediction layer may each be a single-layer or multi-layer neural network set according to actual needs, such as a convolutional neural network, a recurrent neural network or a spiral residual neural network.
Further, referring to FIG. 2b, the word segmentation processing layer may include one or more character-level convolution layers and one or more spiral residual hidden layers, where the character-level convolution layers process the one or more words obtained by segmenting the commodity description information into primary word vectors, and the spiral residual hidden layers further process the primary word vectors into word vectors according to the combination characteristics of the one or more words (such as the size, proportion and collocation relationships among the words).
More specifically, take the commodity description information "home green couch" and the commodity category "couch" as an example. Before use, the commodity description information is segmented into words; for example, "home green couch" is split into 3 words: "home", "green" and "couch". The words are input into the word segmentation processing layer, which maps each character of each input word to a character vector according to the character's identifier and thereby obtains the word vectors for "home", "green" and "couch", denoted S1, S2 and S3; all word vectors have the same dimension (for example, 5). The word vectors are input into the commodity category identification layer, which uses a classification function such as softmax to determine the probabilities that the 3 corresponding words serve as the commodity category of the commodity description information, denoted P1, P2 and P3. The word vectors S1, S2 and S3 and the probabilities P1, P2 and P3 are then input into the commodity category prediction layer, and the commodity description vector is calculated by the following formula:
commodity description vector = S1 × P1 + S2 × P2 + S3 × P3
On this basis, the commodity category prediction layer uses the commodity description vector to calculate the probabilities p1, p2, p3, etc. that each commodity category in the predefined commodity classification table is the category to which the commodity description information belongs. For example, if the commodity classification table includes "couch", "tv", and "watch" and the resulting probabilities are 0.8, 0.1, and 0.1 respectively, then the probability that "home green couch" belongs to the commodity category "couch" is 0.8, and the commodity category corresponding to "home green couch" is therefore predicted to be "couch".
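The worked example above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the implementation of the embodiment: the 5-dimensional word vectors, the word probabilities 0.1, 0.1, 0.8, the weight matrix CATEGORY_WEIGHTS, and the helper names are hypothetical values chosen for the example; only the weighted-sum formula and the use of a softmax-style classification function come from the text.

```python
import math

def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def description_vector(word_vectors, word_probs):
    """Commodity description vector = S1*P1 + S2*P2 + S3*P3 (element-wise)."""
    dim = len(word_vectors[0])
    return [sum(p * v[i] for v, p in zip(word_vectors, word_probs))
            for i in range(dim)]

# Hypothetical 5-dimensional word vectors for "home", "green", "couch".
S = [
    [1.0, 0.0, 0.0, 0.0, 0.0],  # S1
    [0.0, 1.0, 0.0, 0.0, 0.0],  # S2
    [0.0, 0.0, 1.0, 0.0, 0.0],  # S3
]
# Assumed output of the commodity category identification layer (P1, P2, P3).
P = [0.1, 0.1, 0.8]

desc = description_vector(S, P)

# Hypothetical prediction-layer weights, one row per table category.
CATEGORIES = ["couch", "tv", "watch"]
CATEGORY_WEIGHTS = [
    [0.0, 0.0, 3.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 1.0, 0.0],
    [0.0, 0.0, 0.0, 0.0, 1.0],
]
logits = [sum(w * x for w, x in zip(row, desc)) for row in CATEGORY_WEIGHTS]
probs = softmax(logits)
predicted = CATEGORIES[probs.index(max(probs))]
print(predicted, probs)
```

With these assumed weights the category with the highest probability is "couch"; the exact probability values depend entirely on the made-up weights and are not meant to reproduce the 0.8, 0.1, 0.1 of the text.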
Step S103, according to the commodity description information of the commodity to be classified, the commodity classification model is used for predicting the probability that the commodity to be classified belongs to the commodity category in the commodity classification table and the probability that the word in the commodity description information of the commodity to be classified is used as the commodity category; and determining the commodity category to which the commodity to be classified belongs according to the sequence of the predicted probabilities from high to low.
That is, when the commodity classification model is used for predicting the commodity category of the commodity to be classified according to the commodity description information of the commodity to be classified, the attribution of the commodity in the predefined commodity classification table can be automatically predicted, words in the commodity description information can be mined for commodity classification, the predefined commodity classification table can be expanded, and the accuracy of commodity classification is improved.
In an optional implementation manner, when determining that a word in the description information of the commodity to be classified is a commodity classification to which the commodity to be classified belongs, or when the probability that the word is taken as a commodity category is greater than a threshold probability, adding the word as a commodity classification to the commodity classification table.
When the predefined commodity classification table contains no commodity category that matches the commodity description information, or the commodity description information cannot be well attributed to any category in the table, or a word contained in the commodity description information can itself serve as the commodity category of the commodity to be classified, that word can be added to the commodity classification table. The commodity classification table can thus be updated in real time and becomes better suited to classifying commodities. Specifically, a threshold probability (for example, 0.6) may be set according to actual demand. Still taking the commodity description information "home green couch" and the commodity category "couch" as an example, suppose the softmax classification function determines that the probabilities that the 3 words ("home", "green", and "couch") serve as the commodity category corresponding to the commodity description information are 0.1, 0.1, and 0.8 respectively. The probability 0.8 corresponding to "couch" is greater than the threshold probability 0.6, so the word "couch" identified in the commodity description information may be determined to be the commodity category to which "home green couch" belongs, and "couch" is added to the predefined commodity classification table.
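The threshold rule described above can be sketched as follows. The function name extend_classification_table, the starting table, and the word probabilities are hypothetical; the threshold 0.6 is the example value given in the text.

```python
def extend_classification_table(table, words, word_probs, threshold=0.6):
    """Return a copy of the table with every word whose probability of
    serving as a commodity category exceeds the threshold appended."""
    extended = list(table)
    for word, prob in zip(words, word_probs):
        if prob > threshold and word not in extended:
            extended.append(word)
    return extended

# Hypothetical table that does not yet contain "couch".
table = ["tv", "watch"]
words = ["home", "green", "couch"]
word_probs = [0.1, 0.1, 0.8]  # assumed identification-layer output

new_table = extend_classification_table(table, words, word_probs)
print(new_table)
```

Returning a new list rather than mutating the input keeps the original table available for comparison; an actual system could equally update the table in place.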
It is understood that P1, P2, P3 and p1, p2, p3 need not be probabilities that the commodity description information belongs to an identified word or to a commodity category in the commodity classification table; they may instead be score values that are positively correlated with those probabilities. In that case, the probabilities that the commodity description information "home green couch" belongs to each identified word ("home", "green", "couch") and to each predefined commodity category ("couch", "tv", "watch") are calculated through normalization so that the resulting probabilities sum to 1, and the commodity category to which the commodity description information belongs is determined according to the maximum of these probabilities.
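A minimal sketch of this joint normalization is given below. The text does not name the normalization function, so softmax is used here as an assumption; the score values and the (source, label, score) layout are likewise hypothetical.

```python
import math

def normalize(candidates):
    """Softmax-normalize (source, label, score) triples so that the
    resulting probabilities over all candidates sum to 1."""
    top = max(score for _, _, score in candidates)
    exps = [math.exp(score - top) for _, _, score in candidates]
    total = sum(exps)
    return [(src, label, e / total)
            for (src, label, _), e in zip(candidates, exps)]

# Hypothetical scores: "word" entries come from the identification layer,
# "table" entries from the prediction layer.
candidates = [
    ("word", "home", 0.2), ("word", "green", 0.1), ("word", "couch", 2.0),
    ("table", "couch", 2.5), ("table", "tv", 0.3), ("table", "watch", 0.1),
]

# Rank all candidates from highest to lowest probability.
ranked = sorted(normalize(candidates), key=lambda c: c[2], reverse=True)
best_source, best_label, best_prob = ranked[0]
print(best_source, best_label)
```

Keeping the source tag on each candidate makes it possible to tell whether the winning category came from the predefined table or from a word in the description.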
It should be noted that after the commodity classification model for commodity classification is determined, the commodity category of the commodity to be classified can be predicted according to the description information of the commodity to be classified. The predicted commodity category can be not only a commodity category in the predefined commodity classification table determined by the commodity category prediction layer, but also a newly added word that can serve as a commodity category and is recognized by the commodity category identification layer from the commodity description information of the commodity to be classified.
Based on the above embodiment, multiple screening steps (screening the search keywords in the commodity click data, screening the search keywords corresponding to each commodity identifier, and screening the one or more commodity identifiers corresponding to each search keyword) ensure the validity and reliability of the correspondence between commodity categories and commodity description information in the commodity classification training data. Meanwhile, in addition to predicting the commodity category from the predefined commodity classification table, words that may serve as the commodity category can be identified from the commodity description information itself. This improves the accuracy of the predicted commodity classification, expands the predefined commodity classification table, allows the table to be updated in real time, and gives the method wider applicability.
Referring to fig. 3, an embodiment of the present invention provides a commodity classification apparatus 300, including: a commodity classification training data acquisition module 301, a commodity classification model determination module 302, and a commodity category prediction module 303; wherein
the commodity classification training data obtaining module 301 is configured to obtain commodity classification training data, where the commodity classification training data includes commodity description information and corresponding commodity categories, where the commodity description information is obtained according to commodity identifiers, and the commodity categories are defined in a predefined commodity classification table;
the commodity classification model determining module 302 is configured to use the commodity classification training data to obtain a commodity classification model for commodity classification based on neural network training;
the commodity category predicting module 303 is configured to predict, according to description information of a commodity to be classified, a probability that the commodity to be classified belongs to a commodity category in the commodity classification table and a probability that a word in the commodity description information of the commodity to be classified is used as the commodity category by using the commodity classification model; and determining the commodity category to which the commodity to be classified belongs according to the sequence of the predicted probabilities from high to low.
In an optional implementation manner, the commodity classification training data obtaining module 301 is further configured to, when determining that a word in the description information of the commodity to be classified is a commodity classification to which the commodity to be classified belongs, or when a probability of the word as the commodity classification is greater than a threshold probability, add the word vector as the commodity classification to the commodity classification table.
In an optional embodiment, the obtaining of the commodity classification training data includes:
acquiring commodity click data, wherein the commodity click data comprises one or more search keywords for searching commodities and commodity identifications of commodities selected according to the search keywords;
and under the condition that the search keyword is the commodity category defined in the commodity classification table, determining commodity description information corresponding to the commodity identification, and constructing the commodity category and the commodity description information corresponding to the search keyword into commodity classification training data.
In an optional implementation manner, the obtaining the commodity classification training data further includes: when the commodity is selected according to one or more search keywords as the commodity category, constructing the commodity category corresponding to the search keyword which causes the commodity to be selected the most times and the commodity description information of the commodity as commodity classification training data.
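The construction of training data from click data described above can be sketched as follows. The click-log layout (keyword, commodity identifier pairs), the description lookup, and all sample values are hypothetical; the sketch keeps only keywords that are categories in the table and, for each commodity, the category keyword that led to the most selections.

```python
from collections import Counter, defaultdict

def build_training_data(click_log, classification_table, descriptions):
    """Build (description, category) pairs from (keyword, commodity_id) clicks."""
    counts = defaultdict(Counter)
    for keyword, commodity_id in click_log:
        # Keep only search keywords that are categories in the table.
        if keyword in classification_table:
            counts[commodity_id][keyword] += 1
    training = []
    for commodity_id, keyword_counts in counts.items():
        # Use the category keyword that caused the most selections.
        category, _ = keyword_counts.most_common(1)[0]
        training.append((descriptions[commodity_id], category))
    return training

# Hypothetical click data and description lookup.
click_log = [
    ("couch", "sku1"), ("couch", "sku1"), ("tv", "sku1"),
    ("cheap", "sku1"), ("tv", "sku2"),
]
classification_table = {"couch", "tv", "watch"}
descriptions = {"sku1": "home green couch", "sku2": "55-inch smart tv"}

training_data = build_training_data(click_log, classification_table, descriptions)
print(training_data)
```

The keyword "cheap" is dropped because it is not a category in the table, and "sku1" is labelled "couch" because that keyword led to two selections against one for "tv".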
In an optional implementation manner, the obtaining the commodity classification training data further includes:
when the search keyword as the commodity category corresponds to one or more commodity identifications, determining the commodity identifications in a predefined proportion as the commodity identifications corresponding to the commodity category, and constructing the commodity description information corresponding to the commodity category and the commodity identifications in the predefined proportion as commodity classification training data.
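One possible reading of the "predefined proportion" screening is sketched below. The text does not say how the retained commodity identifiers are chosen, so ranking by selection count and keeping the top fraction is an assumption, as are the function name top_proportion_ids and the sample data.

```python
import math
from collections import Counter

def top_proportion_ids(clicked_ids, proportion=0.5):
    """Keep the most frequently selected commodity identifiers for one
    category keyword, up to the given proportion of distinct identifiers."""
    counts = Counter(clicked_ids)
    keep = max(1, math.ceil(len(counts) * proportion))
    return [commodity_id for commodity_id, _ in counts.most_common(keep)]

# Hypothetical identifiers selected after searching one category keyword.
clicked_ids = ["sku1", "sku1", "sku1", "sku2", "sku2", "sku3", "sku4"]
selected = top_proportion_ids(clicked_ids, proportion=0.5)
print(selected)
```

The description information of the retained identifiers would then be paired with the category keyword to form training samples, which filters out commodities that were selected only rarely for that keyword.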
In an alternative embodiment, the neural network comprises a word segmentation processing layer, a commodity category identification layer and a commodity category prediction layer;
the training of the commodity classification training data based on the neural network to obtain a commodity classification model for commodity classification comprises the following steps:
processing one or more words in the commodity description information into word vectors by using the word segmentation processing layer;
calculating the probability that the words corresponding to the word vectors are used as commodity categories by using the commodity category identification layer, and calculating commodity description vectors according to the probability that the word vectors and the words corresponding to the word vectors are used as commodity categories;
predicting the probability of the commodities belonging to the commodity category in the commodity classification table according to the commodity description vector by using the commodity category prediction layer;
and adjusting the neural network until the commodity category corresponding to the predicted maximum probability value is the commodity category corresponding to the commodity description information in the commodity classification training data.
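The stopping condition in the last step can be illustrated with a deliberately reduced sketch. Only a single linear prediction layer is adjusted here, by plain gradient descent on the cross-entropy loss, whereas the embodiment adjusts the whole neural network; the learning rate, the input vector, and the function names are hypothetical.

```python
import math

def softmax(scores):
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def predict(weights, x):
    """Return the index of the category with the highest probability."""
    logits = [sum(w * xi for w, xi in zip(row, x)) for row in weights]
    probs = softmax(logits)
    return probs.index(max(probs)), probs

def train_until_correct(x, label, n_categories, lr=0.5, max_steps=100):
    """Adjust the weights until the top-probability category equals the label."""
    weights = [[0.0] * len(x) for _ in range(n_categories)]
    for step in range(max_steps):
        predicted, probs = predict(weights, x)
        if predicted == label:
            return weights, step
        for k, row in enumerate(weights):
            # Gradient of cross-entropy w.r.t. logit k is probs[k] - target[k].
            grad = probs[k] - (1.0 if k == label else 0.0)
            for i, xi in enumerate(x):
                row[i] -= lr * grad * xi
    raise RuntimeError("label not reached within max_steps")

# Hypothetical commodity description vector and training label (index 1).
x = [0.1, 0.1, 0.8, 0.0, 0.0]
weights, steps = train_until_correct(x, label=1, n_categories=3)
print(steps, predict(weights, x)[0])
```

In practice training runs over many samples and usually stops on a loss or accuracy criterion rather than on a single sample; the loop above only makes the stated condition concrete.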
Fig. 4 shows an exemplary system architecture 400 to which the commodity classification method or the commodity classification apparatus of embodiments of the present invention may be applied.
As shown in fig. 4, the system architecture 400 may include terminal devices 401, 402, 403, a network 404, and a server 405. The network 404 serves as a medium for providing communication links between the terminal devices 401, 402, 403 and the server 405. Network 404 may include various types of connections, such as wired or wireless communication links, or fiber optic cables, to name a few.
A user may use terminal devices 401, 402, 403 to interact with a server 405 over a network 404 to receive or send messages or the like. The terminal devices 401, 402, 403 may have various communication client applications installed thereon, such as shopping applications, web browser applications, search applications, instant messaging tools, mailbox clients, social platform software, and the like.
The terminal devices 401, 402, 403 may be various electronic devices having a display screen and supporting web browsing, including but not limited to smart phones, tablet computers, laptop portable computers, desktop computers, and the like.
The server 405 may be a server that provides various services, such as a background management server that supports shopping websites browsed by users using the terminal devices 401, 402, and 403. The background management server can analyze and process the received data such as the product information inquiry request and feed back the processing result (the predicted commodity classification) to the terminal equipment.
It should be noted that the commodity classification method provided by the embodiment of the present invention is generally executed by the server 405, and accordingly, the commodity classification apparatus is generally disposed in the server 405.
It should be understood that the number of terminal devices, networks, and servers in fig. 4 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.
Referring now to FIG. 5, shown is a block diagram of a computer system 500 suitable for use with a terminal device implementing an embodiment of the present invention. The terminal device shown in fig. 5 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiments of the present invention.
As shown in fig. 5, the computer system 500 includes a Central Processing Unit (CPU)501 that can perform various appropriate actions and processes according to a program stored in a Read Only Memory (ROM)502 or a program loaded from a storage section 508 into a Random Access Memory (RAM) 503. In the RAM 503, various programs and data necessary for the operation of the system 500 are also stored. The CPU 501, ROM 502, and RAM 503 are connected to each other via a bus 504. An input/output (I/O) interface 505 is also connected to bus 504.
The following components are connected to the I/O interface 505: an input portion 506 including a keyboard, a mouse, and the like; an output portion 507 including a display such as a Cathode Ray Tube (CRT), a Liquid Crystal Display (LCD), and the like, and a speaker; a storage portion 508 including a hard disk and the like; and a communication section 509 including a network interface card such as a LAN card, a modem, or the like. The communication section 509 performs communication processing via a network such as the internet. A drive 510 is also connected to the I/O interface 505 as necessary. A removable medium 511 such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, or the like is mounted on the drive 510 as necessary, so that a computer program read out therefrom is installed into the storage section 508 as necessary.
In particular, according to the embodiments of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method illustrated in the flow chart. In such an embodiment, the computer program may be downloaded and installed from a network through the communication section 509, and/or installed from the removable medium 511. The computer program performs the above-described functions defined in the system of the present invention when executed by the Central Processing Unit (CPU) 501.
It should be noted that the computer readable medium shown in the present invention can be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present invention, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In the present invention, however, a computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wire, fiber optic cable, RF, etc., or any suitable combination of the foregoing.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams or flowchart illustration, and combinations of blocks in the block diagrams or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The modules described in the embodiments of the present invention may be implemented by software or hardware. The described modules may also be provided in a processor, which may be described as: a processor comprises a commodity classification training data acquisition module and a commodity classification model determination module. The names of these modules do not in some cases limit the modules themselves; for example, the commodity classification model determination module may also be described as "a module that uses the commodity classification training data to train a commodity classification model for commodity classification based on a neural network".
As another aspect, the present invention also provides a computer-readable medium that may be contained in the apparatus described in the above embodiments, or may exist separately without being incorporated into the apparatus. The computer readable medium carries one or more programs which, when executed by a device, cause the device to perform: acquiring commodity classification training data, wherein the commodity classification training data comprises commodity description information and corresponding commodity categories, the commodity description information is acquired according to commodity identifications, and the commodity categories are defined in a predefined commodity classification table; training to obtain a commodity classification model for commodity classification based on a neural network by using the commodity classification training data; predicting, by using the commodity classification model and according to the commodity description information of the commodity to be classified, the probability that the commodity to be classified belongs to each commodity category in the commodity classification table and the probability that each word in the commodity description information of the commodity to be classified serves as the commodity category; and determining the commodity category to which the commodity to be classified belongs according to the predicted probabilities ranked from high to low.
According to the technical scheme of the embodiment of the invention, multiple screening steps (screening the search keywords in the commodity click data, screening the search keywords corresponding to each commodity identifier, and screening the one or more commodity identifiers corresponding to each search keyword) ensure the validity and reliability of the correspondence between commodity categories and commodity description information in the commodity classification training data. Meanwhile, in addition to predicting the commodity category from the predefined commodity classification table, words that may serve as the commodity category can be identified from the commodity description information itself. This improves the accuracy of the predicted commodity classification, expands the predefined commodity classification table, allows the table to be updated in real time, and gives the method wider applicability.
The above-described embodiments should not be construed as limiting the scope of the invention. Those skilled in the art will appreciate that various modifications, combinations, sub-combinations, and substitutions can occur, depending on design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (14)

1. A method of classifying a commodity, comprising:
acquiring commodity classification training data, wherein the commodity classification training data comprises commodity description information and corresponding commodity classes, the commodity description information is acquired according to commodity identification, and the commodity classes are defined in a predefined commodity classification table;
training to obtain a commodity classification model for commodity classification based on a neural network by using the commodity classification training data;
predicting the probability that the commodities to be classified belong to the commodity classes in the commodity classification table and the probability that words in the commodity description information of the commodities to be classified are used as the commodity classes by using the commodity classification model according to the commodity description information of the commodities to be classified; and determining the commodity category to which the commodity to be classified belongs according to the sequence of the predicted probabilities from high to low.
2. The commodity classification method according to claim 1, further comprising:
when determining that a word in the description information of the commodity to be classified is the commodity category to which the commodity to be classified belongs, or when the probability of the word as the commodity category is greater than a threshold probability, adding the word as a commodity category to the commodity classification table.
3. The method for classifying commodities, according to claim 1, wherein said obtaining commodity classification training data comprises:
acquiring commodity click data, wherein the commodity click data comprises one or more search keywords for searching commodities and commodity identifications of commodities selected according to the search keywords;
and under the condition that the search keyword is the commodity category defined in the commodity classification table, determining commodity description information corresponding to the commodity identification, and constructing the commodity category and the commodity description information corresponding to the search keyword into commodity classification training data.
4. The commodity classification method according to claim 3, further comprising:
when the commodity is selected according to one or more search keywords as the commodity category, constructing the commodity category corresponding to the search keyword which causes the commodity to be selected the most times and the commodity description information of the commodity as commodity classification training data.
5. The commodity classification method according to claim 3, further comprising: when the search keyword as the commodity category corresponds to one or more commodity identifications, determining the commodity identifications in a predefined proportion as the commodity identifications corresponding to the commodity category, and constructing the commodity description information corresponding to the commodity category and the commodity identifications in the predefined proportion as commodity classification training data.
6. The method according to claim 1, wherein the neural network comprises a segmentation processing layer, a commodity category identification layer, and a commodity category prediction layer;
the training of the commodity classification training data based on the neural network to obtain a commodity classification model for commodity classification comprises the following steps:
processing one or more words in the commodity description information into word vectors by using the word segmentation processing layer;
calculating the probability that the words corresponding to the word vectors are used as commodity categories by using the commodity category identification layer, and calculating commodity description vectors according to the probability that the word vectors and the words corresponding to the word vectors are used as commodity categories;
predicting the probability of the commodities belonging to the commodity category in the commodity classification table according to the commodity description vector by using the commodity category prediction layer;
and adjusting the neural network until the commodity category corresponding to the predicted maximum probability value is the commodity category corresponding to the commodity description information in the commodity classification training data.
7. A commodity classification device, comprising: a commodity classification training data acquisition module, a commodity classification model determination module, and a commodity category prediction module; wherein
the commodity classification training data acquisition module is used for acquiring commodity classification training data, wherein the commodity classification training data comprises commodity description information and corresponding commodity classes, the commodity description information is acquired according to commodity identification, and the commodity classes are defined in a predefined commodity classification table;
the commodity classification model determining module is used for training to obtain a commodity classification model for commodity classification based on a neural network by using the commodity classification training data;
the commodity category prediction module is used for predicting the probability that the commodity to be classified belongs to the commodity category in the commodity classification table and the probability that the word in the commodity description information of the commodity to be classified is taken as the commodity category by using the commodity classification model according to the description information of the commodity to be classified; and determining the commodity category to which the commodity to be classified belongs according to the sequence of the predicted probabilities from high to low.
8. The commodity classification device of claim 7, wherein the commodity classification training data acquisition module is further configured to,
and when determining that the words in the description information of the commodities to be classified are the commodity classifications to which the commodities to be classified belong, or when the probability of the words as the commodity classifications is greater than a threshold probability, adding the word vectors as the commodity classifications to the commodity classification table.
9. The commodity classification device according to claim 7, wherein the acquiring of the commodity classification training data includes:
acquiring commodity click data, wherein the commodity click data comprises one or more search keywords for searching commodities and commodity identifications of commodities selected according to the search keywords;
and under the condition that the search keyword is the commodity category defined in the commodity classification table, determining commodity description information corresponding to the commodity identification, and constructing the commodity category and the commodity description information corresponding to the search keyword into commodity classification training data.
10. The commodity classification device of claim 9, further comprising:
when the commodity is selected according to one or more search keywords as the commodity category, constructing the commodity category corresponding to the search keyword which causes the commodity to be selected the most times and the commodity description information of the commodity as commodity classification training data.
11. The commodity classification device of claim 9, further comprising:
when the search keyword as the commodity category corresponds to one or more commodity identifications, determining the commodity identifications in a predefined proportion as the commodity identifications corresponding to the commodity category, and constructing the commodity description information corresponding to the commodity category and the commodity identifications in the predefined proportion as commodity classification training data.
12. The commodity classification device according to claim 7, wherein the neural network includes a word segmentation processing layer, a commodity category identification layer, and a commodity category prediction layer;
the training of the commodity classification training data based on the neural network to obtain a commodity classification model for commodity classification comprises the following steps:
processing one or more words in the commodity description information into word vectors by using the word segmentation processing layer;
calculating, by using the commodity category identification layer, the probability that the word corresponding to each word vector is a commodity category, and calculating a commodity description vector according to the word vectors and the probabilities that the words corresponding to the word vectors are commodity categories;
predicting the probability of the commodities belonging to the commodity category in the commodity classification table according to the commodity description vector by using the commodity category prediction layer;
and adjusting the neural network until the commodity category with the highest predicted probability is the commodity category corresponding to the commodity description information in the commodity classification training data.
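The forward pass through the three layers of claim 12 can be sketched in plain Python. This is an illustration under stated assumptions, not the claimed model: words are taken as already segmented, the per-word probabilities are normalized with a softmax, the description vector is the probability-weighted sum of the word vectors, the weights are random rather than trained, and all names (`forward`, `ident_weights`, `pred_weights`) are hypothetical. The adjustment step (training) is omitted.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def forward(words, embeddings, ident_weights, pred_weights):
    # Word segmentation processing layer: look up a vector per segmented word.
    word_vectors = [embeddings[w] for w in words]
    # Category identification layer: probability that each word names the category.
    word_probs = softmax([dot(ident_weights, v) for v in word_vectors])
    # Description vector: word vectors weighted by those probabilities.
    dim = len(ident_weights)
    description_vector = [
        sum(p * v[i] for p, v in zip(word_probs, word_vectors)) for i in range(dim)
    ]
    # Category prediction layer: probability of each category in the table.
    category_probs = softmax([dot(row, description_vector) for row in pred_weights])
    return word_probs, description_vector, category_probs


random.seed(0)
DIM = 4
words = ["5g", "smartphone", "128gb"]  # hypothetical segmented description
categories = ["phone", "laptop"]       # hypothetical classification table
embeddings = {w: [random.uniform(-1, 1) for _ in range(DIM)] for w in words}
ident_weights = [random.uniform(-1, 1) for _ in range(DIM)]
pred_weights = [[random.uniform(-1, 1) for _ in range(DIM)] for _ in categories]

word_probs, description_vector, category_probs = forward(
    words, embeddings, ident_weights, pred_weights
)
predicted = categories[category_probs.index(max(category_probs))]
```

In training, the weights would be adjusted (for example by gradient descent on a cross-entropy loss, which is an assumption here) until `predicted` matches the category paired with the description in the training data.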
13. A server for commodity classification, comprising:
one or more processors;
a storage device for storing one or more programs,
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method of any one of claims 1-6.
14. A computer-readable medium on which a computer program is stored, wherein the computer program, when executed by a processor, implements the method according to any one of claims 1-6.
CN201910881276.8A 2019-09-18 2019-09-18 Commodity classification method and device Pending CN112529646A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910881276.8A CN112529646A (en) 2019-09-18 2019-09-18 Commodity classification method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910881276.8A CN112529646A (en) 2019-09-18 2019-09-18 Commodity classification method and device

Publications (1)

Publication Number Publication Date
CN112529646A true CN112529646A (en) 2021-03-19

Family

ID=74975053

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910881276.8A Pending CN112529646A (en) 2019-09-18 2019-09-18 Commodity classification method and device

Country Status (1)

Country Link
CN (1) CN112529646A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112801720A (en) * 2021-04-12 2021-05-14 连连(杭州)信息技术有限公司 Method and device for generating shop category identification model and identifying shop category

Similar Documents

Publication Publication Date Title
CN110909182B (en) Multimedia resource searching method, device, computer equipment and storage medium
US11836778B2 (en) Product and content association
CN110020162B (en) User identification method and device
CN107885873B (en) Method and apparatus for outputting information
US11423096B2 (en) Method and apparatus for outputting information
CN107832338B (en) Method and system for recognizing core product words
CN103279513A (en) Method for generating content label and method and device for providing multi-media content information
CN111008321A (en) Recommendation method and device based on logistic regression, computing equipment and readable storage medium
CN107908662B (en) Method and device for realizing search system
CN107609192A (en) The supplement searching method and device of a kind of search engine
CN107247798B (en) Method and device for constructing search word bank
CN110766486A (en) Method and device for determining item category
CN112116426A (en) Method and device for pushing article information
CN112529646A (en) Commodity classification method and device
CN107463628B (en) Data filling method and system thereof
CN112905885B (en) Method, apparatus, device, medium and program product for recommending resources to user
CN110807095A (en) Article matching method and device
CN113722593A (en) Event data processing method and device, electronic equipment and medium
CN112256566A (en) Test case preservation method and device
CN110895564A (en) Potential customer data processing method and device
CN111199437A (en) Data processing method and device
CN113360765B (en) Event information processing method and device, electronic equipment and medium
CN113127750B (en) Information list generation method and device, storage medium and electronic equipment
CN106777403B (en) Information pushing method and device
CN112783956B (en) Information processing method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination