WO2023020508A1 - Automatic product classification method, apparatus, and computer device
- Publication number
- WO2023020508A1 (application PCT/CN2022/112865)
- Authority
- WO
- WIPO (PCT)
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/35—Clustering; Classification
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/205—Parsing
- G06F40/216—Parsing using statistical methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/289—Phrasal analysis, e.g. finite state techniques or chunking
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q30/00—Commerce
- G06Q30/06—Buying, selling or leasing transactions
- G06Q30/0601—Electronic shopping [e-shopping]
Abstract
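An automatic product classification method, apparatus, and computer device. The method includes the following steps.

- S1: a product classification lexicon is used to extract the product classification words from the product description information of each product. The lexicon includes a plurality of preset product classification words.
- S2: the occurrences of each product classification word are counted, and the word with the most occurrences becomes the first-level classification word.
- S3: the product classification words that co-occur with the superior classification words are counted, and the word that co-occurs with them most often becomes the next-level classification word.
- S4: step S3 is repeated until no co-occurring product classification word can be found for the superior classification words. The classification words of all levels are then combined, level by level, into a product classification hierarchy tree. The tree contains the correspondence between each product and the classification words at every level.

The method builds the product classification hierarchy tree automatically from the product classification lexicon and the product description information. This makes the classification more reasonable and saves substantial labor costs.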
Description
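The present invention relates to the field of e-commerce. More specifically, it relates to an automatic product classification method, apparatus, and computer device.

In e-commerce, products are displayed by category so that users can find what they need. Good category management helps users locate the required products quickly. In the prior art, products are classified manually. Limited staff experience may lead to unreasonable classification. Building and maintaining the classification also requires a great deal of manpower, so cost is high and efficiency is low.

The technical problem solved by the present invention is to remedy these defects of the prior art by providing an automatic product classification method, apparatus, and computer device.

To solve this technical problem, the present invention provides an automatic product classification method comprising the following steps: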
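S1: extracting the product classification words from the product description information of each product by using a product classification lexicon, the product classification lexicon including a plurality of preset product classification words;
S2: counting the number of occurrences of each product classification word, and taking the product classification word with the most occurrences as the first-level classification word;
S3: counting the product classification words that co-occur with the superior classification words, and taking the product classification word that co-occurs with them most often as the next-level classification word;
S4: repeating step S3 until no co-occurring product classification word can be found for the superior classification words, and combining the classification words of all levels, level by level, into a product classification hierarchy tree, the product classification hierarchy tree containing the correspondence between each product and the classification words at every level.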
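Further, in the automatic product classification method of the present invention, the product description information includes a title, an abstract, and keywords. In step S1, extracting the product classification words from the product description information of each product by using the product classification lexicon includes:
segmenting the title, the abstract, and the keywords by using the preset product classification words in the product classification lexicon, and extracting the product classification words of each product's description information from the segmentation result.
Further, in the automatic product classification method of the present invention, step S3 includes:
S31: counting the product classification words that co-occur with the first-level classification word, and taking the product classification word that co-occurs with the first-level classification word most often as the second-level classification word.
Further, in the automatic product classification method of the present invention, step S3 includes:
S32: counting the product classification words that co-occur with both the first-level and the second-level classification words, and taking the product classification word that co-occurs with both of them most often as the third-level classification word.
Further, in the automatic product classification method of the present invention, in step S3, if at least two combinations of level classification words are classification keywords corresponding to the same product, only one group of level classification words is retained.
Further, in the automatic product classification method of the present invention, repeating step S3 in step S4 until no co-occurring product classification word can be found for the superior classification words includes repeating step S3 until the number of classification levels reaches a preset number of levels.
Further, the automatic product classification method of the present invention further includes, after step S4:
S51: extracting the product classification words from the product description information of a newly added product by using the product classification lexicon, finding the positions of those product classification words in the product classification hierarchy tree, and adding the newly added product to the product classification hierarchy tree.
Further, the automatic product classification method of the present invention further includes, after step S4:
S52: if the product classification lexicon does not contain the product classification words corresponding to a newly added product, adding the product classification words of the newly added product to the product classification lexicon, and re-executing steps S1 to S4 with the updated lexicon to update the product classification hierarchy tree.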
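In addition, the present invention further provides an automatic product classification apparatus, including:
an extraction unit, configured to extract the product classification words from the product description information of each product by using a product classification lexicon, the product classification lexicon including a plurality of preset product classification words;
a first grading unit, configured to count the number of occurrences of each product classification word and to take the product classification word with the most occurrences as the first-level classification word;
a second grading unit, configured to count the product classification words that co-occur with the superior classification words and to take the product classification word that co-occurs with them most often as the next-level classification word;
a third grading unit, configured to execute the second grading unit repeatedly until no co-occurring product classification word can be found for the superior classification words, and to combine the classification words of all levels, level by level, into a product classification hierarchy tree, the product classification hierarchy tree containing the correspondence between each product and the classification words at every level.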
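In addition, the present invention further provides a computer device, including a processor and a memory, the processor being communicatively connected to the memory;
the memory is configured to store a computer program;
the processor is configured to execute the computer program stored in the memory to implement the automatic product classification method described above.
The automatic product classification method, apparatus, and computer device of the present invention have the following beneficial effect. The invention builds a product classification hierarchy tree automatically from the product classification lexicon and the product description information. This makes the classification more reasonable and saves substantial labor costs.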
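The present invention is further described below with reference to the accompanying drawings and embodiments, in which:
Fig. 1 is a flowchart of an automatic product classification method according to an embodiment of the present invention;
Fig. 2 is a flowchart of an automatic product classification method according to an embodiment of the present invention;
Fig. 3 is a flowchart of an automatic product classification method according to an embodiment of the present invention;
Fig. 4 is a schematic structural diagram of an automatic product classification apparatus according to an embodiment of the present invention.
For a clearer understanding of the technical features, objectives, and effects of the present invention, specific embodiments are now described in detail with reference to the accompanying drawings.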
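In a preferred embodiment, referring to Fig. 1, the automatic product classification method of this embodiment classifies large numbers of products automatically. It can be applied to warehouse management, e-commerce platform management, and the like, for example on an electronic-component sales platform. Specifically, the method includes the following steps:
S1: extracting the product classification words from the product description information of each product by using a product classification lexicon, the product classification lexicon including a plurality of preset product classification words.
Specifically, this embodiment requires every product to provide product description information. Products and their description information correspond one-to-one, and the description information describes the product's parameters, performance, effects, and so on. Extracting the product classification words first requires segmenting the product description information into words. To make the segmentation more domain-appropriate, a product classification lexicon is built that includes a plurality of preset product classification words. These preset words are obtained by analyzing a large amount of product information and are then screened manually, which yields a well-founded lexicon. The lexicon changes dynamically as new products are launched and old ones are withdrawn, and preset product classification words are added or deleted accordingly.
After the product classification lexicon is built, it is used to extract the product classification words from the product description information of each product. Optionally, the product description information includes a title, an abstract, keywords, and the like. In that case the preset product classification words in the lexicon are used to segment the title, the abstract, and the keywords separately. The product classification words of each product are then extracted from the segmentation result.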
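The patent does not name the segmentation algorithm used with the lexicon. The sketch below assumes forward maximum matching, a common dictionary-based method for unspaced Chinese text, swapped in here for illustration. The field names `title`, `abstract`, and `keywords` and both function names are hypothetical.

```python
def segment_fmm(text: str, lexicon: set[str], max_len: int) -> list[str]:
    """Forward maximum matching: at each position, take the longest lexicon word."""
    tokens: list[str] = []
    i = 0
    while i < len(text):
        # Try the longest candidate first; a single character is always accepted.
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in lexicon:
                tokens.append(piece)
                i += size
                break
    return tokens


def extract_classification_words(description: dict[str, str],
                                 lexicon: set[str]) -> list[str]:
    """Step S1 (assumed form): segment each field, keep only lexicon words."""
    max_len = max((len(w) for w in lexicon), default=1)
    words: list[str] = []
    for field in ("title", "abstract", "keywords"):  # hypothetical field names
        tokens = segment_fmm(description.get(field, ""), lexicon, max_len)
        words.extend(t for t in tokens if t in lexicon)
    return words


# Toy lexicon: 二极管 = diode, 肖特基 = Schottky, 贴片 = SMD (surface-mount)
lexicon = {"二极管", "肖特基", "贴片"}
product = {"title": "肖特基二极管", "abstract": "贴片肖特基二极管", "keywords": "二极管"}
print(extract_classification_words(product, lexicon))
# ['肖特基', '二极管', '贴片', '肖特基', '二极管', '二极管']
```

Characters not covered by the lexicon (for example 电 and 阻 in 贴片电阻) fall out as single-character tokens and are then discarded. Only preset classification words reach step S2.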
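S2: counting the number of occurrences of each product classification word, and taking the product classification word with the most occurrences as the first-level classification word. Specifically, after the product classification words of each product have been extracted from the segmentation result, the occurrences of each word are counted. All product classification words are sorted by occurrence count from high to low, and the word with the most occurrences becomes the first-level classification word. If at least two product classification words are tied for the most occurrences, all of them are taken as first-level classification words.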
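A minimal sketch of step S2, including the tie rule. It assumes that every occurrence counts, including a word repeated within one product's description. The patent does not say whether repeats within a product are counted once or more. The function name and input shape are hypothetical.

```python
from collections import Counter


def first_level_words(product_words: dict[str, list[str]]) -> list[str]:
    """Step S2: the most frequent classification word(s); tied words are all kept."""
    counts = Counter()
    for words in product_words.values():
        counts.update(words)  # counts every occurrence, including repeats
    if not counts:
        return []
    top = max(counts.values())
    return sorted(w for w, c in counts.items() if c == top)


product_words = {"p1": ["二极管", "肖特基"], "p2": ["二极管", "稳压"], "p3": ["电容"]}
print(first_level_words(product_words))                       # ['二极管']
print(first_level_words({"a": ["电阻"], "b": ["电容"]}))       # both tied words are returned
```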
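S3: counting the product classification words that co-occur with the superior classification words, and taking the product classification word that co-occurs with them most often as the next-level classification word.
Specifically, next-level classification words are determined in order after their superior classification words. For example, the second-level word is determined after the first-level word, the third-level word after the second-level word, and so on. Determining a next-level classification word requires counting the product classification words that co-occur with the superior classification words, that is, words that appear in the same product description information as the superior classification words. For example, if the superior classification word is "diode" (二极管) and the next-level classification word is "Schottky" (肖特基), both "diode" and "Schottky" must appear in the description information of the same product. The co-occurring words are sorted by co-occurrence count from high to low. The word that co-occurs with the superior classification words most often becomes the next-level classification word. For example, if the superior classification word is "diode" and the word that co-occurs most often with "diode" is "Schottky", then "Schottky" is the next-level classification word of "diode". Further, "superior classification words" in this count means all superior levels, not only the level immediately above. For example, if the first-level classification word is "diode" and the second-level word is "Schottky", a product must contain both "diode" and "Schottky" to count when searching for the third-level classification word.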
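A minimal sketch of step S3 with the "all superior words" rule. It assumes a co-occurrence is counted once per product that contains every superior word. It also assumes the tie rule from S2 applies at deeper levels, which the patent does not state. The names are hypothetical.

```python
from collections import Counter


def next_level_words(product_words: dict[str, list[str]],
                     superiors: list[str]) -> list[str]:
    """Step S3: word(s) co-occurring most often with ALL superior words."""
    required = set(superiors)
    counts = Counter()
    for words in product_words.values():
        word_set = set(words)
        if required <= word_set:                # product contains every superior word
            counts.update(word_set - required)  # each co-occurring word once per product
    if not counts:
        return []                               # nothing co-occurs: S4 stops here
    top = max(counts.values())
    return sorted(w for w, c in counts.items() if c == top)


product_words = {
    "p1": ["二极管", "肖特基", "贴片"],  # diode, Schottky, SMD
    "p2": ["二极管", "肖特基"],
    "p3": ["二极管", "稳压"],            # diode, voltage regulator
    "p4": ["电容"],                      # capacitor
}
print(next_level_words(product_words, ["二极管"]))            # ['肖特基']
print(next_level_words(product_words, ["二极管", "肖特基"]))  # ['贴片']
```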
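The principle is now explained for a product classification hierarchy tree with three levels of classification words. Trees with more levels can be implemented by analogy:
S31: counting the product classification words that co-occur with the first-level classification word, and taking the word that co-occurs with it most often as the second-level classification word.
S32: counting the product classification words that co-occur with both the first-level and the second-level classification words, and taking the word that co-occurs with both most often as the third-level classification word.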
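The following traces S31 and S32 on hypothetical toy data to show the counts behind each choice. Each count is the number of products in which a word appears together with all the superior words, which is an assumed reading. The first-level word is taken as given.

```python
from collections import Counter

# Hypothetical toy data: product id -> extracted classification words.
product_words = {
    "p1": ["二极管", "肖特基", "贴片"],  # diode, Schottky, SMD
    "p2": ["二极管", "肖特基", "贴片"],
    "p3": ["二极管", "肖特基", "插件"],  # through-hole
    "p4": ["二极管", "稳压"],            # voltage regulator
}


def cooccurrence_counts(superiors: list[str]) -> Counter:
    """Number of products in which each other word appears with all `superiors`."""
    required = set(superiors)
    counts = Counter()
    for words in product_words.values():
        word_set = set(words)
        if required <= word_set:
            counts.update(word_set - required)
    return counts


level1 = "二极管"                            # assume S2 picked "diode"
s31 = cooccurrence_counts([level1])          # S31: 肖特基 3, 贴片 2, 插件 1, 稳压 1
level2 = s31.most_common(1)[0][0]            # 肖特基
s32 = cooccurrence_counts([level1, level2])  # S32: 贴片 2, 插件 1 (p4 no longer counts)
level3 = s32.most_common(1)[0][0]            # 贴片
print(level1, level2, level3)                # 二极管 肖特基 贴片
```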
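Optionally, because the same product may have different descriptions or names, the resulting multi-level classification words may contain duplicate hierarchies. If at least two combinations of level classification words are classification keywords corresponding to the same product, only one of those groups is retained. For example, suppose one two-level combination is AB and another is BA, and AB and BA are classification keywords for the same product. Then only one of the two groups needs to be kept.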
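The patent does not say how two level combinations are recognized as keywords of the same product. A minimal sketch, assuming two level paths are duplicates when they consist of the same words in any order (AB and BA). The first path seen is kept. The function name is hypothetical.

```python
def dedupe_level_paths(paths: list[tuple[str, ...]]) -> list[tuple[str, ...]]:
    """Keep only the first of any level paths made of the same words in any order."""
    seen: set[frozenset[str]] = set()
    kept: list[tuple[str, ...]] = []
    for path in paths:
        key = frozenset(path)  # ("A", "B") and ("B", "A") share this key
        if key not in seen:
            seen.add(key)
            kept.append(path)
    return kept


print(dedupe_level_paths([("A", "B"), ("B", "A"), ("A", "C")]))
# [('A', 'B'), ('A', 'C')]
```

Because co-occurrence counting is symmetric, keying on the word set and keying on the set of products a path covers give the same result for paths like AB and BA.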
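S4: repeating step S3 until no co-occurring product classification word can be found for the superior classification words, and combining the classification words of all levels, level by level, into a product classification hierarchy tree that contains the correspondence between each product and the classification words at every level. Because the tree records this correspondence, a user searching for a product can narrow the search level by level and find the required product quickly and accurately. Optionally, a preset number of levels can be set to prevent the hierarchy from becoming too deep. In that case, repeating step S3 until no co-occurring word can be found includes repeating step S3 until the number of levels reaches the preset number; once that number is reached, step S3 is no longer repeated.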
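A minimal sketch of step S4 under stated assumptions:

- The tree grows by repeating the S3 selection until nothing co-occurs or the optional `max_levels` (the preset number of levels) is reached.
- Ties branch the tree.
- Each node records every product whose words contain the whole path to that node.
- For brevity the first level is counted per product here, not per occurrence as in S2.

`Node`, `top_cooccurring`, `build_tree`, and `show` are hypothetical names.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    word: str
    children: list["Node"] = field(default_factory=list)
    products: list[str] = field(default_factory=list)  # products matching this path


def top_cooccurring(product_words: dict[str, list[str]], path: list[str]) -> list[str]:
    """Most frequent word(s) co-occurring with every word in `path` (first level if empty)."""
    required = set(path)
    counts = Counter()
    for words in product_words.values():
        word_set = set(words)
        if required <= word_set:
            counts.update(word_set - required)
    if not counts:
        return []
    top = max(counts.values())
    return sorted(w for w, c in counts.items() if c == top)


def build_tree(product_words: dict[str, list[str]],
               max_levels: Optional[int] = None) -> list[Node]:
    """Repeat S3 until no co-occurring word remains or max_levels is reached."""
    def grow(path: list[str]) -> Node:
        node = Node(path[-1])
        node.products = sorted(p for p, ws in product_words.items() if set(path) <= set(ws))
        if max_levels is None or len(path) < max_levels:
            for word in top_cooccurring(product_words, path):
                node.children.append(grow(path + [word]))
        return node

    return [grow([word]) for word in top_cooccurring(product_words, [])]


def show(nodes: list[Node], depth: int = 0) -> None:
    for n in nodes:
        print("  " * depth + f"{n.word} {n.products}")
        show(n.children, depth + 1)


product_words = {
    "p1": ["二极管", "肖特基", "贴片"],  # diode, Schottky, SMD
    "p2": ["二极管", "肖特基"],
    "p3": ["二极管", "稳压"],            # diode, voltage regulator
    "p4": ["电容"],                      # capacitor
}
show(build_tree(product_words))
# 二极管 ['p1', 'p2', 'p3']
#   肖特基 ['p1', 'p2']
#     贴片 ['p1']
```

Taken literally, the procedure leaves products like `p4`, which share no word with the chosen first-level word, outside the tree. The patent does not say how such products are handled.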
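This embodiment builds the product classification hierarchy tree automatically from the product classification lexicon and the product description information. This makes the classification more reasonable and saves substantial labor costs.
In the automatic product classification method of some embodiments, referring to Fig. 2, the method further includes step S51 after step S4. In S51, the product classification words are extracted from the product description information of a newly added product by using the product classification lexicon. The positions of those words in the product classification hierarchy tree are then found, and the newly added product is added to the tree. This embodiment addresses how newly added products are integrated into the product classification hierarchy tree, so that they are placed in the tree automatically and quickly.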
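A minimal sketch of step S51, assuming the tree shape from the S4 sketch: nodes hold a word, children, and the products under that path. The new product is walked down from the top level. At each level the first child whose word appears in the product's classification words is followed, and the product is recorded on every matched node. Choosing the first match when several children match is an assumption. `Node` and `add_product` are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    word: str
    children: list["Node"] = field(default_factory=list)
    products: list[str] = field(default_factory=list)


def add_product(roots: list[Node], product_id: str, words: list[str]) -> bool:
    """Step S51: descend while the product contains the node's word; True if placed."""
    word_set = set(words)
    level, placed = roots, False
    while True:
        match = next((n for n in level if n.word in word_set), None)
        if match is None:
            return placed       # False: no position found even at the top level
        match.products.append(product_id)
        placed = True
        level = match.children  # descend one level


# Tree 二极管 -> 肖特基 -> 贴片 (diode -> Schottky -> SMD)
roots = [Node("二极管", [Node("肖特基", [Node("贴片")])])]
print(add_product(roots, "p9", ["二极管", "肖特基"]))  # True: placed under diode/Schottky
print(add_product(roots, "p10", ["电容"]))            # False: no matching position
```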
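In the automatic product classification method of some embodiments, referring to Fig. 3, the method further includes step S52 after step S4. In S52, if the product classification lexicon does not contain the product classification words corresponding to a newly added product, those words are added to the lexicon. Steps S1 to S4 are then re-executed with the updated lexicon to update the product classification hierarchy tree. This embodiment addresses the case in which the newly added product is a new kind of product and the whole tree must be rebuilt. The product classification word statistics produced in the earlier classification run can be reused during the update to improve efficiency.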
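A minimal sketch of the decision between S51 and S52, under two assumptions:

- The classification words of a genuinely new product are supplied from outside, for example by the manual screening used to build the lexicon. A lexicon cannot extract words it does not yet contain.
- Reusing the earlier word counts assumes the words already extracted for existing products do not change. A new lexicon word could alter how older descriptions segment, so this may not always hold.

The function name is hypothetical.

```python
from collections import Counter


def register_new_product(lexicon: set[str], word_counts: Counter,
                         words: list[str]) -> bool:
    """Return True if S1-S4 must be re-run (S52), False if S51 placement suffices."""
    missing = {w for w in words if w not in lexicon}
    lexicon.update(missing)    # S52: add the new product's words to the lexicon
    word_counts.update(words)  # reuse earlier statistics instead of recounting all products
    return bool(missing)


lexicon = {"二极管", "肖特基"}
word_counts = Counter({"二极管": 3, "肖特基": 2})  # statistics from the previous run
print(register_new_product(lexicon, word_counts, ["二极管"]))  # False
print(register_new_product(lexicon, word_counts, ["氮化镓"]))  # True (GaN is a new word)
```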
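In a preferred embodiment, referring to Fig. 4, the automatic product classification apparatus of this embodiment classifies large numbers of products automatically. It can be applied to warehouse management, e-commerce platform management, and the like, for example on an electronic-component sales platform. Specifically, the apparatus includes an extraction unit, a first grading unit, a second grading unit, and a third grading unit, which are described in turn below.
The extraction unit is configured to extract the product classification words from the product description information of each product by using a product classification lexicon, the product classification lexicon including a plurality of preset product classification words.
Specifically, this embodiment requires every product to provide product description information. Products and their description information correspond one-to-one, and the description information describes the product's parameters, performance, effects, and so on. Extracting the product classification words first requires segmenting the product description information into words. To make the segmentation more domain-appropriate, a product classification lexicon is built that includes a plurality of preset product classification words. These preset words are obtained by analyzing a large amount of product information and are then screened manually, which yields a well-founded lexicon. The lexicon changes dynamically as new products are launched and old ones are withdrawn, and preset product classification words are added or deleted accordingly.
After the product classification lexicon is built, it is used to extract the product classification words from the product description information of each product. Optionally, the product description information includes a title, an abstract, and keywords. In that case the preset product classification words in the lexicon are used to segment the title, the abstract, and the keywords separately. The product classification words of each product are then extracted from the segmentation result.
The first grading unit is configured to count the number of occurrences of each product classification word and to take the word with the most occurrences as the first-level classification word. Specifically, after the product classification words of each product have been extracted from the segmentation result, the occurrences of each word are counted. All product classification words are sorted by occurrence count from high to low, and the word with the most occurrences becomes the first-level classification word. If at least two product classification words are tied for the most occurrences, all of them are taken as first-level classification words.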
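The second grading unit is configured to count the product classification words that co-occur with the superior classification words and to take the word that co-occurs with them most often as the next-level classification word.
Specifically, next-level classification words are determined in order after their superior classification words. Determining a next-level classification word requires counting the product classification words that co-occur with the superior classification words, that is, words that appear in the same product description information as the superior classification words. For example, if the superior classification word is "diode" (二极管) and the next-level classification word is "Schottky" (肖特基), both "diode" and "Schottky" must appear in the description information of the same product. The co-occurring words are sorted by co-occurrence count from high to low. The word that co-occurs with the superior classification words most often becomes the next-level classification word. For example, if the superior classification word is "diode" and the word that co-occurs most often with "diode" is "Schottky", then "Schottky" is the next-level classification word of "diode". Further, "superior classification words" in this count means all superior levels, not only the level immediately above. For example, if the first-level classification word is "diode" and the second-level word is "Schottky", a product must contain both "diode" and "Schottky" to count when searching for the third-level classification word.
The principle is now explained for a tree with three levels of classification words; trees with more levels can be implemented by analogy. First, the product classification words that co-occur with the first-level classification word are counted, and the word that co-occurs with it most often becomes the second-level classification word. Next, the product classification words that co-occur with both the first-level and the second-level classification words are counted, and the word that co-occurs with both most often becomes the third-level classification word.
Optionally, because the same product may have different descriptions or names, the resulting multi-level classification words may contain duplicate hierarchies. If at least two combinations of level classification words are classification keywords corresponding to the same product, only one of those groups is retained. For example, suppose one two-level combination is AB and another is BA, and AB and BA are classification keywords for the same product. Then only one of the two groups needs to be kept.
The third grading unit is configured to execute the second grading unit repeatedly until no co-occurring product classification word can be found for the superior classification words. It then combines the classification words of all levels, level by level, into a product classification hierarchy tree that contains the correspondence between each product and the classification words at every level. Because the tree records this correspondence, a user searching for a product can narrow the search level by level and find the required product quickly and accurately. Optionally, a preset number of levels can be set to prevent the hierarchy from becoming too deep. In that case, executing the second grading unit repeatedly until no co-occurring word can be found includes executing it repeatedly until the number of levels reaches the preset number; once that number is reached, the second grading unit is no longer executed.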
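This embodiment builds the product classification hierarchy tree automatically from the product classification lexicon and the product description information. This makes the classification more reasonable and saves substantial labor costs.
In a preferred embodiment, the computer device of this embodiment includes a processor and a memory, and the processor is communicatively connected to the memory. The memory is configured to store a computer program, and the processor is configured to execute that program to implement the automatic product classification method described above. The computer device of this embodiment builds the product classification hierarchy tree automatically from the product classification lexicon and the product description information. This makes the classification more reasonable and saves substantial labor costs.
The embodiments in this specification are described progressively. Each embodiment focuses on its differences from the others, and the embodiments may be referred to one another for the parts that are the same or similar. Because the apparatus disclosed in the embodiments corresponds to the disclosed method, its description is relatively brief; for related details, refer to the description of the method.
Those skilled in the art may further appreciate that the units and algorithm steps of the examples described in connection with the embodiments disclosed herein can be implemented in electronic hardware, computer software, or a combination of the two. To illustrate clearly the interchangeability of hardware and software, the composition and steps of each example have been described above in general functional terms. Whether these functions are performed in hardware or software depends on the specific application and the design constraints of the technical solution. Skilled persons may implement the described functions differently for each specific application, but such implementations should not be considered beyond the scope of the present invention.
The steps of the methods or algorithms described in connection with the embodiments disclosed herein may be implemented directly in hardware, in a software module executed by a processor, or in a combination of the two. The software module may reside in random access memory (RAM), memory, read-only memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, a hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the technical field.
The above embodiments merely illustrate the technical concept and features of the present invention. Their purpose is to enable those familiar with the art to understand and implement the invention, and they do not limit its scope of protection. All equivalent changes and modifications made within the scope of the claims of the present invention shall fall within the scope of those claims.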
Claims (10)
- 一种商品自动分类方法,其特征在于,包括下述步骤:S1、使用商品分类词库提取每个商品的商品描述信息中的商品分类词,所述商品分类词库包括多个预设商品分类词;S2、统计每个所述商品分类词的出现次数,将出现次数最多的商品分类词作为一级分类词;S3、统计与上级分类词同时出现的商品分类词,将与上级分类词同时出现次数最多的商品分类词作为次级分类词;S4、重复所述步骤S3直至上级分类词找不到同时出现的商品分类词,将各级分类词按层级组合为商品分类层级树,所述商品分类层级树中包含各个商品和各级分类词的对应关系。
- 根据权利要求1所述的商品自动分类方法,其特征在于,所述商品描述信息包括标题、摘要和关键词,则所述步骤S1中使用商品分类词库提取每个商品的商品描述信息中的商品分类词包括:使用商品分类词库中的预设商品分类词对所述标题、所述摘要和所述关键词进行分词,从分词中提取每个商品的商品描述信息中的商品分类词。
- 根据权利要求1所述的商品自动分类方法,其特征在于,所述步骤S3包括:S31、统计与所述一级分类词同时出现的商品分类词,将与所述一级分类词同时出现次数最多的商品分类词作为二级分类词。
- 根据权利要求3所述的商品自动分类方法,其特征在于,所述步骤S3包括:S32、统计与所述一级分类词和所述二级分类词同时出现的商品分类词,将与所述一级分类词和所述二级分类词同时出现次数最多的商品分类词作为三级分类词。
- 根据权利要求1所述的商品自动分类方法,其特征在于,在所述步骤S3中,若至少两个层级分类词组合后为同一商品对应的分类关键词,则仅保留其中一组层级分类词。
- 根据权利要求1所述的商品自动分类方法,其特征在于,所述步骤S4中重复所述步骤S3直至上级分类词找不到同时出现的商品分类词包括:重复所述步骤S3直至分级级数达到预设分级级数。
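- The automatic commodity classification method according to claim 1, characterized by further comprising, after step S4: S51. extracting, using the commodity classification lexicon, the commodity classification words from the commodity description information of a newly added commodity, finding the position of those commodity classification words in the commodity classification hierarchy tree, and adding the newly added commodity to the commodity classification hierarchy tree.
- The automatic commodity classification method according to claim 1, characterized by further comprising, after step S4: S52. if the commodity classification lexicon does not contain the commodity classification words corresponding to a newly added commodity, adding the commodity classification words of the newly added commodity to the commodity classification lexicon, and re-executing steps S1 to S4 with the updated commodity classification lexicon to update the commodity classification hierarchy tree.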
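- An automatic commodity classification apparatus, characterized by comprising: an extraction unit, configured to extract, using a commodity classification lexicon, the commodity classification words from the commodity description information of each commodity, the commodity classification lexicon comprising a plurality of preset commodity classification words; a first grading unit, configured to count the number of occurrences of each commodity classification word and take the commodity classification word with the most occurrences as the first-level classification word; a second grading unit, configured to count the commodity classification words that co-occur with the upper-level classification word and take the commodity classification word with the most co-occurrences with the upper-level classification word as the next-level classification word; and a third grading unit, configured to invoke the second grading unit repeatedly until no commodity classification word co-occurs with the upper-level classification word, and to combine the classification words of all levels, level by level, into a commodity classification hierarchy tree, the commodity classification hierarchy tree containing the correspondence between each commodity and the classification words at each level.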
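- A computer device, characterized by comprising a processor and a memory, the processor being communicatively connected to the memory; the memory is configured to store a computer program, and the processor is configured to execute the computer program stored in the memory to implement the automatic commodity classification method according to any one of claims 1 to 8.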
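Claims 2 and 7 can be illustrated together. The claims only say that the preset words in the lexicon are used to segment the title, abstract and keywords; the sketch below swaps in forward maximum matching, a common dictionary-based segmentation technique, as an assumed stand-in. For claim 7 it assumes a hypothetical nested-dict tree of the form `{"items": [...], "children": {word: subtree}}` and walks down while some child word appears among the new commodity's words. The function names `extract_words` and `place_product` and the sample lexicon are hypothetical.

```python
def extract_words(text, lexicon):
    """Forward maximum matching against a set of preset classification words."""
    if not lexicon:
        return []
    max_len = max(len(w) for w in lexicon)
    found = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, shrinking until a lexicon word matches.
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if piece in lexicon:
                if piece not in found:
                    found.append(piece)
                i += size
                break
        else:
            i += 1  # no lexicon word starts here; skip one character
    return found


def place_product(tree, product_id, words):
    """Descend while some child word is among the commodity's words, then attach it."""
    node = tree
    while True:
        child = next((w for w in node["children"] if w in words), None)
        if child is None:
            break
        node = node["children"][child]
    node["items"].append(product_id)
    return node


lexicon = {"贴片", "电阻", "贴片电阻", "0402"}
words = extract_words("0402贴片电阻 1%精度", lexicon)
print(words)  # ['0402', '贴片电阻']
tree = {"items": [], "children": {
    "贴片电阻": {"items": [], "children": {"0402": {"items": ["p1"], "children": {}}}}}}
place_product(tree, "p9", words)
print(tree["children"]["贴片电阻"]["children"]["0402"]["items"])  # ['p1', 'p9']
```

Forward maximum matching prefers the longest lexicon entry, so "贴片电阻" (chip resistor) is kept whole rather than split into "贴片" and "电阻". Under the same assumptions, claim 8's fallback would amount to adding the missing word to `lexicon` and rebuilding the tree from scratch.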
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110936505.9A CN113779243A (zh) | 2021-08-16 | 2021-08-16 | Automatic commodity classification method, apparatus, and computer device |
CN202110936505.9 | 2021-08-16 | | |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2023020508A1 (zh) | 2023-02-23 |
Family ID: 78837834
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/CN2022/112865 WO2023020508A1 (zh) | 2021-08-16 | 2022-08-16 | Automatic commodity classification method, apparatus, and computer device |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN113779243A (zh) |
WO (1) | WO2023020508A1 (zh) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113779243A (zh) * | 2021-08-16 | 2021-12-10 | 深圳市世强元件网络有限公司 | Automatic commodity classification method, apparatus, and computer device |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103778205A (zh) * | 2014-01-13 | 2014-05-07 | 北京奇虎科技有限公司 | Commodity classification method and system based on mutual information |
US20180025364A1 (en) * | 2016-07-20 | 2018-01-25 | Nec Personal Computers, Ltd. | Information processing apparatus, information processing method, and program |
US10909144B1 (en) * | 2015-03-06 | 2021-02-02 | Amazon Technologies, Inc. | Taxonomy generation with statistical analysis and auditing |
CN112463971A (zh) * | 2020-09-15 | 2021-03-09 | 杭州商情智能有限公司 | E-commerce commodity classification method and system based on a hierarchical combination model |
CN113779243A (zh) * | 2021-08-16 | 2021-12-10 | 深圳市世强元件网络有限公司 | Automatic commodity classification method, apparatus, and computer device |
Family Cites Families (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
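CN102737057B (zh) * | 2011-04-14 | 2015-04-01 | 阿里巴巴集团控股有限公司 | Method and apparatus for determining commodity category information |
CN103902545B (zh) * | 2012-12-25 | 2018-10-16 | 北京京东尚科信息技术有限公司 | Category path identification method and system |
CN105320778B (zh) * | 2015-11-25 | 2019-04-02 | 焦点科技股份有限公司 | Method for labeling commodities on Chinese-language e-commerce websites |
CN108280221B (zh) * | 2018-02-08 | 2022-04-15 | 北京百度网讯科技有限公司 | Method, apparatus and computer device for hierarchical construction of points of interest |
CN111782760B (zh) * | 2019-05-09 | 2024-07-16 | 北京沃东天骏信息技术有限公司 | Method, apparatus and device for identifying core product words |
CN111353045B (zh) * | 2020-03-18 | 2023-12-22 | 智者四海(北京)技术有限公司 | Method for constructing a text classification system |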
- 2021-08-16: CN CN202110936505.9A, patent/CN113779243A/zh, active, Pending
- 2022-08-16: WO PCT/CN2022/112865, patent/WO2023020508A1/zh, active, Application Filing
Also Published As
Publication number | Publication date |
---|---|
CN113779243A (zh) | 2021-12-10 |
Similar Documents
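Publication | Title |
---|---|
US9015194B2 (en) | Root cause analysis using interactive data categorization |
US20180173788A1 (en) | System And Method For Providing Inclusion-Based Electronically Stored Information Item Classification Suggestions With The Aid Of A Digital Computer |
CN102567464B (zh) | Knowledge resource organization method based on extended topic maps |
US9996558B2 (en) | Method and system for accessing a set of data tables in a source database |
CN113590698B (zh) | Data asset classification modeling and graded protection method based on artificial intelligence technology |
US11531831B2 (en) | Managing machine learning features |
WO2021175009A1 (zh) | Method, apparatus, device and storage medium for constructing an early-warning event graph |
CN107918657B (zh) | Data source matching method and apparatus |
CN106598999B (zh) | Method and apparatus for calculating the degree to which a text belongs to a topic |
WO2021051864A1 (zh) | Dictionary expansion method and apparatus, electronic device, and storage medium |
CN103034656B (zh) | Chapter content layering method and apparatus, and article content layering method and apparatus |
CN111506727A (zh) | Text content category acquisition method, apparatus, computer device and storage medium |
WO2023020508A1 (zh) | Automatic commodity classification method, apparatus, and computer device |
CN110659365A (zh) | Text classification method for livestock product safety events based on a multi-level structured dictionary |
CN113742292B (zh) | AI-based multi-threaded data retrieval and method for accessing the retrieved data |
CN115617743A (zh) | Science and technology project archive management system based on data acquisition |
JP5324677B2 (ja) | Similar document search support device and similar document search support program |
CN110941645B (zh) | Method, apparatus, storage medium and processor for automatically identifying serial cases |
CN112861956A (zh) | Water pollution model construction method based on data analysis |
CN115510331B (zh) | Shared resource matching method based on idle capacity aggregation |
CN111652708A (zh) | Risk assessment method and apparatus for mortgage loan products |
CN114511027B (zh) | Method for remote extraction of English data through a big data network |
CN109815475B (zh) | Text matching method, apparatus, computing device and system |
JP5310196B2 (ja) | Classification system revision support program, classification system revision support device, and classification system revision support method |
JP3880534B2 (ja) | Document classification method and document classification program |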
Legal Events
Code | Title | Description |
---|---|---|
121 | EP: the EPO has been informed by WIPO that EP was designated in this application | Ref document number: 22857819; Country of ref document: EP; Kind code of ref document: A1 |
NENP | Non-entry into the national phase | Ref country code: DE |
122 | EP: PCT application non-entry in European phase | Ref document number: 22857819; Country of ref document: EP; Kind code of ref document: A1 |