CN115588113A - E-commerce commodity classification system based on hierarchical combination model - Google Patents

E-commerce commodity classification system based on hierarchical combination model

Info

Publication number
CN115588113A
CN115588113A (application CN202211089770.9A)
Authority
CN
China
Prior art keywords
commodity
module
image
classification
features
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
CN202211089770.9A
Other languages
Chinese (zh)
Inventor
曾定茜
游静
罗学强
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangdong Polytechnic Institute
Original Assignee
Guangdong Polytechnic Institute
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangdong Polytechnic Institute
Priority to CN202211089770.9A
Publication of CN115588113A
Legal status: Withdrawn

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/764Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00Commerce
    • G06Q30/06Buying, selling or leasing transactions
    • G06Q30/0601Electronic shopping [e-shopping]
    • G06Q30/0623Item investigation
    • G06Q30/0625Directed, with specific intent or strategy
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/20Image preprocessing
    • G06V10/26Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G06V10/267Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion by performing operations on regions, e.g. growing, shrinking or watersheds
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/20Image preprocessing
    • G06V10/30Noise filtering
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • G06V10/54Extraction of image or video features relating to texture
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • G06V10/56Extraction of image or video features relating to colour
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/762Arrangements for image or video recognition or understanding using pattern recognition or machine learning using clustering, e.g. of similar faces in social networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/80Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Evolutionary Computation (AREA)
  • Computing Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Databases & Information Systems (AREA)
  • Business, Economics & Management (AREA)
  • Finance (AREA)
  • Accounting & Taxation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Development Economics (AREA)
  • Economics (AREA)
  • Marketing (AREA)
  • Strategic Management (AREA)
  • General Business, Economics & Management (AREA)
  • Image Analysis (AREA)

Abstract

The invention belongs to the technical field of commodity classification processing, and discloses an e-commerce commodity classification system based on a hierarchical combination model, which comprises: a commodity title acquisition module, a commodity information acquisition module, a central control module, a commodity image preprocessing module, a commodity feature extraction module, a commodity description information processing module, a keyword extraction module, an association mapping relation determination module, a commodity level combination classification model construction module, a classification module and an output module. The invention constructs a hierarchical combination classification model from multiple kinds of information such as titles, commodity image features and description features, and classifies commodities through the cooperation of the combination model and a neural network. This ensures the accuracy of commodity classification, effectively avoids confusing commodities of different categories that share the same name, and realizes comprehensive, accurate and multi-level e-commerce commodity classification.

Description

E-commerce commodity classification system based on hierarchical combination model
Technical Field
The invention belongs to the technical field of commodity classification processing, and particularly relates to an e-commerce commodity classification system based on a hierarchical combination model.
Background
At present, the e-commerce field continues to grow, and the data it generates can be said to grow exponentially. New commodities emerge endlessly, and manual classification is impractical because the data volume is far too large; the classification of e-commerce commodities therefore has to be completed by a computer classification system.
However, when a product is published on an e-commerce platform, the category to which the product belongs is selected first, and only then are fields such as the attributes and descriptions related to that category filled in. These categories come from the platform's original category system, which is neither accurate nor comprehensive, and adjusting the original category system involves maintaining the data of historically published products, which entails an extremely large workload. In addition, the habits of the merchants who publish commodities and the buyers who purchase them may differ, and a single category system cannot simultaneously meet the needs of different end users.
Through the above analysis, the problems and defects of the prior art are as follows: the commodity classification methods of existing e-commerce platforms are inaccurate, incomplete, and require a large amount of manual work.
Disclosure of Invention
Aiming at the problems in the prior art, the invention provides an E-commerce commodity classification system based on a hierarchical combination model.
The invention is realized as follows. An e-commerce commodity classification system based on a hierarchical combination model comprises:
the commodity title acquisition module is connected with the central control module and is used for acquiring the title names of the e-commerce commodities to be classified;
the commodity image preprocessing module is connected with the central control module and is used for preprocessing the acquired commodity image;
the commodity image preprocessing module is used for preprocessing the obtained commodity image and comprises the following steps:
denoising the commodity image:
Figure BDA0003836741450000021
wherein label1 represents the denoised commodity image, g(x, y, 0) is the initial commodity image, g(x, y, π/2), g(x, y, π) and g(x, y, 3π/2) respectively represent the three images obtained by phase-shifting the initial commodity image at equal intervals of π/2, π and 3π/2, f₀ represents the spatial carrier frequency, and x represents one coordinate value of the pixel coordinates (x, y);
enhancing the denoised commodity image to obtain a preprocessed commodity image;
the association mapping relation determining module is connected with the central control module and is used for determining the association mapping relations among the commodity title name, image features, description information, parameter data and other detailed data;
the commodity level combined classification model building module is connected with the central control module and used for building a commodity level combined classification model;
and the classification module is connected with the central control module and is used for determining the classification result of the commodity, based on the optimized commodity level combination classification model, from the commodity title, commodity image features, commodity keywords, other description information, parameter information and their association mapping relations.
Further, the E-commerce commodity classification system based on the hierarchical combination model further comprises:
the commodity information acquisition module is connected with the central control module and is used for acquiring related information of E-commerce commodities to be classified;
the central control module is connected with the commodity title acquisition module, the commodity information acquisition module, the commodity image preprocessing module, the commodity feature extraction module, the commodity description information processing module, the keyword extraction module, the association mapping relation determination module, the commodity level combination classification model construction module, the classification module and the output module, and is used for controlling each module to work normally by means of a single chip microcomputer or a controller;
the commodity feature extraction module is connected with the central control module and used for extracting commodity image features based on the preprocessed commodity image;
the commodity description information processing module is connected with the central control module and is used for processing the acquired commodity characteristics and the detail description information thereof;
the keyword extraction module is connected with the central control module and is used for extracting description keywords based on the processed commodity characteristics and description information thereof;
and the output module is connected with the central control module and used for outputting the acquired corresponding information of the commodities and the classification result of the commodities.
Further, the commodity information acquisition module includes:
the image acquisition unit is used for acquiring a real object image of the commodity by utilizing the camera equipment or acquiring a commodity image in a scanning mode;
the detailed description information acquisition unit is used for acquiring the characteristics and detailed description information of related commodities;
and the commodity parameter information acquisition unit is used for acquiring the relevant dimensions and other parameter data of the commodities.
Further, the commodity feature extraction module extracting the commodity image features based on the preprocessed commodity image comprises:
removing irrelevant information in the preprocessed commodity image by adopting a dynamic threshold layer-by-layer peeling segmentation method based on color characteristics;
acquiring commodity characteristic information in a commodity image by adopting a clustering segmentation algorithm based on color characteristics and texture characteristics;
removing segmentation fragments of the segmented commodity image by adopting a denoising method based on texture characteristics; and fusing the extracted color and texture features to obtain the commodity image features.
Further, in the process of layer-by-layer peeling and segmenting of the dynamic threshold based on the color features, the adopted model is as follows:
Figure BDA0003836741450000031
wherein t is the threshold value and L is the number of similarity levels;
the sum of the variances of the commodity image area and the background area is specifically as follows:
Figure BDA0003836741450000032
the inter-class variance between the commodity image area and the background area specifically comprises the following steps:
d²(t) = ω₀(t)(μ₀(t) − μ)² + ω₁(t)(μ₁(t) − μ)²;
the proportion of the image area is specifically as follows:
Figure BDA0003836741450000041
the proportion of the background area is specifically as follows:
Figure BDA0003836741450000042
the image area mean value specifically includes:
Figure BDA0003836741450000043
the background area mean is:
Figure BDA0003836741450000044
the overall image mean is:
μ = ω₀(t)μ₀(t) + ω₁(t)μ₁(t);
the sum of the total variance of the image areas is:
Figure BDA0003836741450000045
the sum of the total variance of the background area is:
Figure BDA0003836741450000046
further, the color feature and texture feature-based clustering segmentation algorithm obtains commodity feature information in the commodity image, and the specific process is as follows:
determining the maximum cycle number and the minimum termination cycle difference value by taking the color characteristics and the texture characteristics as the clustering number;
initializing central points of all color features and texture features, and determining fuzzy values of all the color features and the texture features;
determining object values of the color features and the texture features, calculating color values of the color features, and generating new image pixels;
determining clusters corresponding to the color features and the texture features according to the determined object values of the color features and the texture features, and updating the cluster index of each feature;
and recalculating the object values of the color features and the texture features and taking the difference from the previous object values; when the maximum number of cycles is reached or the difference is smaller than the minimum termination difference, stopping the calculation and outputting the corresponding features; otherwise, repeating the above process.
Further, the specific process of fusing the extracted color and texture features is as follows:
converting the corresponding image into a column vector form according to the extracted color and texture features, specifically:
Figure BDA0003836741450000051
wherein D = [d_1, d_2, ..., d_t, ..., d_T] is a given overcomplete dictionary containing T atoms, d_t is an atom of the dictionary D, s_j = [s_j(1), s_j(2), ..., s_j(t), ..., s_j(T)], and s_j is the sparse representation of V_j found by the sparse decomposition model;
the corresponding image is divided into J blocks to form a matrix V, and the matrix is as follows:
Figure BDA0003836741450000052
obtaining S = [s_1, s_2, ..., s_J], where S is the sparse matrix; the commodity image features obtained by fusion are:
V=DS。
further, the building of the commodity-level combined classification model by the commodity-level combined classification model building module includes:
constructing a GRU neural network with a multi-input multi-output structure, and training the neural network by taking a minimum cross entropy loss function as an optimization target to obtain a basic classification model;
obtaining historical commodity titles, commodity image characteristics, commodity keywords, other description information and parameter information, and generating a commodity hierarchy according to the obtained related information;
clustering the historical commodity title, the image characteristics of the commodity, the commodity keywords, other description information and parameter information, and associating the commodity hierarchy to obtain a commodity multi-level keyword list;
establishing a mapping relation of the commodity multi-level keyword list; and grouping and fusing the basic classification models based on the obtained mapping relation to obtain a commodity level combination classification model.
Further, the building of the commodity level combined classification model by the commodity level combined classification model building module further comprises:
and training and optimizing the commodity level combination classification model by using a historical commodity title, commodity image characteristics, commodity keywords, other description information, parameter information and a final classification result to obtain the optimized commodity level combination classification model.
Further, the commodity description information processing module processes the acquired commodity features and their detailed description information, which comprises:
integrating the obtained commodity characteristics and detail description information thereof into a description document of the commodity;
and preprocessing the description document, including word segmentation, lemmatization, stop-word removal, and removal of words with extremely high or extremely low occurrence frequency.
By combining all the above technical schemes, the invention has the following advantages and positive effects: the invention constructs a hierarchical combination classification model from multiple kinds of information such as titles, commodity image features and description features, and classifies commodities through the cooperation of the combination model and a neural network. This ensures the accuracy of commodity classification, effectively avoids confusing commodities of different categories that share the same name, and realizes comprehensive, accurate and multi-level e-commerce commodity classification.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings required by the embodiments are briefly described below. The drawings described below cover only some embodiments of the present application; those skilled in the art can derive other drawings from them without creative effort.
Fig. 1 is a schematic structural diagram of an e-commerce commodity classification system based on a hierarchical combination model according to an embodiment of the present invention.
Fig. 2 is a schematic structural diagram of a commodity information acquisition module according to an embodiment of the present invention.
Fig. 3 is a flowchart of a method for processing the acquired commodity characteristics and the detailed description information thereof by the commodity description information processing module according to the embodiment of the present invention.
Fig. 4 is a flowchart of a method for extracting features of a commodity image based on a preprocessed commodity image by a commodity feature extraction module according to an embodiment of the present invention.
Fig. 5 is a flowchart of a method for building a commodity-level combined classification model by a commodity-level combined classification model building module according to an embodiment of the present invention.
In the figure: 1. a commodity title acquisition module; 2. a commodity information acquisition module; 21. an image acquisition unit; 22. a detailed description information acquisition unit; 23. a commodity parameter information acquisition unit; 3. a central control module; 4. a commodity image preprocessing module; 5. a commodity feature extraction module; 6. a commodity description information processing module; 7. a keyword extraction module; 8. an association mapping relation determining module; 9. a commodity level combination classification model building module; 10. a classification module; 11. and an output module.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is further described in detail with reference to the following embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and do not limit the invention.
Aiming at the problems in the prior art, the invention provides an e-commerce commodity classification system based on a hierarchical combination model, and the invention is described in detail below with reference to the accompanying drawings.
As shown in fig. 1, the electronic commerce goods classification system based on the hierarchical combination model provided by the embodiment of the present invention includes:
the commodity title acquisition module 1 is connected with the central control module 3 and is used for acquiring the title names of the e-commerce commodities to be classified;
the commodity information acquisition module 2 is connected with the central control module 3 and is used for acquiring the related information of the E-commerce commodities to be classified;
the central control module 3 is connected with the commodity title acquisition module 1, the commodity information acquisition module 2, the commodity image preprocessing module 4, the commodity feature extraction module 5, the commodity description information processing module 6, the keyword extraction module 7, the association mapping relation determination module 8, the commodity level combination classification model construction module 9, the classification module 10 and the output module 11, and is used for controlling each module to work normally by means of a single chip microcomputer or a controller;
the commodity image preprocessing module 4 is connected with the central control module 3 and is used for preprocessing the acquired commodity image;
the commodity feature extraction module 5 is connected with the central control module 3 and used for extracting commodity image features based on the preprocessed commodity image;
the commodity description information processing module 6 is connected with the central control module 3 and is used for processing the acquired commodity characteristics and the detail description information thereof;
the keyword extraction module 7 is connected with the central control module 3 and is used for extracting description keywords based on the processed commodity characteristics and description information thereof;
the association mapping relation determining module 8 is connected with the central control module 3 and is used for determining the association mapping relations among the commodity title names, image features, description information, parameter data and other detailed data;
the commodity level combination classification model building module 9 is connected with the central control module 3 and is used for building a commodity level combination classification model;
the classification module 10 is connected with the central control module 3 and is used for determining the classification result of the commodity, based on the optimized commodity level combination classification model, from the commodity title, commodity image features, commodity keywords, other description information, parameter information and their association mapping relations;
and the output module 11 is connected with the central control module 3 and is used for outputting the collected corresponding information of the commodities and the classification result of the commodities.
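For illustration only, the following Python sketch shows one way the central control module could dispatch work to the other modules in the order described above. The class and function names are assumptions introduced for this sketch and are not part of the disclosed system.

# Minimal orchestration sketch (illustrative only): the central control module
# invokes each functional module in the pipeline order described above.
# All class, key and method names are assumptions introduced for illustration.
from dataclasses import dataclass, field
from typing import Any, Callable, Dict


@dataclass
class CentralControl:
    """Dispatches work to the registered modules in a fixed pipeline order."""
    modules: Dict[str, Callable[..., Any]] = field(default_factory=dict)

    def register(self, name: str, fn: Callable[..., Any]) -> None:
        self.modules[name] = fn

    def classify(self, title: str, image, description: str, params: dict) -> dict:
        img = self.modules["preprocess_image"](image)
        img_feat = self.modules["extract_image_features"](img)
        doc = self.modules["process_description"](description, params)
        keywords = self.modules["extract_keywords"](doc)
        mapping = self.modules["build_association_mapping"](title, img_feat, doc, params)
        label = self.modules["hierarchical_classify"](title, img_feat, keywords, params, mapping)
        return self.modules["output"](title, label)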
As shown in fig. 2, the commodity information acquiring module 2 provided in the embodiment of the present invention includes:
the image acquisition unit 21 is used for acquiring a real object image of the commodity by using a camera device or acquiring a commodity image in a scanning mode;
a detailed description information acquisition unit 22, configured to acquire characteristics and detailed description information of a related commodity;
the commodity parameter information acquisition unit 23 is used for acquiring relevant dimensions of commodities and other parameter data.
As shown in fig. 3, the processing module for commodity description information according to an embodiment of the present invention processes the acquired commodity features and the detail description information thereof, including:
s101, integrating the acquired commodity characteristics and the detail description information thereof into a description document of the commodity;
S102, preprocessing the description document, including word segmentation, lemmatization, stop-word removal, and removal of words with extremely high or extremely low occurrence frequency.
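For illustration, a minimal Python sketch of the S101-S102 preprocessing is given below, assuming English-language description documents and the NLTK toolkit; the library choice, frequency thresholds and function names are assumptions of this sketch, not part of the disclosure.

# Requires the NLTK data packages: nltk.download('punkt'), nltk.download('wordnet'),
# nltk.download('stopwords').
from collections import Counter
from nltk.corpus import stopwords
from nltk.stem import WordNetLemmatizer
from nltk.tokenize import word_tokenize


def preprocess_descriptions(documents, min_doc_freq=2, max_doc_ratio=0.9):
    """Tokenize, lemmatize, drop stop words, and drop words that appear in too
    few or too many documents (illustrative thresholds)."""
    lemmatizer = WordNetLemmatizer()
    stop_words = set(stopwords.words("english"))

    tokenized = []
    for doc in documents:
        tokens = [lemmatizer.lemmatize(t.lower()) for t in word_tokenize(doc)]
        tokenized.append([t for t in tokens if t.isalpha() and t not in stop_words])

    # Frequency filtering based on document frequency.
    doc_freq = Counter(t for doc in tokenized for t in set(doc))
    n_docs = len(tokenized)
    keep = {w for w, c in doc_freq.items()
            if c >= min_doc_freq and c <= max_doc_ratio * n_docs}
    return [[t for t in doc if t in keep] for doc in tokenized]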
The commodity image preprocessing module provided by the embodiment of the invention is used for preprocessing the obtained commodity image and comprises the following steps:
denoising the commodity image:
Figure BDA0003836741450000091
wherein label1 represents the denoised commodity image, g(x, y, 0) is the initial commodity image, g(x, y, π/2), g(x, y, π) and g(x, y, 3π/2) respectively represent the three images obtained by phase-shifting the initial commodity image at equal intervals of π/2, π and 3π/2, f₀ represents the spatial carrier frequency, and x represents one coordinate value of the pixel coordinates (x, y); and enhancing the denoised commodity image to obtain the preprocessed commodity image.
As shown in fig. 4, the commodity feature extraction module provided in the embodiment of the present invention extracts commodity image features based on a preprocessed commodity image, and the commodity feature extraction module includes:
s201, removing irrelevant information in the preprocessed commodity image by adopting a dynamic threshold layer-by-layer peeling segmentation method based on color features;
s202, acquiring commodity feature information in a commodity image by adopting a clustering segmentation algorithm based on color features and texture features;
s203, removing segmentation fragments of the segmented commodity image by adopting a denoising method based on texture characteristics; and fusing the extracted color and texture features to obtain the commodity image features.
In S201 provided in the embodiment of the present invention, in the process of layer-by-layer peeling and segmenting of the dynamic threshold based on the color feature, the model used is:
Figure BDA0003836741450000101
wherein t is the threshold value and L is the number of similarity levels;
the sum of the variances of the commodity image area and the background area is specifically as follows:
Figure BDA0003836741450000102
the inter-class variance between the commodity image area and the background area specifically comprises the following steps:
d²(t) = ω₀(t)(μ₀(t) − μ)² + ω₁(t)(μ₁(t) − μ)²;
the proportion of the image area is specifically as follows:
Figure BDA0003836741450000103
the proportion of the background area is specifically as follows:
Figure BDA0003836741450000104
the image area mean value specifically includes:
Figure BDA0003836741450000105
the background area mean is:
Figure BDA0003836741450000106
the overall image mean is:
μ = ω₀(t)μ₀(t) + ω₁(t)μ₁(t);
the sum of the total variance of the image areas is:
Figure BDA0003836741450000107
the sum of the background area total variances is:
Figure BDA0003836741450000111
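The quantities above (the region proportions ω₀(t) and ω₁(t), the region means μ₀(t) and μ₁(t), the overall mean μ, and the between-class variance d²(t)) correspond to Otsu-style threshold selection. The following NumPy sketch computes a single threshold that maximises d²(t) over L levels; the layer-by-layer peeling loop and the colour-space handling are simplified and are assumptions of this sketch.

import numpy as np


def otsu_threshold(gray: np.ndarray, levels: int = 256) -> int:
    """Select t maximising d^2(t) = w0(t)(mu0(t)-mu)^2 + w1(t)(mu1(t)-mu)^2."""
    hist, _ = np.histogram(gray.ravel(), bins=levels, range=(0, levels))
    p = hist / hist.sum()                       # level probabilities
    g = np.arange(levels)
    w0 = np.cumsum(p)                           # proportion of the object region
    w1 = 1.0 - w0                               # proportion of the background region
    mu_cum = np.cumsum(p * g)
    mu = mu_cum[-1]                             # overall image mean
    with np.errstate(divide="ignore", invalid="ignore"):
        mu0 = mu_cum / w0                       # object-region mean
        mu1 = (mu - mu_cum) / w1                # background-region mean
        d2 = w0 * (mu0 - mu) ** 2 + w1 * (mu1 - mu) ** 2
    d2 = np.nan_to_num(d2)
    return int(np.argmax(d2))

In the dynamic layer-by-layer peeling described in S201, such a threshold could be recomputed on the remaining pixels after each peeled layer.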
in S202 provided by the embodiment of the present invention, the clustering segmentation algorithm based on the color features and the texture features obtains the commodity feature information in the commodity image, and the specific process is as follows:
determining the maximum cycle number and the minimum termination cycle difference value by taking the color characteristics and the texture characteristics as the clustering number;
initializing central points of all color features and texture features, and determining fuzzy values of all the color features and the texture features;
determining object values of the color features and the texture features, calculating color values of the color features, and generating new image pixels;
determining clusters corresponding to the color features and the texture features according to the determined object values of the color features and the texture features, and updating the cluster index of each feature;
and recalculating the object values of the color features and the texture features and taking the difference from the previous object values; when the maximum number of cycles is reached or the difference is smaller than the minimum termination difference, stopping the calculation and outputting the corresponding features; otherwise, repeating the above process.
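The loop in S202 (fuzzy values, object values, cluster-index updates, termination on the maximum cycle count or minimum difference) reads like a fuzzy c-means style procedure. The sketch below is a standard fuzzy c-means implementation over concatenated colour and texture feature vectors; treating the step this way is an assumption of this sketch, since the embodiment does not name a specific algorithm.

import numpy as np


def fuzzy_c_means(features, n_clusters, m=2.0, max_iter=100, tol=1e-5, seed=0):
    """features: (n_pixels, n_dims) array of concatenated colour + texture values.
    Returns (cluster centers, hard cluster index of each feature vector)."""
    rng = np.random.default_rng(seed)
    n = features.shape[0]
    u = rng.random((n, n_clusters))
    u /= u.sum(axis=1, keepdims=True)            # fuzzy membership ("fuzzy values")
    prev_obj = np.inf
    centers = None
    for _ in range(max_iter):
        um = u ** m
        centers = (um.T @ features) / um.sum(axis=0)[:, None]
        dist = np.linalg.norm(features[:, None, :] - centers[None, :, :], axis=2) + 1e-12
        obj = float((um * dist ** 2).sum())       # objective ("object") value
        ratio = dist[:, :, None] / dist[:, None, :]
        u = 1.0 / (ratio ** (2.0 / (m - 1))).sum(axis=2)   # membership update
        if abs(prev_obj - obj) < tol:             # minimum termination difference
            break
        prev_obj = obj
    return centers, u.argmax(axis=1)              # cluster index of each feature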
In S203 provided by the embodiment of the present invention, the specific process of fusing the extracted color and texture features is as follows:
converting the corresponding image into a column vector form according to the extracted color and texture features, specifically:
Figure BDA0003836741450000112
wherein D = [d_1, d_2, ..., d_t, ..., d_T] is a given overcomplete dictionary containing T atoms, d_t is an atom of the dictionary D, s_j = [s_j(1), s_j(2), ..., s_j(t), ..., s_j(T)], and s_j is the sparse representation of V_j found by the sparse decomposition model;
the corresponding image is divided into J blocks to form a matrix V, and the matrix is as follows:
Figure BDA0003836741450000113
obtaining S = [s_1, s_2, ..., s_J], where S is the sparse matrix; the commodity image features obtained by fusion are:
V=DS。
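As an illustrative reading of the fusion step, the sketch below computes a sparse code s_j for each image block V_j over a given overcomplete dictionary D, so that V ≈ DS. The use of orthogonal matching pursuit as the sparse decomposition model and the sparsity level are assumptions of this sketch.

import numpy as np


def omp(D, v, n_nonzero=10, tol=1e-8):
    """Orthogonal matching pursuit: greedy sparse decomposition v ≈ D @ s.
    D: (n_features, T) overcomplete dictionary; v: (n_features,) one image block."""
    residual = v.astype(float).copy()
    support, s = [], np.zeros(D.shape[1])
    coef = np.zeros(0)
    for _ in range(n_nonzero):
        idx = int(np.argmax(np.abs(D.T @ residual)))      # best-correlated atom
        if idx not in support:
            support.append(idx)
        coef, *_ = np.linalg.lstsq(D[:, support], v, rcond=None)
        residual = v - D[:, support] @ coef
        if np.linalg.norm(residual) < tol:
            break
    s[support] = coef
    return s


def fuse_features(D, V, n_nonzero=10):
    """Columns of V are the J blocks of the commodity image; returns the sparse
    matrix S = [s_1, ..., s_J] with V ≈ D @ S as the fused feature representation."""
    return np.column_stack([omp(D, V[:, j], n_nonzero) for j in range(V.shape[1])])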
as shown in fig. 5, the step of constructing the commodity-level combined classification model by the commodity-level combined classification model constructing module according to the embodiment of the present invention includes:
s301, constructing a GRU neural network with a multi-input multi-output structure, training the neural network by taking a minimum cross entropy loss function as an optimization target, and obtaining a basic classification model;
s302, obtaining historical commodity titles, commodity image characteristics, commodity keywords, other description information and parameter information, and generating a commodity hierarchy according to the obtained related information;
s303, clustering the historical commodity title, the image characteristics of the commodity, the commodity key words, other description information and parameter information, and associating the commodity hierarchy to obtain a commodity multi-level key word list;
s304, establishing a mapping relation of the commodity multi-level keyword list; and grouping and fusing the basic classification models based on the obtained mapping relation to obtain a commodity level combination classification model.
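The sketch below illustrates what the S301 base classifier might look like in PyTorch: a multi-input (title tokens, keyword tokens, image features, parameter features) and multi-output (one softmax head per category level) GRU network trained with a cross-entropy loss. All layer sizes, input names and batch keys are assumptions introduced for illustration.

import torch
import torch.nn as nn


class BaseGRUClassifier(nn.Module):
    """Multi-input multi-output GRU base classifier (illustrative sketch)."""

    def __init__(self, vocab_size, embed_dim, hidden_dim,
                 img_feat_dim, param_dim, level_sizes):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim, padding_idx=0)
        self.title_gru = nn.GRU(embed_dim, hidden_dim, batch_first=True)
        self.keyword_gru = nn.GRU(embed_dim, hidden_dim, batch_first=True)
        fused = 2 * hidden_dim + img_feat_dim + param_dim
        self.heads = nn.ModuleList([nn.Linear(fused, n) for n in level_sizes])

    def forward(self, title_ids, keyword_ids, img_feat, params):
        _, h_title = self.title_gru(self.embed(title_ids))
        _, h_kw = self.keyword_gru(self.embed(keyword_ids))
        fused = torch.cat([h_title[-1], h_kw[-1], img_feat, params], dim=1)
        return [head(fused) for head in self.heads]   # one logit tensor per level


def training_step(model, optimizer, batch, criterion=nn.CrossEntropyLoss()):
    """Minimise the cross-entropy loss summed over the output levels."""
    logits = model(batch["title"], batch["keywords"], batch["img"], batch["params"])
    loss = sum(criterion(l, y) for l, y in zip(logits, batch["labels"]))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()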
The building module for the commodity level combined classification model provided by the embodiment of the invention further comprises the following steps:
and training and optimizing the commodity level combination classification model by using a historical commodity title, commodity image characteristics, commodity keywords, other description information, parameter information and a final classification result to obtain the optimized commodity level combination classification model.
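One possible way to group and fuse the trained base classifiers according to the multi-level keyword mapping is hierarchical routing: a coarse model predicts the top-level category, which selects the branch model responsible for that category's sub-labels. This routing scheme is an assumption used for illustration; the disclosure does not fix a specific fusion rule.

# Illustrative combination of base classifiers via the multi-level keyword mapping.
# `branch_models` and `mapping` are hypothetical structures for this sketch:
# mapping: level-1 label -> list of level-2 labels handled by that branch.
def hierarchical_classify(sample, level1_model, branch_models, mapping):
    level1_logits = level1_model(*sample)[0]
    level1_label = int(level1_logits.argmax(dim=1))
    branch = branch_models[level1_label]               # grouped base classifier
    level2_logits = branch(*sample)[-1]
    level2_label = mapping[level1_label][int(level2_logits.argmax(dim=1))]
    return level1_label, level2_label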
The above description covers only the preferred embodiments of the present invention and is not intended to limit the scope of the present invention; any modification, equivalent replacement or improvement made by those skilled in the art within the technical scope disclosed by the present invention shall fall within the protection scope of the present invention.

Claims (10)

1. An e-commerce commodity classification system based on a hierarchical combination model is characterized by comprising:
the commodity title acquisition module is connected with the central control module and is used for acquiring the title names of the e-commerce commodities to be classified;
the commodity image preprocessing module is connected with the central control module and is used for preprocessing the acquired commodity image;
the commodity image preprocessing module is used for preprocessing the obtained commodity image and comprises the following steps:
denoising the commodity image:
Figure FDA0003836741440000011
wherein label1 represents the denoised commodity image, g(x, y, 0) is the initial commodity image, g(x, y, π/2), g(x, y, π) and g(x, y, 3π/2) respectively represent the three images obtained by phase-shifting the initial commodity image at equal intervals of π/2, π and 3π/2, f₀ represents the spatial carrier frequency, and x represents one coordinate value of the pixel coordinates (x, y);
enhancing the denoised commodity image to obtain a preprocessed commodity image;
the association mapping relation determining module is connected with the central control module and is used for determining the association mapping relations among the commodity title name, image features, description information, parameter data and other detailed data;
the commodity level combined classification model building module is connected with the central control module and used for building a commodity level combined classification model;
and the classification module is connected with the central control module and is used for determining the classification result of the commodity, based on the optimized commodity level combination classification model, from the commodity title, commodity image features, commodity keywords, other description information, parameter information and their association mapping relations.
2. The system for classifying E-commerce commodities based on a hierarchical combination model as claimed in claim 1, wherein said system for classifying E-commerce commodities based on a hierarchical combination model further comprises:
the commodity information acquisition module is connected with the central control module and is used for acquiring related information of the E-commerce commodities to be classified;
the central control module is connected with the commodity title acquisition module, the commodity information acquisition module, the commodity image preprocessing module, the commodity feature extraction module, the commodity description information processing module, the keyword extraction module, the association mapping relation determination module, the commodity level combination classification model construction module, the classification module and the output module, and is used for controlling each module to work normally by means of a single chip microcomputer or a controller;
the commodity feature extraction module is connected with the central control module and used for extracting commodity image features based on the preprocessed commodity image;
the commodity description information processing module is connected with the central control module and is used for processing the acquired commodity characteristics and the detail description information thereof;
the keyword extraction module is connected with the central control module and is used for extracting description keywords based on the processed commodity characteristics and description information thereof;
and the output module is connected with the central control module and used for outputting the collected corresponding information of the commodities and the classification results of the commodities.
3. The e-commerce commodity classification system based on the hierarchical combination model as claimed in claim 2, wherein the commodity information acquisition module comprises:
the image acquisition unit is used for acquiring a real object image of the commodity by utilizing the camera equipment or acquiring a commodity image in a scanning mode;
the detailed description information acquisition unit is used for acquiring the characteristics and detailed description information of the related commodities;
and the commodity parameter information acquisition unit is used for acquiring the relevant size of the commodity and other parameter data.
4. The e-commerce commodity classification system based on the hierarchical combination model as claimed in claim 2, wherein the commodity feature extraction module extracting commodity image features based on the preprocessed commodity image comprises:
removing irrelevant information in the preprocessed commodity image by adopting a dynamic threshold layer-by-layer peeling and segmenting method based on color characteristics;
acquiring commodity characteristic information in the commodity image by adopting a clustering segmentation algorithm based on color characteristics and texture characteristics;
removing segmentation fragments of the segmented commodity image by a denoising method based on texture characteristics; and fusing the extracted color and texture features to obtain the commodity image features.
5. The system for classifying E-commerce commodities based on a hierarchical combination model as claimed in claim 4, wherein in the dynamic threshold layer-by-layer peeling segmentation process based on color features, the model adopted is as follows:
Figure FDA0003836741440000031
wherein t is the threshold value and L is the number of similarity levels;
the sum of the variances of the commodity image area and the background area is specifically as follows:
Figure FDA0003836741440000032
the inter-class variance between the commodity image area and the background area specifically comprises the following steps:
d²(t) = ω₀(t)(μ₀(t) − μ)² + ω₁(t)(μ₁(t) − μ)²;
the proportion of the image area is specifically as follows:
Figure FDA0003836741440000033
the proportion of the background area is specifically as follows:
Figure FDA0003836741440000034
the image area mean value specifically includes:
Figure FDA0003836741440000035
the background area mean is:
Figure FDA0003836741440000036
the overall image mean is:
μ = ω₀(t)μ₀(t) + ω₁(t)μ₁(t);
the sum of the total variance of the image areas is:
Figure FDA0003836741440000037
the sum of the background area total variances is:
Figure FDA0003836741440000041
6. the e-commerce commodity classification system based on the hierarchical combination model as claimed in claim 4, wherein the color feature and texture feature based clustering segmentation algorithm obtains commodity feature information in a commodity image by the specific process:
determining the maximum cycle times and the minimum termination cycle difference value by taking the color features and the texture features as the cluster number;
initializing central points of all color features and texture features, and determining fuzzy values of all the color features and the texture features;
determining object values of the color features and the texture features, calculating color values of the color features, and generating new image pixels;
determining clusters corresponding to the color features and the texture features according to the determined object values of the color features and the texture features, and updating the cluster index of each feature;
and recalculating the object values of the color features and the texture features and taking the difference from the previous object values; when the maximum number of cycles is reached or the difference is smaller than the minimum termination difference, stopping the calculation and outputting the corresponding features; otherwise, repeating the above process.
7. The E-commerce commodity classification system based on the hierarchical combination model as claimed in claim 4, wherein the specific process of fusing the extracted color and texture features is as follows:
converting the corresponding image into a column vector form according to the extracted color and texture features, specifically:
Figure FDA0003836741440000042
wherein D = [d_1, d_2, ..., d_t, ..., d_T] is a given overcomplete dictionary containing T atoms, d_t is an atom of the dictionary D, s_j = [s_j(1), s_j(2), ..., s_j(t), ..., s_j(T)], and s_j is the sparse representation of V_j found by the sparse decomposition model;
the corresponding image is divided into J blocks to form a matrix V, and the matrix is as follows:
Figure FDA0003836741440000051
obtaining S = [s_1, s_2, ..., s_J], where S is the sparse matrix; the commodity image features obtained by fusion are:
V=DS。
8. the e-commerce commodity classification system based on the hierarchical combination model as claimed in claim 1, wherein the commodity hierarchical combination classification model building module building the commodity hierarchical combination classification model comprises:
constructing a GRU neural network with a multi-input multi-output structure, training the neural network by taking a minimum cross entropy loss function as an optimization target, and obtaining a basic classification model;
obtaining historical commodity titles, commodity image characteristics, commodity keywords, other description information and parameter information, and generating a commodity hierarchy according to the obtained related information;
clustering the historical commodity title, the image characteristics of the commodity, the commodity keywords, other description information and parameter information, and associating the commodity hierarchy to obtain a commodity multi-level keyword list;
establishing a mapping relation of the commodity multi-level keyword list; and grouping and fusing the basic classification models based on the obtained mapping relation to obtain a commodity level combination classification model.
9. The e-commerce commodity classification system based on the hierarchical combination model as claimed in claim 8 wherein the commodity-level combination classification model building module building the commodity-level combination classification model further comprises:
and training and optimizing the commodity level combination classification model by using a historical commodity title, commodity image characteristics, commodity keywords, other description information, parameter information and a final classification result to obtain the optimized commodity level combination classification model.
10. The electronic commerce commodity classification system based on the hierarchical combination model as claimed in claim 2, wherein the commodity description information processing module processing the obtained commodity features and the detail description information thereof includes:
integrating the obtained commodity characteristics and detail description information thereof into a description document of the commodity;
and preprocessing the description document, including word segmentation, lemmatization, stop-word removal, and removal of words with extremely high or extremely low occurrence frequency.
CN202211089770.9A 2022-09-07 2022-09-07 E-commerce commodity classification system based on hierarchical combination model Withdrawn CN115588113A (en)

Priority Applications (1)

Application Number: CN202211089770.9A; Priority Date: 2022-09-07; Filing Date: 2022-09-07; Title: E-commerce commodity classification system based on hierarchical combination model

Applications Claiming Priority (1)

Application Number: CN202211089770.9A; Priority Date: 2022-09-07; Filing Date: 2022-09-07; Title: E-commerce commodity classification system based on hierarchical combination model

Publications (1)

Publication Number: CN115588113A; Publication Date: 2023-01-10

Family

ID=84771222

Family Applications (1)

Application Number: CN202211089770.9A (CN115588113A); Title: E-commerce commodity classification system based on hierarchical combination model; Priority Date: 2022-09-07; Filing Date: 2022-09-07

Country Status (1)

Country Link
CN (1) CN115588113A (en)


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WW01 Invention patent application withdrawn after publication

Application publication date: 20230110