CN115982630A - Intelligent commodity classification method, system, equipment and medium with multiple classifiers cooperated - Google Patents

Intelligent commodity classification method, system, equipment and medium with multiple classifiers cooperated

Info

Publication number
CN115982630A
CN115982630A (application number CN202310208508.XA)
Authority
CN
China
Prior art keywords
commodity
classifiers
idf
vocabulary
classifier
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310208508.XA
Other languages
Chinese (zh)
Inventor
王静
李燕北
朱俊
夏竟翔
戴智鑫
闫晨光
沈达峰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ouye Industrial Products Co ltd
Original Assignee
Ouye Industrial Products Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ouye Industrial Products Co ltd filed Critical Ouye Industrial Products Co ltd
Priority to CN202310208508.XA
Publication of CN115982630A
Legal status: Pending

Classifications

    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D 10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention provides a method, a system, equipment and a medium for intelligent commodity classification with multiple cooperating classifiers, wherein the method comprises the following steps: step S1: acquiring a training set with uniformly distributed data quantity; step S2: performing word segmentation and stop-word removal on the description information of each commodity in the training set to obtain segmentation results; step S3: performing feature coding on each segmented word and calculating its TF-IDF value as the coding weight of that word; step S4: taking the weighted combination of the coded features of all segmented words as the feature code of the commodity; step S5: dividing all data into a training set and a test set for the classifiers, and training a plurality of classifiers; step S6: calculating the weight of each classifier and computing the weighted sum of the classifier outputs; step S7: taking the category with the highest score as the classification result. The invention can determine the category of a commodity in the platform commodity management system from its description information, providing support for functions such as digital commodity management and commodity recommendation.

Description

Intelligent commodity classification method, system, equipment and medium with multiple classifiers cooperated
Technical Field
The invention relates to the technical field of commodity classification, in particular to a method, a system, equipment and a medium for intelligently classifying commodities by means of cooperation of multiple classifiers.
Background
The enterprise e-commerce platform is a virtual network space for conducting commercial activities on the Internet and a management environment that ensures the smooth operation of that commerce; it is an important venue for coordinating and integrating the orderly, coherent and efficient flow of information, goods and funds. Enterprises and merchants can make full use of the shared resources provided by the e-commerce platform, such as the network infrastructure, payment platform, security platform and management platform, to carry out their own commercial activities effectively and at low cost.
The prior art has the following disadvantages: the goods on an e-commerce platform cover a wide range and the classification system is complex, so non-standard or missing category entries filled in by sellers occur easily; the commodity information uploaded by different sellers is heterogeneous and incomplete, and general classification methods perform poorly.
Disclosure of Invention
Aiming at the defects in the prior art, the invention provides a method, a system, equipment and a medium for intelligently classifying commodities by cooperating a plurality of classifiers.
The scheme of the intelligent commodity classification method, system, equipment and medium with multiple cooperating classifiers provided by the invention is as follows:
in a first aspect, a method for intelligently classifying commodities by cooperating multiple classifiers is provided, and the method comprises the following steps:
step S1: acquiring a training set with uniformly distributed data quantity;
step S2: performing word segmentation and stop-word removal on the description information of each commodity in the training set to obtain segmentation results;
step S3: after word segmentation, performing feature coding on each segmented word, calculating the TF-IDF value of each segmented word, and taking that TF-IDF value as the coding weight of the word;
step S4: taking the product of each segmented word's feature code and its weight value as the weighted feature of that word under the category to which it belongs, and taking the sum of the weighted features of all segmented words in the commodity as the feature code of the commodity;
step S5: dividing all data into a training set and a test set for training classifiers, and respectively training a plurality of classifiers;
step S6: calculating the weight value of each classifier, and weighting and summing the results of each classifier;
step S7: and taking the category with the highest score as a classification result.
Preferably, the calculation of TF-IDF in step S3 includes: TF and IDF;
wherein TF represents the frequency with which a vocabulary term occurs in a document; IDF is a measure of the general importance of a term: the fewer the documents that contain a term, the larger its IDF, and the better its category-distinguishing capability. If a term appears with a high frequency TF in one document and rarely appears in other documents, the term is considered to have good category-distinguishing capability and is suitable for classification.
Preferably, the TF-IDF of the i-th vocabulary term t_i with respect to the j-th document d_j is calculated as follows:

TF-IDF_{i,j} = (n_{ij} / Σ_{k=1}^{K} n_{kj}) · log(S / |I|)

where n_{ij} is the number of times the i-th term t_i appears in the j-th document d_j; S is the total number of documents; K is the number of terms in the j-th document; and I is the set of documents that contain t_i.
Preferably, the step S6 adopts AIC information criterion:
AIC_k = -2 log l_k + 2 λ_k

where l_k and λ_k are, respectively, the maximized likelihood and the number of parameters of the k-th classifier;

the weight of the k-th classifier is:

w_k = exp(-AIC_k / 2) / Σ_{m=1}^{K} exp(-AIC_m / 2)

Let the probabilities of classifying sample i into category j given by the above K classifiers be p_{i1}^{(j)}, p_{i2}^{(j)}, ..., p_{iK}^{(j)}.

The combined score P_i^{(j)} of classifying the i-th sample into category j after weighting the classifiers is then:

P_i^{(j)} = Σ_{k=1}^{K} w_k · p_{ik}^{(j)}

The i-th sample takes argmax_j P_i^{(j)} as its classification result.
In a second aspect, a system for intelligently classifying commodities with multiple classifiers in cooperation is provided, and the system comprises:
a module M1: acquiring a training set with uniformly distributed data quantity;
a module M2: performing word segmentation and stop-word removal on the description information of each commodity in the training set to obtain segmentation results;
a module M3: after word segmentation, performing feature coding on each segmented word, calculating the TF-IDF value of each segmented word, and taking that TF-IDF value as the coding weight of the word;
a module M4: taking the product of each segmented word's feature code and its weight value as the weighted feature of that word under the category to which it belongs, and taking the sum of the weighted features of all segmented words in the commodity as the feature code of the commodity;
a module M5: dividing all data into a training set and a test set for training classifiers, and respectively training a plurality of classifiers;
a module M6: calculating the weight value of each classifier, and weighting and summing the results of each classifier;
a module M7: and taking the category with the highest score as a classification result.
Preferably, the calculation of TF-IDF in said module M3 comprises: TF and IDF;
wherein TF represents the frequency with which a vocabulary term occurs in a document; IDF is a measure of the general importance of a term: the fewer the documents that contain a term, the larger its IDF, and the better its category-distinguishing capability. If a term appears with a high frequency TF in one document and rarely appears in other documents, the term is considered to have good category-distinguishing capability and is suitable for classification.
Preferably, the TF-IDF of the i-th vocabulary term t_i with respect to the j-th document d_j is calculated as follows:

TF-IDF_{i,j} = (n_{ij} / Σ_{k=1}^{K} n_{kj}) · log(S / |I|)

where n_{ij} is the number of times the i-th term t_i appears in the j-th document d_j; S is the total number of documents; K is the number of terms in the j-th document; and I is the set of documents that contain t_i.
Preferably, said module M6 uses the AIC information criterion:
AIC_k = -2 log l_k + 2 λ_k

where l_k and λ_k are, respectively, the maximized likelihood and the number of parameters of the k-th classifier;

the weight of the k-th classifier is:

w_k = exp(-AIC_k / 2) / Σ_{m=1}^{K} exp(-AIC_m / 2)

Let the probabilities of classifying sample i into category j given by the above K classifiers be p_{i1}^{(j)}, p_{i2}^{(j)}, ..., p_{iK}^{(j)}.

The combined score P_i^{(j)} of classifying the i-th sample into category j after weighting the classifiers is then:

P_i^{(j)} = Σ_{k=1}^{K} w_k · p_{ik}^{(j)}

The i-th sample takes argmax_j P_i^{(j)} as its classification result.
In a third aspect, a computer readable storage medium is provided, in which a computer program is stored, and the computer program, when executed by a processor, implements the steps in the intelligent classification method for goods by cooperation of multiple classifiers.
In a fourth aspect, an electronic device is provided, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor, and when the computer program is executed by the processor, the electronic device implements the steps in the intelligent classification method for goods by using the cooperation of multiple classifiers.
Compared with the prior art, the invention has the following beneficial effects:
1. According to the invention, commodities are automatically classified in a unified and standard manner based on the commodity description information, which reduces labor cost;
2. The invention classifies commodities using only two parts of information, the commodity name and the model specification, and improves the classification effect through the weighted combination of multiple models.
Other advantages of the present invention will be described in the detailed description, and those skilled in the art will understand the technical features and technical solutions presented in the description.
Drawings
Other features, objects and advantages of the invention will become more apparent upon reading of the detailed description of non-limiting embodiments with reference to the following drawings:
FIG. 1 is a schematic overall flow diagram of the present invention;
FIG. 2 is a diagram illustrating the TF-IDF cumulative contribution;
FIG. 3 is a diagram illustrating TF-IDF growth rate.
Detailed Description
The present invention will be described in detail with reference to specific examples. The following examples will aid those skilled in the art in further understanding the present invention, but are not intended to limit the invention in any manner. It should be noted that various changes and modifications that are obvious to those skilled in the art can be made without departing from the spirit of the invention, all of which fall within the scope of the present invention.
The embodiment of the invention provides an intelligent commodity classification method with multiple classifiers in cooperation, which is used for judging the class of commodities in a platform commodity management system according to description information of the commodities when a purchasing party or a merchant releases commodity information on an E-commerce platform, and providing support for functions of digital management, commodity recommendation and the like of the commodities. Referring to fig. 1, the method specifically includes the following steps:
step S1: and acquiring a training set with uniformly distributed data quantity.
Step S2: performing word segmentation and stop-word removal on the description information of each commodity in the training set to obtain segmentation results.
Specifically, data preprocessing: most commodity classification data sets have an uneven number of samples per class, so data enhancement is performed by methods such as synonym replacement to expand the classes with few samples and obtain a training set with a relatively uniform data distribution. Word segmentation is then performed on the description information of each commodity in the training set to obtain phrases, the basic units of semantic analysis. The segmentation results contain some words that contribute little to classification, for example type specifications, application occasions and materials; these words are treated as stop words so as to further clean the data.
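As an illustration of this preprocessing step, a minimal sketch in Python follows. It assumes the jieba library for Chinese word segmentation; the stop-word list, synonym table and function names are hypothetical placeholders rather than the patent's actual implementation.

```python
import random
import jieba  # widely used Chinese word-segmentation library (an assumed choice, not mandated by the patent)

# Illustrative stop-word list and synonym table; in practice these are supplied by the user.
STOP_WORDS = {"规格", "型号", "材质"}
SYNONYMS = {"扳手": ["扳子"]}

def segment(description: str) -> list[str]:
    """Segment a commodity description and drop stop words."""
    return [w for w in jieba.lcut(description) if w.strip() and w not in STOP_WORDS]

def augment(tokens: list[str]) -> list[str]:
    """Create one augmented sample by synonym replacement."""
    return [random.choice(SYNONYMS[w]) if w in SYNONYMS else w for w in tokens]

def balance(dataset: dict[str, list[list[str]]], target: int) -> dict[str, list[list[str]]]:
    """Expand classes with few samples until each class holds roughly `target` samples."""
    for samples in dataset.values():
        while len(samples) < target:
            samples.append(augment(random.choice(samples)))
    return dataset
```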
Step S3: after word segmentation, performing feature coding on each segmented word, calculating the TF-IDF value of each segmented word, and taking that TF-IDF value as the coding weight of the word.
Step S4: taking the product of each segmented word's feature code and its weight value as the weighted feature of that word under the category to which it belongs, and taking the sum of the weighted features of all segmented words in the commodity as the feature code of the commodity.
Specifically, feature engineering: the segmentation results of the textual description of a commodity are converted into feature vectors that a machine can process by means of word-vector coding. The coding method can be word2vec, deep-learning-based encoders, or similar. Considering that some words in the segmented phrases contribute little to classification and may even interfere with it, and in order to further remove redundant information, reduce algorithm complexity and improve efficiency, the invention calculates the TF-IDF value of each word to obtain its importance for classification, retains only the words with a larger influence on classification, takes the TF-IDF value of each word as its coding weight, and takes the weighted combination of all word codes of the commodity as the feature code of the commodity.
The calculation of TF-IDF mainly comprises two parts: TF (term frequency) and IDF (inverse document frequency). TF represents the frequency with which a vocabulary term occurs in a document, and IDF is a measure of the general importance of the term: the fewer the documents that contain a term, the larger its IDF, and the better its category-distinguishing capability. If a term appears with a high frequency TF in one document and rarely appears in other documents, the term is considered to have good category-distinguishing capability and is suitable for classification. The TF-IDF of the i-th term t_i with respect to the j-th document d_j is calculated as follows:
TF-IDF_{i,j} = (n_{ij} / Σ_{k=1}^{K} n_{kj}) · log(S / |I|)

where n_{ij} is the number of times the i-th term t_i appears in the j-th document d_j; S is the total number of documents; K is the number of terms in the j-th document; and I is the set of documents that contain t_i.
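A minimal sketch of this feature-encoding step is given below. It implements the TF-IDF formula above together with the weighted sum of step S4, assuming pre-trained word vectors (for example from a word2vec model) are available; `word_vectors` and the helper names are assumptions, not part of the patent.

```python
import math
import numpy as np

def tf_idf(term_counts: list[dict[str, int]]) -> list[dict[str, float]]:
    """term_counts[j] maps each term to its count n_ij in document (category) j."""
    S = len(term_counts)                        # total number of documents
    df: dict[str, int] = {}                     # |I|: number of documents containing each term
    for counts in term_counts:
        for term in counts:
            df[term] = df.get(term, 0) + 1
    weights = []
    for counts in term_counts:
        total = sum(counts.values())            # sum over k of n_kj
        weights.append({t: (n / total) * math.log(S / df[t]) for t, n in counts.items()})
    return weights

def encode_commodity(tokens: list[str], word_vectors: dict[str, np.ndarray],
                     tfidf: dict[str, float]) -> np.ndarray:
    """Feature code of a commodity: sum of word vectors weighted by their TF-IDF values."""
    dim = len(next(iter(word_vectors.values())))
    code = np.zeros(dim)
    for t in tokens:
        if t in word_vectors and t in tfidf:
            code += tfidf[t] * word_vectors[t]
    return code
```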
For the classification task of the invention, each category is treated as one document. After the TF-IDF values are calculated, the TF-IDF of every term is summed over the rows, the terms are sorted, and the cumulative contribution of the terms is computed column by column; the results on our own training samples are shown in FIG. 2 and FIG. 3. The 5,000 terms with the highest classification importance, whose cumulative contribution is about 80 percent, are selected for classification prediction, which achieves further dimensionality reduction.
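The dimensionality-reduction step just described can be sketched as follows, assuming the term-by-category TF-IDF values have been arranged in a matrix; the 5,000 cut-off comes from the description above, while the matrix layout and names are illustrative assumptions.

```python
import numpy as np

def select_vocabulary(tfidf_matrix: np.ndarray, vocab: list[str], top_n: int = 5000) -> list[str]:
    """tfidf_matrix has shape (n_categories, n_terms); each category is treated as one document."""
    importance = tfidf_matrix.sum(axis=0)        # total TF-IDF of each term over all categories
    order = np.argsort(importance)[::-1]         # terms sorted by descending importance
    cumulative = np.cumsum(importance[order]) / importance.sum()
    kept = min(top_n, len(vocab))
    print(f"cumulative contribution of top {kept} terms: {cumulative[kept - 1]:.2%}")
    return [vocab[i] for i in order[:kept]]
```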
Step S5: and dividing all commodity feature codes into a training set and a testing set for training the classifiers, and respectively training a plurality of classifiers.
Step S6: and calculating the weight value of each classifier, and weighting and summing the results of each classifier.
Step S7: and taking the category with the highest score as a classification result.
Specifically, multiple classifiers act in concert. All data are divided into a training set and a test set at a ratio of 4:1, and several classifiers with good classification performance, such as SVM, XGBoost, Random Forest, AdaBoost and DNN multi-class classifiers, are trained separately. The feature codes and the true labels of the samples are fed into these classifier models, and the hyper-parameters of each model are tuned so that every model achieves a good classification effect.
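A sketch of this training step using scikit-learn and xgboost is shown below; `X` (commodity feature codes) and `y` (category labels) are assumed to have been produced by the previous steps, and the hyper-parameter values are placeholders rather than the ones actually tuned in the patent.

```python
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier, AdaBoostClassifier
from sklearn.neural_network import MLPClassifier
from xgboost import XGBClassifier

# X: array of commodity feature codes, y: integer category labels (prepared by the previous steps).
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)  # 4:1 split

classifiers = {
    "svm": SVC(probability=True),   # probability=True so per-class scores can be combined later
    "xgboost": XGBClassifier(),
    "random_forest": RandomForestClassifier(n_estimators=200),
    "adaboost": AdaBoostClassifier(),
    "dnn": MLPClassifier(hidden_layer_sizes=(128, 64)),
}

for name, clf in classifiers.items():
    clf.fit(X_train, y_train)
    print(name, "test accuracy:", clf.score(X_test, y_test))
```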
In order to obtain an ideal classification result, and considering that the overall classification results of the different models are similar while their precision on specific classes differs, the results of the multiple classifiers are combined by model averaging to exploit the advantages of the different classifiers. The invention uses a model-averaging method to compute a weighted average of the prediction results of the above algorithms with certain weights. The weights can be obtained with common information criteria such as AIC or BIC: AIC selects a good model from a prediction perspective, while BIC selects the model that best fits the existing data from a fitting perspective. To obtain a good prediction effect, the invention takes AIC as an example:
AIC_k = -2 log l_k + 2 λ_k

where l_k and λ_k are, respectively, the maximized likelihood and the number of parameters of the k-th model;

the weight of the k-th model is:

w_k = exp(-AIC_k / 2) / Σ_{m=1}^{K} exp(-AIC_m / 2)

Let the probabilities of classifying sample i into category j given by the above K algorithms be p_{i1}^{(j)}, p_{i2}^{(j)}, ..., p_{iK}^{(j)}.

After model weighting, the combined score P_i^{(j)} of classifying the i-th sample into category j is:

P_i^{(j)} = Σ_{k=1}^{K} w_k · p_{ik}^{(j)}

The i-th sample takes argmax_j P_i^{(j)} as its classification result.
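The model-averaging step above could be implemented along the following lines. It assumes each trained classifier exposes `predict_proba`, and it approximates the log-likelihood term of the AIC from held-out data via `sklearn.metrics.log_loss`, which is one reasonable reading of the formula rather than the patent's exact computation; the per-model parameter counts must be supplied by the user.

```python
import numpy as np
from sklearn.metrics import log_loss

def aic(log_likelihood: float, n_params: int) -> float:
    """AIC_k = -2 log l_k + 2 lambda_k."""
    return -2.0 * log_likelihood + 2.0 * n_params

def akaike_weights(aics: np.ndarray) -> np.ndarray:
    """w_k = exp(-AIC_k/2) / sum_m exp(-AIC_m/2); subtracting the minimum leaves the weights unchanged."""
    rel = aics - aics.min()
    w = np.exp(-0.5 * rel)
    return w / w.sum()

def combined_predict(models: list, weights: np.ndarray, X: np.ndarray) -> np.ndarray:
    """P_i^(j) = sum_k w_k * p_ik^(j); returns argmax_j for every sample i."""
    scores = sum(w * m.predict_proba(X) for m, w in zip(models, weights))
    return scores.argmax(axis=1)

# Example usage (n_params_list is a user-supplied assumption for each model):
# aics = np.array([aic(-log_loss(y_test, m.predict_proba(X_test), normalize=False), p)
#                  for m, p in zip(classifiers.values(), n_params_list)])
# weights = akaike_weights(aics)
# y_pred = combined_predict(list(classifiers.values()), weights, X_test)
```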
The invention also provides an intelligent commodity classification system with multiple cooperating classifiers, which can be realized by executing the flow steps of the intelligent commodity classification method with multiple cooperating classifiers; that is, those skilled in the art may understand the intelligent commodity classification method with multiple cooperating classifiers as a preferred embodiment of the intelligent commodity classification system with multiple cooperating classifiers.
A module M1: and acquiring a training set with uniformly distributed data quantity.
A module M2: performing word segmentation and stop-word removal on the description information of each commodity in the training set to obtain segmentation results.
Specifically, data preprocessing: most commodity classification data sets have an uneven number of samples per class, so data enhancement is performed by methods such as synonym replacement to expand the classes with few samples and obtain a training set with a relatively uniform data distribution. Word segmentation is then performed on the description information of each commodity in the training set to obtain phrases, the basic units of semantic analysis. The segmentation results contain some words that contribute little to classification, for example type specifications, application occasions and materials; these words are treated as stop words so as to further clean the data.
A module M3: after word segmentation, performing feature coding on each segmented word, calculating the TF-IDF value of each segmented word, and taking that TF-IDF value as the coding weight of the word.
A module M4: taking the product of each segmented word's feature code and its weight value as the weighted feature of that word under the category to which it belongs, and taking the sum of the weighted features of all segmented words in the commodity as the feature code of the commodity.
Specifically, feature engineering: the segmentation results of the textual description of a commodity are converted into feature vectors that a machine can process by means of word-vector coding. The coding method can be word2vec, deep-learning-based encoders, or similar. Considering that some words in the segmented phrases contribute little to classification and may even interfere with it, and in order to further remove redundant information, reduce algorithm complexity and improve efficiency, the invention calculates the TF-IDF value of each word to obtain its importance for classification, retains only the words with a larger influence on classification, takes the TF-IDF value of each word as its coding weight, and takes the weighted combination of all word codes of the commodity as the feature code of the commodity.
The calculation of TF-IDF mainly comprises two parts: TF (term frequency) and IDF (inverse document frequency). TF represents the frequency with which a vocabulary term occurs in a document, and IDF is a measure of the general importance of the term: the fewer the documents that contain a term, the larger its IDF, and the better its category-distinguishing capability. If a term appears with a high frequency TF in one document and rarely appears in other documents, the term is considered to have good category-distinguishing capability and is suitable for classification. The TF-IDF of the i-th term t_i with respect to the j-th document d_j is calculated as follows:

TF-IDF_{i,j} = (n_{ij} / Σ_{k=1}^{K} n_{kj}) · log(S / |I|)

where n_{ij} is the number of times the i-th term t_i appears in the j-th document d_j; S is the total number of documents; K is the number of terms in the j-th document; and I is the set of documents that contain t_i.
For the classification task of the invention, each category is treated as one document. After the TF-IDF values are calculated, the TF-IDF of every term is summed over the rows, the terms are sorted, and the cumulative contribution of the terms is computed column by column; the results on our own training samples are shown in FIG. 2 and FIG. 3. The 5,000 terms with the highest classification importance, whose cumulative contribution is about 80 percent, are selected for classification prediction, which achieves further dimensionality reduction.
A module M5: all data are divided into a training set and a test set for training classifiers, and a plurality of classifiers are trained respectively.
A module M6: and calculating the weight value of each classifier, and weighting and summing the results of each classifier.
A module M7: and taking the category with the highest score as a classification result.
Specifically, multiple classifiers function in concert. All data are divided into a training set and a test set at a ratio of 4:1, and several classifiers with good classification performance, such as SVM, XGBoost, Random Forest, AdaBoost and DNN multi-class classifiers, are trained separately. The feature codes and the true labels of the samples are fed into the models, and the hyper-parameters of the models are tuned so that every model achieves a good classification effect.
In order to obtain an ideal classification result, and considering that the overall classification results of the different models are similar while their precision on specific classes differs, the results of the multiple classifiers are combined by model averaging to exploit the advantages of the different classifiers. The invention uses a model-averaging method to compute a weighted average of the prediction results of the above algorithms with certain weights. The weights can be obtained with common information criteria such as AIC or BIC. Taking AIC as an example:
AIC_k = -2 log l_k + 2 λ_k

where l_k and λ_k are, respectively, the maximized likelihood and the number of parameters of the k-th model;

the weight of the k-th model is:

w_k = exp(-AIC_k / 2) / Σ_{m=1}^{K} exp(-AIC_m / 2)

Let the probabilities of classifying sample i into category j given by the above K algorithms be p_{i1}^{(j)}, p_{i2}^{(j)}, ..., p_{iK}^{(j)}.

After model weighting, the combined score P_i^{(j)} of classifying the i-th sample into category j is:

P_i^{(j)} = Σ_{k=1}^{K} w_k · p_{ik}^{(j)}

The i-th sample takes argmax_j P_i^{(j)} as its classification result.
The embodiment of the invention provides an intelligent commodity classification method, system, equipment and medium with multiple classifiers in cooperation, which can automatically perform unified and standard classification through commodity description information, reduce labor cost, classify commodities only by means of commodity names and model specifications, and improve the classification effect of the method in a mode of weighted combination of multiple models.
Those skilled in the art will appreciate that, in addition to implementing the system and its various devices, modules, units provided by the present invention as pure computer readable program code, the system and its various devices, modules, units provided by the present invention can be fully implemented by logically programming method steps in the form of logic gates, switches, application specific integrated circuits, programmable logic controllers, embedded microcontrollers and the like. Therefore, the system and various devices, modules and units thereof provided by the invention can be regarded as a hardware component, and the devices, modules and units included in the system for realizing various functions can also be regarded as structures in the hardware component; means, modules, units for performing the various functions may also be regarded as structures within both software modules and hardware components for performing the method.
The foregoing description of specific embodiments of the present invention has been presented. It is to be understood that the present invention is not limited to the specific embodiments described above, and that various changes or modifications may be made by one skilled in the art within the scope of the appended claims without departing from the spirit of the invention. The embodiments and features of the embodiments of the present application may be combined with each other arbitrarily without conflict.

Claims (10)

1. A commodity intelligent classification method with multiple classifiers in cooperation is characterized by comprising the following steps:
step S1: acquiring a training set with uniformly distributed data quantity;
step S2: performing word segmentation and stop-word removal on the description information of each commodity in the training set to obtain segmentation results;
step S3: after word segmentation, performing feature coding on each segmented word, calculating the TF-IDF value of each segmented word, and taking that TF-IDF value as the coding weight of the word;
step S4: taking the product of each segmented word's feature code and its weight value as the weighted feature of that word under the category to which it belongs, and taking the sum of the weighted features of all segmented words in the commodity as the feature code of the commodity;
step S5: dividing the feature codes of all commodities into a training set and a test set for training classifiers, and respectively training a plurality of classifiers;
step S6: calculating the weight value of each classifier, and weighting and summing the results of each classifier;
step S7: and taking the category with the highest score as a classification result.
2. The intelligent classification method for commodities with multiple classifiers in cooperation according to claim 1, wherein the calculation of the TF-IDF in the step S3 comprises: TF and IDF;
wherein TF represents the frequency with which a vocabulary term occurs in a document; IDF is a measure of the general importance of a term: the fewer the documents that contain a term, the larger its IDF, and the better its category-distinguishing capability. If a term appears with a high frequency TF in one document and rarely appears in other documents, the term is considered to have good category-distinguishing capability and is suitable for classification.
3. The intelligent commodity classification method based on cooperation of multiple classifiers according to claim 2, wherein the TF-IDF of the i-th vocabulary term t_i with respect to the j-th document d_j is calculated as follows:

TF-IDF_{i,j} = (n_{ij} / Σ_{k=1}^{K} n_{kj}) · log(S / |I|)

where n_{ij} is the number of times the i-th term t_i appears in the j-th document d_j; S is the total number of documents; K is the number of terms in the j-th document; and I is the set of documents that contain t_i.
4. The intelligent classification method for commodities with multiple classifiers in cooperation according to claim 1, wherein the step S6 adopts AIC information criterion:
AIC_k = -2 log l_k + 2 λ_k

where l_k and λ_k are, respectively, the maximized likelihood and the number of parameters of the k-th classifier;

the weight of the k-th classifier is:

w_k = exp(-AIC_k / 2) / Σ_{m=1}^{K} exp(-AIC_m / 2)

let the probabilities of classifying sample i into category j given by the K classifiers be p_{i1}^{(j)}, p_{i2}^{(j)}, ..., p_{iK}^{(j)};

the combined score P_i^{(j)} of classifying the i-th sample into category j after weighting the classifiers is:

P_i^{(j)} = Σ_{k=1}^{K} w_k · p_{ik}^{(j)};

the i-th sample takes argmax_j P_i^{(j)} as its classification result.
5. An intelligent commodity classification system with multiple classifiers in cooperation is characterized by comprising:
a module M1: acquiring a training set with uniformly distributed data quantity;
a module M2: performing word segmentation and stop-word removal on the description information of each commodity in the training set to obtain segmentation results;
a module M3: after word segmentation, performing feature coding on each segmented word, calculating the TF-IDF value of each segmented word, and taking that TF-IDF value as the coding weight of the word;
a module M4: taking the product of each segmented word's feature code and its weight value as the weighted feature of that word under the category to which it belongs, and taking the sum of the weighted features of all segmented words in the commodity as the feature code of the commodity;
a module M5: dividing all data into a training set and a test set for training classifiers, and respectively training a plurality of classifiers;
a module M6: calculating the weight value of each classifier, and weighting and summing the results of each classifier;
a module M7: and taking the category with the highest score as a classification result.
6. The intelligent classification system for commodities with multiple cooperative classifiers according to claim 5, wherein the calculation of TF-IDF in the module M3 comprises: TF and IDF;
wherein TF represents the frequency with which a vocabulary term occurs in a document; IDF is a measure of the general importance of a term: the fewer the documents that contain a term, the larger its IDF, and the better its category-distinguishing capability. If a term appears with a high frequency TF in one document and rarely appears in other documents, the term is considered to have good category-distinguishing capability and is suitable for classification.
7. The intelligent classification system for commodities with multiple classifiers in cooperation as claimed in claim 6, wherein the ith vocabulary t i With respect to the jth document d j The TF-IDF of (A) is calculated as follows:
Figure FDA0004111728570000026
wherein n is ij Indicates the ith word t i Appear in the jth document d j The number of times of (c); s is the total number of the documents; k represents the number of words in the jth document; i represents a group containing t i Has a collection of documents.
8. The intelligent classification system for commodities with multiple coordinated classifiers according to claim 5, wherein said module M6 employs AIC information criterion:
AIC_k = -2 log l_k + 2 λ_k

where l_k and λ_k are, respectively, the maximized likelihood and the number of parameters of the k-th classifier;

the weight of the k-th classifier is:

w_k = exp(-AIC_k / 2) / Σ_{m=1}^{K} exp(-AIC_m / 2)

let the probabilities of classifying sample i into category j given by the K classifiers be p_{i1}^{(j)}, p_{i2}^{(j)}, ..., p_{iK}^{(j)};

the combined score P_i^{(j)} of classifying the i-th sample into category j after weighting the classifiers is:

P_i^{(j)} = Σ_{k=1}^{K} w_k · p_{ik}^{(j)};

the i-th sample takes argmax_j P_i^{(j)} as its classification result.
9. A computer-readable storage medium storing a computer program, wherein the computer program, when executed by a processor, implements the steps of the intelligent classification method for commodities with the cooperation of a plurality of classifiers according to any one of claims 1 to 4.
10. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the computer program, when executed by the processor, implements the steps of the intelligent classification method for goods by cooperation of a plurality of classifiers as claimed in any one of claims 1 to 4.
CN202310208508.XA 2023-03-06 2023-03-06 Intelligent commodity classification method, system, equipment and medium with multiple classifiers cooperated Pending CN115982630A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310208508.XA CN115982630A (en) 2023-03-06 2023-03-06 Intelligent commodity classification method, system, equipment and medium with multiple classifiers cooperated

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310208508.XA CN115982630A (en) 2023-03-06 2023-03-06 Intelligent commodity classification method, system, equipment and medium with multiple classifiers cooperated

Publications (1)

Publication Number Publication Date
CN115982630A true CN115982630A (en) 2023-04-18

Family

ID=85974463

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310208508.XA Pending CN115982630A (en) 2023-03-06 2023-03-06 Intelligent commodity classification method, system, equipment and medium with multiple classifiers cooperated

Country Status (1)

Country Link
CN (1) CN115982630A (en)


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination