CN116166805A - Commodity coding prediction method and device - Google Patents

Commodity coding prediction method and device

Info

Publication number
CN116166805A
CN116166805A
Authority
CN
China
Prior art keywords
cosmetic
feature vector
frequency
commodity
prediction
Prior art date
Legal status
Granted
Application number
CN202310174800.4A
Other languages
Chinese (zh)
Other versions
CN116166805B (en)
Inventor
徐梦璇
张丹
熊晓菁
Current Assignee
Beijing Qingmeng Shuhai Technology Co ltd
Original Assignee
Beijing Qingmeng Shuhai Technology Co ltd
Priority date
Filing date
Publication date
Application filed by Beijing Qingmeng Shuhai Technology Co ltd filed Critical Beijing Qingmeng Shuhai Technology Co ltd
Priority to CN202310174800.4A priority Critical patent/CN116166805B/en
Publication of CN116166805A publication Critical patent/CN116166805A/en
Application granted granted Critical
Publication of CN116166805B publication Critical patent/CN116166805B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2453 Query optimisation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2458 Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F16/2474 Sequence data queries, e.g. querying versioned data
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/205 Parsing
    • G06F40/216 Parsing using statistical methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/237 Lexical tools
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02P CLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
    • Y02P90/00 Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
    • Y02P90/30 Computing systems specially adapted for manufacturing

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Probability & Statistics with Applications (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • General Health & Medical Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Fuzzy Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application discloses a method and a device for predicting commodity codes. The method comprises the following steps: taking the commodity information of cosmetics to be declared as a prediction sample, and calculating the word frequency-inverse text frequency of each feature vector in the prediction sample in each first-layer word stock; predicting the cosmetic category of the cosmetics to be declared according to the word frequency-inverse text frequency of each feature vector in each first-layer word stock; calculating the word frequency-inverse text frequency of each feature vector in the prediction sample in each second-layer word stock; and predicting the commodity code of the cosmetics to be declared according to the word frequency-inverse text frequency of each feature vector in each second-layer word stock. The method and the device can improve classification accuracy and the matching degree of commodity codes, and thereby save the time spent querying commodity codes.

Description

Commodity coding prediction method and device
Technical Field
The application belongs to the technical field of big data, and particularly relates to a method and a device for predicting commodity codes.
Background
When cosmetics enterprises declare import and export goods, the HS (Harmonized System) code of the goods must be filled in on the customs declaration attached to the goods. The HS code is a set of international trade commodity classification codes, mainly used by customs personnel to confirm the commodity category, carry out commodity classification management, audit tariff standards and check commodity quality indexes. The HS coding system currently used in China consists of ten digits; usually one commodity corresponds to only one HS code, while one HS code may correspond to more than one commodity. Correctly filling in the HS code can accelerate the customs process, ensure smooth clearance of the goods, and avoid extra cost or delay. If the HS code is wrongly classified, the normal order of customs is disturbed, and in serious cases the enterprise will be administratively penalized by the customs.
In order to correctly fill in the commodity code of cosmetics, the declaration personnel of an enterprise need to know the basic knowledge of HS code classification as well as the properties, characteristics and uses of the commodity itself. This requires knowledge accumulated over years, and not every declarant can classify and distinguish the HS codes of commodities quickly and skillfully. Currently, there are many websites that can query HS codes by obtaining a keyword entered by the user and then returning all relevant HS codes that contain the keyword. However, the query results are numerous, span different categories, lack hierarchical relationships and have a low matching degree, which increases the time cost for enterprises to query commodity codes.
Summary of the application
The embodiment of the application aims to provide a method and a device for predicting commodity codes, which are used for solving the defect of low matching degree of commodity code query in the prior art.
In order to solve the technical problems, the application is realized as follows:
in a first aspect, a method of predicting commodity coding is provided, comprising the steps of:
taking commodity information of cosmetics to be declared as a prediction sample, and calculating word frequency-inverse text frequency of each feature vector in the prediction sample in each first-layer word stock, wherein each feature vector corresponds to one attribute of the cosmetics to be declared, and each first-layer word stock corresponds to one cosmetic class;
predicting the cosmetic category to which the cosmetic to be declared belongs according to the word frequency-inverse text frequency of each feature vector in each first-layer word stock, the prior probability of each cosmetic category and the correlation between the cosmetic attribute and the cosmetic category in the historical declaration data;
calculating word frequency-inverse text frequency of each feature vector in the prediction sample in each second-layer word stock, wherein each second-layer word stock corresponds to a commodity code contained in the cosmetic category to which the cosmetic to be declared belongs;
predicting the commodity code of the cosmetic to be declared according to the word frequency-inverse text frequency of each feature vector in each second-layer word stock, the prior probability of each commodity code and the correlation between the cosmetic attribute and the commodity code in the historical declaration data.
In a second aspect, there is provided an apparatus for predicting commodity codes, comprising:
the first calculation module is used for taking commodity information of cosmetics to be declared as a prediction sample, calculating word frequency-inverse text frequency of each feature vector in the prediction sample in each first-layer word stock, wherein each feature vector corresponds to one attribute of the cosmetics to be declared, and each first-layer word stock corresponds to one cosmetic category;
the first prediction module is used for predicting the cosmetic category to which the cosmetic to be declared belongs according to the word frequency-inverse text frequency of each feature vector in each first-layer word stock, the prior probability of each cosmetic category and the correlation between the cosmetic attribute and the cosmetic category in the historical declaration data;
the second calculation module is used for calculating the word frequency-inverse text frequency of each feature vector in the prediction sample in each second-layer word stock, and each second-layer word stock corresponds to one commodity code contained in the cosmetic category to which the cosmetic to be declared belongs;
and the second prediction module is used for predicting the commodity code to which the cosmetics to be declared belong according to the word frequency-inverse text frequency of each feature vector in each second-layer word stock, the prior probability of each commodity code and the correlation between the cosmetic attribute and the commodity code in the historical declaration data.
According to the method and the device for predicting the commodity codes, the classification accuracy and the matching degree of the commodity codes can be improved, and the time for inquiring the commodity codes is further saved.
Drawings
FIG. 1 is a flow chart of a method for predicting commodity codes according to an embodiment of the present application;
fig. 2 is a schematic structural diagram of an apparatus for predicting commodity codes according to an embodiment of the present application.
Detailed Description
The following description of the embodiments of the present application will be made clearly and fully with reference to the accompanying drawings, in which it is evident that the embodiments described are some, but not all, of the embodiments of the present application. All other embodiments, which can be made by one of ordinary skill in the art based on the embodiments herein without making any inventive effort, are intended to be within the scope of the present application.
In order to solve the problems in the prior art, the embodiment of the application provides a method for searching cosmetic HS codes based on an improved naive Bayes classifier, which is improved from the following three aspects:
1. because the number of commodity codes contained under the cosmetics category is large, and the establishment difficulty of the multi-classification model is large, the embodiment of the application establishes a two-layer classification model, wherein the first layer predicts the cosmetic category to which the commodity belongs and the second layer predicts the specific commodity code to which the commodity belongs.
2. By calculating TF-IDF values instead of the conditional probabilities in the naive Bayes model, the evaluation of word importance is added to the classification process.
3. By calculating the correlation between the attributes and the categories, different weights are given to different attributes, and the higher the degree of correlation between the attributes and the categories, the greater the importance of the attributes to the categories, and therefore the higher the weight given to the attributes.
In order to achieve the above purpose, the embodiment of the present application provides the following technical solutions:
1. and acquiring historical declaration data of cosmetics, and establishing a label extraction model based on commodity information filled in by enterprises.
2. Based on the improved naive Bayes classifier, a first-layer classification model is established, and the cosmetic category to which the commodity belongs is predicted.
2.1 Define the first-layer classification model categories $L_1=\{l_1,l_2,l_3,l_4,l_5\}$, corresponding to the five cosmetic categories.
2.2, acquiring historical declaration data of cosmetics as training samples, and establishing five word banks according to the types of the cosmetics.
And 2.3, acquiring commodity information of cosmetics to be declared, which is input by enterprises, as a prediction sample. The prediction samples are subjected to label extraction in step 1.
2.4 Define an improved naive Bayes classifier:

$$f(X)=\arg\max_{j} P(L_1=l_j)\prod_{i=1}^{n}\left(tf_{ij}\times idf_i\right)^{w_i}$$
2.5 Calculate the prior probability $P(L_1=l_j)$ of each cosmetic class.
2.6 Calculate the TF-IDF value, i.e. the word frequency-inverse text frequency, of each feature vector $x_i$ in the prediction sample $X=\{x_1,x_2,\dots,x_n\}$, used to evaluate the importance of words in the word stock.
2.7 Calculate the correlation between the cosmetic attributes and the cosmetic categories. Give each attribute a weight $w_i$ according to the correlation; the higher the correlation, the greater the weight.
2.8 Based on the calculation results of steps 2.5, 2.6 and 2.7, calculate the probability $P_j$ that the prediction sample $X=\{x_1,x_2,\dots,x_n\}$ belongs to each category.
Then, the cosmetic class in which the probability is the greatest is selected as the prediction result of the first-layer classification model.
3. And based on the improved naive Bayes classifier, establishing a second-layer classification model, and predicting the specific commodity code to which the commodity belongs.
3.1 Based on the cosmetic category of the commodity predicted by the first-layer classification model, establish a second-layer classification model under that category. Define the second-layer classification model categories $L_2$, i.e. the commodity codes contained under that cosmetic category.
3.2 Based on the training samples and the categories $L_2$ of the second-layer classification model, establish word stocks respectively.
3.3 Repeat steps 2.3 to 2.8 to calculate the posterior probability $P_k$ of each commodity code.
The category with the highest probability is the predicted commodity code of the commodity.
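As a minimal sketch of the two-layer flow above (the data layout, helper names and the tiny smoothing floor `1e-9` are my assumptions for illustration, not part of the patent), the per-class score is the prior times the weighted product of TF-IDF values, applied first over cosmetic categories and then over commodity codes:

```python
# Sketch of the two-layer improved naive Bayes prediction flow.
# Each lexicon maps a feature word to its precomputed {"tf": ..., "idf": ...}.

def score(sample, lexicons, priors, weights):
    """Per class: prior * prod((tf * idf) ** w), per the improved model."""
    scores = {}
    for label, lexicon in lexicons.items():
        p = priors[label]
        for feat in sample:
            tf = lexicon.get(feat, {}).get("tf", 1e-9)   # floor for unseen words
            idf = lexicon.get(feat, {}).get("idf", 1e-9)
            p *= (tf * idf) ** weights.get(feat, 1.0)
        scores[label] = p
    return scores

def predict(sample, layer1, layer2_by_class):
    # First layer: predict the cosmetic category.
    s1 = score(sample, layer1["lexicons"], layer1["priors"], layer1["weights"])
    category = max(s1, key=s1.get)
    # Second layer: predict the commodity code within that category.
    layer2 = layer2_by_class[category]
    s2 = score(sample, layer2["lexicons"], layer2["priors"], layer2["weights"])
    return category, max(s2, key=s2.get)
```

Restricting the second-layer scoring to the word stocks under the predicted category is what keeps each classifier small despite the large total number of commodity codes.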
The method for predicting commodity codes provided by the embodiment of the application is described in detail below through specific embodiments and application scenes thereof with reference to the accompanying drawings.
As shown in fig. 1, a flowchart of a method for predicting commodity coding according to an embodiment of the present application is provided, where the method includes the following steps:
step 101, commodity information of cosmetics to be declared is used as a prediction sample, word frequency-inverse text frequency of each feature vector in the prediction sample in each first-layer word stock is calculated, each feature vector corresponds to one attribute of the cosmetics to be declared, and each first-layer word stock corresponds to one cosmetic category.
Step 102, predicting the cosmetic category to which the cosmetic to be declared belongs according to the word frequency-inverse text frequency of each feature vector in each first-layer word stock, the prior probability of each cosmetic category, and the correlation between the cosmetic attribute and the cosmetic category in the historical declaration data.
Specifically, the weight coefficient of each feature vector in the prediction sample can be calculated according to the correlation between the cosmetic attribute and the cosmetic category in the historical declaration data;
the prediction samples x= { X are calculated by the following formulas, respectively 1 ,x 2 ,...,x n Probability of belonging to each cosmetic class j
Figure BDA0004100538250000051
/>
Figure BDA0004100538250000052
Wherein { x 1 ,x 2 ,...,x n Is a plurality of feature vectors in the prediction samples,
Figure BDA0004100538250000053
for the prior probability of cosmetic class j, tf ij For the feature vector x i The frequency of occurrence in the first layer word stock corresponding to cosmetic class j; idf (idf) i For the feature vector x i The frequency of the reverse text in the first-layer word stock; w (w) i For the feature vector x i Weight coefficient of (2);
and comparing the probabilities that the prediction samples belong to the cosmetic categories, and taking the cosmetic category with the highest probability as a prediction result.
In this embodiment, before calculating the word frequency-inverse text frequency of each feature vector in the prediction sample in each first layer word stock, historical declaration data of cosmetics may be further obtained as a training sample, and each declaration data in the training sample is classified according to a cosmetic class corresponding to a commodity code of each declaration data in the training sample; and respectively extracting labels from the declaration information of the multiple declaration data corresponding to each cosmetic class to obtain a first-layer word stock corresponding to each cosmetic class.
Step 103, calculating word frequency-inverse text frequency of each feature vector in the prediction sample in each second-layer word stock, wherein each second-layer word stock corresponds to a commodity code contained in the cosmetic category to which the cosmetic to be declared belongs.
And 104, predicting the commodity code to which the cosmetics to be declared belong according to the word frequency-inverse text frequency of each feature vector in each second-layer word stock, the prior probability of each commodity code and the correlation between the cosmetic attribute and the commodity code in the historical declaration data.
Specifically, the weight coefficient of each feature vector in the prediction sample can be calculated according to the correlation between the cosmetic attribute and commodity code in the historical declaration data;
The probability $P_k$ that the prediction sample $X=\{x_1,x_2,\dots,x_n\}$ belongs to each commodity code $c_k$ is calculated by the following formula:

$$P_k=P(L_2=c_k)\prod_{i=1}^{n}\left(tf_{ik}\times idf_i\right)^{w_i}$$

where $\{x_1,x_2,\dots,x_n\}$ are the feature vectors in the prediction sample, $P(L_2=c_k)$ is the prior probability of commodity code $c_k$, $tf_{ik}$ is the frequency with which feature vector $x_i$ occurs in the second-layer word stock corresponding to commodity code $c_k$, $idf_i$ is the inverse text frequency of feature vector $x_i$ in the second-layer word stocks, and $w_i$ is the weight coefficient of feature vector $x_i$;
and comparing the probability that the prediction sample belongs to each commodity code, and taking the commodity code with the highest probability as a prediction result.
In this embodiment, before calculating the word frequency-inverse text frequency of each feature vector in the prediction sample in each second-layer word stock, the plurality of pieces of declaration data may be further classified according to commodity codes of the plurality of pieces of declaration data corresponding to each cosmetic class; and respectively extracting labels from the declaration information of each declaration data corresponding to each commodity code to obtain a second-layer word stock corresponding to each commodity code.
According to the method and the device for predicting the commodity codes, the classification accuracy and the matching degree of the commodity codes can be improved, and the time for inquiring the commodity codes is further saved.
Further, the technical solutions of the embodiments of the present application may be described in detail as follows:
1. the method comprises the steps of acquiring historical declaration data of cosmetics, and establishing a label extraction model based on commodity information filled by enterprises, and specifically comprises the following steps:
1.1 Define the cosmetic attributes $Z=\{z_1,z_2,\dots,z_7\}$, which are, respectively, commodity type, usage object, efficacy, packaging, specification, brand, and ingredients.
1.2 Split the commodity information filled in by the enterprise into multiple attribute segments through word segmentation and attribute labeling. Specifically, a BERT+CRF model may be used to implement Chinese named-entity recognition; the embodiment of the application is not limited thereto.
1.3 De-duplicate repeated word-segmentation results, and extract the commodity type, usage object, efficacy, packaging, specification, brand and ingredient attributes.
Taking "OLAY cream | used on: face, moisturizing and whitening | packaging specification: 50G/bottle | brand: OLAY" as an example, after passing through the label extraction model the result is "OLAY-brand, cream-commodity type, face-usage object, moisturizing and whitening-efficacy, G-specification, bottle-packaging".
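A toy stand-in for this label extraction step can be sketched as a dictionary lookup over segmented tokens; the attribute vocabulary below is invented for the example, whereas the embodiment itself uses a BERT+CRF named-entity model:

```python
# Toy label extraction: map each segmented token to a cosmetic attribute via a
# hand-made vocabulary. An illustrative stand-in for the BERT+CRF model.
ATTRIBUTE_VOCAB = {  # token -> attribute (illustrative, not from the patent)
    "OLAY": "brand",
    "cream": "commodity type",
    "face": "usage object",
    "moisturizing": "efficacy",
    "whitening": "efficacy",
    "G": "specification",
    "bottle": "packaging",
}

def extract_labels(tokens):
    """Return (token, attribute) pairs, de-duplicating repeated tokens."""
    seen = set()
    labels = []
    for tok in tokens:
        attr = ATTRIBUTE_VOCAB.get(tok)
        if attr and (tok, attr) not in seen:
            seen.add((tok, attr))
            labels.append((tok, attr))
    return labels
```

For instance, `extract_labels(["OLAY", "cream", "face", "OLAY"])` keeps only the first occurrence of the repeated brand token, mirroring the de-duplication in step 1.3.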
2. Based on an improved naive Bayes classifier, a first-layer classification model is established, and the cosmetic category to which the commodity belongs is predicted, specifically comprising the following steps:
2.1 definition of first layer classification model categories
Figure BDA0004100538250000071
Respectively, cosmetics for lips, cosmetics for eyes, cosmetics for nails, powdery cosmetics, and other cosmetics or cosmetics for skin care.
2.2, acquiring historical declaration data of cosmetics as training samples, and respectively establishing word libraries according to the types of the cosmetics, wherein the method specifically comprises the following steps of:
and classifying the data according to HS codes of each claim data to obtain data of 5 cosmetic categories. For example, declaration data of "33041000" 8 bits before HS encoding is classified into cosmetics for lips. And (3) extracting the label of the step (1) for the declaration information of declaration data in each cosmetic category to obtain 5 word libraries.
2.3 Obtain the commodity information of the cosmetics to be declared entered by the enterprise as the prediction sample. The commodity information is passed through the label extraction of step 1 to obtain the feature vector $X=\{x_1,x_2,\dots,x_n\}$, where each feature $x_i$ corresponds to an attribute $z$.
2.4 defines an improved naive bayes classifier, calculating the probability of each cosmetic class.
Specifically, based on the principle of naive Bayes classification, the probability that the prediction sample belongs to cosmetic class $l_j$ is

$$P(L_1=l_j \mid X)=\frac{P(L_1=l_j)\,P(X \mid L_1=l_j)}{P(X)}$$

Since $P(X)$ is constant for all cosmetic categories, maximizing the posterior probability $P(L_1=l_j \mid X)$ can be converted into maximizing $P(L_1=l_j)\,P(X \mid L_1=l_j)$.

Assuming that the feature vectors are mutually independent gives

$$P(X \mid L_1=l_j)=\prod_{i=1}^{n}P(x_i \mid L_1=l_j)$$

and further the naive Bayes classification model

$$f(X)=\arg\max_{j} P(L_1=l_j)\prod_{i=1}^{n}P(x_i \mid L_1=l_j)$$

where $P(L_1=l_j)$ is the prior probability of each cosmetic class, and $P(x_i \mid L_1=l_j)$ is the conditional probability, i.e. the probability that word $x_i$ occurs in the class-$j$ word stock. From the above formula, the higher the frequency of $x_i$ in the class-$j$ word stock, the greater the probability that the sample belongs to cosmetic class $j$. In reality, however, some frequently occurring common words may contribute little to distinguishing categories; for example, "moisturizing" may occur very frequently in every category, so simply using word frequency may reduce the accuracy of the classifier. Therefore, the TF-IDF value is used instead of $P(x_i \mid L_1=l_j)$.

In addition, the attributes in the naive Bayes model carry equal weight, but in the actual classification of cosmetics the importance of each attribute differs; for example, in the first-layer classification model the usage-object attribute plays the most important role in distinguishing cosmetic categories. Different attributes can therefore be given different weights to improve the classification accuracy of the Bayes model. Let $w_i$ be the weight of feature $x_i$; the improved naive Bayes model is

$$f(X)=\arg\max_{j} P(L_1=l_j)\prod_{i=1}^{n}\left(tf_{ij}\times idf_i\right)^{w_i}$$
2.5 Calculate the prior probability $P(L_1=l_j)$ of each cosmetic class, i.e. the proportion of declaration data with cosmetic class $j$ in the training samples to the total amount of declaration data. The specific calculation formula is:

$$P(L_1=l_j)=\frac{|D_j|}{|D|}$$

where $|D|$ is the total number of training samples and $|D_j|$ is the number of training samples whose cosmetic class is $j$.
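The prior $P(L_1=l_j)=|D_j|/|D|$ follows directly from class counts over the training labels; a brief sketch:

```python
from collections import Counter

def class_priors(train_labels):
    """P(L1 = j) = |D_j| / |D| over the training-sample class labels."""
    total = len(train_labels)
    counts = Counter(train_labels)
    return {label: n / total for label, n in counts.items()}
```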
2.6 Calculate the TF-IDF value, i.e. the word frequency-inverse text frequency, of each feature vector $x_i$ in the prediction sample $X=\{x_1,x_2,\dots,x_n\}$, used to evaluate the importance of words in the word stock. If a word occurs frequently in one word stock and rarely in the other word stocks, the word is considered to have good category-distinguishing ability. The specific calculation formula is:

$$TF\text{-}IDF_{ij}=tf_{ij}\times idf_i$$

where $tf_{ij}$ is the word frequency, i.e. the frequency with which word $i$ occurs in the class-$j$ word stock, and $idf_i$ is the inverse text frequency of word $i$, reflecting how rarely it occurs in the other word stocks.
The specific formula of $tf_{ij}$ is:

$$tf_{ij}=\frac{n_{ij}}{\sum_{k} n_{kj}}$$

where $n_{ij}$ is the number of training samples in which word $i$ occurs in the class-$j$ word stock and $\sum_{k} n_{kj}$ is the total number of training samples in the class-$j$ word stock. If word $i$ does not appear in the training set, the whole probability becomes 0; to solve this zero-probability problem, Laplacian smoothing can be used to correct the probability. The corrected $tf_{ij}$ is:

$$tf_{ij}=\frac{n_{ij}+1}{\sum_{k} n_{kj}+m}$$

where $m$ is the number of word stocks.
The specific formula of $idf_i$ is:

$$idf_i=\log\frac{|D|}{|\{j: t_i \in d_j\}|+1}$$

where $|D|$ is the total number of training samples and $|\{j: t_i \in d_j\}|$ is the number of training samples containing word $i$, increased by 1 to avoid division by 0.
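The Laplace-smoothed $tf_{ij}$ and the $idf_i$ above can be sketched as small helpers (the function and parameter names are mine, not the patent's):

```python
import math

def tf_laplace(n_ij, n_j_total, m):
    """Laplace-smoothed term frequency: (n_ij + 1) / (sum_k n_kj + m)."""
    return (n_ij + 1) / (n_j_total + m)

def idf(total_samples, samples_containing_word):
    """Inverse text frequency: log(|D| / (|{j: t_i in d_j}| + 1))."""
    return math.log(total_samples / (samples_containing_word + 1))

def tf_idf(n_ij, n_j_total, m, total_samples, samples_containing_word):
    """TF-IDF_ij = tf_ij * idf_i with the smoothed term frequency."""
    return tf_laplace(n_ij, n_j_total, m) * idf(total_samples, samples_containing_word)
```

Note that a word absent from a word stock still receives the nonzero frequency $1/(\sum_k n_{kj}+m)$, which is exactly what keeps the product in the classifier from collapsing to zero.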
2.7 Calculate the correlation between the cosmetic attributes in the training samples and the cosmetic classes. Give each attribute a weight according to the correlation; the higher the correlation, the greater the weight.

Specifically, calculate the mutual information $I(z_i;L_1)$ between each cosmetic attribute $z_1,z_2,\dots,z_7$ and the cosmetic class $L_1$ to measure the correlation between the two event sets. The calculation formula is:

$$I(z_i;L_1)=\sum_{k}\sum_{j} P(z_{i,k},l_j)\log\frac{P(z_{i,k},l_j)}{P(z_{i,k})\,P(l_j)}$$

where $P(z_{i,k},l_j)$ is the joint probability distribution of cosmetic attribute $z_i$ and cosmetic class $L_1$, and $P(z_{i,k})$ and $P(l_j)$ are the marginal probability distributions of cosmetic attribute $z_i$ and cosmetic class $L_1$, respectively.

Further, take $I(z_i;L_1)$ as the weight coefficient of attribute $z_i$, i.e. $w_i=I(z_i;L_1)$.
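The mutual-information weight $w_i=I(z_i;L_1)$ can be estimated from co-occurrence counts of attribute values and class labels, assuming empirical (count-based) probabilities; a sketch:

```python
import math
from collections import Counter

def mutual_information(pairs):
    """I(Z; L) from a list of (attribute_value, class_label) observations."""
    n = len(pairs)
    joint = Counter(pairs)                    # P(z, l) numerators
    pz = Counter(z for z, _ in pairs)         # P(z) numerators
    pl = Counter(l for _, l in pairs)         # P(l) numerators
    mi = 0.0
    for (z, l), c in joint.items():
        p_zl = c / n
        mi += p_zl * math.log(p_zl / ((pz[z] / n) * (pl[l] / n)))
    return mi
```

An attribute whose values track the class perfectly gets a large weight, while an attribute independent of the class gets a weight of 0, matching the intent of step 2.7.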
2.8 Based on $P(L_1=l_j)$, $tf_{ij}$, $idf_i$ and $w_i$ calculated in steps 2.5, 2.6 and 2.7, and the improved naive Bayes model formula, calculate the probability $P_j$ that the prediction sample $X=\{x_1,x_2,\dots,x_n\}$ belongs to each category:

$$P_j=P(L_1=l_j)\prod_{i=1}^{n}\left(tf_{ij}\times idf_i\right)^{w_i}$$

Then select the cosmetic class with the greatest probability as the prediction result of the first-layer classification model.
3. Based on the improved naive Bayes classifier, a second-layer classification model is established, and specific commodity codes of the commodity are predicted, wherein the specific commodity codes comprise the following steps:
3.1 Based on the cosmetic category of the commodity predicted by the first-layer classification model, establish a second-layer classification model under that category. Define the second-layer classification model categories $L_2$, where lip cosmetics contain 7 codes, eye cosmetics contain 7 codes, nail cosmetics contain 4 codes, powder cosmetics contain 2 codes, and other cosmetics or skin-care cosmetics contain 9 codes.
3.2 Acquire historical declaration data of the cosmetics and establish word stocks respectively according to the second-layer classification model categories $L_2$. That is, 7 word stocks are built under lip cosmetics, 7 word stocks are built under eye cosmetics, and so on.
3.3 Repeat steps 2.3 to 2.8 to calculate the posterior probability $P_k$ of each commodity code. The commodity code with the highest probability is the predicted commodity code of the commodity.
According to the embodiment of the application, a method for searching cosmetic HS codes based on an improved naive Bayes classifier is established, so that an enterprise only needs to input the commodity information of the cosmetics to be declared to receive the HS code with the highest matching degree, greatly saving the time enterprises spend querying commodity codes. By constructing a two-layer classification model, in which the first layer predicts the cosmetic category to which the commodity belongs and the second layer predicts the specific commodity code, the classification accuracy is improved. The calculation of the correlation between attributes and categories increases the weight of attributes that contribute much to category discrimination and reduces the weight of attributes that contribute little, so that the influence of low-weight attribute words on the classification result is weakened and the stability of the classification effect is ensured.
As shown in fig. 2, a schematic structural diagram of an apparatus for predicting commodity codes according to an embodiment of the present application includes:
the first calculation module 210 is configured to calculate, using commodity information of the cosmetics to be declared as a prediction sample, word frequency-inverse text frequency of each feature vector in the prediction sample in each first-layer word stock, where each feature vector corresponds to an attribute of the cosmetics to be declared, and each first-layer word stock corresponds to a cosmetic category.
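As an illustrative sketch only (the patent does not spell out its tf-idf normalisation, so a standard smoothed form is assumed here), the word frequency-inverse text frequency of one feature vector in one word stock might be computed as:

```python
import math

def tf_idf(term, stock, all_stocks):
    # Term frequency: how often the attribute word appears in this
    # category's word stock, normalised by the stock's size.
    tf = stock.count(term) / max(len(stock), 1)
    # Inverse text frequency: the rarer a word is across all word
    # stocks, the larger its idf (smoothed to avoid division by zero).
    containing = sum(1 for s in all_stocks if term in s)
    idf = math.log((1 + len(all_stocks)) / (1 + containing)) + 1
    return tf * idf

# Toy first-layer word stocks, one per cosmetic category (terms are made up).
stocks = {"lip": ["lipstick", "gloss", "lipstick"], "eye": ["mascara", "liner"]}
score = tf_idf("lipstick", stocks["lip"], list(stocks.values()))
```

A term absent from a stock scores zero there, which is what lets the classifier separate categories by their characteristic attribute words.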
The first prediction module 220 is configured to predict the cosmetic category to which the cosmetic to be declared belongs according to the word frequency-inverse text frequency of each feature vector in each first-layer word stock, the prior probability of each cosmetic category, and the correlation between the cosmetic attributes and the cosmetic categories in the historical declaration data.
Specifically, the first prediction module 220 is specifically configured to calculate the weight coefficient of each feature vector in the prediction sample according to the correlation between the cosmetic attributes and the cosmetic categories in the historical declaration data, and to calculate, by the following formulas respectively, the probability that the prediction sample X = {x_1, x_2, ..., x_n} belongs to each cosmetic category j:
P(X | j) = ∏_{i=1}^{n} (tf_ij · idf_i)^(w_i)
P(j | X) ∝ P(j) · P(X | j)
wherein {x_1, x_2, ..., x_n} are the feature vectors in the prediction sample, P(j) is the prior probability of cosmetic category j, tf_ij is the frequency of occurrence of the feature vector x_i in the first-layer word stock corresponding to cosmetic category j, idf_i is the inverse text frequency of the feature vector x_i in the first-layer word stocks, and w_i is the weight coefficient of the feature vector x_i; the probabilities that the prediction sample belongs to each cosmetic category are then compared, and the cosmetic category with the highest probability is taken as the prediction result.
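A minimal sketch of the weighted scoring performed by the first prediction module 220; the log-linear combination of prior, tf-idf, and weight coefficient, as well as the smoothing constant, are assumptions for illustration, not taken from the patent:

```python
import math

def predict_category(sample_terms, stocks, priors, weights):
    best, best_score = None, float("-inf")
    for cat, stock in stocks.items():
        # Start from the log prior probability of the category.
        score = math.log(priors[cat])
        for term in sample_terms:
            tf = stock.count(term) / max(len(stock), 1)
            containing = sum(1 for s in stocks.values() if term in s)
            idf = math.log((1 + len(stocks)) / (1 + containing)) + 1
            # The weight coefficient w_i scales the term's contribution;
            # the tiny constant keeps unseen terms from producing log(0).
            score += weights.get(term, 1.0) * math.log(tf * idf + 1e-9)
        if score > best_score:
            best, best_score = cat, score
    return best

# Toy data: word stocks, uniform priors, default weights (all illustrative).
stocks = {"lip": ["lipstick", "gloss"], "eye": ["mascara"]}
result = predict_category(["lipstick"], stocks, {"lip": 0.5, "eye": 0.5}, {})
```

Working in log space avoids numeric underflow when many attribute words are multiplied together.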
The second calculating module 230 is configured to calculate a word frequency-inverse text frequency of each feature vector in the prediction sample in each second-layer word stock, where each second-layer word stock corresponds to a commodity code included in the cosmetic category to which the cosmetic to be declared belongs.
The second prediction module 240 is configured to predict the commodity code to which the cosmetic to be declared belongs according to the word frequency-inverse text frequency of each feature vector in each second-layer word stock, the prior probability of each commodity code, and the correlation between the cosmetic attribute and the commodity code in the historical declaration data.
Specifically, the second prediction module 240 is specifically configured to calculate the weight coefficient of each feature vector in the prediction sample according to the correlation between the cosmetic attributes and the commodity codes in the historical declaration data, and to calculate, by the following formulas respectively, the probability that the prediction sample X = {x_1, x_2, ..., x_n} belongs to each commodity code k:
P(X | k) = ∏_{i=1}^{n} (tf_ik · idf_i)^(w_i)
P(k | X) ∝ P(k) · P(X | k)
wherein {x_1, x_2, ..., x_n} are the feature vectors in the prediction sample, P(k) is the prior probability of commodity code k, tf_ik is the frequency of occurrence of the feature vector x_i in the second-layer word stock corresponding to commodity code k, idf_i is the inverse text frequency of the feature vector x_i in the second-layer word stocks, and w_i is the weight coefficient of the feature vector x_i; the probabilities that the prediction sample belongs to each commodity code are then compared, and the commodity code with the highest probability is taken as the prediction result.
Further, the device further comprises:
the first classification module is used for acquiring historical declaration data of cosmetics as a training sample, and classifying each declaration data in the training sample according to the cosmetic category corresponding to the commodity code of each declaration data in the training sample; and respectively extracting labels from the declaration information of the multiple declaration data corresponding to each cosmetic class to obtain a first-layer word stock corresponding to each cosmetic class.
The second classification module is used for classifying the plurality of pieces of declaration data according to commodity codes of the plurality of pieces of declaration data corresponding to each cosmetic class; and respectively extracting labels from the declaration information of each declaration data corresponding to each commodity code to obtain a second-layer word stock corresponding to each commodity code.
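The two classification modules can be sketched roughly as follows; the (code, text) data model and the whitespace tokeniser stand in for the patent's label extraction and are assumptions for illustration:

```python
from collections import defaultdict

def build_word_stocks(declarations, code_to_category, extract_labels):
    first = defaultdict(list)   # cosmetic category -> label terms (layer 1)
    second = defaultdict(list)  # commodity code   -> label terms (layer 2)
    for code, text in declarations:
        labels = extract_labels(text)
        # The same labels feed both layers: grouped by category for the
        # first-layer word stocks, and by commodity code for the second.
        first[code_to_category[code]].extend(labels)
        second[code].extend(labels)
    return dict(first), dict(second)

# Toy historical declaration data; codes and categories are illustrative.
decls = [("A1", "lipstick red"), ("A2", "gloss clear"), ("B1", "mascara black")]
cats = {"A1": "lip", "A2": "lip", "B1": "eye"}
first, second = build_word_stocks(decls, cats, str.split)
```

Because a first-layer stock is the union of its codes' second-layer stocks, both layers stay consistent when new declaration data is added.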
According to the method and apparatus for predicting commodity codes provided by the embodiments of the application, the classification accuracy and the matching degree of commodity codes can be improved, further saving the time spent querying commodity codes.
The embodiment of the present application further provides a computer readable storage medium on which a computer program is stored; when executed by a processor, the computer program implements each process of the above-mentioned embodiment of the method for predicting commodity codes and can achieve the same technical effects, which are not repeated here. The computer readable storage medium may be, for example, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element introduced by the phrase "comprising a ..." does not exclude the presence of other like elements in the process, method, article, or apparatus that comprises the element.
From the above description of the embodiments, it will be clear to those skilled in the art that the methods of the above embodiments may be implemented by means of software plus a necessary general hardware platform, or by means of hardware alone, although in many cases the former is preferred. Based on such understanding, the technical solution of the present application, or the part of it contributing to the prior art, may be embodied in the form of a software product stored in a storage medium (such as a ROM/RAM, a magnetic disk, or an optical disk) and including several instructions for causing a terminal (which may be a mobile phone, a computer, a server, an air conditioner, a network device, or the like) to perform the methods described in the embodiments of the present application.
The embodiments of the present application have been described above with reference to the accompanying drawings, but the present application is not limited to the above-described embodiments, which are merely illustrative and not restrictive. Many variations may be made by those of ordinary skill in the art without departing from the spirit of the present application and the scope of the claims, and such variations also fall within the protection of the present application.

Claims (10)

1. A method of predicting commodity codes, comprising the steps of:
taking commodity information of cosmetics to be declared as a prediction sample, and calculating word frequency-inverse text frequency of each feature vector in the prediction sample in each first-layer word stock, wherein each feature vector corresponds to one attribute of the cosmetics to be declared, and each first-layer word stock corresponds to one cosmetic class;
predicting the cosmetic category to which the cosmetic to be declared belongs according to the word frequency-inverse text frequency of each feature vector in each first-layer word stock, the prior probability of each cosmetic category and the correlation between the cosmetic attribute and the cosmetic category in the historical declaration data;
calculating word frequency-inverse text frequency of each feature vector in the prediction sample in each second-layer word stock, wherein each second-layer word stock corresponds to a commodity code contained in the cosmetic category to which the cosmetic to be declared belongs;
predicting the commodity code of the cosmetic to be declared according to the word frequency-inverse text frequency of each feature vector in each second-layer word stock, the prior probability of each commodity code and the correlation between the cosmetic attribute and the commodity code in the historical declaration data.
2. The method of claim 1, wherein before the calculating of the word frequency-inverse text frequency of each feature vector in the prediction sample in each first-layer word stock, the method further comprises:
acquiring historical declaration data of cosmetics as a training sample, and classifying each declaration data in the training sample according to the cosmetic category corresponding to commodity codes of each declaration data in the training sample;
and respectively extracting labels from the declaration information of the multiple declaration data corresponding to each cosmetic class to obtain a first-layer word stock corresponding to each cosmetic class.
3. The method according to claim 1, wherein predicting the cosmetic category to which the cosmetic to be declared belongs according to the word frequency-inverse text frequency of each feature vector in each first-layer word bank, the prior probability of each cosmetic category, and the correlation between the cosmetic attribute and the cosmetic category in the historical declaration data, specifically includes:
according to the correlation between the cosmetic attributes and the cosmetic categories in the historical declaration data, calculating the weight coefficient of each feature vector in the prediction sample;
the prediction samples x= { X are calculated by the following formulas, respectively 1 ,x 2 ,...,x n Probability of belonging to each cosmetic class j
Figure FDA0004100538240000027
Figure FDA0004100538240000022
Wherein { x 1 ,x 2 ,...,x n Is a plurality of feature vectors in the prediction samples,
Figure FDA0004100538240000023
for the prior probability of cosmetic class j, tf ij For the feature vector x i First-layer word stock corresponding to cosmetic class jIs a frequency of occurrence in the first and second embodiments; idf (idf) i For the feature vector x i The frequency of the reverse text in the first-layer word stock; w (w) i For the feature vector x i Weight coefficient of (2);
and comparing the probabilities that the prediction samples belong to the cosmetic categories, and taking the cosmetic category with the highest probability as a prediction result.
4. The method of claim 1, wherein before the calculating of the word frequency-inverse text frequency of each feature vector in the prediction sample in each second-layer word stock, the method further comprises:
classifying the multiple pieces of declaration data according to commodity codes of the multiple pieces of declaration data corresponding to each cosmetic class;
and respectively extracting labels from the declaration information of each declaration data corresponding to each commodity code to obtain a second-layer word stock corresponding to each commodity code.
5. The method according to claim 1, wherein predicting the commodity code to which the cosmetic to be declared belongs according to the word frequency-inverse text frequency of each feature vector in each second-layer word stock, the prior probability of each commodity code, and the correlation between the cosmetic attribute and the commodity code in the historical declaration data specifically includes:
according to the correlation between the cosmetic attributes and commodity codes in the historical declaration data, calculating the weight coefficient of each feature vector in the prediction sample;
the prediction samples x= { X are calculated by the following formulas, respectively 1 ,x 2 ,...,x n Probability of belonging to each commodity code k
Figure FDA0004100538240000028
Figure FDA0004100538240000025
Wherein { x 1 ,x 2 ,...,x n Is a plurality of feature vectors in the prediction samples,
Figure FDA0004100538240000026
the prior probability of k for commodity code tf ik For the feature vector x i The frequency of occurrence in a second-layer word stock corresponding to the commodity code k; idf (idf) i For the feature vector x i The frequency of the reverse text in the second-layer word stock; w (w) i For the feature vector x i Weight coefficient of (2);
and comparing the probability that the prediction sample belongs to each commodity code, and taking the commodity code with the highest probability as a prediction result.
6. An apparatus for predicting commodity codes, comprising:
the first calculation module is used for taking commodity information of cosmetics to be declared as a prediction sample, calculating word frequency-inverse text frequency of each feature vector in the prediction sample in each first-layer word stock, wherein each feature vector corresponds to one attribute of the cosmetics to be declared, and each first-layer word stock corresponds to one cosmetic category;
the first prediction module is used for predicting the cosmetic category to which the cosmetic to be declared belongs according to the word frequency-inverse text frequency of each feature vector in each first-layer word stock, the prior probability of each cosmetic category and the correlation between the cosmetic attribute and the cosmetic category in the historical declaration data;
the second calculation module is used for calculating the word frequency-inverse text frequency of each feature vector in the prediction sample in each second-layer word stock, and each second-layer word stock corresponds to one commodity code contained in the cosmetic category to which the cosmetic to be declared belongs;
and the second prediction module is used for predicting the commodity code to which the cosmetics to be declared belong according to the word frequency-inverse text frequency of each feature vector in each second-layer word stock, the prior probability of each commodity code and the correlation between the cosmetic attribute and the commodity code in the historical declaration data.
7. The apparatus as recited in claim 6, further comprising:
the first classification module is used for acquiring historical declaration data of cosmetics as a training sample, and classifying each declaration data in the training sample according to the cosmetic category corresponding to the commodity code of each declaration data in the training sample; and respectively extracting labels from the declaration information of the multiple declaration data corresponding to each cosmetic class to obtain a first-layer word stock corresponding to each cosmetic class.
8. The apparatus of claim 6, wherein
the first prediction module is specifically configured to calculate a weight coefficient of each feature vector in the prediction sample according to a correlation between a cosmetic attribute and a cosmetic category in the historical declaration data;
the prediction samples x= { X are calculated by the following formulas, respectively 1 ,x 2 ,...,x n Probability of belonging to each cosmetic class j
Figure FDA0004100538240000047
Figure FDA0004100538240000042
Wherein { x 1 ,x 2 ,...,x n Is a plurality of feature vectors in the prediction samples,
Figure FDA0004100538240000043
for the prior probability of cosmetic class j, tf ij For the feature vector x i The frequency of occurrence in the first layer word stock corresponding to cosmetic class j; idf (idf) i For the feature vector x i The frequency of the reverse text in the first-layer word stock; w (w) i For the feature vector x i Weight coefficient of (2);
and comparing the probabilities that the prediction samples belong to the cosmetic categories, and taking the cosmetic category with the highest probability as a prediction result.
9. The apparatus as recited in claim 6, further comprising:
the second classification module is used for classifying the plurality of pieces of declaration data according to commodity codes of the plurality of pieces of declaration data corresponding to each cosmetic class; and respectively extracting labels from the declaration information of each declaration data corresponding to each commodity code to obtain a second-layer word stock corresponding to each commodity code.
10. The apparatus of claim 6, wherein
the second prediction module is specifically configured to calculate a weight coefficient of each feature vector in the prediction sample according to the correlation between the cosmetic attribute and the commodity code in the historical declaration data;
the prediction samples x= { X are calculated by the following formulas, respectively 1 ,x 2 ,...,x n Probability of belonging to each commodity code k
Figure FDA0004100538240000048
Figure FDA0004100538240000045
Wherein { x 1 ,x 2 ,...,x n Is a plurality of feature vectors in the prediction samples,
Figure FDA0004100538240000046
the prior probability of k for commodity code tf ik For the feature vector x i The frequency of occurrence in a second-layer word stock corresponding to the commodity code k; idf (idf) i For the feature vector x i The frequency of the reverse text in the second-layer word stock; w (w) i Is of special interestSign vector x i Weight coefficient of (2);
and comparing the probability that the prediction sample belongs to each commodity code, and taking the commodity code with the highest probability as a prediction result.
CN202310174800.4A 2023-02-24 2023-02-24 Commodity coding prediction method and device Active CN116166805B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310174800.4A CN116166805B (en) 2023-02-24 2023-02-24 Commodity coding prediction method and device

Publications (2)

Publication Number Publication Date
CN116166805A true CN116166805A (en) 2023-05-26
CN116166805B CN116166805B (en) 2023-09-22

Family

ID=86413034


Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20120124050A1 (en) * 2010-11-16 2012-05-17 Electronics And Telecommunications Research Institute System and method for hs code recommendation
KR20140066486A (en) * 2012-11-23 2014-06-02 현대중공업 주식회사 An method for assigning hs code to materials
CN107704892A (en) * 2017-11-07 2018-02-16 宁波爱信诺航天信息有限公司 A kind of commodity code sorting technique and system based on Bayesian model
CN109598517A (en) * 2017-09-29 2019-04-09 阿里巴巴集团控股有限公司 Commodity clearance processing, the processing of object and its class prediction method and apparatus
CN110858219A (en) * 2018-08-17 2020-03-03 菜鸟智能物流控股有限公司 Logistics object information processing method and device and computer system
CN112529420A (en) * 2020-12-14 2021-03-19 深圳市钛师傅云有限公司 Intelligent classification method and system for customs commodity codes
CN113378167A (en) * 2021-06-30 2021-09-10 哈尔滨理工大学 Malicious software detection method based on improved naive Bayes algorithm and gated loop unit mixing


Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
蒋登丽 (Jiang Dengli): "Research on Business Opportunity Data Classification Based on DOA Automatic Registration and an Improved Naive Bayes Algorithm", China Master's Theses Full-text Database, no. 2, pages 36-50 *
陈翠娟 (Chen Cuijuan): "An Improved Multinomial Naive Bayes Classification Algorithm and Its Python Implementation", Journal of Jingdezhen University, vol. 36, no. 3, pages 92-95 *

Also Published As

Publication number Publication date
CN116166805B (en) 2023-09-22


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant