CN116166805B - Commodity coding prediction method and device - Google Patents
- Publication number: CN116166805B (application CN202310174800.4A)
- Authority
- CN
- China
- Prior art keywords
- cosmetic
- feature vector
- frequency
- commodity
- prediction
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
- G—PHYSICS; G06—COMPUTING; CALCULATING OR COUNTING; G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/35—Information retrieval of unstructured textual data; Clustering; Classification
- G06F16/2453—Query optimisation
- G06F16/2474—Sequence data queries, e.g. querying versioned data
- G06F40/216—Parsing using statistical methods
- G06F40/237—Lexical tools
- Y02P90/30—Computing systems specially adapted for manufacturing (climate change mitigation technologies in the production or processing of goods)
Abstract
The application discloses a commodity code prediction method and device, wherein the method comprises the following steps: taking the commodity information of a cosmetic to be declared as a prediction sample, and calculating the word frequency-inverse text frequency of each feature vector in the prediction sample in each first-layer word stock; predicting the cosmetic category of the cosmetic to be declared according to the word frequency-inverse text frequency of each feature vector in each first-layer word stock; calculating the word frequency-inverse text frequency of each feature vector in the prediction sample in each second-layer word stock; and predicting the commodity code of the cosmetic to be declared according to the word frequency-inverse text frequency of each feature vector in each second-layer word stock. The embodiment of the application can improve the accuracy of classification and the matching degree of commodity codes, thereby saving the time spent querying commodity codes.
Description
Technical Field
The application belongs to the technical field of big data, and particularly relates to a method and a device for predicting commodity codes.
Background
When cosmetics enterprises declare import and export goods, the HS (Harmonized System) code of the goods must be filled in on the customs declaration attached to the goods. The HS code is an international trade commodity classification system code, mainly used by customs personnel to confirm commodity categories, carry out commodity classification management, verify tariff standards and check commodity quality indexes. The HS coding system currently used in China consists of ten digits; usually one commodity corresponds to only one HS code, while one HS code may correspond to more than one commodity. Correctly filling in the HS code can accelerate the customs process, ensure smooth clearance of the goods, and avoid extra cost or delay. If the HS code is wrongly classified, the normal order of customs is disturbed, and in serious cases the enterprise is administratively penalized by customs.
In order to fill in cosmetic commodity codes accurately, enterprise declaration personnel need to know the basics of HS code classification as well as the properties, characteristics and uses of the commodity itself. This requires knowledge accumulated over years, and not everyone can quickly and skilfully classify and distinguish the HS codes of commodities. Currently, there are many websites that can query HS codes by taking a keyword entered by the user and returning all relevant HS codes that contain the keyword. However, the query results are numerous, span different categories, lack hierarchical relationships and have a low matching degree, which increases the time cost for enterprises to query commodity codes.
Content of the application
The embodiment of the application aims to provide a method and a device for predicting commodity codes, which are used for solving the defect of low matching degree of commodity code query in the prior art.
In order to solve the technical problems, the application is realized as follows:
in a first aspect, a method of predicting commodity coding is provided, comprising the steps of:
taking commodity information of cosmetics to be declared as a prediction sample, and calculating word frequency-inverse text frequency of each feature vector in the prediction sample in each first-layer word stock, wherein each feature vector corresponds to one attribute of the cosmetics to be declared, and each first-layer word stock corresponds to one cosmetic class;
predicting the cosmetic category to which the cosmetic to be declared belongs according to the word frequency-inverse text frequency of each feature vector in each first-layer word stock, the prior probability of each cosmetic category and the correlation between the cosmetic attribute and the cosmetic category in the historical declaration data;
calculating word frequency-inverse text frequency of each feature vector in the prediction sample in each second-layer word stock, wherein each second-layer word stock corresponds to a commodity code contained in the cosmetic category to which the cosmetic to be declared belongs;
predicting the commodity code of the cosmetic to be declared according to the word frequency-inverse text frequency of each feature vector in each second-layer word stock, the prior probability of each commodity code and the correlation between the cosmetic attribute and the commodity code in the historical declaration data.
In a second aspect, there is provided an apparatus for predicting commodity codes, comprising:
the first calculation module is used for taking commodity information of cosmetics to be declared as a prediction sample, calculating word frequency-inverse text frequency of each feature vector in the prediction sample in each first-layer word stock, wherein each feature vector corresponds to one attribute of the cosmetics to be declared, and each first-layer word stock corresponds to one cosmetic category;
the first prediction module is used for predicting the cosmetic category to which the cosmetic to be declared belongs according to the word frequency-inverse text frequency of each feature vector in each first-layer word stock, the prior probability of each cosmetic category and the correlation between the cosmetic attribute and the cosmetic category in the historical declaration data;
the second calculation module is used for calculating the word frequency-inverse text frequency of each feature vector in the prediction sample in each second-layer word stock, and each second-layer word stock corresponds to one commodity code contained in the cosmetic category to which the cosmetic to be declared belongs;
and the second prediction module is used for predicting the commodity code to which the cosmetics to be declared belong according to the word frequency-inverse text frequency of each feature vector in each second-layer word stock, the prior probability of each commodity code and the correlation between the cosmetic attribute and the commodity code in the historical declaration data.
According to the embodiment of the application, the classification of the cosmetics to be declared and the commodity code to be declared are predicted according to the word frequency-inverse text frequency of each feature vector in the prediction sample and the correlation between the cosmetic attribute and the cosmetic class, so that the classification accuracy and the matching degree of the commodity code can be improved, and the commodity code inquiring time is further saved.
Drawings
FIG. 1 is a flow chart of a method for predicting commodity codes according to an embodiment of the present application;
fig. 2 is a schematic structural diagram of an apparatus for predicting commodity codes according to an embodiment of the present application.
Detailed Description
The following description of the embodiments of the present application will be made clearly and fully with reference to the accompanying drawings, in which it is evident that the embodiments described are some, but not all embodiments of the application. All other embodiments, which can be made by those skilled in the art based on the embodiments of the application without making any inventive effort, are intended to be within the scope of the application.
In order to solve the problems in the prior art, the embodiment of the application provides a method for searching cosmetic HS codes based on an improved naive Bayes classifier, which is improved from the following three aspects:
1. Because the number of commodity codes contained under the cosmetics category is large and a single multi-class model is difficult to establish, the embodiment of the application constructs a two-layer classification model: the first layer predicts the cosmetic category to which the commodity belongs, and the second layer predicts the specific commodity code to which the commodity belongs.
2. TF-IDF values are calculated in place of the conditional probabilities in the naive Bayes model, so that the importance of terms is taken into account during classification.
3. The correlation between attributes and categories is calculated and different attributes are given different weights: the higher the correlation between an attribute and the categories, the greater the importance of the attribute for classification, and therefore the higher the weight given to the attribute.
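The three improvements above can be combined into a single scoring rule. The sketch below is a minimal illustration, under the assumption (not stated explicitly in this summary) that the attribute weights enter as exponents on the TF-IDF terms, i.e. as multipliers in log space; all class names and numbers are hypothetical.

```python
import math

def improved_nb_score(prior, tfidf_values, weights):
    """Log-space score of one class: log prior plus the
    attribute-weighted sum of log TF-IDF values."""
    return math.log(prior) + sum(
        w * math.log(v) for w, v in zip(weights, tfidf_values))

# Hypothetical TF-IDF values of three feature words in two class word stocks.
priors = {"lip": 0.3, "eye": 0.2}
tfidf = {"lip": [0.02, 0.10, 0.05], "eye": [0.01, 0.03, 0.04]}
weights = [0.5, 1.2, 0.8]  # mutual-information weights w_i

best = max(priors, key=lambda c: improved_nb_score(priors[c], tfidf[c], weights))
```

With these numbers the first layer would return the "lip" category, since its weighted score is higher.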
In order to achieve the above object, the embodiment of the present application provides the following technical solutions:
1. and acquiring historical declaration data of cosmetics, and establishing a label extraction model based on commodity information filled in by enterprises.
2. Based on the improved naive Bayes classifier, a first-layer classification model is established, and the cosmetic category to which the commodity belongs is predicted.
2.1 definition of first layer classification model categories
2.2 Acquire historical declaration data of cosmetics as training samples, and establish five word stocks according to cosmetic category.
2.3 Acquire the commodity information of the cosmetics to be declared entered by the enterprise as a prediction sample, and apply the label extraction of step 1 to the prediction sample.
2.4 defines an improved naive bayes classifier:
2.5 calculating the prior probability for each cosmetic class
2.6 Calculate the TF-IDF value, i.e. the word frequency-inverse text frequency, of each feature vector $x_i$ in the prediction sample $X=\{x_1,x_2,\dots,x_n\}$, used to evaluate the importance of words in the word stock.
2.7 Calculate the correlation between the cosmetic attributes and the cosmetic categories, and give each attribute a weight $w_i$ according to the correlation; the higher the correlation, the greater the weight.
2.8 Based on the calculation results of steps 2.5, 2.6 and 2.7, calculate the probability $P(L_j^1 \mid X)$ that the prediction sample $X=\{x_1,x_2,\dots,x_n\}$ belongs to each category, and then select the cosmetic category with the greatest probability as the prediction result of the first-layer classification model.
3. And based on the improved naive Bayes classifier, establishing a second-layer classification model, and predicting the specific commodity code to which the commodity belongs.
3.1 Based on the cosmetic category of the commodity predicted by the first-layer classification model, establish a second-layer classification model under that category and define the second-layer classification model categories $L^2$.
3.2 Based on the training samples and the categories $L^2$ of the second-layer classification model, establish a word stock for each category.
3.3 Repeat steps 2.3 to 2.8 to calculate the maximum posterior probability $P(L_k^2 \mid X)$; the category with the highest probability is the predicted commodity code of the commodity.
The method for predicting commodity codes provided by the embodiment of the application is described in detail below through specific embodiments and application scenes thereof with reference to the accompanying drawings.
As shown in fig. 1, a flowchart of a method for predicting commodity coding according to an embodiment of the present application is provided, where the method includes the following steps:
step 101, commodity information of cosmetics to be declared is used as a prediction sample, word frequency-inverse text frequency of each feature vector in the prediction sample in each first-layer word stock is calculated, each feature vector corresponds to one attribute of the cosmetics to be declared, and each first-layer word stock corresponds to one cosmetic category.
Step 102, predicting the cosmetic category to which the cosmetic to be declared belongs according to the word frequency-inverse text frequency of each feature vector in each first-layer word stock, the prior probability of each cosmetic category, and the correlation between the cosmetic attribute and the cosmetic category in the historical declaration data.
Specifically, the weight coefficient of each feature vector in the prediction sample can be calculated according to the correlation between the cosmetic attribute and the cosmetic category in the historical declaration data;
the prediction samples x= { X are calculated by the following formulas, respectively 1 ,x 2 ,...,x n Probability of belonging to each cosmetic class j
Wherein { x 1 ,x 2 ,...,x n Is a plurality of feature vectors in the prediction samples,for the prior probability of cosmetic class j, tf ij For the feature vector x i The frequency of occurrence in the first layer word stock corresponding to cosmetic class j; idf (idf) i For the feature vector x i The frequency of the reverse text in the first-layer word stock; w (w) i For the feature vector x i Weight coefficient of (2);
and comparing the probabilities that the prediction samples belong to the cosmetic categories, and taking the cosmetic category with the highest probability as a prediction result.
In this embodiment, before calculating the word frequency-inverse text frequency of each feature vector in the prediction sample in each first layer word stock, historical declaration data of cosmetics may be further obtained as a training sample, and each declaration data in the training sample is classified according to a cosmetic class corresponding to a commodity code of each declaration data in the training sample; and respectively extracting labels from the declaration information of the multiple declaration data corresponding to each cosmetic class to obtain a first-layer word stock corresponding to each cosmetic class.
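Building the first-layer word stocks described above can be sketched as follows; the records, the eight-digit prefix mapping and the category names are hypothetical placeholders for the historical declaration data.

```python
from collections import Counter, defaultdict

# Hypothetical historical declarations: (HS code, labels extracted in step 1).
HISTORY = [
    ("3304100010", ["lipstick", "lip", "moisturizing"]),
    ("3304100090", ["lip", "gloss"]),
    ("3304200010", ["eye", "mascara"]),
]

# Assumed mapping from the first eight HS digits to a cosmetic category.
PREFIX_TO_CATEGORY = {"33041000": "lip", "33042000": "eye"}

def build_first_layer_stocks(history):
    """Group declarations by cosmetic category and count label occurrences."""
    stocks = defaultdict(Counter)
    for hs_code, labels in history:
        category = PREFIX_TO_CATEGORY.get(hs_code[:8])
        if category is not None:
            stocks[category].update(labels)
    return stocks

stocks = build_first_layer_stocks(HISTORY)
```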
Step 103, calculating word frequency-inverse text frequency of each feature vector in the prediction sample in each second-layer word stock, wherein each second-layer word stock corresponds to a commodity code contained in the cosmetic category to which the cosmetic to be declared belongs.
And 104, predicting the commodity code to which the cosmetics to be declared belong according to the word frequency-inverse text frequency of each feature vector in each second-layer word stock, the prior probability of each commodity code and the correlation between the cosmetic attribute and the commodity code in the historical declaration data.
Specifically, the weight coefficient of each feature vector in the prediction sample can be calculated according to the correlation between the cosmetic attribute and commodity code in the historical declaration data;
the probability that the prediction sample $X=\{x_1,x_2,\dots,x_n\}$ belongs to each commodity code $k$ can then be calculated by the following formula:

$$P(L_k^2 \mid X) = P(L_k^2) \prod_{i=1}^{n} \left( tf_{ik} \times idf_i \right)^{w_i}$$

wherein $\{x_1,x_2,\dots,x_n\}$ are the feature vectors in the prediction sample; $P(L_k^2)$ is the prior probability of commodity code $k$; $tf_{ik}$ is the frequency of occurrence of the feature vector $x_i$ in the second-layer word stock corresponding to commodity code $k$; $idf_i$ is the inverse text frequency of the feature vector $x_i$ in the second-layer word stocks; and $w_i$ is the weight coefficient of the feature vector $x_i$;
and comparing the probability that the prediction sample belongs to each commodity code, and taking the commodity code with the highest probability as a prediction result.
In this embodiment, before calculating the word frequency-inverse text frequency of each feature vector in the prediction sample in each second-layer word stock, the plurality of pieces of declaration data may be further classified according to commodity codes of the plurality of pieces of declaration data corresponding to each cosmetic class; and respectively extracting labels from the declaration information of each declaration data corresponding to each commodity code to obtain a second-layer word stock corresponding to each commodity code.
According to the embodiment of the application, the classification of the cosmetics to be declared and the commodity code to be declared are predicted according to the word frequency-inverse text frequency of each feature vector in the prediction sample and the correlation between the cosmetic attribute and the cosmetic class, so that the classification accuracy and the matching degree of the commodity code can be improved, and the commodity code inquiring time is further saved.
Further, the technical solution of the embodiment of the present application may be described in detail as follows:
1. the method comprises the steps of acquiring historical declaration data of cosmetics, and establishing a label extraction model based on commodity information filled by enterprises, and specifically comprises the following steps:
1.1 Define the cosmetic attributes $Z=\{z_1,z_2,\dots,z_7\}$, which are respectively commodity type, use object, efficacy, packaging, specification, brand and component.
1.2 Divide the commodity information filled in by the enterprise into several pieces of information through word segmentation and attribute labeling. Specifically, a BERT+CRF model may be used to implement Chinese named entity recognition; the embodiment of the application is not limited in this respect.
1.3 De-duplicate repeated word segmentation results, and extract the commodity type, use object, efficacy, packaging, specification, brand and component attributes.
Taking the commodity information "OLAY cream | use: facial moisturizing and whitening | packaging specification: 50G/bottle | brand: OLAY" as an example, the result of the label extraction model is "OLAY - brand, cream - commodity type, face - use object, moisturizing and whitening - efficacy, G - specification, bottle - packaging".
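As a rough stand-in for the BERT+CRF extractor, a rule-based split of the "|"-delimited declaration text illustrates the attribute labeling step; the field names below are hypothetical, and real declarations would need the trained model.

```python
def extract_labels(text):
    """Split '|'-delimited commodity information into attribute/value
    pairs, keeping the first value seen for each attribute."""
    labels = {}
    for segment in text.split("|"):
        if ":" in segment:
            attr, value = segment.split(":", 1)
            labels.setdefault(attr.strip(), value.strip())
    return labels

sample = "brand: OLAY | commodity type: cream | use object: face | efficacy: moisturizing and whitening"
labels = extract_labels(sample)
```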
2. Based on an improved naive Bayes classifier, a first-layer classification model is established, and the cosmetic category to which the commodity belongs is predicted, specifically comprising the following steps:
2.1 Define the first-layer classification model categories $L^1 = \{L_1^1, L_2^1, \dots, L_5^1\}$, which are respectively lip cosmetics, eye cosmetics, nail cosmetics, powder cosmetics, and other cosmetics or skin care products.
2.2, acquiring historical declaration data of cosmetics as training samples, and respectively establishing word libraries according to the types of the cosmetics, wherein the method specifically comprises the following steps of:
and classifying the data according to HS codes of each claim data to obtain data of 5 cosmetic categories. For example, declaration data of "33041000" 8 bits before HS encoding is classified into cosmetics for lips. And (3) extracting the label of the step (1) for the declaration information of declaration data in each cosmetic category to obtain 5 word libraries.
And 2.3, acquiring commodity information of cosmetics to be declared, which is input by enterprises, as a prediction sample. The commodity information of the cosmetics to be declared, which is input by enterprises, is extracted through the label in the step 1, and a feature vector X= { X is obtained 1 ,x 2 ,...,x n Each feature x corresponds to an attribute z.
2.4 defines an improved naive bayes classifier, calculating the probability of each cosmetic class.
Specifically, based on the principle of naive Bayes classification, the probability that the prediction sample belongs to cosmetic category $j$ is

$$P(L_j^1 \mid X) = \frac{P(X \mid L_j^1)\, P(L_j^1)}{P(X)}$$

Since $P(X)$ is constant for all cosmetic categories, maximizing the posterior probability $P(L_j^1 \mid X)$ can be converted into maximizing $P(X \mid L_j^1)\, P(L_j^1)$. Assuming that the feature vectors are mutually independent gives

$$P(X \mid L_j^1) = \prod_{i=1}^{n} P(x_i \mid L_j^1)$$

and further the naive Bayes classification model

$$P(L_j^1 \mid X) \propto P(L_j^1) \prod_{i=1}^{n} P(x_i \mid L_j^1)$$

wherein $P(L_j^1)$ is the prior probability of each cosmetic category, and $P(x_i \mid L_j^1)$ is the conditional probability, i.e. the probability that the word $x_i$ occurs in the category-$j$ word stock. From the above formula, the higher the frequency of occurrence of the word $x_i$ in the category-$j$ word stock, the greater the probability that the sample belongs to cosmetic category $j$. In reality, however, some frequently occurring common words may contribute little to distinguishing categories; for example, "moisturizing" may occur very frequently in every category, and simply using word frequencies would reduce the accuracy of the classifier. Therefore, the TF-IDF value is used instead of $P(x_i \mid L_j^1)$.
In addition, the attributes in the naive Bayes model are weighted equally, but in the actual classification of cosmetics the importance of each attribute differs; for example, in the first-layer classification model the "use object" attribute plays the most important role in distinguishing cosmetic categories, so different attributes can be given different weights to improve the accuracy of the Bayes model. Let $w_i$ be the weight of feature $x_i$; the improved naive Bayes model is

$$P(L_j^1 \mid X) \propto P(L_j^1) \prod_{i=1}^{n} \left( tf_{ij} \times idf_i \right)^{w_i}$$
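The effect of substituting TF-IDF for the raw word frequency can be seen numerically: a word that appears in nearly every training sample receives an inverse text frequency close to zero and therefore contributes little, regardless of how often it occurs. The counts below are hypothetical; the idf form follows step 2.6.

```python
import math

def idf(total_samples, samples_containing):
    # Inverse text frequency; the +1 avoids a zero denominator (step 2.6).
    return math.log(total_samples / (samples_containing + 1))

TOTAL = 1000
idf_common = idf(TOTAL, 999)  # "moisturizing": appears almost everywhere
idf_rare = idf(TOTAL, 49)     # "lipstick": concentrated in one category
```

The ubiquitous word ends up with an idf of exactly zero here, while the discriminative one keeps a weight of about 3.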
2.5 Calculate the prior probability $P(L_j^1)$ of each cosmetic category, i.e. the proportion of declaration data of cosmetic category $j$ in the training samples to the total declaration data. The specific calculation formula is:

$$P(L_j^1) = \frac{|D_j|}{|D|}$$

wherein $|D|$ is the total number of training samples and $|D_j|$ is the number of training samples whose cosmetic category is $j$.
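The prior is just the category's share of the training data; a minimal sketch with hypothetical category labels:

```python
from collections import Counter

# Hypothetical cosmetic categories of five historical declarations.
train_categories = ["lip", "lip", "eye", "nail", "lip"]

counts = Counter(train_categories)
total_samples = len(train_categories)  # |D|
priors = {cat: n / total_samples for cat, n in counts.items()}  # |D_j| / |D|
```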
2.6 Calculate the TF-IDF value, i.e. the word frequency-inverse text frequency, of each feature vector $x_i$ in the prediction sample $X=\{x_1,x_2,\dots,x_n\}$, used to evaluate the importance of words in the word stock. If a word occurs frequently in one word stock and rarely in the other word stocks, the word is considered to have good category-distinguishing capability. The specific calculation formula is:

$$TF\text{-}IDF_{ij} = tf_{ij} \times idf_i$$

wherein $tf_{ij}$ is the word frequency, i.e. the frequency with which word $i$ occurs in the category-$j$ word stock, and $idf_i$ is the inverse text frequency of word $i$, which is lower the more word stocks the word appears in.
The specific formula for $tf_{ij}$ is:

$$tf_{ij} = \frac{n_{ij}}{\sum_k n_{kj}}$$

wherein $n_{ij}$ is the number of training samples in which word $i$ occurs in the category-$j$ word stock, and $\sum_k n_{kj}$ is the total number of training samples in the category-$j$ word stock. If word $i$ does not appear in the training set, the whole probability becomes 0; to solve this zero-probability problem, Laplace smoothing can be used to correct the probability. The corrected $tf_{ij}$ is:

$$tf_{ij} = \frac{n_{ij} + 1}{\sum_k n_{kj} + m}$$

wherein $m$ is the number of word stocks.
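The Laplace correction can be sketched directly from the formula; the counts are hypothetical and m is taken as the number of word stocks, as stated above.

```python
def smoothed_tf(n_ij, stock_total, m):
    """Laplace-corrected word frequency: (n_ij + 1) / (sum_k n_kj + m)."""
    return (n_ij + 1) / (stock_total + m)

M = 5                              # five first-layer word stocks
tf_seen = smoothed_tf(4, 95, M)    # word seen 4 times out of 95
tf_unseen = smoothed_tf(0, 95, M)  # unseen word: small non-zero tf, not 0
```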
The specific formula for $idf_i$ is:

$$idf_i = \log \frac{|D|}{\left|\{j : t_i \in d_j\}\right| + 1}$$

wherein $|D|$ is the total number of training samples, and $|\{j : t_i \in d_j\}|$ is the number of training samples containing word $i$, increased by 1 to avoid a denominator of 0.
2.7 calculating the correlation of the cosmetic properties in the training sample with the cosmetic class. Each attribute is given a weight according to the correlation, the higher the correlation, the greater the weight.
Specifically, calculate the mutual information between each cosmetic attribute $z_1, z_2, \dots, z_7$ and the cosmetic category $L^1$ to measure the correlation between the two event sets. The calculation formula is:

$$I(z_i ; L^1) = \sum_{k} \sum_{j} P(z_{i,k}, L_j^1) \log \frac{P(z_{i,k}, L_j^1)}{P(z_{i,k})\, P(L_j^1)}$$

wherein $P(z_{i,k}, L_j^1)$ is the joint probability distribution of the cosmetic attribute $z_i$ and the cosmetic category $L^1$, and $P(z_{i,k})$ and $P(L_j^1)$ are respectively the marginal probability distributions of the cosmetic attribute $z_i$ and the cosmetic category $L^1$.
Further, $I(z_i ; L^1)$ is taken as the weight coefficient of attribute $z_i$, i.e. $w_i = I(z_i ; L^1)$.
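Mutual information as an attribute weight can be sketched from co-occurrence counts. The pairs below are hypothetical (attribute value, category) observations; an attribute whose values track the category closely receives a larger weight.

```python
import math
from collections import Counter

# Hypothetical (value of the "use object" attribute, cosmetic category) pairs.
pairs = [
    ("lip", "lip cosmetics"), ("lip", "lip cosmetics"),
    ("eye", "eye cosmetics"), ("eye", "eye cosmetics"),
    ("face", "lip cosmetics"), ("face", "eye cosmetics"),
]

def mutual_information(pairs):
    """I(z; L) = sum over observed (value, category) cells of
    P(v, c) * log(P(v, c) / (P(v) * P(c)))."""
    n = len(pairs)
    joint = Counter(pairs)
    p_value = Counter(v for v, _ in pairs)
    p_cat = Counter(c for _, c in pairs)
    mi = 0.0
    for (v, c), count in joint.items():
        p_vc = count / n
        mi += p_vc * math.log(p_vc / ((p_value[v] / n) * (p_cat[c] / n)))
    return mi

w_use_object = mutual_information(pairs)
```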
2.8 Based on the results $tf_{ij}$, $idf_i$ and $w_i$ calculated in steps 2.5, 2.6 and 2.7, and the improved naive Bayes model formula

$$P(L_j^1 \mid X) \propto P(L_j^1) \prod_{i=1}^{n} \left( tf_{ij} \times idf_i \right)^{w_i}$$

calculate the probability that the prediction sample $X=\{x_1,x_2,\dots,x_n\}$ belongs to each category, and then select the cosmetic category with the greatest probability as the prediction result of the first-layer classification model.
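Putting steps 2.5 to 2.8 together, the sketch below scores two hypothetical categories from made-up word-stock counts and document frequencies; the exponent placement of the weights w_i is an assumption consistent with the log-space reading of the model, and all numbers are illustrative.

```python
import math

# Hypothetical statistics from 100 historical declarations.
TOTAL_DOCS = 100
DOC_FREQ = {"lip": 30, "lipstick": 12, "moisturizing": 95,
            "eye": 28, "mascara": 9}
STOCK_COUNTS = {  # word occurrence counts inside each category's word stock
    "lip": {"lip": 8, "lipstick": 5, "moisturizing": 6},
    "eye": {"eye": 9, "mascara": 4, "moisturizing": 7},
}
PRIORS = {"lip": 0.55, "eye": 0.45}
M = len(STOCK_COUNTS)  # number of word stocks

def tfidf(word, category):
    counts = STOCK_COUNTS[category]
    tf = (counts.get(word, 0) + 1) / (sum(counts.values()) + M)  # Laplace
    idf = math.log(TOTAL_DOCS / (DOC_FREQ.get(word, 0) + 1))
    return tf * idf

def predict_category(features, weights):
    """argmax_j of log P(L_j) + sum_i w_i * log(tf_ij * idf_i)."""
    def score(category):
        return math.log(PRIORS[category]) + sum(
            w * math.log(tfidf(x, category))
            for x, w in zip(features, weights))
    return max(PRIORS, key=score)

category = predict_category(["lip", "lipstick", "moisturizing"], [1.4, 0.9, 0.3])
```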
3. Based on the improved naive Bayes classifier, a second-layer classification model is established, and specific commodity codes of the commodity are predicted, wherein the specific commodity codes comprise the following steps:
3.1 Based on the cosmetic category of the commodity predicted by the first-layer classification model, establish a second-layer classification model under that category, and define the second-layer classification model categories $L^2$, wherein lip cosmetics contain 7 codes, eye cosmetics contain 7 codes, nail cosmetics contain 4 codes, powder cosmetics contain 2 codes, and other cosmetics or skin care products contain 9 codes.
3.2 Acquire the historical declaration data of the cosmetics and establish a word stock for each category $L^2$ of the second-layer classification model, i.e. 7 word stocks are built under lip cosmetics, 7 word stocks under eye cosmetics, and so on.
3.3 Repeat steps 2.3 to 2.8 to calculate the maximum posterior probability $P(L_k^2 \mid X)$; the category with the highest probability is the predicted commodity code of the commodity.
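The narrowing effect of the two layers can be sketched as follows: the second layer only scores the commodity codes contained in the category picked by the first layer. The code lists and scores below are hypothetical; real scores would come from the improved classifier of step 2.

```python
# Assumed grouping of ten-digit commodity codes under first-layer categories.
CODES_UNDER = {
    "lip cosmetics": ["3304100010", "3304100020", "3304100030"],
    "eye cosmetics": ["3304200010", "3304200020"],
}

# Hypothetical second-layer log scores produced for one prediction sample.
second_layer_scores = {
    "3304100010": -8.1, "3304100020": -9.4, "3304100030": -7.6,
    "3304200010": -6.0, "3304200020": -6.5,
}

def predict_code(first_layer_category, scores):
    """Step 3.3: argmax restricted to the codes under the predicted category."""
    candidates = CODES_UNDER[first_layer_category]
    return max(candidates, key=lambda code: scores[code])

code = predict_code("lip cosmetics", second_layer_scores)
```

Even though an eye-cosmetics code happens to score higher here, it is never considered once the first layer has chosen lip cosmetics.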
According to the embodiment of the application, by establishing the method for searching the HS codes of the cosmetics based on the improved naive Bayesian classifier, enterprises can return the HS codes with highest matching degree only by inputting commodity information of the cosmetics to be declared, and the time for inquiring the commodity codes by the enterprises is greatly saved; by constructing a two-layer classification model, the first layer predicts the cosmetic category to which the commodity belongs, and the second layer predicts the specific commodity code to which the commodity belongs, so that the classification accuracy is improved; the calculation of the correlation between the attribute and the category is increased, the weight of the attribute with large contribution to the category discrimination is increased, and the weight of the attribute with small contribution to the category discrimination is reduced, so that the influence of low-weight attribute words on the classification result is weakened, and the stability of the classification effect is ensured.
As shown in fig. 2, a schematic structural diagram of an apparatus for predicting commodity codes according to an embodiment of the present application includes:
the first calculation module 210 is configured to calculate, using commodity information of the cosmetics to be declared as a prediction sample, word frequency-inverse text frequency of each feature vector in the prediction sample in each first-layer word stock, where each feature vector corresponds to an attribute of the cosmetics to be declared, and each first-layer word stock corresponds to a cosmetic category.
The first prediction module 220 is configured to predict, according to the word frequency-inverse text frequency of each feature vector in each first-layer word stock, the prior probability of each cosmetic category, and the correlation between the cosmetic attribute and the cosmetic category in the historical reporting data, the cosmetic category to which the cosmetic to be reported belongs.
Specifically, the first prediction module 220 is specifically configured to calculate a weight coefficient for each feature vector in the prediction sample according to the correlation between cosmetic attributes and cosmetic categories in the historical declaration data, and to calculate, by the following formula, the probability that the prediction sample X = {x_1, x_2, ..., x_n} belongs to each cosmetic category j,

wherein {x_1, x_2, ..., x_n} are the feature vectors in the prediction sample; P(j) is the prior probability of cosmetic category j; tf_ij is the frequency of the feature vector x_i in the first-layer word stock corresponding to cosmetic category j; idf_i is the inverse text frequency of the feature vector x_i in the first-layer word stock; and w_i is the weight coefficient of the feature vector x_i;
and comparing the probabilities that the prediction samples belong to the cosmetic categories, and taking the cosmetic category with the highest probability as a prediction result.
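The scoring and argmax step described above can be sketched as follows. The exact way the patent combines the prior P(j) with w_i, tf_ij, and idf_i is given only as a formula image, so the additive log-space combination below is an assumption, as are the small example tables in the usage note:

```python
import math

def predict_category(sample_words, priors, tf, idf, weights):
    """Return the category j maximizing log P(j) plus the weighted
    tf-idf evidence of the sample's attribute words. `tf` maps
    (word, category) pairs to term frequencies in that category's
    word stock; `idf` and `weights` map words to idf_i and w_i."""
    best, best_score = None, float("-inf")
    for j, prior in priors.items():
        score = math.log(prior)
        for x in sample_words:
            # w_i * tf_ij * idf_i contribution of each feature word x_i
            score += weights.get(x, 1.0) * tf.get((x, j), 0.0) * idf.get(x, 0.0)
        if score > best_score:
            best, best_score = j, score
    return best
```

The same routine serves the second prediction layer unchanged: pass per-commodity-code priors and the second-layer tf table instead of the per-category ones.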
The second calculating module 230 is configured to calculate a word frequency-inverse text frequency of each feature vector in the prediction sample in each second-layer word stock, where each second-layer word stock corresponds to a commodity code included in the cosmetic category to which the cosmetic to be declared belongs.
The second prediction module 240 is configured to predict the commodity code to which the cosmetic to be declared belongs according to the word frequency-inverse text frequency of each feature vector in each second-layer word stock, the prior probability of each commodity code, and the correlation between the cosmetic attribute and the commodity code in the historical declaration data.
Specifically, the second prediction module 240 is specifically configured to calculate a weight coefficient for each feature vector in the prediction sample according to the correlation between cosmetic attributes and commodity codes in the historical declaration data, and to calculate, by the following formula, the probability that the prediction sample X = {x_1, x_2, ..., x_n} belongs to each commodity code k,

wherein {x_1, x_2, ..., x_n} are the feature vectors in the prediction sample; P(k) is the prior probability of commodity code k; tf_ik is the frequency of the feature vector x_i in the second-layer word stock corresponding to commodity code k; idf_i is the inverse text frequency of the feature vector x_i in the second-layer word stock; and w_i is the weight coefficient of the feature vector x_i;
and comparing the probability that the prediction sample belongs to each commodity code, and taking the commodity code with the highest probability as a prediction result.
Further, the device further comprises:
the first classification module is used for acquiring historical declaration data of cosmetics as a training sample, and classifying each declaration data in the training sample according to the cosmetic category corresponding to the commodity code of each declaration data in the training sample; and respectively extracting labels from the declaration information of the multiple declaration data corresponding to each cosmetic class to obtain a first-layer word stock corresponding to each cosmetic class.
The second classification module is used for classifying the plurality of pieces of declaration data according to commodity codes of the plurality of pieces of declaration data corresponding to each cosmetic class; and respectively extracting labels from the declaration information of each declaration data corresponding to each commodity code to obtain a second-layer word stock corresponding to each commodity code.
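The two classification modules above build the first-layer and second-layer word stocks from the same historical declarations. A minimal sketch of that grouping is shown below; real label extraction would use Chinese word segmentation, so the plain `split()` tokenizer, the category names, and the example HS codes are all illustrative assumptions:

```python
from collections import defaultdict

def build_word_stocks(declarations, code_to_category):
    """Group historical declarations into per-category (first-layer) and
    per-commodity-code (second-layer) word stocks. `declarations` is a
    list of (declaration_text, hs_code) pairs; `code_to_category` maps
    each commodity code to its cosmetic category."""
    first_layer = defaultdict(list)
    second_layer = defaultdict(list)
    for text, hs_code in declarations:
        words = text.split()  # stand-in for proper label extraction
        first_layer[code_to_category[hs_code]].extend(words)
        second_layer[hs_code].extend(words)
    return first_layer, second_layer
```

Term and inverse-text frequencies for the prediction formulas can then be computed per word stock from these grouped token lists.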
According to the embodiment of the application, the cosmetic category and the commodity code of the cosmetic to be declared are predicted from the word frequency-inverse text frequency of each feature vector in the prediction sample together with the correlation between cosmetic attributes and cosmetic categories, which improves the classification accuracy and the matching degree of the returned commodity code, and further saves commodity-code query time.
The embodiment of the application also provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the processes of the above method embodiment for predicting commodity codes and achieves the same technical effects; to avoid repetition, the details are not repeated here. The computer-readable storage medium may be, for example, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element introduced by the phrase "comprising a ..." does not exclude the presence of other identical elements in the process, method, article, or apparatus that comprises the element.
From the above description of the embodiments, those skilled in the art will appreciate that the method of the above embodiments may be implemented by software plus a necessary general-purpose hardware platform, or alternatively by hardware, although in many cases the former is preferred. Based on this understanding, the technical solution of the present application, or the part of it that contributes over the prior art, may be embodied in the form of a software product stored in a storage medium (e.g. ROM/RAM, magnetic disk, or optical disk) and comprising instructions for causing a terminal (which may be a mobile phone, a computer, a server, an air conditioner, a network device, or the like) to perform the method according to the embodiments of the present application.
The embodiments of the present application have been described above with reference to the accompanying drawings, but the present application is not limited to the above-described embodiments, which are merely illustrative and not restrictive, and many forms may be made by those having ordinary skill in the art without departing from the spirit of the present application and the scope of the claims, which are to be protected by the present application.
Claims (8)
1. A method of predicting commodity codes, comprising the steps of:
taking commodity information of cosmetics to be declared as a prediction sample, and calculating word frequency-inverse text frequency of each feature vector in the prediction sample in each first-layer word stock, wherein each feature vector corresponds to one attribute of the cosmetics to be declared, and each first-layer word stock corresponds to one cosmetic class;
predicting the cosmetic category to which the cosmetic to be declared belongs according to the word frequency-inverse text frequency of each feature vector in each first-layer word stock, the prior probability of each cosmetic category and the correlation between the cosmetic attribute and the cosmetic category in the historical declaration data;
calculating word frequency-inverse text frequency of each feature vector in the prediction sample in each second-layer word stock, wherein each second-layer word stock corresponds to a commodity code contained in the cosmetic category to which the cosmetic to be declared belongs;
predicting commodity codes of the cosmetics to be declared according to the word frequency-inverse text frequency of each feature vector in each second-layer word stock, the prior probability of each commodity code and the correlation between the cosmetic attribute and the commodity code in the historical declaration data;
the predicting the cosmetic category to which the cosmetic to be declared belongs according to the word frequency-inverse text frequency of each feature vector in each first-layer word stock, the prior probability of each cosmetic category and the correlation between the cosmetic attribute and the cosmetic category in the historical declaration data, specifically includes:
according to the correlation between the cosmetic attributes and the cosmetic categories in the historical declaration data, calculating the weight coefficient of each feature vector in the prediction sample;
calculating, by the following formula, the probability that the prediction sample X = {x_1, x_2, ..., x_n} belongs to each cosmetic category j,

wherein {x_1, x_2, ..., x_n} are the feature vectors in the prediction sample; P(j) is the prior probability of cosmetic category j; tf_ij is the frequency of the feature vector x_i in the first-layer word stock corresponding to cosmetic category j; idf_i is the inverse text frequency of the feature vector x_i in the first-layer word stock; and w_i is the weight coefficient of the feature vector x_i;
and comparing the probabilities that the prediction samples belong to the cosmetic categories, and taking the cosmetic category with the highest probability as a prediction result.
2. The method of claim 1, wherein, before the calculating of the word frequency-inverse text frequency of each feature vector in the prediction sample in each first-layer word stock, the method further comprises:
acquiring historical declaration data of cosmetics as a training sample, and classifying each declaration data in the training sample according to the cosmetic category corresponding to commodity codes of each declaration data in the training sample;
and respectively extracting labels from the declaration information of the multiple declaration data corresponding to each cosmetic class to obtain a first-layer word stock corresponding to each cosmetic class.
3. The method of claim 1, wherein, before the calculating of the word frequency-inverse text frequency of each feature vector in the prediction sample in each second-layer word stock, the method further comprises:
classifying the multiple pieces of declaration data according to commodity codes of the multiple pieces of declaration data corresponding to each cosmetic class;
and respectively extracting labels from the declaration information of each declaration data corresponding to each commodity code to obtain a second-layer word stock corresponding to each commodity code.
4. The method according to claim 1, wherein predicting the commodity code to which the cosmetic to be declared belongs according to the word frequency-inverse text frequency of each feature vector in each second-layer word stock, the prior probability of each commodity code, and the correlation between the cosmetic attribute and the commodity code in the historical declaration data specifically includes:
according to the correlation between the cosmetic attributes and commodity codes in the historical declaration data, calculating the weight coefficient of each feature vector in the prediction sample;
calculating, by the following formula, the probability that the prediction sample X = {x_1, x_2, ..., x_n} belongs to each commodity code k,

wherein {x_1, x_2, ..., x_n} are the feature vectors in the prediction sample; P(k) is the prior probability of commodity code k; tf_ik is the frequency of the feature vector x_i in the second-layer word stock corresponding to commodity code k; idf_i is the inverse text frequency of the feature vector x_i in the second-layer word stock; and w_i is the weight coefficient of the feature vector x_i;
and comparing the probability that the prediction sample belongs to each commodity code, and taking the commodity code with the highest probability as a prediction result.
5. An apparatus for predicting commodity codes, comprising:
the first calculation module is used for taking commodity information of cosmetics to be declared as a prediction sample, calculating word frequency-inverse text frequency of each feature vector in the prediction sample in each first-layer word stock, wherein each feature vector corresponds to one attribute of the cosmetics to be declared, and each first-layer word stock corresponds to one cosmetic category;
the first prediction module is used for predicting the cosmetic category to which the cosmetic to be declared belongs according to the word frequency-inverse text frequency of each feature vector in each first-layer word stock, the prior probability of each cosmetic category and the correlation between the cosmetic attribute and the cosmetic category in the historical declaration data;
the second calculation module is used for calculating the word frequency-inverse text frequency of each feature vector in the prediction sample in each second-layer word stock, and each second-layer word stock corresponds to one commodity code contained in the cosmetic category to which the cosmetic to be declared belongs;
the second prediction module is used for predicting commodity codes of the cosmetics to be declared according to the word frequency-inverse text frequency of each feature vector in each second-layer word stock, the prior probability of each commodity code and the correlation between the cosmetic attribute and the commodity code in the historical declaration data;
the first prediction module is specifically configured to calculate a weight coefficient of each feature vector in the prediction sample according to a correlation between a cosmetic attribute and a cosmetic category in the historical declaration data;
and to calculate, by the following formula, the probability that the prediction sample X = {x_1, x_2, ..., x_n} belongs to each cosmetic category j,

wherein {x_1, x_2, ..., x_n} are the feature vectors in the prediction sample; P(j) is the prior probability of cosmetic category j; tf_ij is the frequency of the feature vector x_i in the first-layer word stock corresponding to cosmetic category j; idf_i is the inverse text frequency of the feature vector x_i in the first-layer word stock; and w_i is the weight coefficient of the feature vector x_i;
and comparing the probabilities that the prediction samples belong to the cosmetic categories, and taking the cosmetic category with the highest probability as a prediction result.
6. The apparatus as recited in claim 5, further comprising:
the first classification module is used for acquiring historical declaration data of cosmetics as a training sample, and classifying each declaration data in the training sample according to the cosmetic category corresponding to the commodity code of each declaration data in the training sample; and respectively extracting labels from the declaration information of the multiple declaration data corresponding to each cosmetic class to obtain a first-layer word stock corresponding to each cosmetic class.
7. The apparatus as recited in claim 5, further comprising:
the second classification module is used for classifying the plurality of pieces of declaration data according to commodity codes of the plurality of pieces of declaration data corresponding to each cosmetic class; and respectively extracting labels from the declaration information of each declaration data corresponding to each commodity code to obtain a second-layer word stock corresponding to each commodity code.
8. The apparatus of claim 5, wherein
the second prediction module is specifically configured to calculate a weight coefficient of each feature vector in the prediction sample according to the correlation between the cosmetic attribute and the commodity code in the historical declaration data;
and to calculate, by the following formula, the probability that the prediction sample X = {x_1, x_2, ..., x_n} belongs to each commodity code k,

wherein {x_1, x_2, ..., x_n} are the feature vectors in the prediction sample; P(k) is the prior probability of commodity code k; tf_ik is the frequency of the feature vector x_i in the second-layer word stock corresponding to commodity code k; idf_i is the inverse text frequency of the feature vector x_i in the second-layer word stock; and w_i is the weight coefficient of the feature vector x_i;
and comparing the probability that the prediction sample belongs to each commodity code, and taking the commodity code with the highest probability as a prediction result.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310174800.4A CN116166805B (en) | 2023-02-24 | 2023-02-24 | Commodity coding prediction method and device |
Publications (2)
Publication Number | Publication Date |
---|---|
CN116166805A CN116166805A (en) | 2023-05-26 |
CN116166805B true CN116166805B (en) | 2023-09-22 |
Family
ID=86413034
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202310174800.4A Active CN116166805B (en) | 2023-02-24 | 2023-02-24 | Commodity coding prediction method and device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116166805B (en) |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
KR20140066486A (en) * | 2012-11-23 | 2014-06-02 | 현대중공업 주식회사 | An method for assigning hs code to materials |
CN107704892A (en) * | 2017-11-07 | 2018-02-16 | 宁波爱信诺航天信息有限公司 | A kind of commodity code sorting technique and system based on Bayesian model |
CN109598517A (en) * | 2017-09-29 | 2019-04-09 | 阿里巴巴集团控股有限公司 | Commodity clearance processing, the processing of object and its class prediction method and apparatus |
CN110858219A (en) * | 2018-08-17 | 2020-03-03 | 菜鸟智能物流控股有限公司 | Logistics object information processing method and device and computer system |
CN112529420A (en) * | 2020-12-14 | 2021-03-19 | 深圳市钛师傅云有限公司 | Intelligent classification method and system for customs commodity codes |
CN113378167A (en) * | 2021-06-30 | 2021-09-10 | 哈尔滨理工大学 | Malicious software detection method based on improved naive Bayes algorithm and gated loop unit mixing |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
KR20120052636A (en) * | 2010-11-16 | 2012-05-24 | 한국전자통신연구원 | A hscode recommendation service system and method using ontology |
Non-Patent Citations (2)
Title |
---|
Research on business opportunity data classification based on DOA automatic registration and an improved naive Bayes algorithm; Jiang Dengli; China Master's Theses Full-text Database (No. 2); pp. 36-50 *
Improved multinomial naive Bayes classification algorithm and its Python implementation; Chen Cuijuan; Journal of Jingdezhen University; Vol. 36 (No. 3); pp. 92-95 *
Also Published As
Publication number | Publication date |
---|---|
CN116166805A (en) | 2023-05-26 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US11669750B2 (en) | System and/or method for generating clean records from imperfect data using model stack(s) including classification model(s) and confidence model(s) | |
CN108391446B (en) | Automatic extraction of training corpus for data classifier based on machine learning algorithm | |
CN106919619B (en) | Commodity clustering method and device and electronic equipment | |
CN104778186B (en) | Merchandise items are mounted to the method and system of standardized product unit | |
US20180158078A1 (en) | Computer device and method for predicting market demand of commodities | |
CN106447066A (en) | Big data feature extraction method and device | |
CN112200601B (en) | Item recommendation method, device and readable storage medium | |
CN106445988A (en) | Intelligent big data processing method and system | |
CN112487199B (en) | User characteristic prediction method based on user purchasing behavior | |
CN110019790B (en) | Text recognition, text monitoring, data object recognition and data processing method | |
CN109766911A (en) | A kind of behavior prediction method | |
CN107247728B (en) | Text processing method and device and computer storage medium | |
CN113656699B (en) | User feature vector determining method, related equipment and medium | |
CN110135769A (en) | Kinds of goods attribute fill method and device, storage medium and electric terminal | |
CN111651981A (en) | Data auditing method, device and equipment | |
CN115062732A (en) | Resource sharing cooperation recommendation method and system based on big data user tag information | |
CN116166805B (en) | Commodity coding prediction method and device | |
CN117436446A (en) | Weak supervision-based agricultural social sales service user evaluation data analysis method | |
Spichakova et al. | Using machine learning for automated assessment of misclassification of goods for fraud detection | |
CN115080741A (en) | Questionnaire survey analysis method, device, storage medium and equipment | |
US20110208738A1 (en) | Method for Determining an Enhanced Value to Keywords Having Sparse Data | |
CN111325419A (en) | Method and device for identifying blacklist user | |
CN113486948B (en) | Clothing commodity gender classification method and device based on text data | |
CN112685635B (en) | Item recommendation method, device, server and storage medium based on classification label | |
CN117853249A (en) | Commodity classification recommending method and device and electronic equipment |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||