CN109784776B - Commodity quality risk judgment method based on label identification - Google Patents


Info

Publication number
CN109784776B
Authority
CN
China
Legal status
Active
Application number
CN201910145851.8A
Other languages
Chinese (zh)
Other versions
CN109784776A (en)
Inventor
张华桁
何军良
宋博
严伟
杨锐
Current Assignee
Shanghai Pinboluo Intelligent Technology Co ltd
Original Assignee
Shanghai Pinboluo Intelligent Technology Co ltd
Application filed by Shanghai Pinboluo Intelligent Technology Co ltd
Priority to CN201910145851.8A
Publication of CN109784776A
Application granted
Publication of CN109784776B
Legal status: Active

Abstract

The invention provides a commodity quality risk judgment method based on label identification, comprising the following steps: reading the label information of a commodity, converting the commodity name into a coded set, and representing the commodity formula as a 0-1 vector; calculating the comprehensive distance between each pair of commodities and clustering them with the K-Medoids algorithm to obtain commodity classes; determining an illegal additive set according to the occurrence frequency of ingredients in the commodities, calculating the average mutual information among the ingredients of each commodity, selecting a preset number of ingredients with the lowest average mutual information, and comparing them against the illegal additive set. The method covers commodity clustering, commodity classification and illegal additive identification. It requires neither advance registration of commodity information nor the coding of a large number of rules; it can automatically identify and memorize illegal additives in commodity formulas, giving it a learning capability; and it is compatible with multiple languages. It thereby enables rapid identification of illegal additives in commodities and automatic screening of commodity quality risks.

Description

Commodity quality risk judgment method based on label identification
Technical Field
The invention relates to the technical field of commodity quality analysis, and in particular to a commodity quality risk judgment method based on label identification.
Background
Commodity quality bears directly on the safety of people's lives and property and is an important area of government supervision. The rapid growth of cross-border import e-commerce allows large volumes of goods produced and sold abroad to enter China quickly. Cross-border e-commerce imports arrive in small, frequent batches, which places great pressure on customs and other regulatory authorities. Because the text on foreign product labels is in foreign languages, it is difficult to determine the product category under domestic standards, and therefore very difficult to assess product risk.
The main basis for judging commodity quality risk is the national standards of the People's Republic of China. China has established and published 303 national food safety standards, covering dairy safety, mycotoxins, pesticide and veterinary drug residues, food additives and nutrition enhancers, and the general rules for prepackaged food labels and nutrition labels, among others, with more than 6000 food safety indexes in total. Strictly speaking, judging whether a commodity carries a quality risk requires sending a sample to a laboratory for inspection and comparing the result with the national standards. In actual regulatory practice, however, practical constraints mean that only a small fraction of samples is drawn for inspection. Most of the time, because the commodities have not been assessed beforehand, random inspection is poorly representative and cannot accurately reflect commodity quality. To raise the risk detection rate by submitting suspected high-risk commodities wherever possible, commodity quality risk must be assessed before sampling, and sampling must then be targeted according to that assessment.
Existing commodity quality risk judgment methods mainly rely either on scanning the commodity barcode, retrieving the commodity information registered on a server and identifying risk items against a predefined list of prohibited terms, or on encoding the national standards as rules and performing risk reasoning with a rule engine. The barcode approach requires the commodity information to be registered in a database in advance, which is unsuitable for imported commodities, especially those imported for the first time. Encoding all the national standards as rules consumes considerable labor and time, and also requires the names, formulas and other information of foreign-language commodities to be translated into Chinese so that they can be matched against the Chinese national standards.
Some progress has been made at home and abroad in building quality and safety risk supervision systems for import and export commodities, but problems remain. In some countries or regions, such as the European Union, the risk assessment in commodity quality and safety regulatory systems is not data-driven; early warnings are instead issued on the basis of individual events. In China, some units and departments currently use an 'intelligent import and export industrial product risk management informatization platform'; although it applies data analysis to manage commodity quality risk, its evaluation model is relatively fixed and can neither be extended nor learn automatically. In the prior art, the 'risk assessment grading rules' consider both the inherent risk of a product and special risks arising from its place of production, its users and so on, but the rules are established manually and the degree of automation is low.
Disclosure of Invention
The invention aims to provide a commodity quality risk judgment method based on label identification that can identify and judge illegal additives in a commodity formula from the information on the commodity label without coding a large number of rules. The method is highly accurate, compatible with multiple languages, and able to screen commodity quality risks automatically.
The invention adopts the following technical scheme:
The commodity quality risk judgment method based on label identification comprises the following steps:
firstly, inputting a commodity label;
secondly, determining whether the input commodity labels form a batch or a single label;
1. when the input is a batch of commodity labels, the method comprises the following steps:
1.1 scanning the batch of commodity labels, extracting the commodity names with an N-gram language model, converting each commodity name into a set of contiguous N-character strings, and calculating the Jaccard distance between every two commodity names;
meanwhile, representing each commodity formula as a 0-1 vector, and calculating the Cosine distance between the formulas of every two commodities;
1.2 calculating the comprehensive distance between every two commodities;
clustering the commodities with the K-Medoids clustering algorithm based on the comprehensive distance to obtain commodity classes;
1.3 establishing an illegal additive set for each commodity class, consisting of a confirmed illegal additive set and a suspected illegal additive set, where the confirmed illegal additive set contains the illegal additives of the commodity class determined from historical risk information, and the suspected illegal additive set contains newly appearing illegal additives, namely ingredients in the ingredient information list that occur between 0 and p×n times among the commodities of the class, n being the number of commodities in the commodity class and p a manually set constant with 0 < p < 1;
1.4 for each commodity in each commodity class, calculating the average mutual information between each of its ingredients and its other ingredients within that class;
1.5 selecting, for each commodity, the q ingredients with the lowest average mutual information and checking one by one whether they are contained in the suspected illegal additive set, where q is a manually set positive integer smaller than the total number of ingredients of the commodity;
if so, marking the contained ingredients as illegal additives of the commodity; in either case, proceeding to the next step;
1.6 checking whether any of the ingredients other than the q ingredients are contained in the confirmed illegal additive set;
if so, marking the contained ingredients as illegal additives of the commodity; in either case, proceeding to the next step;
1.7 judging whether the commodity has any marked illegal additives: if so, reporting the commodity and the corresponding illegal additives; if not, judging the commodity to be qualified; in either case, proceeding to the third step;
2. when the label is a single commodity label, the method comprises the following steps:
2.1 scanning the single commodity label, extracting the commodity name with an N-gram language model, converting it into a set of contiguous N-character strings, and calculating the Jaccard distance between the commodity and at most x commodities of each commodity class in the historical data, where x is a manually set positive integer not greater than the number of commodities in that class;
meanwhile, representing the commodity formula as a 0-1 vector, and calculating the Cosine distance between the commodity and at most x commodities of each commodity class in the historical data;
2.2 calculating the comprehensive distance between the commodity and at most x commodities of each commodity class in the historical data;
2.3 selecting the y commodities with the smallest comprehensive distance to the commodity and counting the commodity classes they belong to; the class containing the largest number of them is taken as the commodity class of the commodity, where y is a manually set positive integer smaller than the total number of commodities that took part in the comprehensive distance calculation;
2.4 calculating the average mutual information between each ingredient of the commodity and its other ingredients within that commodity class;
2.5 selecting the q ingredients with the lowest average mutual information and checking whether they are contained in the suspected illegal additive set, where q is a manually set positive integer smaller than the total number of ingredients of the commodity;
if so, marking the contained ingredients as illegal additives of the commodity; in either case, proceeding to the next step;
2.6 checking whether any of the ingredients other than the q ingredients are contained in the confirmed illegal additive set;
if so, marking the contained ingredients as illegal additives of the commodity; in either case, proceeding to the next step;
2.7 judging whether the commodity has any marked illegal additives: if so, reporting the commodity and the corresponding illegal additives; if not, judging the commodity to be qualified; in either case, proceeding to the third step;
thirdly, the user audits, modifies, confirms and stores the data.
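The screening logic of steps 1.3 and 1.5 to 1.7 (mirrored in steps 2.5 to 2.7) can be sketched in Python as below. This is a minimal illustration under stated assumptions, not the patented implementation: each formula is modeled as a set of ingredient names, the average mutual information of each ingredient (step 1.4) is assumed to be precomputed, and the upper bound p×n is assumed to be inclusive. The function names `suspected_additives` and `flag_additives` are hypothetical.

```python
from collections import Counter

def suspected_additives(class_recipes, p):
    """Step 1.3 (sketch): ingredients occurring in at most p*n of the n formulas of a class."""
    n = len(class_recipes)
    counts = Counter(ing for recipe in class_recipes for ing in set(recipe))
    # Counter only holds ingredients that actually occur, so the range is effectively (0, p*n].
    return {ing for ing, c in counts.items() if c <= p * n}

def flag_additives(avg_mi, q, suspected, confirmed):
    """Steps 1.5 to 1.7 (sketch) for one commodity; avg_mi maps ingredient -> average MI."""
    ranked = sorted(avg_mi, key=lambda ing: avg_mi[ing])  # lowest average MI first
    lowest_q, rest = ranked[:q], ranked[q:]
    flagged = [ing for ing in lowest_q if ing in suspected]  # step 1.5
    flagged += [ing for ing in rest if ing in confirmed]     # step 1.6
    return flagged  # step 1.7: empty list means the commodity is judged qualified
```

An empty result corresponds to the branch of step 1.7 in which the commodity is judged qualified; otherwise the returned ingredients are what would be reported.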
Preferably, the commodity classes are those obtained in step 1.2 or commodity classes already existing in a database.
In a preferred embodiment, the N-gram language models include a 1-gram language model and a 2-gram language model: the 1-gram language model extracts the names of commodities whose label information is in an Indo-European language, and the 2-gram language model extracts the names of commodities whose label information is in Chinese.
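As a rough illustration of the name-encoding part of step 1.1, the sketch below builds the set of contiguous n-character strings of a name and computes the Jaccard distance between two such sets. It assumes whitespace is dropped before slicing and that a name shorter than n becomes a single token; neither detail is specified in the text, and the helper names are hypothetical. The text also does not say whether the 1-gram model for Indo-European labels works on characters or words; the sketch follows the wording "continuous N characters".

```python
def name_ngrams(name, n):
    """Set of contiguous n-character strings of a commodity name (sketch)."""
    text = "".join(name.split())  # assumption: whitespace is removed first
    if len(text) < n:
        return {text} if text else set()
    return {text[i:i + n] for i in range(len(text) - n + 1)}

def jaccard_distance(A, B):
    """1 - |A & B| / |A | B|; two empty sets are treated as identical."""
    union = A | B
    if not union:
        return 0.0
    return 1.0 - len(A & B) / len(union)
```

For example, with n = 2 the Chinese names 牛奶粉 and 奶粉 give {牛奶, 奶粉} and {奶粉}, whose Jaccard distance is 0.5.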
In a preferred embodiment, in steps 1.1 and 2.1, the commodity formula is represented as a 0-1 vector as follows:
an ingredient information list is established to which ingredients can be appended at the tail; if an ingredient of the commodity is already in the current ingredient information list, it is replaced by its position in the list, and if not, it is first appended to the tail of the list and then replaced by its position in the list;
the formula of each commodity is represented by an array whose length equals the length of the ingredient information list; the array takes 1 at the position of an ingredient in the list if the formula contains that ingredient, and 0 otherwise.
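The append-only ingredient information list and the 0-1 encoding described above can be sketched as a small Python class. The class and method names are hypothetical; the dictionary is an assumed convenience that makes position lookups constant-time and is not required by the text.

```python
class IngredientList:
    """Append-only ingredient information list with 0-1 formula encoding (sketch)."""

    def __init__(self):
        self.ingredients = []  # position i holds the i-th registered ingredient
        self._position = {}    # ingredient -> position, for constant-time lookup

    def position_of(self, ingredient):
        """Return the ingredient's position, appending it at the tail if it is new."""
        if ingredient not in self._position:
            self._position[ingredient] = len(self.ingredients)
            self.ingredients.append(ingredient)
        return self._position[ingredient]

    def vectorize(self, formula):
        """0-1 array as long as the current list, with 1 where the formula has the ingredient."""
        positions = [self.position_of(ing) for ing in formula]  # register new ingredients first
        vector = [0] * len(self.ingredients)
        for pos in positions:
            vector[pos] = 1
        return vector
```

Because the list only grows at the tail, a vector produced earlier can be padded with zeros to the current list length and will then match the encoding a later call would produce for the same formula.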
In a preferred embodiment, in step 1, the comprehensive distance between two commodities is calculated as follows:
the name of commodity A is extracted with the N-gram language model to obtain set A, and the name of commodity B is extracted to obtain set B; a is the 0-1 vector representation of the formula of commodity A, and b is the 0-1 vector representation of the formula of commodity B. The comprehensive distance between commodity A and commodity B is denoted D(A, B):
[Formula image not reproduced: definition of the comprehensive distance D(A, B) in terms of J(A, B) and C(A, B)]
where
$$J(A,B) = 1 - \frac{|A \cap B|}{|A \cup B|}, \qquad C(A,B) = 1 - \frac{\sum_i a_i b_i}{\sqrt{\sum_i a_i^2}\,\sqrt{\sum_i b_i^2}}$$
$a_i$ is the i-th component of vector a and $b_i$ is the i-th component of vector b; J(A, B) denotes the Jaccard distance and C(A, B) the Cosine distance.
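The formula that combines J and C into D(A, B) is not reproduced in this text, so the combination below is only a placeholder. The Python sketch implements the Jaccard and Cosine distances in their standard forms and, purely as an assumption, combines them with a weighted average; the `weight` parameter and all function names are hypothetical and should be replaced by the patent's actual formula. Vectors of different lengths are zero-padded, which is valid because the ingredient list only grows at the tail.

```python
import math

def jaccard_distance(A, B):
    """J(A, B) = 1 - |A & B| / |A | B| for two N-gram sets."""
    union = A | B
    return 0.0 if not union else 1.0 - len(A & B) / len(union)

def cosine_distance(a, b):
    """C = 1 - cos(a, b) for two 0-1 formula vectors, zero-padded to equal length."""
    size = max(len(a), len(b))
    a = list(a) + [0] * (size - len(a))
    b = list(b) + [0] * (size - len(b))
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    if norm == 0:
        return 1.0  # assumption: an empty formula is treated as maximally distant
    return 1.0 - dot / norm

def comprehensive_distance(A, B, a, b, weight=0.5):
    """HYPOTHETICAL D(A, B): weighted average standing in for the unreproduced formula."""
    return weight * jaccard_distance(A, B) + (1.0 - weight) * cosine_distance(a, b)
```

Both component distances lie in [0, 1], so any convex combination of them also stays in [0, 1]; that is the only reason a weighted average was chosen for this illustration.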
In a preferred embodiment, the average mutual information is denoted MI and is calculated as follows:
(1) calculating the mutual information I(r, r') of every two ingredients, where
$$I(r,r') = \log \frac{p(r,r')}{p(r)\,p(r')}$$
where p(r, r') is the proportion of all commodities that contain both ingredients r and r', p(r) is the proportion of all commodities that contain ingredient r, and p(r') is the proportion of all commodities that contain ingredient r';
(2) calculating the average mutual information MI of each ingredient with the other ingredients; the average mutual information MI(r) of ingredient r with the other n ingredients $r'_1, r'_2, r'_3, \dots, r'_i, \dots, r'_n$ is:
$$MI(r) = \frac{1}{n}\sum_{i=1}^{n} I(r, r'_i)$$
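Steps (1) and (2) can be sketched in Python as follows, reading I(r, r') as pointwise mutual information over the formulas of one commodity class, as the definitions of p(r, r'), p(r) and p(r') suggest. The natural logarithm, the use of negative infinity when two ingredients never co-occur, and the value 0.0 for a single-ingredient formula are assumptions of this sketch; the function names are hypothetical.

```python
import math

def pmi(r, r2, class_recipes):
    """I(r, r') over the formulas of one class (each formula is a set of ingredients)."""
    n = len(class_recipes)
    n_both = sum(1 for rec in class_recipes if r in rec and r2 in rec)
    if n_both == 0:
        return float("-inf")  # assumption: never co-occurring means minimal MI
    n_r = sum(1 for rec in class_recipes if r in rec)
    n_r2 = sum(1 for rec in class_recipes if r2 in rec)
    # p(r,r') / (p(r) p(r')) simplifies to n_both * n / (n_r * n_r2)
    return math.log(n_both * n / (n_r * n_r2))

def average_mi(recipe, class_recipes):
    """MI(r) for every ingredient r of one commodity's formula."""
    result = {}
    for r in recipe:
        others = [o for o in recipe if o != r]
        if not others:
            result[r] = 0.0  # assumption: nothing to compare against
            continue
        result[r] = sum(pmi(r, o, class_recipes) for o in others) / len(others)
    return result
```

Within batch step 1.4 the commodity itself belongs to the class, so every pair of its ingredients co-occurs at least once and the negative-infinity branch is only reachable in the single-label case, where the new commodity may not be part of the historical class data.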
In a preferred embodiment, the K-Medoids clustering algorithm computes the clustering result by partitioning, as follows:
(1) randomly selecting K commodities as cluster centers, where K can be specified by the user;
(2) calculating the distance from each remaining commodity to each of the K cluster centers, and assigning each commodity to the nearest center to form commodity clusters;
(3) in each commodity cluster, calculating for each commodity the sum of its distances to the other commodities in the cluster, and selecting the commodity with the smallest sum as the new cluster center;
(4) repeating steps (2) and (3) until the center of each commodity cluster no longer changes; the final commodity clusters are the clustering result.
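The partitioning procedure in steps (1) to (4) maps onto a short Python function operating on a precomputed matrix of comprehensive distances. It is a sketch under assumptions: the matrix is symmetric with zeros on the diagonal, an iteration cap guards against non-termination, ties are broken by list order, and each center is always kept in its own cluster so that no cluster becomes empty. The function name `k_medoids` and its parameters are hypothetical.

```python
import random

def k_medoids(dist, k, max_iter=100, seed=None):
    """Cluster n items given an n x n distance matrix; returns {center: [members]}."""
    n = len(dist)
    rng = random.Random(seed)
    centers = rng.sample(range(n), k)  # step (1): random initial centers
    clusters = {}
    for _ in range(max_iter):
        # Step (2): assign every item to its nearest center (a center to itself).
        clusters = {c: [] for c in centers}
        for i in range(n):
            nearest = i if i in clusters else min(centers, key=lambda c: dist[i][c])
            clusters[nearest].append(i)
        # Step (3): the member with the smallest distance sum becomes the new center.
        new_centers = [
            min(members, key=lambda m: sum(dist[m][j] for j in members))
            for members in clusters.values()
        ]
        # Step (4): stop once no center changes.
        if set(new_centers) == set(centers):
            break
        centers = new_centers
    return clusters
```

Because only pairwise distances are needed, the procedure works directly with a non-Euclidean measure such as the comprehensive distance, which is the usual reason to prefer K-Medoids over K-Means. Like other alternating schemes, it can settle in a local optimum that depends on the random initial centers.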
Compared with the prior art, the technical scheme of the invention has the following beneficial effects:
(1) after the commodity label image is recognized, the method performs commodity clustering, commodity classification and mutual-information-based illegal additive identification; it requires neither advance registration of commodity information nor the coding of a large number of rules, and it can automatically identify illegal additives in commodity formulas and add them to its records without supervision, giving it a learning capability and effectively improving operating efficiency and accuracy;
(2) commodity names are extracted with an N-gram language model, so the method is compatible with multiple languages;
(3) one or more illegal additives in a commodity can be identified rapidly, and commodity quality risks can be screened automatically.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this application, illustrate embodiments of the application and, together with the description, serve to explain the application and are not intended to limit the application. In the drawings:
Fig. 1 is a flowchart of the commodity quality risk judgment method based on label identification according to the present invention.
Detailed Description
The present invention provides a commodity quality risk judgment method based on label identification, which is described in further detail below with reference to the accompanying drawings and examples in order to make the objects, technical solutions and effects of the present invention clearer. It should be understood that the embodiments described herein are only intended to illustrate the present invention and are not to be construed as limiting it.
This embodiment provides a commodity quality risk judgment method based on label identification which, as shown in Fig. 1, comprises the following steps:
Firstly, inputting a commodity label.
Secondly, determining whether the input commodity labels form a batch or a single label.
(1) When the input is a batch of commodity labels, the method comprises the following steps:
1.1 scanning the batch of commodity labels, extracting the commodity names with an N-gram language model, converting each commodity name into a set of contiguous N-character strings, and calculating the Jaccard distance between every two commodity names. The N-gram language models include a 1-gram language model and a 2-gram language model: the 1-gram language model extracts the names of commodities whose label information is in an Indo-European language, and the 2-gram language model extracts the names of commodities whose label information is in Chinese.
Meanwhile, each commodity formula is represented as a 0-1 vector, and the Cosine distance between the formulas of every two commodities is calculated. The 0-1 vector representation of a commodity formula is obtained as follows:
an ingredient information list is established to which ingredients can be appended at the tail; if an ingredient of the commodity is already in the current ingredient information list, it is replaced by its position in the list, and if not, it is appended to the tail of the list and then replaced by its position in the list;
the formula of each commodity is represented by an array whose length equals the length of the ingredient information list; the array takes 1 at the position of an ingredient in the list if the formula contains that ingredient, and 0 otherwise.
1.2 calculating the comprehensive distance between every two commodities, and clustering the commodities with the K-Medoids clustering algorithm based on the comprehensive distance to obtain commodity classes.
The comprehensive distance between two commodities is calculated as follows:
the name of commodity A is extracted with the N-gram language model to obtain set A, and the name of commodity B is extracted to obtain set B; a is the 0-1 vector representation of the formula of commodity A, and b is the 0-1 vector representation of the formula of commodity B. The comprehensive distance between commodity A and commodity B is denoted D(A, B):
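[Formula image not reproduced: definition of the comprehensive distance D(A, B) in terms of J(A, B) and C(A, B)]
where:
$$J(A,B) = 1 - \frac{|A \cap B|}{|A \cup B|}, \qquad C(A,B) = 1 - \frac{\sum_i a_i b_i}{\sqrt{\sum_i a_i^2}\,\sqrt{\sum_i b_i^2}}$$
$a_i$ is the i-th component of vector a and $b_i$ is the i-th component of vector b; J(A, B) denotes the Jaccard distance and C(A, B) the Cosine distance.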
1.3 establishing an illegal additive set for each commodity class, consisting of a confirmed illegal additive set and a suspected illegal additive set. The confirmed illegal additive set contains the illegal additives of the commodity class determined from historical risk information; the suspected illegal additive set contains newly appearing illegal additives, namely ingredients in the ingredient information list that occur between 0 and p×n times among the commodities of the class, where n is the number of commodities in the commodity class and p is a manually set constant with 0 < p < 1.
1.4 for each commodity in each commodity class, calculating the average mutual information between each of its ingredients and its other ingredients within that class. The average mutual information is denoted MI and is calculated as follows:
calculating the mutual information I(r, r') of every two ingredients:
$$I(r,r') = \log \frac{p(r,r')}{p(r)\,p(r')}$$
where p(r, r') is the proportion of all commodities that contain both ingredients r and r', p(r) is the proportion of all commodities that contain ingredient r, and p(r') is the proportion of all commodities that contain ingredient r';
calculating the average mutual information MI of each ingredient with the other ingredients; the average mutual information MI(r) of ingredient r with the other n ingredients $r'_1, r'_2, r'_3, \dots, r'_i, \dots, r'_n$ is:
$$MI(r) = \frac{1}{n}\sum_{i=1}^{n} I(r, r'_i)$$
1.5 selecting, for each commodity, the q ingredients with the lowest average mutual information and checking one by one whether they are contained in the suspected illegal additive set, where q is a manually set positive integer smaller than the total number of ingredients of the commodity;
if so, marking the contained ingredients as illegal additives of the commodity; in either case, proceeding to the next step.
1.6 checking whether any of the ingredients other than the q ingredients are contained in the confirmed illegal additive set;
if so, marking the contained ingredients as illegal additives of the commodity; in either case, proceeding to the next step.
1.7 judging whether the commodity has any marked illegal additives: if so, reporting the commodity and the corresponding illegal additives; if not, judging the commodity to be qualified; in either case, proceeding to the third step.
(2) When the input is a single commodity label, the method comprises the following steps:
2.1 scanning the single commodity label, extracting the commodity name with an N-gram language model, converting it into a set of contiguous N-character strings, and calculating the Jaccard distance between the commodity and at most x commodities of each commodity class in the historical data, where x is a manually set positive integer not greater than the number of commodities in that class.
Meanwhile, the commodity formula is represented as a 0-1 vector, and the Cosine distance between the commodity and at most x commodities of each commodity class in the historical data is calculated.
The commodity classes in the historical data are either those obtained by running step 1.2 on batch-input commodities that carry no class labels, or classes that already exist in the database.
2.2 calculating the comprehensive distance between the commodity and at most x commodities of each commodity class in the historical data; the comprehensive distance is calculated in the same way as in step 1.2.
2.3 selecting the y commodities with the smallest comprehensive distance to the commodity and counting the commodity classes they belong to; the class containing the largest number of them is taken as the commodity class of the commodity, where y is a manually set positive integer smaller than the total number of commodities that took part in the comprehensive distance calculation.
2.4 calculating the average mutual information between each ingredient of the commodity and its other ingredients within that commodity class; the average mutual information is calculated in the same way as in step 1.4.
2.5 selecting the q ingredients with the lowest average mutual information and checking whether they are contained in the suspected illegal additive set, where q is a manually set positive integer smaller than the total number of ingredients of the commodity;
if so, marking the contained ingredients as illegal additives of the commodity; in either case, proceeding to the next step.
2.6 checking whether any of the ingredients other than the q ingredients are contained in the confirmed illegal additive set;
if so, marking the contained ingredients as illegal additives of the commodity; in either case, proceeding to the next step.
2.7 judging whether the commodity has any marked illegal additives: if so, reporting the commodity and the corresponding illegal additives; if not, judging the commodity to be qualified; in either case, proceeding to the third step.
Thirdly, the user audits, modifies, confirms and stores the data.
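Step 2.3 is a nearest-neighbour majority vote. The Python sketch below assumes the comprehensive distances of step 2.2 have already been computed and grouped by commodity class; how the at most x commodities per class are chosen is not specified, so that choice is left to the caller. The function name `assign_class` is hypothetical.

```python
from collections import Counter

def assign_class(distances_by_class, y):
    """Return the class holding the most of the y nearest historical commodities.

    distances_by_class maps each class label to the comprehensive distances
    between the new commodity and (at most x) commodities of that class.
    """
    pooled = [(d, label) for label, ds in distances_by_class.items() for d in ds]
    pooled.sort(key=lambda t: t[0])  # nearest first; sort on distance only
    votes = Counter(label for _, label in pooled[:y])
    # most_common orders equal counts by first encounter, so a tie goes to
    # the class whose commodity appears earliest among the y nearest.
    return votes.most_common(1)[0][0]
```

For instance, with distances {"dairy": [0.1, 0.2, 0.9], "snack": [0.15, 0.8]} and y = 3, two of the three nearest commodities are dairy, so the new commodity is assigned to "dairy".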
The embodiments of the present invention have been described in detail, but the embodiments are merely examples, and the present invention is not limited to the embodiments described above. Any equivalent modifications and substitutions to those skilled in the art are also within the scope of the present invention. Accordingly, equivalent changes and modifications made without departing from the spirit and scope of the present invention should be covered by the present invention.

Claims (5)

1. A commodity quality risk judgment method based on label identification, characterized by comprising the following steps:
firstly, inputting a commodity label;
secondly, determining whether the input commodity labels form a batch or a single label;
(1) when the input is a batch of commodity labels, the method comprises the following steps:
1.1 scanning the batch of commodity labels, extracting the commodity names with an N-gram language model, converting each commodity name into a set of contiguous N-character strings, and calculating the Jaccard distance between every two commodity names;
meanwhile, representing each commodity formula as a 0-1 vector, and calculating the Cosine distance between the formulas of every two commodities;
1.2 calculating the comprehensive distance between every two commodities;
clustering the commodities with the K-Medoids clustering algorithm based on the comprehensive distance to obtain commodity classes;
1.3 establishing an illegal additive set for each commodity class, consisting of a confirmed illegal additive set and a suspected illegal additive set, where the confirmed illegal additive set contains the illegal additives of the commodity class determined from historical risk information, and the suspected illegal additive set contains newly appearing illegal additives, namely ingredients in the ingredient information list that occur between 0 and p×n times among the commodities of the class, n being the number of commodities in the commodity class and p a manually set constant with 0 < p < 1;
1.4 for each commodity in each commodity class, calculating the average mutual information between each of its ingredients and its other ingredients within that class;
1.5 selecting, for each commodity, the q ingredients with the lowest average mutual information and checking one by one whether they are contained in the suspected illegal additive set, where q is a manually set positive integer smaller than the total number of ingredients of the commodity;
if so, marking the contained ingredients as illegal additives of the commodity; in either case, proceeding to the next step;
1.6 checking whether any of the ingredients other than the q ingredients are contained in the confirmed illegal additive set;
if so, marking the contained ingredients as illegal additives of the commodity; in either case, proceeding to the next step;
1.7 judging whether the commodity has any marked illegal additives: if so, reporting the commodity and the corresponding illegal additives; if not, judging the commodity to be qualified; in either case, proceeding to the third step;
(2) when the label is a single commodity label, the method comprises the following steps:
2.1 scanning a single commodity label, extracting the commodity name by adopting an N-gram language model, converting the commodity name into a set consisting of continuous N characters, and calculating the Jaccard distance between the commodity and at most x commodities in each commodity class in historical data; wherein x is a manually set positive integer and is less than or equal to the number of all commodities in the commodity class;
meanwhile, carrying out 0-1 vectorization representation on the commodity formula, and calculating the Cosine distance between the commodity and at most x commodities in each commodity class in the historical data;
2.2 calculating the comprehensive distance between the commodity and at most x commodities in each commodity class in the historical data;
2.3, selecting y commodities with the minimum comprehensive distance from the commodities, and counting the commodity classes to which the commodities belong, wherein the commodity class with the maximum number of commodities is the commodity class of the commodities; wherein y is a manually set positive integer and is smaller than the total number of commodities participating in calculating the comprehensive distance;
2.4 calculating the average mutual information of each ingredient and other ingredients of the commodity inside the commodity class;
2.5 selecting q ingredients with minimum average mutual information, and detecting whether the q ingredients are contained in a suspected illegal additive set; q is an artificially set positive integer and is less than the total number of ingredients of the commodity;
if yes, marking the contained ingredients as illegal additives of the commodity, and entering the next step; if not, entering the next step;
2.6 detecting whether any of the ingredients other than the q ingredients are contained in the identified illegal additive set;
if yes, marking the contained ingredients as illegal additives of the commodity and entering the next step; if not, entering the next step;
2.7 judging whether any illegal additives have been marked for the commodity: if yes, reporting the commodity and the corresponding illegal additives and entering the third step; if not, judging the commodity to be qualified and entering the third step;
and the third step: the user audits, modifies, confirms and stores the data.
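Once the comprehensive distances (step 2.2) and the average mutual information values (step 2.4) are available, steps 2.3 and 2.5 to 2.7 of branch (2) reduce to a majority vote and two set lookups. The Python below is a minimal sketch under stated assumptions, not the patentee's implementation: the function names `assign_class` and `screen_ingredients`, the input shapes, the tie-breaking rule and the ingredient names `additive_x`/`additive_y` are all hypothetical, and distances and MI values are taken as precomputed inputs.

```python
from collections import Counter


def assign_class(distances, y):
    """Step 2.3: majority vote over the y nearest historical commodities.

    distances: list of (comprehensive_distance, class_label) pairs, one for
    each historical commodity compared in step 2.2.
    """
    # sorted() is stable, so equal distances keep their input order.
    nearest = sorted(distances, key=lambda pair: pair[0])[:y]
    votes = Counter(label for _, label in nearest)
    # most_common() orders equal counts by first occurrence, so a tie goes to
    # the class holding the nearest commodity (a tie rule the claim leaves open).
    return votes.most_common(1)[0][0]


def screen_ingredients(avg_mi, q, suspected, identified):
    """Steps 2.5 to 2.7: flag ingredients against the two additive sets.

    avg_mi: dict mapping each ingredient of the commodity to its average
    mutual information inside the assigned class (step 2.4).
    """
    ranked = sorted(avg_mi, key=avg_mi.get)  # ascending average MI
    lowest_q, rest = ranked[:q], ranked[q:]
    # Step 2.5: the q least-associated ingredients vs. the suspected set.
    flagged = [r for r in lowest_q if r in suspected]
    # Step 2.6: the remaining ingredients vs. the identified set.
    flagged += [r for r in rest if r in identified]
    # Step 2.7: no flagged ingredient means the commodity is judged qualified.
    return {"qualified": not flagged, "illegal_additives": flagged}


history = [(0.1, "juice"), (0.4, "candy"), (0.2, "juice"), (0.9, "candy")]
print(assign_class(history, y=3))  # juice

mi = {"water": 0.8, "sugar": 0.7, "additive_x": 0.05, "additive_y": 0.6}
print(screen_ingredients(mi, q=1, suspected={"additive_x"}, identified={"additive_y"}))
```

In the usage example, `additive_x` is caught by step 2.5 because it has the lowest average MI and is in the suspected set, while `additive_y` is caught by step 2.6 only because it is already in the identified set.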
2. The commodity quality risk judgment method according to claim 1, wherein the N-gram language model includes a 1-gram language model and a 2-gram language model: the 1-gram language model extracts the names of commodities whose label information is in an Indo-European language, and the 2-gram language model extracts the names of commodities whose label information is in Chinese.
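The language-dependent name extraction can be sketched as follows. This is a hypothetical helper, not the patent's code: it assumes that a name containing any CJK ideograph is Chinese and splits it into overlapping 2-character grams, and, as a further assumption, treats lower-cased whitespace-separated words as the 1-gram units for Indo-European names (claim 1 speaks only of "consecutive N-character units").

```python
import re

# Hypothetical language test: any CJK unified ideograph marks a Chinese name.
CJK = re.compile(r"[\u4e00-\u9fff]")


def name_ngrams(name):
    """Turn a commodity name into a set of n-grams (hypothetical helper).

    Chinese names: overlapping 2-grams of characters (the 2-gram model).
    Other names: assumed Indo-European and split into lower-cased words,
    treated here as the 1-gram units.
    """
    if CJK.search(name):
        chars = [ch for ch in name if not ch.isspace()]
        if len(chars) < 2:
            return set(chars)
        return {chars[i] + chars[i + 1] for i in range(len(chars) - 1)}
    return set(name.lower().split())


print(sorted(name_ngrams("橙汁饮料")))            # ['橙汁', '汁饮', '饮料']
print(sorted(name_ngrams("Orange Juice Drink")))  # ['drink', 'juice', 'orange']
```

Character bigrams suit Chinese because names are written without spaces, whereas Indo-European names already carry word boundaries.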
3. The commodity quality risk judgment method according to claim 1, wherein in steps 1.1 and 2.1 the 0-1 vectorized representation of the commodity formula is obtained by the following steps:
establishing an ingredient information list to which ingredients can be appended at the end: if an ingredient of the commodity is already in the current ingredient information list, the ingredient is replaced by its position in the list; if it is not, the ingredient is appended to the end of the list and then replaced by its position in the list;
the formula of each commodity is represented by an array whose length equals the length of the ingredient information list; the array takes 1 at the position of each ingredient contained in the formula and 0 at every other position.
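The append-only ingredient list and the resulting 0-1 arrays can be sketched in Python. The function name `vectorize_formulas` and the choice to build all vectors in one batch are assumptions made for illustration.

```python
def vectorize_formulas(formulas):
    """0-1 vectorization of commodity formulas (hypothetical helper).

    formulas: list of ingredient lists, one per commodity.
    Returns (ingredient_list, vectors).
    """
    ingredient_list = []  # append-only; an ingredient's position is its code
    index = {}            # ingredient -> its position in ingredient_list
    coded = []
    for formula in formulas:
        codes = []
        for ingredient in formula:
            if ingredient not in index:
                # New ingredient: append it to the end of the list.
                index[ingredient] = len(ingredient_list)
                ingredient_list.append(ingredient)
            codes.append(index[ingredient])
        coded.append(codes)
    # Every array has the final length of the ingredient list.
    vectors = []
    for codes in coded:
        vec = [0] * len(ingredient_list)
        for i in codes:
            vec[i] = 1
        vectors.append(vec)
    return ingredient_list, vectors


print(vectorize_formulas([["water", "sugar"], ["water", "salt"]]))
# (['water', 'sugar', 'salt'], [[1, 1, 0], [1, 0, 1]])
```

Because ingredients are only ever appended, an existing position never changes; arrays built earlier only need zeros padded on the right when new ingredients extend the list.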
4. The commodity quality risk judgment method according to claim 1, wherein the method for calculating the comprehensive distance between two commodities in step 1 includes the following steps:
extracting the name of commodity A with the N-gram language model to obtain set A, and the name of commodity B to obtain set B; letting a be the 0-1 vector representation of the formula of commodity A and b the 0-1 vector representation of the formula of commodity B; the comprehensive distance between commodity A and commodity B, denoted D(A, B), is:
[formula image FDA0002659799380000021]
where
[formula image FDA0002659799380000022]
a_i represents the i-th component of vector a, b_i represents the i-th component of vector b, J(A, B) represents the Jaccard distance, and C(A, B) represents the Cosine distance.
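The formula images for D(A, B), J(A, B) and C(A, B) are not reproduced in this text, so the sketch below uses the textbook Jaccard distance on name sets and the textbook cosine distance on formula vectors, and then assumes, purely for illustration, that D is a weighted mean of the two with a hypothetical weight `w`. The patent's actual combination may differ.

```python
import math


def jaccard_distance(A, B):
    """Jaccard distance between two n-gram sets: 1 - len(A & B) / len(A | B)."""
    union = A | B
    if not union:
        return 0.0  # assumption: two empty names count as identical
    return 1.0 - len(A & B) / len(union)


def cosine_distance(a, b):
    """Cosine distance between two equal-length 0-1 formula vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must share the ingredient list length")
    dot = sum(ai * bi for ai, bi in zip(a, b))
    norm = math.sqrt(sum(ai * ai for ai in a)) * math.sqrt(sum(bi * bi for bi in b))
    if norm == 0:
        return 1.0  # assumption: an empty formula is maximally distant
    return 1.0 - dot / norm


def comprehensive_distance(A, B, a, b, w=0.5):
    """Hypothetical D(A, B): weighted mean of J(A, B) and C(A, B) with weight w."""
    return w * jaccard_distance(A, B) + (1 - w) * cosine_distance(a, b)


print(comprehensive_distance({"orange", "juice"}, {"orange", "soda"}, [1, 1, 0], [1, 0, 1]))
```

Both component distances lie in [0, 1] for 0-1 data, so any convex combination of them also stays in [0, 1], which keeps the y-nearest vote in step 2.3 on a common scale.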
5. The commodity quality risk judgment method according to claim 1, wherein the average mutual information is denoted MI and is calculated as follows:
(1) calculating the mutual information I(r, r') of each pair of ingredients:
[formula image FDA0002659799380000031]
where p(r, r') is the proportion of all commodities that contain both ingredients r and r', p(r) is the proportion of all commodities that contain ingredient r, and p(r') is the proportion of all commodities that contain ingredient r';
(2) calculating the average mutual information MI between each ingredient and the other ingredients; the average mutual information MI(r) between ingredient r and the other n ingredients r'_1, r'_2, r'_3, ..., r'_i, ..., r'_n is:
$$MI(r) = \frac{1}{n}\sum_{i=1}^{n} I(r, r'_i)$$
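A sketch of the MI computation follows. Because the image for I(r, r') is not reproduced, the code assumes the weighted pointwise form I(r, r') = p(r, r') log(p(r, r') / (p(r) p(r'))) with the natural logarithm, taken as 0 when p(r, r') = 0; the function name and the choice of the commodity's own other ingredients as r'_1 ... r'_n (following step 2.4) are also assumptions.

```python
import math


def average_mutual_information(commodity, class_formulas):
    """MI(r) for each ingredient r of one commodity (hypothetical helper).

    commodity: ingredients of the commodity being checked.
    class_formulas: non-empty list of ingredient sets, one per commodity in
    its class; the proportions p(.) are computed over this list.
    Assumed form: I(r, r') = p(r, r') * log(p(r, r') / (p(r) * p(r'))),
    and 0 if p(r, r') = 0.
    """
    total = len(class_formulas)

    def p(*items):
        # Proportion of commodities whose formula contains all given ingredients.
        return sum(all(x in f for x in items) for f in class_formulas) / total

    def mutual_info(r, s):
        p_rs = p(r, s)
        if p_rs == 0:
            return 0.0  # also avoids log(0) and division by zero
        return p_rs * math.log(p_rs / (p(r) * p(s)))

    ingredients = list(dict.fromkeys(commodity))  # drop duplicates, keep order
    result = {}
    for r in ingredients:
        others = [s for s in ingredients if s != r]
        # MI(r) = (1/n) * sum of I(r, r'_i) over the n other ingredients
        result[r] = sum(mutual_info(r, s) for s in others) / len(others) if others else 0.0
    return result


formulas = [{"a", "b"}, {"a", "b"}, {"c"}, {"c"}]
mi = average_mutual_information(["a", "b", "c"], formulas)
print(min(mi, key=mi.get))  # c: never co-occurs with a or b in this class
```

An ingredient that rarely appears alongside the commodity's other ingredients within its class ends up with a low MI, which is why step 2.5 inspects the q ingredients with the smallest values.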

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910145851.8A CN109784776B (en) 2019-02-27 2019-02-27 Commodity quality risk judgment method based on label identification

Publications (2)

Publication Number Publication Date
CN109784776A 2019-05-21
CN109784776B 2020-11-06

Family

ID=66486478

Country Status (1)

Country Link
CN (1) CN109784776B (en)

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108267443A (en) * 2018-01-22 2018-07-10 吴君国 System and detection method for on-site rapid detection of harmful additives in food

Family Cites Families (4)

Publication number Priority date Publication date Assignee Title
CN202093541U (en) * 2011-06-08 2011-12-28 山西中特物联科技股份有限公司 Food safety tracking smart label
CN107230137A (en) * 2017-05-31 2017-10-03 北京小米移动软件有限公司 Commodity information acquisition method and device
CN107977845B (en) * 2017-12-21 2021-10-22 华中农业大学 Food traceability system and method based on label information
CN108595418A (en) * 2018-04-03 2018-09-28 上海透云物联网科技有限公司 Commodity classification method and system

Similar Documents

Publication Publication Date Title
CN107515873B (en) Junk information identification method and equipment
CN109284372B (en) User operation behavior analysis method, electronic device and computer readable storage medium
CN107145516B (en) Text clustering method and system
CN105095755A (en) File recognition method and apparatus
CN112053061A (en) Method and device for identifying surrounding label behaviors, electronic equipment and storage medium
CN104731958A (en) User-demand-oriented cloud manufacturing service recommendation method
CN107038591A (en) A kind of aquatic products electronics traceability system
CN113626607B (en) Abnormal work order identification method and device, electronic equipment and readable storage medium
CN108241867B (en) Classification method and device
CN116402399B (en) Business data processing method and system based on artificial intelligence and electronic mall
CN111680506A (en) External key mapping method and device of database table, electronic equipment and storage medium
CN113268615A (en) Resource label generation method and device, electronic equipment and storage medium
CN114386856A (en) Method, device and equipment for identifying empty-shell enterprise and computer storage medium
CN110619535A (en) Data processing method and device
CN109784776B (en) Commodity quality risk judgment method based on label identification
CN105447076A (en) Web page tag based security monitoring method and system
CN111222923B (en) Method and device for judging potential clients, electronic equipment and storage medium
CN110941713B (en) Self-optimizing financial information block classification method based on topic model
CN115578155A (en) Order searching method and device, computer equipment and storage medium
CN114357178A (en) Commodity label information processing method, commodity label information processing device, storage medium and commodity label information processing system
KR102110350B1 (en) Domain classifying device and method for non-standardized databases
CN113822715A (en) Data acquisition, training and processing integrated platform analysis method
CN114341756A (en) Server and system for automatically selecting tags for modeling and anomaly detection
CN116029299B (en) Named entity recognition method, system and storage medium based on polysemous words
CN115187387B (en) Identification method and equipment for risk merchant

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant