CN107247970A - Method and device for mining association rules of commodity qualification rates

Publication number
CN107247970A
Authority
CN
China
Prior art keywords
data
characteristic
decision tree
qualification rate
association rules
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201710487560.8A
Other languages
Chinese (zh)
Inventor
王连印
凌建华
魏旭晖
黄景涛
黄晖
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
SHANGHAI TENLY SOFTWARE Inc
Information Center Of State Administration Of Quality Supervision Inspection And Quarantine
Original Assignee
SHANGHAI TENLY SOFTWARE Inc
Information Center Of State Administration Of Quality Supervision Inspection And Quarantine
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by SHANGHAI TENLY SOFTWARE Inc and Information Center Of State Administration Of Quality Supervision Inspection And Quarantine
Priority to CN201710487560.8A
Publication of CN107247970A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/243 Classification techniques relating to the number of classes
    • G06F18/24323 Tree-organised classifiers
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2458 Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F16/2465 Query processing support for facilitating data mining operations in structured databases
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2216/00 Indexing scheme relating to additional aspects of information retrieval not explicitly covered by G06F16/00 and subgroups
    • G06F2216/03 Data mining

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Computation (AREA)
  • Evolutionary Biology (AREA)
  • Databases & Information Systems (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Fuzzy Systems (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Software Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present invention provides a method and device for mining association rules of commodity qualification rates. The device comprises: a storage module for obtaining and storing an original training data set; a first mining module for performing feature classification on the training data set using a decision tree algorithm and extracting a classification feature variable importance data set; a second mining module for setting a feature parameter importance threshold and crossing it with the obtained feature variable importance data set and the parameter-tuning data, so as to exclude interference items from the multidimensional association rule training data set and screen out a pure feature variable parameter set; and a third mining module for applying multidimensional association rules to the pure feature variable parameter set to obtain a commodity qualification rate rule model. The advantages of the invention are: the input variables of the association rule model are optimized; the normalized information gain values of the generated decision tree are used, which avoids the computational performance problems that decision trees face with continuous variables and sequence-type data; and the generalization and pruning of the generated decision tree do not need to be optimized.

Description

Method and device for mining association rules of commodity qualification rates
Technical field
The present invention relates to a method and device for mining association rules of commodity qualification rates.
Background
Inspection and quarantine business statistics collect and tabulate the data produced by routine inspection and quarantine operations. They reflect the overall operating condition of the inspection and quarantine business over a given period and support analysis of each line of business from different perspectives, covering the data produced by enterprise inspection declaration, centralized document review, on-site inspection, laboratory testing and so on.
Routine inspection and quarantine generally relies on sampling, and comprehensive inspection of every item is almost impossible. Because not every batch of a given commodity is inspected, mining the quality patterns of import and export commodities to determine the key inspection items, the tests to perform and the risk level has become an important way for big data to help quality inspection departments solve this difficult problem.
At present the industry commonly uses multidimensional association rules to discover such patterns through big data analysis, but multidimensional association rules have the following shortcomings:
The database tables are very large and the method cannot screen its input data, so too much invalid or unrelated variable information is produced and the resulting model easily over-generalizes; moreover, when the support is low and a large number of hash functions are added, the efficiency of mining multidimensional association rules becomes very low.
Summary of the invention
To address the technical problems that the multidimensional association rules used in big data analysis of inspected and quarantined commodities involve huge data volumes, cannot screen their input and are inefficient, the present invention provides a method and device that use a decision tree model algorithm to optimize multidimensional association rules, as follows:
A method for mining association rules of commodity qualification rates, the method comprising the following steps:
A. Obtain an original training data set;
B. Perform feature classification on the training data set using a decision tree algorithm, and extract a classification feature variable importance data set;
C. Set a feature parameter importance threshold and cross it with the feature variable importance data set obtained in step B and the parameter-tuning data, so as to exclude interference items from the multidimensional association rule training data set and screen out a pure feature variable parameter set;
D. Apply multidimensional association rules to the pure feature variable parameter set obtained in step C to obtain a commodity qualification rate rule model.
On the basis of the above technical solution, further, the decision tree algorithm used in step B to perform feature classification on the training data set is the C4.5 decision tree algorithm.
Further, a device for mining association rules of commodity qualification rates is characterized by comprising:
a storage module for obtaining and storing an original training data set;
a first mining module for performing feature classification on the training data set using a decision tree algorithm and extracting a classification feature variable importance data set;
a second mining module for setting a feature parameter importance threshold and crossing it with the obtained feature variable importance data set and the parameter-tuning data, so as to exclude interference items from the multidimensional association rule training data set and screen out a pure feature variable parameter set;
a third mining module for applying multidimensional association rules to the pure feature variable parameter set to obtain a commodity qualification rate rule model.
The advantages of the invention are: the input variables of the association rule model are optimized; the normalized information gain values of the generated decision tree are used, which avoids the computational performance problems that decision trees face with continuous variables and sequence-type data; and the generalization and pruning of the generated decision tree do not need to be optimized.
Brief description of the drawings
Fig. 1 is a flow chart of the method for mining association rules of commodity qualification rates according to the present invention;
Fig. 2 is a structural diagram of the device for mining association rules of commodity qualification rates according to the present invention.
Detailed description
Embodiments of the invention are described in detail below. Examples of the embodiments are shown in the drawings, in which the same or similar reference numerals throughout denote the same or similar elements or elements with the same or similar functions. The embodiments described below with reference to the drawings are exemplary; they are intended to explain the present invention and are not to be construed as limiting it.
As shown in Fig. 1, a method for mining association rules of commodity qualification rates comprises the following steps:
A. Obtain an original training data set;
B. Perform feature classification on the training data set using a decision tree algorithm, and extract a classification feature variable importance data set;
C. Set a feature parameter importance threshold and cross it with the feature variable importance data set obtained in step B and the parameter-tuning data, so as to exclude interference items from the multidimensional association rule training data set and screen out a pure feature variable parameter set;
D. Apply multidimensional association rules to the pure feature variable parameter set obtained in step C to obtain a commodity qualification rate rule model.
Step B is specifically as follows:
B1: For the training set obtained in step A, determine whether it is a multi-node or a single-node data set; if it is a single-node data set, go directly to step D to build the model;
B2: Let S be a set of n data samples, divided into c different classes $C_i$ ($i = 1, 2, \dots, c$), where class $C_i$ contains $n_i$ samples. The information entropy (expected information) of dividing S into c classes is
$$E(S) = -\sum_{i=1}^{c} p_i \log_2 p_i$$
where $p_i$ is the probability that a sample in S belongs to the i-th class $C_i$, i.e. $p_i = n_i / n$.
Let $V(A)$ be the set of all distinct values of attribute A, and let $S_v$ be the subset of samples in S whose value of attribute A is v, i.e. $S_v = \{ s \in S \mid A(s) = v \}$. After attribute A is selected, $E(S_v)$ is the entropy of classifying the sample set $S_v$ at the corresponding branch node. The expected entropy caused by selecting A is defined as the weighted sum of the entropies of the subsets $S_v$, where each weight is the proportion $|S_v| / |S|$ of the original sample set S that belongs to $S_v$, i.e. the expected entropy is
$$E(S, A) = \sum_{v \in V(A)} \frac{|S_v|}{|S|} E(S_v)$$
where $E(S_v)$ is the information entropy of dividing the samples in $S_v$ into c classes. The information gain of attribute A relative to the sample set S is defined as
$$\mathrm{Gain}(S, A) = E(S) - E(S, A)$$
The information gain $\mathrm{Gain}(S, A)$ is the expected reduction in entropy obtained by knowing the value of attribute A. The larger $\mathrm{Gain}(S, A)$ is, the more information the test attribute A provides for classification.
When information gain is used to choose the feature on which to split the training data set, it is biased toward features with many distinct values. The information gain ratio corrects this bias and is another criterion for feature selection. It is defined as
$$\mathrm{GainRatio}(S, A) = \frac{\mathrm{Gain}(S, A)}{\mathrm{SplitInfo}(S, A)}, \qquad \mathrm{SplitInfo}(S, A) = -\sum_{v \in V(A)} \frac{|S_v|}{|S|} \log_2 \frac{|S_v|}{|S|}$$
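The entropy, information gain and information gain ratio described in step B2 can be sketched in a few lines of Python. This is a minimal illustration for discrete attributes only, not the patent's implementation; the function names and the toy data (`labels`, `origin`, `batch_id`) are hypothetical.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Information entropy of a list of discrete labels."""
    n = len(labels)
    return -sum((k / n) * log2(k / n) for k in Counter(labels).values())

def gain_ratio(values, labels):
    """Return (information gain, gain ratio) of one discrete attribute."""
    n = len(labels)
    # Group the class labels by attribute value v, giving the subsets S_v.
    subsets = {}
    for v, y in zip(values, labels):
        subsets.setdefault(v, []).append(y)
    # Expected entropy: weighted sum of the subset entropies.
    expected = sum(len(s) / n * entropy(s) for s in subsets.values())
    gain = entropy(labels) - expected
    # Split information: entropy of the attribute's own value distribution.
    split_info = entropy(values)
    ratio = gain / split_info if split_info > 0 else 0.0
    return gain, ratio

# Hypothetical toy data: one inspection result per batch and two attributes.
labels = ["pass", "pass", "fail", "fail"]
origin = ["X", "X", "Y", "Y"]          # two values, separates the classes
batch_id = ["b1", "b2", "b3", "b4"]    # unique per sample, many values

gain_origin, ratio_origin = gain_ratio(origin, labels)
gain_batch, ratio_batch = gain_ratio(batch_id, labels)
```

Both attributes reach the same information gain on this toy data, but the many-valued `batch_id` has a larger split information and therefore a lower gain ratio, which is the bias correction the text describes.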
B3: Select the feature with the currently largest information gain ratio to build the current node, and record this feature classification parameter;
B4: Build the decision tree from the corresponding nodes, traverse the data set, and obtain all information gain ratios;
B5: Normalize the information gain ratios and save the output as the classification feature variable importance data set.
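The text does not say which normalization step B5 uses or how the importance threshold of step C is chosen. The sketch below assumes the simplest reading: gain ratios are divided by their sum so that the importances add up to 1, and features below a user-chosen threshold are dropped as interference items. The function names, attribute names and the threshold value are all hypothetical.

```python
def normalize_importance(gain_ratios):
    """Scale gain ratios so they sum to 1 (an assumed normalization)."""
    total = sum(gain_ratios.values())
    if total == 0:
        return {name: 0.0 for name in gain_ratios}
    return {name: g / total for name, g in gain_ratios.items()}

def select_features(importance, threshold):
    """Keep the features whose normalized importance reaches the threshold."""
    return sorted(name for name, w in importance.items() if w >= threshold)

# Hypothetical gain ratios for three attributes of an inspection record.
gain_ratios = {"origin": 0.6, "hs_code": 0.3, "batch_id": 0.1}
importance = normalize_importance(gain_ratios)

# Hypothetical threshold; in the method it is a tuning parameter.
selected = select_features(importance, threshold=0.2)
```

Only the selected attributes would then be passed on as input to the association rule mining in step D.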
Step C is as follows:
C1: Input the feature variable importance data set DB obtained in step B and the minimum support for the multidimensional association rules;
C2: First scan the data set to find all frequent itemsets, that is, itemsets that occur at least as often as the predefined minimum support. Then generate strong association rules from the frequent itemsets; these rules must satisfy both the minimum support and the minimum confidence. The desired rules are produced from the frequent itemsets found in the first step: all rules containing only items of a frequent itemset are generated, and the right-hand side of each rule contains exactly one item.
The definitions are as follows. An association rule is an implication of the form A → B, where A and B are logical formulas in conjunctive normal form and A ∩ B = ∅. Its main parameters are support and confidence.
(1) Support S
The percentage of transactions in transaction set D that contain both A and B is called the support S of the rule A → B.
Support is computed as:
S(A → B) = (number of transactions containing both A and B) / (total number of transactions) × 100%
(2) Confidence C
The ratio, within transaction set D, of the number of transactions containing both A and B to the number of transactions containing A is called the confidence C of the rule A → B.
Confidence is computed as:
C(A → B) = (number of transactions containing both A and B) / (number of transactions containing A) × 100%
A rule that satisfies both the minimum support and the minimum confidence is called a strong association rule; these are the rules that association rule mining seeks to find.
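The two measures can be computed directly from a list of transactions. The following sketch represents each transaction as a frozenset of items and returns fractions rather than percentages; the item names and the toy transactions are hypothetical.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in itemset."""
    itemset = frozenset(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(transactions, antecedent, consequent):
    """Transactions with A and B divided by transactions with A, for A -> B."""
    a = frozenset(antecedent)
    both = a | frozenset(consequent)
    n_a = sum(a <= t for t in transactions)
    n_both = sum(both <= t for t in transactions)
    return n_both / n_a if n_a else 0.0

# Hypothetical inspection records, one frozenset of items per batch.
transactions = [
    frozenset({"origin=X", "category=toys", "result=fail"}),
    frozenset({"origin=X", "category=toys", "result=fail"}),
    frozenset({"origin=X", "category=food", "result=pass"}),
    frozenset({"origin=Y", "category=toys", "result=pass"}),
]

# Rule: origin=X -> result=fail
s = support(transactions, {"origin=X", "result=fail"})       # 2 of 4 transactions
c = confidence(transactions, {"origin=X"}, {"result=fail"})  # 2 of the 3 with origin=X
```

With a minimum support of 0.5 and a minimum confidence of 0.6, for example, this rule would count as a strong association rule.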
C3: Use the downward closure property: if an itemset is frequent, then every non-empty subset of it must also be frequent. Proceeding in this way, generate all frequent itemsets, and then find the qualifying association rules among the frequent itemsets.
C4: Generate the k-frequent itemsets through two steps, join and prune. For example:
1. Join L(k-1) with itself, where L(k-1) is the set of (k-1)-frequent itemsets. Merge the itemsets that differ only in their last element. For example, from the 2-frequent itemsets
{1,2}, {1,3}, {1,4}, {2,3}, {2,4}
generate the candidate 3-itemsets:
Because {1,2}, {1,3} and {1,4} are identical except for their last element, the union of {1,2} and {1,3} gives {1,2,3}, the union of {1,2} and {1,4} gives {1,2,4}, and the union of {1,3} and {1,4} gives {1,3,4}. However, the subset {3,4} of {1,3,4} is not among the 2-frequent itemsets, so {1,3,4} must be pruned.
2. For each merged set, if its support does not meet the requirement, delete the merged set.
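The join and prune steps of C4 correspond to the candidate generation of the classic Apriori algorithm. The sketch below is one common way to write it and reproduces the worked example; the function name `apriori_gen` is a conventional label rather than something defined in the text, and the support check of item 2 is left out.

```python
from itertools import combinations

def apriori_gen(frequent_prev):
    """Join and prune: build candidate k-itemsets from frequent (k-1)-itemsets."""
    prev = sorted(tuple(sorted(s)) for s in frequent_prev)
    prev_set = set(prev)
    candidates = []
    for a, b in combinations(prev, 2):
        # Join step: merge two itemsets that differ only in their last element.
        if a[:-1] == b[:-1]:
            cand = a + (b[-1],)
            # Prune step: every (k-1)-subset of the candidate must be frequent.
            if all(sub in prev_set for sub in combinations(cand, len(cand) - 1)):
                candidates.append(cand)
    return candidates

# The 2-frequent itemsets from the worked example.
frequent_2 = [{1, 2}, {1, 3}, {1, 4}, {2, 3}, {2, 4}]
candidates_3 = apriori_gen(frequent_2)
```

Here {1,3,4} is pruned because {3,4} is not frequent. The join of {2,3} and {2,4} also yields {2,3,4}, which is pruned for the same reason, so only {1,2,3} and {1,2,4} remain to have their support counted.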
C5: For all frequent itemsets that satisfy the minimum support, obtain the strong association rules according to the minimum confidence.
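Step C5, together with the restriction in C2 that each rule has a single item on its right-hand side, can be sketched as follows. The function name, the toy transactions and the confidence threshold are hypothetical, and the frequent itemsets are assumed to have been found already.

```python
def strong_rules(transactions, frequent_itemsets, min_confidence):
    """Return (antecedent, consequent, confidence) with a one-item right side."""
    def count(items):
        return sum(items <= t for t in transactions)

    rules = []
    for itemset in map(frozenset, frequent_itemsets):
        if len(itemset) < 2:
            continue  # a rule needs at least one item on each side
        for item in sorted(itemset):
            antecedent = itemset - {item}
            n_a = count(antecedent)
            if n_a == 0:
                continue  # antecedent never occurs, confidence undefined
            conf = count(itemset) / n_a
            if conf >= min_confidence:
                rules.append((tuple(sorted(antecedent)), item, conf))
    return rules

# Hypothetical inspection records, one frozenset of items per batch.
transactions = [
    frozenset({"origin=X", "category=toys", "result=fail"}),
    frozenset({"origin=X", "category=toys", "result=fail"}),
    frozenset({"origin=X", "category=food", "result=pass"}),
    frozenset({"origin=Y", "category=toys", "result=pass"}),
]
frequent = [{"origin=X", "result=fail"}]
rules = strong_rules(transactions, frequent, min_confidence=0.7)
```

On this toy data the rule result=fail → origin=X has confidence 1.0 and is kept, while origin=X → result=fail has confidence 2/3 and falls below the assumed threshold of 0.7.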
As shown in Fig. 2, a device for mining association rules of commodity qualification rates is characterized by comprising:
a storage module 10 for obtaining and storing an original training data set;
a first mining module 11 for performing feature classification on the training data set using a decision tree algorithm and extracting a classification feature variable importance data set;
a second mining module 12 for setting a feature parameter importance threshold and crossing it with the obtained feature variable importance data set and the parameter-tuning data, so as to exclude interference items from the multidimensional association rule training data set and screen out a pure feature variable parameter set;
a third mining module 13 for applying multidimensional association rules to the pure feature variable parameter set to obtain a commodity qualification rate rule model.
Although embodiments of the invention have been shown and described above, it should be understood that the above embodiments are exemplary and are not to be construed as limiting the invention. A person of ordinary skill in the art may change, modify, replace and vary the above embodiments within the scope of the invention without departing from its principle and purpose. The scope of the invention is defined by the appended claims and their equivalents.

Claims (3)

1. A method for mining association rules of commodity qualification rates, characterized in that the method comprises the following steps:
A. Obtain an original training data set;
B. Perform feature classification on the training data set using a decision tree algorithm, and extract a classification feature variable importance data set;
C. Set a feature parameter importance threshold and cross it with the feature variable importance data set obtained in step B and the parameter-tuning data, so as to exclude interference items from the multidimensional association rule training data set and screen out a pure feature variable parameter set;
D. Apply multidimensional association rules to the pure feature variable parameter set obtained in step C to obtain a commodity qualification rate rule model.
2. The method for mining association rules of commodity qualification rates according to claim 1, characterized in that the decision tree algorithm used in step B to perform feature classification on the training data set is the C4.5 decision tree algorithm.
3. A device for mining association rules of commodity qualification rates, characterized by comprising:
a storage module for obtaining and storing an original training data set;
a first mining module for performing feature classification on the training data set using a decision tree algorithm and extracting a classification feature variable importance data set;
a second mining module for setting a feature parameter importance threshold and crossing it with the obtained feature variable importance data set and the parameter-tuning data, so as to exclude interference items from the multidimensional association rule training data set and screen out a pure feature variable parameter set;
a third mining module for applying multidimensional association rules to the pure feature variable parameter set to obtain a commodity qualification rate rule model.
CN201710487560.8A 2017-06-23 2017-06-23 Method and device for mining association rules of commodity qualification rates Pending CN107247970A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710487560.8A CN107247970A (en) 2017-06-23 2017-06-23 Method and device for mining association rules of commodity qualification rates

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710487560.8A CN107247970A (en) 2017-06-23 2017-06-23 Method and device for mining association rules of commodity qualification rates

Publications (1)

Publication Number Publication Date
CN107247970A (en) 2017-10-13

Family

ID=60019539

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710487560.8A Pending CN107247970A (en) 2017-06-23 2017-06-23 Method and device for mining association rules of commodity qualification rates

Country Status (1)

Country Link
CN (1) CN107247970A (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108520039A (en) * 2018-04-02 2018-09-11 河南大学 A kind of big data method for optimization analysis
CN110119551A (en) * 2019-04-29 2019-08-13 西安电子科技大学 Shield machine cutter abrasion degeneration linked character analysis method based on machine learning
CN111670445A (en) * 2018-01-31 2020-09-15 Asml荷兰有限公司 Substrate marking method based on process parameters
CN117376108A (en) * 2023-12-07 2024-01-09 深圳市亲邻科技有限公司 Intelligent operation and maintenance method and system for Internet of things equipment
CN117725527A (en) * 2023-12-27 2024-03-19 北京领雁科技股份有限公司 Score model optimization method based on machine learning analysis rules

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101419627A (en) * 2008-12-03 2009-04-29 山东中烟工业公司 Cigarette composition maintenance action digging system based on associations ruler and method thereof
CN102567807A (en) * 2010-12-23 2012-07-11 上海亚太计算机信息系统有限公司 Method for predicating gas card customer churn
CN104239437A (en) * 2014-08-28 2014-12-24 国家电网公司 Power-network-dispatching-oriented intelligent warning analysis method
CN106407349A (en) * 2016-09-06 2017-02-15 北京三快在线科技有限公司 Product recommendation method and device

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111670445A (en) * 2018-01-31 2020-09-15 Asml荷兰有限公司 Substrate marking method based on process parameters
CN111670445B (en) * 2018-01-31 2024-03-22 Asml荷兰有限公司 Substrate marking method based on process parameters
CN108520039A (en) * 2018-04-02 2018-09-11 河南大学 A kind of big data method for optimization analysis
CN110119551A (en) * 2019-04-29 2019-08-13 西安电子科技大学 Shield machine cutter abrasion degeneration linked character analysis method based on machine learning
CN117376108A (en) * 2023-12-07 2024-01-09 深圳市亲邻科技有限公司 Intelligent operation and maintenance method and system for Internet of things equipment
CN117376108B (en) * 2023-12-07 2024-03-01 深圳市亲邻科技有限公司 Intelligent operation and maintenance method and system for Internet of things equipment
CN117725527A (en) * 2023-12-27 2024-03-19 北京领雁科技股份有限公司 Score model optimization method based on machine learning analysis rules

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20171013