CN105825226A - Association-rule-based distributed multi-label image identification method - Google Patents

Association-rule-based distributed multi-label image identification method

Info

Publication number
CN105825226A
Authority
CN
China
Prior art keywords
tag
frequent
image
item sets
website
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201610141659.8A
Other languages
Chinese (zh)
Inventor
彭彦
朱玉全
李竞
何峰
余飞
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Jiangsu Changyuan Information Technology Co ltd
Original Assignee
Jiangsu Changyuan Information Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Jiangsu Changyuan Information Technology Co ltd
Priority to CN201610141659.8A
Publication of CN105825226A
Legal status: Pending

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/23: Clustering techniques
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20: Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24: Querying
    • G06F16/245: Query processing
    • G06F16/2458: Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F16/2465: Query processing support for facilitating data mining operations in structured databases
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/24: Classification techniques

Abstract

The invention discloses a distributed multi-label image identification method based on association classification rules. The method comprises distributed image sample preprocessing, image segmentation, feature extraction, feature value discretization, mining of the global frequent itemsets L, construction of the multi-label association classification rules (MLACR), and image identification. Generating, pruning and support-counting the global candidate itemsets in binary form reduces both the implementation difficulty and the communication traffic of the algorithm. Pruning is carried out twice, which markedly reduces the size of the global candidate itemsets and further improves execution efficiency. A reduction method prevents redundant rules from appearing in the MLACR. The method identifies, in a single pass, the multiple labels of an image when the training samples are distributed; it rapidly generates the global candidate frequent itemsets and computes their support counts in a distributed environment; and it achieves accurate and efficient multi-label image identification.

Description

A distributed multi-label image recognition method based on association rules
Technical field
The invention belongs to the field of computer analysis of multi-label images, and specifically relates to a method for recognizing multi-label images in a distributed setting.
Background
Multi-label image recognition is an important research branch of data mining. It aims to construct a classification function, or classifier, from a training image sample data set and to use that classifier to identify the label set of an image to be recognized. Multi-label classification methods currently applicable to image recognition include ML-KNN, C4.5, BP-MLL, the PT family and its improvements, PPT, PPT-n, MMAC, RAKEL, RPC, CLR, INSDIF, MLRW and ML-CMBAR2. ML-KNN is a KNN-based multi-label classification method proposed by Zhang M.L. et al. It estimates the prior probability of each label statistically; given test image data x, it computes, for each label s in the label set S, the probability that x has label s and the probability that it does not, and then predicts whether x has label s. BP-MLL defines a global optimization function for multi-label image data so that an artificial neural network can process multi-label data. The PT family tries to solve multi-label classification with existing single-label methods: all training samples that carry multiple labels are first converted into single-label data, so that the algorithm faces a single-label sample set and the multi-label classification problem becomes a single-label one. To control the number of new labels produced by PT methods, PPT, PPT-n and RAKEL propose a series of treatments: PPT and PPT-n reduce the number of new labels by setting a threshold, while RAKEL reduces it by random selection. RPC and CLR compare every pair of labels in the label set S and build k(k-1)/2 classifiers; each classifier votes between two labels, and the votes are combined into the final multi-label classification result. ML-CMBAR2 is a multi-label classification method based on association rules.
In many practical applications the data are inherently distributed, and the sites holding them share nothing except the ability to exchange information over a network. One feasible approach is to gather the data sets onto a single machine and build a multi-label classifier with an existing algorithm, or to use the MapReduce programming model to build the classifier in the distributed environment. This approach generally has at least two problems: first, it needs a relatively (or very) high-performance computer to store and process large volumes of data; second, in many cases the data cannot be brought together because of security and privacy concerns. The present invention therefore proposes a distributed multi-label image recognition method based on association rules. It builds a multi-label classifier by discovering the association rules in training image sample data held in a distributed environment, and thereby recognizes images automatically.
Summary of the invention
The object of the invention is to provide a method that identifies all the labels of a multi-label image in a single pass. The method quickly generates the global candidate frequent itemsets in a distributed environment and computes their support counts, achieving accurate and efficient multi-label image recognition.
The technical solution of the invention is a distributed multi-label image recognition method based on association rules, comprising generation of global candidate frequent itemsets, calculation of support counts, and image recognition, characterized in that these comprise the following steps:
Step 1: preparation and preprocessing of the distributed image sample data sets, including format conversion, size normalization, denoising and enhancement of the training images at each site;
Step 2: each site uses a density-clustering-based image segmentation method to identify the region to be recognized in each training image;
Step 3: each site extracts the features of the region to be recognized in each training image and constructs its training image sample database DB_i. The schema of DB_i is the relation schema R(A_1, …, A_p, B_1, …, B_q), where p and q are the numbers of non-label and label attributes, A_1, …, A_p are the names of the non-label attributes, B_1, …, B_q are the names of the label attributes, and i = 1, 2, …, n. The total training sample set is DB = DB_1 ∪ DB_2 ∪ … ∪ DB_n, with DB_i ∩ DB_j = ∅ for i ≠ j;
Step 4: feature value discretization, in which each site discretizes its continuous attributes;
Step 5: mining of the global frequent itemsets L;
Step 6: construction of the multi-label association classification rules MLACR, which is divided into construction of the multi-label frequent classification association rules MLFCAR and generation of MLACR, and comprises:
Step 6.1: construct the antecedent P and consequent Q of each multi-label frequent classification association rule, where the antecedent is the set of non-label attributes contained in a global frequent itemset of L and the consequent is the set of label attributes contained in that itemset;
Step 6.2: compute the confidence of each classification rule in MLFCAR, where the confidence of a rule P ⇒ Q is Count(P ∪ Q)/Count(P);
Step 6.3: delete from MLFCAR the classification rules whose confidence is below minconf, giving the final MLFCAR, where minconf is the minimum confidence threshold;
Step 6.4: reduce MLFCAR to obtain the multi-label association classification rules MLACR;
Step 7: image recognition.
The mining of the global frequent itemsets L in step 5 comprises:
Step 5.1: initialization, comprising:
Step 5.1.1: select one of the n sites S_1, S_2, …, S_n, or a separate dedicated machine, as the host (denoted site S);
Step 5.1.2: each site converts its training image sample data DB_i into a database in binary form, still denoted DB_i;
Step 5.1.3: set the non-label attribute set NLA and the label attribute set LA of DB, NLA = {A_1, …, A_p}, LA = {B_1, …, B_q};
Step 5.1.4: each site counts the support count of every attribute in NLA and LA and sends the counts to host S;
Step 5.1.5: the host computes the global frequent 1-itemsets over the label attributes, LL_1 = {c ∈ LA | Sup(c) ≥ minsup};
Step 5.1.6: the host computes the global frequent 1-itemsets over the non-label attributes, NLL_1 = {c ∈ NLA | Sup(c) ≥ minsup};
Step 5.1.7: the host computes the global frequent 1-itemsets of DB, L_1 = LL_1 ∪ NLL_1;
where minsup is the given minimum support threshold; c is a given itemset; Count(c) is the support count, i.e. the number of times itemset c occurs in DB; and Sup(c) is the support, Sup(c) = Count(c)/|DB|, where |DB| is the number of samples in the total training image sample data set DB;
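The host-side part of step 5.1 can be sketched in Python as follows. This is a minimal illustration rather than the patent's implementation: the function name `global_frequent_1_itemsets`, the dictionary layout of the per-site counts and the sample numbers are all assumptions made for the example.

```python
from collections import Counter

def global_frequent_1_itemsets(site_counts, total_samples, minsup):
    """Sum the per-site support counts on the host and keep every
    attribute whose global support Count(c)/|DB| reaches minsup."""
    total = Counter()
    for counts in site_counts:      # one dict of counts per site
        total.update(counts)        # adds counts key by key
    return {item: n for item, n in total.items()
            if n / total_samples >= minsup}

# Hypothetical counts reported by three sites (|DB| = 100 in total).
site_counts = [
    {"A1": 10, "A2": 3, "B1": 12},
    {"A1": 9, "A2": 2, "B1": 5},
    {"A1": 6, "A2": 1, "B1": 4},
]
L1 = global_frequent_1_itemsets(site_counts, total_samples=100, minsup=0.2)
print(L1)
```

Only the counts travel over the network in this sketch; the samples themselves stay at their sites.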
Step 5.2: generation of the global candidate frequent itemsets (performed by host S), comprising:
Step 5.2.1: generate the global candidate frequent (k+1)-itemsets C_{k+1} from the global frequent k-itemsets L_k, where k is the length of the global frequent itemsets;
Step 5.2.2: each site computes the support count of every itemset in C_{k+1};
Step 5.2.3: the host totals the support counts of every itemset in C_{k+1} and generates the global frequent (k+1)-itemsets L_{k+1} according to the minimum support threshold minsup;
Step 5.2.4: repeat steps 5.2.1 to 5.2.3; when the generated set of global candidate frequent itemsets is empty, go to step 5.3;
Step 5.3: generate the global frequent itemsets L.
Image recognition by the host in step 7 comprises:
Step 7.1: preparation and preprocessing of the image to be recognized, including format conversion, size normalization, denoising and enhancement;
Step 7.2: the host uses a density-clustering-based image segmentation method to identify the region to be recognized in the image;
Step 7.3: extract the non-label attribute features of the region to be recognized in the image;
Step 7.4: discretize the non-label attribute feature values;
Step 7.5: identify the label attributes of the image according to the multi-label association classification rules MLACR.
Generating the global candidate frequent (k+1)-itemsets C_{k+1} from the global frequent k-itemsets L_k in step 5.2.1 comprises:
Step 5.2.1.1: select any two different itemsets c1 and c2 in L_k; if the result of c1 OR c2 contains exactly k+1 ones, then C_{k+1} = C_{k+1} ∪ {c1 ∪ c2};
Step 5.2.1.2: repeat step 5.2.1.1 until all pairs of itemsets have been compared, giving the global candidate frequent (k+1)-itemsets C_{k+1};
Step 5.2.1.3: for any itemset c in C_{k+1}, if c has a subset c3 of length k with c3 ∉ L_k, delete c;
Step 5.2.1.4: delete from C_{k+1} the itemsets that contain only label attributes or only non-label attributes.
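Steps 5.2.1.1 to 5.2.1.4 can be sketched in Python with itemsets stored as integers, one bit per item. This is an illustration under stated assumptions, not the patent's code: the names `generate_candidates`, `is_mixed` and `k_subsets`, the bit layout and `LABEL_MASK` are hypothetical. One further assumption is marked in the code: for k ≥ 2 the subset check skips k-subsets that contain only label or only non-label items, because the second pruning step has already removed such itemsets from L_k and they could never be found there.

```python
from itertools import combinations

def popcount(x):
    """Number of 1 bits, i.e. the length of the itemset."""
    return bin(x).count("1")

def is_mixed(itemset, label_mask):
    """True if the itemset has at least one label and one non-label item."""
    return bool(itemset & label_mask) and bool(itemset & ~label_mask)

def k_subsets(itemset):
    """Yield every subset obtained by clearing exactly one set bit."""
    bit = 1
    while bit <= itemset:
        if itemset & bit:
            yield itemset & ~bit
        bit <<= 1

def generate_candidates(Lk, k, label_mask):
    Lk = set(Lk)
    # Join: keep c1 | c2 when the OR result has exactly k+1 ones.
    joined = {a | b for a, b in combinations(Lk, 2)
              if popcount(a | b) == k + 1}
    # First pruning: every checked k-subset must be in L_k.
    # ASSUMPTION: for k >= 2 only mixed subsets are checked (see lead-in).
    pruned = set()
    for c in joined:
        subsets = [s for s in k_subsets(c)
                   if k == 1 or is_mixed(s, label_mask)]
        if all(s in Lk for s in subsets):
            pruned.add(c)
    # Second pruning: drop label-only and non-label-only itemsets.
    return {c for c in pruned if is_mixed(c, label_mask)}

# Hypothetical layout: three non-label items and two label items.
A1, A2, A3, B1, B2 = 0b10000, 0b01000, 0b00100, 0b00010, 0b00001
LABEL_MASK = B1 | B2
C2 = generate_candidates({A1, A2, B1}, k=1, label_mask=LABEL_MASK)
C3 = generate_candidates(C2, k=2, label_mask=LABEL_MASK)
```

Here `C2` holds {A1, B1} and {A2, B1}, and, treating `C2` as if every candidate turned out frequent, `C3` holds {A1, A2, B1}.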
Reducing the multi-label frequent classification association rules MLFCAR in step 6.4 to obtain the multi-label association classification rules MLACR comprises:
Step 6.4.1: select the rule R1 with the shortest antecedent in MLFCAR;
Step 6.4.2: compute MLACR = MLACR ∪ {R1};
MLFCAR = MLFCAR - {R1};
Step 6.4.3: for each rule R in MLFCAR, if rule R1 covers rule R, perform
MLFCAR = MLFCAR - {R};
MLACR = MLACR ∪ {R};
Step 6.4.4: if MLFCAR is not empty, repeat steps 6.4.1 to 6.4.4.
"Rule R1 covers rule R" in step 6.4.3 means that the multi-label association classification rules R1: P1 ⇒ Q1 and R: P2 ⇒ Q2 satisfy [formula not reproduced], where P1 and Q1 are the antecedent and consequent of rule R1, and P2 and Q2 are the antecedent and consequent of rule R.
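The reduction loop can be sketched in Python as below. Two points are assumptions, not statements of the patent. First, the covering condition is not legible in this text, so the sketch assumes that R1 covers R when P1 ⊆ P2 and Q2 ⊆ Q1, which agrees with the worked embodiment. Second, the sketch discards covered rules, as the embodiment does when it drops a redundant rule; it does not perform the MLACR = MLACR ∪ {R} assignment listed in step 6.4.3. The names `covers` and `reduce_rules` are hypothetical.

```python
def covers(r1, r2):
    """ASSUMED covering test: P1 is a subset of P2 and Q2 is a subset of Q1."""
    (p1, q1), (p2, q2) = r1, r2
    return p1 <= p2 and q2 <= q1

def reduce_rules(mlfcar):
    """Move the shortest-antecedent rule into MLACR, drop the rules it
    covers, and repeat until MLFCAR is empty."""
    remaining = list(mlfcar)
    mlacr = []
    while remaining:
        r1 = min(remaining, key=lambda rule: len(rule[0]))
        remaining.remove(r1)
        mlacr.append(r1)
        remaining = [r for r in remaining if not covers(r1, r)]
    return mlacr

def rule(antecedent, consequent):
    return (frozenset(antecedent), frozenset(consequent))

# The three rules that survive the confidence filter in the embodiment.
r_a = rule({"mean=01000", "variance=01011", "gradient=01011",
            "kurtosis=01001", "energy=00011", "cluster=01100"},
           {"disease 2"})
r_b = rule({"mean=01000", "variance=01011", "gradient=01011",
            "kurtosis=01001"},
           {"disease 1", "disease 2", "disease 4"})
r_c = rule({"mean=01010", "variance=01001", "gradient=01100",
            "kurtosis=01000", "energy=00100", "cluster=01110"},
           {"disease 2", "disease 4"})
mlacr = reduce_rules([r_a, r_b, r_c])
```

Under these assumptions `r_b` is selected first and removes `r_a`, leaving `r_b` and `r_c`, the same two rules the embodiment reports in MLACR.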
The non-label attributes of step 3 include mean, variance, gradient, kurtosis, energy, entropy and cluster feature.
The main benefit of the invention is that it identifies, in a single pass, the multiple labels of an image when the training samples are distributed, and that it gives corresponding optimized solutions for the construction of the global candidate frequent itemsets and the post-processing of rules during recognition, mainly in the following respects:
(1) Generation of global candidate itemsets
For the problem of generating global candidate itemsets in association rule mining, the invention proposes binary-form methods for generating, pruning and support-counting the global candidate itemsets. Training samples and global candidate itemsets are both described in binary form, which simplifies the generation and transmission of the candidates and the calculation of their support counts, and makes the algorithm easier to implement. In addition, two pruning operations markedly reduce the size of the global candidate itemsets and further improve execution efficiency.
(2) Post-processing of the multi-label association classification rules
The multi-label frequent classification association rules MLFCAR may have two problems: first, MLFCAR may contain rules that cover one another; second, it may contain mutually conflicting rules. The invention therefore gives a reduction method for multi-label association classification rules that ensures MLACR contains no redundant rules, which greatly eases the use of the rules and further improves the effectiveness and operability of the invention.
Brief description of the drawings
Fig. 1 is a structural block diagram of an embodiment of the invention.
Fig. 2 is a flow chart of global frequent itemset mining in the embodiment of the invention.
Fig. 3 is a flow chart of the construction of the multi-label association classification rule base in the embodiment of the invention.
Detailed description
Let the n sites in the distributed environment be S_1, S_2, …, S_n. Apart from exchanging information over the network, their other resources (hard disk, memory and so on) are all independent. The training image sample data set at site S_i (i = 1, 2, …, n) is DB_i, the total training sample set is DB = DB_1 ∪ DB_2 ∪ … ∪ DB_n, and DB_i ∩ DB_j = ∅ for i ≠ j. R(A_1, …, A_p, B_1, …, B_q) is the relation schema of DB_i, where p and q are the numbers of non-label and label attributes, A_1, …, A_p are the names of the non-label attributes and B_1, …, B_q are the names of the label attributes. As shown in Fig. 1, the method mainly comprises the following:
(1) Preprocessing
Each site prepares its training image sample data set and performs format conversion, size normalization, denoising and enhancement.
(2) Image segmentation
Each site uses a density-clustering-based image segmentation method to identify the region to be recognized in each training image.
(3) Feature extraction
Each site extracts the features of the region to be recognized in each training image and constructs its training image sample database DB_i, i = 1, 2, …, n.
(4) Feature value discretization
Each site discretizes its continuous attributes, and the non-label attributes are unified as follows:
1. Numerical attribute values are discretized by interval and mapped onto the set of consecutive integers {0, 1, 2, …}; see the embodiment for details.
2. Categorical attribute values are sorted lexicographically and mapped onto the set of consecutive integers {0, 1, 2, …}.
(5) Mining of the global frequent itemsets L
Let the minimum support threshold be minsup. For a given itemset c, its support count is denoted Count(c) and its support Sup(c), with Sup(c) = Count(c)/|DB|, where |DB| is the number of samples in the training image sample data set DB. As shown in Fig. 2, mining the global frequent itemsets L is divided into initialization, generation of the global candidate frequent itemsets, and calculation of their support counts.
1. Initialization
Initialization comprises the following steps:
a. Select one of the n sites S_1, S_2, …, S_n, or a separate dedicated machine, as the host (denoted site S); this machine totals the support counts (supports) of the global frequent itemsets;
b. Each site converts its training image sample database DB_i into a database in binary form, still denoted DB_i; for example, with p = 5 and q = 2, the record R_1(A_1=1, A_2=0, A_3=1, A_4=1, A_5=1, B_1=1, B_2=1) becomes 1011111;
c. Set the non-label attribute set NLA and the label attribute set LA of DB, NLA = {A_1, …, A_p}, LA = {B_1, …, B_q};
d. Each site counts the support count of every attribute in NLA and LA and sends the counts to host S;
e. LL_1 = {c ∈ LA | Sup(c) ≥ minsup}; // LL_1 is the set of global frequent 1-itemsets over the label attributes, computed by host S and stored on host S
f. NLL_1 = {c ∈ NLA | Sup(c) ≥ minsup}; // NLL_1 is the set of global frequent 1-itemsets over the non-label attributes, computed by host S and stored on host S
g. L_1 = LL_1 ∪ NLL_1; // L_1 is the set of global frequent 1-itemsets of DB, stored on host S
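The binary conversion in step b can be illustrated with a short Python sketch. The function name `record_to_bits` and the dictionary representation of a record are assumptions made for the example; only the attribute order and the expected result come from the text.

```python
def record_to_bits(record, attributes):
    """Concatenate the 0/1 value of each attribute, non-label
    attributes first and label attributes last."""
    return "".join(str(record[name]) for name in attributes)

# p = 5 non-label attributes followed by q = 2 label attributes.
attributes = ["A1", "A2", "A3", "A4", "A5", "B1", "B2"]
record = {"A1": 1, "A2": 0, "A3": 1, "A4": 1, "A5": 1, "B1": 1, "B2": 1}

bits = record_to_bits(record, attributes)
print(bits)  # 1011111
```

Storing the string as an integer, for instance with `int(bits, 2)`, is one way to make the later OR operations cheap.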
2. Generation of the global candidate frequent itemsets (performed by host S)
Let L_k be the set of global frequent itemsets of length k, each in binary form. The method of generating the global candidate frequent (k+1)-itemsets C_{k+1} from L_k comprises the following steps:
a. C_{k+1} = ∅; // C_{k+1} is the set of binary numbers corresponding to all global candidate frequent (k+1)-itemsets, initially empty
3. Calculation of the support count of every itemset in C_{k+1}
Host S sends the global candidate itemsets in C_{k+1} directly to every site. The method by which each site S_i (i = 1, 2, …, n) computes the support count of every itemset in C_{k+1} comprises the following steps:
4. Generation of the global frequent itemsets L
Generation of the global frequent itemsets L comprises the following steps:
a. Host S receives the support counts from every site;
b. Compute the support count of every global candidate itemset in C_{k+1};
c. L_{k+1} = {c ∈ C_{k+1} | Sup(c) ≥ minsup};
d. L = L_1 ∪ L_2 ∪ …;
(6) Construction of the multi-label association classification rules MLACR (performed by host S)
Let the minimum confidence threshold be minconf. Construction of MLACR is divided into construction of the multi-label frequent classification association rules MLFCAR and generation of MLACR.
1. Construction of the multi-label frequent classification association rules MLFCAR
Construction of MLFCAR comprises the following steps:
2. Generation of the multi-label association classification rules MLACR
Generation of MLACR comprises the following steps:
Definition 1: for two given multi-label association classification rules R_1 and R_2, if [formula not reproduced], then rule R_1 is said to cover rule R_2.
(7) Image recognition (performed by the host)
For an image t whose label set is unknown, the recognition process comprises the following steps:
1. Preprocessing
Image t undergoes format conversion, size normalization, denoising, enhancement and similar processing.
2. Image segmentation
A density-clustering-based image segmentation method is used to identify the region to be recognized in image t.
3. Feature extraction
Extract the features of the region to be recognized in image t.
4. Feature value discretization
For the discretization method, see step (4).
5. Image recognition
Let the discretized feature values obtained for image t after the four steps above be V_t, V_t = (t.A_1 = t_1, …, t.A_i = t_i, …, t.A_p = t_p). The recognition process for image t comprises the following steps:
The execution of the invention is explained below using distributed medical images as the embodiment. The example uses 100 medical images in total, distributed over three independent sites: sites 1, 2 and 3 hold 35, 35 and 30 sample medical images respectively. There is also an independent host S. Here q = 4, and B_1, B_2, B_3, B_4 are disease 1, disease 2, disease 3 and disease 4. The steps are as follows:
(1) Each site performs format conversion, size normalization, denoising and enhancement on its share of the 100 medical images.
(2) Each site segments every medical image, extracts the relevant features of the region to be recognized and normalizes them; the result is shown in Table 1. The features extracted in this example are mean, variance, gradient, kurtosis, energy, entropy and cluster feature, i.e. p = 7, and A_1, A_2, A_3, A_4, A_5, A_6, A_7 are mean, variance, gradient, kurtosis, energy, entropy and cluster feature respectively.
(3) Numerical attribute discretization. Each site discretizes every attribute in Table 1; equal-width binning, equal-depth binning or distance-based binning can be used. This example uses equal-width binning: the interval from 0 to 1 is divided into 20 parts, (0.00, 0.05], (0.05, 0.10], …, (0.95, 1.00]. For example, the fourth record {0.3974, 0.4812, 0.5222, 0.4316, 0.1525, 0.7633, 0.6608} is discretized to {(0.35, 0.40], (0.45, 0.50], (0.50, 0.55], (0.40, 0.45], (0.15, 0.20], (0.75, 0.80], (0.65, 0.70]}.
Table 1: Medical image feature table
(4) Mapping discrete intervals to integers. Each site maps the discrete intervals of the numerical attributes to consecutive integers: (0.00, 0.05], (0.05, 0.10], …, (0.95, 1.00] are mapped to 1, 2, 3, …, 20, so the fourth record becomes {08, 10, 11, 09, 04, 16, 14}. After this step Table 1 takes the form of Table 2.
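Steps (3) and (4) together amount to equal-width binning followed by taking the interval index. A minimal Python sketch follows; the function name `equal_width_bin`, the rounding guard against floating-point noise and the treatment of a value of exactly 0 (placed in bin 1) are assumptions of the sketch rather than details given in the text.

```python
import math

def equal_width_bin(x, n_bins=20):
    """Return the 1-based index i of the interval ((i-1)/n, i/n]
    that contains x, for x in [0, 1]."""
    # round() guards against products such as 0.15 * 20 landing
    # slightly above the exact boundary value.
    index = math.ceil(round(x * n_bins, 9))
    return max(1, index)  # ASSUMPTION: x == 0 goes into the first bin

record_4 = [0.3974, 0.4812, 0.5222, 0.4316, 0.1525, 0.7633, 0.6608]
bins = [equal_width_bin(v) for v in record_4]
print(bins)  # [8, 10, 11, 9, 4, 16, 14]
```

The output matches the integers the embodiment gives for the fourth record.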
(5) Attribute binarization. Each site converts the discretized attribute values to binary, turning Table 2 into the form of Table 3. These binary numbers reside at each site; their purpose is to simplify the calculation of the support counts of the global candidate itemsets.
Table 2: Result after mapping discrete intervals to integers
Table 3: Result after attribute binarization
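The binarization step can be sketched as fixed-width binary formatting. The 5-bit width is an assumption inferred from the values shown in the embodiment (for example 01000 for 8); the function name `to_bits` is hypothetical.

```python
def to_bits(values, width=5):
    """Format each bin index as a zero-padded binary string."""
    return [format(v, f"0{width}b") for v in values]

# Integer form of the fourth record from the preceding step.
record_4_bins = [8, 10, 11, 9, 4, 16, 14]
record_4_bits = to_bits(record_4_bins)
print(record_4_bits)
```

Five bits are enough for the 20 bin indices used here, since 20 is less than 32.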
(5) Mining of the global frequent itemsets L
Let the minimum support threshold minsup be 0.2. Generate the global frequent itemsets L that contain at least one of {mean, variance, gradient, kurtosis, energy, entropy, cluster feature} and at least one of {disease 1, disease 2, disease 3, disease 4}, as follows:
1. Each site scans its attribute-binarized result table, obtains the support count of every attribute value and sends it to host S. Host S totals the support counts of every attribute value and generates the global frequent 1-itemsets L_1 according to minsup: L_1 = {{mean=01000}, {mean=01011}, {mean=01010}, {variance=01011}, {variance=01010}, {variance=01001}, {gradient=01011}, {gradient=01100}, {kurtosis=01001}, {kurtosis=01000}, {energy=00011}, {energy=00100}, {cluster feature=01100}, {cluster feature=01110}, {disease 1}, {disease 2}, {disease 3}, {disease 4}}. For example, if after scanning and transmission the support count of itemset {mean=01000} is 25, i.e. Count({mean=01000}) = 25, then Sup({mean=01000}) = Count({mean=01000})/|DB| = 25/100 = 0.25. Since Sup({mean=01000}) > minsup, {mean=01000} is a global frequent itemset. The remaining itemsets are handled likewise;
2. Host S generates from L_1 the global candidate frequent 2-itemsets C_2 that contain both label and non-label attributes: C_2 = {{mean=01000, disease 1}, {mean=01000, disease 2}, {mean=01000, disease 3}, {mean=01000, disease 4}, {mean=01011, disease 1}, {mean=01011, disease 2}, {mean=01011, disease 3}, {mean=01011, disease 4}, …, {gradient=01100, disease 1}, {gradient=01100, disease 2}, {gradient=01100, disease 3}, {gradient=01100, disease 4}, …}.
3. Host S sends C_2 to sites 1, 2 and 3. So that every site can interpret what is transmitted, the global candidate frequent itemsets are encoded before transmission; for example, {gradient=01100, disease 1} is encoded as 000000000001100000000000000000000001000. Each site then does not need to know the meaning of the individual bits.
4. Each site scans its attribute-binarized result table and obtains the support count at that site of every itemset in C_2 (only an OR operation is needed). For example, record 001 at site 1 is 010100100101011010010000110100011001010. Since 010100100101011010010000110100011001010 OR 000000000001100000000000000000000001000 ≠ 010100100101011010010000110100011001010, this record does not support the itemset. When the calculation is finished, each site sends its results to host S.
5. The host generates the global frequent 2-itemsets L_2 according to minsup: L_2 = {{mean=01000, disease 2}, {variance=01011, disease 2}, {gradient=01011, disease 2}, …, {mean=01000, disease 1}, {kurtosis=01001, disease 4}, …}.
6. From L_2, generate the global candidate frequent 3-itemsets C_3 that contain both label and non-label attributes and send C_3 to sites 1, 2 and 3. Each site scans its table once, obtains the support count at that site of every itemset in C_3 and sends it to host S, which generates the global frequent 3-itemsets L_3 according to minsup. L_4, L_5, …, L_k are obtained in the same way. The termination condition is that the candidate (k+1)-itemsets C_{k+1} generated from L_k and containing both label and non-label attributes form the empty set.
7. Collect the results above to obtain the global frequent itemsets L:
L = {{mean=01000, variance=01011, gradient=01011, kurtosis=01001, energy=00011, cluster feature=01100, disease 2}, {mean=01011, variance=01010, gradient=01100, kurtosis=00110, disease 4}, {mean=01000, variance=01011, gradient=01011, kurtosis=01001, disease 1, disease 2, disease 4}, {mean=01010, variance=01001, gradient=01100, kurtosis=01000, energy=00100, cluster feature=01110, disease 2, disease 4}, …}.
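The site-side OR test from step 4 can be sketched in Python using the two bit strings quoted in the text. The function names `supports` and `local_support_count` are hypothetical, and the second record is a made-up one, built so that it contains every bit of the candidate, included only to show a positive case.

```python
def supports(record_bits, itemset_bits):
    """A record supports an itemset when OR-ing the itemset into the
    record leaves the record unchanged."""
    record = int(record_bits, 2)
    itemset = int(itemset_bits, 2)
    return (record | itemset) == record

def local_support_count(records, itemset_bits):
    """Support count of one candidate itemset at one site."""
    return sum(supports(r, itemset_bits) for r in records)

# Bit strings quoted in the embodiment.
record_001 = "010100100101011010010000110100011001010"
candidate = "000000000001100000000000000000000001000"  # {gradient=01100, disease 1}

# HYPOTHETICAL record that contains all of the candidate's bits.
supporting = format(int(record_001, 2) | int(candidate, 2), "039b")

count = local_support_count([record_001, supporting], candidate)
print(supports(record_001, candidate), count)
```

Each site would return only such counts to host S, which adds them up across sites before applying minsup.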
(6) structure (being completed by main frame S) of multi-tag related sides transaction MLACR
The structure of multi-tag related sides transaction is divided into structure and the generation of multi-tag related sides transaction MLACR of multi-tag frequent related sides transaction MLFCAR.
If minimal confidence threshold minconf is 0.6, the structure of multi-tag frequent related sides transaction MLFCAR comprises the following steps:
1. constructing former piece and the consequent of each classifying rules in multi-tag frequent related sides transaction MLFCAR, former piece is the non-label property set that Global Frequent Itemsets in L is comprised, and consequent is the tag attributes collection that Global Frequent Itemsets in L is comprised.Such as Item Sets { average=01000, variance=01011, gradient=01011, kurtosis=01001, energy=00011, cluster feature=01100, the former piece of disease 2}, consequent are respectively { average=01000, variance=01011, gradient=01011, kurtosis=01001, energy=00011, cluster feature=01100} and { disease 2};{ average=01011, variance=01010, gradient=01100, kurtosis=00110, the former piece of disease 4}, consequent are respectively { average=01011, variance=01010, gradient=01100, kurtosis=00110} and { disease 4} to Item Sets;Item Sets { average=01000, variance=01011, gradient=01011, kurtosis=01001, disease 1, disease 2, the former piece of disease 4}, consequent are respectively { average=01000, variance=01011, gradient=01011, kurtosis=01001} and { disease 1, disease 2, disease 4}.Remaining Global Frequent Itemsets makees same process, thus obtains initial multi-tag frequent related sides transaction MLFCAR.
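MLFCAR = {
{average=01000, variance=01011, gradient=01011, kurtosis=01001, energy=00011, cluster feature=01100} ⇒ {disease 2},
{average=01011, variance=01010, gradient=01100, kurtosis=00110} ⇒ {disease 4},
{average=01000, variance=01011, gradient=01011, kurtosis=01001} ⇒ {disease 1, disease 2, disease 4},
{average=01010, variance=01001, gradient=01100, kurtosis=01000, energy=00100, cluster feature=01110} ⇒ {disease 2, disease 4},
...}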
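The antecedent/consequent split of step 1 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: items are stored as strings, label items are recognized by a hypothetical `disease ` prefix, and the names `is_label` and `split_itemset` are not from the patent.

```python
def is_label(item: str) -> bool:
    # Assumption: label attributes are the items named "disease N".
    return item.startswith("disease ")


def split_itemset(itemset):
    """Split a global frequent itemset into (antecedent, consequent)."""
    antecedent = frozenset(i for i in itemset if not is_label(i))
    consequent = frozenset(i for i in itemset if is_label(i))
    return antecedent, consequent


# Third itemset of L from the embodiment.
itemset = {"average=01000", "variance=01011", "gradient=01011",
           "kurtosis=01001", "disease 1", "disease 2", "disease 4"}
P, Q = split_itemset(itemset)
```

Here `P` holds the four feature items and `Q` holds the three disease labels, matching the third rule listed above.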
2. Calculate the confidence of each classification rule in MLFCAR. The confidence of a rule P ⇒ Q is Count(P ∪ Q)/Count(P), where the values of Count(P ∪ Q) and Count(P) are obtained during the mining of the frequent itemsets L. For example, the confidence of the rule {average=01000, variance=01011, gradient=01011, kurtosis=01001, energy=00011, cluster feature=01100} ⇒ {disease 2} is Count({average=01000, variance=01011, gradient=01011, kurtosis=01001, energy=00011, cluster feature=01100, disease 2})/Count({average=01000, variance=01011, gradient=01011, kurtosis=01001, energy=00011, cluster feature=01100}). Since Count({average=01000, variance=01011, gradient=01011, kurtosis=01001, energy=00011, cluster feature=01100, disease 2}) = 17 and Count({average=01000, variance=01011, gradient=01011, kurtosis=01001, energy=00011, cluster feature=01100}) = 20, its confidence is 17/20, i.e. 0.85. The confidence of the other classification rules in MLFCAR is calculated in the same way.
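The confidence calculation and the comparison against minconf can be illustrated with the counts given in the example. This is a minimal Python sketch; the function name `confidence` and the variable names are illustrative assumptions, not part of the patent.

```python
def confidence(count_pq: int, count_p: int) -> float:
    """Confidence of a rule P => Q, i.e. Count(P ∪ Q) / Count(P)."""
    if count_p <= 0:
        raise ValueError("Count(P) must be positive")
    return count_pq / count_p


MINCONF = 0.6                 # minimum confidence threshold from the embodiment
conf = confidence(17, 20)     # counts taken from the worked example
keep_rule = conf >= MINCONF   # rules below minconf are deleted
```

With the example counts the rule is kept, since its confidence of 17/20 is not below the threshold of 0.6.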
3. Delete from MLFCAR every classification rule whose confidence is less than 0.6. This yields the final MLFCAR:
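MLFCAR = {
{average=01000, variance=01011, gradient=01011, kurtosis=01001, energy=00011, cluster feature=01100} ⇒ {disease 2},
{average=01000, variance=01011, gradient=01011, kurtosis=01001} ⇒ {disease 1, disease 2, disease 4},
{average=01010, variance=01001, gradient=01100, kurtosis=01000, energy=00100, cluster feature=01110} ⇒ {disease 2, disease 4},
...}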
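4. Reduce MLFCAR by deleting its redundant rules. For example, the first rule in MLFCAR is redundant, because its antecedent contains the antecedent of the second rule and its consequent {disease 2} is contained in the consequent of the second rule. The multi-label association classification rules MLACR are obtained accordingly: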
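MLACR = {
{average=01000, variance=01011, gradient=01011, kurtosis=01001} ⇒ {disease 1, disease 2, disease 4},
{average=01010, variance=01001, gradient=01100, kurtosis=01000, energy=00100, cluster feature=01110} ⇒ {disease 2, disease 4},
...}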
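The reduction in step 4 can be sketched as a greedy loop that repeatedly takes the rule with the shortest antecedent and discards the rules it covers. The sketch below is an assumption-laden illustration: the coverage test (the covering rule's antecedent is a subset of the other antecedent, and its consequent is a superset of the other consequent) is inferred from the worked example, discarding covered rules follows the example result, and the names `covers` and `reduce_rules` are hypothetical.

```python
def covers(r1, r):
    """Assumed coverage test: r1 is at least as general and predicts at least as much."""
    (p1, q1), (p2, q2) = r1, r
    return p1 <= p2 and q1 >= q2


def reduce_rules(mlfcar):
    """Greedy reduction of MLFCAR into MLACR; rules are (antecedent, consequent) pairs."""
    remaining = list(mlfcar)
    mlacr = []
    while remaining:
        # Pick the rule with the shortest antecedent (first one on ties).
        r1 = min(remaining, key=lambda rule: len(rule[0]))
        remaining.remove(r1)
        mlacr.append(r1)
        # Drop every remaining rule that r1 covers.
        remaining = [r for r in remaining if not covers(r1, r)]
    return mlacr


fs = frozenset
rule_a = (fs({"average=01000", "variance=01011", "gradient=01011", "kurtosis=01001",
              "energy=00011", "cluster feature=01100"}), fs({"disease 2"}))
rule_b = (fs({"average=01000", "variance=01011", "gradient=01011", "kurtosis=01001"}),
          fs({"disease 1", "disease 2", "disease 4"}))
rule_c = (fs({"average=01010", "variance=01001", "gradient=01100", "kurtosis=01000",
              "energy=00100", "cluster feature=01110"}), fs({"disease 2", "disease 4"}))
mlacr = reduce_rules([rule_a, rule_b, rule_c])
```

On the three rules of the final MLFCAR this keeps `rule_b` and `rule_c` and drops `rule_a`, which agrees with the MLACR shown in the embodiment.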
(7) Image identification
For an image t whose label set is unknown, preprocessing, image segmentation, feature extraction, and feature value discretization are performed to obtain its discrete feature values V_t.
For example, if V_t = {average=01000, variance=01011, gradient=01011, kurtosis=01001, kurtosis=01001, energy=01010, entropy=01010, cluster feature=01101}, then V_t contains the antecedent of the first rule in MLACR, so the label set of image t is the consequent of that rule, {disease 1, disease 2, disease 4}; that is, the image may simultaneously contain information related to "disease 1", "disease 2", and "disease 4".
If instead V_t = {average=01000, variance=01001, gradient=01100, kurtosis=01000, kurtosis=01001, energy=01010, entropy=01010, cluster feature=01110}, no rule in MLACR has an antecedent that is contained in V_t. In this case the rule whose antecedent has the largest intersection with V_t is taken, namely the rule {average=01010, variance=01001, gradient=01100, kurtosis=01000, energy=00100, cluster feature=01110} ⇒ {disease 2, disease 4}; that is, the image may simultaneously contain information related to "disease 2" and "disease 4".
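The two matching cases described above (an antecedent contained in V_t, otherwise the antecedent with the largest overlap) can be sketched as follows. This is an illustrative Python sketch, not the patent's implementation: the name `identify` is hypothetical, and taking the first contained rule when several match, as well as the first rule on overlap ties, are assumptions the text does not specify.

```python
def identify(v_t, mlacr):
    """Return the predicted label set for discrete feature values v_t."""
    v_t = frozenset(v_t)
    if not mlacr:
        return frozenset()
    # Case 1: some rule antecedent is fully contained in v_t.
    for antecedent, consequent in mlacr:
        if antecedent <= v_t:
            return consequent
    # Case 2: fall back to the rule whose antecedent overlaps v_t the most.
    best = max(mlacr, key=lambda rule: len(rule[0] & v_t))
    return best[1]


fs = frozenset
mlacr = [
    (fs({"average=01000", "variance=01011", "gradient=01011", "kurtosis=01001"}),
     fs({"disease 1", "disease 2", "disease 4"})),
    (fs({"average=01010", "variance=01001", "gradient=01100", "kurtosis=01000",
         "energy=00100", "cluster feature=01110"}), fs({"disease 2", "disease 4"})),
]
v1 = {"average=01000", "variance=01011", "gradient=01011", "kurtosis=01001",
      "energy=01010", "entropy=01010", "cluster feature=01101"}
v2 = {"average=01000", "variance=01001", "gradient=01100", "kurtosis=01000",
      "kurtosis=01001", "energy=01010", "entropy=01010", "cluster feature=01110"}
labels1 = identify(v1, mlacr)
labels2 = identify(v2, mlacr)
```

For `v2` the second rule shares four items with the feature values while the first shares only two, so the fallback selects the second rule.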
The present embodiment describes the identification of medical images. The method can also be applied to other, similar image identification fields, for example the drawings contained in patent documents.

Claims (6)

1. A distributed multi-label image identification method based on association rules, comprising generation of global candidate frequent itemsets, calculation of support counts, and an image identification step, characterized in that the generation of the global candidate frequent itemsets, the calculation of the support counts, and the image identification step comprise:
Step 1: preparation and preprocessing of the distributed image sample data set, including format conversion, dimension normalization, denoising, and enhancement of the training images at each site;
Step 2: each site uses a density-clustering-based image segmentation method to identify the region to be identified in each training image;
Step 3: each site extracts the features of the region to be identified in each training image and constructs its training image sample data set DB_i, wherein the schema of DB_i is the relation schema R(A_1, ..., A_p, B_1, ..., B_q), p and q are the numbers of non-label attributes and label attributes respectively, A_1, ..., A_p are the names of the non-label attributes, B_1, ..., B_q are the names of the label attributes, and i = 1, 2, ..., n; the total training sample set is DB = DB_1 ∪ DB_2 ∪ ... ∪ DB_n, with DB_i ∩ DB_j = ∅ for i ≠ j;
Step 4: feature value discretization, in which each site discretizes its continuous attributes;
Step 5: mining of the global frequent itemsets L;
Step 6: construction of the multi-label association classification rules MLACR, which is divided into the construction of the multi-label frequent classification association rules MLFCAR and the generation of the multi-label association classification rules MLACR, and comprises:
Step 6.1: constructing the antecedent P and the consequent Q of each multi-label frequent classification association rule, wherein the antecedent is the set of non-label attributes contained in a global frequent itemset in L, and the consequent is the set of label attributes contained in that global frequent itemset;
Step 6.2: calculating the confidence of each classification rule in MLFCAR, wherein the confidence of a rule P ⇒ Q is Count(P ∪ Q)/Count(P);
Step 6.3: deleting from MLFCAR the classification rules whose confidence is less than minconf, thereby constructing the final MLFCAR, wherein minconf is the minimum confidence threshold;
Step 6.4: reducing MLFCAR to obtain the multi-label association classification rules MLACR;
Step 7: image identification.
2. The distributed multi-label image identification method based on association rules according to claim 1, characterized in that step 5 comprises:
Step 5.1: initialization, comprising:
Step 5.1.1: selecting one of the n sites S_1, S_2, ..., S_n, or another separate host, as the host computer (denoted host S);
Step 5.1.2: each site converts its training image sample data DB_i into a database in binary form, which is still denoted DB_i;
Step 5.1.3: setting the non-label attribute set NLA and the label attribute set LA of DB, NLA = {A_1, ..., A_p}, LA = {B_1, ..., B_q};
Step 5.1.4: each site counts the support count of each attribute in the non-label attribute set NLA and the label attribute set LA, and sends the support counts to host S;
Step 5.1.5: the host computes the global frequent 1-itemsets among the label attributes, LL_1 = {c ∈ LA | sup(c) ≥ minsup};
Step 5.1.6: the host computes the global frequent 1-itemsets among the non-label attributes, NLL_1 = {c ∈ NLA | sup(c) ≥ minsup};
Step 5.1.7: the host computes the global frequent 1-itemsets of DB, L_1 = LL_1 ∪ NLL_1;
wherein minsup is the given minimum support threshold; c is a given itemset; Count(c) is the support count, i.e. the number of times itemset c occurs in DB; sup(c) is the support, sup(c) = Count(c)/|DB|; and |DB| is the number of samples in the total training image sample data set DB;
Step 5.2: generation of the global candidate frequent itemsets (performed by host S), comprising:
Step 5.2.1: generating the global candidate frequent (k+1)-itemsets C_{k+1} from the global frequent k-itemsets L_k, wherein k is the length of the global frequent itemsets;
Step 5.2.2: each site calculates the support count of every itemset in the global candidate frequent itemsets C_{k+1};
Step 5.2.3: the host totals the support count of every itemset in C_{k+1} and generates the global frequent (k+1)-itemsets L_{k+1} according to the minimum support threshold minsup;
Step 5.2.4: repeating steps 5.2.1, 5.2.2, and 5.2.3; if the generated global candidate frequent itemsets are empty, proceeding to step 5.3;
Step 5.3: generating the global frequent itemsets L.
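The division of work between the sites and host S in steps 5.1.4 to 5.1.7 and 5.2.2 to 5.2.3 can be illustrated with a small Python sketch. It is only a sketch under stated assumptions: each site database is modeled as a list of item sets held in one process, the function names `local_counts` and `global_frequent` are hypothetical, and network transfer between sites and host is not modeled.

```python
from collections import Counter


def local_counts(site_db, candidates):
    """Run at one site: support count of each candidate itemset in the local DB_i."""
    return Counter({c: sum(1 for t in site_db if c <= t) for c in candidates})


def global_frequent(sites, candidates, minsup):
    """Run at host S: total the site counts and keep itemsets with sup(c) >= minsup."""
    total = Counter()
    for db in sites:
        total.update(local_counts(db, candidates))   # adds the per-site counts
    n = sum(len(db) for db in sites)                 # |DB|, since the DB_i are disjoint
    if n == 0:
        return {}
    return {c: total[c] for c in candidates if total[c] / n >= minsup}


fs = frozenset
sites = [
    [{"a", "x"}, {"a"}],          # DB_1
    [{"a", "x"}, {"b"}],          # DB_2
]
candidates = [fs({"a"}), fs({"b"}), fs({"a", "x"})]
frequent = global_frequent(sites, candidates, minsup=0.5)
```

Only the counts travel to the host in this scheme, which is why each site can keep its own samples local.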
3. The distributed multi-label image identification method based on association rules according to claim 1, characterized in that step 7 comprises:
Step 7.1: preparation and preprocessing of the image to be identified, including image format conversion, dimension normalization, denoising, and enhancement;
Step 7.2: the host uses the density-clustering-based image segmentation method to identify the region to be identified in the image to be identified;
Step 7.3: extracting the non-label attribute features of the region to be identified in the image to be identified;
Step 7.4: discretizing the non-label attribute feature values;
Step 7.5: identifying the label attributes of the image to be identified according to the multi-label association classification rules MLACR.
4. The distributed multi-label image identification method based on association rules according to claim 1, characterized in that step 5.2.1 comprises:
Step 5.2.1.1: selecting any two different itemsets c_1 and c_2 from the global frequent k-itemsets L_k; if the result of the OR operation on c_1 and c_2 contains exactly k+1 ones, then C_{k+1} = C_{k+1} ∪ {c_1 ∪ c_2};
Step 5.2.1.2: repeating step 5.2.1.1 until all pairs of itemsets have been compared, thereby obtaining the global candidate frequent (k+1)-itemsets C_{k+1};
Step 5.2.1.3: for any itemset c in C_{k+1}, if c has a subset c_3 of length k with c_3 ∉ L_k, deleting the itemset c;
Step 5.2.1.4: deleting from C_{k+1} the itemsets that contain only label attributes or only non-label attributes.
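Because the site databases are in binary form, the candidate generation of step 5.2.1 can be pictured with itemsets encoded as integer bitmasks, one bit per attribute. The Python sketch below follows the four sub-steps under that assumption; the encoding, the parameter `label_mask` (bits of the label attributes), and the names `bit_subsets` and `gen_candidates` are hypothetical and not from the patent.

```python
from itertools import combinations


def bit_subsets(c):
    """Yield every mask obtained by clearing exactly one set bit of c."""
    m = c
    while m:
        low = m & -m          # lowest set bit still to be processed
        yield c & ~low
        m &= m - 1            # clear that bit in the working copy


def gen_candidates(lk, k, label_mask):
    """Generate C_{k+1} from L_k (a collection of int bitmasks of length k)."""
    lk = set(lk)
    cands = set()
    # Steps 5.2.1.1-5.2.1.2: join pairs whose OR has exactly k+1 ones.
    for c1, c2 in combinations(lk, 2):
        u = c1 | c2
        if bin(u).count("1") == k + 1:
            cands.add(u)
    # Step 5.2.1.3: delete candidates that have a k-subset outside L_k.
    cands = {c for c in cands if all(s in lk for s in bit_subsets(c))}
    # Step 5.2.1.4: delete candidates with only label or only non-label bits.
    return {c for c in cands if (c & label_mask) and (c & ~label_mask)}


# Bits 0 and 1 are non-label attributes A_1, A_2; bit 2 is label attribute B_1.
c2_sets = gen_candidates({0b001, 0b010, 0b100}, k=1, label_mask=0b100)
```

In this small case the join produces three 2-itemsets, and the pure non-label pair `0b011` is removed by the last step.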
5. The distributed multi-label image identification method based on association rules according to claim 1, characterized in that step 6.4 comprises:
Step 6.4.1: selecting the rule R1 with the shortest antecedent in the multi-label frequent classification association rules MLFCAR;
Step 6.4.2: calculating MLACR = MLACR ∪ {R1};
MLFCAR = MLFCAR − {R1};
Step 6.4.3: for each rule R in the multi-label frequent classification association rules MLFCAR, if rule R1 covers rule R, performing
MLFCAR = MLFCAR − {R};
MLACR = MLACR ∪ {R};
Step 6.4.4: if the multi-label frequent classification association rules MLFCAR are not empty, repeating steps 6.4.1 to 6.4.4.
"Rule R1 covers rule R" in step 6.4.3 means that the rules R1: P1 ⇒ Q1 and R: P2 ⇒ Q2 satisfy P1 ⊆ P2 and Q2 ⊆ Q1, wherein P1 and Q1 are respectively the antecedent and the consequent of rule R1, and P2 and Q2 are respectively the antecedent and the consequent of rule R.
6. The distributed multi-label image identification method based on association rules, characterized in that the non-label attributes of step 3 include average, variance, gradient, kurtosis, energy, entropy, and cluster feature.
CN201610141659.8A 2016-03-11 2016-03-11 Association-rule-based distributed multi-label image identification method Pending CN105825226A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610141659.8A CN105825226A (en) 2016-03-11 2016-03-11 Association-rule-based distributed multi-label image identification method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201610141659.8A CN105825226A (en) 2016-03-11 2016-03-11 Association-rule-based distributed multi-label image identification method

Publications (1)

Publication Number Publication Date
CN105825226A true CN105825226A (en) 2016-08-03

Family

ID=56987917

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610141659.8A Pending CN105825226A (en) 2016-03-11 2016-03-11 Association-rule-based distributed multi-label image identification method

Country Status (1)

Country Link
CN (1) CN105825226A (en)

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106503575A (en) * 2016-09-22 2017-03-15 广东工业大学 A kind of Mining Association Rules in Distributed Environments method for protecting privacy
CN106529580A (en) * 2016-10-24 2017-03-22 浙江工业大学 EDSVM-based software defect data association classification method
CN106875367A (en) * 2017-03-15 2017-06-20 中山大学 A kind of automatic delineation method in primary lesion of nasopharyngeal carcinoma clinic target area based on mutual correlation rule
CN107092591A (en) * 2017-03-30 2017-08-25 南京理工大学 Multiple labeling Chinese emotional reaction categorization method based on correlation rule
CN110263804A (en) * 2019-05-06 2019-09-20 杭州电子科技大学 A kind of medical image dividing method based on safe semi-supervised clustering
CN110781323A (en) * 2019-10-25 2020-02-11 北京达佳互联信息技术有限公司 Method and device for determining label of multimedia resource, electronic equipment and storage medium
CN110990434A (en) * 2019-11-29 2020-04-10 国网四川省电力公司信息通信公司 Spark platform grouping and Fp-Growth association rule mining method
CN112364933A (en) * 2020-11-23 2021-02-12 北京达佳互联信息技术有限公司 Image classification method and device, electronic equipment and storage medium

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102364498A (en) * 2011-10-17 2012-02-29 江苏大学 Multi-label-based image recognition method
CN103150515A (en) * 2012-12-29 2013-06-12 江苏大学 Association rule mining method for privacy protection under distributed environment
CN104298975A (en) * 2014-10-13 2015-01-21 江苏大学 Distributed image identification method
CN104715258A (en) * 2013-12-17 2015-06-17 镇江金全软件有限公司 Distributed image recognition method based on SVM

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
(美)RAJKUMAR BUYYA编,郑纬民等译: "《高性能集群计算 编程与应用 第2卷》", 31 July 2001, 电子工业出版社 *
王治和等: "分布式关联规则挖掘研究", 《南京师大学报(自然科学版)》 *

Cited By (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106503575A (en) * 2016-09-22 2017-03-15 广东工业大学 A kind of Mining Association Rules in Distributed Environments method for protecting privacy
CN106503575B (en) * 2016-09-22 2019-03-05 广东工业大学 A kind of Mining Association Rules in Distributed Environments method for protecting privacy
CN106529580A (en) * 2016-10-24 2017-03-22 浙江工业大学 EDSVM-based software defect data association classification method
CN106875367A (en) * 2017-03-15 2017-06-20 中山大学 A kind of automatic delineation method in primary lesion of nasopharyngeal carcinoma clinic target area based on mutual correlation rule
CN106875367B (en) * 2017-03-15 2019-09-27 北京思创科泰科技有限公司 A kind of automatic delineation method in primary lesion of nasopharyngeal carcinoma clinic target area based on mutual correlation rule
CN107092591A (en) * 2017-03-30 2017-08-25 南京理工大学 Multiple labeling Chinese emotional reaction categorization method based on correlation rule
CN107092591B (en) * 2017-03-30 2020-06-30 南京理工大学 Multi-label Chinese emotion classification method based on association rule
CN110263804A (en) * 2019-05-06 2019-09-20 杭州电子科技大学 A kind of medical image dividing method based on safe semi-supervised clustering
CN110781323A (en) * 2019-10-25 2020-02-11 北京达佳互联信息技术有限公司 Method and device for determining label of multimedia resource, electronic equipment and storage medium
CN110990434A (en) * 2019-11-29 2020-04-10 国网四川省电力公司信息通信公司 Spark platform grouping and Fp-Growth association rule mining method
CN110990434B (en) * 2019-11-29 2023-04-18 国网四川省电力公司信息通信公司 Spark platform grouping and Fp-Growth association rule mining method
CN112364933A (en) * 2020-11-23 2021-02-12 北京达佳互联信息技术有限公司 Image classification method and device, electronic equipment and storage medium

Similar Documents

Publication Publication Date Title
CN105825226A (en) Association-rule-based distributed multi-label image identification method
CN102364498B (en) Multi-label-based image recognition method
CN106529968B (en) Customer classification method and system based on transaction data
CN109243618B (en) Medical model construction method, disease label construction method and intelligent device
Wu et al. Positive and unlabeled multi-graph learning
CN104715021B (en) A kind of learning method of the Multi-label learning based on hash method
CN106682411A (en) Method for converting physical examination diagnostic data into disease label
CN111694963B (en) Key government affair flow identification method and device based on item association network
CN109376796A (en) Image classification method based on active semi-supervised learning
CN104239553A (en) Entity recognition method based on Map-Reduce framework
CN110688549B (en) Artificial intelligence classification method and system based on knowledge system map construction
CN112926045B (en) Group control equipment identification method based on logistic regression model
WO2018113370A1 (en) Method, device, and system for increasing users
CN109858518A (en) A kind of large data clustering method based on MapReduce
CN113452802A (en) Equipment model identification method, device and system
CN102722578B (en) Unsupervised cluster characteristic selection method based on Laplace regularization
CN115098650A (en) Comment information analysis method based on historical data model and related device
CN104699851A (en) Service tag extension method in big data environment
CN103929499B (en) A kind of Internet of Things isomery index identification method and system
Xiao et al. Superpixel-guided two-view deterministic geometric model fitting
CN107729377A (en) Customer classification method and system based on data mining
CN109857892B (en) Semi-supervised cross-modal Hash retrieval method based on class label transfer
CN110347827A (en) Event Distillation method towards isomery text operation/maintenance data
Goyal et al. Leaf Bagging: A novel meta heuristic optimization based framework for leaf identification
Zhu et al. Dynamic Hierarchical Markov Random Fields for Integrated Web Data Extraction.

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20160803

WD01 Invention patent application deemed withdrawn after publication