CN105825226A

CN105825226A - Association-rule-based distributed multi-label image identification method

Info

Publication number: CN105825226A
Application number: CN201610141659.8A
Authority: CN
Inventors: 彭彦; 朱玉全; 李竞; 何峰; 余飞
Original assignee: Jiangsu Changyuan Information Technology Co ltd
Current assignee: Jiangsu Changyuan Information Technology Co ltd
Priority date: 2016-03-11
Filing date: 2016-03-11
Publication date: 2016-08-03

Abstract

The invention discloses a distributed multi-label image recognition method based on association classification rules. The method comprises the steps of distributed image sample preprocessing, image segmentation, feature extraction, feature value discretization, mining of the global frequent itemset L, construction of the multi-label association classification rules (MLACR), and image recognition. A binary-form method for generating and pruning the global candidate itemsets and for calculating their support counts is employed, which reduces the implementation difficulty and the communication traffic of the algorithm. Pruning is carried out twice, so that the scale of the global candidate itemsets is reduced markedly and the execution efficiency of the algorithm is further improved. A reduction method prevents redundant rules from occurring in the MLACR. The multiple labels contained in an image can be identified in a single pass when the training samples are distributed; the global candidate frequent itemsets and their support counts can be calculated rapidly in a distributed environment; and accurate, efficient multi-label image recognition is achieved.

Description

A distributed multi-label image recognition method based on association rules

Technical field

The invention belongs to the field of computer analysis of multi-label images, and specifically relates to a recognition method for distributed multi-label images.

Background technology

Multi-label image recognition is an important research branch of data mining. It aims to construct a classification function, or classifier, from a training image sample data set and to use it to identify the label set of an image under test. Multi-label classification methods currently usable for image recognition include ML-KNN, C4.5, BP-MLL, the PT series and its improvements, PPT, PPT-n, MMAC, RAKEL, RPC, CLR, INSDIF, MLRW, and ML-CMBAR2. ML-KNN is a KNN-based multi-label classification method proposed by Zhang M.L. et al. It derives the prior probability of each label by statistical means; when image data x under test is input, it calculates, for each label s in the label set S, the probability that x has label s and the probability that x does not have label s, and then predicts whether x has label s. BP-MLL defines a global optimization function for multi-label image data so that an artificial neural network can process multi-label data. The PT series of algorithms tries to solve the multi-label classification problem with existing single-label classification methods: every training sample in the training image sample data set that carries multiple labels is converted in one pass into single-label data, so that after this processing the algorithm faces a single-label sample set, and the multi-label classification problem is thereby converted into a single-label classification problem. To address the uncontrollable number of new labels in the PT method, the algorithms PPT, PPT-n, and RAKEL propose a series of treatments: PPT and PPT-n reduce the number of new labels by setting a threshold, while RAKEL reduces it by random selection. The algorithms RPC and CLR compare the relation between every pair of labels in the label set S and build k(k-1)/2 classifiers; each classifier votes between two labels, and the voting results are combined as the final multi-label classification result. ML-CMBAR2 is a multi-label classification method based on association rules.

In many practical applications the data themselves are distributed, and apart from exchanging information over the network, the sites share no other resources. One feasible solution is to gather these data sets on a single machine and build a multi-label classifier with an existing algorithm, or to use the MapReduce programming model to build the multi-label classifier in the distributed environment. This line of thinking generally has at least two problems: first, a computer of high (or very high) performance is needed to store and process the large volume of data; second, in many cases the data cannot be brought together for reasons of information security and privacy. For this reason, the present invention proposes a distributed multi-label image recognition method based on association rules. The method constructs a multi-label classifier by discovering the association rules in the training image sample data sets in a distributed environment, and thereby achieves automatic image recognition.

Summary of the invention

An object of the invention is to provide a method that identifies, in a single pass, an image containing multiple labels. The method can quickly generate the global candidate frequent itemsets in a distributed environment and calculate their support counts, achieving accurate and efficient multi-label image recognition.

The technical solution of the invention is a distributed multi-label image recognition method based on association rules, including generation of the global candidate frequent itemsets, calculation of support counts, and an image recognition step, characterized in that the generation of the global candidate frequent itemsets, the calculation of support counts, and the image recognition step comprise:

Step 1: preparation and preprocessing of the distributed image sample data sets, including format conversion, scale normalization, denoising, and enhancement of the training images at each site;

Step 2: each site uses a density-clustering-based image segmentation method to identify the region to be recognized in every training image;

Step 3: each site extracts the features of the region to be recognized in every training image and constructs its training image sample database DB_i. The schema of the image sample data set DB_i is the relation schema R(A₁, …, A_p, B₁, …, B_q), where p and q are the numbers of non-label attributes and label attributes respectively, A₁, …, A_p are the names of the non-label attributes, B₁, …, B_q are the names of the label attributes, and i = 1, 2, …, n. The total training sample set is DB = DB₁ ∪ DB₂ ∪ … ∪ DB_n, with DB_i ∩ DB_j = Φ for i ≠ j;

Step 4: feature value discretization, in which each site discretizes its continuous attributes;

Step 5: mining of the global frequent itemset L;

Step 6: construction of the multi-label association classification rules MLACR, which is divided into construction of the multi-label frequent association classification rules MLFCAR and generation of the multi-label association classification rules MLACR, and comprises:

Step 6.1: construct the antecedent P and consequent Q of each multi-label frequent association classification rule, where the antecedent is the set of non-label attributes contained in a global frequent itemset of L and the consequent is the set of label attributes contained in that itemset;

Step 6.2: calculate the confidence of each classification rule in MLFCAR, where the confidence of a rule P ⇒ Q is Count(P ∪ Q)/Count(P);

Step 6.3: delete from MLFCAR the classification rules whose confidence is less than minconf to obtain the final MLFCAR, where minconf is the minimum confidence threshold;

Step 6.4: reduce MLFCAR to obtain the multi-label association classification rules MLACR;

Step 7: image recognition.

The mining of the global frequent itemset L in step 5 comprises:

Step 5.1: initialization, comprising:

Step 5.1.1: select one of the n sites S₁, S₂, …, S_n, or another independent machine, as the host computer (denoted site S);

Step 5.1.2: each site converts its training image sample data DB_i into a database in binary form, still denoted DB_i;

Step 5.1.3: set the non-label attribute set NLA and the label attribute set LA of DB, NLA = {A₁, …, A_p}, LA = {B₁, …, B_q};

Step 5.1.4: each site counts the support count of each attribute in NLA and LA and sends the support counts to the host S;

Step 5.1.5: the host calculates the global frequent 1-itemsets over the label attributes, LL₁ = {c ∈ LA | Sup(c) ≥ minsup};

Step 5.1.6: the host calculates the global frequent 1-itemsets over the non-label attributes, NLL₁ = {c ∈ NLA | Sup(c) ≥ minsup};

Step 5.1.7: the host calculates the global frequent 1-itemsets of DB, L₁ = LL₁ ∪ NLL₁;

Here minsup is the given minimum support threshold; c is a given itemset; Count(c) is the support count, i.e. the number of times itemset c occurs in DB; Sup(c) is the support, Sup(c) = Count(c)/|DB|, where |DB| is the number of samples in the total training image sample data set DB;
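As an illustration of steps 5.1.4 to 5.1.7, the following minimal Python sketch shows how a host might total the per-site support counts and apply minsup. The function name `global_frequent_1_itemsets`, the attribute names, and the counts are hypothetical and chosen only for this example; they are not taken from the patent.

```python
from collections import Counter

def global_frequent_1_itemsets(site_counts, total_samples, minsup):
    """Sum per-site support counts and keep attributes with Sup(c) >= minsup."""
    total = Counter()
    for counts in site_counts:      # one dict of local counts per site S_i
        total.update(counts)        # Count(c) = sum of the local counts
    return {c for c, n in total.items() if n / total_samples >= minsup}

# Hypothetical local counts reported by three sites holding 100 samples in total.
site_counts = [
    {"A1": 10, "A2": 3, "B1": 9},
    {"A1": 8, "A2": 4, "B1": 7},
    {"A1": 7, "A2": 2, "B1": 6},
]
NLA, LA = {"A1", "A2"}, {"B1"}      # non-label and label attribute sets

L1 = global_frequent_1_itemsets(site_counts, total_samples=100, minsup=0.2)
NLL1, LL1 = L1 & NLA, L1 & LA       # split L1 into its two parts
```

In this sketch only the counts travel to the host, which matches the stated goal of keeping the raw samples at their sites.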

Step 5.2: generation of the global candidate frequent itemsets (performed by the host S), comprising:

Step 5.2.1: generate the global candidate frequent (k+1)-itemsets C_(k+1) from the global frequent k-itemsets L_k, where k is the length of a global frequent itemset;

Step 5.2.2: each site calculates the support count of each itemset in C_(k+1);

Step 5.2.3: the host totals the support count of each itemset in C_(k+1) and generates the global frequent (k+1)-itemsets L_(k+1) according to the minimum support threshold minsup;

Step 5.2.4: repeat steps 5.2.1, 5.2.2, and 5.2.3; when the generated set of global candidate frequent itemsets is empty, go to step 5.3;

Step 5.3: generate the global frequent itemset L.

The image recognition performed by the host in step 7 comprises:

Step 7.1: preparation and preprocessing of the image to be recognized, including format conversion, scale normalization, denoising, and enhancement;

Step 7.2: the host uses the density-clustering-based image segmentation method to identify the region to be recognized in the image;

Step 7.3: extract the non-label attribute features of the region to be recognized in the image;

Step 7.4: discretize the non-label attribute feature values;

Step 7.5: identify the label attributes of the image according to the multi-label association classification rules MLACR.

Generating the global candidate frequent (k+1)-itemsets C_(k+1) from the global frequent k-itemsets L_k in step 5.2.1 comprises:

Step 5.2.1.1: select any two distinct itemsets c₁ and c₂ in L_k; if the bitwise OR of c₁ and c₂ contains exactly k+1 ones, then C_(k+1) = C_(k+1) ∪ {c₁ ∪ c₂};

Step 5.2.1.2: repeat step 5.2.1.1 until all pairs of itemsets have been compared, giving the global candidate frequent (k+1)-itemsets C_(k+1);

Step 5.2.1.3: for any itemset c in C_(k+1), if c has a subset c₃ of length k with c₃ ∉ L_k, delete c;

Step 5.2.1.4: delete from C_(k+1) the itemsets that contain only label attributes or only non-label attributes.
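The candidate generation and the two pruning passes above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the patent's implementation: each item is assumed to occupy one bit of an integer bitmask, the names `generate_candidates`, `label_mask`, and `nonlabel_mask` are hypothetical, and the subset check only considers k-subsets that mix label and non-label items, on the interpretation that pure subsets were already removed from L_k by the second pruning pass of the previous round.

```python
from itertools import combinations

def popcount(x):
    """Number of 1 bits in x."""
    return bin(x).count("1")

def generate_candidates(Lk, k, label_mask, nonlabel_mask):
    """Build C_(k+1) from L_k, with itemsets stored as integer bitmasks."""
    frequent = set(Lk)

    def is_mixed(c):
        return bool(c & label_mask) and bool(c & nonlabel_mask)

    # Join step: OR every pair and keep results with exactly k+1 ones.
    joined = {c1 | c2 for c1, c2 in combinations(frequent, 2)
              if popcount(c1 | c2) == k + 1}
    result = set()
    for c in joined:
        # First pruning pass: each mixed k-subset (c with one bit cleared)
        # must itself be frequent. Skipping pure subsets is an assumption.
        subsets = [c & ~(1 << i) for i in range(c.bit_length()) if (c >> i) & 1]
        if any(is_mixed(s) and s not in frequent for s in subsets):
            continue
        # Second pruning pass: drop label-only or non-label-only itemsets.
        if not is_mixed(c):
            continue
        result.add(c)
    return result

# Hypothetical items: two non-label attributes and two labels, one bit each.
A1, A2, B1, B2 = 0b0001, 0b0010, 0b0100, 0b1000
NONLABEL, LABEL = A1 | A2, B1 | B2

C2 = generate_candidates({A1, A2, B1, B2}, 1, LABEL, NONLABEL)
C3 = generate_candidates({A1 | B1, A2 | B1, A1 | B2}, 2, LABEL, NONLABEL)
C4 = generate_candidates({A1 | A2 | B1, A1 | B1 | B2}, 3, LABEL, NONLABEL)
```

Here `C2` keeps only the four label/non-label pairs, and `C4` is empty because the candidate with all four bits set has a 3-subset that is not in the supplied L₃.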

Reducing MLFCAR to obtain the multi-label association classification rules MLACR in step 6.4 comprises:

Step 6.4.1: select the rule R₁ with the shortest antecedent in MLFCAR;

Step 6.4.2: calculate MLACR = MLACR ∪ {R₁};

MLFCAR = MLFCAR - {R₁};

Step 6.4.3: for each rule R in MLFCAR, if rule R₁ covers rule R, perform

MLFCAR = MLFCAR - {R};

MLACR = MLACR ∪ {R};

Step 6.4.4: if MLFCAR is not empty, repeat steps 6.4.1 to 6.4.4.

"Rule R₁ covers rule R" in step 6.4.3 refers to a relation between the multi-label association classification rules R₁: P₁ ⇒ Q₁ and R: P₂ ⇒ Q₂ that satisfy the covering condition, where P₁ and Q₁ are the antecedent and consequent of rule R₁, and P₂ and Q₂ are the antecedent and consequent of rule R.
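The reduction loop of step 6.4 might look like the following Python sketch. Two points are assumptions rather than statements of the text: the covering test is taken to be P₁ ⊆ P₂ and Q₂ ⊆ Q₁, which agrees with the worked embodiment in which the rule with the longer antecedent and the smaller label set is the redundant one; and covered rules are discarded instead of being added to MLACR, since the stated purpose of the reduction is to leave no redundant rules. The names `covers` and `reduce_rules` and the sample rules are hypothetical.

```python
def covers(r1, r):
    """Assumed covering test: P1 is contained in P2 and Q2 is contained in Q1."""
    (p1, q1), (p2, q2) = r1, r
    return p1 <= p2 and q2 <= q1

def reduce_rules(mlfcar):
    """Repeatedly move the shortest-antecedent rule to MLACR and drop rules it covers."""
    remaining = list(mlfcar)
    mlacr = []
    while remaining:                                   # step 6.4.4
        r1 = min(remaining, key=lambda r: len(r[0]))   # step 6.4.1
        mlacr.append(r1)                               # step 6.4.2
        remaining.remove(r1)
        # Step 6.4.3 (assumed reading): covered rules are removed as redundant.
        remaining = [r for r in remaining if not covers(r1, r)]
    return mlacr

# Hypothetical rules as (antecedent, consequent) pairs of frozensets.
rule_a = (frozenset({"a1", "a2", "a3"}), frozenset({"b2"}))
rule_b = (frozenset({"a1", "a2"}), frozenset({"b1", "b2"}))
rule_c = (frozenset({"a4"}), frozenset({"b2"}))

reduced = reduce_rules([rule_a, rule_b, rule_c])
```

With these inputs `rule_a` is dropped because `rule_b` has a shorter antecedent and predicts a superset of its labels.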

The non-label attributes of step 3 include average, variance, gradient, kurtosis, energy, entropy, and cluster feature.

The main beneficial effect of the invention is that the multiple labels contained in an image can be identified in a single pass when the training samples are distributed, and that corresponding optimized solutions are given for the construction of the global candidate frequent itemsets and for the post-processing of the rules during recognition. This is mainly reflected in:

(1) Generation of the global candidate itemsets

For the problem of generating global candidate itemsets in association rule mining, the invention proposes binary-form methods for generating and pruning the global candidate itemsets and for calculating their support counts. Describing the training samples and the global candidate itemsets in binary form simplifies the generation and transmission of the global candidate itemsets and the calculation of support counts, and reduces the implementation difficulty of the algorithm. In addition, two pruning operations markedly reduce the scale of the global candidate itemsets and further improve the execution efficiency of the algorithm.

(2) Post-processing of the multi-label association classification rules

The multi-label frequent association classification rules MLFCAR may have two kinds of problems: first, MLFCAR may contain rules that cover one another; second, MLFCAR may contain mutually conflicting rules. For this, the invention gives a reduction method for multi-label association classification rules. The method ensures that MLACR contains no redundant rules, which greatly eases the use of the rules and further improves the effectiveness and operability of the invention.

Brief description of the drawings

Fig. 1 is a structural block diagram of the embodiment of the invention.

Fig. 2 is a flow chart of global frequent itemset mining in the embodiment of the invention.

Fig. 3 is a flow chart of the construction of the multi-label association classification rule base in the embodiment of the invention.

Detailed description of the invention

Let the n sites in the distributed environment be S₁, S₂, …, S_n. Apart from exchanging information over the network, their other resources (such as hard disks and memory) are all independent. The training image sample data set at site S_i (i = 1, 2, …, n) is DB_i, the total training sample set is DB = DB₁ ∪ DB₂ ∪ … ∪ DB_n, and DB_i ∩ DB_j = Φ for i ≠ j. R(A₁, …, A_p, B₁, …, B_q) is the relation schema of DB_i, where p and q are the numbers of non-label and label attributes respectively, A₁, …, A_p are the names of the non-label attributes, and B₁, …, B_q are the names of the label attributes. As shown in Fig. 1, the method mainly includes the following:

(1) Preprocessing

Each site prepares its training image sample data set and performs format conversion, scale normalization, denoising, and enhancement.

(2) Image segmentation

Each site uses a density-clustering-based image segmentation method to identify the region to be recognized in every training image.

(3) Feature extraction

Each site extracts the features of the region to be recognized in every training image and constructs its training image sample database DB_i, i = 1, 2, …, n.

(4) Feature value discretization

Each site discretizes its continuous attributes, and the non-label attributes are unified as follows:

1. Numerical attribute values are discretized by interval and mapped onto the consecutive integers {0, 1, 2, …}; see the embodiment for details.

2. Categorical attribute values are sorted in lexicographic order and mapped onto the consecutive integers {0, 1, 2, …}.

(5) Mining of the global frequent itemset L

Let the minimum support threshold be minsup. For a given itemset c, its support count is denoted Count(c) and its support is denoted Sup(c), with Sup(c) = Count(c)/|DB|, where |DB| is the number of samples in the training image sample data set DB. As shown in Fig. 2, mining of the global frequent itemset L is divided into initialization, generation of the global candidate frequent itemsets, and calculation of the support counts of the global candidate frequent itemsets.

1. Initialization

Initialization comprises the following steps:

a. Select one of the n sites S₁, S₂, …, S_n, or another independent machine, as the host computer (denoted site S); this machine totals the support counts of the global frequent itemsets;

b. Each site converts its training image sample database DB_i into a database in binary form, still denoted DB_i. For example, with p = 5 and q = 2, the record R₁(A₁=1, A₂=0, A₃=1, A₄=1, A₅=1, B₁=1, B₂=1) becomes 1011111 after conversion;

c. Set the non-label attribute set NLA and the label attribute set LA of DB, NLA = {A₁, …, A_p}, LA = {B₁, …, B_q};

d. Each site counts the support count of each attribute in NLA and LA and sends the support counts to the host S;

e. LL₁ = {c ∈ LA | Sup(c) ≥ minsup}; // LL₁ is the set of global frequent 1-itemsets over the label attributes, calculated by and stored on the host S

f. NLL₁ = {c ∈ NLA | Sup(c) ≥ minsup}; // NLL₁ is the set of global frequent 1-itemsets over the non-label attributes, calculated by and stored on the host S

g. L₁ = LL₁ ∪ NLL₁; // L₁ is the set of global frequent 1-itemsets of DB, stored on the host S
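The binary conversion in step b and the local counting in step d can be sketched in Python as follows. The helper names `to_binary` and `local_support_counts` are hypothetical, and the sketch assumes each attribute is already a 0/1 value as in the record R₁ example.

```python
def to_binary(record, attributes):
    """Concatenate the 0/1 value of each attribute in schema order."""
    return "".join(str(record[a]) for a in attributes)

def local_support_counts(binary_db, attributes):
    """For each attribute, count the records at this site whose bit is 1."""
    return {a: sum(row[i] == "1" for row in binary_db)
            for i, a in enumerate(attributes)}

attributes = ["A1", "A2", "A3", "A4", "A5", "B1", "B2"]   # p = 5, q = 2
r1 = {"A1": 1, "A2": 0, "A3": 1, "A4": 1, "A5": 1, "B1": 1, "B2": 1}

db_i = [to_binary(r1, attributes)]          # one-record site database
counts = local_support_counts(db_i, attributes)
```

The dictionary `counts` is what a site would send to the host S in step d.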

2. Generation of the global candidate frequent itemsets (performed by the host S)

Let L_k be the set of global frequent itemsets of length k, each in binary form. Generating the global candidate frequent (k+1)-itemsets C_(k+1) from L_k comprises the following steps:

a. C_(k+1) = Φ; // C_(k+1) is the set of binary numbers corresponding to all global candidate frequent (k+1)-itemsets, initially empty

3. Calculation of the support counts of the itemsets in C_(k+1)

The host S sends the global candidate itemsets in C_(k+1) directly to each site. Each site S_i (i = 1, 2, …, n) calculates the support count of each itemset in C_(k+1) by the following steps:

4. Generation of the global frequent itemset L

Generating the global frequent itemset L comprises the following steps:

a. The host S receives the support counts from each site;

b. Calculate the support count of each global candidate itemset in C_(k+1);

c. L_(k+1) = {c ∈ C_(k+1) | Sup(c) ≥ minsup}

d. L = L₁ ∪ L₂ ∪ …;

(6) Construction of the multi-label association classification rules MLACR (performed by the host S)

Let the minimum confidence threshold be minconf. The construction of MLACR is divided into construction of the multi-label frequent association classification rules MLFCAR and generation of the multi-label association classification rules MLACR.

1. Construction of the multi-label frequent association classification rules MLFCAR

The construction of MLFCAR comprises the following steps:

2. Generation of the multi-label association classification rules MLACR

The generation of MLACR comprises the following steps:

Definition 1: for two given multi-label association classification rules R₁: P₁ ⇒ Q₁ and R₂: P₂ ⇒ Q₂, if they satisfy the covering condition, rule R₁ is said to cover rule R₂.

(7) Image recognition (performed by the host)

For an image t whose label set is unknown, the recognition process comprises the following steps:

1. Preprocessing

Image t undergoes format conversion, scale normalization, denoising, enhancement, and similar processing.

2. Image segmentation

The density-clustering-based image segmentation method is used to identify the region to be recognized in image t.

3. Feature extraction

Extract the features of the region to be recognized in image t.

4. Feature value discretization

For the specific method of feature value discretization, see step (4).

5. Image recognition

Let the discrete feature values obtained for image t after the above four steps be V_t, V_t = (t.A₁=t₁, …, t.A_i=t_i, …, t.A_p=t_p). The recognition process of image t comprises the following steps:

The execution of the invention is explained below using distributed medical images as an embodiment. This example selects 100 medical images in total, distributed over three independent sites: sites 1, 2, and 3 hold 35, 35, and 30 sample medical images respectively, and there is a separate independent host S. Here q = 4, and B₁, B₂, B₃, B₄ are disease 1, disease 2, disease 3, and disease 4 respectively. The specific steps are as follows:

(1) Each site performs format conversion, scale normalization, denoising, and enhancement on its share of the 100 medical images.

(2) Each site segments every medical image, extracts the relevant features of the region to be recognized, and normalizes them; the result is shown in Table 1. The features extracted in this example are average, variance, gradient, kurtosis, energy, entropy, and cluster feature, i.e. p = 7, and A₁, A₂, A₃, A₄, A₅, A₆, A₇ are average, variance, gradient, kurtosis, energy, entropy, and cluster feature respectively.

(3) Numerical attribute discretization. Each site discretizes each attribute in Table 1, using equal-width partitioning, equal-depth partitioning, distance-based partitioning, or a similar method. This example uses equal-width partitioning: the interval from 0 to 1 is divided into 20 parts, namely (0.00, 0.05], (0.05, 0.10], …, (0.95, 1.00]. For example, the discretized values of the fourth record {0.3974, 0.4812, 0.5222, 0.4316, 0.1525, 0.7633, 0.6608} are {(0.35, 0.40], (0.45, 0.50], (0.50, 0.55], (0.40, 0.45], (0.15, 0.20], (0.75, 0.80], (0.65, 0.70]}.

Table 1: Medical image feature table

(4) Integer mapping of discrete intervals. Each site maps the discrete intervals of the numerical attributes to consecutive integer identifiers: (0.00, 0.05], (0.05, 0.10], …, (0.95, 1.00] are mapped to 1, 2, 3, …, 20 respectively, so the fourth record becomes {08, 10, 11, 09, 04, 16, 14}. After this processing, Table 1 is converted into the form of Table 2.

(5) Attribute binarization. Each site converts the discretized attribute values to binary, so that Table 2 is converted into the form of Table 3. These binary numbers reside at each site; their purpose is to ease the calculation of the support counts of the global candidate itemsets.

Table 2: Result after integer mapping of discrete intervals

Table 3: Result after attribute binarization
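The equal-width discretization, integer mapping, and binarization of steps (3) to (5) can be reproduced for the fourth record with a short Python sketch. The function name `interval_index` is hypothetical, and clamping the result to the range 1 to 20 is an assumption for values at or outside the interval boundaries, which the text does not discuss.

```python
import math

def interval_index(x, bins=20):
    """Map x in (0, 1] to the 1-based index of its equal-width interval."""
    # ceil(x * bins) puts (0.35, 0.40] at index 8; clamping is an assumption.
    return min(bins, max(1, math.ceil(x * bins)))

record4 = [0.3974, 0.4812, 0.5222, 0.4316, 0.1525, 0.7633, 0.6608]

codes = [interval_index(x) for x in record4]       # integer identifiers
binary = [format(c, "05b") for c in codes]         # 5-bit binary strings
```

Five bits are enough for the identifiers 1 to 20, which matches the five-digit codes such as average=01000 used in the rest of the embodiment.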

(5) Mining of the global frequent itemset L

Let the minimum support threshold minsup be 0.2. The global frequent itemsets L to be generated contain at least one of {average, variance, gradient, kurtosis, energy, entropy, cluster feature} and at least one of {disease 1, disease 2, disease 3, disease 4}. The details are as follows:

1. Each site scans its result table after attribute binarization to obtain the support count of each attribute value and transmits it to the host S. The host S totals the support count of each attribute value and generates the global frequent 1-itemsets L₁ according to minsup: L₁ = {{average=01000}, {average=01011}, {average=01010}, {variance=01011}, {variance=01010}, {variance=01001}, {gradient=01011}, {gradient=01100}, {kurtosis=01001}, {kurtosis=01000}, {energy=00011}, {energy=00100}, {cluster feature=01100}, {cluster feature=01110}, {disease 1}, {disease 2}, {disease 3}, {disease 4}}. For example, if the support count of itemset {average=01000} after scanning and transmission is 25, i.e. Count({average=01000}) = 25, then Sup({average=01000}) = Count({average=01000})/|DB| = 25/100 = 0.25. Since Sup({average=01000}) > minsup, itemset {average=01000} is a global frequent itemset. The remaining itemsets are handled likewise;

2. The host S generates from L₁ the global candidate frequent 2-itemsets C₂ that contain both label and non-label attributes: C₂ = {{average=01000, disease 1}, {average=01000, disease 2}, {average=01000, disease 3}, {average=01000, disease 4}, {average=01011, disease 1}, {average=01011, disease 2}, {average=01011, disease 3}, {average=01011, disease 4}, …, {gradient=01100, disease 1}, {gradient=01100, disease 2}, {gradient=01100, disease 3}, {gradient=01100, disease 4}, …}.

3. The host S sends C₂ to sites 1, 2, and 3. So that every site can interpret what is transmitted, the global candidate frequent itemsets are processed before transmission; for example, {gradient=01100, disease 1} is processed into 000000000001100000000000000000000001000, so that no site needs to understand the meaning of each bit.

4. Each site scans its result table after attribute binarization to obtain the support count of each itemset of C₂ at that site (only an OR operation is needed). Suppose record 001 at site 1 is 010100100101011010010000110100011001010. Since 010100100101011010010000110100011001010 OR 000000000001100000000000000000000001000 ≠ 010100100101011010010000110100011001010, this record does not support the itemset. When the calculation is finished, each site transmits its results to the host S.

5. The host generates the global frequent 2-itemsets L₂ according to minsup: L₂ = {{average=01000, disease 2}, {variance=01011, disease 2}, {gradient=01011, disease 2}, …, {average=01000, disease 1}, {kurtosis=01001, disease 4}, …}.

6. From L₂, generate the global candidate frequent 3-itemsets C₃ that contain both label and non-label attributes and send C₃ to sites 1, 2, and 3. Each site scans its table once to obtain the support count of each itemset of C₃ at that site and transmits it to the host S, which generates the global frequent 3-itemsets L₃ according to minsup. L₄, L₅, …, L_k are obtained in the same way. The termination condition is that the candidate (k+1)-itemsets C_(k+1) generated from L_k that contain both label and non-label attributes form an empty set.

7. Collecting the above results gives the global frequent itemset L:

L = {{average=01000, variance=01011, gradient=01011, kurtosis=01001, energy=00011, cluster feature=01100, disease 2}, {average=01011, variance=01010, gradient=01100, kurtosis=00110, disease 4}, {average=01000, variance=01011, gradient=01011, kurtosis=01001, disease 1, disease 2, disease 4}, {average=01010, variance=01001, gradient=01100, kurtosis=01000, energy=00100, cluster feature=01110, disease 2, disease 4}, …}.
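The itemset encoding of step 3 and the OR-based support test of step 4 can be sketched in Python as follows. The layout assumed here, seven 5-bit feature fields followed by four label bits, is inferred from the 39-bit strings in the example; the names `encode`, `supports`, and `local_support_count` are hypothetical.

```python
FEATURES = ["average", "variance", "gradient", "kurtosis",
            "energy", "entropy", "cluster feature"]             # p = 7, 5 bits each
LABELS = ["disease 1", "disease 2", "disease 3", "disease 4"]   # q = 4, 1 bit each

def encode(values, labels):
    """Build the bit string: one 5-bit code per feature, then one bit per label."""
    feature_bits = "".join(format(values.get(f, 0), "05b") for f in FEATURES)
    label_bits = "".join("1" if name in labels else "0" for name in LABELS)
    return feature_bits + label_bits

def supports(record_bits, itemset_bits):
    """OR test from the text: supported iff OR-ing in the itemset changes nothing."""
    record, itemset = int(record_bits, 2), int(itemset_bits, 2)
    return record | itemset == record

def local_support_count(site_records, itemset_bits):
    """Number of records at one site that pass the OR test."""
    return sum(supports(r, itemset_bits) for r in site_records)

itemset = encode({"gradient": 0b01100}, {"disease 1"})
record_001 = "010100100101011010010000110100011001010"   # record 001 at site 1

rejected = supports(record_001, itemset)                  # gradient field differs
accepted = supports(record_001, encode({}, {"disease 1"}))
```

The sketch follows the text literally: the OR comparison is a bit-subset test on the encoded strings, not a field-by-field equality check.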

The construction of the multi-label association classification rules is divided into construction of the multi-label frequent association classification rules MLFCAR and generation of the multi-label association classification rules MLACR.

Let the minimum confidence threshold minconf be 0.6. The construction of MLFCAR comprises the following steps:

1. Construct the antecedent and consequent of each classification rule in MLFCAR. The antecedent is the set of non-label attributes contained in a global frequent itemset of L, and the consequent is the set of label attributes contained in that itemset. For example, the antecedent and consequent of itemset {average=01000, variance=01011, gradient=01011, kurtosis=01001, energy=00011, cluster feature=01100, disease 2} are {average=01000, variance=01011, gradient=01011, kurtosis=01001, energy=00011, cluster feature=01100} and {disease 2} respectively; those of itemset {average=01011, variance=01010, gradient=01100, kurtosis=00110, disease 4} are {average=01011, variance=01010, gradient=01100, kurtosis=00110} and {disease 4}; and those of itemset {average=01000, variance=01011, gradient=01011, kurtosis=01001, disease 1, disease 2, disease 4} are {average=01000, variance=01011, gradient=01011, kurtosis=01001} and {disease 1, disease 2, disease 4}. The remaining global frequent itemsets are processed in the same way, giving the initial MLFCAR.

MLFCAR = {{average=01000, variance=01011, gradient=01011, kurtosis=01001, energy=00011, cluster feature=01100} ⇒ {disease 2}, {average=01011, variance=01010, gradient=01100, kurtosis=00110} ⇒ {disease 4}, {average=01000, variance=01011, gradient=01011, kurtosis=01001} ⇒ {disease 1, disease 2, disease 4}, {average=01010, variance=01001, gradient=01100, kurtosis=01000, energy=00100, cluster feature=01110} ⇒ {disease 2, disease 4}, …}

2. Calculate the confidence of each classification rule in MLFCAR. The confidence of a rule P ⇒ Q is Count(P ∪ Q)/Count(P); the values of Count(P ∪ Q) and Count(P) are obtained during the mining of the frequent itemset L. For example, the confidence of the rule {average=01000, variance=01011, gradient=01011, kurtosis=01001, energy=00011, cluster feature=01100} ⇒ {disease 2} is Count({average=01000, variance=01011, gradient=01011, kurtosis=01001, energy=00011, cluster feature=01100, disease 2})/Count({average=01000, variance=01011, gradient=01011, kurtosis=01001, energy=00011, cluster feature=01100}). With Count({average=01000, variance=01011, gradient=01011, kurtosis=01001, energy=00011, cluster feature=01100, disease 2}) = 17 and Count({average=01000, variance=01011, gradient=01011, kurtosis=01001, energy=00011, cluster feature=01100}) = 20, the confidence is 17/20 = 0.85. The confidence of the other classification rules in MLFCAR is calculated in the same way.

3. Delete from MLFCAR the classification rules whose confidence is less than 0.6 to obtain the final MLFCAR:

MLFCAR = {{average=01000, variance=01011, gradient=01011, kurtosis=01001, energy=00011, cluster feature=01100} ⇒ {disease 2}, {average=01000, variance=01011, gradient=01011, kurtosis=01001} ⇒ {disease 1, disease 2, disease 4}, {average=01010, variance=01001, gradient=01100, kurtosis=01000, energy=00100, cluster feature=01110} ⇒ {disease 2, disease 4}, …}

4. Reduce MLFCAR by deleting its redundant rules; for example, the first rule in MLFCAR is redundant. This gives the multi-label association classification rules MLACR:

MLACR = {{average=01000, variance=01011, gradient=01011, kurtosis=01001} ⇒ {disease 1, disease 2, disease 4}, {average=01010, variance=01001, gradient=01100, kurtosis=01000, energy=00100, cluster feature=01110} ⇒ {disease 2, disease 4}, …}
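The confidence calculation and the minconf filter of steps 2 and 3 can be checked with a few lines of Python. The function names `confidence` and `filter_rules` are hypothetical, and the second rule with counts 5 and 10 is invented purely to show a rule being dropped; only the counts 17 and 20 come from the text.

```python
def confidence(count_pq, count_p):
    """Confidence of P => Q, i.e. Count(P ∪ Q) / Count(P)."""
    return count_pq / count_p

def filter_rules(rules, minconf):
    """Keep (antecedent, consequent) pairs whose confidence reaches minconf.

    Each input rule is (antecedent, consequent, Count(P ∪ Q), Count(P)).
    """
    return [(p, q) for p, q, n_pq, n_p in rules
            if confidence(n_pq, n_p) >= minconf]

rule_from_text = ("P1", "disease 2", 17, 20)   # counts given in the example
rule_invented = ("P2", "disease 4", 5, 10)     # hypothetical low-confidence rule

kept = filter_rules([rule_from_text, rule_invented], minconf=0.6)
```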

(7) Image recognition

For an image t whose label set is unknown, preprocessing, image segmentation, feature extraction, and feature value discretization give its discrete feature values V_t.

For example, if V_t = {average=01000, variance=01011, gradient=01011, kurtosis=01001, energy=01010, entropy=01010, cluster feature=01101}, then V_t contains the antecedent of the first rule in MLACR, so the label set of image t is the consequent of that rule, {disease 1, disease 2, disease 4}; that is, this image may simultaneously contain information related to "disease 1", "disease 2", and "disease 4".

As another example, if V_t = {average=01000, variance=01001, gradient=01100, kurtosis=01000, energy=01010, entropy=01010, cluster feature=01110}, no rule in MLACR has an antecedent contained in V_t. In this case the rule whose antecedent has the largest intersection with V_t is taken, i.e. the rule {average=01010, variance=01001, gradient=01100, kurtosis=01000, energy=00100, cluster feature=01110} ⇒ {disease 2, disease 4}, so this image may simultaneously contain information related to "disease 2" and "disease 4".
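The two recognition cases above can be sketched in Python as follows. The function name `recognize` is hypothetical. Returning the first rule whose antecedent is fully contained in V_t, and breaking ties in the fallback by rule order, are assumptions; the text does not say how several matching rules would be combined. The two rules are the ones named in this embodiment.

```python
def recognize(v_t, mlacr):
    """Return the label set predicted for the discrete feature values v_t."""
    items = set(v_t.items())
    # Case 1: some rule's antecedent is fully contained in V_t.
    for antecedent, labels in mlacr:
        if antecedent <= items:
            return labels
    # Case 2: fall back to the rule whose antecedent overlaps V_t the most.
    best = max(mlacr, key=lambda rule: len(rule[0] & items))
    return best[1]

MLACR = [
    (frozenset({("average", "01000"), ("variance", "01011"),
                ("gradient", "01011"), ("kurtosis", "01001")}),
     frozenset({"disease 1", "disease 2", "disease 4"})),
    (frozenset({("average", "01010"), ("variance", "01001"),
                ("gradient", "01100"), ("kurtosis", "01000"),
                ("energy", "00100"), ("cluster feature", "01110")}),
     frozenset({"disease 2", "disease 4"})),
]

v1 = {"average": "01000", "variance": "01011", "gradient": "01011",
      "kurtosis": "01001", "energy": "01010", "entropy": "01010",
      "cluster feature": "01101"}
v2 = {"average": "01000", "variance": "01001", "gradient": "01100",
      "kurtosis": "01000", "energy": "01010", "entropy": "01010",
      "cluster feature": "01110"}

labels_1 = recognize(v1, MLACR)
labels_2 = recognize(v2, MLACR)
```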

This embodiment describes the recognition of medical images. The method can also be applied to other similar image recognition fields, such as the drawings in patents.

Claims

1. A distributed multi-label image recognition method based on association rules, including generation of the global candidate frequent itemsets, calculation of support counts, and an image recognition step, characterized in that the generation of the global candidate frequent itemsets, the calculation of support counts, and the image recognition step comprise:

Step 3: each site extracts the features of the region to be recognized in every training image and constructs its training image sample data set DB_i. The schema of the image sample data set DB_i is the relation schema R(A₁, …, A_p, B₁, …, B_q), where p and q are the numbers of non-label attributes and label attributes respectively, A₁, …, A_p are the names of the non-label attributes, B₁, …, B_q are the names of the label attributes, and i = 1, 2, …, n. The total training sample set is DB = DB₁ ∪ DB₂ ∪ … ∪ DB_n, with DB_i ∩ DB_j = Φ for i ≠ j;

Step 5: mining of the global frequent itemset L;

Step 7: image recognition.

2. The distributed multi-label image recognition method based on association rules according to claim 1, characterized in that step 5 comprises:

Step 5.1: initialization, comprising:

Step 5.3: generate the global frequent itemset L.

3. The distributed multi-label image recognition method based on association rules according to claim 1, characterized in that step 7 comprises:

Step 7.4: discretize the non-label attribute feature values;

4. The distributed multi-label image recognition method based on association rules according to claim 1, characterized in that step 5.2.1 comprises:

5. The distributed multi-label image recognition method based on association rules according to claim 1, characterized in that step 6.4 comprises:

Step 6.4.2: calculate MLACR = MLACR ∪ {R₁};

MLFCAR = MLFCAR - {R₁};

MLFCAR = MLFCAR - {R};

MLACR = MLACR ∪ {R};

6. The distributed multi-label image recognition method based on association rules, characterized in that the non-label attributes of step 3 include average, variance, gradient, kurtosis, energy, entropy, and cluster feature.