CN104715258A

CN104715258A - Distributed image recognition method based on SVM

Info

Publication number: CN104715258A
Application number: CN201310687112.4A
Authority: CN
Inventors: 朱玉全; 陈耿; 孙蕾; 耿霞; 彭晓冰
Original assignee: ZHEJIANG JINQUAN SOFTWARE CO Ltd
Current assignee: ZHEJIANG JINQUAN SOFTWARE CO Ltd
Priority date: 2013-12-17
Filing date: 2013-12-17
Publication date: 2015-06-17

Abstract

The invention discloses a distributed image recognition method based on SVM. The method comprises pre-processing of distributed image samples, image segmentation, feature extraction, inner product calculation, solution of the optimization problem, and image recognition. The method can recognize the class of an image when the training image samples are distributed across sites, and it gives a corresponding solution for constructing the linear classifier used in recognition. The method ensures that the data of one site does not reside on other sites, which protects the security and privacy of the data, while achieving a high recognition accuracy.

Description

A distributed image recognition method based on SVM

Technical field

The invention belongs to the field of computer image analysis, and specifically relates to an image recognition method for a distributed environment.

Background art

SVM (support vector machine) is a data mining technique for solving classification and regression problems. Because the SVM method has many notable advantages and good experimental performance, it has become a focus of machine learning research and has achieved good results in applications such as text classification, handwriting recognition, and image classification and recognition.

In many practical applications the data are themselves distributed: apart from exchanging information over the network, the sites share no resources. Distributed image recognition is an important research branch of distributed data mining. It aims to construct a classification function or classifier from the training image sample data sets in a distributed environment, and to use that classification function or classifier to identify the class of an image under test. To solve image recognition when the training image samples are distributed, one feasible approach is to gather the data sets on a single machine and then construct the classifier with the SVM algorithm, or to construct the classifier in the distributed environment with the MapReduce programming model. In general, this kind of approach has at least two problems. First, it requires a computer with high (or very high) performance to store and process the large volume of data. Second, in many cases centralizing the data is impossible because of data security and privacy concerns. To address this, the invention proposes a distributed image recognition method based on SVM, which discovers the classifier implicit in the training image sample data sets in a distributed environment and thereby recognizes images automatically.

Summary of the invention

The object of the invention is to provide a method for recognizing images when the training image samples are distributed. The method can construct a linear classifier quickly and achieves accurate and efficient image recognition.

The technical scheme of the invention is a distributed image recognition method based on SVM, comprising inner product calculation, solution of the optimization problem, and image recognition steps, characterized in that these steps comprise:

Step 1: preparation and pre-processing of the image sample data sets. Each site separately prepares its training image sample data set and performs format conversion, size normalization, denoising, and enhancement;

Step 2: image segmentation. Each site separately uses a density-based clustering image segmentation method to identify the region to be recognized in each training image;

Step 3: feature extraction. Each site separately extracts the features of the region to be recognized in each training image and constructs its training image sample data set DB_i, i = 1, 2, ..., k. Each sample in DB_i is expressed as (x_1, x_2, ..., x_p, y), where p is the number of non-class attributes, x_1, x_2, ..., x_p are the non-class attributes, and y is the class attribute, whose value is 1 or -1 to denote the two classes;

Step 4: construction of the optimal classification function f(x);

Step 5: image recognition.

The construction of the optimal classification function f(x) in step 4 comprises:

Step 4.1: initialization, comprising:

Step 4.1.1: select an independent computer as the host (denoted site S), which computes the inner products and solves the optimization problem;

Step 4.1.2: set the sizes of two memory blocks on the host, used to receive data from two sites respectively;

Step 4.2: inner product calculation, comprising:

Step 4.2.1: the host requests each site to send training image samples;

Step 4.2.2: the host computes the inner products;

Step 4.3: solution of the optimum (performed by the host), comprising:

Step 4.3.1: find the optimal solution of the mathematical model (1);

$$\min_{w,b}\ \varphi(w) = \frac{1}{2}(w \cdot w) \tag{1}$$

$$\text{s.t.}\quad y_i\,((w \cdot x_i) + b) \ge 1,\quad i = 1, \dots, n$$

Step 4.3.2: convert problem (1) into finding the saddle point of the Lagrangian function (2);

$$L(w, b, \alpha) = \frac{1}{2}(w \cdot w) - \sum_{i=1}^{n} \alpha_i \left[ y_i\,((w \cdot x_i) + b) - 1 \right],\quad \alpha_i \ge 0 \tag{2}$$

Step 4.3.3: convert problem (2) into the optimization problem (3);

$$\max_{\alpha}\ W(\alpha) = \sum_{i=1}^{n} \alpha_i - \frac{1}{2} \sum_{i,j} \alpha_i \alpha_j y_i y_j (x_i \cdot x_j) \tag{3}$$

$$\text{s.t.}\quad \sum_{i=1}^{n} \alpha_i y_i = 0,\quad \alpha_i \ge 0$$

Step 4.3.4: solve problem (3) to obtain the optimal solution α* of α;

Step 4.3.5: compute w, where SV is the set of support vectors;

Step 4.3.6: compute b by selecting an α_i* that is not 0 and substituting it to obtain b;
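
The formulas for w and b in steps 4.3.5 and 4.3.6 are not reproduced in this text. The sketch below gives the standard textbook SVM derivation that links (2) to (3) and yields w and b; it is offered as a plausible reading, and the notation w*, b* and the index j are assumptions rather than the patent's own symbols.

```latex
% Stationarity of the Lagrangian (2) in w and b:
\frac{\partial L}{\partial w} = 0 \;\Rightarrow\; w = \sum_{i=1}^{n} \alpha_i y_i x_i ,
\qquad
\frac{\partial L}{\partial b} = 0 \;\Rightarrow\; \sum_{i=1}^{n} \alpha_i y_i = 0 .

% Substituting both conditions into (2) removes w and b and gives the dual (3):
L = \frac{1}{2}(w \cdot w) - (w \cdot w) - b \sum_{i=1}^{n} \alpha_i y_i + \sum_{i=1}^{n} \alpha_i
  = \sum_{i=1}^{n} \alpha_i - \frac{1}{2} \sum_{i,j} \alpha_i \alpha_j y_i y_j (x_i \cdot x_j) = W(\alpha) .

% Only support vectors have nonzero multipliers, so with the optimum \alpha^*:
w^* = \sum_{x_i \in SV} \alpha_i^* y_i x_i ,
\qquad
b^* = y_j - (w^* \cdot x_j) \quad \text{for any } j \text{ with } \alpha_j^* > 0 .
```

The expression for b* follows from complementary slackness: a sample with a nonzero multiplier satisfies its constraint with equality, y_j((w* · x_j) + b*) = 1, and y_j is 1 or -1.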

The image recognition performed by the host in step 5 comprises:

Step 5.1: prepare and pre-process the image to be recognized, including format conversion, size normalization, denoising, and enhancement;

Step 5.2: the host uses the density-based clustering image segmentation method to identify the region to be recognized in the image;

Step 5.3: extract the features of the region to be recognized in the image;

Step 5.4: from the region features x_t = (x_t1, x_t2, ..., x_tp) obtained in step 5.3, compute f(x_t);

Step 5.5: determine the class of the image to be recognized from f(x_t).

The main beneficial effect of the invention is that the class of an image can be recognized when the training image samples are distributed, and a corresponding solution is given for constructing the linear classifier used in recognition. The proposed distributed image recognition method based on SVM ensures that the data of each site does not reside on other sites, which protects the security and privacy of the data, while achieving a high recognition accuracy.

Brief description of the drawings

Fig. 1 is a structural block diagram of an embodiment of the invention.

Fig. 2 is the flow of constructing the optimal classification function f(x) in an embodiment of the invention.

Embodiment

Let the total number of training image samples be n, and let the k sites in the distributed environment be S_1, S_2, ..., S_k. Apart from exchanging information over the network, the sites share no resources (such as hard disks or memory). Site S_i (i = 1, 2, ..., k) holds n_i training image samples, so n_1 + n_2 + ... + n_k = n. A training image sample x is represented by the vector (x_1, x_2, ..., x_p, y), where p is the number of non-class attributes, x_1, x_2, ..., x_p are the non-class attributes, and y is the class attribute, whose value is 1 or -1 to denote the two classes. As shown in Fig. 1, the method mainly comprises the following parts:

(1) Pre-processing

Each site separately prepares its training image sample data set and performs format conversion, size normalization, denoising, and enhancement.

(2) Image segmentation

Each site separately uses a density-based clustering image segmentation method to identify the region to be recognized in each training image.

(3) Feature extraction

Each site separately extracts the features of the region to be recognized in each training image and constructs its training image sample data set DB_i, i = 1, 2, ..., k.
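
The sample representation (x_1, ..., x_p, y) and the per-site data sets DB_i can be pictured with a small data structure. This is only an illustrative sketch: the names `Sample` and `total_samples` and the toy feature values are assumptions, not part of the patent.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Sample:
    """One training sample (x_1, ..., x_p, y)."""
    features: Tuple[float, ...]  # non-class attributes x_1 .. x_p
    label: int                   # class attribute y, either 1 or -1

    def __post_init__(self):
        # The text restricts y to the two values 1 and -1.
        if self.label not in (1, -1):
            raise ValueError("label y must be 1 or -1")


def total_samples(site_datasets: List[List[Sample]]) -> int:
    """n = n_1 + n_2 + ... + n_k over the site data sets DB_1 .. DB_k."""
    return sum(len(db) for db in site_datasets)


# Hypothetical toy data: k = 2 sites, p = 3 features per sample.
db_1 = [Sample((0.48, 0.56, 0.65), 1), Sample((0.25, 0.12, 0.11), -1)]
db_2 = [Sample((0.40, 0.50, 0.60), 1)]
n = total_samples([db_1, db_2])
```

Each site would build its own list locally; only the host-side steps described next need to see samples from more than one site.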

(4) Construction of the optimal classification function f(x)

As shown in Fig. 2, the construction of the optimal classification function f(x) is divided into initialization, inner product calculation, and solution of the optimum.

1. Initialization

Initialization comprises the following steps:

a. Select an independent computer as the host (denoted site S), which computes the inner products and solves the optimization problem;

b. Set the sizes of two memory blocks on the host, used to receive data from two sites respectively; let the sizes be m_1 and m_2, with m_1 > m_2;

2. Inner product calculation

Let each block hold m samples, and denote the numbers of samples on sites S_1, S_2, ..., S_k by |S_1|, |S_2|, ..., |S_k|, assuming |S_1| ≤ |S_2| ≤ ... ≤ |S_k|. The inner product calculation comprises the following steps:

a. for (i = 1; i ≤ k; i++) do begin
b.   while (site i still has unsent samples) do begin
c.     the host requests site i to send m*m_1 samples;
d.     for (j = i+1; j ≤ k; j++) do begin
e.       while (site j still has unsent samples) do begin
f.         the host requests site j to send m*m_2 samples;
g.         the host computes the inner products between the samples sent by site i and by site j and stores them on the host;
h.       end
i.     end
j.   end
k. end
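
The loop above can be sketched in Python as follows. This is a minimal illustration under stated assumptions: sites are modelled as in-memory lists rather than networked machines, the names `cross_site_inner_products`, `blocks` and `dot` are hypothetical, and only the cross-site pairs of step g are computed (inner products within a single block are not part of the loop as written).

```python
from typing import Dict, Iterator, List, Sequence, Tuple

Vector = Sequence[float]
SampleId = Tuple[int, int]  # (site index, position of the sample within that site)


def dot(u: Vector, v: Vector) -> float:
    """Inner product of two feature vectors."""
    return sum(a * b for a, b in zip(u, v))


def blocks(samples: List[Vector], size: int) -> Iterator[Tuple[int, List[Vector]]]:
    """Yield (offset, chunk) pairs; each chunk holds at most `size` samples."""
    for start in range(0, len(samples), size):
        yield start, samples[start:start + size]


def cross_site_inner_products(
    sites: List[List[Vector]], m: int, m1: int, m2: int
) -> Dict[Tuple[SampleId, SampleId], float]:
    """Follow loops a-k: stream blocks from site i and from each later site j."""
    # Visit sites in order of increasing sample count, |S_1| <= ... <= |S_k|.
    order = sorted(range(len(sites)), key=lambda s: len(sites[s]))
    gram: Dict[Tuple[SampleId, SampleId], float] = {}
    for pos, i in enumerate(order):
        for off_i, block_i in blocks(sites[i], m * m1):          # steps b-c
            for j in order[pos + 1:]:                            # step d
                for off_j, block_j in blocks(sites[j], m * m2):  # steps e-f
                    for a, u in enumerate(block_i):              # step g
                        for b, v in enumerate(block_j):
                            gram[((i, off_i + a), (j, off_j + b))] = dot(u, v)
    return gram


# Hypothetical toy input: site 0 holds three samples, site 1 holds one.
toy_sites = [[(1.0, 0.0), (0.0, 1.0), (1.0, 1.0)], [(2.0, 2.0)]]
toy_gram = cross_site_inner_products(toy_sites, m=1, m1=2, m2=1)
```

At any moment the host holds at most one block of m*m_1 samples and one block of m*m_2 samples, which is what the two memory block sizes set during initialization are for; only the resulting inner products accumulate on the host.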

3. Solution of the optimum (performed by the host)

The optimum is solved in the following steps:

a. Find the optimal solution of the mathematical model (1);

$$\min_{w,b}\ \varphi(w) = \frac{1}{2}(w \cdot w) \tag{1}$$

$$\text{s.t.}\quad y_i\,((w \cdot x_i) + b) \ge 1,\quad i = 1, \dots, n$$

b. Convert problem (1) into finding the saddle point of the Lagrangian function (2);

$$L(w, b, \alpha) = \frac{1}{2}(w \cdot w) - \sum_{i=1}^{n} \alpha_i \left[ y_i\,((w \cdot x_i) + b) - 1 \right],\quad \alpha_i \ge 0 \tag{2}$$

c. Convert problem (2) into the optimization problem (3);

$$\max_{\alpha}\ W(\alpha) = \sum_{i=1}^{n} \alpha_i - \frac{1}{2} \sum_{i,j} \alpha_i \alpha_j y_i y_j (x_i \cdot x_j) \tag{3}$$

$$\text{s.t.}\quad \sum_{i=1}^{n} \alpha_i y_i = 0,\quad \alpha_i \ge 0$$

d. Solve problem (3) to obtain the optimal solution α* of α;

e. Compute w, where SV is the set of support vectors;

f. Compute b by selecting an α_i* that is not 0 and substituting it to obtain b;

g. f(x) = (w · x) + b;
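
Steps d to g can be sketched as follows. This is a minimal illustration, assuming linearly separable data: `solve_dual` is a simplified pairwise coordinate-ascent (SMO-style) routine chosen for brevity, not a solver the patent prescribes, and `recover_w_b` uses the standard SVM expressions for w and b. Both function names and the toy data are hypothetical. Note that the solver reads only the stored inner products and the labels.

```python
from typing import List, Sequence, Tuple


def solve_dual(gram: Sequence[Sequence[float]], y: Sequence[int],
               sweeps: int = 200, tol: float = 1e-12) -> List[float]:
    """Maximize W(alpha) of (3) by exact updates on pairs (alpha_i, alpha_j)."""
    n = len(y)
    alpha = [0.0] * n
    for _ in range(sweeps):
        changed = False
        for i in range(n):
            for j in range(i + 1, n):
                eta = gram[i][i] + gram[j][j] - 2.0 * gram[i][j]
                if eta <= tol:
                    continue  # identical samples: no useful direction
                # g_k = sum_l alpha_l y_l (x_l . x_k); the offset b cancels below.
                g_i = sum(alpha[l] * y[l] * gram[l][i] for l in range(n))
                g_j = sum(alpha[l] * y[l] * gram[l][j] for l in range(n))
                e_diff = (g_i - y[i]) - (g_j - y[j])
                new_j = alpha[j] + y[j] * e_diff / eta
                # Keep both multipliers >= 0 while preserving sum_k alpha_k y_k = 0.
                if y[i] != y[j]:
                    new_j = max(new_j, 0.0, alpha[j] - alpha[i])
                else:
                    new_j = min(max(new_j, 0.0), alpha[i] + alpha[j])
                delta = new_j - alpha[j]
                if abs(delta) > tol:
                    alpha[i] -= y[i] * y[j] * delta
                    alpha[j] = new_j
                    changed = True
        if not changed:
            break
    return alpha


def recover_w_b(alpha: Sequence[float], X: Sequence[Sequence[float]],
                y: Sequence[int], tol: float = 1e-9) -> Tuple[List[float], float]:
    """w = sum_i alpha_i y_i x_i; b from one sample with a nonzero multiplier."""
    p = len(X[0])
    w = [sum(alpha[i] * y[i] * X[i][d] for i in range(len(X))) for d in range(p)]
    sv = next(i for i in range(len(X)) if alpha[i] > tol)
    b = y[sv] - sum(w[d] * X[sv][d] for d in range(p))
    return w, b


# Hypothetical two-sample problem with one sample per class.
X_toy = [(1.0, 1.0), (-1.0, -1.0)]
y_toy = [1, -1]
gram_toy = [[sum(a * b for a, b in zip(u, v)) for v in X_toy] for u in X_toy]
alpha_toy = solve_dual(gram_toy, y_toy)
w_toy, b_toy = recover_w_b(alpha_toy, X_toy, y_toy)
```

For this toy problem the dual can be checked by hand: with both multipliers equal to α, W = 2α - 4α², which peaks at α = 0.25 and gives w = (0.5, 0.5) and b = 0. On data that are not linearly separable the dual is unbounded, so the `sweeps` cap is what stops the loop.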

(5) Image recognition (performed by the host)

For an image t with an unknown class label, the recognition process comprises the following steps:

1. Pre-processing

Format conversion, size normalization, denoising, and enhancement are applied to image t.

2. Image segmentation

The density-based clustering image segmentation method is used to identify the region to be recognized in image t.

3. Feature extraction

The features of the region to be recognized in image t are extracted.

4. Image recognition

Let the feature vector obtained for image t by the preceding steps be x_t = (x_t1, x_t2, ..., x_tp). The recognition of image t comprises the following steps:

a. Compute f(x_t);

b. Determine the class of image t from f(x_t);

The implementation of the invention is explained below with a distributed image set as an example. This example uses 52 images distributed over three independent sites; sites 1, 2, and 3 hold 20, 16, and 16 sample images respectively. The concrete steps are as follows:

(1) Each site separately performs format conversion, size normalization, denoising, and enhancement on its share of the 52 images.

(2) Each site separately segments its images, extracts the relevant features of the region to be recognized in each image, and normalizes them; the results are shown in Table 1. This example collects three features, denoted feature 1, feature 2, and feature 3, and there are two classes, denoted 1 and -1.

Table 1 Image features

(3) Construction of the optimal classification function f(x)

An independent computer is selected as the host (denoted site S), which computes the inner products and solves the optimization problem. Let the two memory blocks reserved on the host have sizes m_1 = 2 and m_2 = 1, with each unit of size holding m = 8 samples. The optimal classification function f(x) is then constructed as follows:

1. Site S requests site 2 (which holds the fewest samples) to send 16 (2*8) samples. Site 2 sends the 16 samples to site S, denoted DS_21, and site S computes the inner product of every pair of samples within DS_21;

2. Site S requests site 3 to send 8 (1*8) samples. Site 3 sends 8 samples to site S, denoted DS_31, and site S computes the inner products between the samples of DS_21 and DS_31 and stores them on site S. Site S again requests 8 samples from site 3; site 3 sends 8 samples, denoted DS_32, and site S computes the inner products between the samples of DS_21 and DS_32 and stores them on site S;

3. Site S requests site 1 to send 8 samples. Site 1 sends 8 samples to site S, denoted DS_11, and site S computes the inner products between the samples of DS_21 and DS_11 and stores them on site S. Site S again requests 8 samples from site 1; site 1 sends 8 samples, denoted DS_12, and site S computes the inner products between the samples of DS_21 and DS_12 and stores them on site S. Site S requests 8 more samples from site 1; because site 1 has only 4 samples left, it sends 4 samples, denoted DS_13, and site S computes the inner products between the samples of DS_21 and DS_13 and stores them on site S;

4. Site S requests site 3 to send 16 samples. Site 3 sends 16 samples to site S, denoted DS_31;

5. Site S requests site 1 to send 8 samples. Site 1 sends 8 samples to site S, denoted DS_11, and site S computes the inner products between the samples of DS_31 and DS_11 and stores them on site S. Site S again requests 8 samples from site 1; site 1 sends 8 samples, denoted DS_12, and site S computes the inner products between the samples of DS_31 and DS_12 and stores them on site S. Site S requests 8 more samples from site 1; site 1 sends its remaining 4 samples, denoted DS_13, and site S computes the inner products between the samples of DS_31 and DS_13 and stores them on site S;

6. Solve the optimization problem (5);

$$\max_{\alpha}\ W(\alpha) = \sum_{i=1}^{52} \alpha_i - \frac{1}{2} \sum_{i=1}^{52} \sum_{j=1}^{52} \alpha_i \alpha_j y_i y_j (x_i \cdot x_j) \tag{5}$$

$$\text{s.t.}\quad \sum_{i=1}^{52} \alpha_i y_i = 0,\quad \alpha_i \ge 0$$

The inner products x_i · x_j in formula (5) are those obtained in the preceding steps.

7. Compute w, obtaining w = (1.2, 3.3, 4.2);

8. Compute b, obtaining b = -2.2;

9. f(x) = (w · x) + b = 1.2x_1 + 3.3x_2 + 4.2x_3 - 2.2.
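
The sequence of transfers in steps 1 to 5 of this example can be reproduced with a short sketch. It only enumerates block sizes and moves no data; the names `transfer_schedule` and `block_sizes` and the event labels are hypothetical.

```python
from typing import Dict, List, Tuple


def block_sizes(total: int, size: int) -> List[int]:
    """Sizes of the successive transfers needed to send `total` samples."""
    return [min(size, total - start) for start in range(0, total, size)]


def transfer_schedule(site_sizes: Dict[int, int], m: int, m1: int,
                      m2: int) -> List[Tuple[str, int, int]]:
    """List the transfers as (block kind, site number, sample count)."""
    # Sites are visited in order of increasing sample count; ties keep key order.
    order = sorted(site_sizes, key=site_sizes.get)
    events: List[Tuple[str, int, int]] = []
    for pos, i in enumerate(order):
        for outer in block_sizes(site_sizes[i], m * m1):
            events.append(("large", i, outer))          # fills the m*m_1 block
            for j in order[pos + 1:]:
                for inner in block_sizes(site_sizes[j], m * m2):
                    events.append(("small", j, inner))  # fills the m*m_2 block
    return events


# The example's configuration: sites 1, 2, 3 hold 20, 16, 16 images.
events = transfer_schedule({1: 20, 2: 16, 3: 16}, m=8, m1=2, m2=1)
```

With these inputs the first ten events match steps 1 to 5: 16 samples from site 2, then 8 and 8 from site 3, then 8, 8 and 4 from site 1, then 16 from site 3, then 8, 8 and 4 from site 1. The loop bound i ≤ k also produces two final large transfers from site 1 (16 and 4 samples) that have no partner site; the worked example does not describe them.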

(5) Image recognition (performed by the host)

For an image t of unknown class, pre-processing, image segmentation, and feature extraction yield its feature vector x_t = (x_t1, x_t2, ..., x_tp).

When x_t = (0.48, 0.56, 0.65), f(x_t) = 2.95 ≥ 1, so the image belongs to class 1.

When x_t = (0.25, 0.12, 0.11), f(x_t) = -1.09 ≤ -1, so the image belongs to class 2.
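
The two evaluations above can be checked directly with the example's w = (1.2, 3.3, 4.2) and b = -2.2. In the sketch below the function names are hypothetical, and deciding the class by the sign of f(x_t) is an assumption: the text only shows cases with f(x_t) ≥ 1 and f(x_t) ≤ -1.

```python
from typing import Sequence


def decision_value(w: Sequence[float], b: float, x: Sequence[float]) -> float:
    """f(x) = (w . x) + b."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def classify(w: Sequence[float], b: float, x: Sequence[float]) -> int:
    """Return 1 or -1 from the sign of f(x) (assumed decision rule)."""
    return 1 if decision_value(w, b, x) >= 0 else -1


w = (1.2, 3.3, 4.2)
b = -2.2
f1 = decision_value(w, b, (0.48, 0.56, 0.65))
f2 = decision_value(w, b, (0.25, 0.12, 0.11))
label1 = classify(w, b, (0.48, 0.56, 0.65))
label2 = classify(w, b, (0.25, 0.12, 0.11))
```

Direct evaluation gives about 2.954 for the first vector, in line with the stated 2.95. For the second vector it gives about -1.042 rather than the stated -1.09; the value is still below -1, so the assigned class is unchanged.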

Claims

1. A distributed image recognition method based on SVM, comprising inner product calculation, solution of the optimization problem, and image recognition steps, characterized in that these steps comprise:

Step 3: feature extraction. Each site separately extracts the features of the region to be recognized in each training image and constructs its training image sample data set DB_i, i = 1, 2, ..., k. Each sample in DB_i is expressed as (x_1, x_2, ..., x_p, y), where p is the number of non-class attributes, x_1, x_2, ..., x_p are the non-class attributes, and y is the class attribute, whose value is 1 or -1 to denote the two classes;

Step 4: construction of the optimal classification function f(x);

Step 5: image recognition.

2. The distributed image recognition method based on SVM according to claim 1, characterized in that step 4 comprises:

Step 4.1: initialization, comprising:

Step 4.2: inner product calculation, comprising:

Step 4.2.1: the host requests each site to send training image samples;

Step 4.2.2: the host computes the inner products;

Step 4.3: solution of the optimum (performed by the host), comprising:

Step 4.3.1: find the optimal solution of the mathematical model (1);

$$\text{s.t.}\quad y_i\,((w \cdot x_i) + b) \ge 1$$

Step 4.3.5: compute w, where SV is the set of support vectors;

Step 4.3.6: compute b by selecting an α_i* that is not 0 and substituting it to obtain b.

3. The distributed image recognition method based on SVM according to claim 1, characterized in that step 5 comprises: