CN104715258A - Distributed image recognition method based on SVM - Google Patents

Distributed image recognition method based on SVM

Info

Publication number
CN104715258A
Authority
CN
China
Prior art keywords
image
website
identified
calculates
formula
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201310687112.4A
Other languages
Chinese (zh)
Inventor
朱玉全
陈耿
孙蕾
耿霞
彭晓冰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
ZHEJIANG JINQUAN SOFTWARE CO Ltd
Original Assignee
ZHEJIANG JINQUAN SOFTWARE CO Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by ZHEJIANG JINQUAN SOFTWARE CO Ltd filed Critical ZHEJIANG JINQUAN SOFTWARE CO Ltd
Priority to CN201310687112.4A priority Critical patent/CN104715258A/en
Publication of CN104715258A publication Critical patent/CN104715258A/en
Pending legal-status Critical Current

Abstract

The invention discloses a distributed image recognition method based on SVM. The method comprises preprocessing of the distributed image samples, image segmentation, feature extraction, inner product computation, solution of the optimization problem, and image recognition. The method can identify the class of an image to be recognized when the training image samples are distributed across sites, and it gives a corresponding solution for constructing the linear classifier used in the recognition process. The method ensures that the data of one site does not reside on other sites, which protects the security and privacy of the data, while achieving a high recognition accuracy.

Description

A distributed image recognition method based on SVM
Technical field
The invention belongs to the field of computer image analysis and specifically relates to an image recognition method for a distributed environment.
Background
The support vector machine (SVM) is a data mining technique for solving classification and regression problems. Because the SVM method has many notable advantages and good experimental performance, it has become a focus of machine learning research and has achieved good results in applications such as text classification, handwriting recognition, and image classification and recognition.
In many practical applications the data is itself distributed: apart from exchanging information over the network, the sites share no resources. Distributed image recognition is an important research branch of distributed data mining. It aims to construct a classification function or classifier from training image sample data sets held in a distributed environment, and to use that function or classifier to identify the class of an image under test. To solve image recognition when the training image samples are distributed, one feasible approach is to gather the data sets onto a single machine and then construct the classifier with an SVM algorithm; another is to construct the classifier in the distributed environment with the MapReduce programming model. Approaches of this kind generally have at least two problems. First, a high-performance (or very high-performance) computer is needed to store and process the large volume of data. Second, in many cases data security and privacy considerations make it impossible to centralize the data. The present invention therefore proposes a distributed image recognition method based on SVM, which discovers the classifier implied by the training image sample data sets in a distributed environment and thereby recognizes images automatically.
Summary of the invention
The object of the invention is to provide a method for recognizing images when the training image samples are distributed. The method can construct a linear classifier quickly and achieves accurate and efficient image recognition.
The technical scheme of the invention is a distributed image recognition method based on SVM, comprising inner product computation, optimization problem solving, and image recognition steps, characterized in that the method comprises:
Step 1: preparation and preprocessing of the image sample data sets. Each site separately prepares its training image sample data set and performs format conversion, scale normalization, denoising, and enhancement;
Step 2: image segmentation. Each site separately uses a density-based clustering image segmentation method to identify the region to be recognized in each training image;
Step 3: feature extraction. Each site separately extracts the features of the region to be recognized in each training image and constructs its training image sample data set DB_i, i = 1, 2, ..., k. Each sample in DB_i is expressed as (x_1, x_2, ..., x_p, y), where p is the number of non-class attributes, x_1, x_2, ..., x_p are the non-class attributes, and y is the class attribute, whose value is 1 or -1, denoting the two classes;
Step 4: construction of the optimal classification function f(x);
Step 5: image recognition.
The construction of the optimal classification function f(x) in step 4 comprises:
Step 4.1: initialization, comprising:
Step 4.1.1: select an independent computer as the host (denoted site S); this machine computes the inner products and solves the optimization problem;
Step 4.1.2: set the sizes of two storage areas on the host, which receive the data from two sites respectively;
Step 4.2: inner product computation, comprising:
Step 4.2.1: the host requests each site to send training image samples;
Step 4.2.2: the host computes the inner products;
Step 4.3: solving for the optimal solution (performed by the host), comprising:
Step 4.3.1: find the optimal solution of the model given by formula (1);
$$\min \phi(w) = \frac{1}{2}(w \cdot w) \tag{1}$$
$$\text{s.t.}\quad y_i((w \cdot x_i) + b) \ge 1$$
Step 4.3.2: convert formula (1) into finding the saddle point of the Lagrange function of formula (2);
$$L(w, b, \alpha) = \frac{1}{2}(w \cdot w) - \sum_{i=1}^{n} \alpha_i \left[ y_i((w \cdot x_i) + b) - 1 \right], \quad \alpha_i \ge 0 \tag{2}$$
Step 4.3.3: convert formula (2) into the optimization problem of formula (3);
$$\max W(\alpha) = \sum_{i=1}^{n} \alpha_i - \frac{1}{2} \sum_{i,j} \alpha_i \alpha_j y_i y_j (x_i \cdot x_j) \tag{3}$$
$$\text{s.t.}\quad \sum_{i=1}^{n} \alpha_i y_i = 0, \quad \alpha_i \ge 0$$
Step 4.3.4: solve formula (3) to obtain the optimal solution α* of α;
Step 4.3.5: compute w as $w = \sum_{x_i \in SV} \alpha_i^* y_i x_i$, where SV is the set of support vectors;
Step 4.3.6: compute b: select an $\alpha_i^*$ that is not 0 and substitute it into $\alpha_i^* \left[ y_i((w \cdot x_i) + b) - 1 \right] = 0$ to obtain b;
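As a sketch of why formula (2) leads to formula (3), the standard stationarity argument for the hard-margin SVM can be written as follows. This is the textbook derivation, assuming the Lagrange function of formula (2) and the same symbols as above:

```latex
\begin{aligned}
\frac{\partial L}{\partial w} &= w - \sum_{i=1}^{n} \alpha_i y_i x_i = 0
  &&\Rightarrow\quad w = \sum_{i=1}^{n} \alpha_i y_i x_i, \\
\frac{\partial L}{\partial b} &= -\sum_{i=1}^{n} \alpha_i y_i = 0
  &&\Rightarrow\quad \sum_{i=1}^{n} \alpha_i y_i = 0, \\
L(w, b, \alpha) &= \frac{1}{2}(w \cdot w) - \sum_{i=1}^{n} \alpha_i y_i (w \cdot x_i)
  - b \sum_{i=1}^{n} \alpha_i y_i + \sum_{i=1}^{n} \alpha_i \\
  &= \sum_{i=1}^{n} \alpha_i - \frac{1}{2} \sum_{i,j} \alpha_i \alpha_j y_i y_j (x_i \cdot x_j) = W(\alpha).
\end{aligned}
```

Because W(α) depends on the samples only through the inner products x_i · x_j, the inner products computed in step 4.2 are sufficient for the host to solve formula (3) for α.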
In step 5, image recognition by the host comprises:
Step 5.1: preparation and preprocessing of the image to be recognized, including format conversion, scale normalization, denoising, and enhancement;
Step 5.2: the host uses the density-based clustering image segmentation method to identify the region to be recognized in the image;
Step 5.3: extract the features of the region to be recognized;
Step 5.4: compute f(x_t) from the region features x_t = (x_t1, x_t2, ..., x_tp) obtained in step 5.3;
Step 5.5: determine the class of the image to be recognized from f(x_t).
The main beneficial effect of the invention is that the class of an image to be recognized can be identified when the training image samples are distributed, and a corresponding solution is given for constructing the linear classifier used in the recognition process. The proposed distributed image recognition method based on SVM ensures that the data of each site does not reside on other sites, which protects the security and privacy of the data, while achieving a high recognition accuracy.
Brief description of the drawings
Fig. 1 is a structural block diagram of an embodiment of the invention.
Fig. 2 is a flow chart of the construction of the optimal classification function f(x) in the embodiment of the invention.
Detailed description of the embodiments
Let the total number of training image samples be n, and let the k sites in the distributed environment be S_1, S_2, ..., S_k. Apart from exchanging information over the network, the sites share no resources (such as hard disks or memory). Site S_i (i = 1, 2, ..., k) holds n_i training image samples, so n_1 + n_2 + ... + n_k = n. A training image sample x is represented by the vector (x_1, x_2, ..., x_p, y), where p is the number of non-class attributes, x_1, x_2, ..., x_p are the non-class attributes, and y is the class attribute, whose value is 1 or -1, denoting the two classes. As shown in Fig. 1, the method mainly comprises the following parts:
(1) Preprocessing
Each site separately prepares its training image sample data set and performs format conversion, scale normalization, denoising, and enhancement.
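The text does not name specific algorithms for scale normalization, denoising, or enhancement. The following minimal sketch shows one possible pipeline on a grayscale image stored as a list of rows, assuming nearest-neighbor resizing, a 3x3 median filter, and min-max contrast stretching; all three choices and all function names are assumptions made for illustration.

```python
from statistics import median

def resize_nearest(image, height, width):
    """Scale normalization (assumed): nearest-neighbor resize to height x width."""
    src_h, src_w = len(image), len(image[0])
    return [[image[r * src_h // height][c * src_w // width]
             for c in range(width)]
            for r in range(height)]

def median_denoise(image):
    """Denoising (assumed): 3x3 median filter, with the window clipped at the borders."""
    h, w = len(image), len(image[0])
    out = []
    for r in range(h):
        row = []
        for c in range(w):
            window = [image[rr][cc]
                      for rr in range(max(0, r - 1), min(h, r + 2))
                      for cc in range(max(0, c - 1), min(w, c + 2))]
            row.append(median(window))
        out.append(row)
    return out

def stretch_contrast(image):
    """Enhancement (assumed): linear min-max stretch of intensities to [0, 1]."""
    lo = min(min(row) for row in image)
    hi = max(max(row) for row in image)
    if hi == lo:
        # A constant image has no contrast to stretch.
        return [[0.0 for _ in row] for row in image]
    return [[(v - lo) / (hi - lo) for v in row] for row in image]

def preprocess(image, height=4, width=4):
    """Run the three assumed stages in order: resize, denoise, enhance."""
    return stretch_contrast(median_denoise(resize_nearest(image, height, width)))
```

Each site would run such a pipeline on its own images only, so no raw image leaves the site at this stage.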
(2) Image segmentation
Each site separately uses a density-based clustering image segmentation method to identify the region to be recognized in each training image.
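The text specifies only that segmentation is based on density clustering. One way this could look, assuming DBSCAN as the clustering algorithm, a fixed intensity threshold to select foreground pixels, and the bounding box of the largest cluster as the region to be recognized (all assumptions, with hypothetical function names), is sketched below.

```python
NOISE = -1

def region_query(points, idx, eps):
    """Indices of all points within distance eps of points[idx] (including itself)."""
    px, py = points[idx]
    return [k for k, (qx, qy) in enumerate(points)
            if (px - qx) ** 2 + (py - qy) ** 2 <= eps * eps]

def dbscan(points, eps, min_pts):
    """Plain DBSCAN: returns one label per point, NOISE (-1) for outliers."""
    labels = [None] * len(points)  # None means "not visited yet"
    cluster = 0
    for idx in range(len(points)):
        if labels[idx] is not None:
            continue
        neighbors = region_query(points, idx, eps)
        if len(neighbors) < min_pts:
            labels[idx] = NOISE
            continue
        labels[idx] = cluster
        queue = list(neighbors)
        while queue:
            k = queue.pop()
            if labels[k] == NOISE:
                labels[k] = cluster  # border point: joins the cluster, not expanded
            if labels[k] is not None:
                continue
            labels[k] = cluster
            k_neighbors = region_query(points, k, eps)
            if len(k_neighbors) >= min_pts:
                queue.extend(k_neighbors)  # core point: keep growing the cluster
        cluster += 1
    return labels

def largest_region(image, threshold, eps=1.5, min_pts=4):
    """Bounding box (top, left, bottom, right) of the largest dense pixel cluster."""
    points = [(r, c) for r, row in enumerate(image)
              for c, v in enumerate(row) if v > threshold]
    labels = dbscan(points, eps, min_pts)
    clusters = {}
    for point, label in zip(points, labels):
        if label != NOISE:
            clusters.setdefault(label, []).append(point)
    if not clusters:
        return None
    biggest = max(clusters.values(), key=len)
    rows = [r for r, _ in biggest]
    cols = [c for _, c in biggest]
    return min(rows), min(cols), max(rows), max(cols)
```

With `eps=1.5` each pixel's neighborhood is its 8-connected surroundings, so isolated bright pixels are discarded as noise while compact bright areas form clusters.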
(3) Feature extraction
Each site separately extracts the features of the region to be recognized in each training image and constructs its training image sample data set DB_i, i = 1, 2, ..., k.
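A minimal sketch of how one site's data set DB_i could be held in memory, following the sample layout (x_1, x_2, ..., x_p, y) with y equal to 1 or -1. The names `Sample`, `make_sample`, and `build_site_dataset` are hypothetical, and the feature values are made up for illustration.

```python
from typing import NamedTuple, Tuple

class Sample(NamedTuple):
    features: Tuple[float, ...]  # non-class attributes x_1 .. x_p
    label: int                   # class attribute y, either 1 or -1

def make_sample(features, label):
    """Validate and build one training sample (x_1, ..., x_p, y)."""
    if label not in (1, -1):
        raise ValueError("class attribute y must be 1 or -1")
    return Sample(tuple(float(v) for v in features), label)

def build_site_dataset(rows, p):
    """Build DB_i from (features, label) pairs, checking each has p attributes."""
    dataset = []
    for features, label in rows:
        if len(features) != p:
            raise ValueError("expected %d non-class attributes" % p)
        dataset.append(make_sample(features, label))
    return dataset

# Illustrative values only.
db_1 = build_site_dataset([((0.4, 0.5, 0.6), 1), ((0.2, 0.1, 0.1), -1)], p=3)
```

Rejecting malformed rows at each site keeps the later inner product step from mixing vectors of different lengths.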
(4) Construction of the optimal classification function f(x)
As shown in Fig. 2, the construction of the optimal classification function f(x) is divided into initialization, inner product computation, and solving for the optimal solution.
1. Initialization
Initialization comprises the following steps:
a. Select an independent computer as the host (denoted site S); this machine computes the inner products and solves the optimization problem;
b. Set the sizes of two storage areas on the host, which receive the data from two sites respectively; let their sizes be m_1 and m_2 blocks, with m_1 > m_2;
2. Inner product computation
Suppose each block can hold m samples, and denote the numbers of samples on sites S_1, S_2, ..., S_k by |S_1|, |S_2|, ..., |S_k|, with |S_1| ≤ |S_2| ≤ ... ≤ |S_k|. The inner products are computed as follows:
a. for (i = 1; i ≤ k; i++) do begin
b.   while (site i still has unsent samples) do begin
c.     the host requests site i to send m*m_1 samples;
d.     for (j = i+1; j ≤ k; j++) do begin
e.       while (site j still has unsent samples) do begin
f.         the host requests site j to send m*m_2 samples;
g.         the host computes the inner products between the samples sent by site i and those sent by site j and stores them on the host;
h.       end
i.     end
j.   end
k. end
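The loop above can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the sites are in-memory lists, `request_block` is a hypothetical stand-in for the network request, and results are keyed by global sample index. Beyond the cross-site pairs listed in the loop, the sketch also computes the pairs inside each large block and the pairs between different blocks of the same site, on the assumption that the solver needs every x_i · x_j.

```python
from itertools import accumulate

def dot(u, v):
    """Inner product of two feature vectors."""
    return sum(a * b for a, b in zip(u, v))

def request_block(site, start, size):
    """Hypothetical stand-in for a network request: up to `size` samples from `start`."""
    return site[start:start + size]

def blockwise_gram(sites, block_capacity, m1, m2):
    """Return {(i, j): x_i . x_j} for all global indices i <= j.

    The host holds at most block_capacity*m1 samples from one site and
    block_capacity*m2 samples from another at any time.
    """
    primary_size = block_capacity * m1
    secondary_size = block_capacity * m2
    offsets = [0] + list(accumulate(len(s) for s in sites))
    gram = {}
    for i, site_i in enumerate(sites):
        for a_start in range(0, len(site_i), primary_size):
            block_a = request_block(site_i, a_start, primary_size)
            base_a = offsets[i] + a_start
            # Pairs inside the large block, including each sample with itself.
            for p, xp in enumerate(block_a):
                for q in range(p, len(block_a)):
                    gram[(base_a + p, base_a + q)] = dot(xp, block_a[q])
            # Small blocks: the rest of the same site, then every later site.
            for j in range(i, len(sites)):
                first = a_start + len(block_a) if j == i else 0
                for b_start in range(first, len(sites[j]), secondary_size):
                    block_b = request_block(sites[j], b_start, secondary_size)
                    base_b = offsets[j] + b_start
                    for p, xp in enumerate(block_a):
                        for q, xq in enumerate(block_b):
                            gram[(base_a + p, base_b + q)] = dot(xp, xq)
    return gram

# Two tiny sites with 2-dimensional samples (illustrative values only).
example = blockwise_gram([[(1.0, 0.0), (0.0, 1.0)], [(1.0, 1.0)]],
                         block_capacity=1, m1=2, m2=1)
```

Only the inner products are retained on the host; each received block can be discarded once its products are stored.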
3. Solving for the optimal solution (performed by the host)
The optimal solution is obtained by the following steps:
a. Find the optimal solution of the model given by formula (1);
$$\min \phi(w) = \frac{1}{2}(w \cdot w) \tag{1}$$
$$\text{s.t.}\quad y_i((w \cdot x_i) + b) \ge 1$$
b. Convert formula (1) into finding the saddle point of the Lagrange function of formula (2);
$$L(w, b, \alpha) = \frac{1}{2}(w \cdot w) - \sum_{i=1}^{n} \alpha_i \left[ y_i((w \cdot x_i) + b) - 1 \right], \quad \alpha_i \ge 0 \tag{2}$$
c. Convert formula (2) into the optimization problem of formula (3);
$$\max W(\alpha) = \sum_{i=1}^{n} \alpha_i - \frac{1}{2} \sum_{i,j} \alpha_i \alpha_j y_i y_j (x_i \cdot x_j) \tag{3}$$
$$\text{s.t.}\quad \sum_{i=1}^{n} \alpha_i y_i = 0, \quad \alpha_i \ge 0$$
d. Solve formula (3) to obtain the optimal solution α* of α;
e. Compute w as $w = \sum_{x_i \in SV} \alpha_i^* y_i x_i$, where SV is the set of support vectors;
f. Compute b: select an $\alpha_i^*$ that is not 0 and substitute it into $\alpha_i^* \left[ y_i((w \cdot x_i) + b) - 1 \right] = 0$ to obtain b;
g. f(x) = (w · x) + b;
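Once the multipliers α* are known, the last three steps reduce to a few lines. The sketch below assumes the standard hard-margin SVM relations w = Σ α_i* y_i x_i and y_i((w · x_i) + b) = 1 for any support vector; the function names are hypothetical, and the two-sample data set is a toy case whose dual solution (α_1 = α_2 = 0.5) can be found by hand.

```python
def dot(u, v):
    """Inner product of two feature vectors."""
    return sum(a * b for a, b in zip(u, v))

def recover_classifier(samples, labels, alphas, tol=1e-9):
    """Compute (w, b) from the dual solution, using only the support vectors."""
    p = len(samples[0])
    support = [i for i, a in enumerate(alphas) if a > tol]  # nonzero multipliers
    w = [sum(alphas[i] * labels[i] * samples[i][d] for i in support)
         for d in range(p)]
    i = support[0]
    # For a support vector y_i((w . x_i) + b) = 1, and y_i is 1 or -1,
    # so b = y_i - (w . x_i).
    b = labels[i] - dot(w, samples[i])
    return w, b

def decision(w, b, x):
    """Classification function f(x) = (w . x) + b."""
    return dot(w, x) + b

# Toy problem: one sample per class on the first axis.
samples = [(2.0, 0.0), (0.0, 0.0)]
labels = [1, -1]
alphas = [0.5, 0.5]  # maximizes W(a) = 2a - 2a^2 when both multipliers equal a
w, b = recover_classifier(samples, labels, alphas)
```

In this toy case the separating line sits halfway between the two samples, and both samples lie exactly on the margin.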
(5) Image recognition (performed by the host)
For an image t whose class label is unknown, recognition comprises the following steps:
1. Preprocessing
Perform format conversion, scale normalization, denoising, and enhancement on image t.
2. Image segmentation
Use the density-based clustering image segmentation method to identify the region to be recognized in image t.
3. Feature extraction
Extract the features of the region to be recognized in image t.
4. Image recognition
Let the feature vector obtained for image t by the preceding steps be x_t = (x_t1, x_t2, ..., x_tp). Recognition of image t comprises the following steps:
a. Compute f(x_t);
b. Determine the class of image t from f(x_t).
The implementation of the invention is explained below using a distributed image set as an example. The example uses 52 images distributed over three independent sites; sites 1, 2, and 3 hold 20, 16, and 16 sample images respectively. The specific steps are as follows:
(1) Each site separately performs format conversion, scale normalization, denoising, and enhancement on its own images.
(2) Each site separately segments each image, extracts the relevant features of the region to be recognized, and normalizes them; the results are shown in Table 1. The example uses three features, denoted feature 1, feature 2, and feature 3, and two classes, denoted 1 and -1.
Table 1. Image feature table
(3) Construction of the optimal classification function f(x)
Select an independent computer as the host (denoted site S); this machine computes the inner products and solves the optimization problem. Let the two storage areas reserved on the host have sizes of 2 blocks and 1 block, with each block holding 8 samples. The optimal classification function f(x) is constructed as follows:
1. Site S requests 16 (2*8) samples from site 2 (site 2 has the smallest number of samples). Site 2 sends 16 samples to site S, denoted DS_21, and site S computes the inner product of every pair of samples in DS_21;
2. Site S requests 8 (1*8) samples from site 3. Site 3 sends 8 samples to site S, denoted DS_31; site S computes the inner products between the samples of DS_21 and DS_31 and stores them on site S. Site S requests another 8 samples from site 3. Site 3 sends 8 samples to site S, denoted DS_32; site S computes the inner products between the samples of DS_21 and DS_32 and stores them on site S;
3. Site S requests 8 samples from site 1. Site 1 sends 8 samples to site S, denoted DS_11; site S computes the inner products between the samples of DS_21 and DS_11 and stores them on site S. Site S requests another 8 samples from site 1. Site 1 sends 8 samples to site S, denoted DS_12; site S computes the inner products between the samples of DS_21 and DS_12 and stores them on site S. Site S requests another 8 samples from site 1. Because only 4 samples remain on site 1, site 1 sends 4 samples to site S, denoted DS_13; site S computes the inner products between the samples of DS_21 and DS_13 and stores them on site S;
4. Site S requests 16 samples from site 3. Site 3 sends 16 samples to site S, denoted DS_31;
5. Site S requests 8 samples from site 1. Site 1 sends 8 samples to site S, denoted DS_11; site S computes the inner products between the samples of DS_31 and DS_11 and stores them on site S. Site S requests another 8 samples from site 1. Site 1 sends 8 samples to site S, denoted DS_12; site S computes the inner products between the samples of DS_31 and DS_12 and stores them on site S. Site S requests another 8 samples from site 1. Site 1 sends its remaining 4 samples to site S, denoted DS_13; site S computes the inner products between the samples of DS_31 and DS_13 and stores them on site S;
6. Solve the optimization problem of formula (5);
$$\max W(\alpha) = \sum_{i=1}^{52} \alpha_i - \frac{1}{2} \sum_{i=1}^{52} \sum_{j=1}^{52} \alpha_i \alpha_j y_i y_j (x_i \cdot x_j) \tag{5}$$
$$\text{s.t.}\quad \sum_{i=1}^{52} \alpha_i y_i = 0, \quad \alpha_i \ge 0$$
The inner products x_i · x_j in formula (5) are those obtained in the preceding steps.
7. Compute w from $w = \sum_{x_i \in SV} \alpha_i^* y_i x_i$, obtaining w = (1.2, 3.3, 4.2);
8. Compute b, obtaining b = -2.2;
9. f(x) = (w · x) + b = 1.2x_1 + 3.3x_2 + 4.2x_3 - 2.2.
(5) Image recognition (performed by the host)
For an image t of unknown class, preprocessing, image segmentation, and feature extraction yield its feature vector x_t = (x_t1, x_t2, ..., x_tp).
When x_t = (0.48, 0.56, 0.65), f(x_t) = 2.95 ≥ 1, so the image belongs to class 1.
When x_t = (0.25, 0.12, 0.11), f(x_t) = -1.04 ≤ -1, so the image belongs to class 2.
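The recognition step for this example can be sketched with the values w = (1.2, 3.3, 4.2) and b = -2.2 given above. The function names are hypothetical, and assigning the class by the sign of f(x_t) is an assumption: the text states only the cases f(x_t) ≥ 1 and f(x_t) ≤ -1, and the sign rule is the usual SVM convention that also covers values in between.

```python
W = (1.2, 3.3, 4.2)  # weight vector from the worked example
B = -2.2             # bias from the worked example

def decision_value(x, w=W, b=B):
    """Classification function f(x) = (w . x) + b."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def classify(x):
    """Return 1 for the first class and -1 for the second (sign rule, assumed)."""
    return 1 if decision_value(x) >= 0 else -1

first = classify((0.48, 0.56, 0.65))   # feature vector of the first test image
second = classify((0.25, 0.12, 0.11))  # feature vector of the second test image
```

The host needs only w and b at recognition time, so none of the training samples have to be kept after the classifier has been built.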

Claims (3)

1. A distributed image recognition method based on SVM, comprising inner product computation, optimization problem solving, and image recognition steps, characterized in that the method comprises:
Step 1: preparation and preprocessing of the image sample data sets, in which each site separately prepares its training image sample data set and performs format conversion, scale normalization, denoising, and enhancement;
Step 2: image segmentation, in which each site separately uses a density-based clustering image segmentation method to identify the region to be recognized in each training image;
Step 3: feature extraction, in which each site separately extracts the features of the region to be recognized in each training image and constructs its training image sample data set DB_i, i = 1, 2, ..., k, each sample in DB_i being expressed as (x_1, x_2, ..., x_p, y), where p is the number of non-class attributes, x_1, x_2, ..., x_p are the non-class attributes, and y is the class attribute, whose value is 1 or -1, denoting the two classes;
Step 4: construction of the optimal classification function f(x);
Step 5: image recognition.
2. The distributed image recognition method based on SVM according to claim 1, characterized in that step 4 comprises:
Step 4.1: initialization, comprising:
Step 4.1.1: selecting an independent computer as the host (denoted site S), this machine computing the inner products and solving the optimization problem;
Step 4.1.2: setting the sizes of two storage areas on the host, which receive the data from two sites respectively;
Step 4.2: inner product computation, comprising:
Step 4.2.1: the host requesting each site to send training image samples;
Step 4.2.2: computing the inner products;
Step 4.3: solving for the optimal solution (performed by the host), comprising:
Step 4.3.1: finding the optimal solution of the model given by formula (1);
$$\min \phi(w) = \frac{1}{2}(w \cdot w) \tag{1}$$
$$\text{s.t.}\quad y_i((w \cdot x_i) + b) \ge 1$$
Step 4.3.2: converting formula (1) into finding the saddle point of the Lagrange function of formula (2);
$$L(w, b, \alpha) = \frac{1}{2}(w \cdot w) - \sum_{i=1}^{n} \alpha_i \left[ y_i((w \cdot x_i) + b) - 1 \right], \quad \alpha_i \ge 0 \tag{2}$$
Step 4.3.3: converting formula (2) into the optimization problem of formula (3);
$$\max W(\alpha) = \sum_{i=1}^{n} \alpha_i - \frac{1}{2} \sum_{i,j} \alpha_i \alpha_j y_i y_j (x_i \cdot x_j) \tag{3}$$
$$\text{s.t.}\quad \sum_{i=1}^{n} \alpha_i y_i = 0, \quad \alpha_i \ge 0$$
Step 4.3.4: solving formula (3) to obtain the optimal solution α* of α;
Step 4.3.5: computing w as $w = \sum_{x_i \in SV} \alpha_i^* y_i x_i$, where SV is the set of support vectors;
Step 4.3.6: computing b by selecting an $\alpha_i^*$ that is not 0 and substituting it into $\alpha_i^* \left[ y_i((w \cdot x_i) + b) - 1 \right] = 0$.
3. The distributed image recognition method based on SVM according to claim 1, characterized in that step 5 comprises:
Step 5.1: preparation and preprocessing of the image to be recognized, including format conversion, scale normalization, denoising, and enhancement;
Step 5.2: the host using the density-based clustering image segmentation method to identify the region to be recognized in the image;
Step 5.3: extracting the features of the region to be recognized;
Step 5.4: computing f(x_t) from the region features x_t = (x_t1, x_t2, ..., x_tp) obtained in step 5.3;
Step 5.5: determining the class of the image to be recognized from f(x_t).
CN201310687112.4A 2013-12-17 2013-12-17 Distributed image recognition method based on SVM Pending CN104715258A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201310687112.4A CN104715258A (en) 2013-12-17 2013-12-17 Distributed image recognition method based on SVM

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201310687112.4A CN104715258A (en) 2013-12-17 2013-12-17 Distributed image recognition method based on SVM

Publications (1)

Publication Number Publication Date
CN104715258A true CN104715258A (en) 2015-06-17

Family

ID=53414568

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201310687112.4A Pending CN104715258A (en) 2013-12-17 2013-12-17 Distributed image recognition method based on SVM

Country Status (1)

Country Link
CN (1) CN104715258A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105825226A (en) * 2016-03-11 2016-08-03 江苏畅远信息科技有限公司 Association-rule-based distributed multi-label image identification method

Similar Documents

Publication Publication Date Title
CN107563385B (en) License plate character recognition method based on depth convolution production confrontation network
CN110348294A (en) The localization method of chart, device and computer equipment in PDF document
CN110795919A (en) Method, device, equipment and medium for extracting table in PDF document
WO2017035922A1 (en) Online internet topic mining method based on improved lda model
CN102722713B (en) Handwritten numeral recognition method based on lie group structure data and system thereof
CN103605794A (en) Website classifying method
CN103106262B (en) The method and apparatus that document classification, supporting vector machine model generate
CN102663401B (en) Image characteristic extracting and describing method
WO2021233041A1 (en) Data annotation method and device, and fine granularity identification method and device
CN104517106A (en) List recognition method and system
EP4138050A1 (en) Table generating method and apparatus, electronic device, storage medium and product
CN105912525A (en) Sentiment classification method for semi-supervised learning based on theme characteristics
CN114782970A (en) Table extraction method, system and readable medium
WO2023001059A1 (en) Detection method and apparatus, electronic device and storage medium
CN103473275A (en) Automatic image labeling method and automatic image labeling system by means of multi-feature fusion
CN103473308B (en) High-dimensional multimedia data classifying method based on maximum margin tensor study
CN104484347A (en) Geographic information based hierarchical visual feature extracting method
CN104298975A (en) Distributed image identification method
EP2771813A1 (en) Aligning annotation of fields of documents
CN102194097A (en) Multifunctional method for identifying hand gestures
CN105279517A (en) Weak tag social image recognition method based on semi-supervision relation theme model
CN104715258A (en) Distributed image recognition method based on SVM
US20240037911A1 (en) Image classification method, electronic device, and storage medium
WO2018120575A1 (en) Method and device for identifying main picture in web page
CN102637200B (en) Method for distributing multi-level associated data to same node of cluster

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20150617