CN104715258A - Distributed image recognition method based on SVM - Google Patents
- Publication number: CN104715258A
- Application number: CN201310687112.4A
- Authority: CN (China)
- Prior art keywords: image, website, identified, calculates, formula
- Legal status: Pending (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Abstract
The invention discloses a distributed image recognition method based on SVM. The method comprises pre-processing of the distributed image samples, image segmentation, feature extraction, inner product calculation, solution of the optimization problem, and image recognition. The method can recognize the class of an image when the training image samples are distributed across sites, and it gives a solution for constructing the linear classifier used during recognition. It ensures that the data of one site does not reside on other sites, which protects the security and privacy of the data, and it also achieves a high recognition accuracy.
Description
Technical field
The invention belongs to the field of computer analysis of images, and specifically relates to an image recognition method in a distributed environment.
Background technology
SVM is a data mining technique for solving classification and regression problems. Because the SVM method has many notable advantages and good experimental performance, it has become a focus of machine learning research and has achieved good results in applications such as text classification, handwriting recognition, and image classification and recognition.
In many practical applications the data itself is distributed: apart from exchanging information over the network, the sites share no other resources. Distributed image recognition is an important research branch of distributed data mining. It aims to construct a classification function or classifier from training image sample data sets in a distributed environment, and to use that function or classifier to identify the class of an image under test. To solve the image recognition problem when the training image samples are distributed, one feasible solution is to gather these data sets on a single machine and then construct the classifier with the SVM algorithm, or to use the MapReduce programming model to construct the classifier in the distributed environment. In general, this kind of approach has at least two problems. First, a computer with high (or very high) performance is needed to store and process such large volumes of data. Second, in many cases the data cannot be centralized because of data security and privacy concerns. To address this, the present invention proposes a distributed image recognition method based on SVM, which discovers the classifier implicit in the training image sample data sets in a distributed environment and thereby realizes automatic image recognition.
Summary of the invention
The object of the invention is to provide a method for recognizing images when the training image samples are distributed. The method can construct a linear classifier quickly and achieve accurate and efficient image recognition.
The technical scheme of the invention is a distributed image recognition method based on SVM, comprising inner product calculation, optimization problem solving, and image recognition steps, characterized in that these steps comprise:
Step 1: preparation and pre-processing of the image sample data sets. Each site separately prepares its training image sample data set and performs format conversion, size normalization, denoising, and enhancement.
Step 2: image segmentation. Each site separately uses a density-based clustering image segmentation method to identify the region to be recognized in each training image.
Step 3: feature extraction. Each site separately extracts the features of the region to be recognized in each training image and constructs its training image sample data set $DB_i$, $i = 1, 2, \ldots, k$. Each sample in $DB_i$ is expressed as $(x_1, x_2, \ldots, x_p, y)$, where $p$ is the number of non-class attributes, $x_1, x_2, \ldots, x_p$ are the non-class attributes, and $y$ is the class attribute, whose value is 1 or -1 to denote the two classes.
Step 4: construction of the optimal classification function $f(x)$.
Step 5: image recognition.
The construction of the optimal classification function $f(x)$ in step 4 comprises:
Step 4.1: initialization, comprising:
Step 4.1.1: select an independent computer as the host (denoted site S); this machine computes the inner products and solves the optimization problem.
Step 4.1.2: set the sizes of two memory blocks on the host, used to receive the data of two sites respectively.
Step 4.2: inner product calculation, comprising:
Step 4.2.1: the host requests each site to send training image samples.
Step 4.2.2: the host computes the inner products.
Step 4.3: solving for the optimal solution (performed by the host), comprising:
Step 4.3.1: find the optimal solution of the mathematical model given by formula (1), which is subject to the constraint
s.t. $y_i((w \cdot x_i) + b) \ge 1$
Step 4.3.2: convert formula (1) into the problem of finding the saddle point of the Lagrange function in formula (2).
Step 4.3.3: convert formula (2) into the optimization problem of formula (3).
Step 4.3.4: solve formula (3) for its optimal solution, obtaining the solution $\alpha^*$ of $\alpha$.
Step 4.3.5: compute $w$ from $\alpha^*$, where SV is the set of support vectors.
Step 4.3.6: compute $b$ by selecting a nonzero component of $\alpha^*$ and substituting it into the formula for $b$.
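Formulas (1) to (3) and the expressions for $w$ and $b$ are not reproduced in this text. As a hedged reading aid, the block below writes out the standard hard-margin linear SVM derivation, which matches the constraint $y_i((w \cdot x_i) + b) \ge 1$ and the saddle-point and dual steps described above. That the patent's own formulas take exactly these forms is an assumption, and the index bound $n$ (the total number of training samples) is borrowed from the embodiment.

```latex
% Assumed standard forms; not copied from the patent's own formulas.
\begin{align*}
&\min_{w,b}\ \tfrac{1}{2}\lVert w\rVert^{2}
  \quad \text{s.t. } y_i\bigl((w\cdot x_i)+b\bigr)\ge 1,\quad i=1,\dots,n \tag{1}\\
&L(w,b,\alpha)=\tfrac{1}{2}\lVert w\rVert^{2}
  -\sum_{i=1}^{n}\alpha_i\Bigl[y_i\bigl((w\cdot x_i)+b\bigr)-1\Bigr],\quad \alpha_i\ge 0 \tag{2}\\
&\max_{\alpha}\ \sum_{i=1}^{n}\alpha_i
  -\tfrac{1}{2}\sum_{i=1}^{n}\sum_{j=1}^{n}\alpha_i\alpha_j\,y_i y_j\,(x_i\cdot x_j)
  \quad \text{s.t. } \sum_{i=1}^{n}\alpha_i y_i=0,\ \alpha_i\ge 0 \tag{3}\\
&w=\sum_{i\in SV}\alpha_i^{*}\,y_i\,x_i,\qquad
 b=y_j-\sum_{i\in SV}\alpha_i^{*}\,y_i\,(x_i\cdot x_j)
 \quad\text{for any } j \text{ with } \alpha_j^{*}\neq 0
\end{align*}
```

In this form the training samples enter the dual problem only through the inner products $x_i \cdot x_j$, which is consistent with the host collecting inner products in step 4.2 before solving.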
The image recognition performed by the host in step 5 comprises:
Step 5.1: preparation and pre-processing of the image to be recognized, including format conversion, size normalization, denoising, and enhancement.
Step 5.2: the host uses the density-based clustering image segmentation method to identify the region to be recognized in the image.
Step 5.3: extract the features of the region to be recognized.
Step 5.4: from the region features $x_t = (x_{t1}, x_{t2}, \ldots, x_{tp})$ obtained in step 5.3, compute $f(x_t)$.
Step 5.5: determine the class of the image to be recognized from $f(x_t)$.
The main beneficial effect of the invention is that the class of an image can be recognized when the training image samples are distributed, and a solution is given for constructing the linear classifier used in the recognition process. The proposed distributed image recognition method based on SVM ensures that the data of each site does not reside on other sites, which protects the security and privacy of the data, while achieving a high recognition accuracy.
Brief description of the drawings
Fig. 1 is a structural block diagram of the embodiment of the invention.
Fig. 2 is the construction flow of the optimal classification function $f(x)$ in the embodiment of the invention.
Embodiment
Let the total number of training image samples be $n$, and let the $k$ sites in the distributed environment be $S_1, S_2, \ldots, S_k$. Apart from exchanging information over the network, the sites share no other resources (such as hard disk or memory). Site $S_i$ ($i = 1, 2, \ldots, k$) holds $n_i$ training image samples, so $n_1 + n_2 + \cdots + n_k = n$. A training image sample $x$ is represented by the vector $(x_1, x_2, \ldots, x_p, y)$, where $p$ is the number of non-class attributes, $x_1, x_2, \ldots, x_p$ are the non-class attributes, and $y$ is the class attribute, whose value is 1 or -1 to denote the two classes. As shown in Fig. 1, the method mainly comprises the following:
(1) Pre-processing
Each site separately prepares its training image sample data set and performs format conversion, size normalization, denoising, and enhancement.
(2) Image segmentation
Each site separately uses a density-based clustering image segmentation method to identify the region to be recognized in each training image.
(3) Feature extraction
Each site separately extracts the features of the region to be recognized in each training image and constructs its training image sample data set $DB_i$, $i = 1, 2, \ldots, k$.
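The sample layout $(x_1, \ldots, x_p, y)$ described above can be pictured as a small record type. The sketch below is illustrative only: the names `Sample` and `build_site_dataset` are hypothetical, and the feature values are made-up placeholders rather than data from the patent.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class Sample:
    """One training sample (x_1, ..., x_p, y) held by a single site."""
    features: Tuple[float, ...]  # the p non-class attributes
    label: int                   # class attribute y, either 1 or -1

    def __post_init__(self) -> None:
        if self.label not in (1, -1):
            raise ValueError("label y must be 1 or -1")


def build_site_dataset(rows: Sequence[Sequence[float]],
                       labels: Sequence[int]) -> List[Sample]:
    """Build the data set DB_i of one site from extracted feature rows."""
    if len(rows) != len(labels):
        raise ValueError("rows and labels must have the same length")
    dataset = [Sample(tuple(float(v) for v in row), y)
               for row, y in zip(rows, labels)]
    # All samples of a site must share the same number p of attributes.
    if len({len(s.features) for s in dataset}) > 1:
        raise ValueError("all samples must have the same number of features")
    return dataset


# Placeholder values for illustration only (hypothetical, not from Table 1).
db_1 = build_site_dataset([[0.1, 0.2, 0.3], [0.7, 0.8, 0.9]], [-1, 1])
```

Keeping the label check inside the record makes it hard for a site to ship a sample whose class attribute falls outside the two values the method allows.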
(4) Construction of the optimal classification function $f(x)$
As shown in Fig. 2, the construction of the optimal classification function $f(x)$ is divided into initialization, inner product calculation, and solving for the optimal solution.
1. Initialization
Initialization comprises the following steps:
a. Select an independent computer as the host (denoted site S); this machine computes the inner products and solves the optimization problem.
b. Set the sizes of two memory blocks on the host, used to receive the data of two sites respectively; let the sizes be $m_1$ and $m_2$, with $m_1 > m_2$.
2. Inner product calculation
Let each block hold $m$ samples, and denote the numbers of samples at sites $S_1, S_2, \ldots, S_k$ by $|S_1|, |S_2|, \ldots, |S_k|$, assuming $|S_1| \le |S_2| \le \cdots \le |S_k|$. The inner products are computed as follows:
a. for (i = 1; i ≤ k; i++) do begin
b.   while (site i still has unsent samples) do begin
c.     the host requests site i to send $m \cdot m_1$ samples;
d.     for (j = i+1; j ≤ k; j++) do begin
e.       while (site j still has unsent samples) do begin
f.         the host requests site j to send $m \cdot m_2$ samples;
g.         the host computes the inner products between the samples sent by site i and those sent by site j, and stores them on the host;
h.       end
i.     end
j.   end
k. end
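The loop above can be sketched in Python as follows. This is a minimal, single-process simulation under stated assumptions, not the patent's implementation: sites are modeled as in-memory lists, the names `collect_inner_products` and `chunked` are hypothetical, and the sketch additionally computes the pairs inside each large chunk (the pseudocode does not list that step, although the worked example performs it for its first chunk). Pairs that span two different large chunks of the same site are not covered.

```python
from typing import Dict, Iterator, List, Sequence, Tuple

Vector = Sequence[float]
Key = Tuple[int, int, int, int]  # (site_a, index_a, site_b, index_b)


def dot(u: Vector, v: Vector) -> float:
    """Plain inner product of two feature vectors."""
    return sum(a * b for a, b in zip(u, v))


def chunked(samples: List[Vector], size: int) -> Iterator[Tuple[int, List[Vector]]]:
    """Yield (offset, chunk) pairs, mimicking repeated host requests to one site."""
    for start in range(0, len(samples), size):
        yield start, samples[start:start + size]


def collect_inner_products(sites: List[List[Vector]],
                           m: int, m1: int, m2: int) -> Dict[Key, float]:
    """Host-side loop: hold one large chunk of site i, stream small chunks of site j."""
    gram: Dict[Key, float] = {}
    for i, site_i in enumerate(sites):
        for off_i, chunk_i in chunked(site_i, m * m1):
            # Assumption: pairs inside the large chunk are computed as well.
            for a, xa in enumerate(chunk_i):
                for b, xb in enumerate(chunk_i):
                    gram[(i, off_i + a, i, off_i + b)] = dot(xa, xb)
            for j in range(i + 1, len(sites)):
                for off_j, chunk_j in chunked(sites[j], m * m2):
                    for a, xa in enumerate(chunk_i):
                        for b, xb in enumerate(chunk_j):
                            gram[(i, off_i + a, j, off_j + b)] = dot(xa, xb)
    return gram


# Toy data (hypothetical): two sites with 2 and 3 two-dimensional samples.
sites = [[(1.0, 0.0), (0.0, 1.0)], [(1.0, 1.0), (2.0, 0.0), (0.0, 3.0)]]
gram = collect_inner_products(sites, m=1, m1=2, m2=1)
```

Only the chunk currently held in each block needs to be in host memory at once, which is the point of fixing the two block sizes during initialization.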
3. Solving for the optimal solution (performed by the host)
The optimal solution is found by the following steps:
a. Find the optimal solution of the mathematical model given by formula (1), which is subject to the constraint
s.t. $y_i((w \cdot x_i) + b) \ge 1$
b. Convert formula (1) into the problem of finding the saddle point of the Lagrange function in formula (2).
c. Convert formula (2) into the optimization problem of formula (3).
d. Solve formula (3) for its optimal solution, obtaining the solution $\alpha^*$ of $\alpha$.
e. Compute $w$ from $\alpha^*$, where SV is the set of support vectors.
f. Compute $b$ by selecting a nonzero component of $\alpha^*$ and substituting it into the formula for $b$.
g. $f(x) = (w \cdot x) + b$.
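Steps e to g can be illustrated with a short Python sketch. It assumes the standard hard-margin relations $w = \sum_{i \in SV} \alpha_i^* y_i x_i$ and $b = y_j - (w \cdot x_j)$ for a support vector $j$; the patent's own expressions are not reproduced in this text, so treating them as these standard forms is an assumption. The function names and the two-point toy problem (whose dual optimum $\alpha^* = (0.25, 0.25)$ was worked out by hand) are hypothetical.

```python
from typing import List, Sequence, Tuple


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    """Plain inner product of two vectors."""
    return sum(a * b for a, b in zip(u, v))


def recover_w_b(alpha: Sequence[float],
                xs: Sequence[Sequence[float]],
                ys: Sequence[int],
                tol: float = 1e-12) -> Tuple[List[float], float]:
    """Recover (w, b) from a dual solution alpha, assuming the standard SVM relations."""
    # Support vectors are the samples whose multipliers are nonzero.
    sv = [i for i, a in enumerate(alpha) if abs(a) > tol]
    if not sv:
        raise ValueError("alpha has no nonzero component")
    p = len(xs[0])
    w = [sum(alpha[i] * ys[i] * xs[i][d] for i in sv) for d in range(p)]
    j = sv[0]                      # any index with a nonzero multiplier
    b = ys[j] - dot(w, xs[j])      # from y_j * ((w . x_j) + b) = 1 with y_j = +/-1
    return w, b


def decision_value(w: Sequence[float], b: float, x: Sequence[float]) -> float:
    """f(x) = (w . x) + b."""
    return dot(w, x) + b


# Toy problem: one sample per class, symmetric about the origin.
xs = [(1.0, 1.0), (-1.0, -1.0)]
ys = [1, -1]
alpha_star = [0.25, 0.25]          # hand-derived dual optimum for this toy problem
w, b = recover_w_b(alpha_star, xs, ys)
```

For the toy data both samples lie exactly on the margin, so `decision_value` returns 1 and -1 for them respectively.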
(5) Image recognition (performed by the host)
For an image t whose class label is unknown, recognition comprises the following steps:
1. Pre-processing
Perform format conversion, size normalization, denoising, and enhancement on image t.
2. Image segmentation
Use the density-based clustering image segmentation method to identify the region to be recognized in image t.
3. Feature extraction
Extract the features of the region to be recognized in image t.
4. Image recognition
Let the feature vector obtained for image t after the preceding steps be $x_t = (x_{t1}, x_{t2}, \ldots, x_{tp})$. Image t is then recognized as follows:
a. Compute $f(x_t)$.
b. Determine the class of image t from $f(x_t)$.
The implementation of the invention is explained below using a set of distributed images as an example. The example uses 52 images distributed over three independent sites; sites 1, 2, and 3 hold 20, 16, and 16 sample images respectively. The steps are as follows:
(1) Each site separately performs format conversion, size normalization, denoising, and enhancement on its own images.
(2) Each site separately segments its images, extracts the relevant features of the region to be recognized in each image, and normalizes them; the results are shown in Table 1. This example collects three features, denoted feature 1, feature 2, and feature 3, and there are two classes, denoted 1 and -1.
Table 1. Image feature table
(3) Construction of the optimal classification function $f(x)$
An independent computer is selected as the host (denoted site S) to compute the inner products and solve the optimization problem. The two memory blocks reserved on the host have sizes $m_1 = 2$ and $m_2 = 1$, and each block holds $m = 8$ samples. The construction of $f(x)$ proceeds as follows:
1. Site S requests 16 (2 × 8) samples from site 2 (site 2 has the fewest samples). Site 2 sends 16 samples to site S, denoted $DS_{21}$, and site S computes the inner product of every pair of samples in $DS_{21}$.
2. Site S requests 8 (1 × 8) samples from site 3. Site 3 sends 8 samples to site S, denoted $DS_{31}$; site S computes the inner products between the samples of $DS_{21}$ and $DS_{31}$ and stores them on site S. Site S requests another 8 samples from site 3. Site 3 sends 8 samples to site S, denoted $DS_{32}$; site S computes the inner products between the samples of $DS_{21}$ and $DS_{32}$ and stores them on site S.
3. Site S requests 8 samples from site 1. Site 1 sends 8 samples to site S, denoted $DS_{11}$; site S computes the inner products between the samples of $DS_{21}$ and $DS_{11}$ and stores them on site S. Site S requests another 8 samples from site 1. Site 1 sends 8 samples to site S, denoted $DS_{12}$; site S computes the inner products between the samples of $DS_{21}$ and $DS_{12}$ and stores them on site S. Site S requests 8 more samples from site 1; because site 1 has only 4 samples left, it sends 4 samples to site S, denoted $DS_{13}$; site S computes the inner products between the samples of $DS_{21}$ and $DS_{13}$ and stores them on site S.
4. Site S requests 16 samples from site 3. Site 3 sends 16 samples to site S, denoted $DS_{31}$.
5. Site S requests 8 samples from site 1. Site 1 sends 8 samples to site S, denoted $DS_{11}$; site S computes the inner products between the samples of $DS_{31}$ and $DS_{11}$ and stores them on site S. Site S requests another 8 samples from site 1. Site 1 sends 8 samples to site S, denoted $DS_{12}$; site S computes the inner products between the samples of $DS_{31}$ and $DS_{12}$ and stores them on site S. Site S requests 8 more samples from site 1; site 1 sends its remaining 4 samples to site S, denoted $DS_{13}$; site S computes the inner products between the samples of $DS_{31}$ and $DS_{13}$ and stores them on site S.
6. Solve the optimization problem of formula (5). The terms $x_i \cdot x_j$ in formula (5) are the inner products obtained in the preceding steps.
7. Compute $w$ from its formula, obtaining $w = (1.2, 3.3, 4.2)$.
8. Compute $b$, obtaining $b = -2.2$.
9. $f(x) = (w \cdot x) + b = 1.2x_1 + 3.3x_2 + 4.2x_3 - 2.2$.
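The batch sizes quoted in steps 1 to 5 follow directly from the block settings of the example (block capacity 8, block sizes 2 and 1) and the per-site sample counts. The sketch below reproduces them; `request_sizes` is a hypothetical helper name introduced for illustration.

```python
from typing import List


def request_sizes(total: int, per_request: int) -> List[int]:
    """Sizes of the successive batches a site returns until it runs out of samples."""
    sizes = []
    remaining = total
    while remaining > 0:
        batch = min(per_request, remaining)
        sizes.append(batch)
        remaining -= batch
    return sizes


m, m1, m2 = 8, 2, 1                  # block capacity and the two block sizes from the example
site_counts = {1: 20, 2: 16, 3: 16}  # samples held by sites 1, 2, and 3

print(request_sizes(site_counts[2], m * m1))  # site 2 into the large block: [16]
print(request_sizes(site_counts[3], m * m2))  # site 3 into the small block: [8, 8]
print(request_sizes(site_counts[1], m * m2))  # site 1 into the small block: [8, 8, 4]
```

The final batch of 4 from site 1 corresponds to the request that the site can only partly fill.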
(4) Image recognition (performed by the host)
For an image t of unknown class, pre-processing, image segmentation, and feature extraction yield its feature vector $x_t = (x_{t1}, x_{t2}, \ldots, x_{tp})$.
When $x_t = (0.48, 0.56, 0.65)$, $f(x_t) = 2.95 \ge 1$, so the image belongs to the first class (label 1).
When $x_t = (0.25, 0.12, 0.11)$, $f(x_t) = -1.09 \le -1$, so the image belongs to the second class (label -1).
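The two evaluations above can be checked with a few lines of Python using the coefficients stated in the example, $w = (1.2, 3.3, 4.2)$ and $b = -2.2$. The decision rule in `classify` (sign of $f(x)$) is an assumption; it agrees with the $\ge 1$ and $\le -1$ comparisons quoted in the text whenever $|f(x)| \ge 1$.

```python
from typing import Sequence

W = (1.2, 3.3, 4.2)  # weight vector from the worked example
B = -2.2             # offset from the worked example


def f(x: Sequence[float]) -> float:
    """Linear classification function f(x) = (w . x) + b."""
    return sum(w_d * x_d for w_d, x_d in zip(W, x)) + B


def classify(x: Sequence[float]) -> int:
    """Assumed decision rule: label 1 if f(x) >= 0, otherwise label -1."""
    return 1 if f(x) >= 0 else -1


for x_t in [(0.48, 0.56, 0.65), (0.25, 0.12, 0.11)]:
    print(x_t, round(f(x_t), 3), classify(x_t))
```

With the rounded coefficients quoted in the text, the second value evaluates to about -1.04 rather than the stated -1.09; the gap may come from rounding of $w$ and $b$, and the assigned class is the same either way.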
Claims (3)
1. A distributed image recognition method based on SVM, comprising inner product calculation, optimization problem solving, and image recognition steps, characterized in that these steps comprise:
Step 1: preparation and pre-processing of the image sample data sets; each site separately prepares its training image sample data set and performs format conversion, size normalization, denoising, and enhancement;
Step 2: image segmentation; each site separately uses a density-based clustering image segmentation method to identify the region to be recognized in each training image;
Step 3: feature extraction; each site separately extracts the features of the region to be recognized in each training image and constructs its training image sample data set $DB_i$, $i = 1, 2, \ldots, k$; each sample in $DB_i$ is expressed as $(x_1, x_2, \ldots, x_p, y)$, where $p$ is the number of non-class attributes, $x_1, x_2, \ldots, x_p$ are the non-class attributes, and $y$ is the class attribute, whose value is 1 or -1 to denote the two classes;
Step 4: construction of the optimal classification function $f(x)$;
Step 5: image recognition.
2. The distributed image recognition method based on SVM according to claim 1, characterized in that step 4 comprises:
Step 4.1: initialization, comprising:
Step 4.1.1: select an independent computer as the host (denoted site S); this machine computes the inner products and solves the optimization problem;
Step 4.1.2: set the sizes of two memory blocks on the host, used to receive the data of two sites respectively;
Step 4.2: inner product calculation, comprising:
Step 4.2.1: the host requests each site to send training image samples;
Step 4.2.2: the host computes the inner products;
Step 4.3: solving for the optimal solution (performed by the host), comprising:
Step 4.3.1: find the optimal solution of the mathematical model given by formula (1), which is subject to the constraint
s.t. $y_i((w \cdot x_i) + b) \ge 1$
Step 4.3.2: convert formula (1) into the problem of finding the saddle point of the Lagrange function in formula (2);
Step 4.3.3: convert formula (2) into the optimization problem of formula (3);
Step 4.3.4: solve formula (3) for its optimal solution, obtaining the solution $\alpha^*$ of $\alpha$;
Step 4.3.5: compute $w$ from $\alpha^*$, where SV is the set of support vectors;
Step 4.3.6: compute $b$ by selecting a nonzero component of $\alpha^*$ and substituting it into the formula for $b$.
3. The distributed image recognition method based on SVM according to claim 1, characterized in that step 5 comprises:
Step 5.1: preparation and pre-processing of the image to be recognized, including format conversion, size normalization, denoising, and enhancement;
Step 5.2: the host uses the density-based clustering image segmentation method to identify the region to be recognized in the image;
Step 5.3: extract the features of the region to be recognized;
Step 5.4: from the region features $x_t = (x_{t1}, x_{t2}, \ldots, x_{tp})$ obtained in step 5.3, compute $f(x_t)$;
Step 5.5: determine the class of the image to be recognized from $f(x_t)$.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201310687112.4A CN104715258A (en) | 2013-12-17 | 2013-12-17 | Distributed image recognition method based on SVM |
Publications (1)
Publication Number | Publication Date |
---|---|
CN104715258A (en) | 2015-06-17 |
Family
ID=53414568
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201310687112.4A Pending CN104715258A (en) | 2013-12-17 | 2013-12-17 | Distributed image recognition method based on SVM |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN104715258A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105825226A (en) * | 2016-03-11 | 2016-08-03 | 江苏畅远信息科技有限公司 | Association-rule-based distributed multi-label image identification method |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
WD01 | Invention patent application deemed withdrawn after publication | Application publication date: 20150617 |