CN108121998A - A support vector machine training method based on the Spark framework - Google Patents

A support vector machine training method based on the Spark framework

Info

Publication number
CN108121998A
Authority
CN
China
Prior art keywords
sample
sample vector
sphere
centre
vector
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201711269096.1A
Other languages
Chinese (zh)
Other versions
CN108121998B (en)
Inventor
许千帆
王宇
陈玫
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Send Cloud Dingcheng Technology Co Ltd
Original Assignee
Beijing Send Cloud Dingcheng Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Send Cloud Dingcheng Technology Co Ltd
Priority to CN201711269096.1A (patent CN108121998B)
Publication of CN108121998A
Application granted
Publication of CN108121998B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2411 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on the proximity to a decision surface, e.g. support vector machines
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/94 Hardware or software architectures specially adapted for image or video understanding
    • G06V10/95 Hardware or software architectures specially adapted for image or video understanding structured as a network, e.g. client-server architectures

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Software Systems (AREA)
  • Multimedia (AREA)
  • Complex Calculations (AREA)

Abstract

The present invention provides a support vector machine training method based on the Spark framework, comprising: obtaining a training sample set, all sample vectors of which are stored in a distributed manner across the data nodes of the Spark framework; extracting from the training sample set the sample vector V2 that most severely violates the KKT conditions, and selecting the sample vector V1 whose center distance (distance to the sphere center) differs most from that of V2; performing an iterative optimization calculation on V1 and V2 to obtain the updated sample vectors V1^new and V2^new; broadcasting V1^new and V2^new to the Spark data nodes, computing in each data node the difference produced by the change in V1 and V2, and computing the updated sphere center from these differences; and then updating the center distance of each sample vector in the data nodes and the sphere radius. By using the Spark distributed computing framework, the method distributes the computation-intensive work of a single machine to the worker nodes; as the data grows, the system can scale horizontally, and storage is no longer limited by a single machine.

Description

A support vector machine training method based on the Spark framework
Technical field
The present invention relates to the field of computer technology, and more particularly to a support vector machine training method based on the Spark framework.
Background
Since its introduction, the support vector machine (SVM) has been widely applied in fields such as information security, image processing, pattern recognition, fault diagnosis, and anomaly detection. In 1999, Tax, Schölkopf, Duin, et al. proposed two One-Class SVM algorithms, one based on a hyperplane and one based on a hypersphere. Of these, support vector data description (SVDD) is the one-class classification technique based on a hypersphere; its goal is to use the training data to describe a hypersphere that serves as the discriminant model for classification.
Commonly used software packages for SVM pattern recognition and regression are currently scikit-learn for Python and LIBSVM, developed by Professor Chih-Jen Lin of National Taiwan University. scikit-learn is a Python-based machine learning module released under the BSD open-source license; the project was started by David Cournapeau in 2007 and is currently maintained by community volunteers. LIBSVM is a simple, easy-to-use, fast, and effective software package for SVM pattern recognition and regression designed and developed by Professor Chih-Jen Lin et al. It provides compiled executables for Windows systems as well as source code, which makes it easy to improve, modify, and port to other operating systems. The software requires relatively little tuning of the SVM parameters and provides many default parameters, with which many problems can be solved; it also provides a cross-validation function. The software can solve C-SVM, ν-SVM, ε-SVR, and ν-SVR problems, including multi-class pattern recognition based on the one-against-one algorithm.
However, with the exponential growth of data volume, the memory and CPU of a single machine can no longer meet demand, and the need for parallelized solvers is increasingly urgent. Solving support vector data description (SVDD) with the SMO algorithm requires solving multiple quadratic programming problems and has high computational complexity; SVDD running time increases sharply with the number of training samples. The memory required to store the kernel matrix K_ij grows rapidly with the number of training points N in the training set, because the size of the kernel matrix is quadratic in the number of samples. Applying SVDD directly to data anomaly detection therefore leads to excessive computation and memory overflow.
Summary of the invention
In the prior art, solving SVDD with the SMO algorithm requires solving multiple quadratic programming problems and has high computational complexity, SVDD running time increases sharply with the number of training samples, and applying SVDD directly to data anomaly detection leads to excessive computation and memory overflow. To solve these problems, a support vector machine training method based on the Spark framework is proposed.
The method provided by the present invention comprises:
S1: obtaining a training sample set, all sample vectors of which are stored in a distributed manner across the data nodes of the Spark framework;
S2: extracting from the training sample set the sample vector V2 that most severely violates the KKT conditions, and selecting the sample vector V1 whose center distance (distance to the sphere center) differs most from that of V2;
S3: performing an iterative optimization calculation on the sample vectors V1 and V2 to obtain the updated sample vectors V1^new and V2^new;
S4: broadcasting the updated sample vectors V1^new and V2^new to the data nodes of Spark, computing in each data node the difference produced by the change in V1 and V2, and computing the updated sphere center a^new from the differences computed in the data nodes;
S5: according to the updated sphere center a^new, updating the center distance of each sample vector in the Spark data nodes, and at the same time updating the sphere radius R.
Step S1 further comprises: in each data node, reading the sample vectors of the training samples held by that data node, and generating a unique data identifier for each sample vector.
Preferably, the unique data identifier is composed of the partition number of the data node and the local timestamp of the data node.
Step S1 further comprises initializing the calculation parameters required for the iterative optimization calculation, wherein the calculation parameters include the Lagrange multipliers α of all sample vectors, the sphere center a, and the center distance d² of each sample vector.
Initializing the calculation parameters of the iterative optimization calculation specifically comprises:
initializing the Lagrange multiplier α of every sample vector to 1/N, where N is the number of sample vectors in the training sample set;
initializing the square of the sphere radius, R², so that R² = 0;
initializing the sphere center according to the following formula:
$a^2 = \sum_i \sum_j \alpha_i \alpha_j K_{ij}$
where a is the sphere center, α_i and α_j correspond to any two sample vectors in the training sample set, and K_ij is the kernel function;
calculating the center distance d² of each sample vector according to the formula.
Preferably, in step S2, the sample vector V2 that most severely violates the KKT conditions is extracted from the training sample set by sampling without replacement.
Selecting, in step S2, the sample vector V1 whose center distance differs most from that of V2 specifically comprises:
for each data node, obtaining the sample vector in that data node whose center distance differs most from that of the sample vector V2;
in the Driver Program of the Spark framework, obtaining, from the candidates found in the individual data nodes, the sample vector V1 whose center distance differs most from that of V2.
In step S4, computing the updated sphere center a^new specifically comprises: accumulating, in the Driver Program of the Spark framework, the differences computed in all data nodes to obtain the new sphere center a^new.
After step S5, the method further comprises: obtaining the sphere radius R from the updated center distance of each vector, removing the bound sample vectors, retaining all non-bound samples, and returning to perform S1.
After step S5, the method further comprises: stopping training when the Lagrange multipliers of all sample vectors in the training sample set satisfy the KKT conditions, or when the target loss function loss of the sample vectors V1 and V2 is less than a preset threshold.
The method provided by the present invention uses the Spark distributed computing framework to distribute the computation-intensive work of a single machine to the worker nodes, and distributes the kernel matrix K_ij, which would otherwise occupy a large amount of single-machine storage, to the data nodes. As the data grows, the system can scale horizontally; because the worker nodes compute independently, computation time does not increase significantly, and storage is no longer limited by a single machine. In addition, incremental computation avoids the full computation that would otherwise be required in every iteration, which saves a large number of computation cycles and accelerates the solution process.
Description of the drawings
Fig. 1 is a flow chart of a support vector machine training method based on the Spark framework according to an embodiment of the present invention;
Fig. 2 is a structural diagram of the Spark framework in a support vector machine training method based on the Spark framework according to an embodiment of the present invention;
Fig. 3 is a flow chart of a support vector machine training method based on the Spark framework according to another embodiment of the present invention.
Detailed description
Specific embodiments of the present invention are described in further detail below with reference to the accompanying drawings and examples. The following embodiments illustrate the present invention but do not limit its scope.
Referring to Fig. 1, which is a flow chart of a support vector machine training method based on the Spark framework according to an embodiment of the present invention, the method comprises:
S1: obtaining a training sample set, all sample vectors of which are stored in a distributed manner across the data nodes of the Spark framework.
Specifically, after the training sample set is received, the sample vectors in the sample set are stored in a distributed manner in the data nodes under the Spark framework.
As shown in Fig. 2, Apache Spark is a fast, general-purpose engine designed for large-scale distributed in-memory data processing. It is an open-source, Hadoop MapReduce-like general parallel framework developed by the AMP Lab at the University of California, Berkeley. Unlike MapReduce, Spark can keep the intermediate output of a job in memory, so it no longer needs to read and write HDFS; Spark is therefore better suited to algorithms that require iteration, such as data mining and machine learning. Many parallel algorithms have implementations on Spark.
With this method, the sample vectors in the training set are stored in a distributed manner across multiple data nodes, so the system can scale horizontally as the data grows.
S2: extracting from the training sample set the sample vector V2 that most severely violates the KKT conditions, and selecting the sample vector V1 whose center distance (distance to the sphere center) differs most from that of V2.
Specifically, the optimization iteration uses the SMO algorithm, which selects two sample vectors to optimize at a time. The two sample vectors being optimized are generally denoted V1 and V2. The chosen stopping condition determines how to select the points that contribute most to the convergence of the algorithm; for example, using the method of monitoring the feasibility gap, the points that most severely violate the KKT conditions are optimized first and taken as V2. According to the KKT conditions, the iterative relation between V1 and V2 can be determined by the formula:
λ1122
where K is the kernel function, α is the Lagrange multiplier, and d² is the center distance.
To make the update step of each optimization as large as possible, the maximum of the step expression, that is, the corresponding minimum value, must be found, and V1 is found in this way.
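The choice of V2 can be illustrated with a minimal sketch. It assumes the usual SVDD form of the KKT conditions (a sample with α = 0 lies inside the sphere, a sample with 0 < α < C lies on it, and a sample with α = C lies on or outside it); the function names, the tolerance `tol`, and the tuple layout are hypothetical and are not taken from the patent.

```python
def kkt_violation(alpha, d2, r2, c, tol=1e-8):
    """Non-negative KKT violation score for one sample (0.0 means satisfied).

    Assumed SVDD conditions (not spelled out in the source text):
      alpha == 0      -> d2 <= r2  (inside the sphere)
      0 < alpha < C   -> d2 == r2  (on the sphere)
      alpha == C      -> d2 >= r2  (on or outside the sphere)
    """
    if alpha <= tol:
        return max(0.0, d2 - r2)
    if alpha >= c - tol:
        return max(0.0, r2 - d2)
    return abs(d2 - r2)


def select_v2(samples, r2, c):
    """samples: iterable of (sample_id, alpha, d2).

    Returns the id of the worst violator, or None if no sample violates KKT.
    """
    best_id, best_score = None, 0.0
    for sample_id, alpha, d2 in samples:
        score = kkt_violation(alpha, d2, r2, c)
        if score > best_score:
            best_id, best_score = sample_id, score
    return best_id
```

Returning `None` when every score is zero gives the caller a natural hook for the stopping test described later in the document.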
S3: performing an iterative optimization calculation on the sample vectors V1 and V2 to obtain the updated sample vectors V1^new and V2^new.
S4: broadcasting the updated sample vectors V1^new and V2^new to the multiple data nodes of Spark, computing in each data node the difference produced by the change in V1 and V2, and computing the updated sphere center a^new from the differences computed in the data nodes.
S5: according to the updated sphere center a^new, updating the center distance of each sample vector in the Spark data nodes, and at the same time updating the sphere radius R.
Specifically, the optimization iteration uses the SMO algorithm. According to the minimum enclosing sphere model of One-Class SVM, the objective function is:
$\text{s.t.}\ \|\Phi(x_i) - a\|^2 \le R^2$
$\zeta_i \ge 0$
where R is the sphere radius, a is the sphere center, and ζ is the slack variable.
Solving the following quadratic programming problem yields the sphere center and the radius.
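The objective and the quadratic program themselves are not reproduced in this text. For orientation only, the standard SVDD formulation from the literature is given below as an assumption about what is intended; note that in the standard form the slack variable also appears on the right-hand side of the distance constraint, and the symbol C is the penalty factor introduced later in the description.

```latex
% Standard SVDD primal (assumed; not reproduced in the source text)
\min_{R,\,a,\,\zeta}\; R^2 + C \sum_i \zeta_i
\quad \text{s.t.} \quad \|\Phi(x_i) - a\|^2 \le R^2 + \zeta_i,\qquad \zeta_i \ge 0

% Corresponding dual quadratic program
\max_{\alpha}\; \sum_i \alpha_i K_{ii} - \sum_i \sum_j \alpha_i \alpha_j K_{ij}
\quad \text{s.t.} \quad \sum_i \alpha_i = 1,\qquad 0 \le \alpha_i \le C

% Sphere center recovered from the dual solution
a = \sum_i \alpha_i \Phi(x_i)
\quad\Longrightarrow\quad
\|a\|^2 = \sum_i \sum_j \alpha_i \alpha_j K_{ij}
```

The last line agrees with the center initialization formula given in claim 5, and the constraint that the multipliers sum to 1 is consistent with the initial value 1/N for every α.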
All parameters are updated according to the steps of Fig. 3. The updated parameters include the Lagrange multipliers α of V1 and V2, the sphere center a, the center distance of each sample vector, and the sphere radius R. The specific steps include: updating the Lagrange multipliers α of V1 and V2 according to the following formula.
λ1122
After V1 and V2 are updated, the updated sample vectors V1^new and V2^new are obtained. V1, V2, V1^new, V2^new, and the original sphere center a are broadcast to each data node of Spark, and the sphere center a is updated according to the formula:
where α_i and α_j correspond to any two sample vectors in the training sample set and K_ij is the kernel function. Because only the parameters α of the sample vectors V1 and V2 change, only the terms involving the feature vectors V1 and V2 change, so a can be computed by a difference; the specific formula is as follows:
where a^old is the original sphere center, a^new is the updated sphere center, and K is the kernel function. The difference calculation is carried out in a distributed manner on each Spark data partition, and the results are accumulated on the Spark Driver Program.
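The difference formula itself is not reproduced in this text. A minimal sketch of one way to obtain it follows, assuming the center term is a² = Σ_i Σ_j α_i α_j K_ij as in claim 5 and that only two multipliers change, by `delta1` and `delta2`. Plain Python lists stand in for Spark partitions, and the nested list `K` stands in for kernel values that a real job would compute from local samples; all function names are hypothetical.

```python
def full_a2(alpha, K):
    """Full recomputation of a^2 = sum_i sum_j alpha_i * alpha_j * K[i][j]."""
    n = len(alpha)
    return sum(alpha[i] * alpha[j] * K[i][j] for i in range(n) for j in range(n))


def partial_cross_term(rows, alpha_old, K, i1, i2, delta1, delta2):
    """Work done on one partition: only the rows k that the partition holds."""
    return sum(alpha_old[k] * (delta1 * K[i1][k] + delta2 * K[i2][k]) for k in rows)


def incremental_a2(a2_old, partitions, alpha_old, K, i1, i2, delta1, delta2):
    """Driver side: accumulate the per-partition differences.

    Uses (alpha + delta)^T K (alpha + delta)
         = alpha^T K alpha + 2 * delta^T K alpha + delta^T K delta,
    which holds for a symmetric kernel matrix K.
    """
    cross = sum(
        partial_cross_term(rows, alpha_old, K, i1, i2, delta1, delta2)
        for rows in partitions
    )
    quad = (delta1 ** 2 * K[i1][i1]
            + 2.0 * delta1 * delta2 * K[i1][i2]
            + delta2 ** 2 * K[i2][i2])
    return a2_old + 2.0 * cross + quad
```

Each partition touches only its own rows and returns a single number, so the driver-side accumulation costs one addition per partition instead of a full pass over all pairs of samples.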
Once the new sphere center is available, the center distance of each sample vector can be updated by applying the difference formula:
This updates the center distance of each sample vector; the formula relates the new center distance to the original center distance, where a is the sphere center and K is the kernel function. This step is computed in a distributed manner in the data nodes of Spark.
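The difference formula for the center distance is not reproduced in this text. Under the assumption that the center distance has the standard SVDD form, and writing δ_1 and δ_2 (symbols introduced here for illustration) for the changes in the multipliers of V1 and V2, the update can be derived as follows:

```latex
% Assumed standard SVDD center distance of sample i
d_i^2 = K_{ii} - 2 \sum_j \alpha_j K_{ij} + a^2,
\qquad a^2 = \sum_j \sum_k \alpha_j \alpha_k K_{jk}

% Only \alpha_1 and \alpha_2 change:
% \alpha_1^{new} = \alpha_1 + \delta_1,\quad \alpha_2^{new} = \alpha_2 + \delta_2
d_{i,\mathrm{new}}^2
  = d_{i,\mathrm{old}}^2
  - 2\left(\delta_1 K_{i1} + \delta_2 K_{i2}\right)
  + \left(a_{\mathrm{new}}^2 - a_{\mathrm{old}}^2\right)
```

Each sample needs only two kernel evaluations, K_{i1} and K_{i2}, plus the broadcast scalar a²_new − a²_old, which is why the step can run independently on every data node.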
Finally, the sphere radius R is updated. Specifically, when V1 and V2 are both non-bound samples, i.e. ξ < α_i < C, where ξ is a small number close to 0 and C is the penalty factor, the update formula for R is:
When V1 and V2 are both bound samples, i.e. α_i ≤ ξ or α_i ≥ C, the update formula is:
With this method, the Spark distributed computing framework distributes the computation-intensive work of a single machine to the worker nodes, and the kernel matrix K_ij, which would otherwise occupy a large amount of single-machine storage, is distributed to the worker nodes. As the data grows, the system can scale horizontally; because the worker nodes compute independently, computation time does not increase significantly, and storage is no longer limited by a single machine. In addition, incremental computation avoids the full computation that would otherwise be required in every iteration, which saves a large number of computation cycles and accelerates the solution process.
On the basis of the above embodiment, step S1 further comprises: in each data node, reading the sample vectors of the training samples held by that data node, and generating a unique data identifier for each sample vector.
Preferably, the unique data identifier is composed of the partition number of the data node and the local timestamp of the data node.
Specifically, before the optimization calculation starts, once all sample vectors in the training sample set have been stored in a distributed manner in the data nodes of the Spark framework, each data node reads the data in its local data block, and a non-repeating random number is generated for each sample vector to form its unique data identifier id. Because sample vectors in the training sample set may have identical parameters, the unique data identifier id is used to distinguish all sample vectors.
Preferably, the id can be composed of the partition sequence number and the local timestamp. The unique data id can be used to distinguish sample vectors that have the same memory address on different Executors.
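A minimal sketch of such an identifier is shown below. The partition index and local timestamp come from the description; the per-partition counter is an added assumption, included because two samples on the same partition could otherwise receive the same timestamp. The function name and the string layout are hypothetical.

```python
import itertools
import time


def make_id_generator(partition_index):
    """Return a function that yields ids of the form '<partition>-<timestamp_ns>-<seq>'.

    The trailing sequence number is an assumption added to guarantee uniqueness
    when several ids are generated within the same clock tick.
    """
    counter = itertools.count()

    def next_id():
        return f"{partition_index}-{time.time_ns()}-{next(counter)}"

    return next_id
```

In a Spark job, one generator would be created per partition so that the counter is local to the Executor processing that partition.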
On the basis of the above embodiments, step S1 further comprises initializing the calculation parameters required for the iterative optimization calculation, wherein the calculation parameters include the Lagrange multipliers α of all sample vectors, the sphere center a, and the center distance d² of each sample vector.
Preferably, initializing the calculation parameters of the iterative optimization calculation specifically comprises:
initializing the Lagrange multiplier α of every sample vector to 1/N, where N is the number of sample vectors in the training sample set;
initializing the square of the sphere radius, R², so that R² = 0;
initializing the sphere center according to the following formula:
$a^2 = \sum_i \sum_j \alpha_i \alpha_j K_{ij}$
where a is the sphere center, α_i and α_j correspond to any two sample vectors in the training sample set, and K_ij is the kernel function;
calculating the center distance d² of each sample vector according to the formula.
Specifically, before the iterative optimization calculation, the iterative calculation parameters of the support vector machine are initialized. First, the Lagrange multiplier α of each sample vector is initialized; preferably, the initial value is set to 1/N, where N is the number of all sample vectors in the training sample set. This process is a distributed computation, carried out separately on each data node.
Then, the square of the sphere radius, R², is initialized; preferably it is set to 0, i.e. R² = 0.
Then, the sphere center is initialized according to the following formula:
$a^2 = \sum_i \sum_j \alpha_i \alpha_j K_{ij}$
where a is the sphere center, α_i and α_j correspond to any two sample vectors in the training sample set, and K_ij is the Gaussian kernel function.
Finally, according to the formula:
the distance d² from each sample vector to the sphere center a is calculated. This step requires a full computation over the data set, and the results are stored in a HashMap keyed by sample vector.
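The initialization steps can be sketched on a single machine as follows. The center term follows the formula in claim 5; the distance formula is not reproduced in this text, so the standard SVDD expression d_i² = K_ii − 2 Σ_j α_j K_ij + a² is assumed. The kernel width `sigma`, the function names, and the use of the sample index as the dictionary key (standing in for the unique data identifier) are illustrative choices.

```python
import math


def gaussian_kernel(x, y, sigma=1.0):
    """Gaussian (RBF) kernel; sigma is an assumed width parameter."""
    sq_dist = sum((xi - yi) ** 2 for xi, yi in zip(x, y))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def initialize(samples, sigma=1.0):
    """Return (alpha, r2, a2, d2) for a list of sample vectors.

    d2 is a dict keyed by sample index, mirroring the HashMap of center
    distances described in the text.
    """
    n = len(samples)
    alpha = [1.0 / n] * n          # every Lagrange multiplier starts at 1/N
    r2 = 0.0                       # squared radius starts at 0
    K = [[gaussian_kernel(xi, xj, sigma) for xj in samples] for xi in samples]
    a2 = sum(alpha[i] * alpha[j] * K[i][j] for i in range(n) for j in range(n))
    d2 = {
        i: K[i][i] - 2.0 * sum(alpha[j] * K[i][j] for j in range(n)) + a2
        for i in range(n)
    }
    return alpha, r2, a2, d2
```

This version materializes the full kernel matrix and is therefore only suitable for small inputs; it is meant to show the arithmetic, not the distributed layout.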
On the basis of the above embodiment, in step S2, the sample vector V2 that most severely violates the KKT conditions is extracted from the training sample set by sampling without replacement.
Specifically, when the sample vector V2 that most severely violates the KKT conditions is extracted, sampling without replacement is used, so that all samples are traversed within one full iteration cycle.
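One possible reading of this rule is sketched below: within a cycle, a sample that has already served as V2 is not drawn again, and the pool is reset once every sample has been visited. The `scores` dictionary (sample id mapped to a KKT violation score) and the `used` set are hypothetical bookkeeping structures, not part of the patent text.

```python
def select_v2_without_replacement(scores, used):
    """Pick the unused sample with the largest violation score.

    scores: non-empty dict mapping sample id -> KKT violation score
    used:   set of ids already drawn in the current cycle (mutated in place)
    """
    if len(used) >= len(scores):
        used.clear()  # every sample has been visited: start a new cycle
    candidates = [sample_id for sample_id in scores if sample_id not in used]
    chosen = max(candidates, key=lambda sample_id: scores[sample_id])
    used.add(chosen)
    return chosen
```

In an actual run the scores would be recomputed after every update of the multipliers; they are held fixed here only to keep the example short.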
On the basis of the above embodiments, selecting, in step S2, the sample vector V1 whose center distance differs most from that of V2 specifically comprises:
for each data node, obtaining the sample vector in that data node whose center distance differs most from that of the sample vector V2;
in the Driver Program of the Spark framework, obtaining, from the candidates found in the individual data nodes, the sample vector V1 whose center distance differs most from that of V2.
Specifically, as shown in Fig. 3, after the sample vector V2 is extracted, each data node of Spark finds, among its own samples, the sample vector whose center distance differs most from that of V2; the Driver Program under the Spark framework then chooses, from these candidates, the sample vector with the globally largest difference in center distance as V1.
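The two-stage selection can be sketched as a local maximum per partition followed by a global maximum on the driver. Plain Python lists stand in for Spark partitions, and "differs most" is interpreted here as the largest absolute difference of center distances, which is an assumption; the function names are hypothetical.

```python
def local_best(partition, d2_v2):
    """Per-partition step: return (gap, sample_id) with the largest |d2 - d2_v2|.

    partition: list of (sample_id, d2). Returns None for an empty partition.
    """
    if not partition:
        return None
    return max((abs(d2 - d2_v2), sample_id) for sample_id, d2 in partition)


def select_v1(partitions, d2_v2, v2_id):
    """Driver step: pick the global best among the per-partition candidates."""
    candidates = []
    for partition in partitions:
        without_v2 = [(sid, d2) for sid, d2 in partition if sid != v2_id]
        best = local_best(without_v2, d2_v2)
        if best is not None:
            candidates.append(best)
    return max(candidates)[1] if candidates else None
```

Only one candidate per partition travels to the driver, so the amount of data collected is proportional to the number of partitions rather than the number of samples.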
On the basis of the above embodiments, in step S4, computing the updated sphere center a^new specifically comprises: accumulating, in the Driver Program of the Spark framework, the differences computed in all data nodes to obtain the new sphere center a^new.
Specifically, as shown in Fig. 3, after the iterative optimization calculation on the sample vectors V1 and V2 yields the updated sample vectors V1^new and V2^new, the updated V1 and V2 are broadcast to each data node of Spark, and each data node computes the difference produced by the change in V1 and V2. The differences computed in all data nodes are then accumulated in the Driver Program of the Spark framework to obtain the new sphere center a^new.
On the basis of the above embodiments, the method further comprises, after step S5: according to the updated center distance of each vector and the sphere radius R, removing the bound sample vectors, retaining all non-bound samples, and returning to perform S1.
Specifically, after one iteration of the optimization calculation is completed, the next pair V1 and V2 is selected for the next round of iteration. A heuristic selection method is used: non-bound samples are selected preferentially, and bound samples are optimized second. Preferably, all bound sample vectors can be removed, since the subsequent discriminant calculation needs only the non-bound sample vectors.
On the basis of the above embodiments, training is stopped when the Lagrange multipliers of all sample vectors in the training sample set satisfy the KKT conditions, or when the target loss function loss of the sample vectors V1 and V2 is less than a preset threshold.
Specifically, when the Lagrange multipliers α of all points satisfy the KKT conditions, or when, after a certain number of iterations, the optimization-objective loss function loss is less than a preset threshold, the optimization is considered to have approximately reached the KKT conditions, and training can be stopped.
With this method, a target loss function loss below the preset threshold indicates that further optimization would yield little improvement; training is stopped at that point to reduce the overall computation.
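The stopping rule can be sketched as a small predicate. The per-sample violation scores, the tolerance `tol`, and the parameter names are hypothetical; the patent text does not specify how the loss is computed, so it is passed in as a plain number.

```python
def should_stop(violations, loss, loss_threshold, tol=1e-6):
    """Return True when training may stop.

    violations:     iterable of non-negative KKT violation scores, one per sample
    loss:           current value of the target loss function
    loss_threshold: preset threshold below which further optimization is skipped
    """
    all_kkt_satisfied = all(v <= tol for v in violations)
    return all_kkt_satisfied or loss < loss_threshold
```

Either condition alone is sufficient, which matches the "or" in the description: exact satisfaction of the KKT conditions, or a loss small enough that the optimum is considered approximately reached.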
Finally, the methods of the present application are only preferred embodiments and are not intended to limit the scope of protection of the present invention. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall fall within the scope of protection of the present invention.

Claims (10)

1. A support vector machine training method based on the Spark framework, characterized by comprising:
S1: obtaining a training sample set, all sample vectors of which are stored in a distributed manner across the data nodes of the Spark framework;
S2: extracting from the training sample set the sample vector V2 that most severely violates the KKT conditions, and selecting the sample vector V1 whose center distance (distance to the sphere center) differs most from that of V2;
S3: performing an iterative optimization calculation on the sample vectors V1 and V2 to obtain the updated sample vectors V1^new and V2^new;
S4: broadcasting the updated sample vectors V1^new and V2^new to the data nodes of Spark, computing in each data node the difference produced by the change in V1 and V2, and computing the updated sphere center a^new from the differences computed in the data nodes;
S5: according to the updated sphere center a^new, updating the center distance of each sample vector in the Spark data nodes, and at the same time updating the sphere radius R.
2. The method according to claim 1, characterized in that step S1 further comprises: in each data node, reading the sample vectors of the training samples held by that data node, and generating a unique data identifier for each sample vector.
3. The method according to claim 2, characterized in that the unique data identifier is composed of the partition number of the data node and the local timestamp of the data node.
4. The method according to claim 2, characterized in that step S1 further comprises initializing the calculation parameters required for the iterative optimization calculation;
wherein the calculation parameters include the Lagrange multipliers α of all sample vectors, the sphere center a, and the center distance d² of each sample vector.
5. The method according to claim 4, characterized in that initializing the calculation parameters of the iterative optimization calculation specifically comprises:
initializing the Lagrange multiplier α of every sample vector to 1/N;
where N is the number of sample vectors in the training sample set;
initializing the square of the sphere radius, R², so that R² = 0;
initializing the sphere center according to the following formula:
$a^2 = \sum_i \sum_j \alpha_i \cdot \alpha_j \cdot K_{ij}$
where a is the sphere center, α_i and α_j correspond to any two sample vectors in the training sample set, and K_ij is the kernel function;
calculating the center distance d² of each sample vector according to the formula.
6. The method according to claim 1, characterized in that, in step S2, the sample vector V2 that most severely violates the KKT conditions is extracted from the training sample set by sampling without replacement.
7. The method according to claim 1, characterized in that selecting, in step S2, the sample vector V1 whose center distance differs most from that of V2 specifically comprises:
for each data node, obtaining the sample vector in that data node whose center distance differs most from that of the sample vector V2;
in the Driver Program of the Spark framework, obtaining, from the candidates found in the individual data nodes, the sample vector V1 whose center distance differs most from that of V2.
8. The method according to claim 1, characterized in that, in step S4, computing the updated sphere center a^new specifically comprises: accumulating, in the Driver Program of the Spark framework, the differences computed in all data nodes to obtain the new sphere center a^new.
9. The method according to claim 1, characterized by further comprising, after step S5: according to the updated center distance of each vector and the sphere radius R, removing the bound sample vectors, retaining all non-bound samples, and returning to perform S1.
10. The method according to claim 8, characterized by further comprising, after step S5: stopping training when the Lagrange multipliers of all sample vectors in the training sample set satisfy the KKT conditions, or when the target loss function loss of the sample vectors V1 and V2 is less than a preset threshold.
CN201711269096.1A 2017-12-05 2017-12-05 Spark frame-based support vector machine training method Active CN108121998B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201711269096.1A CN108121998B (en) 2017-12-05 2017-12-05 Spark frame-based support vector machine training method

Publications (2)

Publication Number Publication Date
CN108121998A (en) 2018-06-05
CN108121998B (en) 2020-09-25

Family

ID=62228798

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201711269096.1A Active CN108121998B (en) 2017-12-05 2017-12-05 Spark frame-based support vector machine training method

Country Status (1)

Country Link
CN (1) CN108121998B (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102184421A (en) * 2011-04-22 2011-09-14 北京航空航天大学 Training method of support vector regression machine
CN104463211A (en) * 2014-12-08 2015-03-25 天津大学 Support vector data description method based on maximum distance between centers of spheres
CN105975907A (en) * 2016-04-27 2016-09-28 江苏华通晟云科技有限公司 SVM model pedestrian detection method based on distributed platform
CN106203485A (en) * 2016-07-01 2016-12-07 北京邮电大学 A kind of parallel training method and device of support vector machine
CN106469315A (en) * 2016-09-05 2017-03-01 南京理工大学 Based on the multi-mode complex probe target identification method improving One Class SVM algorithm
CN107194411A (en) * 2017-04-13 2017-09-22 哈尔滨工程大学 A kind of SVMs parallel method of improved layering cascade

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
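David M. J. Tax: "Data Domain Description using Support Vectors", ESANN'1999 *
Zhang Yu, Luo Ke: "Classification method for large data sets based on OC-SVM", Computer Engineering and Applications *
Xu Tu et al.: "SMO training algorithm for hypersphere one-class support vector machines", Computer Science *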

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109190719A (en) * 2018-11-30 2019-01-11 长沙理工大学 Support vector machines learning method, device, equipment and computer readable storage medium
CN110210566A (en) * 2019-06-06 2019-09-06 无锡火球普惠信息科技有限公司 One-to-many supporting vector machine frame and its parallel method based on Spark
CN111368874A (en) * 2020-01-23 2020-07-03 天津大学 Image category incremental learning method based on single classification technology
CN111368874B (en) * 2020-01-23 2022-11-15 天津大学 Image category incremental learning method based on single classification technology

Also Published As

Publication number Publication date
CN108121998B (en) 2020-09-25

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant