CN108121998A - Support vector machine training method based on the Spark framework - Google Patents
Support vector machine training method based on the Spark framework
- Publication number
- CN108121998A (application number CN201711269096.1A)
- Authority
- CN
- China
- Prior art keywords
- sample
- sample vector
- sphere
- centre
- vector
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
- G06F18/2411—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on the proximity to a decision surface, e.g. support vector machines
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/94—Hardware or software architectures specially adapted for image or video understanding
- G06V10/95—Hardware or software architectures specially adapted for image or video understanding structured as a network, e.g. client-server architectures
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Evolutionary Biology (AREA)
- Evolutionary Computation (AREA)
- Bioinformatics & Computational Biology (AREA)
- General Engineering & Computer Science (AREA)
- Artificial Intelligence (AREA)
- Life Sciences & Earth Sciences (AREA)
- Software Systems (AREA)
- Multimedia (AREA)
- Complex Calculations (AREA)
Abstract
The present invention provides a support vector machine training method based on the Spark framework, comprising: obtaining a training sample set, all sample vectors of which are stored in a distributed manner on the data nodes of the Spark framework; extracting from the training sample set the sample vector $V_2$ that most severely violates the KKT conditions, and at the same time selecting the sample vector $V_1$ whose center distance (the distance from a sample vector to the sphere center) differs most from that of $V_2$; performing an iterative optimization calculation on $V_1$ and $V_2$ to obtain the updated sample vectors $V_1^{new}$ and $V_2^{new}$; broadcasting $V_1^{new}$ and $V_2^{new}$ to the data nodes of Spark, calculating on each data node the difference produced by $V_1$ and $V_2$, and calculating the updated sphere center from these differences; and then updating the center distance of each sample vector on the data nodes and the radius of the sphere. By using the Spark distributed computing framework, the method distributes the computation-intensive work of a single machine to the worker nodes, so that the system can be scaled horizontally when the data grows and storage is no longer limited by a single machine.
Description
Technical field
The present invention relates to the field of computer technology, and more particularly to a support vector machine training method based on the Spark framework.
Background art
Since its introduction, the support vector machine (SVM) has been widely applied in information security, image processing, pattern recognition, fault diagnosis, anomaly detection and other fields. In 1999, Tax, Schölkopf, Duin et al. proposed two kinds of one-class SVM algorithms, based respectively on a hyperplane and on a hypersphere. Among them, support vector data description (SVDD) is a hypersphere-based one-class classification technique whose goal is to use the training data to describe a hypersphere that serves as the discriminant model for classification.
The software packages currently in common use for SVM pattern recognition and regression are scikit-learn for Python and LIBSVM by Professor Chih-Jen Lin of Taiwan. scikit-learn is a Python-based machine learning module released under the BSD open-source license; the project was first initiated by David Cournapeau in 2007 and is currently maintained by community volunteers. LIBSVM is a simple, easy-to-use, fast and effective software package for SVM pattern recognition and regression, developed and designed by Professor Chih-Jen Lin of National Taiwan University et al. It provides not only compiled executables for Windows systems but also source code, which facilitates improvement, modification and use on other operating systems. The software requires relatively little tuning of the parameters involved in SVM and provides many default parameters with which many problems can be solved; it also provides a cross-validation function. The software can solve problems such as C-SVM, ν-SVM, ε-SVR and ν-SVR, including multi-class pattern recognition based on the one-versus-one algorithm.
However, with the exponential growth of data volume, the memory and CPU of a single machine can no longer meet demand, and the need for parallelized solution methods is increasingly urgent. Solving support vector data description (SVDD) with the SMO algorithm requires computing multiple quadratic programming problems and has high computational complexity, so the running time of SVDD increases sharply with the number of training samples. The memory required to store the kernel matrix K grows rapidly with the number of training points N in the training set, because the size of the kernel matrix is quadratic in the number of samples. Applying SVDD directly to data anomaly detection therefore leads to excessive computation and memory overflow.
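To make the quadratic growth concrete, the following sketch estimates the memory needed to hold a dense N × N kernel matrix on a single machine. The function name and the 8 bytes per entry (double precision) are assumptions made for illustration.

```python
def kernel_matrix_bytes(n_samples: int, bytes_per_entry: int = 8) -> int:
    """Memory for a dense n x n kernel matrix (assumes float64 entries)."""
    return n_samples * n_samples * bytes_per_entry


# Doubling N quadruples the memory requirement.
for n in (10_000, 100_000, 1_000_000):
    gib = kernel_matrix_bytes(n) / 2 ** 30
    print(f"N = {n:>9,d}: about {gib:,.1f} GiB")
```

Under these assumptions, a training set of one hundred thousand samples already needs 80 GB for the kernel matrix alone, which is the kind of pressure that motivates distributing the work.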
Summary of the invention
In the prior art, solving SVDD with the SMO algorithm requires computing multiple quadratic programming problems and has high computational complexity, the running time of SVDD increases sharply with the number of training samples, and applying SVDD directly to data anomaly detection leads to excessive computation and memory overflow. To solve these problems, a support vector machine training method based on the Spark framework is proposed.
The method provided by the invention comprises:
S1: obtaining a training sample set, all sample vectors of which are stored in a distributed manner on the data nodes of the Spark framework;
S2: extracting from the training sample set the sample vector $V_2$ that most severely violates the KKT conditions, and at the same time selecting the sample vector $V_1$ whose center distance (the distance from a sample vector to the sphere center) differs most from that of $V_2$;
S3: performing an iterative optimization calculation on the sample vectors $V_1$ and $V_2$ to obtain the updated sample vectors $V_1^{new}$ and $V_2^{new}$;
S4: broadcasting the updated sample vectors $V_1^{new}$ and $V_2^{new}$ to the data nodes of Spark, calculating on each data node the difference produced by the sample vectors $V_1$ and $V_2$, and, according to the differences calculated on the data nodes, calculating the updated sphere center $a^{new}$;
S5: according to the updated sphere center $a^{new}$, updating the center distance of each sample vector on the data nodes of Spark and, at the same time, updating the sphere radius $R$.
Step S1 further comprises: on each data node, reading the sample vectors of the training samples held on that data node, and generating a unique data identifier for each sample vector.
Preferably, the unique data identifier is composed of the partition number of the data node and the local timestamp of the data node.
Step S1 further comprises initializing the calculation parameters required for the iterative optimization calculation, wherein the calculation parameters include the Lagrange multipliers α of all sample vectors, the sphere center a, and the center distance $d^2$ of each sample vector.
Initializing the calculation parameters of the iterative optimization calculation specifically comprises:
initializing the Lagrange multiplier α of every sample vector to 1/N, where N is the number of sample vectors in the training sample set;
initializing the square of the sphere radius, $R^2$, so that $R^2 = 0$;
initializing the sphere center according to the following formula:
$$a^2 = \sum_i \sum_j \alpha_i \alpha_j K_{ij}$$
where a is the sphere center, $\alpha_i$ and $\alpha_j$ are the Lagrange multipliers of any two sample vectors in the training sample set, and $K_{ij}$ is the kernel function;
calculating the center distance $d^2$ of each sample vector according to the center-distance formula.
Preferably, in step S2, the sample vector $V_2$ that most severely violates the KKT conditions is extracted from the training sample set by sampling without replacement.
Selecting, in step S2, the sample vector $V_1$ whose center distance differs most from that of the sample vector $V_2$ specifically comprises:
for each data node, obtaining the sample vector on that data node whose center distance differs most from that of the sample vector $V_2$;
in the Driver Program of the Spark framework, obtaining from the per-node candidates the sample vector $V_1$ whose center distance differs most from that of the sample vector $V_2$.
In step S4, the step of calculating the updated sphere center $a^{new}$ specifically comprises: accumulating, in the Driver Program of the Spark framework, the differences calculated on all data nodes, and calculating the new sphere center $a^{new}$.
The method further comprises, after step S5: according to the updated center distance of each vector and the radius R, removing the bound sample vectors, retaining all non-bound samples, and returning to perform S1.
The method further comprises, after step S5: stopping training when it is determined that the Lagrange multipliers of all sample vectors in the training sample set satisfy the KKT conditions, or that the target loss function loss of the sample vectors $V_1$ and $V_2$ is less than a preset threshold.
The method provided by the invention uses the Spark distributed computing framework to distribute the computation-intensive work of a single machine to the worker nodes, and distributes the kernel matrix K, which a single machine would otherwise have to store in bulk, across the data nodes. When the data grows, the system can be scaled horizontally; because the worker nodes compute independently, the computation time does not increase significantly, and storage is no longer limited by a single machine. In addition, the incremental computation avoids the full recomputation that would otherwise be performed in every iteration, which saves a large number of computation cycles and accelerates the solution process.
Description of the drawings
Fig. 1 is a flowchart of a support vector machine training method based on the Spark framework according to an embodiment of the present invention;
Fig. 2 is a structural diagram of the Spark framework in a support vector machine training method based on the Spark framework according to an embodiment of the present invention;
Fig. 3 is a flowchart of a support vector machine training method based on the Spark framework according to another embodiment of the present invention.
Detailed description of the embodiments
The specific embodiments of the present invention are described in further detail below with reference to the drawings and examples. The following embodiments are intended to illustrate the present invention, not to limit its scope.
Referring to Fig. 1, Fig. 1 is a flowchart of a support vector machine training method based on the Spark framework according to an embodiment of the present invention. The method comprises:
S1: obtaining a training sample set, all sample vectors of which are stored in a distributed manner on the data nodes of the Spark framework.
Specifically, after the training sample set is received, the sample vectors in the sample set are stored in a distributed manner on the data nodes under the Spark framework.
As shown in Fig. 2, Apache Spark is a fast, general-purpose engine designed for large-scale distributed in-memory data computation. It is an open-source, Hadoop MapReduce-like general parallel framework from the AMP Lab at the University of California, Berkeley. Unlike MapReduce, Spark can keep the intermediate output of a job in memory and therefore no longer needs to read and write HDFS, so Spark is better suited to algorithms that require iteration, such as those in data mining and machine learning. Many parallel algorithms have implementations on Spark.
With this method, the sample vectors of the training set are stored in a distributed manner across multiple data nodes, so that the system can be scaled horizontally when the data grows.
S2: extracting from the training sample set the sample vector $V_2$ that most severely violates the KKT conditions, and at the same time selecting the sample vector $V_1$ whose center distance differs most from that of $V_2$.
Specifically, the optimization iteration uses the SMO algorithm, i.e. two sample vectors are selected for optimization at a time. The two sample vectors being optimized are generally denoted $V_1$ and $V_2$. The chosen stopping condition determines which points contribute most to the convergence of the algorithm; for example, using the method of monitoring the feasibility gap, the points that most severely violate the KKT conditions are optimized first and taken as $V_2$. According to the KKT conditions, the iterative relation between $V_1$ and $V_2$ can be determined as:
$$\lambda_1 = \alpha_1 + \alpha_2 - \lambda_2$$
where K is the kernel function, α is the Lagrange multiplier, and $d^2$ is the center distance.
To make the update step of each optimization as large as possible, the term that governs the step size must be maximized, or equivalently its counterpart minimized; the sample vector that attains this extremum is taken as $V_1$.
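The full pairwise update formulas are not reproduced in this text, so the following is only a sketch of the textbook SMO pair update for the SVDD dual, not the patent's own formula. It assumes the dual objective $\sum_i \alpha_i K_{ii} - \sum_i \sum_j \alpha_i \alpha_j K_{ij}$ with $\sum_i \alpha_i = 1$ and $0 \le \alpha_i \le C$, and the center distance $d_i^2 = K_{ii} - 2\sum_j \alpha_j K_{ij} + a^2$. All function names are hypothetical. Under these assumptions the unclipped step is proportional to the difference of the two center distances, which is consistent with choosing $V_1$ as the sample whose center distance differs most from that of $V_2$.

```python
import numpy as np


def gaussian_kernel_matrix(X, sigma=1.0):
    """Dense Gaussian kernel matrix K_ij = exp(-||x_i - x_j||^2 / (2 sigma^2))."""
    sq = np.sum(X ** 2, axis=1)
    dist2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    return np.exp(-dist2 / (2.0 * sigma ** 2))


def center_distances(alpha, K):
    """Assumed SVDD center distance: d_i^2 = K_ii - 2 sum_j alpha_j K_ij + a^2."""
    Ka = K @ alpha
    return np.diag(K) - 2.0 * Ka + alpha @ Ka


def dual_objective(alpha, K):
    """Assumed SVDD dual objective (the quantity SMO maximizes)."""
    return alpha @ np.diag(K) - alpha @ K @ alpha


def smo_pair_update(alpha, K, i1, i2, C):
    """Jointly optimize alpha[i1] and alpha[i2] while keeping their sum fixed."""
    d2 = center_distances(alpha, K)
    s = alpha[i1] + alpha[i2]                  # conserved by the update
    eta = K[i1, i1] + K[i2, i2] - 2.0 * K[i1, i2]
    new_alpha = alpha.copy()
    if eta <= 1e-12:                           # degenerate pair: leave unchanged
        return new_alpha
    unclipped = alpha[i2] + (d2[i2] - d2[i1]) / (2.0 * eta)
    low, high = max(0.0, s - C), min(C, s)     # box constraints for alpha[i2]
    new_alpha[i2] = min(max(unclipped, low), high)
    new_alpha[i1] = s - new_alpha[i2]
    return new_alpha


rng = np.random.default_rng(0)
X = rng.normal(size=(8, 2))
K = gaussian_kernel_matrix(X)
C = 1.0
alpha = np.full(8, 1.0 / 8)                    # initial multipliers 1/N
d2 = center_distances(alpha, K)
i2, i1 = int(np.argmax(d2)), int(np.argmin(d2))
new_alpha = smo_pair_update(alpha, K, i1, i2, C)
```

Only two entries of the multiplier vector change in each step, which is what makes the later difference-based updates of the sphere center and the center distances possible.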
S3: performing an iterative optimization calculation on the sample vectors $V_1$ and $V_2$ to obtain the updated sample vectors $V_1^{new}$ and $V_2^{new}$.
S4: broadcasting the updated sample vectors $V_1^{new}$ and $V_2^{new}$ to the multiple data nodes of Spark, calculating on each data node the difference produced by the sample vectors $V_1$ and $V_2$, and, according to the differences calculated on the data nodes, calculating the updated sphere center $a^{new}$.
S5: according to the updated sphere center $a^{new}$, updating the center distance of each sample vector on the data nodes of Spark and, at the same time, updating the sphere radius $R$.
Specifically, the optimization iteration uses the SMO algorithm. According to the minimum enclosing sphere model of the one-class SVM, the objective function is minimized subject to:
$$\text{s.t.}\quad \|\Phi(x_i) - a\|^2 \le R^2 + \zeta_i, \qquad \zeta_i \ge 0$$
where R is the sphere radius, a is the sphere center, and ζ is the slack variable.
Solving this quadratic programming problem yields the sphere center and the radius.
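For orientation, the textbook form of the SVDD problem is given below. This is an assumption about the intended objective, stated as the standard formulation and not taken from the patent text; it is consistent with the constraints above, with the penalty factor C, and with the initialization of every Lagrange multiplier to 1/N.

```latex
% Primal (standard SVDD, assumed): smallest sphere with slack
\min_{R,\,a,\,\zeta}\; R^2 + C \sum_i \zeta_i
\quad \text{s.t.}\quad \|\Phi(x_i) - a\|^2 \le R^2 + \zeta_i,\quad \zeta_i \ge 0

% Dual (standard SVDD, assumed): the quadratic program solved by SMO
\max_{\alpha}\; \sum_i \alpha_i K_{ii} - \sum_i \sum_j \alpha_i \alpha_j K_{ij}
\quad \text{s.t.}\quad \sum_i \alpha_i = 1,\quad 0 \le \alpha_i \le C

% Sphere center and its squared norm in terms of the multipliers
a = \sum_i \alpha_i \Phi(x_i), \qquad
a^2 = \sum_i \sum_j \alpha_i \alpha_j K_{ij}
```

In this form the constraint $\sum_i \alpha_i = 1$ explains why a pairwise update must keep $\alpha_1 + \alpha_2$ constant.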
All parameters are updated according to the steps of Fig. 3. The updated parameters include the Lagrange multipliers α of $V_1$ and $V_2$, the sphere center a, the center distance $d^2$ of each sample vector, and the sphere radius R. The specific steps include updating the Lagrange multipliers α of $V_1$ and $V_2$ according to the following formula:
$$\lambda_1 = \alpha_1 + \alpha_2 - \lambda_2$$
After $V_1$ and $V_2$ have been updated, the updated sample vectors $V_1^{new}$ and $V_2^{new}$ are obtained. $V_1$, $V_2$, $V_1^{new}$, $V_2^{new}$ and the original sphere center a are broadcast to each data node of Spark and the sphere center a is updated. The update formula is:
$$a^2 = \sum_i \sum_j \alpha_i \alpha_j K_{ij}$$
where $\alpha_i$ and $\alpha_j$ are the Lagrange multipliers of any two sample vectors in the training sample set and $K_{ij}$ is the kernel function. Because only the parameters α of the sample vectors $V_1$ and $V_2$ change, only the terms that involve $V_1$ and $V_2$ change, so a can be calculated by a difference, in which $a^{old}$ is the original sphere center, $a^{new}$ is the updated sphere center, and K is the kernel function. The difference calculation is carried out in a distributed manner on each data partition of Spark, and the results are accumulated in the Driver Program of Spark.
Once the new sphere center has been obtained, the center distance of each sample vector can be updated by applying the corresponding difference formula, in which $d^2_{new}$ is the new center distance, $d^2_{old}$ is the original center distance, a is the sphere center, and K is the kernel function. This step is calculated in a distributed manner on the data nodes of Spark.
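The difference formulas themselves are not reproduced in this text. The following sketch shows one way such an incremental update can work, under the assumption that the center distance has the standard SVDD form $d_i^2 = K_{ii} - 2\sum_j \alpha_j K_{ij} + a^2$ with a Gaussian kernel. Partitions are simulated with plain NumPy arrays instead of Spark, and all function names are hypothetical. Each simulated data node computes a partial sum, the "driver" accumulates the partial sums into the change of $a^2$, and each node then updates its local center distances without a full recomputation.

```python
import numpy as np


def gaussian_kernel(A, B, sigma=1.0):
    """Gaussian kernel between every row of A and every row of B."""
    dist2 = (np.sum(A ** 2, axis=1)[:, None] + np.sum(B ** 2, axis=1)[None, :]
             - 2.0 * A @ B.T)
    return np.exp(-np.maximum(dist2, 0.0) / (2.0 * sigma ** 2))


def partial_cross_terms(X_part, alpha_part, pair):
    """On one data node: partial sums of sum_j alpha_j K(v, x_j) for v in (V1, V2)."""
    return gaussian_kernel(pair, X_part) @ alpha_part        # shape (2,)


def center_norm_delta(partials, delta, K_pair):
    """On the driver: a2_new - a2_old from the accumulated partial sums."""
    Ka_pair = np.sum(partials, axis=0)                       # accumulate over nodes
    return 2.0 * delta @ Ka_pair + delta @ K_pair @ delta


def update_distances(d2_old_part, X_part, pair, delta, a2_delta):
    """On one data node: incremental update of the local center distances."""
    K_local = gaussian_kernel(X_part, pair)                  # shape (n_local, 2)
    return d2_old_part - 2.0 * K_local @ delta + a2_delta


def full_center_distances(alpha, K):
    """Reference: full recomputation over the whole data set."""
    Ka = K @ alpha
    return np.diag(K) - 2.0 * Ka + alpha @ Ka


rng = np.random.default_rng(1)
X = rng.normal(size=(12, 2))
alpha_old = np.full(12, 1.0 / 12)
i1, i2 = 0, 5                                  # indices standing in for V1, V2
delta = np.array([-0.03, 0.03])                # change of alpha at V1 and V2
alpha_new = alpha_old.copy()
alpha_new[[i1, i2]] += delta

K_full = gaussian_kernel(X, X)
d2_old = full_center_distances(alpha_old, K_full)
pair = X[[i1, i2]]                             # the broadcast pair

X_parts = np.array_split(X, 3)                 # three simulated data nodes
alpha_parts = np.array_split(alpha_old, 3)
d2_parts = np.array_split(d2_old, 3)

partials = [partial_cross_terms(Xp, ap, pair) for Xp, ap in zip(X_parts, alpha_parts)]
a2_delta = center_norm_delta(partials, delta, gaussian_kernel(pair, pair))
d2_incremental = np.concatenate(
    [update_distances(dp, Xp, pair, delta, a2_delta) for dp, Xp in zip(d2_parts, X_parts)]
)
d2_reference = full_center_distances(alpha_new, K_full)
```

Each node only evaluates the kernel between its local samples and the two broadcast vectors, so the per-iteration cost on a node is linear in its partition size instead of quadratic in the whole data set.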
Finally, the sphere radius R is also updated. Specifically, when $V_1$ and $V_2$ are both non-bound samples, i.e. $\xi < \alpha_i < C$, where ξ is a small number close to 0 and C is the penalty factor, R is updated with the update formula for non-bound samples.
When $V_1$ and $V_2$ are both bound samples, i.e. $\alpha_i \le \xi$ or $\alpha_i \ge C$, R is updated with the update formula for bound samples.
With this method, the computation-intensive work of a single machine is distributed to the worker nodes by means of the Spark distributed computing framework, and the kernel matrix K, which a single machine would otherwise have to store in bulk, is distributed across the worker nodes. When the data grows, the system can be scaled horizontally; because the worker nodes compute independently, the computation time does not increase significantly, and storage is no longer limited by a single machine. In addition, the incremental computation avoids the full recomputation that would otherwise be performed in every iteration, which saves a large number of computation cycles and accelerates the solution process.
On the basis of the above embodiment, step S1 further comprises: on each data node, reading the sample vectors of the training samples held on that data node, and generating a unique data identifier for each sample vector.
Preferably, the unique data identifier is composed of the partition number of the data node and the local timestamp of the data node.
Specifically, before the optimization calculation starts, once all sample vectors of the training sample set have been stored in a distributed manner on the data nodes of the Spark framework, each data node reads in the data of its local data blocks, and each sample vector generates a non-repeating random number that forms its unique data identifier id. Because sample vectors in the training sample set may have identical parameters, the unique data identifier id is used here to distinguish all sample vectors.
Preferably, the id can be composed of the partition sequence number and the local timestamp. The unique id can be used to distinguish sample vectors that have the same memory address on different Executors.
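A minimal sketch of such an identifier is shown below. The function name and the id layout are hypothetical; the per-partition sequence counter is an added assumption, included because two samples read in quick succession could otherwise receive the same timestamp.

```python
import time
from typing import Any, Iterable, Iterator, Tuple


def assign_ids(partition_index: int, rows: Iterable[Any]) -> Iterator[Tuple[str, Any]]:
    """Attach an id built from partition number, local timestamp and a counter."""
    for seq, row in enumerate(rows):
        # The counter (seq) is an assumption that guards against equal timestamps.
        yield f"{partition_index}-{time.time_ns()}-{seq}", row


# Two simulated partitions that contain identical sample vectors.
partitions = [[(1.0, 2.0), (1.0, 2.0)], [(1.0, 2.0), (3.0, 4.0)]]
tagged = [item for idx, part in enumerate(partitions) for item in assign_ids(idx, part)]
ids = [sample_id for sample_id, _ in tagged]
```

In PySpark, a function with this `(index, iterator)` signature can be passed to `RDD.mapPartitionsWithIndex`, so that each partition tags its own samples locally.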
On the basis of the above embodiments, step S1 further comprises initializing the calculation parameters required for the iterative optimization calculation, wherein the calculation parameters include the Lagrange multipliers α of all sample vectors, the sphere center a, and the center distance $d^2$ of each sample vector.
Preferably, initializing the calculation parameters of the iterative optimization calculation specifically comprises:
initializing the Lagrange multiplier α of every sample vector to 1/N, where N is the number of sample vectors in the training sample set;
initializing the square of the sphere radius, $R^2$, so that $R^2 = 0$;
initializing the sphere center according to the following formula:
$$a^2 = \sum_i \sum_j \alpha_i \alpha_j K_{ij}$$
where a is the sphere center, $\alpha_i$ and $\alpha_j$ are the Lagrange multipliers of any two sample vectors in the training sample set, and $K_{ij}$ is the kernel function;
calculating the center distance $d^2$ of each sample vector according to the center-distance formula.
Specifically, before the iterative optimization calculation, the iterative calculation parameters of the support vector machine are initialized. First, the Lagrange multiplier α of each sample vector is initialized; preferably, the initial value is set to 1/N, where N is the number of all sample vectors in the training sample set. This process is a distributed calculation, carried out separately on each data node.
Then the square of the sphere radius, $R^2$, is initialized; preferably, it is set to 0, i.e. $R^2 = 0$.
Then the sphere center is initialized according to the following formula:
$$a^2 = \sum_i \sum_j \alpha_i \alpha_j K_{ij}$$
where a is the sphere center, $\alpha_i$ and $\alpha_j$ are the Lagrange multipliers of any two sample vectors in the training sample set, and $K_{ij}$ is the Gaussian kernel function.
Finally, the distance $d^2$ from each sample vector to the sphere center a is calculated according to the center-distance formula. This step requires a full calculation over the data set, and the results are stored in a HashMap keyed by sample vector.
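The initialization can be sketched on a single machine as follows. The formula for $a^2$ is the one given in the text; the center-distance expression $d_i^2 = K_{ii} - 2\sum_j \alpha_j K_{ij} + a^2$ is the standard SVDD form and is an assumption here, since the patent's formula is not reproduced. The function names, the kernel width `sigma`, and the use of the sample index as the dictionary key (standing in for the unique id) are hypothetical.

```python
import numpy as np


def gaussian_kernel_matrix(X, sigma=1.0):
    """Dense Gaussian kernel matrix K_ij = exp(-||x_i - x_j||^2 / (2 sigma^2))."""
    sq = np.sum(X ** 2, axis=1)
    dist2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    return np.exp(-dist2 / (2.0 * sigma ** 2))


def initialize(X, sigma=1.0):
    """Initial alpha, R^2, a^2 and center distances for N sample vectors."""
    n = X.shape[0]
    alpha = np.full(n, 1.0 / n)                # every multiplier starts at 1/N
    r2 = 0.0                                   # squared radius starts at 0
    K = gaussian_kernel_matrix(X, sigma)
    a2 = alpha @ K @ alpha                     # a^2 = sum_i sum_j alpha_i alpha_j K_ij
    d2 = np.diag(K) - 2.0 * K @ alpha + a2     # assumed standard SVDD distance
    d2_by_id = {i: float(v) for i, v in enumerate(d2)}   # dict as the HashMap
    return alpha, r2, a2, d2_by_id


rng = np.random.default_rng(42)
X = rng.normal(size=(10, 3))
alpha, r2, a2, d2_by_id = initialize(X)
```

With a Gaussian kernel every diagonal entry $K_{ii}$ equals 1, so the weighted mean of the initial center distances equals $1 - a^2$; this gives a cheap sanity check on the full calculation.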
On the basis of the above embodiment, in step S2 the sample vector $V_2$ that most severely violates the KKT conditions is extracted from the training sample set by sampling without replacement.
Specifically, when the sample vector $V_2$ that most severely violates the KKT conditions is extracted, sampling without replacement is chosen, so that all samples are traversed within one full iteration cycle.
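The text does not spell out how the degree of KKT violation is measured. The sketch below assumes the usual SVDD conditions: a sample with $\alpha_i = 0$ must lie inside the sphere, a sample with $0 < \alpha_i < C$ must lie on it, and a sample with $\alpha_i = C$ must lie on or outside it. The function names and the tolerance `eps` are hypothetical. Selection without replacement is modelled with a boolean `visited` mask.

```python
import numpy as np


def kkt_violation(alpha, d2, r2, C, eps=1e-8):
    """Per-sample violation of the assumed SVDD KKT conditions (0 means satisfied)."""
    # alpha < C but the sample lies outside the sphere.
    outside = np.where(alpha < C - eps, np.maximum(d2 - r2, 0.0), 0.0)
    # alpha > 0 but the sample lies strictly inside the sphere.
    inside = np.where(alpha > eps, np.maximum(r2 - d2, 0.0), 0.0)
    return np.maximum(outside, inside)


def pick_v2(violation, visited):
    """Pick the worst violator not yet drawn in this cycle; None when exhausted."""
    candidates = np.where(visited, -np.inf, violation)
    idx = int(np.argmax(candidates))
    if visited[idx]:                           # every sample has been drawn
        return None
    visited[idx] = True                        # without replacement
    return idx


alpha = np.array([0.0, 0.2, 0.5, 0.3])
d2 = np.array([1.5, 1.0, 0.6, 1.2])
violation = kkt_violation(alpha, d2, r2=1.0, C=0.5)
visited = np.zeros(4, dtype=bool)
order = [pick_v2(violation, visited) for _ in range(4)]
exhausted = pick_v2(violation, visited)
```

Once `pick_v2` returns `None`, one full cycle is complete and the mask can be reset for the next cycle.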
On the basis of the above embodiments, selecting, in step S2, the sample vector $V_1$ whose center distance differs most from that of the sample vector $V_2$ specifically comprises:
for each data node, obtaining the sample vector on that data node whose center distance differs most from that of the sample vector $V_2$;
in the Driver Program of the Spark framework, obtaining from the per-node candidates the sample vector $V_1$ whose center distance differs most from that of the sample vector $V_2$.
Specifically, as shown in Fig. 3, after the sample vector $V_2$ has been extracted, each data node of Spark finds the local sample vector whose center distance differs most from that of $V_2$; the Driver Program under the Spark framework then selects, among these candidates, the sample vector with the globally largest center-distance difference as $V_1$.
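The two-level selection can be sketched without Spark as follows. Each partition is modelled as a list of `(sample_id, d2)` pairs; the function names and the sample ids are hypothetical.

```python
from typing import List, Optional, Tuple

Sample = Tuple[str, float]                     # (unique id, center distance d^2)


def local_best(partition: List[Sample], d2_v2: float) -> Optional[Sample]:
    """On one data node: the local sample whose center distance differs most from V2's."""
    return max(partition, key=lambda item: abs(item[1] - d2_v2), default=None)


def global_best(candidates: List[Optional[Sample]], d2_v2: float) -> Sample:
    """On the driver: the best of the per-node candidates (empty nodes are skipped)."""
    present = [c for c in candidates if c is not None]
    return max(present, key=lambda item: abs(item[1] - d2_v2))


partitions = [
    [("0-a", 0.9), ("0-b", 1.4)],
    [("1-a", 0.2), ("1-b", 1.1)],
    [],                                        # a node that holds no samples
]
d2_v2 = 1.5                                    # center distance of the chosen V2
candidates = [local_best(part, d2_v2) for part in partitions]
v1 = global_best(candidates, d2_v2)
```

Only one candidate per node travels to the driver, so the amount of data exchanged per iteration depends on the number of nodes and not on the number of samples. In PySpark the same pattern is commonly expressed with `RDD.mapPartitions` followed by a reduction on the driver.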
On the basis of the above embodiments, in step S4 the step of calculating the updated sphere center $a^{new}$ specifically comprises: accumulating, in the Driver Program of the Spark framework, the differences calculated on all data nodes, and calculating the new sphere center $a^{new}$.
Specifically, as shown in Fig. 3, after the iterative optimization calculation on the sample vectors $V_1$ and $V_2$ has produced the updated sample vectors $V_1^{new}$ and $V_2^{new}$, the updated $V_1$ and $V_2$ are broadcast to each data node of Spark, and each data node calculates the difference produced by the change in $V_1$ and $V_2$. The Driver Program of the Spark framework then accumulates the differences calculated on all data nodes and calculates the new sphere center $a^{new}$.
On the basis of the above embodiments, the method further comprises, after step S5: according to the updated center distance of each vector and the sphere radius R, removing the bound sample vectors, retaining all non-bound samples, and returning to perform S1.
Specifically, after one iterative optimization calculation is completed, the next pair $V_1$ and $V_2$ is selected and the next round of iterative calculation is carried out. A heuristic selection method is used: non-bound samples are selected preferentially for calculation, and bound samples are optimized second. Preferably, all bound sample vectors can be removed, because the subsequent discriminant value is calculated using only the non-bound sample vectors.
On the basis of the above embodiments, training is stopped when it is determined that the Lagrange multipliers of all sample vectors in the training sample set satisfy the KKT conditions, or that the target loss function loss of the sample vectors $V_1$ and $V_2$ is less than a preset threshold.
Specifically, when the Lagrange multipliers α of all points satisfy the KKT conditions, or when after a certain number of iterations the optimization target loss function loss is less than a preset threshold, the optimization is considered to have approximately reached the KKT conditions, and training can be stopped.
With this method, a target loss function loss below the preset threshold indicates that further optimization would no longer have a noticeable effect; training is stopped at this point to reduce the overall amount of calculation.
Finally, the methods of the present application are only preferred embodiments and are not intended to limit the scope of protection of the present invention. Any modification, equivalent replacement or improvement made within the spirit and principles of the present invention shall be included within the scope of protection of the present invention.
Claims (10)
1. A support vector machine training method based on the Spark framework, characterized by comprising:
S1: obtaining a training sample set, all sample vectors of which are stored in a distributed manner on the data nodes of the Spark framework;
S2: extracting from the training sample set the sample vector $V_2$ that most severely violates the KKT conditions, and at the same time selecting the sample vector $V_1$ whose center distance differs most from that of the sample vector $V_2$;
S3: performing an iterative optimization calculation on the sample vectors $V_1$ and $V_2$ to obtain the updated sample vectors $V_1^{new}$ and $V_2^{new}$;
S4: broadcasting the updated sample vectors $V_1^{new}$ and $V_2^{new}$ to the data nodes of Spark, calculating on each data node the difference produced by the sample vectors $V_1$ and $V_2$, and, according to the differences calculated on the data nodes, calculating the updated sphere center $a^{new}$;
S5: according to the updated sphere center $a^{new}$, updating the center distance of each sample vector on the data nodes of Spark and, at the same time, updating the sphere radius R.
2. The method according to claim 1, characterized in that step S1 further comprises: on each data node, reading the sample vectors of the training samples held on that data node, and generating a unique data identifier for each sample vector.
3. The method according to claim 2, characterized in that the unique data identifier is composed of the partition number of the data node and the local timestamp of the data node.
4. The method according to claim 2, characterized in that step S1 further comprises initializing the calculation parameters required for the iterative optimization calculation;
wherein the calculation parameters include the Lagrange multipliers α of all sample vectors, the sphere center a, and the center distance $d^2$ of each sample vector.
5. The method according to claim 4, characterized in that initializing the calculation parameters of the iterative optimization calculation specifically comprises:
initializing the Lagrange multiplier α of every sample vector to 1/N;
wherein N is the number of sample vectors in the training sample set;
initializing the square of the sphere radius, $R^2$, so that $R^2 = 0$;
initializing the sphere center according to the following formula:
$$a^2 = \sum_i \sum_j \alpha_i \cdot \alpha_j \cdot K_{ij}$$
where a is the sphere center, $\alpha_i$ and $\alpha_j$ are the Lagrange multipliers of any two sample vectors in the training sample set, and $K_{ij}$ is the kernel function;
calculating the center distance $d^2$ of each sample vector according to the center-distance formula.
6. The method according to claim 1, characterized in that, in step S2, the sample vector $V_2$ that most severely violates the KKT conditions is extracted from the training sample set by sampling without replacement.
7. The method according to claim 1, characterized in that selecting, in step S2, the sample vector $V_1$ whose center distance differs most from that of the sample vector $V_2$ specifically comprises:
for each data node, obtaining the sample vector on that data node whose center distance differs most from that of the sample vector $V_2$;
in the Driver Program of the Spark framework, obtaining from the per-node candidates the sample vector $V_1$ whose center distance differs most from that of the sample vector $V_2$.
8. The method according to claim 1, characterized in that, in step S4, the step of calculating the updated sphere center $a^{new}$ specifically comprises: accumulating, in the Driver Program of the Spark framework, the differences calculated on all data nodes, and calculating the new sphere center $a^{new}$.
9. The method according to claim 1, characterized in that the method further comprises, after step S5: according to the updated center distance of each vector and the sphere radius R, removing the bound sample vectors, retaining all non-bound samples, and returning to perform S1.
10. The method according to claim 8, characterized in that the method further comprises, after step S5: stopping training when it is determined that the Lagrange multipliers of all sample vectors in the training sample set satisfy the KKT conditions, or that the target loss function loss of the sample vectors $V_1$ and $V_2$ is less than a preset threshold.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201711269096.1A CN108121998B (en) | 2017-12-05 | 2017-12-05 | Spark frame-based support vector machine training method |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201711269096.1A CN108121998B (en) | 2017-12-05 | 2017-12-05 | Spark frame-based support vector machine training method |
Publications (2)
Publication Number | Publication Date |
---|---|
CN108121998A true CN108121998A (en) | 2018-06-05 |
CN108121998B CN108121998B (en) | 2020-09-25 |
Family
ID=62228798
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201711269096.1A Active CN108121998B (en) | 2017-12-05 | 2017-12-05 | Spark frame-based support vector machine training method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108121998B (en) |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102184421A (en) * | 2011-04-22 | 2011-09-14 | 北京航空航天大学 | Training method of support vector regression machine |
CN104463211A (en) * | 2014-12-08 | 2015-03-25 | 天津大学 | Support vector data description method based on maximum distance between centers of spheres |
CN105975907A (en) * | 2016-04-27 | 2016-09-28 | 江苏华通晟云科技有限公司 | SVM model pedestrian detection method based on distributed platform |
CN106203485A (en) * | 2016-07-01 | 2016-12-07 | 北京邮电大学 | Parallel training method and device for support vector machine |
CN106469315A (en) * | 2016-09-05 | 2017-03-01 | 南京理工大学 | Multi-mode composite detection target identification method based on improved One-Class SVM algorithm |
CN107194411A (en) * | 2017-04-13 | 2017-09-22 | 哈尔滨工程大学 | Improved hierarchical cascade parallel method for support vector machines |
Non-Patent Citations (3)
Title |
---|
DAVID MJ TAX: "Data Domain Description using Support Vectors", 《ESANN'1999》 * |
张瑜, 罗可: "Classification method for large data sets based on OC-SVM", 《计算机工程与应用》 * |
徐图 等: "SMO training algorithm for hypersphere one-class support vector machines", 《计算机科学》 * |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109190719A (en) * | 2018-11-30 | 2019-01-11 | 长沙理工大学 | Support vector machines learning method, device, equipment and computer readable storage medium |
CN110210566A (en) * | 2019-06-06 | 2019-09-06 | 无锡火球普惠信息科技有限公司 | One-to-many supporting vector machine frame and its parallel method based on Spark |
CN111368874A (en) * | 2020-01-23 | 2020-07-03 | 天津大学 | Image category incremental learning method based on single classification technology |
CN111368874B (en) * | 2020-01-23 | 2022-11-15 | 天津大学 | Image category incremental learning method based on single classification technology |
Also Published As
Publication number | Publication date |
---|---|
CN108121998B (en) | 2020-09-25 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Zaremba et al. | Reinforcement learning neural turing machines-revised | |
CN110569979B (en) | Logical-physical bit remapping method for noisy medium-sized quantum equipment | |
US10346757B2 (en) | Systems and methods for parallelizing Bayesian optimization | |
Stachurski | Economic dynamics: theory and computation | |
CN112132179A (en) | Incremental learning method and system based on small number of labeled samples | |
Valero-Carreras et al. | Support vector frontiers: A new approach for estimating production functions through support vector machines | |
CN108121998A (en) | A kind of training method of support vector machine based on Spark frames | |
Zhang et al. | Discrete-time and discrete-space dynamical systems | |
Hajipour et al. | SampleFix: learning to correct programs by sampling diverse fixes | |
CN102208030A (en) | Bayesian-model-averaging-based model combing method on regularization path of support vector machine | |
CN103971136A (en) | Large-scale data-oriented parallel structured support vector machine classification method | |
CN113535947A (en) | Multi-label classification method and device for incomplete data with missing labels | |
CN112183671A (en) | Target attack counterattack sample generation method for deep learning model | |
CN110795736B (en) | Malicious android software detection method based on SVM decision tree | |
CN110717601B (en) | Anti-fraud method based on supervised learning and unsupervised learning | |
Zhang et al. | Imitating deep learning dynamics via locally elastic stochastic differential equations | |
Wang et al. | Rgtsvm: support vector machines on a GPU in R | |
Dixit et al. | An implementation of data pre-processing for small dataset | |
Sha et al. | Estimating minimum operation steps via memory-based recurrent calculation network | |
CN115129320A (en) | Indirect jump target address identification method and device based on loop invariance | |
Luan et al. | Optimal representative distribution margin machine for multi-instance learning | |
WO2019209571A1 (en) | Proactive data modeling | |
CN107480790A (en) | A kind of optimal parameter combination of SVMs(C, σ)Method for fast searching | |
CN114708608B (en) | Full-automatic characteristic engineering method and device for bank bills | |
CN112561047B (en) | Apparatus, method and computer readable storage medium for processing data |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||