CN108536730A - Text classification method using a support vector machine with a hybrid Fourier kernel function - Google Patents

Info

Publication number
CN108536730A
Authority
CN
China
Prior art keywords
kernel function
indicate
fourier
mixing
support vector
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201810160983.3A
Other languages
Chinese (zh)
Other versions
CN108536730B (en)
Inventor
于舒娟
张昀
朱文峰
何伟
董茜茜
金海红
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing University Of Posts And Telecommunications Nantong Institute Ltd
Nanjing Post and Telecommunication University
Nanjing University of Posts and Telecommunications
Original Assignee
Nanjing University Of Posts And Telecommunications Nantong Institute Ltd
Nanjing Post and Telecommunication University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2018-02-27
Filing date
2018-02-27
Publication date
2018-09-14
Application filed by Nanjing University Of Posts And Telecommunications Nantong Institute Ltd and Nanjing Post and Telecommunication University
Priority to CN201810160983.3A
Publication of CN108536730A
Application granted
Publication of CN108536730B
Legal status: Active
Anticipated expiration

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/24: Classification techniques
    • G06F18/241: Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2411: Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on the proximity to a decision surface, e.g. support vector machines

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present invention proposes a text classification method based on a support vector machine (SVM) with a hybrid Fourier kernel function. Because the kernel functions used in SVMs differ in learning ability and generalization ability, and because these two properties largely determine SVM classification performance, the method combines a polynomial kernel function with a Fourier kernel function by linear weighting to form a new hybrid Fourier kernel function. The method inherits the strong learning ability of the Fourier kernel and the generalization ability of the polynomial kernel, improving the performance of the SVM classifier. Compared with the single polynomial, Gaussian and Fourier kernels and with the mixed polynomial-Gaussian kernel, the hybrid Fourier kernel shows better generalization and learning ability and gives the best text classification results.

Description

Text classification method using a support vector machine with a hybrid Fourier kernel function
Technical field
The invention applies mainly to natural language processing within machine learning, and in particular relates to a text classification method using a support vector machine with a hybrid Fourier kernel function.
Background art
With the arrival of the big-data era, data processing fields such as natural language processing and image processing have developed rapidly. Because text data are high-dimensional, finding distinctive regularities in these complex high-dimensional features so that they can better serve people is an important research direction of statistical learning theory. The support vector machine (SVM) is a machine learning method based on statistical learning theory, proposed by Vapnik et al. in 1995. SVMs handle nonlinear problems by means of various kernel functions.
SVMs have also been studied extensively for nonlinear text classification problems. The article [Liu Gaohui, Yang Xing. A support vector machine with a mixed kernel function [J]. Microcomputer & Its Applications, 2017, 36(11): 19-22.] notes that the polynomial kernel function has outstanding generalization ability and is well suited to text classification; adding a polynomial kernel to a kernel with stronger learning ability tends to improve classification. The article [Liu Zhikang. An improved mixed-kernel support vector machine text classification method [J]. Industrial Control Computer, 2016, 29(6): 113-117.] proposes a mixed kernel function composed of a polynomial kernel and a definite kernel. The article [J.A.K. Suykens, J. Vandewalle, Least squares support vector machine classifiers, Neural Processing Letters 9(3), 293 (1999)] proposes the least squares support vector machine for nonlinear problems, but its accuracy is not very high. The thesis [Zhang Yong. Performance analysis of the Fourier kernel in support vector machines [D]. East China Normal University, 2008.] studies an N-dimensional Fourier kernel built on the one-dimensional Fourier kernel, but its experiments show that in text classification the N-dimensional and one-dimensional Fourier kernels classify about equally well. This document first describes the basic theory of SVMs, then analyzes and compares the strengths and weaknesses of traditional kernel functions and the Fourier kernel in text classification. By comparing the classification performance, learning ability and generalization ability of these kernel functions, it proposes a text classification method based on an SVM model with a hybrid Fourier kernel function.
Summary of the invention
The technical problem addressed by the invention is to improve the effectiveness of SVMs in text classification; to this end, it proposes a text classification method using an SVM with a hybrid Fourier kernel function. The method adds a polynomial kernel function to the one-dimensional Fourier kernel function to form a new hybrid Fourier kernel function. The hybrid kernel inherits the learning ability of the Fourier kernel and the generalization ability of the polynomial kernel, and the resulting new SVM model improves text classification.
To solve the above technical problem, the invention adopts the following technical solution:
A text classification method using an SVM with a hybrid Fourier kernel function, comprising the following steps:
Step A: Train the support vector machine to obtain $\alpha_i$ and $b$. Using the Lagrange multipliers and Karush-Kuhn-Tucker (KKT) conditions common in optimization, the objective is combined with the equality and inequality constraints, which simplifies the SVM solution procedure. The problem is converted into:
$$\min_{w,b}\ \frac{1}{2}\|w\|^2\ \Longleftrightarrow\ \max_{\alpha}\left[\sum_{i=1}^{n}\alpha_i-\frac{1}{2}\sum_{i=1}^{n}\sum_{j=1}^{n}\alpha_i\alpha_j y_i y_j K(x_i,x_j)\right]$$
Constraints: $\sum_{i=1}^{n}\alpha_i y_i=0$, $0\le\alpha_i\le C$, where C is the penalty parameter that bounds the slack variables;
where $\min_{w,b}\frac{1}{2}\|w\|^2$ is the equivalent transformation of maximizing the support-vector margin;
$\min$ denotes taking the minimum of an expression;
$\max$ denotes taking the maximum of an expression;
$\sum$ denotes summation;
$x_i,x_j\in\{x_1,x_2,\dots,x_n\}$ are the vectorized i-th and j-th training documents, where n is the number of training documents, $1\le i,j\le n$;
$y_i,y_j\in\{y_1,y_2,\dots,y_n\}$ are the classes of the i-th and j-th training documents, with value 1 or -1;
$\alpha_i,\alpha_j\in\alpha=\{\alpha_1,\alpha_2,\dots,\alpha_n\}$ are the Lagrange multipliers corresponding to $x_i,x_j$;
$w$ is the normal vector of the separating hyperplane;
$w^T$ is the transpose of $w$;
$\|w\|^2$ is the square of the Euclidean norm of $w$;
$b$ is the intercept of the hyperplane $w^Tx+b=0$;
$K(x_i,x_j)$ is the kernel function;
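Step A states only that $\alpha_i$ and $b$ are obtained by solving the dual problem; no solver is named. The following is a minimal sketch of one way to do that, using the simplified sequential minimal optimization (SMO) heuristic. SMO is a swapped-in technique chosen for illustration, not the patent's stated procedure. The function `train_smo` and its `tol`, `max_passes` and `seed` parameters are hypothetical; `kernel` is any callable `K(x_i, x_j)`, so the hybrid kernel of Step B can be passed in.

```python
import random
from typing import Callable, List, Sequence, Tuple

Kernel = Callable[[Sequence[float], Sequence[float]], float]


def train_smo(X: List[Sequence[float]], y: List[int], kernel: Kernel,
              C: float = 1.0, tol: float = 1e-3, max_passes: int = 20,
              seed: int = 0) -> Tuple[List[float], float]:
    """Approximately solve the SVM dual with simplified SMO; returns (alpha, b)."""
    n = len(X)
    K = [[kernel(X[i], X[j]) for j in range(n)] for i in range(n)]  # Gram matrix
    alpha = [0.0] * n
    b = 0.0
    rng = random.Random(seed)

    def f(i: int) -> float:
        # Current decision value for training point i (before taking the sign).
        return sum(alpha[k] * y[k] * K[k][i] for k in range(n)) + b

    passes = 0
    while passes < max_passes:
        changed = 0
        for i in range(n):
            e_i = f(i) - y[i]
            # Only optimize alpha_i if it violates the KKT conditions (within tol).
            if not ((y[i] * e_i < -tol and alpha[i] < C) or (y[i] * e_i > tol and alpha[i] > 0)):
                continue
            j = rng.randrange(n - 1)
            if j >= i:  # pick a random j != i
                j += 1
            e_j = f(j) - y[j]
            a_i_old, a_j_old = alpha[i], alpha[j]
            # Box [lo, hi] that keeps 0 <= alpha <= C and sum(alpha * y) unchanged.
            if y[i] != y[j]:
                lo, hi = max(0.0, a_j_old - a_i_old), min(C, C + a_j_old - a_i_old)
            else:
                lo, hi = max(0.0, a_i_old + a_j_old - C), min(C, a_i_old + a_j_old)
            curvature = 2.0 * K[i][j] - K[i][i] - K[j][j]
            if lo == hi or curvature >= 0:
                continue
            alpha[j] = min(hi, max(lo, a_j_old - y[j] * (e_i - e_j) / curvature))
            if abs(alpha[j] - a_j_old) < 1e-5:
                alpha[j] = a_j_old  # step too small: undo it and keep the pair consistent
                continue
            alpha[i] = a_i_old + y[i] * y[j] * (a_j_old - alpha[j])
            # Recompute the intercept from a multiplier strictly inside (0, C) if possible.
            b1 = b - e_i - y[i] * (alpha[i] - a_i_old) * K[i][i] - y[j] * (alpha[j] - a_j_old) * K[i][j]
            b2 = b - e_j - y[i] * (alpha[i] - a_i_old) * K[i][j] - y[j] * (alpha[j] - a_j_old) * K[j][j]
            if 0 < alpha[i] < C:
                b = b1
            elif 0 < alpha[j] < C:
                b = b2
            else:
                b = (b1 + b2) / 2.0
            changed += 1
        passes = passes + 1 if changed == 0 else 0
    return alpha, b
```

Each accepted pair update keeps $\sum_i\alpha_i y_i=0$ and $0\le\alpha_i\le C$ by construction, so the returned multipliers always satisfy the Step A constraints. A pure-Python loop like this is only practical for small examples; a dedicated QP or SMO library would normally be used instead.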
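Step B: Construct the hybrid Fourier kernel function to be introduced into the SVM. The hybrid Fourier kernel function is a linear weighted combination of the polynomial kernel function and the Fourier kernel function, with weighting coefficient u, where $0\le u\le 1$;
$K_{poly}=(x_i\cdot x_j+c)^d$ is the polynomial kernel function, where c = 1 and d = 2 or 3;
$K_{fourier}(x_i,x_j)=\dfrac{1-q^2}{2\left(1-2q\cos(x_i-x_j)+q^2\right)}$ is the Fourier kernel function, where $\cos(x_i-x_j)$ is the cosine of $x_i-x_j$ and $0<q<1$;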
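A minimal sketch of the Step B kernels in plain Python. Two points are assumptions because the source does not show them: which term the weight u multiplies (taken here as $uK_{poly}+(1-u)K_{fourier}$), and how $\cos(x_i-x_j)$ is evaluated for vector inputs (here the one-dimensional kernel $\frac{1-q^2}{2(1-2q\cos(x-z)+q^2)}$ is applied per coordinate and averaged, a hypothetical choice that keeps the kernel positive semi-definite). All function names are illustrative.

```python
import math
from typing import Sequence


def poly_kernel(x: Sequence[float], z: Sequence[float], c: float = 1.0, d: int = 2) -> float:
    """Polynomial kernel (x . z + c)^d; the text fixes c = 1 and d = 2 or 3."""
    return (sum(a * b for a, b in zip(x, z)) + c) ** d


def fourier_kernel_1d(x: float, z: float, q: float = 0.5) -> float:
    """One-dimensional regularized Fourier kernel on scalars, 0 < q < 1."""
    if not 0.0 < q < 1.0:
        raise ValueError("q must satisfy 0 < q < 1")
    return (1.0 - q * q) / (2.0 * (1.0 - 2.0 * q * math.cos(x - z) + q * q))


def fourier_kernel(x: Sequence[float], z: Sequence[float], q: float = 0.5) -> float:
    """ASSUMPTION: apply the 1-D kernel per coordinate and average the results."""
    return sum(fourier_kernel_1d(a, b, q) for a, b in zip(x, z)) / len(x)


def hybrid_fourier_kernel(x: Sequence[float], z: Sequence[float], u: float = 0.5,
                          q: float = 0.5, c: float = 1.0, d: int = 2) -> float:
    """ASSUMPTION: u weights the polynomial term and (1 - u) the Fourier term."""
    if not 0.0 <= u <= 1.0:
        raise ValueError("u must satisfy 0 <= u <= 1")
    return u * poly_kernel(x, z, c, d) + (1.0 - u) * fourier_kernel(x, z, q)
```

For example, at $x_i=x_j$ with q = 0.5 the one-dimensional Fourier kernel equals $0.75/0.5=1.5$, so `hybrid_fourier_kernel([0.0], [0.0])` with u = 0.5 evaluates to $0.5\cdot 1+0.5\cdot 1.5=1.25$.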
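Step C: Introduce the hybrid Fourier kernel function into the SVM, i.e., replace $K(x_i,x_j)$ in the dual problem of Step A with the hybrid kernel $K_{mix}(x_i,x_j)$ of Step B:
$$\max_{\alpha}\left[\sum_{i=1}^{n}\alpha_i-\frac{1}{2}\sum_{i=1}^{n}\sum_{j=1}^{n}\alpha_i\alpha_j y_i y_j K_{mix}(x_i,x_j)\right]$$
subject to the same constraints as in Step A;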
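Introducing the kernel into the SVM means the dual problem sees the documents only through the Gram matrix $K_{ij}=K_{mix}(x_i,x_j)$. The sketch below builds that matrix and evaluates the dual objective and constraints for a candidate $\alpha$. It repeats illustrative hybrid-kernel assumptions (u on the polynomial term, a per-coordinate averaged Fourier term, defaults u = 0.5, q = 0.5, c = 1, d = 2), and every function name is hypothetical.

```python
import math
from typing import List, Sequence


def hybrid_kernel(x: Sequence[float], z: Sequence[float], u: float = 0.5,
                  q: float = 0.5, c: float = 1.0, d: int = 2) -> float:
    """Hypothetical hybrid kernel: u * polynomial + (1 - u) * averaged 1-D Fourier kernel."""
    poly = (sum(a * b for a, b in zip(x, z)) + c) ** d
    fourier = sum((1 - q * q) / (2 * (1 - 2 * q * math.cos(a - b) + q * q))
                  for a, b in zip(x, z)) / len(x)
    return u * poly + (1 - u) * fourier


def gram_matrix(X: List[Sequence[float]]) -> List[List[float]]:
    """K[i][j] = K_mix(x_i, x_j); the dual touches the data only through this matrix."""
    return [[hybrid_kernel(xi, xj) for xj in X] for xi in X]


def dual_objective(alpha: List[float], y: List[int], K: List[List[float]]) -> float:
    """W(alpha) = sum_i alpha_i - 1/2 * sum_i sum_j alpha_i alpha_j y_i y_j K_ij."""
    n = len(alpha)
    quad = sum(alpha[i] * alpha[j] * y[i] * y[j] * K[i][j]
               for i in range(n) for j in range(n))
    return sum(alpha) - 0.5 * quad


def feasible(alpha: List[float], y: List[int], C: float, tol: float = 1e-9) -> bool:
    """Step A constraints: sum_i alpha_i y_i = 0 and 0 <= alpha_i <= C."""
    return (abs(sum(a * yi for a, yi in zip(alpha, y))) <= tol
            and all(-tol <= a <= C + tol for a in alpha))
```

Because only `gram_matrix` calls the kernel, swapping in a different kernel changes nothing else in the dual; this is the practical meaning of "introducing" the hybrid kernel into the SVM.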
Step D: Document vectorization:
where $\lambda_{ke}$ is the weight of feature word $t_k$ in document $d_e$, i.e., the vectorization result;
$t_k\in\{t_1,t_2,\dots,t_m\}$ is the k-th feature word, where m is the total number of feature words in the document collection, $1\le k\le m$;
$d_e\in\{d_1,d_2,\dots,d_N\}$ is the e-th document in the collection, $1\le e\le N$;
$tf(t_k,d_e)$ is the number of occurrences of $t_k$ in $d_e$;
$N_k$ is the number of documents containing $t_k$;
N is the total number of documents;
β is an empirical value, set to 0.1;
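This copy has lost the vectorization formula, so the exact weighting cannot be recovered from the text. As an assumption only, the sketch uses a common normalized TF-IDF form that is consistent with the listed symbols, $\lambda_{ke}=tf(t_k,d_e)\log(N/N_k+\beta)\big/\sqrt{\sum_{k=1}^{m}[tf(t_k,d_e)\log(N/N_k+\beta)]^2}$ with β = 0.1. Documents are token lists, and `vocabulary` is a hypothetical list of the m selected feature words.

```python
import math
from collections import Counter
from typing import List


def vectorize(docs: List[List[str]], vocabulary: List[str], beta: float = 0.1) -> List[List[float]]:
    """ASSUMED weighting: tf * log(N / N_k + beta), L2-normalized per document."""
    N = len(docs)
    # N_k: number of documents that contain feature word t_k.
    doc_freq = {t: sum(1 for d in docs if t in d) for t in vocabulary}
    vectors = []
    for d in docs:
        tf = Counter(d)  # tf(t_k, d_e): occurrences of t_k in d_e
        raw = [tf[t] * math.log(N / doc_freq[t] + beta) if doc_freq[t] else 0.0
               for t in vocabulary]
        norm = math.sqrt(sum(w * w for w in raw))
        # Documents containing no feature word stay as all-zero vectors.
        vectors.append([w / norm if norm > 0 else 0.0 for w in raw])
    return vectors
```

Under this assumed form, β keeps a word that occurs in every document ($N_k=N$) from receiving zero weight, since $\log(1+0.1)>0$.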
Step E: Select the training set and the test set from the document collection by cross-validation; the final decision function is:
$$f(x'_s)=\operatorname{sgn}\left(\sum_{i=1}^{n}\alpha_i y_i K(x'_s,x_i)+b\right)$$
where $f(x'_s)$ is the classification result of the SVM model;
$x'_s\in\{x'_1,x'_2,\dots,x'_z\}$ is the s-th vectorized test document, where z is the number of test documents, $1\le s\le z$;
$K(x'_s,x_i)$ is the proposed hybrid Fourier kernel function;
$\alpha_i$ and $b$ are the parameters obtained by training the SVM;
$\operatorname{sgn}(\cdot)$ is the sign function.
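The decision rule of Step E and a cross-validation split can be sketched as follows. The fold layout (contiguous, unshuffled folds) and the mapping of a zero score to +1 are illustrative assumptions, since the text specifies neither; `decide` and `k_fold_indices` are hypothetical names.

```python
from typing import Callable, List, Sequence, Tuple

Kernel = Callable[[Sequence[float], Sequence[float]], float]


def decide(x_test: Sequence[float], X_train: List[Sequence[float]], y_train: List[int],
           alpha: List[float], b: float, kernel: Kernel) -> int:
    """f(x'_s) = sgn(sum_i alpha_i y_i K(x'_s, x_i) + b); a zero score maps to +1 here."""
    score = sum(a * yi * kernel(x_test, xi)
                for a, yi, xi in zip(alpha, y_train, X_train)) + b
    return 1 if score >= 0 else -1


def k_fold_indices(n: int, k: int) -> List[Tuple[List[int], List[int]]]:
    """Split indices 0..n-1 into k contiguous folds; each fold serves once as the test set."""
    folds = []
    start = 0
    for f in range(k):
        size = n // k + (1 if f < n % k else 0)  # spread the remainder over the first folds
        test = list(range(start, start + size))
        train = [i for i in range(n) if i < start or i >= start + size]
        folds.append((train, test))
        start += size
    return folds
```

For instance, `k_fold_indices(5, 2)` yields a 3-document and a 2-document test fold. In practice the documents would usually be shuffled before splitting so that each fold contains both classes.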
The beneficial effects of the invention are as follows. The invention uses a new SVM model with a hybrid Fourier kernel function to improve text classification. Because the kernel functions used in SVMs differ in learning and generalization ability, the method mixes the polynomial and Fourier kernel functions by linear weighting to form a new hybrid Fourier kernel function. Since the learning and generalization abilities of a kernel function largely determine SVM classification performance, combining the polynomial kernel with the Fourier kernel lets the invention inherit the strong learning ability of the Fourier kernel and the generalization ability of the polynomial kernel, improving the performance of the SVM classifier. Compared with the single polynomial, Gaussian and Fourier kernels and with the mixed polynomial-Gaussian kernel, the hybrid Fourier kernel has better generalization and learning ability and gives the best text classification results.
Description of the drawings:
Fig. 1 is a two-dimensional sample plot of the conventional linearly weighted combination of the polynomial kernel function and the Gaussian kernel function.
Fig. 2 is a two-dimensional sample plot of the hybrid Fourier kernel function of the invention.
Detailed description of the embodiments
The text classification method using a support vector machine with a hybrid Fourier kernel function proposed by the invention is described in detail below with reference to the drawings and simulation results.
The method is implemented as follows:
Train the support vector machine to obtain $\alpha_i$ and $b$. Using the Lagrange multipliers and Karush-Kuhn-Tucker (KKT) conditions common in optimization, the objective is combined with the equality and inequality constraints, which simplifies the SVM solution procedure. The problem is converted into:
$$\min_{w,b}\ \frac{1}{2}\|w\|^2\ \Longleftrightarrow\ \max_{\alpha}\left[\sum_{i=1}^{n}\alpha_i-\frac{1}{2}\sum_{i=1}^{n}\sum_{j=1}^{n}\alpha_i\alpha_j y_i y_j K(x_i,x_j)\right]$$
Constraints: $\sum_{i=1}^{n}\alpha_i y_i=0$, $0\le\alpha_i\le C$, where C is the penalty parameter that bounds the slack variables;
where $\min_{w,b}\frac{1}{2}\|w\|^2$ is the equivalent transformation of maximizing the support-vector margin;
$\min$ denotes taking the minimum of an expression;
$\max$ denotes taking the maximum of an expression;
$\sum$ denotes summation;
$x_i,x_j\in\{x_1,x_2,\dots,x_n\}$ are the vectorized i-th and j-th training documents, where n is the number of training documents, $1\le i,j\le n$;
$y_i,y_j\in\{y_1,y_2,\dots,y_n\}$ are the classes of the i-th and j-th training documents, with value 1 or -1;
$\alpha_i,\alpha_j\in\alpha=\{\alpha_1,\alpha_2,\dots,\alpha_n\}$ are the Lagrange multipliers corresponding to $x_i,x_j$;
$w$ is the normal vector of the separating hyperplane;
$w^T$ is the transpose of $w$;
$\|w\|^2$ is the square of the Euclidean norm of $w$;
$b$ is the intercept of the hyperplane $w^Tx+b=0$;
$K(x_i,x_j)$ is the kernel function;
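Construct the hybrid Fourier kernel function to be introduced into the SVM. The hybrid Fourier kernel function is a linear weighted combination of the polynomial kernel function and the Fourier kernel function, with weighting coefficient u, where $0\le u\le 1$;
$K_{poly}=(x_i\cdot x_j+c)^d$ is the polynomial kernel function, where c = 1 and d = 2 or 3;
$K_{fourier}(x_i,x_j)=\dfrac{1-q^2}{2\left(1-2q\cos(x_i-x_j)+q^2\right)}$ is the Fourier kernel function, where $\cos(x_i-x_j)$ is the cosine of $x_i-x_j$ and $0<q<1$;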
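Introduce the hybrid Fourier kernel function into the SVM, i.e., replace $K(x_i,x_j)$ in the dual problem above with the hybrid kernel $K_{mix}(x_i,x_j)$:
$$\max_{\alpha}\left[\sum_{i=1}^{n}\alpha_i-\frac{1}{2}\sum_{i=1}^{n}\sum_{j=1}^{n}\alpha_i\alpha_j y_i y_j K_{mix}(x_i,x_j)\right]$$
subject to the same constraints as above;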
Document vectorization:
where $\lambda_{ke}$ is the weight of feature word $t_k$ in document $d_e$, i.e., the vectorization result;
$t_k\in\{t_1,t_2,\dots,t_m\}$ is the k-th feature word, where m is the total number of feature words in the document collection, $1\le k\le m$;
$d_e\in\{d_1,d_2,\dots,d_N\}$ is the e-th document in the collection, $1\le e\le N$;
$tf(t_k,d_e)$ is the number of occurrences of $t_k$ in $d_e$;
$N_k$ is the number of documents containing $t_k$;
N is the total number of documents;
β is an empirical value, set to 0.1;
Select the training set and the test set from the document collection by cross-validation; the final decision function is:
$$f(x'_s)=\operatorname{sgn}\left(\sum_{i=1}^{n}\alpha_i y_i K(x'_s,x_i)+b\right)$$
where $f(x'_s)$ is the classification result of the SVM model;
$x'_s\in\{x'_1,x'_2,\dots,x'_z\}$ is the s-th vectorized test document, where z is the number of test documents, $1\le s\le z$;
$K(x'_s,x_i)$ is the proposed hybrid Fourier kernel function;
$\alpha_i$ and $b$ are the parameters obtained by training the SVM;
$\operatorname{sgn}(\cdot)$ is the sign function.
Fig. 1 shows the mixed polynomial-Gaussian kernel. Its value at the test point coincides with that of the Gaussian kernel, showing that mixing does not substantially change the learning ability, while its values away from the test point all increase, showing that the mixed polynomial-Gaussian kernel improves generalization ability. In the figure, d and gamma denote the exponent parameter of the polynomial kernel and the parameter of the Gaussian kernel, respectively.
In Fig. 2, the parameter u is the linear weighting coefficient u in Equation 3, and the Fourier kernel parameter q is 0.5. Compared with the one-dimensional Fourier kernel, the hybrid Fourier kernel takes approximately the same value at the test point, showing that it inherits the learning ability of the one-dimensional Fourier kernel; away from the test point its values are higher than those of the one-dimensional Fourier kernel, showing that its generalization ability exceeds that of the traditional one-dimensional Fourier kernel. Compared with the mixed polynomial-Gaussian kernel, the hybrid Fourier kernel takes higher values both at the test point and at other points, showing that it exceeds the mixed polynomial-Gaussian kernel in both learning ability and generalization ability.
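To explore the kind of comparison made for Figs. 1 and 2, the sketch below tabulates $K(x,t)$ against a fixed test point t for three one-dimensional kernels. Only q = 0.5 comes from the text; the test point, grid, gamma, d and weights are hypothetical, and which kernel receives the weight is an assumption. It does not reproduce the patent's figures.

```python
import math
from typing import Callable, List

ScalarKernel = Callable[[float, float], float]


def poly(x: float, z: float, c: float = 1.0, d: int = 2) -> float:
    return (x * z + c) ** d


def gaussian(x: float, z: float, gamma: float = 1.0) -> float:
    return math.exp(-gamma * (x - z) ** 2)


def fourier(x: float, z: float, q: float = 0.5) -> float:
    return (1 - q * q) / (2 * (1 - 2 * q * math.cos(x - z) + q * q))


def mix(k1: ScalarKernel, k2: ScalarKernel, w: float) -> ScalarKernel:
    """Linear weighting w * k1 + (1 - w) * k2 (which term gets w is an assumption)."""
    return lambda x, z: w * k1(x, z) + (1 - w) * k2(x, z)


def profile(kernel: ScalarKernel, test_point: float, grid: List[float]) -> List[float]:
    """Kernel value between each grid point and a fixed test point."""
    return [kernel(x, test_point) for x in grid]


if __name__ == "__main__":
    grid = [i / 2 for i in range(-6, 7)]  # -3.0 .. 3.0 in steps of 0.5
    t = 0.2  # hypothetical test point
    kernels = {
        "poly+gauss": mix(poly, gaussian, 0.1),
        "fourier": fourier,
        "hybrid": mix(poly, fourier, 0.1),
    }
    for name, k in kernels.items():
        print(name, [round(v, 3) for v in profile(k, t, grid)])
```

Under these illustrative settings, the polynomial term grows with distance from the origin while the Fourier term decays away from t, which is the mechanism the text credits for the higher values of the hybrid kernel far from the test point.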
Feature dimensions were selected by the word-frequency method, with feature counts from 500 to 3000 and also 5000, 7000 and 9000. These features were fed into SVM models built with different kernel functions, and the precision, recall and F1 score of each model were compared. The results show that as the dimension increases, all three metrics rise by about 2% to 4% for every kernel. Compared with the other single kernels, the one-dimensional Fourier kernel scores 2% to 3% higher on all three metrics; the hybrid Fourier kernel improves on the one-dimensional Fourier kernel by 2% to 3%, and on the mixed polynomial-Gaussian kernel by 1.5% to 2%.
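The feature selection and evaluation described above can be sketched as follows. "Word frequency" is read here as total corpus term frequency, and the metrics are computed for one positive class of a ±1 labelling; both are assumptions, since the text defines neither the frequency measure nor how multi-class scores were averaged. Function names are hypothetical.

```python
from collections import Counter
from typing import Iterable, List, Sequence, Tuple


def select_features(docs: Iterable[Sequence[str]], k: int) -> List[str]:
    """Keep the k most frequent words over the corpus (ties broken alphabetically)."""
    counts = Counter(w for d in docs for w in d)
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [w for w, _ in ranked[:k]]


def precision_recall_f1(y_true: Sequence[int], y_pred: Sequence[int],
                        positive: int = 1) -> Tuple[float, float, float]:
    """Binary precision, recall and F1 for the class `positive`."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1
```

Running `select_features` with k = 500, 1000, ..., 9000 and scoring each kernel's predictions with `precision_recall_f1` gives the shape of the comparison described above; the reported percentages come from the patent's own experiments and are not produced by this sketch.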
In conclusion mixing Fourier kernel function supporting vector machine model proposed by the present invention is in learning ability and extensive Ability is better than other kernel functions, and under each parameter square one such as data set and feature quantity, classification performance is better than biography The kernel function of system.

Claims (1)

1. A text classification method using a support vector machine with a hybrid Fourier kernel function, characterized in that the method comprises the following steps: Step A: train the support vector machine to obtain $\alpha_i$ and $b$; using Lagrange multipliers and the KKT conditions, simplify the SVM solution procedure; the problem is converted into:
$$\min_{w,b}\ \frac{1}{2}\|w\|^2\ \Longleftrightarrow\ \max_{\alpha}\left[\sum_{i=1}^{n}\alpha_i-\frac{1}{2}\sum_{i=1}^{n}\sum_{j=1}^{n}\alpha_i\alpha_j y_i y_j K(x_i,x_j)\right]$$
Constraints: $\sum_{i=1}^{n}\alpha_i y_i=0$, $0\le\alpha_i\le C$, where C is the penalty parameter that bounds the slack variables;
where $\min_{w,b}\frac{1}{2}\|w\|^2$ is the equivalent transformation of maximizing the support-vector margin;
$\min$ denotes taking the minimum of an expression;
$\max$ denotes taking the maximum of an expression;
$\sum$ denotes summation;
$x_i,x_j\in\{x_1,x_2,\dots,x_n\}$ are the vectorized i-th and j-th training documents, where n is the number of training documents, $1\le i,j\le n$;
$y_i,y_j\in\{y_1,y_2,\dots,y_n\}$ are the classes of the i-th and j-th training documents, with value 1 or -1;
$\alpha_i,\alpha_j\in\alpha=\{\alpha_1,\alpha_2,\dots,\alpha_n\}$ are the Lagrange multipliers corresponding to $x_i,x_j$;
$w$ is the normal vector of the separating hyperplane;
$w^T$ is the transpose of $w$;
$\|w\|^2$ is the square of the Euclidean norm of $w$;
$b$ is the intercept of the hyperplane $w^Tx+b=0$;
$K(x_i,x_j)$ is the kernel function;
Step B: construct the hybrid Fourier kernel function to be introduced into the SVM; the hybrid Fourier kernel function is a linear weighted combination of the polynomial kernel function and the Fourier kernel function with weighting coefficient η, where $0\le\eta\le 1$;
$K_{poly}=(x_i\cdot x_j+c)^d$ is the polynomial kernel function, where c = 1 and d = 2 or 3;
$K_{fourier}(x_i,x_j)=\dfrac{1-q^2}{2\left(1-2q\cos(x_i-x_j)+q^2\right)}$ is the Fourier kernel function, where $\cos(x_i-x_j)$ is the cosine of $x_i-x_j$ and $0<q<1$;
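Step C: introduce the hybrid Fourier kernel function into the SVM, i.e., replace $K(x_i,x_j)$ in the dual problem of Step A with the hybrid kernel $K_{mix}(x_i,x_j)$ of Step B:
$$\max_{\alpha}\left[\sum_{i=1}^{n}\alpha_i-\frac{1}{2}\sum_{i=1}^{n}\sum_{j=1}^{n}\alpha_i\alpha_j y_i y_j K_{mix}(x_i,x_j)\right]$$
subject to the same constraints as in Step A;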
Step D: document vectorization:
where $\lambda_{ke}$ is the weight of feature word $t_k$ in document $d_e$, i.e., the vectorization result;
$t_k\in\{t_1,t_2,\dots,t_m\}$ is the k-th feature word, where m is the total number of feature words in the document collection, $1\le k\le m$;
$d_e\in\{d_1,d_2,\dots,d_N\}$ is the e-th document in the collection, $1\le e\le N$;
$tf(t_k,d_e)$ is the number of occurrences of $t_k$ in $d_e$;
$N_k$ is the number of documents containing $t_k$;
N is the total number of documents;
β is an empirical value, set to 0.1;
Step E: select the training set and the test set from the document collection by cross-validation; the final decision function is:
$$f(x'_s)=\operatorname{sgn}\left(\sum_{i=1}^{n}\alpha_i y_i K(x'_s,x_i)+b\right)$$
where $f(x'_s)$ is the classification result of the SVM model;
$x'_s\in\{x'_1,x'_2,\dots,x'_z\}$ is the s-th vectorized test document, where z is the number of test documents, $1\le s\le z$;
$K(x'_s,x_i)$ is the proposed hybrid Fourier kernel function;
$\alpha_i$ and $b$ are the parameters obtained by training the SVM;
$\operatorname{sgn}(\cdot)$ is the sign function.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810160983.3A CN108536730B (en) 2018-02-27 2018-02-27 Text classification method for hybrid Fourier kernel function support vector machine

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810160983.3A CN108536730B (en) 2018-02-27 2018-02-27 Text classification method for hybrid Fourier kernel function support vector machine

Publications (2)

Publication Number Publication Date
CN108536730A true CN108536730A (en) 2018-09-14
CN108536730B CN108536730B (en) 2020-04-07

Family

ID=63486141

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810160983.3A Active CN108536730B (en) 2018-02-27 2018-02-27 Text classification method for hybrid Fourier kernel function support vector machine

Country Status (1)

Country Link
CN (1) CN108536730B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2010086466A (en) * 2008-10-02 2010-04-15 Toyota Central R&D Labs Inc Data classification device and program
CN102567742A (en) * 2010-12-15 2012-07-11 中国科学院电子学研究所 Automatic classification method of support vector machine based on selection of self-adapting kernel function
CN103366175A (en) * 2013-07-14 2013-10-23 西安电子科技大学 Natural image classification method based on potential Dirichlet distribution
CN106874935A (en) * 2017-01-16 2017-06-20 衢州学院 SVMs parameter selection method based on the fusion of multi-kernel function self adaptation
CN106951466A (en) * 2017-03-01 2017-07-14 常州大学怀德学院 Field text feature and system based on KNN SVM

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
N. JANNAH,S. HADJILOUCAS: "Detection of ECG arrhythmia conditions using CSVM and MSVM classifiers", 《 2015 IEEE SIGNAL PROCESSING IN MEDICINE AND BIOLOGY SYMPOSIUM (SPMB)》 *
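李希鹏: "Research on text classification based on support vector machines with mixed kernel functions", 《China Master's Theses Full-text Database, Information Science and Technology》 *
黄瑜青: "Application of SVM based on mixed kernel functions in automatic text classification", 《Computer CD Software and Applications》 *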

Also Published As

Publication number Publication date
CN108536730B (en) 2020-04-07

Similar Documents

Publication Publication Date Title
CN100585617C (en) Based on sorter integrated face identification system and method thereof
CN106777402B (en) A kind of image retrieval text method based on sparse neural network
CN107885849A (en) A kind of moos index analysis system based on text classification
CN109766911A (en) A kind of behavior prediction method
CN111813939A (en) Text classification method based on representation enhancement and fusion
CN114970725A (en) Adaboost-SVM-based transformer working condition identification method
Zhang et al. Hybrid nonlinear convolution filters for image recognition
Ma et al. Chinese text classification review
CN108536730A (en) A kind of mixing Fourier kernel function support vector machines file classification method
Wang et al. The weighted multiple meta-models stacking method for regression problem
Cao et al. Adaptable focal loss for imbalanced text classification
Kastrati et al. Sentiment Polarity and Emotion Detection from Tweets Using Distant Supervision and Deep Learning Models
Liu et al. Multi-loss Siamese convolutional neural network for Chinese calligraphy style classification
Xu et al. Classification method of marine tourism resource of least square support vector machines based on particle swarm algorithm
CN105320968A (en) Improved method for centroid classifier
CN109359694A (en) A kind of image classification method and device of the classifier indicated based on mixing collaboration
Song et al. Towards deeper insights into deep learning from imbalanced data
Liu et al. Automatic decision support by rule exhaustion decision tree algorithm
Bulat et al. Language-Aware Soft Prompting: Text-to-Text Optimization for Few-and Zero-Shot Adaptation of V &L Models
Li et al. One-shot chinese character recognition based on deep siamese networks
CHEN et al. AdaBoost-SVM Based Undergraduates Evaluations
CN102637205A (en) Document classification method based on Hadoop
CN108268873A (en) A kind of population data sorting technique and device based on SVM
Xu Adaptive Classification Algorithm Design for Online Teaching Resources of Opera Singing
Winter et al. Learned Feature Generation for Molecules

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information

Address after: No. 9 Wenyuan Road, Yadong New District, Nanjing, Jiangsu Province, 210012

Applicant after: Nanjing Post & Telecommunication Univ.

Applicant after: Nanjing University of Posts and Telecommunications Nantong Institute Limited

Address before: 210044 No. 9 Wenyuan Road, Qixia District, Nanjing, Jiangsu Province

Applicant before: Nanjing Post & Telecommunication Univ.

Applicant before: Nanjing University of Posts and Telecommunications Nantong Institute Limited

CB02 Change of applicant information

Address after: 226000 No. 33 Xinkang Road, Gangzhao District, Nantong City, Jiangsu Province

Applicant after: NANJING UNIVERSITY OF POSTS AND TELECOMMUNICATIONS

Applicant after: Nanjing University of Posts and Telecommunications Nantong Institute Limited

Address before: 210012 9 Wen Yuan Road, Ya Dong new town, Nanjing, Jiangsu.

Applicant before: NANJING UNIVERSITY OF POSTS AND TELECOMMUNICATIONS

Applicant before: Nanjing University of Posts and Telecommunications Nantong Institute Limited

GR01 Patent grant