CN108268873A - SVM-based group data classification method and device - Google Patents
- Publication number
- CN108268873A CN201611254023.0A
- Authority
- CN
- China
- Prior art keywords
- group
- svm
- classification
- population data
- characteristic
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
- G06F18/2411—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on the proximity to a decision surface, e.g. support vector machines
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q30/00—Commerce
- G06Q30/02—Marketing; Price estimation or determination; Fundraising
- G06Q30/0201—Market modelling; Market analysis; Collecting market data
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Business, Economics & Management (AREA)
- Strategic Management (AREA)
- Finance (AREA)
- Development Economics (AREA)
- Accounting & Taxation (AREA)
- General Physics & Mathematics (AREA)
- Physics & Mathematics (AREA)
- Entrepreneurship & Innovation (AREA)
- General Engineering & Computer Science (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Artificial Intelligence (AREA)
- Evolutionary Biology (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Evolutionary Computation (AREA)
- Bioinformatics & Computational Biology (AREA)
- Life Sciences & Earth Sciences (AREA)
- Game Theory and Decision Science (AREA)
- Economics (AREA)
- Marketing (AREA)
- General Business, Economics & Management (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The present invention discloses an SVM-based group data classification method and device. The method includes: step S1, extracting historical group data and determining the group and the feature data of the group; step S2, constructing a quadratic feature matrix of the group according to the feature data; step S3, training the corresponding SVM classifier according to the quadratic feature matrix; step S4, classifying the group data to be classified using the SVM classifier. The device includes the corresponding historical data processing unit, feature matrix construction unit, classifier training unit and classifier classification unit. In this way, group data can be classified by computer, conveniently and quickly, greatly saving manpower and material resources. In addition, compared with other classifiers, SVM offers a considerable improvement in classifier performance and the advantage of high classification accuracy, thereby improving the accuracy of group composition analysis.
Description
Technical field
The present invention relates to the field of data classification, and in particular to an SVM-based group data classification method and device.
Background art
Market research is a long-established discipline, and many research methods have emerged over its years of development. Since the start of the 21st century, with the development of computer technology, the computing platforms used in market research have gradually moved onto computers. Analyzing market data by computer can quickly generate reports and all kinds of visualized data models, greatly reducing the amount of manual calculation and the time spent on surveys, and improving accuracy. In this information-dominated era, we attach more and more importance to information. Likewise, when studying a group, understanding the composition of that group is essential.
Analyzing the composition of a group essentially means classifying sample group data according to historical data. However, classification is currently carried out mainly by hand, which involves a heavy workload and is time-consuming and laborious. A method and device that can classify group data by computer is therefore needed.
In view of the above defects, the inventor arrived at the present invention after long research and practice.
Summary of the invention
To solve the above technical defects, the technical solution adopted by the present invention is first to provide an SVM-based group data classification method, including:
Step S1, extracting historical group data, and determining the group and the feature data of the group;
Step S2, constructing a quadratic feature matrix of the group according to the feature data;
Step S3, training the corresponding SVM classifier according to the quadratic feature matrix;
Step S4, classifying the group data to be classified using the SVM classifier.
Preferably, step S2 includes:
Step S21, analyzing the feature data of the group, and extracting from it the basic features corresponding to each category of the group;
Step S22, converting the data in the historical group data into feature vectors;
Step S24, constructing the quadratic feature matrix of the group from the feature vectors.
Preferably, step S2 further includes: step S23, assigning different weights to the feature data according to their importance, and correcting the feature vectors accordingly.
Preferably, step S3 includes:
Step S31, adding the category information of each category in the group to the quadratic feature matrix;
Step S32, learning from the quadratic feature matrix with the category information, establishing a correspondence between the feature vectors and the category information of the group, and training the SVM classifier to obtain its discriminant function.
Preferably, in step S2, the row vectors and column vectors of the quadratic feature matrix represent, respectively, the individuals of the group and the feature data of the group, and each element of the quadratic feature matrix is the degree of association between the corresponding individual and the corresponding feature data.
Preferably, in step S4, the number of SVM classifiers is the same as the number of categories of the group.
Preferably, in step S4, the SVM classifiers are equal in number to the categories of the group and correspond to them one to one. During classification, the group data to be classified passes through all SVM classifiers. If exactly one SVM classifier outputs a positive number, the group data to be classified belongs to the category corresponding to that SVM classifier; if zero or more than one SVM classifier outputs a positive number, the group data to be classified belongs to the category corresponding to the SVM classifier whose discriminant function value is the largest among all SVM classifiers.
Secondly, an SVM-based group data classification device corresponding to the SVM-based group data classification method described above is provided, including:
A historical data processing unit, which extracts historical group data and determines the group and the feature data of the group;
A feature matrix construction unit, which constructs the quadratic feature matrix of the group according to the feature data;
A classifier training unit, which trains the corresponding SVM classifier according to the quadratic feature matrix;
A classifier classification unit, which classifies the group data to be classified using the SVM classifier.
Preferably, the feature matrix construction unit includes:
A basic feature extraction subunit, which analyzes the feature data of the group and extracts from it the basic features corresponding to each category of the group;
A feature vector conversion subunit, which converts the data in the historical group data into feature vectors;
A vector-to-matrix construction subunit, which constructs the quadratic feature matrix of the group from the feature vectors.
Preferably, the feature matrix construction unit further includes: a weight assignment subunit, which assigns different weights to the feature data according to their importance and corrects the feature vectors accordingly.
Compared with the prior art, the beneficial effects of the present invention are as follows. Group data can be classified by computer, conveniently and quickly, greatly saving manpower and material resources. In addition, compared with other classifiers such as neural networks, decision trees and naive Bayes, SVM offers a considerable improvement in classifier performance and the advantage of high classification accuracy, thereby improving the accuracy of group composition analysis. Analyzing and extracting the group features can greatly increase the degree of association between the features and the corresponding categories, making the classification results more reliable.
Brief description of the drawings
In order to explain the technical solutions in the embodiments of the present invention more clearly, the drawings used in the description of the embodiments are briefly introduced below.
Fig. 1 is a flow chart of the SVM-based group data classification method of the present invention;
Fig. 2 is flow chart one of step S2 of the SVM-based group data classification method of the present invention;
Fig. 3 is flow chart two of step S2 of the SVM-based group data classification method of the present invention;
Fig. 4 is a flow chart of step S3 of the SVM-based group data classification method of the present invention;
Fig. 5 is a structure diagram of the SVM-based group data classification device of the present invention;
Fig. 6 is structure diagram one of the feature matrix construction unit of the SVM-based group data classification device of the present invention;
Fig. 7 is structure diagram two of the feature matrix construction unit of the SVM-based group data classification device of the present invention;
Fig. 8 is a structure diagram of the classifier training unit of the SVM-based group data classification device of the present invention.
Detailed description of the embodiments
The above and other technical features and advantages of the present invention are described in more detail below with reference to the accompanying drawings.
Embodiment 1
Fig. 1 is a flow chart of the SVM-based group data classification method of the present invention. The SVM-based group data classification method includes:
Step S1, extracting historical group data, and determining the group and the feature data of the group;
The historical group data includes at least the categories of the group and the feature data corresponding to the group;
To work with historical group data, the data must first be analyzed to determine each category of the group and the feature data corresponding to each category.
Taking shopping in a shopping mall as an example, the group is the mall's shoppers, whom we can label as students, white-collar workers, teachers, the elderly, young people, children and so on; these labels are the categories of the group. Where categories conflict, they can be adjusted according to the actual situation, but each category should be clearly distinguishable from the other categories of the group; otherwise the accuracy of the subsequent group classification will drop significantly. The feature data of the group is related to the categories of the group. For example, the types of goods bought by the student category are the feature data of that category, and may include books, stationery, erasers, fruit, milk and so on; the feature data of the elderly category may include walnut milk, Radix Isatidis (isatis root), fruit and so on. The historical group data may come from daily shopping data compiled manually or by computer, depending on the actual situation.
Determining the categories of the group and the feature data corresponding to each category from the historical group data puts the data in order, and obviously erroneous data can be rejected at the same time, which improves the accuracy of the subsequent analysis. It also speeds up the subsequent analysis, and thus improves the speed and efficiency of the whole SVM-based group data classification method.
Step S2, constructing a quadratic feature matrix of the group according to the feature data;
The quadratic feature matrix of the group is constructed from the group and the corresponding feature data determined in the step above. The row vectors and column vectors of the quadratic feature matrix represent, respectively, the individuals of the group and the feature data of the group, and each element of the quadratic feature matrix is the degree of association between the corresponding individual and the corresponding feature data.
In this way, the group and its feature data are converted into matrix form and digitized, which makes them easy for a computer to recognize and classify quickly, and in turn improves the efficiency and accuracy of the whole SVM-based group data classification method.
Step S3, training the corresponding SVM classifier according to the quadratic feature matrix.
The SVM classifier is trained on the quadratic feature matrix built from the historical group data, yielding a trained SVM classifier that can subsequently classify new group data.
SVM solves two-class classification problems mainly on the basis of structural risk minimization: it looks for an optimal separating hyperplane that separates the two classes of data with the largest margin. Let the linearly separable sample set be $S = \{(x_i, y_i) \mid i = 1, \dots, n\}$, where $x_i \in R^d$ ($R^d$ is the $d$-dimensional feature space) and $y_i \in \{+1, -1\}$ is the category label of $x_i$. The general form of a linear discriminant function in the $d$-dimensional space is $g(x) = w \cdot x + b$, and the corresponding separating surface is $w \cdot x + b = 0$. The discriminant function $g(x)$ is normalized so that all samples of both classes satisfy $|g(x)| \ge 1$; the margin then equals $2/\|w\|$. Maximizing the margin is therefore equivalent to minimizing $\|w\|$, and requiring the separating surface to classify all samples correctly means satisfying
$$y_i[(w \cdot x_i) + b] - 1 \ge 0, \quad i = 1, 2, \dots, n$$
The separating surface that meets both conditions is the optimal separating surface. The training samples lying on the hyperplanes $H_1$ and $H_2$, which pass through the points of the two classes nearest to the separating surface and are parallel to the optimal separating surface, are exactly the samples for which equality holds in the formula above; they are called support vectors. The optimal separating surface problem can be expressed as finding, under the constraint above, the minimum of the objective function, which in its standard form is
$$\Phi(w) = \frac{1}{2}\|w\|^2$$
For linearly non-separable samples, slack variables $\xi_i$ and a penalty factor $C$ are introduced, and the objective function is rewritten, in its standard form, as
$$\Phi(w, \xi) = \frac{1}{2}\|w\|^2 + C \sum_{i=1}^{n} \xi_i$$
To this end, Lagrange multipliers $(\alpha_1, \alpha_2, \dots, \alpha_n)$ are introduced, and solving for the optimal separating surface is converted into a constrained quadratic extremum problem. The corresponding solution is $w = \sum_{i=1}^{n} \alpha_i y_i x_i$, where $\alpha_i$ is non-zero only for the support vectors $x_i$. The optimal classification function can then be rewritten as
$$f(x) = \mathrm{sign}\{(w \cdot x) + b\} = \mathrm{sign}\Big\{\sum_{i=1}^{n} \alpha_i y_i (x_i \cdot x) + b\Big\}$$
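As a minimal sketch of the classification function above (not the patent's own implementation), the following pure-Python snippet evaluates g(x) = Σ α_i y_i (x_i · x) + b and its sign for a hand-made two-point training set. The function names and the toy values of α_i and b are illustrative assumptions.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def decision_value(x, support_vectors, labels, alphas, b):
    """g(x) = sum_i alpha_i * y_i * (x_i . x) + b."""
    return sum(
        alpha * y * dot(sv, x)
        for alpha, y, sv in zip(alphas, labels, support_vectors)
    ) + b


def classify(x, support_vectors, labels, alphas, b):
    """f(x) = sign(g(x)); a value of exactly 0 is mapped to -1 by convention here."""
    return 1 if decision_value(x, support_vectors, labels, alphas, b) > 0 else -1


# Toy problem (assumed values): two support vectors on opposite sides of the
# origin. With alpha = 0.25 each, w = sum_i alpha_i y_i x_i = (0.5, 0.5), b = 0,
# so both points lie exactly on the margin (|g(x)| = 1).
SUPPORT_VECTORS = [(1.0, 1.0), (-1.0, -1.0)]
LABELS = [+1, -1]
ALPHAS = [0.25, 0.25]
B = 0.0

print(decision_value((1.0, 1.0), SUPPORT_VECTORS, LABELS, ALPHAS, B))
print(classify((-3.0, 0.0), SUPPORT_VECTORS, LABELS, ALPHAS, B))
```

Only the support vectors enter the sum, which is why a trained SVM can discard the remaining training samples at classification time.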
Step S4, classifying the group data to be classified using the SVM classifier.
In this way, group data can be classified by computer, conveniently and quickly, greatly saving manpower and material resources. In addition, compared with other classifiers such as neural networks, decision trees and naive Bayes, SVM offers a considerable improvement in classifier performance and the advantage of high classification accuracy, thereby improving the accuracy of group composition analysis.
Embodiment 2
This embodiment builds on the SVM-based group data classification method described above and differs in that, as shown in Fig. 2, step S2 includes:
Step S21, analyzing the feature data of the group, and extracting from it the basic features corresponding to each category of the group;
A group has multiple categories, and each category in turn has multiple items of feature data. However, these items of feature data are not equally associated with the category, so a further extraction step is needed. For example, the feature data corresponding to the student category includes books, stationery, erasers and so on, but because some student bought seafood, soymilk or milk powder once for a particular reason, the feature data may also include seafood, soymilk and milk powder. Seafood, soymilk and milk powder are not products that all or most students buy; they are quite possibly the products of a single shopping trip by a few students. If seafood, soymilk and milk powder were also treated as feature data of the student category, the accuracy of the classification results would drop significantly, so extraction is required.
Extracting features that can express category information is the primary prerequisite for machine learning. The better the feature data expresses the characteristics of the group, the higher its discriminative power and the better the machine learning result. Selecting effective classification features of the group is therefore the key to classifying the group. Extracting from the feature data of the group the basic features that can describe the categories of the group can greatly improve the accuracy of group classification.
For example, books, stationery, erasers, fruit, milk and so on are extracted from the feature data corresponding to the student category as the basic features of that category. This greatly improves the accuracy of subsequent judgments.
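One way to sketch this extraction step is to keep, for each category, only the items bought by at least a given share of that category's individuals. The source does not specify how the basic features are selected, so the threshold `min_share`, the function name and the sample purchases below are assumptions for illustration only.

```python
from collections import Counter


def extract_basic_features(baskets, min_share=0.5):
    """Return the items bought by at least `min_share` of the individuals.

    `baskets` holds one collection of purchased items per individual of a
    single category. The share threshold is an illustrative assumption.
    """
    # Count each item at most once per individual.
    counts = Counter(item for basket in baskets for item in set(basket))
    needed = min_share * len(baskets)
    return sorted(item for item, n in counts.items() if n >= needed)


# Hypothetical purchases of four individuals in the student category.
student_baskets = [
    {"books", "stationery", "eraser"},
    {"books", "fruit", "milk"},
    {"books", "stationery", "fruit", "seafood"},
    {"stationery", "eraser", "milk", "fruit"},
]

print(extract_basic_features(student_baskets))
```

With these sample baskets, seafood is bought by only one of four students and is dropped, while the items bought by at least half of them are kept.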
Step S22, converting the data in the historical group data into feature vectors;
Each individual's data in the historical group data has a category to which it belongs, and an individual necessarily belongs to one category. If the individual has a given basic feature of that category, that feature is recorded as 1, otherwise as 0; in this way each individual's data is converted into a basic feature vector. For example, a student buys books, stationery, an eraser, fruit and soymilk in a single purchase (soymilk is not a basic feature of students), so the feature vector may be (0, 1, 0, 1, 1, 1, 0, 0, 0), where the elements of the feature vector correspond, in order, to the feature data seafood, books, walnut milk, stationery, eraser, fruit, flapjack, salmon and soymilk (this example is for demonstration only and does not consider inclusion relationships between feature data). Because soymilk is not a basic feature of students, its position is still recorded as 0.
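The conversion in this example can be sketched as follows. The feature order and the student's basic features are taken from the example in the text; the function and variable names are assumptions.

```python
# Feature order used in the example above.
FEATURE_ORDER = [
    "seafood", "books", "walnut milk", "stationery", "eraser",
    "fruit", "flapjack", "salmon", "soymilk",
]

# Basic features of the student category, as extracted in step S21.
STUDENT_BASIC_FEATURES = {"books", "stationery", "eraser", "fruit", "milk"}


def to_feature_vector(purchases, basic_features, feature_order=FEATURE_ORDER):
    """Mark a feature 1 only if it was purchased AND is a basic feature."""
    purchased = set(purchases)
    return [
        1 if name in purchased and name in basic_features else 0
        for name in feature_order
    ]


# The single purchase from the example: soymilk is bought but is not a
# basic feature of students, so its position stays 0.
basket = ["books", "stationery", "eraser", "fruit", "soymilk"]
print(to_feature_vector(basket, STUDENT_BASIC_FEATURES))
```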
Step S24, constructing the quadratic feature matrix of the group from the feature vectors;
The row vectors and column vectors of the quadratic feature matrix represent, respectively, the individuals of the group and the feature data of the group, and each element of the quadratic feature matrix is the degree of association between the corresponding individual and the corresponding feature data.
In this way, analyzing and extracting the group features can greatly increase the degree of association between the features and the corresponding categories, making the classification results more reliable.
Embodiment 3
This embodiment builds on the SVM-based group data classification method described above and differs in that, as shown in Fig. 3, step S2 further includes:
Step S23, assigning different weights to the feature data according to their importance, and correcting the feature vectors accordingly;
Classification features usually need to be weighted. Among the basic features of a category, each feature's degree of association with the category also differs. For students, for example, books and stationery are more strongly associated with the category than erasers and fruit; if these degrees of association are not distinguished, the subsequent classification becomes inaccurate. For a given category, the feature data therefore need to be assigned different weights in order to correct the feature vector. For example, if for the student category we assign books, stationery, eraser and fruit the weights 5, 4, 3 and 2 respectively, the feature vector becomes (0, 5, 0, 4, 3, 2, 0, 0, 0).
The weights are set as follows. For each category in the group, the p features with the highest degree of association are identified (p is chosen according to the actual situation; the larger p is, the more accurate the analysis, but the greater the workload). The remaining features can be assumed to have the same discriminative power, with weight 1. The top p features are assigned weights (generally greater than 1), and the feature vectors are corrected accordingly.
Weighting the extracted features can greatly increase the degree of association between the features and the corresponding categories, making the classification results more reliable.
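The weighting in this embodiment can be sketched as an element-wise product of the binary feature vector and a per-category weight table. The weights 5, 4, 3 and 2 and the feature order come from the example in the text; the names and the default weight handling are assumptions.

```python
FEATURE_ORDER = [
    "seafood", "books", "walnut milk", "stationery", "eraser",
    "fruit", "flapjack", "salmon", "soymilk",
]

# Top-p weights for the student category (p = 4 in the example).
STUDENT_WEIGHTS = {"books": 5, "stationery": 4, "eraser": 3, "fruit": 2}


def apply_weights(vector, weights, feature_order=FEATURE_ORDER, default_weight=1):
    """Scale each element by its feature's weight; unlisted features get weight 1."""
    return [
        value * weights.get(name, default_weight)
        for value, name in zip(vector, feature_order)
    ]


print(apply_weights([0, 1, 0, 1, 1, 1, 0, 0, 0], STUDENT_WEIGHTS))
```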
Embodiment 4
This embodiment builds on the SVM-based group data classification method described above and differs in that, as shown in Fig. 4, step S3 includes:
Step S31, adding the category information of each category in the group to the quadratic feature matrix;
That is, a column of data is added to the quadratic feature matrix. This column holds the category of each corresponding individual (the category information of the group). Adding each individual's category to the quadratic feature matrix in this way makes it easier to train the SVM classifier.
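A minimal sketch of step S31, assuming the quadratic feature matrix is stored as a list of rows (one per individual): the category is appended as a last column and can be split off again before training. The function names and sample rows are hypothetical.

```python
def add_category_column(feature_matrix, categories):
    """Append each individual's category as the last column of its row."""
    if len(feature_matrix) != len(categories):
        raise ValueError("one category is required per individual (row)")
    return [list(row) + [category] for row, category in zip(feature_matrix, categories)]


def split_features_and_categories(labeled_matrix):
    """Inverse operation: recover the feature rows and the category column."""
    features = [row[:-1] for row in labeled_matrix]
    categories = [row[-1] for row in labeled_matrix]
    return features, categories


# Hypothetical weighted feature vectors for two individuals.
matrix = [
    [0, 5, 0, 4, 3, 2, 0, 0, 0],
    [0, 0, 1, 0, 0, 1, 0, 0, 0],
]
labeled = add_category_column(matrix, ["student", "elderly"])
print(labeled[0])
```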
Step S32, learning from the quadratic feature matrix with the category information, establishing a correspondence between the feature vectors and the category information of the group, and training the SVM classifier to obtain its discriminant function;
The number of SVM classifiers is the same as the number of categories of the group, so multiple SVM classifiers are trained, each corresponding to one category of the group. During training, the category corresponding to a classifier is separated from all the remaining categories: that category is taken as the positive class and the remaining categories as the negative class, and the classifier is trained to obtain its discriminant function, which is the $g(x)$ described above.
In this way, only a small number of SVM classifiers need to be trained, which greatly reduces the amount of computation and increases classification speed.
Suppose we divide customers into k classes. Computing the support vector machine (SVM) model from the resulting matrix of features plus category information yields k two-class support vector machines. The i-th support vector machine separates class i from all the remaining classes; during training, class i is taken as the positive class and all other classes as the negative class. For a k-class classification problem, only k two-class support vector machines need to be trained, so the number of classification functions obtained (k) is small and classification is relatively fast.
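The relabeling behind this one-against-the-rest scheme can be sketched as below: for each of the k categories, the category column is turned into a vector of +1/-1 labels for that category's two-class SVM. The function name and the sample categories are assumptions, and the training itself is left to whichever two-class SVM implementation is used.

```python
def one_vs_rest_labels(categories):
    """Map each category to its +1/-1 training labels (that category vs the rest)."""
    classes = sorted(set(categories))
    return {
        positive: [1 if category == positive else -1 for category in categories]
        for positive in classes
    }


# Hypothetical category column for four individuals.
category_column = ["student", "elderly", "student", "teacher"]
label_sets = one_vs_rest_labels(category_column)

# k = 3 categories, hence 3 two-class training problems.
print(len(label_sets))
print(label_sets["student"])
```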
Embodiment 5
This embodiment builds on the SVM-based group data classification method described above and differs in that, in step S4, the SVM classifiers are equal in number to the categories of the group and correspond to them one to one. During classification, the group data to be classified passes through all SVM classifiers. If exactly one SVM classifier outputs a positive number, the group data to be classified belongs to the category corresponding to that SVM classifier; if zero or more than one SVM classifier outputs a positive number, the group data to be classified belongs to the category corresponding to the SVM classifier whose discriminant function value is the largest among all SVM classifiers.
When discriminating, the sample is passed through each of the k classifiers to obtain k output values $f_i(x) = \mathrm{sign}(g_i(x))$. If only one +1 appears, the corresponding category is the category of the input. In practice, the constructed decision functions always contain some error: if more than one output is +1 (more than one class claims the sample) or no output is +1 (no class claims the sample), the values of $g_i(x)$ are compared, and the category corresponding to the largest value is the category of the input sample.
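The decision rule described above can be sketched as follows, taking the k discriminant values g_i(x) as input. The function name and the sample values are assumptions.

```python
def predict_category(g_values):
    """Pick a category from the discriminant values g_i(x) of the k classifiers.

    `g_values` maps each category to the value of its classifier's g_i(x).
    """
    positive = [category for category, g in g_values.items() if g > 0]
    if len(positive) == 1:
        # Exactly one classifier claims the sample.
        return positive[0]
    # Zero or several classifiers claim the sample: fall back to the largest g_i(x).
    return max(g_values, key=g_values.get)


print(predict_category({"student": 0.8, "teacher": -0.4, "elderly": -1.1}))
print(predict_category({"student": 0.3, "teacher": 0.9, "elderly": -1.0}))
print(predict_category({"student": -0.2, "teacher": -0.7, "elderly": -0.1}))
```

When exactly one value is positive it is also the largest, so both branches agree in that case; the explicit first branch is kept only to mirror the rule as stated in the text.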
In this way, the group data to be classified only needs to pass through a small number of SVM classifiers in turn, which greatly reduces the amount of computation and increases classification speed. Moreover, compared with other classifiers such as neural networks, decision trees and naive Bayes, the SVM method offers a considerable improvement in classifier performance and the advantage of high classification accuracy, thereby improving the accuracy of group composition analysis.
Embodiment 6
This embodiment is an SVM-based group data classification device corresponding to the SVM-based group data classification method described above. As shown in Fig. 5, it includes:
A historical data processing unit 1, which extracts historical group data and determines the group and the feature data of the group;
A feature matrix construction unit 2, which constructs the quadratic feature matrix of the group according to the feature data;
A classifier training unit 3, which trains the corresponding SVM classifier according to the quadratic feature matrix;
A classifier classification unit 4, which classifies the group data to be classified using the SVM classifier.
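The way the four units hand data to one another can be sketched as a small pipeline. This is a structural illustration only: the class name, method names and the stub units used in the demo are assumptions and are not part of the source.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class SvmGroupClassificationDevice:
    """Chains the four units in the order described above (hypothetical sketch)."""

    historical_data_processing_unit: Callable[[Any], Any]      # unit 1
    feature_matrix_construction_unit: Callable[[Any], Any]     # unit 2
    classifier_training_unit: Callable[[Any], Any]             # unit 3
    classifier_classification_unit: Callable[[Any, Any], Any]  # unit 4

    def fit(self, historical_group_data):
        feature_data = self.historical_data_processing_unit(historical_group_data)
        matrix = self.feature_matrix_construction_unit(feature_data)
        self._classifier = self.classifier_training_unit(matrix)
        return self

    def classify(self, group_data):
        return self.classifier_classification_unit(self._classifier, group_data)


# Stub units, for demonstration only; no real SVM is trained here.
device = SvmGroupClassificationDevice(
    historical_data_processing_unit=lambda data: data,
    feature_matrix_construction_unit=lambda data: [list(row) for row in data],
    classifier_training_unit=lambda matrix: (lambda x: "student" if sum(x) > 0 else "other"),
    classifier_classification_unit=lambda classifier, x: classifier(x),
)
print(device.fit([[0, 1, 1]]).classify([0, 5, 4]))
```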
In this way, group data can be classified by computer, conveniently and quickly, greatly saving manpower and material resources. In addition, compared with other classifiers such as neural networks, decision trees and naive Bayes, SVM offers a considerable improvement in classifier performance and the advantage of high classification accuracy, thereby improving the accuracy of group composition analysis.
In the historical data processing unit 1:
The historical group data includes at least the categories of the group and the feature data corresponding to the group;
To work with historical group data, the data must first be analyzed to determine each category of the group and the feature data corresponding to each category.
Taking shopping in a shopping mall as an example, the group is the mall's shoppers, whom we can label as students, white-collar workers, teachers, the elderly, young people, children and so on; these labels are the categories of the group. Where categories conflict, they can be adjusted according to the actual situation, but each category should be clearly distinguishable from the other categories of the group; otherwise the accuracy of the subsequent group classification will drop significantly. The feature data of the group is related to the categories of the group. For example, the types of goods bought by the student category are the feature data of that category, and may include books, stationery, erasers, fruit, milk and so on; the feature data of the elderly category may include walnut milk, Radix Isatidis (isatis root), fruit and so on. The historical group data may come from daily shopping data compiled manually or by computer, depending on the actual situation.
Determining the categories of the group and the feature data corresponding to each category from the historical group data puts the data in order, and obviously erroneous data can be rejected at the same time, which improves the accuracy of the subsequent analysis. It also speeds up the subsequent analysis, and thus improves the speed and efficiency of the whole SVM-based group data classification device.
In the feature matrix construction unit 2:
The quadratic feature matrix of the group is constructed from the group and the corresponding feature data determined by the unit above. The row vectors and column vectors of the quadratic feature matrix represent, respectively, the individuals of the group and the feature data of the group, and each element of the quadratic feature matrix is the degree of association between the corresponding individual and the corresponding feature data.
In this way, the group and its feature data are converted into matrix form and digitized, which makes them easy for a computer to recognize and classify quickly, and in turn improves the efficiency and accuracy of the whole SVM-based group data classification device.
In the classifier training unit 3:
The SVM classifier is trained on the quadratic feature matrix built from the historical group data, yielding a trained SVM classifier that can subsequently classify new group data.
In this way, group data can be classified by computer, conveniently and quickly, greatly saving manpower and material resources. In addition, compared with other classifiers such as neural networks, decision trees and naive Bayes, SVM offers a considerable improvement in classifier performance and the advantage of high classification accuracy, thereby improving the accuracy of group composition analysis.
Embodiment 7
This embodiment builds on the SVM-based group data classification device described above and differs in that, as shown in Fig. 6, the feature matrix construction unit 2 includes:
A basic feature extraction subunit 21, which analyzes the feature data of the group and extracts from it the basic features corresponding to each category of the group;
A feature vector conversion subunit 22, which converts the data in the historical group data into feature vectors;
A vector-to-matrix construction subunit 24, which constructs the quadratic feature matrix of the group from the feature vectors.
In this way, analyzing and extracting the group features can greatly increase the degree of association between the features and the corresponding categories, making the classification results more reliable.
In the basic feature extraction subunit 21,
a group has multiple categories, and each category in turn has multiple items of feature data. The degrees of association between these items and the category are not all the same, however, so extraction is needed. For example, the feature data corresponding to the student category includes books, stationery, erasers, and so on, but a student may also buy seafood, soy milk, milk powder, and the like for some particular reason, so the feature data may also include seafood, soy milk, and milk powder. These are not products that all or most students buy; they are most likely one-off purchases by a few students. If seafood, soy milk, and milk powder were also treated as feature data of the student category, the accuracy of the classification results would drop considerably, which is why extraction is required.
Extracting features that express category information is the primary prerequisite for machine learning. The better the feature data expresses the characteristics of the group, the higher its discriminative power and the better the machine learning result. Selecting effective classification features is therefore the key to classifying a group. Extracting from the group's feature data the basic features that describe each category can greatly improve the accuracy of group classification.
For example, books, stationery, erasers, fruit, milk, and so on are extracted from the feature data corresponding to the student category as the basic features of that category. This greatly improves the accuracy of subsequent judgments.
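One way to make the extraction step concrete is a frequency threshold: keep, for a given category, only the items bought by at least a given share of that category's individuals. The text does not state the selection criterion, so the threshold rule, the `min_support` parameter, and the purchase records below are assumptions made purely for illustration.

```python
from collections import Counter


def extract_basic_features(purchases_by_individual, min_support=0.5):
    """Return the items bought by at least `min_support` of the individuals.

    `purchases_by_individual` holds one set of items per individual of a
    single category. The threshold rule is an assumed criterion, not one
    stated in the text.
    """
    counts = Counter()
    for items in purchases_by_individual:
        counts.update(set(items))
    n = len(purchases_by_individual)
    return {item for item, count in counts.items() if count / n >= min_support}


# Hypothetical purchase records for the student category.
student_purchases = [
    {"books", "stationery", "eraser"},
    {"books", "fruit", "milk"},
    {"books", "stationery", "seafood"},  # one-off seafood purchase
    {"stationery", "eraser", "fruit"},
]

basic = extract_basic_features(student_purchases, min_support=0.5)
print(sorted(basic))
```

With these records, seafood and milk each appear in only one of four baskets and are dropped, which mirrors the reasoning about one-off purchases above.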
In the feature vector conversion subunit 22,
each individual record in the historical population data has a category to which it belongs, and an individual necessarily belongs to one category. If the individual has a given basic feature of that category, the corresponding position is set to 1; otherwise it is set to 0. Each individual record is thereby converted into a basic feature vector. For example, if a student buys books, stationery, an eraser, fruit, and soy milk (which is not a basic feature of students) in a single purchase, the feature vector may be (0,1,0,1,1,1,0,0,0), where the elements correspond, in order, to the feature data seafood, books, walnut milk, stationery, eraser, fruit, flapjack, salmon, and soy milk (this is for demonstration only; no inclusion relationship between the features is implied). Because soy milk is not a basic feature of students, its position is still set to 0.
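The conversion described above can be sketched as follows. The feature order and the student purchase are taken from the example in the text; the function name `to_feature_vector` and the data layout are assumptions for illustration.

```python
# Feature order as listed in the example in the text.
FEATURES = ["seafood", "books", "walnut milk", "stationery", "eraser",
            "fruit", "flapjack", "salmon", "soy milk"]

# Basic features of the student category, as given in the text.
STUDENT_BASIC = {"books", "stationery", "eraser", "fruit", "milk"}


def to_feature_vector(purchased, basic_features, feature_order=FEATURES):
    """Set a position to 1 only if the item was bought AND is a basic
    feature of the individual's category; otherwise set it to 0."""
    return tuple(
        1 if (feature in purchased and feature in basic_features) else 0
        for feature in feature_order
    )


purchase = {"books", "stationery", "eraser", "fruit", "soy milk"}
vector = to_feature_vector(purchase, STUDENT_BASIC)
print(vector)  # (0, 1, 0, 1, 1, 1, 0, 0, 0)
```

Soy milk was bought but is not a basic feature of students, so the last position stays 0, as in the example.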
In the vector-to-matrix construction subunit 24,
the row vectors and column vectors of the quadratic feature matrix represent the individuals of the group and the feature data of the group, respectively, and each element of the matrix is the degree of association between the corresponding individual and the corresponding feature data.
In this way, analyzing the group's features and extracting from them greatly increases the degree of association between the features and the corresponding category, making the classification results more reliable.
Embodiment 8
The SVM-based population data classification device is as described above; this embodiment differs in that, as shown in Fig. 7, the feature matrix construction unit 2 further includes:
Weight assignment subunit 23, which assigns different weights to the feature data according to their importance and corrects the feature vector accordingly;
Classification features usually need to be weighted. Among the basic features of a category, the degree of association between each feature and the category differs. For students, for example, books and stationery are more strongly associated with the category than erasers and fruit. If these degrees of association are not distinguished, subsequent classification becomes inaccurate. For a given category, the feature data therefore needs to be assigned different weights in order to correct the feature vector. For example, if for the student category the weights of books, stationery, erasers, and fruit are set to 5, 4, 3, and 2 respectively, the feature vector is corrected to (0,5,0,4,3,2,0,0,0).
The process of setting weights for the feature data is as follows. For each category in the group, identify the top p features by degree of association (p is chosen according to the actual situation; the larger p is, the more accurate the analysis, but the greater the workload). The remaining features are assumed to have the same discriminative power, with a weight of 1. For the top p features, weights are set (generally greater than 1), and the feature vectors are then corrected accordingly.
Weighting the extracted features greatly increases the degree of association between the features and the corresponding category, making the classification results more reliable.
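The weight correction can be sketched as an element-wise multiplication of the 0/1 feature vector by per-feature weights, with a default weight of 1 for features outside the top p. The weights 5, 4, 3, 2 and the feature order come from the example in the text; the function name `apply_weights` is an assumption for illustration.

```python
# Feature order as listed in the example in the text.
FEATURES = ["seafood", "books", "walnut milk", "stationery", "eraser",
            "fruit", "flapjack", "salmon", "soy milk"]

# Weights of the top-p (here p = 4) features of the student category.
STUDENT_WEIGHTS = {"books": 5, "stationery": 4, "eraser": 3, "fruit": 2}


def apply_weights(vector, weights, feature_order=FEATURES, default=1):
    """Multiply each 0/1 entry by its feature's weight; features without an
    explicit weight keep the default weight of 1."""
    return tuple(
        value * weights.get(feature, default)
        for value, feature in zip(vector, feature_order)
    )


weighted = apply_weights((0, 1, 0, 1, 1, 1, 0, 0, 0), STUDENT_WEIGHTS)
print(weighted)  # (0, 5, 0, 4, 3, 2, 0, 0, 0)
```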
Embodiment 9
The SVM-based population data classification device is as described above; this embodiment differs in that, as shown in Fig. 8, the classifier training unit 3 includes:
Category information adding subunit 31, which adds the category information of each category in the group to the quadratic feature matrix;
Matrix learning and training subunit 32, which learns from the quadratic feature matrix carrying the category information, establishes a correspondence between the feature vectors and the category information of the group, and trains the SVM classifier to obtain its discriminant function.
In this way, only a small number of SVM classifiers need to be trained, which greatly reduces the computational workload and increases classification speed.
In the category information adding subunit 31, a column of data is added to the quadratic feature matrix. This column holds the category of each corresponding individual (the category information of the group). The category of every individual is thus included in the quadratic feature matrix, which makes it convenient to train the SVM classifier.
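Adding the category column can be sketched as follows. The matrix values, the "office worker" category, and the helper names are hypothetical and serve only to show the layout: one row per individual, one column per feature, and a final column holding the category.

```python
def add_category_column(feature_matrix, categories):
    """Append each individual's category as the last element of its row."""
    if len(feature_matrix) != len(categories):
        raise ValueError("one category is required per individual (row)")
    return [list(row) + [category]
            for row, category in zip(feature_matrix, categories)]


def split_features_and_labels(labelled_matrix):
    """Recover the feature rows and the category column for training."""
    features = [row[:-1] for row in labelled_matrix]
    labels = [row[-1] for row in labelled_matrix]
    return features, labels


# Hypothetical weighted feature vectors: books, stationery, eraser, fruit.
feature_matrix = [
    [5, 4, 0, 2],
    [0, 4, 3, 0],
    [0, 0, 0, 2],
]
categories = ["student", "student", "office worker"]

labelled = add_category_column(feature_matrix, categories)
features, labels = split_features_and_labels(labelled)
print(labelled[0])  # [5, 4, 0, 2, 'student']
```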
In the matrix learning and training subunit 32,
the number of SVM classifiers equals the number of categories of the group, so multiple SVM classifiers are trained, each corresponding to one category of the group. In training, the category corresponding to a classifier is separated from all the remaining categories: that category is taken as the positive class and the remaining categories as the negative class, and the classifier is trained to obtain its discriminant function, which is g(x).
In this way, only a small number of SVM classifiers need to be trained, which greatly reduces the computational workload and increases classification speed.
Suppose customers are divided into k classes. From the resulting matrix of features plus category information, the support vector machine (SVM) model is computed, yielding k two-class support vector machines. The i-th support vector machine separates class i from all the remaining classes: in training, class i is taken as the positive class and all other classes as the negative class. A k-class classification problem thus requires training only k two-class support vector machines, so the number of classification functions obtained (k) is small and classification is relatively fast.
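The one-vs-rest training scheme can be sketched as below. The text does not specify the kernel or the solver, so this sketch assumes a linear soft-margin SVM trained by batch subgradient descent on the hinge loss, chosen only to keep the example dependency-free. The categories "chef" and "parent", the feature list, the training rows, and all hyperparameters are hypothetical.

```python
def decision_value(w, b, x):
    """Linear discriminant g(x) = w . x + b."""
    return sum(wj * xj for wj, xj in zip(w, x)) + b


def train_binary_svm(X, y, epochs=2000, lr=0.05, lam=0.01):
    """Linear soft-margin SVM: batch subgradient descent on
    (lam / 2) * ||w||^2 + mean(max(0, 1 - y * g(x)))."""
    n = len(X)
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        grad_w = [lam * wj for wj in w]  # gradient of the L2 penalty
        grad_b = 0.0
        for xi, yi in zip(X, y):
            if yi * decision_value(w, b, xi) < 1:  # margin violated
                for j, xj in enumerate(xi):
                    grad_w[j] -= yi * xj / n
                grad_b -= yi / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def train_one_vs_rest(X, labels):
    """Train k binary SVMs; classifier i treats category i as +1, rest as -1."""
    classifiers = {}
    for category in sorted(set(labels)):
        y = [1 if label == category else -1 for label in labels]
        classifiers[category] = train_binary_svm(X, y)
    return classifiers


# Columns: books, stationery, seafood, salmon, milk powder, soy milk.
X = [
    [1, 1, 0, 0, 0, 0], [1, 0, 0, 0, 0, 0], [0, 1, 0, 0, 0, 0],
    [0, 0, 1, 1, 0, 0], [0, 0, 1, 0, 0, 0], [0, 0, 0, 1, 0, 0],
    [0, 0, 0, 0, 1, 1], [0, 0, 0, 0, 1, 0], [0, 0, 0, 0, 0, 1],
]
labels = ["student"] * 3 + ["chef"] * 3 + ["parent"] * 3

classifiers = train_one_vs_rest(X, labels)
sample = [1, 1, 0, 0, 0, 0]  # bought books and stationery
scores = {c: decision_value(w, b, sample) for c, (w, b) in classifiers.items()}
print(max(scores, key=scores.get))  # student
```

Three categories yield exactly three binary classifiers, each exposing its own discriminant value g_i(x) for later comparison.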
Embodiment 10
The SVM-based population data classification device is as described above; this embodiment differs in that, in the classifier classification unit 4, the number of SVM classifiers equals the number of categories of the group, and the classifiers correspond to the categories one to one. During classification, the population data to be classified passes through all the SVM classifiers. If exactly one SVM classifier outputs a positive number, the population data to be classified belongs to the category corresponding to that SVM classifier. If zero or more than one SVM classifier outputs a positive number, the population data to be classified belongs to the category corresponding to the SVM classifier whose discriminant function value is the largest among all the SVM classifiers.
When discriminating, the sample is passed through each of the k classifiers to obtain k output values f_i(x) = sign(g_i(x)). If exactly one +1 appears, the corresponding category is the category of the input. In practice the constructed decision functions always have some error: if more than one output is +1 (more than one class claims the sample) or no output is +1 (no class claims it), the values g_i(x) are compared, and the category with the largest value is taken as the category of the input sample.
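The decision rule above can be sketched as a small function over the k discriminant values. The category names and the numeric values of g_i(x) are hypothetical; only the rule itself comes from the text.

```python
def sign(value):
    """+1 for a positive discriminant value, -1 otherwise."""
    return 1 if value > 0 else -1


def classify(decision_values):
    """Pick a category from a mapping {category: g_i(x)}.

    If exactly one classifier outputs +1, its category wins. If none or
    several output +1, the category with the largest g_i(x) wins.
    """
    positives = [c for c, g in decision_values.items() if sign(g) == 1]
    if len(positives) == 1:
        return positives[0]
    return max(decision_values, key=decision_values.get)


# Hypothetical discriminant values for three samples.
print(classify({"student": 0.8, "chef": -1.2, "parent": -0.5}))   # student
print(classify({"student": 0.4, "chef": 0.9, "parent": -1.0}))    # chef
print(classify({"student": -0.3, "chef": -0.7, "parent": -0.1}))  # parent
```

When exactly one value is positive it is necessarily also the largest, so taking the maximum of g_i(x) alone gives the same result in all three cases; the explicit branch is kept here only to follow the wording of the rule.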
In this way, the population data to be classified only needs to pass through a small number of SVM classifiers in turn, which greatly reduces the computational workload and increases classification speed. Moreover, compared with other classifiers such as neural networks, decision trees, and naive Bayes, the SVM method offers a considerable improvement in classifier performance and the advantage of high classification accuracy, thereby improving the accuracy of group composition analysis.
The foregoing describes only preferred embodiments of the present invention, which are illustrative rather than restrictive. Those skilled in the art will understand that many changes, modifications, and even equivalents may be made within the spirit and scope defined by the claims of the present invention, all of which fall within the protection scope of the present invention.
Claims (10)
1. An SVM-based population data classification method, characterized by comprising:
Step S1: extracting historical population data, and determining the group and the feature data of the group;
Step S2: building the quadratic feature matrix of the group according to the feature data;
Step S3: training the corresponding SVM classifier according to the quadratic feature matrix;
Step S4: classifying the population data to be classified using the SVM classifier.
2. The SVM-based population data classification method according to claim 1, characterized in that step S2 comprises:
Step S21: analyzing the feature data of the group, and extracting from it the basic features corresponding to each category of the group;
Step S22: converting the data in the historical population data into feature vectors;
Step S24: building the quadratic feature matrix of the group from the feature vectors.
3. The SVM-based population data classification method according to claim 2, characterized in that step S2 further comprises:
Step S23: assigning different weights to the feature data according to their importance, and correcting the feature vectors.
4. The SVM-based population data classification method according to any one of claims 1-3, characterized in that step S3 comprises:
Step S31: adding the category information of each category in the group to the quadratic feature matrix;
Step S32: learning from the quadratic feature matrix carrying the category information, establishing a correspondence between the feature vectors and the category information of the group, and training the SVM classifier to obtain its discriminant function.
5. The SVM-based population data classification method according to any one of claims 1-3, characterized in that, in step S2, the row vectors and column vectors of the quadratic feature matrix represent the individuals of the group and the feature data of the group, respectively, and each element of the quadratic feature matrix is the degree of association between the corresponding individual and the corresponding feature data.
6. The SVM-based population data classification method according to any one of claims 1-3, characterized in that, in step S4, the number of SVM classifiers equals the number of categories of the group.
7. The SVM-based population data classification method according to claim 4, characterized in that, in step S4, the number of SVM classifiers equals the number of categories of the group and the classifiers correspond to the categories one to one; during classification, the population data to be classified passes through all the SVM classifiers; if exactly one SVM classifier outputs a positive number, the population data to be classified belongs to the category corresponding to that SVM classifier; if zero or more than one SVM classifier outputs a positive number, the population data to be classified belongs to the category corresponding to the SVM classifier whose discriminant function value is the largest among all the SVM classifiers.
8. An SVM-based population data classification device corresponding to the SVM-based population data classification method according to any one of the preceding claims, characterized by comprising:
a historical data processing unit, which extracts historical population data and determines the group and the feature data of the group;
a feature matrix construction unit, which builds the quadratic feature matrix of the group according to the feature data;
a classifier training unit, which trains the corresponding SVM classifier according to the quadratic feature matrix;
a classifier classification unit, which classifies the population data to be classified using the SVM classifier.
9. The SVM-based population data classification device according to claim 8, characterized in that the feature matrix construction unit comprises:
a basic feature extraction subunit, which analyzes the feature data of the group and extracts from it the basic features corresponding to each category of the group;
a feature vector conversion subunit, which converts the data in the historical population data into feature vectors;
a vector-to-matrix construction subunit, which builds the quadratic feature matrix of the group from the feature vectors.
10. The SVM-based population data classification device according to claim 9, characterized in that the feature matrix construction unit further comprises: a weight assignment subunit, which assigns different weights to the feature data according to their importance and corrects the feature vectors.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201611254023.0A CN108268873A (en) | 2016-12-30 | 2016-12-30 | A kind of population data sorting technique and device based on SVM |
Publications (1)
Publication Number | Publication Date |
---|---|
CN108268873A true CN108268873A (en) | 2018-07-10 |
Family
ID=62754314
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201611254023.0A Pending CN108268873A (en) | 2016-12-30 | 2016-12-30 | A kind of population data sorting technique and device based on SVM |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108268873A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110197207A (en) * | 2019-05-13 | 2019-09-03 | 腾讯科技(深圳)有限公司 | Method and related apparatus for classifying unclassified user groups |
Legal Events
Code | Title | Description |
---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
RJ01 | Rejection of invention patent application after publication | Application publication date: 20180710 |