CN105740896A - High-dimensional data mode classification method and apparatus - Google Patents

Info

Publication number
CN105740896A
Authority
CN
China
Prior art keywords
training sample
target
layer
classification
category
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201610059218.3A
Other languages
Chinese (zh)
Inventor
李利伟
张兵
厉为
高连如
高建威
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Institute of Remote Sensing and Digital Earth of CAS
Original Assignee
Institute of Remote Sensing and Digital Earth of CAS
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Institute of Remote Sensing and Digital Earth of CAS
Priority to CN201610059218.3A
Publication of CN105740896A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Image Analysis (AREA)

Abstract

The present application discloses a high-dimensional data pattern classification method and apparatus. The method comprises: reading the high-dimensional data to be classified as a three-dimensional array R, wherein the data at the position given by any row and column combination is taken as one target; reading a training sample set S labeled with classification categories; performing pattern matching between each target in the high-dimensional data and each training sample in the training sample set S according to a preset category voting rule, and determining the voting result of each category; and determining the category with the highest voting score as the category to which the target belongs. In the method and apparatus, the high-dimensional data is represented as a three-dimensional array, each target is a multidimensional column vector, and each element of the column vector characterizes one attribute of the target. Pattern matching between the target and the training samples under the preset category voting rule yields the voting result of each category, and the category with the highest voting score is determined as the category to which the target belongs, which improves the accuracy of high-dimensional data pattern classification.

Description

A high-dimensional data pattern classification method and apparatus
Technical Field
The present application relates to the technical field of pattern classification, and more specifically to a high-dimensional data pattern classification method and apparatus.
Background
Modern life is entering the era of the Internet of Things, in which physical objects of many categories and functions are interconnected through digital description and network services, greatly improving quality of life and production efficiency. Sensor technology, which relies on the interaction between media such as electromagnetic waves and a target, can collect multiple kinds of attribute data about physical objects economically and conveniently, and strongly supports the digital management and scientific understanding of physical objects.
As sensor types multiply and their applications deepen, the variety and quantity of target attribute data that can be obtained keep growing. Mining the class-specific pattern information hidden in this mass of data opens up more possibilities for daily life and scientific research.
In practical applications, however, the attribute data of a target typically has anywhere from several to as many as 100,000 dimensions and is highly complex, which makes accurate pattern classification very difficult. The main difficulty of high-dimensional data pattern classification in practice is that the statistical distribution of the data in feature space is very complex, so traditional algorithms classify high-dimensional data with very low accuracy.
Summary of the Invention
In view of this, the present application provides a high-dimensional data pattern classification method and apparatus to solve the problem of low accuracy in existing high-dimensional data pattern classification.
To achieve the above object, the following solution is proposed:
A high-dimensional data pattern classification method, comprising:
reading the high-dimensional data to be classified, the high-dimensional data being a three-dimensional array R composed of rows, columns and layers, wherein the data at the position given by any row and column combination is taken as one target, the target is a Bands-dimensional column vector, and Bands is the number of layers of the three-dimensional array R;
reading a training sample set S labeled with classification categories, wherein the training sample set S comprises training sample subsets of N categories, the training sample subset of each category is an array with Bands columns, and each row of the array records one training sample of that category;
performing pattern matching between each target and each training sample in the training sample set S according to a preset category voting rule, and determining the voting result of each category;
according to the voting result of each category, determining the category with the highest voting score as the category to which the target belongs.
Preferably, performing pattern matching between each target and each training sample in the training sample set S according to the preset category voting rule, and determining the voting result of each category, comprises:
for each element of the target, calculating its distance to the corresponding element of each training sample in the training sample set S, determining the training sample with the minimum distance, and casting one vote for the category of that training sample;
counting the votes received by each of the N categories from the category votes of all elements of the target.
Preferably, performing pattern matching between each target and each training sample in the training sample set S according to the preset category voting rule, and determining the voting result of each category, comprises:
performing multi-level feature set construction on the target according to a preset multi-level feature set construction strategy to obtain Bands layers of target feature sets, wherein the target feature set of the i-th layer contains C(Bands, i) elements;
performing multi-level feature set construction on each training sample in the training sample set S according to the same preset multi-level feature set construction strategy to obtain Bands layers of training sample feature sets for each training sample, wherein the training sample feature set of the i-th layer contains C(Bands, i) elements;
for each layer of the target in turn, calculating the distance between each element of the target feature set of that layer and the corresponding element of the training sample feature set of the same layer of each training sample, determining the training sample with the minimum distance, and casting one vote for the category of that training sample;
counting the votes received by each of the N categories from the category votes of all elements of the target feature set of a layer, to obtain for that layer a category feature column vector holding the vote count of each of the N categories;
dividing the category feature column vector by the number of elements in the target feature set of that layer to obtain a normalized category feature column vector;
computing a weighted sum of the normalized category feature column vectors of all layers using preset per-layer weights, to obtain a total category feature column vector;
in which case determining the category with the highest voting score as the category to which the target belongs, according to the voting result of each category, comprises:
selecting the category with the largest value in the total category feature column vector as the category to which the target belongs.
Preferably, calculating, for each layer of the target in turn, the distance between each element of the target feature set of that layer and the corresponding element of the training sample feature set of the same layer of each training sample comprises:
calculating, for each layer of the target in turn, the Euclidean distance between each element of the target feature set of that layer and the corresponding element of the training sample feature set of the same layer of each training sample.
Preferably, performing multi-level feature set construction on the target according to the preset multi-level feature set construction strategy to obtain Bands layers of target feature sets comprises:
taking the target itself as the target feature set of the 1st layer;
combining any i elements of the target feature set of the 1st layer to obtain a set of i-dimensional feature vectors, and taking this set as the target feature set of the i-th layer, where i is an integer from 2 to Bands.
A high-dimensional data pattern classification apparatus, comprising:
a high-dimensional data reading unit, configured to read the high-dimensional data to be classified, the high-dimensional data being a three-dimensional array R composed of rows, columns and layers, wherein the data at the position given by any row and column combination is taken as one target, the target is a Bands-dimensional column vector, and Bands is the number of layers of the three-dimensional array R;
a training sample set reading unit, configured to read a training sample set S labeled with classification categories, wherein the training sample set S comprises training sample subsets of N categories, the training sample subset of each category is an array with Bands columns, and each row of the array records one training sample of that category;
a pattern matching unit, configured to perform pattern matching between each target and each training sample in the training sample set S according to a preset category voting rule, and determine the voting result of each category;
a category determination unit, configured to determine, according to the voting result of each category, the category with the highest voting score as the category to which the target belongs.
Preferably, the pattern matching unit comprises:
a first pattern matching subunit, configured to calculate, for each element of the target, its distance to the corresponding element of each training sample in the training sample set S, determine the training sample with the minimum distance, and cast one vote for the category of that training sample; and to count the votes received by each of the N categories from the category votes of all elements of the target.
Preferably, the pattern matching unit comprises:
a second pattern matching subunit, configured to perform multi-level feature set construction on the target according to a preset multi-level feature set construction strategy to obtain Bands layers of target feature sets, wherein the target feature set of the i-th layer contains C(Bands, i) elements;
a third pattern matching subunit, configured to perform multi-level feature set construction on each training sample in the training sample set S according to the same preset multi-level feature set construction strategy to obtain Bands layers of training sample feature sets for each training sample, wherein the training sample feature set of the i-th layer contains C(Bands, i) elements;
a fourth pattern matching subunit, configured to calculate, for each layer of the target in turn, the distance between each element of the target feature set of that layer and the corresponding element of the training sample feature set of the same layer of each training sample, determine the training sample with the minimum distance, and cast one vote for the category of that training sample;
a fifth pattern matching subunit, configured to count the votes received by each of the N categories from the category votes of all elements of the target feature set of a layer, to obtain for that layer a category feature column vector holding the vote count of each of the N categories;
a sixth pattern matching subunit, configured to divide the category feature column vector by the number of elements in the target feature set of that layer to obtain a normalized category feature column vector;
a seventh pattern matching subunit, configured to compute a weighted sum of the normalized category feature column vectors of all layers using preset per-layer weights, to obtain a total category feature column vector;
in which case the category determination unit comprises:
a first category determination subunit, configured to select the category with the largest value in the total category feature column vector as the category to which the target belongs.
Preferably, the fourth pattern matching subunit uses the Euclidean distance when calculating the distance between each element of the target feature set of each layer of the target and the corresponding element of the training sample feature set of the same layer of each training sample.
Preferably, the second pattern matching subunit comprises:
a layer-1 target feature set determination unit, configured to take the target itself as the target feature set of the 1st layer;
a layer-i target feature set determination unit, configured to combine any i elements of the target feature set of the 1st layer to obtain a set of i-dimensional feature vectors, and to take this set as the target feature set of the i-th layer, where i is an integer from 2 to Bands.
As can be seen from the above technical solution, the high-dimensional data pattern classification method provided by the embodiments of the present application reads the high-dimensional data to be classified, which is a three-dimensional array R composed of rows, columns and layers; the data at the position given by any row and column combination is taken as one target, the target is a Bands-dimensional column vector, and Bands is the number of layers of R. The method further reads a training sample set S labeled with classification categories; S comprises training sample subsets of N categories, the subset of each category is an array with Bands columns, and each row of the array records one training sample of that category. Then, according to a preset category voting rule, each target in the high-dimensional data is pattern-matched against each training sample in S to determine the voting result of each category. Finally, the category with the highest voting score is determined as the category to which the target belongs. The present application represents high-dimensional data as a three-dimensional array in which each target is a Bands-dimensional column vector and each element of the column vector characterizes one attribute of the target; matching the target against the training samples under the preset category voting rule and assigning the target to the category with the highest voting score improves the accuracy of high-dimensional data pattern classification.
Brief Description of the Drawings
To describe the technical solutions in the embodiments of the present application or in the prior art more clearly, the drawings required for describing the embodiments or the prior art are briefly introduced below. Evidently, the drawings described below are merely embodiments of the present application, and a person of ordinary skill in the art may derive other drawings from them without creative effort.
Fig. 1 is a flow chart of a high-dimensional data pattern classification method disclosed in an embodiment of the present application;
Fig. 2 is a flow chart of a method for determining the voting result of each category disclosed in an embodiment of the present application;
Fig. 3 is a schematic structural diagram of a high-dimensional data pattern classification apparatus disclosed in an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application are described clearly and completely below with reference to the accompanying drawings. Evidently, the described embodiments are only some, not all, of the embodiments of the present application. All other embodiments obtained by a person of ordinary skill in the art on the basis of these embodiments without creative effort fall within the scope of protection of the present application.
Refer to Fig. 1, which is a flow chart of a high-dimensional data pattern classification method disclosed in an embodiment of the present application.
As shown in Fig. 1, the method includes:
Step S100: reading the high-dimensional data to be classified;
The high-dimensional data is a three-dimensional array R composed of rows, columns and layers. The data at the position given by any row and column combination is taken as one target, the target is a Bands-dimensional column vector, and Bands is the number of layers of the three-dimensional array R.
For a given target, each element of its Bands-dimensional column vector is regarded as one attribute value of the target, so the value of Bands is determined by the number of attributes of the target.
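The data layout of step S100 can be sketched as follows. This is a minimal illustration only: the nested-list layout `R[row][col][layer]`, the helper name `iter_targets`, and the sample values are assumptions made for the example and do not come from the application.

```python
from typing import Dict, Iterator, List, Tuple

# Assumed layout: R[row][col][layer]; the innermost list is one target.
Cube = List[List[List[float]]]


def iter_targets(R: Cube) -> Iterator[Tuple[Tuple[int, int], List[float]]]:
    """Yield ((row, col), target) for every row/column position of R."""
    for r, row in enumerate(R):
        for c, vector in enumerate(row):
            # Each target is the Bands-dimensional vector stored at (r, c).
            yield (r, c), list(vector)


# Hypothetical 2 x 2 array with Bands = 3 layers (values are made up).
R: Cube = [
    [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]],
    [[0.7, 0.8, 0.9], [1.0, 1.1, 1.2]],
]
targets: Dict[Tuple[int, int], List[float]] = dict(iter_targets(R))
bands = len(R[0][0])  # number of layers, i.e. number of attributes per target
```

Each of the four row/column positions yields one target with `bands` attribute values, which is the unit that the later voting steps classify.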
Step S110: reading a training sample set S labeled with classification categories;
Specifically, the training sample set S consists of multiple training samples whose classification categories are known.
Optionally, the training sample set S may comprise training sample subsets of N categories. The training sample subset of each category is an array with Bands columns, each row of the array records one training sample of that category, and the number of rows of the array is the number of training samples of that category.
Step S120: performing pattern matching between each target and each training sample in the training sample set S according to a preset category voting rule, and determining the voting result of each category;
Specifically, each training sample is labeled with the category to which it belongs, and the categories of all training samples in the training sample set S form a category set. By pattern-matching the target against each training sample according to the preset category voting rule, the voting result of each category in the category set can be determined.
Step S130: according to the voting result of each category, determining the category with the highest voting score as the category to which the target belongs.
Optionally, the order of steps S100 and S110 may be reversed, or the two steps may be performed simultaneously; Fig. 1 illustrates only one possible order.
Next, the process of step S120, in which each target is pattern-matched against each training sample in the training sample set S according to the preset category voting rule to determine the voting result of each category, is described.
This embodiment discloses two optional modes, as follows:
First mode:
The target is a Bands-dimensional column vector and contains Bands elements. Each training sample is regarded as a Bands-dimensional row vector, which likewise contains Bands elements.
Therefore, for each element of the target, its distance to the corresponding element of each training sample in the training sample set S is calculated, the training sample with the minimum distance is determined, and one vote is cast for the category of that training sample.
The votes received by each of the N categories are then counted from the category votes of all elements of the target.
For example, suppose the target is the three-dimensional column vector $x = (x_1,\ x_2,\ x_3)^{\mathsf T}$.
The training sample set S comprises training sample subsets of 2 categories:
Training sample subset of category 1: $\begin{pmatrix} A_1 & B_1 & C_1 \\ A_2 & B_2 & C_2 \end{pmatrix}$
Training sample subset of category 2: $\begin{pmatrix} A_3 & B_3 & C_3 \end{pmatrix}$
The training sample subset of category 1 contains two training samples, $(A_1\ B_1\ C_1)$ and $(A_2\ B_2\ C_2)$.
The training sample subset of category 2 contains one training sample, $(A_3\ B_3\ C_3)$.
In the calculation, the distances between element $x_1$ of the target and $A_1$, $A_2$, $A_3$ of the three training samples are computed first; if $A_1$ is closest to $x_1$, one vote is cast for category 1, to which the training sample containing $A_1$ belongs;
Next, the distances between element $x_2$ of the target and $B_1$, $B_2$, $B_3$ of the three training samples are computed; if $B_2$ is closest to $x_2$, one vote is cast for category 1, to which the training sample containing $B_2$ belongs;
Finally, the distances between element $x_3$ of the target and $C_1$, $C_2$, $C_3$ of the three training samples are computed; if $C_3$ is closest to $x_3$, one vote is cast for category 2, to which the training sample containing $C_3$ belongs;
Counting the votes of each category shows that category 1 received two votes and category 2 received one vote.
Therefore, category 1 is determined as the category to which the target belongs.
It should be noted that if several categories tie for the highest vote count, the category that ranks first among them may be selected as the category to which the target belongs.
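The first voting mode can be sketched in a few lines. This is an illustrative sketch under stated assumptions, not the applicant's implementation: the function names, the dictionary layout of S (category label mapped to a list of samples), the use of the absolute difference as the per-element distance, and the numeric values are all hypothetical. The values are chosen so that the nearest elements are A1, B2 and C3, as in the worked example above.

```python
from collections import Counter
from typing import Dict, List, Sequence


def vote_per_element(target: Sequence[float],
                     training: Dict[int, List[Sequence[float]]]) -> Counter:
    """Each element of the target casts one vote for the category of the
    training sample whose corresponding element is nearest."""
    # Flatten in category order so that distance ties go to the earlier sample.
    samples = [(label, s) for label, subset in training.items() for s in subset]
    votes = Counter({label: 0 for label in training})
    for j, x in enumerate(target):
        # Assumption: scalar distance is the absolute difference.
        label, _ = min(samples, key=lambda ls: abs(x - ls[1][j]))
        votes[label] += 1
    return votes


def classify(votes: Counter) -> int:
    """Return the category with the most votes; ties go to the first category."""
    return max(votes, key=votes.get)


# Hypothetical data: category 1 has two samples, category 2 has one.
training = {1: [[1.1, 0.0, 0.0], [4.0, 5.2, 3.0]], 2: [[7.0, 8.0, 9.1]]}
target = [1.0, 5.0, 9.0]
votes = vote_per_element(target, training)
winner = classify(votes)
```

Here category 1 collects two votes and category 2 one, so `winner` is category 1. Both `min` and `max` return the first of several equal candidates, which matches the rule that ties go to the item that ranks first.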
Second mode: refer to Fig. 2, which is a flow chart of a method for determining the voting result of each category disclosed in an embodiment of the present application.
As shown in Fig. 2, the method includes:
Step S200: performing multi-level feature set construction on the target to obtain Bands layers of target feature sets;
Specifically, multi-level feature set construction is performed on the target according to a preset multi-level feature set construction strategy to obtain Bands layers of target feature sets, where the target feature set of the i-th layer contains C(Bands, i) elements. Here C(a, b) is the mathematical notation for the number of combinations of b elements taken from a elements.
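For reference, the combination count used here is the standard binomial coefficient. The closed form for the total number of feature elements over all Bands layers is a well-known identity added here as an observation; it is not stated in the application:

$$C(a, b) = \binom{a}{b} = \frac{a!}{b!\,(a-b)!}, \qquad \sum_{i=1}^{Bands} C(Bands, i) = 2^{Bands} - 1$$

For Bands = 3 this gives 3 + 3 + 1 = 7 elements in total. Because the total grows exponentially with Bands, constructing every layer is presumably practical only for a moderate number of attributes.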
Step S210: performing multi-level feature set construction on each training sample in the training sample set S to obtain Bands layers of training sample feature sets for each training sample;
Specifically, multi-level feature set construction is performed on each training sample in the training sample set S according to the same preset multi-level feature set construction strategy as above, to obtain Bands layers of training sample feature sets for each training sample, where the training sample feature set of the i-th layer contains C(Bands, i) elements.
Step S220: for each layer of the target in turn, calculating the distance between each element of the target feature set of that layer and the corresponding element of the training sample feature set of the same layer of each training sample, determining the training sample with the minimum distance, and casting one vote for the category of that training sample;
Specifically, in this step, if two or more training samples are at the same minimum distance from the target, the category of the training sample that ranks first among them may be taken, and one vote is cast for that category.
For example, the a-th element (1 ≤ a ≤ Bands) of the first layer of the target is compared by distance with the a-th element of the first layer of each training sample; the category X_a of the training sample with the minimum distance is selected, and one vote is cast for category X_a.
Step S230: counting the votes received by each of the N categories from the category votes of all elements of the target feature set of a layer, to obtain for that layer a category feature column vector holding the vote count of each of the N categories;
Specifically, for the category votes of all elements of the target feature set of a given layer, the votes received by each of the N categories are counted, forming a column vector whose length equals the number of categories N.
For example, suppose the number of categories N is 3. Counting the category votes of all elements of the target feature set of the first layer of the target gives the category feature column vector:
$(3,\ 4,\ 2)^{\mathsf T}$
This category feature column vector shows that the first category received 3 votes, the second category received 4 votes, and the third category received 2 votes.
Step S240: dividing the category feature column vector by the number of elements in the target feature set of that layer to obtain a normalized category feature column vector;
Step S250: computing a weighted sum of the normalized category feature column vectors of all layers using the preset per-layer weights, to obtain a total category feature column vector.
Specifically, the present application presets a weight for each layer and uses these weights to compute the weighted sum of the normalized category feature column vectors, obtaining the total category feature column vector.
On this basis, determining the category with the highest voting score as the category to which the target belongs, according to the voting result of each category, is specifically:
selecting the category with the largest value in the total category feature column vector as the category to which the target belongs.
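Steps S240 and S250 and the final selection can be sketched as below. The application does not give concrete weight values, so the weights, the per-layer vote counts, and the function name `combine_layers` are hypothetical and serve only to show the arithmetic.

```python
from typing import List, Sequence


def combine_layers(layer_votes: Sequence[Sequence[int]],
                   layer_sizes: Sequence[int],
                   weights: Sequence[float]) -> List[float]:
    """Normalize each layer's vote vector by the layer's element count
    (step S240) and add the layers up with per-layer weights (step S250)."""
    n_classes = len(layer_votes[0])
    total = [0.0] * n_classes
    for votes, size, w in zip(layer_votes, layer_sizes, weights):
        for k in range(n_classes):
            total[k] += w * votes[k] / size
    return total


# Hypothetical case: Bands = 3 layers, N = 2 categories.
layer_votes = [[2, 1], [3, 0], [0, 1]]   # vote counts per layer
layer_sizes = [3, 3, 1]                  # C(3, 1), C(3, 2), C(3, 3)
weights = [0.5, 0.25, 0.25]              # assumed preset layer weights
total = combine_layers(layer_votes, layer_sizes, weights)

# Index of the largest entry; ties go to the category listed first.
winner = max(range(len(total)), key=total.__getitem__)
```

Normalizing by the layer size keeps layers with many elements from dominating layers with few, so the relative influence of each layer is set by its weight alone. With weights that sum to 1, the entries of `total` also sum to 1.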
The method for determining category voting results provided by this embodiment constructs multi-level feature sets for both the target and the training samples, which widens the range of elements that are compared and makes the resulting classification more accurate.
Optionally, the Euclidean distance may be used when calculating the distance between each element of the target feature set of each layer of the target and the corresponding element of the training sample feature set of the same layer of each training sample. Other distance measures, such as the Mahalanobis distance, may of course also be used. In that case, the covariance matrix required by the Mahalanobis distance may be taken to be the covariance matrix of the set of training sample elements participating in the calculation.
Further, above-mentioned preset multi-level features set construction strategy can have multiple, and the present embodiment provides a kind of optional construction strategy, as follows:
It is configured to example with the multi-level features set of target to illustrate:
Target is Bands dimensional vector, and target self is defined as the target characteristic set of the 1st layer;
I element any in the target characteristic set of the 1st layer being combined, obtains i dimensional feature vector set, this i dimensional feature vector set is defined as the target characteristic set of i-th layer of target, wherein i is the integer from 2 to Bands, is listed below:
The target characteristic set of the 2nd layer comprises, the 2 dimensional feature vector set that 2 element combinations any in the target characteristic set of the 1st layer are formed, and amounts to C (Bands, 2) individual element;
The target feature set of layer 3 comprises the set of 3-dimensional feature vectors formed by combining any 3 elements of the layer-1 target feature set, C(Bands, 3) elements in total;
The target feature set of layer i comprises the set of i-dimensional feature vectors formed by combining any i elements of the layer-1 target feature set, C(Bands, i) elements in total.
Further, the multi-level feature sets of the training samples may be constructed in the same manner as those of the target, which is not repeated here.
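The construction strategy above can be sketched with the Python standard library. This is a minimal sketch, not the applicant's implementation: the function name `build_feature_sets` is hypothetical, and elements are kept in index order so that the j-th element of a layer in the target lines up with the j-th element of the same layer in a training sample.

```python
from itertools import combinations
from math import comb


def build_feature_sets(vector):
    """Return a list of layers for a Bands-dimensional vector.

    Layer i (1-based) holds every i-element combination of the vector's
    components, in index order, so it has C(Bands, i) elements.
    """
    bands = len(vector)
    return [list(combinations(vector, i)) for i in range(1, bands + 1)]


layers = build_feature_sets(["x1", "x2", "x3"])
for i, layer in enumerate(layers, start=1):
    # Each layer has exactly C(Bands, i) elements.
    assert len(layer) == comb(3, i)
    print(i, layer)
```

In this sketch the layer-1 elements are stored as 1-tuples, so every layer can be handled by the same distance routine.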
To make the above calculation process easier to understand, the present application gives a specific example:
Assume the target is: $(x_1, x_2, x_3)$
The training sample set S comprises training sample subsets of 2 categories, respectively:
The training sample subset of category 1: $\begin{pmatrix} A_1 & B_1 & C_1 \\ A_2 & B_2 & C_2 \end{pmatrix}$
The training sample subset of category 2: $(A_3, B_3, C_3)$
First, multi-level feature sets are constructed for the target, giving:
Layer-1 target feature set: $\{x_1, x_2, x_3\}$
Layer-2 target feature set: $\{(x_1, x_2), (x_1, x_3), (x_2, x_3)\}$
Layer-3 target feature set: $\{(x_1, x_2, x_3)\}$
Meanwhile, multi-level feature sets are constructed for the three training samples. Only training sample $(A_1, B_1, C_1)$ is illustrated here:
Layer-1 training sample feature set: $\{A_1, B_1, C_1\}$
Layer-2 training sample feature set: $\{(A_1, B_1), (A_1, C_1), (B_1, C_1)\}$
Layer-3 training sample feature set: $\{(A_1, B_1, C_1)\}$
Next, for the i-th element (i ∈ [1, C(3, k)]) of the layer-k target feature set of the target (k ∈ [1, 3]), the distance to the corresponding element in the layer-k training sample feature set of each training sample is calculated, the category of the training sample with the minimum distance is determined, and one vote is recorded for that category;
Assume that the category feature column vector formed by the category voting results of the elements in the layer-1 target feature set of the target is:
$(2, 0, 1)^T$
That is, category 1 receives 2 votes, category 2 receives 0 votes, and category 3 receives 1 vote;
The category feature column vector formed by the category voting results of the elements in the layer-2 target feature set of the target is:
$(1, 1, 1)^T$
That is, category 1 receives 1 vote, category 2 receives 1 vote, and category 3 receives 1 vote;
The category feature column vector formed by the category voting results of the elements in the layer-3 target feature set of the target is:
$(1, 0, 0)^T$
That is, category 1 receives 1 vote, category 2 receives 0 votes, and category 3 receives 0 votes.
The category feature column vectors obtained for the three layers are normalized separately. After normalization they are, in order:
$(2/3, 0, 1/3)^T$, $(1/3, 1/3, 1/3)^T$, $(1, 0, 0)^T$
The preset weights of the layers are 0.5, 0.3 and 0.2 respectively. The normalized category feature column vectors are then weighted and summed to obtain the total category feature column vector:
$0.5 \cdot (2/3, 0, 1/3)^T + 0.3 \cdot (1/3, 1/3, 1/3)^T + 0.2 \cdot (1, 0, 0)^T = (19/30, 3/30, 8/30)^T$
It follows that category 1 has the largest vote value, so category 1 is determined as the category of the target.
The high-dimensional data pattern classification apparatus provided by the embodiments of the present application is described below. The apparatus described below and the high-dimensional data pattern classification method described above may be referred to in correspondence with each other.
Referring to Fig. 3, Fig. 3 is a schematic structural diagram of a high-dimensional data pattern classification apparatus disclosed in the present application.
As shown in Fig. 3, the apparatus includes:
A high-dimensional data reading unit 31, configured to read the high-dimensional data to be classified, where the high-dimensional data is a three-dimensional array R composed of rows, columns and layers, the data set at the position corresponding to any combination of row and column serves as one target, each target is a Bands-dimensional vector, and Bands is the number of layers of the three-dimensional array R;
A training sample set reading unit 32, configured to read a training sample set S labeled with classification categories; the training sample set S comprises training sample subsets of N categories, the training sample subset of each category is an array with Bands columns, and each row of the array records one training sample of that category;
A pattern matching unit 33, configured to perform pattern matching between each target and each training sample in the training sample set S according to a preset category voting rule, and to determine the voting result of each category;
A category determination unit 34, configured to determine, according to the voting results of the categories, the category with the highest vote score as the category to which the target belongs.
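A minimal sketch of how targets might be extracted from the array read by the high-dimensional data reading unit. It assumes that the three-dimensional array R is held as nested Python lists indexed `R[layer][row][col]`. This storage layout and the function name `iter_targets` are assumptions made for illustration and are not specified in the application.

```python
def iter_targets(R):
    """Yield (row, col, target) for a cube stored as R[layer][row][col].

    Each target is the Bands-dimensional vector of values found at one
    (row, col) position across all layers.
    """
    bands = len(R)
    rows = len(R[0])
    cols = len(R[0][0])
    for r in range(rows):
        for c in range(cols):
            yield r, c, [R[b][r][c] for b in range(bands)]


# A 2-layer cube with 2 rows and 2 columns (made-up values).
R = [
    [[1, 2], [3, 4]],
    [[5, 6], [7, 8]],
]
targets = list(iter_targets(R))
```

Here `targets[0]` is the target at row 0, column 0, with one value per layer.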
The present application describes two working mechanisms of the pattern matching unit, as follows:
In the first mechanism, the pattern matching unit may include:
A first pattern matching subunit, configured to calculate the distance between each element in the target and the corresponding element in each training sample in the training sample set S, determine the training sample with the minimum distance, and record one vote for the category corresponding to that training sample; and, according to the category voting results of all elements in the target, count the number of votes received by each of the N categories.
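The first mechanism can be sketched as follows. This is an illustrative sketch only: the function name `vote_by_element`, the dictionary layout of the training sample set, and the sample values are all hypothetical. The distance between scalar elements is taken as the absolute difference, and a tie is resolved in favor of the first training sample encountered, which the application does not specify.

```python
def vote_by_element(target, training_set):
    """Count votes per category using element-wise nearest-sample voting.

    training_set maps a category label to a list of training samples,
    each sample being a sequence with the same length as target.
    """
    votes = {label: 0 for label in training_set}
    for j, x in enumerate(target):
        best_label, best_dist = None, None
        for label, samples in training_set.items():
            for s in samples:
                d = abs(x - s[j])  # distance between corresponding elements
                if best_dist is None or d < best_dist:
                    best_label, best_dist = label, d
        votes[best_label] += 1
    return votes


# Made-up training data: two samples in category 1, one in category 2.
S = {1: [[1.0, 2.0, 3.0], [1.5, 2.5, 3.5]], 2: [[8.0, 9.0, 2.9]]}
votes = vote_by_element([1.2, 2.1, 2.8], S)
best_category = max(votes, key=votes.get)
```

Each element of the target casts exactly one vote, so the vote counts always sum to Bands.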
In the second mechanism, the pattern matching unit may include:
A second pattern matching subunit, configured to construct multi-level feature sets for the target according to a preset multi-level feature set construction strategy, to obtain the target feature sets of the Bands layers of the target, where the target feature set of layer i comprises C(Bands, i) elements;
A third pattern matching subunit, configured to construct multi-level feature sets for each training sample in the training sample set S according to the preset multi-level feature set construction strategy, to obtain the training sample feature sets of the Bands layers of each training sample, where the training sample feature set of layer i comprises C(Bands, i) elements;
A fourth pattern matching subunit, configured to calculate, layer by layer, the distance between each element in the target feature set of each layer of the target and the corresponding element in the training sample feature set of the corresponding layer of each training sample, determine the training sample with the minimum distance, and record one vote for the category corresponding to that training sample;
A fifth pattern matching subunit, configured to count, according to the category voting results of all elements in the target feature set of a layer, the number of votes received by each of the N categories, to obtain for that layer a category feature column vector representing the number of votes of each of the N categories;
A sixth pattern matching subunit, configured to divide the category feature column vector by the number of elements in the target feature set of that layer, to obtain a normalized category feature column vector;
A seventh pattern matching subunit, configured to weight and sum the normalized category feature column vectors of all layers according to the weight set for each layer, to obtain a total category feature column vector.
On this basis, the category determination unit includes:
A first category determination subunit, configured to select the category with the largest value in the total category feature column vector as the category to which the target belongs.
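The second mechanism, from feature set construction to the final category, can be sketched end to end in a few lines. This is a sketch under stated assumptions rather than the applicant's implementation: the function name `classify`, the dictionary layout of the training sample set, and the sample values are hypothetical, the Euclidean distance is used, ties go to the first training sample encountered, and feature sets are formed on the fly from index combinations instead of being stored.

```python
from itertools import combinations
import math


def classify(target, training_set, weights):
    """Multi-level voting classifier.

    training_set maps a category label to a list of training samples.
    weights holds one preset weight per layer (len(weights) == len(target)).
    Returns (winning label, total score per category).
    """
    bands = len(target)
    labels = list(training_set)
    total = {label: 0.0 for label in labels}
    for k in range(1, bands + 1):
        # Layer k: every k-element combination of component indices.
        index_sets = list(combinations(range(bands), k))
        votes = {label: 0 for label in labels}
        for idx in index_sets:
            t = [target[j] for j in idx]
            best_label, best_dist = None, math.inf
            for label in labels:
                for s in training_set[label]:
                    d = math.dist(t, [s[j] for j in idx])
                    if d < best_dist:
                        best_label, best_dist = label, d
            votes[best_label] += 1
        # Normalize by the layer's element count, then apply the layer weight.
        for label in labels:
            total[label] += weights[k - 1] * votes[label] / len(index_sets)
    return max(total, key=total.get), total


# Made-up training data and layer weights.
S = {1: [[1.0, 2.0, 3.0], [1.5, 2.5, 3.5]], 2: [[8.0, 9.0, 2.9]]}
label, scores = classify([1.2, 2.1, 2.8], S, [0.5, 0.3, 0.2])
```

The number of combinations grows as 2^Bands - 1 in total, so a practical implementation for large Bands would likely need to restrict the layers it evaluates. That restriction is outside what this sketch shows.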
Further optionally, when calculating the distance between each element in the target feature set of each layer of the target and the corresponding element in the training sample feature set of the corresponding layer of each training sample, the fourth pattern matching subunit may use the Euclidean distance.
Further, the present application discloses an optional manner in which the second pattern matching subunit constructs the multi-level feature sets. The second pattern matching subunit may include:
A layer-1 target feature set determination unit, configured to define the target itself as the target feature set of layer 1;
A layer-i target feature set determination unit, configured to combine any i elements in the layer-1 target feature set to obtain a set of i-dimensional feature vectors, and to define this set as the target feature set of layer i of the target, where i is an integer from 2 to Bands.
Finally, it should also be noted that, in this document, relational terms such as first and second are used only to distinguish one entity or operation from another, and do not necessarily require or imply any actual relationship or order between those entities or operations. Moreover, the terms "include", "comprise", or any other variant thereof are intended to cover a non-exclusive inclusion, so that a process, method, article or device that includes a series of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article or device. Without further limitation, an element defined by the statement "including a ..." does not exclude the existence of other identical elements in the process, method, article or device that includes the element.
The embodiments in this specification are described in a progressive manner. Each embodiment focuses on its differences from the other embodiments, and for the same or similar parts the embodiments may be referred to one another.
The above description of the disclosed embodiments enables those skilled in the art to implement or use the present application. Various modifications to these embodiments will be apparent to those skilled in the art, and the general principles defined herein may be implemented in other embodiments without departing from the spirit or scope of the present application. Therefore, the present application is not limited to the embodiments shown herein, but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (10)

1. A high-dimensional data pattern classification method, characterized by comprising:
Reading the high-dimensional data to be classified, where the high-dimensional data is a three-dimensional array R composed of rows, columns and layers, the data set at the position corresponding to any combination of row and column serves as one target, each target is a Bands-dimensional vector, and Bands is the number of layers of the three-dimensional array R;
Reading a training sample set S labeled with classification categories; the training sample set S comprises training sample subsets of N categories, the training sample subset of each category is an array with Bands columns, and each row of the array records one training sample of that category;
Performing pattern matching between each target and each training sample in the training sample set S according to a preset category voting rule, and determining the voting result of each category;
Determining, according to the voting results of the categories, the category with the highest vote score as the category to which the target belongs.
2. The method according to claim 1, characterized in that performing pattern matching between each target and each training sample in the training sample set S according to the preset category voting rule, and determining the voting result of each category, comprises:
Calculating the distance between each element in the target and the corresponding element in each training sample in the training sample set S, determining the training sample with the minimum distance, and recording one vote for the category corresponding to that training sample;
Counting, according to the category voting results of all elements in the target, the number of votes received by each of the N categories.
3. The method according to claim 1, characterized in that performing pattern matching between each target and each training sample in the training sample set S according to the preset category voting rule, and determining the voting result of each category, comprises:
Constructing multi-level feature sets for the target according to a preset multi-level feature set construction strategy, to obtain the target feature sets of the Bands layers of the target, where the target feature set of layer i comprises C(Bands, i) elements;
Constructing multi-level feature sets for each training sample in the training sample set S according to the preset multi-level feature set construction strategy, to obtain the training sample feature sets of the Bands layers of each training sample, where the training sample feature set of layer i comprises C(Bands, i) elements;
Calculating, layer by layer, the distance between each element in the target feature set of each layer of the target and the corresponding element in the training sample feature set of the corresponding layer of each training sample, determining the training sample with the minimum distance, and recording one vote for the category corresponding to that training sample;
Counting, according to the category voting results of all elements in the target feature set of a layer, the number of votes received by each of the N categories, to obtain for that layer a category feature column vector representing the number of votes of each of the N categories;
Dividing the category feature column vector by the number of elements in the target feature set of that layer, to obtain a normalized category feature column vector;
Weighting and summing the normalized category feature column vectors of all layers according to the weight set for each layer, to obtain a total category feature column vector;
In this case, determining, according to the voting results of the categories, the category with the highest vote score as the category to which the target belongs comprises:
Selecting the category with the largest value in the total category feature column vector as the category to which the target belongs.
4. The method according to claim 3, characterized in that calculating, layer by layer, the distance between each element in the target feature set of each layer of the target and the corresponding element in the training sample feature set of the corresponding layer of each training sample comprises:
Calculating, layer by layer, the Euclidean distance between each element in the target feature set of each layer of the target and the corresponding element in the training sample feature set of the corresponding layer of each training sample.
5. The method according to claim 3, characterized in that constructing multi-level feature sets for the target according to the preset multi-level feature set construction strategy, to obtain the target feature sets of the Bands layers of the target, comprises:
Defining the target itself as the target feature set of layer 1;
Combining any i elements in the layer-1 target feature set to obtain a set of i-dimensional feature vectors, and defining this set as the target feature set of layer i of the target, where i is an integer from 2 to Bands.
6. A high-dimensional data pattern classification apparatus, characterized by comprising:
A high-dimensional data reading unit, configured to read the high-dimensional data to be classified, where the high-dimensional data is a three-dimensional array R composed of rows, columns and layers, the data set at the position corresponding to any combination of row and column serves as one target, each target is a Bands-dimensional vector, and Bands is the number of layers of the three-dimensional array R;
A training sample set reading unit, configured to read a training sample set S labeled with classification categories; the training sample set S comprises training sample subsets of N categories, the training sample subset of each category is an array with Bands columns, and each row of the array records one training sample of that category;
A pattern matching unit, configured to perform pattern matching between each target and each training sample in the training sample set S according to a preset category voting rule, and to determine the voting result of each category;
A category determination unit, configured to determine, according to the voting results of the categories, the category with the highest vote score as the category to which the target belongs.
7. The apparatus according to claim 6, characterized in that the pattern matching unit comprises:
A first pattern matching subunit, configured to calculate the distance between each element in the target and the corresponding element in each training sample in the training sample set S, determine the training sample with the minimum distance, and record one vote for the category corresponding to that training sample; and, according to the category voting results of all elements in the target, count the number of votes received by each of the N categories.
8. The apparatus according to claim 6, characterized in that the pattern matching unit comprises:
A second pattern matching subunit, configured to construct multi-level feature sets for the target according to a preset multi-level feature set construction strategy, to obtain the target feature sets of the Bands layers of the target, where the target feature set of layer i comprises C(Bands, i) elements;
A third pattern matching subunit, configured to construct multi-level feature sets for each training sample in the training sample set S according to the preset multi-level feature set construction strategy, to obtain the training sample feature sets of the Bands layers of each training sample, where the training sample feature set of layer i comprises C(Bands, i) elements;
A fourth pattern matching subunit, configured to calculate, layer by layer, the distance between each element in the target feature set of each layer of the target and the corresponding element in the training sample feature set of the corresponding layer of each training sample, determine the training sample with the minimum distance, and record one vote for the category corresponding to that training sample;
A fifth pattern matching subunit, configured to count, according to the category voting results of all elements in the target feature set of a layer, the number of votes received by each of the N categories, to obtain for that layer a category feature column vector representing the number of votes of each of the N categories;
A sixth pattern matching subunit, configured to divide the category feature column vector by the number of elements in the target feature set of that layer, to obtain a normalized category feature column vector;
A seventh pattern matching subunit, configured to weight and sum the normalized category feature column vectors of all layers according to the weight set for each layer, to obtain a total category feature column vector;
In this case, the category determination unit comprises:
A first category determination subunit, configured to select the category with the largest value in the total category feature column vector as the category to which the target belongs.
9. The apparatus according to claim 8, characterized in that the fourth pattern matching subunit uses the Euclidean distance when calculating the distance between each element in the target feature set of each layer of the target and the corresponding element in the training sample feature set of the corresponding layer of each training sample.
10. The apparatus according to claim 8, characterized in that the second pattern matching subunit comprises:
A layer-1 target feature set determination unit, configured to define the target itself as the target feature set of layer 1;
A layer-i target feature set determination unit, configured to combine any i elements in the layer-1 target feature set to obtain a set of i-dimensional feature vectors, and to define this set as the target feature set of layer i of the target, where i is an integer from 2 to Bands.
CN201610059218.3A 2016-01-28 2016-01-28 High-dimensional data mode classification method and apparatus Pending CN105740896A (en)

Publications (1)

Publication Number Publication Date
CN105740896A (en) 2016-07-06

Family

ID=56246832

Country Status (1)

Country Link
CN (1) CN105740896A (en)

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20160706