CN109948705A - Rare class detection method and device based on k-nearest-neighbor graph - Google Patents


Info

Publication number
CN109948705A
Authority
CN
China
Prior art keywords
node
neighbour
data
variation coefficient
data sample
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910213519.0A
Other languages
Chinese (zh)
Inventor
李易 (Li Yi)
黄浩 (Huang Hao)
李宗鹏 (Li Zongpeng)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Wuhan University WHU
Original Assignee
Wuhan University WHU
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Wuhan University WHU
Priority to CN201910213519.0A
Publication of CN109948705A
Current legal status: Pending


Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a rare class detection method based on a k-nearest-neighbor (kNN) graph. First, a kNN graph is constructed for a given unlabeled data set S, with the value of k chosen automatically by the algorithm. Then, based on the constructed kNN graph, a variation coefficient Vc is defined and its value is computed for every node in the data set. The node x with the largest variation coefficient is selected, the labeler is queried for its class label y, and x and y are added to the set I of selected data samples and the set L of their true class labels, respectively. The invention performs rare class detection by detecting abrupt local changes in the distribution of the data samples; compared with other rare class detection methods that require no prior knowledge, the resulting KRED method is more efficient and has lower computational cost. In addition, choosing k automatically effectively improves the efficiency with which each class in the data set is discovered and significantly reduces the number of queries needed to discover all classes.

Description

Rare class detection method and device based on k-nearest-neighbor graph
Technical field
The present invention relates to the field of data mining technology, and in particular to a rare class detection method and device based on a k-nearest-neighbor graph.
Background art
Rare class detection is an important task in data mining. It aims to find the rare classes in an unlabeled data set: although these classes contain few data samples, they often have greater practical significance than the major classes that make up most of the data set, and are therefore more worthy of study. For example, massive financial transaction records may hide a small number of illegal transactions that exploit loopholes in the financial system or use fraudulent means; among massive volumes of normal network access, there may be a small amount of malicious network behavior. Beyond such practical problems, rare class detection can also obtain a small number of labeled data samples from a given data set without class labels, which can then be used to build a classifier or to support semi-supervised learning methods such as co-training and active learning. Rare class detection therefore has broad application scenarios and high research value in both practice and theory.
In the course of implementing the present invention, the inventors found that the prior art has at least the following technical problem:
Because rare class data samples are very few and are usually hidden within the data distribution of the major classes, traditional clustering and classification techniques often struggle to detect rare classes quickly and accurately. However, rare classes usually have characteristics that aid identification; for example, they form compact clusters whose data distribution differs markedly from that of the surrounding region. In the prior art, depending on whether prior knowledge (such as the number of classes or the proportion of the data set occupied by each class) is required, rare class detection algorithms fall into two categories: algorithms based on prior knowledge and algorithms without prior knowledge.
Rare class detection algorithms based on prior knowledge assume that the user has prior knowledge of the imbalanced data set, such as the number of classes in the data set or the approximate proportion of each class's data samples. Using this prior knowledge, they help the user find at least one data sample from each rare class in the data set, thereby discovering the rare classes. Typical prior-knowledge-based methods include model-based methods, boundary-degree-based methods, and density-difference-based methods. However, these methods either must assume that the rare classes are linearly separable from the major classes, or require more queries, which lowers their efficiency.
In practical applications it is more common for the user to have no prior knowledge of the data set, so rare class algorithms without prior knowledge have wider application scenarios. Existing rare class detection techniques without prior knowledge mainly include outlier-degree-based methods and density-change-rate-based methods, but these methods also tend to have relatively high time complexity.
It follows that the methods in the prior art suffer from the technical problem of high time complexity.
Summary of the invention
In view of this, the present invention provides a rare class detection method and device based on a k-nearest-neighbor graph, which solve, or at least partially solve, the technical problem of high time complexity in prior-art methods.
A first aspect of the present invention provides a rare class detection method based on a k-nearest-neighbor graph, comprising:
constructing a kNN graph for a given unlabeled data set S;
based on the constructed kNN graph, defining the variation coefficient Vc of a node p as Vc(p) = maxV(p) × std(EL(p)),
where kNN(p) denotes the set of the k nearest neighbor nodes of p, EL(p) denotes the set of edge lengths of node p, and deg(p) is the in-degree of point p in the kNN graph; and computing, according to this definition, the variation coefficient value of each node in the data set S;
finding, among all nodes, the node x with the largest variation coefficient, obtaining its class label y, and adding x and y to the set I of previously selected data samples and the set L of their true class labels, respectively.
In one embodiment, constructing the kNN graph for the given unlabeled data set S specifically comprises:
computing the value of k with a preset clustering algorithm;
constructing the kNN graph of the data set S based on the computed k, where the kNN graph G = (V, E) is a weighted directed graph: each node p ∈ V represents a data sample, each edge e ∈ E indicates that its two endpoints are neighbors, the weight of an edge is the Euclidean distance between the two data samples, and each node has k directed edges pointing to its k nearest nodes.
In one embodiment, after the node x with the largest variation coefficient is found among all nodes, the method further comprises:
determining whether each node in the data set S is a neighbor of node x, and if so, setting the variation coefficient of that node to -∞.
In one embodiment, the method further comprises:
determining whether the number of available queries is 0; if it is not 0, continuing with the step of finding the node x with the largest variation coefficient among all nodes, obtaining its class label y, and adding x and y to the set I of previously selected data samples and the set L of their true class labels, respectively; otherwise, returning the selected data sample set I and the true class label set L.
Based on the same inventive concept, a second aspect of the present invention provides a rare class detection device based on a k-nearest-neighbor graph, comprising:
a kNN graph construction module, configured to construct a kNN graph for a given unlabeled data set S;
a variation coefficient definition module, configured to define, based on the constructed kNN graph, the variation coefficient Vc of a node p as Vc(p) = maxV(p) × std(EL(p)),
where kNN(p) denotes the set of the k nearest neighbor nodes of p, EL(p) denotes the set of edge lengths of node p, and deg(p) is the in-degree of point p in the kNN graph; and to compute, according to this definition, the variation coefficient value of each node in the data set S;
a rare class acquisition module, configured to find, among all nodes, the node x with the largest variation coefficient, obtain its class label y, and add x and y to the set I of previously selected data samples and the set L of their true class labels, respectively.
In one embodiment, the kNN graph construction module is specifically configured to:
compute the value of k with a preset clustering algorithm;
construct the kNN graph of the data set S based on the computed k, where the kNN graph G = (V, E) is a weighted directed graph: each node p ∈ V represents a data sample, each edge e ∈ E indicates that its two endpoints are neighbors, the weight of an edge is the Euclidean distance between the two data samples, and each node has k directed edges pointing to its k nearest nodes.
In one embodiment, the device further comprises a first judgment module configured, after the node x with the largest variation coefficient is found among all nodes, to:
determine whether each node in the data set S is a neighbor of node x, and if so, set the variation coefficient of that node to -∞.
In one embodiment, the device further comprises a second judgment module configured to:
determine whether the number of available queries is 0; if it is not 0, continue with the step of finding the node x with the largest variation coefficient among all nodes, obtaining its class label y, and adding x and y to the set I of previously selected data samples and the set L of their true class labels, respectively; otherwise, return the selected data sample set I and the true class label set L.
Based on the same inventive concept, a third aspect of the present invention provides a computer-readable storage medium storing a computer program which, when executed, implements the method of the first aspect.
Based on the same inventive concept, a fourth aspect of the present invention provides a computer device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor implements the method of the first aspect when executing the program.
The one or more technical solutions in the embodiments of the present application have at least one or more of the following technical effects:
The invention discloses a rare class detection method based on a kNN graph. First, a kNN graph is constructed for the given unlabeled data set S. Then, based on the constructed kNN graph, the variation coefficient Vc is defined and its value is computed for each node in the data set. Next, the node x with the largest variation coefficient is found among all nodes, the labeler is queried for its class label y, and x and y are added to the set I of selected data samples and the set L of their true class labels, respectively. Because the invention performs rare class detection by detecting abrupt local changes in the distribution of the data samples, it is more efficient and has lower computational cost than existing rare class detection methods that require no prior knowledge.
Furthermore, choosing k automatically with a preset algorithm effectively improves the efficiency with which each class in the data set is discovered, and significantly reduces the number of queries needed to discover all classes.
Furthermore, to avoid repeatedly selecting data samples from the same region, the variation coefficient Vc of every data point that is a neighbor of x is set to -∞. While the query budget is not exhausted, the true class label of the remaining data sample with the largest Vc value can be queried; otherwise the loop ends, the algorithm stops, and the sets I and L are returned. This further improves the efficiency of the algorithm.
Brief description of the drawings
To explain the technical solutions in the embodiments of the present invention or in the prior art more clearly, the drawings needed to describe the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present invention; a person of ordinary skill in the art can derive other drawings from them without creative effort.
Fig. 1 is a flowchart of the rare class detection method based on a kNN graph in an embodiment of the present invention;
Fig. 2 is a flowchart of the detection method, and of the computation of k, in a specific embodiment;
Fig. 3 is a graph of the number of queries versus the percentage of all classes that have been discovered, in a specific example of the present invention;
Fig. 4 is a structural block diagram of the rare class detection device based on a kNN graph in an embodiment of the present invention;
Fig. 5 is a structural block diagram of a computer-readable storage medium in an embodiment of the present invention;
Fig. 6 is a structural diagram of a computer device in an embodiment of the present invention.
Detailed description of the embodiments
The object of the present invention is to provide a rare class detection method and device based on a kNN graph, so as to address the technical problem of high time complexity in prior-art methods.
To solve the above technical problem, the main idea of the invention is as follows:
For a given unlabeled data set S, first construct the kNN graph of S; then, based on the kNN graph, compute the variation coefficient Vc of each node in S; select the point with the largest variation coefficient for an expert to label; and find the rare classes according to the labeling results. The method of the invention outperforms existing algorithms in running time and also has an advantage in rare class detection accuracy.
To make the objects, technical solutions, and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on these embodiments without creative effort fall within the protection scope of the present invention.
Embodiment One
This embodiment provides a rare class detection method based on a kNN graph. Referring to Fig. 1, the method comprises:
Step S1: construct a kNN graph for the given unlabeled data set S.
Specifically, the given unlabeled data set S is an existing collection of data samples that have not yet been labeled, and the k of the kNN graph can be computed by a preset algorithm.
In one embodiment, constructing the kNN graph for the given unlabeled data set S specifically comprises:
computing the value of k with a preset clustering algorithm;
constructing the kNN graph of the data set S based on the computed k, where the kNN graph G = (V, E) is a weighted directed graph: each node p ∈ V represents a data sample, each edge e ∈ E indicates that its two endpoints are neighbors, the weight of an edge is the Euclidean distance between the two data samples, and each node has k directed edges pointing to its k nearest nodes.
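As an illustration of the graph just described, here is a minimal brute-force sketch in Python with NumPy. The function name `build_knn_graph` and its return layout are hypothetical, and ties between equal distances are broken by node index (an assumption; the text does not say how ties are handled). It computes all pairwise distances, so it is only meant for small data sets.

```python
import numpy as np

def build_knn_graph(X, k):
    """Build the weighted directed kNN graph G = (V, E).

    Returns (nbrs, dist, edges): nbrs[p] holds the indices of kNN(p),
    dist is the n x n Euclidean distance matrix, and edges lists every
    directed edge p -> q as (p, q, weight). Brute force, O(n^2) memory.
    """
    X = np.asarray(X, dtype=float)
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))   # pairwise Euclidean distances
    masked = dist.copy()
    np.fill_diagonal(masked, np.inf)           # a node is not its own neighbor
    # k smallest distances per row; the stable sort breaks ties by index (assumption)
    nbrs = np.argsort(masked, axis=1, kind="stable")[:, :k]
    # each node gets exactly k outgoing edges, weighted by Euclidean distance
    edges = [(p, int(q), float(dist[p, q]))
             for p in range(len(X)) for q in nbrs[p]]
    return nbrs, dist, edges
```

For three points on a line at 0, 1 and 3 with k = 1, node 0 and node 1 point at each other and node 2 points at node 1, so node 1 has in-degree 2.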
In a specific implementation, k can be computed automatically by the following algorithm:
(1) compute the covariance matrix cov of the given data set S;
(2) compute the set of eigenvalues eig of cov;
(3) let K = 2 and apply K-means clustering to eig, automatically dividing the eigenvalues into a larger part and a smaller part;
(4) take the part of the clustering result with the larger eigenvalues and count the number c of eigenvalues in it;
(5) return k = 2c.
Step S2: based on the constructed kNN graph, define the variation coefficient Vc of node p as Vc(p) = maxV(p) × std(EL(p)),
where kNN(p) denotes the set of the k nearest neighbor nodes of p; EL(p) denotes the set of edge lengths of node p, i.e., the set of lengths of all edges that have node p as an endpoint; std denotes the standard deviation; and deg(p) is the in-degree of point p in the kNN graph. Based on this definition, a variation coefficient is computed for each point in the data set.
Specifically, once the variation coefficient Vc has been defined, its value Vc(s) can be computed according to the definition for any data object s (i.e., any node) in the data set S.
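The text above does not give the formula for maxV(p), so the Python sketch below does not guess it. It takes maxV as a caller-supplied function `max_v(p, nbrs, indeg)`, which is a hypothetical placeholder and not the patent's definition. The sketch computes only the parts that are stated: the in-degree deg(p), the edge-length set EL(p), and std(EL(p)). EL(p) is taken to contain every directed edge with p as tail or head, so mutual neighbors contribute two entries. It also assumes the population standard deviation (NumPy's default), since the text does not say which form of std is meant. The inputs `nbrs` (an n x k neighbor index array) and `dist` (an n x n distance matrix) are assumed layouts.

```python
import numpy as np

def edge_lengths(nbrs, dist):
    """EL(p): lengths of all kNN-graph edges that have p as an endpoint
    (p's k outgoing edges plus every incoming edge q -> p)."""
    n = len(nbrs)
    el = [[] for _ in range(n)]
    for p in range(n):
        for q in nbrs[p]:
            w = float(dist[p, q])
            el[p].append(w)        # outgoing edge p -> q
            el[int(q)].append(w)   # the same edge, seen from its head q
    return el

def variation_coefficients(nbrs, dist, max_v):
    """Vc(p) = maxV(p) * std(EL(p)); max_v is a hypothetical callable
    standing in for maxV(p), whose definition is not given here."""
    n = len(nbrs)
    indeg = np.bincount(np.asarray(nbrs).ravel(), minlength=n)  # deg(p)
    el = edge_lengths(nbrs, dist)
    return np.array([max_v(p, nbrs, indeg) * np.std(el[p]) for p in range(n)])
```

On the three collinear points 0, 1 and 3 with k = 1, EL of the middle node is {1, 1, 2}, the only set with nonzero spread, so that node receives the only nonzero Vc regardless of which positive maxV is plugged in.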
Step S3: find the node x with the largest variation coefficient among all nodes, obtain its class label y, and add x and y to the set I of previously selected data samples and the set L of their true class labels, respectively.
Specifically, based on the results computed in step S2, the node x with the largest variation coefficient is found and taken as the target node, and the labeler (for example, an expert) is queried for its class. The data sample set I and the true class label set L are prepared in advance. After the node with the largest variation coefficient has been found and its class obtained, the node and its class are added to I and L, respectively.
In one embodiment, to avoid repeatedly selecting data samples from the same region, after the node x with the largest variation coefficient is found among all nodes, the method further comprises:
determining whether each node in the data set S is a neighbor of node x, and if so, setting the variation coefficient of that node to -∞.
Specifically, to improve the accuracy of the algorithm, any node in the data set that is a neighbor of x has its Vc value set to negative infinity, which keeps the neighbors of x from being selected in later rounds.
In one embodiment, the method further comprises:
determining whether the number of available queries is 0; if it is not 0, continuing with the step of finding the node x with the largest variation coefficient among all nodes, obtaining its class label y, and adding x and y to the set I of previously selected data samples and the set L of their true class labels, respectively; otherwise, returning the selected data sample set I and the true class label set L.
Referring to Fig. 2, which is a flowchart of the detection method in a specific embodiment, the method comprises:
Step S101: construct the kNN graph of the given data set S, and compute the variation coefficient of every node in S based on the kNN graph;
Step S102: determine whether the number of available queries is greater than 0; if so, go to step S103, otherwise go to step S106;
Step S103: select the node x with the largest variation coefficient and query its class y;
Step S104: add x and y to the selected data sample set I and the true class label set L, respectively;
Step S105: set the variation coefficient of each neighbor of node x to -∞, and return to step S102.
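The query loop of steps S102 to S105 can be sketched in Python as follows; `query_loop` and `oracle` are hypothetical names, with `oracle(x)` standing in for the human labeler. Three details are assumptions that the text does not settle. A "neighbor" of x is any node joined to x by an edge in either direction. Node x itself is also set to -∞ so that it cannot be selected twice. The loop stops early if every node has already been suppressed.

```python
import numpy as np

def query_loop(vc, nbrs, budget, oracle):
    """Greedy rare-class querying over precomputed variation coefficients.

    vc: Vc value per node; nbrs: n x k array of kNN indices;
    budget: number of available queries; oracle(x) returns x's class label.
    Returns (I, L): queried node indices and their true class labels.
    """
    vc = np.array(vc, dtype=float)   # work on a copy; the caller's Vc is untouched
    nbrs = np.asarray(nbrs)
    I, L = [], []
    while budget > 0:                          # S102: queries left?
        x = int(np.argmax(vc))                 # S103: largest Vc
        if np.isneginf(vc[x]):                 # assumption: nothing left to query
            break
        I.append(x)                            # S104: record sample ...
        L.append(oracle(x))                    # ... and its true label
        budget -= 1
        # S105: suppress x's out-neighbors, in-neighbors, and x itself
        incoming = np.nonzero((nbrs == x).any(axis=1))[0]
        vc[nbrs[x]] = -np.inf
        vc[incoming] = -np.inf
        vc[x] = -np.inf
    return I, L
```

For example, with Vc values [0.5, 3.0, 1.0, 2.0] and neighbor lists [[2], [2], [1], [0]], the loop queries node 1 first, which suppresses node 2, and then queries node 3, which suppresses node 0. After those two queries no candidates remain, even if budget is left.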
This embodiment uses four real data sets from the UCI repository (http://archive.ics.uci.edu/ml/): Glass, Ecoli, Yeast, and Abalone. Their attributes are shown in Table 1.
Data set    Size (samples)    Dimensionality
Glass       214               9
Ecoli       336               7
Yeast       1481              8
Abalone     4177              7
Table 1. Attributes of the real data sets
Fig. 3 plots the number of queries against the percentage of all classes that have been discovered, for a specific example of the present invention. It shows that, compared with conventional methods, the KRED method of the invention is more efficient and has lower computational cost. Choosing k automatically also effectively improves the efficiency with which each class in the data set is discovered and significantly reduces the number of queries needed to discover all classes. The method also has an advantage in rare class detection accuracy.
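The quantity plotted in Fig. 3 can be computed from the label sequence L returned by the query loop, provided the total number of classes is known for evaluation. The Python sketch below uses a hypothetical function, `discovery_curve`, to show that bookkeeping. It does not reproduce the reported results.

```python
def discovery_curve(queried_labels, n_classes):
    """Fraction of all classes discovered after each query.

    queried_labels: labels in the order they were obtained (the set L).
    n_classes: ground-truth number of classes, known only for evaluation.
    """
    seen = set()
    curve = []
    for y in queried_labels:
        seen.add(y)                          # a new class counts once
        curve.append(len(seen) / n_classes)
    return curve
```

For instance, the query sequence a, a, b, c on a four-class data set gives the curve 0.25, 0.25, 0.5, 0.75. A method that discovers classes faster reaches 1.0 in fewer queries.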
Based on the same inventive concept, the present invention also provides a device corresponding to the rare class detection method based on a kNN graph of Embodiment One; see Embodiment Two for details.
Embodiment Two
This embodiment provides a rare class detection device based on a kNN graph. Referring to Fig. 4, the device comprises:
a kNN graph construction module, configured to construct a kNN graph for a given unlabeled data set S;
a variation coefficient definition module, configured to define, based on the constructed kNN graph, the variation coefficient Vc of a node p as Vc(p) = maxV(p) × std(EL(p)),
where kNN(p) denotes the set of the k nearest neighbor nodes of p, EL(p) denotes the set of edge lengths of node p, and deg(p) is the in-degree of point p in the kNN graph; and to compute, according to this definition, the variation coefficient value of each node in the data set S;
a rare class acquisition module, configured to find, among all nodes, the node x with the largest variation coefficient, obtain its class label y, and add x and y to the set I of previously selected data samples and the set L of their true class labels, respectively.
In one embodiment, the kNN graph construction module 201 is specifically configured to:
compute the value of k with a preset clustering algorithm;
construct the kNN graph of the data set S based on the computed k, where the kNN graph G = (V, E) is a weighted directed graph: each node p ∈ V represents a data sample, each edge e ∈ E indicates that its two endpoints are neighbors, the weight of an edge is the Euclidean distance between the two data samples, and each node has k directed edges pointing to its k nearest nodes.
In one embodiment, the device further comprises a first judgment module configured, after the node x with the largest variation coefficient is found among all nodes, to:
determine whether each node in the data set S is a neighbor of node x, and if so, set the variation coefficient of that node to -∞.
In one embodiment, the device further comprises a second judgment module configured to:
determine whether the number of available queries is 0; if it is not 0, continue with the step of finding the node x with the largest variation coefficient among all nodes, obtaining its class label y, and adding x and y to the set I of previously selected data samples and the set L of their true class labels, respectively; otherwise, return the selected data sample set I and the true class label set L.
The device introduced in Embodiment Two is the device used to implement the rare class detection method based on a kNN graph of Embodiment One. From the method introduced in Embodiment One, a person skilled in the art can understand the specific structure and variations of the device, so they are not described again here. Any device used to implement the method of Embodiment One falls within the scope of protection of the present invention.
Embodiment Three
Referring to Fig. 5, based on the same inventive concept, the present invention also provides a computer-readable storage medium 300 storing a computer program 311 which, when executed, implements the method of Embodiment One.
Because the computer-readable storage medium introduced in Embodiment Three is the medium used to implement the rare class detection method based on a kNN graph of Embodiment One, a person skilled in the art can understand its specific structure and variations from the method introduced in Embodiment One, so they are not described again here. Any computer-readable storage medium used by the method of Embodiment One falls within the scope of protection of the present invention.
Embodiment Four
Based on the same inventive concept, the present invention also provides a computer device. Referring to Fig. 6, it comprises a memory 401, a processor 402, and a computer program 403 stored in the memory and executable on the processor; the processor 402 implements the method of Embodiment One when executing the program.
Because the computer device introduced in Embodiment Four is the device used to implement the rare class detection method based on a kNN graph of Embodiment One, a person skilled in the art can understand its specific structure and variations from the method introduced in Embodiment One, so they are not described again here. Any computer device used by the method of Embodiment One falls within the scope of protection of the present invention.
A person skilled in the art should understand that embodiments of the present invention may be provided as a method, a system, or a computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware. Moreover, the present invention may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, and optical storage) containing computer-usable program code.
The present invention is described with reference to flowcharts and/or block diagrams of methods, devices (systems), and computer program products according to embodiments of the invention. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks therein, can be implemented by computer program instructions. These instructions may be provided to the processor of a general-purpose computer, special-purpose computer, embedded processor, or other programmable data processing device to produce a machine, so that the instructions executed by that processor produce an apparatus for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
Although preferred embodiments of the present invention have been described, a person skilled in the art who learns of the basic inventive concept may make additional changes and modifications to these embodiments. The appended claims are therefore intended to be interpreted as covering the preferred embodiments and all changes and modifications that fall within the scope of the present invention.
Obviously, a person skilled in the art can make various modifications and variations to the embodiments of the present invention without departing from their spirit and scope. If these modifications and variations fall within the scope of the claims of the present invention and their technical equivalents, the present invention is also intended to include them.

Claims (10)

1. a kind of rare class detection method based on k neighbour's figure characterized by comprising
Non- label data collection S is given for default, constructs its k neighbour figure;
K neighbour figure based on construction, the definition of the variation coefficient Vc of setting node p, Vc (p)=maxV (p) × std (EL (p)),
Wherein, kNN (p) indicates the k arest neighbors node set of p, and EL (p) indicates the side length set of node p, and deg (p) is that k is close The in-degree of adjacent figure midpoint p, and the definition of the variation coefficient based on setting calculate the corresponding variation of each node in data set S Coefficient value;
The maximum node x of variation coefficient is found out from all nodes, and obtains its class label y, then x and y is added in advance respectively In the data sample set I and the true class label set L of data sample first chosen, to find out rare class.
2. the method as described in claim 1, which is characterized in that give non-label data collection S for default, construct its k neighbour Figure, specifically includes:
K value is calculated by default clustering algorithm;
Based on calculated k value, to data set S construction k neighbour figure, wherein it is that a weighting is oriented that k neighbour, which schemes G=(V, E), Figure, each node p ∈ V indicate that a data sample, each edge e ∈ E indicate that two endpoints on side are neighbor relationships, the weight on side There is k directed edge to be directed toward its k apart from nearest node from each node for the Euclidean distance between two data samples.
3. The method according to claim 1, characterized in that, after finding the node x with the largest variation coefficient among all nodes, the method further comprises:
judging whether each node in the data set S is a neighbor of node x, and if so, setting the variation coefficient of that node to -∞.
4. The method according to claim 1, characterized in that the method further comprises:
judging whether the number of available queries is 0; if not, continuing to execute the step of finding the node x with the largest variation coefficient among all nodes, obtaining its class label y, and adding x and y respectively to the pre-selected data sample set I and the set L of true class labels of the data samples; otherwise, returning the selected data sample set I and the set L of true class labels of the data samples.
5. A rare class detection device based on a k-nearest-neighbor graph, characterized by comprising:
a k-nearest-neighbor graph construction module, configured to construct the k-nearest-neighbor graph of a given unlabeled data set S;
a variation coefficient definition module, configured to define, based on the constructed k-nearest-neighbor graph, the variation coefficient Vc of a node p as Vc(p) = maxV(p) × std(EL(p)),
where kNN(p) denotes the set of the k nearest neighbor nodes of p, EL(p) denotes the set of edge lengths of node p, and deg(p) is the in-degree of node p in the k-nearest-neighbor graph; and to calculate, according to this definition, the variation coefficient value of each node in the data set S;
a rare class acquisition module, configured to find the node x with the largest variation coefficient among all nodes, obtain its class label y, and add x and y respectively to the pre-selected data sample set I and the set L of true class labels of the data samples, so as to find the rare classes.
6. The device according to claim 5, characterized in that the k-nearest-neighbor graph construction module is specifically configured to:
calculate the value of k by a preset clustering algorithm;
based on the calculated value of k, construct the k-nearest-neighbor graph of the data set S, where the k-nearest-neighbor graph G = (V, E) is a weighted directed graph: each node p ∈ V represents a data sample, each edge e ∈ E indicates that its two endpoints are neighbors, the weight of an edge is the Euclidean distance between the two data samples, and each node has k directed edges pointing to its k nearest nodes.
7. The device according to claim 5, characterized in that the device further comprises a first judgment module configured, after the node x with the largest variation coefficient has been found among all nodes, to:
judge whether each node in the data set S is a neighbor of node x, and if so, set the variation coefficient of that node to -∞.
8. The device according to claim 5, characterized in that the device further comprises a second judgment module configured to:
judge whether the number of available queries is 0; if not, continue executing the step of finding the node x with the largest variation coefficient among all nodes, obtaining its class label y, and adding x and y respectively to the pre-selected data sample set I and the set L of true class labels of the data samples; otherwise, return the selected data sample set I and the set L of true class labels of the data samples.
9. A computer-readable storage medium on which a computer program is stored, characterized in that the program, when executed, implements the method according to any one of claims 1 to 4.
10. A computer device, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, characterized in that the processor, when executing the program, implements the method according to any one of claims 1 to 4.
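The query loop of claims 1 to 4 can be illustrated with the following minimal Python sketch. It is not the inventors' KRED implementation, and it rests on several assumptions that the claims do not settle. k is passed in directly, because the preset clustering algorithm that chooses k is not specified. EL(p) is taken to be the lengths of p's k outgoing edges, and std is taken to be the population standard deviation. The formula defining maxV(p) is not reproduced in this text, so `max_v_hypothetical` is a purely illustrative stand-in. "Neighbor of x" is read as "joined to x by an edge in either direction", and x itself is also excluded once it has been queried. All function names are hypothetical.

```python
import math
import statistics
from typing import Callable, Dict, Hashable, List, Sequence, Set, Tuple

Point = Sequence[float]
Graph = Dict[int, List[Tuple[int, float]]]  # p -> [(neighbour q, edge length)]


def build_knn_graph(S: List[Point], k: int) -> Graph:
    """Weighted directed k-NN graph: node p gets k out-edges to its k nearest
    other samples, weighted by Euclidean distance (ties broken by index)."""
    if not 0 < k < len(S):
        raise ValueError("k must satisfy 0 < k < len(S)")
    graph: Graph = {}
    for p, xp in enumerate(S):
        ranked = sorted((math.dist(xp, xq), q) for q, xq in enumerate(S) if q != p)
        graph[p] = [(q, d) for d, q in ranked[:k]]
    return graph


def in_degrees(graph: Graph) -> Dict[int, int]:
    """deg(p): number of edges pointing at p."""
    deg = {p: 0 for p in graph}
    for edges in graph.values():
        for q, _ in edges:
            deg[q] += 1
    return deg


def max_v_hypothetical(p: int, graph: Graph, deg: Dict[int, int]) -> float:
    # HYPOTHETICAL stand-in for maxV(p), NOT the patent's formula: the largest
    # in-degree gap between p and one of its k nearest neighbours kNN(p).
    return max(abs(deg[q] - deg[p]) for q, _ in graph[p])


def variation_coefficients(
    graph: Graph,
    max_v: Callable[[int, Graph, Dict[int, int]], float] = max_v_hypothetical,
) -> Dict[int, float]:
    """Vc(p) = maxV(p) * std(EL(p)), with EL(p) = lengths of p's out-edges."""
    deg = in_degrees(graph)
    return {p: max_v(p, graph, deg) * statistics.pstdev([w for _, w in edges])
            for p, edges in graph.items()}


def neighbours(graph: Graph, x: int) -> Set[int]:
    """Nodes joined to x by an edge in either direction (an assumption)."""
    outgoing = {q for q, _ in graph[x]}
    incoming = {p for p, edges in graph.items() if any(q == x for q, _ in edges)}
    return outgoing | incoming


def detect_rare_classes(S: List[Point], k: int, oracle: Callable[[int], Hashable],
                        budget: int) -> Tuple[List[int], List[Hashable]]:
    graph = build_knn_graph(S, k)
    vc = variation_coefficients(graph)
    selected: List[int] = []      # the set I (indices into S)
    labels: List[Hashable] = []   # the set L (true class labels)
    # Claim 4: keep querying while queries remain; stopping early when every
    # node is suppressed is an added assumption.
    while budget > 0 and any(v > -math.inf for v in vc.values()):
        x = max(vc, key=vc.get)   # node with the largest variation coefficient
        y = oracle(x)             # ask the labeler for x's class label
        selected.append(x)
        labels.append(y)
        budget -= 1
        # Claim 3: suppress x's neighbours; suppressing x itself is assumed.
        for q in neighbours(graph, x) | {x}:
            vc[q] = -math.inf
    return selected, labels
```

Here `oracle` stands in for the labeler who supplies the class label y of each queried sample, and `budget` is the number of available queries. Suppressing the neighbors of each queried node pushes later queries toward other regions of the data, which is how the loop tries to reach every class in few queries.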

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910213519.0A CN109948705A (en) 2019-03-20 2019-03-20 A kind of rare class detection method and device based on k neighbour's figure

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910213519.0A CN109948705A (en) 2019-03-20 2019-03-20 A kind of rare class detection method and device based on k neighbour's figure

Publications (1)

Publication Number Publication Date
CN109948705A 2019-06-28

Family

ID=67010328

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910213519.0A Pending CN109948705A (en) 2019-03-20 2019-03-20 A kind of rare class detection method and device based on k neighbour's figure

Country Status (1)

Country Link
CN (1) CN109948705A (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104778480A (en) * 2015-05-08 2015-07-15 江南大学 Hierarchical spectral clustering method based on local density and geodesic distance
CN108647297A (en) * 2018-05-08 2018-10-12 山东师范大学 A kind of the density peaks cluster centre choosing method and system of shared nearest neighbor optimization

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111403027A (en) * 2020-03-17 2020-07-10 浙江工商大学 Rare disease picture searching method based on rare mining
CN111403027B (en) * 2020-03-17 2023-06-27 浙江工商大学 Rare disease picture searching method based on rare class mining
CN112667709A (en) * 2020-12-24 2021-04-16 山东大学 Campus card leasing behavior detection method and system based on Spark
CN112667709B (en) * 2020-12-24 2022-05-03 山东大学 Campus card leasing behavior detection method and system based on Spark
CN113515639A (en) * 2021-09-14 2021-10-19 华东交通大学 Noise data processing method and system based on belief learning and label smoothing
CN113515639B (en) * 2021-09-14 2021-12-17 华东交通大学 Noise data processing method and system based on belief learning and label smoothing
CN116204820A (en) * 2023-04-24 2023-06-02 山东科技大学 Impact risk grade discrimination method based on rare class mining

Similar Documents

Publication Publication Date Title
CN109948705A (en) A kind of rare class detection method and device based on k neighbour's figure
Parimala et al. A survey on density based clustering algorithms for mining large spatial databases
Wu et al. Mining scale-free networks using geodesic clustering
Érdi et al. Prediction of emerging technologies based on analysis of the US patent citation network
Wu et al. A data mining approach for spatial modeling in small area load forecast
Jin et al. Load modeling by finding support vectors of load data from field measurements
Li et al. Stepping community detection algorithm based on label propagation and similarity
Epitropakis et al. Finding multiple global optima exploiting differential evolution's niching capability
CN103648106B (en) WiFi indoor positioning method of semi-supervised manifold learning based on category matching
Kim et al. Graph theoretic heuristics for unequal-sized facility layout problems
Rieck et al. Multivariate data analysis using persistence-based filtering and topological signatures
CN110070121A (en) A kind of quick approximate k nearest neighbor method based on tree strategy with balance K mean cluster
Ma et al. Decomposition‐based multiobjective evolutionary algorithm for community detection in dynamic social networks
CN103617163B (en) Quick target association method based on cluster analysis
CN109615550A (en) A kind of local corporations' detection method based on similitude
Fofonov et al. Projected Field Similarity for Comparative Visualization of Multi‐Run Multi‐Field Time‐Varying Spatial Data
CN108898244B (en) Digital signage position recommendation method coupled with multi-source elements
Kumar et al. Comparative analysis of SOM neural network with K-means clustering algorithm
CN111626311B (en) Heterogeneous graph data processing method and device
El Imrani et al. A fuzzy clustering-based niching approach to multimodal function optimization
Dahal Effect of different distance measures in result of cluster analysis
Liu et al. PRUC: P-regions with user-defined constraint
Wenzel et al. Accelerating navigation in the VecGeom geometry modeller
Zhai et al. A dynamic archive based niching particle swarm optimizer using a small population size
Li Community structure discovery algorithm on gpu with cuda

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20190628
