CN110276401A - Sample clustering method, apparatus, device and storage medium
- Publication number: CN110276401A (application CN201910551643.8A)
- Authority: CN (China)
- Prior art keywords: sample, distance, connection, bin, samples
- Legal status: Pending (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/23—Clustering techniques
- G06F18/232—Non-hierarchical techniques
- G06F18/2321—Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
- G06F18/23213—Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
Abstract
An embodiment of the invention discloses a sample clustering method, apparatus, device and storage medium in the field of data processing. The method comprises: computing a first sample distance for each sample in a sample set, the first sample distance being the distance between the sample and its S-th nearest-neighbour sample; obtaining, from all first sample distances, the first sample distances within a set distance range; calculating a distance mean based on the first sample distances within the set distance range; determining all connection samples of each sample based on the K-nearest-neighbour sample set of the sample, where K > S, and a sample and its connection sample are mutual nearest-neighbour samples and have a connection relationship; and clustering the samples in the sample set according to the connection samples, the distance mean and the value S, where the distance mean is the scan radius and S is the minimum number of samples contained in a cluster. The method solves the technical problem that the prior-art DBSCAN algorithm cannot reasonably cluster a sample set of uneven density.
Description
Technical field
Embodiments of the present invention relate to the technical field of data processing, and in particular to a sample clustering method, apparatus, device and storage medium.
Background
Cluster analysis is the process of grouping a set of physical or abstract objects into multiple classes made up of similar objects. Cluster analysis is now widely used in many fields, and its widespread use has given rise to a variety of clustering algorithms, for example the K-MEANS, K-MEDOIDS, BIRCH, CURE, DBSCAN and OPTICS algorithms. Among these, DBSCAN is a representative density-based clustering algorithm. It requires two manually supplied parameters: the scan radius, denoted eps, and the minimum number of points, denoted minPts. Using these two parameters, it finds maximal sets of density-connected objects in the sample set. In the course of making the present invention, the inventors found the following defect in the prior art: when a sample set of uneven density is clustered with DBSCAN, a small scan radius causes samples in sparse regions to be treated as noise points and rejected, while a large scan radius tends to group samples that are far apart into one class. In either case the accuracy of sample clustering cannot be guaranteed.
In summary, how to reasonably cluster a sample set of uneven density under the DBSCAN algorithm has become a problem in urgent need of a solution.
Summary of the invention
The present invention provides a sample clustering method, apparatus, device and storage medium to solve the technical problem that the prior-art DBSCAN algorithm cannot reasonably cluster a sample set of uneven density.
In a first aspect, an embodiment of the invention provides a sample clustering method, comprising:
computing a first sample distance for each sample in a sample set, the first sample distance being the distance between the sample and the S-th nearest-neighbour sample of the sample;
obtaining, from all first sample distances, the first sample distances within a set distance range;
calculating a distance mean based on the first sample distances within the set distance range;
determining all connection samples of each sample based on the K-nearest-neighbour sample set of the sample, where K > S, and a sample and its connection sample are mutual nearest-neighbour samples and have a connection relationship;
clustering the samples in the sample set according to the connection samples, the distance mean and the value S, the distance mean being the scan radius and S being the minimum number of samples contained in a cluster.
Further, clustering the samples in the sample set according to the connection samples, the distance mean and the value S comprises:
filtering all connection samples based on the distance mean, so as to filter out connection samples whose second sample distance is greater than the distance mean, the second sample distance being the distance between a sample and a connection sample of that sample;
clustering the samples in the sample set based on the value S and the connection samples that remain after filtering.
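As a non-limiting illustration, the filtering of connection samples by the distance mean may be sketched in Python as follows. The function name filter_connections, the representation of samples as coordinate tuples, the dictionary of connection sets and the use of Euclidean distance are all assumptions made for this sketch and are not specified by the embodiment.

```python
import math

def filter_connections(points, connections, eps):
    """Drop every connection whose second sample distance exceeds eps.

    points: list of coordinate tuples (hypothetical representation).
    connections: dict mapping a sample index to the set of its connection samples.
    eps: the distance mean, used as the scan radius.
    """
    filtered = {}
    for i, neighbours in connections.items():
        # Keep a connection sample only if it lies within the scan radius.
        filtered[i] = {j for j in neighbours
                       if math.dist(points[i], points[j]) <= eps}
    return filtered

# Sample 2 is connected to sample 0 but lies farther away than eps.
points = [(0.0, 0.0), (1.0, 0.0), (5.0, 0.0)]
connections = {0: {1, 2}, 1: {0}, 2: {0}}
filtered = filter_connections(points, connections, eps=2.0)
```

In this sketch the connection between samples 0 and 2 is removed on both sides, because their distance of 5.0 exceeds the assumed scan radius of 2.0.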
Further, clustering the samples in the sample set based on the value S and the connection samples that remain after filtering comprises:
counting, for each sample in turn, the total number of its connection samples;
taking each sample whose total number of connection samples is greater than S as a core sample;
selecting any core sample among all the core samples obtained as the current sample;
visiting all connection samples of the current sample;
taking each connection sample so visited as a vertex, and visiting all connection samples of that vertex;
repeating the operation of taking each visited connection sample as a vertex and visiting all connection samples of that vertex, until no new connection sample can be visited;
updating the current sample to any core sample that has not yet been visited, and returning to the operation of visiting all connection samples of the current sample, until all core samples have been visited;
clustering the current sample and the connection samples visited from the current sample into one cluster.
Further, determining all connection samples of each sample based on the K-nearest-neighbour sample set of the sample comprises:
obtaining the K-nearest-neighbour sample set of each sample;
constructing an adjacency matrix from all the K-nearest-neighbour sample sets, each element of the adjacency matrix representing the neighbour relationship between the two corresponding samples;
examining the non-zero elements of the adjacency matrix to determine all connection samples of each sample.
Further, examining the non-zero elements of the adjacency matrix to determine all connection samples of each sample comprises:
obtaining, in the adjacency matrix, element groups at symmetric positions, each element group comprising a first element in row i, column j and a second element in row j, column i;
if at least one of the first element and the second element is a zero element, setting both the first element and the second element to zero;
updating the adjacency matrix after all element groups of the adjacency matrix have been traversed;
identifying the non-zero elements of the updated adjacency matrix, and determining the two samples corresponding to each non-zero element to be mutual nearest-neighbour samples that have a connection relationship;
obtaining all connection samples of each sample based on the mutual nearest-neighbour samples.
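As a non-limiting illustration, the symmetric-position check on the adjacency matrix may be sketched in Python as follows. The function names mutual_adjacency and connections_from, and the encoding of the matrix as a list of lists in which adj[i][j] = 1 means that sample j is in the K-nearest-neighbour sample set of sample i, are assumptions made for this sketch.

```python
def mutual_adjacency(adj):
    """Zero out every symmetric pair that contains at least one zero element."""
    n = len(adj)
    out = [row[:] for row in adj]  # work on a copy, leaving the input untouched
    for i in range(n):
        for j in range(i + 1, n):
            if out[i][j] == 0 or out[j][i] == 0:
                out[i][j] = out[j][i] = 0
    return out

def connections_from(adj):
    """Read the connection samples of each sample from the non-zero elements."""
    return [{j for j, v in enumerate(row) if v != 0} for row in adj]

# Sample 0 lists samples 1 and 2 as neighbours, but only sample 1 lists sample 0.
adj = [[0, 1, 1],
       [1, 0, 0],
       [0, 1, 0]]
updated = mutual_adjacency(adj)
connection_sets = connections_from(updated)
```

Only the pair (0, 1) survives in this example, since it is the only pair in which each sample appears in the other's neighbour set.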
Further, obtaining, from all first sample distances, the first sample distances within the set distance range comprises:
constructing a frequency distribution histogram based on all first sample distances;
examining the frequency of each bin in the frequency distribution histogram to determine the set distance range;
obtaining the first sample distances within the set distance range.
Further, examining the frequency of each bin in the frequency distribution histogram to determine the set distance range comprises:
obtaining the bin with the maximum frequency in the frequency distribution histogram;
calculating the frequency drop between adjacent rear bins, a rear bin being a bin located behind the maximum-frequency bin in the frequency distribution histogram;
identifying the pair of adjacent rear bins with the largest frequency drop, and selecting the rearward bin of that pair;
taking the first sample distance corresponding to the maximum-frequency bin and the first sample distance corresponding to the selected rearward bin as the distance thresholds of the set distance range.
Further, calculating the distance mean based on the first sample distances within the set distance range comprises:
obtaining the number of samples whose first sample distance lies within the set distance range;
adding up the first sample distances within the set distance range to obtain a total sample distance;
taking the quotient of the total sample distance and the number of samples as the distance mean.
Further, before computing the first sample distance for each sample in the sample set, the method further comprises:
constructing a K-nearest-neighbour graph of each sample in the sample set, the weight of each edge in the K-nearest-neighbour graph being the distance between the corresponding samples.
In a second aspect, an embodiment of the invention further provides a sample clustering apparatus, comprising:
a distance statistics module, configured to compute a first sample distance for each sample in a sample set, the first sample distance being the distance between the sample and the S-th nearest-neighbour sample of the sample;
a distance obtaining module, configured to obtain, from all first sample distances, the first sample distances within a set distance range;
a mean calculation module, configured to calculate a distance mean based on the first sample distances within the set distance range;
a connection determining module, configured to determine all connection samples of each sample based on the K-nearest-neighbour sample set of the sample, where K > S, and a sample and its connection sample are mutual nearest-neighbour samples and have a connection relationship;
a sample clustering module, configured to cluster the samples in the sample set according to the connection samples, the distance mean and the value S, the distance mean being the scan radius and S being the minimum number of samples contained in a cluster.
In a third aspect, an embodiment of the invention further provides a sample clustering device, comprising:
one or more processors;
a memory, configured to store one or more programs;
wherein, when the one or more programs are executed by the one or more processors, the one or more processors implement the sample clustering method according to the first aspect.
In a fourth aspect, an embodiment of the invention further provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the sample clustering method according to the first aspect.
With the above sample clustering method, apparatus, device and storage medium, the first sample distance between each sample in the sample set and its S-th nearest-neighbour sample is computed and a distance mean is obtained from the first sample distances; meanwhile, the connection samples of each sample are determined based on the K (K > S) nearest-neighbour sample set of each sample, a connection sample and the sample being mutual nearest-neighbour samples that have a connection relationship; the samples that have connection relationships are then clustered using the minimum number of samples per cluster (the value S) and the scan radius (the distance mean). This technical means solves the technical problem that the prior-art DBSCAN algorithm cannot reasonably cluster a sample set of uneven density. A reasonable scan radius is determined from the first sample distances, and mutual nearest-neighbour samples are then clustered based on that scan radius, which ensures that the clustering is reasonable. When the distribution density of samples in the sample set is uneven, relying on mutual nearest-neighbour samples avoids clustering sparsely distributed samples and densely distributed samples into one cluster. In addition, the distance mean is determined from the frequency distribution histogram without user input, which reduces the workload of manual parameter tuning.
Brief description of the drawings
Fig. 1 is a schematic diagram of a sample set distribution provided by Embodiment One of the present invention;
Fig. 2 is a flowchart of a sample clustering method provided by Embodiment One of the present invention;
Fig. 3 is a schematic diagram of another sample set distribution provided by Embodiment One of the present invention;
Fig. 4 is a flowchart of a sample clustering method provided by Embodiment Two of the present invention;
Fig. 5 is a K-nearest-neighbour graph provided by Embodiment Two of the present invention;
Fig. 6 is a frequency distribution histogram provided by Embodiment Two of the present invention;
Fig. 7 is a schematic diagram of an adjacency matrix provided by Embodiment Two of the present invention;
Fig. 8 is a schematic diagram of another adjacency matrix provided by Embodiment Two of the present invention;
Fig. 9 is a structural schematic diagram of a sample clustering apparatus provided by Embodiment Three of the present invention;
Fig. 10 is a structural schematic diagram of a sample clustering device provided by Embodiment Four of the present invention.
Detailed description of the embodiments
The present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described here serve to explain the invention and do not limit it. It should also be noted that, for ease of description, the drawings show only the parts relevant to the invention rather than the entire structure.
DBSCAN (Density-Based Spatial Clustering of Applications with Noise) is a representative density-based clustering algorithm. In general, DBSCAN requires two manually supplied parameters: eps and minPts. For an object (that is, a sample) in the sample set, if the value of eps is E, the region within scan radius E of the object is called the E-neighbourhood of the object. If the number of samples in the E-neighbourhood of the object is greater than or equal to minPts, the object is called a core object. For samples P and Q, if Q lies in the E-neighbourhood of P and P is a core object, then Q is directly density-reachable from P. For samples P_1, P_2, ..., P_n, if each P_i is directly density-reachable from P_(i-1), then P_n is density-reachable from P_1. If there is a sample O such that both P and Q are density-reachable from O, then P and Q are density-connected. The aim of DBSCAN is to find maximal sets of density-connected objects.
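As a non-limiting illustration of the E-neighbourhood and core-object definitions, the following Python sketch marks the core objects of a small sample set. The function name core_objects, the use of Euclidean distance and the convention that a sample counts itself within its own neighbourhood are assumptions made for this sketch.

```python
import math

def core_objects(points, eps, min_pts):
    """Return the indices of samples whose eps-neighbourhood holds at least min_pts samples."""
    cores = []
    for i, p in enumerate(points):
        # Count the samples within radius eps of p (this sketch counts p itself).
        count = sum(1 for q in points if math.dist(p, q) <= eps)
        if count >= min_pts:
            cores.append(i)
    return cores

# Three samples close together and one isolated sample.
points = [(0, 0), (0, 1), (1, 0), (10, 10)]
cores = core_objects(points, eps=1.5, min_pts=3)
```

With these assumed values the three nearby samples are core objects, while the isolated sample at (10, 10) has only itself in its neighbourhood and is not.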
Consider a sample set of uneven density such as the one shown in Fig. 1, which is a schematic diagram of a sample set distribution provided by Embodiment One of the present invention. Referring to Fig. 1, the samples in the left half are relatively dense and the samples in the right half are relatively sparse. If eps is set to a small value, for example the radius of circle 11 in Fig. 1, then a sample in the right half cannot reach other samples within the scan radius and is therefore treated as a noise point and filtered out. As a result, most of the samples in the right half are filtered out, which harms clustering accuracy. If eps is set to a large value, for example the radius of circle 12 in Fig. 1, then samples in the left half and the right half may be grouped into one class during scanning, or a large number of samples in the left half may be grouped into one class, which again harms clustering accuracy.
In view of this, an embodiment of the present invention provides a sample set clustering method to solve the problem that a sample set of uneven density cannot be clustered reasonably.
Embodiment One
Fig. 2 is a flowchart of a sample clustering method provided by Embodiment One of the present invention. The sample clustering method provided in this embodiment may be executed by a sample clustering device, which may be implemented in software and/or hardware and may consist of one physical entity or of two or more physical entities. For example, the sample clustering device may be a smart device with data computation and analysis capabilities, such as a computer, a mobile phone, a tablet or an interactive smart panel.
Specifically, referring to Fig. 2, the sample clustering method comprises:
Step 110: compute a first sample distance for each sample in the sample set, the first sample distance being the distance between the sample and the S-th nearest-neighbour sample of the sample.
For example, the sample set contains multiple samples, all of the same data type. The data type of the samples may be chosen according to the actual situation and is not limited by this embodiment. This embodiment describes the sample clustering method by taking the clustering of the samples in the sample set as an example. Optionally, the way the sample set is acquired is not limited by this embodiment: it may consist of data collected by the sample clustering device itself, data input by a user, or data obtained by processing specific data. In general, each sample in the sample set represents one data feature. For example, each sample in the sample set may represent the location data of the same user during a different time period of each day within a set period.
Typically, taking Fig. 1 as an example, the samples in the sample set are scattered at different positions in a feature space. In general, the position of a sample depends on the feature the sample represents: the more similar the features, the closer the samples. Optionally, a feature placement rule is preset, and sample positions are determined according to that rule. The specific content of the feature placement rule may be set according to the actual situation. For example, for location data, longitude and latitude axes are laid out in the feature space, and the position of each sample in the feature space is then determined from the longitude and latitude contained in the sample's location data.
Further, after obtaining the sample set, the sample clustering device can calculate the distance between any two samples in the sample set. The way the distance is calculated is not limited by this embodiment; for example, the Euclidean distance, Minkowski distance or Manhattan distance may be used. In general, the smaller the distance between two samples, the more similar the two samples are.
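As a non-limiting illustration, the three distance measures named above may be sketched in Python as follows. The Manhattan and Euclidean distances are the Minkowski distance of order 1 and order 2 respectively; the function names and the representation of samples as coordinate tuples are assumptions made for this sketch.

```python
def minkowski(a, b, p):
    """Minkowski distance of order p between two equal-length coordinate tuples."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)

def manhattan(a, b):
    """Manhattan distance: the Minkowski distance of order 1."""
    return minkowski(a, b, 1)

def euclidean(a, b):
    """Euclidean distance: the Minkowski distance of order 2."""
    return minkowski(a, b, 2)

a, b = (0.0, 0.0), (3.0, 4.0)
d_manhattan = manhattan(a, b)
d_euclidean = euclidean(a, b)
```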
For example, the S-th nearest-neighbour sample of a sample is the sample that is S-th closest to it. For any sample, the sample clustering device can calculate the distance from that sample to every other sample and, from these distances, determine the sample that is S-th closest, which is denoted the S-th nearest-neighbour sample. S is a positive integer whose specific value may be set according to the actual situation. Further, the value S represents the minimum number of samples contained in a cluster; that is, clustering the samples yields multiple clusters, each containing at least S samples. In general a sample has exactly one S-th nearest-neighbour sample; in some cases there may be several, in which case any one of them may be chosen.
Optionally, before the S-th nearest-neighbour samples are determined, a K-nearest-neighbour graph is drawn for each sample, where K is a positive integer greater than S whose specific value may be set according to the actual situation. In the K-nearest-neighbour graph of a sample, that sample is the vertex and the graph contains the K neighbour samples closest to it; the sample is connected to each of the K neighbour samples by a line, and the weight of each line is the distance between the two samples it joins. For example, when K equals 8, the 8 samples closest to the sample are obtained and each is connected to the sample by a line; the samples at the two ends of a line can then be regarded as having a connection relationship, and the weight of the line is the distance between the two samples. Once the K-nearest-neighbour graph has been determined, the S-th nearest-neighbour sample of the sample and the corresponding distance can be read from it.
Further, the distance between a sample and its S-th nearest-neighbour sample is denoted the first sample distance.
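As a non-limiting illustration, the first sample distance of every sample may be computed by brute force as in the following Python sketch. The function name first_sample_distances, the use of Euclidean distance and the exclusion of the sample itself from its own neighbours are assumptions made for this sketch; a precomputed K-nearest-neighbour graph could supply the same values.

```python
import math

def first_sample_distances(points, s):
    """Distance from each sample to its S-th nearest neighbour (the sample itself excluded)."""
    result = []
    for i, p in enumerate(points):
        # Sorted distances from sample i to every other sample.
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        result.append(dists[s - 1])
    return result

# One-dimensional samples: three fairly close together and one far away.
points = [(0.0,), (1.0,), (3.0,), (7.0,)]
first_distances = first_sample_distances(points, s=2)
```

The isolated sample at 7.0 receives a noticeably larger first sample distance than the others, which is the property the subsequent steps rely on.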
Step 120: from all first sample distances, obtain the first sample distances within a set distance range.
Specifically, the set distance range is obtained from statistics on the first sample distances and provides the reference data for calculating the scan radius. For a sample set of uneven sample density, the first sample distance is usually large in regions where samples are sparse and usually small in regions where samples are dense. To ensure an accurate scan radius, the set distance range is therefore consulted, and only the first sample distances within that range are used to obtain the scan radius. In general, the first sample distances within the set distance range are the representative data among all first sample distances.
For example, the number of samples corresponding to each first sample distance is counted; a sample count of 50 means that 50 samples have the same first sample distance. The set distance range is then determined from the sample counts. For example, a frequency distribution histogram is constructed from the sample counts, where the number of bins may be determined from the total number of samples in the sample set. The horizontal axis of the histogram represents the first sample distance and the vertical axis represents the number of samples with that first sample distance. The maximum-frequency bin, that is, the bin with the largest sample count, is obtained. Then, among the rear bins of the maximum-frequency bin, the difference in sample count between every two adjacent bins is calculated, and the two adjacent bins with the largest difference are selected. Here a rear bin is a bin located behind the maximum-frequency bin along the horizontal axis. Further, the rearward bin of the two selected adjacent bins is chosen, and the two first sample distances corresponding to this rearward bin and to the maximum-frequency bin are taken as the two distance thresholds of the set distance range.
Alternatively, the rear bin a set number of positions after the maximum-frequency bin is obtained, and the two first sample distances corresponding to that rear bin and to the maximum-frequency bin are taken as the two distance thresholds of the set distance range. As another alternative, after the number of samples corresponding to each first sample distance has been counted, the set distance range is determined manually from the sample counts.
This embodiment is described using the case in which the two first sample distances corresponding to the rearward bin with the largest difference and to the maximum-frequency bin serve as the two distance thresholds of the set distance range. The first sample distance corresponding to the maximum-frequency bin is the smaller one and therefore serves as the lower distance threshold, which ensures that enough samples with similar features are scanned during clustering. The first sample distance corresponding to the rearward bin with the largest difference is the larger one and therefore serves as the upper distance threshold. In general, the rearward bin with the largest difference is the bin at which the sample count drops most sharply; from the first sample distance of that bin onward, the sample count keeps decreasing and, correspondingly, the feature difference between a sample and its S-th nearest-neighbour sample keeps growing. Taking the first sample distance of that bin as the upper threshold therefore allows samples that differ greatly to be ignored when the scan radius is calculated, which in turn safeguards clustering accuracy.
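As a non-limiting illustration, the histogram-based choice of the set distance range may be sketched in Python as follows. The embodiment does not state which point of a bin represents its first sample distance, so this sketch assumes equal-width bins and uses the left edge of each bin as its threshold. The function name set_distance_range, the bin count and the fallback used when fewer than two rear bins exist are likewise assumptions.

```python
def set_distance_range(distances, n_bins):
    """Return assumed (lower, upper) thresholds from an equal-width histogram."""
    lo, hi = min(distances), max(distances)
    width = (hi - lo) / n_bins or 1.0  # avoid a zero width if all distances are equal
    counts = [0] * n_bins
    for d in distances:
        k = min(int((d - lo) / width), n_bins - 1)  # clamp the maximum into the last bin
        counts[k] += 1

    peak = counts.index(max(counts))  # maximum-frequency bin

    # Largest frequency drop between adjacent bins located after the peak bin.
    rear, best_drop = None, None
    for k in range(peak + 1, n_bins - 1):
        drop = counts[k] - counts[k + 1]
        if best_drop is None or drop > best_drop:
            rear, best_drop = k + 1, drop

    lower = lo + peak * width
    # Assumed fallback: with fewer than two rear bins, keep everything up to the maximum.
    upper = hi if rear is None else lo + rear * width
    return lower, upper

distances = [0.0, 1.0, 2.2, 2.5, 3.0, 3.5, 3.8, 4.5, 5.0, 5.2, 5.5, 6.5, 9.0, 10.0]
low, high = set_distance_range(distances, n_bins=5)
selected = [d for d in distances if low <= d <= high]
```

With five bins of width 2.0 the bin counts are 2, 5, 4, 1 and 2. The peak is the second bin, the largest drop among the rear bins is from 4 to 1, and the assumed range therefore runs from the left edge of the peak bin to the left edge of the bin holding the single sample.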
Step 130: calculate a distance mean based on the first sample distances within the set distance range.
Specifically, all samples whose first sample distance lies within the set distance range, together with those first sample distances, are collected. All first sample distances within the set distance range are then added up, and the result is denoted the total sample distance. The total number of samples within the set distance range is also counted, the total sample distance is divided by this number, and the quotient is denoted the distance mean. This distance mean serves as the scan radius.
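As a non-limiting illustration, the distance mean may be computed as in the following Python sketch. The function name distance_mean and the treatment of both thresholds as inclusive are assumptions made for this sketch.

```python
def distance_mean(distances, low, high):
    """Mean of the first sample distances that fall within [low, high]."""
    selected = [d for d in distances if low <= d <= high]
    if not selected:
        raise ValueError("no first sample distance lies within the set distance range")
    total = sum(selected)          # total sample distance
    return total / len(selected)   # quotient used as the scan radius

eps = distance_mean([1.0, 2.0, 3.0, 4.0, 10.0], low=2.0, high=4.0)
```

Here the outlying distance 10.0 and the distance 1.0 below the lower threshold are ignored, so the scan radius is the mean of 2.0, 3.0 and 4.0.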
Step 140: determine all connection samples of each sample based on the K-nearest-neighbour sample set of the sample, where K > S, and a sample and its connection sample are mutual nearest-neighbour samples and have a connection relationship.
Specifically, after the distance between each sample and every other sample has been calculated, the K samples closest to a sample are obtained to form the K-nearest-neighbour sample set of that sample. Each sample in the K-nearest-neighbour sample set can be called a neighbour sample; that is, a neighbour relationship exists between the sample and the neighbour sample. If the K-nearest-neighbour graph of the sample has been constructed in advance, this step can directly take the samples in the K-nearest-neighbour graph as the K-nearest-neighbour sample set.
Further, a connection sample is a sample that has a connection relationship with the current sample. In general, two samples with a connection relationship are mutual nearest-neighbour samples, which means that the K-nearest-neighbour sample set of each of the two samples contains the other sample. Specifically, the K-nearest-neighbour sample set of the current sample is obtained, followed by the K-nearest-neighbour sample set of each neighbour sample in that set. It is then determined whether the current sample is contained in the K-nearest-neighbour sample set of each neighbour sample. If the current sample is contained in the K-nearest-neighbour sample set of a neighbour sample, that neighbour sample and the current sample are determined to be mutual nearest-neighbour samples, the connection relationship between them is saved, and the neighbour sample is recorded as a connection sample of the current sample. One way of saving the connection relationship is to draw a line between the two samples in the sample set. All connection samples of each sample can be determined by following these steps. Correspondingly, for a sample in the K-nearest-neighbour sample set that is not a connection sample, no connection relationship is saved.
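As a non-limiting illustration, the mutual nearest-neighbour check may be sketched with Python sets as follows. The function names knn_sets and connection_samples, the brute-force neighbour search and the use of Euclidean distance are assumptions made for this sketch.

```python
import math

def knn_sets(points, k):
    """K-nearest-neighbour sample set of every sample (the sample itself excluded)."""
    sets = []
    for i, p in enumerate(points):
        others = (j for j in range(len(points)) if j != i)
        order = sorted(others, key=lambda j: math.dist(p, points[j]))
        sets.append(set(order[:k]))
    return sets

def connection_samples(points, k):
    """Keep neighbour j of sample i only if sample i is also a neighbour of sample j."""
    knn = knn_sets(points, k)
    return [{j for j in knn[i] if i in knn[j]} for i in range(len(points))]

# Three samples close together and one far away, in one dimension.
points = [(0,), (1,), (2,), (10,)]
connected = connection_samples(points, k=2)
```

The distant sample lists two of the dense samples among its nearest neighbours, but none of them lists it in return, so it ends up with no connection samples.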
Alternatively, connection samples can be determined by means of an adjacency matrix. Specifically, an adjacency matrix is constructed from the K-nearest-neighbour sample sets, each element of which indicates whether the two corresponding samples have a neighbour relationship: if they do, the element is non-zero; if they do not, the element is zero. It is then determined, for each group of symmetrically positioned elements in the adjacency matrix, whether both elements are non-zero. If so, the K-nearest-neighbour sample set of each of the two samples contains the other sample, that is, the two samples are mutual nearest-neighbour samples and have a connection relationship. After all symmetric element groups of the adjacency matrix have been traversed, the connection samples of each sample can be determined.
For example, Fig. 3 is a schematic diagram of another sample set provided by Embodiment One of the present invention. Referring to Fig. 3, K is 5. For sample A, the K-nearest-neighbour sample set consists of the 5 samples connected to sample A by solid lines, where the lines to two of these samples partly overlap. For sample B, the K-nearest-neighbour sample set consists of 4 samples connected to sample B by solid lines, plus sample A. Although the K-nearest-neighbour sample set of sample B contains sample A, the K-nearest-neighbour sample set of sample A does not contain sample B. Sample B and sample A are therefore not mutual nearest-neighbour samples, and no connection relationship between sample A and sample B is saved. After all samples have been traversed in this way, the connection samples that have a connection relationship with each sample are obtained.
It should be noted that this embodiment does not limit the execution order of step 140 and steps 110 to 130. In practice, step 140 may be executed first and steps 110 to 130 afterwards, or step 140 and steps 110 to 130 may be executed concurrently.
Step 150: cluster the samples in the sample set according to the connection samples, the distance mean and the value S, the distance mean being the scan radius and S being the minimum number of samples contained in a cluster.
Here the minimum number of samples contained in a cluster corresponds to minPts. For example, DBSCAN clustering is performed on the samples in the sample set with the distance mean as the scan radius and the value S as minPts. Specifically, a sample is selected as the current sample, and the region around the current sample is scanned with the distance mean as the scan radius. During scanning, only the connection samples of the current sample are obtained; if the number of connection samples is greater than S, the current sample and the connection samples obtained by scanning are clustered into one cluster.
Alternatively, the total number of connection samples of each sample is determined, and if this total is greater than S, the sample is taken as a core sample. After all samples in the sample set are traversed, all core samples are found. Then, any core sample is selected as the current sample and all of its connection samples are obtained. Further, each obtained connection sample is taken as a vertex and all connection samples of each vertex are obtained; the newly obtained connection samples are in turn taken as vertices and their connection samples are obtained. This operation is repeated until no new connection sample can be reached, at which point all connection samples obtained and the current sample are clustered into one cluster. Then, an unprocessed core sample is obtained and connection samples are obtained according to the above steps to form a new cluster. Clustering ends once every core sample has been processed. It can be understood that if a connection sample obtained during clustering is itself a core sample, that core sample is not processed again in the subsequent procedure.
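The core-sample procedure described above can be sketched as a breadth-first expansion over the connection relationships. This is an illustrative sketch under stated assumptions rather than the embodiment's implementation: the function name `cluster_by_connections`, the dictionary representation of the connection relationships, and the toy data are all hypothetical, and core samples are visited in dictionary order instead of being chosen at random.

```python
from collections import deque


def cluster_by_connections(connections, s):
    """Group samples into clusters by expanding from core samples.

    connections maps each sample id to the set of its connection samples.
    A sample is treated as a core sample when it has more than s connections.
    """
    core = [x for x in connections if len(connections[x]) > s]
    visited = set()
    clusters = []
    for start in core:
        if start in visited:
            continue  # already absorbed into an earlier cluster
        visited.add(start)
        cluster = [start]
        queue = deque([start])
        while queue:
            vertex = queue.popleft()
            # Every newly reached connection sample becomes a vertex in turn.
            for other in connections[vertex]:
                if other not in visited:
                    visited.add(other)
                    cluster.append(other)
                    queue.append(other)
        clusters.append(sorted(cluster))
    return clusters


# Hypothetical connection relationships: two groups and one isolated sample.
connections = {
    1: {2, 3}, 2: {1, 3}, 3: {1, 2},
    4: {5, 6}, 5: {4, 6}, 6: {4, 5},
    7: set(),
}
clusters = cluster_by_connections(connections, s=1)
```

With these toy values, samples 1 to 3 and samples 4 to 6 form two clusters, and sample 7, which has no connection sample, is assigned to no cluster.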
In this case, for the sample set of Fig. 3, sample A and sample B are not clustered into the same cluster. Sample A is clustered with other samples that have similar features and are densely distributed, and sample B is clustered with other samples that have similar features and are sparsely distributed, which ensures the reasonableness of the clustering.
In summary, the first sample distance between each sample in the sample set and its S-nearest-neighbor sample is counted, and the distance mean is obtained from the first sample distances. Meanwhile, the connection samples of each sample are determined from the K-nearest-neighbor sample set of each sample (K > S), where a connection sample and the sample are mutual neighbor samples and have a connection relationship. The samples having a connection relationship are then clustered based on the minimum number of samples per cluster (the S value) and the scan radius (the distance mean). This technical means solves the technical problem in the prior art that the DBSCAN algorithm cannot reasonably cluster a sample set of uneven density. A reasonable scan radius is determined from the first sample distances, and mutual neighbor samples are then clustered based on that scan radius, which ensures the reasonableness of the clustering. When the sample distribution density in the sample set is uneven, the mutual-neighbor requirement prevents sparsely distributed samples and densely distributed samples from being clustered into one cluster, which in turn ensures clustering accuracy.
Embodiment two
Fig. 4 is a flowchart of a sample clustering method provided by Embodiment Two of the present invention. This embodiment is a more specific implementation based on the above embodiment. Referring to Fig. 4, the sample clustering method provided by this embodiment includes:
Step 201, construct the K-nearest-neighbor graph of each sample in the sample set, where the weight of each edge in the K-nearest-neighbor graph is the distance between the corresponding samples.
Specifically, after the distance between each sample and every other sample is calculated, the K-nearest-neighbor graph of a given sample is drawn by taking that sample as the vertex and obtaining, from the inter-sample distances, the K samples nearest to it together with the corresponding distances. Lines are then drawn between the vertex and each of the K samples, and the distance between the vertex and the corresponding sample is used as the weight of that line. For example, Fig. 5 is a K-nearest-neighbor graph provided by Embodiment Two of the present invention. Referring to Fig. 5, it is the K-nearest-neighbor graph of sample C in Fig. 3, where K is 6. The K samples nearest to sample C, and their distances, are obtained from the distances between sample C and the other samples; lines are then drawn between sample C and the K samples, with each distance used as the weight of the corresponding line. It should be noted that the line weights shown in Fig. 5 serve only to illustrate the K-nearest-neighbor graph and do not limit the distances or the distance calculation. In practical applications, the weights need not be displayed in the K-nearest-neighbor graph. In general, the line relationships between the samples in the sample set, that is, the neighbor relationships, can be obtained from the K-nearest-neighbor graphs.
It should be noted that the benefit of constructing the K-nearest-neighbor graph is that it simplifies the subsequent determination of the first sample distances, the connection samples, and so on, and thus the subsequent calculations.
Step 202, count the first sample distance corresponding to each sample in the sample set, where the first sample distance is the distance between a sample and its S-nearest-neighbor sample.
Since S < K, the S-nearest-neighbor sample of each sample can be obtained from that sample's K-nearest-neighbor graph, and the first sample distance is determined from the weight of the line between the sample and its S-nearest-neighbor sample.
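Reading the first sample distance from a K-nearest-neighbor graph can be sketched as below. The sketch assumes that the S-nearest-neighbor sample means the S-th nearest neighbor, which is an interpretation consistent with each sample having exactly one first sample distance. The function name `first_sample_distances` and the toy graph are hypothetical.

```python
def first_sample_distances(knn_graph, s):
    """Return the distance from each sample to its s-th nearest neighbor.

    knn_graph maps a sample id to a list of (neighbor id, distance) pairs
    sorted by increasing distance, with at least s entries (s < K).
    """
    # Index s - 1 is the s-th nearest neighbor in the sorted edge list.
    return {sample: edges[s - 1][1] for sample, edges in knn_graph.items()}


# Hypothetical K-nearest-neighbor graph with K = 3.
knn_graph = {
    "a": [("b", 0.5), ("c", 0.7), ("d", 1.9)],
    "b": [("a", 0.5), ("c", 0.6), ("d", 2.0)],
}
distances = first_sample_distances(knn_graph, s=2)
```

Because S < K, the required line weight is always present in the graph and no distance has to be recomputed.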
Step 203, construct a frequency distribution histogram based on all first sample distances.
Specifically, the number of occurrences of each first sample distance is counted and taken as the frequency of that first sample distance, and the frequency distribution histogram is then constructed from these frequencies. In practical applications, some first sample distances have very close values; for example, two first sample distances of 0.585 and 0.593 are nearly equal. If each first sample distance were given its own frequency, the complexity of both the statistics and the frequency distribution histogram would increase. Therefore, in the embodiment, the first sample distances are grouped in advance, the number of first sample distances falling in each group is counted, and that count is taken as the frequency of the group. For example, if 1100 first sample distances fall in the group covering distances 0.55 to 0.65, the frequency of that group is 1100. It should be noted that the embodiment does not limit the grouping rule. Further, when the frequency distribution histogram is built, the abscissa represents distance (the first sample distance in this example) and the ordinate represents the number of samples (the frequency). Each distance group can then be denoted as a bin. In general, the number of bins is related to the total number of samples in the sample set. For example, when the total number of samples is 5000 or fewer, the number of bins may be set to 10; once the total exceeds 5000, the number of bins is increased by 1 for every additional 500 samples. Optionally, since the frequency in some groups is very low and of little reference value in subsequent calculations, the frequencies of such distance groups can be ignored depending on the actual situation. For example, Fig. 6 is a frequency distribution histogram provided by Embodiment Two of the present invention. Referring to Fig. 6, the abscissa of the histogram is distance, the ordinate is the number of samples (the frequency), and there are 10 bins. The distance range and frequency of each bin are as shown in Fig. 6, and the bins are arranged in order from 1 to 10.
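The grouping of first sample distances into bins can be sketched as follows. The bin-count rule follows the example given above. The equal-width grouping is an assumption, since the embodiment does not limit the grouping rule, and the function names `bin_count` and `frequency_histogram` and the toy distances are hypothetical.

```python
def bin_count(total_samples):
    """Number of bins following the example rule in the text: 10 bins up to
    5000 samples, then one more bin per additional 500 samples."""
    if total_samples <= 5000:
        return 10
    return 10 + (total_samples - 5000) // 500


def frequency_histogram(distances, bins):
    """Return (edges, counts) for equal-width bins over the distances."""
    low, high = min(distances), max(distances)
    width = (high - low) / bins or 1.0  # avoid zero width if all are equal
    counts = [0] * bins
    for d in distances:
        # The largest distance is placed in the last bin.
        index = min(int((d - low) / width), bins - 1)
        counts[index] += 1
    edges = [low + i * width for i in range(bins + 1)]
    return edges, counts


# Hypothetical first sample distances grouped into 5 bins.
edges, counts = frequency_histogram([0.0, 0.1, 0.1, 0.2, 0.5, 0.9, 1.0], 5)
```

Here `counts[i]` is the frequency of bin `i`, and `edges[i]` and `edges[i + 1]` bound the distance range of that bin.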
The benefit of the frequency distribution histogram is that it clearly shows the frequency distribution within each distance range and makes the differences in frequency between distance ranges easy to see.
Step 204, count the frequency of each bin in the frequency distribution histogram to determine the set distance range.
In the embodiment, the set distance range is determined from the frequency of each bin. In this case, the step specifically includes steps 2041 to 2044:
Step 2041, obtain the maximum-frequency bin in the frequency distribution histogram.
Specifically, the frequency corresponding to each distance range is counted, and the bin with the largest frequency is obtained and denoted as the maximum-frequency bin. For example, referring to Fig. 6, the maximum-frequency bin is determined from the ordinate to be the 5th bin.
Step 2042, calculate the frequency drop between adjacent rear bins, where a rear bin is a bin located behind the maximum-frequency bin in the frequency distribution histogram.
Illustratively, according to the order of the bins, any bin located behind a given bin is denoted as a rear bin of that bin. In the embodiment, all rear bins of the maximum-frequency bin are obtained. Taking Fig. 6 as an example, the rear bins of the maximum-frequency bin are the 6th bin through the 10th bin.
Further, adjacent bins are two bins that are consecutive in order. For example, in Fig. 6, the 1st bin and the 2nd bin are adjacent bins, the 2nd bin and the 3rd bin are adjacent bins, and so on. In the embodiment, the adjacent bins among the rear bins are obtained and the frequency difference between the two bins of each adjacent pair is calculated, where the frequency difference is a positive number and is denoted as the frequency drop. The frequency difference can be calculated by subtracting the frequencies of the two adjacent bins: if the result is positive, it is used directly as the frequency drop; if the result is negative, its absolute value is used as the frequency drop.
Optionally, in the embodiment, when the frequency drops are counted, the frequency drop between the maximum-frequency bin and its adjacent rear bin may also be calculated, and this frequency drop, together with the frequency drops between the adjacent rear bins, is taken as the calculation result of this step.
Step 2043, identify the pair of adjacent rear bins with the largest frequency drop, and select the bin located at the rear of that pair.
In general, the larger the frequency drop, the more the frequency of the rear bin of an adjacent pair differs from that of the front bin, and hence the more sharply the number of samples corresponding to the rear bin decreases. In the embodiment, after the frequency drops are counted, the pair of adjacent rear bins with the largest frequency drop is selected. In this pair, the number of samples corresponding to the rear bin decreases sharply, and the bins located behind that rear bin contain few samples, are less representative, and contribute little to the accuracy of the distance mean, so they can be ignored in the calculation. Accordingly, the embodiment selects the rear bin of the pair of adjacent rear bins with the largest frequency drop as the calculation result of this step. For example, referring to Fig. 6, the frequency drops between the 6th and 7th bins, between the 7th and 8th bins, between the 8th and 9th bins, and between the 9th and 10th bins are calculated. After the frequency drops are calculated, the frequency drop between the 6th bin and the 7th bin is determined to be the largest, so the 7th bin, which is located at the rear, is selected.
Step 2044, use the first sample distance corresponding to the maximum-frequency bin and the first sample distance corresponding to the selected rear bin as the distance thresholds of the set distance range.
Specifically, since each bin corresponds to a distance range, the set distance range can be determined by selecting the minimum value of each distance range as a distance threshold. For example, the minimum value of the distance range of the maximum-frequency bin is used as the lower distance threshold of the set distance range, and the minimum value of the distance range of the rear bin is used as the upper distance threshold. Alternatively, the maximum value of each distance range can be selected as a distance threshold. For example, the maximum value of the distance range of the maximum-frequency bin is used as the lower distance threshold, and the maximum value of the distance range of the rear bin is used as the upper distance threshold. As a further alternative, depending on the actual situation, the minimum value of the distance range of the maximum-frequency bin is used as the lower distance threshold and the maximum value of the distance range of the rear bin is used as the upper distance threshold. For example, referring to Fig. 6, the minimum value of the distance range of the 5th bin is used as the lower distance threshold of the set distance range, and the maximum value of the distance range of the 7th bin is used as the upper distance threshold. It is also possible to determine the middle distance value of each distance range and use the middle distance values as the distance thresholds.
In general, the benefit of using the first sample distance corresponding to the maximum-frequency bin and the first sample distance corresponding to the rear bin as the distance thresholds of the set distance range is that the distance mean calculated afterwards is able to cluster a large number of samples while filtering out samples that are far away, which ensures the reasonableness of the distance mean.
It can be understood that steps 2041 to 2044 are only one optional way of determining the set distance range. In practical applications, the set distance range can also be determined from the frequency distribution histogram in other ways.
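Steps 2041 to 2044 can be sketched as a single function over the bin frequencies. This is an illustration under stated assumptions: the function name `set_distance_range` is hypothetical, the bin frequencies are invented for the example because the text does not list the values of Fig. 6, bins are indexed from 0, and of the threshold options described above the sketch uses the minimum of the maximum-frequency bin and the maximum of the selected rear bin.

```python
def set_distance_range(edges, counts):
    """Pick the distance range from a histogram as in steps 2041 to 2044.

    edges has len(counts) + 1 entries; bin i spans edges[i]..edges[i + 1].
    Returns (low, high, first_bin, last_bin) with 0-based bin indices.
    """
    peak = counts.index(max(counts))  # step 2041: maximum-frequency bin
    last = peak
    largest_drop = -1
    # Step 2042: frequency drops between adjacent bins behind the peak.
    for i in range(peak + 1, len(counts) - 1):
        drop = abs(counts[i] - counts[i + 1])
        if drop > largest_drop:
            largest_drop = drop
            last = i + 1  # step 2043: rear bin of the pair
    # Step 2044: lower threshold from the peak bin, upper from the rear bin.
    return edges[peak], edges[last + 1], peak, last


# Hypothetical frequencies for 10 bins of width 0.1.
counts = [50, 200, 600, 900, 1100, 1000, 300, 200, 100, 50]
edges = [i / 10 for i in range(11)]
low, high, first_bin, last_bin = set_distance_range(edges, counts)
```

With these invented frequencies the result matches the pattern of the Fig. 6 example: the 5th bin (index 4) has the largest frequency, the largest drop lies between the 6th and 7th bins, and the 7th bin (index 6) closes the range.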
Step 205, obtain the first sample distances within the set distance range.
Step 206, obtain the number of samples whose first sample distance is within the set distance range.
Specifically, since each sample corresponds to one first sample distance, the total number of first sample distances within the set distance range can be counted and used as the number of samples. The number of samples can also be determined from the frequency distribution histogram by adding up the frequencies of the bins from the maximum-frequency bin to the selected rear bin. For example, referring to Fig. 6, adding the frequencies of the 5th, 6th, and 7th bins gives the number of samples.
Step 207, add up the first sample distances within the set distance range to obtain the total sample distance.
Specifically, the first sample distances within the set distance range are added together and the result is denoted as the total sample distance. Alternatively, the total sample distance is obtained from the frequency distribution histogram: the frequency of each relevant bin is multiplied by the middle distance value of its distance range, and the products are then added to obtain the total sample distance. For example, referring to Fig. 6, the distance range of the 5th bin is obtained and its middle distance value is selected. The middle distance value of the 5th bin is multiplied by its frequency to give a first result; a second result for the 6th bin and a third result for the 7th bin are obtained in the same way; and the three results are added to obtain the total sample distance.
Step 208, use the quotient of the total sample distance and the number of samples as the distance mean.
Specifically, dividing the total sample distance by the number of samples gives the average of the first sample distances within the set distance range. In the embodiment, this average distance is denoted as the distance mean and is set as the scan radius. Compared with the manually specified scan radius in the existing DBSCAN algorithm, this embodiment adapts the scan radius to the actual conditions of the sample set and determines it on statistical principles, which ensures the reasonableness of the scan radius.
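Steps 205 to 208 can be sketched in both of the ways described above: directly from the first sample distances, and approximately from the bin frequencies and middle distance values. The function names `distance_mean` and `distance_mean_from_histogram` and the toy values are hypothetical, and treating both distance thresholds as inclusive is an assumption. Neither function guards against an empty range.

```python
def distance_mean(first_distances, low, high):
    """Average of the first sample distances that fall within [low, high]."""
    selected = [d for d in first_distances if low <= d <= high]  # step 205
    sample_count = len(selected)          # step 206
    total_distance = sum(selected)        # step 207
    return total_distance / sample_count  # step 208


def distance_mean_from_histogram(edges, counts, first_bin, last_bin):
    """Approximate the same mean from bin frequencies and bin midpoints."""
    total_distance = 0.0
    sample_count = 0
    for i in range(first_bin, last_bin + 1):
        midpoint = (edges[i] + edges[i + 1]) / 2
        total_distance += counts[i] * midpoint
        sample_count += counts[i]
    return total_distance / sample_count


# Hypothetical values.
mean = distance_mean([0.2, 0.4, 0.5, 0.6, 0.9], low=0.4, high=0.6)
hist_mean = distance_mean_from_histogram(
    [0.0, 1.0, 2.0, 3.0], [1, 2, 1], first_bin=1, last_bin=2
)
```

The histogram variant trades some precision for not having to revisit the individual distances, since every distance in a bin is replaced by the midpoint of that bin.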
Step 209, obtain the K-nearest-neighbor sample set corresponding to each sample.
Specifically, the K neighbor samples of each sample are obtained from that sample's K-nearest-neighbor graph and form its K-nearest-neighbor sample set. That is, all samples in the K-nearest-neighbor graph other than the vertex form the K-nearest-neighbor sample set of the vertex.
Step 210, construct an adjacency matrix from all K-nearest-neighbor sample sets, where each element of the adjacency matrix represents the neighbor relationship between the two corresponding samples.
Here, an adjacency matrix stores all samples of the sample set in a one-dimensional array and stores the relationships between the samples in a two-dimensional array. Adjacency matrices can be divided into directed-graph adjacency matrices and undirected-graph adjacency matrices. The embodiment takes an undirected-graph adjacency matrix as an example. Specifically, the samples are arranged in order, and after the arrangement each sample corresponds to a number; the embodiment does not limit the arrangement rule. Further, the arranged samples are used as the horizontal and vertical axes of the matrix, and the element at the intersection of the horizontal and vertical axes indicates whether the two corresponding samples are in a neighbor relationship. Specifically, if a sample is contained in the K-nearest-neighbor sample set of another sample, the sample and the other sample are in a neighbor relationship, and the element at the intersection whose ordinate is this sample and whose abscissa is the other sample is denoted as a nonzero element. The specific value of the nonzero element can be set according to the actual situation. The embodiment takes a nonzero element of 1 as an example; in practical applications, the nonzero element can also be the corresponding distance value or another value. Correspondingly, if a sample is not contained in the K-nearest-neighbor sample set of another sample, the two samples are not in a neighbor relationship, and the element at the intersection whose ordinate is this sample and whose abscissa is the other sample is denoted as a zero element, that is, 0. For example, Fig. 7 is a schematic diagram of an adjacency matrix provided by Embodiment Two of the present invention. Referring to Fig. 7, there are 8 samples in total, which are numbered 1 to 8 in order. The 8 samples are then arranged along the horizontal and vertical axes, that is, the 8 sample numbers serve as the abscissa and ordinate, and a two-dimensional matrix is constructed from the neighbor relationships between the samples. In this two-dimensional matrix, the element in row i, column j indicates whether the j-th sample is a neighbor sample of the i-th sample. For example, if the element in row 2, column 3 is 1, the K-nearest-neighbor sample set of sample 2 contains sample 3. As another example, if the element in row 7, column 1 is 0, the K-nearest-neighbor sample set of sample 7 does not contain sample 1. In general, once the K-nearest-neighbor sample sets are determined, the elements of the adjacency matrix can be assigned according to them. The element in row i, column i indicates the connection relationship of a sample with itself, and in the embodiment this element is denoted as 1. It can be understood that the horizontal numbers 1 to 8 and the vertical numbers 1 to 8 in Fig. 7 are sample numbers and are not counted among the rows and columns.
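Filling the adjacency matrix from the K-nearest-neighbor sample sets can be sketched as follows. The function name `build_adjacency` and the toy neighbor sets are hypothetical, the nonzero value is fixed at 1 as in the embodiment's example, and the samples are numbered from 0 rather than from 1 as in Fig. 7.

```python
def build_adjacency(knn_sets, n):
    """Build an n x n 0/1 matrix where row i, column j is 1 when sample j is in
    the K-nearest-neighbor set of sample i. Samples are numbered 0..n-1."""
    matrix = [[0] * n for _ in range(n)]
    for i in range(n):
        matrix[i][i] = 1  # the embodiment marks a sample as linked to itself
        for j in knn_sets[i]:
            matrix[i][j] = 1
    return matrix


# Hypothetical K-nearest-neighbor sample sets with K = 2.
knn_sets = {0: {1, 2}, 1: {0, 2}, 2: {1, 3}, 3: {0, 2}}
matrix = build_adjacency(knn_sets, 4)
```

The matrix built this way is not necessarily symmetric: in the toy data, row 0, column 2 is 1 while row 2, column 0 is 0, because sample 2 is a neighbor of sample 0 but not the reverse.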
Optionally, in practical applications where the sample set contains many samples, one adjacency matrix may be constructed for each sample, with that sample and the samples in its K-nearest-neighbor sample set as the horizontal and vertical axes of the adjacency matrix; the neighbor relationships between the sample and its K neighbor samples can be determined from this adjacency matrix. Alternatively, a single adjacency matrix is constructed for the whole sample set, from which the neighbor relationships between all samples in the sample set can be determined.
Step 211, count the nonzero elements in the adjacency matrix to determine all connection samples of each sample.
Specifically, if the K-nearest-neighbor sample set of one sample contains another sample but the K-nearest-neighbor sample set of the other sample does not contain the first, the two samples are determined not to be mutual neighbor samples and their relationship can be ignored in clustering. It is therefore necessary to find all mutual neighbor samples and to determine the connection samples of each sample from them. In the embodiment, mutual neighbor samples are determined from the nonzero elements of the adjacency matrix. For example, two samples are numbered 5 and 6 and denoted sample 5 and sample 6. In the adjacency matrix, the element in row 5, column 6 indicates whether the K-nearest-neighbor sample set of sample 5 contains sample 6, and the element in row 6, column 5 indicates whether the K-nearest-neighbor sample set of sample 6 contains sample 5. If both elements are nonzero, sample 5 and sample 6 contain each other, that is, they are mutual neighbor samples, and the connection relationship between sample 5 and sample 6 is saved: sample 5 is recorded as a connection sample of sample 6, and likewise sample 6 is recorded as a connection sample of sample 5. By counting every nonzero element in this manner, the connection samples of each sample are obtained.
Optionally, in the embodiment, counting the nonzero elements specifically includes steps 2111 to 2115:
Step 2111, obtain the element groups at symmetric positions in the adjacency matrix, where an element group includes a first element in row i, column j and a second element in row j, column i.
Specifically, two elements at symmetric positions in the adjacency matrix are denoted as an element group, where symmetric positions are two positions whose horizontal and vertical coordinates are swapped. For example, row i, column j and row j, column i are symmetric positions, and the elements at these two positions form one element group. Further, the element in row i, column j is denoted as the first element and indicates whether the K-nearest-neighbor sample set of the i-th sample contains the j-th sample. The element in row j, column i is denoted as the second element and indicates whether the K-nearest-neighbor sample set of the j-th sample contains the i-th sample. By obtaining the element group at symmetric positions, the neighbor relationship between the two corresponding samples can be obtained.
Step 2112, if the first element and the second element include at least one zero element, set both the first element and the second element to zero.
Further, it is determined whether the first element and the second element include at least one zero element. If so, both the first element and the second element are set to zero; otherwise, the first element and the second element are kept unchanged and step 2113 is executed. When the first element and the second element include at least one zero element, the K-nearest-neighbor sample set of at least one of the two corresponding samples does not contain the other sample, that is, the two samples are not mutual neighbor samples. The first element and the second element are therefore both set to zero, which cancels the neighbor relationship between the two samples. For example, referring to Fig. 7, the first element in row 1, column 8 is 1 and the second element in row 8, column 1 is 0, and the two elements belong to the same element group at symmetric positions. The first element in row 1, column 8 is therefore changed to 0, cancelling the neighbor relationship between sample 1 and sample 8.
Step 2113, after all element groups of the adjacency matrix are traversed, update the adjacency matrix.
Specifically, after all element groups at symmetric positions are traversed, the adjacency matrix is updated. Taking the adjacency matrix shown in Fig. 7 as an example, after all element groups in Fig. 7 are traversed, the adjacency matrix is updated to give the adjacency matrix shown in Fig. 8, where Fig. 8 is a schematic diagram of another adjacency matrix provided by Embodiment Two of the present invention.
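The update of the adjacency matrix in steps 2111 to 2113 can be sketched as follows. The function name `symmetrize` and the 3 x 3 toy matrix are hypothetical and do not reproduce the values of Fig. 7 or Fig. 8.

```python
def symmetrize(matrix):
    """Return a copy in which an element pair (i, j), (j, i) is kept only when
    both elements are nonzero; otherwise both are set to 0."""
    n = len(matrix)
    result = [row[:] for row in matrix]
    for i in range(n):
        for j in range(i + 1, n):
            # An element group with at least one zero element is cleared.
            if matrix[i][j] == 0 or matrix[j][i] == 0:
                result[i][j] = 0
                result[j][i] = 0
    return result


# Hypothetical adjacency matrix before the update.
before = [
    [1, 1, 1],
    [1, 1, 0],
    [0, 1, 1],
]
after = symmetrize(before)
```

Only the pair formed by the first two samples survives in this toy matrix, because it is the only element group in which both elements are nonzero.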
Step 2114, count the nonzero elements in the updated adjacency matrix, and determine the two samples corresponding to each nonzero element to be mutual neighbor samples having a connection relationship.
Specifically, in the updated adjacency matrix, the two samples corresponding to any nonzero element are mutual neighbor samples. All mutual neighbor samples can therefore be determined from the nonzero elements of the updated adjacency matrix. Further, since the updated adjacency matrix is symmetric, only one nonzero element of each symmetric pair needs to be read when obtaining the nonzero elements, and the two corresponding samples are determined to be mutual neighbor samples from that element. After the mutual neighbor samples are determined, the lines between mutual neighbor samples can be retained in the sample set.
Step 2115, obtain all connection samples of each sample based on the mutual neighbor samples.
Specifically, all mutual-neighbor pairs containing a given sample are obtained, and the other sample in each of these pairs is taken as a connection sample of the given sample.
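Steps 2114 and 2115 can be sketched by reading one element of each symmetric pair from the updated adjacency matrix. The function name `connections_from_matrix` and the toy matrix are hypothetical. Skipping the diagonal, so that a sample is not recorded as its own connection sample, is an assumption of the sketch.

```python
def connections_from_matrix(matrix):
    """Read each sample's connection samples from a symmetric adjacency matrix.

    Only the upper triangle is scanned, since element (i, j) equals (j, i).
    The diagonal is skipped so that a sample is not its own connection sample.
    """
    n = len(matrix)
    connections = {i: set() for i in range(n)}
    for i in range(n):
        for j in range(i + 1, n):
            if matrix[i][j] != 0:
                # One nonzero element records the connection in both directions.
                connections[i].add(j)
                connections[j].add(i)
    return connections


# Hypothetical updated (symmetric) adjacency matrix.
updated = [
    [1, 1, 0],
    [1, 1, 0],
    [0, 0, 1],
]
connections = connections_from_matrix(updated)
```

Scanning only the upper triangle roughly halves the number of elements read, which is the saving the symmetric matrix makes possible.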
Step 212, filter all connection samples based on the distance mean to remove connection samples whose second sample distance is greater than the distance mean, where the second sample distance is the distance between a sample and a connection sample of that sample.
Specifically, in the embodiment, the distance between a sample and its connection sample, that is, the distance between mutual neighbor samples, is denoted as the second sample distance. Further, if the second sample distance is greater than the distance mean, then although the two corresponding samples are mutual neighbor samples, their features differ considerably, and clustering them together would reduce the accuracy of the clustering result. The embodiment therefore removes the connection relationship between the two samples, that is, the two samples are determined not to be mutual neighbor samples. At this point, the line between the two samples can be deleted from the sample set, and the corresponding elements in the adjacency matrix are set to zero. It can be understood that the connection samples of each sample can be filtered in this manner so that only connection samples whose second sample distance is less than the distance mean are retained. This step can also be understood as scanning the connection relationships of each sample in the sample set with the scan radius to obtain accurate connection relationships.
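The filtering in step 212 can be sketched as follows. The function name `filter_connections`, the callable `distance`, and the one-dimensional toy positions are hypothetical. The sketch removes a connection only when the second sample distance is strictly greater than the distance mean, so a distance exactly equal to the mean is kept; the text does not address that boundary case.

```python
def filter_connections(connections, distance, mean_distance):
    """Drop connections whose second sample distance exceeds the distance mean.

    connections maps a sample id to its set of connection samples;
    distance(a, b) returns the distance between two samples.
    """
    return {
        sample: {o for o in others if distance(sample, o) <= mean_distance}
        for sample, others in connections.items()
    }


# Hypothetical samples on a line, all mutually connected before filtering.
position = {0: 0.0, 1: 1.0, 2: 3.0}
connections = {0: {1, 2}, 1: {0, 2}, 2: {0, 1}}
filtered = filter_connections(
    connections,
    distance=lambda a, b: abs(position[a] - position[b]),
    mean_distance=1.5,
)
```

After filtering, sample 2 has no connection sample left, because both of its connections are longer than the distance mean of 1.5 used in the example.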
Step 213, cluster the samples in the sample set based on the S value and the connection samples retained after filtering.
Specifically, this step includes steps 2131 to 2139:
Step 2131, count the total number of connection samples of each sample in turn.
Specifically, the total number of lines of each sample is determined from the lines between the samples in the sample set, which gives the total number of connection samples. In general, the total number of connection samples of a sample is the total number of connection samples retained after filtering by the distance mean. It can be understood that in practical applications it is also possible to record only the connection relationships between mutual neighbor samples without representing them in the sample set. In that case, the connection samples of each sample can be determined from the recorded connection relationships, which gives the total number of connection samples.
Step 2132, take the samples whose total number of connection samples is greater than the S value as core samples.
Specifically, the total number of connection samples of each sample is compared with the S value, and if the total is greater than the S value, the sample is denoted as a core sample. After every sample is traversed in this manner, all core samples in the sample set are obtained. A core sample can be understood as a sample that can serve as a starting point in the clustering process. In general, a sample and its connection samples are usually clustered into one cluster because their features are highly similar. If the total number of connection samples of a sample is less than the S value, the cluster obtained by subsequent clustering may contain fewer samples than the minimum number of samples per cluster. Therefore, such a sample is not selected as a starting point during clustering, that is, it is not selected as a core sample.
Step 2133, select any core sample among all the core samples obtained as the current sample.
Specifically, any core sample is selected as the starting point of the current round of clustering and denoted as the current sample. It should be noted that in the embodiment the current sample is determined at random. In practical applications, a rule for selecting the current sample can also be set and the current sample selected according to that rule. In general, after the current sample is determined, the core sample is marked as visited.
Step 2134, access all connection samples of the current sample.
Specifically, according to the connection relationships currently retained, all connection samples having a connection relationship with the current sample are obtained, and all the connection samples obtained are marked as visited.
Step 2135, take each connection sample obtained by the access as a vertex, and access all connection samples corresponding to each vertex.
Further, each connection sample currently obtained is taken as a vertex, and according to the connection relationships currently retained, all connection samples having a connection relationship with each vertex are then obtained. Each vertex can thus also be regarded as a sub-starting point within one round of clustering.
Step 2136, confirm whether a new connection sample is obtained by the access. If a new connection sample is obtained, return to step 2135; if no new connection sample is obtained, execute step 2137.
Specifically, continuing to obtain with each vertex there is the whole of connection relationship to connect according to the connection relationship currently retained
When connecing sample, it is determined whether obtain new connection sample, i.e., whether obtain being not marked with the connection sample being accessed.If
To new connection sample, then currently there are also the new similar samples of feature for explanation, at this point it is possible to return to step 2135, i.e.,
Using the connection sample newly obtained as vertex, continue the whole connection samples for accessing the vertex, until cannot get new connection sample
Until this.If cannot get new connection sample, illustrate currently to have found the similar sample of whole features based on core sample
This.At this point it is possible to think this end of clustering, and execute 2137.
It should be noted that, in this round of clustering, if a core sample is accessed as a connection sample, that core sample is marked as visited.
Step 2137, confirm whether an unvisited core sample still exists. If so, execute step 2138; otherwise, execute step 2139.
Specifically, determining whether an unvisited core sample still exists means determining whether there is a core sample that has not been marked as visited. If an unvisited core sample exists, that core sample is updated to be the current sample and a new round of clustering is started, that is, step 2138 is executed. If all core samples are confirmed to have been visited, every starting point from which clustering is possible has been visited and no further clustering starting point can be found; therefore, step 2139 is executed.
Step 2138, update any unvisited core sample to be the current sample, and return to step 2134.
Specifically, if the number of unvisited core samples is greater than 1, the current sample is selected randomly. If the number of unvisited core samples is 1, that core sample is taken as the current sample. The method then returns to step 2134, that is, a new round of clustering is started.
Step 2139, cluster the current sample and the connection samples obtained by access from the current sample into a cluster.
Specifically, one round of clustering can be regarded as one access process. Accordingly, all connection samples obtained in each round of clustering, together with the current sample of that round, are clustered into one cluster. For the sample set, as many clusters are obtained as there are rounds of clustering. Optionally, samples that are never visited are denoted as noise points.
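Steps 2133 to 2139 amount to a breadth-first traversal of the retained connection relationships, started once from each core sample that has not yet been visited. The sketch below illustrates this under stated assumptions: the connection relationships are given as a mapping from each sample to the set of its connection samples, the core samples are taken in the order supplied rather than at random, and the name `cluster_by_connections` and all sample identifiers are hypothetical.

```python
from collections import deque


def cluster_by_connections(connections, core_samples):
    """Group samples by expanding from each unvisited core sample along connections."""
    visited = set()
    clusters = []
    for start in core_samples:            # steps 2133 / 2138: next unvisited core sample
        if start in visited:
            continue
        visited.add(start)
        members = [start]
        queue = deque([start])
        while queue:                      # steps 2134-2136: expand until nothing new appears
            vertex = queue.popleft()
            for neighbor in connections.get(vertex, ()):
                if neighbor not in visited:
                    visited.add(neighbor)
                    members.append(neighbor)
                    queue.append(neighbor)
        clusters.append(sorted(members))  # step 2139: one cluster per round
    noise = sorted(set(connections) - visited)  # samples that were never visited
    return clusters, noise


# Hypothetical data: two groups and one isolated sample.
connections = {
    "a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"}, "d": {"c"},
    "e": {"f", "g"}, "f": {"e", "g"}, "g": {"e", "f"},
    "h": set(),
}
clusters, noise = cluster_by_connections(connections, ["a", "b", "c", "e", "f", "g"])
print(clusters)  # [['a', 'b', 'c', 'd'], ['e', 'f', 'g']]
print(noise)     # ['h']
```

Sample d is not a core sample in this example, yet it joins the first cluster because it is reached as a connection sample of c; sample h is never visited and is reported as a noise point.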
For example, for the sample set of Fig. 3, the K (K=5) nearest-neighbor sample set of sample B includes sample A, but the K-nearest-neighbor sample set of sample A does not include sample B. In this case, the connection relationship between sample A and sample B can be removed, that is, the dashed line between sample A and sample B is deleted. In the subsequent clustering process, whether the connection samples of sample A or the connection samples of sample B are accessed, sample A and sample B will not be clustered into one cluster, which ensures reasonable clustering.
In the above, a K-nearest-neighbor graph of the samples is constructed, and the first sample distances are obtained from the K-nearest-neighbor graph, where a first sample distance is the distance between a sample and its S-th (S < K) nearest-neighbor sample. A frequency distribution histogram is then constructed from the first sample distances, and the distance mean is determined from the frequency distribution histogram. Meanwhile, an adjacency matrix is constructed from the K-nearest-neighbor graph and symmetrized to determine the samples that have connection relationships. The samples having connection relationships are then filtered by the distance mean, and the sample set is clustered based on the filtered connection relationships and the S value. These technical means solve the technical problem in the prior art that the DBSCAN algorithm cannot reasonably cluster a sample set of uneven density. By symmetrizing the adjacency matrix and filtering the samples having connection relationships by the distance mean, samples of different densities are prevented from being merged into one cluster when the distribution density of the samples is uneven, which would otherwise reduce clustering accuracy. In addition, the distance mean is determined from the frequency distribution histogram without user input, which reduces the workload of manual parameter tuning; determining the distance mean statistically ensures that it is reasonable and thereby safeguards clustering accuracy.
Embodiment three
Fig. 9 is a schematic structural diagram of a sample clustering device provided by Embodiment three of the present invention. Referring to Fig. 9, the sample clustering device includes: a distance statistics module 301, a distance acquisition module 302, a mean calculation module 303, a connection determination module 304 and a sample clustering module 305.
The distance statistics module 301 is configured to determine the first sample distance corresponding to each sample in a sample set, the first sample distance being the distance between the sample and the S-th nearest-neighbor sample of the sample. The distance acquisition module 302 is configured to obtain, from all the first sample distances, the first sample distances within a set distance range. The mean calculation module 303 is configured to calculate a distance mean based on the first sample distances within the set distance range. The connection determination module 304 is configured to determine all connection samples of each sample based on the K-nearest-neighbor sample set corresponding to each sample, where K > S, and the sample and each connection sample of the sample are nearest-neighbor samples of each other and have a connection relationship. The sample clustering module 305 is configured to cluster the samples in the sample set according to the connection samples, the distance mean and the S value, the distance mean being the scan radius and the S value being the minimum number of samples in a cluster.
In the above, the first sample distance between each sample in the sample set and its S-th nearest-neighbor sample is determined, and the distance mean is obtained from the first sample distances. Meanwhile, the connection samples corresponding to each sample are determined based on the K (K > S) nearest-neighbor sample set of each sample, where a connection sample and the sample are nearest-neighbor samples of each other and have a connection relationship. The samples having connection relationships are then clustered based on the minimum number of samples in a cluster (the S value) and the scan radius (the distance mean). These technical means solve the technical problem in the prior art that the DBSCAN algorithm cannot reasonably cluster a sample set of uneven density. A reasonable scan radius is determined from the first sample distances, and mutually neighboring samples are then clustered based on the scan radius, which ensures reasonable clustering. When the distribution density of the samples in the sample set is uneven, requiring samples to be neighbors of each other prevents sparsely distributed samples and densely distributed samples from being clustered into one cluster, thereby safeguarding clustering accuracy.
On the basis of the above embodiment, the sample clustering module 305 includes: a sample filtering submodule, configured to filter all connection samples based on the distance mean so as to filter out the connection samples whose second sample distance is greater than the distance mean, the second sample distance being the distance between a sample and a connection sample of the sample; and a clustering submodule, configured to cluster the samples in the sample set based on the S value and the connection samples obtained after filtering.
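The filtering performed by the sample filtering submodule can be sketched as follows, assuming the connection relationships are stored as a mapping from each sample to its connection samples and the second sample distances are stored per unordered pair. The name `filter_connections`, the data layout and the example values are hypothetical and serve only as an illustration.

```python
def filter_connections(connections, distances, distance_mean):
    """Keep only the connection samples whose distance does not exceed the distance mean.

    connections: dict mapping each sample to the set of its connection samples.
    distances: dict mapping frozenset({a, b}) to the second sample distance of a and b.
    """
    return {
        sample: {
            other
            for other in linked
            if distances[frozenset((sample, other))] <= distance_mean
        }
        for sample, linked in connections.items()
    }


# Hypothetical data: a-b are close, b-c are farther apart than the distance mean.
connections = {"a": {"b"}, "b": {"a", "c"}, "c": {"b"}}
distances = {frozenset(("a", "b")): 0.5, frozenset(("b", "c")): 2.0}
filtered = filter_connections(connections, distances, distance_mean=1.0)
print(filtered)
```

Because each distance is looked up by unordered pair, a connection is removed on both sides at once, so the filtered relationships stay symmetric.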
On the basis of the above embodiment, the clustering submodule includes: a total quantity statistics unit, configured to count, in turn, the total number of connection samples of each sample; a core sample determination unit, configured to take the samples whose total number of connection samples is greater than the S value as core samples; a current sample selection unit, configured to select any core sample among all the obtained core samples as the current sample; a first access unit, configured to access all connection samples of the current sample; a second access unit, configured to take each connection sample obtained by the access as a vertex and access all connection samples corresponding to the vertex; a third access unit, configured to repeat the operation of taking each connection sample obtained by the access as a vertex and accessing all connection samples corresponding to the vertex, until no new connection sample is obtained by the access; a sample update unit, configured to update any unvisited core sample to be the current sample and return to the operation of accessing all connection samples of the current sample, until all core samples have been visited; and a cluster forming unit, configured to cluster the current sample and the connection samples obtained by access from the current sample into a cluster.
On the basis of the above embodiment, the connection determination module 304 includes: a set acquisition submodule, configured to obtain the K-nearest-neighbor sample set corresponding to each sample; an adjacency matrix construction submodule, configured to construct an adjacency matrix according to all the K-nearest-neighbor sample sets, each element of the adjacency matrix representing the neighbor relationship between the two corresponding samples; and a non-zero element statistics submodule, configured to count the non-zero elements in the adjacency matrix so as to determine all connection samples of each sample.
On the basis of the above embodiment, the non-zero element statistics submodule includes: an element group acquisition unit, configured to obtain, in the adjacency matrix, the element groups at symmetric positions, each element group including a first element in row i, column j and a second element in row j, column i; a zero-setting unit, configured to set both the first element and the second element to zero if at least one of the first element and the second element is a zero element; a matrix update unit, configured to update the adjacency matrix after all element groups of the adjacency matrix have been traversed; a neighbor relationship determination unit, configured to count the non-zero elements in the updated adjacency matrix and determine that the two samples corresponding to each non-zero element are nearest-neighbor samples of each other and have a connection relationship; and a connection sample determination unit, configured to obtain all connection samples of each sample based on the mutually neighboring samples.
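The zero-setting of symmetric element groups can be sketched with plain nested lists. This is an illustration under assumptions: element (i, j) is taken to be non-zero when sample j is in the K-nearest-neighbor sample set of sample i, and the names `symmetrize` and `connection_samples` are hypothetical.

```python
def symmetrize(adjacency):
    """Zero both elements of every symmetric pair in which at least one element is zero."""
    n = len(adjacency)
    updated = [row[:] for row in adjacency]  # leave the input matrix untouched
    for i in range(n):
        for j in range(i + 1, n):
            if adjacency[i][j] == 0 or adjacency[j][i] == 0:
                updated[i][j] = 0
                updated[j][i] = 0
    return updated


def connection_samples(adjacency):
    """Read the connection samples of each sample from the non-zero elements."""
    return {
        i: {j for j, value in enumerate(row) if value != 0 and j != i}
        for i, row in enumerate(adjacency)
    }


# Hypothetical matrix: sample 2 lists sample 0 as a neighbor, but not the other way round.
adjacency = [
    [0, 1, 0],
    [1, 0, 1],
    [1, 1, 0],
]
updated = symmetrize(adjacency)
links = connection_samples(updated)
print(updated)  # [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
print(links)    # {0: {1}, 1: {0, 2}, 2: {1}}
```

The one-sided relationship between samples 0 and 2 disappears, while the mutual relationships 0-1 and 1-2 remain as connection relationships.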
On the basis of the above embodiment, the distance acquisition module 302 includes: a histogram construction submodule, configured to construct a frequency distribution histogram based on all the first sample distances; a frequency statistics submodule, configured to count the frequency of each bin in the frequency distribution histogram so as to determine the set distance range; and a first distance acquisition submodule, configured to obtain the first sample distances within the set distance range.
On the basis of the above embodiment, the frequency statistics submodule includes: a maximum bin acquisition unit, configured to obtain the bin with the maximum frequency in the frequency distribution histogram; a drop calculation unit, configured to calculate the frequency drop between adjacent rear bins, a rear bin being a bin located after the maximum-frequency bin in the frequency distribution histogram; a bin confirmation unit, configured to confirm the pair of adjacent rear bins with the largest frequency drop and select, within that pair, the bin located further back; and a threshold determination unit, configured to use the first sample distance corresponding to the maximum-frequency bin and the first sample distance corresponding to the selected bin as the distance thresholds of the set distance range.
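The bin selection can be sketched on a list of bin frequencies. This is one possible reading rather than a definitive implementation: only pairs of adjacent bins that both lie after the maximum-frequency bin are compared, ties are resolved in favor of the earliest candidate, and the name `select_range_bins` is hypothetical. How a bin maps to a distance threshold (left edge, right edge or center) is not specified in the text, so the sketch stops at bin indices.

```python
def select_range_bins(counts):
    """Return (peak, cut) bin indices.

    peak: index of the maximum-frequency bin.
    cut: index of the later bin in the adjacent pair of rear bins with the largest drop.
    """
    peak = max(range(len(counts)), key=counts.__getitem__)
    # Left index of every adjacent pair whose two bins both lie after the peak.
    rear_pairs = range(peak + 1, len(counts) - 1)
    if not rear_pairs:
        raise ValueError("need at least two bins after the maximum-frequency bin")
    left = max(rear_pairs, key=lambda i: counts[i] - counts[i + 1])
    return peak, left + 1


# Hypothetical bin frequencies of the first sample distances.
counts = [2, 9, 8, 7, 1, 1]
peak, cut = select_range_bins(counts)
print(peak, cut)  # 1 4
```

In this example the largest drop among the rear bins is from 7 to 1, so the bin at index 4 is selected together with the peak bin at index 1.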
On the basis of the above embodiment, the mean calculation module 303 includes: a sample quantity acquisition submodule, configured to obtain the number of samples whose first sample distance is within the set distance range; a total distance submodule, configured to add up the first sample distances within the set distance range to obtain a total sample distance; and a quotient calculation submodule, configured to use the quotient of the total sample distance and the number of samples as the distance mean.
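The mean calculation is a plain average over the first sample distances that fall inside the set distance range. A minimal sketch, assuming the range is given by two thresholds that are treated as inclusive (the text does not say whether the bounds are inclusive); the name `distance_mean` and the example values are hypothetical.

```python
def distance_mean(first_distances, lower, upper):
    """Average the first sample distances that lie within [lower, upper]."""
    in_range = [d for d in first_distances if lower <= d <= upper]
    if not in_range:
        raise ValueError("no first sample distance falls within the set distance range")
    total = sum(in_range)         # total sample distance
    return total / len(in_range)  # quotient of total distance and sample count


# Hypothetical first sample distances; 4.0 lies outside the set distance range.
mean = distance_mean([0.5, 1.0, 1.5, 4.0], lower=0.5, upper=2.0)
print(mean)  # 1.0
```

Excluding the outlying distance 4.0 keeps the resulting scan radius close to the typical spacing of the samples.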
On the basis of the above embodiment, the device further includes: a K-nearest-neighbor graph construction module, configured to construct, before the first sample distance corresponding to each sample in the sample set is determined, a K-nearest-neighbor graph of the samples in the sample set, the weight of each edge in the K-nearest-neighbor graph being the distance between the corresponding samples.
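A brute-force sketch of the K-nearest-neighbor graph and of the first sample distances derived from it is given below. It assumes Euclidean distance between feature vectors and reads the first sample distance as the distance to the S-th nearest neighbor (counted from 1); the distance metric and the names `knn_graph` and `first_sample_distances` are assumptions made for illustration.

```python
import math


def knn_graph(points, k):
    """Return {i: [(distance, j), ...]} holding the k nearest neighbors of each point."""
    graph = {}
    for i, p in enumerate(points):
        ranked = sorted(
            (math.dist(p, q), j) for j, q in enumerate(points) if j != i
        )
        graph[i] = ranked[:k]  # edge weight = distance between the two samples
    return graph


def first_sample_distances(graph, s):
    """Distance from each sample to its s-th nearest neighbor (requires s <= k)."""
    return {i: edges[s - 1][0] for i, edges in graph.items()}


# Hypothetical one-dimensional samples with K = 2 and S = 1.
points = [(0.0,), (1.0,), (3.0,), (7.0,)]
graph = knn_graph(points, k=2)
first = first_sample_distances(graph, s=1)
print(first)  # {0: 1.0, 1: 1.0, 2: 2.0, 3: 4.0}
```

Because S is smaller than K, the first sample distances can be read directly from the edges already stored in the graph without a second neighbor search.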
The sample clustering device provided by the embodiment of the present invention is included in sample clustering equipment and can be used to execute the sample clustering method provided by any of the above embodiments, with the corresponding functions and beneficial effects.
Embodiment four
Fig. 10 is a schematic structural diagram of sample clustering equipment provided by Embodiment four of the present invention. As shown in Fig. 10, the sample clustering equipment includes a processor 40, a memory 41, an input device 42 and an output device 43. The number of processors 40 in the sample clustering equipment may be one or more; one processor 40 is taken as an example in Fig. 10. The processor 40, the memory 41, the input device 42 and the output device 43 in the sample clustering equipment may be connected by a bus or in other ways; connection by a bus is taken as an example in Fig. 10.
As a computer-readable storage medium, the memory 41 can be used to store software programs, computer-executable programs and modules, such as the program instructions/modules corresponding to the sample clustering method in the embodiments of the present invention (for example, the distance statistics module 301, the distance acquisition module 302, the mean calculation module 303, the connection determination module 304 and the sample clustering module 305 in the sample clustering device). By running the software programs, instructions and modules stored in the memory 41, the processor 40 executes the various functional applications and data processing of the sample clustering equipment, that is, implements the above sample clustering method.
The memory 41 may mainly include a program storage area and a data storage area. The program storage area may store an operating system and an application program required by at least one function; the data storage area may store data created according to the use of the sample clustering equipment, and the like. In addition, the memory 41 may include a high-speed random access memory, and may also include a non-volatile memory, such as at least one magnetic disk storage device, a flash memory device or another non-volatile solid-state storage device. In some examples, the memory 41 may further include memories located remotely from the processor 40, and these remote memories may be connected to the sample clustering equipment through a network. Examples of such networks include, but are not limited to, the Internet, an intranet, a local area network, a mobile communication network and combinations thereof.
The input device 42 can be used to receive input numeric or character information and to generate key signal inputs related to the user settings and function control of the sample clustering equipment. The output device 43 may include a display device such as a display screen.
The above sample clustering equipment includes the sample clustering device, can be used to execute any of the sample clustering methods, and has the corresponding functions and beneficial effects.
Embodiment five
An embodiment of the present invention further provides a storage medium containing computer-executable instructions which, when executed by a computer processor, are used to execute a sample clustering method, the method including:
determining the first sample distance corresponding to each sample in a sample set, the first sample distance being the distance between the sample and the S-th nearest-neighbor sample of the sample;
obtaining, from all the first sample distances, the first sample distances within a set distance range;
calculating a distance mean based on the first sample distances within the set distance range;
determining all connection samples of each sample based on the K-nearest-neighbor sample set corresponding to each sample, where K > S, and the sample and each connection sample of the sample are nearest-neighbor samples of each other and have a connection relationship;
clustering the samples in the sample set according to the connection samples, the distance mean and the S value, the distance mean being the scan radius and the S value being the minimum number of samples in a cluster.
Of course, in the storage medium containing computer-executable instructions provided by the embodiment of the present invention, the computer-executable instructions are not limited to the method operations described above, and can also execute related operations in the sample clustering method provided by any embodiment of the present invention.
From the above description of the embodiments, those skilled in the art can clearly understand that the present invention can be implemented by software plus the necessary general-purpose hardware, and can of course also be implemented by hardware, although in many cases the former is the better implementation. Based on this understanding, the technical solution of the present invention, in essence or in the part that contributes to the prior art, can be embodied in the form of a software product. The computer software product can be stored in a computer-readable storage medium, such as a computer floppy disk, a read-only memory (Read-Only Memory, ROM), a random access memory (Random Access Memory, RAM), a flash memory (FLASH), a hard disk or an optical disc, and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device or the like) to execute the methods described in the embodiments of the present invention.
It is worth noting that, in the above embodiment of the sample clustering device, the units and modules included are divided only according to functional logic, but the division is not limited to the above as long as the corresponding functions can be realized. In addition, the specific names of the functional units are only for ease of distinguishing them from one another and are not intended to limit the protection scope of the present invention.
Note that the above are only preferred embodiments of the present invention and the technical principles applied. Those skilled in the art will understand that the present invention is not limited to the specific embodiments described herein, and that various obvious changes, readjustments and substitutions can be made by those skilled in the art without departing from the protection scope of the present invention. Therefore, although the present invention has been described in further detail through the above embodiments, the present invention is not limited to the above embodiments and may include further equivalent embodiments without departing from the inventive concept; the scope of the present invention is determined by the scope of the appended claims.
Claims (12)
1. A sample clustering method, characterized by comprising:
determining the first sample distance corresponding to each sample in a sample set, the first sample distance being the distance between the sample and the S-th nearest-neighbor sample of the sample;
obtaining, from all the first sample distances, the first sample distances within a set distance range;
calculating a distance mean based on the first sample distances within the set distance range;
determining all connection samples of each sample based on the K-nearest-neighbor sample set corresponding to each sample, wherein K > S, and the sample and each connection sample of the sample are nearest-neighbor samples of each other and have a connection relationship;
clustering the samples in the sample set according to the connection samples, the distance mean and the S value, the distance mean being the scan radius and the S value being the minimum number of samples in a cluster.
2. The sample clustering method according to claim 1, characterized in that clustering the samples in the sample set according to the connection samples, the distance mean and the S value comprises:
filtering all connection samples based on the distance mean so as to filter out the connection samples whose second sample distance is greater than the distance mean, the second sample distance being the distance between a sample and a connection sample of the sample;
clustering the samples in the sample set based on the S value and the connection samples obtained after filtering.
3. The sample clustering method according to claim 2, characterized in that clustering the samples in the sample set based on the S value and the connection samples obtained after filtering comprises:
counting, in turn, the total number of connection samples of each sample;
taking the samples whose total number of connection samples is greater than the S value as core samples;
selecting any core sample among all the obtained core samples as the current sample;
accessing all connection samples of the current sample;
taking each connection sample obtained by the access as a vertex, and accessing all connection samples corresponding to the vertex;
repeating the operation of taking each connection sample obtained by the access as a vertex and accessing all connection samples corresponding to the vertex, until no new connection sample is obtained by the access;
updating any unvisited core sample to be the current sample, and returning to the operation of accessing all connection samples of the current sample, until all core samples have been visited;
clustering the current sample and the connection samples obtained by access from the current sample into a cluster.
4. The sample clustering method according to claim 1, characterized in that determining all connection samples of each sample based on the K-nearest-neighbor sample set corresponding to each sample comprises:
obtaining the K-nearest-neighbor sample set corresponding to each sample;
constructing an adjacency matrix according to all the K-nearest-neighbor sample sets, each element of the adjacency matrix representing the neighbor relationship between the two corresponding samples;
counting the non-zero elements in the adjacency matrix so as to determine all connection samples of each sample.
5. The sample clustering method according to claim 4, characterized in that counting the non-zero elements in the adjacency matrix so as to determine all connection samples of each sample comprises:
obtaining, in the adjacency matrix, the element groups at symmetric positions, each element group including a first element in row i, column j and a second element in row j, column i;
if at least one of the first element and the second element is a zero element, setting both the first element and the second element to zero;
updating the adjacency matrix after all element groups of the adjacency matrix have been traversed;
counting the non-zero elements in the updated adjacency matrix, and determining that the two samples corresponding to each non-zero element are nearest-neighbor samples of each other and have a connection relationship;
obtaining all connection samples of each sample based on the mutually neighboring samples.
6. The sample clustering method according to claim 1, characterized in that obtaining, from all the first sample distances, the first sample distances within a set distance range comprises:
constructing a frequency distribution histogram based on all the first sample distances;
counting the frequency of each bin in the frequency distribution histogram so as to determine the set distance range;
obtaining the first sample distances within the set distance range.
7. The sample clustering method according to claim 6, characterized in that counting the frequency of each bin in the frequency distribution histogram so as to determine the set distance range comprises:
obtaining the bin with the maximum frequency in the frequency distribution histogram;
calculating the frequency drop between adjacent rear bins, a rear bin being a bin located after the maximum-frequency bin in the frequency distribution histogram;
confirming the pair of adjacent rear bins with the largest frequency drop, and selecting, within that pair, the bin located further back;
using the first sample distance corresponding to the maximum-frequency bin and the first sample distance corresponding to the selected bin as the distance thresholds of the set distance range.
8. The sample clustering method according to claim 1, characterized in that calculating a distance mean based on the first sample distances within the set distance range comprises:
obtaining the number of samples whose first sample distance is within the set distance range;
adding up the first sample distances within the set distance range to obtain a total sample distance;
using the quotient of the total sample distance and the number of samples as the distance mean.
9. The sample clustering method according to claim 1, characterized in that, before determining the first sample distance corresponding to each sample in the sample set, the method further comprises:
constructing a K-nearest-neighbor graph of the samples in the sample set, the weight of each edge in the K-nearest-neighbor graph being the distance between the corresponding samples.
10. A sample clustering device, characterized by comprising:
a distance statistics module, configured to determine the first sample distance corresponding to each sample in a sample set, the first sample distance being the distance between the sample and the S-th nearest-neighbor sample of the sample;
a distance acquisition module, configured to obtain, from all the first sample distances, the first sample distances within a set distance range;
a mean calculation module, configured to calculate a distance mean based on the first sample distances within the set distance range;
a connection determination module, configured to determine all connection samples of each sample based on the K-nearest-neighbor sample set corresponding to each sample, wherein K > S, and the sample and each connection sample of the sample are nearest-neighbor samples of each other and have a connection relationship;
a sample clustering module, configured to cluster the samples in the sample set according to the connection samples, the distance mean and the S value, the distance mean being the scan radius and the S value being the minimum number of samples in a cluster.
11. Sample clustering equipment, characterized in that the sample clustering equipment comprises:
one or more processors;
a memory, configured to store one or more programs;
when the one or more programs are executed by the one or more processors, the one or more processors are caused to implement the sample clustering method according to any one of claims 1-9.
12. A computer-readable storage medium on which a computer program is stored, characterized in that the program, when executed by a processor, implements the sample clustering method according to any one of claims 1-9.
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910551643.8A CN110276401A (en) | 2019-06-24 | 2019-06-24 | Sample clustering method, apparatus, equipment and storage medium |
PCT/CN2019/126716 WO2020258772A1 (en) | 2019-06-24 | 2019-12-19 | Sample clustering method, apparatus and device and storage medium |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910551643.8A CN110276401A (en) | 2019-06-24 | 2019-06-24 | Sample clustering method, apparatus, equipment and storage medium |
Publications (1)
Publication Number | Publication Date |
---|---|
CN110276401A (en) | 2019-09-24 |
Family
ID=67961664
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910551643.8A (Pending, published as CN110276401A) | 2019-06-24 | 2019-06-24 | Sample clustering method, apparatus, equipment and storage medium |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN110276401A (en) |
WO (1) | WO2020258772A1 (en) |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103902655A (en) * | 2014-02-28 | 2014-07-02 | 小米科技有限责任公司 | Clustering method and device and terminal device |
CN105930856A (en) * | 2016-03-23 | 2016-09-07 | 深圳市颐通科技有限公司 | Classification method based on improved DBSCAN-SMOTE algorithm |
CN108776806A (en) * | 2018-05-08 | 2018-11-09 | 河海大学 | Mixed attributes data clustering method based on variation self-encoding encoder and density peaks |
Family Cites Families (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
TWI385544B (en) * | 2009-09-01 | 2013-02-11 | Univ Nat Pingtung Sci & Tech | Density-based data clustering method |
CN110276401A (en) * | 2019-06-24 | 2019-09-24 | 广州视源电子科技股份有限公司 | Sample clustering method, apparatus, equipment and storage medium |
Cited By (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2020258772A1 (en) * | 2019-06-24 | 2020-12-30 | 广州视源电子科技股份有限公司 | Sample clustering method, apparatus and device and storage medium |
CN111366160A (en) * | 2020-05-25 | 2020-07-03 | 深圳市城市交通规划设计研究中心股份有限公司 | Path planning method, path planning device and terminal equipment |
CN111366160B (en) * | 2020-05-25 | 2020-10-27 | 深圳市城市交通规划设计研究中心股份有限公司 | Path planning method, path planning device and terminal equipment |
CN113239964A (en) * | 2021-04-13 | 2021-08-10 | 联合汽车电子有限公司 | Vehicle data processing method, device, equipment and storage medium |
CN113239963A (en) * | 2021-04-13 | 2021-08-10 | 联合汽车电子有限公司 | Vehicle data processing method, device, equipment, vehicle and storage medium |
CN113239963B (en) * | 2021-04-13 | 2024-03-01 | 联合汽车电子有限公司 | Method, device, equipment, vehicle and storage medium for processing vehicle data |
CN113239964B (en) * | 2021-04-13 | 2024-03-01 | 联合汽车电子有限公司 | Method, device, equipment and storage medium for processing vehicle data |
CN114093521A (en) * | 2022-01-20 | 2022-02-25 | 广东工业大学 | Random forest based method and system for estimating blood sugar by reconstructing homogenized samples |
CN114093521B (en) * | 2022-01-20 | 2022-04-12 | 广东工业大学 | Random forest based method and system for estimating blood sugar by reconstructing homogenized samples |
Also Published As
Publication number | Publication date |
---|---|
WO2020258772A1 (en) | 2020-12-30 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110276401A (en) | Sample clustering method, apparatus, equipment and storage medium | |
Tao et al. | Approximate MaxRS in spatial databases | |
CN113593633B (en) | Convolutional neural network-based drug-protein interaction prediction model | |
CN109493119B (en) | POI data-based urban business center identification method and system | |
CN109067725A (en) | Network flow abnormal detecting method and device | |
CN109359115B (en) | Distributed storage method, device and system based on graph database | |
Du et al. | Representation and discovery of building patterns: A three-level relational approach | |
CN111737481B (en) | Method, device, equipment and storage medium for noise reduction of knowledge graph | |
Djenouri et al. | Fast and accurate deep learning framework for secure fault diagnosis in the industrial internet of things | |
Cuevas et al. | Evolutionary computation techniques: a comparative perspective | |
CN110781971A (en) | Merchant type identification method, device, equipment and readable medium | |
GB2534903A (en) | Method and apparatus for processing signal data | |
CN109661671A (en) | Improvement using boundary bitmap to image classification | |
López et al. | Automatic multi‐circle detection on images using the teaching learning based optimisation algorithm | |
CN114581473A (en) | Point cloud down-sampling method and device suitable for various scenes | |
Hossein-Abad et al. | Fuzzy c-means clustering method with the fuzzy distance definition applied on symmetric triangular fuzzy numbers | |
CN112005525A (en) | System and method for extracting structure from large, dense and noisy networks | |
CN116991955A (en) | Data processing method, device, electronic equipment and computer storage medium | |
CN104954873B (en) | Smart television video customization method and system | |
Guada et al. | A novel edge detection algorithm based on a hierarchical graph-partition approach | |
CN115830342A (en) | Method and device for determining detection frame, storage medium and electronic device | |
CN109032940A (en) | Test scene input method, device, equipment and storage medium | |
CN108256694A (en) | Fuzzy time series forecasting system, method and device based on repeated genetic algorithm | |
CN106373151B (en) | Convex hull acquisition method and device | |
CN110427558A (en) | The method for pushing and device of Energy Resources Service's director's part |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | Application publication date: 20190924 |