CN108846429B - Unsupervised learning-based network space resource automatic classification method and unsupervised learning-based network space resource automatic classification device - Google Patents


Info

Publication number
CN108846429B
Authority
CN
China
Prior art keywords
clustering
value
class
resources
resource
Legal status
Active
Application number
CN201810548471.4A
Other languages
Chinese (zh)
Other versions
CN108846429A (en)
Inventor
王继龙 (Wang Jilong)
缪葱葱 (Miao Congcong)
徐超 (Xu Chao)
Current Assignee
Tsinghua University
Original Assignee
Tsinghua University
Application filed by Tsinghua University
Priority to CN201810548471.4A
Publication of CN108846429A
Application granted
Publication of CN108846429B
Legal status: Active
Anticipated expiration


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques
    • G06F18/232 Non-hierarchical techniques
    • G06F18/2321 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G06F18/23213 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/088 Non-supervised learning, e.g. competitive learning
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Computational Linguistics (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Probability & Statistics with Applications (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses an unsupervised learning-based method and device for automatically classifying network space resources. The method comprises the following steps: collecting resources of the network environment where the classifier is located to generate a resource set; labeling the attributes of the resources in the resource set according to preset n-dimensional attributes to generate a new resource set; extracting features from the new resource set through an extraction function to obtain feature vectors, which form an initial sample space; obtaining a K value by the Parametric Bootstrap method and clustering the initial sample space with K-means so as to divide it into K classes of resources; and assigning the K classes of resources to the network space resource map classes corresponding to the clustering center of each class, so as to refine the network space resource map. The method can expand and supplement the network space resource framework according to the clustering result and facilitates the construction of a network space resource map.

Description

Unsupervised learning-based network space resource automatic classification method and device
Technical Field
The invention relates to the technical field of network space surveying and mapping, in particular to a network space resource automatic classification method and device based on unsupervised learning.
Background
Cyberspace has become the fifth domain of human society. It spans multiple dimensions such as politics, economy, military affairs, culture, society and ecology, and is developing into a new world parallel to the physical world. In recent years, with the development and diversification of internet technology, network space resources have become increasingly diverse, yet cyberspace still lacks a basic conceptual model and spatial theoretical foundation. Many resources in the network exist objectively, but they have not yet been named systematically and comprehensively, and in particular they have not been named and described from a standpoint inside cyberspace itself. The diversity and complexity of network resources cause considerable trouble for managers and users.
The development of the internet has driven explosive growth in the number of cyberspace resources and the volume of data. Network space resources are entity resources that can be directly perceived in cyberspace, including network application services, information resources and virtual subjects. To better express the connection between cyberspace and the physical world, network infrastructure is explicitly included in the scope of network space resource research. Classifying network space resources is therefore particularly important, both for using them more systematically and efficiently and for improving the security of the network as a territory.
The complexity and diversity of network space resources, the speed at which their number grows, and the speed at which new kinds of resources appear all make classification by manual labeling alone infeasible, so network space resources must be classified automatically by an algorithm. Just as living organisms are organized into taxonomic maps, network space resources also need a map by which they can be classified.
Disclosure of Invention
The present invention is directed to solving, at least in part, one of the technical problems in the related art.
Therefore, an object of the present invention is to provide a method for automatically classifying network space resources based on unsupervised learning, which can automatically classify network resources and is beneficial to the construction of a network space resource map.
The invention also aims to provide a network space resource automatic classification device based on unsupervised learning.
In order to achieve the above object, an embodiment of the present invention provides a method for automatically classifying network space resources based on unsupervised learning, including the following steps: collecting resources of the network environment where the classifier is located to generate a resource set; labeling the attributes of the resources in the resource set according to preset n-dimensional attributes to generate a new resource set; extracting features from the new resource set through an extraction function to obtain feature vectors, which form an initial sample space; obtaining a K value by the Parametric Bootstrap method and clustering the initial sample space with K-means so as to divide it into K classes of resources; and assigning the K classes of resources to the network space resource map classes corresponding to the clustering center of each class, so as to refine the network space resource map.
In the unsupervised learning-based network space resource automatic classification method according to the embodiment of the present invention, the attributes of network space resources are labeled manually, the relevant feature vectors are extracted, the value of K is determined by the Parametric Bootstrap method, and the center positions of the K clusters are stored. The network space resource framework can then be expanded and supplemented according to the clustering result, so that network resources are classified automatically and the construction of a network space resource map is facilitated.
In addition, the unsupervised learning-based cyberspace resource automatic classification method according to the above embodiment of the present invention may further have the following additional technical features:
Further, in an embodiment of the present invention, obtaining the K value by the Parametric Bootstrap method further includes: after the feature vectors are obtained, first setting K to a preset value and obtaining the statistics of the K classes by the K-means method to build a statistical model; generating a data sample set from the statistical model; defining an index for evaluating clustering quality and, starting from K+1 and incrementing by 1 each time, examining the WSS of clustering the simulated samples one by one; and when the WSS of the clustering satisfies a preset condition, accepting K+1 classes, then continuing to increase K until the preset condition is no longer satisfied, thereby determining the K value.
Further, in an embodiment of the present invention, clustering the initial sample space by K-means further includes: arbitrarily selecting K feature vectors from the initial sample space as initial clustering centers; computing the distances between the other objects in the initial sample space and the clustering centers and assigning each object to the nearest center; and taking the mean of all objects in each class as the new clustering center of that class and computing the value of the objective function to update the clustering centers, until the updated clustering centers equal the previous ones or differ from them by less than a preset threshold.
Further, in an embodiment of the present invention, assigning the K classes of resources to the network space resource map classes corresponding to the clustering center of each class further includes: taking the clustering center of each class as its representative feature vector, assigning it to the most similar class in the map, and assigning the network resources corresponding to the other feature vectors in its cluster to that same class.
Further, in an embodiment of the present invention, assigning the K classes of resources to the network space resource map classes corresponding to the clustering center of each class further includes: if the difference between the feature value of a clustering center and every current class is greater than an upper-bound threshold, adding a new class and assigning the resources to it.
In order to achieve the above object, another embodiment of the present invention provides an apparatus for automatically classifying cyberspace resources based on unsupervised learning, including: an acquisition module, configured to collect resources of the network environment where the classifier is located to generate a resource set; a labeling module, configured to label the attributes of the resources in the resource set according to preset n-dimensional attributes to generate a new resource set; an extraction module, configured to extract features from the new resource set through an extraction function to obtain feature vectors, which form an initial sample space; an initialization module, configured to obtain a K value by the Parametric Bootstrap method and cluster the initial sample space with K-means so as to divide it into K classes of resources; and a classification module, configured to assign the K classes of resources to the network space resource map classes corresponding to the clustering center of each class, so as to refine the network space resource map.
In the unsupervised learning-based network space resource automatic classification device according to the embodiment of the present invention, the attributes of network space resources are labeled manually, the relevant feature vectors are extracted, the value of K is determined by the Parametric Bootstrap method, and the center positions of the K clusters are stored. The network space resource framework can then be expanded and supplemented according to the clustering result, so that network resources are classified automatically and the construction of a network space resource map is facilitated.
In addition, the unsupervised learning-based cyberspace resource automatic classification apparatus according to the above embodiment of the present invention may further have the following additional technical features:
Further, in an embodiment of the present invention, the initialization module is further configured to: after the feature vectors are obtained, first set K to a preset value, obtain the statistics of the K classes by the K-means method to build a statistical model, generate a data sample set from that model, define an index for evaluating clustering quality, examine the WSS of clustering the simulated samples one by one starting from K+1 and incrementing by 1 each time, accept K+1 classes when the WSS of the clustering satisfies a preset condition, and continue increasing K until the preset condition is no longer satisfied, thereby determining the K value.
Further, in an embodiment of the present invention, the initialization module is further configured to arbitrarily select K feature vectors from the initial sample space as initial clustering centers, compute the distances between the other objects in the initial sample space and the clustering centers, take the mean of all objects in each class as the clustering center of that class, and compute the value of the objective function to update the clustering centers, until the updated clustering centers equal the previous clustering centers or differ from them by less than a preset threshold.
Further, in an embodiment of the present invention, the classification module is further configured to take the clustering center of each class as its representative feature vector, assign it to the most similar class, and assign the network resources corresponding to the other feature vectors in that cluster to the same class.
Further, in an embodiment of the present invention, the classification module is further configured to add a new class and assign the resources to it when the difference between the feature value and every current class is greater than the upper-bound threshold.
Additional aspects and advantages of the invention will be set forth in part in the description which follows and, in part, will be obvious from the description, or may be learned by practice of the invention.
Drawings
The foregoing and/or additional aspects and advantages of the present invention will become apparent and readily appreciated from the following description of the embodiments, taken in conjunction with the accompanying drawings of which:
FIG. 1 is a flow chart of a method for automatically classifying cyberspace resources based on unsupervised learning according to an embodiment of the present invention;
FIG. 2 is a flow chart of a method for unsupervised learning-based automatic classification of cyberspace resources according to another embodiment of the present invention;
FIG. 3 is a diagram illustrating an embodiment of a method for automatically classifying cyberspace resources based on unsupervised learning according to an embodiment of the present invention;
FIG. 4 is a schematic structural diagram of an unsupervised learning-based cyberspace resource automatic classification apparatus according to an embodiment of the present invention.
Detailed Description
Reference will now be made in detail to embodiments of the present invention, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to the same or similar elements or elements having the same or similar functions throughout. The embodiments described below with reference to the drawings are illustrative, are intended to explain the invention, and are not to be construed as limiting it.
The following describes a method and an apparatus for automatically classifying cyberspace resources based on unsupervised learning according to an embodiment of the present invention with reference to the accompanying drawings, and first, the method for automatically classifying cyberspace resources based on unsupervised learning according to an embodiment of the present invention will be described with reference to the accompanying drawings.
Fig. 1 is a flowchart of a network space resource automatic classification method based on unsupervised learning according to an embodiment of the present invention.
As shown in fig. 1, the unsupervised learning-based network space resource automatic classification method includes the following steps:
In step S101, resources of the network environment where the classifier is located are collected to generate a resource set.
It can be understood that, as shown in FIG. 2, the embodiment of the present invention first collects the cyberspace resources at the location where the classifier is deployed and puts them into the set U.
Specifically, as shown in FIG. 3, cyberspace resources are collected from the network environment where the classifier is located and form the set U. A network space resource set U, representing the "unclassified network space resources", is established; each network space resource collected in the classifier's network environment is added to U and labeled in turn as $C_1, C_2, \ldots, C_i, \ldots$, so that $U = \{C_1, C_2, \ldots, C_i, \ldots\}$.
In step S102, attribute marking is performed on resources of the resource set according to the preset n-dimensional attributes, so as to generate a new resource set.
It can be understood that, as shown in FIG. 2, the resource attribute labeling in the embodiment of the present invention labels the various network space resources according to manually defined n-dimensional attributes. That is, the attributes of the resources in U are labeled according to the designed n-dimensional attributes to form C.
Specifically, each element of U is a resource in the network environment where the classifier is located. Because network space resources are diverse, the elements may take different forms in practice, so to classify them automatically each element must be labeled in a uniform way by selecting a number of attributes to represent it; the number of attributes and the labeling scheme can be chosen according to the actual situation. For ease of presentation, suppose n attributes are selected to label each resource $C_i$; then $C_i$ can be expressed as:
[Formula image not reproduced in this copy: it expresses $C_i$ in terms of its n labeled attributes.]
In step S103, features are extracted from the new resource set through an extraction function to obtain feature vectors, which form the initial sample space.
It can be understood that, as shown in FIG. 2, the feature vector extraction in the embodiment of the present invention extracts feature vectors from the labeled resource attributes. That is, the embodiment customizes a feature vector extraction function Feature_Extraction() as required, applies it to each $C_i$ to obtain $R_i$, and forms the initial sample space Z.
Specifically, a feature vector extraction function Feature_Extraction() is customized according to the number of attributes selected in step S102 and the actual situation, and is applied to each labeled resource $C_i$ to obtain its feature vector $R_i$: $R_i \leftarrow \text{Feature\_Extraction}(C_i)$. All $R_i$ together form the initial sample Z, $Z = \{R_1, R_2, \ldots, R_n\}$.
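The patent leaves both the attribute set and Feature_Extraction() to the implementer. As a minimal sketch only, assuming a hypothetical four-attribute schema (open port count, response time, whether the resource is an HTTP service, and transport protocol) that does not come from the source, labeled resources $C_i$ could be turned into numeric vectors $R_i$ like this:

```python
from dataclasses import dataclass
from typing import Dict, List

# Hypothetical categorical vocabulary for the assumed "protocol" attribute.
PROTOCOLS = ["tcp", "udp", "icmp"]


@dataclass
class Resource:
    """A labeled resource C_i: a name plus its n attribute values (hypothetical schema)."""
    name: str
    attrs: Dict[str, object]


def feature_extraction(c: Resource) -> List[float]:
    """Hypothetical Feature_Extraction(): map a labeled resource C_i to a numeric R_i."""
    vec = [
        float(c.attrs["open_ports"]),                 # numeric attribute used as-is
        float(c.attrs["response_time_ms"]) / 1000.0,  # rescale milliseconds to seconds
        1.0 if c.attrs["is_http_service"] else 0.0,   # boolean attribute as 0/1
    ]
    # One-hot encode the categorical protocol attribute.
    vec.extend(1.0 if c.attrs["protocol"] == p else 0.0 for p in PROTOCOLS)
    return vec


def build_sample_space(resources: List[Resource]) -> List[List[float]]:
    """Apply Feature_Extraction to every C_i to form Z = {R_1, ..., R_n}."""
    return [feature_extraction(c) for c in resources]
```

Scaling and one-hot encoding are included because K-means compares vectors by distance, so attributes on very different scales or with categorical values would otherwise dominate or be meaningless.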
In step S104, the K value is obtained by the Parametric Bootstrap method, and the initial sample space is clustered by K-means so as to divide it into K classes of resources.
It can be understood that, as shown in FIG. 2, the embodiment of the present invention clusters resources with the K-means method, dividing the feature vectors corresponding to the resources into K classes: the Parametric Bootstrap method is used to obtain the K value, and K-means clustering is then applied to the initial sample space Z to divide it into K classes.
That is, automatic classification of cyberspace resources is implemented with the K-means clustering algorithm. Because cyberspace resources are diverse, a suitable K value for K-means is hard to know in advance, so this step is divided into two sub-parts: determining the K value, and clustering with K-means to achieve automatic classification.
Further, in an embodiment of the present invention, obtaining the K value by the Parametric Bootstrap method further comprises: after the feature vectors are obtained, first setting K to a preset value and obtaining the statistics of the K classes by the K-means method to build a statistical model; generating a data sample set from the statistical model; defining an index for evaluating clustering quality and, starting from K+1 and incrementing by 1 each time, examining the WSS of clustering the simulated samples one by one; and when the WSS of the clustering satisfies the preset condition, accepting K+1 classes, then continuing to increase K until the preset condition is no longer satisfied, thereby determining the K value.
In particular, this classifier uses the Parametric Bootstrap method to determine the value of K. The Parametric Bootstrap method is an extension of the Bootstrap method. Bootstrap is an important statistical method for estimating the variance of a statistic and, from it, constructing interval estimates. Its central idea is to construct confidence intervals by resampling from the sample. In abstract terms, a single estimate computed from the sample does not exhaust the information the sample contains, and Bootstrap exploits the remaining information through resampling to construct the confidence interval. A key point of Bootstrap is that each resample has the same size as the original sample, and the Parametric Bootstrap method follows the same rule. Unlike the Bootstrap method, however, it does not resample the original set: it specifies a mathematical model, fits it, and then simulates new sample data from that model, which can be repeated many times.
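To make the distinction concrete, the sketch below estimates the standard error of a sample mean both ways: the ordinary bootstrap resamples the data with replacement, while the parametric bootstrap fits a model (assumed Gaussian here, purely for illustration) and simulates same-size samples from it. The function names and defaults are illustrative and not from the source.

```python
import random
import statistics
from typing import List


def bootstrap_se(sample: List[float], b: int = 1000, seed: int = 0) -> float:
    """Nonparametric bootstrap: resample the original data with replacement."""
    rng = random.Random(seed)
    n = len(sample)
    # Each resample has the same size n as the original sample.
    means = [statistics.fmean(rng.choices(sample, k=n)) for _ in range(b)]
    return statistics.stdev(means)


def parametric_bootstrap_se(sample: List[float], b: int = 1000, seed: int = 0) -> float:
    """Parametric bootstrap: fit a model (assumed Gaussian) and simulate from it."""
    rng = random.Random(seed)
    n = len(sample)
    mu, sigma = statistics.fmean(sample), statistics.stdev(sample)
    # The original data are never resampled; every new sample comes from the fitted model.
    means = [statistics.fmean([rng.gauss(mu, sigma) for _ in range(n)]) for _ in range(b)]
    return statistics.stdev(means)
```

Both estimates should land near the analytic value $s/\sqrt{n}$ for this simple statistic; the parametric variant is what the classifier relies on, because it lets simulated data be generated from a fitted K-class model.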
In the classifier, the specific operation is as follows:
(1) After the feature vectors $R_i$ are obtained, K is set to a small value (for example, K = 2), and statistics of the K classes, such as the mean vector and covariance matrix of each class, are obtained with the K-means method.
(2) A mathematical model is chosen according to the actual scenario and experience, assuming that the original data were randomly generated from this model (for example, a Gaussian model). With the statistics obtained in step (1), the model is used to regenerate a data sample set of the same size as the original.
(3) An index for evaluating clustering quality is designed (for example, the total within-cluster sum of squares, WSS). Starting from K+1 and incrementing by 1 each time, the WSS of clustering the simulated samples is examined one by one.
(4) K is determined by the following strategy: as long as the WSS obtained by K-means clustering of the real data into K+1 classes is at most μ times the WSS of the samples simulated from the K-class model (μ is a manually set threshold, generally 0.85 ≤ μ ≤ 1), the K+1 classes are accepted; K is then increased step by step until this condition no longer holds.
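Steps (1) to (4) can be sketched as follows, with $\mathrm{WSS} = \sum_{j}\sum_{R_i \in \text{class } j} \lVert R_i - c_j \rVert^2$. This is an illustration under stated assumptions, not the patented implementation: the fitted model is assumed to be one Gaussian per class with diagonal covariance (per-dimension standard deviations instead of a full covariance matrix), the simulated WSS is averaged over b simulated samples, and K-means uses k-means++ seeding with restarts (a swapped-in refinement of random initialization) so that the WSS comparison is less sensitive to local optima. The names choose_k, simulate, k0, b and k_max are hypothetical.

```python
import random
import statistics
from typing import List, Tuple

Vector = List[float]


def sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def assign(z: List[Vector], centers: List[Vector]) -> List[int]:
    """Index of the nearest center for every point."""
    return [min(range(len(centers)), key=lambda j: sq_dist(r, centers[j])) for r in z]


def kmeans(z: List[Vector], k: int, rng: random.Random, n_init: int = 5,
           max_iter: int = 100) -> Tuple[List[Vector], List[int], float]:
    """Lloyd's K-means with k-means++ seeding and restarts; returns (centers, labels, WSS)."""
    best = None
    for _ in range(n_init):
        centers = [list(rng.choice(z))]
        while len(centers) < k:  # k-means++: pick the next center with probability ~ D^2
            d2 = [min(sq_dist(r, c) for c in centers) for r in z]
            pick = rng.choices(z, weights=d2)[0] if sum(d2) > 0 else rng.choice(z)
            centers.append(list(pick))
        for _ in range(max_iter):
            labels = assign(z, centers)
            new = []
            for j in range(k):
                members = [r for r, lab in zip(z, labels) if lab == j]
                new.append([statistics.fmean(col) for col in zip(*members)] if members else centers[j])
            if new == centers:  # assignments are stable
                break
            centers = new
        labels = assign(z, centers)
        wss = sum(sq_dist(r, centers[lab]) for r, lab in zip(z, labels))
        if best is None or wss < best[2]:
            best = (centers, labels, wss)
    return best


def simulate(z: List[Vector], centers: List[Vector], labels: List[int],
             rng: random.Random) -> List[Vector]:
    """Regenerate a same-size sample from the fitted per-class diagonal Gaussian model."""
    sim = []
    for j, c in enumerate(centers):
        members = [r for r, lab in zip(z, labels) if lab == j]
        if not members:
            continue
        sds = [statistics.pstdev(col) for col in zip(*members)]
        sim.extend([rng.gauss(m, s) for m, s in zip(c, sds)] for _ in members)
    return sim


def choose_k(z: List[Vector], k0: int = 2, mu: float = 0.9, b: int = 10,
             k_max: int = 10, seed: int = 0) -> int:
    """Grow K while real WSS at K+1 <= mu * mean simulated WSS at K+1."""
    rng = random.Random(seed)
    k = k0
    while k < k_max:
        centers, labels, _ = kmeans(z, k, rng)  # step (1): fit the K-class model
        sim_wss = statistics.fmean(              # steps (2)-(3): simulate and cluster into K+1
            [kmeans(simulate(z, centers, labels, rng), k + 1, rng)[2] for _ in range(b)])
        real_wss = kmeans(z, k + 1, rng)[2]
        if real_wss > mu * sim_wss:              # step (4): K+1 is not clearly better
            break
        k += 1
    return k
```

While K is too small, the real data's WSS at K+1 falls well below what the fitted K-class model predicts; once K matches the structure in the data, the two become comparable and the loop is meant to stop there.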
Further, in an embodiment of the present invention, clustering the initial sample space by K-means further includes: arbitrarily selecting K feature vectors from the initial sample space as initial clustering centers; computing the distances between the other objects in the initial sample space and the clustering centers and assigning each object to the nearest center; and taking the mean of all objects in each class as the clustering center of that class and computing the value of the objective function to update the clustering centers, until the updated clustering centers equal the previous ones or differ from them by less than a preset threshold.
Specifically, the algorithm for realizing automatic resource classification by using K-means clustering is as follows:
(1) From the initial sample Z, K feature vectors are arbitrarily selected as initial cluster centers.
(2) For each other object $R_i$ in the initial sample Z, its distance to each cluster center is calculated. The Euclidean distance between two points can be used, or another distance can be chosen according to practical efficiency. Each object is then assigned to the class of its nearest cluster center according to the nearest-neighbor criterion.
(3) The mean of all objects in each class is taken as the new cluster center of that class, the value of the objective function is calculated, and the cluster centers are updated.
(4) Steps (2) and (3) are iterated until the newly generated cluster centers equal those of the previous iteration or their difference is less than a specified threshold ζ.
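A minimal pure-Python sketch of steps (1) to (4), assuming Euclidean distance and the within-cluster sum of squared distances as the objective function; the function signature and the defaults for zeta, max_iter and seed are illustrative rather than taken from the source.

```python
import math
import random
from typing import List, Tuple

Vector = List[float]


def kmeans(z: List[Vector], k: int, zeta: float = 1e-6, max_iter: int = 300,
           seed: int = 0) -> Tuple[List[Vector], List[int], float]:
    """K-means following steps (1)-(4); returns (centers, labels, objective)."""
    rng = random.Random(seed)
    # (1) Arbitrarily select K feature vectors from Z as the initial centers.
    centers = [list(v) for v in rng.sample(z, k)]
    labels: List[int] = []
    objective = math.inf
    for _ in range(max_iter):
        # (2) Assign every R_i to its nearest center (Euclidean distance).
        labels = [min(range(k), key=lambda j: math.dist(r, centers[j])) for r in z]
        # (3) New center of each class = mean of its members
        #     (an empty class keeps its previous center).
        new_centers = []
        for j in range(k):
            members = [r for r, lab in zip(z, labels) if lab == j]
            if members:
                new_centers.append([sum(col) / len(members) for col in zip(*members)])
            else:
                new_centers.append(centers[j])
        # Objective function: within-cluster sum of squared distances.
        objective = sum(math.dist(r, new_centers[lab]) ** 2 for r, lab in zip(z, labels))
        # (4) Stop once no center moves by zeta or more.
        shift = max(math.dist(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift < zeta:
            break
    return centers, labels, objective
```

Because step (1) picks the initial centers arbitrarily, plain K-means can settle in a local optimum on less well-separated data; running several random initializations and keeping the lowest objective, or using k-means++ seeding, are common remedies.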
In step S105, the K classes of resources are assigned to the cyberspace resource map classes corresponding to the clustering center of each class, so as to refine the cyberspace resource map.
It can be understood that, as shown in FIG. 2, in the resource classification of the embodiment of the present invention, each resource in U is assigned, according to the clustering result, to the class of its representative feature vector, so as to refine the network space resource map. That is, the K classes of resources obtained in S104 are assigned to the network space resource map classes corresponding to the cluster center $R'_j$ of each class.
Further, in an embodiment of the present invention, assigning the K classes of resources to the network space resource map classes corresponding to the clustering center of each class further includes: taking the clustering center of each class as its representative feature vector, assigning it to the most similar class, and assigning the network resources corresponding to the other feature vectors in its cluster to that class.
Specifically, step S104 divides the n feature vectors into K classes with cluster centers $R'_1, R'_2, \ldots, R'_K$. Each $R'_j$ is taken as the representative feature vector of its class and compared with every class in the existing network space resource map; it is assigned to the class it is most similar to, and the network resources corresponding to the other feature vectors in the class of $R'_j$ are assigned to the same class.
Further, in an embodiment of the present invention, assigning the K classes of resources to the network space resource map classes corresponding to the clustering center of each class further includes: if the difference between the feature value and every current class is greater than the upper-bound threshold, adding a new class and assigning the resources to it.
Specifically, if the difference between the feature value of $R'_j$ and every class in the existing framework is greater than an upper-bound threshold β, a new class is added to the existing framework, and the network resources corresponding to $R'_j$ and the other feature vectors in its class are assigned to the new class. A specific implementation of an embodiment of the present invention is shown in FIG. 3.
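The assignment rule of step S105, including the new-class branch controlled by β, might be sketched as below. The source does not specify how map classes are represented or how the "difference" is measured, so this sketch assumes each existing class is represented by a prototype feature vector and uses Euclidean distance as the difference; the `atlas` dictionary and the `new_class_j` naming are hypothetical.

```python
import math
from typing import Dict, List

Vector = List[float]


def categorize(centers: List[Vector], members: List[List[str]],
               atlas: Dict[str, Vector], beta: float) -> Dict[str, List[str]]:
    """Assign each cluster (center R'_j plus its resources) to the most similar
    map class, or add a new class when every existing class is farther than beta."""
    result: Dict[str, List[str]] = {name: [] for name in atlas}
    for j, center in enumerate(centers):
        # Find the existing class whose prototype is closest to R'_j.
        best_name, best_dist = None, math.inf
        for name, proto in atlas.items():
            d = math.dist(center, proto)
            if d < best_dist:
                best_name, best_dist = name, d
        if best_dist > beta:
            # Difference exceeds the upper bound: extend the framework with a new class.
            best_name = f"new_class_{j}"  # hypothetical naming scheme
            atlas[best_name] = list(center)
            result[best_name] = []
        # All resources of cluster j follow their representative vector.
        result[best_name].extend(members[j])
    return result
```

New classes are added to `atlas` as they are created, so a later cluster that resembles a freshly added class is merged into it rather than spawning yet another class.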
In the unsupervised learning-based network space resource automatic classification method provided by the embodiment of the invention, the network space resource attributes are labeled manually, the relevant feature vectors are extracted, the value of K is determined by the Parametric Bootstrap method, and the center positions of the K clusters are stored; the network space resource framework can then be expanded and supplemented according to the clustering result, which facilitates the construction of a network space resource map.
Next, an unsupervised learning-based cyberspace resource automatic classification apparatus according to an embodiment of the present invention will be described with reference to the drawings.
Fig. 4 is a schematic structural diagram of an unsupervised learning-based cyberspace resource automatic classification apparatus according to an embodiment of the present invention.
As shown in fig. 4, the apparatus 10 for automatically classifying cyberspace resources based on unsupervised learning includes: an acquisition module 100, a labeling module 200, an extraction module 300, an initialization module 400, and a categorization module 500.
The acquisition module 100 is configured to collect resources of the network environment where the classifier is located, so as to generate a resource set. The labeling module 200 is configured to label the attributes of the resources in the resource set according to the preset n-dimensional attributes to generate a new resource set. The extraction module 300 is configured to extract features from the new resource set through an extraction function to obtain feature vectors, which form an initial sample space. The initialization module 400 is configured to obtain a K value by the Parametric Bootstrap method and cluster the initial sample space with K-means, so as to divide it into K classes of resources. The categorization module 500 is configured to assign the K classes of resources to the network space resource map classes corresponding to the clustering center of each class, so as to refine the network space resource map. The device 10 of the embodiment of the invention can expand and supplement the network space resource framework according to the clustering result, which facilitates the construction of the network space resource map.
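One hypothetical way to wire the five modules together is to inject each module as a callable and chain them in the order of steps S101 to S105. The class and field names below are illustrative only; the patent does not define a software interface.

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Tuple

Vector = List[float]


@dataclass
class ResourceClassifierPipeline:
    """Hypothetical wiring of modules 100-500; each module is an injected callable."""
    acquire: Callable[[], List[Any]]            # acquisition module 100: builds U
    label: Callable[[Any], Any]                 # labeling module 200: resource -> C_i
    extract: Callable[[Any], Vector]            # extraction module 300: C_i -> R_i
    initialize: Callable[[List[Vector]], Tuple[List[Vector], List[int]]]  # module 400
    categorize: Callable[[List[Vector], List[List[Any]]], Any]            # module 500

    def run(self) -> Any:
        u = self.acquire()                      # S101: collect resources into U
        c = [self.label(res) for res in u]      # S102: label attributes
        z = [self.extract(ci) for ci in c]      # S103: build the sample space Z
        centers, labels = self.initialize(z)    # S104: choose K and cluster
        # Group the original resources by cluster so module 500 can place each cluster.
        members = [[u[i] for i, lab in enumerate(labels) if lab == j]
                   for j in range(len(centers))]
        return self.categorize(centers, members)  # S105: update the resource map
```

Injecting the stages keeps each module replaceable, for example swapping the feature extractor or the K-selection strategy without touching the rest of the pipeline.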
Further, in an embodiment of the present invention, the initialization module 400 is further configured to, after the feature vectors are obtained, first set K to a preset value, obtain statistics of the K classes by the K-means method, and build a model of these statistics; generate a set of data samples from that model; obtain an index for evaluating clustering quality; and, starting from K+1 and incrementing by 1 each time, examine one by one the within-cluster sum of squares (WSS) of the clusterings of the simulated samples. When the WSS satisfies a preset condition, K+1 classes are accepted, and K is then increased in turn until the preset condition is no longer satisfied, which determines the value of K.
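The text leaves the statistical model, the quality index and the "preset condition" unspecified, so the following Python sketch fills them with labeled assumptions: each cluster is modeled as an isotropic Gaussian (weight, center, pooled standard deviation); the index is the relative WSS reduction when going from K to K+1 clusters; and the condition is that the observed reduction exceeds the (1 - alpha) quantile of the same reduction on `n_boot` data sets simulated from the fitted K-cluster model. K-means++ seeding with several restarts is a swap-in to make each fit less sensitive to initialization and is not part of the described method. All function names are hypothetical.

```python
import math
import random
from typing import List, Sequence, Tuple

Vec = List[float]
Fit = Tuple[List[Vec], List[int], float]  # (centers, labels, WSS)


def sqdist(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_once(data: List[Vec], k: int, rng: random.Random, iters: int = 50) -> Fit:
    """One Lloyd's K-means run with k-means++ seeding (assumed, for robustness)."""
    centers = [list(rng.choice(data))]
    while len(centers) < k:
        d2 = [min(sqdist(p, c) for c in centers) for p in data]
        pick = rng.choices(data, weights=d2)[0] if sum(d2) > 0 else rng.choice(data)
        centers.append(list(pick))
    for _ in range(iters):
        labels = [min(range(k), key=lambda j: sqdist(p, centers[j])) for p in data]
        new = []
        for j in range(k):
            members = [p for p, lab in zip(data, labels) if lab == j]
            new.append([sum(col) / len(members) for col in zip(*members)] if members else centers[j])
        if new == centers:
            break
        centers = new
    labels = [min(range(k), key=lambda j: sqdist(p, centers[j])) for p in data]
    return centers, labels, sum(sqdist(p, centers[lab]) for p, lab in zip(data, labels))


def best_fit(data: List[Vec], k: int, rng: random.Random, restarts: int = 4) -> Fit:
    """Keep the lowest-WSS run out of several restarts."""
    return min((kmeans_once(data, k, rng) for _ in range(restarts)), key=lambda fit: fit[2])


def gain(wss_k: float, wss_k1: float) -> float:
    """Assumed quality index: relative WSS reduction from K to K+1 clusters."""
    return (wss_k - wss_k1) / wss_k if wss_k > 0 else 0.0


def fit_model(data: List[Vec], centers: List[Vec], labels: List[int]):
    """Assumed 'model of the statistics': weight, mean, isotropic std per cluster."""
    n, d = len(data), len(data[0])
    model = []
    for j, c in enumerate(centers):
        members = [p for p, lab in zip(data, labels) if lab == j]
        if members:
            var = sum(sqdist(p, c) for p in members) / (len(members) * d)
            model.append((len(members) / n, c, math.sqrt(var)))
    return model


def simulate(model, n: int, rng: random.Random) -> List[Vec]:
    """Draw n synthetic samples from the fitted Gaussian mixture."""
    weights = [w for w, _, _ in model]
    out = []
    for _ in range(n):
        _, mu, sd = rng.choices(model, weights=weights)[0]
        out.append([rng.gauss(m, sd) for m in mu])
    return out


def select_k(data: List[Vec], k0: int = 1, k_max: int = 8, n_boot: int = 19,
             alpha: float = 0.05, seed: int = 0) -> int:
    rng = random.Random(seed)
    k = k0
    while k < k_max:
        centers, labels, wss_k = best_fit(data, k, rng)
        observed = gain(wss_k, best_fit(data, k + 1, rng)[2])
        model = fit_model(data, centers, labels)
        null = []
        for _ in range(n_boot):  # WSS gains on samples simulated as if K clusters suffice
            sim = simulate(model, len(data), rng)
            null.append(gain(best_fit(sim, k, rng)[2], best_fit(sim, k + 1, rng)[2]))
        null.sort()
        if observed <= null[min(n_boot - 1, int((1 - alpha) * n_boot))]:
            break  # preset condition not met: keep K
        k += 1     # condition met: accept K + 1 classes and keep going
    return k
```

On three well-separated 2-D blobs, for instance, the observed reductions at K = 1 and K = 2 should sit far above the bootstrap thresholds, so the loop accepts 2 and then 3 classes.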
Further, in an embodiment of the present invention, the initialization module 400 is further configured to arbitrarily select K feature vectors from the initial sample space as the initial clustering centers; compute the distances between the other objects in the initial sample space and the clustering centers, and assign each object to the class of its nearest center; take the mean of all objects in each class as the new clustering center of that class and compute the value of the objective function; and repeat this update until the updated clustering centers equal the previous ones or differ from them by less than a preset threshold.
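A minimal pure-Python sketch of this K-means loop, following the steps just described: initial centers drawn from the sample space, nearest-center assignment, mean update, and a stop once the centers move less than `zeta`. The distance function is pluggable because the claims also allow a metric other than the Euclidean one. Treating the objective function as the within-cluster sum of squares is an assumption, since the text does not give its formula; the function name `kmeans` is illustrative.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple

Vec = List[float]
Distance = Callable[[Sequence[float], Sequence[float]], float]


def kmeans(samples: List[Vec], k: int, zeta: float = 1e-6, max_iter: int = 100,
           dist: Distance = math.dist, seed: int = 0) -> Tuple[List[Vec], List[int], float]:
    rng = random.Random(seed)
    # Step (1): pick K feature vectors from the sample space as initial centers.
    centers = [list(v) for v in rng.sample(samples, k)]
    for _ in range(max_iter):
        # Step (2): assign each object to the class of its nearest center.
        labels = [min(range(k), key=lambda j: dist(r, centers[j])) for r in samples]
        # Step (3): new center of each class = mean of its members
        # (an empty class keeps its previous center).
        new_centers = []
        for j in range(k):
            members = [r for r, lab in zip(samples, labels) if lab == j]
            new_centers.append([sum(col) / len(members) for col in zip(*members)]
                               if members else centers[j])
        shift = max(dist(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        # Step (4): stop once the centers are unchanged or move less than zeta.
        if shift == 0 or shift < zeta:
            break
    labels = [min(range(k), key=lambda j: dist(r, centers[j])) for r in samples]
    # Assumed objective: sum of squared distances to the assigned centers (WSS).
    objective = sum(dist(r, centers[lab]) ** 2 for r, lab in zip(samples, labels))
    return centers, labels, objective
```

With `samples = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]` and `k = 2`, the sketch separates the two groups of three points and reports an objective of 8/3.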
Further, in an embodiment of the present invention, the classification module 500 is further configured to treat the clustering center of each class as that class's representative feature vector, assign it to the most similar resource map class, and assign the network resources corresponding to the other feature vectors in that class to the same map class.
Further, in an embodiment of the present invention, the classification module 500 is further configured to create a new class, and classify the cluster into it, when the difference between the feature value and every current class is greater than the upper threshold.
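The document does not say how a resource map class is represented or how the "difference" is measured. Assuming each existing class is represented by a prototype vector and the difference is the Euclidean distance, the two rules above can be sketched as follows; the class names in the usage and the `new-class-N` naming scheme are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def assign_clusters_to_map(centers: Sequence[Sequence[float]],
                           map_classes: Dict[str, Sequence[float]],
                           upper_threshold: float) -> List[str]:
    """Return the resource-map class chosen for each cluster center."""
    classes = {name: list(p) for name, p in map_classes.items()}  # caller's map is not mutated
    chosen = []
    for center in centers:
        # Most similar existing class = smallest Euclidean difference (assumed measure).
        best, best_diff = None, math.inf
        for name, proto in classes.items():
            diff = math.dist(center, proto)
            if diff < best_diff:
                best, best_diff = name, diff
        if best is None or best_diff > upper_threshold:
            # Differs from every current class by more than the upper threshold:
            # open a new class (hypothetical naming) with this center as its prototype.
            best = f"new-class-{len(classes)}"
            classes[best] = list(center)
        chosen.append(best)
    return chosen


def classify_resources(labels: Sequence[int], chosen: Sequence[str]) -> List[str]:
    """Every resource inherits the map class chosen for its cluster's center."""
    return [chosen[lab] for lab in labels]


map_classes = {"web-server": [1.0, 0.0], "database": [0.0, 1.0]}
chosen = assign_clusters_to_map([[0.9, 0.1], [0.1, 0.8], [5.0, 5.0]], map_classes, 1.0)
print(chosen)  # ['web-server', 'database', 'new-class-2']
```

Because a newly opened class is added to the working map, later cluster centers in the same pass can also be matched to it.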
It should be noted that the foregoing explanation of the embodiment of the method for automatically classifying network space resources based on unsupervised learning is also applicable to the apparatus for automatically classifying network space resources based on unsupervised learning of this embodiment, and is not repeated here.
According to the unsupervised learning-based network space resource automatic classification device provided by the embodiment of the invention, the attributes of network space resources are labeled manually, the relevant feature vectors are extracted, the value of K is determined by a Parametric Bootstrap method, and the center positions of the K clusters are stored. The network space resource framework can then be expanded and supplemented according to the clustering result, which facilitates construction of the network space resource map.
Furthermore, the terms "first" and "second" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or as implicitly indicating the number of the technical features referred to. Thus, a feature defined with "first" or "second" may explicitly or implicitly include at least one such feature. In the description of the present invention, "a plurality" means at least two, e.g., two or three, unless explicitly specified otherwise.
In the description of the specification, reference to "one embodiment," "some embodiments," "an example," "a specific example," "some examples," or the like means that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the invention. In this specification, such schematic expressions do not necessarily refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. In addition, provided there is no contradiction, those skilled in the art may combine the various embodiments or examples described in this specification, and the features of those embodiments or examples.
Although embodiments of the present invention have been shown and described above, it will be understood that the above embodiments are exemplary and not to be construed as limiting the present invention, and that changes, modifications, substitutions and alterations can be made to the above embodiments by those of ordinary skill in the art within the scope of the present invention.

Claims (2)

1. A network space resource automatic classification method based on unsupervised learning is characterized by comprising the following steps:
collecting resources of a network environment where the classifier is located to generate a resource set;
performing attribute marking on the resources of the resource set according to a preset n-dimensional attribute to generate a new resource set;
extracting features from the new resource set through an extraction function to obtain feature vectors, which form an initial sample space;
obtaining a K value through a Parametric Bootstrap method, and classifying the initial sample space by K-means clustering so as to divide the initial sample space into K classes of resources; and
classifying the K classes of resources into the network space resource map classes corresponding to the clustering center of each class, so as to refine the network space resource map, wherein this classifying further comprises: taking the clustering center of each class as the corresponding feature vector and classifying it into the most similar class, and classifying the network resources corresponding to the other feature vectors in that class into the same class; and, if the difference between the feature value and every current class is greater than an upper threshold, adding a new class and classifying into the new class;
wherein classifying the initial sample space by K-means clustering further comprises:
randomly selecting K feature vectors from the initial sample space as initial clustering centers;
obtaining the distances between the other objects in the initial sample space and the clustering centers;
taking the mean of all objects in each class as the clustering center of that class, and acquiring the value of an objective function, so as to update the clustering centers until the updated clustering centers equal the previous clustering centers or differ from them by less than a preset threshold;
wherein automatic resource classification by K-means clustering is realized through the following steps:
(1) Randomly selecting K feature vectors from the initial sample Z as initial clustering centers;
(2) For each other object Ri in the initial sample Z, calculating the distance between Ri and each clustering center, the distance being the Euclidean distance between the two points or another distance selected according to practical efficiency, and assigning Ri to the class of the nearest clustering center according to the nearest-neighbor criterion;
(3) Taking the mean of all objects in each class as the clustering center of that class, calculating the value of the objective function, and updating the clustering centers;
(4) Iterating steps (2) and (3) until the newly generated clustering centers equal those of the previous step or differ from them by less than a specified threshold ζ;
wherein obtaining the K value by the Parametric Bootstrap method further comprises:
after the feature vectors are obtained, first setting K to a preset value, obtaining statistics of the K classes by the K-means method, and obtaining a model of the statistics;
generating a set of data samples from the model of the statistics;
acquiring an index for evaluating clustering quality, and, starting from K+1 and increasing by 1 each time, examining one by one the within-cluster sum of squares (WSS) of the clusterings of the simulated samples;
and when the WSS of the clustering satisfies a preset condition, accepting K+1 classes, and thereafter increasing K in turn until the preset condition is no longer satisfied, so as to determine the K value.
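Steps (1) to (4) of claim 1 can be written compactly as below. This is a sketch under stated assumptions: the Euclidean distance is used (the claim also allows another metric), and the "objective function" is taken to be the within-cluster sum of squares J, which the claim does not spell out. Here Z = {R_1, ..., R_N} is the initial sample, μ_k^{(t)} the center of class C_k at iteration t, and ζ the stopping threshold.

```latex
\begin{aligned}
&\text{step (2):} && c_i^{(t)} = \arg\min_{1 \le k \le K} \lVert R_i - \mu_k^{(t)} \rVert_2, \qquad C_k^{(t)} = \{\, R_i \in Z : c_i^{(t)} = k \,\} \\
&\text{step (3):} && \mu_k^{(t+1)} = \frac{1}{\lvert C_k^{(t)} \rvert} \sum_{R_i \in C_k^{(t)}} R_i, \qquad J^{(t+1)} = \sum_{k=1}^{K} \sum_{R_i \in C_k^{(t)}} \lVert R_i - \mu_k^{(t+1)} \rVert_2^2 \\
&\text{step (4):} && \text{stop when } \max_{1 \le k \le K} \lVert \mu_k^{(t+1)} - \mu_k^{(t)} \rVert_2 < \zeta
\end{aligned}
```

Under this reading, the value of J at convergence is the WSS of the K-class clustering examined in the K-selection steps.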
2. An automatic classification device for network space resources based on unsupervised learning, comprising:
the acquisition module is used for acquiring resources of the network environment where the classifier is located so as to generate a resource set;
the marking module is used for carrying out attribute marking on the resources of the resource set according to preset n-dimensional attributes so as to generate a new resource set;
the extraction module is used for extracting features from the new resource set through an extraction function to obtain feature vectors, which form an initial sample space;
the initialization module is used for obtaining a K value through a Parametric Bootstrap method and classifying the initial sample space by K-means clustering so as to divide the initial sample space into K classes of resources; and
the classification module is used for classifying the K classes of resources into the network space resource map classes corresponding to the clustering center of each class, so as to refine the network space resource map; the classification module is further used for taking the clustering center of each class as the corresponding feature vector and classifying it into the most similar class, and classifying the network resources corresponding to the other feature vectors in that class into the same class; and, when the difference between the feature value and every current class is greater than an upper threshold, adding a new class and classifying into the new class;
the initialization module is further used for randomly selecting K feature vectors from the initial sample space as initial clustering centers, obtaining the distances between the other objects in the initial sample space and the clustering centers, taking the mean of all objects in each class as the clustering center of that class, and obtaining the value of an objective function so as to update the clustering centers until the updated clustering centers equal the previous clustering centers or differ from them by less than a preset threshold;
the initialization module is further used for realizing automatic resource classification by K-means clustering through the following units:
a selecting unit, used for randomly selecting K feature vectors from the initial sample Z as initial clustering centers;
a calculating unit, used for calculating, for each other object Ri in the initial sample Z, the distance between Ri and each clustering center, the distance being the Euclidean distance between the two points or another distance selected according to practical efficiency, and for assigning Ri to the class of the nearest clustering center according to the nearest-neighbor criterion;
an updating unit, used for taking the mean of all objects in each class as the clustering center of that class, calculating the value of the objective function, and updating the clustering centers;
an iteration unit, used for iterating the calculating unit and the updating unit until the newly generated clustering centers equal those of the previous step or differ from them by less than a specified threshold ζ;
the initialization module is further used for, after the feature vectors are obtained, first setting K to a preset value, obtaining statistics of the K classes by the K-means method, obtaining a model of the statistics, generating a set of data samples from the model, obtaining an index for evaluating clustering quality, examining one by one, starting from K+1 and increasing by 1 each time, the WSS of the clusterings of the simulated samples, accepting K+1 classes when the WSS satisfies a preset condition, and thereafter increasing K in turn until the preset condition is no longer satisfied, so as to determine the K value.
CN201810548471.4A 2018-05-31 2018-05-31 Unsupervised learning-based network space resource automatic classification method and unsupervised learning-based network space resource automatic classification device Active CN108846429B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810548471.4A CN108846429B (en) 2018-05-31 2018-05-31 Unsupervised learning-based network space resource automatic classification method and unsupervised learning-based network space resource automatic classification device

Publications (2)

Publication Number Publication Date
CN108846429A CN108846429A (en) 2018-11-20
CN108846429B CN108846429B (en) 2023-04-07

Family

ID=64210292

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810548471.4A Active CN108846429B (en) 2018-05-31 2018-05-31 Unsupervised learning-based network space resource automatic classification method and unsupervised learning-based network space resource automatic classification device

Country Status (1)

Country Link
CN (1) CN108846429B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110263227B (en) * 2019-05-15 2023-07-18 创新先进技术有限公司 Group partner discovery method and system based on graph neural network
CN114244824B (en) * 2021-11-25 2024-05-03 国家计算机网络与信息安全管理中心河北分中心 Method for quickly identifying identity of network space WEB type asset risk Server

Citations (2)

Publication number Priority date Publication date Assignee Title
CN103810386A (en) * 2014-02-13 2014-05-21 国家电网公司 Relay protection device clustering method based on unsupervised learning
CN107016068A (en) * 2017-03-21 2017-08-04 深圳前海乘方互联网金融服务有限公司 Knowledge mapping construction method and device

Family Cites Families (8)

Publication number Priority date Publication date Assignee Title
KR102131099B1 (en) * 2014-02-13 2020-08-05 삼성전자 주식회사 Dynamically modifying elements of User Interface based on knowledge graph
CN105608091B (en) * 2014-11-21 2019-02-05 中国移动通信集团公司 A kind of construction method and device of dynamic medical knowledge base
CN105357063B (en) * 2015-12-14 2019-09-10 金润方舟科技股份有限公司 A kind of cyberspace security postures real-time detection method
CN106528768A (en) * 2016-11-04 2017-03-22 北京中电普华信息技术有限公司 Consultation hotspot analysis method and device
CN106708016B (en) * 2016-12-22 2019-12-10 中国石油天然气股份有限公司 fault monitoring method and device
CN106850333B (en) * 2016-12-23 2019-11-29 中国科学院信息工程研究所 A kind of network equipment recognition methods and system based on feedback cluster
CN107819698A (en) * 2017-11-10 2018-03-20 北京邮电大学 A kind of net flow assorted method based on semi-supervised learning, computer equipment
CN107886949B (en) * 2017-11-24 2021-04-30 科大讯飞股份有限公司 Content recommendation method and device

Non-Patent Citations (1)

Title
Building thesaurus-based knowledge graph based on schema layer; Bo Qiao et al.; Cluster Computing; Vol. 20; full text *

Similar Documents

Publication Publication Date Title
CN105488539B (en) The predictor method and device of the generation method and device of disaggregated model, power system capacity
CN103119582B (en) Reduce the dissimilar degree between the first multivariate data group and the second multivariate data group
CN104966105A (en) Robust machine error retrieving method and system
CN111667050B (en) Metric learning method, device, equipment and storage medium
CN111611486B (en) Deep learning sample labeling method based on online education big data
CN111008693B (en) Network model construction method, system and medium based on data compression
JP2012042990A (en) Image identification information adding program and image identification information adding apparatus
CN105760888A (en) Neighborhood rough set ensemble learning method based on attribute clustering
CN112001422B (en) Image mark estimation method based on deep Bayesian learning
CN112132014B (en) Target re-identification method and system based on non-supervised pyramid similarity learning
CN108681742B (en) Analysis method for analyzing sensitivity of driver driving behavior to vehicle energy consumption
CN103927510A (en) Image Identification Apparatus And Image Identification Method
CN111046930A (en) Power supply service satisfaction influence factor identification method based on decision tree algorithm
CN108846429B (en) Unsupervised learning-based network space resource automatic classification method and unsupervised learning-based network space resource automatic classification device
CN109189876A (en) A kind of data processing method and device
JP5139874B2 (en) LABELING DEVICE, LABELING PROGRAM, RECORDING MEDIUM CONTAINING LABELING PROGRAM, AND LABELING METHOD
CN108681505B (en) Test case ordering method and device based on decision tree
US11082125B2 (en) Systems and methods for expert guided rule based identification of relevant planetary images for downlinking over limited bandwidth
CN117294727A (en) Cloud edge end collaborative management method based on cloud primordia and container technology
CN109213831A (en) Event detecting method and device calculate equipment and storage medium
CN114898804A (en) Biomarker determination method and device, storage medium and electronic equipment
CN115081515A (en) Energy efficiency evaluation model construction method and device, terminal and storage medium
CN113066528B (en) Protein classification method based on active semi-supervised graph neural network
CN104468276B (en) Network flow identification method based on random sampling multi-categorizer
CN109509517A (en) A kind of medical test Index for examination modified method automatically

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant