CN108846429B

CN108846429B - Unsupervised learning-based network space resource automatic classification method and unsupervised learning-based network space resource automatic classification device

Info

Publication number: CN108846429B
Application number: CN201810548471.4A
Authority: CN
Inventors: 王继龙; 缪葱葱; 徐超
Original assignee: Tsinghua University
Current assignee: Tsinghua University
Priority date: 2018-05-31
Filing date: 2018-05-31
Publication date: 2023-04-07
Anticipated expiration: 2038-05-31
Also published as: CN108846429A

Abstract

The invention discloses a method and a device for automatically classifying network space resources based on unsupervised learning. The method comprises the following steps: collecting resources of the network environment where the classifier is located to generate a resource set; labeling the attributes of the resources in the resource set according to preset n-dimensional attributes to generate a new resource set; extracting features from the new resource set through an extraction function to obtain feature vectors, which form an initial sample space; obtaining a K value by a Parametric Bootstrap method and clustering the initial sample space with K-means clustering so as to divide it into K classes of resources; and assigning the K classes of resources to the classes of the network space resource map that correspond to the cluster center of each class, so as to refine the network space resource map. The method can expand and supplement the network space resource framework according to the clustering result, which facilitates the construction of the network space resource map.

Description

Unsupervised learning-based network space resource automatic classification method and device

Technical Field

The invention relates to the technical field of network space surveying and mapping, in particular to a network space resource automatic classification method and device based on unsupervised learning.

Background

Network space has become the fifth territory of human society. It spans multiple dimensions, including politics, economy, military affairs, culture, society and ecology, and is developing into a new world parallel to the physical world. In recent years, with the development and diversification of Internet technology, network space resources have become increasingly diverse, yet network space still lacks the most basic conceptual model and spatial theory. Many resources in the network exist objectively, but they have not yet been named systematically and comprehensively, and in particular they have not been named and described from a vantage point inside network space itself. The diversity and complexity of network resources cause considerable trouble for managers and users.

The development of the Internet has driven explosive growth in both the number of cyberspace resources and the amount of data. Network space resources are the entity resources that can be directly perceived in network space, including network application services, information resources and virtual subjects. To better express the relationship between network space and the physical world, network infrastructure is explicitly included in the scope of network space resources. Classifying network space resources is therefore particularly important, both for using them more systematically and efficiently and for improving the security of the network as a territory.

The complexity and diversity of network space resources, the rate at which their number grows, and the rate at which new kinds of resources appear all make purely manual labeling infeasible, so network space resources must be classified automatically by an algorithm. Just as living organisms are organized into a corresponding taxonomic map, network space resources also need a map by which they can be classified.

Disclosure of Invention

The present invention is directed to solving, at least in part, one of the technical problems in the related art.

Therefore, an object of the present invention is to provide a method for automatically classifying network space resources based on unsupervised learning, which can automatically classify network resources and is beneficial to the construction of a network space resource map.

The invention also aims to provide a network space resource automatic classification device based on unsupervised learning.

To achieve the above object, an embodiment of the present invention provides a method for automatically classifying network space resources based on unsupervised learning, including the following steps: collecting resources of the network environment where the classifier is located to generate a resource set; labeling the attributes of the resources in the resource set according to preset n-dimensional attributes to generate a new resource set; extracting features from the new resource set through an extraction function to obtain feature vectors, which form an initial sample space; obtaining a K value by a Parametric Bootstrap method and clustering the initial sample space with K-means clustering so as to divide it into K classes of resources; and assigning the K classes of resources to the network space resource map classes corresponding to the cluster center of each class, so as to refine the network space resource map.

In the unsupervised learning-based network space resource automatic classification method according to the embodiment of the present invention, the attributes of network space resources are labeled manually, the relevant feature vectors are extracted, the value of K is determined by the Parametric Bootstrap method, and the centers of the K clusters are stored. The network space resource framework can then be expanded and supplemented according to the clustering result, so that network resources are classified automatically and the construction of a network space resource map is facilitated.

In addition, the unsupervised learning-based cyberspace resource automatic classification method according to the above embodiment of the present invention may further have the following additional technical features:

Further, in an embodiment of the present invention, obtaining the K value by the Parametric Bootstrap method further includes: after the feature vectors are obtained, first setting K to a preset value and obtaining statistics of the K classes by the K-means method, so as to obtain a statistical model; generating a data sample set from the statistical model; obtaining an index for evaluating clustering quality and, starting from K+1 and increasing by 1 each time, examining one by one the WSS of clustering the simulated samples; and, when the WSS of the clustering satisfies a preset condition, accepting K+1 classes and continuing to increase K until the preset condition is no longer satisfied, so as to determine the K value.

Further, in an embodiment of the present invention, clustering the initial sample space by K-means clustering further includes: arbitrarily selecting K feature vectors from the initial sample space as the initial cluster centers; obtaining the distances between the other objects in the initial sample space and the cluster centers; and taking the mean of all objects in each class as the cluster center of that class and computing the value of the objective function to update the cluster centers, until the updated cluster centers are equal to the previous cluster centers or differ from them by less than a preset threshold.

Further, in an embodiment of the present invention, assigning the K classes of resources to the network space resource map classes corresponding to the cluster center of each class further includes: taking the cluster center of each class as that class's representative feature vector, assigning it to the most similar class in the map, and assigning the network resources corresponding to the other feature vectors in its class to that same class.

Further, in an embodiment of the present invention, assigning the K classes of resources to the network space resource map classes corresponding to the cluster center of each class further includes: if the difference between the representative feature vector and every existing class is greater than an upper-bound threshold, adding a new class and assigning the resources to it.

To achieve the above object, another embodiment of the present invention provides a device for automatically classifying cyberspace resources based on unsupervised learning, including: an acquisition module, configured to collect resources of the network environment where the classifier is located to generate a resource set; a labeling module, configured to label the attributes of the resources in the resource set according to preset n-dimensional attributes to generate a new resource set; an extraction module, configured to extract features from the new resource set through an extraction function to obtain feature vectors, which form an initial sample space; an initialization module, configured to obtain a K value by a Parametric Bootstrap method and to cluster the initial sample space with K-means clustering so as to divide it into K classes of resources; and a classification module, configured to assign the K classes of resources to the network space resource map classes corresponding to the cluster center of each class, so as to refine the network space resource map.

In the unsupervised learning-based network space resource automatic classification device according to the embodiment of the present invention, the attributes of network space resources are labeled manually, the relevant feature vectors are extracted, the value of K is determined by the Parametric Bootstrap method, and the centers of the K clusters are stored. The network space resource framework can then be expanded and supplemented according to the clustering result, so that network resources are classified automatically and the construction of a network space resource map is facilitated.

In addition, the unsupervised learning-based cyberspace resource automatic classification apparatus according to the above embodiment of the present invention may further have the following additional technical features:

Further, in an embodiment of the present invention, the initialization module is further configured to: after the feature vectors are obtained, first set K to a preset value, obtain statistics of the K classes by the K-means method to obtain a statistical model, and generate a data sample set from the statistical model; obtain an index for evaluating clustering quality and, starting from K+1 and increasing by 1 each time, examine one by one the WSS of clustering the simulated samples; accept K+1 classes when the WSS of the clustering satisfies a preset condition; and continue to increase K until the preset condition is no longer satisfied, so as to determine the K value.

Further, in an embodiment of the present invention, the initialization module is further configured to arbitrarily select K feature vectors from the initial sample space as the initial cluster centers, obtain the distances between the other objects in the initial sample space and the cluster centers, take the mean of all objects in each class as the cluster center of that class, and compute the value of the objective function to update the cluster centers, until the updated cluster centers are equal to the previous cluster centers or differ from them by less than a preset threshold.

Further, in an embodiment of the present invention, the classifying module is further configured to take the cluster center of each class as that class's representative feature vector, assign it to the most similar class, and assign the network resources corresponding to the other feature vectors in its class to that same class.

Further, in an embodiment of the present invention, the classification module is further configured to add a new class, and assign the resources to it, when the difference between the representative feature vector and every existing class is greater than the upper-bound threshold.

Additional aspects and advantages of the invention will be set forth in part in the description which follows and, in part, will be obvious from the description, or may be learned by practice of the invention.

Drawings

The foregoing and/or additional aspects and advantages of the present invention will become apparent and readily appreciated from the following description of the embodiments, taken in conjunction with the accompanying drawings of which:

FIG. 1 is a flow chart of a method for automatically classifying cyber-space resources based on unsupervised learning according to an embodiment of the present invention;

FIG. 2 is a flow chart of a method for unsupervised learning-based automatic classification of cyberspace resources according to another embodiment of the present invention;

FIG. 3 is a diagram illustrating an embodiment of a method for automatically classifying cyberspace resources based on unsupervised learning according to an embodiment of the present invention;

FIG. 4 is a schematic structural diagram of an unsupervised learning-based cyberspace resource automatic classification apparatus according to an embodiment of the present invention.

Detailed Description

Reference will now be made in detail to embodiments of the present invention, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to the same or similar elements or elements having the same or similar function throughout. The embodiments described below with reference to the drawings are exemplary, are intended to explain the invention, and are not to be construed as limiting it.

The following describes a method and an apparatus for automatically classifying cyberspace resources based on unsupervised learning according to an embodiment of the present invention with reference to the accompanying drawings, and first, the method for automatically classifying cyberspace resources based on unsupervised learning according to an embodiment of the present invention will be described with reference to the accompanying drawings.

Fig. 1 is a flowchart of a network space resource automatic classification method based on unsupervised learning according to an embodiment of the present invention.

As shown in fig. 1, the unsupervised learning-based network space resource automatic classification method includes the following steps:

In step S101, resources of the network environment where the classifier is located are collected to generate a resource set.

It can be understood that, as shown in fig. 2, first, the embodiment of the present invention collects the cyberspace resources, that is, collects the cyberspace resources of the location where the classifier is deployed, and puts them into the set U.

Specifically, as shown in FIG. 3, collecting network space resources means collecting the resources of the network environment where the classifier is located and forming a set U. A network space resource set U is established to represent "unclassified network space resources"; the network space resources collected in the network environment where the classifier is located are added to U and labeled in turn as $C_1, C_2, \ldots, C_i, \ldots$, so that $U = \{C_1, C_2, \ldots, C_i, \ldots\}$.

In step S102, attribute marking is performed on resources of the resource set according to the preset n-dimensional attributes, so as to generate a new resource set.

It can be understood that, as shown in FIG. 2, the resource attribute labeling step of the embodiment of the present invention labels the various network space resources according to manually set n-dimensional attributes. That is, the attributes of the resources in U are labeled according to the designed n-dimensional attributes to form C.

Specifically, each element of U is a resource in the network environment where the classifier is located. Because network space resources are diverse, the elements may take different forms in practice; to classify them automatically, every element must be labeled uniformly by selecting a number of attributes to represent it. The specific number of attributes and the labeling scheme can be chosen according to the actual situation. For convenience of notation, n attributes are selected for labeling each resource $C_i$, so that $C_i$ can be expressed as the n-tuple of its labeled attribute values, e.g. $C_i = (a_{i1}, a_{i2}, \ldots, a_{in})$, where $a_{ij}$ denotes the j-th attribute of $C_i$.

In step S103, feature extraction is performed on the new resource set through an extraction function to obtain feature vectors, and an initial sample space is obtained.

It can be understood that, as shown in FIG. 2, feature vector extraction in the embodiment of the present invention extracts feature vectors from the labeled resource attributes. That is, the embodiment of the present invention customizes a feature vector extraction function Feature_Extraction() as required, applies it to each $C_i$ to obtain its feature vector $R_i$, and forms the initial sample space Z from these vectors.

Specifically, a feature vector extraction function Feature_Extraction() is customized according to the number of attributes selected in step S102 and the actual situation, and feature extraction is performed on each labeled resource $C_i$ to obtain its feature vector $R_i \leftarrow \mathrm{Feature\_Extraction}(C_i)$. All the $R_i$ together form the initial sample Z, $Z = \{R_1, R_2, \ldots, R_n\}$.
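The source leaves Feature_Extraction() to be customized, so the following is only a minimal sketch of one possible choice: numeric attributes are copied as floats and each categorical attribute is one-hot encoded over a fixed list of known values. The attribute names (open_ports, protocol, kind), the encoding scheme and the helper build_sample_space() are all hypothetical.

```python
from typing import Dict, List, Sequence

# A labeled resource C_i: attribute name -> value (attribute names are hypothetical).
Resource = Dict[str, object]


def feature_extraction(resource: Resource,
                       numeric_attrs: Sequence[str],
                       categorical_attrs: Dict[str, Sequence[str]]) -> List[float]:
    """Hypothetical Feature_Extraction(): map one labeled resource C_i to R_i.

    Numeric attributes are copied as floats; each categorical attribute is
    one-hot encoded over its list of known values.
    """
    vector = [float(resource[name]) for name in numeric_attrs]
    for name, vocabulary in categorical_attrs.items():
        value = resource.get(name)
        vector.extend(1.0 if value == known else 0.0 for known in vocabulary)
    return vector


def build_sample_space(resources: Sequence[Resource],
                       numeric_attrs: Sequence[str],
                       categorical_attrs: Dict[str, Sequence[str]]) -> List[List[float]]:
    """Initial sample space Z = {R_1, R_2, ...}: apply the extractor to every C_i."""
    return [feature_extraction(c, numeric_attrs, categorical_attrs) for c in resources]


# Usage with two made-up resources from the set U.
U = [
    {"open_ports": 3, "protocol": "http", "kind": "web_service"},
    {"open_ports": 1, "protocol": "dns", "kind": "infrastructure"},
]
Z = build_sample_space(U, ["open_ports"],
                       {"protocol": ["http", "dns", "smtp"],
                        "kind": ["web_service", "infrastructure"]})
print(Z)  # [[3.0, 1.0, 0.0, 0.0, 1.0, 0.0], [1.0, 0.0, 1.0, 0.0, 0.0, 1.0]]
```

One-hot encoding keeps every dimension numeric, which the Euclidean distance used later in K-means requires; in practice numeric attributes would usually also be rescaled so that no single attribute dominates the distance.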

In step S104, a K value is obtained by the Parametric Bootstrap method, and the initial sample space is clustered with K-means clustering so as to divide it into K classes of resources.

It can be understood that, as shown in FIG. 2, resource clustering in the embodiment of the present invention uses K-means clustering to divide the feature vectors corresponding to the resources into K classes: the Parametric Bootstrap method is used to obtain the K value, and K-means clustering then divides the initial sample space Z into K classes.

That is, automatic classification of cyberspace resources is implemented with the K-means clustering algorithm. Because cyberspace resources are diverse, a suitable K value for K-means clustering is hard to predict, so this step is divided into two parts: determining the K value, and clustering with K-means to achieve automatic classification.

Further, in an embodiment of the present invention, obtaining the K value by the Parametric Bootstrap method further comprises: after the feature vectors are obtained, first setting K to a preset value and obtaining statistics of the K classes by the K-means method, so as to obtain a statistical model; generating a data sample set from the statistical model; obtaining an index for evaluating clustering quality and, starting from K+1 and increasing by 1 each time, examining one by one the WSS of clustering the simulated samples; and, when the WSS of the clustering satisfies the preset condition, accepting K+1 classes and continuing to increase K until the preset condition is no longer satisfied, so as to determine the K value.

Specifically, in the present classifier implementation, the value of K is determined by the Parametric Bootstrap method, which is an extension of the Bootstrap method. Bootstrap is an important statistical method for estimating the variance of a statistic and, from it, constructing interval estimates. Its central idea is to construct confidence intervals for an estimate by resampling from the sample. Put abstractly, an estimate computed from a sample does not exhaust the information in that sample, and Bootstrap uses resampling to put the remaining information to work in constructing confidence intervals. A key point of Bootstrap is that each resample has the same size as the original data sample, and the Parametric Bootstrap method follows this rule as well. Unlike the Bootstrap method, however, the Parametric Bootstrap method does not resample the original set; instead, it sets up a specific mathematical model and then simulates new sample data from that model, which can be repeated many times.
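The difference between the two resampling schemes can be illustrated with a small sketch that estimates the standard error of a sample mean both ways. The Gaussian model, the example data and the function names are assumptions chosen for illustration; in both cases each resample has the same size as the original sample.

```python
import random
import statistics
from typing import Callable, List, Sequence


def bootstrap_resample(data: Sequence[float], rng: random.Random) -> List[float]:
    """Classic Bootstrap: draw len(data) points from the sample with replacement."""
    return rng.choices(data, k=len(data))


def parametric_bootstrap_resample(data: Sequence[float], rng: random.Random) -> List[float]:
    """Parametric Bootstrap: fit a model (here an assumed Gaussian) to the sample,
    then simulate a new sample of the same size from that model."""
    mu, sigma = statistics.fmean(data), statistics.stdev(data)
    return [rng.gauss(mu, sigma) for _ in range(len(data))]


def bootstrap_std_error(data: Sequence[float],
                        statistic: Callable[[Sequence[float]], float],
                        resample: Callable[[Sequence[float], random.Random], List[float]],
                        rounds: int = 1000, seed: int = 0) -> float:
    """Estimate the standard error of `statistic` from repeated resamples."""
    rng = random.Random(seed)
    replicates = [statistic(resample(data, rng)) for _ in range(rounds)]
    return statistics.stdev(replicates)


sample = [2.1, 2.5, 1.9, 3.2, 2.8, 2.4, 2.0, 2.9]
se_np = bootstrap_std_error(sample, statistics.fmean, bootstrap_resample)
se_p = bootstrap_std_error(sample, statistics.fmean, parametric_bootstrap_resample)
print(f"bootstrap SE={se_np:.3f}, parametric bootstrap SE={se_p:.3f}")
```

The non-parametric version can only reproduce values already in the sample, whereas the parametric version can generate unseen values from the fitted model; the latter property is what the classifier relies on to simulate data under a K-class hypothesis.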

In the classifier, the specific operation is as follows:

(1) After the feature vectors $R_i$ are obtained, K is set to a small value (for example, K = 2), and statistics of the K classes, such as each class's mean and covariance matrix, are obtained by the K-means method.

(2) A mathematical model is set according to the actual scenario and experience, assuming that the original data were randomly generated from this model (for example, a Gaussian model). The data sample set can then be regenerated from the model using the corresponding statistics obtained in step (1), with the regenerated set the same size as the original.

(3) An index for evaluating clustering quality is designed (for example, the total within-class sum of squared errors, WSS). Starting from K+1 and increasing by 1 each time, the WSS of clustering the simulated samples is examined one by one.

(4) The value of K is determined by the following strategy: as long as the WSS obtained by K-means clustering of the real data into K+1 classes is smaller than μ × WSS, where the WSS on the right is obtained from the points simulated under the K-class model (μ is a manually set threshold, generally 85% ≤ μ ≤ 1), K+1 classes are accepted; K is then increased in turn until this condition is no longer met.
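Steps (1) to (4) can be sketched as follows, under several stated assumptions that go beyond the source: the model is an axis-aligned Gaussian per class (per-dimension standard deviation rather than a full covariance matrix), K-means uses k-means++ seeding with a few restarts to avoid poor local optima, the μ test compares the real WSS against the mean WSS over several simulated datasets, and the names choose_k(), simulate() and best_fit() are hypothetical.

```python
import math
import random
import statistics
from typing import List, Sequence, Tuple

Vector = List[float]


def kmeans(points: Sequence[Vector], k: int, rng: random.Random,
           max_iter: int = 100) -> Tuple[List[Vector], List[int]]:
    """Lloyd's K-means; k-means++ seeding is our choice, not the source's."""
    centers = [list(rng.choice(points))]
    while len(centers) < k:
        d2 = [min(math.dist(p, c) ** 2 for c in centers) for p in points]
        pick = rng.choices(points, weights=d2)[0] if sum(d2) > 0 else rng.choice(points)
        centers.append(list(pick))
    labels: List[int] = []
    for _ in range(max_iter):
        labels = [min(range(k), key=lambda j: math.dist(p, centers[j])) for p in points]
        new_centers = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            new_centers.append([statistics.fmean(col) for col in zip(*members)]
                               if members else centers[j])
        if new_centers == centers:
            break
        centers = new_centers
    return centers, labels


def wss(points: Sequence[Vector], centers: Sequence[Vector]) -> float:
    """Within-class sum of squares: squared distance of each point to its nearest center."""
    return sum(min(math.dist(p, c) ** 2 for c in centers) for p in points)


def best_fit(points, k, rng, n_init=3):
    """Keep the lowest-WSS result over a few K-means restarts."""
    return min((kmeans(points, k, rng) for _ in range(n_init)),
               key=lambda fit: wss(points, fit[0]))


def simulate(points, centers, labels, rng):
    """Assumed model: each class is an axis-aligned Gaussian (class mean, per-dimension
    std); draw a new sample of the same size with the same class sizes."""
    spread = {}
    for j in set(labels):
        members = [p for p, lab in zip(points, labels) if lab == j]
        spread[j] = [statistics.pstdev(col) for col in zip(*members)]
    return [[rng.gauss(m, s) for m, s in zip(centers[lab], spread[lab])] for lab in labels]


def choose_k(points, k_start=2, k_max=10, mu=0.9, rounds=10, seed=0) -> int:
    rng = random.Random(seed)
    k = k_start
    while k < k_max:
        centers, labels = best_fit(points, k, rng)           # step (1): K-class statistics
        real = wss(points, best_fit(points, k + 1, rng)[0])  # real data in K+1 classes
        simulated = []
        for _ in range(rounds):                              # steps (2)-(3)
            sim = simulate(points, centers, labels, rng)
            simulated.append(wss(sim, best_fit(sim, k + 1, rng)[0]))
        if real < mu * statistics.fmean(simulated):          # step (4): accept K+1
            k += 1
        else:
            break
    return k


rng = random.Random(42)
data = [[rng.gauss(x, 0.3), rng.gauss(y, 0.3)]
        for x, y in [(0, 0), (5, 5), (0, 5)] for _ in range(40)]
print(choose_k(data, mu=0.85))
```

The intuition is that if the data really contain only K classes, splitting them into K+1 lowers the WSS of the real data about as much as it lowers the WSS of data simulated from the K-class model; a markedly larger drop on the real data is evidence for an additional class.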

Further, in an embodiment of the present invention, clustering the initial sample space by K-means clustering further includes: arbitrarily selecting K feature vectors from the initial sample space as the initial cluster centers; obtaining the distances between the other objects in the initial sample space and the cluster centers; and taking the mean of all objects in each class as the cluster center of that class and computing the value of the objective function to update the cluster centers, until the updated cluster centers are equal to the previous cluster centers or differ from them by less than a preset threshold.

Specifically, the algorithm for realizing automatic resource classification by using K-means clustering is as follows:

(1) From the initial sample Z, K feature vectors are arbitrarily selected as initial cluster centers.

(2) For each of the other objects $R_i$ in the initial sample Z, its distance to each cluster center is calculated. The Euclidean distance between two points can be used, or another distance can be chosen according to practical efficiency. Each object is then assigned, by the nearest-center criterion, to the class of its nearest cluster center.

(3) The mean of all objects in each class is taken as that class's new cluster center, and the value of the criterion (objective) function is computed, so as to update the cluster centers.

(4) Steps (2) and (3) are iterated until the newly generated cluster centers are equal to the cluster centers of the previous iteration, or their difference is less than a specified threshold ζ.
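A minimal sketch of steps (1) to (4), assuming Euclidean distance, a fixed random seed for the arbitrary initial choice, and that an empty class keeps its previous center (the source does not cover that case). The function name kmeans_classify() and its parameters are hypothetical; the objective J is the total squared distance of each vector to its assigned center.

```python
import math
import random
from typing import List, Sequence, Tuple

Vector = List[float]


def kmeans_classify(Z: Sequence[Vector], k: int, zeta: float = 1e-6,
                    max_iter: int = 300, seed: int = 0) -> Tuple[List[Vector], List[int], float]:
    """Steps (1)-(4): returns cluster centers, class labels and the objective J."""
    rng = random.Random(seed)
    # (1) Arbitrarily select K feature vectors from Z as the initial cluster centers.
    centers = [list(v) for v in rng.sample(list(Z), k)]
    labels: List[int] = []
    objective = math.inf
    for _ in range(max_iter):
        # (2) Assign each R_i to the class of its nearest center (Euclidean distance).
        labels = [min(range(k), key=lambda j: math.dist(r, centers[j])) for r in Z]
        # (3) Take the mean of each class as its new center (keep the old one if empty)...
        new_centers = []
        for j in range(k):
            members = [r for r, lab in zip(Z, labels) if lab == j]
            new_centers.append([sum(col) / len(members) for col in zip(*members)]
                               if members else centers[j])
        # ...and evaluate the objective J: total squared distance to the assigned centers.
        objective = sum(math.dist(r, new_centers[lab]) ** 2 for r, lab in zip(Z, labels))
        # (4) Stop when the centers are unchanged or moved by less than zeta.
        shift = max(math.dist(old, new) for old, new in zip(centers, new_centers))
        centers = new_centers
        if shift == 0.0 or shift < zeta:
            break
    return centers, labels, objective


Z = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.2], [5.0, 5.1], [5.2, 4.9], [4.9, 5.0]]
centers, labels, J = kmeans_classify(Z, k=2)
print(labels, round(J, 4))
```

Because the initial centers are picked arbitrarily, practical implementations usually run the procedure several times (or use a smarter seeding such as k-means++) and keep the result with the lowest J.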

In step S105, the K classes of resources are assigned to the network space resource map classes corresponding to the cluster center of each class, so as to refine the network space resource map.

It can be understood that, as shown in FIG. 2, in the resource classification step of the embodiment of the present invention, each resource in U is assigned, according to the clustering result, to the class in which the representative feature vector of its cluster is placed, so as to refine the network space resource map. That is, the K classes of resources divided in step S104 are assigned to the network space resource map classes corresponding to the cluster center $R'_j$ of each class.

Further, in an embodiment of the present invention, assigning the K classes of resources to the network space resource map classes corresponding to the cluster center of each class further includes: taking the cluster center of each class as that class's representative feature vector, assigning it to the most similar class in the map, and assigning the network resources corresponding to the other feature vectors in its class to that same class.

Specifically, in the embodiment of the invention, step S104 divides the n feature vectors into K classes with cluster centers $R'_1, R'_2, \ldots, R'_K$. Each $R'_j$ is taken as the representative feature vector of its class and compared with each class in the existing network space resource map; $R'_j$ is assigned to the class it is most similar to, and the network resources corresponding to the other feature vectors in the class of $R'_j$ are placed in that class as well.

Further, in an embodiment of the present invention, assigning the K classes of resources to the network space resource map classes corresponding to the cluster center of each class further includes: if the difference between the representative feature vector and every existing class is greater than an upper-bound threshold, adding a new class and assigning the resources to it.

Specifically, if the difference between $R'_j$ and every class in the existing framework is greater than an upper-bound threshold β, a new class is added to the existing framework, and the network resources corresponding to the other feature vectors in the class of $R'_j$ are placed in the new class. A specific implementation of an embodiment of the present invention is shown in FIG. 3.
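The map-assignment rule with threshold β can be sketched as follows. This assumes, beyond the source, that each existing map class is represented by a single representative vector, that "difference" means Euclidean distance, and that new classes are named by a made-up scheme; assign_to_map() and the example class names are hypothetical.

```python
import math
from typing import Dict, List, Sequence

Vector = List[float]


def assign_to_map(resource_map: Dict[str, Vector], centers: Sequence[Vector],
                  members: Sequence[Sequence[str]], beta: float) -> Dict[str, List[str]]:
    """Place each cluster (center R'_j plus its resources) into the most similar
    map class, or into a newly added class when every class is farther than beta.

    `resource_map` is updated in place with any new classes.
    """
    placement: Dict[str, List[str]] = {}
    for j, (center, resources) in enumerate(zip(centers, members)):
        nearest = min(resource_map,
                      key=lambda name: math.dist(center, resource_map[name]),
                      default=None)
        if nearest is None or math.dist(center, resource_map[nearest]) > beta:
            nearest = f"new_class_{j}"            # hypothetical naming for added classes
            resource_map[nearest] = list(center)  # R'_j becomes the new class's representative
        placement.setdefault(nearest, []).extend(resources)
    return placement


resource_map = {"web_service": [1.0, 0.0], "dns_infrastructure": [0.0, 1.0]}
centers = [[0.9, 0.1], [5.0, 5.0]]
members = [["site-a", "site-b"], ["device-x"]]
placement = assign_to_map(resource_map, centers, members, beta=1.0)
print(placement)
```

Here the first cluster lies within β of "web_service" and joins it, while the second is farther than β from every existing class, so the map grows by one class; this is how the clustering result expands and supplements the resource framework.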

According to the unsupervised learning-based network space resource automatic classification method provided by the embodiment of the invention, the attributes of network space resources are labeled manually, the relevant feature vectors are extracted, the value of K is determined by the Parametric Bootstrap method, and the centers of the K clusters are stored; the network space resource framework can then be expanded and supplemented according to the clustering result, which facilitates the construction of a network space resource map.

Next, an unsupervised learning-based cyberspace resource automatic classification apparatus according to an embodiment of the present invention will be described with reference to the drawings.

As shown in fig. 4, the apparatus 10 for automatically classifying cyberspace resources based on unsupervised learning includes: an acquisition module 100, a labeling module 200, an extraction module 300, an initialization module 400, and a categorization module 500.

The acquisition module 100 is configured to collect resources of the network environment where the classifier is located, so as to generate a resource set. The labeling module 200 is configured to label the attributes of the resources in the resource set according to the preset n-dimensional attributes to generate a new resource set. The extraction module 300 is configured to perform feature extraction on the new resource set through an extraction function to obtain feature vectors, which form an initial sample space. The initialization module 400 is configured to obtain a K value by the Parametric Bootstrap method and to cluster the initial sample space with K-means clustering, so as to divide it into K classes of resources. The categorization module 500 is configured to assign the K classes of resources to the network space resource map classes corresponding to the cluster center of each class, so as to refine the network space resource map. The device 10 of the embodiment of the invention can expand and supplement the network space resource framework according to the clustering result, which facilitates the construction of the network space resource map.

Further, in an embodiment of the present invention, the initialization module 400 is further configured to: after the feature vectors are obtained, first set K to a preset value, obtain statistics of the K classes by the K-means method to obtain a statistical model, and generate a data sample set from the statistical model; obtain an index for evaluating clustering quality and, starting from K+1 and increasing by 1 each time, examine one by one the WSS of clustering the simulated samples; accept K+1 classes when the WSS of the clustering satisfies a preset condition; and continue to increase K until the preset condition is no longer satisfied, so as to determine the K value.

Further, in an embodiment of the present invention, the initialization module 400 is further configured to arbitrarily select K feature vectors from the initial sample space as initial clustering centers, obtain distances between other objects in the initial sample space and the clustering centers, use a mean value corresponding to all objects in each category as the clustering centers of the categories, and obtain a value of the objective function, so as to update the clustering centers until the updated clustering centers are equal to or have a difference value smaller than a preset threshold value from the previous clustering centers.

Further, in an embodiment of the present invention, the classifying module 500 is further configured to classify the cluster center of each class as a corresponding feature vector into a most similar class, and classify network resources corresponding to other feature vectors in the class into the similar class.

Further, in an embodiment of the present invention, the classifying module 500 is further configured to add a new class, and assign the resources to it, when the difference between the representative feature vector and every existing class is greater than the upper-bound threshold.

It should be noted that the foregoing explanation of the embodiment of the method for automatically classifying network space resources based on unsupervised learning is also applicable to the apparatus for automatically classifying network space resources based on unsupervised learning of this embodiment, and is not repeated here.

According to the unsupervised learning-based network space resource automatic classification device provided by the embodiment of the invention, the attributes of network space resources are marked manually, relevant characteristic vectors are extracted, the value of K is determined by using a Parametric Bootstrap method, the central positions of K clusters are stored, and the network space resource frame can be amplified and supplemented according to the clustering result, so that the construction of a network space resource map is facilitated.

Furthermore, the terms "first" and "second" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or as implicitly indicating the number of the technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include at least one such feature. In the description of the present invention, "a plurality" means at least two, e.g., two, three, etc., unless explicitly specified otherwise.

In the description of the specification, reference to "one embodiment," "some embodiments," "an example," "a specific example," "some examples" or the like means that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the invention. In this specification, schematic uses of the above terms do not necessarily refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. Furthermore, those skilled in the art may join and combine the various embodiments or examples described in this specification, and the features of different embodiments or examples, provided they do not contradict one another.

Although embodiments of the present invention have been shown and described above, it will be understood that the above embodiments are exemplary and not to be construed as limiting the present invention, and that changes, modifications, substitutions and alterations can be made to the above embodiments by those of ordinary skill in the art within the scope of the present invention.

Claims

1. An unsupervised learning-based method for automatically classifying network space resources, characterized by comprising the following steps:

collecting resources of a network environment where the classifier is located to generate a resource set;

performing attribute marking on the resources of the resource set according to a preset n-dimensional attribute to generate a new resource set;

extracting features of the new resource set through an extraction function to obtain feature vectors that form an initial sample space;

obtaining a value of K through a parametric bootstrap method, and classifying the initial sample space using K-means clustering so as to divide the initial sample space into K classes of resources; and

classifying the K classes of resources into the network space resource map classes corresponding to the cluster center of each class, so as to refine the network space resource map, wherein this classifying step further comprises: taking the cluster center of each class as the corresponding feature vector and assigning it to the most similar class; assigning the network resources corresponding to the other feature vectors in that class to the same class; and, if the difference between the feature value and every current class is greater than the upper threshold, adding a new class and assigning the resources to it;

wherein classifying the initial sample space using K-means clustering further comprises:

randomly selecting K feature vectors from the initial sample space as initial cluster centers;

computing the distances between the other objects in the initial sample space and the cluster centers;

taking the mean of all objects in each class as that class's cluster center, and computing the value of an objective function to update the cluster centers, until the updated cluster centers equal the previous cluster centers or differ from them by less than a preset threshold;

and further comprising the following steps for automatic resource classification using K-means clustering:

(1) Randomly selecting K feature vectors from the initial sample Z as initial cluster centers;

(2) For each other object Ri in the initial sample Z, calculating the distance between Ri and each cluster center, where the distance is the Euclidean distance between the two points or another metric selected according to actual efficiency, and assigning Ri to the class of the nearest cluster center according to the nearest-neighbor criterion;

(3) Taking the mean of all objects in each class as the cluster center of that class, calculating the value of the objective function, and updating the cluster centers;

(4) Iterating steps (2) and (3) until the newly generated cluster centers equal those of the previous step or differ from them by less than a specified threshold ζ;

wherein obtaining the value of K by the parametric bootstrap method further comprises:

after the feature vectors are obtained, first setting K to a preset value and obtaining statistics of the K classes by the K-means method, thereby obtaining a statistical model;

generating a set of data samples from the statistical model;

acquiring an index for estimating clustering quality and, starting from K+1 and increasing by 1 each time, examining one by one the within-cluster sum of squares (WSS) of the clustering of the simulated samples;

and, when the WSS of the clustering satisfies a preset condition, accepting K+1 classes and continuing to increase K in sequence until the preset condition is no longer satisfied, so as to determine the value of K.

2. An unsupervised learning-based device for automatically classifying network space resources, comprising:

the acquisition module is used for acquiring resources of the network environment where the classifier is located so as to generate a resource set;

the marking module is used for carrying out attribute marking on the resources of the resource set according to preset n-dimensional attributes so as to generate a new resource set;

the extraction module is used for extracting features of the new resource set through an extraction function to obtain feature vectors that form an initial sample space;

the initialization module is used for obtaining a value of K through a parametric bootstrap method and classifying the initial sample space using K-means clustering so as to divide the initial sample space into K classes of resources; and

the classification module is used for classifying the K classes of resources into the network space resource map classes corresponding to the cluster center of each class, so as to refine the network space resource map. The classification module is further used for taking the cluster center of each class as the corresponding feature vector and assigning it to the most similar class, and for assigning the network resources corresponding to the other feature vectors in that class to the same class. When the difference between the feature value and every current class is greater than the upper threshold, it adds a new class and assigns the resources to it;

the initialization module is further used for randomly selecting K feature vectors from the initial sample space as initial cluster centers, computing the distances between the other objects in the initial sample space and the cluster centers, taking the mean of all objects in each class as that class's cluster center, and computing the value of an objective function so as to update the cluster centers, until the updated cluster centers equal the previous cluster centers or differ from them by less than a preset threshold;

the initialization module further comprises the following units for automatic resource classification using K-means clustering:

a selecting unit, used for randomly selecting K feature vectors from the initial sample Z as initial cluster centers;

a calculating unit, configured to calculate, for each other object Ri in the initial sample Z, the distance between Ri and each cluster center, where the distance is the Euclidean distance between the two points or another metric selected according to actual efficiency, and to assign Ri to the class of the nearest cluster center according to the nearest-neighbor criterion;

an updating unit, used for taking the mean of all objects in each class as the cluster center of that class, calculating the value of the objective function, and updating the cluster centers;

an iteration unit, used for iterating the calculating unit and the updating unit until the newly generated cluster centers equal those of the previous step or differ from them by less than a specified threshold ζ;

the initialization module is further used for performing the following after the feature vectors are obtained: setting K to a preset value; obtaining statistics of the K classes by the K-means method to obtain a statistical model; generating a set of data samples from the statistical model; and obtaining an index for estimating clustering quality. Starting from K+1 and increasing by 1 each time, it examines one by one the WSS of the clustering of the simulated samples. It accepts K+1 classes when this WSS satisfies the preset condition and continues to increase K until the preset condition is no longer satisfied, so as to determine the value of K.
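The claims do not fix the statistical model or the "preset condition". The Python sketch below shows one possible reading as a parametric bootstrap test, under the following assumptions:

- The fitted K-class model is a mixture of spherical Gaussians, one per cluster, weighted by cluster size.
- K+1 classes are accepted when the real data's ratio WSS(K+1)/WSS(K) is smaller than the ratio obtained on all but a fraction `alpha` of data sets simulated from that model.

The Gaussian model, the ratio statistic, and the names `choose_k`, `kmeans_wss`, `simulate`, `n_boot`, `n_init` and `alpha` are all assumptions made for illustration.

```python
import math
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_wss(data, k, rng, n_init=10, max_iter=50):
    """Best of n_init random-start K-means runs; returns (centers, labels, wss)."""
    best = None
    for _ in range(n_init):
        centers = [list(p) for p in rng.sample(data, k)]
        for _ in range(max_iter):
            labels = [min(range(k), key=lambda j: sq_dist(p, centers[j])) for p in data]
            new = []
            for j in range(k):
                members = [p for p, lab in zip(data, labels) if lab == j]
                new.append([sum(c) / len(members) for c in zip(*members)]
                           if members else centers[j])
            if new == centers:
                break  # centers unchanged: converged
            centers = new
        labels = [min(range(k), key=lambda j: sq_dist(p, centers[j])) for p in data]
        wss = sum(sq_dist(p, centers[lab]) for p, lab in zip(data, labels))
        if best is None or wss < best[2]:
            best = (centers, labels, wss)
    return best


def simulate(data, centers, labels, rng):
    """Draw a data set of the same size from the assumed K-class Gaussian model."""
    k, d = len(centers), len(data[0])
    sizes = [labels.count(j) for j in range(k)]
    sigmas = []
    for j in range(k):
        ss = sum(sq_dist(p, centers[j]) for p, lab in zip(data, labels) if lab == j)
        sigmas.append(math.sqrt(ss / (sizes[j] * d)) if sizes[j] else 0.0)
    # Cluster membership follows the observed class sizes (mixing weights).
    picks = rng.choices(range(k), weights=sizes, k=len(data))
    return [[rng.gauss(mu, sigmas[j]) for mu in centers[j]] for j in picks]


def choose_k(data, k_start=1, n_boot=39, alpha=0.05, seed=0):
    """Increase K while the data gain significantly more from K+1 classes
    than data simulated from the fitted K-class model do."""
    rng = random.Random(seed)
    k = k_start
    while k + 1 <= len(data):
        centers, labels, wss_k = kmeans_wss(data, k, rng)
        if wss_k == 0.0:
            break  # K classes already fit the data exactly
        observed = kmeans_wss(data, k + 1, rng)[2] / wss_k
        ratios = []
        for _ in range(n_boot):
            sim = simulate(data, centers, labels, rng)
            sim_k = kmeans_wss(sim, k, rng)[2]
            sim_k1 = kmeans_wss(sim, k + 1, rng)[2]
            ratios.append(sim_k1 / sim_k if sim_k > 0 else 1.0)
        # Share of simulated data sets that improve at least as much as the real one.
        p_value = (1 + sum(r <= observed for r in ratios)) / (n_boot + 1)
        if p_value > alpha:
            break  # preset condition not met: K is final
        k += 1  # preset condition met: accept K+1 classes and test again
    return k
```

The sketch compares WSS ratios rather than raw WSS values because adding a class always lowers WSS. The relevant question is whether the real data improve more than data that genuinely come from K classes.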