CN117349087B

CN117349087B - Internet information data backup method

Info

Publication number: CN117349087B
Application number: CN202311648260.5A
Authority: CN
Inventors: 金保国; 李沙莎
Original assignee: Liaocheng Luoxi Information Technology Co ltd
Current assignee: Liaocheng Luoxi Information Technology Co ltd
Priority date: 2023-12-05
Filing date: 2023-12-05
Publication date: 2024-02-09
Anticipated expiration: 2043-12-05
Also published as: CN117349087A

Abstract

The invention relates to the technical field of data backup, and provides an internet information data backup method, which comprises the following steps: collecting a plurality of data packets of a network node; constructing a topological structure for all network nodes according to the data packet to obtain a plurality of nodes to be calculated of each network node; obtaining the classification weight of each attribute according to the similarity between attribute information of the same attribute in different data packets; obtaining distance measurement of any two nodes to be calculated of each network node; acquiring a plurality of optimal branches of each network node according to the topological structure and the distance measurement; obtaining the key degree of each network node; obtaining a target value of each character of each network node according to the key degree of each network node and the data packet; and acquiring a plurality of backup nodes and storage data. The invention aims to solve the problem that the network node cannot accurately carry out backup due to the connection relation.

Description

Internet information data backup method

Technical Field

The invention relates to the technical field of data backup, in particular to an internet information data backup method.

Background

In today's information age, real life is increasingly dependent on the internet, and a large amount of data is generated, stored and transmitted on various network nodes every day, including many types of information such as social media accounts, emails, online shopping and banking transactions. Because hardware failures of network devices, software errors, network attacks or other unknown factors may cause data to be lost or corrupted, resulting in significant losses to individuals and businesses, backing up network node data has become critical.

For data backup, existing methods usually adopt redundant compression. Redundant compression stores redundant storage data and routing information in a plurality of network nodes, which effectively prevents data loss and service interruption caused by a single-node fault; by screening part of the data for backup, the volume of backed-up data can be further reduced while still meeting the availability and integrity requirements of the data. However, even if a backup node exists, if it does not contain the data of the lost node, it still cannot immediately take over the work of the original node once the original network node fails. Therefore, a redundant storage method is adopted, in which partial data of other nodes are stored in the backup node. However, because the storage space of each network node (including the backup node) is limited, it is necessary to determine which data of which other nodes are stored in the backup node, that is, the data must be screened to obtain the storage data.

Disclosure of Invention

The invention provides an internet information data backup method, which solves the problem that existing network nodes cannot be backed up accurately because of their connection relations, and adopts the following technical scheme:

An embodiment of the present invention provides a method for backing up internet information data, including the steps of:

collecting a plurality of data packets of a network node;

constructing a topological structure for all network nodes according to the data packet to obtain a plurality of nodes to be calculated of each network node; obtaining the classification weight of each attribute according to the similarity between attribute information of the same attribute in different data packets; obtaining distance measurement of any two nodes to be calculated of each network node according to the network node and the data packet of the corresponding node to be calculated; acquiring a plurality of optimal branches of each network node according to the topological structure and the distance measurement;

obtaining the key degree of each network node according to the distribution of the nodes to be calculated on the optimal branch; constructing an objective function for each character of each network node according to the key degree of each network node and the data packet, and outputting a target value of each character of each network node;

and acquiring a plurality of backup nodes and storage data according to the target value of each character of each network node.

Further, the topology structure is constructed for all network nodes according to the data packet, so as to obtain a plurality of nodes to be calculated of each network node, which comprises the following specific methods:

For all data packets of all network nodes, the two network nodes included in each data packet are acquired; two network nodes that share a data packet have a connection relation, and the topological structure is constructed according to the connection relations between every two network nodes; for any one network node, all other network nodes directly connected or indirectly connected with the network node are recorded as the plurality of nodes to be calculated of the network node.

Further, the specific method for obtaining the classification weight of each attribute includes:

extracting, for any attribute, the attribute information of the two network nodes in each data packet, and digitizing the characters in the attribute information to obtain the vector of the attribute of each network node in each data packet, wherein, in the digitization and vector conversion, the maximum character number of the attribute information is used as the dimension of the vector, and positions short of that character number are padded; obtaining the vector of each attribute of each network node in each data packet, so that a plurality of vectors are obtained for each attribute; the classification coefficient $Q_i$ of the $i$-th attribute is determined by the following quantities:

wherein $Q_i$ denotes the classification coefficient of the $i$-th attribute; $n_i$ denotes the number of dimensions of the vectors of the $i$-th attribute; $j$ denotes the number of dimensions selected from each vector of the $i$-th attribute; $\bar{c}_{i,j}$ denotes the mean cosine similarity among the vectors obtained from the first $j$ dimensions of each vector of the $i$-th attribute, which is acquired as follows: the first $j$ dimensions are extracted from each vector of the $i$-th attribute to obtain a new vector, the cosine similarity is obtained for any two of the new vectors, and the cosine similarities are then averaged; $M$ denotes the number of kinds of attributes other than the $i$-th attribute; $n_k$ denotes the number of dimensions of the vectors of the $k$-th attribute other than the $i$-th attribute; $n_{\max}$ denotes the maximum number of vector dimensions among all attributes other than the $i$-th attribute; $\sigma_k^2$ denotes the variance of the cosine similarities between all vectors of the $k$-th attribute other than the $i$-th attribute;

and obtaining the classification coefficient of each attribute, and carrying out softmax normalization on the classification coefficients of all the attributes, wherein the obtained result is used as the classification weight of each attribute.

Further, the distance measurement of any two nodes to be calculated of each network node is obtained by the specific method that:

obtaining a plurality of vector groups of each network node according to the plurality of data packets of each network node; taking any network node as a target node, for the $p$-th node to be calculated and the $q$-th node to be calculated of the target node, one vector group is selected from the several vector groups of the $p$-th node to be calculated and one from the several vector groups of the $q$-th node to be calculated to obtain a vector group combination, so that several vector group combinations of the $p$-th and $q$-th nodes to be calculated are obtained; the distance measure $D_{p,q}$ between the $p$-th node to be calculated and the $q$-th node to be calculated is determined by the following quantities:

wherein $N_{p,q}$ denotes the number of vector group combinations of the $p$-th and $q$-th nodes to be calculated of the target node; $I$ denotes the number of kinds of attributes; $w_i$ denotes the classification weight of the $i$-th attribute; $s_{p,q,n,i}$ denotes the cosine similarity between the two vectors of the $i$-th attribute in the $n$-th vector group combination of the $p$-th and $q$-th nodes to be calculated of the target node.

Further, the method for obtaining a plurality of vector groups of each network node according to the plurality of data packets of each network node includes the following specific steps:

For any one data packet of any network node, acquiring vectors corresponding to various attribute information of the network node in the data packet, forming a vector group of the network node in the data packet by the vectors of various attribute information of the network node in the data packet, and acquiring the vector group of the network node in each data packet to acquire a plurality of vector groups of the network node.

Further, the specific acquisition method of the plurality of optimal branches of each network node is as follows:

K-means clustering is carried out on the plurality of nodes to be calculated of the target node, using the distance measurement already obtained between the nodes to be calculated as the clustering distance, to obtain a plurality of clusters, which are recorded as a plurality of categories;

starting from the target node, according to the connection relation (including direct connection and indirect connection) between each node to be calculated and the target node in the topological structure, a plurality of branches are obtained outwards from the target node; in the process of acquiring the branches, if, after a branch reaches a node to be calculated, that node to be calculated is connected with a plurality of other nodes to be calculated, the node to be calculated is marked as a bifurcation node, the node to be calculated immediately before the bifurcation node on the branch is marked as the pre-bifurcation node, and the plurality of nodes to be calculated directly connected with the bifurcation node are marked as nodes to be selected;

obtaining the category of the pre-bifurcation node, deleting the nodes to be selected whose category differs from that of the pre-bifurcation node, retaining the nodes to be selected whose category is the same as that of the pre-bifurcation node, and marking the retained nodes to be selected as nodes to be branched; among all the nodes to be branched, the one with the minimum distance measurement to the pre-bifurcation node is taken as the optimal branch node of the bifurcation node; if there is only one node to be branched, it is directly taken as the optimal branch node; if there is no node to be branched, the branch stops extending;

according to the above branch extension method for a bifurcation node, the optimal branch node of each bifurcation node is obtained and the branch continues to extend, and the plurality of branches finally obtained are recorded as the plurality of optimal branches of the target node.

Further, the obtaining the key degree of each network node includes the following specific methods:

taking any one network node as a target node, for any one optimal branch of the target node, a class change curve of the optimal branch is constructed with the order of each node to be calculated in the optimal branch as the abscissa and the serial number of the category to which the node to be calculated belongs as the ordinate, wherein the order in the optimal branch starts from the node to be calculated that is directly connected with the target node, whose order is 1, and the following nodes are numbered sequentially along the optimal branch; the several categories of the target node are numbered to obtain the ordinate of each node to be calculated on the optimal branch;

acquiring the class change curve of each optimal branch of the target node, calculating the DTW distance for any two class change curves, and obtaining the mean of all the DTW distances; the key degree of the target node is obtained from the following quantities, wherein $\bar{d}$ represents the mean of all DTW distances, $\alpha$ is a hyperparameter, and $\exp(\cdot)$ is the exponential function with the natural constant as its base.
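The key-degree computation above can be illustrated with a minimal sketch. The text names the ingredients (pairwise DTW distances between class change curves, their mean, a hyperparameter, and an exponential function) but the exact expression is not legible in this text, so the form `exp(-alpha * mean_dtw)` used below is an assumption chosen only because it grows as the branches become more similar. The function names, the default `alpha`, and the handling of a node with a single optimal branch are likewise hypothetical.

```python
import math
from itertools import combinations

def dtw_distance(curve_a, curve_b):
    """Classic dynamic-time-warping distance with absolute difference as local cost."""
    n, m = len(curve_a), len(curve_b)
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(curve_a[i - 1] - curve_b[j - 1])
            cost[i][j] = step + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]

def key_degree(class_curves, alpha=0.1):
    """Assumed form: exp(-alpha * mean pairwise DTW distance)."""
    pairs = list(combinations(class_curves, 2))
    if not pairs:
        # Assumption: with fewer than two branches the mean distance is taken as 0.
        return 1.0
    mean_dtw = sum(dtw_distance(a, b) for a, b in pairs) / len(pairs)
    return math.exp(-alpha * mean_dtw)

# Each curve lists the category serial numbers along one optimal branch.
curves = [[1, 2, 3], [1, 2, 5]]
degree = key_degree(curves, alpha=0.5)
```

Under this assumed form, identical curves give a key degree of 1 and the value decays towards 0 as the branches diverge.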

Further, the method for constructing an objective function for each character of each network node and outputting a target value of each character of each network node includes the following specific steps:

for the $a$-th network node, obtaining the several kinds of characters of all data in all data packets of the $a$-th network node; for the $b$-th kind of character, each occurrence of the character is combined with the adjacent previous character to obtain a character combination, thereby obtaining several character combinations of the $b$-th kind of character;

for any character combination, acquiring the interval between every two adjacent occurrences of the character combination to obtain several intervals of the character combination, and recording the mean of all the intervals as the average interval of the character combination; acquiring the variance of the average intervals of all character combinations of the $b$-th kind of character and recording it as the occurrence fluctuation coefficient of the character, thereby obtaining the occurrence fluctuation coefficient of each kind of character of the $a$-th network node; carrying out linear normalization on the occurrence fluctuation coefficients of each kind of character in all data packets of each network node, and recording the result as the occurrence fluctuation degree of each kind of character;

According to the character combination and the fluctuation degree of a plurality of characters of the network node and the key degree of the network node, constructing an objective function for each character of each network node, and outputting the target value of each character of each network node through the objective function.
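The character statistics described above can be sketched as follows. This is an illustration only: it treats the data of a node as one string, uses the population variance, skips character combinations that occur only once (they have no interval), and assigns a coefficient of 0 to characters without any usable interval. None of these choices is stated in the text, and the function names are hypothetical.

```python
from collections import defaultdict
from statistics import mean, pvariance

def fluctuation_coefficients(data):
    """Variance of the average intervals of each character's combinations."""
    positions = defaultdict(list)  # (previous char, char) -> occurrence positions
    for index in range(1, len(data)):
        positions[(data[index - 1], data[index])].append(index)

    average_intervals = defaultdict(list)  # char -> average interval per combination
    for (_, char), where in positions.items():
        if len(where) >= 2:  # assumption: a single occurrence yields no interval
            gaps = [later - earlier for earlier, later in zip(where, where[1:])]
            average_intervals[char].append(mean(gaps))

    characters = {combination[1] for combination in positions}
    return {
        char: float(pvariance(average_intervals[char])) if average_intervals.get(char) else 0.0
        for char in characters
    }

def linear_normalize(values):
    """Min-max normalization; all-equal inputs map to 0.0 (assumption)."""
    low, high = min(values.values()), max(values.values())
    if high == low:
        return {key: 0.0 for key in values}
    return {key: (value - low) / (high - low) for key, value in values.items()}

coefficients = fluctuation_coefficients("ababcbxxxcb")
degrees = linear_normalize(coefficients)
```

In the sample string, the character `b` follows `a` at an average interval of 2 and follows `c` at an average interval of 5, so its coefficient is the variance of those two averages.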

Further, the objective function is specifically constructed by the following steps:

the objective function $F_{a,b}$ of the $b$-th kind of character of the $a$-th network node is determined by the following quantities:

wherein $K_a$ denotes the key degree of the $a$-th network node; $m_{a,b}$ denotes the number of character combinations of the $b$-th kind of character of the $a$-th network node; $m_{a,\max}$ denotes the maximum number of character combinations among all kinds of characters of the $a$-th network node; $G_{a,b}$ denotes the occurrence fluctuation degree of the $b$-th kind of character of the $a$-th network node.

Further, the specific obtaining method of the backup nodes and the stored data includes:

all the characters of all the network nodes are arranged in descending order of target value to obtain a character sequence, wherein each element in the character sequence is a two-tuple consisting of a network node and a character of that network node;

presetting a backup number, which is described by taking 100 as an example; acquiring a plurality of first backup data according to the character sequence and the backup number; for any one of the first backup data, taking its network node as a first backup node, combining the characters with the largest occurrence frequency among all data packets of the first backup node, and taking the combination as the storage data of the first backup node, thereby acquiring the first backup node corresponding to each first backup data and its storage data;

Acquiring a plurality of second backup data according to the character sequence and the backup quantity, and acquiring a second backup node corresponding to each second backup data and storage data thereof; and acquiring a plurality of groups of backup nodes and corresponding storage data according to the character sequence and the backup quantity.
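The ordering and grouping step can be sketched as below. The text does not state how the first, second and later backup data are drawn from the character sequence; the sketch assumes they are consecutive slices of the descending sequence, each of the preset backup number in length. The function name, the dictionary layout and the sample values are hypothetical.

```python
def group_backup_candidates(target_values, backup_number):
    """Sort (node, character) pairs by target value, then slice into groups.

    target_values maps a (network node, character) two-tuple to its target value.
    Assumption: group 1 is the first `backup_number` elements, group 2 the next, etc.
    """
    sequence = sorted(target_values, key=target_values.get, reverse=True)
    return [
        sequence[start:start + backup_number]
        for start in range(0, len(sequence), backup_number)
    ]

sample_values = {
    ("node-1", "a"): 0.9,
    ("node-2", "x"): 0.7,
    ("node-1", "b"): 0.4,
    ("node-3", "q"): 0.8,
    ("node-2", "y"): 0.1,
}
groups = group_backup_candidates(sample_values, backup_number=2)
```

Each tuple in a group names the backup node for that entry; deriving the storage data for the node is a separate step that this sketch does not attempt.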

The beneficial effects of the invention are as follows: the invention stores the data of other nodes in the backup node by a self-adaptive redundancy storage method, thereby realizing the safe backup of the Internet information data. The method comprises the steps of analyzing data in all network nodes to obtain the key degree of the network nodes, wherein the classification weight of each attribute is quantized according to various attribute information in a data packet of each network node; meanwhile, according to the topological structure of the network node and the data packet, constructing vectors of different attributes of the network node, clustering a plurality of nodes to be calculated of the target node to obtain a plurality of optimal branches of the network node, and quantifying the optimal branches to obtain a key degree, so that the key degree can represent the positions of the network node in a large amount of information transmission process; performing objective function construction according to the key degree and character distribution in the data packet to obtain target values of all characters, and screening backup nodes and storage data thereof, thereby avoiding the defect that the traditional redundant storage process is screened only according to time sequence; the cost of the backup in the recovery process is lower, and the safe backup of the Internet information data is realized.

Drawings

In order to more clearly illustrate the embodiments of the invention or the technical solutions of the prior art, the drawings used in the description of the embodiments or the prior art are briefly described below. It is obvious that the drawings described below show only some embodiments of the invention, and that a person skilled in the art can obtain other drawings from them without inventive effort.

Fig. 1 is a schematic flow chart of an internet information data backup method according to an embodiment of the present invention.

Detailed Description

The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.

Referring to fig. 1, a flowchart of an internet information data backup method according to an embodiment of the invention is shown, and the method includes the following steps:

Step S001, collecting a plurality of data packets of the network node.

The purpose of this embodiment is to perform backup storage on network nodes in the internet, so that firstly, the network nodes need to be acquired, and meanwhile, a plurality of data packets of each network node need to be acquired, so as to provide a foundation for the subsequent topology structure and the data that the network nodes need to backup.

Specifically, the backup is performed on internet data, which is commonly found in an internet company, and in this embodiment, all network devices in a campus of the internet company are collected as network nodes, and all data packets generated by each network node from the beginning of the establishment are obtained, and it is to be noted that each data packet corresponds to two network nodes, namely, a network node for transmitting the data packet and a network node for receiving the data packet, so that all generated data packets are all data packets received or transmitted by the corresponding network node; meanwhile, each data packet contains information of various attributes, and in this embodiment, the IP address, the node ID, the port number and the device state of the network node are described as examples, and each data packet contains the information of the various attributes of two network nodes.

Thus, a plurality of network nodes and a plurality of data packets of each network node are obtained.

Step S002, constructing a topological structure for all network nodes according to the data packet to obtain a plurality of nodes to be calculated of each network node; obtaining the classification weight of each attribute according to the similarity between attribute information of the same attribute in different data packets; obtaining distance measurement of any two nodes to be calculated of each network node according to the network node and the data packet of the corresponding node to be calculated; and acquiring a plurality of optimal branches of each network node according to the topological structure and the distance measurement.

In the process of backing up the internet information data, analyzing the data in all network nodes to obtain the key data in the network nodes; in order to better perform data backup, the data to be stored by the expected backup node is key data, and according to the key data, data recovery can be performed well; therefore, the network nodes are required to be subjected to topology structure analysis, and an objective function is constructed to perform data screening by combining the distribution characteristics of each network node in the topology structure, so that data storage is performed.

It should be further noted that, for the topology structure, it is required to construct according to a plurality of data packets corresponding to each network node, where each data packet includes information of two network nodes, and the corresponding two network nodes present a connection relationship, and further obtain the topology structure according to the connection relationship; meanwhile, the data packet contains multiple attribute information, each attribute information is converted into a vector, the classification weight of each attribute is quantized through the similarity among the vectors, the classification weight characterizes the relevance of each attribute in different network nodes, and the classification weight is larger when the relevance is larger; and quantifying the distance measurement of the node to be calculated under each network node according to the classification weight of the attribute and the data packet, and providing a basis for acquiring a plurality of branches of each network node for subsequent clustering.

Specifically, for all data packets generated by all network nodes, acquiring two network nodes included in each data packet, wherein the corresponding data packets exist in the two network nodes, namely, the two network nodes have a connection relationship, a topology structure is constructed according to the connection relationship between all network nodes, and the topology structure is constructed as a known technology, so that the embodiment is not repeated; for any one network node, recording all other network nodes directly connected and indirectly connected with the network node as a plurality of nodes to be calculated of the network node, wherein the direct connection is the other network nodes connected with the network node, and the indirectly connected network node is the network node which can be finally connected with the network node through the other network nodes as an intermediary, namely the network node which is not directly connected but has a connecting path; and acquiring a plurality of nodes to be calculated of each network node according to the method.
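The construction described above can be sketched as follows, assuming each data packet has already been reduced to a (sender, receiver) pair of node identifiers. The function names and the sample identifiers are hypothetical; the breadth-first search is simply one common way to collect every node that has a connecting path to a given node.

```python
from collections import defaultdict, deque

def build_topology(packets):
    """Build an undirected adjacency map from (sender, receiver) pairs."""
    adjacency = defaultdict(set)
    for sender, receiver in packets:
        if sender != receiver:
            adjacency[sender].add(receiver)
            adjacency[receiver].add(sender)
    return adjacency

def nodes_to_be_calculated(adjacency, node):
    """Return every other node connected to `node`, directly or indirectly."""
    seen = {node}
    queue = deque([node])
    while queue:
        current = queue.popleft()
        for neighbor in adjacency.get(current, ()):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    seen.discard(node)  # the node itself is not one of its nodes to be calculated
    return seen

packets = [("A", "B"), ("B", "C"), ("D", "E")]
topology = build_topology(packets)
reachable_from_a = nodes_to_be_calculated(topology, "A")
```

Here `C` is indirectly connected to `A` through `B`, while `D` and `E` form a separate component and are therefore not nodes to be calculated of `A`.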

Further, each data packet includes multiple kinds of attribute information of two network nodes. For any attribute, the attribute information of the two network nodes in each data packet is extracted, and the characters in the attribute information are digitized to obtain the vector of the attribute of each network node in each data packet; in the digitization and vector conversion, the maximum character number of the attribute information is used as the vector dimension, and missing positions are padded. For example, for the IP address 192.168.1.1, the maximum is 255.255.255.255, so the maximum character number of the attribute is 15; the example is converted into 1 9 2 12 1 6 8 12 11 11 1 12 11 11 1, wherein "." is replaced by 12 and the padded positions are replaced by 11, and a 15-dimensional vector is formed from the converted numbers. According to this method, the vector of each attribute of each network node in each data packet is obtained, giving a plurality of vectors for each attribute. For the device state attribute, the device state comprises the device type and the node state of the network node; the different node states (device types) are digitized directly by numbering, a two-dimensional vector is formed from the attribute information, and a plurality of two-dimensional vectors are obtained for the device state attribute. Taking the $i$-th attribute as an example, the classification coefficient $Q_i$ of the attribute is determined by the following quantities:

wherein $Q_i$ denotes the classification coefficient of the $i$-th attribute; $n_i$ denotes the number of dimensions of the vectors of the $i$-th attribute; $j$ denotes the number of dimensions selected from each vector of the $i$-th attribute, that is, the first $j$ dimensions of the vector; $\bar{c}_{i,j}$ denotes the mean cosine similarity among the vectors obtained from the first $j$ dimensions of each vector of the $i$-th attribute, that is, the first $j$ dimensions are extracted from each vector to obtain a new vector, the cosine similarity is obtained for any two of the new vectors, and the cosine similarities are then averaged; $M$ denotes the number of kinds of attributes other than the $i$-th attribute; $n_k$ denotes the number of dimensions of the vectors of the $k$-th attribute other than the $i$-th attribute; $n_{\max}$ denotes the maximum number of vector dimensions among all attributes other than the $i$-th attribute; $\sigma_k^2$ denotes the variance of the cosine similarities between all vectors of the $k$-th attribute other than the $i$-th attribute, that is, the cosine similarity is obtained for any two of the vectors of the $k$-th other attribute, and the variance of these cosine similarities is then calculated. The classification coefficient of each attribute is obtained according to this method, softmax normalization is carried out on the classification coefficients of all the attributes, and the result is used as the classification weight of each attribute.
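The digitization of the IP address attribute can be sketched as below. The sketch assumes that each octet is left-padded to three positions, which is how the worked example for 192.168.1.1 reads; the values 11 (padding) and 12 (the "." separator) are taken from that example, while the function and constant names are hypothetical.

```python
PAD = 11  # value placed in padded positions, as in the worked example
DOT = 12  # value that replaces the "." separator, as in the worked example

def encode_ip(ip_address):
    """Convert a dotted IPv4 string into a 15-dimensional integer vector."""
    vector = []
    octets = ip_address.split(".")
    for index, octet in enumerate(octets):
        # Assumption: each octet is left-padded to three positions.
        vector.extend([PAD] * (3 - len(octet)))
        vector.extend(int(digit) for digit in octet)
        if index < len(octets) - 1:
            vector.append(DOT)
    return vector

encoded = encode_ip("192.168.1.1")
```

Every IPv4 address then maps to a vector of the same length, so vectors from different data packets can be compared dimension by dimension.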

At this time, the larger the cosine similarity mean value among vectors formed by different network nodes of different data packets under one attribute is, the larger the similarity is, the larger the relevance of different network nodes of different data packets is, and the larger the corresponding classification weight is; meanwhile, similarity of other attributes is analyzed, the larger the variance of cosine similarity of vectors of the other attributes is, the larger the discreteness of the cosine similarity is, association distribution of the other attributes is irregular, and accordingly classification weight of the attributes to be adjusted is reduced.
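Two building blocks of the classification weight can be sketched without assuming the full expression for the classification coefficient, which is not legible in this text: the mean cosine similarity over the first few dimensions of all vectors of one attribute, and the softmax normalization applied to the coefficients. The function names, the sample vectors and the sample coefficient values are hypothetical.

```python
import math
from itertools import combinations

def cosine_similarity(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

def mean_prefix_similarity(vectors, prefix_length):
    """Mean cosine similarity over all vector pairs, using only the first dimensions."""
    pairs = list(combinations(vectors, 2))
    if not pairs:
        return 0.0
    total = sum(
        cosine_similarity(u[:prefix_length], v[:prefix_length]) for u, v in pairs
    )
    return total / len(pairs)

def softmax(values):
    """Numerically stable softmax; the outputs sum to 1."""
    shift = max(values)
    exponentials = [math.exp(value - shift) for value in values]
    total = sum(exponentials)
    return [value / total for value in exponentials]

attribute_vectors = [[1, 9, 2, 12], [1, 9, 2, 12], [1, 7, 2, 12]]
similarity_first_two = mean_prefix_similarity(attribute_vectors, 2)

# Hypothetical classification coefficients for four attributes.
classification_weights = softmax([0.9, 0.4, 0.1, 0.2])
```

Softmax preserves the ordering of the coefficients, so the attribute with the largest coefficient also receives the largest classification weight.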

Further, for a data packet generated by any one of the network nodes, acquiring vectors corresponding to various attribute information of the network node in the data packet, forming a vector group of the network node in the data packet by using the vectors of various attribute information of the network node in the data packet, and acquiring a vector group of the network node in each generated data packet to obtain a plurality of vector groups of the network node; and acquiring a plurality of vector groups of each network node according to the method.

Further, any network node is taken as a target node, and the plurality of nodes to be calculated of the target node are acquired, each of which has several vector groups. Taking the $p$-th node to be calculated and the $q$-th node to be calculated of the target node as an example, one vector group is selected from the several vector groups of the $p$-th node to be calculated and one from the several vector groups of the $q$-th node to be calculated to obtain a vector group combination, so that several vector group combinations of the $p$-th and $q$-th nodes to be calculated are obtained; the distance measure $D_{p,q}$ between the $p$-th node to be calculated and the $q$-th node to be calculated is then determined by the following quantities:

wherein $N_{p,q}$ denotes the number of vector group combinations of the $p$-th and $q$-th nodes to be calculated of the target node; $I$ denotes the number of kinds of attributes; $w_i$ denotes the classification weight of the $i$-th attribute; $s_{p,q,n,i}$ denotes the cosine similarity between the two vectors of the $i$-th attribute in the $n$-th vector group combination of the $p$-th and $q$-th nodes to be calculated of the target node. It should be noted that the distance measure is the basis of the subsequent clustering, and in clustering, network nodes with a smaller distance measure are more easily gathered into one cluster, so the larger the cosine similarity, the smaller the distance measure should be; meanwhile, the classification weight serves as the reference weight of the different attributes, and the larger the classification weight, the greater the influence of the cosine similarity of the corresponding vectors on the distance measure. The distance measure of any two nodes to be calculated of the target node is obtained according to this method.
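A minimal sketch of a distance measure with the stated properties follows. The exact expression is not legible in this text; the form below (the weighted sum of one minus the cosine similarity over all attributes, averaged over all vector group combinations) is an assumption that merely satisfies the two stated requirements: higher similarity gives a smaller distance, and a larger classification weight gives an attribute more influence. All names and sample values are hypothetical.

```python
import math
from itertools import product

def cosine_similarity(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

def distance_measure(groups_p, groups_q, weights):
    """Assumed form: mean over combinations of sum_i w_i * (1 - cosine similarity).

    groups_p / groups_q: vector groups of two nodes; each group holds one
    vector per attribute, in the same attribute order as `weights`.
    """
    combos = list(product(groups_p, groups_q))  # every vector group combination
    total = 0.0
    for group_p, group_q in combos:
        for weight, vec_p, vec_q in zip(weights, group_p, group_q):
            total += weight * (1.0 - cosine_similarity(vec_p, vec_q))
    return total / len(combos)

weights = [0.7, 0.3]                      # hypothetical classification weights
node_p = [[[1, 2, 3], [1, 0]]]            # one vector group, two attributes
node_q = [[[1, 2, 3], [0, 1]]]            # same first attribute, orthogonal second
same_distance = distance_measure(node_p, node_p, weights)
mixed_distance = distance_measure(node_p, node_q, weights)
```

In the sample, only the second attribute differs, so the distance equals that attribute's weight.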

Further, K-means clustering is performed on the plurality of nodes to be calculated of the target node; this embodiment uses K=6, and the distance measurement between nodes to be calculated is the one obtained above. Six clusters are obtained and recorded as six categories. Starting from the target node, a plurality of branches are obtained outward from the target node according to the connection relation, direct or indirect, between each node to be calculated and the target node in the topological structure. In the process of obtaining the branches, bifurcation nodes exist: after a branch reaches a certain node to be calculated, that node is connected with a plurality of other nodes to be calculated, so the branch forks when it extends further. Such a node to be calculated is recorded as a bifurcation node; the node to be calculated immediately before the bifurcation node, i.e., the last node to be calculated that the branch passes before reaching the bifurcation node, is recorded as the preceding node; and the nodes to be calculated directly connected with the bifurcation node, excluding the preceding node, are recorded as candidate branch nodes. Since the greater the relevance between nodes to be calculated on a branch, the smaller their distance measurement, the category of the bifurcation node is obtained, the candidate branch nodes whose category differs from that of the bifurcation node are deleted, and the candidate branch nodes whose category is the same as that of the bifurcation node are retained and recorded as branch nodes. The branch node with the smallest distance measurement to the bifurcation node is taken as the optimal branch node of the bifurcation node; if the bifurcation node has no branch node, the branch stops extending. According to this method of extending a branch at a bifurcation node, the optimal branch node of each bifurcation node is obtained and the branch continues to extend; the branches finally obtained are recorded as the plurality of optimal branches of the target node.
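The branch extension at a bifurcation node can be sketched as follows. This is a minimal illustration under stated assumptions: the cluster category of each node to be calculated and the pairwise distance measurements are already available, the category and distance comparisons are made against the bifurcation node, and nodes already on the branch are never revisited (a guard added here to avoid cycles). The names `pick_optimal_branch_node`, `extend_branch`, `neighbors`, `category` and `distance` are hypothetical.

```python
def pick_optimal_branch_node(bifurcation, candidates, category, distance):
    """Keep candidates in the bifurcation node's category; return the closest one."""
    same_category = [n for n in candidates if category[n] == category[bifurcation]]
    if not same_category:
        return None  # no branch node left: the branch stops extending
    return min(same_category, key=lambda n: distance[frozenset((bifurcation, n))])


def extend_branch(target, first, neighbors, category, distance):
    """Extend one branch outward from `target`, starting at its direct neighbor `first`."""
    branch = [first]
    visited = {target, first}
    current = first
    while True:
        # Directly connected nodes, excluding the preceding node and anything
        # already on the branch (assumption: no revisiting).
        candidates = [n for n in neighbors.get(current, []) if n not in visited]
        if not candidates:
            break
        if len(candidates) == 1:
            nxt = candidates[0]  # no fork here, simply continue
        else:
            nxt = pick_optimal_branch_node(current, candidates, category, distance)
            if nxt is None:
                break
        branch.append(nxt)
        visited.add(nxt)
        current = nxt
    return branch
```

Calling `extend_branch` once for each node to be calculated that is directly connected with the target node would yield one branch per direct neighbor under these assumptions.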

Further, according to the above method, the distance measurement of any two nodes to be calculated of each network node is obtained, and a plurality of optimal branches of each network node are obtained through clustering and branch extension.

Thus, a plurality of nodes to be calculated of each network node are obtained by constructing the topological structure; the classification weight of each attribute is quantified from the attribute information of each network node across different data packets; the distance measurement between the nodes to be calculated is then obtained and used for clustering; and finally a plurality of optimal branches of each network node are obtained according to the connection relations of the network nodes and the clustering results.

Step S003, obtaining the key degree of each network node according to the distribution of the nodes to be calculated on the optimal branch; and constructing an objective function for each character of each network node according to the key degree of each network node and the data packet, and outputting the target value of each character of each network node.

It should be noted that, after the optimal branches of a network node are obtained, the key degree is quantified through the similarity between the different optimal branches of the same network node: the greater the similarity, the more likely the network node occupies a key position for information distribution, and the greater its key degree. After the key degree is obtained, the data in the data packets generated by each network node is analyzed (the data in a data packet does not include the attribute information), and an objective function is constructed according to the distribution of characters and character combinations. The aim of the objective function is that the more important a character is, the fewer character combinations it should have, i.e., the lower the cost of inferring the character. Meanwhile, the smaller the key degree of the network node, the more of the data in its data packets is received data, the more regular the characters are, and the lower the cost required for inference.

Specifically, any network node is taken as the target node. For any optimal branch of the target node, a class change curve of the optimal branch is constructed with the order of each node to be calculated in the optimal branch as the abscissa and the serial number of the class to which the node to be calculated belongs as the ordinate. The order in the optimal branch starts from the node to be calculated that is directly connected with the target node: the directly connected node to be calculated has order 1, and the orders increase along the optimal branch. Meanwhile, the classes of the target node are numbered, so that the ordinate of each node to be calculated on the optimal branch can be obtained. The class change curve of each optimal branch of the target node is obtained, the DTW distance is calculated for any two class change curves, the mean of all DTW distances is obtained, and $\exp(-\alpha \bar{D})$ is taken as the key degree of the target node, wherein $\bar{D}$ represents the mean of all DTW distances, $\alpha$ is a hyperparameter that prevents the output of the exponential function from becoming too small, for which this embodiment adopts a preset value, and $\exp(\cdot)$ is the exponential function with the natural constant as its base. This embodiment uses the $\exp(-x)$ model to present the inverse proportional relationship and perform normalization, where $x$ is the input of the model; an implementer can choose another inverse proportional function and normalization function according to the actual situation. The smaller the mean DTW distance of the class change curves, the more similar the class change curves are, i.e., the more similar the class change trend is on each optimal branch, the more likely the target node is at a key position, and the greater the key degree of the target node. The key degree of each network node is obtained according to the above method.
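The key-degree computation can be sketched as below, assuming the form $\exp(-\alpha \bar{D})$ with $\bar{D}$ the mean pairwise DTW distance, and using a textbook dynamic-programming DTW with absolute difference as the local cost. The default `alpha=0.1` is only a placeholder, not a value from the text, and the names `dtw_distance` and `key_degree` are hypothetical.

```python
import math
from itertools import combinations


def dtw_distance(curve_a, curve_b):
    """Classic O(n*m) dynamic time warping with |x - y| as the local cost."""
    n, m = len(curve_a), len(curve_b)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(curve_a[i - 1] - curve_b[j - 1])
            cost[i][j] = local + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def key_degree(class_curves, alpha=0.1):
    """exp(-alpha * mean pairwise DTW distance) over the class change curves.

    class_curves: one list of class serial numbers per optimal branch,
    ordered from the node directly connected with the target node.
    alpha: placeholder hyperparameter (assumed value).
    """
    pairs = list(combinations(class_curves, 2))
    if not pairs:
        raise ValueError("at least two optimal branches are needed")
    mean_dtw = sum(dtw_distance(a, b) for a, b in pairs) / len(pairs)
    return math.exp(-alpha * mean_dtw)
```

Identical class change curves give a mean DTW distance of 0 and therefore a key degree of 1; the more the curves diverge, the closer the value gets to 0.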

Further, for the $i$-th network node, the several kinds of characters of all data in all data packets of the network node are obtained. For the $j$-th kind of character, each occurrence of the character is combined with the adjacent preceding character to obtain a character combination, thereby obtaining a plurality of character combinations of the $j$-th character. For any character combination, the interval between every two adjacent occurrences of the character combination is obtained, where the interval is expressed as the number of characters between the two occurrences. The intervals are acquired data packet by data packet: starting from the first data packet of the $i$-th network node, the interval between the first and second occurrences of the character combination in the data packet is obtained, and so on; the interval between the last occurrence of the character combination in one data packet and its next occurrence is not acquired. A plurality of intervals of the character combination are thus obtained, and the mean of all the intervals is recorded as the average interval of the character combination. The variance of the average intervals of all character combinations of the $j$-th character is recorded as the occurrence fluctuation coefficient of the character. The occurrence fluctuation coefficient of each character in all data packets of each network node is obtained, all occurrence fluctuation coefficients are linearly normalized, and the result is recorded as the occurrence fluctuation degree of each character. The objective function $F_{i,j}$ of the $j$-th character of the $i$-th network node is then expressed as:

wherein $K_i$ represents the key degree of the $i$-th network node; $n_{i,j}$ represents the number of character combinations of the $j$-th character of the $i$-th network node; $n_i^{\max}$ represents the maximum number of character combinations over all characters of the $i$-th network node; and $B_{i,j}$ represents the occurrence fluctuation degree of the $j$-th character of the $i$-th network node. When the key degree of the network node is larger, its data is more important, and the number of character combinations of the character needs to be smaller so as to reduce the cost of backup; when the key degree is smaller, the character combinations corresponding to the character need to be regularly distributed in the data packets, and the larger the occurrence fluctuation degree, the smaller the objective function. The several kinds of characters of each network node are obtained according to the above method, an objective function is constructed for each character of each network node, and the target value of each character of each network node is output through the objective function.
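The occurrence fluctuation coefficient and its linear normalization can be sketched as follows. Several details are assumptions made for illustration: each data packet is treated as a string, the interval is counted as the number of characters strictly between two occurrences of a two-character combination (clamped at 0 when occurrences overlap), combinations that never repeat within a packet contribute no average interval, and the population variance is used. The names `occurrence_fluctuation_coefficient` and `linear_normalize` are hypothetical.

```python
from collections import defaultdict
from statistics import mean, pvariance


def occurrence_fluctuation_coefficient(packets, char):
    """Variance of the average intervals of all character combinations of `char`."""
    intervals = defaultdict(list)
    for data in packets:
        last_pos = {}  # reset per packet: intervals never span two packets
        for pos in range(1, len(data)):
            if data[pos] != char:
                continue
            combo = data[pos - 1:pos + 1]  # preceding character + the character
            if combo in last_pos:
                # Characters strictly between the two occurrences (assumed definition).
                intervals[combo].append(max(0, pos - last_pos[combo] - 2))
            last_pos[combo] = pos
    averages = [mean(values) for values in intervals.values()]
    return pvariance(averages) if averages else 0.0


def linear_normalize(values):
    """Min-max normalization to [0, 1]; all-equal inputs map to 0.0."""
    low, high = min(values), max(values)
    if high == low:
        return [0.0 for _ in values]
    return [(v - low) / (high - low) for v in values]
```

Applying `linear_normalize` to the coefficients of all characters would give the occurrence fluctuation degree used by the objective function.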

Thus, the key degree of each network node is obtained from its optimal branches, and the target value of each character of each network node is then quantified according to the character distribution in the data packets of the network node, providing a basis for subsequently obtaining the storage data of the backup nodes.

Step S004, according to the target value of each character of each network node, a plurality of backup nodes and storage data are obtained.

After the target value of each character of each network node is obtained, all characters of all network nodes are arranged in descending order of target value to obtain a character sequence; each element of the character sequence is a two-tuple consisting of a network node and a character of that network node. A backup number is preset; this embodiment uses a backup number of 100. The first 100 elements of the character sequence are taken as the first backup data. For any one of the first backup data, its network node is taken as a first backup node, and the character combination of its character that occurs most frequently across all data packets of the first backup node is taken as the storage data of the first backup node; in this way, the first backup node corresponding to each first backup datum and the storage data of that first backup node are obtained. Elements 101 to 200 of the character sequence are taken as the second backup data, and the second backup node corresponding to each second backup datum and its storage data are obtained according to the same method. A group of backup nodes and corresponding storage data is thus obtained for every 100 elements of the character sequence. It should be noted that, because a character combination consists of two characters, the same character combination may appear repeatedly as storage data; if a piece of storage data and its corresponding backup node have already appeared earlier in the acquisition process, the later occurrence no longer participates in the backup, and only the first occurrence of the storage data and backup node is retained. Backup storage starts from the first group of backup nodes and their storage data, where the first group of backup nodes is the plurality of first backup nodes, and each group of backup nodes and its storage data is backed up and stored in turn. In this embodiment, 5 groups are obtained in total, and redundant compression is used for the backup storage, which is not described in detail in this embodiment.
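The grouping and de-duplication described above can be sketched as follows, assuming the target values and the most frequent character combination of each (network node, character) pair have already been computed. The function name `build_backup_groups` and both input dictionaries are hypothetical; the default backup number of 100 follows the embodiment.

```python
def build_backup_groups(target_values, most_frequent_combo, backup_number=100):
    """Split the descending character sequence into groups of backup data.

    target_values: {(node, char): target value}
    most_frequent_combo: {(node, char): most frequent character combination}
    Returns a list of groups; each group is a list of (backup node, storage data).
    """
    # Character sequence: (node, char) pairs in descending order of target value.
    sequence = sorted(target_values, key=lambda key: target_values[key], reverse=True)
    seen = set()
    groups = []
    for start in range(0, len(sequence), backup_number):
        group = []
        for node, char in sequence[start:start + backup_number]:
            entry = (node, most_frequent_combo[(node, char)])
            if entry in seen:
                continue  # later duplicates no longer participate in the backup
            seen.add(entry)
            group.append(entry)
        groups.append(group)
    return groups
```

The groups are returned in the order in which they would be backed up, first group first; how the embodiment arrives at exactly 5 groups is not modeled here.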

Thus, the backup nodes and storage data among the network nodes of the Internet are obtained and backed up in batches, realizing adaptive backup of the network nodes in the Internet; meanwhile, when node information is lost, the information can be restored quickly by substituting the backed-up nodes.

The above description covers only the preferred embodiments of the present invention and is not intended to limit the invention; any modifications, equivalent substitutions, improvements, etc. made within the principles of the present invention shall be included in the scope of protection of the present invention.

Claims

1. An Internet information data backup method, characterized by comprising the following steps:

collecting a plurality of data packets of a network node;

acquiring a plurality of backup nodes and storage data according to the target value of each character of each network node;

the specific method for obtaining the classification weight of each attribute comprises the following steps:

wherein $c_m$ represents the classification coefficient of the $m$-th attribute; $D_m$ represents the number of dimensions of the vectors of the $m$-th attribute; $k$ represents the number of dimensions selected from each vector of the $m$-th attribute; $\bar{s}_{m,k}$ represents the mean cosine similarity between the vectors obtained from the first $k$ dimensions of each vector of the $m$-th attribute, which is obtained as follows: the first $k$ dimensions are extracted from each vector of the $m$-th attribute to obtain a new vector, the cosine similarity is computed for any two of the new vectors, and the cosine similarities are then averaged; $R$ represents the number of kinds of attributes other than the $m$-th attribute; $D_{m,r}$ represents the number of dimensions of the vectors of the $r$-th attribute other than the $m$-th attribute; $D_m^{\max}$ represents the maximum number of dimensions of the vectors over all attributes other than the $m$-th attribute; and $\sigma_{m,r}^2$ represents the variance of the cosine similarities between all vectors of the $r$-th attribute other than the $m$-th attribute;

acquiring classification coefficients of each attribute, and carrying out softmax normalization on the classification coefficients of all the attributes, wherein the obtained result is used as classification weight of each attribute;

the distance measurement of any two nodes to be calculated of each network node is obtained by the following specific method:

2. The method for backing up internet information data according to claim 1, wherein the topology structure is constructed for all network nodes according to the data packet to obtain a plurality of nodes to be calculated of each network node, comprising the following specific steps:

3. The method for backing up internet information data according to claim 1, wherein said obtaining a plurality of vector groups of each network node according to a plurality of data packets of each network node comprises the following specific steps:

4. The method for backing up internet information data according to claim 1, wherein the plurality of optimal branches of each network node are specifically obtained by:

5. The method for backing up internet information data according to claim 4, wherein the obtaining the key degree of each network node comprises the following specific steps:

acquiring a class change curve of each optimal branch of the target node, calculating the DTW distance for any two class change curves, obtaining the mean of all DTW distances, and taking $\exp(-\alpha \bar{D})$ as the key degree of the target node, wherein $\bar{D}$ represents the mean of all DTW distances, $\alpha$ is a hyperparameter, and $\exp(\cdot)$ is the exponential function with the natural constant as its base.

6. The method for backing up internet information data according to claim 1, wherein said constructing an objective function for each character of each network node and outputting a target value for each character of each network node comprises the specific steps of:

7. The method for backing up internet information data according to claim 6, wherein the objective function is constructed by the following specific method:

the objective function $F_{i,j}$ of the $j$-th character of the $i$-th network node is expressed as:

wherein $K_i$ represents the key degree of the $i$-th network node; $n_{i,j}$ represents the number of character combinations of the $j$-th character of the $i$-th network node; $n_i^{\max}$ represents the maximum number of character combinations over all characters of the $i$-th network node; and $B_{i,j}$ represents the occurrence fluctuation degree of the $j$-th character of the $i$-th network node.

8. The method for backing up internet information data according to claim 6, wherein the specific obtaining method comprises: