CN114567501A

CN114567501A - Automatic asset identification method, system and equipment based on label scoring

Info

Publication number: CN114567501A
Application number: CN202210213411.3A
Authority: CN
Inventors: 张雪梅; 李元雄
Original assignee: Kelai Network Technology Co ltd
Current assignee: Kelai Network Technology Co ltd
Priority date: 2022-03-04
Filing date: 2022-03-04
Publication date: 2022-05-31
Anticipated expiration: 2042-03-04
Also published as: CN114567501B

Abstract

The invention relates to the technical field of asset identification, and in particular to an automatic asset identification method, system and device based on label scoring. The method comprises the following steps: S1, generating a portrait table from the acquired network traffic data packets; and S2, extracting asset scoring data from the portrait table, and calculating a score for asset discrimination according to multi-dimensional, multi-level indexes and their weight proportions. The method establishes multi-dimensional, multi-level asset identification indexes, so the resulting scores describe and evaluate assets more objectively, which improves the efficiency and accuracy of asset identification.

Description

Automatic asset identification method, system and equipment based on label scoring

Technical Field

The invention relates to the technical field of asset identification, in particular to an asset automatic identification method, system and device based on label scoring.

Background

The core of network security analysis is to inventory network assets and manage them in a unified way. However, most network administrators do not know which assets exist in their own network, so manually inventoried asset lists are limited and incomplete. There is therefore a need for an automatic asset identification technique that helps network administrators inventory the assets in a network.

Current asset identification technology is mainly based on active-probing network scanning, including ICMP, TCP, UDP and other scanning techniques, to quickly discover online assets. That is, data packets are sent to the target network assets to be detected and identified, and the scanner waits for the devices to respond; if a response is received, the IP and MAC information of the responding device is recorded.

As an asset identification technology, network scanning based on active probing has the following main disadvantages:

1. Under the network scanning mechanism, a target host that is not alive does not respond to the request, so false negatives occur.

2. The network scanning mechanism sends a probe packet and waits for the target host to respond, but a device that responds in the actual network is not necessarily a real asset of the system network, so the accuracy of asset identification is low.

Therefore, the invention is mainly based on passive detection and performs asset identification according to the characteristics of network traffic data. As the closest prior art, the patent "method for intelligently combing assets based on network flow characteristics, computer program and storage medium" discloses the following method: based on passive traffic identification, continuously analyzing the traffic and attaching portrait labels to all active hosts that generate traffic and exhibit active network behavior; and, using the portrait labels of the active hosts, performing a feasibility probability calculation according to the characteristics of each asset type to identify the asset type of each host.

Although that scheme uses scoring when judging the asset type, it does not provide comprehensive scoring indexes for clients and servers and does not consider how much each index influences asset determination, so the final score cannot support efficient and accurate judgment. It does not consider accumulation over time in the calculation, and scoring only the data at a single moment is subject to chance and yields inaccurate judgments. In addition, when extracting feature values from traffic data, the scheme does not consider how to reduce memory consumption and improve efficiency in the face of large-scale feature extraction.

Disclosure of Invention

To overcome the above problems, the invention establishes multi-dimensional, multi-level asset identification indexes, takes the weight of each index into account, and provides an automatic asset identification method, system and device based on label scoring, which effectively improves the accuracy of asset identification.

In order to achieve the above purpose, the invention provides the following technical scheme:

An automatic asset identification method based on label scoring comprises the following steps:

S1, generating a portrait table from the acquired network traffic data packets;

S2, extracting asset scoring data from the portrait table, and calculating a score for asset discrimination according to multi-dimensional, multi-level indexes and their weight proportions.

As a preferred aspect of the present invention, the score is calculated based on a service asset scoring index and/or an IP asset scoring index.

As a preferred aspect of the present invention, the service asset scoring index includes a base information dimension, an activity dimension, and a client dimension.

As a preferred scheme of the invention, the basic information dimension comprises a host name, a host alias, a network segment and a protocol.

As a preferred scheme of the invention, the activity dimension comprises accumulated service providing days, service activity type distribution and service activity period distribution.

As a preferred scheme of the present invention, the client dimension includes the total proportion of accessing clients, the client distribution range, and the proportion of clients accessing during working and non-working periods.

Preferably, the IP asset scoring index includes a server attribute and a client attribute.

As a preferred aspect of the present invention, the client attributes include the number of servers accessed, the proportion of accessed servers that are assets, the score level of the accessed services, and the cumulative number of days of appearance.

When the scoring indexes are set, the scoring indexes are classified according to service assets and IP assets, the multi-level scoring indexes are respectively set according to the characteristics of the assets in various categories, specific acquisition modes, calculation modes and scoring strategies are given for each index, and the accuracy of asset identification is effectively improved through weighted calculation.

As a preferred embodiment of the present invention, in step S1 the network traffic data packets are aggregated according to preset hierarchical time periods, and at each level the traffic data is marked with different network characteristic labels, so as to obtain the portrait table.

In generating the portrait table from the network traffic data packets acquired in real time, the traffic data is aggregated level by level to compute index data over a period of time, and at each level the traffic is marked with different network characteristic labels to build the network asset portrait.

As a preferred scheme of the invention, the hierarchical time periods comprise days and hours. During data aggregation, the hour tables are aggregated into day tables, so accumulation over time is taken into account when the asset scoring data is obtained. This avoids the chance effects of scoring only the data at a single moment and balances the efficiency and accuracy of asset discrimination.

As a preferred scheme of the present invention, a part of data is screened from the network traffic data packet for aggregation.

The generated portrait table comprises an IP portrait table and a service portrait table. The service portrait table is generated by aggregating on server IP + server port; the IP portrait table is generated by aggregating on IP.

As a preferred aspect of the present invention, the screening out part of the data includes screening out the triplet information from the seven-tuple information.

As a preferred embodiment of the present invention, the screened triplet information includes: client IP, server IP and server port. The portrait of the network assets is obtained by aggregating the traffic data level by level; the data is screened during aggregation and the portrait table is generated from the key data only, which avoids the memory consumption caused by extracting and computing over a large amount of data.

As a preferred embodiment of the present invention, screening the triplet information from the seven-tuple information specifically includes: aggregating the seven-tuple information acquired in real time by hour to generate a triple hour table, and aggregating the triple hour table by day to generate a triple day table. The triple hour table is obtained by aggregating the seven-tuple original table acquired in real time, and the triple day table is obtained by aggregating the triple hour table. Aggregating at the hour-table level allows memory to be released once per hour, so system performance is not affected. Moreover, computing the triple day table from fields generated in the triple hour table makes data loss and omission less likely.

As a preferable scheme of the invention, the method further comprises the following steps:

S15, marking service and IP day-level indexes on the service portrait table and the IP portrait table respectively, wherein the day-level indexes comprise the earliest and latest time points at which the server provides the service in the day, the number of intranet clients among the clients accessing the service in the day, the first and last time points at which the IP appears in the network in the day, and whether the IP is active during the working period in the day.

As a preferred embodiment of the present invention, after the data packets in the network traffic are acquired in real time in step S1, the method further includes the following step: according to the defined intranet and extranet ranges, deleting the extranet traffic data from the data packets, so that only intranet traffic data packets enter the subsequent asset identification process.

As a preferable embodiment of the present invention, step S1 further includes the following step: comparing the portrait table with the asset ledger to find the assets to be determined, generating an asset to be determined for each service or IP that is neither a confirmed asset nor a confirmed non-asset, and, correspondingly, calculating the asset discrimination score of the assets to be determined in step S3. This step screens out the assets to be determined before the asset discrimination scores are calculated, so scores need not be calculated for every entry; because objects already known to be assets are not scored again, calculation efficiency is improved.

Based on the same conception, the automatic asset identification system based on the label scoring is also provided, and comprises the following components:

the first processing module is used for generating a portrait table from the acquired network traffic data packets;

and the second processing module is used for extracting asset scoring data from the portrait table and calculating a score for asset discrimination according to multi-dimensional, multi-level indexes and their weight proportions.

Based on the same concept, the automatic asset identification device based on the label scoring is further provided, and comprises at least one processor and a memory which is in communication connection with the at least one processor; the memory stores instructions executable by the at least one processor to cause the at least one processor to perform any of the tag scoring-based asset automatic identification methods described above.

Based on the same concept, a non-transitory computer-readable storage medium is also proposed, on which a computer program is stored, which computer program, when being executed by a processor, realizes the steps of any of the above described automatic asset identification methods based on tag scoring.

Compared with the prior art, the invention has the beneficial effects that:

The method extracts asset scoring data from the portrait table generated from the traffic data and calculates a score for asset discrimination according to multi-dimensional, multi-level indexes and their weight proportions. The multi-dimensional, multi-level indexes comprehensively evaluate the characteristics of an asset, and the weights reflect how strongly each index influences the determination of whether an object is an asset, so the resulting score describes and evaluates the asset objectively and improves the efficiency and accuracy of asset identification.

Description of the drawings

FIG. 1 is a flowchart of the automatic asset identification method based on label scoring in Example 1;

FIG. 2 is a flowchart of step S1 in Example 1, generating a portrait table from the acquired network traffic data packets;

FIG. 3 is a detailed flowchart of automatically identifying service assets and IP assets in a network according to scores in Example 1;

FIG. 4 is a flowchart of generating a portrait table from the data packets in Example 1.

Detailed Description

The present invention will be described in further detail with reference to test examples and specific embodiments. It should be understood that the scope of the above-described subject matter is not limited to the following examples, and any techniques implemented based on the disclosure of the present invention are within the scope of the present invention.

Example 1

An automatic asset identification method based on label scoring is shown in FIG. 1, wherein step S1 specifically includes the following steps:

S110, acquiring data packets in the network traffic in real time;

S120, generating a portrait table from the data packets;

S130, comparing the portrait table with the asset ledger to determine the assets to be determined.

The specific flow of step S1, generating the portrait table from the acquired network traffic data packets, is shown in FIG. 2. It should be noted that the scheme of the present invention may extract asset scoring data from all generated portrait tables and calculate scores for asset discrimination; alternatively, it may compare the portrait tables with the asset ledger according to a preset condition, remove the entries already known to be assets, determine the assets to be determined, and extract asset scoring data and calculate asset discrimination scores only for those. Step S130 is therefore a preferred step: objects already known to be assets are not scored again, which improves calculation efficiency. The scheme can still be implemented if step S130 is omitted, so step S130 is not to be construed as a limitation of the present invention.

A specific flow diagram for the automated identification of service assets and IP assets in a network based on scores is shown in FIG. 3.

Preferably, step S120 specifically includes: acquiring data packets in the network traffic in real time, parsing the data packets to generate seven-tuple information, and storing it as a seven-tuple original table. The seven-tuple information comprises the source IP address, destination IP address, protocol, source port, destination port, source physical address and destination physical address.

As a preferred scheme, after the data packets in the network traffic are acquired in real time in step S110, the extranet traffic is screened out according to the defined intranet and extranet ranges, and only the intranet traffic enters the subsequent asset identification process. Specifically, the user manually configures the intranet range on the interface, and traffic that does not belong to the user-specified intranet range is screened out. The network traffic data packets are then aggregated according to preset hierarchical time periods, and at each level the traffic data is marked with different network characteristic labels to obtain the portrait table.
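The intranet screening described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the CIDR ranges, the packet dictionary keys and the function names `is_intranet` and `screen_intranet` are all hypothetical, and the sketch assumes that a packet counts as intranet traffic only when both endpoints fall inside the configured range, which is one possible reading of the text.

```python
import ipaddress

# Hypothetical intranet range, standing in for what a user configures on the interface.
INTRANET_RANGES = [ipaddress.ip_network(cidr) for cidr in ("10.0.0.0/8", "192.168.0.0/16")]


def is_intranet(ip: str) -> bool:
    """Return True if the address falls inside any configured intranet range."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in INTRANET_RANGES)


def screen_intranet(packets: list[dict]) -> list[dict]:
    """Keep packets whose source and destination are both intranet addresses.

    Assumption: each packet is a dict with 'src_ip' and 'dst_ip' keys.
    """
    return [p for p in packets if is_intranet(p["src_ip"]) and is_intranet(p["dst_ip"])]
```

Only the packets returned by `screen_intranet` would then be passed on to the aggregation steps.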

Preferably, generating the portrait table from the data packets in step S120 means generating day-level IP portrait tables and service portrait tables for the IPs and services (IP:PORT) in the traffic and marking them with labels. The process is shown in FIG. 4, and the specific steps include:

S121, obtaining a seven-tuple original log table from the traffic data packets collected in real time, according to the source IP address, destination IP address, protocol, source port, destination port, source physical address and destination physical address.

S122, removing the source and destination MAC (namely the source and destination physical addresses), protocol and port indexes from the seven-tuple original log table, and aggregating by hour with server IP, server port and client IP as the key to generate a triple hour table comprising client IP, server IP and server port.

In this process, first, scan data with an incomplete three-way handshake is discarded and only data of established sessions is retained; second, hour-level indexes are marked on the triple hour table, such as the session duration, the total number of sessions, the number of rejected sessions, the number of handsfree packet sessions, the number of long-duration sessions, the numbers of uploaded and downloaded packets, and whether encrypted sessions exist. Aggregating by hour with server IP + server port + client IP as the key specifically means: once per hour, the seven-tuple original log is counted and aggregated, and records whose server IP, server port and client IP field values are all identical are merged into one record.
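The hourly aggregation in S122 can be sketched as below. This is an illustrative sketch under stated assumptions: the record layout (keys `ts`, `client_ip`, `server_ip`, `server_port`, `established`, `duration`) and the function names are hypothetical, the client/server roles are assumed to have been resolved already, and only two of the hour-level indexes (session count and session duration) are shown.

```python
from collections import defaultdict
from datetime import datetime


def hour_bucket(ts: datetime) -> datetime:
    """Truncate a timestamp to the start of its hour."""
    return ts.replace(minute=0, second=0, microsecond=0)


def aggregate_hourly(records):
    """Merge session records into triple hour rows.

    Rows are keyed by (hour, server_ip, server_port, client_ip); records
    sharing a key are merged into a single row.
    """
    table = defaultdict(lambda: {"sessions": 0, "duration": 0.0})
    for r in records:
        if not r["established"]:          # discard scans with an incomplete handshake
            continue
        key = (hour_bucket(r["ts"]), r["server_ip"], r["server_port"], r["client_ip"])
        row = table[key]
        row["sessions"] += 1              # hour-level index: total number of sessions
        row["duration"] += r["duration"]  # hour-level index: session duration
    return dict(table)
```

Because each hour is aggregated independently, the raw records for an hour can be dropped once its rows have been written.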

S123, aggregating the triple hour table at the day level to generate a triple day table. In this process, day-level indexes are marked on the triple day table, such as the number of distinct hours in which the client accessed the server on the current day, whether all accesses on the current day were rejected, and the connection duration during which the client's access to the service generated valid data on the current day.

In the invention, the triple day table is not obtained directly from the seven-tuple original table; instead, the triple hour table is added as an intermediate level, for the following reasons: 1. if the triple day table were generated directly from the seven-tuple table, the seven-tuple information would have to be kept in memory for a full day before the result could be computed, so the memory could not be released and system performance would suffer; 2. generating the triple day table directly would lose hour-level labels (such as "whether there is traffic in a certain hour"), whereas adding the triple hour table preserves them and allows memory to be released once per hour, which improves system efficiency; 3. some fields in the triple day table must be calculated from fields generated in the triple hour table (for example, "the number of distinct hours in which the client accessed the server on the current day"), and computing them directly without the triple hour table would increase the amount of calculation and incur excessive performance overhead.
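A minimal sketch of the hour-to-day roll-up follows, showing how a day-level field such as the number of distinct access hours can be derived from hour-level rows without revisiting the raw seven-tuple data. The table layout, the field names `sessions` and `active_hours`, and the function name are assumptions made for illustration.

```python
from collections import defaultdict
from datetime import date, datetime


def aggregate_daily(hour_table: dict) -> dict:
    """Roll triple hour rows up into triple day rows.

    `hour_table` maps (hour: datetime, server_ip, server_port, client_ip)
    to a dict holding at least a 'sessions' count (hypothetical layout).
    """
    days = defaultdict(lambda: {"sessions": 0, "hours": set()})
    for (hour, server_ip, server_port, client_ip), row in hour_table.items():
        day: date = hour.date()
        day_row = days[(day, server_ip, server_port, client_ip)]
        day_row["sessions"] += row["sessions"]
        day_row["hours"].add(hour.hour)   # derived from the hour-level rows
    # 'active_hours': number of distinct hours in which the client accessed the server
    return {
        key: {"sessions": v["sessions"], "active_hours": len(v["hours"])}
        for key, v in days.items()
    }
```

The input here has at most 24 rows per triple per day, which is the reason the intermediate hour table keeps the daily computation cheap.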

S124, aggregating the triple day table on server IP + server port to generate the service portrait table (recording only data whose server IP is in the intranet), and aggregating on IP to generate the IP portrait table (recording only data whose IP is in the intranet). The contents of the portrait table include one or more of: host MAC manufacturer, IP characteristics, data packet distribution, transmission mode, message type, payload characteristics, operating system, software and version, data direction, session protocol, port access, communication frequency and communication mode.

In generating the portrait tables, service and IP day-level indexes are marked on the service portrait table and the IP portrait table respectively, such as the earliest and latest time points at which the server provides the service on the current day, the number of intranet clients among the clients accessing the service on the current day, the first and last time points at which the IP appears in the network on the current day, and whether the IP is active during the working period on the current day.
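The service portrait aggregation of S124 can be sketched as follows, computing one of the day-level indexes named above (the number of intranet clients accessing the service). The key layout, the output field names and the `is_intranet` callback are hypothetical; the IP portrait table would be built the same way with the IP alone as the key.

```python
from collections import defaultdict
from typing import Callable


def build_service_portrait(day_table: dict, is_intranet: Callable[[str], bool]) -> dict:
    """Aggregate triple day rows by (day, server_ip, server_port).

    `day_table` is keyed by (day, server_ip, server_port, client_ip).
    Only services whose server IP is in the intranet are recorded.
    """
    clients = defaultdict(set)
    for (day, server_ip, server_port, client_ip) in day_table:
        if not is_intranet(server_ip):
            continue
        clients[(day, server_ip, server_port)].add(client_ip)
    return {
        key: {
            "client_count": len(ips),
            # day-level index: intranet clients among the clients accessing the service
            "intranet_client_count": sum(1 for ip in ips if is_intranet(ip)),
        }
        for key, ips in clients.items()
    }
```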

Preferably, step S130, comparing the portrait table with the asset ledger and determining the assets to be determined, specifically includes the following steps:

The service portrait table and the IP portrait table are each compared with the asset ledger and the non-asset table to judge whether a service or IP is already a confirmed asset or a confirmed non-asset. Specifically, it is judged whether the IP or service in the portrait table exists in the asset ledger or the non-asset table, using the IP or service as the key for the duplicate check. For services or IPs that are neither confirmed assets nor confirmed non-assets, a to-be-confirmed asset library is generated, comprising a to-be-confirmed service table and a to-be-confirmed IP table. The data in the to-be-confirmed asset library is the target data that needs to be automatically inventoried.
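The comparison in step S130 amounts to a set-membership check. A minimal sketch, assuming the ledger, the non-asset table and the portrait entries are all identified by the same hashable key (an IP string, or an (IP, port) pair for services); the function name `find_pending` is hypothetical.

```python
def find_pending(portrait_keys, asset_ledger, non_asset_table):
    """Return portrait entries that are neither confirmed assets nor confirmed non-assets."""
    known = set(asset_ledger) | set(non_asset_table)
    return [key for key in portrait_keys if key not in known]
```

For example, calling `find_pending` with the keys of the service portrait table yields the to-be-confirmed service table, and calling it with the keys of the IP portrait table yields the to-be-confirmed IP table.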

Preferably, in step S2, asset scoring indexes are extracted from the portrait table of the asset, and a score is obtained from the scoring indexes; the score is used to determine whether the target is an asset. One specific embodiment is as follows:

Asset accuracy scoring is performed on each service or IP in the to-be-confirmed asset library. If the score is greater than or equal to a preset score, the service or IP to be confirmed is an asset; if the score is less than the preset score, the automatic confirmation policy is not met and the service or IP to be confirmed is not an asset. The preset score can be dynamically adjusted according to actual conditions to suit different traffic conditions, and can be raised if higher asset identification accuracy is required. As one embodiment of the present invention, the preset score is set to 70 points.

The service asset scoring strategy is shown in Table 1.

TABLE 1 service asset scoring strategy

Service assets are characterized from three dimensions: basic information, activity and clients. Each dimension comprises several indexes, and each index has a corresponding scoring mechanism that calculates the service's score on that index. The sum of the scores of all indexes under a dimension is the score of that dimension. The three dimensions carry different weights: each dimension score is multiplied by its weight, and the products are summed to obtain the total accuracy score of the service. The higher the service accuracy score, the greater the likelihood that the service is an asset. When the service accuracy score reaches 70 or more, the service complies with the automatic confirmation policy, is determined to be an asset, and is automatically added to the asset ledger by the system. The service indexes are described in detail as follows:
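The weighted total can be sketched as below, using the dimension weights given in sections 4.1 to 4.3 (0.4, 0.3 and 0.3) and the preset score of 70 from this embodiment. The dictionary keys and function names are hypothetical, and each dimension score is assumed to be on a 0-100 scale because the index scores within each dimension sum to 100.

```python
# Dimension weights as stated in sections 4.1-4.3 of this embodiment.
WEIGHTS = {"basic": 0.4, "activity": 0.3, "client": 0.3}
PRESET_SCORE = 70  # preset score of this embodiment; adjustable in practice


def service_total_score(dimension_scores: dict) -> float:
    """Weighted sum of the three dimension scores (each assumed to be 0-100)."""
    return sum(WEIGHTS[name] * dimension_scores[name] for name in WEIGHTS)


def is_asset(total: float, preset: float = PRESET_SCORE) -> bool:
    """Automatic confirmation: the total must reach the preset score."""
    return total >= preset
```

Raising `preset` makes automatic confirmation stricter, which corresponds to the adjustment described for deployments that need higher identification accuracy.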

4.1 Basic information dimension

The basic information dimension describes the completeness of the basic information recorded for the service across the whole network; the more complete the information, the greater the likelihood that the service is an asset. The dimension has a weight of 0.4, and its contribution to the total score is the sum of all index scores in the dimension multiplied by 0.4.

4.1.1 Host name

This index judges whether the IP to which the service belongs exists in the asset ledger, that is, whether the service is an unknown service newly opened on an asset IP. The index is worth 25 points. It is calculated by looking up the service's IP in the IP asset ledger table; if a value is found, the index scores 25 points.

4.1.2 Host alias

This index judges whether the user has added an alias for the IP to which the service belongs, that is, whether that IP has record information in the system. The index is worth 25 points. It is calculated by looking up the service's IP in the IP name table; if a value is found, the index scores 25 points.

4.1.3 Network segment

This index judges whether the IP to which the service belongs lies in a known asset network segment of the network. The index is worth 25 points. It is calculated by looking up the service's IP in the network segment asset table; if the IP falls within a network segment range in that table, the index scores 25 points.

4.1.4 Protocol

This index judges whether the service uses an application layer protocol, that is, whether application layer protocol information exists for the service. The index is worth 25 points. It is calculated by checking whether the "protocol" field of the service (IP:PORT) in the service portrait table has a value; if so, the index scores 25 points.
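The four indexes of the basic information dimension are each all-or-nothing at 25 points, so the dimension score can be sketched as a count of satisfied checks. The function and parameter names are hypothetical; the four lookups themselves (asset ledger, IP name table, network segment asset table, "protocol" field) are assumed to have been performed by the caller.

```python
def basic_info_score(in_asset_ledger: bool, has_alias: bool,
                     in_known_segment: bool, has_protocol: bool) -> int:
    """Sum of the four 25-point indexes of the basic information dimension."""
    checks = (in_asset_ledger, has_alias, in_known_segment, has_protocol)
    return 25 * sum(bool(c) for c in checks)
```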

4.2 Activity dimension

The activity dimension describes how actively the service is provided to its users across the whole network; the more stable, active and long-lasting the service, the greater the likelihood that it is an asset. The dimension has a weight of 0.3, and its contribution to the total score is the sum of all index scores in the dimension multiplied by 0.3.

4.2.1 Cumulative days of service

This index describes on how many days the service has been provided in the network over the last 30 days; the more cumulative days of service, the greater the likelihood that the service is an asset. The index is worth 35 points. It is calculated by counting the records of the service in the service portrait tables of the last 30 days. Because each service that generates communication data on a given day is aggregated into exactly one record in that day's service portrait table, the number of records of the service in the last 30 days equals the number of days on which the service was provided. Let this number be p. If p ≥ 30, the index score is 35; if p < 30, the index score is (p/30) × 35. The 30-day window is chosen mainly out of consideration for data volume and system performance. If fewer than 30 days of data have been recorded, the calculation uses the number of days actually recorded; other indexes that use a 30-day window are handled in the same way.
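The piecewise rule for this index can be written directly. A minimal sketch; the function name and the `window` and `full` parameters are hypothetical conveniences, with defaults taken from the text (30 days, 35 points).

```python
def cumulative_days_score(p: int, window: int = 30, full: float = 35.0) -> float:
    """Score for p days of service within the window: capped at `full`, else proportional."""
    if p >= window:
        return full
    return p / window * full
```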

4.2.2 Service activity type distribution

This index describes the distribution of the service's daily active-hours type over the last 30 days. The number of hours in which the service is active on a day is calculated in the service portrait table: if the number of active hours in the day is more than 12 h, long-term service is provided that day; if it is between 4 h and 12 h, short-term service is provided that day; and if it is less than 4 h, temporary service is provided that day. The more days of long-term service in the last 30 days, the more active the service is in the network and the greater the likelihood that it is an asset. The index is worth 35 points. It is calculated by counting, in the service portrait tables of the last 30 days, the number of records whose "active hours type" field equals long-term service, short-term service and temporary service, respectively. Let these counts be p, q and r; the index score is (p/(p + q + r)) × 35.
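The classification and the score for this index can be sketched as follows. The label strings and function names are hypothetical, the input is assumed to be one active-hours count per recorded day, and a service with no recorded days is assumed to score zero since the text does not cover that case.

```python
def classify_active_hours(hours: int) -> str:
    """Label a day by its number of active hours, using the thresholds in the text."""
    if hours > 12:
        return "long"
    if hours >= 4:
        return "short"
    return "temporary"


def activity_type_score(daily_active_hours: list, full: float = 35.0) -> float:
    """Score = p / (p + q + r) * full, where p counts the long-term service days."""
    labels = [classify_active_hours(h) for h in daily_active_hours]
    if not labels:
        return 0.0  # assumption: no recorded days scores zero
    return labels.count("long") / len(labels) * full
```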

4.2.3 Service activity period distribution

This index describes the number of days in the last 30 days on which the service was active in the working period (9:00-18:00) and in the non-working period, respectively. In the service portrait table, the activity period of the service on each day is identified: if the service is active during 9:00-18:00, the day is labeled "active in working period"; otherwise it is labeled "active in non-working period". The more days active in the working period, the greater the likelihood that the service is an asset. The index is worth 30 points. It is calculated by counting the days in the last 30 days on which the service was active in the working period and in the non-working period. Let these counts be p and q. If p/(p + q) ≥ 1, the index score is 30; if p/(p + q) < 1, the index score is (p/(p + q)) × 30.
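A minimal sketch of this index, with hypothetical names. Since p/(p + q) cannot exceed 1 for non-negative counts, the two cases in the text collapse into a single capped ratio; a service with no recorded activity days is assumed to score zero.

```python
def activity_period_score(working_days: int, non_working_days: int,
                          full: float = 30.0) -> float:
    """Score = min(p / (p + q), 1) * full for p working-period and q non-working-period days."""
    total = working_days + non_working_days
    if total == 0:
        return 0.0  # assumption: no recorded activity scores zero
    return min(working_days / total, 1.0) * full
```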

4.3 Client dimension

The client dimension describes the clients that access the service. The higher the trustworthiness of the clients accessing the service, the greater the likelihood that the service is an asset. The dimension has a weight of 0.3, and its contribution to the total score is the sum of all index scores in the dimension multiplied by 0.3.

4.3.1 Total proportion of accessing clients

This index describes the ratio of the number of intranet clients that accessed the service in the last 30 days to the total number of clients in the whole intranet. The higher the proportion of clients in the network that access the service, the greater the likelihood that the service is an asset. The index is worth 30 points. It is calculated as the ratio of the total number of intranet clients that accessed the service in the last 30 days to the total number of intranet clients in the whole network in the last 30 days. Let the number of intranet clients accessing the service be p and the total number of intranet clients be q. If p/q ≥ 1, the index score is 30; if p/q < 1, the index score is (p/q) × 30.

4.3.2 Client distribution range

This index describes the number of asset network segments to which the intranet clients that accessed the service in the last 30 days belong. The more asset network segments these clients belong to, the wider the distribution of intranet clients accessing the service and the greater the likelihood that the service is an asset. The index is worth 30 points. It is calculated by counting the asset network segments to which all clients that accessed the service in the last 30 days belong. Let this number be p. If p ≥ 3, the index score is 30; if p = 2, the index score is 20; if p = 1, the index score is 10; if p < 1, the index score is 0.

4.3.3 Proportion of working-period and non-working-period access clients

Describes the distribution, over the last 30 days, between the number of clients that accessed the service during working periods and the number that accessed it during non-working periods. The more clients that access the service during working periods, the greater the likelihood that the service is an asset. The index is worth 40 points in total. It is calculated from the service portrait table of the last 30 days as the proportion of clients that accessed the service during working periods among all clients that accessed it during working and non-working periods. Let the number of clients accessing during working periods be p and the number accessing during non-working periods be q; the index score is (p/(p + q)) × 40.
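
As a rough sketch, the working-period index and the weighted client dimension total could be computed as follows. The function names and the treatment of p + q = 0 are assumptions; the 0.3 weight and the 30/30/40 point split come from the description of the client dimension.

```python
CLIENT_DIMENSION_WEIGHT = 0.3  # weight stated for the client dimension


def working_period_score(p: int, q: int, full_marks: float = 40.0) -> float:
    """Index 4.3.3: p working-period clients, q non-working-period clients."""
    total = p + q
    if total == 0:
        # Assumption: a service with no recorded clients scores 0 here.
        return 0.0
    return p / total * full_marks


def client_dimension_score(ratio_score, range_score, period_score):
    """Sum the three index scores (at most 30 + 30 + 40), then apply the weight."""
    return (ratio_score + range_score + period_score) * CLIENT_DIMENSION_WEIGHT


period = working_period_score(30, 10)
dimension_total = client_dimension_score(9.0, 20, period)
```

With full marks on all three indexes the dimension contributes 30 points to the service score, which is consistent with the stated weight of 0.3.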

The IP asset scoring strategy is shown in table 2.

TABLE 2 IP asset Scoring policy

For an IP asset, the information it presents as a client differs from the information it presents as a server, so the IP must be described with separate dimension information for the client role and the server role. After the accuracy scores of the client attribute and the server attribute are calculated separately, the attribute with the higher score is taken as the attribute of the asset.

5.1 Server Attribute

When the IP acts as a server, its description dimensions are the same as those of a service asset, so the automatic confirmation strategy for service assets can be reused directly: the highest-scoring service on the IP is taken, and its score is used as the score of the IP's server attribute.
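
A minimal sketch of these two rules follows: the server attribute score is the best service score on the IP, and the IP takes whichever attribute scores higher. The function names, the port-keyed mapping, and the tie-breaking rule are assumptions; the original text does not say how equal scores are resolved.

```python
def server_attribute_score(service_scores):
    """service_scores: mapping of service port -> service asset score for one IP."""
    # The highest-scoring service on the IP gives the server attribute score.
    return max(service_scores.values(), default=0.0)


def pick_ip_attribute(server_score, client_score):
    """Return the attribute with the higher score and that score."""
    # Assumption: a tie resolves to "server".
    if server_score >= client_score:
        return "server", server_score
    return "client", client_score


server_score = server_attribute_score({80: 72.5, 443: 88.0, 8080: 41.0})
attribute, score = pick_ip_attribute(server_score, client_score=63.0)
```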

5.2 Client Attribute

When the IP acts as a client, the IP asset is described mainly through information about the peer servers that the client accesses. The more stable and reliable the peer servers accessed, the higher the probability that the client IP is an asset. The score of this dimension is the sum of all index scores in the dimension.

5.2.1 Number of access servers

Describes the number of intranet servers that the client IP has accessed in the last 30 days. The more servers accessed, the more active the client IP is in the network, and the higher the probability that it is an asset. The index is worth 25 points in total. It is calculated by counting all intranet servers accessed in the last 30 days when the IP acts as a client. Let the number of intranet servers be p. If p is more than 20, the index score is 25; if p < 20, the index score is (p/20) × 25 (at p = 20 both expressions give 25).
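
Because the score grows linearly and saturates at 20 servers, the two branches can be written as a single capped expression. The sketch below is illustrative only; the function and parameter names are assumptions.

```python
def access_server_count_score(p: int, cap: int = 20, full_marks: float = 25.0) -> float:
    """Index 5.2.1: p is the number of intranet servers accessed in 30 days."""
    # min() folds the "more than 20" branch into the linear one;
    # p = 20 yields full marks under either branch.
    return min(p, cap) / cap * full_marks


score = access_server_count_score(8)  # 8 of the 20 servers needed for full marks
```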

5.2.2 Access asset server proportion

Describes the proportion of asset servers among all intranet servers accessed by the client IP in the last 30 days. The higher the proportion of asset servers accessed, the more active the client IP is in the network, and the higher the probability that it is an asset. The index is worth 25 points in total. It is calculated as the proportion of asset servers among all intranet servers accessed in the last 30 days when the IP acts as a client. Let the number of asset servers among the accessed intranet servers be p and the total number of accessed intranet servers be q; the index score is (p/q) × 25.
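
The proportion can be sketched with set operations, assuming the accessed servers and the known asset servers are both available as collections of IP strings. That representation, the function name, and the score of 0 when no server was accessed are assumptions not stated in the original text.

```python
def asset_server_ratio_score(accessed_servers, asset_servers, full_marks=25.0):
    """Index 5.2.2: share of accessed intranet servers that are known assets."""
    accessed = set(accessed_servers)
    if not accessed:
        # Assumption: q = 0 (no server accessed) scores 0.
        return 0.0
    p = len(accessed & set(asset_servers))  # accessed servers that are assets
    q = len(accessed)                       # all accessed intranet servers
    return p / q * full_marks


score = asset_server_ratio_score(
    ["10.0.1.10", "10.0.1.11", "10.0.2.20", "10.0.3.30"],
    ["10.0.1.10", "10.0.1.11", "10.0.2.20", "10.0.9.99"],
)
```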

5.2.3 Access service score level

Describes the score of the service with the highest asset accuracy score among all non-asset services accessed by the client IP in the last 30 days. Accuracy scores are computed for all unknown service assets as described above; the higher the score, the higher the likelihood that the service is an asset. If a service accessed by the client IP is an unknown asset, the likelihood that the client IP is an asset can therefore be judged from the accuracy score of that service: the more likely the accessed service is an asset, the more likely the client IP is an asset. The index is worth 25 points in total. It is calculated by finding, among all non-asset services accessed when the IP acts as a client, the service with the highest asset accuracy score. Let that highest score be p; the index score is 25 × (p/100).
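
A short sketch of this index, assuming the service accuracy scores are already available on a 0 to 100 scale (as implied by the division by 100). The function name and the score of 0 when no non-asset service was accessed are assumptions.

```python
def access_service_level_score(non_asset_service_scores, full_marks=25.0):
    """Index 5.2.3: scale the best accessed non-asset service score to 25 points."""
    # Assumption: an empty list (no non-asset service accessed) scores 0.
    p = max(non_asset_service_scores, default=0.0)
    return full_marks * (p / 100)


score = access_service_level_score([40.0, 80.0, 65.0])  # best service score is 80
```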

5.2.4 Cumulative days of appearance

Describes the cumulative number of days on which the client IP appeared in the network in the last 30 days. In the IP portrait table generated each day, each IP is aggregated into exactly one record per day. If an IP appears on a given day, the IP portrait table for that day contains one record for it. In the actual calculation, therefore, the number of times the IP appears in the IP portrait tables of the last 30 days is counted directly, and this count is the cumulative number of days on which the IP appeared in the network.

The greater the cumulative number of days of appearance, the greater the likelihood that the client IP is an asset. The index is worth 25 points in total. It is calculated by searching the IP portrait tables of the last 30 days for the number of days, p, on which the IP appeared as a client. If p is more than 30, the index score is 25; if p < 30, the index score is (p/30) × 25 (at p = 30 both expressions give 25).
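
The counting step and the resulting client attribute total can be sketched as follows. The daily IP portrait table is modeled here as an iterable of `(day, ip)` pairs, and the 30-day window is taken to include the current day. Both are assumptions made for illustration, as are the function names.

```python
from datetime import date, timedelta


def cumulative_days_score(ip, daily_rows, today, window=30, full_marks=25.0):
    """Index 5.2.4: daily_rows holds one (day, ip) record per IP per day."""
    start = today - timedelta(days=window - 1)
    # Distinct days inside the window on which this IP has a record.
    days_seen = {day for day, row_ip in daily_rows
                 if row_ip == ip and start <= day <= today}
    p = len(days_seen)
    return min(p, window) / window * full_marks


def client_attribute_score(*index_scores):
    """Client attribute score: the plain sum of the four 25-point indexes."""
    return sum(index_scores)


today = date(2022, 3, 4)
rows = [(today - timedelta(days=i), "10.0.0.5") for i in range(0, 30, 2)]
days_score = cumulative_days_score("10.0.0.5", rows, today)
total = client_attribute_score(10.0, 18.75, 20.0, days_score)
```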

Finally, according to the asset automatic confirmation strategy in step 5, if the automatic confirmation strategy is met, the system automatically adds the service or the IP to the asset ledger; if not, the user judges from the score whether the service or the IP is an asset and confirms it as an asset manually.

Further, one of ordinary skill in the art will recognize that the methods of the present disclosure may be implemented as computer programs. The methods of the above embodiments are performed by one or more programs, as described above in connection with the figures, including instructions to cause a computer or processor to perform the algorithms described in connection with the figures. These programs may be stored and provided to a computer or processor using various types of non-transitory computer readable media. Non-transitory computer readable media include various types of tangible storage media. Examples of the non-transitory computer readable medium include magnetic recording media such as floppy disks, magnetic tapes, and hard disk drives, magneto-optical recording media such as magneto-optical disks, CD-ROMs (compact disc read only memories), CD-R, CD-R/W, and semiconductor memories such as ROMs, PROMs (programmable ROMs), EPROMs (erasable PROMs), flash ROMs, and RAMs (random access memories).

For example, according to an embodiment of the present disclosure, a computer readable medium may be provided on which instructions executable by a processor are stored. When executed by the processor, the instructions cause the processor to execute the automatic asset identification method based on tag scoring as described above, or to execute only the operations described above of generating a portrait table from the acquired network traffic data packets, extracting asset scoring data from the portrait table, and calculating a score for asset discrimination from indexes of multiple dimensions and multiple levels according to their weight proportions.

Therefore, according to the disclosure of the present invention, a computer program or a computer program product may be proposed, which when executed, may implement the above-mentioned automatic asset identification method based on tag scoring, or may only execute the above-mentioned operations of generating a representation table from the acquired network traffic data packet, extracting asset scoring data from the representation table, and calculating a score for asset discrimination according to a weighting ratio based on indexes of multiple dimensions and multiple levels.

In addition, the invention also relates to an automatic asset identification system based on tag scoring, which comprises a processor and a memory, wherein the memory stores a computer program. When the computer program is executed by the processor, a portrait table is generated from the acquired network traffic data packets, asset scoring data is extracted from the portrait table, and a score for asset discrimination is calculated from indexes of multiple dimensions and multiple levels according to their weight proportions.

Alternatively, the present invention also relates to a computing apparatus or a computing system including a processor and a memory, where the memory stores a computer program. When the computer program is executed by the processor, only the operations described above are performed: generating a portrait table from the acquired network traffic data packets, extracting asset scoring data from the portrait table, and calculating a score for asset discrimination from indexes of multiple dimensions and multiple levels according to their weight proportions.

Furthermore, it should be understood that although the present description refers to embodiments, the embodiments do not include only one independent technical solution, and such description is only for clarity, and those skilled in the art should take the description as a whole, and the technical solutions in the embodiments may be appropriately combined to form other embodiments that can be understood by those skilled in the art.

Claims

1. An asset automatic identification method based on tag scoring is characterized by comprising the following steps:

S1, generating a portrait table according to the acquired network traffic data packets;

and S2, extracting asset scoring data from the portrait table, and calculating a score for asset discrimination from indexes of multiple dimensions and multiple levels according to their weight proportions.

2. The method as claimed in claim 1, wherein the score is calculated according to service asset scoring index and/or IP asset scoring index.

3. The method of claim 2, wherein the service asset scoring metrics include a base information dimension, an activity dimension, and a client dimension.

4. The method as claimed in claim 3, wherein the basic information dimension includes host name, host alias, belonging network segment and protocol.

5. The method of claim 3, wherein the activity dimension comprises cumulative days of service provided, service activity type distribution, service activity period distribution.

6. The method as claimed in claim 3, wherein the client dimension includes total access client proportion, client distribution range, working period and non-working period access client proportion.

7. The method of claim 2, wherein the IP asset scoring metrics comprise server attributes and client attributes.

8. The method of claim 7, wherein the client attributes comprise number of access servers, access asset server percentage, access service score level, and cumulative days of occurrence.

9. The method for automatically identifying assets based on tag scoring as claimed in any one of claims 1 to 8, wherein in step S1, the network traffic data packets are aggregated according to preset hierarchical time periods, and each level is labeled with different network characteristics to obtain the portrait table.

10. The method of claim 9, wherein the hierarchical time periods include days and hours.

11. The method of claim 10, wherein partial data is selected from the network traffic data packets and aggregated.

12. The method of claim 11, wherein the generated portrait tables comprise an IP portrait table and a service portrait table; the service portrait table is generated by aggregating on service IP + service port, and the IP portrait table is generated by aggregating on IP.

13. The method of claim 12, wherein selecting the partial data comprises screening triplet information out of seven-tuple information.

14. The method of claim 13, wherein the screened triplet information comprises: client IP, server IP, and server port.

15. The method of claim 13, wherein the step of screening the triplet information from the seven-tuple information specifically comprises: aggregating the seven-tuple information acquired in real time by hour to generate a triplet hour table, and aggregating the triplet hour table by day to generate a triplet day table.

16. The method of claim 15, wherein the steps further comprise:

respectively labeling the service portrait table and the IP portrait table with service and IP day-level indexes, wherein the service and IP day-level indexes comprise the earliest and latest time points at which the service was provided on the current day, the number of intranet clients among the clients accessing the service on the current day, the first and last time points at which the IP appeared in the network on the current day, and whether the IP was active during the working period on the current day.

17. The method as claimed in claim 1, wherein after the data packets in the network traffic are acquired in real time in step S1, the method further comprises the following step: according to the defined internal and external network ranges, deleting the external network traffic data from the data packets, so that only the internal network traffic data packets enter the subsequent asset identification process.

18. The method for automatically identifying assets based on tag scoring as claimed in any one of claims 1-8, wherein the step S1 further comprises the steps of: comparing the portrait table with the asset ledger to find assets to be determined, generating an asset to be determined for each service or IP that is recorded neither as an asset nor as a non-asset, and calculating the asset discrimination score of the asset to be determined in the corresponding step S3.

19. An automatic asset identification system based on tag scoring, comprising:

the first processing module is used for generating a portrait table according to the network traffic data packets acquired in real time;

the second processing module is used for comparing the portrait table with the asset ledger and finding assets to be determined;

and the third processing module is used for extracting asset scoring data from the portrait table of the asset to be determined to obtain a score for asset discrimination.

20. An asset automatic identification device based on tag scoring, comprising at least one processor, and a memory communicatively coupled to the at least one processor; the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method for tag scoring-based automatic identification of assets of any of claims 1-18.

21. A non-transitory computer readable storage medium having stored thereon a computer program, wherein the computer program when executed by a processor implements the steps of the method for automatic asset identification based on tag scoring according to any one of claims 1 to 18.