CN114547491A - Time sequence map construction method, device, equipment and medium - Google Patents

Time sequence map construction method, device, equipment and medium

Info

Publication number
CN114547491A
Authority
CN
China
Prior art keywords
sequence
time
subsequence
original
target
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210204857.XA
Other languages
Chinese (zh)
Inventor
鲍青波
万可
黄娜
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Topsec Technology Co Ltd
Beijing Topsec Network Security Technology Co Ltd
Beijing Topsec Software Co Ltd
Original Assignee
Beijing Topsec Technology Co Ltd
Beijing Topsec Network Security Technology Co Ltd
Beijing Topsec Software Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Topsec Technology Co Ltd, Beijing Topsec Network Security Technology Co Ltd, Beijing Topsec Software Co Ltd
Priority to CN202210204857.XA
Publication of CN114547491A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9537 Spatial or temporal dependent retrieval, e.g. spatiotemporal queries
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/957 Browsing optimisation, e.g. caching or content distillation

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

The embodiment of the disclosure relates to a time sequence map construction method, a time sequence map construction device, time sequence map construction equipment and a time sequence map construction medium, wherein the method comprises the following steps: acquiring flow data information in a preset time period; constructing an original sequence based on the flow data information; sampling the original sequence to obtain a plurality of subsequences, and determining the information gain of each subsequence based on the distance distribution of each subsequence and the original sequence; comparing the information gains of the subsequences, and obtaining a target sequence from the subsequences based on the comparison result; and constructing the time sequence map by taking the target sequence as a node of the time sequence map and taking the time sequence relation of the target sequence in the same original sequence as an edge of the time sequence map. In the embodiment of the disclosure, the time-series network security attack or access path is converted into the expression form of the time-series map, and the network attack which cannot be detected by the equipment probe can be fed back to the time-series map, so that the accuracy of the time-series map is improved.

Description

Time sequence map construction method, device, equipment and medium
Technical Field
The present disclosure relates to the field of computer technologies, and in particular, to a method, an apparatus, a device, and a medium for constructing a time-series graph.
Background
With the development of computer technology, network security is increasingly important, and data, behaviors and the like related to network security can be intuitively expressed through a security map.
In the related art, network security can be monitored by probe devices. If an abnormal condition is detected, the probe device sends alarm information; the alarm information is analyzed to extract a source Internet Protocol (IP) address, a destination IP address, and an event, and a security map is constructed from these triples.
However, with the above technical solutions, some network attacks with higher complexity can bypass the detection of the probe device, so that the probe device cannot issue an alarm, and thus the accuracy of the security map is insufficient.
Disclosure of Invention
In order to solve the technical problems described above or at least partially solve the technical problems, the present disclosure provides a time-series graph construction method, apparatus, device, and medium.
In a first aspect, an embodiment of the present disclosure provides a time-series graph construction method, where the method includes:
acquiring flow data information in a preset time period;
constructing an original sequence based on the flow data information;
sampling the original sequence to obtain a plurality of subsequences, and determining the information gain of each subsequence based on the distance distribution of each subsequence and the original sequence;
comparing the information gains of the subsequences, and obtaining a target sequence from the subsequences based on the comparison result;
and constructing the time sequence map by taking the target sequence as a node of the time sequence map and taking the time sequence relation of the target sequence in the same original sequence as an edge of the time sequence map.
In an optional embodiment, the constructing an original sequence based on the traffic data information includes:
analyzing the flow data information, and acquiring the access times of the access terminal in each sub-time period in a preset time period;
and constructing an original sequence corresponding to each access terminal based on the access times and the time sequence relation among the sub-time periods.
In an optional embodiment, the sampling the original sequence to obtain a plurality of sub-sequences includes:
and sampling the original sequence according to a preset sliding window and a preset sliding distance to obtain the plurality of subsequences.
In an optional embodiment, the determining the information gain of each of the subsequences based on the distance distribution of each of the subsequences from the original sequence includes:
constructing a first sequence set based on the subsequences, and calculating a first distance distribution from the first sequence set to each original sequence;
removing the currently processed sub-sequences in the first sequence set to obtain a second sequence set, and calculating a second distance distribution from the second sequence set to each original sequence;
and determining the information gain of the currently processed subsequence according to the first distance distribution and the second distance distribution.
In an alternative embodiment, the calculating a first distance distribution from the first sequence set to each of the original sequences includes:
acquiring first sampling subsequences determined by sampling the currently processed original sequence;
calculating a first distance between each first sampling subsequence and a subsequence in the first sequence set, and taking a minimum value in the first distances as a first target distance between the subsequence in the first sequence set and the currently processed original sequence;
calculating the first target distance between each original sequence and each subsequence in the first sequence set, and determining the first distance distribution according to the first target distance;
said calculating a second distance distribution of said second set of sequences to each of said original sequences, comprising:
acquiring second sampling subsequences determined by sampling the currently processed original sequence;
calculating a second distance between each second sampling subsequence and a subsequence in the second sequence set, and taking a minimum value in the second distances as a second target distance between the subsequence in the second sequence set and the currently processed original sequence;
calculating the second target distance between each original sequence and each subsequence in the second sequence set, and determining the second distance distribution according to the second target distance.
In an alternative embodiment, the constructing the time-series graph by using the target sequence as a node of the time-series graph and using a time-series relationship of the target sequence in the same original sequence as an edge of the time-series graph comprises:
obtaining homologous sequences belonging to the same original sequence in the target sequence;
searching the homologous sequences in the original sequence to obtain a time sequence identifier corresponding to each homologous sequence;
and taking the homologous sequences as nodes of the time sequence graph, determining the connection relation between the homologous sequences according to the time sequence identification corresponding to each homologous sequence, and constructing the time sequence graph.
In a second aspect, an embodiment of the present disclosure further provides a time-series map building apparatus, where the apparatus includes:
the acquisition module is used for acquiring flow data information in a preset time period;
the first construction module is used for constructing an original sequence based on the flow data information;
the determining module is used for sampling the original sequence to obtain a plurality of subsequences, and determining the information gain of each subsequence based on the distance distribution between each subsequence and the original sequence;
the comparison module is used for comparing the information gain of each subsequence and obtaining a target sequence from the plurality of subsequences based on the comparison result;
and the second construction module is used for constructing the time sequence map by taking the target sequence as a node of the time sequence map and taking the time sequence relation of the target sequence in the same original sequence as an edge of the time sequence map.
In a third aspect, the present disclosure provides a computer-readable storage medium having stored therein instructions that, when run on a terminal device, cause the terminal device to implement the above-mentioned method.
In a fourth aspect, the present disclosure provides a device comprising: a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor implements the above method when executing the computer program.
In a fifth aspect, the present disclosure provides a computer program product comprising computer programs/instructions which, when executed by a processor, implement the method described above.
Compared with the prior art, the technical scheme provided by the embodiment of the disclosure has the following advantages:
The time sequence map construction method of the embodiment of the disclosure acquires flow data information in a preset time period; constructs an original sequence based on the flow data information; samples the original sequence to obtain a plurality of subsequences, and determines the information gain of each subsequence based on the distance distribution of each subsequence and the original sequence; compares the information gains of the subsequences, and obtains a target sequence from the subsequences based on the comparison result; and constructs the time sequence map by taking the target sequence as a node of the time sequence map and taking the time sequence relation of the target sequence in the same original sequence as an edge of the time sequence map. Therefore, pattern extraction can be performed on the flow data information in the preset time period, and the time sequence map is obtained by combining the time sequence characteristics. The time-sequenced network security attack or access path is converted into the expression form of the time sequence map, time sequence behavior is constructed on the time sequence map, and network attacks that the probe device cannot detect can be reflected in the time sequence map. This improves the accuracy of the time sequence map, makes better use of the information contained in the flow data information, and helps address attack behavior that lies latent over a long time period.
Drawings
The above and other features, advantages and aspects of various embodiments of the present disclosure will become more apparent by referring to the following detailed description when taken in conjunction with the accompanying drawings. Throughout the drawings, the same or similar reference numbers refer to the same or similar elements. It should be understood that the drawings are schematic and that elements and features are not necessarily drawn to scale.
Fig. 1 is a schematic flow chart of a time-series graph constructing method according to an embodiment of the present disclosure;
Fig. 2 is a schematic flow chart of another method for constructing a time-series graph according to an embodiment of the present disclosure;
Fig. 3 is a schematic diagram illustrating a method for determining a timing identifier according to an embodiment of the disclosure;
Fig. 4 is a schematic structural diagram of a time-series map constructing apparatus according to an embodiment of the present disclosure;
Fig. 5 is a schematic structural diagram of an electronic device according to an embodiment of the present disclosure.
Detailed Description
Embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While certain embodiments of the present disclosure are shown in the drawings, it is to be understood that the present disclosure may be embodied in various forms and should not be construed as limited to the embodiments set forth herein, but rather are provided for a more thorough and complete understanding of the present disclosure. It should be understood that the drawings and embodiments of the disclosure are for illustration purposes only and are not intended to limit the scope of the disclosure.
It should be understood that the various steps recited in the method embodiments of the present disclosure may be performed in a different order, and/or performed in parallel. Moreover, method embodiments may include additional steps and/or omit performing the illustrated steps. The scope of the present disclosure is not limited in this respect.
The term "include" and variations thereof as used herein are open-ended, i.e., "including but not limited to". The term "based on" is "based, at least in part, on". The term "one embodiment" means "at least one embodiment"; the term "another embodiment" means "at least one additional embodiment"; the term "some embodiments" means "at least some embodiments". Relevant definitions for other terms will be given in the following description.
It should be noted that the terms "first", "second", and the like in the present disclosure are only used for distinguishing different devices, modules or units, and are not used for limiting the order or interdependence relationship of the functions performed by the devices, modules or units.
It is noted that the modifiers "a", "an", and "the" in this disclosure are illustrative rather than limiting, and those skilled in the art will understand them to mean "one or more" unless the context clearly dictates otherwise.
The names of messages or information exchanged between devices in the embodiments of the present disclosure are for illustrative purposes only, and are not intended to limit the scope of the messages or information.
In order to solve the above problem, embodiments of the present disclosure provide a time sequence map construction method, which is described below with reference to specific embodiments.
Fig. 1 is a flowchart of a time-series graph constructing method provided by an embodiment of the present disclosure, where the method may be performed by a time-series graph constructing apparatus, where the apparatus may be implemented by software and/or hardware, and may be generally integrated in an electronic device. As shown in fig. 1, the method includes:
Step 101, obtaining flow data information in a preset time period.
In the field of network security, particularly in an intranet access environment, most access modes follow a certain time sequence rule, and the time sequence rule can be displayed in a time sequence map mode.
Specifically, flow data information within a preset time period is obtained first. In this embodiment, the preset time period may be set according to the application scenario and is not limited here; it may be a relatively long period, for example, half a year or a year. The flow data information records information such as the access initiation times of one or more access terminals in the time period. The traffic data information may be a NetFlow protocol log, an HTTP log, or another network communication log.
Step 102, constructing an original sequence based on the flow data information.
After the traffic data information is obtained, it can be parsed to construct the original sequence. The method for constructing the original sequence may be various, and may be selected according to an application scenario, which is not limited in this embodiment, and examples are as follows:
In an optional implementation manner, the preset time period is divided into a plurality of sub-time periods, and the original sequence is constructed based on the sub-time periods, specifically as follows:
First, the traffic data information is analyzed to obtain the access times of each access terminal in each sub-time period of the preset time period.
The traffic data information is analyzed to obtain information such as the access time and the source Internet Protocol (IP) address, and the access times of the access terminal in each sub-time period are then obtained from this information. The access terminal may be determined according to an application scenario, which is not limited in this embodiment; for example, the access terminal may be a source IP address.
It should be noted that the length of the sub-time period may be determined based on the length of the preset time period. In an alternative embodiment, the sub-time period may be set relatively short. Specifically, a ratio threshold may be preset, and the ratio of the length of the preset time period to the length of the sub-time period needs to be greater than the ratio threshold. For example, the ratio threshold may be 150; if the preset time period is half a year, the sub-time period may be 1 day.
Further, an original sequence corresponding to each access terminal is constructed based on the access times and the time sequence relation among the sub-time periods.
After the access times in each sub-time period are determined, the access times of the same access terminal in a plurality of sub-time periods can be extracted, and the access times are sequenced according to the time sequence relation between each sub-time period, so that a corresponding original sequence is obtained.
For example, suppose the traffic data information is analyzed and the number of accesses initiated by IP1 on each day within 10 days is counted, and the counts, in chronological order, are: 11, 2, 13, 5, 8, 6, 3, 10, 9, 4. The constructed original sequence is then {11, 2, 13, 5, 8, 6, 3, 10, 9, 4}.
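As a non-limiting illustration, the per-terminal counting described above can be sketched in Python. The record layout (a source IP paired with an access date), the one-day sub-time period, and the names `build_original_sequences`, `records`, and `start` are assumptions made for this sketch only; the disclosure does not prescribe a log format or an implementation.

```python
from collections import defaultdict
from datetime import date


def build_original_sequences(records, start, num_days):
    """Count accesses per source IP per day and return one sequence per IP.

    records: iterable of (source_ip, access_date) pairs (hypothetical layout).
    start: first day of the preset time period.
    num_days: number of one-day sub-time periods in the preset time period.
    """
    counts = defaultdict(lambda: [0] * num_days)
    for source_ip, access_date in records:
        index = (access_date - start).days
        # Ignore records that fall outside the preset time period.
        if 0 <= index < num_days:
            counts[source_ip][index] += 1
    return dict(counts)


# Hypothetical parsed log records, for illustration only.
start = date(2022, 1, 1)
records = [
    ("10.0.0.1", date(2022, 1, 1)),
    ("10.0.0.1", date(2022, 1, 1)),
    ("10.0.0.1", date(2022, 1, 3)),
    ("10.0.0.2", date(2022, 1, 2)),
]
sequences = build_original_sequences(records, start, num_days=3)
```

Each value in `sequences` is ordered by sub-time period, so the time sequence relation among the sub-time periods is preserved by position in the list.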
In another optional implementation, the source IP Address, the destination Internet Protocol (IP) Address, and the time in the traffic data information may be obtained through parsing, and an original sequence is constructed according to the number of times of initiating the access from the target source IP Address to the destination IP Address in each sub-period of the preset period.
Step 103, sampling the original sequence to obtain a plurality of subsequences, and determining the information gain of each subsequence based on the distance distribution between each subsequence and the original sequence.
After the original sequences are obtained, each original sequence is sampled to obtain a plurality of subsequences. In an alternative embodiment, the original sequence may be sampled with a preset sliding window and a preset sliding distance to obtain the plurality of subsequences. It should be noted that the plurality of subsequences may be sampled from different original sequences. The length of the sliding window and the sliding distance may be set according to an application scenario. For example, for the original sequence {11, 2, 13, 5, 8, 6, 3, 10, 9, 4} of IP1, the sliding window may be 7 and the sliding distance 1, and the four subsequences obtained are: {11, 2, 13, 5, 8, 6, 3}, {2, 13, 5, 8, 6, 3, 10}, {13, 5, 8, 6, 3, 10, 9}, and {5, 8, 6, 3, 10, 9, 4}. Optionally, the subsequences obtained by sampling may be deduplicated.
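The sliding-window sampling of the IP1 example can be reproduced with a short Python sketch. The function names `sliding_window_subsequences` and `deduplicate` are hypothetical and chosen only for illustration.

```python
def sliding_window_subsequences(sequence, window, step=1):
    """Sample a sequence with a sliding window of the given length and step."""
    return [
        tuple(sequence[i:i + window])
        for i in range(0, len(sequence) - window + 1, step)
    ]


def deduplicate(subsequences):
    """Remove repeated subsequences while keeping first-seen order."""
    return list(dict.fromkeys(subsequences))


ip1 = [11, 2, 13, 5, 8, 6, 3, 10, 9, 4]
subsequences = deduplicate(sliding_window_subsequences(ip1, window=7, step=1))
```

Subsequences are stored as tuples so that they are hashable, which makes deduplication and later use as graph nodes straightforward.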
In this embodiment, distances are calculated between the subsequences and the original sequences. The specific calculation method is not limited in this embodiment, and those skilled in the art may set it according to scenario requirements. The distance between each subsequence and each original sequence is determined by calculation, a distance distribution is formed from these distances, and the information gain of each subsequence is determined according to the distance distribution.
In an alternative embodiment, a distance distribution that includes the currently processed subsequence and a distance distribution that excludes the currently processed subsequence may be calculated, and the information gain of the currently processed subsequence is determined from the information difference between the two distance distributions. Each subsequence is processed in this way to obtain its corresponding information gain.
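The disclosure does not state how the information difference between the two distance distributions is measured. Purely as an assumption for illustration, the sketch below bins the distances into a histogram and takes the difference of the Shannon entropies of the two histograms; the bin width, the function names, and the sample distance values are all hypothetical.

```python
import math
from collections import Counter


def histogram_entropy(distances, bin_width=1.0):
    """Shannon entropy (in bits) of distances binned at a fixed width.

    Assumption: entropy of a histogram stands in for the "information
    amount" of a distance distribution; the source does not specify this.
    """
    if not distances:
        return 0.0
    bins = Counter(int(d // bin_width) for d in distances)
    total = len(distances)
    return -sum((n / total) * math.log2(n / total) for n in bins.values())


def information_gain(first_distribution, second_distribution, bin_width=1.0):
    """Entropy with the current subsequence minus entropy without it."""
    return (histogram_entropy(first_distribution, bin_width)
            - histogram_entropy(second_distribution, bin_width))


# Hypothetical distance values, for illustration only.
first = [0.0, 10.0, 7.1, 10.0, 0.0, 7.1]   # includes the current subsequence
second = [0.0, 10.0, 10.0, 0.0]            # current subsequence removed
gain = information_gain(first, second)
```

Other measures of the difference between two distributions could be substituted without changing the leave-one-out structure described above.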
Step 104, comparing the information gains of the subsequences, and obtaining the target sequence from the subsequences based on the comparison result.
Further, the information gains of the subsequences may be compared numerically. A larger information gain indicates that the subsequence carries more information and is more representative, so the target sequence may be obtained from the plurality of subsequences based on the comparison result.
In an optional implementation manner, an integer N may be preset, and the subsequences are ordered from large to small according to the information gain, so as to obtain the first N subsequences as the target sequence.
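A minimal Python sketch of this top-N selection follows. The mapping `gains` and its values are hypothetical; in practice the gains would come from the information gain calculation of step 103.

```python
def select_target_sequences(gains, n):
    """Return the n subsequences with the largest information gain.

    gains: mapping from subsequence (tuple) to its information gain.
    """
    ranked = sorted(gains.items(), key=lambda item: item[1], reverse=True)
    return [subsequence for subsequence, _ in ranked[:n]]


# Hypothetical information gains, for illustration only.
gains = {
    (11, 2, 13): 0.8,
    (2, 13, 5): 0.1,
    (13, 5, 8): 0.5,
}
targets = select_target_sequences(gains, n=2)
```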
Step 105, constructing the time sequence graph by taking the target sequence as a node of the time sequence graph and taking the time sequence relation of the target sequence in the same original sequence as an edge of the time sequence graph.
In this embodiment, a time sequence graph can be constructed based on a target sequence obtained by screening, specifically, the target sequence can be used as a node of the time sequence graph, the target sequences are matched on the same original sequence, the target sequences successfully matched are sorted according to the time sequence, and an edge is established between two adjacent target sequences after sorting, so that the edge of the time sequence graph is determined, and the time sequence graph is constructed. In an alternative embodiment, the edge between adjacent target sequences may be a directed line pointing from a chronologically earlier target sequence to a chronologically later target sequence.
Taking the original sequence corresponding to IP1 as {11, 2, 13, 5, 8, 6, 3, 10, 9, 4} as an example, assuming that the first target sequence is {11, 2, 13, 5, 8, 6, 3} and the second target sequence is {13, 5, 8, 6, 3, 10, 9}, it may be determined that the timing of the first target sequence is prior to the timing of the second target sequence on IP1, two nodes of the time-series graph may be taken as the first target sequence and the second target sequence, and a line pointing from the first target sequence to the second target sequence is established as an edge of the time-series graph.
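The IP1 example can be sketched in Python as follows. Matching a target sequence at its first occurrence in the original sequence is an assumption of this sketch, and the names `find_position` and `build_time_sequence_graph` are hypothetical.

```python
def find_position(original, target):
    """Return the first index at which target occurs in original, or None."""
    width = len(target)
    for start in range(len(original) - width + 1):
        if tuple(original[start:start + width]) == tuple(target):
            return start
    return None


def build_time_sequence_graph(originals, targets):
    """Build (nodes, edges) with directed edges from earlier to later targets.

    originals: mapping from access terminal to its original sequence.
    targets: list of target sequences (tuples).
    """
    nodes = set(targets)
    edges = set()
    for original in originals.values():
        matched = []
        for target in targets:
            position = find_position(original, target)
            if position is not None:
                matched.append((position, target))
        matched.sort()  # order matched targets by position in time
        # Connect each matched target to the next one in time order.
        for (_, earlier), (_, later) in zip(matched, matched[1:]):
            edges.add((earlier, later))
    return nodes, edges


originals = {"IP1": [11, 2, 13, 5, 8, 6, 3, 10, 9, 4]}
first_target = (11, 2, 13, 5, 8, 6, 3)
second_target = (13, 5, 8, 6, 3, 10, 9)
nodes, edges = build_time_sequence_graph(originals, [second_target, first_target])
```

Here `edges` contains a single directed edge from the first target sequence to the second, regardless of the order in which the targets are supplied.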
In summary, the time sequence map construction method of the embodiment of the present disclosure acquires traffic data information within a preset time period; constructs an original sequence based on the traffic data information; samples the original sequence to obtain a plurality of subsequences, and determines the information gain of each subsequence based on the distance distribution of each subsequence and the original sequence; compares the information gains of the subsequences, and obtains a target sequence from the subsequences based on the comparison result; and constructs the time sequence map by taking the target sequence as a node of the time sequence map and taking the time sequence relation of the target sequence in the same original sequence as an edge of the time sequence map. Therefore, the embodiment of the disclosure can perform pattern extraction on the traffic data information within the preset time period and combine the time sequence characteristics to obtain the time sequence map. The time-sequenced network security attack or access path is converted into the expression form of the time sequence map, time sequence behavior is constructed on the time sequence map, and network attacks that the probe device cannot detect can be reflected in the time sequence map. This improves the accuracy of the time sequence map, makes better use of the information contained in the traffic data information, and helps address attack behavior that lies latent over a long time period. Moreover, the method can be applied to scenarios such as big data security analysis or threat hunting.
Based on the foregoing embodiment, fig. 2 is a schematic flow chart of another time series graph constructing method provided by the embodiment of the present disclosure, as shown in fig. 2, where determining an information gain of each subsequence based on a distance distribution between each subsequence and an original sequence includes the following steps:
Step 201, obtaining flow data information in a preset time period.
Step 202, constructing an original sequence based on the traffic data information.
In this embodiment, the information that can be obtained by analyzing the traffic data information includes, but is not limited to, one or more of: source IP address information, destination IP address information, port information, protocol information, flow size information, and time information. The original sequence is then constructed based on the information obtained by the analysis.
Step 203, sampling the original sequence to obtain a plurality of subsequences, constructing a first sequence set based on the subsequences, and calculating a first distance distribution from the first sequence set to each original sequence.
In this embodiment, the first distance distribution is a distance distribution that includes the currently processed subsequence, and the second distance distribution is a distance distribution that does not include the currently processed subsequence, so the information gain of the currently processed subsequence can be determined from the change in information amount between the first distance distribution and the second distance distribution, as detailed below.
In this embodiment, the first sequence set includes the subsequences obtained by sampling the original sequences, and the first distance distribution from the first sequence set to each original sequence is determined by calculating the distance from each subsequence in the first sequence set to each original sequence. The first distance distribution reflects how the distances between the subsequences in the first sequence set and the original sequences are distributed.
In an optional embodiment, the calculating a first distance distribution from the first sequence set to each original sequence specifically includes:
Step a1, obtaining the first sampling subsequences determined by sampling the currently processed original sequence.
In this embodiment, a first sampling subsequence is a subsequence contained in the currently processed original sequence. In an optional embodiment, the correspondence between the original sequences and the subsequences may be recorded in a preset list, and the preset list is searched according to the currently processed original sequence to determine the first sampling subsequences corresponding to it. In another optional implementation, the currently processed original sequence may be sampled according to a preset sliding window and a preset sliding distance to obtain the corresponding first sampling subsequences.
Step a2, calculating a first distance between each first sample subsequence and the subsequence in the first sequence set, and taking the minimum value of the first distances as a first target distance between the subsequence in the first sequence set and the currently processed original sequence.
In this embodiment, the first distance is the Euclidean distance between a first sampling subsequence and a subsequence in the first sequence set. The first distance is calculated for each of the plurality of first sampling subsequences included in the currently processed original sequence, and the minimum of these first distances is taken as the first target distance between the currently processed original sequence and the subsequence in the first sequence set.
For example, if there are N source IP addresses, the original sequence set has N elements, $OrgS = \{OrgS_i,\ i \in [1, 2, \dots, N]\}$. If the currently processed original sequence is $OrgS_i$ and the first sampling subsequence is $SubS_k$, then $SubS_k \in OrgS_i$. Let the subsequence in the first sequence set be $SubS_j$, and let the first distance between $SubS_k$ and $SubS_j$ be $d(SubS_k, SubS_j)$. Then:
$$d(SubS_k, SubS_j) = \sqrt{\sum_{r} \left(SubS_{k,r} - SubS_{j,r}\right)^2}$$
wherein $SubS_{k,r}$ represents the r-th value of $SubS_k$, and $SubS_{j,r}$ represents the r-th value of $SubS_j$. The first target distance between $OrgS_i$ and $SubS_j$ is $d(OrgS_i, SubS_j)$. Then:
$$d(OrgS_i, SubS_j) = \min_{SubS_k \in OrgS_i} d(SubS_k, SubS_j)$$
Step a3, calculating a first target distance between each original sequence and each subsequence in the first sequence set, and determining a first distance distribution according to the first target distance.
In an alternative embodiment, a first target distance between the currently processed original sequence and each subsequence in the first sequence set is calculated, and the calculation is performed for each original sequence, so as to obtain a first target distance between each original sequence and each subsequence in the first sequence set, thereby forming a first distance distribution according to the plurality of first target distances.
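Steps a1 to a3 can be sketched in Python as follows. The sketch assumes that the sampling window equals the length of the subsequence being compared, so that the Euclidean distance is defined element by element; the function names, the three-element subsequences, and the sequence for "IP2" are hypothetical.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def sample(original, window, step=1):
    """Step a1: sampling subsequences of the currently processed original."""
    return [original[i:i + window]
            for i in range(0, len(original) - window + 1, step)]


def target_distance(original, candidate, step=1):
    """Step a2: minimum distance from any sampling subsequence to candidate."""
    return min(euclidean(s, candidate)
               for s in sample(original, len(candidate), step))


def distance_distribution(originals, sequence_set):
    """Step a3: target distance for every (original, subsequence) pair."""
    return {
        (name, tuple(candidate)): target_distance(original, candidate)
        for name, original in originals.items()
        for candidate in sequence_set
    }


originals = {
    "IP1": [11, 2, 13, 5, 8, 6, 3, 10, 9, 4],
    "IP2": [1, 1, 1, 1, 1],  # hypothetical second access terminal
}
sequence_set = [(11, 2, 13), (6, 3, 10)]
distribution = distance_distribution(originals, sequence_set)
```

A subsequence that occurs in an original sequence has target distance 0 to that sequence, as is the case for both subsequences and IP1 here. Calling `distance_distribution` again on the sequence set with the currently processed subsequence removed yields the second distance distribution.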
Step 204, removing the currently processed subsequence from the first sequence set to obtain a second sequence set, and calculating a second distance distribution from the second sequence set to each original sequence.
In this embodiment, the currently processed subsequence is the subsequence whose information gain is currently being calculated. It may be removed from the first sequence set to obtain the second sequence set, and the second distance distribution from the second sequence set to each original sequence is determined by calculating the distance from each subsequence in the second sequence set to each original sequence. The second distance distribution reflects how the distances between the subsequences in the second sequence set and the original sequences are distributed.
In an optional embodiment, the calculating a second distance distribution from the second sequence set to each original sequence specifically includes:
Step b1, acquiring the second sampling subsequences obtained by sampling the currently processed original sequence.
In this embodiment, a second sampling subsequence is a subsequence included in the currently processed original sequence. In an optional implementation, the correspondence between original sequences and subsequences may be recorded in a preset list, and the preset list is searched according to the currently processed original sequence to determine the second sampling subsequences corresponding to it. In another optional implementation, the currently processed original sequence may be sampled according to a preset sliding window and a preset sliding distance to obtain the corresponding second sampling subsequences.
Step b2, calculating a second distance between each second sampling subsequence and a subsequence in the second sequence set, and taking the minimum value of the second distances as a second target distance between the subsequence in the second sequence set and the currently processed original sequence.
In this embodiment, the second distance is the Euclidean distance between a second sampling subsequence and a subsequence in the second sequence set. The second distance is calculated for each of the second sampling subsequences included in the currently processed original sequence, and the minimum of these second distances is taken as the second target distance between the currently processed original sequence and the subsequence in the second sequence set.
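For example, if the currently processed original sequence is $\mathrm{OrgS}_i'$ and a second sampling subsequence is $\mathrm{SubS}_k'$, then $\mathrm{SubS}_k' \in \mathrm{OrgS}_i'$. Let a subsequence in the second sequence set be $\mathrm{SubS}_j'$. The second distance between $\mathrm{SubS}_k'$ and $\mathrm{SubS}_j'$ is $d(\mathrm{SubS}_k', \mathrm{SubS}_j')$:
$$d(\mathrm{SubS}_k', \mathrm{SubS}_j') = \sqrt{\sum_{r} \left(\mathrm{SubS}_{k,r}' - \mathrm{SubS}_{j,r}'\right)^2}$$
where $\mathrm{SubS}_{k,r}'$ represents the r-th value of $\mathrm{SubS}_k'$ and $\mathrm{SubS}_{j,r}'$ represents the r-th value of $\mathrm{SubS}_j'$. The second target distance between $\mathrm{OrgS}_i'$ and $\mathrm{SubS}_j'$ is $d(\mathrm{OrgS}_i', \mathrm{SubS}_j')$:
$$d(\mathrm{OrgS}_i', \mathrm{SubS}_j') = \min_{\mathrm{SubS}_k' \in \mathrm{OrgS}_i'} d(\mathrm{SubS}_k', \mathrm{SubS}_j')$$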
Step b3, calculating a second target distance between each original sequence and each subsequence in the second sequence set, and determining a second distance distribution according to the second target distances.
In an alternative embodiment, a second target distance between the currently processed original sequence and each subsequence in the second sequence set is calculated, and the calculation is performed for each original sequence, so as to obtain a second target distance between each original sequence and each subsequence in the second sequence set, thereby forming a second distance distribution according to the plurality of second target distances.
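The leave-one-out construction of the second sequence set in step 204 and steps b1 to b3 can be sketched as follows. The helper names and sample data are hypothetical, and the sliding distance is assumed to be 1.

```python
import math

def min_distance(original, candidate):
    """Smallest Euclidean distance between `candidate` and any window of `original`."""
    n = len(candidate)
    return min(
        math.sqrt(sum((x - y) ** 2 for x, y in zip(original[i:i + n], candidate)))
        for i in range(len(original) - n + 1)
    )

def second_distribution(originals, first_set, current_index):
    """Drop the currently processed subsequence, then recompute the target distances."""
    second_set = first_set[:current_index] + first_set[current_index + 1:]
    return [[min_distance(o, s) for s in second_set] for o in originals]

# Hypothetical data: the subsequence at index 1 is the one currently being processed.
originals = [[1, 2, 3, 4, 5], [5, 4, 3, 2, 1]]
first_set = [[1, 2, 3], [3, 2, 1], [2, 3, 4]]
second = second_distribution(originals, first_set, current_index=1)
```

Each row of the result has one fewer entry than the corresponding row of the first distance distribution, because the removed subsequence no longer contributes a column.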
Step 205, determining the information gain of the currently processed subsequence according to the first distance distribution and the second distance distribution.
The information gain can be used to measure the difference between different probability distributions and thereby determine the amount of information contained in the corresponding data. In this embodiment, one of the reasons why the first distance distribution is different from the second distance distribution is: the first sequence set corresponding to the first distance distribution includes the currently processed subsequence, and the second sequence set corresponding to the second distance distribution does not include the currently processed subsequence, so that the change of the information amount between the first distance distribution and the second distance distribution can be calculated, and the information gain of the currently processed subsequence is determined according to the calculation result.
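The disclosure does not give a formula for the change in information amount. One possible reading, shown here purely as an assumption, is to bin both distance distributions into histograms over a shared range and take the Kullback-Leibler divergence between them. The bin count, the add-one smoothing, and the function names are all choices made for this sketch.

```python
import math

def histogram(values, lo, width, bins):
    """Normalized histogram with add-one smoothing so that no bin is empty."""
    counts = [1.0] * bins
    for v in values:
        idx = min(int((v - lo) / width), bins - 1)  # clamp the maximum into the last bin
        counts[idx] += 1
    total = sum(counts)
    return [c / total for c in counts]

def information_gain(first_distances, second_distances, bins=4):
    """KL divergence between the two binned distance distributions (assumed measure)."""
    combined = list(first_distances) + list(second_distances)
    lo, hi = min(combined), max(combined)
    width = (hi - lo) / bins or 1.0  # avoid a zero width when all distances are equal
    p = histogram(first_distances, lo, width, bins)
    q = histogram(second_distances, lo, width, bins)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

# Hypothetical flattened distance distributions.
same = information_gain([0.0, 1.0, 2.0], [0.0, 1.0, 2.0])
shifted = information_gain([0.0, 0.0, 1.0, 1.0], [3.0, 3.0, 3.0, 3.0])
```

With this choice, identical distributions give a gain of zero, and the gain grows as removing the subsequence shifts the distances further.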
Step 206, comparing the information gains of the subsequences, and obtaining the target sequences from the subsequences based on the comparison result.
In an optional implementation manner, a number threshold N may be preset, where N is a positive integer, each subsequence is sorted from large to small according to a value of information gain, and the first N subsequences are taken as target sequences.
In another optional implementation, a quantity threshold M may be preset, where M is a positive integer. The subsequence with the largest information gain is determined by comparing the information gains of the subsequences, taken as a target sequence, and removed from the first sequence set. The information gains of the subsequences remaining in the updated first sequence set are then recalculated, the subsequence with the largest information gain is again taken as a target sequence, and the target sequence set is updated accordingly. This is repeated until the number of target sequences obtained is greater than or equal to M.
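The two selection strategies of step 206 can be sketched as follows. The function names are hypothetical, and `gain_fn` stands in for whatever information gain calculation is used; the lambda in the example is only a placeholder score.

```python
def top_n(subsequences, gains, n):
    """Sort by information gain, largest first, and keep the first n subsequences."""
    order = sorted(range(len(gains)), key=lambda i: gains[i], reverse=True)
    return [subsequences[i] for i in order[:n]]

def greedy_select(subsequences, gain_fn, m):
    """Repeatedly take the best subsequence, remove it, and recompute the gains.

    `gain_fn(sub, remaining)` is an assumed callback returning the information
    gain of `sub` relative to the current (updated) sequence set `remaining`.
    """
    remaining = list(subsequences)
    targets = []
    while remaining and len(targets) < m:
        gains = [gain_fn(s, remaining) for s in remaining]
        best = max(range(len(remaining)), key=lambda i: gains[i])
        targets.append(remaining.pop(best))
    return targets

# Hypothetical subsequences and gains.
subs = [[1, 2], [3, 4], [5, 6]]
picked_top = top_n(subs, [0.2, 0.9, 0.5], 2)
picked_greedy = greedy_select(subs, lambda s, rest: sum(s), 2)
```

The greedy variant costs more because the gains are recomputed after every removal, but it accounts for the way each removal changes the sequence set.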
Optionally, the obtained target sequence may also be identified by a sequence number.
Step 207, acquiring the homologous sequences among the target sequences, that is, the target sequences that belong to the same original sequence.
In this embodiment, the target sequence can be used to match the original sequences, so as to screen out homologous sequences belonging to the same original sequence from the target sequence.
In an alternative embodiment, if the correspondence between the original sequence and the subsequence is recorded in advance, the subsequence in the correspondence may be screened according to the target sequence, so as to determine the homologous sequence belonging to the same original sequence.
Step 208, searching for the homologous sequences in the original sequence to obtain a time sequence identifier corresponding to each homologous sequence.
Further, the homologous sequences can be searched for in the original sequence, and the time sequence identifier corresponding to each homologous sequence is determined according to the position at which the homologous sequence is matched in the original sequence. It should be noted that a homologous sequence may be found multiple times in the same original sequence, and thus one homologous sequence may correspond to multiple timing identifiers.
Step 209, taking the homologous sequences as nodes of the time sequence graph, determining the connection relation between the homologous sequences according to the time sequence identifier corresponding to each homologous sequence, and constructing the time sequence graph.
Each homologous sequence has a corresponding timing identifier, so the connection relationship between the homologous sequences can be determined according to the timing identifiers. The connection relationship can be determined in various ways, and this embodiment does not limit the method. For example, the homologous sequences can be connected pairwise in order of their timing identifiers, from earliest to latest.
Fig. 3 is a schematic diagram of a method for determining a timing identifier according to an embodiment of the disclosure. As shown in fig. 3, the original sequence is {11, 2, 13, 5, 8, 6, 3, 10, 9, 4}, the first homologous sequence is {11, 2, 13, 5, 8, 6, 3}, and the second homologous sequence is {13, 5, 8, 6, 3, 10, 9}. Values that appear earlier in the original sequence correspond to earlier times, so the first homologous sequence precedes the second homologous sequence in time. The timing identifier of the first homologous sequence can therefore be denoted as 1 and that of the second homologous sequence as 2, where a larger timing identifier represents a later time. The time-series graph may then be constructed by using the first and second homologous sequences as nodes and establishing a directed connection from the first homologous sequence to the second homologous sequence.
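The Fig. 3 example can be reproduced with a short sketch. It assumes that timing identifiers are assigned by ranking the start positions of all matches, and that consecutive matches are joined by a directed edge; the function names and the index-based node labels are hypothetical.

```python
def find_positions(original, sub):
    """Start indices of every exact occurrence of `sub` in `original`."""
    n = len(sub)
    return [i for i in range(len(original) - n + 1) if original[i:i + n] == sub]

def build_timing_graph(original, homologous):
    """Return ({node index: [timing identifiers]}, [directed edges between nodes])."""
    # One (start position, node index) pair per match, ordered from earliest to latest.
    occurrences = sorted(
        (pos, idx)
        for idx, sub in enumerate(homologous)
        for pos in find_positions(original, sub)
    )
    timing_ids = {}
    for order, (_, idx) in enumerate(occurrences, start=1):
        timing_ids.setdefault(idx, []).append(order)
    # Connect consecutive occurrences pairwise, earlier node to later node.
    edges = [(a[1], b[1]) for a, b in zip(occurrences, occurrences[1:])]
    return timing_ids, edges

original = [11, 2, 13, 5, 8, 6, 3, 10, 9, 4]
homologous = [[11, 2, 13, 5, 8, 6, 3], [13, 5, 8, 6, 3, 10, 9]]
timing_ids, edges = build_timing_graph(original, homologous)
```

Here node 0 (the first homologous sequence) receives timing identifier 1, node 1 receives timing identifier 2, and the single edge runs from node 0 to node 1. A homologous sequence that matched more than once would simply collect several identifiers in its list.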
In summary, the time-series graph construction method of the embodiment of the disclosure can calculate the information gain of each subsequence based on the distance, so that a representative target sequence can be extracted, thereby improving the accuracy of a data source for constructing a graph, and construct a time-series graph according to the time-series relationship between target sequences, so that the time-series graph can reflect the time-series relationship between the target sequences, and the richness of information contained in the time-series graph is enhanced.
Fig. 4 is a schematic structural diagram of a timing map constructing apparatus provided in an embodiment of the present disclosure, where the apparatus may be implemented by software and/or hardware, and may be generally integrated in an electronic device. As shown in fig. 4, the apparatus includes:
an obtaining module 401, configured to obtain traffic data information within a preset time period;
a first constructing module 402, configured to construct an original sequence based on the traffic data information;
a determining module 403, configured to sample the original sequence to obtain multiple sub-sequences, and determine an information gain of each sub-sequence based on a distance distribution between each sub-sequence and the original sequence;
a comparing module 404, configured to compare the information gains of the sub-sequences, and obtain a target sequence from the plurality of sub-sequences based on a comparison result;
a second constructing module 405, configured to construct the time-series graph by using the target sequence as a node of the time-series graph and using a time-series relationship of the target sequence in the same original sequence as an edge of the time-series graph.
Optionally, the first building module 402 is configured to:
analyzing the traffic data information, and acquiring the number of accesses by each access terminal in each sub-time period within the preset time period;
and constructing an original sequence corresponding to each access terminal based on the numbers of accesses and the time sequence relation among the sub-time periods.
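The construction of original sequences can be sketched as follows. The record format (an integer timestamp paired with a source IP address), the parameter names, and the sample data are assumptions for illustration; the disclosure does not fix a traffic record layout.

```python
from collections import defaultdict

def build_original_sequences(records, start, end, sub_period):
    """Count accesses per access terminal in each sub-time period of [start, end)."""
    n_slots = (end - start + sub_period - 1) // sub_period  # ceiling division
    sequences = defaultdict(lambda: [0] * n_slots)
    for timestamp, source_ip in records:
        if start <= timestamp < end:
            sequences[source_ip][(timestamp - start) // sub_period] += 1
    return dict(sequences)

# Hypothetical traffic records: (timestamp in seconds, source IP address).
records = [
    (0, "10.0.0.1"), (5, "10.0.0.1"), (12, "10.0.0.1"),
    (3, "10.0.0.2"), (29, "10.0.0.2"),
]
sequences = build_original_sequences(records, start=0, end=30, sub_period=10)
```

Each resulting list is ordered by sub-time period, so the position of a count in the list carries the time sequence relation.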
Optionally, the determining module 403 includes:
a sampling unit, configured to sample the original sequence according to a preset sliding window and a preset sliding distance to obtain the plurality of subsequences.
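Sliding-window sampling can be sketched in a few lines. The function name and the specific window and sliding distance used below are assumptions chosen for the example.

```python
def sample_subsequences(original, window, slide):
    """Cut `original` into windows of length `window`, advancing by `slide` each time."""
    return [
        original[i:i + window]
        for i in range(0, len(original) - window + 1, slide)
    ]

original = [11, 2, 13, 5, 8, 6, 3, 10, 9, 4]
subsequences = sample_subsequences(original, window=7, slide=2)
```

With a window of 7 and a sliding distance of 2, the ten-value sequence yields two subsequences, starting at positions 0 and 2. A smaller sliding distance produces more, heavily overlapping subsequences at a higher computation cost.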
Optionally, the determining module 403 includes:
a first calculating unit, configured to construct a first sequence set based on the subsequences, and calculate a first distance distribution from the first sequence set to each original sequence;
a second calculating unit, configured to remove the currently processed subsequence from the first sequence set to obtain a second sequence set, and calculate a second distance distribution from the second sequence set to each original sequence;
a determining unit, configured to determine an information gain of the currently processed subsequence according to the first distance distribution and the second distance distribution.
Optionally, the first calculating unit is configured to:
acquire the first sampling subsequences obtained by sampling the currently processed original sequence;
calculate a first distance between each first sampling subsequence and a subsequence in the first sequence set, and take the minimum value of the first distances as a first target distance between the subsequence in the first sequence set and the currently processed original sequence;
calculate the first target distance between each original sequence and each subsequence in the first sequence set, and determine the first distance distribution according to the first target distances;
the second calculating unit is configured to:
acquire the second sampling subsequences obtained by sampling the currently processed original sequence;
calculate a second distance between each second sampling subsequence and a subsequence in the second sequence set, and take the minimum value of the second distances as a second target distance between the subsequence in the second sequence set and the currently processed original sequence;
calculate the second target distance between each original sequence and each subsequence in the second sequence set, and determine the second distance distribution according to the second target distances.
Optionally, the second building module 405 is configured to:
obtaining homologous sequences belonging to the same original sequence in the target sequence;
searching the homologous sequences in the original sequence to obtain a time sequence identifier corresponding to each homologous sequence;
and taking the homologous sequences as nodes of the time sequence graph, determining the connection relation between the homologous sequences according to the time sequence identification corresponding to each homologous sequence, and constructing the time sequence graph.
The time sequence map construction device provided by the embodiment of the disclosure can execute the time sequence map construction method provided by any embodiment of the disclosure, and has corresponding functional modules and beneficial effects of the execution method.
To implement the above embodiments, the present disclosure also provides a computer program product, which includes a computer program/instructions that, when executed by a processor, implement the time-series graph construction method in the above embodiments.
Fig. 5 is a schematic structural diagram of an electronic device according to an embodiment of the present disclosure.
Referring now specifically to fig. 5, a schematic diagram of an electronic device 500 suitable for use in implementing embodiments of the present disclosure is shown. The electronic device 500 in the disclosed embodiment may include, but is not limited to, a mobile terminal such as a mobile phone, a notebook computer, a digital broadcast receiver, a PDA (personal digital assistant), a PAD (tablet computer), a PMP (portable multimedia player), a vehicle terminal (e.g., a car navigation terminal), and the like, and a stationary terminal such as a digital TV, a desktop computer, and the like. The electronic device shown in fig. 5 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiments of the present disclosure.
As shown in fig. 5, electronic device 500 may include a processing means (e.g., central processing unit, graphics processor, etc.) 501 that may perform various appropriate actions and processes in accordance with a program stored in a Read Only Memory (ROM)502 or a program loaded from a storage means 508 into a Random Access Memory (RAM) 503. In the RAM 503, various programs and data necessary for the operation of the electronic apparatus 500 are also stored. The processing device 501, the ROM 502, and the RAM 503 are connected to each other through a bus 504. An input/output (I/O) interface 505 is also connected to bus 504.
Generally, the following devices may be connected to the I/O interface 505: input devices 506 including, for example, a touch screen, touch pad, keyboard, mouse, camera, microphone, accelerometer, gyroscope, etc.; output devices 507 including, for example, a Liquid Crystal Display (LCD), speakers, vibrators, and the like; storage devices 508 including, for example, magnetic tape, hard disk, etc.; and a communication device 509. The communication means 509 may allow the electronic device 500 to communicate with other devices wirelessly or by wire to exchange data. While fig. 5 illustrates an electronic device 500 having various means, it is to be understood that not all illustrated means are required to be implemented or provided. More or fewer devices may alternatively be implemented or provided.
In particular, according to an embodiment of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program carried on a non-transitory computer readable medium, the computer program containing program code for performing the method illustrated by the flow chart. In such an embodiment, the computer program may be downloaded and installed from a network via the communication means 509, or installed from the storage means 508, or installed from the ROM 502. The computer program performs the above-described functions defined in the time-series graph construction method of the embodiment of the present disclosure when executed by the processing apparatus 501.
It should be noted that the computer readable medium in the present disclosure can be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present disclosure, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In contrast, in the present disclosure, a computer readable signal medium may comprise a propagated data signal with computer readable program code embodied therein, either in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: electrical wires, optical cables, RF (radio frequency), etc., or any suitable combination of the foregoing.
In some embodiments, clients and servers may communicate using any currently known or future developed network protocol, such as HTTP (HyperText Transfer Protocol), and may be interconnected with any form or medium of digital data communication (e.g., a communications network). Examples of communication networks include a local area network ("LAN"), a wide area network ("WAN"), an internetwork (e.g., the Internet), and peer-to-peer networks (e.g., ad hoc peer-to-peer networks), as well as any currently known or future developed network.
The computer readable medium may be embodied in the electronic device; or may exist separately without being assembled into the electronic device.
The computer readable medium carries one or more programs which, when executed by the electronic device, cause the electronic device to: acquire traffic data information within a preset time period; construct an original sequence based on the traffic data information; sample the original sequence to obtain a plurality of subsequences, and determine the information gain of each subsequence based on the distance distribution between each subsequence and the original sequence; compare the information gains of the subsequences, and obtain a target sequence from the plurality of subsequences based on the comparison result; and construct the time sequence map by taking the target sequence as a node of the time sequence map and taking the time sequence relation of the target sequence in the same original sequence as an edge of the time sequence map. The embodiment of the disclosure can therefore extract patterns from the traffic data information within the preset time period and combine them with time sequence characteristics to obtain the time sequence map. Time-ordered network security attacks or access paths are converted into the form of a time sequence map, time sequence behavior is constructed on the map, and network attacks that cannot be detected by a device probe are reflected on the map. This improves the accuracy of the time sequence map, makes better use of the information contained in the traffic data information, and helps to handle latent attack behavior that spans a long time period.
Computer program code for carrying out operations for the present disclosure may be written in any combination of one or more programming languages, including but not limited to object oriented programming languages such as Java, Smalltalk, and C++, and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The units described in the embodiments of the present disclosure may be implemented by software or hardware. The name of a unit does not, in some cases, constitute a limitation on the unit itself.
The functions described herein above may be performed, at least in part, by one or more hardware logic components. For example, without limitation, exemplary types of hardware logic components that may be used include: Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), systems on a chip (SOCs), Complex Programmable Logic Devices (CPLDs), and the like.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
The foregoing description is only exemplary of the preferred embodiments of the disclosure and is illustrative of the principles of the technology employed. It will be appreciated by those skilled in the art that the scope of the disclosure herein is not limited to the particular combination of features described above, but also encompasses other embodiments in which any combination of the features described above or their equivalents does not depart from the spirit of the disclosure. For example, the above features and (but not limited to) the features disclosed in this disclosure having similar functions are replaced with each other to form the technical solution.
Further, while operations are depicted in a particular order, this should not be understood as requiring that such operations be performed in the particular order shown or in sequential order. Under certain circumstances, multitasking and parallel processing may be advantageous. Likewise, while several specific implementation details are included in the above discussion, these should not be construed as limitations on the scope of the disclosure. Certain features that are described in the context of separate embodiments can also be implemented in combination in a single embodiment. Conversely, various features that are described in the context of a single embodiment can also be implemented in multiple embodiments separately or in any suitable subcombination.
Although the subject matter has been described in language specific to structural features and/or methodological acts, it is to be understood that the subject matter defined in the appended claims is not necessarily limited to the specific features or acts described above. Rather, the specific features and acts described above are disclosed as example forms of implementing the claims.

Claims (10)

1. A time-series map construction method is characterized by comprising the following steps:
acquiring traffic data information in a preset time period;
constructing an original sequence based on the traffic data information;
sampling the original sequence to obtain a plurality of subsequences, and determining the information gain of each subsequence based on the distance distribution of each subsequence and the original sequence;
comparing the information gain of each subsequence, and obtaining a target sequence from the plurality of subsequences based on the comparison result;
and constructing the time sequence map by taking the target sequence as a node of the time sequence map and taking the time sequence relation of the target sequence in the same original sequence as an edge of the time sequence map.
2. The method of claim 1, wherein constructing an original sequence based on the traffic data information comprises:
analyzing the traffic data information, and acquiring the number of accesses by each access terminal in each sub-time period within the preset time period;
and constructing an original sequence corresponding to each access terminal based on the numbers of accesses and the time sequence relation among the sub-time periods.
3. The method of claim 1, wherein the sampling the original sequence to obtain a plurality of sub-sequences comprises:
and sampling the original sequence according to a preset sliding window and a preset sliding distance to obtain the plurality of subsequences.
4. The method of claim 1, wherein the determining the information gain of each of the subsequences based on the distance distribution of each of the subsequences from the original sequence comprises:
constructing a first sequence set based on the subsequences, and calculating a first distance distribution from the first sequence set to each original sequence;
removing the currently processed subsequence from the first sequence set to obtain a second sequence set, and calculating a second distance distribution from the second sequence set to each original sequence;
and determining the information gain of the currently processed subsequence according to the first distance distribution and the second distance distribution.
5. The method of claim 4, wherein said calculating a first distance distribution of the first set of sequences to each of the original sequences comprises:
acquiring the first sampling subsequences obtained by sampling the currently processed original sequence;
calculating a first distance between each first sampling subsequence and a subsequence in the first sequence set, and taking a minimum value in the first distances as a first target distance between the subsequence in the first sequence set and the currently processed original sequence;
calculating the first target distance between each original sequence and each subsequence in the first sequence set, and determining the first distance distribution according to the first target distance;
said calculating a second distance distribution of said second set of sequences to each of said original sequences, comprising:
acquiring the second sampling subsequences obtained by sampling the currently processed original sequence;
calculating a second distance between each second sampling subsequence and a subsequence in the second sequence set, and taking a minimum value in the second distances as a second target distance between the subsequence in the second sequence set and the currently processed original sequence;
calculating the second target distance between each original sequence and each subsequence in the second sequence set, and determining the second distance distribution according to the second target distance.
6. The method according to claim 1, wherein constructing the time-series graph by taking the target sequence as a node of the time-series graph and taking a time-series relation of the target sequence in the same original sequence as an edge of the time-series graph comprises:
obtaining homologous sequences belonging to the same original sequence in the target sequence;
searching the homologous sequences in the original sequence to obtain a time sequence identifier corresponding to each homologous sequence;
and taking the homologous sequences as nodes of the time sequence graph, determining the connection relation between the homologous sequences according to the time sequence identification corresponding to each homologous sequence, and constructing the time sequence graph.
7. A time-series graph construction apparatus, comprising:
an acquisition module configured to acquire flow data information within a preset time period;
a first construction module configured to construct an original sequence based on the flow data information;
a determining module configured to sample the original sequence to obtain a plurality of subsequences, and to determine the information gain of each subsequence based on the distance distribution between each subsequence and the original sequence;
a comparison module configured to compare the information gains of the subsequences and to obtain a target sequence from the plurality of subsequences based on the comparison result;
and a second construction module configured to construct the time-series graph by taking the target sequence as a node of the time-series graph and taking the time-series relation of the target sequence in the same original sequence as an edge of the time-series graph.
8. An electronic device, characterized in that the electronic device comprises:
a processor;
a memory for storing instructions executable by the processor;
wherein the processor is configured to read the executable instructions from the memory and execute them to implement the time-series graph construction method of any one of claims 1 to 6.
9. A computer-readable storage medium, characterized in that the storage medium stores a computer program for executing the time-series graph construction method according to any one of the above claims 1 to 6.
10. A computer program product comprising a computer program/instructions which, when executed by a processor, implement the time-series graph construction method of any one of claims 1 to 6.
CN202210204857.XA 2022-03-03 2022-03-03 Time sequence map construction method, device, equipment and medium Pending CN114547491A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210204857.XA CN114547491A (en) 2022-03-03 2022-03-03 Time sequence map construction method, device, equipment and medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210204857.XA CN114547491A (en) 2022-03-03 2022-03-03 Time sequence map construction method, device, equipment and medium

Publications (1)

Publication Number Publication Date
CN114547491A (en) 2022-05-27

Family

ID=81662486

Family Applications (1)

Application Number Status Publication Number Priority Date Filing Date Title
CN202210204857.XA Pending CN114547491A (en) 2022-03-03 2022-03-03 Time sequence map construction method, device, equipment and medium

Country Status (1)

Country Link
CN (1) CN114547491A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115545082A (en) * 2022-10-20 2022-12-30 广东省麦思科学仪器创新研究院 Mass spectrogram generation method, device and system and readable storage medium

Similar Documents

Publication Publication Date Title
CN114422267B (en) Flow detection method, device, equipment and medium
CN110413742B (en) Resume information duplication checking method, device, equipment and storage medium
CN110634047A (en) Method and device for recommending house resources, electronic equipment and storage medium
CN111552640A (en) Code detection method, device, equipment and storage medium
CN115277261A (en) Abnormal machine intelligent identification method, device and equipment based on industrial control network virus
CN110634050B (en) Method, device, electronic equipment and storage medium for identifying house source type
CN114547491A (en) Time sequence map construction method, device, equipment and medium
CN116827971B (en) Block chain-based carbon emission data storage and transmission method, device and equipment
CN110781066B (en) User behavior analysis method, device, equipment and storage medium
CN110895587A (en) Method and device for determining target user
CN111708680A (en) Error reporting information analysis method and device, electronic equipment and storage medium
CN110633411A (en) Method and device for screening house resources, electronic equipment and storage medium
CN114140723B (en) Multimedia data identification method and device, readable medium and electronic equipment
CN111628913B (en) Online time length determining method and device, readable medium and electronic equipment
CN113033552B (en) Text recognition method and device and electronic equipment
CN113051400A (en) Method and device for determining annotation data, readable medium and electronic equipment
CN113779103A (en) Method and apparatus for detecting abnormal data
CN111382233A (en) Similar text detection method and device, electronic equipment and storage medium
CN111143355A (en) Data processing method and device
CN116384945B (en) Project management method and system
CN116821160A (en) Correlation updating method, device, equipment and medium based on user behavior track information
CN117857388B (en) Switch operation information detection method and device, electronic equipment and computer medium
CN111461285B (en) Method and device for detecting electric equipment
CN108536362B (en) Method and device for identifying operation and server
CN116738184A (en) Application program fault analysis method and device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination