WO2019107608A1

WO2019107608A1 - Method and system for counting data set

Info

Publication number: WO2019107608A1
Application number: PCT/KR2017/013903
Authority: WO
Inventors: 장룡호; 신동오; 양대헌
Original assignee: 주식회사 더볼터
Priority date: 2017-11-29
Filing date: 2017-11-30
Publication date: 2019-06-06
Also published as: KR102026128B1; KR20190062987A

Abstract

A method and system for counting a data set is disclosed. A counting method implemented using a computer in order to count network traffic for each data flow may comprise the steps of: allocating a memory space having a predesignated particular size to each different data flow; accumulating the number of packets belonging to a data flow by using a memory block corresponding to a partial space of an allocated memory space, which is shared with another data flow; and accumulating the accumulated number of packets in an accumulation table when the memory block is in a saturation state, wherein the memory block is confined within one word indicating a memory space in the allocated memory space, which corresponds to a word that a CPU can read at one time.

Description

Method and system for counting datasets

Embodiments of the present invention relate to techniques for counting the number of packets in large-scale data flows received in stream form.

An approximate counting technique is used to determine how many packets are sent from a source IP address in the network.

In general, a table is used to count a stream-type data set (i.e., a group of objects) by flow. For example, whenever a packet is received in an electronic device such as a router, it is stored in a table together with a counter. If the packet is new, a new entry is added to the table and its counter is set to one. If a matching entry already exists, the corresponding counter is incremented by one. The table can then be searched to find how many packets have been received.
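The conventional table-based counting described above can be sketched as follows. This is only a minimal illustration, assuming that a flow is identified by its source IP string; the function name `count_packets` is hypothetical and does not appear in the source.

```python
from collections import defaultdict

def count_packets(packets):
    """Exact per-flow counting: one table entry and one counter per flow."""
    table = defaultdict(int)
    for src_ip in packets:
        # A new flow gets an entry whose counter becomes 1;
        # an existing entry is simply incremented by one.
        table[src_ip] += 1
    return dict(table)

counts = count_packets(["10.0.0.1", "10.0.0.2", "10.0.0.1"])
```

Because the table needs one entry per flow, its size grows with the number of flows, which is the memory problem the following paragraphs address.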

However, in recent years, the amount of data transmitted and received over the Internet, such as live broadcasting, has become very large (big data processing), while the SRAM of a router offers only a small memory space. It is therefore difficult to decide where to send data that is continuously received in stream form (i.e., whether to route it to the next router).

Accordingly, a technique is required for counting the number and size of packets for each data flow so that, when large-scale data is received in stream form, an electronic device using a small memory space such as SRAM can route the received data quickly.

Korean Patent Laid-Open No. 10-2007-0121219 relates to a packet accounting verification system and verification method in an IP network, which extracts packets having a comparison target IP address from the packets collected by a packet collection unit, reconstructs them as flows, and compares the amount of data of the flows to verify the accuracy of the accounting.

* Prior art literature

(Non-Patent Document 1) [1] A Linear-Time Probabilistic Counting Algorithm for Database Applications, KYU-YOUNG WHANG, ACM Transactions on Database Systems, Vol. 15, No. 2, June 1990.

TECHNICAL FIELD The present invention relates to a technology for measuring network traffic by data flow in an electronic device having a limited memory space such as a high-speed router. That is, the present invention relates to a technique for counting the number of packets per data flow and the size (capacity, bytes) of a packet.

A computer-implemented counting method for counting network traffic by data flow may comprise: allocating a memory space of a predetermined size to each different data flow; accumulating the number of packets belonging to a data flow using a memory block corresponding to a partial space of the allocated memory space that is shared with another data flow; and accumulating the accumulated number of packets in an accumulation table when the memory block is in a saturation state, wherein the memory block is confined within one word, a word being the unit of the allocated memory space that the CPU can read at one time.

According to an aspect, the allocated memory space may have a multi-layer structure.

According to another aspect, accumulating the number of packets belonging to the data flow using the memory block may include: changing any one of the bits belonging to the memory block to a predetermined specific value each time at least one packet belonging to the data flow arrives; and checking whether the memory block is saturated based on the number of bits belonging to the memory block and the number of bits changed to the specific value.

According to another aspect, accumulating the number of packets in the accumulation table may include: counting the number of packets accumulated in the memory block when the memory block is determined to be saturated; accumulating the counted number of packets in the accumulation table; and initializing and reusing the memory block.

According to another aspect, counting the number of packets accumulated in the memory block may include: counting, among the bits belonging to the memory block, the number of bits that have not been changed to the specific value; and counting how many packets have been transmitted from the source IP address based on the counted number of bits.

According to another aspect, accumulating the number of packets in the accumulation table may include, when the memory block is in a saturation state, accumulating in the accumulation table the number of packets accumulated in the remaining bits of the memory block, excluding the bits corresponding to noise. Here, noise may indicate that a bit has been changed to the predetermined specific value because it is shared with a packet belonging to another data flow.

According to another aspect, accumulating the number of packets in the accumulation table may include: calculating, as an average noise ratio of the memory block, the ratio of bits changed to the predetermined specific value in the remaining space of the allocated memory space to which the memory block belongs, excluding the memory block; counting the number of noise bits included in the memory block based on the calculated average noise ratio; calculating the number of packets actually accumulated in the memory block by excluding the counted noise from the number of packets accumulated in the memory block; and accumulating the calculated number of actually accumulated packets in the accumulation table in association with the ID of the corresponding data flow.

According to another aspect, accumulating the number of packets in the accumulation table may include initializing bits selected from among the bits of the memory block that have been changed to the predetermined specific value, until the ratio of bits changed to the specific value among the bits belonging to the memory block becomes the average noise ratio.

According to another aspect, accumulating the number of packets in the accumulation table may include: when the memory block is in a saturation state, calculating a size for the data flow based on the size of the packet that caused the saturation and the number of packets accumulated in the memory block; and accumulating the calculated size in the accumulation table in association with the ID of the data flow and the number of accumulated packets.

A counting system for counting network traffic by data flow may comprise at least one storage unit and at least one processor, wherein the at least one processor performs: allocating a memory space of a predetermined size to each different data flow; accumulating the number of packets belonging to a data flow using a memory block corresponding to a partial space of the allocated memory space that is shared with another data flow; and accumulating the accumulated number of packets in an accumulation table when the memory block is in a saturation state, wherein the memory block is confined within one word, a word being the unit of the allocated memory space that the CPU can read at one time.

In the present invention, the memory space allocated to each data flow is confined in order to measure network traffic, and packet counting is performed through updates, such as resets, within the confined memory space rather than over the entire memory space. Thus, the number and size (bytes) of packets received in stream form can be counted more accurately in an electronic device with a small memory space, such as a high-speed router.

FIG. 1 is a block diagram for explaining an internal configuration of a counting system in an embodiment of the present invention.

FIG. 2 is a flowchart illustrating a counting method in an embodiment of the present invention.

FIG. 3 is a diagram illustrating a data structure of a single layer used for packet counting according to an exemplary embodiment of the present invention.

FIG. 4 is a diagram showing a memory space of a single layer structure in a case where there is no noise in an embodiment of the present invention.

FIG. 5 is a flowchart showing a counting method in the case where there is no noise in an embodiment of the present invention.

FIG. 6 is a diagram for explaining a counting method in the case where noise exists in an embodiment of the present invention.

FIG. 7 is a diagram illustrating a memory space of a multi-layer structure according to an embodiment of the present invention.

FIG. 8 is a diagram showing the structure of the accumulation table in the embodiment of the present invention.

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS Hereinafter, embodiments of the present invention will be described in detail with reference to the accompanying drawings.

The present embodiments relate to a technology for counting a large-scale data set in stream form. More particularly, they relate to an approximate counting technique that counts network traffic by predefined data flow, based on a multi-layer data structure, in an electronic device having a small memory space.

In the present embodiments, a 'data flow' may be predefined as a source IP, or as a pair of a source IP and a destination IP. For example, when the source IP is defined as the data flow, the number of packets transmitted from the source corresponding to the source IP address can be counted. When the pair of source IP and destination IP is defined as the data flow, the amount of data transmitted from the source corresponding to the source IP address to the destination corresponding to the destination IP address can be counted.

In the present embodiments, a memory space of a certain size may be allocated to each different data flow (i.e., each object) in order to approximately count the stream-type data set by data flow, and the 'allocated memory space' can be represented by a matrix or a vector.

In the present embodiments, in order to count a large number of data packets within a confined memory space, each data flow (i.e., object) shares at least some of the memory space allocated to it with other data flows, and a memory block that randomly shares memory space in this way may be represented by a virtual matrix or a virtual vector.

In the present embodiments, the memory space used for approximate counting is confined within a word, where a word is the unit of the memory space that the CPU can read at one time. For example, when a memory block (i.e., a virtual vector) becomes saturated, the number accumulated in the virtual vector may be counted and accumulated in the hash table together with the ID of the corresponding data flow. That is, the update can be performed on a word-by-word basis.

In the present embodiments, the approximate counting can be referred to as 'RCC (Recyclable Counter with Confinement)' or 'RCC+ (Recyclable Counter with Confinement plus)', and can have the following two characteristic features. The first feature is the confinement of a virtual vector (i.e., a memory block) within a word. The second feature is that, when the virtual vector saturates, the number of packets accumulated in the saturated virtual vector is accumulated in an accumulation table (i.e., a hash table) and the virtual vector is recycled. The confinement of the virtual vector to one word and its reuse are described in more detail with reference to FIG. 2 below.

In the present embodiments, 'noise' may indicate that at least one bit of bits corresponding to a memory block (i.e., a virtual vector) is shared with another data flow. In other words, the presence of noise indicates that a shared bit is present, and the absence of noise may indicate that bits belonging to a virtual vector are not shared with other data flows.

In the present embodiments, an 'object' may represent each data flow (i.e., packet) belonging to a large-scale data set transmitted from the source.

FIG. 1 is a block diagram for explaining an internal configuration of a counting system in an embodiment of the present invention, and FIG. 2 is a flowchart illustrating a counting method in an embodiment of the present invention.

The counting system 100 according to the present embodiment may include a processor 110, a bus 120, a network interface 130, and a memory 140. The memory 140 may include an operating system 141 and a counting routine 142. The processor 110 may include an allocation unit 111, a memory accumulation control unit 112, a table accumulation control unit 113, and an initialization unit 114. In other embodiments, the counting system 100 may include more components than those shown in FIG. 1. However, most prior art components need not be explicitly illustrated. For example, the counting system 100 may include other components such as a display or a transceiver.

The memory 140 may be a computer-readable recording medium and may include random access memory (RAM), read only memory (ROM), and a permanent mass storage device such as a disk drive. In addition, the memory 140 may store program code for the operating system 141 and the counting routine 142. These software components may be loaded from a computer-readable recording medium separate from the memory 140 using a drive mechanism (not shown). Such a separate computer-readable recording medium (not shown) may include a floppy drive, a disk, a tape, a DVD/CD-ROM drive, or a memory card. In other embodiments, the software components may be loaded into the memory 140 via the network interface 130 rather than from a computer-readable recording medium.

The bus 120 may enable communication and data transfer between the components of the counting system 100. The bus 120 may be configured using a high-speed serial bus, a parallel bus, a Storage Area Network (SAN), and / or other suitable communication technology.

The network interface 130 may be a computer hardware component for connecting the counting system 100 to a computer network. The network interface 130 may connect the counting system 100 to a computer network via a wireless or wired connection.

The processor 110 may be configured to process instructions of a computer program by performing basic arithmetic, logic, and input/output operations of the counting system 100. The instructions may be provided to the processor 110 by the memory 140 or the network interface 130 via the bus 120. The processor 110 may be configured to execute program code for the allocation unit 111, the memory accumulation control unit 112, the table accumulation control unit 113, and the initialization unit 114. Such program code may be stored in a recording device such as the memory 140.

The allocation unit 111, the memory accumulation control unit 112, the table accumulation control unit 113, and the initialization unit 114 may be configured to perform the respective steps of FIG. 2.

In step 210, the allocation unit 111 may allocate a memory space of a predetermined size to each different data flow. At this time, at least a part of the allocated memory space may be shared with other data flows. The allocation unit 111 may confine the shared memory block within one word, a word being the unit of memory that the CPU can read at one time.

In step 220, the memory accumulation control unit 112 may accumulate the number of packets belonging to the data flow using a memory block corresponding to a partial space of the allocated memory space that is shared with other data flows.

For example, once a memory block (i.e., a virtual vector) has been assigned to each data flow, approximate counting can be performed using linear counting as presented in Non-Patent Document 1 above (A Linear-Time Probabilistic Counting Algorithm for Database Applications, KYU-YOUNG WHANG, ACM Transactions on Database Systems, Vol. 15, No. 2, June 1990). In this case, each memory block (i.e., virtual vector) is confined within one word, so the process of counting packets by data flow and the process of measuring the result can each be performed with a single memory read and write.
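As an illustration of the linear counting referred to above, the following minimal sketch estimates the number of insertions from a bit vector as -m * ln(z / m), where m is the vector size and z is the number of bits still at 0. The function name `linear_count` and the list-of-bits representation are assumptions made for illustration and are not taken from the source.

```python
import math

def linear_count(bits):
    """Linear counting estimate for a 0/1 bit vector.

    m is the vector size and z the number of zero bits;
    the estimate is -m * ln(z / m).
    """
    m = len(bits)
    z = bits.count(0)
    if z == 0:
        # With no zero bit left the logarithm is undefined,
        # which is why a vector must be flushed before it fills up.
        raise ValueError("vector is full; the estimator is undefined")
    return -m * math.log(z / m)

# An 8-bit virtual vector in which 5 bits have been set to 1.
estimate = linear_count([1, 1, 0, 0, 1, 1, 0, 1])
```

The estimate grows without bound as the last zero bits disappear, which may be one reason a saturation threshold below 100% is used.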

In step 230, when the memory block (i.e., the virtual vector) is in a saturation state, the table accumulation controller 113 may count the accumulated number of packets and accumulate them in the accumulation table.

In step 240, the initialization unit 114 may initialize at least some space of the memory block (i.e., the virtual vector) by initializing bits selected from among the bits of the memory block that have been changed to the predetermined specific value, until the ratio of bits changed to the specific value among the bits belonging to the memory block becomes the average noise ratio.

For example, when the memory block (i.e., the virtual vector) is in a saturation state, the table accumulation control unit 113 counts the number of packets accumulated in the virtual vector and accumulates it in an accumulation table (for example, a hash table). As the packets accumulated in the memory block are moved to the accumulation table, the memory block (i.e., the virtual vector) becomes empty, and the saturated memory block can be recycled. For example, when linear counting is used, the memory block may operate normally only while a portion (e.g., 70%) of the memory block is used. Accordingly, the memory accumulation control unit 112 checks whether the used space of the memory block is less than or equal to a predetermined usage threshold in order to determine whether the memory block is in a saturation state. If the used space of the memory block reaches the usage threshold (for example, 70% of the virtual vector), the memory block can be determined to be saturated. The table accumulation control unit 113 can then initialize at least some space of the memory block in order to recycle the saturated memory block.

Here, the average noise ratio caused by memory sharing among virtual vectors can be calculated from the usage rate of the memory space outside the virtual vector, or from the usage rate of the entire memory space. A portion of the virtual vector can then be randomly initialized until the usage rate of the virtual vector becomes equal to that usage rate. Next, the number of data packets accumulated in the virtual vector is calculated, and the number of actually accumulated data packets is obtained by excluding the accumulated noise from the calculated number of packets. The calculated number of actual packets may be accumulated in a hash table (e.g., a quadratic probing hash table) together with the ID of the data flow (i.e., the ID of the object).

FIG. 3 illustrates a data structure in a case where a memory space has a one-layer structure.

For example, a memory space of 32 bits or 64 bits may be allocated per data flow for packet counting. That is, in FIG. 3, the single-layer data structure may include an array of 32-bit or 64-bit CPU words. A virtual vector of a predetermined size (i.e., a memory block) may be allocated to different data flows, and the allocated virtual vector may be confined within one word. For example, when a packet belonging to a specific data flow arrives, a word can be selected using a hash value of information of the packet (for example, source IP, destination IP, packet ID, etc.). Then, the positions of the bits of the virtual vector (hatched blocks in FIG. 3, 301 to 308) in the selected word can be designated by the hash value.

Referring to FIG. 3, the virtual vector f₁ allocated to data flow 1 is confined within word w₀ 310, the virtual vector f₂ allocated to data flow 2 is confined within word w_(m-1) 320, and the virtual vector f₃ allocated to data flow 3 is confined within word w₆ 330. Referring to w₆ 330 in FIG. 3, when a packet of data flow 3 arrives, word w₆ 330 is selected from among the m words based on the hash value, and the bit positions (301 to 308) of the virtual vector within the selected word may be designated by the hash value. That is, eight bits (301 to 308) may be designated to accumulate data packets belonging to data flow 3. Since the same hash value is always obtained for the same data flow (i.e., the same object), the positions 301 to 308 of the bits belonging to the allocated virtual vector (i.e., memory block) are always the same. Since the virtual vector is confined to one word, the size of the virtual vector cannot be larger than the size of the word.
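The word selection and bit designation described above might be sketched as follows. The source does not name a hash function or state how bit positions are derived from the hash value; the use of SHA-256, a seeded pseudo-random sample, and the names `locate`, `WORD_BITS`, and `VECTOR_BITS` are all assumptions chosen for illustration.

```python
import hashlib
import random

WORD_BITS = 32    # assumed CPU word size
VECTOR_BITS = 8   # assumed virtual vector size (cannot exceed WORD_BITS)

def locate(flow_id, num_words):
    """Return (word index, bit positions) of the virtual vector of a flow.

    The result depends only on the flow ID, so the same flow always maps
    to the same word and to the same bits inside that word.
    """
    digest = hashlib.sha256(flow_id.encode()).digest()
    h = int.from_bytes(digest[:8], "big")
    word_index = h % num_words          # select one of the m words
    rng = random.Random(h)              # seeded by the hash: repeatable per flow
    positions = sorted(rng.sample(range(WORD_BITS), VECTOR_BITS))
    return word_index, positions

word_index, positions = locate("10.0.0.1", num_words=1024)
```

Because every bit of the virtual vector lies in one word, a single word read is enough to inspect the whole vector.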

FIG. 4 is a diagram illustrating a memory space of a single layer structure in a case where there is no noise in an embodiment of the present invention, and FIG. 5 is a flowchart illustrating a counting method in the case where there is no noise.

In FIG. 4, it is assumed that the memory space has a single layer structure. In the case of no noise, a multi-layer structure may be used in addition to a single layer.

FIG. 4 illustrates a case where an 8-bit virtual vector is used in a 32-bit word array. The steps (510 to 530) of FIG. 5 may be performed by the allocation unit 111, the memory accumulation control unit 112, the table accumulation control unit 113, and the initialization unit 114, which are components of FIG. 1.

In step 510, all bits belonging to the memory space allocated to the data flow may be initialized to zero. Referring to FIG. 4, the memory block (i.e., the virtual vector, 401 to 408) corresponding to a part of the memory space may also be initialized to zero.

In step 520, counting is started, and each time at least one packet belonging to a data flow arrives, the memory accumulation control unit 112 sets any bit belonging to the memory blocks 401 to 408 to a predetermined specific value (e.g., 1).

For example, the values of bit 401, bit 402, bit 405, bit 406, and bit 408 may be changed from 0 to 1. A bit that is already 1 is not changed. That is, each time a packet arrives, the selected bit is changed from 0 to 1 unless its value is already 1.

In step 530, the memory accumulation control unit 112 can check whether the memory block is saturated based on the number of bits belonging to the memory block and the number of bits changed to the specific value. At this time, the memory accumulation controller 112 may check whether the memory block is in a saturation state whenever a packet arrives. For example, it can be determined whether 70% or more of the space of a memory block (i.e., a virtual vector) is filled with 1s. If 70% or more is filled with 1, it is judged to be saturated, and if 70% or more is not filled with 1, it can be judged not to be saturated.

As described above, when the usage amount of the memory block (i.e., the virtual vector) exceeds 70% and is confirmed to be saturated, the table accumulation control unit 113 can count the number of accumulated packets by using linear counting. The table accumulation control unit 113 may accumulate the number of packets accumulated in the memory block in a cumulative table (e.g., a hash table). After accumulating in the accumulation table, the table accumulation control unit 113 can initialize all the values of the memory block (i.e., the virtual vector) to 0 and allow the accumulation to be recycled.

Referring to FIG. 4, the usage of the 8-bit virtual vector corresponds to 62.5% (5/8). If a bit with a value of 0 is then selected for a newly arrived packet, the usage of the virtual vector becomes 75% (6/8). The table accumulation control unit 113 may then store the accumulated number of packets (est) in the accumulation table and generate an event for initializing the virtual vector. Since the event is generated by the single packet that raises the usage of the virtual vector from 62.5% to 75%, past the 70% threshold, est can be calculated on the basis of the ratio of 1s just before that packet (i.e., 5/8). The packet that saturated the virtual vector (i.e., the memory block) is then counted as one packet, so 1 is added to est. The resulting est may be stored as the accumulated number of packets at saturation, and the number of accumulated packets may be accumulated in the accumulation table (e.g., a hash table) together with the packet information.
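As a worked example of the numbers above, and assuming that est is obtained with the linear counting estimator of Non-Patent Document 1 (the description does not write the formula out), with m the virtual vector size and z the number of bits still at 0 just before saturation (m = 8, z = 3):

```latex
\[
\hat{n} = -m \ln\frac{z}{m}, \qquad
\mathrm{est} = -8 \ln\frac{3}{8} \approx 7.85, \qquad
\mathrm{est} + 1 \approx 8.85
\]
```

Under this assumption, each saturation of an 8-bit virtual vector without noise would add about 8.85 packets to the accumulation table.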

For example, when the memory block is confirmed to be saturated, the table accumulation control unit 113 can count, among the bits belonging to the memory block, the number of bits that have not been changed to the specific value (for example, 1). That is, the number of bits with a value of 0 in the virtual vector can be counted. The table accumulation control unit 113 may then count how many packets have been transmitted from the source IP address based on the counted number of bits. If bit 408 is changed to 1 and the packet corresponding to bit 408 is the packet that caused the saturation, the information of that packet (for example, the packet ID) may be stored in the accumulation table (i.e., a hash table).
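Steps 510 to 530 and the recycling of a saturated virtual vector can be put together in a minimal noise-free sketch. For simplicity it assumes that each flow has its own unshared 8-bit vector, that the bit for each packet is chosen uniformly at random, and that est is computed with the linear counting estimator; the class name `NoiseFreeCounter` and its methods are hypothetical.

```python
import math
import random
from collections import defaultdict

VECTOR_BITS = 8     # assumed virtual vector size
SATURATION = 0.7    # assumed usage threshold (70%)

class NoiseFreeCounter:
    def __init__(self, seed=0):
        # Step 510: every virtual vector starts with all bits at 0.
        self.vectors = defaultdict(lambda: [0] * VECTOR_BITS)
        self.table = defaultdict(float)   # accumulation table: flow ID -> packets
        self.rng = random.Random(seed)

    def add_packet(self, flow_id):
        bits = self.vectors[flow_id]
        ones_before = sum(bits)
        # Step 520: set one bit to 1 (a bit that is already 1 stays 1).
        bits[self.rng.randrange(VECTOR_BITS)] = 1
        # Step 530: check saturation on every packet arrival.
        if sum(bits) / VECTOR_BITS >= SATURATION:
            zeros_before = VECTOR_BITS - ones_before
            est = -VECTOR_BITS * math.log(zeros_before / VECTOR_BITS)
            self.table[flow_id] += est + 1          # +1 for the saturating packet
            self.vectors[flow_id] = [0] * VECTOR_BITS   # recycle the vector

    def estimate(self, flow_id):
        # Flushed count plus what is still pending in the virtual vector.
        zeros = self.vectors[flow_id].count(0)
        pending = -VECTOR_BITS * math.log(zeros / VECTOR_BITS)
        return self.table[flow_id] + pending

counter = NoiseFreeCounter(seed=1)
for _ in range(1000):
    counter.add_packet("10.0.0.1")
approx = counter.estimate("10.0.0.1")
```

The result is an approximation of the true packet count, not an exact value, and the accuracy of this simplified sketch should not be taken as representative of the described method.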

FIG. 6 shows the structure of the memory space in the presence of noise.

Noise may exist because the virtual vectors share memory space with each other. That is, among the bits belonging to the allocated memory block (virtual vector), the bits whose value is 1 may not all have been changed by the corresponding data flow. Some may have been changed to 1 by another flow sharing the bit, and these correspond to noise. Accordingly, when the memory block (i.e., the virtual vector) is confirmed to be in a saturated state, the table accumulation control unit 113 can accumulate the remaining portion excluding the noise at the time of counting, and the initialization unit 114 can proceed with initializing the memory block (i.e., the virtual vector) excluding the noise.

For example, in a saturation state, the table accumulation control unit 113 may accumulate in the accumulation table the number of packets accumulated in the remaining bits of the memory block (i.e., the virtual vector), excluding the bits corresponding to noise. To do so, the table accumulation control unit 113 may calculate, as the average noise ratio of the memory block, the ratio of bits changed to the predetermined specific value (for example, 1) in the remaining space of the allocated memory space to which the memory block belongs, excluding the memory block itself. The table accumulation control unit 113 may count the number of noise bits included in the memory block based on the calculated average noise ratio. Then, the table accumulation control unit 113 can calculate the number of packets actually accumulated in the memory block by excluding the counted noise from the number of packets accumulated in the memory block. The table accumulation control unit 113 may accumulate the number of actually accumulated packets and the ID of the corresponding data flow in the accumulation table (i.e., a hash table). At this time, bits of the virtual vector having a value of 1 may be randomly selected one at a time and initialized to 0 until the ratio of 1s becomes equal to the average noise ratio. That is, among the bits 601 to 605 having a value of 1 in the 8-bit virtual vector, one random bit at a time can be selected and initialized to zero.

For example, referring to FIG. 6, the ratio of 1s in the virtual vector may correspond to 5/8. The table accumulation control unit 113 may calculate the ratio of 1s in the remaining space of the word, excluding the saturated virtual vector (i.e., 9/24 = 3/8), as the average noise ratio. The initialization unit 114 may randomly select one of the bits 601 to 605 at a time and initialize it to 0 until the ratio of 1s (5/8) becomes equal to the average noise ratio (3/8).
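The noise handling described for FIG. 6 might be sketched as follows. The description does not give the formula that converts the average noise ratio into a packet count, so this sketch assumes that both est and est_noise come from the linear counting estimator, and that the noise ratio is below the ratio of 1s in the virtual vector, as in the FIG. 6 numbers. The function names and the list-of-bits word are illustrative assumptions.

```python
import math
import random

def linear_count(m, ones):
    """Linear counting estimate for m bits of which `ones` are set (ones < m)."""
    return -m * math.log((m - ones) / m)

def flush_with_noise(word, positions, rng):
    """Estimate the packets of the flow itself and partially reset its vector.

    word      -- list of 0/1 bits of one CPU word (modified in place)
    positions -- bit indexes of the virtual vector inside that word
    """
    m = len(positions)
    inside = set(positions)
    rest = [bit for i, bit in enumerate(word) if i not in inside]
    noise_ratio = sum(rest) / len(rest)            # e.g. 9/24 = 3/8
    set_bits = [i for i in positions if word[i] == 1]

    est = linear_count(m, len(set_bits))           # based on e.g. 5/8
    est_noise = -m * math.log(1 - noise_ratio)     # assumed noise estimate

    # Reset randomly chosen 1 bits until the ratio of 1s equals the noise ratio.
    while set_bits and len(set_bits) / m > noise_ratio:
        victim = rng.choice(set_bits)
        word[victim] = 0
        set_bits.remove(victim)
    return est - est_noise, noise_ratio

# FIG. 6 numbers: 32-bit word, 8-bit vector with 5 ones, 9 ones elsewhere.
positions = [1, 4, 9, 12, 17, 22, 26, 30]
word = [0] * 32
for i in positions[:5] + [0, 2, 3, 5, 6, 7, 8, 10, 11]:
    word[i] = 1
actual, ratio = flush_with_noise(word, positions, random.Random(0))
```

With these numbers two of the five 1 bits are reset, leaving three, so the virtual vector keeps the level of 1s that the surrounding noise would be expected to produce.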

FIG. 6 may show the structure of the memory space in a state before a specific virtual vector becomes saturated (for example, use of 70% or more, that is, 6 bits are changed to 1). In this case, the virtual vector shares bits with other data flows, so that the bits of the virtual vector can be changed to 1 in association with other data flows. That is, six or more bits may have a value of one.

Because the usage of the virtual vector is checked each time a packet of the data flow using that virtual vector arrives, the data flow by itself cannot cause more than 6 bits to have a value of 1; at each check, the usage is found to be 70% or less and the vector is not saturated. However, between one check and the arrival of the next packet of the data flow, another data flow may change further bits to 1. Accordingly, when a packet of the data flow arrives and finds its virtual vector filled with 7 or 8 bits, that is, in the saturated state, the accumulated number of packets (est) can be calculated using 6 bits, which is the limit of the number of bits that the data flow can change by itself. As a result, est can be a constant that depends on the size of the virtual vector.

In addition, if the average noise ratio is equal to or greater than the ratio of 1s in the virtual vector, it can be determined that there is no noise associated with the virtual vector. That is, if the average noise ratio within the confined word is equal to or greater than the ratio of 1s counted in the virtual vector, the bits set to 1 are not attributed to other data flows. In other words, it can be determined that each bit changed to 1 was changed by packets of the data flow itself, and therefore that there is no noise. In the absence of noise, the table accumulation control unit 113 does not separately calculate noise even in the saturation state, and as a result noise may not be excluded when the final number of accumulated packets is calculated.

In FIG. 7, the case of using two layers is described as an example, but this corresponds to the embodiment, and three or more layers may be used.

Referring to FIG. 7, a layer 1 710 uses a CPU word array, and a layer 2 720 can represent a set of data structures used in a layer 1. At this time, the layer 1 710 and the layer 2 720 use the same hash and may have the same size. Accordingly, if the hash calculation is performed only once, both the layer 1 710 and the layer 2 720 can be updated.

For example, the number of arrays in layer 2 720 may be equal to the number of 1s just before the virtual vector (i.e., the memory block) is saturated. The number of 1s immediately before saturation corresponds to the number of cases that can arise when bits are initialized according to the noise ratio. For example, when 5 of the 8 bits have a value of 1, there may be five cases, in which 1 to 5 bits are initialized according to the noise (i.e., the ratio of 1s in the remaining space of the word, excluding the virtual vector). That is, there may be five cases immediately before saturation, and the average noise count (est_noise) may take five values. Since the accumulated number of packets (est) is constant for a given virtual vector size, there are five possible results est - est_noise (i.e., the accumulated number excluding the noise). For example, when an 8-bit virtual vector is used, five arrays exist in layer 2 720, and the five results serve as the units of the respective arrays. That is, counting branches into five branches at layer 1, and when layer 1 becomes full, the accumulated packets can be passed up to layer 2 and accumulated there.
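The five units of the layer 2 arrays for an 8-bit virtual vector can be illustrated numerically. This sketch assumes that est and est_noise are both linear counting estimates and that the five cases correspond to 0 to 4 noise bits remaining out of the 5 bits set just before saturation; neither assumption is stated explicitly in the description.

```python
import math

VECTOR_BITS = 8
ONES_BEFORE_SATURATION = 5   # 5 of 8 bits set just before saturation

def linear_count(m, ones):
    """Linear counting estimate for m bits of which `ones` are set."""
    return -m * math.log((m - ones) / m)

# est is constant for a given virtual vector size.
est = linear_count(VECTOR_BITS, ONES_BEFORE_SATURATION)

# One unit (est - est_noise) per possible number of noise bits.
units = [
    est - linear_count(VECTOR_BITS, noise_ones)
    for noise_ones in range(ONES_BEFORE_SATURATION)
]
```

Under these assumptions the list holds five positive, strictly decreasing values, one per layer 2 array, with the largest unit corresponding to the noise-free case.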

In summary, when multiple layers (for example, two layers) are used and a specific virtual vector saturates in layer 1 710, the saturated count est - est_noise can be calculated according to the noise of the word to which the virtual vector belongs. The virtual vector can then be initialized. At this time, the saturated count est - est_noise (i.e., the number of packets accumulated when the layer-1 virtual vector saturated) is not stored in the accumulation table immediately; instead, a word array of layer 2 720 is selected and used. When the layer-2 array whose unit is est - est_noise is selected, the est - est_noise packets that saturated layer 1 710 are treated as a single item of the data flow (i.e., one object), and approximate counting and initialization are performed on layer 2 in the same manner as on layer 1. When layer 2 becomes saturated, the number of accumulated packets, the packet ID, and the like are cumulatively stored in the accumulation table. For example, the number est_L2 calculated at layer 2 720 may be multiplied by the unit and stored in the accumulation table together with information (e.g., the packet ID) on the packet that caused the saturation.
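The push from layer 1 to layer 2 and on to the accumulation table can be illustrated with a deliberately simplified sketch. Several things here are assumptions rather than details from the embodiment: a single data flow with one layer-2 vector per unit, saturation at six 1s in an 8-bit vector, a linear-counting estimator, and a caller-supplied `bit_index` standing in for the hash that selects which bit to set. The class and method names are hypothetical.

```python
import math


def linear_estimate(ones: int, size: int) -> float:
    """Assumed estimator (linear counting): -size * ln(fraction of 0 bits)."""
    return -size * math.log((size - ones) / size)


class TwoLayerSketch:
    def __init__(self, size: int = 8, saturation_ones: int = 6):
        assert 1 < saturation_ones < size
        self.size = size
        self.saturation_ones = saturation_ones
        est = linear_estimate(saturation_ones, size)
        # One layer-2 array per noise case; its unit is est - est_noise.
        self.units = {k: est - linear_estimate(k, size)
                      for k in range(1, saturation_ones)}
        self.layer2 = {k: 0 for k in self.units}   # one bit vector per unit
        self.table = {}                            # accumulation table

    def layer1_saturated(self, flow_id, noise_bits: int, bit_index: int):
        """Record one layer-1 saturation as a single object in layer 2."""
        vec = self.layer2[noise_bits] | (1 << (bit_index % self.size))
        ones = bin(vec).count("1")
        if ones < self.saturation_ones:
            self.layer2[noise_bits] = vec
            return None
        # Layer 2 saturated: est_L2 multiplied by the unit goes to the table.
        pushed = linear_estimate(ones, self.size) * self.units[noise_bits]
        self.table[flow_id] = self.table.get(flow_id, 0.0) + pushed
        self.layer2[noise_bits] = 0                # initialize and reuse
        return pushed
```

With the default parameters, `units` holds five entries, matching the five arrays of the 8-bit example, and nothing reaches the table until the selected layer-2 vector itself saturates.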

In addition to the number and ID of the accumulated packets, the packet size can be stored in the accumulation table. Because packet sizes differ from one packet to another, the number of bytes transmitted from the source can be counted (i.e., byte counting) in addition to the number of packets. Byte counting may be performed by multiplying the size of the packet that is pushed from layer 1 to layer 2, or from layer 2 to the accumulation table, due to saturation (i.e., the packet that saturated the virtual vector of layer 1 or layer 2) by the number of accumulated packets belonging to the data flow (e.g., 34). The cumulative size obtained by multiplying the size of the packet that caused the saturation by the number of accumulated packets is then stored in the accumulation table in association with the number of accumulated packets and the ID of that packet (or the ID of the data flow).

As described with reference to FIG. 7, not only the number of accumulated packets (i.e., the cumulative frequency) but also the cumulative size can be approximated (i.e., byte counting). In a stream-like data flow, packets regarded as the same object can have different sizes. In other words, packets may share the same packet ID while the size of the information included in their payloads is not constant. For example, if there are two packets with the same source IP and destination IP, the size of the information contained in the payload field of each packet may differ. Accordingly, although the two packets can be regarded as the same object, their sizes may be different from each other.

Referring to FIG. 8, when the memory block (i.e., the virtual vector) of layer 2 810 is saturated, the number of accumulated packets est and the cumulative size calculated through byte counting are accumulated in the accumulation table 820. The cumulative size can be calculated using a sampling technique at the moment the virtual vector of layer 2 810 saturates. For example, when the virtual vector assigned to packet x is saturated, that is, when packet x is the packet that caused the saturation, the number est accumulated in the virtual vector of layer 2 810 is calculated and an event for storing it in the accumulation table 820 is generated. The table accumulation control unit 113 then treats the size of packet x, which caused the saturation, as the size of each of the est packets, and calculates the cumulative size (est * size) as the product of the size of packet x (size) and the number of accumulated packets (est). The table accumulation control unit 113 associates the cumulative size (est * size), the number of accumulated packets (est), and the packet ID (obj₁, that is, the ID of the data flow) with one another and stores them in the accumulation table 820.
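The sampling-based byte counting described above amounts to a single multiplication per saturation event. A minimal sketch follows; the `Entry` record, the dictionary standing in for the accumulation table 820, and the function name are hypothetical choices made for illustration only.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    packets: float = 0.0     # cumulative number of packets (est)
    size_bytes: float = 0.0  # cumulative size (est * size)


def accumulate_on_saturation(table: dict, flow_id, est: float,
                             packet_size: int) -> Entry:
    """Store est and est * size for the flow whose packet caused saturation.

    The size of the saturating packet is taken as the size of each of the
    est accumulated packets, as described in the text.
    """
    entry = table.setdefault(flow_id, Entry())
    entry.packets += est
    entry.size_bytes += est * packet_size
    return entry


table = {}
accumulate_on_saturation(table, "obj1", est=34, packet_size=100)
accumulate_on_saturation(table, "obj1", est=34, packet_size=60)
```

After the two events above, the entry for `obj1` holds 68 packets and 5440 bytes, because each event contributes its own sampled packet size.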

As such, an accumulation table (e.g., a hash table) may be needed to count a stream-like data set by data flow. The accumulation table 820 can be implemented as a hash table in which a packet is located or inserted by linear probing or quadratic probing using the hash value of the packet ID. Since the efficiency of inserting or updating a packet (i.e., object) decreases as the usage of the hash table increases, it is difficult to accumulate a large-scale data set by data flow in real time in a limited memory space. In the counting system 100, not every packet of every data flow is accumulated in the inefficient hash table. Instead, packets are first accumulated in memory blocks (i.e., virtual vectors) and are accumulated in the hash table selectively, only when a virtual vector is saturated, so that the number of insertions and updates of the hash table 820 can be significantly reduced.
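A hash table with linear probing of the kind mentioned above can be sketched as follows. The fixed capacity, the slot layout, and the class and method names are assumptions for illustration; the sketch omits deletion and resizing, and it uses Python's built-in `hash()` in place of whatever hash of the packet ID an actual system would use.

```python
class AccumulationTable:
    """Open-addressing hash table (linear probing) keyed by flow ID."""

    def __init__(self, capacity: int = 8):
        self.slots = [None] * capacity   # each slot: [flow_id, packets, bytes]

    def _find(self, flow_id) -> int:
        """Return the slot holding flow_id, or the first empty slot, or -1."""
        n = len(self.slots)
        i = hash(flow_id) % n
        for _ in range(n):
            slot = self.slots[i]
            if slot is None or slot[0] == flow_id:
                return i
            i = (i + 1) % n              # linear probing: try the next slot
        return -1

    def accumulate(self, flow_id, packets: float, size_bytes: float) -> None:
        """Called only when a virtual vector saturates."""
        i = self._find(flow_id)
        if i < 0:
            raise RuntimeError("accumulation table is full")
        if self.slots[i] is None:
            self.slots[i] = [flow_id, 0.0, 0.0]
        self.slots[i][1] += packets
        self.slots[i][2] += size_bytes

    def get(self, flow_id):
        i = self._find(flow_id)
        if i < 0 or self.slots[i] is None:
            return None
        return (self.slots[i][1], self.slots[i][2])
```

Because `accumulate` runs only on saturation events rather than on every packet, the probing cost is paid far less often, which is the point made in the paragraph above.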

In addition, for packets with a low frequency, the allocated virtual vector may never saturate, and such packets may therefore not be stored in the accumulation table 820.

In addition, a multi-layer data structure is supported as well as a single-layer structure. A single-layer structure can guarantee high accuracy by using a small virtual vector. In that case, since the number of packets that can be accumulated in the virtual vector is small, the virtual vector may saturate frequently, and accumulation in the hash table 820 may also occur frequently. The allocation unit 111 can then make the virtual vector larger than before, and the larger virtual vector has a greater cumulative capacity, so that the number of times packets are accumulated in the hash table 820 can be reduced.

In addition, a memory space with a multi-layer structure can be used. With a multi-layer structure, the rate at which the cumulative capacity grows as the virtual vector size increases can be amplified compared with a single layer. For example, the cumulative capacity of two layers of 8-bit virtual vectors (16 bits in total) can be greater than the cumulative capacity of a single 16-bit virtual vector. That is, when memory blocks (i.e., virtual vectors) of the same total size are allocated to packets belonging to one data flow, a multi-layer structure can accumulate more packets than a single layer, so that the number of accumulations in the hash table can be relatively reduced. Also, since the frequency (i.e., the number of accumulated packets) and the cumulative size are measured simultaneously and stored together in the hash table, the number and size of the packets transmitted from a source IP can be provided together. That is, network traffic can be measured more accurately.
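The capacity comparison can be made concrete with rough arithmetic under stated assumptions. The sketch below assumes a linear-counting estimator, saturation when 75% of the bits are set, and no noise; none of these values is taken from the embodiment, and the function names are hypothetical. Under these assumptions a layer-2 vector counts layer-1 saturations, so the two-layer capacity is the product of the per-layer capacities.

```python
import math


def linear_estimate(ones: int, size: int) -> float:
    """Assumed estimator (linear counting): -size * ln(fraction of 0 bits)."""
    return -size * math.log((size - ones) / size)


def capacity_single(size: int, fill: float = 0.75) -> float:
    """Packets represented by one vector at the assumed saturation point."""
    return linear_estimate(int(size * fill), size)


def capacity_two_layer(size: int, fill: float = 0.75) -> float:
    """Each layer-2 object stands for one full layer-1 vector."""
    per_layer = capacity_single(size, fill)
    return per_layer * per_layer


single_16 = capacity_single(16)       # one 16-bit virtual vector
double_8 = capacity_two_layer(8)      # two layers of 8-bit virtual vectors
```

Here `single_16` equals 16 ln 4 (about 22 packets) while `double_8` equals (8 ln 4)² (about 123 packets), which illustrates why the same 16 bits can hold more before anything is written to the hash table.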

The methods according to embodiments of the present invention may be implemented in the form of a program instruction that can be executed through various computer systems and recorded in a computer-readable medium.

The apparatus described above may be implemented as a hardware component, a software component, and/or a combination of hardware components and software components. For example, the apparatus and components described in the embodiments may be implemented using one or more general-purpose or special-purpose computers, such as a processor, a controller, an arithmetic logic unit (ALU), a digital signal processor, a microcomputer, a field programmable gate array (FPGA), a programmable logic unit (PLU), a microprocessor, or any other device capable of executing and responding to instructions. The processing device may execute an operating system (OS) and one or more software applications running on the operating system. The processing device may also access, store, manipulate, process, and generate data in response to execution of the software. For ease of understanding, the processing device may be described as being used singly, but those skilled in the art will recognize that the processing device may include a plurality of processing elements and/or a plurality of types of processing elements. For example, the processing device may comprise a plurality of processors, or one processor and one controller. Other processing configurations, such as a parallel processor, are also possible.

The software may include a computer program, code, instructions, or a combination of one or more of these, and may configure the processing device to operate as desired or may command the processing device independently or collectively. The software may be distributed over networked computer systems and stored or executed in a distributed manner. The software and data may be stored on one or more computer-readable recording media.

The method according to an embodiment may be implemented in the form of program instructions that can be executed through various computer means and recorded in a computer-readable medium. The computer-readable medium may include program instructions, data files, data structures, and the like, alone or in combination. The program instructions recorded on the medium may be those specially designed and configured for the embodiments or may be those known and available to those skilled in the art of computer software. Examples of computer-readable media include magnetic media such as hard disks, floppy disks, and magnetic tape; optical media such as CD-ROMs and DVDs; magneto-optical media; and hardware devices specifically configured to store and execute program instructions, such as ROM, RAM, flash memory, and the like. Examples of program instructions include machine language code such as that produced by a compiler, as well as high-level language code that can be executed by a computer using an interpreter or the like.

While the present invention has been particularly shown and described with reference to exemplary embodiments thereof, it is to be understood that the invention is not limited to the disclosed exemplary embodiments. For example, appropriate results may be achieved even if the techniques described are performed in a different order than the described methods, and/or if components of the described systems, structures, devices, circuits, and the like are combined in a form different from that described, or are replaced or substituted by other components or equivalents.

Therefore, other implementations, other embodiments, and equivalents to the claims are also within the scope of the following claims.

Claims

A computer-implemented counting method for counting network traffic by data flow, the method comprising:

allocating a memory space of a predetermined size to each different data flow;

accumulating the number of packets belonging to the data flow using a memory block corresponding to a partial space, shared with other data flows, of the allocated memory space; and

accumulating the accumulated number of packets in an accumulation table if the memory block is in a saturation state,

wherein the memory block is confined within one word of the allocated memory space, the word indicating a memory space that the CPU can read at one time.

The method according to claim 1,

wherein the allocated memory space has a multi-layer structure.

The method according to claim 1,

wherein accumulating the number of packets belonging to the data flow using the memory block comprises:

changing one bit belonging to the memory block to a predetermined specific value each time at least one packet belonging to the data flow arrives; and

determining whether the memory block is in a saturation state based on the number of bits belonging to the memory block and the number of bits changed to the specific value.

The method of claim 3,

wherein accumulating the number of packets in the accumulation table comprises:

counting the number of packets accumulated in the memory block when the memory block is confirmed to be in a saturation state;

accumulating the number of packets accumulated in the memory block in the accumulation table; and

initializing and reusing the memory block.

5. The method of claim 4,

wherein counting the number of packets accumulated in the memory block comprises:

counting, among the plurality of bits belonging to the memory block, the number of bits not changed to the specific value; and

counting how many packets have been transmitted from the source IP address based on the counted number of bits.

The method according to claim 1,

wherein accumulating the number of packets in the accumulation table comprises:

accumulating in the accumulation table, when the memory block is in a saturation state, the number of packets accumulated in the bits belonging to the memory block other than the bits corresponding to noise,

wherein the noise indicates a bit that has been changed to a predetermined specific value because the bit is shared with a packet belonging to another data flow.

The method according to claim 1,

wherein accumulating the number of packets in the accumulation table comprises:

calculating, as an average noise ratio of the memory block, the ratio of bits changed to a predetermined specific value in the remaining space, excluding the memory block, of the allocated memory space to which the memory block belongs;

counting the number of noise bits included in the memory block based on the calculated average noise ratio;

calculating the number of packets actually accumulated in the memory block by excluding the counted noise from the number of packets accumulated in the memory block; and

accumulating the calculated number of actually accumulated packets and the ID of the corresponding data flow in the accumulation table.

8. The method of claim 7,

wherein accumulating the number of packets in the accumulation table comprises:

initializing selected bits among the bits belonging to the memory block that were changed to the specific value, until the ratio of bits changed to the specific value among the bits belonging to the memory block becomes the average noise ratio.

The method according to claim 1,

wherein accumulating the number of packets in the accumulation table comprises:

calculating, when the memory block is in a saturation state, a size of the data flow based on the size of the packet that caused the saturation and the number of packets accumulated in the memory block; and

associating the calculated size with the ID of the data flow and the number of packets accumulated for the data flow, and accumulating them in the accumulation table.

A counting system for counting network traffic by data flow, the counting system comprising:

at least one storage unit; and

at least one processor,

wherein the at least one processor performs:

allocating a memory space of a predetermined size to each different data flow;

accumulating the number of packets belonging to the data flow using a memory block corresponding to a partial space, shared with other data flows, of the allocated memory space; and

accumulating the accumulated number of packets in an accumulation table if the memory block is in a saturation state,

wherein the memory block is confined within one word of the allocated memory space, the word indicating a memory space that the CPU can read at one time.