CN102946320B - Distributed supervision method and system for user behavior log forecasting network - Google Patents
- Publication number
- CN102946320B (application CN201210382322.8A)
- Authority
- CN
- China
- Prior art keywords
- network
- lca
- popularity
- user
- access
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Landscapes
- Data Exchanges In Wide-Area Networks (AREA)
- Computer And Data Communications (AREA)
Abstract
The invention discloses a distributed user behavior log prediction network supervision method. A data packet collection and policy prefetching server (PCPP) captures a network access request data packet initiated by a network user, extracts an access log, and uploads it to a log collection and analysis server (LCA). The LCA stores the access log and calculates the network service popularity according to the access log. The LCA then obtains, according to the access log, the k network service identifications corresponding to the network service popularity and returns them to the PCPP. The PCPP prefetches policies from a preset policy library according to the k network service identifications and the user attribute information in the access log, and supervises and handles the network user's access request according to the prefetched policy. The method allows network supervision and handling to be carried out quickly, efficiently and accurately when a massive number of network users request access to network services.
Description
Technical Field
The invention relates to the technical field of network supervision, in particular to a distributed user behavior log prediction network supervision method and system.
Background
With the rapid development of the internet, network services have become rich and diverse. Because the internet has followed an open and free principle from the start, more and more new network services are constantly being developed and connected to the internet for users around the world to access, and information exchange between people has become more convenient. While network services with rich content bring great convenience to people's lives, they are also a hotbed for the spread of harmful information. Therefore, network supervision is an important research topic.
Network supervision aims at collecting, analyzing and processing network information and user operation behaviors, and at identifying and extracting the specific activity characteristics implied in them. Its core function is discovery and early warning. Countries around the world pay close attention to research on network supervision and have established monitoring infrastructures for government, finance and other key industries. For example, the U.S. Federal Bureau of Investigation (FBI) proposed the "Carnivore" program as early as 2001, the French department of defense established the "french holon" system in 2004, the European ERCIM organization proposed the network monitoring conference program in 2007, and the British Government Communications Headquarters initiated the MTI program in 2009. Network equipment vendors, enterprises and researchers are also actively conducting related research, developing various network behavior analysis products and proposing a number of network monitoring schemes. In terms of network management, some documents propose autonomous network management systems for heterogeneous networks and policy-based network management methods, and the concepts of "self-management" and "trust management" are also widely applied in network management methods.
Logging plays an important role in supervision systems, and network logs are therefore attracting increasing research interest. Because a supervision system monitors a massive number of network users and a massive amount of information, its operation requires a large amount of data and at the same time generates a large number of log records. Current network supervision architectures generally employ a single log server structure. Related research on predicting streaming media access behavior based on user operation logs proposes a centralized log collection server, which predicts streaming media access by collecting and analyzing a large number of user operation records.
To meet the requirements of domestic network information security supervision, corresponding network supervision systems and key technologies are continuously being proposed. However, in the process of implementing the present invention, the inventor found that the prior art has at least the following problems:
The granularity of existing supervision is too coarse: there is no access control of a specific user over specific network service content. Current techniques for handling illegal services or users are generally: a) domain name hijacking; b) IP address blocking; c) specific port blocking; d) SSL connection blocking; e) keyword filtering and blocking, etc.
Existing access log servers typically employ a single-node structure. A single-node log server becomes a bottleneck when processing a large amount of data because: a) when network services are requested on a large scale, the delay for extracting, analyzing and processing data increases; b) it is a single point of failure; c) its scalability and robustness are poor; d) it easily becomes a key target of attack, etc.
A single terminal has limited capability: when network services are requested on a large scale, the system processing delay increases and the real-time requirement of network supervision cannot be met. Storing large volumes of network access data inevitably makes the system storage space a performance bottleneck. Such a single Client/Server (C/S) architecture therefore cannot meet the requirements of efficient, fast and accurate network supervision in either scale or function.
Disclosure of Invention
In order to solve the problems in the prior art, embodiments of the present invention provide a distributed user behavior log prediction network monitoring method and system. The technical scheme is as follows:
a distributed user behavior log prediction network policing method, the method comprising:
a data packet collection and policy prefetching server PCPP captures a network access request data packet initiated by a network user, extracts an access log and uploads the access log to a log collection and analysis server LCA;
the LCA stores the access log and calculates the popularity of the network service according to the access log;
the LCA acquires k network service identifications corresponding to the popularity of the network service according to the access log, and returns them to the PCPP;
and the PCPP prefetches policies from a preset policy library according to the k network service identifications and the user attribute information in the access log, and supervises and handles the network user access request according to the prefetched policy.
The PCPP captures a network access request data packet initiated by a network user from network data forwarding equipment in a bypass monitoring mode.
The access log is stored in a four-tuple form and comprises an identifier CNID of a network where the network user is located, an IP address DIP of a network user access target, a port address DPort of the network user access target and a network service identifier URL.
The LCA storing the access log includes:
the LCA extracts < CNID, DIP, DPort > and carries out hash calculation to obtain a key value;
according to the key value, the LCA obtains the successor node LCA on which the access log is to be stored, and distributes the user access log to the successor node;
and after receiving the distributed access log, the successor node LCA stores the access log into a network service access log library.
Calculating the popularity of the network service according to the access log comprises the following steps:
the LCA stores the access log in a network log storage table and establishes a network service popularity storage table;
the network log storage table and the network service popularity storage table are compared record by record in a loop, and if the CNID, DIP, DPort and URL of the ith record in the network log storage table are the same as those of the jth record in the network service popularity storage table, the LCA increases the corresponding network service popularity in the network service popularity storage table by 1 and deletes the record item from the network log storage table;
if the CNID, DIP, DPort or URL of the ith record in the network log storage table is different from the CNID, DIP, DPort or URL of the jth record in the network service popularity storage table, the LCA adds a corresponding new record item to the network service popularity storage table, sets the corresponding network service popularity to 1, and deletes the record item from the network log storage table.
The access logs in the network service popularity storage table that have not been updated within a set time length are deleted.
The method further comprises the following steps:
and when the difference between the current system time of the LCA and the last time of the LCA for performing network service access popularity calculation is equal to tau, the LCA starts a new round of network service popularity calculation.
The LCA acquiring k network service identifications corresponding to the network service popularity according to the access log comprises the following steps:
the LCA obtains the number of all relevant URLs in the network service popularity storage table by using CNID, DIP and DPort;
according to a preset k value, the URLs ranked in the top k by network service popularity are acquired;
if more than k network services have the same network service popularity, the k URLs ranked foremost by access time among them are extracted.
The method further comprises the following steps:
user attribute information in the access log is extracted from the network log by the PCPP; the preset strategy library is set according to the strategy of network supervision;
the supervision and treatment of the network user access request according to the pre-fetched strategy comprises the following steps:
acquiring supervision strategies between the network users and the k network services from the strategy library;
when the network user next initiates an access request, judging whether the request targets one of the k network services; if so, directly carrying out network supervision and handling on the network user according to the prefetched policy; otherwise, extracting the data packet of the access request and generating an access log for the network user.
A distributed user behavior log prediction network supervision system, the system comprising PCPP and LCA, wherein,
the PCPP is used for capturing a network access request data packet initiated by a network user, extracting an access log and uploading the access log to the LCA; acquiring popularity of network service, acquiring a network supervision policy according to a preset policy library, and supervising and handling a network access request of the network user;
the LCA is used for distributing and storing the access log, calculating the popularity of the network service according to the access log and issuing the popularity to the PCPP.
The system comprises a plurality of LCAs, wherein the LCAs form a DHT network;
the LCA performs hash calculation on the received access log through a distributed hash algorithm to obtain a key value, obtains, according to the key value, the successor node LCA on which the access log is to be stored, and distributes the user access log to the successor node.
The technical scheme provided by the embodiment of the invention has the following beneficial effects:
the operation log records of network users are collected, stored and analyzed in a distributed manner, the popularity of each network service is calculated according to the access logs, and the network service that a user's next network access is likely to target is predicted from that popularity, so that the supervision policy required for the next network access can be prefetched. By combining the attributes of the network service with the user attributes and invoking the corresponding prefetched policy, rapid, efficient and accurate network supervision and handling is achieved for the network service access requests of a massive number of network users.
Drawings
In order to illustrate the technical solutions in the embodiments of the present invention more clearly, the drawings needed in the description of the embodiments are briefly introduced below. The drawings described below show only some embodiments of the present invention, and those skilled in the art can obtain other drawings based on them without creative effort.
Fig. 1 is a flowchart of a distributed user behavior log prediction network supervision method according to an embodiment of the present invention;
fig. 2 is a schematic diagram of a PCPP querying the access network ID used by a user according to an embodiment of the present invention;
fig. 3 is a schematic structural diagram of a distributed user behavior log prediction network monitoring system according to a second embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, embodiments of the present invention will be described in detail with reference to the accompanying drawings.
The operation of a network information security supervision system requires a large amount of data and also generates a large number of data records. Because a single terminal has limited capability and network service supervision has strong real-time requirements, cooperation among multiple nodes is needed to improve the performance and stability of the system. Therefore, a network supervision model (DUHP) based on a Distributed Hash Table (DHT) is provided. By exploiting the distributed and self-organizing characteristics of the DHT, DUHP achieves user behavior log sharing, scalability, low cost and load balancing in a distributed environment. In DUHP, the logs of user access to network services are distributed and stored in a distributed manner, and the network service access popularity (hot) is calculated. The future access behavior of the user is predicted based on the natural attributes of the user (such as age, hobbies and region) and the log analysis result. According to the predicted behavior, the dynamic attributes of the user (such as a black-and-white list) are matched against the attributes of the network service (such as a domain name and a black-and-white list), so that policy prefetching is realized and efficient supervision is achieved.
Aiming at the problems of the existing network supervision schemes and technologies, the embodiment of the invention provides a novel network supervision system based on distributed behavior log prediction. Its characteristics are: 1) The future access behavior of the user is predicted based on the user operation log. This addresses the need for low delay and efficient supervision while the user requests access to a network service. 2) Access rights are matched based on user attributes and network service attributes. This realizes access control of a specific user over specific service content, overcomes the coarse granularity of existing supervision based on IP addresses (ports) and the like, and achieves finer-grained supervision based on user-service matching. 3) A Peer-to-Peer (P2P) network is used to build a distributed "user-service" operation log distribution, computation and storage structure. This removes the single-node bottleneck and provides a solution for large-scale log record computation and storage. The embodiment of the invention aims to support diverse network services and efficient supervision in a large-scale environment.
Example one
As shown in fig. 1, a flowchart of a distributed user behavior log prediction network supervision method provided in an embodiment of the present invention is specifically as follows:
Step 10, the PCPP captures a network access request data packet initiated by a network user, extracts an access log and uploads the access log to the LCA.
When a network user initiates a network access request, a data packet collection and policy prefetching server (PCPP) captures the data packet and extracts an access log according to a given rule. Once the access log is extracted, the PCPP does two things: 1) it queries an IP address rule base (IPRD) with the IP address of the source end (user) in the access log and obtains the access network ID (CNID) used by the user; 2) it uploads the access log to the log collection and analysis server (LCA) to which it is connected. Further, the PCPP queries its connected LCA for the network service access popularity according to key information (such as the URL) in the access log.
The method comprises the following specific steps:
Suppose that a user SIP(i) initiates an access request to a website WebSite(i). PCPP(p) fetches the data packet from a network data forwarding device (such as a router or switch) in a certain way (such as bypass monitoring) and extracts the user IP address SourceIP(i).
PCPP(p) queries the IPRD with SourceIP(i) for the CNID used by the user. (The CNID may be any user-defined unambiguous character string; for convenience of description, the CNID is represented by a positive integer in this embodiment, e.g., "1" for an educational network, "2" for a telecommunications network, etc.)
The IPRD performs the query according to a certain rule (e.g. the longest match method) and returns the CNID to the PCPP(p) that sent the query request. Fig. 2 is a schematic diagram of the PCPP querying the access network ID used by the user. The IP address library comprises an IP address range, the corresponding access network type and the access network ID. PCPP(p) acquires the source IP address of the user and queries the IP address rule base IPRD, which matches according to the longest matching rule and returns the access network ID to PCPP(p).
PCPP(p) extracts the network service identification (e.g., the URL) using, for example, Deep Packet Inspection (DPI) techniques and creates a user access log as a quadruple < CNID, DIP, DPort, URL >.
PCPP(p) uploads the created user access log to its connected LCA. At this point, the access log has been uploaded to the LCA.
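The lookup and log-creation steps above can be sketched as follows. This is a minimal illustration, not the patented implementation: the rule-base contents, the CIDR representation of address ranges, and the names `IPRD_RULES`, `lookup_cnid` and `build_access_log` are all assumptions made for the example.

```python
import ipaddress
from typing import NamedTuple, Optional

# Hypothetical IPRD contents: (address range, access network type, CNID).
IPRD_RULES = [
    ("10.0.0.0/8", "education network", 1),
    ("10.20.0.0/16", "telecommunications network", 2),
]

class AccessLog(NamedTuple):
    """User access log quadruple <CNID, DIP, DPort, URL>."""
    cnid: int
    dip: str
    dport: int
    url: str

def lookup_cnid(source_ip: str, rules=IPRD_RULES) -> Optional[int]:
    """Return the CNID of the longest (most specific) matching range, or None."""
    addr = ipaddress.ip_address(source_ip)
    best_len, best_cnid = -1, None
    for cidr, _net_type, cnid in rules:
        net = ipaddress.ip_network(cidr)
        # Longest match: keep the matching range with the largest prefix length.
        if addr in net and net.prefixlen > best_len:
            best_len, best_cnid = net.prefixlen, cnid
    return best_cnid

def build_access_log(source_ip: str, dip: str, dport: int, url: str) -> Optional[AccessLog]:
    """Create the quadruple for one captured request; None if no CNID matches."""
    cnid = lookup_cnid(source_ip)
    if cnid is None:
        return None
    return AccessLog(cnid, dip, dport, url)
```

For instance, a request from `10.20.3.4` matches both ranges, and the more specific `/16` range determines the CNID.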
Step 20, the LCA stores the access log and calculates the popularity of the network service according to the access log.
When a user access log uploaded by the PCPP arrives, the LCA extracts < CNID, DIP, DPort > and performs a hash calculation to obtain a key value. The hash calculation here may be any hash calculation commonly used in the art. According to the key value, the LCA obtains the successor node successor(key) on which the user access log is to be stored and distributes the user access log to that successor node. After receiving the distributed access log, the successor node stores it in its network service access log library.
After receiving the user access logs distributed by other LCAs in the DHT network, LCA(k) stores them in its network service access log repository (war) according to the storage structure shown in Table 1.
Table 1 (user access log storage structure)
CNID | DIP | DPort | WebService
Wherein CNID represents the access network ID, DIP represents the destination IP address, and DPort represents the destination port. WebService denotes a network service identification such as a URL (uniform resource locator). In this embodiment, for convenience of illustration, only the URL is used in place of WebService.
At this point, the storage of the access log by the LCA is complete. The distributed user behavior prediction proposed in this embodiment is based on this distributed storage: the LCAs are organized into a distributed network, and the network formed by the multiple LCAs stores all access logs.
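The key calculation and successor lookup can be sketched as below, assuming a Chord-style identifier ring. The source only requires "a hash calculation commonly used in the art", so the use of SHA-1, the 16-bit identifier space and the names `ring_id`, `log_key` and `successor` are illustrative assumptions. A real Chord deployment would route through finger tables rather than consult a global node list.

```python
import hashlib
from bisect import bisect_left

M_BITS = 16  # size of the identifier space in bits (assumption)

def ring_id(text: str, m_bits: int = M_BITS) -> int:
    """Map a string onto the identifier ring [0, 2**m_bits)."""
    digest = hashlib.sha1(text.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (2 ** m_bits)

def log_key(cnid: int, dip: str, dport: int) -> int:
    """Key of an access log, derived from the triple <CNID, DIP, DPort>."""
    return ring_id(f"{cnid}|{dip}|{dport}")

def successor(key: int, node_ids) -> int:
    """First node whose identifier is >= key, wrapping around the ring."""
    ring = sorted(node_ids)
    idx = bisect_left(ring, key)
    return ring[idx % len(ring)]
```

Because the URL is not part of the key, all logs for the same < CNID, DIP, DPort > land on the same LCA, which is what lets that node later rank the URLs of one service endpoint locally.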
Further, the LCA needs to calculate the popularity of each network service according to the obtained access logs. The popularity of a network service is an index of how popular that specific service is: the more users access a network service, the higher its popularity index. The purpose of calculating the popularity is to estimate, through this index, which network services the user is likely to access next, and thereby to predict the target of the user's next network access.
To alleviate LCA pressure, web service popularity calculations are not frequent. For this embodiment, a period threshold τ is set, that is, when the difference between the current system time of the LCA and the last time of the network service access popularity calculation is equal to τ, the LCA will start a new round of network service popularity calculation. Table two gives the network service popularity storage structure.
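Table 2 (network service popularity storage structure)
CNID | DIP | DPort | URL | Popularity | LastAccessTime
Where CNID, DIP, DPort and URL have the same meanings as in Table 1. Popularity indicates the popularity of the network service. LastAccessTime indicates the last time the URL was accessed.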
To facilitate the explanation of the network service popularity calculation scheme, the present embodiment defines some vocabularies and corresponding annotations, as shown in table three.
Table 3
The network service popularity calculation process performed by each LCA in the DHT network is described below, taking LCA(k) as an example. The process is as follows:
When the local network service popularity calculation period τ arrives, LCA(k) compares, in a loop, the record items of the user access log storage table (hereinafter referred to as Table 1) with those of the network service popularity storage table (hereinafter referred to as Table 2).
If the CNID, DIP, DPort and URL of the ith record in Table 1 are the same as those of the jth record in Table 2, LCA(k) increases the popularity of that record in Table 2 by 1 and deletes the entry from Table 1.
If the CNID, DIP, DPort or URL of the ith record in Table 1 is different from that of the jth record in Table 2, LCA(k) adds a corresponding new record to Table 2, sets its popularity to "1", and deletes the entry from Table 1.
That is, the network log storage table and the network service popularity storage table are compared in a loop. If the CNID, DIP, DPort and URL of the ith record in the network log storage table are the same as those of the jth record in the network service popularity storage table, the LCA increases the corresponding network service popularity in the popularity storage table by 1 and deletes the record item from the network log storage table. If the CNID, DIP, DPort or URL differs, the LCA adds a corresponding new record item to the network service popularity storage table, sets its network service popularity to 1, and deletes the record item from the network log storage table.
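A minimal sketch of one popularity calculation round follows. It reads the "different from the jth record" branch as "no record in Table 2 matches", and it replaces the record-by-record comparison with a dictionary lookup keyed by the quadruple; both are interpretations made for this example. Stamping `last_access` with the time of the calculation round is also an assumption, since the source does not say whether Table 1 carries per-request timestamps.

```python
def update_popularity(log_table, popularity_table, now):
    """Fold every entry of Table 1 into Table 2, emptying Table 1.

    log_table: list of (cnid, dip, dport, url) quadruples (Table 1).
    popularity_table: dict mapping a quadruple to
        {"popularity": int, "last_access": number} (Table 2).
    now: timestamp recorded as LastAccessTime (assumption, see lead-in).
    """
    while log_table:
        record = log_table.pop()  # the entry is deleted from Table 1
        entry = popularity_table.get(record)
        if entry is not None:
            # Matching record exists in Table 2: add 1 to its popularity.
            entry["popularity"] += 1
            entry["last_access"] = now
        else:
            # No matching record: create one with popularity 1.
            popularity_table[record] = {"popularity": 1, "last_access": now}
    return popularity_table
```

The dictionary lookup keeps each round linear in the number of pending log entries, whereas a literal nested comparison of the two tables would be quadratic.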
Further, in order to reduce the load on the LCA and improve the prediction accuracy, user access logs that have not been updated for a long time are deleted periodically. The following formula gives the measure used to decide which user access logs are to be deleted:
TimeRange = CurrentTime - LastAccessTime
Wherein CurrentTime represents the current system time of the LCA and LastAccessTime represents the last time the network service was accessed. TimeRange is thus the interval between the current system time of the LCA and the time the network service was last accessed. When TimeRange is equal to or greater than the configured deletion threshold, the outdated user access log is deleted to improve the accuracy and efficiency of the prediction.
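The deletion rule can be sketched as below, assuming the popularity table is held as a dictionary whose values carry a `last_access` field. The table layout and the function name `purge_stale` are hypothetical and chosen only for this illustration.

```python
def purge_stale(popularity_table, current_time, threshold):
    """Delete entries whose TimeRange = CurrentTime - LastAccessTime
    is equal to or greater than the deletion threshold.

    Returns the number of deleted entries.
    """
    stale = [
        key
        for key, entry in popularity_table.items()
        if current_time - entry["last_access"] >= threshold
    ]
    # Collect first, then delete, so the dict is not mutated while iterating.
    for key in stale:
        del popularity_table[key]
    return len(stale)
```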
Step 30, the LCA acquires k network service identifications corresponding to the popularity of the network service according to the access log, and returns them to the PCPP.
The LCA obtains the number of all relevant URLs in the network service popularity storage table by using CNID, DIP and DPort; according to a preset k value, it acquires the URLs ranked in the top k by network service popularity; if more than k network services have the same popularity, the k URLs ranked foremost by access time among them are extracted.
When a user access record, e.g. record(i) from PCPP(p), arrives at LCA(k), LCA(k) is triggered to execute the following workflow in order to support network supervision policy prefetching accurately and efficiently:
LCA(k) obtains the number of all relevant URLs (denoted countexit) in Table 2, the network service popularity storage table, using the triplet < CNID, DIP, DPort >.
LCA(k) queries the number of URLs (denoted countspecific) set by the system administrator.
LCA(k) calculates the value of k and, according to this value, obtains the URLs ranked in the top k (including the k-th) by network service popularity. LCA(k) composes these URLs and the source end (user) IP in the corresponding user access log into a reply message and sends it to PCPP(p).
If more than k network services have the same popularity, LCA(k) extracts the URLs ranked in the top k (including the k-th) by access time among them. It then composes these URLs and the source end (user) IP in the corresponding user access log into a reply message and sends it to PCPP(p).
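The selection workflow above can be sketched as follows. Two points are assumptions, because the source does not spell them out: k is taken as the smaller of countexit and countspecific, and ties in popularity are broken in favor of the most recent LastAccessTime. The table layout and the name `top_k_urls` are likewise hypothetical.

```python
def top_k_urls(popularity_table, cnid, dip, dport, count_specific):
    """Return up to k URLs for <cnid, dip, dport>, most popular first.

    popularity_table: dict mapping (cnid, dip, dport, url) to
        {"popularity": int, "last_access": number}.
    count_specific: number of URLs configured by the administrator.
    """
    related = [
        (url, entry)
        for (c, d, p, url), entry in popularity_table.items()
        if (c, d, p) == (cnid, dip, dport)
    ]
    count_exist = len(related)
    k = min(count_exist, count_specific)  # assumption: k never exceeds either count
    # Rank by popularity, then by access time (most recent first, an assumption).
    related.sort(
        key=lambda item: (item[1]["popularity"], item[1]["last_access"]),
        reverse=True,
    )
    return [url for url, _entry in related[:k]]
```

If the intended tie-break is the earliest access time instead, only the second element of the sort key needs to be negated.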
Step 40, the PCPP prefetches policies from a preset policy library according to the k network service identifications and the user attribute information in the access log, and supervises and handles the network user access request according to the prefetched policy.
The supervision policies between the network user and the k network services are acquired from the policy library. When the network user next initiates an access request, it is judged whether the request targets one of the k network services; if so, network supervision and handling is carried out directly according to the prefetched policy; otherwise, the data packet of the access request is extracted and an access log is generated for the network user.
When the reply message reaches PCPP(p), PCPP(p) prefetches policies from the policy library according to the corresponding user attribute information and the k URLs in the reply. Thus, when the user initiates the next network service request, the PCPP(p) that intercepts the request data packet can apply supervision measures such as passing, filtering or blocking according to the prefetched policy. In other words, policy prefetching here means prefetching policies for the user's next access to a network service: the network services the user may access next are determined in advance, and the supervision policy between the user and those services is obtained in advance, so that supervision and control can be applied in a targeted manner when the user requests the service.
The user attribute information here includes the information filled in when the user registered. When the user access log is uploaded, the PCPP sends a user information query request (SourceIP(i)) to a user information base (UID) using the source IP address. The UID assigns a user label according to the attribute information filled in at registration (such as nationality, hobbies and age) and returns the corresponding user label to the PCPP that initiated the query. The PCPP caches the user label.
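A minimal sketch of policy prefetching and the subsequent request handling follows. The policy library layout (a mapping from a user label and URL to an action), the action names, the default action for pairs missing from the library, and the function names are all assumptions made for illustration; the source does not define a policy format.

```python
def prefetch_policies(policy_library, user_label, urls, default_action="pass"):
    """Fetch the policy for each predicted URL for one user label.

    policy_library: dict mapping (user_label, url) to an action string
        such as "pass", "filter" or "block" (hypothetical format).
    default_action: used when the library has no entry (assumption).
    """
    return {
        url: policy_library.get((user_label, url), default_action)
        for url in urls
    }

def handle_request(url, prefetched):
    """Apply a prefetched policy if the request hits a predicted service.

    Returns the prefetched action, or "log" to signal that the packet must
    go through the normal path: extract it and generate a new access log.
    """
    if url in prefetched:
        return prefetched[url]
    return "log"
```

A request for a predicted URL is answered from the cached dictionary without consulting the policy library again, which is where the latency saving described above would come from.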
Example two
As shown in fig. 3, an embodiment of the present invention provides a distributed user behavior log prediction network supervision system, which includes PCPP and LCA, wherein,
the PCPP is used for capturing a network access request data packet initiated by a network user, extracting an access log and uploading the access log to the LCA; acquiring popularity of network service, acquiring a network supervision policy according to a preset policy library, and supervising and disposing a network access request of a network user;
the LCA is used for distributing and storing the access log, calculating the popularity of the network service according to the access log and sending the popularity to the PCPP.
Specifically, the system comprises a plurality of LCAs, wherein the LCAs form a DHT network;
the LCA performs a hash calculation on the received access log using a distributed hash algorithm to obtain a key value; according to the key value, the LCA determines the successor node LCA on which the access log is to be stored and distributes the user access log to that successor node.
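A minimal Python sketch of this key-to-successor mapping is given below. It assumes a Chord-like identifier ring with SHA-1 identifiers and hashes the <CNID, DIP, DPort> triple named in the claims; the `LcaRing` class, the node names, and the `|` separator are hypothetical. Unlike a real Chord deployment, which routes through finger tables, the sketch assumes every node knows the full membership list.

```python
import bisect
import hashlib


def ring_id(text):
    """Map a string onto the identifier ring using SHA-1 (160-bit ids)."""
    return int.from_bytes(hashlib.sha1(text.encode("utf-8")).digest(), "big")


def log_key(cnid, dip, dport):
    """Key of an access log, derived from the <CNID, DIP, DPort> triple."""
    return ring_id(f"{cnid}|{dip}|{dport}")


class LcaRing:
    """Sorted view of the LCA nodes on the identifier ring."""

    def __init__(self, lca_names):
        self._nodes = sorted((ring_id(name), name) for name in lca_names)
        self._ids = [node_id for node_id, _ in self._nodes]

    def successor(self, key):
        """Return the first LCA whose id is >= key, wrapping around the ring."""
        index = bisect.bisect_left(self._ids, key) % len(self._nodes)
        return self._nodes[index][1]
```

Because the URL is not part of the key, all logs for the same network, target address, and port land on the same LCA, which lets that node count popularity per URL locally.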
Specifically, the structural functions of the components of the present embodiment are as follows:
the DHT network: the DHT network is composed of log collection and analysis servers (LCAs) arranged in a certain logical structure (e.g. Chord). Its functions are: 1) distributing and storing user behavior logs; 2) calculating the network service access popularity; 3) updating itself when an LCA leaves or joins the network.
Log collection and analysis server (LCA): the LCA functions are: 1) receiving the user access logs sent by a data packet acquisition and policy pre-fetching server (PCPP); 2) analyzing the user access logs and periodically calculating the network service access popularity; 3) sharing the network service access popularity with the DHT network; 4) querying and forwarding network service access popularity information.
Data packet acquisition and policy pre-fetching server (PCPP): the PCPP functions are: 1) fetching data packets from a network data forwarding device (e.g., a router or switch); 2) extracting data packet information according to given requirements and uploading it to the LCA connected to the PCPP; 3) sending a network service access popularity information request to the LCA connected to the PCPP, predicting the network services that the user is likely to access in the future according to the returned popularity information, and pre-fetching a handling policy from the policy library by combining the dynamic attributes of the user and the dynamic attributes of the network service.
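The LCA's periodic popularity calculation can be sketched in Python as follows. The sketch follows the table comparison recited in the claims (a matching four-tuple increments its popularity, a new four-tuple is added with popularity 1, and the processed log record is deleted), but replaces the record-by-record comparison loop with a dictionary lookup, which gives the same result. The `AccessLog` type and the `update_popularity` function are hypothetical names.

```python
from collections import namedtuple

# The four-tuple in which access logs are stored.
AccessLog = namedtuple("AccessLog", ["cnid", "dip", "dport", "url"])


def update_popularity(log_table, popularity_table):
    """Drain log_table into popularity_table, counting identical four-tuples.

    log_table is a list of AccessLog records; popularity_table maps an
    AccessLog four-tuple to its popularity count.
    """
    while log_table:
        record = log_table.pop()  # the record is deleted from the log table
        # Existing entry: popularity + 1. New entry: popularity set to 1.
        popularity_table[record] = popularity_table.get(record, 0) + 1
    return popularity_table
```

After one round the log table is empty, so the next round counts only the logs received since the previous calculation.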
It should be noted that the division into functional modules described for the distributed user behavior log prediction network supervision system of the above embodiment is only an example. In practical applications, the functions may be assigned to different functional modules as needed; that is, the internal structure of the system may be divided into different functional modules to perform all or part of the functions described above. In addition, the system embodiment and the method embodiment described above belong to the same concept; the specific implementation process is detailed in the method embodiment and is not repeated here.
The above-mentioned serial numbers of the embodiments of the present invention are merely for description and do not represent the merits of the embodiments.
In summary, in the embodiments of the present invention, the operation logs of network users are collected, stored, and analyzed in a distributed manner; the popularity of network services is calculated from the access logs; and the network services at which a user's next network access is likely to be directed are predicted from that popularity, so that the supervision policy required for the next network access is prefetched. By combining the attributes of the network service with the user attributes and invoking the corresponding preprocessing policy, fast, efficient, and accurate network supervision and handling of the network service access requests of massive numbers of network users is achieved.
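The prediction step summarized above amounts to selecting the k most popular URLs for a given <CNID, DIP, DPort>. A minimal Python sketch follows. The table layout (four-tuple mapped to a popularity count and a last-access timestamp) and the `top_k_urls` function are hypothetical. The tie-breaking rule is also an assumption: the claims break ties by access time, and this sketch reads that as preferring the most recently accessed URL.

```python
def top_k_urls(popularity_table, cnid, dip, dport, k):
    """Return the k most popular URLs recorded for <cnid, dip, dport>.

    popularity_table maps (cnid, dip, dport, url) to a pair
    (popularity, last_access_timestamp).
    """
    candidates = [
        (count, last_access, key[3])
        for key, (count, last_access) in popularity_table.items()
        if key[:3] == (cnid, dip, dport)
    ]
    # Highest popularity first; among equals, most recent access first
    # (assumed reading of the tie-breaking rule).
    candidates.sort(key=lambda c: (-c[0], -c[1]))
    return [url for _, _, url in candidates[:k]]
```

For example, with two URLs of equal popularity under the same triple, the one with the later timestamp is returned first, and entries recorded under a different CNID are ignored.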
It will be understood by those skilled in the art that all or part of the steps for implementing the above embodiments may be implemented by hardware, or may be implemented by a program instructing relevant hardware, where the program may be stored in a computer-readable storage medium, and the above-mentioned storage medium may be a read-only memory, a magnetic disk or an optical disk, etc.
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents, improvements and the like that fall within the spirit and principle of the present invention are intended to be included therein.
Claims (8)
1. A distributed user behavior log prediction network supervision method is characterized by comprising the following steps:
a data packet acquisition and policy pre-fetching server (PCPP) captures a network access request data packet initiated by a network user, extracts an access log, and uploads the access log to a log collection and analysis server (LCA);
the LCA stores the access log and calculates the popularity of the network service according to the access log; wherein:
the access log is stored in a four-tuple form and comprises an identifier CNID of a network where the network user is located, an IP address DIP of a network user access target, a port address DPort of the network user access target and a network service identifier URL;
the LCA extracts < CNID, DIP, DPort > and carries out hash calculation to obtain a key value;
according to the key value, the LCA determines the successor node LCA on which the access log is to be stored, and distributes the user access log to that successor node;
after receiving the distributed access log, the successor node LCA stores the access log in a network service access log library;
the LCA stores the access log in a network log storage table and establishes a network service popularity storage table;
the network log storage table is cyclically compared with the network service popularity storage table; if the CNID, DIP, DPort and URL of the i-th record in the network log storage table are the same as those of the j-th record in the network service popularity storage table, the LCA increases the corresponding network service popularity in the network service popularity storage table by 1 and deletes the record from the network log storage table;
if the CNID, DIP, DPort or URL of the i-th record in the network log storage table differs from the CNID, DIP, DPort or URL of the j-th record in the network service popularity storage table, the LCA adds a corresponding new record to the network service popularity storage table, sets the corresponding network service popularity to 1, and deletes the record from the network log storage table;
the LCA acquires, according to the access log, the k network service identifiers corresponding to the network service popularity and returns them to the PCPP;
and the PCPP prefetches policies from a preset policy library according to the k network service identifiers and the user attribute information in the access log, and supervises and handles the network user's access request according to the prefetched policy.
2. The method of claim 1, wherein the PCPP captures network user initiated network access request packets from a network data forwarding device using bypass listening.
3. The method of claim 1, wherein the method further comprises:
deleting the access logs in the network service popularity storage table that have not been updated within a set time length.
4. The method of claim 3, wherein the method further comprises:
when the difference between the current system time of the LCA and the time at which the LCA last performed the network service access popularity calculation equals τ, the LCA starts a new round of network service popularity calculation.
5. The method of claim 3, wherein the LCA obtains k network service identifications corresponding to the popularity of the network service according to the access log, comprising:
the LCA uses CNID, DIP and DPort to obtain all related URL entries in the network service popularity storage table;
according to a preset value of k, the URLs whose network service popularity ranks in the top k are acquired;
if more than k network services have the same network service popularity, the k URLs ranked foremost by access time are extracted.
6. The method of claim 1, wherein the method further comprises:
the user attribute information in the access log is extracted from the network log by the PCPP; the preset policy library is set according to the network supervision policies;
supervising and handling the network user's access request according to the prefetched policy comprises:
acquiring the supervision policies between the network user and the k network services from the policy library;
when the network user next issues an access request, judging whether the request is directed at one of the k network services; if so, directly applying network supervision and handling to the network user according to the prefetched policy; otherwise, extracting the data packet of the access request and generating an access log of the network user.
7. A distributed user behavior log prediction network supervision system, characterized in that the system comprises PCPP and LCA, wherein,
the PCPP is used for capturing a network access request data packet initiated by a network user, extracting an access log and uploading the access log to the LCA; acquiring popularity of network service, acquiring a network supervision policy according to a preset policy library, and supervising and handling a network access request of the network user; wherein,
the access log is stored in a four-tuple form and comprises an identifier CNID of a network where the network user is located, an IP address DIP of a network user access target, a port address DPort of the network user access target and a network service identifier URL;
the LCA extracts < CNID, DIP, DPort > and carries out hash calculation to obtain a key value;
according to the key value, the LCA determines the successor node LCA on which the access log is to be stored, and distributes the user access log to that successor node;
after receiving the distributed access log, the successor node LCA stores the access log in a network service access log library;
the LCA stores the access log in a network log storage table and establishes a network service popularity storage table;
the network log storage table is cyclically compared with the network service popularity storage table; if the CNID, DIP, DPort and URL of the i-th record in the network log storage table are the same as those of the j-th record in the network service popularity storage table, the LCA increases the corresponding network service popularity in the network service popularity storage table by 1 and deletes the record from the network log storage table;
if the CNID, DIP, DPort or URL of the i-th record in the network log storage table differs from the CNID, DIP, DPort or URL of the j-th record in the network service popularity storage table, the LCA adds a corresponding new record to the network service popularity storage table, sets the corresponding network service popularity to 1, and deletes the record from the network log storage table;
the LCA is used for distributing and storing the access log, calculating the popularity of the network service according to the access log and issuing the popularity to the PCPP.
8. The system of claim 7, wherein the system comprises a plurality of LCAs, the LCAs forming a distributed hash table (DHT) network;
the LCA performs a hash calculation on the received access log using a distributed hash algorithm to obtain a key value, determines according to the key value the successor node LCA on which the access log is to be stored, and distributes the user access log to that successor node.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201210382322.8A CN102946320B (en) | 2012-10-10 | 2012-10-10 | Distributed supervision method and system for user behavior log forecasting network |
Publications (2)
Publication Number | Publication Date |
---|---|
CN102946320A (en) | 2013-02-27 |
CN102946320B (en) | 2015-06-24 |
Family
ID=47729229
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201210382322.8A Active CN102946320B (en) | 2012-10-10 | 2012-10-10 | Distributed supervision method and system for user behavior log forecasting network |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN102946320B (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103914743B (en) * | 2014-04-21 | 2017-01-25 | 中国科学技术大学先进技术研究院 | On-line serial content popularity prediction method based on autoregressive model |
Families Citing this family (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104135546A (en) * | 2014-07-25 | 2014-11-05 | 可牛网络技术(北京)有限公司 | Method for loading webpage and terminal |
CN104486098A (en) * | 2014-11-26 | 2015-04-01 | 中国建设银行股份有限公司 | Access fault monitoring method and device |
CN105827608B (en) * | 2016-03-31 | 2019-02-12 | 微梦创科网络科技(中国)有限公司 | Distributed API service abnormal user identifying and analyzing method and reverse proxy gateway |
CN109218401B (en) * | 2018-08-08 | 2021-08-31 | 平安科技(深圳)有限公司 | Log collection method, system, computer device and storage medium |
CN109271782B (en) * | 2018-09-14 | 2021-06-08 | 杭州朗和科技有限公司 | Method, medium, system and computing device for detecting attack behavior |
CN110609901B (en) * | 2019-09-17 | 2022-04-15 | 国家电网有限公司 | User network behavior prediction method based on vectorization characteristics |
CN111404960B (en) * | 2020-03-26 | 2022-02-25 | 军事科学院系统工程研究院网络信息研究所 | Attribute extraction method applied to heaven-earth integrated network access control system |
CN111949884B (en) * | 2020-08-26 | 2022-06-21 | 桂林电子科技大学 | Multi-mode feature interaction-based depth fusion recommendation method |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101404630A (en) * | 2008-11-25 | 2009-04-08 | 中国网络通信集团公司 | Method and system for implementing internet service access gate |
CN101420554A (en) * | 2007-10-25 | 2009-04-29 | 索尼株式会社 | Program guide provides system, equipment, method and program |
CN102449633A (en) * | 2009-06-01 | 2012-05-09 | 皇家飞利浦电子股份有限公司 | Dynamic determination of access rights |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8640216B2 (en) * | 2009-12-23 | 2014-01-28 | Citrix Systems, Inc. | Systems and methods for cross site forgery protection |
Also Published As
Publication number | Publication date |
---|---|
CN102946320A (en) | 2013-02-27 |
Legal Events
Code | Title |
---|---|
C06 | Publication |
PB01 | Publication |
C10 | Entry into substantive examination |
SE01 | Entry into force of request for substantive examination |
C14 | Grant of patent or utility model |
GR01 | Patent grant |