CN107277095B - Session segmentation method and device - Google Patents


Info

Publication number
CN107277095B
Authority
CN
China
Prior art keywords
data
data record
time
session
access time
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201610216672.5A
Other languages
Chinese (zh)
Other versions
CN107277095A (en
Inventor
苏晓明
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Alibaba Group Holding Ltd
Original Assignee
Alibaba Group Holding Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Alibaba Group Holding Ltd filed Critical Alibaba Group Holding Ltd
Priority to CN201610216672.5A priority Critical patent/CN107277095B/en
Publication of CN107277095A publication Critical patent/CN107277095A/en
Application granted granted Critical
Publication of CN107277095B publication Critical patent/CN107277095B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/14 Session management
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2455 Query execution
    • G06F16/24553 Query execution of query operations
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/01 Protocols
    • H04L67/02 Protocols based on web technology, e.g. hypertext transfer protocol [HTTP]
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/14 Session management
    • H04L67/142 Managing session states for stateless protocols; Signalling session states; State transitions; Keeping-state mechanisms

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Theoretical Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computer And Data Communications (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a session segmentation method and a session segmentation device. The method comprises: acquiring network traffic between a client device and a server; grouping the network traffic by the IP address of the client device and the server identifier of the server accessed by the client device to obtain data groups, wherein the data records within each data group have different access times; computing the difference between the access times of two data records selected from each data group; and, when the difference is greater than a preset threshold, taking the access times of the two data records as the start time and the end time of a session.

Description

Session segmentation method and device
Technical Field
The invention relates to the field of network security, in particular to a session segmentation method and a session segmentation device.
Background
Currently, there are two main methods for segmenting data streams, such as HTTP traffic data, into sessions: one uses multiple SQL queries together with intermediate-table storage; the other uses SQL user-defined function (UDF) programming. Both are inefficient in terms of engineers' development effort, runtime efficiency, and later code maintenance cost. Specifically, they have the following problems:
(1) Whether multiple SQL queries with intermediate tables or SQL UDF programming is used, a large amount of code must be developed, and the UDF must be written in SQL or in a non-SQL programming language such as Python.
(2) Runtime efficiency is low. The approach based on SQL queries and intermediate tables requires multiple queries. A UDF, being user-defined, is difficult to optimize for performance, which directly affects the execution efficiency of the whole SQL query.
(3) The later code maintenance workload is large. Both methods require a large amount of code, so later maintenance costs are high.
In view of the above problems, no effective solution has been proposed.
Disclosure of Invention
According to an aspect of an embodiment of the present application, there is provided a session segmentation method, including: acquiring network traffic between a client device and a server; grouping the network traffic by the following information to obtain data groups: the IP address of the client device and the server identifier of the server accessed by the client device, wherein the data records within each data group have different access times; computing the difference between the access times of two data records selected from each data group; and, when the difference is greater than a preset threshold, taking the access times of the two data records as the start time and the end time of a session.
According to another aspect of the embodiments of the present application, there is also provided a session segmentation apparatus, including: an acquisition module, configured to acquire network traffic between a client device and a server; a grouping module, configured to group the network traffic by the following information to obtain data groups: the IP address of the client device and the server identifier of the server accessed by the client device, wherein the data records within each data group have different access times; and a segmentation module, configured to compute the difference between the access times of two data records selected from each data group and, when the difference is greater than a preset threshold, take the access times of the two data records as the start time and the end time of a session.
In the embodiments of the present application, the received data stream is grouped by the IP address of the client device and the server identifier, and the difference in access time between two data records within each data group is compared with a preset threshold. When the difference is greater than the preset threshold, the access times of the two data records are used as the start time and the end time of a session, thereby segmenting the session. This can be implemented with existing analysis functions, which reduces the development and maintenance workload. Moreover, because the method relies mainly on the access-time difference between data records, the access times can be obtained in a manner known in the related art (e.g., through analysis functions), so the session segmentation process can be optimized for performance. This solves the technical problems of low operating efficiency and heavy development and maintenance workloads.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the invention and together with the description serve to explain the invention without limiting the invention. In the drawings:
FIG. 1 is a block diagram of the hardware structure of a computer terminal for a session segmentation method according to an embodiment of the present application;
FIG. 2 is a flow chart of an alternative session segmentation method according to an embodiment of the present application;
FIG. 3 is a schematic diagram illustrating an alternative session segmentation principle according to an embodiment of the present application;
FIG. 4 is a schematic diagram of an alternative session segmentation flow according to an embodiment of the present application;
FIG. 5 is a block diagram of an alternative session segmentation apparatus according to an embodiment of the present application;
FIG. 6 is a block diagram of an alternative computer terminal according to an embodiment of the present application.
Detailed Description
In order to make the technical solutions of the present invention better understood, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
It should be noted that the terms "first," "second," and the like in the description and claims of this application and in the drawings described above are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the application described herein are capable of operation in sequences other than those illustrated or described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
For ease of understanding, terms referred to in the embodiments of the present application will now be explained as follows:
SQL is a special-purpose programming language, a database query and programming language used to access data and to query, update, and manage relational database systems.
Structured Query Language is a high-level, non-procedural programming language that allows users to work on high-level data structures. It does not require users to specify or know how the data is stored, so database systems with completely different underlying structures can use the same Structured Query Language as the interface for data input and management. SQL statements can be nested, which gives the language great flexibility and power.
The structured query language contains six parts:
I. Data Query Language (DQL):
Its statements, also called "data retrieval statements," are used to obtain data from tables and determine how the data is presented to the application. The reserved word SELECT is the verb most used in DQL (and in all of SQL); other reserved words commonly used in DQL are WHERE, ORDER BY, GROUP BY and HAVING. These DQL reserved words are often used with other types of SQL statements.
II. Data Manipulation Language (DML):
Its statements include the verbs INSERT, UPDATE and DELETE, which are used to add, modify and delete rows in a table, respectively. It is also known as the action query language.
III. Transaction Processing Language (TPL):
Its statements ensure that all rows of the tables affected by DML statements are updated in a timely manner. TPL statements include BEGIN TRANSACTION, COMMIT, and ROLLBACK.
IV. Data Control Language (DCL):
Its statements, GRANT and REVOKE, determine the access of individual users and user groups to database objects. Some RDBMSs can use GRANT or REVOKE to control access to individual columns of a table.
V. Data Definition Language (DDL):
Its statements include the verbs CREATE and DROP. They create new tables or delete tables in the database (CREATE TABLE or DROP TABLE), add indexes to tables, and so on. DDL includes many reserved words related to obtaining data from the database catalog. It is also part of the action queries.
VI. Cursor Control Language (CCL):
Its statements, such as DECLARE CURSOR, FETCH INTO, and UPDATE WHERE CURRENT, are used to operate on individual rows of one or more tables.
Analysis function: an analysis function computes an aggregate value over a group of rows. Unlike an aggregate function, it returns multiple rows for each group. The group of rows defined by the analysis function is called a window. For each row, a sliding window is defined that determines the range of rows used in the calculation for the current row. The window size can be based on a physical number of rows or on a logical interval.
Ordinary aggregate functions group rows with GROUP BY, and each group returns one statistic, whereas analysis functions group rows with PARTITION BY, and each group can return a statistic for every row.
The windowing clause has the form OVER ([partition_by_clause] order_by_clause). Functions such as SUM or RANK need to know which rows to operate on; OVER provides that window, with PARTITION BY specifying how the rows are grouped and ORDER BY specifying how the rows are ordered within each group.
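The contrast between GROUP BY aggregation and PARTITION BY analysis can be illustrated with a minimal Python sketch. The function names and sample rows below are hypothetical and are not part of the application; the sketch assumes that values are distinct within each group, so that a running sum matches the default window of SUM(...) OVER (PARTITION BY ... ORDER BY ...).

```python
# Hypothetical (group, value) rows; values are distinct within each group.
rows = [("a", 1), ("a", 2), ("b", 5)]


def aggregate_sum(rows):
    """Mimics SUM(value) ... GROUP BY grp: one result per group."""
    totals = {}
    for grp, value in rows:
        totals[grp] = totals.get(grp, 0) + value
    return totals


def analytic_running_sum(rows):
    """Mimics SUM(value) OVER (PARTITION BY grp ORDER BY value):
    one result per input row, computed within that row's group."""
    running = {}
    out = []
    for grp, value in sorted(rows):  # order by group, then by value
        running[grp] = running.get(grp, 0) + value
        out.append((grp, value, running[grp]))
    return out
```

The aggregate version collapses each group to a single value, while the analytic version keeps every row and attaches a per-row statistic to it.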
Session: maintaining, across multiple HTTP connections, the association between a user and the different requests issued by that user is referred to as maintaining a session. A session associates a user with the different requests made by that user. The sessions of different users should be independent of each other. Once established, a session should exist until the user's idle time exceeds a certain limit; before that, the container should not release the session resources. During the lifetime of a session, a user may send many requests to the server, and the request information of the user may be stored in the session.
Example 1
There is also provided, in accordance with an embodiment of the present application, a method embodiment of a session segmentation method, it being noted that the steps illustrated in the flowchart of the figure may be performed in a computer system, such as a set of computer-executable instructions, and that, although a logical order is illustrated in the flowchart, in some cases, the steps illustrated or described may be performed in an order different than here.
The method provided by the embodiment 1 of the present application can be executed in a mobile terminal, a computer terminal or a similar computing device. Taking an example of the session segmentation method running on a computer terminal, fig. 1 is a hardware structure block diagram of a computer terminal of the session segmentation method according to the embodiment of the present application. As shown in fig. 1, the computer terminal 10 may include one or more (only one shown) processors 102 (the processor 102 may include, but is not limited to, a processing device such as a microprocessor MCU or a programmable logic device FPGA), a memory 104 for storing data, and a transmission device 106 for communication functions. It will be understood by those skilled in the art that the structure shown in fig. 1 is only an illustration and is not intended to limit the structure of the electronic device. For example, the computer terminal 10 may also include more or fewer components than shown in FIG. 1, or have a different configuration than shown in FIG. 1.
The memory 104 may be used to store software programs and modules of application software, such as program instructions/modules corresponding to the session segmentation method in the embodiment of the present application, and the processor 102 executes various functional applications and data processing by running the software programs and modules stored in the memory 104, so as to implement the method described above. The memory 104 may include high speed random access memory, and may also include non-volatile memory, such as one or more magnetic storage devices, flash memory, or other non-volatile solid-state memory. In some examples, the memory 104 may further include memory located remotely from the processor 102, which may be connected to the computer terminal 10 via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The transmission device 106 is used for receiving or transmitting data via a network. Specific examples of the network described above may include a wireless network provided by a communication provider of the computer terminal 10. In one example, the transmission device 106 includes a Network adapter (NIC) that can be connected to other Network devices through a base station to communicate with the internet. In one example, the transmission device 106 can be a Radio Frequency (RF) module, which is used to communicate with the internet in a wireless manner.
Under the operating environment, the application provides a session segmentation method as shown in fig. 2. Fig. 2 is a flowchart of a session segmentation method according to embodiment 1 of the present application. As shown in fig. 2, the method includes:
Step S202, acquiring network traffic between the client device and the server;
Step S204, grouping the network traffic by the following information to obtain data groups: the IP address of the client device and the server identifier of the server accessed by the client device; wherein the data records within each data group have different access times.
In an alternative embodiment of the present application, the server is also referred to as a host. As can be seen from steps S202 to S204, the embodiment of the present application uses the host and the source IP address (i.e., the IP address of the client device) as the segmentation dimensions and segments into sessions the traffic data of one source IP accessing the same host. Optionally, a start time, an end time, and a session identifier may be set for every 30 minutes; that is, 30 minutes is used as the minimum time unit for session segmentation.
It should be noted that the network traffic in step S202 may be a Hypertext Transfer Protocol (HTTP) data stream. In this case, the data structure of the received data stream is shown in Table 1:
Table 1: Input data, raw traffic data (http_flow table)
Field name    Type      Meaning
host          string    Host
src_ip        string    Source IP
reqtime       date      Access time
url           string    Accessed URL
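The grouping in step S204 can be sketched in Python as follows. This is a minimal illustration, not the implementation of the application: the function name group_records and the sample records are hypothetical, and only the field names host, src_ip, reqtime and url come from Table 1.

```python
from collections import defaultdict
from datetime import datetime


def group_records(records):
    """Group records by (host, src_ip) and sort each group by reqtime."""
    groups = defaultdict(list)
    for rec in records:
        groups[(rec["host"], rec["src_ip"])].append(rec)
    for key in groups:
        groups[key].sort(key=lambda r: r["reqtime"])
    return dict(groups)


# Hypothetical sample records (not taken from the application).
records = [
    {"host": "a", "src_ip": "192.168.0.1",
     "reqtime": datetime(2016, 1, 25, 16, 21, 16), "url": "/x"},
    {"host": "a", "src_ip": "192.168.0.1",
     "reqtime": datetime(2016, 1, 25, 16, 21, 0), "url": "/y"},
    {"host": "b", "src_ip": "192.168.0.1",
     "reqtime": datetime(2016, 1, 25, 16, 21, 4), "url": "/z"},
]
groups = group_records(records)
```

Each key of the resulting dictionary identifies one source IP accessing one host, and the records under that key are ordered by access time.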
Step S206, computing the difference between the access times of two data records selected from each data group; and, when the difference is greater than a preset threshold, taking the access times of the two data records as the start time and the end time of a session.
The difference operation may be implemented as follows, without being limited thereto: for each data group, sort the data records in the group by access time to obtain a data record list; then select the data record preceding and the data record following a specified data record in the list and compute the difference between their access times. The specified data record may be, but is not limited to, the data record whose access time is the current system time.
Note that all data records within a data group have the same server identifier and source IP address but different access times, so the data records within the group can be sorted by access time.
Optionally, the next data record may be the data record adjacent in access time to the specified data record, or it may be selected from the data records following the specified data record. In the latter case it may be determined as follows: compute, in sequence, the difference between the access time of the previous data record and the access time of each remaining data record, and take as the next data record the remaining data record at which the difference first exceeds the preset threshold, where the remaining data records are those arranged after the specified data record in the data record list. That is, the differences may be computed from top to bottom: if the difference between the access time of the data record adjacent to the specified data record and the access time of the specified data record is greater than the preset threshold, that adjacent data record is determined to be the next data record; if the difference is less than the preset threshold, the following data record in the list is selected and its access time is compared with that of the specified data record.
It should be noted that, when determining the start time of a session, if the specified data record is the first data record in the data record list, its access time needs to be used as the start time. Specifically: determine whether the specified data record is the first data record in the data record list, and if so, take the access time of the specified data record as the start time of a session.
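The top-to-bottom selection of the next data record can be sketched as follows. This reflects one reading of the description, in which each later access time is compared with the access time of a reference record; the function name find_next_record and its parameters are hypothetical.

```python
from datetime import datetime, timedelta


def find_next_record(times, ref_index, threshold):
    """Scan the sorted access times after ref_index and return the index of
    the first record whose access time exceeds times[ref_index] by more
    than threshold, or None if no such record exists."""
    ref = times[ref_index]
    for j in range(ref_index + 1, len(times)):
        if times[j] - ref > threshold:
            return j
    return None


# Hypothetical sorted access times within one data group.
times = [
    datetime(2016, 1, 25, 10, 0),
    datetime(2016, 1, 25, 10, 10),
    datetime(2016, 1, 25, 10, 45),
    datetime(2016, 1, 25, 11, 30),
]
threshold = timedelta(minutes=30)
```

With these assumed values, scanning from the first record skips the 10:10 record (a 10-minute difference) and stops at the 10:45 record.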
In an alternative embodiment of the present application, the data structure of the output data obtained by grouping the data streams is shown in table 2. Note that the data structure in table 2 is obtained based on a data stream having the data structure shown in table 1.
Table 2: Output data, session segmentation data (http_session table)
Field name    Type      Meaning
host          string    Host
src_ip        string    Source IP
starttime     date      Start access time
endtime       date      End access time
sid           string    Session identifier (ID)
The above processing procedure in the embodiment of the present application may be implemented with analysis functions from the related art (e.g., the LAG and LEAD analysis functions), so that it can be implemented in a single SQL statement and the program development workload is reduced.
The LAG(expression [, offset] [, default]) analysis function accesses a row preceding the current row within the group; conversely, the LEAD(expression [, offset] [, default]) analysis function accesses a row following the current row within the group. The offset parameter is a positive integer and defaults to 1. Because there is no data before the first record in a group and none after the last record, the default parameter specifies the value returned in those cases; if it is omitted, the result is null. In an alternative embodiment of the present application, the LAG and LEAD analysis functions may be used together to segment traffic data of any length into sessions in a single SQL statement.
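The behavior of LAG and LEAD over one ordered group can be mimicked in Python as shown below. This is an illustrative sketch only: the functions lag and lead are hypothetical helpers operating on a list that stands for the ordered rows of a single group, and offset is assumed to be a positive integer.

```python
def lag(values, offset=1, default=None):
    """For each position, return the value `offset` rows earlier,
    or `default` when no such row exists."""
    return [values[i - offset] if i - offset >= 0 else default
            for i in range(len(values))]


def lead(values, offset=1, default=None):
    """For each position, return the value `offset` rows later,
    or `default` when no such row exists."""
    n = len(values)
    return [values[i + offset] if i + offset < n else default
            for i in range(n)]
```

The first element of lag(...) and the last element of lead(...) take the default value, which corresponds to the null handling described above.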
Also, the analysis function may be executed on a data platform provided in the related art, such as an Oracle database (also called Oracle RDBMS, or simply Oracle) platform. The Oracle database system is a popular relational database management system in the world at present, has good portability, convenient use and strong function, and is suitable for various large, medium, small and microcomputer environments. The method is a high-efficiency and high-reliability database solution suitable for high throughput. The data platform can support analysis function operation, and various complex analysis requirements encountered in data analysis work are solved by utilizing the powerful functions and flexible programmable characteristics of the analysis functions, for example, the above-mentioned problem of session segmentation can be skillfully realized by using the analysis functions. The principle of session segmentation is explained in detail below in connection with an alternative embodiment. As shown in fig. 3, the principle of session segmentation is as follows:
(1) Data grouping: group the traffic data (i.e., the data stream) by the host and src_ip fields, and sort each group by request time (i.e., access time);
(2) Session segmentation: based on the data groups from (1), sorted by request time reqtime, obtain the request time of the current data row (starttime), the request time of the previous data row (lasttime), and the maximum request time maxtime (i.e., the preset threshold, which can be set flexibly according to the actual situation); compute the difference between the request time reqtime (starttime) of the current data row and the request time lasttime of the previous data row in the group, i.e., the session segmentation duration, and obtain the session end time by filtering on that duration. The data structure of the grouped input data is shown in Table 3, and the data structure of the segmented sessions is shown in Table 4.
TABLE 3
[Table image BDA0000960151740000081: grouped input data]
TABLE 4
host src_ip starttime endtime
a 192.168.0.1 2016.01.25 16:21:00 2016.01.25 16:21:16
a 192.168.0.2 2016.01.25 16:21:24 2016.01.25 16:21:24
b 192.168.0.1 2016.01.25 16:21:00 2016.01.25 16:21:04
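The two-stage principle above (grouping, then splitting on the time gap) can be sketched in Python as follows. This is a common gap-based reading of the principle rather than the exact procedure of the application; the function name split_sessions, the max_gap parameter and the sample records are hypothetical, and the sample input is not the content of Table 3.

```python
from datetime import datetime, timedelta
from itertools import groupby


def split_sessions(records, max_gap=timedelta(minutes=30)):
    """records: iterable of (host, src_ip, reqtime) tuples.
    Returns a list of (host, src_ip, starttime, endtime) tuples."""
    sessions = []
    ordered = sorted(records)  # by host, src_ip, then reqtime
    for (host, ip), grp in groupby(ordered, key=lambda r: (r[0], r[1])):
        start = last = None
        for _, _, t in grp:
            if last is None:
                start = t  # first record of the group opens a session
            elif t - last > max_gap:
                sessions.append((host, ip, start, last))  # close at the gap
                start = t
            last = t
        sessions.append((host, ip, start, last))  # close the final session
    return sessions


# Hypothetical input records.
sample = [
    ("a", "192.168.0.1", datetime(2016, 1, 25, 16, 21, 0)),
    ("a", "192.168.0.1", datetime(2016, 1, 25, 16, 21, 16)),
    ("a", "192.168.0.1", datetime(2016, 1, 25, 17, 30, 0)),
    ("b", "192.168.0.1", datetime(2016, 1, 25, 16, 21, 0)),
]
sessions = split_sessions(sample)
```

A session that contains a single record has identical start and end times, as in the second row of Table 4.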
Therefore, the analysis functions LAG and LEAD are comprehensively used, so that the function development can be completed more efficiently, the development workload is reduced, and the maintainability of the SQL code is improved. For ease of understanding, the session splitting process is further described below in conjunction with the process shown in fig. 4. FIG. 4 is a schematic diagram of an alternative session segmentation flow according to an embodiment of the present application; as shown in fig. 4, the process includes:
Step S402: first, the analysis function LAG is used to obtain the previous access time lasttime corresponding to the access time reqtime; PARTITION BY host, src_ip ensures that the records concern the same source IP accessing the same host, and ORDER BY reqtime ensures that the accesses are sorted in chronological order.
Step S404: the next access time endtime corresponding to lasttime is then obtained, where the filter condition is datediff(reqtime, lasttime, 'mi') > 30. Some additional handling needs to be added here, such as the case where lasttime is null, and the code unique_id() AS sid for computing the session identifier.
Step S406: the session segmentation calculation of starttime, endtime and sid for a source IP accessing the same host is completed, yielding the segmented sessions (http_session).
Therefore, the whole session segmentation process can be realized by one SQL query statement, and can be normally and efficiently operated in ODPS or ORACLE.
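A single-statement query of this kind can be sketched with Python's built-in sqlite3 module. This is not the statement of the application and does not target ODPS or Oracle: it assumes an SQLite build with window-function support (version 3.25 or later), stores reqtime as integer seconds so that plain subtraction replaces datediff, and numbers sessions with a windowed SUM instead of LEAD and unique_id(). The names SESSION_SQL and split_sessions_sql are hypothetical.

```python
import sqlite3

# One SQL statement: LAG finds the previous access time, a windowed SUM
# numbers the sessions within each (host, src_ip) group, and the outer
# query takes MIN/MAX of reqtime per session.
SESSION_SQL = """
WITH lagged AS (
    SELECT host, src_ip, reqtime,
           LAG(reqtime) OVER (PARTITION BY host, src_ip
                              ORDER BY reqtime) AS lasttime
    FROM http_flow
),
numbered AS (
    SELECT host, src_ip, reqtime,
           SUM(CASE WHEN lasttime IS NULL
                      OR reqtime - lasttime > :max_gap
                    THEN 1 ELSE 0 END)
               OVER (PARTITION BY host, src_ip ORDER BY reqtime) AS sid
    FROM lagged
)
SELECT host, src_ip, MIN(reqtime) AS starttime,
       MAX(reqtime) AS endtime, sid
FROM numbered
GROUP BY host, src_ip, sid
ORDER BY host, src_ip, starttime
"""


def split_sessions_sql(rows, max_gap_seconds=1800):
    """rows: iterable of (host, src_ip, reqtime_in_seconds) tuples."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE http_flow (host TEXT, src_ip TEXT, reqtime INTEGER)")
    conn.executemany("INSERT INTO http_flow VALUES (?, ?, ?)", rows)
    result = conn.execute(SESSION_SQL, {"max_gap": max_gap_seconds}).fetchall()
    conn.close()
    return result


# Hypothetical input: access times in seconds, 30-minute threshold.
result = split_sessions_sql([
    ("a", "192.168.0.1", 0),
    ("a", "192.168.0.1", 16),
    ("a", "192.168.0.1", 4000),
    ("b", "192.168.0.1", 0),
])
```

In this sketch sid is an integer that is unique only within one (host, src_ip) group; a globally unique string identifier, as in Table 2, would need a platform-specific function.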
To sum up, the embodiments of the present application can achieve the following effects. The development effort is small: because analysis functions can be used, session segmentation can be completed with a single SQL query statement, so engineers write little code. The operating efficiency is high: the analysis functions can be SQL functions built into the Oracle database, whose execution is fully optimized within the platform, and they are particularly efficient for longitudinal data analysis problems such as session segmentation. The maintenance workload is small: because only one SQL statement is needed, engineers can understand it easily, so later code maintenance is light.
It should be noted that, for simplicity of description, the above-mentioned method embodiments are described as a series of acts or combination of acts, but those skilled in the art will recognize that the present application is not limited by the order of acts described, as some steps may occur in other orders or concurrently depending on the application. Further, those skilled in the art should also appreciate that the embodiments described in the specification are preferred embodiments and that the acts and modules referred to are not necessarily required in this application.
Through the above description of the embodiments, those skilled in the art can clearly understand that the method according to the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but the former is a better implementation mode in many cases. Based on such understanding, the technical solutions of the present invention may be embodied in the form of a software product, which is stored in a storage medium (e.g., ROM/RAM, magnetic disk, optical disk) and includes instructions for enabling a terminal device (e.g., a mobile phone, a computer, a server, or a network device) to execute the method according to the embodiments of the present invention.
Example 2
According to an embodiment of the present application, there is also provided an apparatus for implementing the session segmentation method, as shown in fig. 5, the apparatus includes:
an obtaining module 50, configured to obtain network traffic between a client device and a server; optionally, the network traffic may be, but is not limited to, data traffic collected from the client device while the client device accesses the server;
a grouping module 52, configured to group the network traffic according to the following information to obtain a data packet: an IP address of the client device and a server identifier of the server accessed by the client device; wherein the respective data records within each data packet have different access times;
a segmentation module 54, connected to the grouping module 52 and configured to perform a difference operation on the access times of two data records selected from each data packet to obtain a difference value, and, when the difference value is larger than a preset threshold value, to take the access times of the two data records as the start time and the end time of one session.
In an optional embodiment of the present application, the segmentation module 54 is further configured to, for each data packet, sort the data records in the data packet in order of access time to obtain a data record list, and to select the previous data record and the next data record of a specified data record in the data record list and perform the difference operation on them to obtain the difference value. Optionally, the next data record is a data record selected from the data records following the specified data record.
It should be noted that all data records within a data packet have the same server identifier and source IP address but different access times; therefore, the data records within the data packet can be ordered by access time.
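The grouping and ordering described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the record layout `(client_ip, server_id, access_time)`, the sample values, and the helper name `group_and_sort` are hypothetical and do not come from the application.

```python
from collections import defaultdict

# Hypothetical record layout: (client_ip, server_id, access_time_in_seconds).
records = [
    ("10.0.0.1", "srv-a", 300),
    ("10.0.0.2", "srv-a", 50),
    ("10.0.0.1", "srv-a", 100),
    ("10.0.0.1", "srv-b", 200),
    ("10.0.0.1", "srv-a", 120),
]


def group_and_sort(rows):
    """Group rows by (client IP, server identifier), then sort each group by access time."""
    groups = defaultdict(list)
    for client_ip, server_id, access_time in rows:
        groups[(client_ip, server_id)].append(access_time)
    # Each value is the "data record list" of one data packet, in access-time order.
    return {key: sorted(times) for key, times in groups.items()}


record_lists = group_and_sort(records)
```

Each key of `record_lists` corresponds to one data packet, and each value is that packet's data record list ordered by access time.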
Optionally, the segmentation module 54 is configured to select the next data record in the following manner: sequentially performing the difference operation on the access time of the previous data record and the access time of each remaining data record, and taking as the next data record the first remaining data record whose difference value is larger than the preset threshold value, wherein the remaining data records are the data records arranged after the specified data record in the data record list.
Optionally, the segmentation module 54 is further configured to determine, before the access times of the two data records are taken as the start time and the end time of a session, whether the specified data record is the first data record in the data record list, and, when the determination result is yes, to take the access time of the specified data record as the start time of a session.
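One possible reading of the threshold rule and the first-record rule is sketched below in Python. It assumes that a record opens a session when it is the first record in the list or when its gap to the preceding record exceeds the threshold, and that it closes a session when it is the last record or when the gap to the following record exceeds the threshold. The function name `split_sessions`, the treatment of the last record, and the sample values are assumptions made for illustration, not the application's own code.

```python
def split_sessions(sorted_times, threshold):
    """Return (start_time, end_time) pairs for one data record list.

    sorted_times must already be ordered by access time.
    """
    sessions = []
    start = None
    last_index = len(sorted_times) - 1
    for i, current in enumerate(sorted_times):
        # The first record always opens a session; so does a record that
        # follows a gap larger than the threshold.
        if i == 0 or current - sorted_times[i - 1] > threshold:
            start = current
        # Assumption: the last record closes the open session, as does a
        # record followed by a gap larger than the threshold.
        if i == last_index or sorted_times[i + 1] - current > threshold:
            sessions.append((start, current))
    return sessions


sessions = split_sessions([100, 120, 300, 310, 2000], threshold=60)
# sessions == [(100, 120), (300, 310), (2000, 2000)]
```

A record that is isolated on both sides, such as the one at time 2000, forms a session whose start time and end time coincide.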
In addition, the functions of the grouping module 52 and the segmentation module 54 in the embodiment of the present application may be implemented based on analysis functions in the related art (e.g., the LAG analysis function and the LEAD analysis function), so that these functions can be implemented in one SQL statement, which reduces the workload of program development.
For example, the LAG(expression [, offset [, default]]) analysis function accesses rows preceding the current row in the group, whereas the LEAD(expression [, offset [, default]]) analysis function does the opposite and accesses rows following the current row in the group. The parameter offset is a positive integer and defaults to 1. Since there is no data before the first record in the data packet and no data after the last record, the default parameter specifies the value returned in those cases; if it is omitted, null is returned. In an optional embodiment of the present application, the LAG and LEAD analysis functions may be used together to implement session segmentation of traffic data of any length in one SQL statement.
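The behaviour of the two analysis functions at the edges of a group can be illustrated with a small sketch. It assumes SQLite (through Python's standard `sqlite3` module, with SQLite 3.25 or later for window-function support) as a stand-in for the Oracle database named in the text; the table name `traffic` and its columns are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table: one row per data record.
conn.execute(
    "CREATE TABLE traffic (client_ip TEXT, server_id TEXT, access_time INTEGER)"
)
conn.executemany(
    "INSERT INTO traffic VALUES (?, ?, ?)",
    [("10.0.0.1", "srv-a", t) for t in (100, 120, 300)],
)

# LAG looks one row back and LEAD one row ahead within each
# (client_ip, server_id) group ordered by access time.
# offset = 1 and default = NULL are written out explicitly here.
rows = conn.execute(
    """
    SELECT access_time,
           LAG(access_time, 1, NULL)  OVER w AS prev_time,
           LEAD(access_time, 1, NULL) OVER w AS next_time
    FROM traffic
    WINDOW w AS (PARTITION BY client_ip, server_id ORDER BY access_time)
    ORDER BY access_time
    """
).fetchall()
# rows == [(100, None, 120), (120, 100, 300), (300, 120, None)]
```

The first row has no predecessor and the last row has no successor, so the default value (null, which Python reports as `None`) appears in those positions.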
It should be noted that the above modules may be implemented by software or hardware; for the latter, this may be achieved in, but is not limited to, the following ways: the modules are all located in the same processor, or the modules are located in different processors in any combination.
It should be noted that, for the preferred implementation in this embodiment, reference may be made to the description in embodiment 1, and details are not described here.
Example 3
The embodiment of the application can provide a computer terminal, and the computer terminal can be any one computer terminal device in a computer terminal group. Optionally, in this embodiment, the computer terminal may also be replaced with a terminal device such as a mobile terminal.
Optionally, in this embodiment, the computer terminal may be located in at least one network device of a plurality of network devices of a computer network.
In this embodiment, the computer terminal may execute the program code of the following steps in the session splitting method: in the process that the client device accesses the server, grouping data streams from the client device according to the following information to obtain data packets: an IP address of the client device and a server identifier of the server accessed by the client device; wherein the respective data records within each data packet have different access times; for each data packet, sorting the data records in the data packet according to the sequence of the access time to obtain a data record list; performing difference operation on the access time of two selected data records in the data record list to obtain a difference value; and when the difference value is larger than a preset threshold value, taking the access time of the two data records as the starting time and the ending time of one session.
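The steps listed above can also be expressed as a single SQL statement. The sketch below again uses SQLite through Python's `sqlite3` module (SQLite 3.25 or later assumed) rather than Oracle, and the table `traffic`, its columns, and the 60-second threshold are hypothetical. Note that pairing each start time with its end time through a running sum of start flags is a technique chosen here for illustration; the application itself only states that LAG and LEAD can be used.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE traffic (client_ip TEXT, server_id TEXT, access_time INTEGER)"
)
conn.executemany(
    "INSERT INTO traffic VALUES (?, ?, ?)",
    [
        ("10.0.0.1", "srv-a", 100),
        ("10.0.0.1", "srv-a", 120),
        ("10.0.0.1", "srv-a", 300),
        ("10.0.0.1", "srv-a", 310),
        ("10.0.0.2", "srv-a", 50),
    ],
)

SESSION_SQL = """
WITH marked AS (
    -- A row starts a session if it has no predecessor in its group or
    -- if the gap to its predecessor exceeds the threshold.
    SELECT client_ip, server_id, access_time,
           CASE WHEN LAG(access_time) OVER w IS NULL
                  OR access_time - LAG(access_time) OVER w > :threshold
                THEN 1 ELSE 0 END AS is_start
    FROM traffic
    WINDOW w AS (PARTITION BY client_ip, server_id ORDER BY access_time)
),
numbered AS (
    -- Running count of start flags gives each row a session number.
    SELECT client_ip, server_id, access_time,
           SUM(is_start) OVER (
               PARTITION BY client_ip, server_id ORDER BY access_time
           ) AS session_no
    FROM marked
)
SELECT client_ip, server_id,
       MIN(access_time) AS session_start,
       MAX(access_time) AS session_end
FROM numbered
GROUP BY client_ip, server_id, session_no
ORDER BY client_ip, server_id, session_start
"""

sessions = conn.execute(SESSION_SQL, {"threshold": 60}).fetchall()
```

With the sample rows, the first client's traffic to `srv-a` splits into two sessions (100 to 120 and 300 to 310), and the second client's single record forms a session of its own.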
Optionally, fig. 6 is a block diagram of a computer terminal according to an embodiment of the present application. As shown in fig. 6, the computer terminal a may include: one or more processors 61 (only one of which is shown), a memory 63, and a transmission means 65 connected to the web server.
The memory 63 may be used to store software programs and modules, such as program instructions/modules corresponding to the methods and apparatuses in the embodiments of the present application, and the processor 61 executes various functional applications and data processing by running the software programs and modules stored in the memory 63, so as to implement the session segmentation method described above. The memory 63 may include high speed random access memory, and may also include non-volatile memory, such as one or more magnetic storage devices, flash memory, or other non-volatile solid-state memory. In some examples, memory 63 may further include memory located remotely from processor 61, which may be connected to terminal a via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The transmission device 65 is used for receiving or transmitting data via a network. Examples of the network may include a wired network and a wireless network. In one example, the transmission device 65 includes a network adapter (NIC) that can be connected to a router via a network cable and other network devices to communicate with the internet or a local area network. In one example, the transmission device 65 is a Radio Frequency (RF) module, which is used for communicating with the internet in a wireless manner.
The memory 63 is used for storing preset action conditions, information of preset authorized users, and application programs.
The processor 61 may call the information and application programs stored in the memory 63 through the transmission device to perform the following step: selecting the previous data record and the next data record of the specified data record in the data record list and performing the difference operation on them to obtain the difference value.
Optionally, the processor 61 may further execute program code for the following step: sequentially performing the difference operation on the access time of the previous data record and the access time of each remaining data record, and taking as the next data record the first remaining data record whose difference value is larger than the preset threshold value, wherein the remaining data records are the data records arranged after the specified data record in the data record list.
Optionally, the processor 61 may further execute program code for the following step: determining whether the specified data record is the first data record in the data record list, and, if the determination result is yes, taking the access time of the specified data record as the start time of one session.
It can be understood by those skilled in the art that the structure shown in fig. 6 is only an illustration, and the computer terminal may also be a terminal device such as a smart phone (e.g., an Android phone, an iOS phone, etc.), a tablet computer, a palmtop computer, a Mobile Internet Device (MID), a PAD, and the like. Fig. 6 is a diagram illustrating a structure of the electronic device. For example, the computer terminal a may also include more or fewer components (e.g., network interfaces, display devices, etc.) than shown in fig. 6, or have a different configuration than shown in fig. 6.
Those skilled in the art will appreciate that all or part of the steps in the methods of the above embodiments may be implemented by a program instructing hardware associated with the terminal device, where the program may be stored in a computer-readable storage medium, and the storage medium may include: flash disks, Read-Only memories (ROMs), Random Access Memories (RAMs), magnetic or optical disks, and the like.
Example 4
Embodiments of the present application also provide a storage medium. Optionally, in this embodiment, the storage medium may be configured to store program codes executed by the session splitting method provided in embodiment 1.
Optionally, in this embodiment, the storage medium may be located in any one of computer terminals in a computer terminal group in a computer network, or in any one of mobile terminals in a mobile terminal group.
Optionally, in this embodiment, the storage medium is configured to store program code for performing the following steps: in the process that the client device accesses the server, grouping data streams from the client device according to the following information to obtain data packets: an IP address of the client device and a server identifier of the server accessed by the client device; wherein the respective data records within each data packet have different access times; for each data packet, sorting the data records in the data packet according to the sequence of the access time to obtain a data record list; performing difference operation on the access time of two selected data records in the data record list to obtain a difference value; and when the difference value is larger than a preset threshold value, taking the access time of the two data records as the starting time and the ending time of one session.
It should be noted here that any one of the computer terminal groups may establish a communication relationship with the web server and the scanner, and the scanner may scan the value commands of the web application executed by the php on the computer terminal.
The above-mentioned serial numbers of the embodiments of the present application are merely for description and do not represent the merits of the embodiments.
In the above embodiments of the present application, the descriptions of the respective embodiments have respective emphasis, and for parts that are not described in detail in a certain embodiment, reference may be made to related descriptions of other embodiments.
In the several embodiments provided in the present application, it should be understood that the disclosed client may be implemented in other manners. The above-described embodiments of the apparatus are merely illustrative, and for example, the division of the units is only one type of division of logical functions, and there may be other divisions when actually implemented, for example, a plurality of units or components may be combined or may be integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, units or modules, and may be in an electrical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. And the aforementioned storage medium includes: a U-disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a removable hard disk, a magnetic or optical disk, and other various media capable of storing program codes.
The foregoing is only a preferred embodiment of the present invention, and it should be noted that, for those skilled in the art, various modifications and decorations can be made without departing from the principle of the present invention, and these modifications and decorations should also be regarded as the protection scope of the present invention.

Claims (10)

1. A method for session segmentation, comprising:
acquiring network flow between client equipment and a server;
grouping the network flow according to the following information to obtain a data packet: an IP address of the client device and a server identification of the server accessed by the client device; wherein each data record in each data packet has a different access time, and all data in each data packet has the same server identification and source address;
performing difference operation on the access time of two data records selected from each data packet to obtain a difference value; and when the difference is larger than a preset threshold value, taking the access time of the two data records as the starting time and the ending time of one session.
2. The method of claim 1, wherein differencing access times of two selected data records in the list of data records comprises:
for each data packet, sequencing the data records in the data packet according to the sequence of the access time to obtain a data record list;
and selecting the previous data record and the next data record of the specified data record in the data record list to carry out difference operation to obtain the difference value.
3. The method of claim 2, wherein the next data record is a selected one of the data records subsequent to the specified data record.
4. The method of claim 3, wherein the next data record is a data record selected in the following manner:
and sequentially carrying out difference operation on the access time of the previous data record and the access time of the remaining data records, and taking the remaining data record corresponding to the difference value which is larger than the preset threshold value and appears for the first time as the next data record, wherein the remaining data record is the data record which is arranged after the specified data record in the data record list.
5. The method according to any of claims 2 to 4, wherein before the access times of the two data records are taken as the start time and the end time of one session, the method further comprises:
and judging whether the specified data record is the first data record in the data record list, wherein when the judgment result is yes, the access time of the specified data record is taken as the starting time of one session.
6. A session segmentation apparatus, comprising:
the acquisition module is used for acquiring network flow between the client device and the server;
the grouping module is used for grouping the network flow according to the following information to obtain a data packet: an IP address of the client device and a server identification of the server accessed by the client device; wherein each data record in each data packet has a different access time, and all data in each data packet has the same server identification and source address;
the segmentation module is used for carrying out difference operation on the access time of two data records selected from each data packet to obtain a difference value; and when the difference is larger than a preset threshold value, taking the access time of the two data records as the starting time and the ending time of one session.
7. The apparatus according to claim 6, wherein the partitioning module is further configured to, for each of the data packets, sort the data records in the data packet according to the sequence of the access times to obtain a data record list; and selecting the previous data record and the next data record of the specified data record in the data record list to carry out difference operation to obtain the difference value.
8. The apparatus of claim 7, wherein the next data record is a selected one of the data records subsequent to the specified data record.
9. The apparatus of claim 8, wherein the partitioning module is configured to select the next data record as follows: and sequentially carrying out difference operation on the access time of the previous data record and the access time of the remaining data records, and taking the remaining data record corresponding to the difference value which is larger than the preset threshold value and appears for the first time as the next data record, wherein the remaining data record is the data record which is arranged after the specified data record in the data record list.
10. The apparatus according to any one of claims 7 to 9, wherein the dividing module is further configured to determine whether the specified data record is the first data record in the data record list before the access time of the two data records is taken as the start time and the end time of one session, and when the determination result is yes, the access time of the specified data record is taken as the start time of one session.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610216672.5A CN107277095B (en) 2016-04-07 2016-04-07 Session segmentation method and device

Publications (2)

Publication Number Publication Date
CN107277095A CN107277095A (en) 2017-10-20
CN107277095B true CN107277095B (en) 2020-09-25

Family

ID=60052824

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610216672.5A Active CN107277095B (en) 2016-04-07 2016-04-07 Session segmentation method and device

Country Status (1)

Country Link
CN (1) CN107277095B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107958024B (en) * 2017-11-09 2020-10-16 广州虎牙信息科技有限公司 Session merging method and device and computer equipment
CN108650334B (en) * 2018-08-02 2021-03-30 东软集团股份有限公司 Session failure setting method and device
CN113705250B (en) * 2021-10-29 2022-02-22 北京明略昭辉科技有限公司 Session content identification method, device, equipment and computer readable medium

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2012012560A3 (en) * 2010-07-20 2012-05-18 Box Top Solutions, Inc. Application activity system
CN103051553A (en) * 2013-01-25 2013-04-17 西安电子科技大学 Balanced splitting system and method for network traffics
CN103051553B (en) * 2013-01-25 2015-09-02 西安电子科技大学 The balanced segmenting system of network traffics and dividing method
CN104270427A (en) * 2014-09-18 2015-01-07 用友优普信息技术有限公司 Session control method and device

Also Published As

Publication number Publication date
CN107277095A (en) 2017-10-20

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant