CN109889292B - Time deviation calibration method in three-layer correlation audit - Google Patents

Publication number
CN109889292B
Authority
CN
China
Prior art keywords
sql
sum
url
square
correlation coefficient
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CN201910087096.2A
Other languages
Chinese (zh)
Other versions
CN109889292A (en)
Inventor
岑峰
李武壮
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tongji University
Original Assignee
Tongji University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tongji University
Priority to CN201910087096.2A
Publication of CN109889292A
Application granted
Publication of CN109889292B
Expired - Fee Related
Anticipated expiration

Abstract

The invention relates to a time deviation calibration method in three-layer correlation audit. The method continuously receives HTTP network packets and SQL database packets and divides them into time intervals based on a given time deviation; collects statistics for each interval of the HTTP network packet flow and the SQL database packet flow, and calculates and stores the correlation coefficients; for each SQL, sorts the correlation coefficient values of the URLs corresponding to that SQL from large to small, selects a set number of the largest values, sums them, and stores the summation result together with the set time deviation in a time deviation calculation table; then sets a new candidate time deviation value and repeats; and finally, according to the stored time deviation table, selects the time deviation value corresponding to the maximum correlation coefficient sum as the final time deviation value. Compared with the prior art, the method and the device can accurately estimate the time deviation between the URL end and the SQL end without acquiring the true value of the correspondence between URL and SQL, thereby improving the accuracy of the association between URL and SQL.

Description

Time deviation calibration method in three-layer correlation audit
Technical Field
The invention relates to a time deviation calibration method, in particular to a time deviation calibration method in three-layer correlation audit.
Background
Three-layer correlation audit is a technology that performs correlation analysis by combining application-layer audit data with database-layer audit data, so that application-layer operations can be accurately matched to database-layer operations. The three layers are the browser, the WEB server and the database server. Normally, the WEB server sends SQL commands to a separate database server in response to the actions submitted by the user. With correlation audit, one can look up the database accesses triggered by a given URL request, and also look up the URL request that triggered a given database access, so that the real visitor can be traced. Roughly, the correlation function works as follows: the system audits two types of events, those between the browser and the WEB server and those between the WEB server and the database server; the former are HTTP events and the latter are SQL events. An HTTP event corresponds to an HTTP packet, and an SQL event corresponds to an SQL database packet. By associating HTTP events with SQL events, that is, by performing correlation audit on HTTP packets and SQL database packets, the accessing resource account is finally associated with the related database operations. The information frequently used for this includes the user name, access IP address, access time, end time, Web server address, Web server IP, SQL statement, database name, database table name, port and execution result. Among these, the access time and end time are among the most important, and the accuracy of the time information strongly affects the accuracy of the association between URL and SQL.
In practical applications of three-layer correlation audit, there are three common correlation methods. The first performs text analysis on the HTTP network packets and SQL database packets. The second is based on machine learning: a complete training data set is collected in advance, in which each sample takes the URL-related and SQL-related information as its features and whether the URL is associated with the SQL as its label. The third is a statistics-based method for calculating the correlation coefficient between SQL and URL: it computes the correlation coefficient from the distribution of the URL data and SQL data over the time intervals, and uses it to predict the relationship between URL and SQL.
In all three methods, the time information at the URL end and at the SQL end is extremely important, and it is generally assumed to be aligned. In actual application scenarios, however, the URL-related data and the SQL-related data are collected and sent by two different listeners, responsible for monitoring the network server and the SQL database respectively. The clocks of the two listeners are not synchronized, so a time deviation exists; this misaligns the sampling time intervals and seriously degrades the accuracy of the association.
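The misalignment described above can be made concrete with a minimal sketch. The helper name `interval_index`, the 1-second interval length and the 0.4-second clock offset are illustrative assumptions and do not come from the patent.

```python
import math

def interval_index(timestamp: float, interval_len: float, offset: float = 0.0) -> int:
    """Index of the sampling interval containing `timestamp` after
    subtracting a clock offset (hypothetical helper)."""
    return math.floor((timestamp - offset) / interval_len)

# A URL request stamped 10.9 s by the web listener triggers an SQL statement
# that the database listener, whose clock is assumed to run 0.4 s ahead,
# stamps at 11.3 s.
url_ts, sql_ts = 10.9, 11.3

# Without correction the two events fall into different 1-second intervals.
uncorrected = (interval_index(url_ts, 1.0), interval_index(sql_ts, 1.0))

# Subtracting the assumed 0.4 s offset from the SQL timestamp realigns them.
corrected = (interval_index(url_ts, 1.0), interval_index(sql_ts, 1.0, offset=0.4))
```

In this toy case the uncorrected pair lands in two adjacent intervals, so a per-interval statistic would not see the two events together, while the corrected pair shares one interval.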
Disclosure of Invention
The invention aims to overcome the defects of the prior art and provide a time deviation calibration method in three-layer correlation audit.
The purpose of the invention can be realized by the following technical scheme:
a time offset calibration method in three-layer correlation audit comprises the following steps:
step S1: continuously receiving an HTTP network packet and an SQL database packet;
step S2: dividing time intervals of the HTTP network packet flow and the SQL database packet flow according to given time deviation;
step S3: counting the number and the square of each URL in each sampling time interval in the HTTP network packet flow, and respectively accumulating to obtain the number sum and the square sum;
step S4: counting the quantity and the square of each SQL in each sampling time interval in the SQL database packet flow, and respectively accumulating to obtain a quantity sum and a square sum;
step S5: counting the product of the number of each URL and SQL in each sampling time interval, and accumulating to obtain a product sum;
step S6: obtaining and storing correlation coefficients based on the quantity sum and the square sum of each URL, the quantity sum and the square sum of each SQL and the product sum;
step S7: repeating steps S1 to S6, and updating the stored correlation coefficient;
step S8: for each SQL, sorting the correlation coefficient values corresponding to each URL from large to small, selecting the maximum set number of correlation coefficient values, summing the correlation coefficient values, and storing the summation result and the set time deviation into a time deviation calculation table;
step S9: resetting new possible time offset values and repeating steps S1-S8 until all possible time offset values are traversed;
step S10: selecting, according to the stored time deviation table, the time deviation value corresponding to the maximum correlation coefficient sum as the final time deviation value.
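Steps S1 to S10 can be sketched end to end as follows. This is a minimal illustration under stated assumptions, not the patented implementation: events are assumed to be `(timestamp, key)` tuples already collected in memory, the offset is applied by shifting the SQL timestamps, a standard sample Pearson coefficient over a fixed number of intervals stands in for the correlation coefficient of step S6, and the names `count_series`, `pearson` and `calibrate` are hypothetical.

```python
import math

def count_series(events, interval_len, n_intervals, offset=0.0):
    """Per-key list of counts in each sampling interval, after subtracting
    `offset` from every timestamp (steps S2 to S4)."""
    series = {}
    for ts, key in events:
        idx = math.floor((ts - offset) / interval_len)
        if 0 <= idx < n_intervals:
            series.setdefault(key, [0] * n_intervals)[idx] += 1
    return series

def pearson(xs, ys):
    """Sample Pearson coefficient; 0.0 if either series is constant (steps S5, S6)."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    cov = n * sum(x * y for x, y in zip(xs, ys)) - sx * sy
    vx = n * sum(x * x for x in xs) - sx * sx
    vy = n * sum(y * y for y in ys) - sy * sy
    return cov / math.sqrt(vx * vy) if vx > 0 and vy > 0 else 0.0

def calibrate(url_events, sql_events, interval_len, n_intervals, offsets, k):
    """Return (best_offset, table), where table holds (offset, score) rows."""
    url_series = count_series(url_events, interval_len, n_intervals)
    table = []  # the time deviation calculation table of step S8
    for t in offsets:  # step S9: traverse every candidate offset
        sql_series = count_series(sql_events, interval_len, n_intervals, offset=t)
        score = 0.0
        for ys in sql_series.values():
            coeffs = sorted((pearson(xs, ys) for xs in url_series.values()),
                            reverse=True)
            score += sum(coeffs[:k])  # step S8: sum of the k largest values
        table.append((t, score))
    best = max(table, key=lambda row: row[1])[0]  # step S10
    return best, table

# Toy data: every "/a" request triggers query "Q" 0.1 s later, and the SQL
# listener clock is assumed to run 2.0 s ahead of the URL listener clock.
url_events = [(t, "/a") for t in (0.2, 0.4, 3.1, 5.5, 5.6, 5.7, 8.2)]
url_events += [(t, "/b") for t in (1.5, 4.5, 7.5)]
sql_events = [(t + 2.1, "Q") for t, key in url_events if key == "/a"]

best, table = calibrate(url_events, sql_events, interval_len=1.0,
                        n_intervals=10, offsets=[0.0, 1.0, 2.0, 3.0], k=1)
```

On this toy data the candidate offset of 2.0 s makes the "Q" series identical to the "/a" series, so it receives the highest score and is selected.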
The step S3 specifically includes:
step S31: counting the number and the square of each URL in each sampling time interval in the HTTP network packet flow;
step S32: respectively accumulating to obtain a quantity sum and a square sum:
$\sum X = X_1 + X_2 + \cdots + X_N$
$\sum X^2 = X_1^2 + X_2^2 + \cdots + X_N^2$
wherein: $\sum X$ is the sum of the number of each URL, $\sum X^2$ is the sum of squares of each URL, $X_N$ is the number of the N-th URL, $X_N^2$ is the square of the number of the N-th URL, and N is the number of URL types in the interval;
step S33: and storing the number and the square of each URL in each sampling time interval, and accumulating the number sum and the square sum.
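One possible reading of steps S31 to S33 is an accumulator that is updated once per sampling interval. The sketch below assumes that the quantity sum and square sum are kept per URL across intervals, which is what a Pearson-style coefficient needs; the class name `RunningStats` and its fields are hypothetical.

```python
from collections import Counter, defaultdict

class RunningStats:
    """Per-key running sum and sum of squares of interval counts (hypothetical helper)."""

    def __init__(self):
        self.total = defaultdict(int)     # quantity sum per key
        self.total_sq = defaultdict(int)  # square sum per key
        self.intervals = 0                # number of intervals processed

    def add_interval(self, keys):
        """Count each key seen in one interval (S31) and accumulate (S32)."""
        counts = Counter(keys)
        for key, n in counts.items():
            self.total[key] += n
            self.total_sq[key] += n * n
        self.intervals += 1
        return counts  # the caller may store the per-interval counts (S33)

stats = RunningStats()
stats.add_interval(["/login", "/login", "/cart"])  # first sampling interval
stats.add_interval(["/login"])                     # second sampling interval
```

Keeping only running sums means the stored statistics stay constant in size per URL as more intervals arrive, which suits the continuous reception described in step S1.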
The step S4 specifically includes:
step S41: counting the quantity and the square of each SQL in each sampling time interval in the SQL database packet flow;
step S42: respectively accumulating to obtain a quantity sum and a square sum:
$\sum Y = Y_1 + Y_2 + \cdots + Y_M$
$\sum Y^2 = Y_1^2 + Y_2^2 + \cdots + Y_M^2$
wherein: $\sum Y$ is the sum of the number of each SQL, $\sum Y^2$ is the sum of squares of each SQL, $Y_M$ is the number of the M-th SQL, $Y_M^2$ is the square of the number of the M-th SQL, and M is the number of SQL types in the interval;
step S43: and storing the quantity and the square of each SQL in each sampling time interval, and the quantity sum and the square sum obtained by accumulation.
The mathematical expression of the correlation coefficient in step S6 is:
Figure GDA0002385008370000041
wherein: r is the correlation coefficient; the product sum is the cumulative sum of the products of the numbers of each URL and each SQL; |X| is the number of samples taken in step S3; |Y| is the number of samples taken in step S4; and |X ∩ Y| is the number of samples common to the SQL samples and the URL samples.
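The exact expression in the equation image is not reproduced in this text. As an illustration only, the sketch below computes the standard sample Pearson coefficient from the accumulated sums, assuming a single sample count `n` in place of |X|, |Y| and |X ∩ Y| (that is, both series cover the same intervals). The function name `pearson_from_sums` is hypothetical.

```python
import math

def pearson_from_sums(n, sum_x, sum_sq_x, sum_y, sum_sq_y, sum_xy):
    """Sample Pearson coefficient from running sums over n intervals.
    Returns 0.0 when either series has zero variance."""
    cov = n * sum_xy - sum_x * sum_y
    var_x = n * sum_sq_x - sum_x ** 2
    var_y = n * sum_sq_y - sum_y ** 2
    if var_x <= 0 or var_y <= 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)

# Per-interval counts of one URL and one SQL over four sampling intervals.
url_counts = [2, 0, 3, 1]
sql_counts = [4, 0, 6, 2]  # exactly twice the URL counts in every interval

r = pearson_from_sums(
    n=len(url_counts),
    sum_x=sum(url_counts),
    sum_sq_x=sum(x * x for x in url_counts),
    sum_y=sum(sql_counts),
    sum_sq_y=sum(y * y for y in sql_counts),
    sum_xy=sum(x * y for x, y in zip(url_counts, sql_counts)),
)
```

Because the coefficient depends only on the five sums and the sample count, it can be refreshed after each new interval without revisiting earlier packets, which matches the update described in step S7.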
The mathematical expression of the new possible time deviation value in step S9 is:
t←t+t′
wherein: t' is the time offset increment and t is the time offset.
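The update t ← t + t′ amounts to walking a grid of candidate offsets. A minimal sketch follows; the search bounds and increment are assumed values chosen for illustration, and the function name `candidate_offsets` is hypothetical.

```python
def candidate_offsets(start, stop, step):
    """Yield start, start + step, ... up to and including stop."""
    if step <= 0:
        raise ValueError("step must be positive")
    count = int(round((stop - start) / step))
    for i in range(count + 1):
        # start + i * step equals repeated t <- t + t' in exact arithmetic,
        # but does not accumulate floating-point rounding error.
        yield start + i * step

# Assumed search range of -1.0 s to +1.0 s with an increment t' of 0.5 s.
offsets = list(candidate_offsets(-1.0, 1.0, 0.5))
```

A smaller increment gives a finer estimate at the cost of repeating steps S1 to S8 more times.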
A device for realizing the time deviation calibration method in the three-layer correlation audit comprises a network packet monitor, an SQL packet monitor and a correlation coefficient calculation and time deviation calibration module, wherein the time deviation calibration module is respectively connected with the network packet monitor and the SQL packet monitor.
Compared with the prior art, the invention has the following beneficial effects: under the condition of not acquiring the true value of the corresponding relation between the URL and the SQL, the time deviation between the URL end and the SQL end can be accurately estimated, so that the accuracy of the correlation between the URL and the SQL is improved.
Drawings
FIG. 1 is a plot of rf versus accuracy distribution;
FIG. 2 is a schematic flow chart of a correlation coefficient calculation and time offset calibration procedure;
FIG. 3 is a time offset calibration apparatus;
fig. 4 is a diagram of URL sampling intervals and their corresponding offset SQL sampling intervals.
Detailed Description
The invention is described in detail below with reference to the figures and specific embodiments. The present embodiment is implemented on the premise of the technical solution of the present invention, and a detailed implementation manner and a specific operation process are given, but the scope of the present invention is not limited to the following embodiments.
This application provides a time deviation calibration method specifically for the third method described in the Background, namely the statistics-based calculation of the correlation coefficient between SQL and URL, in order to improve the association accuracy of that method; the other two methods can also use the calibration method with minor modifications.
To calibrate the time offset, one simple idea is as follows: using a data set with ground truth for the correspondence between URL and SQL, first select a suitable sampling interval length, then preset a group of candidate time deviation values, execute the correlation coefficient calculation method [1] under each time deviation, and verify the accuracy of the resulting judgments. The time deviation value corresponding to the highest accuracy can be regarded as the real time deviation between the URL data and the SQL data. Once the time deviation is obtained, it can be used in the correlation analysis of the remaining data, making the results more accurate.
However, this concept of obtaining the time offset still has problems. In practical situations, we cannot obtain the true value of the correspondence between URL and SQL because we calculate the correlation coefficient exactly to determine the correspondence between URL and SQL. Therefore, the accuracy cannot be calculated, and we cannot estimate the true time offset by comparing the accuracy under different time offset conditions.
However, the value of the correlation coefficient can be calculated whether or not the true correspondence between URL and SQL is available. We therefore devised a more feasible time offset correction method that uses the correlation coefficient as the metric. In the calculation results for a set of samples, however, each SQL has a correlation coefficient with each URL, and these are stored in a large matrix; the question is how to construct a metric from these correlation coefficients.
Let us return to the correlation coefficient calculation method [1], which is essentially the sample Pearson correlation coefficient. The Pearson correlation coefficient is symmetric: in the current application scenario, the correlation coefficient of a given URL with respect to a given SQL equals that of the SQL with respect to the URL. One of the simplest constructions is to sum the correlation coefficients and take their average [10]. However, when constructing a metric from the correlation coefficients, the coefficients of most interest are the larger ones, close to 1; if all correlation coefficients are included in the calculation, negative values are inevitably included as well, which adversely affects the analysis. Therefore, we propose the following selection criterion for constructing the time-correction metric: for each SQL, the correlation coefficient values corresponding to each URL are sorted from large to small, and the largest k values r_top(i, 1), r_top(i, 2), …, r_top(i, k) are selected and summed to obtain
$\sum_{j=1}^{k} r_{\mathrm{top}}(i, j)$
These per-SQL sums are then added over all SQL types and divided by the product of the number m of SQL types and k to obtain the final time-correction metric rf:
$rf = \frac{1}{m k} \sum_{i=1}^{m} \sum_{j=1}^{k} r_{\mathrm{top}}(i, j)$
When selecting the value of k, we should try to avoid including negative correlation coefficient values while ensuring that k is not too small. Experiments show that rf can replace the accuracy as a valid time deviation metric. The experimental result is shown in FIG. 1: rf and the accuracy have the same distribution over the different time deviations, particularly at the peak.
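The rf metric can be sketched directly from its description: take the k largest coefficients in each SQL row, add them up over all rows, and divide by m times k. The function name `rf_metric` and the example matrix are illustrative assumptions.

```python
def rf_metric(corr, k):
    """Mean of the k largest correlation coefficients of every SQL row.
    corr[i][j] is the coefficient of SQL i with URL j."""
    if not corr or k <= 0:
        raise ValueError("need at least one SQL row and k >= 1")
    total = 0.0
    for row in corr:
        if len(row) < k:
            raise ValueError("k exceeds the number of URLs")
        top_k = sorted(row, reverse=True)[:k]  # r_top(i, 1) ... r_top(i, k)
        total += sum(top_k)
    return total / (len(corr) * k)

# Two SQL types (rows) against three URL types (columns), made-up values.
corr = [
    [0.9, 0.1, -0.3],
    [0.2, 0.8, 0.6],
]
rf = rf_metric(corr, k=2)
```

With k = 2 the negative coefficient in the first row is excluded, which is the effect the selection criterion aims for; choosing k = 3 here would pull that negative value into the metric.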
A device for realizing the time offset calibration method in three-layer correlation audit, as shown in FIG. 3, comprises a network packet listener, an SQL packet listener, and a correlation coefficient calculation and time offset calibration module, wherein the module is connected to the network packet listener and the SQL packet listener respectively. The correlation coefficient calculation and time offset calibration module is a program. The network packet listener continuously acquires HTTP network packets (also referred to herein as URL packets), each containing at least one URL, and outputs them to the program. The SQL packet listener continuously acquires SQL network packets and outputs them to the program. From the received HTTP network packets and SQL network packets, the program calculates the correlation coefficient of each URL with each SQL, and calculates and calibrates the time deviation according to the calculated correlation coefficients.
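The connection between the two listeners and the calculation module can be pictured as two producers feeding one consumer. The sketch below is only an assumed arrangement using an in-process queue and threads; the patent does not specify the transport, and the names `listener` and `calibration_module` and the sample packets are hypothetical.

```python
import queue
import threading

def listener(source, events, out):
    """Forward (source, timestamp, key) tuples to the shared queue."""
    for ts, key in events:
        out.put((source, ts, key))
    out.put((source, None, None))  # sentinel: this listener has finished

def calibration_module(inbox, n_listeners):
    """Collect packets from all listeners until each has sent its sentinel."""
    received = {"url": [], "sql": []}
    done = 0
    while done < n_listeners:
        source, ts, key = inbox.get()
        if ts is None:
            done += 1
        else:
            received[source].append((ts, key))
    return received

inbox = queue.Queue()
threads = [
    threading.Thread(target=listener,
                     args=("url", [(0.1, "/login"), (1.2, "/cart")], inbox)),
    threading.Thread(target=listener,
                     args=("sql", [(0.5, "SELECT"), (1.6, "UPDATE")], inbox)),
]
for t in threads:
    t.start()
received = calibration_module(inbox, n_listeners=len(threads))
for t in threads:
    t.join()
```

Each listener stamps packets with its own clock, so the module receives two streams whose timestamps may disagree by the deviation that the method then estimates.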
A method for calibrating time offset in three-tier correlation audit, as shown in fig. 2, includes:
step S1: continuously receiving an HTTP network packet and an SQL database packet;
step S2: as shown in fig. 4, time interval division is performed on HTTP network packet flow and SQL database packet flow according to a given time offset;
step S3: counting the number and the square of each URL in each sampling time interval in the HTTP network packet flow, and respectively accumulating to obtain the number sum and the square sum, specifically comprising the following steps:
step S31: counting the number and the square of each URL in each sampling time interval in the HTTP network packet flow;
step S32: respectively accumulating to obtain a quantity sum and a square sum:
$\sum X = X_1 + X_2 + \cdots + X_N$
$\sum X^2 = X_1^2 + X_2^2 + \cdots + X_N^2$
wherein: $\sum X$ is the sum of the number of each URL, $\sum X^2$ is the sum of squares of each URL, $X_N$ is the number of the N-th URL, $X_N^2$ is the square of the number of the N-th URL, and N is the number of URL types in the interval;
step S33: and storing the number and the square of each URL in each sampling time interval, and accumulating the number sum and the square sum.
Step S4: counting the quantity and the square of each SQL in each sampling time interval in the SQL database packet flow, and respectively accumulating to obtain the quantity sum and the square sum, wherein the method specifically comprises the following steps:
step S41: counting the quantity and the square of each SQL in each sampling time interval in the SQL database packet flow;
step S42: respectively accumulating to obtain a quantity sum and a square sum:
$\sum Y = Y_1 + Y_2 + \cdots + Y_M$
$\sum Y^2 = Y_1^2 + Y_2^2 + \cdots + Y_M^2$
wherein: $\sum Y$ is the sum of the number of each SQL, $\sum Y^2$ is the sum of squares of each SQL, $Y_M$ is the number of the M-th SQL, $Y_M^2$ is the square of the number of the M-th SQL, and M is the number of SQL types in the interval;
step S43: and storing the quantity and the square of each SQL in each sampling time interval, and the quantity sum and the square sum obtained by accumulation.
Step S5: counting the product of the number of each URL and SQL in each sampling time interval, and accumulating to obtain a product sum;
step S6: and obtaining and storing a correlation coefficient based on the quantity sum and the square sum of each URL, the quantity sum and the square sum of each SQL and the product sum, wherein the mathematical expression of the correlation coefficient is as follows:
Figure GDA0002385008370000075
wherein: r is the correlation coefficient; the product sum is the cumulative sum of the products of the numbers of each URL and each SQL; |X| is the number of samples taken in step S3; |Y| is the number of samples taken in step S4; and |X ∩ Y| is the number of samples common to the SQL samples and the URL samples.
Step S7: repeating steps S1 to S6, and updating the stored correlation coefficient;
step S8: for each SQL, sorting the correlation coefficient values corresponding to each URL from large to small, selecting the maximum set number of correlation coefficient values, summing the correlation coefficient values, and storing the summation result and the set time deviation into a time deviation calculation table;
step S9: resetting a new possible time deviation value, wherein the mathematical expression is as follows:
t←t+t′
wherein: t' is the time offset increment and t is the time offset.
Step S10: selecting, according to the stored time deviation table, the time deviation value corresponding to the maximum correlation coefficient sum as the final time deviation value.

Claims (3)

1. A time offset calibration method in three-layer correlation audit is characterized by comprising the following steps:
step S1: continuously receiving HTTP network packet and SQL database packet,
step S2: dividing the time interval of the HTTP network packet flow and the SQL database packet flow according to the given time deviation,
step S3: counting the number and square of each URL in each sampling time interval in the HTTP network packet flow, respectively accumulating to obtain the number sum and the square sum,
step S4: counting the number and square of each SQL in each sampling time interval in the SQL database packet flow, respectively accumulating to obtain the number sum and the square sum,
step S5: counting the product of the number of each URL and SQL in each sampling time interval, accumulating to obtain the product sum,
step S6: based on the sum of the number and the square of each URL, the sum of the number and the square of each SQL, and the product sum, the correlation coefficient is obtained and stored,
step S7: repeating the steps S1 to S6, updating the stored correlation coefficient,
step S8: for each SQL, the correlation coefficient values corresponding to each URL are sorted from large to small, the maximum set number of correlation coefficient values are selected and summed, the summed result and the set time deviation are stored in a time deviation calculation table,
step S9: resetting new possible time offset values, and repeating steps S1-S8 until all possible time offset values are traversed,
step S10: selecting, according to the stored time deviation table, the time deviation value corresponding to the maximum correlation coefficient sum as the final time deviation value;
the step S3 specifically includes:
step S31: counting the number and square of each URL in each sampling time interval in the HTTP network packet flow,
step S32: respectively accumulating to obtain a quantity sum and a square sum:
$\sum X = X_1 + X_2 + \cdots + X_N$
$\sum X^2 = X_1^2 + X_2^2 + \cdots + X_N^2$
wherein: $\sum X$ is the sum of the number of each URL, $\sum X^2$ is the sum of squares of each URL, $X_N$ is the number of the N-th URL, $X_N^2$ is the square of the number of the N-th URL, and N is the number of URL types in the interval;
step S33: storing the quantity and the square of each URL in each sampling time interval, and the quantity sum and the square sum obtained by accumulation;
the step S4 specifically includes:
step S41: the number and square of each SQL in each sampling time interval in the SQL database packet flow is counted,
step S42: respectively accumulating to obtain a quantity sum and a square sum:
$\sum Y = Y_1 + Y_2 + \cdots + Y_M$
$\sum Y^2 = Y_1^2 + Y_2^2 + \cdots + Y_M^2$
wherein: $\sum Y$ is the sum of the number of each SQL, $\sum Y^2$ is the sum of squares of each SQL, $Y_M$ is the number of the M-th SQL, $Y_M^2$ is the square of the number of the M-th SQL, and M is the number of SQL types in the interval,
step S43: storing the quantity and the square of each SQL in each sampling time interval, and the quantity sum and the square sum obtained by accumulation;
the mathematical expression of the correlation coefficient in step S6 is:
Figure FDA0002624489000000027
wherein: r is the correlation coefficient; the product sum is the cumulative sum of the products of the numbers of each URL and each SQL; |X| is the number of samples taken in step S3; |Y| is the number of samples taken in step S4; and |X ∩ Y| is the number of samples common to the SQL samples and the URL samples.
2. The method of claim 1, wherein the mathematical expression of the new possible time deviation value in step S9 is as follows:
t←t+t′
wherein: t' is the time offset increment and t is the time offset.
3. An apparatus for implementing the time offset calibration method in the three-tier correlation audit according to claim 1 or 2, comprising a network packet listener, an SQL packet listener, and a correlation coefficient calculation and time offset calibration module, wherein the correlation coefficient calculation and time offset calibration module is respectively connected to the network packet listener and the SQL packet listener.
CN201910087096.2A 2019-01-29 2019-01-29 Time deviation calibration method in three-layer correlation audit Expired - Fee Related CN109889292B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910087096.2A CN109889292B (en) 2019-01-29 2019-01-29 Time deviation calibration method in three-layer correlation audit

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910087096.2A CN109889292B (en) 2019-01-29 2019-01-29 Time deviation calibration method in three-layer correlation audit

Publications (2)

Publication Number Publication Date
CN109889292A CN109889292A (en) 2019-06-14
CN109889292B true CN109889292B (en) 2020-10-02

Family

ID=66927295

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910087096.2A Expired - Fee Related CN109889292B (en) 2019-01-29 2019-01-29 Time deviation calibration method in three-layer correlation audit

Country Status (1)

Country Link
CN (1) CN109889292B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112069193A (en) * 2020-08-27 2020-12-11 上海上讯信息技术股份有限公司 Correlation method and device based on asynchronous correlation

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104113598A (en) * 2014-07-21 2014-10-22 蓝盾信息安全技术有限公司 Three-layer auditing method for database
CN107479988A (en) * 2017-08-01 2017-12-15 西安交大捷普网络科技有限公司 Three layers of related auditing method based on DCOM

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20070192381A1 (en) * 2006-02-15 2007-08-16 Padmanabhan Arun K Recalling website customer information across multiple servers located at different sites not directly connected to each other without requiring customer registration
US9305180B2 (en) * 2008-05-12 2016-04-05 New BIS Luxco S.à r.l Data obfuscation system, method, and computer implementation of data obfuscation for secret databases
CN108334547B (en) * 2017-12-27 2020-10-30 中电科华云信息技术有限公司 Data sharing exchange system and method based on big data

Also Published As

Publication number Publication date
CN109889292A (en) 2019-06-14

Similar Documents

Publication Publication Date Title
CN106708818A (en) Pressure testing method and system
CN102612314A (en) Inspection system, simulation method and system, suitability determination method for determination standard value
CN107168063B (en) Soft measurement method based on integrated variable selection type partial least square regression
CN103365965B (en) A kind of aggregation process method and apparatus of data
CN114168906A (en) Mapping geographic information data acquisition system based on cloud computing
CN105577477A (en) IP address geographical positioning system of use probability model based on measurement
CN108345985A (en) A kind of power distribution network Data Quality Assessment Methodology and system
CN107545361A (en) Compare System and method between room
CN109889292B (en) Time deviation calibration method in three-layer correlation audit
CN103577660B (en) Gray scale experiment system and method
CN110830322A (en) Network flow measuring method and system based on probability measurement data structure Sketch with approximate zero error
CN116187621B (en) Carbon emission monitoring method and device
CN106600303A (en) Method and device for assessment of advertisement putting rationality
CN104796282A (en) Evaluating system and evaluating method for deep packet inspection product
CN115932530A (en) Method for calibrating semiconductor detection equipment
CN105740361B (en) The detection method and device of full dose data integrity degree
CN110690982B (en) Method and system for correlation analysis of management performance data of telecommunication network
CN106446405B (en) A kind of integrated circuit device neural net model establishing Method of Sample Selection and device
CN115841108A (en) Metering detection calibration method
CN110487315A (en) A kind of analysis system and method for instrument drift
CN106777313A (en) Based on holographic time scale measurement electric network data calculated value and calculated value Component Analysis method
CN107423222A (en) A kind of method and apparatus for determining test coverage
CN113609449A (en) Inertia measurement device acceleration test data validity evaluation method
CN113376469A (en) Analysis method of power quality disturbance data
CN103164625B (en) A kind of method being estimated each parameter in PAS system by measured data

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20201002