CN113810335A - Method and system for identifying target IP, storage medium and equipment - Google Patents

Method and system for identifying target IP, storage medium and equipment

Info

Publication number
CN113810335A
Authority
CN
China
Prior art keywords
login
target
obtaining
matrix
feature
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010533071.3A
Other languages
Chinese (zh)
Other versions
CN113810335B (en)
Inventor
王璐
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Wuhan Douyu Network Technology Co Ltd
Original Assignee
Wuhan Douyu Network Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Wuhan Douyu Network Technology Co Ltd
Priority to CN202010533071.3A
Publication of CN113810335A
Application granted
Publication of CN113810335B
Legal status: Active
Anticipated expiration

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/14 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
    • H04L63/1441 Countermeasures against malicious traffic
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/14 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
    • H04L63/1408 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic by monitoring network traffic
    • H04L63/1416 Event detection, e.g. attack signature detection
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/14 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
    • H04L63/1408 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic by monitoring network traffic
    • H04L63/1425 Traffic logging, e.g. anomaly detection

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Security & Cryptography (AREA)
  • Computer Hardware Design (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

The invention discloses a method for identifying a target IP (Internet Protocol) address. Because the current time window is adjacent to the previous time window and both windows are short (0.5-1 h), the features of the login IPs in the two adjacent time windows are highly similar overall. Therefore, a first target parameter value, computed for a current IP in the current time window from the eigenvalues and eigenvectors of the feature matrix of the previous time window, accurately reflects how far the features of the current IP deviate from the features in the feature matrix. This value is compared with a threshold, and an IP whose first target parameter value is greater than the threshold is identified as a target IP. The method identifies login IPs promptly and accurately and avoids misidentifying normal IPs; because identification is timely, target IPs can be intercepted and restricted promptly, and the live broadcast network resources they occupy can be released promptly.

Description

Method and system for identifying target IP, storage medium and equipment
Technical Field
The invention relates to the technical field of network live broadcast, in particular to a method, a system, a storage medium and equipment for identifying a target IP.
Background
On a live webcast platform, malicious network attacks from certain target IPs often occur, such as obtaining the platform's free virtual props in batches or posting advertising bullet comments (barrages) in batches, and these attacks occupy live broadcast network resources. In the prior art, suspicious logins are intercepted through an IP frequency rule: if the number of logins or the number of accounts under the same IP is too high, logins under that IP are considered abnormal. This identification method misidentifies IPs such as those of base stations or public internet cafes. The existing method for identifying target IPs therefore has low accuracy and imposes erroneous restrictions on normal IPs.
Disclosure of Invention
In view of the above, the present invention has been made to provide a method and system, storage medium, and device for identifying a target IP that overcome or at least partially solve the above problems.
On one hand, the present application provides the following technical solutions through an embodiment of the present application:
a method for identifying a target IP for a live webcast platform, the method comprising:
obtaining, based on a login event log of the live webcast platform, m IPs that logged in during a previous time window and a feature matrix formed by n feature values of each IP; wherein m and n are positive integers, and the previous time window is 0.5-1 h;
obtaining matrix eigenvalues and eigenvectors based on the feature matrix;
acquiring n feature values of a current IP that logged in during a current time window, wherein the current time window is adjacent to the previous time window and is 0.5-1 h;
obtaining, based on the n feature values of the current IP, the matrix eigenvalues and the eigenvectors, a first target parameter value representing the degree of deviation between the features of the current IP and the features in the feature matrix;
judging whether the first target parameter value is larger than a first target parameter threshold value;
and if the first target parameter value is larger than the first target parameter threshold value, identifying the current IP as a target IP.
Optionally, after identifying the current IP as the target IP, the method further includes:
obtaining login information of each login event of a target IP, wherein the login information comprises a login timestamp T, a login nickname N and whether login is successful or not S;
obtaining, based on historical login events, a feature weight β_T of the login timestamp, a feature weight β_N of the login nickname, and a feature weight β_S of whether the login was successful;
obtaining, based on the feature weights β_T, β_N and β_S and the login information of each login event, a second target parameter value representing the similarity between two login events;
and obtaining a target login event based on the second target parameter value and a second target parameter threshold value.
Optionally, the obtaining, based on the weights β_T, β_N and β_S and the login information of each login event, a second target parameter value representing the similarity between two login events specifically includes:
obtaining the second target parameter value using the following equations:
sim(E_i, E_j) = 1 - dist(E_i, E_j);
[Formula: definition of dist(E_i, E_j); image BDA0002536040030000021 not reproduced]
wherein:
sim(E_i, E_j) is the second target parameter value for login events i and j; dist(E_i, E_j) is the distance between login events i and j; T_i and T_j are the login timestamps of login events i and j; N_i and N_j are the login nickname strings of login events i and j; I(S_i = S_j) indicates whether the login success flags of login events i and j are consistent, taking the value 1 if they are consistent and 0 if they are not.
Optionally, the obtaining, based on historical login events, the weight β_T of the login timestamp feature, the weight β_N of the login nickname feature, and the weight β_S of the login success feature specifically includes:
acquiring a plurality of first login event pairs from a plurality of target IPs, wherein the two login events in each first login event pair belong to the same target IP;
randomly extracting a plurality of second login event pairs from login events that do not belong to a target IP, wherein the two login events in each second login event pair do not belong to the same IP;
obtaining, for each item of login information, a first average distance over the plurality of first login event pairs and a second average distance over the plurality of second login event pairs;
obtaining the weight β_T of the login timestamp feature, the weight β_N of the login nickname feature, and the weight β_S of the login success feature based on the first average distances and the second average distances.
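The text does not state how the two sets of average distances are turned into weights. As a minimal sketch, one plausible reading is that a login field deserves more weight the further apart unrelated events are on that field compared with events from the same target IP. The function name, the gap-based rule, and the normalization so that the weights sum to 1 are all assumptions made for illustration, not the formula of the patent.

```python
def weights_from_average_distances(first_avg, second_avg):
    """Hypothetical rule: weight each field by (second average - first average).

    first_avg:  per-field average distance over pairs from the same target IP.
    second_avg: per-field average distance over pairs from different, non-target IPs.
    Both are dicts keyed by field name, e.g. "T", "N", "S".
    """
    # Assumption: a field only helps if unrelated events are further apart on it.
    gaps = {k: max(second_avg[k] - first_avg[k], 0.0) for k in first_avg}
    total = sum(gaps.values())
    if total == 0.0:
        # No field separates the two groups; fall back to equal weights.
        return {k: 1.0 / len(gaps) for k in gaps}
    return {k: gap / total for k, gap in gaps.items()}


weights = weights_from_average_distances(
    {"T": 0.1, "N": 0.2, "S": 0.3},
    {"T": 0.3, "N": 0.8, "S": 0.5},
)
```

With these made-up averages the nickname field receives the largest weight, because it separates the two groups of pairs most strongly.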
Optionally, the obtaining matrix eigenvalues and eigenvectors based on the feature matrix specifically includes:
zero-centering each column of the feature matrix to obtain a mean-centered feature matrix;
obtaining, based on the mean-centered feature matrix, a covariance matrix according to the following formula:
[Formula: covariance matrix C in terms of X and X^T; image BDA0002536040030000031 not reproduced]
wherein C is the covariance matrix, X is the mean-centered feature matrix, and X^T is the transpose of the mean-centered feature matrix;
obtaining, based on the covariance matrix, the matrix eigenvalues λ_1, λ_2, ..., λ_k and the eigenvectors e_1, e_2, ..., e_k, where k is the number of eigenvalues that contribute the most.
Optionally, the obtaining, based on the n feature values of the current IP, the matrix eigenvalues and the eigenvectors, a first target parameter value representing the degree of deviation between the features of the current IP and the features in the feature matrix specifically includes:
obtaining the first target parameter value using the following equation:
[Formula: Score(x) in terms of x, e_y and λ_y; image BDA0002536040030000032 not reproduced]
wherein: Score(x) is the first target parameter value of the current IP whose features are x; e_y is the y-th eigenvector; λ_y is the corresponding y-th eigenvalue; and y = 1, 2, ..., k.
Optionally, after obtaining the target login event, the method further includes:
and limiting the functions of the target IP and/or the account related to the target login event.
On the other hand, the present application provides a system for identifying a target IP through another embodiment of the present application, where the system is used for a live webcast platform, and the system includes:
a first obtaining module, configured to obtain, based on a login event log of the live webcast platform, m IPs that logged in during a previous time window and a feature matrix formed by n feature values of each IP; wherein m and n are positive integers, and the previous time window is 0.5-1 h;
a second obtaining module, configured to obtain matrix eigenvalues and eigenvectors based on the feature matrix;
a first acquisition module, configured to acquire n feature values of a current IP that logged in during a current time window, wherein the current time window is adjacent to the previous time window and is 0.5-1 h;
a third obtaining module, configured to obtain, based on the n feature values of the current IP, the matrix eigenvalues and the eigenvectors, a first target parameter value representing the degree of deviation between the features of the current IP and the features in the feature matrix;
a judging module, configured to judge whether the first target parameter value is greater than a first target parameter threshold;
and an identification module, configured to identify the current IP as a target IP if the first target parameter value is greater than the first target parameter threshold.
The invention discloses a readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the steps of the above-mentioned method.
The invention discloses an apparatus comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor performing the steps of the method.
One or more technical solutions provided in the embodiments of the present application have at least the following technical effects or advantages:
Based on a login event log of the live webcast platform, m IPs that logged in during a previous time window and a feature matrix formed by n feature values of each IP are obtained; matrix eigenvalues and eigenvectors are obtained based on the feature matrix; n feature values of a current IP that logged in during a current time window are acquired, the current time window being adjacent to the previous time window and being 0.5-1 h; a first target parameter value representing the degree of deviation between the features of the current IP and the features in the feature matrix is obtained based on the n feature values of the current IP, the matrix eigenvalues and the eigenvectors; whether the first target parameter value is greater than a first target parameter threshold is judged; and if it is, the current IP is identified as a target IP. Because the current time window is adjacent to the previous time window and the windows are short (0.5-1 h), the features of the login IPs in the two adjacent time windows are highly similar overall. Therefore, the first target parameter value obtained for the current IP of the current time window, using the eigenvalues and eigenvectors of the feature matrix of the previous time window, accurately reflects the degree of deviation of the current IP, and by comparison with the threshold, an IP whose first target parameter value is greater than the threshold is identified as a target IP.
After a target IP is identified, login information of each login event of the target IP is obtained, the login information comprising a login timestamp T, a login nickname N, and whether the login was successful S; a feature weight β_T of the login timestamp, a feature weight β_N of the login nickname, and a feature weight β_S of whether the login was successful are obtained based on historical login events; a second target parameter value representing the similarity between two login events is obtained based on the feature weights β_T, β_N and β_S and the login information of each login event; and a target login event is obtained based on the second target parameter value and a second target parameter threshold. For an identified target login event, the relevant functions of the account involved in that event can be restricted. This avoids the erroneous restrictions caused by restricting the whole target IP uniformly, releases the occupied network resources, and increases live broadcast traffic.
Drawings
In order to illustrate the technical solutions in the embodiments of the present invention more clearly, the drawings used in the description of the embodiments are briefly introduced below. The drawings described below are only some embodiments of the present invention, and those skilled in the art can obtain other drawings from them without creative effort.
FIG. 1 is a flow diagram of a method of identifying a target IP in one embodiment of the invention;
FIG. 2 is a system architecture diagram for identifying a target IP in one embodiment of the invention.
Detailed Description
The embodiment of the application provides a method, a system, a storage medium and equipment for identifying a target IP, and solves the technical problem that the method for identifying the target IP is low in accuracy rate and causes error limitation on a normal IP.
In order to solve the technical problems, the general idea of the embodiment of the application is as follows:
A method for identifying a target IP (Internet Protocol) address: based on a login event log of the live webcast platform, m IPs that logged in during a previous time window and a feature matrix formed by n feature values of each IP are obtained; matrix eigenvalues and eigenvectors are obtained based on the feature matrix; n feature values of a current IP that logged in during a current time window are acquired, the current time window being adjacent to the previous time window and being 0.5-1 h; a first target parameter value representing the degree of deviation between the features of the current IP and the features in the feature matrix is obtained based on the n feature values of the current IP, the matrix eigenvalues and the eigenvectors; whether the first target parameter value is greater than a first target parameter threshold is judged; and if it is, the current IP is identified as a target IP. Because the current time window is adjacent to the previous time window and the windows are short (0.5-1 h), the features of the login IPs in the two adjacent time windows are highly similar overall. Therefore, the first target parameter value obtained for the current IP of the current time window, using the eigenvalues and eigenvectors of the feature matrix of the previous time window, accurately reflects the degree of deviation of the current IP, and by comparison with the threshold, an IP whose first target parameter value is greater than the threshold is identified as a target IP.
After a target IP is identified, login information of each login event of the target IP is obtained, the login information comprising a login timestamp T, a login nickname N, and whether the login was successful S; a feature weight β_T of the login timestamp, a feature weight β_N of the login nickname, and a feature weight β_S of whether the login was successful are obtained based on historical login events; a second target parameter value representing the similarity between two login events is obtained based on the feature weights β_T, β_N and β_S and the login information of each login event; and a target login event is obtained based on the second target parameter value and a second target parameter threshold. For an identified target login event, the relevant functions of the account involved in that event can be restricted. This avoids the erroneous restrictions caused by restricting the whole target IP uniformly, releases the occupied network resources, and increases live broadcast traffic.
In order to better understand the technical solution, the technical solution will be described in detail with reference to the drawings and the specific embodiments.
First, it is stated that the term "and/or" appearing herein merely describes an association between associated objects and indicates that three relationships may exist; for example, A and/or B may mean: A exists alone, A and B exist simultaneously, or B exists alone. In addition, the character "/" herein generally indicates that the objects before and after it are in an "or" relationship.
Example one
The present embodiment provides a method for identifying a target IP, which is used for a live webcast platform, and referring to fig. 1, the method of the present embodiment includes the following steps:
S100: obtaining, based on a login event log of the live webcast platform, m IPs that logged in during a previous time window and a feature matrix formed by n feature values of each IP; wherein m and n are positive integers, and the previous time window is 0.5-1 h;
S200: obtaining matrix eigenvalues and eigenvectors based on the feature matrix;
S300: acquiring n feature values of a current IP that logged in during a current time window, wherein the current time window is adjacent to the previous time window and is 0.5-1 h;
S400: obtaining, based on the n feature values of the current IP, the matrix eigenvalues and the eigenvectors, a first target parameter value representing the degree of deviation between the features of the current IP and the features in the feature matrix;
S500: judging whether the first target parameter value is greater than a first target parameter threshold;
S600: if the first target parameter value is greater than the first target parameter threshold, identifying the current IP as a target IP.
It should be noted that the IP in this embodiment refers to a network IP address used by a user to log in a live webcast platform, and the user may be a person participating in live webcast or an electronic device participating in live webcast interaction, such as an intelligent robot.
The method for identifying a target IP provided by this embodiment can be applied to scenarios that require identifying target IPs which take part in live room activities through illegal, malicious network attacks, such as obtaining the platform's free virtual props in batches or posting advertising barrages in batches. The method may be performed by a device for identifying target IPs, which may be implemented in software and/or hardware and is typically integrated in a terminal, such as a server corresponding to the live broadcast platform.
Referring to fig. 1, the method of the present embodiment is performed as follows:
First, S100 is executed: based on a login event log of the live webcast platform, m IPs that logged in during a previous time window and a feature matrix formed by n feature values of each IP are obtained; wherein m and n are positive integers, and the previous time window is 0.5-1 h.
It can be understood that the feature matrix must be obtained first so that the matrix eigenvalues and eigenvectors used to identify the current IP can be obtained later. The behavior of users logging in to the platform is recorded in the login event log, which may contain information such as the login account id, the login account nickname, the login timestamp, and whether the login was successful.
From the information in this log, the following 3 features can be computed over the login events under each login IP:
Feature 1: standard deviation of nickname length (the nickname length is the number of characters in the nickname text);
Feature 2: proportion of the most frequent nickname pattern (the nickname pattern is obtained by converting each character of the nickname text into L for a lowercase English letter, U for an uppercase English letter, D for a digit, C for a Chinese character, and O for any other character);
Feature 3: number of accounts logged in (deduplicated, i.e. the same account id is counted only once).
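The three features can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the function names are hypothetical, login events are reduced to (account id, nickname) pairs, the population standard deviation is used (the text does not say which variant), and "Chinese character" is approximated by the CJK Unified Ideographs block.

```python
from collections import Counter
from statistics import pstdev


def nickname_pattern(nickname):
    """Map each character to L, U, D, C or O as described for Feature 2."""
    out = []
    for ch in nickname:
        if "a" <= ch <= "z":
            out.append("L")          # lowercase English letter
        elif "A" <= ch <= "Z":
            out.append("U")          # uppercase English letter
        elif "0" <= ch <= "9":
            out.append("D")          # digit
        elif "\u4e00" <= ch <= "\u9fff":
            out.append("C")          # assumption: CJK Unified Ideographs only
        else:
            out.append("O")          # anything else
    return "".join(out)


def ip_features(events):
    """events: non-empty list of (account_id, nickname) logins under one IP."""
    nicknames = [nick for _, nick in events]
    # Feature 1: standard deviation of nickname length (population form assumed).
    length_std = pstdev([len(n) for n in nicknames])
    # Feature 2: share of the most frequent nickname pattern.
    patterns = Counter(nickname_pattern(n) for n in nicknames)
    top_share = patterns.most_common(1)[0][1] / len(nicknames)
    # Feature 3: number of distinct account ids.
    n_accounts = len({account for account, _ in events})
    return [length_std, top_share, n_accounts]


features = ip_features([
    ("a1", "user001"), ("a2", "user002"), ("a3", "user003"), ("a1", "user001"),
])
```

For this made-up batch of machine-like nicknames, every nickname has the same length and the same pattern, so Feature 1 is 0 and Feature 2 is 1, which is the kind of profile the description associates with an abnormal IP.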
It should be noted that the login event herein refers to a login behavior occurring under each login IP, that is, an event of logging in a platform by using the IP. Each login event can comprise information such as login account id, login account nickname, login timestamp, whether login is successful and the like.
Because the data in the login event log are objective data generated by login events and recorded by the live webcast platform, the login account id, login account nickname, login timestamp, login success flag and similar information in the log are used to derive the 3 features above for the subsequent identification of target IPs.
The reasons for selecting these 3 features are as follows. Accounts of abnormal users are usually registered by a registration machine, so their nickname patterns are very similar. Nickname lengths of accounts under a normal IP are relatively random, so the standard deviation of nickname length is larger, whereas under an abnormal IP it is smaller. The proportion of the most frequent nickname pattern and the standard deviation of nickname length can therefore characterize the degree of abnormality of an IP. In addition, because of resource limitations, abnormal users usually log in more accounts under the same login IP than are logged in under a normal IP, so the number of logged-in accounts also objectively characterizes the degree of abnormality of an IP.
On this basis, it is clear to those skilled in the art that the three features selected in step S100 of this embodiment, namely the standard deviation of nickname length, the proportion of the most frequent nickname pattern, and the number of logged-in accounts, are all parameters needed to further improve identification accuracy. They are traces left by user activity and exist objectively; they are not chosen by subjective human factors but are obtained objectively from log data (that is, selected in accordance with natural laws) in order to solve the technical problem, and they provide the data basis for S200-S600 below.
To make it convenient to extract the matrix eigenvalues and eigenvectors later, the m IPs that logged in during the previous time window and the n feature values of each IP can be arranged into the feature matrix. In this embodiment, the n feature values are the values of the 3 features above for each IP.
Next, S200 is executed: matrix eigenvalues and eigenvectors are obtained based on the feature matrix.
In this embodiment, the matrix eigenvalues and eigenvectors may be obtained according to the following steps:
zero-centering each row of the feature matrix, that is, subtracting the mean of each row from every value in that row, to obtain a mean-centered feature matrix;
obtaining, based on the mean-centered feature matrix, a covariance matrix according to the following formula:
[Formula: covariance matrix C in terms of X and X^T; image BDA0002536040030000091 not reproduced]
wherein C is the covariance matrix, X is the mean-centered feature matrix, and X^T is the transpose of the mean-centered feature matrix;
obtaining, based on the covariance matrix, the matrix eigenvalues λ_1, λ_2, ..., λ_k and the eigenvectors e_1, e_2, ..., e_k, where k is the number of eigenvalues that contribute the most.
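The steps above amount to a principal component analysis of the previous window's feature matrix. The sketch below is a dependency-free illustration, not the implementation of the patent, and rests on several assumptions: the matrix is laid out with one row per IP and one column per feature, each feature is centered on its own mean, the covariance is normalized by m (the formula image is not available, so 1/(m-1) is equally possible), and the eigen-decomposition uses cyclic Jacobi rotations as a stand-in for a library routine such as `numpy.linalg.eigh`.

```python
import math


def center_features(rows):
    """Subtract each feature's mean; rows are IPs, columns are features (assumed)."""
    m, n = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / m for j in range(n)]
    return [[r[j] - means[j] for j in range(n)] for r in rows], means


def covariance(centered):
    """n x n covariance matrix, normalized by m (assumed normalization)."""
    m, n = len(centered), len(centered[0])
    return [[sum(r[a] * r[b] for r in centered) / m for b in range(n)]
            for a in range(n)]


def jacobi_eigen(sym, sweeps=50, tol=1e-18):
    """Eigenvalues (descending) and eigenvectors of a small symmetric matrix."""
    n = len(sym)
    a = [row[:] for row in sym]
    v = [[float(i == j) for j in range(n)] for i in range(n)]
    for _ in range(sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off <= tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                num, diff = 2.0 * a[p][q], a[q][q] - a[p][p]
                if diff < 0:  # keep the rotation angle within [-pi/4, pi/4]
                    num, diff = -num, -diff
                theta = 0.5 * math.atan2(num, diff)
                c, s = math.cos(theta), math.sin(theta)
                for k in range(n):  # A <- A J (columns p and q)
                    a[k][p], a[k][q] = c * a[k][p] - s * a[k][q], s * a[k][p] + c * a[k][q]
                for k in range(n):  # A <- J^T A (rows p and q)
                    a[p][k], a[q][k] = c * a[p][k] - s * a[q][k], s * a[p][k] + c * a[q][k]
                for k in range(n):  # V <- V J, eigenvectors are columns of V
                    v[k][p], v[k][q] = c * v[k][p] - s * v[k][q], s * v[k][p] + c * v[k][q]
    pairs = sorted(((a[i][i], [v[r][i] for r in range(n)]) for i in range(n)),
                   key=lambda t: t[0], reverse=True)
    return [lam for lam, _ in pairs], [vec for _, vec in pairs]


def top_components(rows, k):
    """Feature means plus the k largest eigenvalues and their eigenvectors."""
    centered, means = center_features(rows)
    eigvals, eigvecs = jacobi_eigen(covariance(centered))
    return means, eigvals[:k], eigvecs[:k]


means, eigvals, eigvecs = top_components([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]], 2)
```

In this toy matrix both features move together, so all the variance lies along the direction (1, 1) and the second eigenvalue is zero.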
Then S300 is executed: n feature values of a current IP that logged in during a current time window are acquired, wherein the current time window is adjacent to the previous time window and is 0.5-1 h.
It can be understood that, after the eigenvalues and eigenvectors of the feature matrix of the previous time window are obtained, the n feature values of the current IP that logged in during the current time window must first be acquired so that the current IP can be identified promptly. In addition, to ensure accurate as well as timely identification, in this embodiment the current time window is adjacent to the previous time window and both windows are 0.5-1 h long. Because the windows are adjacent and short, the features of the login IPs in the two windows are highly similar overall, which provides the theoretical basis for S400 below.
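One simple way to obtain two adjacent windows of equal length is to bucket login timestamps into fixed, aligned intervals. The sketch below assumes Unix timestamps in seconds, events given as (timestamp, ip) pairs, and a hypothetical helper name; the text itself only requires that the windows be adjacent and 0.5-1 h long.

```python
def split_adjacent_windows(events, now, window_seconds=1800):
    """Return (previous_window_events, current_window_events).

    events: list of (timestamp_seconds, ip) pairs.
    now: a timestamp inside the current window.
    window_seconds: window length; 1800-3600 s corresponds to 0.5-1 h.
    """
    if not 1800 <= window_seconds <= 3600:
        raise ValueError("window length should be between 0.5 h and 1 h")
    current = int(now // window_seconds)  # index of the window containing `now`
    previous_events = [e for e in events if int(e[0] // window_seconds) == current - 1]
    current_events = [e for e in events if int(e[0] // window_seconds) == current]
    return previous_events, current_events


previous_events, current_events = split_adjacent_windows(
    [(100, "a"), (1900, "b"), (3700, "c"), (5000, "d")], now=4000
)
```

Here the window containing `now` covers seconds 3600-5399, so events "c" and "d" are current, "b" belongs to the previous window, and "a" is too old to be used.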
Next, S400 is executed: based on the n feature values of the current IP, the matrix eigenvalues and the eigenvectors, a first target parameter value representing the degree of deviation between the features of the current IP and the features in the feature matrix is obtained.
In a specific implementation process, in order to identify whether the current IP is a target IP with abnormal features, a first target parameter value which characterizes the deviation degree of the features of the current IP from the features in the feature matrix needs to be obtained firstly.
Illustratively, the first target parameter value is obtained using the following formula:
[Formula: Score(x) in terms of x, e_y and λ_y; image BDA0002536040030000101 not reproduced]
wherein: Score(x) is the first target parameter value of the current IP whose features are x; e_y is the y-th eigenvector; λ_y is the corresponding y-th eigenvalue; and y = 1, 2, ..., k. Here, the feature x consists of the 3 feature values corresponding to the 3 features above.
It should be noted that the principle of the above method and formula is as follows:
The eigenvectors represent the different directions in which the variance of the feature data changes, and each matrix eigenvalue is the variance of the feature data in the corresponding direction. The variance in different directions reflects the intrinsic characteristics of the data. If the features of a single data sample differ from those shown by the data as a whole, so that the sample deviates strongly from the other samples in certain directions, the sample is an outlier.
In the formula, the deviation of the feature x in direction y is normalized so that deviations in different directions are comparable. After the deviations of the data sample in all directions are calculated, they are summed to give the first target parameter value representing the degree of deviation between the features of the current IP and the features in the feature matrix.
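The formula image is not reproduced in this text, so the sketch below uses a commonly used PCA-based outlier score that matches the description: project x onto each eigenvector, square the projection, divide by the eigenvalue to normalize, and sum, i.e. Score(x) = sum over y of (x · e_y)^2 / λ_y. That this is the exact form in the patent is an assumption, as are centering x on the previous window's feature means and skipping directions whose eigenvalue is (near) zero.

```python
def deviation_score(x, means, eigvals, eigvecs, eps=1e-12):
    """Assumed form: sum over directions of (projection squared / eigenvalue).

    x:       feature values of the current IP.
    means:   per-feature means of the previous window (centering is an assumption).
    eigvals: eigenvalues lambda_1..lambda_k of the previous window's covariance.
    eigvecs: matching eigenvectors e_1..e_k, each a list as long as x.
    """
    centered = [xi - mu for xi, mu in zip(x, means)]
    score = 0.0
    for lam, e in zip(eigvals, eigvecs):
        if lam <= eps:
            continue  # assumption: ignore directions with no variance
        projection = sum(c * ei for c, ei in zip(centered, e))
        score += projection * projection / lam  # normalize by the variance
    return score


score = deviation_score([2.0, 3.0], [0.0, 0.0], [4.0, 1.0], [[1.0, 0.0], [0.0, 1.0]])
```

In this toy case the sample is 1 standard deviation away along the first direction and 3 along the second, so the second direction dominates the score even though the raw offsets (2 and 3) are similar.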
After the first target parameter value is obtained, S500 and S600 are executed: whether the first target parameter value is greater than a first target parameter threshold is judged; and if the first target parameter value is greater than the first target parameter threshold, the current IP is identified as a target IP.
In this embodiment, IPs with abnormal features, such as proxy IPs, are identified among the login IPs, and the first target parameter values of these IPs are calculated and sorted by magnitude. After sorting, a quantile is chosen as the threshold according to the required identification accuracy: the higher the quantile, the fewer target IPs are identified and the more accurate the identified target IPs are, although some may be missed. To obtain accurate results while missing as few target IPs as possible, the 95% quantile is taken as the first target parameter threshold in this embodiment.
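A quantile-based threshold can be sketched as follows. The text does not say how the quantile is computed, so the nearest-rank method used here is an assumption, as are the function names and the choice to pass the quantile as an integer percentage.

```python
def quantile_threshold(scores, pct=95):
    """Nearest-rank quantile of the scores (assumed quantile definition)."""
    ordered = sorted(scores)
    rank = (pct * len(ordered) + 99) // 100  # ceil(pct * n / 100) in integers
    return ordered[max(rank, 1) - 1]


def flag_target_ips(score_by_ip, pct=95):
    """Return the IPs whose score is strictly greater than the threshold."""
    threshold = quantile_threshold(list(score_by_ip.values()), pct)
    return [ip for ip, s in score_by_ip.items() if s > threshold]


score_by_ip = {f"ip{i}": float(i) for i in range(1, 101)}
threshold = quantile_threshold(list(score_by_ip.values()))
flagged = flag_target_ips(score_by_ip)
```

With 100 made-up scores from 1 to 100 the threshold is 95, so the five highest-scoring IPs are flagged; raising `pct` flags fewer IPs, which mirrors the trade-off between accuracy and missed detections described above.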
The above describes the complete process of identifying one current IP. By following these steps, the login IPs in any time period can be identified, yielding the set of target IPs for that period.
After a target IP is identified, its behavior can be restricted. However, normal user login events may also exist under the target IP, and restricting all of them uniformly would wrongly restrict those users. It is therefore necessary to further identify the target login events under the target IP.
As an optional embodiment, after identifying the current IP as the target IP, the method further comprises:
first, obtaining login information of each login event of the target IP, wherein the login information comprises a login timestamp T, a login nickname N, and whether the login succeeded S;
second, obtaining, based on historical login events, a feature weight β_T of the login timestamp, a feature weight β_N of the login nickname, and a feature weight β_S of whether the login succeeded;
third, obtaining, based on the feature weights β_T, β_N and β_S and the login information of each login event, a second target parameter value representing the similarity between two login events;
fourth, obtaining a target login event based on the second target parameter value and a second target parameter threshold.
It should be noted that, to identify a target login event, the login information of each login event of the target IP must first be obtained; in this embodiment, the login information includes a login timestamp T, a login nickname N, and whether the login succeeded S. This login information is also taken from the login event log and is therefore objective. Because the subsequent process only needs the similarity of two login events, typical login information such as the login timestamp T, the login nickname N and the login result S can be selected as the basic data as needed.
Next, based on the weight β_T, the weight β_N, the weight β_S and the login information of each login event, a second target parameter value representing the similarity between two login events is obtained.
For example, the second target parameter value may be obtained by using the following formula:
sim(E_i, E_j) = 1 - dist(E_i, E_j);
Figure BDA0002536040030000121
wherein:
sim(E_i, E_j) is the second target parameter value for login events i and j; dist(E_i, E_j) is the distance between login events i and j; T_i and T_j are the login timestamps of login events i and j; N_i and N_j are the login nickname strings of login events i and j; I(S_i = S_j) indicates whether the login results of login events i and j are consistent, taking the value 1 if they are consistent and 0 if they are not.
The principle of the above formula is as follows. The timestamp of a login event is a numerical variable, so its distance is the difference between the two values, i.e., the Manhattan distance; to normalize it to the range 0-1, the difference is passed through a function that yields 1 when the difference is 0 and tends to 0 as the difference grows. The nickname of a login account can be regarded as a set of characters, so the Jaccard distance between sets is used, which expresses the string similarity of the account nicknames. Whether the login succeeded is a discrete variable with only two states, yes or no, so the distance is 0 when the two states are the same and 1 when they differ.
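The three distance components and their weighted combination can be sketched as follows. This is a minimal sketch, not the formula of the figure: the normalizing function `exp(-|T_i - T_j| / scale)`, which is 1 at zero difference and tends to 0 for large differences, and its `scale` parameter are assumptions, and its complement is used here as the time distance. The names `LoginEvent`, `time_distance`, `jaccard_distance`, `status_distance` and `similarity` are hypothetical, and the weights are assumed to sum to 1.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class LoginEvent:
    timestamp: float  # login time in seconds
    nickname: str
    success: bool


def time_distance(t_i: float, t_j: float, scale: float = 300.0) -> float:
    """Absolute time difference squashed into [0, 1); `scale` is an assumption."""
    return 1.0 - math.exp(-abs(t_i - t_j) / scale)


def jaccard_distance(n_i: str, n_j: str) -> float:
    """One minus intersection size over union size of the character sets."""
    a, b = set(n_i), set(n_j)
    if not a and not b:
        return 0.0
    return 1.0 - len(a & b) / len(a | b)


def status_distance(s_i: bool, s_j: bool) -> float:
    """0 when both logins have the same result, 1 otherwise."""
    return 0.0 if s_i == s_j else 1.0


def similarity(
    e_i: LoginEvent,
    e_j: LoginEvent,
    beta_t: float,
    beta_n: float,
    beta_s: float,
    scale: float = 300.0,
) -> float:
    """sim = 1 - dist, with dist a weighted sum of the three distances."""
    dist = (
        beta_t * time_distance(e_i.timestamp, e_j.timestamp, scale)
        + beta_n * jaccard_distance(e_i.nickname, e_j.nickname)
        + beta_s * status_distance(e_i.success, e_j.success)
    )
    return 1.0 - dist


# Hypothetical events; timestamps are seconds since an arbitrary origin.
first = LoginEvent(0.0, "sadt001", False)
second = LoginEvent(120.0, "sadt002", False)
sim_value = similarity(first, second, beta_t=0.4, beta_n=0.4, beta_s=0.2)
```

Because each component lies between 0 and 1 and the weights sum to 1, the resulting similarity also stays between 0 and 1.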
However, since each login information contributes to the similarity of login events differently, in order to improve the identification accuracy, a weight needs to be added to the feature of each login information. It is now common to determine the weights manually. In this embodiment, a method for determining a feature weight of each login information is provided:
acquiring a plurality of first login event pairs from a plurality of target IPs, wherein the two login events in each first login event pair belong to the same target IP;
randomly extracting a plurality of second login event pairs from login events which do not belong to a target IP, wherein two login events in the second login event pairs do not belong to the same IP;
respectively obtaining a first average distance of each login information of the plurality of first login event pairs and a second average distance of each login information of the plurality of second login event pairs;
obtaining, based on the first average distance and the second average distance, a feature weight β_T of the login timestamp, a feature weight β_N of the login nickname, and a feature weight β_S of whether the login succeeded.
It should be noted that the first average distance of each item of login information over the first login event pairs, and the second average distance of each item over the second login event pairs, are obtained using the same distance calculation as for login events i and j in this embodiment: the distance of each login event pair is calculated first, and the distances are then averaged over the pairs.
Next, the feature weight of each login information can be obtained by using the following formula:
Figure BDA0002536040030000131
wherein w_a is the feature weight of login information a, which in this embodiment is the weight β_T of the login timestamp feature, the weight β_N of the login nickname feature, or the weight β_S of the login-success feature; S_a is the first average distance of login information a, and D_a is the second average distance of login information a; S_b is the first average distance of login information b, and D_b is the second average distance of login information b.
The principle of the above formula is as follows. The first login event pairs are each selected under the same target IP; because both events of a pair occur under the same abnormal IP, they can be considered similar. The second login event pairs do not belong to the same IP and can therefore be considered dissimilar. If a feature is important, its similarity is generally low among dissimilar login events and higher among similar ones. The weight is therefore built from the first average distance and the second average distance. To make the weights sum to 1, the ratio of the first average distance to the second average distance of login information a is divided by the sum of those ratios over all login information.
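The normalization step can be sketched as follows. This is a minimal sketch that follows the wording of the description literally (the ratio of the first average distance to the second average distance, divided by the sum of those ratios); the formula itself is given in a figure that is not reproduced here, so the direction of the ratio is taken from the text as an assumption. The function name `feature_weights` and the sample averages are hypothetical.

```python
from typing import Dict


def feature_weights(
    first_avg: Dict[str, float],
    second_avg: Dict[str, float],
) -> Dict[str, float]:
    """Normalize the per-feature ratio S_a / D_a so the weights sum to 1."""
    ratios: Dict[str, float] = {}
    for name, s_a in first_avg.items():
        d_a = second_avg[name]
        if d_a == 0:
            raise ValueError(f"second average distance of {name!r} is zero")
        ratios[name] = s_a / d_a  # ratio direction as worded in the description
    total = sum(ratios.values())
    if total == 0:
        raise ValueError("all ratios are zero, weights are undefined")
    return {name: ratio / total for name, ratio in ratios.items()}


# Hypothetical average distances for timestamp (T), nickname (N), status (S).
first_average = {"T": 0.2, "N": 0.3, "S": 0.1}   # same-target-IP pairs
second_average = {"T": 0.4, "N": 0.6, "S": 0.5}  # different-IP pairs
weights = feature_weights(first_average, second_average)
```

Whatever per-feature ratio is used, dividing by the sum of the ratios is what guarantees that the three weights add up to 1.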
After a second target parameter value accurately representing the similarity between the two login events is obtained, a target login event is obtained based on the second target parameter value and a second target parameter threshold value.
Specifically, a login event pair whose second target parameter value is higher than the second target parameter threshold is identified as a target login event pair, and the login events in that pair are target login events.
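The pairwise selection can be sketched as follows. This is a minimal sketch: the function name `target_event_indices` is hypothetical, and `toy_sim` is only a stand-in similarity used to make the example runnable, not the second target parameter of this embodiment.

```python
from itertools import combinations
from typing import Callable, Sequence, Set, TypeVar

E = TypeVar("E")


def target_event_indices(
    events: Sequence[E],
    sim: Callable[[E, E], float],
    threshold: float,
) -> Set[int]:
    """Indices of events that belong to at least one pair above the threshold."""
    flagged: Set[int] = set()
    for (i, e_i), (j, e_j) in combinations(enumerate(events), 2):
        if sim(e_i, e_j) > threshold:
            flagged.update((i, j))
    return flagged


def toy_sim(a: str, b: str) -> float:
    """Stand-in similarity: share of positions holding the same character."""
    same = sum(ca == cb for ca, cb in zip(a, b))
    return same / max(len(a), len(b))


nicknames = ["sadt001", "sadt002", "support"]
flagged = target_event_indices(nicknames, toy_sim, 0.7)
```

Every unordered pair is compared once, so the cost grows quadratically with the number of login events under one target IP.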
The second target parameter threshold is set as follows: the average second target parameter value of the login event pairs under each target IP is computed, the averages are sorted from small to large, and the 95% quantile is taken as the threshold. If more abnormal events need to be found, the threshold can be lowered; otherwise it can be raised.
For an identified target login event, the related functions of the account involved in that event can be restricted. This avoids the wrongful restriction that a blanket restriction of the target IP would cause, releases the occupied network resources, and increases live broadcast traffic.
The following describes the implementation process of the method of this embodiment by using a practical example:
assume that there are 5 log entries:
(1) the IP is 12.30.34.124, the nickname is sadt001, the login time is 12:34, and the login fails;
(2) the IP is 12.30.34.124, the nickname is sadt002, the login time is 12:36, and the login fails;
(3) the IP is 12.30.34.124, the nickname is support, the login time is 12:39, and the login is successful;
(4) the IP is 35.68.90.11, the nickname is rajan, the login time is 12:41, and the login is successful;
(5) IP is 35.68.90.11, nickname is jordan, login time is 13:41, and login is successful.
First, the elements of the feature matrix are computed.
For IP 12.30.34.124:
the nickname length standard deviation is 0;
the ratio of the most frequent nickname pattern is 0.67;
the number of logged-in accounts is 3.
For IP 35.68.90.11:
the nickname length standard deviation is 0.5;
the ratio of the most frequent nickname pattern is 0.5;
the number of logged-in accounts is 2.
The two IPs then correspond to the features: (0, 0.67, 3; 0.5, 0.5, 2).
Normalizing the features can yield: (-0.25,0.085, 0.5; 0.25, -0.085, -0.5).
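The feature values and the normalization above can be reproduced with the following sketch. It assumes that the standard deviation is the population standard deviation, that a nickname pattern replaces every letter with `a` and every digit with `d`, and that the normalization is zero-mean centering of each column; these assumptions are chosen because they reproduce the numbers of the example, and the function names are hypothetical.

```python
from collections import Counter, defaultdict
from statistics import pstdev
from typing import Dict, List, Sequence, Tuple

# (IP, nickname) pairs taken from the five log entries above.
LOGINS: List[Tuple[str, str]] = [
    ("12.30.34.124", "sadt001"),
    ("12.30.34.124", "sadt002"),
    ("12.30.34.124", "support"),
    ("35.68.90.11", "rajan"),
    ("35.68.90.11", "jordan"),
]


def nickname_pattern(nickname: str) -> str:
    """Assumed pattern: letters become 'a', digits 'd', other characters stay."""
    return "".join(
        "a" if c.isalpha() else "d" if c.isdigit() else c for c in nickname
    )


def ip_features(logins: Sequence[Tuple[str, str]]) -> Dict[str, List[float]]:
    """Per IP: [nickname length std, top pattern ratio, number of accounts]."""
    by_ip: Dict[str, List[str]] = defaultdict(list)
    for ip, nickname in logins:
        by_ip[ip].append(nickname)
    features: Dict[str, List[float]] = {}
    for ip, nicknames in by_ip.items():
        length_std = pstdev([len(n) for n in nicknames])  # population std
        patterns = Counter(nickname_pattern(n) for n in nicknames)
        top_ratio = patterns.most_common(1)[0][1] / len(nicknames)
        features[ip] = [length_std, top_ratio, float(len(set(nicknames)))]
    return features


def center_columns(rows: Sequence[Sequence[float]]) -> List[List[float]]:
    """Subtract each column mean so that every column has zero mean."""
    means = [sum(col) / len(rows) for col in zip(*rows)]
    return [[v - mu for v, mu in zip(row, means)] for row in rows]


features = ip_features(LOGINS)
centered = center_columns(list(features.values()))
```

Without rounding, the centered second feature is about 0.083 rather than 0.085, because the example rounds the pattern ratio 2/3 to 0.67 before centering.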
Then, a first target parameter value is calculated:
assuming that the calculation yields λ1=0.6,λ2=0.3,e1=(-0.7,0.2,0.9),e2=(0.1,-0.5,0.3)
The first target parameter value for IP 12.30.34.124 is then:
Figure BDA0002536040030000151
The first target parameter value for IP 35.68.90.11 is:
Figure BDA0002536040030000152
assume a first target parameter threshold of 1, and therefore 12.30.34.124 is the target IP.
Next, target login events are identified for the target IP:
the second target parameter value between each pair of the three login events under IP 12.30.34.124 is calculated.
Assuming that the feature weights of the 3 login messages are 0.4,0.4 and 0.2, respectively, then:
Figure BDA0002536040030000153
Figure BDA0002536040030000154
Figure BDA0002536040030000155
The above second target parameter values are then 0.73, 0.32 and 0.35, respectively. With the second target parameter threshold set to 0.7, only the event pair (E_1, E_2) exceeds the threshold, so E_1 and E_2 are target login events.
The technical scheme in the embodiment of the application at least has the following technical effects or advantages:
In the method of this embodiment, based on the login event log of the live webcast platform, m IPs that logged in during a previous time window and a feature matrix formed by n feature values of each IP are obtained; matrix eigenvalues and eigenvectors are obtained based on the feature matrix; n feature values of a current IP that logged in during a current time window are obtained, where the current time window is adjacent to the previous time window and is 0.5-1 h long; a first target parameter value representing the degree of deviation between the features of the current IP and the features in the feature matrix is obtained based on the n feature values of the current IP, the matrix eigenvalues and the eigenvectors; whether the first target parameter value is greater than a first target parameter threshold is judged; and if it is, the current IP is identified as a target IP. Because the current time window is adjacent to the previous one and both windows are short (0.5-1 h), the login IPs of the two windows have highly similar overall characteristics. The first target parameter value computed for a current IP from the eigenvalues and eigenvectors of the previous window's feature matrix therefore accurately reflects how far that IP deviates, and an IP whose first target parameter value exceeds the threshold is identified as a target IP.
The method and the device realize timely and accurate identification of the login IP, avoid mistaken identification of the normal IP, and can timely intercept and limit the target IP and timely release occupied live broadcast network resources due to the timeliness of identification.
Example two
Based on the same inventive concept as the embodiment, the embodiment provides a system for identifying a target IP, which is used for a live webcast platform, and referring to fig. 2, the system includes:
the first obtaining module is used for obtaining m IPs logged in a previous time window and a feature matrix formed by n feature values of each IP based on a log of logging events of the live webcast platform; wherein m and n are positive integers, and the previous time window is 0.5-1 h;
a second obtaining module, configured to obtain a matrix eigenvalue and an eigenvector based on the feature matrix;
the first acquisition module is used for acquiring n characteristic values of a current IP logged in a current time window, wherein the current time window is adjacent to the previous time window, and the current time window is 0.5-1 h;
a third obtaining module, configured to obtain, based on the n eigenvalues of the current IP, the matrix eigenvalue, and the eigenvector, a first target parameter value representing a degree of deviation between the feature of the current IP and the feature in the feature matrix;
the judging module is used for judging whether the first target parameter value is larger than a first target parameter threshold value or not;
and the identification module is used for identifying the current IP as the target IP if the first target parameter value is greater than the first target parameter threshold value.
Since the system for identifying a target IP described in this embodiment is a system adopted to implement the method for identifying a target IP described in the first embodiment of this application, a person skilled in the art can understand the specific implementation manner of the system described in this embodiment and various variations thereof based on the method for identifying a target IP described in the first embodiment of this application, and therefore, how to implement the method in the first embodiment using the system described in this embodiment is not described in detail here. The system adopted by a person skilled in the art for implementing the method for identifying the target IP in the embodiment of the present application is within the protection scope of the present application.
Based on the same inventive concept as in the previous embodiments, embodiments of the present invention further provide a readable storage medium, on which a computer program is stored, which when executed by a processor implements the steps of any of the methods described above.
Based on the same inventive concept as in the previous embodiments, an embodiment of the present invention further provides an apparatus, which includes a memory, a processor, and a computer program stored on the memory and executable on the processor, and the processor implements the steps of any of the methods described above when executing the program.
While preferred embodiments of the present invention have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. Therefore, it is intended that the appended claims be interpreted as including preferred embodiments and all such alterations and modifications as fall within the scope of the invention.
It will be apparent to those skilled in the art that various changes and modifications may be made in the present invention without departing from the spirit and scope of the invention. Thus, if such modifications and variations of the present invention fall within the scope of the claims of the present invention and their equivalents, the present invention is also intended to include such modifications and variations.

Claims (9)

1. A method for identifying a target IP (Internet protocol) is used for a network live platform, and is characterized by comprising the following steps:
obtaining m IPs logged in a previous time window and a feature matrix formed by n feature values of each IP based on a log of log events of the network live broadcast platform; wherein m and n are positive integers, and the previous time window is 0.5-1 h;
obtaining a matrix eigenvalue and an eigenvector based on the characteristic matrix;
acquiring n characteristic values of a current IP logged in a current time window, wherein the current time window is adjacent to the previous time window and is 0.5-1 h;
obtaining a first target parameter value representing the deviation degree of the features of the current IP and the features in the feature matrix based on the n feature values of the current IP, the matrix feature value and the feature vector;
judging whether the first target parameter value is larger than a first target parameter threshold value;
if the first target parameter value is greater than the first target parameter threshold value, identifying the current IP as a target IP;
obtaining login information of each login event of the target IP, wherein the login information comprises a login timestamp T, a login nickname N and whether login is successful or not S;
obtaining, based on historical login events, a feature weight β_T of the login timestamp, a feature weight β_N of the login nickname, and a feature weight β_S of whether the login succeeded;
obtaining, based on the feature weight β_T, the feature weight β_N, the feature weight β_S and the login information of each login event, a second target parameter value representing the similarity between two login events;
and obtaining a target login event based on the second target parameter value and a second target parameter threshold value.
2. The method of claim 1, wherein obtaining, based on the weight β_T, the weight β_N, the weight β_S and the login information of each login event, a second target parameter value representing the similarity between two login events specifically comprises:
obtaining the second target parameter value using the following equation:
sim(E_i, E_j) = 1 - dist(E_i, E_j);
Figure FDA0002536040020000021
wherein:
sim(E_i, E_j) is the second target parameter value for login events i and j; dist(E_i, E_j) is the distance between login events i and j; T_i and T_j are the login timestamps of login events i and j; N_i and N_j are the login nickname strings of login events i and j; I(S_i = S_j) indicates whether the login results of login events i and j are consistent, taking the value 1 if they are consistent and 0 if they are not.
3. The method of claim 2, wherein obtaining, based on historical login events, the weight β_T of the login timestamp feature, the weight β_N of the login nickname feature and the weight β_S of the login-success feature specifically comprises:
acquiring a plurality of first login event pairs from a plurality of target IPs, wherein the two login events in each first login event pair belong to the same target IP;
randomly extracting a plurality of second login event pairs from login events which do not belong to a target IP, wherein two login events in the second login event pairs do not belong to the same IP;
respectively obtaining a first average distance of each login information of the plurality of first login event pairs and a second average distance of each login information of the plurality of second login event pairs;
obtaining, based on the first average distance and the second average distance, a feature weight β_T of the login timestamp, a feature weight β_N of the login nickname, and a feature weight β_S of whether the login succeeded.
4. The method according to claim 1, wherein the obtaining matrix eigenvalues and eigenvectors based on the eigen matrix specifically comprises:
performing zero-mean centering on each column of the feature matrix to obtain a zero-mean feature matrix;
based on the zero-mean feature matrix, obtaining a covariance matrix according to the following formula:
Figure FDA0002536040020000022
wherein C is the covariance matrix, X is the zero-mean feature matrix, and X^T is the transpose of the zero-mean feature matrix;
obtaining the matrix eigenvalues λ_1, λ_2, ..., λ_k and the eigenvectors e_1, e_2, ..., e_k based on the covariance matrix, where k represents the number of eigenvalues contributing the most.
5. The method as claimed in claim 4, wherein said obtaining a first target parameter value characterizing a degree of deviation of the feature of the current IP from the features in the feature matrix based on the n feature values of the current IP, the matrix feature values and the feature vectors specifically comprises:
obtaining the first target parameter value using the following equation:
Figure FDA0002536040020000031
wherein: Score(x) is the first target parameter value of a current IP whose features are x; e_y is the y-th eigenvector and λ_y is the corresponding y-th eigenvalue, with y = 1, 2, ..., k.
6. The method of claim 1, wherein after the obtaining the target login event, the method further comprises:
and limiting the functions of the target IP and/or the account related to the target login event.
7. A system for identifying a target IP for a live webcast platform, the system comprising:
the first obtaining module is used for obtaining m IPs logged in a previous time window and a feature matrix formed by n feature values of each IP based on a log of logging events of the live webcast platform; wherein m and n are positive integers, and the previous time window is 0.5-1 h;
a second obtaining module, configured to obtain a matrix eigenvalue and an eigenvector based on the feature matrix;
the first acquisition module is used for acquiring n characteristic values of a current IP logged in a current time window, wherein the current time window is adjacent to the previous time window, and the current time window is 0.5-1 h;
a third obtaining module, configured to obtain, based on the n eigenvalues of the current IP, the matrix eigenvalue, and the eigenvector, a first target parameter value representing a degree of deviation between the feature of the current IP and the feature in the feature matrix;
the judging module is used for judging whether the first target parameter value is larger than a first target parameter threshold value or not;
and the identification module is used for identifying the current IP as the target IP if the first target parameter value is greater than the first target parameter threshold value.
8. A readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the steps of the method according to any one of claims 1 to 6.
9. An apparatus comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the steps of the method of any of claims 1-7 are implemented when the program is executed by the processor.
CN202010533071.3A 2020-06-12 2020-06-12 Method and system for identifying target IP, storage medium and equipment Active CN113810335B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010533071.3A CN113810335B (en) 2020-06-12 2020-06-12 Method and system for identifying target IP, storage medium and equipment

Publications (2)

Publication Number Publication Date
CN113810335A true CN113810335A (en) 2021-12-17
CN113810335B CN113810335B (en) 2023-08-22

Family

ID=78943853

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010533071.3A Active CN113810335B (en) 2020-06-12 2020-06-12 Method and system for identifying target IP, storage medium and equipment

Country Status (1)

Country Link
CN (1) CN113810335B (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2014138404A (en) * 2013-01-18 2014-07-28 Nec Corp Mutual authentication system, terminal device, mutual authentication server, mutual authentication method, and mutual authentication program
US20150262062A1 (en) * 2014-03-17 2015-09-17 Microsoft Corporation Decision tree threshold coding
US20160210556A1 (en) * 2015-01-21 2016-07-21 Anodot Ltd. Heuristic Inference of Topological Representation of Metric Relationships
CA2894317A1 (en) * 2015-06-15 2016-12-15 Deep Genomics Incorporated Systems and methods for classifying, prioritizing and interpreting genetic variants and therapies using a deep neural network
WO2019134284A1 (en) * 2018-01-08 2019-07-11 武汉斗鱼网络科技有限公司 Method and apparatus for recognizing user, and computer device
CN110059661A (en) * 2019-04-26 2019-07-26 腾讯科技(深圳)有限公司 Action identification method, man-machine interaction method, device and storage medium
CN110149343A (en) * 2019-05-31 2019-08-20 国家计算机网络与信息安全管理中心 A kind of abnormal communications and liaison behavioral value method and system based on stream

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
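刘玉宽 et al.: "High-rate single-point local anomaly detection for distributed denial-of-service attacks", 《计算机应用与软件》 *
王建 et al.: "Network user role identification and a method for discovering malicious access behavior", 《计算机科学》 *
董书琴 et al.: "A probabilistic flow sampling method for traffic anomaly detection", 《电子与信息学报》 *
郝志宇 et al.: "A similarity-based DDoS anomaly detection system", 《计算机工程与应用》 *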

Also Published As

Publication number Publication date
CN113810335B (en) 2023-08-22


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant