CN109413063B - White list updating method and device based on big data and electronic equipment

Info

Publication number
CN109413063B
Authority
CN
China
Prior art keywords
white list
initial
pure white
initial threshold
value
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201811239659.7A
Other languages
Chinese (zh)
Other versions
CN109413063A (en)
Inventor
孙家棣
马宁
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ping An Life Insurance Company of China Ltd
Original Assignee
Ping An Life Insurance Company of China Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ping An Life Insurance Company of China Ltd filed Critical Ping An Life Insurance Company of China Ltd
Priority to CN201811239659.7A priority Critical patent/CN109413063B/en
Publication of CN109413063A publication Critical patent/CN109413063A/en
Application granted granted Critical
Publication of CN109413063B publication Critical patent/CN109413063B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/14 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
    • H04L63/1441 Countermeasures against malicious traffic
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L43/00 Arrangements for monitoring or testing data switching networks
    • H04L43/08 Monitoring or testing based on specific metrics, e.g. QoS, energy consumption or environmental parameters
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L43/00 Arrangements for monitoring or testing data switching networks
    • H04L43/16 Threshold monitoring

Abstract

The invention mainly relates to the technical field of big data, and discloses a white list updating method and device based on big data and electronic equipment, wherein the method comprises the following steps: determining a plurality of characteristics of the traffic data of the white list user and an initial threshold value set corresponding to each characteristic; calculating the traffic anomaly ratio of an initial pure white list user and the traffic anomaly ratio of an initial non-pure white list user corresponding to a certain initial threshold value in the initial threshold value set, acquiring the absolute value of the difference between the traffic anomaly ratio of the initial pure white list user and the traffic anomaly ratio of the initial non-pure white list user as a reference value corresponding to the certain initial threshold value, traversing the initial threshold value set, selecting the initial threshold value corresponding to the reference value with the maximum value from all the reference values, and dividing the white list user into a target pure white list user and a target non-pure white list user. Under the method, pure white list users are screened based on big data analysis, and the purity of the white list is improved.

Description

White list updating method and device based on big data and electronic equipment
Technical Field
The invention relates to the technical field of big data, in particular to a big data-based white list updating method and device and electronic equipment.
Background
At present, as the number of internet users keeps growing, the internet field faces the challenge of high-volume traffic data. Abnormal traffic inevitably occurs within such data and can cause serious disruption and loss. For example, the black industry has formed chains such as Trojan horse planting, traffic trading, and virtual property cash-out, each of which generates a large amount of abnormal traffic.
At present, users are classified in order to determine whether the traffic they send is abnormal. One such classification divides users into blacklist users, white list users, and uncertain users. Blacklist users are users who are known in advance to be engaged in black industry or who have previously exhibited abnormal traffic behavior; white list users are users who are unlikely to be engaged in black industry or to exhibit abnormal traffic behavior; uncertain users are users other than blacklist users and white list users.
However, black industry users may fraudulently pose as white list users, so the white list users require further discrimination. Screening black industry users out of the white list users to obtain a purer white list set is therefore a problem that urgently needs to be solved.
Disclosure of Invention
In order to solve the technical problem of low white list purity in the related art, the invention provides a white list updating method and device based on big data and electronic equipment.
A big data based white list updating method, the method comprising:
a) determining a plurality of characteristics of traffic data of white list users and an initial threshold value set corresponding to the characteristics, wherein the initial threshold value set comprises a plurality of initial threshold values, and the white list users comprise initial pure white list users and initial non-pure white list users;
b) obtaining a traffic anomaly ratio value of the initial pure white list user according to a plurality of characteristic values corresponding to the characteristics in the traffic data of the initial pure white list user and a certain initial threshold value corresponding to the characteristics, and obtaining a traffic anomaly ratio value of the initial non-pure white list user according to a plurality of characteristic values corresponding to the characteristics in the traffic data of the initial non-pure white list user and the certain initial threshold value corresponding to the characteristics;
c) acquiring an absolute value of a difference value between the traffic anomaly ratio of the initial pure white list user and the traffic anomaly ratio of the initial non-pure white list user, and taking the absolute value as a reference value corresponding to the certain initial threshold;
d) traversing the initial set of thresholds, performing b) and c);
e) when the initial threshold value set is traversed, obtaining reference values with the number equal to the initial threshold value number in the initial threshold value set;
f) and dividing the white list users into target pure white list users and target non-pure white list users according to an initial threshold corresponding to the reference value with the maximum value in all the obtained reference values.
An apparatus for big data based white list update, the apparatus comprising:
a determining unit, configured to determine a plurality of characteristics of traffic data of white list users and an initial threshold value set corresponding to the characteristics, wherein the initial threshold value set comprises a plurality of initial threshold values, and the white list users comprise initial pure white list users and initial non-pure white list users;
a first obtaining unit, configured to obtain a traffic anomaly ratio value of the initial pure white list user according to a plurality of feature values corresponding to the features in the traffic data of the initial pure white list user and a certain initial threshold corresponding to the features, and obtain a traffic anomaly ratio value of the initial non-pure white list user according to a plurality of feature values corresponding to the features in the traffic data of the initial non-pure white list user and the certain initial threshold corresponding to the features;
a second obtaining unit, configured to obtain an absolute value of a difference between a traffic anomaly ratio value of the initial pure white list user and a traffic anomaly ratio value of the initial non-pure white list user, where the absolute value is used as a reference value corresponding to the certain initial threshold;
a traversing unit, configured to traverse the initial threshold value set, trigger the first obtaining unit to obtain a traffic anomaly ratio of the initial pure white list user according to a plurality of feature values corresponding to the features in traffic data of the initial pure white list user and a certain initial threshold value corresponding to the features, obtain a traffic anomaly ratio of the initial non-pure white list user according to a plurality of feature values corresponding to the features in traffic data of the initial non-pure white list user and the certain initial threshold value corresponding to the features, and trigger the second obtaining unit to obtain an absolute value of a difference between the traffic anomaly ratio of the initial pure white list user and the traffic anomaly ratio of the initial non-pure white list user, where the absolute value is used as a reference value corresponding to the certain initial threshold value;
a third obtaining unit, configured to obtain, when the initial threshold set is traversed, reference values whose number is equal to that of the initial thresholds in the initial threshold set;
and the dividing unit is used for dividing the white list users into target pure white list users and target non-pure white list users according to the initial threshold corresponding to the reference value with the maximum value in all the obtained reference values.
A computer-readable storage medium, characterized in that it stores a computer program that causes a computer to perform the method as described above.
An electronic device, the electronic device comprising:
a processor;
a memory having computer readable instructions stored thereon which, when executed by the processor, implement the method as previously described.
The technical scheme provided by the embodiment of the invention can have the following beneficial effects:
The white list updating method provided by the invention comprises steps a) to f) described above: determining the characteristics of the traffic data of the white list users and the initial threshold value set corresponding to each characteristic; obtaining, for each initial threshold in the set, the traffic anomaly ratio of the initial pure white list users and the traffic anomaly ratio of the initial non-pure white list users; taking the absolute value of the difference between the two ratios as the reference value of that initial threshold; and dividing the white list users into target pure white list users and target non-pure white list users according to the initial threshold whose reference value is the largest.
Under the method, the initial threshold with the largest absolute difference between the two traffic anomaly ratios is the threshold that best distinguishes the two groups. Dividing the white list users according to this threshold screens pure white list users on the basis of big data analysis, improves the accuracy of screening pure white list users out of the white list users, and therefore improves the purity of the white list.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the invention, as claimed.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the invention and together with the description, serve to explain the principles of the invention.
FIG. 1 is a schematic diagram illustrating a big data based white list update apparatus according to an example embodiment;
FIG. 2 is a flow diagram illustrating a big data based white list update method in accordance with an exemplary embodiment;
FIG. 3 is a flowchart illustrating details of step 220 according to a corresponding embodiment of FIG. 2;
FIG. 4 is a flowchart illustrating details of step 260 according to a corresponding embodiment of FIG. 2;
FIG. 5 is a flowchart illustrating details of step 210 according to a corresponding embodiment of FIG. 2;
FIG. 6 is a block diagram illustrating a big data based white list update apparatus according to an example embodiment;
FIG. 7 is a block diagram illustrating a big data based white list updating apparatus according to an example embodiment.
Detailed Description
Reference will now be made in detail to the exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, like numbers in different drawings represent the same or similar elements unless otherwise indicated. The embodiments described in the following exemplary embodiments do not represent all embodiments consistent with the present invention. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the invention, as detailed in the appended claims.
The environment in which the invention is implemented may be a portable mobile device, such as a smartphone, tablet, desktop computer. The white list updating method based on big data disclosed by the embodiment of the invention can be suitable for any application program running on the portable mobile equipment.
Fig. 1 is a schematic diagram illustrating a big data based white list updating apparatus according to an exemplary embodiment. The apparatus 100 may be the portable mobile device described above. As shown in fig. 1, the apparatus 100 may include one or more of the following components: a processing component 102, a memory 104, a power component 106, a multimedia component 108, an audio component 110, a sensor component 114, and a communication component 116.
The processing component 102 generally controls overall operation of the device 100, such as operations associated with display, telephone calls, data communications, camera operations, and recording operations, among others. The processing components 102 may include one or more processors 118 to execute instructions to perform all or a portion of the steps of the methods described below. Further, the processing component 102 can include one or more modules for facilitating interaction between the processing component 102 and other components. For example, the processing component 102 can include a multimedia module for facilitating interaction between the multimedia component 108 and the processing component 102.
The memory 104 is configured to store various types of data to support operations at the apparatus 100. Examples of such data include instructions for any application or method operating on the device 100. The Memory 104 may be implemented by any type of volatile or non-volatile Memory device or combination thereof, such as Static Random Access Memory (SRAM), Electrically Erasable Programmable Read-Only Memory (EEPROM), Erasable Programmable Read-Only Memory (EPROM), Programmable Read-Only Memory (PROM), Read-Only Memory (ROM), magnetic Memory, flash Memory, magnetic disk or optical disk. Also stored in memory 104 are one or more modules for execution by the one or more processors 118 to perform all or a portion of the steps of the methods described below.
The power supply component 106 provides power to the various components of the device 100. The power components 106 may include a power management system, one or more power supplies, and other components associated with generating, managing, and distributing power for the device 100.
The multimedia component 108 includes a screen that provides an output interface between the device 100 and the user. In some embodiments, the screen may include a Liquid Crystal Display (LCD) and a touch panel. If the screen includes a touch panel, the screen may be implemented as a touch screen to receive an input signal from a user. The touch panel includes one or more touch sensors to sense touch, slide, and gestures on the touch panel. The touch sensor may not only sense the boundary of a touch or slide action, but also detect the duration and pressure associated with the touch or slide operation. The screen may further include an Organic Light Emitting Display (OLED for short).
The audio component 110 is configured to output and/or input audio signals. For example, the audio component 110 includes a Microphone (MIC) configured to receive external audio signals when the device 100 is in an operational mode, such as a call mode, a recording mode, and a voice recognition mode. The received audio signal may further be stored in the memory 104 or transmitted via the communication component 116. In some embodiments, the audio component 110 further comprises a speaker for outputting audio signals.
The sensor assembly 114 includes one or more sensors for providing various aspects of status assessment for the device 100. For example, the sensor assembly 114 may detect the open/closed status of the device 100, the relative positioning of the components, the sensor assembly 114 may also detect a change in position of the device 100 or a component of the device 100, and a change in temperature of the device 100. In some embodiments, the sensor assembly 114 may also include a magnetic sensor, a pressure sensor, or a temperature sensor.
The communication component 116 is configured to facilitate wired or wireless communication between the apparatus 100 and other devices. The device 100 may access a Wireless network based on a communication standard, such as WiFi (Wireless-Fidelity). In an exemplary embodiment, the communication component 116 receives broadcast signals or broadcast related information from an external broadcast management system via a broadcast channel. In an exemplary embodiment, the Communication component 116 further includes a Near Field Communication (NFC) module for facilitating short-range Communication. For example, the NFC module may be implemented based on Radio Frequency Identification (RFID) technology, Infrared Data Association (IrDA) technology, Ultra Wideband (UWB) technology, bluetooth technology, and other technologies.
In an exemplary embodiment, the apparatus 100 may be implemented by one or more Application Specific Integrated Circuits (ASICs), digital signal processors, digital signal processing devices, programmable logic devices, field programmable gate arrays, controllers, microcontrollers, microprocessors or other electronic components for performing the methods described below.
FIG. 2 is a flow diagram illustrating a big data based white list update method according to an example embodiment. As shown in fig. 2, the method includes the following steps.
Step 210, determining a plurality of characteristics of traffic data of white list users and an initial threshold value set corresponding to the characteristics, where the initial threshold value set includes a plurality of initial threshold values, and the white list users include initial pure white list users and initial non-pure white list users.
In the embodiment of the present invention, the white list users may include pure white list users and non-pure white list users. A plurality of features are predefined for the traffic data of each white list user. For example, the features may include the path repetition degree, the proportion of device front-end and back-end login sites, the number of accounts accessed per IP, the IP access frequency, and the mean and variance of user logins per mobile phone number within a period. These features can be used to identify whether a piece of traffic data is abnormal. When the feature value of a feature in the traffic data of a certain user is greater than the initial threshold, that feature value is considered abnormal, and the traffic data it belongs to is identified as abnormal traffic data.
As an optional implementation manner, before determining several features of the traffic data of the white list user and an initial threshold value set corresponding to the features, the following steps may be further performed:
and dividing the white list users into initial pure white list users and initial non-pure white list users according to a preset user group division standard.
In the embodiment of the present invention, the preset user group division criterion may be any rule that realizes the specified user group division. For example, white list users whose occupation is life insurance staff with attendance records may be classified as initial pure white list users, and the remaining white list users may be classified as initial non-pure white list users.
By implementing this optional implementation, a reliable user group can be manually assigned to the initial pure white list users, giving the initial pure white list users and the initial non-pure white list users. This reduces the number of iterative updates, so the final target pure white list users and target non-pure white list users are obtained more quickly.
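The preliminary division described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the `User` record, its `group` field, and the label `"life_insurance_staff"` are hypothetical names introduced only for this example.

```python
from dataclasses import dataclass


@dataclass
class User:
    user_id: str
    group: str  # hypothetical user-group label, not defined in the source


def split_initial(users, trusted_groups):
    """Split white list users into initial pure and initial non-pure users.

    A user is treated as initially pure when the user's group is in the
    manually chosen set of trusted groups (an assumed division criterion).
    """
    pure = [u for u in users if u.group in trusted_groups]
    non_pure = [u for u in users if u.group not in trusted_groups]
    return pure, non_pure


users = [
    User("u1", "life_insurance_staff"),
    User("u2", "other"),
    User("u3", "life_insurance_staff"),
]
pure, non_pure = split_initial(users, {"life_insurance_staff"})
```

Any other rule could replace the group-membership check, as long as it yields two disjoint sets that together cover the white list.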
Step 220, obtaining a traffic anomaly ratio value of the initial pure white list user according to a plurality of characteristic values corresponding to the characteristics in the traffic data of the initial pure white list user and a certain initial threshold value corresponding to the characteristics, and obtaining a traffic anomaly ratio value of the initial non-pure white list user according to a plurality of characteristic values corresponding to the characteristics in the traffic data of the initial non-pure white list user and a certain initial threshold value corresponding to the characteristics.
Step 230, obtaining an absolute value of the difference between the traffic anomaly ratio of the initial pure white list user and the traffic anomaly ratio of the initial non-pure white list user, and taking the absolute value as a reference value corresponding to the certain initial threshold.
In the embodiment of the present invention, the reference value is the absolute value of the difference between the traffic anomaly ratio of the initial pure white list user and the traffic anomaly ratio of the initial non-pure white list user. The larger the reference value, the better the certain initial threshold distinguishes the two groups.
Step 240, traverse the initial set of thresholds.
Step 250, when the initial threshold value set has not been fully traversed, executing step 220 to step 240 for the next initial threshold; when the initial threshold value set has been traversed, obtaining reference values whose number equals the number of initial thresholds in the initial threshold value set.
In the embodiment of the present invention, when the initial threshold set is traversed, the reference value corresponding to each initial threshold in the initial threshold set is obtained, that is, the reference values equal to the number of the initial thresholds in the initial threshold set are obtained.
Step 260, dividing the white list users into target pure white list users and target non-pure white list users according to the initial threshold corresponding to the reference value with the maximum value among all the obtained reference values.
Under the method, the traffic anomaly ratio of the initial pure white list users and the traffic anomaly ratio of the initial non-pure white list users are obtained for each initial threshold, and the absolute value of their difference is calculated. The initial threshold with the largest absolute value is the threshold that best distinguishes the two groups, and the white list users are divided into target pure white list users and target non-pure white list users according to it. Pure white list users are thus screened on the basis of big data analysis, which improves the accuracy of screening them out of the white list users and therefore the purity of the white list.
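Steps 220 to 260 for a single feature can be sketched as below. This is an illustrative sketch under stated assumptions: the function names and sample values are invented for the example, feature values are plain floats, and ties between equal reference values are broken by taking the first threshold, which the source does not specify.

```python
def anomaly_ratio(values, threshold):
    """Fraction of feature values strictly greater than the threshold."""
    if not values:
        return 0.0
    return sum(1 for v in values if v > threshold) / len(values)


def select_threshold(pure_values, non_pure_values, thresholds):
    """Traverse the initial threshold set and keep one reference value each.

    The reference value of a threshold is the absolute difference between
    the anomaly ratios of the initial pure and initial non-pure users.
    """
    reference_values = {}
    for t in thresholds:
        reference_values[t] = abs(
            anomaly_ratio(pure_values, t) - anomaly_ratio(non_pure_values, t)
        )
    # The threshold with the largest reference value separates the groups best.
    best = max(reference_values, key=reference_values.get)
    return best, reference_values


pure_values = [0.1, 0.2, 0.3, 0.4]
non_pure_values = [0.2, 0.6, 0.7, 0.9]
best, refs = select_threshold(pure_values, non_pure_values, [0.15, 0.5, 0.8])
```

With these sample values the middle threshold wins: no pure value exceeds 0.5, while three of the four non-pure values do.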
Fig. 3 is a flowchart illustrating details of step 220 according to a corresponding embodiment of fig. 2. As shown in fig. 3, step 220 includes:
Step 221, counting, among the feature values corresponding to the feature in the traffic data of the initial pure white list users, a first abnormal quantity of feature values greater than the certain initial threshold corresponding to the feature; and counting, among the feature values corresponding to the feature in the traffic data of the initial non-pure white list users, a second abnormal quantity of feature values greater than the certain initial threshold corresponding to the feature.
In the embodiment of the invention, the white list users comprise a plurality of users, each user has traffic data, the traffic data has a plurality of features, and each feature has a feature value. A given feature in the traffic data of one initial pure white list user corresponds to one feature value, so if the initial pure white list users comprise a plurality of users, that feature corresponds to a plurality of feature values across the traffic data of the initial pure white list users. The same holds for the initial non-pure white list users. The first abnormal quantity is the number of those feature values of the initial pure white list users that are greater than the certain initial threshold corresponding to the feature, and the second abnormal quantity is the corresponding number for the initial non-pure white list users. Counting the abnormal feature values thus gives the quantity of abnormal traffic data.
Step 222, calculating the ratio of the first abnormal quantity to the total quantity of traffic data of the initial pure white list users to obtain the traffic anomaly ratio of the initial pure white list users, and calculating the ratio of the second abnormal quantity to the total quantity of traffic data of the initial non-pure white list users to obtain the traffic anomaly ratio of the initial non-pure white list users.
In the embodiment of the invention, the first abnormal quantity is the quantity of abnormal traffic data in the initial pure white list user, and the ratio of the first abnormal quantity to the total quantity of the traffic data of the initial pure white list user is calculated to obtain the traffic abnormal ratio of the initial pure white list user. And calculating the ratio of the second abnormal quantity to the total quantity of the traffic data of the initial non-pure white list user to obtain the traffic abnormal ratio of the initial non-pure white list user.
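Steps 221 and 222 amount to a count followed by a division, as in the following sketch. The helper name and the sample feature values are assumptions for illustration; each list holds one feature value per piece of traffic data.

```python
def anomaly_ratio(feature_values, threshold):
    """Return (abnormal quantity, traffic anomaly ratio) for one user group.

    A feature value is abnormal when it is strictly greater than the
    threshold; the ratio divides the abnormal quantity by the total
    quantity of traffic data in the group.
    """
    abnormal = sum(1 for v in feature_values if v > threshold)
    total = len(feature_values)
    ratio = abnormal / total if total else 0.0
    return abnormal, ratio


pure_values = [0.1, 0.2, 0.9, 0.3]        # initial pure white list users
non_pure_values = [0.8, 0.7, 0.2, 0.95]   # initial non-pure white list users

first_abnormal, pure_ratio = anomaly_ratio(pure_values, 0.5)
second_abnormal, non_pure_ratio = anomaly_ratio(non_pure_values, 0.5)
```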
Fig. 4 is a flowchart illustrating details of step 260 according to a corresponding embodiment of fig. 2. As shown in fig. 4, step 260 includes:
Step 261, normalizing the feature values corresponding to the plurality of features of the traffic data of the white list users to obtain normalized feature values.
As an optional implementation manner, the normalizing the feature values corresponding to a plurality of features of the traffic data of the white list user to obtain normalized feature values may include:
determining a plurality of characteristic values corresponding to a plurality of characteristics of the flow data of the white list user;
determining a minimum characteristic value and a median characteristic value of the characteristic of the flow data from the plurality of characteristic values;
and according to the minimum characteristic value and the median characteristic value, performing normalization operation on the characteristic values to obtain normalized characteristic values.
In an exemplary embodiment, the above-mentioned median feature value may refer to the feature value that is greater than 99% of the above-mentioned feature values. If $x$, $x_{\min}$, $x_{\max}$, and the symbol shown in Figure BDA0001838972930000091 denote, respectively, the feature value, the minimum feature value, the median feature value, and the normalized feature value of the traffic data, then the calculation formula of the normalized feature value (Figure BDA0001838972930000092) may include the formula shown in Figure BDA0001838972930000101.
The feature values on which the normalization operation is performed may be some or all of the feature values; the embodiment of the present invention is not limited in this respect. Normalizing with the median feature value rather than the maximum avoids the influence of occasional very large feature values.
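The normalization formula itself is given only as an image in the source and is not reproduced here. The sketch below is therefore an assumption, not the patented formula: it uses min-max scaling with the 99th-percentile value (nearest-rank method) as the upper bound and clips results to 1.0, which matches the stated goal of limiting the influence of occasional large values.

```python
import math


def percentile_nearest_rank(values, q):
    """Nearest-rank percentile: smallest value with at least q of the data at or below it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(q * len(ordered)))
    return ordered[rank - 1]


def normalize(values, q=0.99):
    """Assumed clipped min-max scaling using the q-th percentile as the upper bound."""
    x_min = min(values)
    x_max = percentile_nearest_rank(values, q)  # the "median feature value" in the text
    if x_max == x_min:
        return [0.0 for _ in values]
    return [min(1.0, (v - x_min) / (x_max - x_min)) for v in values]


# 100 ordinary values plus one extreme outlier.
values = list(range(1, 101)) + [10_000]
normalized = normalize(values)
```

In this example the outlier 10 000 is clipped to 1.0 instead of compressing every ordinary value toward zero, as plain min-max scaling with the true maximum would.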
Step 262, set the initial threshold corresponding to the largest reference value among all the obtained reference values as the target initial threshold corresponding to the feature.
Step 263, when the normalized feature value corresponding to each feature of every piece of traffic data of a certain user among the white list users within a preset time period is less than or equal to the target initial threshold corresponding to that feature, classify the user as a target pure white list user.
As an alternative embodiment, after step 261 is executed, the following steps may be further executed:
when a normalized characteristic value of a characteristic of a piece of traffic data of a certain user in the white list users exceeds a target initial threshold value corresponding to the characteristic within a preset time period, dividing the user into non-pure white list users;
when, within a preset time period, two or more pieces of traffic data of a certain user among the white list users each have a feature whose normalized feature value is greater than the target initial threshold corresponding to the feature, classifying the user as a non-pure white list user;
when, within a preset time period, one piece of traffic data of a certain user among the white list users has two or more features whose normalized feature values are greater than the target initial thresholds corresponding to the features, classifying the user as a non-pure white list user;
when two or more traffic data of a certain user in the white list users in a preset time period have normalized feature values corresponding to two or more features larger than a target initial threshold value corresponding to the features, the user is classified into a non-pure white list user.
By implementing this alternative embodiment, different division manners may be adopted to divide the white list users into pure white list users and non-pure white list users. For example, when the normalized feature value of a feature of one piece of traffic data of a user among the white list users exceeds the target initial threshold corresponding to the feature within the preset time period, the user is classified as a non-pure white list user. This is the division manner of step 263 and is relatively strict: a user is classified as a non-pure white list user as soon as one normalized feature value of one piece of the user's traffic data exceeds the threshold corresponding to the feature, that is, a user is classified as a pure white list user only when the normalized feature values of all features of all of the user's traffic data are less than or equal to the thresholds corresponding to those features. Providing multiple division manners meets different division requirements of users and widens the range of application.
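The division manners listed above can be read as one rule with two counts: how many features of a piece of traffic data must exceed their thresholds, and how many pieces of traffic data must do so. The sketch below follows that reading, which is an interpretation and not wording from the embodiment. The function `is_non_pure`, its parameters `min_records` and `min_features`, and the feature names are hypothetical.

```python
from typing import Dict, List

Record = Dict[str, float]  # feature name -> normalized feature value


def is_non_pure(
    records: List[Record],
    thresholds: Dict[str, float],
    min_records: int = 1,
    min_features: int = 1,
) -> bool:
    """Classify a user as non-pure when enough records have enough exceeding features."""
    flagged_records = 0
    for record in records:
        # Count the features of this record that exceed their target threshold.
        exceeding = sum(
            1 for name, value in record.items() if value > thresholds[name]
        )
        if exceeding >= min_features:
            flagged_records += 1
    return flagged_records >= min_records


# Hypothetical user with two pieces of traffic data in the preset time period.
thresholds = {"bytes": 0.5, "requests": 0.5}
records = [
    {"bytes": 0.9, "requests": 0.2},  # one feature above its threshold
    {"bytes": 0.1, "requests": 0.3},  # no feature above its threshold
]

strict = is_non_pure(records, thresholds)                        # strict manner (1, 1)
two_records = is_non_pure(records, thresholds, min_records=2)    # two or more records
two_features = is_non_pure(records, thresholds, min_features=2)  # two or more features
```

Under the strict manner this user is non-pure, while the two looser manners leave the user in the pure white list group.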
In another exemplary embodiment, step 211 may include:
counting, among the several normalized feature values corresponding to the features in the traffic data of the initial pure white list users, a first abnormal quantity of normalized feature values that are greater than a certain initial threshold corresponding to the features, and counting, among the several normalized feature values corresponding to the features in the traffic data of the initial non-pure white list users, a second abnormal quantity of normalized feature values that are greater than the certain initial threshold corresponding to the features.
In another exemplary embodiment, step 260 may include:
determining a target reference value with the largest value in all the obtained reference values;
when there are two or more target reference values, determining the traffic anomaly ratio of the initial pure white list users at each of the initial thresholds corresponding to the target reference values;
and dividing the white list users into target pure white list users and target non-pure white list users according to an initial threshold value with the minimum value of the abnormal traffic ratio of the initial pure white list users.
In the embodiment of the present invention, two or more of the obtained reference values may share the largest value. For example, when the initial threshold is 5, the traffic anomaly ratio of the initial pure white list users is 5%, the traffic anomaly ratio of the initial non-pure white list users is 25%, and the reference value is 20%; when the initial threshold is 10, the traffic anomaly ratio of the initial pure white list users is 2%, the traffic anomaly ratio of the initial non-pure white list users is 22%, and the reference value is again 20%. The reference value therefore reaches its maximum of 20% at both initial thresholds, 5 and 10. Because the reference value (the absolute value of the difference between the two traffic anomaly ratios) is the same in both cases, a smaller traffic anomaly ratio of the initial pure white list users implies a smaller traffic anomaly ratio of the initial non-pure white list users as well, and a given difference between two smaller ratios distinguishes the two groups better than the same difference between two larger ratios. In this case, the white list users may be divided into target pure white list users and target non-pure white list users according to the initial threshold at which the traffic anomaly ratio of the initial pure white list users is smallest (the initial threshold of 10 in this example).
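The tie-breaking rule in this example can be sketched as follows. The first two candidates reproduce the figures given above; the third candidate, the names `Candidate`, `reference_value`, and `select_threshold`, and the tolerance `tol` are hypothetical. The tolerance is an implementation assumption: ratios stored as floating-point numbers may differ in their last bits even when the reference values are nominally equal.

```python
import math
from typing import List, NamedTuple


class Candidate(NamedTuple):
    threshold: float
    pure_ratio: float      # traffic anomaly ratio, initial pure white list users
    non_pure_ratio: float  # traffic anomaly ratio, initial non-pure white list users


def reference_value(candidate: Candidate) -> float:
    """Absolute difference between the two traffic anomaly ratios."""
    return abs(candidate.pure_ratio - candidate.non_pure_ratio)


def select_threshold(candidates: List[Candidate], tol: float = 1e-9) -> float:
    """Pick the threshold with the largest reference value; break ties by the
    smallest pure-white-list anomaly ratio."""
    best = max(reference_value(c) for c in candidates)
    tied = [
        c for c in candidates
        if math.isclose(reference_value(c), best, abs_tol=tol)
    ]
    return min(tied, key=lambda c: c.pure_ratio).threshold


candidates = [
    Candidate(threshold=5, pure_ratio=0.05, non_pure_ratio=0.25),   # reference 20%
    Candidate(threshold=10, pure_ratio=0.02, non_pure_ratio=0.22),  # reference 20%
    Candidate(threshold=15, pure_ratio=0.01, non_pure_ratio=0.10),  # hypothetical, 9%
]
chosen = select_threshold(candidates)
```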
Fig. 5 is a flowchart illustrating details of step 210 according to a corresponding embodiment of fig. 2. As shown in fig. 5, step 210 includes:
Step 211, determine several features of the traffic data of the white list users and a first initial threshold corresponding to the features, and add the first initial threshold to an initial threshold set.
Step 212, increase the first initial threshold by a preset increment to obtain a second initial threshold, and add the second initial threshold to the initial threshold set.
Step 213, when the second initial threshold value does not reach the preset initial threshold value, updating the second initial threshold value to the first initial threshold value, and executing step 212; and when the second initial threshold value reaches a preset initial threshold value, acquiring an initial threshold value set.
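Steps 211 to 213 amount to a loop that grows the threshold by a fixed increment until a preset limit is reached. The sketch below is one possible reading: it treats "reaches" as "greater than or equal to", which is an assumption, and the names `build_threshold_set`, `first`, `step`, and `limit` are hypothetical.

```python
from typing import List


def build_threshold_set(first: int, step: int, limit: int) -> List[int]:
    """Build the initial threshold set by repeated fixed increments."""
    if step <= 0:
        raise ValueError("the increment must be positive")
    thresholds = [first]               # step 211: add the first initial threshold
    current = first
    while True:
        current = current + step       # step 212: obtain the second initial threshold
        thresholds.append(current)
        if current >= limit:           # step 213: stop once the preset value is reached
            return thresholds
        # Otherwise the second threshold becomes the new first and step 212 repeats.


threshold_set = build_threshold_set(first=5, step=5, limit=20)
```

Integer thresholds are used here so that repeated addition does not accumulate rounding error; normalized thresholds would need fractional increments handled with the same care.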
The following are embodiments of the apparatus of the present invention.
Fig. 6 is a block diagram illustrating a big data based white list updating apparatus according to an example embodiment. As shown in fig. 6, the apparatus includes:
The determining unit 610 is configured to determine a plurality of features of traffic data of white list users and an initial threshold set corresponding to the features, where the initial threshold set includes a plurality of initial thresholds, and the white list users include initial pure white list users and initial non-pure white list users.
The first obtaining unit 620 is configured to obtain a traffic anomaly ratio value of the initial pure white list user according to a plurality of feature values corresponding to features in the traffic data of the initial pure white list user and a certain initial threshold corresponding to the features, and obtain a traffic anomaly ratio value of the initial non-pure white list user according to a plurality of feature values corresponding to features in the traffic data of the initial non-pure white list user and a certain initial threshold corresponding to the features.
A second obtaining unit 630, configured to obtain an absolute value of a difference between the traffic anomaly ratio value of the initial pure white list user and the traffic anomaly ratio value of the initial non-pure white list user, as a reference value corresponding to a certain initial threshold.
The traversing unit 640 is configured to traverse the initial threshold value set, trigger the first obtaining unit 620 to obtain a traffic anomaly ratio of the initial pure white list user according to a plurality of feature values corresponding to features in the traffic data of the initial pure white list user and a certain initial threshold value corresponding to the features, obtain a traffic anomaly ratio of the initial non-pure white list user according to a plurality of feature values corresponding to features in the traffic data of the initial non-pure white list user and a certain initial threshold value corresponding to the features, and trigger the second obtaining unit 630 to obtain an absolute value of a difference between the traffic anomaly ratio of the initial pure white list user and the traffic anomaly ratio of the initial non-pure white list user, where the absolute value is used as a reference value corresponding to the certain initial threshold value.
A third obtaining unit 650, configured to obtain, when the traversal of the initial threshold set is completed, a number of reference values equal to the number of initial thresholds in the initial threshold set.
The dividing unit 660 is configured to divide the white list user into a target pure white list user and a target non-pure white list user according to an initial threshold corresponding to a reference value with a largest value among all the obtained reference values.
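The cooperation of the units above can be sketched as a short pipeline for a single feature. The function names loosely mirror the units but are hypothetical, as are the sample values. The sketch simply takes the first threshold with the largest reference value and does not apply the tie-breaking rule of the other embodiment.

```python
from typing import Dict, Sequence


def anomaly_ratio(values: Sequence[float], threshold: float) -> float:
    """First obtaining unit: share of feature values above the threshold."""
    return sum(1 for v in values if v > threshold) / len(values)


def reference_values(
    pure: Sequence[float],
    non_pure: Sequence[float],
    thresholds: Sequence[float],
) -> Dict[float, float]:
    """Traversing and second obtaining units: one reference value per threshold."""
    return {
        t: abs(anomaly_ratio(pure, t) - anomaly_ratio(non_pure, t))
        for t in thresholds
    }


def target_threshold(refs: Dict[float, float]) -> float:
    """Dividing unit, first half: threshold with the largest reference value."""
    return max(refs, key=refs.get)


# Hypothetical feature values for the two initial groups.
pure = [1, 2, 3, 4, 12]
non_pure = [6, 8, 9, 11, 2]
refs = reference_values(pure, non_pure, thresholds=[5, 10])
best = target_threshold(refs)
```

At threshold 5 the two groups differ sharply (one of five records against four of five), while at threshold 10 both groups have the same anomaly ratio, so threshold 5 is selected.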
Fig. 7 is a block diagram illustrating a big data based white list updating apparatus according to an example embodiment. Fig. 7 is optimized on the basis of fig. 6, and compared with the big-data-based white list updating apparatus shown in fig. 6, in the apparatus shown in fig. 7:
The dividing unit 660 is further configured to divide the white list users into initial pure white list users and initial non-pure white list users according to a preset user group division standard before the determining unit 610 determines a plurality of features of the traffic data of the white list users and an initial threshold set corresponding to the features.
Specifically, after the dividing unit 660 divides the white list users into initial pure white list users and initial non-pure white list users according to the preset user group division standard, the dividing unit 660 sends a trigger instruction to the determining unit 610 to trigger the determining unit 610 to determine a plurality of characteristics of the traffic data of the white list users and an initial threshold value set corresponding to the characteristics.
Optionally, the manner that the first obtaining unit 620 is configured to obtain the traffic anomaly ratio value of the initial pure white list user according to a plurality of feature values corresponding to features in the traffic data of the initial pure white list user and a certain initial threshold corresponding to the features, and obtain the traffic anomaly ratio value of the initial non-pure white list user according to a plurality of feature values corresponding to features in the traffic data of the initial non-pure white list user and a certain initial threshold corresponding to the features is specifically as follows:
a first obtaining unit 620, configured to count, among the several feature values corresponding to the features in the traffic data of the initial pure white list users, a first abnormal quantity of feature values greater than a certain initial threshold corresponding to the features, and to count, among the several feature values corresponding to the features in the traffic data of the initial non-pure white list users, a second abnormal quantity of feature values greater than the certain initial threshold corresponding to the features; and to calculate the ratio of the first abnormal quantity to the total quantity of traffic data of the initial pure white list users to obtain the traffic anomaly ratio of the initial pure white list users, and calculate the ratio of the second abnormal quantity to the total quantity of traffic data of the initial non-pure white list users to obtain the traffic anomaly ratio of the initial non-pure white list users.
Further optionally, the manner in which the dividing unit 660 divides the white list users into target pure white list users and target non-pure white list users according to the initial threshold corresponding to the reference value with the largest value among all the obtained reference values is specifically as follows:
the dividing unit 660 is configured to perform normalization processing on feature values corresponding to a plurality of features of traffic data of a white list user to obtain normalized feature values; setting an initial threshold corresponding to the reference value with the largest value in all the obtained reference values as a target initial threshold corresponding to the characteristics; when the normalized feature value corresponding to the feature of each piece of traffic data of a certain user in the white list users in the preset time period is smaller than or equal to the target initial threshold value corresponding to the feature, the certain user is divided into the target pure white list users.
Further optionally, the manner that the first obtaining unit 620 is configured to count, in the plurality of feature values corresponding to the features in the traffic data of the initial pure white list user, a first abnormal quantity whose feature value is greater than a certain initial threshold value corresponding to the features, and count, in the plurality of feature values corresponding to the features in the traffic data of the initial non-pure white list user, a second abnormal quantity whose feature value is greater than a certain initial threshold value corresponding to the features specifically is:
a first obtaining unit 620, configured to count a first abnormal quantity that a normalized feature value is greater than a certain initial threshold value corresponding to a feature in a plurality of normalized feature values corresponding to the features in the traffic data of the initial pure white list user, and count a second abnormal quantity that the normalized feature value is greater than a certain initial threshold value corresponding to the features in a plurality of normalized feature values corresponding to the features in the traffic data of the initial non-pure white list user.
Further optionally, the manner in which the dividing unit 660 divides the white list users into target pure white list users and target non-pure white list users according to the initial threshold corresponding to the reference value with the largest value among all the obtained reference values is specifically as follows:
a dividing unit 660, configured to determine a maximum target reference value from all the obtained reference values; when the number of the target reference values is two or more, determining the abnormal traffic ratio of the initial pure white list users in the initial threshold values corresponding to all the target reference values; and dividing the white list users into target pure white list users and target non-pure white list users according to an initial threshold value with the minimum value of the abnormal traffic ratio of the initial pure white list users.
Further optionally, the determining unit 610 is configured to determine a plurality of features of the traffic data of the white list user and an initial threshold set corresponding to the features, where the initial threshold set includes a plurality of initial thresholds, and the white list user includes an initial pure white list user and an initial non-pure white list user in a specific manner:
a determining unit 610, configured to determine a plurality of features of traffic data of a white list user and a first initial threshold corresponding to the features, and add the first initial threshold to an initial threshold set; increasing the first initial threshold value according to a preset amplification to obtain a second initial threshold value, and adding the second initial threshold value into the initial threshold value set; and when the second initial threshold value does not reach the preset initial threshold value, updating the second initial threshold value to the first initial threshold value, and executing the operation of increasing the first initial threshold value according to the preset amplification to obtain the second initial threshold value.
The present invention also provides an electronic device, including:
a processor;
a memory having stored thereon computer readable instructions that, when executed by the processor, implement a big-data based whitelist update method as previously shown.
The electronic device may be the big data based white list updating apparatus 100 shown in fig. 1.
In an exemplary embodiment, the present invention also provides a computer readable storage medium, on which a computer program is stored, which, when executed by a processor, implements the big-data based white list updating method as previously shown.
It will be understood that the invention is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the invention is limited only by the appended claims.

Claims (10)

1. A big data based white list updating method is characterized in that the method comprises the following steps:
a) determining a plurality of characteristics of traffic data of white list users and an initial threshold value set corresponding to the characteristics, wherein the initial threshold value set comprises a plurality of initial threshold values, and the white list users comprise initial pure white list users and initial non-pure white list users;
b) obtaining a traffic anomaly ratio value of the initial pure white list user according to a plurality of characteristic values corresponding to the characteristics in the traffic data of the initial pure white list user and a certain initial threshold value corresponding to the characteristics, and obtaining a traffic anomaly ratio value of the initial non-pure white list user according to a plurality of characteristic values corresponding to the characteristics in the traffic data of the initial non-pure white list user and the certain initial threshold value corresponding to the characteristics;
c) acquiring an absolute value of a difference value between the traffic anomaly ratio of the initial pure white list user and the traffic anomaly ratio of the initial non-pure white list user, and taking the absolute value as a reference value corresponding to the certain initial threshold;
d) traversing the initial set of thresholds, performing b) and c);
e) when the initial threshold value set is traversed, obtaining reference values with the number equal to the initial threshold value number in the initial threshold value set;
f) dividing the white list users into target pure white list users and target non-pure white list users according to an initial threshold corresponding to the reference value with the maximum value in all the obtained reference values.
2. The method of claim 1, wherein prior to determining a number of characteristics of traffic data of the white-listed users and an initial set of thresholds to which the characteristics correspond, the method further comprises:
and dividing the white list users into initial pure white list users and initial non-pure white list users according to a preset user group division standard.
3. The method according to claim 2, wherein the obtaining a traffic anomaly ratio value of the initial pure white list user according to a plurality of feature values corresponding to the features in the traffic data of the initial pure white list user and a certain initial threshold value corresponding to the features, and obtaining a traffic anomaly ratio value of the initial non-pure white list user according to a plurality of feature values corresponding to the features in the traffic data of the initial non-pure white list user and the certain initial threshold value corresponding to the features comprises:
counting a first abnormal quantity of feature values greater than the certain initial threshold corresponding to the feature among the several feature values corresponding to the feature in the traffic data of the initial pure white list user, and counting a second abnormal quantity of feature values greater than the certain initial threshold corresponding to the feature among the several feature values corresponding to the feature in the traffic data of the initial non-pure white list user;
calculating the ratio of the first abnormal quantity to the total quantity of the traffic data of the initial pure white list user to obtain the traffic anomaly ratio of the initial pure white list user, and calculating the ratio of the second abnormal quantity to the total quantity of the traffic data of the initial non-pure white list user to obtain the traffic anomaly ratio of the initial non-pure white list user.
4. The method according to claim 3, wherein said dividing the white list users into target pure white list users and target non-pure white list users based on the initial threshold corresponding to the reference value with the largest value among all the obtained reference values comprises:
normalizing the characteristic values corresponding to a plurality of characteristics of the flow data of the white list user to obtain normalized characteristic values;
setting an initial threshold corresponding to a reference value with the largest value in all the obtained reference values as a target initial threshold corresponding to the characteristic;
when the normalized feature value corresponding to the feature of each piece of traffic data of a certain user in the white list users is smaller than or equal to the target initial threshold value corresponding to the feature within a preset time period, dividing the certain user into the target pure white list users.
5. The method of claim 4, wherein the counting of a first abnormal quantity of feature values greater than the certain initial threshold corresponding to the feature among the several feature values corresponding to the feature in the traffic data of the initial pure white list user, and the counting of a second abnormal quantity of feature values greater than the certain initial threshold corresponding to the feature among the several feature values corresponding to the feature in the traffic data of the initial non-pure white list user, comprises:
counting a first abnormal quantity of normalized feature values greater than the certain initial threshold corresponding to the feature among the several normalized feature values corresponding to the feature in the traffic data of the initial pure white list user, and counting a second abnormal quantity of normalized feature values greater than the certain initial threshold corresponding to the feature among the several normalized feature values corresponding to the feature in the traffic data of the initial non-pure white list user.
6. The method according to any one of claims 1 to 5, wherein said dividing the white list users into target pure white list users and target non-pure white list users based on an initial threshold corresponding to a reference value with a largest value among all the obtained reference values comprises:
determining a target reference value with the largest value in all the obtained reference values;
when the number of the target reference values is two or more, determining the abnormal traffic ratio of the initial pure white list users in the initial threshold values corresponding to all the target reference values;
and dividing the white list users into target pure white list users and target non-pure white list users according to an initial threshold value with the minimum value of the abnormal traffic ratio of the initial pure white list users.
7. The method of claim 6, wherein determining a number of characteristics of traffic data of the white-list user and an initial set of thresholds corresponding to the characteristics comprises:
determining a plurality of characteristics of flow data of a white list user and a first initial threshold corresponding to the characteristics, and adding the first initial threshold into an initial threshold set;
increasing the first initial threshold value according to a preset amplification value to obtain a second initial threshold value, and adding the second initial threshold value into the initial threshold value set;
and when the second initial threshold value does not reach a preset initial threshold value, updating the second initial threshold value to the first initial threshold value, and executing the step of increasing the first initial threshold value according to the preset amplification value to obtain the second initial threshold value.
8. An apparatus for updating white list based on big data, the apparatus comprising:
a determining unit, configured to determine a plurality of features of traffic data of white list users and an initial threshold set corresponding to the features, wherein the initial threshold set comprises a plurality of initial thresholds, and the white list users comprise initial pure white list users and initial non-pure white list users;
a first obtaining unit, configured to obtain a traffic anomaly ratio value of the initial pure white list user according to a plurality of feature values corresponding to the features in the traffic data of the initial pure white list user and a certain initial threshold corresponding to the features, and obtain a traffic anomaly ratio value of the initial non-pure white list user according to a plurality of feature values corresponding to the features in the traffic data of the initial non-pure white list user and the certain initial threshold corresponding to the features;
a second obtaining unit, configured to obtain an absolute value of a difference between a traffic anomaly ratio value of the initial pure white list user and a traffic anomaly ratio value of the initial non-pure white list user, where the absolute value is used as a reference value corresponding to the certain initial threshold;
a traversing unit, configured to traverse the initial threshold value set, trigger the first obtaining unit to obtain a traffic anomaly ratio of the initial pure white list user according to a plurality of feature values corresponding to the features in traffic data of the initial pure white list user and a certain initial threshold value corresponding to the features, obtain a traffic anomaly ratio of the initial non-pure white list user according to a plurality of feature values corresponding to the features in traffic data of the initial non-pure white list user and the certain initial threshold value corresponding to the features, and trigger the second obtaining unit to obtain an absolute value of a difference between the traffic anomaly ratio of the initial pure white list user and the traffic anomaly ratio of the initial non-pure white list user, where the absolute value is used as a reference value corresponding to the certain initial threshold value;
a third obtaining unit, configured to obtain, when the initial threshold set is traversed, reference values whose number is equal to that of the initial thresholds in the initial threshold set;
and the dividing unit is used for dividing the white list users into target pure white list users and target non-pure white list users according to the initial threshold corresponding to the reference value with the maximum value in all the obtained reference values.
9. An electronic device, characterized in that the electronic device comprises:
a processor;
a memory having stored thereon computer readable instructions which, when executed by the processor, implement the method of any of claims 1 to 7.
10. A computer-readable storage medium, characterized in that it stores a computer program that causes a computer to execute the method of any one of claims 1 to 7.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811239659.7A CN109413063B (en) 2018-10-23 2018-10-23 White list updating method and device based on big data and electronic equipment

Publications (2)

Publication Number Publication Date
CN109413063A CN109413063A (en) 2019-03-01
CN109413063B true CN109413063B (en) 2022-01-18

Family

ID=65468838

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811239659.7A Active CN109413063B (en) 2018-10-23 2018-10-23 White list updating method and device based on big data and electronic equipment

Country Status (1)

Country Link
CN (1) CN109413063B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112084408B (en) * 2020-09-08 2023-11-21 中国平安财产保险股份有限公司 List data screening method, device, computer equipment and storage medium

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2009087226A (en) * 2007-10-02 2009-04-23 Kddi Corp Web site determining device and web site determining program
US8112485B1 (en) * 2006-11-22 2012-02-07 Symantec Corporation Time and threshold based whitelisting
CN104468631A (en) * 2014-12-31 2015-03-25 国家电网公司 Network intrusion identification method based on anomaly flow and black-white list library of IP terminal
CN105094280A (en) * 2015-07-07 2015-11-25 北京奇虎科技有限公司 Method, apparatus and system for improving standby performance of intelligent terminal
CN105684391A (en) * 2013-11-04 2016-06-15 伊尔拉米公司 Automated generation of label-based access control rules
CN106506497A (en) * 2016-11-04 2017-03-15 广州华多网络科技有限公司 Forge white list IP address detection method, device and server
CN107508822A (en) * 2017-09-06 2017-12-22 迈普通信技术股份有限公司 Access control method and device
CN107800673A (en) * 2016-09-07 2018-03-13 武汉安天信息技术有限责任公司 The maintaining method and device of a kind of white list
CN107992398A (en) * 2017-12-22 2018-05-04 宜人恒业科技发展(北京)有限公司 The monitoring method and monitoring system of a kind of operation system

Also Published As

Publication number Publication date
CN109413063A (en) 2019-03-01

Similar Documents

Publication Publication Date Title
CN109299387B (en) Message pushing method and device based on intelligent recommendation and terminal equipment
CN108632081B (en) Network situation evaluation method, device and storage medium
US11233810B2 (en) Multi-signal analysis for compromised scope identification
EP2960823B1 (en) Method, device and system for managing authority
CN105210042A (en) Internet protocol threat prevention
CN105653323A (en) Application program management method and device
CN108469893B (en) Display screen control method, device, equipment and computer readable storage medium
CN110460583B (en) Sensitive information recording method and device and electronic equipment
CN110191085B (en) Intrusion detection method and device based on multiple classifications and storage medium
CN109165738B (en) Neural network model optimization method and device, electronic device and storage medium
CN110717509B (en) Data sample analysis method and device based on tree splitting algorithm
CN111242188B (en) Intrusion detection method, intrusion detection device and storage medium
US20180375883A1 (en) Automatically detecting insider threats using user collaboration patterns
CN113569992B (en) Abnormal data identification method and device, electronic equipment and storage medium
US20180302428A1 (en) Method, device and storage medium for determining health state of information system
CN109670313B (en) Method, device and readable storage medium for risk assessment in system operation
CN111428032B (en) Content quality evaluation method and device, electronic equipment and storage medium
CN109257354B (en) Abnormal flow analysis method and device based on model tree algorithm and electronic equipment
CN109413063B (en) White list updating method and device based on big data and electronic equipment
CN109802994B (en) Message pushing method and system based on content distribution network
CN109447258B (en) Neural network model optimization method and device, electronic device and storage medium
CN110689166B (en) User behavior analysis method and device based on random gradient descent algorithm
US10826923B2 (en) Network security tool
CN108733695A (en) Intention recognition method and device for a user's search string
KR20220068408A (en) The method for controling security based on internet protocol and system thereof

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant