Internet-based application monitoring and early warning data management system and method
Technical Field
The invention relates to the technical field of Internet application supervision, and in particular to an Internet-based application monitoring and early warning data management system and method.
Background
On the Internet, and on social networking platforms in particular, some accounts post specific information aimed at specific content. Such accounts may be regarded as abnormal accounts, and the behavior of their owners may affect the network activity of normal users. A number of methods already exist for identifying and restricting abnormal accounts, for example analyzing the characteristics of abnormal accounts with machine learning and neural networks.
As the crackdown on abnormal accounts intensifies, their owners have begun to evade detection by more sophisticated means, for example using computer programs to generate random operations that mimic the behavior of a real person, or using special equipment that lets a single person operate multiple network application accounts. The ability to monitor abnormal network accounts therefore needs further improvement.
Disclosure of Invention
The invention aims to provide an application monitoring and early warning data management system and method based on the Internet, which are used for solving the problems in the background technology.
To solve the above technical problems, the invention provides the following technical solution: an Internet-based application monitoring and early warning data management system and method, wherein the method includes:
step S100: setting an application that provides an Internet social media service as the target application, setting an account of the target application as the target account, extracting the topic participation records and historical operation records of the target account in the target application, and constructing a behavior habit model of the target account in the target application;
step S200: setting a monitoring time period, monitoring the operation behavior of the target account in the monitoring time period, and comparing that operation behavior with the behavior habit model of the target account to determine the difference between them;
step S300: acquiring the topic participation records, push records and search records of the target account in the monitoring time period, taking as the first characteristic the difference between the degree of overlap of the participated topics with the push records and the degree of overlap of the participated topics with the search content, and calculating the first characteristic parameter;
step S400: acquiring the behavior of the target account in topics during the monitoring time period, extracting the characteristics of the content sent by the target account and its reply characteristics in the monitoring time period as the second characteristic, and calculating the second characteristic parameter from the repetition rate of the sent content and the reply rate;
step S500: setting a management threshold, evaluating the degree of difference from the behavior habit model of the target account, extracting the characteristic components corresponding to the characteristic parameters to form early warning features, extracting the accounts that exhibit the early warning features from all accounts of the Internet application, and generating early warning information to be fed back to the Internet platform manager.
Further, step S100 includes:
step S101: extracting a participation topic record of a target account and a historical operation record of the target account;
a topic denotes a discussion space for users of the Internet application; for example, a "hot search entry", a "super topic" or an "interest group" may each be regarded as a topic discussion space;
in the early warning and data processing of this scheme, an account is taken as the subject of study whenever it is uncertain whether the operator behind it is a real, normal user;
step S102: acquiring, for each topic browsed by the target account, the time of browsing, the browsing duration within the topic and the jump operation records, where a jump operation is an operation by which the target account, while browsing a topic, obtains further content within the scope of that topic;
in the application interface, part of the content is collapsed and can only be viewed through a jump operation, which usually requires clicking a link such as "more", "expand" or "next page";
step S103: calculating the topic history participation coefficients of the target account in topic participation, where the topic history participation coefficient of the i-th topic is $h_i$, $h_i = \alpha_1 (h_{content}/h_{time}) + \alpha_2 h_{num}$, where $h_{content}$ denotes the amount of topic content browsed by the target account in the i-th topic, $h_{time}$ denotes the content browsing time of the target account in the i-th topic, $h_{num}$ denotes the number of jump operations performed by the target account in the i-th topic, $\alpha_1$ is the weight coefficient of the $(h_{content}/h_{time})$ term, and $\alpha_2$ is the weight coefficient of the $h_{num}$ term; the amount of topic content browsed is the number of words of topic content browsed by the target account;
$h_{content}/h_{time}$ represents the degree of engagement of the target account in a topic: the more carefully or repeatedly the topic content is browsed, the larger $h_{content}/h_{time}$. $h_{content}$ reflects the degree of interest of the target account in a topic: the more the target account learns about the topic, the larger $h_{content}$. The value of the topic history participation coefficient reflects the participation habit of the target account when it historically participated in a topic, and this habit differs from the behavior pattern the account shows when it takes part in abnormal account activities;
step S104: obtaining the topic labels and establishing a habit model for each topic of the target account, where the habit model of the i-th topic is $H_i$, $H_i = \{H_{label}, h_i\}$, where $H_{label}$ denotes the set of labels contained in the i-th topic.
Further, step S200 includes:
step S201: obtaining the last login time of the target account, and setting the period of length $T_1$ after that login time as the monitoring time period;
step S202: calculating the monitoring-period behavior parameter $k$ of the target account in topic participation, $k = \beta_1 (k_{content}/k_{time}) + \beta_2 k_{num}$, where $k_{content}$ denotes the monitored content browsing amount of the target account in the monitoring time period, $k_{time}$ denotes the content browsing time of the target account in the monitoring time period, $k_{num}$ denotes the number of jump operations performed by the target account in the monitoring time period, $\beta_1$ is the weight coefficient of the $(k_{content}/k_{time})$ term, and $\beta_2$ is the weight coefficient of the $k_{num}$ term, where $k_{time} \le T_1$; the monitored content browsing amount is the number of words of content browsed by the target account in the monitoring time period;
step S203: acquiring the total number of historical topic participations of the target account and the number of historical participations in each topic, and calculating the historical participation behavior parameter of each topic, where the historical participation behavior parameter of the i-th topic is $p_i$, $p_i = (n_i/N_p) \times h_i$, $n_i$ denotes the number of historical participations in the i-th topic, and $N_p$ denotes the total number of historical topic participations of the target account; the historical participation behavior parameters of all topics are summed to obtain the historical behavior parameter $P$ of the target account, $P = \sum_i p_i$;
step S204: calculating $\varepsilon_1 = P - k$ and taking the absolute value of $\varepsilon_1$ as $\varepsilon_2$, $\varepsilon_2 = |\varepsilon_1|$, where $\varepsilon_2$ represents the degree of difference between the operation behavior of the target account in the monitoring time period and the behavior habit model of the target account.
Further, step S300 includes:
step S301: acquiring the records of the topics in which the target account participated in the monitoring time period, the label set of the participated topics being $C_{label}$; acquiring the push records of the target account in the monitoring time period, the content label set of the push records being $M_{label}$; and acquiring the search records of the target account in the monitoring time period, the label set of the search content being $S_{label}$;
step S302: calculating the first characteristic parameter $\delta_1$, $\delta_1 = NUM(C_{label} \cap S_{label}) / NUM(C_{label} \cap M_{label})$, where $NUM$ is a counting function that counts the number of elements in a set.
$C_{label} \cap S_{label}$ represents the overlap between the topics in which the target account participated and its search content in the monitoring time period, and $C_{label} \cap M_{label}$ represents the overlap between the participated topics and the push records in the monitoring time period. Owners of abnormal accounts purposefully influence the development of certain topic content: before the content becomes popular, they search for the topic, inject subjective viewpoints and manufacture topic heat. Actively manufacturing hot topics in this way is a common means of abnormal account owners, and it leads users who passively receive topic information to follow the trend. The larger $NUM(C_{label} \cap S_{label})$, the more the topic participation of the account tends to be active; the larger $NUM(C_{label} \cap M_{label})$, the more it tends to be passive. The first characteristic parameter $\delta_1$ is used to distinguish the tendency of an account when participating in topics, so that account behavior with an active participation tendency can be intercepted;
when the content label set $M_{label}$ is unavailable, the set $H'_{label}$ obtained by intersecting the $H_{label}$ sets may be used in place of $M_{label}$.
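The fallback for a missing push-record label set can be sketched as follows. This is a minimal illustration, assuming each habit model exposes its $H_{label}$ as a Python set; the function name `fallback_push_labels` and the empty-input behavior are assumptions, not part of the described method.

```python
def fallback_push_labels(habit_label_sets):
    """Return H'_label: the intersection of the per-topic H_label sets."""
    sets = [set(s) for s in habit_label_sets]
    if not sets:
        # Assumption: with no habit models there is nothing to intersect,
        # so an empty label set is returned.
        return set()
    return set.intersection(*sets)


if __name__ == "__main__":
    # Two hypothetical topics that share only the label "news".
    print(fallback_push_labels([{"sports", "news"}, {"news", "music"}]))
```

The result can then be passed wherever $M_{label}$ would otherwise be used when computing $\delta_1$.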
Further, step S400 includes:
step S401: acquiring the content released by the target account in the target application in the monitoring time period, and extracting the characteristics of the released content;
step S402: calculating the repetition rate $f_j$ of the j-th content feature of the target account in the monitoring time period, $f_j = e_j / E$, where $e_j$ denotes the number of items sent by the target account in the monitoring time period that carry the j-th content feature, and $E$ denotes the total number of content items sent by the target account in the monitoring time period;
step S403: arranging the repetition rates of the features from high to low and setting a main feature boundary value; when $f_{cb_n} - f_{cb_{n+1}} > \gamma$ is detected, the content features corresponding to $f_{cb_n}$ and to the repetition rates ranked before $f_{cb_n}$ are taken as the main features, where $f_{cb_n}$ denotes the repetition rate of the n-th content feature in the sorted sequence of content feature repetition rates, $f_{cb_{n+1}}$ denotes the repetition rate of the (n+1)-th content feature in that sequence, and $\gamma$ denotes the main feature boundary value;
step S404: summing the repetition rates of the main features to obtain the main feature repetition rate $F$, and calculating the repetition behavior parameter $R_1$, $R_1 = F \times r_{in}$, where $r_{in}$ denotes the average time interval between release items of the target account in the monitoring time period; a release item is a set of content sent to the target application after the target account finishes editing it, and one content set corresponds to one content release item;
step S405: extracting the reply behavior of the target account after it is commented on in the monitoring time period, acquiring the content with which the target account replies after being commented on, and calculating the reply rate $R_2$ of the target account, $R_2 = Q_t / Q_r$, where $Q_t$ denotes the amount of data sent by the target account in the monitoring time period, and $Q_r$ denotes the amount of data in the replies of the target account after being commented on in the monitoring time period;
$R_2$ is an index of the interaction of the target account with other accounts in topics, and a larger $R_2$ indicates stronger interaction. It serves as an index for judging whether the target account is operated mechanically: a mechanically operated account has difficulty interacting with random comments, which allows further checking of whether the target account is operated mechanically;
step S406: calculating the second characteristic parameter $\delta_2$, $\delta_2 = \mu_1 R_1 + \mu_2 R_2$, where $\mu_1$ is the weight coefficient of $R_1$ and $\mu_2$ is the weight coefficient of $R_2$.
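As an illustration of steps S404 to S406, the following minimal sketch combines $R_1$, $R_2$ and $\delta_2$ exactly as the formulas are written above. The function names, the default weights of 0.5 and the handling of $Q_r = 0$ are assumptions; the text does not specify weight values or what happens when the account has sent no replies.

```python
def repetition_behavior(main_repetition_rate, avg_interval):
    """R_1 = F * r_in (step S404)."""
    return main_repetition_rate * avg_interval


def reply_rate(sent_amount, reply_amount):
    """R_2 = Q_t / Q_r, as written in step S405."""
    if reply_amount == 0:
        # Assumption: the text does not define R_2 when Q_r is zero,
        # so this sketch refuses to guess a value.
        raise ValueError("Q_r is zero; R_2 is undefined in the text")
    return sent_amount / reply_amount


def second_characteristic(F, r_in, q_t, q_r, mu1=0.5, mu2=0.5):
    """delta_2 = mu_1 * R_1 + mu_2 * R_2 (step S406).

    mu1 and mu2 default to 0.5 purely for illustration.
    """
    return mu1 * repetition_behavior(F, r_in) + mu2 * reply_rate(q_t, q_r)


if __name__ == "__main__":
    # Hypothetical values: F = 0.8, r_in = 10 s, Q_t = 200, Q_r = 50.
    print(second_characteristic(0.8, 10.0, 200, 50))
```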
Further, step S500 includes:
step S501: setting a management threshold $\varepsilon_0$; when $\varepsilon_2 > \varepsilon_0$ occurs for the target account, setting the target account as an early warning characteristic account;
step S502: creating an early warning feature $X$ according to the early warning template account, where $X$ is the three-element set $\{C_{label}, \delta_1, \delta_2\}$;
in an early warning feature, $C_{label}$ represents the attack range, $\delta_1$ represents the attack tendency toward $C_{label}$, and $\delta_2$ represents the mechanical behavior; the tendency and mechanical behavior of the target account are used to judge whether it is controlled by an abnormal account owner, the attack range of the abnormal account owner is given, and an account exhibiting the mechanical behavior is set as an abnormal account;
step S503: counting the number of accounts with the early warning feature within the early warning time period $T_2$; when this number exceeds the early warning number threshold $N_X$ within $T_2$, feeding back the information of the accounts with the early warning feature, together with the corresponding early warning feature, to the Internet platform manager.
To better implement the method, an Internet-based application monitoring and early warning data management system is also provided, comprising an operation record extraction module, a behavior comparison module, a characteristic parameter calculation module and an early warning module. The operation record extraction module is used for extracting the topic participation records of the target account, the historical operation records of the target account and the operation behavior of the target account in the monitoring time period; the behavior comparison module is used for comparing the operation behavior of the target account in the monitoring time period with the behavior habit model of the target account to determine the difference between them; the characteristic parameter calculation module is used for calculating the first characteristic parameter and the second characteristic parameter; and the early warning module is used for extracting early warning features and generating early warning information to be fed back to the Internet platform manager.
Further, the behavior comparison module includes a historical participation coefficient calculation unit, a behavior parameter calculation unit, a historical behavior parameter calculation unit and a behavior habit model difference degree calculation unit. The historical participation coefficient calculation unit is used for calculating the topic history participation coefficients; the behavior parameter calculation unit is used for calculating the monitoring-period behavior parameter; the historical behavior parameter calculation unit is used for calculating the historical behavior parameter; and the behavior habit model difference degree calculation unit is used for calculating the degree of difference from the behavior habit model.
Further, the characteristic parameter calculation module includes a participation topic record acquisition unit, a push record acquisition unit, a search record acquisition unit, a first characteristic parameter calculation unit, a characteristic repetition rate calculation unit, a main characteristic extraction unit, a repetition behavior parameter calculation unit, a target account reply rate calculation unit and a second characteristic parameter calculation unit. The participation topic record acquisition unit is used for acquiring the records of the topics in which the target account participated in the monitoring time period and extracting the label set of the participated topics; the push record acquisition unit is used for acquiring the push records of the target account in the monitoring time period and extracting the content label set of the push records; the search record acquisition unit is used for acquiring the search records of the target account in the monitoring time period and extracting the label set of the search content; the first characteristic parameter calculation unit is used for calculating the first characteristic parameter; the characteristic repetition rate calculation unit is used for calculating the repetition rate of each content feature; the main characteristic extraction unit is used for extracting the main features; the repetition behavior parameter calculation unit is used for calculating the repetition behavior parameter; the target account reply rate calculation unit is used for calculating the reply rate of the target account; and the second characteristic parameter calculation unit is used for calculating the second characteristic parameter.
Further, the early warning module includes a difference evaluation unit, an early warning feature creation unit, an early warning feature comparison unit and an early warning feedback unit. The difference evaluation unit is used for evaluating the degree of difference from the behavior habit model of the target account; the early warning feature creation unit is used for creating early warning features; the early warning feature comparison unit is used for comparing accounts against the early warning features and extracting the information of the accounts that exhibit them; and the early warning feedback unit is used for feeding back the information of the accounts with the early warning features, together with the corresponding early warning features, to the Internet platform manager.
Compared with the prior art, the invention has the following beneficial effects: the method judges whether the target account is controlled by an abnormal account owner by evaluating the tendency and mechanical behavior of the target account, creates early warning features according to the early warning template account, and checks the accounts of the Internet application against these features to find accounts with similar attack targets and attack behaviors, thereby further improving the ability to identify abnormal accounts.
Drawings
The accompanying drawings are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification, illustrate the invention and together with the embodiments of the invention, serve to explain the invention. In the drawings:
FIG. 1 is a schematic diagram of an Internet-based application monitoring and early warning data management system;
FIG. 2 is a flow chart of the Internet-based application monitoring and early warning data management method.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
Referring to FIG. 1 and FIG. 2, the present invention provides the following technical solution. The method comprises the following steps:
Step S100: setting an application that provides an Internet social media service as the target application, setting an account of the target application as the target account, extracting the topic participation records and historical operation records of the target account in the target application, and constructing a behavior habit model of the target account in the target application;
wherein, step S100 includes:
step S101: extracting a participation topic record of a target account and a historical operation record of the target account;
step S102: acquiring, for each topic browsed by the target account, the time of browsing, the browsing duration within the topic and the jump operation records, where a jump operation is an operation by which the target account, while browsing a topic, obtains further content within the scope of that topic;
step S103: calculating the topic history participation coefficients of the target account in topic participation, where the topic history participation coefficient of the i-th topic is $h_i$, $h_i = \alpha_1 (h_{content}/h_{time}) + \alpha_2 h_{num}$, where $h_{content}$ denotes the amount of topic content browsed by the target account in the i-th topic, $h_{time}$ denotes the content browsing time of the target account in the i-th topic, $h_{num}$ denotes the number of jump operations performed by the target account in the i-th topic, $\alpha_1$ is the weight coefficient of the $(h_{content}/h_{time})$ term, and $\alpha_2$ is the weight coefficient of the $h_{num}$ term; the amount of topic content browsed is the number of words of topic content browsed by the target account;
step S104: obtaining the topic labels and establishing a habit model for each topic of the target account, where the habit model of the i-th topic is $H_i$, $H_i = \{H_{label}, h_i\}$, where $H_{label}$ denotes the set of labels contained in the i-th topic.
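Steps S103 and S104 can be sketched as below. This is a minimal illustration under stated assumptions: the record layout (`labels`, `words`, `seconds`, `jumps`), the class name `TopicHabit` and the default weights are hypothetical, since the text specifies only the formula for $h_i$ and the pair $\{H_{label}, h_i\}$.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TopicHabit:
    """Habit model H_i = {H_label, h_i} of one topic."""
    labels: frozenset  # H_label: labels contained in the topic
    h: float           # h_i: topic history participation coefficient


def participation_coefficient(h_content, h_time, h_num, alpha1, alpha2):
    """h_i = alpha_1 * (h_content / h_time) + alpha_2 * h_num."""
    if h_time <= 0:
        # Assumption: a topic with no browsing time is rejected.
        raise ValueError("h_time must be positive")
    return alpha1 * (h_content / h_time) + alpha2 * h_num


def build_habit_models(records, alpha1=1.0, alpha2=1.0):
    """Build one TopicHabit per topic from hypothetical history records."""
    models = {}
    for topic_id, rec in records.items():
        h = participation_coefficient(
            rec["words"], rec["seconds"], rec["jumps"], alpha1, alpha2
        )
        models[topic_id] = TopicHabit(frozenset(rec["labels"]), h)
    return models


if __name__ == "__main__":
    history = {"t1": {"labels": ["news"], "words": 1200, "seconds": 60, "jumps": 3}}
    print(build_habit_models(history, alpha1=0.5, alpha2=0.1))
```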
Step S200: setting a monitoring time period, monitoring the operation behavior of the target account in the monitoring time period, and comparing that operation behavior with the behavior habit model of the target account to determine the difference between them;
wherein, step S200 includes:
step S201: obtaining the last login time of the target account, and setting the period of length $T_1$ after that login time as the monitoring time period;
step S202: calculating the monitoring-period behavior parameter $k$ of the target account in topic participation, $k = \beta_1 (k_{content}/k_{time}) + \beta_2 k_{num}$, where $k_{content}$ denotes the monitored content browsing amount of the target account in the monitoring time period, $k_{time}$ denotes the content browsing time of the target account in the monitoring time period, $k_{num}$ denotes the number of jump operations performed by the target account in the monitoring time period, $\beta_1$ is the weight coefficient of the $(k_{content}/k_{time})$ term, and $\beta_2$ is the weight coefficient of the $k_{num}$ term, where $k_{time} \le T_1$; the monitored content browsing amount is the number of words of content browsed by the target account in the monitoring time period;
in practice, existing supervision methods place accounts with abnormal behavior on an abnormal account list. To evade this list, the owner of an abnormal account may carry out normal operations through the account so as to escape the existing supervision platform. The behavior of the owner while evading supervision differs from the behavior shown while taking part in a public opinion attack, and this difference is obtained by comparing the historical participation coefficients with the monitoring-period behavior parameter;
step S203: acquiring the total number of historical topic participations of the target account and the number of historical participations in each topic, and calculating the historical participation behavior parameter of each topic, where the historical participation behavior parameter of the i-th topic is $p_i$, $p_i = (n_i/N_p) \times h_i$, $n_i$ denotes the number of historical participations in the i-th topic, and $N_p$ denotes the total number of historical topic participations of the target account; the historical participation behavior parameters of all topics are summed to obtain the historical behavior parameter $P$ of the target account, $P = \sum_i p_i$;
step S204: calculating $\varepsilon_1 = P - k$ and taking the absolute value of $\varepsilon_1$ as $\varepsilon_2$, $\varepsilon_2 = |\varepsilon_1|$, where $\varepsilon_2$ represents the degree of difference between the operation behavior of the target account in the monitoring time period and the behavior habit model of the target account.
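The calculation in steps S202 to S204 can be sketched as follows. This is a hedged illustration rather than a definitive implementation: the function names are hypothetical, and $N_p$ is assumed here to equal the sum of the per-topic counts $n_i$, which the text implies but does not state outright.

```python
def monitoring_parameter(k_content, k_time, k_num, beta1, beta2, t1):
    """k = beta_1 * (k_content / k_time) + beta_2 * k_num, with k_time <= T_1."""
    if not 0 < k_time <= t1:
        raise ValueError("k_time must be positive and no larger than T_1")
    return beta1 * (k_content / k_time) + beta2 * k_num


def historical_parameter(topics):
    """P = sum_i (n_i / N_p) * h_i for an iterable of (n_i, h_i) pairs."""
    topics = list(topics)
    n_p = sum(n for n, _ in topics)  # assumption: N_p is the sum of all n_i
    if n_p == 0:
        raise ValueError("the account has no historical topic participation")
    return sum((n / n_p) * h for n, h in topics)


def habit_difference(p, k):
    """epsilon_2 = |epsilon_1| = |P - k|."""
    return abs(p - k)


if __name__ == "__main__":
    # Hypothetical history: topic 1 joined 3 times (h=10), topic 2 once (h=2).
    P = historical_parameter([(3, 10.0), (1, 2.0)])
    k = monitoring_parameter(600, 60, 2, beta1=0.5, beta2=0.5, t1=3600)
    print(P, k, habit_difference(P, k))
```

Weighting each $h_i$ by $n_i/N_p$ makes frequently visited topics dominate $P$, so a short burst of atypical behavior in the monitoring time period shows up as a larger $\varepsilon_2$.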
Step S300: acquiring the topic participation records, push records and search records of the target account in the monitoring time period, taking as the first characteristic the difference between the degree of overlap of the participated topics with the push records and the degree of overlap of the participated topics with the search content, and calculating the first characteristic parameter;
wherein, step S300 includes:
step S301: acquiring the records of the topics in which the target account participated in the monitoring time period, the label set of the participated topics being $C_{label}$; acquiring the push records of the target account in the monitoring time period, the content label set of the push records being $M_{label}$; and acquiring the search records of the target account in the monitoring time period, the label set of the search content being $S_{label}$;
step S302: calculating the first characteristic parameter $\delta_1$, $\delta_1 = NUM(C_{label} \cap S_{label}) / NUM(C_{label} \cap M_{label})$, where $NUM$ is a counting function that counts the number of elements in a set.
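The first characteristic parameter reduces to two set intersections, as the following sketch shows. The function name is hypothetical, and the handling of an empty $C_{label} \cap M_{label}$ is an assumption, because the text does not define $\delta_1$ when the denominator is zero.

```python
import math


def first_characteristic(c_label, s_label, m_label):
    """delta_1 = NUM(C ∩ S) / NUM(C ∩ M) over label sets."""
    c = set(c_label)
    active = len(c & set(s_label))    # overlap with search content
    passive = len(c & set(m_label))   # overlap with push records
    if passive == 0:
        # Assumption: no overlap with pushed content is treated as fully
        # active participation when any searched topic was joined, else 0.
        return math.inf if active else 0.0
    return active / passive


if __name__ == "__main__":
    # Hypothetical labels: two joined topics were searched, one was pushed.
    print(first_characteristic({"a", "b", "c"}, {"a", "b"}, {"c"}))
```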
Step S400: acquiring the behavior of the target account in topics during the monitoring time period, extracting the characteristics of the content sent by the target account and its reply characteristics in the monitoring time period as the second characteristic, and calculating the second characteristic parameter from the repetition rate of the sent content and the reply rate;
wherein, step S400 includes:
step S401: acquiring the content released by the target account in the target application in the monitoring time period, and extracting the characteristics of the released content;
methods for extracting the characteristics of the released content include a bag-of-words model, an N-gram model, a word vector model and a TF-IDF model;
step S402: calculating the repetition rate $f_j$ of the j-th content feature of the target account in the monitoring time period, $f_j = e_j / E$, where $e_j$ denotes the number of items sent by the target account in the monitoring time period that carry the j-th content feature, and $E$ denotes the total number of content items sent by the target account in the monitoring time period;
step S403: arranging the repetition rates of the features from high to low and setting a main feature boundary value; when $f_{cb_n} - f_{cb_{n+1}} > \gamma$ is detected, the content features corresponding to $f_{cb_n}$ and to the repetition rates ranked before $f_{cb_n}$ are taken as the main features, where $f_{cb_n}$ denotes the repetition rate of the n-th content feature in the sorted sequence of content feature repetition rates, $f_{cb_{n+1}}$ denotes the repetition rate of the (n+1)-th content feature in that sequence, and $\gamma$ denotes the main feature boundary value;
for example, suppose the repetition rates of 6 content features are $f_1$, $f_2$, $f_3$, $f_4$, $f_5$ and $f_6$, and arranging them from high to low gives the sequence $f_3, f_5, f_1, f_6, f_4, f_2$;
if the calculation gives $f_3 - f_5 < \gamma$, $f_5 - f_1 < \gamma$ and $f_1 - f_6 > \gamma$, the content features corresponding to $f_3$, $f_5$ and $f_1$ are taken as the main features;
step S404: summing the repetition rates of the main features to obtain the main feature repetition rate $F$, and calculating the repetition behavior parameter $R_1$, $R_1 = F \times r_{in}$, where $r_{in}$ denotes the average time interval between release items of the target account in the monitoring time period; a release item is a set of content sent to the target application after the target account finishes editing it, and one content set corresponds to one content release item;
step S405: extracting the reply behavior of the target account after it is commented on in the monitoring time period, acquiring the content with which the target account replies after being commented on, and calculating the reply rate $R_2$ of the target account, $R_2 = Q_t / Q_r$, where $Q_t$ denotes the amount of data sent by the target account in the monitoring time period, and $Q_r$ denotes the amount of data in the replies of the target account after being commented on in the monitoring time period;
step S406: calculating the second characteristic parameter $\delta_2$, $\delta_2 = \mu_1 R_1 + \mu_2 R_2$, where $\mu_1$ is the weight coefficient of $R_1$ and $\mu_2$ is the weight coefficient of $R_2$.
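The repetition-rate and main-feature selection of steps S402 to S404 can be sketched as below. Several points are assumptions made for illustration: each release item is represented as a collection of already extracted content features, the scan stops at the first gap larger than $\gamma$, and all features are kept when no gap exceeds $\gamma$. The numeric rates in the demo are invented so that the ordering matches the six-feature example above.

```python
from collections import Counter


def repetition_rates(posts):
    """f_j = e_j / E, where each post is a collection of content features."""
    total = len(posts)
    if total == 0:
        return {}
    # A feature is counted once per post, even if it occurs several times.
    counts = Counter(feature for post in posts for feature in set(post))
    return {feature: n / total for feature, n in counts.items()}


def main_features(rates, gamma):
    """Return (main feature names, F) using the boundary value gamma."""
    ranked = sorted(rates.items(), key=lambda kv: kv[1], reverse=True)
    cut = len(ranked)  # assumption: keep everything if no gap exceeds gamma
    for n in range(len(ranked) - 1):
        if ranked[n][1] - ranked[n + 1][1] > gamma:
            cut = n + 1  # keep the n-th feature and all ranked before it
            break
    chosen = ranked[:cut]
    return [name for name, _ in chosen], sum(rate for _, rate in chosen)


if __name__ == "__main__":
    rates = {"f1": 0.80, "f2": 0.30, "f3": 0.90,
             "f4": 0.35, "f5": 0.85, "f6": 0.40}
    names, F = main_features(rates, gamma=0.2)
    print(names, F)
```

With these illustrative rates the selected features are f3, f5 and f1, in line with the example in the text, and their summed rate $F$ feeds directly into $R_1 = F \times r_{in}$.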
Step S500: setting a management threshold, evaluating the degree of difference from the behavior habit model of the target account, extracting the characteristic components corresponding to the characteristic parameters to form early warning features, extracting the accounts that exhibit the early warning features from all accounts of the Internet application, and generating early warning information to be fed back to the Internet platform manager.
Wherein, step S500 includes:
step S501: setting a management threshold $\varepsilon_0$; when $\varepsilon_2 > \varepsilon_0$ occurs for the target account, setting the target account as an early warning characteristic account;
step S502: creating an early warning feature $X$ according to the early warning template account, where $X$ is the set $\{C_{label}, \delta_1, \delta_2\}$;
step S503: counting the number of accounts with the early warning feature within the early warning time period $T_2$; when this number exceeds the early warning number threshold $N_X$ within $T_2$, feeding back the information of the accounts with the early warning feature, together with the corresponding early warning feature, to the Internet platform manager.
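Steps S501 to S503 can be sketched as follows. The text does not state how an account is judged to have the early warning feature, so this sketch takes that rule as a caller-supplied predicate `has_feature`; that predicate, the class name `WarningFeature` and the function names are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WarningFeature:
    """Early warning feature X = {C_label, delta_1, delta_2}."""
    c_label: frozenset  # attack range
    delta1: float       # attack tendency toward C_label
    delta2: float       # mechanical behavior


def create_warning_feature(epsilon2, epsilon0, c_label, delta1, delta2):
    """Create X only when epsilon_2 exceeds the management threshold."""
    if epsilon2 > epsilon0:
        return WarningFeature(frozenset(c_label), delta1, delta2)
    return None


def accounts_to_report(accounts, feature, has_feature, n_x):
    """Return flagged accounts if their count in T_2 exceeds N_X, else []."""
    flagged = [a for a in accounts if has_feature(a, feature)]
    return flagged if len(flagged) > n_x else []


if __name__ == "__main__":
    x = create_warning_feature(2.0, 1.5, {"topicA"}, 2.0, 6.0)
    seen = [{"id": 1, "labels": {"topicA"}}, {"id": 2, "labels": {"other"}}]
    # Hypothetical matching rule: the account touched the attack range.
    overlap = lambda account, feat: bool(account["labels"] & feat.c_label)
    print(accounts_to_report(seen, x, overlap, n_x=0))
```

Here `accounts` is assumed to contain only accounts observed within $T_2$; filtering by time is left to the caller.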
The system comprises:
an operation record extraction module, a behavior comparison module, a characteristic parameter calculation module and an early warning module. The operation record extraction module is used for extracting the topic participation records of the target account, the historical operation records of the target account and the operation behavior of the target account in the monitoring time period; the behavior comparison module is used for comparing the operation behavior of the target account in the monitoring time period with the behavior habit model of the target account to determine the difference between them; the characteristic parameter calculation module is used for calculating the first characteristic parameter and the second characteristic parameter; and the early warning module is used for extracting early warning features and generating early warning information to be fed back to the Internet platform manager.
The behavior comparison module includes a historical participation coefficient calculation unit, a behavior parameter calculation unit, a historical behavior parameter calculation unit and a behavior habit model difference degree calculation unit. The historical participation coefficient calculation unit is used for calculating the topic history participation coefficients; the behavior parameter calculation unit is used for calculating the monitoring-period behavior parameter; the historical behavior parameter calculation unit is used for calculating the historical behavior parameter; and the behavior habit model difference degree calculation unit is used for calculating the degree of difference from the behavior habit model.
The characteristic parameter calculation module includes a participation topic record acquisition unit, a push record acquisition unit, a search record acquisition unit, a first characteristic parameter calculation unit, a characteristic repetition rate calculation unit, a main characteristic extraction unit, a repetition behavior parameter calculation unit, a target account reply rate calculation unit and a second characteristic parameter calculation unit. The participation topic record acquisition unit is used for acquiring the records of the topics in which the target account participated in the monitoring time period and extracting the label set of the participated topics; the push record acquisition unit is used for acquiring the push records of the target account in the monitoring time period and extracting the content label set of the push records; the search record acquisition unit is used for acquiring the search records of the target account in the monitoring time period and extracting the label set of the search content; the first characteristic parameter calculation unit is used for calculating the first characteristic parameter; the characteristic repetition rate calculation unit is used for calculating the repetition rate of each content feature; the main characteristic extraction unit is used for extracting the main features; the repetition behavior parameter calculation unit is used for calculating the repetition behavior parameter; the target account reply rate calculation unit is used for calculating the reply rate of the target account; and the second characteristic parameter calculation unit is used for calculating the second characteristic parameter.
The early warning module includes a difference evaluation unit, an early warning feature creation unit, an early warning feature comparison unit and an early warning feedback unit. The difference evaluation unit is used for evaluating the degree of difference from the behavior habit model of the target account; the early warning feature creation unit is used for creating early warning features; the early warning feature comparison unit is used for comparing accounts against the early warning features and extracting the information of the accounts that exhibit them; and the early warning feedback unit is used for feeding back the information of the accounts with the early warning features, together with the corresponding early warning features, to the Internet platform manager.
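One way to picture how the four modules cooperate is the following sketch, in which each module is represented by a plain callable and the data flows in the order of steps S100 to S500. The class name `MonitoringSystem`, the callable signatures and the stub values are assumptions for illustration only; the text does not prescribe any programming interface.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional


@dataclass
class MonitoringSystem:
    # Operation record extraction module: account id -> raw records.
    extract_records: Callable[[str], Any]
    # Behavior comparison module: records -> epsilon_2.
    compare_behavior: Callable[[Any], float]
    # Characteristic parameter calculation module: records -> (delta_1, delta_2).
    compute_parameters: Callable[[Any], tuple]
    # Early warning module: (account id, epsilon_2, deltas) -> warning or None.
    warn: Callable[[str, float, tuple], Optional[dict]]

    def run(self, account_id):
        """Pass one target account through the four modules in order."""
        records = self.extract_records(account_id)
        epsilon2 = self.compare_behavior(records)
        deltas = self.compute_parameters(records)
        return self.warn(account_id, epsilon2, deltas)


if __name__ == "__main__":
    # Stub modules with made-up values, only to show the wiring.
    system = MonitoringSystem(
        extract_records=lambda account: {"id": account},
        compare_behavior=lambda records: 2.0,
        compute_parameters=lambda records: (2.0, 6.0),
        warn=lambda account, eps, d: {"account": account, "deltas": d} if eps > 1.5 else None,
    )
    print(system.run("u1"))
```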
It is noted that relational terms such as first and second, and the like are used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus.
Finally, it should be noted that the foregoing is only a preferred embodiment of the present invention and does not limit it. Although the invention has been described in detail with reference to the foregoing embodiments, those skilled in the art may still modify the technical solutions described therein or make equivalent replacements of some of their technical features. Any modification, equivalent replacement or improvement made within the spirit and principle of the present invention shall fall within the scope of protection of the present invention.