CN109255391B - Method, device and storage medium for identifying malicious user - Google Patents


Info

Publication number
CN109255391B
Authority
CN
China
Prior art keywords
user
sample set
malicious
value
training
Prior art date
Legal status
Active
Application number
CN201811161527.7A
Other languages
Chinese (zh)
Other versions
CN109255391A (en)
Inventor
王非池
Current Assignee
Wuhan Douyu Network Technology Co Ltd
Original Assignee
Wuhan Douyu Network Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Wuhan Douyu Network Technology Co Ltd
Priority to CN201811161527.7A
Publication of CN109255391A
Application granted
Publication of CN109255391B


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/243Classification techniques relating to the number of classes
    • G06F18/24323Tree-organised classifiers
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/47End-user applications
    • H04N21/478Supplemental services, e.g. displaying phone caller identification, shopping application
    • H04N21/4788Supplemental services, e.g. displaying phone caller identification, shopping application communicating with other users, e.g. chatting
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/47End-user applications
    • H04N21/488Data services, e.g. news ticker
    • H04N21/4884Data services, e.g. news ticker for displaying subtitles

Abstract

An embodiment of the invention provides a method, an apparatus, and a storage medium for identifying malicious users. The method comprises the following steps: acquiring a training sample set, where the training sample set comprises features of training samples and labels corresponding to the training samples, and a label corresponding to a training sample is used to identify whether a user is a malicious user; determining an optimal segmentation feature and an optimal segmentation threshold from the training sample set; splitting the training sample set to obtain a first sample set and a second sample set, and taking the mean value of each feature in the first sample set and the second sample set as a target output value; judging whether the user is a malicious user according to the target output value; and setting the labels corresponding to training samples in the first sample set and the second sample set judged to be malicious users as malicious users, and the labels corresponding to training samples judged to be non-malicious users as non-malicious users. With this scheme, the behavior characteristics of users can be accurately analyzed, and malicious users can be identified according to those behavior characteristics.

Description

Method, device and storage medium for identifying malicious user
Technical Field
The present invention relates to the field of software technologies, and in particular, to a method, an apparatus, and a storage medium for identifying a malicious user.
Background
During the development of a live streaming platform, underground "black market" groups frequently register platform accounts in batches for their own gain and use these accounts for malicious behaviors such as spamming comments and bullet-screen (danmaku) messages and artificially inflating popularity, in an attempt to boost the apparent heat of a particular live room; this skews the resources of the whole platform toward them and profits them unfairly. The live streaming platform needs to ban the users behind such malicious accounts; however, the behavior of malicious accounts is difficult to mine directly from massive data. Mining malicious users with an algorithm is a feasible approach.
Malicious users are operated in batches by these underground groups, so their behaviors and actions often show a certain similarity to one another and a certain difference from the behavior patterns of normal users. Common classification algorithms for malicious users include decision trees, support vector machines, and perceptrons. These algorithms are usually limited to a binary judgment of whether a user is malicious: they cannot provide a continuously distributed malicious-user evaluation index, the mining results of models built on them are difficult to regulate manually, the interpretability of the mining results is poor, and the reasons behind a judgment cannot be generated automatically.
Disclosure of Invention
An embodiment of the invention provides a method, an apparatus, and a storage medium for identifying malicious users, to solve the problem of low accuracy in mining malicious users under existing mechanisms.
In a first aspect, the present invention provides a method for identifying a malicious user, the method comprising:
acquiring a training sample set, wherein the training sample set is of a regression tree structure, the training sample set comprises characteristics of training samples and labels corresponding to the training samples, and the labels corresponding to the training samples are used for identifying whether a user is a malicious user;
determining the optimal segmentation characteristic and the optimal segmentation threshold value from the training sample set;
splitting the training sample set to obtain a first sample set and a second sample set, wherein the first sample set is the set of samples whose value of the segmentation feature is not greater than the segmentation threshold, and the second sample set is the set of samples whose value of the segmentation feature is greater than the segmentation threshold;
taking the mean value of the features in the first sample set and the second sample set as a target output value;
judging whether the user is a malicious user or not according to the target output value;
setting labels corresponding to training samples which are judged to be malicious users in the first sample set and the second sample set as the malicious users, and setting labels corresponding to training samples which are judged to be non-malicious users in the first sample set and the second sample set as the non-malicious users.
In some possible designs, the target output value is used to evaluate a level of maliciousness of the user.
In some possible designs, the features of a training sample include at least the number of abnormal bullet-screen (danmaku) messages sent by the user, the number of plays by the user, the number of comments posted by the user in a first time period, the number of identical comments posted by the user in a second time period, and the balance of the user's account.
In some possible designs, the method further comprises:
setting a ban threshold;
the judging whether the user is a malicious user according to the target output value comprises:
if the target output value is greater than the ban threshold, determining that the malicious level of the user meets the judgment condition for a malicious user.
In some possible designs, the training sample includes a first feature, a second feature, and a third feature, and after determining whether the user is a malicious user according to the target output value, the method further includes:
generating a feedback result, the feedback result comprising: the first characteristic has a value not greater than a first value, the second characteristic has a value greater than a second value, and the third characteristic has a value not greater than a third value.
In a second aspect, an embodiment of the present invention provides an apparatus for identifying a malicious user, which has the function of implementing the method for identifying a malicious user provided in the first aspect above. The function may be implemented by hardware, or by hardware executing corresponding software. The hardware or software includes one or more modules corresponding to the above function, and the modules may be software and/or hardware. The apparatus comprises:
the system comprises an acquisition module, a judgment module and a processing module, wherein the acquisition module is used for acquiring a training sample set, the training sample set is of a regression tree structure, the training sample set comprises the characteristics of training samples and labels corresponding to the training samples, and the labels corresponding to the training samples are used for identifying whether a user is a malicious user or not;
a processing module, configured to determine the optimal segmentation feature and the optimal segmentation threshold from the training sample set; split the training sample set to obtain a first sample set and a second sample set, wherein the first sample set is the set of samples whose value of the segmentation feature is not greater than the segmentation threshold, and the second sample set is the set of samples whose value of the segmentation feature is greater than the segmentation threshold; take the mean value of the features in the first sample set and the second sample set as a target output value; judge whether the user is a malicious user according to the target output value; and set the labels corresponding to training samples in the first sample set and the second sample set judged to be malicious users as malicious users, and the labels corresponding to training samples in the first sample set and the second sample set judged to be non-malicious users as non-malicious users.
In some possible designs, the target output value is used to evaluate a level of maliciousness of the user.
In some possible designs, the features of a training sample include at least the number of abnormal bullet-screen (danmaku) messages sent by the user, the number of plays by the user, the number of comments posted by the user in a first time period, the number of identical comments posted by the user in a second time period, and the balance of the user's account.
In some possible designs, the processing module is configured to:
set a ban threshold;
and if the target output value is greater than the ban threshold, determine that the malicious level of the user meets the judgment condition for a malicious user.
In some possible designs, the training sample includes a first feature, a second feature, and a third feature, and the processing module, after determining whether the user is a malicious user according to the target output value, is further configured to:
generating a feedback result, the feedback result comprising: the first characteristic has a value not greater than a first value, the second characteristic has a value greater than a second value, and the third characteristic has a value not greater than a third value.
In a third aspect, an embodiment of the present invention provides an apparatus for identifying a malicious user, including a processor, where the processor is configured to implement, when executing a computer program stored in a memory, the steps in the method for identifying a malicious user as described in the foregoing first aspect embodiment or second aspect embodiment.
In a fourth aspect, an embodiment of the present invention provides a readable storage medium, on which a computer program is stored, where the computer program, when executed by a processor, implements the steps in the method for identifying a malicious user as described in the foregoing first or second aspect embodiment.
One or more technical solutions in the embodiments of the present invention at least have one or more of the following technical effects:
According to the technical solution of the embodiment of the invention, after a training sample set is obtained, the optimal segmentation feature and the optimal segmentation threshold are determined from the training sample set; the training sample set is split to obtain a first sample set and a second sample set, and the mean value of the features in the first sample set and the second sample set is taken as a target output value; whether the user is a malicious user is judged according to the target output value; and the labels corresponding to training samples in the first sample set and the second sample set judged to be malicious users are set as malicious users, while the labels corresponding to training samples judged to be non-malicious users are set as non-malicious users. In this way, the behavior characteristics of users can be analyzed on the basis of massive data, and malicious users can be screened out according to those behavior characteristics. In addition, the network environment of a forum can be maintained by warning or banning such users, providing a good reading environment for users.
Drawings
Fig. 1 is a flowchart of a method for identifying a malicious user according to an embodiment of the present invention;
FIG. 2 is a diagram illustrating a structure of a regression tree according to an embodiment of the present invention;
fig. 3 is a schematic structural diagram of an apparatus for identifying a malicious user according to an embodiment of the present invention;
fig. 4 is a schematic structural diagram of an apparatus for identifying a malicious user according to an embodiment of the present invention.
Detailed Description
The embodiment of the invention provides a method, a device and a storage medium for identifying a malicious user. The method for identifying the malicious user in the embodiment of the invention can be applied to the field of big data processing, for example, the behavior characteristics of the user are analyzed based on mass data, the network environment of forums or live broadcasts is maintained according to the behavior characteristics of the user, and a good reading environment or live broadcast environment is provided for the user.
The technical solutions of the present invention are described in detail below with reference to the drawings and specific embodiments. It should be understood that the specific features in the embodiments and examples of the present application serve to explain, not to limit, the technical solutions of the present application, and the technical features in the embodiments and examples of the present application may be combined with each other as long as they do not conflict.
The term "and/or" herein is merely an association describing an associated object, meaning that three relationships may exist, e.g., a and/or B, may mean: a exists alone, A and B exist simultaneously, and B exists alone. In addition, the character "/" herein generally indicates that the former and latter related objects are in an "or" relationship.
In order to solve the technical problem, the embodiment of the invention provides the following technical scheme:
the algorithm for mining and evaluating the malicious users based on the regression tree utilizes the regression tree as a basic model, so that the model can give continuous malicious level evaluation, and utilizes the natural interpretability advantage of the tree model when the nodes are split, so that the model can effectively obtain rule basis for sample discrimination while classifying the samples. And mining by using a pattern recognition correlation algorithm. Because the platform blocking operation needs to be very careful, the algorithm needs to perform effective grade assessment on malicious users, and the threshold value is conveniently controlled manually. On the other hand, because the platform needs to be sufficiently persuasive to deal with malicious users, a certain sealing reason needs to be given when sealing. This also means that the algorithm not only needs to have good classification performance, but the algorithm also automatically generates a discriminant reason that can be understood by a human.
Examples
Referring to fig. 1, a method for identifying a malicious user in an embodiment of the present invention is described below. The method comprises the following steps:
101. a training sample set is obtained.
The training sample set is of a regression tree structure, the training sample set comprises features of training samples and labels corresponding to the training samples, and the labels corresponding to the training samples are used for identifying whether the user is a malicious user or not.
In some embodiments, the features of a training sample include at least the number of abnormal bullet-screen (danmaku) messages sent by the user, the number of plays by the user, the number of comments posted by the user in a first time period, the number of identical comments posted by the user in a second time period, and the balance of the user's account; an illustrative representation follows.
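For illustration only (not part of the patent text; the field names below are hypothetical), a single training sample built from these features might look like:

```python
# A hypothetical training sample: five behavior features plus a label.
# label = 1.0 marks a malicious user, 0.0 a non-malicious user.
sample = {
    "abnormal_danmaku_count": 12,  # abnormal bullet-screen (danmaku) messages
    "play_count": 3,               # number of plays by the user
    "comments_period1": 40,        # comments posted in the first time period
    "same_comments_period2": 35,   # identical comments posted in the second time period
    "account_balance": 0.5,        # balance of the user's account
    "label": 1.0,
}
```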
102. And determining the optimal segmentation characteristic and the optimal segmentation threshold value from the training sample set.
103. And splitting the training sample set to obtain a first sample set and a second sample set.
The first sample set refers to the set of samples whose value of the segmentation feature is not greater than the segmentation threshold, and the second sample set refers to the set of samples whose value of the segmentation feature is greater than the segmentation threshold.
In some embodiments, the training sample set may be split according to a recursive algorithm. For example, after the training sample set is obtained, it is split recursively based on a regression tree algorithm: each split divides a sample set into a first sample set and a second sample set, and the regression tree algorithm is then applied again to split the first sample set and the second sample set respectively. Each splitting operation can be regarded as a node split of a binary tree; after several splitting operations, all samples fall into leaf nodes of the binary tree. That is, the splitting operations finally yield binary trees over the users, and the binary trees include multiple leaf nodes. Each leaf node is given a value as its output, as sketched below.
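As a minimal Python sketch of this recursive splitting (illustrative only, not the patent's implementation; `build_tree`, `best_split`, and `max_depth` are hypothetical names, and `best_split` stands for the least-squares search described later in this description):

```python
import numpy as np

def build_tree(X, y, depth=0, max_depth=2):
    """Recursively split the sample set (X, y) into a binary tree.

    Each leaf stores the mean label of the samples falling into it,
    which serves as that leaf node's output value.
    """
    if depth >= max_depth or len(np.unique(y)) <= 1:
        return {"leaf": True, "value": float(np.mean(y))}
    j, s = best_split(X, y)      # optimal segmentation feature and threshold
    if j is None:                # no valid split found: make a leaf
        return {"leaf": True, "value": float(np.mean(y))}
    left = X[:, j] <= s          # first sample set R1: feature value <= threshold
    return {
        "leaf": False, "feature": j, "threshold": s,
        "left": build_tree(X[left], y[left], depth + 1, max_depth),
        "right": build_tree(X[~left], y[~left], depth + 1, max_depth),
    }
```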
104. And taking the mean value of the features in the first sample set and the second sample set as a target output value.
105. And judging whether the user is a malicious user or not according to the target output value.
106. Setting labels corresponding to training samples which are judged to be malicious users in the first sample set and the second sample set as the malicious users, and setting labels corresponding to training samples which are judged to be non-malicious users in the first sample set and the second sample set as the non-malicious users.
Compared with the existing mechanism, in the embodiment of the invention, a training sample set is obtained, the training sample set comprises the characteristics of the training samples and labels corresponding to the training samples, and the optimal segmentation characteristics and the optimal segmentation threshold value are determined from the training sample set; splitting a training sample set to obtain a first sample set and a second sample set, and taking the mean value of each feature in the first sample set and the second sample set as a target output value; judging whether the user is a malicious user or not according to the target output value; setting labels corresponding to training samples which are judged to be malicious users in the first sample set and the second sample set as the malicious users, and setting labels corresponding to training samples which are judged to be non-malicious users as the non-malicious users. The label corresponding to the training sample can be used for identifying whether the user is a malicious user, so that by adopting the scheme, the behavior characteristics of the user can be accurately analyzed, and the malicious user can be identified according to the behavior characteristics.
Optionally, in some embodiments of the present application, a way of training the regression tree model is described below. For example, the training sample set contains a plurality of training samples, each training sample has n features (n may be the same or different across training samples), and each feature i (i ∈ (1, n)) has s_i possible values. The optimal segmentation feature j and segmentation threshold s are found among the n features. Given the optimal segmentation feature j and segmentation threshold s, the sample set is segmented into a first sample set R1 and a second sample set R2:

R1(j,s) = {x | x(j) ≤ s}
R2(j,s) = {x | x(j) > s}

where x(j) represents the value of feature j of a training sample. For the two sample sets after splitting (e.g., the first sample set and the second sample set described above), the mean label of each set is defined as the output value ĉ1 or ĉ2 of the first or second sample set:

ĉ1 = ave(y_i | x_i ∈ R1(j,s))
ĉ2 = ave(y_i | x_i ∈ R2(j,s))

For each feature j and each candidate value s, the effect of splitting the sample set on feature j at s is evaluated. In some embodiments, the squared residual between the output values ĉ1, ĉ2 of the first and second sample sets after splitting and the labels y, obtained by least squares, is used as the evaluation index Q(j,s); one way of calculating Q(j,s) is:

Q(j,s) = Σ_{x_i ∈ R1(j,s)} (y_i − ĉ1)² + Σ_{x_i ∈ R2(j,s)} (y_i − ĉ2)²

Traversing all features j in the first sample set and the second sample set and scanning all possible values s of each feature j then yields the optimized residual function

min_{j,s} [ min_{c1} Σ_{x_i ∈ R1(j,s)} (y_i − c1)² + min_{c2} Σ_{x_i ∈ R2(j,s)} (y_i − c2)² ],

from which the optimal segmentation feature and the threshold for dividing the sample set are obtained. The sample set is divided into two parts, and the division continues recursively until a preset depth is reached, at which point the recursive division stops. Finally, the recursively divided sample set is partitioned into a series of sub-sample sets, and the segmentation rule corresponding to each sub-sample set is {R_m(j_1, s_1), R_m(j_2, s_2), ..., R_m(j_n, s_n)}, where j_n, s_n are the feature and threshold used for the n-th split. The output value ĉ_m corresponding to each sub-sample set is used as the output value for an input sample that falls into that set. At this point, the training of the regression tree model is complete.
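A hedged Python sketch of the least-squares search for the optimal segmentation feature j and threshold s, matching the Q(j, s) criterion above (illustrative only; all names are hypothetical):

```python
import numpy as np

def best_split(X, y):
    """Return the (feature j, threshold s) minimizing Q(j, s).

    Q(j, s) sums the squared residuals of the labels y against the mean
    label of each side of the split, as in the formula above.
    """
    best_j, best_s, best_q = None, None, float("inf")
    for j in range(X.shape[1]):               # traverse all features j
        for s in np.unique(X[:, j]):          # scan all candidate values s
            r1, r2 = y[X[:, j] <= s], y[X[:, j] > s]
            if len(r1) == 0 or len(r2) == 0:  # skip degenerate splits
                continue
            q = ((r1 - r1.mean()) ** 2).sum() + ((r2 - r2.mean()) ** 2).sum()
            if q < best_q:
                best_j, best_s, best_q = j, float(s), q
    return best_j, best_s
```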
Optionally, in some embodiments of the application, after determining whether the user is a malicious user according to the target output value, the method further includes one of:
banning the user account of the user judged to be a malicious user;
or banning the user account of the user judged to be a malicious user and setting an effective ban duration;
or sending a warning message to the user account of the user judged to be a malicious user.
Optionally, in some embodiments of the present application, the method further includes:
setting a ban threshold;
the judging whether the user is a malicious user according to the target output value comprises:
if the target output value is greater than the ban threshold, determining that the malicious level of the user meets the judgment condition for a malicious user.
For example, in actual use, the data of a certain user is input and divided through the segmentation rules, and each sample obtains a corresponding output ĉ_m. ĉ_m is a continuous evaluation index and can be used to evaluate the malicious level of the user. In actual application, the platform administrator may set a ban threshold; for example, if the ban threshold is defined as 0.8, a user whose ĉ_m > 0.8 will be banned automatically. This reduces the pressure of manual review, is more flexible, and makes it convenient to regulate the strength of banning manually. A sketch of this thresholding step follows.
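For illustration only (the 0.8 value and all names are assumptions consistent with the example above; the tree format matches the earlier `build_tree` sketch):

```python
BAN_THRESHOLD = 0.8  # set manually by the platform administrator

def predict(node, x):
    """Walk the trained regression tree to the leaf output c_m for sample x."""
    while not node["leaf"]:
        node = node["left"] if x[node["feature"]] <= node["threshold"] else node["right"]
    return node["value"]

def review_user(tree, x):
    """Auto-ban a user whose malicious-level output exceeds the ban threshold."""
    return "ban" if predict(tree, x) > BAN_THRESHOLD else "no action"
```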
Optionally, in some embodiments of the present application, the training sample includes a first feature, a second feature, and a third feature, and after determining whether the user is a malicious user according to the target output value, the method further includes:
generating a feedback result, the feedback result comprising: the first characteristic has a value not greater than a first value, the second characteristic has a value greater than a second value, and the third characteristic has a value not greater than a third value.
For example, after a ban operation is performed on a user, the reason for the ban needs to be fed back. The reason can be extracted from the segmentation rules of the regression tree model. For example, the segmentation rule hit by a certain sample in a certain region of the tree is {R1(j_1, s_1), R2(j_2, s_2), R1(j_3, s_3)}.
Expanding the segmentation rule forms the feedback result {x(j_1) ≤ s_1, x(j_2) > s_2, x(j_3) ≤ s_3}, where x(j_1) ≤ s_1 is the first feature, x(j_2) > s_2 is the second feature, and x(j_3) ≤ s_3 is the third feature. The feedback result makes the judgment of the regression tree model more reliable and more convincing. A sketch of extracting such a reason follows.
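For illustration (hypothetical names, not the patent's code), the segmentation rule hit by a sample can be collected while walking the tree and expanded into a readable ban reason:

```python
def explain(node, x, feature_names):
    """Collect the split conditions sample x satisfies on its way to a leaf."""
    reasons = []
    while not node["leaf"]:
        j, s = node["feature"], node["threshold"]
        if x[j] <= s:
            reasons.append(f"{feature_names[j]} <= {s}")  # rule of form R1(j, s)
            node = node["left"]
        else:
            reasons.append(f"{feature_names[j]} > {s}")   # rule of form R2(j, s)
            node = node["right"]
    return reasons, node["value"]
```

For instance, `explain(tree, x, names)` might return `(["danmaku_count > 5", "play_count <= 8"], 0.9)`, which can be attached to the ban notice as its reason.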
For ease of understanding, the method for identifying malicious users in the present application is described below using a specific application scenario. Take a regression tree of depth 2 obtained through model training as an example, as shown in fig. 2, and consider two users, user 1 and user 2, whose behavior statistics are as follows:
User 1: the number of vulgar bullet-screen (danmaku) messages is 3, the user's play count is 10, and the user's recharge amount is 6;
User 2: the number of vulgar bullet-screen (danmaku) messages is 10, the user's play count is 5, and the user's recharge amount is 1.
Judging according to the regression tree, the malicious-user evaluation result for user 1 is 0.5, and for user 2 it is 0.9. If the ban threshold is set to 0.8, user 2 will be banned and user 1 will not be processed; if the threshold is set to 0.5, both users will be banned.
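Mirroring this scenario with the sketches above (the tree literal below is illustrative and is not the tree of fig. 2; feature order is [vulgar danmaku count, play count, recharge amount], and `review_user`/`predict` are reused from the thresholding sketch earlier):

```python
# Hypothetical depth-2 regression tree producing the evaluations in the text.
tree = {
    "leaf": False, "feature": 0, "threshold": 5,            # split on vulgar danmaku count
    "left":  {"leaf": False, "feature": 2, "threshold": 3,  # then on recharge amount
              "left":  {"leaf": True, "value": 0.7},
              "right": {"leaf": True, "value": 0.5}},
    "right": {"leaf": False, "feature": 1, "threshold": 8,  # then on play count
              "left":  {"leaf": True, "value": 0.9},
              "right": {"leaf": True, "value": 0.6}},
}

user1, user2 = [3, 10, 6], [10, 5, 1]
print(review_user(tree, user1))  # predict -> 0.5: below the 0.8 threshold, "no action"
print(review_user(tree, user2))  # predict -> 0.9: above the 0.8 threshold, "ban"
```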
Fig. 3 is a schematic structural diagram of an apparatus 30 for identifying a malicious user, which can be applied to interactive network platforms such as live broadcast platforms, forums, news services, and microblog services. The apparatus for identifying a malicious user in the embodiments of the present application can implement the steps of the method for identifying a malicious user performed in the embodiment corresponding to fig. 1. The functions implemented by the apparatus 30 may be implemented by hardware, or by hardware executing corresponding software. The hardware or software includes one or more modules corresponding to the above functions, and the modules may be software and/or hardware. The apparatus for identifying a malicious user may include an acquisition module 301 and a processing module 302; for the functions of the processing module 302 and the acquisition module 301, reference may be made to the operations executed in the embodiment corresponding to fig. 1, which are not repeated here. The processing module may be configured to control the transceiving operations of the acquisition module 301.
In some embodiments, the acquisition module may be configured to acquire a training sample set, where the training sample set is of a regression tree structure, the training sample set comprises features of training samples and labels corresponding to the training samples, and the labels corresponding to the training samples are used to identify whether a user is a malicious user;
the processing module 302 may be configured to determine the optimal segmentation feature and segmentation threshold from the training sample set; split the training sample set to obtain a first sample set and a second sample set, wherein the first sample set is the set of samples whose value of the segmentation feature is not greater than the segmentation threshold, and the second sample set is the set of samples whose value of the segmentation feature is greater than the segmentation threshold; take the mean value of the features in the first sample set and the second sample set as a target output value; judge whether the user is a malicious user according to the target output value; and set the labels corresponding to training samples in the first sample set and the second sample set judged to be malicious users as malicious users, and the labels corresponding to training samples in the first sample set and the second sample set judged to be non-malicious users as non-malicious users.
In the embodiment of the present invention, after the acquisition module 301 acquires the training sample set, the processing module 302 determines the optimal segmentation feature and the optimal segmentation threshold from the training sample set; splits the training sample set to obtain a first sample set and a second sample set, and takes the mean value of each feature in the first sample set and the second sample set as a target output value; judges whether the user is a malicious user according to the target output value; and sets the labels corresponding to training samples in the first sample set and the second sample set judged to be malicious users as malicious users, and the labels corresponding to training samples judged to be non-malicious users as non-malicious users. With this scheme, the behavior characteristics of users can be accurately analyzed, and malicious users can be identified according to those behavior characteristics.
In some embodiments, the target output value is used to assess a level of maliciousness of the user.
In some embodiments, the features of a training sample include at least the number of abnormal bullet-screen (danmaku) messages sent by the user, the number of plays by the user, the number of comments posted by the user in a first time period, the number of identical comments posted by the user in a second time period, and the balance of the user's account.
In some embodiments, the processing module 302 is configured to:
setting a ban threshold;
and if the target output value is greater than the ban threshold, determining that the malicious level of the user meets the judgment condition for a malicious user.
In some embodiments, the training sample includes a first feature, a second feature, and a third feature, and the processing module 302, after determining whether the user is a malicious user according to the target output value, is further configured to:
generating a feedback result, the feedback result comprising: the first characteristic has a value not greater than a first value, the second characteristic has a value greater than a second value, and the third characteristic has a value not greater than a third value.
The apparatus for identifying a malicious user in the embodiments of the present application is described above from the perspective of modular functional entities; it is described below from the perspective of hardware. As shown in fig. 4, the apparatus includes: a processor, a memory, a transceiver (which may also be an input-output unit, not labeled in fig. 4), and a computer program stored in the memory and executable on the processor. For example, the computer program may be a program corresponding to the method for identifying a malicious user in the embodiment corresponding to fig. 1. When the apparatus implements the functions of the apparatus 30 for identifying a malicious user shown in fig. 3, the processor, when executing the computer program, implements the steps of the method for identifying a malicious user performed by the apparatus 30 in the embodiment corresponding to fig. 3; alternatively, the processor, when executing the computer program, implements the functions of the modules in the apparatus 30 for identifying a malicious user according to the embodiment corresponding to fig. 3.
Illustratively, the computer program may be partitioned into one or more modules/units that are stored in the memory and executed by the processor to implement the invention. The one or more modules/units may be a series of computer program instruction segments capable of performing specific functions, which are used to describe the execution of the computer program in the computer apparatus.
The data processing device may include, but is not limited to, a processor and a memory. Those skilled in the art will appreciate that the illustration is merely an example of a computer apparatus and does not constitute a limitation: a server may include more or fewer components than those shown, some components may be combined, or different components may be used; for example, the server may also include input/output devices, network access devices, buses, and the like.
The processor may be a Central Processing Unit (CPU), another general-purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. A general-purpose processor may be a microprocessor, or any conventional processor. The processor is the control center of the computer device and connects the various parts of the whole computer device using various interfaces and lines.
The memory may be used to store the computer programs and/or modules, and the processor implements various functions of the computer device by running or executing the computer programs and/or modules stored in the memory and invoking the data stored in the memory. The memory may mainly include a program storage area and a data storage area, where the program storage area may store an operating system and application programs required by at least one function (such as a sound playing function or an image playing function), and the data storage area may store data created according to the use of the device (such as audio data and video data). In addition, the memory may include a high-speed random access memory, and may also include a non-volatile memory such as a hard disk, a memory, a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, a flash memory card (Flash Card), at least one magnetic disk storage device, a flash memory device, or another non-volatile solid-state storage device.
The transceiver may also be replaced by a receiver and a transmitter, which may be the same or different physical entities; when they are the same physical entity, they may be collectively referred to as a transceiver. The memory may be integrated in the processor or provided separately from the processor. The transceiver may be an input-output unit.
An embodiment of the present invention further provides a computer-readable storage medium on which a computer program is stored. When the apparatus 30 for identifying a malicious user in the embodiment corresponding to fig. 3 is implemented in the form of a software functional unit and sold or used as a standalone product, the computer program may be stored in a computer-readable storage medium. Based on this understanding, all or part of the flow of the method for identifying a malicious user in the embodiment corresponding to fig. 1 may be implemented by a computer program, which may be stored in a computer-readable storage medium; when executed by a processor, the computer program implements the steps of the method embodiments. The computer program comprises computer program code, which may be in the form of source code, object code, an executable file, some intermediate form, or the like. The computer-readable medium may include: any entity or device capable of carrying the computer program code, a recording medium, a USB flash disk, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a Read-Only Memory (ROM), a Random Access Memory (RAM), an electrical carrier signal, a telecommunications signal, a software distribution medium, and the like. It should be noted that the content contained in the computer-readable medium may be increased or decreased as appropriate according to the requirements of legislation and patent practice in a jurisdiction; for example, in some jurisdictions, according to legislation and patent practice, computer-readable media do not include electrical carrier signals and telecommunications signals.
While alternative embodiments of the present invention have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. It is therefore intended that the following appended claims be interpreted as including alternative embodiments and all such alterations and modifications as fall within the scope of the invention.
It will be apparent to those skilled in the art that various changes and modifications may be made in the present invention without departing from the spirit and scope of the invention. Thus, if such modifications and variations of the present invention fall within the scope of the claims of the present invention and their equivalents, the present invention is also intended to include such modifications and variations.

Claims (10)

1. A method of identifying a malicious user, the method comprising:
acquiring a training sample set, wherein the training sample set is of a regression tree structure, the training sample set comprises characteristics of training samples and labels corresponding to the training samples, and the labels corresponding to the training samples are used for identifying whether a user is a malicious user;
determining the optimal segmentation characteristic and the optimal segmentation threshold value from the training sample set;
splitting the training sample set to obtain a first sample set and a second sample set, wherein the first sample set is the set of samples whose value of the segmentation feature is not greater than the segmentation threshold, and the second sample set is the set of samples whose value of the segmentation feature is greater than the segmentation threshold;
taking the mean value of the features in the first sample set and the second sample set as a target output value;
judging whether the user is a malicious user or not according to the target output value;
setting labels corresponding to training samples which are judged to be malicious users in the first sample set and the second sample set as the malicious users, and setting labels corresponding to training samples which are judged to be non-malicious users in the first sample set and the second sample set as the non-malicious users.
2. The method of claim 1, wherein the target output value is used to evaluate the malicious level of a user; and the features of the training sample include at least the number of abnormal bullet-screen (danmaku) messages sent by the user, the number of plays by the user, the number of comments posted by the user in a first time period, the number of identical comments posted by the user in a second time period, and the balance of the user's account.
3. The method of claim 2, wherein after determining whether the user is a malicious user based on the target output value, the method further comprises one of:
banning the user account of the user judged to be a malicious user;
or banning the user account of the user judged to be a malicious user and setting an effective ban duration;
or sending a warning message to the user account of the user judged to be a malicious user.
4. The method of any one of claims 1-3, further comprising:
setting a ban threshold;
the judging whether the user is a malicious user according to the target output value comprises:
if the target output value is greater than the ban threshold, determining that the malicious level of the user meets the judgment condition for a malicious user.
5. The method of claim 4, wherein the training sample comprises a first feature, a second feature, and a third feature, and wherein after determining whether the user is a malicious user based on the target output value, the method further comprises:
generating a feedback result, the feedback result comprising: the first characteristic has a value not greater than a first value, the second characteristic has a value greater than a second value, and the third characteristic has a value not greater than a third value.
6. An apparatus for identifying malicious users, the apparatus comprising:
the system comprises an acquisition module, a judgment module and a processing module, wherein the acquisition module is used for acquiring a training sample set, the training sample set is of a regression tree structure, the training sample set comprises the characteristics of training samples and labels corresponding to the training samples, and the labels corresponding to the training samples are used for identifying whether a user is a malicious user or not;
a processing module, configured to determine the optimal segmentation feature and the optimal segmentation threshold from the training sample set; split the training sample set to obtain a first sample set and a second sample set, wherein the first sample set is the set of samples whose value of the segmentation feature is not greater than the segmentation threshold, and the second sample set is the set of samples whose value of the segmentation feature is greater than the segmentation threshold; take the mean value of the features in the first sample set and the second sample set as a target output value; judge whether the user is a malicious user according to the target output value; and set the labels corresponding to training samples in the first sample set and the second sample set judged to be malicious users as malicious users, and the labels corresponding to training samples in the first sample set and the second sample set judged to be non-malicious users as non-malicious users.
7. The apparatus of claim 6, wherein the target output value is used to evaluate the malicious level of a user; and the features of the training sample include at least the number of abnormal bullet-screen (danmaku) messages sent by the user, the number of plays by the user, the number of comments posted by the user in a first time period, the number of identical comments posted by the user in a second time period, and the balance of the user's account.
8. The apparatus of claim 6 or 7, wherein the processing module is configured to:
set a ban threshold;
and if the target output value is greater than the ban threshold, determine that the malicious level of the user meets the judgment condition for a malicious user.
9. A data processing apparatus comprising a processor for implementing the steps in the method of identifying malicious users as claimed in any one of claims 1 to 5 when executing a computer program stored in a memory.
10. A readable storage medium, having stored thereon a computer program, characterized in that the computer program, when being executed by a processor, carries out the steps of the method of identifying a malicious user according to any of claims 1 to 5.
CN201811161527.7A 2018-09-30 2018-09-30 Method, device and storage medium for identifying malicious user Active CN109255391B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811161527.7A CN109255391B (en) 2018-09-30 2018-09-30 Method, device and storage medium for identifying malicious user

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811161527.7A CN109255391B (en) 2018-09-30 2018-09-30 Method, device and storage medium for identifying malicious user

Publications (2)

Publication Number Publication Date
CN109255391A CN109255391A (en) 2019-01-22
CN109255391B 2021-07-23

Family

ID=65045252

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811161527.7A Active CN109255391B (en) 2018-09-30 2018-09-30 Method, device and storage medium for identifying malicious user

Country Status (1)

Country Link
CN (1) CN109255391B (en)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110232473B (en) * 2019-05-22 2022-12-27 重庆邮电大学 Black product user prediction method based on big data finance
CN110705584A (en) * 2019-08-21 2020-01-17 深圳壹账通智能科技有限公司 Emotion recognition method, emotion recognition device, computer device and storage medium
CN111104963B (en) * 2019-11-22 2023-10-24 贝壳技术有限公司 Target user determining method and device, storage medium and electronic equipment
CN112395556B (en) * 2020-09-30 2022-09-06 广州市百果园网络科技有限公司 Abnormal user detection model training method, abnormal user auditing method and device
CN112533018B (en) * 2020-12-02 2023-04-07 北京五八信息技术有限公司 Method and device for processing data of live broadcast room
CN114302216B (en) * 2021-08-25 2024-03-22 上海哔哩哔哩科技有限公司 Barrage processing method, device, equipment and system

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105915960A (en) * 2016-03-31 2016-08-31 广州华多网络科技有限公司 User type determination method and device
CN106777024A (en) * 2016-12-08 2017-05-31 北京小米移动软件有限公司 Recognize the method and device of malicious user
CN106919579A (en) * 2015-12-24 2017-07-04 腾讯科技(深圳)有限公司 A kind of information processing method and device, equipment
WO2017219548A1 (en) * 2016-06-20 2017-12-28 乐视控股(北京)有限公司 Method and device for predicting user attributes
CN108174296A (en) * 2018-01-02 2018-06-15 武汉斗鱼网络科技有限公司 Malicious user recognition methods and device
CN108470253A (en) * 2018-04-02 2018-08-31 腾讯科技(深圳)有限公司 A kind of user identification method, device and storage device

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106919579A (en) * 2015-12-24 2017-07-04 腾讯科技(深圳)有限公司 A kind of information processing method and device, equipment
CN105915960A (en) * 2016-03-31 2016-08-31 广州华多网络科技有限公司 User type determination method and device
WO2017219548A1 (en) * 2016-06-20 2017-12-28 乐视控股(北京)有限公司 Method and device for predicting user attributes
CN106777024A (en) * 2016-12-08 2017-05-31 北京小米移动软件有限公司 Recognize the method and device of malicious user
CN108174296A (en) * 2018-01-02 2018-06-15 武汉斗鱼网络科技有限公司 Malicious user recognition methods and device
CN108470253A (en) * 2018-04-02 2018-08-31 腾讯科技(深圳)有限公司 A kind of user identification method, device and storage device

Also Published As

Publication number Publication date
CN109255391A (en) 2019-01-22

Similar Documents

Publication Publication Date Title
CN109255391B (en) Method, device and storage medium for identifying malicious user
US11689549B2 (en) Continuous learning for intrusion detection
US9398034B2 (en) Matrix factorization for automated malware detection
AU2011200343B2 (en) Image identification information adding program, image identification information adding apparatus and image identification information adding method
US20170063893A1 (en) Learning detector of malicious network traffic from weak labels
CN108345641B (en) Method for crawling website data, storage medium and server
CN107368856B (en) Malicious software clustering method and device, computer device and readable storage medium
CN107909038B (en) Social relationship classification model training method and device, electronic equipment and medium
CN108629047B (en) Song list generation method and terminal equipment
CN112116225A (en) Fighting efficiency evaluation method and device for equipment system, and storage medium
KR20170109304A (en) Method for parallel learning of cascade classifier by object recognition
CN111783812A (en) Method and device for identifying forbidden images and computer readable storage medium
US6789070B1 (en) Automatic feature selection system for data containing missing values
CN108628873B (en) Text classification method, device and equipment
CN112560545B (en) Method and device for identifying form direction and electronic equipment
CN113987243A (en) Image file gathering method, image file gathering device and computer readable storage medium
CN112861127A (en) Malicious software detection method and device based on machine learning and storage medium
CN111966920A (en) Public opinion propagation stable condition prediction method, device and equipment
CN111144546A (en) Scoring method and device, electronic equipment and storage medium
CN111325228B (en) Model training method and device
CN112769540B (en) Diagnosis method, system, equipment and storage medium for side channel information leakage
CN112949305B (en) Negative feedback information acquisition method, device, equipment and storage medium
CN111930935A (en) Image classification method, device, equipment and storage medium
CN111353860A (en) Product information pushing method and system
CN113327601B (en) Method, device, computer equipment and storage medium for identifying harmful voice

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant