CN104811424A - Malicious user identification method and device

Info

Publication number: CN104811424A
Application number: CN201410037848.1A
Authority: CN
Inventors: 谢波; 周斌; 赵立; 刘婷婷
Original assignee: Tencent Technology Shenzhen Co Ltd
Current assignee: Tencent Technology Shenzhen Co Ltd; Tencent Cloud Computing Beijing Co Ltd
Priority date: 2014-01-26
Filing date: 2014-01-26
Publication date: 2015-07-29
Anticipated expiration: 2034-01-26
Also published as: CN104811424B

Abstract

The invention relates to a malicious user identification method. The method includes the following steps: obtaining the user identifiers of global users, calculating the credit rating corresponding to each user identifier, and generating a global user credit distribution from those credit ratings; screening the user identifiers of conditional users from the user identifiers of the global users according to a query condition; obtaining the credit ratings corresponding to the user identifiers of the conditional users and generating a conditional user credit distribution from them; obtaining, from the global user credit distribution and the conditional user credit distribution respectively, the number of global high-quality users and the number of conditional high-quality users whose credit rating is greater than a credit threshold; and obtaining a preset threshold for the number of high-quality users, calculating the difference between the number of global high-quality users and the number of conditional high-quality users, and judging from that difference and the threshold whether the query condition corresponds to malicious users. The method can improve identification accuracy.

Description

Malicious user identification method and device

Technical field

The present invention relates to the field of Internet technology, and in particular to a malicious user identification method and device.

Background

In traditional social network applications, some malicious users use multiple social network accounts under the same IP address to mass-send spam containing advertising, pornography, violence, or politically motivated content. Conventional techniques identify malicious users from the content of the messages they submit to the server. However, malicious users who submit spam typically add interference to the spam content in a targeted way (for example, inserting symbols between the characters of a sensitive word), which makes the identification of malicious users inaccurate.

Summary of the invention

Based on this, it is necessary to provide a malicious user identification method that can improve identification accuracy.

A malicious user identification method, comprising:

obtaining the user IDs of global users, calculating the credit rating corresponding to each user ID, and generating a global user credit distribution from the credit ratings corresponding to the user IDs of the global users;

obtaining a preset query condition, and screening the user IDs of the global users according to the query condition to obtain the user IDs of conditional users;

obtaining the credit ratings corresponding to the user IDs of the conditional users, and generating a conditional user credit distribution from those credit ratings;

obtaining a preset credit threshold, and obtaining, from the global user credit distribution and the conditional user credit distribution respectively, the number of global high-quality/malicious users and the number of conditional high-quality/malicious users whose credit rating is greater than/less than the credit threshold;

obtaining a preset threshold for the number of high-quality/malicious users, calculating the difference between the number of global high-quality/malicious users and the number of conditional high-quality/malicious users, and judging from the difference and the threshold whether the query condition corresponds to malicious users.

In addition, it is necessary to provide a malicious user identification device that can improve identification accuracy.

A malicious user identification device, comprising:

a global distribution generation module, configured to obtain the user IDs of global users, calculate the credit rating corresponding to each user ID, and generate a global user credit distribution from the credit ratings corresponding to the user IDs of the global users;

a conditional user search module, configured to obtain a preset query condition and screen the user IDs of the global users according to the query condition to obtain the user IDs of conditional users;

a conditional distribution generation module, configured to obtain the credit ratings corresponding to the user IDs of the conditional users and generate a conditional user credit distribution from those credit ratings;

a user screening module, configured to obtain a preset credit threshold and obtain, from the global user credit distribution and the conditional user credit distribution respectively, the number of global high-quality/malicious users and the number of conditional high-quality/malicious users whose credit rating is greater than/less than the credit threshold;

a malicious user identification module, configured to obtain a preset threshold for the number of high-quality/malicious users, calculate the difference between the number of global high-quality/malicious users and the number of conditional high-quality/malicious users, and judge from the difference and the threshold whether the query condition corresponds to malicious users.

In the malicious user identification method above, whether the conditional users correspond to malicious users is decided from the difference between the credit rating distribution of the conditional users matching the query condition and the credit rating distribution of the global users registered in the social network application. Compared with conventional techniques, even if some spam with added interference slips through inspection, malicious users can still be identified from the overall difference between the messages submitted by malicious users and by non-malicious users, so identification accuracy is higher.

Brief description of the drawings

Fig. 1 is a flowchart of the malicious user identification method in one embodiment;

Fig. 2 is a schematic diagram of the global user credit distribution in one embodiment;

Fig. 3 is a schematic diagram of the conditional user credit distribution in one embodiment;

Fig. 4 is a schematic diagram of the global user credit distribution using credit rating grades in one embodiment;

Fig. 5 is a schematic diagram of the conditional user credit distribution using credit rating grades in one embodiment;

Fig. 6 is a structural diagram of the malicious user identification device in one embodiment;

Fig. 7 is a diagram of the hardware environment of the malicious user identification device in one embodiment.

Detailed description of the embodiments

To make the objectives, technical solutions, and advantages of the present invention clearer, the invention is described in further detail below with reference to the drawings and embodiments. It should be understood that the specific embodiments described here serve only to explain the invention and are not intended to limit it.

Unless the context clearly indicates otherwise, the elements and components of the present invention may exist in the singular or in the plural, and the invention places no limit on this. Although the steps of the invention are labeled with numbers, the numbers are not intended to fix the order of the steps; unless the order of the steps is explicitly stated or the execution of a step depends on other steps, the relative order of the steps can be adjusted. It should be understood that the term "and/or" as used here covers any and all possible combinations of one or more of the associated listed items.

As shown in Fig. 1, in one embodiment, a malicious user identification method is provided. The method is carried out by a computer program and can run on a computer system based on the von Neumann architecture. The method comprises:

Step S102: obtain the user IDs of the global users, calculate the credit rating corresponding to each user ID, and generate the global user credit distribution from the credit ratings corresponding to the user IDs of the global users.

Global users are the users registered in the social network application. After registering, a user typically has a user ID and corresponding registration information in the social network application, and the user database of the application stores the profile data of registered users. The user IDs of the global users can be looked up in the user database.

The credit rating corresponding to a user ID measures the trustworthiness of the messages (UGC, User Generated Content) that the social network account with this user ID submits to the social network server. If a user frequently uses a social network account to submit spam containing pornography, violence, advertising, or politically motivated content in the social network application, that user's credit rating is lower.

A malicious user is a user with a low credit rating. Malicious users usually spread spam to other users frequently, which disrupts normal browsing for those users. The counterpart of a malicious user is a high-quality user, who has a high credit rating, uses the social network normally, rarely posts in violation of the rules, rarely posts duplicates, and posts well-formed content.

In this embodiment, the step of calculating the credit rating corresponding to the user ID of a global user can be as follows: obtain the message submission count, online duration, and spam submission count corresponding to the user ID; multiply each of these three values by its corresponding preset weight coefficient and sum the results to obtain the credit rating corresponding to the user ID.

For example, if a user has submitted N1 messages in total to the social network application, N2 of which were spam, and has been online for D days, the credit rating can be calculated according to the formula:

C = k1×N1 + k2×D - k3×N2

where C is the credit rating, and k1, k2 and k3 are the corresponding weight coefficients.
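As a minimal sketch of this formula, the following Python function computes C from the three counts. The function name and the default weight values are illustrative assumptions; the description does not give concrete values for k1, k2 and k3.

```python
def credit_rating(n_messages: int, days_online: int, n_spam: int,
                  k1: float = 1.0, k2: float = 2.0, k3: float = 5.0) -> float:
    """Compute C = k1*N1 + k2*D - k3*N2.

    The default weights are hypothetical placeholders, not values
    taken from the patent.
    """
    return k1 * n_messages + k2 * days_online - k3 * n_spam


# Two users with the same activity level; only the spam count differs.
normal_user = credit_rating(n_messages=40, days_online=25, n_spam=0)
spammer = credit_rating(n_messages=40, days_online=25, n_spam=30)
print(normal_user, spammer)
```

With these assumed weights, the spam term dominates for the second user, so the rating drops well below that of the first user even though both are equally active.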

In a preferred approach, a time interval can also be preset. The message submission count, online duration, and spam submission count corresponding to each global user's ID within that interval are obtained, and the credit rating of the global user within the interval is calculated with the formula above. For example, if the preset interval is the previous month, one can obtain, for each user of the social network application, the number of messages submitted during the 30 days of the previous month, the number of spam messages submitted, and the number of days on which the user logged in during those 30 days, and then calculate each global user's credit rating for the previous month.
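The windowed variant can be sketched as follows. The data layout (a list of dated messages with a spam flag, plus a list of login dates) and the function name are assumptions made for illustration; the description does not say how the activity records are stored.

```python
from datetime import date


def window_counts(messages, login_days, start, end):
    """Return (N1, D, N2) restricted to the inclusive window [start, end].

    messages:   iterable of (day, is_spam) tuples (hypothetical layout)
    login_days: iterable of dates on which the user logged in
    """
    in_window = [(day, spam) for day, spam in messages if start <= day <= end]
    n_messages = len(in_window)
    n_spam = sum(1 for _, spam in in_window if spam)
    # Count each calendar day once, however many logins happened on it.
    days_online = len({day for day in login_days if start <= day <= end})
    return n_messages, days_online, n_spam


messages = [
    (date(2014, 1, 5), False),
    (date(2014, 1, 20), True),
    (date(2013, 12, 31), True),   # outside the window, ignored
]
logins = [date(2014, 1, 5), date(2014, 1, 5), date(2014, 1, 20), date(2014, 2, 1)]
print(window_counts(messages, logins, date(2014, 1, 1), date(2014, 1, 30)))
```

The three returned values can then be fed into the credit rating formula in place of the all-time counts.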

It should be noted that whether a message submitted by a user is spam can be identified by keyword matching or by a machine learning algorithm. For example, if the message content contains keywords related to pornography, violence, advertising, or politically motivated content, the message can be judged to be spam, and the spam submission count is incremented by 1 accordingly. A support vector machine trained to identify spam can also be used to classify messages and thereby determine whether a message submitted by a user is spam.
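A minimal sketch of the keyword-matching option is shown below. The keyword list and function names are hypothetical, and the matching is a plain case-insensitive substring test; a support vector machine classifier is not shown.

```python
# Hypothetical keyword list, for illustration only.
SPAM_KEYWORDS = ("click here to win", "cheap pills")


def is_spam(message: str, keywords=SPAM_KEYWORDS) -> bool:
    """Return True if any keyword occurs in the message (case-insensitive)."""
    text = message.lower()
    return any(keyword in text for keyword in keywords)


def count_spam(messages) -> int:
    """Spam submission count N2 for one user's messages."""
    return sum(1 for message in messages if is_spam(message))


submitted = [
    "Click HERE to win a prize",
    "See you tomorrow",
    "c.l.i.c.k here to win",   # interference symbols defeat the substring test
]
print(count_spam(submitted))
```

The third message illustrates the weakness noted in the background section: symbols inserted between characters let spam slip past per-message keyword matching, which is why the method compares credit distributions across groups of users rather than relying on individual messages alone.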

The global user credit distribution describes, once the credit rating of each user of the social network application has been calculated, how the number of users is distributed over credit rating values. Note that the number of users is expressed as a percentage of users, here and throughout the rest of this description. As shown in Fig. 2, the horizontal axis is the credit rating (which can be normalized to the range 0 to 100), and the vertical axis is the percentage of users whose credit rating is less than or equal to the corresponding value. As the figure shows, 30% of users have a credit rating less than or equal to 60.
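The cumulative distribution plotted in Fig. 2 can be sketched as follows. The min-max normalization and the function names are assumptions; the description only states that the credit rating can be normalized to the range 0 to 100.

```python
def normalize(raw: dict) -> dict:
    """Linearly rescale raw credit ratings to the range 0..100 (min-max scaling)."""
    lowest, highest = min(raw.values()), max(raw.values())
    span = (highest - lowest) or 1.0   # avoid division by zero if all ratings are equal
    return {uid: 100.0 * (value - lowest) / span for uid, value in raw.items()}


def share_at_or_below(ratings: dict, level: float) -> float:
    """Percentage of users whose credit rating is <= level (one point of the curve)."""
    hits = sum(1 for value in ratings.values() if value <= level)
    return 100.0 * hits / len(ratings)


raw_ratings = {"u1": -60, "u2": 0, "u3": 30, "u4": 90}   # hypothetical users
ratings = normalize(raw_ratings)
curve = {level: share_at_or_below(ratings, level) for level in range(0, 101, 20)}
print(curve)
```

Evaluating `share_at_or_below` at every credit rating value gives the full curve; the 20-point step here is only to keep the printed output short.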

Step S104: obtain a preset query condition, and screen the user IDs of the global users according to the query condition to obtain the user IDs of the conditional users.

In this embodiment, the query condition is the conditional information used to filter a specific user group (the conditional users) out of the registered users of the social network application. It can be one of an IP address, a message identifier, a user group identifier, or a user profile attribute value.

For example, if a large number of messages were mass-sent from a certain IP address within one month, that is, a large amount of UGC was submitted to the social network server, the users behind this IP address may be malicious users who are using multiple social network accounts under this IP address to send spam. This IP address can then be used as the query condition.

As another example, if a certain message has been forwarded many times by multiple users, the forwarding users are also likely to be malicious users who are maliciously forwarding spam. The identifier of this message can be used as the query condition to find all the conditional users who forwarded it.

As for user group identifiers and user profile attribute values, offenders typically register multiple social network accounts first and mass-send spam by switching between them. The profile attribute values filled in when these accounts were registered usually resemble one another (for example, the same hometown or the same school of graduation), so a user profile attribute value can be used as the query condition to filter out these users.
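The screening step can be sketched as a simple filter over user records. The record layout and field names (`ip`, `hometown`) are hypothetical; a message identifier condition would instead require a lookup in a forwarding log, which is not shown here.

```python
def select_conditional_users(profiles: dict, field: str, value) -> list:
    """Return the IDs of users whose profile field equals the query value.

    profiles maps user ID -> dict of attributes (hypothetical layout).
    """
    return sorted(uid for uid, profile in profiles.items()
                  if profile.get(field) == value)


profiles = {
    "u1": {"ip": "203.0.113.7", "hometown": "A"},
    "u2": {"ip": "203.0.113.7", "hometown": "A"},
    "u3": {"ip": "198.51.100.2", "hometown": "B"},
}
print(select_conditional_users(profiles, "ip", "203.0.113.7"))
print(select_conditional_users(profiles, "hometown", "B"))
```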

Step S106: obtain the credit ratings corresponding to the user IDs of the conditional users, and generate the conditional user credit distribution from those credit ratings.

As described above, the credit rating of each global user, that is, each user registered in the social network application, has already been calculated, so the credit ratings corresponding to the user IDs of the filtered conditional users can be obtained directly. The conditional user credit distribution is the credit rating distribution generated with the conditional users as the sample space, as shown in Fig. 3.

Further, before the step of obtaining the credit ratings corresponding to the user IDs of the conditional users, the number of conditional user IDs found can be obtained and compared with a recognition threshold; the step of obtaining the credit ratings corresponding to the user IDs of the conditional users is performed only if that number is greater than the recognition threshold.

In other words, if the query condition filters out only a small number of user IDs, fewer than the recognition threshold, the users matching the query condition can be judged not to be malicious users (because malicious users who mass-send spam on a harmful scale usually need a very large number of social network accounts).
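A minimal sketch of this early exit is given below. The function name and the convention of returning `None` when the group is too small are assumptions made for illustration.

```python
def conditional_ratings(conditional_ids, global_ratings, recognition_threshold):
    """Look up ratings for the conditional users, or return None if there are too few.

    Returning None stands for "not judged malicious, skip the remaining steps".
    """
    if len(conditional_ids) <= recognition_threshold:
        return None
    return {uid: global_ratings[uid] for uid in conditional_ids}


global_ratings = {"u1": 12.0, "u2": 18.0, "u3": 75.0, "u4": 9.0}   # hypothetical
print(conditional_ratings(["u1", "u2", "u4"], global_ratings, recognition_threshold=2))
print(conditional_ratings(["u3"], global_ratings, recognition_threshold=2))
```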

Step S108: obtain a preset credit threshold, and obtain, from the global user credit distribution and the conditional user credit distribution respectively, the number of global high-quality/malicious users and the number of conditional high-quality/malicious users whose credit rating is greater than/less than the credit threshold.

As shown in Figs. 2 and 3, if identification uses the number of high-quality users and the credit threshold is 60, the number of global high-quality users with a credit rating greater than 60 is read from Fig. 2 (70%, given that 30% of users there have a credit rating of 60 or less), and the number of conditional high-quality users with a credit rating greater than 60 is read from Fig. 3. If identification uses the number of malicious users and the credit threshold is 30, the number of global malicious users with a credit rating less than 30 and the number of conditional malicious users with a credit rating less than 30 are read from the same two figures.
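The two readings can be sketched as percentage counts over a list of normalized ratings. The function names and the sample ratings are illustrative assumptions and do not reproduce the values plotted in Figs. 2 and 3.

```python
def share_above(ratings, credit_threshold: float) -> float:
    """Percentage of users with a rating strictly greater than the threshold."""
    return 100.0 * sum(1 for r in ratings if r > credit_threshold) / len(ratings)


def share_below(ratings, credit_threshold: float) -> float:
    """Percentage of users with a rating strictly less than the threshold."""
    return 100.0 * sum(1 for r in ratings if r < credit_threshold) / len(ratings)


# Hypothetical normalized ratings for the two sample spaces.
global_ratings = [10, 20, 50, 65, 70, 80, 90, 95, 85, 75]
conditional_ratings = [5, 10, 15, 25, 70]

print(share_above(global_ratings, 60), share_above(conditional_ratings, 60))
print(share_below(global_ratings, 30), share_below(conditional_ratings, 30))
```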

Step S110: obtain a preset threshold for the number of high-quality/malicious users, calculate the difference between the number of global high-quality/malicious users and the number of conditional high-quality/malicious users, and judge from the difference and the threshold whether the query condition corresponds to malicious users.

If identification uses the number of high-quality users, the number of conditional high-quality users is subtracted from the number of global high-quality users. If the difference is greater than the threshold for the number of high-quality users, the user IDs matching the query condition are likely to be social network accounts registered in bulk by malicious users for mass-sending spam, and the query condition is judged to correspond to malicious users.

If identification uses the number of malicious users, the number of global malicious users is subtracted from the number of conditional malicious users (both as percentages). If the difference is greater than the threshold for the number of malicious users, the user IDs matching the query condition are likely to be social network accounts registered in bulk by malicious users for mass-sending spam, and the query condition is judged to correspond to malicious users.
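Both decision rules can be sketched in one function. The function and parameter names, and the threshold value of 30 percentage points used in the example, are assumptions; the description leaves the threshold to be preset.

```python
def query_matches_malicious_users(global_share: float, conditional_share: float,
                                  count_threshold: float,
                                  use_high_quality: bool = True) -> bool:
    """Decide whether a query condition corresponds to malicious users.

    Shares are percentages of users. With high-quality users the global
    share is expected to exceed the conditional share; with malicious
    users the direction is reversed.
    """
    if use_high_quality:
        difference = global_share - conditional_share
    else:
        difference = conditional_share - global_share
    return difference > count_threshold


# High-quality variant: 70% globally vs 20% under the query condition.
print(query_matches_malicious_users(70.0, 20.0, count_threshold=30.0))
# Malicious variant: 20% globally vs 80% under the query condition.
print(query_matches_malicious_users(20.0, 80.0, count_threshold=30.0,
                                    use_high_quality=False))
```

A group whose distribution is close to the global one, for instance 70% versus 60% high-quality users with the same assumed threshold, is not flagged.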

Once a query condition has been determined to correspond to malicious users, the administrator of the social network application can deal with the user IDs matching that query condition. For example, if the query condition is an IP address, the administrator can block that IP address.

In one embodiment, the following is also performed after the step of calculating the credit ratings corresponding to the user IDs of the global users:

Obtain preset grade interval thresholds, and determine from them the credit rating grade to which the credit rating corresponding to each global user's ID belongs.

In this case, the credit threshold is a preset target credit rating grade.

As shown in Figs. 4 and 5, the grade interval thresholds can divide the normalized credit rating (linearly scaled to the range 0 to 100) into 5 grades: very poor, poor, average, fairly good, and good, each spanning 20 points of credit rating. The global user credit distribution and the conditional user credit distribution are then converted into discrete bar distributions with only 5 bars. The preset credit threshold can be set to, for example, average or fairly good; when the number of global high-quality users is calculated, users whose credit rating grade is fairly good or good can then be counted as global high-quality users.

Using credit rating grades allows the calculation to work on a simple discrete bar distribution, which reduces the amount of computation and improves identification efficiency.
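The grade-based variant can be sketched as follows. The English grade labels, the handling of interval boundaries (a rating of exactly 20 falls into the second grade, and 100 into the top grade), and the function names are assumptions; the description only fixes five grades of 20 points each.

```python
from collections import Counter

# Grade labels ordered from lowest to highest (translated labels, assumed).
GRADES = ("very poor", "poor", "average", "fairly good", "good")


def grade_of(rating: float, width: float = 20.0) -> str:
    """Map a normalized rating in 0..100 to one of the five grades."""
    index = min(int(rating // width), len(GRADES) - 1)   # 100 falls into the top grade
    return GRADES[index]


def grade_distribution(ratings) -> dict:
    """Percentage of users in each grade (the five-bar distribution)."""
    counts = Counter(grade_of(r) for r in ratings)
    return {grade: 100.0 * counts[grade] / len(ratings) for grade in GRADES}


def high_quality_share(distribution: dict, target_grade: str = "average") -> float:
    """Percentage of users whose grade is strictly above the target grade."""
    first_better = GRADES.index(target_grade) + 1
    return sum(distribution[grade] for grade in GRADES[first_better:])


ratings = [5, 25, 45, 65, 85, 95, 100, 70]   # hypothetical normalized ratings
distribution = grade_distribution(ratings)
print(distribution)
print(high_quality_share(distribution))
```

Once the five percentages are stored, each query condition only needs a sum over at most five bars instead of a pass over the full rating curve.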

In one embodiment, as shown in Fig. 6, a malicious user identification device comprises a global distribution generation module 10, a conditional user search module 20, a conditional distribution generation module 30, a user screening module 40, and a malicious user identification module 50, wherein:

The global distribution generation module 10 is configured to obtain the user IDs of the global users, calculate the credit rating corresponding to each user ID, and generate the global user credit distribution from the credit ratings corresponding to the user IDs of the global users.

The conditional user search module 20 is configured to obtain a preset query condition and screen the user IDs of the global users according to the query condition to obtain the user IDs of the conditional users.

The conditional distribution generation module 30 is configured to obtain the credit ratings corresponding to the user IDs of the conditional users and generate the conditional user credit distribution from those credit ratings.

The user screening module 40 is configured to obtain a preset credit threshold and obtain, from the global user credit distribution and the conditional user credit distribution respectively, the number of global high-quality/malicious users and the number of conditional high-quality/malicious users whose credit rating is greater than/less than the credit threshold.

The malicious user identification module 50 is configured to obtain a preset threshold for the number of high-quality/malicious users, calculate the difference between the number of global high-quality/malicious users and the number of conditional high-quality/malicious users, and judge from the difference and the threshold whether the query condition corresponds to malicious users.

In one embodiment, the global distribution generation module 10 is further configured to obtain the message submission count, online duration, and spam submission count corresponding to the user ID of a global user, multiply each of these three values by its corresponding preset weight coefficient, and sum the results to obtain the credit rating corresponding to the user ID of the global user.

In one embodiment, the global distribution generation module 10 is further configured to obtain preset grade thresholds and determine from them the credit rating grade to which the credit rating corresponding to the user ID of a global user belongs; in this case, the credit threshold is a preset target credit rating grade.

In one embodiment, the query condition is one of an IP address, a message identifier, a user group identifier, or a user profile attribute value.

In one embodiment, the conditional distribution generation module 30 is further configured to obtain the number of conditional user IDs found and judge whether it is greater than a recognition threshold; if so, the module obtains the credit ratings corresponding to the user IDs of the conditional users.

Fig. 7 is a block diagram of a computer system 1000 that can implement embodiments of the present invention. The computer system 1000 is only one example of a computer environment suitable for the invention and should not be taken as limiting the invention's scope of use. Nor should the computer system 1000 be interpreted as depending on or requiring any one or combination of the components of the illustrated exemplary computer system 1000.

The computer system 1000 shown in Fig. 7 is one example of a computer system suitable for the invention. Other architectures with different subsystem configurations can also be used. For example, widely known devices such as desktop computers, notebooks, personal digital assistants, smartphones, tablet computers, portable media players, and set-top boxes may be suitable for some embodiments of the invention, but the invention is not limited to the devices listed above.

As shown in Fig. 7, the computer system 1000 comprises a processor 1010, a memory 1020, and a system bus 1022. Various system components, including the memory 1020 and the processor 1010, are connected to the system bus 1022. The processor 1010 is hardware that executes computer program instructions through the basic arithmetic and logical operations of the computer system. The memory 1020 is a physical device for temporarily or permanently storing computer programs or data (for example, program state information). The system bus 1022 can be any of several types of bus structure, including a memory bus or memory controller, a peripheral bus, and a local bus. The processor 1010 and the memory 1020 exchange data over the system bus 1022. The memory 1020 includes read-only memory (ROM) or flash memory (neither shown in the figure) and random-access memory (RAM); RAM typically refers to the main memory into which the operating system and application programs are loaded.

The computer system 1000 also comprises a display interface 1030 (for example, a graphics processing unit), a display device 1040 (for example, a liquid crystal display), an audio interface 1050 (for example, a sound card), and an audio device 1060 (for example, a loudspeaker). The display device 1040 and the audio device 1060 are media devices for experiencing multimedia content.

The computer system 1000 generally comprises a storage device 1070. The storage device 1070 can be selected from a variety of computer-readable media, meaning any available media that the computer system 1000 can access, both removable and fixed. For example, computer-readable media include but are not limited to flash memory (micro SD card), CD-ROM, digital versatile disc (DVD) or other optical disc storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium that can be used to store the required information and can be accessed by the computer system 1000.

The computer system 1000 also comprises an input device 1080 and an input interface 1090 (for example, an I/O controller). A user can enter instructions and information into the computer system 1000 through the input device 1080, such as a keyboard, a mouse, or a touch panel on the display device 1040. The input device 1080 is normally connected to the system bus 1022 through the input interface 1090, but it can also be connected through other interfaces or bus structures, such as a universal serial bus (USB).

The computer system 1000 can be logically connected with one or more network devices in a network environment. A network device can be a personal computer, a server, a router, a smartphone, a tablet computer, or another common network node. The computer system 1000 connects to network devices through a local area network (LAN) interface 1100 or a mobile communication unit 1110. A local area network (LAN) is a computer network formed by interconnection within a limited area, such as a home, a school, a computer laboratory, or an office building using network media. WiFi and twisted-pair Ethernet are the two most commonly used technologies for building local area networks. WiFi is a technology that lets computer systems 1000 exchange data with each other or connect to a wireless network by radio waves. The mobile communication unit 1110 can make and receive calls over a radio link while moving within a wide geographic area. Besides calls, the mobile communication unit 1110 also supports Internet access in 2G, 3G, or 4G cellular communication systems that provide mobile data services.

It should be pointed out that other computer systems with more or fewer subsystems than the computer system 1000 are also suitable for the invention. For example, the computer system 1000 may include a Bluetooth unit for exchanging data over short distances, an image sensor for taking photographs, and an accelerometer for measuring acceleration.

As described in detail above, a computer system 1000 suitable for the invention can perform the specified operations of the malicious user identification method. The computer system 1000 performs these operations in the form of software instructions that the processor 1010 runs from a computer-readable medium. These software instructions can be read into the memory 1020 from the storage device 1070 or from another device through the LAN interface 1100. The software instructions stored in the memory 1020 cause the processor 1010 to perform the malicious user identification method described above. In addition, the invention can equally be implemented by hardware circuits or by hardware circuits combined with software instructions. Therefore, implementing the invention is not limited to any specific combination of hardware circuits and software.

The embodiments above express only several implementations of the invention. Their description is relatively specific and detailed, but it should not therefore be understood as limiting the scope of the patent. It should be pointed out that a person of ordinary skill in the art can make a number of variations and improvements without departing from the inventive concept, and these all fall within the protection scope of the invention. Therefore, the protection scope of this patent is defined by the appended claims.

Claims

1. A malicious user identification method, comprising:

2. The malicious user identification method according to claim 1, characterized in that the step of calculating the credit rating corresponding to the user ID of a global user is:

obtaining the message submission count, online duration, and spam submission count corresponding to the user ID of the global user;

multiplying the message submission count, online duration, and spam submission count by their corresponding preset weight coefficients and summing the results to obtain the credit rating corresponding to the user ID of the global user.

3. The malicious user identification method according to claim 1, characterized in that, after the step of calculating the credit rating corresponding to the user ID of a global user, the method further comprises:

obtaining preset grade interval thresholds, and determining from the grade interval thresholds the credit rating grade to which the credit rating corresponding to the user ID of the global user belongs;

and the credit threshold is a preset target credit rating grade.

4. The malicious user identification method according to claim 1, characterized in that the query condition is one of an IP address, a message identifier, a user group identifier, or a user profile attribute value.

5. The malicious user identification method according to claim 1, characterized in that, before the step of obtaining the credit ratings corresponding to the user IDs of the conditional users, the method further comprises:

obtaining the number of conditional user IDs found, and judging whether it is greater than a recognition threshold; if so, performing the step of obtaining the credit ratings corresponding to the user IDs of the conditional users.

6. A malicious user identification device, characterized in that it comprises:

7. The malicious user identification device according to claim 6, characterized in that the global distribution generation module is further configured to obtain the message submission count, online duration, and spam submission count corresponding to the user ID of a global user, multiply the message submission count, online duration, and spam submission count by their corresponding preset weight coefficients, and sum the results to obtain the credit rating corresponding to the user ID of the global user.

8. The malicious user identification device according to claim 6, characterized in that the global distribution generation module is further configured to obtain preset grade thresholds and determine from the grade thresholds the credit rating grade to which the credit rating corresponding to the user ID of a global user belongs;

and the credit threshold is a preset target credit rating grade.

9. The malicious user identification device according to claim 6, characterized in that the query condition is one of an IP address, a message identifier, a user group identifier, or a user profile attribute value.

10. The malicious user identification device according to claim 6, characterized in that the conditional distribution generation module is further configured to obtain the number of conditional user IDs found and judge whether it is greater than a recognition threshold; if so, the module obtains the credit ratings corresponding to the user IDs of the conditional users.