Detailed Description of the Embodiments
To make the objects, technical solutions, and advantages of the present invention clearer, the invention is described in further detail below with reference to the drawings and embodiments. It should be understood that the specific embodiments described herein serve only to explain the invention and are not intended to limit it.
Unless the context clearly indicates otherwise, the elements and components of the present invention may be present either singly or in plurality, and the invention is not limited in this respect. Although the steps of the invention are labeled with reference numerals, the numerals are not intended to fix their order; unless the order of the steps is expressly stated, or the execution of one step depends on other steps, the relative order of the steps may be adjusted. The term "and/or" as used herein covers any and all possible combinations of one or more of the associated listed items.
As shown in Fig. 1, in one embodiment, a malicious user identification method is provided. The method is carried out by a computer program and can run on a computer system based on the von Neumann architecture. The method comprises:
Step S102: obtain the user IDs of the global users, calculate the credit rating corresponding to each user ID, and generate a global user credit distribution from the credit ratings corresponding to the user IDs of the global users.
Global users are the users registered in the social network application. A registered user normally uses a user ID and the corresponding login information in the social network application, and the application's user database stores the profile data of registered users. The user IDs of the global users can therefore be looked up in the user database.
The credit rating corresponding to a user ID measures the trustworthiness of the messages (UGC, user-generated content) that the social network account corresponding to that user ID submits to the social network server. If a user frequently uses the account to post spam containing pornography, violence, advertising, or politically motivated content in the social network application, the user's credit rating is low.
A malicious user is a user with a low credit rating. Malicious users typically spread spam to other users at high frequency, which disrupts normal browsing by other users. The counterpart of a malicious user is a high-quality user: a user with a high credit rating who uses the social network normally, rarely posts in violation of the rules, rarely posts duplicates, and posts content that conforms to the rules.
In this embodiment, the step of calculating the credit rating corresponding to the user ID of a global user may specifically be: obtain the message submission count, the online duration, and the spam submission count corresponding to the user ID; multiply each of the three by its corresponding preset weight coefficient, and combine the products to obtain the credit rating corresponding to the user ID.
For example, if a user has submitted N1 messages to the social network application in total, N2 of which were spam, and has been online on D days, the credit rating can be calculated according to the formula:
C = k1 × N1 + k2 × D - k3 × N2
where C is the credit rating, and k1, k2, and k3 are the corresponding weight coefficients.
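The formula above can be sketched in a few lines of Python. This is a minimal illustration only: the class name UserActivity, the function name credit_rating, and the default weight values are assumptions chosen for the example, since the specification does not fix any values for k1, k2, and k3.

```python
from dataclasses import dataclass


@dataclass
class UserActivity:
    """Per-user counters, named after the symbols in the formula."""
    messages_submitted: int  # N1
    days_online: int         # D
    spam_submitted: int      # N2


def credit_rating(activity: UserActivity,
                  k1: float = 1.0, k2: float = 2.0, k3: float = 10.0) -> float:
    """Return C = k1*N1 + k2*D - k3*N2 (the default weights are illustrative)."""
    return (k1 * activity.messages_submitted
            + k2 * activity.days_online
            - k3 * activity.spam_submitted)


normal_user = UserActivity(messages_submitted=40, days_online=25, spam_submitted=0)
spammer = UserActivity(messages_submitted=300, days_online=3, spam_submitted=280)

print(credit_rating(normal_user))  # 40 + 50 - 0 = 90.0
print(credit_rating(spammer))      # 300 + 6 - 2800 = -2494.0
```

With a spam weight much larger than the other two, an account that submits mostly spam ends up far below an ordinary account even though it submits many more messages.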
In a preferred mode, a time interval may also be preset. The message submission count, online duration, and spam submission count corresponding to each global user's user ID within the interval are obtained, and the credit rating of the global user within the interval is calculated according to the above formula. For example, if the preset time interval is the previous month, the application can obtain, for each user, the number of messages submitted during the 30 days of the previous month, the number of spam messages submitted, and the number of those 30 days on which the user logged in, and from these calculate the credit rating of each global user for the previous month.
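Restricting the counters to a preset time interval might look like the following sketch. The record layout (a list of (day, is_spam) pairs for messages and a list of login days) and the function name window_counters are assumptions made for illustration; the specification does not describe how the records are stored.

```python
from datetime import date


def window_counters(messages, login_days, start, end):
    """Return (N1, D, N2) restricted to the inclusive interval [start, end].

    messages   -- list of (day, is_spam) pairs, one per submitted message
    login_days -- list of days on which the user logged in (may repeat)
    """
    in_window = [(day, is_spam) for day, is_spam in messages if start <= day <= end]
    n1 = len(in_window)                                   # messages submitted
    n2 = sum(1 for _, is_spam in in_window if is_spam)    # spam messages submitted
    d = len({day for day in login_days if start <= day <= end})  # distinct login days
    return n1, d, n2


messages = [
    (date(2024, 3, 31), True),    # before the interval, ignored
    (date(2024, 4, 2), False),
    (date(2024, 4, 2), True),
    (date(2024, 4, 30), False),
    (date(2024, 5, 1), True),     # after the interval, ignored
]
login_days = [date(2024, 3, 31), date(2024, 4, 2), date(2024, 4, 2), date(2024, 4, 30)]

counters = window_counters(messages, login_days, date(2024, 4, 1), date(2024, 4, 30))
print(counters)  # (3, 2, 1)
```

The three returned values can then be fed into the credit rating formula in place of the lifetime totals.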
It should be noted that whether a message submitted by a user is spam can be identified by keyword matching or by a machine learning algorithm. For example, if the content of a message contains keywords associated with pornography, violence, advertising, or politically motivated content, the message can be judged to be spam, and the spam submission count is incremented by 1 accordingly. Alternatively, a support vector machine (SVM) trained to identify spam can be used to classify messages and thereby determine whether a message submitted by a user is spam.
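The keyword-matching variant can be sketched as follows. The keyword list and the function names is_spam and count_spam are hypothetical; a real deployment would use a curated list or the SVM classifier mentioned above, which is not shown here.

```python
# Hypothetical keyword list, for illustration only.
SPAM_KEYWORDS = ("free prize", "click here", "cheap loans")


def is_spam(message: str, keywords=SPAM_KEYWORDS) -> bool:
    """Judge a message to be spam if it contains any keyword (case-insensitive)."""
    text = message.lower()
    return any(keyword in text for keyword in keywords)


def count_spam(messages) -> int:
    """Spam submission count: incremented by 1 for every message judged to be spam."""
    return sum(1 for message in messages if is_spam(message))


submitted = ["Click HERE to win", "Lunch at noon?", "Cheap loans today"]
print(count_spam(submitted))  # 2
```

Plain substring matching is easy to evade by inserting interfering characters, which is the weakness that the distribution-based identification described in this specification is meant to compensate for.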
The global user credit distribution is the distribution of user counts over credit rating values, obtained after the credit rating of each user of the social network application has been calculated. Note that user counts are expressed as percentages of users, here and throughout. As shown in Fig. 2, the horizontal axis is the credit rating (which can be normalized to the range 0 to 100), and the vertical axis is the percentage of users whose credit rating is less than or equal to the corresponding value. As the figure shows, 30% of users have a credit rating less than or equal to 60.
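A cumulative distribution of this kind can be computed as in the sketch below. The sample ratings are invented so that 30% of users fall at or below 60, mirroring the reading given for Fig. 2; they are not data from the figure, and the function names are assumptions.

```python
def cumulative_share(ratings, value):
    """Percentage of users whose credit rating is less than or equal to value."""
    return 100.0 * sum(1 for r in ratings if r <= value) / len(ratings)


def credit_distribution(ratings, step=10):
    """Sample the cumulative curve at 0, step, 2*step, ..., 100."""
    return [(v, cumulative_share(ratings, v)) for v in range(0, 101, step)]


# Invented ratings, already normalized to the range 0..100.
global_ratings = [10, 40, 60, 65, 70, 75, 80, 85, 90, 100]

print(cumulative_share(global_ratings, 60))  # 30.0
curve = credit_distribution(global_ratings, step=20)
print(curve)
```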
Step S104: obtain a preset query condition, and screen the user IDs of the global users according to the query condition to obtain the user IDs of the condition users.
In this embodiment, the query condition is the criterion used to filter a specific user group (the condition users) out of the registered users of the social network application. It may be one of an IP address, a message identifier, a user group identifier, or a personal information attribute value.
For example, if a large number of messages have been sent in bulk from a certain IP address within one month, that is, a large amount of UGC has been submitted to the social network server from that address, the user behind the IP address may be a malicious user who uses multiple social network accounts from that address to send spam. The IP address can then be used as the query condition.
As another example, if a certain message has been forwarded many times by multiple users, the forwarding users are also likely to be malicious users who are deliberately forwarding spam. The identifier of that message can be used as the query condition to find all condition users who forwarded it.
As for user group identifiers and personal information attribute values: an offender typically registers multiple social network application accounts first and then switches between them to send spam in bulk. The personal information attribute values filled in when such accounts are registered tend to be similar (for example, the same home town or the same school of graduation), so a personal information attribute value can be used as the query condition to filter out these users.
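The three kinds of query condition discussed above can all be expressed as a predicate over a user record, as in this sketch. The record fields (ip, school, forwarded), the sample values, and the function name select_condition_users are assumptions for illustration; the IP addresses come from documentation-only address ranges.

```python
# Hypothetical user records; field names are illustrative.
users = [
    {"user_id": "u1", "ip": "203.0.113.7", "school": "North High", "forwarded": {"m42"}},
    {"user_id": "u2", "ip": "203.0.113.7", "school": "North High", "forwarded": {"m42", "m7"}},
    {"user_id": "u3", "ip": "198.51.100.9", "school": "East College", "forwarded": set()},
    {"user_id": "u4", "ip": "198.51.100.23", "school": "North High", "forwarded": {"m42"}},
]


def select_condition_users(users, predicate):
    """Return the user IDs of all users for which the query condition holds."""
    return [u["user_id"] for u in users if predicate(u)]


# Query condition: an IP address.
by_ip = select_condition_users(users, lambda u: u["ip"] == "203.0.113.7")
# Query condition: a message identifier (users who forwarded message m42).
by_message = select_condition_users(users, lambda u: "m42" in u["forwarded"])
# Query condition: a personal information attribute value.
by_attribute = select_condition_users(users, lambda u: u["school"] == "North High")

print(by_ip)         # ['u1', 'u2']
print(by_message)    # ['u1', 'u2', 'u4']
print(by_attribute)  # ['u1', 'u2', 'u4']
```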
Step S106: obtain the credit ratings corresponding to the user IDs of the condition users, and generate a condition user credit distribution from those credit ratings.
As described above, the credit rating of every global user, that is, every user registered in the social network application, has already been calculated, so the credit ratings corresponding to the user IDs of the screened condition users can be looked up directly. The condition user credit distribution is the credit rating distribution generated with the condition users as the sample space, as shown in Fig. 3.
Further, before the step of obtaining the credit ratings corresponding to the user IDs of the condition users, the number of condition user IDs found may be obtained and compared with a recognition threshold; the step of obtaining the credit ratings is performed only if the number is greater than the recognition threshold.
In other words, if the query condition yields only a small number of user IDs, fewer than the recognition threshold, the users corresponding to the query condition can be judged not to be malicious users, because a malicious user whose bulk spam does serious harm usually needs a very large number of social network application accounts.
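The recognition threshold acts as a gate in front of step S106. A minimal sketch, assuming the credit ratings are already held in a dictionary keyed by user ID (the function name, the dictionary layout, and the sample values are all hypothetical):

```python
def ratings_if_enough_users(condition_user_ids, ratings_by_user, recognition_threshold):
    """Return the condition users' credit ratings, or None if the group is too small."""
    unique_ids = set(condition_user_ids)
    if len(unique_ids) <= recognition_threshold:
        return None  # too few accounts: the query condition is treated as not malicious
    return [ratings_by_user[uid] for uid in sorted(unique_ids)]


ratings_by_user = {"u1": 12.0, "u2": 8.0, "u3": 91.0, "u4": 15.0}

small_group = ratings_if_enough_users(["u1", "u2"], ratings_by_user,
                                      recognition_threshold=3)
large_group = ratings_if_enough_users(["u1", "u2", "u3", "u4"], ratings_by_user,
                                      recognition_threshold=3)

print(small_group)  # None
print(large_group)  # [12.0, 8.0, 91.0, 15.0]
```

Skipping small groups also avoids building a distribution from a sample too small to compare meaningfully against the global distribution.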
Step S108: obtain a preset credit threshold, and obtain, from the global user credit distribution and the condition user credit distribution respectively, either the global and condition high-quality user counts (users whose credit rating is greater than the credit threshold) or the global and condition malicious user counts (users whose credit rating is less than the credit threshold).
As shown in Figs. 2 and 3, if high-quality user counts are used for identification and the credit threshold is 60, the global high-quality user count is the percentage of users in Fig. 2 whose credit rating is greater than 60, and the condition high-quality user count is the corresponding percentage in Fig. 3. If malicious user counts are used for identification and the credit threshold is 30, the global malicious user count is the percentage of users in Fig. 2 whose credit rating is less than 30, and the condition malicious user count is the corresponding percentage in Fig. 3.
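Step S108 amounts to reading two percentages off each distribution. In the sketch below the rating lists are invented sample data, not values taken from Figs. 2 and 3, and the function names share_above and share_below are assumptions.

```python
def share_above(ratings, credit_threshold):
    """High-quality user count: percentage with rating greater than the threshold."""
    return 100.0 * sum(1 for r in ratings if r > credit_threshold) / len(ratings)


def share_below(ratings, credit_threshold):
    """Malicious user count: percentage with rating less than the threshold."""
    return 100.0 * sum(1 for r in ratings if r < credit_threshold) / len(ratings)


# Invented sample ratings, normalized to 0..100.
global_ratings = [10, 40, 60, 65, 70, 75, 80, 85, 90, 100]
condition_ratings = [5, 10, 15, 20, 25, 35, 50, 65, 70, 90]

global_hq = share_above(global_ratings, 60)         # 70.0
condition_hq = share_above(condition_ratings, 60)   # 30.0
global_mal = share_below(global_ratings, 30)        # 10.0
condition_mal = share_below(condition_ratings, 30)  # 50.0

print(global_hq, condition_hq, global_mal, condition_mal)
```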
Step S110: obtain a preset high-quality user count threshold or malicious user count threshold, calculate the difference between the global and condition high-quality user counts (or between the global and condition malicious user counts), and determine from the difference and the corresponding threshold whether the query condition corresponds to malicious users.
If high-quality user counts are used for identification, the condition high-quality user count is subtracted from the global high-quality user count. If the difference is greater than the high-quality user count threshold, the user IDs corresponding to the query condition are likely to be social network application accounts registered in bulk by a malicious user for sending spam, and the query condition is judged to correspond to malicious users.
If malicious user counts are used for identification, the global malicious user count (a percentage) is subtracted from the condition malicious user count. If the difference is greater than the malicious user count threshold, the user IDs corresponding to the query condition are likely to be social network application accounts registered in bulk by a malicious user for sending spam, and the query condition is judged to correspond to malicious users.
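The two decision rules of step S110 differ only in the direction of the subtraction. A sketch with invented percentages and an invented threshold of 25 percentage points (the function names and all numbers are assumptions):

```python
def malicious_by_high_quality(global_hq, condition_hq, hq_threshold):
    """Malicious if the condition group has far fewer high-quality users than overall."""
    return (global_hq - condition_hq) > hq_threshold


def malicious_by_malicious(global_mal, condition_mal, mal_threshold):
    """Malicious if the condition group has far more low-rated users than overall."""
    return (condition_mal - global_mal) > mal_threshold


# Suspicious query condition: 70% high-quality overall, only 30% in the condition group.
print(malicious_by_high_quality(70.0, 30.0, hq_threshold=25.0))  # True
# Ordinary query condition: the condition group looks like the overall population.
print(malicious_by_high_quality(70.0, 65.0, hq_threshold=25.0))  # False
# Same idea using malicious user counts: 10% overall versus 50% in the group.
print(malicious_by_malicious(10.0, 50.0, mal_threshold=25.0))    # True
```

Because both counts are percentages, the comparison does not depend on how many users the query condition selected, only on how the group's credit ratings are distributed.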
Once the query condition has been determined to correspond to malicious users, the administrator of the social network application can take action against the user IDs corresponding to that query condition. For example, if the query condition is an IP address, the administrator can block that IP address.
In one embodiment, the following is further performed after the step of calculating the credit ratings corresponding to the user IDs of the global users:
Obtain preset grade interval thresholds, and determine, according to the grade interval thresholds, the credit grade to which the credit rating corresponding to each global user's user ID belongs.
In this case, the credit threshold is a preset target credit grade.
As shown in Figs. 4 and 5, the grade interval thresholds may divide the normalized credit rating (linearly scaled to the range 0 to 100) into five grades: poor, fairly poor, average, fairly good, and good, each spanning 20 credit rating points. The global user credit distribution and the condition user credit distribution are thereby converted into discrete bar distributions containing only five bars. The preset credit threshold may be set to "average" or "fairly good"; when the global high-quality user count is calculated, the users whose credit grade is "fairly good" or "good" can then be counted as global high-quality users.
Using credit grades allows the calculation to be performed on a simple discrete bar distribution, which reduces the amount of computation and improves identification efficiency.
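The grade-based variant can be sketched as a five-bar histogram. Two points are assumptions not settled by the text: the English grade labels, and the treatment of boundary values (here each grade is a half-open interval such as [20, 40), with a rating of exactly 100 placed in the top grade). The sample ratings are invented.

```python
# Illustrative English labels for the five grades, lowest to highest.
GRADES = ("poor", "fairly poor", "average", "fairly good", "good")


def grade_index(rating, width=20):
    """Map a normalized rating in [0, 100] to a grade index 0..4."""
    return min(int(rating // width), len(GRADES) - 1)


def grade_histogram(ratings):
    """Percentage of users in each of the five grades."""
    counts = [0] * len(GRADES)
    for rating in ratings:
        counts[grade_index(rating)] += 1
    return [100.0 * count / len(ratings) for count in counts]


def high_quality_share(histogram, lowest_hq_grade="fairly good"):
    """Sum the bars from lowest_hq_grade upward (an assumed reading of the threshold)."""
    return sum(histogram[GRADES.index(lowest_hq_grade):])


global_ratings = [10, 40, 60, 65, 70, 75, 80, 85, 90, 100]
histogram = grade_histogram(global_ratings)

print(histogram)                      # [10.0, 0.0, 10.0, 40.0, 40.0]
print(high_quality_share(histogram))  # 80.0
```

Once the histogram exists, each high-quality or malicious user count is a sum over at most five numbers rather than a pass over every user.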
In one embodiment, as shown in Fig. 6, a malicious user identification device comprises a global distribution generation module 10, a condition user search module 20, a condition distribution generation module 30, a user screening module 40, and a malicious user identification module 50, wherein:
The global distribution generation module 10 is configured to obtain the user IDs of the global users, calculate the credit rating corresponding to each user ID, and generate the global user credit distribution from the credit ratings corresponding to the user IDs of the global users.
The condition user search module 20 is configured to obtain the preset query condition and screen the user IDs of the global users according to the query condition to obtain the user IDs of the condition users.
The condition distribution generation module 30 is configured to obtain the credit ratings corresponding to the user IDs of the condition users and generate the condition user credit distribution from those credit ratings.
The user screening module 40 is configured to obtain the preset credit threshold and obtain, from the global user credit distribution and the condition user credit distribution respectively, the global and condition high-quality user counts (credit rating greater than the credit threshold) or the global and condition malicious user counts (credit rating less than the credit threshold).
The malicious user identification module 50 is configured to obtain the preset high-quality user count threshold or malicious user count threshold, calculate the difference between the global and condition high-quality user counts (or between the global and condition malicious user counts), and determine from the difference and the corresponding threshold whether the query condition corresponds to malicious users.
In one embodiment, the global distribution generation module 10 is further configured to obtain the message submission count, online duration, and spam submission count corresponding to the user ID of each global user, multiply each by its corresponding preset weight coefficient, and combine the products to obtain the credit rating corresponding to the user ID.
In one embodiment, the global distribution generation module 10 is further configured to obtain the preset grade interval thresholds and determine, according to the grade interval thresholds, the credit grade to which the credit rating corresponding to each global user's user ID belongs; in this case, the credit threshold is a preset target credit grade.
In one embodiment, the query condition is one of an IP address, a message identifier, a user group identifier, or a personal information attribute value.
In one embodiment, the condition distribution generation module 30 is further configured to obtain the number of condition user IDs found and determine whether it is greater than the recognition threshold, and if so, to obtain the credit ratings corresponding to the user IDs of the condition users.
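One way to picture how modules 10 to 50 cooperate is a single class with one method per module. This is a hypothetical wiring, not the structure of the device in Fig. 6: the class name, method names, record layout, sample data, and threshold values are all assumptions, and only the high-quality user count variant is shown.

```python
class MaliciousUserIdentifier:
    """Illustrative composition of modules 10-50 (high-quality count variant)."""

    def __init__(self, users, credit_threshold, hq_threshold, recognition_threshold):
        self.users = users  # user_id -> {"ip": str, "rating": float}
        self.credit_threshold = credit_threshold
        self.hq_threshold = hq_threshold
        self.recognition_threshold = recognition_threshold

    def global_ratings(self):                    # module 10
        return [u["rating"] for u in self.users.values()]

    def condition_user_ids(self, field, value):  # module 20
        return [uid for uid, u in self.users.items() if u.get(field) == value]

    def condition_ratings(self, user_ids):       # module 30, with recognition gate
        if len(user_ids) <= self.recognition_threshold:
            return None
        return [self.users[uid]["rating"] for uid in user_ids]

    def high_quality_share(self, ratings):       # module 40
        above = sum(1 for r in ratings if r > self.credit_threshold)
        return 100.0 * above / len(ratings)

    def query_matches_malicious_users(self, field, value):  # module 50
        ratings = self.condition_ratings(self.condition_user_ids(field, value))
        if ratings is None:
            return False
        gap = (self.high_quality_share(self.global_ratings())
               - self.high_quality_share(ratings))
        return gap > self.hq_threshold


# Invented data: four accounts share one IP address, six others use distinct addresses.
users = {f"spam{i}": {"ip": "203.0.113.7", "rating": r}
         for i, r in enumerate([5, 10, 20, 70])}
users.update({f"user{i}": {"ip": f"198.51.100.{i}", "rating": r}
              for i, r in enumerate([65, 70, 80, 85, 90, 40])})

identifier = MaliciousUserIdentifier(users, credit_threshold=60,
                                     hq_threshold=20, recognition_threshold=3)
print(identifier.query_matches_malicious_users("ip", "203.0.113.7"))   # True
print(identifier.query_matches_malicious_users("ip", "198.51.100.1"))  # False
```

In the first query the global high-quality share is 60% and the condition share is 25%, so the gap of 35 points exceeds the threshold of 20. The second query selects a single account, which does not pass the recognition threshold.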
The above malicious user identification method determines whether the condition users corresponding to a query condition are malicious users from the difference between the condition user credit distribution of those users and the global user credit distribution of all users registered in the social network application. Compared with the conventional art, even if some spam to which interference has been added slips through content inspection, malicious users can still be identified from the overall difference between the messages submitted by malicious users and those submitted by non-malicious users, so identification accuracy is higher.
Fig. 7 is a block diagram of a computer system 1000 that can implement embodiments of the present invention. The computer system 1000 is only one example of a computing environment suitable for the invention and should not be construed as imposing any limitation on the scope of use of the invention. Nor should the computer system 1000 be interpreted as depending on or requiring any one or combination of the components of the illustrated exemplary computer system 1000.
The computer system 1000 shown in Fig. 7 is one example of a computer system suitable for the invention. Other architectures with different subsystem configurations may also be used. For example, well-known devices such as desktop computers, notebook computers, personal digital assistants, smartphones, tablet computers, portable media players, and set-top boxes may be suitable for some embodiments of the invention, although the invention is not limited to the devices listed above.
As shown in Fig. 7, the computer system 1000 comprises a processor 1010, a memory 1020, and a system bus 1022. The various system components, including the memory 1020 and the processor 1010, are connected to the system bus 1022. The processor 1010 is the hardware that executes computer program instructions by performing the basic arithmetic and logical operations of the computer system. The memory 1020 is a physical device for temporarily or permanently storing computer programs or data (for example, program state information). The system bus 1022 may be any of several types of bus structure, including a memory bus or memory controller, a peripheral bus, and a local bus. The processor 1010 and the memory 1020 can exchange data through the system bus 1022. The memory 1020 comprises read-only memory (ROM) or flash memory (neither shown in the figure) and random-access memory (RAM); RAM generally refers to the main memory into which the operating system and application programs are loaded.
The computer system 1000 also comprises a display interface 1030 (for example, a graphics processing unit), a display device 1040 (for example, a liquid crystal display), an audio interface 1050 (for example, a sound card), and an audio device 1060 (for example, a loudspeaker). The display device 1040 and the audio device 1060 are media devices for experiencing multimedia content.
The computer system 1000 generally comprises a storage device 1070. The storage device 1070 can be selected from a variety of computer-readable media; a computer-readable medium is any available medium that can be accessed by the computer system 1000, including both removable and non-removable media. For example, computer-readable media include, but are not limited to, flash memory (micro SD card), CD-ROM, digital versatile disc (DVD) or other optical disc storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium that can be used to store the required information and can be accessed by the computer system 1000.
The computer system 1000 also comprises an input device 1080 and an input interface 1090 (for example, an I/O controller). A user can enter instructions and information into the computer system 1000 through the input device 1080, such as a keyboard, a mouse, or a touch panel device in the display device 1040. The input device 1080 is normally connected to the system bus 1022 through the input interface 1090, but it may also be connected through other interfaces or bus structures, such as a Universal Serial Bus (USB).
The computer system 1000 can be logically connected to one or more network devices in a network environment. A network device may be a personal computer, a server, a router, a smartphone, a tablet computer, or another common network node. The computer system 1000 connects to network devices through a local area network (LAN) interface 1100 or a mobile communication unit 1110. A local area network (LAN) is a computer network formed by interconnecting computers within a limited area, such as a home, a school, a computer laboratory, or an office building, using network media. WiFi and twisted-pair Ethernet are the two technologies most commonly used to build local area networks. WiFi is a technology that allows computer systems 1000 to exchange data with one another or to connect to a wireless network by radio waves. The mobile communication unit 1110 can make and receive telephone calls over a radio link while moving across a wide geographic area. In addition to calls, the mobile communication unit 1110 also supports Internet access in 2G, 3G, or 4G cellular communication systems that provide mobile data services.
It should be pointed out that other computer systems comprising more or fewer subsystems than the computer system 1000 may also be suitable for the invention. For example, the computer system 1000 may comprise a Bluetooth unit capable of exchanging data over short distances, an image sensor for taking photographs, and an accelerometer for measuring acceleration.
As described in detail above, a computer system 1000 suitable for the invention can perform the specified operations of the malicious user identification method. The computer system 1000 performs these operations in the form of software instructions, held in a computer-readable medium, that are run by the processor 1010. These software instructions may be read into the memory 1020 from the storage device 1070 or from another device through the LAN interface 1100. The software instructions stored in the memory 1020 cause the processor 1010 to perform the malicious user identification method described above. The invention can equally be implemented by hardware circuitry, or by hardware circuitry combined with software instructions. Implementation of the invention is therefore not limited to any specific combination of hardware circuitry and software.
The embodiments described above represent only several implementations of the invention. Their description is relatively specific and detailed, but it should not therefore be construed as limiting the scope of the patent. It should be noted that a person of ordinary skill in the art may make variations and improvements without departing from the inventive concept, and all of these fall within the protection scope of the invention. The protection scope of this patent shall therefore be determined by the appended claims.