CN106469179A - An information monitoring method and device - Google Patents


Info

Publication number
CN106469179A
Authority
CN
China
Prior art keywords
feature
account
character feature
character
classification
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201510518846.9A
Other languages
Chinese (zh)
Inventor
郑丹丹
林述明
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Alibaba Group Holding Ltd
Original Assignee
Alibaba Group Holding Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Alibaba Group Holding Ltd
Priority to CN201510518846.9A
Publication of CN106469179A
Status: Pending (current)

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/958 Organisation or management of web site content, e.g. publishing, maintaining pages or automatic linking

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Transfer Between Computers (AREA)

Abstract

This application discloses an information monitoring method and device. The method includes: determining the account information of each account to be identified that is received within a specific time period; extracting character features from each piece of received account information; counting, according to the extracted character features, the number of accounts within the specific time period that share each identical character feature; and judging whether each piece of account information is malicious information according to a pre-established account quantity standard corresponding to each character feature and the counted number of accounts sharing that character feature within the specific time period. As a result, when handling account information provided by users, a network service provider can effectively distinguish normal account information from malicious account information, improving the accuracy of that distinction.

Description

An information monitoring method and device
Technical field
The present application relates to the field of computer technology, and in particular to an information monitoring method and device.
Background
With the continuous development of network technology, a network service provider (e.g., a website) can, after receiving information provided by a user, offer that user a wide variety of network services.
At present, the account information received by network service providers comes in different types, for example, account information a user registers on a shopping website or on a gaming website. The network service provider can store the account information provided by users on a web server. However, some of this account information may be malicious (e.g., accounts registered maliciously in batches). Such malicious account information interferes with the normal operation of the network service provider and wastes resources.
In the prior art, a network service provider identifies and processes received account information as follows. It generally extracts, from the received account information, the entries that share the same or similar information features, for example the same or a similar prefix in the account name. It then uses the other attributes in the account information to quantify the probability that the account information was generated in batches. These other attributes include personal user information (e.g., user name and phone number) and device information, such as the Internet Protocol (IP) address. The more of these attributes are identical, the greater the probability that the account information was generated in batches, and therefore the greater the probability that it is malicious; otherwise, the account information is considered normal.
For example, a user registers accounts on a shopping website with the following account information: luha001@163.com, luha002@163.com, luha003@163.com, luha004@163.com, luha005@163.com. These entries are clearly similar: the letters in the prefix are identical, and the digits in the prefix increase one by one. They are likely to be malicious. The server of the shopping website can therefore extract the account information containing these email addresses, collect the other attributes it contains (e.g., user name and phone number), and compare them: the more attributes are identical, the greater the probability that the account information registered on the website is malicious.
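To make the prior-art approach concrete, here is a minimal sketch of it (not the method of this application): it groups email-style account names by their letter prefix and counts how many other attributes are identical within a group. The record fields (`email`, `user_name`, `phone`, `ip`) and all helper names are hypothetical.

```python
import re
from collections import defaultdict


def letter_prefix(email: str) -> str:
    """Local part of an email address with its trailing digits stripped."""
    local = email.split("@", 1)[0]
    return re.sub(r"\d+$", "", local)


def group_by_prefix(records):
    """Group account records whose emails share the same letter prefix."""
    groups = defaultdict(list)
    for record in records:
        groups[letter_prefix(record["email"])].append(record)
    return dict(groups)


def shared_attribute_count(group, attrs=("user_name", "phone", "ip")):
    """Number of attributes that take one identical value across the group.

    In the prior-art approach, a higher count is read as a higher
    probability of batch (malicious) registration.
    """
    return sum(1 for attr in attrs if len({r[attr] for r in group}) == 1)


# Hypothetical records mirroring the luha001 ... luha005 example.
records = [
    {"email": f"luha00{i}@163.com", "user_name": "user_a",
     "phone": "555-0100", "ip": f"10.0.0.{i}"}
    for i in range(1, 6)
]
groups = group_by_prefix(records)
print(len(groups["luha"]), shared_attribute_count(groups["luha"]))  # 5 2
```

The sketch also shows the weakness described next: the verdict depends entirely on the extra attributes, and an attribute such as the IP address is easy to vary.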
However, popular email providers add a large number of new addresses every day. Among millions of newly registered users, it is normal for a fairly large number of addresses to follow the same pattern, so this approach must always fall back on the other account attributes. That makes identification computationally expensive and prone to false positives. In addition, the device information in the account information (e.g., the IP address) is unstable, because users can change their IP address through certain network equipment. As a result, network service providers distinguish normal account information from malicious account information with low accuracy.
Summary
The embodiments of the present application provide an information monitoring method and device to solve the problem that network service providers distinguish normal account information from malicious account information with low accuracy.
An information monitoring method provided by an embodiment of the present application includes:
determining the account information of each account to be identified that is received within a specific time period;
extracting character features from each piece of received account information;
counting, according to the extracted character features, the number of accounts within the specific time period that share each identical character feature; and
judging whether each piece of account information is malicious information according to a pre-established account quantity standard corresponding to each character feature and the counted number of accounts sharing that character feature within the specific time period.
An information monitoring device provided by an embodiment of the present application includes:
a receiving module, configured to determine the account information of each account to be identified that is received within a specific time period;
an extraction module, configured to extract character features from each piece of received account information;
a statistics module, configured to count, according to the extracted character features, the number of accounts within the specific time period that share each identical character feature; and
a judging module, configured to judge whether each piece of account information is malicious information according to a pre-established account quantity standard corresponding to each character feature and the counted number of accounts sharing that character feature within the specific time period.
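The four modules of the device can be sketched as one Python class. This is a minimal, hypothetical illustration rather than the application's implementation: the class and method names, the `(name, timestamp)` record format, the single digit-masking extractor, and the rule that a feature without a pre-established standard is never flagged are all assumptions.

```python
import re
from collections import Counter
from dataclasses import dataclass
from datetime import datetime
from typing import Callable, Dict, List, Tuple

Extractor = Callable[[str], str]  # maps an account name to one character feature


def mask_digits(name: str) -> str:
    """Example extractor: replace each digit with '^', keeping digit positions."""
    return re.sub(r"\d", "^", name)


@dataclass
class InformationMonitor:
    extractors: List[Extractor]
    standards: Dict[str, float]  # pre-established account quantity standard per feature
    # Assumption: a feature with no pre-established standard is never flagged.
    default_standard: float = float("inf")

    def receive(self, records: List[Tuple[str, datetime]],
                start: datetime, end: datetime) -> List[str]:
        """Receiving module: account names whose timestamp lies in [start, end)."""
        return [name for name, ts in records if start <= ts < end]

    def extract(self, names: List[str]) -> Dict[str, List[str]]:
        """Extraction module: one character feature per name and extractor.

        Assumes account names are unique within the period.
        """
        return {name: [f(name) for f in self.extractors] for name in names}

    def count(self, features: Dict[str, List[str]]) -> Counter:
        """Statistics module: number of accounts sharing each character feature."""
        counts: Counter = Counter()
        for feats in features.values():
            counts.update(set(feats))  # an account counts once per distinct feature
        return counts

    def judge(self, features: Dict[str, List[str]],
              counts: Counter) -> Dict[str, bool]:
        """Judging module: True (malicious) if any feature's count exceeds its standard."""
        return {
            name: any(counts[f] > self.standards.get(f, self.default_standard)
                      for f in feats)
            for name, feats in features.items()
        }
```

Adding further feature extraction methods only changes the `extractors` list; the counting and judging steps stay the same.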
The embodiments of the present application provide an information monitoring method and device. For the user information to be monitored, a web server first determines the account information of each account to be identified that is received within a specific time period and extracts character features from each piece of received account information. It then counts, according to the extracted character features, the number of accounts within the specific time period that share each identical character feature, and judges whether each piece of account information is malicious information according to the pre-established account quantity standard corresponding to each character feature and those counts. Therefore, when handling account information provided by users, a network service provider can effectively distinguish normal account information from malicious account information, improving the accuracy of that distinction.
Brief description of the drawings
The drawings described here are provided for a further understanding of the present application and constitute a part of it. The illustrative embodiments of the present application and their descriptions serve to explain the application and do not unduly limit it. In the drawings:
Fig. 1 is a schematic diagram of the process of the information monitoring method provided by an embodiment of the present application;
Fig. 2 is a schematic structural diagram of the information monitoring device provided by an embodiment of the present application.
Detailed description of embodiments
To make the purpose, technical solutions, and advantages of the present application clearer, the technical solutions of the present application are described clearly and completely below with reference to specific embodiments and the corresponding drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present application. All other embodiments obtained by those of ordinary skill in the art based on the embodiments of the present application without creative effort fall within the protection scope of the present application.
Fig. 1 shows the information monitoring process provided by an embodiment of the present application, which specifically includes the following steps:
S101: Determine the account information of each account to be identified that is received within a specific time period.
Here, the account information includes, but is not limited to, account information a user fills in on a web page (or application interface), and the specific time period may be the current time period or a past time period.
In the embodiment of the present application, the web server receives and stores each piece of account information registered by users. When the web server receives a judgment instruction, it first determines a specific time period (e.g., the past day or the past hour) and the account information users registered within it (each piece of which is the account information of an account to be identified). The judgment instruction causes the web server to judge whether the account information of each account to be identified is malicious information.
S102: Extract the character features from each piece of received account information.
Here, a character feature is information that characterizes the account information, such as the number of characters it contains.
In the embodiment of the present application, the web server receives the account information provided by users within the specific time period and extracts from it the information that carries a given feature. For example, in the current time period a user registers an account on a shopping website. The website's server receives the account information the user registered and, using a specific feature extraction method (e.g., one that ignores the values of the digits in the account information and records only their positions), extracts from it a character feature carrying the corresponding feature, such as the number of digits.
S103: According to the extracted character features, count the number of accounts within the specific time period that share each identical character feature.
In the embodiment of the present application, within the specific time period, the web server extracts character features from the account information according to a specific feature extraction method (e.g., ignoring the values of the digits and recording only their positions), groups identical character features into the same category, counts the number of accounts corresponding to the character feature of each category, and stores the counts on the web server.
For example, assume that in the current time period the server of a shopping website receives the following user account information (account names): dafa123, dasa324, dafa897, dasa898, and that the specific feature extraction method ignores the values of the digits and records only their positions. The character features of these account names are then dafa^^^, dasa^^^, dafa^^^, and dasa^^^. Clearly, dafa123 and dafa897 share the character feature dafa^^^, and dasa324 and dasa898 share the character feature dasa^^^. The server therefore places the character features of dafa123 and dafa897 in a first category and those of dasa324 and dasa898 in a second category. The first category contains 2 accounts and the second category contains 2 accounts, and the server stores both counts on the web server.
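Steps S102 and S103 for this example can be sketched as follows, assuming the character feature is produced by replacing every digit with `^`. The function names are hypothetical.

```python
import re
from collections import Counter


def digit_position_feature(name: str) -> str:
    """Replace each digit with '^', ignoring its value but keeping its position."""
    return re.sub(r"\d", "^", name)


def count_by_feature(names):
    """Group names by character feature and count the accounts in each category."""
    return Counter(digit_position_feature(n) for n in names)


names = ["dafa123", "dasa324", "dafa897", "dasa898"]
print(count_by_feature(names))  # Counter({'dafa^^^': 2, 'dasa^^^': 2})
```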
S104: Judge whether each piece of account information is malicious information according to the pre-established account quantity standard corresponding to each character feature and the counted number of accounts sharing that character feature within the specific time period.
Here, the account quantity standard corresponding to each character feature is a criterion established from the number of accounts corresponding to that character feature, e.g., the mean of that number.
Within the specific time period, the web server counts, according to the specific feature extraction method, the number of accounts corresponding to each category of character feature in the users' account information. For each category, it looks up, among the pre-established account quantity standards (e.g., the mean number of accounts per character feature), the category with the same character feature, retrieves that category's standard, and compares the two.
Continuing the previous example, assume that among the pre-established standards, the standard for the character feature of the first category is 3 and the standard for the second category is 1. The web server counts 2 accounts in the first category, which does not exceed that category's standard, so it lets the account information corresponding to the first category pass (i.e., takes no action on it). It also counts 2 accounts in the second category, which exceeds that category's standard, so it applies risk-control processing (i.e., a behavior warning) to the account information corresponding to the second category.
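The comparison in this example reduces to a per-category threshold test. A minimal sketch, in which `'pass'` and `'risk_control'` are hypothetical labels for the two outcomes described above and a category with no pre-established standard is assumed to pass:

```python
def judge_categories(counts, standards):
    """Label each category 'pass' or 'risk_control' by comparing it with its standard.

    Assumption: a category with no pre-established standard passes.
    """
    return {
        feature: "risk_control" if count > standards.get(feature, float("inf")) else "pass"
        for feature, count in counts.items()
    }


counts = {"dafa^^^": 2, "dasa^^^": 2}     # counted in the current period
standards = {"dafa^^^": 3, "dasa^^^": 1}  # pre-established standards in the example
print(judge_categories(counts, standards))
# {'dafa^^^': 'pass', 'dasa^^^': 'risk_control'}
```

The test is strict (`>`), so a count equal to its standard passes, matching "does not exceed" above.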
Through the above steps, the web server receives the account information of each account to be identified within the specific time period, extracts the corresponding character features with the feature extraction method, groups identical character features into the same category, and counts the number of accounts for each category's character feature within that period. For each category, it finds that category's standard among the account quantity standards of the character features and thereby judges whether the account information is malicious. Therefore, when handling account information provided by users, a network service provider can effectively distinguish normal account information from malicious account information, improving the accuracy of that distinction.
To explain the information monitoring method of the present application more clearly, the following detailed description takes as an example the case where the account information includes an account name and the specific time period is a time span divided according to a preset unit of time.
In practice, users register account information on a shopping website and then perform the operations they need on it. However, the account information a user registers may be malicious. Therefore, after receiving a user's account information, the web server determines the account name in it and extracts character features from each account name according to at least one preset feature extraction method. The preset feature extraction methods cover any combination of the number of characters, the character types, and the character order in the account name.
For example, suppose the account names registered by users on a shopping website include fawd2431, faad783, fawd7972, faad442, and luha8988, and that there are eight feature extraction methods:
Method one: mask all digits in the account name but keep the number of masked digits, where masking means the values of the digits are not recorded;
Method two: mask all digits in the account name without recording how many there are; only identify that the masked part consists of digits;
Method three: mask all letters in the account name but keep the number of masked letters;
Method four: mask all letters in the account name without recording how many there are; only identify that the masked part consists of letters;
Method five: mask all characters in the account name except those at specified positions; do not record the number of digits, but keep the number of masked non-numeric characters (letters and symbols); the characters at the specified positions must be non-numeric;
Method six: mask all characters in the account name except those at specified positions without recording how many characters are masked; only identify whether each masked part is numeric or non-numeric; the characters at the specified positions must be non-numeric;
Method seven: mask every letter combination and every digit combination in the account name; only identify whether each masked part is a digit combination or a letter combination;
Method eight: mask every character combination in the account name without recording how many characters it contains; only identify that the masked part is a character combination, where a character combination is any run of characters other than the separator characters that split the name.
These eight methods are applied in parallel, and each method extracts one corresponding character feature from each account name. Specifically:
with method one, the character features extracted from the above account names are fawd^^^^, faad^^^, fawd^^^^, faad^^^, luha^^^^;
with method two, the extracted character features are fawd^, faad^, fawd^, faad^, luha^;
with method three, the extracted character features are cccc2431, cccc783, cccc7972, cccc442, cccc8988;
with method four, the extracted character features are c2431, c783, c7972, c442, c8988;
with method five, the extracted character features are facc^, facc^, facc^, facc^, lucc^, where the specified positions are the first two (non-numeric) characters;
with method six, the extracted character features are fac^, fac^, fac^, fac^, luc^, where the specified positions are the first two (non-numeric) characters;
with method seven, the extracted character features are c^, c^, c^, c^, c^;
with method eight, the extracted character features are x, x, x, x, x.
Here, "c" denotes the letter identifier, "^" the digit identifier, and "x" the character-combination identifier. The eight feature extraction methods therefore yield 22 distinct character features from the five account names above.
In practice, the account information users register is not limited to the 5 entries in this example; it can reach hundreds, thousands, or even tens of thousands of entries. Five entries are used here only to illustrate how this step is carried out. Likewise, the feature extraction methods are not limited to 8; N extraction methods can be configured as needed.
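The eight masking rules and the identifiers c, ^, and x can be sketched with regular expressions. This is one reading of the method descriptions, not the application's implementation; it assumes ASCII letters, a fixed head of two characters for methods five and six, and `.`, `_`, `-`, `@` as the separator characters for method eight. Applied to the five example names, it reproduces the features listed above and the count of 22 distinct features.

```python
import re

# Assumptions: ASCII letters only; methods five and six keep the first two
# characters; '.', '_', '-' and '@' are the separators for method eight.
KEEP = 2
SEPARATORS = "._-@"


def _split_head(name, keep=KEEP):
    """Split off the specified (kept) positions, which must be non-numeric."""
    head, tail = name[:keep], name[keep:]
    if any(ch.isdigit() for ch in head):
        raise ValueError("specified positions must hold non-numeric characters")
    return head, tail


def m1(name):  # each digit -> '^' (digit count kept)
    return re.sub(r"\d", "^", name)


def m2(name):  # each run of digits -> '^' (count dropped)
    return re.sub(r"\d+", "^", name)


def m3(name):  # each letter -> 'c' (letter count kept)
    return re.sub(r"[A-Za-z]", "c", name)


def m4(name):  # each run of letters -> 'c' (count dropped)
    return re.sub(r"[A-Za-z]+", "c", name)


def m5(name):  # keep head; each other non-digit -> 'c', each digit run -> '^'
    head, tail = _split_head(name)
    return head + re.sub(r"\d+", "^", re.sub(r"\D", "c", tail))


def m6(name):  # keep head; each non-digit run -> 'c', each digit run -> '^'
    head, tail = _split_head(name)
    return head + re.sub(r"\d+", "^", re.sub(r"\D+", "c", tail))


def m7(name):  # each letter run -> 'c', each digit run -> '^'
    return re.sub(r"\d+", "^", re.sub(r"[A-Za-z]+", "c", name))


def m8(name):  # each run of non-separator characters -> 'x'
    return re.sub(f"[^{re.escape(SEPARATORS)}]+", "x", name)


METHODS = [m1, m2, m3, m4, m5, m6, m7, m8]
names = ["fawd2431", "faad783", "fawd7972", "faad442", "luha8988"]
distinct = {method(n) for method in METHODS for n in names}
print(len(distinct))  # 22
```

In methods five to seven, letters are replaced before digit runs, so the inserted `c` (a non-digit) is never mistaken for part of a digit run.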
In the embodiment of the present application, within the specific time period, different character features are extracted from the account names according to the feature extraction methods. The web server groups identical character features into the same category, counts the number of accounts corresponding to the character feature of each category, and stores these counts for the specific time period on the web server.
Continuing the example, the account names users registered on the shopping website are still fawd2431, faad783, fawd7972, faad442, and luha8988, but now only three feature extraction methods are selected: methods one, two, and seven from the example above. With method one, the extracted character features are fawd^^^^, faad^^^, fawd^^^^, faad^^^, luha^^^^; with method two, they are fawd^, faad^, fawd^, faad^, luha^; and with method seven, they are c^, c^, c^, c^, c^. The three methods thus yield 15 character features from the 5 account names. Because some of these features are identical, identical features are grouped into the same category, so the 15 features fall into 7 categories: fawd^^^^, faad^^^, luha^^^^, fawd^, faad^, luha^, and c^. The web server then counts the accounts corresponding to the character feature of each category: fawd^^^^ has 2, faad^^^ has 2, luha^^^^ has 1, fawd^ has 2, faad^ has 2, luha^ has 1, and c^ has 5. It stores these per-category counts on the web server.
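The grouping and counting in this example can be sketched with `collections.Counter`. The three methods are written as regular-expression substitutions (one possible reading of methods one, two, and seven), and `category_counts` is a hypothetical helper name.

```python
import re
from collections import Counter

# Methods one, two and seven as regex substitutions (one possible reading).
METHODS = [
    lambda s: re.sub(r"\d", "^", s),                              # method one
    lambda s: re.sub(r"\d+", "^", s),                             # method two
    lambda s: re.sub(r"\d+", "^", re.sub(r"[A-Za-z]+", "c", s)),  # method seven
]


def category_counts(names, methods=METHODS):
    """Count accounts per character-feature category across all methods."""
    counts = Counter()
    for name in names:
        # Each account contributes once to every distinct category it falls into.
        counts.update({method(name) for method in methods})
    return counts


names = ["fawd2431", "faad783", "fawd7972", "faad442", "luha8988"]
counts = category_counts(names)
print(len(counts), counts["c^"])  # 7 5
```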
The web server counts the number of accounts corresponding to each category of character feature and compares it with the pre-established account quantity standard for that character feature. These standards therefore have to be established in advance.
In the embodiment of the present application, pre-establishing the account quantity standard for each character feature specifically includes the following. In advance, identical character features extracted from the historical account information received in multiple historical time periods are grouped together, where each historical time period has the same length as the specific time period. For each feature category, the number of accounts with that category's character feature is counted in each historical time period. From these per-period counts, the mean and standard deviation of the number of accounts for that category's character feature across all historical time periods are determined, and the category's account quantity standard is then determined from that mean and standard deviation.
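Written out, the statistics in this step are as follows. The notation is introduced here for illustration: $c_{f,h}$ is the number of accounts in feature category $f$ during historical period $h$, and $H$ is the number of historical periods. The standard deviation is the population form (dividing by $H$), which is the form that reproduces the σ ≈ 11 of the four-day example that follows; the threshold $\mu_f + k\sigma_f$ is the form that example assumes, with the anomaly coefficient $k$ chosen by the operator.

```latex
\mu_f = \frac{1}{H}\sum_{h=1}^{H} c_{f,h}, \qquad
\sigma_f = \sqrt{\frac{1}{H}\sum_{h=1}^{H}\bigl(c_{f,h} - \mu_f\bigr)^2}, \qquad
T_f = \mu_f + k\,\sigma_f
```

A category whose count $c_f$ in the specific time period satisfies $c_f > T_f$ is then treated as malicious. For the first category of the example, $\mu = 65$ and $\sigma = \sqrt{125} \approx 11.18$.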
For example, assume 4 historical time periods of one day each. On the first day, 100 pieces of historical account information are received from users (assume that the character features of all 100 historical account names fall into the categories illustrated below). Take the historical account names fawd2431 and faw783 from two of these entries as examples. Using feature extraction methods one and two from above, method one extracts the character features fawd^^^^ and faw^^^, and method two extracts fawd^ and faw^, where "^" is the digit identifier.
The web server takes fawd^^^^ as the first category, faw^^^ as the second category, fawd^ as the third category, and faw^ as the fourth category, and counts, over these 100 account names, the number of accounts with each category's character feature. For the first day, the counts are: 70 for the first category, 30 for the second, 60 for the third, and 40 for the fourth.
Assume that 100 pieces of historical account information are received on the second day. Among the character features extracted by method one, the first category has 60 accounts and the second category 40; among those extracted by method two, the third category has 50 and the fourth category 50.
Assume that 100 pieces of historical account information are received on the third day. Among the character features extracted by method one, the first category has 50 accounts and the second category 50; among those extracted by method two, the third category has 40 and the fourth category 60.
Assume that 100 pieces of historical account information are received on the fourth day. Among the character features extracted by method one, the first category has 80 accounts and the second category 20; among those extracted by method two, the third category has 50 and the fourth category 50.
The web server counts the number of accounts in each category on each of these four days; that is, every character feature corresponds to a certain number of accounts on each day. Modeling the daily counts with a normal distribution gives the mean and standard deviation of the daily account count for each character feature: mean 65 and standard deviation 11 for the first category, mean 35 and standard deviation 11 for the second, mean 55 and standard deviation 11 for the third, and mean 45 and standard deviation 11 for the fourth. In this example, the account quantity standard for each character feature is taken as μ + kσ, where μ is the mean daily account count for the category's character feature, σ is its standard deviation, and k is the anomaly index coefficient, set to 2 in this application example. The account quantity standards are therefore 87 for the first category, 57 for the second, 77 for the third, and 67 for the fourth. Each category's account quantity standard, together with the mean and standard deviation of its account count, is stored in the web server's database as the feature quantity standard.
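As a rough illustration of this baseline step, the sketch below computes each category's mean, standard deviation, and μ + kσ standard from the four daily counts listed above. It assumes the population standard deviation (Python's `statistics.pstdev`); the patent does not say which estimator it uses, but this choice gives σ ≈ 11.18 for the first and second categories, which rounds to the 11 quoted in the text. The function name `build_standard` is hypothetical. Applied to the third- and fourth-category counts as listed (60, 50, 40, 50 and 40, 50, 60, 50), the same arithmetic gives a mean of 50 and σ ≈ 7.07 for both, so the figures quoted for those two categories (55, 45, and 11) do not follow from the listed data; the worked example that follows uses the quoted figures.

```python
from statistics import mean, pstdev

# Daily account counts per category for the four history days in the example.
history = {
    "first": [70, 60, 50, 80],
    "second": [30, 40, 50, 20],
    "third": [60, 50, 40, 50],
    "fourth": [40, 50, 60, 50],
}

def build_standard(daily_counts, k=2.0):
    """Return (mean, population std, mean + k * std) for one category.

    k is the anomaly index coefficient; 2 matches the application example.
    """
    mu = mean(daily_counts)
    sigma = pstdev(daily_counts)  # assumption: population, not sample, std
    return mu, sigma, mu + k * sigma

for category, counts in history.items():
    mu, sigma, limit = build_standard(counts)
    print(f"{category}: mean={mu:.2f} std={sigma:.2f} standard={limit:.2f}")
```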
In the embodiments of this application, within a specific time period the web server follows the steps above: it extracts the character features of the account names in all account information using the specified feature extraction methods, groups identical character features into the same category, counts the accounts corresponding to the character features in each category, and compares these counts with the account quantity standards established above to judge whether each item of account information is malicious. Specifically, for each feature category, the server judges whether the number of accounts corresponding to the character features in that category during the specific time period is greater than the category's account quantity standard. If so, the account information received in that period whose character features fall into the category is malicious; otherwise, it is normal.
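A minimal sketch of the per-category comparison, assuming counts and standards are held in plain dictionaries keyed by category name (a hypothetical representation; `exceeding_categories` is an illustrative name). The comparison is strict, so a count equal to its standard is not flagged, which matches how the third category is treated in the worked example. The usage line combines the standards from the example above with the current-day counts from the example that follows.

```python
def exceeding_categories(counts, standards):
    """Return, in input order, the categories whose count is strictly above their standard."""
    return [c for c, n in counts.items() if n > standards[c]]

standards = {"first": 87, "second": 57, "third": 77, "fourth": 67}
today = {"first": 76, "second": 79, "third": 77, "fourth": 78}
print(exceeding_categories(today, standards))  # ['second', 'fourth']
```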
Before the account information received in the specific time period whose character features fall into a given feature category is determined to be malicious, the following is done for each feature category whose account count exceeds its account quantity standard: compute the difference between the category's account count in the specific time period and the mean account count for the category, then compute the ratio of that difference to the standard deviation of the category's account count. Among the ratios obtained, the feature category with the largest ratio is selected, and the account information corresponding to that category is determined to be malicious.
The items of account information flagged as malicious share identical character features after feature extraction, so they are grouped together. When the number of accounts for that character feature exceeds the account quantity standard, there are more such accounts than the normal accounts that share the same character feature on an ordinary day, which suggests that they may be malicious accounts registered in batches.
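The ratio step can be sketched as follows, assuming the same dictionary layout as before and a nonzero historical standard deviation for every category (the patent does not address σ = 0). `most_anomalous` is a hypothetical name; the ratio (count - mean) / std is a z-score of the period's count against the category's history. The usage values are the means, standard deviations, and current-day counts quoted in the worked example.

```python
def most_anomalous(counts, stats, standards):
    """Among categories whose count exceeds its standard, pick the largest (count - mean) / std.

    counts:    category -> account count in the current period
    stats:     category -> (historical mean, historical std); std assumed > 0
    standards: category -> account quantity standard
    Returns (category or None, {category: ratio}).
    """
    ratios = {c: (n - stats[c][0]) / stats[c][1]
              for c, n in counts.items() if n > standards[c]}
    if not ratios:
        return None, {}
    return max(ratios, key=ratios.get), ratios

# Figures quoted in the worked example (means, standard deviations, current counts).
stats = {"first": (65, 11), "second": (35, 11), "third": (55, 11), "fourth": (45, 11)}
standards = {c: mu + 2 * sd for c, (mu, sd) in stats.items()}  # k = 2
today = {"first": 76, "second": 79, "third": 77, "fourth": 78}
print(most_anomalous(today, stats, standards))  # ('second', {'second': 4.0, 'fourth': 3.0})
```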
Continuing the example above, suppose the web server receives 155 account names registered by users on the current day. By feature extraction method one, 76 accounts correspond to the first-category character feature and 79 to the second; by feature extraction method two, 77 accounts correspond to the third-category character feature and 78 to the fourth. Comparing these with the account quantity standards established above: the first-category count (76) is below its standard (87), the second-category count (79) is above its standard (57), the third-category count (77) equals its standard (77), and the fourth-category count (78) is above its standard (67).
Thus, of the four categories, the first- and third-category counts do not exceed their account quantity standards, while the second- and fourth-category counts do. For these two categories, the difference between the second-category count and the historical second-category mean is 79 - 35 = 44, and the difference between the fourth-category count and the historical fourth-category mean is 78 - 45 = 33. Dividing each difference by the corresponding historical standard deviation gives a ratio of 44 / 11 = 4 for the second category and 33 / 11 = 3 for the fourth. The second category has the larger ratio, so the account information corresponding to the second-category character feature is determined to be malicious.
In this example, the account information corresponding to the second category is judged malicious and is processed accordingly: the malicious information corresponding to the second category, 79 accounts in total, is removed. The server also counts how many of these malicious accounts fall into each other category and removes them from that category. The third category's character feature corresponds to 77 accounts, 35 of which are malicious, and the fourth category's corresponds to 78 accounts, 44 of which are malicious (35 + 44 = 79). After the malicious accounts are removed, the third category corresponds to 42 accounts and the fourth to 34. The server then recomputes whether the first-, third-, and fourth-category counts exceed their account quantity standards, repeating the process until no category's count exceeds its standard; the account information corresponding to the remaining character features is all normal.
In practice, the web server can store the account count for each category's character feature in each specific time period. If a category's account count exceeds its account quantity standard for several consecutive days (for example, the second-category count exceeds the second-category standard for three consecutive days), the web server can issue an early warning for the account information corresponding to that category.
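The remove-and-recount loop can be sketched as below. The per-account overlap between the two extraction methods is derived from the example's numbers (the 79 second-category accounts split into 35 third and 44 fourth, which leaves 42 third and 34 fourth among the 76 first-category accounts); representing each account as a set of categories, and the names `iterative_removal` and `acct{i}`, are assumptions for illustration. Historical means and standard deviations are the quoted figures, with standard deviations assumed nonzero.

```python
from collections import Counter

def iterative_removal(accounts, stats, standards):
    """Flag and remove the most anomalous over-standard category until none remain.

    accounts:  account id -> frozenset of categories (one per extraction method)
    stats:     category -> (historical mean, historical std), std assumed > 0
    standards: category -> account quantity standard
    Returns (flagged, final_counts): flagged maps each removed category to the
    ids removed in that round; final_counts are the counts that remain.
    """
    remaining = dict(accounts)
    flagged = {}
    while True:
        counts = Counter(c for cats in remaining.values() for c in cats)
        ratios = {c: (n - stats[c][0]) / stats[c][1]
                  for c, n in counts.items() if n > standards[c]}
        if not ratios:
            return flagged, counts
        worst = max(ratios, key=ratios.get)
        removed = [a for a, cats in remaining.items() if worst in cats]
        flagged[worst] = removed
        for a in removed:
            del remaining[a]

# Synthetic accounts reconstructed from the example counts.
groups = [(frozenset({"first", "third"}), 42), (frozenset({"first", "fourth"}), 34),
          (frozenset({"second", "third"}), 35), (frozenset({"second", "fourth"}), 44)]
accounts = {}
for cats, n in groups:
    for _ in range(n):
        accounts[f"acct{len(accounts)}"] = cats

stats = {"first": (65, 11), "second": (35, 11), "third": (55, 11), "fourth": (45, 11)}
standards = {c: mu + 2 * sd for c, (mu, sd) in stats.items()}

flagged, final = iterative_removal(accounts, stats, standards)
print(list(flagged), len(flagged["second"]))  # ['second'] 79
print(final["first"], final["third"], final["fourth"])  # 76 42 34
```

Each round removes at least one account, so the loop always terminates.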
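A small sketch of the consecutive-day early warning, assuming the check looks at the most recent days of one category's stored daily counts. The three-day window comes from the example above; the function name `needs_warning`, its `days` parameter, and the sample histories are hypothetical.

```python
def needs_warning(daily_counts, standard, days=3):
    """Return True if each of the most recent `days` daily counts is above `standard`.

    daily_counts: chronological list of one category's per-day account counts.
    """
    if days < 1:
        raise ValueError("days must be at least 1")
    recent = daily_counts[-days:]
    return len(recent) == days and all(n > standard for n in recent)

# Hypothetical second-category histories checked against its standard of 57.
print(needs_warning([50, 60, 61, 70], standard=57))  # True
print(needs_warning([60, 61, 50], standard=57))      # False
```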
The above describes the information monitoring method provided by the embodiments of this application. Based on the same idea, the embodiments of this application also provide an information monitoring device.
As shown in Fig. 2, the information monitoring device provided by the embodiments of this application includes:
a receiving module 201, configured to determine the account information of each account to be identified that is received within a specific time period;
an extraction module 202, configured to extract the character features from each item of received account information;
a statistics module 203, configured to count, according to the extracted character features, the number of accounts having identical character features within the specific time period; and
a judging module 204, configured to judge whether each item of account information is malicious according to the pre-established account quantity standard for each character feature and the counted number of accounts having identical character features within the specific time period.
In the embodiments of this application, the account information includes an account name, and the specific time period is a time span divided according to a preset unit of time.
The extraction module 202 is specifically configured to determine the account name in each item of account information and to extract character features from each account name using at least one preset feature extraction method.
The statistics module 203 is specifically configured to group identical character features among the character features extracted within the specific time period, and to count the number of accounts corresponding to the character features in each feature category.
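The extraction and statistics steps (modules 202 and 203) can be sketched as follows. The patent's two feature extraction methods are described earlier in the document and are not reproduced here; purely as an assumption for illustration, `method_one` replaces every digit with `^` and `method_two` collapses each run of digits into a single `^`, so that a name such as `fawd1234` maps to `fawd^^^^` and `fawd^`. All function names and sample account names are hypothetical.

```python
import re
from collections import Counter

# Hypothetical stand-ins for the patent's feature extraction methods one and two.
def method_one(name: str) -> str:
    """Replace every digit with '^' (e.g. 'fawd1234' -> 'fawd^^^^')."""
    return re.sub(r"\d", "^", name)

def method_two(name: str) -> str:
    """Collapse each run of digits into one '^' (e.g. 'fawd1234' -> 'fawd^')."""
    return re.sub(r"\d+", "^", name)

def count_by_feature(names, extractors):
    """Group identical features per extraction method and count accounts in each group.

    Keys are (method name, feature) pairs, so an account is counted at most once
    per method even if two methods happen to produce the same string.
    """
    counts = Counter()
    for name in names:
        for extract in extractors:
            counts[(extract.__name__, extract(name))] += 1
    return counts

names = ["fawd1234", "fawd5678", "faw123", "faw999"]
for (method, feature), n in count_by_feature(names, [method_one, method_two]).items():
    print(method, feature, n)
```

Keying by method keeps, for example, `fawd1` from being counted twice under the single feature `fawd^`.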
The device further includes:
a pre-building module 205, specifically configured to: group, based on the character features in historical account information received in multiple historical time periods, identical character features within each historical time period, where each historical time period has the same time span as the specific time period; for each feature category, count the number of accounts corresponding to the category's historical character features in each historical time period; determine, from these counts, the mean and standard deviation of the category's account count over all historical time periods; and determine the category's account quantity standard from that mean and standard deviation.
The judging module 204 is specifically configured to, for each feature category, judge whether the number of accounts corresponding to the character features in the category within the specific time period is greater than the category's account quantity standard; if so, the account information received in the specific time period whose character features fall into the category is malicious; otherwise, it is normal.
The judging module 204 is further specifically configured to, for each feature category, determine the difference between the number of accounts corresponding to the character features in the category within the specific time period and the mean account count for the category, determine the ratio of that difference to the standard deviation of the category's account count, and select, among the ratios determined, the feature category with the largest ratio.
The device further includes:
a processing module 206, specifically configured to, after the largest ratio among the feature categories is determined, remove the account information corresponding to the largest ratio, recount the number of accounts corresponding to the character features in each feature category, and compare the counts with the pre-established account quantity standards, repeating until the account count for every feature category is below its account quantity standard.
In a typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, a network interface, and memory.
Memory may include forms of computer-readable media such as volatile memory, random access memory (RAM), and/or non-volatile memory, such as read-only memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.
Computer-readable media include permanent and non-permanent, removable and non-removable media, and may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technologies, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape or disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory media, such as modulated data signals and carrier waves.
It should also be noted that the terms "include", "comprise", and any other variants thereof are intended to cover a non-exclusive inclusion, so that a process, method, article, or device that includes a series of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article, or device. Without further limitation, an element defined by the phrase "including a ..." does not exclude the presence of other identical elements in the process, method, article, or device that includes that element.
Those skilled in the art will understand that the embodiments of this application may be provided as a method, a system, or a computer program product. Accordingly, this application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Moreover, this application may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, and optical storage) containing computer-usable program code.
The foregoing are merely embodiments of this application and are not intended to limit it. Various modifications and variations of this application will be apparent to those skilled in the art. Any modification, equivalent substitution, improvement, or the like made within the spirit and principles of this application shall fall within the scope of the claims of this application.

Claims (16)

1. An information monitoring method, characterized by comprising:
determining the account information of each account to be identified that is received within a specific time period;
extracting character features from each item of received account information;
counting, according to the extracted character features, the number of accounts having identical character features within the specific time period; and
judging whether each item of account information is malicious according to the pre-established account quantity standard for each character feature and the counted number of accounts having identical character features within the specific time period.
2. The method of claim 1, characterized in that the account information includes an account name, and the specific time period is a time span divided according to a preset unit of time.
3. The method of claim 2, characterized in that extracting character features from each item of received account information specifically comprises:
determining the account name in each item of account information; and
extracting character features from each account name using at least one preset feature extraction method.
4. The method of claim 1, characterized in that counting the number of accounts having identical character features within the specific time period specifically comprises:
grouping identical character features among the character features extracted within the specific time period; and
counting the number of accounts corresponding to the character features in each feature category.
5. The method of claim 4, characterized in that pre-establishing the account quantity standard for each character feature specifically comprises:
grouping, based on the character features in historical account information received in multiple historical time periods, identical character features within each historical time period, wherein each historical time period has the same time span as the specific time period;
for each feature category, counting the number of accounts corresponding to the category's historical character features in each historical time period;
determining, from the counted numbers of accounts for the category in each historical time period, the mean and standard deviation of the category's account count over all historical time periods; and
determining the category's account quantity standard from the mean and standard deviation of its account count.
6. The method of claim 5, characterized in that judging whether each item of account information is malicious specifically comprises:
for each feature category, judging whether the number of accounts corresponding to the character features in the category within the specific time period is greater than the category's account quantity standard;
if so, determining that the account information received in the specific time period whose character features fall into the category is malicious; and
otherwise, determining that the account information received in the specific time period whose character features fall into the category is normal.
7. The method of claim 6, characterized in that, before determining that the account information received in the specific time period whose character features fall into the category is malicious, the method further comprises:
for each feature category, determining the difference between the number of accounts corresponding to the character features in the category within the specific time period and the mean account count for the category;
determining the ratio of the difference to the standard deviation of the category's account count; and
selecting, among the ratios determined, the feature category with the largest ratio.
8. The method of claim 7, characterized in that the method further comprises: after the largest ratio among the feature categories is determined, removing the account information corresponding to the largest ratio, recounting the number of accounts corresponding to the character features in each feature category, and comparing the counts with the pre-established account quantity standards, until the account count for every feature category is below its account quantity standard.
9. An information monitoring device, characterized by comprising:
a receiving module, configured to determine the account information of each account to be identified that is received within a specific time period;
an extraction module, configured to extract character features from each item of received account information;
a statistics module, configured to count, according to the extracted character features, the number of accounts having identical character features within the specific time period; and
a judging module, configured to judge whether each item of account information is malicious according to the pre-established account quantity standard for each character feature and the counted number of accounts having identical character features within the specific time period.
10. The device of claim 9, characterized in that the account information includes an account name, and the specific time period is a time span divided according to a preset unit of time.
11. The device of claim 10, characterized in that the extraction module is specifically configured to determine the account name in each item of account information and to extract character features from each account name using at least one preset feature extraction method.
12. The device of claim 9, characterized in that the statistics module is specifically configured to group identical character features among the character features extracted within the specific time period, and to count the number of accounts corresponding to the character features in each feature category.
13. The device of claim 12, characterized in that the device further comprises:
a pre-building module, specifically configured to: group, based on the character features in historical account information received in multiple historical time periods, identical character features within each historical time period, wherein each historical time period has the same time span as the specific time period; for each feature category, count the number of accounts corresponding to the category's historical character features in each historical time period; determine, from these counts, the mean and standard deviation of the category's account count over all historical time periods; and determine the category's account quantity standard from that mean and standard deviation.
14. The device of claim 13, characterized in that the judging module is specifically configured to, for each feature category, judge whether the number of accounts corresponding to the character features in the category within the specific time period is greater than the category's account quantity standard; if so, the account information received in the specific time period whose character features fall into the category is malicious; otherwise, it is normal.
15. The device of claim 14, characterized in that the judging module is specifically configured to, for each feature category, determine the difference between the number of accounts corresponding to the character features in the category within the specific time period and the mean account count for the category, determine the ratio of the difference to the standard deviation of the category's account count, and select, among the ratios determined, the feature category with the largest ratio.
16. The device of claim 15, characterized in that the device further comprises:
a processing module, specifically configured to, after the largest ratio among the feature categories is determined, remove the account information corresponding to the largest ratio, recount the number of accounts corresponding to the character features in each feature category, and compare the counts with the pre-established account quantity standards, until the account count for every feature category is below its account quantity standard.
CN201510518846.9A 2015-08-21 2015-08-21 A kind of information monitoring method and device Pending CN106469179A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510518846.9A CN106469179A (en) 2015-08-21 2015-08-21 A kind of information monitoring method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201510518846.9A CN106469179A (en) 2015-08-21 2015-08-21 A kind of information monitoring method and device

Publications (1)

Publication Number Publication Date
CN106469179A (en) 2017-03-01

Family

ID=58229738

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510518846.9A Pending CN106469179A (en) 2015-08-21 2015-08-21 A kind of information monitoring method and device

Country Status (1)

Country Link
CN (1) CN106469179A (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102081774A (en) * 2009-11-26 2011-06-01 中国移动通信集团广东有限公司 Card-raising identification method and system
CN102402517A (en) * 2010-09-09 2012-04-04 北京启明星辰信息技术股份有限公司 Method and system for establishing normal database login model and method and system for detecting abnormal login behavior
CN103377319A (en) * 2012-04-13 2013-10-30 索尼公司 System and method used for detecting users of piracy
CN103905532A (en) * 2014-03-13 2014-07-02 微梦创科网络科技(中国)有限公司 Microblog marketing account recognition method and system
CN104572765A (en) * 2013-10-25 2015-04-29 西安群丰电子信息科技有限公司 Method and system for finding vest account based on behavior analysis of user account
CN104715007A (en) * 2014-12-26 2015-06-17 小米科技有限责任公司 User identification method and device

Cited By (2)

Publication number Priority date Publication date Assignee Title
CN110825924A (en) * 2019-11-01 2020-02-21 深圳市前海随手数据服务有限公司 Data detection method, device and storage medium
CN110825924B (en) * 2019-11-01 2022-12-06 深圳市卡牛科技有限公司 Data detection method, device and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication (application publication date: 20170301)