CN108092963A - Web page identification method, device, computer equipment and storage medium - Google Patents
- Publication number
- CN108092963A, CN201711297266.7A
- Authority
- CN
- China
- Prior art keywords
- domain name
- identified
- data
- webpage
- website
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/951—Indexing; Web crawling techniques
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L61/00—Network arrangements, protocols or services for addressing or naming
- H04L61/45—Network directories; Name-to-address mapping
- H04L61/4505—Network directories; Name-to-address mapping using standardised directories; using standardised directory access protocols
- H04L61/4511—Network directories; Name-to-address mapping using standardised directories; using standardised directory access protocols using domain name system [DNS]
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L63/00—Network architectures or network communication protocols for network security
- H04L63/14—Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
- H04L63/1441—Countermeasures against malicious traffic
- H04L63/1483—Countermeasures against malicious traffic service impersonation, e.g. phishing, pharming or web spoofing
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L63/00—Network architectures or network communication protocols for network security
- H04L63/16—Implementing security features at a particular protocol layer
- H04L63/168—Implementing security features at a particular protocol layer above the transport layer
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L67/00—Network arrangements or protocols for supporting network services or applications
- H04L67/01—Protocols
- H04L67/02—Protocols based on web technology, e.g. hypertext transfer protocol [HTTP]
Abstract
The present invention relates to a web page identification method, apparatus, computer device and storage medium. The method includes: obtaining a web page whose identified risk level is greater than a preset level, and extracting the website domain name corresponding to the web page; obtaining the network address corresponding to the website according to the website domain name; searching for domain names associated with the network address and, when a domain name associated with the network address is found, taking the associated domain name as a domain name to be identified; obtaining web page data from the website corresponding to the domain name to be identified; and obtaining, according to the acquired web page data, the web pages corresponding to the domain name to be identified whose risk level is greater than the preset level. With the above web page identification method, apparatus, computer device and storage medium, one web page whose risk level is greater than the preset level can be used to find multiple associated web pages whose risk level is greater than the preset level, so query efficiency is high.
Description
Technical field
The present invention relates to the field of network security, and in particular to a web page identification method, apparatus, computer device and storage medium.
Background
With the development of Internet technology, more and more human activities take place online, such as conducting transactions or handling banking business over the network. As a result, websites that disguise themselves as banks have appeared. When a user visits and uses such a website, private information submitted by the user, such as bank account numbers and passwords, can be stolen. If such threatening websites are not discovered in time, they threaten the user's property and harm the user's interests.
Traditionally, because a large number of web pages are generated every day, target web pages that may pose a threat must first be selected from the large number of web pages generated on the Internet, and the selected target web pages must then undergo cumbersome analysis. Identifying whether the risk level of a target web page is greater than a preset level is therefore inefficient.
Summary of the invention
In view of this, it is necessary to provide a web page identification method, apparatus, computer device and storage medium that address the low efficiency of identifying whether the risk level of a target web page is greater than a preset level.
A web page identification method, including:
obtaining a web page whose identified risk level is greater than a preset level, and extracting the website domain name corresponding to the web page;
obtaining the network address corresponding to the website according to the website domain name;
searching for domain names associated with the network address and, when a domain name associated with the network address is found, taking the associated domain name as a domain name to be identified;
obtaining web page data from the website corresponding to the domain name to be identified;
obtaining, according to the acquired web page data, the web pages corresponding to the domain name to be identified whose risk level is greater than the preset level.
In one of the embodiments, the step of searching for domain names associated with the network address includes:
matching the network address against the network addresses pre-stored in an address association library;
when the network address matches a network address pre-stored in the address association library, obtaining the to-be-matched associated domain names associated with the pre-stored network address;
obtaining the validity deadline of each to-be-matched associated domain name;
if the current time is earlier than or equal to the validity deadline, extracting the to-be-matched associated domain name as a domain name to be identified.
In one of the embodiments, the method further includes:
when no domain name associated with the network address is found, obtaining the registration data corresponding to the domain name of the website, and querying the corresponding domain names according to the registration data as domain names to be identified.
In one of the embodiments, the step of obtaining the registration data corresponding to the domain name of the website and querying the corresponding domain names according to the registration data as domain names to be identified includes:
obtaining the registration data corresponding to the domain name of the website, and selecting the conversion logic corresponding to the registration data from a conversion logic library;
converting the registration data according to the conversion logic to obtain converted registration data;
matching the converted registration data against the information data stored in an information repository;
when the converted registration data matches information data stored in the information repository, obtaining the domain names associated with the matched information data as domain names to be identified.
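The conversion-and-match flow in the steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the field names (`email`, `phone`), the two normalization rules, the sample repository contents and the helper `domains_for_registration` are all hypothetical.

```python
import re
from typing import Callable, Dict, List, Tuple

# Hypothetical conversion logic library: one normalization rule per field type.
CONVERSION_LOGIC_LIBRARY: Dict[str, Callable[[str], str]] = {
    "email": lambda value: value.strip().lower(),
    "phone": lambda value: re.sub(r"\D", "", value),  # keep digits only
}

# Hypothetical information repository: normalized registration data -> domains.
INFORMATION_REPOSITORY: Dict[Tuple[str, str], List[str]] = {
    ("email", "owner@mail.example"): ["prize-claim.example"],
    ("phone", "15550100199"): ["points-mall.example"],
}

def domains_for_registration(field: str, raw_value: str) -> List[str]:
    """Convert one registration field and return the domains it matches."""
    convert = CONVERSION_LOGIC_LIBRARY.get(field)
    if convert is None:
        return []  # no conversion logic known for this kind of data
    key = (field, convert(raw_value))
    return list(INFORMATION_REPOSITORY.get(key, []))
```

Normalizing before matching means that differently formatted copies of the same registration detail still hit the same repository entry.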
In one of the embodiments, the step of obtaining, according to the acquired web page data, the web pages corresponding to the domain name to be identified whose risk level is greater than the preset level includes:
matching the web page data against first filter data stored in a preset blacklist and, when the web page data matches the first filter data, adding a suspicious label to the domain name to be identified;
matching the web page data of the website corresponding to the domain name to be identified that carries the suspicious label against second filter data stored in a preset whitelist;
when the web page data does not match the second filter data, extracting the domain name to be identified that carries the suspicious label, and taking the web pages of the website corresponding to that domain name as web pages whose risk level is greater than the preset level.
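The two-stage blacklist and whitelist check above can be sketched as follows. The text does not say what form the filter data takes, so this sketch assumes plain case-insensitive substrings; the function names are hypothetical.

```python
from typing import Iterable

def contains_any(text: str, terms: Iterable[str]) -> bool:
    """True if any filter term occurs in the text (case-insensitive)."""
    lowered = text.lower()
    return any(term.lower() in lowered for term in terms)

def is_high_risk(page_text: str,
                 blacklist: Iterable[str],
                 whitelist: Iterable[str]) -> bool:
    # Stage 1: a blacklist hit earns the domain a suspicious label.
    suspicious = contains_any(page_text, blacklist)
    # Stage 2: a labeled domain stays flagged only if the whitelist does not match.
    return suspicious and not contains_any(page_text, whitelist)
```

Running the whitelist only on labeled domains keeps the second comparison off the pages that the blacklist never flagged.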
In one of the embodiments, the method further includes:
when, after data identification using the preset blacklist and the preset whitelist, no domain name to be identified carries the suspicious label, obtaining the identifier corresponding to the domain name to be identified;
matching the identifier against the security identifiers pre-stored in a security identifier repository;
when a security identifier matches the identifier corresponding to the domain name to be identified, obtaining the secure domain name associated with the matched security identifier stored in the security identifier repository, and matching the secure domain name against the domain name to be identified;
when the secure domain name does not match the domain name to be identified, taking the web pages of the website corresponding to the domain name to be identified as web pages whose risk level is greater than the preset level.
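The identifier check above can be sketched as a dictionary lookup. The text does not say what kind of identifier is used, so the `ID-0001` value, the repository contents and the helper name are hypothetical.

```python
from typing import Dict

# Hypothetical security identifier repository: identifier -> secure domain name.
SECURITY_IDENTIFIER_REPOSITORY: Dict[str, str] = {
    "ID-0001": "bank.example",
}

def reuses_secure_identifier(domain: str, identifier: str,
                             repository: Dict[str, str]) -> bool:
    """True if the identifier belongs to a secure site under a different domain."""
    secure_domain = repository.get(identifier)
    if secure_domain is None:
        return False  # identifier unknown: this check draws no conclusion
    return secure_domain != domain
```

A site that presents a known security identifier while being served from another domain is the case this check flags.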
In one of the embodiments, after the step of obtaining, according to the acquired web page data, the web pages corresponding to the domain name to be identified whose risk level is greater than the preset level, the method further includes:
extracting keywords from the web page data of the web pages whose risk level is greater than the preset level, and adding, according to the keywords, a corresponding category label to the domain name to be identified that corresponds to those web pages;
matching the category label of the domain name to be identified whose risk level is greater than the preset level against the stored category labels;
when there is no match, adding the category label of that domain name to be identified, and storing the web pages whose risk level is greater than the preset level under the category label.
A web page identification apparatus, including:
a first acquisition module, configured to obtain a web page whose identified risk level is greater than a preset level, and to extract the website domain name corresponding to the web page;
a second acquisition module, configured to obtain the network address corresponding to the website according to the website domain name;
a search module, configured to search for domain names associated with the network address and, when a domain name associated with the network address is found, to take the associated domain name as a domain name to be identified;
a third acquisition module, configured to obtain web page data from the website corresponding to the domain name to be identified;
an identification module, configured to obtain, according to the acquired web page data, the web pages corresponding to the domain name to be identified whose risk level is greater than the preset level.
A computer device, including a memory, a processor, and a computer program stored on the memory and executable on the processor, where the processor implements the steps of the above method when executing the computer program.
A storage medium storing a computer program that, when executed by a processor, implements the steps of the above method.
The above web page identification method, apparatus, computer device and storage medium obtain a web page whose identified risk level is greater than a preset level, obtain the domain name of the website to which the web page belongs, obtain the network address of the website from that domain name, and then search for domain names associated with the network address as domain names to be identified. When a domain name to be identified is found, the web page data of the corresponding website is obtained and examined to find web pages whose risk level is greater than the preset level. One web page whose risk level is greater than the preset level can thus be used to find multiple associated web pages whose risk level is greater than the preset level, so query efficiency is high.
Brief description of the drawings
Fig. 1 is an application scenario diagram of the web page identification method in an embodiment;
Fig. 2 is a flowchart of the web page identification method in an embodiment;
Fig. 3 is a structural diagram of the web page identification apparatus in an embodiment;
Fig. 4 is a structural diagram of the computer device in an embodiment.
Detailed description
To make the purpose, technical solution and advantages of the present invention clearer, the invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described here serve only to explain the invention and do not limit it.
Before the embodiments of the invention are described in detail, it should be noted that the described embodiments consist mainly of combinations of steps and apparatus components related to the web page identification method, apparatus, computer device and storage medium. Accordingly, the apparatus components and method steps are represented in the drawings by conventional symbols where appropriate, and only the details relevant to understanding the embodiments of the invention are shown, so that the disclosure is not obscured by details that are readily apparent to those of ordinary skill in the art having the benefit of the invention.
In this document, relational terms such as left and right, up and down, front and back, and first and second are used only to distinguish one entity or action from another, and do not necessarily require or imply any actual relationship or order between such entities or actions. The terms "include", "comprise" or any other variant are intended to cover a non-exclusive inclusion, so that a process, method, article or device that includes a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, method, article or device.
Referring to Fig. 1, which is an application scenario diagram of the web page identification method in an embodiment, the scenario includes a web page identification platform and a server. The platform obtains from the server a stored web page whose identified risk level is greater than the preset level, obtains the web page address of that page, and extracts from the web page address the website domain name corresponding to the page. The platform obtains the network address corresponding to the website according to the website domain name and then, using the network address, searches the address association library stored in the platform for domain names associated with that address. When a domain name associated with the network address is found, the associated domain name is taken as a domain name to be identified. The platform obtains the web page data of the web pages contained in the website corresponding to the domain name to be identified and, according to that data, obtains the web pages corresponding to the domain name to be identified whose risk level is greater than the preset level.
Referring to Fig. 2, in one of the embodiments a flowchart of a web page identification method is provided. This embodiment is illustrated by applying the method to the web page identification platform of Fig. 1 above. A web page identification program runs on the platform, and web page identification is carried out by that program. The method includes the following steps:
S202: Obtain a web page whose identified risk level is greater than a preset level, and extract the website domain name corresponding to the web page.
Specifically, the risk level is a safety index used to evaluate whether a web page is safe; risk levels may be preset grades for evaluating web page safety. For example, risk levels may be ordered from low to high, where a higher risk level indicates that the corresponding web page carries a higher risk; with risk levels set from level 1 to level 5, the risk of the corresponding web page increases from level 1 to level 5. The website domain name is the identifier of the website concerned, and there may be multiple web pages under the same website domain name. For example, the website domain name of the website "Baidu" is "baidu.com", and there are multiple web pages under that domain name, such as the "Baidu Baike" page. A risk database is provided in the server and stores the web pages whose risk level is greater than the preset level; such web pages are high-risk web pages. The web page identification platform obtains from the server a web page whose identified risk level is greater than the preset level. Once such a web page has been obtained, the platform obtains the web page address of that page and extracts the website domain name from the web page address. Note that the web page address is the unique identifier that each web page has on the network; it may be a URL (Uniform Resource Locator). The risk database is a database that stores web pages whose risk level is greater than the preset value.
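Extracting the website domain name from a web page address can be sketched with Python's standard `urllib.parse`. This is an illustration only: the helper name and the sample URL are hypothetical, and the sketch returns the full host name. Reducing a host such as `news.bad-bank.example` to its registrable domain would need a public suffix list, which is not attempted here.

```python
from urllib.parse import urlsplit

def extract_site_host(page_url: str) -> str:
    """Return the lowercase host name contained in a web page address."""
    host = urlsplit(page_url).hostname
    if host is None:
        # Happens for addresses without a scheme, e.g. "bad-bank.example/login".
        raise ValueError(f"no host found in web page address: {page_url!r}")
    return host

# Hypothetical flagged page taken from the risk database.
flagged_page = "https://news.bad-bank.example/login?step=1"
site_host = extract_site_host(flagged_page)
```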
S204: Obtain the network address corresponding to the website according to the website domain name.
Specifically, a network address is a communication identifier used when computers are interconnected or communicate over a network. It may be the network address of a computer within a given network: the address uniquely identifies the computer device in the network, and the device may use it as its communication identifier when communicating with other computers. For example, the network address may be an IP (Internet Protocol) address, and different website domain names have corresponding network addresses. Further, the web page identification platform looks up the network address corresponding to the website according to the website domain name. One option is for the platform to send test data to the website server corresponding to the website, using the obtained website domain name; when the website server returns response data, the platform extracts the corresponding network address from the response data received from the website server.
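The description obtains the address from a server response to test data. The sketch below swaps in a different, simpler technique, an ordinary DNS lookup through Python's `socket.gethostbyname`, to show the same domain-to-address step. The helper name and the injectable `resolver` parameter are assumptions added so the function can be exercised without network access.

```python
import socket
from typing import Callable, Optional

def resolve_site_address(domain: str,
                         resolver: Callable[[str], str] = socket.gethostbyname
                         ) -> Optional[str]:
    """Return an IPv4 address for the domain, or None if resolution fails."""
    try:
        return resolver(domain)
    except OSError:
        # socket.gaierror (name resolution failure) is a subclass of OSError.
        return None
```

In a test or offline run, a stand-in such as `resolver=lambda name: "203.0.113.7"` can replace the real lookup.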
S206: Search for domain names associated with the network address; when a domain name associated with the network address is found, take the associated domain name as a domain name to be identified.
Specifically, associated domain names are domain names that can share the same network address. When the websites corresponding to different website domain names are stored on the same website server, they can share the same network address; the websites corresponding to the different website domain names have different access ports on the website server, and the websites are distinguished by those access ports. Further, different network addresses and their corresponding website domain names are pre-stored in the web page identification platform. Using the obtained network address, the platform queries the domain names associated with that address; an associated domain name differs from the website domain name already identified as having a risk level greater than the preset level. When a domain name is found that is associated with the network address of the website whose risk level is greater than the preset level, that associated domain name is taken as a domain name to be identified.
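The lookup in S206 amounts to a reverse mapping from address to domain names, with the already-identified domain excluded. A minimal sketch follows; the in-memory dictionary stands in for the pre-stored address data, and its contents and the helper name are hypothetical.

```python
from typing import Dict, List, Set

# Hypothetical pre-stored data: network address -> domain names seen on it.
ADDRESS_TO_DOMAINS: Dict[str, Set[str]] = {
    "203.0.113.7": {"bad-bank.example", "prize-claim.example", "points-mall.example"},
    "198.51.100.4": {"unrelated.example"},
}

def find_domains_to_identify(address: str, known_domain: str,
                             library: Dict[str, Set[str]]) -> List[str]:
    """Return domains sharing the address, excluding the one already flagged."""
    associated = library.get(address, set())
    return sorted(domain for domain in associated if domain != known_domain)
```

An empty result corresponds to the case where no associated domain name is found.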
S208: Obtain web page data from the website corresponding to the domain name to be identified.
Specifically, web page data is the content displayed on a web page; it may be text data, image data, numeric data and so on. A website may contain different web pages. Once the associated domain name found through the network address of the website whose identified risk level is greater than the preset level has been taken as the domain name to be identified, the platform locates the website corresponding to the domain name to be identified and obtains the web page data of the different web pages contained in that website, for example the text data displayed on those pages.
S210: Obtain, according to the acquired web page data, the web pages corresponding to the domain name to be identified whose risk level is greater than the preset level.
Specifically, the platform examines the acquired web page data; when suspicious data is present in the web page data, the web page containing that data is taken as a web page whose risk level is greater than the preset level. For example, the platform may take the text data in the acquired web page data and examine it character by character; when suspicious text data is recognized, the web page containing that text data is a web page, corresponding to the domain name to be identified, whose risk level is greater than the preset level. Note that suspicious data may be preset data: when a web page contains the preset data, the web page is one whose risk level is greater than the preset level. Suspicious data may be text data, image data, numeric data and so on; for example, the suspicious data may be set to the words "bank", "points" or "prize".
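The text check in S210 can be sketched as a scan for preset suspicious words, here using the example words "bank" and "prize" from the description. The sample pages and the helper name are hypothetical, and plain substring matching is an assumption; it is deliberately naive and would also match a word such as "riverbank".

```python
from typing import Dict, Iterable, List

SUSPICIOUS_WORDS = ("bank", "prize")  # preset suspicious text data

def pages_with_suspicious_text(pages: Dict[str, str],
                               words: Iterable[str] = SUSPICIOUS_WORDS) -> List[str]:
    """Return the addresses of pages whose text contains any suspicious word."""
    word_list = [word.lower() for word in words]
    flagged = []
    for address, text in pages.items():
        lowered = text.lower()
        if any(word in lowered for word in word_list):
            flagged.append(address)
    return flagged

# Hypothetical web page data obtained from a domain to be identified.
sample_pages = {
    "http://prize-claim.example/": "Claim your PRIZE today",
    "http://prize-claim.example/about": "Contact us",
}
```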
In this embodiment, the web page identification platform uses one web page whose identified risk level is greater than the preset level to find other associated domain names, and queries the web page data of the websites corresponding to those associated domain names to obtain other web pages whose risk level is greater than the preset level. One such web page can thus lead to other, different web pages whose risk level is greater than the preset level, which improves query efficiency.
In one of the embodiments, step S206, the step of searching for domain names associated with the network address, may include the following flow:
Match the network address against the network addresses pre-stored in the address association library. Specifically, the address association library is a database storing different network addresses and the domain names corresponding to them. The platform obtains the web page whose risk level is greater than the preset level and its web page address, extracts the website domain name corresponding to the page from the web page address, and obtains, according to the website domain name, the network address of the website whose risk level is greater than the preset level. It then matches that network address one by one against all the network addresses pre-stored in the address association library, traversing every network address stored there.
When the network address matches a network address pre-stored in the address association library, obtain the to-be-matched associated domain names associated with the pre-stored network address. Specifically, a to-be-matched associated domain name is a domain name associated with a network address pre-stored in the address association library. The domain name may be the identifier of the website concerned, and once a network address has been found in the address association library, the to-be-matched associated domain names corresponding to that address can be obtained by association. The platform matches the network address of the website whose identified risk level is greater than the preset level one by one against all the network addresses stored in the address association library, selects the stored network address that it matches, and obtains from the address association library the to-be-matched associated domain names associated with the matched network address.
Obtain from the address association library the validity deadline of the to-be-matched associated domain name. Specifically, the validity deadline is the last time at which the to-be-matched associated domain name is valid. It may be a year, a specific month of a year, or a specific date; for example, the validity deadline may be the year 2017, the month December 2017, or the date December 31, 2017. After the platform has matched the network address corresponding to the web page whose identified risk level is greater than the preset level against a network address stored in the address association library and has obtained the to-be-matched associated domain names associated with the matched address, it obtains from the address association library the validity deadline corresponding to each to-be-matched associated domain name, that is, the last time at which that domain name is valid.
If the current time is earlier than or equal to the validity deadline, extract the to-be-matched associated domain name as a domain name to be identified. Specifically, the current time is the time at which the to-be-matched associated domain name is obtained, and may be the system time; for example, the current time may be a year, a specific month of a year, or a specific date. The platform obtains the to-be-matched associated domain name and the current time, and compares the current time with the validity deadline of the to-be-matched associated domain name. If the current time is earlier than or equal to the validity deadline, the to-be-matched associated domain name has not passed its validity deadline and is therefore valid; the platform then takes it as an associated domain name, and takes the associated domain name as a domain name to be identified.
Note that in this embodiment the address association library may be a passive DNS (passive Domain Name System) database. The platform matches the network address of the website whose identified risk level is greater than the preset level against the network addresses stored in the passive DNS database. When the match succeeds, it obtains the to-be-matched associated domain names corresponding to the matched network address in the passive DNS database; when the current time at which a to-be-matched associated domain name is obtained is earlier than or equal to that domain name's validity deadline, that to-be-matched associated domain name is taken as an associated domain name.
Note that a web page whose risk level is greater than the preset level may be a high-risk page that disguises itself as a normal page and, when visited, steals the user's bank card information and the like, thereby threatening the user's property; a phishing page is an example. It may also be another kind of page to which access is restricted for risk management and control; for example, where certain enterprises hold the access rights to particular web pages, the pages to which access is restricted may be regarded as web pages whose risk level is greater than the preset level. In this embodiment, the platform obtains the to-be-matched associated domain names from the matched network address pre-stored in the address association library and compares the current time with the validity deadline of each one. When the current time is earlier than or equal to the validity deadline, the to-be-matched associated domain name is valid and can be taken as an associated domain name and then as a domain name to be identified. Filtering out invalid to-be-matched associated domain names directly by comparing the current time with the validity deadline is simple to perform and efficient, and it improves the accuracy of the selected associated domain names.
In one of the embodiments, the web page identification method may also include the following step, which may be performed after step S206, that is, after searching for domain names associated with the network address. The step may include:
When no domain name associated with the network address is found, obtain the registration data corresponding to the domain name of the website, and query the corresponding domain names according to the registration data as domain names to be identified. Specifically, registration data is data showing the details of the user who registered the website's domain name. It may be text data, image data, numeric data and so on; for example, it may be a personal name, a personal email address, a personal telephone number, or a personal photo. When the platform finds no match among the network addresses pre-stored in the address association library, it cannot obtain any to-be-matched associated domain name associated with a pre-stored network address. The platform then obtains the registration data corresponding to the domain name of the website whose identified risk level is greater than the preset level and queries the domain names corresponding to that registration data. The domain names found this way differ from the domain name of the website whose risk level is greater than the preset level, and these different domain names are taken as domain names to be identified.
In this embodiment, when no domain name associated with the network address of the website whose identified risk level exceeds the preset level is found in the address information library, a different domain name is queried according to the registration data of that website and used as the domain name to be identified. That is, associated domain names can be queried a second time through the registration information, and the associated domain names found are used as domain names to be identified, which improves the accuracy of finding websites whose risk level exceeds the preset level.
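The two-stage lookup described above (address information library first, registration (log-on) data as a fallback) might be sketched as follows. The dictionary-based address library, the function names, and the stub fallback are all hypothetical and serve only to show the control flow.

```python
def select_domains_to_identify(network_address, known_domain,
                               address_library, lookup_by_registration):
    """Return domain names to be identified for a known risky website.

    address_library: dict mapping a network address to associated domains
                     (a stand-in for the address information library).
    lookup_by_registration: callable used only when the first lookup
                     finds nothing (a stand-in for the registration query).
    """
    found = address_library.get(network_address, [])
    if found:
        return list(found)
    return lookup_by_registration(known_domain)

def stub_registration_lookup(domain):
    # Placeholder for the registration-data query; returns a fixed answer.
    return ["fallback.example"]

address_library = {"203.0.113.7": ["other.example"]}
first = select_domains_to_identify(
    "203.0.113.7", "risky.example", address_library, stub_registration_lookup)
second = select_domains_to_identify(
    "198.51.100.1", "risky.example", address_library, stub_registration_lookup)
```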
In one of the embodiments, the above step of obtaining the registration data corresponding to the domain name of the website and querying the corresponding domain name according to the registration data as the domain name to be identified may include the following flow:
The registration data corresponding to the domain name of the website is obtained, and the conversion logic corresponding to the registration data is selected from the conversion logic library. Specifically, the conversion logic library is a database storing the conversion logic that converts registration data into registration data of a set format. Conversion logic is a rule for converting registration data; for example, it may replace characters in the registration data with preset characters, or delete invalid characters. Further, when the webpage identification platform obtains a webpage whose identified risk level exceeds the preset level, it extracts the corresponding website domain name from the web page address of that webpage, and then obtains, according to the website domain name, the registration data corresponding to that webpage. If the obtained registration data is not presented in the prescribed format, the platform selects the corresponding conversion logic from the conversion logic library according to the type of the registration data, so that the registration data can be brought into the prescribed display format. For example, the platform extracts the domain name of the website whose identified risk level exceeds the preset level and obtains the corresponding registration data, such as a registration name, a registration mailbox, and a registration phone number, where the registration name contains spaces and the registration phone number contains connector characters. According to the type of the registration data, the platform selects from the conversion logic library the conversion logic that brings the registration name into line with the display rule, namely deleting the spaces in the registration name, and the conversion logic that brings the registration phone number into line with the display rule, namely deleting the connector characters in the registration phone number.
The registration data is converted according to the conversion logic to obtain converted registration data. Specifically, once the webpage identification platform has selected the conversion logic, that is, the rule for converting the registration data (such as replacing characters in the registration data with preset characters or deleting invalid characters), it converts the registration data according to that logic to obtain the converted registration data, which is then presented in the prescribed display format. For example, the registration data includes a registration name, a registration mailbox, and a registration phone number; having selected the conversion logic for the registration name and the registration phone number, the platform deletes the invalid space characters in the registration name and deletes the connector characters in the registration phone number.
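A minimal sketch of selecting conversion logic by data type and applying it to registration (log-on) data is given below. The in-memory dictionary standing in for the conversion logic library, the field names, and the assumption that the connector character is a hyphen are illustrative choices, not details stated in the embodiment.

```python
# Hypothetical conversion logic library: one rule per registration data type.
CONVERSION_LOGIC = {
    "name": lambda value: value.replace(" ", ""),   # delete spaces in the name
    "phone": lambda value: value.replace("-", ""),  # delete connectors (assumed "-")
}

def convert_registration_data(record):
    """Apply the rule chosen by field type; fields without a rule pass through."""
    converted = {}
    for field, value in record.items():
        rule = CONVERSION_LOGIC.get(field)
        converted[field] = rule(value) if rule else value
    return converted

raw = {"name": "Zhang San", "phone": "010-1234-5678", "email": "zs@mail.example"}
converted = convert_registration_data(raw)
```

Keeping the rules in a lookup table mirrors the idea of choosing conversion logic by the type of the registration data, and new rules can be added without touching the conversion function.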
The converted registration data is matched against the information data stored in the information repository. Specifically, the information repository is a database storing different registration information together with the domain names associated with that registration information. It may store registration names, registration mailboxes, registration phone numbers, and the like, which may correspond to one another one to one, along with the website domain names associated with the registration information. Information data is data showing the details of the registrant of the relevant domain name; it may be text data, numeric data, image data, or the like, for example a name, a phone number, a mailbox, or a photo. Specifically, the webpage identification platform matches the obtained registration data one by one against the information data stored in the information repository. For example, the registration data obtained by the platform is a registration name, a registration mailbox, and a registration phone number; the platform converts them according to the conversion rule into a converted registration name, a converted registration mailbox, and a converted registration phone number, then matches the converted registration name against the names stored in the information repository, the converted registration phone number against the stored phone numbers, and the converted registration mailbox against the stored mailboxes.
When the converted registration data successfully matches information data stored in the information repository, the domain name associated with the matched information data is obtained as the domain name to be identified. Specifically, the webpage identification platform matches the converted registration data one by one against the information data stored in the information repository, and when corresponding information data is matched, it obtains the domain name associated with that information data and uses it as the domain name to be identified. In one approach, the platform matches every item of the registration data one by one against the stored information data, and obtains the associated domain name only when every item matches. The platform first matches the converted registration name against the names stored in the information repository; on success, it matches the registration mailbox against the mailbox stored for that name; on success, it matches the registration phone number against the phone number stored for that name and mailbox; and when the phone number also matches, it extracts the domain name associated with the matched name, phone number, and mailbox and uses it as the domain name to be identified. It should be noted that the platform may alternatively match only an arbitrary single item of the registration data against the stored data and, on success, use the domain name associated with the matched information data as the domain name to be identified. For example, the converted registration name is matched against the names stored in the information repository, and the domain name associated with a matching name is extracted directly as the domain name to be identified.
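The two matching modes described above (every item must match, or any single item suffices) can be sketched as follows. The list-of-dictionaries repository, the field names, and the function name are assumptions for illustration; the exclusion of the already known domain follows the statement that the queried domain name differs from the domain name of the known website.

```python
def find_associated_domains(converted, repository, known_domain, require_all=True):
    """Return domains whose stored registrant data matches the converted data.

    require_all=True  -> name, mailbox and phone must all match.
    require_all=False -> a match on any single item is enough.
    """
    fields = [f for f in ("name", "email", "phone") if f in converted]
    if not fields:
        return []
    combine = all if require_all else any
    return sorted({
        entry["domain"]
        for entry in repository
        if entry["domain"] != known_domain
        and combine(entry.get(f) == converted[f] for f in fields)
    })

# Hypothetical information repository entries.
repository = [
    {"name": "ZhangSan", "email": "zs@mail.example", "phone": "01012345678", "domain": "risky-one.example"},
    {"name": "ZhangSan", "email": "zs@mail.example", "phone": "01012345678", "domain": "risky-two.example"},
    {"name": "ZhangSan", "email": "other@mail.example", "phone": "02000000000", "domain": "maybe.example"},
    {"name": "LiSi", "email": "ls@mail.example", "phone": "03000000000", "domain": "unrelated.example"},
]
converted = {"name": "ZhangSan", "email": "zs@mail.example", "phone": "01012345678"}
strict = find_associated_domains(converted, repository, "risky-one.example")
loose = find_associated_domains(converted, repository, "risky-one.example", require_all=False)
```

The strict mode yields fewer candidates with higher confidence, while the loose mode casts a wider net at the cost of more false associations.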
It should be noted that, in this embodiment, the information repository may be a whois database. When the webpage identification platform has obtained the domain name of the website whose identified risk level exceeds the preset level and, from that domain name, the registration data corresponding to the website, it may match the registration data against the information data stored in the whois database and, on success, obtain the domain name associated with the matched information data as the domain name to be identified.
In this embodiment, the webpage identification platform first converts the obtained registration data according to the conversion logic to obtain converted registration data that conforms to the display rule, which improves the accuracy of identifying associated domain names to be identified. It then matches the converted registration data against the information data stored in the information repository and, on success, obtains the domain name associated with the matched information data as the domain name to be identified. Different domain names to be identified can thus be obtained from the registration information, which improves identification efficiency.
In one example, the step of obtaining, according to the acquired web data, the webpage corresponding to the domain name to be identified whose risk level exceeds the preset level may include:
The web data is matched against the first filter data stored in a preset blacklist, and when the web data successfully matches the first filter data, a suspicious label is added to the domain name to be identified. Specifically, the blacklist stores data whose risk level exceeds the preset level; such data may be text data, image data, numeric data, or the like, for example characters such as "bank" and "integration". The first filter data is data whose risk level exceeds the preset level; when a webpage contains first filter data, the webpage may be one whose risk level exceeds the preset level. The first filter data may be text data, image data, numeric data, or the like. The suspicious label is a mark indicating that the risk level of the domain name to be identified may exceed the preset level. Specifically, after extracting the web data from all webpages contained in the website corresponding to the domain name to be identified, the webpage identification platform matches all extracted web data one by one against the first filter data stored in the preset blacklist. When any of the web data matches any first filter data stored in the blacklist, the platform adds a suspicious label to the domain name to be identified that corresponds to the website of the webpage from which that web data originated. It should be noted that a match-count threshold may also be set: the platform matches all obtained web data one by one against the first filter data stored in the blacklist, and adds the suspicious label only when a preset number of first filter data items stored in the blacklist are matched. The match-count threshold may be preset to 1, 3, 4, or the like. Alternatively, the suspicious label may be added to the domain name to be identified when the web data of a preset number of webpages contained in the corresponding website matches first filter data in the blacklist.
The web data in the website corresponding to the domain name to be identified to which the suspicious label has been added is matched against the second filter data stored in a preset white list. Specifically, the white list is a database storing trust data, that is, data whose risk level is less than or equal to the preset level; trust data may be text data, image data, numeric data, or the like, for example characters such as "lottery industry". The second filter data is data whose risk level is less than or equal to the preset level, namely trust data; when a webpage contains second filter data, the website may be a reliable website. The second filter data may be text data, image data, numeric data, or the like. Specifically, the webpage identification platform extracts the domain names to be identified to which suspicious labels have been added, and matches the web data on all webpages contained in the website of each such domain name one by one against the second filter data stored in the preset white list. When the web data on all webpages contained in that website successfully matches the second filter data pre-stored in the white list, the suspicious label carried by the domain name to be identified is deleted. It should be noted that the suspicious label may alternatively be deleted when the web data on a preset number of webpages contained in the website of the labelled domain name matches second filter data stored in the preset white list.
When the web data does not successfully match the second filter data, the domain name to be identified that carries the suspicious label is extracted, and the webpages in the website corresponding to that domain name are obtained as webpages whose risk level exceeds the preset level. Specifically, when the webpage identification platform fails to match the web data contained in the website corresponding to the labelled domain name against the second filter data, the domain name to be identified still carries the suspicious label. The platform extracts the domain names to be identified that still carry the suspicious label, obtains the corresponding websites, and extracts the webpages contained in those websites as webpages whose risk level exceeds the preset level.
In this embodiment, the web data is filtered by the first filter data stored in the blacklist and the second filter data stored in the white list to obtain the required webpages whose risk level exceeds the preset level. This prevents webpages that carry web data whose risk level exceeds the preset level but are in fact trustworthy from being misidentified, and the double filtering improves the accuracy of identifying webpages whose risk level exceeds the preset level.
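The double filtering described above can be sketched as a blacklist pass that adds a suspicious label, followed by a white list pass that may remove it. Substring matching on page text, the function names, and the rule that a single white list hit removes the label are simplifying assumptions; the embodiment also allows image or numeric data and other thresholds.

```python
def count_hits(pages, terms):
    """Count how many filter terms occur in at least one page of the site."""
    return sum(1 for term in terms if any(term in page for page in pages))

def is_risky_site(pages, blacklist, whitelist, threshold=1):
    """Return True when the site stays suspicious after both filters.

    threshold is the match-count threshold for the blacklist pass.
    """
    suspicious = count_hits(pages, blacklist) >= threshold
    if not suspicious:
        return False                      # no suspicious label was added
    trusted = count_hits(pages, whitelist) >= 1
    return not trusted                    # white list hit removes the label

pages = ["Log in to your bank account", "Redeem your reward here"]
flagged = is_risky_site(pages, ["bank"], ["official notice"])
cleared = is_risky_site(pages + ["official notice"], ["bank"], ["official notice"])
```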
In one of the embodiments, the web page identification method may further include:
When, after data identification by the preset blacklist and the preset white list, no domain name to be identified carries a suspicious label, the identifier corresponding to the domain name to be identified is obtained. Specifically, an identifier is a distinctive mark representing the website corresponding to the domain name to be identified; it may be an enterprise mark, for example an enterprise logo. Specifically, when the webpage identification platform has performed data identification with the preset blacklist and the preset white list on the web data of the webpages contained in the websites of all obtained domain names to be identified, and none of the domain names to be identified carries a suspicious label, no webpage whose risk level exceeds the preset level has been identified through web data identification. The platform then obtains the identifier corresponding to the domain name to be identified.
The identifier is matched against the security identifiers stored in advance in the security identifier repository. Specifically, the security identifier repository is a database storing the identifiers of trusted websites and the website domain names corresponding to those identifiers. A security identifier is the mark of a trusted website and may be the mark of the enterprise behind a safe webpage, for example the logo of the Industrial and Commercial Bank webpage or the logo of the Ping An Group webpage. Specifically, the webpage identification platform matches the obtained identifier one by one against the security identifiers stored in advance in the security identifier repository. For example, the identifier corresponding to the obtained domain name to be identified is the Ping An Group logo, and the platform matches that logo against the security identifiers stored in the security identifier repository.
When a security identifier successfully matches the identifier corresponding to the domain name to be identified, the secure domain name associated with the matched security identifier stored in the security identifier repository is obtained, and the secure domain name is matched against the domain name to be identified. Specifically, when the webpage identification platform successfully matches the identifier corresponding to the domain name to be identified against a security identifier stored in the security identifier repository, the domain name to be identified may be a secure domain name, and further matching and identification are needed. The platform therefore obtains the secure domain name associated with the matched security identifier stored in the security identifier repository and matches it against the domain name to be identified. For example, when the identifier obtained by the platform for the domain name to be identified, the Ping An Group logo, successfully matches the Ping An Group logo stored in the security identifier repository, the platform obtains the associated domain name "pingan.com" stored in the repository and matches the domain name to be identified against "pingan.com".
When the secure domain name does not match the domain name to be identified, the webpages in the website corresponding to the domain name to be identified are taken as webpages whose risk level exceeds the preset level. Specifically, when the webpage identification platform fails to match the domain name to be identified with the secure domain name, the identifier corresponding to the domain name to be identified is a forged security identifier, and the webpages contained in the corresponding website are taken as webpages whose risk level exceeds the preset level. For example, the identifier of the obtained domain name to be identified is the Ping An Group logo. When that logo successfully matches a security identifier stored in the security identifier repository, the platform obtains the associated domain name "pingan.com" from the repository. If the domain name to be identified is not "pingan.com", it has forged the Ping An Group logo, and the webpages in its website are taken as webpages whose risk level exceeds the preset level.
In this embodiment, when web data identification yields no suspicious domain name to be identified, further identification is performed according to the identifier carried by the domain name to be identified, so as to determine whether the webpages contained in the corresponding website are webpages whose risk level exceeds the preset level. Using multiple identification methods improves the accuracy of identifying webpages whose risk level exceeds the preset level.
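The identifier check described above reduces to a lookup followed by a domain comparison, as the sketch below shows. Representing an identifier as a string key (for example a logo fingerprint) and the repository as a dictionary are assumptions for illustration; how logos are actually compared is not specified here.

```python
def forges_security_identifier(domain, identifier, security_repository):
    """Return True when a site shows a trusted identifier on a foreign domain.

    security_repository: dict mapping a security identifier to its secure
    domain name (a stand-in for the security identifier repository).
    """
    secure_domain = security_repository.get(identifier)
    if secure_domain is None:
        return False  # identifier not in the repository: no conclusion drawn
    return domain != secure_domain

# Hypothetical repository entry; "pingan-logo" is a made-up identifier key.
security_repository = {"pingan-logo": "pingan.com"}
forged = forges_security_identifier("pingan-login.example", "pingan-logo", security_repository)
genuine = forges_security_identifier("pingan.com", "pingan-logo", security_repository)
```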
In one of the embodiments, after step S210, that is, after the step of obtaining, according to the acquired website data, the webpage corresponding to the domain name to be identified whose risk level exceeds the preset level, the method may further include the following steps:
The keywords of the web data of the webpage whose risk level exceeds the preset level are extracted, and a corresponding class label is added, according to the keywords, to the domain name to be identified whose risk level exceeds the preset level. Specifically, a class label is a mark of the type of the web data; class labels may be labels of different risk types, for example a bank class label or a shopping class label. Specifically, after identifying a webpage whose risk level exceeds the preset level, the webpage identification platform extracts the keywords of the web data and, according to the extracted keywords, adds the corresponding class label to the domain name to be identified that is associated with the website of the webpage containing the web data. For example, the platform identifies webpages whose risk level exceeds the preset level, extracts the keywords "integration" and "bank" respectively from different webpages, and, according to those keywords, adds the corresponding class label, "bank label" or "integration label", to the domain name to be identified associated with the website of the webpage containing the web data.
The class label of the domain name to be identified whose risk level exceeds the preset level is matched against the stored class labels. Specifically, the webpage identification platform matches the class label added to the domain name to be identified one by one against the class labels already stored on the platform, until all stored class labels have been traversed. For example, the labels added to the domain name to be identified are "bank" and "integration"; the label "bank" is matched one by one against the stored class labels, and then the label "integration" is matched one by one against the stored class labels.
When the match is unsuccessful, the class label of the domain name to be identified whose risk level exceeds the preset level is added, and the webpage whose risk level exceeds the preset level is stored under that class label. Specifically, when an added class label does not match any stored class label, it is a new class label. The unmatched class label is added to the stored class labels, and the webpages whose risk level exceeds the preset level contained in the website of the domain name to be identified carrying that label are added under the class label. For example, the class labels added to the domain name to be identified are "bank" and "integration", and each is matched one by one against the stored class labels. When the class label "bank" is not matched, "bank" is added to the stored class labels, and the webpages whose risk level exceeds the preset level contained in the website of the domain name to be identified carrying the "bank" class label are added under that class label.
It should be noted that the webpage identification platform may, at a preset time interval, send the updated class labels and the corresponding webpages whose risk level exceeds the preset level to a server for storage. For example, at a preset interval of one hour, the updated class labels and the corresponding webpages whose risk level exceeds the preset level are sent to the server for storage.
In this embodiment, the keywords of the web data in the webpage whose risk level exceeds the preset level are extracted, and a corresponding class label is added, according to the keywords, to the domain name to be identified whose risk level exceeds the preset level. If the added class label does not match any stored class label, it is added to the stored class labels, and the webpage whose risk level exceeds the preset level is stored under the added class label. The stored class labels are thereby gradually expanded, which enhances applicability.
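The class label bookkeeping described above might look like the sketch below. The dictionary of stored labels, the function name, and the choice to file the webpage under the label whether or not the label is new are assumptions for illustration.

```python
def add_class_label(label, page_url, stored_labels):
    """File a risky webpage under a class label, creating the label if needed.

    stored_labels: dict mapping a class label to the list of webpages stored
    under it. Returns True when the label did not exist before.
    """
    is_new = label not in stored_labels
    stored_labels.setdefault(label, []).append(page_url)
    return is_new

stored_labels = {"shopping": ["http://shop.example/pay"]}
created = add_class_label("bank", "http://fake-bank.example/login", stored_labels)
reused = add_class_label("shopping", "http://deal.example/cart", stored_labels)
```

Because unmatched labels are created on first use, the set of stored class labels grows as new kinds of risky webpages are identified.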
In one of the embodiments, the case where the webpage whose risk level exceeds the preset level is a phishing webpage is taken as an illustrative example. When the webpage identification platform obtains an identified phishing webpage, it extracts the webpage domain name corresponding to the phishing webpage, obtains from that domain name the network address of the corresponding website, and searches for domain names associated with that network address. To do so, the platform may match the network address of the website corresponding to the phishing webpage against the network addresses pre-stored in the address information library. When the match succeeds, the platform obtains the associated domain name to be matched that is associated with the pre-stored network address, and judges from the effective deadline of that associated domain name whether it is valid; that is, when the current time is earlier than or equal to the effective deadline, the associated domain name to be matched is extracted as the domain name to be identified. Thus, when the platform finds a domain name associated with the network address, it uses that associated domain name as the domain name to be identified. When no domain name associated with the network address is found in this way, the registration data corresponding to the domain name of the website is obtained, and the corresponding domain name is queried according to the registration data as the domain name to be identified. To do so, the platform may obtain the registration data corresponding to the domain name of the website of the phishing webpage, select the conversion logic corresponding to the registration data from the conversion logic library, convert the registration data according to the conversion logic to obtain converted registration data, and match the converted registration data against the information data stored in the information repository. When the match succeeds, the domain name associated with the matched information data is obtained as the domain name to be identified. The platform thus first queries domain names to be identified using the domain names associated with the network address of the website of the identified phishing webpage and, when none is found, queries using the registration data of that website. Querying in these two ways ensures that no associated domain name is omitted.
When the webpage identification platform has obtained a domain name to be identified, it obtains the web data of the webpages contained in the website corresponding to that domain name and matches the web data against the first filter data stored in the preset blacklist. When the match succeeds, a suspicious label is added to the domain name to be identified that corresponds to the website of the webpage from which the web data originated. The web data in the website corresponding to the labelled domain name is then matched against the second filter data stored in the preset white list. When the web data does not match the second filter data, the domain name to be identified that carries the suspicious label is extracted, and the webpages in its website are taken as phishing webpages. Further, when, after data matching by both the preset blacklist and the preset white list, no domain name to be identified carries a suspicious label, the identifier corresponding to the domain name to be identified, such as an enterprise logo, is obtained and matched against the security identifiers stored in advance in the security identifier repository. When the match succeeds, the secure domain name associated with the matched security identifier is obtained and matched against the domain name to be identified. When that match is unsuccessful, the domain name to be identified is disguised as a secure domain name, and the webpages in its website are taken as phishing webpages. By checking both the web data in the webpages contained in the website corresponding to the domain name to be identified and the web page identifier, the platform determines whether those webpages are phishing webpages. Detecting twice, once with the web data and once with the web page identifier, improves the accuracy of phishing webpage detection.
Then, when a phishing webpage has been identified, the keywords of the web data on the phishing webpage are extracted, and a class label is added, according to the keywords, to the domain name to be identified corresponding to the phishing webpage. If that class label does not match any stored class label, the class label of the domain name to be identified corresponding to the phishing webpage is added to the stored class labels, and the phishing webpage is then added under that class label.
In the present embodiment, multiple domain names to be identified can be found by association from a single phishing webpage, which improves query efficiency and enhances applicability. Whether the webpages corresponding to a domain name to be identified are phishing webpages is judged by querying both the web data and the webpage identifiers of the webpages in the corresponding website, so the query is accurate. The phishing webpages found are classified by category, which facilitates subsequent query and pushing.
In one of the embodiments, referring to Fig. 3, a structural diagram of a webpage identification device is provided. The webpage identification device 300 may include:
A first acquisition module 310, configured to obtain a webpage whose identified risk level is greater than a preset level, and to extract the website domain name corresponding to the webpage.
A second acquisition module 320, configured to obtain the network address corresponding to the website according to the website domain name.
A searching module 330, configured to search for domain names associated with the network address and, when a domain name associated with the network address is found, to take the associated domain name as a domain name to be identified.
A third acquisition module 340, configured to obtain the web data in the website corresponding to the domain name to be identified.
An identification module 350, configured to obtain, according to the acquired web data, the webpages corresponding to the domain name to be identified whose risk level is greater than the preset level.
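The cooperation of the five modules can be sketched as a single function. This is only an illustrative sketch, not the patented implementation: the function names and the four injected callables (`resolve`, `lookup_associated`, `fetch_pages`, `is_risky`) are hypothetical stand-ins for the second acquisition module, the searching module, the third acquisition module and the identification module.

```python
from typing import Callable, Iterable, List, Tuple
from urllib.parse import urlparse


def extract_domain(url: str) -> str:
    """First acquisition module: website domain name of a known risky page."""
    return urlparse(url).hostname or ""


def identify_related_risky_pages(
    risky_url: str,
    resolve: Callable[[str], str],                        # domain -> network address
    lookup_associated: Callable[[str], Iterable[str]],    # address -> associated domains
    fetch_pages: Callable[[str], Iterable[str]],          # domain -> web data of its pages
    is_risky: Callable[[str], bool],                      # web data -> risk above preset level?
) -> List[Tuple[str, str]]:
    """Return (domain to be identified, web data) pairs judged risky."""
    domain = extract_domain(risky_url)
    address = resolve(domain)
    found: List[Tuple[str, str]] = []
    for candidate in lookup_associated(address):
        for web_data in fetch_pages(candidate):
            if is_risky(web_data):
                found.append((candidate, web_data))
    return found
```

Passing the lookups in as callables keeps the sketch independent of any particular DNS resolver, address association library or crawler.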
In one of the embodiments, the searching module 330 may include:
A first matching unit, configured to match the network address against the network addresses pre-stored in an address association library.
A domain name acquisition unit, configured to obtain, when the network address matches a network address pre-stored in the address association library, the to-be-matched associated domain names associated with the pre-stored network address.
A time acquisition unit, configured to obtain the validity deadline of each to-be-matched associated domain name.
An extraction unit, configured to extract a to-be-matched associated domain name as a domain name to be identified if the current time is earlier than or equal to its validity deadline.
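The validity-deadline check performed by these units can be sketched as follows. The `AssociatedDomain` record and the dictionary used as the address association library are assumptions made for illustration; the source does not specify how the library is stored.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, List


@dataclass(frozen=True)
class AssociatedDomain:
    name: str
    valid_until: datetime  # validity deadline of the association


def domains_to_identify(
    address: str,
    address_library: Dict[str, List[AssociatedDomain]],
    now: datetime,
) -> List[str]:
    """Keep only associated domains whose validity deadline has not passed."""
    candidates = address_library.get(address, [])  # empty when the address is unknown
    return [d.name for d in candidates if now <= d.valid_until]
```

An empty result corresponds to the case in which no associated domain name is found, where the description falls back to querying by registration data.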
In one of the embodiments, the webpage identification device may further include:
A query module, configured to obtain, when no domain name associated with the network address is found, the registration data corresponding to the domain name of the website, and to query the corresponding domain names according to the registration data as domain names to be identified.
In one of the embodiments, the query module may include:
A selection unit, configured to obtain the registration data corresponding to the domain name of the website, and to select the conversion logic corresponding to the registration data from a conversion logic library.
A conversion unit, configured to convert the registration data according to the conversion logic to obtain converted registration data.
A second matching unit, configured to match the converted registration data against the information data stored in an information repository.
A to-be-identified domain name acquisition unit, configured to obtain, when the converted registration data matches information data stored in the information repository, the domain names associated with the matched information data as domain names to be identified.
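A minimal sketch of this select, convert and match sequence is shown below. The source does not say what the conversion logic does; the normalisation functions, the `kind` field and the dictionary-based information repository are all assumptions chosen only to make the flow concrete.

```python
from typing import Callable, Dict, List, Tuple


def normalize_email(value: str) -> str:
    return value.strip().lower()


def normalize_phone(value: str) -> str:
    return "".join(ch for ch in value if ch.isdigit())


# Hypothetical conversion logic library: one converter per kind of registration data.
CONVERSION_LOGIC: Dict[str, Callable[[str], str]] = {
    "email": normalize_email,
    "phone": normalize_phone,
}


def query_domains_by_registration(
    kind: str,
    value: str,
    info_repository: Dict[Tuple[str, str], List[str]],
) -> List[str]:
    """Convert registration data, then look up domains sharing the converted value."""
    convert = CONVERSION_LOGIC.get(kind)
    if convert is None:  # no conversion logic available for this kind of data
        return []
    converted = convert(value)
    return list(info_repository.get((kind, converted), []))
```

Converting before matching lets registration records that differ only in formatting resolve to the same repository entry.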
In one of the embodiments, the identification module 350 may further include:
A first filter unit, configured to match the web data against the first filter data stored in a preset blacklist, and to add a suspicious label to the domain name to be identified when the web data matches the first filter data.
A second filter unit, configured to match the web data in the website corresponding to the domain name to be identified to which the suspicious label has been added against the second filter data stored in a preset whitelist.
A labelled domain name acquisition unit, configured to extract, when the web data does not match the second filter data, the domain name to be identified that carries the suspicious label, and to obtain the webpages in the website corresponding to that domain name as webpages whose risk level is greater than the preset level.
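The two-stage filtering can be sketched as below. Case-insensitive substring matching is an assumption; the source only says that web data is "matched" against the filter data, and `risky_pages` is a hypothetical name.

```python
from typing import Iterable, List


def _matches(web_data: str, filter_data: Iterable[str]) -> bool:
    """Assumed matching rule: any filter entry occurs in the page text."""
    text = web_data.lower()
    return any(entry.lower() in text for entry in filter_data)


def risky_pages(
    pages: List[str],
    blacklist: Iterable[str],
    whitelist: Iterable[str],
) -> List[str]:
    """Return the pages of a domain that end up above the preset risk level."""
    blacklist, whitelist = list(blacklist), list(whitelist)
    # Stage 1: a blacklist hit puts the suspicious label on the domain.
    suspicious = any(_matches(p, blacklist) for p in pages)
    if not suspicious:
        return []
    # Stage 2: a whitelist hit clears the labelled domain.
    if any(_matches(p, whitelist) for p in pages):
        return []
    return list(pages)
```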
In one of the embodiments, the webpage identification device 300 may further include:
An identifier acquisition module, configured to obtain the identifier corresponding to the domain name to be identified when, after data identification against the preset blacklist and the preset whitelist, no domain name to be identified carries the suspicious label.
An identifier matching module, configured to match the identifier against the secure identifiers stored in advance in a security identifier repository.
A secure domain name matching module, configured to obtain, when the identifier corresponding to the domain name to be identified matches a secure identifier, the secure domain name associated with the matched secure identifier stored in the security identifier repository, and to match the secure domain name against the domain name to be identified.
A suspicious domain name extraction module, configured to take, when the secure domain name does not match the domain name to be identified, the webpages in the website corresponding to the domain name to be identified as webpages whose risk level is greater than the preset level.
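The identifier check can be sketched as follows. Comparing SHA-256 digests of the logo bytes is a deliberately simple stand-in chosen for illustration; the source does not describe how identifiers are compared, and a real system would likely need a more tolerant image comparison. The function names and the dictionary used as the security identifier repository are hypothetical.

```python
import hashlib
from typing import Dict


def fingerprint(logo_bytes: bytes) -> str:
    """Assumed identifier representation: exact digest of the logo file."""
    return hashlib.sha256(logo_bytes).hexdigest()


def impersonates_secure_domain(
    domain: str,
    logo_bytes: bytes,
    security_repository: Dict[str, str],  # fingerprint -> secure domain name
) -> bool:
    """True when the page shows a known secure identifier under a different domain."""
    secure_domain = security_repository.get(fingerprint(logo_bytes))
    if secure_domain is None:  # identifier unknown: this check says nothing
        return False
    return domain.lower() != secure_domain.lower()
```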
In one of the embodiments, the webpage identification device 300 may further include:
A keyword extraction module, configured to extract keywords from the web data of the webpages whose risk level is greater than the preset level, and to add a corresponding class label, according to the keywords, to the domain name to be identified corresponding to those webpages.
A label matching module, configured to match the class label of the domain name to be identified whose risk level is greater than the preset level against the stored class labels.
An adding module, configured to add, when the match fails, the class label of the domain name to be identified whose risk level is greater than the preset level, and to store the webpages whose risk level is greater than the preset level under the class label.
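The labelling and storage steps can be sketched as below. The keyword-to-label mapping and the in-memory index are assumptions for illustration; the source does not specify how keywords are extracted or how class labels are stored.

```python
from typing import Dict, List, Optional


def class_label_for(web_data: str, keyword_labels: Dict[str, str]) -> Optional[str]:
    """Return the label of the first known keyword found in the page text."""
    text = web_data.lower()
    for keyword, label in keyword_labels.items():
        if keyword.lower() in text:
            return label
    return None


def store_under_label(index: Dict[str, List[str]], label: str, page_url: str) -> None:
    """Create the class label if it is not stored yet, then file the page under it."""
    index.setdefault(label, []).append(page_url)
```

`setdefault` covers both branches of the description: a label that does not match any stored label is added first, and the page is then stored under it.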
For the specific limitations on the webpage identification device, refer to the limitations on the web page identification method above, which are not repeated here.
In one of the embodiments, a computer equipment is provided. The computer equipment may be a conventional terminal or any other suitable computer equipment, and its internal structure may be as shown in Fig. 4. The computer equipment includes a processor, a memory and a network interface connected through a system bus. The processor of the computer equipment provides computing and control capabilities. The memory of the computer equipment includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system and a computer program. The internal memory provides an environment for running the operating system and the computer program in the non-volatile storage medium. The network interface of the computer equipment is used to communicate with external terminals through a network connection. When executed by the processor, the computer program implements a web page identification method, and the processor implements the following steps when executing the computer program: Obtain a webpage whose identified risk level is greater than a preset level, and extract the website domain name corresponding to the webpage. Obtain the network address corresponding to the website according to the website domain name. Search for domain names associated with the network address, and when a domain name associated with the network address is found, take the associated domain name as a domain name to be identified. Obtain the web data in the website corresponding to the domain name to be identified. Obtain, according to the acquired web data, the webpages corresponding to the domain name to be identified whose risk level is greater than the preset level.
In one of the embodiments, the step of searching for domain names associated with the network address, implemented when the processor executes the computer program, may include: Match the network address against the network addresses pre-stored in the address association library. When the network address matches a network address pre-stored in the address association library, obtain the to-be-matched associated domain names associated with the pre-stored network address. Obtain the validity deadline of each to-be-matched associated domain name. If the current time is earlier than or equal to the validity deadline, extract the to-be-matched associated domain name as a domain name to be identified.
In one of the embodiments, the processor further implements the following steps when executing the computer program: When no domain name associated with the network address is found, obtain the registration data corresponding to the domain name of the website, and query the corresponding domain names according to the registration data as domain names to be identified.
In one of the embodiments, the step of obtaining the registration data corresponding to the domain name of the website and querying the corresponding domain names according to the registration data as domain names to be identified, implemented when the processor executes the computer program, may include:
Obtain the registration data corresponding to the domain name of the website, and select the conversion logic corresponding to the registration data from the conversion logic library. Convert the registration data according to the conversion logic to obtain converted registration data. Match the converted registration data against the information data stored in the information repository. When the converted registration data matches information data stored in the information repository, obtain the domain names associated with the matched information data as domain names to be identified.
In one of the embodiments, the step of obtaining, according to the acquired web data, the webpages corresponding to the domain name to be identified whose risk level is greater than the preset level, implemented when the processor executes the computer program, may include:
Match the web data against the first filter data stored in the preset blacklist, and when the web data matches the first filter data, add a suspicious label to the domain name to be identified. Match the web data in the website corresponding to the domain name to be identified to which the suspicious label has been added against the second filter data stored in the preset whitelist. When the web data does not match the second filter data, extract the domain name to be identified that carries the suspicious label, and obtain the webpages in the website corresponding to that domain name as webpages whose risk level is greater than the preset level.
In one of the embodiments, the steps implemented when the processor executes the computer program may further include: When, after data identification against the preset blacklist and the preset whitelist, no domain name to be identified carries the suspicious label, obtain the identifier corresponding to the domain name to be identified. Match the identifier against the secure identifiers stored in advance in the security identifier repository. When the identifier corresponding to the domain name to be identified matches a secure identifier, obtain the secure domain name associated with the matched secure identifier stored in the security identifier repository, and match the secure domain name against the domain name to be identified. When the secure domain name does not match the domain name to be identified, take the webpages in the website corresponding to the domain name to be identified as webpages whose risk level is greater than the preset level.
In one of the embodiments, after the step of obtaining, according to the acquired web data, the webpages corresponding to the domain name to be identified whose risk level is greater than the preset level, the steps implemented when the processor executes the computer program may further include: Extract keywords from the web data of the webpages whose risk level is greater than the preset level, and add a corresponding class label, according to the keywords, to the domain name to be identified corresponding to those webpages. Match the class label of the domain name to be identified whose risk level is greater than the preset level against the stored class labels. When the match fails, add the class label of the domain name to be identified whose risk level is greater than the preset level, and store the webpages whose risk level is greater than the preset level under the class label.
For the specific limitations on the computer equipment, refer to the limitations on the web page identification method above, which are not repeated here.
In one of the embodiments, still referring to Fig. 4, a storage medium is provided, on which a computer program is stored. The computer program implements the following steps when executed by a processor: Obtain a webpage whose identified risk level is greater than a preset level, and extract the website domain name corresponding to the webpage. Obtain the network address corresponding to the website according to the website domain name. Search for domain names associated with the network address, and when a domain name associated with the network address is found, take the associated domain name as a domain name to be identified. Obtain the web data in the website corresponding to the domain name to be identified. Obtain, according to the acquired web data, the webpages corresponding to the domain name to be identified whose risk level is greater than the preset level.
In one of the embodiments, the step of searching for domain names associated with the network address, implemented when the computer program is executed by the processor, may include: Match the network address against the network addresses pre-stored in the address association library. When the network address matches a network address pre-stored in the address association library, obtain the to-be-matched associated domain names associated with the pre-stored network address. Obtain the validity deadline of each to-be-matched associated domain name. If the current time is earlier than or equal to the validity deadline, extract the to-be-matched associated domain name as a domain name to be identified.
In one of the embodiments, the computer program further implements the following steps when executed by the processor: When no domain name associated with the network address is found, obtain the registration data corresponding to the domain name of the website, and query the corresponding domain names according to the registration data as domain names to be identified.
In one of the embodiments, the step of obtaining the registration data corresponding to the domain name of the website and querying the corresponding domain names according to the registration data as domain names to be identified, implemented when the computer program is executed by the processor, may include: Obtain the registration data corresponding to the domain name of the website, and select the conversion logic corresponding to the registration data from the conversion logic library. Convert the registration data according to the conversion logic to obtain converted registration data. Match the converted registration data against the information data stored in the information repository. When the converted registration data matches information data stored in the information repository, obtain the domain names associated with the matched information data as domain names to be identified.
In one of the embodiments, the step of obtaining, according to the acquired web data, the webpages corresponding to the domain name to be identified whose risk level is greater than the preset level, implemented when the computer program is executed by the processor, may include: Match the web data against the first filter data stored in the preset blacklist, and when the web data matches the first filter data, add a suspicious label to the domain name to be identified. Match the web data in the website corresponding to the domain name to be identified to which the suspicious label has been added against the second filter data stored in the preset whitelist. When the web data does not match the second filter data, extract the domain name to be identified that carries the suspicious label, and obtain the webpages in the website corresponding to that domain name as webpages whose risk level is greater than the preset level.
In one of the embodiments, the steps implemented when the computer program is executed by the processor may further include: When, after data identification against the preset blacklist and the preset whitelist, no domain name to be identified carries the suspicious label, obtain the identifier corresponding to the domain name to be identified. Match the identifier against the secure identifiers stored in advance in the security identifier repository. When the identifier corresponding to the domain name to be identified matches a secure identifier, obtain the secure domain name associated with the matched secure identifier stored in the security identifier repository, and match the secure domain name against the domain name to be identified. When the secure domain name does not match the domain name to be identified, take the webpages in the website corresponding to the domain name to be identified as webpages whose risk level is greater than the preset level.
In one of the embodiments, after the step of obtaining, according to the acquired web data, the webpages corresponding to the domain name to be identified whose risk level is greater than the preset level, the steps implemented when the computer program is executed by the processor may further include: Extract keywords from the web data of the webpages whose risk level is greater than the preset level, and add a corresponding class label, according to the keywords, to the domain name to be identified corresponding to those webpages. Match the class label of the domain name to be identified whose risk level is greater than the preset level against the stored class labels. When the match fails, add the class label of the domain name to be identified whose risk level is greater than the preset level, and store the webpages whose risk level is greater than the preset level under the class label.
For the specific limitations on the storage medium, refer to the limitations on the web page identification method above, which are not repeated here.
A person of ordinary skill in the art will understand that all or part of the flows in the methods of the above embodiments can be completed by a computer program instructing the relevant hardware. The program can be stored in a non-volatile computer-readable storage medium and, when executed, may include the flows of the embodiments of the above methods. The computer-readable storage medium may be a magnetic disk, an optical disc, a read-only memory (Read-Only Memory, ROM), or the like.
The technical features of the above embodiments can be combined arbitrarily. To keep the description concise, not all possible combinations of the technical features in the above embodiments are described. However, as long as a combination of these technical features contains no contradiction, it should be considered to be within the scope of this specification.
The above embodiments express only several implementations of the present invention. Their description is relatively specific and detailed, but it should not therefore be construed as limiting the scope of the patent. It should be noted that a person of ordinary skill in the art can make various modifications and improvements without departing from the concept of the present invention, and these all fall within the protection scope of the present invention. Therefore, the protection scope of this patent shall be determined by the appended claims.
Claims (10)
1. A web page identification method, characterized by comprising:
Obtaining a webpage whose identified risk level is greater than a preset level, and extracting the website domain name corresponding to the webpage;
Obtaining the network address corresponding to the website according to the website domain name;
Searching for domain names associated with the network address, and when a domain name associated with the network address is found, taking the associated domain name as a domain name to be identified;
Obtaining the web data in the website corresponding to the domain name to be identified;
Obtaining, according to the acquired web data, the webpages corresponding to the domain name to be identified whose risk level is greater than the preset level.
2. The method according to claim 1, characterized in that the step of searching for domain names associated with the network address comprises:
Matching the network address against the network addresses pre-stored in an address association library;
When the network address matches a network address pre-stored in the address association library, obtaining the to-be-matched associated domain names associated with the pre-stored network address;
Obtaining the validity deadline of each to-be-matched associated domain name;
If the current time is earlier than or equal to the validity deadline, extracting the to-be-matched associated domain name as a domain name to be identified.
3. The method according to claim 1, characterized in that the method further comprises:
When no domain name associated with the network address is found, obtaining the registration data corresponding to the domain name of the website, and querying the corresponding domain names according to the registration data as domain names to be identified.
4. The method according to claim 3, characterized in that the step of obtaining the registration data corresponding to the domain name of the website and querying the corresponding domain names according to the registration data as domain names to be identified comprises:
Obtaining the registration data corresponding to the domain name of the website, and selecting the conversion logic corresponding to the registration data from a conversion logic library;
Converting the registration data according to the conversion logic to obtain converted registration data;
Matching the converted registration data against the information data stored in an information repository;
When the converted registration data matches information data stored in the information repository, obtaining the domain names associated with the matched information data as domain names to be identified.
5. The method according to claim 1, characterized in that the step of obtaining, according to the acquired web data, the webpages corresponding to the domain name to be identified whose risk level is greater than the preset level comprises:
Matching the web data against the first filter data stored in a preset blacklist, and when the web data matches the first filter data, adding a suspicious label to the domain name to be identified;
Matching the web data in the website corresponding to the domain name to be identified to which the suspicious label has been added against the second filter data stored in a preset whitelist;
When the web data does not match the second filter data, extracting the domain name to be identified that carries the suspicious label, and obtaining the webpages in the website corresponding to that domain name as webpages whose risk level is greater than the preset level.
6. The method according to claim 5, characterized in that the method further comprises:
When, after data identification against the preset blacklist and the preset whitelist, no domain name to be identified carries the suspicious label, obtaining the identifier corresponding to the domain name to be identified;
Matching the identifier against the secure identifiers stored in advance in a security identifier repository;
When the identifier corresponding to the domain name to be identified matches a secure identifier, obtaining the secure domain name associated with the matched secure identifier stored in the security identifier repository, and matching the secure domain name against the domain name to be identified;
When the secure domain name does not match the domain name to be identified, taking the webpages in the website corresponding to the domain name to be identified as webpages whose risk level is greater than the preset level.
7. The method according to claim 1, characterized in that, after the step of obtaining, according to the acquired web data, the webpages corresponding to the domain name to be identified whose risk level is greater than the preset level, the method further comprises:
Extracting keywords from the web data of the webpages whose risk level is greater than the preset level, and adding a corresponding class label, according to the keywords, to the domain name to be identified corresponding to those webpages;
Matching the class label of the domain name to be identified whose risk level is greater than the preset level against the stored class labels;
When the match fails, adding the class label of the domain name to be identified whose risk level is greater than the preset level, and storing the webpages whose risk level is greater than the preset level under the class label.
8. A webpage identification device, characterized in that the device comprises:
A first acquisition module, configured to obtain a webpage whose identified risk level is greater than a preset level, and to extract the website domain name corresponding to the webpage;
A second acquisition module, configured to obtain the network address corresponding to the website according to the website domain name;
A searching module, configured to search for domain names associated with the network address and, when a domain name associated with the network address is found, to take the associated domain name as a domain name to be identified;
A third acquisition module, configured to obtain the web data in the website corresponding to the domain name to be identified;
An identification module, configured to obtain, according to the acquired web data, the webpages corresponding to the domain name to be identified whose risk level is greater than the preset level.
9. A computer equipment, comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, characterized in that the processor implements the steps of the method according to any one of claims 1 to 7 when executing the computer program.
10. A storage medium on which a computer program is stored, characterized in that the computer program implements the steps of the method according to any one of claims 1 to 7 when executed by a processor.
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201711297266.7A CN108092963B (en) | 2017-12-08 | 2017-12-08 | Webpage identification method and device, computer equipment and storage medium |
PCT/CN2018/077064 WO2019109529A1 (en) | 2017-12-08 | 2018-02-23 | Webpage identification method, device, computer apparatus, and computer storage medium |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201711297266.7A CN108092963B (en) | 2017-12-08 | 2017-12-08 | Webpage identification method and device, computer equipment and storage medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN108092963A true CN108092963A (en) | 2018-05-29 |
CN108092963B CN108092963B (en) | 2020-05-08 |
Family
ID=62174944
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201711297266.7A Active CN108092963B (en) | 2017-12-08 | 2017-12-08 | Webpage identification method and device, computer equipment and storage medium |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN108092963B (en) |
WO (1) | WO2019109529A1 (en) |
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110012030A (en) * | 2019-04-23 | 2019-07-12 | 北京微步在线科技有限公司 | A kind of method and device of association detection hacker |
CN110033092A (en) * | 2019-01-31 | 2019-07-19 | 阿里巴巴集团控股有限公司 | Data label generation, model training, event recognition method and device |
CN110266661A (en) * | 2019-06-04 | 2019-09-20 | 东软集团股份有限公司 | A kind of authorization method, device and equipment |
CN110865818A (en) * | 2018-08-28 | 2020-03-06 | 优视科技有限公司 | Application associated domain name detection method and device and electronic equipment |
CN110958244A (en) * | 2019-11-29 | 2020-04-03 | 北京邮电大学 | Method and device for detecting counterfeit domain name based on deep learning |
CN111814643A (en) * | 2020-06-30 | 2020-10-23 | 杭州科度科技有限公司 | Black and gray URL (Uniform resource locator) identification method and device, electronic equipment and medium |
Families Citing this family (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113098859B (en) * | 2021-03-30 | 2023-03-31 | 深圳市欢太科技有限公司 | Webpage page rollback method, device, terminal and storage medium |
CN113923193B (en) * | 2021-10-27 | 2023-11-28 | 北京知道创宇信息技术股份有限公司 | Network domain name association method and device, storage medium and electronic equipment |
CN114900363A (en) * | 2022-05-18 | 2022-08-12 | 杭州安恒信息技术股份有限公司 | Malicious website identification method and device, electronic equipment and storage medium |
CN116708356B (en) * | 2023-08-02 | 2023-11-14 | 苏州迈科网络安全技术股份有限公司 | IP feature library generation method |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102096781A (en) * | 2011-01-18 | 2011-06-15 | 南京邮电大学 | Fishing detection method based on webpage relevance |
CN102724187A (en) * | 2012-06-06 | 2012-10-10 | 奇智软件(北京)有限公司 | Method and device for safety detection of universal resource locators |
CN102739653A (en) * | 2012-06-06 | 2012-10-17 | 奇智软件(北京)有限公司 | Detection method and device aiming at webpage address |
CN105338001A (en) * | 2015-12-04 | 2016-02-17 | 北京奇虎科技有限公司 | Method and device for recognizing phishing website |
CN106302438A (en) * | 2016-08-11 | 2017-01-04 | 国家计算机网络与信息安全管理中心 | A kind of method of actively monitoring fishing website of Behavior-based control feature by all kinds of means |
Family Cites Families (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8869269B1 (en) * | 2008-05-28 | 2014-10-21 | Symantec Corporation | Method and apparatus for identifying domain name abuse |
CN102523210B (en) * | 2011-12-06 | 2014-11-05 | 中国科学院计算机网络信息中心 | Phishing website detection method and device |
CN102663000B (en) * | 2012-03-15 | 2016-08-03 | 北京百度网讯科技有限公司 | The maliciously recognition methods of the method for building up of network address database, maliciously network address and device |
CN105718577B (en) * | 2016-01-22 | 2020-01-21 | 中国互联网络信息中心 | Method and system for automatically detecting phishing aiming at newly added domain name |
- 2017-12-08: CN application CN201711297266.7A, patent CN108092963B, status Active
- 2018-02-23: WO application PCT/CN2018/077064, publication WO2019109529A1, status Application Filing
Cited By (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110865818A (en) * | 2018-08-28 | 2020-03-06 | 优视科技有限公司 | Application associated domain name detection method and device and electronic equipment |
CN110865818B (en) * | 2018-08-28 | 2023-07-28 | 阿里巴巴(中国)有限公司 | Detection method and device for application associated domain name and electronic equipment |
CN110033092A (en) * | 2019-01-31 | 2019-07-19 | 阿里巴巴集团控股有限公司 | Data label generation, model training, event recognition method and device |
CN110033092B (en) * | 2019-01-31 | 2020-06-02 | 阿里巴巴集团控股有限公司 | Data label generation method, data label training device, event recognition method and event recognition device |
CN110012030A (en) * | 2019-04-23 | 2019-07-12 | 北京微步在线科技有限公司 | Method and device for detecting hackers by association |
CN110266661A (en) * | 2019-06-04 | 2019-09-20 | 东软集团股份有限公司 | Authorization method, device and equipment |
CN110266661B (en) * | 2019-06-04 | 2021-09-14 | 东软集团股份有限公司 | Authorization method, device and equipment |
CN110958244A (en) * | 2019-11-29 | 2020-04-03 | 北京邮电大学 | Method and device for detecting counterfeit domain name based on deep learning |
CN111814643A (en) * | 2020-06-30 | 2020-10-23 | 杭州科度科技有限公司 | Black and gray URL (Uniform resource locator) identification method and device, electronic equipment and medium |
Also Published As
Publication number | Publication date |
---|---|
WO2019109529A1 (en) | 2019-06-13 |
CN108092963B (en) | 2020-05-08 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN108092963A (en) | Web page identification method, device, computer equipment and storage medium | |
US9276956B2 (en) | Method for detecting phishing website without depending on samples | |
CN104899508B (en) | Multistage phishing website detection method and system | |
EP2803031B1 (en) | Machine-learning based classification of user accounts based on email addresses and other account information | |
CN104954372B (en) | Phishing website evidence collection and verification method and system | |
CN109690547A (en) | System and method for detecting online fraud | |
CN103530367B (en) | Phishing website identification system and method | |
CN105119909B (en) | Counterfeit website detection method and system based on page visual similarity | |
CN106302440B (en) | Method for acquiring suspicious phishing websites through multiple channels | |
CN109522504A (en) | Method for identifying counterfeit websites based on threat intelligence | |
US20180131708A1 (en) | Identifying Fraudulent and Malicious Websites, Domain and Sub-domain Names | |
CN103634317A (en) | Method and system of performing safety appraisal on malicious web site information on basis of cloud safety | |
CN112804210B (en) | Data association method and device, electronic equipment and computer-readable storage medium | |
CN103209177B (en) | The detection method of phishing attacks and device | |
CN109274632A (en) | Website identification method and device | |
CN112464666B (en) | Unknown network threat automatic discovery method based on hidden network data | |
CN104239582A (en) | Method and device for identifying phishing webpage based on feature vector model | |
CN103067387A (en) | Monitoring system and monitoring method for anti phishing | |
CN102902722B (en) | Information security processing method and system | |
CN112751804B (en) | Method, device and equipment for identifying counterfeit domain name | |
CN108270754B (en) | Detection method and device for phishing website | |
CN105262730A (en) | Monitoring method and device based on enterprise domain name safety | |
CN107590265A (en) | Web-crawler-based method for identifying the administrative ownership of websites | |
US20040267895A1 (en) | Search system using real name and method thereof | |
KR20100120966A (en) | System for classifying phishing sites based on website search, and method therefor |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||