CN103501306B - URL identification method, server and system

Info

Publication number: CN103501306B
Authority: CN (China)
Prior art keywords: network address, malicious, page, content, detected
Legal status: Active
Application number: CN201310503007.0A
Other languages: Chinese (zh)
Other versions: CN103501306A (en)
Inventor: 刘健
Current Assignee: Tencent Technology Wuhan Co Ltd
Original Assignee: Tencent Technology Wuhan Co Ltd
Application filed by: Tencent Technology Wuhan Co Ltd
Priority to: CN201310503007.0A
Publication of: CN103501306A
Priority to: PCT/CN2014/088468 (WO2015058631A1)
Application granted
Publication of: CN103501306B

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04L: TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00: Network architectures or network communication protocols for network security
    • H04L63/14: Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
    • H04L63/1441: Countermeasures against malicious traffic
    • H04L63/1483: Countermeasures against malicious traffic: service impersonation, e.g. phishing, pharming or web spoofing
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90: Details of database functions independent of the retrieved data types
    • G06F16/95: Retrieval from the web
    • G06F16/955: Retrieval from the web using information identifiers, e.g. uniform resource locators [URL]

Abstract

The invention discloses a URL identification method, comprising: obtaining the page content corresponding to a URL to be detected; matching the page content against the page templates in a pre-generated malicious page template library; and, when the matching similarity between the page content and any one of the page templates exceeds a first preset threshold, determining that the URL to be detected is a malicious URL. Embodiments of the invention also provide a corresponding server. The URL identification method provided by the embodiments can identify malicious URLs quickly, thereby improving Internet security.

Description

URL identification method, server and system
Technical field
The present invention relates to the field of Internet technology, and in particular to a URL identification method, server and system.
Background
While the Internet brings convenience to people's lives, its security situation gives no cause for optimism: Trojans and viruses of all kinds spread unchecked disguised as normal files, and phishing websites that imitate legitimate sites to steal user account names and passwords are increasingly rampant.
Two kinds of schemes are generally used to identify and combat malicious websites. The first is based on user reports and manual review: users submit suspicious Uniform Resource Locators (URLs; a URL is also called a web page address, or website address for short), and a URL is added to a malicious URL list once manual review confirms that it is malicious. The second is based on URL feature recognition.
In researching and practicing the prior art, the inventors found that both the manual-review method and the URL-feature-recognition method need a long time to determine whether a URL is malicious, so malicious URLs are identified inefficiently.
Summary of the invention
Embodiments of the present invention provide a URL identification method that can identify malicious URLs quickly, thereby improving Internet security. Embodiments of the present invention also provide a corresponding server and system.
A first aspect of the present invention provides a URL identification method, comprising:
obtaining the page content corresponding to a URL to be detected;
matching the page content against the page templates in a pre-generated malicious page template library; and
when the matching similarity between the page content and any one of the page templates exceeds a first preset threshold, determining that the URL to be detected is a malicious URL.
With reference to the first aspect, in a first possible implementation, the method further comprises:
storing the malicious URL in a preset malicious URL library, and collecting blacklisted URLs into the malicious URL library.
With reference to the first possible implementation of the first aspect, in a second possible implementation, the method further comprises:
updating the malicious page template library according to the malicious URL library.
With reference to the second possible implementation of the first aspect, in a third possible implementation, updating the malicious page template library according to the malicious URL library comprises:
obtaining the page content corresponding to each URL in the malicious URL library;
calculating the similarity between any two of the page contents corresponding to the URLs, and placing URLs whose page contents have a similarity exceeding a second preset threshold into the same set; and
taking the page content corresponding to a URL in any set that contains more URLs than a third preset threshold as a malicious page template, and storing the malicious page template in the malicious page template library.
With reference to the first aspect or any one of the first to third possible implementations of the first aspect, in a fourth possible implementation, obtaining the page content corresponding to the URL to be detected comprises:
receiving the URL to be detected sent by a client; and
downloading, according to the URL to be detected, the page content corresponding to the URL to be detected.
A second aspect of the present invention provides a server, comprising:
an acquiring unit, configured to obtain the page content corresponding to a URL to be detected;
a matching unit, configured to match the page content obtained by the acquiring unit against each page template in a pre-generated malicious page template library; and
a determining unit, configured to determine that the URL to be detected is a malicious URL when the matching unit finds that the matching similarity between the page content and any one of the page templates exceeds a first preset threshold.
With reference to the second aspect, in a first possible implementation, the server further comprises:
a storage unit, configured to store the malicious URL in a preset malicious URL library; and
a collecting unit, configured to collect blacklisted URLs into the malicious URL library.
With reference to the first possible implementation of the second aspect, in a second possible implementation, the server further comprises:
an updating unit, configured to update the malicious page template library according to the malicious URL library.
With reference to the second possible implementation of the second aspect, in a third possible implementation, the updating unit comprises:
an obtaining subunit, configured to obtain the page content corresponding to each URL in the malicious URL library;
a calculating subunit, configured to calculate the similarity between any two of the page contents obtained by the obtaining subunit;
a dividing subunit, configured to place URLs whose page contents have a similarity, as calculated by the calculating subunit, exceeding a second preset threshold into the same set;
a determining subunit, configured to take the page content corresponding to a URL in any set, formed by the dividing subunit, that contains more URLs than a third preset threshold as a malicious page template; and
a storing subunit, configured to store the malicious page template determined by the determining subunit in the malicious page template library.
With reference to the second aspect or any one of the first to third possible implementations of the second aspect, in a fourth possible implementation, the acquiring unit comprises:
a receiving subunit, configured to receive the URL to be detected sent by a client; and
a downloading subunit, configured to download, according to the URL to be detected received by the receiving subunit, the page content corresponding to the URL to be detected.
A third aspect of the present invention provides a URL identification system, comprising a server and a client,
wherein the server is the server described in the technical solution above.
Embodiments of the present invention obtain the page content corresponding to a URL to be detected, match the page content against the page templates in a pre-generated malicious page template library, and determine that the URL to be detected is a malicious URL when the matching similarity between the page content and any one of the page templates exceeds a first preset threshold. Compared with the inefficient identification of malicious URLs in the prior art, the URL identification method provided by the embodiments can identify malicious URLs quickly, thereby improving Internet security.
Brief description of the drawings
To illustrate the technical solutions in the embodiments of the present invention more clearly, the drawings needed for describing the embodiments are briefly introduced below. The drawings described below show only some embodiments of the present invention; those skilled in the art can derive other drawings from them without creative effort.
Fig. 1 is a schematic diagram of an embodiment of the URL identification method in the embodiments of the present invention;
Fig. 2 is a schematic diagram of another embodiment of the URL identification method in the embodiments of the present invention;
Fig. 3 is a schematic diagram of an embodiment of the server in the embodiments of the present invention;
Fig. 4 is a schematic diagram of another embodiment of the server in the embodiments of the present invention;
Fig. 5 is a schematic diagram of another embodiment of the server in the embodiments of the present invention;
Fig. 6 is a schematic diagram of another embodiment of the server in the embodiments of the present invention;
Fig. 7 is a schematic diagram of another embodiment of the server in the embodiments of the present invention;
Fig. 8 is a schematic diagram of another embodiment of the server in the embodiments of the present invention;
Fig. 9 is a schematic diagram of an embodiment of the URL identification system in the embodiments of the present invention.
Detailed description
Embodiments of the present invention provide a URL identification method that can identify malicious URLs quickly, thereby improving Internet security. Embodiments of the present invention also provide a corresponding server and system. Each is described in detail below.
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings. The described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by those skilled in the art from these embodiments without creative effort fall within the scope of protection of the present invention.
Referring to Fig. 1, an embodiment of the URL identification method provided by the embodiments of the present invention includes:
101. Obtain the page content corresponding to a URL to be detected.
102. Match the page content against the page templates in a pre-generated malicious page template library.
The pre-generated malicious page template library may be derived from the page contents corresponding to previously accumulated user-reported URLs or blacklisted URLs.
A blacklisted URL is a URL that a security software program has identified as malicious, or that operations staff have confirmed as malicious by manual review after receiving a user report.
103. When the matching similarity between the page content and any one of the page templates exceeds a first preset threshold, determine that the URL to be detected is a malicious URL.
The first preset threshold may be 80%, 90%, or another value.
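Steps 101 to 103 can be sketched as follows. This is a minimal illustration, not the patented implementation: the function names are hypothetical, and Python's `difflib.SequenceMatcher` is swapped in as a stand-in similarity measure because this embodiment does not prescribe one.

```python
from difflib import SequenceMatcher
from typing import Iterable

FIRST_THRESHOLD = 0.8  # the description suggests 80%, 90%, or another value


def similarity(a: str, b: str) -> float:
    """Stand-in similarity in [0, 1]; an assumption, not the patent's metric."""
    return SequenceMatcher(None, a, b).ratio()


def is_malicious(page: str, templates: Iterable[str],
                 threshold: float = FIRST_THRESHOLD) -> bool:
    """Step 103: malicious if any template matches above the threshold."""
    return any(similarity(page, t) > threshold for t in templates)
```

Because the check stops at the first template that exceeds the threshold, the cost of one detection grows with the size of the template library rather than with the size of the malicious URL library.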
A malicious URL is a URL in which malicious programs such as Trojans and viruses have been deliberately planted, and which lures users into visiting it through disguised "service content". Once a user visits such a URL, the planted Trojan, virus, or other program is triggered and the visitor's computer is infected, exposing the visitor to risks such as the loss of account credentials or private information. Malicious URLs tend to appear on little-known websites of a sales or recommendation nature.
Embodiments of the present invention obtain the page content corresponding to a URL to be detected, match the page content against the page templates in a pre-generated malicious page template library, and determine that the URL to be detected is a malicious URL when the matching similarity between the page content and any one of the page templates exceeds a first preset threshold. Compared with the inefficient identification of malicious URLs in the prior art, the URL identification method provided by the embodiments can identify malicious URLs quickly, thereby improving Internet security.
Optionally, on the basis of the embodiment corresponding to Fig. 1, in an optional embodiment of the URL identification method provided by the embodiments of the present invention, the method may further include:
storing the malicious URL in a preset malicious URL library, and collecting blacklisted URLs into the malicious URL library.
In this embodiment, after a URL to be detected has been matched and found to be malicious, it can be stored in the malicious URL library. The malicious URL library may already hold previously accumulated user-reported URLs or blacklisted URLs, and the server may keep collecting user-reported or blacklisted URLs.
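One way to picture the malicious URL library is as a set that is fed from two sources. The class and method names below are hypothetical, and the in-memory set is an assumption made for brevity; the patent does not say how the library is stored.

```python
class MaliciousUrlLibrary:
    """Hypothetical in-memory stand-in for the preset malicious URL library."""

    def __init__(self):
        self._urls = set()

    def add_detected(self, url):
        """Store a URL that template matching identified as malicious."""
        self._urls.add(url)

    def collect_blacklisted(self, urls):
        """Add URLs blacklisted by security software or by manual review."""
        self._urls.update(urls)

    def __contains__(self, url):
        return url in self._urls

    def __len__(self):
        return len(self._urls)
```

Using a set means a URL that arrives from both sources is stored only once.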
Optionally, on the basis of the optional embodiment corresponding to Fig. 1, in another optional embodiment of the URL identification method provided by the embodiments of the present invention, the method may further include:
updating the malicious page template library according to the malicious URL library.
In this embodiment, because malicious URLs keep increasing, the malicious page template library also needs to be updated continuously so that malicious URLs can be confirmed efficiently.
Optionally, on the basis of the other optional embodiment corresponding to Fig. 1, in another optional embodiment of the URL identification method provided by the embodiments of the present invention, updating the malicious page template library according to the malicious URL library includes:
obtaining the page content corresponding to each URL in the malicious URL library;
calculating the similarity between any two of the page contents corresponding to the URLs, and placing URLs whose page contents have a similarity exceeding a second preset threshold into the same set; and
taking the page content corresponding to a URL in any set that contains more URLs than a third preset threshold as a malicious page template, and storing the malicious page template in the malicious page template library.
In this embodiment, page contents are compared pairwise to calculate their similarity. Many algorithms measure text similarity, such as longest common subsequence, minimum edit distance, Hamming distance, and cosine similarity of feature vectors. The present invention is not limited in this respect; minimum edit distance is used here only as an illustration. Suppose text A is "<html>hi</html>" (string length 15) and text B is "<html>hello</html>" (string length 18). Converting text A into text B requires changing the character 'i' to 'e' and then adding the characters 'l', 'l', and 'o', which takes at least 4 steps, so the minimum edit distance is 4. The similarity of texts A and B can be defined as 1 - (minimum edit distance) / (the larger of the lengths of A and B), that is, 1 - 4/18 ≈ 0.78. If the similarity threshold is set to 0.8, the similarity is below the threshold, and text A and text B are considered dissimilar.
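The worked example above can be reproduced with a short sketch. The function names are hypothetical; the edit distance is the standard Levenshtein dynamic program, and the similarity follows the definition given in the text.

```python
def edit_distance(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions, substitutions."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or keep
        prev = curr
    return prev[-1]


def similarity(a: str, b: str) -> float:
    """1 - (minimum edit distance) / (the larger of the two lengths)."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1 - edit_distance(a, b) / longest


text_a = "<html>hi</html>"
text_b = "<html>hello</html>"
print(edit_distance(text_a, text_b))         # 4
print(round(similarity(text_a, text_b), 2))  # 0.78
```

With a threshold of 0.8, `similarity(text_a, text_b) > 0.8` is false, which matches the conclusion that the two texts are dissimilar.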
The URLs are then grouped according to the similarity results. For example, suppose there are 8 URLs, of which the similar pairs are (URL1, URL3), (URL3, URL7), and (URL4, URL6). Placing similar URLs into the same set divides all the URLs into the following sets:
Set 1: URL1, URL3, URL7
Set 2: URL4, URL6
Set 3: URL2
Set 4: URL5
Set 5: URL8
The sets are sorted by size, and the page content of the URLs in the sets that meet the requirement is chosen as a malicious template. For example, if sets containing at least 3 similar URLs may be selected as templates, only Set 1 in the example above qualifies. The set-size threshold can be adjusted according to the actual situation.
The second preset threshold may be the same as or different from the first preset threshold. The third preset threshold may be a value such as 3, 4, or 5, and is not limited here.
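The grouping in the example can be sketched with a union-find structure, which merges URLs transitively (URL1 and URL7 end up together through URL3, as in Set 1). The function names and the `min_size` parameter are hypothetical, and the similar pairs are taken as given rather than computed.

```python
def group_urls(urls, similar_pairs):
    """Group URLs so that every similar pair lands in the same set."""
    parent = {u: u for u in urls}

    def find(u):
        while parent[u] != u:
            parent[u] = parent[parent[u]]  # path halving
            u = parent[u]
        return u

    for a, b in similar_pairs:
        parent[find(a)] = find(b)

    groups = {}
    for u in urls:
        groups.setdefault(find(u), []).append(u)
    # Largest sets first, as in the description.
    return sorted(groups.values(), key=len, reverse=True)


def template_sets(groups, min_size=3):
    """Keep only the sets large enough to yield a malicious page template."""
    return [g for g in groups if len(g) >= min_size]


urls = [f"URL{i}" for i in range(1, 9)]
pairs = [("URL1", "URL3"), ("URL3", "URL7"), ("URL4", "URL6")]
groups = group_urls(urls, pairs)
print(groups[0])              # ['URL1', 'URL3', 'URL7']
print(template_sets(groups))  # [['URL1', 'URL3', 'URL7']]
```

The text does not say which page in a qualifying set becomes the template, so the sketch stops at selecting the sets and leaves that choice open.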
Optionally, on the basis of Fig. 1 or any optional embodiment corresponding to Fig. 1, in another optional embodiment of the URL identification method provided by the embodiments of the present invention, obtaining the page content corresponding to the URL to be detected may include:
receiving the URL to be detected sent by a client; and
downloading, according to the URL to be detected, the page content corresponding to the URL to be detected.
In this embodiment, after the server receives the URL to be detected, it may also find the corresponding page content directly in locally stored page content.
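The two ways of obtaining page content can be combined in a small helper. This is a sketch under stated assumptions: `local_pages` and `download` are hypothetical names, the downloader is passed in as a callable so that no particular HTTP library is implied, and keeping a copy of downloaded content is an added assumption rather than something the text requires.

```python
def get_page_content(url, local_pages, download):
    """Return page content for url, preferring locally stored content.

    local_pages: dict mapping URL -> content (hypothetical local store)
    download:    callable taking a URL, returning content or None on failure
    """
    if url in local_pages:
        return local_pages[url]
    content = download(url)
    if content is not None:
        local_pages[url] = content  # assumption: keep a copy for later lookups
    return content
```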
For ease of understanding, and referring to Fig. 2, the URL identification process in the embodiments of the present invention is described below using an application scenario as an example:
S100. The server obtains a URL to be detected.
S110. Determine whether the page content corresponding to the URL to be detected can be downloaded. If it can, perform step S130; if it cannot, perform step S120.
S120. When the page content corresponding to the URL to be detected cannot be downloaded, set the state of the URL to be detected to non-malicious.
S130. When the page content corresponding to the URL to be detected can be downloaded, determine whether the page content matches any page template in the malicious page template library. If it matches, perform step S140; if it does not, perform step S150.
S140. After the page content corresponding to the URL to be detected matches a page template in the malicious page template library, confirm that the URL to be detected is a malicious URL and set its state to malicious.
S150. When there is no match, continue with the existing detection logic of the prior art.
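The branching in steps S100 to S150 can be sketched as a single function. All names are hypothetical, the downloader, template matcher, and existing detection logic are injected as callables, and the `"undetermined"` state returned when no existing detection logic is supplied is an assumption that the text does not define.

```python
def check_url(url, download, matches_template, existing_detection=None):
    """Follow steps S100 to S150 for one URL and return its resulting state."""
    content = download(url)              # S110: try to download the page
    if content is None:
        return "non-malicious"           # S120: page cannot be downloaded
    if matches_template(content):        # S130: compare with the template library
        return "malicious"               # S140: matched a malicious page template
    if existing_detection is not None:   # S150: hand over to existing logic
        return existing_detection(url, content)
    return "undetermined"                # assumption: no further logic supplied
```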
Referring to Fig. 3, an embodiment of the server 20 provided by the embodiments of the present invention includes:
an acquiring unit 201, configured to obtain the page content corresponding to a URL to be detected;
a matching unit 202, configured to match the page content obtained by the acquiring unit 201 against each page template in a pre-generated malicious page template library; and
a determining unit 203, configured to determine that the URL to be detected is a malicious URL when the matching unit 202 finds that the matching similarity between the page content and any one of the page templates exceeds a first preset threshold.
In this embodiment, the acquiring unit 201 obtains the page content corresponding to the URL to be detected; the matching unit 202 matches the page content obtained by the acquiring unit 201 against each page template in the pre-generated malicious page template library; and the determining unit 203 determines that the URL to be detected is a malicious URL when the matching unit 202 finds that the matching similarity between the page content and any one of the page templates exceeds the first preset threshold. Compared with the inefficient identification of malicious URLs in the prior art, the server provided by the embodiments can identify malicious URLs quickly, thereby improving Internet security.
Optionally, on the basis of the embodiment corresponding to Fig. 3, and referring to Fig. 4, in another embodiment of the server provided by the embodiments of the present invention, the server 20 further includes:
a storage unit 204, configured to store the malicious URL in a preset malicious URL library; and
a collecting unit 205, configured to collect blacklisted URLs into the malicious URL library.
Optionally, on the basis of the embodiment corresponding to Fig. 4, and referring to Fig. 5, in another embodiment of the server provided by the embodiments of the present invention, the server 20 further includes:
an updating unit 206, configured to update the malicious page template library according to the malicious URL library.
Optionally, on the basis of the embodiment corresponding to Fig. 5, and referring to Fig. 6, in another embodiment of the server provided by the embodiments of the present invention, the updating unit 206 includes:
an obtaining subunit 2061, configured to obtain the page content corresponding to each URL in the malicious URL library;
a calculating subunit 2062, configured to calculate the similarity between any two of the page contents obtained by the obtaining subunit 2061;
a dividing subunit 2063, configured to place URLs whose page contents have a similarity, as calculated by the calculating subunit 2062, exceeding a second preset threshold into the same set;
a determining subunit 2064, configured to take the page content corresponding to a URL in any set, formed by the dividing subunit 2063, that contains more URLs than a third preset threshold as a malicious page template; and
a storing subunit 2065, configured to store the malicious page template determined by the determining subunit 2064 in the malicious page template library.
Optionally, on the basis of the embodiment corresponding to Fig. 3, and referring to Fig. 7, in another embodiment of the server provided by the embodiments of the present invention, the acquiring unit 201 includes:
a receiving subunit 2011, configured to receive the URL to be detected sent by a client; and
a downloading subunit 2012, configured to download, according to the URL to be detected received by the receiving subunit 2011, the page content corresponding to the URL to be detected.
Embodiments of the present invention also provide a computer storage medium that stores a program which, when executed, performs some or all of the steps of the URL identification method described above.
Referring to Fig. 8, Fig. 8 is a schematic structural diagram of the server 20 in the embodiments of the present invention. The server 20 may include an input device 210, an output device 220, a processor 230, and a memory 240.
The memory 240 may include read-only memory and random access memory, and provides instructions and data to the processor 230. Part of the memory 240 may also include non-volatile random access memory (NVRAM).
The memory 240 stores the following elements, executable modules or data structures, or a subset or superset of them:
Operating instructions: various operating instructions used to implement various operations.
Operating system: various system programs used to implement basic services and to handle hardware-based tasks.
In the embodiments of the present invention, the processor 230 performs the following operations by calling the operating instructions stored in the memory 240 (the operating instructions may be stored in the operating system):
obtaining the page content corresponding to a URL to be detected;
matching the page content against the page templates in a pre-generated malicious page template library; and
when the matching similarity between the page content and any one of the page templates exceeds a first preset threshold, determining that the URL to be detected is a malicious URL.
Compared with the inefficient identification of malicious URLs in the prior art, the URL identification method provided by the embodiments can identify malicious URLs quickly, thereby improving Internet security.
The processor 230 controls the operation of the server 20 and may also be called a CPU (Central Processing Unit). In a specific application, the components of the server 20 are coupled together by a bus system 250, which may include a power bus, a control bus, a status signal bus, and so on in addition to a data bus. For clarity of illustration, however, the various buses are all labeled as the bus system 250 in the figure.
The method disclosed in the embodiments above may be applied to, or implemented by, the processor 230. The processor 230 may be an integrated circuit chip with signal processing capability. During implementation, each step of the method may be completed by an integrated logic circuit of hardware in the processor 230 or by instructions in software form. The processor 230 may be a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component, and can implement or execute the methods, steps, and logic block diagrams disclosed in the embodiments of the present invention. The general-purpose processor may be a microprocessor or any conventional processor. The steps of the method disclosed in the embodiments may be executed directly by a hardware decoding processor, or by a combination of hardware and software modules in a decoding processor. The software module may reside in a storage medium well established in the art, such as random access memory, flash memory, read-only memory, programmable read-only memory, electrically erasable programmable memory, or a register. The storage medium is located in the memory 240; the processor 230 reads the information in the memory 240 and completes the steps of the method in combination with its hardware.
Alternatively, described malice network address also can be stored in the malice URL library pre-set by processor 230, and Collecting is drawn black network address to described malice URL library.
Alternatively, processor 230 also can update described malice Page Template storehouse according to described malice URL library.
Alternatively, the page that each network address during processor 230 specifically can obtain described malice URL library is corresponding Face content;Calculate the similarity of any two content of pages in the content of pages that each network address described is corresponding, The similarity of described any two content of pages is divided into identity set more than the network address of the second predetermined threshold value; The network address quantity content of pages corresponding more than network address in arbitrary set of the 3rd preset threshold value will be comprised as evil Meaning Page Template, and described malice Page Template is stored in described malice Page Template storehouse.
Optionally, the input device 210 may receive the URL to be detected sent by the client;
the processor 230 then downloads, based on the URL to be detected, the page content corresponding to that URL.
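The receive-and-download step might look like the Python sketch below, which uses the standard library's `urllib.request.urlopen`. The function names, the timeout value, the UTF-8 decoding and the scheme check are all assumptions added for illustration; the patent only states that the URL is received and the page content is downloaded.

```python
from typing import Callable, Optional
from urllib.request import urlopen


def download_page(url: str, timeout: float = 10.0) -> str:
    """Download the page at `url` and decode it leniently as UTF-8."""
    with urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")


def handle_detection_request(url: str,
                             fetch: Callable[[str], str] = download_page) -> Optional[str]:
    """Take a URL sent by a client and return its page content, or None on failure."""
    if not url.startswith(("http://", "https://")):
        return None  # assumption: only web URLs are accepted for detection
    try:
        return fetch(url)
    except OSError:  # URLError and socket timeouts are OSError subclasses
        return None
```

Passing the downloader in as the `fetch` parameter keeps the request handler testable without network access.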
Referring to Fig. 9, an embodiment of the URL identification system provided by the embodiments of the present invention includes a server 20 and a client 30, the server 20 being communicatively connected with the client 30;
In the embodiments of the present invention there may be multiple clients; Fig. 9 depicts only three, but in practice there may be many more.
The server 20 is configured to: obtain the page content corresponding to a URL to be detected; match the page content against any page template in a pre-generated malicious page template library; and, when the matching similarity between the page content and the page template exceeds a first preset threshold, determine that the URL to be detected is a malicious URL.
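The matching decision made by the server can be illustrated with a short Python sketch. It assumes, purely for illustration, that matching similarity is computed with `difflib.SequenceMatcher.ratio()`; the function name `is_malicious` and the default threshold of 0.9 are hypothetical and not taken from the patent.

```python
from difflib import SequenceMatcher
from typing import Iterable


def is_malicious(page_content: str,
                 template_library: Iterable[str],
                 first_threshold: float = 0.9) -> bool:
    """Return True if the page matches any template above the first threshold."""
    return any(
        SequenceMatcher(None, page_content, template).ratio() > first_threshold
        for template in template_library
    )
```

Because `any` stops at the first template that exceeds the threshold, a page that closely copies a known malicious template is flagged without scanning the rest of the library.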
A person of ordinary skill in the art will appreciate that all or part of the steps in the methods of the above embodiments may be completed by a program instructing relevant hardware. The program may be stored in a computer-readable storage medium, and the storage medium may include a ROM, a RAM, a magnetic disk, an optical disc, or the like.
The URL identification method, server and system provided by the embodiments of the present invention have been described in detail above. Specific examples are used herein to explain the principles and implementations of the present invention, and the description of the above embodiments is intended only to help in understanding the method of the present invention and its core idea. Meanwhile, a person of ordinary skill in the art may, in accordance with the idea of the present invention, make changes to the specific implementations and the scope of application. In summary, the content of this specification should not be construed as limiting the present invention.

Claims (7)

1. A URL identification method, characterized by comprising:
obtaining the page content corresponding to a URL to be detected;
matching the page content against any page template in a pre-generated malicious page template library;
when the matching similarity between the page content and the page template exceeds a first preset threshold, determining that the URL to be detected is a malicious URL;
updating the malicious page template library according to the malicious URL library;
wherein updating the malicious page template library according to the malicious URL library comprises:
obtaining the page content corresponding to each URL in the malicious URL library;
calculating the similarity between any two pieces of page content among the page content corresponding to the URLs, and placing URLs whose page contents have a similarity exceeding a second preset threshold into the same set;
taking the page content corresponding to a URL in any set that contains more URLs than a third preset threshold as a malicious page template, and storing the malicious page template in the malicious page template library.
2. The method according to claim 1, characterized in that the method further comprises:
storing the malicious URL in a preset malicious URL library, and collecting blacklisted URLs into the malicious URL library.
3. The method according to any one of claims 1-2, characterized in that obtaining the page content corresponding to the URL to be detected comprises:
receiving the URL to be detected sent by a client;
downloading, based on the URL to be detected, the page content corresponding to the URL to be detected.
4. A server, characterized by comprising:
an acquiring unit, configured to obtain the page content corresponding to a URL to be detected;
a matching unit, configured to match the page content obtained by the acquiring unit against each page template in a pre-generated malicious page template library;
a determining unit, configured to determine that the URL to be detected is a malicious URL when the matching unit finds that the matching similarity between the page content and any of the page templates exceeds a first preset threshold;
an updating unit, configured to update the malicious page template library according to the malicious URL library;
wherein the updating unit comprises:
an obtaining subunit, configured to obtain the page content corresponding to each URL in the malicious URL library;
a calculating subunit, configured to calculate the similarity between any two pieces of page content among the page content, obtained by the obtaining subunit, corresponding to the URLs;
a dividing subunit, configured to place URLs whose page contents have a similarity, as calculated by the calculating subunit, exceeding a second preset threshold into the same set;
a determining subunit, configured to take the page content corresponding to a URL in any set, divided by the dividing subunit, that contains more URLs than a third preset threshold as a malicious page template;
a storing subunit, configured to store the malicious page template determined by the determining subunit in the malicious page template library.
5. The server according to claim 4, characterized in that the server further comprises:
a storage unit, configured to store the malicious URL in a preset malicious URL library;
a collecting unit, configured to collect blacklisted URLs into the malicious URL library.
6. The server according to any one of claims 4-5, characterized in that the acquiring unit comprises:
a receiving subunit, configured to receive the URL to be detected sent by a client;
a downloading subunit, configured to download, based on the URL to be detected received by the receiving subunit, the page content corresponding to the URL to be detected.
7. A URL identification system, characterized by comprising a server and a client,
wherein the server is the server according to any one of claims 4-6.
CN201310503007.0A 2013-10-23 2013-10-23 A kind of network address knows method for distinguishing, server and system Active CN103501306B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201310503007.0A CN103501306B (en) 2013-10-23 2013-10-23 A kind of network address knows method for distinguishing, server and system
PCT/CN2014/088468 WO2015058631A1 (en) 2013-10-23 2014-10-13 Method, server and system for malicious url identification

Publications (2)

Publication Number Publication Date
CN103501306A CN103501306A (en) 2014-01-08
CN103501306B true CN103501306B (en) 2016-09-14

Family

ID=49866478

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201310503007.0A Active CN103501306B (en) 2013-10-23 2013-10-23 A kind of network address knows method for distinguishing, server and system

Country Status (2)

Country Link
CN (1) CN103501306B (en)
WO (1) WO2015058631A1 (en)

Families Citing this family (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103501306B (en) * 2013-10-23 2016-09-14 腾讯科技(武汉)有限公司 A kind of network address knows method for distinguishing, server and system
CN104852883A (en) * 2014-02-14 2015-08-19 腾讯科技(深圳)有限公司 Method and system for protecting safety of account information
CN104079560A (en) * 2014-06-05 2014-10-01 腾讯科技(深圳)有限公司 Web address security detecting method and device and server
CN108683666B (en) * 2018-05-16 2021-04-16 新华三信息安全技术有限公司 Webpage identification method and device
CN109992666A (en) * 2019-03-22 2019-07-09 阿里巴巴集团控股有限公司 Method, apparatus and non-transitory machine readable media for processing feature library
CN111198939B (en) * 2019-12-27 2021-11-23 北京健康之家科技有限公司 Statement similarity analysis method and device and computer equipment
CN114172676A (en) * 2020-09-10 2022-03-11 中国移动通信有限公司研究院 Malicious website detection method, device, equipment and storage medium
CN112084501A (en) * 2020-09-18 2020-12-15 珠海豹趣科技有限公司 Malicious program detection method and device, electronic device and storage medium
CN113098859B (en) * 2021-03-30 2023-03-31 深圳市欢太科技有限公司 Webpage page rollback method, device, terminal and storage medium
CN113239305A (en) * 2021-05-19 2021-08-10 中国电子科技集团公司第三十研究所 Target detection and identification method in cloud computing environment
CN113904827B (en) * 2021-09-29 2024-03-19 恒安嘉新(北京)科技股份公司 Identification method and device for counterfeit website, computer equipment and medium

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102082792A (en) * 2010-12-31 2011-06-01 成都市华为赛门铁克科技有限公司 Phishing webpage detection method and device

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102693236A (en) * 2011-03-24 2012-09-26 苏州风采信息技术有限公司 Bad information filtering method based on content understanding
CN102170640A (en) * 2011-06-01 2011-08-31 南通海韵信息技术服务有限公司 Mode library-based smart mobile phone terminal adverse content website identifying method
CN102339320B (en) * 2011-11-04 2013-08-28 华为数字技术(成都)有限公司 Malicious web recognition method and device
CN102609516A (en) * 2012-02-08 2012-07-25 苏州中联互通信息科技有限公司 Content understanding-based bad information filter method
CN103501306B (en) * 2013-10-23 2016-09-14 腾讯科技(武汉)有限公司 A kind of network address knows method for distinguishing, server and system

Also Published As

Publication number Publication date
CN103501306A (en) 2014-01-08
WO2015058631A1 (en) 2015-04-30

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant