CN110309402A - Method and system for detecting a website - Google Patents
- Publication number
- CN110309402A (application CN201810164312.4A)
- Authority
- CN
- China
- Prior art keywords
- website
- detected
- similarity
- library
- keyword
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/953—Querying, e.g. by the use of web search engines
- G06F16/9535—Search customisation based on user profiles and personalisation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/958—Organisation or management of web site content, e.g. publishing, maintaining pages or automatic linking
Landscapes
- Engineering & Computer Science (AREA)
- Databases & Information Systems (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
Abstract
The invention discloses a method and system for detecting a website. The method comprises: determining the similarity between the web page structure of a website to be detected and that of a benchmark website; when the similarity is greater than a first preset value, judging whether a keyword of a specified type exists in the website to be detected; and when it is determined that a keyword of the specified type exists in the website to be detected, determining that the website to be detected is a website of the specified type. The invention solves the technical problem in the prior art that the accuracy of detecting whether a website is a violating website is low.
Description
Technical field
The present invention relates to the field of network detection, and in particular to a method and system for detecting a website.
Background art
With the rapid development of Internet technology, people are disturbed by a large amount of harmful information when accessing data on various websites; gambling and pornographic information in particular is rampant. Identifying harmful information on websites is therefore a prerequisite for a clean ("green") network.
At present, detection of gambling and pornographic network information mainly takes the following two forms:
(1) Identifying violating websites based on a lexicon of sensitive keywords. This approach requires a great deal of manual effort to update the lexicon regularly, its recall depends on the sample lexicon of sensitive keywords, and it also produces a large number of false positives.
(2) Identifying violating websites based on image recognition. This approach not only consumes a large amount of computing resources but also has a low recognition rate.
No effective solution has yet been proposed for the above problem in the prior art, namely the low accuracy of detecting whether a website is a violating website.
Summary of the invention
The embodiments of the present invention provide a method and system for detecting a website, so as to at least solve the technical problem in the prior art that the accuracy of detecting whether a website is a violating website is low.
According to one aspect of the embodiments of the present invention, a method for detecting a website is provided, comprising: determining the similarity between the web page structure of a website to be detected and that of a benchmark website; when the similarity is greater than a first preset value, judging whether a keyword of a specified type exists in the website to be detected; and when it is determined that a keyword of the specified type exists in the website to be detected, determining that the website to be detected is a website of the specified type.
According to another aspect of the embodiments of the present invention, a method for detecting a website is also provided, comprising: obtaining the data to be detected of a website to be detected; determining a first similarity between the data to be detected and the data in an abnormal website library, where the abnormal website library includes the web page structures of multiple abnormal websites; determining a second similarity between the data to be detected and the keywords in a sensitive lexicon; and, if the first similarity is greater than a first threshold and the second similarity is greater than a second threshold, determining that the website to be detected is a website of the specified type.
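The two-threshold decision described above can be sketched as follows. This is only an illustration: the function name `is_specified_type`, the default threshold values, and the assumption that both similarities are already available as numbers between 0 and 1 are not taken from the source.

```python
def is_specified_type(first_similarity: float,
                      second_similarity: float,
                      first_threshold: float = 0.8,
                      second_threshold: float = 0.5) -> bool:
    """Return True only when both similarities exceed their thresholds.

    first_similarity:  similarity to the page structures in the abnormal
                       website library.
    second_similarity: similarity to the keywords in the sensitive lexicon.
    The default thresholds are placeholder values chosen for illustration.
    """
    return (first_similarity > first_threshold
            and second_similarity > second_threshold)
```

Both comparisons are strict, matching the "greater than" wording of the text, so a similarity exactly equal to its threshold does not trigger a detection.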
According to another aspect of the embodiments of the present invention, a method for detecting a website is also provided, comprising: receiving the data information of a website to be detected; evaluating the data information of the website to be detected based on multiple abnormality detection libraries to obtain the risk value of the website to be detected, where different abnormality detection libraries correspond to different judgment rules, and a judgment rule is used to determine the risk value of the website to be detected under the corresponding abnormality detection library; and determining the website type of the website to be detected based on its risk value.
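One way to picture the multi-library evaluation is as a set of rules, one per library, each returning a risk value. The sketch below is hypothetical throughout: the source does not say how risk values are computed or combined, so the rule bodies, the field names `sensitive_hits` and `structure_similarity`, the summing of risk values, and the threshold are all assumptions.

```python
from typing import Callable, Dict

# A judgment rule maps a website's data information to a risk value
# under one abnormality detection library.
Rule = Callable[[dict], float]


def evaluate_risk(site_data: dict, libraries: Dict[str, Rule]) -> Dict[str, float]:
    """Apply each library's own judgment rule to the website's data."""
    return {name: rule(site_data) for name, rule in libraries.items()}


def site_type_from_risk(risks: Dict[str, float], threshold: float = 1.0) -> str:
    """Assumed combination: sum the per-library risks and compare to a threshold."""
    return "abnormal" if sum(risks.values()) > threshold else "normal"


# Hypothetical libraries and rules, for illustration only.
libraries: Dict[str, Rule] = {
    "sensitive_lexicon": lambda d: min(1.0, 0.5 * d.get("sensitive_hits", 0)),
    "abnormal_site_library": lambda d: d.get("structure_similarity", 0.0),
}
```

For example, `site_type_from_risk(evaluate_risk({"sensitive_hits": 2, "structure_similarity": 0.7}, libraries))` classifies the website as abnormal under these assumed rules.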
According to another aspect of the embodiments of the present invention, a system for detecting a website is also provided, comprising: an input device for obtaining a website to be detected; and a processor for determining the similarity between the web page structure of the website to be detected and that of a benchmark website and, when the similarity is greater than a first preset value and it is determined that a keyword of a specified type exists in the website to be detected, determining that the website to be detected is a website of the specified type.
According to another aspect of the embodiments of the present invention, a storage medium is also provided. The storage medium includes a stored program, where, when the program runs, the device on which the storage medium resides is controlled to execute the method for detecting a website.
According to another aspect of the embodiments of the present invention, a processor is also provided. The processor is used to run a program, where the method for detecting a website is executed when the program runs.
According to another aspect of the embodiments of the present invention, a system for detecting a website is also provided, comprising: a processor; and a memory, connected to the processor, for providing the processor with instructions for the following processing steps: determining the similarity between the web page structure of a website to be detected and that of a benchmark website; when the similarity is greater than a first preset value, judging whether a keyword of a specified type exists in the website to be detected; and when it is determined that a keyword of the specified type exists in the website to be detected, determining that the website to be detected is a website of the specified type.
In the embodiments of the present invention, a detection approach based on website structure is adopted. The similarity between the web page structure of the website to be detected and that of the benchmark website is determined; then, when the similarity is greater than the first preset value, it is judged whether a keyword of the specified type exists in the website to be detected; finally, when it is determined that such a keyword exists, the website to be detected is determined to be a website of the specified type. This achieves the purpose of improving the efficiency of detecting violating websites, thereby achieving the technical effect of avoiding the low recall rate and high false-positive rate caused by using keyword detection alone, and in turn solving the technical problem in the prior art that the accuracy of detecting whether a website is a violating website is low.
Brief description of the drawings
The drawings described here are provided for a further understanding of the present invention and form a part of this application. The illustrative embodiments of the present invention and their descriptions are used to explain the present invention and do not constitute an improper limitation of it. In the drawings:
Fig. 1 is a schematic structural diagram of a system for detecting a website according to an embodiment of the present invention;
Fig. 2 is a flowchart of a method for detecting a website according to an embodiment of the present invention;
Fig. 3 is a flowchart of a method for detecting a website according to an embodiment of the present invention;
Fig. 4 (a) is a flowchart of an optional method for detecting a website according to an embodiment of the present invention;
Fig. 4 (b) is a flowchart of an optional method for detecting a website according to an embodiment of the present invention;
Fig. 5 is a flowchart of optionally building an abnormality detection library according to an embodiment of the present invention;
Fig. 6 is a flowchart of a method for detecting a website according to an embodiment of the present invention;
Fig. 7 is a schematic structural diagram of an apparatus for detecting a website according to an embodiment of the present invention; and
Fig. 8 is a hardware block diagram of a terminal according to an embodiment of the present invention.
Detailed description of the embodiments
To enable those skilled in the art to better understand the solution of the present invention, the technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings in the embodiments. Obviously, the described embodiments are only some of the embodiments of the present invention, not all of them. All other embodiments obtained by those of ordinary skill in the art on the basis of the embodiments of the present invention, without creative work, shall fall within the scope of protection of the present invention.
It should be noted that the terms "first", "second", and the like in the description, the claims, and the above drawings are used to distinguish similar objects and are not used to describe a particular order or sequence. It should be understood that data used in this way are interchangeable where appropriate, so that the embodiments of the present invention described here can be implemented in orders other than those illustrated or described here. In addition, the terms "include" and "have" and any variations of them are intended to cover non-exclusive inclusion. For example, a process, method, system, product, or device that contains a series of steps or units is not necessarily limited to the steps or units clearly listed, but may include other steps or units that are not clearly listed or that are inherent to the process, method, product, or device.
First, some of the nouns and terms that appear in the description of the embodiments of this application are explained as follows:
(1) Cluster analysis is a statistical analysis technique and data mining algorithm for studying the classification of samples or indicators. For example, it is the analytic process of grouping a set of physical or abstract objects into multiple classes made up of similar objects. A cluster is made up of several patterns, where a pattern generally refers to a vector of measurements or a point in a multidimensional space.
(2) DOM (Document Object Model) is the standard programming interface recommended by the World Wide Web Consortium for processing extensible markup languages. On a web page, the objects that make up the page or document are organized in a tree structure, namely the DOM tree.
Embodiment 1
According to an embodiment of the present invention, a system for detecting a website is provided. It should be noted that the system proposed in this application can be applied to network detection. For example, to prevent underage children in a family from accessing unhealthy websites, parents can use the system for detecting a website provided in this application. When a child in the household accesses a website on a computer, the system matches the web page structure of that website against those of violating websites (for example, gambling websites and pornographic websites), obtains the maximum of the similarities to the multiple violating websites, and then determines whether this maximum similarity is greater than a preset similarity. If it is, the system further detects whether the website contains a sensitive keyword; if the website contains a sensitive keyword, the website is determined to be a violating website of the kind corresponding to that keyword. After determining that the website is a violating website, the system controls the computer to shut down or generates a warning on the visited website, to remind the visitor that the website is a violating website.
As can be seen from the above, the system for detecting a website proposed in this application can perform multi-dimensional detection on the website to be detected. Multi-dimensional detection includes, but is not limited to, detecting the website to be detected along dimensions such as web page structure and sensitive keywords, to determine whether the website to be detected is a violating website (for example, a gambling website or a pornographic website), so as to clean up the network environment.
Specifically, as shown in Fig. 1, the system for detecting a website provided in this application mainly includes an input device 10 and a processor 20. The input device 10 is used to obtain the website to be detected. The processor 20 is used to determine the similarity between the web page structure of the website to be detected and that of the benchmark website and, when the similarity is greater than the first preset value, to determine that the website to be detected is a website of the specified type if a keyword of the specified type exists in it.
It should be noted that the data to be detected of the website to be detected can be obtained, and the website type of the website to be detected can be determined from that data. The data to be detected may be, but is not limited to, the web page structure of the website to be detected, the keywords it contains, and so on. The benchmark website is a website in the abnormal website library, which holds multiple violating websites. The keyword of the specified type is a sensitive keyword that can identify the website type; for example, "casino", "gambling", and "mahjong" can identify a website as a gambling website. The specified type is a type of website, for example a normal website, a gambling website, or a pornographic website.
In an optional embodiment, the user enters a URL through the input device to specify the website to be accessed, which determines the website to be detected. The processor connected to the input device obtains the data to be detected of that website, derives the web page structure of the website from this data, and matches it against the web page structure of each benchmark website in the abnormal website library to obtain multiple similarities. The largest of these is taken as the similarity between the website to be detected and the benchmark website. If this similarity is less than or equal to the first preset value, the website to be detected does not match a violating website, and no further detection is performed on it. If the similarity is greater than the first preset value, the website to be detected may be a violating website and needs further detection, namely keyword detection. Specifically, the keywords in the website to be detected are extracted and compared to see whether they match keywords of the specified type. If they match, the website to be detected is determined to contain a keyword of the specified type and is therefore determined to be a website of the specified type corresponding to that keyword. For example, if matching determines that the website to be detected contains the keyword "gambling", the website is determined to be a gambling website.
As can be seen from the above, the website to be detected is obtained through the input device; the processor determines the similarity between the web page structure of the website to be detected and that of the benchmark website and, when the similarity is greater than the first preset value, determines that the website to be detected is a website of the specified type if a keyword of the specified type exists in it.
It is easy to see that the website to be detected is examined along two dimensions, namely the similarity between its web page structure and that of the benchmark website, and whether a keyword of the specified type exists in it, rather than from the sensitive lexicon alone. This effectively avoids the low recall rate and high false-positive rate that result from detecting a website with keywords only.
As can be seen from the above, the system for detecting a website provided in this application can improve the efficiency of detecting violating websites, thereby achieving the technical effect of avoiding the low recall rate and high false-positive rate caused by using keyword detection alone, and in turn solving the technical problem in the prior art that the accuracy of detecting whether a website is a violating website is low.
In an optional embodiment, the system for detecting a website provided in this application further includes a memory. The memory is used to store websites to be detected that have been determined to be websites of the specified type. Specifically, after the website to be detected is determined to be a website of the specified type (for example, a gambling website), it is stored in the abnormal website library, which increases the number of benchmark websites in the library and so helps ensure accurate detection results. In addition, because Internet technology develops rapidly, the benchmark websites in an existing abnormal website library may no longer be suitable for detecting current violating websites, so the abnormal website library needs to be updated. Storing each detected violating website in the abnormal website library, as the memory does, achieves the purpose of updating the library.
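The library update described above amounts to adding each newly detected website as a future benchmark. A minimal sketch follows, assuming the library is an in-memory dictionary keyed by URL; the function name `store_detected_site` and the record layout are hypothetical.

```python
def store_detected_site(abnormal_library: dict, url: str,
                        site_type: str, tree_paths: set) -> None:
    """Add a newly detected website so it can serve as a benchmark later.

    abnormal_library maps a URL to a record holding the detected type and
    the website's page structure (here, an immutable copy of its path set).
    """
    abnormal_library[url] = {
        "type": site_type,
        "paths": frozenset(tree_paths),
    }
```

Storing the structure alongside the type means later detections can be compared against this website without fetching it again.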
In an optional embodiment, the processor is also used to obtain the DOM tree of the website to be detected; decompose the DOM tree to obtain a tree path set; and determine the similarity between the web page structure of the website to be detected and that of the benchmark website according to the tree path set.
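The decomposition of a DOM tree into a tree path set, and the comparison of two such sets, can be sketched with the Python standard library. The source does not specify how paths are formed or which similarity measure is used, so the root-to-element tag paths and the Jaccard index below are assumptions, and the names `PathCollector`, `tree_paths`, and `structure_similarity` are hypothetical.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag and so never become parents.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class PathCollector(HTMLParser):
    """Collects the root-to-element tag path of every element in a page."""

    def __init__(self):
        super().__init__()
        self.stack = []     # tags of the currently open elements
        self.paths = set()  # e.g. {"html", "html/body", "html/body/div"}

    def handle_starttag(self, tag, attrs):
        self.paths.add("/".join(self.stack + [tag]))
        if tag not in VOID_TAGS:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the matching open tag, tolerating unclosed elements.
        if tag in self.stack:
            while self.stack and self.stack.pop() != tag:
                pass


def tree_paths(html: str) -> set:
    """Decompose an HTML document into its set of tag paths."""
    collector = PathCollector()
    collector.feed(html)
    collector.close()
    return collector.paths


def structure_similarity(a: set, b: set) -> float:
    """Jaccard index of two path sets (an assumed similarity measure)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)
```

Two pages that share the same nesting of elements but differ in text content produce identical path sets under this sketch, which is the property a structure-based comparison relies on.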
In an optional embodiment, the processor is also used to extract the keywords in the tree path set; compare the keywords in the tree path set with the keywords of the specified type to obtain a similarity; and, when the similarity is greater than a similarity threshold, determine that a keyword of the specified type exists in the website to be detected.
It should be noted that the system for detecting a website provided in this application can also detect the website to be detected based on domain name information. It mainly determines whether the website to be detected is a website of the specified type from the similarity between the domain name of the website to be detected and that of the benchmark website, and/or from how the domain name price of the website to be detected compares with a preset domain name price. Specifically, when the web page structure similarity is less than or equal to the first preset value and greater than a second preset value, the processor judges whether the similarity between the domain name of the website to be detected and the domain name of the benchmark website is greater than a third preset value, and/or whether the domain name price of the website to be detected is less than a fourth preset value. When the domain name similarity is determined to be greater than the third preset value, and/or the domain name price is determined to be less than the fourth preset value, the processor judges whether a keyword of the specified type exists in the website to be detected; when such a keyword is determined to exist, the website to be detected is determined to be a website of the specified type. For example, if the processor determines that the similarity between the domain name of the website to be detected and the domain name of a benchmark violating website is greater than the third preset value, and the domain name price of the website to be detected is less than the fourth preset value, it determines that the website to be detected is a violating website, or an abnormal website.
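The domain-name check above can be sketched as two small predicates. The source does not name a domain similarity measure, so `difflib.SequenceMatcher` is used here purely as a stand-in; the function names, the preset values, and the `require_both` switch (which models the "and/or" wording) are assumptions.

```python
from difflib import SequenceMatcher


def needs_domain_check(structure_similarity: float,
                       first_preset: float, second_preset: float) -> bool:
    """The domain check applies when second_preset < similarity <= first_preset."""
    return second_preset < structure_similarity <= first_preset


def domain_similarity(a: str, b: str) -> float:
    """Assumed measure: character-level similarity ratio between two domain names."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def domain_check_passes(domain: str, price: float, benchmark_domains: list,
                        third_preset: float = 0.8,
                        fourth_preset: float = 100.0,
                        require_both: bool = False) -> bool:
    """Return True when the website should go on to keyword detection."""
    best = max((domain_similarity(domain, b) for b in benchmark_domains),
               default=0.0)
    similar = best > third_preset   # domain resembles a benchmark domain
    cheap = price < fourth_preset   # domain price is below the preset price
    return (similar and cheap) if require_both else (similar or cheap)
```

A website that passes this check is not yet classified; as the text states, the keyword judgment still has to confirm that a keyword of the specified type exists.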
It should also be noted that the system for detecting a website provided in this application can detect not only along the three dimensions of the sensitive lexicon, the abnormality detection library, and the domain name information library, but also along other dimensions, which are not described further here.
Embodiment 2
According to an embodiment of the present invention, an embodiment of a method for detecting a website is also provided. It should be noted that the steps illustrated in the flowcharts of the drawings can be executed in a computer system, for example as a set of computer-executable instructions, and that, although a logical order is shown in the flowcharts, in some cases the steps shown or described can be executed in an order different from the one given here. The method embodiment provided in this application can be executed in the system for detecting a website of Embodiment 1. Fig. 2 shows a flowchart of a method for detecting a website; as shown in Fig. 2, the method specifically includes the following steps:
Step S202: determine the similarity between the web page structure of the website to be detected and that of the benchmark website.
It should be noted that the benchmark website is a website in the abnormality detection library, and the websites in the abnormality detection library are all violating websites, for example gambling websites and pornographic websites. The website type of the website to be detected can be determined by comparing its data to be detected with the data of the benchmark website. The data to be detected may be, but is not limited to, the web page structure of the website to be detected, the keywords it contains, and its domain name information.
In an optional embodiment, a user accesses a website (that is, the website to be detected) through a client. The system for detecting a website, which communicates with the client, obtains the request information with which the client accesses the website and, according to the request information, obtains the feedback information returned by the website to be detected. The system then parses the feedback information to obtain the data to be detected of the website to be detected, that is, data such as its web page structure. The system processes the web pages of the website to be detected to obtain a tree path set and, at the same time, obtains the web page structure of the benchmark website from the abnormality detection library, where the web page structure of the benchmark website can likewise be represented as a tree path set. Comparing the tree path set of the website to be detected with that of the benchmark website's web page structure yields the similarity between the two.
It should also be noted that the similarity between the web page structure of the website to be detected and that of the benchmark website is the highest of the web page structure similarities between the website to be detected and the benchmark websites in the abnormality detection library. For example, suppose the benchmark website includes three web pages A, B, and C, and the similarities between the web page of the website to be detected and these web pages are A1, B1, and C1 respectively, with A1 < C1 < B1; then B1 is taken as the similarity between the web page structure of the website to be detected and that of the benchmark website.
Step S204: when the similarity is greater than the first preset value, judge whether a keyword of the specified type exists in the website to be detected.
It should be noted that the keyword of the specified type is a keyword in the sensitive lexicon. If the similarity between the web page structure of the website to be detected and that of the benchmark website is determined to be greater than the first preset value, the website to be detected may be a violating website. To determine further whether it is a violating website, the keywords in the website to be detected need to be examined. Specifically, whether a keyword of the specified type exists in the website to be detected can be determined by comparing the similarity between the keywords in the website to be detected and the keywords of the specified type. If the similarity is greater than a preset similarity threshold, it is determined that a keyword of the specified type exists in the website to be detected.
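The keyword judgment of step S204 can be sketched as a search for any extracted keyword that is sufficiently similar to a keyword in the sensitive lexicon. The source does not define the keyword similarity, so the character-level ratio from `difflib.SequenceMatcher`, the lexicon layout (website type mapped to keywords), and the name `match_keyword_type` are assumptions.

```python
from difflib import SequenceMatcher
from typing import Dict, Iterable, List, Optional


def match_keyword_type(site_keywords: Iterable[str],
                       lexicon: Dict[str, List[str]],
                       similarity_threshold: float = 0.8) -> Optional[str]:
    """Return the website type whose sensitive keyword matches, else None.

    lexicon maps a website type (e.g. "gambling") to its sensitive keywords.
    A match requires the similarity to be strictly greater than the threshold.
    """
    words = list(site_keywords)
    for site_type, keywords in lexicon.items():
        for keyword in keywords:
            for word in words:
                ratio = SequenceMatcher(None, word, keyword).ratio()
                if ratio > similarity_threshold:
                    return site_type
    return None
```

Returning the type rather than a plain boolean also covers the later step in which the type of the matched keyword determines the type of the website.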
Step S206: when it is determined that a keyword of the specified type exists in the website to be detected, determine that the website to be detected is a website of the specified type.
It should be noted that, when the website to be detected is determined to contain a keyword of the specified type, the system for detecting a website obtains the type of that keyword and determines the type of the website to be detected according to the type of the keyword. For example, if the type of the keyword contained in the website to be detected is gambling, the website to be detected is determined to be a gambling website.
Based on the scheme defined by steps S202 to S206 above, the similarity between the web page structure of the website to be detected and that of the benchmark website is determined first. When the similarity is greater than the first preset value, it is judged whether a keyword of the specified type exists in the website to be detected. Finally, when it is determined that a keyword of the specified type exists in the website to be detected, the website to be detected is determined to be a website of the specified type.
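Steps S202 to S206 can be put together in a single function. The sketch below takes already extracted tree path sets and keywords as input; the Jaccard similarity, the exact-match keyword test, the default first preset value, and the name `detect_site` are all assumptions made for illustration.

```python
from typing import Dict, Iterable, List, Optional, Set


def detect_site(site_paths: Set[str],
                site_keywords: Set[str],
                benchmark_paths: Iterable[Set[str]],
                lexicon: Dict[str, List[str]],
                first_preset: float = 0.8) -> Optional[str]:
    """Return the detected website type, or None if nothing is detected."""

    def jaccard(a: Set[str], b: Set[str]) -> float:
        union = a | b
        return len(a & b) / len(union) if union else 0.0

    # S202: structure similarity is the best match over all benchmark websites.
    similarity = max((jaccard(site_paths, p) for p in benchmark_paths),
                     default=0.0)

    # S204: only websites above the first preset value go on to keyword detection.
    if similarity <= first_preset:
        return None
    for site_type, keywords in lexicon.items():
        if any(keyword in site_keywords for keyword in keywords):
            # S206: the type of the matched keyword gives the website type.
            return site_type
    return None
```

Running the cheaper structural comparison first means keyword matching is only applied to websites that already resemble a known violating website, which is the ordering the steps describe.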
It is easy to see that the website to be detected is examined along two dimensions, namely the similarity between its web page structure and that of the benchmark website, and whether a keyword of the specified type exists in it, rather than from the sensitive lexicon alone. This effectively avoids the low recall rate and high false-positive rate that result from detecting a website with keywords only.
As can be seen from the above, the method for detecting a website provided in this application can improve the efficiency of detecting violating websites, thereby achieving the technical effect of avoiding the low recall rate and high false-positive rate caused by using keyword detection alone, and in turn solving the technical problem in the prior art that the accuracy of detecting whether a website is a violating website is low.
It should be noted that, when the similarity between the web page structure of the website to be detected and that of the benchmark website is less than or equal to the first preset value, it is necessary to determine further, according to the domain name information of the website to be detected, whether the website to be detected is a website of the specified type. The domain name information of the website to be detected includes at least the domain name of the website to be detected and its domain name price. The specific steps for determining, according to the domain name information, whether the website to be detected is a website of the specified type are as follows:
Step S210: when the similarity is less than or equal to the first preset value and greater than the second preset value, judge whether the similarity between the domain name of the website to be detected and the domain name of the benchmark website is greater than a third preset value, and/or whether the domain name price of the website to be detected is less than a fourth preset value;
Step S212: when the similarity between the domain name of the website to be detected and the domain name of the benchmark website is greater than the third preset value, and/or the domain name price of the website to be detected is less than the fourth preset value, judge whether a keyword of the specified type exists in the website to be detected;
Step S214: when a keyword of the specified type exists in the website to be detected, determine that the website to be detected is a website of the specified type.
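The decision flow of steps S210 to S214 can be sketched as follows. This is a minimal illustration under stated assumptions, not the claimed implementation: the function names, the keyword-only parameters `t1` to `t4` (standing for the first to fourth preset values) and the `require_both` switch are hypothetical, and the "and/or" of the text is read here as a logical OR by default.

```python
def needs_keyword_check(structure_sim, domain_sim, domain_price,
                        *, t1, t2, t3, t4, require_both=False):
    """Steps S210/S212: decide whether the keyword check should run.

    t1..t4 stand for the first to fourth preset values (assumed names).
    """
    # This fallback only applies when the structure similarity lies in (t2, t1].
    if not (t2 < structure_sim <= t1):
        return False
    name_hit = domain_sim > t3       # domain name resembles the benchmark's
    price_hit = domain_price < t4    # domain name is unusually cheap
    if require_both:                 # the "and" reading of "and/or"
        return name_hit and price_hit
    return name_hit or price_hit     # the "or" reading of "and/or"


def is_specified_type(structure_sim, domain_sim, domain_price, has_keyword,
                      **presets):
    """Step S214: flag the site only if the domain stage passes and a
    keyword of the specified type is present."""
    return needs_keyword_check(structure_sim, domain_sim, domain_price,
                               **presets) and has_keyword
```

For instance, with presets `t1=0.8, t2=0.5, t3=0.7, t4=10.0`, a site with structure similarity 0.6, domain similarity 0.2 and a domain price of 5.0 still reaches the keyword check because the price condition alone is satisfied.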
In an optional embodiment, when the similarity between the web page structure of the website to be detected and that of the benchmark website is less than or equal to the first preset value, the website to be detected may not be of the same type as the benchmark website. To determine whether the two are of the same type, the domain-name information of the website to be detected is examined. Specifically, when the similarity is less than or equal to the first preset value and greater than the second preset value, the similarity between the domain name of the website to be detected and the domain name of the benchmark website is determined; if that similarity is greater than the third preset value, the website to be detected is determined to be of the same type as the benchmark website. This effectively avoids misjudging the website type from web page structure similarity and keywords alone, and improves the accuracy of determining the type of the website to be detected.
In an optional embodiment, when the similarity between the web page structure of the website to be detected and that of the benchmark website is less than or equal to the first preset value, it is determined whether the domain name price of the website to be detected is less than the fourth preset value; if so, the website to be detected is determined to be a violating website. It should be noted that the domain name price of a higher-risk website is generally low. Therefore, checking whether the domain name price of the website to be detected is lower than that of lower-risk or normal websites effectively avoids misjudging the website type from web page structure similarity and keywords alone.
In another optional embodiment, after the similarity between the web page structure of the website to be detected and that of the benchmark website is determined to be less than or equal to the first preset value, both the similarity between the domain name of the website to be detected and that of the benchmark website and the domain name price of the website to be detected are checked. If the domain name similarity is greater than the third preset value and the domain name price is less than the fourth preset value, the website to be detected is determined to be a website of the specified type. It should be noted that, when the web page structure similarity between the website to be detected and the benchmark website is relatively low, further examining the domain-name information and determining the website type from it allows the type of the website to be detected to be determined accurately.
In an optional embodiment, the similarity between the web page structures of the website to be detected and the benchmark website can be determined by comparing the tree path sets of their web page structures. The specific steps are as follows:
Step S2040: obtain the DOM tree of the website to be detected;
Step S2042: decompose the DOM tree to obtain a tree path set;
Step S2044: determine the similarity between the web page structures of the website to be detected and the benchmark website according to the tree path set.
It should be noted that each DOM tree can be decomposed into a set of tree paths; if two web pages are structurally similar, they decompose into many similar tree paths. The similarity of web page structures can be calculated from best-matching paths: the similarity of two web page structures is the average, over all tree paths, of the similarity between each tree path and its best-matching tree path.
Specifically, after the DOM tree of the website to be detected is obtained, features of the tree path set are computed from the DOM tree nodes, and the tree paths are matched on the basis of the extracted features. The similarity between the web page structures of the website to be detected and the benchmark website is then determined on the minimum edit distance principle: the tree paths corresponding to the page structure of each benchmark website are compared with the tree paths of the website to be detected to obtain a difference value, or distance, between the paths. The similarity between the website to be detected and the benchmark website whose difference value or distance is smallest is taken as the similarity between the web page structures of the website to be detected and the benchmark website.
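The tree-path comparison described above can be sketched as follows. This is an illustrative sketch only: it assumes reasonably well-formed HTML, treats a tree path as the sequence of tag names from the root to a leaf element, normalises the edit distance by the longer path length, and uses hypothetical names (`PathCollector`, `tree_paths`, `structure_similarity`) that do not appear in the source.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag in HTML.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}


class PathCollector(HTMLParser):
    """Collects root-to-leaf tag paths while the HTML is parsed."""

    def __init__(self):
        super().__init__()
        self.stack = []      # open tags from the root to the current node
        self.has_child = []  # parallel flags: has this open tag a child?
        self.paths = []

    def handle_starttag(self, tag, attrs):
        if self.has_child:
            self.has_child[-1] = True
        if tag in VOID:      # a void element is always a leaf
            self.paths.append(tuple(self.stack) + (tag,))
            return
        self.stack.append(tag)
        self.has_child.append(False)

    def handle_endtag(self, tag):
        if tag not in self.stack:
            return           # stray closing tag, ignore it
        while self.stack:    # also closes unclosed inner tags
            top = self.stack.pop()
            if not self.has_child.pop():
                self.paths.append(tuple(self.stack) + (top,))
            if top == tag:
                break


def tree_paths(html):
    collector = PathCollector()
    collector.feed(html)
    collector.close()
    return collector.paths


def edit_distance(a, b):
    """Levenshtein distance between two tag sequences."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (x != y)))
        prev = cur
    return prev[-1]


def path_similarity(a, b):
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1 - edit_distance(a, b) / longest


def structure_similarity(paths_a, paths_b):
    """Average, over paths_a, of the similarity to the best match in paths_b."""
    if not paths_a or not paths_b:
        return 0.0
    best = [max(path_similarity(p, q) for q in paths_b) for p in paths_a]
    return sum(best) / len(best)
```

Note that this average is taken from the first page's point of view, so the measure is not symmetric; a symmetric variant could average the two directions.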
In an optional embodiment, DOM decomposition can be used to judge whether a keyword of the specified type exists in the website to be detected. The specific steps are as follows:
Step S2060: extract the keywords in the tree path set;
Step S2062: compare the keywords in the tree path set with the keywords of the specified type to obtain a similarity;
Step S2064: when the similarity is greater than a similarity threshold, determine that a keyword of the specified type exists in the website to be detected.
Specifically, the website detection system preprocesses the data to be detected of the website to be detected to obtain the DOM tree of its web pages, and analyzes the DOM tree to obtain the tree path set. The content in the tree path set is extracted and, after word segmentation and stop-word removal, TF-IDF (Term Frequency-Inverse Document Frequency) is applied: the words with larger IDF values are extracted as the keywords of the website to be detected. These keywords are then compared with the keywords of the specified type to obtain the corresponding similarity. If the similarity is greater than the similarity threshold, it is determined that a keyword of the specified type exists in the website to be detected.
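The keyword extraction and comparison described above can be illustrated with a small standard-library sketch. Several choices here are assumptions rather than details from the source: the regular-expression tokenizer only suits English text (Chinese pages would need a word segmenter), the stop-word list is a toy one, words are ranked by the product TF x IDF with a smoothed IDF, and the similarity is taken as the fraction of extracted keywords found in the specified-type keyword set.

```python
import math
import re
from collections import Counter

STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in", "is"}  # toy list


def tokenize(text):
    """Lower-case, split on non-alphanumerics, drop stop words."""
    return [w for w in re.findall(r"[a-z0-9]+", text.lower())
            if w not in STOP_WORDS]


def top_keywords(doc, corpus, k=5):
    """Return the k words of `doc` with the highest TF-IDF score."""
    tokens = tokenize(doc)
    if not tokens:
        return []
    tf = Counter(tokens)
    doc_sets = [set(tokenize(d)) for d in corpus]
    n = len(doc_sets)
    scores = {}
    for word, count in tf.items():
        df = sum(word in s for s in doc_sets)
        idf = math.log((1 + n) / (1 + df)) + 1   # smoothed IDF
        scores[word] = count / len(tokens) * idf
    # Sort by descending score, then alphabetically to break ties.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [word for word, _ in ranked[:k]]


def keyword_match_score(keywords, specified_keywords):
    """Fraction of the page keywords that belong to the specified type."""
    unique = set(keywords)
    if not unique:
        return 0.0
    return len(unique & set(specified_keywords)) / len(unique)
```

A page would then be treated as containing keywords of the specified type when `keyword_match_score(...)` exceeds the chosen similarity threshold.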
Embodiment 3
According to an embodiment of the present invention, a method embodiment for detecting a website is further provided. Fig. 3 shows a flowchart of a method for detecting a website. As shown in Fig. 3, the method specifically includes the following steps:
Step S302: obtain the data to be detected of the website to be detected.
It should be noted that the data to be detected may be, but is not limited to, the web page structure of the website to be detected, the keywords contained in the website to be detected, and the like.
Step S304: determine a first similarity between the data to be detected and the data in the abnormal website library, where the abnormal website library includes the web page structures of multiple abnormal websites.
It should be noted that at least one violating website is stored in the abnormal website library. The similarity between the data to be detected and the data in the abnormal website library is the similarity between the web page structure in the data to be detected and that of the benchmark website in the abnormal website library, where the benchmark website is the website in the abnormal website library with the highest similarity to the data to be detected.
Step S306: determine a second similarity between the data to be detected and a keyword in the sensitive word library.
It should be noted that the sensitive word library includes multiple sensitive words, and the keyword referred to here is the keyword in the sensitive word library with the highest similarity to the data to be detected.
Step S308: if the first similarity is greater than a first threshold and the second similarity is greater than a second threshold, determine that the website to be detected is a website of the specified type.
It should be noted that an abnormal website here means a website that carries risk. Specifically, a first similarity greater than the first threshold indicates that the website to be detected may be a violating website; the second similarity is then judged, and if it is greater than the second threshold, the website to be detected is determined to be an abnormal website. Further, the type of the website to be detected can be determined from the category of the keywords in the sensitive word library that match the data to be detected, for example, that the website to be detected is a gambling website.
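As a small illustration of deriving the website type from matched keyword categories, the sketch below assumes the sensitive word library is available as a mapping from keyword to category and simply takes the most frequent category among the matched keywords. The mapping, the majority rule and the name `site_type` are assumptions made for this example; the source does not specify how multiple categories are reconciled.

```python
from collections import Counter


def site_type(page_keywords, sensitive_words):
    """Return the most common category among matched keywords, or None.

    sensitive_words: dict mapping a sensitive keyword to its category
    (hypothetical representation of the sensitive word library).
    """
    hits = Counter(sensitive_words[k] for k in page_keywords
                   if k in sensitive_words)
    if not hits:
        return None          # no sensitive keyword matched
    return hits.most_common(1)[0][0]
```

With a library such as `{"casino": "gambling", "jackpot": "gambling", "loan": "fraud"}`, a page whose keywords are `["casino", "jackpot", "loan", "news"]` would be labelled `"gambling"`.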
From steps S302 to S308 above, it can be seen that the data to be detected of the website to be detected is obtained, the first similarity between the data to be detected and the data in the abnormal website library is determined, and the second similarity between the data to be detected and a keyword in the sensitive word library is then determined. If the first similarity is greater than the first threshold and the second similarity is greater than the second threshold, the website to be detected is determined to be a website of the specified type, where the abnormal website library includes the web page structures of multiple risky websites.
It is easy to see that the website to be detected is examined along two dimensions, namely the similarity between its web page structure and that of the benchmark website, and whether it contains a keyword of the specified type, rather than against a sensitive word library alone. This effectively avoids the low recall and high false-positive rate that result from detecting websites with keywords only.
It follows that the solution for detecting websites provided in this application improves the efficiency of detecting risky websites and achieves the technical effect of avoiding the low recall and high false-positive rate caused by keyword-only detection, thereby solving the technical problem in the prior art that the accuracy of detecting whether a website is a violating website is low.
In an optional embodiment, before the data to be detected of the website to be detected is obtained, an abnormality detection library needs to be constructed. The specific steps are as follows:
Step S30: construct the abnormal website library and the sensitive word library;
Step S32: construct the abnormality detection library from the abnormal website library and the sensitive word library.
Constructing the sensitive word library specifically includes the following steps:
Step S3002a: obtain the data sets of multiple abnormal websites;
Step S3004a: process the data sets of the multiple abnormal websites to obtain the tree path sets of the data sets;
Step S3006a: extract the keywords in the tree path sets, where the keywords are sensitive keywords;
Step S3008a: construct the sensitive word library from the extracted keywords.
Specifically, the website detection system preprocesses the data sets of the multiple abnormal websites and represents the DOM tree of each abnormal website's web page as a set of tree paths. It also extracts the full content of the abnormal websites and, after word segmentation and stop-word removal, applies TF-IDF to extract the words with larger IDF values, which become the sensitive words of the sensitive word library. Extracting sensitive words from multiple abnormal websites in this way yields the sensitive word library.
In an optional embodiment, constructing the abnormal website library includes:
Step S3002b: obtain the DOM trees of multiple abnormal websites;
Step S3004b: decompose the DOM trees to obtain tree path sets;
Step S3006b: determine the similarity between the web page structures of the multiple abnormal websites according to the tree path sets;
Step S3008b: cluster the multiple abnormal websites according to the similarity of their web page structures to obtain a clustering result;
Step S3010b: construct the abnormal website library according to the clustering result.
Specifically, Fig. 4(a) shows a flowchart of an optional method for detecting a website. As shown in Fig. 4(a), after the data of the abnormal websites is obtained from the sample database, the data is preprocessed to obtain DOM trees, structural clustering is performed on the DOM trees, and the abnormal website library is formed from the clustering result.
It should be noted that determining the similarity between the web page structures of the multiple abnormal websites according to the tree path sets includes:
Step S402: obtain the first similarity between each path in the tree path set and its matching path, where the matching path is the path with the highest similarity to that path;
Step S404: determine the similarity of the web page structures according to the first similarity.
Specifically, to cluster multiple abnormal websites by structure, the web pages are first parsed to generate tree path sets. The similarities between these tree path sets are then calculated to form a similarity matrix, the best-matching paths of the tree path sets are obtained from the similarity matrix, and the similarities between the web pages are derived. Clusters are then merged on the basis of web page similarity to complete the clustering of the abnormal websites.
It should be noted that, after the clustering result of the abnormal websites is obtained, the abnormal website library can be constructed from it. The specific steps are as follows:
Step S406: determine the template web page of each class of abnormal websites according to the clustering result, where the template web page is the web page of the benchmark website;
Step S408: construct the abnormal website library based on the template web pages.
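The clustering and template selection steps can be sketched as follows, given a precomputed page-to-page similarity matrix. The source does not name a clustering algorithm or a rule for choosing the template page, so both choices here are assumptions: pages are merged by single linkage (any pair at or above a threshold ends up in the same cluster), and the template of a cluster is the page with the highest total similarity to the cluster's members. The function names are hypothetical.

```python
def cluster_by_similarity(sim, threshold):
    """Group page indices whose pairwise similarity reaches `threshold`.

    sim: square, symmetric matrix (list of lists) of page similarities.
    Returns a sorted list of clusters, each a sorted list of indices.
    """
    n = len(sim)
    parent = list(range(n))          # union-find forest

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]   # path halving
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if sim[i][j] >= threshold:
                parent[find(j)] = find(i)   # merge the two clusters

    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


def pick_template(cluster, sim):
    """Choose the member most similar, in total, to the rest of its cluster."""
    return max(cluster, key=lambda i: sum(sim[i][j] for j in cluster))
```

The abnormal website library would then hold one template page per cluster, possibly after the manual screening mentioned elsewhere in the description.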
It should be noted that the abnormality detection library further includes a domain-name information library, which comprises a violating domain-name information library and a domain name price library. Before the website to be detected is determined to be an abnormal website, the method for detecting a website further includes:
Step S502: obtain the domain-name information of the website to be detected;
Step S504: determine a third similarity between the domain name of the website to be detected and the domain names in the violating domain-name information library, and/or determine the domain name price of the website to be detected according to the domain name price library.
It should be noted that, to determine the type of the website to be detected accurately, in addition to obtaining the first similarity between the website to be detected and the data in the abnormal website library and the second similarity between the data to be detected and the keyword in the sensitive word library, the domain-name information of the website to be detected also needs to be determined from the violating domain-name information library and the domain name price library.
In an optional embodiment, the domain name of the website to be detected is compared with the domain names in the violating domain-name information library, and the highest similarity found is taken as the third similarity between the domain name of the website to be detected and the domain names in the violating domain-name information library. The domain name in the violating domain-name information library that yields this highest similarity is then identified, and its price is looked up in the domain name price library; this price is taken as the domain name price of the website to be detected.
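The lookup just described can be sketched with Python's standard `difflib` module. The source does not say how domain name similarity is measured, so `SequenceMatcher.ratio()` is used here purely as a stand-in; the function names, the example domain names and the dictionary representation of the price library are likewise hypothetical.

```python
from difflib import SequenceMatcher


def name_similarity(a, b):
    """Similarity in [0, 1] between two domain names (stand-in measure)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def domain_lookup(domain, violating_domains, price_by_domain):
    """Return (third_similarity, price) for the website to be detected.

    violating_domains: non-empty list of known violating domain names.
    price_by_domain:   dict mapping a domain name to its price; the price
                       is None when the closest domain has no entry.
    """
    closest = max(violating_domains, key=lambda d: name_similarity(domain, d))
    third_similarity = name_similarity(domain, closest)
    # The price of the closest violating domain is used as the price
    # attributed to the website to be detected.
    return third_similarity, price_by_domain.get(closest)
```

For example, `domain_lookup("lucky-win88.com", ["example.org", "lucky-vip88.com"], {"lucky-vip88.com": 1.5})` picks `lucky-vip88.com` as the closest entry and returns its price alongside the similarity.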
It should be noted that, after the third similarity and/or the domain name price has been determined, they can be used to determine whether the website to be detected is a website of the specified type. Specifically, if the first similarity is greater than the first threshold, the second similarity is greater than the second threshold, and the third similarity is greater than a third threshold and/or the domain name price of the website to be detected is less than a preset price, the website to be detected is determined to be a website of the specified type.
Specifically, the website to be detected can be determined to be a website of the specified type in any one of the following modes:
Mode one: the first similarity is greater than the first threshold, the second similarity is greater than the second threshold, and the third similarity is greater than the third threshold;
Mode two: the first similarity is greater than the first threshold, the second similarity is greater than the second threshold, and the domain name price of the website to be detected is less than the preset price;
Mode three: the first similarity is greater than the first threshold, the second similarity is greater than the second threshold, the third similarity is greater than the third threshold, and the domain name price of the website to be detected is less than the preset price.
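The three modes listed above can be expressed compactly in code. The sketch below is illustrative only; the function name and the keyword parameters `th1`, `th2`, `th3` and `max_price` (standing for the first to third thresholds and the preset price) are assumptions. Because mode three is the conjunction of modes one and two, the function reports the most specific mode that applies.

```python
def matched_mode(s1, s2, s3, price, *, th1, th2, th3, max_price):
    """Return 1, 2 or 3 for the mode that flags the site, or None.

    s1, s2, s3: first, second and third similarities.
    price:      domain name price of the website to be detected.
    """
    # All three modes require both the structure and keyword conditions.
    if not (s1 > th1 and s2 > th2):
        return None
    domain_hit = s3 > th3
    price_hit = price < max_price
    if domain_hit and price_hit:
        return 3     # both domain conditions hold
    if domain_hit:
        return 1     # only the domain name similarity condition holds
    if price_hit:
        return 2     # only the domain name price condition holds
    return None
```

A return value other than `None` means the website to be detected would be treated as a website of the specified type.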
In an optional embodiment, as shown in Fig. 4(a), the website to be detected can be detected after the abnormal website library has been constructed. Specifically, the data of a website on the cloud (that is, the data to be detected of the website to be detected) is obtained, its tag content is extracted, and feature set processing is performed to determine whether the website to be detected contains a keyword of the specified type. At the same time, the web page structure of the website to be detected is checked against the constructed abnormal website library: the first similarity between each path in the tree path set and its matching path is compared with the first threshold, and the second similarity between the data to be detected and the keyword in the sensitive word library is compared with the second threshold. If the first similarity is greater than the first threshold and the second similarity is greater than the second threshold, a platform review is carried out. If the review passes, the website to be detected is determined to be an abnormal website and is stored in the abnormality detection library. It thereby becomes a member of the abnormal website sample database and, together with data published on the network, serves as the web page training set.
In an optional embodiment, Fig. 4(b) shows a flowchart of an optional method for detecting a website. As shown in Fig. 4(b), web pages are extracted from the sample database in which abnormal websites are stored to obtain a web page training set, and the data of the abnormal websites is thereby obtained. This data can then be preprocessed. Specifically, the web pages of an abnormal website are HTML (HyperText Markup Language) pages; HTML parsing of these pages yields XML (Extensible Markup Language), and DOM parsing of the XML yields a DOM object (that is, a Document object), which is the DOM tree of the abnormal website's web page. After the DOM tree is obtained, structural clustering is performed on it. Specifically, the web pages are first parsed on the basis of their DOM trees to generate tree path sets, for example the two tree path sets P1 = (N11, N12, ..., N1m) and P2 = (N21, N22, ..., N2m). The similarities between the two tree path sets are then calculated to form a similarity matrix, the best-matching paths of the two tree path sets are obtained from the similarity matrix, the similarity of the two web pages is derived, and the clustering result is finally obtained from the web page similarities. Multiple web page clusters are generated from the clustering result and screened manually (the manual intervention shown in Fig. 4(b)) to form the rule library, that is, the abnormal website library (the violating website template library).
After the abnormal website library has been constructed, the website to be detected can be detected. Specifically, the data of a website on the cloud (that is, the data to be detected of the website to be detected) is obtained, its tag content is extracted, and feature set processing is performed. A model is then built from the constructed abnormal website library and the processed feature set, and the website to be detected is detected with this model to determine the first similarity between each path in the tree path set and its matching path and the second similarity between the data to be detected and the keyword in the sensitive word library. These results are then compared with thresholds: the first similarity with the first threshold and the second similarity with the second threshold. If the first similarity is greater than the first threshold and the second similarity is greater than the second threshold, an RCP (Rich Client Platform) review is carried out. If the review passes, the website to be detected is determined to be an abnormal website, the webmaster of the website is penalized, and the violating website is stored in the abnormality detection library. It thereby becomes a member of the abnormal website sample database and, together with data published on the network, serves as the web page training set.
Embodiment 4
According to an embodiment of the present invention, a method embodiment for detecting a website is further provided. Fig. 6 shows a flowchart of a method for detecting a website. As shown in Fig. 6, the method specifically includes the following steps:
Step S602: receive the data information of the website to be detected;
Step S604: evaluate the data information of the website to be detected based on multiple abnormality detection libraries to obtain the risk value of the website to be detected, where different abnormality detection libraries correspond to different judgment rules, and a judgment rule is used to determine the risk of the website to be detected under the corresponding abnormality detection library;
Step S606: determine the website type of the website to be detected based on its risk value.
It should be noted that detecting the data information of the website to be detected against different abnormality detection libraries yields different risk values; a weighted sum of the risk values obtained under the different detection libraries gives the risk value of the website to be detected. The website type of the website to be detected can be determined from the interval in which the risk value falls. For example, if the risk value lies in the interval [A, B), the website to be detected is determined to be a gambling website.
It should further be noted that the multiple abnormality detection libraries include at least the sensitive word library, the abnormal website library and the domain-name information library, where the abnormal website library includes the web page structures of multiple abnormal websites.
From steps S602 to S606 above, it can be seen that the data information of the website to be detected is received, the data information is evaluated based on multiple abnormality detection libraries to obtain the risk value of the website to be detected, and the website type of the website to be detected is finally determined from that risk value. Different abnormality detection libraries correspond to different judgment rules, and a judgment rule is used to determine the risk value of the website to be detected under the corresponding abnormality detection library.
It is easy to see that the website to be detected is examined along two dimensions, namely the similarity between its web page structure and that of the benchmark website, and whether it contains a keyword of the specified type, rather than against a sensitive word library alone. This effectively avoids the low recall and high false-positive rate that result from detecting websites with keywords only.
It follows that the solution for detecting websites provided in this application improves the efficiency of detecting abnormal websites and achieves the technical effect of avoiding the low recall and high false-positive rate caused by keyword-only detection, thereby solving the technical problem in the prior art that the accuracy of detecting whether a website is an abnormal website is low. Here, an abnormal website (or violating website) is a website that poses a security risk.
It should be noted that, after the data information of the website to be detected has been detected against the abnormality detection libraries and the risk value obtained, the website to be detected is stored in the abnormality detection library if it is an abnormal website.
In an optional embodiment, before the data information of the website to be detected is evaluated based on the multiple abnormality detection libraries to obtain its risk value, the sensitive word library and the abnormal website library need to be constructed. Constructing the sensitive word library includes:
Step S60: obtain the data sets of abnormal websites;
Step S62: process the data sets of the abnormal websites to obtain the tree path sets of the data sets;
Step S64: extract the keywords in the tree path sets, where the keywords are sensitive keywords;
Step S66: construct the sensitive word library from the extracted keywords.
Specifically, the website detection system preprocesses the data sets of the multiple abnormal websites and represents the DOM tree of each abnormal website's web page as a set of tree paths. It also extracts the full content of the abnormal websites and, after word segmentation and stop-word removal, applies TF-IDF to extract the words with larger IDF values, which become the sensitive words of the sensitive word library. Extracting sensitive words from multiple abnormal websites in this way yields the sensitive word library.
In an optional embodiment, constructing the abnormal website library includes:
Step S70: obtain the DOM trees of multiple abnormal websites;
Step S72: decompose the DOM trees to obtain tree path sets;
Step S74: determine the similarity between the web page structures of the multiple abnormal websites according to the tree path sets;
Step S76: cluster the multiple abnormal websites according to the similarity of their web page structures to obtain a clustering result;
Step S78: construct the abnormal website library according to the clustering result.
Specifically, as shown in Fig. 4(b), after the data of the abnormal websites is obtained, it needs to be preprocessed. The web pages of an abnormal website are HTML (HyperText Markup Language) pages; HTML parsing of these pages yields XML (Extensible Markup Language), and DOM parsing of the XML yields a DOM object, which is the DOM tree of the abnormal website's web page. Structural clustering is then performed on the DOM trees, multiple web page clusters are generated from the clustering result, and the clusters are converted into a rule library, that is, the abnormal website library.
In an optional embodiment, detecting the data information of the website to be detected based on the abnormality detection libraries to obtain the risk value includes:
Step S6040: obtain a first risk value by detecting the data information of the website to be detected based on the sensitive word library;
Step S6042: obtain a second risk value by detecting the data information of the website to be detected based on the abnormal website library;
Step S6044: obtain a third risk value by detecting the data information of the website to be detected based on the domain-name information library;
Step S6046: compute a weighted sum of the first risk value, the second risk value and the third risk value to determine the risk value of the website to be detected.
It should be noted that the weights of the first, second and third risk values can be set according to the actual situation, with the weight of the first risk value being the highest.
In an optional embodiment, determining the website type of the website to be detected based on its risk value specifically includes:
Step S6060: judge whether the risk value of the website to be detected is greater than a preset risk value;
Step S6062: when the risk value of the website to be detected is greater than the preset risk value, determine that the website to be detected is an abnormal website.
It should be noted that, after the website to be detected is determined to be an abnormal website, its specific type can be determined from the numerical interval to which its risk value belongs. For example, if the risk value is greater than A, the website to be detected is determined to be an abnormal website; if the risk value further lies in the interval [A, B), the website type of the website to be detected is determined to be a gambling website.
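The weighted scoring and interval lookup described above can be sketched as follows. The concrete weights, the preset risk value, the interval bounds and the function names are all hypothetical; the source only states that the three risk values are combined by a weighted sum, that the first (sensitive word library) weight is the highest, and that the type follows from the interval containing the risk value.

```python
def total_risk(keyword_risk, structure_risk, domain_risk,
               weights=(0.5, 0.3, 0.2)):
    """Weighted sum of the first, second and third risk values.

    The default weights are illustrative; the first weight is the largest,
    as the description requires.
    """
    w1, w2, w3 = weights
    return w1 * keyword_risk + w2 * structure_risk + w3 * domain_risk


def classify(risk, preset_risk, bands):
    """Map a risk value to a website type.

    bands: list of (low, high, label) half-open intervals [low, high).
    Returns "normal" at or below the preset risk value, the label of the
    first matching band otherwise, and plain "abnormal" if no band matches.
    """
    if risk <= preset_risk:
        return "normal"
    for low, high, label in bands:
        if low <= risk < high:
            return label
    return "abnormal"
```

For example, risk values of 0.9, 0.8 and 0.5 under the default weights combine to about 0.79, which `classify` maps to `"gambling"` when the preset risk value is 0.6 and the band `(0.7, 0.9, "gambling")` is configured.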
In an optional embodiment, Fig. 5 shows a flowchart of an optional way of constructing the abnormality detection library. As shown in Fig. 5, the specific steps are as follows:
Step S51: after the data of the abnormal websites (for example, HTML source code) is obtained, establish multi-dimensional risk libraries, which may be, but are not limited to, a tag violation word library (that is, the sensitive word library), a website structure black-sample template library (that is, the abnormal website library) and a domain-name information library.
Step S53: after the multi-dimensional risk libraries are established, generate the abnormality detection model (that is, the abnormality detection library).
Step S55: detect the data to be detected of the website to be detected based on the constructed abnormality detection model. When the website to be detected is detected, each risk library yields one risk value; a weighted sum of the risk values obtained from the risk libraries gives the risk value of the website to be detected, from which it can be determined whether the website to be detected is an abnormal website.
Embodiment 5
According to an embodiment of the present invention, an apparatus embodiment for detecting a website is also provided. Fig. 7 shows a schematic structural diagram of an apparatus for detecting a website. As shown in Fig. 7, the apparatus for detecting a website specifically comprises: a first determining module 701, a judgment module 703 and a second determining module 705.
The first determining module 701 is configured to determine the similarity between the web page structure of the website to be detected and that of the benchmark website. The judgment module 703 is configured to judge, in the case where the similarity is greater than the first preset value, whether a keyword of the specified type exists in the website to be detected. The second determining module 705 is configured to determine, in the case where it is determined that a keyword of the specified type exists in the website to be detected, that the website to be detected is a website of the specified type.
It should be noted here that the first determining module 701, the judgment module 703 and the second determining module 705 correspond to steps S202 to S206 in Embodiment 2. The three modules share the same examples and application scenarios as the corresponding steps, but are not limited to what is disclosed in Embodiment 2 above.
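The cooperation of the three modules can be sketched as a single function. This is a simplified reading under stated assumptions: the similarity is taken as an already computed number, keywords are plain strings compared for exact membership, and the name `is_specified_type` is hypothetical.

```python
from typing import Iterable, Set


def is_specified_type(structure_similarity: float,
                      page_keywords: Iterable[str],
                      typed_keywords: Set[str],
                      first_preset: float) -> bool:
    """Structure check first, keyword check second, then the verdict."""
    # Judgment module: only runs when the similarity exceeds the threshold.
    if structure_similarity <= first_preset:
        return False
    # Second determining module: any keyword of the specified type decides it.
    return any(word in typed_keywords for word in page_keywords)
```

Running the cheap structural comparison first means the keyword check is only performed for pages that already resemble the benchmark website.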
In an optional embodiment, the apparatus for detecting a website further comprises: a first judgment module, a second judgment module and a fifth determining module. The first judgment module is configured to judge, in the case where the similarity is less than or equal to the first preset value and greater than the second preset value, whether the similarity between the domain name of the website to be detected and the domain name of the benchmark website is greater than a third preset value, and/or whether the domain name price of the website to be detected is less than a fourth preset value. The second judgment module is configured to judge whether a keyword of the specified type exists in the website to be detected, in the case where it is determined that the similarity between the domain name of the website to be detected and the domain name of the benchmark website is greater than the third preset value, and/or that the domain name price of the website to be detected is less than the fourth preset value. The fifth determining module is configured to determine, in the case where it is determined that a keyword of the specified type exists in the website to be detected, that the website to be detected is a website of the specified type.
It should be noted here that the first judgment module, the second judgment module and the fifth determining module correspond to steps S210 to S214 in Embodiment 2. The three modules share the same examples and application scenarios as the corresponding steps, but are not limited to what is disclosed in Embodiment 2 above.
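The fallback branch for pages in the intermediate similarity band can be sketched as follows. The function and parameter names are hypothetical, and the "and/or" in the text is read here as a logical OR, which is one possible interpretation and also covers the case where both conditions hold.

```python
def detect_with_domain_fallback(structure_similarity: float,
                                domain_similarity: float,
                                domain_price: float,
                                has_typed_keyword: bool,
                                *,
                                first_preset: float,
                                second_preset: float,
                                third_preset: float,
                                fourth_preset: float) -> bool:
    """Two-band decision: direct keyword check, or domain checks first."""
    if structure_similarity > first_preset:
        # High structural similarity: the keyword check alone decides.
        return has_typed_keyword
    if second_preset < structure_similarity <= first_preset:
        # Intermediate band: require a suspicious domain name or a low price.
        suspicious_domain = (domain_similarity > third_preset
                             or domain_price < fourth_preset)
        return suspicious_domain and has_typed_keyword
    return False
```

The domain checks act as extra evidence for pages whose structure is only moderately similar to the benchmark website, so a keyword hit alone is not enough in that band.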
In an optional embodiment, the first determining module comprises: a first obtaining module, a decomposing module and a third determining module. The first obtaining module is configured to obtain the DOM tree of the website to be detected. The decomposing module is configured to decompose the DOM tree to obtain a tree path set. The third determining module is configured to determine, according to the tree path set, the similarity between the web page structure of the website to be detected and that of the benchmark website.
It should be noted here that the first obtaining module, the decomposing module and the third determining module correspond to steps S2040 to S2044 in Embodiment 2. The three modules share the same examples and application scenarios as the corresponding steps, but are not limited to what is disclosed in Embodiment 2 above.
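The decomposition of a DOM tree into a tree path set can be sketched with Python's standard `html.parser`. Two assumptions are made for illustration: each path is the chain of tag names from the root to an element, and the structural similarity is computed as the Jaccard overlap of the two path sets. The Jaccard measure is a stand-in chosen for brevity, not the similarity defined in this document, and the names `PathCollector`, `tree_paths` and `structure_similarity` are hypothetical.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag, so they must not stay on the stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class PathCollector(HTMLParser):
    """Collects the root-to-element tag path of every element in a page."""

    def __init__(self):
        super().__init__()
        self.stack = []
        self.paths = set()

    def handle_starttag(self, tag, attrs):
        self.paths.add("/".join(self.stack + [tag]))
        if tag not in VOID_TAGS:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the matching open tag; stray end tags are ignored.
        if tag in self.stack:
            while self.stack.pop() != tag:
                pass


def tree_paths(html: str) -> set:
    collector = PathCollector()
    collector.feed(html)
    collector.close()
    return collector.paths


def structure_similarity(html_a: str, html_b: str) -> float:
    """Jaccard overlap of the two tree path sets (an assumed measure)."""
    a, b = tree_paths(html_a), tree_paths(html_b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)
```

Because the paths carry only tag names, two pages generated from the same template score highly even when their text content differs.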
In an optional embodiment, the judgment module comprises: an extraction module, a comparison module and a fourth determining module. The extraction module is configured to extract the keywords in the tree path set. The comparison module is configured to compare the keywords in the tree path set with the keywords of the specified type to obtain a similarity. The fourth determining module is configured to determine, in the case where the similarity is greater than a similarity threshold, that a keyword of the specified type exists in the website to be detected.
It should be noted here that the extraction module, the comparison module and the fourth determining module correspond to steps S2060 to S2064 in Embodiment 2. The three modules share the same examples and application scenarios as the corresponding steps, but are not limited to what is disclosed in Embodiment 2 above.
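The comparison and threshold steps can be sketched as below. The source does not specify how keyword similarity is computed; this sketch assumes the best pairwise string similarity from the standard `difflib.SequenceMatcher`, takes the extracted keywords as given, and uses a hypothetical threshold of 0.8.

```python
from difflib import SequenceMatcher
from typing import Iterable


def keyword_similarity(page_keywords: Iterable[str],
                       typed_keywords: Iterable[str]) -> float:
    """Highest pairwise similarity between page and specified-type keywords."""
    typed = list(typed_keywords)
    best = 0.0
    for page_word in page_keywords:
        for typed_word in typed:
            ratio = SequenceMatcher(None, page_word, typed_word).ratio()
            best = max(best, ratio)
    return best


def has_typed_keyword(page_keywords: Iterable[str],
                      typed_keywords: Iterable[str],
                      threshold: float = 0.8) -> bool:
    # The threshold value is an assumption, not taken from the source.
    return keyword_similarity(page_keywords, typed_keywords) > threshold
```

A fuzzy comparison of this kind tolerates small spelling variations that an exact dictionary lookup would miss, at the cost of a quadratic number of comparisons.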
Embodiment 6
According to an embodiment of the present invention, a system embodiment for detecting a website is also provided. The system for detecting a website comprises: a processor and a memory. The memory is connected to the processor and is configured to provide the processor with instructions for the following processing steps: determining the similarity between the web page structure of the website to be detected and that of the benchmark website; in the case where the similarity is greater than a preset similarity, judging whether a keyword of the specified type exists in the website to be detected; and, in the case where it is determined that a keyword of the specified type exists in the website to be detected, determining that the website to be detected is a website of the specified type.
Embodiment 7
An embodiment of the present invention may provide a computer terminal, which may be any computer terminal in a group of computer terminals. Optionally, in this embodiment, the computer terminal may also be replaced with a terminal device such as a mobile terminal.
Optionally, in this embodiment, the computer terminal may be located in at least one of multiple network devices of a computer network.
Fig. 8 shows a hardware block diagram of a computer terminal. As shown in Fig. 8, computer terminal A may include one or more processors 802 (shown in the figure as 802a, 802b, ..., 802n; the processor 802 may include, but is not limited to, a processing device such as a microprocessor (MCU) or a programmable logic device (FPGA)), a memory 804 for storing data, and a transmission device 806 for communication functions. In addition, it may also include: a display, an input/output interface (I/O interface), a universal serial bus (USB) port (which may be included as one of the ports of the I/O interface), a network interface, a power supply and/or a camera. Those of ordinary skill in the art will understand that the structure shown in Fig. 8 is only illustrative and does not limit the structure of the above electronic device. For example, computer terminal A may include more or fewer components than shown in Fig. 8, or have a configuration different from that shown in Fig. 8.
It should be noted that the one or more processors 802 and/or other data processing circuits described above may generally be referred to herein as "data processing circuits". A data processing circuit may be embodied in whole or in part as software, hardware, firmware or any combination thereof. In addition, the data processing circuit may be a single independent processing module, or may be integrated in whole or in part into any of the other elements in computer terminal A. As involved in the embodiments of the present application, the data processing circuit serves as a kind of processor control (for example, the selection of a variable-resistance termination path connected to an interface).
The processor 802 may call, through the transmission device, the information and application programs stored in the memory to perform the following steps: determining the similarity between the web page structure of the website to be detected and that of the benchmark website; in the case where the similarity is greater than the first preset value, judging whether a keyword of the specified type exists in the website to be detected; and, in the case where it is determined that a keyword of the specified type exists in the website to be detected, determining that the website to be detected is a website of the specified type.
The memory 804 may be used to store software programs and modules of application software, such as the program instructions/data storage device corresponding to the method for detecting a website in the embodiments of the present application. By running the software programs and modules stored in the memory 804, the processor 802 executes various functional applications and data processing, that is, implements the above method for detecting a website. The memory 804 may include high-speed random access memory, and may also include non-volatile memory, such as one or more magnetic storage devices, flash memory or other non-volatile solid-state memory. In some examples, the memory 804 may further include memory located remotely from the processor 802, and such remote memory may be connected to computer terminal A through a network. Examples of such networks include, but are not limited to, the Internet, an intranet, a local area network, a mobile communication network and combinations thereof.
The transmission device 806 is used to receive or send data via a network. A specific example of such a network may include a wireless network provided by the communication provider of computer terminal A. In one example, the transmission device 806 includes a network adapter (Network Interface Controller, NIC), which can be connected to other network devices through a base station so as to communicate with the Internet. In one example, the transmission device 806 may be a radio frequency (Radio Frequency, RF) module, which is used to communicate with the Internet wirelessly.
The display may be, for example, a touch-screen liquid crystal display (LCD), which enables the user to interact with the user interface of computer terminal A.
It should be noted here that, in some optional embodiments, computer terminal A shown in Fig. 8 may include hardware elements (including circuits), software elements (including computer code stored on a computer-readable medium), or a combination of hardware and software elements. It should be pointed out that Fig. 8 is only one example of a particular embodiment and is intended to show the types of components that may be present in computer terminal A.
In this embodiment, computer terminal A may execute program code for the following steps of the method for detecting a website: determining the similarity between the web page structure of the website to be detected and that of the benchmark website; in the case where the similarity is greater than the first preset value, judging whether a keyword of the specified type exists in the website to be detected; and, in the case where it is determined that a keyword of the specified type exists in the website to be detected, determining that the website to be detected is a website of the specified type.
The processor may call, through the transmission device, the information and application programs stored in the memory to perform the following steps: in the case where the similarity is less than or equal to the first preset value and greater than the second preset value, judging whether the similarity between the domain name of the website to be detected and the domain name of the benchmark website is greater than the third preset value, and/or whether the domain name price of the website to be detected is less than the fourth preset value; in the case where it is determined that the similarity between the domain name of the website to be detected and the domain name of the benchmark website is greater than the third preset value, and/or that the domain name price of the website to be detected is less than the fourth preset value, judging whether a keyword of the specified type exists in the website to be detected; and, in the case where it is determined that a keyword of the specified type exists in the website to be detected, determining that the website to be detected is a website of the specified type.
The processor may call, through the transmission device, the information and application programs stored in the memory to perform the following steps: obtaining the DOM tree of the website to be detected; decomposing the DOM tree to obtain a tree path set; and determining, according to the tree path set, the similarity between the web page structure of the website to be detected and that of the benchmark website.
The processor may call, through the transmission device, the information and application programs stored in the memory to perform the following steps: extracting the keywords in the tree path set; comparing the keywords in the tree path set with the keywords of the specified type to obtain a similarity; and, in the case where the similarity is greater than the similarity threshold, determining that a keyword of the specified type exists in the website to be detected.
Those of ordinary skill in the art will understand that the structure shown in Fig. 8 is only illustrative. The computer terminal may also be a terminal device such as a smartphone (for example, an Android phone or an iOS phone), a tablet computer, a palmtop computer, a mobile Internet device (Mobile Internet Devices, MID) or a PAD. Fig. 8 does not limit the structure of the above electronic device. For example, computer terminal A may include more or fewer components than shown in Fig. 8 (such as a network interface or a display device), or have a configuration different from that shown in Fig. 8.
Those of ordinary skill in the art will understand that all or some of the steps in the methods of the above embodiments can be completed by a program instructing hardware related to a terminal device. The program can be stored in a computer-readable storage medium, and the storage medium may include: a flash disk, read-only memory (Read-Only Memory, ROM), random access memory (Random Access Memory, RAM), a magnetic disk, an optical disc, and the like.
Embodiment 8
An embodiment of the present invention also provides a storage medium. Optionally, in this embodiment, the storage medium may be used to store the program code executed by the method for detecting a website provided by the above embodiments.
Optionally, in this embodiment, the storage medium may be located in any computer terminal in a group of computer terminals in a computer network, or in any mobile terminal in a group of mobile terminals.
Optionally, in this embodiment, the storage medium is configured to store program code for performing the following steps: determining the similarity between the web page structure of the website to be detected and that of the benchmark website; in the case where the similarity is greater than the first preset value, judging whether a keyword of the specified type exists in the website to be detected; and, in the case where it is determined that a keyword of the specified type exists in the website to be detected, determining that the website to be detected is a website of the specified type.
Optionally, in this embodiment, the storage medium is configured to store program code for performing the following steps: in the case where the similarity is less than or equal to the first preset value and greater than the second preset value, judging whether the similarity between the domain name of the website to be detected and the domain name of the benchmark website is greater than the third preset value, and/or whether the domain name price of the website to be detected is less than the fourth preset value; in the case where it is determined that the similarity between the domain name of the website to be detected and the domain name of the benchmark website is greater than the third preset value, and/or that the domain name price of the website to be detected is less than the fourth preset value, judging whether a keyword of the specified type exists in the website to be detected; and, in the case where it is determined that a keyword of the specified type exists in the website to be detected, determining that the website to be detected is a website of the specified type.
Optionally, in this embodiment, the storage medium is configured to store program code for performing the following steps: obtaining the DOM tree of the website to be detected; decomposing the DOM tree to obtain a tree path set; and determining, according to the tree path set, the similarity between the web page structure of the website to be detected and that of the benchmark website.
Optionally, in this embodiment, the storage medium is configured to store program code for performing the following steps: extracting the keywords in the tree path set; comparing the keywords in the tree path set with the keywords of the specified type to obtain a similarity; and, in the case where the similarity is greater than the similarity threshold, determining that a keyword of the specified type exists in the website to be detected.
The serial numbers of the above embodiments of the present invention are for description only and do not represent the relative merits of the embodiments.
In the above embodiments of the present invention, the description of each embodiment has its own emphasis. For a part that is not described in detail in one embodiment, reference may be made to the related descriptions of other embodiments.
In the several embodiments provided in the present application, it should be understood that the disclosed technical content can be implemented in other ways. The apparatus embodiments described above are only illustrative. For example, the division into units is only a division by logical function, and other divisions are possible in actual implementation: multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed. In addition, the mutual coupling, direct coupling or communication connection shown or discussed may be an indirect coupling or communication connection through some interfaces, units or modules, and may be electrical or take other forms.
The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, the functional units in the embodiments of the present invention may be integrated into one processing unit, each unit may exist physically on its own, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware or in the form of a software functional unit.
If the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it can be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention in essence, or the part that contributes to the prior art, or all or part of the technical solution, can be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device or the like) to perform all or some of the steps of the methods of the embodiments of the present invention. The aforementioned storage medium includes various media that can store program code, such as a USB flash drive, read-only memory (ROM, Read-Only Memory), random access memory (RAM, Random Access Memory), a removable hard disk, a magnetic disk or an optical disc.
The above are only preferred embodiments of the present invention. It should be pointed out that those of ordinary skill in the art may make various improvements and modifications without departing from the principle of the present invention, and these improvements and modifications shall also be regarded as falling within the protection scope of the present invention.
Claims (25)
1. A method for detecting a website, characterized by comprising:
determining the similarity between the web page structure of a website to be detected and that of a benchmark website;
in the case where the similarity is greater than a first preset value, judging whether a keyword of a specified type exists in the website to be detected;
in the case where it is determined that a keyword of the specified type exists in the website to be detected, determining that the website to be detected is a website of the specified type.
2. The method according to claim 1, characterized in that the method further comprises:
in the case where the similarity is less than or equal to the first preset value and greater than a second preset value, judging whether the similarity between the domain name of the website to be detected and the domain name of the benchmark website is greater than a third preset value, and/or whether the domain name price of the website to be detected is less than a fourth preset value;
in the case where it is determined that the similarity between the domain name of the website to be detected and the domain name of the benchmark website is greater than the third preset value, and/or that the domain name price of the website to be detected is less than the fourth preset value, judging whether a keyword of the specified type exists in the website to be detected;
in the case where it is determined that a keyword of the specified type exists in the website to be detected, determining that the website to be detected is a website of the specified type.
3. The method according to claim 1, characterized in that determining the similarity between the web page structure of the website to be detected and that of the benchmark website comprises:
obtaining the DOM tree of the website to be detected;
decomposing the DOM tree to obtain a tree path set;
determining, according to the tree path set, the similarity between the web page structure of the website to be detected and that of the benchmark website.
4. The method according to claim 3, characterized in that judging whether a keyword of the specified type exists in the website to be detected comprises:
extracting the keywords in the tree path set;
comparing the keywords in the tree path set with the keywords of the specified type to obtain the similarity;
in the case where the similarity is greater than a similarity threshold, determining that a keyword of the specified type exists in the website to be detected.
5. A method for detecting a website, characterized by comprising:
obtaining data to be detected of a website to be detected;
determining a first similarity between the data to be detected and the data in an abnormal website library, wherein the abnormal website library contains the web page structures of multiple abnormal websites;
determining a second similarity between the data to be detected and the keywords in a sensitive dictionary;
if the first similarity is greater than a first threshold and the second similarity is greater than a second threshold, determining that the website to be detected is a website of a specified type.
6. The method according to claim 5, characterized in that, before the data to be detected of the website to be detected is obtained, the method further comprises:
building the abnormal website library and the sensitive dictionary;
building an abnormality detection library according to the abnormal website library and the sensitive dictionary.
7. The method according to claim 6, characterized in that building the sensitive dictionary comprises:
obtaining a data set of the multiple abnormal websites;
processing the data set of the multiple abnormal websites to obtain a tree path set of the data set;
extracting the keywords in the tree path set, wherein the keywords are sensitive keywords;
building the sensitive dictionary according to the extracted keywords.
8. The method according to claim 6, characterized in that building the abnormal website library comprises:
obtaining the DOM trees of the multiple abnormal websites;
decomposing the DOM trees to obtain a tree path set;
determining the similarity of the web page structures of the multiple abnormal websites according to the tree path set;
clustering the multiple abnormal websites according to the similarity of the web page structures to obtain a clustering result;
building the abnormal website library according to the clustering result.
9. The method according to claim 8, characterized in that determining the similarity of the web page structures of the multiple abnormal websites according to the tree path set comprises:
obtaining a first similarity between each path in the tree path set and its matching path, wherein the matching path is the path with the highest similarity to that path;
determining the similarity of the web page structures according to the first similarity.
10. The method according to claim 8, characterized in that building the abnormal website library according to the clustering result comprises:
determining a template web page for each class of abnormal website according to the clustering result;
building the abnormal website library based on the template web pages.
11. The method according to claim 6, characterized in that the abnormality detection library further comprises a domain name information library, the domain name information library comprising a violating domain name information library and a domain name price library, wherein, before the website to be detected is determined to be a website of the specified type, the method further comprises:
obtaining the domain name information of the website to be detected;
determining a third similarity between the domain name of the website to be detected and the domain names in the violating domain name information library, and/or determining the domain name price of the website to be detected according to the domain name price library.
12. The method according to claim 11, characterized in that determining that the website to be detected is a website of the specified type comprises:
if the first similarity is greater than the first threshold, the second similarity is greater than the second threshold, and the third similarity is greater than a third preset value and/or the domain name price of the website to be detected is less than a preset price, determining that the website to be detected is a website of the specified type.
13. The method according to any one of claims 6 or 11, characterized in that, after the website to be detected is determined to be a website of the specified type, the method further comprises:
storing the website of the specified type in the abnormality detection library.
14. A method for detecting a website, characterized by comprising:
receiving data information of a website to be detected;
evaluating the data information of the website to be detected based on multiple abnormality detection libraries to obtain a risk value of the website to be detected, wherein different abnormality detection libraries correspond to different judgment rules, and the judgment rules are used to determine the risk value of the website to be detected under the different abnormality detection libraries;
determining the website type of the website to be detected based on the risk value of the website to be detected.
15. The method according to claim 14, characterized in that the multiple abnormality detection libraries comprise at least: a sensitive dictionary, an abnormal website library and a domain name information library, wherein the abnormal website library contains the web page structures of multiple abnormal websites.
16. The method according to claim 14, characterized in that, after the data information of the website to be detected is evaluated based on the multiple abnormality detection libraries to obtain the risk value of the website to be detected, the method further comprises:
in the case where the website to be detected is determined to be an abnormal website, storing the website to be detected in the corresponding abnormality detection library.
17. The method according to claim 15, characterized in that, before the data information of the website to be detected is evaluated based on the multiple abnormality detection libraries to obtain the risk value of the website to be detected, the method further comprises: building the sensitive dictionary and the abnormal website library, wherein building the sensitive dictionary comprises:
obtaining a data set of abnormal websites;
processing the data set of the abnormal websites to obtain a tree path set of the data set;
extracting the keywords in the tree path set, wherein the keywords are sensitive keywords;
building the sensitive dictionary according to the extracted keywords.
18. The method according to claim 15, characterized in that building the abnormal website library comprises:
obtaining the DOM trees of the multiple abnormal websites;
decomposing the DOM trees to obtain a tree path set;
determining the similarity of the web page structures of the multiple abnormal websites according to the tree path set;
clustering the multiple abnormal websites according to the similarity of the web page structures to obtain a clustering result;
building the abnormal website library according to the clustering result.
19. The method according to claim 15, characterized in that evaluating the data information of the website to be detected based on the multiple abnormality detection libraries to obtain the risk value of the website to be detected comprises:
obtaining a first risk value by detecting the data information of the website to be detected based on the sensitive dictionary;
obtaining a second risk value by detecting the data information of the website to be detected based on the abnormal website library;
obtaining a third risk value by detecting the data information of the website to be detected based on the domain name information library;
performing a weighted summation of the first risk value, the second risk value and the third risk value to determine the risk value of the website to be detected.
20. The method according to claim 19, characterized in that determining the website type of the website to be detected based on the risk value of the website to be detected comprises:
judging whether the risk value of the website to be detected is greater than a preset risk value;
in the case where the risk value of the website to be detected is greater than the preset risk value, determining that the website to be detected is the abnormal website.
21. A system for detecting a website, comprising:
An input unit, configured to obtain a website to be detected;
A processor, configured to determine the similarity between the web page structure of the website to be detected and that of a benchmark website, and, in the case where the similarity is greater than a first preset value, to determine that the website to be detected is a website of a specified type if a keyword of the specified type exists in the website to be detected.
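The processor's two-stage decision can be sketched as below, assuming the structural similarity has already been computed elsewhere. The function name `is_specified_type`, the case-insensitive substring match, and the sample values are assumptions for illustration; the claim does not say how keywords are matched, and a `False` result here only means that this particular rule did not fire.

```python
def is_specified_type(similarity, page_text, keywords, first_preset_value):
    """Return True when the page is structurally close to the benchmark
    website AND contains at least one keyword of the specified type."""
    if similarity <= first_preset_value:
        # The keyword check is only performed above the threshold.
        return False
    text = page_text.lower()
    return any(keyword.lower() in text for keyword in keywords)


# Hypothetical usage with made-up values.
hit = is_specified_type(
    similarity=0.92,
    page_text="Welcome! Place your BET now and win the jackpot.",
    keywords=["bet", "jackpot"],
    first_preset_value=0.8,
)
```

Running the structural comparison first keeps the keyword scan off pages that do not resemble the benchmark, so a keyword on an unrelated page does not trigger this rule.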
22. The system according to claim 21, further comprising:
A memory, configured to store the website to be detected that has been determined to be a website of the specified type.
23. A storage medium, comprising a stored program, wherein, when the program runs, the device on which the storage medium is located is controlled to execute the following steps:
Determining the similarity between the web page structure of a website to be detected and that of a benchmark website;
In the case where the similarity is greater than a first preset value, judging whether a keyword of a specified type exists in the website to be detected;
In the case where it is determined that the keyword of the specified type exists in the website to be detected, determining that the website to be detected is a website of the specified type.
24. A processor, configured to run a program, wherein the following steps are executed when the program runs:
Determining the similarity between the web page structure of a website to be detected and that of a benchmark website;
In the case where the similarity is greater than a first preset value, judging whether a keyword of a specified type exists in the website to be detected;
In the case where it is determined that the keyword of the specified type exists in the website to be detected, determining that the website to be detected is a website of the specified type.
25. A system for detecting a website, comprising:
A processor; and
A memory, connected to the processor and configured to provide the processor with instructions for the following processing steps:
Determining the similarity between the web page structure of a website to be detected and that of a benchmark website;
In the case where the similarity is greater than a first preset value, judging whether a keyword of a specified type exists in the website to be detected;
In the case where it is determined that the keyword of the specified type exists in the website to be detected, determining that the website to be detected is a website of the specified type.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810164312.4A CN110309402A (en) | 2018-02-27 | 2018-02-27 | Detect the method and system of website |
Publications (1)
Publication Number | Publication Date |
---|---|
CN110309402A (en) | 2019-10-08 |
Family
ID=68073643
Family Applications (1)
Application Number | Status | Publication | Priority Date | Filing Date | Title |
---|---|---|---|---|---|
CN201810164312.4A | Pending | CN110309402A (en) | 2018-02-27 | 2018-02-27 | Detect the method and system of website |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110309402A (en) |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102739679A (en) * | 2012-06-29 | 2012-10-17 | 东南大学 | URL(Uniform Resource Locator) classification-based phishing website detection method |
US20130086677A1 (en) * | 2010-12-31 | 2013-04-04 | Huawei Technologies Co., Ltd. | Method and device for detecting phishing web page |
CN103179095A (en) * | 2011-12-22 | 2013-06-26 | 阿里巴巴集团控股有限公司 | Method and client device for detecting phishing websites |
CN103685307A (en) * | 2013-12-25 | 2014-03-26 | 北京奇虎科技有限公司 | Method, system, client and server for detecting phishing fraud webpage based on feature library |
CN104615760A (en) * | 2015-02-13 | 2015-05-13 | 北京瑞星信息技术有限公司 | Phishing website recognizing method and phishing website recognizing system |
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111078962A (en) * | 2019-12-24 | 2020-04-28 | 北京海致星图科技有限公司 | Method, system, medium and device for finding similar website sections |
CN112328732A (en) * | 2020-10-22 | 2021-02-05 | 上海艾融软件股份有限公司 | Sensitive word detection method and device and sensitive word tree construction method and device |
CN112347327A (en) * | 2020-10-22 | 2021-02-09 | 杭州安恒信息技术股份有限公司 | Website detection method and device, readable storage medium and computer equipment |
CN112347327B (en) * | 2020-10-22 | 2024-03-19 | 杭州安恒信息技术股份有限公司 | Website detection method and device, readable storage medium and computer equipment |
CN116680700A (en) * | 2023-05-18 | 2023-09-01 | 北京天融信网络安全技术有限公司 | Risk detection method, apparatus, device and storage medium |
CN116680700B (en) * | 2023-05-18 | 2024-06-14 | 北京天融信网络安全技术有限公司 | Risk detection method, apparatus, device and storage medium |
Similar Documents
Publication | Title | Publication Date |
---|---|---|
CN109271512B (en) | Emotion analysis method, device and storage medium for public opinion comment information | |
CN110275958B (en) | Website information identification method and device and electronic equipment | |
CN110309402A (en) | Detect the method and system of website | |
CN111881983B (en) | Data processing method and device based on classification model, electronic equipment and medium | |
CN109918560A (en) | A kind of answering method and device based on search engine | |
CN106570513A (en) | Fault diagnosis method and apparatus for big data network system | |
CN107220296A (en) | The generation method of question and answer knowledge base, the training method of neutral net and equipment | |
CN105577685A (en) | Intrusion detection independent analysis method and system in cloud calculation environment | |
CN108229170B (en) | Software analysis method and apparatus using big data and neural network | |
CN110222171A (en) | A kind of application of disaggregated model, disaggregated model training method and device | |
CN111866004B (en) | Security assessment method, apparatus, computer system, and medium | |
CN110134961A (en) | Processing method, device and the storage medium of text | |
CN110019519A (en) | Data processing method, device, storage medium and electronic device | |
CN110414581B (en) | Picture detection method and device, storage medium and electronic device | |
CN109657459A (en) | Webpage back door detection method, equipment, storage medium and device | |
CN109634820A (en) | A kind of fault early warning method, relevant device and the system of the collaboration of cloud mobile terminal | |
CN110365691A (en) | Fishing website method of discrimination and device based on deep learning | |
CN110472866A (en) | A kind of work order quality inspection analysis method and device | |
CN113704420A (en) | Method and device for identifying role in text, electronic equipment and storage medium | |
CN114462040A (en) | Malicious software detection model training method, malicious software detection method and malicious software detection device | |
CN112131354B (en) | Answer screening method and device, terminal equipment and computer readable storage medium | |
CN113628043A (en) | Complaint validity judgment method, device, equipment and medium based on data classification | |
CN108875374B (en) | Malicious PDF detection method and device based on document node type | |
CN115879110A (en) | System for identifying financial risk website based on fingerprint penetration technology | |
CN109714342A (en) | The guard method of a kind of electronic equipment and device |
Legal Events
Code | Title | Description |
---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
REG | Reference to a national code | Ref country code: HK; Ref legal event code: DE; Ref document number: 40015537; Country of ref document: HK |
RJ01 | Rejection of invention patent application after publication | Application publication date: 20191008 |