CN102436564A - Method and device for identifying falsified webpage - Google Patents
- Publication number
- CN102436564A (application number CN201110456172A)
- Authority
- CN
- China
- Prior art keywords
- content
- pages
- webpage
- request
- distorted
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Landscapes
- Information Transfer Between Computers (AREA)
Abstract
The invention discloses a method and device for identifying a falsified webpage. The method comprises the following steps: initiating a request to access a target webpage by simulating entry of a uniform resource locator (URL) in a browser address bar, and determining the obtained page content as first page content; initiating a request to access the target webpage by simulating a jump via a link, and determining the obtained page content as second page content; comparing the first page content with the second page content to obtain a comparison result; and identifying, according to the comparison result, whether the target webpage is a falsified webpage. With the method and device, whether a target webpage has been falsified can be effectively identified, giving users and computer services an effective means of making that judgment.
Description
Technical field
The present invention relates to the field of computer technology, and in particular to a method and device for identifying a tampered webpage.
Background art
Now that e-commerce and e-government are increasingly widespread, websites have become the public face of enterprises, institutions and government bodies, and an important means of releasing news, conducting business and providing services. If a website's pages are tampered with, normal business is disrupted and the image of the enterprise or the credibility of the government may suffer incalculable damage. Worse, some criminals use webpage tampering as a tool for fraud. Tampering with government websites, especially with politically motivated content, can seriously harm the government's image; others with ulterior motives may exploit the public's trust in government websites to alter the meaning of a page and spread rumors, causing unnecessary public fear and suspicion and great losses to the country and its people.
For example, a health and epidemic-prevention bulletin on a government website reading "an intestinal flu virus has been found in this area" might be altered to "avian influenza virus has been found in this area". If the message is then reprinted by other online media, the inevitable result is unnecessary public panic and heavy economic losses. As another example, if the price of an item on an e-commerce website is changed from 1000 yuan to 10 yuan, orders flood in, and the site faces the dilemma of being unable to protect both its actual profits and its business reputation.
With the rapid development of the Internet, website intrusions and webpage tampering occur frequently, and various hacking techniques are abused on the Internet, causing immeasurable losses to individuals and organizations every year.
Therefore, a technical problem that urgently needs to be solved by those skilled in the art is how to provide an effective method for determining whether a webpage has been tampered with, giving users and other computer services an effective means of making that judgment.
Summary of the invention
The invention provides a method and device for identifying a tampered webpage, which can effectively identify tampered webpages and give users and other computer services an effective means of judging whether a webpage has been tampered with.
The invention provides the following technical solutions:
A method for identifying a tampered webpage, comprising:
initiating a request to access a target webpage by simulating entry of a uniform resource locator (URL) in a browser address bar, and determining the obtained page content as first page content;
initiating a request to access the target webpage by simulating a jump via a link, and determining the obtained page content as second page content;
comparing the first page content with the second page content to obtain a comparison result; and
identifying, according to the comparison result, whether the target webpage is a tampered webpage.
Wherein, initiating the request to access the target webpage by simulating a jump via a link comprises:
initiating the request to access the target webpage by simulating a jump via a link in the search results provided by a search engine.
Wherein, comparing the first page content with the second page content to obtain a comparison result comprises:
comparing key elements of the first page content with those of the second page content to obtain a comparison result.
Wherein, comparing the first page content with the second page content to obtain a comparison result comprises:
comparing the first page content with the second page content to obtain the similarity between them;
and identifying, according to the comparison result, whether the target webpage is a tampered webpage comprises:
identifying whether the target webpage is a tampered webpage according to whether the similarity between the first page content and the second page content reaches a preset threshold.
A device for identifying a tampered webpage, comprising:
a first page content acquisition unit, configured to initiate a request to access a target webpage by simulating entry of a URL in a browser address bar, and to determine the obtained page content as first page content;
a second page content acquisition unit, configured to initiate a request to access the target webpage by simulating a jump via a link, and to determine the obtained page content as second page content;
a comparison unit, configured to compare the first page content with the second page content to obtain a comparison result; and
an identification unit, configured to identify, according to the comparison result, whether the target webpage is a tampered webpage.
Wherein, the second page content acquisition unit comprises:
a search engine jump subunit, configured to initiate the request to access the target webpage by simulating a jump via a link in the search results provided by a search engine.
Wherein, the comparison unit comprises:
a key element comparison subunit, configured to compare key elements of the first page content with those of the second page content to obtain a comparison result.
Wherein, the comparison unit is specifically configured to:
compare the first page content with the second page content to obtain the similarity between them;
and the identification unit is specifically configured to:
identify whether the target webpage is a tampered webpage according to whether the similarity between the first page content and the second page content reaches a preset threshold.
According to the specific embodiments provided by the invention, the invention achieves the following technical effects:
With the invention, a request to access the target webpage is initiated once by simulating entry of a URL in the browser address bar and once by simulating a jump via a link, and the page contents obtained are compared. Differences between the contents obtained through the two access modes are thereby found and the tampering is exposed, so whether the target webpage has been tampered with can be effectively identified.
Description of drawings
To describe the technical solutions in the embodiments of the invention or in the prior art more clearly, the drawings used in the embodiments are briefly introduced below. Obviously, the drawings described below show only some embodiments of the invention; a person of ordinary skill in the art may derive other drawings from them without creative effort.
Fig. 1 is a flowchart of the method provided by an embodiment of the invention;
Fig. 2 is a schematic diagram of the device provided by an embodiment of the invention.
Embodiment
The technical solutions in the embodiments of the invention are described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the invention. All other embodiments obtained by a person of ordinary skill in the art based on these embodiments fall within the protection scope of the invention.
It should first be noted that when an Internet user visits a webpage, whether by typing a URL directly into the browser's address bar or by following a link, the browser on the local computer actually sends a HyperText Transfer Protocol (HTTP) request to the server over the Internet. This HTTP request usually includes one or more required or optional request headers (also called header fields), which carry information about the request being made to the server.
For example, the Accept-Charset request header indicates the character sets the local browser accepts. The User-Agent request header may contain the client's operating system and version, CPU type, browser and version, rendering engine, browser language, browser plug-ins and so on, so that the server can examine it and, when responding, generate and send different pages for users' different hardware and software environments. The Referer request header contains a URL and tells the server that the request was reached by a jump from that URL; that is, the user came to the currently requested page from the page that URL represents. Because websites now cooperate closely commercially and depend heavily on search engines, most page-jump requests carry a Referer header, which among other things makes it easy for servers to compile access statistics.
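To make the three headers concrete, the following Python sketch parses a hypothetical raw request of the kind a browser might send after following a link from a search results page. The request text, host and URLs are illustrative assumptions, not taken from the patent, and the parser only handles simple `Name: value` header lines.

```python
def parse_request_headers(raw: str) -> tuple:
    """Split a raw HTTP/1.1 request head into its request line and headers.

    Header names are lower-cased because HTTP header names are
    case-insensitive. Parsing stops at the first blank line (end of head).
    """
    lines = raw.replace("\r\n", "\n").split("\n")
    request_line = lines[0]
    headers = {}
    for line in lines[1:]:
        if not line.strip():
            break  # blank line separates the head from the body
        name, _, value = line.partition(":")  # split on the first colon only
        headers[name.strip().lower()] = value.strip()
    return request_line, headers


# Hypothetical request sent after clicking a link on a search results page.
SAMPLE = (
    "GET /news/bulletin.html HTTP/1.1\r\n"
    "Host: www.example.gov\r\n"
    "User-Agent: Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; Trident/5.0)\r\n"
    "Accept-Charset: utf-8, iso-8859-1;q=0.5\r\n"
    "Referer: https://www.google.com/search?q=bulletin\r\n"
    "\r\n"
)

if __name__ == "__main__":
    line, hdrs = parse_request_headers(SAMPLE)
    print(line)            # GET /news/bulletin.html HTTP/1.1
    print(hdrs["referer"])  # https://www.google.com/search?q=bulletin
```

A request typed into the address bar would look the same minus the `Referer` line, which is exactly the difference the method below exploits.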
It should also be noted that search engines are now in widespread use and have become an essential tool for using the Internet, providing information in every field and making daily life more convenient. Web crawlers, one of the basic components of a search engine, play a vital role in its ability to supply this information. A web crawler is a program or script that runs around the clock and automatically downloads, analyzes and extracts webpage information on the World Wide Web according to certain rules; it accesses pages served by Web servers on the Internet and thereby supplies the search engine's information source. When a web crawler accesses a Web server, the HTTP headers of its requests usually contain information specific to the search engine. For example, the User-Agent request header contains the name of the search engine's own crawler program, such as "Googlebot" for the Google search engine.
In network security, the contest between hackers on one side and security service providers and computer users on the other never stops. When carrying out an attack, hackers usually adopt strategies to disguise and conceal their illegal activity so that it is not discovered. One characteristic of a hacking technique for webpage tampering shows up in a situation users often encounter while browsing: when a user types the target address directly into the browser's address bar, the normal, untampered webpage opens; but when the user reaches the same page through a search engine's results or via a link on another webpage, the page that opens has been tampered with. Its content differs considerably from the original, sometimes beyond recognition, so that it no longer conveys the information the original page was meant to present.
In practice, when an ordinary Internet user opens a new webpage, in most cases the user does not type the page's actual address into the address bar, because most full addresses are long, hard to remember and time-consuming to type. Instead, users usually reach a page through a search engine's results or by following a link on another webpage. Moreover, much web browsing has no specific destination: when users find something interesting on the page they are viewing, they typically jump to it through a link on that page.
By contrast, people who genuinely care about a specific page's content, such as the website's owner or administrator, know the page's address. In most cases they do not reach the page through search results or links on other pages but type the target address directly into the browser's address bar. What they see is then the normal, untampered page, so these particular viewers find it hard to discover the tampered content.
It can thus be seen that ordinary users mostly reach a webpage by following a link, whereas special groups such as website owners and administrators, who usually have no need to follow links, typically type the page's actual address directly into the address bar. As a result, in most cases this group cannot discover the tampered parts of a page. These browsing habits give attackers an opportunity: a hacker whose tampering exhibits the characteristics described above can effectively conceal it.
In the course of making the invention, the inventors found the technical reason why the content shown when a page is opened by typing its address into the address bar differs considerably from the content shown when the same page is opened through search results or a link on another page: while the user is accessing the webpage, the hacker who tampered with it hijacks the HTTP request sent by the user's browser, analyzes the request's characteristics, and responds differently according to the result, so that the user receives different page content. This is described in detail below.
When a user initiates access to a webpage, the browser actually sends an HTTP request to the Web server. The hacker can hijack and analyze this request and handle it according to its characteristics. If the requested target address came from the user typing it directly into the address bar, the request is let through, the target Web server returns the normal page content, and the user's browser therefore displays normal, untampered content. If instead the browser sent the request after a jump from a search engine's results or from a link on another webpage, the tampered page is returned to the user directly.
Specifically, what the hacker analyzes in a hijacked HTTP request sent to the target Web server is the information contained in its HTTP headers. For example, by analyzing the Referer request header, the hacker obtains the URL it contains, that is, the page from which the user came to the currently requested page, and can thus determine whether the request was sent by a jump from a link on a particular webpage. As another example, by analyzing the User-Agent request header, the hacker learns what software the sender of the request is using, such as a browser operated by a user or a search engine's crawler.
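The attacker's decision logic described above can be sketched as follows, purely to make the behaviour concrete. The crawler tokens other than "Googlebot", the three labels and the function names are assumptions for illustration; real cloaking code may differ.

```python
# Substrings identifying search engine crawlers in User-Agent. "Googlebot"
# comes from the text above; the other two are illustrative assumptions.
CRAWLER_TOKENS = ("googlebot", "bingbot", "baiduspider")


def classify_request(headers: dict) -> str:
    """Label a request the way a cloaking attacker might.

    Returns "crawler" for search engine crawlers, "link" when a Referer
    header shows the visitor arrived by following a link, and "direct"
    when no Referer is present (URL typed into the address bar).
    """
    h = {k.lower(): v for k, v in headers.items()}  # header names are case-insensitive
    ua = h.get("user-agent", "").lower()
    if any(token in ua for token in CRAWLER_TOKENS):
        return "crawler"
    if h.get("referer"):
        return "link"
    return "direct"


def cloaked_response(headers: dict, normal: str, tampered: str) -> str:
    """Pass direct visits through; serve tampered content to everyone else."""
    return normal if classify_request(headers) == "direct" else tampered
```

Because only "direct" visits see the normal page, a detector that requests the page both ways can observe the discrepancy.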
Based on this analysis of the hijacked request, the hacker decides whether to let the request through, so that the target Web server returns the normal page, or to return the tampered page. This is why the same webpage shows different content when opened in different ways, and why the pages collected by some search engines' crawlers, and hence those engines' search results, may contain false information.
Based on the above analysis, an embodiment of the invention provides a method for identifying a tampered webpage. Referring to Fig. 1, the method comprises the following steps:
S101: initiate a request to access the target webpage by simulating entry of a URL in a browser address bar, and determine the obtained page content as first page content;
In this embodiment, an HTTP request is first constructed that simulates accessing the target webpage by entering its URL in a browser address bar, so that the constructed request has the characteristics of such an access request. An HTTP request initiated by entering a URL in the address bar usually contains no Referer request header. In addition, the constructed request includes a User-Agent request header describing a user's browser, for example:
User-Agent: Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; Trident/5.0)
This example User-Agent header gives the browser type and version, the user's operating system version and other information. Together with the absence of a Referer header, it makes the request look like one initiated by a user entering a URL in a browser address bar.
An HTTP request with the above characteristics is constructed to simulate accessing the target webpage by entering its URL in a browser address bar. The constructed request is sent to the target Web server, and the page content obtained is determined as the first page content.
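A minimal sketch of step S101 using Python's standard `urllib.request`, assuming that a browser-style User-Agent plus the absence of a Referer header is enough to look like an address-bar visit. The `Accept` value, the 10-second timeout and the helper names are assumptions.

```python
import urllib.request

# Browser-style User-Agent taken from the example above.
BROWSER_UA = "Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; Trident/5.0)"


def build_direct_request(url: str) -> urllib.request.Request:
    """Build a request that mimics typing `url` into the address bar.

    A typed URL normally carries no Referer header, so none is added;
    only browser-like headers are set.
    """
    headers = {
        "User-Agent": BROWSER_UA,
        "Accept": "text/html,application/xhtml+xml",  # assumed browser-like value
    }
    return urllib.request.Request(url, headers=headers, method="GET")


def fetch(req: urllib.request.Request, timeout: float = 10.0) -> str:
    """Send the request and return the response body decoded as text."""
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")


# Usage (performs a network request):
# first_page = fetch(build_direct_request("http://www.example.gov/news/bulletin.html"))
```

Note that `urllib` stores header names via `str.capitalize()`, so the header is read back as `req.get_header("User-agent")`.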
Because the constructed request has the characteristics of one initiated by entering a URL in the address bar, a hacker who hijacks and analyzes it will, given the behavior described above, classify it as such and let it through, and the Web server will return the normal page content. In this embodiment, therefore, the first page content obtained is the normal page content.
S102: initiate a request to access the target webpage by simulating a jump via a link, and determine the obtained page content as second page content;
Besides obtaining the first page content, an HTTP request is also constructed that simulates accessing the target webpage by a jump via a link, so that it has the characteristics of such a request. A request initiated by following a link contains a Referer request header holding a URL, which indicates that the request was reached by a jump from that URL; that is, the request accesses the current page starting from the URL in the Referer header. The Referer header therefore marks the request as one initiated by following a link.
An HTTP request with such a Referer header is constructed to simulate accessing the target webpage by following a link. The constructed request is sent to the target Web server, and the page content obtained is determined as the second page content.
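Step S102 can be sketched the same way; the only essential difference is the Referer header. Here it is set to a search results URL, matching the search engine jump subunit. The default search URL, the query text and the helper names are hypothetical.

```python
import urllib.parse
import urllib.request

BROWSER_UA = "Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; Trident/5.0)"


def search_referer(query: str, engine: str = "https://www.google.com/search") -> str:
    """Hypothetical search results URL to use as the Referer value."""
    return engine + "?" + urllib.parse.urlencode({"q": query})


def build_link_request(url: str, referer: str) -> urllib.request.Request:
    """Build a request that mimics arriving at `url` by clicking a link on `referer`."""
    headers = {
        "User-Agent": BROWSER_UA,
        "Accept": "text/html,application/xhtml+xml",
        "Referer": referer,  # the header that marks a link jump
    }
    return urllib.request.Request(url, headers=headers, method="GET")


# Usage (the response body would be the second page content):
# req = build_link_request("http://www.example.gov/news/bulletin.html",
#                          search_referer("health bulletin"))
```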
Because the constructed request has the characteristics of one initiated by following a link, a hacker who hijacks and analyzes it will, given the behavior described above, classify it as such and return the tampered page content. In this embodiment, therefore, if the target webpage has been tampered with, the second page content obtained with the constructed request is the tampered page content.
S103: compare the first page content with the second page content to obtain a comparison result;
In a specific implementation, the first and second page content can be compared in several ways. One way is to compare the full content of the first page with the full content of the second page, which gives a relatively accurate result. Specifically, a DOM (Document Object Model) tree is generated for each page from its HTML code, and the two trees are compared according to whether the elements on each pair of corresponding nodes are identical.
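Python's standard library has no HTML DOM builder, so this sketch builds a simplified tree with `html.parser.HTMLParser` and compares the two trees node by node (tag, attributes and directly contained text). It handles unclosed tags only crudely; a production system would more likely use a full HTML5 parser.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag in HTML.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "param", "source", "track", "wbr"}


class Node:
    def __init__(self, tag, attrs):
        self.tag = tag
        self.attrs = dict(attrs)
        self.text = ""       # text directly inside this element
        self.children = []


class TreeBuilder(HTMLParser):
    def __init__(self):
        super().__init__()
        self.root = Node("#document", [])
        self.stack = [self.root]

    def handle_starttag(self, tag, attrs):
        node = Node(tag, attrs)
        self.stack[-1].children.append(node)
        if tag not in VOID_TAGS:
            self.stack.append(node)  # descend until the matching end tag

    def handle_startendtag(self, tag, attrs):
        self.stack[-1].children.append(Node(tag, attrs))  # e.g. <br/>

    def handle_endtag(self, tag):
        # Pop back to the nearest open element with this tag; ignore strays.
        for i in range(len(self.stack) - 1, 0, -1):
            if self.stack[i].tag == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        self.stack[-1].text += data.strip()


def build_tree(html: str) -> Node:
    builder = TreeBuilder()
    builder.feed(html)
    builder.close()
    return builder.root


def trees_equal(a: Node, b: Node) -> bool:
    """True if both trees have identical tags, attributes and text at every node."""
    if (a.tag, a.attrs, a.text) != (b.tag, b.attrs, b.text):
        return False
    if len(a.children) != len(b.children):
        return False
    return all(trees_equal(x, y) for x, y in zip(a.children, b.children))
```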
In practice, however, comparing the full contents of the two pages can incur considerable system overhead. Besides the full-comparison strategy, another implementation can therefore be used: generate the DOM trees of the first and second pages from their HTML code, select some corresponding nodes on the two trees, and compare the elements on those nodes. The nodes can be selected at random or specified according to some strategy, as needed.
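A sketch of the sampled variant, assuming "corresponding nodes" means elements at the same position in document order. Treating a different element count as a mismatch, and the default sample size `k=20`, are assumptions added here.

```python
import random
from html.parser import HTMLParser


class NodeLister(HTMLParser):
    """Records every element as [tag, attributes, text] in document order.

    Text is attached to the most recently opened element, a simplification
    that is adequate for comparing two versions of the same page.
    """

    def __init__(self):
        super().__init__()
        self.nodes = []

    def handle_starttag(self, tag, attrs):
        self.nodes.append([tag, tuple(sorted((k, v or "") for k, v in attrs)), ""])

    def handle_data(self, data):
        if self.nodes and data.strip():
            self.nodes[-1][2] += data.strip()


def list_nodes(html: str) -> list:
    parser = NodeLister()
    parser.feed(html)
    parser.close()
    return [tuple(n) for n in parser.nodes]


def sampled_nodes_match(html_a: str, html_b: str, k: int = 20, seed=None) -> bool:
    """Compare k randomly chosen corresponding nodes instead of all of them."""
    a, b = list_nodes(html_a), list_nodes(html_b)
    if len(a) != len(b):
        return False  # different element counts: the structure changed
    if not a:
        return True
    rng = random.Random(seed)
    picked = rng.sample(range(len(a)), min(k, len(a)))
    return all(a[i] == b[i] for i in picked)
```

Sampling trades accuracy for speed: a small tampered region can escape a small sample, so `k` should be tuned against the overhead the text mentions.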
Alternatively, the comparison can be made as follows: compare key elements of the first page content with the corresponding key elements of the second page content to obtain a comparison result. The key elements to compare can be determined according to actual needs. One strategy is to first take the files included in the page, such as pictures, Flash and audio/video files, together with content such as keywords in the page and the page title, as the page's key element set, and then take a subset of this set as the objects to compare between the first and second page content. When files such as pictures, Flash and audio/video are used as key elements, they can be compared by indicators such as file name, size and checksum; the file name can be obtained directly from the page's HTML code, while the file size and checksum can be obtained by computation.
Specifically, when comparing the key elements of the first page content with the corresponding key elements of the second page content, after determining the subset of key elements to compare, first locate each key element in the first page according to its attributes in the HTML code, then check whether a corresponding key element exists in the second page, and compare whether the elements are identical.
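A sketch of the key element comparison, assuming the key elements are the `<title>`, the `<meta name="keywords">` content, and the URLs of embedded files (`img`, `embed`, `audio`, `video`, `source` via `src`, `object` via `data`). SHA-256 stands in for "checksum", and `fetch_a`/`fetch_b` are hypothetical downloaders that fetch a file using the first and second request mode respectively.

```python
import hashlib
from html.parser import HTMLParser


class KeyElementExtractor(HTMLParser):
    """Collects the page title, meta keywords and embedded file URLs."""

    FILE_TAGS = {"img": "src", "embed": "src", "audio": "src",
                 "video": "src", "source": "src", "object": "data"}

    def __init__(self):
        super().__init__()
        self.title = ""
        self.keywords = ""
        self.files = []  # images, Flash and audio/video, in document order
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta" and (a.get("name") or "").lower() == "keywords":
            self.keywords = a.get("content") or ""
        elif tag in self.FILE_TAGS and a.get(self.FILE_TAGS[tag]):
            self.files.append(a[self.FILE_TAGS[tag]])

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data.strip()


def extract(html: str) -> KeyElementExtractor:
    ex = KeyElementExtractor()
    ex.feed(html)
    ex.close()
    return ex


def fingerprint(data: bytes) -> tuple:
    """File size plus a SHA-256 checksum."""
    return len(data), hashlib.sha256(data).hexdigest()


def compare_key_elements(html_a, html_b, fetch_a=None, fetch_b=None) -> dict:
    """Compare title, keywords, file names and (optionally) file contents."""
    ea, eb = extract(html_a), extract(html_b)
    result = {
        "title": ea.title == eb.title,
        "keywords": ea.keywords == eb.keywords,
        "file_names": ea.files == eb.files,
    }
    if fetch_a is not None and fetch_b is not None and result["file_names"]:
        result["file_contents"] = all(
            fingerprint(fetch_a(url)) == fingerprint(fetch_b(url)) for url in ea.files
        )
    return result
```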
The comparison result can be expressed in several ways. For example, it can be classified as identical or not identical, or the comparison of the first and second page content can be quantified as a similarity between them.
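One way to quantify the similarity, shown here as an assumption rather than the patent's prescribed measure, is the ratio from Python's `difflib.SequenceMatcher` over the visible word tokens of each page: with M matched tokens and T tokens in total, the ratio is 2M/T, which lies in [0, 1].

```python
import difflib
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects visible text tokens, skipping <script> and <style> contents."""

    def __init__(self):
        super().__init__()
        self.tokens = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.tokens.extend(data.split())


def page_tokens(html: str) -> list:
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.tokens


def similarity(html_a: str, html_b: str) -> float:
    """Similarity in [0, 1]: 2*M/T over word tokens (difflib's ratio)."""
    matcher = difflib.SequenceMatcher(None, page_tokens(html_a),
                                      page_tokens(html_b), autojunk=False)
    return matcher.ratio()
```

For example, "this area finds enterovirus" against "this area finds avian influenza virus" shares 3 of 10 tokens in order, giving 2*3/10 = 0.6.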
S104: identify, according to the comparison result, whether the target webpage is a tampered webpage.
In a specific implementation, whether the target page is a tampered webpage can be identified from the comparison result in several ways. One way is to identify the target webpage as a normal webpage if the comparison result is "identical" and as a tampered webpage if it is "not identical".
Alternatively, when the comparison result is a specific value of the similarity between the first and second page content, that value can be used to identify whether the target webpage is a tampered webpage. This approach matters in practice for the following reason:
In practice, to raise their access frequency and search ranking and to build their reputation, many webpages need search engines' crawlers to crawl them frequently. If a page contains only static, unchanging content, however, crawlers may crawl it less often, which lowers the probability that users reach it through the search engine and prevents its click-through rate from rising that way. Page authors therefore deliberately place some dynamically changing content in the page. This dynamic content may be only a small fraction of the page, while most of the content carrying the page's subject stays unchanged (since its only purpose is to increase how often search engine crawlers crawl the page). This leads to the following situation: the first and second page content obtained with the method of this embodiment are highly similar, and although the similarity is not 100%, the page should not be judged to be tampered. If the rule "identify the target webpage as normal if the comparison result is identical and as tampered if it is not" were applied directly, some normal webpages could be misidentified as tampered.
Therefore, to reduce the chance of misjudgment, the strategy of using the specific similarity value between the first and second page content to identify whether the target webpage is tampered is adopted. The reason is that dynamic content deliberately added by the page author is usually only a small part of the page, whereas a hacker who tampers with a webpage usually alters most of its content. Thus, after the two page contents are obtained by the method of this embodiment, if they are not identical but highly similar, the page can be treated as normal; if the similarity is very low, it can be treated as tampered. In a specific implementation, a threshold is set in advance and the similarity obtained by comparing the first and second page content is compared with it: if the similarity is below the preset threshold, the target page is identified as tampered; otherwise it is identified as normal. The preset threshold can be set according to actual needs, or set dynamically: through repeated practice and calibration, the threshold is tuned to a reasonable value, avoiding the risk of misjudgment when a webpage has undergone a normal update rather than tampering by a hacker.
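The decision rule and one possible calibration can be sketched as follows. The calibration scheme (a margin below the lowest similarity observed on pages known to be untampered, with a floor) and its default values are assumptions for illustration, not a procedure given in the text.

```python
def identify(similarity: float, threshold: float) -> str:
    """Step S104: below the threshold means tampered, otherwise normal."""
    if not 0.0 <= similarity <= 1.0:
        raise ValueError("similarity must lie in [0, 1]")
    return "tampered" if similarity < threshold else "normal"


def calibrate_threshold(normal_scores: list, margin: float = 0.05,
                        floor: float = 0.5) -> float:
    """Hypothetical calibration from pages known to be untampered.

    Pages with deliberately dynamic content still score below 1.0; placing
    the threshold a small margin under the lowest such score keeps them
    classified as normal. The margin and floor values are assumptions.
    """
    if not normal_scores:
        raise ValueError("need at least one score from a known-normal page")
    return max(floor, min(normal_scores) - margin)
```

With scores of 0.97, 0.93 and 0.95 from known-normal pages, the threshold becomes about 0.88, so a page scoring 0.6 is flagged as tampered.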
The method of being distorted webpage with the identification that the embodiment of the invention provides is corresponding, the device that the embodiment of the invention also provides a kind of identification to be distorted webpage, and referring to Fig. 2, this device comprises:
A first page content acquisition unit 201, configured to initiate a request to access the target webpage by simulating entry of a uniform resource locator (URL) in a browser address bar, and to determine the obtained page content as first page content;
A second page content acquisition unit 202, configured to initiate a request to access the target webpage by simulating a jump via a link, and to determine the obtained page content as second page content;
A comparison unit 203, configured to compare the first page content with the second page content to obtain a comparison result; and
A judging unit 204, configured to identify, according to the comparison result, whether the target webpage is a tampered webpage.
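The four units of Fig. 2 can be pictured as a small pipeline. The sketch below is illustrative only: it assumes the two fetch operations are injected as callables (so they can wrap real HTTP clients or test stubs), reuses a `difflib` ratio as the comparison, and uses a hypothetical threshold of 0.8. The class and attribute names are not from the source.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher
from typing import Callable

# A fetcher takes the target URL and returns the page content as text.
Fetcher = Callable[[str], str]


@dataclass
class TamperIdentificationDevice:
    """Illustrative grouping of units 201-204; names are hypothetical."""

    fetch_direct: Fetcher    # unit 201: simulates typing the URL in the address bar
    fetch_via_link: Fetcher  # unit 202: simulates jumping to the URL from a link
    threshold: float = 0.8   # hypothetical preset threshold used by unit 204

    def compare(self, first: str, second: str) -> float:
        """Unit 203: compare the two page contents, returning a similarity."""
        return SequenceMatcher(None, first, second).ratio()

    def judge(self, similarity: float) -> bool:
        """Unit 204: True means the target webpage is identified as tampered."""
        return similarity < self.threshold

    def identify(self, url: str) -> bool:
        first = self.fetch_direct(url)     # first page content
        second = self.fetch_via_link(url)  # second page content
        return self.judge(self.compare(first, second))
```

Injecting the fetchers keeps the comparison and judgment logic independent of how each visit is presented to the server, so the same object can be exercised with stubs that return fixed page content.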
The second page content acquisition unit 202 may comprise:
a search engine jump subunit, configured to initiate the request to access the target webpage by simulating a jump via a link in search results provided by a search engine.
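The source does not say which request attributes distinguish a "typed URL" visit from a "clicked search result" visit. One plausible assumption is the HTTP `Referer` header: a URL typed into the address bar is normally requested without one, while a click on a results page usually sends the results-page URL (or at least its origin, depending on referrer policy). The sketch below builds both requests with the standard `urllib.request.Request`; the search URL, the query parameter name `q`, and the User-Agent string are hypothetical.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical values for illustration; not taken from the source.
BROWSER_UA = "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"
SEARCH_RESULTS_URL = "https://search.example.com/s"


def build_address_bar_request(target_url: str) -> Request:
    """Mimic typing the URL in the address bar: no Referer header is sent."""
    return Request(target_url, headers={"User-Agent": BROWSER_UA})


def build_search_click_request(target_url: str, query: str) -> Request:
    """Mimic clicking the target in search results: Referer is the results page."""
    referer = SEARCH_RESULTS_URL + "?" + urlencode({"q": query})
    return Request(target_url, headers={"User-Agent": BROWSER_UA,
                                        "Referer": referer})
```

Either request object can be passed to `urllib.request.urlopen()` to obtain the first and second page content respectively.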
The comparison unit 203 may comprise:
a key element comparison subunit, configured to compare key elements of the first page content and the second page content to obtain a comparison result.
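The source does not define which elements count as "key". The sketch below assumes the page title, external script and iframe sources, link targets, and meta-refresh directives, since injected or redirected content tends to show up there; it extracts them with the standard `html.parser.HTMLParser` and reports the elements found in only one of the two page contents. All names are hypothetical.

```python
from __future__ import annotations

from html.parser import HTMLParser


class KeyElementParser(HTMLParser):
    """Collects (kind, value) pairs for a hypothetical choice of key elements."""

    def __init__(self) -> None:
        super().__init__()
        self.elements: set[tuple[str, str]] = set()
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)  # attribute values may be None for bare attributes
        if tag == "title":
            self._in_title = True
        elif tag in ("script", "iframe") and attrs.get("src"):
            self.elements.add((tag, attrs["src"]))
        elif tag == "a" and attrs.get("href"):
            self.elements.add(("a", attrs["href"]))
        elif tag == "meta" and (attrs.get("http-equiv") or "").lower() == "refresh":
            self.elements.add(("meta-refresh", attrs.get("content") or ""))

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title and data.strip():
            self.elements.add(("title", data.strip()))


def key_elements(html: str) -> set[tuple[str, str]]:
    parser = KeyElementParser()
    parser.feed(html)
    parser.close()
    return parser.elements


def compare_key_elements(first_html: str, second_html: str) -> set[tuple[str, str]]:
    """Return key elements present in one page content but not in the other."""
    return key_elements(first_html) ^ key_elements(second_html)
```

An empty result means the chosen key elements match; any non-empty result lists exactly which elements differ, which is easier to review than a single similarity number.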
In a specific implementation, the comparison unit 203 is specifically configured to:
compare the first page content with the second page content to obtain the similarity between the first page content and the second page content;
Correspondingly, the judging unit 204 is specifically configured to:
identify whether the target webpage is a tampered webpage according to whether the similarity between the first page content and the second page content reaches the preset threshold.
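The description notes that the preset threshold may also be set dynamically through repeated practice and calibration. One way to picture that calibration, assuming a collection of similarity scores for page pairs already labelled as normal updates or as tampering (the source does not describe the calibration data), is to pick the cut-off that produces the fewest misjudgments. The function name and the midpoint search are illustrative choices.

```python
from typing import Sequence


def calibrate_threshold(normal_scores: Sequence[float],
                        tampered_scores: Sequence[float]) -> float:
    """Choose the threshold that misjudges the fewest labelled examples.

    A pair is judged tampered when its similarity is below the threshold, so a
    normal score below it is a false alarm and a tampered score at or above it
    is a miss.
    """
    if not normal_scores or not tampered_scores:
        raise ValueError("need labelled scores for both normal and tampered pages")
    scores = sorted(set(normal_scores) | set(tampered_scores))
    if len(scores) == 1:
        return scores[0]
    # Candidate thresholds: midpoints between adjacent observed scores.
    candidates = [(low + high) / 2 for low, high in zip(scores, scores[1:])]

    def misjudgments(threshold: float) -> int:
        false_alarms = sum(s < threshold for s in normal_scores)
        misses = sum(s >= threshold for s in tampered_scores)
        return false_alarms + misses

    # min() keeps the first (lowest) candidate among equally good ones.
    return min(candidates, key=misjudgments)
```

Using midpoints keeps the threshold away from any single labelled example, and re-running the calibration as new labelled pairs accumulate gives the "dynamic" adjustment in a simple form.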
With the present invention, one request to access the target webpage can be initiated by simulating entry of the URL in the browser address bar, and another by simulating a jump via a link. By comparing the page contents obtained in the two ways, differences between the contents returned for the two kinds of access can be found and tampering of the webpage exposed, so that whether the target webpage has been tampered with can be identified effectively.
Description through above embodiment can know, those skilled in the art can be well understood to the present invention and can realize by the mode that software adds essential general hardware platform.Based on such understanding; The part that technical scheme of the present invention contributes to prior art in essence in other words can be come out with the embodied of software product; This computer software product can be stored in the storage medium, like ROM/RAM, magnetic disc, CD etc., comprises that some instructions are with so that a computer equipment (can be a personal computer; Server, the perhaps network equipment etc.) carry out the described method of some part of each embodiment of the present invention or embodiment.
The embodiments in this specification are described in a progressive manner: for identical or similar parts, the embodiments may refer to one another, and each embodiment focuses on its differences from the others. In particular, the device and system embodiments are basically similar to the method embodiments and are therefore described briefly; for relevant parts, refer to the description of the method embodiments. The device and system embodiments described above are merely illustrative. Units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network elements. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the embodiments. Those of ordinary skill in the art can understand and implement them without creative effort.
The method and device for identifying a tampered webpage provided by the present invention have been described in detail above. Specific examples have been used herein to explain the principle and embodiments of the present invention; the description of the above embodiments is only intended to help understand the method and core idea of the present invention. Meanwhile, those of ordinary skill in the art may, according to the idea of the present invention, make changes to the specific embodiments and the scope of application. In summary, the content of this specification should not be construed as limiting the present invention.
Claims (8)
1. A method for identifying a tampered webpage, characterized by comprising:
initiating a request to access a target webpage by simulating entry of a uniform resource locator (URL) in a browser address bar, and determining the obtained page content as first page content;
initiating a request to access the target webpage by simulating a jump via a link, and determining the obtained page content as second page content;
comparing the first page content with the second page content to obtain a comparison result; and
identifying, according to the comparison result, whether the target webpage is a tampered webpage.
2. The method according to claim 1, characterized in that initiating the request to access the target webpage by simulating a jump via a link comprises:
initiating the request to access the target webpage by simulating a jump via a link in search results provided by a search engine.
3. The method according to claim 1, characterized in that comparing the first page content with the second page content to obtain a comparison result comprises:
comparing key elements of the first page content and the second page content to obtain a comparison result.
4. The method according to claim 1, characterized in that comparing the first page content with the second page content to obtain a comparison result comprises:
comparing the first page content with the second page content to obtain the similarity between the first page content and the second page content;
and identifying, according to the comparison result, whether the target webpage is a tampered webpage comprises:
identifying whether the target webpage is a tampered webpage according to whether the similarity between the first page content and the second page content reaches a preset threshold.
5. A device for identifying a tampered webpage, characterized by comprising:
a first page content acquisition unit, configured to initiate a request to access a target webpage by simulating entry of a uniform resource locator (URL) in a browser address bar, and to determine the obtained page content as first page content;
a second page content acquisition unit, configured to initiate a request to access the target webpage by simulating a jump via a link, and to determine the obtained page content as second page content;
a comparison unit, configured to compare the first page content with the second page content to obtain a comparison result; and
a recognition unit, configured to identify, according to the comparison result, whether the target webpage is a tampered webpage.
6. The device according to claim 5, characterized in that the second page content acquisition unit comprises:
a search engine jump subunit, configured to initiate the request to access the target webpage by simulating a jump via a link in search results provided by a search engine.
7. The device according to claim 5, characterized in that the comparison unit comprises:
a key element comparison subunit, configured to compare key elements of the first page content and the second page content to obtain a comparison result.
8. The device according to claim 5, characterized in that the comparison unit is specifically configured to:
compare the first page content with the second page content to obtain the similarity between the first page content and the second page content;
and the judging unit is specifically configured to:
identify whether the target webpage is a tampered webpage according to whether the similarity between the first page content and the second page content reaches a preset threshold.
Priority Applications (3)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN2011104561726A CN102436564A (en) | 2011-12-30 | 2011-12-30 | Method and device for identifying falsified webpage |
PCT/CN2012/087640 WO2013097742A1 (en) | 2011-12-30 | 2012-12-27 | Methods and devices for identifying tampered webpage and identifying hijacked website |
US14/368,992 US20140380477A1 (en) | 2011-12-30 | 2012-12-27 | Methods and devices for identifying tampered webpage and inentifying hijacked web address |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN2011104561726A CN102436564A (en) | 2011-12-30 | 2011-12-30 | Method and device for identifying falsified webpage |
Publications (1)
Publication Number | Publication Date |
---|---|
CN102436564A (en) | 2012-05-02 |
Family ID: 45984622
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN2011104561726A Pending CN102436564A (en) | 2011-12-30 | 2011-12-30 | Method and device for identifying falsified webpage |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN102436564A (en) |
Cited By (38)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102917053A (en) * | 2012-10-18 | 2013-02-06 | 北京奇虎科技有限公司 | Method, device and system for judging uniform resource locator rewriting of webpage |
CN102932435A (en) * | 2012-10-18 | 2013-02-13 | 北京奇虎科技有限公司 | Network detection system |
CN102938041A (en) * | 2012-10-30 | 2013-02-20 | 北京神州绿盟信息安全科技股份有限公司 | Comprehensive detection method and system for page tampering |
WO2013097742A1 (en) * | 2011-12-30 | 2013-07-04 | 北京奇虎科技有限公司 | Methods and devices for identifying tampered webpage and identifying hijacked website |
CN103279710A (en) * | 2013-04-12 | 2013-09-04 | 深圳市易聆科信息技术有限公司 | Method and system for detecting malicious codes of Internet information system |
CN103281177A (en) * | 2013-04-10 | 2013-09-04 | 广东电网公司信息中心 | Method and system for detecting hostile attack on Internet information system |
CN103577526A (en) * | 2013-08-01 | 2014-02-12 | 星云融创(北京)信息技术有限公司 | Method and system as well as browser for verifying page modification |
CN103809941A (en) * | 2012-11-07 | 2014-05-21 | 江苏仕德伟网络科技股份有限公司 | Method for judging whether webpage is doorway page or not |
CN104008131A (en) * | 2014-04-30 | 2014-08-27 | 广州市动景计算机科技有限公司 | Processing method and device for web page data |
CN104052630A (en) * | 2013-03-14 | 2014-09-17 | 北京百度网讯科技有限公司 | Method and system for executing verification on website |
CN104484604A (en) * | 2014-12-31 | 2015-04-01 | 北京神州绿盟信息安全科技股份有限公司 | Method, scanner, device and system for identifying webpage distortion |
CN104506529A (en) * | 2014-12-22 | 2015-04-08 | 北京奇虎科技有限公司 | Website protection method and device |
CN104731949A (en) * | 2015-03-31 | 2015-06-24 | 北京奇虎科技有限公司 | Method and device for recognizing webpage skipping |
CN105184159A (en) * | 2015-08-27 | 2015-12-23 | 深圳市深信服电子科技有限公司 | Web page falsification identification method and apparatus |
CN105245518A (en) * | 2015-09-30 | 2016-01-13 | 小米科技有限责任公司 | Website hijacking detection method and device |
CN105630790A (en) * | 2014-10-28 | 2016-06-01 | 阿里巴巴集团控股有限公司 | Method and device for analyzing web codes |
WO2016082678A1 (en) * | 2014-11-24 | 2016-06-02 | 阿里巴巴集团控股有限公司 | Method and device for monitoring display hijack |
CN105719162A (en) * | 2016-01-20 | 2016-06-29 | 北京京东尚科信息技术有限公司 | Method and device of monitoring validity of promotion links |
CN106778357A (en) * | 2016-12-23 | 2017-05-31 | 北京神州绿盟信息安全科技股份有限公司 | The detection method and device of a kind of webpage tamper |
CN107301355A (en) * | 2017-06-20 | 2017-10-27 | 深信服科技股份有限公司 | A kind of webpage tamper monitoring method and device |
CN107749838A (en) * | 2017-09-27 | 2018-03-02 | 微梦创科网络科技(中国)有限公司 | A kind of method and device for detecting network and kidnapping |
CN107800720A (en) * | 2017-11-29 | 2018-03-13 | 广州酷狗计算机科技有限公司 | Kidnap report method, device, storage medium and equipment |
CN107819789A (en) * | 2017-12-07 | 2018-03-20 | 北京泛融科技有限公司 | A kind of content anti-hijack system and method based on block chain |
CN108183908A (en) * | 2017-12-29 | 2018-06-19 | 哈尔滨安天科技股份有限公司 | A kind of advertisement link based on network flow finds method, system and storage medium |
CN108255866A (en) * | 2016-12-29 | 2018-07-06 | 北京国双科技有限公司 | Check the method and apparatus linked in website |
CN108520185A (en) * | 2018-04-16 | 2018-09-11 | 深信服科技股份有限公司 | Detect method, apparatus, equipment and the computer readable storage medium of webpage tamper |
CN109800378A (en) * | 2019-01-23 | 2019-05-24 | 北京字节跳动网络技术有限公司 | Content processing method, device and electronic equipment based on custom browser |
CN110134901A (en) * | 2019-04-30 | 2019-08-16 | 哈尔滨英赛克信息技术有限公司 | A kind of multilink webpage tamper determination method based on flow analysis |
CN110929257A (en) * | 2019-10-30 | 2020-03-27 | 武汉绿色网络信息服务有限责任公司 | Method and device for detecting malicious codes carried in webpage |
CN111199040A (en) * | 2019-12-18 | 2020-05-26 | 中国平安人寿保险股份有限公司 | Page tampering detection method, device, terminal and storage medium |
CN111262842A (en) * | 2020-01-10 | 2020-06-09 | 恒安嘉新(北京)科技股份公司 | Webpage tamper-proofing method and device, electronic equipment and storage medium |
CN111382396A (en) * | 2018-12-29 | 2020-07-07 | 北京奇虎科技有限公司 | Homepage protection method and device |
CN112424778A (en) * | 2018-07-26 | 2021-02-26 | 电子技巧股份有限公司 | Information processing device, information processing method, and information processing program |
CN112507389A (en) * | 2020-10-28 | 2021-03-16 | 西安四叶草信息技术有限公司 | Webpage data processing method and device |
CN112507270A (en) * | 2020-12-11 | 2021-03-16 | 杭州安恒信息技术股份有限公司 | Website tampering alarm method based on title escape in cloud protection and related device |
CN113254984A (en) * | 2021-07-15 | 2021-08-13 | 天聚地合(苏州)数据股份有限公司 | Webpage monitoring method and device, storage medium and equipment |
CN113420252A (en) * | 2021-07-21 | 2021-09-21 | 北京字节跳动网络技术有限公司 | Proxy webpage detection method, device, equipment and storage medium |
CN114401115A (en) * | 2021-12-20 | 2022-04-26 | 浙江乾冠信息安全研究院有限公司 | Method, system, apparatus and medium for detecting anti-detection webpage tampering |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN1728655A (en) * | 2004-11-25 | 2006-02-01 | 刘文印 | Method and system for detecting and identifying counterfeit web page |
CN101626368A (en) * | 2008-07-11 | 2010-01-13 | 中联绿盟信息技术(北京)有限公司 | Device, method and system for preventing web page from being distorted |
US20100287028A1 (en) * | 2009-05-05 | 2010-11-11 | Paul A. Lipari | System, method and computer readable medium for determining attention areas of a web page |
CN102129528A (en) * | 2010-01-19 | 2011-07-20 | 北京启明星辰信息技术股份有限公司 | WEB page tampering identification method and system |
Cited By (58)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2013097742A1 (en) * | 2011-12-30 | 2013-07-04 | 北京奇虎科技有限公司 | Methods and devices for identifying tampered webpage and identifying hijacked website |
CN102932435A (en) * | 2012-10-18 | 2013-02-13 | 北京奇虎科技有限公司 | Network detection system |
CN102917053B (en) * | 2012-10-18 | 2016-03-30 | 北京奇虎科技有限公司 | A kind of method, apparatus and system for judging webpage urlrewriting |
CN102932435B (en) * | 2012-10-18 | 2016-06-15 | 北京奇虎科技有限公司 | Network detection system |
CN102917053A (en) * | 2012-10-18 | 2013-02-06 | 北京奇虎科技有限公司 | Method, device and system for judging uniform resource locator rewriting of webpage |
CN102938041B (en) * | 2012-10-30 | 2015-04-15 | 北京神州绿盟信息安全科技股份有限公司 | Comprehensive detection method and system for page tampering |
CN102938041A (en) * | 2012-10-30 | 2013-02-20 | 北京神州绿盟信息安全科技股份有限公司 | Comprehensive detection method and system for page tampering |
CN103809941A (en) * | 2012-11-07 | 2014-05-21 | 江苏仕德伟网络科技股份有限公司 | Method for judging whether webpage is doorway page or not |
CN104052630A (en) * | 2013-03-14 | 2014-09-17 | 北京百度网讯科技有限公司 | Method and system for executing verification on website |
CN104052630B (en) * | 2013-03-14 | 2019-10-11 | 北京百度网讯科技有限公司 | The method and system of verifying is executed to website |
CN103281177A (en) * | 2013-04-10 | 2013-09-04 | 广东电网公司信息中心 | Method and system for detecting hostile attack on Internet information system |
CN103281177B (en) * | 2013-04-10 | 2016-09-14 | 广东电网公司信息中心 | Detection method and system to Internet information system malicious attack |
CN103279710A (en) * | 2013-04-12 | 2013-09-04 | 深圳市易聆科信息技术有限公司 | Method and system for detecting malicious codes of Internet information system |
CN103279710B (en) * | 2013-04-12 | 2016-04-13 | 深圳市易聆科信息技术有限公司 | Method and system for detecting malicious codes of Internet information system |
CN103577526A (en) * | 2013-08-01 | 2014-02-12 | 星云融创(北京)信息技术有限公司 | Method and system as well as browser for verifying page modification |
CN104008131A (en) * | 2014-04-30 | 2014-08-27 | 广州市动景计算机科技有限公司 | Processing method and device for web page data |
CN105630790A (en) * | 2014-10-28 | 2016-06-01 | 阿里巴巴集团控股有限公司 | Method and device for analyzing web codes |
CN105630790B (en) * | 2014-10-28 | 2019-06-04 | 阿里巴巴集团控股有限公司 | The analysis method and device of web page coding |
CN105701402B (en) * | 2014-11-24 | 2018-11-27 | 阿里巴巴集团控股有限公司 | A kind of method and apparatus that monitoring and displaying is kidnapped |
CN105701402A (en) * | 2014-11-24 | 2016-06-22 | 阿里巴巴集团控股有限公司 | Method and device for monitoring display hijack |
WO2016082678A1 (en) * | 2014-11-24 | 2016-06-02 | 阿里巴巴集团控股有限公司 | Method and device for monitoring display hijack |
CN104506529A (en) * | 2014-12-22 | 2015-04-08 | 北京奇虎科技有限公司 | Website protection method and device |
CN104506529B (en) * | 2014-12-22 | 2018-01-09 | 北京奇安信科技有限公司 | Website protection method and device |
CN104484604A (en) * | 2014-12-31 | 2015-04-01 | 北京神州绿盟信息安全科技股份有限公司 | Method, scanner, device and system for identifying webpage distortion |
CN104731949A (en) * | 2015-03-31 | 2015-06-24 | 北京奇虎科技有限公司 | Method and device for recognizing webpage skipping |
CN105184159A (en) * | 2015-08-27 | 2015-12-23 | 深圳市深信服电子科技有限公司 | Web page falsification identification method and apparatus |
CN105184159B (en) * | 2015-08-27 | 2018-11-27 | 深信服科技股份有限公司 | The recognition methods of webpage tamper and device |
CN105245518A (en) * | 2015-09-30 | 2016-01-13 | 小米科技有限责任公司 | Website hijacking detection method and device |
CN105245518B (en) * | 2015-09-30 | 2018-07-24 | 小米科技有限责任公司 | The detection method and device that network address is kidnapped |
CN105719162A (en) * | 2016-01-20 | 2016-06-29 | 北京京东尚科信息技术有限公司 | Method and device of monitoring validity of promotion links |
CN106778357A (en) * | 2016-12-23 | 2017-05-31 | 北京神州绿盟信息安全科技股份有限公司 | The detection method and device of a kind of webpage tamper |
CN108255866B (en) * | 2016-12-29 | 2020-10-27 | 北京国双科技有限公司 | Method and device for checking links in website |
CN108255866A (en) * | 2016-12-29 | 2018-07-06 | 北京国双科技有限公司 | Check the method and apparatus linked in website |
CN107301355A (en) * | 2017-06-20 | 2017-10-27 | 深信服科技股份有限公司 | A kind of webpage tamper monitoring method and device |
CN107749838B (en) * | 2017-09-27 | 2020-11-24 | 微梦创科网络科技(中国)有限公司 | Method and device for detecting network hijacking |
CN107749838A (en) * | 2017-09-27 | 2018-03-02 | 微梦创科网络科技(中国)有限公司 | A kind of method and device for detecting network and kidnapping |
CN107800720A (en) * | 2017-11-29 | 2018-03-13 | 广州酷狗计算机科技有限公司 | Kidnap report method, device, storage medium and equipment |
CN107800720B (en) * | 2017-11-29 | 2020-10-27 | 广州酷狗计算机科技有限公司 | Hijacking reporting method, device, storage medium and equipment |
CN107819789A (en) * | 2017-12-07 | 2018-03-20 | 北京泛融科技有限公司 | A kind of content anti-hijack system and method based on block chain |
CN108183908A (en) * | 2017-12-29 | 2018-06-19 | 哈尔滨安天科技股份有限公司 | A kind of advertisement link based on network flow finds method, system and storage medium |
CN108520185A (en) * | 2018-04-16 | 2018-09-11 | 深信服科技股份有限公司 | Detect method, apparatus, equipment and the computer readable storage medium of webpage tamper |
CN112424778A (en) * | 2018-07-26 | 2021-02-26 | 电子技巧股份有限公司 | Information processing device, information processing method, and information processing program |
CN111382396A (en) * | 2018-12-29 | 2020-07-07 | 北京奇虎科技有限公司 | Homepage protection method and device |
CN109800378A (en) * | 2019-01-23 | 2019-05-24 | 北京字节跳动网络技术有限公司 | Content processing method, device and electronic equipment based on custom browser |
CN110134901A (en) * | 2019-04-30 | 2019-08-16 | 哈尔滨英赛克信息技术有限公司 | A kind of multilink webpage tamper determination method based on flow analysis |
CN110134901B (en) * | 2019-04-30 | 2023-06-16 | 哈尔滨英赛克信息技术有限公司 | Multilink webpage tampering judging method based on flow analysis |
CN110929257B (en) * | 2019-10-30 | 2022-02-01 | 武汉绿色网络信息服务有限责任公司 | Method and device for detecting malicious codes carried in webpage |
CN110929257A (en) * | 2019-10-30 | 2020-03-27 | 武汉绿色网络信息服务有限责任公司 | Method and device for detecting malicious codes carried in webpage |
CN111199040A (en) * | 2019-12-18 | 2020-05-26 | 中国平安人寿保险股份有限公司 | Page tampering detection method, device, terminal and storage medium |
CN111199040B (en) * | 2019-12-18 | 2023-09-12 | 中国平安人寿保险股份有限公司 | Page tamper detection method, device, terminal and storage medium |
CN111262842A (en) * | 2020-01-10 | 2020-06-09 | 恒安嘉新(北京)科技股份公司 | Webpage tamper-proofing method and device, electronic equipment and storage medium |
CN111262842B (en) * | 2020-01-10 | 2022-09-06 | 恒安嘉新(北京)科技股份公司 | Webpage tamper-proofing method and device, electronic equipment and storage medium |
CN112507389A (en) * | 2020-10-28 | 2021-03-16 | 西安四叶草信息技术有限公司 | Webpage data processing method and device |
CN112507270A (en) * | 2020-12-11 | 2021-03-16 | 杭州安恒信息技术股份有限公司 | Website tampering alarm method based on title escape in cloud protection and related device |
CN113254984A (en) * | 2021-07-15 | 2021-08-13 | 天聚地合(苏州)数据股份有限公司 | Webpage monitoring method and device, storage medium and equipment |
CN113420252A (en) * | 2021-07-21 | 2021-09-21 | 北京字节跳动网络技术有限公司 | Proxy webpage detection method, device, equipment and storage medium |
CN114401115A (en) * | 2021-12-20 | 2022-04-26 | 浙江乾冠信息安全研究院有限公司 | Method, system, apparatus and medium for detecting anti-detection webpage tampering |
CN114401115B (en) * | 2021-12-20 | 2024-04-05 | 浙江乾冠信息安全研究院有限公司 | Method, system, device and medium for detecting tamper of anti-detected webpage |
Legal Events
Code | Title | Description |
---|---|---|
C06 | Publication | |
PB01 | Publication | |
C10 | Entry into substantive examination | |
SE01 | Entry into force of request for substantive examination | |
C12 | Rejection of a patent application after its publication | |
RJ01 | Rejection of invention patent application after publication | Application publication date: 20120502 |
C41 | Transfer of patent application or patent right or utility model | |
TA01 | Transfer of patent application right | Effective date of registration: 20161128. Address after: 100016 Jiuxianqiao Chaoyang District Beijing Road No. 10, building 15, floor 17, layer 1701-26, 3. Applicant after: BEIJING QI'ANXIN SCIENCE & TECHNOLOGY CO., LTD. Address before: The 4 layer 100016 unit of Beijing city Chaoyang District Jiuxianqiao Road No. 14 Building C. Applicant before: Qizhi Software (Beijing) Co., Ltd. |