CN109992737A - Third-party web page content review method, device, and electronic equipment - Google Patents


Info

Publication number
CN109992737A
Authority
CN
China
Prior art keywords
party
resource
webpage
moment
legal
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910263886.1A
Other languages
Chinese (zh)
Inventor
钱宝坤
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing ByteDance Network Technology Co Ltd
Original Assignee
Beijing ByteDance Network Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing ByteDance Network Technology Co Ltd
Priority to CN201910263886.1A
Publication of CN109992737A


Abstract

Embodiments of the invention disclose a third-party web page content review method, a device, and electronic equipment. The method includes: at a first moment, imitating user access behavior to automatically load the content corresponding to a third-party web page link in a third-party browser; parsing the resource acquisition record of the third-party browser to obtain and store the resource path list of the third-party web page; obtaining, according to the resource path list, the resources corresponding to the third-party web page at the first moment; and, after the resources are reviewed, taking the link to the third-party web page down if the resources are illegal. Embodiments of the invention can review the content corresponding to an embedded third-party web page link, ensuring that the third-party web page content is healthy and safe, sparing the operator of the current application possible adverse effects or even legal risk, and keeping operation smooth.

Description

Third-party web page content review method, device, and electronic equipment
Technical field
The present invention relates to the technical field of network security, and in particular to a third-party web page content review method, a device, and electronic equipment.
Background
Links to third-party web pages are often embedded in a web page, but the operator of the current web page has little control over the content those links point to. If the content of a third-party web page violates relevant laws and regulations, it may adversely affect the operator of the current web page and may even expose the operator to legal risk.
How to review embedded third-party web page content efficiently is therefore a technical problem that those skilled in the art urgently need to solve.
Summary of the invention
In view of this, embodiments of the invention provide a third-party web page content review method, a device, and electronic equipment that at least partially solve the problems in the prior art.
In a first aspect, an embodiment of the invention provides a third-party web page content review method, comprising:
at a first moment, automatically loading the content corresponding to a link to a third-party web page in a third-party browser by imitating user access behavior;
parsing the resource acquisition record of the third-party browser to obtain a resource path list of the third-party web page;
obtaining, based on the resource path list, the resources corresponding to the third-party web page at the first moment; and
removing the link to the third-party web page in response to the resources being determined to be illegal.
According to a specific implementation of an embodiment of the invention, whether the resources are legal is determined by a machine.
According to a specific implementation of an embodiment of the invention, in response to the resources being determined to be legal, the method further comprises the following steps:
at a second moment, automatically loading the content corresponding to the link to the third-party web page in the third-party browser again by imitating user access behavior;
parsing the resource acquisition record of the third-party browser to obtain the resource path list of the third-party web page;
obtaining, based on the resource path list, the resources corresponding to the third-party web page at the second moment;
judging whether the resources corresponding to the third-party web page at the second moment are legal; and
removing the link to the third-party web page in response to the resources corresponding to the third-party web page at the second moment being illegal.
According to a specific implementation of an embodiment of the invention, judging whether the resources corresponding to the third-party web page at the second moment are legal comprises: determining whether the size of the resources corresponding to the third-party web page at the first moment is the same as the size of the resources corresponding to the third-party web page at the second moment; and
in response to the two sizes being the same, determining that the resources corresponding to the third-party web page at the second moment are legal.
According to a specific implementation of an embodiment of the invention, in response to the size of the resources corresponding to the third-party web page at the first moment differing from the size of the resources corresponding to the third-party web page at the second moment, whether the resources corresponding to the third-party web page at the second moment are legal is judged by machine review.
According to a specific implementation of an embodiment of the invention, judging whether the resources corresponding to the third-party web page at the second moment are legal comprises:
judging, using machine learning, whether the resources corresponding to the third-party web page at the second moment are legal.
According to a specific implementation of an embodiment of the invention, the time interval between the first moment and the second moment is specified in advance.
According to a specific implementation of an embodiment of the invention, the resource path list of the third-party web page includes at least one of the following: URLs of JavaScript files, URLs of style files, URLs of pictures, and URLs of external resources, where the external resources include at least one of font files, audio, video, and in-page documents.
According to a specific implementation of an embodiment of the invention, the machine determines whether the resources are legal as follows:
searching the resources for preset keywords; and
determining whether the resources are legal based on the search result.
In a second aspect, an embodiment of the invention further provides a third-party web page content review device, comprising:
a first loading module, configured to automatically load, at a first moment and by imitating user access behavior, the content corresponding to a link to a third-party web page in a third-party browser;
a first parsing module, configured to parse the resource acquisition record of the third-party browser to obtain a resource path list of the third-party web page;
a first resource obtaining module, configured to obtain, based on the resource path list, the resources corresponding to the third-party web page at the first moment; and
a first removal module, configured to remove the link to the third-party web page in response to the resources being determined to be illegal.
According to a specific implementation of an embodiment of the invention, whether the resources are legal is determined by a machine.
According to a specific implementation of an embodiment of the invention, the device further comprises:
a second loading module, configured to, in response to the resources being determined to be legal, automatically load the content corresponding to the link to the third-party web page in the third-party browser again at a second moment by imitating user access behavior;
a second parsing module, configured to parse the resource acquisition record of the third-party browser to obtain the resource path list of the third-party web page;
a second resource obtaining module, configured to obtain, based on the resource path list, the resources corresponding to the third-party web page at the second moment; and
a second removal module, configured to judge whether the resources corresponding to the third-party web page at the second moment are legal, and to remove the link to the third-party web page in response to those resources being illegal.
According to a specific implementation of an embodiment of the invention, the second removal module further comprises:
a comparison unit, configured to determine whether the size of the resources corresponding to the third-party web page at the first moment is the same as the size of the resources corresponding to the third-party web page at the second moment; and
a first response unit, configured to determine, in response to the two sizes being the same, that the resources corresponding to the third-party web page at the second moment are legal.
According to a specific implementation of an embodiment of the invention, the second removal module further comprises:
a second response unit, configured to judge by machine review, in response to the two sizes being different, whether the resources corresponding to the third-party web page at the second moment are legal.
According to a specific implementation of an embodiment of the invention, in the second removal module, judging whether the resources corresponding to the third-party web page at the second moment are legal comprises: judging, using machine learning, whether the resources corresponding to the third-party web page at the second moment are legal.
According to a specific implementation of an embodiment of the invention, the time interval between the first moment and the second moment is specified in advance.
According to a specific implementation of an embodiment of the invention, the resource path list of the third-party web page includes at least one of the following: URLs of JavaScript files, URLs of style files, URLs of pictures, and URLs of external resources, where the external resources include at least one of font files, audio, video, and in-page documents.
According to a specific implementation of an embodiment of the invention, the machine determines whether the resources are legal as follows: extracting keywords from the resources; and determining whether the resources are legal based on the keywords.
In a third aspect, an embodiment of the invention further provides electronic equipment, comprising:
At least one processor;And
The memory being connect at least one processor communication;Wherein,
The memory is stored with the instruction that can be executed by least one processor, and the instruction is by least one processor It executes, so that at least one processor is able to carry out the in any implementation of aforementioned first aspect or first aspect Tripartite's web page contents checking method.
Fourth aspect, the embodiment of the invention also provides a kind of non-transient computer readable storage medium, the non-transient meters Calculation machine readable storage medium storing program for executing stores computer instruction, and the computer instruction is for making the computer execute aforementioned first aspect or the Third party's web page contents checking method in any implementation of one side.
5th aspect, the embodiment of the invention also provides a kind of computer program product, which includes The calculation procedure being stored in non-transient computer readable storage medium, the computer program include program instruction, when the program When instruction is computer-executed, the computer is made to execute the third in aforementioned first aspect or any implementation of first aspect Square web page contents checking method.
In the third-party web page content review method, device, electronic equipment, non-transitory computer-readable storage medium, and computer program product provided by embodiments of the invention:
When a link to a third-party web page is embedded in the current web page in some form, a third-party browser (which may also be called a custom browser) can imitate a user's click to automatically load the content of the third-party web page in that browser. The content of the third-party web page includes HTML (the main body of the page), JavaScript files (which determine the page's behavior, such as responses to events like clicks), style files (which determine element attributes such as appearance and size), as well as pictures, frames, iframes, and so on. After the third-party web page has finished loading, the list of resource paths it requested is analyzed automatically to form the resource list corresponding to the embedded link, and the resource list is stored. Next, the content corresponding to the list is obtained according to the resource list. Finally, the content is reviewed. If the review finds the content illegal, for example because it breaks rules, breaks the law, or violates public order and good morals, the link is taken down.
In a preferred embodiment, the content corresponding to the resource list is time-sensitive, that is, the developer of the third-party web page may update it in real time or periodically. To keep the review effective over time, the embedded third-party web page content therefore needs to be checked periodically, as follows: first compare the file sizes at the two moments. If the file sizes are the same, the content corresponding to the list can generally be considered unchanged, and there is no need to consider taking the link down. If the file sizes differ, obtain the specific content corresponding to the list, review it again, and then decide whether to take the link down.
In a preferred embodiment, if the embedded third-party web page (which can be understood as the "second hop") itself embeds the web page of yet another party (which can be understood as the "third hop"), or there is even a "fourth hop", the content can be reviewed using the same principle.
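The multi-hop case can be sketched as a depth-limited traversal. The following Python is only an illustration under stated assumptions: the function name `collect_nested_pages`, the `load_html` callable (which stands in for loading a page in the third-party browser), and the choice to follow `a` and `iframe` elements are all hypothetical and are not specified by the patent. Depth 0 is the embedded third-party page itself, depth 1 is a page it embeds, and so on.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class _LinkCollector(HTMLParser):
    """Collect absolute URLs from <a href> and <iframe src> elements."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        attr_name = {"a": "href", "iframe": "src"}.get(tag)
        if attr_name is None:
            return
        for name, value in attrs:
            if name == attr_name and value:
                # Resolve relative links against the page that contains them.
                self.links.append(urljoin(self.base_url, value))


def collect_nested_pages(start_url, load_html, max_depth):
    """Return {url: depth} for pages reachable within `max_depth` hops.

    `load_html` is a caller-supplied callable (url -> HTML string); it is an
    assumption of this sketch, not part of the described method.
    """
    seen = {start_url: 0}
    frontier = [start_url]
    for depth in range(1, max_depth + 1):
        next_frontier = []
        for url in frontier:
            parser = _LinkCollector(url)
            parser.feed(load_html(url))
            for link in parser.links:
                if link not in seen:  # review each page only once
                    seen[link] = depth
                    next_frontier.append(link)
        frontier = next_frontier
    return seen
```

Each URL in the returned mapping could then be passed through the same load, parse, fetch, and review steps as the directly embedded page; the depth limit keeps the traversal bounded.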
Clearly, embodiments of the invention can review the content corresponding to an embedded third-party web page link, ensuring that the third-party web page content is healthy and safe, sparing the operator of the current application possible adverse effects or even legal risk, and keeping operation smooth.
Brief description of the drawings
To explain the technical solutions of the embodiments of the invention more clearly, the drawings needed for the embodiments are briefly introduced below. The drawings described below are only some embodiments of the invention; those of ordinary skill in the art can obtain other drawings from them without creative effort.
Fig. 1 is a step flow chart of a third-party web page content review method provided by an embodiment of the invention;
Fig. 2 is a step flow chart of a third-party web page content review method provided by another embodiment of the invention;
Fig. 3 is a step flow chart for judging whether the third-party web page content at the second moment is legal, in a third-party web page content review method provided by another embodiment of the invention;
Fig. 4 is a structural block diagram of a third-party web page content review device provided by an embodiment of the invention;
Fig. 5 is a structural block diagram of a third-party web page content review device provided by another embodiment of the invention;
Fig. 6 is a block diagram of the structure that judges whether the third-party web page content at the second moment is legal, in a third-party web page content review device provided by another embodiment of the invention;
Fig. 7 is a schematic structural diagram of electronic equipment provided by an embodiment of the invention.
Detailed description
Embodiments of the invention are described in detail below with reference to the accompanying drawings.
Embodiments of the disclosure are illustrated below through specific examples, and those skilled in the art can readily understand other advantages and effects of the disclosure from the content disclosed in this specification. The described embodiments are only some of the embodiments of the disclosure, not all of them. The disclosure can also be implemented or applied through other, different embodiments, and the details in this specification can be modified or changed from different viewpoints and for different applications without departing from the spirit of the disclosure. It should be noted that, where no conflict arises, the following embodiments and the features in them can be combined with one another. All other embodiments obtained by those of ordinary skill in the art from the embodiments in the disclosure without creative effort fall within the scope of protection of the disclosure.
It should be noted that various aspects of embodiments within the scope of the appended claims are described below. It should be apparent that the aspects described herein can be embodied in a wide variety of forms, and that any specific structure and/or function described herein is merely illustrative. Based on the disclosure, those skilled in the art will understand that an aspect described herein can be implemented independently of any other aspect, and that two or more of these aspects can be combined in various ways. For example, a device can be implemented and/or a method practiced using any number of the aspects set forth herein. In addition, such a device can be implemented and/or such a method practiced using structures and/or functionality other than one or more of the aspects set forth herein.
It should also be noted that the drawings provided in the following embodiments illustrate the basic concept of the disclosure only schematically. They show only the components relevant to the disclosure rather than being drawn according to the number, shape, and size of components in an actual implementation; in an actual implementation the form, quantity, and proportion of each component may vary arbitrarily, and the component layout may be more complex.
In addition, specific details are provided in the following description to give a thorough understanding of the examples. However, those skilled in the art will understand that the aspects can be practiced without these specific details.
An embodiment of the disclosure provides a third-party web page content review method. The method provided in this embodiment can be executed by a computing device, which can be implemented as software or as a combination of software and hardware, and which can be integrated into a server, a terminal device, or the like.
Fig. 1 is a step flow chart of a third-party web page content review method provided by an embodiment of the invention.
The application scenario of this embodiment is as follows: a third-party web page link is embedded in the current web page. For example, a link labeled "Make sharing knowledge a habit" is embedded in a "content platform". The link may be text; in other scenarios it may be an image or a dynamic video. This does not affect the implementation of this embodiment. The link "Make sharing knowledge a habit" points to the home page of the third-party web page. The third party hands over the address of its web page, together with the form in which the link should be displayed on the content platform, to the developer of the content platform, hoping to increase its own visibility and page views by placing its link on a high-traffic content platform.
For the operator of the content platform, if the content of a third-party web page embedded in its platform is illegal, this may have an undesirable effect on the platform and may even create legal risk, disrupting operation. "Illegal" here includes cases where the third-party web page spreads content that violates public order and good morals, or where its content carries legal problems such as copyright risk. Although the safe harbor principle provides a degree of exemption, knowing in advance and finding and handling problems early is clearly better.
To avoid the above drawbacks, embodiments of the invention use a third-party browser to preload the content of the linked third-party web page, then analyze the resource request list of the third-party browser, obtain the corresponding resources through that list, and review the obtained resources manually or by machine.
First, how the web page resources of the third-party web page are obtained is explained with reference to S101, S102, and S103 in Fig. 1.
S101: At the first moment, by imitating user access behavior, the content corresponding to the link to the third-party web page is automatically loaded in the third-party browser.
The third party is likely to hand over only its web address and link to the content platform, and not the resources its web page involves. For example, the resources involved in a web page include:
1) HTML files, the main body of the web page;
2) JavaScript files, which determine the behavior of the web page, such as responses to events like clicks;
3) style files, which determine element attributes such as appearance and size. Style files for web pages are generally CSS files with the .css suffix. A style sheet defines elements such as the following:
A. the default font, size, and color of headings and body text;
B. the appearance of the front page;
C. the spacing between individual parts;
D. line spacing, surrounding margins, spacing between headings, and so on;
E. how many heading levels any automatically generated table of contents should include;
F. any boilerplate content included in the corresponding page, and so on.
4) pictures, frames, iframes, and so on.
These resources need to be obtained through the custom third-party browser created in this embodiment.
The third-party browser in this embodiment is obtained by modifying, adjusting, and extending the functions of an existing browser kernel. For those skilled in the art, modifying, adjusting, or extending an existing browser to provide the functions required of the custom browser in this embodiment is known, and is not described further here.
After the third-party browser is built, the resources of the third-party web page are loaded in the third-party browser by imitating the user behavior that triggers the third-party link, such as clicking the link (the invention is not limited to this; clicking is only one implementation of this embodiment). As noted above, the resources here include HTML files, JavaScript files, style files, pictures, frames, iframes, and so on.
S102: The resource acquisition record of the third-party browser is parsed to obtain the resource path list of the third-party web page.
In this step, after the third-party browser finishes loading the third-party web page, the resource path list of the third-party web page is analyzed automatically. The resource path list includes at least one of the following: URLs of JavaScript files, URLs of style files, URLs of pictures, and URLs of external resources, where the external resources include at least one of font files, audio, video, and in-page documents.
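As a rough illustration of this parsing step, the sketch below groups the URLs found in a browser request log into the categories named above. The log format (a list of dicts with a "url" key), the function name `build_resource_path_list`, and the extension-to-category table are all assumptions made for the example; the patent does not specify how the resource acquisition record is stored.

```python
from collections import defaultdict
from pathlib import PurePosixPath
from urllib.parse import urlparse

# Hypothetical mapping from file extension to the categories named in the text.
CATEGORY_BY_EXTENSION = {
    ".js": "javascript",
    ".css": "stylesheet",
    ".png": "image", ".jpg": "image", ".jpeg": "image", ".gif": "image",
    ".woff": "font", ".woff2": "font", ".ttf": "font",
    ".mp3": "audio",
    ".mp4": "video",
    ".pdf": "document",
}


def build_resource_path_list(request_log):
    """Group the URLs in a browser request log by resource category.

    `request_log` is assumed to be an iterable of dicts with a "url" key.
    """
    path_list = defaultdict(list)
    for entry in request_log:
        url = entry["url"]
        # Classify by the extension of the URL path, ignoring any query string.
        suffix = PurePosixPath(urlparse(url).path).suffix.lower()
        category = CATEGORY_BY_EXTENSION.get(suffix, "other")
        if url not in path_list[category]:  # keep the first occurrence only
            path_list[category].append(url)
    return dict(path_list)
```

Classifying by extension is a simplification; a real implementation might rely on the response content type recorded by the browser instead.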
S103: Based on the resource path list, the resources corresponding to the third-party web page at the first moment are obtained.
The resources corresponding to the third-party web page are obtained according to the information in its resource path list, namely the URLs of JavaScript files, style files, and pictures, and the URLs of external resources such as font files, audio and video, and in-page documents. These resources are then stored in the local cache of the current application for subsequent review.
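A minimal sketch of this fetch-and-cache step follows. The function name `snapshot_resources`, the injectable `fetch` callable, and the dict-based cache are assumptions chosen so the example runs without network access; recording each resource's size alongside its body is also an assumption, made because a later embodiment compares sizes between two moments.

```python
def snapshot_resources(urls, fetch, cache):
    """Fetch each URL and store its content and size in `cache`.

    `fetch` is a caller-supplied callable (url -> bytes) and `cache` is any
    dict-like store; both are assumptions of this sketch.
    """
    for url in urls:
        body = fetch(url)
        cache[url] = {
            "body": body,       # kept for the later content review
            "size": len(body),  # kept for later size comparison
        }
    return cache
```

In practice `fetch` could wrap something like `urllib.request.urlopen(url, timeout=10).read()`, with error handling added for resources that fail to load.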
S104: In response to the resources being determined to be illegal, the link to the third-party web page is removed.
The party performing the review is explained first.
In one embodiment, the resources can be reviewed automatically by a machine running a program. For example, certain keywords are preset as the machine's search targets, and whether the resources are legal is determined from the search result. Specifically, the resources are searched for the target keywords: if any keyword appears, the resources are illegal; if none appears, the resources can be regarded as legal. The review can also be manual, but that is inefficient: if a content platform carries many third-party links, manual review becomes a heavy workload. In other cases, machine review can serve as an initial screen, and the web page content it flags is rechecked manually to avoid misjudgment, balancing efficiency and accuracy.
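The keyword-based machine review can be sketched as follows. This is a minimal illustration, assuming the resources have already been decoded to text; the function names `find_banned_keywords` and `is_legal` and the case-insensitive substring match are choices made for the example, not details given in the patent.

```python
def find_banned_keywords(text, keywords):
    """Return the preset keywords that occur in `text` (case-insensitive)."""
    lowered = text.lower()
    return [kw for kw in keywords if kw.lower() in lowered]


def is_legal(resources, keywords):
    """Treat a resource set as legal only if no keyword occurs in any resource.

    `resources` is assumed to map a URL to its decoded text content.
    """
    return not any(
        find_banned_keywords(text, keywords) for text in resources.values()
    )
```

For the combined machine-then-manual flow described above, the keywords returned by `find_banned_keywords` could be attached to the flagged page so that a human reviewer sees why it was flagged.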
Clearly, this embodiment can review the content corresponding to an embedded third-party web page link, ensuring that the third-party web page content is healthy and safe, sparing the operator of the current application possible adverse effects or even legal risk, and keeping operation smooth.
Another embodiment of the invention is described further below with reference to Fig. 2.
Because a third-party web page can be updated periodically, the content corresponding to the resource list is time-sensitive. To keep the review effective, it therefore needs to be repeated periodically.
Referring to Fig. 2, which is a step flow chart of a third-party web page content review method provided by an embodiment of the invention, the method includes the following steps:
S201: At the first moment, by imitating user access behavior, the content corresponding to the link to the third-party web page is automatically loaded in the third-party browser.
S202: The resource acquisition record of the third-party browser is parsed to obtain the resource path list of the third-party web page.
In this step, after the third-party browser finishes loading the third-party web page, the resource path list of the third-party web page is analyzed automatically. The resource path list includes the URLs of JavaScript files, style files, and pictures, and the URLs of external resources such as font files, audio and video, and in-page documents.
S203: Based on the resource path list, the resources corresponding to the third-party web page at the first moment are obtained.
The resources corresponding to the third-party web page are obtained according to the information in its resource path list, namely the URLs of JavaScript files, style files, and pictures, and the URLs of external resources such as font files, audio and video, and in-page documents. These resources are then stored in the local cache of the current application for subsequent review.
Depending on whether the resources are determined to be legal, the method further includes the following operations.
S204: The resources are reviewed.
If they are illegal, the following is performed:
S205: The link to the third-party web page is removed.
If they are legal, S206 to S210 are performed:
S206: After a specified time interval, at the second moment, user access behavior is imitated and the content corresponding to the link to the third-party web page is automatically loaded in the third-party browser again.
As in step S201 above, the content of the third-party web page is loaded again.
It should be noted that the time interval need not be specified; the check can also be performed as the actual situation requires. The invention does not limit this.
S207: The resource acquisition record of the third-party browser is parsed to obtain the resource path list of the third-party web page.
This is similar to S202 in the embodiment above.
S208: According to the resource path list, the resources corresponding to the third-party web page at the second moment are obtained.
This is similar to S203 in the embodiment above.
S209: Whether the resources corresponding to the third-party web page at the second moment are legal is judged.
"Legal" here means that the content of the third-party web page does not clearly violate public order and good morals or legal provisions. In one embodiment, the resources can be reviewed automatically by a machine running a program, and the review can be a search based on certain keywords. The review can also be manual, but that is inefficient: if a content platform carries many third-party links, manual review becomes a heavy workload. In other cases, machine review can serve as an initial screen, and the web page content it flags is rechecked manually to avoid misjudgment, balancing efficiency and accuracy.
If the resources are illegal, S210 is performed: in response to the resources corresponding to the third-party web page at the second moment being illegal, the link to the third-party web page is removed.
If the resources are legal, then after waiting for a specified time interval, the third-party web page content continues to be obtained and reviewed.
The first moment and the second moment need further explanation here.
First, the audit of the third-party web page content is not limited to only two moments.
Second, "second moment" merely denotes a moment distinct from the first moment: a later moment that follows the first after a predetermined time interval.
Third, it follows from the second point that the second moment can also be regarded as the "first moment" of the next time interval; after the second moment there can also be an audit at a third moment.
Fourth, the time interval between the first moment and the second moment is specified in advance based on experience.
This embodiment can continuously and dynamically audit the content of the third-party webpage, dynamically ensuring that the third-party web page content remains healthy and legal.
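The repeated audit at successive moments can be sketched as a simple loop. This is only an illustration under stated assumptions, not the patented implementation: `audit_once`, `remove_link`, `interval_seconds`, and `max_rounds` are hypothetical names, and `audit_once` stands in for one full pass of loading, parsing, fetching, and judging.

```python
import time
from typing import Callable


def audit_until_illegal(
    audit_once: Callable[[], bool],
    remove_link: Callable[[], None],
    interval_seconds: float,
    max_rounds: int,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Audit at successive moments and return how many audits were performed.

    audit_once() is a hypothetical callable that performs one full pass
    (load, parse, fetch, judge) and returns True when the resources are legal.
    """
    for round_number in range(1, max_rounds + 1):
        if not audit_once():
            # S210: the resources are illegal, so the link is removed.
            remove_link()
            return round_number
        if round_number < max_rounds:
            # Legal: wait for the specified interval before the next moment.
            sleep(interval_seconds)
    return max_rounds
```

Passing `sleep` as a parameter is a design choice for testability; a real deployment would more likely rely on a job scheduler than on a blocking loop.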
Another embodiment of the present invention is described below with reference to Fig. 3.
Referring to Fig. 3, in one embodiment of the third-party web page content auditing method of the present invention, after the third-party web page content has been audited once and found legal, judging whether the web page content acquired at the second moment is legal may include the following steps:
S301, determine the size of the resources corresponding to the third-party webpage at the first moment;
S302, determine the size of the resources corresponding to the third-party webpage at the second moment;
S303, compare the resource sizes of the third-party webpage at the two moments;
S304, if the resources corresponding to the third-party webpage at the first moment and at the second moment have the same size, the resources corresponding to the third-party webpage at the second moment are legal;
S305, if the resources corresponding to the third-party webpage at the first moment and at the second moment differ in size, then
S306, judge by machine audit whether the resources corresponding to the third-party webpage at the second moment are legal.
With this embodiment, the web page content can be judged more efficiently, without analyzing the substantive content.
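Steps S301 to S306 can be sketched as follows. The source does not say how "size" is measured, so this sketch assumes the total byte count of all fetched resources; the function names and the `machine_audit` callable are hypothetical.

```python
from typing import Callable, Mapping


def total_size(resources: Mapping[str, bytes]) -> int:
    """Total size in bytes of all fetched resources (URL -> content)."""
    return sum(len(content) for content in resources.values())


def is_legal_at_second_moment(
    first: Mapping[str, bytes],
    second: Mapping[str, bytes],
    machine_audit: Callable[[Mapping[str, bytes]], bool],
) -> bool:
    """Judge the second-moment resources, assuming the first-moment ones were legal."""
    # S301-S303: determine and compare the sizes at the two moments.
    if total_size(first) == total_size(second):
        # S304: same size, so the second-moment resources are treated as legal.
        return True
    # S305-S306: sizes differ, so fall back to the machine audit.
    return machine_audit(second)
```

Note that equal size is only a cheap heuristic for "unchanged": content of the same length could still differ, which is the trade-off this embodiment accepts in exchange for efficiency.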
In a second aspect, an embodiment of the present invention further provides a third-party web page content auditing device.
The application scenario of this embodiment is as follows: a third-party web page link is embedded in the current web page. For example, a link titled "Make sharing knowledge a habit" is embedded in a content platform. The link may be in text form; in other scenarios it may be an image or a dynamic video, none of which affects the implementation of this embodiment. The link "Make sharing knowledge a habit" corresponds to the home page of a third-party website. The third party delivers the URL of its webpage, together with the form in which the link should be displayed on the content platform, to the developer of the content platform, hoping that placing its link on a high-traffic content platform will increase its own platform's popularity and the page views of its web pages.
For the operator of the content platform, if the content of a third-party webpage embedded in its platform is illegal, this may have an undesirable impact and may even create legal risk, disrupting operations. "Illegal" here includes the third-party webpage spreading content that morally violates public order and good morals, or the content of the third-party webpage raising legal problems such as copyright risk. Although the safe harbor principle grants a certain exemption, knowing in advance, finding problems early, and handling them early is clearly better.
To avoid the above drawbacks, the embodiment of the present invention uses a third-party browser to preload the content of the third-party webpage behind the link, analyzes the resource request list of the third-party browser, obtains the corresponding resources from that list, and then has the obtained resources audited manually or by machine.
First, how the resources of the third-party webpage are obtained is explained. Referring to Fig. 4, Fig. 4 shows a structural block diagram of the third-party web page content auditing device of an embodiment of the present invention. The first loading module 41, the first parsing module 42, and the first resource obtaining module 43 are described below.
The first loading module 41 is configured to, at the first moment, automatically load the content corresponding to the link of the third-party webpage in the third-party browser by imitating user access behavior.
This is needed because the third party is likely to have delivered only its URL and web page link to the content platform, without delivering the resources involved in its webpage. For example, the resources involved in a webpage include:
1) HTML files, which are the main body of the webpage;
2) JavaScript files, which determine the behavior of the webpage, such as responses to events like clicks;
3) style files, which determine the attributes of elements, such as appearance and size. Style files for web pages are generally CSS, with the .css suffix. A style sheet defines the following elements of a document:
A, the default font, size, and color of headings and body text;
B, the appearance of the front matter;
C, the layout spacing of individual sections;
D, line spacing, surrounding margins, the distance between headings, and so on;
E, how many heading levels any automatically generated table of contents should include;
F, any boilerplate content included in the corresponding pages;
4) pictures, frames, iframes, and so on.
These resources need to be obtained through the custom third-party browser created in this embodiment.
The third-party browser in this embodiment is obtained by modifying, adjusting, and extending the functions of an existing browser kernel. For those skilled in the art, modifying, adjusting, or extending an existing browser to provide the functions required of the custom browser in this embodiment is known, and is not described further here.
After the third-party browser is built, the resources of the third-party webpage are loaded in the third-party browser by imitating the user behavior that triggers the third-party link, such as clicking the web page link (the present invention is not limited to this; clicking is only one implementation of this embodiment). As mentioned above, the resources here include HTML files, JavaScript files, and style files, as well as pictures, frames, iframes, and so on.
The first parsing module 42 is configured to parse the resource acquisition record of the third-party browser to obtain the resource path list of the third-party webpage.
In this step, after the third-party browser finishes loading the third-party webpage, the resource path list of the third-party webpage is analyzed automatically. The resource path list includes the URLs of JavaScript files, style files, and pictures, as well as the URLs of external resources such as font files, audio and video, and documents in the page.
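The source does not specify the format of the resource acquisition record. As a minimal sketch, assume the record is a sequence of entries, each a mapping with the hypothetical keys `url` and `mime_type`; the category names below are likewise illustrative. Under those assumptions, the resource path list could be built like this:

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Mapping


def category_for(mime_type: str) -> str:
    """Map a MIME type to one of the resource categories named in the text."""
    mime = mime_type.split(";")[0].strip().lower()
    if mime in ("text/javascript", "application/javascript"):
        return "javascript"
    if mime == "text/css":
        return "style"
    if mime.startswith("image/"):
        return "picture"
    if mime.startswith("font/"):
        return "font"
    if mime.startswith(("audio/", "video/")):
        return "audio_video"
    if mime == "text/html":
        return "document"
    return "other"


def build_resource_path_list(
    record: Iterable[Mapping[str, str]],
) -> Dict[str, List[str]]:
    """Group the URLs found in the record by category, dropping duplicates."""
    paths: Dict[str, List[str]] = defaultdict(list)
    for entry in record:
        url = entry["url"]
        category = category_for(entry.get("mime_type", ""))
        if url not in paths[category]:
            paths[category].append(url)
    return dict(paths)
```

Grouping by category keeps the list aligned with the resource types the description enumerates, and makes it easy to audit text-like resources differently from pictures or media.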
The first resource obtaining module 43 is configured to obtain, based on the resource path list, the resources corresponding to the third-party webpage at the first moment.
According to the URLs of JavaScript files, style files, and pictures in the resource path list of the third-party webpage, as well as the URLs of external resources such as font files, audio and video, and documents in the page, the resources corresponding to the third-party webpage are obtained and then stored in the local cache of the current application for subsequent review.
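Fetching the listed resources and storing them in a local cache might look like the sketch below. The cache layout (one file per URL, named by a hash of the URL) and the names `cache_resources` and `fetch_with_urllib` are assumptions made for illustration; the source only says that the resources are stored in a local cache.

```python
import hashlib
import urllib.request
from pathlib import Path
from typing import Callable, Dict, Iterable


def fetch_with_urllib(url: str) -> bytes:
    """Download one resource using the standard library."""
    with urllib.request.urlopen(url, timeout=10) as response:
        return response.read()


def cache_resources(
    urls: Iterable[str],
    cache_dir: Path,
    fetch: Callable[[str], bytes] = fetch_with_urllib,
) -> Dict[str, Path]:
    """Fetch every URL and store its content locally; return URL -> file path."""
    cache_dir.mkdir(parents=True, exist_ok=True)
    cached: Dict[str, Path] = {}
    for url in urls:
        # Hash the URL so that any URL maps to a safe, fixed-length file name.
        name = hashlib.sha256(url.encode("utf-8")).hexdigest()
        path = cache_dir / name
        path.write_bytes(fetch(url))
        cached[url] = path
    return cached
```

The `fetch` parameter lets a caller substitute another downloader, for example one that reuses the browser session, without changing the caching logic.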
The first removal module 44 is configured to remove the link of the third-party webpage in response to the resources being determined to be illegal.
In one embodiment, the resources can be audited automatically by machine, based on a program. For example, certain keywords are preset as machine search targets, and whether the resources are legal is then determined from the search result. Specifically, the resources are searched for the target keywords: if any keyword occurs, the resources are illegal; if none occurs, they can be regarded as legal. Manual review is of course also possible, but its efficiency is very low: if a content platform carries many third-party links, manual review becomes a heavy workload. In other cases, the machine audit can serve as an initial screening, and the web page content it flags is then rechecked manually to avoid misjudgment, which balances efficiency and accuracy.
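The keyword-based machine audit described above can be sketched as follows. Decoding every resource as UTF-8 text and matching case-insensitively are assumptions of this sketch, as are the function names; binary resources such as pictures would need a different kind of check.

```python
from typing import Iterable, List, Mapping


def find_keywords(resources: Mapping[str, bytes], keywords: Iterable[str]) -> List[str]:
    """Return the preset keywords that occur in any resource (case-insensitive)."""
    texts = [
        content.decode("utf-8", errors="ignore").lower()
        for content in resources.values()
    ]
    return [kw for kw in keywords if any(kw.lower() in text for text in texts)]


def machine_audit(resources: Mapping[str, bytes], keywords: Iterable[str]) -> bool:
    """Return True (legal) only when none of the preset keywords occurs."""
    return not find_keywords(resources, keywords)
```

Returning the matched keywords from `find_keywords` also supports the two-stage flow in the text: the hits can be shown to a human reviewer who rechecks the flagged page.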
Clearly, this embodiment can audit the content corresponding to an embedded third-party web page link, ensuring that the third-party web page content is healthy and safe, sparing the operator of the current application possible adverse effects and even legal risk, and ensuring smooth operation.
Referring to Fig. 5, Fig. 5 is a structural block diagram of a third-party web page content auditing device provided by another embodiment of the present invention. The device includes:
A first loading module 51, configured to, at the first moment, automatically load the content corresponding to the link of the third-party webpage in the third-party browser by imitating user access behavior.
A first parsing module 52, configured to parse the resource acquisition record of the third-party browser to obtain the resource path list of the third-party webpage.
A first resource obtaining module 53, configured to obtain, according to the resource path list, the resources corresponding to the third-party webpage at the first moment.
A first removal module 54, configured to audit the resources and, if they are illegal, remove the link of the third-party webpage.
A second loading module 55, configured to, at the second moment after the specified time interval, automatically load the content corresponding to the link of the third-party webpage in the third-party browser again by imitating user access behavior.
A second parsing module 56, configured to parse the resource acquisition record of the third-party browser to obtain the resource path list of the third-party webpage.
A second resource obtaining module 57, configured to obtain, according to the resource path list, the resources corresponding to the third-party webpage at the second moment.
A second removal module 58, configured to judge whether the resources corresponding to the third-party webpage at the second moment are legal and, if not, delete the link of the third-party webpage.
The first moment and the second moment need further explanation here.
First, the audit of the third-party web page content is not limited to only two moments.
Second, "second moment" merely denotes a moment distinct from the first moment: a later moment that follows the first after a predetermined time interval.
Third, it follows from the second point that the second moment can also be regarded as the "first moment" of the next time interval; after the second moment there can also be an audit at a third moment.
Fourth, the time interval between the first moment and the second moment is specified in advance based on experience.
This embodiment can continuously and dynamically audit the content of the third-party webpage, dynamically ensuring that the third-party web page content remains healthy and legal.
Referring to Fig. 6, Fig. 6 is a structural block diagram of the second removal module in a third-party web page content auditing device provided by another embodiment of the present invention. The module includes: a comparing unit 60, configured to determine whether the size of the resources corresponding to the third-party webpage at the first moment is the same as the size of the resources corresponding to the third-party webpage at the second moment;
A first response unit 61, configured to, in response to the resources corresponding to the third-party webpage at the first moment and at the second moment having the same size, determine that the resources corresponding to the third-party webpage at the second moment are legal.
A second response unit 62, configured to, in response to the resources corresponding to the third-party webpage at the first moment and at the second moment having different sizes, judge by machine audit whether the resources corresponding to the third-party webpage at the second moment are legal.
Preferably, a machine learning approach is used to judge whether the resources corresponding to the third-party webpage at the second moment are legal. In addition, in one embodiment, the time interval between the first moment and the second moment can be specified in advance.
With this embodiment, the web page content can be judged more efficiently, without analyzing the substantive content.
According to a specific implementation of an embodiment of the present invention, the time interval between the first moment and the second moment is specified in advance, so that the content of the third-party webpage is audited periodically.
Fig. 7 shows a structural schematic diagram of the electronic device 70 provided by an embodiment of the present invention. The electronic device 70 includes at least one processor 701 (such as a CPU), at least one input/output interface 704, a memory 702, and at least one communication bus 703 for connecting these components. The at least one processor 701 is configured to execute computer instructions stored in the memory 702, so that the at least one processor 701 can perform any of the foregoing embodiments of the third-party web page content auditing method. The memory 702 is a non-transitory memory, which may include volatile memory, such as high-speed random access memory (RAM), and may also include non-volatile memory, such as at least one disk storage. The communication connection with at least one other device or unit is realized through the at least one input/output interface 704 (which may be a wired or wireless communication interface).
In some embodiments, the memory 702 stores a program 7021, and the processor 701 executes the program 7021 to perform the content of any of the foregoing embodiments of the method for improving the opening speed of third-party web pages.
The electronic device can exist in a variety of forms, including but not limited to:
(1) Mobile communication devices: these devices are characterized by mobile communication capability, with voice and data communication as their main purpose. Such terminals include smartphones (such as the iPhone), multimedia phones, feature phones, and low-end phones.
(2) Ultra-mobile personal computer devices: these devices belong to the category of personal computers, have computing and processing capabilities, and generally also offer mobile Internet access. Such terminals include PDA, MID, and UMPC devices, such as the iPad.
(3) Portable entertainment devices: these devices can display and play multimedia content. They include audio and video players (such as the iPod), handheld devices, e-book readers, smart toys, and portable in-vehicle navigation devices.
(4) Servers: devices that provide computing services. A server consists of a processor, hard disk, memory, system bus, and so on. Its architecture is similar to that of a general-purpose computer, but because it must provide highly reliable services, it has higher requirements for processing capability, stability, reliability, security, scalability, and manageability.
(5) Other electronic devices with data interaction functions.
It should be noted that, in this document, relational terms such as "first" and "second" are used only to distinguish one entity or operation from another, and do not necessarily require or imply any actual relationship or order between these entities or operations. Moreover, the terms "include", "comprise", or any other variant are intended to cover non-exclusive inclusion, so that a process, method, article, or device that includes a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, method, article, or device. Without further limitation, an element defined by the phrase "including a ..." does not exclude the existence of other identical elements in the process, method, article, or device that includes the element.
The embodiments in this specification are described in a related manner; for identical or similar parts, the embodiments may be referred to one another, and each embodiment focuses on its differences from the others.
In particular, since the device embodiments are substantially similar to the method embodiments, they are described relatively briefly; for relevant details, refer to the description of the method embodiments.
The logic and/or steps represented in the flowcharts or otherwise described herein, for example an ordered list of executable instructions for implementing logical functions, may be embodied in any computer-readable medium for use by, or in connection with, an instruction execution system, apparatus, or device (such as a computer-based system, a system including a processor, or another system that can fetch instructions from the instruction execution system, apparatus, or device and execute them). For the purposes of this specification, a "computer-readable medium" can be any means that can contain, store, communicate, propagate, or transmit a program for use by, or in connection with, an instruction execution system, apparatus, or device. More specific examples (a non-exhaustive list) of computer-readable media include: an electrical connection (electronic device) having one or more wires, a portable computer diskette (magnetic device), a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber device, and a portable compact disc read-only memory (CDROM). In addition, the computer-readable medium may even be paper or another suitable medium on which the program can be printed, because the program can be obtained electronically, for example by optically scanning the paper or other medium and then editing, interpreting, or otherwise processing it in a suitable manner if necessary, and then stored in a computer memory.
It should be understood that each part of the present invention can be implemented in hardware, software, firmware, or a combination thereof.
In the above embodiments, multiple steps or methods can be implemented in software or firmware that is stored in memory and executed by a suitable instruction execution system. For example, if implemented in hardware, as in another embodiment, any one or a combination of the following technologies well known in the art can be used: discrete logic circuits having logic gates for implementing logic functions on data signals, application-specific integrated circuits having suitable combinational logic gates, programmable gate arrays (PGA), field programmable gate arrays (FPGA), and so on.
The above is only a specific embodiment of the present invention, but the protection scope of the present invention is not limited thereto. Any change or substitution that a person skilled in the art could readily conceive within the technical scope disclosed by the present invention shall fall within the protection scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (12)

1. A third-party web page content auditing method, characterized by comprising:
at a first moment, automatically loading the content corresponding to a link of a third-party webpage in a third-party browser by imitating user access behavior;
parsing the resource acquisition record of the third-party browser to obtain a resource path list of the third-party webpage;
based on the resource path list, obtaining the resources corresponding to the third-party webpage at the first moment;
in response to the resources being determined to be illegal, removing the link of the third-party webpage.
2. The method according to claim 1, characterized in that
whether the resources are legal is determined by machine.
3. The method according to claim 1, characterized in that, in response to the resources being determined to be legal, the method further comprises the following steps:
at a second moment, automatically loading the content corresponding to the link of the third-party webpage in the third-party browser again by imitating user access behavior;
parsing the resource acquisition record of the third-party browser to obtain the resource path list of the third-party webpage;
based on the resource path list, obtaining the resources corresponding to the third-party webpage at the second moment;
judging whether the resources corresponding to the third-party webpage at the second moment are legal;
in response to the resources corresponding to the third-party webpage at the second moment being illegal, removing the link of the third-party webpage.
4. The method according to claim 3, characterized in that judging whether the resources corresponding to the third-party webpage at the second moment are legal comprises: determining whether the size of the resources corresponding to the third-party webpage at the first moment is the same as the size of the resources corresponding to the third-party webpage at the second moment;
in response to the resources corresponding to the third-party webpage at the first moment and at the second moment having the same size, determining that the resources corresponding to the third-party webpage at the second moment are legal.
5. The method according to claim 4, characterized in that,
in response to the resources corresponding to the third-party webpage at the first moment and at the second moment having different sizes, whether the resources corresponding to the third-party webpage at the second moment are legal is judged by machine audit.
6. The method according to claim 3, characterized in that judging whether the resources corresponding to the third-party webpage at the second moment are legal comprises:
judging, using a machine learning approach, whether the resources corresponding to the third-party webpage at the second moment are legal.
7. The method according to any one of claims 3 to 6, characterized in that
the time interval between the first moment and the second moment is specified in advance.
8. The method according to claim 1, characterized in that
the resource path list of the third-party webpage includes at least one of the following: the URL of a JavaScript file, the URL of a style file, the URL of a picture, and the URL of an external resource, the external resource including at least one of a font file, audio, video, and a document in the page.
9. The method according to claim 2, characterized in that the machine determines whether the resources are legal in the following manner:
searching the resources based on preset keywords; and
determining whether the resources are legal based on the search result.
10. A third-party web page content auditing device, characterized by comprising:
a first loading module, configured to, at a first moment, automatically load the content corresponding to a link of a third-party webpage in a third-party browser by imitating user access behavior;
a first parsing module, configured to parse the resource acquisition record of the third-party browser to obtain a resource path list of the third-party webpage;
a first resource obtaining module, configured to obtain, based on the resource path list, the resources corresponding to the third-party webpage at the first moment;
a first removal module, configured to remove the link of the third-party webpage in response to the resources being determined to be illegal.
11. An electronic device, characterized in that the electronic device comprises:
at least one processor; and
a memory communicatively connected to the at least one processor; wherein
the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor so that the at least one processor can perform the third-party web page content auditing method according to any one of claims 1 to 9.
12. A machine-readable medium storing computer-executable instructions which, when executed by a machine, cause the machine to perform the third-party web page content auditing method according to any one of claims 1 to 9.
CN201910263886.1A 2019-04-03 2019-04-03 Third party's web page contents checking method, device and electronic equipment Pending CN109992737A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910263886.1A CN109992737A (en) 2019-04-03 2019-04-03 Third party's web page contents checking method, device and electronic equipment

Publications (1)

Publication Number Publication Date
CN109992737A true CN109992737A (en) 2019-07-09

Family

ID=67132097

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910263886.1A Pending CN109992737A (en) 2019-04-03 2019-04-03 Third party's web page contents checking method, device and electronic equipment

Country Status (1)

Country Link
CN (1) CN109992737A (en)

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102571783A (en) * 2011-12-29 2012-07-11 北京神州绿盟信息安全科技股份有限公司 Phishing website detection method, device and system as well as website
CN103428183A (en) * 2012-05-23 2013-12-04 北京新媒传信科技有限公司 Method and device for identifying malicious website
US20140283038A1 (en) * 2013-03-15 2014-09-18 Shape Security Inc. Safe Intelligent Content Modification
CN106209579A (en) * 2016-06-28 2016-12-07 武汉斗鱼网络科技有限公司 Barrage website chat process quickly generates the system and method for hyperlink
CN107294918A (en) * 2016-03-31 2017-10-24 阿里巴巴集团控股有限公司 A kind of fishing webpage detection method and device
CN108228818A (en) * 2017-12-29 2018-06-29 网易(杭州)网络有限公司 Web page resources loading method and device, electronic equipment and storage medium
CN108304584A (en) * 2018-03-06 2018-07-20 百度在线网络技术(北京)有限公司 Illegal page detection method, apparatus, intruding detection system and storage medium
CN109246139A (en) * 2018-10-25 2019-01-18 北京城市网邻信息技术有限公司 A kind of monitoring method, device, electronic equipment and storage medium that website is kidnapped

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110740126A (en) * 2019-09-23 2020-01-31 紫光云(南京)数字技术有限公司 Method, device and system for accessing smart city application program and computer storage medium
CN111327609A (en) * 2020-02-14 2020-06-23 北京奇艺世纪科技有限公司 Data auditing method and device
CN111327609B (en) * 2020-02-14 2022-09-30 北京奇艺世纪科技有限公司 Data auditing method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination