CN109992737A - Third-party web page content auditing method, apparatus and electronic device
- Publication number: CN109992737A
- Application number: CN201910263886.1A
- Authority: CN (China)
- Legal status: Pending (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Abstract
The embodiments of the present invention disclose a third-party web page content auditing method, apparatus and electronic device. The method includes: at a first moment, simulating user access behavior to automatically load the content corresponding to a link to a third-party web page into a third-party browser; parsing the resource acquisition record of the third-party browser to obtain and store a resource path list of the third-party web page; obtaining, according to the resource path list, the resources corresponding to the third-party web page at the first moment; and, after the resources are audited, taking down the link to the third-party web page if the resources are found to be illegal. The embodiments can audit the content behind an embedded third-party web page link, helping to keep third-party web page content healthy and safe, sparing the operator of the current application possible adverse effects or even legal risk, and supporting smooth operation.
Description
Technical field
The present invention relates to the technical field of network security, and in particular to a third-party web page content auditing method, apparatus and electronic device.
Background
A web page often embeds links to third-party web pages, but the content behind these links is difficult for the operator of the current web page to control. If the content of a third-party web page violates relevant laws and regulations, it may adversely affect the operator of the current web page or even expose it to legal risk.
Therefore, how to efficiently audit the content of embedded third-party web pages is a technical problem that those skilled in the art urgently need to solve.
Summary of the invention
In view of this, the embodiments of the present invention provide a third-party web page content auditing method, apparatus and electronic device that at least partially solve the problems in the prior art.
In a first aspect, an embodiment of the present invention provides a third-party web page content auditing method, comprising:
at a first moment, automatically loading the content corresponding to a link to a third-party web page into a third-party browser by simulating user access behavior;
parsing the resource acquisition record of the third-party browser to obtain a resource path list of the third-party web page;
obtaining, based on the resource path list, the resources corresponding to the third-party web page at the first moment; and
in response to the resources being determined to be illegal, removing the link to the third-party web page.
According to a specific implementation of an embodiment of the present invention, whether the resources are legal is determined by a machine.
According to a specific implementation of an embodiment of the present invention, in response to the resources being determined to be legal, the method further comprises the following steps:
at a second moment, automatically loading the content corresponding to the link to the third-party web page into the third-party browser again by simulating user access behavior;
parsing the resource acquisition record of the third-party browser to obtain the resource path list of the third-party web page;
obtaining, based on the resource path list, the resources corresponding to the third-party web page at the second moment;
judging whether the resources corresponding to the third-party web page at the second moment are legal; and
in response to the resources corresponding to the third-party web page at the second moment being illegal, removing the link to the third-party web page.
A kind of specific implementation according to an embodiment of the present invention judges that the institute of third party's webpage described in second moment is right
The whether legal size for comprising determining that resource corresponding to third party's webpage described in first moment of the resource answered and described
Whether the size of resource corresponding to third party's webpage described in two moment is identical;
In response to third party described in resource corresponding to third party's webpage described in first moment and second moment
Resource size corresponding to webpage is identical, determines that resource corresponding to third party's webpage described in second moment is legal.
A kind of specific implementation according to an embodiment of the present invention, in response to third party's webpage institute described in first moment
Corresponding resource is different from resource size corresponding to third party's webpage described in second moment, the side audited by machine
Formula judges whether resource corresponding to third party's webpage described in second moment is legal.
A kind of specific implementation according to an embodiment of the present invention judges that the institute of third party's webpage described in second moment is right
Whether the resource answered is legal to include:
Judge whether resource corresponding to third party's webpage described in second moment is legal using machine learning mode.
A kind of specific implementation according to an embodiment of the present invention, between first moment and second moment when
Between be spaced preassign.
According to a specific implementation of an embodiment of the present invention, the resource path list of the third-party web page includes at least one of the following: URLs of JavaScript files, URLs of style files, URLs of pictures, and URLs of external resources, where the external resources include at least one of font files, audio, video and in-page documents.
According to a specific implementation of an embodiment of the present invention, the machine determines whether the resources are legal as follows:
searching the resources for preset keywords; and
determining, based on the search result, whether the resources are legal.
In a second aspect, an embodiment of the present invention further provides a third-party web page content auditing apparatus, comprising:
a first loading module, configured to automatically load, at a first moment and by simulating user access behavior, the content corresponding to a link to a third-party web page into a third-party browser;
a first parsing module, configured to parse the resource acquisition record of the third-party browser to obtain a resource path list of the third-party web page;
a first resource obtaining module, configured to obtain, based on the resource path list, the resources corresponding to the third-party web page at the first moment; and
a first removal module, configured to remove the link to the third-party web page in response to the resources being determined to be illegal.
According to a specific implementation of an embodiment of the present invention, whether the resources are legal is determined by a machine.
According to a specific implementation of an embodiment of the present invention, the apparatus further comprises:
a second loading module, configured to, in response to the resources being determined to be legal, automatically load the content corresponding to the link to the third-party web page into the third-party browser again at a second moment by simulating user access behavior;
a second parsing module, configured to parse the resource acquisition record of the third-party browser to obtain the resource path list of the third-party web page;
a second resource obtaining module, configured to obtain, based on the resource path list, the resources corresponding to the third-party web page at the second moment; and
a second removal module, configured to judge whether the resources corresponding to the third-party web page at the second moment are legal and, in response to those resources being illegal, remove the link to the third-party web page.
According to a specific implementation of an embodiment of the present invention, the second removal module further comprises:
a comparing unit, configured to determine whether the size of the resources corresponding to the third-party web page at the first moment is the same as the size of the resources corresponding to the third-party web page at the second moment; and
a first response unit, configured to determine, in response to the two sizes being the same, that the resources corresponding to the third-party web page at the second moment are legal.
According to a specific implementation of an embodiment of the present invention, the second removal module further comprises:
a second response unit, configured to judge by machine audit, in response to the two sizes being different, whether the resources corresponding to the third-party web page at the second moment are legal.
According to a specific implementation of an embodiment of the present invention, in the second removal module, judging whether the resources corresponding to the third-party web page at the second moment are legal comprises: judging, by means of machine learning, whether the resources corresponding to the third-party web page at the second moment are legal.
According to a specific implementation of an embodiment of the present invention, the time interval between the first moment and the second moment is specified in advance.
According to a specific implementation of an embodiment of the present invention, the resource path list of the third-party web page includes at least one of the following: URLs of JavaScript files, URLs of style files, URLs of pictures, and URLs of external resources, where the external resources include at least one of font files, audio, video and in-page documents.
According to a specific implementation of an embodiment of the present invention, the machine determines whether the resources are legal as follows: extracting keywords from the resources; and determining, based on the keywords, whether the resources are legal.
In a third aspect, an embodiment of the present invention further provides an electronic device, comprising:
at least one processor; and
a memory communicatively connected to the at least one processor; wherein
the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor so that the at least one processor can perform the third-party web page content auditing method of the first aspect or of any implementation of the first aspect.
In a fourth aspect, an embodiment of the present invention further provides a non-transitory computer-readable storage medium storing computer instructions, the computer instructions being used to cause a computer to perform the third-party web page content auditing method of the first aspect or of any implementation of the first aspect.
In a fifth aspect, an embodiment of the present invention further provides a computer program product comprising a computer program stored on a non-transitory computer-readable storage medium, the computer program comprising program instructions which, when executed by a computer, cause the computer to perform the third-party web page content auditing method of the first aspect or of any implementation of the first aspect.
In the third-party web page content auditing method, apparatus, electronic device, non-transitory computer-readable storage medium and computer program product provided by the embodiments of the present invention:
When a link to a third-party web page is embedded in the current web page in some form, a third-party browser (which may also be a custom browser) can simulate a user's click to automatically load the content of the third-party web page into the browser. The content of the third-party web page includes the HTML (the body of the page), JavaScript files (which determine the page's behavior, such as responses to events like clicks), style files (which determine element attributes such as appearance and size), as well as pictures, frames, iframes and so on. After the third-party web page has finished loading, the list of resource paths it requested is analyzed automatically to form the resource list for the embedded link, and the resource list is stored. Next, the content corresponding to the list is obtained according to the resource list. Finally, the content is audited. If the audit finds the content illegal, for example because it breaks rules, breaks the law, or violates public order and good morals, the link is taken down.
In a preferred embodiment, the content corresponding to the resource list is time-sensitive: the developer of the third-party web page may update it in real time or periodically. To keep the audit continuously effective, the content of the embedded third-party web page therefore needs to be checked periodically, as follows. The file sizes at the two moments are compared first. If the sizes are the same, the content corresponding to the list can generally be considered unchanged, and there is no need to consider taking the link down. If the sizes differ, the specific content corresponding to the list is obtained and audited again before deciding whether to take the link down.
In a preferred embodiment, if the embedded third-party web page (which can be understood as the "second hop") itself embeds a web page of yet another party (which can be understood as the "third hop"), or there is even a "fourth hop", the content can be audited using the same principle.
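The multi-hop case can be sketched as a depth-limited traversal. This is a minimal illustration under stated assumptions, not the patent's implementation: `audit_hops`, `LinkCollector`, `load_html`, `is_legal` and `max_hops` are hypothetical names, the page loader and the legality check are supplied by the caller, and only `<a href>` and `<iframe>`/`<frame>` sources are treated as embedded links.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect absolute URLs of links and frames embedded in one page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "a":
            target = attributes.get("href")
        elif tag in ("iframe", "frame"):
            target = attributes.get("src")
        else:
            target = None
        if target:
            self.links.append(urljoin(self.base_url, target))


def audit_hops(url, load_html, is_legal, max_hops, seen=None):
    """Return the first illegal URL reachable within max_hops, or None."""
    seen = set() if seen is None else seen
    if url in seen:
        return None  # already audited; also prevents cycles
    seen.add(url)
    html = load_html(url)  # hypothetical loader, e.g. backed by the custom browser
    if not is_legal(html):
        return url
    if max_hops == 0:
        return None
    collector = LinkCollector(url)
    collector.feed(html)
    for link in collector.links:
        bad = audit_hops(link, load_html, is_legal, max_hops - 1, seen)
        if bad is not None:
            return bad
    return None
```

The hop budget is an added assumption: the text does not say how deep the audit should go, so a caller would choose `max_hops` to bound the cost of the traversal.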
Obviously, the embodiments of the present invention can audit the content behind embedded third-party web page links, helping to keep third-party web page content healthy and safe, sparing the operator of the current application possible adverse effects or even legal risk, and supporting smooth operation.
Brief description of the drawings
To describe the technical solutions of the embodiments of the present invention more clearly, the drawings needed in the embodiments are briefly introduced below. Obviously, the drawings described below are only some embodiments of the present invention; those of ordinary skill in the art can obtain other drawings from them without creative effort.
Fig. 1 is a flowchart of the steps of a third-party web page content auditing method provided by an embodiment of the present invention;
Fig. 2 is a flowchart of the steps of a third-party web page content auditing method provided by another embodiment of the present invention;
Fig. 3 is a flowchart of the steps for judging whether the third-party web page content at the second moment is legal, in a third-party web page content auditing method provided by another embodiment of the present invention;
Fig. 4 is a structural block diagram of a third-party web page content auditing apparatus provided by an embodiment of the present invention;
Fig. 5 is a structural block diagram of a third-party web page content auditing apparatus provided by another embodiment of the present invention;
Fig. 6 is a block diagram of the structure for judging whether the third-party web page content at the second moment is legal, in a third-party web page content auditing apparatus provided by another embodiment of the present invention;
Fig. 7 is a schematic structural diagram of an electronic device provided by an embodiment of the present invention.
Detailed description
The embodiments of the present invention are described in detail below with reference to the accompanying drawings.
The embodiments of the present disclosure are illustrated below through specific examples, and those skilled in the art can readily understand other advantages and effects of the disclosure from the content of this specification. Obviously, the described embodiments are only some, not all, of the embodiments of the disclosure. The disclosure can also be implemented or applied through other, different embodiments, and the details in this specification can be modified or changed from different viewpoints and for different applications without departing from the spirit of the disclosure. It should be noted that, where there is no conflict, the following embodiments and the features in them can be combined with one another. All other embodiments obtained by those of ordinary skill in the art based on the embodiments in the disclosure, without creative effort, fall within the scope of protection of the disclosure.
It should be noted that various aspects of embodiments within the scope of the appended claims are described below. It should be apparent that the aspects described herein can be embodied in a wide variety of forms, and that any specific structure and/or function described herein is merely illustrative. Based on the disclosure, those skilled in the art should understand that an aspect described herein can be implemented independently of any other aspect, and that two or more of these aspects can be combined in various ways. For example, an apparatus can be implemented and/or a method practiced using any number of the aspects set forth herein. In addition, such an apparatus can be implemented and/or such a method practiced using structures and/or functionality other than one or more of the aspects set forth herein.
It should also be noted that the drawings provided in the following embodiments illustrate the basic concept of the disclosure only schematically. They show only the components related to the disclosure rather than the number, shape and size of the components in an actual implementation; in an actual implementation the form, number and proportion of each component may vary freely, and the component layout may be more complex.
In addition, specific details are provided in the following description to facilitate a thorough understanding of the examples. However, those skilled in the art will understand that the aspects can be practiced without these specific details.
An embodiment of the present disclosure provides a third-party web page content auditing method. The method provided by this embodiment can be executed by a computing apparatus, which can be implemented as software or as a combination of software and hardware, and which can be integrated in a server, a terminal device, or the like.
Fig. 1 is a flowchart of the steps of a third-party web page content auditing method provided by an embodiment of the present invention.
The application scenario of this embodiment is as follows: a third-party web page link is embedded in the current web page. For example, a link reading "Make sharing knowledge a habit" is embedded in a "content platform". The link may be text; in other scenarios it may be a picture or a dynamic video, which does not affect the implementation of this embodiment. The link "Make sharing knowledge a habit" points to the home page of the third-party web page. The third party delivers the address of its web page, together with the form in which the link is to be displayed on the content platform, to the developer of the content platform, hoping to increase the visibility and user page views of its platform by placing its web page link on a content platform with high traffic.
For the operator of the content platform, if the content of a third-party web page embedded in its platform is illegal, this may harm the platform and may even create legal risk, disrupting operation. "Illegal" here includes cases where the third-party web page spreads content that violates public order and good morals, or where its content carries legal problems such as copyright risk. Although the safe harbor principle provides some exemption, it is clearly better to know in advance, find problems early and handle them early.
To avoid these drawbacks, the embodiments of the present invention use a third-party browser to pre-load the content of the third-party web page behind the link, analyze the resource request list of the third-party browser, obtain the corresponding resources from the list, and then have the obtained resources audited manually or by machine.
First, how the web page resources of the third-party web page are obtained is explained with reference to S101, S102 and S103 in Fig. 1.
S101: at a first moment, automatically load the content corresponding to the link to the third-party web page into a third-party browser by simulating user access behavior.
The third party is likely to deliver only its address and web page link to the content platform, without delivering the resources its web page uses. For example, the resources involved in a web page include:
1) HTML files, which form the body of the web page;
2) JavaScript files, which determine the behavior of the web page, such as responses to events like clicks;
3) style files, which determine element attributes such as appearance and size. Style files for web pages are generally CSS files with the .css suffix. A style sheet defines the following aspects of a document:
A. the default font, size and color of headings and body text;
B. the appearance of the front page;
C. the spacing between individual parts;
D. line spacing, surrounding margins, spacing between headings, and so on;
E. how many heading levels any automatically generated table of contents should include;
F. any boilerplate content included in the corresponding pages, and so on.
4) pictures, frames, iframes and so on.
These resources need to be obtained through the custom third-party browser created in this embodiment.
The third-party browser in this embodiment is obtained by modifying, adjusting and extending the functions of an existing browser kernel. For those skilled in the art, modifying, adjusting or extending an existing browser to provide the functions required of the custom browser in this embodiment is known practice and is not described further here.
After the third-party browser has been built, the resources of the third-party web page are loaded into it by simulating the user behavior that triggers the third-party link, for example clicking the link (the present invention is not limited to this; clicking is only one implementation in this embodiment). As mentioned above, the resources here include HTML files, JavaScript files and style files, as well as pictures, frames, iframes and so on.
S102: parse the resource acquisition record of the third-party browser to obtain the resource path list of the third-party web page.
In this step, after the third-party browser has finished loading the third-party web page, the resource path list of the third-party web page is analyzed automatically. The resource path list includes at least one of the following: URLs of JavaScript files, URLs of style files, URLs of pictures, and URLs of external resources, where the external resources include at least one of font files, audio, video and in-page documents.
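As an illustration of S102, the following sketch groups the URLs found in a resource acquisition record into a resource path list. It is only a sketch under assumptions: the record format (a list of dictionaries with a `"url"` key, as a browser's network log might be exported), the function name `build_resource_path_list` and the extension-to-category table `CATEGORIES` are all hypothetical; the text names the categories but not how a URL is assigned to one.

```python
import posixpath
from urllib.parse import urlparse

# Hypothetical mapping from file extension to the categories named in the text.
CATEGORIES = {
    ".js": "javascript",
    ".css": "stylesheet",
    ".png": "image", ".jpg": "image", ".jpeg": "image", ".gif": "image",
    ".woff": "external", ".woff2": "external", ".ttf": "external",
    ".mp3": "external", ".mp4": "external", ".pdf": "external",
}


def build_resource_path_list(records):
    """Group requested URLs from a resource acquisition record by category."""
    paths = {}
    for record in records:
        url = record["url"]
        # Classify by the extension of the URL path, ignoring any query string.
        extension = posixpath.splitext(urlparse(url).path)[1].lower()
        category = CATEGORIES.get(extension, "other")
        bucket = paths.setdefault(category, [])
        if url not in bucket:  # a page may request the same URL more than once
            bucket.append(url)
    return paths
```

Classifying by extension is a simplification chosen for brevity; an implementation with access to response headers could classify by content type instead.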
S103: based on the resource path list, obtain the resources corresponding to the third-party web page at the first moment.
The resources corresponding to the third-party web page are obtained according to the URLs in its resource path list (JavaScript files, style files, pictures, and external resources such as font files, audio, video and in-page documents). These resources are then stored in the local cache of the current application for subsequent auditing.
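S103 can be sketched as a download-and-cache helper. The names `http_fetch` and `cache_resources`, the cache layout (one file per URL, named by the SHA-256 hash of the URL) and the injectable `fetch` parameter are assumptions made for illustration; the text only states that the resources are stored in a local cache.

```python
import hashlib
from pathlib import Path
from urllib.request import urlopen


def http_fetch(url, timeout=10):
    """Default fetcher: download the raw bytes of one URL."""
    with urlopen(url, timeout=timeout) as response:
        return response.read()


def cache_resources(urls, cache_dir, fetch=http_fetch):
    """Download each URL into a local cache and return a url -> file path map."""
    cache = Path(cache_dir)
    cache.mkdir(parents=True, exist_ok=True)
    stored = {}
    for url in urls:
        data = fetch(url)
        # Name the file after the URL so a later run overwrites the same entry.
        target = cache / hashlib.sha256(url.encode("utf-8")).hexdigest()
        target.write_bytes(data)
        stored[url] = target
    return stored
```

Passing the fetcher as a parameter keeps the sketch testable without network access; a caller could equally pass a function that reads the bytes already captured by the browser.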
S104: in response to the resources being determined to be illegal, remove the link to the third-party web page.
First, who performs the audit is explained.
In one embodiment, the resources can be audited automatically by a machine running a program. For example, certain keywords are preset as search targets, and whether the resources are legal is determined from the search result: the resources are searched for the target keywords; if a keyword appears, the resources are illegal, and if none appears, they can be regarded as legal. The audit can also be manual, but this is inefficient: if a content platform carries many third-party links, manual auditing becomes a heavy workload. In other cases, the machine audit can serve as a preliminary screen and the web pages it flags can be rechecked manually, which avoids misjudgment and balances efficiency and accuracy.
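The keyword-based machine audit described above can be sketched as follows. The function name `audit_by_keywords`, the case-insensitive matching and the UTF-8 decoding are assumptions; the text only specifies that preset keywords are searched for and that any occurrence makes the resources illegal.

```python
def audit_by_keywords(resources, keywords):
    """Return (is_legal, hits) for a name -> bytes mapping of resources.

    The resources are treated as illegal if any preset keyword occurs in any
    of them; hits lists every (resource name, keyword) match that was found.
    """
    hits = []
    for name, data in resources.items():
        # Decode leniently: binary resources simply yield no keyword matches.
        text = data.decode("utf-8", errors="ignore").lower()
        for keyword in keywords:
            if keyword.lower() in text:
                hits.append((name, keyword))
    return (not hits, hits)
```

Returning the hits alongside the verdict supports the combined workflow mentioned in the text, where pages flagged by the machine are passed to a human for rechecking.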
Obviously, this embodiment can audit the content behind embedded third-party web page links, helping to keep third-party web page content healthy and safe, sparing the operator of the current application possible adverse effects or even legal risk, and supporting smooth operation.
Another embodiment of the present invention is described below with reference to Fig. 2.
The content corresponding to the resource list is time-sensitive because the third-party web page may be updated regularly. To keep the audit effective, periodic checks are therefore needed.
Referring to Fig. 2, which is a flowchart of the steps of a third-party web page content auditing method provided by an embodiment of the present invention, the method includes the following steps:
S201: at a first moment, automatically load the content corresponding to the link to the third-party web page into a third-party browser by simulating user access behavior.
S202: parse the resource acquisition record of the third-party browser to obtain the resource path list of the third-party web page.
In this step, after the third-party browser has finished loading the third-party web page, the resource path list of the third-party web page is analyzed automatically. The resource path list includes URLs of JavaScript files, URLs of style files, URLs of pictures, and URLs of external resources such as font files, audio, video and in-page documents.
S203: based on the resource path list, obtain the resources corresponding to the third-party web page at the first moment.
The resources corresponding to the third-party web page are obtained according to the URLs in its resource path list (JavaScript files, style files, pictures, and external resources such as font files, audio, video and in-page documents). These resources are then stored in the local cache of the current application for subsequent auditing.
The method further includes the following operations, which continue past the first audit when the resources are determined to be legal.
S204: audit the resources.
If they are illegal, execute:
S205: remove the link to the third-party web page.
If they are legal, execute S206 to S210.
S206: after the specified time interval, at a second moment, simulate user access behavior to automatically load the content corresponding to the link to the third-party web page into the third-party browser again.
This is similar to step S201 above: the content of the third-party web page is loaded again.
It should be noted that the time interval need not be specified; the check can also be run as the actual situation requires. The present invention does not limit this.
S207: parse the resource acquisition record of the third-party browser to obtain the resource path list of the third-party web page.
This is similar to S202 in the embodiment above.
S208: obtain, according to the resource path list, the resources corresponding to the third-party web page at the second moment.
This is similar to S203 in the embodiment above.
S209: judge whether the resources corresponding to the third-party web page at the second moment are legal.
Legality here means whether the content of the third-party web page clearly violates public order and good morals or legal provisions. In one embodiment, the resources can be audited automatically by a machine running a program, for example by searching for certain keywords. The audit can also be manual, but this is inefficient: if a content platform carries many third-party links, manual auditing becomes a heavy workload. In other cases, the machine audit can serve as a preliminary screen and the web pages it flags can be rechecked manually, which avoids misjudgment and balances efficiency and accuracy.
If the resources are illegal, S210 is executed: in response to the resources corresponding to the third-party web page at the second moment being illegal, the link to the third-party web page is removed.
If they are legal, the third-party web page content is obtained and audited again after waiting for a specified time interval.
The first moment and the second moment need further explanation.
First, the third-party web page content is not audited at only two moments.
Second, "second moment" merely distinguishes it from the first moment: it denotes another moment reached after a predetermined time interval has elapsed from the earlier moment.
Third, it follows from the second point that the second moment can also be regarded as the "first moment" of the next time interval; after the second moment there can also be an audit at a third moment.
Fourth, the time interval between the first moment and the second moment is specified in advance based on experience.
This embodiment can audit the content of the third-party web page continuously and dynamically, helping to keep the third-party web page content healthy and legal over time.
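The repeated audit at successive moments can be sketched as a simple loop. All names here (`monitor_link`, `audit_once`, `take_down`, `interval_seconds`, `max_rounds`) are hypothetical, and the `max_rounds` bound is an added assumption so that the sketch terminates; the text itself describes an open-ended sequence of moments.

```python
import time


def monitor_link(link, audit_once, take_down, interval_seconds, max_rounds,
                 sleep=time.sleep):
    """Audit a link at successive moments; take it down at the first illegal result.

    audit_once(link) performs one load-parse-fetch-audit cycle and returns True
    when the resources are legal. Returns the number of audit rounds that ran.
    """
    for round_number in range(1, max_rounds + 1):
        if not audit_once(link):
            take_down(link)
            return round_number
        if round_number < max_rounds:
            sleep(interval_seconds)  # wait for the next "moment"
    return max_rounds
```

Each round's moment serves as the "first moment" for the following round, which matches the third point above.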
Another embodiment of the present invention is described below with reference to Fig. 3.
Referring to Fig. 3, in one embodiment of the third-party web page content review method of the present invention, after the third-party web page content has been reviewed once and found legal, judging whether the web page content acquired at the second moment is legal may include the following steps:
S301: determine the size of the resource corresponding to the third-party web page at the first moment;
S302: determine the size of the resource corresponding to the third-party web page at the second moment;
S303: compare the sizes of the third-party web page resources at the two moments and judge:
S304: if the resource corresponding to the third-party web page at the first moment and the resource corresponding to the third-party web page at the second moment have the same size, the resource corresponding to the third-party web page at the second moment is legal;
S305: if the resource corresponding to the third-party web page at the first moment and the resource corresponding to the third-party web page at the second moment differ in size, then
S306: judge by machine review whether the resource corresponding to the third-party web page at the second moment is legal.
With this embodiment, when the resource sizes match, the web page content can be judged with higher efficiency, without analyzing its substantive content.
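Steps S301 to S306 can be sketched as follows. The sketch assumes that "size" means the total number of bytes over all resources of the page, which the source does not specify; `total_size`, `second_moment_is_legal`, and `machine_review` are hypothetical names.

```python
from typing import Callable, Mapping

Resources = Mapping[str, bytes]  # assumed shape: resource URL -> resource body


def total_size(resources: Resources) -> int:
    """S301/S302: size of the resources of the page (assumed: total bytes)."""
    return sum(len(body) for body in resources.values())


def second_moment_is_legal(
    first: Resources,
    second: Resources,
    machine_review: Callable[[Resources], bool],  # hypothetical reviewer
) -> bool:
    # S303: compare the sizes at the two moments.
    if total_size(first) == total_size(second):
        # S304: same size, so the earlier "legal" verdict is reused.
        return True
    # S305/S306: the size changed, so fall back to machine review.
    return machine_review(second)
```

Equal size is a cheap heuristic rather than proof that nothing changed. A content hash would detect changes more strictly, but that would be a different technique from the size comparison described here.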
In a second aspect, an embodiment of the present invention also provides a third-party web page content review device.
The application scenario of this embodiment is as follows: a third-party web page link is embedded in the current web page. For example, a link reading "Make sharing knowledge a habit" is embedded in a content platform. The link may be in text form; in other scenarios it may be in picture form or in dynamic video form. This does not affect the implementation of this embodiment. The link "Make sharing knowledge a habit" corresponds to the home page of the third-party web page. The third party delivers the URL of its web page, together with the form in which the link is to be displayed on the content platform, to the developer of the content platform, hoping that placing its web page link on a content platform with larger traffic will increase the visibility of its platform and the user page views of its web pages.
For the operator of the content platform, if the content of a third-party web page embedded in its platform is illegal, this may have undesirable effects on the operator and may even create legal risk, disrupting operation. "Illegal" here includes the third-party web page spreading content that violates public order and good morals, or the content of the third-party web page raising legal problems such as copyright risk. Although the safe harbor principle grants a certain exemption, it is obviously better to know in advance, detect early, and handle early.
To avoid the above drawbacks, the embodiment of the present invention uses a third-party browser to preload the content of the linked third-party web page, then analyzes the resource request list of the third-party browser, obtains the corresponding resources from that list, and has the obtained resources reviewed manually or by machine.
First, how the resources of the third-party web page are obtained is described. Referring to Fig. 4, Fig. 4 shows a structural block diagram of the third-party web page content review device of an embodiment of the present invention. The first loading module 41, the first parsing module 42, and the first resource acquisition module 43 are described below.
The first loading module 41 is configured to, at the first moment, automatically load the content corresponding to the link of the third-party web page into the third-party browser by imitating user access behavior.
The reason is that the third party is likely to have delivered only its URL and web page link to the content platform, without delivering the resources involved in its web page. For example, the resources involved in a web page include:
1) HTML files, which form the main body of the web page;
2) JavaScript files, which determine the behavior of the web page, such as responses to various events, for example clicks;
3) style files, which determine the attributes of elements, such as appearance and size. The style files used for web pages are generally CSS, with the file suffix .css. A style sheet defines the following elements of a document:
A. the default font, size, and color of headings and body text;
B. the appearance of front matter;
C. the layout and spacing of individual sections;
D. line spacing, margins on all sides, spacing between headings, and so on;
E. how many heading levels any automatically generated table of contents should include;
F. any boilerplate content included on the corresponding pages.
4) pictures, frames, iframes, and so on.
These resources need to be obtained through the third-party browser created in this embodiment.
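To make the resource kinds above concrete, the following sketch lists the resources that an HTML file references statically, using Python's standard `html.parser`. It is a simplification and not the mechanism of this embodiment: a static scan misses anything that scripts request at run time, which is one reason a real browser is used. The names `ResourceCollector` and `list_static_resources` are hypothetical.

```python
from html.parser import HTMLParser


class ResourceCollector(HTMLParser):
    """Collect (kind, url) pairs for scripts, style files, pictures, and frames."""

    def __init__(self) -> None:
        super().__init__()
        self.resources: list[tuple[str, str]] = []

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)  # tag and attribute names arrive lower-cased
        if tag == "script" and attr.get("src"):
            self.resources.append(("javascript", attr["src"]))
        elif tag == "link" and attr.get("href"):
            rel = (attr.get("rel") or "").lower().split()
            if "stylesheet" in rel:
                self.resources.append(("style", attr["href"]))
        elif tag == "img" and attr.get("src"):
            self.resources.append(("image", attr["src"]))
        elif tag in ("frame", "iframe") and attr.get("src"):
            self.resources.append(("frame", attr["src"]))


def list_static_resources(html: str) -> list[tuple[str, str]]:
    """Return the resources referenced directly in the HTML source."""
    collector = ResourceCollector()
    collector.feed(html)
    collector.close()
    return collector.resources
```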
The third-party browser in this embodiment is obtained by modifying, adjusting, and extending the functions of an existing browser kernel. For those skilled in the art, modifying, adjusting, or extending the functions of an existing browser according to the functions that the customized browser of this embodiment needs to perform is known, and is not described again here.
After the third-party browser has been built, the resources of the third-party web page are loaded into the third-party browser by imitating the user behavior that triggers the third-party website link, for example clicking the web page link (the present invention is not limited to this; clicking is only one form of implementation of this embodiment). As mentioned above, the resources here include HTML files, JavaScript files, style files, and pictures, frames, iframes, and so on.
The first parsing module 42 is configured to parse the resource acquisition record of the third-party browser to obtain the resource path list of the third-party web page.
In this step, after the third-party browser has finished loading the third-party web page, the resource path list of the third-party web page is analyzed automatically. The resource path list of the third-party web page includes the URLs of JavaScript files, the URLs of style files, the URLs of pictures, and the URLs of external resources such as font files, audio and video, and documents in the page.
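The parsing step can be sketched by assuming that the browser's resource acquisition record is available in the HAR (HTTP Archive) format, where each entry carries `request.url` and `response.content.mimeType`. The source does not name a record format, so HAR is an assumption made for illustration, and `resource_paths_from_har` is a hypothetical name.

```python
from typing import Any


def resource_paths_from_har(har: dict[str, Any]) -> dict[str, list[str]]:
    """Group the requested URLs into the categories of the resource path list."""
    groups: dict[str, list[str]] = {
        "javascript": [], "style": [], "image": [], "external": [],
    }
    for entry in har.get("log", {}).get("entries", []):
        url = entry["request"]["url"]
        mime = entry.get("response", {}).get("content", {}).get("mimeType", "")
        mime = mime.split(";")[0].strip().lower()  # drop parameters such as charset
        if mime in ("application/javascript", "text/javascript"):
            kind = "javascript"
        elif mime == "text/css":
            kind = "style"
        elif mime.startswith("image/"):
            kind = "image"
        else:
            kind = "external"  # fonts, audio, video, documents, and anything else
        if url not in groups[kind]:  # keep each URL once, in request order
            groups[kind].append(url)
    return groups
```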
The first resource acquisition module 43 is configured to obtain, based on the resource path list, the resource corresponding to the third-party web page at the first moment.
According to the URLs of JavaScript files, the URLs of style files, the URLs of pictures, and the URLs of external resources such as font files, audio and video, and documents in the page in the resource path list of the third-party web page, the resources corresponding to the third-party web page are obtained. These resources are then stored in the local cache of the current application for subsequent review.
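Obtaining the listed resources and storing them in a local cache might look like the sketch below. The download function is injected as `fetch` so that the example runs without network access; `cache_resources`, the cache directory name, and the choice of a SHA-256 digest of the URL as the file name are all assumptions for illustration.

```python
import hashlib
import tempfile
from pathlib import Path
from typing import Callable, Iterable


def cache_resources(
    urls: Iterable[str],
    fetch: Callable[[str], bytes],  # hypothetical downloader, e.g. an HTTP GET
    cache_dir: Path,
) -> dict[str, Path]:
    """Download every URL and store its body in `cache_dir` for later review."""
    cache_dir.mkdir(parents=True, exist_ok=True)
    stored: dict[str, Path] = {}
    for url in urls:
        # A digest of the URL gives a stable, filesystem-safe file name.
        name = hashlib.sha256(url.encode("utf-8")).hexdigest()
        path = cache_dir / name
        path.write_bytes(fetch(url))
        stored[url] = path
    return stored


# Usage with a stand-in fetcher instead of a real download.
with tempfile.TemporaryDirectory() as tmp:
    stored = cache_resources(
        ["https://third-party.example/app.js"],
        fetch=lambda url: b"console.log('hi');",
        cache_dir=Path(tmp) / "review-cache",
    )
    cached_bytes = stored["https://third-party.example/app.js"].read_bytes()
```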
The first removal module 44 is configured to remove the link of the third-party web page in response to the resource being determined to be illegal.
In one embodiment, the resource can be reviewed automatically by a machine running a program. For example, certain keywords are preset as machine search targets, and whether the resource is legal is then determined from the search result. Specifically, the resource is searched for the keywords set as search targets: if any of them occurs, the resource is illegal; if none occurs, the resource can be regarded as legal. Of course, manual review is also possible, but it is very inefficient. If a content platform carries many third-party links, manual review becomes a heavy workload. In some other cases, machine review can serve as an initial screening, and the web page content it flags is then rechecked manually to avoid misjudgment, balancing efficiency and accuracy.
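The keyword-based review, with an optional manual recheck of flagged content, can be sketched as follows. The names `machine_review`, `review`, and `manual_recheck` are hypothetical, and case-insensitive substring matching is an assumption; the source only says that preset keywords are searched for in the resource.

```python
from typing import Callable, Iterable, Optional


def machine_review(text: str, banned_keywords: Iterable[str]) -> list[str]:
    """Return the preset keywords that occur in the resource text."""
    lowered = text.lower()
    return [kw for kw in banned_keywords if kw.lower() in lowered]


def review(
    text: str,
    banned_keywords: Iterable[str],
    manual_recheck: Optional[Callable[[str, list[str]], bool]] = None,
) -> bool:
    """Return True if the resource is regarded as legal."""
    hits = machine_review(text, banned_keywords)
    if not hits:
        return True   # no keyword occurs: regarded as legal
    if manual_recheck is None:
        return False  # machine-only review: any hit means illegal
    # Initial screening flagged the content; a person makes the final call.
    return manual_recheck(text, hits)
```

Only flagged content reaches `manual_recheck`, which is how the combined mode trades a small amount of manual work for fewer misjudgments.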
Clearly, this embodiment can review the content corresponding to an embedded third-party web page link, ensuring that the third-party web page content is healthy and safe, sparing the operator of the current application possible adverse effects and even legal risk, and ensuring smooth operation.
Referring to Fig. 5, Fig. 5 is a structural block diagram of a third-party web page content review device provided by another embodiment of the present invention. It includes:
a first loading module 51, configured to, at the first moment, imitate user access behavior and automatically load the content corresponding to the link of the third-party web page into the third-party browser;
a first parsing module 52, configured to parse the resource acquisition record of the third-party browser and obtain the resource path list of the third-party web page;
a first resource acquisition module 53, configured to obtain, according to the resource path list, the resource corresponding to the third-party web page at the first moment;
a first removal module 54, configured to review the resource and, if it is illegal, remove the link of the third-party web page;
a second loading module 55, configured to, at the second moment after a specified time interval, imitate user access behavior and automatically load the content corresponding to the link of the third-party web page into the third-party browser again;
a second parsing module 56, configured to parse the resource acquisition record of the third-party browser and obtain the resource path list of the third-party web page;
a second resource acquisition module 57, configured to obtain, according to the resource path list, the resource corresponding to the third-party web page at the second moment;
a second removal module 58, configured to judge whether the resource corresponding to the third-party web page at the second moment is legal and, if not, remove the link of the third-party web page.
The first moment and the second moment need further explanation here.
First, the review of third-party web page content is not limited to only two moments.
Second, the term "second moment" is used only to distinguish it from the first moment: it denotes another moment that follows an earlier moment after a predetermined time interval.
Third, as the second point shows, the second moment can also be regarded as the "first moment" of the next time interval; after the second moment, there can also be a review at a third moment.
Fourth, the time interval between the first moment and the second moment is specified in advance based on experience.
This embodiment can review the content of the third-party web page continuously and dynamically, so that the third-party web page content is kept healthy and legal over time.
Referring to Fig. 6, Fig. 6 is a structural block diagram of the second removal module in a third-party web page content review device provided by another embodiment of the present invention. It includes: a comparing unit 60, configured to determine whether the size of the resource corresponding to the third-party web page at the first moment and the size of the resource corresponding to the third-party web page at the second moment are the same;
a first response unit 61, configured to, in response to the resource corresponding to the third-party web page at the first moment and the resource corresponding to the third-party web page at the second moment having the same size, determine that the resource corresponding to the third-party web page at the second moment is legal;
a second response unit 62, configured to, in response to the resource corresponding to the third-party web page at the first moment and the resource corresponding to the third-party web page at the second moment differing in size, judge by machine review whether the resource corresponding to the third-party web page at the second moment is legal.
Preferably, judging whether the resource corresponding to the third-party web page at the second moment is legal specifically uses machine learning. In addition, in one embodiment, the time interval between the first moment and the second moment can be specified in advance.
With this embodiment, when the resource sizes match, the web page content can be judged with higher efficiency, without analyzing its substantive content.
According to a specific implementation of an embodiment of the present invention, the time interval between the first moment and the second moment is specified in advance, so that the content of the third-party web page is reviewed periodically.
Fig. 7 shows a structural schematic diagram of an electronic device 70 provided by an embodiment of the present invention. The electronic device 70 includes at least one processor 701 (for example a CPU), at least one input/output interface 704, a memory 702, and at least one communication bus 703 for connection and communication between these components. The at least one processor 701 executes the computer instructions stored in the memory 702, so that the at least one processor 701 can carry out any of the foregoing embodiments of the third-party web page content review method. The memory 702 is a non-transitory memory, which may include volatile memory, such as high-speed random access memory (RAM), and may also include non-volatile memory, for example at least one disk memory. Communication with at least one other device or unit is realized through the at least one input/output interface 704 (which may be a wired or wireless communication interface).
In some embodiments, the memory 702 stores a program 7021, and the processor 701 executes the program 7021 to perform the content of any of the foregoing embodiments of the method for improving the opening speed of third-party web pages.
The electronic device can exist in a variety of forms, including but not limited to:
(1) Mobile communication devices: devices of this kind have mobile communication functions and take voice and data communication as their main goal. Such terminals include smart phones (for example the iPhone), multimedia phones, feature phones, low-end phones, and so on.
(2) Ultra-mobile personal computer devices: devices of this kind belong to the category of personal computers, have computing and processing functions, and generally also have mobile Internet access. Such terminals include PDA, MID, and UMPC devices, for example the iPad.
(3) Portable entertainment devices: devices of this kind can display and play multimedia content. They include audio and video players (for example the iPod), handheld devices, e-book readers, smart toys, and portable in-vehicle navigation devices.
(4) Specific servers: devices that provide computing services. A server consists of a processor, hard disk, memory, system bus, and so on. A server is similar in architecture to a general-purpose computer, but because it must provide highly reliable services, it has higher requirements for processing capability, stability, reliability, security, scalability, manageability, and so on.
(5) Other electronic devices with data interaction functions.
It should be noted that, in this document, relational terms such as first and second are used only to distinguish one entity or operation from another, and do not necessarily require or imply any actual relationship or order between these entities or operations. Moreover, the terms "include", "comprise", or any other variant are intended to cover non-exclusive inclusion, so that a process, method, article, or device that includes a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to the process, method, article, or device. Without further limitation, an element qualified by the phrase "including a ..." does not exclude the presence of other identical elements in the process, method, article, or device that includes that element.
The embodiments in this specification are described in a related manner; for the same or similar parts, the embodiments may refer to one another, and each embodiment focuses on its differences from the others.
In particular, because the device embodiments are substantially similar to the method embodiments, they are described relatively briefly; for relevant details, refer to the description of the method embodiments.
The logic and/or steps represented in the flowcharts or otherwise described herein may, for example, be considered an ordered list of executable instructions for implementing logic functions, and may be embodied in any computer-readable medium for use by, or in combination with, an instruction execution system, apparatus, or device (such as a computer-based system, a system including a processor, or another system that can fetch instructions from the instruction execution system, apparatus, or device and execute them). For the purposes of this specification, a "computer-readable medium" can be any means that can contain, store, communicate, propagate, or transmit a program for use by, or in combination with, an instruction execution system, apparatus, or device. More specific examples (a non-exhaustive list) of computer-readable media include: an electrical connection (electronic device) having one or more wires, a portable computer diskette (magnetic device), random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), an optical fiber device, and portable compact disc read-only memory (CD-ROM). In addition, the computer-readable medium can even be paper or another suitable medium on which the program can be printed, because the program can be obtained electronically, for example by optically scanning the paper or other medium and then editing, interpreting, or otherwise processing it in a suitable manner if necessary, and then stored in computer memory.
It should be understood that each part of the present invention can be implemented in hardware, software, firmware, or a combination thereof.
In the above embodiments, multiple steps or methods can be implemented by software or firmware that is stored in memory and executed by a suitable instruction execution system. For example, if implemented in hardware, as in another embodiment, they can be implemented by any one or a combination of the following technologies known in the art: discrete logic circuits having logic gates for implementing logic functions on data signals, application-specific integrated circuits having suitable combinational logic gates, programmable gate arrays (PGA), field programmable gate arrays (FPGA), and so on.
The above are only specific embodiments of the present invention, but the scope of protection of the present invention is not limited thereto. Any change or substitution that a person skilled in the art could readily conceive within the technical scope disclosed by the present invention shall fall within the scope of protection of the present invention. Therefore, the scope of protection of the present invention shall be subject to the scope of protection of the claims.
Claims (12)
1. A third-party web page content review method, characterized by comprising:
at a first moment, automatically loading the content corresponding to a link of a third-party web page into a third-party browser by imitating user access behavior;
parsing the resource acquisition record of the third-party browser to obtain a resource path list of the third-party web page;
obtaining, based on the resource path list, the resource corresponding to the third-party web page at the first moment;
in response to the resource being determined to be illegal, removing the link of the third-party web page.
2. The method according to claim 1, characterized in that
whether the resource is legal is determined by a machine.
3. The method according to claim 1, characterized in that, in response to the resource being determined to be legal, the method further comprises the following steps:
at a second moment, automatically loading the content corresponding to the link of the third-party web page into the third-party browser again by imitating user access behavior;
parsing the resource acquisition record of the third-party browser to obtain the resource path list of the third-party web page;
obtaining, based on the resource path list, the resource corresponding to the third-party web page at the second moment;
judging whether the resource corresponding to the third-party web page at the second moment is legal;
in response to the resource corresponding to the third-party web page at the second moment being illegal, removing the link of the third-party web page.
4. The method according to claim 3, characterized in that judging whether the resource corresponding to the third-party web page at the second moment is legal comprises: determining whether the size of the resource corresponding to the third-party web page at the first moment and the size of the resource corresponding to the third-party web page at the second moment are the same;
in response to the resource corresponding to the third-party web page at the first moment and the resource corresponding to the third-party web page at the second moment having the same size, determining that the resource corresponding to the third-party web page at the second moment is legal.
5. The method according to claim 4, characterized in that,
in response to the resource corresponding to the third-party web page at the first moment and the resource corresponding to the third-party web page at the second moment differing in size, whether the resource corresponding to the third-party web page at the second moment is legal is judged by machine review.
6. The method according to claim 3, characterized in that judging whether the resource corresponding to the third-party web page at the second moment is legal comprises:
judging, using machine learning, whether the resource corresponding to the third-party web page at the second moment is legal.
7. The method according to any one of claims 3 to 6, characterized in that
the time interval between the first moment and the second moment is specified in advance.
8. The method according to claim 1, characterized in that
the resource path list of the third-party web page includes at least one of the following: URLs of JavaScript, URLs of style files, URLs of pictures, and URLs of external resources, the external resources including at least one of font files, audio, video, and documents in the page.
9. The method according to claim 2, characterized in that the machine determines whether the resource is legal in the following manner:
performing a search of the resource based on preset keywords; and
determining, based on the search result, whether the resource is legal.
10. A third-party web page content review device, characterized by comprising:
a first loading module, configured to, at a first moment, automatically load the content corresponding to a link of a third-party web page into a third-party browser by imitating user access behavior;
a first parsing module, configured to parse the resource acquisition record of the third-party browser to obtain a resource path list of the third-party web page;
a first resource acquisition module, configured to obtain, based on the resource path list, the resource corresponding to the third-party web page at the first moment;
a first removal module, configured to remove the link of the third-party web page in response to the resource being determined to be illegal.
11. An electronic device, characterized in that the electronic device comprises:
at least one processor; and
a memory communicatively connected to the at least one processor; wherein
the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor so that the at least one processor can carry out the third-party web page content review method according to any one of the preceding claims 1 to 9.
12. A machine-readable medium storing computer-executable instructions which, when executed by a machine, cause the machine to carry out the third-party web page content review method according to any one of claims 1 to 9.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910263886.1A CN109992737A (en) | 2019-04-03 | 2019-04-03 | Third party's web page contents checking method, device and electronic equipment |
Publications (1)
Publication Number | Publication Date |
---|---|
CN109992737A true CN109992737A (en) | 2019-07-09 |
Family
ID=67132097
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910263886.1A Pending CN109992737A (en) | 2019-04-03 | 2019-04-03 | Third party's web page contents checking method, device and electronic equipment |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109992737A (en) |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||