CN102073728A - Method, device and equipment for determining web access requests - Google Patents

Method, device and equipment for determining web access requests

Info

Publication number
CN102073728A
Authority
CN
China
Prior art keywords
info web
text information
determined text
web
access means
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN2011100067722A
Other languages
Chinese (zh)
Inventor
姚远
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Baidu Online Network Technology Beijing Co Ltd
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN2011100067722A
Publication of CN102073728A

Landscapes

  • Information Transfer Between Computers (AREA)

Abstract

The invention relates to a method and equipment for determining web access requests. The technical scheme comprises the following steps: obtaining web page information; when the web page information is detected to contain an executable object that may trigger web access, judging whether the web page information satisfies a predetermined condition; and, if the web page information satisfies the predetermined condition, sending the web access request corresponding to the executable object. Compared with the prior art, the invention can prejudge whether a web page is likely to contain such executable objects and sends no further web requests to pages without them, so web browsing efficiency is improved.

Description

A method, device and equipment for determining web page access requests
Technical field
The present invention relates to web browsing technology, and in particular to a method, device and equipment for determining web page access requests.
Background art
Many current web pages contain executable objects such as JS, Applet, Ajax and VBScript. For such objects, the browser must issue further web page requests on top of the initially obtained web page information before it can obtain the complete page content. The prior art issues a secondary request for every web page. In practice, however, many web pages contain no executable object that requires repeated requests, and requesting these pages repeatedly increases the network bandwidth burden and takes more time.
Therefore, a prejudgment technique is needed that determines whether a web page requires repeated requests, so as to save network bandwidth and improve browsing efficiency and user experience.
Summary of the invention
The purpose of the present invention is to provide a method and a device for determining web page access requests.
According to one aspect of the present invention, a computer-implemented method for determining web page access requests is provided, wherein the method comprises the following steps:
a. obtaining web page information;
b. when it is detected that the web page information contains an executable object that may trigger web page access, judging whether the web page information meets a predetermined condition;
- when the web page information meets the predetermined condition, initiating the web page access request corresponding to the executable object.
According to another aspect of the present invention, an access device for determining web page access requests is also provided, wherein the access device comprises:
a first obtaining means, for obtaining web page information;
a judging means, for judging whether the web page information meets a predetermined condition when it is detected that the web page information contains an executable object that may trigger web page access;
a first request initiating means, for initiating the web page access request corresponding to the executable object when the web page information meets the predetermined condition.
Compared with the prior art, the present invention has the following advantages: 1) based on the web page information already obtained, it can quickly judge whether the web page to which that information belongs is one that may trigger further requests and, according to the result, initiate the next request only for such web pages, thereby saving network bandwidth and improving web page access efficiency; 2) the scheme can judge whether web page information may trigger a further web page request according to multi-level predetermined conditions, which makes the judgment more accurate; 3) the scheme can help devices in a variety of applications reduce the number of web page requests and improve access efficiency. For example, it can significantly reduce the number of times a web spider requests web page information and so increase the speed at which the spider obtains web pages; or, when a user browses web pages, it can reduce the number of page requests issued by the browser and so improve the user's browsing experience.
Description of the drawings
Other features, objects and advantages of the present invention will become more apparent from the following detailed description of non-limiting embodiments, read with reference to the accompanying drawings:
Fig. 1 is a flow chart of a computer-implemented method for determining web page access requests according to one aspect of the present invention;
Fig. 2 is a flow chart of a method for determining web page access requests according to a preferred embodiment of the present invention;
Fig. 3 is a flow chart of a method for determining web page access requests according to another preferred embodiment of the present invention;
Fig. 4 is a schematic structural diagram of an access device for determining web page access requests according to another aspect of the present invention;
Fig. 5 is a schematic structural diagram of an access device for determining web page access requests according to a preferred embodiment of the present invention;
Fig. 6 is a schematic structural diagram of an access device for determining web page access requests according to another preferred embodiment of the present invention.
The same or similar reference numerals in the drawings denote the same or similar parts.
Embodiments
The present invention is described in further detail below with reference to the accompanying drawings.
Fig. 1 shows a flow chart of a computer-implemented method for determining web page access requests according to one aspect of the present invention. The method according to the present invention can be carried out by the operating system or a processing controller in a computer device; for brevity, the operating system or processing controller is referred to below as the access device. The computer device includes but is not limited to: 1) user equipment; 2) network equipment. The user equipment includes but is not limited to computers, smartphones, PDAs, etc. The network equipment includes but is not limited to a single network server, a server group composed of multiple network servers, or a cloud based on cloud computing (Cloud Computing) composed of a large number of computers or network servers, where cloud computing is a form of distributed computing: a super virtual computer made up of a group of loosely coupled computers.
In step S1, the access device obtains web page information. The ways of obtaining web page information include but are not limited to: 1) the access device issues a web page request to the corresponding web server according to address information it has obtained; 2) the access device reads web page information stored in the access device itself, or in a device that is physically separate from the access device but communicatively connected to it.
In step S2, when the access device detects that the web page information contains an executable object that may trigger web page access, it judges whether the web page information meets a predetermined condition. The executable objects that may trigger web page access include objects based on Java, JS, Ajax and/or VBScript. The detection includes but is not limited to analyzing whether the obtained web page information contains identification information of an executable object, such as code or tags corresponding to the executable object.
For example, the predetermined condition is that the web page information contains the keyword "audition" and the URL of the web page corresponding to the web page information contains any of the following character strings: "mp3", "rm", "wma" or "ape". The access device examines the obtained web page information and finds a javascript tag in it; it therefore determines that the web page information contains the executable object JS, which may trigger web page access, and goes on to judge whether the web page information meets the predetermined condition. The access device analyzes the web page information and the URL of its corresponding web page, finds the keyword "audition" in the web page information and the character string "wma" in the URL, and therefore judges that the web page information meets the predetermined condition.
As another example, the predetermined condition is that the code of the web page information contains both the character string "playlist" and the character string "object". The access device searches the obtained web page information for code corresponding to an executable object and finds a VBScript tag; it therefore determines that the web page information contains the executable object VBScript, which may trigger web page access, and goes on to judge whether the web page information meets the predetermined condition, that is, whether it contains both "playlist" and "object". The access device finds both character strings in the code portion of the web page information, so the code contained in the web page information includes "playlist" and "object" at the same time, and the access device judges that the web page information meets the predetermined condition. The access device can locate the code portion of the web page information according to identification information in the web page information.
It should be noted that the above examples serve only to better illustrate the technical scheme of the present invention and do not limit it. Those skilled in the art should understand that any implementation that judges whether web page information meets a predetermined condition upon detecting that the web page information contains an executable object that may trigger web page access falls within the scope of the present invention.
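The first example of step S2 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the regular expressions used to recognize executable objects and the function names are hypothetical, since the text only says that detection relies on code or tags identifying an executable object. The keyword "audition" and the URL strings are taken from the example above.

```python
import re

# Hypothetical tag patterns for recognizing executable objects (an assumption).
EXECUTABLE_PATTERNS = {
    "JS": re.compile(r"<script\b(?![^>]*vbscript)", re.IGNORECASE),
    "VBScript": re.compile(r"<script\b[^>]*vbscript", re.IGNORECASE),
    "Applet": re.compile(r"<applet\b", re.IGNORECASE),
}


def detect_executable_objects(page_html):
    """Return the names of executable object types whose tags occur in the page."""
    return [name for name, pattern in EXECUTABLE_PATTERNS.items()
            if pattern.search(page_html)]


def meets_audio_condition(page_html, url):
    """Example condition: page contains "audition" and the URL contains
    one of "mp3", "rm", "wma" or "ape" (plain substring checks)."""
    url_strings = ("mp3", "rm", "wma", "ape")
    return "audition" in page_html and any(s in url for s in url_strings)


def should_send_requests(page_html, url):
    """Step S2: check the condition only if an executable object was detected."""
    has_objects = bool(detect_executable_objects(page_html))
    return has_objects and meets_audio_condition(page_html, url)
```

Plain substring matching is the simplest reading of the example; a short string such as "rm" will also match unrelated URLs, so a real system would probably need a stricter test.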
In step S3, when the web page information meets the predetermined condition, the access device initiates the web page access request corresponding to the executable object.
For example, for web page information containing a JS executable object, when the access device judges that the web page information meets the predetermined condition, it initiates a JS request, according to the JS executable object, to the server of the web page corresponding to the web page information.
As another example, when the access device judges that web page information meeting the predetermined condition contains an executable object JS and an executable object Applet, the browser initiates a JS request and an Applet request respectively to the server of the web page corresponding to the web page information.
It should be noted that the above examples serve only to better illustrate the technical scheme of the present invention and do not limit it. Those skilled in the art should understand that any implementation that initiates the web page access request corresponding to the executable object when the web page information meets the predetermined condition falls within the scope of the present invention.
As one preferred scheme of the present invention, the method further comprises step S10 (not shown), step S11 (not shown), step S12 (not shown) and step S13 (not shown).
In step S10, when web page information is judged to meet the predetermined condition, the access device creates or updates a web page classification library according to the address information of the web page to which the web page information belongs.
Specifically, when the access device judges that web page information meets the predetermined condition, and it has already obtained the address information of the web page to which the web page information belongs, it adds the address information to the web page classification library or updates it there. If it has not yet obtained that address information, it first obtains the address information and then adds it to the web page classification library.
In step S11, the access device obtains new web page information and the address information of the web page to which it belongs.
Specifically, the ways of obtaining the new web page information and the address information of its web page include but are not limited to: 1) the access device obtains web page information from a preset web page information library and obtains the address information of the corresponding web page by searching on the web page information; 2) the access device obtains web page information from a preset web page information library and obtains the address information of the corresponding web page by querying an existing database that corresponds to the web page information; 3) the access device obtains the address information and then obtains the new web page information according to it.
Then, in step S12, the access device queries the web page classification library with the obtained address information to obtain a query result.
Then, in step S13, when the query result is a match, the access device initiates the corresponding web page access requests for the executable objects in the new web page information. Here, a match means that the address information of the web page to which the new web page information belongs is identical to address information in the web page classification library.
Specifically, when the address information of the web page to which the new web page information belongs is found in the web page classification library, the access device further obtains the information on each kind of executable object contained in the new web page information and initiates the corresponding web page access requests according to that information.
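Steps S10 to S13 amount to an exact-match lookup on page addresses. The following Python sketch shows one way this could look; the class name `PageClassLibrary`, the function `objects_to_request` and the use of an in-memory set are assumptions made for illustration, and the text does not prescribe any particular storage.

```python
class PageClassLibrary:
    """Hypothetical in-memory stand-in for the web page classification library."""

    def __init__(self):
        self._addresses = set()

    def record(self, address):
        """Step S10: add (or re-add) the address of a page that met the condition."""
        self._addresses.add(address)

    def matches(self, address):
        """Step S12: a match means the address is identical to a stored one."""
        return address in self._addresses


def objects_to_request(library, address, executable_objects):
    """Step S13: request every executable object of a new page only when
    the page's address is found in the library; otherwise request nothing."""
    if library.matches(address):
        return list(executable_objects)
    return []
```

With this shape, a page whose address was recorded earlier skips the predetermined-condition check entirely, which is where the saving described in the text would come from.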
With the method of this embodiment, the access device can quickly judge, for requested web page information, whether the web page to which it belongs may trigger further requests and, according to the result, initiate the next request only for such web pages, which can considerably improve web page access efficiency. For example, when a web spider or crawler fetches web page information, adopting the scheme of the present invention can significantly reduce the number of web page requests, increase the crawl speed and reduce network bandwidth consumption; or, when a user visits a web page, it can speed up page generation and improve the user experience.
Fig. 2 shows a flow chart of a method for determining web page access requests according to a preferred embodiment of the present invention. The method of this embodiment comprises step S1, step S4, step S2 and step S3.
Step S1 has been described in detail with reference to the embodiment shown in Fig. 1; that description is incorporated here by reference and is not repeated.
In step S4, the access device obtains the quantity of predetermined text information contained in the web page information.
The predetermined text information comprises at least one of the following: 1) short text information; 2) combinations of short text information.
The ways of obtaining the quantity of predetermined text information include but are not limited to: searching the web page information for the predetermined text information and accumulating the number of occurrences of all predetermined text information.
For example, the predetermined text information comprises "song", "audition", "popular program request", "mp3" and "newly singing online", and the first predetermined threshold is 10. The access device searches the web page information for this predetermined text information and finds that "song" occurs 5 times, "audition" occurs 3 times and "popular program request" occurs 3 times, so the quantity of predetermined text information obtained by the access device totals 11.
Then, in step S2, when the access device detects that the web page information contains an executable object that may trigger web page access, it judges whether the web page information meets the predetermined condition. Here the predetermined condition comprises: the quantity of predetermined text information contained in the web page information is greater than or equal to a first predetermined threshold, which those skilled in the art should determine according to actual conditions and requirements.
For example, if the first predetermined threshold is 10 and the quantity of predetermined text information obtained in step S4 is 11, the access device judges that the obtained web page information meets the predetermined condition.
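The counting in step S4 and the threshold test in step S2 can be sketched in Python as below. The function names are hypothetical, and counting non-overlapping substring occurrences with `str.count` is an assumption about how occurrences are tallied; the keywords and the threshold of 10 come from the example above.

```python
def count_predetermined_text(page_text, keywords):
    """Step S4: accumulate the occurrences of every predetermined text item."""
    return sum(page_text.count(keyword) for keyword in keywords)


def meets_quantity_condition(page_text, keywords, first_threshold):
    """Step S2: the quantity must be greater than or equal to the first threshold."""
    return count_predetermined_text(page_text, keywords) >= first_threshold


KEYWORDS = ["song", "audition", "popular program request", "mp3",
            "newly singing online"]

# A synthetic page reproducing the counts of the example: 5 + 3 + 3 occurrences.
sample_page = " ".join(["song"] * 5 + ["audition"] * 3
                       + ["popular program request"] * 3)
total = count_predetermined_text(sample_page, KEYWORDS)
```

For the synthetic page, `total` is 11, which is at or above the first predetermined threshold of 10, matching the outcome in the text.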
It should be noted that the step in which the access device obtains the quantity of predetermined text information contained in the web page information can be performed before step S2 or as part of step S2. For example, in step S2, after detecting that the web page information contains an executable object that may trigger web page access, the access device performs step S4 to obtain the quantity of predetermined text information and then, based on the predetermined condition and the quantity obtained, judges whether to initiate the corresponding web page access request for the executable object in the web page information.
It should further be noted that the above examples serve only to better illustrate the technical scheme of the present invention and do not limit it. Those skilled in the art should understand that any implementation that obtains the quantity of predetermined text information contained in the web page information falls within the scope of the present invention.
Step S3 has been described in detail with reference to the embodiment shown in Fig. 1; that description is incorporated here by reference and is not repeated.
Fig. 3 shows a flow chart of a method for determining web page access requests according to another preferred embodiment of the present invention. The method of this embodiment comprises step S1, step S5, step S6, step S2 and step S3.
Step S1 has been described in detail with reference to the embodiment shown in Fig. 1; that description is incorporated here by reference and is not repeated.
In step S5, the access device obtains the predetermined text information contained in the web page information.
Specifically, the ways of obtaining the predetermined text information include but are not limited to: the access device searches the web page information and tallies the predetermined text information it finds.
In step S6, the access device determines an overall evaluation value of the web page information based on a first predetermined rule, according to the predetermined text information obtained.
The first predetermined rule determines the overall evaluation value according to at least one of the following:
1) the total quantity of predetermined text information contained in the web page information;
Here, the total quantity of predetermined text information is the sum of the quantities of predetermined text information of all particular categories. Specifically, the access device searches the web page information, counts the number of times predetermined text information occurs, and determines the overall evaluation value of the web page information from the total number of occurrences. For example, the total can be used directly as the overall evaluation value, or it can first be processed, for instance multiplied by a coefficient or normalized, and the result used as the overall evaluation value.
2) the total number of categories of predetermined text information contained in the web page information;
Specifically, the access device determines the overall evaluation value of the web page information according to the number of categories of predetermined text information contained in the web page information.
For example, the first predetermined rule specifies that the overall evaluation value is determined from the number of categories of predetermined text information contained in the web page information, for instance by using that number directly as the overall evaluation value. The access device analyzes the obtained web page information and finds that it contains the short text information "song" and "broadcast" of the natural language category, the short text information "gequ" of the address category, and the short text information "playmusic" of the code category. According to the first predetermined rule, the access device obtains an overall evaluation value of 3 for the web page information.
3) the weight values corresponding to all predetermined text information contained in the web page information;
Specifically, the access device obtains the predetermined text information contained in the web page information, obtains the weight value corresponding to each item, and derives the overall evaluation value from these weight values. For example, the weight values of the predetermined text information contained in the web page information can be added up directly to give the overall evaluation value; as another example, the weight values can be averaged to give the overall evaluation value.
The ways of obtaining the weight value of predetermined text information include but are not limited to: a) querying a weight value, stored in advance in the access device or in another device, that corresponds to the predetermined text information; b) obtaining preset related information corresponding to the predetermined text information, for example its search frequency or its expressive power, and then processing the related information obtained, for example by summing or averaging.
4) the weight values corresponding to all categories of predetermined text information contained in the web page information.
Specifically, the access device obtains the predetermined text information in the web page information, obtains the weight value corresponding to the particular category of each item, and derives the overall evaluation value from these weight values. The weight value of a category is obtained by querying preset weight value information corresponding to each particular category.
The particular categories include but are not limited to:
1) the natural language category: predetermined text information of this category can be read by the user once the web page has been rendered, for example natural language words or combinations of natural language words contained in the web page information;
2) the address category: for example, URL address information contained in the web page information, or link information in an executable object contained in the web page information;
3) the code category: for example, code information that the browser can parse according to predetermined decoding rules.
For predetermined text information of the code category, the access device can identify it according to flag information contained in the web page information.
Predetermined text information of the address category can be determined in either of the following ways:
i) identifying address information according to identification information, and treating all address information so identified as predetermined text information of the address category;
ii) identifying the extent of the executable objects according to identification information, and treating the address information found within that extent as predetermined text information of the address category.
Text information that is recognized as neither code category nor address category is treated as predetermined text information of the natural language category.
It should be noted that the access device can also combine any several of the above four factors to obtain the overall evaluation value of the web page information. For example, the first predetermined rule specifies that overall evaluation value = ∑(Wi × Ni), where i denotes a category of predetermined text information, Wi denotes the weight value corresponding to that category, and Ni denotes the quantity of predetermined text information in that category; that is, the overall evaluation value is obtained by multiplying the quantity in each category by the corresponding category weight value and summing the products. Suppose the preset category weight value is 2 for the natural language category, 4 for the address category and 8 for the code category, and the predetermined text information obtained by the access device comprises 33 items of the natural language category, 2 items of the address category and 4 items of the code category. The access device then determines, according to the first predetermined rule, that the overall evaluation value of the web page information is 2 × 33 + 4 × 2 + 8 × 4 = 106.
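The weighted sum in this example can be checked with a few lines of Python. The function name and the dictionary keys are illustrative assumptions; the weights and quantities are those given in the text.

```python
def overall_evaluation_value(quantities, weights):
    """Sum over categories i of W_i * N_i, as in the first predetermined rule."""
    return sum(weights[category] * quantity
               for category, quantity in quantities.items())


# Category weights and quantities from the example in the text.
CATEGORY_WEIGHTS = {"natural_language": 2, "address": 4, "code": 8}
quantities = {"natural_language": 33, "address": 2, "code": 4}

value = overall_evaluation_value(quantities, CATEGORY_WEIGHTS)
```

Here `value` evaluates to 106, in agreement with the figure stated above.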
As another example, after obtaining a value for each of the above four factors, the access device processes the four values further, for example by averaging them, taking the sum of their squares, or adding them after weighting each one, to obtain the overall evaluation value.
It should be noted that the above examples serve only to better illustrate the technical scheme of the present invention and do not limit it. Those skilled in the art should understand that any implementation that determines the overall evaluation value of the web page information based on a first predetermined rule and according to the predetermined text information obtained falls within the scope of the present invention.
In step S2, when the access device detects that the web page information contains an executable object that may trigger web page access, it judges whether the web page information meets the predetermined condition. Here the predetermined condition further comprises: the overall evaluation value is greater than or equal to a second predetermined threshold, which those skilled in the art should determine according to actual conditions and requirements.
It should be noted that steps S5 and S6 can be performed before step S2 or as part of step S2. For example, in step S2, after detecting that the web page information contains an executable object that may trigger web page access, the access device performs step S5 to obtain the predetermined text information contained in the web page information and then performs step S6 to determine the overall evaluation value of the web page information based on the first predetermined rule and according to the predetermined text information obtained. It then judges, based on the predetermined condition and the overall evaluation value obtained, whether to initiate the corresponding web page access request for the executable object in the web page information.
Step S3 has been described in detail with reference to the embodiment shown in Fig. 1; that description is incorporated here by reference and is not repeated.
As one preferred scheme of the present invention, the method according to this embodiment further comprises step S7 (not shown in the figures). The predetermined condition further comprises: the overall evaluation value is less than a third predetermined threshold and the importance of the web page is greater than a fourth predetermined threshold. The third predetermined threshold is less than or equal to the second predetermined threshold, and both the third and the fourth predetermined thresholds can be set by those skilled in the art according to actual conditions and requirements.
In step S7, the access device obtains the importance of the web page to which the web page information belongs. Ways of obtaining this importance include but are not limited to: 1) obtaining a preset importance corresponding to the web page to which the web page information belongs; 2) obtaining previously acquired related information corresponding to the web page information and processing it, for example by directly using the value of one factor contained in the related information as the importance, or by summing and averaging the values of the factors contained in the related information, or by computing a weighted sum of those values after normalization. The related information comprises at least one of the following: 1) the number of times the web page to which the web page information belongs has been clicked; 2) the number of times that web page has been recommended; 3) the authority of that web page, and so on.
Step S7 may be performed after step S1 and before step S2. Alternatively, step S7 may be included in step S2 and performed after the access device judges, in step S2, that the overall evaluation value is less than the second predetermined threshold. The access device then judges, from the importance and the overall evaluation value obtained for the web page information, whether the condition is satisfied that the overall evaluation value is less than the third predetermined threshold and the importance of the web page is greater than the fourth predetermined threshold, and determines according to the result whether to perform step S3.
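The importance-based condition of step S7 can be sketched as follows. This is only an illustration under stated assumptions: the factor names, the maxima used for normalization, the weights and the threshold values are all hypothetical, and dividing by a maximum is just one possible normalization; the text leaves these choices open.

```python
def page_importance(factors, maxima, weights):
    """Weighted sum of the factors after normalizing each one to [0, 1]."""
    total = 0.0
    for name, value in factors.items():
        normalized = min(value / maxima[name], 1.0)  # assumed normalization
        total += weights[name] * normalized
    return total


def meets_importance_condition(overall_value, importance,
                               third_threshold, fourth_threshold):
    """True if overall value < third threshold and importance > fourth threshold."""
    return overall_value < third_threshold and importance > fourth_threshold


# Hypothetical related information for one web page.
factors = {"clicks": 500, "recommendations": 20, "authority": 0.8}
maxima = {"clicks": 1000, "recommendations": 100, "authority": 1.0}
weights = {"clicks": 0.5, "recommendations": 0.3, "authority": 0.2}

importance = page_importance(factors, maxima, weights)
print(meets_importance_condition(40, importance,
                                 third_threshold=50, fourth_threshold=0.4))
```

With these made-up inputs the importance is about 0.47, so a page with an overall evaluation value of 40 would still qualify for step S3.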
As one preferred scheme of the present invention, the method according to this embodiment further comprises step S8 (not shown in the figures).
In step S8, based on a second predetermined rule, the access device obtains a per-category evaluation value for the predetermined text information of each particular category contained in the web page information. The second predetermined rule determines each per-category evaluation value according to at least one of the following factors:
1) the quantity of predetermined text information of each particular category contained in the web page information;
Specifically, the access device searches the web page information and counts the quantity of predetermined text information of each particular category, and obtains the per-category evaluation value of the corresponding category from that quantity. Methods of obtaining the per-category evaluation value include but are not limited to various functions that take the quantity of predetermined text information as a parameter.
For example, the second predetermined rule specifies that the natural-language category evaluation value is the mean of the occurrence counts of the natural-language predetermined text information contained in the web page information; the address category evaluation value is the quantity of address-category predetermined text information contained in the address information of the web page to which the web page information belongs; and the code category evaluation value is the quantity of code-category predetermined text information contained in the code of that web page multiplied by an adjustment coefficient, for example 0.5. For one piece of web page information, the access device finds that, among the natural-language predetermined text information, "music" occurs 8 times, "requesting song" occurs 12 times and "hot broadcast" occurs 4 times; among the address-category predetermined text information, "song" occurs once and "listen" occurs once; and among the code-category predetermined text information, "playlist" occurs 3 times, "musicbox" occurs 4 times and "listen" occurs 2 times. According to the second predetermined rule, the natural-language category evaluation value is the mean of the occurrence counts of the natural-language predetermined text information, i.e. (8+12+4)/3=8; there are 2 items of address-category predetermined text information, so the address category evaluation value is 2; and the code category evaluation value is (3+4+2)*0.5=4.5.
2) the weight values of the predetermined text information of each particular category contained in the web page information;
Specifically, the access device obtains the predetermined text information of a particular category contained in the web page information, obtains the weight values corresponding to that predetermined text information, and obtains the per-category evaluation value of the corresponding category from those weight values. Ways of obtaining the weight value of predetermined text information of a particular category include but are not limited to: a) looking it up in a pre-stored weight table for the predetermined text information; b) obtaining previously acquired related information corresponding to the predetermined text information of that category and processing it, for example by directly using the value of one factor contained in the related information as the weight value, or by summing and averaging the values of the factors contained in the related information, or by computing a weighted sum of those values after normalization.
For example, the second predetermined rule specifies that each per-category evaluation value is obtained by adding up the weight values of the predetermined text information of that category contained in the web page information. Suppose the predetermined text information obtained by the access device in step S5 comprises the natural-language predetermined text information "music", "requesting song" and "hot broadcast"; the address-category predetermined text information "song" and "listen"; and the code-category predetermined text information "playlist", "musicbox" and "listen". By querying the preset weight table for each item of predetermined text information, the access device obtains the following weight values:
In the natural-language category, the weight value of "music" is 0.5, of "requesting song" is 1, and of "hot broadcast" is 1.2;
In the address category, the weight value of "song" is 1.1 and of "listen" is 1.6;
In the code category, the weight value of "playlist" is 2.1, of "musicbox" is 1.4, and of "listen" is 1.2;
Then, according to the second predetermined rule, the access device obtains the following per-category weight values:
Natural-language category weight value = 0.5+1+1.2 = 2.7;
Address category weight value = 1.1+1.6 = 2.7;
Code category weight value = 2.1+1.4+1.2 = 4.7.
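The two example rules above (count-based and weight-based) can be reproduced with a short sketch. The dictionary layout and function names are illustrative assumptions; only the terms, counts, weights and the 0.5 coefficient come from the examples in the text. The address value is computed here as the sum of occurrences, which coincides with the number of distinct address terms in this example.

```python
# Occurrence counts and weights taken from the worked examples.
counts = {
    "natural_language": {"music": 8, "requesting song": 12, "hot broadcast": 4},
    "address": {"song": 1, "listen": 1},
    "code": {"playlist": 3, "musicbox": 4, "listen": 2},
}
weights = {
    "natural_language": {"music": 0.5, "requesting song": 1.0, "hot broadcast": 1.2},
    "address": {"song": 1.1, "listen": 1.6},
    "code": {"playlist": 2.1, "musicbox": 1.4, "listen": 1.2},
}


def count_based_values(counts, code_coefficient=0.5):
    """Factor 1: mean count, plain count, and scaled count per category."""
    natural = counts["natural_language"].values()
    return {
        "natural_language": sum(natural) / len(natural),
        "address": sum(counts["address"].values()),
        "code": sum(counts["code"].values()) * code_coefficient,
    }


def weight_based_values(weights):
    """Factor 2: sum of the term weights in each category."""
    return {category: sum(terms.values()) for category, terms in weights.items()}


print(count_based_values(counts))
print(weight_based_values(weights))
```

Up to floating-point rounding, the first call yields 8, 2 and 4.5 and the second yields 2.7, 2.7 and 4.7, matching the figures in the examples.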
It should be noted that the access device may also combine the two factors above to obtain each per-category weight value. For example, if the second predetermined rule specifies that the per-category weight value is the sum, over the predetermined text information of the category, of each item's occurrence count multiplied by its weight value, then the access device computes this weighted sum for each category and uses it as the per-category weight value. With the predetermined text information, occurrence counts and weight values of the two examples above, the access device obtains the following per-category weight values:
Natural-language category weight value is 8*0.5+12*1+4*1.2=20.8;
Address category weight value is 1*1.1+1*1.6=2.7;
Code category weight value is 3*2.1+4*1.4+2*1.2=14.3.
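The combined rule, in which each occurrence count is multiplied by the weight of its term and the products are summed per category, can be checked with the sketch below. The data structure and function name are assumptions made for illustration; the numbers are those of the worked example.

```python
counts = {
    "natural_language": {"music": 8, "requesting song": 12, "hot broadcast": 4},
    "address": {"song": 1, "listen": 1},
    "code": {"playlist": 3, "musicbox": 4, "listen": 2},
}
weights = {
    "natural_language": {"music": 0.5, "requesting song": 1.0, "hot broadcast": 1.2},
    "address": {"song": 1.1, "listen": 1.6},
    "code": {"playlist": 2.1, "musicbox": 1.4, "listen": 1.2},
}


def weighted_count_values(counts, weights):
    """Per category: sum of (occurrence count * term weight)."""
    return {
        category: sum(n * weights[category][term] for term, n in terms.items())
        for category, terms in counts.items()
    }


values = weighted_count_values(counts, weights)
print({category: round(value, 2) for category, value in values.items()})
```

Rounded to two decimals, the result is 20.8, 2.7 and 14.3 for the natural-language, address and code categories respectively.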
In step S2, when the access device detects that the web page information contains an executable object that may trigger web access, it judges whether the web page information meets the predetermined condition. Here, the predetermined condition further comprises: the overall evaluation value is greater than a fifth predetermined threshold, and each per-category evaluation value is greater than its corresponding predetermined threshold. The fifth predetermined threshold can be set by those skilled in the art according to actual conditions and requirements, but it must be greater than or equal to the second predetermined threshold.
For example, suppose the access device is preset with a predetermined threshold of 12 for the natural-language category evaluation value, 1 for the address category evaluation value and 10 for the code category evaluation value, and with a predetermined threshold of 76 for the overall evaluation value (the fifth predetermined threshold). The overall evaluation value obtained by the access device in step S6 is 106, and the values obtained in step S8 are 20.8 for the natural-language category, 2.7 for the address category and 14.3 for the code category. The access device then judges that the overall evaluation value is greater than the fifth predetermined threshold and that each per-category evaluation value is greater than its corresponding predetermined threshold, so the web page information meets the predetermined condition.
As another example, suppose the predetermined thresholds are 12 for the natural-language category evaluation value, 5 for the address category evaluation value and 10 for the code category evaluation value, and 76 for the overall evaluation value. The overall evaluation value obtained in step S6 is 106, and the values obtained in step S8 are again 20.8, 2.7 and 14.3. The access device then judges that the address category value is less than its predetermined threshold, so the web page information does not meet the predetermined condition.
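The two threshold examples above amount to a conjunction of comparisons, which the following sketch makes explicit. The function and key names are illustrative assumptions; the thresholds and values are those quoted in the examples.

```python
def meets_condition(overall_value, category_values,
                    fifth_threshold, category_thresholds):
    """True if the overall value and every per-category value exceed their thresholds."""
    if overall_value <= fifth_threshold:
        return False
    return all(
        category_values[category] > threshold
        for category, threshold in category_thresholds.items()
    )


category_values = {"natural_language": 20.8, "address": 2.7, "code": 14.3}

# First example: address threshold 1, so every comparison succeeds.
first = meets_condition(106, category_values, 76,
                        {"natural_language": 12, "address": 1, "code": 10})
# Second example: address threshold 5, so the address category fails.
second = meets_condition(106, category_values, 76,
                         {"natural_language": 12, "address": 5, "code": 10})
print(first, second)  # True False
```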
Step S8 may be performed after step S1 and before step S2. Alternatively, step S8 may be performed as part of step S2. For example, in step S2, after the access device detects that the web page information contains an executable object that may trigger web access, it performs steps S6 and S8 to obtain the overall evaluation value and each per-category evaluation value. The access device then judges, from the overall evaluation value and the per-category evaluation values obtained for the web page information, whether the overall evaluation value is greater than the fifth predetermined threshold and each per-category evaluation value is greater than its corresponding predetermined threshold, and determines according to the result whether to perform step S3.
It should be noted that the above examples serve only to illustrate the technical scheme of the present invention and do not limit it. Those skilled in the art should understand that any implementation that obtains the per-category evaluation values from the predetermined text information of each particular category contained in the web page information, based on the second predetermined rule, falls within the scope of the present invention.
As one preferred scheme of the present invention, the method further comprises step S9 (not shown in the figures).
In step S9, the access device determines the predetermined condition by training on a plurality of web pages in advance. The pre-training may be implemented with, but is not limited to, the following classification models: 1) a support vector machine model; 2) a Bayesian model; 3) a maximum entropy model, and so on. The predetermined condition then comprises: the classification model judges the web page information to be web page information that needs to be accessed multiple times.
Specifically, the access device obtains a plurality of web pages that have been determined to require multiple access requests and a plurality of web pages that have been determined to require only one access request, and trains the classification model on these web pages to obtain a trained classification model. Then, in step S2, when the access device judges that the web page information contains an executable object, if the classification result output by the classification model for the web page information is that it needs to be accessed multiple times, the access device judges that the web page information meets the predetermined condition and performs step S3.
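As a rough illustration of the Bayesian option mentioned above, the sketch below trains a tiny multinomial naive Bayes classifier with add-one smoothing on token lists. The training pages, the tokenization into predetermined text terms, and the label names "multi" and "single" are all hypothetical; the text does not prescribe features, smoothing or any particular library.

```python
import math
from collections import Counter


def train_naive_bayes(samples):
    """samples: list of (tokens, label) pairs. Returns a simple model dict."""
    label_counts = Counter()
    token_counts = {}
    vocabulary = set()
    for tokens, label in samples:
        label_counts[label] += 1
        token_counts.setdefault(label, Counter()).update(tokens)
        vocabulary.update(tokens)
    return {"labels": label_counts, "tokens": token_counts, "vocab": vocabulary}


def classify(model, tokens):
    """Return the label with the highest log-probability (add-one smoothing)."""
    total_pages = sum(model["labels"].values())
    vocab_size = len(model["vocab"])
    best_label, best_score = None, -math.inf
    for label, page_count in model["labels"].items():
        counts = model["tokens"][label]
        denominator = sum(counts.values()) + vocab_size
        score = math.log(page_count / total_pages)
        for token in tokens:
            if token in model["vocab"]:  # ignore tokens never seen in training
                score += math.log((counts[token] + 1) / denominator)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical labeled pages: "multi" needs several requests, "single" only one.
training = [
    (["playlist", "musicbox", "listen"], "multi"),
    (["playlist", "song", "listen"], "multi"),
    (["news", "article"], "single"),
    (["article", "about", "contact"], "single"),
]
model = train_naive_bayes(training)
print(classify(model, ["playlist", "listen"]))
```

A page classified as "multi" would be treated as meeting the predetermined condition, so step S3 would follow.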
With the method according to this embodiment, the access device judges, according to predetermined conditions at several levels, whether web page information may trigger a further web request. This makes the judgment more accurate, improves web access efficiency more effectively, and avoids the waste of resources and time caused by unnecessary web access requests.
Fig. 4 shows a schematic structural diagram of an access device for determining web access requests according to one aspect of the present invention.
The first acquisition device 1 obtains web page information. Ways of obtaining web page information include but are not limited to: 1) the first acquisition device 1 sends a page fetch request to the corresponding web server according to the obtained address information; 2) the first acquisition device 1 reads web page information stored in the access device, or in a device that is physically separate from the access device but communicatively connected to it.
When the judgment device 2 detects that the web page information contains an executable object that may trigger web access, it judges whether the web page information meets a predetermined condition. Executable objects that may trigger web access include objects based on Java, JS, Ajax and/or VBScript. Detection methods include but are not limited to analyzing whether the obtained web page information contains identifying information of an executable object, such as code or tags corresponding to the executable object.
For example, the predetermined condition is that the web page information contains the keyword "audition" and the URL of the web page corresponding to the web page information contains any one of the following strings: "mp3", "rm", "wma" or "ape". The judgment device 2 examines the obtained web page information and finds a javascript tag in it, so it judges that the web page information contains an executable JS object that may trigger web access. The judgment device 2 then analyzes the web page information and the URL of the corresponding web page, finds the keyword "audition" in the web page information and the string "wma" in the URL, and therefore judges that the web page information meets the predetermined condition.
As another example, the predetermined condition is that the code of the web page information contains both the string "playlist" and the string "object". The judgment device 2 searches the obtained web page information for code corresponding to executable objects and finds a VBScript tag, so it judges that the web page information contains an executable VBScript object that may trigger web access. It then goes on to judge whether the web page information meets the predetermined condition, that is, whether it contains both the string "playlist" and the string "object". The judgment device 2 finds both strings in the code portion of the web page information, so the code contained in the web page information includes "playlist" and "object" at the same time, and the judgment device 2 judges that the web page information meets the predetermined condition. The judgment device 2 can locate the code portion of the web page information according to the identifying information in the web page information.
It should be noted that the above examples serve only to illustrate the technical scheme of the present invention and do not limit it. Those skilled in the art should understand that any implementation that judges whether web page information meets a predetermined condition upon detecting that it contains an executable object that may trigger web access falls within the scope of the present invention.
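The first example above (keyword "audition" in the page plus one of four strings in the URL) can be sketched as a pair of simple checks. The marker strings used to detect executable objects are assumptions chosen for illustration, as are the function names and the sample page; a real detector would parse the page rather than search for substrings.

```python
# Hypothetical markers suggesting an executable object is present.
EXECUTABLE_MARKERS = ("<script", "<applet", "vbscript")
# Strings from the example condition; a match anywhere in the URL counts.
URL_STRINGS = ("mp3", "rm", "wma", "ape")


def contains_executable_object(page_source):
    """Crude detection: look for known markers in the lower-cased source."""
    lowered = page_source.lower()
    return any(marker in lowered for marker in EXECUTABLE_MARKERS)


def meets_predetermined_condition(page_source, url, keyword="audition"):
    """Keyword must occur in the page and one of URL_STRINGS in the URL."""
    url_lowered = url.lower()
    return keyword in page_source and any(s in url_lowered for s in URL_STRINGS)


def should_initiate_request(page_source, url):
    """Only pages with an executable object are tested against the condition."""
    return (contains_executable_object(page_source)
            and meets_predetermined_condition(page_source, url))


page = '<html><script type="text/javascript">play()</script><p>audition</p></html>'
print(should_initiate_request(page, "http://example.com/track.wma"))
```

Because short strings such as "rm" match inside unrelated words, a practical implementation would probably restrict the match to the file extension; the sketch keeps the plain substring test described in the example.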
When the web page information meets the predetermined condition, the first request initiating device 3 initiates the web access request corresponding to the executable object.
For example, for web page information containing an executable JS object, when the judgment device 2 judges that the web page information meets the predetermined condition, the first request initiating device 3 initiates a JS request, according to the executable JS object, to the server of the web page corresponding to the web page information.
As another example, if the judgment device 2 judges that web page information meeting the predetermined condition contains both an executable JS object and an executable Applet object, the first request initiating device 3 initiates a JS request and an Applet request, respectively, to the server of the web page corresponding to the web page information.
It should be noted that the above examples serve only to illustrate the technical scheme of the present invention and do not limit it. Those skilled in the art should understand that any implementation that initiates the web access request corresponding to the executable object when the web page information meets the predetermined condition falls within the scope of the present invention.
As one preferred scheme of the present invention, the access device according to the present invention further comprises a second updating device, a fifth acquisition device, a query device and a second request initiating device (none of which is shown in the figures).
When a piece of web page information is judged to meet the predetermined condition, the second updating device creates or updates a web page category library according to the address information of the web page to which the web page information belongs.
Specifically, when the judgment device 2 judges that a piece of web page information meets the predetermined condition: if the access device has already obtained the address information of the web page to which the web page information belongs, the second updating device adds that address information to, or updates it in, the web page category library; if the address information has not yet been obtained, the access device first obtains it and then adds it to the web page category library.
The fifth acquisition device obtains new web page information and the address information of the web page to which it belongs.
Specifically, ways of obtaining the new web page information and the address information of the web page to which it belongs include but are not limited to: 1) the fifth acquisition device obtains web page information from a preset web page information library and obtains the address information of the corresponding web page by searching with the web page information; 2) the fifth acquisition device obtains web page information from a preset web page information library and looks up the address information of the corresponding web page in an existing database associated with the web page information; 3) the fifth acquisition device obtains the address information and then obtains the new web page information according to that address information, and so on.
Next, the query device queries the web page category library with the obtained address information of the web page to obtain a query result.
Then, when the query result is a match, the second request initiating device initiates the corresponding web access request for the executable object in the new web page information. Here, a match means that the address information of the web page to which the new web page information belongs is identical to an address information entry in the web page category library.
Specifically, when the address information of the web page to which the new web page information belongs is found in the web page category library, the second request initiating device further obtains information on each kind of executable object contained in the new web page information and initiates the corresponding web access requests according to that information.
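The web page category library described above behaves like a set of addresses with exact-match lookup. The sketch below models that behavior; the class name, method names, the example URLs and the textual form of the "requests" are hypothetical, and no network access is performed.

```python
class PageCategoryLibrary:
    """Addresses of pages already judged to meet the predetermined condition."""

    def __init__(self):
        self._addresses = set()

    def add(self, address):
        """Add the address, or leave the library unchanged if already present."""
        self._addresses.add(address)

    def matches(self, address):
        """Exact match only, as required by the description."""
        return address in self._addresses


def requests_for_new_page(library, address, executable_objects):
    """Return the requests to initiate for a new page, or none if no match."""
    if not library.matches(address):
        return []
    return [f"{kind} request to {address}" for kind in executable_objects]


library = PageCategoryLibrary()
library.add("http://example.com/music")  # page previously judged to qualify

print(requests_for_new_page(library, "http://example.com/music", ["JS", "Applet"]))
print(requests_for_new_page(library, "http://example.com/news", ["JS"]))
```

Using a hash set keeps the lookup cost independent of the library size, which suits the stated goal of deciding quickly whether a page may trigger further requests.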
With the access device according to this embodiment, it can be judged quickly whether the web page to which requested web page information belongs is one that may trigger further requests, and, according to the result, subsequent requests are initiated only for web pages that may trigger further requests, which considerably improves web access efficiency. For example, when a web spider or crawler fetches web page information, adopting the scheme of the present invention can significantly reduce the number of web requests, increase crawl speed and reduce network bandwidth consumption; and when a user visits a web page, it speeds up page generation and improves the user experience.
Fig. 5 shows a schematic structural diagram of an access device for determining web access requests according to a preferred embodiment of the present invention. The access device according to this embodiment comprises a first acquisition device 1, a second acquisition device 4, a judgment device 2 and a first request initiating device 3.
The first acquisition device 1 has been described in detail in the embodiment shown in Figure 4; that description is incorporated here by reference and is not repeated.
The second acquisition device 4 obtains the quantity of predetermined text information contained in the web page information.
The predetermined text information comprises at least one of the following classes: 1) short text information; 2) combinations of short text information.
Ways of obtaining the quantity of predetermined text information include but are not limited to: searching the web page information for the predetermined text information and accumulating the occurrence counts of all predetermined text information.
For example, the predetermined text information comprises "song", "audition", "popular program request", "mp3" and "newly singing online", and the first predetermined threshold is 10. The second acquisition device 4 searches the web page information for this predetermined text information and finds that "song" occurs 5 times, "audition" occurs 3 times and "popular program request" occurs 3 times, so the quantity of predetermined text information obtained by the second acquisition device 4 totals 11.
Next, when the judgment device 2 detects that the web page information contains an executable object that may trigger web access, it judges whether the web page information meets the predetermined condition. Here, the predetermined condition comprises: the quantity of predetermined text information contained in the web page information is greater than or equal to the first predetermined threshold, where the first predetermined threshold is to be set by those skilled in the art according to actual conditions and requirements.
For example, if the first predetermined threshold is 10 and the quantity of predetermined text information obtained above (as in step S4) is 11, the judgment device 2 judges that the obtained web page information meets the predetermined condition.
It should be noted that the second acquisition device 4 may obtain the quantity of predetermined text information contained in the web page information either before the judgment device 2 performs its judgment or while the judgment device 2 is performing it. For example, after the judgment device 2 detects that the web page information contains an executable object that may trigger web access, the second acquisition device 4 obtains the quantity of predetermined text information; then, based on the predetermined condition and the obtained quantity, it is judged whether to initiate the corresponding web access request for the executable object in the web page information.
It should further be noted that the above examples serve only to illustrate the technical scheme of the present invention and do not limit it. Those skilled in the art should understand that any implementation that obtains the quantity of predetermined text information contained in the web page information falls within the scope of the present invention.
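The counting example of this embodiment (5 + 3 + 3 = 11 occurrences against a first predetermined threshold of 10) can be reproduced with the sketch below. The sample page text is fabricated solely so that the counts match the example, and the function names are illustrative. Note that `str.count` counts non-overlapping substring occurrences, which is an assumption about what "occurrence" means here.

```python
PREDETERMINED_TEXT = (
    "song", "audition", "popular program request", "mp3", "newly singing online",
)


def count_predetermined_text(page_text, terms=PREDETERMINED_TEXT):
    """Total number of occurrences of all predetermined terms in the page."""
    return sum(page_text.count(term) for term in terms)


def meets_first_threshold(page_text, first_threshold=10):
    """Condition of this embodiment: quantity >= first predetermined threshold."""
    return count_predetermined_text(page_text) >= first_threshold


# Fabricated text with 5 x "song", 3 x "audition", 3 x "popular program request".
page_text = "song " * 5 + "audition " * 3 + "popular program request " * 3

print(count_predetermined_text(page_text))  # 11
print(meets_first_threshold(page_text))     # True
```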
The first request initiating device 3 has been described in detail in the embodiment shown in Figure 4; that description is incorporated here by reference and is not repeated.
Fig. 6 shows a schematic structural diagram of an access device for determining web access requests according to another preferred embodiment of the present invention. The access device according to this embodiment comprises a first acquisition device 1, a third acquisition device 5, a first determining device 6, a judgment device 2 and a first request initiating device 3.
The first acquisition device 1 has been described in detail in the embodiment shown in Figure 4; that description is incorporated here by reference and is not repeated.
The third acquisition device 5 obtains the predetermined text information contained in the web page information.
Specifically, ways of obtaining the predetermined text information include but are not limited to: the access device searches the web page information and collects the predetermined text information it finds.
Based on a first predetermined rule, the first determining device 6 determines the overall evaluation value of the web page information from the obtained predetermined text information.
The first predetermined rule determines the overall evaluation value according to at least one of the following:
1) the total quantity of predetermined text information contained in the web page information;
Here, the total quantity of predetermined text information is the sum of the quantities of predetermined text information of all particular categories. Specifically, the third acquisition device 5 searches the web page information and counts the occurrences of predetermined text information, and the first determining device 6 determines the overall evaluation value of the web page information from the total number of occurrences. For example, the total number may be used directly as the overall evaluation value, or it may first be processed, for example multiplied by a coefficient or normalized, and then used as the overall evaluation value.
2) the total number of categories of predetermined text information contained in the web page information;
Specifically, the first determining device 6 determines the overall evaluation value of the web page information from the number of categories of predetermined text information contained in the web page information.
For example, the first predetermined rule specifies that the overall evaluation value is determined from the total number of categories of predetermined text information contained in the web page information, for instance by using that number of categories directly as the overall evaluation value. The first determining device 6 analyzes the obtained web page information and finds that it contains the natural-language short text information "song" and "broadcast", the address-category short text information "gequ", and the code-category short text information "playmusic". According to the first predetermined rule, the first determining device 6 obtains an overall evaluation value of 3 for the web page information.
3) the weight values corresponding to all predetermined text information contained in the web page information;
Specifically, the first determining device 6 obtains the weight values corresponding to the predetermined text information contained in the web page information and obtains the overall evaluation value from those weight values. For example, the weight values of the predetermined text information contained in the web page information may be added up directly to obtain the overall evaluation value, or they may be averaged to obtain the overall evaluation value, and so on.
Ways of obtaining the weight value of predetermined text information include but are not limited to: a) looking up a weight value, pre-stored in the access device or another device, that corresponds to the predetermined text information; b) obtaining preset related information corresponding to the predetermined text information, for example its search frequency or its expressive power, and then processing the obtained related information, for example by summing or averaging.
4) the weight values corresponding to all categories of predetermined text information contained in the web page information.
Specifically, the first determining device 6 obtains the weight value corresponding to each particular category of the predetermined text information obtained by the third acquisition device 5 and obtains the overall evaluation value from those weight values. The weight value of a category is obtained by looking up preset weight information corresponding to each particular category.
The particular categories include, but are not limited to:
1) the natural language category: predetermined text information of this category can be read by the user once the web page has been generated, for example a natural-language word, or a combination of natural-language words, contained in the web page information;
2) the address category: for example, URL address information contained in the web page information, or link information in an executable object contained in the web page information;
3) the code category: for example, code information that the browser can parse according to predetermined decoding rules.
Predetermined text information of the code category can be identified by the access device from the marker information contained in the web page information.
Predetermined text information of the address category can be determined in either of the following ways:
i) identifying address information according to identification information, and taking all address information so identified as predetermined text information of the address category;
ii) identifying the extent of each executable object according to identification information, and taking the address information found within that extent as predetermined text information of the address category.
Text information that is not identified as belonging to the code category or the address category is treated as predetermined text information of the natural language category.
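As a purely illustrative sketch, not part of the disclosed embodiments, the fall-through order described above (code first, then address, otherwise natural language) might look as follows. The function name `categorize`, the `inside_code_markup` flag, and the URL pattern are all assumptions made for the example; the text does not specify how marker or identification information is actually detected.

```python
import re

# Hypothetical pattern: treats strings that start with a scheme or "www." as addresses.
ADDRESS_PATTERN = re.compile(r"^(?:https?://|www\.)\S+$", re.IGNORECASE)


def categorize(text, inside_code_markup):
    """Assign a piece of text to one of the three categories.

    inside_code_markup is an assumed flag saying that the text was found
    inside markup the browser parses as code (for example a script block).
    """
    if inside_code_markup:
        return "code"
    if ADDRESS_PATTERN.match(text):
        return "address"
    # Anything not recognized as code or address falls through to natural language.
    return "natural_language"


print(categorize("playlist", True))
print(categorize("http://example.com/song/listen", False))
print(categorize("music", False))
```

The order of the checks mirrors the text: only what is left over after the code and address checks is counted as natural language.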
It should be noted that the first determining device 6 may also combine any several of the above four factors to obtain the overall evaluation value of the web page information. For example, the first predetermined rule specifies that overall evaluation value = ∑ (W_i × N_i), where i denotes a category of predetermined text information, W_i denotes the weight value of category i, and N_i denotes the number of items of predetermined text information in category i. That is, the overall evaluation value is obtained by multiplying the number of items in each category by the weight of that category and summing the products. Suppose the preset category weight is 2 for the natural language category, 4 for the address category, and 8 for the code category, and the predetermined text information obtained by the third obtaining device 5 comprises 33 items of the natural language category, 2 items of the address category, and 4 items of the code category. The first determining device 6 then determines, according to the first predetermined rule, that the overall evaluation value of the web page information is 2×33 + 4×2 + 8×4 = 106.
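The weighted sum in this example can be reproduced with a short sketch. The dictionary keys and the function name `overall_evaluation_value` are illustrative names chosen here; the weights and counts are the ones given in the example.

```python
# Category weights W_i from the worked example (natural language 2, address 4, code 8).
CATEGORY_WEIGHTS = {"natural_language": 2, "address": 4, "code": 8}


def overall_evaluation_value(counts, weights=CATEGORY_WEIGHTS):
    """Return the sum of W_i * N_i over the categories present in counts."""
    return sum(weights[category] * n for category, n in counts.items())


# Item counts N_i from the worked example.
counts = {"natural_language": 33, "address": 2, "code": 4}
print(overall_evaluation_value(counts))  # 2*33 + 4*2 + 8*4 = 106
```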
As another example, the first determining device 6 first obtains a value for each of the above four factors and then processes the four values, for example by averaging them, taking the sum of their squares, or weighting and then adding them, to obtain the overall evaluation value.
It should be noted that the above examples are given only to better illustrate the technical solution of the present invention and do not limit it. Those skilled in the art should understand that any implementation that determines the overall evaluation value of the web page information from the obtained predetermined text information on the basis of the first predetermined rule falls within the scope of the present invention.
When the judging device 2 detects that the web page information contains an executable object that may trigger web page access, it judges whether the web page information meets the predetermined condition. Here the predetermined condition further includes: the overall evaluation value is greater than or equal to a second predetermined threshold. The second predetermined threshold should be set by those skilled in the art according to actual circumstances and requirements.
It should be noted that the third obtaining device 5 and the first determining device 6 may operate either before the judging device 2 operates or while the judging device 2 is performing its judgment. For example, after the judging device 2 detects that the web page information contains an executable object that may trigger web page access, the third obtaining device 5 obtains the predetermined text information contained in the web page information, and the first determining device 6 then determines the overall evaluation value of the web page information from the obtained predetermined text information on the basis of the first predetermined rule. Whether to initiate the corresponding web page access request for the executable object in the web page information is then judged on the basis of the predetermined condition and the obtained overall evaluation value.
The first request initiating device 3 has been described in detail with reference to the embodiment shown in Figure 4; that description is incorporated here by reference and is not repeated.
As one preferred embodiment of the present invention, the access device according to this embodiment further comprises a fourth obtaining device (not shown). The predetermined condition further includes: the overall evaluation value is less than a third predetermined threshold and the importance of the web page is greater than a fourth predetermined threshold. The third predetermined threshold is less than or equal to the second predetermined threshold, and the third and fourth predetermined thresholds can be set by those skilled in the art according to actual circumstances and requirements.
The fourth obtaining device obtains the importance of the web page to which the web page information belongs. Ways of obtaining this importance include, but are not limited to: 1) obtaining a preset importance corresponding to the web page to which the web page information belongs; 2) obtaining previously collected related information corresponding to the web page information and processing it, for example by using the value of a single factor in the related information directly as the importance, by adding and averaging the values of the factors in the related information, or by taking their weighted sum and normalizing it, to obtain the importance. The related information includes at least one of the following: 1) the number of times the web page to which the web page information belongs has been clicked; 2) the number of times that web page has been recommended; 3) the authority of that web page.
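One possible way to turn the three kinds of related information into a single importance score is sketched below. The caps `max_clicks` and `max_recommendations`, the factor weights, and the assumption that authority is already a score in [0, 1] are all hypothetical; the text names the factors but gives no concrete formula. This sketch scales each factor before the weighted sum, which is one of several orderings the text would allow.

```python
def importance(clicks, recommendations, authority,
               max_clicks=10_000, max_recommendations=1_000,
               weights=(0.5, 0.3, 0.2)):
    """Combine click count, recommendation count and authority into one score in [0, 1].

    Each factor is first scaled to [0, 1] (capped at the assumed maxima),
    then the factors are combined by a weighted sum whose weights add up to 1.
    """
    factors = (
        min(clicks / max_clicks, 1.0),
        min(recommendations / max_recommendations, 1.0),
        min(max(authority, 0.0), 1.0),
    )
    return sum(w * f for w, f in zip(weights, factors))


print(round(importance(5_000, 500, 1.0), 6))
```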
The fourth obtaining device may operate after the first obtaining device 1 and before the judging device 2. Alternatively, it may operate while the judging device 2 is operating, after the judging device 2 has judged that the overall evaluation value is less than the second predetermined threshold. The access device then judges, from the importance and the overall evaluation value obtained for the web page information, whether the condition that the overall evaluation value is less than the third predetermined threshold and the importance of the web page is greater than the fourth predetermined threshold is satisfied, and decides according to the result whether to perform the operation of the first request initiating device 3.
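The following sketch shows one reading of how the second, third, and fourth predetermined thresholds interact: the importance check is treated as a fallback that is consulted only when the overall evaluation value falls below the second threshold. That reading, the function name, and all numeric thresholds in the usage lines are assumptions for illustration.

```python
def meets_condition(overall, page_importance,
                    second_threshold, third_threshold, fourth_threshold):
    """Decide whether a request should be initiated for the executable object."""
    if third_threshold > second_threshold:
        raise ValueError("third threshold must not exceed second threshold")
    # Primary condition: overall evaluation value reaches the second threshold.
    if overall >= second_threshold:
        return True
    # Fallback: low overall value, but the page itself is important enough.
    return overall < third_threshold and page_importance > fourth_threshold


# Hypothetical thresholds: second = 76, third = 50, fourth = 0.5.
print(meets_condition(106, 0.1, 76, 50, 0.5))  # True, primary condition holds
print(meets_condition(40, 0.9, 76, 50, 0.5))   # True, fallback holds
print(meets_condition(40, 0.2, 76, 50, 0.5))   # False, page not important enough
```

Under this reading, a page whose overall evaluation value lies between the third and second thresholds satisfies neither branch.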
As one preferred embodiment of the present invention, the access device according to this embodiment further comprises a second determining device (not shown).
On the basis of a second predetermined rule, the second determining device obtains an individual evaluation value for each particular category from the predetermined text information of that category contained in the web page information. The second predetermined rule determines the individual evaluation values according to at least one of the following factors:
1) the number of items of predetermined text information of each particular category contained in the web page information;
Specifically, the second determining device determines the number of items of predetermined text information of each particular category in the web page information and obtains the individual evaluation value of that category from this number. The individual evaluation value of a category may be obtained by, among other methods, any of various functions that take the number of items of predetermined text information as a parameter.
For example, the second predetermined rule specifies that the individual evaluation value of the natural language category is the mean number of occurrences of the natural-language predetermined text information contained in the web page information; the individual evaluation value of the address category is the number of items of address-category predetermined text information contained in the address information of the web page to which the web page information belongs; and the individual evaluation value of the code category is the number of occurrences of code-category predetermined text information contained in the code information of that web page multiplied by an adjustment coefficient, for example 0.5. Suppose that, for a given piece of web page information, the third obtaining device 5 obtains the following: in the natural language category, "music" occurs 8 times, "requesting song" 12 times, and "hot broadcast" 4 times; in the address category, "song" occurs once and "listen" once; in the code category, "playlist" occurs 3 times, "musicbox" 4 times, and "listen" twice. According to the second predetermined rule, the second determining device obtains the natural language individual evaluation value as the mean of the occurrence counts, (8+12+4)/3 = 8; there are 2 items of address-category predetermined text information, so the address individual evaluation value is 2; and the code individual evaluation value is (3+4+2) × 0.5 = 4.5.
2) the weight values corresponding to the predetermined text information of each particular category contained in the web page information;
Specifically, the second determining device obtains the predetermined text information of each particular category contained in the web page information, obtains the weight values corresponding to that predetermined text information, and derives the individual evaluation value of the category from those weight values. The weight value of predetermined text information of a particular category may be obtained in ways that include, but are not limited to: a) querying a pre-stored weight table for predetermined text information; b) obtaining previously collected related information corresponding to the predetermined text information of the category and processing it, for example by using the value of a single factor in the related information directly as the weight value, by adding and averaging the values of the factors in the related information, or by taking their weighted sum and normalizing it, to obtain the weight value.
For example, the second predetermined rule specifies that each individual evaluation value is obtained by adding up the weight values of the predetermined text information of the corresponding category contained in the web page information. Suppose the predetermined text information obtained by the third obtaining device 5 comprises "music", "requesting song", and "hot broadcast" in the natural language category; "song" and "listen" in the address category; and "playlist", "musicbox", and "listen" in the code category. By querying the preset weight table for each item of predetermined text information, the second determining device obtains the following weight values:
In the natural language category, the weight value of "music" is 0.5, of "requesting song" 1, and of "hot broadcast" 1.2;
In the address category, the weight value of "song" is 1.1 and of "listen" 1.6;
In the code category, the weight value of "playlist" is 2.1, of "musicbox" 1.4, and of "listen" 1.2;
The second determining device then obtains, according to the second predetermined rule, the following individual weight values:
Natural language category individual weight value = 0.5 + 1 + 1.2 = 2.7;
Address category individual weight value = 1.1 + 1.6 = 2.7;
Code category individual weight value = 2.1 + 1.4 + 1.2 = 4.7.
It should be noted that the second determining device may also combine the above two factors to obtain each individual weight value. For example, if the second predetermined rule specifies that the individual weight value of a category is the sum, over the predetermined text information of that category, of each item's occurrence count multiplied by its weight value, the second determining device computes this weighted sum for each category. With the predetermined text information, occurrence counts, and weight values of the two preceding examples, the second determining device obtains the following individual weight values:
Natural language category individual weight value = 8×0.5 + 12×1 + 4×1.2 = 20.8;
Address category individual weight value = 1×1.1 + 1×1.6 = 2.7;
Code category individual weight value = 3×2.1 + 4×1.4 + 2×1.2 = 14.3.
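The three variants of the second predetermined rule worked through above (by count, by weight, and by count multiplied by weight) can be reproduced with the sketch below. The occurrence counts, weights, and the 0.5 adjustment coefficient come from the examples; the function and dictionary names are illustrative, and counting the address category by its number of distinct items is one reading of the rule.

```python
# Occurrence counts and weight values taken from the worked examples in the text.
OCCURRENCES = {
    "natural_language": {"music": 8, "requesting song": 12, "hot broadcast": 4},
    "address": {"song": 1, "listen": 1},
    "code": {"playlist": 3, "musicbox": 4, "listen": 2},
}
WEIGHTS = {
    "natural_language": {"music": 0.5, "requesting song": 1.0, "hot broadcast": 1.2},
    "address": {"song": 1.1, "listen": 1.6},
    "code": {"playlist": 2.1, "musicbox": 1.4, "listen": 1.2},
}


def by_count(occurrences, code_coefficient=0.5):
    """Factor 1: values derived from how often the text items occur."""
    natural = list(occurrences["natural_language"].values())
    return {
        "natural_language": sum(natural) / len(natural),  # mean occurrence count
        "address": len(occurrences["address"]),           # number of distinct items
        "code": sum(occurrences["code"].values()) * code_coefficient,
    }


def by_weight(weights):
    """Factor 2: sum of the weight values of the items in each category."""
    return {category: sum(terms.values()) for category, terms in weights.items()}


def by_count_and_weight(occurrences, weights):
    """Both factors: sum of occurrence count times weight value per category."""
    return {
        category: sum(n * weights[category][term] for term, n in terms.items())
        for category, terms in occurrences.items()
    }


def rounded(values):
    """Round for display, since the weights are binary floating-point numbers."""
    return {category: round(value, 6) for category, value in values.items()}


print(rounded(by_count(OCCURRENCES)))
print(rounded(by_weight(WEIGHTS)))
print(rounded(by_count_and_weight(OCCURRENCES, WEIGHTS)))
```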
When the judging device 2 detects that the web page information contains an executable object that may trigger web page access, it judges whether the web page information meets the predetermined condition. Here the predetermined condition further includes: the overall evaluation value is greater than a fifth predetermined threshold, and each individual evaluation value is greater than its corresponding predetermined threshold. The fifth predetermined threshold can be set by those skilled in the art according to actual circumstances and requirements, but it must be greater than or equal to the second predetermined threshold.
For example, suppose the access device is preset with a threshold of 12 for the natural language individual evaluation value, 1 for the address individual evaluation value, and 10 for the code individual evaluation value, and with a threshold of 76 for the overall evaluation value. The overall evaluation value obtained by the first determining device 6 is 106, and the individual weight values obtained by the second determining device are 20.8 for the natural language category, 2.7 for the address category, and 14.3 for the code category. The judging device 2 then judges that the overall evaluation value is greater than the fifth predetermined threshold and that each individual evaluation value is greater than its corresponding predetermined threshold, so the web page information meets the predetermined condition.
As another example, suppose the preset threshold is 12 for the natural language individual evaluation value, 5 for the address individual evaluation value, and 10 for the code individual evaluation value, and the threshold for the overall evaluation value is 76. The overall evaluation value obtained by the first determining device 6 is 106, and the individual weight values obtained by the second determining device are 20.8 for the natural language category, 2.7 for the address category, and 14.3 for the code category. The judging device 2 then judges that the address category individual weight value is less than its predetermined threshold, so the web page information does not meet the predetermined condition.
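The two judgments above can be checked with a short sketch. The function name `meets_multilevel_condition` and the dictionary layout are illustrative; the numeric values are the ones used in the two examples.

```python
def meets_multilevel_condition(overall, individual,
                               overall_threshold, individual_thresholds):
    """Return True only if the overall value and every individual value
    strictly exceed their respective thresholds."""
    if overall <= overall_threshold:
        return False
    return all(individual[category] > threshold
               for category, threshold in individual_thresholds.items())


individual = {"natural_language": 20.8, "address": 2.7, "code": 14.3}

first_case = meets_multilevel_condition(
    106, individual, 76, {"natural_language": 12, "address": 1, "code": 10})
second_case = meets_multilevel_condition(
    106, individual, 76, {"natural_language": 12, "address": 5, "code": 10})

print(first_case, second_case)  # True False
```

In the second case only the address category fails (2.7 is below 5), which is enough to reject the web page information.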
The second determining device may operate after the first obtaining device 1 and before the judging device 2. Alternatively, it may operate while the judging device 2 is performing its judgment. For example, after the judging device 2 detects that the web page information contains an executable object that may trigger web page access, the first determining device 6 and the second determining device operate to obtain the overall evaluation value and the individual evaluation values. The judging device 2 then judges, from the overall evaluation value and the individual evaluation values obtained for the web page information, whether the overall evaluation value is greater than the fifth predetermined threshold and each individual evaluation value is greater than its corresponding predetermined threshold, and decides according to the result whether to perform the operation of the first request initiating device 3.
It should be noted that the above examples are given only to better illustrate the technical solution of the present invention and do not limit it. Those skilled in the art should understand that any implementation that obtains the individual evaluation values from the predetermined text information of each particular category contained in the web page information on the basis of the second predetermined rule falls within the scope of the present invention.
As one preferred embodiment of the present invention, the access device further comprises a first updating device (not shown).
The first updating device determines the predetermined condition by pre-training on a plurality of web pages. The pre-training may be implemented with classification models that include, but are not limited to: 1) a support vector machine model; 2) a Bayesian model; 3) a maximum entropy model. The predetermined condition then includes: the classification model judges the obtained web page information to be web page information that requires multiple accesses.
Specifically, the access device obtains a plurality of web pages that have been determined to require multiple access requests and a plurality of web pages that have been determined to require only one access request. The first updating device then trains the classification model on these web pages to obtain a trained classification model. When the judging device 2 judges that web page information contains an executable object, and the classification result output by the model for that web page information is web page information that requires multiple accesses, the judging device 2 judges that the web page information meets the predetermined condition, and the operation of the first request initiating device 3 is performed.
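To make the training step concrete, the sketch below uses a small multinomial naive Bayes classifier with add-one smoothing, standing in for the Bayesian model named above. The choice of features (the predetermined text items found on each page), the labels `"multiple"` and `"single"`, and the toy training pages are all assumptions; the text does not specify the features or the training data.

```python
import math
from collections import Counter


def train(documents, labels):
    """Fit a multinomial naive Bayes model with add-one (Laplace) smoothing."""
    term_counts = {label: Counter() for label in set(labels)}
    for terms, label in zip(documents, labels):
        term_counts[label].update(terms)
    vocabulary = {term for counts in term_counts.values() for term in counts}
    return {"doc_counts": Counter(labels),
            "term_counts": term_counts,
            "vocabulary": vocabulary}


def classify(model, terms):
    """Return the label with the highest log posterior for the given terms."""
    total_docs = sum(model["doc_counts"].values())
    scores = {}
    for label, counts in model["term_counts"].items():
        denominator = sum(counts.values()) + len(model["vocabulary"])
        score = math.log(model["doc_counts"][label] / total_docs)
        for term in terms:
            if term in model["vocabulary"]:  # terms never seen in training are ignored
                score += math.log((counts[term] + 1) / denominator)
        scores[label] = score
    return max(scores, key=scores.get)


# Hypothetical training pages, each reduced to its predetermined text items.
pages = [
    ["playlist", "musicbox", "listen"],
    ["music", "playlist", "song"],
    ["news", "article"],
    ["contact", "about", "article"],
]
labels = ["multiple", "multiple", "single", "single"]

model = train(pages, labels)
print(classify(model, ["playlist", "listen"]))
print(classify(model, ["article", "news"]))
```

In this sketch the judging step would treat a `"multiple"` result as meeting the predetermined condition; a support vector machine or maximum entropy model could be substituted behind the same two functions.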
The access device according to this embodiment judges, on the basis of multi-level predetermined conditions, whether web page information may trigger a further web page request. This makes the judgment more accurate, improves web page access efficiency more effectively, and avoids the waste of resources and time caused by unnecessary web page access requests.
It is evident to those skilled in the art that the present invention is not limited to the details of the above exemplary embodiments, and that it can be implemented in other specific forms without departing from its spirit or essential characteristics. The embodiments should therefore be regarded in every respect as exemplary and non-restrictive. The scope of the present invention is defined by the appended claims rather than by the above description, and all changes that fall within the meaning and range of equivalency of the claims are therefore intended to be embraced by the present invention. No reference numeral in the claims should be construed as limiting the claim concerned. Furthermore, the word "comprising" does not exclude other units or steps, and the singular does not exclude the plural. A plurality of units or devices recited in a system claim may also be implemented by a single unit or device, in software or hardware. Words such as "first" and "second" are used to denote names and do not indicate any particular order.

Claims (27)

1. A computer-implemented method for determining a web page access request, wherein the method comprises the following steps:
a. obtaining web page information;
b. when it is detected that the web page information contains an executable object that may trigger web page access, judging whether the web page information meets a predetermined condition;
- when the web page information meets the predetermined condition, initiating a web page access request corresponding to the executable object.
2. The method according to claim 1, wherein the method further comprises the following step:
- obtaining the number of items of predetermined text information contained in the web page information;
wherein the predetermined condition comprises:
- the number of items of predetermined text information contained in the web page information is greater than or equal to a first predetermined threshold.
3. The method according to claim 1, wherein the method further comprises the following steps:
- obtaining the predetermined text information contained in the web page information;
- determining, on the basis of a first predetermined rule, an overall evaluation value of the web page information from the obtained predetermined text information;
wherein the predetermined condition further comprises:
- the overall evaluation value is greater than or equal to a second predetermined threshold.
4. The method according to claim 3, wherein the method further comprises the following step:
- obtaining the importance of the web page to which the web page information belongs;
wherein the predetermined condition further comprises:
- the overall evaluation value is less than a third predetermined threshold and the importance of the web page is greater than a fourth predetermined threshold, wherein the third predetermined threshold is less than or equal to the second predetermined threshold.
5. The method according to claim 3 or 4, wherein the first predetermined rule determines the overall evaluation value according to at least one of the following:
- the total number of items of predetermined text information contained in the web page information;
- the total number of categories of predetermined text information contained in the web page information;
- the weight values corresponding to all predetermined text information contained in the web page information;
- the weight values corresponding to all categories of predetermined text information contained in the web page information.
6. The method according to any one of claims 3 to 5, wherein the method further comprises the following step:
- obtaining, on the basis of a second predetermined rule, an individual evaluation value for each particular category from the predetermined text information of that category contained in the web page information;
wherein the predetermined condition further comprises:
- the overall evaluation value is greater than a fifth predetermined threshold, and each individual evaluation value is greater than its corresponding predetermined threshold.
7. The method according to claim 6, wherein the second predetermined rule determines the individual evaluation values according to at least one of the following factors:
- the number of items of predetermined text information of each particular category contained in the web page information;
- the weight values corresponding to the predetermined text information of each particular category contained in the web page information.
8. The method according to any one of claims 5 to 7, wherein the categories comprise:
- a natural language category;
- an address category;
- a code category.
9. The method according to any one of claims 2 to 8, wherein the predetermined text information comprises at least one of the following:
- short text information;
- a combination of short text information.
10. The method according to any one of claims 1 to 9, wherein the method further comprises the following step:
- determining the predetermined condition by pre-training on a plurality of web pages.
11. The method according to any one of claims 1 to 10, wherein the method further comprises the following step:
- when it is judged that web page information meets the predetermined condition, creating or updating a web page category library according to the address information of the web page to which the web page information belongs.
12. The method according to claim 11, wherein the method comprises the following steps:
- obtaining new web page information and the address information of the web page to which it belongs;
- querying the web page category library on the basis of the obtained address information of the web page, to obtain a query result;
- when the query result is a match, initiating the corresponding web page access request for the executable object in the new web page information.
13. The method according to any one of claims 1 to 12, wherein the executable object comprises an object based on Java, JS, Ajax and/or VBscript.
14. An access device for determining a web page access request, wherein the access device comprises:
a first obtaining device, configured to obtain web page information;
a judging device, configured to judge, when it is detected that the web page information contains an executable object that may trigger web page access, whether the web page information meets a predetermined condition;
a first request initiating device, configured to initiate, when the web page information meets the predetermined condition, a web page access request corresponding to the executable object.
15. The access device according to claim 14, wherein the access device further comprises:
a second obtaining device, configured to obtain the number of items of predetermined text information contained in the web page information;
wherein the predetermined condition comprises:
- the number of items of predetermined text information contained in the web page information is greater than or equal to a first predetermined threshold.
16. The access device according to claim 14, wherein the access device further comprises:
a third obtaining device, configured to obtain the predetermined text information contained in the web page information;
a first determining device, configured to determine, on the basis of a first predetermined rule, an overall evaluation value of the web page information from the obtained predetermined text information;
wherein the predetermined condition further comprises:
- the overall evaluation value is greater than or equal to a second predetermined threshold.
17. The access device according to claim 16, wherein the access device further comprises:
a fourth obtaining device, configured to obtain the importance of the web page to which the web page information belongs;
wherein the predetermined condition further comprises:
- the overall evaluation value is less than a third predetermined threshold and the importance of the web page is greater than a fourth predetermined threshold, wherein the third predetermined threshold is less than or equal to the second predetermined threshold.
18. The access device according to claim 16 or 17, wherein the first predetermined rule determines the overall evaluation value according to at least one of the following:
- the total number of items of predetermined text information contained in the web page information;
- the total number of categories of predetermined text information contained in the web page information;
- the weight values corresponding to all predetermined text information contained in the web page information;
- the weight values corresponding to all categories of predetermined text information contained in the web page information.
19. The access device according to any one of claims 16 to 18, wherein the access device further comprises:
a second determining device, configured to obtain, on the basis of a second predetermined rule, an individual evaluation value for each particular category from the predetermined text information of that category contained in the web page information;
wherein the predetermined condition further comprises:
- the overall evaluation value is greater than a fifth predetermined threshold, and each individual evaluation value is greater than its corresponding predetermined threshold.
20. The access device according to claim 19, wherein the second predetermined rule determines the individual evaluation values according to at least one of the following factors:
- the number of items of predetermined text information of each particular category contained in the web page information;
- the weight values corresponding to the predetermined text information of each particular category contained in the web page information.
21. The access device according to any one of claims 18 to 20, wherein the categories comprise:
- a natural language category;
- an address category;
- a code category.
22. The access device according to any one of claims 15 to 21, wherein the predetermined text information comprises at least one of the following:
- short text information;
- a combination of short text information.
23. The access device according to any one of claims 14 to 21, wherein the access device further comprises:
a first updating device, configured to determine the predetermined condition by pre-training on a plurality of web pages.
24. The access device according to any one of claims 14 to 23, wherein the access device further comprises:
a second updating device, configured to create or update, when it is judged that web page information meets the predetermined condition, a web page category library according to the address information of the web page to which the web page information belongs.
25. The access device according to claim 24, wherein the access device comprises:
a fifth obtaining device, configured to obtain new web page information and the address information of the web page to which it belongs;
a querying device, configured to query the web page category library on the basis of the obtained address information of the web page, to obtain a query result;
a second request initiating device, configured to initiate, when the query result is a match, the corresponding web page access request for the executable object in the new web page information.
26. The access device according to any one of claims 14 to 25, wherein the executable object comprises an object based on Java, JS, Ajax and/or VBscript.
27. A computer device, wherein the computer device comprises the access device according to any one of claims 14 to 16.
CN2011100067722A 2011-01-13 2011-01-13 Method, device and equipment for determining web access requests Pending CN102073728A (en)

Publications (1)

Publication Number Publication Date
CN102073728A true CN102073728A (en) 2011-05-25

Family

ID=44032267

Family Applications (1)

Application Number Title Priority Date Filing Date
CN2011100067722A Pending CN102073728A (en) 2011-01-13 2011-01-13 Method, device and equipment for determining web access requests

Country Status (1)

Country Link
CN (1) CN102073728A (en)

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102831148A (en) * 2012-06-19 2012-12-19 北京奇虎科技有限公司 Method and device for loading recommended data based on browser
WO2013159246A1 (en) * 2012-04-28 2013-10-31 Hewlett-Packard Development Company, L.P. Detecting valuable sections in webpage
CN104023409A (en) * 2013-02-28 2014-09-03 腾讯科技(深圳)有限公司 Network connection method and system
US9137394B2 (en) 2011-04-13 2015-09-15 Hewlett-Packard Development Company, L.P. Systems and methods for obtaining a resource
US9152357B2 (en) 2011-02-23 2015-10-06 Hewlett-Packard Development Company, L.P. Method and system for providing print content to a client
US9182932B2 (en) 2007-11-05 2015-11-10 Hewlett-Packard Development Company, L.P. Systems and methods for printing content associated with a website
US9489161B2 (en) 2011-10-25 2016-11-08 Hewlett-Packard Development Company, L.P. Automatic selection of web page objects for printing
US9773214B2 (en) 2012-08-06 2017-09-26 Hewlett-Packard Development Company, L.P. Content feed printing
US10082992B2 (en) 2014-12-22 2018-09-25 Hewlett-Packard Development Company, L.P. Providing a print-ready document

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20040003248A1 (en) * 2002-06-26 2004-01-01 Microsoft Corporation Protection of web pages using digital signatures
CN1747394A (en) * 2004-09-09 2006-03-15 英业达股份有限公司 Studying monitoring system and method by instant communication tool
CN101231661A (en) * 2008-02-19 2008-07-30 上海估家网络科技有限公司 Method and system for digging object grade knowledge
CN101515300A (en) * 2009-04-02 2009-08-26 阿里巴巴集团控股有限公司 Method and system for grabbing Ajax webpage content
CN101697156A (en) * 2009-10-29 2010-04-21 孟智平 Method and system for constructing chain web pages

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9182932B2 (en) 2007-11-05 2015-11-10 Hewlett-Packard Development Company, L.P. Systems and methods for printing content associated with a website
US9152357B2 (en) 2011-02-23 2015-10-06 Hewlett-Packard Development Company, L.P. Method and system for providing print content to a client
US9137394B2 (en) 2011-04-13 2015-09-15 Hewlett-Packard Development Company, L.P. Systems and methods for obtaining a resource
US9489161B2 (en) 2011-10-25 2016-11-08 Hewlett-Packard Development Company, L.P. Automatic selection of web page objects for printing
WO2013159246A1 (en) * 2012-04-28 2013-10-31 Hewlett-Packard Development Company, L.P. Detecting valuable sections in webpage
CN102831148A (en) * 2012-06-19 2012-12-19 北京奇虎科技有限公司 Method and device for loading recommended data based on browser
US9773214B2 (en) 2012-08-06 2017-09-26 Hewlett-Packard Development Company, L.P. Content feed printing
CN104023409A (en) * 2013-02-28 2014-09-03 腾讯科技(深圳)有限公司 Network connection method and system
CN104023409B (en) * 2013-02-28 2018-03-27 腾讯科技(深圳)有限公司 Network connection method and system
US10082992B2 (en) 2014-12-22 2018-09-25 Hewlett-Packard Development Company, L.P. Providing a print-ready document

Similar Documents

Publication Publication Date Title
CN102073728A (en) Method, device and equipment for determining web access requests
CN106415537B (en) Inserting native application search results into web search results
US20180113933A1 (en) Systems and methods for measuring the semantic relevance of keywords
CN102298614B (en) Method for determining collection category of page collection information and device and equipment
WO2017071251A1 (en) Information pushing method and device
CN105069099B (en) Information recommendation method and system
CN105718184A (en) Data processing method and apparatus
CN112136127B (en) Action indicator for search operation output element
CN105210051A (en) Estimating visibility of content items
WO2014194689A1 (en) Method, server, browser, and system for recommending text information
KR20170010004A (en) Automated click type selection for content performance optimization
CN104536980A (en) To-be-commented item quality information determination method and device
CN101957834A (en) Content recommending method and device based on user characteristics
CN102446191A (en) Method for generating webpage content abstracts and equipment and system adopting same
US20210089606A1 (en) Resource locator remarketing
US9898748B1 (en) Determining popular and trending content characteristics
CN112182351B (en) News recommendation method and device based on multi-feature fusion
CN102035883A (en) Method and device for optimizing webpage in network equipment
CN103699669A (en) Method for message pushing in browser and browser terminal
US11249993B2 (en) Answer facts from structured content
CN103279516A (en) Web spider identification method
CN103699603A (en) Information recommendation method and system based on user behaviors
US20200356569A1 (en) Triggering local extensions based on inferred intent
US8429535B2 (en) Client utility interaction analysis
CN112925900B (en) Search information processing method, device, equipment and storage medium

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C12 Rejection of a patent application after its publication
RJ01 Rejection of invention patent application after publication

Application publication date: 20110525