Detailed Description of the Embodiments
Exemplary embodiments of the present invention are described in more detail below with reference to the accompanying drawings. Although the drawings show exemplary embodiments of the invention, it should be understood that the invention may be implemented in various forms and should not be limited by the embodiments set forth here. Rather, these embodiments are provided so that the invention can be understood more thoroughly and so that its scope can be fully conveyed to those skilled in the art.
Fig. 2 is a flow chart of a resource offline download method according to one embodiment of the invention. The method comprises: Step 21: determining, according to the network attribute of the resource to be downloaded offline, the network operator to which the resource belongs; Step 22: selecting, according to a preset task scheduling policy, an offline download server from the offline download server cluster of that network operator, where the offline download server is used to download resources offline; Step 23: assigning the offline download task for the resource to the selected offline download server, which performs the offline download.
Thus, when a resource is downloaded offline according to the technical scheme of this embodiment, the network operator to which the resource belongs is first determined from the network attribute of the resource. Once the network operator has been determined, an offline download server is selected from that operator's offline download server cluster according to the preset task scheduling policy. The offline download task for the resource can then be assigned to the selected offline download server, which downloads the resource offline. Because the network operator determined in this scheme is exactly the one to which the resource belongs, downloading across network operators is avoided. Downloading the resource from within the network operator to which it belongs can also significantly improve offline download speed and reduce the pressure on the offline download servers.
According to one embodiment of the invention, determining the network operator to which the resource belongs according to the network attribute of the resource to be downloaded offline may further comprise:
obtaining the domain name information corresponding to the uniform resource locator (URL) of the resource, and resolving the IP address corresponding to that domain name information;
querying a database with the IP address corresponding to the domain name information to obtain the network operator corresponding to that IP address, and determining it to be the network operator to which the resource belongs, where the database stores network operators and their IP addresses.
In this embodiment, the domain name information is obtained from the URL of the resource, and the IP address corresponding to that domain name information can then be resolved. Because the database stores network operators and their IP addresses, the network operator corresponding to this IP address can be found in the database, which gives the network operator of the resource. Of course, the network operator may also be determined in any other manner currently known or later known in the art.
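The lookup chain described above (URL to domain name, domain name to IP address, IP address to operator) can be sketched as follows. This is a minimal illustration, not the patented implementation: the table `OPERATOR_DB`, its address blocks (reserved documentation ranges, not real operator allocations), and the function names are all hypothetical, and the DNS step is passed in as `resolve` so that it can be replaced.

```python
import ipaddress
import socket
from urllib.parse import urlparse

# Hypothetical operator database: CIDR block -> operator name.
# These are reserved documentation ranges, not real allocations.
OPERATOR_DB = {
    "192.0.2.0/24": "Telecom",
    "198.51.100.0/24": "Netcom",
}


def domain_of(url):
    """Return the host part of a URL, tolerating a missing scheme."""
    if "://" not in url:
        url = "http://" + url
    return urlparse(url).hostname


def operator_for_ip(ip, db=OPERATOR_DB):
    """Return the operator whose address block contains ip, or None."""
    addr = ipaddress.ip_address(ip)
    for cidr, operator in db.items():
        if addr in ipaddress.ip_network(cidr):
            return operator
    return None


def operator_for_url(url, resolve=socket.gethostbyname, db=OPERATOR_DB):
    """URL -> domain name -> IP address -> operator."""
    return operator_for_ip(resolve(domain_of(url)), db)
```

With the default `resolve`, the function performs a real DNS query; passing a stub such as `lambda host: "192.0.2.10"` keeps the example offline.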
The task scheduling policy determines how offline download tasks are distributed, that is, which offline download server each task is assigned to. According to one embodiment of the invention, the task scheduling policy is to assign the offline download task for the resource to the offline download server whose current load weight is the smallest.
The load weight may be calculated with the following formula:
Load weight = k1 * CPU usage + k2 * disk remaining + k3 * memory remaining + k4 * bandwidth resources, where k1 is the weight for CPU usage, k2 is the weight for disk remaining, k3 is the weight for memory remaining, and k4 is the weight for bandwidth resources.
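As a rough sketch of this policy, the code below computes the load weight exactly as the formula states and picks the server with the smallest value. The `ServerLoad` record, the default coefficients of 1, and the units of the four measurements are assumptions made for illustration; the text does not specify them.

```python
from dataclasses import dataclass


@dataclass
class ServerLoad:
    """Hypothetical snapshot of one offline download server."""
    name: str
    cpu_usage: float
    disk_remaining: float
    memory_remaining: float
    bandwidth: float


def load_weight(s, k1=1.0, k2=1.0, k3=1.0, k4=1.0):
    """Weighted sum of the four measurements, as in the formula above."""
    return (k1 * s.cpu_usage + k2 * s.disk_remaining
            + k3 * s.memory_remaining + k4 * s.bandwidth)


def pick_server(servers, **coefficients):
    """Return the server whose current load weight is the smallest."""
    return min(servers, key=lambda s: load_weight(s, **coefficients))
```

For instance, `pick_server(servers, k2=5.0)` applies a larger coefficient to the disk term while leaving the others at their assumed defaults.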
According to one embodiment of the invention, before the network operator to which the resource belongs is determined according to the network attribute of the resource to be downloaded offline, the method further comprises:
obtaining the deduplication feature of the resource to be downloaded offline, where the deduplication feature is an identifier of the resource generated from the resource's URL, size, and content fragments;
judging whether the deduplication feature of the resource to be downloaded offline is identical to the deduplication feature of an already downloaded resource stored in a global deduplication table, and whether the time interval between the resource to be downloaded offline and the already downloaded resource is less than a set time value, where the global deduplication table stores the deduplication features of resources that have been downloaded offline;
if the deduplication features are identical and the time interval is less than the set time value, not downloading the resource to be downloaded offline; otherwise, creating an offline download task for the resource to be downloaded offline.
In brief, this embodiment deduplicates the resources to be downloaded offline and thereby avoids repeated downloads.
Deduplication involves two aspects: global deduplication and local deduplication.
Global deduplication: visible to all users. It avoids downloading resources that other users have already downloaded (that is, it determines whether the requested resource has already been downloaded, so that repeated downloads do not increase server pressure). Specifically, once a resource has been downloaded by someone, its deduplication feature is recorded in a globally visible table, the global deduplication table. When anyone else later downloads a resource, the deduplication feature of that resource is used to query this table, and if the resource is found to exist, it need not be downloaded again.
The deduplication feature may be regarded as the identifier of the resource: resources with identical deduplication features can be considered to have identical content. This makes it possible to check whether an identical resource already exists locally without obtaining the full content of the resource. For example, suppose a large file A is stored locally and a file resource is then to be downloaded from the network without knowing whether it is file A. If the file had to be downloaded in full only to discover that it is identical to local file A, considerable resources would have been wasted. By comparing deduplication features, it can be determined whether the same file already exists locally without downloading the entire content of the file resource, which prevents repeated downloads.
Specifically, the global deduplication table is a key-value structure. The key is the deduplication feature, which may comprise the resource address (URL), the resource size, and a resource characteristic (such as resource content fragments). The value is fixed at 1 and indicates that the key exists in the global deduplication table, that is, that the resource has been downloaded and exists. When a user submits an offline download task (which may take the form of an offline download request), the resource address, resource size, and resource characteristic of the corresponding resource can be concatenated into a character string, and this string is used as the key to search the global deduplication table for a matching resource. If one exists, the task is deduplicated: the requested resource is not actually downloaded, the user is told directly that the download succeeded, and the user's need is met with the identical resource downloaded offline earlier. If none exists, global deduplication fails (in other words, it is not needed): the requested resource is actually downloaded, and when the download completes, the global deduplication table is updated by adding the deduplication feature of this resource, so that later offline downloads of resources matching this feature need no actual download.
Because the deduplication feature is added to the global deduplication table only after the resource has actually finished downloading, one user's download task for a resource is not affected by the failure of another user's download task for the same resource. Consider the alternative: if the deduplication feature were put into the global deduplication table as soon as a first user started downloading a resource, a second user's request for the same resource would be deduplicated because the table already lists it, and no actual download would be made for the second user. If the first user's download then failed, the second user's download would inevitably fail too.
For example, when a user requests an offline download task for resource A, the URL, size, and content fragments of resource A are obtained from the task, and a deduplication feature key (a character string) is generated from them. The global deduplication table is then queried for a resource matching this key, for instance a key' identical to the key. If key' is found in the global deduplication table, resource A has already been downloaded and need not be downloaded again; if key' is not found, resource A has not been downloaded and must be downloaded for the user.
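The key-value behaviour described above can be sketched as follows: the key is the concatenated string, the value is fixed at 1, and the key is recorded only after a download has actually completed. The class and function names are hypothetical, and the `download` callback stands in for the real offline download.

```python
class GlobalDedupTable:
    """Key-value table; the value is fixed at 1 for every stored key."""

    def __init__(self):
        self._table = {}

    @staticmethod
    def make_key(url, size, fragment):
        """Concatenate URL, size and content fragment into one string."""
        return f"{url}{size}{fragment}"

    def contains(self, key):
        return self._table.get(key) == 1

    def mark_downloaded(self, key):
        self._table[key] = 1


def submit(table, url, size, fragment, download):
    """Deduplicate a task; record the key only after a successful download."""
    key = table.make_key(url, size, fragment)
    if table.contains(key):
        return "deduplicated"   # no actual download is performed
    if download(url):           # hypothetical callback returning True on success
        table.mark_downloaded(key)
        return "downloaded"
    return "failed"             # key is not stored, so later tasks still download
```

Because a failed download leaves the table untouched, a later request for the same resource triggers a fresh download instead of inheriting the failure.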
Local deduplication: visible only to the individual user. It prevents repeated downloads caused by one user submitting the same task several times. Local deduplication is needed in addition to global deduplication because the corresponding information (the deduplication feature) is stored in the global deduplication table only after the download task for the resource has completed successfully; that is, the feature is added only once the resource has been downloaded in full. Therefore, before a file has finished downloading, if the same user repeatedly submits the same URL to download the same resource (that is, submits several identical download tasks), then without local deduplication the resource would be downloaded repeatedly, increasing server pressure and reducing download efficiency. Local deduplication does not use the deduplication feature of the resource; it uses the URL directly: if the same user submits the same URL again, the request is deduplicated.
In one approach, local deduplication may be performed before global deduplication. Specifically, each user may be assigned a unique user ID in advance, and a user's download task is identified by the user ID together with the resource address (URL). Whether the user already has an identical download task is looked up by user ID and resource address. That is, the user task list can be queried on the server that receives the user's offline download tasks (for example, the task server), using the user ID and the URL of the resource as the keyword. If an identical task is found for this user, the user already has a download task for this resource, and a task-exists message is returned to the user; otherwise (the task does not exist), global deduplication is performed next.
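A minimal sketch of this local check, keyed on the pair of user ID and URL, is shown below. The `UserTaskList` class and the returned strings are hypothetical; a real task server would presumably keep this list in persistent storage.

```python
class UserTaskList:
    """Hypothetical per-user task list keyed by (user ID, URL)."""

    def __init__(self):
        self._tasks = set()

    def has_task(self, user_id, url):
        return (user_id, url) in self._tasks

    def add_task(self, user_id, url):
        self._tasks.add((user_id, url))


def local_dedup(task_list, user_id, url):
    """Return 'task exists' for a repeat, else move on to the global check."""
    if task_list.has_task(user_id, url):
        return "task exists"
    return "continue to global deduplication"
```

Two different users submitting the same URL are not affected by each other here, since the user ID is part of the key.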
For example, suppose a first user and a second user download the resource www.t.com/test.doc offline at the same time, so that each has a download task for this resource. Before the download tasks complete (that is, before the resource has been downloaded to the offline server), the global deduplication table does not contain the deduplication feature of this resource, so a failure of the second user's download task cannot affect the first user's download of the resource. Without local deduplication, if the first user repeatedly submits a download task for this resource, the user task list will contain many identical tasks from this user requesting the same resource. With local deduplication, the same user is prevented from downloading the same resource repeatedly, which avoids increasing server pressure.
In another approach, local deduplication may instead be performed after global deduplication.
According to one embodiment of the invention, generating the deduplication feature may comprise: extracting the first 100k of content of the resource, 100k of content at a random position in the middle, and the last 100k of content as the content fragments of the resource; concatenating the URL of the resource, the resource size, and the content fragments into a character string; and computing the MD5 of the character string to obtain the deduplication feature. For example, for the URL www.t.com/test.doc, the resource size corresponding to this URL (5000) can be obtained, together with three resource fragments: the head aaa... (100k bytes of data), the middle bbb... (100k bytes of data), and the tail ccc... (100k bytes of data). The deduplication feature is then the MD5 of "www.t.com/test.doc5000aaa...bbb...ccc...".
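The feature generation can be sketched with Python's standard `hashlib`. Several details here are assumptions rather than statements from the text: "100k" is taken to mean 102400 bytes, the URL is encoded as UTF-8, and the middle offset is passed in as a parameter because the text does not say how the random position is chosen or shared between computations.

```python
import hashlib

FRAGMENT = 100 * 1024  # assumed meaning of "100k": 102400 bytes


def content_fragments(data, middle_offset):
    """Head, middle and tail fragments of the resource, concatenated."""
    head = data[:FRAGMENT]
    middle = data[middle_offset:middle_offset + FRAGMENT]
    tail = data[-FRAGMENT:]
    return head + middle + tail


def dedup_feature(url, size, fragments):
    """MD5 hex digest of URL + size + content fragments."""
    raw = url.encode("utf-8") + str(size).encode("ascii") + fragments
    return hashlib.md5(raw).hexdigest()
```

Two resources then compare equal under this scheme only if their URL, size, and sampled fragments all match.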
Fig. 3 is a flow chart of a resource offline download method according to one embodiment of the invention.
This embodiment may further comprise deduplicating the download request, where deduplication comprises global deduplication and local deduplication. The resource requested for download may be a network resource, in other words content that can be downloaded over the network, such as games, software, music, or text.
In this embodiment, the resource to be downloaded is first parsed and verified according to the user's download request, so that the network operator to which it belongs can be determined. Deduplication is also applied before that final determination, which avoids repeated downloads of the same resource, reduces server pressure, and improves server responsiveness.
Step S001: receive the offline download request sent by the user.
Step S002: perform URL parsing and verification on the resource to be downloaded. The domain name information corresponding to the uniform resource locator (URL) of the resource is obtained, the IP address corresponding to that domain name information is resolved through the Domain Name System (DNS), and a verification request is sent (for example, to check whether the IP exists and whether it is correct). If the verification in step S002 fails, a verification failure message is sent to notify the user, as in step S004.
If the verification in step S002 passes, the resource file name, the resource size, and the target domain name are returned, and the flow proceeds to the next step, step S003.
In step S003, the user may be verified, for example by verifying the user's identity. If user verification fails, a verification failure message is sent to notify the user, as in step S004. If user verification passes, global deduplication and local deduplication may be applied to the URL (in one embodiment, global deduplication first and then local deduplication), and the flow proceeds to step S005.
Step S005: judge whether global deduplication applies to the URL. If the judgment is "Yes", the network resource at that URL has already been downloaded by another user and exists, so the download request is cancelled, as in step S011. If global deduplication does not apply, that is, the judgment is "No", the flow proceeds to step S006.
Step S006: judge whether local deduplication applies. If it does (the user has submitted a repeated request), that is, the judgment is "Yes", the flow proceeds to step S011: the download request is cancelled and the user is notified that the task is already in progress. If the URL needs neither local deduplication nor global deduplication, that is, the judgment in step S006 is also "No", the task server creates a task for this download request, as in step S007. The initial state of the created task is "in task queue".
The task created in step S007 is sent to the corresponding offline download server cluster, which requires finding the network operator to which the resource belongs. Step S008 determines that offline download server cluster (that is, the network operator to which the resource belongs). Each network operator and its corresponding IP addresses may be stored in advance in a database (such as an IP library). In step S002 above, the URL of the resource in the download request was parsed to obtain the corresponding domain name information, and the IP address corresponding to that domain name was then obtained by DNS resolution. In step S008, the information stored in the database (IP addresses and their corresponding operators) can be queried with this IP address to obtain the corresponding network operator, which is determined to be the network operator to which the resource belongs (for example Netcom, Telecom, the education network, or Mobile). The offline download server cluster number corresponding to this network operator is then obtained, and an offline download server in that cluster is chosen to execute the download task.
Specifically, a table mapping offline download server clusters to network operators may be set up in advance. For example, Telecom corresponds to offline download server clusters No. 3 and No. 5, and Netcom corresponds to offline download server clusters No. 2 and No. 4. When the network attribute of the resource is Telecom, an offline download server may be selected at random from the clusters corresponding to Telecom to execute the offline download task, or an offline download server may be selected according to some rule (for example, minimum load).
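The operator-to-cluster table can be sketched as a plain dictionary. The entries below reproduce the Telecom and Netcom example from the text; the function names and the choice to pick a cluster number (rather than an individual server) at random are illustrative assumptions.

```python
import random

# Operator -> offline download server cluster numbers, per the example above.
OPERATOR_CLUSTERS = {
    "Telecom": [3, 5],
    "Netcom": [2, 4],
}


def clusters_for(operator, table=OPERATOR_CLUSTERS):
    """Cluster numbers registered for an operator (empty if unknown)."""
    return table.get(operator, [])


def pick_cluster(operator, rng=random, table=OPERATOR_CLUSTERS):
    """Pick one of the operator's clusters at random."""
    candidates = clusters_for(operator, table)
    if not candidates:
        raise LookupError(f"no cluster registered for operator {operator!r}")
    return rng.choice(candidates)
```

A rule-based choice such as minimum load would replace `rng.choice` with a comparison over the candidates' current load.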
Before the network operator to which the resource belongs is determined according to the network attribute of the resource to be downloaded offline, the deduplication features of resources already downloaded offline may also be obtained and stored in the global deduplication table.
Thus, in step S005, the deduplication feature of the resource that the user requests to download offline is obtained. When judging whether global deduplication applies, it is judged whether this feature is identical to the deduplication feature of an already downloaded resource stored in the global deduplication table. At the same time, it may also be judged whether the time interval between the resource to be downloaded offline and the already downloaded resource is less than a set time value (the time value is described below in connection with the validity period of the deduplication feature). If the judgment is "Yes", the request to download the resource is cancelled (step S011). Otherwise, an offline download task is created for the resource (step S007).
This global resource deduplication (referred to as the global deduplication policy) avoids the server pressure caused by repeated downloads, and relies mainly on the deduplication feature described above. The deduplication feature can be generated from the URL, size, and content fragments of the resource. For example, the first 100k of content of the resource, 100k of content at a random position in the middle, and the last 100k of content can be extracted as fragments of the resource content, concatenated with the resource URL and the resource size into a character string, and an MD5 feature value can then be generated from this string. As an example, suppose the URL of the resource the user needs to download is www.t.com/test.doc. The resource size corresponding to this URL can be obtained, together with three resource fragments from the head, middle, and tail. With a resource size of 5000, a head fragment aaa... (100k bytes of data), a middle fragment bbb... (100k bytes of data), and a tail fragment ccc... (100k bytes of data), the deduplication feature is md5("www.t.com/test.doc5000aaa...bbb...ccc..."). Further, the resource size corresponding to the URL can be obtained with an HTTP HEAD request, and partial content of the resource can be obtained with HTTP range requests. When the MD5 feature value already exists in the global deduplication table, global deduplication is applied and the resource is not downloaded again.
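The two HTTP requests mentioned above can be built with Python's standard `urllib.request`. This sketch only constructs the request objects and sends nothing; the helper names are hypothetical. In standard HTTP, the size is reported in the `Content-Length` response header and a byte range is requested with a `Range: bytes=start-end` header, which works only if the server supports range requests.

```python
from urllib.request import Request


def size_request(url):
    """HEAD request; the response's Content-Length header carries the size."""
    return Request(url, method="HEAD")


def fragment_request(url, start, length):
    """GET request for `length` bytes starting at byte offset `start`."""
    end = start + length - 1  # HTTP byte ranges are inclusive
    return Request(url, headers={"Range": f"bytes={start}-{end}"})
```

Either object can be passed to `urllib.request.urlopen` to perform the request; a server that honours the range replies with status 206 and only the requested bytes.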
When deduplicating resources, deduplication may be applied only to resources whose type is in a type whitelist. The resource type may be the file type of the requested resource, which may be judged from the file extension; for example, for the picture type the extension may be .jpg, .gif, and so on. The resource types in the whitelist may be types of resources that are seldom modified, such as pictures, videos, and software programs.
In one embodiment, the deduplication feature may have a validity period; for example, it may be set to one week (this is only an illustration and does not limit the invention). After the feature expires, a deduplicated resource must be downloaded again. The validity period can be implemented through the global deduplication table. Specifically, once the deduplication feature of a resource that has been downloaded offline is obtained, it is stored in the global deduplication table and the table is updated; when the feature expires, it can be released from the global deduplication table. In addition, when step S005 judges whether global deduplication applies, besides comparing deduplication features it may also judge whether the time interval between the resource to be downloaded offline and the already downloaded resource is less than the set time value, which is the validity period. This makes it possible to determine more quickly and effectively whether global deduplication is needed.
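One way to sketch the validity period is to store the time at which each feature was recorded and to treat a feature older than the period as absent. The class name, the use of wall-clock seconds, and releasing an expired entry lazily at lookup time are assumptions for illustration; the one-week default follows the example in the text.

```python
import time

ONE_WEEK = 7 * 24 * 3600  # validity period in seconds, per the example above


class ExpiringDedupTable:
    """Hypothetical global table that forgets features after `ttl` seconds."""

    def __init__(self, ttl=ONE_WEEK):
        self.ttl = ttl
        self._stored_at = {}

    def add(self, feature, now=None):
        self._stored_at[feature] = time.time() if now is None else now

    def is_duplicate(self, feature, now=None):
        """True only if the feature is stored and still within its validity."""
        now = time.time() if now is None else now
        stored = self._stored_at.get(feature)
        if stored is None:
            return False
        if now - stored >= self.ttl:
            del self._stored_at[feature]  # release the expired feature
            return False
        return True
```

The optional `now` argument exists only to make the behaviour easy to exercise without waiting for real time to pass.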
Step S009: dynamic task scheduling may be performed based on the load of the offline download servers in the offline download server cluster to select a designated offline download server, that is, to determine the offline download server to which the task is assigned.
Once an offline download server in the cluster has been selected in step S009, an offline download asynchronous message can be sent to the corresponding offline download server cluster, and the offline download task then enters the corresponding task queue to wait for execution.
Dynamic task scheduling determines which offline download server in the cluster a task is assigned to. For example, after a task is submitted to one of the offline download server clusters 44 or 45 shown in Fig. 4, the task scheduler (not shown) in that cluster can assign the task to the online machine with the smallest current load weight for processing. The load weight is calculated as a weighted sum:
Load weight = k1 * CPU usage + k2 * disk remaining + k3 * memory remaining + k4 * bandwidth resources;
where each k is the weight given to the corresponding computer resource. Because the offline download service relies mainly on disk resources, the weights may be k2 = 5 and k1 = k3 = k4 = 1.
After the offline download task is assigned to the designated offline download server, that server (for example, its offline download worker process) performs the offline download. Specifically, the offline download task first enters the task queue waiting stage. The offline download worker process of the offline download server takes tasks from the task queue in turn and, for each task taken, can notify the corresponding user cluster (for example, notify the user who sent the offline download request) and change the task state to "downloading".
Step S010: after the offline download task has been assigned to the offline download server determined in step S009, the designated offline download server receives the assigned task and starts executing it. If the download succeeds, the downloaded content (such as a picture) is saved in a database (such as the non-relational database Cassandra). After the save succeeds, the download result parameter is set to "success", an offline download feedback asynchronous message is sent to the corresponding user cluster, the page meta information is modified, and the task state may be updated to "download completed". If the download or the save fails, the download result parameter is set to "failure" and the failure reason is recorded, after which an offline download feedback asynchronous message is sent to the corresponding user cluster, the page meta information is modified, and the task state may be updated to "download failed".
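The task states named in steps S007 to S010 can be summarised as a small state machine. The state names follow the text; the transition table and helper functions are an illustrative reading of the flow, not something the text spells out.

```python
from enum import Enum


class TaskState(Enum):
    QUEUED = "in task queue"
    DOWNLOADING = "downloading"
    COMPLETED = "download completed"
    FAILED = "download failed"


# Assumed transitions: queue -> downloading -> completed or failed.
ALLOWED = {
    TaskState.QUEUED: {TaskState.DOWNLOADING},
    TaskState.DOWNLOADING: {TaskState.COMPLETED, TaskState.FAILED},
    TaskState.COMPLETED: set(),
    TaskState.FAILED: set(),
}


def advance(current, new):
    """Move a task to `new`, rejecting transitions the flow does not allow."""
    if new not in ALLOWED[current]:
        raise ValueError(f"cannot go from {current.value!r} to {new.value!r}")
    return new


def final_state(download_ok, save_ok):
    """A task completes only if both the download and the save succeed."""
    if download_ok and save_ok:
        return TaskState.COMPLETED
    return TaskState.FAILED
```

For example, a task whose download succeeds but whose save to the database fails ends in the "download failed" state, matching step S010.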
Fig. 4 is a structural diagram of the resource offline download apparatus of an embodiment of the invention.
The apparatus of Fig. 4 comprises a task server 41, a message server 42, a global deduplication device 43, and offline download server clusters 44 and 45, together with, inside the offline download server clusters 44 and 45, a task scheduler (not shown), offline download servers (not shown), a Storm distributed computing platform in each offline download server (not shown), cloud storage (not shown), and so on.
The task server 41 determines the network operator to which the resource belongs according to the network attribute of the resource to be downloaded offline. It receives the user's request and queries the global deduplication table to judge whether global deduplication applies. If the resource is not deduplicated, it can send a download message to the message server 42; the message content may include the resource URL, the number of the target offline download server cluster determined in step S008, and so on. The task server 41 is mainly used to process user requests and to parse the resource URL to obtain the task information and the offline download server cluster number, so as to determine the offline download cluster.
Specifically, the task server 41 may first store the various network operators in association with the IP addresses corresponding to each of them. The task server 41 parses the domain name information corresponding to the resource URL and obtains the IP information corresponding to the domain name by DNS resolution. It can then use this IP information to query the IP information library for the corresponding network operator (Netcom, Telecom, the education network, Mobile, and so on) and work out the number of the offline download cluster belonging to the same operator. The task server 41 can perform the operations of steps S001 to S003 in Fig. 3: receiving the download request sent by the user, URL parsing and verification, and user verification. If either verification fails, a verification failure is reported, as in step S004. When both verifications pass, it judges whether the URL is to be deduplicated, as in steps S005 and S006. When it determines that the resource can be deduplicated, step S011 is performed: the download request is cancelled and the resource is not downloaded. Deduplication uses the global deduplication device 43 in the system (with the global resource deduplication policy described above). If deduplication is not needed, that is, the results of steps S005 and S006 are both "No", a task for downloading the resource offline is created, as in step S007, and this task is sent to the message server 42 in the system.
The global deduplication device 43 executes the global resource deduplication policy, as described for the method of Fig. 3 above. Before the network operator to which the resource belongs is determined according to the network attribute of the resource to be downloaded offline, it may obtain the deduplication features of resources already downloaded offline and store them in the global deduplication table. It may also obtain the deduplication feature of the resource to be downloaded offline, judge whether this feature is identical to the deduplication feature of an already downloaded resource stored in the global deduplication table, and judge whether the time interval between the resource to be downloaded offline and the already downloaded resource is less than the set time value. If so, the resource is not downloaded (step S011); otherwise, an offline download task is created for it (step S007). The deduplication feature of a resource is generated from the resource's URL, size, and content fragments. After a download completes, the global deduplication table can be updated with the deduplication feature of the downloaded resource. The global deduplication table may be located in the global deduplication device 43, which carries out the global resource deduplication policy and the corresponding implementation steps.
The message server 42 may receive messages and off-line download tasks from the task server, process the corresponding messages, and send the messages and off-line download tasks to the corresponding off-line download server cluster. That is, it distributes the received messages, information, and tasks to the corresponding off-line download server cluster (for example, by cluster number), thereby performing message forwarding: it receives a message from the task server 41 and sends it on to the correct destination, such as the target off-line download cluster.
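Forwarding by cluster number can be pictured as one queue per cluster. The sketch below is an in-process illustration only; the class `MessageServer` and its methods are hypothetical names, and a real deployment would use a networked message broker rather than in-memory queues.

```python
from collections import defaultdict, deque


class MessageServer:
    """Routes tasks to per-cluster queues keyed by cluster number (sketch)."""

    def __init__(self):
        self._queues = defaultdict(deque)

    def forward(self, cluster_no, task):
        """Accept a task from the task server and queue it for its cluster."""
        self._queues[cluster_no].append(task)

    def next_task(self, cluster_no):
        """Hand the oldest pending task to the given cluster, or None."""
        queue = self._queues[cluster_no]
        return queue.popleft() if queue else None
```

Tasks are delivered in arrival order within a cluster, and a task queued for one cluster is never visible to another.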
Fig. 4 shows two off-line download server clusters (for example, off-line download server cluster 44 corresponding to Netcom and off-line download server cluster 45 corresponding to Telecom). Those skilled in the art will appreciate that the number of off-line download server clusters of the present invention is not limited to this. Each off-line download server cluster may comprise a plurality of off-line download servers; further, each off-line download server may comprise a Storm platform and cloud storage, where the Storm platform may be used to download target resources and the cloud storage may be used to store resource information. Off-line download server clusters 44 and 45 select, according to the set task regulating strategy, an off-line download server from the off-line download server cluster of the determined network operator, and distribute the off-line download task of the resource to that off-line download server.
The message server 42 sends the created task to the corresponding off-line download server cluster according to the cluster number, as in step S008. Each off-line download server in each off-line download server cluster may also comprise a Storm distributed computing platform (not shown). Storm is a distributed, fault-tolerant real-time computation system. It provides a set of general primitives for distributed real-time computation and can be used for "stream processing", that is, processing messages and updating databases in real time. Storm can also be used for "continuous computation": it runs a continuous query over data streams and streams the results to users as they are computed. The cloud storage may store the network resources downloaded off-line, and users can access those resources by accessing the cloud storage space.
The task regulating strategy comprises one or more task scheduling strategies. According to these task scheduling strategies, the off-line download server cluster distributes the off-line download task of the resource to the off-line download server with the lowest current load. A task dispatcher is provided in the off-line download server cluster of the network operator to distribute off-line download tasks. Specifically, the off-line download task of the resource is sent to the task dispatcher, which calculates the load weight of each off-line download server according to the task scheduling strategy and designates the off-line download server with the smallest current load weight as the off-line download server for the task.
An off-line download task assigned to an off-line download cluster is handed to the task dispatcher. According to the resource usage of each off-line download server in the cluster, the task dispatcher assigns the task to the online machine (off-line download server) with the smallest current load weight for processing, so that the download task is executed as in step S010. The load weight is calculated as a weighted sum:
Load weight = k1 * CPU usage + k2 * remaining disk + k3 * remaining memory + k4 * bandwidth resources;
where each k is the weight given to the corresponding computer resource. Because the off-line download service relies mainly on disk resources, the weights may be, for example, k2 = 5 and k1 = k3 = k4 = 1.
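The weighted sum and the minimum-weight selection can be sketched as follows. The formula and the example weights (k2 = 5, k1 = k3 = k4 = 1) are taken from the text as stated; everything else is an assumption made for illustration, including the names `ServerStats`, `load_weight`, and `pick_server`, and the choice to express each measurement as a fraction between 0 and 1, since the text does not specify units.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass
class ServerStats:
    """Resource measurements for one off-line download server (hypothetical)."""
    name: str
    cpu_usage: float
    disk_remaining: float
    memory_remaining: float
    bandwidth: float


# (k1, k2, k3, k4) as given in the text: disk is weighted most heavily.
WEIGHTS: Tuple[float, float, float, float] = (1.0, 5.0, 1.0, 1.0)


def load_weight(s: ServerStats, k: Tuple[float, float, float, float] = WEIGHTS) -> float:
    """Weighted sum exactly as written in the formula above."""
    k1, k2, k3, k4 = k
    return (k1 * s.cpu_usage + k2 * s.disk_remaining
            + k3 * s.memory_remaining + k4 * s.bandwidth)


def pick_server(servers: Iterable[ServerStats]) -> ServerStats:
    """Return the server with the smallest load weight."""
    return min(servers, key=load_weight)
```

For example, `pick_server([ServerStats("a", 0.5, 0.2, 0.3, 0.1), ServerStats("b", 0.1, 0.6, 0.2, 0.1)])` compares the two weighted sums and returns the server whose sum is smaller.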
Fig. 5 is a structural diagram of a resource off-line download device according to one embodiment of the invention. The device may comprise: a network operator determination module 51, for determining, from the network attribute of the resource to be downloaded off-line, the network operator to which the resource belongs; an off-line download server selection module 52, for selecting a designated off-line download server from the off-line download server cluster of the network operator according to the set task regulating strategy; and a task execution module 53, for distributing the off-line download task of the resource to the selected off-line download server so that the off-line download is carried out. These modules are functional modules corresponding to the processing steps of the methods shown in Figs. 2 and 3.
Further, the network operator determination module 51 may store each network operator in association with its corresponding IP addresses; obtain the domain name information corresponding to the uniform resource locator (URL) of the resource and resolve, through the domain name system (DNS), the IP address corresponding to the domain name information (first acquisition module); and query the stored associations with that IP address to obtain the corresponding network operator, which is determined to be the network operator to which the resource belongs (second acquisition module). The network operator determination module 51 comprises the first and second acquisition modules (not shown).
The off-line download server selection module 52 may, according to one or more task scheduling strategies, distribute the off-line download task of the resource to the off-line download server with the lowest current load, that is, select the off-line download server with the lowest current load in the off-line download server cluster. Further, a task dispatcher for distributing off-line download tasks (such as the scheduler in the Storm distributed computing platform mentioned for Fig. 4) may be provided in the off-line download server cluster of the network operator. The off-line download task of the resource is distributed to the task dispatcher, which calculates the load weight of each off-line download server from the resource usage of each off-line download server in the cluster and determines the off-line download server with the smallest load weight to be the designated off-line download server. The formula by which the task dispatcher calculates the load weight of each off-line download server is as follows:
Load weight = k1 * CPU usage + k2 * remaining disk + k3 * remaining memory + k4 * bandwidth resources;
where k1 is the weight for CPU usage, k2 is the weight for remaining disk, k3 is the weight for remaining memory, and k4 is the weight for bandwidth resources.
Fig. 6 is a structural diagram of a resource off-line download device according to one embodiment of the present invention. The device may comprise a deduplication processing module 61, a network operator determination module 62, an off-line download server selection module 63, and a task execution module 64.
These modules are likewise functional modules corresponding to the steps of the methods shown in Figs. 2 and 3.
As shown in Fig. 6, this device can implement the functions of the deduplication processing strategy in the method described above. It comprises a deduplication processing module 61, which may be used to obtain the deduplication feature of the resource to be downloaded off-line and to judge whether that resource needs to be deduplicated (this processing is carried out before that of the network operator determination module 62, that is, before the network operator is determined).
The network operator determination module 62 may be used to determine, from the network attribute of the resource to be downloaded off-line, the network operator to which the resource belongs.
The off-line download server selection module 63 is used to select a designated off-line download server from the off-line download server cluster of the network operator according to the set task regulating strategy.
The task execution module 64 is used to distribute the off-line download task of the resource to the selected off-line download server so that the off-line download is carried out.
Further, the deduplication processing module 61 stores the deduplication features of resources already downloaded off-line in the global deduplication table; the deduplication feature of a resource may be generated from the URL, size, and content fragments of the resource. The deduplication processing module 61 obtains the deduplication feature of the resource to be downloaded off-line and judges whether it is identical to a deduplication feature stored in the global deduplication table and whether the time interval between the resource to be downloaded and the already-downloaded resource is less than the set time value. If so, the resource is not downloaded; otherwise, an off-line download task for the resource is created.
Further, the deduplication feature in the global deduplication table may be obtained as follows: extract the first 100k of the resource's content, 100k of content at a random position in the middle, and the last 100k of content as the content fragments of the resource; concatenate the URL of the resource, the resource size, and the content fragments into a character string; and compute the Message Digest Algorithm 5 (MD5) value of that character string.
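The feature computation can be sketched as follows using the standard `hashlib` module. Several points are assumptions rather than statements from the text: "100k" is read as 100 KiB; resources shorter than three fragments are hashed whole; and the "random" middle offset is drawn from a generator seeded with the resource size, so that the same resource always yields the same feature (a truly random offset would make two computations over identical content disagree). The function names are hypothetical.

```python
import hashlib
import random

FRAGMENT = 100 * 1024  # "100k" interpreted as 100 KiB (assumption)


def content_fragments(data: bytes) -> bytes:
    """Return head + middle + tail fragments of the resource content."""
    size = len(data)
    if size <= 3 * FRAGMENT:
        # Too small to take three disjoint fragments: use everything.
        return data
    head = data[:FRAGMENT]
    tail = data[-FRAGMENT:]
    # Seeded by size so the "random" offset is reproducible (assumption).
    rng = random.Random(size)
    start = rng.randint(FRAGMENT, size - 2 * FRAGMENT)
    middle = data[start:start + FRAGMENT]
    return head + middle + tail


def dedup_feature(url: str, data: bytes) -> str:
    """MD5 over the concatenation of URL, resource size, and fragments."""
    joined = (url.encode("utf-8")
              + str(len(data)).encode("ascii")
              + content_fragments(data))
    return hashlib.md5(joined).hexdigest()
```

In a real system the fragments would more likely be fetched with ranged reads than sliced from a fully loaded byte string; the slicing form is used here only to keep the example self-contained.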
This global resource deduplication strategy relies on the duplicate check described above, and ensures that the resource corresponding to a deduplicated URL is consistent, or that the probability of inconsistency is within a tolerable range.
The algorithms and displays provided herein are not inherently related to any particular computer, virtual system, or other device. Various general-purpose systems may also be used with the teachings herein. From the description above, the structure required to construct such systems is apparent. Moreover, the present invention is not directed to any particular programming language. It should be understood that the content of the invention described herein may be implemented in a variety of programming languages, and that the description above of a specific language is made to disclose the best mode of the invention.
Numerous specific details are set forth in the specification provided herein. It is understood, however, that embodiments of the invention may be practiced without these specific details. In some instances, well-known methods, structures, and techniques have not been shown in detail so as not to obscure the understanding of this description.
Similarly, it should be understood that, in order to streamline the disclosure and aid in understanding one or more of the various inventive aspects, the features of the invention are sometimes grouped together in a single embodiment, figure, or description thereof in the above description of exemplary embodiments. However, this method of disclosure is not to be interpreted as reflecting an intention that the claimed invention requires more features than are expressly recited in each claim. Rather, as the following claims reflect, inventive aspects lie in fewer than all features of a single foregoing disclosed embodiment. Thus, the claims following the detailed description are hereby expressly incorporated into this detailed description, with each claim standing on its own as a separate embodiment of the invention.
Those skilled in the art will understand that the modules in the device of an embodiment may be adaptively changed and arranged in one or more devices different from that embodiment. The modules, units, or components of an embodiment may be combined into one module, unit, or component, and may furthermore be divided into a plurality of sub-modules, sub-units, or sub-components. Except where at least some of such features and/or processes or units are mutually exclusive, all features disclosed in this specification (including the accompanying claims, abstract, and drawings) and all processes or units of any method or device so disclosed may be combined in any combination. Unless expressly stated otherwise, each feature disclosed in this specification (including the accompanying claims, abstract, and drawings) may be replaced by an alternative feature serving the same, an equivalent, or a similar purpose.
Furthermore, those skilled in the art will understand that although some embodiments described herein include certain features included in other embodiments and not others, combinations of features of different embodiments are meant to be within the scope of the invention and to form different embodiments. For example, in the following claims, any of the claimed embodiments may be used in any combination.
The various component embodiments of the invention may be implemented in hardware, in software modules running on one or more processors, or in a combination thereof. Those skilled in the art will understand that a microprocessor or digital signal processor (DSP) may be used in practice to implement some or all of the functions of some or all of the components of the device according to embodiments of the invention. The invention may also be implemented as a device or apparatus program (for example, a computer program and a computer program product) for performing part or all of the method described herein. Such a program implementing the invention may be stored on a computer-readable medium, or may take the form of one or more signals. Such signals may be downloaded from an Internet website, provided on a carrier signal, or provided in any other form.
It should be noted that the above embodiments illustrate rather than limit the invention, and that those skilled in the art can design alternative embodiments without departing from the scope of the appended claims. In the claims, any reference sign placed between parentheses shall not be construed as limiting the claim. The word "comprising" does not exclude the presence of elements or steps not listed in a claim. The word "a" or "an" preceding an element does not exclude the presence of a plurality of such elements. The invention may be implemented by means of hardware comprising several distinct elements and by means of a suitably programmed computer. In a unit claim enumerating several means, several of these means may be embodied by one and the same item of hardware. The use of the words first, second, third, etc. does not indicate any order; these words may be interpreted as names.
The invention discloses:
A1. A resource off-line download method, comprising: determining, from the network attribute of a resource to be downloaded off-line, the network operator to which the resource belongs; selecting, according to a set task regulating strategy, an off-line download server from the off-line download server cluster of the network operator, wherein the off-line download server is used to download resources off-line; and distributing the off-line download task of the resource to the selected off-line download server so that the off-line download is carried out.
A2. The method of A1, wherein determining, from the network attribute of the resource to be downloaded off-line, the network operator to which the resource belongs further comprises: obtaining the domain name information corresponding to the uniform resource locator (URL) of the resource, and resolving the IP address corresponding to the domain name information; and querying a database with the IP address corresponding to the domain name information to obtain the network operator corresponding to that IP address and determining it to be the network operator to which the resource belongs, the database storing network operators and their IP addresses.
A3. The method of A1 or A2, wherein the task regulating strategy is to distribute the off-line download task of the resource to the off-line download server with the smallest current load weight.
A4. The method of A3, wherein the load weight is: k1 * CPU usage + k2 * remaining disk + k3 * remaining memory + k4 * bandwidth resources, where k1 is the weight for CPU usage, k2 is the weight for remaining disk, k3 is the weight for remaining memory, and k4 is the weight for bandwidth resources.
A5. The method of any one of A1 to A4, further comprising, before determining, from the network attribute of the resource to be downloaded off-line, the network operator to which the resource belongs: obtaining the deduplication feature of the resource to be downloaded off-line, the deduplication feature being an identifier of the resource generated from the URL, size, and content fragments of the resource; judging whether the deduplication feature of the resource to be downloaded off-line is identical to a deduplication feature of an already-downloaded resource stored in a global deduplication table, and whether the time interval between the resource to be downloaded off-line and the already-downloaded resource is less than a set time value, the global deduplication table storing the deduplication features of resources already downloaded off-line; if the deduplication features are identical and the time interval is less than the set time value, not downloading the resource to be downloaded off-line; otherwise, creating an off-line download task for the resource to be downloaded off-line.
A6. The method of A5, wherein the deduplication feature is generated by the following steps: extracting the first 100k of the resource's content, 100k of content at a random position in the middle, and the last 100k of content as the content fragments of the resource; concatenating the URL of the resource, the resource size, and the content fragments into a character string; and performing an MD5 calculation on the character string to obtain the deduplication feature.
The invention also discloses:
B7. A resource off-line download device, comprising: a network operator determination module, adapted to determine, from the network attribute of a resource to be downloaded off-line, the network operator to which the resource belongs; an off-line download server selection module, adapted to select, according to a set task regulating strategy, an off-line download server from the off-line download server cluster of the network operator, wherein the off-line download server is used to download resources off-line; and a task execution module, adapted to distribute the off-line download task of the resource to the selected off-line download server so that the off-line download is carried out.
B8. The device of B7, wherein the network operator determination module further comprises: a first acquisition module, adapted to obtain the domain name information corresponding to the uniform resource locator (URL) of the resource and to resolve the IP address corresponding to the domain name information; and a second acquisition module, adapted to query a database with the IP address corresponding to the domain name information to obtain the network operator corresponding to that IP address and to determine it to be the network operator to which the resource belongs, the database storing network operators and their IP addresses.
B9. The device of B7 or B8, wherein the task regulating strategy is to distribute the off-line download task of the resource to the off-line download server with the smallest current load weight.
B10. The device of B9, wherein the load weight is: k1 * CPU usage + k2 * remaining disk + k3 * remaining memory + k4 * bandwidth resources, where k1 is the weight for CPU usage, k2 is the weight for remaining disk, k3 is the weight for remaining memory, and k4 is the weight for bandwidth resources.
B11. The device of any one of B7 to B10, further comprising a deduplication processing module adapted to: obtain the deduplication feature of the resource to be downloaded off-line, the deduplication feature being an identifier of the resource generated from the URL, size, and content fragments of the resource; judge whether the deduplication feature of the resource to be downloaded off-line is identical to a deduplication feature of an already-downloaded resource stored in a global deduplication table, and whether the time interval between the resource to be downloaded off-line and the already-downloaded resource is less than a set time value, the global deduplication table storing the deduplication features of resources already downloaded off-line; if the deduplication features are identical and the time interval is less than the set time value, not download the resource to be downloaded off-line; otherwise, create an off-line download task for the resource to be downloaded off-line.
B12. The device of B11, further comprising a deduplication feature generation module, which comprises: an extraction unit, adapted to extract the first 100k of the resource's content, 100k of content at a random position in the middle, and the last 100k of content as the content fragments of the resource; a concatenation unit, adapted to concatenate the URL of the resource, the resource size, and the content fragments into a character string; and a computing unit, adapted to perform an MD5 calculation on the character string.