Summary of the invention
The main objective of the present invention is to provide a web-crawler-based link extraction method, apparatus, device, and storage medium that improve crawler performance by optimizing how links are extracted, so that the web crawler can obtain the information people need quickly and accurately, thereby improving user experience.
To achieve the above objective, the present invention provides a web-crawler-based link extraction method, the method comprising the following steps:
upon receiving a data crawling request for an agricultural product to be analyzed, extracting from the data crawling request a first Uniform Resource Locator (URL) link of a platform to be accessed and agricultural product topic information related to the agricultural product to be analyzed;
sending an access request to the platform to be accessed according to the first URL link;
after receiving the response made by the platform to be accessed to the access request, crawling the data information in the page corresponding to the first URL link;
parsing the data information to obtain the second URL links embedded in the page, and adding the second URL links to a to-be-crawled URL queue;
processing the first URL link and the second URL links in the to-be-crawled URL queue using a path-aggregation-based anchor multi-attribute integration approach, to obtain multi-attribute topic information in rich text format for each second URL link;
comparing the multi-attribute topic information of each second URL link in the to-be-crawled URL queue with the agricultural product topic information, and extracting the second URL links whose multi-attribute topic information has a similarity to the agricultural product topic information that meets a preset threshold.
Preferably, the step of parsing the data information to obtain the second URL links embedded in the page and adding the second URL links to the to-be-crawled URL queue comprises:
parsing the data information to obtain the second URL links embedded in the page;
parsing each second URL link to obtain the normalized tags corresponding to the second URL link;
generating an abstract tree corresponding to the second URL link from the normalized tags;
matching, based on a DOM tree matching method, the node content of the abstract tree against the agricultural product topic information, and removing unmatched node content to obtain the second URL links that match the agricultural product topic information;
adding the second URL links that match the agricultural product topic information to the to-be-crawled URL queue.
Preferably, the step of processing the first URL link and the second URL links in the to-be-crawled URL queue using the path-aggregation-based anchor multi-attribute integration approach, to obtain the multi-attribute topic information in rich text format for each second URL link, comprises:
generating a path access directed graph for the agricultural product to be analyzed from the first URL link and the second URL links in the to-be-crawled URL queue;
determining, based on the path-aggregation-based anchor multi-attribute integration approach, the shortest access paths in the path access directed graph to obtain a shortest access path set;
determining the anchor text corresponding to each shortest access path in the shortest access path set to obtain an access path anchor text set corresponding to the shortest access path set, and assigning a weight to each element in the access path anchor text set;
normalizing the weight of each element in the access path anchor text set according to a preset weight normalization formula;
sorting the normalized weights in descending order to obtain the multi-attribute topic information in rich text format for the second URL link.
Preferably, the step of comparing the multi-attribute topic information of each second URL link in the to-be-crawled URL queue with the agricultural product topic information and extracting the second URL links whose multi-attribute topic information has a similarity to the agricultural product topic information that meets the preset threshold comprises:
extracting multi-attribute topic feature words from the multi-attribute topic information and hashing them to obtain first hash values, each multi-attribute topic feature word being an element of the access path anchor text set corresponding to the multi-attribute topic information;
obtaining the weight of each multi-attribute topic feature word from the access path anchor text set, and quantizing the first hash values into a first vector using those weights;
extracting agricultural product topic feature words from the agricultural product topic information and hashing them to obtain second hash values;
quantizing the second hash values into a second vector according to the weights preset for the agricultural product topic feature words;
comparing the first vector with the second vector, and extracting the second URL links whose multi-attribute topic information has a similarity to the agricultural product topic information that meets the preset threshold.
Preferably, before the step of processing the second URL links in the to-be-crawled URL queue using the path-aggregation-based anchor multi-attribute integration approach, the method further comprises:
performing joint deduplication on the second URL links in the to-be-crawled URL queue using a link-feature-based counting Bloom filter in combination with multiple hashing, so that any two second URL links in the to-be-crawled URL queue are different.
Preferably, before the step of performing joint deduplication on the second URL links in the to-be-crawled URL queue using the link-feature-based counting Bloom filter in combination with multiple hashing, the method further comprises:
traversing the to-be-crawled URL queue, performing feature analysis on the currently traversed second URL link, and extracting the protocol type part, the path part, and the query part of the current second URL link;
obtaining the overall-feature URL link corresponding to the current second URL link from the protocol type part, the path part, and the query part;
establishing a correspondence between the current second URL link and the overall-feature URL link, and updating the correspondence into the to-be-crawled URL queue.
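The feature analysis described above can be sketched as follows. This is a minimal illustration only, not the claimed implementation: the function names `split_features`, `overall_feature_url`, and `build_correspondence` are hypothetical, and it is assumed here that the "path part" includes the host name and that the fragment is dropped.

```python
from urllib.parse import urlsplit


def split_features(url):
    """Split a URL into protocol type, path, and query parts."""
    parts = urlsplit(url)
    protocol = parts.scheme.lower()
    # Assumption: the host name is treated as part of the path part.
    path = parts.netloc.lower() + (parts.path or "/")
    return protocol, path, parts.query


def overall_feature_url(url):
    """Rebuild the overall-feature URL from the three extracted parts."""
    protocol, path, query = split_features(url)
    feature = f"{protocol}://{path}"
    return f"{feature}?{query}" if query else feature


def build_correspondence(queue):
    """Map each second URL link in the queue to its overall-feature URL."""
    return {url: overall_feature_url(url) for url in queue}
```

With this sketch, two links that differ only in letter case of the host or in their fragment map to the same overall-feature URL, which is what makes the later duplicate check meaningful.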
Preferably, the step of performing joint deduplication on the second URL links in the to-be-crawled URL queue using the link-feature-based counting Bloom filter in combination with multiple hashing comprises:
traversing the to-be-crawled URL queue to obtain the overall-feature URL link corresponding to the currently traversed second URL link;
performing an overall duplicate check on the overall-feature URL link using the link-feature-based counting Bloom filter to obtain a duplicate-check flag for the overall-feature URL link;
performing feature identification on the overall-feature URL link according to the duplicate-check flag to obtain a plurality of feature fragments;
recombining the plurality of feature fragments according to a preset URL link recombination rule to obtain N recombined URL link fragments, N being an integer greater than or equal to 1;
performing a multiple-hash duplicate check on the N recombined URL link fragments to obtain a duplicate-check result for the current second URL link;
retaining or discarding the second URL link in the to-be-crawled URL queue according to the duplicate-check result.
In addition, to achieve the above objective, the present invention also proposes a web-crawler-based link extraction apparatus, the apparatus comprising:
an extraction module, configured to, upon receiving a data crawling request for an agricultural product to be analyzed, extract from the data crawling request a first Uniform Resource Locator (URL) link of a platform to be accessed and topic information related to the agricultural product to be analyzed;
a sending module, configured to send an access request to the platform to be accessed according to the first URL link;
a crawling module, configured to crawl the data information in the page corresponding to the first URL link after receiving the response made by the platform to be accessed to the access request;
a parsing module, configured to parse the data information to obtain the second URL links embedded in the page, and to add the second URL links to a to-be-crawled URL queue;
a processing module, configured to process the first URL link and the second URL links in the to-be-crawled URL queue using a path-aggregation-based anchor multi-attribute integration approach, to obtain multi-attribute topic information in rich text format for each second URL link;
an extraction module, configured to compare the multi-attribute topic information of each second URL link in the to-be-crawled URL queue with the agricultural product topic information, and to extract the second URL links whose multi-attribute topic information has a similarity to the agricultural product topic information that meets a preset threshold.
In addition, to achieve the above objective, the present invention also proposes a web-crawler-based link extraction device, the device comprising a memory, a processor, and a web-crawler-based link extraction program that is stored on the memory and executable on the processor, the program being configured to implement the steps of the web-crawler-based link extraction method described above.
In addition, to achieve the above objective, the present invention also proposes a computer-readable storage medium on which a web-crawler-based link extraction program is stored, the program implementing the steps of the web-crawler-based link extraction method described above when executed by a processor.
The web-crawler-based link extraction scheme provided by the present invention processes the first URL link of the platform to be accessed and the second URL links in the to-be-crawled URL queue using the path-aggregation-based anchor multi-attribute integration approach, obtaining multi-attribute topic information in rich text format for each second URL link. It then compares the multi-attribute topic information of each second URL link in the to-be-crawled URL queue with the agricultural product topic information and extracts the second URL links whose similarity to the agricultural product topic information meets the preset threshold. This effectively ensures the accuracy of extracting specific URL links and prevents the web crawler from wasting resources on crawling irrelevant links, which improves crawler performance, allows the web crawler to obtain the information people need quickly and accurately, and improves user experience.
Specific embodiment
It should be understood that the specific embodiments described herein are only intended to explain the present invention and are not intended to limit it.
Referring to Fig. 1, Fig. 1 is a schematic structural diagram of the web-crawler-based link extraction device in the hardware operating environment involved in the embodiments of the present invention.
As shown in Fig. 1, the web-crawler-based link extraction device may include a processor 1001, such as a central processing unit (CPU), a communication bus 1002, a user interface 1003, a network interface 1004, and a memory 1005. The communication bus 1002 enables communication among these components. The user interface 1003 may include a display and an input unit such as a keyboard; optionally, the user interface 1003 may also include a standard wired interface and a wireless interface. The network interface 1004 may optionally include a standard wired interface and a wireless interface (such as a Wireless Fidelity (Wi-Fi) interface). The memory 1005 may be a high-speed random access memory (RAM) or a stable non-volatile memory (NVM), such as a disk memory. The memory 1005 may optionally also be a storage device independent of the aforementioned processor 1001.
Those skilled in the art will understand that the structure shown in Fig. 1 does not limit the web-crawler-based link extraction device, which may include more or fewer components than illustrated, combine certain components, or arrange the components differently.
As shown in Fig. 1, the memory 1005, as a storage medium, may include an operating system, a network communication module, a user interface module, and a web-crawler-based link extraction program.
In the web-crawler-based link extraction device shown in Fig. 1, the network interface 1004 is mainly used for data communication with a network server, and the user interface 1003 is mainly used for data interaction with the user. The processor 1001 and the memory 1005 may be provided in the web-crawler-based link extraction device, which calls, through the processor 1001, the web-crawler-based link extraction program stored in the memory 1005 and executes the web-crawler-based link extraction method provided by the embodiments of the present invention.
An embodiment of the present invention provides a web-crawler-based link extraction method. Referring to Fig. 2, Fig. 2 is a schematic flowchart of a first embodiment of the web-crawler-based link extraction method of the present invention.
In this embodiment, the web-crawler-based link extraction method comprises the following steps:
Step S10: upon receiving a data crawling request for an agricultural product to be analyzed, extract from the data crawling request the first Uniform Resource Locator (URL) link of the platform to be accessed and the agricultural product topic information related to the agricultural product to be analyzed.
Specifically, this embodiment is executed by any terminal device on which a web crawler system is deployed or installed.
It is worth noting that, to maximize the speed of operations such as crawling and parsing the data for the agricultural product to be analyzed, the web crawler system in this embodiment is preferably a distributed web crawler system.
It should be understood, however, that in practice the terminal device may be a client device or a server-side device; no limitation is imposed here.
In addition, in practice the platform to be accessed may be an online store that displays the agricultural product to be analyzed.
Correspondingly, the Uniform Resource Locator (URL) is the network address needed to access that online store.
It should also be understood that the agricultural product to be analyzed is a general term for the various agricultural products commonly available today. In practice it may be a tea product, a fruit or vegetable product, a cereal product, and so on; these are not enumerated one by one here, and no limitation is imposed.
For ease of understanding, this embodiment takes a tea product as the agricultural product to be analyzed.
Correspondingly, the agricultural product topic information is tea product topic information. In practice, the tea product topic information may include characteristic information related to the tea product to be analyzed, for example that the type is green tea, that the picking season is before the Qingming festival, and that the price is between 500/kg and 1000/kg. These are not enumerated one by one here, and no limitation is imposed.
Step S20: send an access request to the platform to be accessed according to the first URL link.
Specifically, in practice the web crawler may send the access request to the platform to be accessed (in essence, the server of that platform) using the HyperText Transfer Protocol (HTTP), which transfers data over the Transmission Control Protocol/Internet Protocol (TCP/IP).
It should be understood that the above is only one specific way of sending the access request to the platform to be accessed and does not limit the technical solution of the present invention; in practice, those skilled in the art may configure this as needed, and no limitation is imposed here.
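As one possible illustration of step S20, the sketch below builds and sends an HTTP GET request with the Python standard library. The function names and the `User-Agent` value are hypothetical and chosen only for this example; the embodiment does not prescribe any particular client library.

```python
from urllib.request import Request, urlopen


def build_access_request(first_url):
    """Build an HTTP GET access request for the first URL link."""
    # The User-Agent string is an illustrative placeholder.
    headers = {"User-Agent": "example-crawler/0.1"}
    return Request(first_url, headers=headers, method="GET")


def fetch_page(first_url, timeout=10.0):
    """Send the access request and return the page body as text."""
    with urlopen(build_access_request(first_url), timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")
```

Separating request construction from sending keeps the request easy to inspect, and lets a distributed crawler swap in its own transport.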
Step S30: after receiving the response made by the platform to be accessed to the access request, crawl the data information in the page corresponding to the first URL link.
It should be understood that, in practice, if the access request is sent successfully and the platform to be accessed successfully validates the first URL link carried in the access request, the platform returns a success response together with the data information in the page corresponding to the first URL link. The web crawler can then crawl the data information that the platform returns for that page.
Step S40: parse the data information to obtain the second URL links embedded in the page, and add the second URL links to the to-be-crawled URL queue.
It should be understood that, in practice, the page corresponding to the first URL link may show not only data information related to the agricultural product to be analyzed but also multiple URL links related to that data information. To distinguish them, these are referred to here as second URL links.
For example, suppose the page corresponding to the first URL link is the home page of an online store that carries the agricultural product to be analyzed. The home page mainly shows information on four major categories of agricultural products: agricultural product A, agricultural product B, agricultural product C, and agricultural product D. Each major category has its own second URL link, and the page corresponding to that second URL link mainly shows the subcategories that the category contains.
For example, the page corresponding to the second URL link of agricultural product A mainly shows agricultural products A-1, A-2, and A-3; the page for agricultural product B mainly shows agricultural products B-1 and B-2; the page for agricultural product C mainly shows agricultural products C-1, C-2, C-3, and C-4; and the page for agricultural product D mainly shows agricultural products D-1 and D-2.
It should be understood that the above is only an example and does not limit the technical solution of the present invention; in practice, those skilled in the art may configure this as needed, and no limitation is imposed here.
In addition, in this embodiment the second URL links embedded in the page are added to the to-be-crawled URL queue because, in practice, the web crawler crawls a large amount of data, so the number of second URL links parsed out is correspondingly large. Crawling and parsing each second URL link takes considerable time, so a large number of second URL links often cannot all be accessed within a short period. Each batch of second URL links obtained therefore needs to be added to the to-be-crawled URL queue.
In addition, "first" in "first URL link" and "second" in "second URL link" serve only to distinguish the URL link of the platform to be accessed from the URL links embedded in the page corresponding to that URL link; they do not limit the URL links themselves. In practice, any "second URL link" can be regarded as a "first URL link" relative to the URL links embedded in its own page.
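Parsing the page and queueing its embedded links, as described for step S40, might look like the following sketch. It assumes the second URL links are the `href` values of HTML anchor tags and uses a simple in-memory FIFO queue; the names `LinkCollector` and `enqueue_second_links` are hypothetical.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect the href of every anchor tag, resolved against the page URL."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if href:
            # Relative links are resolved against the first URL link.
            self.links.append(urljoin(self.base_url, href))


def enqueue_second_links(first_url, html, queue):
    """Parse the page data and append the embedded links to the queue."""
    collector = LinkCollector(first_url)
    collector.feed(html)
    queue.extend(collector.links)
    return queue
```

Because every dequeued second URL link can itself be passed back in as `first_url`, the same function covers the "second becomes first" relationship described above.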
It is also worth noting that, in practice, the page corresponding to a second URL link contains not only information related to the agricultural product to be analyzed but also interference information, such as advertisements in various formats (pictures, audio, video, and so on). Therefore, to simplify the structure of the page corresponding to the second URL link as much as possible and make it easier for the web crawler to crawl the data, the second URL links can be denoised when they are obtained and added to the to-be-crawled URL queue.
For ease of understanding, this embodiment gives one specific denoising approach, roughly as follows:
(1) Parse the data information to obtain the second URL links embedded in the page.
(2) Parse each second URL link to obtain the normalized tags corresponding to the second URL link.
Specifically, in practice the tags normalized here are essentially the tags in the page corresponding to the second URL link, since current web pages are generally written in HyperText Markup Language (HTML).
Furthermore, since in practice noise links generally reside in picture tags, in hyperlinks defined by tags, and in the URLs that specify hyperlink targets, only these kinds of tags need to be normalized.
(3) Generate the abstract tree corresponding to the second URL link from the normalized tags.
(4) Based on a DOM tree matching method, match the node content of the abstract tree against the agricultural product topic information and remove unmatched node content to obtain the second URL links that match the agricultural product topic information.
Specifically, because the abstract tree is generated from the normalized tags, each node in the abstract tree is in essence one normalized tag. When the node content of the abstract tree is matched against the agricultural product topic information, keywords are extracted from the node and from the agricultural product topic information, and the two sets of keywords are compared to decide whether the node should be removed. Once the content of every node in the abstract tree has been matched against the agricultural product topic information in this way, the noise links have been removed and the second URL links that match the agricultural product topic information are obtained.
(5) Add the second URL links that match the agricultural product topic information to the to-be-crawled URL queue.
It should be understood that this is only one specific denoising approach given in this embodiment and does not limit the technical solution of the present invention; in practice, those skilled in the art may configure this as needed, and no limitation is imposed here.
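The denoising steps above can be approximated by the following sketch. It is a deliberate simplification: instead of building a full abstract tree, it treats each anchor tag as one node, and it assumes that "matching" means a case-insensitive substring test against topic keywords. The names `AnchorNodes` and `denoise` are hypothetical.

```python
from html.parser import HTMLParser


class AnchorNodes(HTMLParser):
    """Collect [href, text] pairs, one per anchor tag (one node each)."""

    def __init__(self):
        super().__init__()
        self.nodes = []
        self._inside_anchor = False

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.nodes.append([href, ""])
                self._inside_anchor = True

    def handle_data(self, data):
        if self._inside_anchor:
            self.nodes[-1][1] += data

    def handle_endtag(self, tag):
        if tag == "a":
            self._inside_anchor = False


def denoise(html, topic_keywords):
    """Keep only links whose node text matches at least one topic keyword."""
    parser = AnchorNodes()
    parser.feed(html)
    keywords = [keyword.lower() for keyword in topic_keywords]
    return [href for href, text in parser.nodes
            if any(keyword in text.lower() for keyword in keywords)]
```

In this sketch an advertisement link is dropped simply because none of its visible text mentions the topic; a fuller implementation would also inspect picture tags and link targets, as the text describes.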
Step S50: process the first URL link and the second URL links in the to-be-crawled URL queue using the path-aggregation-based anchor multi-attribute integration approach, to obtain the multi-attribute topic information in rich text format for each second URL link.
To make it easier to understand how the multi-attribute topic information in rich text format is obtained for each second URL link, one specific implementation is given below, roughly as follows:
(1) Generate the path access directed graph for the agricultural product to be analyzed from the first URL link and the second URL links in the to-be-crawled URL queue.
Specifically, each vertex of the path access directed graph is the page corresponding to one URL link. Fig. 3 is used as an example.
As shown in Fig. 3, the source page u is in essence the page corresponding to the first URL link; "tea type" and "tea price" are the pages corresponding to two embedded second URL links parsed from the data information of that page; and target pages v1, v2, and v3 are pages in the next layer, reached through second URL links parsed from the pages corresponding to those two second URL links.
(2) Based on the path-aggregation-based anchor multi-attribute integration approach, determine the shortest access paths in the path access directed graph to obtain a shortest access path set.
Specifically, in practice a path access directed graph may contain multiple paths, and the shortest paths among them may be loop-free (not closed) or may contain loops (closed).
For ease of distinction, in practice different sets can be used to indicate whether the shortest paths contain loops.
For ease of understanding, one specific representation of the set of shortest loop-free paths is given below.
For example, the M shortest loop-free paths from the source page to a target page can be expressed by the following set:
[set expression not reproduced in the source text]
It should be understood that the above is only an example and does not limit the technical solution of the present invention; in practice, those skilled in the art may configure this as needed, and no limitation is imposed here.
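To make the shortest access path set concrete, the sketch below enumerates every minimum-hop loop-free path from a source page to a target page with breadth-first search. It assumes unit edge weights and an adjacency-list graph; the function name `shortest_paths` and the edges in the usage note are illustrative guesses, not taken from Fig. 3.

```python
from collections import deque


def shortest_paths(graph, source, target):
    """Return every minimum-hop path from source to target as a node list."""
    distance = {source: 0}
    parents = {source: []}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in graph.get(node, []):
            if neighbor not in distance:
                # First time this page is reached: record its hop count.
                distance[neighbor] = distance[node] + 1
                parents[neighbor] = [node]
                queue.append(neighbor)
            elif distance[neighbor] == distance[node] + 1:
                # Another equally short way to reach the same page.
                parents[neighbor].append(node)
    if target not in distance:
        return []

    def unwind(node):
        if node == source:
            return [[source]]
        return [path + [node] for parent in parents[node] for path in unwind(parent)]

    return unwind(target)
```

For a hypothetical graph `{"u": ["tea_type", "tea_price"], "tea_type": ["v1", "v2"], "tea_price": ["v2", "v3"]}`, calling `shortest_paths(graph, "u", "v2")` yields two paths, one through each intermediate page.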
(3) Determine the anchor text corresponding to each shortest access path in the shortest access path set to obtain the access path anchor text set corresponding to the shortest access path set, and assign a weight to each element in the access path anchor text set.
When assigning a weight to each element in the access path anchor text set, the following conventions can be applied:
First, P_m is a shortest loop-free path, where the range of m satisfies: [expression not reproduced in the source text]
Next, w(P_m) ≤ w(P_{m+1}), where the range of m satisfies: [expression not reproduced in the source text]
Next, w(P_m) ≤ w(P), where the range of P satisfies: [expression not reproduced in the source text]
Finally, P_m is determined before P_{m+1}, where the range of m satisfies: [expression not reproduced in the source text]
Here, w denotes the weight and M is a positive integer.
Initially, the weight of each element (that is, each edge) is set to 1, so if a path P traverses m directed edges, then w(P) = m.
It should be understood that the above is only one specific way of assigning weights and does not limit the technical solution of the present invention; in practice, those skilled in the art may configure this as needed, and no limitation is imposed here.
(4) Normalize the weight of each element in the access path anchor text set according to a preset weight normalization formula.
Specifically, the weight normalization formula used in this embodiment is as follows:
[formula not reproduced in the source text]
where the formula's left-hand symbol (not reproduced in the source text) is the weight of element e in the corresponding anchor text set.
Still taking the path access directed graph shown in Fig. 3 as an example, the weight normalization formula can be used to revise the weights of the elements in the anchor text of the original access paths from the source page u to target pages v1, v2, and v3.
(5) Sort the normalized weights in descending order to obtain the multi-attribute topic information in rich text format for the second URL link.
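Steps (3) to (5) can be illustrated with the sketch below. Because the normalization formula itself is not reproduced in this text, the sketch assumes the simplest choice, dividing each raw weight by the sum of all raw weights; that formula, the function name `topic_info`, and the edge-to-anchor-text mapping are all assumptions made for illustration.

```python
from collections import Counter


def topic_info(paths, anchor_text):
    """Weight anchor-text elements over the shortest paths, normalize, sort."""
    raw = Counter()
    for path in paths:
        for edge in zip(path, path[1:]):
            # Each directed edge contributes an initial weight of 1.
            raw[anchor_text[edge]] += 1
    total = sum(raw.values())
    # Assumed normalization: share of the total weight.
    normalized = {term: weight / total for term, weight in raw.items()}
    # Descending by weight; ties broken alphabetically for determinism.
    return sorted(normalized.items(), key=lambda item: (-item[1], item[0]))
```

The returned list pairs each anchor-text element with its normalized weight, which is the shape the later comparison step needs when it looks up a weight per feature word.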
Step S60: compare the multi-attribute topic information of each second URL link in the to-be-crawled URL queue with the agricultural product topic information, and extract the second URL links whose multi-attribute topic information has a similarity to the agricultural product topic information that meets the preset threshold.
To make the web-crawler-based link extraction operation easier to understand, one specific implementation is given below, roughly as follows:
(1) Extract multi-attribute topic feature words from the multi-attribute topic information and hash them to obtain first hash values. Each multi-attribute topic feature word is an element of the access path anchor text set corresponding to the multi-attribute topic information.
(2) Obtain the weight of each multi-attribute topic feature word from the access path anchor text set, and quantize the first hash values into a first vector using those weights.
(3) Extract agricultural product topic feature words from the agricultural product topic information and hash them to obtain second hash values.
(4) Quantize the second hash values into a second vector according to the weights preset for the agricultural product topic feature words.
(5) Compare the first vector with the second vector, and extract the second URL links whose multi-attribute topic information has a similarity to the agricultural product topic information that meets the preset threshold.
Specifically, this embodiment converts the comparison between the multi-attribute topic information and the agricultural product topic information into a comparison between two vectors, so that the comparison result is obtained in a more direct form. This makes link extraction more convenient while ensuring accuracy.
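One way to realize the hash-weight-compare sequence above is a SimHash-style signed vector, sketched below. The text does not name the hash function, the vector length, or the similarity measure, so the use of MD5, 64 bits, and cosine similarity here are assumptions, as are the function names.

```python
import hashlib
import math

BITS = 64  # assumed vector length


def weighted_vector(weighted_words):
    """Hash each feature word and fold its weight into a signed vector."""
    vector = [0.0] * BITS
    for word, weight in weighted_words.items():
        digest = hashlib.md5(word.encode("utf-8")).digest()
        value = int.from_bytes(digest[:8], "big")
        for bit in range(BITS):
            # A set bit adds the weight; a clear bit subtracts it.
            vector[bit] += weight if (value >> bit) & 1 else -weight
    return vector


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def select_links(link_topics, product_topic, threshold):
    """Keep links whose topic vector is similar enough to the product topic."""
    target = weighted_vector(product_topic)
    return [url for url, words in link_topics.items()
            if cosine(weighted_vector(words), target) >= threshold]
```

Here `link_topics` maps each second URL link to its feature words and normalized weights, and `product_topic` maps the agricultural product feature words to their preset weights.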
As the above description shows, the web-crawler-based link extraction method provided in this embodiment processes the first URL link of the platform to be accessed and the second URL links in the to-be-crawled URL queue using the path-aggregation-based anchor multi-attribute integration approach, obtaining multi-attribute topic information in rich text format for each second URL link. It then compares the multi-attribute topic information of each second URL link in the to-be-crawled URL queue with the agricultural product topic information and extracts the second URL links whose similarity to the agricultural product topic information meets the preset threshold. This effectively ensures the accuracy of extracting specific URL links and prevents the web crawler from wasting resources on crawling irrelevant links, which improves crawler performance, allows the web crawler to obtain the information people need quickly and accurately, and improves user experience.
Referring to Fig. 4, Fig. 4 is a schematic flowchart of a second embodiment of the web-crawler-based link
extraction method of the present invention.
Based on the first embodiment above, the web-crawler-based link extraction method of this embodiment
further includes, before step S50:
Step S00: performing joint deduplication on the second URL links in the to-be-crawled URL queue using a
link-feature counting Bloom filter in combination with multiple hashing.
Specifically, the joint deduplication performed on the second URL links in the to-be-crawled URL queue
using the link-feature counting Bloom filter in combination with multiple hashing is mainly divided into
deduplication of the overall-feature URL link corresponding to each URL link and deduplication of URL link
fragments.
Because the URL link fragments are derived from the overall-feature URL link, the correspondence between
each second URL link and its overall-feature URL link must be determined first so that the joint
deduplication can proceed smoothly.
For ease of understanding, this embodiment provides one specific way of determining the correspondence
between a second URL link and its overall-feature URL link, roughly as follows:
(1) The to-be-crawled URL queue is traversed, feature analysis is performed on the current second URL link
being traversed, and the protocol type part, the path part and the query part of the current second URL
link are extracted.
Specifically, in practical applications a URL link uniquely identifies a resource on the network. In
general, a URL link includes the following five components: a protocol type part (usually denoted
Protocol), a server address part (usually denoted Host), a port number part (usually denoted Port), a path
part (usually denoted Path) and a query part (usually denoted Fragment).
Of these, the protocol type part, the path part and the query part usually capture the features of a URL
link.
Therefore, this embodiment traverses the to-be-crawled URL queue, performs feature analysis on the current
second URL link being traversed, and extracts its protocol type part (denoted p1 below for ease of
description), its path part (denoted p2 below) and its query part (denoted p3 below).
(2) The overall-feature URL link corresponding to the current second URL link is obtained according to the
protocol type part, the path part and the query part.
Specifically, since the three parts p1, p2 and p3 capture all the features of the current second URL link,
combining p1, p2 and p3 yields the overall-feature URL link corresponding to the current second URL link.
Below, p1p2p3 denotes the overall-feature URL link corresponding to each second URL link.
(3) A correspondence between the current second URL link and the overall-feature URL link is established,
and the correspondence is updated into the to-be-crawled URL queue.
Specifically, this embodiment establishes the correspondence between the current second URL link and the
overall-feature URL link and updates it into the to-be-crawled URL queue so that, during the subsequent
deduplication of the second URL links, the overall-feature URL link corresponding to the current second
URL link can be found quickly from the correspondence, and the URL link fragments corresponding to the
current second URL link can then be obtained from the overall-feature URL link.
In addition, in practical applications the correspondence need not be updated into the to-be-crawled URL
queue and may instead be stored separately. When joint deduplication is performed on the second URL links
in the to-be-crawled URL queue, the overall-feature URL link corresponding to the current second URL link
being traversed is simply looked up in the separately stored correspondence table.
It should be understood that the above is merely an example and does not limit the technical solution of
the present invention in any way; in practical applications, those skilled in the art may configure this
as needed, and no limitation is imposed here.
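By way of illustration only, steps (1) to (3) might be sketched as follows, using the standard `urllib.parse.urlsplit` function to separate the parts of a URL link. The function names `overall_feature` and `build_correspondence`, the tuple representation of p1p2p3 and the separately stored dictionary are assumptions of this sketch.

```python
from urllib.parse import urlsplit


def overall_feature(url):
    """Return (p1, p2, p3): the protocol type, path and query parts of a URL link."""
    parts = urlsplit(url)
    return (parts.scheme, parts.path, parts.query)


def build_correspondence(queue):
    """Map each second URL link in the queue to its overall-feature URL link."""
    return {url: overall_feature(url) for url in queue}


queue = [
    "https://example.com:8080/market/apples?page=2#top",
    "https://example.com/market/pears",
]
correspondence = build_correspondence(queue)
```

Because the server address and port are not part of p1p2p3, two links that differ only in host map to the same overall-feature URL link under this reading.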
Further, after the above correspondence and the overall-feature URL link corresponding to each second URL
link have been obtained, the operation of performing joint deduplication on the second URL links in the
to-be-crawled URL queue using the link-feature counting Bloom filter in combination with multiple hashing
may specifically proceed as follows:
(1) The to-be-crawled URL queue is traversed, and the overall-feature URL link corresponding to the current
second URL link being traversed is obtained.
Specifically, the overall-feature URL link corresponding to the current second URL link being traversed is
obtained from the correspondence described above.
(2) An overall duplicate check is performed on the overall-feature URL link using the link-feature counting
Bloom filter, and a duplicate-check identifier corresponding to the overall-feature URL link is obtained.
Specifically, the counting Bloom filter used in this embodiment is not the counting Bloom filter
conventionally used for link deduplication, but a counting Bloom filter based on the link features of URL
links.
That is, when deduplicating links, the counting Bloom filter of this embodiment performs feature
identification on the overall-feature URL link corresponding to each second URL link in the to-be-crawled
URL queue and then performs the overall duplicate check according to the identified features; in other
words, each second URL link is compared by feature against the links already entered, which realizes the
overall duplicate check.
In addition, to make it easier to subsequently identify the URL link fragments recombined from the feature
fragments, a corresponding duplicate-check identifier may also be assigned to the overall-feature URL link.
(3) Feature identification is performed on the overall-feature URL link according to the duplicate-check
identifier, and multiple feature fragments are obtained.
Specifically, still taking the overall-feature URL link p1p2p3 as an example, the feature fragments
obtained after feature identification may be the fragments containing the protocol type part, the path
part and the query part respectively, that is, feature fragment p1, feature fragment p2 and feature
fragment p3.
(4) The multiple feature fragments are recombined according to a preset URL link recombination rule, and N
recombined URL link fragments are obtained.
It should be understood that, since an overall-feature URL link consists of three parts (the protocol type
part, the path part and the query part), at least one recombined URL link fragment can be obtained; N in
this embodiment is therefore an integer greater than or equal to 1.
In addition, in practical applications the URL link recombination rule may be set by those skilled in the
art as needed, for example by stipulating that a recombined URL link fragment must include feature fragment
p1, or that a recombined URL link fragment must not include feature fragment p3. The possibilities are not
enumerated here, and no limitation is imposed.
Correspondingly, if the URL link recombination rule is that a recombined URL link fragment must include
feature fragment p1, the recombined URL link fragments obtained generally comprise a fragment containing
only p1, a fragment containing only p1 and p2, and a fragment containing only p1 and p3.
If the URL link recombination rule is that a recombined URL link fragment must not include feature fragment
p3, the recombined URL link fragments obtained generally comprise a fragment containing only p1 and a
fragment containing only p1 and p2.
It should be understood that the above is merely an example and does not limit the technical solution of
the present invention in any way; in practical applications, those skilled in the art may configure this
according to actual needs, and no limitation is imposed here.
(5) Multiple-hash duplicate checking is performed on the N recombined URL link fragments, and the
duplicate-check result corresponding to the current second URL link is obtained.
It is worth noting that, in practical applications, the to-be-crawled URL queue may cache a large number of
second URL links, so the URL link fragments obtained after recombination are even more numerous. Therefore,
to minimize the storage space occupied by the second URL links cached in the to-be-crawled URL queue, in
this embodiment, after the multiple feature fragments have been recombined according to the preset URL link
recombination rule into N recombined URL link fragments, the N fragments may first each be compressed using
the MD5 algorithm to obtain the string ciphertext corresponding to each; finally, the content of each
recombined URL link fragment is replaced with its string ciphertext.
It should be understood that the above gives only one specific compression method and does not limit the
technical solution of the present invention in any way; in practical applications, those skilled in the art
may choose a suitable compression method according to actual needs, and no limitation is imposed here.
Correspondingly, the operation of performing multiple-hash duplicate checking on the N recombined URL link
fragments to obtain the duplicate-check result corresponding to the current second URL link is specifically
as follows:
(5-1) The string ciphertexts corresponding to the N recombined URL link fragments are extracted, any one
string ciphertext is selected from the N string ciphertexts and hashed K times, and K hash values are
obtained.
It should be understood that the link deduplication provided in this embodiment specifically uses multiple
hashing when performing joint deduplication on links, that is, each string ciphertext must be hashed at
least twice; K is therefore an integer greater than or equal to 2.
(5-2) The K hash values are hashed into a pre-constructed bit-vector space as reference hash values, and an
initial count value is set on the space-variable counter corresponding to each reference hash value.
Specifically, in this embodiment the initial count value shown on the space-variable counter corresponding
to each reference hash value is represented by "0".
(5-3) Each of the remaining N-1 string ciphertexts is hashed K times, and the K hash values corresponding to
each remaining string ciphertext are obtained.
(5-4) The K hash values corresponding to each remaining string ciphertext are randomly hashed into the
bit-vector space, each adjacent to some reference hash value.
Specifically, to make it easy to determine which reference hash value a hash value newly hashed into the
bit-vector space is adjacent to, a criterion may be preset. For example, when a new hash value is inserted
between two neighboring reference hash values, the reference hash value nearest to the newly inserted hash
value may be taken as its adjacent reference hash value.
It should be understood that the above is merely an example and does not limit the technical solution of
the present invention in any way; in practical applications, those skilled in the art may configure this
according to actual needs, and no limitation is imposed here.
(5-5) Using head insertion, one preset character is inserted before the initial count value of the adjacent
reference hash value for each hash value newly hashed into the bit-vector space.
Specifically, the preset character is represented by "1" in this embodiment.
For example, for a given reference hash value, the initial count value shown on its space-variable counter
is "0". When one new hash value is hashed to a position adjacent to it, one preset character "1" is
inserted before the "0" by head insertion, and the count value shown on the space-variable counter becomes
"10".
Correspondingly, if two new hash values are hashed to positions adjacent to this reference hash value, two
preset characters "1" are inserted before the "0" by head insertion, and the count value shown on the
space-variable counter becomes "110".
(5-6) The number of preset characters before the initial count value of each reference hash value is
counted, and the duplicate-check result corresponding to the current second URL link is determined
according to that number.
Specifically, the duplicate-check result may be determined as follows:
If the number of preset characters "1" before the initial count value "0" is greater than 1, the recombined
URL fragment is determined to be a duplicate and needs to be discarded;
Otherwise, the recombined URL fragment is determined not to be a duplicate and can be retained.
(6) According to the duplicate-check result, the second URL link in the to-be-crawled URL queue is retained
or discarded.
It should be understood that the above gives only one specific implementation of joint deduplication and
does not limit the technical solution of the present invention in any way; in practical applications,
those skilled in the art may make reasonable adjustments as needed, and no limitation is imposed here.
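By way of illustration only, the counter behaviour of steps (5-2), (5-5) and (5-6), together with the nearest-reference criterion, might be sketched as follows. The class name `SpaceVariableCounter`, the helper `nearest_reference`, the representation of the counter as a string and the use of integer positions for hash values are assumptions of this sketch; how hash values are placed in the bit-vector space is not modelled.

```python
class SpaceVariableCounter:
    """Counter that starts at "0" and grows by one "1" at the head per hit."""

    def __init__(self):
        self.value = "0"  # initial count value

    def hit(self):
        self.value = "1" + self.value  # head insertion of the preset character

    def ones(self):
        # Number of preset characters "1" before the initial "0".
        return self.value.index("0")


def nearest_reference(position, references):
    """Pick the reference hash value closest to a newly hashed position."""
    return min(references, key=lambda ref: abs(ref - position))


def is_duplicate(counter):
    """Decision rule of step (5-6): more than one "1" means a duplicate."""
    return counter.ones() > 1


references = [2, 10, 40]          # hypothetical reference hash positions
counters = {ref: SpaceVariableCounter() for ref in references}
for new_position in [7, 9, 38]:   # hypothetical newly hashed positions
    counters[nearest_reference(new_position, references)].hit()
```

In this run the counter for reference 10 reads "110" and is judged a duplicate, the counter for reference 40 reads "10", and the counter for reference 2 stays at "0".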
In addition, in practical applications, to further reduce the storage space occupied, after joint
deduplication has been performed on the second URL links in the to-be-crawled URL queue using the
link-feature counting Bloom filter in combination with multiple hashing, each second URL link remaining in
the to-be-crawled URL queue may also be compressed using the MD5 algorithm to obtain the string ciphertext
corresponding to each second URL link. Finally, the content of each second URL link is replaced with its
string ciphertext, so that the second URL links in the to-be-crawled URL queue are compressed as far as
possible and the storage space occupied is reduced.
As the foregoing description shows, the web-crawler-based link extraction method provided in this
embodiment deduplicates the second URL links in the to-be-crawled URL queue before extracting from them,
which further reduces unnecessary interference during link extraction and improves the extraction
efficiency of the web crawler.
In addition, this embodiment uses the link-feature counting Bloom filter in combination with multiple
hashing to perform joint whole-and-part deduplication on the second URL links cached in the to-be-crawled
URL queue. This reduces the false-positive rate of the counting Bloom filter as far as possible, effectively
improves the performance of the web crawler, lets it obtain the information people need quickly and
accurately, and improves the user experience as far as possible.
In addition, during deduplication, the URL links are compressed using a compression algorithm such as the
MD5 algorithm, which reduces the storage space occupied as far as possible.
In addition, an embodiment of the present invention further provides a computer-readable storage medium on
which a web-crawler-based link extraction program is stored; when executed by a processor, the program
implements the steps of the web-crawler-based link extraction method described above.
Referring to Fig. 5, Fig. 5 is a structural block diagram of a first embodiment of the web-crawler-based
link extraction device of the present invention.
As shown in Fig. 5, the web-crawler-based link extraction device provided by the embodiment of the present
invention includes: an extraction module 5001, a sending module 5002, a grabbing module 5003, a parsing
module 5004, a processing module 5005 and an extraction module 5006.
The extraction module 5001 is configured to, upon receiving a data grabbing request for an agricultural
product to be analyzed, extract from the data grabbing request a first uniform resource locator (URL) link
of the platform to be accessed and topic information related to the agricultural product to be analyzed;
the sending module 5002 is configured to send an access request to the platform to be accessed according to
the first URL link; the grabbing module 5003 is configured to grab the data information in the page
corresponding to the first URL link after receiving the response made by the platform to be accessed
according to the access request; the parsing module 5004 is configured to parse the data information,
obtain the second URL links embedded in the page, and add the second URL links to the to-be-crawled URL
queue; the processing module 5005 is configured to process the first URL link and the second URL links in
the to-be-crawled URL queue using the path-aggregation-based anchor multi-attribute integration approach to
obtain the multi-attribute topic information in rich text format corresponding to each second URL link; and
the extraction module 5006 is configured to compare the multi-attribute topic information corresponding to
each second URL link in the to-be-crawled URL queue with the agricultural product topic information and to
extract the second URL links whose multi-attribute topic information has a similarity to the agricultural
product topic information that meets the preset threshold.
It should be understood that each module involved in this embodiment is a logical module. In practical
applications, a logical unit may be a physical unit, part of a physical unit, or a combination of multiple
physical units. In addition, to highlight the innovative part of the present invention, units that are less
closely related to solving the technical problem raised by the present invention are not introduced in this
embodiment, but this does not mean that no other units exist in this embodiment.
In addition, to make it easier to understand the specific processing flow of each functional module of the
web-crawler-based link extraction device provided in this embodiment in practical applications, the
processing of the parsing module 5004, the processing module 5005 and the extraction module 5006 is
described in detail below.
Specifically, the operation performed by the parsing module 5004 of parsing the data information, obtaining
the second URL links embedded in the page, and adding the second URL links to the to-be-crawled URL queue
is implemented in a specific application roughly as follows:
First, the data information is parsed to obtain the second URL links embedded in the page;
Then, the second URL links are parsed to obtain the normalized tags corresponding to the second URL links;
Next, an abstract tree corresponding to the second URL links is generated according to the normalized tags;
Next, based on a DOM tree matching method, the node content of the abstract tree is matched against the
agricultural product topic information, unmatched node content is removed, and the second URL links
matching the agricultural product topic information are obtained;
Finally, the second URL links matching the agricultural product topic information are added to the
to-be-crawled URL queue.
It should be understood that the above gives only one specific way of denoising the second URL links in the
to-be-crawled URL queue and does not limit the technical solution of the present invention in any way; in
a specific application, those skilled in the art may configure this as needed, and the present invention
imposes no limitation on it.
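By way of illustration only, the parsing flow above might be approximated as follows with the standard `html.parser` module. This sketch replaces the abstract tree and DOM tree matching with a much simpler check, namely whether the anchor text of a link contains one of the topic words; that simplification, the class name `AnchorCollector` and the function `matching_links` are assumptions of this sketch.

```python
from html.parser import HTMLParser


class AnchorCollector(HTMLParser):
    """Collect (href, anchor text) pairs for every <a href=...> element."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def matching_links(page_html, topic_words):
    """Return the embedded links whose anchor text mentions a topic word."""
    collector = AnchorCollector()
    collector.feed(page_html)
    collector.close()
    return [
        href
        for href, text in collector.links
        if any(word.lower() in text.lower() for word in topic_words)
    ]


page = '<p><a href="/apples">Apple prices</a> <a href="/about">About us</a></p>'
to_be_crawled = matching_links(page, ["apple"])
```

Here only the link whose anchor text mentions the topic word is added to the queue; the unrelated link is dropped.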
In addition, the operation performed by the processing module 5005 of processing the first URL link and the
second URL links in the to-be-crawled URL queue using the path-aggregation-based anchor multi-attribute
integration approach to obtain the multi-attribute topic information in rich text format corresponding to
each second URL link is implemented in a specific application roughly as follows:
First, a path access digraph corresponding to the agricultural product to be analyzed is generated
according to the first URL link and the second URL links in the to-be-crawled URL queue;
Then, based on the path-aggregation-based anchor multi-attribute integration approach, the shortest access
paths in the path access digraph are determined to obtain a shortest access path set;
Next, the anchor text corresponding to each shortest access path in the shortest access path set is
determined to obtain the access-path anchor-text set corresponding to the shortest access path set, and a
weight is assigned to each element in the access-path anchor-text set;
Next, the weight corresponding to each element in the access-path anchor-text set is normalized according
to a preset weight normalization formula;
Finally, the normalized weights are sorted in descending order to obtain the multi-attribute topic
information in rich text format corresponding to each second URL link.
It should be understood that the above gives only one specific way of obtaining the multi-attribute topic
information in rich text format corresponding to each second URL link in the to-be-crawled URL queue and
does not limit the technical solution of the present invention in any way; in a specific application,
those skilled in the art may configure this as needed, and the present invention imposes no limitation on
it.
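By way of illustration only, the shortest-path and weight-normalization steps above might be sketched as follows. The digraph representation (a dictionary from a URL to its outgoing links and anchor texts), the breadth-first search, the caller-supplied raw weights and the normalization by the sum of the weights are all assumptions of this sketch; the preset weight normalization formula is not given here.

```python
from collections import deque


def shortest_anchor_path(graph, start, target):
    """Breadth-first search over {url: [(next_url, anchor_text), ...]}.

    Returns the anchor texts along one shortest access path, or [] if unreachable.
    """
    parent = {start: None}
    pending = deque([start])
    while pending:
        node = pending.popleft()
        if node == target:
            break
        for next_url, anchor in graph.get(node, []):
            if next_url not in parent:
                parent[next_url] = (node, anchor)
                pending.append(next_url)
    if target not in parent:
        return []
    anchors = []
    node = target
    while parent[node] is not None:
        node, anchor = parent[node]
        anchors.append(anchor)
    return anchors[::-1]


def normalize_and_sort(weights):
    """Divide each weight by the total, then sort in descending order."""
    total = sum(weights.values())
    if total == 0:
        return []
    scaled = [(text, weight / total) for text, weight in weights.items()]
    return sorted(scaled, key=lambda item: item[1], reverse=True)


graph = {
    "home": [("fruit", "Fruit"), ("news", "News")],
    "fruit": [("apples", "Apple prices")],
    "news": [("apples", "Apple story")],
}
anchors = shortest_anchor_path(graph, "home", "apples")
ranked = normalize_and_sort({"Fruit": 1.0, "Apple prices": 3.0})
```

The ranked list of anchor texts and normalized weights plays the role of the multi-attribute topic information for the link in this sketch.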
In addition, the operation performed by the extraction module 5006 of comparing the multi-attribute topic
information corresponding to each second URL link in the to-be-crawled URL queue with the agricultural
product topic information and extracting the second URL links whose multi-attribute topic information has
a similarity to the agricultural product topic information that meets the preset threshold is implemented
in a specific application roughly as follows:
First, a multi-attribute topic feature word is extracted from the multi-attribute topic information and
hashed to obtain a first hash value, the multi-attribute topic feature word being an element of the
access-path anchor-text set corresponding to the multi-attribute topic information;
Then, the weight corresponding to the multi-attribute topic feature word is obtained from the access-path
anchor-text set, and the first hash value is quantified into a first vector in combination with that weight;
Next, an agricultural product topic feature word is extracted from the agricultural product topic
information and hashed to obtain a second hash value;
Next, the second hash value is quantified into a second vector according to the weight preset for the
agricultural product topic feature word;
Finally, the first vector is compared with the second vector, and the second URL links whose
multi-attribute topic information has a similarity to the agricultural product topic information that
meets the preset threshold are extracted.
It should be understood that the above gives only one specific way of extracting specific links from the
to-be-crawled URL queue and does not limit the technical solution of the present invention in any way; in
a specific application, those skilled in the art may configure this as needed, and the present invention
imposes no limitation on it.
As the foregoing description shows, the web-crawler-based link extraction device provided in this
embodiment processes the first URL link of the platform to be accessed and the second URL links in the
to-be-crawled URL queue using the path-aggregation-based anchor multi-attribute integration approach,
obtaining multi-attribute topic information in rich text format for each second URL link. It then compares
the multi-attribute topic information of each second URL link in the to-be-crawled URL queue with the
agricultural product topic information and extracts the second URL links whose multi-attribute topic
information meets the preset similarity threshold. This effectively ensures the accuracy of extracting
specific URL links and avoids the waste of resources caused by crawling irrelevant links, which
significantly improves the performance of the web crawler, lets it obtain the information people need
quickly and accurately, and improves the user experience.
It should be noted that the workflow described above is merely illustrative and does not limit the scope of
protection of the present invention; in practical applications, those skilled in the art may select some
or all of it according to actual needs to achieve the purpose of the solution of this embodiment, and no
limitation is imposed here.
In addition, for technical details not described in detail in this embodiment, reference may be made to
the link deduplication method provided by any embodiment of the present invention; they are not repeated
here.
Based on the first embodiment of the web-crawler-based link extraction device described above, a second
embodiment of the web-crawler-based link extraction device of the present invention is proposed.
In this embodiment, the web-crawler-based link extraction device further includes a deduplication module.
The deduplication module is configured to perform joint deduplication on the second URL links in the
to-be-crawled URL queue using a link-feature counting Bloom filter in combination with multiple hashing.
It should be noted that each module involved in this embodiment is a logical module. In practical
applications, a logical unit may be a physical unit, part of a physical unit, or a combination of multiple
physical units. In addition, to highlight the innovative part of the present invention, units that are less
closely related to solving the technical problem raised by the present invention are not introduced in this
embodiment, but this does not mean that no other units exist in this embodiment.
In addition, it is worth noting that when the deduplication module of this embodiment performs joint
deduplication on the second URL links in the to-be-crawled URL queue using the link-feature counting Bloom
filter in combination with multiple hashing, the deduplication is specifically divided into deduplication
of the overall-feature URL link corresponding to each URL link and deduplication of URL link fragments.
Because the URL link fragments are derived from the overall-feature URL link, the correspondence between
each second URL link and its overall-feature URL link must be determined first so that the deduplication
module can perform the above operation smoothly.
The correspondence between a second URL link and its overall-feature URL link may be determined roughly as
follows:
First, the to-be-crawled URL queue is traversed, feature analysis is performed on the current second URL
link being traversed, and the protocol type part, the path part and the query part of the current second
URL link are extracted;
Then, the overall-feature URL link corresponding to the current second URL link is obtained according to
the protocol type part, the path part and the query part;
Finally, a correspondence between the current second URL link and the overall-feature URL link is
established, and the correspondence is updated into the to-be-crawled URL queue.
Correspondingly, after the above correspondence has been obtained, the operation performed by the
deduplication module is specifically as follows:
First, the to-be-crawled URL queue is traversed, and the overall-feature URL link corresponding to the
current second URL link being traversed is obtained;
Then, an overall duplicate check is performed on the overall-feature URL link using the link-feature
counting Bloom filter, and a duplicate-check identifier corresponding to the overall-feature URL link is
obtained;
Next, feature identification is performed on the overall-feature URL link according to the duplicate-check
identifier, and multiple feature fragments are obtained;
Next, the multiple feature fragments are recombined according to a preset URL link recombination rule, and
N recombined URL link fragments are obtained;
Next, multiple-hash duplicate checking is performed on the N recombined URL link fragments, and the
duplicate-check result corresponding to the current second URL link is obtained;
Finally, according to the duplicate-check result, the second URL link in the to-be-crawled URL queue is
retained or discarded.
It should be noted that, in this embodiment, N is an integer greater than or equal to 1.
It should also be understood that the above gives only one specific way of determining the correspondence
between a second URL link and its overall-feature URL link and of performing joint deduplication on the
second URL links in the to-be-crawled URL queue using the link-feature counting Bloom filter in combination
with multiple hashing, and does not limit the technical solution of the present invention in any way; in a
specific application, those skilled in the art may configure this as needed, and the present invention
imposes no limitation on it.
Further, in practical applications, in order to reduce as far as possible the storage space occupied by the second URL links cached in the URL queue to be crawled, after the plurality of feature fragments are recombined according to the preset URL link recombination rule to obtain the N recombined URL link fragments, each of the N recombined URL link fragments may first be compressed based on the MD5 algorithm to obtain the character string ciphertext corresponding to that recombined URL link fragment, and the content of the corresponding recombined URL link fragment is then replaced with the character string ciphertext.
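The compression step can be sketched as follows, using the `hashlib.md5` function of the Python standard library. The function names and the choice of the hexadecimal digest as the "character string ciphertext" are assumptions made for illustration.

```python
import hashlib
from typing import List


def md5_ciphertext(fragment: str) -> str:
    """Return the 32-character hexadecimal MD5 digest of a fragment."""
    return hashlib.md5(fragment.encode("utf-8")).hexdigest()


def compress_fragments(fragments: List[str]) -> List[str]:
    """Replace each recombined URL link fragment with its MD5 ciphertext."""
    return [md5_ciphertext(fragment) for fragment in fragments]
```

An MD5 digest always has a length of 128 bits (32 hexadecimal characters), so the storage occupied by each fragment becomes fixed regardless of the length of the original fragment. The digest cannot be converted back into the fragment, which is acceptable here because the ciphertext is only compared, not restored.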
Correspondingly, the operation of performing multiple-hash duplicate checking on the N recombined URL link fragments to obtain the duplicate-check result corresponding to the current second URL link is specifically as follows:
First, the character string ciphertexts corresponding to the N recombined URL link fragments are extracted, any one character string ciphertext is selected from the N character string ciphertexts, and K hash operations are performed on it to obtain K hash values;
Then, the K hash values are hashed, as reference hash values, into a bit vector space constructed in advance, and an initial count value is set for the spatially variable counter corresponding to each reference hash value;
Then, K hash operations are performed on each of the remaining N-1 character string ciphertexts, to obtain the K hash values corresponding to each remaining character string ciphertext;
Then, the K hash values corresponding to each remaining character string ciphertext are randomly hashed into the bit vector space, each being adjacent to any one of the reference hash values;
Then, for each hash value newly hashed into the bit vector space, one preset character is inserted, by head insertion, before the initial count value corresponding to the adjacent reference hash value;
Finally, the number of preset characters before the initial count value corresponding to each reference hash value is counted, and the duplicate-check result corresponding to the current second URL link is determined according to the number of preset characters.
It should be noted that in the present embodiment, above-mentioned described K is the integer more than or equal to 2.
It should be understood, however, that the above is only one specific implementation of obtaining the duplicate-check result corresponding to the current second URL link, and does not constitute any limitation on the technical solution of the present invention. In specific applications, those skilled in the art may configure this as needed, and the present invention imposes no limitation thereon.
In addition, in practical applications, in order to further reduce the occupied storage space, after the second URL links in the URL queue to be crawled are jointly deduplicated, each second URL link in the deduplicated URL queue to be crawled may also be compressed based on the MD5 algorithm to obtain the character string ciphertext corresponding to each second URL link. The content of the corresponding second URL link is then replaced with the character string ciphertext, so that the second URL links in the URL queue to be crawled are compressed as far as possible and the occupied storage space is reduced.
From the foregoing description, it is not difficult to see that the link extraction device based on a web crawler provided in this embodiment deduplicates the second URL links in the URL queue to be crawled before performing the extraction operation on them, thereby further reducing unnecessary interference in the link extraction process and improving the extraction efficiency of the web crawler.
In addition, this embodiment uses the chain-feature counting Bloom filter in combination with multiple hashing to perform joint whole-and-partial deduplication on the second URL links cached in the URL queue to be crawled, thereby reducing the misjudgment rate of the counting Bloom filter as far as possible. This effectively improves the performance of the web crawler, enables the web crawler to obtain the information people need quickly and accurately, and improves the user experience as far as possible.
In addition, during deduplication, the URL links are compressed based on a compression algorithm, such as the MD5 algorithm, so that the occupied storage space is reduced as far as possible.
It should be noted that the workflow described above is only illustrative and does not limit the protection scope of the present invention. In practical applications, those skilled in the art may select some or all of it according to actual needs to achieve the purpose of the solution of this embodiment, and no limitation is imposed here.
In addition, for technical details not described in detail in this embodiment, reference may be made to the link deduplication method provided by any embodiment of the present invention, and details are not repeated here.
In addition, it should be noted that, herein, the terms "include", "comprise" or any other variant thereof are intended to cover a non-exclusive inclusion, so that a process, method, article or system that includes a series of elements includes not only those elements, but also other elements that are not explicitly listed, or elements inherent to such a process, method, article or system. In the absence of further restrictions, an element defined by the phrase "including a ..." does not exclude the existence of other identical elements in the process, method, article or system that includes the element.
The serial numbers of the above embodiments of the present invention are for description only and do not represent the advantages or disadvantages of the embodiments.
Through the above description of the embodiments, those skilled in the art can clearly understand that the methods of the above embodiments can be implemented by means of software plus a necessary general-purpose hardware platform, and of course can also be implemented by hardware, but in many cases the former is the better implementation. Based on this understanding, the technical solution of the present invention, in essence or in the part that contributes to the prior art, can be embodied in the form of a software product. The computer software product is stored in a storage medium (such as a read-only memory (Read Only Memory, ROM)/RAM, a magnetic disk or an optical disc) and includes several instructions for causing a terminal device (which may be a mobile phone, a computer, a server, a network device or the like) to execute the methods described in the embodiments of the present invention.
The above are only preferred embodiments of the present invention and do not limit the patent scope of the present invention. Any equivalent structure or equivalent process transformation made using the contents of the specification and drawings of the present invention, or any direct or indirect application thereof in other related technical fields, is likewise included within the patent protection scope of the present invention.