CN106168973A - Network data classified collection method and device - Google Patents
- Publication number
- CN106168973A (application CN201610542380.0A)
- Authority
- CN
- China
- Prior art keywords
- parameter
- data
- sorting
- sorting parameter
- collected
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/958—Organisation or management of web site content, e.g. publishing, maintaining pages or automatic linking
Abstract
The invention provides a network data classified collection method and a device, wherein the method comprises the following steps: determining data to be acquired and determining at least one classification parameter corresponding to the data to be acquired; determining a parameter value corresponding to each classification parameter; generating an entry link corresponding to each classification parameter according to each classification parameter and the corresponding parameter value; and acquiring data corresponding to the corresponding classification parameters one by one aiming at each entry link. According to the method and the device, the data to be collected are classified, each classification parameter and the corresponding parameter value are spliced into the entry link, the list page corresponding to the entry link can be displayed by accessing the entry link, the list page corresponding to each classification has less content, so that the list page of each classification can be completely displayed even if the website has limitation on the number of displayed pages, and the function of preventing data missing collection can be realized by collecting the data of the displayed list page.
Description
Technical field
The present invention relates to the field of big data application and analysis, and in particular to a network data classified collection method and device.
Background
The era of big data has quietly arrived. The network is flooded with public information, and large Internet websites are everywhere, so these websites have become key targets of data collection.
The current collection method is as follows: find the original list page corresponding to the desired data on a website. Because the amount of information is very large, this original list page contains many paging pages, and the data of each paging page is collected by turning pages. When collecting data for a paging page, the detail page links listed on that page must be accessed one by one, so that all the desired data on the website is collected.
However, the total data volume of a large Internet website is very large and, being limited by the hardware environment, the website generally displays only part of its data. The existing method collects data only from the detail page links that are displayed, so it cannot cover all of the website's information, which leads to missed data.
Summary of the invention
Embodiments of the present invention provide a network data classified collection method and device that can effectively solve the problem of missed data collection in the prior art.
In a first aspect, an embodiment of the present invention provides a network data classified collection method, including:
determining data to be collected, and determining at least one classification parameter corresponding to the data to be collected;
determining the parameter value corresponding to each classification parameter;
generating, according to each classification parameter and its corresponding parameter value, the entry link corresponding to each classification parameter;
for each entry link, collecting one by one the data corresponding to the respective classification parameter.
Preferably,
determining the parameter value corresponding to each classification parameter includes:
determining the target website where the data to be collected is located;
obtaining, in the target website, the original list page corresponding to the data to be collected;
selecting each classification parameter one by one in the original list page to obtain the classification link corresponding to each classification parameter;
determining, according to each obtained classification link, the parameter value corresponding to each classification parameter.
Preferably,
determining the parameter value corresponding to each classification parameter includes:
obtaining a pre-stored target parameter list for the data to be collected;
determining, according to the correspondence stored in the target parameter list, the parameter value corresponding to each classification parameter.
Preferably,
generating, according to each classification parameter and its corresponding parameter value, the entry link corresponding to each classification parameter includes:
for each current classification parameter and its corresponding current parameter value, performing the following operations: splicing the current classification parameter, the current parameter value and a set character according to a set format; and adding the spliced content to the classification link corresponding to the current classification parameter to obtain the entry link corresponding to the current classification parameter.
Preferably,
collecting, for each entry link, the data corresponding to the respective classification parameter one by one includes:
for each current entry link, performing the following operations:
obtaining the target list page corresponding to the current entry link, the target list page including at least one paging page;
accessing the detail links in each paging page, and collecting data from the accessed detail links.
In a second aspect, an embodiment of the present invention provides a network data classified collection device, including:
a first determining unit, configured to determine data to be collected and to determine at least one classification parameter corresponding to the data to be collected;
a second determining unit, configured to determine the parameter value corresponding to each classification parameter;
a generating unit, configured to generate, according to each classification parameter and its corresponding parameter value, the entry link corresponding to each classification parameter;
a collecting unit, configured to collect, for each entry link, the data corresponding to the respective classification parameter one by one.
Preferably,
the second determining unit includes:
a first determining subunit, configured to determine the target website where the data to be collected is located;
a first obtaining subunit, configured to obtain, in the target website, the original list page corresponding to the data to be collected;
a selecting subunit, configured to select each classification parameter one by one in the original list page and obtain the classification link corresponding to each classification parameter;
a second determining subunit, configured to determine, according to each obtained classification link, the parameter value corresponding to each classification parameter.
Preferably,
the second determining unit includes:
a second obtaining subunit, configured to obtain a pre-stored target parameter list for the data to be collected;
a third determining subunit, configured to determine, according to the correspondence stored in the target parameter list, the parameter value corresponding to each classification parameter.
Preferably,
the generating unit is specifically configured to perform the following operations for each current classification parameter and its corresponding current parameter value: splicing the current classification parameter, the current parameter value and a set character according to a set format; and adding the spliced content to the classification link corresponding to the current classification parameter to obtain the entry link corresponding to the current classification parameter.
Preferably,
the collecting unit is specifically configured to perform the following operations for each current entry link: obtaining the target list page corresponding to the current entry link, the target list page including at least one paging page; accessing the detail links in each paging page, and collecting data from the accessed detail links.
Embodiments of the present invention provide a network data classified collection method and device. At least one classification parameter of the data to be collected is determined so that the data to be collected is classified, and each classification parameter and its corresponding parameter value are spliced into an entry link. Accessing an entry link displays the list page corresponding to that entry link. Because the list page for each classification has less content, the list page of each classification can be displayed completely even if the website limits the number of pages it displays. Collecting data from the displayed list pages therefore prevents missed data.
Brief description of the drawings
To illustrate the technical solutions of the embodiments of the present invention or of the prior art more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below show some embodiments of the present invention, and those of ordinary skill in the art can obtain other drawings from them without creative effort.
Fig. 1 is a flow chart of a network data classified collection method provided by an embodiment of the present invention;
Fig. 2 is a flow chart of another network data classified collection method provided by an embodiment of the present invention;
Fig. 3 is a hardware structure diagram of the equipment in which the device provided by an embodiment of the present invention is located;
Fig. 4 is a structure diagram of the network data classified collection device provided by an embodiment of the present invention.
Detailed description of the invention
To make the purpose, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are some, not all, of the embodiments of the present invention. All other embodiments obtained by those of ordinary skill in the art based on these embodiments without creative effort fall within the scope of protection of the present invention.
As shown in Fig. 1, an embodiment of the present invention provides a network data classified collection method, which may include the following steps:
Step 101: determine data to be collected, and determine at least one classification parameter corresponding to the data to be collected;
Step 102: determine the parameter value corresponding to each classification parameter;
Step 103: according to each classification parameter and its corresponding parameter value, generate the entry link corresponding to each classification parameter;
Step 104: for each entry link, collect one by one the data corresponding to the respective classification parameter.
An embodiment of the present invention provides a network data classified collection method. At least one classification parameter of the data to be collected is determined so that the data to be collected is classified, and each classification parameter and its corresponding parameter value are spliced into an entry link. Accessing an entry link displays the list page corresponding to that entry link. Because the list page for each classification has less content, the list page of each classification can be displayed completely even if the website limits the number of pages it displays. Collecting data from the displayed list pages therefore prevents missed data.
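Steps 101 to 104 can be sketched as a small driver loop. This is only an illustrative outline, not the patented implementation: the function name `collect_by_classification` and its three callback parameters are hypothetical, and the callbacks stand in for the concrete ways of resolving parameter values, building entry links and collecting data that the later embodiments describe.

```python
from typing import Callable, Dict, Iterable, List


def collect_by_classification(
    class_params: Iterable[str],
    resolve_value: Callable[[str], str],
    build_entry_link: Callable[[str, str], str],
    collect_from: Callable[[str], List[str]],
) -> Dict[str, List[str]]:
    """Run steps 102-104 for every classification parameter from step 101."""
    results: Dict[str, List[str]] = {}
    for param in class_params:                    # output of step 101
        value = resolve_value(param)              # step 102: parameter value
        entry_link = build_entry_link(param, value)  # step 103: entry link
        results[param] = collect_from(entry_link)    # step 104: collect data
    return results


# Usage with stand-in callbacks (no network access; all values are made up).
demo = collect_by_classification(
    ["a", "b"],
    resolve_value=lambda p: {"a": "1", "b": "2"}[p],
    build_entry_link=lambda p, v: f"http://example.invalid/list?{p}={v}",
    collect_from=lambda link: [link + "#item"],
)
```

Passing the three operations in as callbacks keeps the loop independent of any particular website, which mirrors how the method leaves the link format and the collection mechanism open.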
In an embodiment of the present invention, to make the collection process comprehensive and well organized, determining the parameter value corresponding to each classification parameter includes:
determining the target website where the data to be collected is located;
obtaining, in the target website, the original list page corresponding to the data to be collected;
selecting each classification parameter one by one in the original list page to obtain the classification link corresponding to each classification parameter;
determining, according to each obtained classification link, the parameter value corresponding to each classification parameter.
For example, suppose the data to be collected is the information on all McDonald's outlets in the Beijing area on Meituan. First, the target website where the data is located is determined to be Meituan, and "McDonald's" is taken as the determined classification parameter. Next, the Meituan home page is opened and "Beijing" is entered in the search bar of the home page, whereupon the system generates an original list page. The McDonald's option is then found and clicked in that original list page, and the system generates the list page corresponding to McDonald's in the Beijing area on Meituan. Finally, the classification link corresponding to McDonald's is obtained from the current list page, which yields the parameter value corresponding to "McDonald's".
For example, if the obtained classification link is:
http://bj.meituan.com/shops/?W=%E9%BA%A6%E5%BD%93%E5%8A%B3&mtt=1
then the 1 in this classification link can be taken as the parameter value of the classification parameter "McDonald's".
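Reading the parameter value out of a classification link can be sketched with Python's standard `urllib.parse` module. This is a minimal sketch under one assumption: that the value sits in an ordinary query parameter, here taken to be `mtt` because that is the key carrying the 1 in the example link. The helper name `parameter_value` is hypothetical.

```python
from urllib.parse import parse_qs, urlsplit


def parameter_value(classification_link: str, key: str) -> str:
    """Return the first value of query parameter `key` in the link."""
    query = urlsplit(classification_link).query
    values = parse_qs(query).get(key)
    if not values:
        raise KeyError(f"{key!r} not found in {classification_link!r}")
    return values[0]


# The classification link from the example; `mtt` is assumed to be the key
# that holds the parameter value.
link = "http://bj.meituan.com/shops/?W=%E9%BA%A6%E5%BD%93%E5%8A%B3&mtt=1"
value = parameter_value(link, "mtt")
```

A site that encodes the value in the path rather than the query string would need a different extraction rule, so the key is kept as an argument instead of being hard-coded.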
By obtaining the parameter value corresponding to each classification parameter from the current link corresponding to that classification parameter, the classification parameters and their values can be used to classify all the data of a large website, which avoids the missed-collection problem caused by incomplete display on the website. In addition, this way of classifying and obtaining parameter values is widely applicable and simple to operate: when the number of categories of the data to be collected is small, the parameter value for each classification parameter can be obtained simply and conveniently.
In an embodiment of the present invention, to make the collection process comprehensive and well organized, determining the parameter value corresponding to each classification parameter includes:
obtaining a pre-stored target parameter list for the data to be collected;
determining, according to the correspondence stored in the target parameter list, the parameter value corresponding to each classification parameter.
When the data to be collected falls into many categories, for example when it is the information on McDonald's, KFC, Weiduomei and other outlets in the Beijing area, the target parameter list can be obtained first, and the correspondence between classification parameters and parameter values can then be looked up in it.
For example, this correspondence can be as shown in Table 1 below:
Table 1:
Classification parameter | Parameter value |
McDonald's | 1 |
KFC | 2 |
Weiduomei | 3 |
…… | …… |
As can be seen from the correspondence in Table 1, the parameter value corresponding to "McDonald's" is 1, the parameter value corresponding to "KFC" is 2, and the parameter value corresponding to "Weiduomei" is 3.
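The lookup in a pre-stored target parameter list amounts to a dictionary read. The sketch below assumes the list has already been loaded into memory as a mapping; the names `TARGET_PARAMETERS` and `lookup_values` are hypothetical, and only two of the rows from Table 1 are reproduced.

```python
from typing import Dict, Iterable, Mapping

# Two rows of Table 1, held as a mapping (storage format is an assumption).
TARGET_PARAMETERS: Dict[str, str] = {"McDonald's": "1", "KFC": "2"}


def lookup_values(
    class_params: Iterable[str], table: Mapping[str, str]
) -> Dict[str, str]:
    """Map each classification parameter to its stored parameter value."""
    params = list(class_params)
    missing = [p for p in params if p not in table]
    if missing:
        # Fail loudly: a silently skipped category would mean missed data.
        raise KeyError(f"no stored parameter value for: {missing}")
    return {p: table[p] for p in params}


values = lookup_values(["McDonald's", "KFC"], TARGET_PARAMETERS)
```

Raising on an unknown classification parameter, rather than skipping it, is a design choice made here so that a gap in the table cannot quietly turn into a gap in the collected data.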
In this way, the parameter value corresponding to each classification parameter can be obtained quickly. Especially when the data to be collected falls into many categories, the correspondence between each classification parameter and its parameter value can simply be called up, which saves some time during data collection.
In an embodiment of the present invention, to prevent missed data, generating the entry link corresponding to each classification parameter according to each classification parameter and its corresponding parameter value includes:
for each current classification parameter and its corresponding current parameter value, performing the following operations: splicing the current classification parameter, the current parameter value and a set character according to a set format; and adding the spliced content to the classification link corresponding to the current classification parameter to obtain the entry link corresponding to the current classification parameter.
The format for adding the spliced content can also be set according to user requirements. For example, the adding format is: first splice the classification parameter, the parameter value and the set character according to the set format, then append the spliced content to the end of the current classification link. Take as an example the classification parameter "McDonald's" (romanized in the link as Mai Danglao), the parameter value "1", the set character "&=", the set format "classification parameter, set character and parameter value spliced in that order", and the current classification link "http://bj.meituan.com/shops/&mtt=1". The spliced content is then "Mai Danglao&=1", and the resulting entry link for the current classification parameter is http://bj.meituan.com/shops/&mtt=1Mai Danglao&=1.
The entry link generated from the classification parameter, the parameter value and the set character covers all the website data corresponding to the current classification parameter. Accessing the current entry link, rather than the partial data displayed on the website as in the conventional collection process, makes it possible to collect all the data corresponding to the current classification parameter, which prevents missed data.
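The splice-and-append format from the Meituan example can be written as two string operations. This is a sketch only: the helper names `splice` and `append_to_link` are hypothetical, and the romanized string "Mai Danglao" is used as the classification parameter simply because that is how it appears in the example link.

```python
def splice(class_param: str, value: str, set_char: str) -> str:
    """Set format: classification parameter, set character, parameter value."""
    return f"{class_param}{set_char}{value}"


def append_to_link(classification_link: str, spliced: str) -> str:
    """Adding format: put the spliced content at the very end of the link."""
    return classification_link + spliced


spliced = splice("Mai Danglao", "1", "&=")
entry_link = append_to_link("http://bj.meituan.com/shops/&mtt=1", spliced)
```

Keeping the set format and the adding format in separate functions reflects the text's point that each can be changed independently according to user requirements.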
Below, taking jobs in Beijing as the data to be collected, the network data classified collection method in the embodiments of the present invention is described in detail. As shown in Fig. 2, an embodiment of the present invention provides a network data classified collection method, which may include:
Step 201: determine that the data to be collected is jobs in Beijing.
In this step, the data to be collected is generally given in text form. The data to be collected must be determined first; only then can it be classified and at least one classification parameter be determined for it. Therefore, the text is obtained first, its content is read through, and the data to be collected is finally determined. In this embodiment, the data to be collected is determined to be jobs in Beijing.
Step 202: determine at least one classification parameter corresponding to the Beijing job data.
In this step, after the data to be collected is determined, it is classified so as to determine at least one classification parameter, which lays the foundation for obtaining the corresponding parameter values later.
When the job data of the Beijing area is classified, the number of classification parameters and the categories can be set according to user requirements, but there must be at least one classification parameter. For example, jobs in the Beijing area can be divided into four classes: "state-owned enterprise", "bachelor's degree", "salary" and "work experience". This embodiment takes the classification parameters "state-owned enterprise" and "bachelor's degree" as an example and divides the jobs in the Beijing area into these two classes.
Step 203: determine that the target website where the data to be collected is located is the Zhaopin website.
In this step, after at least one classification parameter corresponding to the data to be collected is determined (here "state-owned enterprise" and "bachelor's degree"), the target website where the data is located should first be determined from the data to be collected, so that the parameter value corresponding to each classification parameter can be obtained.
The target website can be any recruitment website and can also be selected according to user requirements, for example Zhaopin, 51job or Dajie. This embodiment uses Zhaopin as the target website.
Step 204: obtain, on the Zhaopin website, the original list page corresponding to jobs in Beijing.
In this step, after the target website of the data to be collected is determined to be Zhaopin, the Zhaopin website is opened first, and the keyword "Beijing" is then entered on the website to obtain the original list page corresponding to jobs in the Beijing area. The data in this original list page is the part of the Beijing job data that the website displays.
Step 205: select each classification parameter one by one in the original list page to obtain the classification link corresponding to each classification parameter.
In this step, after the original list page corresponding to the data to be collected is obtained on the target website, the parameter value corresponding to each classification parameter can be obtained by generating the link corresponding to each classification parameter.
Take as an example the classification parameters "state-owned enterprise" and "bachelor's degree", with the link of the original list page for Beijing jobs being "http://sou.zhaopin.com/jobs/=&sm=0&isfilter=1&p=1&ct=-1". In the list page under the current link, find the filter items for company type and education requirement, which are usually at the top or the side of the original list page, and click "state-owned enterprise" and "bachelor's degree" in these two filter items. The system generates a list page under the current classification parameter for each classification parameter. The link of the list page obtained for the classification parameter "state-owned enterprise" is http://sou.zhaopin.com/jobs/sm=0&isfilter=1&p=1&ct=1, and the link of the list page for the classification parameter "bachelor's degree" is http://sou.zhaopin.com/jobs/=&sm=0&ct=-1&isfilter=1&p=1&el=4.
Step 206: according to the obtained links for state-owned enterprise and bachelor's degree, determine the parameter values corresponding to state-owned enterprise and bachelor's degree respectively.
In this step, the parameter value corresponding to each classification parameter can be read from the link of the list page on the website.
Optionally, another way to obtain the parameter value corresponding to a classification parameter is to use a keyboard shortcut to obtain the correspondence between each classification parameter and its parameter value on the target website, and to determine the parameter value of each classification parameter from this correspondence.
When the target website was built, the correspondence between each classification parameter and its parameter value was stored, and the user can obtain this stored correspondence directly.
The shortcut can be one set by the developers during software development; for example, the shortcut is F12.
Step 207: assemble the entry links.
In this step, to collect data from the website, an entry link corresponding to each classification parameter must be generated, which lays the foundation for accessing the corresponding links next. Because the entry links generated in this step are based on the obtained classification parameters and their corresponding parameter values, they can cover all the data to be collected on the website, so that the collection process is comprehensive and well organized and missed data is prevented. Specifically, at least one classification parameter, its parameter value and a set character are spliced according to a set format, and the spliced content is then added to the current link corresponding to the current classification parameter, which yields the entry link under the current classification parameter.
The set character can be any character, and there can be one or more characters. For example, the set character is "&"; as another example, the set character is "%&".
Further, the set format can also be set according to user requirements. For example, the set format is that the classification parameter, the character and the parameter value are spliced in that order. With the set character "&", the classification parameter "state-owned enterprise" (romanized as Guo Qi) and the parameter value "1", the spliced content is "Guo Qi&1".
Further, the format for adding the spliced content can also be set according to user requirements. For example, the adding format is: place the spliced content before the parameter value in the current link of the classification parameter. With the classification parameter "state-owned enterprise" and its current link "http://sou.zhaopin.com/jobs/sm=0&isfilter=1&p=1&ct=1", the entry link obtained for "state-owned enterprise" is http://sou.zhaopin.com/jobs/sm=0&isfilter=1&p=1&ct=Guo Qi&11.
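The second adding format, in which the spliced content is placed before the parameter value, can be sketched as follows. The sketch assumes, as in the "state-owned enterprise" example, that the parameter value is the final substring of the current link; the helper name `insert_before_value` is hypothetical.

```python
def insert_before_value(current_link: str, value: str, spliced: str) -> str:
    """Insert `spliced` immediately before the trailing parameter value."""
    if not current_link.endswith(value):
        # This sketch only handles links that end with the parameter value.
        raise ValueError("link does not end with the parameter value")
    head = current_link[: len(current_link) - len(value)]
    return head + spliced + value


current = "http://sou.zhaopin.com/jobs/sm=0&isfilter=1&p=1&ct=1"
entry_link = insert_before_value(current, "1", "Guo Qi&1")
```

A link in which the parameter value appears in the middle would need the position located some other way, for example by searching for the key that precedes it.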
Step 208: for the two generated entry links, collect one by one the data corresponding to the respective classification parameters.
In this step, the generated entry links are accessed so that, with all the website data covered, the data corresponding to each classification parameter is collected comprehensively. Specifically, this includes:
obtaining the target list page corresponding to the current entry link, the target list page including at least one paging page;
accessing the detail links in each paging page, and collecting data from the accessed detail links.
Take as an example the entry link "http://sou.zhaopin.com/jobs/sm=0&isfilter=1&p=1&ct=Guo Qi&11" corresponding to the classification parameter "state-owned enterprise" and the entry link "http://sou.zhaopin.com/jobs/=&sm=0&ct=-1&isfilter=1&p=1&el=4Ben Ke&22" corresponding to "bachelor's degree". The two links are first accessed in turn, and the system automatically generates a corresponding list page for each of the two entry links.
Because the amount of information to collect is large, each list page has many paging pages. For example, the job data for "state-owned enterprise" spans 20 pages in total and the job data for "bachelor's degree" spans 30 pages. The generated paging pages are accessed in turn by turning pages. For example, for the list page of all data generated for the classification parameter "state-owned enterprise", each page from page 1 to page 20 can be accessed in turn.
Further, by accessing the detail links in the paging pages for state-owned enterprise and bachelor's degree, all the data under the current classification is obtained. Likewise, taking the list page of all data generated for the classification parameter "state-owned enterprise" as an example, after all the paging pages from page 1 to page 20 are obtained, each detail link on every page is accessed, and the job information of all state-owned enterprises in the Beijing area is finally collected.
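The page-turning collection in step 208 can be sketched as two nested loops: one over the paging pages of an entry link, one over the detail links on each page. The sketch below performs no network access; the function `collect_entry_link` and its three callbacks are hypothetical, and the stand-in callbacks simply fabricate 20 pages with two detail links each to mirror the "state-owned enterprise" example.

```python
from typing import Callable, Dict, Iterable, List


def collect_entry_link(
    entry_link: str,
    paging_pages: Callable[[str], Iterable[str]],
    detail_links: Callable[[str], Iterable[str]],
    fetch_detail: Callable[[str], Dict[str, str]],
) -> List[Dict[str, str]]:
    """Visit every paging page of an entry link and every detail link on it."""
    records: List[Dict[str, str]] = []
    for page in paging_pages(entry_link):      # page 1, page 2, ...
        for link in detail_links(page):        # each detail link on the page
            records.append(fetch_detail(link)) # collect data from the detail
    return records


# Stand-in callbacks: 20 paging pages, 2 detail links per page (made up).
def fake_pages(entry_link: str) -> List[str]:
    return [f"{entry_link}#page{n}" for n in range(1, 21)]


def fake_details(page: str) -> List[str]:
    return [f"{page}/job{i}" for i in (1, 2)]


def fake_fetch(link: str) -> Dict[str, str]:
    return {"source": link}


records = collect_entry_link(
    "http://example.invalid/list", fake_pages, fake_details, fake_fetch
)
```

In a real collector the callbacks would download and parse pages; injecting them here keeps the traversal order visible without committing to any particular HTTP or HTML library.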
As shown in Fig. 3 and Fig. 4, an embodiment of the present invention provides a network data classified collection device. The device embodiment can be implemented in software, in hardware, or in a combination of the two. At the hardware level, Fig. 3 is a hardware structure diagram of the equipment in which the network data classified collection device provided by the embodiment is located. In addition to the processor, memory, network interface and non-volatile memory shown in Fig. 3, the equipment in which the device is located may generally include other hardware, such as a forwarding chip responsible for processing messages. Taking a software implementation as an example, as shown in Fig. 4, the device in the logical sense is formed by the CPU of the equipment in which it is located reading the corresponding computer program instructions from the non-volatile memory into memory and running them. The network data classified collection device provided by this embodiment includes:
a first determining unit 401, configured to determine data to be collected and to determine at least one classification parameter corresponding to the data to be collected;
a second determining unit 402, configured to determine the parameter value corresponding to each classification parameter;
a generating unit 403, configured to generate, according to each classification parameter and its corresponding parameter value, the entry link corresponding to each classification parameter;
a collecting unit 404, configured to collect, for each entry link, the data corresponding to the respective classification parameter one by one.
In an embodiment of the present invention, the second determining unit 402 includes:
A first determining subunit, configured to determine the target website where the data to be collected is located;
A first obtaining subunit, configured to obtain, in the target website, the original list corresponding to the data to be collected;
A selecting subunit, configured to select each classification parameter in the original list one by one and obtain the classification link corresponding to each classification parameter;
A second determining subunit, configured to determine, according to each classification link obtained, the parameter value corresponding to each classification parameter.
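As a rough illustration of how the second determining subunit might read a parameter value out of a classification link, the sketch below assumes that each classification link is a URL whose query string carries the classification parameter as a key. That layout, the function names, and the `example.com` links are assumptions; the source does not specify the link format.

```python
from typing import Dict
from urllib.parse import parse_qs, urlsplit


def value_from_classification_link(link: str, name: str) -> str:
    """Return the first value stored under the query key `name` in the link."""
    query = parse_qs(urlsplit(link).query)
    return query[name][0]


def determine_parameter_values(links: Dict[str, str]) -> Dict[str, str]:
    """Map each classification parameter to the value found in its link."""
    return {
        name: value_from_classification_link(link, name)
        for name, link in links.items()
    }


# Hypothetical classification links, as if obtained by selecting each
# classification parameter in the original list one by one.
classification_links = {
    "area": "https://example.com/jobs?area=010",
    "type": "https://example.com/jobs?type=2",
}
parameter_values = determine_parameter_values(classification_links)
```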
In an embodiment of the present invention, the second determining unit 402 includes:
A second obtaining subunit, configured to obtain a pre-stored target parameter list for the data to be collected;
A third determining subunit, configured to determine, according to the correspondence stored in the target parameter list, the parameter value corresponding to each classification parameter.
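The pre-stored alternative amounts to a table lookup. The following minimal sketch assumes the target parameter list is stored as JSON mapping each classification parameter to its parameter value; the storage format, the sample entries, and the function name are assumptions made for illustration.

```python
import json
from typing import Dict, List

# Hypothetical pre-stored target parameter list:
# classification parameter -> parameter value.
STORED_TARGET_PARAMETER_LIST = '{"area": "010", "type": "2", "salary": "5"}'


def lookup_parameter_values(names: List[str], stored: str) -> Dict[str, str]:
    """Read the stored correspondence and return the value for each name."""
    table = json.loads(stored)
    missing = [name for name in names if name not in table]
    if missing:
        # Fail loudly instead of silently skipping a class and missing its data.
        raise KeyError(f"no stored parameter value for: {missing}")
    return {name: table[name] for name in names}


values = lookup_parameter_values(["area", "type"], STORED_TARGET_PARAMETER_LIST)
```

Because no page has to be fetched to discover the values, this route suits cases with many classes, at the cost of keeping the stored list up to date.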
In an embodiment of the present invention, the generating unit 403 is specifically configured to:
Perform the following operations for each current classification parameter and its corresponding current parameter value: splice the current classification parameter, the current parameter value, and set characters together in a set format; add the spliced content to the classification link corresponding to the current classification parameter to obtain the entry link corresponding to the current classification parameter.
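The splicing step can be sketched as follows. The source does not state which characters or format are used, so this sketch assumes the common URL query convention: `=` joins the classification parameter to its value, and `?` or `&` attaches the spliced fragment to the classification link. The function name and the URLs are hypothetical.

```python
from urllib.parse import quote


def build_entry_link(classification_link: str, name: str, value: str,
                     joiner: str = "=", separator: str = "&") -> str:
    """Splice name, joiner and value, then add the result to the link."""
    fragment = f"{quote(name)}{joiner}{quote(value)}"
    # Start a query string with "?" if the link has none yet; otherwise extend it.
    lead = separator if "?" in classification_link else "?"
    return f"{classification_link}{lead}{fragment}"


entry_link = build_entry_link("https://example.com/jobs?type=2", "area", "010")
```

Percent-encoding the name and value with `quote` is a defensive choice added here so that non-ASCII parameter values still yield a valid link.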
In an embodiment of the present invention, the collecting unit 404 is specifically configured to:
Perform the following operations for each current entry link:
Obtain the target list page corresponding to the current entry link, where the target list page includes at least one pagination page;
Visit the details links in each pagination page, and collect data from each details link visited.
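The collection loop for one entry link can be sketched as below. To stay runnable offline, the sketch takes the page-fetching functions as arguments and exercises them against an in-memory fake site; the page structure (`details` and `next` keys), the function names, and the URLs are assumptions, not details from the source.

```python
from typing import Callable, Dict, List, Optional


def collect_entry_link(
    entry_link: str,
    fetch_list_page: Callable[[str], Dict],
    fetch_details: Callable[[str], str],
) -> List[str]:
    """Walk every pagination page of one entry link and visit each details link."""
    records: List[str] = []
    visited = set()
    page_url: Optional[str] = entry_link
    # Stop when there is no next page, or when a page repeats (guards against loops).
    while page_url is not None and page_url not in visited:
        visited.add(page_url)
        page = fetch_list_page(page_url)
        for details_link in page["details"]:
            records.append(fetch_details(details_link))
        page_url = page.get("next")
    return records


# In-memory stand-in for a website: two pagination pages under one entry link.
FAKE_LIST_PAGES = {
    "https://example.com/jobs?area=010": {
        "details": ["/job/1", "/job/2"],
        "next": "https://example.com/jobs?area=010&page=2",
    },
    "https://example.com/jobs?area=010&page=2": {"details": ["/job/3"], "next": None},
}
records = collect_entry_link(
    "https://example.com/jobs?area=010",
    fetch_list_page=FAKE_LIST_PAGES.__getitem__,
    fetch_details=lambda link: f"details of {link}",
)
```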
In summary, the embodiments of the present invention have the following effects:
1. In the embodiments of the present invention, at least one classification parameter of the data to be collected is determined so that the data to be collected is divided into classes. Each classification parameter and its corresponding parameter value are spliced into an entry link, and accessing that entry link displays the original list corresponding to it. Because the original list of each class contains less content, the original list of each class is likely to be displayed in full even if the website limits the number of pages it displays. Collecting data from the displayed list pages therefore prevents data from being missed during collection.
2. In the embodiments of the present invention, the parameter value corresponding to each classification parameter is obtained from the current link corresponding to that classification parameter, and the classification parameters and their parameter values are used to classify all the data of a large website. This avoids the missed collection that occurs when the website does not display its data in full. This way of classifying data and obtaining parameter values is also widely applicable and simple to operate; when there are few classes, the parameter value corresponding to each classification parameter can be obtained simply and conveniently.
3. In the embodiments of the present invention, the parameter value corresponding to each classification parameter is obtained from the target parameter list. Especially when the data to be collected has many classes, the correspondence between each classification parameter and its parameter value can be looked up and used directly, which saves some time during data collection.
4. In the embodiments of the present invention, the entry link generated from the classification parameter, the parameter value, and the set characters covers all of the website's data for the current classification parameter. Accessing the current entry link, rather than only the partial data that the website displays as in a traditional data collection process, makes it possible to collect all the data corresponding to the current classification parameter and prevents data from being missed.
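The reasoning behind the first effect can be checked with simple arithmetic. In the sketch below every figure is hypothetical: a site that shows 30 items per page and at most 20 pages, and three classes of 400, 550, and 550 items. The function name is also an assumption.

```python
from typing import List


def reachable_items(total_items: int, items_per_page: int, page_cap: int) -> int:
    """Items visible through a list that shows at most page_cap pages."""
    return min(total_items, items_per_page * page_cap)


# Hypothetical figures: 30 items per page, at most 20 pages displayed.
ITEMS_PER_PAGE, PAGE_CAP = 30, 20
class_sizes: List[int] = [400, 550, 550]  # items in each class

# One unclassified list holds all 1500 items but only 600 are displayed.
unclassified = reachable_items(sum(class_sizes), ITEMS_PER_PAGE, PAGE_CAP)
# Each class fits under the 600-item cap, so every item is displayed.
classified = sum(reachable_items(n, ITEMS_PER_PAGE, PAGE_CAP) for n in class_sizes)
print(unclassified, classified)
```

Under these assumed numbers the unclassified list exposes 600 of 1500 items, while the per-class lists expose all 1500. A class larger than the cap would still be truncated, which is consistent with the text saying only that each class's list is likely to be displayed in full.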
The information exchange between the units of the above device, their execution processes, and similar details are based on the same concept as the method embodiments of the present invention. For the specific content, refer to the description in the method embodiments, which is not repeated here.
It should be noted that, in this document, relational terms such as "first" and "second" are used only to distinguish one entity or operation from another, and do not necessarily require or imply any actual relationship or order between these entities or operations. Moreover, the terms "include", "comprise", and any variant of them are intended to cover non-exclusive inclusion, so that a process, method, article, or device that includes a series of elements includes not only those elements but also other elements that are not expressly listed, or elements that are inherent to the process, method, article, or device. Without further limitation, an element qualified by the statement "including a ..." does not exclude the presence of other identical elements in the process, method, article, or device that includes the element.
A person of ordinary skill in the art will understand that all or part of the steps of the above method embodiments may be carried out by hardware under the control of program instructions. The program may be stored in a computer-readable storage medium and, when executed, performs the steps of the above method embodiments. The storage medium includes ROM, RAM, magnetic disks, optical discs, and other media capable of storing program code.
Finally, it should be noted that the above are only preferred embodiments of the present invention, intended merely to illustrate its technical solution and not to limit its scope of protection. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention falls within the scope of protection of the present invention.
Claims (10)
1. A network data classified collection method, characterized in that the method comprises:
determining data to be collected, and determining at least one classification parameter corresponding to the data to be collected;
determining the parameter value corresponding to each classification parameter;
generating, according to each classification parameter and its corresponding parameter value, the entry link corresponding to each classification parameter;
for each entry link, collecting one by one the data corresponding to the respective classification parameter.
2. The method according to claim 1, characterized in that determining the parameter value corresponding to each classification parameter comprises:
determining the target website where the data to be collected is located;
obtaining, in the target website, the original list corresponding to the data to be collected;
selecting each classification parameter in the original list one by one, and obtaining the classification link corresponding to each classification parameter;
determining, according to each classification link obtained, the parameter value corresponding to each classification parameter.
3. The method according to claim 1, characterized in that determining the parameter value corresponding to each classification parameter comprises:
obtaining a pre-stored target parameter list for the data to be collected;
determining, according to the correspondence stored in the target parameter list, the parameter value corresponding to each classification parameter.
4. The method according to claim 2, characterized in that generating, according to each classification parameter and its corresponding parameter value, the entry link corresponding to each classification parameter comprises:
performing the following operations for each current classification parameter and its corresponding current parameter value: splicing the current classification parameter, the current parameter value, and set characters together in a set format; and adding the spliced content to the classification link corresponding to the current classification parameter to obtain the entry link corresponding to the current classification parameter.
5. The method according to any one of claims 1-4, characterized in that collecting, for each entry link, the data corresponding to the respective classification parameter one by one comprises:
performing the following operations for each current entry link:
obtaining the target list page corresponding to the current entry link, the target list page including at least one pagination page;
visiting the details links in each pagination page, and collecting data from each details link visited.
6. A network data classified collection device, characterized in that it comprises:
a first determining unit, configured to determine data to be collected and to determine at least one classification parameter corresponding to the data to be collected;
a second determining unit, configured to determine the parameter value corresponding to each classification parameter;
a generating unit, configured to generate, according to each classification parameter and its corresponding parameter value, the entry link corresponding to each classification parameter;
a collecting unit, configured to collect, for each entry link, the data corresponding to the respective classification parameter one by one.
7. The network data classified collection device according to claim 6, characterized in that the second determining unit comprises:
a first determining subunit, configured to determine the target website where the data to be collected is located;
a first obtaining subunit, configured to obtain, in the target website, the original list corresponding to the data to be collected;
a selecting subunit, configured to select each classification parameter in the original list one by one and obtain the classification link corresponding to each classification parameter;
a second determining subunit, configured to determine, according to each classification link obtained, the parameter value corresponding to each classification parameter.
8. The network data classified collection device according to claim 6, characterized in that the second determining unit comprises:
a second obtaining subunit, configured to obtain a pre-stored target parameter list for the data to be collected;
a third determining subunit, configured to determine, according to the correspondence stored in the target parameter list, the parameter value corresponding to each classification parameter.
9. The network data classified collection device according to claim 7, characterized in that the generating unit is specifically configured to perform the following operations for each current classification parameter and its corresponding current parameter value: splicing the current classification parameter, the current parameter value, and set characters together in a set format; and adding the spliced content to the classification link corresponding to the current classification parameter to obtain the entry link corresponding to the current classification parameter.
10. The network data classified collection device according to any one of claims 6-9, characterized in that the collecting unit is specifically configured to perform the following operations for each current entry link: obtaining the target list page corresponding to the current entry link, the target list page including at least one pagination page; visiting the details links in each pagination page, and collecting data from each details link visited.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610542380.0A CN106168973A (en) | 2016-07-11 | 2016-07-11 | Network data classified collection method and device |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610542380.0A CN106168973A (en) | 2016-07-11 | 2016-07-11 | Network data classified collection method and device |
Publications (1)
Publication Number | Publication Date |
---|---|
CN106168973A true CN106168973A (en) | 2016-11-30 |
Family
ID=58065805
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201610542380.0A Pending CN106168973A (en) | 2016-07-11 | 2016-07-11 | Network data classified collection method and device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN106168973A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112632356A (en) * | 2020-12-25 | 2021-04-09 | 深圳市高德信通信股份有限公司 | Network information data classification collection method |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20070067217A1 (en) * | 2005-09-20 | 2007-03-22 | Joshua Schachter | System and method for selecting advertising |
CN101620608A (en) * | 2008-07-04 | 2010-01-06 | 全国组织机构代码管理中心 | Information collection method and system |
CN105426424A (en) * | 2015-11-04 | 2016-03-23 | 浪潮软件集团有限公司 | Directional paging type acquisition method for network data |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | ||
RJ01 | Rejection of invention patent application after publication | Application publication date: 20161130 |