CN106168973A - Network data classified collection method and device - Google Patents

Publication number
CN106168973A
Authority
CN
China
Prior art keywords
parameter
data
sorting
sorting parameter
collected
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201610542380.0A
Other languages
Chinese (zh)
Inventor
邢荣
王传超
徐宏伟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Inspur Software Group Co Ltd
Original Assignee
Inspur Software Group Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Inspur Software Group Co Ltd
Priority to CN201610542380.0A
Publication of CN106168973A

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90: Details of database functions independent of the retrieved data types
    • G06F16/95: Retrieval from the web
    • G06F16/958: Organisation or management of web site content, e.g. publishing, maintaining pages or automatic linking

Abstract

The invention provides a network data classified collection method and device. The method comprises: determining data to be collected and determining at least one classification parameter corresponding to the data to be collected; determining a parameter value corresponding to each classification parameter; generating an entry link corresponding to each classification parameter according to the classification parameter and its parameter value; and, for each entry link, collecting the data corresponding to the respective classification parameter one by one. The data to be collected is classified, and each classification parameter and its parameter value are spliced into an entry link; accessing the entry link displays the corresponding list page. Because the list page of each classification has less content, it can be displayed completely even if the website limits the number of pages it displays, and collecting data from the displayed list pages prevents data from being missed.

Description

Network data classified collection method and device
Technical field
The present invention relates to the field of big data application and analysis, and in particular to a network data classified collection method and device.
Background
The era of big data has quietly arrived. The network is flooded with public information, and large internet websites are everywhere, so these websites have become key targets of data collection work.
The current data collection method is as follows: find the list page corresponding to the required data on the website. Because the amount of information is very large, this list page contains many pagination pages, and the data of each pagination page is collected by turning pages. When collecting data for a pagination page, the detail-page links listed on it are accessed one by one, so as to collect all of the required data on the website.
However, for a large internet website the total amount of data is very large and, constrained by the hardware environment, the website generally displays only part of the data. The existing collection method collects data only from the detail-page links that are displayed, so it cannot cover all of the site's information, which causes data to be missed.
Summary of the invention
Embodiments of the present invention provide a network data classified collection method and device that can effectively solve the problem of missed data collection in the prior art.
In a first aspect, an embodiment of the present invention provides a network data classified collection method, including:
Determining data to be collected, and determining at least one classification parameter corresponding to the data to be collected;
Determining the parameter value corresponding to each classification parameter;
Generating, according to each classification parameter and its corresponding parameter value, the entry link corresponding to each classification parameter;
For each entry link, collecting the data corresponding to the respective classification parameter one by one.
Preferably,
Determining the parameter value corresponding to each classification parameter includes:
Determining the target website where the data to be collected resides;
Obtaining, in the target website, the list page corresponding to the data to be collected;
Selecting each classification parameter one by one in the list page to obtain the classification link corresponding to each classification parameter;
Determining, according to each classification link obtained, the parameter value corresponding to each classification parameter.
Preferably,
Determining the parameter value corresponding to each classification parameter includes:
Obtaining a pre-stored target parameter list for the data to be collected;
Determining, according to the correspondence stored in the target parameter list, the parameter value corresponding to each classification parameter.
Preferably,
Generating, according to each classification parameter and its corresponding parameter value, the entry link corresponding to each classification parameter includes:
For each current classification parameter and its current parameter value, performing the following operations: splicing the current classification parameter, the current parameter value and a set character in a set format; and adding the spliced content to the classification link corresponding to the current classification parameter to obtain the entry link corresponding to the current classification parameter.
Preferably,
Collecting, for each entry link, the data corresponding to the respective classification parameter one by one includes:
For each current entry link, performing the following operations:
Obtaining the target list page corresponding to the current entry link, the target list page including at least one pagination page;
Accessing the detail links in each pagination page, and collecting data from each detail link accessed.
In a second aspect, an embodiment of the present invention provides a network data classified collection device, including:
A first determining unit, configured to determine data to be collected and to determine at least one classification parameter corresponding to the data to be collected;
A second determining unit, configured to determine the parameter value corresponding to each classification parameter;
A generating unit, configured to generate, according to each classification parameter and its corresponding parameter value, the entry link corresponding to each classification parameter;
A collecting unit, configured to collect, for each entry link, the data corresponding to the respective classification parameter one by one.
Preferably,
The second determining unit includes:
A first determining subunit, configured to determine the target website where the data to be collected resides;
A first obtaining subunit, configured to obtain, in the target website, the list page corresponding to the data to be collected;
A selecting subunit, configured to select each classification parameter one by one in the list page to obtain the classification link corresponding to each classification parameter;
A second determining subunit, configured to determine, according to each classification link obtained, the parameter value corresponding to each classification parameter.
Preferably,
The second determining unit includes:
A second obtaining subunit, configured to obtain a pre-stored target parameter list for the data to be collected;
A third determining subunit, configured to determine, according to the correspondence stored in the target parameter list, the parameter value corresponding to each classification parameter.
Preferably,
The generating unit is specifically configured to perform the following operations for each current classification parameter and its current parameter value: splicing the current classification parameter, the current parameter value and a set character in a set format; and adding the spliced content to the classification link corresponding to the current classification parameter to obtain the entry link corresponding to the current classification parameter.
Preferably,
The collecting unit is specifically configured to perform the following operations for each current entry link: obtaining the target list page corresponding to the current entry link, the target list page including at least one pagination page; and accessing the detail links in each pagination page and collecting data from each detail link accessed.
Embodiments of the present invention provide a network data classified collection method and device. At least one classification parameter of the data to be collected is determined so as to classify the data, and each classification parameter and its parameter value are spliced into an entry link. Accessing the entry link displays the list page corresponding to it. Because the list page of each classification has less content, it can be displayed completely even if the website limits the number of pages it displays, and collecting data from the displayed list pages prevents data from being missed.
Brief description of the drawings
To explain the technical solutions of the embodiments of the present invention or of the prior art more clearly, the drawings used in describing the embodiments or the prior art are briefly introduced below. The drawings described below show some embodiments of the present invention; a person of ordinary skill in the art can obtain other drawings from them without creative effort.
Fig. 1 is a flow chart of a network data classified collection method provided by an embodiment of the present invention;
Fig. 2 is a flow chart of another network data classified collection method provided by an embodiment of the present invention;
Fig. 3 is a hardware architecture diagram of the equipment in which the device provided by an embodiment of the present invention resides;
Fig. 4 is a structural diagram of the network data classified collection device provided by an embodiment of the present invention.
Detailed description of the embodiments
To make the purpose, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the drawings. The described embodiments are some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art from these embodiments without creative effort fall within the scope of protection of the present invention.
As shown in Fig. 1, an embodiment of the present invention provides a network data classified collection method, which may include the following steps:
Step 101: determine data to be collected, and determine at least one classification parameter corresponding to the data to be collected;
Step 102: determine the parameter value corresponding to each classification parameter;
Step 103: generate, according to each classification parameter and its corresponding parameter value, the entry link corresponding to each classification parameter;
Step 104: for each entry link, collect the data corresponding to the respective classification parameter one by one.
In this embodiment, at least one classification parameter of the data to be collected is determined so as to classify the data, and each classification parameter and its parameter value are spliced into an entry link. Accessing the entry link displays the list page corresponding to it. Because the list page of each classification has less content, it can be displayed completely even if the website limits the number of pages it displays, and collecting data from the displayed list pages prevents data from being missed.
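The four steps can be sketched as a small driver loop. This is only an illustrative outline under assumptions: the function name `collect_by_classification` and its three callback parameters are hypothetical and do not appear in the patent, which leaves open how each step is implemented.

```python
from typing import Callable, Dict, Iterable, List


def collect_by_classification(
    parameters: Iterable[str],
    value_of: Callable[[str], str],
    build_entry_link: Callable[[str, str], str],
    collect_from: Callable[[str], List[str]],
) -> Dict[str, List[str]]:
    """Run steps 102 to 104 once per classification parameter.

    `parameters` is the output of step 101. The three callbacks are
    hypothetical stand-ins for steps 102, 103 and 104.
    """
    results: Dict[str, List[str]] = {}
    for parameter in parameters:
        value = value_of(parameter)                      # step 102
        entry_link = build_entry_link(parameter, value)  # step 103
        results[parameter] = collect_from(entry_link)    # step 104
    return results
```

Passing the steps in as callbacks keeps the loop independent of any particular website, which matches the way the description treats the target website as interchangeable.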
In an embodiment of the present invention, to make the collection process comprehensive and orderly, determining the parameter value corresponding to each classification parameter includes:
Determining the target website where the data to be collected resides;
Obtaining, in the target website, the list page corresponding to the data to be collected;
Selecting each classification parameter one by one in the list page to obtain the classification link corresponding to each classification parameter;
Determining, according to each classification link obtained, the parameter value corresponding to each classification parameter.
For example, suppose the data to be collected is the information on all McDonald's restaurants in the Beijing area on Meituan. First, the target website where the data resides is determined to be Meituan, and "McDonald's" is taken as the classification parameter. Next, the Meituan home page is opened and "Beijing" is entered in its search bar, whereupon the system generates a list page. The McDonald's option is then found and clicked in this list page, and the system generates the list page corresponding to McDonald's restaurants in the Beijing area on Meituan. Finally, the classification link corresponding to McDonald's is obtained from the current list page, and the parameter value corresponding to "McDonald's" is obtained from it.
For example, if the classification link obtained is:
http://bj.meituan.com/shops/?W=%E9%BA%A6%E5%BD%93%E5%8A%B3&mtt=1, then the 1 in this classification link can be taken as the parameter value of the classification parameter "McDonald's".
By obtaining the parameter value of each classification parameter from the link corresponding to that parameter, all of the data on a large website can be classified using the classification parameters and their values, which avoids missed data caused by incomplete display on the website. This way of classifying and of obtaining parameter values is also widely applicable and simple to operate; when the data to be collected has few categories, the parameter value of each classification parameter can be obtained simply and conveniently.
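Reading a parameter value out of a classification link can be illustrated with Python's standard `urllib.parse` module. This is a minimal sketch under assumptions: the helper name `parameter_value` is hypothetical, and the choice of `mtt` as the query key carrying the parameter value is inferred from the Meituan example link, not stated by the patent.

```python
from urllib.parse import parse_qs, urlsplit


def parameter_value(classification_link: str, key: str) -> str:
    """Return the first value of query key `key` in a classification link."""
    query = parse_qs(urlsplit(classification_link).query)
    return query[key][0]


# The Meituan example link from the description, with the line break removed.
link = "http://bj.meituan.com/shops/?W=%E9%BA%A6%E5%BD%93%E5%8A%B3&mtt=1"
value = parameter_value(link, "mtt")
keyword = parameter_value(link, "W")  # percent-decoded search keyword
```

`parse_qs` also percent-decodes the keyword, so the `W` field of the example link comes back as the Chinese name of McDonald's.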
In an embodiment of the present invention, to make the collection process comprehensive and orderly, determining the parameter value corresponding to each classification parameter includes:
Obtaining a pre-stored target parameter list for the data to be collected;
Determining, according to the correspondence stored in the target parameter list, the parameter value corresponding to each classification parameter.
When the data to be collected has many categories, for example information on McDonald's, KFC, Weiduomei (味多美) and other outlets in the Beijing area, the target parameter list can be obtained first, and the correspondence between classification parameters and parameter values can then be looked up in it.
For example, the correspondence may be as shown in Table 1 below:
Table 1:
Classification parameter    Parameter value
McDonald's                  1
KFC                         2
Weiduomei                   3
……                          ……
From the correspondence in Table 1, the parameter value corresponding to "McDonald's" is 1, the parameter value corresponding to "KFC" is 2, and the parameter value corresponding to "Weiduomei" is 3.
In this way the parameter value corresponding to each classification parameter can be obtained quickly. Especially when the data to be collected has many categories, the stored correspondence between each classification parameter and its parameter value can simply be retrieved, which saves time during data collection.
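The pre-stored target parameter list behaves like a lookup table. A minimal sketch, assuming the list is held in memory as a dictionary; the names `TARGET_PARAMETER_LIST` and `lookup_parameter_values` are hypothetical, and the brand names are English renderings of the entries in Table 1.

```python
from typing import Dict, Iterable, Mapping

# Hypothetical in-memory form of Table 1.
TARGET_PARAMETER_LIST: Dict[str, str] = {
    "McDonald's": "1",
    "KFC": "2",
    "Weiduomei": "3",
}


def lookup_parameter_values(
    parameters: Iterable[str],
    table: Mapping[str, str] = TARGET_PARAMETER_LIST,
) -> Dict[str, str]:
    """Map each classification parameter to its stored parameter value."""
    wanted = list(parameters)
    missing = [p for p in wanted if p not in table]
    if missing:
        # Fail loudly rather than silently skipping a category,
        # since a skipped category would mean missed data.
        raise KeyError(f"no stored parameter value for: {missing}")
    return {p: table[p] for p in wanted}
```

Raising on an unknown parameter is a design choice of this sketch: silently dropping a category would reintroduce the missed-data problem the method is meant to prevent.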
In an embodiment of the present invention, to prevent data from being missed, generating the entry link corresponding to each classification parameter according to each classification parameter and its corresponding parameter value includes:
For each current classification parameter and its current parameter value, performing the following operations: splicing the current classification parameter, the current parameter value and a set character in a set format; and adding the spliced content to the classification link corresponding to the current classification parameter to obtain the entry link corresponding to the current classification parameter.
The way the spliced content is added can also be set according to user requirements. For example, the classification parameter, the parameter value and the set character are first spliced in the set format, and the spliced content is then appended to the end of the current classification link. Take the classification parameter "McDonald's" (麦当劳 in the link), the parameter value "1", the set character "&=", the set format "classification parameter, set character, parameter value, spliced in that order", and the current classification link "http://bj.meituan.com/shops/&mtt=1" as an example: the spliced content is "麦当劳&=1", and the entry link obtained for the current classification parameter is http://bj.meituan.com/shops/&mtt=1麦当劳&=1.
The entry link generated from the classification parameter, the parameter value and the set character covers all of the website data corresponding to the current classification parameter. Accessing the current entry link, rather than only the part of the data displayed on the website as in traditional data collection, allows all of the data corresponding to the current classification parameter to be collected and prevents data from being missed.
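The splice-and-append example can be written out as two small string helpers. This is a sketch of the described format only: the function names `splice` and `append_to_link` are hypothetical, and the result is the literal string the description gives, with no claim that a live website accepts it.

```python
def splice(parameter: str, value: str, set_chars: str = "&=") -> str:
    """Splice in the order: classification parameter, set character, value."""
    return f"{parameter}{set_chars}{value}"


def append_to_link(classification_link: str, parameter: str, value: str,
                   set_chars: str = "&=") -> str:
    """Append the spliced content to the end of the classification link."""
    return classification_link + splice(parameter, value, set_chars)


# "麦当劳" is the Chinese name of McDonald's used in the example link.
entry_link = append_to_link("http://bj.meituan.com/shops/&mtt=1", "麦当劳", "1")
```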
Taking jobs in Beijing as the data to be collected, the network data classified collection method of the embodiments of the present invention is described in detail below. As shown in Fig. 2, an embodiment of the present invention provides a network data classified collection method, which may include:
Step 201: determine that the data to be collected is jobs in Beijing.
In this step, the data to be collected is generally given in text form. The data to be collected must be determined before it can be classified and at least one classification parameter can be determined for it. Therefore, the text information is first obtained, the text content is then read through, and the data to be collected is finally determined. In this embodiment, the data to be collected is determined to be jobs in Beijing.
Step 202: determine at least one classification parameter corresponding to the Beijing job data.
In this step, after the data to be collected is determined, it is classified so as to determine at least one classification parameter, which lays the foundation for obtaining the corresponding parameter values later.
When classifying the job data of the Beijing area, the number of classification parameters and the categories can be set according to user requirements, but there is at least one classification parameter. For example, jobs in the Beijing area may be divided into four categories: "state-owned enterprise", "bachelor's degree", "salary" and "work experience". This embodiment takes the classification parameters "state-owned enterprise" and "bachelor's degree" as an example and divides the jobs in the Beijing area into these two categories.
Step 203: determine that the target website where the data to be collected resides is the Zhaopin (智联招聘) website.
In this step, after at least one classification parameter corresponding to the data to be collected is determined (here "state-owned enterprise" and "bachelor's degree"), the target website where the data resides is first determined from the data to be collected, in order to obtain the parameter value corresponding to each classification parameter.
The target website may be any recruitment website and may also be selected according to user requirements, for example Zhaopin (智联招聘), 51job (前程无忧) or Dajie (大街网). This embodiment uses Zhaopin as the target website.
Step 204: obtain the list page corresponding to jobs in Beijing on the Zhaopin website.
In this step, after the target website of the data to be collected is determined to be Zhaopin, the Zhaopin website is first opened and the keyword "Beijing" is entered on it, which yields the list page corresponding to jobs in the Beijing area on the website. The data in this list page is the part of the Beijing job data that the website displays.
Step 205: select each classification parameter one by one in the list page to obtain the classification link corresponding to each classification parameter.
In this step, after the list page corresponding to the data to be collected on the target website is obtained, the parameter value corresponding to each classification parameter can be obtained by generating the link corresponding to that parameter.
Take the classification parameters "state-owned enterprise" and "bachelor's degree" and the list page link for Beijing jobs "http://sou.zhaopin.com/jobs/=&sm=0&isfilter=1&p=1&ct=-1" as an example. In the list page under the current link, find the filter options for company type and education requirement, which are usually at the top or the side of the list page, and click "state-owned enterprise" and "bachelor's degree" in these two filters. The system generates a list page for each classification parameter. The link of the list page for "state-owned enterprise" is http://sou.zhaopin.com/jobs/sm=0&isfilter=1&p=1&ct=1, and the link of the list page for "bachelor's degree" is http://sou.zhaopin.com/jobs/=&sm=0&ct=-1&isfilter=1&p=1&el=4.
Step 206: determine, according to the links obtained for "state-owned enterprise" and "bachelor's degree", the parameter value corresponding to each of them.
In this step, the parameter value corresponding to each classification parameter can be read from the link of the corresponding list page on the website.
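One way to read the parameter value out of a classification link is to compare it with the unfiltered list page link and keep the fields that changed. The sketch below is an assumption about how this could be automated; the patent only says the value is obtained from the link. The helper names `query_pairs` and `changed_parameters` are hypothetical, and the splitting is done by hand because the example links carry their `key=value` fields directly after the last "/" rather than after a "?".

```python
from typing import Dict


def query_pairs(link: str) -> Dict[str, str]:
    """Split the part after the last '/' into key=value pairs."""
    tail = link.rsplit("/", 1)[-1]
    pairs: Dict[str, str] = {}
    for piece in tail.split("&"):
        key, _, value = piece.partition("=")
        if key:  # skip the empty "=" field that appears in some links
            pairs[key] = value
    return pairs


def changed_parameters(base_link: str, classification_link: str) -> Dict[str, str]:
    """Return the fields whose value differs from the unfiltered list link."""
    base = query_pairs(base_link)
    current = query_pairs(classification_link)
    return {k: v for k, v in current.items() if base.get(k) != v}


BASE = "http://sou.zhaopin.com/jobs/=&sm=0&isfilter=1&p=1&ct=-1"
STATE_OWNED = "http://sou.zhaopin.com/jobs/sm=0&isfilter=1&p=1&ct=1"
BACHELOR = "http://sou.zhaopin.com/jobs/=&sm=0&ct=-1&isfilter=1&p=1&el=4"
```

With the example links, the field that changes for "state-owned enterprise" is `ct` (from -1 to 1), and the field added for "bachelor's degree" is `el` with value 4.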
Alternatively, another way to obtain the parameter value corresponding to a classification parameter is to use a keyboard shortcut to obtain the correspondence between each classification parameter and its parameter value on the target website, and to determine the parameter value of each classification parameter from this correspondence.
When the target website was built, the correspondence between each classification parameter and its parameter value may have been stored, and the user can obtain this stored correspondence directly.
The shortcut may be set by developers during software development; for example, the shortcut is F12.
Step 207: assemble the entry links.
In this step, to collect data from the website, the entry link corresponding to each classification parameter must be generated, which lays the foundation for accessing the corresponding link next. Because the entry link generated in this step is based on the classification parameter and parameter value already obtained, it can cover all of the data to be collected on the website, so that the collection process is comprehensive and orderly and data is not missed. Specifically, at least one classification parameter, its parameter value and the set character are spliced in a set format, and the spliced content is then added to the current link corresponding to the current classification parameter, which yields the entry link for the current classification parameter.
The set character may be any character, and there may be one or more characters. For example, the set character is "&"; as another example, the set character is "%&".
Further, the set format can also be set according to user requirements. For example, the set format is: classification parameter, set character and parameter value, spliced in that order. With the set character "&", the classification parameter "state-owned enterprise" (国企 in the link) and the parameter value "1", the spliced content is "国企&1".
Further, the way the spliced content is added can also be set according to user requirements. For example, the spliced content is placed before the parameter value in the current link of the classification parameter. With the classification parameter "state-owned enterprise" and its current link "http://sou.zhaopin.com/jobs/sm=0&isfilter=1&p=1&ct=1", the entry link obtained for "state-owned enterprise" is http://sou.zhaopin.com/jobs/sm=0&isfilter=1&p=1&ct=国企&11.
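Placing the spliced content before the parameter value amounts to a string insertion right after the field's "=". A minimal sketch, assuming the field name (`ct` here) is known from the previous step; the helper name `insert_before_value` is hypothetical, and matching on the plain substring "ct=" is a simplification that would also match a longer key ending in "ct".

```python
def insert_before_value(link: str, key: str, spliced: str) -> str:
    """Insert `spliced` directly after the last occurrence of 'key='."""
    marker = f"{key}="
    position = link.rfind(marker)
    if position < 0:
        raise ValueError(f"{marker!r} not found in link")
    cut = position + len(marker)
    return link[:cut] + spliced + link[cut:]


# "国企" is the Chinese term for state-owned enterprise used in the link.
entry_link = insert_before_value(
    "http://sou.zhaopin.com/jobs/sm=0&isfilter=1&p=1&ct=1", "ct", "国企&1"
)
```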
Step 208: for the two entry links generated, collect the data corresponding to the respective classification parameter one by one.
This step accesses the generated entry links and, with the website data fully covered, collects all of the data corresponding to each classification parameter. It specifically includes:
Obtaining the target list page corresponding to the current entry link, the target list page including at least one pagination page;
Accessing the detail links in each pagination page, and collecting data from each detail link accessed.
Take the entry link "http://sou.zhaopin.com/jobs/sm=0&isfilter=1&p=1&ct=国企&11" for the classification parameter "state-owned enterprise" and the entry link "http://sou.zhaopin.com/jobs/=&sm=0&ct=-1&isfilter=1&p=1&el=4本科&22" for "bachelor's degree" (本科 in the link) as an example. The two links are first accessed in turn, and the system automatically generates a corresponding list page for each entry link.
Because the amount of information collected is large, each list page has many pagination pages. For example, all of the job data for "state-owned enterprise" occupies 20 pages, and all of the job data for "bachelor's degree" occupies 30 pages. The generated pagination pages are accessed one after another by turning pages. For example, for the list page of all data generated for the classification parameter "state-owned enterprise", each page from page 1 to page 20 can be accessed in turn.
Further, by accessing the detail links in the pagination pages for "state-owned enterprise" and "bachelor's degree", all of the data under the current classification is obtained. Again taking the list page of all data generated for "state-owned enterprise" as an example, after pages 1 to 20 are obtained, each detail link on every page is accessed, and the job information of all state-owned enterprises in the Beijing area is finally collected.
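The page-turning and detail-link traversal can be sketched as two nested loops. Network access is deliberately left out: `list_detail_links` and `fetch_detail` are hypothetical callbacks that a real collector would implement with an HTTP client and an HTML parser, and the page count is assumed to be known in advance (20 for "state-owned enterprise" in the example).

```python
from typing import Callable, Dict, List


def collect_entry_link(
    entry_link: str,
    page_count: int,
    list_detail_links: Callable[[str, int], List[str]],
    fetch_detail: Callable[[str], str],
) -> Dict[str, str]:
    """Visit pagination pages 1..page_count and collect every detail link once."""
    collected: Dict[str, str] = {}
    for page in range(1, page_count + 1):
        for detail_link in list_detail_links(entry_link, page):
            if detail_link not in collected:  # skip links repeated across pages
                collected[detail_link] = fetch_detail(detail_link)
    return collected
```

Keying the result by detail link means an item that appears on two pagination pages is fetched only once.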
As shown in Figure 3, Figure 4, a kind of network data sort-type harvester is embodiments provided.Device embodiment Can be realized by software, it is also possible to realize by the way of hardware or software and hardware combining.For hardware view, such as Fig. 3 Shown in, a kind of hardware structure diagram of network data sort-type harvester place equipment provided for the embodiment of the present invention, except Outside processor shown in Fig. 3, internal memory, network interface and nonvolatile memory, in embodiment, the equipment at device place leads to Often can also include other hardware, such as the forwarding chip etc. of responsible process message.As a example by implemented in software, as shown in Figure 4, make It is the device on a logical meaning, is that the CPU by its place equipment is by computer journey corresponding in nonvolatile memory Sequence instruction reads and runs formation in internal memory.The network data sort-type harvester that the present embodiment provides, including:
First determines unit 401, is used for determining data to be collected, and determines corresponding at least one of described data to be collected Sorting parameter;
Second determines unit 402, for determining the parameter value that each sorting parameter is corresponding;
Signal generating unit 403, for according to each sorting parameter and corresponding parameter value, generates each sorting parameter respectively Corresponding linking inlet ports;
Collecting unit 404, for for each linking inlet ports, gathers the data corresponding to corresponding sorting parameter one by one.
In one embodiment of the present invention, the second determining unit 402 includes:
a first determining subunit, configured to determine the target website where the data to be collected is located;
a first obtaining subunit, configured to obtain, in the target website, the initial list corresponding to the data to be collected;
a selection subunit, configured to select each classification parameter in the initial list one by one and obtain the classification link corresponding to each classification parameter;
a second determining subunit, configured to determine, according to each obtained classification link, the parameter value corresponding to each classification parameter.
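As a hedged illustration of this embodiment of unit 402, the sketch below reads a parameter value out of a link. It assumes, purely for illustration, that the website carries the value in the URL query string under a key equal to the parameter name; the patent text here does not specify the link format, and `parameter_values_from_links` is a hypothetical helper.

```python
from typing import Dict
from urllib.parse import parse_qs, urlparse


def parameter_values_from_links(links: Dict[str, str]) -> Dict[str, str]:
    """Map each parameter name to the value found in its link.

    Assumption: the value appears in the query string under the parameter
    name, e.g. "...?type=3". Parameters whose link has no such key are skipped.
    """
    values: Dict[str, str] = {}
    for parameter, link in links.items():
        query = parse_qs(urlparse(link).query)
        if parameter in query:
            values[parameter] = query[parameter][0]
    return values
```

A site that encodes the value in the path rather than the query string would need a different extraction rule; only the dictionary-in, dictionary-out shape would carry over.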
In one embodiment of the present invention, the second determining unit 402 includes:
a second obtaining subunit, configured to obtain a pre-stored target parameter list for the data to be collected;
a third determining subunit, configured to determine the parameter value corresponding to each classification parameter according to the correspondence stored in the target parameter list.
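This alternative embodiment of unit 402 amounts to a table lookup. The sketch below assumes the pre-stored list is kept as a JSON object mapping parameter names to values; the storage format, the sample contents of `STORED_LIST`, and the function name are all assumptions made for illustration.

```python
import json
from typing import Dict, Iterable

# Hypothetical pre-stored list; real contents depend on the target website.
STORED_LIST = '{"type": "3", "region": "north"}'


def load_parameter_values(stored_json: str, parameters: Iterable[str]) -> Dict[str, str]:
    """Look up the stored value for each requested parameter."""
    stored = json.loads(stored_json)
    requested = list(parameters)
    missing = [p for p in requested if p not in stored]
    if missing:
        # Fail loudly instead of silently skipping a classification.
        raise KeyError("no stored value for: " + ", ".join(missing))
    return {p: str(stored[p]) for p in requested}
```

Raising on a missing entry is a design choice of this sketch: silently skipping a classification would reintroduce the missed-collection problem the method aims to avoid.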
In one embodiment of the present invention, the generation unit 403 is specifically configured to:
perform the following operations for each current classification parameter and its corresponding current parameter value: splice the current classification parameter, the current parameter value, and a set character according to a set format; and add the spliced content to the classification link corresponding to the current classification parameter, to obtain the entry link corresponding to the current classification parameter.
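The splicing performed by unit 403 can be sketched as follows. The set character and set format are not specified in this passage, so the sketch assumes the common `name=value` query-string form with `=` as the set character; `build_entry_link` is a hypothetical name.

```python
from urllib.parse import quote


def build_entry_link(base_link: str, parameter: str, value: str,
                     set_character: str = "=") -> str:
    """Splice parameter, set character, and value, then append to the link.

    Assumptions: the spliced content is a query-string pair, joined to the
    link with "?" if the link has no query yet and with "&" otherwise.
    """
    spliced = f"{quote(parameter, safe='')}{set_character}{quote(value, safe='')}"
    separator = "&" if "?" in base_link else "?"
    return f"{base_link}{separator}{spliced}"
```

Percent-encoding the parameter and value keeps the resulting link valid when a value contains spaces or reserved characters.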
In one embodiment of the present invention, the collection unit 404 is specifically configured to:
perform the following operations for each current entry link:
obtain the target list page corresponding to the current entry link, the target list page including at least one pagination page;
access the detail links in each pagination page, and collect data from the accessed detail links.
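The traversal carried out by unit 404 can be sketched as a small crawl over pagination pages. This is an illustration under stated assumptions, not the patented implementation: pagination anchors are assumed to carry the CSS class `page` and detail anchors the class `detail`, and `fetch` is a caller-supplied function returning the HTML text of a URL, so the sketch performs no network access itself.

```python
from html.parser import HTMLParser
from typing import Callable, Dict, List
from urllib.parse import urljoin


class _ClassLinkParser(HTMLParser):
    """Collect href values of <a> tags that carry a given CSS class."""

    def __init__(self, css_class: str) -> None:
        super().__init__()
        self.css_class = css_class
        self.links: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attributes = dict(attrs)
        classes = (attributes.get("class") or "").split()
        href = attributes.get("href")
        if href and self.css_class in classes:
            self.links.append(href)


def links_with_class(html: str, css_class: str, base_url: str) -> List[str]:
    """Return absolute URLs of anchors with the given (assumed) class."""
    parser = _ClassLinkParser(css_class)
    parser.feed(html)
    return [urljoin(base_url, href) for href in parser.links]


def collect_entry_link(entry_link: str, fetch: Callable[[str], str]) -> Dict[str, str]:
    """Visit every pagination page reachable from the entry link and
    fetch each detail link once, returning {detail URL: fetched content}."""
    collected: Dict[str, str] = {}
    pending = [entry_link]
    visited = set()
    while pending:
        page_url = pending.pop(0)
        if page_url in visited:
            continue  # pagination pages usually link back to each other
        visited.add(page_url)
        html = fetch(page_url)
        pending.extend(links_with_class(html, "page", page_url))
        for detail_url in links_with_class(html, "detail", page_url):
            if detail_url not in collected:
                collected[detail_url] = fetch(detail_url)
    return collected
```

Tracking visited pagination pages prevents an endless loop when page 2 links back to page 1, and keying the result by detail URL avoids collecting the same record twice.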
In summary, the embodiments of the present invention have at least the following effects:
1. In the embodiments of the present invention, at least one classification parameter of the data to be collected is determined so that the data to be collected is classified, and each classification parameter and its corresponding parameter value are spliced into an entry link. Accessing the entry link displays the initial list corresponding to that entry link. Because the initial list corresponding to each classification contains less content, the initial list of each classification can still be displayed in full even if the website limits the number of pages it displays. Collecting data from the displayed list pages therefore prevents data from being missed.
2. In the embodiments of the present invention, the parameter value corresponding to each classification parameter is obtained from the current link corresponding to that classification parameter, and the classification parameters and their corresponding parameter values are used to classify all the data of a large website, which avoids the missed collection caused by the website displaying its data incompletely. In addition, this classification approach and this way of obtaining parameter values are widely applicable and easy to operate; when there are few classifications, the parameter value corresponding to each classification parameter can be obtained simply and conveniently.
3. In the embodiments of the present invention, the parameter value corresponding to each classification parameter is obtained from the target parameter list. Especially when the data to be collected has many classifications, the stored correspondence between each classification parameter and its parameter value can be called directly, which saves some time during data collection.
4. In the embodiments of the present invention, the entry link generated from the classification parameter, the parameter value, and the set character covers all the website data corresponding to the current classification parameter. Accessing the current entry link, rather than accessing only the part of the data displayed on the website as in conventional data collection, makes it possible to collect all the data corresponding to the current classification parameter and prevents data from being missed.
The information exchange between the units in the above device, their execution processes, and similar content are based on the same concept as the method embodiments of the present invention. For details, refer to the description in the method embodiments of the present invention, which is not repeated here.
It should be noted that, in this document, relational terms such as first and second are used only to distinguish one entity or operation from another entity or operation, and do not necessarily require or imply any such actual relationship or order between these entities or operations. Moreover, the terms "include", "comprise", or any other variant thereof are intended to cover non-exclusive inclusion, so that a process, method, article, or device that includes a series of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article, or device. Without further limitation, an element defined by the statement "including a ..." does not exclude the presence of other identical elements in the process, method, article, or device that includes the element.
A person of ordinary skill in the art will understand that all or part of the steps of the above method embodiments may be implemented by hardware controlled by program instructions. The program may be stored in a computer-readable storage medium and, when executed, performs the steps of the above method embodiments. The storage medium includes various media that can store program code, such as ROM, RAM, a magnetic disk, or an optical disc.
Finally, it should be noted that the above are merely preferred embodiments of the present invention, intended only to illustrate the technical solution of the present invention and not to limit its scope of protection. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention falls within the scope of protection of the present invention.

Claims (10)

1. A network data classified collection method, characterized in that the method comprises:
determining the data to be collected, and determining at least one classification parameter corresponding to the data to be collected;
determining the parameter value corresponding to each classification parameter;
generating, according to each classification parameter and its corresponding parameter value, the entry link corresponding to each classification parameter;
collecting, for each entry link, the data corresponding to the respective classification parameter one by one.
2. The method according to claim 1, characterized in that determining the parameter value corresponding to each classification parameter comprises:
determining the target website where the data to be collected is located;
obtaining, in the target website, the initial list corresponding to the data to be collected;
selecting each classification parameter in the initial list one by one, and obtaining the classification link corresponding to each classification parameter;
determining, according to each obtained classification link, the parameter value corresponding to each classification parameter.
3. The method according to claim 1, characterized in that determining the parameter value corresponding to each classification parameter comprises:
obtaining a pre-stored target parameter list for the data to be collected;
determining the parameter value corresponding to each classification parameter according to the correspondence stored in the target parameter list.
4. The method according to claim 2, characterized in that generating, according to each classification parameter and its corresponding parameter value, the entry link corresponding to each classification parameter comprises:
performing the following operations for each current classification parameter and its corresponding current parameter value: splicing the current classification parameter, the current parameter value, and a set character according to a set format; and adding the spliced content to the classification link corresponding to the current classification parameter, to obtain the entry link corresponding to the current classification parameter.
5. The method according to any one of claims 1 to 4, characterized in that collecting, for each entry link, the data corresponding to the respective classification parameter one by one comprises:
performing the following operations for each current entry link:
obtaining the target list page corresponding to the current entry link, the target list page including at least one pagination page;
accessing the detail links in each pagination page, and collecting data from the accessed detail links.
6. A network data classified collection device, characterized by comprising:
a first determining unit, configured to determine the data to be collected and to determine at least one classification parameter corresponding to the data to be collected;
a second determining unit, configured to determine the parameter value corresponding to each classification parameter;
a generation unit, configured to generate, according to each classification parameter and its corresponding parameter value, the entry link corresponding to each classification parameter;
a collection unit, configured to collect, for each entry link, the data corresponding to the respective classification parameter one by one.
7. The network data classified collection device according to claim 6, characterized in that the second determining unit comprises:
a first determining subunit, configured to determine the target website where the data to be collected is located;
a first obtaining subunit, configured to obtain, in the target website, the initial list corresponding to the data to be collected;
a selection subunit, configured to select each classification parameter in the initial list one by one and obtain the classification link corresponding to each classification parameter;
a second determining subunit, configured to determine, according to each obtained classification link, the parameter value corresponding to each classification parameter.
8. The network data classified collection device according to claim 6, characterized in that the second determining unit comprises:
a second obtaining subunit, configured to obtain a pre-stored target parameter list for the data to be collected;
a third determining subunit, configured to determine the parameter value corresponding to each classification parameter according to the correspondence stored in the target parameter list.
9. The network data classified collection device according to claim 7, characterized in that the generation unit is specifically configured to perform the following operations for each current classification parameter and its corresponding current parameter value: splice the current classification parameter, the current parameter value, and a set character according to a set format; and add the spliced content to the classification link corresponding to the current classification parameter, to obtain the entry link corresponding to the current classification parameter.
10. The network data classified collection device according to any one of claims 6 to 9, characterized in that the collection unit is specifically configured to perform the following operations for each current entry link: obtain the target list page corresponding to the current entry link, the target list page including at least one pagination page; and access the detail links in each pagination page, and collect data from the accessed detail links.
CN201610542380.0A 2016-07-11 2016-07-11 Network data classified collection method and device Pending CN106168973A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610542380.0A CN106168973A (en) 2016-07-11 2016-07-11 Network data classified collection method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201610542380.0A CN106168973A (en) 2016-07-11 2016-07-11 Network data classified collection method and device

Publications (1)

Publication Number Publication Date
CN106168973A true CN106168973A (en) 2016-11-30

Family

ID=58065805

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610542380.0A Pending CN106168973A (en) 2016-07-11 2016-07-11 Network data classified collection method and device

Country Status (1)

Country Link
CN (1) CN106168973A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112632356A (en) * 2020-12-25 2021-04-09 深圳市高德信通信股份有限公司 Network information data classification collection method

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20070067217A1 (en) * 2005-09-20 2007-03-22 Joshua Schachter System and method for selecting advertising
CN101620608A (en) * 2008-07-04 2010-01-06 全国组织机构代码管理中心 Information collection method and system
CN105426424A (en) * 2015-11-04 2016-03-23 浪潮软件集团有限公司 Directional paging type acquisition method for network data

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20161130