CN110020050A - A kind of intelligent grabbing rule configuration technology implementation method based on normative document - Google Patents
Method for implementing intelligent crawl-rule configuration based on normative documents
- Publication number
- CN110020050A (application number CN201711048560.4A)
- Authority
- CN
- China
- Prior art keywords
- website
- acquisition
- normative document
- crawl
- configuration
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/951—Indexing; Web crawling techniques
Landscapes
- Engineering & Computer Science (AREA)
- Databases & Information Systems (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Information Transfer Between Computers (AREA)
Abstract
The invention discloses a method for implementing intelligent crawl-rule configuration based on normative documents, belonging to the technical field of automatic network acquisition of normative documents. Using pre-configured crawl templates for different types of websites, and based on the text information that a normative document presents at each stage, the method automatically crawls normative-document information from the internet, saves it to a local library, and automatically updates the index information. A web crawling tool suited to the characteristics of standards is developed and custom-configured for the websites on which domestic standards are currently announced, achieving automatic, real-time acquisition of normative-document data. The invention replaces the previous practice of manually tracking normative-document information on each website every day and collecting it by hand. It greatly reduces staff workload, improves data accuracy, and improves the timeliness and completeness of the documents in the local standards library.
Description
Technical field
The present invention relates to a method for intelligent web crawling based on the characteristic values of normative documents, and in particular to a method for implementing intelligent crawl-rule configuration based on normative documents. It belongs to the technical field of automatic network acquisition of normative documents.
Background art
Automatic web crawling is performed by a program or script that automatically fetches web information according to certain rules. Such programs are widely used by internet search engines and similar websites to automatically collect the content of every page they can access, so as to obtain or update the content and retrieval methods of those websites. Functionally, a crawler is generally divided into three parts: data acquisition, processing, and storage. A traditional crawler starts from the URLs of one or several initial pages, obtains the URLs found on those pages, and, while crawling, continually extracts new URLs from the current page and puts them into a queue until a stop condition of the system is met. The workflow of a focused crawler is more complex: it filters out links unrelated to the topic according to a web page analysis algorithm, keeps the useful links, and puts them into the queue of URLs waiting to be crawled. It then selects the next URL to crawl from the queue according to a search strategy and repeats the process until a stop condition of the system is reached. In addition, all crawled pages are stored by the system, analyzed, filtered, and indexed for later query and retrieval. For a focused crawler, the results of this analysis may also provide feedback and guidance for subsequent crawling.
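The traditional crawl loop described above can be sketched as a breadth-first traversal over a URL queue. This is only an illustrative sketch: the function name `crawl`, the `get_links` callback, the `max_pages` stop condition, and the in-memory `LINKS` graph are assumptions introduced here so that the example runs without network access; they are not part of the patent.

```python
from collections import deque

def crawl(seed_urls, get_links, max_pages=100):
    """Visit pages breadth-first, queueing newly discovered URLs.

    Stops when the queue is empty or max_pages pages have been visited
    (a stand-in for "a stop condition of the system").
    """
    seeds = list(dict.fromkeys(seed_urls))  # drop duplicate seeds, keep order
    queue = deque(seeds)
    seen = set(seeds)
    visited = []
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        visited.append(url)
        for link in get_links(url):
            if link not in seen:  # only queue URLs not encountered before
                seen.add(link)
                queue.append(link)
    return visited

# Hypothetical in-memory link graph standing in for real web pages.
LINKS = {
    "a": ["b", "c"],
    "b": ["c", "d"],
    "c": ["a"],
    "d": [],
}

order = crawl(["a"], lambda url: LINKS.get(url, []))
print(order)
```

A focused crawler would differ mainly in `get_links`, which would drop links judged unrelated to the topic before they reach the queue.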
Automatic web crawling is a common technology on today's internet. Existing web crawling software mainly targets the news and article content of large websites. Because news and articles are published in a fairly uniform general format, such software can cover most websites and meet the needs of back-end staff. Normative-document information differs from ordinary news information: crawling must be customized to standard-specific elements such as standard number, standard name, status, issue date, and implementation date. Moreover, the page layouts of the websites that publish normative documents are inconsistent, and so is the normative-document data that must be tracked and crawled. A general-purpose approach therefore does not apply, and traditional web crawling software cannot be used to crawl normative documents.
Summary of the invention
The main object of the present invention is to provide a method for implementing intelligent crawl-rule configuration based on normative documents, in order to solve the low efficiency, poor timeliness, and low accuracy of manually tracking and querying normative-document data.
The object of the present invention can be achieved by the following technical solution:
A method for implementing intelligent crawl-rule configuration based on normative documents, comprising the following steps:
Step 1: establish a site acquisition configuration module, an acquisition information editing module, and a site acquisition warning module, and install and configure the web crawling software server;
Step 2: the site acquisition configuration module configures the collection rules of the sites under each category; according to the normative documents to be crawled, the site acquisition configuration module is divided into three categories of normative-document acquisition configuration modules;
Step 3: after the web crawling software starts, it obtains the sites to be crawled from the database and writes their configuration into memory; on the first run after startup it crawls the normative-document information of all sites, and thereafter it crawls the normative-document information of each site according to that site's acquisition frequency;
Step 4: announcement administrators use the acquisition information editing module to check the normative-document content collected by the web crawling software and submit it once it is confirmed to be correct; the acquisition information editing module classifies the submitted normative-document information and automatically determines whether it is new normative-document information or an update to the content of an existing normative document;
Step 5: announcement administrators use the site acquisition warning module to monitor acquisition anomalies of the web crawling software; a site from which no information has been crawled within its configured warning threshold is displayed to the announcement administrators, who reconfigure the collection rules for that site.
Further, the site acquisition configuration module is used to manage the sites to be crawled and to configure the detailed collection rules of each site;
the acquisition information editing module is used to post-process the automatically collected normative-document information; the site acquisition warning module is used to raise an alarm for sites from which no information has been collected for a long time;
the web crawling software is used to receive the site acquisition configuration information and automatically collect normative-document information.
Further, the site acquisition configuration module, the acquisition information editing module, the site acquisition warning module, and the web crawling software are all implemented on servers.
Further, the site acquisition configuration module, the acquisition information editing module, the site acquisition warning module, and the web crawling software all communicate over the internet.
Further, the three categories of normative-document acquisition configuration modules comprise: a formulation/revision plan acquisition configuration module, an exposure draft (draft for comments) acquisition configuration module, and a standard announcement acquisition configuration module.
Further, the formulation/revision plan acquisition configuration module tracks the crawling of normative documents in the project-initiation stage;
the exposure draft acquisition configuration module tracks the crawling of normative documents in the comment-solicitation stage;
the standard announcement acquisition configuration module tracks the crawling of normative documents in the publication stage, the review stage, and the abolition stage;
the sites under each category configure their collection rules according to different templates.
Further, in step 2, the site acquisition configuration module configures the crawl rules of a site through the following steps:
Step 21: add the link and character encoding of the site to be crawled in the site acquisition configuration module; the server side of the site acquisition configuration module sends a request to the server side of the web crawling software, which sends an HTTP request according to the requested link and encoding, obtains the source code of the site, and returns it to the server side of the site acquisition configuration module;
Step 22: the site acquisition configuration module loads the obtained source code into a web page, where the body-text acquisition region is configured by selecting the start position and end position of the body text; after submission, only the source code of the body part is retained;
Step 23: configure the crawl rules for the normative-document content within the body part; for example, to crawl the title of a standard announcement, fill in the start keyword and end keyword that delimit that content; once they are filled in, the crawl can be tested, and the crawled text is highlighted in the source code;
Step 24: configure the crawl frequency of the site, in hours; the time at which the web crawling software starts is taken as the initial time, and a timer with an hourly tick is added; the first run of the service crawls the sites under all categories, and thereafter, at hourly intervals, the sites that are due according to their acquisition frequency are added to the acquisition object, which is released automatically once acquisition completes;
Step 25: configure the warning threshold of the site, in days; each site updates the time of its last successful crawl after a crawl completes; when the time elapsed between a site's last successful crawl and the current time exceeds that site's warning threshold, the system automatically raises a warning for the site.
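Steps 22 and 23 amount to cutting text out of the page source between a start keyword and an end keyword, first for the body region and then for a field inside it. A minimal sketch follows; the function name `extract_between`, the sample `PAGE` markup, and the chosen keywords are hypothetical and only illustrate the idea.

```python
def extract_between(source, start_kw, end_kw):
    """Return the text between the first start_kw and the next end_kw.

    Returns None when either keyword is missing, which a configuration
    screen could report as a failed test of the rule.
    """
    start = source.find(start_kw)
    if start == -1:
        return None
    start += len(start_kw)
    end = source.find(end_kw, start)
    if end == -1:
        return None
    return source[start:end].strip()

# Hypothetical page source; real sites will use different markup.
PAGE = (
    "<html><body><div id='nav'>menu</div>"
    "<div class='content'><h1 class='title'>Announcement No. 1</h1>"
    "<p>Body text</p></div><div id='footer'>footer</div></body></html>"
)

# Step 22: keep only the body region of the source code.
body = extract_between(PAGE, "<div class='content'>", "<div id='footer'>")
# Step 23: crawl one field (the announcement title) inside the body region.
title = extract_between(body, "<h1 class='title'>", "</h1>")
print(title)
```

Restricting the second search to `body` keeps a field rule from matching similar markup in navigation or footer areas.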
Further, in step 3, the web crawling software, through configuration, collects the normative documents of different stages and inserts them into different databases, and automatically de-duplicates the crawled data; the normative documents of different stages comprise those of the project-initiation stage, the comment-solicitation stage, the publication stage, the review stage, and the abolition stage.
Beneficial effects of the invention: the method for implementing intelligent crawl-rule configuration based on normative documents provided by the invention uses pre-configured crawl templates for different types of websites and, based on the text information that a normative document presents at each stage, automatically crawls normative-document information from the internet, saves it to a local library, and automatically updates the index information. It replaces the previous practice of manually tracking normative-document information on each website every day and collecting it by hand, greatly reduces staff workload, improves data accuracy, and improves the timeliness and completeness of the documents in the local standards library.
Brief description of the drawings
Fig. 1 is a flow chart of a preferred embodiment of the method for implementing intelligent crawl-rule configuration based on normative documents according to the invention;
Fig. 2 is a flow chart of the site crawl-rule configuration in a preferred embodiment of the method for implementing intelligent crawl-rule configuration based on normative documents according to the invention.
Detailed description of the embodiments
To make the technical solution of the present invention clearer to those skilled in the art, the invention is described in further detail below with reference to an embodiment and the drawings; embodiments of the invention are not limited thereto.
As shown in Fig. 1, the method for implementing intelligent crawl-rule configuration based on normative documents provided in this embodiment comprises the following steps:
Step 1: establish a site acquisition configuration module, an acquisition information editing module, and a site acquisition warning module, and install and configure the web crawling software server;
Step 2: the site acquisition configuration module configures the collection rules of the sites under each category; according to the normative documents to be crawled, the site acquisition configuration module is divided into three categories of normative-document acquisition configuration modules;
Step 3: after the web crawling software starts, it obtains the sites to be crawled from the database and writes their configuration into memory; on the first run after startup it crawls the normative-document information of all sites, and thereafter it crawls the normative-document information of each site according to that site's acquisition frequency;
Step 4: announcement administrators use the acquisition information editing module to check the normative-document content collected by the web crawling software and submit it once it is confirmed to be correct; the acquisition information editing module classifies the submitted normative-document information and automatically determines whether it is new normative-document information or an update to the content of an existing normative document;
Step 5: announcement administrators use the site acquisition warning module to monitor acquisition anomalies of the web crawling software; a site from which no information has been crawled within its configured warning threshold is displayed to the announcement administrators, who reconfigure the collection rules for that site.
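The decision in step 4 between a new record and an update to an existing one can be illustrated as an insert-or-update keyed on a unique field. The patent does not say which field identifies a normative document; using the standard number as the key is an assumption made for this sketch, as are the function name `submit`, the field names, and the placeholder number `STD-0001`.

```python
def submit(library, record):
    """Store a checked record and report whether it was new or an update.

    Assumption: the standard number uniquely identifies a normative
    document. The source does not specify the matching key.
    """
    key = record["standard_no"]
    action = "update" if key in library else "new"
    # Merge so that fields missing from the update keep their old values.
    library[key] = {**library.get(key, {}), **record}
    return action

library = {}  # stand-in for the local standards library
first = submit(library, {"standard_no": "STD-0001",
                         "name": "Example standard",
                         "status": "draft"})
second = submit(library, {"standard_no": "STD-0001",
                          "status": "published"})
print(first, second, library["STD-0001"])
```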
Further, in this embodiment, the site acquisition configuration module is used to manage the sites to be crawled and to configure the detailed collection rules of each site;
the acquisition information editing module is used to post-process the automatically collected normative-document information; the site acquisition warning module is used to raise an alarm for sites from which no information has been collected for a long time;
the web crawling software is used to receive the site acquisition configuration information and automatically collect normative-document information.
Further, in this embodiment, the site acquisition configuration module, the acquisition information editing module, the site acquisition warning module, and the web crawling software are all implemented on servers, and they all communicate over the internet.
Further, in this embodiment, the three categories of normative-document acquisition configuration modules comprise: a formulation/revision plan acquisition configuration module, an exposure draft (draft for comments) acquisition configuration module, and a standard announcement acquisition configuration module;
the formulation/revision plan acquisition configuration module tracks the crawling of normative documents in the project-initiation stage;
the exposure draft acquisition configuration module tracks the crawling of normative documents in the comment-solicitation stage;
the standard announcement acquisition configuration module tracks the crawling of normative documents in the publication stage, the review stage, and the abolition stage;
the sites under each category configure their collection rules according to different templates.
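The relationship between the five lifecycle stages and the three configuration categories can be written down as a simple lookup. The enum `Category`, the stage identifiers, and the function `category_for` below are hypothetical names chosen for illustration; only the stage-to-category grouping itself comes from the text.

```python
from enum import Enum

class Category(Enum):
    PLAN = "formulation/revision plan"
    DRAFT = "exposure draft"
    ANNOUNCEMENT = "standard announcement"

# One entry per lifecycle stage named in the text.
STAGE_TO_CATEGORY = {
    "project_initiation": Category.PLAN,
    "comment_solicitation": Category.DRAFT,
    "publication": Category.ANNOUNCEMENT,
    "review": Category.ANNOUNCEMENT,
    "abolition": Category.ANNOUNCEMENT,
}

def category_for(stage):
    """Return the configuration category responsible for a stage."""
    return STAGE_TO_CATEGORY[stage]

print(category_for("review").value)
```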
Further, in this embodiment, as shown in Fig. 2, in step 2 the site acquisition configuration module configures the crawl rules of a site through the following steps:
Step 21: add the link and character encoding of the site to be crawled in the site acquisition configuration module; the server side of the site acquisition configuration module sends a request to the server side of the web crawling software, which sends an HTTP request according to the requested link and encoding, obtains the source code of the site, and returns it to the server side of the site acquisition configuration module;
Step 22: the site acquisition configuration module loads the obtained source code into a web page, where the body-text acquisition region is configured by selecting the start position and end position of the body text; after submission, only the source code of the body part is retained;
Step 23: configure the crawl rules for the normative-document content within the body part; for example, to crawl the title of a standard announcement, fill in the start keyword and end keyword that delimit that content; once they are filled in, the crawl can be tested, and the crawled text is highlighted in the source code;
Step 24: configure the crawl frequency of the site, in hours; the time at which the web crawling software starts is taken as the initial time, and a timer with an hourly tick is added; the first run of the service crawls the sites under all categories, and thereafter, at hourly intervals, the sites that are due according to their acquisition frequency are added to the acquisition object, which is released automatically once acquisition completes;
Step 25: configure the warning threshold of the site, in days; each site updates the time of its last successful crawl after a crawl completes; when the time elapsed between a site's last successful crawl and the current time exceeds that site's warning threshold, the system automatically raises a warning for the site.
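Steps 24 and 25 describe two time-based checks: which sites are due on a given hourly tick, and which sites have gone too long without a successful crawl. The sketch below is one possible reading. The source does not spell out how "due according to the acquisition frequency" is computed, so treating a site as due when the whole hours elapsed since startup are a multiple of its frequency is an assumption, as are the function and field names.

```python
from datetime import datetime, timedelta

def due_sites(sites, start_time, now):
    """Names of sites to crawl on the current hourly tick.

    Assumption: a site is due when the whole hours elapsed since
    startup are a multiple of its crawl frequency (in hours).
    The first run (zero hours elapsed) crawls every site.
    """
    elapsed_hours = int((now - start_time).total_seconds() // 3600)
    if elapsed_hours == 0:
        return [s["name"] for s in sites]
    return [s["name"] for s in sites
            if elapsed_hours % s["frequency_hours"] == 0]

def overdue_sites(sites, now):
    """Names of sites whose last successful crawl exceeds the threshold (days)."""
    return [s["name"] for s in sites
            if now - s["last_success"] > timedelta(days=s["warning_days"])]

start = datetime(2017, 10, 31, 8, 0)
sites = [
    {"name": "site-a", "frequency_hours": 1, "warning_days": 3,
     "last_success": datetime(2017, 10, 31, 8, 0)},
    {"name": "site-b", "frequency_hours": 6, "warning_days": 1,
     "last_success": datetime(2017, 10, 29, 8, 0)},
]

print(due_sites(sites, start, start + timedelta(hours=5)))
print(overdue_sites(sites, start + timedelta(hours=6)))
```

In a running service, `last_success` would be updated after each completed crawl, so a site leaves the overdue list as soon as its rules are fixed and a crawl succeeds.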
Further, in this embodiment, in step 3 the web crawling software, through configuration, collects the normative documents of different stages and inserts them into different databases, and automatically de-duplicates the crawled data; the normative documents of different stages comprise those of the project-initiation stage, the comment-solicitation stage, the publication stage, the review stage, and the abolition stage.
Further, in this embodiment, the region crawl configuration methods used are as follows:
Body-text crawl configuration: first locate where the body text appears in the HTML, for example with the browser's developer tools, where selecting the text shows which DOM node it appears under. Then find special character strings in the context of that DOM node (strings that occur only once in the HTML) and set them as the start string and end string of the body-text crawl configuration; these delimit the body-text region.
List crawl configuration: locate the DOM node that contains the list in the HTML, which is normally inside a tag such as table or ul. Analyze the tags that contain the content to be crawled, which generally repeat as td, li, and so on, and identify what these tags have in common. Select the special character strings before and after the content to be crawled within these tags and set them as the start string and end string of the crawled content; these delimit the region for the list crawl configuration.
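For list pages, the same start-string and end-string pair is applied repeatedly, once per repeating tag. The following sketch shows that loop on a small sample; `extract_all_between`, the `LIST_HTML` markup, and the delimiter strings are hypothetical examples rather than anything prescribed by the patent.

```python
def extract_all_between(region, start_str, end_str):
    """Collect every piece of text delimited by start_str and end_str."""
    items = []
    pos = 0
    while True:
        start = region.find(start_str, pos)
        if start == -1:
            break
        start += len(start_str)
        end = region.find(end_str, start)
        if end == -1:
            break
        items.append(region[start:end].strip())
        pos = end + len(end_str)  # continue searching after this item
    return items

# Hypothetical list region, already cut down to the ul node.
LIST_HTML = (
    "<ul><li><a href='/s/1'>Standard A</a></li>"
    "<li><a href='/s/2'>Standard B</a></li></ul>"
)

links = extract_all_between(LIST_HTML, "<a href='", "'>")
titles = extract_all_between(LIST_HTML, "'>", "</a>")
print(list(zip(links, titles)))
```

The delimiters must not occur elsewhere in the region. Here `'>` is safe only because the `ul` and `li` tags carry no quoted attributes, which is why the text asks for strings that are characteristic of the repeating tags.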
Further, in this embodiment, the method used to de-duplicate crawled data is as follows:
When a new site is added and its crawl rules have been configured, the crawling software performs a full crawl the first time it crawls the site: it crawls the page information of every link under the site and inserts all link addresses into the database. On each later crawl, it first collects all links under the site and compares them with the crawled links saved in the database to obtain the links newly added under the site. The crawling software crawls only the page information of the new links and inserts it into the database, which ensures that the information stored on each crawl contains no duplicate data.
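The link-comparison de-duplication described above reduces to a set difference between the links currently on the site and the links already stored. A minimal sketch, with an in-memory `set` standing in for the database table of crawled links and with hypothetical link values:

```python
def new_links(current_links, stored_links):
    """Links present on the site now but not yet stored, in page order."""
    unique_in_order = dict.fromkeys(current_links)  # drop repeats on the page
    return [link for link in unique_in_order if link not in stored_links]

stored = set()  # stand-in for the database of already crawled links

# First crawl of a newly added site: every link is new (full crawl).
first_pass = new_links(["/s/1", "/s/2"], stored)
stored.update(first_pass)

# Later crawl: only links not seen before are fetched and stored.
second_pass = new_links(["/s/3", "/s/1", "/s/2", "/s/3"], stored)
stored.update(second_pass)

print(first_pass, second_pass)
```

This compares link addresses only, as the text describes, so a page whose content changes under an unchanged link would not be crawled again by this check alone.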
In summary, the method for implementing intelligent crawl-rule configuration based on normative documents provided in this embodiment uses pre-configured crawl templates for different types of websites and, based on the text information that a normative document presents at each stage, automatically crawls normative-document information from the internet, saves it to a local library, and automatically updates the index information. It replaces the previous practice of manually tracking normative-document information on each website every day and collecting it by hand, greatly reduces staff workload, improves data accuracy, and improves the timeliness and completeness of the documents in the local standards library.
The above is only a further embodiment of the present invention, and the scope of protection of the invention is not limited thereto. Any equivalent substitution or change made by a person skilled in the art, within the scope disclosed by the invention and according to its technical solution and concept, falls within the scope of protection of the invention.
Claims (8)
1. A method for implementing intelligent crawl-rule configuration based on normative documents, characterized by comprising the following steps:
Step 1: establish a site acquisition configuration module, an acquisition information editing module, and a site acquisition warning module, and install and configure the web crawling software server;
Step 2: the site acquisition configuration module configures the collection rules of the sites under each category; according to the normative documents to be crawled, the site acquisition configuration module is divided into three categories of normative-document acquisition configuration modules;
Step 3: after the web crawling software starts, it obtains the sites to be crawled from the database and writes their configuration into memory; on the first run after startup it crawls the normative-document information of all sites, and thereafter it crawls the normative-document information of each site according to that site's acquisition frequency;
Step 4: announcement administrators use the acquisition information editing module to check the normative-document content collected by the web crawling software and submit it once it is confirmed to be correct; the acquisition information editing module classifies the submitted normative-document information and automatically determines whether it is new normative-document information or an update to the content of an existing normative document;
Step 5: announcement administrators use the site acquisition warning module to monitor acquisition anomalies of the web crawling software; a site from which no information has been crawled within its configured warning threshold is displayed to the announcement administrators, who reconfigure the collection rules for that site.
2. The method for implementing intelligent crawl-rule configuration based on normative documents according to claim 1, characterized in that the site acquisition configuration module is used to manage the sites to be crawled and to configure the detailed collection rules of each site;
the acquisition information editing module is used to post-process the automatically collected normative-document information; the site acquisition warning module is used to raise an alarm for sites from which no information has been collected for a long time;
the web crawling software is used to receive the site acquisition configuration information and automatically collect normative-document information.
3. The method for implementing intelligent crawl-rule configuration based on normative documents according to claim 1, characterized in that the site acquisition configuration module, the acquisition information editing module, the site acquisition warning module, and the web crawling software are all implemented on servers.
4. The method for implementing intelligent crawl-rule configuration based on normative documents according to claim 1, characterized in that the site acquisition configuration module, the acquisition information editing module, the site acquisition warning module, and the web crawling software all communicate over the internet.
5. The method for implementing intelligent crawl-rule configuration based on normative documents according to claim 1, characterized in that the three categories of normative-document acquisition configuration modules comprise: a formulation/revision plan acquisition configuration module, an exposure draft acquisition configuration module, and a standard announcement acquisition configuration module.
6. The method for implementing intelligent crawl-rule configuration based on normative documents according to claim 5, characterized in that the formulation/revision plan acquisition configuration module tracks the crawling of normative documents in the project-initiation stage;
the exposure draft acquisition configuration module tracks the crawling of normative documents in the comment-solicitation stage;
the standard announcement acquisition configuration module tracks the crawling of normative documents in the publication stage, the review stage, and the abolition stage;
the sites under each category configure their collection rules according to different templates.
7. The method for implementing intelligent crawl-rule configuration based on normative documents according to claim 1, characterized in that, in step 2, the site acquisition configuration module configures the crawl rules of a site through the following steps:
Step 21: add the link and character encoding of the site to be crawled in the site acquisition configuration module; the server side of the site acquisition configuration module sends a request to the server side of the web crawling software, which sends an HTTP request according to the requested link and encoding, obtains the source code of the site, and returns it to the server side of the site acquisition configuration module;
Step 22: the site acquisition configuration module loads the obtained source code into a web page, where the body-text acquisition region is configured by selecting the start position and end position of the body text; after submission, only the source code of the body part is retained;
Step 23: configure the crawl rules for the normative-document content within the body part; for example, to crawl the title of a standard announcement, fill in the start keyword and end keyword that delimit that content; once they are filled in, the crawl can be tested, and the crawled text is highlighted in the source code;
Step 24: configure the crawl frequency of the site, in hours; the time at which the web crawling software starts is taken as the initial time, and a timer with an hourly tick is added; the first run of the service crawls the sites under all categories, and thereafter, at hourly intervals, the sites that are due according to their acquisition frequency are added to the acquisition object, which is released automatically once acquisition completes;
Step 25: configure the warning threshold of the site, in days; each site updates the time of its last successful crawl after a crawl completes; when the time elapsed between a site's last successful crawl and the current time exceeds that site's warning threshold, the system automatically raises a warning for the site.
8. The method for implementing intelligent crawl-rule configuration based on normative documents according to claim 1, characterized in that, in step 3, the web crawling software, through configuration, collects the normative documents of different stages and inserts them into different databases, and automatically de-duplicates the crawled data; the normative documents of different stages comprise those of the project-initiation stage, the comment-solicitation stage, the publication stage, the review stage, and the abolition stage.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201711048560.4A CN110020050B (en) | 2017-10-31 | 2017-10-31 | Method for realizing intelligent capture rule configuration technology based on standard documents |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201711048560.4A CN110020050B (en) | 2017-10-31 | 2017-10-31 | Method for realizing intelligent capture rule configuration technology based on standard documents |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110020050A true CN110020050A (en) | 2019-07-16 |
CN110020050B CN110020050B (en) | 2022-11-15 |
Family
ID=67186727
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201711048560.4A Active CN110020050B (en) | 2017-10-31 | 2017-10-31 | Method for realizing intelligent capture rule configuration technology based on standard documents |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110020050B (en) |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2003186901A (en) * | 2001-12-21 | 2003-07-04 | Nippon Telegr & Teleph Corp <Ntt> | Web SITE RETRIEVAL METHOD AND SYSTEM, EXECUTION PROGRAM FOR THE METHOD, AND RECORDING MEDIUM WITH ITS PROGRAM RECORDED THEREON |
US20060206448A1 (en) * | 2005-03-11 | 2006-09-14 | Adam Hyder | System and method for improved job seeking |
CN101986294A (en) * | 2010-10-18 | 2011-03-16 | 林桢 | Internet Web 2.0 platform-based on-line document management system |
CN103699537A (en) * | 2012-09-28 | 2014-04-02 | 中国石油天然气股份有限公司 | Establishment method for natural gas and pipe technical standard bibliographic database |
CN106294422A (en) * | 2015-05-25 | 2017-01-04 | 《中国学术期刊(光盘版)》电子杂志社有限公司 | The method that a kind of document based on webpage batch is downloaded |
Also Published As
Publication number | Publication date |
---|---|
CN110020050B (en) | 2022-11-15 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN101971172B (en) | Mobile sitemaps | |
CN106021257B (en) | A kind of crawler capturing data method, apparatus and system for supporting online programming | |
CN110245035A (en) | A kind of link trace method and device | |
CN102760151B (en) | Implementation method of open source software acquisition and searching system | |
CN107947940A (en) | A kind of method and device of data exchange | |
CN103955463B (en) | A kind of policy destructing method and system of government | |
CN108052632A (en) | A kind of method for obtaining network information, system and company information search system | |
CN107220142A (en) | Perform the method and device of data recovery operation | |
CN101484892B (en) | A method of managing web services using integrated document | |
US5905979A (en) | Abstract manager system and method for managing an abstract database | |
CN104318481A (en) | Power-grid-operation-oriented holographic time scale measurement data extraction conversion method | |
CN109829096A (en) | A kind of collecting method, device, electronic equipment and storage medium | |
CN101610265A (en) | A kind of flow process recognition methods of Business Works | |
CN110866273A (en) | Inter-enterprise standard consensus method based on block chain and interplanetary file system | |
CN104407901A (en) | Code adding method and device | |
CN104778078A (en) | Content management system and information content issuing method | |
CN106802928B (en) | Power grid historical data management method and system | |
CN108039960A (en) | Configuration information delivery method and server | |
CN102902794A (en) | Web page classification system and method | |
CN104965904B (en) | A kind of grasping means of multi-platform data and device | |
CN109657908A (en) | A kind of visiting method, system and computer readable storage medium | |
CN110020050A (en) | A kind of intelligent grabbing rule configuration technology implementation method based on normative document | |
CN102902737B (en) | A kind of network image is independently collected and screening technique | |
CN108205548A (en) | A kind of Web Spider structure and its method of work based on agriculture webpage information acquisition | |
CN107122403A (en) | A kind of webpage academic report information extraction method and system |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||