CN104462431A - Method for crawling web page recruitment information - Google Patents
Method for crawling web page recruitment information
- Publication number
- CN104462431A (application CN201410774571.0A)
- Authority
- CN
- China
- Prior art keywords
- recruitment
- information
- recruitment information
- web page
- page
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/951—Indexing; Web crawling techniques
Landscapes
- Engineering & Computer Science (AREA)
- Databases & Information Systems (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Information Transfer Between Computers (AREA)
Abstract
The invention relates to a method for crawling web page recruitment information, which solves the difficulty of collecting web page recruitment information and storing it in a database. A certain number of recruitment websites currently exist on the internet. Enterprises recruit mainly by publishing recruitment information on these websites, and likewise the main way applicants find work is through the recruitment information that enterprises publish there. This recruitment information reflects, to a certain extent, the demands and changes of the current social and economic structure; if it is processed and analyzed scientifically, more targeted policy adjustment and talent cultivation can be achieved.
Description
Technical field
The present invention relates to a computer application, specifically a method for crawling web page recruitment information.
Background technology
With the spread of the internet, the carrier of recruitment information has gradually shifted from paper newspapers and periodicals to recruitment websites of all kinds. Recruitment websites have now become the main channel through which enterprises publish, and applicants obtain, recruitment information. To recruit high-end talent, enterprises publish recruitment information on different recruitment websites; to find satisfactory work, applicants search different websites for it. As the number of recruitment websites keeps growing, the volume of recruitment information grows as well, and its content varies with the position and the enterprise. This creates a number of difficulties for collection, as follows:
1. Pages are irregular, so the extraction rules vary;
2. As the data volume keeps growing, page addresses keep changing;
3. Site information is updated quickly.
Summary of the invention
The object of the invention is to provide a method for crawling web page recruitment information.
The invention targets the collection of all kinds of recruitment information on recruitment websites, chiefly because recruitment websites have become the main channel through which enterprises publish, and applicants obtain, recruitment information. Recruitment information of every category on a recruitment website is collected according to the rules for collecting internet data. The object of the invention is achieved in the following manner; the specific steps are as follows:
1) Install the collection software and a packet capture tool;
2) Analyze the recruitment website addresses and find the address of each category of recruitment information;
3) Obtain paging information with the packet capture tool and configure the related tools to carry out data collection;
4) Find the mainstream recruitment websites to be collected from the internet;
5) Use the packet capture tool to obtain the page addresses of each category of recruitment information;
6) Analyze the pages and find the page rules for the recruitment information to be captured;
7) Collect information by configuring the analyzed rules;
8) Store the collected data in a database.
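Steps 6 to 8 can be illustrated with a minimal Python sketch. The patent does not name any collection software, rule format, or database, so everything here is an assumption made for illustration: the sample HTML, the regex-based `RULE` dictionary, the helper names `extract_postings` and `store_postings`, and the use of an in-memory SQLite database.

```python
import re
import sqlite3

# Hypothetical page fragment standing in for a recruitment listing page.
SAMPLE_HTML = """
<div class="job"><h2 class="title">Data Engineer</h2><span class="company">Acme Ltd</span><span class="city">Jinan</span></div>
<div class="job"><h2 class="title">Web Developer</h2><span class="company">Example Co</span><span class="city">Qingdao</span></div>
"""

# Hypothetical "page rule" (step 6): one regex isolates each posting, and
# one regex per field captures its value in a single group.
RULE = {
    "item": r'<div class="job">(.*?)</div>',
    "fields": {
        "title": r'<h2 class="title">(.*?)</h2>',
        "company": r'<span class="company">(.*?)</span>',
        "city": r'<span class="city">(.*?)</span>',
    },
}


def extract_postings(html, rule):
    """Apply the configured rule to a page and return one dict per posting (step 7)."""
    postings = []
    for block in re.findall(rule["item"], html, flags=re.S):
        record = {}
        for name, pattern in rule["fields"].items():
            match = re.search(pattern, block, flags=re.S)
            # A field missing from the page is stored as None rather than guessed.
            record[name] = match.group(1).strip() if match else None
        postings.append(record)
    return postings


def store_postings(conn, postings):
    """Write the collected records to a database table (step 8)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS postings (title TEXT, company TEXT, city TEXT)"
    )
    conn.executemany(
        "INSERT INTO postings (title, company, city) VALUES (:title, :company, :city)",
        postings,
    )
    conn.commit()


conn = sqlite3.connect(":memory:")
postings = extract_postings(SAMPLE_HTML, RULE)
store_postings(conn, postings)
```

Keeping the rule as data rather than code is one way to cope with irregular pages: a site whose layout differs would need only a different rule dictionary, not a different extractor.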
The beneficial effect of the invention is that it solves the problems that web page recruitment information is difficult to collect and difficult to store in a database. A certain number of recruitment websites currently exist on the internet; enterprises now recruit mainly by publishing recruitment information on these websites, and likewise the main way applicants find work is through the recruitment information that enterprises publish there. This recruitment information reflects, to a certain extent, the demands and changes of the social and economic structure. If it is processed and analyzed scientifically, more targeted policy adjustment and talent cultivation can be achieved.
Brief description of the drawings
Fig. 1 is a flow chart of crawling web page recruitment information.
Embodiment
The method of the invention is described in detail below with reference to the accompanying drawing.
Because different recruitment websites have different addresses, and different categories of recruitment information in particular have different addresses, data collection for recruitment information is carried out in the following steps:
1) Install the collection software and a packet capture tool;
2) Analyze the recruitment website addresses and find the address of each category of recruitment information;
3) Obtain paging information with the packet capture tool and configure the related tools to carry out data collection;
4) Find the mainstream recruitment websites to be collected from the internet;
5) Use the packet capture tool to obtain the page addresses of each category of recruitment information;
6) Analyze the pages and find the page rules for the recruitment information to be captured;
7) Collect information by configuring the analyzed rules;
8) Store the collected data in a database.
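The address-related steps (2, 3 and 5) amount to building a list of page addresses for each category once the paging scheme is known. The sketch below shows one possible way to do that in Python. The category addresses, the `page` query parameter, and the helper names `build_page_urls` and `build_crawl_plan` are all hypothetical; in practice these values would come from inspecting the site's requests with a packet capture tool, which the patent does not specify further.

```python
from urllib.parse import urlencode, urlsplit, urlunsplit

# Hypothetical category addresses (step 2); real ones are site-specific.
CATEGORY_URLS = {
    "engineering": "https://jobs.example.com/list?category=eng",
    "sales": "https://jobs.example.com/list?category=sales",
}

# Hypothetical name of the paging parameter observed in captured requests (step 3).
PAGE_PARAM = "page"


def build_page_urls(base_url, page_param, first_page, last_page):
    """Return the page addresses for one category, first_page..last_page inclusive."""
    parts = urlsplit(base_url)
    urls = []
    for page in range(first_page, last_page + 1):
        # Append the paging parameter to any query string already present.
        extra = urlencode({page_param: page})
        query = f"{parts.query}&{extra}" if parts.query else extra
        urls.append(
            urlunsplit((parts.scheme, parts.netloc, parts.path, query, parts.fragment))
        )
    return urls


def build_crawl_plan(category_urls, page_param, pages_per_category):
    """Map each category name to the list of page addresses to collect (step 5)."""
    return {
        name: build_page_urls(url, page_param, 1, pages_per_category)
        for name, url in category_urls.items()
    }


plan = build_crawl_plan(CATEGORY_URLS, PAGE_PARAM, 3)
```

Because page addresses change as the data volume grows, regenerating the plan from the category addresses on each run is likely safer than storing individual page addresses between runs.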
Except for the technical features described in the specification, everything is known technology to those skilled in the art.
Claims (1)
1. A method for crawling web page recruitment information, characterized in that the specific steps are as follows:
1) Install the collection software and a packet capture tool;
2) Analyze the recruitment website addresses and find the address of each category of recruitment information;
3) Obtain paging information with the packet capture tool and configure the related tools to carry out data collection;
4) Find the mainstream recruitment websites to be collected from the internet;
5) Use the packet capture tool to obtain the page addresses of each category of recruitment information;
6) Analyze the pages and find the page rules for the recruitment information to be captured;
7) Collect information by configuring the analyzed rules;
8) Store the collected data in a database.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201410774571.0A CN104462431A (en) | 2014-12-16 | 2014-12-16 | Method for crawling web page recruitment information |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201410774571.0A CN104462431A (en) | 2014-12-16 | 2014-12-16 | Method for crawling web page recruitment information |
Publications (1)
Publication Number | Publication Date |
---|---|
CN104462431A (en) | 2015-03-25 |
Family
ID=52908466
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201410774571.0A Pending CN104462431A (en) | 2014-12-16 | 2014-12-16 | Method for crawling web page recruitment information |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN104462431A (en) |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105512864A (en) * | 2016-01-28 | 2016-04-20 | 丁沂 | Method for automatically acquiring post professional ability requirements based on internet |
CN107203872A (en) * | 2017-05-26 | 2017-09-26 | 山东省科学院情报研究所 | Region demand for talent based on big data quantifies analysis method |
CN108733827A (en) * | 2018-05-24 | 2018-11-02 | 佛山市轻遣网络有限公司 | A kind of recruitment information acquisition methods outside recruitment website and system |
CN112506986A (en) * | 2020-11-19 | 2021-03-16 | 阿坝师范学院 | Specific professional talent skill requirement mining system based on web recruitment information |
CN113254745A (en) * | 2021-04-28 | 2021-08-13 | 深圳格隆汇信息科技有限公司 | Economic information collection system, method, computer device and storage medium |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2004145695A (en) * | 2002-10-25 | 2004-05-20 | Matsushita Electric Ind Co Ltd | Filtering information processing system |
CN101452463A (en) * | 2007-12-05 | 2009-06-10 | 浙江大学 | Method and apparatus for directionally grabbing page resource |
CN101957866A (en) * | 2010-10-25 | 2011-01-26 | 中国农业大学 | Network text information integration method and device |
CN102184227A (en) * | 2011-05-10 | 2011-09-14 | 北京邮电大学 | General crawler engine system used for WEB service and working method thereof |
CN103186613A (en) * | 2011-12-30 | 2013-07-03 | 大连天维科技有限公司 | Movie and television resource aggregation system |
Non-Patent Citations (1)
Title |
---|
HIHEIHEICDN: "招聘信息抓取系统", 《HTTP://BLOG.CSDN.NET/HIHEIHEICDN/ARTICLE/DETAILS/6470642》 * |
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105512864A (en) * | 2016-01-28 | 2016-04-20 | 丁沂 | Method for automatically acquiring post professional ability requirements based on internet |
CN107203872A (en) * | 2017-05-26 | 2017-09-26 | 山东省科学院情报研究所 | Region demand for talent based on big data quantifies analysis method |
CN107203872B (en) * | 2017-05-26 | 2020-06-02 | 山东省科学院情报研究所 | Regional talent demand quantitative analysis method based on big data |
CN108733827A (en) * | 2018-05-24 | 2018-11-02 | 佛山市轻遣网络有限公司 | A kind of recruitment information acquisition methods outside recruitment website and system |
CN112506986A (en) * | 2020-11-19 | 2021-03-16 | 阿坝师范学院 | Specific professional talent skill requirement mining system based on web recruitment information |
CN113254745A (en) * | 2021-04-28 | 2021-08-13 | 深圳格隆汇信息科技有限公司 | Economic information collection system, method, computer device and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
WD01 | Invention patent application deemed withdrawn after publication | | Application publication date: 20150325 |