CN104462431A - Method for crawling web page recruitment information - Google Patents

Info

Publication number
CN104462431A
Authority
CN
China
Prior art keywords
recruitment
information
recruitment information
web page
page
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201410774571.0A
Other languages
Chinese (zh)
Inventor
邱继钊
于治楼
范莹
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Inspur Software Group Co Ltd
Original Assignee
Inspur Software Group Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Inspur Software Group Co Ltd
Priority to CN201410774571.0A
Publication of CN104462431A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/951 Indexing; Web crawling techniques

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Transfer Between Computers (AREA)

Abstract

The invention relates to a method for crawling web page recruitment information, which solves the problem that such information is difficult to collect and to store in a database. A certain number of recruitment websites exist on the internet today; enterprises recruit mainly by publishing recruitment information on these websites, and likewise applicants find work mainly through the recruitment information that enterprises publish there. This recruitment information reflects, to a certain extent, the demands and changes of the current social and economic structure; if it is processed and analyzed scientifically, more targeted policy adjustment and talent cultivation can be achieved.

Description

A method for crawling web page recruitment information
Technical field
The present invention relates to a computer application, specifically a method for crawling web page recruitment information.
Background
With the spread of the internet, the carrier of recruitment information has gradually shifted from printed newspapers and periodicals to the various recruitment websites on the internet. Recruitment websites have now become the main channel through which enterprises publish, and applicants obtain, recruitment information. To recruit top-tier talent, enterprises publish recruitment information on several recruitment websites, and applicants visit several websites in search of a satisfactory job. As the number of recruitment websites grows, the volume of recruitment information keeps growing too, and its content varies with the position and the enterprise. This creates a number of difficulties for collection:
1. Pages are irregular, so the extraction rules vary from page to page;
2. As the data volume keeps growing, page addresses keep changing;
3. Site information is updated quickly.
Summary of the invention
The object of the invention is to provide a method for crawling web page recruitment information.
The invention aims to collect the various kinds of recruitment information on recruitment websites, because these websites have become the main channel through which enterprises publish, and applicants obtain, recruitment information. The various kinds of recruitment information on a recruitment website are collected according to the rules for collecting internet data. The object of the invention is achieved through the following steps:
1) install the collection software and a packet capture tool;
2) analyze the addresses of the recruitment websites and find the address of each category of recruitment information;
3) obtain the paging information with the packet capture tool, and configure the relevant tools to carry out data collection;
4) find the mainstream recruitment websites to be collected from the internet;
5) use the packet capture tool to obtain the page addresses of each category of recruitment information;
6) analyze the pages and find the page rules for the recruitment information to be captured;
7) collect the information using the configured rules obtained from the analysis;
8) store the collected data in a database.
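Steps 6) and 7) can be illustrated with a minimal sketch. The patent does not name the collection software or the rule format, so everything here is an assumption: the rule set is written as one regular expression per field, and the field names, class names, and sample page are all hypothetical.

```python
import re

# Hypothetical per-site rule set: field name -> regex with one capture group.
# These patterns match the made-up sample page below, not any real site.
EXAMPLE_RULES = {
    "title": r'<h1 class="job-title">(.*?)</h1>',
    "company": r'<span class="company">(.*?)</span>',
    "salary": r'<span class="salary">(.*?)</span>',
}


def extract_posting(html, rules):
    """Apply each configured rule to the page; fields with no match become None."""
    record = {}
    for field, pattern in rules.items():
        match = re.search(pattern, html, flags=re.DOTALL)
        record[field] = match.group(1).strip() if match else None
    return record


# A made-up recruitment page that deliberately lacks a salary field.
SAMPLE_PAGE = """
<html><body>
  <h1 class="job-title"> Java Developer </h1>
  <span class="company">Example Co</span>
</body></html>
"""

posting = extract_posting(SAMPLE_PAGE, EXAMPLE_RULES)
```

Keeping the rules as data rather than code is one way to cope with irregular pages: when a site changes its layout, only that site's rule set needs to be reconfigured.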
The beneficial effect of the invention is that it solves the problem that web page recruitment information is difficult to collect and difficult to store in a database. A certain number of recruitment websites exist on the internet today; enterprises recruit mainly by publishing recruitment information on these websites, and likewise applicants find work mainly through the recruitment information that enterprises publish there. This recruitment information reflects, to a certain extent, the demands and changes of the social and economic structure; if it is processed and analyzed scientifically, more targeted policy adjustment and talent cultivation can be achieved.
Brief description of the drawings
Fig. 1 is a flowchart of crawling web page recruitment information.
Embodiment
The method of the invention is described in detail below with reference to the accompanying drawing.
Because different recruitment websites have different addresses, and different categories of recruitment information in particular have different addresses, data collection of recruitment information is carried out in the following steps:
1) install the collection software and a packet capture tool;
2) analyze the addresses of the recruitment websites and find the address of each category of recruitment information;
3) obtain the paging information with the packet capture tool, and configure the relevant tools to carry out data collection;
4) find the mainstream recruitment websites to be collected from the internet;
5) use the packet capture tool to obtain the page addresses of each category of recruitment information;
6) analyze the pages and find the page rules for the recruitment information to be captured;
7) collect the information using the configured rules obtained from the analysis;
8) store the collected data in a database.
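As a rough illustration of steps 3), 5) and 8), the sketch below builds the listing addresses for one category and stores collected records in a database. It assumes the paging parameter has already been identified (for example with the packet capture tool) and that the site pages by a simple query parameter. The URL, the parameter names, the table layout, and the choice of SQLite are all hypothetical; the patent specifies none of them.

```python
import sqlite3
from urllib.parse import urlencode


def listing_page_urls(base_url, category, pages, page_param="page"):
    """Build one listing URL per page number for a category (assumed URL scheme)."""
    return [
        f"{base_url}?{urlencode({'category': category, page_param: n})}"
        for n in range(1, pages + 1)
    ]


def store_postings(conn, postings):
    """Insert postings, skipping URLs already stored; return the total row count."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS postings ("
        "url TEXT PRIMARY KEY, title TEXT, company TEXT)"
    )
    conn.executemany(
        "INSERT OR IGNORE INTO postings (url, title, company) "
        "VALUES (:url, :title, :company)",
        postings,
    )
    conn.commit()
    return conn.execute("SELECT COUNT(*) FROM postings").fetchone()[0]


# Hypothetical site and category; no network request is made here.
urls = listing_page_urls("https://jobs.example.com/list", "software", 3)

# An in-memory database stands in for the real store.
conn = sqlite3.connect(":memory:")
rows = [
    {"url": "https://jobs.example.com/p/1", "title": "Java Developer", "company": "Example Co"},
    {"url": "https://jobs.example.com/p/1", "title": "Java Developer", "company": "Example Co"},
    {"url": "https://jobs.example.com/p/2", "title": "Tester", "company": "Example Co"},
]
stored = store_postings(conn, rows)
```

Using the posting URL as the primary key means a repeated crawl of a fast-changing site does not store the same posting twice.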
Apart from the technical features described in the specification, everything else is known to those skilled in the art.

Claims (1)

1. A method for crawling web page recruitment information, characterized in that the specific steps are as follows:
1) install the collection software and a packet capture tool;
2) analyze the addresses of the recruitment websites and find the address of each category of recruitment information;
3) obtain the paging information with the packet capture tool, and configure the relevant tools to carry out data collection;
4) find the mainstream recruitment websites to be collected from the internet;
5) use the packet capture tool to obtain the page addresses of each category of recruitment information;
6) analyze the pages and find the page rules for the recruitment information to be captured;
7) collect the information using the configured rules obtained from the analysis;
8) store the collected data in a database.
CN201410774571.0A 2014-12-16 2014-12-16 Method for crawling web page recruitment information Pending CN104462431A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410774571.0A CN104462431A (en) 2014-12-16 2014-12-16 Method for crawling web page recruitment information

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201410774571.0A CN104462431A (en) 2014-12-16 2014-12-16 Method for crawling web page recruitment information

Publications (1)

Publication Number Publication Date
CN104462431A (en) 2015-03-25

Family

ID=52908466

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201410774571.0A Pending CN104462431A (en) 2014-12-16 2014-12-16 Method for crawling web page recruitment information

Country Status (1)

Country Link
CN (1) CN104462431A (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2004145695A (en) * 2002-10-25 2004-05-20 Matsushita Electric Ind Co Ltd Filtering information processing system
CN101452463A (en) * 2007-12-05 2009-06-10 浙江大学 Method and apparatus for directionally grabbing page resource
CN101957866A (en) * 2010-10-25 2011-01-26 中国农业大学 Network text information integration method and device
CN102184227A (en) * 2011-05-10 2011-09-14 北京邮电大学 General crawler engine system used for WEB service and working method thereof
CN103186613A (en) * 2011-12-30 2013-07-03 大连天维科技有限公司 Movie and television resource aggregation system

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
HIHEIHEICDN: "招聘信息抓取系统", 《HTTP://BLOG.CSDN.NET/HIHEIHEICDN/ARTICLE/DETAILS/6470642》 *

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105512864A (en) * 2016-01-28 2016-04-20 丁沂 Method for automatically acquiring post professional ability requirements based on internet
CN107203872A (en) * 2017-05-26 2017-09-26 山东省科学院情报研究所 Region demand for talent based on big data quantifies analysis method
CN107203872B (en) * 2017-05-26 2020-06-02 山东省科学院情报研究所 Regional talent demand quantitative analysis method based on big data
CN108733827A (en) * 2018-05-24 2018-11-02 佛山市轻遣网络有限公司 A kind of recruitment information acquisition methods outside recruitment website and system
CN112506986A (en) * 2020-11-19 2021-03-16 阿坝师范学院 Specific professional talent skill requirement mining system based on web recruitment information
CN113254745A (en) * 2021-04-28 2021-08-13 深圳格隆汇信息科技有限公司 Economic information collection system, method, computer device and storage medium

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20150325