CN107870966A - A recruitment brochure data extraction method based on a semantic model - Google Patents

A recruitment brochure data extraction method based on a semantic model

Info

Publication number
CN107870966A
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201710686374.7A
Other languages
Chinese (zh)
Inventor
何梁
王承明
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chengdu Sprout Technology LLC
Original Assignee
Chengdu Sprout Technology LLC
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chengdu Sprout Technology LLC
Priority to CN201710686374.7A
Publication of CN107870966A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/25 Integrating or interfacing systems involving database management systems
    • G06F16/254 Extract, transform and load [ETL] procedures, e.g. ETL data flows in data warehouses
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/951 Indexing; Web crawling techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Data Mining & Analysis (AREA)
  • Machine Translation (AREA)

Abstract

The present invention relates to the field of computer application technology, and specifically to a recruitment brochure data extraction method based on a semantic model, comprising the following steps: S1, first create the threads and queues for the system so that the work runs as a pipeline; S2, clean the tags from the recruitment brochure; S3, traverse the full text, take out the online application links, and store them in a database; S4, extract the remaining information: first, analyze the full text, take out the position keywords, and locate where each position appears; second, take out the text between one position and the next, that is, locate the first position, take out the text from it up to the next position, and clean and store these words; S5, extract from difficult articles with a semantic word segmentation model. By classifying recruitment information, cleaning tags, entering the results into a database by category, and tagging parts of speech, the invention helps people quickly and effectively extract the information that is useful to them.

Description

A recruitment brochure data extraction method based on a semantic model
Technical field
The present invention relates to the field of computer application technology, and specifically to a recruitment brochure data extraction method based on a semantic model.
Background
A growing number of university students in China are about to enter the workforce, and the sheer volume of recruitment information leaves them overwhelmed. Today students either check recruitment information at their school's career center or browse websites that aggregate it. However, the format of each recruitment notice is inconsistent and cluttered, which makes it tiring to read, and in some cases it is hard to find the useful information at all. This wastes a great deal of students' time, so a method is needed for extracting the information students need from cluttered recruitment brochures.
Summary of the invention
The present invention therefore proposes a recruitment brochure data extraction method based on a semantic model, to solve the problem of extracting useful information, such as the online application address and the vacant positions, from a large number of cluttered recruitment brochures.
The technical solution of the present invention is realized as follows: a recruitment brochure data extraction method based on a semantic model, comprising the following steps:
S1, first create the threads and queues for the system so that the work runs as a pipeline, and build corpora consisting of a complete list of positions, a complete list of cities, and a complete list of majors;
S2, clean the tags from the recruitment brochure;
S3, traverse the full text, take out the online application links, and store the links in a database;
S4, extract the remaining information: first, analyze the full text, take out the position keywords, and locate where each position appears; second, take out the text between one position and the next, that is, locate the first position, take out the text from it up to the next position, and clean and store these words;
S5, extract from difficult articles with the semantic word segmentation model: first tag the parts of speech of the words segmented from the recruitment brochure, where a position word is tagged nr, a position head word nt, a major word n, a measure word eng, a punctuation mark w, a left bracket lbh, a right bracket rbh, a company word h, and a place name ns; tagging is then complete.
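Steps S1 and S3 can be illustrated with a minimal Python sketch. It is only one possible reading of the text, not the patented implementation: the URL pattern, the table name `apply_link`, the function names, and the choice of SQLite are all assumptions made for illustration, and the sketch assumes the application links appear as plain URLs in the text.

```python
import queue
import re
import sqlite3
import threading

# Hypothetical pattern: any ASCII http(s) URL counts as a candidate application link.
LINK_RE = re.compile(r"https?://[\w./?=&%#:~-]+", re.ASCII)

def extract_links(text):
    """Traverse the full text and return every URL found, in order."""
    return LINK_RE.findall(text)

def worker(tasks, results):
    """Take brochures off the task queue until a None sentinel arrives."""
    while True:
        item = tasks.get()
        if item is None:
            break
        brochure_id, text = item
        for link in extract_links(text):
            results.put((brochure_id, link))

def run_pipeline(brochures, n_workers=4):
    """Fan brochures out to worker threads, then store the links found."""
    tasks, results = queue.Queue(), queue.Queue()
    threads = [threading.Thread(target=worker, args=(tasks, results))
               for _ in range(n_workers)]
    for t in threads:
        t.start()
    for item in brochures:
        tasks.put(item)
    for _ in threads:
        tasks.put(None)  # one sentinel per worker
    for t in threads:
        t.join()

    # Only this thread touches the database; by default a sqlite3
    # connection may only be used in the thread that created it.
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE apply_link (brochure_id INTEGER, url TEXT)")
    rows = []
    while not results.empty():
        rows.append(results.get())
    db.executemany("INSERT INTO apply_link VALUES (?, ?)", rows)
    db.commit()
    return db
```

Keeping the database write in a single thread is a design choice of this sketch; the source only states that the links are stored in a database.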
Further, during the tag cleaning in S2, the superfluous HTML and JavaScript code is removed.
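As an illustration of this cleaning step, the following sketch strips tags and script bodies with Python's standard `html.parser` module. The class and function names are hypothetical, and dropping `<style>` content as well is an assumption that goes slightly beyond what the text states.

```python
from html.parser import HTMLParser

class TagCleaner(HTMLParser):
    """Collect visible text, skipping tags and <script>/<style> bodies."""
    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []
        self.skip_depth = 0  # > 0 while inside a skipped element

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self.skip_depth > 0:
            self.skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is outside script/style elements.
        if self.skip_depth == 0 and data.strip():
            self.parts.append(data.strip())

def clean_tags(html):
    """Return the text content of an HTML fragment, joined by spaces."""
    parser = TagCleaner()
    parser.feed(html)
    parser.close()
    return " ".join(parser.parts)
```

For example, `clean_tags("<p>Hello</p><script>var a = 1;</script><b>World</b>")` keeps only the two visible words.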
Further, the specific steps of the semantic word segmentation model extraction for difficult articles are:
A, first segment the recruitment brochure into words, then remove stop words and meaningless words;
B, tag the parts of speech of the segmented words: a position word is tagged nr, a position head word nt, a major word n, a measure word eng, a punctuation mark w, a left bracket lbh, a right bracket rbh, a company word h, and a place name ns;
C, after part-of-speech tagging is complete, split each brochure into two texts: one is the segmented source text, the other is the part-of-speech sequence text produced by tagging;
D, finally, run text mining over the part-of-speech sequences of a large number of brochures: compare the positions in the text with the part-of-speech sequence to derive the part-of-speech ordering rules of positions, and then use these rules to extract other positions that follow the same rule.
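Steps A to C can be sketched as follows. The source does not name a segmenter, so this sketch swaps in a simple forward maximum matching over a tiny hypothetical lexicon; the lexicon entries, the stop word list, and the fallback tag `x` for unknown tokens are assumptions, while the tag names nr, nt, n, w, lbh, rbh, and ns come from the text.

```python
# Hypothetical mini-lexicon mapping words to the tags named in step B.
LEXICON = {
    "技术员": "nr",   # "technician", a position word
    "工程师": "nt",   # "engineer", treated here as a position head word
    "计算机": "n",    # "computer", a major word
    "成都": "ns",     # "Chengdu", a place name
    ":": "w", ",": "w",
    "(": "lbh", ")": "rbh",
}
STOP_WORDS = {"的"}  # assumed stop word list
MAX_LEN = max(len(word) for word in LEXICON)

def segment(text):
    """Forward maximum matching; unknown characters become one-character tokens."""
    tokens, i = [], 0
    while i < len(text):
        for size in range(min(MAX_LEN, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in LEXICON:
                tokens.append(piece)
                i += size
                break
    return tokens

def tag(tokens):
    """Step A (drop stop words) and step B (attach a tag; 'x' means unknown)."""
    kept = [t for t in tokens if t not in STOP_WORDS and not t.isspace()]
    return [(t, LEXICON.get(t, "x")) for t in kept]

def two_texts(text):
    """Step C: return the segmented source text and the matching tag sequence."""
    pairs = tag(segment(text))
    source_text = " ".join(word for word, _ in pairs)
    pos_text = " ".join(pos for _, pos in pairs)
    return source_text, pos_text
```

Because the two texts are produced from the same list of pairs, the n-th tag always corresponds to the n-th word, which is what lets a match in the tag sequence be mapped back to a word in step D.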
From the above disclosure, the beneficial effects of the present invention are: by classifying recruitment information, cleaning tags, entering the results into a database by category, and tagging parts of speech, the invention helps people quickly and effectively extract the information that is useful to them.
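The classification of information by position described in step S4 could look roughly like the sketch below. It assumes the position corpus is available as a plain list of strings; the list contents, the function names, and the set of separator characters that are stripped are illustrative assumptions.

```python
# Stand-in for the position corpus built in step S1 (assumed contents).
POSITION_WORDS = ["Technician", "Engineer"]

def locate_positions(text):
    """Return (offset, word) for every occurrence of a known position word."""
    hits = []
    for word in POSITION_WORDS:
        start = text.find(word)
        while start != -1:
            hits.append((start, word))
            start = text.find(word, start + len(word))
    return sorted(hits)

def split_by_position(text):
    """Pair each located position with the text that runs up to the next position."""
    hits = locate_positions(text)
    sections = []
    for idx, (start, word) in enumerate(hits):
        end = hits[idx + 1][0] if idx + 1 < len(hits) else len(text)
        # Strip separators left over at the edges of the slice (simple cleaning).
        body = text[start + len(word):end].strip(" :,;.\n")
        sections.append((word, body))
    return sections
```

Each returned pair can then be cleaned further and stored, with the position word acting as the key for the requirements that follow it.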
Brief description of the drawings
Fig. 1 is a flow chart of an embodiment of the recruitment brochure data extraction method based on a semantic model of the present invention.
Embodiment
The technical solutions in the embodiments of the present invention are described clearly and completely below. Obviously, the described embodiments are only some of the embodiments of the present invention, not all of them. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present invention, without creative effort, fall within the scope of protection of the present invention.
As shown in Fig. 1, a recruitment brochure data extraction method based on a semantic model comprises the following steps:
S1, first create the threads and queues for the system so that the work runs as a pipeline, and build corpora consisting of a complete list of positions, a complete list of cities, and a complete list of majors;
S2, clean the tags from the recruitment brochure by removing the superfluous HTML and JavaScript code, then proceed to the next step;
S3, traverse the full text, take out the online application links, and store the links in a database;
S4, extract the remaining information: first, analyze the full text, take out the position keywords, and locate where each position appears; second, take out the text between one position and the next, that is, locate the first position, take out the text from it up to the next position, and clean and store these words;
S5, extract from difficult articles with the semantic word segmentation model: first tag the parts of speech of the words segmented from the recruitment brochure, where a position word is tagged nr, a position head word nt, a major word n, a measure word eng, a punctuation mark w, a left bracket lbh, a right bracket rbh, a company word h, and a place name ns. Once tagging is complete, conclusions can be drawn by analyzing a large number of brochures. For example, "Recruit post: Technician" is segmented into "Recruit post", ":", "Technician", and its part-of-speech sequence is k w nr, so nr is certainly a position. If the sequence k w nt appears, the head word nt can likewise be judged to be a position. This rule is then added and searched for in the part-of-speech sequences, and the corresponding entry in the segmentation table gives the position.
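The rule described in S5 can be sketched as a search over tag sequences. This is an illustrative reading only: the functions `mine_rules` and `apply_rules`, the fixed context of two preceding tags, and the English example tokens are assumptions, and the tag k is used exactly as it appears in the example above without any further definition.

```python
from collections import Counter

def mine_rules(tagged_docs, known_positions, context=2):
    """Count the tag n-grams that end on a known position word."""
    counts = Counter()
    for doc in tagged_docs:
        tags = [pos for _, pos in doc]
        for i, (word, _) in enumerate(doc):
            if word in known_positions and i >= context:
                counts[tuple(tags[i - context:i + 1])] += 1
    return counts

def apply_rules(doc, rules):
    """Return the words sitting in the last slot of any matching tag n-gram."""
    tags = [pos for _, pos in doc]
    found = []
    for rule in rules:
        n = len(rule)
        for i in range(len(tags) - n + 1):
            if tuple(tags[i:i + n]) == tuple(rule):
                # The tag sequence and the word list are aligned index by index.
                found.append(doc[i + n - 1][0])
    return found

# Tagged brochure from the example in the text: "Recruit post: Technician".
training = [[("Recruit post", "k"), (":", "w"), ("Technician", "nr")]]
rules = mine_rules(training, {"Technician"})

# A new brochure whose position word is not in the known set.
new_doc = [("Recruit post", "k"), (":", "w"), ("Welder", "nr"),
           (",", "w"), ("Chengdu", "ns")]
positions = apply_rules(new_doc, list(rules))
```

Here the rule mined from one brochure, the sequence k w nr, is enough to pick out "Welder" in the second one, which mirrors how the text describes reusing a discovered ordering rule to find other positions.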
Finally, the above embodiments are intended only to illustrate the technical solution of the present invention, not to limit it. Although the present invention has been described in detail with reference to preferred embodiments, a person of ordinary skill in the art should understand that the technical solution of the present invention may be modified or equivalently substituted without departing from its spirit and scope, and all such modifications and substitutions shall be covered by the claims of the present invention.

Claims (3)

  1. A recruitment brochure data extraction method based on a semantic model, characterized by comprising the following steps:
    S1, first create the threads and queues for the system so that the work runs as a pipeline, and build corpora consisting of a complete list of positions, a complete list of cities, and a complete list of majors;
    S2, clean the tags from the recruitment brochure;
    S3, traverse the full text, take out the online application links, and store the links in a database;
    S4, extract the remaining information: first, analyze the full text, take out the position keywords, and locate where each position appears; second, take out the text between one position and the next, that is, locate the first position, take out the text from it up to the next position, and clean and store these words;
    S5, extract from difficult articles with the semantic word segmentation model: first tag the parts of speech of the words segmented from the recruitment brochure, where a position word is tagged nr, a position head word nt, a major word n, a measure word eng, a punctuation mark w, a left bracket lbh, a right bracket rbh, a company word h, and a place name ns; tagging is then complete.
  2. The recruitment brochure data extraction method based on a semantic model according to claim 1, characterized in that: during the tag cleaning in S2, the superfluous HTML and JavaScript code is removed.
  3. The recruitment brochure data extraction method based on a semantic model according to claim 1, characterized in that the specific steps of the semantic word segmentation model extraction for difficult articles are:
    A, first segment the recruitment brochure into words, then remove stop words and meaningless words;
    B, tag the parts of speech of the segmented words: a position word is tagged nr, a position head word nt, a major word n, a measure word eng, a punctuation mark w, a left bracket lbh, a right bracket rbh, a company word h, and a place name ns;
    C, after part-of-speech tagging is complete, split each brochure into two texts: one is the segmented source text, the other is the part-of-speech sequence text produced by tagging;
    D, finally, run text mining over the part-of-speech sequences of a large number of brochures: compare the positions in the text with the part-of-speech sequence to derive the part-of-speech ordering rules of positions, and then use these rules to extract other positions that follow the same rule.
CN201710686374.7A 2017-08-11 2017-08-11 A recruitment brochure data extraction method based on a semantic model Pending CN107870966A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710686374.7A CN107870966A (en) 2017-08-11 2017-08-11 A recruitment brochure data extraction method based on a semantic model

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710686374.7A CN107870966A (en) 2017-08-11 2017-08-11 A recruitment brochure data extraction method based on a semantic model

Publications (1)

Publication Number Publication Date
CN107870966A true CN107870966A (en) 2018-04-03

Family

ID=61761803

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710686374.7A Pending CN107870966A (en) 2017-08-11 2017-08-11 A recruitment brochure data extraction method based on a semantic model

Country Status (1)

Country Link
CN (1) CN107870966A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111177369A (en) * 2019-11-19 2020-05-19 厦门二五八网络科技集团股份有限公司 Method and device for automatically classifying labels of articles

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102314417A (en) * 2011-09-22 2012-01-11 西安电子科技大学 Method for identifying Web named entity based on statistical model
US20130262086A1 (en) * 2012-03-27 2013-10-03 Accenture Global Services Limited Generation of a semantic model from textual listings
CN104346382A (en) * 2013-07-31 2015-02-11 香港理工大学 Text analysis system and method employing language query
CN104572849A (en) * 2014-12-17 2015-04-29 西安美林数据技术股份有限公司 Automatic standardized filing method based on text semantic mining
CN106777048A (en) * 2016-12-09 2017-05-31 全国组织机构代码管理中心 Enterprise-quality credit data acquisition methods and system
CN106815293A (en) * 2016-12-08 2017-06-09 中国电子科技集团公司第三十二研究所 System and method for constructing knowledge graph for information analysis

Similar Documents

Publication Publication Date Title
CN107832229B (en) NLP-based system test case automatic generation method
CN110347894A (en) Knowledge mapping processing method, device, computer equipment and storage medium based on crawler
CN110968700A (en) Domain event map construction method and device fusing multi-class affairs and entity knowledge
JP5212604B2 (en) Risk detection system, risk detection method and program thereof
CN107330071A (en) A kind of legal advice information intelligent replies method and platform
CN107844609A (en) A kind of emergency information abstracting method and system based on style and vocabulary
CN109858010A (en) Field new word identification method, device, computer equipment and storage medium
DE102019001267A1 (en) Dialog-like system for answering inquiries
ES2375403T3 (en) A METHOD FOR THE AUTOMATIC INDEXATION OF DOCUMENTS.
Raharjana et al. User story extraction from online news for software requirements elicitation: A conceptual model
CN107194617B (en) App software engineer soft skill classification system and method
CN113282955B (en) Method, system, terminal and medium for extracting privacy information in privacy policy
CN112036153B (en) Work order error correction method and device, computer readable storage medium and computer equipment
CN108932278B (en) Man-machine conversation method and system based on semantic framework
CN109933671A (en) Construct method, apparatus, computer equipment and the storage medium of personal knowledge map
CN104536953A (en) Method and device for recognizing textual emotion polarity
CN110609983A (en) Structured decomposition method for policy file
CN109614623A (en) A kind of composition processing method and system based on syntactic analysis
CN110990587A (en) Enterprise relation discovery method and system based on topic model
CN112380848B (en) Text generation method, device, equipment and storage medium
CN110134844A (en) Subdivision field public sentiment monitoring method, device, computer equipment and storage medium
CN107870966A (en) A kind of recruitment general regulations data pick-up method based on semantic model
Khorjuvenkar et al. Parts of speech tagging for Konkani language
Dejean Extracting structured data from unstructured document with incomplete resources
Yamada et al. Annotation of argument structure in Japanese legal documents

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20180403