CN107870966A - A data extraction method for recruitment general regulations (recruitment brochures) based on a semantic model
- Publication number
- CN107870966A (application CN201710686374.7A)
- Authority
- CN
- China
- Prior art keywords
- word
- recruitment
- general regulations
- speech
- method based
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/25—Integrating or interfacing systems involving database management systems
- G06F16/254—Extract, transform and load [ETL] procedures, e.g. ETL data flows in data warehouses
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/951—Indexing; Web crawling techniques
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/289—Phrasal analysis, e.g. finite state techniques or chunking
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/30—Semantic analysis
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Databases & Information Systems (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Computational Linguistics (AREA)
- General Health & Medical Sciences (AREA)
- Health & Medical Sciences (AREA)
- Artificial Intelligence (AREA)
- Data Mining & Analysis (AREA)
- Machine Translation (AREA)
Abstract
The present invention relates to the field of computer application technology, and specifically to a data extraction method for recruitment general regulations (recruitment brochures) based on a semantic model, comprising the following steps: S1, first create the threads and queues for the system; S2, perform tag cleaning on the recruitment general regulations; S3, traverse the full text to extract the link of the online application address, and store the link in a database; S4, extract the other information: first, analyze the full text, take out the job-position keywords, and locate where each position appears; second, take out the text between one position and the next, that is, locate the first position, take out the text from it up to the next position, then clean and store those words; S5, apply a semantic word-segmentation model to documents that are difficult to extract. By classifying recruitment information, cleaning tags, entering the data into a database by category, and performing part-of-speech tagging, the invention helps people quickly and effectively extract the information they need.
Description
Technical field
The present invention relates to the field of computer application technology, and specifically to a data extraction method for recruitment general regulations based on a semantic model.
Background art
A growing number of university students in China are about to enter the workforce, and the sheer volume of recruitment information overwhelms them. Students currently check recruitment information at their school's career center, and some visit websites that aggregate it. However, the format of each posting is inconsistent, which makes the postings tiring to read, and in some cases it is hard to find the useful information at all. This wastes a great deal of students' time, so a method is needed to extract the information students need from disorganized recruitment general regulations.
Summary of the invention
The present invention therefore proposes a data extraction method for recruitment general regulations based on a semantic model, which addresses the problem of extracting useful information, such as the online application address and vacant positions, from large numbers of disorganized recruitment general regulations.
The technical solution of the present invention is implemented as follows: a data extraction method for recruitment general regulations based on a semantic model, comprising the following steps:
S1, first create the threads and queues for the system so that the work proceeds as a pipeline, and build comprehensive corpora of job positions, cities, and majors;
S2, perform tag cleaning on the recruitment general regulations;
S3, traverse the full text to extract the link of the online application address, and store the link in a database;
S4, extract the other information. First, analyze the full text, take out the job-position keywords, and locate where each position appears. Second, take out the text between one position and the next: locate the first position, take out the text from it up to the start of the next position's section, then clean and store those words;
S5, apply the semantic word-segmentation model to documents that are difficult to extract: first perform part-of-speech tagging on the segmented words of the recruitment general regulations, in which position words are tagged nr, neutral position words nt, major words n, measure words eng, punctuation marks w, left brackets lbh, right brackets rbh, company words h, and place names ns; this completes the tagging.
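Steps S1 and S4 can be pictured as a small producer-consumer pipeline. The following is a minimal sketch in Python using the standard `threading` and `queue` modules, not the patent's implementation; the names `POSITIONS`, `split_by_position`, and `run_pipeline`, and the sample position list, are hypothetical.

```python
import queue
import threading

# Hypothetical stand-in for the position corpus built in S1.
POSITIONS = {"Technician", "Engineer"}

_SENTINEL = None  # tells a worker thread to stop


def split_by_position(text, positions=POSITIONS):
    """S4 sketch: locate each known position and take the text up to the next one."""
    hits = sorted((text.find(p), p) for p in positions if p in text)
    sections = {}
    for n, (start, name) in enumerate(hits):
        end = hits[n + 1][0] if n + 1 < len(hits) else len(text)
        # Cleaning here is only a strip of separators (an assumption).
        sections[name] = text[start + len(name):end].strip(" :;,\n")
    return sections


def _worker(tasks, results):
    """Take (index, text) items off the queue until the sentinel arrives."""
    while True:
        item = tasks.get()
        if item is _SENTINEL:
            break
        index, text = item
        results.put((index, split_by_position(text)))


def run_pipeline(texts, num_workers=4):
    """S1 sketch: process texts on worker threads and return results in input order."""
    tasks, results = queue.Queue(), queue.Queue()
    threads = [threading.Thread(target=_worker, args=(tasks, results))
               for _ in range(num_workers)]
    for t in threads:
        t.start()
    for item in enumerate(texts):
        tasks.put(item)
    for _ in threads:
        tasks.put(_SENTINEL)  # one sentinel per worker
    for t in threads:
        t.join()
    collected = [results.get() for _ in range(len(texts))]
    return [result for _, result in sorted(collected, key=lambda pair: pair[0])]
```

Each result is tagged with its input index so the output order does not depend on thread scheduling. City and major corpora could be handled the same way as `POSITIONS`.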
Further, during the tag cleaning in S2, superfluous HTML and JavaScript code is removed.
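One way to picture this cleaning step is a single parser pass that drops markup and script content while keeping the visible text. The sketch below assumes Python's standard `html.parser` module; the class `BrochureCleaner` and function `clean_brochure` are hypothetical names. Because stripping tags would also discard `href` attributes, the sketch collects link targets in the same pass so that they remain available for S3; that ordering is an assumption, not something the patent specifies.

```python
from html.parser import HTMLParser


class BrochureCleaner(HTMLParser):
    """Collects visible text and link targets, skipping <script> and <style> content."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self._skip_depth = 0   # > 0 while inside <script> or <style>
        self.text_parts = []
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is outside script/style blocks.
        if self._skip_depth == 0 and data.strip():
            self.text_parts.append(data.strip())


def clean_brochure(html):
    """Return (plain_text, links) for one HTML document."""
    parser = BrochureCleaner()
    parser.feed(html)
    parser.close()
    return " ".join(parser.text_parts), parser.links
```

A real crawler would still need to decide which of the collected links is the online application address, for example by inspecting the anchor text.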
Further, the specific steps of the semantic word-segmentation model extraction for difficult documents are:
A, first segment the recruitment general regulations into words, then remove stop words and other meaningless words;
B, perform part-of-speech tagging on the segmented words: position words are tagged nr, neutral position words nt, major words n, measure words eng, punctuation marks w, left brackets lbh, right brackets rbh, company words h, and place names ns;
C, after tagging is complete, split each set of general regulations into two texts: the segmented source text, and the part-of-speech arrangement text produced by the tagging;
D, finally, perform text mining on the part-of-speech arrangements of a large number of general regulations: compare where positions appear in the text with the part-of-speech arrangement to extract the part-of-speech arrangement rules for positions, then use these rules to extract other positions that follow the same rules.
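Steps A to D can be sketched as follows. This is an illustrative reading in Python, not the patent's implementation: it assumes the text is already segmented into tokens, tags words by lookup in a hypothetical `LEXICON`, and interprets a "part-of-speech arrangement rule" as the sequence of tags that precedes a known position word. The tag `k` and the fallback tag `x` are assumptions; they are not in the tag list of step B.

```python
from collections import Counter

STOP_WORDS = {"the", "a", "of"}  # hypothetical stop-word list
LEXICON = {  # hypothetical lexicon; "k" (keyword) is an assumed tag
    "Recruit post": "k", "Position": "k", ":": "w",
    "Technician": "nr", "Engineer": "nr", "Beijing": "ns",
}


def tag(tokens):
    """Steps A-C: drop stop words, then return parallel token and tag lists."""
    kept = [t for t in tokens if t.lower() not in STOP_WORDS]
    return kept, [LEXICON.get(t, "x") for t in kept]  # "x" = unknown (assumed)


def mine_rules(corpus, context=2, min_count=2):
    """Step D: count the tag contexts that precede known position words (nr)."""
    counts = Counter()
    for tokens in corpus:
        _, tags = tag(tokens)
        for i, t in enumerate(tags):
            if t == "nr" and i >= context:
                counts[tuple(tags[i - context:i])] += 1
    return {ctx for ctx, n in counts.items() if n >= min_count}


def extract_positions(tokens, rules, context=2):
    """Return every token that follows a mined context, including unknown words."""
    kept, tags = tag(tokens)
    return [kept[i] for i in range(context, len(kept))
            if tuple(tags[i - context:i]) in rules]
```

With a corpus in which "Recruit post : Technician" and "Position : Engineer" both occur, the mined rule is the context `("k", "w")`, and a new line such as "Recruit post : Data Analyst" yields "Data Analyst" even though that word is absent from the lexicon.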
From the above disclosure, the beneficial effects of the present invention are as follows: by classifying recruitment information, cleaning tags, entering the data into a database by category, and performing part-of-speech tagging, the invention helps people quickly and effectively extract the information they need.
Brief description of the drawings
Fig. 1 is a flow chart of an embodiment of the data extraction method for recruitment general regulations based on a semantic model according to the present invention.
Embodiment
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to those embodiments. Obviously, the described embodiments are only some, not all, of the embodiments of the invention. All other embodiments obtained by those of ordinary skill in the art on the basis of these embodiments without creative effort fall within the scope of protection of the present invention.
As shown in Fig. 1, a data extraction method for recruitment general regulations based on a semantic model comprises the following steps:
S1, first create the threads and queues for the system so that the work proceeds as a pipeline, and build comprehensive corpora of job positions, cities, and majors;
S2, perform tag cleaning on the recruitment general regulations, removing superfluous HTML and JavaScript code, then proceed to the next step;
S3, traverse the full text to extract the link of the online application address, and store the link in a database;
S4, extract the other information. First, analyze the full text, take out the job-position keywords, and locate where each position appears. Second, take out the text between one position and the next: locate the first position, take out the text from it up to the start of the next position's section, then clean and store those words;
S5, apply the semantic word-segmentation model to documents that are difficult to extract: first perform part-of-speech tagging on the segmented words of the recruitment general regulations, in which position words are tagged nr, neutral position words nt, major words n, measure words eng, punctuation marks w, left brackets lbh, right brackets rbh, company words h, and place names ns; this completes the tagging. Rules can then be derived by analyzing a large number of general regulations.
For example, take "Recruit post: Technician". After segmentation it becomes "Recruit post", ":", "Technician", and its part-of-speech arrangement is k w nr, so the word tagged nr is certainly a position. If the arrangement k w nt appears, the central word tagged nt can likewise be judged to be a position. The rule is then added and searched for in the part-of-speech arrangement text, and the matching entry in the segmentation table gives the position.
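The lookup described in this example can be sketched as a scan over the tag arrangement with the token list kept in parallel. This is a minimal illustration in Python; `RULES` and `find_positions` are hypothetical names, and the meaning of the tag `k` is assumed, since the tag list in S5 does not define it.

```python
# The two arrangements named in the example; the last tag marks the position word.
RULES = [("k", "w", "nr"), ("k", "w", "nt")]


def find_positions(tokens, tags, rules=RULES):
    """Scan the tag arrangement for each rule and return the token
    aligned with the rule's last tag."""
    if len(tokens) != len(tags):
        raise ValueError("tokens and tags must be aligned")
    found = []
    for start in range(len(tags)):
        for rule in rules:
            end = start + len(rule)
            if tuple(tags[start:end]) == rule:
                found.append(tokens[end - 1])
    return found


print(find_positions(["Recruit post", ":", "Technician"], ["k", "w", "nr"]))
# prints ['Technician']
```

Keeping tokens and tags as two aligned lists mirrors the two texts produced after tagging: a match found in the arrangement text is mapped back to the word at the same index in the segmented text.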
Finally, it should be noted that the above embodiments merely illustrate the technical solution of the present invention and do not limit it. Although the invention has been described in detail with reference to preferred embodiments, those of ordinary skill in the art will understand that the technical solution may be modified or equivalently substituted without departing from its purpose and scope, and all such changes shall be covered by the claims of the present invention.
Claims (3)
- 1. A data extraction method for recruitment general regulations based on a semantic model, characterized by comprising the following steps: S1, first create the threads and queues for the system so that the work proceeds as a pipeline, and build comprehensive corpora of job positions, cities, and majors; S2, perform tag cleaning on the recruitment general regulations; S3, traverse the full text to extract the link of the online application address, and store the link in a database; S4, extract the other information: first, analyze the full text, take out the job-position keywords, and locate where each position appears; second, take out the text between one position and the next, that is, locate the first position, take out the text from it up to the start of the next position's section, then clean and store those words; S5, apply the semantic word-segmentation model to documents that are difficult to extract: first perform part-of-speech tagging on the segmented words of the recruitment general regulations, in which position words are tagged nr, neutral position words nt, major words n, measure words eng, punctuation marks w, left brackets lbh, right brackets rbh, company words h, and place names ns; this completes the tagging.
- 2. The data extraction method for recruitment general regulations based on a semantic model according to claim 1, characterized in that: during the tag cleaning in S2, superfluous HTML and JavaScript code is removed.
- 3. The data extraction method for recruitment general regulations based on a semantic model according to claim 1, characterized in that the specific steps of the semantic word-segmentation model extraction for difficult documents are: A, first segment the recruitment general regulations into words, then remove stop words and other meaningless words; B, perform part-of-speech tagging on the segmented words: position words are tagged nr, neutral position words nt, major words n, measure words eng, punctuation marks w, left brackets lbh, right brackets rbh, company words h, and place names ns; C, after tagging is complete, split each set of general regulations into two texts: the segmented source text, and the part-of-speech arrangement text produced by the tagging; D, finally, perform text mining on the part-of-speech arrangements of a large number of general regulations: compare where positions appear in the text with the part-of-speech arrangement to extract the part-of-speech arrangement rules for positions, then use these rules to extract other positions that follow the same rules.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710686374.7A CN107870966A (en) | 2017-08-11 | 2017-08-11 | A kind of recruitment general regulations data pick-up method based on semantic model |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710686374.7A CN107870966A (en) | 2017-08-11 | 2017-08-11 | A kind of recruitment general regulations data pick-up method based on semantic model |
Publications (1)
Publication Number | Publication Date |
---|---|
CN107870966A true CN107870966A (en) | 2018-04-03 |
Family
ID=61761803
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201710686374.7A Pending CN107870966A (en) | 2017-08-11 | 2017-08-11 | A kind of recruitment general regulations data pick-up method based on semantic model |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN107870966A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111177369A (en) * | 2019-11-19 | 2020-05-19 | 厦门二五八网络科技集团股份有限公司 | Method and device for automatically classifying labels of articles |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102314417A (en) * | 2011-09-22 | 2012-01-11 | 西安电子科技大学 | Method for identifying Web named entity based on statistical model |
US20130262086A1 (en) * | 2012-03-27 | 2013-10-03 | Accenture Global Services Limited | Generation of a semantic model from textual listings |
CN104346382A (en) * | 2013-07-31 | 2015-02-11 | 香港理工大学 | Text analysis system and method employing language query |
CN104572849A (en) * | 2014-12-17 | 2015-04-29 | 西安美林数据技术股份有限公司 | Automatic standardized filing method based on text semantic mining |
CN106777048A (en) * | 2016-12-09 | 2017-05-31 | 全国组织机构代码管理中心 | Enterprise-quality credit data acquisition methods and system |
CN106815293A (en) * | 2016-12-08 | 2017-06-09 | 中国电子科技集团公司第三十二研究所 | System and method for constructing knowledge graph for information analysis |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN107832229B (en) | NLP-based system test case automatic generation method | |
CN110347894A (en) | Knowledge mapping processing method, device, computer equipment and storage medium based on crawler | |
CN110968700A (en) | Domain event map construction method and device fusing multi-class affairs and entity knowledge | |
JP5212604B2 (en) | Risk detection system, risk detection method and program thereof | |
CN107330071A (en) | A kind of legal advice information intelligent replies method and platform | |
CN107844609A (en) | A kind of emergency information abstracting method and system based on style and vocabulary | |
CN109858010A (en) | Field new word identification method, device, computer equipment and storage medium | |
DE102019001267A1 (en) | Dialog-like system for answering inquiries | |
ES2375403T3 (en) | A METHOD FOR THE AUTOMATIC INDEXATION OF DOCUMENTS. | |
Raharjana et al. | User story extraction from online news for software requirements elicitation: A conceptual model | |
CN107194617B (en) | App software engineer soft skill classification system and method | |
CN113282955B (en) | Method, system, terminal and medium for extracting privacy information in privacy policy | |
CN112036153B (en) | Work order error correction method and device, computer readable storage medium and computer equipment | |
CN108932278B (en) | Man-machine conversation method and system based on semantic framework | |
CN109933671A (en) | Construct method, apparatus, computer equipment and the storage medium of personal knowledge map | |
CN104536953A (en) | Method and device for recognizing textual emotion polarity | |
CN110609983A (en) | Structured decomposition method for policy file | |
CN109614623A (en) | A kind of composition processing method and system based on syntactic analysis | |
CN110990587A (en) | Enterprise relation discovery method and system based on topic model | |
CN112380848B (en) | Text generation method, device, equipment and storage medium | |
CN110134844A (en) | Subdivision field public sentiment monitoring method, device, computer equipment and storage medium | |
CN107870966A (en) | A kind of recruitment general regulations data pick-up method based on semantic model | |
Khorjuvenkar et al. | Parts of speech tagging for Konkani language | |
Dejean | Extracting structured data from unstructured document with incomplete resources | |
Yamada et al. | Annotation of argument structure in Japanese legal documents |
Legal Events
Code | Title | Description
---|---|---
PB01 | Publication |
SE01 | Entry into force of request for substantive examination |
RJ01 | Rejection of invention patent application after publication | Application publication date: 20180403