CN107870966A

CN107870966A - A semantic-model-based method for extracting data from recruitment brochures

Info

Publication number: CN107870966A
Application number: CN201710686374.7A
Authority: CN
Inventors: 何梁; 王承明
Original assignee: Chengdu Sprout Technology LLC
Current assignee: Chengdu Sprout Technology LLC
Priority date: 2017-08-11
Filing date: 2017-08-11
Publication date: 2018-04-03

Abstract

The present invention relates to the field of computer application technology, and specifically to a semantic-model-based method for extracting data from recruitment brochures, comprising the following steps: S1, first create the threads and queues of the system; S2, clean the tags out of the recruitment brochure; S3, traverse the full text, take out the links to the online application address, and store the links in a database; S4, extract the other information: in the first step, analyze the full text, take out the job-title keywords, and locate where each job title appears; in the second step, take out the text between one job title and the next and mine it further, that is, locate a job title, take the text from it up to the next job-title section, extract the words, clean them, and store them; S5, apply semantic word-segmentation model extraction to difficult articles. By classifying recruitment information, cleaning tags, entering the data into a database by category, and performing part-of-speech tagging, the invention helps people quickly and effectively extract the information that is useful to them.

Description

A semantic-model-based method for extracting data from recruitment brochures

Technical field

The present invention relates to the field of computer application technology, and specifically to a semantic-model-based method for extracting data from recruitment brochures.

Background art

A growing number of Chinese university students are about to enter the job market, and the dazzling volume of recruitment information leaves them overwhelmed. Today students check recruitment information at their school's employment center, and some visit websites that aggregate recruitment postings. However, the postings come in inconsistent, disorderly formats, which makes them laborious to read; in some it is difficult to find the useful information at all, and students waste a great deal of time. A method is therefore needed for extracting the information students need from disorderly recruitment brochures.

Summary of the invention

The present invention therefore proposes a semantic-model-based method for extracting data from recruitment brochures, which addresses the problem of extracting useful information, such as the online application address and the vacant positions, from a large number of disorderly recruitment brochures.

The technical solution of the present invention is implemented as follows: a semantic-model-based method for extracting data from recruitment brochures, comprising the following steps:

S1, first create the threads and queues of the system so that the work proceeds as a pipeline, and build the job-title, city, and major corpora;

S2, clean the tags out of the recruitment brochure;

S3, traverse the full text, take out the links to the online application address, and store the links in a database;

S4, extract the other information: in the first step, analyze the full text, take out the job-title keywords, and locate where each job title appears; in the second step, take out the text between one job title and the next and mine it further, that is, locate a job title, take the text from it up to the next job-title section, extract the words, clean them, and store them;

S5, apply semantic word-segmentation model extraction to difficult articles: first tag the parts of speech of the segmented words of the recruitment brochure, with job-title words tagged nr, neutral job-title words nt, major words n, measure words eng, punctuation marks w, left brackets lbh, right brackets rbh, company words h, and place names ns, which completes the tagging.
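Steps S1, S3, and S4 can be illustrated with a minimal Python sketch. It is only one possible reading of the steps, not the patented implementation: the names `JOB_TITLES`, `extract_links`, `split_by_job_title`, and `run_pipeline`, the URL pattern, and the tiny job-title corpus are all assumptions made for illustration, and the database write of S3 is left out.

```python
import queue
import re
import threading

# Hypothetical job-title corpus (S1); a real corpus would be far larger.
JOB_TITLES = ["Technician", "Accountant"]

# Assumed URL pattern for online application links (S3).
LINK_RE = re.compile(r"https?://[^\s\"'<>]+")


def extract_links(text):
    """S3: traverse the full text and collect every link."""
    return LINK_RE.findall(text)


def split_by_job_title(text, titles=JOB_TITLES):
    """S4: locate each job title, then take the text up to the next title."""
    hits = sorted(
        (m.start(), m.end(), title)
        for title in titles
        for m in re.finditer(re.escape(title), text)
    )
    sections = {}
    for i, (_, end, title) in enumerate(hits):
        stop = hits[i + 1][0] if i + 1 < len(hits) else len(text)
        # Light cleaning: drop separators and surrounding whitespace.
        sections.setdefault(title, text[end:stop].strip(" :\n"))
    return sections


def worker(tasks, results):
    """Take brochures off the task queue until the None sentinel arrives."""
    while True:
        text = tasks.get()
        if text is None:
            break
        results.put({"links": extract_links(text),
                     "jobs": split_by_job_title(text)})


def run_pipeline(brochures, n_workers=2):
    """S1: create the queues and threads, then process every brochure."""
    tasks, results = queue.Queue(), queue.Queue()
    threads = [threading.Thread(target=worker, args=(tasks, results))
               for _ in range(n_workers)]
    for t in threads:
        t.start()
    for brochure in brochures:
        tasks.put(brochure)
    for _ in threads:
        tasks.put(None)  # one sentinel per worker
    for t in threads:
        t.join()
    return [results.get() for _ in range(results.qsize())]
```

With several workers the order of the results is not guaranteed, so a caller that needs to match results to brochures would have to carry an identifier through the queue.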

Further, during the tag cleaning in S2, superfluous HTML and JavaScript are removed.
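A minimal sketch of this cleaning step, using only Python's standard `html.parser` module, might look as follows. The class name `TagCleaner` and the function `clean_tags` are hypothetical, and skipping `<style>` content as well as `<script>` content is an assumption that goes slightly beyond what the text states.

```python
from html.parser import HTMLParser


class TagCleaner(HTMLParser):
    """Collects visible text and ignores markup and script content."""

    # Assumption: style sheets are dropped together with JavaScript.
    SKIPPED = ("script", "style")

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self._skip_depth = 0
        self._parts = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self._parts.append(data)

    def text(self):
        # Join fragments with spaces, then collapse runs of whitespace.
        return " ".join(" ".join(self._parts).split())


def clean_tags(html):
    """Return the brochure text with HTML tags and JavaScript removed."""
    parser = TagCleaner()
    parser.feed(html)
    parser.close()
    return parser.text()
```

Joining fragments with a space keeps neighbouring paragraphs apart, at the cost of inserting a space where an inline tag splits a single word.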

Further, the semantic word-segmentation model extraction for difficult articles comprises the following specific steps:

A, first segment the recruitment brochure into words, then remove stop words and meaningless words;

B, tag the parts of speech of the segmented words, with job-title words tagged nr, neutral job-title words nt, major words n, measure words eng, punctuation marks w, left brackets lbh, right brackets rbh, company words h, and place names ns;

C, after part-of-speech tagging is complete, split the brochure into two texts: one is the source text after word segmentation, and the other is the part-of-speech sequence text after tagging;

D, finally, perform text mining on the parts of speech of a large number of brochures: compare where the job titles appear in the text with the part-of-speech sequence to extract the part-of-speech sequence rule for job titles, and then use that rule to extract other job titles that follow the same rule.
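Steps A to D can be sketched in Python under several assumptions. Word segmentation is taken as already done, so the input is a list of tokens. The stop-word list, the `LEXICON`, the tag `k` for a recruitment keyword (borrowed from the kwnr example in the embodiment), the tag `x` for unknown words, and the choice of "the two tags preceding a job title" as the mined rule are all hypothetical; the patent does not specify how the rule is represented.

```python
from collections import Counter

STOP_WORDS = {"the", "of"}  # hypothetical stop-word list (step A)

# Hypothetical tagging lexicon (step B); "x" below marks unknown words.
LEXICON = {"Post": "k", ":": "w", "Technician": "nr",
           "Accountant": "nr", "Chengdu": "ns"}


def preprocess(tokens):
    """Steps A to C: drop stop words, tag, and return the two parallel texts."""
    kept = [t for t in tokens if t.lower() not in STOP_WORDS]
    tags = [LEXICON.get(t, "x") for t in kept]
    return kept, tags


def mine_rule(tagged_docs, target="nr", context=2):
    """Step D: the most common tag sequence right before a known job title."""
    counts = Counter()
    for _, tags in tagged_docs:
        for i, tag in enumerate(tags):
            if tag == target and i >= context:
                counts[tuple(tags[i - context:i])] += 1
    return counts.most_common(1)[0][0] if counts else None


def apply_rule(tokens, tags, rule):
    """Step D: return every token that follows the mined tag sequence."""
    n = len(rule)
    return [tokens[i] for i in range(n, len(tokens))
            if tuple(tags[i - n:i]) == rule]


corpus = [preprocess(doc) for doc in (
    ["Post", ":", "Technician", "of", "Chengdu"],
    ["Post", ":", "Accountant"],
)]
rule = mine_rule(corpus)
tokens, tags = preprocess(["Post", ":", "Welder"])
new_titles = apply_rule(tokens, tags, rule)
```

In this toy run "Welder" is absent from the lexicon, yet it is picked up because it appears in the same part-of-speech context as the known job titles.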

As disclosed above, the beneficial effects of the present invention are as follows: by classifying recruitment information, cleaning tags, entering the data into a database by category, and performing part-of-speech tagging, the invention helps people quickly and effectively extract the information that is useful to them.

Brief description of the drawings

Fig. 1 is a flow chart of an embodiment of the semantic-model-based method for extracting data from recruitment brochures according to the present invention.

Embodiment

The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the embodiments. Obviously, the described embodiments are only some, rather than all, of the embodiments of the present invention. All other embodiments obtained by those of ordinary skill in the art on the basis of these embodiments without creative effort fall within the scope of protection of the present invention.

As shown in Fig. 1, a semantic-model-based method for extracting data from recruitment brochures comprises the following steps:

S2, clean the tags out of the recruitment brochure by removing superfluous HTML and JavaScript, then proceed to the next step;

S5, apply semantic word-segmentation model extraction to difficult articles: first tag the parts of speech of the segmented words of the recruitment brochure, with job-title words tagged nr, neutral job-title words nt, major words n, measure words eng, punctuation marks w, left brackets lbh, right brackets rbh, company words h, and place names ns, which completes the tagging. Conclusions can then be drawn by analyzing a large number of brochures. For example, after word segmentation the text "Recruit post: Technician" has the part-of-speech sequence kwnr, so nr is certainly a job title. If the sequence kwnt appears, its center word nt can also be judged to be a job title. This rule is then added and searched for in the part-of-speech sequence text, and the matching entry in the segmentation table gives the job title.
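The lookup described in this example can be sketched in Python as follows. It assumes that the segmented words and their tags are kept as two parallel lists, and that "Recruit post" is a single token tagged `k`; neither detail is stated in the text. The names `RULES` and `find_job_titles` and the second sample title are hypothetical.

```python
# The two part-of-speech sequence rules from the example: kwnr and kwnt.
RULES = [("k", "w", "nr"), ("k", "w", "nt")]


def find_job_titles(tokens, tags, rules=RULES):
    """Search the tag sequence for each rule and read the job title
    from the same position in the segmentation table."""
    titles = []
    for rule in rules:
        n = len(rule)
        for i in range(len(tags) - n + 1):
            if tuple(tags[i:i + n]) == rule:
                # The last tag of the rule marks the job-title token.
                titles.append(tokens[i + n - 1])
    return titles


# Segmentation table and part-of-speech sequence text, kept in parallel.
tokens = ["Recruit post", ":", "Technician", "Recruit post", ":", "Engineer"]
tags = ["k", "w", "nr", "k", "w", "nt"]
titles = find_job_titles(tokens, tags)
```

Because the two lists share indices, a match in the tag sequence points directly at the corresponding word.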

Finally, it should be noted that the above embodiments merely illustrate, and do not limit, the technical solution of the present invention. Although the invention has been described in detail with reference to preferred embodiments, those of ordinary skill in the art should understand that the technical solution may be modified or equivalently substituted without departing from its purpose and scope, and all such modifications and substitutions shall be covered by the claims of the present invention.

Claims

1. A semantic-model-based method for extracting data from recruitment brochures, characterized by comprising the following steps:

S1, first create the threads and queues of the system so that the work proceeds as a pipeline, and build the job-title, city, and major corpora;

S2, clean the tags out of the recruitment brochure;

S3, traverse the full text, take out the links to the online application address, and store the links in a database;

S4, extract the other information: in the first step, analyze the full text, take out the job-title keywords, and locate where each job title appears; in the second step, take out the text between one job title and the next and mine it further, that is, locate a job title, take the text from it up to the next job-title section, extract the words, clean them, and store them;

S5, apply semantic word-segmentation model extraction to difficult articles: first tag the parts of speech of the segmented words of the recruitment brochure, with job-title words tagged nr, neutral job-title words nt, major words n, measure words eng, punctuation marks w, left brackets lbh, right brackets rbh, company words h, and place names ns, which completes the tagging.

2. The semantic-model-based method for extracting data from recruitment brochures according to claim 1, characterized in that: during the tag cleaning in S2, superfluous HTML and JavaScript are removed.

3. The semantic-model-based method for extracting data from recruitment brochures according to claim 1, characterized in that the semantic word-segmentation model extraction for difficult articles comprises the following specific steps:

A, first segment the recruitment brochure into words, then remove stop words and meaningless words;

B, tag the parts of speech of the segmented words, with job-title words tagged nr, neutral job-title words nt, major words n, measure words eng, punctuation marks w, left brackets lbh, right brackets rbh, company words h, and place names ns;

C, after part-of-speech tagging is complete, split the brochure into two texts: one is the source text after word segmentation, and the other is the part-of-speech sequence text after tagging;

D, finally, perform text mining on the parts of speech of a large number of brochures: compare where the job titles appear in the text with the part-of-speech sequence to extract the part-of-speech sequence rule for job titles, and then use that rule to extract other job titles that follow the same rule.