CN111754208A - Automatic screening method for recruitment resumes - Google Patents

Automatic screening method for recruitment resumes

Publication number
CN111754208A
Authority
CN
China
Prior art keywords
resume
recruitment
model
post
data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
CN202010619694.2A
Other languages
Chinese (zh)
Inventor
邱继钊
杨胜华
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chaozhou Zhuoshu Big Data Industry Development Co Ltd
Original Assignee
Chaozhou Zhuoshu Big Data Industry Development Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
2020-07-01
Publication date
2020-10-09
Application filed by Chaozhou Zhuoshu Big Data Industry Development Co Ltd
Priority to CN202010619694.2A
Publication of CN111754208A
Legal status: Withdrawn

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/10 Office automation; Time management
    • G06Q10/105 Human resources
    • G06Q10/1053 Employment or hiring
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/332 Query formulation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2413 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on distances to training or reference patterns
    • G06F18/24147 Distances to closest patterns, e.g. nearest neighbour classification

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • Business, Economics & Management (AREA)
  • General Physics & Mathematics (AREA)
  • Human Resources & Organizations (AREA)
  • General Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Strategic Management (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Databases & Information Systems (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Evolutionary Computation (AREA)
  • Economics (AREA)
  • Marketing (AREA)
  • Operations Research (AREA)
  • Quality & Reliability (AREA)
  • Tourism & Hospitality (AREA)
  • General Business, Economics & Management (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention provides an automatic screening method for recruitment resumes, belonging to the technical field of text extraction. It saves a recruiting company the work and time of manually screening, from a massive number of resumes, those that match a post.

Description

Automatic screening method for recruitment resumes
Technical Field
The invention relates to text extraction in natural language processing and to multi-label classification, and in particular to an automatic screening method for recruitment resumes.
Background
In the internet era, online information plays an increasingly prominent role in society and daily life, and it has changed how people shop, communicate, and live. With the rise of recruitment websites, enterprises now recruit mainly by publishing recruitment information on those websites, and applicants submit resumes online instead of delivering them offline. Recruiting through a recruitment website is far more convenient for both enterprises and individuals.
Multi-label learning is an active research topic in machine learning. It originated from the ambiguity encountered in document classification. Under the traditional supervised learning framework, a real-world object corresponds to exactly one concept label, so the learning problem is assumed to be unambiguous; this is the single-label classification problem, in which each sample has only one label. Real-world problems, however, are full of ambiguous objects. Because of this ambiguity, one sample may be associated with several labels, and problems of this kind are multi-label classification problems. Multi-label learning is widely applied in practice, for example in automatic video annotation, bioinformatics, Web mining, information retrieval, and personalized recommendation.
Association rules are one of the most active research methods in the field of knowledge discovery. They were first proposed by Agrawal et al. in 1993 to mine associations between different commodities (items) in a customer transaction database; the rules reflect customers' purchasing patterns. A typical example of association rule mining is shopping basket analysis. Association rule research helps to find associations between different items in a transaction database and to identify customers' purchasing patterns, such as how buying one commodity affects the purchase of others. The results can be applied to shelf layout, inventory arrangement, and the classification of users by purchasing pattern.
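The strength of an association rule is usually measured by its support and confidence. The text does not state which measures the method uses, so the following is only a minimal sketch of those two standard quantities; the function names and the shopping baskets are hypothetical and invented for illustration.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= set(t))
    return hits / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Estimate P(consequent | antecedent) as support(A and C) / support(A)."""
    joint = support(transactions, set(antecedent) | set(consequent))
    base = support(transactions, antecedent)
    return joint / base if base else 0.0


# Hypothetical shopping baskets, invented for illustration only.
baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

# Rule "bread -> milk": how often both appear, and how often milk follows bread.
rule_support = support(baskets, {"bread", "milk"})
rule_confidence = confidence(baskets, {"bread"}, {"milk"})
```

In the resume setting the same two measures could be applied with resume keywords as the antecedent and a post as the consequent, but that mapping is an assumption rather than something the text specifies.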
The TextRank algorithm is a graph-based ranking algorithm for text. Its basic idea derives from Google's PageRank algorithm: a text is divided into constituent units (words or sentences), a graph model is built over them, and a voting mechanism ranks the important components of the text, so that keyword extraction and summarization can be performed using only the information in a single document. Unlike models such as LDA and HMM, TextRank does not need to be trained on a collection of documents in advance, and it is widely used because it is simple and effective. TextRank first segments the given sentence into words and collects the resulting words into a set. The importance of a word depends mainly on the number of neighbors before and after it: the more neighbors a word has, the more votes it receives, the higher its weight, and the more important it is. The more often a word occurs, the more neighbors it has, and a word in the middle of the text has more neighbors than one at the beginning or end. TextRank has two main applications: extracting the important keywords of a text, and selecting the keywords that occur most frequently in a passage.
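The voting mechanism described above can be sketched in a few lines. This is an illustrative sketch only, not the patent's implementation: it assumes the text has already been segmented into tokens, uses an unweighted co-occurrence graph, and picks a window of 2 and a damping factor of 0.85 as arbitrary defaults. The function name `textrank_keywords` and the sample tokens are hypothetical.

```python
from collections import defaultdict


def textrank_keywords(tokens, window=2, damping=0.85, iterations=50, top_k=3):
    """Rank tokens with a PageRank-style vote over a co-occurrence graph."""
    # Link each token to the distinct tokens that follow it within `window` positions.
    neighbors = defaultdict(set)
    for i, word in enumerate(tokens):
        for other in tokens[i + 1 : i + 1 + window]:
            if other != word:
                neighbors[word].add(other)
                neighbors[other].add(word)

    scores = {word: 1.0 for word in neighbors}
    for _ in range(iterations):
        updated = {}
        for word in neighbors:
            # Each neighbor shares its score equally among its own neighbors.
            vote = sum(scores[n] / len(neighbors[n]) for n in neighbors[word])
            updated[word] = (1 - damping) + damping * vote
        scores = updated

    # Highest score first; ties are broken alphabetically.
    ranked = sorted(scores, key=lambda w: (-scores[w], w))
    return ranked[:top_k]


# Hypothetical, already-segmented resume text.
tokens = ["python", "developer", "python", "machine", "learning", "python", "sql"]
keywords = textrank_keywords(tokens)
```

Here "python" occurs most often and therefore has the most neighbors, so it collects the most votes and ranks first, which matches the description above.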
In machine learning, training an algorithm model mainly comprises three steps: first, preprocessing the data; second, selecting a suitable algorithm model; and third, training the model on the training samples to obtain the optimal model.
As the internet has spread, information has gradually moved from printed newspapers and periodicals to the internet. With the rise of recruitment websites, enterprises have likewise moved their recruitment information from printed newspapers to online recruitment websites. Recruitment websites are now the main channel through which enterprises publish, and applicants obtain, recruitment information, and both resume submission and post screening are completed online. To increase their chances of finding a job, applicants often cast a wide net and submit resumes to many posts at once. Although this improves the applicant's chances, it undoubtedly increases the enterprise's workload in screening out resumes that do not match the post. How to find suitable talent quickly and accurately among a massive number of resumes has become a major problem for enterprises.
Disclosure of Invention
To solve the above technical problem, the invention provides an automatic screening method for recruitment resumes. It screens resumes automatically based on the information published by both applicants and recruiters, greatly reduces the enterprise's time and cost, improves the accuracy of the final result, and lets the enterprise find suitable resumes and talent in a short time.
The technical scheme of the invention is as follows:
An automatic screening method for recruitment resumes:
keywords are extracted from the recruitment resume information using the TextRank algorithm, the recruitment resumes are classified using the multi-label classification method ML-KNN, the degree of association between a resume and the post applied for is mined based on association rules, and an automatic screening model for recruitment resumes is established.
The method comprises:
1) using a crawler to acquire post and resume data from a recruitment website;
2) extracting keywords from the resume and post information using the TextRank algorithm, and generating training samples with the resume keywords as features and the post keywords as labels;
3) training an ML-KNN algorithm model on the samples to obtain the screening model.
Further,
the crawled recruitment resume information is preprocessed;
keyword extraction is performed on the preprocessed recruitment resume data using the TextRank algorithm.
Still further,
the specific steps are as follows:
Step 1): acquire data. First, crawl the enterprise recruitment information pages and the resumes submitted in response to them; extract the post requirements and skill requirements from the recruitment information and the personal skills from the resumes to obtain the initial post data and resume data;
Step 2): extract keywords. Extract the keywords from the initial post and resume data using the TextRank algorithm;
Step 3): verify the keywords extracted in step 2). If the extraction quality is poor or the accuracy is low, optimize and adjust the algorithm model of step 2) and extract again;
Step 4): process data. Convert and preprocess the acquired post and resume keyword information to obtain the training samples required for classification;
Step 5): establish the screening model. Learn a classifier from the training samples of step 4) using the ML-KNN algorithm, and finally establish the screening model;
Step 6): train the screening model multiple times so that its performance is more stable.
The training sample in step 4 is represented as $x_1 = [x_{11}, x_{12}, x_{13}, \dots, x_{1n}]$, with the corresponding result set $y = \{L_1, L_2, \dots, L_m\}$, where each label $L$ takes the value 0 or 1: 0 indicates that the sample does not have the label, and 1 indicates that it does.
A screening model $Y = f(x)$ is established. The model predicts the suitable posts for an unknown sample $x$ and computes the probability that the sample holds each label; the larger this holding probability, the better the resume fits the post.
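The sample representation can be made concrete with a small sketch. The text fixes only that labels are 0 or 1; encoding the features as 0/1 keyword indicators is an assumption made here for illustration, and the vocabularies, keywords, and the helper `build_sample` are all hypothetical.

```python
def build_sample(resume_keywords, post_keywords, feature_vocab, label_vocab):
    """Encode one resume/post pair as a feature vector x and a label vector y."""
    # Assumption: a feature is 1 when the resume contains that keyword, else 0.
    x = [1 if term in resume_keywords else 0 for term in feature_vocab]
    # A label is 1 when the sample has that post keyword, else 0.
    y = [1 if term in post_keywords else 0 for term in label_vocab]
    return x, y


# Hypothetical vocabularies, invented for illustration only.
feature_vocab = ["python", "sql", "excel", "java"]
label_vocab = ["data analyst", "backend engineer"]

x1, y1 = build_sample({"python", "sql"}, {"data analyst"}, feature_vocab, label_vocab)
```

With these inputs `x1` has one entry per feature keyword and `y1` has one entry per label, mirroring the vector and the result set described above.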
The screening model can be trained using ten-fold cross-validation.
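Ten-fold cross-validation splits the samples into ten folds and holds out each fold once for evaluation while training on the other nine. A minimal sketch of the index split follows; the function name `k_fold_indices`, the fixed seed, and the sample count of 25 are assumptions for illustration.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_indices, test_indices) pairs for k-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into k folds of near-equal size.
    folds = [indices[i::k] for i in range(k)]
    for held_out in range(k):
        test = folds[held_out]
        train = [i for j, fold in enumerate(folds) if j != held_out for i in fold]
        yield train, test


# 25 hypothetical samples split into ten train/test rounds.
splits = list(k_fold_indices(25, k=10))
```

Each sample appears in exactly one test fold, so averaging the ten evaluation scores uses every sample once for testing.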
The invention has the following advantages:
1) Keywords are extracted with the TextRank algorithm, which is simple and effective and does not need to be trained on a collection of documents in advance;
2) Classification and screening use the ML-KNN algorithm. ML-KNN applies the KNN idea to find the K nearest neighbor samples and uses Bayesian conditional probability to compute the probability that the current label is 1 or 0; the value with the higher probability is taken as the final label of the sample;
3) By combining 1) and 2), the invention provides an automatic resume screening service for enterprise recruitment, which effectively saves the enterprise time and cost, improves the accuracy of the final result, and lets the enterprise find suitable resumes and talent in a short time.
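The ML-KNN procedure in advantage 2) can be sketched as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: it uses squared Euclidean distance, Laplace smoothing of 1, and K = 2, none of which the text specifies. The class name `MLKNN`, the helper `_nearest`, and the toy data are hypothetical.

```python
def _nearest(train_x, query, k, skip=None):
    """Indices of the k training rows closest to `query` (squared Euclidean)."""
    dists = [
        (sum((a - b) ** 2 for a, b in zip(row, query)), i)
        for i, row in enumerate(train_x)
        if i != skip
    ]
    return [i for _, i in sorted(dists)[:k]]


class MLKNN:
    def __init__(self, k=2, smooth=1.0):
        self.k, self.s = k, smooth

    def fit(self, X, Y):
        self.X, self.Y = X, Y
        m, n_labels, k, s = len(X), len(Y[0]), self.k, self.s
        # Prior probability that a sample has each label.
        self.prior1 = [(s + sum(y[l] for y in Y)) / (2 * s + m) for l in range(n_labels)]
        # c1[l][j]: samples with label l whose k neighbors carry it j times.
        # c0[l][j]: the same count for samples without label l.
        c1 = [[0] * (k + 1) for _ in range(n_labels)]
        c0 = [[0] * (k + 1) for _ in range(n_labels)]
        for i, x in enumerate(X):
            nbrs = _nearest(X, x, k, skip=i)
            for l in range(n_labels):
                j = sum(Y[n][l] for n in nbrs)
                (c1 if Y[i][l] else c0)[l][j] += 1
        # Smoothed likelihoods of seeing j labelled neighbors.
        self.like1 = [[(s + c) / (s * (k + 1) + sum(row)) for c in row] for row in c1]
        self.like0 = [[(s + c) / (s * (k + 1) + sum(row)) for c in row] for row in c0]
        return self

    def predict_proba(self, x):
        """Posterior probability that x has each label (Bayes rule)."""
        nbrs = _nearest(self.X, x, self.k)
        probs = []
        for l, p1 in enumerate(self.prior1):
            j = sum(self.Y[n][l] for n in nbrs)
            on = p1 * self.like1[l][j]
            off = (1 - p1) * self.like0[l][j]
            probs.append(on / (on + off))
        return probs

    def predict(self, x):
        return [1 if p > 0.5 else 0 for p in self.predict_proba(x)]


# Toy data, invented for illustration: 4 keyword features, 2 post labels.
X = [[1, 1, 0, 0], [1, 1, 1, 0], [1, 0, 0, 0], [0, 0, 1, 1], [0, 1, 1, 1], [0, 0, 0, 1]]
Y = [[1, 0], [1, 0], [1, 0], [0, 1], [0, 1], [0, 1]]
model = MLKNN(k=2).fit(X, Y)
```

Calling `model.predict_proba` on a new feature vector returns one probability per label, which can be used to rank resumes by how well they fit a post.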
Drawings
FIG. 1 is a schematic workflow diagram of the present invention.
Detailed Description
To make the objects, technical solutions, and advantages of the embodiments of the invention clearer, the technical solutions in the embodiments are described below clearly and completely with reference to the drawings. The described embodiments are evidently some, but not all, of the embodiments of the invention. All other embodiments that a person of ordinary skill in the art obtains from these embodiments without creative effort fall within the protection scope of the invention.
As shown in FIG. 1, the invention defines an automatic screening method for recruitment resumes based on the ML-KNN multi-label learning algorithm, which mainly comprises the following steps:
Step 1: Acquire data. First, crawl the enterprise recruitment information pages and the resumes submitted in response to them. Extract the post requirements and skill requirements from the recruitment information and the personal skills from the resumes to obtain the initial post data and resume data.
Step 2: Extract keywords. Extract the keywords from the initial post and resume data using the TextRank algorithm.
Step 3: Verify the keywords extracted in step 2. If the extraction quality is poor or the accuracy is low, optimize and adjust the algorithm model of step 2 and extract again.
Step 4: Process data. Convert and preprocess the acquired post and resume keyword information to obtain the training samples required for classification, $x_1 = [x_{11}, x_{12}, x_{13}, \dots, x_{1n}]$, with the corresponding result set $y = \{L_1, L_2, \dots, L_m\}$ (each label $L$ takes the value 0 or 1: 0 indicates that the sample does not have the label, and 1 indicates that it does).
Step 5: Establish the screening model. Learn a classifier from the training samples of step 4 using the ML-KNN algorithm, and finally establish the screening model $Y = f(x)$. The model predicts the suitable posts for an unknown sample $x$ and computes the probability that the sample holds each label; the larger this holding probability, the better the resume fits the post.
Step 6: Train the screening model multiple times using ten-fold cross-validation so that its performance is more stable.
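The Bayesian decision used in step 5 is not written out in the text. As a sketch, it can be stated in the notation of the published ML-KNN algorithm; the symbols below are borrowed from that formulation and are not defined in this document. Let $H_1^l$ and $H_0^l$ be the events that sample $x$ does or does not have label $l$, let $E_j^l$ be the event that exactly $j$ of its $K$ nearest neighbors have label $l$, and let $C_x(l)$ be the count actually observed among the neighbors of $x$.

```latex
P\bigl(H_1^l \mid E_{C_x(l)}^l\bigr)
  = \frac{P(H_1^l)\, P\bigl(E_{C_x(l)}^l \mid H_1^l\bigr)}
         {\sum_{b \in \{0,1\}} P(H_b^l)\, P\bigl(E_{C_x(l)}^l \mid H_b^l\bigr)},
\qquad
\hat{y}_l(x) = \arg\max_{b \in \{0,1\}} P(H_b^l)\, P\bigl(E_{C_x(l)}^l \mid H_b^l\bigr)
```

The left-hand expression plays the role of the holding probability for label $l$, and the right-hand rule assigns the label value with the larger posterior, estimated from the training samples.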
The above description is only a preferred embodiment of the present invention, and is only used to illustrate the technical solutions of the present invention, and not to limit the protection scope of the present invention. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention shall fall within the protection scope of the present invention.

Claims (8)

1. An automatic screening method for recruitment resumes, characterized in that
keywords are extracted from the recruitment resume information using the TextRank algorithm, the recruitment resumes are classified using the multi-label classification method ML-KNN, the degree of association between a resume and the post applied for is mined based on association rules, and an automatic screening model for recruitment resumes is established.
2. The method of claim 1, comprising:
1) using a crawler to acquire post and resume data from a recruitment website;
2) extracting keywords from the resume and post information using the TextRank algorithm, and generating training samples with the resume keywords as features and the post keywords as labels;
3) training an ML-KNN algorithm model on the samples to obtain the screening model.
3. The method of claim 2, wherein
the crawled recruitment resume information is preprocessed.
4. The method of claim 3, wherein
keyword extraction is performed on the preprocessed recruitment resume data using the TextRank algorithm.
5. The method of claim 4, wherein
the specific steps are as follows:
Step 1): acquire data. First, crawl the enterprise recruitment information pages and the resumes submitted in response to them; extract the post requirements and skill requirements from the recruitment information and the personal skills from the resumes to obtain the initial post data and resume data;
Step 2): extract keywords. Extract the keywords from the initial post and resume data using the TextRank algorithm;
Step 3): verify the keywords extracted in step 2). If the extraction quality is poor or the accuracy is low, optimize and adjust the algorithm model of step 2) and extract again;
Step 4): process data. Convert and preprocess the acquired post and resume keyword information to obtain the training samples required for classification;
Step 5): establish the screening model. Learn a classifier from the training samples of step 4) using the ML-KNN algorithm, and finally establish the screening model;
Step 6): train the screening model multiple times so that its performance is more stable.
6. The method of claim 1, wherein
the training sample in step 4 is represented as $x_1 = [x_{11}, x_{12}, x_{13}, \dots, x_{1n}]$, with the corresponding result set $y = \{L_1, L_2, \dots, L_m\}$, where each label $L$ takes the value 0 or 1: 0 indicates that the sample does not have the label, and 1 indicates that it does.
7. The method of claim 6, wherein
a screening model $Y = f(x)$ is established, the model predicts the suitable posts for an unknown sample $x$, and the probability that the sample holds each label is computed; the larger this holding probability, the better the resume fits the post.
8. The method of claim 5, wherein
the screening model is trained using ten-fold cross-validation.
CN202010619694.2A 2020-07-01 2020-07-01 Automatic screening method for recruitment resumes Withdrawn CN111754208A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010619694.2A CN111754208A (en) 2020-07-01 2020-07-01 Automatic screening method for recruitment resumes

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010619694.2A CN111754208A (en) 2020-07-01 2020-07-01 Automatic screening method for recruitment resumes

Publications (1)

Publication Number Publication Date
CN111754208A (en) 2020-10-09

Family

ID=72678619

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010619694.2A Withdrawn CN111754208A (en) 2020-07-01 2020-07-01 Automatic screening method for recruitment resumes

Country Status (1)

Country Link
CN (1) CN111754208A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113342983A (en) * 2021-06-30 2021-09-03 中国平安人寿保险股份有限公司 Resume distribution method, device and equipment based on machine learning and storage medium
CN113506084A (en) * 2021-06-23 2021-10-15 上海师范大学 False recruitment position detection method based on deep learning
CN115879901A (en) * 2023-02-22 2023-03-31 陕西湘秦衡兴科技集团股份有限公司 Intelligent personnel self-service platform

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113506084A (en) * 2021-06-23 2021-10-15 上海师范大学 False recruitment position detection method based on deep learning
CN113342983A (en) * 2021-06-30 2021-09-03 中国平安人寿保险股份有限公司 Resume distribution method, device and equipment based on machine learning and storage medium
CN113342983B (en) * 2021-06-30 2023-02-07 中国平安人寿保险股份有限公司 Resume distribution method, device and equipment based on machine learning and storage medium
CN115879901A (en) * 2023-02-22 2023-03-31 陕西湘秦衡兴科技集团股份有限公司 Intelligent personnel self-service platform
CN115879901B (en) * 2023-02-22 2023-07-28 陕西湘秦衡兴科技集团股份有限公司 Intelligent personnel self-service platform

Similar Documents

Publication Publication Date Title
US20230222366A1 (en) Systems and methods for semantic analysis based on knowledge graph
CN110110335B (en) Named entity identification method based on stack model
CN110888990B (en) Text recommendation method, device, equipment and medium
CN109165294B (en) Short text classification method based on Bayesian classification
US20180211260A1 (en) Model-based routing and prioritization of customer support tickets
CN111767725B (en) Data processing method and device based on emotion polarity analysis model
US20170076225A1 (en) Model-based classification of content items
CN107908715A (en) Microblog emotional polarity discriminating method based on Adaboost and grader Weighted Fusion
US20170075978A1 (en) Model-based identification of relevant content
Akhter et al. Supervised ensemble learning methods towards automatically filtering Urdu fake news within social media
CN111754208A (en) Automatic screening method for recruitment resumes
Nasim et al. Sentiment analysis on Urdu tweets using Markov chains
CN112395410A (en) Entity extraction-based industry public opinion recommendation method and device and electronic equipment
CN110134799B (en) BM25 algorithm-based text corpus construction and optimization method
CN111462752A (en) Client intention identification method based on attention mechanism, feature embedding and BI-LSTM
TWI734085B (en) Dialogue system using intention detection ensemble learning and method thereof
US20220148049A1 (en) Method and system for initiating an interface concurrent with generation of a transitory sentiment community
CN116010552A (en) Engineering cost data analysis system and method based on keyword word library
CN103049454B (en) A kind of Chinese and English Search Results visualization system based on many labelings
CN117891948A (en) Small sample news classification method based on internal knowledge extraction and contrast learning
CN111259223A (en) News recommendation and text classification method based on emotion analysis model
CN115934936A (en) Intelligent traffic text analysis method based on natural language processing
Sun et al. GubaLex: Guba-oriented sentiment lexicon for big texts in finance
US20220253728A1 (en) Method and System for Determining and Reclassifying Valuable Words
Swaileh et al. A named entity extraction system for historical financial data

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WW01 Invention patent application withdrawn after publication
Application publication date: 20201009