CN109165295B

CN109165295B - Intelligent resume evaluation method

Info

Publication number: CN109165295B
Application number: CN201811131459.XA
Authority: CN
Inventors: 吴毅
Original assignee: Tianya Community Network Technology Co ltd
Current assignee: Tianya Community Network Technology Co ltd
Priority date: 2018-09-27
Filing date: 2018-09-27
Publication date: 2021-07-27
Anticipated expiration: 2038-09-27
Also published as: CN109165295A

Abstract

The invention discloses an intelligent resume evaluation method comprising the following steps: acquiring a recruitment data set from a database, wherein the recruitment data set at least comprises enterprise recruitment information; extracting data from the recruitment data set, the data comprising one or more attributes corresponding to the recruitment requirements of a position, the attributes being parameters that represent the position requirements in the enterprise recruitment information; acquiring resume text data from a database item by item; extracting data from the resume text data, the data comprising one or more attributes characterizing an applicant; and matching the data extracted from the resume text data with the data extracted from the recruitment data set, and writing the matched data into the database.

Description

Intelligent resume evaluation method

Technical Field

The invention relates to the technical field of data processing, in particular to an intelligent resume evaluation method.

Background

At present, HR staff at many enterprises identify, judge and screen the resumes submitted by applicants manually. This approach depends heavily on personal experience, and over long screening and evaluation sessions evaluators easily tire of repeatedly browsing similar content, which harms both recruitment efficiency and the quality of their judgment. In addition, in the existing recruitment process, enterprises tend to look for talent through recruitment websites. Most such websites build a full profile of each applicant from social networks, behavior data and the like, and comprehensively evaluate the applicant's interests, personality and abilities to help enterprises find suitable talent. This approach has its own problems: it places high demands on the data needed for evaluation, its accuracy is limited, and it is difficult to implement. No effective solution has appeared so far.

Disclosure of Invention

Accordingly, the present invention is directed to an intelligent resume evaluation method to solve at least the above problems.

An intelligent resume evaluation method, comprising:

acquiring a recruitment data set from a database, wherein the recruitment data set at least comprises enterprise recruitment information;

extracting data from the recruitment data set, wherein the data comprises one or more attributes corresponding to the recruitment requirements of a position, the attributes being parameters that represent the position requirements in the enterprise recruitment information;

acquiring resume text data from a database one by one;

extracting data from the resume text data, wherein the data comprises: one or more attributes characterizing an applicant;

and matching the data extracted from the resume text data with the data extracted from the recruitment data set, and writing the matched resume text data into a database.

Further, obtaining the resume text data item by item from the database includes screening the resume text data, where the screening includes: removing resume texts that do not meet a condition from the resume text data; and acquiring the screened resume text data item by item.

Further, a resume text that does not meet the condition is a resume text that is not in semi-structured data form.

Further, extracting data from the resume text data includes:

dividing the resume text into a basic information class and a complex information class set;

extracting data from the basic information class;

classifying the complex information class set;

and extracting the target information from the complex information class.

Further, when the resume text is divided into a basic information class and a complex information class set, a matching strategy based on regular expressions is first used to identify keywords in order to find division points; if there are no recognizable keywords, the first 5-10 lines of the resume text are taken as a fuzzy basic information class from which data is extracted.

Further, extracting data from the basic information class includes:

identifying the content of strong identification elements;

and determining the element type based on the contextual position of the element.

Further, when the complex information class set is classified, a key character matching strategy based on regular expressions is first used to classify the complex information class set; if no matching keywords can be found, the complex information class set is classified by analyzing the format and font of the text, or by an automatic classification algorithm based on simple vectors.

Further, when target information is extracted from the complex information class, the target information is extracted using a key character matching strategy based on regular expressions, the target information being the information in the resume text data that represents the applicant's professional skills and technical level.

Compared with the prior art, the invention has the beneficial effects that:

According to the intelligent resume evaluation method provided by the invention, specific information is extracted from the recruitment information and from the resume text data, and the two are matched automatically. This simplifies the resume screening process, is more efficient than traditional manual resume screening, and reduces the human resources required. In addition, the method places lower demands on the source of the screened data: the required data can be extracted automatically from the applicant's resume, the screening basis can be adjusted according to the recruiter's requirements, and target information is extracted with higher accuracy.

Drawings

In order to illustrate the technical solutions in the embodiments of the present invention more clearly, the drawings used in the description of the embodiments are briefly introduced below. The drawings in the following description show only preferred embodiments of the present invention, and those skilled in the art can obtain other drawings from them without inventive effort.

Fig. 1 is a schematic flow chart of an intelligent resume evaluation method according to an embodiment of the present invention.

Fig. 2 is a schematic diagram of a resume text data extraction process according to an embodiment of the present invention.

Detailed Description

The principles and features of the invention are described below with reference to the drawings. The illustrated embodiments are provided to explain the invention and not to limit its scope.

The following embodiments can be applied to a general terminal such as a computer. They may also be applied to a server, which may be understood as a device composed of one or more computers, so the computer structure described below also applies to the server. As the computing power of mobile terminals gradually increases, the following embodiments may also be implemented in mobile terminals. The steps or modules in the following embodiments may also be performed separately in different servers, terminals or mobile terminals, with the necessary data interaction taking place between them.

Referring to fig. 1, the present invention provides an intelligent resume evaluation method, specifically including:

Step S1, acquiring a recruitment data set from the database, wherein the recruitment data set at least comprises enterprise recruitment information.

In the above step, as an optional implementation, the recruitment data set stored in the database is the latest enterprise recruitment information of the recruiting enterprise. Because an enterprise's recruitment standards, and hence its requirements for candidates, may change over time, using the latest enterprise recruitment information as the source of the recruitment data set keeps the recruitment target accurate.

Step S2, extracting data from the recruitment data set, wherein the data comprises one or more attributes corresponding to the recruitment requirements of the position, the attributes being parameters that represent the position requirements in the enterprise recruitment information.

In the above step, the attributes characterizing the job requirements may be parameters such as the education level, specialty, skills and work experience required by the position. The extracted data is the specific requirement for each attribute; for example, the enterprise recruitment information may require that an applicant for the position have an education level of junior college or above and a specialty in software engineering.
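As a minimal sketch of step S2, assuming the enterprise recruitment information is English free text with one "label: value" requirement per line, the attributes could be pulled out with regular expressions. The function name `extract_requirements`, the patterns and the label wording are hypothetical illustrations and are not specified by the patent.

```python
import re

# Hypothetical patterns (not from the patent): each maps an attribute name
# to a regex whose first group captures the stated requirement.
REQUIREMENT_PATTERNS = {
    "education": re.compile(r"education\s*:\s*(.+)", re.IGNORECASE),
    "specialty": re.compile(r"(?:specialty|major)\s*:\s*(.+)", re.IGNORECASE),
    "skills": re.compile(r"skills\s*:\s*(.+)", re.IGNORECASE),
}


def extract_requirements(posting: str) -> dict:
    """Return the attributes found in a job posting as {attribute: requirement}."""
    requirements = {}
    for line in posting.splitlines():
        for attribute, pattern in REQUIREMENT_PATTERNS.items():
            match = pattern.search(line)
            if match:
                requirements[attribute] = match.group(1).strip()
    return requirements
```

Lines that carry no known label, such as a position title, are simply ignored, so the set of extracted attributes can be changed by editing the pattern table alone.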

Step S3, obtaining resume text data item by item from the database.

In the above step, the resume text data is the resume text submitted by an applicant. In an embodiment of the present invention, this step further includes screening the resume text data, where the screening includes: removing resume texts that do not meet a condition from the resume text data; and acquiring the screened resume text data item by item. A resume text that does not meet the condition is a resume text that is not in semi-structured data form.

According to their characteristics, text data can be divided into three categories. Structured data is text data generated strictly according to a fixed format, such as bills and score sheets. Unstructured data is text data written in the way people customarily communicate and conforming to natural grammatical rules, such as news reports, novels and prose. Semi-structured data lies between the two: viewed as a whole, the text is subject to certain format constraints and does not fully follow natural grammatical rules, but locally it is organized using natural grammar. Notices, announcements and most resumes are semi-structured text. To make it easier for a computer to recognize the resume text data and extract information from it, resume texts that are not in semi-structured form, that is, texts not written in a conventional resume format, need to be removed when the resume text data submitted by applicants is acquired.
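The description does not say how semi-structured form is detected. The following is only a hypothetical heuristic for the screening in step S3: a text is kept when a sufficient share of its non-empty lines look like short "label: value" pairs. The names `looks_semi_structured` and `screen_resumes` and the 0.3 threshold are assumptions made for illustration.

```python
import re

# Hypothetical heuristic (not from the patent): a "formatted" line is a short
# label followed by an ASCII or full-width colon and a value.
LABEL_LINE = re.compile(r"^\s*[^:：]{1,30}[:：]\s*\S")


def looks_semi_structured(text: str, min_ratio: float = 0.3) -> bool:
    """Return True if enough non-empty lines are 'label: value' pairs."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    if not lines:
        return False
    formatted = sum(1 for line in lines if LABEL_LINE.match(line))
    return formatted / len(lines) >= min_ratio


def screen_resumes(resumes: list) -> list:
    """Keep only the resume texts that pass the semi-structured check."""
    return [text for text in resumes if looks_semi_structured(text)]
```

A production filter would likely also look at section headings and layout; the ratio test here only shows where such a check would sit in the pipeline.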

Step S4, extracting data from the resume text data, wherein the data includes: one or more attributes characterizing the applicant.

In the above step, the attributes characterizing the applicant may be information such as name, gender, school, education level, specialty, skills and work experience.

Step S5, matching the data extracted from the resume text data with the data extracted from the recruitment data set, and writing the matched data into the database.

In step S5, the data characterizing the applicant extracted from the resume text data is matched against the data representing the recruiting enterprise's requirements extracted from the recruitment data set. If the match passes, that is, if a given aspect of the applicant's characteristics meets the recruitment requirement of the enterprise, the matched resume text data is written into the database. The human resources department can then arrange interviews according to the resume text data in the database, which improves recruitment efficiency.
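A minimal sketch of the matching in step S5 follows. It assumes that applicant attributes and job requirements have already been extracted into dictionaries, that education levels can be ranked, and that a resume passes only when every listed requirement is met, which is one possible reading of the description. The ranking table and the names `meets_requirements` and `match_resumes` are hypothetical, and a plain list stands in for the database.

```python
# Hypothetical ordering of education levels, lowest to highest.
EDUCATION_RANK = {
    "high school": 0,
    "junior college": 1,
    "bachelor": 2,
    "master": 3,
    "doctorate": 4,
}


def meets_requirements(applicant: dict, requirements: dict) -> bool:
    """Check one applicant's extracted attributes against the job requirements."""
    min_education = requirements.get("min_education")
    if min_education is not None:
        # Unknown or missing education levels rank below every known level.
        level = EDUCATION_RANK.get(applicant.get("education", ""), -1)
        if level < EDUCATION_RANK[min_education]:
            return False
    specialty = requirements.get("specialty")
    if specialty is not None and applicant.get("specialty") != specialty:
        return False
    return True


def match_resumes(applicants: list, requirements: dict) -> list:
    """Return the applicants to be written to the database (here, a list)."""
    return [a for a in applicants if meets_requirements(a, requirements)]
```

Because the requirements are passed in as data, the screening basis can be changed without touching the matching code.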

Referring to fig. 2, on the basis of the above embodiment, in step S4, extracting data from the resume text data includes:

Step S41, dividing the resume text into a basic information class and a complex information class set;

Step S42, extracting data from the basic information class;

Step S43, classifying the complex information class set;

Step S44, extracting target information from the complex information class.

In the embodiment of the invention, the resume text is divided into a basic information class and complex information classes, where a class is a body of text sharing certain common characteristics. In terms of content, the basic information class is the class formed by the applicant's basic information. It characterizes the applicant's basic situation and may contain a plurality of basic information items, such as name, date of birth, school, education level, specialty, native place and contact details. Basic information items are generally separated by spaces, carriage returns and the like. A complex information class is a class formed by the applicant's complex information and represents the applicant's extended background. A resume text may contain several complex information classes, such as education experience, work experience, project experience and training experience, and together they form the complex information class set.

Between the basic information class and the complex information classes, and between one complex information class and another, there are obvious segmentation marks such as keywords, fonts and formats, which differ from the content of each class. When the resume text is divided into the basic information class and the complex information class set in step S41, as an optional embodiment, a keyword matching strategy based on regular expressions is first used to find the division marks. Because each type of text information in a resume generally has a title, the titles that may appear in resume texts, together with the categories to which they belong, can first be enumerated exhaustively and stored in a keyword library; a regular expression is then designed to retrieve matching text from the resume, and the matches serve as division marks. If no corresponding keywords are detected in the text, the first 5-10 lines of the resume text are taken as a fuzzy basic information class from which data is extracted, on the grounds that an applicant's basic information is generally located at the beginning of the resume. The range of the fuzzy basic information class can of course be set flexibly according to actual requirements.
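The division in step S41 can be sketched as follows, assuming an English resume in which each complex information class starts with a heading on its own line. The keyword library contents, the function name `split_resume` and the default of 8 fallback lines (a value inside the 5-10 range given above) are hypothetical choices for illustration.

```python
import re

# Hypothetical keyword library: heading text -> category it belongs to.
KEYWORD_LIBRARY = {
    "education": "education experience",
    "education experience": "education experience",
    "work experience": "work experience",
    "project experience": "project experience",
    "training experience": "training experience",
}

# A line is a division mark if it consists solely of a known heading,
# optionally followed by a colon.
HEADING = re.compile(
    r"^\s*("
    + "|".join(re.escape(k) for k in sorted(KEYWORD_LIBRARY, key=len, reverse=True))
    + r")\s*:?\s*$",
    re.IGNORECASE,
)


def split_resume(text: str, fallback_lines: int = 8):
    """Split a resume into (basic information lines, complex information lines)."""
    lines = text.splitlines()
    for index, line in enumerate(lines):
        if HEADING.match(line):
            # The first recognized heading starts the complex information set.
            return lines[:index], lines[index:]
    # No recognizable keyword: treat the first lines as fuzzy basic information.
    return lines[:fallback_lines], lines[fallback_lines:]
```

Requiring the heading to fill the whole line keeps a basic information item such as "Education: Bachelor" from being mistaken for a division mark.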

In step S42, extracting the information required by the recruiting enterprise from the basic information class specifically includes:

identifying the content of strong identification elements;

and determining the element type based on the contextual position of the element.

The basic information class is composed of a plurality of basic information items, and a basic information item generally comprises a title element and a content element. For example, the label "Name" in the resume text is a title element, and the text that follows it is the corresponding content element. Elements differ in how strongly their text identifies them: a title element is a strong identification element, because its class can be judged from its text content alone. By designing a regular expression, the strong identification elements in the resume text are retrieved, and the types of the remaining elements can then be judged from their contextual positions. For example, in the basic information class, if a weak identification element or an element with no identification lies between two strong identification elements, it is considered the content element corresponding to the preceding strong identification element. After the element types in the basic information class have been identified, the required information is extracted using a regular expression designed according to the key character matching strategy.
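The context-position rule of step S42 can be illustrated with a short sketch. It assumes that elements are separated by colons or whitespace and that the set of title words acting as strong identification elements is known in advance; `STRONG_TITLES` and `parse_basic_info` are hypothetical names, not terms from the patent.

```python
import re

# Hypothetical set of title words treated as strong identification elements.
STRONG_TITLES = {"name", "gender", "school", "degree", "major", "phone"}


def parse_basic_info(text: str) -> dict:
    """Assign each non-title element to the strong title element preceding it."""
    # Split on colons (ASCII and full-width) and whitespace to obtain elements.
    elements = [e for e in re.split(r"[:：\s]+", text) if e]
    info = {}
    current_title = None
    for element in elements:
        if element.lower() in STRONG_TITLES:
            # A strong identification element opens a new basic information item.
            current_title = element.lower()
            info[current_title] = []
        elif current_title is not None:
            # Weak or unidentified elements are content of the previous title.
            info[current_title].append(element)
    return {title: " ".join(parts) for title, parts in info.items()}
```

Elements that appear before any title element are dropped in this sketch, since there is no preceding strong element to attach them to.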

In step S43, because a resume text usually includes a plurality of complex information classes, such as educational background, work experience, skills, hobbies and social practice, which form the complex information class set, the set needs to be classified further once the resume text has been divided into the basic information class and the complex information class set. First, a key character matching strategy based on regular expressions is used to classify the complex information class set. Most resume texts contain keywords such as "educational background" and "work experience", so this method classifies complex information classes quickly and accurately. If no matching keywords can be found, the complex information class set is classified either by analyzing the format and font of the text, on the basis that the title and the content of a complex information class generally use different fonts, sizes and formats, or by an automatic classification algorithm based on simple vectors.
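The keyword-based part of step S43 might look like the sketch below, which groups the lines of the complex information class set under the most recent recognized heading. The heading patterns, the category names and the "unclassified" bucket (material left for a fallback method) are assumptions made for illustration; font and format analysis is not shown because it needs layout information that plain text does not carry.

```python
import re

# Hypothetical heading patterns; a heading must fill the whole line.
CATEGORY_PATTERNS = {
    "education": re.compile(
        r"\s*education(al)?(\s+(background|experience))?\s*:?\s*", re.IGNORECASE
    ),
    "work": re.compile(r"\s*work(ing)?\s+experience\s*:?\s*", re.IGNORECASE),
    "skills": re.compile(r"\s*(professional\s+)?skills\s*:?\s*", re.IGNORECASE),
}


def classify_complex_set(lines: list) -> dict:
    """Group content lines by the category of the heading that precedes them."""
    classes = {}
    current = "unclassified"
    for line in lines:
        for category, pattern in CATEGORY_PATTERNS.items():
            if pattern.fullmatch(line):
                current = category  # heading line: switch category
                break
        else:
            # Not a heading: store the line under the current category.
            classes.setdefault(current, []).append(line)
    return classes
```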

The classification principle of the automatic classification algorithm based on simple vectors is as follows: a central vector is generated for each class of text set as the arithmetic mean of its text vectors; when a new text arrives, its text vector is determined and the distance (that is, the similarity) between the new text vector and the central vector of each class is calculated; finally, the new text is assigned to the class whose central vector is closest to it.
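The principle above corresponds to a centroid classifier, sketched here in plain Python. The description does not fix the text representation or the distance measure, so word-count vectors and cosine similarity are assumptions, and the function names are hypothetical.

```python
import math
from collections import Counter


def vectorize(text: str, vocabulary: list) -> list:
    """Turn a text into a word-count vector over a fixed vocabulary."""
    counts = Counter(text.lower().split())
    return [counts[word] for word in vocabulary]


def centroid(vectors: list) -> list:
    """Arithmetic mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def cosine(a: list, b: list) -> float:
    """Cosine similarity; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def train(labelled: dict):
    """Build the vocabulary and one central vector per class from {label: [texts]}."""
    vocabulary = sorted(
        {word for texts in labelled.values() for t in texts for word in t.lower().split()}
    )
    centroids = {
        label: centroid([vectorize(t, vocabulary) for t in texts])
        for label, texts in labelled.items()
    }
    return vocabulary, centroids


def classify(text: str, vocabulary: list, centroids: dict) -> str:
    """Assign the text to the class whose central vector is most similar."""
    vector = vectorize(text, vocabulary)
    return max(centroids, key=lambda label: cosine(vector, centroids[label]))
```

Words that never occurred in the training texts contribute nothing to the vector, so a new text is compared only on vocabulary the classes have already seen.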

In step S44, after the classification of the complex information class set is completed, a regular expression designed according to the keyword matching strategy is used to extract target information. The target information is the information in the resume text data that represents the applicant's professional skills and technical level, and it is extracted so that it can be matched against the information representing the job requirements extracted from the recruitment information of the recruiting enterprise. During matching, a regular expression is designed from the job requirement information, for example a requirement in the recruitment information that the applicant hold a senior software engineer professional certification, and the target information extracted from the complex information classes is screened with it. If the corresponding information is retrieved, the resume text is stored in the database; otherwise, the resume text is regarded as not meeting the job requirements.
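As a sketch of the screening in step S44, a regular expression can be built from a required phrase and applied to the target information extracted from each resume. The names `build_requirement_pattern` and `screen_by_requirement` are hypothetical, and a dictionary keyed by resume identifier stands in for the database.

```python
import re


def build_requirement_pattern(phrase: str) -> "re.Pattern":
    """Build a case-insensitive regex matching the phrase with flexible spacing."""
    words = [re.escape(word) for word in phrase.split()]
    return re.compile(r"\b" + r"\s+".join(words) + r"\b", re.IGNORECASE)


def screen_by_requirement(target_info: dict, phrase: str) -> dict:
    """Keep the resumes whose extracted target information contains the phrase.

    target_info maps a resume identifier to the text extracted from its
    complex information classes.
    """
    pattern = build_requirement_pattern(phrase)
    return {rid: text for rid, text in target_info.items() if pattern.search(text)}
```

Resumes that are not returned would be treated as not meeting the job requirement, matching the otherwise-branch described above.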

The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents, improvements and the like that fall within the spirit and principle of the present invention are intended to be included therein.

Claims

1. An intelligent resume evaluation method, comprising:

obtaining resume text data from a database one by one, and screening the resume text data, wherein the screening comprises the following steps: removing the resume text which does not meet the condition from the resume text data; acquiring the screened resume text data one by one, wherein the resume text which does not meet the condition is the resume text which does not adopt a semi-structured data form;

extracting data from the resume text data, including:

extracting data from the basic information class;

classifying the complex information class set;

extracting target information from a complex information class, wherein the data comprises: one or more attributes characterizing an applicant;

2. The intelligent resume evaluation method according to claim 1, wherein when the resume text is divided into a basic information class and a complex information class set, keywords are identified by adopting a matching strategy based on a regular expression to find division points; if no recognizable keywords exist, the first 5-10 lines of text of the resume text are taken as fuzzy basic information classes to extract data.

3. The intelligent resume evaluation method of claim 1, wherein extracting data from the basic information class comprises:

identifying the content of strong identification elements;

and determining the element type based on the contextual position of the element.

4. The intelligent resume evaluation method according to claim 1, wherein when the complex information class set is classified, a key character matching strategy based on regular expressions is first used to classify the complex information class set; if no matching keywords can be found, the complex information class set is classified by analyzing the format and font of the text, or by an automatic classification algorithm based on simple vectors.

5. The intelligent resume evaluation method according to claim 1, wherein when extracting target information from the complex information class, a regular expression-based key character matching strategy is adopted to extract the target information, and the target information is information used for representing professional skills and technical levels of an applicant in resume text data.