CN108874928B

CN108874928B - Resume data information analysis processing method, device, equipment and storage medium

Info

Publication number: CN108874928B
Application number: CN201810548844.8A
Authority: CN
Inventors: 张师琲
Original assignee: Ping An Technology Shenzhen Co Ltd
Current assignee: Ping An Technology Shenzhen Co Ltd
Priority date: 2018-05-31
Filing date: 2018-05-31
Publication date: 2024-02-02
Anticipated expiration: 2038-05-31
Also published as: WO2019227584A1; CN108874928A

Abstract

The invention relates to the technical field of computers and provides a resume data information analysis processing method, device, computer equipment and storage medium. The method comprises the following steps: receiving a resume file uploaded by a user; performing format conversion on the resume file according to a preset text format to obtain a resume text corresponding to the resume file; performing tag extraction on the resume text to obtain title tags; matching the title tags against preset keywords and determining each successfully matched title tag as an effective keyword; and, for each effective keyword, parsing the resume text according to the data parsing mode corresponding to that effective keyword to obtain the data information corresponding to it in the resume text. The invention enables complete extraction of the data information in the resume text and effectively improves the accuracy with which the resume text is parsed.

Description

Resume data information analysis processing method, device, equipment and storage medium

Technical Field

The present invention relates to the field of computer technologies, and in particular, to a method, an apparatus, a device, and a storage medium for analyzing and processing resume data information.

Background

In daily life, a resume is an important document with which its author looks for a job or presents his or her personal history. It contains information such as the author's basic details, work experience and educational experience. Because resume design styles and personal writing habits differ, the specific format of resume information varies from person to person. An enterprise that wants to acquire talent information therefore needs its administrators to screen and analyze the required resume information from a large number of resumes.

At present, resume data is usually analyzed manually: resume files are collected by hand, the resume texts in them are classified by hand, and the resume data information is entered into a database by hand. Because manual classification is subject to subjective factors, information is easily recorded twice or omitted, so the accuracy of resume data information analysis is low.

Disclosure of Invention

Accordingly, in order to solve the above-mentioned problems, it is necessary to provide a method, an apparatus, a device, and a storage medium for analyzing and processing resume data information, which can improve the accuracy of analyzing resume data information.

A resume data information analysis processing method comprises the following steps:

receiving a resume file uploaded by a user;

performing format conversion on the resume file according to a preset text format to obtain a resume text corresponding to the resume file;

performing tag extraction on the resume text to obtain title tags;

matching the title tags against preset keywords, and determining each successfully matched title tag as an effective keyword;

for each effective keyword, parsing the resume text according to the data parsing mode corresponding to that effective keyword to obtain the data information corresponding to each effective keyword in the resume text;

matching the effective keywords against the template labels in a preset standard resume template, importing the data information corresponding to each successfully matched effective keyword into the position corresponding to the template label, generating a standard resume report, and storing the standard resume report in a resume library.

A resume data information analysis processing device comprises:

a file receiving module, used for receiving the resume file uploaded by the user;

a file conversion module, used for performing format conversion on the resume file according to a preset text format to obtain a resume text corresponding to the resume file;

a tag extraction module, used for performing tag extraction on the resume text to obtain title tags;

a tag matching module, used for matching the title tags against preset keywords and determining each successfully matched title tag as an effective keyword;

a text parsing module, used for parsing the resume text, for each effective keyword, according to the data parsing mode corresponding to that effective keyword, to obtain the data information corresponding to each effective keyword in the resume text;

an information importing module, used for matching the effective keywords against the template labels in a preset standard resume template, importing the data information corresponding to each successfully matched effective keyword into the position corresponding to the template label, generating a standard resume report, and storing the standard resume report in a resume library.

A computer device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, the processor implementing the steps of the resume data information analysis processing method described above when executing the computer program.

A computer-readable storage medium storing a computer program which, when executed by a processor, implements the steps of the resume data information analysis processing method described above.

According to the resume data information analysis processing method, device, equipment and storage medium described above, the received resume file is converted according to a preset text format to obtain the corresponding resume text, which facilitates extraction of the data information in the resume text. Tag extraction is performed on the resume text to obtain title tags, the title tags are matched against preset keywords, and each successfully matched title tag is determined as an effective keyword. For each effective keyword, the resume text is then parsed according to the data parsing mode corresponding to that effective keyword to obtain the corresponding data information. This ensures that the data information of the resume text is extracted completely and parsed accurately, effectively improving the parsing accuracy of the resume text. In addition, the effective keywords are matched against the template labels in a preset standard resume template, the data information corresponding to each successfully matched effective keyword is imported into the position corresponding to the template label, and a standard resume report is generated and stored in a resume library. Resume texts that follow no uniform convention are thereby standardized, which makes them easier to manage and maintain.

Drawings

In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings that are needed in the description of the embodiments of the present invention will be briefly described below, it being obvious that the drawings in the following description are only some embodiments of the present invention, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.

FIG. 1 is a schematic diagram of an application environment of a resume data information analysis processing method according to an embodiment of the present invention;

FIG. 2 is a flowchart of a resume data information analysis processing method according to an embodiment of the present invention;

FIG. 3 is a flowchart of an implementation of step S30 in a resume data information analysis processing method according to an embodiment of the present invention;

FIG. 4 is a flowchart of an implementation of step S50 in a resume data information analysis processing method according to an embodiment of the present invention;

FIG. 5 is a flowchart of another implementation of step S50 in a resume data information analysis processing method according to an embodiment of the present invention;

FIG. 6 is a flowchart of parsing the basic time periods in the educational experience or work experience in a resume data information analysis processing method according to an embodiment of the present invention;

FIG. 7 is a flowchart of a resume download request processing method according to an embodiment of the present invention;

FIG. 8 is a schematic diagram of a resume data information analysis processing device according to an embodiment of the present invention;

FIG. 9 is a schematic diagram of a computer device according to an embodiment of the present invention.

Detailed Description

The following description of the embodiments of the present invention will be made clearly and fully with reference to the accompanying drawings, in which it is evident that the embodiments described are some, but not all embodiments of the invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.

Fig. 1 shows an application environment provided by an embodiment of the present invention. The application environment includes a server and a client connected through a network. The client collects resume files and sends them to the server; the client may be, but is not limited to, a personal computer, notebook computer, smartphone, tablet computer or portable wearable device. The server processes the resume files and may be implemented as an independent server or as a server cluster formed by a plurality of servers. The resume data information analysis processing method provided by the embodiment of the invention is applied to the server.

Referring to fig. 2, fig. 2 shows an implementation flow of the resume data information analysis processing method provided in the present embodiment. The details are as follows:

S10: Receive the resume file uploaded by the user.

In this embodiment, the file types of the resume file may include, but are not limited to, doc, pdf and html, and the language of the resume file may include, but is not limited to, Chinese, English and Japanese. The file types and languages listed here are examples only; other file types or languages are also possible, without limitation.

S20: Perform format conversion on the resume file according to a preset text format to obtain the resume text corresponding to the resume file.

It should be noted that, the preset text format may be xml, pdf, doc, or the like, but is not limited thereto, and may be specifically set according to the actual application requirement, which is not limited herein.

The text format conversion of the resume file may be performed with the parsing class library Tika or with other tools, without limitation.

Preferably, the format conversion is performed on the received resume file with the Tika tool. Tika is a library for detecting file types and extracting content from files in various formats; it uses a range of file parsers and file-type detection techniques to detect and extract data. The format conversion plug-in provided by Tika converts data in various file formats, so that irregular resume text formats can be converted into the unified preset text format.

Specifically, the format conversion plug-in of the Tika tool converts the resume files received in step S10 according to the preset text format, yielding resume texts in a uniform format. Irregular resume texts are thereby standardized, which facilitates the subsequent parsing and extraction of the data information in the resume text.

As an example, the format conversion plug-in of the Tika software converts resume files in various text formats according to the preset text format. The same content, such as the name "Li Lei", may be laid out differently in a doc resume text and in a pdf resume text, for instance with different spacing between the field and its value; the plug-in converts both into the single layout defined by the preset text format.

S30: Perform tag extraction on the resume text to obtain title tags.

In this embodiment, title tags are tags extracted and identified from the content of the resume text that represent personal characteristics. They describe items such as profession, academic qualification and work experience in the resume; a title tag may specifically be "name", "academic qualification", "educational experience", "work experience" or the like.

The label extraction mode may be feature extraction, but may also be other extraction modes, and is not limited herein.

Specifically, according to a preset tag dictionary, traversing the resume text, retrieving the text which is the same as the tag in the tag dictionary, and marking the text as a title tag.
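The dictionary lookup described above can be illustrated with a minimal sketch. The dictionary contents, the function name `find_title_tags` and the normalization (trimming whitespace and a trailing colon, lowercasing) are assumptions made for illustration and are not specified in this embodiment.

```python
# Hypothetical tag dictionary; in practice it is set per application.
TAG_DICTIONARY = {"name", "academic qualification",
                  "educational experience", "work experience"}


def find_title_tags(resume_text, tag_dictionary=TAG_DICTIONARY):
    """Return (line_number, tag) for every line whose text equals a dictionary tag."""
    found = []
    for line_number, line in enumerate(resume_text.splitlines()):
        # Assumed normalization: trim spaces and a trailing colon, ignore case.
        candidate = line.strip().rstrip(":：").strip().lower()
        if candidate in tag_dictionary:
            found.append((line_number, candidate))
    return found


sample = "Name\nLi Lei\nEducational experience\n2010.01-2011.04 Some University\n"
print(find_title_tags(sample))
```

Lines that merely contain a tag inside a longer sentence are not marked in this sketch, because only whole-line equality is checked.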

S40: Match the title tags against preset keywords, and determine each successfully matched title tag as an effective keyword.

In this embodiment, the preset keywords may be, but are not limited to, names, schools, professions, educational experiences, work experiences, etc., and they may be specifically set according to actual application requirements, which is not limited herein. The effective keywords are title labels in the resume text, which are matched with preset keywords.

The title tags may be matched against the keywords by condition matching or in other ways; the condition used in condition matching may be set according to the actual application requirement and is not limited here.

Preferably, the matching method adopted in this embodiment is condition matching. One form of condition matching determines, according to a preset word bank, whether the title tag and the keyword have the same word sense. The preset word bank defines a set of near-synonyms for each keyword; for example, the near-synonym set of "educational experience" includes "education experience", "education degree", etc., and the near-synonym set of "work experience" includes "work experience", "history", etc. If the title tag belongs to the near-synonym set of a keyword, the title tag is confirmed to have the same word sense as the keyword, that is, the matching succeeds. Another form of condition matching calculates the text similarity between the title tag and the keyword; if the text similarity is greater than or equal to a preset similarity threshold, the matching succeeds. The preset similarity threshold may be 80% or another value set according to the actual application requirement, and is not limited here.
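The two forms of condition matching can be sketched as follows. This is only an illustration: the word bank contents and the function name `match_keyword` are hypothetical, and the similarity measure is the ratio from Python's `difflib.SequenceMatcher`, chosen here as a stand-in because the embodiment does not name a specific text similarity measure. The 0.8 threshold corresponds to the 80% example above.

```python
from difflib import SequenceMatcher

# Hypothetical word bank: each preset keyword maps to its set of near-synonyms.
WORD_BANK = {
    "educational experience": {"educational experience", "education background"},
    "work experience": {"work experience", "work history"},
}


def match_keyword(title_tag, word_bank=WORD_BANK, threshold=0.8):
    """Return the preset keyword matched by title_tag, or None if nothing matches."""
    tag = title_tag.strip().lower()
    # Form 1: the tag belongs to the near-synonym set of a keyword.
    for keyword, synonyms in word_bank.items():
        if tag in synonyms:
            return keyword
    # Form 2: the text similarity reaches the preset threshold.
    best_keyword, best_ratio = None, 0.0
    for keyword in word_bank:
        ratio = SequenceMatcher(None, tag, keyword).ratio()
        if ratio > best_ratio:
            best_keyword, best_ratio = keyword, ratio
    return best_keyword if best_ratio >= threshold else None


print(match_keyword("Work History"))      # matched through the word bank
print(match_keyword("work experiences"))  # matched through text similarity
print(match_keyword("hobbies"))           # no match
```

Checking the word bank first keeps exact near-synonyms from depending on the similarity threshold; the order of the two checks is a design choice of this sketch.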

S50: For each effective keyword, parse the resume text according to the data parsing mode corresponding to that effective keyword to obtain the data information corresponding to each effective keyword in the resume text.

In the embodiment of the invention, the data information refers to specific resume text content corresponding to the effective keywords.

Different effective keywords may use the same or different data parsing modes. The data parsing modes may include, but are not limited to, the data division method, regular expressions and a score algorithm; they may be set according to the actual application requirement and are not specifically limited.

The data division method selects boundary marks in the text and, taking the boundary marks as separators, divides the text into the individual boundary marks and the text block corresponding to each boundary mark. The boundary marks are the effective keywords in the resume text, and the text block corresponding to each boundary mark is the data information corresponding to that effective keyword.

Specifically, the content of the resume text obtained in step S20 is divided and extracted according to the data parsing mode corresponding to each effective keyword, yielding the data information corresponding to each effective keyword in the resume text and ensuring that the content of the resume text is extracted completely.
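A minimal sketch of the data division method follows, assuming the resume text is available as a list of lines and that each effective keyword occupies a line of its own. The function name `split_by_keywords` and the sample data are hypothetical.

```python
def split_by_keywords(lines, effective_keywords):
    """Split resume lines into {keyword: text block}, using keywords as boundary marks."""
    blocks = {}
    current = None
    for line in lines:
        text = line.strip()
        if text in effective_keywords:
            current = text            # a boundary mark opens a new text block
            blocks[current] = []
        elif current is not None and text:
            blocks[current].append(text)
    return {keyword: "\n".join(content) for keyword, content in blocks.items()}


resume_lines = [
    "name",
    "Li Lei",
    "educational experience",
    "2010.01-2011.04 X University",
    "",
    "work experience",
    "2011.05-2015.06 Y Company",
]
keywords = {"name", "educational experience", "work experience"}
print(split_by_keywords(resume_lines, keywords))
```

Text that appears before the first boundary mark is ignored in this sketch; whether such text should be kept is left open here.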

S60: according to template labels in a preset standard resume template, matching the effective keywords with the template labels, importing data information corresponding to the successfully matched effective keywords into positions corresponding to the template labels, generating a standard resume report, and storing the standard resume report in a resume library.

In this embodiment, the preset standard resume template is set according to the actual application requirement, which is not limited herein. The template label may be, but is not limited to, name, academic, educational history, work history, etc., and may be specifically set according to practical application requirements, which is not limited herein.

Specifically, the template labels in the preset standard resume template are traversed to find the template label whose text is the same as an effective keyword; the effective keyword is then considered successfully matched to that template label. The data information corresponding to each successfully matched effective keyword is imported into the position corresponding to the template label, and the standard resume report is generated and stored in the resume library.

Further, the effective keywords may also be matched against the template labels by condition matching, in the same way as the title tags are matched against the keywords in step S40, which is not repeated here.
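The import into the standard template might look like the sketch below. The dictionary-based report, the in-memory list standing in for the resume library, and the name `build_standard_report` are all assumptions; exact text equality is used for the matching step.

```python
def build_standard_report(template_labels, parsed_data):
    """Fill each template label with the data of the effective keyword that equals it."""
    report = {}
    for label in template_labels:
        # Exact equality stands in for the matching step; unmatched labels stay empty.
        report[label] = parsed_data.get(label, "")
    return report


resume_library = []  # hypothetical in-memory stand-in for the resume library

template = ["name", "educational experience", "work experience"]
parsed = {
    "name": "Li Lei",
    "work experience": "2011.05-2015.06 Y Company",
    "hobbies": "reading",  # no template label matches this keyword
}
report = build_standard_report(template, parsed)
resume_library.append(report)
print(report)
```

Because the report is driven by the template labels, every stored report has the same fields in the same order, regardless of how the original resume was laid out.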

In this embodiment, the received resume file is converted according to a preset text format to obtain the corresponding resume text, which facilitates the subsequent extraction of data information from the resume text. Tag extraction is performed on the resume text to obtain title tags, the title tags are matched against preset keywords, and each successfully matched title tag is determined as an effective keyword, so that the position of the data information to be extracted can be located quickly from the effective keywords. For each effective keyword, the resume text is then parsed according to the corresponding data parsing mode to obtain the data information for that keyword. This further confirms the located data information, ensuring both the accuracy of the data information obtained and the completeness of its extraction from the resume text.

In one embodiment, as shown in fig. 3, in step S30, the extracting the tag from the resume text to obtain the title tag specifically includes the following steps:

S301: Acquire the text lines in the resume text.

In this embodiment, the text line is a word or a sentence in the resume text that is located in a single line.

S302: Extract features from the text line according to preset feature indices to obtain a feature vector.

In this embodiment, the preset feature indices serve as the feature extraction standard and may include, but are not limited to: the text line occupies a single line; the text length of the text line is smaller than a preset length; the text line contains no punctuation marks; the text line is in a preset tag dictionary; the font of the text line differs from the font that accounts for the largest proportion of the full text; and the font of the text line is that of a parent-level element in the tag dictionary. Other indices may also be used, set according to the actual application requirement, without limitation. The preset tag dictionary contains elements and tags. The tags include, but are not limited to, name, academic qualification, educational experience and work experience, and may be set according to the actual application requirement. The elements include, but are not limited to, parent-level elements, child-level elements and in-line elements; a parent-level element specifies the font size, height, width and so on. The font may be Songti (SimSun), the monospaced font Courier, a serif font, or another font, which is not limited here.

A feature vector is a vector whose components describe the attribute features of a text line.

Note that, the text line feature extraction may be performed by using an information labeling method, or may be performed by using other methods, which is not limited herein.

Preferably, the text line features are extracted by the information labeling method: each feature index that the text line satisfies is marked 1, and each feature index that it does not satisfy is marked 0.

For example, if a text line satisfies the feature indices "the text line occupies a single line", "the text length of the text line is smaller than a preset length", "the text line contains no punctuation marks" and "the text line is in a preset tag dictionary", each of these is marked 1; if it does not satisfy "the font of the text line differs from the font that accounts for the largest proportion of the full text" or "the font of the text line is that of a parent-level element in the tag dictionary", each of these is marked 0. The feature vector of the text line is then (1,1,1,1,0,0).

S303: if the feature vector meets the preset label condition, the text line is marked as a title label.

Specifically, it is determined whether the feature vector obtained in step S302 meets a preset tag condition, and if the feature vector meets the preset tag condition, the text line corresponding to the feature vector is identified as a title tag.

It should be noted that, the preset label condition may specifically be that the number of the components "1" in the feature vector is greater than or equal to the preset number, but is not limited thereto, and may specifically be set according to the actual application requirement, which is not limited herein.

Continuing with the description of the example of step S302, if the preset tag condition is that the number of components "1" in the feature vector is greater than or equal to 4, the number of components "1" in the feature vector of the text line is 4, and the preset tag condition is satisfied, the text line is identified as a title tag.
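Steps S301 to S303 can be sketched as below. The representation of a text line as a dictionary with `text` and `font` fields, the constants `MAX_TITLE_LENGTH` and `PARENT_FONTS`, and the dictionary contents are assumptions for illustration; the threshold of 4 follows the example above.

```python
import string
from collections import Counter

TAG_DICTIONARY = {"name", "educational experience", "work experience"}  # hypothetical
PARENT_FONTS = {"SimHei"}   # hypothetical fonts of parent-level elements
MAX_TITLE_LENGTH = 30       # hypothetical preset length
MIN_ONES = 4                # preset number of "1" components, as in the example
PUNCTUATION = set(string.punctuation) | set("，。；：！？、")


def feature_vector(line, dominant_font):
    """Mark each preset feature index as 1 (satisfied) or 0 (not satisfied)."""
    text = line["text"].strip()
    checks = [
        "\n" not in text,                            # occupies a single line
        len(text) < MAX_TITLE_LENGTH,                # shorter than the preset length
        not any(ch in PUNCTUATION for ch in text),   # contains no punctuation
        text.lower() in TAG_DICTIONARY,              # present in the tag dictionary
        line["font"] != dominant_font,               # font differs from the dominant font
        line["font"] in PARENT_FONTS,                # font of a parent-level element
    ]
    return tuple(int(check) for check in checks)


def is_title_tag(vector, min_ones=MIN_ONES):
    """Preset tag condition: at least min_ones components equal 1."""
    return sum(vector) >= min_ones


lines = [
    {"text": "Educational experience", "font": "SimSun"},
    {"text": "2010.01-2011.04, studied at a university.", "font": "SimSun"},
]
dominant_font = Counter(item["font"] for item in lines).most_common(1)[0][0]
for item in lines:
    vector = feature_vector(item, dominant_font)
    print(item["text"], vector, is_title_tag(vector))
```

The first sample line reproduces the vector (1,1,1,1,0,0) from the example and is identified as a title tag; the second line satisfies only the single-line index and is not.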

In this embodiment, feature extraction is performed on the text line in the obtained resume text according to a preset feature index, so that a feature vector corresponding to the text line can be obtained, accurate reading of the text line of the resume text can be realized, and subsequent confirmation of the title tag is facilitated.

In an embodiment, as shown in fig. 4, in step S50, for each effective keyword, according to a data parsing manner corresponding to the effective keyword, parsing a resume text to obtain data information corresponding to each effective keyword in the resume text specifically includes the following steps:

S501: Acquire the name data block corresponding to the name.

In this embodiment, the name data block refers to the data block corresponding to the effective keyword "name" in the resume text.

The name data block may be acquired by searching the effective keywords for the effective keyword "name" and then, following the order in which the effective keywords appear, taking the content between the effective keyword "name" and the next effective keyword as the name data block. The name data block may also be acquired in other ways, which are not limited here.

S502: Perform name data identification on the name data block according to a preset name regular expression, and take the identified name data as the data information corresponding to the name.

In this embodiment, the preset name regular expression may be a common name extraction regular expression used for identifying name data in the resume text. The common name extraction regular expression is:

^[\u4E00-\u9FA5]{2,5}(?:·[\u4E00-\u9FA5]){2,5}$

Here the symbol "^" marks the beginning of the match, that is, the "surname" part of the name, and the symbol "$" marks the end of the match, that is, the "given name" part. "[\u4E00-\u9FA5]" is the Unicode range covering Chinese characters, the brackets "[ ]" denote one character within that range, and "{2,5}" selects 2 to 5 such characters. "(?: )" groups part of the expression without capturing it, and, in regular expressions generally, "|" denotes a logical OR relationship.

It should be noted that a regular expression processes character strings in data: specific characters describe the rule by which characters occur in a string, and strings conforming to the rule are matched, identified, extracted or replaced. Regular expressions can thus be used to search, delete and replace strings quickly and accurately. Examples of such specific characters are "[\u4E00-\u9FA5]", "{2,5}" and "[1-9]\d{3}".

Specifically, for the name data block acquired in step S501, the specific characters of the name regular expression describe the rule that the characters of a name must follow. The character string in the name data block that conforms to this rule is identified and determined as the name data of the name data block.
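Name identification with a regular expression can be illustrated as follows. The pattern used here is a commonly seen variant in which the "·"-separated part is optional; that variant, the line-by-line scan and the function name `extract_name` are assumptions and may differ from the expression of this embodiment.

```python
import re

# Assumed variant: 2 to 5 Chinese characters, optionally followed by
# "·"-separated parts of 2 to 5 characters (as in transliterated names).
NAME_PATTERN = re.compile(r"^[\u4E00-\u9FA5]{2,5}(?:·[\u4E00-\u9FA5]{2,5})*$")


def extract_name(name_block):
    """Return the first line of the name data block that looks like a name, or None."""
    for line in name_block.splitlines():
        candidate = line.strip()
        if NAME_PATTERN.match(candidate):
            return candidate
    return None


print(extract_name("李雷"))
print(extract_name("Tel: 123\n韩梅梅\n"))
print(extract_name("Li Lei"))  # Latin-script names are outside the pattern
```

Because the pattern is anchored at both ends, a line that mixes a name with other text is rejected, which keeps phone numbers or addresses in the same block from being picked up.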

In this embodiment, name data identification is performed on the acquired name data block according to a preset name regular expression, and the identified name data is used as data information corresponding to the name, so that further confirmation and extraction of the data information corresponding to the name can be realized, pertinence to the extracted data information is achieved, and the integrity and accuracy of the obtained data information are guaranteed, so that the accuracy of analyzing the data information is improved.

In an embodiment, the effective keywords include educational experience or work experience, as shown in fig. 5, in step S50, for each effective keyword, according to a data parsing manner corresponding to the effective keyword, parsing the resume text, and obtaining data information corresponding to each effective keyword in the resume text specifically includes the following steps:

S503: Acquire the data block corresponding to the educational experience or the work experience.

In this embodiment, the data block corresponding to the educational experience or the work experience is the data block in the resume text that corresponds to the effective keyword "educational experience" or "work experience".

It should be noted that this data block may be acquired in the same way as the name data block in step S501, which is not repeated here.

S504: Perform score calculation on the data block according to a preset score algorithm to obtain the score value of the data block.

In this embodiment, the preset score algorithm may be set according to the actual application requirement and is used to calculate the score of the data block corresponding to the educational experience or the work experience. Specifically, a score may be assigned to each preset label in the resume text, where the preset labels include time, school, major and academic qualification.

Specifically, the preset score algorithm may assign scores to the preset labels of the educational experience or work experience, such as 2 points for time, 2 points for school, 1 point for major and 1 point for academic qualification. To score a data block, the data block is traversed and searched for the preset labels, and a score is marked for each label found: for example, "2 points" if "time" is found and "1 point" if "major" is found. After the traversal is complete, the marked scores are summed, and the sum is the score value of the data block.

S505: if the score value is larger than a preset score threshold value, determining the target data block corresponding to the score value as data information corresponding to the education resume or the work experience.

In this embodiment, the target data block is a data block with a score value greater than a preset score threshold, and the preset score threshold may be specifically set according to the actual application requirement, which is not limited herein.

For example, assuming that the preset score threshold is 4 points, taking time 2 points, school 2 points, professional 1 points and academic 1 points as examples, if a certain data block includes labels "time", "school" and "professional", the labels are marked with scores according to a preset score algorithm, the scores of the labels are respectively marked with "time" 2 points "," school "mark" 2 points "and" professional "mark" 1 points ", and the score value obtained by accumulating and summing the marks is 5 points, and the score value is greater than the preset score threshold, so that the data block can be determined as corresponding data information of the education resume or the work experience.
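The scoring in this example can be sketched as below. The scores and the threshold follow the example; how a label is recognized inside a data block is not specified here, so the keyword-based detectors in `LABEL_DETECTORS` and the function names are purely hypothetical.

```python
import re

# Scores from the example: time 2, school 2, major 1, academic qualification 1.
LABEL_SCORES = {"time": 2, "school": 2, "major": 1, "academic qualification": 1}

# Hypothetical detectors deciding whether a preset label is present in a data block.
LABEL_DETECTORS = {
    "time": re.compile(r"[1-9]\d{3}\.(0[1-9]|1[0-2])"),
    "school": re.compile(r"university|college|school", re.IGNORECASE),
    "major": re.compile(r"major|computer science", re.IGNORECASE),
    "academic qualification": re.compile(r"bachelor|master|doctor", re.IGNORECASE),
}


def score_block(block):
    """Sum the scores of all preset labels found in the data block."""
    return sum(
        score
        for label, score in LABEL_SCORES.items()
        if LABEL_DETECTORS[label].search(block)
    )


def is_target_block(block, threshold=4):
    """A data block is a target data block if its score exceeds the threshold."""
    return score_block(block) > threshold


block = "2010.09-2014.06 Example University, major: computer science"
print(score_block(block), is_target_block(block))
```

For the sample block, time, school and major are found, giving 2 + 2 + 1 = 5 points, which exceeds the threshold of 4.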

In this embodiment, according to a preset score algorithm, score calculation is performed on a data block corresponding to an obtained education resume or work experience to obtain a score value of the data block, and if the score value is greater than a preset score threshold value, a target data block corresponding to the score value is determined to be data information corresponding to the education resume or work experience, so that rapid confirmation and extraction of the data information corresponding to the education resume or work experience can be achieved, and the data information obtaining efficiency is improved.

In an embodiment, as shown in fig. 6, after step S505, the resume data information analysis processing method further includes the following steps:

S506: identifying a basic time period in the target data block according to a preset time regular expression.

In this embodiment, the preset time regular expression may be a commonly used time regular expression for identifying a basic time period in the resume text. The base time period includes a start time point and an end time point for representing a time from the start to the end of the educational resume or work experience.

The common time regular expression is as follows:

^[1-9]\d{3}.(0[1-9]|1[0-2])-[1-9]\d{3}.(0[1-9]|1[0-2])$

wherein "[1-9]\d{3}" represents the year and "(0[1-9]|1[0-2])" represents the month; the first "[1-9]\d{3}.(0[1-9]|1[0-2])" represents the start time point, the second "[1-9]\d{3}.(0[1-9]|1[0-2])$" represents the end time point, and the symbol "-" is the separator between the two time points, as in "2010.01-2011.04".

Specifically, for the data information corresponding to the education resume or the work experience obtained in step S505, the characters in the common time regular expression serve as rules describing how the characters of a basic time period appear in a character string. Character strings in the data information that conform to these rules are identified and determined as the basic time periods of the data information.
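As an illustration of step S506, the following Python sketch searches a data block for time periods of the form "2010.01-2011.04". It is not the patent's implementation: the names `PERIOD_RE` and `find_base_periods` and the sample text are hypothetical, the dot is escaped so that it matches only a literal ".", and the "^" and "$" anchors are dropped so that periods can be found anywhere inside a larger block of text.

```python
import re

# Sketch pattern: year.month-year.month, e.g. "2010.01-2011.04".
# Assumptions: the dot is escaped, and the ^/$ anchors are omitted so the
# pattern can match inside a longer data block.
PERIOD_RE = re.compile(
    r"([1-9]\d{3})\.(0[1-9]|1[0-2])-([1-9]\d{3})\.(0[1-9]|1[0-2])"
)


def find_base_periods(text):
    """Return a list of (start, end) pairs, each a (year, month) tuple."""
    periods = []
    for match in PERIOD_RE.finditer(text):
        y1, m1, y2, m2 = (int(group) for group in match.groups())
        periods.append(((y1, m1), (y2, m2)))
    return periods


# Hypothetical data block text.
sample = "2010.01-2014.04 Example University; 2011.04-2012.04 exchange program"
print(find_base_periods(sample))
```

Representing each time point as a (year, month) tuple makes later comparisons straightforward, since Python compares tuples element by element.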

S507: analyzing the time inclusion relation of the basic time period, and determining a main time period and a sub-time period in the basic time period;

In this embodiment, a basic time period may be a main time period or a sub-time period, where a main time period includes one or more sub-time periods. The time inclusion relation refers to the relationship between time periods and is analogous to the relationship between sets: one time period may include another, may not include it, or may be equal to it. If the time periods are described as sets, these cases are equivalent to the inclusion, empty (null), and equality relations between sets.

The main time period and the sub time period are confirmed in the basic time period, so that the data information corresponding to the education resume or the working experience is further confirmed and extracted, the integrity of the data information is guaranteed, and the distribution condition of the data information can be clearly reflected.

Specifically, to determine the main time periods and sub-time periods among the basic time periods according to the time inclusion relation, the order of the time points is first determined from the start time point and end time point of each basic time period. The time inclusion relation between the basic time periods is then obtained from this order, and the main time periods and sub-time periods are determined accordingly, wherein a basic time period that is included in another basic time period is a sub-time period.

For example, suppose the basic time periods include "2010.01-2014.04" and "2011.04-2012.04". Comparing their start time points and end time points shows that "2010.01" is earlier than "2011.04" and "2014.04" is later than "2012.04". The time period "2011.04-2012.04" is therefore included in the time period "2010.01-2014.04", so "2011.04-2012.04" is a sub-time period.
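The inclusion check in step S507 can be sketched in Python as follows. The function names `contains` and `classify_periods` are hypothetical, each time point is assumed to be a (year, month) tuple, and the sketch assumes that any basic time period not included in another one is treated as a main time period (two equal periods are both kept as main periods).

```python
def contains(outer, inner):
    """True if `inner` lies within `outer` and the two periods are not equal."""
    (o_start, o_end), (i_start, i_end) = outer, inner
    return o_start <= i_start and i_end <= o_end and outer != inner


def classify_periods(periods):
    """Split basic time periods into main periods and sub-periods."""
    main, sub = [], []
    for period in periods:
        # A period included in any other period is a sub-period.
        if any(contains(other, period) for other in periods):
            sub.append(period)
        else:
            main.append(period)
    return main, sub


# The two periods from the example above: 2010.01-2014.04 and 2011.04-2012.04.
periods = [((2010, 1), (2014, 4)), ((2011, 4), (2012, 4))]
main, sub = classify_periods(periods)
print(main)
print(sub)
```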

S508: splitting the target data block according to the main time period to obtain the phase data information corresponding to each main time period.

In this embodiment, splitting the target data block refers to dividing the whole of the target data block into each individual main time period and phase data information corresponding to the main time period.

Specifically, according to the main time period obtained in step S507, the target data block is split into the main time period and the phase data information corresponding to each main time period, so that confusion of the extracted data information can be avoided, the integrity of the data information is ensured, and the accuracy of analyzing the data information is improved.
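The text does not specify how the target data block is split, so the following Python sketch shows only one possible approach. It assumes that each main time period string marks the beginning of its phase and that all text up to the next main time period (including any sub-time periods) belongs to that phase. The names `PERIOD_RE` and `split_by_main_periods` and the sample data are hypothetical.

```python
import re

# Same year.month-year.month shape as the time regular expression,
# with the dot escaped and without anchors (assumptions of this sketch).
PERIOD_RE = re.compile(
    r"[1-9]\d{3}\.(?:0[1-9]|1[0-2])-[1-9]\d{3}\.(?:0[1-9]|1[0-2])"
)


def split_by_main_periods(block, main_periods):
    """Map each main period string to the text that follows it,
    up to the start of the next main period."""
    starts = [m for m in PERIOD_RE.finditer(block) if m.group(0) in main_periods]
    phases = {}
    for i, match in enumerate(starts):
        end = starts[i + 1].start() if i + 1 < len(starts) else len(block)
        phases[match.group(0)] = block[match.end():end].strip()
    return phases


# Hypothetical target data block with two main periods and one sub-period.
block = (
    "2006.09-2010.06 School A, major X\n"
    "2008.03-2008.09 exchange at School B\n"
    "2010.09-2013.06 School C, major Y"
)
main_periods = {"2006.09-2010.06", "2010.09-2013.06"}
phases = split_by_main_periods(block, main_periods)
for period, info in phases.items():
    print(period, "->", repr(info))
```

Keeping the sub-time period inside the phase of its enclosing main time period is a design choice of this sketch, made so that no text of the target data block is lost during splitting.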

In this embodiment, according to a preset time regular expression, basic time periods in a target data block are identified, the identified basic time periods are ordered, meanwhile, according to a time containing relationship, a main time period and a sub time period in the basic time period are determined, the target data block is split according to the main time period, so that stage data information corresponding to each main time period can be obtained, further confirmation and extraction of data information corresponding to educational resume or work experience can be realized, the integrity of the data information is guaranteed, the distribution situation of the data information can be clearly reflected, the confusion of the extracted data information is avoided, and the accuracy of analyzing the data information is improved.

In an embodiment, as shown in fig. 7, after step S60, the resume data information analysis processing method further includes the following steps:

S70: if a resume information query request sent by a user is received, acquiring query condition information in the resume information query request, wherein the query condition information comprises query condition items and query condition values;

In this embodiment, the query condition information may include one or more query condition items and the query condition value corresponding to each query condition item. The query condition items are used for matching template tags, so that resume information can be queried. For example, the query condition item may be "school", with corresponding query condition values such as "junior middle school", "senior middle school", or "university"; the query condition values may also be education levels such as "junior college", "undergraduate", or "graduate student"; or the query condition item may be "work experience", with query condition values such as "state-owned enterprise" or "foreign enterprise". Other query condition items and query condition values may also be used, which is not limited herein.

S80: matching the query condition item with the template tags in the standard resume report to obtain target data information corresponding to the successfully matched template tag.

In this embodiment, the target data information is the data information corresponding to the template tag in the standard resume report that matches the query condition item.

Specifically, the query condition item obtained in step S70 is matched with the template tags in the standard resume report. This may be done by traversing the standard resume report, searching for a template tag identical to the query condition item, and acquiring the data information corresponding to that template tag.
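The traversal in step S80 can be sketched in Python as follows. The representation of a standard resume report as a dictionary from template tag to data information is an assumption made for illustration, as are the names `find_target_data` and `match_reports` and the sample reports.

```python
def find_target_data(report, condition_item):
    """Traverse one report and return the data under the template tag
    identical to the query condition item, or None if no tag matches."""
    for template_tag, data in report.items():
        if template_tag == condition_item:
            return data
    return None


def match_reports(reports, condition_item):
    """Collect (report id, target data) for every report with a matching tag."""
    matches = []
    for report_id, report in reports.items():
        data = find_target_data(report, condition_item)
        if data is not None:
            matches.append((report_id, data))
    return matches


# Hypothetical resume library: report id -> {template tag: data information}.
reports = {
    "report-001": {"name": "Applicant A", "school": "Example University"},
    "report-002": {"name": "Applicant B", "work experience": "Example Company"},
}
print(match_reports(reports, "school"))
```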

S90: comparing the similarity between the target data information and the query condition value to obtain the standard resume report in which the target data information meeting a preset similarity condition is located.

In this embodiment, the preset similarity condition is used as a standard for extracting a standard resume report where the target data information is located, and may specifically be set according to the actual application requirement, which is not limited herein.

For example, the preset similarity condition may be that the text similarity between the target data information and the query condition value is greater than or equal to a preset similarity threshold.

The preset similarity condition may be that the query condition value is included in the target data information, that is, the similarity comparison process is: traversing the target data information, and if the query condition value exists in the target data information, confirming that the target data information meets the preset similarity condition.

The preset similarity condition may also be that the number of times the query condition value occurs in the target data information is greater than or equal to a preset similarity threshold. In this case, the similarity comparison process is as follows: word segmentation is performed on the query condition value and on the text of the target data information, respectively, to obtain a query condition value vocabulary unit and a target data information vocabulary unit. For each word in the query condition value vocabulary unit, its number of occurrences in the target data information vocabulary unit is recorded: "1" if it appears once, "2" if it appears twice, "N" if it appears N times, and "0" if it does not appear. These counts are summed to obtain the total number of times the words in the query condition value vocabulary unit appear in the target data information vocabulary unit. If the total is greater than or equal to the similarity threshold, the target data information is confirmed to meet the preset similarity condition. An example is as follows:

Assume that the preset similarity threshold is 2 and the query condition value is "Beijing University". Word segmentation of the query condition value yields the query condition value vocabulary unit unit1, and word segmentation of the target data information yields the target data information vocabulary unit unit2. The number of times the words in unit1 appear in unit2 is 4, which is greater than the preset similarity threshold of 2, so the target data information is confirmed to meet the preset similarity condition.

It should be noted that the word segmentation process may use a tool with a word segmentation function, such as a word segmentation plug-in of the Solr search engine, or may use other tools, which is not limited herein.
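The occurrence-count comparison described above can be sketched in Python as follows. The `segment` function is only a stand-in that lowercases and splits on whitespace; it is not the Solr word segmentation plug-in, and a real segmenter would be needed for Chinese text. The names `occurrence_total` and `meets_similarity` and the sample strings are hypothetical, and the sketch assumes that a word repeated in the query condition value is counted only once.

```python
from collections import Counter


def segment(text):
    """Stand-in word segmentation: lowercase and split on whitespace."""
    return text.lower().split()


def occurrence_total(condition_value, target_data):
    """Total number of times the query words occur among the target data words."""
    target_counts = Counter(segment(target_data))
    # A Counter returns 0 for words that do not appear at all.
    return sum(target_counts[word] for word in set(segment(condition_value)))


def meets_similarity(condition_value, target_data, threshold=2):
    """True if the occurrence total reaches the preset similarity threshold."""
    return occurrence_total(condition_value, target_data) >= threshold


# Hypothetical query condition value and target data information.
value = "Beijing University"
target = "graduated from Beijing University in Beijing"
print(occurrence_total(value, target))
print(meets_similarity(value, target))
```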

Specifically, the target data information obtained in the step S80 is compared with the query condition value obtained in the step S70 in terms of similarity according to the preset similarity condition, and if the comparison result meets the preset similarity condition, the standard resume report in which the target data information is located is obtained.

In this embodiment, the query condition item in the resume information query request sent by the user is matched with the template tags contained in the standard resume report to obtain the target data information corresponding to the successfully matched template tag. The target data information is then compared with the query condition value for similarity to obtain the standard resume report in which the target data information meeting the preset similarity condition is located. In this way, the required resume information can be screened out quickly and accurately, query results can be obtained more accurately through the standard resume report, unified standardization of resume reports is achieved, and viewing and management of resume information are facilitated.

It should be understood that the sequence numbers of the steps in the foregoing embodiments do not imply an order of execution. The execution order of each process should be determined by its function and internal logic, and the sequence numbers should not limit the implementation process of the embodiments of the present invention in any way.

In an embodiment, a resume data information analysis processing device is provided, which corresponds one-to-one to the resume data information analysis processing method in the above embodiment. As shown in fig. 8, the resume data information analysis processing device includes a file receiving module 801, a file converting module 802, a tag extracting module 803, a tag matching module 804, a text analyzing module 805, and an information importing module 806. The functional modules are described in detail as follows:

a file receiving module 801, configured to receive a resume file uploaded by a user;

the file conversion module 802 is configured to perform format conversion on the resume file according to a preset text format, so as to obtain a resume text corresponding to the resume file;

the tag extraction module 803 is configured to perform tag extraction on the resume text to obtain a title tag;

the tag matching module 804 is configured to match the title tag with a keyword according to a preset keyword, and determine the successfully matched title tag as a valid keyword;

the text parsing module 805 is configured to parse the resume text according to the data parsing manner corresponding to each effective keyword, so as to obtain data information corresponding to each effective keyword in the resume text;

The information importing module 806 is configured to match the valid keyword with the template tag according to the template tag in the preset standard resume template, import the data information corresponding to the valid keyword that is successfully matched into the position corresponding to the template tag, generate a standard resume report, and store the standard resume report in the resume library.

Further, the tag extraction module 803 includes:

a text acquisition unit 8031 for acquiring text lines in the resume text;

the feature extraction unit 8032 is configured to perform feature extraction on the text line according to a preset feature index, so as to obtain a feature vector;

the tag determining unit 8033 is configured to identify the text line as a title tag if the feature vector satisfies a preset tag condition.

Further, the text parsing module 805 includes:

a name obtaining unit 8051, configured to obtain a name data block corresponding to a name;

the name recognition unit 8052 is configured to perform name data recognition on the name data block according to a preset name regular expression, and use the recognized name data as data information corresponding to the name.

Further, the text parsing module 805 further includes:

a data acquisition unit 8053, configured to acquire a data block corresponding to an educational resume or a work experience;

The score calculating unit 8054 is configured to perform score calculation on the data block according to a preset score algorithm, so as to obtain a score value of the data block;

the data determining unit 8055 is configured to determine, if the score value is greater than a preset score threshold, a target data block corresponding to the score value as data information corresponding to an education resume or a work experience.

Further, the resume data information analysis processing device further includes:

a time identifying unit 8056, configured to identify a basic time period in the target data block according to a preset time regular expression;

a time determination unit 8057 for analyzing a time inclusion relation of the base time period, and determining a main time period and a sub time period in the base time period;

the data splitting unit 8058 is configured to split the target data block according to the main time period, so as to obtain phase data information corresponding to each main time period.

a request receiving module 807, configured to obtain query condition information in a resume information query request if a resume information query request sent by a user is received, where the query condition information includes a query condition item and a query condition value;

The condition matching module 808 is configured to match the query condition item with a template tag in the standard resume report, and obtain target data information corresponding to the successfully matched template tag;

the report obtaining module 809 is configured to compare the similarity between the target data information and the query condition value, and obtain a standard resume report where the target data information meeting the preset similarity condition is located.

For specific limitations of the resume data information analysis processing device, reference may be made to the limitations of the resume data information analysis processing method above, which are not repeated here. The modules in the resume data information analysis processing device may be implemented in whole or in part by software, hardware, or a combination thereof. The above modules may be embedded in or independent of a processor in the computer device in the form of hardware, or may be stored in a memory of the computer device in the form of software, so that the processor can call them and execute the operations corresponding to the above modules.

In one embodiment, a computer device is provided, which may be a server, and the internal structure of which may be as shown in fig. 9. The computer device includes a processor, a memory, a network interface, and a database connected by a system bus. Wherein the processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, computer programs, and a database. The internal memory provides an environment for the operation of the operating system and computer programs in the non-volatile storage media. The database of the computer device is used for storing resume information. The network interface of the computer device is used for communicating with an external terminal through a network connection. The computer program is executed by a processor to realize a resume data information analysis processing method.

In one embodiment, a computer device is provided, including a memory, a processor, and a computer program stored in the memory and capable of running on the processor, where the processor executes the computer program to implement the steps of the method for analyzing and processing resume data information in the foregoing embodiment, such as steps S10 to S60 shown in fig. 2. Alternatively, the processor may implement the functions of each module/unit of the resume data information analysis processing apparatus in the above embodiment when executing the computer program, for example, the functions of the modules 801 to 806 shown in fig. 8. In order to avoid repetition, a description thereof is omitted.

In one embodiment, a computer readable storage medium is provided, on which a computer program is stored, where the computer program when executed by a processor implements the resume data information analysis processing method in the above method embodiment, or where the computer program when executed by a processor implements the functions of each module/unit in the resume data information analysis processing device in the above device embodiment. In order to avoid repetition, a description thereof is omitted.

Those skilled in the art will appreciate that all or part of the above described methods may be implemented by a computer program stored on a non-transitory computer readable storage medium, and that the program, when executed, may comprise the steps of the method embodiments described above. Any reference to memory, storage, database, or other medium used in the various embodiments provided herein may include non-volatile and/or volatile memory. The non-volatile memory can include Read Only Memory (ROM), Programmable ROM (PROM), Electrically Programmable ROM (EPROM), Electrically Erasable Programmable ROM (EEPROM), or flash memory. Volatile memory can include Random Access Memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms such as Static RAM (SRAM), Dynamic RAM (DRAM), Synchronous DRAM (SDRAM), Double Data Rate SDRAM (DDR SDRAM), Enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus Direct RAM (RDRAM), Direct Rambus Dynamic RAM (DRDRAM), and Rambus Dynamic RAM (RDRAM), among others.

It will be apparent to those skilled in the art that, for convenience and brevity of description, only the above-described division of the functional units and modules is illustrated, and in practical application, the above-described functional distribution may be performed by different functional units and modules according to needs, i.e. the internal structure of the apparatus is divided into different functional units or modules to perform all or part of the above-described functions.

The above embodiments are only for illustrating the technical solution of the present invention, and not for limiting the same; while the invention has been described in detail with reference to the foregoing embodiments, it will be appreciated by those skilled in the art that variations may be made in the techniques described in the foregoing embodiments, or equivalents may be substituted for elements thereof; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present invention, and are intended to be included in the scope of the present invention.

Claims

1. A resume data information analysis processing method, characterized by comprising the following steps:

receiving a resume file uploaded by a user;

performing format conversion on the resume file according to a preset text format to obtain a resume text corresponding to the resume file; using a format conversion plug-in in the Tika tool to perform format conversion on the received resume file to obtain a resume text with a uniform format;

extracting the label from the resume text to obtain a title label;

according to a template label in a preset standard resume template, matching the effective keywords with the template label, importing the data information corresponding to the successfully matched effective keywords into a position corresponding to the template label, generating a standard resume report and storing the standard resume report in a resume library;

if a resume information query request sent by the user is received, acquiring query condition information in the resume information query request, wherein the query condition information comprises query condition items and query condition values;

Matching the query condition item with the template label in the standard resume report to obtain target data information corresponding to the template label which is successfully matched;

performing similarity comparison on the target data information and the query condition value to obtain a standard resume report in which the target data information meeting a preset similarity condition is located; the similarity comparison process comprises the following steps: word segmentation processing is respectively carried out on the query condition value and the text of the target data information to obtain a query condition value vocabulary unit and a target data information vocabulary unit; acquiring the total number of times of the word in the query condition value vocabulary unit in the target data information vocabulary unit, and if the total number of times is larger than or equal to a similarity threshold value, confirming that the target data information meets a preset similarity condition;

the step of extracting the tags of the resume text to obtain title tags comprises the following steps:

acquiring a text line in the resume text;

extracting the characteristics of the text line according to a preset characteristic index to obtain a characteristic vector;

and if the feature vector meets the preset label condition, marking the text line as the title label.

2. The method for analyzing and processing resume data information according to claim 1, wherein the effective keywords include names, and the analyzing the resume text according to the data analysis mode corresponding to the effective keywords for each effective keyword, and obtaining the data information corresponding to each effective keyword in the resume text includes:

acquiring a name data block corresponding to the name;

and carrying out name data identification on the name data block according to a preset name regular expression, and taking the identified name data as data information corresponding to the name.

3. The method for analyzing and processing resume data information according to claim 1, wherein the effective keywords comprise educational experiences or work experiences, and the analyzing the resume text according to the data analysis mode corresponding to the effective keywords for each effective keyword, and the obtaining the data information corresponding to each effective keyword in the resume text comprises:

acquiring a data block corresponding to the educational experience or the work experience;

according to a preset score algorithm, performing score calculation on the data block to obtain a score value of the data block;

And if the score value is larger than a preset score threshold value, determining the target data block corresponding to the score value as the data information corresponding to the educational experience or the work experience.

4. The resume data information analysis processing method according to claim 3, wherein the data information includes a main time period and phase data information corresponding to each main time period, and the resume data information analysis processing method further includes, after determining the target data block corresponding to the score value as the data information corresponding to the educational history or the work history if the score value is greater than a preset score threshold value:

identifying a basic time period in the target data block according to a preset time regular expression;

analyzing the time inclusion relation of the basic time period, and determining a main time period and a sub time period in the basic time period;

splitting the target data block according to the main time period to obtain phase data information corresponding to each main time period.

5. A resume data information analysis processing device, characterized by comprising:

the file conversion module is used for performing format conversion on the resume file according to a preset text format to obtain a resume text corresponding to the resume file; using a format conversion plug-in in the Tika tool to perform format conversion on the received resume file to obtain a resume text with a uniform format;

the information importing module is used for matching the effective keywords with the template labels according to template labels in a preset standard resume template, importing the data information corresponding to the effective keywords which are successfully matched into positions corresponding to the template labels, generating a standard resume report and storing the standard resume report in a resume library;

The request receiving module is used for acquiring query condition information in the resume information query request if the resume information query request sent by the user is received, wherein the query condition information comprises a query condition item and a query condition value;

the condition matching module is used for matching the query condition item with the template label in the standard resume report to obtain target data information corresponding to the successfully matched template label;

the report acquisition module is used for comparing the similarity between the target data information and the query condition value to acquire a standard resume report in which the target data information meeting the preset similarity condition is located; the similarity comparison process comprises the following steps: word segmentation processing is respectively carried out on the query condition value and the text of the target data information to obtain a query condition value vocabulary unit and a target data information vocabulary unit; acquiring the total number of times of the word in the query condition value vocabulary unit in the target data information vocabulary unit, and if the total number of times is larger than or equal to a similarity threshold value, confirming that the target data information meets a preset similarity condition;

the label extraction module comprises:

the text acquisition unit is used for acquiring text lines in the resume text;

The feature extraction unit is used for extracting the features of the text line according to a preset feature index to obtain a feature vector;

and the label determining unit is used for marking the text line as a title label if the feature vector meets the preset label condition.

6. A computer device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, characterized in that the processor implements the steps of the resume data information parsing processing method according to any of claims 1 to 4 when the computer program is executed.

7. A computer-readable storage medium storing a computer program, wherein the computer program when executed by a processor implements the steps of the resume data information analysis processing method according to any one of claims 1 to 4.