CN112100492A - Batch delivery method and system for resumes of different versions - Google Patents


Info

Publication number
CN112100492A
CN112100492A (application CN202010954388.4A)
Authority
CN
China
Prior art keywords
correlation vector
words
word
position information
calculating
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010954388.4A
Other languages
Chinese (zh)
Inventor
吴晓军
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hebei Jilian Human Resources Service Group Co ltd
Original Assignee
Hebei Jilian Human Resources Service Group Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hebei Jilian Human Resources Service Group Co ltd filed Critical Hebei Jilian Human Resources Service Group Co ltd
Priority to CN202010954388.4A
Publication of CN112100492A
Legal status: Pending

Classifications

    • G06F 16/9535: Physics; computing; electric digital data processing; information retrieval; retrieval from the web; querying, e.g. by the use of web search engines; search customisation based on user profiles and personalisation
    • G06F 18/23213: Pattern recognition; clustering techniques; non-hierarchical techniques using statistics or function optimisation with a fixed number of clusters, e.g. k-means clustering
    • G06F 18/24: Pattern recognition; classification techniques
    • G06F 40/216: Handling natural language data; natural language analysis; parsing using statistical methods
    • G06F 40/289: Handling natural language data; recognition of textual entities; phrasal analysis, e.g. finite state techniques or chunking
    • G06N 20/00: Machine learning
    • G06Q 10/1053: Administration; office automation; human resources; employment or hiring


Abstract

The present disclosure provides a batch delivery method and system for resumes of different versions, including: acquiring position information downloaded from a plurality of sites to form a local position database; generating topics of the local position information from the position information in the local position database; calculating a first correlation vector between each piece of position information in the local position database and the generated topics; calculating a second correlation vector between the user-selected version of the resume and the generated topics; calculating the similarity between the first correlation vector and the second correlation vector; and delivering the user-selected version of the resume to one or more associated positions in descending order of similarity.

Description

Batch delivery method and system for resumes of different versions
Technical Field
The present disclosure relates to the field of information technologies, and in particular, to a batch delivery method and system for resumes of different versions, an electronic device, and a computer-readable storage medium.
Background
In existing websites that provide Internet recruitment services, the conventional flow is that a recruiter publishes a position to be filled, and job seekers interested in the position deliver their resumes to it. Some recruitment websites can automatically match the relevance between a job seeker and a position and push the position to job seekers with high relevance, improving the recruitment effect.
However, delivering resumes is troublesome: a job seeker must not only visit each large recruitment platform, such as Zhaopin (Zhilian) and BOSS Zhipin, but may also have to visit each large company's own official website, search for and screen positions that meet his or her needs, and then deliver resumes to them one by one. If the job seeker prepares several versions of the resume and delivers the corresponding version in a targeted manner for different positions, the process is very cumbersome, and the wrong resume may be delivered carelessly.
Therefore, a one-stop method is urgently needed that manages resumes of different versions, automatically matches suitable positions, and delivers resumes in batches, automatically realizing the functions of selecting positions, storing them in a database, managing resumes of different versions, matching resumes to positions, and one-click resume delivery.
Disclosure of Invention
In view of this, an object of the embodiments of the present disclosure is to provide a batch delivery method and system for resumes of different versions, which generate topics for resumes and positions and calculate their similarity with an LDA-based machine learning algorithm over those topics, thereby achieving efficient and accurate matching between resumes and positions and realizing automatic batch delivery of resumes of different versions.
According to a first aspect of the present disclosure, there is provided a batch delivery method of resumes of different versions, including:
acquiring position information downloaded from a plurality of sites to form a local position database;
generating topics of the local position information from the position information in the local position database;
calculating a first correlation vector between each piece of position information in the local position database and the generated topics;
calculating a second correlation vector between the user-selected version of the resume and the generated topics;
calculating the similarity between the first correlation vector and the second correlation vector;
and delivering the user-selected version of the resume to one or more associated positions in descending order of similarity.
In one possible embodiment, the method for generating the topics of the local position information comprises:
segmenting all position information in the local position database according to the existing dictionary, including sentence breaking, word segmentation, and stop-word removal, to obtain first segmented words;
extracting 2-grams and 3-grams from the obtained first segmented words, calculating a mutual information value for each 2-gram and 3-gram, sorting the 2-grams and 3-grams in descending order of mutual information value, and selecting the top-ranked 2-grams and 3-grams to update the first segmented words and the existing dictionary;
calculating the information entropy of the characters adjacent to the left and right of each segmented word, and merging first segmented words based on the information entropy to further update the first segmented words and the existing dictionary;
filtering the second segmented words, obtained after updating the first segmented words, by the TF-IDF method to obtain third segmented words;
classifying words according to the sites' position categories, counting the probability of each word appearing in the local position information, and filtering the third segmented words according to the probability to obtain fourth segmented words;
and converting the fourth segmented words into word vectors and clustering the word vectors to obtain a plurality of word clusters serving as the topics of the local position information.
In one possible embodiment, calculating the first correlation vector or the second correlation vector comprises: calculating the first correlation vector or the second correlation vector with a machine learning model based on an LDA topic model.
In one possible embodiment, calculating the similarity between the first correlation vector and the second correlation vector comprises: calculating the cosine distance, Euclidean distance, or Manhattan distance between the first correlation vector and the second correlation vector.
In one possible embodiment, after delivering resumes to the associated positions in descending order of similarity, the method further comprises: automatically downloading the position information from the preset sites again at a preset time interval to obtain updated local position information.
In one possible embodiment, after obtaining the updated local position information, the method further comprises: automatically reminding the user of newly found positions whose similarity meets a preset value, and automatically delivering the resume of a preset version to those positions.
In one possible embodiment, the training data of the machine learning model based on the LDA topic model is obtained by intersecting the words obtained by segmenting the updated local position information with the existing dictionary and the words included in the topics.
According to a second aspect of the present disclosure, there is provided a batch delivery system for resumes of different versions, comprising:
a position acquisition unit, configured to acquire position information downloaded from a plurality of sites and form a local position database;
a topic generating unit, configured to generate topics of the local position information from the position information in the local position database;
a first correlation vector unit, configured to calculate a first correlation vector between each local position and the generated topics;
a second correlation vector unit, configured to calculate a second correlation vector between the user-selected version of the resume and the generated topics;
a similarity calculation unit, configured to calculate the similarity between the first correlation vector and the second correlation vector;
and a resume delivery unit, configured to deliver the user-selected version of the resume to one or more associated positions in descending order of similarity.
According to a third aspect of the present disclosure, there is provided an electronic device comprising: a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the method according to the first aspect when executing the program.
According to a fourth aspect of the present disclosure, there is provided a computer readable storage medium having stored thereon executable instructions which, when executed by a processor, cause the processor to perform the method of the first aspect.
Drawings
In order to more clearly illustrate the embodiments of the present application and the technical solutions of the prior art, the drawings needed in the embodiments are briefly described below. The drawings in the following description are only some embodiments of the present application; those skilled in the art can obtain other drawings from them without creative effort. The foregoing and other objects, features, and advantages of the application will be apparent from the accompanying drawings. Like reference numerals refer to like parts throughout the drawings. The drawings are not necessarily drawn to scale; emphasis is instead placed on illustrating the subject matter of the present application.
Fig. 1 illustrates a schematic view of a typical search and presentation interface for job information for a recruitment platform, according to an embodiment of the present disclosure.
Fig. 2 is a diagram illustrating exemplary job information downloaded from a website according to an embodiment of the present disclosure.
FIG. 3 shows a schematic diagram of a typical method of batch delivery of different versions of resumes according to an embodiment of the present disclosure.
FIG. 4 is a diagram illustrating an exemplary method of building training data for a machine learning model according to an embodiment of the present disclosure.
FIG. 5 shows a schematic diagram of a system for batch delivery of exemplary different versions of resumes according to an embodiment of the present disclosure.
Fig. 6 shows a schematic structural diagram of an electronic device for implementing an embodiment of the present disclosure.
Detailed Description
Hereinafter, embodiments of the present disclosure will be described with reference to the accompanying drawings. It should be understood that the description is illustrative only and is not intended to limit the scope of the present disclosure. Moreover, in the following description, descriptions of well-known structures and techniques are omitted so as to not unnecessarily obscure the concepts of the present disclosure.
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to limit the disclosure. The singular forms "a", "an", and "the" used herein are intended to include the plural forms as well, unless the context clearly indicates otherwise. Furthermore, the terms "comprises", "comprising", and the like, as used herein, specify the presence of the stated features, steps, operations, and/or components, but do not preclude the presence or addition of one or more other features, steps, operations, or components.
All terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art unless otherwise defined. It is noted that the terms used herein should be interpreted as having a meaning that is consistent with the context of this specification and should not be interpreted in an idealized or overly formal sense.
In existing websites that provide Internet recruitment services, the conventional flow is that a recruiter publishes a position to be filled, and job seekers interested in the position deliver their resumes to it. Some recruitment websites can automatically match the relevance between a job seeker and a position and push the position to job seekers with high relevance, improving the recruitment effect.
However, delivering resumes is troublesome: a job seeker must not only visit each large recruitment platform, such as Zhaopin (Zhilian) and BOSS Zhipin, but may also have to visit each large company's own official website, search for and screen positions that meet his or her needs, and then deliver resumes to them one by one. If the job seeker prepares several versions of the resume and delivers the corresponding version in a targeted manner for different positions, the process is very cumbersome, and the wrong resume may be delivered carelessly.
In view of the above, the present disclosure provides a one-stop method for resumes of different versions that automatically matches suitable positions and delivers resumes in batches, automatically realizing the functions of selecting positions, saving them in a database, managing resumes of different versions, matching resumes to positions, and one-click resume delivery.
The present disclosure is described in detail below with reference to the attached drawings.
Fig. 1 illustrates a schematic view of a typical search and presentation interface for job information for a recruitment platform, according to an embodiment of the present disclosure.
Taking the information of a certain large recruitment platform as an example: to establish a local position database, a user can log in to the recruitment platform with an account, search for positions of interest, and download them, or authorize the present disclosure to obtain them by automatic crawling.
The content required for establishing the local position database comprises the following:
Job name, for example: C/C++ engineer
Company name, for example: XXX Ltd
Posting time, for example: 19 hours ago
Description of responsibilities, for example: 1. responsible for image recognition algorithm development and optimization and for hardware driver development and debugging; 2. performing compilation, optimization, and API interface development of the underlying algorithm model according to the results of the algorithm engineers; 3. cooperating with hardware engineers on hardware interface driver development, debugging, and optimization; and other content such as salary range, work experience, educational background, etc., which the disclosure does not limit.
Fig. 2 is a diagram illustrating exemplary job information downloaded from a website according to an embodiment of the present disclosure. The job information may be saved to Excel, any suitable database table, or another database management system; the disclosure is not limited in this respect.
Similarly, positions can be obtained from 51job, BOSS Zhipin, Tencent Careers, Alibaba Careers, company official websites, and so on. In summary, the sources of job information may include all sites of interest; all positions downloaded or crawled locally are used to build the local position database.
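As an illustrative sketch only (the disclosure does not specify a storage engine), the local position database could be held in SQLite. The table name, field names, and sample row below are invented for illustration:

```python
import sqlite3

def build_job_db(path=":memory:"):
    """Create a minimal local position database with the fields named above."""
    conn = sqlite3.connect(path)
    conn.execute("""CREATE TABLE IF NOT EXISTS jobs (
        id INTEGER PRIMARY KEY,
        site TEXT, job_name TEXT, company TEXT,
        posted TEXT, description TEXT, url TEXT)""")
    return conn

def save_jobs(conn, rows):
    """rows: iterable of (site, job_name, company, posted, description, url)."""
    conn.executemany(
        "INSERT INTO jobs (site, job_name, company, posted, description, url) "
        "VALUES (?, ?, ?, ?, ?, ?)", rows)
    conn.commit()

conn = build_job_db()
save_jobs(conn, [
    ("example-site", "C/C++ engineer", "XXX Ltd", "19 hours ago",
     "image recognition algorithm development", "https://example.com/job/1"),
])
count = conn.execute("SELECT COUNT(*) FROM jobs").fetchone()[0]
```

An in-memory database is used here for illustration; a real system would persist to a file so the weekly refresh described later can diff against it.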
FIG. 3 shows a schematic diagram of a typical method of batch delivery of different versions of resumes according to an embodiment of the present disclosure.
In step 301: position information downloaded from the plurality of sites is acquired to form a local position database.
Topics of the local position information are then generated from the position information in the local position database. The method for generating the topics of the local position information comprises the following steps:
Step 302: all position information in the local position database is segmented according to the existing dictionary, including sentence breaking, word segmentation, and stop-word removal, to obtain first segmented words.
Step 303: 2-grams and 3-grams are extracted from the obtained first segmented words, a mutual information value is calculated for each 2-gram and 3-gram, the 2-grams and 3-grams are sorted in descending order of mutual information value, and the top-ranked ones are selected to update the first segmented words and the existing dictionary.
In natural language processing, the N-gram is a common language model; for Chinese, matching information between adjacent words in the context can be exploited to improve processing quality. The basic idea is to slide a window of size N over the text, forming a sequence of fragments of length N, each of which is called a gram. In this disclosure, a 1-gram is a single word obtained by segmentation. A 2-gram is two consecutive words: for example, connecting "algorithm" with "engineer" yields "algorithm engineer". A 3-gram is three consecutive words: for example, connecting "natural", "language", and "processing" yields "natural language processing".
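The sliding-window extraction described above can be written directly:

```python
def ngrams(words, n):
    """Slide a window of size n over the word list; each window is one gram."""
    return [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]

words = ["natural", "language", "processing", "algorithm", "engineer"]
bigrams = ngrams(words, 2)   # includes ("algorithm", "engineer")
trigrams = ngrams(words, 3)  # includes ("natural", "language", "processing")
```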
The formula for calculating the mutual information value is:

MI(X, Y) = log( P(X, Y) / ( P(X) · P(Y) ) )
The mutual information value represents the degree of interdependence between two variables. Binary mutual information measures the correlation between two events: the higher the mutual information value, the higher the correlation between X and Y and the more likely that X and Y form a phrase; conversely, the lower the mutual information value, the lower the correlation between X and Y and the more likely there is a phrase boundary between them. In the formula, X and Y are two adjacent words, and each P value is the corresponding probability of occurrence.
For example, in one corpus, "algorithm engineer" is the 2-gram formed by connecting "algorithm" with "engineer"; it occurs 3 times in total, while the total number of 2-grams is 252, so P(X, Y) in the above formula is 3/252. P(X) and P(Y) are obtained in the same way from the word counts.
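A sketch of the mutual information calculation over adjacent word pairs, with probabilities estimated from corpus counts as in the 3/252 example above (the toy corpus is invented):

```python
import math
from collections import Counter

def pmi_scores(words):
    """Pointwise mutual information for each adjacent word pair,
    with P(X,Y), P(X), P(Y) estimated from corpus counts."""
    unigrams = Counter(words)
    bigrams = Counter(zip(words, words[1:]))
    n_uni, n_bi = len(words), max(len(words) - 1, 1)
    scores = {}
    for (x, y), c in bigrams.items():
        p_xy = c / n_bi
        p_x, p_y = unigrams[x] / n_uni, unigrams[y] / n_uni
        scores[(x, y)] = math.log(p_xy / (p_x * p_y))
    return scores

corpus = ["algorithm", "engineer", "algorithm", "engineer",
          "page", "algorithm", "engineer"]
scores = pmi_scores(corpus)
# the frequently co-occurring pair ("algorithm", "engineer") scores high
```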
Step 304: the information entropy of the characters adjacent to the left and right of each segmented word is calculated, and first segmented words are merged based on the information entropy to further update the first segmented words and the existing dictionary.
The purpose of calculating the entropy of a word's left and right adjacent characters is to measure how random the sets of characters on each side of a text fragment are: a reasonable threshold is set on the entropy, and segmented words within the threshold range are retained, since this indicates they are likely fixed phrases; otherwise, the fragment and its neighbors are more likely random combinations and need not be retained.
For example, for "text / analysis / naming", it can be calculated that the left entropy of the word "analysis" is low, so "text" and "analysis" should be merged, while the right entropy of "analysis" is high, so "analysis" and "naming" should remain separate.
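The left/right adjacent-character entropy of step 304 can be sketched as follows; the toy corpus of Chinese fragments is invented for illustration:

```python
import math
from collections import Counter

def entropy(counter):
    """Shannon entropy of a character-count distribution."""
    total = sum(counter.values())
    return -sum(c / total * math.log(c / total) for c in counter.values())

def boundary_entropies(word, corpus):
    """Entropy of the characters seen immediately to the left and right
    of `word` across all its occurrences in the corpus strings."""
    left, right = Counter(), Counter()
    for text in corpus:
        start = 0
        while (i := text.find(word, start)) != -1:
            if i > 0:
                left[text[i - 1]] += 1
            j = i + len(word)
            if j < len(text):
                right[text[j]] += 1
            start = i + 1
    return entropy(left), entropy(right)

corpus = ["文本分析", "数据分析", "文本分析是", "文本分析和"]
l, r = boundary_entropies("分析", corpus)
# low left entropy (mostly preceded by 本) suggests merging with the left
# neighbor; higher right entropy suggests a word boundary on the right
```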
Step 305: the second segmented words, obtained after updating the first segmented words, are filtered by the TF-IDF method to obtain third segmented words.
The reason for filtering the second segmented words is that, even with the new-word dictionary, segmentation still produces a jumble of miscellaneous words, e.g.: H5, vue, front end, page, social insurance and housing fund, team building, employee benefits, growth, responsibility, skill, learning, priority, experience, understanding. The first four are keywords, while the remaining words carry too little value and should be deleted. By setting a reasonable threshold range with the TF-IDF method, common words in job descriptions, such as "priority", "experience", "proficient", and "understanding", can be filtered out.
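A minimal TF-IDF sketch for step 305; the token lists are invented, and a real system would run this over the segmented position texts:

```python
import math
from collections import Counter

def tf_idf(docs):
    """TF-IDF score per word per document; docs is a list of token lists."""
    n = len(docs)
    df = Counter(w for doc in docs for w in set(doc))  # document frequency
    scores = []
    for doc in docs:
        tf = Counter(doc)
        scores.append({w: (tf[w] / len(doc)) * math.log(n / df[w]) for w in tf})
    return scores

docs = [
    ["vue", "front-end", "page", "experience", "priority"],
    ["java", "back-end", "experience", "priority"],
    ["python", "algorithm", "experience", "priority"],
]
scores = tf_idf(docs)
# words that appear in every posting get idf = log(1) = 0, so a low
# TF-IDF threshold filters out boilerplate like "experience" and "priority"
```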
Step 306: words are classified according to the sites' position categories, the probability of each word appearing in the local position information is counted, and the third segmented words are filtered according to the probability to obtain fourth segmented words.
For example, among the words obtained from the crawled recruitment websites are "responsible", "proficient", and "growth". Statistically, such words appear in about 99% of the texts; since they occur in almost all resumes and positions, they carry almost no information and are deleted. This further strengthens the filtering of step 305.
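The document-frequency filtering of step 306 can be sketched as follows, with the 90% cutoff as an assumed preset value:

```python
from collections import Counter

def filter_by_df(docs, max_ratio=0.9):
    """Drop words that occur in more than max_ratio of the job postings."""
    n = len(docs)
    df = Counter(w for doc in docs for w in set(doc))
    keep = {w for w, c in df.items() if c / n <= max_ratio}
    return [[w for w in doc if w in keep] for doc in docs]

docs = [["responsible", "vue"], ["responsible", "java"], ["responsible", "python"]]
filtered = filter_by_df(docs, max_ratio=0.9)
# "responsible" appears in 100% of postings and is removed everywhere
```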
Step 307: the fourth segmented words are converted into word vectors, and the word vectors are clustered to obtain a plurality of word clusters serving as the topics of the local position information. Word2vec or another method may be used to convert the fourth segmented words into word vectors, and the word vectors may be clustered by k-means or another clustering method.
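A deterministic k-means sketch for the clustering in step 307. Word2vec training itself is out of scope here, so small 2-D vectors stand in for word vectors, and farthest-point initialization is a simplifying choice rather than part of the disclosure:

```python
def dist2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(vectors, k, iters=20):
    """Plain k-means with deterministic farthest-point initialization."""
    centers = [vectors[0]]
    while len(centers) < k:  # pick the point farthest from current centers
        centers.append(max(vectors, key=lambda v: min(dist2(v, c) for c in centers)))
    labels = [0] * len(vectors)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: dist2(v, centers[c])) for v in vectors]
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centers[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels

# toy stand-ins for word vectors: two visibly separated groups
vecs = [[0.0, 0.1], [0.1, 0.0], [5.0, 5.1], [5.1, 5.0]]
labels = kmeans(vecs, 2)
# each group of nearby vectors ends up in its own cluster (topic)
```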
Step 308: a first correlation vector between each piece of position information in the local position database and the generated topics is calculated.
Step 309: a second correlation vector between the user-selected version of the resume and the generated topics is calculated.
The calculation of the first correlation vector or the second correlation vector comprises: calculating the first correlation vector or the second correlation vector with a machine learning model based on an LDA topic model.
FIG. 4 is a diagram illustrating an exemplary method of building training data for a machine learning model according to an embodiment of the present disclosure. The training data of the machine learning model based on the LDA topic model is obtained by intersecting the words obtained by segmenting the updated local position information with the existing dictionary and the words included in the topics. Other methods of establishing training data may also be used; the present disclosure is not limited in this respect.
As an example of calculating the first and second correlation vectors, suppose that for a front-end engineer position the extracted fourth segmented words are: H5, html, css, vue, node, js, page, and beautification.
After clustering, the generated topics are topic 1, topic 2, topic 3, and topic 4. The topic-based LDA machine learning model then yields:
P(belonging to topic 1) = 0.1;
P(belonging to topic 2) = 0.3;
P(belonging to topic 3) = 0.2;
P(belonging to topic 4) = 0.8;
where P denotes probability.
The first correlation vector is then: v1 = (0.1, 0.3, 0.2, 0.8).
Similarly, for the version of the resume selected by the user, a second correlation vector is calculated, for example v2 = (0.2, 0.3, 0.2, 0.7).
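The probabilities above come from LDA inference. As a rough stand-in, not the LDA model of the disclosure, topic membership can be illustrated by the overlap between a document's words and each topic's word cluster; the topic word sets below are invented:

```python
def correlation_vector(doc_words, topics):
    """Illustrative stand-in for LDA inference: score each topic by the
    share of the document's words that fall in that topic's word cluster."""
    doc = set(doc_words)
    return [round(len(doc & set(cluster)) / len(doc), 2) for cluster in topics]

topics = [
    {"java", "spring"},                          # topic 1 (invented)
    {"mysql", "redis", "css"},                   # topic 2 (invented)
    {"linux", "docker"},                         # topic 3 (invented)
    {"h5", "html", "css", "vue", "node", "js"},  # topic 4 (invented)
]
words = ["h5", "html", "css", "vue", "node", "js", "page", "beautification"]
v = correlation_vector(words, topics)
# the front-end cluster (topic 4) dominates, as in the example above
```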
Step 310: the similarity between the first correlation vector and the second correlation vector is calculated. Since the degree of match between two texts can be represented by the distance between their vectors, the degree of match between a position and a resume can be reflected by calculating the cosine distance, Euclidean distance, or Manhattan distance between the first correlation vector and the second correlation vector.
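The three distance measures named in step 310 can be computed directly on the example vectors v1 and v2 (cosine is shown as a similarity; the cosine distance is 1 minus this value):

```python
import math

def cosine_sim(a, b):
    """Cosine similarity; cosine distance = 1 - cosine similarity."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def manhattan(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))

v1 = [0.1, 0.3, 0.2, 0.8]  # the position's topic-correlation vector
v2 = [0.2, 0.3, 0.2, 0.7]  # the resume's topic-correlation vector
sim = cosine_sim(v1, v2)   # close to 1 for a well-matched pair
```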
In step 311, the user-selected version of the resume is delivered to one or more associated positions in descending order of similarity. The user can preset the resume version to deliver.
By matching resumes and positions through generated topics in this way, the inaccurate matching caused by an excessive number of words, polysemy, and word meanings that are too narrow or too broad is overcome.
In one embodiment, after delivering resumes to the associated positions in descending order of similarity, the method further comprises: automatically downloading the position information from the preset sites again at a preset time interval to obtain updated local position information.
With a preset time period, for example once a week at a fixed time, the positions on each site can be automatically downloaded again, expired positions deleted, and newly added positions supplemented, yielding updated local position information.
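The refresh can be sketched as a set difference between the local database and the freshly downloaded postings; the job identifiers are invented:

```python
def refresh(local_jobs, downloaded_jobs):
    """Diff freshly downloaded postings against the local database:
    keep the fresh set, and report expired and newly added positions."""
    local, fresh = set(local_jobs), set(downloaded_jobs)
    expired = local - fresh  # in the database but no longer online
    new = fresh - local      # online but not yet in the database
    return fresh, expired, new

local = {"job-1", "job-2", "job-3"}
updated, expired, new = refresh(local, {"job-2", "job-3", "job-4"})
# job-1 has expired and is dropped; job-4 is newly added
```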
In one embodiment, after obtaining the updated local position information, the method further comprises: automatically reminding the user of newly found positions whose similarity meets a preset value, and automatically delivering the resume of a preset version to those positions.
For example, if the preset value is 90%, the user is automatically reminded of newly found positions whose similarity is greater than or equal to 90%, and the preset version of the resume is automatically delivered to them.
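The reminder step can be sketched as a threshold filter over newly computed similarities; the identifiers and scores are invented:

```python
def jobs_to_remind(similarities, threshold=0.9):
    """Newly found positions whose similarity meets the preset value,
    sorted in descending order of similarity."""
    return sorted((job for job, s in similarities.items() if s >= threshold),
                  key=lambda j: -similarities[j])

sims = {"job-A": 0.95, "job-B": 0.85, "job-C": 0.92}
hits = jobs_to_remind(sims)
# job-A and job-C meet the 90% threshold; job-B does not
```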
In one embodiment, the user selects a version of the resume, and the corresponding positions are automatically matched from the local position database. The user can then directly batch-select the desired positions and deliver resumes to them in batches with one click.
Since the matched positions are searched from the currently selected resume, delivering the wrong resume is also avoided.
At delivery time, the URLs of the selected positions are opened in sequence; the corresponding recruitment website may require logging in, and the present disclosure can also be authorized to complete the delivery automatically.
In this way, job seekers can be effectively helped to find newly added positions, the burden of delivering resumes is reduced, more resumes are delivered, and the chance of job-hunting success increases.
FIG. 5 shows a schematic diagram of a system for batch delivery of exemplary different versions of resumes according to an embodiment of the present disclosure. The system 500, comprising:
a job position obtaining unit 501, configured to obtain job position information downloaded from multiple websites, and form a local job position database;
a theme generating unit 502, configured to generate a theme of the local position information according to the position information in the local position database;
a first correlation vector unit 503, configured to calculate a first correlation vector between each local position and the generated topic;
a second correlation vector unit 504, configured to calculate a second correlation vector between the user-selected version of the resume and the generated topic;
a similarity calculation unit 505, configured to calculate a similarity between the first correlation vector and the second correlation vector;
a resume delivery unit 506, configured to deliver the user-selected version of the resume to the associated one or more positions based on the descending order of similarity.
In one embodiment, the system 500 further comprises: an update job unit 507, configured to, after the resumes are delivered to the associated positions in descending order of similarity, automatically download the position information again from the preset sites within a preset time period to obtain updated local position information.
In one embodiment, the system 500 further comprises: an automatic reminding unit 508, configured to automatically remind the user of newly found positions whose similarity meets the preset value, and to automatically deliver the preset version of the resume to those positions.
In one embodiment, the system 500 further comprises: a one-touch delivery unit 509, configured to allow the user to select a certain version of the resume and automatically match the corresponding positions from the local position database; the desired positions can then be selected directly in batches, and resumes delivered in batches with one click.
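A structural sketch of how units 501 through 506 might be wired into a pipeline follows. The unit internals are deliberately trivial stand-ins (word-overlap "topics" and Jaccard similarity rather than a trained topic model), and every name is hypothetical:

```python
# Structural sketch of system 500: each method stands in for one unit.
class BatchDeliverySystem:
    def acquire_jobs(self, sites):                 # job position obtaining unit 501
        return [job for site in sites for job in site]

    def generate_topics(self, jobs):               # theme generating unit 502
        return set(word for job in jobs for word in job.split())

    def correlate(self, text, topics):             # correlation vector units 503/504
        return set(text.split()) & topics

    def similarity(self, vec_a, vec_b):            # similarity calculation unit 505
        union = vec_a | vec_b
        return len(vec_a & vec_b) / len(union) if union else 0.0

    def rank_and_deliver(self, sites, resume):     # resume delivery unit 506
        jobs = self.acquire_jobs(sites)
        topics = self.generate_topics(jobs)
        resume_vec = self.correlate(resume, topics)
        return sorted(jobs, reverse=True,
                      key=lambda j: self.similarity(self.correlate(j, topics),
                                                    resume_vec))

system = BatchDeliverySystem()
sites = [["python data engineer", "sales manager"]]
print(system.rank_and_deliver(sites, "python engineer resume"))
```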
Fig. 6 shows a schematic structural diagram of an electronic device for implementing an embodiment of the present disclosure. As shown in fig. 6, the electronic device 600 includes a central processing unit (CPU) 601 that can perform various appropriate actions and processes in accordance with a program stored in a read-only memory (ROM) 602 or a program loaded from a storage section 608 into a random access memory (RAM) 603. The RAM 603 also stores various programs and data necessary for the operation of the electronic device 600. The CPU 601, ROM 602, and RAM 603 are connected to each other via a bus 604. An input/output (I/O) interface 605 is also connected to the bus 604.
The following components are connected to the I/O interface 605: an input section 606 including a keyboard, a mouse, and the like; an output section 607 including a display such as a cathode ray tube (CRT) or liquid crystal display (LCD), and a speaker; a storage section 608 including a hard disk and the like; and a communication section 609 including a network interface card such as a LAN card, a modem, or the like. The communication section 609 performs communication processing via a network such as the Internet. A drive 610 is also connected to the I/O interface 605 as needed. A removable medium 611 such as a magnetic disk, an optical disk, a magneto-optical disk, or a semiconductor memory is mounted on the drive 610 as necessary, so that a computer program read out therefrom is installed into the storage section 608 as necessary.
In particular, according to an embodiment of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer-readable medium bearing instructions; in such embodiments, the instructions may be downloaded and installed from a network through the communication section 609 and/or installed from the removable medium 611. When the instructions are executed by the central processing unit (CPU) 601, the various method steps described in this disclosure are performed.
Although example embodiments have been described, it will be apparent to those skilled in the art that various changes and modifications may be made without departing from the spirit and scope of the disclosed concept. Accordingly, it should be understood that the above-described exemplary embodiments are not limiting, but illustrative.

Claims (10)

1. A batch delivery method of resumes of different versions comprises the following steps:
acquiring position information downloaded from a plurality of sites to form a local position database;
generating a topic of the local position information according to the position information in the local position database;
calculating a first correlation vector between each piece of position information in the local position database and the generated topic;
calculating a second correlation vector between the user-selected version of the resume and the generated topic;
calculating the similarity of the first correlation vector and the second correlation vector;
delivering the resume in the user-selected version to the associated one or more positions based on the descending order of similarity.
2. The method of claim 1, the method of generating the topic of the local job information comprising:
segmenting all position information in the local position database according to an existing dictionary, the segmentation comprising sentence breaking, word segmentation, and stop-word removal, to obtain first segmented words;
extracting 2-grams and 3-grams from the obtained first segmented words, calculating a mutual information value for each 2-gram and 3-gram, arranging the 2-grams and 3-grams in descending order of mutual information value, and selecting the top-ranked 2-grams and 3-grams to update the first segmented words and the existing dictionary;
calculating information entropies of the characters adjacent to the left and right of the segmented words, merging the first segmented words based on the information entropies, and further updating the first segmented words and the existing dictionary;
filtering the second segmented words, obtained after the first segmented words are updated, by using a TF-IDF method to obtain third segmented words;
classifying words according to the positions of the sites, counting the probability of the words appearing in the local position information, and filtering the third segmented words according to the probability to obtain fourth segmented words;
and converting the fourth segmented words into word vectors, and clustering the word vectors to obtain a plurality of word clusters serving as the topics of the local position information.
3. The method of claim 1, wherein the calculation of the first correlation vector or the second correlation vector comprises: calculating the first correlation vector or the second correlation vector based on a machine learning model of an LDA topic model.
4. The method of claim 1, wherein said calculating the similarity of the first correlation vector and the second correlation vector comprises: calculating a cosine distance, a Euclidean distance, or a Manhattan distance between the first correlation vector and the second correlation vector.
5. The method of claim 1, further comprising, after the resumes are delivered to the associated positions in descending order of similarity: automatically downloading the position information again from the preset sites within a preset time period to obtain updated local position information.
6. The method of claim 5, further comprising, after the obtaining of the updated local position information: automatically reminding the user of newly found positions whose similarity meets the preset value, and automatically delivering the preset version of the resume to those positions.
7. The method of claim 3, wherein training data for the machine learning model based on the LDA topic model is derived from the intersection of the words obtained by segmenting the local position information with the updated existing dictionary and the words comprised by the topics.
8. A batch delivery system of different versions of resumes comprising:
the system comprises a position acquisition unit, a position database and a position database, wherein the position acquisition unit is used for acquiring position information downloaded from a plurality of sites and forming the local position database;
the theme generating unit is used for generating a theme of the local position information according to the position information in the local position database;
the first correlation vector unit is used for calculating a first correlation vector of each local position and the generated theme;
the second correlation vector unit is used for calculating a second correlation vector of the user-selected version of the resume and the generated theme;
a similarity calculation unit configured to calculate a similarity between the first correlation vector and the second correlation vector;
and the resume delivery unit is used for delivering the resume of the version selected by the user to the associated one or more positions based on the descending order of the similarity.
9. An electronic device, comprising:
one or more processors;
a memory for storing one or more programs;
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to perform the method of any of claims 1-7.
10. A computer readable storage medium having stored thereon executable instructions which, when executed by a processor, cause the processor to perform the method of any one of claims 1 to 7.
CN202010954388.4A 2020-09-11 2020-09-11 Batch delivery method and system for resumes of different versions Pending CN112100492A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010954388.4A CN112100492A (en) 2020-09-11 2020-09-11 Batch delivery method and system for resumes of different versions


Publications (1)

Publication Number Publication Date
CN112100492A true CN112100492A (en) 2020-12-18

Family

ID=73750934

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010954388.4A Pending CN112100492A (en) 2020-09-11 2020-09-11 Batch delivery method and system for resumes of different versions

Country Status (1)

Country Link
CN (1) CN112100492A (en)

Citations (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105786781A (en) * 2016-03-14 2016-07-20 裴克铭管理咨询(上海)有限公司 Job description text similarity calculation method based on topic model
JP5965557B1 (en) * 2016-01-29 2016-08-10 株式会社リクルートホールディングス Similarity learning system and similarity learning method
CN107341233A (en) * 2017-07-03 2017-11-10 北京拉勾科技有限公司 A kind of position recommends method and computing device
CN108021558A (en) * 2017-12-27 2018-05-11 北京金山安全软件有限公司 Keyword recognition method and device, electronic equipment and storage medium
CN108090231A (en) * 2018-01-12 2018-05-29 北京理工大学 A kind of topic model optimization method based on comentropy
CN108763213A (en) * 2018-05-25 2018-11-06 西南电子技术研究所(中国电子科技集团公司第十研究所) Theme feature text key word extracting method
CN108920544A (en) * 2018-06-13 2018-11-30 桂林电子科技大学 A kind of personalized position recommended method of knowledge based map
CN109710947A (en) * 2019-01-22 2019-05-03 福建亿榕信息技术有限公司 Power specialty word stock generating method and device
CN109783636A (en) * 2018-12-12 2019-05-21 重庆邮电大学 A kind of car review subject distillation method based on classifier chains
CN109978510A (en) * 2019-04-02 2019-07-05 北京网聘咨询有限公司 Campus recruiting management system and method
CN110134847A (en) * 2019-05-06 2019-08-16 北京科技大学 A kind of hot spot method for digging and system based on internet Financial Information
CN111061877A (en) * 2019-12-10 2020-04-24 厦门市美亚柏科信息股份有限公司 Text theme extraction method and device
US20200193382A1 (en) * 2018-12-17 2020-06-18 Robert P. Michaels Employment resource system, method and apparatus
CN111353050A (en) * 2019-12-27 2020-06-30 北京合力亿捷科技股份有限公司 Word stock construction method and tool in vertical field of telecommunication customer service
CN111461637A (en) * 2020-02-28 2020-07-28 平安国际智慧城市科技股份有限公司 Resume screening method and device, computer equipment and storage medium


Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
LIU Jian, LIU Wuying, CHENG Xueqi: "New Progress in Terminology Research" (《术语学研究新进展》), 31 March 2015, pages 146-150 *
WANG Xiaohua, XU Ning, CHEN Zhiqun: "Topic word clustering and topic discovery in texts based on co-word analysis" (基于共词分析的文本主题词聚类与主题发现), Information Science (《情报科学》), 5 November 2011, pages 1621-1624 *


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination