CN109710843A - A method of improving search matching degree in big quantity personnel resume - Google Patents

A method for improving search matching degree across a large number of personnel resumes

Info

Publication number
CN109710843A
Authority
CN
China
Prior art keywords
resume
association
degree
personnel
search
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
CN201811542296.4A
Other languages
Chinese (zh)
Inventor
郑锐韬
涂旭平
李勇波
季统凯
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
G Cloud Technology Co Ltd
Original Assignee
G Cloud Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by G Cloud Technology Co Ltd
Priority to CN201811542296.4A
Publication of CN109710843A
Legal status: Withdrawn

Landscapes

  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention relates to the technical field of big data analysis, and in particular to a method for improving search matching degree across a large number of personnel resumes. The method is based on the LSA (latent semantic analysis) algorithm and the Apriori algorithm: the semantic analysis of the LSA algorithm extracts the representative words of the resumes, the Apriori algorithm performs association matching on the resume features, and association analysis yields the highly associated terms of all personnel resumes together with their degrees of association. When a large number of personnel resumes is searched, entering one feature returns information about the other features associated with it, which improves both the matching degree and the efficiency of the search.

Description

A method for improving search matching degree across a large number of personnel resumes
Technical field
The invention relates to the technical field of big data analysis, and in particular to a method for improving search matching degree across a large number of personnel resumes.
Background technique
With the development of the talent market, recruiting enterprises can obtain candidate information from various human resources websites, but the search for talent is limited to the functions that online talent markets provide. When a dedicated, large-scale search of personnel resumes is needed, the resumes can be collected by means such as web crawlers. For an existing large collection of text resumes, however, determining how the resumes relate to one another and achieving a good search matching degree remain a problem.
Summary of the invention
The technical problem solved by the invention is to provide a method for improving search matching degree across a large number of personnel resumes, which improves the matching degree of searches over such resumes and improves search efficiency.
The technical solution by which the invention solves the above technical problem is as follows:
The method is based on the LSA (latent semantic analysis) algorithm and the Apriori algorithm. The semantic analysis of the LSA algorithm is used to obtain the representative words of the large number of personnel resumes, the Apriori algorithm performs association matching on the resume features, and association analysis yields the highly associated terms of all personnel resumes and their degrees of association.
The method comprises the following specific steps:
Step 1: obtain the content of a large number of personnel resumes and delete the punctuation marks from each, forming resume content that can be used as input to the LSA algorithm;
Step 2: take each processed resume as input and build a word-document matrix from the personnel resumes; analysis of each resume yields several representative words and the number of occurrences of each word, counted by occurrence;
Step 3: process the counted word frequencies of each resume and remove the representative words whose probability of co-occurring across all resumes is less than 10%; save the remaining representative words of each resume as its features in a one-to-many manner, forming the features of each resume;
Step 4: take the word frequencies of each resume as features and input them together into the constructed Apriori algorithm for association analysis; once all input is complete, obtain all degree-of-association information;
Step 5: after the association analysis of the personnel resumes, obtain the resulting degrees of association; for the data on other related words that occur together with a search term, combine the two measures (number of associated occurrences and confidence), sort by association coefficient in descending order, and save each association coefficient and associated item;
Step 6: when searching personnel resumes, after a keyword is entered, obtain the other keywords associated with it from the sorted degrees of association, taking those with the highest degree of association first; then search for the relevant personnel resumes, thereby expanding the scope of the search.
In step 1, the content of the personnel resumes is obtained by web crawler or from a centralized document collection.
In step 4, the highly associated information is obtained and saved; the constructed Apriori instance can also be saved for later use when new personnel resumes are added.
In step 5, the calculation of the degree of association includes the number of associated occurrences and the confidence of the occurrence, the confidence being calculated on the basis of a Bayesian algorithm;
What is saved must include each specific association term and its degree of association, as well as the association confidence between that term and the other association terms.
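The text does not spell out how the confidence is computed. One common reading, offered here only as an assumption, is the standard association-rule definition, in which the confidence of a rule is the conditional probability of one term given another. The symbols $N$, $A$, $B$ and $r$ below are illustrative and do not appear in the original.

```latex
% N: total number of resumes; A, B: sets of representative words; r: one resume's word set
\[
\operatorname{supp}(A) = \frac{\bigl|\{\, r : A \subseteq r \,\}\bigr|}{N},
\qquad
\operatorname{conf}(A \Rightarrow B)
  = \frac{\operatorname{supp}(A \cup B)}{\operatorname{supp}(A)}
  = P(B \mid A)
  = \frac{P(A \mid B)\,P(B)}{P(A)} .
\]
% Illustrative numbers only: with N = 100, "Java" in 40 resumes and
% "Java" together with "Spring" in 30 resumes,
\[
\operatorname{conf}(\text{Java} \Rightarrow \text{Spring}) = \frac{0.30}{0.40} = 0.75 .
\]
```

Under this reading, the "number of associated occurrences" corresponds to the co-occurrence count in the numerator, and the confidence normalizes it by how often the search term itself appears.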
The specific steps of step 6 are as follows:
1) After entering the keyword of the personnel resumes to be searched, choose whether to perform an association search and how many associations to use, then submit;
2) If no association search is selected, query only the personnel resumes that match the entered keyword and return the data;
3) If association search is selected, before searching, use the entered keyword to look up the key terms and degrees of association obtained from the association analysis, take the top-ranked association terms up to the selected number of associations, and search the personnel resumes for all of the associated key terms;
4) Beneath each personnel resume returned, indicate the association term through which it was found and its degree of association, providing guidance for association search.
The beneficial effects of the present invention are:
With this method, once the features of the personnel resumes have been obtained, the features of all resumes are fed into the fast association analysis of the Apriori algorithm, which yields the degree of association and the confidence between the features of each resume and the features of the other resumes. The highest-ranked association information of each kind is saved, so that in a subsequent search, entering one feature returns the other features most strongly associated with it. This improves the matching degree and the efficiency of personnel resume searches.
Description of the drawings
The invention is further described below with reference to the drawing:
Figure 1 is a flow chart of the invention.
Specific embodiment
As shown in Figure 1, the basic flow of the method of the invention is as follows:
Step 1: obtain the content of a large number of personnel resumes and delete the punctuation marks from each, forming resume content that can be used as input to the LSA algorithm;
Step 2: take each processed resume as input and build a word-document matrix from the personnel resumes; analysis of each resume yields several representative words and the number of occurrences of each word, counted by occurrence;
Step 3: process the counted word frequencies of each resume and remove the representative words whose probability of co-occurring across all resumes is less than 10%; save the remaining representative words of each resume as its features in a one-to-many manner, forming the features of each resume;
Step 4: take the word frequencies of each resume as features and input them together into the constructed Apriori algorithm for association analysis; once all input is complete, obtain all degree-of-association information;
Step 5: after the association analysis of the large number of personnel resumes, obtain the resulting degrees of association. The calculation of the degree of association includes the number of associated occurrences and the confidence of the occurrence, the confidence being calculated on the basis of a Bayesian algorithm. For each search term, take the occurrence data of the other related words that appear with it, combine the two measures, sort by association coefficient in descending order, and save each association coefficient and associated item;
Step 6: when searching personnel resumes, after a keyword is entered, obtain the other keywords associated with it from the sorted degrees of association, taking those with the highest degree of association first; then search the relevant personnel resumes. This expands the scope of the search and provides a method of improving search matching degree for talent searches.
The content of the large number of personnel resumes is obtained by web crawler or from a centralized document collection. The resulting collection contains only resume content, and many resumes have no specific name or obvious features. To obtain the relevant features, all personnel resumes are processed by deleting punctuation marks, forming an input set from which the representative words of each resume can be obtained.
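The punctuation-deletion step can be sketched as follows. This is a minimal illustration and not the patent's implementation: the function name `strip_punctuation` is hypothetical, and replacing each punctuation mark with a space (rather than deleting it outright) is an assumption made so that words on either side of a comma or slash stay separate.

```python
import unicodedata


def strip_punctuation(text: str) -> str:
    """Remove punctuation and line breaks, leaving tokens separated by single spaces."""
    # Every character whose Unicode category starts with "P" is punctuation;
    # this covers ASCII marks as well as full-width Chinese marks.
    cleaned = "".join(
        " " if unicodedata.category(ch).startswith("P") else ch
        for ch in text
    )
    # split() with no arguments also swallows newlines and tabs.
    return " ".join(cleaned.split())


resume = "Java developer, 5 years;\nSpring/MySQL. 熟悉大数据、Hadoop。"
print(strip_punctuation(resume))
```

Chinese resume text would still need word segmentation after this step; the source does not say which segmenter, if any, is used.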
For all personnel resumes, punctuation marks, including line breaks, are deleted so that each resume is converted into content that can be input to the LSA algorithm; the LSA algorithm is then constructed and used to extract the representative words of each resume and their frequencies of occurrence.
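The counting part of this step can be sketched as below. The sketch builds only the word-document count matrix and picks the most frequent words of a resume; the dimensionality reduction that LSA normally applies to this matrix (a truncated singular value decomposition) is not shown. The function names and the whitespace tokenization are assumptions for illustration.

```python
from collections import Counter


def build_word_document_matrix(resumes):
    """Return (vocabulary, matrix) where matrix[i][j] counts word i in resume j."""
    counts = [Counter(text.split()) for text in resumes]
    vocabulary = sorted(set().union(*counts))
    matrix = [[c[word] for c in counts] for word in vocabulary]
    return vocabulary, matrix


def representative_words(resume_text, top_n=3):
    """Pick the top_n most frequent words of one resume, with their counts."""
    return Counter(resume_text.split()).most_common(top_n)


resumes = ["java spring java mysql", "python java pandas python"]
vocabulary, matrix = build_word_document_matrix(resumes)
for word, row in zip(vocabulary, matrix):
    print(word, row)
print(representative_words(resumes[0], top_n=1))
```

Each row of the matrix is a word and each column a resume, which is the layout an SVD-based LSA step would take as input.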
From the representative words and word frequencies extracted for each personnel resume, the representative words whose probability of co-occurrence is less than 10% are deleted, leaving representative words that are comparatively common across resumes. The remaining representative words of each processed resume form the features of that resume and are saved in a one-to-many manner.
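A minimal sketch of this filtering step follows. It assumes that "probability of co-occurrence" means the share of resumes in which a word appears, which the source does not state explicitly. The function name `filter_rare_words` and the dictionary layout (resume ID mapped to a list of words, which gives the one-to-many storage) are hypothetical.

```python
from collections import Counter


def filter_rare_words(resume_words, min_share=0.10):
    """Drop words that occur in fewer than min_share of all resumes."""
    total = len(resume_words)
    doc_freq = Counter()
    for words in resume_words.values():
        doc_freq.update(set(words))  # count each word once per resume
    keep = {w for w, n in doc_freq.items() if n / total >= min_share}
    # One resume ID maps to many feature words (one-to-many).
    return {rid: [w for w in words if w in keep] for rid, words in resume_words.items()}


resume_words = {i: ["java"] for i in range(20)}
resume_words[0] = ["java", "cobol"]           # "cobol" appears in 1 of 20 resumes
for i in (1, 2, 3):
    resume_words[i] = ["java", "spring"]      # "spring" appears in 3 of 20 resumes
features = filter_rare_words(resume_words)
print(features[0], features[1])
```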
The Apriori algorithm is constructed, and the feature word frequencies of all resumes obtained from the LSA analysis are input to it, forming an association analysis of the key vocabulary of all personnel resumes; the highly associated information is obtained and saved. The constructed Apriori instance can also be saved, so that when new personnel resumes are added later, it is only necessary to restore the existing instance and add the new resume features in order to run a new association analysis.
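The association analysis can be illustrated with a small, textbook-style Apriori implementation. This is a sketch under stated assumptions: each resume is treated as one transaction whose items are its feature words, the `min_support` threshold is an illustrative parameter that the source does not specify, and the function name `apriori` is hypothetical.

```python
from itertools import combinations


def apriori(transactions, min_support=0.5):
    """Return {frozenset(itemset): support} for all frequent itemsets."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def support(itemset):
        return sum(1 for t in transactions if itemset <= t) / n

    # Level 1: frequent single items.
    items = {item for t in transactions for item in t}
    current = {frozenset([i]) for i in items}
    current = {s for s in current if support(s) >= min_support}
    frequent = {}
    k = 1
    while current:
        for s in current:
            frequent[s] = support(s)
        k += 1
        # Join step: merge frequent (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in current for sub in combinations(c, k - 1))
        }
        current = {c for c in candidates if support(c) >= min_support}
    return frequent


resume_features = [
    {"java", "spring"},
    {"java", "spring", "mysql"},
    {"java", "mysql"},
    {"python"},
]
for itemset, supp in sorted(apriori(resume_features).items(), key=lambda kv: -kv[1]):
    print(sorted(itemset), supp)
```

The returned dictionary is plain Python data, so it could be serialized and reloaded when new resumes arrive; how the patent's "Apriori instance" is actually persisted is not described in the source.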
When the association results that the Apriori analysis produces for the large number of personnel resumes are stored, what is saved must include each specific association term and its degree of association, as well as the association confidence between that term and the other association terms. This is because one association term can form a high degree of association with several other association terms, each with its own high confidence of occurrence. The confidence is calculated on the basis of a Bayesian algorithm, which computes the association probability between the association term and the associated feature in order to obtain its maximum degree of association. Storage therefore needs to record both the association terms and the confidence of the degree of association.
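One way to compute and store these results is sketched below. It assumes that the confidence of "a implies b" is estimated as the conditional probability P(b | a), i.e. the number of resumes containing both words divided by the number containing a; the source only says the confidence is based on a Bayesian algorithm. The function name `rank_associations` and the tuple layout are hypothetical.

```python
from collections import Counter
from itertools import permutations


def rank_associations(resume_features):
    """Map each word to [(other_word, co_occurrence_count, confidence), ...], best first."""
    single = Counter()
    pair = Counter()
    for words in resume_features:
        unique = set(words)
        single.update(unique)
        # Ordered pairs, so both (a, b) and (b, a) are counted.
        pair.update(permutations(sorted(unique), 2))
    table = {}
    for (a, b), together in pair.items():
        # confidence(a -> b) = count(a and b) / count(a), an estimate of P(b | a)
        table.setdefault(a, []).append((b, together, together / single[a]))
    for a in table:
        # Highest confidence first, then highest count, then alphabetical.
        table[a].sort(key=lambda row: (-row[2], -row[1], row[0]))
    return table


features = [["java", "spring"], ["java", "spring"], ["java", "mysql"], ["python"]]
for word, rows in rank_associations(features).items():
    print(word, rows)
```

Keeping both the count and the confidence per pair matches the requirement that the association term, its degree of association, and the confidence all be saved.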
Once the degrees of association of the key terms have been obtained from the Apriori association analysis, a search over the large number of resumes proceeds as follows:
1) After entering the keyword of the personnel resumes to be searched, choose whether to perform an association search and how many associations to use, then submit;
2) If no association search is selected, query only the personnel resumes that match the entered keyword and return the data;
3) If association search is selected, before searching, use the entered keyword to look up the key terms and degrees of association obtained from the association analysis, take the top-ranked association terms up to the selected number of associations, and search the personnel resumes for all of the associated key terms;
4) Beneath each personnel resume returned, indicate the association term through which it was found and its degree of association, so as to give the searcher guidance for association search.
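The four search steps above can be sketched as a single function. Everything named here is hypothetical: `search_resumes`, its parameters, and the assumption that `associations` maps a keyword to a list of `(term, degree)` pairs already sorted from highest to lowest degree of association.

```python
def search_resumes(keyword, resume_features, associations,
                   use_association=False, max_terms=3):
    """Return {resume_id: [(matched_term, degree), ...]} for the keyword
    and, optionally, its top associated terms."""
    # The entered keyword itself is given degree 1.0 for display purposes.
    terms = [(keyword, 1.0)]
    if use_association:
        # Take at most max_terms associated terms, highest degree first.
        terms += associations.get(keyword, [])[:max_terms]
    results = {}
    for resume_id, words in resume_features.items():
        hits = [(term, degree) for term, degree in terms if term in words]
        if hits:
            # The hits list is what would be shown beneath the resume.
            results[resume_id] = hits
    return results


resume_features = {
    "r1": {"java", "spring"},
    "r2": {"spring"},
    "r3": {"python"},
}
associations = {"java": [("spring", 0.67), ("mysql", 0.33)]}

print(search_resumes("java", resume_features, associations))
print(search_resumes("java", resume_features, associations,
                     use_association=True, max_terms=1))
```

With association search switched on, resume `r2` is returned even though it never mentions the entered keyword, which is the widening of the search scope that the method describes.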
The method of the invention is based on the efficient semantic analysis of the LSA (latent semantic analysis) algorithm and on association matching of resume features by the Apriori algorithm. It extracts the key information from the content of a large number of personnel resumes and, through association analysis, obtains the highly associated terms of all personnel resumes and their degrees of association, thereby providing a method that improves search matching degree and search efficiency during personnel resume searches.

Claims (6)

1. A method for improving search matching degree across a large number of personnel resumes, characterized in that: the method is based on the LSA (latent semantic analysis) algorithm and the Apriori algorithm; the semantic analysis of the LSA algorithm is used to obtain the representative words of the large number of personnel resumes, the Apriori algorithm performs association matching on the resume features, and association analysis yields the highly associated terms of all personnel resumes and their degrees of association.
2. The method according to claim 1, characterized in that the method comprises the following specific steps:
Step 1: obtain the content of a large number of personnel resumes and delete the punctuation marks from each, forming resume content that can be used as input to the LSA algorithm;
Step 2: take each processed resume as input and build a word-document matrix from the personnel resumes; analysis of each resume yields several representative words and the number of occurrences of each word, counted by occurrence;
Step 3: process the counted word frequencies of each resume and remove the representative words whose probability of co-occurring across all resumes is less than 10%; save the remaining representative words of each resume as its features in a one-to-many manner, forming the features of each resume;
Step 4: take the word frequencies of each resume as features and input them together into the constructed Apriori algorithm for association analysis; once all input is complete, obtain all degree-of-association information;
Step 5: after the association analysis of the personnel resumes, obtain the resulting degrees of association; for the data on other related words that occur together with a search term, combine the two measures (number of associated occurrences and confidence), sort by association coefficient in descending order, and save each association coefficient and associated item;
Step 6: when searching personnel resumes, after a keyword is entered, obtain the other keywords associated with it from the sorted degrees of association, taking those with the highest degree of association first; then search for the relevant personnel resumes, thereby expanding the scope of the search.
3. The method according to claim 2, characterized in that:
In step 1, the content of the personnel resumes is obtained by web crawler or from a centralized document collection.
4. The method according to claim 2, characterized in that:
In step 4, the highly associated information is obtained and saved; the constructed Apriori instance can also be saved for later use when new personnel resumes are added.
5. The method according to claim 2, characterized in that:
In step 5, the calculation of the degree of association includes the number of associated occurrences and the confidence of the occurrence, the confidence being calculated on the basis of a Bayesian algorithm;
What is saved must include each specific association term and its degree of association, as well as the association confidence between that term and the other association terms.
6. The method according to claim 2, characterized in that:
The specific steps of step 6 are as follows:
1) After entering the keyword of the personnel resumes to be searched, choose whether to perform an association search and how many associations to use, then submit;
2) If no association search is selected, query only the personnel resumes that match the entered keyword and return the data;
3) If association search is selected, before searching, use the entered keyword to look up the key terms and degrees of association obtained from the association analysis, take the top-ranked association terms up to the selected number of associations, and search the personnel resumes for all of the associated key terms;
4) Beneath each personnel resume returned, indicate the association term through which it was found and its degree of association, providing guidance for association search.
CN201811542296.4A 2018-12-17 2018-12-17 A method of improving search matching degree in big quantity personnel resume Withdrawn CN109710843A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811542296.4A CN109710843A (en) 2018-12-17 2018-12-17 A method of improving search matching degree in big quantity personnel resume

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811542296.4A CN109710843A (en) 2018-12-17 2018-12-17 A method of improving search matching degree in big quantity personnel resume

Publications (1)

Publication Number Publication Date
CN109710843A true CN109710843A (en) 2019-05-03

Family

ID=66256672

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811542296.4A Withdrawn CN109710843A (en) 2018-12-17 2018-12-17 A method of improving search matching degree in big quantity personnel resume

Country Status (1)

Country Link
CN (1) CN109710843A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117312397A (en) * 2023-10-18 2023-12-29 广东倍智人才科技股份有限公司 Talent supply chain management method and system based on big data

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106980961A (en) * 2017-03-02 2017-07-25 中科天地互联网科技(苏州)有限公司 A kind of resume selection matching process and system

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106980961A (en) * 2017-03-02 2017-07-25 中科天地互联网科技(苏州)有限公司 A kind of resume selection matching process and system

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
肖云鹏等: "以招聘就业大数据为基础反馈教学的‘校企学’服务模式思考", 《当代教育实践与教学研究》 *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117312397A (en) * 2023-10-18 2023-12-29 广东倍智人才科技股份有限公司 Talent supply chain management method and system based on big data
CN117312397B (en) * 2023-10-18 2024-03-22 广东倍智人才科技股份有限公司 Talent supply chain management method and system based on big data

Similar Documents

Publication Publication Date Title
CN109815308B (en) Method and device for determining intention recognition model and method and device for searching intention recognition
CN109447266B (en) Agricultural scientific and technological service intelligent sorting method based on big data
CN102902821B (en) The image high-level semantics mark of much-talked-about topic Network Based, search method and device
CN104199857B (en) A kind of tax document hierarchy classification method based on multi-tag classification
US8983971B2 (en) Method, apparatus, and system for mobile search
CN104933100B (en) keyword recommendation method and device
CN105045875B (en) Personalized search and device
CN105808590B (en) Search engine implementation method, searching method and device
CN105653840A (en) Similar case recommendation system based on word and phrase distributed representation, and corresponding method
CN110781317A (en) Method and device for constructing event map and electronic equipment
CN103838833A (en) Full-text retrieval system based on semantic analysis of relevant words
CN106874441A (en) Intelligent answer method and apparatus
CN105095433A (en) Recommendation method and device for entities
CN107329995A (en) A kind of controlled answer generation method of semanteme, apparatus and system
CN104794154A (en) O2O service quality evaluation model for medical apparatus based on text mining
Du et al. An approach for selecting seed URLs of focused crawler based on user-interest ontology
CN104484380A (en) Personalized search method and personalized search device
CN112307182B (en) Question-answering system-based pseudo-correlation feedback extended query method
CN109948154B (en) Character acquisition and relationship recommendation system and method based on mailbox names
CN106407377A (en) Search method and device based on artificial intelligence
CN104281565A (en) Semantic dictionary constructing method and device
CN106599174A (en) Real-time news recommendation system and method thereof
CN110866102A (en) Search processing method
CN115640458A (en) Remote sensing satellite information recommendation method, system and equipment
CN112685440B (en) Structural query information expression method for marking search semantic role

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WW01 Invention patent application withdrawn after publication

Application publication date: 20190503