CN109710843A - A method for improving search matching degree in a large volume of personnel resumes - Google Patents
A method for improving search matching degree in a large volume of personnel resumes
- Publication number
- CN109710843A (application number CN201811542296.4A)
- Authority
- CN
- China
- Prior art keywords
- resume
- association
- degree
- personnel
- search
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Withdrawn
Landscapes
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
Abstract
The present invention relates to the technical field of big data analysis, and in particular to a method for improving search matching degree in a large volume of personnel resumes. The method is based on the LAS algorithm (latent semantic analysis, more commonly abbreviated LSA) and the Apriori algorithm: the semantic analysis of the LAS algorithm extracts representative words from the resumes, the Apriori algorithm performs association matching on the resulting resume features, and association analysis yields the highly associated items across all resumes together with their association degrees. When searching a large volume of personnel resumes, entering one feature then returns information about the other features associated with it, which improves the matching degree and the efficiency of the search.
Description
Technical field
The present invention relates to the technical field of big data analysis, and in particular to a method for improving search matching degree in a large volume of personnel resumes.
Background art
With the development of the talent market, recruiting enterprises can obtain candidate information from various human resources websites. However, talent search is limited to the functions that the online talent markets provide. When a special need calls for searching personnel resumes at large scale, the resumes can be collected by means such as web crawlers. For such a large volume of text-based personnel resumes, however, the relevance between resumes and the matching degree of search remain a problem.
Summary of the invention
The technical problem solved by the present invention is to provide a method for improving search matching degree in a large volume of personnel resumes, which improves the matching degree of searches over such resumes and improves search efficiency.
The technical solution by which the present invention solves the above technical problem is as follows:
The method is based on the LAS algorithm and the Apriori algorithm. The semantic analysis of the LAS algorithm is used to obtain the representative words of a large volume of personnel resumes; the Apriori algorithm is used to perform association matching on the resume features; and association analysis yields the highly associated items across all personnel resumes together with their association degrees.
The method comprises the following specific steps:
Step 1: obtain the content of a large volume of personnel resumes and delete the punctuation marks from each one, forming resume content that can be used as input to the LAS algorithm;
Step 2: take each processed resume as input and build a word-document matrix over the personnel resumes; analyzing each resume yields multiple representative words and the number of occurrences of each word, counted by occurrence;
Step 3: process the counted word frequencies of each resume, removing representative words whose probability of co-occurring across all resumes is less than 10%; store the remaining representative words of each resume against that resume in a one-to-many manner, forming the features of each resume;
Step 4: take the resulting word frequencies of each resume as features and input them together into the constructed Apriori algorithm for association analysis; once all resumes have been input, obtain the information on all association degrees;
Step 5: after the association analysis of the personnel resumes, obtain the resulting association degrees; for the data on other related words that occur with a search term, combine both and sort by association coefficient from largest to smallest, then save each association coefficient together with its associated item;
Step 6: when searching personnel resumes, after a keyword is input, obtain the other associated keywords from the saved association ranking, taking those with the highest association degree first; then search for the relevant personnel resumes, which expands the scope of the search.
In step 1, the content of the personnel resumes is obtained by web crawler or from centralized documents.
In step 4, the highly associated information is obtained and saved; the constructed Apriori instance can also be saved, so that it can be reused later when new personnel resumes are added.
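One way to picture saving and restoring the constructed Apriori instance is sketched below. The patent does not specify a storage format; the `new_state` and `add_resume` helpers, the layout of the state dictionary, and the use of Python's `pickle` module are all assumptions made for illustration.

```python
import pickle
from collections import Counter
from itertools import combinations


def new_state():
    """Hypothetical container for the counts an Apriori-style analysis keeps."""
    return {"n_resumes": 0, "item_counts": Counter(), "pair_counts": Counter()}


def add_resume(state, features):
    """Update the counts with the feature words of one resume."""
    words = sorted(set(features))
    state["n_resumes"] += 1
    state["item_counts"].update(words)                   # resumes containing each word
    state["pair_counts"].update(combinations(words, 2))  # resumes containing each word pair


state = new_state()
add_resume(state, ["java", "spring", "mysql"])
add_resume(state, ["java", "spring"])

# Save the instance (to bytes here; writing to a file works the same way).
saved = pickle.dumps(state)

# Later: restore the saved instance and add a new resume
# without reprocessing the resumes that were already counted.
restored = pickle.loads(saved)
add_resume(restored, ["java", "mysql"])
```

Keeping raw counts rather than finished association degrees is the design choice that makes incremental updates cheap: the degrees can be recomputed from the counts whenever new resumes arrive.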
In step 5, the calculation of the association degree covers the number of times an association occurs and the confidence with which it occurs; the confidence is calculated with a Bayesian algorithm.
What is saved must include the specific associated items and their association degrees, as well as the association confidence between each associated item and the other associated items.
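The patent does not write out the formulas. A common reading, assumed here, is that the number of occurrences of an association corresponds to the co-occurrence count (support) used by Apriori, and that the Bayesian confidence is the conditional probability of one feature given another. With $N$ resumes, $n(A)$ the number of resumes containing feature $A$, and $n(A, B)$ the number containing both $A$ and $B$ (all symbols are introduced here for illustration only):

```latex
\mathrm{support}(A, B) = \frac{n(A, B)}{N},
\qquad
\mathrm{confidence}(A \Rightarrow B) = P(B \mid A) = \frac{P(A, B)}{P(A)} = \frac{n(A, B)}{n(A)}
```

For example, if 40 of 200 resumes contain "java" and 30 of those also contain "spring", then the support of the pair is 30/200 = 0.15 and the confidence of "java" implying "spring" is 30/40 = 0.75.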
The specific steps of step 6 are as follows:
1) after entering the keyword of the personnel resumes to be searched, choose whether to perform an association search and the number of associations, and submit;
2) if no association search is selected, only query all personnel resumes that match the input keyword and return the data;
3) if association search is selected, before searching, use the input keyword to look up the key items and association degrees obtained from the association analysis, take the top-ranked associated items up to the chosen number of associations, and search the personnel resumes for all of the key items so obtained;
4) beneath each personnel resume found, indicate the associated item through which it was found and its association degree, providing a guide for association search.
The beneficial effects of the present invention are as follows:
With this method, once the features of the personnel resumes have been obtained, the features of all resumes are input into the Apriori algorithm for fast association analysis. This yields the association degree, and the confidence, between the features of each resume and the features of the other resumes. The top-ranked groups of association information are saved, so that in a subsequent search, entering one feature returns the other highly associated features, which improves the matching degree and the efficiency of personnel resume search.
Brief description of the drawings
The present invention is further described below with reference to the drawing:
Figure 1 is a flow chart of the present invention.
Specific embodiment
As shown in Figure 1, the basic flow of the method of the present invention is as follows:
Step 1: obtain the content of a large volume of personnel resumes and delete the punctuation marks from each one, forming resume content that can be used as input to the LAS algorithm;
Step 2: take each processed resume as input and build a word-document matrix over the personnel resumes; analyzing each resume yields multiple representative words and the number of occurrences of each word, counted by occurrence;
Step 3: process the counted word frequencies of each resume, removing representative words whose probability of co-occurring across all resumes is less than 10%; store the remaining representative words of each resume against that resume in a one-to-many manner, forming the features of each resume;
Step 4: take the resulting word frequencies of each resume as features and input them together into the constructed Apriori algorithm for association analysis; once all resumes have been input, obtain the information on all association degrees;
Step 5: after the association analysis of the large volume of personnel resumes, obtain the resulting association degrees. The calculation of the association degree covers the number of times an association occurs and the confidence with which it occurs, the confidence being calculated with a Bayesian algorithm. For each search term that occurs, take the data on the other related words that occur with it, combine both measures, sort by association coefficient from largest to smallest, and save each association coefficient together with its associated item;
Step 6: when searching personnel resumes, after a keyword is input, obtain the other associated keywords from the saved association ranking, taking those with the highest association degree first, and then search for the relevant personnel resumes. This expands the scope of the search and provides a method of improving search matching degree for talent search.
The content of the large volume of personnel resumes is obtained by web crawler or from centralized documents. The resulting collection contains only resume content, and many resumes have no specific title or obvious features. To obtain the relevant features, punctuation marks are deleted from all personnel resumes, forming the input set from which the representative words of each resume can be obtained.
For all personnel resumes, punctuation marks, including line breaks, are deleted, converting each resume into content that can be input to the LAS algorithm. The LAS algorithm is then constructed and used to extract the representative words of each resume and their frequencies of occurrence.
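The preprocessing and counting described above can be sketched as follows. This is a minimal illustration, not the patent's implementation: it only builds the word-document count matrix that a latent semantic analysis step would then decompose. Splitting on whitespace is an assumption that suits space-delimited text (Chinese resumes would need a word segmenter), and the function names are hypothetical.

```python
import re
from collections import Counter


def strip_punctuation(text):
    """Remove punctuation and line breaks from one resume."""
    # Assumption: punctuation is replaced by a space rather than deleted outright,
    # so that neighbouring words in space-delimited text do not fuse together.
    cleaned = re.sub(r"[^\w\s]|_", " ", text)
    return " ".join(cleaned.split())  # also collapses line breaks and repeated spaces


def word_counts(text):
    """Count how often each word occurs in one cleaned, lower-cased resume."""
    return Counter(strip_punctuation(text).lower().split())


def build_matrix(resumes):
    """Return (vocabulary, rows); rows[i][j] is the count of vocabulary[j] in resume i."""
    counts = [word_counts(r) for r in resumes]
    vocabulary = sorted(set().union(*counts))
    rows = [[c[w] for w in vocabulary] for c in counts]
    return vocabulary, rows


resumes = ["Java, Spring;\nMySQL.", "Java developer: Spring!"]
vocabulary, rows = build_matrix(resumes)
# vocabulary is ['developer', 'java', 'mysql', 'spring']
# rows is [[0, 1, 1, 1], [1, 1, 0, 1]]
```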
From the representative words and word frequencies extracted for each personnel resume, the representative words whose probability of co-occurrence is less than 10% are deleted, leaving representative words with a relatively high co-occurrence. The processed representative words of each personnel resume form the features of that resume and are saved in a one-to-many manner.
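A minimal sketch of this filtering step follows. It assumes that the "probability of co-occurrence" means the share of resumes in which a word appears; the patent does not define the term more precisely. The function name, the `min_share` parameter, and the sample data are hypothetical.

```python
from collections import Counter


def filter_rare_words(resume_words, min_share=0.10):
    """Drop words that appear in fewer than min_share of all resumes.

    resume_words maps a resume id to its list of representative words.
    Returns a one-to-many mapping: resume id -> sorted list of retained feature words.
    """
    n = len(resume_words)
    doc_freq = Counter()
    for words in resume_words.values():
        doc_freq.update(set(words))  # count each word once per resume
    keep = {w for w, df in doc_freq.items() if df / n >= min_share}
    return {rid: sorted(set(words) & keep) for rid, words in resume_words.items()}


# 20 sample resumes: "cobol" occurs in 1 of 20 (5%), "spring" in 2 of 20 (10%).
resume_words = {f"r{i}": ["java"] for i in range(20)}
resume_words["r0"] = ["java", "cobol"]
resume_words["r1"] = ["java", "spring"]
resume_words["r2"] = ["java", "spring"]

features = filter_rare_words(resume_words)
# "cobol" falls below the 10% threshold and is removed; "spring" is kept.
```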
The Apriori algorithm is constructed, and the feature word frequencies of all resumes obtained from the LAS (latent semantic analysis) step are input into it, producing the association analysis of the key vocabulary of all personnel resumes; the highly associated information is obtained and saved. The constructed Apriori instance can also be saved, so that when new personnel resumes are added later, it is only necessary to restore the existing instance and add the new resume features to carry out a new association analysis.
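For readers unfamiliar with Apriori, the sketch below shows the standard level-wise algorithm applied to resume features, treating each resume's feature words as one transaction. It is a compact textbook version under stated assumptions, not the patent's own implementation; the `min_support` threshold and the sample data are hypothetical.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {frozenset(itemset): support} for every itemset meeting min_support."""
    n = len(transactions)
    transactions = [frozenset(t) for t in transactions]

    def support(itemset):
        # Share of resumes that contain every word of the itemset.
        return sum(1 for t in transactions if itemset <= t) / n

    items = {frozenset([i]) for t in transactions for i in t}
    current = {s for s in items if support(s) >= min_support}
    frequent = {}
    k = 1
    while current:
        for s in current:
            frequent[s] = support(s)
        k += 1
        # Join step: unite frequent (k-1)-itemsets into candidate k-itemsets.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(sub) in frequent for sub in combinations(c, k - 1))}
        current = {c for c in candidates if support(c) >= min_support}
    return frequent


resume_features = [
    ["java", "spring", "mysql"],
    ["java", "spring"],
    ["java", "mysql"],
    ["python", "django"],
]
frequent = apriori(resume_features, min_support=0.5)
# Frequent itemsets: {java}, {spring}, {mysql}, {java, spring}, {java, mysql}
```

The prune step is what distinguishes Apriori from brute-force counting: the triple of "java", "spring", and "mysql" is never counted because its subset of "spring" and "mysql" is already known to be infrequent.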
The association results that the Apriori association analysis produces for the large volume of personnel resumes must, when stored, include the specific associated items and their association degrees, as well as the association confidence between each associated item and the other associated items. This is because one associated item can form a high association degree with several other associated items, each with a high confidence of occurrence. The confidence is calculated with a Bayesian algorithm, which computes the association probability between an associated item and an associated feature in order to obtain its maximum association degree. When storing, therefore, both the associated items and the association confidence must be saved.
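The stored structure might look like the sketch below, which keeps, for every feature, its associated items with the co-occurrence count and the confidence. The confidence is estimated here as the conditional probability P(b | a) = count(a and b) / count(a), which is one plausible reading of the "Bayesian" calculation. The patent does not say how count and confidence are combined into one ranking, so sorting by confidence first and count second is an assumption, as are all names.

```python
from collections import Counter, defaultdict
from itertools import permutations


def rank_associations(resume_features):
    """Return {word: [(associated_word, co_occurrence_count, confidence), ...]}."""
    item_counts = Counter()
    pair_counts = Counter()
    for features in resume_features:
        words = set(features)
        item_counts.update(words)
        pair_counts.update(permutations(words, 2))  # ordered pairs (a, b)

    table = defaultdict(list)
    for (a, b), together in pair_counts.items():
        confidence = together / item_counts[a]      # estimate of P(b | a)
        table[a].append((b, together, confidence))
    for a in table:
        # Assumed ranking: highest confidence first, then highest count, then name.
        table[a].sort(key=lambda row: (-row[2], -row[1], row[0]))
    return dict(table)


resume_features = [
    ["java", "spring", "mysql"],
    ["java", "spring"],
    ["java", "mysql"],
    ["java", "spring"],
]
table = rank_associations(resume_features)
# table["java"] is [("spring", 3, 0.75), ("mysql", 2, 0.5)]
```

Note that confidence is directional: "mysql" implies "java" with confidence 1.0 in this sample, while "java" implies "mysql" with only 0.5, which is why each associated item stores its own list.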
Once the association degrees of the key items have been obtained from the Apriori association analysis, a search over the large volume of personnel resumes proceeds as follows:
1) after entering the keyword of the personnel resumes to be searched, choose whether to perform an association search and the number of associations, and submit;
2) if no association search is selected, only query all personnel resumes that match the input keyword and return the data;
3) if association search is selected, before searching, use the input keyword to look up the key items and association degrees obtained from the association analysis, take the top-ranked associated items up to the chosen number of associations, and search the personnel resumes for all of the key items so obtained;
4) beneath each personnel resume found, indicate the associated item through which it was found and its association degree, to provide the searcher with a guide for association search.
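The four search steps above can be sketched as a single function. This is an illustration under stated assumptions: resumes are represented as sets of feature words, the saved associations are a mapping from a keyword to a ranked list of (associated item, association degree) pairs, and every name and the sample data are hypothetical.

```python
def search_resumes(keyword, resume_features, associations,
                   use_association=False, top_n=0):
    """Return [(resume_id, matched_term, association_degree), ...].

    use_association and top_n correspond to the user's choice in step 1).
    """
    # The keyword itself always participates; 1.0 is a placeholder degree for it.
    terms = [(keyword, 1.0)]
    if use_association:
        # Step 3): take the top-ranked associated items, up to the chosen number.
        terms += associations.get(keyword, [])[:top_n]

    results = []
    seen = set()
    for term, degree in terms:
        for rid, features in resume_features.items():
            if term in features and rid not in seen:
                seen.add(rid)
                # Step 4): record which term found the resume and its degree.
                results.append((rid, term, degree))
    return results


resume_features = {
    "r1": {"java", "spring"},
    "r2": {"spring", "mysql"},
    "r3": {"python"},
    "r4": {"mysql"},
}
associations = {"java": [("spring", 0.75), ("mysql", 0.5)]}

plain = search_resumes("java", resume_features, associations)
expanded = search_resumes("java", resume_features, associations,
                          use_association=True, top_n=2)
# plain finds only r1; expanded also finds r2 (via "spring") and r4 (via "mysql").
```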
The method of the present invention is based on the efficient semantic analysis of the LAS (latent semantic analysis) algorithm and on association matching of resume features by the Apriori algorithm. It extracts the key information from the content of a large volume of personnel resumes and, through association analysis, obtains the highly associated items across all resumes together with their association degrees, thereby providing a method that improves the matching degree and the efficiency of personnel resume search.
Claims (6)
1. A method for improving search matching degree in a large volume of personnel resumes, characterized in that: the method is based on the LAS algorithm and the Apriori algorithm; the semantic analysis of the LAS algorithm is used to obtain the representative words of the large volume of personnel resumes; the Apriori algorithm is used to perform association matching on the resume features; and association analysis yields the highly associated items across all personnel resumes together with their association degrees.
2. The method according to claim 1, characterized in that the method comprises the following specific steps:
Step 1: obtain the content of a large volume of personnel resumes and delete the punctuation marks from each one, forming resume content that can be used as input to the LAS algorithm;
Step 2: take each processed resume as input and build a word-document matrix over the personnel resumes; analyzing each resume yields multiple representative words and the number of occurrences of each word, counted by occurrence;
Step 3: process the counted word frequencies of each resume, removing representative words whose probability of co-occurring across all resumes is less than 10%; store the remaining representative words of each resume against that resume in a one-to-many manner, forming the features of each resume;
Step 4: take the resulting word frequencies of each resume as features and input them together into the constructed Apriori algorithm for association analysis; once all resumes have been input, obtain the information on all association degrees;
Step 5: after the association analysis of the personnel resumes, obtain the resulting association degrees; for the data on other related words that occur with a search term, combine both and sort by association coefficient from largest to smallest, then save each association coefficient together with its associated item;
Step 6: when searching personnel resumes, after a keyword is input, obtain the other associated keywords from the saved association ranking, taking those with the highest association degree first; then search for the relevant personnel resumes, which expands the scope of the search.
3. The method according to claim 2, characterized in that:
in step 1, the content of the personnel resumes is obtained by web crawler or from centralized documents.
4. The method according to claim 2, characterized in that:
in step 4, the highly associated information is obtained and saved; the constructed Apriori instance can also be saved, so that it can be reused later when new personnel resumes are added.
5. The method according to claim 2, characterized in that:
in step 5, the calculation of the association degree covers the number of times an association occurs and the confidence with which it occurs, the confidence being calculated with a Bayesian algorithm;
what is saved must include the specific associated items and their association degrees, as well as the association confidence between each associated item and the other associated items.
6. The method according to claim 2, characterized in that:
the specific steps of step 6 are as follows:
1) after entering the keyword of the personnel resumes to be searched, choose whether to perform an association search and the number of associations, and submit;
2) if no association search is selected, only query all personnel resumes that match the input keyword and return the data;
3) if association search is selected, before searching, use the input keyword to look up the key items and association degrees obtained from the association analysis, take the top-ranked associated items up to the chosen number of associations, and search the personnel resumes for all of the key items so obtained;
4) beneath each personnel resume found, indicate the associated item through which it was found and its association degree, providing a guide for association search.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811542296.4A CN109710843A (en) | 2018-12-17 | 2018-12-17 | A method of improving search matching degree in big quantity personnel resume |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811542296.4A CN109710843A (en) | 2018-12-17 | 2018-12-17 | A method of improving search matching degree in big quantity personnel resume |
Publications (1)
Publication Number | Publication Date |
---|---|
CN109710843A true CN109710843A (en) | 2019-05-03 |
Family ID: 66256672
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201811542296.4A Withdrawn CN109710843A (en) | 2018-12-17 | 2018-12-17 | A method of improving search matching degree in big quantity personnel resume |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109710843A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117312397A (en) * | 2023-10-18 | 2023-12-29 | 广东倍智人才科技股份有限公司 | Talent supply chain management method and system based on big data |
Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106980961A (en) * | 2017-03-02 | 2017-07-25 | 中科天地互联网科技(苏州)有限公司 | A kind of resume selection matching process and system |
Patent Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106980961A (en) * | 2017-03-02 | 2017-07-25 | 中科天地互联网科技(苏州)有限公司 | A kind of resume selection matching process and system |
Non-Patent Citations (1)
Title |
---|
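Xiao Yunpeng (肖云鹏) et al.: "Reflections on a 'school-enterprise-student' service model that uses recruitment and employment big data to give feedback to teaching" (以招聘就业大数据为基础反馈教学的‘校企学’服务模式思考), Contemporary Education Practice and Teaching Research (《当代教育实践与教学研究》) * |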
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117312397A (en) * | 2023-10-18 | 2023-12-29 | 广东倍智人才科技股份有限公司 | Talent supply chain management method and system based on big data |
CN117312397B (en) * | 2023-10-18 | 2024-03-22 | 广东倍智人才科技股份有限公司 | Talent supply chain management method and system based on big data |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109815308B (en) | Method and device for determining intention recognition model and method and device for searching intention recognition | |
CN109447266B (en) | Agricultural scientific and technological service intelligent sorting method based on big data | |
CN102902821B (en) | The image high-level semantics mark of much-talked-about topic Network Based, search method and device | |
CN104199857B (en) | A kind of tax document hierarchy classification method based on multi-tag classification | |
US8983971B2 (en) | Method, apparatus, and system for mobile search | |
CN104933100B (en) | keyword recommendation method and device | |
CN105045875B (en) | Personalized search and device | |
CN105808590B (en) | Search engine implementation method, searching method and device | |
CN105653840A (en) | Similar case recommendation system based on word and phrase distributed representation, and corresponding method | |
CN110781317A (en) | Method and device for constructing event map and electronic equipment | |
CN103838833A (en) | Full-text retrieval system based on semantic analysis of relevant words | |
CN106874441A (en) | Intelligent answer method and apparatus | |
CN105095433A (en) | Recommendation method and device for entities | |
CN107329995A (en) | A kind of controlled answer generation method of semanteme, apparatus and system | |
CN104794154A (en) | O2O service quality evaluation model for medical apparatus based on text mining | |
Du et al. | An approach for selecting seed URLs of focused crawler based on user-interest ontology | |
CN104484380A (en) | Personalized search method and personalized search device | |
CN112307182B (en) | Question-answering system-based pseudo-correlation feedback extended query method | |
CN109948154B (en) | Character acquisition and relationship recommendation system and method based on mailbox names | |
CN106407377A (en) | Search method and device based on artificial intelligence | |
CN104281565A (en) | Semantic dictionary constructing method and device | |
CN106599174A (en) | Real-time news recommendation system and method thereof | |
CN110866102A (en) | Search processing method | |
CN115640458A (en) | Remote sensing satellite information recommendation method, system and equipment | |
CN112685440B (en) | Structural query information expression method for marking search semantic role |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
WW01 | Invention patent application withdrawn after publication | ||
Application publication date: 20190503 |