WO2017215242A1 - Method and device for searching resumes - Google Patents

Method and device for searching resumes

Info

Publication number
WO2017215242A1
WO2017215242A1 PCT/CN2016/113140 CN2016113140W WO2017215242A1 WO 2017215242 A1 WO2017215242 A1 WO 2017215242A1 CN 2016113140 W CN2016113140 W CN 2016113140W WO 2017215242 A1 WO2017215242 A1 WO 2017215242A1
Authority
WO
WIPO (PCT)
Prior art keywords
related word
keyword
resume
words
word
Prior art date
Application number
PCT/CN2016/113140
Other languages
French (fr)
Chinese (zh)
Inventor
李贤�
Original Assignee
广州视源电子科技股份有限公司
Priority date
Filing date
Publication date
Application filed by 广州视源电子科技股份有限公司
Publication of WO2017215242A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2458 Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F16/2462 Approximate or statistical queries
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/242 Query formulation
    • G06F16/2425 Iterative querying; Query formulation based on the results of a preceding query

Definitions

  • The invention relates to the field of computer information retrieval, and in particular to a resume search method and device.
  • When the user supplies the related words, this increases the number of words the user must enter and makes for a poor user experience. Moreover, because the related words come from the user, they may be out of date: as language is used and extended, the related words that grow out of a keyword change over time, and so does the degree to which each related word is related to the keyword. If the related words lag behind current usage, the resumes retrieved with them may well not meet employers' current needs for talent.
  • An embodiment of the invention provides a resume search method that supplies the user with accurate related words and improves the accuracy of the resume search.
  • Obtaining from the entry database, according to the keyword, the candidate related word set of the keyword (the set of related words to be verified) and the control word set of each candidate related word in that set, and comparing the control word set of each candidate related word with the candidate related word set to obtain the related word set and the relevance of each related word in it, is specifically:
  • for each candidate related word in the candidate related word set, entries containing that candidate related word are obtained from the entry database,
  • and those entries are segmented into words and screened to obtain the control word set of the candidate related word;
  • when the size of the intersection of a candidate related word's control word set and the candidate related word set (the "absolute value" of the intersection) is greater than a screening threshold, the candidate related word is determined to be a related word of the keyword, and the related word set is thereby obtained; the size of the intersection is used as the relevance of the related word to the keyword.
  • Calculating the weight of the keyword and the weight of each related word in the resume search is specifically:
  • The base score S_i of the i-th related word in the related word set is calculated according to the base score formula [formula image], where r_i is the relevance of the i-th related word in the related word set, r_min is the minimum relevance among all related words in the related word set, and r_max is the maximum relevance among all related words in the related word set;
  • The method further includes:
  • Submitting the keyword, the weight of the keyword, each related word, and the weight of each related word to a search engine so as to search for resumes in the resume database is specifically:
  • The keyword and each related word contained in a resume are displayed in highlighted form.
  • An embodiment of the present invention further provides a resume search device, including:
  • a receiving module configured to receive a keyword used for searching for resumes in a resume database;
  • a related word set obtaining module configured to obtain from the entry database, according to the keyword, the candidate related word set of the keyword and the control word set of each candidate related word in that set, and to compare the control word set of each candidate related word with the candidate related word set to obtain the related word set and the relevance of each related word in it, where the relevance expresses how closely a related word is related to the keyword;
  • a weight calculation module configured to calculate, according to the relevance of each related word, the weight of the keyword and the weight of each related word in the resume search;
  • a search module configured to submit the keyword, the weight of the keyword, each related word, and the weight of each related word to a search engine, so as to search for resumes in the resume database.
  • The related word set obtaining module specifically includes:
  • a related word set unit configured to obtain entries containing the keyword from the entry database according to the keyword, and to segment those entries into words and screen them to obtain the candidate related word set;
  • a comparison word set unit configured, for each candidate related word in the candidate related word set, to obtain entries containing that candidate related word from the entry database, and to segment and screen those entries to obtain the control word set of the candidate related word;
  • a determining unit configured to determine, when the size of the intersection of a candidate related word's control word set and the candidate related word set is greater than a screening threshold, that the candidate related word is a related word of the keyword, thereby obtaining the related word set; the size of the intersection is used as the relevance of the related word to the keyword.
  • The weight calculation module specifically includes:
  • a base score calculation unit configured to calculate the base score S_i of the i-th related word in the related word set according to the base score formula [formula image], where r_i is the relevance of the i-th related word in the related word set, r_min is the minimum relevance among all related words in the related word set, and r_max is the maximum relevance among all related words in the related word set;
  • a sum calculation unit configured to sum the base scores of all related words to obtain the base score total, sum;
  • a related word weight calculation unit configured to determine whether the base score of the i-th related word is greater than W_A/n; if so, to calculate the weight W_i of the i-th related word in the resume search according to the second weight formula [formula image]; if not, to calculate W_i according to the third weight formula [formula image]; where n is a weight coefficient and k is the number of related words in the related word set.
  • The weight calculation module further includes:
  • a mean value calculation unit configured to calculate, before it is determined whether the base score of the i-th related word is greater than W_A/n, the base score mean according to the mean formula [formula image], in which [symbol image] is the average relevance of all related words in the related word set;
  • a determining unit configured to determine, for the i-th related word in the related word set, whether its base score S_i is greater than the base score mean;
  • an update unit configured to update the base score S_i of the i-th related word by the update formula [formula image] when S_i is determined to be greater than the base score mean.
  • The search module specifically includes:
  • a weight associating unit configured to associate, according to the weight association format of the search engine, the keyword with its weight as a first combination, and each related word with its weight as a second combination;
  • a search display unit configured to submit the first combination and the second combination to the search engine, so that the search engine searches for resumes in the resume database, and to display the resumes found according to the sorting algorithm built into the search engine; the keyword and each related word contained in a resume are displayed in highlighted form.
  • The user only needs to enter a keyword to obtain related words for it, together with the degree of relevance of each related word to the keyword, and the related words obtained are up to date. This avoids retrieving resumes that correspond to outdated skills in the keyword's field.
  • Resumes matching the field of the keyword can be found more accurately during retrieval, so the retrieved resumes are more reasonable; that is, the accuracy of the resume search is improved.
  • FIG. 1 is a schematic flowchart of an embodiment of the resume search method provided by the present invention;
  • FIG. 2 is a schematic flowchart of an embodiment of step S2 of the resume search method of FIG. 1;
  • FIG. 3 is a schematic structural diagram of an embodiment of the resume search device provided by the present invention;
  • FIG. 4 is a schematic structural diagram of an embodiment of the related word set obtaining module of the resume search device provided by the present invention;
  • FIG. 5 is a schematic structural diagram of an embodiment of the weight calculation module of the resume search device provided by the present invention;
  • FIG. 6 is a schematic structural diagram of an embodiment of the search module of the resume search device provided by the present invention.
  • FIG. 1 is a schematic flowchart of an embodiment of the resume search method provided by the present invention; the method includes steps S1 to S4, as follows:
  • FIG. 2 is a schematic flowchart of an embodiment of step S2 of the resume search method of FIG. 1; step S2 is implemented as follows:
  • Entries that contain the keyword Java and rank before the M-th position are obtained from the paper database; for example, the abstracts on the first 50 result pages are used as entries, or a wiki is searched and the first 500 abstracts for the keyword Java are used;
  • The entries are formatted according to a standard entry format, for example by converting lowercase letters to uppercase, deleting extra spaces, unifying punctuation, or unifying full-width and half-width characters into a single form.
  • A word segmentation tool is called; preferably this is the jieba word segmentation tool, although the method is not limited to it.
  • Words that belong to the core words of the user dictionary are extracted from the first word set as candidate related words {a_1, ..., a_n}, giving the candidate related word set A{a_1, ..., a_n}.
  • The user dictionary may be added through the word segmentation tool or through the resume search device; the core words it provides are used to extract core words from the first word set as candidate related words.
  • Step S22 is carried out in the same way as step S21, except that the keyword of step S21 is replaced by each candidate related word in {a_1, ..., a_n};
  • the candidate related word set B_ai{b_i1, ..., b_in} obtained for the candidate related word a_i serves as the control word set of a_i. The details are therefore not repeated here.
  • The related word set of the keyword is thus obtained: the related words of the keyword are compared and matched against the related words of those related words in order to select related words, and the relevance of each selected word is determined from the matching value (the absolute value above). This filters out noise words unrelated to the keyword and improves the efficiency of acquiring related words.
  • Because the entry database, preferably a paper database, is updated in real time, the related words obtained from it are both up to date and able to extend and describe the keyword.
  • Step S3 is:
  • The base score S_i of the i-th related word in the related word set is calculated according to the base score formula [formula image], where r_i is the relevance of the i-th related word in the related word set, r_min is the minimum relevance among all related words in the related word set, and r_max is the maximum relevance among all related words in the related word set;
  • The purpose of calculating a base score for each related word in the related word set is to put the relevance of all related words on the same baseline and to express the degree of relevance between each related word and the keyword as a score.
  • The reference value is preferably r_max - r_min + 1. The first weight formula is logarithmic, so even when the base score total grows quickly the keyword weight grows only moderately; this avoids an excessive keyword weight when there are many related words. The weights of related words are calculated with the second and third weight formulas respectively, to allow for cases in which the weight of a related word would otherwise be greater than, or much smaller than, the weight of the keyword; during the resume search this avoids both related words being drowned out by the keyword and the keyword being too prominent. The weight coefficient can be set according to the actual situation and is generally 2 or 4.
  • The method further includes:
  • The relevance of the related words in the related word set is put on the same baseline as above, the purpose being to express the average degree of relevance between the related word set and the keyword as a score.
  • When the degree of relevance between a related word and the keyword (its base score) is greater than the average degree of relevance between the related word set and the keyword (the base score mean), the difference between the two is magnified 10 times and used as the base score of that related word; otherwise the related word keeps its original base score. This expresses the relevance of related words to the keyword more accurately and makes the relationship between related words and the keyword more reasonable.
  • Step S4 is:
  • The keyword and each related word contained in a resume are displayed in highlighted form.
  • A commonly used search engine is the Solr search engine, and the weight association format above can follow this pattern: keyword^keyword weight, related word 1^related word 1 weight, related word 2^related word 2 weight, ...
  • The user only needs to enter a keyword to obtain related words for it, together with the degree of relevance of each related word to the keyword, and the related words obtained are up to date.
  • Resumes corresponding to outdated skills in the keyword's field are thereby avoided.
  • Resumes matching the field of the keyword can be found more accurately during the search, so the retrieved resumes are more reasonable; that is, the accuracy of the resume search is improved.
  • The resume search device can implement the entire flow of the resume search method, and is structured as follows:
  • the receiving module 10 is configured to receive a keyword used for searching for resumes in a resume database;
  • the related word set obtaining module 20 is configured to obtain from the entry database, according to the keyword, the candidate related word set of the keyword and the control word set of each candidate related word in that set, and to compare the control word set of each candidate related word with the candidate related word set to obtain the related word set and the relevance of each related word in it, where the relevance expresses how closely a related word is related to the keyword;
  • the weight calculation module 30 is configured to calculate, according to the relevance of each related word, the weight of the keyword and the weight of each related word in the resume search;
  • the search module 40 is configured to submit the keyword, the weight of the keyword, each related word, and the weight of each related word to the search engine, so as to search for resumes in the resume database.
  • FIG. 4 is a schematic structural diagram of an embodiment of the related word set obtaining module of the resume search device provided by the present invention.
  • The related word set obtaining module 20 specifically includes:
  • the related word set unit 21, configured to obtain entries containing the keyword from the entry database according to the keyword, and to segment those entries into words and screen them to obtain the candidate related word set;
  • the comparison word set unit 22, configured, for each candidate related word in the candidate related word set, to obtain entries containing that candidate related word from the entry database, and to segment and screen those entries to obtain the control word set of the candidate related word;
  • the determining obtaining unit 23, configured to determine, when the size of the intersection of a candidate related word's control word set and the candidate related word set is greater than a screening threshold, that the candidate related word is a related word of the keyword, thereby obtaining the related word set; the size of the intersection is used as the relevance of the related word to the keyword.
  • FIG. 5 is a schematic structural diagram of an embodiment of the weight calculation module of the resume search device provided by the present invention.
  • The weight calculation module 30 specifically includes:
  • the base score calculation unit 31, configured to calculate the base score S_i of the i-th related word in the related word set according to the base score formula [formula image], where r_i is the relevance of the i-th related word in the related word set, r_min is the minimum relevance among all related words in the related word set, and r_max is the maximum relevance among all related words in the related word set;
  • the sum calculation unit 32, configured to sum the base scores of all related words to obtain the base score total, sum;
  • the related word weight calculation unit 34, configured to determine whether the base score of the i-th related word is greater than W_A/n; if so, to calculate the weight W_i of the i-th related word in the resume search according to the second weight formula [formula image]; if not, to calculate W_i according to the third weight formula [formula image]; where n is a second weight coefficient and k is the number of related words in the related word set.
  • The weight calculation module 30 further includes:
  • the mean value calculating unit 35, configured to calculate, before it is determined whether the base score of the i-th related word is greater than W_A/n, the base score mean according to the mean formula [formula image], in which [symbol image] is the average relevance of all related words in the related word set;
  • the determining unit 36, configured to determine, for the i-th related word in the related word set, whether its base score S_i is greater than the base score mean;
  • the updating unit 37, configured to update the base score S_i of the i-th related word by the update formula [formula image] when S_i is determined to be greater than the base score mean.
  • The search module 40 specifically includes:
  • the weight associating unit 41, configured to associate, according to the weight association format of the search engine, the keyword with its weight as a first combination, and each related word with its weight as a second combination;
  • the search display unit 42, configured to submit the first combination and the second combination to the search engine, so that the search engine searches for resumes in the resume database, and to display the resumes found according to the sorting algorithm built into the search engine; the keyword and each related word contained in a resume are displayed in highlighted form.
  • With the resume search device provided by the embodiment of the present invention, entering a keyword is enough to obtain related words for it, together with the degree of relevance of each related word to the keyword, and the related words obtained are up to date. This avoids retrieving resumes that correspond to outdated skills in the keyword's field. In addition, because the keyword and the related words carry different weights during the resume search, resumes matching the field of the keyword can be found more accurately, so the retrieved resumes are more reasonable and the accuracy of the resume search is improved.
  • The storage medium may be a magnetic disk, an optical disk, a read-only memory (ROM), or a random access memory (RAM).
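
The entry-formatting step listed in the definitions above (unifying case, deleting extra spaces, unifying full-width and half-width forms) might be sketched in Python as follows. This is only an illustration under stated assumptions: the function name `normalize_entry` is hypothetical, and the use of Unicode NFKC normalization to fold full-width characters into half-width ones is a choice made here, not something the patent text specifies.

```python
import re
import unicodedata


def normalize_entry(text):
    """Bring an entry into one standard form before word segmentation.

    Assumption: NFKC folding stands in for "unify full-width and
    half-width"; the source does not name a specific technique.
    """
    text = unicodedata.normalize("NFKC", text)  # full-width forms -> half-width
    text = text.upper()                         # unify lowercase into uppercase
    text = re.sub(r"\s+", " ", text).strip()    # delete extra whitespace
    return text


# Full-width "Java", an ideographic space and a full-width comma, written
# as escapes so the sample is unambiguous.
sample = "\uff2a\uff41\uff56\uff41   is\u3000used\uff0cwidely "
normalized = normalize_entry(sample)
print(normalized)
```

Folding everything to one form before segmentation means that the same term written in different widths or cases is counted as a single word when word sets are compared later.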

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Mathematical Physics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Computational Linguistics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Fuzzy Systems (AREA)
  • Software Systems (AREA)

Abstract

A method for searching resumes comprises: receiving keywords for use in searching resumes in a resume database (S1); on the basis of said keywords, separately obtaining from an entry database a keyword-related word set to be verified and a comparative word set for each relevant word to be verified in said word set to be verified, then comparing each such comparative word set for each relevant word to be verified to said word set to be verified and obtaining thereby the degree of relevance of each relevant word in each related word set (S2); on the basis of the degree of relevance of each relevant word, calculating the weight of each keyword and the weight of each relevant word when performing a resume search (S3); forwarding to a search engine the keywords and the weights thereof, and the relevant words and the weights thereof, so as to retrieve resumes from a resume database (S4). A corresponding resume search device is also provided. Use of the present solution provides users with accurate relevant words thus enhancing the precision of resume searches.

Description

Resume search method and device

Technical field
The invention relates to the field of computer information retrieval, and in particular to a resume search method and device.
Background
In the past, two approaches have been used to search and match resumes: direct keyword search, and keyword plus related-word search. In practice, however, the inventor found the following problems with these approaches:
The first approach derives its results only from how often the keyword occurs, so resumes that lean toward skills related to that keyword are ranked first. It is difficult to take the full range of skills in a resume into account, which makes it hard to assess the candidate's overall skills.
In the second approach, the user supplies the related words. On the one hand, this increases the number of words the user must enter and makes for a poor user experience. On the other hand, because the related words come from the user, they may be out of date: as language is used and extended, the related words that grow out of a keyword change over time, and so does the degree to which each related word is related to the keyword. If the related words lag behind current usage, the resumes retrieved with them may well not meet employers' current needs for talent.
Summary of the invention
An embodiment of the invention provides a resume search method that supplies the user with accurate related words and improves the accuracy of the resume search.
A resume search method provided by an embodiment of the present invention includes:
receiving a keyword used for searching for resumes in a resume database;
obtaining from an entry database, according to the keyword, the candidate related word set of the keyword (the set of related words to be verified) and a control word set for each candidate related word in that set, and comparing the control word set of each candidate related word with the candidate related word set to obtain a related word set and the relevance of each related word in it, where the relevance expresses how closely a related word is related to the keyword;
calculating, according to the relevance of each related word, the weight of the keyword and the weight of each related word in the resume search;
submitting the keyword, the weight of the keyword, each related word, and the weight of each related word to a search engine, so as to search for resumes in the resume database.
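
As an illustration of the final step, the sketch below joins the keyword and related words with their weights into a single query string. It assumes a search engine that accepts Solr-style `term^boost` syntax; the function name `build_boosted_query` and all weight values are hypothetical and chosen only for the example.

```python
def build_boosted_query(keyword, keyword_weight, related_weights):
    """Join the keyword and related words into one boosted query string.

    Assumption: the engine accepts "term^boost" pairs separated by spaces.
    """
    parts = [f"{keyword}^{keyword_weight:g}"]
    for word, weight in related_weights.items():
        parts.append(f"{word}^{weight:g}")
    return " ".join(parts)


# Illustrative weights only; they are not taken from the source.
query = build_boosted_query("java", 12.5, {"spring": 6.0, "hibernate": 3.25})
print(query)
```

Keeping the weights in the query itself leaves ranking to the engine's built-in scoring, which is how the method describes the search being carried out.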
Further, obtaining from the entry database, according to the keyword, the candidate related word set of the keyword and the control word set of each candidate related word in that set, and comparing the control word set of each candidate related word with the candidate related word set to obtain the related word set and the relevance of each related word in it, is specifically:
obtaining entries containing the keyword from the entry database according to the keyword, and segmenting those entries into words and screening them to obtain the candidate related word set;
for each candidate related word in the candidate related word set, obtaining entries containing that candidate related word from the entry database, and segmenting and screening those entries to obtain the control word set of the candidate related word;
when the size of the intersection of a candidate related word's control word set and the candidate related word set (the "absolute value" of the intersection) is greater than a screening threshold, determining that the candidate related word is a related word of the keyword, thereby obtaining the related word set; the size of the intersection is used as the relevance of the related word to the keyword.
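
The screening rule just described can be sketched as a set intersection. This is a minimal illustration that reads the "absolute value of the intersection" as the number of shared words; the function name `select_related_words`, the sample word sets, and the threshold value are all hypothetical.

```python
def select_related_words(candidate_set, control_sets, threshold):
    """Return {related_word: relevance} for candidates that pass the screen.

    candidate_set: candidate related words obtained for the keyword.
    control_sets:  maps each candidate to its own control word set.
    threshold:     screening threshold on the intersection size.
    """
    related = {}
    for word in candidate_set:
        control = control_sets.get(word, set())
        relevance = len(control & candidate_set)  # size of the intersection
        if relevance > threshold:
            related[word] = relevance
    return related


# Made-up word sets for a keyword such as "java".
candidates = {"spring", "jvm", "hibernate", "coffee"}
controls = {
    "spring": {"jvm", "hibernate", "bean", "spring"},
    "jvm": {"spring", "bytecode", "hibernate"},
    "hibernate": {"spring", "jvm", "orm"},
    "coffee": {"bean", "espresso"},
}
result = select_related_words(candidates, controls, threshold=1)
print(result)
```

In this toy data "coffee" shares no words with the other candidates and is dropped as noise, while the three technical terms reinforce one another and are kept.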
Further, calculating, according to the relevance of each related word, the weight of the keyword and the weight of each related word in the resume search is specifically:
calculating the base score S_i of the i-th related word in the related word set according to the base score formula
Figure PCTCN2016113140-appb-000001
where r_i is the relevance of the i-th related word in the related word set, r_min is the minimum relevance among all related words in the related word set, and r_max is the maximum relevance among all related words in the related word set;
summing the base scores of all related words to obtain the base score total, sum;
calculating the weight W_A of the keyword in the resume search according to the first weight formula W_A = 5 + log_1.5(sum + 1), where A is the keyword;
determining whether the base score of the i-th related word is greater than W_A/n; if so, calculating the weight W_i of the i-th related word in the resume search according to the second weight formula
Figure PCTCN2016113140-appb-000002
and if not, calculating W_i according to the third weight formula
Figure PCTCN2016113140-appb-000003
where n is a weight coefficient and k is the number of related words in the related word set.
Still further, before determining whether the reference score of the i-th related word is greater than $W_A/n$, the method further comprises:
Calculating the reference score mean (Figure PCTCN2016113140-appb-000005) according to the mean formula
Figure PCTCN2016113140-appb-000004
where the symbol shown in Figure PCTCN2016113140-appb-000006 is the average relevance of all related words in the related word set;
For the i-th related word in the related word set, determining whether its reference score $S_i$ is greater than the reference score mean (Figure PCTCN2016113140-appb-000007);
If so, updating the reference score $S_i$ of the i-th related word by the update formula
Figure PCTCN2016113140-appb-000008
Furthermore, submitting the keyword, the weight value of the keyword, each related word and the weight value of each related word to a search engine, so as to search out resumes from the resume database, specifically comprises:
Associating, according to the weight association format of the search engine, the weight value of the keyword with the keyword as a first combination, and associating the weight value of each related word with its corresponding related word as a second combination;
Submitting the first combination and the second combination to the search engine, so that the search engine searches out resumes from the resume database and displays them according to the ranking algorithm built into the search engine; wherein the keyword and each related word contained in the resumes are displayed highlighted.
Correspondingly, an embodiment of the present invention further provides a resume search device, comprising:
A receiving module, configured to receive a keyword for searching resumes in a resume database;
A related word set obtaining module, configured to obtain from an entry database, according to the keyword, a candidate related word set of the keyword and a comparison word set of each candidate related word in the candidate related word set, and to compare the comparison word set of each candidate related word with the candidate related word set to obtain a related word set and the relevance of each related word in the related word set; wherein the relevance indicates the degree to which a related word is related to the keyword;
A weight calculation module, configured to calculate, according to the relevance of each related word, the weight value of the keyword and the weight value of each related word in the resume search;
A search module, configured to submit the keyword, the weight value of the keyword, each related word and the weight value of each related word to a search engine, so as to search out resumes from the resume database.
Further, the related word set obtaining module specifically comprises:
A candidate related word set unit, configured to obtain, according to the keyword, entries containing the keyword from the entry database, and to perform word segmentation and screening on the entries to obtain the candidate related word set;
A comparison word set unit, configured to, for each candidate related word in the candidate related word set, obtain entries containing the candidate related word from the entry database, and perform word segmentation and screening on those entries to obtain the comparison word set of the candidate related word;
A determining and obtaining unit, configured to, when it is determined that the absolute value of the intersection of the comparison word set of the candidate related word and the candidate related word set is greater than a screening threshold, take the candidate related word as a related word of the keyword, thereby obtaining the related word set; wherein the absolute value serves as the relevance of the related word to the keyword.
Still further, the weight calculation module specifically comprises:
A reference score calculation unit, configured to calculate the reference score $S_i$ of the i-th related word in the related word set according to the reference score formula
Figure PCTCN2016113140-appb-000009
where $r_i$ is the relevance of the i-th related word in the related word set, $r_{\min}$ is the minimum relevance over all related words in the related word set, and $r_{\max}$ is the maximum relevance over all related words in the related word set;
A summation unit, configured to sum the reference scores of all the related words to obtain the reference score total $\mathit{sum}$;
A keyword weight calculation unit, configured to calculate the weight value $W_A$ of the keyword in the resume search according to the first weight formula $W_A = 5 + \log_{1.5}(\mathit{sum} + 1)$, where A is the keyword;
A related word weight calculation unit, configured to determine whether the reference score of the i-th related word is greater than $W_A/n$; if so, to calculate the weight value $W_i$ of the i-th related word in the resume search according to the second weight formula
Figure PCTCN2016113140-appb-000010
and if not, to calculate the weight value $W_i$ of the i-th related word in the resume search according to the third weight formula
Figure PCTCN2016113140-appb-000011
where n is the weight coefficient and k is the number of related words in the related word set.
Still further, the weight calculation module further comprises:
A mean calculation unit, configured to calculate, before determining whether the reference score of the i-th related word is greater than $W_A/n$, the reference score mean (Figure PCTCN2016113140-appb-000013) according to the mean formula
Figure PCTCN2016113140-appb-000012
where the symbol shown in Figure PCTCN2016113140-appb-000014 is the average relevance of all related words in the related word set;
A determining unit, configured to determine, for the i-th related word in the related word set, whether its reference score $S_i$ is greater than the reference score mean (Figure PCTCN2016113140-appb-000015);
An updating unit, configured to update the reference score $S_i$ of the i-th related word, when it is determined to be greater than the reference score mean (Figure PCTCN2016113140-appb-000016), by the update formula
Figure PCTCN2016113140-appb-000017
Furthermore, the search module specifically comprises:
A weight association unit, configured to associate, according to the weight association format of the search engine, the weight value of the keyword with the keyword as a first combination, and to associate the weight value of each related word with its corresponding related word as a second combination;
A search and display unit, configured to submit the first combination and the second combination to the search engine, so that the search engine searches out resumes from the resume database and displays them according to the ranking algorithm built into the search engine; wherein the keyword and each related word contained in the resumes are displayed highlighted.
Implementing the embodiments of the present invention has the following beneficial effects:
With the resume search method and device provided by the embodiments of the present invention, a user only needs to enter a keyword to obtain the related words of that keyword and the degree to which each is related to it. This satisfies the timeliness requirement on related words, so that resumes reflecting outdated skills in the keyword's field are avoided during retrieval. In addition, by calculating the weight values of the keyword and the related words in the resume search, resumes matching the keyword's field can be found more accurately, which makes the retrieved resumes more reasonable, that is, improves the accuracy of resume search.
Brief Description of the Drawings
FIG. 1 is a schematic flowchart of an embodiment of the resume search method provided by the present invention;
FIG. 2 is a schematic flowchart of an embodiment of step S2 of the resume search method shown in FIG. 1;
FIG. 3 is a schematic structural diagram of an embodiment of the resume search device provided by the present invention;
FIG. 4 is a schematic structural diagram of an embodiment of the related word set obtaining module of the resume search device provided by the present invention;
FIG. 5 is a schematic structural diagram of an embodiment of the weight calculation module of the resume search device provided by the present invention;
FIG. 6 is a schematic structural diagram of an embodiment of the search module of the resume search device provided by the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present invention without creative effort fall within the scope of protection of the present invention.
Referring to FIG. 1, a schematic flowchart of an embodiment of the resume search method provided by the present invention, the method comprises steps S1 to S4, as follows:
S1: Receiving a keyword for searching resumes in a resume database;
S2: Obtaining from an entry database, according to the keyword, a candidate related word set of the keyword and a comparison word set of each candidate related word in the candidate related word set, and comparing the comparison word set of each candidate related word with the candidate related word set to obtain a related word set and the relevance of each related word in the related word set; wherein the relevance indicates the degree to which a related word is related to the keyword;
S3: Calculating, according to the relevance of each related word, the weight value of the keyword and the weight value of each related word in the resume search;
S4: Submitting the keyword, the weight value of the keyword, each related word and the weight value of each related word to a search engine, so as to search out resumes from the resume database.
Further, taking the keyword Java as an example, the implementation of step S2 above is described in detail below with reference to FIG. 2, which is a schematic flowchart of an embodiment of step S2 of the resume search method shown in FIG. 1:
S21: According to the keyword Java, entries containing the keyword Java are obtained from an entry database (preferably a database containing papers, for example CNKI), and word segmentation and screening are performed on the entries to obtain the candidate related word set. Step S21 is implemented as follows:
Using a search engine, entries that contain the keyword Java and rank before the M-th position are obtained from the paper database; for example, the paper abstracts on the first 50 result pages are taken as entries, or the first 500 abstracts returned by searching a wiki for the keyword Java are taken as entries;
The entries are reformatted according to a standard entry format; for example, lowercase letters are converted to uppercase, redundant spaces are deleted, punctuation marks are unified, and full-width and half-width forms are unified into one of the two;
A word segmentation tool is invoked; preferably, the tool is the jieba word segmentation tool, but it is not limited to that tool;
The reformatted entries are segmented with the word segmentation tool to obtain a first word set;
Words that are core words of the user dictionary are extracted from the first word set as candidate related words $\{a_1, \ldots, a_n\}$, giving the candidate related word set $A = \{a_1, \ldots, a_n\}$. It should be noted that the user dictionary can be added through the word segmentation tool or through the resume search device, and the core words provided by the dictionary are used to extract core words from the first word set as candidate related words.
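The format adjustment and core-word extraction of step S21 can be sketched as follows. This is a minimal illustration and not the patented implementation: the function names, the stand-in tokenizer `simple_tokenize` and the sample core words are all assumptions made for this example, and a real system would pass in a proper word segmentation tool such as jieba instead of the regular-expression tokenizer.

```python
import re
import unicodedata


def normalize_entry(text):
    """Apply the format adjustments described for step S21 to one entry."""
    # NFKC folds full-width letters, digits and punctuation to half-width forms.
    text = unicodedata.normalize("NFKC", text)
    # Unify case by converting lowercase letters to uppercase.
    text = text.upper()
    # Collapse runs of whitespace and trim the ends.
    return re.sub(r"\s+", " ", text).strip()


def simple_tokenize(text):
    """Hypothetical stand-in tokenizer: runs of word characters, '+' or '#'."""
    return re.findall(r"[\w+#]+", text)


def candidate_related_words(entries, tokenize, core_words):
    """Return the candidate related word set A for a list of entries.

    `tokenize` is any callable str -> list of str; `core_words` plays the
    role of the user dictionary. Both are supplied by the caller.
    """
    core = {normalize_entry(word) for word in core_words}
    candidates = set()
    for entry in entries:
        for token in tokenize(normalize_entry(entry)):
            if token in core:
                candidates.add(token)
    return candidates
```

Normalizing the dictionary words with the same function as the entries keeps the membership test consistent; whether the keyword itself should be excluded from the candidates is not stated in the text, so the sketch leaves it to the dictionary.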
S22: For each candidate related word in the candidate related word set $A = \{a_1, \ldots, a_n\}$, entries containing that candidate related word are obtained from the entry database, and word segmentation and screening are performed on those entries to obtain the comparison word set of the candidate related word;
It should be noted that step S22 is implemented in the same way as step S21, except that the keyword in step S21 is replaced by each candidate related word in $\{a_1, \ldots, a_n\}$; the candidate related word set $B_{a_i} = \{b_{i1}, \ldots, b_{in}\}$ obtained for candidate related word $a_i$ is then used as the comparison word set of $a_i$, so the details are not repeated here.
S23: When it is determined that the absolute value $r$ (that is, the number of elements) of the intersection of the comparison word set $B_{a_i} = \{b_{i1}, \ldots, b_{in}\}$ of candidate related word $a_i$ and the candidate related word set $A = \{a_1, \ldots, a_n\}$ is greater than the screening threshold $p$, the candidate related word $a_i$ is a related word of the keyword, and the related word set $A' = \{a_j\}$ is obtained, where $j \in \{1, \ldots, n\}$, $|A'| \le n$ and $|A \cap B_{a_j}| > p$; the absolute value $r$ serves as the relevance that the related word has in the related word set.
It should be noted that steps S21, S22 and S23 obtain the related word set of the keyword by comparing and matching the related words of the keyword against the related words of each of those related words, and determine the relevance of each selected related word from the matching value (the absolute value above). This filters out noise words unrelated to the keyword and improves the efficiency of obtaining subordinate related words. In addition, because the entry database, preferably a paper database, is updated in real time, the related words obtained from it are both current and able to extend the description around the keyword.
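The screening rule of step S23 amounts to a set intersection followed by a threshold test. The sketch below assumes the candidate set and the comparison sets have already been built; the function name and the dictionary-based data layout are choices made for this example only.

```python
def select_related_words(candidates, comparison_sets, threshold):
    """Return {related word: relevance} for candidates that pass step S23.

    candidates      -- the candidate related word set A (a Python set)
    comparison_sets -- maps each candidate a_i to its comparison word set B_{a_i}
    threshold       -- the screening threshold p
    """
    related = {}
    for word in candidates:
        # r = |A ∩ B_{a_i}|, the number of words the two sets share.
        overlap = len(candidates & comparison_sets.get(word, set()))
        if overlap > threshold:
            related[word] = overlap
    return related
```

A candidate whose own entries rarely mention the other candidates gets a small overlap and is dropped, which is how unrelated noise words are filtered out.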
Further, step S3 above is implemented as follows:
The reference score $S_i$ of the i-th related word in the related word set is calculated according to the reference score formula
Figure PCTCN2016113140-appb-000018
where $r_i$ is the relevance of the i-th related word in the related word set, $r_{\min}$ is the minimum relevance over all related words in the related word set, and $r_{\max}$ is the maximum relevance over all related words in the related word set;
The reference scores of all the related words are summed to obtain the reference score total $\mathit{sum}$;
The weight value $W_A$ of the keyword in the resume search is calculated according to the first weight formula $W_A = 5 + \log_{1.5}(\mathit{sum} + 1)$, where A is the keyword;
It is determined whether the reference score of the i-th related word is greater than $W_A/n$; if so, the weight value $W_i$ of the i-th related word in the resume search is calculated according to the second weight formula
Figure PCTCN2016113140-appb-000019
If not, the weight value $W_i$ of the i-th related word in the resume search is calculated according to the third weight formula
Figure PCTCN2016113140-appb-000020
where n is the weight coefficient and k is the number of related words in the related word set.
It should be noted that the purpose of calculating a reference score for each related word in the related word set is to put the relevances on a common baseline, so that a single score expresses the degree to which the related word is related to the keyword; the baseline is preferably $r_{\max} - r_{\min} + 1$. The first weight formula is logarithmic, so the weight value of the keyword still grows gently when the reference score total grows quickly; this prevents the keyword from carrying excessive weight when there are relatively many related words. The weights of the related words are calculated with the second and third weight formulas respectively to handle reference scores that are too large or too small: this avoids a related word receiving a weight value greater than that of the keyword, or one far smaller than it. In other words, during resume retrieval the keyword is neither drowned out by too many related words nor made overly prominent. The weight coefficient is set according to the actual situation and generally takes the value 2 or 4.
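The parts of step S3 that are stated in the text, namely the first weight formula and the branch condition $S_i > W_A/n$, can be sketched as below. The reference score, second weight and third weight formulas appear only as figures and are deliberately not reproduced; the function names are illustrative assumptions.

```python
import math


def keyword_weight(reference_scores):
    """First weight formula: W_A = 5 + log_{1.5}(sum + 1)."""
    total = sum(reference_scores)
    return 5 + math.log(total + 1, 1.5)


def uses_second_weight_formula(reference_score, w_a, n):
    """Branch test of step S3: True when S_i > W_A / n, so that the second
    weight formula applies; otherwise the third weight formula applies."""
    return reference_score > w_a / n
```

Because the formula is logarithmic, raising the reference score total from 10 to 100 adds fewer than 6 points to $W_A$, which illustrates the gentle growth described above.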
Still further, in the implementation of step S3 above, before determining whether the reference score of the i-th related word is greater than $W_A/n$, the method further comprises:
Calculating the reference score mean (Figure PCTCN2016113140-appb-000022) according to the mean formula
Figure PCTCN2016113140-appb-000021
where the symbol shown in Figure PCTCN2016113140-appb-000023 is the average relevance of the related words in the related word set;
For the i-th related word in the related word set, determining whether its reference score $S_i$ is greater than the reference score mean (Figure PCTCN2016113140-appb-000024);
If so, updating the reference score $S_i$ of the i-th related word by the update formula
Figure PCTCN2016113140-appb-000025
It should be noted that the mean relevance of the related words in the related word set is expressed on the same baseline as above, so that a single score represents the average degree to which the related word set is related to the keyword. When the degree to which a related word is related to the keyword (its reference score) is greater than the average degree for the related word set (the mean score), the difference between the two is magnified 10 times and used as the reference score of that related word; otherwise, the original reference score is kept. This expresses the degree of relevance between the related word and the keyword more precisely and makes the relationship between the related words and the keyword more reasonable.
Furthermore, step S4 above is implemented as follows:
According to the weight association format of the search engine, the weight value of the keyword is associated with the keyword as a first combination, and the weight value of each related word is associated with its corresponding related word as a second combination;
The first combination and the second combination are submitted to the search engine, so that the search engine searches out resumes from the resume database and displays them according to the ranking algorithm built into the search engine; wherein the keyword and each related word contained in the resumes are displayed highlighted.
It should be noted that the search engine generally used is the Solr search engine, in which case the association can follow the format: keyword^keyword weight, related word 1^related word 1 weight, related word 2^related word 2 weight, and so on.
By implementing the resume search method of the embodiment of the present invention, a user only needs to enter a keyword to obtain the related words of that keyword and the degree to which each is related to it. This satisfies the timeliness requirement on related words, so that resumes reflecting outdated skills in the keyword's field are avoided during retrieval. In addition, by calculating the weight values of the keyword and the related words in the resume search, resumes matching the keyword's field can be found more accurately, which makes the retrieved resumes more reasonable, that is, improves the accuracy of resume search.
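As an illustration of the weight association format, the sketch below joins each term with its weight using the `^` boost operator of Solr's standard query parser. The function name, the space separator between pairs and the sample weights are assumptions for this example; terms that contain spaces or query-syntax characters would need quoting or escaping, which is not handled here.

```python
def build_boosted_query(keyword, keyword_weight, related_weights):
    """Join term^weight pairs (first and second combinations) into one string.

    related_weights maps each related word to its weight value W_i.
    """
    pairs = [(keyword, keyword_weight)] + list(related_weights.items())
    # The 'g' format drops trailing zeros, so 3.0 is written as 3.
    return " ".join(f"{term}^{weight:g}" for term, weight in pairs)
```

For example, `build_boosted_query("Java", 10.5, {"Spring": 3, "JVM": 2.25})` yields a string that places the keyword first, followed by the related words in insertion order.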
Referring to FIG. 3, a schematic structural diagram of an embodiment of the resume search device provided by the present invention, the resume search device can implement the entire flow of the resume search method above, and its structure is as follows:
A receiving module 10, configured to receive a keyword for searching resumes in a resume database;
A related word set obtaining module 20, configured to obtain from an entry database, according to the keyword, a candidate related word set of the keyword and a comparison word set of each candidate related word in the candidate related word set, and to compare the comparison word set of each candidate related word with the candidate related word set to obtain a related word set and the relevance of each related word in the related word set; wherein the relevance indicates the degree to which a related word is related to the keyword;
A weight calculation module 30, configured to calculate, according to the relevance of each related word, the weight value of the keyword and the weight value of each related word in the resume search;
A search module 40, configured to submit the keyword, the weight value of the keyword, each related word and the weight value of each related word to a search engine, so as to search out resumes from the resume database.
Further, referring to FIG. 4, a schematic structural diagram of an embodiment of the related word set obtaining module of the resume search device provided by the present invention, the related word set obtaining module 20 specifically comprises:
A candidate related word set unit 21, configured to obtain, according to the keyword, entries containing the keyword from the entry database, and to perform word segmentation and screening on the entries to obtain the candidate related word set;
A comparison word set unit 22, configured to, for each candidate related word in the candidate related word set, obtain entries containing the candidate related word from the entry database, and perform word segmentation and screening on those entries to obtain the comparison word set of the candidate related word;
A determining and obtaining unit 23, configured to, when it is determined that the absolute value of the intersection of the comparison word set of the candidate related word and the candidate related word set is greater than a screening threshold, take the candidate related word as a related word of the keyword, thereby obtaining the related word set; wherein the absolute value serves as the relevance of the related word to the keyword.
Still further, FIG. 5 is a schematic structural diagram of an embodiment of the weight calculation module of the resume search device provided by the present invention. The weight calculation module 30 specifically includes:
a base score calculation unit 31, configured to calculate, according to the base score formula
Figure PCTCN2016113140-appb-000026
the base score S_i of the i-th related word in the related word set; wherein r_i is the relevance of the i-th related word in the related word set, r_min is the minimum relevance among all related words in the related word set, and r_max is the maximum relevance among all related words in the related word set;
a summation unit 32, configured to sum the base scores of all the related words to obtain the base score total sum;
a keyword weight calculation unit 33, configured to calculate, according to the first weight formula W_A = 5 + log_{1.5}(sum + 1), the weight value W_A of the keyword in the resume search; wherein A is the keyword;
a related word weight calculation unit 34, configured to determine whether the base score of the i-th related word is greater than W_A/n; if so, according to the second weight formula
Figure PCTCN2016113140-appb-000027
to calculate the weight value W_i of the i-th related word in the resume search; and if not, according to the third weight formula
Figure PCTCN2016113140-appb-000028
to calculate the weight value W_i of the i-th related word in the resume search; wherein n is the second weight coefficient, and k is the number of related words in the related word set.
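The parts of this calculation that are given in the text, namely the summation, the first weight formula W_A = 5 + log_{1.5}(sum + 1) and the branch test against W_A/n, can be sketched as below. The base score formula and the second and third weight formulas appear only as figures and are deliberately not reproduced. The function names and the sample base scores are assumptions made for illustration.

```python
import math
from typing import Sequence


def keyword_weight(base_scores: Sequence[float]) -> float:
    """First weight formula: W_A = 5 + log_{1.5}(sum + 1)."""
    total = sum(base_scores)  # unit 32: base score total
    return 5 + math.log(total + 1, 1.5)  # unit 33: logarithm to base 1.5


def uses_second_formula(base_score: float, w_a: float, n: float) -> bool:
    """Branch test of unit 34: True selects the second weight formula,
    False the third. The formulas themselves are not implemented here."""
    return base_score > w_a / n


# Illustrative base scores (made up): they sum to 1.25, so sum + 1 = 2.25 = 1.5**2.
w_a = keyword_weight([0.5, 0.5, 0.25])
```

Because 2.25 is 1.5 squared, the logarithm in this example is 2 and W_A comes out at approximately 7 (up to floating-point rounding). With an empty related word set the sum is 0, so W_A is 5; the logarithmic term only ever adds to that floor as the base scores accumulate.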
Still further, the weight calculation module 30 further includes:
a mean calculation unit 35, configured to, before determining whether the base score of the i-th related word is greater than W_A/n, calculate according to the mean formula
Figure PCTCN2016113140-appb-000029
the base score mean
Figure PCTCN2016113140-appb-000030
wherein
Figure PCTCN2016113140-appb-000031
is the average relevance of all related words in the related word set;
a judging unit 36, configured to determine, for the i-th related word in the related word set, whether its base score S_i is greater than the base score mean
Figure PCTCN2016113140-appb-000032
an updating unit 37, configured to, when the base score S_i of the i-th related word is determined to be greater than the base score mean
Figure PCTCN2016113140-appb-000033
use the update formula
Figure PCTCN2016113140-appb-000034
to update the base score S_i of the i-th related word.
Still further, FIG. 6 is a schematic structural diagram of an embodiment of the search module of the resume search device provided by the present invention. The search module 40 specifically includes:
a weight association unit 41, configured to, according to the weight association format of the search engine, associate the weight value of the keyword with the keyword as a first combination, and associate the weight value of each related word with its corresponding related word as a second combination;
a search and display unit 42, configured to submit the first combination and the second combination to the search engine, so that the search engine retrieves resumes from the resume database and displays the retrieved resumes according to the search engine's built-in ranking algorithm; wherein the keyword and each related word contained in a resume are displayed highlighted.
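The text does not name the search engine or its weight association format. As one possible illustration, the sketch below assumes a Lucene-style query syntax, in which a quoted term followed by `^` and a number carries a boost. The function name `build_boosted_query`, the two-decimal formatting and the use of `OR` to join the combinations are assumptions, not part of the described device.

```python
from typing import Mapping


def build_boosted_query(
    keyword: str,
    keyword_weight: float,
    related_weights: Mapping[str, float],
) -> str:
    """Join the first combination (keyword with its weight) and the second
    combinations (each related word with its weight) into one query string,
    assuming a Lucene-style `"term"^boost` format."""

    def boosted(term: str, weight: float) -> str:
        # Escape backslashes and double quotes so the term stays one quoted phrase.
        escaped = term.replace("\\", "\\\\").replace('"', '\\"')
        return f'"{escaped}"^{weight:.2f}'

    parts = [boosted(keyword, keyword_weight)]  # first combination
    parts += [boosted(t, w) for t, w in related_weights.items()]  # second combinations
    return " OR ".join(parts)


query = build_boosted_query("java", 7.0, {"spring": 2.5, "jvm": 1.0})
# query == '"java"^7.00 OR "spring"^2.50 OR "jvm"^1.00'
```

Ranking and highlighting are left to the search engine itself, as the description states; the sketch only covers how the weighted terms might be packaged before submission.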
With the resume search device provided by the embodiments of the present invention, a user only needs to enter a keyword to obtain the related words associated with that keyword and to determine how relevant each related word is to the keyword. This satisfies the timeliness requirement for related words and keeps resumes that match outdated skills in the keyword's field from appearing during retrieval. In addition, by differentiating the weight values that the keyword and the related words carry in the resume search, resumes matching the keyword's field can be found more accurately, so that the retrieved resumes are more reasonable and the accuracy of the resume search is improved.
A person of ordinary skill in the art will understand that all or part of the processes of the above method embodiments may be implemented by a computer program instructing the relevant hardware. The program may be stored in a computer-readable storage medium and, when executed, may include the processes of the method embodiments described above. The storage medium may be a magnetic disk, an optical disc, a read-only memory (ROM), a random access memory (RAM), or the like.
The above are preferred embodiments of the present invention. It should be noted that a person of ordinary skill in the art may make improvements and refinements without departing from the principles of the present invention, and such improvements and refinements are also regarded as falling within the scope of protection of the present invention.

Claims (10)

  1. A resume search method, characterized by comprising:
    receiving a keyword for searching resumes in a resume database;
    obtaining, according to the keyword, from an entry database a candidate related word set of the keyword and a comparison word set for each candidate related word in the candidate related word set, and comparing the comparison word set of each candidate related word with the candidate related word set to obtain a related word set and the relevance of each related word in the related word set; wherein the relevance indicates the degree to which a related word is related to the keyword;
    calculating, according to the relevance of each related word, the weight value of the keyword and the weight value of each related word in the resume search;
    submitting the keyword, the weight value of the keyword, each related word, and the weight value of each related word to a search engine, so as to retrieve resumes from the resume database.
  2. The resume search method according to claim 1, characterized in that obtaining, according to the keyword, from the entry database the candidate related word set of the keyword and the comparison word set for each candidate related word in the candidate related word set, and comparing the comparison word set of each candidate related word with the candidate related word set to obtain the related word set and the relevance of each related word in the related word set, specifically comprises:
    obtaining, from the entry database, entries containing the keyword, and performing word segmentation and filtering on those entries to obtain the candidate related word set;
    for each candidate related word in the candidate related word set, obtaining from the entry database entries containing that candidate related word, and performing word segmentation and filtering on those entries to obtain the comparison word set of the candidate related word;
    when the absolute value of the intersection of a candidate related word's comparison word set with the candidate related word set is determined to be greater than a screening threshold, taking the candidate related word as a related word of the keyword, thereby obtaining the related word set; wherein the absolute value serves as the relevance of the related word to the keyword.
  3. The resume search method according to claim 2, characterized in that calculating, according to the relevance of each related word, the weight value of the keyword and the weight value of each related word in the resume search specifically comprises:
    according to the base score formula
    Figure PCTCN2016113140-appb-100001
    calculating the base score S_i of the i-th related word in the related word set; wherein r_i is the relevance of the i-th related word in the related word set, r_min is the minimum relevance among all related words in the related word set, and r_max is the maximum relevance among all related words in the related word set;
    summing the base scores of all the related words to obtain the base score total sum;
    calculating, according to the first weight formula W_A = 5 + log_{1.5}(sum + 1), the weight value W_A of the keyword in the resume search; wherein A is the keyword;
    determining whether the base score of the i-th related word is greater than W_A/n; if so, according to the second weight formula
    Figure PCTCN2016113140-appb-100002
    calculating the weight value W_i of the i-th related word in the resume search; and if not, according to the third weight formula
    Figure PCTCN2016113140-appb-100003
    calculating the weight value W_i of the i-th related word in the resume search; wherein n is a weight coefficient, and k is the number of related words in the related word set.
  4. The resume search method according to claim 3, characterized in that, before determining whether the base score of the i-th related word is greater than W_A/n, the method further comprises:
    according to the mean formula
    Figure PCTCN2016113140-appb-100004
    calculating the base score mean
    Figure PCTCN2016113140-appb-100005
    wherein
    Figure PCTCN2016113140-appb-100006
    is the average relevance of all related words in the related word set;
    determining, for the i-th related word in the related word set, whether its base score S_i is greater than the base score mean
    Figure PCTCN2016113140-appb-100007
    and if so, using the update formula
    Figure PCTCN2016113140-appb-100008
    to update the base score S_i of the i-th related word.
  5. The resume search method according to claim 1, characterized in that submitting the keyword, the weight value of the keyword, each related word, and the weight value of each related word to the search engine so as to retrieve resumes from the resume database specifically comprises:
    according to the weight association format of the search engine, associating the weight value of the keyword with the keyword as a first combination, and associating the weight value of each related word with its corresponding related word as a second combination;
    submitting the first combination and the second combination to the search engine, so that the search engine retrieves resumes from the resume database and displays the retrieved resumes according to the search engine's built-in ranking algorithm; wherein the keyword and each related word contained in a resume are displayed highlighted.
  6. A resume search device, characterized by comprising:
    a receiving module, configured to receive a keyword for searching resumes in a resume database;
    a related word set acquisition module, configured to obtain, according to the keyword, from an entry database a candidate related word set of the keyword and a comparison word set for each candidate related word in the candidate related word set, and to compare the comparison word set of each candidate related word with the candidate related word set to obtain a related word set and the relevance of each related word in the related word set; wherein the relevance indicates the degree to which a related word is related to the keyword;
    a weight calculation module, configured to calculate, according to the relevance of each related word, the weight value of the keyword and the weight value of each related word in the resume search;
    a search module, configured to submit the keyword, the weight value of the keyword, each related word, and the weight value of each related word to a search engine, so as to retrieve resumes from the resume database.
  7. The resume search device according to claim 6, characterized in that the related word set acquisition module specifically includes:
    a candidate related word set unit, configured to obtain, from the entry database, entries containing the keyword, and to perform word segmentation and filtering on those entries to obtain the candidate related word set;
    a comparison word set unit, configured to, for each candidate related word in the candidate related word set, obtain from the entry database entries containing that candidate related word, and to perform word segmentation and filtering on those entries to obtain the comparison word set of the candidate related word;
    a determination and acquisition unit, configured to, when the absolute value of the intersection of a candidate related word's comparison word set with the candidate related word set is determined to be greater than a screening threshold, take the candidate related word as a related word of the keyword, thereby obtaining the related word set; wherein the absolute value serves as the relevance of the related word to the keyword.
  8. The resume search device according to claim 7, characterized in that the weight calculation module specifically includes:
    a base score calculation unit, configured to calculate, according to the base score formula
    Figure PCTCN2016113140-appb-100009
    the base score S_i of the i-th related word in the related word set; wherein r_i is the relevance of the i-th related word in the related word set, r_min is the minimum relevance among all related words in the related word set, and r_max is the maximum relevance among all related words in the related word set;
    a summation unit, configured to sum the base scores of all the related words to obtain the base score total sum;
    a keyword weight calculation unit, configured to calculate, according to the first weight formula W_A = 5 + log_{1.5}(sum + 1), the weight value W_A of the keyword in the resume search; wherein A is the keyword;
    a related word weight calculation unit, configured to determine whether the base score of the i-th related word is greater than W_A/n; if so, according to the second weight formula
    Figure PCTCN2016113140-appb-100010
    to calculate the weight value W_i of the i-th related word in the resume search; and if not, according to the third weight formula
    Figure PCTCN2016113140-appb-100011
    to calculate the weight value W_i of the i-th related word in the resume search; wherein n is a weight coefficient, and k is the number of related words in the related word set.
  9. The resume search device according to claim 8, characterized in that the weight calculation module further includes:
    a mean calculation unit, configured to, before determining whether the base score of the i-th related word is greater than W_A/n, calculate according to the mean formula
    Figure PCTCN2016113140-appb-100012
    the base score mean
    Figure PCTCN2016113140-appb-100013
    wherein
    Figure PCTCN2016113140-appb-100014
    is the average relevance of all related words in the related word set;
    a judging unit, configured to determine, for the i-th related word in the related word set, whether its base score S_i is greater than the base score mean
    Figure PCTCN2016113140-appb-100015
    an updating unit, configured to, when the base score S_i of the i-th related word is determined to be greater than the base score mean
    Figure PCTCN2016113140-appb-100016
    use the update formula
    Figure PCTCN2016113140-appb-100017
    to update the base score S_i of the i-th related word.
  10. The resume search device according to claim 6, characterized in that the search module specifically includes:
    a weight association unit, configured to, according to the weight association format of the search engine, associate the weight value of the keyword with the keyword as a first combination, and associate the weight value of each related word with its corresponding related word as a second combination;
    a search and display unit, configured to submit the first combination and the second combination to the search engine, so that the search engine retrieves resumes from the resume database and displays the retrieved resumes according to the search engine's built-in ranking algorithm; wherein the keyword and each related word contained in a resume are displayed highlighted.
PCT/CN2016/113140 2016-06-17 2016-12-29 Method and device for searching resumes WO2017215242A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201610445551.8A CN106126589B (en) 2016-06-17 2016-06-17 Resume search method and device
CN201610445551.8 2016-06-17

Publications (1)

Publication Number Publication Date
WO2017215242A1 true WO2017215242A1 (en) 2017-12-21

Family

ID=57470907

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2016/113140 WO2017215242A1 (en) 2016-06-17 2016-12-29 Method and device for searching resumes

Country Status (2)

Country Link
CN (1) CN106126589B (en)
WO (1) WO2017215242A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112580336A (en) * 2020-12-25 2021-03-30 深圳壹账通创配科技有限公司 Information calibration retrieval method and device, computer equipment and readable storage medium

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106126589B (en) * 2016-06-17 2018-05-22 广州视源电子科技股份有限公司 Resume search method and device
CN106095982B (en) * 2016-06-17 2019-03-29 广州视源电子科技股份有限公司 Resume search method and device
CN106980961A (en) * 2017-03-02 2017-07-25 中科天地互联网科技(苏州)有限公司 A kind of resume selection matching process and system
CN107220234A (en) * 2017-05-17 2017-09-29 东莞市华睿电子科技有限公司 A kind of screening technique of electronics resume
CN107357917B (en) * 2017-07-20 2020-04-07 北京拉勾科技有限公司 Resume searching method and computing device
CN107870976A (en) * 2017-09-25 2018-04-03 平安科技(深圳)有限公司 Resume identification device, method and computer-readable recording medium

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103425687A (en) * 2012-05-21 2013-12-04 阿里巴巴集团控股有限公司 Retrieval method and system based on queries
CN105045781A (en) * 2015-08-27 2015-11-11 广州神马移动信息科技有限公司 Calculation method and device for similarity of query word as well as query word searching method and device
CN105512101A (en) * 2015-11-30 2016-04-20 北大方正集团有限公司 Method and device automatically constructing subject term
CN106126589A (en) * 2016-06-17 2016-11-16 广州视源电子科技股份有限公司 Resume searching method and device

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104216931A (en) * 2013-05-29 2014-12-17 酷盛(天津)科技有限公司 Real-time recommending system and method

Also Published As

Publication number Publication date
CN106126589B (en) 2018-05-22
CN106126589A (en) 2016-11-16

Similar Documents

Publication Publication Date Title
WO2017215242A1 (en) Method and device for searching resumes
WO2019091026A1 (en) Knowledge base document rapid search method, application server, and computer readable storage medium
US9141728B2 (en) Methods and systems for creating and using an adaptive thesaurus
US7769771B2 (en) Searching a document using relevance feedback
JP5638031B2 (en) Rating method, search result classification method, rating system, and search result classification system
KR20160149978A (en) Search engine and implementation method thereof
US8019758B2 (en) Generation of a blended classification model
CN108897887B (en) Teaching resource recommendation method based on knowledge graph and user similarity
CN109213925B (en) Legal text searching method
CN106708929B (en) Video program searching method and device
JP2005122533A (en) Question-answering system and question-answering processing method
CN112328891B (en) Method for training search model, method for searching target object and device thereof
WO2017215243A1 (en) Method and device for searching resumes
US11675845B2 (en) Identifying merchant data associated with multiple data structures
WO2017215245A1 (en) Method and device for searching resumes
JP2000200281A (en) Device and method for information retrieval and recording medium where information retrieval program is recorded
CN112579729A (en) Training method and device for document quality evaluation model, electronic equipment and medium
JP5250009B2 (en) Suggestion query extraction apparatus and method, and program
CN106570196B (en) Video program searching method and device
WO2017215244A1 (en) Method and device for providing relevant words
JP4935243B2 (en) Search program, information search device, and information search method
JP5204203B2 (en) Example translation system, example translation method, and example translation program
JP5179564B2 (en) Query segment position determination device
CN111737413A (en) Feedback model information retrieval method, system and medium based on concept net semantics
US9104755B2 (en) Ontology enhancement method and system

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 16905349

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 16905349

Country of ref document: EP

Kind code of ref document: A1