CN104361102B - Expert recommendation method and system based on group matching - Google Patents

Publication number
CN104361102B
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201410680306.6A
Other languages
Chinese (zh)
Other versions
CN104361102A (en)
Inventor
肖贺
李振华
刘云浩
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tsinghua University
Original Assignee
Tsinghua University
Application filed by Tsinghua University
Priority to CN201410680306.6A
Publication of CN104361102A
Application granted
Publication of CN104361102B


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/951 Indexing; Web crawling techniques

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses an expert recommendation method and system based on group matching, belonging to the field of Internet technology. The method includes: S1: obtaining the web page information of each expert in an expert list through a web crawler; S2: extracting the web page information to obtain the expert academic information of each expert; S3: calculating the matching degree between each expert and each project to be matched according to the expert academic information; S4: determining the experts recommended for the projects to be matched through a dynamic programming algorithm according to the matching degrees and a group matching model. By recommending experts through group matching, the invention improves recommendation efficiency and greatly reduces time overhead. In addition, the matching degree between each expert and each project to be matched also takes into account the social relationship matching degree between them, so that academic corruption is effectively avoided or prevented when experts are recommended.

Description

An expert recommendation method and system based on group matching

Technical field

The invention relates to the field of Internet technology, and in particular to an expert recommendation method and system based on group matching.

Background

The efficiency and quality of research project review have an important influence on the research development level of an institution and even of a country. As a fast and advanced form of review, online review spans the full life cycle of a research or engineering project, from project establishment, application, organization, demonstration, evaluation, and acceptance through awards and filing. Its purpose is to replace traditional manual operations with computer and network systems, thereby reducing review costs and improving work efficiency and review quality, and to use electronic information systems to standardize the review process.

In recent years, the rapid development of new information technologies such as cloud computing, big data, recommender systems, deep learning, and social networks has made intelligent online review possible, and the intelligent expert recommendation system is both the core and the main difficulty of the whole online review process. Here "intelligent" means that the system can not only process and refine internal information (code-based, precise, structured) but also continuously gather external information (semantics-based, fuzzy, unstructured), classify and evaluate the experts themselves through accumulated data, and generate a more instructive intelligent expert database, so that more reasonable recommendation models and algorithms can be constructed. Existing expert recommendation systems, however, recommend experts inefficiently, which leads to excessive time overhead.

Summary of the invention

In view of the above problems, the invention provides an expert recommendation method based on group matching, the method comprising:

S1: Obtain the web page information of each expert in an expert list through a web crawler;

S2: Extract the web page information to obtain the expert academic information of each expert;

S3: Calculate the matching degree between each expert and each project to be matched according to the expert academic information;

S4: Determine the experts recommended for the projects to be matched through a dynamic programming algorithm according to the matching degrees and a group matching model, where the group matching model is the correspondence between the projects to be matched and the recommended experts at which the sum of the matching degrees of the experts recommended for all projects to be matched reaches its maximum.

In step S1, the web page information of each expert in the expert list is obtained by the web crawler according to the expert names in the expert list.

Step S2 specifically includes:

S201: Search the web page information for pages that match both the name and the affiliation of the current expert. If none is found, go to step S202; otherwise extract the expert academic information from the first matching page and go to step S203. The expert list includes the name and affiliation of each expert;

S202: Search the web page information for pages that match the name of the current expert, and extract the expert academic information from the first matching page;

S203: Take an expert in the expert list whose academic information has not yet been extracted as the current expert, and return to step S201.

The expert academic information includes: expert name, affiliation, research field keywords, paper titles, and paper authors.

In step S3, the matching degree between each expert and each project to be matched is calculated from the expert academic information by the following formula:

$M_{i,j} = \alpha \cdot MK_{i,j} + \beta \cdot MJ_{i,j} + \gamma \cdot ML_{i,j} - \delta \cdot MS_{i,j}$

where $M_{i,j}$ is the matching degree between expert $i$ and project $j$ to be matched; $\alpha$, $\beta$, $\gamma$, and $\delta$ are constants; $MK_{i,j}$ is the research field keyword matching degree between expert $i$ and project $j$; $MJ_{i,j}$ is the journal and conference tag matching degree between expert $i$ and project $j$; $ML_{i,j}$ is the academic level matching degree between expert $i$ and project $j$; and $MS_{i,j}$ is the social relationship matching degree between expert $i$ and project $j$.

$MS_{i,j}$ is calculated by the following formula:

where [symbol not reproduced] is a weight value; $\delta_{i,j}$ is the affiliation correlation between expert $i$ and the $v$-th expert, taking the value 1 when the two affiliations are the same and 0 otherwise; $sp$ is the set of papers co-authored by expert $i$ and the $v$-th expert; $n$ is the number of authors of a paper; $t_i$ is the weight of expert $i$; $t_v$ is the weight of the $v$-th expert; $k$ is the index of an applicant of project $j$ to be matched; and $a$ is the number of applicants of project $j$ to be matched.

The group matching model is:

where $c_{i,j}$ is the group matching matrix, whose entry in row $i$ and column $j$ is 1 when expert $i$ is recommended for the $j$-th project and 0 otherwise; $m$ is the total number of projects to be matched; $n$ is the total number of experts; $\varepsilon$ is the maximum number of experts per project to be matched; and $\sigma$ is the maximum number of projects per expert.

In step S4, determining the experts recommended for the projects to be matched through a dynamic programming algorithm according to the matching degrees and the group matching model specifically includes:

S401: Determine the experts corresponding to each project to be matched according to the research field keywords, and sort the experts corresponding to each project by matching degree;

S402: Assign the experts with the highest matching degree to the corresponding project to be matched in turn, until the number of experts assigned to the project reaches the maximum $\varepsilon$ or the number of projects assigned to the expert reaches the maximum $\sigma$.

The invention also discloses an expert recommendation system based on group matching, the system comprising:

a web page acquisition module, configured to obtain the web page information of each expert in the expert list through a web crawler;

an information extraction module, configured to extract the web page information to obtain the expert academic information of each expert;

a matching degree calculation module, configured to calculate the matching degree between each expert and each project to be matched according to the expert academic information;

an expert recommendation module, configured to determine the experts recommended for the projects to be matched through a dynamic programming algorithm according to the matching degrees and a group matching model, where the group matching model is the correspondence between the projects to be matched and the recommended experts at which the sum of the matching degrees of the experts recommended for all projects to be matched reaches its maximum.

By recommending experts through group matching, the invention improves recommendation efficiency and greatly reduces time overhead. In addition, the matching degree between each expert and each project to be matched also takes into account the social relationship matching degree between them, so that academic corruption is effectively avoided or prevented when experts are recommended.

Description of the drawings

Fig. 1 is a flowchart of an expert recommendation method based on group matching according to an embodiment of the invention;

Fig. 2 is a structural block diagram of an expert recommendation system based on group matching according to an embodiment of the invention.

Detailed description

Specific embodiments of the invention are described in further detail below with reference to the drawings and examples. The following examples illustrate the invention and are not intended to limit its scope.

Fig. 1 is a flowchart of an expert recommendation method based on group matching according to an embodiment of the invention. Referring to Fig. 1, the method includes:

S1: Obtain the web page information of each expert in an expert list through a web crawler;

It should be noted that a general-purpose search engine needs its web crawler to download a large amount of data, whereas the crawler in step S1 only needs to visit a single site (an academic website) whose pages follow clear rules, so techniques such as distributed crawling and page ranking are not needed.

However, the target website requires identity authentication, so the authentication process is first analyzed with a browser, and the crawler then simulates a browser login to the server. In general, a website's authentication can be passed by simulating its cookie, and analysis of the target website showed that it also authenticates by cookie. Therefore, a user first logs in to the website manually to obtain a cookie that carries the access rights and copies it into the web crawler, and the crawler then uses this cookie to crawl the pages.

Analysis showed that the web page information of the experts can be obtained simply by changing the expert number range parameter in the submitted HTTP request. The retrieved pages are saved to disk as HTML files and then parsed with regular expressions to extract the expert information, from which a basic expert database is built.
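The crawling and parsing just described can be sketched in Python as follows. This is only an illustrative sketch: the site URL, the `id` query parameter, and the HTML patterns are hypothetical, since the text names neither the target website nor its page layout.

```python
import re
import urllib.parse
import urllib.request

# Hypothetical endpoint; the source does not name the academic website.
BASE_URL = "https://academic.example.org/expert"

# Hypothetical page layout; a real site needs its own patterns.
NAME_RE = re.compile(r'<span class="name">(.*?)</span>', re.S)
AFFILIATION_RE = re.compile(r'<span class="affiliation">(.*?)</span>', re.S)


def build_request(expert_id: int, cookie: str) -> urllib.request.Request:
    """Build a request for one expert page, reusing a cookie copied from a manual login."""
    query = urllib.parse.urlencode({"id": expert_id})  # "id" is an assumed parameter name
    return urllib.request.Request(f"{BASE_URL}?{query}", headers={"Cookie": cookie})


def fetch_page(expert_id: int, cookie: str) -> str:
    """Download one expert page as text (performs a network call)."""
    with urllib.request.urlopen(build_request(expert_id, cookie), timeout=10) as resp:
        return resp.read().decode("utf-8", errors="replace")


def parse_expert_page(html: str) -> dict:
    """Extract name and affiliation with regular expressions; missing fields become None."""
    name = NAME_RE.search(html)
    affiliation = AFFILIATION_RE.search(html)
    return {
        "name": name.group(1).strip() if name else None,
        "affiliation": affiliation.group(1).strip() if affiliation else None,
    }
```

Iterating `fetch_page` over a range of expert numbers and storing each `parse_expert_page` result would give the basic expert database mentioned in the text.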

It can be understood that in this step the web page information of each expert in the expert list is obtained by the web crawler according to the expert names in the expert list.

S2: Extract the web page information to obtain the expert academic information of each expert;

It should be noted that once the basic expert database exists, experts can be filtered by name and affiliation and their academic information extracted. During extraction, many experts in the expert list were found to share the same name, so affiliation is used to tell them apart.

This step can be implemented by the following steps S201 to S203:

S201: Search the web page information for pages that match both the name and the affiliation of the current expert. If none is found, go to step S202; otherwise extract the expert academic information from the first matching page and go to step S203. The expert list includes the name and affiliation of each expert;

S202: Search the web page information for pages that match the name of the current expert, and extract the expert academic information from the first matching page;

S203: Take an expert in the expert list whose academic information has not yet been extracted as the current expert, and return to step S201.
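Steps S201 to S203 amount to a two-stage lookup with a fallback. The sketch below assumes each crawled page has already been parsed into a dictionary with `name` and `affiliation` keys; that record shape and the function names are illustrative, not taken from the source.

```python
from typing import Optional


def select_page(expert: dict, pages: list) -> Optional[dict]:
    """Pick the page used for one expert.

    S201: take the first page matching both name and affiliation.
    S202: otherwise take the first page matching the name alone.
    Returns None when no page matches at all.
    """
    for page in pages:
        if page["name"] == expert["name"] and page["affiliation"] == expert["affiliation"]:
            return page
    for page in pages:
        if page["name"] == expert["name"]:
            return page
    return None


def extract_all(experts: list, pages: list) -> dict:
    """S203: repeat the lookup for every expert in the list."""
    return {(e["name"], e["affiliation"]): select_page(e, pages) for e in experts}
```

Matching on affiliation first is what separates experts who share a name; the name-only fallback covers experts whose recorded affiliation differs from the one on the page.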

Optionally, the expert academic information includes: expert name, affiliation, research field keywords, paper titles, and paper authors.

S3: Calculate the matching degree between each expert and each project to be matched according to the expert academic information;

S4: Determine the experts recommended for the projects to be matched through a dynamic programming algorithm according to the matching degrees and a group matching model, where the group matching model is the correspondence between the projects to be matched and the recommended experts at which the sum of the matching degrees of the experts recommended for all projects to be matched reaches its maximum.

When recommending experts, a project needs to be reviewed by experts in related fields, so the matching degree between an expert and a project increases with how closely their fields are related. To avoid and prevent academic corruption, if the applicants of a project and a reviewing expert are socially related, for example by having co-authored papers or by working at the same institution, the matching degree between the expert and the project decreases as the strength of that social relationship increases. Accordingly, in step S3, the matching degree between each expert and each project to be matched is calculated from the expert academic information by the following formula:

$M_{i,j} = \alpha \cdot MK_{i,j} + \beta \cdot MJ_{i,j} + \gamma \cdot ML_{i,j} - \delta \cdot MS_{i,j}$

where $M_{i,j}$ is the matching degree between expert $i$ and project $j$ to be matched; $\alpha$, $\beta$, $\gamma$, and $\delta$ are constants; $MK_{i,j}$ is the research field keyword matching degree between expert $i$ and project $j$; $MJ_{i,j}$ is the journal and conference tag matching degree between expert $i$ and project $j$; $ML_{i,j}$ is the academic level matching degree between expert $i$ and project $j$; and $MS_{i,j}$ is the social relationship matching degree between expert $i$ and project $j$.
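The combined score can be written as a one-line function. The source states only that the four coefficients are constants and gives no values, so the defaults of 1.0 below are placeholders.

```python
def matching_degree(mk: float, mj: float, ml: float, ms: float,
                    alpha: float = 1.0, beta: float = 1.0,
                    gamma: float = 1.0, delta: float = 1.0) -> float:
    """M = alpha*MK + beta*MJ + gamma*ML - delta*MS.

    The default coefficients are placeholders; the source gives no values.
    """
    return alpha * mk + beta * mj + gamma * ml - delta * ms
```

Because the social relationship term is subtracted, an expert who shares an affiliation or co-authored papers with the applicants scores lower than an otherwise identical expert with no such ties.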

To calculate the social relationship matching degree, optionally, $MS_{i,j}$ is calculated by the following formula:

where [symbol not reproduced] is a weight value; $\delta_{i,j}$ is the affiliation correlation between expert $i$ and the $v$-th expert, taking the value 1 when the two affiliations are the same and 0 otherwise; $sp$ is the set of papers co-authored by expert $i$ and the $v$-th expert; $n$ is the number of authors of a paper; $t_i$ is the weight of expert $i$ (which can be determined from the author order of the paper); $t_v$ is the weight of the $v$-th expert (which can likewise be determined from the author order); $k$ is the index of an applicant of project $j$ to be matched; and $a$ is the number of applicants of project $j$ to be matched.

Optionally, suppose the research field keywords of expert $i$ are expressed as the vector $\langle K_{i,1}, K_{i,2}, K_{i,3}, \ldots, K_{i,N} \rangle$ with weights (in practice, keyword frequencies) $\langle w_{i,1}, w_{i,2}, w_{i,3}, \ldots, w_{i,N} \rangle$, and the research field keywords of project $j$ are expressed as the vector $\langle K_{j,1}, K_{j,2}, K_{j,3}, \ldots, K_{j,N} \rangle$ with weights (in practice, keyword frequencies) $\langle w_{j,1}, w_{j,2}, w_{j,3}, \ldots, w_{j,N} \rangle$. Using a content-based vector recommendation algorithm, the research field keyword matching degree $MK_{i,j}$ is defined as:

Here $R(K_{i,x}, K_{j,y})$ denotes the similarity between the two research field keywords $K_{i,x}$ and $K_{j,y}$ (R = resemblance).

Because research field keywords are usually not extracted very accurately, $R(K_{i,x}, K_{j,y})$ is computed from the edit distance (Levenshtein distance) between the two keywords. The edit distance between two strings is the minimum number of edit operations needed to transform one into the other, where the permitted operations are substituting one character for another, inserting a character, and deleting a character. If the edit distance is $d$ and the length of the longer of the two keywords is $max$, the similarity is $1 - d/max$.
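The keyword similarity $R$ follows directly from this definition. The sketch below uses the standard two-row dynamic programming form of the Levenshtein distance; treating two empty keywords as identical is an added assumption, since the source does not cover that case.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character substitutions, insertions, and deletions."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or keep
        prev = curr
    return prev[-1]


def keyword_similarity(a: str, b: str) -> float:
    """R = 1 - d / max, where max is the length of the longer keyword."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0  # assumption: two empty keywords count as identical
    return 1.0 - levenshtein(a, b) / longest
```

For instance, `keyword_similarity("kitten", "sitting")` evaluates to 1 - 3/7, since three edits separate the two strings and the longer one has seven characters.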

Suppose the journal and conference tags of expert $i$ are expressed as the vector $\langle J_{i,1}, J_{i,2}, J_{i,3}, \ldots, J_{i,N} \rangle$ with weights (in practice, tag frequencies) $\langle w_{i,1}, w_{i,2}, w_{i,3}, \ldots, w_{i,N} \rangle$, and the journal and conference tags of project $j$ are expressed as the vector $\langle J_{j,1}, J_{j,2}, J_{j,3}, \ldots, J_{j,N} \rangle$ with weights (in practice, tag frequencies) $\langle w_{j,1}, w_{j,2}, w_{j,3}, \ldots, w_{j,N} \rangle$. Using a tag-based vector recommendation algorithm, the journal and conference tag matching degree $MJ_{i,j}$ is defined as:

Here $I(J_{i,x}, J_{j,y})$ indicates whether the two journal and conference tags $J_{i,x}$ and $J_{j,y}$ are identical (I = identity).

Note that journal and conference tags are generally exact. Unlike $R(K_{i,x}, K_{j,y})$, which lies between 0 and 1.0, $I(J_{i,x}, J_{j,y})$ takes the value 1 when the tags are equal and 0 otherwise.

Suppose the academic level of expert $i$ is the vector <institution level, title, research project scale>. Institution level is the tier of the expert's university, such as 985, 211, ordinary undergraduate institution, or vocational college. Title is, for example, master's supervisor, doctoral supervisor, Changjiang Scholar, or academician. Research project scale covers national research projects already awarded, both completed and ongoing, such as 863 Program projects, and is measured by the amount of research funding.

For a set represented by such vectors, the experts' academic levels are clustered with the k-means method: k representative experts are first chosen, and each expert is placed in the class of the representative expert it most resembles, so that the experts are divided into k classes. When computing the academic level similarity of two experts, the similarity is 1 if they belong to the same class and 0 otherwise.

The academic levels of the applicants of each project can then be represented by a set of such vectors $V_1, V_2, V_3, \ldots, V_p$ (assuming project $j$ has $p$ applicants). To compute the academic level matching degree between an expert and a project, compute the similarity between the expert and each applicant of the project, sum the results, and divide by the number of applicants.
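Once every expert and applicant has a cluster label, the academic level matching degree reduces to the fraction of applicants in the expert's cluster. The sketch assumes the k-means labels are already available as integers; how the categorical fields are encoded for clustering is not specified in the source and is left out.

```python
from typing import Sequence


def academic_level_match(expert_cluster: int, applicant_clusters: Sequence[int]) -> float:
    """Average of per-applicant similarities (1 if same cluster, else 0)."""
    if not applicant_clusters:
        raise ValueError("a project needs at least one applicant")
    same = sum(1 for c in applicant_clusters if c == expert_cluster)
    return same / len(applicant_clusters)
```

With four applicants in clusters 2, 0, 2, and 1, an expert in cluster 2 gets a matching degree of 2/4 = 0.5.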

Optionally, the group matching model is:

where $c_{i,j}$ is the group matching matrix, whose entry in row $i$ and column $j$ is 1 when expert $i$ is recommended for the $j$-th project and 0 otherwise; $m$ is the total number of projects to be matched; $n$ is the total number of experts; $\varepsilon$ is the maximum number of experts per project to be matched; and $\sigma$ is the maximum number of projects per expert.
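The model formula itself is not reproduced in this text. One formulation consistent with the definitions above (maximizing the sum of matching degrees, with at most $\varepsilon$ experts per project and at most $\sigma$ projects per expert) would be the following; it is a reconstruction under those assumptions, and the original may differ, for example by using equalities instead of inequalities.

```latex
\max_{c} \; \sum_{i=1}^{n} \sum_{j=1}^{m} c_{i,j} \, M_{i,j}
\quad \text{subject to} \quad
\sum_{i=1}^{n} c_{i,j} \le \varepsilon \;\; (j = 1, \ldots, m), \qquad
\sum_{j=1}^{m} c_{i,j} \le \sigma \;\; (i = 1, \ldots, n), \qquad
c_{i,j} \in \{0, 1\}
```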

To make it easier to determine the experts recommended for the projects to be matched, optionally, in step S4, determining the experts recommended for the projects to be matched through a dynamic programming algorithm according to the matching degrees and the group matching model specifically includes:

S401: Determine the experts corresponding to each project to be matched according to the research field keywords, and sort the experts corresponding to each project by matching degree;

S402: Assign the experts with the highest matching degree to the corresponding project to be matched in turn, until the number of experts assigned to the project reaches the maximum $\varepsilon$ or the number of projects assigned to the expert reaches the maximum $\sigma$.
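Read literally, S401 and S402 describe a ranked pass over each project's candidate experts. The sketch below follows that reading. It assumes every expert with a matching degree is a candidate for every project and that projects are processed in index order; neither detail is fixed by the source.

```python
from typing import Dict, List, Sequence


def assign_experts(match: Sequence[Sequence[float]], eps: int, sigma: int) -> Dict[int, List[int]]:
    """Assign experts to projects by descending matching degree.

    match[i][j] is the matching degree of expert i for project j.
    eps   : maximum number of experts per project.
    sigma : maximum number of projects per expert.
    """
    n = len(match)
    m = len(match[0]) if n else 0
    load = [0] * n  # number of projects already given to each expert
    assignment: Dict[int, List[int]] = {j: [] for j in range(m)}
    for j in range(m):
        # S401: rank candidate experts for project j by matching degree.
        ranked = sorted(range(n), key=lambda i: match[i][j], reverse=True)
        # S402: take the best-ranked experts until a limit is reached.
        for i in ranked:
            if len(assignment[j]) >= eps:
                break
            if load[i] >= sigma:
                continue  # this expert already has sigma projects
            assignment[j].append(i)
            load[i] += 1
    return assignment
```

With three experts, two projects, `eps=2`, and `sigma=1`, the first project takes its two best experts, and the second project receives only the remaining expert because the others have reached their limit.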

The invention also discloses an expert recommendation system based on group matching. Referring to Fig. 2, the system includes:

a web page acquisition module, configured to obtain the web page information of each expert in the expert list through a web crawler;

an information extraction module, configured to extract the web page information to obtain the expert academic information of each expert;

a matching degree calculation module, configured to calculate the matching degree between each expert and each project to be matched according to the expert academic information;

an expert recommendation module, configured to determine the experts recommended for the projects to be matched through a dynamic programming algorithm according to the matching degrees and a group matching model, where the group matching model is the correspondence between the projects to be matched and the recommended experts at which the sum of the matching degrees of the experts recommended for all projects to be matched reaches its maximum.

The system further includes modules, submodules, units, and subunits for implementing each step of the above method; to avoid repetition, they are not described again here.

The above embodiments are intended only to illustrate the invention, not to limit it. A person of ordinary skill in the relevant technical field can make various changes and modifications without departing from the spirit and scope of the invention, so all equivalent technical solutions also fall within the scope of the invention, and the scope of patent protection of the invention shall be defined by the claims.

Claims (5)

1. An expert recommendation method based on group matching, the method comprising:
S1: acquiring webpage information of each expert in an expert list through a web crawler;
S2: extracting expert academic information of each expert from the webpage information;
S3: calculating the matching degree between each expert and each item to be matched according to the expert academic information;
S4: determining the experts recommended for the items to be matched through a dynamic programming algorithm according to the matching degree and a group matching model, wherein the group matching model is the correspondence between the items to be matched and the recommended experts at which the sum of the matching degrees of all recommended experts of the items to be matched reaches its maximum;
wherein the expert academic information comprises: expert names, work units, research field keywords, paper titles, and paper authors;
wherein, in step S4, determining the experts recommended for the items to be matched through a dynamic programming algorithm according to the matching degree and the group matching model specifically comprises:
S401: determining the experts corresponding to each item to be matched according to the research field keywords, and sorting the experts corresponding to each item to be matched by matching degree;
S402: assigning the experts with the highest matching degree to the corresponding items to be matched in turn, until the number of experts corresponding to an item to be matched reaches a maximum value ε or the number of items corresponding to an expert reaches a maximum value σ;
wherein, in step S3, the matching degree between each expert and the item to be matched is calculated according to the expert academic information by the following formula:
$$M_{i,j} = \alpha \cdot MK_{i,j} + \beta \cdot MJ_{i,j} + \gamma \cdot ML_{i,j} - \delta \cdot MS_{i,j}$$
wherein $M_{i,j}$ is the matching degree between expert $i$ and item $j$ to be matched; $\alpha$, $\beta$, $\gamma$, and $\delta$ are constants; $MK_{i,j}$ is the research field keyword matching degree between expert $i$ and item $j$; $MJ_{i,j}$ is the journal and conference label matching degree between expert $i$ and item $j$; $ML_{i,j}$ is the academic level matching degree between expert $i$ and item $j$; and $MS_{i,j}$ is the social relationship matching degree between expert $i$ and item $j$;
$MS_{i,j}$ is calculated by the following formula,
wherein [symbol not reproduced] is a weight value; $\delta_{i,j}$ is the work unit correlation between expert $i$ and expert $v$, equal to 1 when the work units are the same and 0 otherwise; $sp$ is a paper co-authored by expert $i$ and the $n$-th expert; $n$ is the number of paper authors; $t_i$ is the weight of expert $i$; $t_v$ is the weight of the $v$-th expert; $k$ is the index of a project applicant corresponding to item $j$ to be matched; and $a$ is the number of project applicants corresponding to item $j$ to be matched.
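The weighted score of step S3 and the allocation of steps S401 and S402 can be illustrated with a minimal sketch. It is only one possible reading of the claim, not the patented implementation: the weight values, the function names `matching_degree` and `assign_experts`, and the choice to process all candidate (expert, item) pairs in one globally sorted pass are assumptions made for illustration. The component scores MK, MJ, ML, and MS are taken as given inputs, since the text does not reproduce how each is computed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Weights:
    # Hypothetical values: the claim only states that these are constants.
    alpha: float = 0.4
    beta: float = 0.2
    gamma: float = 0.2
    delta: float = 0.2


def matching_degree(mk, mj, ml, ms, w=Weights()):
    """M = alpha*MK + beta*MJ + gamma*ML - delta*MS (social ties are penalised)."""
    return w.alpha * mk + w.beta * mj + w.gamma * ml - w.delta * ms


def assign_experts(scores, epsilon, sigma):
    """Greedy allocation in the spirit of S401/S402 (an assumed interpretation).

    scores: dict mapping (expert, item) -> matching degree, holding only the
            pairs that already share research field keywords.
    epsilon: maximum number of experts per item.
    sigma: maximum number of items per expert.
    Returns a dict mapping item -> list of assigned experts.
    """
    per_item = {}
    per_expert = {}
    # Highest matching degree first; ties are broken by key for determinism.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    for (expert, item), _ in ranked:
        if len(per_item.get(item, [])) >= epsilon:
            continue  # the item already has its maximum number of experts
        if per_expert.get(expert, 0) >= sigma:
            continue  # the expert already has the maximum number of items
        per_item.setdefault(item, []).append(expert)
        per_expert[expert] = per_expert.get(expert, 0) + 1
    return per_item
```

Because the pass is greedy, it does not guarantee that the total matching degree is maximal; it only respects the two caps ε and σ.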
2. The method as claimed in claim 1, wherein in step S1, web page information of each expert in the expert list is obtained through a web crawler according to the names of the experts in the expert list.
3. The method according to claim 2, wherein step S2 specifically comprises:
S201: searching the webpage information for webpage information matching both the expert name and the work unit of the current expert; if none is found, executing step S202; otherwise, extracting expert academic information from the first matching webpage information found and executing step S203, wherein the expert list comprises the expert name and work unit of each expert;
S202: searching the webpage information for webpage information matching the expert name of the current expert, and extracting expert academic information from the first matching webpage information found;
S203: taking an expert in the expert list whose expert academic information has not yet been extracted as the current expert, and returning to step S201.
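The fallback lookup of steps S201 to S203 can be sketched as follows. This is an illustrative assumption rather than the claimed implementation: pages are modelled as plain strings, "matching" is modelled as substring containment, the `extract` callback stands in for whatever extraction step S2 uses, and experts with no matching page at all are simply skipped, a case the claim does not address.

```python
def extract_all(expert_list, pages, extract):
    """Walk the expert list, preferring pages that match name and work unit.

    expert_list: iterable of (name, unit) tuples.
    pages: list of page texts gathered by the crawler.
    extract: callable turning one page text into academic information.
    Returns a dict mapping (name, unit) -> extracted information.
    """
    results = {}
    for name, unit in expert_list:  # S203: move on to the next expert
        # S201: first page matching both the expert name and the work unit.
        hit = next((p for p in pages if name in p and unit in p), None)
        if hit is None:
            # S202: fall back to the first page matching the name alone.
            hit = next((p for p in pages if name in p), None)
        if hit is not None:
            results[(name, unit)] = extract(hit)
    return results
```

Matching on name and work unit first reduces the chance of picking up a namesake; the name-only fallback keeps coverage when the work unit is missing from the page.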
4. The method of claim 1, wherein the group matching model is:
wherein $c_{i,j}$ is the group matching matrix, whose entry in row $i$, column $j$ is 1 when expert $i$ is recommended for the $j$-th item and 0 otherwise; $m$ is the total number of items to be matched; $n$ is the total number of experts; $\varepsilon$ is the maximum number of experts corresponding to each item to be matched; and $\sigma$ is the maximum number of items corresponding to each expert.
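The model formula itself is not reproduced in this text. One plausible reading, consistent with the definitions above and with the statement in claim 1 that the sum of the matching degrees of all recommended experts is maximised, is the following constrained 0-1 program. It is offered as an assumption, not as the claimed formula:

```latex
\max_{c} \; \sum_{i=1}^{n} \sum_{j=1}^{m} c_{i,j} \, M_{i,j}
\quad \text{subject to} \quad
\sum_{i=1}^{n} c_{i,j} \le \varepsilon \;\; (j = 1, \dots, m), \qquad
\sum_{j=1}^{m} c_{i,j} \le \sigma \;\; (i = 1, \dots, n), \qquad
c_{i,j} \in \{0, 1\}.
```

Under this reading, the first constraint caps the number of experts per item at $\varepsilon$ and the second caps the number of items per expert at $\sigma$.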
5. An expert recommendation system based on group matching, the system comprising:
a webpage acquisition module, configured to acquire webpage information of each expert in an expert list through a web crawler;
an information extraction module, configured to extract expert academic information of each expert from the webpage information;
a matching degree calculation module, configured to calculate the matching degree between each expert and each item to be matched according to the expert academic information;
an expert recommendation module, configured to determine the experts recommended for the items to be matched through a dynamic programming algorithm according to the matching degree and a group matching model, wherein the group matching model is the correspondence between the items to be matched and the recommended experts at which the sum of the matching degrees of all recommended experts of the items to be matched reaches its maximum;
wherein the expert academic information comprises: expert names, work units, research field keywords, paper titles, and paper authors;
wherein the expert recommendation module is specifically configured to:
determine the experts corresponding to each item to be matched according to the research field keywords, and sort the experts corresponding to each item to be matched by matching degree;
assign the experts with the highest matching degree to the corresponding items to be matched in turn, until the number of experts corresponding to an item to be matched reaches a maximum value ε or the number of items corresponding to an expert reaches a maximum value σ;
wherein the matching degree between each expert and the item to be matched is calculated according to the expert academic information by the following formula:
$$M_{i,j} = \alpha \cdot MK_{i,j} + \beta \cdot MJ_{i,j} + \gamma \cdot ML_{i,j} - \delta \cdot MS_{i,j}$$
wherein $M_{i,j}$ is the matching degree between expert $i$ and item $j$ to be matched; $\alpha$, $\beta$, $\gamma$, and $\delta$ are constants; $MK_{i,j}$ is the research field keyword matching degree between expert $i$ and item $j$; $MJ_{i,j}$ is the journal and conference label matching degree between expert $i$ and item $j$; $ML_{i,j}$ is the academic level matching degree between expert $i$ and item $j$; and $MS_{i,j}$ is the social relationship matching degree between expert $i$ and item $j$;
$MS_{i,j}$ is calculated by the following formula,
wherein [symbol not reproduced] is a weight value; $\delta_{i,j}$ is the work unit correlation between expert $i$ and expert $v$, equal to 1 when the work units are the same and 0 otherwise; $sp$ is a paper co-authored by expert $i$ and the $n$-th expert; $n$ is the number of paper authors; $t_i$ is the weight of expert $i$; $t_v$ is the weight of the $v$-th expert; $k$ is the index of a project applicant corresponding to item $j$ to be matched; and $a$ is the number of project applicants corresponding to item $j$ to be matched.
CN201410680306.6A 2014-11-24 2014-11-24 A kind of expert recommendation method and system based on group matches Active CN104361102B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410680306.6A CN104361102B (en) 2014-11-24 2014-11-24 A kind of expert recommendation method and system based on group matches

Publications (2)

Publication Number Publication Date
CN104361102A CN104361102A (en) 2015-02-18
CN104361102B true CN104361102B (en) 2018-05-11

Family

ID=52528362

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201410680306.6A Active CN104361102B (en) 2014-11-24 2014-11-24 A kind of expert recommendation method and system based on group matches

Country Status (1)

Country Link
CN (1) CN104361102B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111160699A (en) * 2019-11-26 2020-05-15 清华大学 Expert recommendation method and system

Families Citing this family (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105260849A (en) * 2015-10-21 2016-01-20 内蒙古科技大学 Scientific researcher evaluation method across social networks
CN106227771B (en) * 2016-07-15 2019-05-07 浙江大学 A Domain Expert Discovery Method Based on Social Programming Websites
CN106295147A (en) * 2016-07-29 2017-01-04 广州比特软件科技有限公司 The medical expert's personalized recommendation method solved based on big data and system
CN106952191A (en) * 2017-03-09 2017-07-14 深圳市华第时代科技有限公司 The automatic reviewing method of motion and system
CN108255957A (en) * 2017-12-21 2018-07-06 杭州传送门网络科技有限公司 One kind recommends matching process based on Venture Capital field precision dataization
CN108829752A (en) * 2018-05-25 2018-11-16 南京邮电大学 Based on personalized tutor's proposed algorithm
CN108549730A (en) * 2018-06-01 2018-09-18 云南电网有限责任公司电力科学研究院 A kind of search method and device of expert info
CN108873706B (en) * 2018-07-30 2022-04-15 中国石油化工股份有限公司 Trap evaluation intelligent expert recommendation method based on deep neural network
CN110263135B (en) * 2019-05-20 2022-12-16 北京字节跳动网络技术有限公司 Data exchange matching method, device, medium and electronic equipment
CN110888964B (en) * 2019-07-22 2023-09-01 天津大学 Expert Secondary Recommendation Method and Device Based on Improved PageRank Algorithm
CN110956354A (en) * 2019-08-30 2020-04-03 深圳传世智慧科技有限公司 Change management resource matching method, server and change management system
CN110795640B (en) * 2019-10-12 2023-08-18 华中师范大学 Self-adaptive group recommendation method for compensating group member difference
CN111090801B (en) * 2019-12-18 2023-06-09 创新奇智(青岛)科技有限公司 Expert human relation map drawing method and system
CN113516094B (en) * 2021-07-28 2024-03-08 中国科学院计算技术研究所 System and method for matching and evaluating expert for document
CN113902290B (en) * 2021-09-14 2022-11-04 中国人民解放军军事科学院战略评估咨询中心 Expert matching effectiveness measuring and calculating method facing evaluation task
CN114510918A (en) * 2022-02-16 2022-05-17 数字浙江技术运营有限公司 Expert matching method and device

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103605665A (en) * 2013-10-24 2014-02-26 杭州电子科技大学 Keyword based evaluation expert intelligent search and recommendation method
CN103631859A (en) * 2013-10-24 2014-03-12 杭州电子科技大学 Intelligent review expert recommending method for science and technology projects
CN103823896A (en) * 2014-03-13 2014-05-28 蚌埠医学院 Subject characteristic value algorithm and subject characteristic value algorithm-based project evaluation expert recommendation algorithm
WO2014107672A1 (en) * 2013-01-07 2014-07-10 dotbox, inc. Validated product recommendation system and methods

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
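Research on the Construction and Visualization of a Science and Technology Consulting Expert Database Based on Social Networks; Wang Xuefen (王雪芬); China Master's Theses Full-text Database, Economics and Management Sciences; 20100815 (No. 10); J168-2: main text p. 29, third-to-last through last paragraphs, and p. 30, paragraph 1 *
Expert Knowledge Recommendation Based on Network Methods; Xu Yunhong (许云红); China Doctoral Dissertations Full-text Database, Economics and Management Sciences; 20101015 (No. 10); J152-20: main text p. 13, section 2.3.1; p. 53, paragraph 1; p. 54, paragraph 1; pp. 56-58, section 5.3 *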

Also Published As

Publication number Publication date
CN104361102A (en) 2015-02-18

Similar Documents

Publication Publication Date Title
CN104361102B (en) A kind of expert recommendation method and system based on group matches
Lao et al. Fast query execution for retrieval models based on path-constrained random walks
CN114238573B (en) Text countercheck sample-based information pushing method and device
Li Learning to rank for information retrieval and natural language processing
He et al. Context-aware citation recommendation
Kong et al. Exploring dynamic research interest and academic influence for scientific collaborator recommendation
CN103886054B (en) Personalization recommendation system and method of network teaching resources
CN108280114B (en) Deep learning-based user literature reading interest analysis method
CN102508859A (en) Advertisement classification method and device based on webpage characteristic
Amami et al. A graph based approach to scientific paper recommendation
CN112989215B (en) A Knowledge Graph Enhanced Recommendation System Based on Sparse User Behavior Data
Deng et al. Enhanced models for expertise retrieval using community-aware strategies
Noel et al. Applicability of Latent Dirichlet Allocation to multi-disk search
CN110310012B (en) Data analysis method, device, equipment and computer readable storage medium
Wang et al. Multi-task representation learning for demographic prediction
Utama et al. Scientific Articles Recommendation System Based On User’s Relatedness Using Item-Based Collaborative Filtering Method
Zhao et al. ST-LDA: high quality similar words augmented LDA for service clustering
Banaei et al. Web page rank estimation in search engine based on SEO parameters using machine learning techniques
Kaur Web content classification: a survey
Azzam et al. A question routing technique using deep neural network for communities of question answering
Lu et al. Learning multimodal neural network with ranking examples
Manolopoulos et al. Metrics and rankings: Myths and fallacies
Mehrotra et al. Task-Based User Modelling for Personalization via Probabilistic Matrix Factorization.
Ma Research on digital English teaching materials recommendation based on improved machine learning
Roy et al. Automated resume classification using machine learning

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant