CN104361102B - A kind of expert recommendation method and system based on group matches - Google Patents
- Publication number: CN104361102B (application CN201410680306.6A)
- Authority: CN (China)
- Prior art keywords: expert, matched, item, experts, matching degree
- Legal status: Active (as listed by Google Patents; the status is an assumption, not a legal conclusion)
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/951—Indexing; Web crawling techniques
Landscapes
- Engineering & Computer Science (AREA)
- Databases & Information Systems (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention discloses an expert recommendation method and system based on group matching, belonging to the field of Internet technology. The method comprises: S1: obtaining, by a web crawler, the web page information of each expert in an expert list; S2: extracting the web page information to obtain the academic information of each expert; S3: calculating, from the expert academic information, the matching degree between each expert and each project to be matched; S4: determining, by a dynamic programming algorithm based on the matching degrees and a group matching model, the experts recommended for each project to be matched. By recommending experts through group matching, the invention improves recommendation efficiency and greatly reduces time overhead. In addition, the social-relationship matching degree between each expert and each project to be matched is taken into account when computing the matching degree, which helps avoid or prevent academic corruption during expert recommendation.
Description
Technical Field
The invention relates to the field of Internet technology, and in particular to an expert recommendation method and system based on group matching.
Background Art
The review efficiency and review quality of scientific research projects have an important impact on the research development level of an organization, or even of a country. Online review is a fast and advanced review method. It spans the entire life cycle of a scientific research or engineering project: project initiation, application, organization, demonstration, evaluation, acceptance, award and filing. Its purpose is to replace traditional manual operations with computer and network systems, thereby reducing review costs and improving work efficiency and review quality. It also uses electronic information systems to standardize the review process.
In recent years, the rapid development of new information technologies has made intelligent online review possible. These technologies include cloud computing, big data, recommender systems, deep learning and social networks. An intelligent expert recommendation system is the core, and the most difficult part, of the whole online review process. "Intelligent" here means the following. The system can process and refine internal information, which is code-based, precise and structured. It can also continuously gather external information, which is semantic, fuzzy and unstructured. Through data accumulation, it classifies and evaluates the experts themselves to build a more instructive intelligent expert database. From that database, more reasonable recommendation models and algorithms can be constructed. However, existing expert recommendation systems suffer from low recommendation efficiency, which results in excessive time overhead.
Summary of the Invention
In view of the above problems, the present invention provides an expert recommendation method based on group matching, the method comprising:
S1: obtaining, by a web crawler, the web page information of each expert in an expert list;
S2: extracting the web page information to obtain the academic information of each expert;
S3: calculating, from the expert academic information, the matching degree between each expert and each project to be matched;
S4: determining, by a dynamic programming algorithm based on the matching degrees and a group matching model, the experts recommended for the projects to be matched. The group matching model is the correspondence between projects to be matched and recommended experts that maximizes the sum of the matching degrees of the experts recommended for all projects.
In step S1, the web crawler obtains the web page information of each expert in the expert list using the expert names in the list.
Step S2 specifically includes:
S201: searching the web page information for pages that match both the name and the work unit of the current expert. If none is found, step S202 is performed. Otherwise, the expert academic information is extracted from the first matching page, and step S203 is performed. The expert list contains the name and work unit of each expert;
S202: searching the web page information for pages that match the name of the current expert, and extracting the expert academic information from the first matching page;
S203: taking an expert in the expert list whose academic information has not yet been extracted as the current expert, and returning to step S201.
The expert academic information includes the expert's name, work unit, research-field keywords, paper titles and paper authors.
In step S3, the matching degree between each expert and each project to be matched is calculated from the expert academic information by the following formula:
$$M_{i,j} = \alpha \cdot MK_{i,j} + \beta \cdot MJ_{i,j} + \gamma \cdot ML_{i,j} - \delta \cdot MS_{i,j}$$
The terms of this formula are defined as follows:
- $M_{i,j}$ is the matching degree between expert $i$ and project $j$.
- $\alpha$, $\beta$, $\gamma$ and $\delta$ are constants.
- $MK_{i,j}$ is the research-field keyword matching degree between expert $i$ and project $j$.
- $MJ_{i,j}$ is the journal/conference label matching degree between them.
- $ML_{i,j}$ is the academic-level matching degree between them.
- $MS_{i,j}$ is the social-relationship matching degree between them.
$MS_{i,j}$ is calculated by a formula whose terms are defined as follows:
- A weight value.
- $\delta_{i,j}$ is the work-unit correlation between expert $i$ and the $v$-th expert. It equals 1 when the two share a work unit and 0 otherwise.
- $sp$ is a paper that expert $i$ has co-authored with the $v$-th expert.
- $n$ is the number of authors of that paper.
- $t_i$ is the weight of expert $i$, and $t_v$ is the weight of the $v$-th expert.
- $k$ is the index of an applicant of project $j$.
- $a$ is the number of applicants of project $j$.
The group matching model is defined over the following quantities:
- $c_{i,j}$ is the group matching matrix. The entry in row $i$, column $j$ is 1 when expert $i$ is recommended for the $j$-th project, and 0 otherwise.
- $m$ is the total number of projects to be matched.
- $n$ is the total number of experts.
- $\varepsilon$ is the maximum number of experts assigned to each project.
- $\sigma$ is the maximum number of projects assigned to each expert.
In step S4, the experts recommended for the projects to be matched are determined by a dynamic programming algorithm based on the matching degrees and the group matching model. This step specifically includes:
S401: determining the candidate experts for each project from the research-field keywords, and sorting each project's candidate experts by matching degree;
S402: assigning the experts with the highest matching degrees to the corresponding projects in turn, until the number of experts assigned to a project reaches the maximum $\varepsilon$ or the number of projects assigned to an expert reaches the maximum $\sigma$.
The invention further discloses an expert recommendation system based on group matching, the system comprising:
a web page acquisition module, configured to obtain, by a web crawler, the web page information of each expert in the expert list;
an information extraction module, configured to extract the web page information to obtain the academic information of each expert;
a matching degree calculation module, configured to calculate, from the expert academic information, the matching degree between each expert and each project to be matched;
an expert recommendation module, configured to determine, by a dynamic programming algorithm based on the matching degrees and a group matching model, the experts recommended for the projects to be matched. The group matching model is the correspondence between projects and recommended experts that maximizes the sum of the matching degrees of the experts recommended for all projects.
By recommending experts through group matching, the invention improves recommendation efficiency and greatly reduces time overhead. In addition, the matching degree takes into account the social-relationship matching degree between each expert and each project. As a result, the invention also helps avoid or prevent academic corruption when recommending experts.
Description of Drawings
Fig. 1 is a flowchart of an expert recommendation method based on group matching according to an embodiment of the present invention;
Fig. 2 is a structural block diagram of an expert recommendation system based on group matching according to an embodiment of the present invention.
Detailed Description of the Embodiments
Specific embodiments of the present invention are described in further detail below with reference to the drawings and examples. The following examples illustrate the present invention but are not intended to limit its scope.
Fig. 1 is a flowchart of an expert recommendation method based on group matching according to an embodiment of the present invention. Referring to Fig. 1, the method includes:
S1: obtaining, by a web crawler, the web page information of each expert in the expert list.
A general-purpose search engine needs web crawlers that download large amounts of data. The crawler in step S1, by contrast, only needs to visit a single site (an academic website) whose page structure follows clear rules. Techniques such as distributed crawling and page ranking are therefore unnecessary.
The target website requires identity authentication. The authentication process is therefore first analyzed in a browser, and the crawler then simulates a browser logging in to the server. In general, a website's authentication can be reproduced by replaying cookies, and analysis showed that the target website also authenticates via cookies. The procedure is as follows:
1. Log in to the website manually to obtain a cookie that carries the required permissions.
2. Copy the cookie into the web crawler.
3. The crawler uses that permission to crawl the pages.
Analysis also showed that only the expert-number range parameter in the submitted HTTP request needs to change to retrieve different experts' pages. The retrieved pages are saved to disk as HTML files. They are then parsed with regular expressions to extract the experts' information, from which a basic expert database is built.
In other words, in this step the web crawler obtains the web page information of each expert in the expert list using the expert names in the list.
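The crawl-and-parse procedure above can be sketched with the Python standard library. This is a minimal sketch under stated assumptions. The URL `https://academic.example.org/expert`, the `start`/`end` query parameters and the HTML markup matched by the regular expression are all hypothetical, because the patent does not name the target site or describe its page structure.

```python
import re
import urllib.request

# Hypothetical endpoint and parameter names; the patent does not identify the site.
BASE_URL = "https://academic.example.org/expert"


def build_request(cookie: str, start_id: int, end_id: int) -> urllib.request.Request:
    """Build a GET request for one expert-number range, replaying a cookie
    copied from a manual browser login."""
    url = f"{BASE_URL}?start={start_id}&end={end_id}"
    return urllib.request.Request(url, headers={
        "Cookie": cookie,
        "User-Agent": "Mozilla/5.0",  # present the crawler as an ordinary browser
    })


def fetch_range(cookie: str, start_id: int, end_id: int, path: str) -> str:
    """Download one expert-number range and save the raw HTML to disk."""
    with urllib.request.urlopen(build_request(cookie, start_id, end_id)) as resp:
        html = resp.read().decode("utf-8", errors="replace")
    with open(path, "w", encoding="utf-8") as f:
        f.write(html)
    return html


# Hypothetical markup: each expert has a name span followed by a work-unit span.
EXPERT_RE = re.compile(
    r'<div class="expert">\s*<span class="name">(.*?)</span>\s*'
    r'<span class="unit">(.*?)</span>',
    re.S,
)


def parse_experts(html: str) -> list:
    """Extract (name, work unit) pairs from saved HTML with a regular expression."""
    return [(name.strip(), unit.strip()) for name, unit in EXPERT_RE.findall(html)]
```

A crawler built this way would call `fetch_range` once per expert-number range and later run `parse_experts` over the saved files to populate the basic expert database.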
S2: extracting the web page information to obtain the academic information of each expert.
Once the basic expert database is available, experts can be filtered by name and work unit to extract their academic information. During extraction, many experts in the expert list turned out to share their names with other people. These experts are distinguished by work unit.
This step can be implemented by the following steps S201 to S203:
S201: searching the web page information for pages that match both the name and the work unit of the current expert. If none is found, step S202 is performed. Otherwise, the expert academic information is extracted from the first matching page, and step S203 is performed. The expert list contains the name and work unit of each expert;
S202: searching the web page information for pages that match the name of the current expert, and extracting the expert academic information from the first matching page;
S203: taking an expert in the expert list whose academic information has not yet been extracted as the current expert, and returning to step S201.
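Steps S201 to S203 can be sketched as a lookup that prefers a match on name plus work unit and falls back to a match on name alone. The page records are dicts with the hypothetical keys `name`, `unit` and `keywords`, standing in for entries of the basic expert database. Returning `None` when not even the name matches is an assumption, because the source does not cover that case.

```python
from typing import Optional


def find_page(pages: list, name: str, unit: str) -> Optional[dict]:
    """Return the first page for an expert, following S201 then S202."""
    # S201: first page whose name and work unit both match.
    for page in pages:
        if page["name"] == name and page["unit"] == unit:
            return page
    # S202: fall back to the first page whose name matches.
    for page in pages:
        if page["name"] == name:
            return page
    return None  # assumption: no page found for this expert


def extract_all(expert_list: list, pages: list) -> dict:
    """S203: process each (name, work unit) pair in the expert list in turn."""
    return {(name, unit): find_page(pages, name, unit) for name, unit in expert_list}


pages = [
    {"name": "Li Wei", "unit": "A University", "keywords": ["data mining"]},
    {"name": "Li Wei", "unit": "B University", "keywords": ["optics"]},
]
result = extract_all([("Li Wei", "B University"), ("Li Wei", "C University")], pages)
```

In this example the two experts named Li Wei are told apart by work unit. The second lookup has no matching work unit, so it falls back to the first page that matches the name.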
Optionally, the expert academic information includes the expert's name, work unit, research-field keywords, paper titles and paper authors.
S3: calculating, from the expert academic information, the matching degree between each expert and each project to be matched;
S4: determining, by a dynamic programming algorithm based on the matching degrees and a group matching model, the experts recommended for the projects to be matched. The group matching model is the correspondence between projects and recommended experts that maximizes the sum of the matching degrees of the experts recommended for all projects.
Expert recommendation requires experts in the relevant field to review a project. The matching degree between an expert and a project is therefore positively correlated with how closely their fields are related. Academic corruption is a risk when a project applicant and a reviewing expert are socially related, for example through co-authored papers or a shared work unit. To avoid and prevent it, the matching degree between the expert and the project should decrease as the strength of that social relationship increases, which is why the social term is subtracted. In step S3, the matching degree between each expert and each project to be matched is therefore calculated from the expert academic information by the following formula:
$$M_{i,j} = \alpha \cdot MK_{i,j} + \beta \cdot MJ_{i,j} + \gamma \cdot ML_{i,j} - \delta \cdot MS_{i,j}$$
The terms of this formula are defined as follows:
- $M_{i,j}$ is the matching degree between expert $i$ and project $j$.
- $\alpha$, $\beta$, $\gamma$ and $\delta$ are constants.
- $MK_{i,j}$ is the research-field keyword matching degree between expert $i$ and project $j$.
- $MJ_{i,j}$ is the journal/conference label matching degree between them.
- $ML_{i,j}$ is the academic-level matching degree between them.
- $MS_{i,j}$ is the social-relationship matching degree between them.
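As a worked example of the formula, the sketch below uses the hypothetical weights α = 0.4, β = 0.3, γ = 0.3 and δ = 1.0. The patent only states that they are constants. The example shows how the subtracted social-relationship term lowers the score of an expert with ties to the applicants.

```python
def matching_degree(mk: float, mj: float, ml: float, ms: float,
                    alpha: float = 0.4, beta: float = 0.3,
                    gamma: float = 0.3, delta: float = 1.0) -> float:
    """M = alpha*MK + beta*MJ + gamma*ML - delta*MS (weights are hypothetical)."""
    return alpha * mk + beta * mj + gamma * ml - delta * ms


independent = matching_degree(1.0, 0.5, 1.0, 0.0)  # no social ties: about 0.85
conflicted = matching_degree(1.0, 0.5, 1.0, 0.5)   # same expertise, MS = 0.5: about 0.35
```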
To calculate the social-relationship matching degree, $MS_{i,j}$ can optionally be calculated by a formula whose terms are defined as follows:
- A weight value.
- $\delta_{i,j}$ is the work-unit correlation between expert $i$ and the $v$-th expert. It equals 1 when the two share a work unit and 0 otherwise.
- $sp$ is a paper that expert $i$ has co-authored with the $v$-th expert.
- $n$ is the number of authors of that paper.
- $t_i$ is the weight of expert $i$, which can be determined from the author order of the paper.
- $t_v$ is the weight of the $v$-th expert, also determined from the author order.
- $k$ is the index of an applicant of project $j$.
- $a$ is the number of applicants of project $j$.
Optionally, suppose the research-field keywords of expert $i$ are represented as the vector $\langle K_{i,1}, K_{i,2}, K_{i,3}, \dots, K_{i,N} \rangle$. Their weights, which in practice are keyword frequencies, are $\langle w_{i,1}, w_{i,2}, w_{i,3}, \dots, w_{i,N} \rangle$. The research-field keywords of project $j$ are $\langle K_{j,1}, K_{j,2}, K_{j,3}, \dots, K_{j,N} \rangle$, with weights (keyword frequencies) $\langle w_{j,1}, w_{j,2}, w_{j,3}, \dots, w_{j,N} \rangle$. Using a content-based vector recommendation algorithm, the research-field keyword matching degree $MK_{i,j}$ is defined by a formula over keyword pairs. In that formula, $R(K_{i,x}, K_{j,y})$ denotes the similarity between the two research-field keywords $K_{i,x}$ and $K_{j,y}$ (R = Resemblance).
Research-field keywords are usually not extracted very precisely. $R(K_{i,x}, K_{j,y})$ is therefore computed from the edit distance (Levenshtein distance) between the two keywords. The edit distance is the minimum number of single-character edits needed to turn one string into the other. The permitted edits are replacing one character with another, inserting a character and deleting a character. If the edit distance is $d$ and the longer of the two keywords has length $\max$, the similarity is $1 - d/\max$.
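The keyword similarity $R = 1 - d/\max$ can be sketched with the standard dynamic-programming edit distance, using a single rolling row. Returning 1.0 for two empty keywords is an assumption that avoids division by zero. The similarity works the same on Chinese keywords, because Python strings index by character.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of substitutions, insertions and deletions turning a into b."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute (free if equal)
        prev = cur
    return prev[-1]


def keyword_similarity(k1: str, k2: str) -> float:
    """R(K1, K2) = 1 - d / max, where max is the length of the longer keyword."""
    longest = max(len(k1), len(k2))
    if longest == 0:
        return 1.0  # assumption: two empty keywords count as identical
    return 1 - levenshtein(k1, k2) / longest


r = keyword_similarity("深度学习", "机器学习")  # d = 2, max = 4, so R = 0.5
```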
Suppose the journal/conference labels of expert $i$ are represented as the vector $\langle J_{i,1}, J_{i,2}, J_{i,3}, \dots, J_{i,N} \rangle$. Their weights, which in practice are label frequencies, are $\langle w_{i,1}, w_{i,2}, w_{i,3}, \dots, w_{i,N} \rangle$. The journal/conference labels of project $j$ are $\langle J_{j,1}, J_{j,2}, J_{j,3}, \dots, J_{j,N} \rangle$, with weights (label frequencies) $\langle w_{j,1}, w_{j,2}, w_{j,3}, \dots, w_{j,N} \rangle$. Using a tag-based vector recommendation algorithm, the journal/conference label matching degree $MJ_{i,j}$ is defined by a formula over label pairs. In that formula, $I(J_{i,x}, J_{j,y})$ indicates whether the two labels $J_{i,x}$ and $J_{j,y}$ are identical (I = Identity).
Journal/conference labels are generally precise. Unlike $R(K_{i,x}, K_{j,y})$, which takes values between 0 and 1.0, $I(J_{i,x}, J_{j,y})$ is therefore 1 when the labels are equal and 0 otherwise.
Suppose the academic-level vector of expert $i$ is ⟨unit level, professional title, research project scale⟩. Its components are as follows:
- The unit level is the tier of the expert's university, such as Project 985, Project 211, regular undergraduate or junior college.
- The professional title is, for example, master's supervisor, doctoral supervisor, Cheung Kong Scholar or academician.
- The research project scale covers national research projects already awarded, whether completed or ongoing (for example, 863 Program projects). It is measured by the amount of research funding.
For a set of experts represented by such vectors, the k-means clustering method is used to cluster the experts' academic levels. First, $k$ representative experts are selected. Experts similar to each representative are then placed in the same class, so that all experts fall into $k$ classes. The academic-level similarity of two experts is 1 if they belong to the same class and 0 otherwise.
The academic levels of a project's applicants can then be represented by a set of vectors $V_1, V_2, V_3, \dots, V_p$, assuming project $j$ has $p$ applicants. To compute the academic-level matching degree between an expert and a project, the similarity between the expert and each applicant is computed. The results are summed and divided by the number of applicants, which gives the similarity between the project and the expert.
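The clustering and averaging steps can be sketched as Lloyd's k-means, seeded with the $k$ representative experts, followed by a same-cluster average over applicants. The numeric encoding of ⟨unit level, professional title, research project scale⟩ used below is hypothetical. A real system would need to choose that encoding and would probably normalize the features, which the source does not specify.

```python
import math


def kmeans(points: list, centers: list, iters: int = 20) -> list:
    """Lloyd's k-means; `centers` are the k representative experts used as seeds."""
    centers = [list(c) for c in centers]
    labels = None
    for _ in range(iters):
        # Assignment step: each expert joins the nearest center (Euclidean distance).
        new = [min(range(len(centers)), key=lambda c: math.dist(p, centers[c]))
               for p in points]
        if new == labels:
            break  # assignments are stable, so stop early
        labels = new
        # Update step: move each center to the mean of its members.
        for c in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def academic_level_match(expert_label: int, applicant_labels: list) -> float:
    """ML: average of the 1/0 same-cluster similarity over all applicants."""
    if not applicant_labels:
        return 0.0
    return sum(1 for a in applicant_labels if a == expert_label) / len(applicant_labels)


# Hypothetical encoding: (university tier, title rank, funding scale).
experts = [(3, 3, 9), (3, 2, 8), (1, 1, 1), (0, 1, 2), (3, 3, 8)]
labels = kmeans(experts, [experts[0], experts[2]])  # two representative experts
ml = academic_level_match(labels[0], [labels[1], labels[2]])  # expert 0 vs 2 applicants
```

Here expert 0 shares a cluster with the first applicant but not with the second, so the academic-level matching degree is 0.5.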
Optionally, the group matching model is defined over the following quantities:
- $c_{i,j}$ is the group matching matrix. The entry in row $i$, column $j$ is 1 when expert $i$ is recommended for the $j$-th project, and 0 otherwise.
- $m$ is the total number of projects to be matched.
- $n$ is the total number of experts.
- $\varepsilon$ is the maximum number of experts assigned to each project.
- $\sigma$ is the maximum number of projects assigned to each expert.
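The model's formula itself does not appear in this text. Two facts constrain it: step S4 states that the model maximizes the sum of the matching degrees of the recommended experts, and the quantities above define the limits on experts per project and projects per expert. One plausible reading is therefore the 0-1 program below. It is a reconstruction under that assumption, and the original formula may differ in form.

```latex
\max_{c}\; \sum_{i=1}^{n} \sum_{j=1}^{m} M_{i,j}\, c_{i,j}
\quad \text{s.t.} \quad
\sum_{i=1}^{n} c_{i,j} \le \varepsilon \;\; (j = 1, \dots, m), \qquad
\sum_{j=1}^{m} c_{i,j} \le \sigma \;\; (i = 1, \dots, n), \qquad
c_{i,j} \in \{0, 1\}.
```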
To make it easier to determine the experts recommended for the projects to be matched, step S4 optionally includes the following. The experts are determined by a dynamic programming algorithm based on the matching degrees and the group matching model, specifically:
S401: determining the candidate experts for each project from the research-field keywords, and sorting each project's candidate experts by matching degree;
S402: assigning the experts with the highest matching degrees to the corresponding projects in turn, until the number of experts assigned to a project reaches the maximum $\varepsilon$ or the number of projects assigned to an expert reaches the maximum $\sigma$.
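Read literally, S401 and S402 describe a greedy pass. Each project's candidates are sorted by matching degree, and the best experts with spare capacity are taken until the project holds $\varepsilon$ experts. The sketch below follows that greedy reading, although the patent calls the procedure dynamic programming. It does not in general guarantee the maximum total matching degree, and projects are processed in the order given. The keyword-based candidate lists from S401 are assumed to be precomputed, and all identifiers are hypothetical.

```python
from collections import defaultdict


def greedy_assign(M: dict, candidates: dict, eps: int, sigma: int) -> dict:
    """M[(i, j)] is the matching degree of expert i for project j;
    candidates[j] lists the experts whose research keywords fit project j (S401)."""
    load = defaultdict(int)  # number of projects already assigned to each expert
    assignment = {}
    for j, experts in candidates.items():
        ranked = sorted(experts, key=lambda i: M[(i, j)], reverse=True)  # S401
        chosen = []
        for i in ranked:  # S402: best remaining expert first
            if len(chosen) == eps:
                break  # project j already has eps experts
            if load[i] < sigma:  # expert i is below the sigma limit
                chosen.append(i)
                load[i] += 1
        assignment[j] = chosen
    return assignment


M = {("e1", "p1"): 0.9, ("e2", "p1"): 0.7, ("e3", "p1"): 0.4,
     ("e1", "p2"): 0.8, ("e2", "p2"): 0.6, ("e3", "p2"): 0.5}
candidates = {"p1": ["e1", "e2", "e3"], "p2": ["e1", "e2", "e3"]}
result = greedy_assign(M, candidates, eps=2, sigma=1)
```

With $\sigma = 1$, experts e1 and e2 are used up by p1, so p2 receives only e3. If the exact optimum of the group matching model is required, the problem can instead be posed as an integer program or a min-cost flow.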
The invention further discloses an expert recommendation system based on group matching. Referring to Fig. 2, the system comprises:
a web page acquisition module, configured to obtain, by a web crawler, the web page information of each expert in the expert list;
an information extraction module, configured to extract the web page information to obtain the academic information of each expert;
a matching degree calculation module, configured to calculate, from the expert academic information, the matching degree between each expert and each project to be matched;
an expert recommendation module, configured to determine, by a dynamic programming algorithm based on the matching degrees and a group matching model, the experts recommended for the projects to be matched. The group matching model is the correspondence between projects and recommended experts that maximizes the sum of the matching degrees of the experts recommended for all projects.
The system also includes modules, sub-modules, units and sub-units that implement each step of the above method; they are not described again to avoid repetition.
The above embodiments only illustrate the present invention and do not limit it. Those of ordinary skill in the relevant technical field can make various changes and modifications without departing from the spirit and scope of the present invention. All equivalent technical solutions therefore also fall within the scope of the present invention, and the scope of patent protection of the present invention shall be defined by the claims.
Claims (5)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201410680306.6A CN104361102B (en) | 2014-11-24 | 2014-11-24 | A kind of expert recommendation method and system based on group matches |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201410680306.6A CN104361102B (en) | 2014-11-24 | 2014-11-24 | A kind of expert recommendation method and system based on group matches |
Publications (2)
Publication Number | Publication Date |
---|---|
CN104361102A (en) | 2015-02-18 |
CN104361102B (en) | 2018-05-11 |
Family ID: 52528362
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201410680306.6A Active CN104361102B (en) | 2014-11-24 | 2014-11-24 | A kind of expert recommendation method and system based on group matches |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN104361102B (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111160699A (en) * | 2019-11-26 | 2020-05-15 | 清华大学 | Expert recommendation method and system |
Families Citing this family (16)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105260849A (en) * | 2015-10-21 | 2016-01-20 | 内蒙古科技大学 | Scientific researcher evaluation method across social networks |
CN106227771B (en) * | 2016-07-15 | 2019-05-07 | 浙江大学 | A Domain Expert Discovery Method Based on Social Programming Websites |
CN106295147A (en) * | 2016-07-29 | 2017-01-04 | 广州比特软件科技有限公司 | The medical expert's personalized recommendation method solved based on big data and system |
CN106952191A (en) * | 2017-03-09 | 2017-07-14 | 深圳市华第时代科技有限公司 | The automatic reviewing method of motion and system |
CN108255957A (en) * | 2017-12-21 | 2018-07-06 | 杭州传送门网络科技有限公司 | A recommendation matching method based on precise datafication in the venture capital field |
CN108829752A (en) * | 2018-05-25 | 2018-11-16 | 南京邮电大学 | Personalized tutor recommendation algorithm |
CN108549730A (en) * | 2018-06-01 | 2018-09-18 | 云南电网有限责任公司电力科学研究院 | A kind of search method and device of expert info |
CN108873706B (en) * | 2018-07-30 | 2022-04-15 | 中国石油化工股份有限公司 | Trap evaluation intelligent expert recommendation method based on deep neural network |
CN110263135B (en) * | 2019-05-20 | 2022-12-16 | 北京字节跳动网络技术有限公司 | Data exchange matching method, device, medium and electronic equipment |
CN110888964B (en) * | 2019-07-22 | 2023-09-01 | 天津大学 | Expert Secondary Recommendation Method and Device Based on Improved PageRank Algorithm |
CN110956354A (en) * | 2019-08-30 | 2020-04-03 | 深圳传世智慧科技有限公司 | Change management resource matching method, server and change management system |
CN110795640B (en) * | 2019-10-12 | 2023-08-18 | 华中师范大学 | Self-adaptive group recommendation method for compensating group member difference |
CN111090801B (en) * | 2019-12-18 | 2023-06-09 | 创新奇智(青岛)科技有限公司 | Expert human relation map drawing method and system |
CN113516094B (en) * | 2021-07-28 | 2024-03-08 | 中国科学院计算技术研究所 | System and method for matching and evaluating expert for document |
CN113902290B (en) * | 2021-09-14 | 2022-11-04 | 中国人民解放军军事科学院战略评估咨询中心 | Expert matching effectiveness measuring and calculating method facing evaluation task |
CN114510918A (en) * | 2022-02-16 | 2022-05-17 | 数字浙江技术运营有限公司 | Expert matching method and device |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103605665A (en) * | 2013-10-24 | 2014-02-26 | 杭州电子科技大学 | Keyword based evaluation expert intelligent search and recommendation method |
CN103631859A (en) * | 2013-10-24 | 2014-03-12 | 杭州电子科技大学 | Intelligent review expert recommending method for science and technology projects |
CN103823896A (en) * | 2014-03-13 | 2014-05-28 | 蚌埠医学院 | Subject characteristic value algorithm and subject characteristic value algorithm-based project evaluation expert recommendation algorithm |
WO2014107672A1 (en) * | 2013-01-07 | 2014-07-10 | dotbox, inc. | Validated product recommendation system and methods |
2014
- 2014-11-24: CN201410680306.6A filed in CN (CN104361102B, active)
Non-Patent Citations (2)
Title |
---|
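Research on the Construction and Visualization of a Science and Technology Consulting Expert Database Based on Social Networks; 王雪芬 (Wang Xuefen); China Master's Theses Full-text Database, Economics and Management Sciences; 2010-08-15 (Issue 10); J168-2: main text, p. 29, third-to-last through last paragraph; p. 30, paragraph 1 * |
Expert Knowledge Recommendation Based on Network Methods; 许云红 (Xu Yunhong); China Doctoral Dissertations Full-text Database, Economics and Management Sciences; 2010-10-15 (Issue 10); J152-20: main text, p. 13, Section 2.3.1; p. 53, paragraph 1; p. 54, paragraph 1; pp. 56-58, Section 5.3 * |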
Also Published As
Publication number | Publication date |
---|---|
CN104361102A (en) | 2015-02-18 |
Similar Documents
Publication | Title | Publication Date |
---|---|---|
CN104361102B (en) | A kind of expert recommendation method and system based on group matches | |
Lao et al. | Fast query execution for retrieval models based on path-constrained random walks | |
CN114238573B (en) | Text countercheck sample-based information pushing method and device | |
Li | Learning to rank for information retrieval and natural language processing | |
He et al. | Context-aware citation recommendation | |
Kong et al. | Exploring dynamic research interest and academic influence for scientific collaborator recommendation | |
CN103886054B (en) | Personalization recommendation system and method of network teaching resources | |
CN108280114B (en) | Deep learning-based user literature reading interest analysis method | |
CN102508859A (en) | Advertisement classification method and device based on webpage characteristic | |
Amami et al. | A graph based approach to scientific paper recommendation | |
CN112989215B (en) | A Knowledge Graph Enhanced Recommendation System Based on Sparse User Behavior Data | |
Deng et al. | Enhanced models for expertise retrieval using community-aware strategies | |
Noel et al. | Applicability of Latent Dirichlet Allocation to multi-disk search | |
CN110310012B (en) | Data analysis method, device, equipment and computer readable storage medium | |
Wang et al. | Multi-task representation learning for demographic prediction | |
Utama et al. | Scientific Articles Recommendation System Based On User’s Relatedness Using Item-Based Collaborative Filtering Method | |
Zhao et al. | ST-LDA: high quality similar words augmented LDA for service clustering | |
Banaei et al. | Web page rank estimation in search engine based on SEO parameters using machine learning techniques | |
Kaur | Web content classification: a survey | |
Azzam et al. | A question routing technique using deep neural network for communities of question answering | |
Lu et al. | Learning multimodal neural network with ranking examples | |
Manolopoulos et al. | Metrics and rankings: Myths and fallacies | |
Mehrotra et al. | Task-Based User Modelling for Personalization via Probabilistic Matrix Factorization. | |
Ma | Research on digital English teaching materials recommendation based on improved machine learning | |
Roy et al. | Automated resume classification using machine learning |
Legal Events
Code | Title |
---|---|
C06 | Publication |
PB01 | Publication |
C10 | Entry into substantive examination |
SE01 | Entry into force of request for substantive examination |
GR01 | Patent grant |
GR01 | Patent grant |