CN111008285B - Author disambiguation method based on thesis key attribute network - Google Patents

Publication number
CN111008285B
Authority
CN
China
Prior art keywords
list
author
name
node
network
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201911207075.6A
Other languages
Chinese (zh)
Other versions
CN111008285A (en)
Inventor
冯凯
康锐文
王元卓
刘冰冰
彭亮
贾士杨
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Science And Technology Big Data Research Institute
Original Assignee
Big Data Research Institute Institute Of Computing Technology Chinese Academy Of Sciences
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Big Data Research Institute Institute Of Computing Technology Chinese Academy Of Sciences
Priority to CN201911207075.6A
Publication of CN111008285A
Application granted
Publication of CN111008285B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/38 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/901 Indexing; Data structures therefor; Storage structures
    • G06F16/9024 Graphs; Linked lists

Abstract

The invention discloses an author disambiguation method based on a paper key attribute network. The key attribute relationship network is built by collecting the key attributes of papers and linking them through their associations: a co-author network, a same-institution network and a same-field network are formed separately and then combined into the key attribute relationship network. By extracting the paper title, author institution and author field from each paper and building the relationship network around the author name, the method matches the author's name during disambiguation and then checks the associated institution and field in the network, which effectively handles the case where one name corresponds to different actual authors. In addition, by using the paper title to find the co-authors of the author being disambiguated and then matching again through those co-authors, the method effectively handles the case where one actual author's name is written in different ways.

Description

Author disambiguation method based on thesis key attribute network
Technical Field
The invention belongs to the technical field of paper author disambiguation, covering both one author with several name forms and different authors with the same name, and particularly relates to an author disambiguation method based on a paper key attribute network.
Background Art
In recent years, with the development of the Internet, every aspect of life has become closely tied to it, and academic activity is no exception. Most academic results can now be queried online. With massive amounts of data, however, accurately retrieving the data one needs becomes particularly important. At present, most paper platforms support searching by author to retrieve the papers that author has published, so the accuracy of the author name is critical. In practice, two situations commonly arise.
The first is that the same author's name may be written in different ways across papers. For example, an author named "Zhang San" may appear as "San Zhang" in some foreign-language literature, or in abbreviated form such as "Zhang S.".
The second is that different authors share the same name. For example, two authors at different institutions are both called "Li Si", or two authors whose names differ in the original script are both written as "Wu Wang" in the corresponding-author field of some foreign-language documents.
Both situations make paper retrieval difficult. Many existing paper search engines query by plain string matching; as data volume grows, the accuracy of the results largely cannot be guaranteed, and in most cases the results must be checked manually. As the demand for accurate author identification has risen, many author disambiguation methods have appeared, but traditional methods only perform simple matching along dimensions such as institution, keywords and publication information. As data volume grows, these methods return disorganized results that researchers must spend a long time screening afterwards, which seriously reduces research efficiency.
Disclosure of Invention
The invention provides an author disambiguation method based on a paper key attribute network. It is motivated by the current need to disambiguate paper authors and by the limited effectiveness of traditional disambiguation methods on large data volumes. The method merges records in which the same actual author's name is written in different ways, and separates records that share a name but correspond to different actual authors.
The technical scheme adopted to achieve this purpose is as follows. An author disambiguation method based on a paper key attribute network establishes a key attribute relationship network, that is, a network formed by collecting the key attributes of papers and linking them through their associations. The entity nodes in the network mainly comprise: author name, author institution, author field and paper title. Authors are clustered along three dimensions (paper title, institution and field), forming a co-author network, a same-institution network and a same-field network, which together make up the key attribute relationship network. The implementation logic of the author disambiguation method based on the key attribute relationship network comprises the following steps 1 to 7.
Step 1: input unit A1 into the relationship network.
Step 2: insert the fields, institutions and paper title of unit A1 into the relationship network using a Merge operation.
Step 3: query whether any N node in the relationship network is identical to N1 in A1.
Step 4: if an identical node exists, enter FLOW1, which mainly judges whether the same name corresponds to different actual authors.
FLOW1 comprises the following steps (1) to (7).
(1) Take the lists of fields (F) and institutions (O) associated with the N node identical to N1, denoted F-List and O-List respectively.
(2) Match the F associated with N1 against F-List; each successful match has weight 1. Compute the field weight sum, denoted SumWeightField.
(3) Match the O associated with N1 against O-List; each successful match has weight 2. Compute the institution weight sum, denoted SumWeightOrg.
(4) Compute the total weight: SumWeight = SumWeightField + SumWeightOrg.
(5) If SumWeight > 2, mark N1 and the matched N node as the same person.
(6) If SumWeight ≤ 2, mark N1 and the matched N node as two different persons.
(7) Output the result.
Step 5: if no identical node exists, enter FLOW2, which mainly judges whether the same actual author's name is written in different ways.
FLOW2 comprises the following steps (1) to (8).
(1) From the relationship network, take the list of paper-title nodes identical to the paper title (T) of A1 (Title-List), the list of field nodes identical to the field (F) of A1 (Field-List), and the list of institution nodes identical to the institution (O) of A1 (Org-List).
(2) The author-name nodes (N nodes) associated with Title-List link a paper's author to his or her co-authors. By querying the co-authors of N1 and then querying back from them, potentially matching authors are screened: the resulting list contains the authors reached again through the co-authors who collaborated with N1 in A1. This rests on the practical observation that an author who has collaborated with N1 may have done so more than once. The main steps are:
a) Query the author names associated with Title-List, giving N-List.
b) Query the paper titles associated with N-List, giving T-List.
c) Query the author names associated with T-List and output them as N-Title-List.
(3) Query the author-name nodes associated with Field-List and output them as N-Field-List.
(4) Query the author-name nodes associated with Org-List and output them as N-Org-List.
(5) Match N1 for name relevance against N-Title-List, N-Field-List and N-Org-List; record the results as Ret-Title-List, Ret-Field-List and Ret-Org-List, with weights 3, 2 and 1 respectively.
(6) Aggregate Ret-Title-List, Ret-Field-List and Ret-Org-List by value, take their intersection, compute the weight sum SumWeight of each aggregated result, and output the result set as Ret-List.
(7) Take the entry in Ret-List with the highest SumWeight. If SumWeight > 4, it is the same author; if SumWeight ≤ 4, it is a different author.
(8) Among the entries whose weight sum is the highest and exceeds 4, perform name relevance matching again and take the one with the highest relevance.
Step 6: feed the result of step 4 or step 5 back into the relationship network. If the author is new, insert a new author-name node; otherwise update the existing author-name node and add the new alias to it.
Step 7: repeat the above 6 steps, so that disambiguation is performed while the relationship network is being built.
The unit mentioned above: one input of information is called a unit. A unit is one entry of the author information list extracted from a paper and comprises: author name (N), field (F), institution (O) and paper title (T). A1 denotes a specific unit; the same notation is used in the rest of this document and is not explained again.
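Sub-steps a) to c) of FLOW2 step (2) above amount to two hops over title-to-author links. The following is a minimal sketch of that expansion, not the patented implementation: the function name `coauthor_candidates` and the two dictionary indexes `authors_of` and `titles_of` are hypothetical, and titles and names are assumed to be already normalized strings.

```python
def coauthor_candidates(title_list, authors_of, titles_of):
    """Expand Title-List into N-Title-List via co-authorship links.

    authors_of: dict mapping a paper title to the set of its author names
    titles_of:  dict mapping an author name to the set of their paper titles
    (both indexes are assumptions made for this sketch)
    """
    # a) authors linked to the titles in Title-List -> N-List
    n_list = set()
    for title in title_list:
        n_list |= authors_of.get(title, set())

    # b) every title written by anyone in N-List -> T-List
    t_list = set()
    for name in n_list:
        t_list |= titles_of.get(name, set())

    # c) every author of a title in T-List -> N-Title-List
    n_title_list = set()
    for title in t_list:
        n_title_list |= authors_of.get(title, set())
    return n_title_list


# Hypothetical toy data: "li si" co-wrote p1 with "wu wang" and p2 with "wang w".
authors_of = {"p1": {"li si", "wu wang"}, "p2": {"li si", "wang w"}}
titles_of = {"li si": {"p1", "p2"}, "wu wang": {"p1"}, "wang w": {"p2"}}
candidates = coauthor_candidates(["p1"], authors_of, titles_of)
```

Starting from paper p1, the expansion reaches "wang w" through the shared co-author "li si", which is the kind of candidate the later name-relevance matching is meant to examine.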
Step 1 inputs the unit data into the relationship network and comprises the following steps (1) and (2).
(1) Convert all characters of the author name, paper title, field and institution to lower case.
(2) Remove special characters such as "-" from the data.
Step 2 comprises the following steps (1) to (5).
(1) Extract the field nodes in A1 and insert them into the relationship network one by one, where F1 is one field node in A1.
(2) Determine whether a node identical to F1 exists in the relationship network.
(3) If so, skip it.
(4) If not, insert it into the relationship network.
(5) Repeat steps (1) to (4) for the institution and paper-title nodes.
The beneficial effects of the invention are as follows. By extracting the paper title, author institution and author field from each paper and building a relationship network around the author name, the method matches the author's name during disambiguation and then checks the associated institution and field in the network, which effectively handles the case where one name corresponds to different actual authors. In addition, by using the paper title to find the co-authors (Coop Author List) of the author being disambiguated and then matching again through the co-authors in the Coop Author List, the method effectively handles the case where one actual author's name is written in different ways.
Drawings
FIG. 1 is an exemplary diagram of a key attribute relationship network.
FIG. 2 is a general flow diagram of a disambiguation method.
FIG. 3 is a flow chart of FLOW1.
FIG. 4 is a flow chart of FLOW2.
Detailed Description
The invention provides an author disambiguation method based on a paper key attribute network, motivated by the current need to disambiguate paper authors and by the limited effectiveness of traditional disambiguation methods on large data volumes. By extracting the paper title, author institution and author field from each paper and building a relationship network around the author name, the method matches the author's name during disambiguation and then checks the associated institution and field in the network, which effectively handles the case where one name corresponds to different actual authors. In addition, by using the paper title to find the co-authors of the author being disambiguated and then matching again through those co-authors, it effectively handles the case where one actual author's name is written in different ways. A brief description of the relationship network is given first, followed by the logic for implementing the method.
The key attribute relationship network is formed by collecting the key attributes of papers and linking them through their associations. Its entity nodes mainly comprise: author name, author institution, author field and paper title. Authors are clustered along three dimensions (paper title, institution and field), forming a co-author network, a same-institution network and a same-field network, which together make up the key attribute relationship network.
FIG. 1 is an exemplary diagram of a key attribute relationship network, in which N denotes an author name, F a field, O an institution and T a paper title; the network is formed by the associations between these nodes. Unit: for convenience of description, one input of information is called a unit. A unit is one entry of the author information list extracted from a paper and comprises: author name (N), field (F), institution (O) and paper title (T). A1 denotes a specific unit; the same notation is used in the rest of this document and is not explained again.
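To make the node and unit vocabulary concrete, here is one possible in-memory representation. It is a sketch under assumptions, not the structure used by the invention: the class names `Unit` and `KeyAttributeNetwork`, the method names, and the choice of plain dictionaries are all hypothetical, and author nodes are keyed by raw name purely for brevity (so this structure alone performs no disambiguation).

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Unit:
    """One author entry extracted from a paper: N, F, O and T."""
    name: str                # N: author name
    fields: Tuple[str, ...]  # F: one or more research fields
    orgs: Tuple[str, ...]    # O: one or more institutions
    title: str               # T: paper title


class KeyAttributeNetwork:
    """Bipartite links between author names and their F/O/T attribute nodes."""

    def __init__(self):
        # (kind, value) attribute node -> set of linked author names
        self.authors_of = defaultdict(set)
        # author name -> set of linked (kind, value) attribute nodes
        self.attrs_of = defaultdict(set)

    def link(self, unit):
        """Connect the unit's author name to each of its attribute nodes."""
        attrs = [("F", f) for f in unit.fields]
        attrs += [("O", o) for o in unit.orgs]
        attrs.append(("T", unit.title))
        for attr in attrs:
            self.authors_of[attr].add(unit.name)
            self.attrs_of[unit.name].add(attr)

    def authors_sharing(self, kind, value):
        """Author names linked to one attribute node, e.g. co-authors of a title."""
        return set(self.authors_of.get((kind, value), ()))


net = KeyAttributeNetwork()
net.link(Unit("li si", ("nlp",), ("cas",), "p1"))
net.link(Unit("wu wang", ("nlp",), ("ict",), "p1"))
```

With this layout the co-author, same-institution and same-field networks are simply the three views obtained by calling `authors_sharing` with kind "T", "O" or "F".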
The implementation logic of the author disambiguation method based on the key attribute relationship network is described in detail below. FIG. 2 is a general flow chart of the disambiguation method.
The flow in FIG. 2 is as follows.
(1) Input unit A1 into the relationship network.
(2) Insert the fields, institutions and paper title of unit A1 into the relationship network using a Merge operation.
(3) Query whether any N node in the relationship network is identical to N1 in A1.
(4) If an identical node exists, enter FLOW1, which mainly judges whether the same name corresponds to different actual authors.
(5) If no identical node exists, enter FLOW2, which mainly judges whether the same actual author's name is written in different ways.
(6) Feed the result of step (4) or (5) back into the relationship network. If the author is new, insert a new author-name node; otherwise update the existing author-name node and add the new alias to it.
(7) Repeat steps (1) to (6), so that disambiguation is performed while the relationship network is being built.
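The control flow of steps (2) to (6) can be sketched as a single function applied to each incoming unit. This is an illustrative skeleton only: `process_unit`, the dictionary layout of `network`, and the author records with a `names` alias set are assumptions, and `flow1` and `flow2` are hypothetical callables standing in for FLOW1 and FLOW2. The unit is assumed to have been normalized already (step (1)).

```python
def process_unit(network, unit, flow1, flow2):
    """Run steps (2) to (6) for one unit and return the author record it maps to.

    flow1(author_record, unit) -> bool: True if the same-named record is the same person
    flow2(network, unit) -> an existing author record, or None if no alias match
    """
    kinds = ("fields", "orgs", "titles")

    # Step (2): merge attribute nodes; set.update ignores values already present.
    for kind in kinds:
        network[kind].update(unit[kind])

    # Step (3): look for author records that already carry this exact name.
    name = unit["name"]
    same_name = [a for a in network["authors"] if name in a["names"]]

    if same_name:
        # Step (4): FLOW1 decides whether a same-named record is the same person.
        matched = next((a for a in same_name if flow1(a, unit)), None)
    else:
        # Step (5): FLOW2 looks for an existing author under a different spelling.
        matched = flow2(network, unit)

    # Step (6): insert a new author node, or update the match and record the alias.
    if matched is None:
        matched = {"names": {name}, "fields": set(), "orgs": set(), "titles": set()}
        network["authors"].append(matched)
    else:
        matched["names"].add(name)
    for kind in kinds:
        matched[kind].update(unit[kind])
    return matched


# Stub decision functions, for demonstration only.
def stub_flow1(author, unit):
    return bool(author["orgs"] & set(unit["orgs"]))

def stub_flow2(network, unit):
    return None

network = {"fields": set(), "orgs": set(), "titles": set(), "authors": []}
u1 = {"name": "wu wang", "fields": ["nlp"], "orgs": ["cas"], "titles": ["p1"]}
u2 = {"name": "wu wang", "fields": ["nlp"], "orgs": ["cas"], "titles": ["p2"]}
u3 = {"name": "wu wang", "fields": ["optics"], "orgs": ["mit"], "titles": ["p3"]}
records = [process_unit(network, u, stub_flow1, stub_flow2) for u in (u1, u2, u3)]
```

Step (7) then corresponds to calling `process_unit` for every unit in the input, as the list comprehension does here.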
Step (1) inputs the unit data into the relationship network and mainly comprises:
1. Convert all characters of the author name, paper title, field and institution to lower case.
2. Remove special characters such as "-" from the data.
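The two normalization steps above can be sketched as follows. The text names only "-" as an example of a special character, so the exact character set here is an assumption: this sketch drops every character that is neither a word character nor whitespace, and additionally collapses repeated whitespace, which the source does not mention. The function name `normalize` is hypothetical.

```python
import re

# Assumption: "special characters" means anything that is not a letter, digit,
# underscore or whitespace. Unicode letters such as CJK characters are kept.
_SPECIAL = re.compile(r"[^\w\s]")


def normalize(text):
    """Lower-case a field value and strip special characters."""
    lowered = text.lower()               # step 1: convert to lower case
    stripped = _SPECIAL.sub("", lowered)  # step 2: remove special characters
    return " ".join(stripped.split())     # extra: collapse runs of whitespace


hyphenated = normalize("Zhang-San")
abbreviated = normalize("Zhang S.")
field = normalize("  Data  Mining! ")
```

Applying the same function to names, titles, fields and institutions before insertion means the later "identical node" checks reduce to plain string equality.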
Step (2) mainly comprises:
1. Extract the field nodes in A1 and insert them into the relationship network one by one, where F1 is one field node in A1.
2. Determine whether a node identical to F1 exists in the relationship network.
3. If so, skip it.
4. If not, insert it into the relationship network.
5. Repeat steps 1 to 4 for the institution and paper-title nodes.
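The Merge operation described above is an insert-if-absent over each kind of attribute node. A minimal sketch, assuming the network stores one set of node values per kind; the function name `merge_nodes` and the kind labels "F", "O" and "T" used as dictionary keys are hypothetical.

```python
def merge_nodes(network, kind, values):
    """Insert each value as a node of the given kind unless it already exists.

    Returns the values that were actually inserted, in input order.
    """
    existing = network.setdefault(kind, set())
    inserted = []
    for value in values:
        if value in existing:   # steps 2-3: identical node found, skip it
            continue
        existing.add(value)     # step 4: no identical node, insert it
        inserted.append(value)
    return inserted


network = {}
first = merge_nodes(network, "F", ["nlp", "data mining"])
second = merge_nodes(network, "F", ["nlp", "graphs"])
# Step 5: the same routine is reused for institutions and paper titles.
orgs = merge_nodes(network, "O", ["cas"])
```

In a graph database the same effect is typically obtained with an upsert-style statement; the set-based version here only illustrates the logic.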
In step (4), when the relationship network is found to contain an N node identical to N1, FLOW1 is performed. FIG. 3 is the FLOW1 flow chart, described in detail below.
1. Take the lists of fields (F) and institutions (O) associated with the N node identical to N1, denoted F-List and O-List respectively.
2. Match the F associated with N1 against F-List; each successful match has weight 1. Compute the field weight sum, denoted SumWeightField.
3. Match the O associated with N1 against O-List; each successful match has weight 2. Compute the institution weight sum, denoted SumWeightOrg.
4. Compute the total weight: SumWeight = SumWeightField + SumWeightOrg.
5. If SumWeight > 2, mark N1 and the matched N node as the same person.
6. If SumWeight ≤ 2, mark N1 and the matched N node as two different persons.
7. Output the result.
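The FLOW1 decision rule is a small weighted count, sketched below. The weights (1 per field match, 2 per institution match) and the threshold (SumWeight > 2) come from the steps above; the function name `flow1_same_person`, the use of exact string equality for "matching", and counting each distinct value once are assumptions of this sketch.

```python
FIELD_WEIGHT = 1  # weight of each matching field (step 2)
ORG_WEIGHT = 2    # weight of each matching institution (step 3)
THRESHOLD = 2     # SumWeight must exceed this to be the same person (steps 5-6)


def flow1_same_person(new_fields, new_orgs, f_list, o_list):
    """Return (is_same_person, sum_weight) for a unit and a same-named N node.

    new_fields / new_orgs: F and O values of the incoming unit
    f_list / o_list:       F-List and O-List of the existing N node
    """
    sum_weight_field = FIELD_WEIGHT * len(set(new_fields) & set(f_list))
    sum_weight_org = ORG_WEIGHT * len(set(new_orgs) & set(o_list))
    sum_weight = sum_weight_field + sum_weight_org
    return sum_weight > THRESHOLD, sum_weight


# One shared field plus one shared institution: 1 + 2 = 3, above the threshold.
both = flow1_same_person(["data mining"], ["cas"], ["data mining", "nlp"], ["cas"])
# A shared institution alone scores exactly 2, which is not enough.
org_only = flow1_same_person([], ["cas"], [], ["cas"])
```

Under this reading, a single shared institution never suffices on its own; at least one further shared field or institution is needed to merge two same-named records.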
In step (5), when the relationship network is found to contain no N node identical to N1, FLOW2 is performed. FIG. 4 is the FLOW2 flow chart, described in detail below.
1. From the relationship network, take the list of paper-title nodes identical to the paper title (T) of A1 (Title-List), the list of field nodes identical to the field (F) of A1 (Field-List), and the list of institution nodes identical to the institution (O) of A1 (Org-List).
2. The author-name nodes (N nodes) associated with Title-List link a paper's author to his or her co-authors. By querying the co-authors of N1 and then querying back from them, potentially matching authors are screened: the resulting list contains the authors reached again through the co-authors who collaborated with N1 in A1. This rests on the practical observation that an author who has collaborated with N1 may have done so more than once. The main steps are:
a) Query the author names associated with Title-List, giving N-List.
b) Query the paper titles associated with N-List, giving T-List.
c) Query the author names associated with T-List and output them as N-Title-List.
3. Query the author-name nodes associated with Field-List and output them as N-Field-List.
4. Query the author-name nodes associated with Org-List and output them as N-Org-List.
5. Match N1 for name relevance against N-Title-List, N-Field-List and N-Org-List; record the results as Ret-Title-List, Ret-Field-List and Ret-Org-List, with weights 3, 2 and 1 respectively.
6. Aggregate Ret-Title-List, Ret-Field-List and Ret-Org-List by value, take their intersection, compute the weight sum SumWeight of each aggregated result, and output the result set as Ret-List.
7. Take the entry in Ret-List with the highest SumWeight. If SumWeight > 4, it is the same author; if SumWeight ≤ 4, it is a different author.
8. Among the entries whose weight sum is the highest and exceeds 4, perform name relevance matching again and take the one with the highest relevance.
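Steps 5 to 8 of FLOW2 can be sketched as a weighted vote over candidate names. The weights (3, 2, 1) and the threshold (SumWeight > 4) come from the text; everything else is an assumption made for illustration. In particular, the source does not define how name relevance is computed, so `name_similarity` below is a hypothetical token-overlap measure that treats a single letter as matching a token it starts, and `MIN_SIMILARITY` is an invented cutoff. Aggregation is done by summing weights per candidate; because no single list contributes more than 3, only a candidate present in at least two lists can exceed 4, which plays the role of the intersection step.

```python
WEIGHTS = {"title": 3, "field": 2, "org": 1}  # weights from step 5
THRESHOLD = 4                                  # from step 7
MIN_SIMILARITY = 0.5                           # hypothetical relevance cutoff


def name_similarity(a, b):
    """Hypothetical relevance: share of name tokens that match, initials allowed."""
    ta, tb = a.split(), b.split()
    if not ta or not tb:
        return 0.0

    def match(x, y):
        return (x == y
                or (len(x) == 1 and y.startswith(x))
                or (len(y) == 1 and x.startswith(y)))

    hits = sum(1 for x in ta if any(match(x, y) for y in tb))
    return hits / max(len(ta), len(tb))


def flow2_best_match(n1, n_title_list, n_field_list, n_org_list):
    """Return the existing author name judged to be N1, or None."""
    lists = {"title": n_title_list, "field": n_field_list, "org": n_org_list}
    sum_weight = {}
    # Steps 5-6: keep relevant candidates and add up their list weights.
    for kind, names in lists.items():
        for cand in set(names):
            if name_similarity(n1, cand) >= MIN_SIMILARITY:
                sum_weight[cand] = sum_weight.get(cand, 0) + WEIGHTS[kind]
    if not sum_weight:
        return None
    # Step 7: the highest SumWeight must exceed the threshold.
    best = max(sum_weight.values())
    if best <= THRESHOLD:
        return None
    # Step 8: among the top-scoring candidates, prefer the most similar name.
    tied = [c for c, w in sum_weight.items() if w == best]
    return max(tied, key=lambda c: name_similarity(n1, c))


hit = flow2_best_match("san zhang", {"zhang san", "li si"}, {"zhang san"}, {"li si"})
miss = flow2_best_match("san zhang", {"li si"}, {"zhang san"}, {"zhang san"})
```

In the first call "zhang san" appears in the title and field lists (3 + 2 = 5) and is accepted; in the second it appears only in the field and institution lists (2 + 1 = 3) and is rejected, so a co-authorship link is effectively required.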

Claims (3)

1. An author disambiguation method based on a paper key attribute network, characterized in that a key attribute relationship network is established, the key attribute relationship network being formed by collecting the key attributes of papers and linking them through their associations, the entity nodes in the relationship network comprising: author name, author institution, author field and paper title; authors are clustered along three dimensions, namely paper title, institution and field, so as to form a co-author relationship network, a same-institution relationship network and a same-field relationship network, which together form the key attribute relationship network; the implementation logic of the author disambiguation method based on the key attribute relationship network comprises the following steps:
step 1: inputting unit A1 into the relationship network;
step 2: inserting the fields, institutions and paper title of unit A1 into the relationship network by a Merge operation;
step 3: querying whether any N node in the relationship network is identical to N1 in A1;
step 4: if an identical node exists, entering FLOW1 to judge whether the same name corresponds to different actual authors; FLOW1 comprises the following steps (1) to (7):
(1) taking the lists of fields F and institutions O associated with the N node identical to N1, denoted F-List and O-List respectively;
(2) matching the F associated with N1 against F-List, each successful match having weight 1, and computing the field weight sum, denoted SumWeightField;
(3) matching the O associated with N1 against O-List, each successful match having weight 2, and computing the institution weight sum, denoted SumWeightOrg;
(4) computing the total weight: SumWeight = SumWeightField + SumWeightOrg;
(5) if SumWeight > 2, marking N1 and the matched N node as the same person;
(6) if SumWeight ≤ 2, marking N1 and the matched N node as two different persons;
(7) outputting the result;
step 5: if no identical node exists, entering FLOW2 to judge whether the same actual author's name is written in different ways; FLOW2 comprises the following steps (1) to (8):
(1) taking, from the relationship network, the list Title-List of paper-title nodes identical to the paper title T of A1, the list Field-List of field nodes identical to the field F of A1, and the list Org-List of institution nodes identical to the institution O of A1;
(2) using the author-name nodes, namely N nodes, associated with Title-List to link a paper's author to his or her co-authors, querying the co-authors of N1 and then querying back from them so as to screen matching authors, the resulting list containing the authors reached again through the co-authors who collaborated with N1 in A1, on the basis of the practical observation that an author who has collaborated with N1 may collaborate with N1 more than once, with the following steps:
a) querying the author names associated with Title-List, giving N-List;
b) querying the paper titles associated with N-List, giving T-List;
c) querying the author names associated with T-List and outputting them as N-Title-List;
(3) querying the author-name nodes associated with Field-List and outputting them as N-Field-List;
(4) querying the author-name nodes associated with Org-List and outputting them as N-Org-List;
(5) matching N1 for name relevance against N-Title-List, N-Field-List and N-Org-List, and recording the results as Ret-Title-List, Ret-Field-List and Ret-Org-List, with weights 3, 2 and 1 respectively;
(6) aggregating Ret-Title-List, Ret-Field-List and Ret-Org-List by value, taking their intersection, computing the weight sum SumWeight of each aggregated result, and outputting the result set as Ret-List;
(7) taking the entry in Ret-List with the highest SumWeight; if SumWeight > 4, it is the same author, and if SumWeight ≤ 4, it is a different author;
(8) among the entries whose weight sum is the highest and exceeds 4, performing name relevance matching again and taking the one with the highest relevance;
step 6: feeding the result of step 4 or step 5 back into the relationship network; if the author is new, inserting a new author-name node into the relationship network, and otherwise updating the existing author-name node and adding the new alias to it;
step 7: repeating the above 6 steps, so that disambiguation is performed while the relationship network is being built;
the unit is defined as follows: one input of information is called a unit, a unit being one entry of the author information list extracted from a paper and comprising: author name N, field F, institution O and paper title T; A1 denotes a specific unit.
2. The author disambiguation method based on a paper key attribute network as claimed in claim 1, wherein step 1, inputting the unit data into the relationship network, comprises the following steps:
(1) converting all characters of the author name, paper title, field and institution to lower case;
(2) removing special characters from the data.
3. The author disambiguation method based on a paper key attribute network as claimed in claim 1, wherein step 2 comprises the following steps:
(1) extracting the field nodes in A1 and inserting them into the relationship network one by one, wherein F1 is one field node in A1;
(2) determining whether a node identical to F1 exists in the relationship network;
(3) if so, skipping it;
(4) if not, inserting it into the relationship network;
(5) repeating steps (1) to (4) for the institution and paper-title nodes.
CN201911207075.6A 2019-11-29 2019-11-29 Author disambiguation method based on thesis key attribute network Active CN111008285B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911207075.6A CN111008285B (en) 2019-11-29 2019-11-29 Author disambiguation method based on thesis key attribute network

Publications (2)

Publication Number Publication Date
CN111008285A (en) 2020-04-14
CN111008285B (en) 2021-04-13

Family

ID=70113498

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911207075.6A Active CN111008285B (en) 2019-11-29 2019-11-29 Author disambiguation method based on thesis key attribute network

Country Status (1)

Country Link
CN (1) CN111008285B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112487825A (en) * 2020-11-30 2021-03-12 北京航空航天大学 Talent information database disambiguation system
CN112528089B (en) * 2020-12-04 2023-11-14 平安科技(深圳)有限公司 Method, device and computer equipment for disambiguating paper authors

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20080065623A1 (en) * 2006-09-08 2008-03-13 Microsoft Corporation Person disambiguation using name entity extraction-based clustering
CN104182420A (en) * 2013-05-27 2014-12-03 华东师范大学 Ontology-based Chinese name disambiguation method
CN106055539A (en) * 2016-05-27 2016-10-26 中国科学技术信息研究所 Name disambiguation method and apparatus
CN109558494A (en) * 2018-10-29 2019-04-02 中国科学院计算机网络信息中心 A kind of scholar's name disambiguation method based on heterogeneous network insertion

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN100583804C (en) * 2007-06-22 2010-01-20 清华大学 Method and system for processing social network expert information based on expert value propagation algorithm

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Scholar search-oriented author disambiguation;Hao Wu等;《2012 9th International Conference on Fuzzy Systems and Knowledge Discovery》;20120709;第1-5页 *
文献数据库中作者同名消歧研究;张文静;《中国优秀硕士学位论文全文数据库 信息科技辑》;20190915(第9期);第I138-568页 *

Also Published As

Publication number Publication date
CN111008285A (en) 2020-04-14

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CP01 Change in the name or title of a patent holder

Address after: 450000 8 / F, creative island building, no.6, Zhongdao East Road, Zhengdong New District, Zhengzhou City, Henan Province

Patentee after: China Science and technology big data Research Institute

Address before: 450000 8 / F, creative island building, no.6, Zhongdao East Road, Zhengdong New District, Zhengzhou City, Henan Province

Patentee before: Big data Research Institute Institute of computing technology Chinese Academy of Sciences

OL01 Intention to license declared