CN109558494A - A scholar name disambiguation method based on heterogeneous network embedding - Google Patents

A scholar name disambiguation method based on heterogeneous network embedding

Info

Publication number
CN109558494A
CN109558494A CN201811267181.9A CN201811267181A CN109558494A CN 109558494 A CN109558494 A CN 109558494A CN 201811267181 A CN201811267181 A CN 201811267181A CN 109558494 A CN109558494 A CN 109558494A
Authority
CN
China
Prior art keywords
paper
path
node
author
heterogeneous network
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201811267181.9A
Other languages
Chinese (zh)
Inventor
杜一
乔子越
周园春
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Computer Network Information Center of CAS
Original Assignee
Computer Network Information Center of CAS
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Computer Network Information Center of CAS
Priority to CN201811267181.9A
Publication of CN109558494A
Legal status: Pending

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Machine Translation (AREA)

Abstract

The invention discloses a scholar name disambiguation method based on heterogeneous network embedding. The steps are: 1) specify multiple author names to be disambiguated, collect all papers related to those authors, and generate a paper-relationship heterogeneous network from the authors and semantic information of the collected papers; 2) on this network, generate paths with a meta-path-based random walk strategy, so that the paths capture the text information of the neighbor nodes of paper nodes, and save the paths as a training corpus; 3) train a Skip-gram model on the corpus to generate a representation vector for each paper; 4) for an author name specified in step 1), retrieve the representation vectors of that author's papers from the vectors obtained in step 3); 5) cluster the vectors retrieved in step 4) into several clusters, thereby disambiguating the author name.

Description

A scholar name disambiguation method based on heterogeneous network embedding
Technical field
The present invention relates to the technical fields of big data, knowledge graphs, entity disambiguation, and heterogeneous network embedding. Specifically, it is a technique that disambiguates scholar names with an unsupervised method that learns representation vectors for the nodes of a heterogeneous network through meta-path-based random walks.
Background
When building a knowledge base of scientific and technical literature, the problem of author name disambiguation arises frequently. Among the massive number of documents in such a knowledge base, many authors share the same name, which reduces the accuracy of name retrieval, of mining relationships between people, and of person-similarity association. For example, a search for an author name returns the papers written by all authors with that name. To solve this problem, clustering is generally used to assign the retrieved papers to different author entities. The clustering can use features such as the co-author relationships of the papers, the names of the journals in which they were published, and the similarity of their titles, so that partitioning the papers also separates the different authors who share the name. The question is how to make good use of this feature information.
Researchers have proposed many solutions to this name disambiguation problem. The most common approach is to construct a representation vector for each paper from its feature information and to distinguish papers by the distribution of these vectors. Going further, one can build a paper network and use its structural information to project the paper feature vectors into a latent space with stronger representational power, so that in the new vector space highly similar papers lie closer together while dissimilar or unrelated papers lie farther apart.
Summary of the invention
To address the shortcomings of existing author name disambiguation methods for scientific and technical literature knowledge bases, the present invention provides an author named-entity disambiguation method based on network embedding learned through meta-path random walks over a heterogeneous paper network. The method uses the authors of each paper, the journal in which it was published, and text information such as its title, keywords, and abstract. It builds a heterogeneous network to establish the structural relationships between papers, learns a representation vector for each paper through network embedding of that network, and clusters the papers according to these vectors to disambiguate academic author names.
The present invention specifically includes the following steps:
Step 1: Collect from the paper library all papers related to the authors to be disambiguated, and build a paper-relationship heterogeneous network from the authors of these papers, the names of the journals in which they were published, and the semantic information of the papers (including the title, keywords, and abstract).
Step 2: On the paper-relationship heterogeneous network generated in Step 1, generate paths with a meta-path-based random walk strategy, so that the paths capture the text information of the neighbor nodes of paper nodes, and save these paths as the training corpus for the Skip-gram model in the next step.
Step 3: Using the corpus of random walk paths generated in Step 2, learn the paper representation vectors with the Skip-gram model.
Step 4: For an author name to be disambiguated in Step 1, collect the representation vectors of that author's papers and, given the number of clusters K, cluster these vectors with agglomerative hierarchical clustering. Each resulting cluster represents the set of papers written by one of the distinct authors who share the name, which achieves the disambiguation of the author name.
Compared with previous related methods, the advantages and contributions of the scholar name disambiguation method based on heterogeneous network embedding are mainly the following:
1. A method based on heterogeneous network representation learning is proposed. A paper-relationship heterogeneous network is built, paths that capture the text information of the neighbor nodes of paper nodes are generated with a meta-path-based random walk strategy, and these paths form a training corpus. A Skip-gram model then efficiently learns, for each paper node, a vector in a low-dimensional latent space, so that papers with more shared authors, the same journal, and more similar titles lie closer together in that space, while papers that do not meet these conditions lie farther apart.
2. By building the heterogeneous relation network of papers and applying meta-path-based random walks and the Skip-gram model, the method preserves both the structural information of the paper network and the semantic information of the paper attributes (title, abstract, keywords, and so on). Compared with previous algorithms, it additionally uses the similarity between text information such as paper titles, abstracts, and keywords, which makes the paper representation vectors more representative.
3. Tests on a benchmark dataset show that the method keeps a high running speed while improving clustering quality by 20% to 40% relative to most clustering methods.
Brief description of the drawings
Fig. 1 is a flow diagram of the method of the present invention;
Fig. 2 is a schematic diagram of the heterogeneous network;
Fig. 3 is a schematic diagram of a meta-path;
Fig. 4 is a schematic diagram of path generation.
Detailed description of the embodiments
The present invention is explained in further detail below with reference to the drawings and an embodiment.
The present invention disambiguates scholar names with an unsupervised method that learns representation vectors for the nodes of a heterogeneous network through meta-path-based random walks. In the following embodiment, a name disambiguation benchmark database of papers is chosen as the paper library, and the invention is further described with reference to the drawings.
Step 1: Collect from the paper library all papers related to the authors to be disambiguated, and build the paper-relationship heterogeneous network from the authors of these papers, the names of the journals in which they were published, and information such as titles, keywords, and abstracts.
Each paper is a node in the heterogeneous network. If two papers share an author, an edge with the relation name CoAuthor is built between them. This edge carries the number of shared authors as an attribute: with 1 shared author the attribute is 1, with 2 shared authors it is 2, and so on.
If two papers come from the same journal, an edge with the relation name CoVenue is established between them. Because a paper normally belongs to only one journal, the attribute value of this relation is always 1.
If the titles, keywords, and abstracts of two papers contain the same word, and that word is not a stop word, a CoWord edge is built between them. This edge also carries a count attribute: with one co-occurring word the attribute value is 1, with two co-occurring words it is 2, and so on.
This yields a paper heterogeneous network with one node type and three relation types, two of which carry attributes. A schematic diagram of the network is shown in Fig. 2.
In this step, besides CoAuthor (shared author), CoWord (shared keyword), and CoVenue (shared journal or conference), relations can also be built from other information about the works, such as citation relationships between papers, a shared author country, or identical subject terms after subject classification of the full text. That is, several relations and their corresponding relation attributes are defined first. If a defined relation holds between two papers, an edge is built between their nodes, the edge is named after the relation, and its attribute value is set according to the relation attribute.
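The network construction in Step 1 can be sketched as follows. This is a minimal illustration rather than the patented implementation: the paper record layout (`authors`, `venue`, `text`), the stop-word list, and whitespace tokenization are all assumptions made for the example. The source also does not say whether the ambiguous name itself counts toward CoAuthor; this sketch counts every shared name.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical stop-word list; a real system would use a fuller one.
STOP_WORDS = {"a", "an", "the", "of", "for", "in", "on", "and", "with", "to"}


def tokenize(text):
    """Lower-case, split on whitespace, and drop stop words."""
    return {w for w in text.lower().split() if w not in STOP_WORDS}


def build_network(papers):
    """Return {relation: {paper_id: {neighbor_id: weight}}}.

    `papers` maps a paper id to a dict with the illustrative keys
    'authors' (list of names), 'venue' (str or None), and 'text' (str).
    """
    net = {rel: defaultdict(dict) for rel in ("CoAuthor", "CoVenue", "CoWord")}
    for a, b in combinations(sorted(papers), 2):
        pa, pb = papers[a], papers[b]
        weights = {
            # Number of shared authors.
            "CoAuthor": len(set(pa["authors"]) & set(pb["authors"])),
            # 1 if both papers appeared in the same known venue, else 0.
            "CoVenue": int(pa["venue"] is not None and pa["venue"] == pb["venue"]),
            # Number of shared non-stop words.
            "CoWord": len(tokenize(pa["text"]) & tokenize(pb["text"])),
        }
        for rel, w in weights.items():
            if w > 0:  # only relations that actually hold become edges
                net[rel][a][b] = w
                net[rel][b][a] = w
    return net


# Made-up records for illustration only.
papers = {
    "p1": {"authors": ["Wei Wang", "Li Na", "Bo Chen"], "venue": "KDD",
           "text": "graph embedding for networks"},
    "p2": {"authors": ["Wei Wang", "Li Na"], "venue": "KDD",
           "text": "heterogeneous graph embedding"},
    "p3": {"authors": ["Wei Wang"], "venue": None, "text": "protein folding"},
}
net = build_network(papers)
```

Storing each relation as its own adjacency map keeps the edge type explicit, which is what a meta-path-guided walk needs when it asks for "the CoAuthor neighbors of this node". Comparing every pair of papers is quadratic in the number of papers; an inverted index from author, venue, or word to papers would be the natural refinement for a large library.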
Step 2: On the paper-relationship heterogeneous network generated in Step 1, generate paths with a meta-path-based random walk strategy, so that the paths capture the text information of the neighbor nodes of paper nodes, and save these paths as the training corpus for the Skip-gram model in the next step.
In the paper heterogeneous network generated in Step 1, a node is selected arbitrarily as the starting node, and a random walk is carried out along the edges.
The random walk is guided by a meta-path. A meta-path contains edges with several different relation names and fixes the order in which these edges appear. For example, the walk follows the order of a meta-path such as p1-CoAuthor-p2-CoWord-p3-CoVenue-p4. (Here "random" means that when the walk reaches a given relation, it randomly selects a node connected to the current node by that relation.) At each step, according to the edge type specified by the current position in the meta-path, a random selection rule picks one node connected to the current node by an edge of that type as the next node. That is, a paper node is first selected at random as the starting path point. The selection rule then picks a node joined to it by a CoAuthor edge as the next path point, then a node joined to that path point by a CoWord edge, and finally a node joined to that path point by a CoVenue edge, which yields one walk sequence along the meta-path. Taking the last node of this sequence as the new starting node, another meta-path sequence is generated in the same way. After N such iterations a long path is produced, in which each path node stores the identifier (id) of a paper. This procedure is repeated to generate M such long paths; each time a long path is generated, the nodes of the network are chosen in order as its starting node. Each long path is stored on its own line, with the path node ids separated by a delimiter (such as a space or a tab), to produce the training corpus.
A schematic diagram of a meta-path is shown in Fig. 3.
In addition, during the meta-path-guided random walk, when the walk is at some current node and is about to move along the class of edge specified by the meta-path, the attribute information of the relation is taken into account. The attribute acts as the weight of the edge: the larger the weight, the closer the relationship between the two nodes, so the larger the attribute value of an edge, the greater the probability that the walk jumps along it. For example, in Fig. 2, if p1 is the current node and the relation of the next hop is CoAuthor, the two nodes that have this relation with p1 are p4 and p2. According to the attribute values of the relations between them, the probability of walking from p1 to p4 is 1/3 and the probability of walking to p2 is 2/3.
In some cases a paper lacks a certain relation. For example, if none of the words in the title of a paper appears in the title of any other paper, that paper has no CoWord relation. When this happens, a more flexible strategy is used: the walk follows the relation that comes after the missing one in the meta-path. For the paper just described, the walk therefore proceeds along its CoVenue relation.
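Putting the pieces of Step 2 together, the following sketch generates long paths under a fixed meta-path, uses edge weights as jump probabilities, and skips to the next relation when the current node lacks the required one. It assumes the network is stored as `{relation: {node: {neighbor: weight}}}`, which is a layout chosen for this example. What happens when no remaining relation is available is not specified in the source; here the walk simply restarts the meta-path from the current node, and that choice is an assumption.

```python
import random

# Edge order taken from the example meta-path p1-CoAuthor-p2-CoWord-p3-CoVenue-p4.
META_PATH = ["CoAuthor", "CoWord", "CoVenue"]


def next_hop(net, node, start_idx, rng):
    """Pick the next node, skipping relations the current node lacks.

    Returns (next_node, index_of_relation_used), or (None, None) when the
    node has no edge for any remaining relation of the meta-path.
    """
    for idx in range(start_idx, len(META_PATH)):
        neighbors = net.get(META_PATH[idx], {}).get(node, {})
        if neighbors:
            nodes = list(neighbors)
            weights = [neighbors[n] for n in nodes]
            return rng.choices(nodes, weights=weights, k=1)[0], idx
    return None, None


def long_path(net, start, n_iters, rng):
    """Chain up to n_iters meta-path traversals into one long path of paper ids."""
    path = [start]
    for _ in range(n_iters):
        idx = 0
        while idx < len(META_PATH):
            nxt, used = next_hop(net, path[-1], idx, rng)
            if nxt is None:
                break  # assumption: give up on this traversal and restart the meta-path
            path.append(nxt)
            idx = used + 1
    return path


def build_corpus(net, start_nodes, n_iters, seed=0):
    """One space-separated line of paper ids per starting node."""
    rng = random.Random(seed)
    return [" ".join(long_path(net, s, n_iters, rng)) for s in start_nodes]


# Toy network for illustration; p4 has no CoWord edge.
toy_net = {
    "CoAuthor": {"p1": {"p2": 2, "p4": 1}, "p2": {"p1": 2}, "p4": {"p1": 1}},
    "CoWord": {"p2": {"p3": 1}, "p3": {"p2": 1}},
    "CoVenue": {"p1": {"p2": 1}, "p2": {"p1": 1}, "p3": {"p4": 1}, "p4": {"p3": 1}},
}
corpus = build_corpus(toy_net, ["p1", "p2", "p3", "p4"], n_iters=2)
```

Each line of `corpus` starts with its start node and could be written to a text file as is, one path per line. Calling `next_hop(toy_net, "p4", 1, rng)` illustrates the fallback: p4 has no CoWord edge, so the walk moves along CoVenue to p3.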
A schematic diagram of path generation is shown in Fig. 4.
This walk strategy is not fixed. A new walk strategy can be designed by redesigning the meta-path. For example, in a heterogeneous network of the type described above, the meta-path can be designed as p1-CoAuthor-p2-CoVenue-p3-CoWord-p4, which generates new random walk paths and hence a new corpus.
Such heterogeneous networks can also be designed in many ways. For example, when the paper library contains citation information, a new type of edge can be added to the heterogeneous network above, giving a heterogeneous network with one node type and four relation types; by designing a new meta-path, the random walk path corpus of that network can be generated. Likewise, when the papers in the library lack a certain kind of feature information, the relation based on that feature can simply be omitted.
Step 3: Using the corpus of random walk paths generated in Step 2, learn the paper vectors with the Skip-gram model.
The corpus of random walk paths generated in Step 2 is used to train a skip-gram model, specifically with the word2vec method of the gensim library in Python or with Google's open-source C-language word2vec tool.
The Skip-gram model treats each node id as a word and the sequence of connected nodes in a path as the context of that word. Training produces a vector for each node id, which is the representation vector of the corresponding paper.
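To make the "node id as word, path as context" idea concrete, the sketch below lists the (center, context) pairs that a Skip-gram model would be trained on for a given corpus. It only prepares training pairs; the actual vector learning would be delegated to a word2vec implementation such as the ones named above. The fixed symmetric window and the function name are assumptions for this example.

```python
def skipgram_pairs(corpus_lines, window=2):
    """Yield (center_id, context_id) pairs from space-separated walk paths."""
    for line in corpus_lines:
        ids = line.split()
        for i, center in enumerate(ids):
            lo = max(0, i - window)
            hi = min(len(ids), i + window + 1)
            for j in range(lo, hi):
                if j != i:  # a node is not its own context
                    yield center, ids[j]


# One short illustrative path.
example_corpus = ["p1 p2 p3 p4"]
pairs = list(skipgram_pairs(example_corpus, window=1))
```

With `window=1` the path `p1 p2 p3 p4` yields six pairs, for instance `("p2", "p1")` and `("p2", "p3")`. Papers that often appear near each other in walks share many such pairs, which is what pulls their learned vectors together.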
Step 4: For an author name to be disambiguated, collect the representation vectors, learned in Steps 1 to 3, of all of that author's papers in the existing database and, given the number of clusters K, cluster these vectors with agglomerative hierarchical clustering. Each resulting cluster represents the set of papers written by a different author, which achieves the disambiguation of the author name.
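The clustering in Step 4 can be illustrated with a small pure-Python agglomerative procedure that merges the two closest clusters until K remain. The source does not state the linkage criterion or the distance measure; average linkage and Euclidean distance are assumptions here, and a library implementation would normally be preferred for real data.

```python
import math


def hac(vectors, k):
    """Average-linkage agglomerative clustering down to k clusters.

    `vectors` maps a paper id to a list of floats. Returns a list of sets of ids.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    clusters = [{pid} for pid in sorted(vectors)]

    def dist(c1, c2):
        # Mean Euclidean distance over all cross-cluster pairs.
        total = sum(math.dist(vectors[a], vectors[b]) for a in c1 for b in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > k:
        # Find the closest pair of clusters and merge the second into the first.
        i, j = min(
            ((i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))),
            key=lambda ij: dist(clusters[ij[0]], clusters[ij[1]]),
        )
        clusters[i] |= clusters[j]
        del clusters[j]
    return clusters


# Made-up 2-D vectors for the papers of one ambiguous name.
paper_vectors = {"p1": [0.0, 0.0], "p2": [0.1, 0.0], "p3": [5.0, 5.0], "p4": [5.1, 5.0]}
author_clusters = hac(paper_vectors, k=2)
```

For these vectors the two resulting clusters are {p1, p2} and {p3, p4}, each standing for the papers of one distinct author. The naive search for the closest pair is recomputed at every merge, so this sketch is only suitable for small inputs.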
The experiments use the paper dataset from (Jie Tang, A. C. M. Fong, Bo Wang, and Jing Zhang. A Unified Probabilistic Framework for Name Disambiguation in Digital Library. IEEE Transactions on Knowledge and Data Engineering, Volume 24, Issue 6, 2012, pages 975-987; and Xuezhi Wang, Jie Tang, Hong Cheng, and Philip S. Yu. ADANA: Active Name Disambiguation. In Proceedings of the 2011 IEEE International Conference on Data Mining, pp. 794-803). The data contain 100 author names to be disambiguated and 7447 papers in total. Paper titles and author information are complete; 4% of the papers lack a journal name.
All papers in the dataset are first built into a single heterogeneous network, and an embedding vector is learned for each paper with the method above. Then, for each author name to be disambiguated, the corresponding papers are clustered together, assuming the number of classes K is known.
Clustering uses HAC (agglomerative hierarchical clustering) or K-Means. The clustering results are assessed with the Pairwise Precision, Pairwise Recall, and Pairwise F1 metrics and then averaged. Alternatively, the number of clusters K need not be specified in advance, in which case a clustering algorithm such as DBSCAN can be used.
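The pairwise metrics named above are commonly computed over pairs of papers: a pair counts as predicted if both papers fall in the same cluster, and as true if both papers have the same real author. The sketch below follows that common definition for a single ambiguous name; the function name and the dictionary inputs are assumptions for illustration, and the source does not spell out its exact evaluation code.

```python
from itertools import combinations


def pairwise_scores(predicted, truth):
    """Pairwise precision, recall, and F1 for one ambiguous name.

    `predicted` maps each paper id to a cluster label and `truth` maps each
    paper id to its true author.
    """
    ids = sorted(truth)
    pred_pairs = {(a, b) for a, b in combinations(ids, 2) if predicted[a] == predicted[b]}
    true_pairs = {(a, b) for a, b in combinations(ids, 2) if truth[a] == truth[b]}
    correct = len(pred_pairs & true_pairs)
    precision = correct / len(pred_pairs) if pred_pairs else 0.0
    recall = correct / len(true_pairs) if true_pairs else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Made-up labels: three papers by author A, one by author B.
truth = {"p1": "A", "p2": "A", "p3": "A", "p4": "B"}
predicted = {"p1": 0, "p2": 0, "p3": 1, "p4": 1}
precision, recall, f1 = pairwise_scores(predicted, truth)
```

Here two pairs are predicted, (p1, p2) and (p3, p4), three pairs are true, and one pair is both, giving a precision of 1/2, a recall of 1/3, and an F1 of 0.4. Averaging over all names would then mean computing these scores per name and taking the mean of each.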
The baseline methods used are LINE, DNGR, and metapath2vec. All three are network embedding methods: a paper network is built, and the representation vectors of the papers are learned with the corresponding embedding method. "LINE with title similarity" means that, when the homogeneous paper network is built, the weight of the edge connecting two papers is increased if their titles are similar to some degree, and the embedding is then learned with LINE. The following table shows the disambiguation performance of the different methods.
Method                       Prec    Rec     F1
our approach                 79.68   80.14   78.85
LINE                         61.22   49.96   53.02
LINE with title similarity   79.29   58.69   64.98
metapath2vec                 64.44   67.75   64.40
DNGR                         44.62   70.21   51.65
It can be seen that the method of the invention clearly outperforms the other methods. Because it learns a heterogeneous network embedding, it preserves as much of the relation information of the papers as possible, so the learned paper vectors have stronger representational power and the disambiguation improves.
The above embodiments merely illustrate the technical solution of the present invention and do not limit it. A person of ordinary skill in the art may modify the technical solution or replace it with equivalents without departing from the spirit and scope of the invention; the scope of protection of the invention is defined by the claims.

Claims (10)

1. A scholar name disambiguation method based on heterogeneous network embedding, comprising the steps of:
1) specifying multiple author names to be disambiguated, collecting all papers related to the specified authors, and generating a paper-relationship heterogeneous network from the authors and semantic information of the collected papers;
2) on the paper-relationship heterogeneous network, generating paths with a meta-path-based random walk strategy, the paths capturing the text information of the neighbor nodes of paper nodes, and saving these paths as the training corpus of a skip-gram model;
3) training on the corpus with the Skip-gram model to generate a paper representation vector for each paper;
4) for an author name specified in step 1), obtaining the representation vectors of that author's papers from the paper representation vectors obtained in step 3);
5) clustering the paper representation vectors obtained in step 4) into several clusters, and treating the different clusters as sets of papers written by different people who share the author's name, thereby disambiguating the author name.
2. The method of claim 1, wherein the paper-relationship heterogeneous network is generated by: taking each paper as a node in the heterogeneous network, and defining several relations and their corresponding relation attributes; if a defined relation holds between two papers, building an edge between the nodes of the two papers, naming the edge after the relation, and setting the attribute value of the edge according to the relation attribute of the relation.
3. The method of claim 2, wherein the relations include but are not limited to one or more of the following: having a shared author, containing the same keyword, belonging to the same journal or conference, having a citation relationship, and sharing an author country.
4. The method of claim 1 or 2, wherein the training corpus is generated by: arbitrarily selecting a node in the paper-relationship heterogeneous network as the starting node and walking under the guidance of a meta-path to generate a long path; changing the starting node and continuing to generate long paths; and storing each long path on its own line, with the path node ids separated by a delimiter, to produce the training corpus.
5. The method of claim 4, wherein the walk under the guidance of the meta-path is carried out by:
51) selecting each next edge of the path in the order of edge appearance specified by the meta-path; if several edges from the current node to a next node satisfy the condition, choosing one of them to determine the next node connected to the current node; wherein the meta-path contains edges with several different relation names and fixes the order in which these edges appear;
52) repeating step 51) a set number N of times to obtain a long path.
6. The method of claim 5, wherein in step 51) the edge that satisfies the condition is chosen according to the edge weights to determine the next node connected to the current node, an edge with a larger weight having a larger probability of being selected.
7. The method of claim 1, wherein the skip-gram model treats each path node id in a path as a word and the sequence of connected nodes in the path as the context of the word, and training produces a vector for each node id, which is the paper representation vector of the paper corresponding to that node id.
8. The method of claim 1 or 7, wherein the skip-gram model is the word2vec method of the gensim library in Python or Google's open-source C-language word2vec tool.
9. The method of claim 1, wherein the paper-relationship heterogeneous network is generated from the authors, journal names, and semantic information of the collected papers; the semantic information includes but is not limited to one or more of the following: author, title, keywords, and abstract.
10. The method of claim 1, wherein, given the number of clusters K, the paper representation vectors obtained in step 4) are clustered with agglomerative hierarchical clustering.
CN201811267181.9A 2018-10-29 2018-10-29 A scholar name disambiguation method based on heterogeneous network embedding Pending CN109558494A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811267181.9A CN109558494A (en) 2018-10-29 2018-10-29 A scholar name disambiguation method based on heterogeneous network embedding

Publications (1)

Publication Number Publication Date
CN109558494A true CN109558494A (en) 2019-04-02

Family

ID=65865176

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811267181.9A Pending CN109558494A (en) 2018-10-29 2018-10-29 A scholar name disambiguation method based on heterogeneous network embedding

Country Status (1)

Country Link
CN (1) CN109558494A (en)

Cited By (24)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110020214A (en) * 2019-04-08 2019-07-16 北京航空航天大学 A kind of social networks streaming events detection system merging knowledge
CN110502637A (en) * 2019-08-27 2019-11-26 秒针信息技术有限公司 A kind of information processing method and information processing unit based on Heterogeneous Information network
CN110516146A (en) * 2019-07-15 2019-11-29 中国科学院计算机网络信息中心 A kind of author's name disambiguation method based on the insertion of heterogeneous figure convolutional neural networks
CN110830291A (en) * 2019-10-30 2020-02-21 辽宁工程技术大学 Node classification method of heterogeneous information network based on meta-path
CN111008285A (en) * 2019-11-29 2020-04-14 中科院计算技术研究所大数据研究院 Author disambiguation method based on thesis key attribute network
CN111104797A (en) * 2019-12-17 2020-05-05 南开大学 Paper network representation learning method based on dual sequence-to-sequence generation
CN111191466A (en) * 2019-12-25 2020-05-22 中国科学院计算机网络信息中心 Homonymous author disambiguation method based on network characterization and semantic characterization
CN111221968A (en) * 2019-12-31 2020-06-02 北京航空航天大学 Author disambiguation method and device based on subject tree clustering
CN111241283A (en) * 2020-01-15 2020-06-05 电子科技大学 Rapid characterization method for portrait of scientific research student
CN111581949A (en) * 2020-05-12 2020-08-25 上海市研发公共服务平台管理中心 Method and device for disambiguating name of learner, storage medium and terminal
CN111881693A (en) * 2020-07-28 2020-11-03 平安科技(深圳)有限公司 Paper author disambiguation method and device and computer equipment
CN111930955A (en) * 2020-10-12 2020-11-13 北京智源人工智能研究院 Method and device for disambiguating author name and electronic equipment
CN112148776A (en) * 2020-09-29 2020-12-29 清华大学 Academic relation prediction method and device based on neural network introducing semantic information
CN112417082A (en) * 2020-10-14 2021-02-26 西南科技大学 Scientific research achievement data disambiguation filing storage method
CN112463977A (en) * 2020-10-22 2021-03-09 三盟科技股份有限公司 Community mining method, system, computer and storage medium based on knowledge graph
CN112487825A (en) * 2020-11-30 2021-03-12 北京航空航天大学 Talent information database disambiguation system
CN112597305A (en) * 2020-12-22 2021-04-02 上海师范大学 Scientific and technological literature author name disambiguation method based on deep learning and web end disambiguation device
WO2021077642A1 (en) * 2019-10-24 2021-04-29 中国科学院信息工程研究所 Network space security threat detection method and system based on heterogeneous graph embedding
CN112836050A (en) * 2021-02-04 2021-05-25 山东大学 Citation network node classification method and system aiming at relation uncertainty
CN112836518A (en) * 2021-01-29 2021-05-25 华南师范大学 Name disambiguation model processing method, system and storage medium
CN113051397A (en) * 2021-03-10 2021-06-29 北京工业大学 Academic paper homonymy disambiguation method based on heterogeneous information network representation learning and word vector representation
CN113111178A (en) * 2021-03-04 2021-07-13 中国科学院计算机网络信息中心 Method and device for disambiguating homonymous authors based on expression learning without supervision
CN113554175A (en) * 2021-09-18 2021-10-26 平安科技(深圳)有限公司 Knowledge graph construction method and device, readable storage medium and terminal equipment
CN117556058A (en) * 2024-01-11 2024-02-13 安徽大学 Knowledge graph enhanced network embedded author name disambiguation method and device

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104111973A (en) * 2014-06-17 2014-10-22 中国科学院计算技术研究所 Scholar name duplication disambiguation method and system
CN104268200A (en) * 2013-09-22 2015-01-07 中科嘉速(北京)并行软件有限公司 Unsupervised named entity semantic disambiguation method based on deep learning
US20160335367A1 (en) * 2015-05-15 2016-11-17 Microsoft Technology Licensing, Llc Entity disambiguation using multisource learning
CN107451596A (en) * 2016-05-30 2017-12-08 清华大学 A kind of classified nodes method and device
CN107590128A (en) * 2017-09-21 2018-01-16 湖北大学 A kind of paper based on high confidence features attribute Hierarchical clustering methods author's disambiguation method of the same name
CN107633263A (en) * 2017-08-30 2018-01-26 清华大学 Network embedding grammar based on side
CN107861939A (en) * 2017-09-30 2018-03-30 昆明理工大学 A kind of domain entities disambiguation method for merging term vector and topic model
CN108228728A (en) * 2017-12-11 2018-06-29 北京航空航天大学 A kind of paper network node of parametrization represents learning method

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
YUXIAO DONG等: "metapath2vec:Scalable Representation Learning for Heterogeneous Networks", 《MICROSOFT RESEARCH》 *

Cited By (43)

Publication number Priority date Publication date Assignee Title
CN110020214B (en) * 2019-04-08 2021-05-18 北京航空航天大学 Knowledge-fused social network streaming event detection system
CN110020214A (en) * 2019-04-08 2019-07-16 北京航空航天大学 Knowledge-fused social network streaming event detection system
CN110516146A (en) * 2019-07-15 2019-11-29 中国科学院计算机网络信息中心 Author name disambiguation method based on heterogeneous graph convolutional neural network embedding
CN110516146B (en) * 2019-07-15 2022-08-19 中国科学院计算机网络信息中心 Author name disambiguation method based on heterogeneous graph convolutional neural network embedding
CN110502637B (en) * 2019-08-27 2022-03-01 秒针信息技术有限公司 Information processing method and information processing device based on heterogeneous information network
CN110502637A (en) * 2019-08-27 2019-11-26 秒针信息技术有限公司 Information processing method and information processing device based on heterogeneous information network
WO2021077642A1 (en) * 2019-10-24 2021-04-29 中国科学院信息工程研究所 Network space security threat detection method and system based on heterogeneous graph embedding
CN110830291B (en) * 2019-10-30 2023-01-10 辽宁工程技术大学 Node classification method of heterogeneous information network based on meta-path
CN110830291A (en) * 2019-10-30 2020-02-21 辽宁工程技术大学 Node classification method of heterogeneous information network based on meta-path
CN111008285B (en) * 2019-11-29 2021-04-13 中科院计算技术研究所大数据研究院 Author disambiguation method based on thesis key attribute network
CN111008285A (en) * 2019-11-29 2020-04-14 中科院计算技术研究所大数据研究院 Author disambiguation method based on thesis key attribute network
CN111104797B (en) * 2019-12-17 2023-05-02 南开大学 Dual-based sequence-to-sequence generation paper network representation learning method
CN111104797A (en) * 2019-12-17 2020-05-05 南开大学 Paper network representation learning method based on dual sequence-to-sequence generation
US11775594B2 (en) 2019-12-25 2023-10-03 Computer Network Information Center, Chinese Academy Of Sciences Method for disambiguating between authors with same name on basis of network representation and semantic representation
CN111191466A (en) * 2019-12-25 2020-05-22 中国科学院计算机网络信息中心 Homonymous author disambiguation method based on network characterization and semantic characterization
WO2021128158A1 (en) * 2019-12-25 2021-07-01 中国科学院计算机网络信息中心 Method for disambiguating between authors with same name on basis of network representation and semantic representation
CN111221968A (en) * 2019-12-31 2020-06-02 北京航空航天大学 Author disambiguation method and device based on subject tree clustering
CN111221968B (en) * 2019-12-31 2023-07-21 北京航空航天大学 Author disambiguation method and device based on subject tree clustering
CN111241283A (en) * 2020-01-15 2020-06-05 电子科技大学 Rapid characterization method for portrait of scientific research student
CN111581949A (en) * 2020-05-12 2020-08-25 上海市研发公共服务平台管理中心 Method and device for disambiguating name of learner, storage medium and terminal
WO2021139256A1 (en) * 2020-07-28 2021-07-15 平安科技(深圳)有限公司 Disambiguation method and apparatus for author of paper, and computer device
CN111881693B (en) * 2020-07-28 2023-01-13 平安科技(深圳)有限公司 Paper author disambiguation method and device and computer equipment
CN111881693A (en) * 2020-07-28 2020-11-03 平安科技(深圳)有限公司 Paper author disambiguation method and device and computer equipment
CN112148776A (en) * 2020-09-29 2020-12-29 清华大学 Academic relation prediction method and device based on neural network introducing semantic information
CN112148776B (en) * 2020-09-29 2024-05-03 清华大学 Academic relationship prediction method and device based on neural network introducing semantic information
CN111930955A (en) * 2020-10-12 2020-11-13 北京智源人工智能研究院 Method and device for disambiguating author name and electronic equipment
CN112417082A (en) * 2020-10-14 2021-02-26 西南科技大学 Scientific research achievement data disambiguation filing storage method
CN112417082B (en) * 2020-10-14 2022-06-07 西南科技大学 Scientific research achievement data disambiguation filing storage method
CN112463977A (en) * 2020-10-22 2021-03-09 三盟科技股份有限公司 Community mining method, system, computer and storage medium based on knowledge graph
CN112487825A (en) * 2020-11-30 2021-03-12 北京航空航天大学 Talent information database disambiguation system
CN112597305A (en) * 2020-12-22 2021-04-02 上海师范大学 Scientific literature author name disambiguation method and web end disambiguation device based on deep learning
CN112597305B (en) * 2020-12-22 2023-09-01 上海师范大学 Scientific literature author name disambiguation method and web end disambiguation device based on deep learning
CN112836518B (en) * 2021-01-29 2023-12-26 华南师范大学 Method, system and storage medium for processing name disambiguation model
CN112836518A (en) * 2021-01-29 2021-05-25 华南师范大学 Name disambiguation model processing method, system and storage medium
CN112836050A (en) * 2021-02-04 2021-05-25 山东大学 Citation network node classification method and system aiming at relation uncertainty
CN112836050B (en) * 2021-02-04 2022-05-17 山东大学 Citation network node classification method and system aiming at relation uncertainty
CN113111178B (en) * 2021-03-04 2021-12-10 中国科学院计算机网络信息中心 Method and device for disambiguating homonymous authors based on expression learning without supervision
CN113111178A (en) * 2021-03-04 2021-07-13 中国科学院计算机网络信息中心 Method and device for disambiguating homonymous authors based on expression learning without supervision
CN113051397A (en) * 2021-03-10 2021-06-29 北京工业大学 Academic paper homonymy disambiguation method based on heterogeneous information network representation learning and word vector representation
CN113554175B (en) * 2021-09-18 2021-11-26 平安科技(深圳)有限公司 Knowledge graph construction method and device, readable storage medium and terminal equipment
CN113554175A (en) * 2021-09-18 2021-10-26 平安科技(深圳)有限公司 Knowledge graph construction method and device, readable storage medium and terminal equipment
CN117556058A (en) * 2024-01-11 2024-02-13 安徽大学 Knowledge graph enhanced network embedded author name disambiguation method and device
CN117556058B (en) * 2024-01-11 2024-05-24 安徽大学 Knowledge graph enhanced network embedded author name disambiguation method and device

Similar Documents

Publication Publication Date Title
CN109558494A (en) Scholar name disambiguation method based on heterogeneous network embedding
CN110516146B (en) Author name disambiguation method based on heterogeneous graph convolutional neural network embedding
Ramage et al. Clustering the tagged web
Gupta et al. Survey on social tagging techniques
Almpanidis et al. Combining text and link analysis for focused crawling—An application for vertical search engines
Yin et al. Building taxonomy of web search intents for name entity queries
Au Yeung et al. Contextualising tags in collaborative tagging systems
Foley et al. Learning to extract local events from the web
Aznag et al. Leveraging formal concept analysis with topic correlation for service clustering and discovery
Ju et al. Things and strings: improving place name disambiguation from short texts by combining entity co-occurrence with topic modeling
Plangprasopchok et al. Constructing folksonomies from user-specified relations on flickr
Bauman et al. Discovering Contextual Information from User Reviews for Recommendation Purposes.
Iezzi Centrality measures for text clustering
Role et al. Beyond cluster labeling: Semantic interpretation of clusters’ contents using a graph representation
Faralli et al. Automatic acquisition of a taxonomy of microblogs users’ interests
Qassimi et al. The role of collaborative tagging and ontologies in emerging semantic of web resources
Panasyuk et al. Extraction of semantic activities from twitter data.
Yuan et al. Category hierarchy maintenance: a data-driven approach
Fernando et al. Comparing taxonomies for organising collections of documents
Bagdouri et al. Profession-based person search in microblogs: Using seed sets to find journalists
Gabriel et al. Summarizing dynamic social tagging systems
Jain et al. Organizing query completions for web search
Ali et al. Graph-based semantic learning, representation and growth from text: A systematic review
Ayyasamy et al. Mining Wikipedia knowledge to improve document indexing and classification
Li et al. A hierarchical entity-based approach to structuralize user generated content in social media: A case of Yahoo! answers

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20190402