CN109376238B - Paper correlation degree quantification method based on reference document list overlapping degree

Info

Publication number
CN109376238B
CN109376238B CN201811072484.5A CN201811072484A CN109376238B CN 109376238 B CN109376238 B CN 109376238B CN 201811072484 A CN201811072484 A CN 201811072484A CN 109376238 B CN109376238 B CN 109376238B
Authority
CN
China
Prior art keywords
paper
articles
article
papers
value
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201811072484.5A
Other languages
Chinese (zh)
Other versions
CN109376238A (en)
Inventor
刘嘉莹
张冬瑜
肖心茹
步晓楠
宁兆龙
夏锋
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Dalian University of Technology
Original Assignee
Dalian University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Dalian University of Technology filed Critical Dalian University of Technology
Priority to CN201811072484.5A priority Critical patent/CN109376238B/en
Publication of CN109376238A publication Critical patent/CN109376238A/en
Application granted granted Critical
Publication of CN109376238B publication Critical patent/CN109376238B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Abstract

The invention discloses a method for quantifying the relevance of papers based on the degree of overlap between their reference lists. Following the idea of co-citation, the method links the overlap between the reference lists of two papers to the similarity of the papers, quantifies that similarity by combining a statistically validated network with the false discovery rate (FDR), and obtains a threshold for judging whether two papers are similar. The invention also provides several applications of the quantification method: verifying the homology of paper citations, finding missed citations, and simplifying reference lists. The method improves the accuracy and effectiveness of paper relevance calculation while greatly reducing the amount of computation, and thereby provides a basis for the ranked retrieval, clustering and classification of papers and for detecting errors in reference lists.

Description

Paper correlation degree quantification method based on reference document list overlapping degree
Technical Field
The invention belongs to the technical field of quantifying the similarity of papers in the academic domain, and in particular relates to a method for measuring paper relevance based on the idea of co-citation and a statistically validated network.
Background
With the rapid development of science, the number of academic papers is growing quickly. Under these conditions, quantifying the relevance between papers is of great value, since it can serve as an important basis for document retrieval and for document classification and clustering. However, mainstream text analysis methods (such as those based on cosine similarity or TF-IDF) are poorly suited to academic papers, whose text volume is huge: their computational complexity is high and their efficiency is low. Classification-based comparison methods (such as the naive Bayes and KNN algorithms) give unreliable results for academic papers of the same category and subject. On the other hand, the reference list is a very important part of an academic paper and reflects, among other things, the important problems in the paper's research field. Reference lists nevertheless contain many implicit citation irregularities, such as missed citations and superfluous citations. These problems are difficult to detect, so a method that relates papers to their references is needed.
Disclosure of Invention
The invention aims to address the above problems by providing a paper similarity quantification method based on the degree of overlap between reference lists: the similarity between two papers is measured by applying a statistically validated network to the overlap between the references they cite. The invention further applies the false discovery rate (FDR) method to correct the calculated values for multiple testing and obtain a threshold for judging whether two papers are similar. In addition, the invention provides applications of the quantification method to verifying the homology of paper citations, finding missed citations, and simplifying reference lists.
The technical scheme of the invention is as follows:
A paper relevance quantification method based on reference list overlap comprises the following steps:
S1: Preprocess the data set to build a paper citation network.
Select papers in the Computer Science domain as the data set and extract the useful information from it, namely each paper's number, title and reference list; calculate, for each paper, the number of articles it cites and the number of times it is cited; and store this information in a dictionary structure to form the paper citation network.
S2: Classify the data set of S1 and calculate the related parameters.
The data set of S1 is classified into the following sets:
$$S_B^k = \{\, b \in B : b \text{ is cited by exactly } k \text{ articles} \,\}$$
$$S_A^k = \{\, a \in A : a \text{ cites at least one article in } S_B^k \,\}$$
where $A$ is the set of papers that cite at least one article, $B$ is the set of papers that are cited by at least two articles, $S_B^k$ is the subset of $B$ obtained by dividing $B$ according to the number of citations $k$, $S_A^k$ is the set of articles in $A$ that cite articles in the subset $S_B^k$, and $N^k$ is the size of $S_B^k$;
S3: Calculate the relevance $p$ of two papers using the parameters of S2 and the idea of a statistically validated network.
S3.1: For a paper pair $(i, j)$, under the assumption that paper $i$ cites $d_i$ articles and paper $j$ cites $d_j$ articles, the probability that paper $i$ and paper $j$ cite $X$ articles in common is:
$$P(X \mid N^k, d_i, d_j) = \frac{\binom{d_i}{X}\binom{N^k - d_i}{d_j - X}}{\binom{N^k}{d_j}}$$
S3.2: Using $P(X \mid N^k, d_i, d_j)$ from S3.1, calculate the similarity quantization value $p$ between any two papers, i.e. the probability that the number of articles in $S_B^k$ cited by both paper $i$ and paper $j$ is greater than or equal to $N_{ij}^k$, where each $p$ value corresponds to one value of $k$:
$$p_{ij}(k) = 1 - \sum_{X=0}^{N_{ij}^k - 1} P(X \mid N^k, d_i, d_j)$$
where $S_B^k$ is the set of articles cited $k$ times and $N^k$ is its size, $d_i$ and $d_j$ are the numbers of articles in $S_B^k$ cited by paper $i$ and paper $j$ respectively, and $N_{ij}^k$ is the number of articles in $S_B^k$ that paper $i$ and paper $j$ cite in common. The value $p_{ij}(k)$ is thus the probability that paper $i$ and paper $j$ cite in common at least $N_{ij}^k$ articles in $S_B^k$;
S3.3: Repeat S3.1 and S3.2 for every possible value of $k$ from $k_{\min}$ to $k_{\max}$, so that each paper pair $(i, j)$ is associated with more than one $p$ value, each corresponding to one number of citations $k$;
S4: Apply the false discovery rate (FDR) method to correct the $p$ values of S3 for multiple testing, obtain a threshold $p^*$ for judging whether papers are significantly similar, and identify the significantly similar paper pairs.
S4.1: Set a statistical threshold $p^*$ and, assuming a total of $N_t$ tests, rank the $p$ values of all the different tests in increasing order:
$$p_1 < p_2 < \dots < p_{N_t}$$
S4.2: Readjust the threshold repeatedly until the largest $t_{\max}$ satisfying the following condition is found:
$$p_{t_{\max}} < \frac{p^* \, t_{\max}}{N_t}$$
where $N_t$ is the number of tests, i.e. the number of all distinct paper pairs tested over the sets $S^k$ for all values of the number of citations $k$;
S4.3: Compare each $p$ value with the readjusted threshold. If the $p$ value is below the readjusted threshold, the similarity of the paper pair $(i, j)$ passes statistical validation at the threshold $p^*$, i.e. papers $i$ and $j$ are similar;
S5: Analyze the results of S4 to verify the homology of paper citations, find possible missed citations and delete references of low relevance, thereby providing a sound reference for scientific evaluation.
S5.1: Verify the homology of paper citations. For each threshold $p^*$, calculate $P_{i\to j}(p^*)$:
$$P_{i\to j}(p^*) = \frac{K(p^*)}{M(p^*)}$$
where $P_{i\to j}(p^*)$ is the probability that a citation exists between any two papers whose similarity is validated at threshold $p^*$, $K(p^*)$ is the number of citations that exist between validated paper pairs, and $M(p^*)$ is the size of the set of paper pairs for which at least one associated $p$ value passes the statistical test at threshold $p^*$;
Plot $P_{i\to j}(p^*)$ as a function of $p^*$ to conclude that the more similar two papers are, the higher the probability that a citation relationship exists between them, which verifies the homology of paper citations;
S5.2: Find possible missed citations. Based on the conclusion of S5.1, select a paper $m$ and use steps S1 to S4 to obtain the articles similar to it. Collect into a set $S$ all articles that are similar to paper $m$ and cite an article $n$. If paper $m$ does not cite article $n$, a threshold $p'$ is set automatically; when the size of $S$ exceeds $p'$, paper $m$ is judged to have missed the citation of article $n$;
S5.3: Apply steps S1 to S4 to a paper and its own references, and delete the references whose similarity is not high enough, thereby simplifying the paper's reference list.
The beneficial effects of the invention are as follows. Following the idea of co-citation, the overlap between reference lists is linked to the similarity of two papers, and that similarity is quantified using a statistically validated network. The invention also uses statistics to calculate the probability that two articles cite the same articles, and applies the false discovery rate (FDR) method to correct the calculated values for multiple testing, obtaining a threshold for judging whether two articles are similar. On this basis the invention can evaluate the similarity between papers and between a paper and its references, which provides a basis for document retrieval and for the clustering and classification of papers, and helps authors and reviewers discover implicit citation problems such as missed citations and superfluous citations.
Drawings
FIG. 1 is a flow chart of a method for quantifying paper relevancy based on reference list overlap;
FIG. 2 is a diagram of the relationship between the sets $S_A^k$ and $S_B^k$ used in the invention;
FIG. 3 is a plot of the experimentally obtained $P_{i\to j}(p^*)$ as a function of the statistical threshold $p^*$;
FIG. 4 is a diagram illustrating a missed citation.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, embodiments of the present invention will be described in further detail below.
An embodiment of the invention provides a paper relevance quantification method based on reference list overlap. The flow is shown in FIG. 1 and comprises the following steps:
Step one: Preprocess the data and delete irrelevant, redundant data.
Select the papers in the data set whose domain field is Computer Science and delete their redundant attributes, keeping only the number (id) that identifies each paper, its title (title) and its reference list (references). At the same time, calculate for each paper the number of articles it cites and the number of times it is cited, and store this information in a dictionary structure.
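The preprocessing of step one can be sketched in Python as follows. This is a minimal illustration rather than the patented implementation: the record layout (a list of dicts with `id`, `title`, `domain` and `references` keys) and the names `build_citation_network`, `n_cited` and `n_citations` are assumptions made for the example.

```python
from collections import defaultdict


def build_citation_network(records):
    """Keep id, title and references of Computer Science papers and
    attach the number of articles cited and the number of times cited.

    `records` is an assumed input format: an iterable of dicts with the
    keys "id", "title", "domain" and "references".
    """
    network = {}
    for rec in records:
        if rec.get("domain") != "Computer Science":
            continue  # drop papers from other domains
        # Remove duplicate reference entries while keeping their order.
        refs = list(dict.fromkeys(rec.get("references", [])))
        network[rec["id"]] = {"title": rec["title"], "references": refs}

    # Count how often each paper is cited by the retained papers.
    in_degree = defaultdict(int)
    for paper in network.values():
        for ref in paper["references"]:
            in_degree[ref] += 1

    for pid, paper in network.items():
        paper["n_cited"] = len(paper["references"])  # articles this paper cites
        paper["n_citations"] = in_degree[pid]        # times this paper is cited
    return network
```

Storing both counts next to the reference list means later steps can look up a paper's in-degree without rescanning the whole network.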
Step two: Classify and simplify the data set and calculate the related parameters.
2.1: Sift out the sets $A$, $B$, $S_A^k$, $S_B^k$ and $S^k$,
where $A$ is the set of articles that cite at least one document, $B$ is the set of articles that are cited by at least two articles, $S_B^k$ is the set of articles cited by $k$ articles, $S_A^k$ is the set of articles that cite at least one article in $S_B^k$, and $S^k$ is the union of $S_A^k$ and $S_B^k$. FIG. 2 shows the relationship between $S_A^k$ and $S_B^k$ more intuitively.
The screening process is as follows. First, put the papers that cite at least one article into set $A$ and the papers that are cited by at least two articles into set $B$. Then divide set $B$ into subsets $S_B^k$ according to the number of citations $k$ and calculate the size $N^k$ of each $S_B^k$. Next, put the articles in set $A$ that cite articles in $S_B^k$ into the set $S_A^k$. Finally, take the union of $S_A^k$ and $S_B^k$ to obtain $S^k$.
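The screening process can be sketched as below. The function name `partition_by_in_degree`, the input format (a dict mapping each paper id to the set of ids it cites) and the returned variable names are assumptions made for illustration; they do not come from the patent text.

```python
from collections import defaultdict


def partition_by_in_degree(references):
    """Build the sets A, B, S_B^k, S_A^k, the sizes N^k and the unions S^k.

    `references` maps each paper id to the set of paper ids it cites
    (an assumed input format).
    """
    in_degree = defaultdict(int)
    for cited in references.values():
        for ref in cited:
            in_degree[ref] += 1

    set_a = {p for p, cited in references.items() if cited}  # cites >= 1 article
    set_b = {p for p, k in in_degree.items() if k >= 2}      # cited >= 2 times

    # Divide B into subsets by the number of citations k.
    s_b = defaultdict(set)
    for p in set_b:
        s_b[in_degree[p]].add(p)

    # For each k, the papers in A that cite at least one member of S_B^k.
    s_a = {k: {p for p in set_a if references[p] & members}
           for k, members in s_b.items()}
    n_k = {k: len(members) for k, members in s_b.items()}
    s_k = {k: s_a[k] | s_b[k] for k in s_b}
    return set_a, set_b, dict(s_b), s_a, n_k, s_k
```

Grouping by in-degree first keeps every later test confined to one value of $k$, which is what allows one $p$ value per $k$ in step three.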
2.2: Using the property that each article in the citation network can only cite articles that have already been published, construct the bipartite statistically validated network and calculate the values of the parameters $d_i$, $d_j$, $N^k$ and $N_{ij}^k$. Here $d_i$ and $d_j$ are the numbers of articles in $S_B^k$ cited by articles $i$ and $j$ respectively, $N^k$ is the size of $S_B^k$, and $N_{ij}^k$ is the measured number of articles in $S_B^k$ that $i$ and $j$ cite in common.
If there is only one cited article with in-degree $k$, the number of common citations of two articles that both cite it must be 1, i.e. $N_{ij}^k = 1$. If there is more than one cited article, $N_{ij}^k$ is obtained as the size of the intersection of the sets of articles cited by the two articles.
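For a single pair of articles, these parameters reduce to set intersections. The sketch below assumes that reference lists and the in-degree class are available as Python sets; the function name `overlap_parameters` is hypothetical.

```python
def overlap_parameters(refs_i, refs_j, s_b_k):
    """Return (d_i, d_j, N^k, N_ij^k) for one article pair and one k.

    refs_i, refs_j: sets of ids cited by articles i and j.
    s_b_k: set of articles cited exactly k times.
    """
    d_i = len(refs_i & s_b_k)            # articles of the class cited by i
    d_j = len(refs_j & s_b_k)            # articles of the class cited by j
    n_k = len(s_b_k)                     # size of the class
    n_ij = len(refs_i & refs_j & s_b_k)  # articles of the class cited by both
    return d_i, d_j, n_k, n_ij
```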
Step three: Calculate the similarity quantization value $p$. Whether two articles are related is determined using the idea of a statistically validated network: a probability is calculated to obtain the similarity quantization value $p$ between any two articles, namely the probability that the number of articles in $S_B^k$ cited by both $i$ and $j$ is greater than or equal to $N_{ij}^k$. Each $p$ value corresponds to one value of $k$ (the number of times the references are cited).
For each pair of papers $(i, j)$, $d_i$ and $d_j$ denote the numbers of elements in their respective reference sets.
Under the assumption that article $i$ and article $j$ cite $d_i$ and $d_j$ different articles respectively, the probability that they happen to cite the same $X$ articles is:
$$P(X \mid N^k, d_i, d_j) = \frac{\binom{d_i}{X}\binom{N^k - d_i}{d_j - X}}{\binom{N^k}{d_j}}$$
The $p$ value of each article pair can then be calculated as:
$$p_{ij}(k) = 1 - \sum_{X=0}^{N_{ij}^k - 1} P(X \mid N^k, d_i, d_j)$$
where $p_{ij}(k)$ is the probability that $i$ and $j$ cite in common at least $N_{ij}^k$ articles in $S_B^k$.
Repeat this process for every possible value of $k$ from $k_{\min}$ to $k_{\max}$. Each article pair $(i, j)$ is then associated with more than one $p$ value, each corresponding to references whose number of citations is $k$.
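The $p$ value of step three can be computed with the standard library alone. The sketch below assumes that the null model is the hypergeometric distribution, i.e. that article $j$ draws its $d_j$ references without replacement from the $N^k$ articles of in-degree $k$; the formula images are not reproduced in this text, so that choice and the function names are assumptions.

```python
from math import comb


def hypergeom_pmf(x, n_k, d_i, d_j):
    """Probability that two articles citing d_i and d_j of the n_k
    articles with in-degree k share exactly x of them by chance."""
    return comb(d_i, x) * comb(n_k - d_i, d_j - x) / comb(n_k, d_j)


def p_value(n_ij, n_k, d_i, d_j):
    """Probability of sharing at least n_ij articles under the null model.

    The upper tail is summed directly; it equals one minus the sum of
    the terms below n_ij but avoids cancellation for very small p values.
    """
    upper = min(d_i, d_j)
    return sum(hypergeom_pmf(x, n_k, d_i, d_j) for x in range(n_ij, upper + 1))
```

For example, with $N^k = 10$ and $d_i = d_j = 3$, sharing all three references gives `p_value(3, 10, 3, 3)`, which is $1/120$, so such a complete overlap would be rare under random citation.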
Step four: Correct the results for multiple testing using the false discovery rate (FDR) method to obtain a significance threshold $p^*$, and validate all article pairs associated with a $p$ value below the threshold. Once this statistical threshold is set, only validated article pairs are considered significantly similar. Specifically:
4.1: Set a statistical threshold $p^*$ and assume there are $N_t$ tests in total. Rank the $p$ values of all the different tests in increasing order:
$$p_1 < p_2 < \dots < p_{N_t}$$
4.2: Readjust the threshold repeatedly until the largest $t_{\max}$ satisfying the following condition is found:
$$p_{t_{\max}} < \frac{p^* \, t_{\max}}{N_t}$$
where $N_t$ is the number of tests; in the invention, $N_t$ is the number of all distinct article pairs tested over the sets $S^k$ corresponding to all in-degrees $k$ in the citation network.
4.3: Compare each $p$ value with the readjusted threshold. If the $p$ value is below the readjusted threshold, the similarity of the article pair $(i, j)$ passes statistical validation at threshold $p^*$, i.e. papers $i$ and $j$ are similar.
Through the above steps, the similarity values of all paper pairs are obtained.
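The threshold readjustment of step four can be sketched as follows. The rule used here, taking the largest rank $t$ whose sorted $p$ value lies below $p^* t / N_t$, is an assumption based on the usual false discovery rate procedure, since the condition itself is given only as a formula image; the function name `fdr_threshold` is hypothetical.

```python
def fdr_threshold(p_values, p_star):
    """Return the readjusted threshold p_star * t_max / N_t.

    t_max is the largest rank t (1-based, after sorting in increasing
    order) whose p value is below p_star * t / N_t. If no rank
    qualifies, the threshold is 0.0 and no pair is validated.
    """
    ordered = sorted(p_values)
    n_t = len(ordered)
    if n_t == 0:
        return 0.0
    t_max = 0
    for t, p in enumerate(ordered, start=1):
        if p < p_star * t / n_t:
            t_max = t  # keep the largest qualifying rank
    return p_star * t_max / n_t
```

A pair is then treated as validated when at least one of its $p$ values is below the returned threshold.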
Step five: Apply the algorithm to study the citation pattern of homologous articles, reveal possible missed citations, and simplify reference lists.
5.1: Homology (homophily) refers to the tendency of papers with a certain degree of similarity to cite one another. The homology of paper citations can be seen by plotting the results obtained with the above algorithm. At a given threshold $p^*$, observe the probability $P_{i\to j}(p^*)$ that a validated paper $i$ cites paper $j$. $M(p^*)$ decreases as $p^*$ decreases, because at a smaller $p^*$ the references of an article pair need a higher degree of overlap for the pair to be included in the validated set.
For each threshold $p^*$, calculate $P_{i\to j}(p^*)$, i.e. the probability that a citation exists between any two papers whose similarity is validated at threshold $p^*$:
$$P_{i\to j}(p^*) = \frac{K(p^*)}{M(p^*)}$$
where $K(p^*)$ is the number of citations that exist between validated paper pairs, and $M(p^*)$ is the size of the set of paper pairs for which at least one associated $p$ value passes the statistical test at threshold $p^*$.
FIG. 3 shows the experimentally obtained $P_{i\to j}(p^*)$ as a function of the statistical threshold $p^*$. The plot shows that the more similar two articles are, the higher the probability that a citation relationship exists between them. This verifies the homology of paper citations.
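A small sketch of the quantity in 5.1 is given below. It assumes that $P_{i\to j}(p^*)$ is the ratio of $K(p^*)$ to $M(p^*)$, and that a citation "between" a pair counts in either direction; both are interpretations, and the function name and input format are hypothetical.

```python
def citation_probability(validated_pairs, references):
    """Fraction of validated pairs (i, j) linked by a citation.

    validated_pairs: iterable of (i, j) pairs that passed validation
    at the chosen threshold (M is their count).
    references: dict mapping a paper id to the set of ids it cites.
    """
    pairs = list(validated_pairs)
    m = len(pairs)
    if m == 0:
        return 0.0
    # K counts the validated pairs where one paper cites the other.
    k = sum(
        1 for i, j in pairs
        if j in references.get(i, set()) or i in references.get(j, set())
    )
    return k / m
```

Evaluating this function on the validated pairs obtained for a range of thresholds yields the kind of curve described for FIG. 3.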
5.2: The previous steps established that the more similar two articles are, the higher the overlap of their reference lists. Consider the situation shown in FIG. 4. Article B is validated as similar to articles C, D, E and so on at a small threshold, so the references of B and C, B and D, and B and E all overlap; that is, B and C, D, E are highly likely to cite the same documents. If C, D and E all cite document A, let $S$ be the set of all articles that are similar to B and cite A. When $S$ is sufficiently large, it is highly probable that B should also cite A. A threshold $p'$ is set, and when the size of $S$ exceeds $p'$, article B is judged to have missed the citation of article A.
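The missed-citation check of 5.2 can be sketched as follows, assuming the set of articles similar to the target has already been obtained from the earlier steps. The function name `missed_citations` and the parameter `min_support` (playing the role of the threshold $p'$) are illustrative names only.

```python
def missed_citations(paper, similar_papers, references, min_support):
    """Suggest articles that `paper` may have failed to cite.

    For every article n not cited by `paper`, collect the set S of
    similar papers that cite n, and report n when S has more than
    `min_support` members.
    """
    own = references.get(paper, set())
    support = {}
    for s in similar_papers:
        for ref in references.get(s, set()):
            if ref != paper and ref not in own:
                support.setdefault(ref, set()).add(s)
    return {ref: citing for ref, citing in support.items()
            if len(citing) > min_support}
```

With the situation of FIG. 4, calling the function for B with similar papers C, D and E that all cite A would return A together with the three supporting papers, provided `min_support` is below three.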
5.3: Apply the above method to a paper and its own references, determine the relevance threshold for the references, and delete the references whose relevance is not high enough, thereby simplifying the reference list.
While the invention has been described in connection with specific embodiments thereof, it will be understood by those skilled in the art that various changes in form and details may be made therein without departing from the spirit and scope of the invention as defined by the appended claims.

Claims (1)

1. A paper relevancy quantification method based on reference document list overlapping degree is characterized by comprising the following steps:
S1: Preprocess the data set to build a paper citation network.
Select papers in the Computer Science domain as the data set and extract the useful information from it, namely each paper's number, title and reference list; calculate, for each paper, the number of articles it cites and the number of times it is cited; and store this information in a dictionary structure to form the paper citation network;
S2: Classify the data set of S1 and calculate the related parameters.
The data set of S1 is classified into the following sets:
$$S_B^k = \{\, b \in B : b \text{ is cited by exactly } k \text{ articles} \,\}$$
$$S_A^k = \{\, a \in A : a \text{ cites at least one article in } S_B^k \,\}$$
where $A$ is the set of papers that cite at least one article, $B$ is the set of papers that are cited by at least two articles, $S_B^k$ is the subset of $B$ obtained by dividing $B$ according to the number of citations $k$, $S_A^k$ is the set of articles in $A$ that cite articles in the subset $S_B^k$, and $N^k$ is the size of $S_B^k$;
S3: Calculate the relevance $p$ of two papers using the parameters of S2 and the idea of a statistically validated network.
S3.1: For a paper pair $(i, j)$, under the assumption that paper $i$ cites $d_i$ articles and paper $j$ cites $d_j$ articles, the probability that paper $i$ and paper $j$ cite $X$ articles in common is:
$$P(X \mid N^k, d_i, d_j) = \frac{\binom{d_i}{X}\binom{N^k - d_i}{d_j - X}}{\binom{N^k}{d_j}}$$
S3.2: Using $P(X \mid N^k, d_i, d_j)$ from S3.1, calculate the similarity quantization value $p$ between any two papers, i.e. the probability that the number of articles in $S_B^k$ cited by both paper $i$ and paper $j$ is greater than or equal to $N_{ij}^k$, where each $p$ value corresponds to one value of $k$:
$$p_{ij}(k) = 1 - \sum_{X=0}^{N_{ij}^k - 1} P(X \mid N^k, d_i, d_j)$$
where $S_B^k$ is the set of articles cited $k$ times and $N^k$ is its size, $d_i$ and $d_j$ are the numbers of articles in $S_B^k$ cited by paper $i$ and paper $j$ respectively, and $N_{ij}^k$ is the number of articles in $S_B^k$ that paper $i$ and paper $j$ cite in common. The value $p_{ij}(k)$ is thus the probability that paper $i$ and paper $j$ cite in common at least $N_{ij}^k$ articles in $S_B^k$;
S3.3: Repeat S3.1 and S3.2 for every possible value of $k$ from $k_{\min}$ to $k_{\max}$, so that each paper pair $(i, j)$ is associated with more than one $p$ value, each corresponding to one number of citations $k$;
S4: Apply the false discovery rate (FDR) method to correct the $p$ values of S3 for multiple testing, obtain a threshold $p^*$ for judging whether papers are significantly similar, and identify the significantly similar paper pairs.
S4.1: Set a statistical threshold $p^*$ and, assuming a total of $N_t$ tests, rank the $p$ values of all the different tests in increasing order:
$$p_1 < p_2 < \dots < p_{N_t}$$
S4.2: Readjust the threshold repeatedly until the largest $t_{\max}$ satisfying the following condition is found:
$$p_{t_{\max}} < \frac{p^* \, t_{\max}}{N_t}$$
where $N_t$ is the number of tests, i.e. the number of all distinct paper pairs tested over the sets $S^k$ for all values of the number of citations $k$;
S4.3: Compare each $p$ value with the readjusted threshold. If the $p$ value is below the readjusted threshold, the similarity of the paper pair $(i, j)$ passes statistical validation at the threshold $p^*$, i.e. papers $i$ and $j$ are similar;
S5: Analyze the results of S4 to verify the homology of paper citations, find possible missed citations and delete references of low relevance, thereby providing a sound reference for scientific evaluation.
S5.1: Verify the homology of paper citations. For each threshold $p^*$, calculate $P_{i\to j}(p^*)$:
$$P_{i\to j}(p^*) = \frac{K(p^*)}{M(p^*)}$$
where $P_{i\to j}(p^*)$ is the probability that a citation exists between any two papers whose similarity is validated at threshold $p^*$, $K(p^*)$ is the number of citations that exist between validated paper pairs, and $M(p^*)$ is the size of the set of paper pairs for which at least one associated $p$ value passes the statistical test at threshold $p^*$;
Plot $P_{i\to j}(p^*)$ as a function of $p^*$ to conclude that the more similar two papers are, the higher the probability that a citation relationship exists between them, which verifies the homology of paper citations;
S5.2: Find possible missed citations. Based on the conclusion of S5.1, select a paper $m$ and use steps S1 to S4 to obtain the articles similar to it. Collect into a set $S$ all articles that are similar to paper $m$ and cite an article $n$. If paper $m$ does not cite article $n$, a threshold $p'$ is set automatically; when the size of $S$ exceeds $p'$, paper $m$ is judged to have missed the citation of article $n$;
S5.3: Apply steps S1 to S4 to a paper and its own references, and delete the references whose similarity is not high enough, thereby simplifying the paper's reference list.
CN201811072484.5A 2018-09-14 2018-09-14 Paper correlation degree quantification method based on reference document list overlapping degree Active CN109376238B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811072484.5A CN109376238B (en) 2018-09-14 2018-09-14 Paper correlation degree quantification method based on reference document list overlapping degree

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811072484.5A CN109376238B (en) 2018-09-14 2018-09-14 Paper correlation degree quantification method based on reference document list overlapping degree

Publications (2)

Publication Number Publication Date
CN109376238A CN109376238A (en) 2019-02-22
CN109376238B true CN109376238B (en) 2021-01-05

Family

ID=65405261

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811072484.5A Active CN109376238B (en) 2018-09-14 2018-09-14 Paper correlation degree quantification method based on reference document list overlapping degree

Country Status (1)

Country Link
CN (1) CN109376238B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110096710B (en) * 2019-05-09 2022-12-30 董云鹏 Article analysis and self-demonstration method
CN114911935A (en) * 2019-05-17 2022-08-16 爱酷赛股份有限公司 Cluster analysis method, cluster analysis system, and cluster analysis program
CN110489745B (en) * 2019-07-31 2020-12-22 北京大学 Paper text similarity detection method based on citation network

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106126732A (en) * 2016-07-04 2016-11-16 中南大学 Author's power of influence transmission capacity Forecasting Methodology based on interest scale model
CN108132961A (en) * 2017-11-06 2018-06-08 浙江工业大学 A kind of bibliography based on reference prediction recommends method

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2011044578A1 (en) * 2009-10-11 2011-04-14 Patrick Walsh Method and system for performing classified document research

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
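Yicong Liang et al. Finding Relevant Papers Based on Citation Relations. Web-Age Information Management. 2011, 403-414. *
基于内容与引用关系的学术论文推荐 [Academic paper recommendation based on content and citation relations]; 蔡阿妮; 《中国优秀硕士学位论文全文数据库 信息科技辑》 [China Master's Theses Full-text Database, Information Science and Technology]; 2014-11-15; pp. I138-568 *
多重相关检验中错误发现率的控制算法 [Control algorithms for the false discovery rate in multiple correlated tests]; 刘遵雄 et al.; 《井冈山大学学报(自然科学版)》 [Journal of Jinggangshan University (Natural Science Edition)]; 2016-12-31; pp. 35-40 *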

Also Published As

Publication number Publication date
CN109376238A (en) 2019-02-22

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant