CN113343062A - Scientific and technological resource matching method based on Pagerank algorithm - Google Patents

Publication number
CN113343062A
Authority
CN
China
Prior art keywords
value
document
scientific
algorithm
elasticsearch
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110564589.8A
Other languages
Chinese (zh)
Inventor
徐昱琳
李璇
周文举
易开祥
费敏锐
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
University of Shanghai for Science and Technology
Original Assignee
University of Shanghai for Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by University of Shanghai for Science and Technology
Priority to CN202110564589.8A
Publication of CN113343062A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/951 Indexing; Web crawling techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/334 Query execution
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/338 Presentation of query results

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a scientific and technological resource matching method based on a Pagerank algorithm. The relevance calculation formula used by Elasticsearch mainly borrows from term frequency/inverse document frequency (TF/IDF) and the vector space model, and does not consider the document value of the queried object. To find the most valuable documents that match expectations quickly and accurately, the invention builds an Elasticsearch document search engine, calculates document value with a new document value sorting algorithm based on the Pagerank algorithm, and adds a document value factor to the original Elasticsearch relevance calculation rule to obtain a new relevance ranking rule. After the document value sorting algorithm matures, its idea can be applied to other scientific and technological resources, and value sorting algorithms for intellectual property, human resources, policy consultation, and the like can be obtained by modifying the details of the algorithm.

Description

Scientific and technological resource matching method based on Pagerank algorithm
Technical Field
The invention relates to the field of search engines and resource matching, in particular to construction of a scientific and technological resource matching algorithm based on a Pagerank algorithm.
Background
With the development of society, demand for sharing scientific and technological resources grows daily, and search engines have become an important tool for acquiring massive amounts of information. Academic documents are an important component of scientific and technological resources, and users obtain them mainly through academic search engines. In recent years, both search engine companies and database vendors have launched search engines for academic document retrieval to provide users with academic search services, such as Google Scholar, encyclopedia, Cnki academia, Web of Science, and the like. Many websites in China currently build their search engines on Elasticsearch. Elasticsearch is a Lucene-based search server written in Java that implements search according to input keywords.
Elasticsearch stores user data in an Elasticsearch index, splits the corresponding sentences into terms with a tokenizer, and stores the weights together with the tokenization results. When a user searches, the results are scored and ranked according to these weights, and the ranked results are returned to the user. For document matching in scientific and technological resources, the Elasticsearch relevance algorithm, which uses term frequency/inverse document frequency (TF/IDF), considers only how closely a document matches the search terms and not the value of the document. As a result, documents with slightly lower similarity but high document value may fail to rank well in the matching process.
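The limitation described above can be illustrated with a minimal sketch. It assumes the classic Lucene TF/IDF definitions (square-root term frequency, logarithmic inverse document frequency, inverse-square-root field-length norm) and a simplified per-term weight; the function names are illustrative and are not taken from the patent. Note that every input is a term statistic, so nothing about the value of the document enters the score.

```python
import math

def tf(term_freq: int) -> float:
    # Term frequency: more occurrences of the term in the field, higher weight.
    return math.sqrt(term_freq)

def idf(num_docs: int, doc_freq: int) -> float:
    # Inverse document frequency: terms found in fewer documents weigh more.
    return 1.0 + math.log(num_docs / (doc_freq + 1))

def field_norm(num_terms: int) -> float:
    # Field-length norm: shorter fields weigh more.
    return 1.0 / math.sqrt(num_terms)

def term_weight(term_freq: int, num_docs: int, doc_freq: int, num_terms: int) -> float:
    # Simplified per-term weight (an assumption, not the full scoring function).
    return tf(term_freq) * idf(num_docs, doc_freq) * field_norm(num_terms)
```

Two documents with identical term statistics receive identical weights in this sketch, regardless of how often either is cited or downloaded.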
The Pagerank algorithm is Google's web page ranking algorithm: it attaches a weight to each target web page and determines the ranking order according to the weight. Because document ranking resembles web page ranking, domestic scholars have proposed applying the Pagerank algorithm to document ranking and have defined document value as the weighted sum of a document's intrinsic value and the value it obtains by being cited. The intrinsic value of a document is determined mainly by the level of the journal or conference in which it is published and by the authority of its authors, and the publication age is used as a measure of the importance of a cited document to describe the weight of value transfer. Across the many papers on document value ranking, however, the problem of unreasonable document ranking caused by publication time still exists.
Disclosure of Invention
To overcome the defects in the prior art, the invention provides a scientific and technological resource matching method based on a Pagerank algorithm. The relevance ranking constructed by the method takes the value ranking of documents into account, is aimed mainly at paper documents, and produces a value ranking that meets the requirements.
In order to achieve the purpose of the invention, the invention adopts the following inventive concept:
the principle of the invention is as follows: the paper document value sorting algorithm in the method is an improved algorithm based on Pagerank. Publication time determines the emphasis of value evaluation for new and old documents; citation count, download count, and impact factor are introduced to describe the intrinsic value of a paper document; and the value obtained through citation is determined from the citation relationships among documents. The resulting document value sorting algorithm evaluates the value score of each paper document, and the total score used for ranking is calculated by combining this value score with the Elasticsearch relevance score, thereby realizing a document search engine that takes document value into account. After the document value sorting algorithm matures, its idea can be applied to other scientific and technological resources, and value sorting algorithms for intellectual property, human resources, policy consultation, and the like can be obtained by modifying the details of the algorithm.
According to the inventive concept, the invention adopts the following technical scheme:
a scientific and technological resource matching method based on a Pagerank algorithm comprises the following steps:
step 1: building a basic Elasticsearch search engine by using a Spring framework in the IDEA software;
step 2: crawling and parsing paper data from the resource pool by using the Elasticsearch search engine, and putting the data into an Elasticsearch index library;
step 3: defining a new document value sorting algorithm based on the Pagerank sorting algorithm, and combining it with the Elasticsearch relevance calculation to obtain a new relevance score calculation rule;
step 4: querying according to the input keywords, and when a Request is generated in IDEA, ranking the documents by score using the newly defined relevance scoring rule.
Preferably, the method is based on the Elasticsearch (7.6.1) framework; the project is built with the IDEA software; the Elasticsearch index library is visualized using the elasticsearch-head-master plug-in; the scoring rule for search results takes a document value factor into account; and the document value sorting algorithm is based on the web page ranking algorithm Pagerank.
Preferably, in step 1, the Elasticsearch version is configured in the configuration file of the Spring project, and data search is implemented by calling the SearchRequest, SearchResponse, RestHighLevelClient, and TermQueryBuilder APIs of Elasticsearch.
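As a rough illustration of what such a term search amounts to at the REST level, the following sketch builds the JSON body of an Elasticsearch term query. It is written in Python rather than with the Java client named above, and the field name `title` and the helper name `build_term_search` are hypothetical, not taken from the patent.

```python
import json

def build_term_search(field: str, value: str, start: int = 0, size: int = 10) -> dict:
    # JSON body for an Elasticsearch _search request containing one term query.
    return {
        "from": start,  # offset of the first hit to return
        "size": size,   # number of hits to return
        "query": {"term": {field: value}},
    }

body = build_term_search("title", "pagerank")
print(json.dumps(body, indent=2))
```

A body of this shape would be sent to the `_search` endpoint of the target index; the hits in the response carry the relevance score that later steps combine with the document value.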
Preferably, in step 2, a Spring project is built on the Spring project framework in IDEA. The Elasticsearch engine version bound to the project is specified in the pom.xml file, and Maven version 3.6.3 automatically imports the required jar packages for the bound Elasticsearch engine. After the project is built and permission of the resource platform is obtained, paper resource data is crawled by a simple crawler and stored in the ES index library.
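One way to picture the storing step is the newline-delimited payload accepted by the Elasticsearch bulk API, in which each document is preceded by an action line. The sketch below only assembles that payload; the index name `papers` and the record fields are hypothetical, and the crawler itself is out of scope.

```python
import json

def to_bulk_payload(index_name: str, papers: list) -> str:
    # Build an NDJSON body for the _bulk endpoint: one action line per document,
    # followed by the document source on the next line.
    lines = []
    for paper in papers:
        action = {"index": {"_index": index_name, "_id": paper["id"]}}
        source = {key: val for key, val in paper.items() if key != "id"}
        lines.append(json.dumps(action))
        lines.append(json.dumps(source))
    # The bulk API requires the body to end with a newline.
    return "\n".join(lines) + "\n"

crawled = [
    {"id": "p1", "title": "A", "citations": 3},
    {"id": "p2", "title": "B", "citations": 0},
]
payload = to_bulk_payload("papers", crawled)
```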
Preferably, in step 3, the basic Pagerank algorithm holds that if a web page is linked to by many other web pages, the page is important, that is, its Pagerank value is relatively high; and if a web page with a high Pagerank value links to another web page, the Pagerank value of the linked page increases accordingly. The main calculation formula of the Pagerank algorithm is as follows:
[Formula image BDA0003080460220000031: Pagerank calculation formula, not reproduced in this text]
where PR(pi) represents the PR value of page pi; C(pj) represents the total number of pages linked out from page pj; d is a damping factor, generally taken as 0.85, which represents the probability that the user continues to click links; and 1-d is the probability that the user leaves the link chain and opens a new page.
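Since the formula itself is not reproduced in this text, the following is only a sketch of the commonly published iteration, PR(pi) = (1 - d) + d * Σ PR(pj)/C(pj) over the pages pj that link to pi. Whether the formula image uses this exact variant is an assumption, and pages without outgoing links receive no special handling here.

```python
def pagerank(links: dict, d: float = 0.85, iterations: int = 100) -> dict:
    """links maps each page to the list of pages it links out to."""
    pages = sorted(set(links) | {q for outs in links.values() for q in outs})
    pr = {p: 1.0 for p in pages}  # start every page at the same value
    for _ in range(iterations):
        updated = {}
        for p in pages:
            # Each page q linking to p passes on an equal share of its own value.
            incoming = sum(pr[q] / len(links[q]) for q in links if p in links[q])
            updated[p] = (1 - d) + d * incoming
        pr = updated
    return pr

ranks = pagerank({"a": ["b", "c"], "b": ["c"], "c": ["a"]})
```

In this small graph, page c is linked to by both other pages and ends up with the highest value, while page b, which receives only half of a's value, ends up lowest.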
Preferably, in step 3, in view of the characteristics of scientific and technological paper documents and in combination with the Pagerank algorithm, the value of a document is defined as the weighted sum of the intrinsic value of the document and the value it obtains by being cited, where the intrinsic value is determined by the citation count, download count, and impact factor of the document, and the value obtained by being cited is transferred by the citing documents; the weight relationship between the two values is determined by publication time, and the following document value sorting algorithm formula is adopted:
KLV(ui)=(1-d(ti))*LZV(ui)+d(ti)*TRV(ui);
where KLV(ui) represents the document value of document ui, d(ti) represents the time damping function of the document, LZV(ui) represents the intrinsic value of the document itself, and TRV(ui) represents the value the document obtains by being cited. Applying the core idea of the Pagerank algorithm to document value calculation for scientific and technological paper documents gives the following: a document is important, that is, its document value is relatively high, if it is cited by many other documents; and if a document with a high document value cites another document, the document value of the cited document increases accordingly.
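The weighted sum above can be written directly as a small function. In this sketch the three inputs are assumed to be precomputed numbers, and the example values are invented purely for illustration.

```python
def document_value(lzv: float, trv: float, d_t: float) -> float:
    # KLV(ui) = (1 - d(ti)) * LZV(ui) + d(ti) * TRV(ui)
    # lzv: intrinsic value, trv: value obtained by being cited,
    # d_t: value of the time damping function for this document.
    return (1.0 - d_t) * lzv + d_t * trv

# Illustrative numbers only: a small d_t lets the intrinsic value dominate.
example = document_value(lzv=0.8, trv=0.2, d_t=0.25)
```

With d_t equal to 0 the result is the intrinsic value alone, and with d_t equal to 1 it is the citation-derived value alone, which is how the damping function shifts the emphasis between the two.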
Preferably, in step 4, the value score KLV of each document is calculated and the relevance ranking rule of Elasticsearch is modified so that the score from the original Elasticsearch relevance calculation accounts for 60% and the value score of the scientific and technological paper document accounts for 40%; results are ranked and displayed according to the final total score, and the specific implementation steps are as follows:
step 3.1: determining the intrinsic value of the scientific and technological document;
step 3.2: determining the value the scientific and technological document obtains by being cited;
step 3.3: determining the time damping coefficient;
step 3.4: establishing the document value sorting algorithm formula;
step 3.5: fusing the document value sorting algorithm formula with the original Elasticsearch similarity calculation to obtain a new document relevance calculation.
A computer system program executes the scientific and technological resource matching method based on the Pagerank algorithm.
Compared with the prior art, the invention has the following obvious and prominent substantive characteristics and remarkable advantages:
1. the invention recognizes that the queried objects are scientific and technological paper documents and that the value attribute of a document is one of the factors determining the displayed search ranking; an algorithm for calculating document value is added on top of the original relevance calculation, which ensures both the accuracy of the search and the quality of the retrieved documents, and helps users find suitable, high-quality paper documents;
2. the document value sorting algorithm used by the invention improves the damping factor of the original Pagerank algorithm by adding a time factor: according to the publication time of a document, it gives the intrinsic value of the document a weight different from that of the value obtained by being cited, which increases fairness between new and old documents in value evaluation.
Drawings
FIG. 1 is a diagram illustrating the architecture of Elasticsearch in the present invention.
FIG. 2 is a simplified flow chart of the document value sorting algorithm in the present invention.
FIG. 3 is the index library visualization interface provided by elasticsearch-head-master in the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is further described in detail below with reference to the accompanying drawings. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
The above-described scheme is further illustrated below with reference to specific embodiments, which are detailed below:
Embodiment one:
in this embodiment, referring to FIGS. 1-2, a scientific and technological resource matching method based on the Pagerank algorithm includes the following steps:
step 1: building a basic Elasticsearch search engine by using a Spring framework in the IDEA software;
step 2: crawling and parsing paper data from the resource pool by using the Elasticsearch search engine, and putting the data into an Elasticsearch index library;
step 3: defining a new document value sorting algorithm based on the Pagerank sorting algorithm, and combining it with the Elasticsearch relevance calculation to obtain a new relevance score calculation rule;
step 4: querying according to the input keywords, and when a Request is generated in IDEA, ranking the documents by score using the newly defined relevance scoring rule.
In the scientific and technological resource matching method based on the Pagerank algorithm of this embodiment, starting from the document resources among scientific and technological resources, a document value algorithm is introduced into the Elasticsearch relevance calculation, so that a document search engine that takes document value into account is built. The relevance ranking constructed by the method takes the value ranking of documents into account, is aimed mainly at paper documents, and produces a value ranking that meets the requirements.
Embodiment two:
this embodiment is substantially the same as embodiment one, with the following particular features:
in this embodiment, referring to FIGS. 1-3, the Elasticsearch index library visualization is implemented using the elasticsearch-head-master plug-in.
In this embodiment, in step 1, the Elasticsearch version is configured in the configuration file of the Spring project, and data search is implemented by calling the SearchRequest, SearchResponse, RestHighLevelClient, and TermQueryBuilder APIs of Elasticsearch.
In this embodiment, in step 2, a Spring project is built on the Spring project framework in IDEA. The Elasticsearch engine version bound to the project is specified in the pom.xml file, and Maven version 3.6.3 automatically imports the required jar packages for the bound Elasticsearch engine. After the project is built and permission of the resource platform is obtained, paper resource data is crawled by a simple crawler and stored in the ES index library.
In this embodiment, in step 3, the basic Pagerank algorithm holds that if a web page is linked to by many other web pages, the page is important, that is, its Pagerank value is relatively high; and if a web page with a high Pagerank value links to another web page, the Pagerank value of the linked page increases accordingly. The main calculation formula of the Pagerank algorithm is as follows:
[Formula image BDA0003080460220000051: Pagerank calculation formula, not reproduced in this text]
where PR(pi) represents the PR value of page pi; C(pj) represents the total number of pages linked out from page pj; d is a damping factor, generally taken as 0.85, which represents the probability that the user continues to click links; and 1-d is the probability that the user leaves the link chain and opens a new page.
In this embodiment, in step 3, in view of the characteristics of scientific and technological paper documents and in combination with the Pagerank algorithm, the value of a document is defined as the weighted sum of the intrinsic value of the document and the value it obtains by being cited, where the intrinsic value is determined by the citation count, download count, and impact factor of the document, and the value obtained by being cited is transferred by the citing documents; the weight relationship between the two values is determined by publication time, and the following document value sorting algorithm formula is adopted:
KLV(ui)=(1-d(ti))*LZV(ui)+d(ti)*TRV(ui);
where KLV(ui) represents the document value of document ui, d(ti) represents the time damping function of the document, LZV(ui) represents the intrinsic value of the document itself, and TRV(ui) represents the value the document obtains by being cited.
In this embodiment, in step 4, the value score KLV of each document is calculated and the relevance ranking rule of Elasticsearch is modified so that the score from the original Elasticsearch relevance calculation accounts for 60% and the value score of the scientific and technological paper document accounts for 40%; results are ranked and displayed according to the final total score, and the specific implementation steps are as follows:
step 3.1: determining the intrinsic value of the scientific and technological document;
step 3.2: determining the value the scientific and technological document obtains by being cited;
step 3.3: determining the time damping coefficient;
step 3.4: establishing the document value sorting algorithm formula;
step 3.5: fusing the document value sorting algorithm formula with the original Elasticsearch similarity calculation to obtain a new document relevance calculation.
In this method, the queried objects are scientific and technological paper documents, and the value attribute of a document should be one of the factors that determine the displayed search ranking. An algorithm for calculating document value is added on top of the original relevance calculation, which ensures both the accuracy of the search and the quality of the retrieved documents, and helps users find suitable, high-quality paper documents. The document value sorting algorithm used in the method improves the damping factor of the original Pagerank algorithm by adding a time factor: according to the publication time of a document, it gives the intrinsic value of the document a weight different from that of the value obtained by being cited, which increases fairness between new and old documents in value evaluation.
Embodiment three:
this embodiment is substantially the same as the above embodiments, with the following particular features:
in this embodiment, as shown in FIG. 1, a Spring project is built with the IDEA software, a search engine is built with Elasticsearch (7.6.1), IKAnalyzer is used as the tokenizer, and the Elasticsearch index library is visualized with the elasticsearch-head-master plug-in, as shown in FIG. 3.
As shown in FIG. 2, because the queried objects are scientific and technological paper documents, the method mainly modifies the relevance calculation of Elasticsearch by adding the value attribute of scientific and technological paper documents. A new formula is proposed for calculating the value score of scientific and technological paper documents, and the calculation is carried out step by step according to steps 3.1 to 3.5, detailed as follows:
Step 3.1: the intrinsic value of a scientific and technological paper document is determined mainly by its citation count, download count, and impact factor. Citation count and download count are important indexes of the intrinsic value of a document and demonstrate its practical value to a certain extent. The impact factor is an index of the usefulness and visibility of a journal, but it also reflects the quality of a paper to a certain extent. The intrinsic value of a document is therefore described by combining the three, with the following formula:
[Formula image BDA0003080460220000061: intrinsic value LZV(ui), not reproduced in this text]
where LZV(ui) represents the intrinsic value of document ui, YN(ui) represents the citation count of document ui, XZ(ui) represents the download count of the document, and IF(ui) represents the impact factor of the document.
Step 3.2: by analogy with the link relationship between web pages, value is also transferred through the citation relationships between documents. A cited document receives the recognition of the documents citing it, and this recognition is expressed as a transfer of value: each document transfers its own value to each of the references it cites. The value obtained by being cited is then expressed as:
[Formula image BDA0003080460220000062: value obtained by being cited TRV(ui), not reproduced in this text]
where TRV(ui) represents the value document ui obtains by being cited, n represents the total number of documents citing document ui, KLV(uj) represents the value of a citing document uj, Bj represents the set of references of document uj, and KLV(uk) represents the value of a document uk in the reference set of document uj.
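The exact expression is not reproduced in this text, so the following is only a sketch of the value-transfer idea under a simplifying assumption: each citing document splits its own value equally among its references, as Pagerank does with outgoing links. The formula described above also involves the values KLV(uk) of the documents in Bj, which this sketch does not attempt to reproduce. All names and numbers are illustrative.

```python
def transferred_value(doc: str, references: dict, klv: dict) -> float:
    """references maps each citing document to the list of documents it cites."""
    total = 0.0
    for citing, cited in references.items():
        if doc in cited:
            # Assumption: the citing document's value is shared equally
            # among all of its references.
            total += klv[citing] / len(cited)
    return total

refs = {"u1": ["u3"], "u2": ["u3", "u4"]}
values = {"u1": 1.0, "u2": 0.5}
trv_u3 = transferred_value("u3", refs, values)
```

Because the value of each citing document itself depends on its own citations, a full computation would repeat this transfer until the values stabilize, in the same way the Pagerank iteration does.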
Step 3.3: damping function. The invention adjusts the damping factor of the Pagerank algorithm. A recently published document may have a low citation count simply because it is new, not necessarily because its value is low, so the original damping factor is combined with a time factor to obtain a new damping function, with the following formula:
[Formula image BDA0003080460220000071: time damping function d(ti), not reproduced in this text]
where d is a damping factor, generally taken as 0.85; ti represents the publication time of a document; t0 represents the current time; and Σk(t0-tk) represents the sum of the number of days since publication over all documents. The damping function assigns the weights of a document's intrinsic value and of the value it obtains by being cited according to publication time, which reduces the unfair effect of publication time on citation count and improves the document value ranking.
Step 3.4: the invention defines the value of a document as the weighted sum of its intrinsic value and the value it obtains by being cited. The intrinsic value is determined by citation count, download count, and impact factor; the other value is transferred by the citing documents; and the weight relationship between the two values is determined by publication time. The following value sorting algorithm formula for scientific and technological paper documents is therefore proposed:
KLV(ui)=(1-d(ti))*LZV(ui)+d(ti)*TRV(ui)
In the formula, KLV(ui) represents the value of document ui under the new algorithm; LZV(ui) represents the dissemination power of document ui and describes the intrinsic value of the document itself; TRV(ui) represents the value document ui obtains by being cited; and d(ti) is the damping function, which determines the evaluation weights for new and old documents.
Step 3.5: Elasticsearch uses the Boolean model to find matching documents and calculates relevance with a formula called the practical scoring function. The formula borrows from term frequency/inverse document frequency (TF/IDF) and the vector space model, and adds features such as a coordination factor, field-length normalization, and term or query clause boosting. The relevance score calculated by this formula for each queried paper accounts for 60% of the total score, the document value score calculated in step 3.4 accounts for 40%, and the query results are displayed in order of total score.
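The 60%/40% combination can be sketched as follows. The text does not state how the two scores are brought onto a common scale, so this sketch assumes both are already comparable (for example, normalized to the range 0 to 1); the hit structure and field names are hypothetical.

```python
def total_score(relevance: float, klv: float,
                w_relevance: float = 0.6, w_value: float = 0.4) -> float:
    # Weighted total: 60% Elasticsearch relevance, 40% document value.
    return w_relevance * relevance + w_value * klv

def rank_hits(hits: list) -> list:
    # Order hits by total score, highest first.
    return sorted(hits,
                  key=lambda h: total_score(h["relevance"], h["klv"]),
                  reverse=True)

hits = [
    {"id": "a", "relevance": 0.9, "klv": 0.1},
    {"id": "b", "relevance": 0.8, "klv": 0.6},
]
ordered = [h["id"] for h in rank_hits(hits)]
```

In this example document b overtakes document a despite its slightly lower relevance, because its higher document value contributes more to the total score.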
In the scientific and technological resource matching method based on the Pagerank algorithm of this embodiment, starting from the academic documents among scientific and technological resources, each document that is a queried object has a document value, which is calculated as the weighted sum of its intrinsic value and the value it obtains by being cited. The relevance calculation formula used by Elasticsearch mainly borrows from term frequency/inverse document frequency (TF/IDF) and the vector space model, and does not consider the document value of the queried object. To find the most valuable documents that match expectations quickly and accurately, the invention builds an Elasticsearch document search engine, calculates document value with a new document value sorting algorithm based on the Pagerank algorithm, and adds a document value factor to the original Elasticsearch relevance calculation rule to obtain a new relevance ranking rule. After the document value sorting algorithm matures, its idea can be applied to other scientific and technological resources, and value sorting algorithms for intellectual property, human resources, policy consultation, and the like can be obtained by modifying the details of the algorithm.
The embodiments of the present invention have been described above with reference to the accompanying drawings, but the present invention is not limited to these embodiments. Various changes and modifications may be made according to the purpose of the invention. Any change, modification, substitution, combination, or simplification made according to the spirit and principle of the technical solution of the present invention is an equivalent substitution and falls within the protection scope of the present invention, provided that it meets the purpose of the invention and does not depart from its technical principle and inventive concept.

Claims (7)

1. A scientific and technological resource matching method based on a Pagerank algorithm is characterized by comprising the following steps:
step 1: building a basic Elasticsearch search engine by using a Spring framework in the IDEA software;
step 2: crawling and parsing paper data from the resource pool by using the Elasticsearch search engine, and putting the data into an Elasticsearch index library;
step 3: defining a new document value sorting algorithm based on the Pagerank sorting algorithm, and combining it with the Elasticsearch relevance calculation to obtain a new relevance score calculation rule;
step 4: querying according to the input keywords, and when a Request is generated in IDEA, ranking the documents by score using the newly defined relevance scoring rule.
2. The scientific and technological resource matching method based on the Pagerank algorithm as claimed in claim 1, wherein: the Elasticsearch index library visualization is implemented using the elasticsearch-head-master plug-in.
3. The scientific and technological resource matching method based on the Pagerank algorithm as claimed in claim 1, wherein: in step 1, the Elasticsearch version is configured in the configuration file of the Spring project, and data search is implemented by calling the SearchRequest, SearchResponse, RestHighLevelClient, and TermQueryBuilder APIs of Elasticsearch.
4. The scientific and technological resource matching method based on the Pagerank algorithm as claimed in claim 1, wherein: in step 2, a Spring project is built on the Spring project framework in IDEA; the Elasticsearch engine version bound to the project is specified in the pom file, and Maven version 3.6.3 automatically imports the required jar packages for the bound Elasticsearch engine; after the project is built and permission of the resource platform is obtained, paper resource data is crawled by a simple crawler and stored in the ES index library.
5. The scientific and technological resource matching method based on the Pagerank algorithm as claimed in claim 1, wherein: in step 3, the basic Pagerank algorithm determines that a web page is important if the web page is linked to by many other web pages, that is, the Pagerank value is relatively high; if a web page with a high pageank value is linked to another web page, determining that the pageank value of the linked web page is correspondingly increased; the main calculation formula of the Pagerank algorithm is as follows:
Figure FDA0003080460210000011
wherein PR(pi) represents the PR value of page pi; C(pj) represents the total number of pages linked out from page pj; d is a damping factor, generally taking a value of 0.85, which represents the probability that the user continues to click links, and 1-d is the probability that the user leaves the links and opens a new page.
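The iteration behind claim 5 can be illustrated with a minimal Python sketch. The formula in the figure is not reproduced in this text, so the sketch assumes the commonly used normalized form PR(pi) = (1-d)/N + d * sum over pages pj linking to pi of PR(pj)/C(pj); the function name `pagerank`, the `links` dictionary layout, the even redistribution of rank from pages without out-links, and the convergence parameters are all illustrative assumptions, not taken from the patent.

```python
def pagerank(links, d=0.85, tol=1e-10, max_iter=200):
    """Iterative PageRank; links maps each page to the list of pages it links to."""
    pages = sorted(set(links) | {q for outs in links.values() for q in outs})
    n = len(pages)
    pr = {p: 1.0 / n for p in pages}  # start from a uniform distribution
    for _ in range(max_iter):
        # Assumption: rank held by pages with no out-links is spread evenly.
        dangling = sum(pr[p] for p in pages if not links.get(p))
        new = {p: (1.0 - d) / n + d * dangling / n for p in pages}
        for p_j, outs in links.items():
            if outs:
                share = d * pr[p_j] / len(outs)  # d * PR(p_j) / C(p_j)
                for p_i in outs:
                    new[p_i] += share
        delta = sum(abs(new[p] - pr[p]) for p in pages)
        pr = new
        if delta < tol:
            break
    return pr


# Toy link graph: A links to B and C, B links to C, C links back to A.
ranks = pagerank({"A": ["B", "C"], "B": ["C"], "C": ["A"]})
```

In this toy graph C is linked to by both A and B, so it ends up with a higher value than B, which is linked to only by A.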
6. The scientific and technological resource matching method based on the Pagerank algorithm as claimed in claim 1, wherein: in step 3, in view of the characteristics of scientific and technical papers and in combination with the Pagerank algorithm, the value of a document is defined as the weighted sum of its intrinsic value and its cited value, wherein the intrinsic value of the document is determined by its citation count, download count and impact factor, and the cited value of the document is passed to it by the documents that cite it; meanwhile, the weighting between these two values is determined by publication time, and the following document value ranking formula is adopted:
KLV(ui)=(1-d(ti))*LZV(ui)+d(ti)*TRV(ui);
where KLV(ui) represents the document value of document ui, d(ti) represents the time damping function of the document, LZV(ui) represents the intrinsic value of the document, and TRV(ui) represents the cited value of the document.
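The weighted sum in claim 6 can be sketched in Python as follows. The function `klv` implements the formula as written; the form of the time damping function d(ti) is not given in this text, so `time_damping` below is purely hypothetical (it assumes the weight shifts from intrinsic value toward cited value as a document ages), and the parameters `d_max` and `time_scale` and the sample values are illustrative assumptions.

```python
import math


def time_damping(age_years, d_max=0.85, time_scale=5.0):
    """Hypothetical d(t_i): rises from 0 toward d_max as the document ages.

    This shape is an assumption for illustration, not taken from the source.
    """
    return d_max * (1.0 - math.exp(-age_years / time_scale))


def klv(lzv, trv, d_t):
    """KLV(u_i) = (1 - d(t_i)) * LZV(u_i) + d(t_i) * TRV(u_i)."""
    if not 0.0 <= d_t <= 1.0:
        raise ValueError("d_t must lie in [0, 1]")
    return (1.0 - d_t) * lzv + d_t * trv


# Same intrinsic and cited values, different ages (values are made up).
new_paper = klv(lzv=0.8, trv=0.2, d_t=time_damping(0.0))
old_paper = klv(lzv=0.8, trv=0.2, d_t=time_damping(20.0))
```

Under this assumed damping, a newly published document is scored almost entirely on its intrinsic value, while an older one is scored mostly on its cited value.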
7. The scientific and technological resource matching method based on the Pagerank algorithm as claimed in claim 6, wherein: in step 4, the value score KLV of each document is calculated and the relevance ranking rule of Elasticsearch is modified so that the score from the original Elasticsearch relevance calculation accounts for 60% of the total and the score from the value of the scientific and technical paper accounts for 40%; the results are ranked and displayed according to the final total score; the specific implementation steps are as follows:
step 3.1: determining the intrinsic value of the scientific literature;
step 3.2: determining the cited value of the scientific literature;
step 3.3: determining the time damping coefficient;
step 3.4: establishing the document value ranking formula;
step 3.5: fusing the document value ranking formula with the original Elasticsearch similarity calculation, thereby obtaining a new document relevance calculation.
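The 60/40 fusion described in claim 7 can be sketched in Python as follows. The claims do not say how the Elasticsearch relevance score and the KLV score are brought onto a common scale, so the min-max normalization used here is an assumption, as are the function names `min_max` and `rerank`, the tuple layout of `hits`, and the sample scores.

```python
def min_max(values):
    """Scale values to [0, 1]; a constant list maps to all ones."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [1.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def rerank(hits, w_es=0.6, w_klv=0.4):
    """hits: list of (doc_id, es_score, klv_score) tuples.

    Returns (doc_id, total_score) pairs sorted by total score, highest first.
    Normalizing both scores before weighting is an assumption of this sketch.
    """
    if not hits:
        return []
    es_norm = min_max([h[1] for h in hits])
    klv_norm = min_max([h[2] for h in hits])
    scored = [
        (h[0], w_es * e + w_klv * k)
        for h, e, k in zip(hits, es_norm, klv_norm)
    ]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Made-up hits: doc-a matches the keywords best, doc-b has the highest KLV.
hits = [("doc-a", 12.0, 0.10), ("doc-b", 9.0, 0.90), ("doc-c", 6.0, 0.50)]
ranked = rerank(hits)
```

With these sample scores, doc-b overtakes doc-a in the final ranking because its document value outweighs its lower keyword relevance.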
CN202110564589.8A 2021-05-24 2021-05-24 Scientific and technological resource matching method based on Pagerank algorithm Pending CN113343062A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110564589.8A CN113343062A (en) 2021-05-24 2021-05-24 Scientific and technological resource matching method based on Pagerank algorithm

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110564589.8A CN113343062A (en) 2021-05-24 2021-05-24 Scientific and technological resource matching method based on Pagerank algorithm

Publications (1)

Publication Number Publication Date
CN113343062A true CN113343062A (en) 2021-09-03

Family

ID=77471094

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110564589.8A Pending CN113343062A (en) 2021-05-24 2021-05-24 Scientific and technological resource matching method based on Pagerank algorithm

Country Status (1)

Country Link
CN (1) CN113343062A (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106202313A (en) * 2016-07-01 2016-12-07 西安电子科技大学 Retrieval result synthesis sort method towards academic Meta Search Engine
CN107092639A (en) * 2017-02-23 2017-08-25 武汉智寻天下科技有限公司 A kind of search engine system
CN108446367A (en) * 2018-03-15 2018-08-24 湖南工业大学 A kind of the packaging industry data search method and equipment of knowledge based collection of illustrative plates

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
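金山城: "Research and Implementation of a Distributed Search Engine Based on Elasticsearch" (基于Elasticsearch的分布式搜索引擎的研究与实现), China Master's Theses Full-text Database, Information Science and Technology *
陈志涛: "Research on a Personalized Citation Search and Recommendation Algorithm Based on Deep Learning" (基于深度学习的个性化引文搜索推荐算法研究), China Master's Theses Full-text Database, Information Science and Technology *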

Similar Documents

Publication Publication Date Title
EP2438539B1 (en) Co-selected image classification
US7949643B2 (en) Method and apparatus for rating user generated content in search results
US8521734B2 (en) Search engine with augmented relevance ranking by community participation
KR100462292B1 (en) A method for providing search results list based on importance information and a system thereof
AU2020101885A4 (en) A Novel Tensor Factorization Using Trust and Rating for Recommendation, system and method thereof
US11170005B2 (en) Online ranking of queries for sponsored search
US8977625B2 (en) Inference indexing
CN111475725B (en) Method, apparatus, device and computer readable storage medium for searching content
US20130226950A1 (en) Generalized edit distance for queries
US20100274778A1 (en) Ranking documents
WO2009023371A2 (en) Categorization of queries
AU2014299245B1 (en) Improvements in website traffic optimization
CN111522905A (en) Document searching method and device based on database
US11995090B2 (en) Techniques for determining relevant electronic content in response to queries
Ruiz et al. Facilitating document annotation using content and querying value
US8380722B2 (en) Using anchor text with hyperlink structures for web searches
CN111753167A (en) Search processing method, search processing device, computer equipment and medium
Yang et al. Personalized news recommendation based on the text and image integration
Baker et al. A novel web ranking algorithm based on pages multi-attribute
CN110309189B (en) Method and device for acquiring heat of entity words
CN113343062A (en) Scientific and technological resource matching method based on Pagerank algorithm
CN115630144A (en) Document searching method and device and related equipment
Zuluaga Cajiao et al. Graph-based similarity for document retrieval in the biomedical domain
JP2010282403A (en) Document retrieval method
Lobo et al. A novel method for analyzing best pages generated by query term synonym combination

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20210903