CN110334269A - Information retrieval method and system

Publication number
CN110334269A
Authority
CN
China
Prior art keywords
web document
correlation
timing
web
document
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910622980.1A
Other languages
Chinese (zh)
Other versions
CN110334269B (en)
Inventor
董文轩
程洁丹
晏裕生
姚晗
孙孟阳
江洋
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
INTRODUCTION OF TECHNOLOGY RESEARCH & ECONOMY DEVELOPMENT INSTITUTE
Original Assignee
INTRODUCTION OF TECHNOLOGY RESEARCH & ECONOMY DEVELOPMENT INSTITUTE
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by INTRODUCTION OF TECHNOLOGY RESEARCH & ECONOMY DEVELOPMENT INSTITUTE
Priority to CN201910622980.1A
Publication of CN110334269A
Application granted
Publication of CN110334269B
Legal status: Active

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90: Details of database functions independent of the retrieved data types
    • G06F16/95: Retrieval from the web
    • G06F16/953: Querying, e.g. by the use of web search engines
    • G06F16/9538: Presentation of query results

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present invention discloses an information retrieval method and system. The method and system first calculate the relevance between a query keyword set and each web document in the web document set of a data source to be searched in the national defense science and technology information field. Web documents whose relevance is greater than or equal to a similarity threshold are then output, and web documents whose relevance is below the similarity threshold are output in descending order of timeliness. Outputting the more relevant web documents as search results ensures the coverage of the search results, while outputting the less relevant web documents to the user in descending order of timeliness satisfies the high-timeliness requirement of information retrieval. Information retrieval in the national defense science and technology information field carried out with this method and system can therefore meet the requirements of high timeliness and high coverage at the same time.

Description

Information retrieval method and system
Technical field
The present invention relates to the field of information retrieval, and in particular to an information retrieval method and system.
Background
Information retrieval (Information Retrieval) is the process of using a retrieval method to find the information a user needs in a large collection of information. Its core problem is result ranking, that is, how to place the information the user most wants at the front of the returned list. Retrieval of current-affairs information, as one part of information retrieval, uses a retrieval method to provide the user with required items such as news, developments, policies, and viewpoints; its main characteristics are high timeliness and personalization. Retrieval in the national defense science and technology information field is a special kind of such retrieval that demands both high timeliness and high coverage. Existing retrieval methods, however, cannot meet both requirements at the same time.
Summary of the invention
The object of the present invention is to provide an information retrieval method and system that can meet the high-timeliness and high-coverage requirements of information retrieval in the national defense science and technology information field at the same time.
To achieve the above object, the present invention provides the following solutions:
An information retrieval method, the method comprising:
Obtaining a query keyword set and the web document set of a data source to be searched in the national defense science and technology information field, the web document set comprising multiple web documents;
Calculating the relevance between the query keyword set and each web document;
Outputting the web documents whose relevance is greater than or equal to a similarity threshold, and outputting the web documents whose relevance is below the similarity threshold in descending order of timeliness.
Optionally, calculating the relevance between the query keyword set and each web document specifically comprises:
Calculating the relevance between the query keyword set and each web document using the BM25 model.
Optionally, outputting the web documents whose relevance is greater than or equal to the similarity threshold specifically comprises:
Outputting the web documents whose relevance is greater than or equal to the similarity threshold in descending order of relevance.
Optionally, outputting the web documents whose relevance is below the similarity threshold in descending order of timeliness specifically comprises:
Obtaining the timeliness parameters of each web document whose relevance is below the similarity threshold, the timeliness parameters comprising at least one of: publication time, update time, total click count, total download count, total page dwell time, and web page content update acceleration;
Calculating the timeliness of each web document from the timeliness parameters;
Outputting the web documents whose relevance is below the similarity threshold in descending order of timeliness.
Optionally, the timeliness parameters comprise publication time, update time, total click count, total download count, total page dwell time, and web page content update acceleration, and calculating the timeliness of each web document from the timeliness parameters specifically comprises:
Calculating the timeliness of the i-th web document according to the formula [formula not reproduced], where 1 ≤ i ≤ I; I denotes the number of web documents whose relevance is below the similarity threshold; S_i denotes the timeliness of the i-th web document; D_i denotes its total download count; C_i denotes its total click count; P_i denotes its total page dwell time; T2_i denotes its update time; T1_i denotes its publication time; and G_i denotes its web page content update acceleration.
An information retrieval system, the system comprising:
A data acquisition module, configured to obtain a query keyword set and the web document set of a data source to be searched in the national defense science and technology information field, the web document set comprising multiple web documents;
A relevance calculation module, configured to calculate the relevance between the query keyword set and each web document;
A retrieval output module, configured to output the web documents whose relevance is greater than or equal to a similarity threshold, and to output the web documents whose relevance is below the similarity threshold in descending order of timeliness.
Optionally, the relevance calculation module comprises:
A relevance calculation unit, configured to calculate the relevance between the query keyword set and each web document using the BM25 model.
Optionally, the retrieval output module comprises:
A high-similarity document output unit, configured to output the web documents whose relevance is greater than or equal to the similarity threshold in descending order of relevance.
Optionally, the retrieval output module comprises:
A timeliness parameter acquisition unit, configured to obtain the timeliness parameters of each web document whose relevance is below the similarity threshold, the timeliness parameters comprising at least one of: publication time, update time, total click count, total download count, total page dwell time, and web page content update acceleration;
A timeliness calculation unit, configured to calculate the timeliness of each web document from the timeliness parameters;
A timeliness document output unit, configured to output the web documents whose relevance is below the similarity threshold in descending order of timeliness.
Optionally, the timeliness parameters comprise publication time, update time, total click count, total download count, total page dwell time, and web page content update acceleration, and the timeliness calculation unit comprises:
A timeliness calculation subunit, configured to calculate the timeliness of the i-th web document according to the formula [formula not reproduced], where 1 ≤ i ≤ I; I denotes the number of web documents whose relevance is below the similarity threshold; S_i denotes the timeliness of the i-th web document; D_i denotes its total download count; C_i denotes its total click count; P_i denotes its total page dwell time; T2_i denotes its update time; T1_i denotes its publication time; and G_i denotes its web page content update acceleration.
According to the specific embodiments provided, the present invention discloses the following technical effects:
The information retrieval method and system provided by the present invention first calculate the relevance between the query keyword set and each web document in the web document set of the data source to be searched in the national defense science and technology information field. Web documents whose relevance is greater than or equal to the similarity threshold are then output, and web documents whose relevance is below the similarity threshold are output in descending order of timeliness. Outputting the more relevant web documents as search results ensures the coverage of the search results, while outputting the less relevant web documents to the user in descending order of timeliness satisfies the high-timeliness requirement of information retrieval. Information retrieval in the national defense science and technology information field carried out with this method and system can therefore meet the requirements of high timeliness and high coverage at the same time.
Brief description of the drawings
To explain the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings needed in the embodiments are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present invention, and those of ordinary skill in the art can obtain other drawings from them without creative effort.
Fig. 1 is a flow chart of an information retrieval method provided by an embodiment of the present invention;
Fig. 2 is a structural block diagram of an information retrieval system provided by an embodiment of the present invention.
Detailed description of the embodiments
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings in the embodiments. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by those of ordinary skill in the art on the basis of the embodiments of the present invention without creative effort fall within the protection scope of the present invention.
The object of the present invention is to provide an information retrieval method and system that can meet the high-timeliness and high-coverage requirements of information retrieval in the national defense science and technology information field at the same time.
To make the above objects, features, and advantages of the present invention clearer and easier to understand, the present invention is described in further detail below with reference to the drawings and specific embodiments.
Fig. 1 is a flow chart of an information retrieval method provided by an embodiment of the present invention. As shown in Fig. 1, the method comprises:
Step 101: Obtain a query keyword set and the web document set of a data source to be searched in the national defense science and technology information field, the web document set comprising multiple web documents.
Step 102: Calculate the relevance between the query keyword set and each web document. In this embodiment, the relevance is calculated using the BM25 model.
Step 103: Output the web documents whose relevance is greater than or equal to a similarity threshold, and output the web documents whose relevance is below the similarity threshold in descending order of timeliness.
In practical applications, the web documents whose relevance is greater than or equal to the similarity threshold can be output to the user in descending order of relevance: the web document with the highest relevance is placed first, the one with the next highest relevance second, and so on.
Outputting the web documents whose relevance is below the similarity threshold in descending order of timeliness specifically comprises:
Obtaining the timeliness parameters of each web document whose relevance is below the similarity threshold, the timeliness parameters comprising at least one of: publication time, update time, total click count, total download count, total page dwell time, and web page content update acceleration;
Calculating the timeliness of each web document from the timeliness parameters;
Outputting the web documents whose relevance is below the similarity threshold in descending order of timeliness.
In this embodiment, the timeliness parameters comprise publication time, update time, total click count, total download count, total page dwell time, and web page content update acceleration, and calculating the timeliness of each web document from the timeliness parameters specifically comprises:
Calculating the timeliness of the i-th web document according to the formula [formula not reproduced], where 1 ≤ i ≤ I; I denotes the number of web documents whose relevance is below the similarity threshold; S_i denotes the timeliness of the i-th web document; D_i denotes its total download count; C_i denotes its total click count; P_i denotes its total page dwell time; T2_i denotes its update time; T1_i denotes its publication time; and G_i denotes its web page content update acceleration.
Fig. 2 is a structural block diagram of an information retrieval system provided by an embodiment of the present invention. As shown in Fig. 2, the system comprises:
A data acquisition module 201, configured to obtain a query keyword set and the web document set of a data source to be searched in the national defense science and technology information field, the web document set comprising multiple web documents.
A relevance calculation module 202, configured to calculate the relevance between the query keyword set and each web document.
A retrieval output module 203, configured to output the web documents whose relevance is greater than or equal to a similarity threshold, and to output the web documents whose relevance is below the similarity threshold in descending order of timeliness.
The relevance calculation module 202 comprises:
A relevance calculation unit, configured to calculate the relevance between the query keyword set and each web document using the BM25 model.
The retrieval output module 203 comprises:
A high-similarity document output unit, configured to output the web documents whose relevance is greater than or equal to the similarity threshold in descending order of relevance.
The retrieval output module 203 further comprises:
A timeliness parameter acquisition unit, configured to obtain the timeliness parameters of each web document whose relevance is below the similarity threshold, the timeliness parameters comprising at least one of: publication time, update time, total click count, total download count, total page dwell time, and web page content update acceleration;
A timeliness calculation unit, configured to calculate the timeliness of each web document from the timeliness parameters;
A timeliness document output unit, configured to output the web documents whose relevance is below the similarity threshold in descending order of timeliness.
In this embodiment, the timeliness parameters comprise publication time, update time, total click count, total download count, total page dwell time, and web page content update acceleration, and the timeliness calculation unit comprises:
A timeliness calculation subunit, configured to calculate the timeliness of the i-th web document according to the formula [formula not reproduced], where 1 ≤ i ≤ I; I denotes the number of web documents whose relevance is below the similarity threshold; S_i denotes the timeliness of the i-th web document; D_i denotes its total download count; C_i denotes its total click count; P_i denotes its total page dwell time; T2_i denotes its update time; T1_i denotes its publication time; and G_i denotes its web page content update acceleration.
The specific implementation process of the present invention is as follows:
S1: Obtain the web document set D of the data source to be searched in the national defense science and technology information field, D = {d_1, d_2, ……, d_n}, where d_i denotes the i-th web document in D.
S2: Obtain the query text entered by the user and segment it to obtain the query keyword set Q = {q_1, q_2, ……, q_u}, where q_i denotes the i-th keyword in the query keyword set, 1 ≤ i ≤ u, and u denotes the number of query keywords. Each web document d_i is expressed as a triple <Q, f_i, r_i>, where Q is the user's query keyword set; f_i is the feature set of web document d_i; and r_i is the relevance judgment value between the document and the query keyword set Q, taking a value in {0, 1}, with 0 meaning irrelevant and 1 meaning relevant. Specifically, when determining the query keyword set, the unsupervised feature selection method of the RSR algorithm (Regularized Self-Representation) is applied to each web document d_i to find the optimal segmentation for each document, with the following steps:
(1) The feature set of web document d_i is f_i = {f_i1, f_i2, ……, f_im}. Each specific feature f_ij can be linearly represented by the other features or by itself as [formula not reproduced], where 1 ≤ i ≤ n, 1 ≤ j ≤ k ≤ m, w_jk denotes the relationship coefficient between f_ij and f_ik, e_ij denotes the weighting term, and f_ij denotes the j-th feature of the i-th document.
(2) For the feature set f_i of the document, an extremum algorithm is used to solve for the optimum [formula not reproduced], where W denotes the coefficient matrix of web document d_i, W = [w_ij] ∈ R^(m×m). The l2,1 norm is applied to E so that the algorithm is robust to outliers, and the regularization term ||W||_2,1 is added to avoid trivial solutions; λ is a nonzero regularization weighting parameter.
Let [formula not reproduced], where w_i is the i-th row of [symbol not reproduced]. According to the formula [formula not reproduced], the coefficient corresponding to each feature can be obtained, where v = {v_1, v_2, ……, v_m}; that is, the coefficient corresponding to the j-th feature f_ij of web document d_i is v_j.
(3) Count the word frequency x_i with which the query keyword q_i appears in the document features, and obtain the keyword set coefficient t_i of the document according to the formula [formula not reproduced]. Sort by t_i in descending order and select the segmentation with the largest t_i as the optimal segmentation, thereby obtaining the query keyword set Q = {q_1, q_2, ……, q_u}.
S3: For each web document d_i, divide its content into 7 content domains: web address (URL), title, body content, document tags (meta keywords), tag description (meta description), anchor text (the link text in the web page), and lookup time log. Each web document is represented and indexed in the search engine by these domains.
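The seven content domains of step S3 can be sketched as a small record type. This is only an illustration: the class name `WebDocument`, the field names, and the helper `content_domains` are hypothetical and do not come from the patent, which does not specify a data layout.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class WebDocument:
    """One web document split into the seven content domains named in step S3.

    All field names are illustrative assumptions.
    """
    url: str                                                  # web address (URL)
    title: str                                                # page title
    body: str                                                 # body content
    meta_keywords: List[str] = field(default_factory=list)    # document tags
    meta_description: str = ""                                # tag description
    anchor_texts: List[str] = field(default_factory=list)     # link text in the page
    lookup_log: List[str] = field(default_factory=list)       # lookup time log entries


def content_domains(doc: WebDocument) -> Dict[str, str]:
    """Map each domain name to its text so every domain can be indexed separately."""
    return {
        "url": doc.url,
        "title": doc.title,
        "body": doc.body,
        "meta_keywords": " ".join(doc.meta_keywords),
        "meta_description": doc.meta_description,
        "anchor_texts": " ".join(doc.anchor_texts),
        "lookup_log": " ".join(doc.lookup_log),
    }
```

Keeping the domains as separate fields lets a later scoring step treat, for example, a title match differently from a body match.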
S4: Use the BM25 model to calculate the relevance between the query keyword set and each web document in the document set D, and finally obtain the relevance ranking of the n web documents in D through sorting and screening.
The specific calculation is as follows:
(1) First calculate the relevance degree R(q_i, d_i) between each keyword q_i in the query keyword set Q and each content domain of each web document d_i, then accumulate according to the formula [formula not reproduced] to obtain the final relevance S(Q, d_i) between the query keyword set Q and web document d_i, where P_i denotes the weight of the keyword. The relevance degree R(q_i, d_i) is calculated as follows:
$$R(q_i, d_i) = \frac{fq_i \times (k_1 + 1)}{fq_i + K} \times \frac{qf_i \times (k_2 + 1)}{qf_i + k_2}, \qquad K = k_1 \times \left(1 - b + b \times \frac{dl_i}{avgdl}\right)$$ where qf_i is the frequency of keyword q_i in the query statement Q; fq_i is the frequency of keyword q_i in web document d_i; k_1, k_2, and b are adjustment factors, which can usually be set to k_1 = 1 and k_2 = 2; dl_i is the length of web document d_i; and avgdl is the average length of all web documents, that is, of the document set D.
(2) Sort all web documents in the document set D by relevance value S(Q, d_i) in descending order to obtain a document set arranged in descending order of relevance.
(3) Obtain the relevance threshold T and use it to divide the relevance-sorted document set into two parts: the front part is the set of documents whose relevance is greater than or equal to T, and the back part is the set of documents whose relevance is less than T.
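Sub-steps (1) to (3) of S4 can be sketched as follows. Several points are assumptions rather than the patent's method: the length normalization uses the standard BM25 ratio dl/avgdl; b = 0.75 is a common default that the text does not give; the keyword weight P_i defaults to 1.0 because the accumulation formula is not reproduced; and each document is scored as one token list instead of per content domain. All function names are hypothetical.

```python
from typing import Dict, List, Optional, Sequence, Tuple


def term_relevance(fq: float, qf: float, dl: float, avgdl: float,
                   k1: float = 1.0, k2: float = 2.0, b: float = 0.75) -> float:
    """R(q_i, d_i) for one keyword; k1 and k2 follow the text, b is an assumed default."""
    K = k1 * (1 - b + b * dl / avgdl)  # standard BM25 length normalization (assumption)
    return (fq * (k1 + 1) / (fq + K)) * (qf * (k2 + 1) / (qf + k2))


def score_document(query: Sequence[str], tokens: Sequence[str], avgdl: float,
                   weights: Optional[Dict[str, float]] = None) -> float:
    """S(Q, d_i) as a weighted sum of per-keyword relevance (weight defaults to 1.0)."""
    weights = weights or {}
    total = 0.0
    for q in dict.fromkeys(query):      # distinct keywords, in input order
        fq = tokens.count(q)            # frequency of q in the document
        qf = query.count(q)             # frequency of q in the query
        total += weights.get(q, 1.0) * term_relevance(fq, qf, len(tokens), avgdl)
    return total


def rank_and_split(query: Sequence[str], docs: Dict[str, Sequence[str]],
                   threshold: float) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Sort documents by relevance (descending) and split them at the threshold T."""
    avgdl = sum(len(t) for t in docs.values()) / len(docs)
    scored = [(name, score_document(query, t, avgdl)) for name, t in docs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    high = [p for p in scored if p[1] >= threshold]   # front part: relevance >= T
    low = [p for p in scored if p[1] < threshold]     # back part: relevance < T
    return high, low
```

For example, calling `rank_and_split(["radar"], docs, threshold=1.0)` returns the front part already in descending relevance order, while the back part is left for the timeliness ordering of steps S5 to S7.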
S5: For each document in the set of documents whose relevance is less than the relevance threshold T, obtain the publication time T1, the update time T2, the total click count C (one single mouse click by a user anywhere on the web page counts as 1 click; the default value is 0), the total download count D (one download operation triggered by a user on the web page content counts as 1 download; the default value is 0), the total page dwell time P, and the web page content update acceleration G. The value of G varies with the speed at which the interval between web page content updates changes.
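The parameters collected in step S5 can be held in one record per document, with the stated default of 0 for clicks and downloads. The class and method names below are hypothetical, and measuring dwell time in seconds and defaulting G to 0.0 are assumptions, since the text gives neither units nor a formula for G.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class TimelinessParams:
    """Timeliness parameters of one web document (names are illustrative)."""
    published_at: datetime            # T1, publication time
    updated_at: datetime              # T2, update time
    clicks: int = 0                   # C, default 0 as stated in S5
    downloads: int = 0                # D, default 0 as stated in S5
    dwell_seconds: float = 0.0        # P, unit of seconds is an assumption
    update_acceleration: float = 0.0  # G, default value is an assumption

    def record_click(self) -> None:
        """One single click anywhere on the page counts as 1 click."""
        self.clicks += 1

    def record_download(self) -> None:
        """One triggered download operation counts as 1 download."""
        self.downloads += 1

    def record_visit(self, seconds: float) -> None:
        """Add one visit's dwell time to the running total."""
        self.dwell_seconds += seconds
```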
S6: Calculate the timeliness of each web document according to the formula [formula not reproduced].
S7: Output the web documents whose relevance is less than the similarity threshold T to the user one by one in descending order of timeliness.
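The final ordering can be sketched as below, assuming relevance and timeliness scores have already been computed. Because the timeliness formula is not reproduced in this text, the sketch takes timeliness scores as input and does not compute them. Placing the high-relevance group ahead of the low-relevance group is also an assumption, chosen to match the front and back parts of step S4; the function name `assemble_results` is hypothetical.

```python
from typing import Dict, List


def assemble_results(relevance: Dict[str, float],
                     timeliness: Dict[str, float],
                     threshold: float) -> List[str]:
    """Order documents: high-relevance by relevance, low-relevance by timeliness.

    `timeliness` only needs entries for documents whose relevance is below
    the threshold; how those scores are computed is outside this sketch.
    """
    high = [d for d, s in relevance.items() if s >= threshold]
    low = [d for d, s in relevance.items() if s < threshold]
    high.sort(key=lambda d: relevance[d], reverse=True)   # descending relevance
    low.sort(key=lambda d: timeliness[d], reverse=True)   # descending timeliness
    return high + low
```

With relevance `{"a": 1.3, "b": 0.9, "c": 0.2, "d": 2.0}`, timeliness `{"b": 0.1, "c": 0.7}`, and a threshold of 1.0, document "c" is listed before "b" even though "b" is more relevant, because both fall below the threshold and "c" is more timely.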
The retrieval method and system provided by the present invention combine the relevance of the search topic with the timeliness of information publication and rank the search result entries according to the user's actual degree of need. This improves the information search situation for intelligence personnel, genuinely places the results the user cares about at the front, and meets the high-relevance and high-timeliness requirements of search results in the national defense science and technology information field.
The embodiments in this specification are described in a progressive manner. Each embodiment focuses on its differences from the others, and for the same or similar parts the embodiments may be referred to one another.
Specific examples are used herein to explain the principle and implementation of the present invention. The description of the above embodiments is intended only to help in understanding the method of the invention and its core idea. For those of ordinary skill in the art, the specific implementation and the scope of application may change in accordance with the idea of the invention. In summary, the content of this specification should not be construed as limiting the present invention.

Claims (10)

1. An information retrieval method, characterized in that the method comprises:
Obtaining a query keyword set and the web document set of a data source to be searched in the national defense science and technology information field, the web document set comprising multiple web documents;
Calculating the relevance between the query keyword set and each web document;
Outputting the web documents whose relevance is greater than or equal to a similarity threshold, and outputting the web documents whose relevance is below the similarity threshold in descending order of timeliness.
2. The method according to claim 1, characterized in that calculating the relevance between the query keyword set and each web document specifically comprises:
Calculating the relevance between the query keyword set and each web document using the BM25 model.
3. The method according to claim 1, characterized in that outputting the web documents whose relevance is greater than or equal to the similarity threshold specifically comprises:
Outputting the web documents whose relevance is greater than or equal to the similarity threshold in descending order of relevance.
4. The method according to claim 1, characterized in that outputting the web documents whose relevance is below the similarity threshold in descending order of timeliness specifically comprises:
Obtaining the timeliness parameters of each web document whose relevance is below the similarity threshold, the timeliness parameters comprising at least one of: publication time, update time, total click count, total download count, total page dwell time, and web page content update acceleration;
Calculating the timeliness of each web document from the timeliness parameters;
Outputting the web documents whose relevance is below the similarity threshold in descending order of timeliness.
5. The method according to claim 4, characterized in that the timeliness parameters comprise publication time, update time, total click count, total download count, total page dwell time, and web page content update acceleration, and calculating the timeliness of each web document from the timeliness parameters specifically comprises:
Calculating the timeliness of the i-th web document according to the formula [formula not reproduced], where 1 ≤ i ≤ I; I denotes the number of web documents whose relevance is below the similarity threshold; S_i denotes the timeliness of the i-th web document; D_i denotes its total download count; C_i denotes its total click count; P_i denotes its total page dwell time; T2_i denotes its update time; T1_i denotes its publication time; and G_i denotes its web page content update acceleration.
6. An information retrieval system, characterized in that the system comprises:
A data acquisition module, configured to obtain a query keyword set and the web document set of a data source to be searched in the national defense science and technology information field, the web document set comprising multiple web documents;
A relevance calculation module, configured to calculate the relevance between the query keyword set and each web document;
A retrieval output module, configured to output the web documents whose relevance is greater than or equal to a similarity threshold, and to output the web documents whose relevance is below the similarity threshold in descending order of timeliness.
7. The system according to claim 6, characterized in that the relevance calculation module comprises:
A relevance calculation unit, configured to calculate the relevance between the query keyword set and each web document using the BM25 model.
8. The system according to claim 6, characterized in that the retrieval output module comprises:
A high-similarity document output unit, configured to output the web documents whose relevance is greater than or equal to the similarity threshold in descending order of relevance.
9. The system according to claim 6, characterized in that the retrieval output module comprises:
A timeliness parameter acquisition unit, configured to obtain the timeliness parameters of each web document whose relevance is below the similarity threshold, the timeliness parameters comprising at least one of: publication time, update time, total click count, total download count, total page dwell time, and web page content update acceleration;
A timeliness calculation unit, configured to calculate the timeliness of each web document from the timeliness parameters;
A timeliness document output unit, configured to output the web documents whose relevance is below the similarity threshold in descending order of timeliness.
10. The system according to claim 9, characterized in that the timeliness parameters comprise publication time, update time, total click count, total download count, total page dwell time, and web page content update acceleration, and the timeliness calculation unit comprises:
A timeliness calculation subunit, configured to calculate the timeliness of the i-th web document according to the formula [formula not reproduced], where 1 ≤ i ≤ I; I denotes the number of web documents whose relevance is below the similarity threshold; S_i denotes the timeliness of the i-th web document; D_i denotes its total download count; C_i denotes its total click count; P_i denotes its total page dwell time; T2_i denotes its update time; T1_i denotes its publication time; and G_i denotes its web page content update acceleration.
CN201910622980.1A 2019-07-11 2019-07-11 Information retrieval method and system Active CN110334269B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910622980.1A CN110334269B (en) 2019-07-11 2019-07-11 Information retrieval method and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910622980.1A CN110334269B (en) 2019-07-11 2019-07-11 Information retrieval method and system

Publications (2)

Publication Number Publication Date
CN110334269A CN110334269A (en) 2019-10-15
CN110334269B CN110334269B (en) 2021-05-07

Family

ID=68146347

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910622980.1A Active CN110334269B (en) 2019-07-11 2019-07-11 Information retrieval method and system

Country Status (1)

Country Link
CN (1) CN110334269B (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5893092A (en) * 1994-12-06 1999-04-06 University Of Central Florida Relevancy ranking using statistical ranking, semantics, relevancy feedback and small pieces of text
CN1306258A (en) * 2001-03-09 2001-08-01 北京大学 Method for judging position correlation of a group of query keys or words on network page
CN101477556A (en) * 2009-01-22 2009-07-08 苏州智讯科技有限公司 Method for discovering hot spots in internet mass information
CN101625680A (en) * 2008-07-09 2010-01-13 东北大学 Document retrieval method in patent field
CN102982153A (en) * 2012-11-29 2013-03-20 北京亿赞普网络技术有限公司 Information retrieval method and device
CN104991962A (en) * 2015-07-22 2015-10-21 无锡天脉聚源传媒科技有限公司 Method and apparatus for generating recommendation information
CN107977405A (en) * 2017-11-16 2018-05-01 北京三快在线科技有限公司 Data reordering method, data sorting device, electronic equipment and readable storage medium

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Feng Xiaohua (冯晓华) et al.: "A Survey of Research on Search Result Diversification" (检索结果多样化研究综述), 《情报学报》 *

Also Published As

Publication number Publication date
CN110334269B (en) 2021-05-07

Similar Documents

Publication Publication Date Title
US20110106796A1 (en) System and method for recommendation of interesting web pages based on user browsing actions
EP2145264B1 (en) Calculating importance of documents factoring historical importance
US8244737B2 (en) Ranking documents based on a series of document graphs
Niranjan et al. Developing a web recommendation system based on closed sequential patterns
Raman et al. Online learning to diversify from implicit feedback
Shie et al. Online mining of temporal maximal utility itemsets from data streams
US7720870B2 (en) Method and system for quantifying the quality of search results based on cohesion
US8468153B2 (en) Information service for facts extracted from differing sources on a wide area network
Yagci et al. Scalable and adaptive collaborative filtering by mining frequent item co-occurrences in a user feedback stream
Prajapati A survey paper on hyperlink-induced topic search (HITS) algorithms for web mining
CN110209909A (en) Data crawling method, device, computer equipment and storage medium
CN105302898B (en) A kind of search ordering method and device based on click model
Wang et al. Optimal Control of Forward‐Backward Stochastic Jump‐Diffusion Differential Systems with Observation Noises: Stochastic Maximum Principle
Kaur et al. SIMHAR-smart distributed web crawler for the hidden web using SIM+ hash and redis server
Barla et al. Rule-based user characteristics acquisition from logs with semantics for personalized web-based systems
Chauhan et al. Web page ranking using machine learning approach
Srivastava et al. Discussion on damping factor value in PageRank computation
CN110334269A (en) A kind of information retrieval method and system
Yang et al. On characterizing and computing the diversity of hyperlinks for anti-spamming page ranking
CN103902687B (en) The generation method and device of a kind of Search Results
Lambhate et al. Hybrid algorithm on semantic web crawler for search engine to improve memory space and time
Xu et al. [Retracted] Generating Personalized Web Search Using Semantic Context
Lai et al. Personalized Web search results with profile comparisons
Yue et al. Adaptive Optimizers with Sparse Group Lasso for Neural Networks in CTR Prediction
Godoy et al. A user profiling architecture for textual-based agents

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant