CN110334269B - Information retrieval method and system - Google Patents
- Publication number
- CN110334269B
- Authority
- CN
- China
- Prior art keywords
- webpage
- document
- relevance
- time sequence
- ith
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/953—Querying, e.g. by the use of web search engines
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/953—Querying, e.g. by the use of web search engines
- G06F16/9538—Presentation of query results
Landscapes
- Engineering & Computer Science (AREA)
- Databases & Information Systems (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention discloses an information retrieval method and system. The method and system first calculate the relevance between a keyword set to be searched and each webpage document in a webpage document set of a data source to be searched in the field of national defense science and technology intelligence; they then output the webpage documents whose relevance is greater than or equal to a similarity threshold, and output the webpage documents whose relevance is less than the similarity threshold in descending order of time sequence. Outputting the more relevant webpage documents as retrieval results ensures the coverage rate of the results, while outputting the less relevant webpage documents to the user in descending order of time sequence meets the requirement of high timeliness. The method and system can therefore satisfy both high timeliness and high coverage rate when used for retrieval in the field of national defense science and technology intelligence.
Description
Technical Field
The present invention relates to the field of information retrieval, and in particular, to an information retrieval method and system.
Background
Information retrieval refers to the process of finding the information a user needs from a large collection of information, using some retrieval method, according to the user's needs. The core problem of information retrieval is result ranking, that is, how to place the information the user needs most at the front of the returned list. Intelligence retrieval, as a part of information retrieval, uses a retrieval method to provide the news, developments, policies, viewpoints and similar material a user requires, and its main characteristics include high timeliness and personalization. Intelligence retrieval in the field of national defense science and technology is a special kind of intelligence retrieval that demands both high timeliness and high coverage rate, but existing retrieval methods cannot meet both requirements at the same time.
Disclosure of Invention
The invention aims to provide an information retrieval method and an information retrieval system, which can simultaneously meet the requirements of high timeliness and high coverage rate of information retrieval in the field of national defense science and technology information.
In order to achieve the purpose, the invention provides the following scheme:
an information retrieval method, the method comprising:
acquiring a keyword set to be searched and a webpage document set of a data source to be searched in the field of national defense science and technology intelligence, wherein the webpage document set comprises a plurality of webpage documents;
calculating the correlation between the keyword set to be searched and each webpage document;
and outputting the webpage documents with the relevance greater than or equal to the similarity threshold, and outputting the webpage documents with the relevance smaller than the similarity threshold in descending order of time sequence.
Optionally, the calculating the relevance between the keyword set to be searched and each of the web documents specifically includes:
and calculating the relevance of the keyword set to be searched and each webpage document by adopting a BM25 model.
Optionally, the outputting the webpage document with the relevance greater than or equal to the similarity threshold specifically includes:
and outputting the webpage documents with the relevance larger than or equal to the similarity threshold value in the order of high relevance to low relevance.
Optionally, the outputting the web documents with the relevance smaller than the similarity threshold from high to low according to the time sequence specifically includes:
acquiring time sequence parameters of each webpage document with the correlation smaller than the similarity threshold, wherein the time sequence parameters comprise: at least one of the release time, the update time, the total number of clicks, the total number of downloads, the total length of the dwell time of the page and the acceleration of updating the webpage content;
calculating the time sequence of each webpage document according to the time sequence parameters;
and outputting the webpage documents with the relevance smaller than the similarity threshold in descending order of time sequence.
Optionally, the time sequence parameters include the release time, the update time, the total number of clicks, the total number of downloads, the total page dwell time and the webpage content update acceleration, and calculating the time sequence of each webpage document according to the time sequence parameters specifically includes:
calculating the time sequence of the $i$-th webpage document according to the formula (not reproduced in this text), where $1 \le i \le I$ and $I$ is the number of webpage documents whose relevance is less than the similarity threshold; $S_i$ is the time sequence of the $i$-th webpage document; $D_i$ is its total number of downloads; $C_i$ is its total number of clicks; $P_i$ is its total page dwell time; $T2_i$ is its update time; $T1_i$ is its release time; and $G_i$ is its webpage content update acceleration.
An information retrieval system, the system comprising:
a data acquisition module, configured to acquire a keyword set to be searched and a webpage document set of a data source to be searched in the field of national defense science and technology intelligence, wherein the webpage document set comprises a plurality of webpage documents;
the correlation calculation module is used for calculating the correlation between the keyword set to be searched and each webpage document;
and the retrieval output module is used for outputting the webpage documents with the relevance greater than or equal to the similarity threshold value and outputting the webpage documents with the relevance less than the similarity threshold value in sequence from high to low according to the time sequence.
Optionally, the correlation calculation module includes:
and the correlation calculation unit is used for calculating the correlation between the keyword set to be searched and each webpage document by adopting a BM25 model.
Optionally, the retrieval output module includes:
and the high-similarity document output unit is used for outputting the webpage documents of which the relevance is greater than or equal to the similarity threshold value in the order of high relevance to low relevance.
Optionally, the retrieval output module includes:
a time sequence parameter obtaining unit, configured to obtain a time sequence parameter of each web document whose correlation is smaller than the similarity threshold, where the time sequence parameter includes: at least one of the release time, the update time, the total number of clicks, the total number of downloads, the total length of the dwell time of the page and the acceleration of updating the webpage content;
the time sequence calculating unit is used for calculating the time sequence of each webpage document according to the time sequence parameters;
and the time sequence document output unit is used for outputting the webpage documents with the relevance smaller than the similarity threshold value according to the time sequence from high to low.
Optionally, the time sequence parameters include the release time, the update time, the total number of clicks, the total number of downloads, the total page dwell time and the webpage content update acceleration, and the time sequence calculating unit includes:
a time sequence calculating subunit, configured to calculate the time sequence of the $i$-th webpage document according to the formula (not reproduced in this text), where $1 \le i \le I$ and $I$ is the number of webpage documents whose relevance is less than the similarity threshold; $S_i$ is the time sequence of the $i$-th webpage document; $D_i$ is its total number of downloads; $C_i$ is its total number of clicks; $P_i$ is its total page dwell time; $T2_i$ is its update time; $T1_i$ is its release time; and $G_i$ is its webpage content update acceleration.
According to the specific embodiment provided by the invention, the invention discloses the following technical effects:
The information retrieval method and system provided by the invention first calculate the relevance between a keyword set to be searched and each webpage document in a webpage document set of a data source to be searched in the field of national defense science and technology intelligence; they then output the webpage documents whose relevance is greater than or equal to a similarity threshold, and output the webpage documents whose relevance is less than the similarity threshold in descending order of time sequence. Outputting the more relevant webpage documents as retrieval results ensures the coverage rate of the results, while outputting the less relevant webpage documents to the user in descending order of time sequence meets the requirement of high timeliness. The method and system can therefore satisfy both high timeliness and high coverage rate when used for retrieval in the field of national defense science and technology intelligence.
Drawings
In order to illustrate the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings needed in the embodiments are briefly described below. The drawings described below show only some embodiments of the present invention, and those skilled in the art can obtain other drawings from them without inventive effort.
Fig. 1 is a flowchart of an information retrieval method according to an embodiment of the present invention;
fig. 2 is a block diagram of an information retrieval system according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The invention aims to provide an information retrieval method and an information retrieval system, which can simultaneously meet the requirements of high timeliness and high coverage rate of information retrieval in the field of national defense science and technology information.
In order to make the aforementioned objects, features and advantages of the present invention comprehensible, embodiments accompanied with figures are described in further detail below.
Fig. 1 is a flowchart of an information retrieval method according to an embodiment of the present invention. As shown in fig. 1, the method includes:
Step 101: Acquire a keyword set to be searched and a webpage document set of a data source to be searched in the field of national defense science and technology intelligence, where the webpage document set includes a plurality of webpage documents.
Step 102: Calculate the relevance between the keyword set to be searched and each webpage document. In this embodiment, a BM25 model is used to calculate this relevance.
Step 103: Output the webpage documents whose relevance is greater than or equal to the similarity threshold, and output the webpage documents whose relevance is less than the similarity threshold in descending order of time sequence.
In practical application, the webpage documents whose relevance is greater than or equal to the similarity threshold can be output to the user in descending order of relevance: the webpage document with the highest relevance is placed first, the one with the second-highest relevance second, and so on.
The outputting the webpage documents with the relevance smaller than the similarity threshold value according to the sequence from high to low in time sequence specifically comprises:
acquiring time sequence parameters of each webpage document with the correlation smaller than the similarity threshold, wherein the time sequence parameters comprise: at least one of the release time, the update time, the total number of clicks, the total number of downloads, the total length of the dwell time of the page and the acceleration of updating the webpage content;
calculating the time sequence of each webpage document according to the time sequence parameters;
and outputting the webpage documents with the relevance smaller than the similarity threshold in descending order of time sequence.
In this embodiment, the time sequence parameters include the release time, the update time, the total number of clicks, the total number of downloads, the total page dwell time and the webpage content update acceleration, and calculating the time sequence of each webpage document according to the time sequence parameters specifically includes:
calculating the time sequence of the $i$-th webpage document according to the formula (not reproduced in this text), where $1 \le i \le I$ and $I$ is the number of webpage documents whose relevance is less than the similarity threshold; $S_i$ is the time sequence of the $i$-th webpage document; $D_i$ is its total number of downloads; $C_i$ is its total number of clicks; $P_i$ is its total page dwell time; $T2_i$ is its update time; $T1_i$ is its release time; and $G_i$ is its webpage content update acceleration.
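The two-tier output of step 103 can be sketched in Python as follows. This is a minimal illustration, not the patented implementation: the `ScoredDoc` type and `order_results` function are hypothetical names, the time-sequence value is assumed to be precomputed (its formula is not available in this text), and placing the high-relevance group ahead of the low-relevance group is an assumption.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ScoredDoc:
    doc_id: str
    relevance: float      # relevance to the keyword set, e.g. a BM25 score
    time_sequence: float  # precomputed time-sequence value S_i (formula not shown here)


def order_results(docs: List[ScoredDoc], threshold: float) -> List[ScoredDoc]:
    """Order documents in two tiers around the similarity threshold."""
    high = [d for d in docs if d.relevance >= threshold]
    low = [d for d in docs if d.relevance < threshold]
    # Tier 1: relevance >= threshold, sorted by relevance, highest first.
    high.sort(key=lambda d: d.relevance, reverse=True)
    # Tier 2: relevance < threshold, sorted by time sequence, highest first.
    low.sort(key=lambda d: d.time_sequence, reverse=True)
    # Assumption: the high-relevance tier is shown before the low-relevance tier.
    return high + low
```

Keeping the two sort keys separate means a recent but weakly matching document can still rank at the top of the second tier without displacing strong matches.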
Fig. 2 is a block diagram of an information retrieval system according to an embodiment of the present invention. As shown in fig. 2, the system includes:
The data acquisition module 201 is configured to acquire a keyword set to be searched and a webpage document set of a data source to be searched in the field of national defense science and technology intelligence, where the webpage document set includes a plurality of webpage documents.
The correlation calculation module 202 is configured to calculate the relevance between the keyword set to be searched and each webpage document.
The retrieval output module 203 is configured to output the webpage documents whose relevance is greater than or equal to the similarity threshold, and to output the webpage documents whose relevance is less than the similarity threshold in descending order of time sequence.
The correlation calculation module 202 includes:
and the correlation calculation unit is used for calculating the correlation between the keyword set to be searched and each webpage document by adopting a BM25 model.
The retrieval output module 203 includes:
and the high-similarity document output unit is used for outputting the webpage documents of which the relevance is greater than or equal to the similarity threshold value in the order of high relevance to low relevance.
The retrieval output module 203 further includes:
a time sequence parameter obtaining unit, configured to obtain a time sequence parameter of each web document whose correlation is smaller than the similarity threshold, where the time sequence parameter includes: at least one of the release time, the update time, the total number of clicks, the total number of downloads, the total length of the dwell time of the page and the acceleration of updating the webpage content;
the time sequence calculating unit is used for calculating the time sequence of each webpage document according to the time sequence parameters;
and the time sequence document output unit is used for outputting the webpage documents with the relevance smaller than the similarity threshold value according to the time sequence from high to low.
In this embodiment, the time sequence parameters include the release time, the update time, the total number of clicks, the total number of downloads, the total page dwell time and the webpage content update acceleration, and the time sequence calculating unit includes:
a time sequence calculating subunit, configured to calculate the time sequence of the $i$-th webpage document according to the formula (not reproduced in this text), where $1 \le i \le I$ and $I$ is the number of webpage documents whose relevance is less than the similarity threshold; $S_i$ is the time sequence of the $i$-th webpage document; $D_i$ is its total number of downloads; $C_i$ is its total number of clicks; $P_i$ is its total page dwell time; $T2_i$ is its update time; $T1_i$ is its release time; and $G_i$ is its webpage content update acceleration.
The specific implementation process of the invention is as follows:
S1: Obtain the webpage document set $D = \{d_1, d_2, \ldots, d_n\}$ of the data source to be searched in the field of national defense science and technology intelligence, where $d_i$ denotes the $i$-th webpage document in $D$.
S2: Obtain the query text input by the user and segment it to obtain the keyword set to be searched $Q = \{q_1, q_2, \ldots, q_u\}$, where $q_i$ denotes the $i$-th keyword to be searched, $1 \le i \le u$, and $u$ is the number of keywords to be searched. Each webpage document $d_i$ is represented as $\langle Q, f_i, r_i \rangle$, where $Q$ is the user's keyword set to be searched, $f_i$ is the feature set of webpage document $d_i$, and $r_i \in \{0, 1\}$ is the relevance judgment between the document and $Q$ (0 means irrelevant, 1 means relevant). Specifically, when determining the keyword set to be searched, an optimal segmentation is found for each webpage document $d_i$ using the unsupervised feature selection method of the RSR (Regularized Self-Representation) algorithm, as follows:
(1) The feature set of webpage document $d_i$ is $f_i = \{f_{i1}, f_{i2}, \ldots, f_{im}\}$. Each feature $f_{ij}$ can be expressed linearly by the other features or by itself (formula not reproduced in this text), where $1 \le i \le n$, $1 \le j \le k \le m$, $w_{jk}$ denotes the relation coefficient between $f_{ij}$ and $f_{ik}$, $e_{ij}$ denotes a weighted term, and $f_{ij}$ denotes the $j$-th feature of the $i$-th document.
(2) For the feature set $f_i$ of the document, the optimal solution is found with an extremum algorithm (objective not reproduced in this text), where $W = [w_{ij}] \in \mathbb{R}^{m \times m}$ is the coefficient matrix of webpage document $d_i$. The $\ell_{2,1}$ norm is applied to $E$ to make the algorithm robust to outliers, and an $\ell_{2,1}$ regularization term on $W$ is added to avoid trivial solutions; $\lambda$ is a non-zero regularization weight.
A further quantity is then defined (formula not reproduced in this text), where $w_i$ is the $i$-th row of the resulting matrix. From a second formula (also not reproduced), the coefficient corresponding to each feature is obtained as $v = \{v_1, v_2, \ldots, v_m\}$; that is, the coefficient corresponding to the $j$-th feature $f_{ij}$ of webpage document $d_i$ is $v_j$.
(3) Count the word frequency $x_i$ with which each keyword to be searched $q_i$ appears in the document features, and obtain the keyword set coefficient $t_i$ of the document according to the formula (not reproduced in this text). Sort the segmentations by $t_i$ in descending order and take the segmentation with the largest $t_i$ as the optimal segmentation, thereby obtaining the keyword set to be searched $Q = \{q_1, q_2, \ldots, q_u\}$.
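For orientation, the regularized self-representation objective as it is usually stated in the feature-selection literature is sketched below. This is an assumption about the general form the missing formulas take, not a reproduction of them: the feature matrix $F$ (one row per document, one column per feature), the residual $E = F - FW$, and the row-norm feature weight $v_j$ are notation chosen here for illustration.

```latex
\min_{W}\; \lVert F - FW \rVert_{2,1} + \lambda \lVert W \rVert_{2,1},
\qquad
\lVert A \rVert_{2,1} = \sum_{r} \lVert a_r \rVert_2,
\qquad
v_j = \lVert w_j \rVert_2
```

Here $a_r$ is the $r$-th row of a matrix $A$ and $w_j$ is the $j$-th row of $W$. Under this reading, a feature whose row in $W$ has a large norm contributes strongly to reconstructing the other features and therefore receives a large coefficient.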
S3: The content of each webpage document $d_i$ is divided into 7 content fields: the web address (URL), title, body content, document tags (meta keywords), tag description (meta description), anchor text (i.e., link text in a webpage), and search time log. Each webpage document is represented and indexed in the search engine by these fields.
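The seven-field representation could be modelled as a simple record, sketched below in Python. The class name `WebDocument`, the attribute names, and the field types (in particular treating the search time log as a list of strings) are assumptions made for illustration; the text only names the fields.

```python
from dataclasses import dataclass, field, fields
from typing import List


@dataclass
class WebDocument:
    url: str                                                  # web address (URL)
    title: str                                                # page title
    body: str                                                 # body content
    meta_keywords: List[str] = field(default_factory=list)    # document tags
    meta_description: str = ""                                # tag description
    anchor_texts: List[str] = field(default_factory=list)     # link text in the page
    search_time_log: List[str] = field(default_factory=list)  # type is an assumption


def field_names() -> List[str]:
    """Names of the content fields a document is indexed by."""
    return [f.name for f in fields(WebDocument)]
```

A usage example: `WebDocument(url="http://example.com", title="t", body="b")` creates a document whose optional fields start empty, and `field_names()` lists the seven fields an index would cover.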
S4, calculating the relevance between the keyword set to be searched and each webpage document in the document set D by using a BM25 model, and finally obtaining the relevance ranking result of the n webpage documents in the document set D through ranking and screening.
The specific calculation method is as follows:
(1) First, calculate the relevance $R(q_i, d_i)$ between each keyword $q_i$ in the keyword set $Q$ and each content field of each webpage document $d_i$; then accumulate these values according to the formula (not reproduced in this text) to obtain the final relevance $S(Q, d_i)$ between the keyword set $Q$ and webpage document $d_i$, where $P_i$ denotes the weight of the keyword. The relevance $R(q_i, d_i)$ is calculated as follows:
$R(q_i, d_i) = \frac{fq_i (k_1 + 1)}{fq_i + K} \times \frac{qf_i (k_2 + 1)}{qf_i + k_2}$, where $K = k_1 (1 - b + b \cdot dl_i / avgdl)$; $qf_i$ is the frequency of keyword $q_i$ in the query $Q$; $fq_i$ is the frequency of keyword $q_i$ in webpage document $d_i$; $k_1$, $k_2$ and $b$ are adjustment factors, which can generally be set to $k_1 = 1$, $k_2 = 2$; $dl_i$ is the length of webpage document $d_i$; and $avgdl$ is the average length of all webpage documents in the document set $D$.
(2) Sort all webpage documents in the document set $D$ by their relevance value $S(Q, d_i)$ in descending order to obtain a document set in descending order of relevance.
(3) And acquiring a correlation threshold T, and dividing the document set with descending correlation into two parts by using the correlation threshold T, wherein the first half part is the document set with the correlation larger than or equal to the correlation threshold T, and the second half part is the document set with the correlation smaller than the correlation threshold T.
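Steps (1) to (3) can be sketched in Python as follows. This is a simplified illustration under several assumptions: the accumulation is taken to be a weighted sum of per-keyword scores with weight 1.0 by default, each document is a single token list rather than seven separate fields, document length is normalised as `dl / avgdl` as in standard BM25, and `b = 0.75` is a commonly used value that the text does not specify. All function names are hypothetical.

```python
from typing import Dict, List, Optional, Tuple


def term_score(fq: int, qf: int, dl: int, avgdl: float,
               k1: float = 1.0, k2: float = 2.0, b: float = 0.75) -> float:
    """Per-keyword relevance R(q, d); b = 0.75 is an assumed default."""
    K = k1 * (1 - b + b * dl / avgdl)
    return (fq * (k1 + 1) / (fq + K)) * (qf * (k2 + 1) / (qf + k2))


def relevance(query: List[str], doc: List[str], avgdl: float,
              weights: Optional[Dict[str, float]] = None) -> float:
    """S(Q, d), assumed here to be a weighted sum of per-keyword scores."""
    weights = weights or {}
    total = 0.0
    for term in set(query):
        fq = doc.count(term)    # frequency of the keyword in the document
        qf = query.count(term)  # frequency of the keyword in the query
        total += weights.get(term, 1.0) * term_score(fq, qf, len(doc), avgdl)
    return total


def rank_and_split(query: List[str], docs: Dict[str, List[str]],
                   threshold: float) -> Tuple[list, list]:
    """Sort by relevance (descending) and split at the threshold T."""
    # Requires at least one non-empty document so that avgdl > 0.
    avgdl = sum(len(tokens) for tokens in docs.values()) / len(docs)
    scored = sorted(((relevance(query, tokens, avgdl), doc_id)
                     for doc_id, tokens in docs.items()), reverse=True)
    high = [(doc_id, s) for s, doc_id in scored if s >= threshold]
    low = [(doc_id, s) for s, doc_id in scored if s < threshold]
    return high, low
```

The `low` list returned here is the set that would subsequently be reordered by time sequence rather than by relevance.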
S5: For each webpage document in the document set whose relevance is less than the relevance threshold T, acquire the release time T1, the update time T2, the total number of clicks C, the total number of downloads D, the total page dwell time P and the webpage content update acceleration G. For the total number of clicks C, each single mouse click by the user at any position of the webpage counts as 1 click, and the default value is 0. For the total number of downloads D, each download of the webpage content triggered by the user counts as 1 download, and the default value is 0. The value of the webpage content update acceleration G changes according to how quickly the interval between webpage content updates changes.
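The counting rules for these parameters can be sketched as a small Python record. The class name `TimingParams`, its method names, the use of seconds for dwell time, and the default of 0.0 for the update acceleration are assumptions for illustration; the text only fixes the defaults of 0 for clicks and downloads.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class TimingParams:
    release_time: datetime            # T1
    update_time: datetime             # T2
    clicks: int = 0                   # C, default 0 as stated in the text
    downloads: int = 0                # D, default 0 as stated in the text
    dwell_seconds: float = 0.0        # P, unit (seconds) is an assumption
    update_acceleration: float = 0.0  # G, default value is an assumption

    def record_click(self) -> None:
        """One mouse click anywhere on the page counts as 1 click."""
        self.clicks += 1

    def record_download(self) -> None:
        """One user-triggered download of page content counts as 1 download."""
        self.downloads += 1

    def record_dwell(self, seconds: float) -> None:
        """Add one visit's dwell time to the running total."""
        self.dwell_seconds += seconds
```

A record like this would be filled from access logs and then passed to whatever function computes the time-sequence value for the document.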
S7: Output the webpage documents whose relevance is less than the similarity threshold T to the user in descending order of time sequence.
The retrieval method and system combine the relevance to the retrieval topic with the time sequence of information release and sort the retrieval results according to the user's actual degree of need. This improves the search experience of intelligence personnel, places the results the user cares about at the front, and meets the requirements of high relevance and high timeliness for retrieval results in the field of national defense science and technology intelligence.
The embodiments in the present description are described in a progressive manner, each embodiment focuses on differences from other embodiments, and the same and similar parts among the embodiments are referred to each other.
The principles and embodiments of the present invention have been described herein using specific examples, which are provided only to help understand the method and the core concept of the present invention; meanwhile, for a person skilled in the art, according to the idea of the present invention, the specific embodiments and the application range may be changed. In view of the above, the present disclosure should not be construed as limiting the invention.
Claims (6)
1. An information retrieval method, the method comprising:
acquiring a keyword set to be searched and a webpage document set of a data source to be searched in the field of national defense science and technology intelligence, wherein the webpage document set comprises a plurality of webpage documents;
calculating the correlation between the keyword set to be searched and each webpage document;
outputting the webpage documents with the relevance larger than or equal to a similarity threshold, and outputting the webpage documents with the relevance smaller than the similarity threshold in sequence from high to low according to the time sequence;
the outputting the webpage documents with the relevance smaller than the similarity threshold value according to the sequence from high to low in time sequence specifically comprises:
acquiring time sequence parameters of each webpage document with the correlation smaller than the similarity threshold, wherein the time sequence parameters comprise: at least one of the release time, the update time, the total number of clicks, the total number of downloads, the total length of the dwell time of the page and the acceleration of updating the webpage content;
calculating the time sequence of each webpage document according to the time sequence parameters, which specifically comprises the following steps:
calculating the time sequence of the $i$-th webpage document according to the formula (not reproduced in this text), where $1 \le i \le I$ and $I$ is the number of webpage documents whose relevance is less than the similarity threshold; $S_i$ is the time sequence of the $i$-th webpage document; $D_i$ is its total number of downloads; $C_i$ is its total number of clicks; $P_i$ is its total page dwell time; $T2_i$ is its update time; $T1_i$ is its release time; and $G_i$ is its webpage content update acceleration;
and outputting the webpage documents with the relevance smaller than the similarity threshold in descending order of time sequence.
2. The method according to claim 1, wherein the calculating the relevance of the keyword set to be searched to each of the web documents specifically comprises:
and calculating the relevance of the keyword set to be searched and each webpage document by adopting a BM25 model.
3. The method according to claim 1, wherein outputting the web page document whose relevance is greater than or equal to the similarity threshold specifically includes:
and outputting the webpage documents with the relevance larger than or equal to the similarity threshold value in the order of high relevance to low relevance.
4. An information retrieval system, the system comprising:
a data acquisition module, configured to acquire a keyword set to be searched and a webpage document set of a data source to be searched in the field of national defense science and technology intelligence, wherein the webpage document set comprises a plurality of webpage documents;
the correlation calculation module is used for calculating the correlation between the keyword set to be searched and each webpage document;
the retrieval output module is used for outputting the webpage documents with the relevance larger than or equal to the similarity threshold value and outputting the webpage documents with the relevance smaller than the similarity threshold value from high to low according to the time sequence;
wherein the retrieval output module comprises:
a time sequence parameter obtaining unit, configured to obtain a time sequence parameter of each webpage document whose relevance is smaller than the similarity threshold, wherein the time sequence parameter includes at least one of: the release time, the update time, the total number of clicks, the total number of downloads, the total page dwell time, and the webpage content update acceleration;
a time sequence calculating unit, configured to calculate the time sequence of each webpage document according to the time sequence parameter, wherein the time sequence calculating unit includes:
a time sequence calculation subunit, configured to calculate the time sequence of the ith webpage document, wherein 1 ≤ i ≤ I; I represents the number of webpage documents whose relevance is smaller than the similarity threshold; S_i represents the time sequence of the ith webpage document; d_i represents the total number of downloads of the ith webpage document; c_i represents the total number of clicks of the ith webpage document; p_i represents the total page dwell time of the ith webpage document; t2_i represents the update time of the ith webpage document; t1_i represents the release time of the ith webpage document; and g_i represents the webpage content update acceleration of the ith webpage document;
and a time sequence document output unit, used for outputting the webpage documents whose relevance is smaller than the similarity threshold in descending order of time sequence.
5. The system of claim 4, wherein the correlation calculation module comprises:
a correlation calculation unit, used for calculating the relevance between the keyword set to be searched and each webpage document by using a BM25 model.
6. The system of claim 4, wherein the retrieval output module comprises:
a high-similarity document output unit, used for outputting the webpage documents whose relevance is greater than or equal to the similarity threshold in descending order of relevance.
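The time sequence units of the system claims could be organized roughly as in the following Python sketch. The formula for the time sequence is not reproduced in this text, so the sketch takes the scoring function as a caller-supplied argument rather than guessing it. The names `TimingParameters` and `rank_by_time_sequence`, the field names, and the numeric types are hypothetical; the claims also allow a subset of the parameters, which this sketch does not model.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class TimingParameters:
    """Time sequence parameters of one low-relevance webpage document."""
    release_time: float         # t1_i (unit, e.g. Unix seconds, is an assumption)
    update_time: float          # t2_i
    total_clicks: int           # c_i
    total_downloads: int        # d_i
    total_dwell_time: float     # p_i
    update_acceleration: float  # g_i


def rank_by_time_sequence(
    params_by_doc: Dict[str, TimingParameters],
    time_sequence_fn: Callable[[TimingParameters], float],
) -> List[str]:
    """Return document ids ordered by time sequence, highest first.

    time_sequence_fn stands in for the claimed formula, which is not
    available here and must be supplied by the caller.
    """
    scores = {doc_id: time_sequence_fn(p) for doc_id, p in params_by_doc.items()}
    return sorted(scores, key=scores.get, reverse=True)
```

For example, passing `lambda p: p.update_time` as `time_sequence_fn` ranks the most recently updated documents first; that lambda is only a stand-in and is not the formula claimed in the patent.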
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910622980.1A CN110334269B (en) | 2019-07-11 | 2019-07-11 | Information retrieval method and system |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910622980.1A CN110334269B (en) | 2019-07-11 | 2019-07-11 | Information retrieval method and system |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110334269A (en) | 2019-10-15 |
CN110334269B (en) | 2021-05-07 |
Family
ID=68146347
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910622980.1A Active CN110334269B (en) | 2019-07-11 | 2019-07-11 | Information retrieval method and system |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110334269B (en) |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN1306258A (en) * | 2001-03-09 | 2001-08-01 | 北京大学 | Method for judging position correlation of a group of query keys or words on network page |
CN101477556A (en) * | 2009-01-22 | 2009-07-08 | 苏州智讯科技有限公司 | Method for discovering hot sport in internet mass information |
CN101625680A (en) * | 2008-07-09 | 2010-01-13 | 东北大学 | Document retrieval method in patent field |
CN102982153A (en) * | 2012-11-29 | 2013-03-20 | 北京亿赞普网络技术有限公司 | Information retrieval method and device |
CN104991962A (en) * | 2015-07-22 | 2015-10-21 | 无锡天脉聚源传媒科技有限公司 | Method and apparatus for generating recommendation information |
CN107977405A (en) * | 2017-11-16 | 2018-05-01 | 北京三快在线科技有限公司 | Data reordering method, data sorting device, electronic equipment and readable storage medium storing program for executing |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5642502A (en) * | 1994-12-06 | 1997-06-24 | University Of Central Florida | Method and system for searching for relevant documents from a text database collection, using statistical ranking, relevancy feedback and small pieces of text |
Also Published As
Publication number | Publication date |
---|---|
CN110334269A (en) | 2019-10-15 |
Similar Documents
Publication | Title | Publication Date |
---|---|---|
CN109408703B (en) | Information recommendation method and system, device, electronic equipment and storage medium thereof | |
CN107145496B (en) | Method for matching image with content item based on keyword | |
US9020947B2 (en) | Web knowledge extraction for search task simplification | |
CN102760138B (en) | Classification method and device for user network behaviors and search method and device for user network behaviors | |
US8612435B2 (en) | Activity based users' interests modeling for determining content relevance | |
US8150841B2 (en) | Detecting spiking queries | |
CN111708740A (en) | Mass search query log calculation analysis system based on cloud platform | |
US20070143300A1 (en) | System and method for monitoring evolution over time of temporal content | |
US20080077569A1 (en) | Integrated Search Service System and Method | |
WO2002019158A2 (en) | Method and system for personalisation of digital information | |
WO2014149199A1 (en) | Method and system for multi-phase ranking for content personalization | |
CN103324669A (en) | Method and client for processing web page bookmark | |
CN107145497B (en) | Method for selecting image matched with content based on metadata of image and content | |
US20090132517A1 (en) | Socially-derived relevance in search engine results | |
CN102163228A (en) | Method, apparatus and device for determining sorting result of resource candidates | |
CN105760443A (en) | Project recommending system, device and method | |
CN102930038A (en) | Combined method of search result similar items and system of the same | |
CN102364467A (en) | Network search method and system | |
CN105095209A (en) | Document clustering method, document clustering device and network equipment | |
CN108959580A (en) | A kind of optimization method and system of label data | |
CN104615723B (en) | The determination method and apparatus of query word weighted value | |
Hoang et al. | Academic event recommendation based on research similarity and exploring interaction between authors | |
CN108509449B (en) | Information processing method and server | |
CN117593089A (en) | Credit card recommendation method, apparatus, device, storage medium and program product | |
CN110334269B (en) | Information retrieval method and system |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||