CN110334269A - A kind of information retrieval method and system - Google Patents
Information retrieval method and system
- Publication number
- CN110334269A (application number CN201910622980.1A)
- Authority
- CN
- China
- Prior art keywords
- web document
- correlation
- timing
- web
- document
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/953—Querying, e.g. by the use of web search engines
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/953—Querying, e.g. by the use of web search engines
- G06F16/9538—Presentation of query results
Landscapes
- Engineering & Computer Science (AREA)
- Databases & Information Systems (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The present invention discloses an information retrieval method and system. The method and system first calculate the relevance between a query keyword set and each web document in the web document set of the data source to be searched in the national defense science and technology information field. Web documents whose relevance is greater than or equal to a similarity threshold are then output, and web documents whose relevance is less than the similarity threshold are output in descending order of timeliness. Outputting the more relevant web documents as search results ensures the coverage of the results, while outputting the less relevant web documents to the user in descending order of timeliness meets the high-timeliness requirement of information retrieval. Information retrieval in the national defense science and technology information field carried out with this method and system can therefore meet the requirements for high timeliness and high coverage at the same time.
Description
Technical field
The present invention relates to the field of information retrieval, and in particular to an information retrieval method and system.
Background
Information retrieval (Information Retrieval) is the process of finding the information a user needs in a large collection of information by means of some retrieval method. The key problem of information retrieval is result ranking, that is, how to place the information the user most needs at the front of the returned list. Intelligence retrieval, as one part of information retrieval, uses a retrieval method to provide users with the intelligence they need, such as news, developments, policies and viewpoints; its main characteristics are high timeliness and personalization. Retrieval in the national defense science and technology information field is a special kind of intelligence retrieval that requires both high timeliness and high coverage. However, existing retrieval methods cannot meet the requirements for high timeliness and high coverage at the same time.
Summary of the invention
The object of the present invention is to provide an information retrieval method and system that can meet the requirements for high timeliness and high coverage of information retrieval in the national defense science and technology information field at the same time.
To achieve the above object, the present invention provides the following solutions:
An information retrieval method, the method comprising:
obtaining a query keyword set and the web document set of the data source to be searched in the national defense science and technology information field, the web document set including multiple web documents;
calculating the relevance between the query keyword set and each web document;
outputting the web documents whose relevance is greater than or equal to a similarity threshold, and outputting the web documents whose relevance is less than the similarity threshold in descending order of timeliness.
Optionally, calculating the relevance between the query keyword set and each web document specifically includes:
calculating the relevance between the query keyword set and each web document using the BM25 model.
Optionally, outputting the web documents whose relevance is greater than or equal to the similarity threshold specifically includes:
outputting the web documents whose relevance is greater than or equal to the similarity threshold in descending order of relevance.
Optionally, outputting the web documents whose relevance is less than the similarity threshold in descending order of timeliness specifically includes:
obtaining the timeliness parameters of each web document whose relevance is less than the similarity threshold, the timeliness parameters including at least one of: publication time, update time, total clicks, total downloads, total page dwell time, and web page content update acceleration;
calculating the timeliness of each web document from the timeliness parameters;
outputting the web documents whose relevance is less than the similarity threshold in descending order of timeliness.
Optionally, the timeliness parameters include publication time, update time, total clicks, total downloads, total page dwell time, and web page content update acceleration, and calculating the timeliness of each web document from the timeliness parameters specifically includes:
calculating the timeliness of the i-th web document according to the formula [formula not reproduced], where 1 ≤ i ≤ I; I denotes the number of web documents whose relevance is less than the similarity threshold; S_i denotes the timeliness of the i-th web document; D_i denotes the total downloads of the i-th web document; C_i denotes the total clicks of the i-th web document; P_i denotes the total page dwell time of the i-th web document; T2_i denotes the update time of the i-th web document; T1_i denotes the publication time of the i-th web document; and G_i denotes the web page content update acceleration of the i-th web document.
An information retrieval system, the system comprising:
a data acquisition module, for obtaining a query keyword set and the web document set of the data source to be searched in the national defense science and technology information field, the web document set including multiple web documents;
a relevance calculation module, for calculating the relevance between the query keyword set and each web document;
a retrieval output module, for outputting the web documents whose relevance is greater than or equal to a similarity threshold, and outputting the web documents whose relevance is less than the similarity threshold in descending order of timeliness.
Optionally, the relevance calculation module includes:
a relevance calculation unit, for calculating the relevance between the query keyword set and each web document using the BM25 model.
Optionally, the retrieval output module includes:
a high-similarity document output unit, for outputting the web documents whose relevance is greater than or equal to the similarity threshold in descending order of relevance.
Optionally, the retrieval output module includes:
a timeliness parameter acquisition unit, for obtaining the timeliness parameters of each web document whose relevance is less than the similarity threshold, the timeliness parameters including at least one of: publication time, update time, total clicks, total downloads, total page dwell time, and web page content update acceleration;
a timeliness calculation unit, for calculating the timeliness of each web document from the timeliness parameters;
a timeliness document output unit, for outputting the web documents whose relevance is less than the similarity threshold in descending order of timeliness.
Optionally, the timeliness parameters include publication time, update time, total clicks, total downloads, total page dwell time, and web page content update acceleration, and the timeliness calculation unit includes:
a timeliness calculation subunit, for calculating the timeliness of the i-th web document according to the formula [formula not reproduced], where 1 ≤ i ≤ I; I denotes the number of web documents whose relevance is less than the similarity threshold; S_i denotes the timeliness of the i-th web document; D_i denotes the total downloads of the i-th web document; C_i denotes the total clicks of the i-th web document; P_i denotes the total page dwell time of the i-th web document; T2_i denotes the update time of the i-th web document; T1_i denotes the publication time of the i-th web document; and G_i denotes the web page content update acceleration of the i-th web document.
According to the specific embodiments provided by the present invention, the present invention discloses the following technical effects:
The information retrieval method and system provided by the present invention first calculate the relevance between the query keyword set and each web document in the web document set of the data source to be searched in the national defense science and technology information field. Web documents whose relevance is greater than or equal to the similarity threshold are then output, and web documents whose relevance is less than the similarity threshold are output in descending order of timeliness. Outputting the more relevant web documents as search results ensures the coverage of the results, while outputting the less relevant web documents to the user in descending order of timeliness meets the high-timeliness requirement of information retrieval. Information retrieval in the national defense science and technology information field carried out with this method and system can therefore meet the requirements for high timeliness and high coverage at the same time.
Brief description of the drawings
In order to explain the embodiments of the present invention or the technical solutions of the prior art more clearly, the drawings needed in the embodiments are briefly described below. The drawings described below show only some embodiments of the present invention; a person of ordinary skill in the art can obtain other drawings from them without creative effort.
Fig. 1 is a flow chart of an information retrieval method provided by an embodiment of the present invention;
Fig. 2 is a structural block diagram of an information retrieval system provided by an embodiment of the present invention.
Detailed description of the embodiments
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings. The described embodiments are only some of the embodiments of the present invention, not all of them. All other embodiments obtained by a person of ordinary skill in the art from these embodiments without creative effort fall within the protection scope of the present invention.
The object of the present invention is to provide an information retrieval method and system that can meet the requirements for high timeliness and high coverage of information retrieval in the national defense science and technology information field at the same time.
To make the above objects, features and advantages of the present invention easier to understand, the present invention is described in further detail below with reference to the drawings and specific embodiments.
Fig. 1 is a flow chart of an information retrieval method provided by an embodiment of the present invention. As shown in Fig. 1, the method includes:
Step 101: Obtain the query keyword set and the web document set of the data source to be searched in the national defense science and technology information field; the web document set includes multiple web documents.
Step 102: Calculate the relevance between the query keyword set and each web document. In this embodiment, the relevance is calculated using the BM25 model.
Step 103: Output the web documents whose relevance is greater than or equal to the similarity threshold, and output the web documents whose relevance is less than the similarity threshold in descending order of timeliness.
In practical applications, the web documents whose relevance is greater than or equal to the similarity threshold can be output to the user in descending order of relevance: the web document with the highest relevance is placed first, the one with the next highest relevance is placed second, and so on.
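The two-tier ordering of Step 103 can be sketched as follows. This is a minimal illustration that assumes both scores have already been computed; the `ScoredDoc` type, its field names and the `order_results` function are hypothetical names introduced here and do not come from the patent.

```python
from dataclasses import dataclass


@dataclass
class ScoredDoc:
    """A web document with precomputed scores (illustrative fields only)."""
    url: str
    relevance: float   # e.g. a BM25 score against the query keyword set
    timeliness: float  # timeliness score, computed elsewhere


def order_results(docs: list[ScoredDoc], threshold: float) -> list[ScoredDoc]:
    """Return documents in output order.

    Documents with relevance >= threshold come first, sorted by relevance
    (descending); the rest follow, sorted by timeliness (descending).
    """
    high = [d for d in docs if d.relevance >= threshold]
    low = [d for d in docs if d.relevance < threshold]
    high.sort(key=lambda d: d.relevance, reverse=True)
    low.sort(key=lambda d: d.timeliness, reverse=True)
    return high + low


if __name__ == "__main__":
    docs = [
        ScoredDoc("a", relevance=0.9, timeliness=0.1),
        ScoredDoc("b", relevance=0.2, timeliness=0.8),
        ScoredDoc("c", relevance=0.6, timeliness=0.3),
        ScoredDoc("d", relevance=0.1, timeliness=0.9),
    ]
    print([d.url for d in order_results(docs, threshold=0.5)])
```

Keeping the two groups as separate lists before concatenating makes it explicit that timeliness never reorders a document that met the threshold.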
Outputting the web documents whose relevance is less than the similarity threshold in descending order of timeliness specifically includes:
obtaining the timeliness parameters of each web document whose relevance is less than the similarity threshold, the timeliness parameters including at least one of: publication time, update time, total clicks, total downloads, total page dwell time, and web page content update acceleration;
calculating the timeliness of each web document from the timeliness parameters;
outputting the web documents whose relevance is less than the similarity threshold in descending order of timeliness.
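The three sub-steps above can be sketched as a parameter container plus a sort. All names here (`TimelinessParams`, `sort_by_timeliness`, `recency_only`) are hypothetical, and the scoring function is passed in by the caller because this sketch does not assume any particular timeliness formula; `recency_only` is only a stand-in for demonstration.

```python
from dataclasses import dataclass, fields
from datetime import datetime
from typing import Callable, Optional


@dataclass
class TimelinessParams:
    """Timeliness parameters of one web document; every field is optional."""
    publish_time: Optional[datetime] = None
    update_time: Optional[datetime] = None
    total_clicks: Optional[int] = None
    total_downloads: Optional[int] = None
    total_dwell_seconds: Optional[float] = None
    update_acceleration: Optional[float] = None

    def __post_init__(self) -> None:
        # The text requires at least one of the six parameters to be present.
        if all(getattr(self, f.name) is None for f in fields(self)):
            raise ValueError("at least one timeliness parameter is required")


def sort_by_timeliness(
    docs: dict[str, TimelinessParams],
    score: Callable[[TimelinessParams], float],
) -> list[str]:
    """Return document ids ordered by timeliness score, highest first."""
    return sorted(docs, key=lambda doc_id: score(docs[doc_id]), reverse=True)


def recency_only(p: TimelinessParams) -> float:
    """Hypothetical stand-in score: a newer update time ranks higher."""
    return p.update_time.timestamp() if p.update_time else 0.0
```

A call such as `sort_by_timeliness(docs, recency_only)` returns the ids of the below-threshold documents in output order; a real deployment would replace `recency_only` with a function that combines all available parameters.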
In this embodiment, the timeliness parameters include publication time, update time, total clicks, total downloads, total page dwell time, and web page content update acceleration, and calculating the timeliness of each web document from the timeliness parameters specifically includes:
calculating the timeliness of the i-th web document according to the formula [formula not reproduced], where 1 ≤ i ≤ I; I denotes the number of web documents whose relevance is less than the similarity threshold; S_i denotes the timeliness of the i-th web document; D_i denotes the total downloads of the i-th web document; C_i denotes the total clicks of the i-th web document; P_i denotes the total page dwell time of the i-th web document; T2_i denotes the update time of the i-th web document; T1_i denotes the publication time of the i-th web document; and G_i denotes the web page content update acceleration of the i-th web document.
Fig. 2 is a structural block diagram of an information retrieval system provided by an embodiment of the present invention. As shown in Fig. 2, the system includes:
a data acquisition module 201, for obtaining the query keyword set and the web document set of the data source to be searched in the national defense science and technology information field, the web document set including multiple web documents;
a relevance calculation module 202, for calculating the relevance between the query keyword set and each web document;
a retrieval output module 203, for outputting the web documents whose relevance is greater than or equal to the similarity threshold, and outputting the web documents whose relevance is less than the similarity threshold in descending order of timeliness.
The relevance calculation module 202 includes:
a relevance calculation unit, for calculating the relevance between the query keyword set and each web document using the BM25 model.
The retrieval output module 203 includes:
a high-similarity document output unit, for outputting the web documents whose relevance is greater than or equal to the similarity threshold in descending order of relevance.
The retrieval output module 203 further includes:
a timeliness parameter acquisition unit, for obtaining the timeliness parameters of each web document whose relevance is less than the similarity threshold, the timeliness parameters including at least one of: publication time, update time, total clicks, total downloads, total page dwell time, and web page content update acceleration;
a timeliness calculation unit, for calculating the timeliness of each web document from the timeliness parameters;
a timeliness document output unit, for outputting the web documents whose relevance is less than the similarity threshold in descending order of timeliness.
In this embodiment, the timeliness parameters include publication time, update time, total clicks, total downloads, total page dwell time, and web page content update acceleration, and the timeliness calculation unit includes:
a timeliness calculation subunit, for calculating the timeliness of the i-th web document according to the formula [formula not reproduced], where 1 ≤ i ≤ I; I denotes the number of web documents whose relevance is less than the similarity threshold; S_i denotes the timeliness of the i-th web document; D_i denotes the total downloads of the i-th web document; C_i denotes the total clicks of the i-th web document; P_i denotes the total page dwell time of the i-th web document; T2_i denotes the update time of the i-th web document; T1_i denotes the publication time of the i-th web document; and G_i denotes the web page content update acceleration of the i-th web document.
The specific implementation process of the present invention is as follows:
S1: Obtain the web document set D of the data source to be searched in the national defense science and technology information field, D = {d_1, d_2, ..., d_n}, where d_i denotes the i-th web document in D.
S2: Obtain the query text entered by the user and segment it to obtain the query keyword set Q = {q_1, q_2, ..., q_u}, where q_i denotes the i-th query keyword in the set, 1 ≤ i ≤ u, and u denotes the number of query keywords. Each web document d_i is represented as a triple <Q, f_i, r_i>, where Q is the user's query keyword set; f_i is the feature of web document d_i; and r_i is the relevance judgment value between the document and the query keyword set Q, taking values in {0, 1}, where 0 means not relevant and 1 means relevant. Specifically, when determining the query keyword set, the unsupervised feature selection method of the RSR algorithm (Regularized Self-Representation) is applied to each web document d_i to find the optimal segmentation for each document. The specific steps are as follows:
(1) The feature set of web document d_i is f_i = {f_i1, f_i2, ..., f_im}. Each specific feature f_ij can be expressed linearly by the other features or by itself as: [formula not reproduced], where 1 ≤ i ≤ n, 1 ≤ j ≤ k ≤ m, w_jk denotes the relation coefficient between f_ij and f_ik, e_ij denotes the weighting term, and f_ij denotes the j-th feature of the i-th document.
(2) For the feature set f_i of the document, an extremum algorithm is used to solve for the optimum: [formula not reproduced], where W denotes the coefficient matrix of web document d_i, W = [w_ij] ∈ R^(m×m). The l_{2,1} norm is applied on E to make the algorithm robust to outliers, and the regularization term ||W||_{2,1} is added to avoid trivial solutions; λ is a non-zero regularization weighting parameter.
Let [formula not reproduced], where w_i is the i-th row of [symbol not reproduced]. According to the formula [formula not reproduced], the coefficient corresponding to each feature is obtained, where v = {v_1, v_2, ..., v_m}; that is, the coefficient corresponding to the j-th feature f_ij of web document d_i is v_j.
(3) Count the word frequency x_i with which query keyword q_i occurs in the document features and, according to the formula [formula not reproduced], obtain the keyword set coefficient t_i of the document. Sort in descending order of t_i and select the segmentation with the largest t_i as the optimal segmentation, thereby obtaining the query keyword set Q = {q_1, q_2, ..., q_u}.
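For orientation, steps (1) and (2) follow the pattern of regularized self-representation as it is usually stated in the unsupervised feature selection literature. The block below gives that common form. It is an assumption drawn from that literature, not the patent's own equations: the data matrix X (one column per feature), the column vectors f_j and e_j, and the row-norm definition of v_j are notation introduced here, and no form is suggested for the coefficient t_i in step (3).

```latex
% Self-representation: each feature is a linear combination of all features
f_j = \sum_{k=1}^{m} w_{kj}\, f_k + e_j
\quad\Longleftrightarrow\quad
X = XW + E

% Robust objective with row-sparse regularization (lambda > 0)
\hat{W} = \arg\min_{W}\; \lVert X - XW \rVert_{2,1} + \lambda \lVert W \rVert_{2,1}

% Importance of the j-th feature: l2 norm of the j-th row of \hat{W}
v_j = \lVert \hat{w}_j \rVert_2 , \qquad j = 1, \dots, m
```

Under this reading, a feature whose row in the solved coefficient matrix has a large norm contributes strongly to representing the other features, which is why it receives a large coefficient v_j.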
S3: For each web document d_i, divide its content into 7 content domains: web address (URL), title, body content, document tags (meta keywords), tag description (meta description), anchor text (that is, the link text in the web page), and lookup time log. Each web document is represented and indexed in the search engine by these domains.
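The seven content domains of S3 can be held in a simple record type, sketched below. The class name `WebDocument`, the field names and the `domains` helper are hypothetical; the patent only lists the domains and does not prescribe a data structure.

```python
from dataclasses import dataclass, field


@dataclass
class WebDocument:
    """One web document split into seven content domains."""
    url: str
    title: str
    body: str
    meta_keywords: str = ""
    meta_description: str = ""
    anchor_texts: list[str] = field(default_factory=list)
    lookup_time_log: list[str] = field(default_factory=list)

    def domains(self) -> dict[str, str]:
        """Return each domain as a single text string, keyed by domain name."""
        return {
            "url": self.url,
            "title": self.title,
            "body": self.body,
            "meta_keywords": self.meta_keywords,
            "meta_description": self.meta_description,
            "anchor_text": " ".join(self.anchor_texts),
            "lookup_time_log": " ".join(self.lookup_time_log),
        }
```

Exposing the domains as a name-to-text mapping lets a scorer iterate over them uniformly, for example to compute a per-domain relevance and combine the results.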
S4: Use the BM25 model to calculate the relevance between the query keyword set and each web document in the document set D, and obtain the relevance ranking of the n web documents in D by sorting and filtering. The calculation is as follows:
(1) First calculate the relevance score R(q_i, d_i) between each keyword q_i in the query keyword set Q and each content domain of each web document d_i, then accumulate according to the formula [formula not reproduced] to obtain the final relevance S(Q, d_i) between the query keyword set Q and web document d_i, where P_i denotes the weight of the keyword. The relevance score R(q_i, d_i) is calculated as follows:
R(q_i, d_i) = [fq_i × (k1 + 1) / (fq_i + K)] × [qf_i × (k2 + 1) / (qf_i + k2)], where K = k1 × (1 − b + b × dl_i / avgdl); qf_i is the frequency of keyword q_i in the query statement Q; fq_i is the frequency of keyword q_i in web document d_i; k1, k2 and b are adjustment factors, which can normally be set to k1 = 1 and k2 = 2; dl_i is the length of web document d_i; and avgdl is the average length of all web documents, that is, of the document set D.
(2) Sort all web documents in the document set D by relevance value S(Q, d_i) from largest to smallest to obtain a document set in descending order of relevance.
(3) Obtain the relevance threshold T (the similarity threshold referred to above) and use it to divide the relevance-ordered document set into two parts: the first part is the set of documents whose relevance is greater than or equal to T, and the second part is the set of documents whose relevance is less than T.
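Steps (1) to (3) of S4 can be sketched as below. Several points are assumptions made for this illustration: the length normalization uses the standard BM25 ratio dl / avgdl, the value b = 0.75 is a common default that the text does not specify, the keyword weights default to 1.0, the accumulation is taken to be a weighted sum over keywords, and documents are treated as flat token lists rather than seven separate content domains. All function names are hypothetical.

```python
from collections import Counter
from typing import Optional


def term_score(fq: int, qf: int, dl: int, avgdl: float,
               k1: float = 1.0, k2: float = 2.0, b: float = 0.75) -> float:
    """R(q, d) for one keyword: document-side factor times query-side factor."""
    K = k1 * (1 - b + b * dl / avgdl)  # standard BM25 length normalization
    return (fq * (k1 + 1) / (fq + K)) * (qf * (k2 + 1) / (qf + k2))


def relevance(query_terms: list[str], doc_terms: list[str], avgdl: float,
              weights: Optional[dict[str, float]] = None, **params) -> float:
    """S(Q, d): weighted sum of term scores over the distinct query keywords."""
    qf = Counter(query_terms)   # keyword frequency in the query
    fq = Counter(doc_terms)     # keyword frequency in the document
    dl = len(doc_terms)
    total = 0.0
    for term, q_count in qf.items():
        w = 1.0 if weights is None else weights.get(term, 1.0)
        total += w * term_score(fq[term], q_count, dl, avgdl, **params)
    return total


def split_by_threshold(scores: dict[str, float],
                       threshold: float) -> tuple[list[str], list[str]]:
    """Sort ids by score (descending) and split at the relevance threshold T."""
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    high = [doc_id for doc_id, s in ranked if s >= threshold]
    low = [doc_id for doc_id, s in ranked if s < threshold]
    return high, low
```

The `high` list is already in output order; the `low` list still has to be reordered by timeliness before it is shown to the user.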
S5: For each document in the set of documents whose relevance is less than the relevance threshold T, obtain the publication time T1, update time T2, total clicks C, total downloads D, total page dwell time P, and web page content update acceleration G. For total clicks C, each single mouse click by a user at any position on the web page counts as 1 click, and the default value is 0. For total downloads D, each download operation triggered by a user on the web page content counts as 1 download, and the default value is 0. The value of the web page content update acceleration G changes according to how quickly the interval between web page content updates changes.
S6: Calculate the timeliness of each web document according to the formula [formula not reproduced].
S7: Output the web documents whose relevance is less than the similarity threshold T to the user one by one in descending order of timeliness.
The retrieval method and system provided by the present invention combine relevance to the search topic with the timeliness of information publication and sort the search result entries according to the user's actual degree of need. This improves the information search situation of intelligence personnel, places the results the user cares about at the front, and meets the requirements for high relevance and high timeliness of search results in the national defense science and technology information field.
The embodiments in this specification are described in a progressive manner. Each embodiment focuses on its differences from the other embodiments, and the same or similar parts of the embodiments can be understood by reference to one another.
Specific examples are used herein to explain the principle and implementation of the present invention. The description of the above embodiments is only intended to help in understanding the method of the present invention and its core idea. A person of ordinary skill in the art may, following the idea of the present invention, make changes to the specific implementation and the scope of application. In summary, the content of this specification should not be construed as limiting the present invention.
Claims (10)
1. An information retrieval method, characterized in that the method comprises:
obtaining a query keyword set and the web document set of the data source to be searched in the national defense science and technology information field, the web document set including multiple web documents;
calculating the relevance between the query keyword set and each web document;
outputting the web documents whose relevance is greater than or equal to a similarity threshold, and outputting the web documents whose relevance is less than the similarity threshold in descending order of timeliness.
2. The method according to claim 1, characterized in that calculating the relevance between the query keyword set and each web document specifically includes:
calculating the relevance between the query keyword set and each web document using the BM25 model.
3. The method according to claim 1, characterized in that outputting the web documents whose relevance is greater than or equal to the similarity threshold specifically includes:
outputting the web documents whose relevance is greater than or equal to the similarity threshold in descending order of relevance.
4. The method according to claim 1, characterized in that outputting the web documents whose relevance is less than the similarity threshold in descending order of timeliness specifically includes:
obtaining the timeliness parameters of each web document whose relevance is less than the similarity threshold, the timeliness parameters including at least one of: publication time, update time, total clicks, total downloads, total page dwell time, and web page content update acceleration;
calculating the timeliness of each web document from the timeliness parameters;
outputting the web documents whose relevance is less than the similarity threshold in descending order of timeliness.
5. The method according to claim 4, characterized in that the timeliness parameters include publication time, update time, total clicks, total downloads, total page dwell time, and web page content update acceleration, and calculating the timeliness of each web document from the timeliness parameters specifically includes:
calculating the timeliness of the i-th web document according to the formula [formula not reproduced], where 1 ≤ i ≤ I; I denotes the number of web documents whose relevance is less than the similarity threshold; S_i denotes the timeliness of the i-th web document; D_i denotes the total downloads of the i-th web document; C_i denotes the total clicks of the i-th web document; P_i denotes the total page dwell time of the i-th web document; T2_i denotes the update time of the i-th web document; T1_i denotes the publication time of the i-th web document; and G_i denotes the web page content update acceleration of the i-th web document.
6. An information retrieval system, characterized in that the system comprises:
a data acquisition module, for obtaining a query keyword set and the web document set of the data source to be searched in the national defense science and technology information field, the web document set including multiple web documents;
a relevance calculation module, for calculating the relevance between the query keyword set and each web document;
a retrieval output module, for outputting the web documents whose relevance is greater than or equal to a similarity threshold, and outputting the web documents whose relevance is less than the similarity threshold in descending order of timeliness.
7. The system according to claim 6, characterized in that the relevance calculation module includes:
a relevance calculation unit, for calculating the relevance between the query keyword set and each web document using the BM25 model.
8. The system according to claim 6, characterized in that the retrieval output module includes:
a high-similarity document output unit, for outputting the web documents whose relevance is greater than or equal to the similarity threshold in descending order of relevance.
9. The system according to claim 6, characterized in that the retrieval output module includes:
a timeliness parameter acquisition unit, for obtaining the timeliness parameters of each web document whose relevance is less than the similarity threshold, the timeliness parameters including at least one of: publication time, update time, total clicks, total downloads, total page dwell time, and web page content update acceleration;
a timeliness calculation unit, for calculating the timeliness of each web document from the timeliness parameters;
a timeliness document output unit, for outputting the web documents whose relevance is less than the similarity threshold in descending order of timeliness.
10. The system according to claim 9, characterized in that the timeliness parameters include publication time, update time, total clicks, total downloads, total page dwell time, and web page content update acceleration, and the timeliness calculation unit includes:
a timeliness calculation subunit, for calculating the timeliness of the i-th web document according to the formula [formula not reproduced], where 1 ≤ i ≤ I; I denotes the number of web documents whose relevance is less than the similarity threshold; S_i denotes the timeliness of the i-th web document; D_i denotes the total downloads of the i-th web document; C_i denotes the total clicks of the i-th web document; P_i denotes the total page dwell time of the i-th web document; T2_i denotes the update time of the i-th web document; T1_i denotes the publication time of the i-th web document; and G_i denotes the web page content update acceleration of the i-th web document.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910622980.1A CN110334269B (en) | 2019-07-11 | 2019-07-11 | Information retrieval method and system |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110334269A true CN110334269A (en) | 2019-10-15 |
CN110334269B CN110334269B (en) | 2021-05-07 |
Family
ID=68146347
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910622980.1A Active CN110334269B (en) | 2019-07-11 | 2019-07-11 | Information retrieval method and system |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110334269B (en) |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5893092A (en) * | 1994-12-06 | 1999-04-06 | University Of Central Florida | Relevancy ranking using statistical ranking, semantics, relevancy feedback and small pieces of text |
CN1306258A (en) * | 2001-03-09 | 2001-08-01 | 北京大学 | Method for judging position correlation of a group of query keys or words on network page |
CN101477556A (en) * | 2009-01-22 | 2009-07-08 | 苏州智讯科技有限公司 | Method for discovering hot spots in internet mass information |
CN101625680A (en) * | 2008-07-09 | 2010-01-13 | 东北大学 | Document retrieval method in patent field |
CN102982153A (en) * | 2012-11-29 | 2013-03-20 | 北京亿赞普网络技术有限公司 | Information retrieval method and device |
CN104991962A (en) * | 2015-07-22 | 2015-10-21 | 无锡天脉聚源传媒科技有限公司 | Method and apparatus for generating recommendation information |
CN107977405A (en) * | 2017-11-16 | 2018-05-01 | 北京三快在线科技有限公司 | Data reordering method, data sorting device, electronic device and readable storage medium |
Non-Patent Citations (1)
Title |
---|
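Feng Xiaohua (冯晓华) et al.: "A survey of research on search result diversification" (检索结果多样化研究综述), 《情报学报》 (Journal of the China Society for Scientific and Technical Information) * |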
Also Published As
Publication number | Publication date |
---|---|
CN110334269B (en) | 2021-05-07 |
Similar Documents
Publication | Title | Publication Date |
---|---|---|
US20110106796A1 (en) | System and method for recommendation of interesting web pages based on user browsing actions | |
EP2145264B1 (en) | Calculating importance of documents factoring historical importance | |
US8244737B2 (en) | Ranking documents based on a series of document graphs | |
Niranjan et al. | Developing a web recommendation system based on closed sequential patterns | |
Raman et al. | Online learning to diversify from implicit feedback | |
Shie et al. | Online mining of temporal maximal utility itemsets from data streams | |
US7720870B2 (en) | Method and system for quantifying the quality of search results based on cohesion | |
US8468153B2 (en) | Information service for facts extracted from differing sources on a wide area network | |
Yagci et al. | Scalable and adaptive collaborative filtering by mining frequent item co-occurrences in a user feedback stream | |
Prajapati | A survey paper on hyperlink-induced topic search (HITS) algorithms for web mining | |
CN110209909A (en) | Data crawling method, device, computer equipment and storage medium | |
CN105302898B (en) | A kind of search ordering method and device based on click model | |
Wang et al. | Optimal Control of Forward‐Backward Stochastic Jump‐Diffusion Differential Systems with Observation Noises: Stochastic Maximum Principle | |
Kaur et al. | SIMHAR-smart distributed web crawler for the hidden web using SIM+ hash and redis server | |
Barla et al. | Rule-based user characteristics acquisition from logs with semantics for personalized web-based systems | |
Chauhan et al. | Web page ranking using machine learning approach | |
Srivastava et al. | Discussion on damping factor value in PageRank computation | |
CN110334269A (en) | A kind of information retrieval method and system | |
Yang et al. | On characterizing and computing the diversity of hyperlinks for anti-spamming page ranking | |
CN103902687B (en) | The generation method and device of a kind of Search Results | |
Lambhate et al. | Hybrid algorithm on semantic web crawler for search engine to improve memory space and time | |
Xu et al. | [Retracted] Generating Personalized Web Search Using Semantic Context | |
Lai et al. | Personalized Web search results with profile comparisons | |
Yue et al. | Adaptive Optimizers with Sparse Group Lasso for Neural Networks in CTR Prediction | |
Godoy et al. | A user profiling architecture for textual-based agents |
Legal Events
Code | Title | Description |
---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
GR01 | Patent grant | |