CN104765782B - A kind of index order update method and device - Google Patents
A kind of index order update method and device
- Publication number
- CN104765782B (application number CN201510125423.0A)
- Authority
- CN
- China
- Prior art keywords
- ranking results
- caching
- index
- segment
- publisher
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Landscapes
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The present invention provides an index ranking update method and device in the fields of computing and search. It addresses a prior-art problem: factors that change in real time, such as the state of an information publisher, cannot be reflected in search results promptly, so search results are not accurate enough. The method includes: performing an inverted-index computation on a data segment according to a first query request to obtain first ranking results, and storing the first ranking results in a cache; and performing a forward-index computation on the first ranking results in the cache according to a publisher state refreshed in real time, so as to update the first ranking results in real time.
Description
Technical field
The present invention relates to the fields of computing and information technology, and in particular to an information display method and device.
Background technique
On a classified-information website, the ranking of listings in search results depends on many factors. Besides properties of the listing itself, such as relevance and update time, these include the state of the information publisher, the category of the listing and the region it belongs to. In the prior art, however, when factors such as the publisher state change, the large data volume and strict real-time requirements make it difficult for the system to reflect those changes in the ranking of search results promptly, so search results are not accurate enough.
Summary of the invention
The technical problem to be solved by the present invention is to provide an index ranking update method and device that overcome the prior-art inability to reflect real-time factors, such as the information publisher's state, in search results promptly, which makes search results insufficiently accurate.
In one aspect, the present invention provides an index ranking update method, comprising: performing an inverted-index computation on a data segment according to a first query request to obtain first ranking results, and storing the first ranking results in a cache; and performing a forward-index computation on the first ranking results in the cache according to the publisher state, so as to update the first ranking results in real time.
Optionally, the data segment is managed as multiple segments. Each segment stores the data generated within a preset time range, and different segments correspond to different preset time ranges.
Optionally, performing the inverted-index computation on the data segment according to the first query request to obtain the first ranking results comprises: performing the inverted-index computation on each segment according to the first query request to obtain the first ranking results.
Optionally, performing the forward-index computation on the first ranking results in the cache according to the publisher state refreshed in real time, so as to update the first ranking results in real time, comprises: when a document in the first ranking results of the data segment has been deleted, deleting the corresponding document from the cache; and performing the forward-index computation on the first ranking results in the cache according to the publisher state refreshed in real time, so as to update the first ranking results in real time.
Optionally, the publisher state includes a user attribute of the publisher or an operation behavior of the publisher.
Further, after the forward-index computation on the first ranking results in the cache has updated them in real time according to the publisher state, the method further comprises: performing a result-set query in the cache according to a second query request; if the result set exists in the cache, obtaining the result set from the data in the cache; and if the result set does not exist in the cache, performing the inverted-index computation and the forward-index computation in sequence to obtain second ranking results.
In another aspect, the present invention further provides an index ranking update device, comprising: an inverted-index computing unit, configured to perform an inverted-index computation on a data segment according to a first query request to obtain first ranking results, and to store the first ranking results in a cache; and a forward-index computing unit, configured to perform a forward-index computation on the first ranking results in the cache according to the publisher state, so as to update the first ranking results in real time.
Optionally, the data segment is managed as multiple segments. Each segment stores the data generated within a preset time range, and different segments correspond to different preset time ranges.
Optionally, the forward-index computing unit is specifically configured to: when a document in the first ranking results of the data segment has been deleted, delete the corresponding document from the cache; and perform the forward-index computation on the first ranking results in the cache according to the publisher state refreshed in real time, so as to update the first ranking results in real time.
Further, the device further comprises: a query unit, configured to perform a result-set query in the cache according to a second query request after the forward-index computation has updated the first ranking results in the cache in real time according to the publisher state; and an acquiring unit, configured to obtain the result set from the data in the cache if the result set exists in the cache, and, if it does not, to trigger the inverted-index computing unit and the forward-index computing unit in sequence and obtain second ranking results once the inverted-index and forward-index computations are complete.
With the index ranking update method and device provided by the embodiments of the present invention, an inverted-index computation is performed on a data segment according to a first query request to obtain first ranking results, which are stored in a cache. A forward-index computation is then performed on the first ranking results in the cache according to the publisher state, updating them in real time. Because data throughput in the cache is high and only a simple forward-index computation is required, publisher-state updates are reflected in the ranking results promptly, which substantially improves the accuracy of search results.
Brief description of the drawings
Fig. 1 is a flowchart of an index ranking update method provided by an embodiment of the present invention;
Fig. 2 is a schematic diagram of the operating process of an index ranking update method in a preferred embodiment of the present invention;
Fig. 3 is a schematic structural diagram of a data segment in a preferred embodiment of the present invention;
Fig. 4 is a schematic structural diagram of an index ranking update device provided by an embodiment of the present invention.
Detailed description of the embodiments
The present invention is described in detail below with reference to the accompanying drawings. It should be understood that the specific embodiments described here serve only to explain the present invention and do not limit it.
As shown in Fig. 1, an embodiment of the present invention provides an index ranking update method, comprising:
S11: performing an inverted-index computation on a data segment according to a first query request to obtain first ranking results, and storing the first ranking results in a cache;
S12: performing a forward-index computation on the first ranking results in the cache according to the publisher state refreshed in real time, so as to update the first ranking results in real time.
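The two steps can be illustrated with a toy in-memory model. This is a minimal sketch under stated assumptions, not the patented implementation: the `Doc` fields, the timestamp-based coarse order and the `publisher_score` map are hypothetical. The inverted index maps a term to document ids (used in S11); the forward index maps a document id to its fields (used in S12).

```python
from dataclasses import dataclass


@dataclass
class Doc:
    doc_id: int
    terms: set          # tokens used to build the inverted index
    timestamp: int      # assumed main ranking dimension for the coarse order
    publisher_id: str   # key into the (hypothetical) real-time publisher state


def build_indexes(docs):
    """Return (inverted, forward): term -> set of doc ids, and doc id -> Doc."""
    inverted, forward = {}, {}
    for d in docs:
        forward[d.doc_id] = d
        for t in d.terms:
            inverted.setdefault(t, set()).add(d.doc_id)
    return inverted, forward


def s11_coarse_rank(inverted, forward, query_terms):
    """S11: intersect posting lists and order newest first; the caller caches this list."""
    postings = [inverted.get(t, set()) for t in query_terms]
    hits = set.intersection(*postings) if postings else set()
    return sorted(hits, key=lambda i: forward[i].timestamp, reverse=True)


def s12_fine_rank(cached_ids, forward, publisher_score):
    """S12: re-order cached ids by the latest publisher score using forward lookups only.
    sorted() is stable, so ties keep their coarse (time) order."""
    return sorted(cached_ids,
                  key=lambda i: publisher_score.get(forward[i].publisher_id, 0.0),
                  reverse=True)
```

Only S11 touches the posting lists; when a publisher's state changes, calling `s12_fine_rank` again on the cached list is enough to reflect it.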
With the index ranking update method provided by this embodiment, an inverted-index computation is performed on a data segment according to a first query request to obtain first ranking results, which are stored in a cache. A forward-index computation is then performed on the first ranking results in the cache according to the publisher state refreshed in real time, updating them in real time. Because data throughput in the cache is high and only a simple forward-index computation is required, publisher-state updates are reflected in the ranking results promptly, which substantially improves the accuracy of search results.
Optionally, the publisher state may include features that can change in real time, such as a user attribute of the publisher or an operation behavior of the publisher.
Operations on the index may include queries and updates. To support concurrent multithreaded queries and real-time indexing, the index data can be managed in multiple segments.
For the search function, the documents in an index data segment can be ranked in real time according to a user's query request, and the ranked documents are stored in the cache. For the document update function, an index data segment can itself perform operations such as single-document updates and batch heterogeneous document refreshes.
Specifically, the data segment is a way of managing massive index data. To support concurrent multithreaded queries and real-time indexing, the data segment is managed as multiple segments: each segment stores only the data generated within a preset time range, and different segments correspond to different preset time ranges. In general, the segments are divided into reader segments and writer segments. A reader segment serves user queries and supports deleting and modifying data; a writer segment supports only adding, deleting and modifying data and does not serve queries. After a batch of index data is generated, a document addition request is issued and the new documents are added to the writer segment. When the writer segment's life cycle reaches its upper limit (for example, 3 seconds), the writer segment is converted into a reader segment that serves queries, and a new writer segment is created at the same time. The reader segments can be scheduled and dynamically merged.
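A minimal sketch of the writer/reader rotation, assuming a single-process, in-memory setting with no concurrency control; `Segment`, `SegmentManager`, `writer_ttl` and the injectable `clock` are hypothetical names chosen for illustration. The 3-second default mirrors the example life cycle in the text.

```python
import time


class Segment:
    def __init__(self, created_at):
        self.created_at = created_at
        self.docs = {}                      # doc_id -> document payload


class SegmentManager:
    """One writer segment takes new documents; once it is writer_ttl seconds old it
    is sealed into a queryable reader segment and a fresh writer is created."""

    def __init__(self, writer_ttl=3.0, clock=time.monotonic):
        self.clock = clock
        self.writer_ttl = writer_ttl
        self.writer = Segment(clock())
        self.readers = []                   # index 0 = reader closest to the writer

    def _maybe_rotate(self):
        if self.clock() - self.writer.created_at >= self.writer_ttl:
            self.readers.insert(0, self.writer)   # writer becomes a reader segment
            self.writer = Segment(self.clock())   # start a new writer segment

    def add(self, doc_id, doc):
        self._maybe_rotate()
        self.writer.docs[doc_id] = doc

    def delete(self, doc_id):
        # Both writer and reader segments support deletion.
        for seg in [self.writer, *self.readers]:
            seg.docs.pop(doc_id, None)

    def query(self, predicate):
        """Only reader segments are searched; the writer does not serve queries."""
        self._maybe_rotate()
        return [d for seg in self.readers for d in seg.docs.values() if predicate(d)]
```

Injecting the clock keeps the rotation deterministic in tests; a production system would more likely rotate from a background timer.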
Specifically, in one embodiment of the present invention, the reader segments can be scanned periodically. When a reader segment's life cycle reaches its upper limit, that reader segment is merged with the adjacent reader segment whose life cycle is longer than its own.
Preferably, as shown in Fig. 2, where w denotes the writer segment and r and R denote reader segments, the reader segment adjacent to the writer segment has the smallest capacity and the shortest life cycle; the farther a reader segment is from the writer segment, the larger its capacity and the longer its life cycle.
Preferably, the larger reader segments can be offline full indexes, while the smaller segments can be online real-time index segments. Each segment has its own life cycle; when a segment reaches the end of its life cycle, its life cycle is either promoted or the segment is merged into a segment with a longer life cycle.
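The scan-and-merge policy can be sketched as follows. This assumes reader segments are kept in a list ordered from the one nearest the writer to the oldest, and that "promotion" means restarting the oldest segment's clock with a longer life cycle; the text does not fix the promotion rule, so `promote_factor` is a hypothetical parameter.

```python
from dataclasses import dataclass, field


@dataclass
class ReaderSegment:
    created_at: float
    ttl: float                                   # life-cycle upper limit of this tier
    docs: dict = field(default_factory=dict)     # doc_id -> document payload


def scan_and_merge(readers, now, promote_factor=2.0):
    """readers[0] is adjacent to the writer and ttl grows with the list index.
    An expired segment is folded into its longer-lived neighbour, so segments
    farther from the writer accumulate more documents. The oldest tier has no
    neighbour and is promoted instead."""
    i = 0
    while i < len(readers):
        seg = readers[i]
        if now - seg.created_at < seg.ttl:
            i += 1
            continue
        if i + 1 < len(readers):
            # Newer entries overwrite older copies of the same doc_id.
            readers[i + 1].docs.update(seg.docs)
            del readers[i]                       # the neighbour now sits at index i
        else:
            seg.created_at = now                 # promote: restart with a longer life cycle
            seg.ttl *= promote_factor
            i += 1
    return readers
```

Because a merged neighbour shifts into slot `i`, the loop re-checks it, so several expired tiers can cascade into the oldest one in a single scan.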
Because the data segment is managed as multiple segments, optionally, performing the inverted-index computation on the data segment according to the first query request to obtain the first ranking results may specifically include: performing the inverted-index computation on all segments simultaneously according to the first query request to obtain the first ranking results, which effectively speeds up search.
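One way to query all segments at once is to fan the inverted-index lookup out over a thread pool and merge the partial hit sets. This sketch assumes each segment's inverted index is a plain `dict` mapping a term to a set of document ids, and that the main ranking dimension is a timestamp. In CPython the GIL limits speed-up for CPU-bound work, so the code shows the structure rather than a tuned implementation.

```python
from concurrent.futures import ThreadPoolExecutor


def search_segment(segment_index, terms):
    """Inverted-index lookup inside one segment: intersect the posting lists."""
    postings = [segment_index.get(t, set()) for t in terms]
    return set.intersection(*postings) if postings else set()


def search_all_segments(segments, terms, timestamps, max_workers=4):
    """Query every segment concurrently, union the hits and order them newest first."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        partial_hits = list(pool.map(lambda seg: search_segment(seg, terms), segments))
    hits = set().union(*partial_hits)
    return sorted(hits, key=lambda doc_id: timestamps[doc_id], reverse=True)
```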
Specifically, a document update can be initiated by a heterogeneous document refresh request. Optionally, such a request may contain two parts: a query condition and an update condition. For example, in one embodiment of the present invention, the query condition and update condition are:
query=day:Friday AND gender:male&&&update=valid_days:100, price:19.22
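The request format above can be parsed with a few string splits. This sketch assumes that `&&&` separates the two parts, commas separate update pairs, `:` separates a field name from its value, and the part keys are case-insensitive; coercing values to `int` or `float` is also an assumption. `parse_refresh_request` is a hypothetical helper name.

```python
def _coerce(value):
    """Turn '100' into 100 and '19.22' into 19.22; leave other strings unchanged."""
    for cast in (int, float):
        try:
            return cast(value)
        except ValueError:
            pass
    return value


def parse_refresh_request(raw):
    """Split 'query=...&&&update=...' into (query_string, {field: value})."""
    parts = {}
    for chunk in raw.split("&&&"):
        key, _, body = chunk.partition("=")
        parts[key.strip().lower()] = body.strip()
    updates = {}
    for pair in parts.get("update", "").split(","):
        if not pair.strip():
            continue
        name, _, value = pair.partition(":")
        updates[name.strip()] = _coerce(value.strip())
    return parts.get("query", ""), updates
```

The query string is returned unparsed; evaluating `day:Friday AND gender:male` is left to the search engine itself.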
When the updater receives a heterogeneous refresh request, it can proceed as follows:
1) query each reader segment according to the query condition in the request, obtaining the set of all matching documents;
2) traverse the document set and, for each field-name/field-value pair in the request, refresh the corresponding forward-index field in each document.
Because the updater executes document update requests serially, the document set cannot change during this process, which guarantees the integrity of the batch refresh.
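The serial-updater rule can be sketched with a queue consumed by a single thread. Assumptions: documents are `dict`s of forward-index fields, the query condition has already been reduced to field-equality tests (a real engine would evaluate the full boolean query), and `None` is a hypothetical stop sentinel.

```python
import queue
import threading


def matches(doc, conditions):
    """Stand-in for the query engine: every condition field must equal its value."""
    return all(doc.get(k) == v for k, v in conditions.items())


def run_updater(requests: queue.Queue, reader_segments):
    """Consume refresh requests one at a time, so no other update can change the
    matched document set between step 1 and step 2."""
    while True:
        req = requests.get()
        if req is None:                      # stop sentinel
            break
        conditions, updates = req
        # Step 1: collect all matching documents from every reader segment.
        hits = [doc for seg in reader_segments for doc in seg.values()
                if matches(doc, conditions)]
        # Step 2: overwrite the named forward-index fields in each hit.
        for doc in hits:
            doc.update(updates)


def start_updater(requests, reader_segments):
    """Launch the single updater thread and return it."""
    worker = threading.Thread(target=run_updater, args=(requests, reader_segments), daemon=True)
    worker.start()
    return worker
```

Having exactly one consumer is what provides the serial execution; query threads can still read the segments concurrently, which a real system would have to guard separately.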
A user's operations on the data are reflected directly in the data segment, but not necessarily in the cache. To keep the data in the cache up to date as well, preferably, in step S12, performing the forward-index computation on the first ranking results in the cache according to the publisher state refreshed in real time, so as to update the first ranking results in real time, may specifically include:
when a document in the first ranking results of the data segment has been deleted, deleting the corresponding document from the cache;
performing the forward-index computation on the first ranking results in the cache according to the publisher state refreshed in real time, so as to update the first ranking results in real time.
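The two sub-steps can be sketched as one pass over a query-keyed cache of document-id lists. Assumptions: `doc_publisher` maps a document id to its publisher, `publisher_score` holds the real-time publisher state as a number, and documents whose publisher has no score rank as 0. All names are hypothetical.

```python
def refresh_cache(cache, deleted_ids, doc_publisher, publisher_score):
    """For each cached result list: drop documents deleted from the data segment,
    then re-sort the survivors by their publisher's current score."""
    for query, ids in cache.items():
        survivors = [d for d in ids if d not in deleted_ids]
        cache[query] = sorted(
            survivors,
            key=lambda d: publisher_score.get(doc_publisher[d], 0.0),
            reverse=True,
        )
    return cache
```

Only existing keys are reassigned inside the loop, so mutating the dict while iterating over `items()` is safe here.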
Because some ranking results already exist in the cache after step S12, the next query or search can first perform a result-set query in the cache according to a second query request. If the result set exists in the cache, it is obtained from the data in the cache; if it does not, the inverted-index computation and the forward-index computation are performed in sequence to obtain second ranking results.
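A minimal sketch of this lookup path. It assumes the cache stores the coarse result list and that the cheap forward-index stage is re-applied on every request, so a cache hit still reflects the current publisher state; `search`, `coarse_rank` and `fine_rank` are hypothetical callables.

```python
def search(query, cache, coarse_rank, fine_rank):
    """Cache hit: reuse the stored coarse list. Miss: run the inverted-index stage
    once and store its output. Either way, finish with the forward-index stage."""
    coarse = cache.get(query)
    if coarse is None:
        coarse = coarse_rank(query)      # expensive: inverted-index computation
        cache[query] = coarse
    return fine_rank(coarse)             # cheap: forward-index computation
```

An empty cached list counts as a hit because only `None` triggers recomputation; that avoids re-running the inverted-index stage for queries that legitimately have no results.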
That is, for a given query, if there is no result set in the cache, the inverted-index query and related computations are performed first and the hits are sorted by the main ranking dimension (such as time). The resulting sorted query result set is stored in the cache; this step is coarse ranking. The result set should contain enough redundancy. For example, even if the query request asks only for the first page of listings, the first coarse ranking still saves the result set for the first several pages, so the cached results can be reused when the user moves on to later pages. In addition, when documents in the cache are deleted in real time or temporarily taken offline, the redundant documents in the cache can fill the gaps. The number of redundant pages can be chosen according to how many pages users commonly browse; users rarely look at listings on very late pages.
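The redundancy idea amounts to caching several pages' worth of coarse results and slicing pages out of that list. In this sketch, `page_size`, `redundant_pages`, 1-based page numbers and the `complete` flag (whether the cached list holds the entire result set) are assumptions, not details given in the text.

```python
def cache_coarse_results(ranked_ids, page_size, redundant_pages):
    """Keep the first `redundant_pages` pages, not just the page requested.
    Also report whether the cached list covers the whole result set."""
    limit = page_size * redundant_pages
    return ranked_ids[:limit], len(ranked_ids) <= limit


def page_from_cache(cached_ids, page, page_size, unavailable=frozenset(), complete=False):
    """Serve a 1-based page, skipping documents deleted or taken offline; the
    redundant tail slides up to fill their slots. Returns None when a truncated
    cache cannot fill the page, signalling that coarse ranking must be re-run."""
    live = [d for d in cached_ids if d not in unavailable]
    start = (page - 1) * page_size
    if start + page_size > len(live) and not complete:
        return None
    return live[start:start + page_size]
```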
After the coarse-ranked result set has been obtained in the cache, a more elaborate scoring method is used to re-rank the documents in the result set. This scoring method can use forward-index fields, such as a document's publisher score, as scoring factors, so when a score is updated the ranking results are updated as well; this step is fine ranking.
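Fine ranking can be read as a weighted score over forward-index fields. The field names (`relevance`, `publisher_score`) and the linear weighting are illustrative assumptions; the text only states that forward-index fields such as the publisher score serve as scoring factors.

```python
def fine_score(fields, weights):
    """Weighted sum of forward-index fields; missing fields count as 0."""
    return sum(w * fields.get(name, 0.0) for name, w in weights.items())


def fine_rank(cached_ids, forward_index, weights):
    """Re-rank the coarse result set; only forward-index lookups are needed."""
    return sorted(cached_ids,
                  key=lambda d: fine_score(forward_index[d], weights),
                  reverse=True)
```

Because `forward_index` is read at call time, updating a publisher's score and re-running `fine_rank` is enough for the new state to appear in the ranking.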
It should be noted that if a matching result set exists in the cache, the documents deleted in real time are filtered out of it, and the result set is then truncated to the number of documents specified by the query. (A document in a segment may be deleted for business reasons, or because the entire document was refreshed: the document is deleted from its segment and its new data is added to a new segment. If the cache is used at that point, it may still hold the old copy of the document, so the cache is inconsistent with the state of the segments and deleted documents must be filtered out.)
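A sketch of this consistency check, assuming each cached entry records which segment the coarse ranking found it in and that `segments` maps a segment id to the set of document ids it currently holds; both representations are hypothetical.

```python
def filter_and_truncate(cached, segments, limit):
    """cached: ordered (segment_id, doc_id) pairs from coarse ranking.
    Drop pairs whose segment no longer holds the document (deleted for business
    reasons, or refreshed and re-added to a newer segment), then keep `limit`."""
    live = [(seg, doc) for seg, doc in cached if doc in segments.get(seg, ())]
    return live[:limit]
```

Checking membership by (segment, document) rather than by document id alone is what catches the refresh case, where the id still exists but only in a newer segment.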
In the embodiments of the present invention, besides the relatively coarse inverted-index ranking performed in the data segment, a more precise forward-index ranking can be performed at the cache layer. The cache layer avoids operations such as frequent inverted-index queries, which preserves query performance. Fine ranking re-sorts the cached results using forward-index fields refreshed in real time, such as the publisher score, as scoring factors, which keeps the ranking results up to date. Moreover, because fine ranking sorts only the result set in the cache and requires no further inverted-index query, it does not consume excessive resources.
Correspondingly, as shown in Fig. 3, an embodiment of the present invention further provides an index ranking update device, comprising:
an inverted-index computing unit 41, configured to perform an inverted-index computation on a data segment according to a first query request to obtain first ranking results, and to store the first ranking results in a cache;
a forward-index computing unit 42, configured to perform a forward-index computation on the first ranking results in the cache according to the publisher state refreshed in real time, so as to update the first ranking results in real time.
In the index ranking update device provided by this embodiment, the inverted-index computing unit 41 performs an inverted-index computation on a data segment according to a first query request to obtain first ranking results and stores them in a cache; the forward-index computing unit 42 then performs a forward-index computation on the first ranking results in the cache according to the publisher state refreshed in real time, updating them in real time. Because data throughput in the cache is high and only a simple forward-index computation is required, publisher-state updates are reflected in the ranking results promptly, which substantially improves the accuracy of search results.
Optionally, the data segment is managed as multiple segments. Each segment stores only the data generated within a preset time range, and different segments correspond to different preset time ranges.
Optionally, the forward-index computing unit is specifically configured to: when a document in the first ranking results of the data segment has been deleted, delete the corresponding document from the cache; and perform the forward-index computation on the first ranking results in the cache according to the publisher state refreshed in real time, so as to update the first ranking results in real time.
In another embodiment, as shown in Fig. 4, the index ranking update device provided by the present invention may further include:
a query unit 43, configured to perform a result-set query in the cache according to a second query request after the forward-index computation has updated the first ranking results in the cache in real time according to the publisher state;
an acquiring unit 44, configured to obtain the result set from the data in the cache if the result set exists in the cache, and, if it does not, to trigger the inverted-index computing unit 41 and the forward-index computing unit 42 in sequence and obtain second ranking results once the inverted-index and forward-index computations are complete.
Although preferred embodiments of the present invention have been disclosed for illustrative purposes, those skilled in the art will recognize that various improvements, additions and substitutions are possible; therefore, the scope of the present invention is not limited to the above embodiments.
Claims (8)
1. An index ranking update method, characterized by comprising:
performing an inverted-index computation on a data segment according to a first query request to obtain first ranking results, and storing the first ranking results in a cache;
performing a forward-index computation on the first ranking results in the cache according to a publisher state, so as to update the first ranking results in real time;
wherein the data segment is managed as multiple segments, each segment stores data generated within a preset time range, and different segments correspond to different preset time ranges; the multiple segments include a writer segment and multiple reader segments, and the preset time range corresponding to a reader segment increases as the distance between that reader segment and the writer segment increases.
2. The method according to claim 1, characterized in that performing the inverted-index computation on the data segment according to the first query request to obtain the first ranking results comprises:
performing the inverted-index computation on each segment according to the first query request to obtain the first ranking results.
3. The method according to claim 1, characterized in that performing the forward-index computation on the first ranking results in the cache according to the publisher state refreshed in real time, so as to update the first ranking results in real time, comprises:
when a document in the first ranking results of the data segment has been deleted, deleting the corresponding document from the cache;
performing the forward-index computation on the first ranking results in the cache according to the publisher state refreshed in real time, so as to update the first ranking results in real time.
4. The method according to claim 1, characterized in that the publisher state includes a user attribute of the publisher or an operation behavior of the publisher.
5. The method according to any one of claims 1 to 4, characterized in that, after performing the forward-index computation on the first ranking results in the cache according to the publisher state so as to update the first ranking results in real time, the method further comprises:
performing a result-set query in the cache according to a second query request;
if the result set exists in the cache, obtaining the result set from the data in the cache;
if the result set does not exist in the cache, performing the inverted-index computation and the forward-index computation in sequence to obtain second ranking results.
6. An index ranking update device, characterized by comprising:
an inverted-index computing unit, configured to perform an inverted-index computation on a data segment according to a first query request to obtain first ranking results, and to store the first ranking results in a cache;
a forward-index computing unit, configured to perform a forward-index computation on the first ranking results in the cache according to a publisher state, so as to update the first ranking results in real time;
wherein the data segment is managed as multiple segments, each segment stores data generated within a preset time range, and different segments correspond to different preset time ranges; the multiple segments include a writer segment and multiple reader segments, and the preset time range corresponding to a reader segment increases as the distance between that reader segment and the writer segment increases.
7. The device according to claim 6, characterized in that the forward-index computing unit is specifically configured to:
when a document in the first ranking results of the data segment has been deleted, delete the corresponding document from the cache;
perform the forward-index computation on the first ranking results in the cache according to the publisher state refreshed in real time, so as to update the first ranking results in real time.
8. The device according to claim 6 or 7, characterized by further comprising:
a query unit, configured to perform a result-set query in the cache according to a second query request after the forward-index computation has updated the first ranking results in the cache in real time according to the publisher state;
an acquiring unit, configured to obtain the result set from the data in the cache if the result set exists in the cache, and, if it does not, to trigger the inverted-index computing unit and the forward-index computing unit in sequence and obtain second ranking results once the inverted-index and forward-index computations are complete.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201510125423.0A CN104765782B (en) | 2015-03-20 | 2015-03-20 | A kind of index order update method and device |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201510125423.0A CN104765782B (en) | 2015-03-20 | 2015-03-20 | A kind of index order update method and device |
Publications (2)
Publication Number | Publication Date |
---|---|
CN104765782A CN104765782A (en) | 2015-07-08 |
CN104765782B CN104765782B (en) | 2019-06-21 |
Family ID: 53647613
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201510125423.0A Active CN104765782B (en) | 2015-03-20 | 2015-03-20 | A kind of index order update method and device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN104765782B (en) |
Families Citing this family (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105677813A (en) * | 2015-12-30 | 2016-06-15 | 五八有限公司 | Information display method and device |
CN106294691B (en) * | 2016-08-04 | 2020-03-03 | 广州交易猫信息技术有限公司 | List refreshing method and device and server |
CN110750535B (en) * | 2019-09-27 | 2024-02-02 | 上海麦克风文化传媒有限公司 | Ordering result updating method |
CN111787351B (en) * | 2020-07-01 | 2022-09-06 | 百度在线网络技术(北京)有限公司 | Information query method, device, equipment and computer storage medium |
CN116303140B (en) * | 2023-05-19 | 2023-08-29 | 珠海妙存科技有限公司 | Hardware-based sorting algorithm optimization method and device |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102867070A (en) * | 2012-09-29 | 2013-01-09 | 瑞庭网络技术(上海)有限公司 | Method for updating cache of key-value distributed memory system |
CN103177117A (en) * | 2013-04-08 | 2013-06-26 | 北京奇虎科技有限公司 | Information index system and information index update method |
CN103218423A (en) * | 2013-04-02 | 2013-07-24 | 中国科学院信息工程研究所 | Data inquiry method and device |
CN103970853A (en) * | 2014-05-05 | 2014-08-06 | 浙江宇视科技有限公司 | Method and device for optimizing search engine |
Family Cites Families (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN100573520C (en) * | 2006-08-29 | 2009-12-23 | 国际商业机器公司 | For retrieval is carried out pretreated method and apparatus to a plurality of documents |
US9424351B2 (en) * | 2010-11-22 | 2016-08-23 | Microsoft Technology Licensing, Llc | Hybrid-distribution model for search engine indexes |
- 2015-03-20 CN201510125423.0A, patent CN104765782B, status: Active
Also Published As
Publication number | Publication date |
---|---|
CN104765782A (en) | 2015-07-08 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN104765782B (en) | A kind of index order update method and device | |
CN104850572B (en) | HBase non-primary key index construct and querying method and its system | |
CN107103032B (en) | Mass data paging query method for avoiding global sequencing in distributed environment | |
CN100458779C (en) | Index and its extending and searching method | |
CN102890722B (en) | Indexing method applied to time sequence historical database | |
US9760636B1 (en) | Systems and methods for browsing historical content | |
CN112437916A (en) | Incremental clustering of database tables | |
CN105630864A (en) | Forced ordering of a dictionary storing row identifier values | |
US10565198B2 (en) | Bit vector search index using shards | |
CN104268295B (en) | A kind of data query method and device | |
Bender et al. | Exponential structures for efficient cache-oblivious algorithms | |
JP2017194778A (en) | Tuning device and method for relational database | |
CN103488684A (en) | Electricity reliability index rapid calculation method based on caching data multithread processing | |
US20140222828A1 (en) | Columnwise Storage of Point Data | |
CN106682042B (en) | A kind of relation data caching and querying method and device | |
TWI539306B (en) | Information delivery method, processing server and merge server | |
Huang et al. | Mining frequent and top-k high utility time interval-based events with duration patterns | |
CN110162522A (en) | A kind of distributed data search system and method | |
CN104834719B (en) | Applied to the Database Systems under real-time big data scene | |
Leong Hou et al. | Durable top-k search in document archives | |
CN104536992B (en) | The expanding method and device of keyword | |
CN106484818A (en) | A kind of hierarchy clustering method based on Hadoop and HBase | |
CN106372123A (en) | Tag-based related content recommendation method and system | |
CN106372127B (en) | The diversity figure sort method of large-scale graph data based on Spark | |
CN109684331A (en) | A kind of object storage meta data management device and method based on Kudu |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
EXSB | Decision made by sipo to initiate substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant |