Summary of the invention
The technical problem to be solved by this application is to provide a method for updating the index of ad data, so that the index of a sponsored search engine is created and updated in real time, quickly and efficiently, while saving resources and avoiding performance bottlenecks, thereby effectively improving the retrieval performance of the sponsored search engine.
Accordingly, the application also provides a device for updating the index of ad data, to ensure the application and implementation of the above method in practice.
To solve the above problem, the application discloses a method for updating the index of ad data, comprising:
A master index update step performed at a first preset time interval, specifically comprising:
Recording, in a count table, the information of the ad data record with the latest update time in the current advertising database;
Extracting, according to the update time of the current master index, a first set of ad data records in the effective state from the current advertising database;
Building a master index for the first set of ad data records in the effective state;
And,
An increment index update step performed at a second preset time interval after each master index update step has been executed, wherein the second preset time interval is less than the first preset time interval; specifically comprising:
Obtaining from the advertising database, according to the information of the ad data record recorded in the count table, the ad data records newly added after the update time of that ad data record;
Extracting, according to the update time of the current increment index, a second set of ad data records in the effective state from the newly added ad data records;
Building an increment index for the second set of ad data records in the effective state;
The master index update step also comprises:
Judging whether the size of the master index built in the current round exceeds a preset threshold, and if not, writing the master index into memory;
The increment index update step also comprises:
Writing the increment index built in the current round into memory.
Preferably, the method also comprises:
After the master index update step has been executed, if a value of a preset attribute of an ad data record changes in the advertising database, writing the information of that ad data record into a document amendment vector;
The master index update step also comprises:
Emptying the document amendment vector;
The increment index update step also comprises:
Extracting the corresponding ad data records from the advertising database according to the information of the ad data records recorded in the document amendment vector;
The step of building an increment index for the second set of ad data records in the effective state comprises: building the increment index for the second set of ad data records in the effective state together with the corresponding ad data records extracted from the advertising database.
Preferably, the master index update step also comprises:
Backing up the master index built in the current round;
The increment index update step also comprises:
Backing up the increment index built in the current round.
Preferably, the ad data record has a term of validity, and the step of extracting, according to the update time of the current master index, the first set of ad data records in the effective state from the current advertising database comprises:
Traversing the ad data records in the current advertising database and judging, one by one, whether the update time of the current master index falls within the term of validity of the ad data record; if so, determining that the ad data record is in the effective state and putting it into the first set;
The step of extracting, according to the update time of the current increment index, the second set of ad data records in the effective state from the newly added ad data records comprises:
Traversing the newly added ad data records and judging, one by one, whether the update time of the current increment index falls within the term of validity of the ad data record; if so, determining that the ad data record is in the effective state and putting it into the second set.
Preferably, the preset attribute of the ad data record is a full-text search attribute, where a full-text search attribute refers to an attribute used for indexing;
The method also comprises:
If a non-full-text-search attribute of an ad data record changes in the advertising database, directly modifying the corresponding attribute value of that ad data record in the corresponding index;
If an ad data record is deleted from the advertising database, setting a certain attribute value among the non-full-text-search attributes of that ad data record in the corresponding index to invalid.
Preferably, before the sub-step of judging, one by one, whether the update time of the current master index falls within the term of validity of the ad data record, the method also comprises the following sub-step:
Converting the update time of the current master index into a preset format;
Before the sub-step of judging, one by one, whether the update time of the current increment index falls within the term of validity of the ad data record, the method also comprises the following sub-step:
Converting the update time of the current increment index into a preset format.
Preferably, the ad data record has a corresponding ID, the ad data record with the latest update time in the current advertising database is the ad data record with the largest ID value, and the information of the ad data record recorded in the count table is the ID of the ad data record with the largest ID value.
The application also discloses a system for updating the index of ad data, comprising:
A master index update module, for updating the master index at a first preset time interval, specifically comprising the following submodules:
A count table recording submodule, for recording in a count table the information of the ad data record with the latest update time in the current advertising database;
A first valid data extraction submodule, for extracting, according to the update time of the current master index, a first set of ad data records in the effective state from the current advertising database;
A master index building submodule, for building a master index for the first set of ad data records in the effective state;
And,
An increment index update module, for updating the increment index at a second preset time interval after each master index update has completed, wherein the second preset time interval is less than the first preset time interval; specifically comprising the following submodules:
A count table reading submodule, for obtaining from the advertising database, according to the information of the ad data record recorded in the count table, the ad data records newly added after the update time of that ad data record;
A second valid data extraction submodule, for extracting, according to the update time of the current increment index, a second set of ad data records in the effective state from the newly added ad data records;
A first increment index building submodule, for building an increment index for the second set of ad data records in the effective state;
The master index update module also comprises the following submodule:
A master index memory write submodule, for judging whether the size of the master index built in the current round exceeds a preset threshold, and if not, writing the master index into memory;
The increment index update module also comprises the following submodule:
An increment index memory write submodule, for writing the increment index built in the current round into memory.
Preferably, the system also comprises:
A document amendment vector recording module, for writing the information of an ad data record into a document amendment vector when, after the master index update has completed, a value of a preset attribute of that ad data record changes in the advertising database;
The master index update module also comprises:
An emptying submodule, for emptying the document amendment vector;
The increment index update module also comprises:
A third valid data extraction submodule, for extracting the corresponding ad data records from the advertising database according to the information of the ad data records recorded in the document amendment vector;
The first increment index building submodule is used for building the increment index for the second set of ad data records in the effective state together with the corresponding ad data records extracted from the advertising database.
Compared with the prior art, the application has the following advantages:
The application builds the index only for ad data records in the effective state, which effectively reduces the size of the index itself and improves the speed of updating the index.
Moreover, the application adopts a two-level index storage scheme that makes as much use of in-memory indexes as possible. The increment index is stored directly in memory. For the master index, a threshold is first set according to the memory size; when the size of the master index does not exceed this threshold, the master index is stored in memory. Index files stored in memory can be read and written very quickly, which significantly improves retrieval speed and further improves retrieval performance.
The application rebuilds index entries only when the modified attribute is a full-text search attribute of an ad data record, and uses a document amendment vector to save the IDs of the modified ad data records. When the increment index is updated, in addition to querying the ad data records newly added after the master index update time, the corresponding ad data records are also queried according to the IDs saved in the amendment vector, and the increment index is built for these ad data records, which improves the speed of updating the index when documents are modified.
Embodiment
To make the above objects, features and advantages of the application easier to understand, the application is described in further detail below with reference to the drawings and specific embodiments.
Referring to Fig. 1, a flow chart of the steps of embodiment 1 of the method for updating the index of ad data of the application is shown, which may specifically comprise the following steps:
Step 11, a master index update step performed at a first preset time interval, which may specifically comprise the following sub-steps:
Sub-step 111, recording in a count table the information of the ad data record with the latest update time in the current advertising database;
As an example of a concrete application of the embodiment of the application, the ad data record has a corresponding ID, such as 245 or 298, and the ad data record with the latest update time in the current advertising database is the ad data record with the largest ID value. That is, the information of the ad data record recorded in the count table is the ID of the ad data record with the largest ID value.
In the embodiment of the application, the time node of the ad data record with the latest update time in the current advertising database can be used to define the boundary between the data queried for the master index and the data queried for the increment index: the data queried to build the master index are the ad data records whose update time is before this time node, and the data queried to build the increment index are the ad data records whose update time is after this time node. In practice, the ID of the ad data record with the largest ID value can serve as the upper limit of the IDs queried for the master index, so that the data queried to build the master index are the ad data records whose ID values are less than or equal to this maximum ID value; the same ID can serve as the lower limit of the IDs queried for the increment index, so that the data queried to build the increment index are the ad data records whose ID values are greater than this maximum ID value.
In a specific implementation, the count table can be stored on disk as a file that holds the ID of the ad data record with the currently largest ID value and is used to assist in building the indexes. The count table is created when the system is deployed and does not need to be created again while the system is running.
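The count table described above can be sketched as a one-value file. This is only an illustrative sketch: the file name `count_table.txt`, the helper names, and the plain-text encoding of the ID are assumptions, since the application only states that the count table is a file on disk holding the largest ID.

```python
import os
import tempfile


def write_count_table(path, max_ad_id):
    """Overwrite the count table with the current largest ad ID.

    Each master index update replaces the previously written value.
    """
    with open(path, "w", encoding="utf-8") as f:
        f.write(str(max_ad_id))


def read_count_table(path):
    """Return the ad ID stored by the most recent master index update."""
    with open(path, "r", encoding="utf-8") as f:
        return int(f.read().strip())


# Hypothetical IDs present in the advertising database at master-update time.
ad_ids = [234, 260, 298]

# A temporary directory stands in for the deployment directory (assumption).
count_table_path = os.path.join(tempfile.mkdtemp(), "count_table.txt")
write_count_table(count_table_path, max(ad_ids))
boundary_id = read_count_table(count_table_path)
```

With this boundary, records with `ad_id <= boundary_id` would feed the master index and records with `ad_id > boundary_id` would feed the increment index.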
Sub-step 112, extracting, according to the update time of the current master index, a first set of ad data records in the effective state from the current advertising database;
It should be noted that the ad data handled by a sponsored search engine differs from the document data indexed by a general search engine: ad data is displayed only during a delivery period, and only ad data within its delivery period is effective. That is, every ad data record referred to in this application has a corresponding term of validity; for example, the term of validity of a certain ad data record may run from September 11, 2011 to November 11, 2011. For a long time, research by those skilled in the art on improving search engine retrieval performance has concentrated on areas such as crawler optimization, optimized ranking of retrieval results, and retrieval server performance. This application instead focuses on the fact that ad data has a term of validity, and is designed to build the index only for ad data in the effective state. It thereby overcomes the technical prejudice that improvements must be made in those areas, and changes the source data of the index directly. Without changing or adding hardware, it reduces the collection of invalid data, reduces the size of the index itself, and improves the speed of updating the index, thereby improving the retrieval performance of the sponsored search engine.
Given this business characteristic of ad data, advertisements that have not yet taken effect or that have expired should not be indexed. In the embodiment of the application, according to this characteristic, whether an ad data record is in the effective state can be judged when the index is built or updated.
As an example of a concrete application of the embodiment of the application, sub-step 112 may specifically operate as follows:
Traverse the ad data records in the current advertising database and judge, one by one, whether the update time of the current master index falls within the term of validity of the ad data record; if so, determine that the ad data record is in the effective state and put it into the first set.
In a preferred embodiment of the application, before judging whether the update time of the current master index falls within the term of validity of the ad data record, the update time of the current master index can first be converted into a preset format, for example into the number of seconds elapsed since midnight on January 1, 1970, Greenwich Mean Time, represented as a 32-bit unsigned integer. For example, if the update time of the current master index is 2011-9-21, the value after conversion is 1316534400.
For example, when the update time of the current master index is the early morning of 2011-9-22, 2011-9-22 is converted into 1316620800;
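The conversion above can be sketched as follows. The sample values in this application correspond to midnight local time at UTC+8, so the sketch assumes that time zone; the zone and the helper name `to_epoch_seconds` are assumptions for illustration and are not stated in the application.

```python
from datetime import datetime, timedelta, timezone

# Assumption: update times are local dates at UTC+8, which is the offset
# that reproduces the sample values given in the text.
LOCAL_TZ = timezone(timedelta(hours=8))


def to_epoch_seconds(year, month, day):
    """Convert a local calendar date (at 00:00) to seconds since 1970-01-01 GMT."""
    local_midnight = datetime(year, month, day, tzinfo=LOCAL_TZ)
    return int(local_midnight.timestamp())


ts_sep_21 = to_epoch_seconds(2011, 9, 21)
ts_sep_22 = to_epoch_seconds(2011, 9, 22)

# Both values fit in a 32-bit unsigned integer, as the text requires.
fits_uint32 = 0 <= ts_sep_22 < 2 ** 32
```

Storing the time as a single integer lets the validity check be a pair of integer comparisons against start_date and end_date.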
The ad data records in the current advertising database are shown in Table 1 below:
ad_id | title | word | region_id | start_date | end_date
234 | The family of mobile phone | Mobile phone | 10 | 1316448000 | 1316880000
260 | Notebook sale monopoly | Notebook | 121319 | 1316448000 | 1316534400
298 | IPAD appoints you to choose | IPAD | 101220 | 1316534400 | 1316880000
In the above table, ad_id is the ID value of the ad data record, title is the title of the ad data record, word is the keyword of the ad data record, region_id is the ID of the region in which the ad is delivered, start_date is the start time of the term of validity of the ad data record, and end_date is the end time of the term of validity of the ad data record.
In this example, according to the update time of the current master index, 1316620800, it is judged whether this value lies between the start_date and end_date of each ad data record. The ad data records with IDs 234 and 298 satisfy the condition, so these two ad data records are in the effective state; the record with ID 260 does not, because its end_date, 1316534400, is earlier than the update time. The first set of ad data records in the effective state extracted from the advertising database shown in Table 1 therefore comprises the ad data records whose ad_id is 234 and 298.
When querying ad data, invalid ad data records can be excluded by this step, which reduces the size of the index itself and improves the speed of updating the index.
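The validity check of sub-step 112 can be sketched on the Table 1 data as follows. The dictionary layout and the function names are assumptions for illustration, and the bounds are treated as inclusive, which the application does not state explicitly.

```python
# Records from Table 1 (only the fields needed for the validity check).
TABLE_1 = [
    {"ad_id": 234, "word": "Mobile phone", "start_date": 1316448000, "end_date": 1316880000},
    {"ad_id": 260, "word": "Notebook", "start_date": 1316448000, "end_date": 1316534400},
    {"ad_id": 298, "word": "IPAD", "start_date": 1316534400, "end_date": 1316880000},
]


def is_effective(record, update_time):
    """True if update_time lies within the record's term of validity.

    Assumption: both ends of the term of validity are inclusive.
    """
    return record["start_date"] <= update_time <= record["end_date"]


def extract_effective(records, update_time):
    """Traverse the records one by one and keep those in the effective state."""
    return [record for record in records if is_effective(record, update_time)]


master_update_time = 1316620800  # 2011-9-22, as in the example above
first_set = extract_effective(TABLE_1, master_update_time)
first_set_ids = [record["ad_id"] for record in first_set]
```

Only the records in `first_set` would then be passed on to the index builder.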
Sub-step 113, building a master index for the first set of ad data records in the effective state;
In a preferred embodiment of the application, an inverted index can be used as the indexing mechanism, with the inverted index structure serving as the index structure of the master index. Based on the first set obtained in sub-step 112, the master index is updated by rebuilding it.
The correspondence in an ordinary index is from "document number" to "all words in the document". An inverted index reverses this relation, mapping from "word" to "all document numbers in which this word appears", so that all documents containing given words can be retrieved quickly. In practice, an inverted index usually also contains information such as the number of times a word appears in a document and its specific positions. For convenient searching, the inverted list is normally ordered.
An example of an inverted index is given below:
Suppose there are two articles, 1 and 2:
The content of article 1 is: Tom lives in Guangzhou, I live in Guangzhou too.
The content of article 2 is: He once lived in Shanghai.
1) First we need to obtain the keywords of these two articles. We usually need to take the following processing steps:
A. What we have now is the article content, i.e. a character string. We first need to find all the words in the string, i.e. perform word segmentation. English words are separated by spaces and are relatively easy to handle. Chinese words run together and require dedicated word segmentation.
B. Words in the articles such as "in", "once" and "too" carry no practical meaning; in Chinese, words such as "的" and "是" likewise usually carry no concrete meaning. Such words, which do not represent a concept, can be filtered out.
C. Users usually want a search for "He" to also find articles containing "he" or "HE", so the case of all words needs to be unified.
D. Users usually want a search for "live" to also find articles containing "lives" or "lived", so "lives" and "lived" need to be reduced to "live".
E. Punctuation marks in the articles usually do not represent any concept and can also be filtered out.
After the above processing, all keywords of article 1 are: [tom] [live] [guangzhou] [i] [live] [guangzhou].
All keywords of article 2 are: [he] [live] [shanghai].
2) Once we have the keywords, we can build the inverted index. The correspondence above is: "article number" to "all keywords in the article". The inverted index reverses this relation into: "keyword" to "all article numbers containing this keyword". After inversion, articles 1 and 2 become:
keyword | article numbers
guangzhou | 1
he | 2
i | 1
live | 1, 2
shanghai | 2
tom | 1
Usually it is not enough to know only in which articles a keyword appears; we also need to know how many times the keyword appears in an article and at which positions. There are usually two kinds of position: a) character position, which records which character of the article the word is (the advantage is fast positioning when keywords are highlighted); b) keyword position, which records which keyword of the article the word is (the advantage is that index space is saved and phrase queries are fast).
After adding the "frequency of occurrence" and "position of occurrence" information, our index structure becomes:
keyword | article number [frequency of occurrence] | positions of occurrence
guangzhou | 1[2] | 3, 6
he | 2[1] | 1
i | 1[1] | 4
live | 1[2], 2[1] | 2, 5, 2
shanghai | 2[1] | 3
tom | 1[1] | 1
Taking the row for live as an example, we explain this structure: live appears 2 times in article 1 and once in article 2. What do its positions of occurrence, "2, 5, 2", mean? We need to read them together with the article numbers and frequencies: live appears 2 times in article 1, so "2, 5" are the two positions at which live appears in article 1; it appears once in article 2, so the remaining "2" means that live is the 2nd keyword in article 2.
The inverted index is currently the most widely used way of building an index, and it performs well for word-based queries.
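The two-article example can be reproduced with a small positional inverted index. This is a toy sketch: the stop-word set and the lookup table that reduces "lives" and "lived" to "live" are hard-coded assumptions standing in for real word segmentation and stemming, and positions are keyword positions as in option b) of the text.

```python
from collections import defaultdict

STOP_WORDS = {"in", "once", "too"}          # words carrying no concept (step B)
LEMMAS = {"lives": "live", "lived": "live"}  # toy stand-in for stemming (step D)


def extract_keywords(text):
    """Split on spaces, strip punctuation, lowercase, drop stop words, reduce forms."""
    keywords = []
    for raw in text.split():
        word = raw.strip(".,;:!?").lower()
        if not word or word in STOP_WORDS:
            continue
        keywords.append(LEMMAS.get(word, word))
    return keywords


def build_inverted_index(articles):
    """Map each keyword to {article number: [keyword positions, starting at 1]}."""
    index = defaultdict(dict)
    for article_no, text in articles.items():
        for position, keyword in enumerate(extract_keywords(text), start=1):
            index[keyword].setdefault(article_no, []).append(position)
    return index


articles = {
    1: "Tom lives in Guangzhou, I live in Guangzhou too.",
    2: "He once lived in Shanghai.",
}
index = build_inverted_index(articles)
live_postings = index["live"]  # article 1 at positions 2 and 5, article 2 at position 2
```

The frequency of a keyword in an article is simply the length of its position list, so it does not need to be stored separately in this sketch.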
Step 12, an increment index update step performed at a second preset time interval after each master index update step has been executed, wherein the second preset time interval is less than the first preset time interval;
In the embodiment of the application, the master index stores the index data of the advertising records that changed before the set update moment, and the increment index stores the index data of the advertising records that changed after the set update moment. For example, the update moment can be set to the early morning of each day. Within a given day, the master index then stores the index data of the effective advertisements that were already in the advertisement database before the early morning of that day, and the increment index stores the index data of the effective advertisements that were inserted, deleted or updated after the early morning of that day. That is, the master index is rebuilt automatically once in the early morning of each day; after the master index has been created, updating of the increment index starts, and during that day the increment index is rebuilt automatically at regular intervals, for example every 3 minutes.
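The relation between the two intervals can be sketched as a simple schedule for one master index cycle. The function name `plan_updates` is hypothetical, and the sketch assumes, for simplicity, that the master rebuild itself takes no time; a real deployment would start the increment cycle only once the master rebuild has actually finished.

```python
def plan_updates(master_interval_s, increment_interval_s):
    """Return (offset_seconds, kind) events for one master index cycle.

    Offsets are measured from the moment the master rebuild starts.
    """
    if increment_interval_s >= master_interval_s:
        raise ValueError("the second interval must be less than the first")
    events = [(0, "master")]
    # Increment rebuilds run after the master rebuild and stop before the
    # next master rebuild begins.
    for offset in range(increment_interval_s, master_interval_s, increment_interval_s):
        events.append((offset, "increment"))
    return events


# Daily master rebuild, increment rebuild every 3 minutes, as in the example.
events = plan_updates(24 * 60 * 60, 3 * 60)
increment_count = sum(1 for _, kind in events if kind == "increment")
```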
Step 12 may specifically comprise the following sub-steps:
Sub-step 121, obtaining from the advertising database, according to the information of the ad data record recorded in the count table, the ad data records newly added after the update time of that ad data record;
During the execution of sub-step 111, the information of one ad data record is written. In a concrete application of the embodiment of the application, the written information is the ID of that ad data record, i.e. the count table stores the ID of one ad data record. In practice, each master index update operation stores the ID of the latest ad data record it obtains in the count table, and the previously written value is overwritten by the newly written value.
In the embodiment of the application, the time node of the ad data record recorded in the count table can be used to define the boundary between the data queried for the master index and the data queried for the increment index: the data queried to build the master index are the ad data records whose update time is before this time node, and the data queried to build the increment index are the ad data records whose update time is after this time node. In practice, the ID of the ad data record with the largest ID value can serve as the upper limit of the IDs queried for the master index, so that the data queried to build the master index are the ad data records whose ID values are less than or equal to this maximum ID value; the same ID can serve as the lower limit of the IDs queried for the increment index, so that the data queried to build the increment index are the ad data records whose ID values are greater than this maximum ID value.
Sub-step 122, extracting, according to the update time of the current increment index, a second set of ad data records in the effective state from the newly added ad data records;
As an example of a concrete application of the application, sub-step 122 may specifically operate as follows:
Traverse the newly added ad data records and judge, one by one, whether the update time of the current increment index falls within the term of validity of the ad data record; if so, determine that the ad data record is in the effective state and put it into the second set.
In a preferred embodiment of the application, before judging whether the update time of the current increment index falls within the term of validity of the ad data record, the update time of the current increment index can first be converted into a preset format, for example into the number of seconds elapsed since midnight on January 1, 1970, Greenwich Mean Time, represented as a 32-bit unsigned integer. For example, if the update time of the current increment index is 2011-9-22, the value after conversion is 1316620800.
For example, when the update time of the current increment index is 13:00 on September 22, 2011, 2011-9-22 is converted into 1316620800;
The ad data records in the current advertising database are shown in Table 2 below:
ad_id | title | word | region_id | start_date | end_date
234 | The window of mobile phone | Mobile phone | 1012 | 1316448000 | 1316880000
260 | Notebook sale monopoly | Notebook | 121319 | 1316448000 | 1316534400
298 | IPAD appoints you to choose | IPAD | 101220 | 1316534400 | 1316880000
310 | IPhone sale monopoly | IPhone | 122021 | 1316620800 | 1316880000
In the above table, ad_id is the ID value of the ad data record, title is the title of the ad data record, word is the keyword of the ad data record, region_id is the ID of the region in which the ad is delivered, start_date is the start time of the term of validity of the ad data record, and end_date is the end time of the term of validity of the ad data record.
In this example, according to the update time of the current increment index, 1316620800, it is judged, for the newly added ad data records (those not present in Table 1), whether this value lies between the start_date and end_date of the ad data record. The ad data record with ID 310 satisfies the condition, so this ad data record is in the effective state. The second set of ad data records in the effective state extracted from the advertising database shown in Table 2 therefore comprises the ad data record whose ad_id is 310.
When querying ad data, invalid ad data records can be excluded by this step, which reduces the size of the index itself and improves the speed of updating the index.
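Sub-steps 121 and 122 can be sketched together on the Table 2 data. The boundary ID 298 is the largest ad_id in Table 1, which is what the count table would hold after the master update in that example; the function name, the dictionary layout, and the inclusive bounds are assumptions for illustration.

```python
# Records from Table 2 (only the fields needed here).
TABLE_2 = [
    {"ad_id": 234, "start_date": 1316448000, "end_date": 1316880000},
    {"ad_id": 260, "start_date": 1316448000, "end_date": 1316534400},
    {"ad_id": 298, "start_date": 1316534400, "end_date": 1316880000},
    {"ad_id": 310, "start_date": 1316620800, "end_date": 1316880000},
]


def extract_second_set(records, boundary_id, update_time):
    """Keep records added after the count-table boundary that are in the effective state."""
    # Sub-step 121: newly added records have IDs above the recorded maximum ID.
    newly_added = [r for r in records if r["ad_id"] > boundary_id]
    # Sub-step 122: of those, keep records whose term of validity covers update_time
    # (assumption: inclusive bounds, so a record starting exactly now counts).
    return [r for r in newly_added if r["start_date"] <= update_time <= r["end_date"]]


boundary_id = 298                   # value read from the count table (assumed)
increment_update_time = 1316620800  # as in the example above
second_set_ids = [
    r["ad_id"] for r in extract_second_set(TABLE_2, boundary_id, increment_update_time)
]
```

Records 234 and 298 are also within their terms of validity, but they are excluded here because the master index already covers them.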
Sub-step 123, building an increment index for the second set of ad data records in the effective state.
In a preferred embodiment of the application, the inverted index structure can be used as the index structure of the increment index. Based on the set of ad data records obtained in sub-step 122, the increment index is updated by rebuilding it.
As a preferred embodiment of the application, the master index update step may also comprise the following step:
Judging whether the size of the master index built in the current round exceeds a preset threshold, and if not, writing the master index into memory;
The increment index update step may also comprise the following step:
Writing the increment index built in the current round into memory.
To further improve retrieval performance, the application adopts a two-level index storage scheme. The increment index can be stored directly in memory. For the master index, a threshold can first be set according to the memory size; when the size of the master index does not exceed this threshold, the master index is stored in memory. Index files stored in memory can be read and written very quickly, which significantly improves retrieval speed. In practice, when the size of the master index exceeds the set threshold, the master index is stored on disk.
Specifically, a web page search engine needs to process massive amounts of web data; because the volume of index data is huge, the index cannot be stored in memory, and the general practice is to store it on disk. In this application, because the increment index only reflects the ad data that changed on the current day, the increment index is small and can be placed entirely in memory. As for the master index, the number of advertisements the application needs to process is at most on the order of millions, far fewer than the number of web pages retrieved by a web page search engine. Placing the master index in memory is therefore feasible for the application. For example, a directory structure for storing index files can be created in the memory file system "/dev/shm", and the increment index and any master index that meets the preset threshold can be written under this directory structure.
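The two-level storage decision can be sketched as follows. The function names, file names and threshold value are hypothetical; temporary directories stand in for a directory under "/dev/shm" and for the on-disk index directory so that the sketch runs on any system.

```python
import os
import tempfile


def choose_storage_dir(index_size, threshold, memory_dir, disk_dir, is_increment):
    """Pick the directory for an index file under the two-level scheme."""
    if is_increment:
        return memory_dir  # the increment index always goes to memory
    # The master index goes to memory only if it does not exceed the threshold.
    return memory_dir if index_size <= threshold else disk_dir


def write_index(data, name, threshold, memory_dir, disk_dir, is_increment=False):
    """Write serialized index bytes to the chosen directory and return the path."""
    target_dir = choose_storage_dir(len(data), threshold, memory_dir, disk_dir, is_increment)
    os.makedirs(target_dir, exist_ok=True)
    path = os.path.join(target_dir, name)
    with open(path, "wb") as f:
        f.write(data)
    return path


root = tempfile.mkdtemp()
memory_dir = os.path.join(root, "shm_index")   # stand-in for a /dev/shm directory
disk_dir = os.path.join(root, "disk_index")    # stand-in for the disk directory
threshold = 100                                 # bytes; illustrative value only

small_master = write_index(b"x" * 10, "master.idx", threshold, memory_dir, disk_dir)
large_master = write_index(b"x" * 1000, "master.idx", threshold, memory_dir, disk_dir)
increment = write_index(b"x" * 1000, "increment.idx", threshold, memory_dir, disk_dir,
                        is_increment=True)
```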
To ensure consistency and to prevent index files stored in memory from being lost when the machine loses power or the system restarts, corresponding backups can be maintained on disk. Thus, in a specific implementation, the master index update step 11 may also comprise the following step:
Backing up the master index built in the current round;
The increment index update step 12 may also comprise the following step:
Backing up the increment index built in the current round.
In a concrete application, a storage directory for the master index can be created on disk at the system deployment stage for master index backups; after the master index has been updated, the latest master index is backed up to the master index storage directory on disk. Likewise, a storage directory for the increment index can be created on disk at the system deployment stage for increment index backups; after the increment index has been updated, the latest increment index is backed up to the increment index storage directory on disk.
With reference to figure 2, show the flow chart of steps of the index updating method embodiment 2 of a kind of ad data of the application, specifically can comprise the steps:
Step 21, the master index step of updating of carrying out according to the first Preset Time interval, specifically can comprise following sub-step:
Sub-step 211, in count table, record in current advertising database, update time up-to-date ad data record information;
Sub-step 212, foundation are when the update time of time master index, and from current advertising database, first of the ad data record of extraction in effective status gathers;
Master index is set up in sub-step 213, first set of recording for the described ad data in effective status;
Sub-step 214, empty document amendment vector;
Sub-step 215, by when time set up master index back up;
And,
Step 22, after each master index update step has been executed, an increment index update step carried out at a second preset time interval, wherein the second preset time interval is shorter than the first preset time interval; this step may specifically comprise the following sub-steps:
Sub-step 221, according to the information of the ad data record recorded in the count table, obtaining from the advertising database the ad data records newly added after the update time of that ad data record;
Sub-step 222, according to the update time of the current increment index, extracting from the newly added ad data records a second set of ad data records in effective status;
Sub-step 223, according to the information of the ad data records recorded in the document amendment vector, extracting the corresponding ad data records from the advertising database;
Sub-step 224, building the increment index for the ad data records extracted in sub-step 222 and sub-step 223;
Sub-step 225, backing up the increment index built this time.
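The two update steps above can be sketched in a few lines. This is a minimal sketch under stated assumptions, not the implementation of the application: the class and field names (`AdRecord`, `IndexUpdater`, `start`, `end`) are hypothetical, the advertising database is stood in for by a dict keyed by ad ID, the "index" is a dict from ID to title, timestamps are plain integers, the newest record is assumed to be the one with the largest ID, and the backup sub-steps 215 and 225 are omitted.

```python
from dataclasses import dataclass


@dataclass
class AdRecord:
    ad_id: int
    title: str
    start: int  # start of the validity period (hypothetical integer timestamp)
    end: int    # end of the validity period


def is_effective(rec, now):
    """A record is in effective status if `now` lies within its validity period."""
    return rec.start <= now <= rec.end


class IndexUpdater:
    def __init__(self, database):
        self.database = database    # dict: ad_id -> AdRecord
        self.count_table = 0        # ID of the newest record at the last master update
        self.amendment_vector = []  # IDs modified since the last master update
        self.master_index = {}
        self.increment_index = {}

    def update_master(self, now):
        # Sub-step 211: remember the newest record (assumed: the largest ID).
        self.count_table = max(self.database, default=0)
        # Sub-steps 212 and 213: index every record in effective status.
        self.master_index = {
            r.ad_id: r.title for r in self.database.values() if is_effective(r, now)
        }
        # Sub-step 214: the amended records are now covered by the master index.
        self.amendment_vector.clear()

    def update_increment(self, now):
        # Sub-step 221: records added after the one noted in the count table.
        added = [r for i, r in self.database.items() if i > self.count_table]
        # Sub-step 222: keep only those in effective status.
        second_set = [r for r in added if is_effective(r, now)]
        # Sub-step 223: records whose IDs are held in the amendment vector.
        amended = [self.database[i] for i in self.amendment_vector if i in self.database]
        # Sub-step 224: rebuild the increment index from both groups.
        self.increment_index = {r.ad_id: r.title for r in second_set + amended}
```

In a deployment, `update_master` would be triggered at the first preset time interval and `update_increment` at the shorter second interval, with the increment update held back while a master update is running.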
In practice, after the master index update step has been executed, if a value of a default attribute of an ad data record changes in the advertising database, the information of that ad data record may be written into the document amendment vector. Suppose the master index is automatically rebuilt once in the early morning of every day and, once the master index has been created, updating of the increment index starts, with the increment index automatically rebuilt at regular intervals during the day, for example every 3 minutes. The document amendment vector then holds the IDs of the ad data records modified during the day: after the most recent master index update, whenever a value of a default attribute of an ad data record is modified, the ID of that ad data record is stored in the document amendment vector. When the latest master index has been built, the ad data records corresponding to the IDs held in the document amendment vector have already been incorporated into the master index as part of its data source, so there is no longer any need to build the increment index from those IDs; that is, the advertisement IDs held in the document amendment vector have lost their meaning at that point. Emptying the document amendment vector ensures that it always records only the IDs of the ad data records modified during the current day, and also saves memory.
In a specific implementation, the document amendment vector may be implemented as a variable-length array stored in memory. According to the IDs of the ad data records in the document amendment vector, the corresponding ad data records can be queried from the advertising database, so that the increment index can be built for these ad data records.
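As an illustration of querying the advertising database by the IDs held in the document amendment vector, the following sketch uses an in-memory SQLite database. The table name `ads`, its columns and the function name `fetch_amended_records` are assumptions made for the example; the application does not specify the database or its schema.

```python
import sqlite3


def fetch_amended_records(conn, amendment_vector):
    """Return (id, title) rows for every ID held in the amendment vector."""
    if not amendment_vector:
        return []  # nothing was modified since the last master index update
    # One "?" placeholder per ID keeps the query parameterised.
    placeholders = ",".join("?" * len(amendment_vector))
    sql = f"SELECT id, title FROM ads WHERE id IN ({placeholders}) ORDER BY id"
    return conn.execute(sql, amendment_vector).fetchall()


# Usage with a hypothetical advertising table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ads (id INTEGER PRIMARY KEY, title TEXT)")
conn.executemany(
    "INSERT INTO ads VALUES (?, ?)",
    [(233, "phone case"), (234, "window of mobile phone"), (235, "charger")],
)
amendment_vector = [234, 235]  # a Python list plays the role of the variable-length array
rows = fetch_amended_records(conn, amendment_vector)
```

The rows returned here would then be fed, together with the newly added records, into the increment index build.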
Each document in the index corresponds to multiple attribute fields. As an example of a specific application of the embodiment of the application, the default attribute of the ad data record may be a full-text search attribute, where a full-text search attribute refers to an attribute that is used for indexing in classification; for example, the full-text search attributes may include the title, the description, the keywords and the like.
In a specific application of the embodiment of the application, the method may further comprise the following step:
If a non-full-text search attribute of an ad data record changes in the advertising database, directly modifying the corresponding attribute value of that ad data record in the index.
Specifically, a non-full-text search attribute may refer to an attribute used as a filter condition. For example, the non-full-text search attributes may include the placement region and the placement size of the advertisement.
With the embodiment of the application, a document modification is handled according to the following two cases:
(1) When the attribute modified is a non-full-text search attribute of an ad data record, the corresponding attribute value of that ad data record in the index is modified directly;
For example, when a non-full-text search attribute such as the placement region of an advertisement is modified, it is only necessary to find the index entry corresponding to this advertisement in the index; because the index entry contains all the non-full-text search attributes of the corresponding advertisement, the relevant attribute in the index entry can be updated directly as required by the modification.
(2) When the attribute modified is a full-text search attribute of an ad data record, the ID of the modified ad data record is saved in the document amendment vector.
When a full-text search attribute such as the title of an advertisement is modified, the index entry needs to be rebuilt. The method adopted by the application is to use the document amendment vector to save the IDs of the ad data records that have been modified. For a modification of a full-text search attribute to be reflected in the increment index, when the increment index is updated, in addition to querying the ad data records newly added after the master index update time, the corresponding ad data records are also queried according to the IDs saved in the document amendment vector, and the increment index is built for all of these ad data records.
For example, suppose the advertisement placement region attribute region_id of advertisement 234 in table 2 is modified to "1012". Because the modified attribute is a non-full-text search attribute, the region_id attribute in the index entry corresponding to this ad data record is modified directly in the index. If instead the title attribute of advertisement 234 is modified to "window of mobile phone", then because this attribute is a full-text search attribute, the ID 234 is added to the document amendment vector; the next time the increment index is updated, the increment index update flow reads the ad data record corresponding to 234, so that the change to advertisement 234 is reflected in the latest increment index.
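The two modification cases can be sketched as a single dispatch function. The names `FULL_TEXT_ATTRS` and `apply_modification`, the dict-based index entries and the duplicate check on the vector are assumptions made for this illustration; the sketch also assumes the advertising database itself has already been updated by the caller.

```python
# Hypothetical set of full-text search attributes, following the examples in the text.
FULL_TEXT_ATTRS = {"title", "description", "keyword"}


def apply_modification(index, amendment_vector, ad_id, attr, value):
    """Handle one attribute change of an ad data record."""
    if attr in FULL_TEXT_ATTRS:
        # Case (2): the index entry must be rebuilt, so only note the ID;
        # the next increment index update reads the record from the database.
        if ad_id not in amendment_vector:  # avoiding duplicates is an assumption
            amendment_vector.append(ad_id)
    else:
        # Case (1): filter attributes are stored in the index entry itself
        # and can be overwritten in place.
        index[ad_id][attr] = value


# Usage, following the example of advertisement 234.
index = {234: {"region_id": "1001", "status": 1}}
amendment_vector = []
apply_modification(index, amendment_vector, 234, "region_id", "1012")
apply_modification(index, amendment_vector, 234, "title", "window of mobile phone")
```

After these two calls the index entry carries the new region_id immediately, while the title change is only visible once the increment index has been rebuilt.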
In a specific application of the embodiment of the application, the method may further comprise the following step:
If an ad data record is deleted from the advertising database, setting one of the attribute values among the non-full-text search attributes of that ad data record in the index to invalid.
With the embodiment of the application, when an ad data record is deleted, a certain attribute corresponding to the ad data record to be deleted is set to an invalid value, and this attribute field is used as a filter condition at retrieval time; ad data records with the invalid attribute value are then filtered out, which achieves the effect of "deleting" the ad data record from the index.
For example, the index entries maintained in the index include attributes such as the advertisement placement region, the advertisement placement state and the advertisement placement size. When an ad data record is deleted, the placement state in the index entry corresponding to that record may be set to the invalid state, with the advertisement placement state used as a filter condition. At retrieval time, index entries whose placement state is invalid cannot satisfy the filter condition, and the ad data records corresponding to the filtered-out index entries do not appear in the retrieval result, which achieves the effect of deleting the document. Of course, attributes such as the advertisement placement region or the advertisement placement size may be set in a similar way to achieve the same function, and the application places no limitation on this.
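The "delete by marking invalid" approach can be sketched as follows. The constants `VALID` and `INVALID`, the field names `status` and `region_id`, and the functions `delete_ad` and `search` are hypothetical names chosen for this example only.

```python
VALID, INVALID = 1, 0  # hypothetical encoding of the placement state


def delete_ad(index, ad_id):
    """Mark the index entry invalid instead of physically removing it."""
    entry = index.get(ad_id)
    if entry is not None:
        entry["status"] = INVALID


def search(index, region_id):
    """Return IDs of ads in the given region, using placement state as a filter."""
    return sorted(
        ad_id
        for ad_id, entry in index.items()
        if entry["status"] == VALID and entry["region_id"] == region_id
    )


# Usage: ad 234 disappears from results although its entry is still in the index.
index = {
    233: {"region_id": "1012", "status": VALID},
    234: {"region_id": "1012", "status": VALID},
}
before = search(index, "1012")
delete_ad(index, 234)
after = search(index, "1012")
```

The entry is only removed physically when the index is next rebuilt from the advertising database, where the record no longer exists.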
It should be noted that, in the embodiment of the application, the increment index is neither rebuilt nor updated while the master index is being updated; the increment index is updated only after the master index update has been completed.
For the method embodiments, for simplicity of description, they are all expressed as a series of action combinations, but those skilled in the art should know that the application is not limited by the described order of actions, because according to the application some steps may be performed in other orders or simultaneously. Furthermore, those skilled in the art should also know that the embodiments described in the specification are all preferred embodiments, and the actions involved are not necessarily required by the application.
With reference to Figure 3, a structural block diagram of an embodiment of the index updating system for ad data of the application is shown, which may specifically comprise the following modules:
Master index update module 31, for updating the master index at a first preset time interval, specifically comprising the following submodules:
Count table recording submodule 311, for recording in a count table the information of the ad data record with the latest update time in the current advertising database;
First valid data extraction submodule 312, for extracting from the current advertising database, according to the update time of the current master index, a first set of ad data records in effective status;
Master index building submodule 313, for building the master index for the first set of ad data records in effective status;
And,
Increment index update module 32, for updating the increment index at a second preset time interval after each master index update has been completed, wherein the second preset time interval is shorter than the first preset time interval; specifically comprising the following submodules:
Count table reading submodule 321, for obtaining from the advertising database, according to the information of the ad data record recorded in the count table, the ad data records newly added after the update time of that ad data record;
Second valid data extraction submodule 322, for extracting from the newly added ad data records, according to the update time of the current increment index, a second set of ad data records in effective status;
First increment index building submodule 323, for building the increment index for the second set of ad data records in effective status.
In a specific application, each ad data record has a corresponding ID; the ad data record with the latest update time in the current advertising database is the ad data record with the largest ID value, and the information of the ad data record recorded in the count table is the ID of that ad data record.
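Because the newest record is the one with the largest ID, the count table only needs to hold a single number. A minimal sketch follows, in which the function names `record_latest_id` and `newly_added` are hypothetical and the advertising database is reduced to a collection of IDs:

```python
def record_latest_id(ad_ids):
    """Value to store in the count table: the largest ID currently in the database."""
    return max(ad_ids, default=None)


def newly_added(ad_ids, counted_id):
    """IDs of records added after the record noted in the count table."""
    if counted_id is None:  # empty database at the time of the master update
        return sorted(ad_ids)
    return sorted(i for i in ad_ids if i > counted_id)


# Usage: IDs 1..3 exist at master update time, 4 and 5 are added afterwards.
counted = record_latest_id([1, 2, 3])
added = newly_added([1, 2, 3, 4, 5], counted)
```

This relies on IDs being assigned in increasing order, which is what the paragraph above implies; with a different ID scheme, an explicit update timestamp would have to be stored instead.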
In a preferred embodiment of the application, the system may further comprise the following module:
Document amendment vector recording module, for writing the information of an ad data record into the document amendment vector when, after the master index update has been completed, a value of a default attribute of that ad data record changes in the advertising database;
The master index update module 31 may further comprise the following submodule:
Emptying submodule, for emptying the document amendment vector;
The increment index update module 32 may further comprise the following submodule:
Third valid data extraction submodule, for extracting the corresponding ad data records from the advertising database according to the information of the ad data records recorded in the document amendment vector;
The first increment index building submodule 323 is then also used for building the increment index for both the second set of ad data records in effective status and the corresponding ad data records extracted from the advertising database.
In a specific implementation, the master index update module 31 may further comprise the following submodule:
Master index backup submodule, for backing up the master index built this time;
The increment index update module 32 may further comprise the following submodule:
Increment index backup submodule, for backing up the increment index built this time.
As an example of a specific application of the embodiment of the application, each ad data record has a validity period, and the first valid data extraction submodule 312 may specifically comprise the following unit:
First traversal judging unit, for traversing the ad data records in the current advertising database and judging, one by one, whether the update time of the current master index falls within the validity period of the current ad data record; if so, the ad data record is judged to be an ad data record in effective status and is put into the first set;
The second valid data extraction submodule 322 may specifically comprise the following unit:
Second traversal judging unit, for traversing the newly added ad data records and judging, one by one, whether the update time of the current increment index falls within the validity period of the current ad data record; if so, the ad data record is judged to be an ad data record in effective status and is put into the second set.
In a preferred embodiment of the application, the first valid data extraction submodule 312 may further comprise the following unit:
First converting unit, for converting the update time of the current master index into a preset format and sending the converted update time of the current master index to the first traversal judging unit;
The second valid data extraction submodule 322 may further comprise the following unit:
Second converting unit, for converting the update time of the current increment index into a preset format and sending the converted update time of the current increment index to the second traversal judging unit.
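The converting unit and the traversal judging unit can be sketched together. The following is only an illustration under assumptions: the preset format is taken to be a `YYYY-MM-DD HH:MM:SS` string in UTC, the update time is taken to arrive as epoch seconds, the validity period is stored in the fields `valid_from` and `valid_to` in the same string format, and both bounds are treated as inclusive. None of these details is fixed by the application.

```python
from datetime import datetime, timezone

# Assumed preset format; fixed-width, so string comparison preserves time order.
PRESET_FORMAT = "%Y-%m-%d %H:%M:%S"


def convert_update_time(epoch_seconds):
    """Converting unit: turn an epoch timestamp into the preset string format."""
    return datetime.fromtimestamp(epoch_seconds, tz=timezone.utc).strftime(PRESET_FORMAT)


def extract_effective(records, update_time):
    """Traversal judging unit: keep records whose validity period covers update_time."""
    effective = []
    for rec in records:
        if rec["valid_from"] <= update_time <= rec["valid_to"]:
            effective.append(rec)
    return effective


# Usage with two hypothetical records; 31536000 s is exactly 365 days after the epoch.
records = [
    {"id": 1, "valid_from": "1970-06-01 00:00:00", "valid_to": "1971-06-01 00:00:00"},
    {"id": 2, "valid_from": "1970-06-01 00:00:00", "valid_to": "1970-12-31 23:59:59"},
]
update_time = convert_update_time(31536000)
first_set = extract_effective(records, update_time)
```

Converting once before the traversal means every record is compared against the same value in the same format, which is the purpose of placing the converting unit in front of the traversal judging unit.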
To improve the retrieval performance of the sponsored search engine, the master index update module 31 may further comprise the following submodule:
Master index memory writing submodule, for judging whether the size of the master index built this time exceeds a preset threshold and, if not, writing the master index into memory;
The increment index update module 32 may further comprise the following submodule:
Increment index memory writing submodule, for writing the increment index built this time into memory.
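The size check before loading the master index into memory can be sketched as below. The class name `IndexMemory`, the use of the pickled byte length as the size measure, and the behaviour when the threshold is exceeded (the index is simply not loaded) are all assumptions made for the example; this passage does not state what happens to a master index that is too large.

```python
import pickle


class IndexMemory:
    """Holds the in-memory copies of the master and increment indexes."""

    def __init__(self, master_threshold_bytes):
        self.master_threshold_bytes = master_threshold_bytes
        self.master = None
        self.increment = None

    def write_master(self, master_index):
        """Load the master index into memory only if it fits under the threshold."""
        size = len(pickle.dumps(master_index))  # assumed size measure
        if size > self.master_threshold_bytes:
            return False  # assumption: an oversized master index is not loaded
        self.master = master_index
        return True

    def write_increment(self, increment_index):
        """The increment index is written to memory without a size check."""
        self.increment = increment_index


# Usage with a generous and a deliberately tiny threshold.
memory = IndexMemory(master_threshold_bytes=10_000)
loaded = memory.write_master({1: "phone", 2: "charger"})
memory.write_increment({3: "window"})
tiny = IndexMemory(master_threshold_bytes=1)
rejected = tiny.write_master({1: "phone", 2: "charger"})
```

The asymmetry follows the text: only the master index is checked against the threshold, presumably because the increment index covers a short period and stays small.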
In a specific application, the default attribute of the ad data record is a full-text search attribute, where a full-text search attribute refers to an attribute that is used for indexing in classification; the system may further comprise the following modules:
Modification processing module, for directly modifying the corresponding attribute value of an ad data record in the index when a non-full-text search attribute of that ad data record changes in the advertising database;
Deletion processing module, for setting one of the attribute values among the non-full-text search attributes of an ad data record in the index to invalid when that ad data record is deleted from the advertising database.
Because the system embodiment is substantially similar to the method embodiments, it is described relatively briefly; for the relevant parts, refer to the description of the method embodiments.
Each embodiment in this specification is described in a progressive manner; each embodiment focuses on its differences from the other embodiments, and for the parts that are the same or similar between embodiments, the embodiments may be referred to one another.
The application can be used in numerous general-purpose or special-purpose computing system environments or configurations, for example: personal computers, server computers, handheld or portable devices, laptop devices, multiprocessor systems, microprocessor-based systems, set-top boxes, programmable consumer electronics devices, network PCs, minicomputers, mainframe computers, distributed computing environments that include any of the above systems or devices, and so on.
The application may be described in the general context of computer-executable instructions executed by a computer, such as program modules. Generally, program modules include routines, programs, objects, components, data structures and the like that perform particular tasks or implement particular abstract data types. The application may also be practised in distributed computing environments, in which tasks are performed by remote processing devices connected through a communication network. In a distributed computing environment, program modules may be located in both local and remote computer storage media, including storage devices.
Finally, it should also be noted that, in this document, relational terms such as first and second are used only to distinguish one entity or operation from another, and do not necessarily require or imply any actual relationship or order between these entities or operations. Moreover, the terms "comprise", "include" or any other variant thereof are intended to cover non-exclusive inclusion, so that a process, method, article or device comprising a series of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article or device. Without further limitation, an element defined by the statement "comprising a ..." does not exclude the existence of other identical elements in the process, method, article or device comprising that element.
The index updating method for ad data and the index updating system for ad data provided by the application have been described in detail above. Specific examples have been used herein to set forth the principle and the embodiments of the application, and the description of the above embodiments is intended only to help in understanding the method of the application and its core idea. At the same time, for those of ordinary skill in the art, changes may be made to the specific embodiments and the scope of application in accordance with the idea of the application. In summary, the content of this specification should not be construed as limiting the application.