Summary of the invention
The technical problem to be solved by the present application is to provide an index update method for ad data that, while saving resources and avoiding performance bottlenecks, ensures that the index of a sponsored search engine is created and updated in real time, quickly and efficiently, thereby effectively improving the retrieval performance of the sponsored search engine.
Accordingly, the present application also provides an index update device for ad data, to ensure that the above method can be applied and implemented in practice.
To solve the above problem, the present application discloses an index update method for ad data, comprising:
A master index update step executed at a first preset time interval, specifically comprising:
Recording, in a count table, information about the ad data record with the latest update time in the current advertising database;
Extracting from the current advertising database, according to the update time of the current master index, a first set of ad data records that are in a valid state;
Building the master index for the first set of ad data records in a valid state;
And,
An incremental index update step executed at a second preset time interval after each execution of the master index update step has completed, wherein the second preset time interval is shorter than the first preset time interval, specifically comprising:
Obtaining from the advertising database, according to the ad data record information recorded in the count table, the ad data records newly added after the update time of that ad data record;
Extracting from the newly added ad data records, according to the update time of the current incremental index, a second set of ad data records that are in a valid state;
Building the incremental index for the second set of ad data records in a valid state.
Preferably, the method further comprises:
After the master index update step has completed, if any value among the preset attributes of an ad data record in the advertising database changes, writing information about that ad data record into a document modification vector;
The master index update step further comprises:
Emptying the document modification vector;
The incremental index update step further comprises:
Extracting the corresponding ad data records from the advertising database according to the ad data record information recorded in the document modification vector;
Building the incremental index for the extracted ad data records.
Preferably, the master index update step further comprises:
Backing up the currently built master index;
The incremental index update step further comprises:
Backing up the currently built incremental index.
Preferably, each ad data record has a validity period, and the step of extracting from the current advertising database, according to the update time of the current master index, the first set of ad data records in a valid state comprises:
Traversing the ad data records in the current advertising database and judging, one by one, whether the update time of the current master index falls within the validity period of the record; if so, the record is judged to be an ad data record in a valid state and is put into the first set;
The step of extracting from the newly added ad data records, according to the update time of the current incremental index, the second set of ad data records in a valid state comprises:
Traversing the newly added ad data records and judging, one by one, whether the update time of the current incremental index falls within the validity period of the record; if so, the record is judged to be an ad data record in a valid state and is put into the second set.
Preferably, the master index update step further comprises:
Judging whether the size of the currently built master index exceeds a preset threshold and, if not, writing the master index into memory;
The incremental index update step further comprises:
Writing the currently built incremental index into memory.
Preferably, the preset attributes of the ad data record are full-text search attributes, a full-text search attribute being an attribute used for retrieval after word segmentation;
The method further comprises:
If a non-full-text search attribute of an ad data record in the advertising database changes, directly modifying the corresponding attribute value of that ad data record in its index entry;
If an ad data record is deleted from the advertising database, setting a certain attribute value among the non-full-text search attributes of the index entry corresponding to that ad data record to invalid.
Preferably, before the substep of judging one by one whether the update time of the current master index falls within the validity period of the ad data record, the method further comprises the following substep:
Converting the update time of the current master index into a preset format;
Before the substep of judging one by one whether the update time of the current incremental index falls within the validity period of the ad data record, the method further comprises the following substep:
Converting the update time of the current incremental index into a preset format.
Preferably, each ad data record has a corresponding ID; the ad data record with the latest update time in the current advertising database is the ad data record with the largest ID value, and the ad data record information recorded in the count table is the ID of that record.
The present application also discloses an index update system for ad data, comprising:
A master index update module, configured to update the master index at a first preset time interval, specifically comprising the following submodules:
A count table recording submodule, configured to record in a count table information about the ad data record with the latest update time in the current advertising database;
A first valid data extraction submodule, configured to extract from the current advertising database, according to the update time of the current master index, a first set of ad data records that are in a valid state;
A master index building submodule, configured to build the master index for the first set of ad data records in a valid state;
And,
An incremental index update module, configured to update the incremental index at a second preset time interval after each master index update has completed, wherein the second preset time interval is shorter than the first preset time interval, specifically comprising the following submodules:
A count table reading submodule, configured to obtain from the advertising database, according to the ad data record information recorded in the count table, the ad data records newly added after the update time of that ad data record;
A second valid data extraction submodule, configured to extract from the newly added ad data records, according to the update time of the current incremental index, a second set of ad data records that are in a valid state;
A first incremental index building submodule, configured to build the incremental index for the second set of ad data records in a valid state.
Preferably, the system further comprises:
A document modification vector recording module, configured to write information about an ad data record into a document modification vector when, after the master index update has completed, any value among the preset attributes of that ad data record in the advertising database changes;
The master index update module further comprises:
An emptying submodule, configured to empty the document modification vector;
The incremental index update module further comprises:
A third valid data extraction submodule, configured to extract the corresponding ad data records from the advertising database according to the ad data record information recorded in the document modification vector;
A second incremental index building submodule, configured to build the incremental index for the extracted ad data records.
Compared with the prior art, the present application has the following advantages:
The present application builds the index only for ad data records that are in a valid state, which effectively reduces the size of the index itself and speeds up index updates.
Moreover, the present application adopts a two-level index storage scheme that uses in-memory indexes as far as possible. The incremental index is stored directly in memory. For the master index, a threshold is first set according to the memory size; when the size of the master index does not exceed this threshold, the master index is stored in memory. Index files stored in memory can be read and written very quickly, which significantly improves retrieval speed and further improves retrieval performance.
The present application rebuilds index entries only when a full-text search attribute of an ad data record is modified. A document modification vector stores the IDs of the ad data records that have been modified. When the incremental index is updated, in addition to querying the ad data records newly added after the master index update moment, the corresponding ad data records are also queried by the IDs stored in the document modification vector, and the incremental index is built for these records, which speeds up index updates when documents are modified.
Detailed description of the embodiments
To make the above objects, features and advantages of the present application clearer and easier to understand, the present application is described in further detail below with reference to the accompanying drawings and specific embodiments.
Referring to Fig. 1, a flow chart of the steps of Embodiment 1 of an index update method for ad data according to the present application is shown, which may specifically comprise the following steps:
Step 11: a master index update step executed at a first preset time interval, which may specifically comprise the following substeps:
Substep 111: recording, in a count table, information about the ad data record with the latest update time in the current advertising database;
As one example of a specific application of this embodiment, each ad data record has a corresponding ID, such as 245 or 298, and the ad data record with the latest update time in the current advertising database is the ad data record with the largest ID value. In other words, the ad data record information recorded in the count table is the ID of the ad data record with the largest ID value.
In this embodiment, the point in time of the ad data record with the latest update time in the current advertising database can be used to delimit the data queried for the master index and for the incremental index: the master index is built from the ad data records whose update time is before this point in time, and the incremental index is built from the ad data records whose update time is after it. In practice, the ID of the ad data record with the largest ID value can serve as the upper bound of the IDs queried for the master index, that is, the master index is built from the ad data records whose ID is less than or equal to this maximum ID. The same ID can serve as the lower bound of the IDs queried for the incremental index, that is, the incremental index is built from the ad data records whose ID is greater than this maximum ID.
In a specific implementation, the count table can be stored on disk as a file that holds the ID of the ad data record with the currently largest ID value and is used to assist index building. The count table is created when the system is deployed and does not need to be re-created while the system is running.
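The role of the count table can be illustrated with a minimal sketch. The file name, the plain-text encoding of the ID, and the helper names `write_count_table` and `read_count_table` are assumptions made for illustration only; the source states only that the count table is a file on disk holding the largest ID.

```python
import tempfile
from pathlib import Path


def write_count_table(path: Path, max_ad_id: int) -> None:
    """Overwrite the count table with the ID of the newest ad data record."""
    path.write_text(str(max_ad_id), encoding="ascii")


def read_count_table(path: Path) -> int:
    """Read back the boundary ID stored by the last master index update."""
    return int(path.read_text(encoding="ascii").strip())


# Hypothetical location; a temporary directory keeps the sketch self-contained.
count_table = Path(tempfile.mkdtemp()) / "count_table.txt"

# Master index update: the largest ID present at that moment becomes the boundary.
ids_at_master_update = [234, 260, 298]
write_count_table(count_table, max(ids_at_master_update))

# Later, record 310 is inserted; the incremental update reads the boundary back.
ids_now = ids_at_master_update + [310]
boundary_id = read_count_table(count_table)
master_ids = [i for i in ids_now if i <= boundary_id]     # queried for the master index
increment_ids = [i for i in ids_now if i > boundary_id]   # queried for the incremental index
```

Because each master index update overwrites the stored value, the file always holds a single boundary ID.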
Substep 112: extracting from the current advertising database, according to the update time of the current master index, a first set of ad data records that are in a valid state;
It should be noted that the ad data handled by a sponsored search engine differs from the document data indexed by a general search engine: ad data is displayed for a limited period, and only ad data within that period is valid. In other words, every ad data record referred to in the present application has a corresponding validity period; for example, the validity period of a certain ad data record may run from September 11, 2011 to November 11, 2011. For a long time, research by those skilled in the art into improving search engine retrieval performance has concentrated on crawler optimization, ranking optimization of retrieval results, and retrieval server performance. The present application instead focuses on the fact that ad data has a validity period and builds the index only for ad data that is in a valid state. It thereby overcomes the technical prejudice that improvements must be made in those three areas, and changes the source data of the index directly. Without changing or adding hardware, this effectively reduces the collection of invalid data, reduces the size of the index itself, and speeds up index updates, thereby improving the retrieval performance of the sponsored search engine.
Based on the business characteristics of ad data, ads that have not yet taken effect or have already expired should not be indexed. In this embodiment, according to this characteristic, whether an ad data record is in a valid state can be judged when the index is built or updated.
As one example of a specific application of this embodiment, substep 112 may operate as follows:
Traverse the ad data records in the current advertising database and judge, one by one, whether the update time of the current master index falls within the validity period of the record; if so, the record is judged to be an ad data record in a valid state and is put into the first set.
In a preferred embodiment of the present application, before judging whether the update time of the current master index falls within the validity period of the ad data record, the update time of the current master index can first be converted into a preset format, for example the number of seconds elapsed since midnight on January 1, 1970, Greenwich Mean Time, represented as a 32-bit unsigned integer. For example, if the update time of the current master index is 2011-9-21, the converted value is 1316534400.
As another example, if the update time of the current master index is midnight on 2011-9-22, then 2011-9-22 is converted into 1316620800.
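The conversion can be sketched as follows. The example values in the text correspond to local midnight in a UTC+8 time zone (1316534400 is 2011-09-20 16:00:00 GMT), so the sketch assumes that offset; the offset and the function name `to_epoch_seconds` are assumptions, not something the source states.

```python
from datetime import datetime, timedelta, timezone

# Assumption: update times are local times in a UTC+8 zone, which is what
# the example values in the text imply.
LOCAL_TZ = timezone(timedelta(hours=8))


def to_epoch_seconds(year: int, month: int, day: int, hour: int = 0) -> int:
    """Convert a local date and hour to seconds since 1970-01-01 00:00:00 GMT."""
    moment = datetime(year, month, day, hour, tzinfo=LOCAL_TZ)
    seconds = int(moment.timestamp())
    # The text stores the value as a 32-bit unsigned integer.
    if not 0 <= seconds < 2**32:
        raise ValueError("timestamp does not fit in a 32-bit unsigned integer")
    return seconds


master_update_time = to_epoch_seconds(2011, 9, 22)
```

Storing the update time and the validity period bounds in the same integer format reduces the validity check to two integer comparisons.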
The ad data records in the current advertising database are shown in Table 1 below:
ad_id | title | word | region_id | start_date | end_date
234 | The family of mobile phone | Mobile phone | 10 | 1316448000 | 1316880000
260 | The notebook sale monopoly | Notebook | 12 13 19 | 1316448000 | 1316534400
298 | IPAD appoints you to choose | IPAD | 10 12 20 | 1316534400 | 1316880000
In the table above, ad_id is the ID value of the ad data record, title is the title of the ad data record, word is the keyword of the ad data record, region_id is the ID of the regions in which the ad is placed, start_date is the start time of the validity period of the ad data record, and end_date is the end time of the validity period.
In this example, the update time of the current master index is 1316620800. Judging whether this value lies between the start_date and end_date of each ad data record shows that the records with IDs 234 and 298 satisfy the condition, that is, these two ad data records are in a valid state. The first set of ad data records in a valid state extracted from the advertising database shown in Table 1 therefore comprises the records whose ad_id is 234 and 298.
This step excludes invalid ad data records while the ad data is being queried, which reduces the size of the index itself and speeds up index updates.
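The validity check on Table 1 can be sketched as below. The `AdRecord` type and the `extract_valid` helper are illustrative names, and treating both bounds of the validity period as inclusive is an assumption (the text says only "between").

```python
from typing import List, NamedTuple, Tuple


class AdRecord(NamedTuple):
    ad_id: int
    title: str
    word: str
    region_ids: Tuple[int, ...]
    start_date: int  # start of the validity period, epoch seconds
    end_date: int    # end of the validity period, epoch seconds


def extract_valid(records: List[AdRecord], update_time: int) -> List[AdRecord]:
    """Keep only the records whose validity period contains update_time."""
    # Assumption: both bounds are inclusive.
    return [r for r in records if r.start_date <= update_time <= r.end_date]


# The rows of Table 1.
table_1 = [
    AdRecord(234, "The family of mobile phone", "Mobile phone", (10,), 1316448000, 1316880000),
    AdRecord(260, "The notebook sale monopoly", "Notebook", (12, 13, 19), 1316448000, 1316534400),
    AdRecord(298, "IPAD appoints you to choose", "IPAD", (10, 12, 20), 1316534400, 1316880000),
]

first_set = extract_valid(table_1, update_time=1316620800)
first_set_ids = [r.ad_id for r in first_set]
```

Record 260 is dropped because its end_date (1316534400) is earlier than the update time, so it never reaches the index builder.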
Substep 113: building the master index for the first set of ad data records in a valid state;
In a preferred embodiment of the present application, the indexing mechanism can be implemented with an inverted index, that is, the inverted index structure is used as the index structure of the master index, and the master index is updated by rebuilding it from the first set obtained in substep 112.
A general (forward) index maps from a "document number" to "all the words in that document". An inverted index reverses this relationship and maps from a "word" to "the numbers of all documents in which the word occurs", so that all documents containing a given word can be found quickly by searching for that word. In practical applications, an inverted index usually also contains information such as the number of times a word occurs in a document and its exact positions. To facilitate retrieval, inverted lists are usually kept sorted.
The following is an example of an inverted index:
Suppose there are two articles, 1 and 2:
The content of article 1 is: Tom lives in Guangzhou, I live in Guangzhou too.
The content of article 2 is: He once lived in Shanghai.
1) First we obtain the keywords of the two articles, which usually requires the following processing:
A. What we have is the article content, that is, a character string, so we first find all the words in the string, that is, perform word segmentation. English words are separated by spaces and are relatively easy to handle. Chinese words are written without separators and require dedicated word segmentation.
B. Words in the articles such as "in", "once" and "too" carry no practical meaning, and in Chinese function words such as "to be" usually carry no specific meaning either. Words that do not represent a concept can be filtered out.
C. A user searching for "He" usually expects articles containing "he" or "HE" to be found as well, so all words need to be converted to a uniform case.
D. A user searching for "live" usually expects articles containing "lives" or "lived" to be found as well, so "lives" and "lived" need to be reduced to "live".
E. Punctuation marks in an article usually do not represent a concept and can also be filtered out.
After the above processing, the keywords of article 1 are: [tom] [live] [guangzhou] [i] [live] [guangzhou].
The keywords of article 2 are: [he] [live] [shanghai].
2) Once we have the keywords, we can build the inverted index. The correspondence above is from "article number" to "all keywords in the article". The inverted index reverses this relationship into: "keyword" to "numbers of all articles containing the keyword". After inversion, articles 1 and 2 become:
keyword | article number
guangzhou | 1
he | 2
i | 1
live | 1, 2
shanghai | 2
tom | 1
Knowing only which articles a keyword appears in is usually not enough; we also need to know how many times the keyword occurs in an article and where. Two kinds of position are commonly used: a) character position, which records the character offset of the word in the article (its advantage is fast positioning when keywords are highlighted for display); b) keyword position, which records the ordinal of the word among the keywords of the article (its advantage is that it saves index space and makes phrase queries fast).
After the "frequency of occurrence" and "position of occurrence" information is added, our index structure becomes:
keyword | article number [frequency] | positions
guangzhou | 1[2] | 3, 6
he | 2[1] | 1
i | 1[1] | 4
live | 1[2], 2[1] | 2, 5, 2
shanghai | 2[1] | 3
tom | 1[1] | 1
Take the row for live as an example of how to read this structure: live occurs twice in article 1 and once in article 2. What do its positions "2, 5, 2" mean? They must be read together with the article numbers and frequencies: live occurs twice in article 1, so "2, 5" are the two positions at which live occurs in article 1; it occurs once in article 2, so the remaining "2" means that live is the second keyword in article 2.
The inverted index is currently the most widely used way of building an index, and it performs well for word-based queries.
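The worked example above can be reproduced with a short sketch. The stop-word set and the two-entry stem table are toy stand-ins for a real word segmenter and stemmer, and all names are illustrative assumptions; positions are keyword positions counted from 1.

```python
import re
from collections import defaultdict

# Toy stand-ins: a real system would use a full stop-word list and stemmer.
STOP_WORDS = {"in", "once", "too"}
STEMS = {"lives": "live", "lived": "live"}


def keywords(text):
    """Lowercase, drop punctuation and stop words, and reduce words to stems."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [STEMS.get(t, t) for t in tokens if t not in STOP_WORDS]


def build_inverted_index(articles):
    """Map keyword -> {article number: [keyword positions, counted from 1]}."""
    index = defaultdict(dict)
    for article_no, text in articles.items():
        for position, kw in enumerate(keywords(text), start=1):
            index[kw].setdefault(article_no, []).append(position)
    return index


articles = {
    1: "Tom lives in Guangzhou, I live in Guangzhou too.",
    2: "He once lived in Shanghai.",
}
index = build_inverted_index(articles)

# Print one posting list per keyword, in sorted order.
for kw in sorted(index):
    postings = ", ".join(f"{no}[{len(pos)}] at {pos}" for no, pos in index[kw].items())
    print(f"{kw}: {postings}")
```

The frequency of a keyword in an article is simply the length of its position list, so it does not need to be stored separately in this sketch.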
Step 12: an incremental index update step executed at a second preset time interval after each execution of the master index update step has completed, wherein the second preset time interval is shorter than the first preset time interval;
In this embodiment, the master index stores the index data of the ad records that changed before a set update moment, and the incremental index stores the index data of the ad records that changed after that moment. For example, the update moment can be set to midnight every day. Within a given day, the master index then stores the index data of the valid ads that were already in the advertising database before midnight of that day, and the incremental index stores the index data of the valid ads that were inserted, deleted or updated after midnight of that day. That is, a master index is rebuilt automatically at midnight every day; once the master index has been created, updating of the incremental index starts, and during that day an incremental index is rebuilt automatically at a fixed interval, for example every 3 minutes.
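The timing relationship between the two update steps can be sketched as a simple schedule. The function name `plan_updates` is an assumption, and the sketch ignores how long each build takes, so the first incremental update is simply placed one interval after the master update.

```python
from datetime import datetime, timedelta


def plan_updates(cycle_start, master_interval, increment_interval):
    """List the (time, kind) updates for one master index cycle."""
    if increment_interval >= master_interval:
        raise ValueError("the second interval must be shorter than the first")
    schedule = [(cycle_start, "master")]
    t = cycle_start + increment_interval
    # Incremental updates run until the next master index rebuild is due.
    while t < cycle_start + master_interval:
        schedule.append((t, "increment"))
        t += increment_interval
    return schedule


schedule = plan_updates(
    datetime(2011, 9, 22),            # midnight rebuild of the master index
    master_interval=timedelta(days=1),
    increment_interval=timedelta(minutes=3),
)
```

With a daily master rebuild and a 3-minute incremental interval, one cycle contains one master update followed by 479 incremental updates.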
Step 12 may specifically comprise the following substeps:
Substep 121: obtaining from the advertising database, according to the ad data record information recorded in the count table, the ad data records newly added after the update time of that ad data record;
During the execution of substep 111, information about one ad data record is written to the count table. In a specific application of this embodiment, this information is the ID of that ad data record, that is, the count table stores the ID of one ad data record. In practice, each master index update stores the ID of the latest ad data record obtained at that time in the count table, and the previously written value is overwritten by the newly written value.
In this embodiment, the point in time of the ad data record recorded in the count table can be used to delimit the data queried for the master index and for the incremental index: the master index is built from the ad data records whose update time is before this point in time, and the incremental index is built from the ad data records whose update time is after it. In practice, the ID of the ad data record with the largest ID value can serve as the upper bound of the IDs queried for the master index, that is, the master index is built from the ad data records whose ID is less than or equal to this maximum ID. The same ID can serve as the lower bound of the IDs queried for the incremental index, that is, the incremental index is built from the ad data records whose ID is greater than this maximum ID.
Substep 122: extracting from the newly added ad data records, according to the update time of the current incremental index, a second set of ad data records that are in a valid state;
As one example of a specific application of the present application, substep 122 may operate as follows:
Traverse the newly added ad data records and judge, one by one, whether the update time of the current incremental index falls within the validity period of the record; if so, the record is judged to be an ad data record in a valid state and is put into the second set.
In a preferred embodiment of the present application, before judging whether the update time of the current incremental index falls within the validity period of the ad data record, the update time of the current incremental index can first be converted into a preset format, for example the number of seconds elapsed since midnight on January 1, 1970, Greenwich Mean Time, represented as a 32-bit unsigned integer. For example, if the update time of the current incremental index is 2011-9-22, the converted value is 1316620800.
As another example, if the update time of the current incremental index is 13:00 on September 22, 2011, then 2011-9-22 is converted into 1316620800.
The ad data records in the current advertising database are shown in Table 2 below:
ad_id | title | word | region_id | start_date | end_date
234 | The window of mobile phone | Mobile phone | 10 12 | 1316448000 | 1316880000
260 | The notebook sale monopoly | Notebook | 12 13 19 | 1316448000 | 1316534400
298 | IPAD appoints you to choose | IPAD | 10 12 20 | 1316534400 | 1316880000
310 | The IPhone sale monopoly | IPhone | 12 20 21 | 1316620800 | 1316880000
In the table above, ad_id is the ID value of the ad data record, title is the title of the ad data record, word is the keyword of the ad data record, region_id is the ID of the regions in which the ad is placed, start_date is the start time of the validity period of the ad data record, and end_date is the end time of the validity period.
In this example, the update time of the current incremental index is 1316620800. Judging whether this value lies between the start_date and end_date of each newly added ad data record shows that the record with ID 310 satisfies the condition, that is, this ad data record is in a valid state. The second set of ad data records in a valid state extracted from the advertising database shown in Table 2 therefore comprises the record whose ad_id is 310.
This step excludes invalid ad data records while the ad data is being queried, which reduces the size of the index itself and speeds up index updates.
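Substeps 121 and 122 can be sketched together on the rows of Table 2. The boundary ID 298 is assumed to be the value left in the count table by the preceding master index update, and the tuple layout and function name are illustrative assumptions.

```python
# (ad_id, start_date, end_date) for the rows of Table 2.
table_2 = [
    (234, 1316448000, 1316880000),
    (260, 1316448000, 1316534400),
    (298, 1316534400, 1316880000),
    (310, 1316620800, 1316880000),
]


def extract_increment_set(records, boundary_id, update_time):
    """Select the newly added records, then keep those in a valid state."""
    # Substep 121: records added after the one recorded in the count table.
    newly_added = [r for r in records if r[0] > boundary_id]
    # Substep 122: validity check; inclusive bounds are an assumption that is
    # consistent with record 310, whose start_date equals the update time.
    return [r for r in newly_added if r[1] <= update_time <= r[2]]


second_set = extract_increment_set(table_2, boundary_id=298, update_time=1316620800)
second_set_ids = [r[0] for r in second_set]
```

Only the records above the boundary ID are scanned, so the cost of an incremental update depends on the number of records added since the last master index update rather than on the size of the whole database.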
Substep 123: building the incremental index for the second set of ad data records in a valid state.
In a preferred embodiment of the present application, the inverted index structure can be used as the index structure of the incremental index, and the incremental index is updated by rebuilding it from the set of ad data records obtained in substep 122.
As a preferred embodiment of the present application, the master index update step may further comprise the following step:
Judging whether the size of the currently built master index exceeds a preset threshold and, if not, writing the master index into memory;
The incremental index update step may further comprise the following step:
Writing the currently built incremental index into memory.
To further improve retrieval performance, the present application adopts a two-level index storage scheme. The incremental index can be stored directly in memory. For the master index, a threshold can first be set according to the memory size; when the size of the master index does not exceed this threshold, the master index is stored in memory. Index files stored in memory can be read and written very quickly, which significantly improves retrieval speed. In practice, when the size of the master index exceeds the preset threshold, the master index is stored on disk.
Specifically, a web search engine has to process massive amounts of web page data; because the amount of indexed data is huge, the index cannot be held in memory and is generally stored on disk. In the present application, the incremental index reflects only the ad data that changed during the current day, so it is small and fits entirely in memory. As for the master index, the number of ads that the present application needs to handle is at most on the order of millions, far fewer than the number of web pages retrieved by a web search engine. It is therefore feasible for the present application to hold the master index in memory. For example, a directory structure for index files can be created in the memory file system "/dev/shm", and the incremental index and any master index within the preset threshold size can be written under this directory structure.
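The two-level storage decision can be sketched as a small function. Only "/dev/shm" comes from the text; the subdirectory name, the disk path, the 4 GiB threshold, and the function name are hypothetical choices made for illustration.

```python
from pathlib import Path

MEMORY_DIR = Path("/dev/shm/ad_index")   # hypothetical subdirectory under /dev/shm
DISK_DIR = Path("/var/lib/ad_index")     # hypothetical on-disk location


def choose_index_dir(kind: str, index_size_bytes: int, threshold_bytes: int) -> Path:
    """Pick the storage level for a freshly built index."""
    if kind == "increment":
        # The incremental index only covers the current day's changes,
        # so it always goes to memory.
        return MEMORY_DIR
    if kind == "master":
        # The master index goes to memory only if it fits under the threshold.
        return MEMORY_DIR if index_size_bytes <= threshold_bytes else DISK_DIR
    raise ValueError(f"unknown index kind: {kind!r}")


threshold = 4 * 1024**3  # hypothetical threshold derived from the memory size
small_master_dir = choose_index_dir("master", 1 * 1024**3, threshold)
large_master_dir = choose_index_dir("master", 8 * 1024**3, threshold)
increment_dir = choose_index_dir("increment", 10 * 1024**2, threshold)
```

Keeping the decision in one function means the threshold can be tuned to the available memory without touching the index builders.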
To ensure consistency and to prevent the index files stored in memory from being lost when the machine loses power or the system restarts, a corresponding backup can be maintained on disk. In a specific implementation, the master index update step 11 may therefore further comprise the following step:
Backing up the currently built master index;
The incremental index update step 12 may further comprise the following step:
Backing up the currently built incremental index.
In a specific application, a storage directory for the master index can be created on disk at the system deployment stage for master index backups; after a master index update has completed, the latest master index is backed up to that directory. Likewise, a storage directory for the incremental index can be created on disk at the system deployment stage for incremental index backups; after an incremental index update has completed, the latest incremental index is backed up to that directory.
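The backup step can be sketched with the standard library. The directory layout, the file name `segment.idx`, and the replace-then-copy strategy are assumptions; temporary directories stand in for the in-memory and on-disk locations so that the sketch runs anywhere.

```python
import shutil
import tempfile
from pathlib import Path


def backup_index(index_dir: Path, backup_dir: Path) -> None:
    """Replace the on-disk backup with a copy of the freshly built index."""
    if backup_dir.exists():
        shutil.rmtree(backup_dir)          # drop the previous backup
    shutil.copytree(index_dir, backup_dir)  # copy the latest index files


# Hypothetical layout inside a temporary root directory.
root = Path(tempfile.mkdtemp())
index_dir = root / "memory" / "master"
index_dir.mkdir(parents=True)
(index_dir / "segment.idx").write_text("index data v1")

backup_dir = root / "disk" / "master_backup"
backup_index(index_dir, backup_dir)

# A later rebuild overwrites the index, and the backup is refreshed.
(index_dir / "segment.idx").write_text("index data v2")
backup_index(index_dir, backup_dir)
```

After a power loss or restart, the in-memory directory could be repopulated by copying in the opposite direction; that recovery path is not described in the text and is mentioned here only as a possible design.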
Referring to Fig. 2, a flow chart of the steps of Embodiment 2 of an index update method for ad data according to the present application is shown, which may specifically comprise the following steps:
Step 21: a master index update step executed at a first preset time interval, which may specifically comprise the following substeps:
Substep 211: recording, in a count table, information about the ad data record with the latest update time in the current advertising database;
Substep 212: extracting from the current advertising database, according to the update time of the current master index, a first set of ad data records that are in a valid state;
Substep 213: building the master index for the first set of ad data records in a valid state;
Substep 214: emptying the document modification vector;
Substep 215: backing up the currently built master index;
And,
Step 22: an incremental index update step executed at a second preset time interval after each execution of the master index update step has completed, wherein the second preset time interval is shorter than the first preset time interval; this step may specifically comprise the following substeps:
Substep 221: obtaining from the advertising database, according to the ad data record information recorded in the count table, the ad data records newly added after the update time of that ad data record;
Substep 222: extracting from the newly added ad data records, according to the update time of the current incremental index, a second set of ad data records that are in a valid state;
Substep 223: extracting the corresponding ad data records from the advertising database according to the ad data record information recorded in the document modification vector;
Substep 224: building the incremental index for the ad data records extracted in substep 222 and substep 223;
Substep 225: backing up the currently built incremental index.
In practice, after the master index update step has been executed, if a value of a preset attribute of an ad data record in the advertisement database changes, the information of that ad data record may be written into the document modification vector. Suppose a master index is rebuilt automatically in the early morning of each day and that, once the master index has been created, incremental index updates begin, with the incremental index rebuilt automatically at regular intervals during the day, for example every 3 minutes. The document modification vector then holds the IDs of the ad data records modified during that day: after the last master index update has finished, whenever a value of a preset attribute of an ad data record is modified, the ID of that record is stored in the document modification vector. Once the newest master index has been built, the ad data records corresponding to the IDs held in the document modification vector have already been incorporated into the master index as part of its data source. There is therefore no need to build an incremental index from the IDs in the document modification vector again; that is, the advertisement IDs held in the vector are no longer meaningful at that point. Emptying the document modification vector ensures that it always records only the IDs of the ad data records modified during the current day, and also saves memory.
In a specific implementation, the document modification vector may be implemented as a variable-length array stored in memory. Using the IDs of the ad data records held in the document modification vector, the corresponding ad data records can be queried from the advertisement database, so that an incremental index can be built for these records.
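A variable-length array of this kind can be sketched as a thin wrapper around a Python list. The class and method names are hypothetical, the database is modeled as a dictionary keyed by ad ID, and skipping duplicate IDs is an assumption of this sketch rather than something the application specifies.

```python
class DocumentModificationVector:
    """In-memory, growable list of IDs of ad records modified since the last master update."""

    def __init__(self):
        self._ids = []

    def add(self, ad_id):
        # Assumption: an ID is stored once even if the record is modified repeatedly.
        if ad_id not in self._ids:
            self._ids.append(ad_id)

    def ids(self):
        return list(self._ids)

    def fetch_records(self, db):
        """Look up the stored IDs in `db` (a dict keyed by ad ID in this sketch)."""
        return [db[ad_id] for ad_id in self._ids if ad_id in db]

    def clear(self):
        """Called after a master index update, when the stored IDs lose their meaning."""
        self._ids.clear()
```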
Each document in the index corresponds to multiple attribute fields. As an example of a specific application of the embodiment of the present application, the preset attribute of the ad data record may be a full-text search attribute, that is, an attribute used for word-segmentation-based retrieval; for example, the full-text search attributes may include the title, the description, and the keywords.
In a specific application of the embodiment of the present application, the method may further include the following step:
If a non-full-text search attribute of an ad data record in the advertisement database changes, the corresponding attribute value of that ad data record in the index is modified directly.
Specifically, a non-full-text search attribute may refer to an attribute used as a filter condition. For example, the non-full-text search attributes may include the delivery region and the delivery size of the advertisement.
According to the embodiment of the present application, a modification to a document is handled in one of the following two ways:
(1) When a non-full-text search attribute of an ad data record is modified, it is sufficient to directly modify the corresponding attribute value in the index entry for that record.
For example, when a non-full-text search attribute such as the delivery region of an advertisement is modified, only the index entry corresponding to that advertisement needs to be located in the index. Because the index entry contains all non-full-text search attributes of the advertisement, the relevant attribute in the index entry can be updated directly as the modification requires.
(2) When a full-text search attribute of an ad data record is modified, the document modification vector is used to store the ID of the modified ad data record.
When a full-text search attribute such as the title of an advertisement is modified, the index entry has to be rebuilt. The approach adopted in the present application is to store the ID of the modified ad data record in the document modification vector. Modifications to full-text search attributes are then reflected in the incremental index: when the incremental index is updated, in addition to querying the ad data records added after the time of the master index update, the ad data records corresponding to the IDs stored in the document modification vector are also queried, and the incremental index is built for these records.
For example, suppose the delivery region attribute region_id of advertisement 234 in Table 2 is changed to "1012". Because the modified attribute is a non-full-text search attribute, it is sufficient to directly modify the region_id attribute in the index entry corresponding to that ad data record. If instead the title attribute of advertisement 234 is changed to "window of mobile phone", then, because this attribute is a full-text search attribute, ID 234 is added to the document modification vector. At the next incremental index update, the update flow reads the ad data record corresponding to ID 234, so that the change to advertisement 234 is reflected in the latest incremental index.
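The two cases can be sketched as a single dispatch function. This is a hedged illustration: the attribute set `FULL_TEXT_ATTRS`, the function name `handle_modification`, the return labels, and the dictionary-based index are assumptions made for the example; only `region_id` and the title attribute come from the text above.

```python
# Attributes used for word-segmentation-based retrieval (names assumed for this sketch).
FULL_TEXT_ATTRS = {"title", "description", "keywords"}


def handle_modification(index, modified_ids, ad_id, attr, value):
    """Apply one attribute change according to cases (1) and (2).

    `index` maps ad ID to an index entry (a dict of non-full-text attributes);
    `modified_ids` is the document modification vector, modeled as a list.
    """
    if attr in FULL_TEXT_ATTRS:
        # Case (2): the index entry must be rebuilt, so only the ID is noted;
        # the change becomes visible at the next incremental index update.
        if ad_id not in modified_ids:
            modified_ids.append(ad_id)
        return "deferred"
    # Case (1): filter-style attribute, updated directly in the index entry.
    index[ad_id][attr] = value
    return "in_place"
```

With an index entry for advertisement 234, calling `handle_modification(index, vec, 234, "region_id", "1012")` changes the entry immediately, while a change to `"title"` only appends 234 to `vec`.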
In a specific application of the embodiment of the present application, the method may further include the following step:
If an ad data record in the advertisement database is deleted, one of the non-full-text search attribute values of that ad data record in the index is set to invalid.
According to the embodiment of the present application, when an ad data record is deleted, a certain attribute of the record to be deleted is set to an illegal value, and that attribute field is used as a filter condition at retrieval time. Ad data records with the illegal attribute value are filtered out, which achieves the effect of "deleting" the ad data record from the index.
For example, the index entries maintained in the index include attributes such as the advertisement delivery region, the advertisement delivery status, and the advertisement delivery size. When an ad data record is deleted, the delivery status in the index entry corresponding to that record can be set to invalid, and the delivery status is used as a filter condition. At retrieval time, index entries whose delivery status is invalid cannot satisfy the filter condition, and the ad data records corresponding to the filtered index entries do not appear in the retrieval results, which achieves the effect of deleting the document. Of course, attributes such as the advertisement delivery region or the advertisement delivery size can be set in a similar way to achieve the same function; the present application places no limitation on this.
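Deletion by invalidation can be sketched as below. The field name `status`, the constants `VALID` and `INVALID`, and the `search` function with a single region filter are hypothetical choices for this illustration.

```python
VALID = "valid"
INVALID = "invalid"


def mark_deleted(index, ad_id):
    """'Delete' an ad by setting its delivery status to an invalid value."""
    index[ad_id]["status"] = INVALID


def search(index, region_id):
    """Return matching ad IDs; the delivery status always acts as a filter condition."""
    return [
        ad_id
        for ad_id, entry in index.items()
        if entry["status"] == VALID and entry["region_id"] == region_id
    ]
```

The index entry itself remains in place until the next rebuild; it simply can no longer satisfy the filter.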
It should be noted that, in the embodiment of the present application, the incremental index is not rebuilt or updated while the master index is being updated; the incremental index is updated only after the master index update has completed.
For simplicity of description, the method embodiments are expressed as a series of combined actions. Those skilled in the art should understand, however, that the present application is not limited by the described order of actions, because according to the present application some steps may be performed in another order or simultaneously. Those skilled in the art should also understand that the embodiments described in the specification are all preferred embodiments, and that the actions involved are not necessarily required by the present application.
Referring to Fig. 3, a structural block diagram of an embodiment of an index update system for ad data according to the present application is shown. The system may specifically include the following modules:
A master index update module 31, configured to update the master index at a first preset time interval, and specifically including the following submodules:
A count table recording submodule 311, configured to record, in a count table, information about the ad data record with the latest update time in the current advertisement database;
A first valid data extraction submodule 312, configured to extract, from the current advertisement database and according to the update time of the current master index update, a first set of ad data records that are in a valid state;
A master index building submodule 312, configured to build the master index for the first set of ad data records in the valid state;
and,
An incremental index update module 32, configured to update the incremental index at a second preset time interval after each master index update has completed, where the second preset time interval is shorter than the first preset time interval; the module specifically includes the following submodules:
A count table reading submodule 321, configured to obtain from the advertisement database, according to the ad data record information recorded in the count table, the ad data records newly added after the update time of that ad data record;
A second valid data extraction submodule 322, configured to extract, from the newly added ad data records and according to the update time of the current incremental index update, a second set of ad data records that are in the valid state;
A first incremental index building submodule 323, configured to build the incremental index for the second set of ad data records in the valid state.
In a specific application, each ad data record has a corresponding ID, the ad data record with the latest update time in the current advertisement database is the ad data record with the largest ID value, and the ad data record information recorded in the count table is the ID of the ad data record with the largest ID value.
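When the count table stores the largest ID, finding the newly added records reduces to a range query. The sketch below uses Python's standard `sqlite3` module with an in-memory database; the table names `ads` and `count_table`, their columns, and the function names are hypothetical, and the sketch assumes that IDs are assigned in increasing order.

```python
import sqlite3


def make_demo_db():
    """Create a small in-memory database with a hypothetical schema."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE ads (id INTEGER PRIMARY KEY, title TEXT)")
    conn.execute("CREATE TABLE count_table (max_id INTEGER)")
    conn.executemany("INSERT INTO ads (id, title) VALUES (?, ?)",
                     [(1, "first ad"), (2, "second ad")])
    return conn


def record_max_id(conn):
    """Master update: store the largest ad ID currently in the database."""
    (max_id,) = conn.execute("SELECT COALESCE(MAX(id), 0) FROM ads").fetchone()
    conn.execute("DELETE FROM count_table")
    conn.execute("INSERT INTO count_table (max_id) VALUES (?)", (max_id,))
    return max_id


def fetch_new_records(conn):
    """Incremental update: records whose ID exceeds the recorded maximum."""
    (max_id,) = conn.execute("SELECT max_id FROM count_table").fetchone()
    return conn.execute(
        "SELECT id, title FROM ads WHERE id > ? ORDER BY id", (max_id,)
    ).fetchall()
```

Only the rows inserted after `record_max_id` was called are returned by `fetch_new_records`, so the incremental update never rescans records already covered by the master index.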
In a preferred embodiment of the present application, the system may further include the following module:
A document modification vector recording module, configured to write, after a master index update has completed, the information of an ad data record into the document modification vector when a value of a preset attribute of that ad data record in the advertisement database changes;
The master index update module 31 may further include the following submodule:
An emptying submodule, configured to empty the document modification vector;
The incremental index update module 32 may further include the following submodules:
A third valid data extraction submodule, configured to extract from the advertisement database the ad data records corresponding to the ad data record information recorded in the document modification vector;
A second incremental index building submodule, configured to build the incremental index for the extracted ad data records.
In a specific implementation, the master index update module 31 may further include the following submodule:
A master index backup submodule, configured to back up the master index built in the current update;
The incremental index update module 32 may further include the following submodule:
An incremental index backup submodule, configured to back up the incremental index built in the current update.
As an example of a specific application of the embodiment of the present application, each ad data record has a validity period, and the first valid data extraction submodule 312 may specifically include the following unit:
A first traversal judging unit, configured to traverse the ad data records in the current advertisement database and judge, one by one, whether the update time of the current master index update falls within the validity period of the current ad data record; if so, the ad data record is judged to be in the valid state and is put into the first set;
The second valid data extraction submodule 322 may specifically include the following unit:
A second traversal judging unit, configured to traverse the newly added ad data records and judge, one by one, whether the update time of the current incremental index update falls within the validity period of the current ad data record; if so, the ad data record is judged to be in the valid state and is put into the second set.
In a preferred embodiment of the present application, the first valid data extraction submodule 312 may further include the following unit:
A first converting unit, configured to convert the update time of the current master index update into a preset format and send the converted update time to the first traversal judging unit;
The second valid data extraction submodule 322 may further include the following unit:
A second converting unit, configured to convert the update time of the current incremental index update into a preset format and send the converted update time to the second traversal judging unit.
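The converting unit and the traversal judging unit can be sketched together. The application does not state what the preset format is; this sketch assumes, purely for illustration, a zero-padded "YYYY-MM-DD HH:MM:SS" string and validity bounds stored in the same format, so that plain string comparison agrees with chronological order. The names `PRESET_FORMAT`, `to_preset_format`, `extract_valid`, `valid_from`, and `valid_to` are hypothetical.

```python
from datetime import datetime

# Assumed preset format: fixed-width and zero-padded, so strings sort chronologically.
PRESET_FORMAT = "%Y-%m-%d %H:%M:%S"


def to_preset_format(update_time):
    """Converting unit: render the update time in the preset format."""
    return update_time.strftime(PRESET_FORMAT)


def extract_valid(records, update_time):
    """Traversal judging unit: keep records whose validity period covers the update time."""
    now = to_preset_format(update_time)
    valid = []
    for rec in records:  # judge the records one by one
        if rec["valid_from"] <= now <= rec["valid_to"]:
            valid.append(rec)
    return valid
```

The same function serves both extraction submodules: it is applied to the whole database for the master index and to the newly added records for the incremental index.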
To improve the retrieval performance of the sponsored search engine, the master index update module 31 may further include the following submodule:
A master index memory writing submodule, configured to judge whether the size of the master index built in the current update exceeds a preset threshold and, if not, to write the master index into memory;
The incremental index update module 32 may further include the following submodule:
An incremental index memory writing submodule, configured to write the incremental index built in the current update into memory.
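The two memory writing submodules can be sketched as follows. The indexes are modeled as byte strings and the memory as a dictionary; the function name `load_into_memory` and the choice to simply skip an oversized master index (leaving it wherever it was built or backed up) are assumptions of this sketch, since the text does not say what happens when the threshold is exceeded.

```python
def load_into_memory(master_bytes, incremental_bytes, threshold):
    """Return a dict standing in for the in-memory index store.

    The incremental index is always loaded; the master index is loaded only
    if its size does not exceed `threshold` (in bytes).
    """
    memory = {"incremental": incremental_bytes}
    if len(master_bytes) <= threshold:
        memory["master"] = master_bytes
    return memory
```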
In a specific application, the preset attribute of the ad data record is a full-text search attribute, that is, an attribute used for word-segmentation-based retrieval; the system may further include the following modules:
A modification processing module, configured to directly modify the corresponding attribute value of an ad data record in the index when a non-full-text search attribute of that ad data record in the advertisement database changes;
A deletion processing module, configured to set one of the non-full-text search attribute values of an ad data record in the index to invalid when that ad data record is deleted from the advertisement database.
Because the system embodiment is basically similar to the method embodiment, its description is relatively brief; for relevant details, refer to the corresponding parts of the description of the method embodiment.
The embodiments in this specification are described in a progressive manner. Each embodiment focuses on its differences from the other embodiments, and for the parts that are the same or similar, the embodiments may be referred to one another.
The present application can be used in numerous general-purpose or special-purpose computing system environments or configurations, for example: personal computers, server computers, handheld or portable devices, tablet devices, multiprocessor systems, microprocessor-based systems, set-top boxes, programmable consumer electronics, network PCs, minicomputers, mainframe computers, distributed computing environments that include any of the above systems or devices, and the like.
The present application may be described in the general context of computer-executable instructions executed by a computer, such as program modules. Generally, program modules include routines, programs, objects, components, data structures, and the like that perform particular tasks or implement particular abstract data types. The present application may also be practiced in distributed computing environments, in which tasks are performed by remote processing devices connected through a communication network. In a distributed computing environment, program modules may be located in both local and remote computer storage media, including storage devices.
Finally, it should also be noted that, in this document, relational terms such as "first" and "second" are used only to distinguish one entity or operation from another, and do not necessarily require or imply any actual relationship or order between these entities or operations. Moreover, the terms "comprise", "include", and any other variants thereof are intended to cover a non-exclusive inclusion, so that a process, method, article, or device that comprises a series of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article, or device. Without further limitation, an element defined by the statement "comprising a ..." does not exclude the presence of additional identical elements in the process, method, article, or device that comprises the element.
The index update method for ad data and the index update system for ad data provided by the present application have been described in detail above. Specific examples have been used herein to explain the principles and embodiments of the present application, and the description of the above embodiments is intended only to help in understanding the method of the present application and its core idea. At the same time, a person of ordinary skill in the art may, in accordance with the idea of the present application, make changes to the specific embodiments and the scope of application. In summary, the content of this specification should not be construed as limiting the present application.