CN105468637A - Database updating method and apparatus - Google Patents
- Publication number
- CN105468637A (application CN201410453679.XA)
- Authority
- CN
- China
- Prior art keywords
- value
- target web
- sid
- index mark
- mark
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Landscapes
- Information Transfer Between Computers (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
Embodiments of the present invention disclose a database updating method and apparatus, and relate to the technical field of software. The method comprises: acquiring webpage parameters of a target webpage; determining, according to the identifier of the target website corresponding to the target webpage, the data table corresponding to the target website in a database to be updated, wherein the database to be updated contains more than one data table; determining, according to the identifier of the target webpage within the target website, whether a data record corresponding to the target webpage exists in the data table corresponding to the target website, wherein the data record comprises the identifier of the target webpage within the target website and an index identifier corresponding to the target webpage; and, if not, generating the index identifier corresponding to the target webpage and adding the data record corresponding to the target webpage to the data table corresponding to the target website. Updating the database with the scheme provided by the embodiments of the present invention can improve the speed of database updates.
Description
Technical field
The present invention relates to the field of software technology, and in particular to a database updating method and apparatus.
Background art
With the rapid development of Internet technology, the information available on the Internet grows ever richer, and more and more users therefore find the information they need through search engines.
A search engine generally obtains the webpage information of target webpages through a web crawler and writes the obtained webpage information into the corresponding database, so as to provide users with more comprehensive search information. A web crawler is a program or script that automatically captures information from the network according to certain rules.
In the prior art, when a search engine updates the corresponding database with the webpage information of a target webpage obtained by the web crawler, it generally first judges, according to the URL information of the target webpage and the like, whether a data record corresponding to the target webpage exists in the database. If not, the data record corresponding to the target webpage is appended after the last stored data record. All data records are generally stored in a single data table of the database.
When the database holds little information, this approach updates the database quickly. However, as the web crawler obtains webpage information for more and more webpages, the database contains more and more data records. Each update therefore takes longer and longer to judge whether a data record corresponding to a given webpage exists in the database, and each database update becomes slower.
Summary of the invention
The embodiments of the present invention disclose a database updating method and apparatus, so as to improve the speed of updating a database.
To achieve the above object, an embodiment of the present invention discloses a database updating method, the method comprising:
Obtaining webpage parameters of a target webpage, wherein the webpage parameters comprise: the identifier of the target website corresponding to the target webpage and the identifier of the target webpage within the target website;
Determining, according to the identifier of the target website corresponding to the target webpage, the data table corresponding to the target website in a database to be updated, wherein the number of data tables contained in the database to be updated is greater than 1;
Judging, according to the identifier of the target webpage within the target website, whether a data record corresponding to the target webpage exists in the data table corresponding to the target website, wherein the data record comprises: the identifier of the target webpage within the target website and the index identifier corresponding to the target webpage;
If not, generating the index identifier corresponding to the target webpage, and adding the data record corresponding to the target webpage to the data table corresponding to the target website.
Optionally, before generating the index identifier corresponding to the target webpage, the method further comprises:
Setting an index-identifier generation flag to a locked state, wherein, when the index-identifier generation flag is in the locked state, only the index identifier corresponding to the target webpage can currently be calculated;
Judging, according to the identifier of the target webpage within the target website, whether a data record corresponding to the target webpage exists in the data table corresponding to the target website;
If not, performing the step of generating the index identifier corresponding to the target webpage;
After generating the index identifier corresponding to the target webpage, the method further comprises:
Setting the index-identifier generation flag to an unlocked state, wherein, when the index-identifier generation flag is in the unlocked state, calculation of the index identifiers corresponding to webpages other than the target webpage may begin.
Optionally, generating the index identifier corresponding to the target webpage comprises:
Obtaining the maximum value Value_SID_max of the index identifiers corresponding to the data records already stored in the database to be updated;
Calculating the remainder R of Value_SID_max divided by a preset first threshold;
Judging whether the remainder R is less than a preset second threshold;
If so, calculating the index identifier corresponding to the target webpage = R + the preset second threshold;
Otherwise, calculating the index identifier corresponding to the target webpage = R + a preset third threshold.
Optionally, obtaining the maximum value Value_SID_max of the index identifiers corresponding to the data records already stored in the database to be updated comprises:
Obtaining, according to a locally stored index identifier value Value_SID_L, the maximum value Value_SID_max of the index identifiers corresponding to the data records already stored in the database to be updated;
Wherein, after the update program corresponding to the database to be updated is started, the locally stored Value_SID_L is set to the value of Value_SID_D_max, Value_SID_D_max being the maximum value, as stored in the database to be updated, of the index identifiers corresponding to the data records stored in the database to be updated;
After the locally stored Value_SID_L is set to the value of Value_SID_D_max, the method further comprises:
Updating Value_SID_D_max to: the current Value_SID_D_max + a preset fourth threshold;
After the index identifier corresponding to the target webpage is calculated, the method further comprises:
Updating the value of Value_SID_L to the index identifier corresponding to the target webpage;
Judging whether the index identifier corresponding to the target webpage is greater than Value_SID_D_max;
If so, updating Value_SID_D_max to: the current Value_SID_D_max + the preset fourth threshold;
After the update program corresponding to the database to be updated terminates, the method further comprises:
Updating Value_SID_D_max to: Value_SID_D_max = Value_SID_L.
Optionally, after generating the index identifier corresponding to the target webpage, the method further comprises:
Sending the generated index identifier to the web crawler module or to a database other than the database to be updated.
To achieve the above object, an embodiment of the present invention discloses a database updating apparatus, the apparatus comprising:
A webpage parameter obtaining module, configured to obtain webpage parameters of a target webpage, wherein the webpage parameters comprise: the identifier of the target website corresponding to the target webpage and the identifier of the target webpage within the target website;
A data table determining module, configured to determine, according to the identifier of the target website corresponding to the target webpage, the data table corresponding to the target website in a database to be updated, wherein the number of data tables contained in the database to be updated is greater than 1;
A first data record judging module, configured to judge, according to the identifier of the target webpage within the target website, whether a data record corresponding to the target webpage exists in the data table corresponding to the target website, wherein the data record comprises: the identifier of the target webpage within the target website and the index identifier corresponding to the target webpage;
An index identifier generating module, configured to generate the index identifier corresponding to the target webpage when the judgment result of the first data record judging module is negative;
A data record adding module, configured to add the data record corresponding to the target webpage to the data table corresponding to the target website after the index identifier generating module generates the index identifier.
Optionally, the database updating apparatus further comprises:
A locked-state setting module, configured to set an index-identifier generation flag to a locked state, wherein, when the index-identifier generation flag is in the locked state, only the index identifier corresponding to the target webpage can currently be calculated;
A second data record judging module, configured to judge, according to the identifier of the target webpage within the target website, whether a data record corresponding to the target webpage exists in the data table corresponding to the target website, and, if not, to trigger the index identifier generating module to generate the index identifier;
An unlocked-state setting module, configured to set the index-identifier generation flag to an unlocked state after the index identifier generating module generates the index identifier, wherein, when the index-identifier generation flag is in the unlocked state, calculation of the index identifiers corresponding to webpages other than the target webpage may begin.
Optionally, the index identifier generating module comprises:
A maximum index identifier obtaining submodule, configured to obtain the maximum value Value_SID_max of the index identifiers corresponding to the data records already stored in the database to be updated;
A remainder calculating submodule, configured to calculate the remainder R of Value_SID_max divided by a preset first threshold;
A remainder judging submodule, configured to judge whether the remainder R is less than a preset second threshold;
A first index identifier calculating submodule, configured to calculate the index identifier corresponding to the target webpage = R + the preset second threshold when the judgment result of the remainder judging submodule is affirmative;
A second index identifier calculating submodule, configured to calculate the index identifier corresponding to the target webpage = R + a preset third threshold when the judgment result of the remainder judging submodule is negative.
Optionally, the maximum index identifier obtaining submodule is specifically configured to obtain, according to a locally stored index identifier value Value_SID_L, the maximum value Value_SID_max of the index identifiers corresponding to the data records already stored in the database to be updated;
The database updating apparatus further comprises:
A first index identifier setting module, configured to set the locally stored Value_SID_L to the value of Value_SID_D_max after the update program corresponding to the database to be updated is started, Value_SID_D_max being the maximum value, as stored in the database to be updated, of the index identifiers corresponding to the data records stored in the database to be updated;
A second index identifier setting module, configured to update Value_SID_D_max to: the current Value_SID_D_max + a preset fourth threshold, after the first index identifier setting module sets the locally stored index identifier;
A first index identifier updating module, configured to update the value of Value_SID_L to the index identifier corresponding to the target webpage after the first index identifier calculating submodule or the second index identifier calculating submodule calculates that index identifier;
An index identifier judging module, configured to judge whether the index identifier corresponding to the target webpage is greater than Value_SID_D_max;
A second index identifier updating module, configured to update Value_SID_D_max to: the current Value_SID_D_max + the preset fourth threshold, when the judgment result of the index identifier judging module is affirmative;
A third index identifier updating module, configured to update Value_SID_D_max to: Value_SID_D_max = Value_SID_L, after the update program corresponding to the database to be updated terminates.
Optionally, the database updating apparatus further comprises:
An index identifier sending module, configured to send the generated index identifier to the web crawler module or to a database other than the database to be updated after the index identifier generating module generates the index identifier.
As can be seen from the above, in the scheme provided by the embodiments of the present invention, after the webpage parameters of the target webpage are obtained, it is judged whether a data record corresponding to the target webpage exists in the data table corresponding to the target website in the database to be updated. If not, the index identifier corresponding to the target webpage is generated, and the data record corresponding to the target webpage is added to the data table corresponding to the target website. Compared with the prior art, the data records corresponding to the webpages are stored in multiple data tables rather than in a single data table. Therefore, when judging whether a data record corresponding to the target webpage exists in the database to be updated, the judgment is made only within the data table corresponding to the target webpage's website, without examining all the information contained in the database to be updated, which improves the speed of updating the database.
Brief description of the drawings
In order to describe the technical solutions of the embodiments of the present invention or of the prior art more clearly, the accompanying drawings needed in the description of the embodiments or of the prior art are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present invention, and a person of ordinary skill in the art may derive other drawings from them without creative effort.
Fig. 1 is a first schematic flowchart of the database updating method provided by an embodiment of the present invention;
Fig. 2 is a second schematic flowchart of the database updating method provided by an embodiment of the present invention;
Fig. 3 is a third schematic flowchart of the database updating method provided by an embodiment of the present invention;
Fig. 4 is a fourth schematic flowchart of the database updating method provided by an embodiment of the present invention;
Fig. 5 is a first schematic structural diagram of the database updating apparatus provided by an embodiment of the present invention;
Fig. 6 is a second schematic structural diagram of the database updating apparatus provided by an embodiment of the present invention;
Fig. 7 is a third schematic structural diagram of the database updating apparatus provided by an embodiment of the present invention;
Fig. 8 is a fourth schematic structural diagram of the database updating apparatus provided by an embodiment of the present invention.
Detailed description of the embodiments
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only some, rather than all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present invention without creative effort fall within the scope of protection of the present invention.
Fig. 1 is a first schematic flowchart of the database updating method provided by an embodiment of the present invention. The method comprises:
S101: Obtain the webpage parameters of the target webpage.
In practice, in order to provide users with a rich search service, a search engine generally obtains information about target webpages through a web crawler module and updates the corresponding database according to the obtained information. The information obtained by the web crawler module includes at least the URL information of the target webpage.
After the web crawler module obtains the information about the target webpage, the search engine can derive the webpage parameters of the target webpage from it. The webpage parameters may comprise: the identifier of the target website corresponding to the target webpage and the identifier of the target webpage within the target website.
Those skilled in the art will understand that the identifier of the target website corresponding to the target webpage can be obtained from the URL information of the target webpage.
Further, the identifier of the target webpage within the target website may be the URL of the target webpage, or the information that remains after the target website identifier is removed from the URL of the target webpage.
For example, if the URL of the target webpage obtained by the web crawler is http://item.jd.com/1184892.html, then the identifier of the target website corresponding to the target webpage is http://item.jd.com, and the information remaining after the target website identifier is removed is 1184892.
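The split described in S101 can be sketched as follows. This is only an illustrative sketch: the function name split_page_url and the rule of dropping a trailing ".html" suffix are assumptions made to reproduce the example above, not something the embodiment prescribes.

```python
from typing import Tuple
from urllib.parse import urlsplit


def split_page_url(url: str) -> Tuple[str, str]:
    """Return (site_id, page_id) for a page URL (hypothetical helper)."""
    parts = urlsplit(url)
    # Website identifier: scheme plus host, e.g. "http://item.jd.com".
    site_id = f"{parts.scheme}://{parts.netloc}"
    # Page identifier: what is left of the path once the website
    # identifier is removed. Stripping ".html" is an assumption that
    # matches the example in the text.
    page_id = parts.path.lstrip("/")
    if page_id.endswith(".html"):
        page_id = page_id[: -len(".html")]
    return site_id, page_id


print(split_page_url("http://item.jd.com/1184892.html"))
# prints ('http://item.jd.com', '1184892')
```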
S102: According to the identifier of the target website corresponding to the target webpage, determine the data table corresponding to the target website in the database to be updated.
As the web crawler module obtains more and more webpage information, the data tables that store webpage information in the database to be updated hold more and more data. In addition, updating the database first requires judging whether information about the target webpage corresponding to the information obtained by the web crawler module already exists in the database to be updated. Consequently, as the data stored in these tables grows, judging whether information about the target webpage exists becomes slower. To improve the efficiency of updating the database, the information in the database can be stored in different data tables according to certain rules, for example by storing the webpages of one or more websites in one data table.
In view of the above, in the present embodiment the number of data tables contained in the database to be updated is greater than 1.
In addition, in practice, to further improve the efficiency of updating the database, each data table contained in the database to be updated may correspond to the identifier of one website.
In one specific application, the database to be updated may also be stored in a distributed file system as a file of that distributed file system.
A distributed file system is one in which the physical storage resources managed by the file system are not necessarily attached directly to local storage devices, but may be connected to network storage devices through a computer network, so that files are stored across storage nodes. Each network storage device may be referred to as a storage node.
In a specific embodiment of the present invention, when determining the data table corresponding to the target website in the database to be updated according to the identifier of the target website fails, a data table corresponding to the target website may be created in the database to be updated. After the data table corresponding to the target website is successfully created, the step of generating the index identifier corresponding to the target webpage (S104) is performed.
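A minimal sketch of S102, including the fallback of creating a missing table, might look like the following. It assumes an SQLite database with one table per website; the table naming rule (a hash of the website identifier) and the column names page_id and index_id are hypothetical choices for illustration only.

```python
import hashlib
import sqlite3
from typing import Tuple


def table_name_for(site_id: str) -> str:
    # Hypothetical naming rule: hash the website identifier so that the
    # result is always a safe SQL identifier.
    return "site_" + hashlib.sha256(site_id.encode("utf-8")).hexdigest()[:16]


def find_or_create_site_table(conn: sqlite3.Connection,
                              site_id: str) -> Tuple[str, bool]:
    """Return (table_name, created) for the website's data table."""
    name = table_name_for(site_id)
    row = conn.execute(
        "SELECT 1 FROM sqlite_master WHERE type = 'table' AND name = ?",
        (name,),
    ).fetchone()
    if row is not None:
        return name, False  # the table was determined successfully
    # Determination failed: create the table for this website.
    conn.execute(
        f'CREATE TABLE "{name}" ('
        "page_id TEXT PRIMARY KEY, index_id INTEGER NOT NULL)"
    )
    return name, True


conn = sqlite3.connect(":memory:")
print(find_or_create_site_table(conn, "http://item.jd.com"))
```

Keeping one table per website means the existence check in S103 only has to search the records of a single website.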
S103: According to the identifier of the target webpage within the target website, judge whether a data record corresponding to the target webpage exists in the data table corresponding to the target website; if not, perform S104; otherwise, perform S106.
The data record may comprise: the identifier of the target webpage within the target website and the index identifier corresponding to the target webpage. Of course, the present application does not limit the information included in the data record.
When a user browses a webpage from the search results provided by a search engine, the search engine can supply the URL information of that webpage to the browser according to the index identifier corresponding to the webpage the user wants to browse.
For example, when a user searches for "web crawler" through a search engine, the search engine can present the user with a number of webpage summaries related to "web crawler". Each webpage summary contains the index identifier corresponding to its webpage, although this index identifier may or may not be displayed to the user. When the user clicks a webpage summary, the search engine obtains the index identifier corresponding to that webpage, retrieves the URL information of the webpage from the corresponding database according to that index identifier, and finally sends the URL information to the browser, so that the browser displays the webpage to the user according to it.
The above is merely one specific application scenario of the index identifier corresponding to the target webpage, and the present application is not limited to it.
S104: Generate the index identifier corresponding to the target webpage.
S105: Add the data record corresponding to the target webpage to the data table corresponding to the target website.
Preferably, in a specific implementation of the present invention, after the index identifier corresponding to the target webpage is generated, the method may further comprise: sending the generated index identifier to the web crawler module or to a database other than the database to be updated.
In practice, the database used to provide webpage URL information to users may or may not be the same database as the database to be updated. When it is not, after the index identifier corresponding to the target webpage is generated, the generated index identifier can be sent to the web crawler module or to the database other than the database to be updated, so that the data record corresponding to the target webpage in the other database is updated.
It should be noted that, after receiving the index identifier corresponding to the target webpage, the web crawler module also needs to send the received identifier information to the corresponding database, so as to update the data record corresponding to the target webpage in that database.
S106: End the process.
As can be seen from the above, in the scheme provided by the present embodiment, after the webpage parameters of the target webpage are obtained, it is judged whether a data record corresponding to the target webpage exists in the data table corresponding to the target website in the database to be updated. If not, the index identifier corresponding to the target webpage is generated, and the data record corresponding to the target webpage is added to the data table corresponding to the target website. Compared with the prior art, the data records corresponding to the webpages are stored in multiple data tables rather than in a single data table. Therefore, when judging whether a data record corresponding to the target webpage exists in the database to be updated, the judgment is made only within the data table corresponding to the target webpage's website, without examining all the information contained in the database to be updated, which improves the speed of updating the database.
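The flow S101 to S106 can be summarized in a short in-memory sketch. The class name PageDatabase, the use of nested dictionaries as data tables, and the simple counter standing in for index identifier generation (S104) are all assumptions for illustration; the embodiment does not specify a storage engine.

```python
from typing import Dict, Optional


class PageDatabase:
    """Hypothetical in-memory stand-in for the database to be updated."""

    def __init__(self) -> None:
        # One data table per website: site_id -> {page_id: index_id}.
        self.tables: Dict[str, Dict[str, int]] = {}
        self._next_index_id = 1  # placeholder for S104, not the patent's rule

    def update(self, site_id: str, page_id: str) -> Optional[int]:
        # S102: determine (or create) the data table for the website.
        table = self.tables.setdefault(site_id, {})
        # S103: look only inside that website's table.
        if page_id in table:
            return None  # S106: a record already exists, end the process
        # S104: generate an index identifier (placeholder counter).
        index_id = self._next_index_id
        self._next_index_id += 1
        # S105: add the data record to the website's table.
        table[page_id] = index_id
        return index_id


db = PageDatabase()
print(db.update("http://item.jd.com", "1184892"))  # prints 1
print(db.update("http://item.jd.com", "1184892"))  # prints None
```

The point of the sketch is the lookup scope: the existence check touches one website's table, however many records the other tables hold.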
In a specific embodiment of the present invention, referring to Fig. 2, a second schematic flowchart of the database updating method is provided. Compared with the previous embodiment, in the present embodiment,
Before the index identifier corresponding to the target webpage is generated (S104), the method further comprises:
S107: Set the index-identifier generation flag to the locked state.
After the update program corresponding to the database to be updated is started, the web crawler module may obtain webpage information many times, so the search engine may obtain the webpage parameters of multiple webpages from the webpage information obtained by the web crawler module. When S103 determines that no data record corresponding to each of these webpages exists in the corresponding data table of the database to be updated, the index-identifier generation flag is used so that the index identifier of only one webpage is calculated at a time, which prevents the index identifiers generated for the webpages in S104 from being identical. That is, when the index-identifier generation flag is in the locked state, only the index identifier corresponding to the target webpage can currently be calculated.
In practice, the index-identifier generation flag may be set to 1 to indicate the locked state and to 0 to indicate the unlocked state. Of course, this is only an example, and the present application does not limit the specific values that represent the locked and unlocked states.
S108: According to the identifier of the target webpage within the target website, judge whether a data record corresponding to the target webpage exists in the data table corresponding to the target website; if not, perform S104.
In practice, the search engine may obtain the webpage parameters of multiple webpages from the webpage information obtained by the web crawler module, and two or more of these webpages may be identical. In this case, the judgment in S103 may find that no data record corresponding to the target webpage exists in the data table corresponding to the target website, yet by the time the index identifier corresponding to the target webpage is generated in S104, such a data record already exists in that data table. To handle this situation, after the index-identifier generation flag is set to the locked state, it is necessary to judge again, according to the identifier of the target webpage within the target website, whether a data record corresponding to the target webpage exists in the data table corresponding to the target website.
After the index identifier corresponding to the target webpage is generated (S104), the method further comprises:
S109: Set the index-identifier generation flag to the unlocked state.
When the index-identifier generation flag is in the unlocked state, calculation of the index identifiers corresponding to webpages other than the target webpage may begin.
It should be noted that the present application does not limit the execution order of S109 and S105: S109 may be performed before S105, after S105, or at the same time as S105.
As can be seen from the above, in the scheme provided by the present embodiment, before the index identifier corresponding to the target webpage is generated, the index-identifier generation flag is set so that only the index identifier corresponding to the target webpage can currently be calculated, which effectively prevents the index identifiers generated for different webpages from being duplicated.
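The lock, re-check, generate, unlock sequence (S107, S108, S104, S109) resembles a double-checked pattern. The sketch below is an assumption-laden illustration: it uses a threading.Lock in place of the 0/1 generation flag, a counter in place of the real index identifier rule, and performs S105 while the lock is still held, which is one of the orderings the text permits. The class name LockedIndexAllocator is hypothetical.

```python
import threading
from typing import Dict


class LockedIndexAllocator:
    """Hypothetical single-table allocator guarded by a generation lock."""

    def __init__(self) -> None:
        self._flag = threading.Lock()  # stands in for the generation flag
        self._last_id = 0              # placeholder for the real S104 rule
        self.table: Dict[str, int] = {}

    def add_page(self, page_id: str) -> int:
        # S103: first check, made without holding the lock.
        if page_id in self.table:
            return self.table[page_id]
        with self._flag:  # S107: locked state
            # S108: check again, because another worker may have added
            # the same page between S103 and S107.
            if page_id not in self.table:
                self._last_id += 1                    # S104 (placeholder)
                self.table[page_id] = self._last_id   # S105
            return self.table[page_id]
        # S109: the lock is released when the "with" block exits.


alloc = LockedIndexAllocator()
pages = [str(n) for n in range(100)]
workers = [
    threading.Thread(target=lambda: [alloc.add_page(p) for p in pages])
    for _ in range(8)
]
for w in workers:
    w.start()
for w in workers:
    w.join()
print(len(alloc.table), len(set(alloc.table.values())))
```

Eight workers submit the same 100 pages, yet each page ends up with exactly one record and no two pages share an index identifier.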
In another specific embodiment of the present invention, referring to Fig. 3, a third schematic flowchart of the database updating method is provided. Compared with the previous embodiment, in the present embodiment, generating the index identifier corresponding to the target webpage (S104) comprises:
S1041: Obtain the maximum value Value_SID_max of the index identifiers corresponding to the data records already stored in the database to be updated.
In practice, Value_SID_max can be obtained from the database to be updated.
Of course, to avoid problems such as excessive load on the database to be updated caused by frequent reads from it, the maximum value of the index identifiers corresponding to the data records stored in the database to be updated may instead be stored locally after the update program corresponding to the database to be updated is started. Each time S1041 is performed, Value_SID_max is then obtained from this locally stored maximum value. For a specific implementation, refer to the embodiment shown in Fig. 4.
S1042: Calculate the remainder R of Value_SID_max divided by the preset first threshold.
S1043: Judge whether the remainder R is less than the preset second threshold; if so, perform S1044; otherwise, perform S1045.
S1044: Calculate the index identifier corresponding to the target webpage = R + the preset second threshold.
S1045: Calculate the index identifier corresponding to the target webpage = R + the preset third threshold.
A specific example follows.
Suppose the preset first threshold is 10000000, the preset second threshold is 1000000, and the preset third threshold is 1.
S1041: Obtain Value_SID_max = 1000001;
S1042: The remainder of 1000001 divided by 10000000 is 1000001, so R = 1000001;
S1043: 1000001 is greater than 1000000, that is, R is not less than the second threshold, so S1045 is performed;
S1045: The index identifier corresponding to the target webpage = 1000001 + 1 = 1000002.
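Steps S1042 to S1045 translate directly into a few lines of code. The function and parameter names below are hypothetical; the arithmetic follows the steps as stated, and the printed value reproduces the worked example above.

```python
def next_index_id(value_sid_max: int,
                  first_threshold: int,
                  second_threshold: int,
                  third_threshold: int) -> int:
    """Compute the index identifier for a new page (S1042 to S1045)."""
    # S1042: remainder of the current maximum divided by the first threshold.
    r = value_sid_max % first_threshold
    # S1043: compare the remainder with the second threshold.
    if r < second_threshold:
        return r + second_threshold  # S1044
    return r + third_threshold       # S1045


# Thresholds taken from the worked example in the text.
print(next_index_id(1000001, 10000000, 1000000, 1))  # prints 1000002
```

With these thresholds, a small remainder is lifted into the range starting at 1000000, and any remainder at or above 1000000 is simply incremented by 1.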
In another specific embodiment of the present invention, see Fig. 4, provide the 4th kind of schematic flow sheet of database update method, compared with embodiment illustrated in fig. 3, in the present embodiment, the method also comprises:
S110: after starting refresh routine corresponding to database to be updated, the Value_SID_L that this locality stores is set to: Value_SID_D
maxvalue.
Wherein, Value_SID_D
maxfor the maximal value of index mark corresponding to the data record stored in the database to be updated that stores in database to be updated.
S111: by Value_SID_D
maxbe updated to: current Value_SID_D
max+ preset the 4th threshold value.
Obtain the maximal value Value_SID of index mark corresponding to the data record that stored in database to be updated
max(S1041), comprising:
S10411: the index ident value Value_SID_L stored according to this locality, obtain the maximal value Value_SID of index mark corresponding to the data record that stored in database to be updated
max.
In the present embodiment, can after S105 increases data record corresponding to target web in the tables of data that targeted website is corresponding, upgrade the local index ident value Value_SID_L stored, so that generate index mark corresponding to other webpages, certainly, also Value_SID_L can not be upgraded, but record the update status of database to be updated by other means, such as, the number etc. of the data record increased after starting refresh routine corresponding to database to be updated by a counting variable record.
After the index identifier corresponding to the target webpage is calculated according to S1044 or S1045, the method further comprises:
S112: updating the value of Value_SID_L to the index identifier corresponding to the target webpage.
S113: judging whether the index identifier corresponding to the target webpage is greater than Value_SID_D_max; if yes, performing S114.
S114: updating Value_SID_D_max to: current Value_SID_D_max + preset fourth threshold.
Preferably, the preset fourth threshold may be 10000. The present application does not limit the value of the preset fourth threshold, which can be determined according to the specific situation in practical applications.
It should be noted that the present application does not limit the execution order of S105 and S112-S114: S105 may be performed before S112-S114, after S112-S114, or simultaneously with any one of the steps S112-S114.
S115: after the update program corresponding to the database to be updated ends, updating Value_SID_D_max to: Value_SID_D_max = Value_SID_L.
As can be seen from the above, in the scheme provided by this embodiment, the maximum value of the index identifiers corresponding to the data records already stored in the database to be updated is obtained locally, which prevents frequent access to the database to be updated and avoids placing excessive load on it.
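A minimal sketch of the bookkeeping in S110 to S115 follows, assuming the database to be updated is represented by a plain dictionary that holds Value_SID_D_max. The class name `LocalIndexIdCache`, the local `reserved_max` mirror, the `db_writes` counter and the default generator (previous value plus 1) are illustrative assumptions and are not part of the described scheme.

```python
from typing import Callable, Dict


class LocalIndexIdCache:
    """Sketch of S110-S115; the dict `db` stands in for the database to be updated."""

    def __init__(self, db: Dict[str, int], fourth_threshold: int = 10_000,
                 generate: Callable[[int], int] = lambda v: v + 1) -> None:
        self.db = db
        self.fourth_threshold = fourth_threshold
        self.generate = generate  # assumed stand-in for S1042-S1045
        # S110: Value_SID_L := Value_SID_D_max (single read at start-up).
        self.value_sid_l = db["Value_SID_D_max"]
        # S111: Value_SID_D_max := current Value_SID_D_max + fourth threshold.
        self.reserved_max = self.value_sid_l + fourth_threshold
        db["Value_SID_D_max"] = self.reserved_max
        self.db_writes = 1

    def next_index_id(self) -> int:
        # S10411: the local value is used as Value_SID_max.
        index_id = self.generate(self.value_sid_l)
        # S112: Value_SID_L := the new index identifier.
        self.value_sid_l = index_id
        # S113/S114: raise Value_SID_D_max only when it has been exceeded.
        if index_id > self.reserved_max:
            self.reserved_max += self.fourth_threshold
            self.db["Value_SID_D_max"] = self.reserved_max
            self.db_writes += 1
        return index_id

    def close(self) -> None:
        # S115: Value_SID_D_max := Value_SID_L when the update program ends.
        self.db["Value_SID_D_max"] = self.value_sid_l
        self.db_writes += 1


db = {"Value_SID_D_max": 100}
cache = LocalIndexIdCache(db, fourth_threshold=3)
ids = [cache.next_index_id() for _ in range(5)]
cache.close()
print(ids, db["Value_SID_D_max"], cache.db_writes)
```

In this run, five identifiers are generated with three writes to the stand-in database (start-up, one extension of the reserved range, and shutdown) instead of one access per identifier. A larger fourth threshold makes the extension step rarer.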
Corresponding to the above database updating method, an embodiment of the present invention further provides a database updating apparatus.
Fig. 5 is a first schematic structural diagram of the database updating apparatus provided by an embodiment of the present invention. The apparatus comprises: a webpage parameter obtaining module 501, a data table determining module 502, a first data record judging module 503, an index identifier generating module 504 and a data record adding module 505.
The webpage parameter obtaining module 501 is configured to obtain a webpage parameter of a target webpage, wherein the webpage parameter comprises: the identifier of the target website corresponding to the target webpage and the identifier of the target webpage in the target website;
The data table determining module 502 is configured to determine, according to the identifier of the target website corresponding to the target webpage, the data table corresponding to the target website in the database to be updated, wherein the number of data tables comprised in the database to be updated is greater than 1;
The first data record judging module 503 is configured to judge, according to the identifier of the target webpage in the target website, whether a data record corresponding to the target webpage exists in the data table corresponding to the target website, wherein the data record comprises: the identifier of the target webpage in the target website and the index identifier corresponding to the target webpage;
The index identifier generating module 504 is configured to generate the index identifier corresponding to the target webpage when the judgment result of the first data record judging module 503 is no;
The data record adding module 505 is configured to add the data record corresponding to the target webpage to the data table corresponding to the target website after the index identifier generating module 504 generates the index identifier.
In a preferred embodiment of the present invention, the database updating apparatus further comprises an index identifier sending module (not shown in the figure).
The index identifier sending module is configured to send the generated index identifier to a web crawler module or to a database other than the database to be updated after the index identifier generating module 504 generates the index identifier.
As can be seen from the above, in the scheme provided by this embodiment, after the webpage parameter of the target webpage is obtained, it is judged whether a data record corresponding to the target webpage exists in the data table corresponding to the target website in the database to be updated; if not, the index identifier corresponding to the target webpage is generated and the data record corresponding to the target webpage is added to the data table corresponding to the target website. Compared with the prior art, in the scheme provided by this embodiment the data records corresponding to the webpages are stored in multiple data tables rather than in a single data table. Therefore, when judging whether a data record corresponding to the target webpage exists in the database to be updated, the judgment is made only within the data table corresponding to the target website, without searching all the information comprised in the database to be updated, so the speed of updating the database can be improved.
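The overall flow summarized above can be sketched as follows, assuming an in-memory layout in which each website identifier maps to its own data table. The names `update_database`, `Database` and `new_index_id`, the creation of a missing table on first use, and the simple counter used as the identifier source are all assumptions made for illustration.

```python
import itertools
from typing import Callable, Dict

# Hypothetical in-memory layout: website identifier -> data table,
# where each data table maps webpage identifier -> index identifier.
Database = Dict[str, Dict[str, int]]


def update_database(db: Database, site_id: str, page_id: str,
                    new_index_id: Callable[[], int]) -> int:
    """Return the page's index identifier, adding a data record if none exists."""
    # Determine the data table for the target website (created here if absent,
    # which is an assumption of this sketch).
    table = db.setdefault(site_id, {})
    # Judge within that one table only whether the record already exists.
    if page_id in table:
        return table[page_id]
    # Generate the index identifier and add the data record.
    index_id = new_index_id()
    table[page_id] = index_id
    return index_id


counter = itertools.count(1)  # placeholder identifier source
db: Database = {}
first = update_database(db, "site-a", "/news/1", lambda: next(counter))
again = update_database(db, "site-a", "/news/1", lambda: next(counter))
other = update_database(db, "site-b", "/news/1", lambda: next(counter))
print(first, again, other)  # 1 1 2
```

The existence check touches only the table selected by the website identifier, which is the property the embodiment relies on for faster updates.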
In a specific embodiment of the present invention, referring to Fig. 6, a second schematic structural diagram of the database updating apparatus is provided. Compared with the previous embodiment, in this embodiment the apparatus further comprises: a locked state setting module 506, a second data record judging module 507 and an unlocked state setting module 508.
The locked state setting module 506 is configured to set the index identifier generation flag bit to a locked state, wherein when the index identifier generation flag bit is in the locked state, only the index identifier corresponding to the target webpage can currently be calculated;
The second data record judging module 507 is configured to judge, according to the identifier of the target webpage in the target website, whether a data record corresponding to the target webpage exists in the data table corresponding to the target website, and if not, to trigger the index identifier generating module 504 to generate the index identifier;
The unlocked state setting module 508 is configured to set the index identifier generation flag bit to an unlocked state after the index identifier generating module 504 generates the index identifier, wherein when the index identifier generation flag bit is in the unlocked state, calculation of the index identifiers corresponding to webpages other than the target webpage can begin.
As can be seen from the above, in the scheme provided by this embodiment, before the index identifier corresponding to the target webpage is generated, the index identifier generation flag bit is set so that only the index identifier corresponding to the target webpage can currently be calculated, which effectively prevents the index identifiers generated for different webpages from being duplicated.
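One way to picture the flag bit is as a mutual-exclusion lock with a second existence check made while the lock is held. The sketch below uses Python's `threading.Lock` for that role; the function name `add_record_locked` and the choice to add the record before unlocking are assumptions of this sketch, since the text only requires that unlocking happen after the index identifier is generated.

```python
import threading
from typing import Callable, Dict

# Plays the role of the index identifier generation flag bit.
_generation_lock = threading.Lock()


def add_record_locked(table: Dict[str, int], page_id: str,
                      new_index_id: Callable[[], int]) -> int:
    # First existence check, made without holding the lock.
    if page_id in table:
        return table[page_id]
    # Locked state: only this page's index identifier may be computed.
    with _generation_lock:
        # Second check: another thread may have added the record meanwhile.
        if page_id not in table:
            table[page_id] = new_index_id()
        # Leaving the block returns the flag to the unlocked state.
        return table[page_id]
```

Because identifiers are generated only while the lock is held and only after the second check fails to find a record, two threads handling the same page cannot both generate an identifier for it.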
In another specific embodiment of the present invention, referring to Fig. 7, a third schematic structural diagram of the database updating apparatus is provided. Compared with the previous embodiment, in this embodiment the index identifier generating module 504 comprises: an index identifier maximum value obtaining submodule 5041, a remainder calculating submodule 5042, a remainder judging submodule 5043, a first index identifier calculating submodule 5044 and a second index identifier calculating submodule 5045.
The index identifier maximum value obtaining submodule 5041 is configured to obtain the maximum value Value_SID_max of the index identifiers corresponding to the data records already stored in the database to be updated;
The remainder calculating submodule 5042 is configured to calculate the remainder R of Value_SID_max divided by the preset first threshold;
The remainder judging submodule 5043 is configured to judge whether the remainder R is less than the preset second threshold;
The first index identifier calculating submodule 5044 is configured to calculate, when the judgment result of the remainder judging submodule 5043 is yes, the index identifier corresponding to the target webpage = R + the preset second threshold;
The second index identifier calculating submodule 5045 is configured to calculate, when the judgment result of the remainder judging submodule 5043 is no, the index identifier corresponding to the target webpage = R + the preset third threshold.
In another specific embodiment of the present invention, referring to Fig. 8, a fourth schematic structural diagram of the database updating apparatus is provided. Compared with the embodiment shown in Fig. 7, in this embodiment:
The index identifier maximum value obtaining submodule 5041 is specifically configured to obtain, according to the locally stored index identifier value Value_SID_L, the maximum value Value_SID_max of the index identifiers corresponding to the data records already stored in the database to be updated;
In addition, the database updating apparatus further comprises: a first index identifier setting module 509, a second index identifier setting module 510, a first index identifier updating module 511, an index identifier judging module 512, a second index identifier updating module 513 and a third index identifier updating module 514.
The first index identifier setting module 509 is configured to set, after the update program corresponding to the database to be updated is started, the locally stored Value_SID_L to the value of Value_SID_D_max, where Value_SID_D_max is the maximum value, stored in the database to be updated, of the index identifiers corresponding to the data records already stored in the database to be updated;
The second index identifier setting module 510 is configured to update, after the first index identifier setting module 509 sets the locally stored index identifier, Value_SID_D_max to: current Value_SID_D_max + preset fourth threshold;
The first index identifier updating module 511 is configured to update the value of Value_SID_L to the index identifier corresponding to the target webpage after the first index identifier calculating submodule 5044 or the second index identifier calculating submodule 5045 calculates that index identifier;
The index identifier judging module 512 is configured to judge whether the index identifier corresponding to the target webpage is greater than Value_SID_D_max;
The second index identifier updating module 513 is configured to update, when the judgment result of the index identifier judging module 512 is yes, Value_SID_D_max to: current Value_SID_D_max + preset fourth threshold;
The third index identifier updating module 514 is configured to update, after the update program corresponding to the database to be updated ends, Value_SID_D_max to: Value_SID_D_max = Value_SID_L.
As can be seen from the above, in the scheme provided by this embodiment, the maximum value of the index identifiers corresponding to the data records already stored in the database to be updated is obtained locally, which prevents frequent access to the database to be updated and avoids placing excessive load on it.
Since the apparatus embodiments are substantially similar to the method embodiments, they are described relatively briefly; for relevant details, refer to the description of the method embodiments.
It should be noted that, in this document, relational terms such as first and second are used only to distinguish one entity or operation from another, and do not necessarily require or imply any actual relationship or order between these entities or operations. Moreover, the terms "comprise", "include" or any other variant thereof are intended to cover a non-exclusive inclusion, so that a process, method, article or device comprising a series of elements comprises not only those elements but also other elements not explicitly listed, or elements inherent to such a process, method, article or device. Without further limitation, an element defined by the statement "comprising a ..." does not exclude the existence of other identical elements in the process, method, article or device that comprises the element.
Those of ordinary skill in the art will understand that all or part of the steps in the above method embodiments can be implemented by a program instructing relevant hardware. The program can be stored in a computer-readable storage medium, such as a ROM/RAM, a magnetic disk or an optical disc.
The above are only preferred embodiments of the present invention and are not intended to limit its protection scope. Any modification, equivalent replacement or improvement made within the spirit and principles of the present invention falls within the protection scope of the present invention.
Claims (10)
1. A database updating method, characterized in that the method comprises:
Obtaining a webpage parameter of a target webpage, wherein the webpage parameter comprises: an identifier of a target website corresponding to the target webpage and an identifier of the target webpage in the target website;
Determining, according to the identifier of the target website corresponding to the target webpage, a data table corresponding to the target website in a database to be updated, wherein the number of data tables comprised in the database to be updated is greater than 1;
Judging, according to the identifier of the target webpage in the target website, whether a data record corresponding to the target webpage exists in the data table corresponding to the target website, wherein the data record comprises: the identifier of the target webpage in the target website and an index identifier corresponding to the target webpage;
If not, generating the index identifier corresponding to the target webpage, and adding the data record corresponding to the target webpage to the data table corresponding to the target website.
2. The method according to claim 1, characterized in that,
Before generating the index identifier corresponding to the target webpage, the method further comprises:
Setting an index identifier generation flag bit to a locked state, wherein when the index identifier generation flag bit is in the locked state, only the index identifier corresponding to the target webpage can currently be calculated;
Judging, according to the identifier of the target webpage in the target website, whether a data record corresponding to the target webpage exists in the data table corresponding to the target website;
If not, performing the step of generating the index identifier corresponding to the target webpage;
After generating the index identifier corresponding to the target webpage, the method further comprises:
Setting the index identifier generation flag bit to an unlocked state, wherein when the index identifier generation flag bit is in the unlocked state, calculation of the index identifiers corresponding to webpages other than the target webpage can begin.
3. The method according to claim 1, characterized in that generating the index identifier corresponding to the target webpage comprises:
Obtaining the maximum value Value_SID_max of the index identifiers corresponding to the data records already stored in the database to be updated;
Calculating the remainder R of Value_SID_max divided by a preset first threshold;
Judging whether the remainder R is less than a preset second threshold;
If yes, calculating the index identifier corresponding to the target webpage = R + the preset second threshold;
Otherwise, calculating the index identifier corresponding to the target webpage = R + a preset third threshold.
4. The method according to claim 3, characterized in that obtaining the maximum value Value_SID_max of the index identifiers corresponding to the data records already stored in the database to be updated comprises:
Obtaining, according to a locally stored index identifier value Value_SID_L, the maximum value Value_SID_max of the index identifiers corresponding to the data records already stored in the database to be updated;
Wherein, after an update program corresponding to the database to be updated is started, the locally stored Value_SID_L is set to the value of Value_SID_D_max, Value_SID_D_max being the maximum value, stored in the database to be updated, of the index identifiers corresponding to the data records already stored in the database to be updated;
After the locally stored Value_SID_L is set to the value of Value_SID_D_max, the method further comprises:
Updating Value_SID_D_max to: current Value_SID_D_max + preset fourth threshold;
After the index identifier corresponding to the target webpage is calculated, the method further comprises:
Updating the value of Value_SID_L to the index identifier corresponding to the target webpage;
Judging whether the index identifier corresponding to the target webpage is greater than Value_SID_D_max;
If yes, updating Value_SID_D_max to: current Value_SID_D_max + preset fourth threshold;
After the update program corresponding to the database to be updated ends, the method further comprises:
Updating Value_SID_D_max to: Value_SID_D_max = Value_SID_L.
5. The method according to claim 1, characterized in that, after generating the index identifier corresponding to the target webpage, the method further comprises:
Sending the generated index identifier to a web crawler module or to a database other than the database to be updated.
6. A database updating apparatus, characterized in that the apparatus comprises:
A webpage parameter obtaining module, configured to obtain a webpage parameter of a target webpage, wherein the webpage parameter comprises: an identifier of a target website corresponding to the target webpage and an identifier of the target webpage in the target website;
A data table determining module, configured to determine, according to the identifier of the target website corresponding to the target webpage, a data table corresponding to the target website in a database to be updated, wherein the number of data tables comprised in the database to be updated is greater than 1;
A first data record judging module, configured to judge, according to the identifier of the target webpage in the target website, whether a data record corresponding to the target webpage exists in the data table corresponding to the target website, wherein the data record comprises: the identifier of the target webpage in the target website and an index identifier corresponding to the target webpage;
An index identifier generating module, configured to generate the index identifier corresponding to the target webpage when the judgment result of the first data record judging module is no;
A data record adding module, configured to add the data record corresponding to the target webpage to the data table corresponding to the target website after the index identifier generating module generates the index identifier.
7. The apparatus according to claim 6, characterized in that the apparatus further comprises:
A locked state setting module, configured to set an index identifier generation flag bit to a locked state, wherein when the index identifier generation flag bit is in the locked state, only the index identifier corresponding to the target webpage can currently be calculated;
A second data record judging module, configured to judge, according to the identifier of the target webpage in the target website, whether a data record corresponding to the target webpage exists in the data table corresponding to the target website, and if not, to trigger the index identifier generating module to generate the index identifier;
An unlocked state setting module, configured to set the index identifier generation flag bit to an unlocked state after the index identifier generating module generates the index identifier, wherein when the index identifier generation flag bit is in the unlocked state, calculation of the index identifiers corresponding to webpages other than the target webpage can begin.
8. The apparatus according to claim 6, characterized in that the index identifier generating module comprises:
An index identifier maximum value obtaining submodule, configured to obtain the maximum value Value_SID_max of the index identifiers corresponding to the data records already stored in the database to be updated;
A remainder calculating submodule, configured to calculate the remainder R of Value_SID_max divided by a preset first threshold;
A remainder judging submodule, configured to judge whether the remainder R is less than a preset second threshold;
A first index identifier calculating submodule, configured to calculate, when the judgment result of the remainder judging submodule is yes, the index identifier corresponding to the target webpage = R + the preset second threshold;
A second index identifier calculating submodule, configured to calculate, when the judgment result of the remainder judging submodule is no, the index identifier corresponding to the target webpage = R + a preset third threshold.
9. The apparatus according to claim 8, characterized in that,
The index identifier maximum value obtaining submodule is specifically configured to obtain, according to a locally stored index identifier value Value_SID_L, the maximum value Value_SID_max of the index identifiers corresponding to the data records already stored in the database to be updated;
The apparatus further comprises:
A first index identifier setting module, configured to set, after an update program corresponding to the database to be updated is started, the locally stored Value_SID_L to the value of Value_SID_D_max, Value_SID_D_max being the maximum value, stored in the database to be updated, of the index identifiers corresponding to the data records already stored in the database to be updated;
A second index identifier setting module, configured to update, after the first index identifier setting module sets the locally stored index identifier, Value_SID_D_max to: current Value_SID_D_max + preset fourth threshold;
A first index identifier updating module, configured to update the value of Value_SID_L to the index identifier corresponding to the target webpage after the first index identifier calculating submodule or the second index identifier calculating submodule calculates that index identifier;
An index identifier judging module, configured to judge whether the index identifier corresponding to the target webpage is greater than Value_SID_D_max;
A second index identifier updating module, configured to update, when the judgment result of the index identifier judging module is yes, Value_SID_D_max to: current Value_SID_D_max + preset fourth threshold;
A third index identifier updating module, configured to update, after the update program corresponding to the database to be updated ends, Value_SID_D_max to: Value_SID_D_max = Value_SID_L.
10. The apparatus according to claim 6, characterized in that the apparatus further comprises:
An index identifier sending module, configured to send the generated index identifier to a web crawler module or to a database other than the database to be updated after the index identifier generating module generates the index identifier.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201410453679.XA CN105468637A (en) | 2014-09-05 | 2014-09-05 | Database updating method and apparatus |
Publications (1)
Publication Number | Publication Date |
---|---|
CN105468637A true CN105468637A (en) | 2016-04-06 |
Family
ID=55606342
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201410453679.XA Pending CN105468637A (en) | 2014-09-05 | 2014-09-05 | Database updating method and apparatus |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN105468637A (en) |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102495892A (en) * | 2011-12-09 | 2012-06-13 | 北京大学 | Webpage information extraction method |
CN102724295A (en) * | 2012-05-24 | 2012-10-10 | 中国电子科技集团公司第十五研究所 | Data synchronization method and system |
CN102831252A (en) * | 2012-09-21 | 2012-12-19 | 北京奇虎科技有限公司 | Method and device for updating index database and search method and system |
US20140229450A1 (en) * | 2013-01-23 | 2014-08-14 | Michael Brekelmans | Method for Organising Blog Articles in a Social Network |
CN104021192A (en) * | 2014-06-13 | 2014-09-03 | 北京联时空网络通信设备有限公司 | Database renewing method and device |
Non-Patent Citations (1)
Title |
---|
施威铭研究室: "《完全掌握Microsoft access2002中文版标准教程》", 31 January 2002, 中国青年出版社 * |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106649576A (en) * | 2016-11-15 | 2017-05-10 | 北京集奥聚合科技有限公司 | Storing method and system for e-commerce commodities crawled by crawlers |
CN106980673A (en) * | 2017-03-27 | 2017-07-25 | 恒生电子股份有限公司 | Main memory database table index updating method and system |
CN106980673B (en) * | 2017-03-27 | 2021-03-02 | 恒生电子股份有限公司 | Method and system for updating internal memory database table index |
CN109948100A (en) * | 2019-03-12 | 2019-06-28 | 深圳市商舟网科技有限公司 | Page info processing method, device, computer equipment and storage medium |
CN112015819A (en) * | 2020-08-31 | 2020-12-01 | 杭州欧若数网科技有限公司 | Data updating method, device, equipment and medium for distributed graph database |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
WD01 | Invention patent application deemed withdrawn after publication | Application publication date: 20160406 |