CN105468637A - Database updating method and apparatus - Google Patents

Publication number: CN105468637A
Authority: CN (China)
Application number: CN201410453679.XA
Other languages: Chinese (zh)
Inventor: 杜玉杰
Current Assignee: Beijing Lianjia Zhongying Network Technology Co Ltd (the listed assignees may be inaccurate)
Original Assignee: Beijing Lianjia Zhongying Network Technology Co Ltd
Legal status: Pending (the legal status is an assumption and is not a legal conclusion)
Prior art keywords: value, target web, sid, index mark, mark

Landscapes

  • Information Transfer Between Computers (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

Embodiments of the present invention disclose a database updating method and apparatus, and relate to the technical field of software. The method comprises: acquiring a webpage parameter of a target webpage; determining, according to an identifier of the target website corresponding to the target webpage, the data table corresponding to the target website in a database to be updated, wherein the database to be updated comprises more than one data table; determining, according to the identifier of the target webpage within the target website, whether a data record corresponding to the target webpage exists in the data table corresponding to the target website, wherein the data record comprises the identifier of the target webpage within the target website and an index identifier corresponding to the target webpage; if not, generating the index identifier corresponding to the target webpage, and adding the data record corresponding to the target webpage to the data table corresponding to the target website. Updating the database with the scheme provided by the embodiments of the present invention can improve the speed of updating the database.

Description

Database updating method and apparatus
Technical field
The present invention relates to the technical field of software, and in particular to a database updating method and apparatus.
Background
With the rapid development of Internet technology, the information available on the Internet is becoming increasingly abundant, and more and more users therefore search for the information they need through search engines.
A search engine generally obtains the webpage information of a target webpage through a web crawler and updates the corresponding database with the obtained webpage information, so as to provide users with more comprehensive search results. A web crawler is a program or script that automatically captures network information according to certain rules.
In the prior art, when a search engine updates the corresponding database with the webpage information of a target webpage obtained by a web crawler, it generally first determines, according to the URL information of the target webpage and the like, whether a data record corresponding to the target webpage exists in the database. If not, the data record corresponding to the target webpage is appended after the last stored data record. The data records are generally all stored in a single data table of the database.
When the database contains little information, the database can be updated quickly in this way. However, as the web crawler obtains the webpage information of more and more webpages, the database comes to contain more and more data records. Each time the database is updated, determining whether a data record corresponding to a given webpage exists in the database therefore takes longer and longer, and each update of the database becomes slower.
Summary of the invention
Embodiments of the present invention disclose a database updating method and apparatus, so as to improve the speed of updating a database.
To achieve the above object, an embodiment of the present invention discloses a database updating method, the method comprising:
obtaining a webpage parameter of a target webpage, wherein the webpage parameter comprises: an identifier of the target website corresponding to the target webpage and an identifier of the target webpage within the target website;
determining, according to the identifier of the target website corresponding to the target webpage, the data table corresponding to the target website in a database to be updated, wherein the number of data tables comprised in the database to be updated is greater than 1;
judging, according to the identifier of the target webpage within the target website, whether a data record corresponding to the target webpage exists in the data table corresponding to the target website, wherein the data record comprises: the identifier of the target webpage within the target website and an index identifier corresponding to the target webpage;
if not, generating the index identifier corresponding to the target webpage, and adding the data record corresponding to the target webpage to the data table corresponding to the target website.
Optionally, before generating the index identifier corresponding to the target webpage, the method further comprises:
setting an index identifier generation flag to a locked state, wherein the index identifier generation flag being in the locked state indicates that currently only the index identifier corresponding to the target webpage can be computed;
judging, according to the identifier of the target webpage within the target website, whether a data record corresponding to the target webpage exists in the data table corresponding to the target website;
if not, performing the step of generating the index identifier corresponding to the target webpage;
and after generating the index identifier corresponding to the target webpage, the method further comprises:
setting the index identifier generation flag to an unlocked state, wherein the index identifier generation flag being in the unlocked state indicates that computation of the index identifiers corresponding to webpages other than the target webpage can begin.
Optionally, generating the index identifier corresponding to the target webpage comprises:
obtaining the maximum value Value_SID_max of the index identifiers corresponding to the data records already stored in the database to be updated;
calculating the remainder R of Value_SID_max divided by a preset first threshold;
judging whether the remainder R is less than a preset second threshold;
if yes, calculating the index identifier corresponding to the target webpage = R + the preset second threshold;
otherwise, calculating the index identifier corresponding to the target webpage = R + a preset third threshold.
Optionally, obtaining the maximum value Value_SID_max of the index identifiers corresponding to the data records already stored in the database to be updated comprises:
obtaining, according to a locally stored index identifier value Value_SID_L, the maximum value Value_SID_max of the index identifiers corresponding to the data records already stored in the database to be updated;
wherein, after an update program corresponding to the database to be updated is started, the locally stored Value_SID_L is set to the value of Value_SID_D_max, where Value_SID_D_max is the maximum value, as stored in the database to be updated, of the index identifiers corresponding to the data records stored in the database to be updated;
after the locally stored Value_SID_L is set to the value of Value_SID_D_max, the method further comprises:
updating Value_SID_D_max to: current Value_SID_D_max + a preset fourth threshold;
after the index identifier corresponding to the target webpage is calculated, the method further comprises:
updating the value of Value_SID_L to the index identifier corresponding to the target webpage;
judging whether the index identifier corresponding to the target webpage is greater than Value_SID_D_max;
if yes, updating Value_SID_D_max to: current Value_SID_D_max + the preset fourth threshold;
after the update program corresponding to the database to be updated ends, the method further comprises:
updating Value_SID_D_max to: Value_SID_D_max = Value_SID_L.
Optionally, after generating the index identifier corresponding to the target webpage, the method further comprises:
sending the generated index identifier to a web crawler module or to a database other than the database to be updated.
To achieve the above object, an embodiment of the present invention discloses a database updating apparatus, the apparatus comprising:
a webpage parameter obtaining module, configured to obtain a webpage parameter of a target webpage, wherein the webpage parameter comprises: an identifier of the target website corresponding to the target webpage and an identifier of the target webpage within the target website;
a data table determining module, configured to determine, according to the identifier of the target website corresponding to the target webpage, the data table corresponding to the target website in a database to be updated, wherein the number of data tables comprised in the database to be updated is greater than 1;
a first data record judging module, configured to judge, according to the identifier of the target webpage within the target website, whether a data record corresponding to the target webpage exists in the data table corresponding to the target website, wherein the data record comprises: the identifier of the target webpage within the target website and an index identifier corresponding to the target webpage;
an index identifier generating module, configured to generate the index identifier corresponding to the target webpage when the judgment result of the first data record judging module is no;
a data record adding module, configured to add the data record corresponding to the target webpage to the data table corresponding to the target website after the index identifier generating module generates the index identifier.
Optionally, the database updating apparatus further comprises:
a locked state setting module, configured to set an index identifier generation flag to a locked state, wherein the index identifier generation flag being in the locked state indicates that currently only the index identifier corresponding to the target webpage can be computed;
a second data record judging module, configured to judge, according to the identifier of the target webpage within the target website, whether a data record corresponding to the target webpage exists in the data table corresponding to the target website, and if not, to trigger the index identifier generating module to generate the index identifier;
an unlocked state setting module, configured to set the index identifier generation flag to an unlocked state after the index identifier generating module generates the index identifier, wherein the index identifier generation flag being in the unlocked state indicates that computation of the index identifiers corresponding to webpages other than the target webpage can begin.
Optionally, the index identifier generating module comprises:
an index identifier maximum value obtaining submodule, configured to obtain the maximum value Value_SID_max of the index identifiers corresponding to the data records already stored in the database to be updated;
a remainder calculating submodule, configured to calculate the remainder R of Value_SID_max divided by a preset first threshold;
a remainder judging submodule, configured to judge whether the remainder R is less than a preset second threshold;
a first index identifier calculating submodule, configured to calculate, when the judgment result of the remainder judging submodule is yes, the index identifier corresponding to the target webpage = R + the preset second threshold;
a second index identifier calculating submodule, configured to calculate, when the judgment result of the remainder judging submodule is no, the index identifier corresponding to the target webpage = R + a preset third threshold.
Optionally, the index identifier maximum value obtaining submodule is specifically configured to obtain, according to a locally stored index identifier value Value_SID_L, the maximum value Value_SID_max of the index identifiers corresponding to the data records already stored in the database to be updated;
the database updating apparatus further comprises:
a first index identifier setting module, configured to set, after an update program corresponding to the database to be updated is started, the locally stored Value_SID_L to the value of Value_SID_D_max, where Value_SID_D_max is the maximum value, as stored in the database to be updated, of the index identifiers corresponding to the data records stored in the database to be updated;
a second index identifier setting module, configured to update, after the first index identifier setting module sets the locally stored index identifier value, Value_SID_D_max to: current Value_SID_D_max + a preset fourth threshold;
a first index identifier updating module, configured to update the value of Value_SID_L to the index identifier corresponding to the target webpage after the first index identifier calculating submodule or the second index identifier calculating submodule calculates the index identifier corresponding to the target webpage;
an index identifier judging module, configured to judge whether the index identifier corresponding to the target webpage is greater than Value_SID_D_max;
a second index identifier updating module, configured to update, when the judgment result of the index identifier judging module is yes, Value_SID_D_max to: current Value_SID_D_max + the preset fourth threshold;
a third index identifier updating module, configured to update, after the update program corresponding to the database to be updated ends, Value_SID_D_max to: Value_SID_D_max = Value_SID_L.
Optionally, the database updating apparatus further comprises:
an index identifier sending module, configured to send, after the index identifier generating module generates the index identifier, the generated index identifier to a web crawler module or to a database other than the database to be updated.
As can be seen from the above, in the scheme provided by the embodiments of the present invention, after the webpage parameter of the target webpage is obtained, it is judged whether a data record corresponding to the target webpage exists in the data table corresponding to the target website in the database to be updated; if not, the index identifier corresponding to the target webpage is generated, and the data record corresponding to the target webpage is added to the data table corresponding to the target website. Compared with the prior art, in the scheme provided by the embodiments of the present invention the data records corresponding to the webpages are stored in multiple data tables rather than in a single data table. Therefore, when judging whether a data record corresponding to the target webpage exists in the database to be updated, only the data table corresponding to the target website needs to be checked, and there is no need to check all the information contained in the database to be updated, so the speed of updating the database can be improved.
Brief description of the drawings
In order to describe the technical solutions in the embodiments of the present invention or in the prior art more clearly, the accompanying drawings required for describing the embodiments or the prior art are briefly introduced below. Apparently, the drawings in the following description show only some embodiments of the present invention, and those of ordinary skill in the art may derive other drawings from them without creative effort.
Fig. 1 is a first schematic flowchart of the database updating method provided by an embodiment of the present invention;
Fig. 2 is a second schematic flowchart of the database updating method provided by an embodiment of the present invention;
Fig. 3 is a third schematic flowchart of the database updating method provided by an embodiment of the present invention;
Fig. 4 is a fourth schematic flowchart of the database updating method provided by an embodiment of the present invention;
Fig. 5 is a first schematic structural diagram of the database updating apparatus provided by an embodiment of the present invention;
Fig. 6 is a second schematic structural diagram of the database updating apparatus provided by an embodiment of the present invention;
Fig. 7 is a third schematic structural diagram of the database updating apparatus provided by an embodiment of the present invention;
Fig. 8 is a fourth schematic structural diagram of the database updating apparatus provided by an embodiment of the present invention.
Detailed description
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings in the embodiments of the present invention. Obviously, the described embodiments are only some, rather than all, of the embodiments of the present invention. All other embodiments obtained by those of ordinary skill in the art based on the embodiments of the present invention without creative effort fall within the protection scope of the present invention.
Fig. 1 is a first schematic flowchart of the database updating method provided by an embodiment of the present invention. The method comprises:
S101: obtain the webpage parameter of the target webpage.
In practical applications, in order to provide users with a rich search service, a search engine generally obtains the relevant information of a target webpage through a web crawler module and updates the corresponding database according to the obtained information. The information obtained by the web crawler module includes at least the URL information of the target webpage.
After the web crawler module obtains the relevant information of the target webpage, the search engine can derive the webpage parameter of the target webpage from the obtained information. The webpage parameter may comprise: the identifier of the target website corresponding to the target webpage and the identifier of the target webpage within the target website.
Those skilled in the art will understand that the identifier of the target website corresponding to the target webpage can be obtained from the URL information of the target webpage.
Further, the identifier of the target webpage within the target website may be the URL of the target webpage, or may be the information that remains after the identifier of the target website is removed from the URL of the target webpage.
For example, if the URL of the target webpage obtained by the web crawler is http://item.jd.com/1184892.html, then the identifier of the target website corresponding to the target webpage is http://item.jd.com, and the information remaining after the identifier of the target website is removed is 1184892.
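The split described in this example can be sketched in Python as follows. This is only an illustration: the function name `split_page_url` and the choice to strip a trailing `.html` are assumptions made here to reproduce the example above, not rules stated in the application.

```python
from urllib.parse import urlsplit


def split_page_url(url: str) -> tuple[str, str]:
    """Return (site identifier, page identifier within the site) for a URL.

    Hypothetical helper: the site identifier is taken to be scheme + host,
    and the page identifier is the path without its leading slash and
    without a trailing ".html" suffix.
    """
    parts = urlsplit(url)
    site_id = f"{parts.scheme}://{parts.netloc}"
    page_id = parts.path.lstrip("/")
    if page_id.endswith(".html"):
        page_id = page_id[: -len(".html")]
    return site_id, page_id


site_id, page_id = split_page_url("http://item.jd.com/1184892.html")
print(site_id, page_id)  # http://item.jd.com 1184892
```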
S102: according to the identifier of the target website corresponding to the target webpage, determine the data table corresponding to the target website in the database to be updated.
As the web crawler module obtains more and more webpage information, the data table that stores webpage information in the database to be updated holds more and more data. In addition, when the database is updated, it must first be judged whether the relevant information of the target webpage corresponding to the information obtained by the web crawler module already exists in the database to be updated. Therefore, as the data stored in that data table grows, judging whether the relevant information of the target webpage exists becomes slower. To improve the efficiency of updating the database, the information in the database can be stored in different data tables according to certain rules, for example by storing the webpages of one or more websites in one data table.
In view of the above, in this embodiment the number of data tables comprised in the database to be updated is greater than 1.
In addition, in practical applications, to further improve the efficiency of updating the database, each data table comprised in the database to be updated may correspond to the identifier of one website.
In one specific application, the database to be updated may also be stored in a distributed file system as a file of that system.
A distributed file system is a file system in which the physical storage resources managed by the file system are not necessarily attached directly to a local storage device, but may be connected to network-side storage devices through a computer network, so that files are stored on individual storage nodes. One network-side storage device may be referred to as one storage node.
In one specific embodiment of the present invention, when determining the data table corresponding to the target website in the database to be updated according to the identifier of the target website corresponding to the target webpage fails, the data table corresponding to the target website may be created in the database to be updated. After the data table corresponding to the target website is created successfully, the step of generating the index identifier corresponding to the target webpage (S104) is performed.
S103: according to the identifier of the target webpage within the target website, judge whether a data record corresponding to the target webpage exists in the data table corresponding to the target website; if not, perform S104; otherwise, perform S106.
The data record may comprise: the identifier of the target webpage within the target website and the index identifier corresponding to the target webpage. Of course, the present application does not limit the information included in the data record.
When a user browses a webpage from the search results provided by a search engine, the search engine can provide the browser with the URL information of that webpage according to the index identifier corresponding to the webpage the user wants to browse.
For example, when a user searches for "web crawler" through a search engine, the search engine can provide the user with multiple webpage summaries related to "web crawler". Each webpage summary contains the index identifier corresponding to the webpage, which may or may not be displayed to the user. When the user clicks a webpage summary, the search engine obtains the index identifier corresponding to that webpage, retrieves the URL information of the webpage from the corresponding database according to the obtained index identifier, and finally sends the URL information of the webpage to the browser, so that the browser displays the webpage to the user according to that URL information.
The above is only one specific application scenario of the index identifier corresponding to the target webpage, and the present application is not limited to it.
S104: generate the index identifier corresponding to the target webpage.
S105: add the data record corresponding to the target webpage to the data table corresponding to the target website.
Preferably, in one specific implementation of the present invention, after the index identifier corresponding to the target webpage is generated, the method may further comprise: sending the generated index identifier to the web crawler module or to a database other than the database to be updated.
In practical applications, the database used to provide users with the URL information of webpages may or may not be the same database as the database to be updated. When it is not the same database, after the index identifier corresponding to the target webpage is generated, the generated index identifier can be sent to the web crawler module or to a database other than the database to be updated, so as to update the data record corresponding to the target webpage in the other database.
It should be noted that, after receiving the index identifier corresponding to the target webpage, the web crawler module also needs to send the received identifier information to the corresponding database, so as to update the data record corresponding to the target webpage in that database.
S106: end the process.
As can be seen from the above, in the scheme provided by this embodiment, after the webpage parameter of the target webpage is obtained, it is judged whether a data record corresponding to the target webpage exists in the data table corresponding to the target website in the database to be updated; if not, the index identifier corresponding to the target webpage is generated, and the data record corresponding to the target webpage is added to the data table corresponding to the target website. Compared with the prior art, in the scheme provided by this embodiment the data records corresponding to the webpages are stored in multiple data tables rather than in a single data table. Therefore, when judging whether a data record corresponding to the target webpage exists in the database to be updated, only the data table corresponding to the target website needs to be checked, and there is no need to check all the information contained in the database to be updated, so the speed of updating the database can be improved.
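As a rough illustration of steps S101 to S106, the following Python sketch keeps one in-memory dictionary per target website in place of a real data table. The class name `SiteTables`, the method `update`, and the sequential counter used as a stand-in for S104 are all assumptions made for this sketch; the application does not prescribe a storage engine or this particular numbering.

```python
from itertools import count
from typing import Optional


class SiteTables:
    """Hypothetical in-memory stand-in for the database to be updated."""

    def __init__(self) -> None:
        # One "data table" per site identifier: page identifier -> index identifier.
        self.tables: dict[str, dict[str, int]] = {}
        # Placeholder for S104; a real system would use its own generation rule.
        self._ids = count(1)

    def update(self, site_id: str, page_id: str) -> Optional[int]:
        # S102: pick the table for this site, creating it if the lookup fails.
        table = self.tables.setdefault(site_id, {})
        # S103: only this site's table is searched, not the whole database.
        if page_id in table:
            return None  # S106: a record already exists, nothing to add
        index_id = next(self._ids)  # S104
        table[page_id] = index_id   # S105
        return index_id


db = SiteTables()
print(db.update("http://item.jd.com", "1184892"))  # 1
print(db.update("http://item.jd.com", "1184892"))  # None
```

The point of the sketch is the lookup in S103: its cost depends on the size of one site's table rather than on the total number of records in the database.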
In one specific embodiment of the present invention, referring to Fig. 2, a second schematic flowchart of the database updating method is provided. Compared with the previous embodiment, in this embodiment,
before generating the index identifier corresponding to the target webpage (S104), the method further comprises:
S107: set the index identifier generation flag to the locked state.
After the update program corresponding to the database to be updated is started, the web crawler module may obtain webpage information many times, so the search engine may obtain the webpage parameters of multiple webpages from the webpage information obtained by the web crawler module. When the judgment in S103 finds that no data record corresponding to any of these webpages exists in the corresponding data table of the database to be updated, the index identifiers generated for these webpages in S104 could turn out to be identical. To prevent this, the index identifier generation flag is used so that the index identifier of only one webpage is computed at a time. That is, the index identifier generation flag being in the locked state indicates that currently only the index identifier corresponding to the target webpage can be computed.
In practical applications, the index identifier generation flag may be set to 1 to indicate the locked state and to 0 to indicate the unlocked state. Of course, this is only an example, and the present application does not limit the specific values used to represent the locked state and the unlocked state.
S108: according to the identifier of the target webpage within the target website, judge whether a data record corresponding to the target webpage exists in the data table corresponding to the target website; if not, perform S104.
In practical applications, the search engine may obtain the webpage parameters of multiple webpages from the webpage information obtained by the web crawler module, and two or more of these webpages may be identical. In this case, the judgment in S103 may find that no data record corresponding to the target webpage exists in the data table corresponding to the target website, yet by the time the index identifier corresponding to the target webpage is to be generated in S104, such a data record already exists in that data table. To handle this situation, after the index identifier generation flag is set to the locked state, it must be judged again, according to the identifier of the target webpage within the target website, whether a data record corresponding to the target webpage exists in the data table corresponding to the target website.
After generating the index identifier corresponding to the target webpage (S104), the method further comprises:
S109: set the index identifier generation flag to the unlocked state.
The index identifier generation flag being in the unlocked state indicates that computation of the index identifiers corresponding to webpages other than the target webpage can begin.
It should be noted that the present application does not limit the execution order of S109 and S105: S109 may be performed before S105, after S105, or simultaneously with S105.
As can be seen from the above, in the scheme provided by this embodiment, before the index identifier corresponding to the target webpage is generated, the index identifier generation flag is set so that currently only the index identifier corresponding to the target webpage can be computed, which effectively prevents the index identifiers generated for different webpages from being duplicated.
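One way to picture S107 to S109 is a mutex combined with a second existence check. In the sketch below, a `threading.Lock` plays the role of the generation flag; the class name, the sequential counter standing in for S104, and the choice to return the existing value when a record is found are assumptions of this sketch. The record is also added (S105) while the flag is still locked, which is only one of the orderings the description permits.

```python
import threading
from itertools import count


class LockedIndexAllocator:
    """Hypothetical sketch of S103 and S107-S109 for one site's data table."""

    def __init__(self) -> None:
        self._flag = threading.Lock()  # stands in for the generation flag
        self._ids = count(1)           # placeholder for the S104 rule
        self.table: dict[str, int] = {}

    def add_page(self, page_id: str) -> int:
        # S103: first check, made without holding the flag.
        existing = self.table.get(page_id)
        if existing is not None:
            return existing
        with self._flag:  # S107: flag set to the locked state
            # S108: check again, since another worker may have added the
            # same page between S103 and S107.
            existing = self.table.get(page_id)
            if existing is not None:
                return existing
            index_id = next(self._ids)      # S104
            self.table[page_id] = index_id  # S105
        # S109: leaving the "with" block sets the flag to the unlocked state.
        return index_id


alloc = LockedIndexAllocator()
results: list[int] = []
workers = [
    threading.Thread(target=lambda: results.append(alloc.add_page("1184892")))
    for _ in range(8)
]
for w in workers:
    w.start()
for w in workers:
    w.join()
print(len(alloc.table), set(results))  # 1 {1}
```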
In another specific embodiment of the present invention, referring to Fig. 3, a third schematic flowchart of the database updating method is provided. Compared with the previous embodiment, in this embodiment, generating the index identifier corresponding to the target webpage (S104) comprises:
S1041: obtain the maximum value Value_SID_max of the index identifiers corresponding to the data records already stored in the database to be updated.
In practical applications, Value_SID_max can be obtained from the database to be updated.
Of course, to prevent problems such as excessive load on the database to be updated caused by frequently reading data from it, the maximum value of the index identifiers corresponding to the data records stored in the database to be updated may instead be stored locally after the update program corresponding to the database to be updated is started. Each time S1041 is performed, Value_SID_max is then obtained from this locally stored maximum value. For a specific implementation, refer to the embodiment shown in Fig. 4.
S1042: calculate the remainder R of Value_SID_max divided by the preset first threshold.
S1043: judge whether the remainder R is less than the preset second threshold; if yes, perform S1044; otherwise, perform S1045.
S1044: calculate the index identifier corresponding to the target webpage = R + the preset second threshold.
S1045: calculate the index identifier corresponding to the target webpage = R + the preset third threshold.
A specific example is described below.
Suppose the preset first threshold is 10000000, the preset second threshold is 1000000, and the preset third threshold is 1.
S1041: obtain Value_SID_max = 1000001;
S1042: the remainder of 1000001 divided by 10000000 is 1000001, so R = 1000001;
S1043: 1000001 is greater than 1000000, so S1045 is performed;
S1045: the index identifier corresponding to the target webpage = 1000001 + 1 = 1000002.
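The calculation in S1042 to S1045 can be written as a short Python function. The function name and the default threshold values are taken from the example above only for illustration; the application leaves the actual thresholds to be preset.

```python
def next_index_id(
    value_sid_max: int,
    first_threshold: int = 10_000_000,
    second_threshold: int = 1_000_000,
    third_threshold: int = 1,
) -> int:
    """Hypothetical helper implementing S1042-S1045."""
    r = value_sid_max % first_threshold  # S1042: remainder R
    if r < second_threshold:             # S1043
        return r + second_threshold      # S1044
    return r + third_threshold           # S1045


print(next_index_id(1000001))  # 1000002, matching the worked example
```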
In another specific embodiment of the present invention, referring to Fig. 4, a fourth schematic flowchart of the database updating method is provided. Compared with the embodiment shown in Fig. 3, the method in this embodiment further comprises:
S110: After the update program corresponding to the database to be updated is started, set the locally stored Value_SID_L to the value of Value_SID_D_max.
Here, Value_SID_D_max is the value, stored in the database to be updated, of the maximum index identifier among the data records stored in that database.
S111: Update Value_SID_D_max to: the current Value_SID_D_max + a preset fourth threshold.
Obtaining the maximum value Value_SID_max of the index identifiers corresponding to the data records stored in the database to be updated (S1041) comprises:
S10411: Obtain, according to the locally stored index identifier value Value_SID_L, the maximum value Value_SID_max of the index identifiers corresponding to the data records stored in the database to be updated.
In this embodiment, after S105 adds the data record corresponding to the target webpage to the data table corresponding to the target website, the locally stored index identifier value Value_SID_L may be updated so that index identifiers corresponding to other webpages can be generated. Alternatively, Value_SID_L may be left unchanged and the update status of the database to be updated recorded in another way, for example by using a counter variable to record the number of data records added since the update program corresponding to the database to be updated was started.
After the index identifier corresponding to the target webpage is calculated according to S1044 or S1045, the method further comprises:
S112: Update the value of Value_SID_L to the index identifier corresponding to the target webpage.
S113: Determine whether the index identifier corresponding to the target webpage is greater than Value_SID_D_max; if so, perform S114.
S114: Update Value_SID_D_max to: the current Value_SID_D_max + the preset fourth threshold.
Preferably, the preset fourth threshold may be 10000. The present application does not limit the value of the preset fourth threshold, which may be determined according to the specific circumstances in practice.
It should be noted that the present application does not limit the execution order of S105 and S112-S114: S105 may be performed before S112-S114, after S112-S114, or simultaneously with one of the steps in S112-S114.
S115: After the update program corresponding to the database to be updated is terminated, update Value_SID_D_max to: Value_SID_D_max = Value_SID_L.
As can be seen from the above, in the scheme provided by the above embodiment, the maximum index identifier among the data records stored in the database to be updated is obtained locally, which prevents frequent access to the database to be updated and avoids placing excessive load on it.
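The bookkeeping in S110 to S115 can be sketched in Python as follows. This is an illustration under stated assumptions, not the implementation of the embodiment: `FakeDatabase` is a hypothetical in-memory stand-in that stores only Value_SID_D_max, and the class and method names (`LocalIdTracker`, `current_max`, `record`, `close`) are invented for the sketch.

```python
class FakeDatabase:
    """Hypothetical stand-in for the database to be updated.

    It stores only Value_SID_D_max and counts how often that value is written.
    """

    def __init__(self, value_sid_d_max: int) -> None:
        self.value_sid_d_max = value_sid_d_max
        self.writes = 0

    def set_max(self, value: int) -> None:
        self.value_sid_d_max = value
        self.writes += 1


class LocalIdTracker:
    """Keeps Value_SID_L locally and writes to the database only occasionally."""

    def __init__(self, db: FakeDatabase, fourth_threshold: int = 10000) -> None:
        self.db = db
        self.fourth_threshold = fourth_threshold
        # S110: copy Value_SID_D_max into the local Value_SID_L at startup.
        self.value_sid_l = db.value_sid_d_max
        # S111: advance Value_SID_D_max by the fourth threshold.
        db.set_max(db.value_sid_d_max + fourth_threshold)

    def current_max(self) -> int:
        # S10411: Value_SID_max comes from the local value, not the database.
        return self.value_sid_l

    def record(self, index_id: int) -> None:
        # S112: remember the index identifier that was just generated.
        self.value_sid_l = index_id
        # S113/S114: advance Value_SID_D_max only once it has been exceeded.
        if index_id > self.db.value_sid_d_max:
            self.db.set_max(self.db.value_sid_d_max + self.fourth_threshold)

    def close(self) -> None:
        # S115: on shutdown, write the local maximum back to the database.
        self.db.set_max(self.value_sid_l)
```

In this sketch the database value is written once at startup, once each time a generated identifier exceeds it, and once at shutdown, rather than being read for every new identifier.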
Corresponding to the above database updating method, an embodiment of the present invention further provides a database updating apparatus.
Fig. 5 is a first schematic structural diagram of the database updating apparatus provided by an embodiment of the present invention. The apparatus comprises: a webpage parameter obtaining module 501, a data table determining module 502, a first data record judging module 503, an index identifier generating module 504, and a data record adding module 505.
The webpage parameter obtaining module 501 is configured to obtain the webpage parameters of a target webpage, wherein the webpage parameters comprise: the identifier of the target website corresponding to the target webpage and the identifier of the target webpage within the target website;
The data table determining module 502 is configured to determine, in the database to be updated, the data table corresponding to the target website according to the identifier of the target website corresponding to the target webpage, wherein the number of data tables comprised in the database to be updated is greater than 1;
The first data record judging module 503 is configured to determine, according to the identifier of the target webpage within the target website, whether a data record corresponding to the target webpage exists in the data table corresponding to the target website, wherein the data record comprises: the identifier of the target webpage within the target website and the index identifier corresponding to the target webpage;
The index identifier generating module 504 is configured to generate the index identifier corresponding to the target webpage when the judgment result of the first data record judging module 503 is negative;
The data record adding module 505 is configured to add the data record corresponding to the target webpage to the data table corresponding to the target website after the index identifier generating module 504 generates the index identifier.
In a preferred embodiment of the present invention, the database updating apparatus further comprises: an index identifier sending module (not shown in the figures).
The index identifier sending module is configured to send the generated index identifier to a web crawler module, or to a database other than the database to be updated, after the index identifier generating module 504 generates the index identifier.
As can be seen from the above, in the scheme provided by this embodiment, after the webpage parameters of the target webpage are obtained, it is determined whether a data record corresponding to the target webpage exists in the data table corresponding to the target website in the database to be updated. If no such record exists, the index identifier corresponding to the target webpage is generated and the data record corresponding to the target webpage is added to the data table corresponding to the target website. Compared with the prior art, the scheme provided by this embodiment stores the data records corresponding to the webpages across multiple data tables rather than in a single data table. Therefore, when determining whether a data record corresponding to the target webpage exists in the database to be updated, the check is made only in the data table corresponding to the target website, and not against all of the information contained in the database to be updated, which improves the speed of updating the database.
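The per-website lookup described above can be sketched in Python with the standard `sqlite3` module. This is only an illustration under several assumptions: the table naming scheme `pages_site_<id>`, the column names, the on-demand creation of tables, and the caller-supplied `next_index_id` callback are all hypothetical and are not specified by the embodiment.

```python
import sqlite3
from typing import Callable, Tuple


def table_for_site(site_id: int) -> str:
    # Hypothetical naming scheme: one data table per target website.
    # int() guarantees the interpolated table name contains only digits.
    return f"pages_site_{int(site_id)}"


def add_page_if_missing(conn: sqlite3.Connection, site_id: int, page_id: str,
                        next_index_id: Callable[[], int]) -> Tuple[int, bool]:
    """Return (index_id, created) for the page, inserting a record if needed."""
    table = table_for_site(site_id)
    conn.execute(
        f"CREATE TABLE IF NOT EXISTS {table} "
        "(page_id TEXT PRIMARY KEY, index_id INTEGER NOT NULL)"
    )
    # Look only in the table of the target website, not in the whole database.
    row = conn.execute(
        f"SELECT index_id FROM {table} WHERE page_id = ?", (page_id,)
    ).fetchone()
    if row is not None:
        return row[0], False
    # No record yet: generate an index identifier and add the data record.
    index_id = next_index_id()
    conn.execute(
        f"INSERT INTO {table} (page_id, index_id) VALUES (?, ?)",
        (page_id, index_id),
    )
    conn.commit()
    return index_id, True
```

Because each lookup touches a single, smaller table, the existence check does not grow with the total number of records across all websites.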
In a specific embodiment of the present invention, referring to Fig. 6, a second schematic structural diagram of the database updating apparatus is provided. Compared with the previous embodiment, the apparatus in this embodiment further comprises: a lock state setting module 506, a second data record judging module 507, and an unlock state setting module 508.
The lock state setting module 506 is configured to set an index identifier generation flag to a locked state, wherein the index identifier generation flag being in the locked state indicates that currently only the index identifier corresponding to the target webpage can be calculated;
The second data record judging module 507 is configured to determine, according to the identifier of the target webpage within the target website, whether a data record corresponding to the target webpage exists in the data table corresponding to the target website, and, if no such record exists, to trigger the index identifier generating module 504 to generate the index identifier;
The unlock state setting module 508 is configured to set the index identifier generation flag to an unlocked state after the index identifier generating module 504 generates the index identifier, wherein the index identifier generation flag being in the unlocked state indicates that calculation of the index identifiers corresponding to webpages other than the target webpage can begin.
As can be seen from the above, in the scheme provided by this embodiment, the index identifier generation flag is set before the index identifier corresponding to the target webpage is generated, so that at that moment only the index identifier corresponding to the target webpage can be calculated. This effectively prevents duplicate index identifiers from being generated for different webpages.
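The lock, re-check, generate, and unlock sequence can be sketched in Python as follows. A `threading.Lock` is swapped in for the index identifier generation flag, an in-memory dictionary stands in for the data table, and identifiers are drawn from a simple counter; all of these, along with the name `IndexIdAllocator`, are assumptions made for the illustration.

```python
import threading
from typing import Dict, Hashable


class IndexIdAllocator:
    """Hands out one index identifier per key, without duplicates."""

    def __init__(self) -> None:
        self._lock = threading.Lock()  # plays the role of the generation flag
        self._records: Dict[Hashable, int] = {}  # stand-in for the data table
        self._next_id = 1  # hypothetical generator: a simple counter

    def get_or_create(self, key: Hashable) -> int:
        # First check (module 503): is there already a record for this page?
        existing = self._records.get(key)
        if existing is not None:
            return existing
        # Locked state (module 506): only one identifier is computed at a time.
        with self._lock:
            # Second check (module 507): another thread may have added the
            # record between the first check and acquiring the lock.
            existing = self._records.get(key)
            if existing is not None:
                return existing
            # Generate the identifier (module 504) and add the record.
            index_id = self._next_id
            self._next_id += 1
            self._records[key] = index_id
            return index_id
        # Leaving the with-block restores the unlocked state (module 508).
```

The second check is what prevents two concurrent updaters that both missed the record on the first check from each generating an identifier for the same webpage.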
In another specific embodiment of the present invention, referring to Fig. 7, a third schematic structural diagram of the database updating apparatus is provided. Compared with the previous embodiment, in this embodiment the index identifier generating module 504 comprises: an index identifier maximum obtaining submodule 5041, a remainder calculating submodule 5042, a remainder judging submodule 5043, a first index identifier calculating submodule 5044, and a second index identifier calculating submodule 5045.
The index identifier maximum obtaining submodule 5041 is configured to obtain the maximum value Value_SID_max of the index identifiers corresponding to the data records stored in the database to be updated;
The remainder calculating submodule 5042 is configured to calculate the remainder R of Value_SID_max divided by a preset first threshold;
The remainder judging submodule 5043 is configured to determine whether the remainder R is less than a preset second threshold;
The first index identifier calculating submodule 5044 is configured to calculate, when the judgment result of the remainder judging submodule 5043 is affirmative, the index identifier corresponding to the target webpage = R + the preset second threshold;
The second index identifier calculating submodule 5045 is configured to calculate, when the judgment result of the remainder judging submodule 5043 is negative, the index identifier corresponding to the target webpage = R + a preset third threshold.
In another specific embodiment of the present invention, referring to Fig. 8, a fourth schematic structural diagram of the database updating apparatus is provided. Compared with the embodiment shown in Fig. 7, in this embodiment:
The index identifier maximum obtaining submodule 5041 is specifically configured to obtain, according to the locally stored index identifier value Value_SID_L, the maximum value Value_SID_max of the index identifiers corresponding to the data records stored in the database to be updated;
In addition, the database updating apparatus further comprises: a first index identifier setting module 509, a second index identifier setting module 510, a first index identifier updating module 511, an index identifier judging module 512, a second index identifier updating module 513, and a third index identifier updating module 514.
The first index identifier setting module 509 is configured to set the locally stored Value_SID_L to the value of Value_SID_D_max after the update program corresponding to the database to be updated is started, where Value_SID_D_max is the value, stored in the database to be updated, of the maximum index identifier among the data records stored in that database;
The second index identifier setting module 510 is configured to update Value_SID_D_max to the current Value_SID_D_max + a preset fourth threshold after the first index identifier setting module 509 sets the locally stored index identifier;
The first index identifier updating module 511 is configured to update the value of Value_SID_L to the index identifier corresponding to the target webpage after the first index identifier calculating submodule 5044 or the second index identifier calculating submodule 5045 calculates that index identifier;
The index identifier judging module 512 is configured to determine whether the index identifier corresponding to the target webpage is greater than Value_SID_D_max;
The second index identifier updating module 513 is configured to update Value_SID_D_max to the current Value_SID_D_max + the preset fourth threshold when the judgment result of the index identifier judging module 512 is affirmative;
The third index identifier updating module 514 is configured to update Value_SID_D_max to Value_SID_D_max = Value_SID_L after the update program corresponding to the database to be updated is terminated.
As can be seen from the above, in the scheme provided by the above embodiment, the maximum index identifier among the data records stored in the database to be updated is obtained locally, which prevents frequent access to the database to be updated and avoids placing excessive load on it.
Because the apparatus embodiments are substantially similar to the method embodiments, they are described relatively briefly; for the relevant details, refer to the corresponding parts of the description of the method embodiments.
It should be noted that, in this document, relational terms such as first and second are used only to distinguish one entity or operation from another, and do not necessarily require or imply any actual relationship or order between those entities or operations. Moreover, the terms "comprise", "include", or any other variant thereof are intended to cover non-exclusive inclusion, so that a process, method, article, or device comprising a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, method, article, or device. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of additional identical elements in the process, method, article, or device that comprises the element.
A person of ordinary skill in the art will understand that all or some of the steps of the above method embodiments may be carried out by hardware under the instruction of a program, and that the program may be stored in a computer-readable storage medium, such as ROM/RAM, a magnetic disk, or an optical disc.
The foregoing describes only preferred embodiments of the present invention and is not intended to limit its scope of protection. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention falls within the scope of protection of the present invention.

Claims (10)

1. A database updating method, characterized in that the method comprises:
obtaining the webpage parameters of a target webpage, wherein the webpage parameters comprise: the identifier of the target website corresponding to the target webpage and the identifier of the target webpage within the target website;
determining, in a database to be updated, the data table corresponding to the target website according to the identifier of the target website corresponding to the target webpage, wherein the number of data tables comprised in the database to be updated is greater than 1;
determining, according to the identifier of the target webpage within the target website, whether a data record corresponding to the target webpage exists in the data table corresponding to the target website, wherein the data record comprises: the identifier of the target webpage within the target website and the index identifier corresponding to the target webpage;
if no such record exists, generating the index identifier corresponding to the target webpage, and adding the data record corresponding to the target webpage to the data table corresponding to the target website.
2. The method according to claim 1, characterized in that,
before the generating of the index identifier corresponding to the target webpage, the method further comprises:
setting an index identifier generation flag to a locked state, wherein the index identifier generation flag being in the locked state indicates that currently only the index identifier corresponding to the target webpage can be calculated;
determining, according to the identifier of the target webpage within the target website, whether a data record corresponding to the target webpage exists in the data table corresponding to the target website;
if no such record exists, performing the step of generating the index identifier corresponding to the target webpage;
after the generating of the index identifier corresponding to the target webpage, the method further comprises:
setting the index identifier generation flag to an unlocked state, wherein the index identifier generation flag being in the unlocked state indicates that calculation of the index identifiers corresponding to webpages other than the target webpage can begin.
3. The method according to claim 1, characterized in that the generating of the index identifier corresponding to the target webpage comprises:
obtaining the maximum value Value_SID_max of the index identifiers corresponding to the data records stored in the database to be updated;
calculating the remainder R of Value_SID_max divided by a preset first threshold;
determining whether the remainder R is less than a preset second threshold;
if so, calculating the index identifier corresponding to the target webpage = R + the preset second threshold;
otherwise, calculating the index identifier corresponding to the target webpage = R + a preset third threshold.
4. The method according to claim 3, characterized in that the obtaining of the maximum value Value_SID_max of the index identifiers corresponding to the data records stored in the database to be updated comprises:
obtaining, according to a locally stored index identifier value Value_SID_L, the maximum value Value_SID_max of the index identifiers corresponding to the data records stored in the database to be updated;
wherein, after the update program corresponding to the database to be updated is started, the locally stored Value_SID_L is set to the value of Value_SID_D_max, Value_SID_D_max being the value, stored in the database to be updated, of the maximum index identifier among the data records stored in that database;
after the locally stored Value_SID_L is set to the value of Value_SID_D_max, the method further comprises:
updating Value_SID_D_max to: the current Value_SID_D_max + a preset fourth threshold;
after the index identifier corresponding to the target webpage is calculated, the method further comprises:
updating the value of Value_SID_L to the index identifier corresponding to the target webpage;
determining whether the index identifier corresponding to the target webpage is greater than Value_SID_D_max;
if so, updating Value_SID_D_max to: the current Value_SID_D_max + the preset fourth threshold;
after the update program corresponding to the database to be updated is terminated, the method further comprises:
updating Value_SID_D_max to: Value_SID_D_max = Value_SID_L.
5. The method according to claim 1, characterized in that, after the generating of the index identifier corresponding to the target webpage, the method further comprises:
sending the generated index identifier to a web crawler module or to a database other than the database to be updated.
6. A database updating apparatus, characterized in that the apparatus comprises:
a webpage parameter obtaining module, configured to obtain the webpage parameters of a target webpage, wherein the webpage parameters comprise: the identifier of the target website corresponding to the target webpage and the identifier of the target webpage within the target website;
a data table determining module, configured to determine, in a database to be updated, the data table corresponding to the target website according to the identifier of the target website corresponding to the target webpage, wherein the number of data tables comprised in the database to be updated is greater than 1;
a first data record judging module, configured to determine, according to the identifier of the target webpage within the target website, whether a data record corresponding to the target webpage exists in the data table corresponding to the target website, wherein the data record comprises: the identifier of the target webpage within the target website and the index identifier corresponding to the target webpage;
an index identifier generating module, configured to generate the index identifier corresponding to the target webpage when the judgment result of the first data record judging module is negative;
a data record adding module, configured to add the data record corresponding to the target webpage to the data table corresponding to the target website after the index identifier generating module generates the index identifier.
7. The apparatus according to claim 6, characterized in that the apparatus further comprises:
a lock state setting module, configured to set an index identifier generation flag to a locked state, wherein the index identifier generation flag being in the locked state indicates that currently only the index identifier corresponding to the target webpage can be calculated;
a second data record judging module, configured to determine, according to the identifier of the target webpage within the target website, whether a data record corresponding to the target webpage exists in the data table corresponding to the target website, and, if no such record exists, to trigger the index identifier generating module to generate the index identifier;
an unlock state setting module, configured to set the index identifier generation flag to an unlocked state after the index identifier generating module generates the index identifier, wherein the index identifier generation flag being in the unlocked state indicates that calculation of the index identifiers corresponding to webpages other than the target webpage can begin.
8. The apparatus according to claim 6, characterized in that the index identifier generating module comprises:
an index identifier maximum obtaining submodule, configured to obtain the maximum value Value_SID_max of the index identifiers corresponding to the data records stored in the database to be updated;
a remainder calculating submodule, configured to calculate the remainder R of Value_SID_max divided by a preset first threshold;
a remainder judging submodule, configured to determine whether the remainder R is less than a preset second threshold;
a first index identifier calculating submodule, configured to calculate, when the judgment result of the remainder judging submodule is affirmative, the index identifier corresponding to the target webpage = R + the preset second threshold;
a second index identifier calculating submodule, configured to calculate, when the judgment result of the remainder judging submodule is negative, the index identifier corresponding to the target webpage = R + a preset third threshold.
9. The apparatus according to claim 8, characterized in that,
the index identifier maximum obtaining submodule is specifically configured to obtain, according to a locally stored index identifier value Value_SID_L, the maximum value Value_SID_max of the index identifiers corresponding to the data records stored in the database to be updated;
the apparatus further comprises:
a first index identifier setting module, configured to set the locally stored Value_SID_L to the value of Value_SID_D_max after the update program corresponding to the database to be updated is started, Value_SID_D_max being the value, stored in the database to be updated, of the maximum index identifier among the data records stored in that database;
a second index identifier setting module, configured to update Value_SID_D_max to the current Value_SID_D_max + a preset fourth threshold after the first index identifier setting module sets the locally stored index identifier;
a first index identifier updating module, configured to update the value of Value_SID_L to the index identifier corresponding to the target webpage after the first index identifier calculating submodule or the second index identifier calculating submodule calculates that index identifier;
an index identifier judging module, configured to determine whether the index identifier corresponding to the target webpage is greater than Value_SID_D_max;
a second index identifier updating module, configured to update Value_SID_D_max to the current Value_SID_D_max + the preset fourth threshold when the judgment result of the index identifier judging module is affirmative;
a third index identifier updating module, configured to update Value_SID_D_max to Value_SID_D_max = Value_SID_L after the update program corresponding to the database to be updated is terminated.
10. The apparatus according to claim 6, characterized in that the apparatus further comprises:
an index identifier sending module, configured to send the generated index identifier to a web crawler module or to a database other than the database to be updated after the index identifier generating module generates the index identifier.
CN201410453679.XA 2014-09-05 2014-09-05 Database updating method and apparatus Pending CN105468637A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410453679.XA CN105468637A (en) 2014-09-05 2014-09-05 Database updating method and apparatus

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201410453679.XA CN105468637A (en) 2014-09-05 2014-09-05 Database updating method and apparatus

Publications (1)

Publication Number Publication Date
CN105468637A true CN105468637A (en) 2016-04-06

Family

ID=55606342

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201410453679.XA Pending CN105468637A (en) 2014-09-05 2014-09-05 Database updating method and apparatus

Country Status (1)

Country Link
CN (1) CN105468637A (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102495892A (en) * 2011-12-09 2012-06-13 北京大学 Webpage information extraction method
CN102724295A (en) * 2012-05-24 2012-10-10 中国电子科技集团公司第十五研究所 Data synchronization method and system
CN102831252A (en) * 2012-09-21 2012-12-19 北京奇虎科技有限公司 Method and device for updating index database and search method and system
US20140229450A1 (en) * 2013-01-23 2014-08-14 Michael Brekelmans Method for Organising Blog Articles in a Social Network
CN104021192A (en) * 2014-06-13 2014-09-03 北京联时空网络通信设备有限公司 Database renewing method and device

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
施威铭研究室: "《完全掌握Microsoft access2002中文版标准教程》", 31 January 2002, 中国青年出版社 *

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106649576A (en) * 2016-11-15 2017-05-10 北京集奥聚合科技有限公司 Storing method and system for e-commerce commodities crawled by crawlers
CN106980673A (en) * 2017-03-27 2017-07-25 恒生电子股份有限公司 Main memory database table index updating method and system
CN106980673B (en) * 2017-03-27 2021-03-02 恒生电子股份有限公司 Method and system for updating internal memory database table index
CN109948100A (en) * 2019-03-12 2019-06-28 深圳市商舟网科技有限公司 Page info processing method, device, computer equipment and storage medium
CN112015819A (en) * 2020-08-31 2020-12-01 杭州欧若数网科技有限公司 Data updating method, device, equipment and medium for distributed graph database

Similar Documents

Publication Publication Date Title
CN102298614B (en) Method for determining collection category of page collection information and device and equipment
US9116899B2 (en) Managing changes to one or more files via linked mapping records
US10402310B1 (en) Systems and methods for reducing storage required for code coverage results
US20120054440A1 (en) Systems and methods for providing a hierarchy of cache layers of different types for intext advertising
US10102234B2 (en) Auto suggestion in search with additional properties
US20080091685A1 (en) Handling dynamic URLs in crawl for better coverage of unique content
US8239392B2 (en) Enhanced query performance using fixed length hashing of multidimensional data
US20120124028A1 (en) Unified Application Discovery across Application Stores
JP5542859B2 (en) Log management apparatus, log storage method, log search method, and program
CN102124481A (en) Embedding macros in web pages with advertisements
CN105468637A (en) Database updating method and apparatus
US10205678B2 (en) Systems and methods for client-side dynamic information resource activation and deactivation
CN106326025A (en) Method and device for processing abnormality of browser
CN104794177A (en) Data storing method and device
CN103186666A (en) Method, device and equipment for searching based on favorites
CN104484386A (en) Information sharing method and browser client terminal
CN104317931A (en) Webpage title determining method and device
CN102902784B (en) Web page classification storage system and method
CN105787379A (en) Information management method and system as well as electronic device
CN103107919A (en) Method and system for network resource modeling
CN102968445B (en) Based on the application call method and apparatus of browser input
US8463799B2 (en) System and method for consolidating search engine results
CN104899217A (en) Method and apparatus for implementing customized function
CN103678535A (en) Browser and downloading method thereof
JP2017097462A (en) Search program, search device and search method

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20160406
