CN104699848A - Method and device for extracting data of limited Web databases - Google Patents

Info

Publication number
CN104699848A
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201510154092.3A
Other languages
Chinese (zh)
Other versions
CN104699848B (en)
Inventor
杜鹃
张卓
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhengzhou University
Yellow River Conservancy Technical Institute
Original Assignee
Zhengzhou University
Yellow River Conservancy Technical Institute
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhengzhou University, Yellow River Conservancy Technical Institute filed Critical Zhengzhou University
Priority to CN201510154092.3A priority Critical patent/CN104699848B/en
Publication of CN104699848A publication Critical patent/CN104699848A/en
Application granted granted Critical
Publication of CN104699848B publication Critical patent/CN104699848B/en
Expired - Fee Related legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/951 Indexing; Web crawling techniques

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Information Transfer Between Computers (AREA)

Abstract

The invention relates to the field of computer technology and provides a method and a device for extracting data from limited Web databases. In the method, the extraction device acquires an attribute value from a Web database query interface; generates a query request from the attribute value and sends it to the limited Web database; parses the Web page returned by the query and extracts the query data it contains; updates the data in a local database according to the query data; analyzes the local database with the EdaliwdbFCA (Extract data from limited Web database based on formal concept analysis) algorithm to generate the next group of query attribute values; and ends the data extraction when the number of query data items equals a preset threshold. The extraction device comprises a query attribute value acquisition unit, a query unit, a parsing unit, a data updating unit, a query attribute value generation unit and a query completion unit. By applying formal concept analysis to limited Web databases that are accessed through an attribute value query interface, high-quality data can be extracted from them. The method and the device offer good stability and high efficiency.

Description

Data extraction method and device for limited Web databases
Technical Field
The present invention relates to the field of computer technology, and in particular to a data extraction method and device for limited Web databases.
Background
Whether for technical reasons or because of application requirements, if the results returned by a query to a Web database are limited to a certain range, that is, if a program that queries the Web database with a set of attributes can automatically obtain only k objects, then a Web database with this characteristic is a limited Web database. Web pages are divided into the surface Web and the deep Web. The surface Web consists of static Web pages connected by hyperlinks. According to statistics, deep Web resources are about 500 times the size of static page resources and have better data quality, and the most important resources in the deep Web are Web databases. How to extract data from a limited Web database, and how to extract data of higher quality, has long been a widely studied problem.
Summary of the Invention
In view of this, the object of the present invention is to provide a data extraction method and device for limited Web databases that can extract higher-quality data from a limited Web database.
The present invention is implemented as follows:
In a first aspect, an embodiment of the present invention provides a data extraction method for a limited Web database, applied to a data extraction device for the limited Web database, where the extraction device comprises a local database and the method comprises:
the extraction device acquires an attribute value in a Web database query interface;
the extraction device generates a query request according to the attribute value and sends the query request to the limited Web database;
the extraction device parses the Web page returned by the query and extracts the query data contained in the Web page;
the extraction device updates the data in the local database according to the query data;
the extraction device analyzes the local database with the maximal-sub-concept-based limited Web database extraction algorithm (Extract data from Limited Web Database based on Formal Concept Analysis, EdaliwdbFCA) and produces the next group of query attribute values, with which the limited Web database is queried again;
when the number of query data items equals the preset threshold for the number of data items displayed per page of the returned Web page, the extraction device ends the data extraction.
With reference to the first aspect, an embodiment of the present invention provides a first possible implementation of the first aspect, in which, before the extraction device parses the Web page returned by the query, the method further comprises:
determining whether the Web page returned by the query is received within a preset time;
if no Web page is returned within the preset time, the extraction device sends the query request to the limited Web database again.
This extraction method operates on a complex and changeable Internet, where any unexpected event may cause a query to fail during extraction. Every query therefore needs to be managed and maintained so that a failed query can be detected and issued again. This makes the extraction method more robust and ensures that the extraction work proceeds smoothly.
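As an illustration only, the resend-on-timeout behaviour described above might be sketched as follows. The names send_with_retry and send_query, the attempt limit, and the use of TimeoutError to signal that no page arrived within the preset time are assumptions of this sketch; the source does not specify them.

```python
import time


def send_with_retry(send_query, request, max_attempts=3, wait_seconds=0.0):
    """Call send_query(request) until a page is returned or attempts run out.

    send_query is a hypothetical callable: it returns the page text, or
    raises TimeoutError when no page arrives within the preset time.
    """
    last_error = None
    for _ in range(max_attempts):
        try:
            return send_query(request)
        except TimeoutError as error:
            # The query failed: remember why, pause, and send it again.
            last_error = error
            time.sleep(wait_seconds)
    raise RuntimeError(f"query failed after {max_attempts} attempts") from last_error
```

The attempt limit is a design choice added here so that a permanently unreachable database cannot stall the extraction forever.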
With reference to the first aspect, an embodiment of the present invention provides a second possible implementation of the first aspect, in which the extraction device updating the data in the local database according to the query data comprises:
the extraction device compares the extracted query data with the data in the local database;
the extraction device adds to the local database the query data that differ from the data already in the local database.
Data extraction copies the data in the limited Web database into the local database according to certain rules, so that the data in the limited Web database can be used. Data that already exist in the local database do not need to be added again.
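A minimal sketch of this compare-then-add update is given below, assuming each extracted record is a dictionary carrying a unique "id" field. The function name update_local_database and the record layout are hypothetical and serve only to illustrate the step.

```python
def update_local_database(local_db, query_records):
    """Add only the records that the local database does not yet hold.

    local_db maps an assumed unique record id to the record itself.
    Returns the number of records that were actually added.
    """
    added = 0
    for record in query_records:
        key = record["id"]  # assumed unique identifier of a record
        if key not in local_db:
            local_db[key] = record
            added += 1
    return added
```

Keying the local database by identifier keeps the comparison cheap, since each incoming record needs only one lookup.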
With reference to the first aspect, an embodiment of the present invention provides a third possible implementation of the first aspect, in which the extraction device generating a query request according to the attribute value comprises:
the extraction device converts a single-valued attribute into a multi-valued attribute that the Web database query interface can recognize.
In a second aspect, an embodiment of the present invention further provides a data extraction device for a limited Web database, where the extraction device comprises a local database and further comprises:
a query attribute value acquisition unit, configured to acquire an attribute value in a Web database query interface;
a query unit, configured to generate a query request according to the attribute value and send the query request to the limited Web database;
a parsing unit, configured to parse the Web page returned by the query and extract the query data contained in the Web page;
a data updating unit, configured to update the data in the local database according to the query data;
a query attribute value generation unit, configured to analyze the local database with the maximal-sub-concept-based limited Web database extraction algorithm EdaliwdbFCA and produce the next group of query attribute values, with which the limited Web database is queried again;
a query completion unit, configured to end the data extraction when the number of query data items equals the preset threshold for the number of data items displayed per page of the returned Web page.
With reference to the second aspect, an embodiment of the present invention provides a first possible implementation of the second aspect, in which the parsing unit comprises:
a Web page reception judging subunit, configured to determine whether the Web page returned by the query is received within a preset time;
if no Web page is returned within the preset time, the query unit sends the query request to the limited Web database again.
This extraction device operates on a complex and changeable Internet, where any unexpected event can cause a query to fail during extraction. Every query therefore needs to be managed and maintained so that a failed query can be detected and issued again. This makes the extraction device more robust and ensures that the extraction work proceeds smoothly.
With reference to the second aspect, an embodiment of the present invention provides a second possible implementation of the second aspect, in which the data updating unit comprises:
a comparison subunit, configured to compare the query data extracted by the parsing unit with the data in the local database;
a data adding subunit, configured to add to the local database the extracted query data that differ from the data already in the local database.
Data extraction copies the data in the limited Web database into the local database according to certain rules, so that the data in the limited Web database can be used. Data that already exist in the local database do not need to be added again.
With reference to the second aspect, an embodiment of the present invention provides a third possible implementation of the second aspect, in which the query unit comprises:
an attribute conversion subunit, configured to convert a single-valued attribute into a multi-valued attribute that the Web database query interface can recognize.
The embodiments of the present invention provide a data extraction method and device for limited Web databases. By applying formal concept analysis to the extraction of data from a limited Web database that is accessed through an attribute value query interface, higher-quality data can be extracted from the limited Web database, with good stability and high efficiency.
To make the above objects, features and advantages of the present invention clearer and easier to understand, preferred embodiments are described in detail below with reference to the accompanying drawings.
Brief Description of the Drawings
To illustrate the technical solutions of the embodiments of the present invention more clearly, the drawings used in the embodiments are briefly introduced below. It should be understood that the following drawings show only some embodiments of the present invention and should therefore not be regarded as limiting its scope; those of ordinary skill in the art can obtain other related drawings from these drawings without creative effort.
Fig. 1 shows a data extraction method for a limited Web database provided by an embodiment of the present invention;
Fig. 2 shows another data extraction method for a limited Web database provided by an embodiment of the present invention;
Fig. 3 shows a data extraction device for a limited Web database provided by an embodiment of the present invention;
Fig. 4 shows another data extraction device for a limited Web database provided by an embodiment of the present invention.
Reference numerals in the figures: local database 301, query attribute value acquisition unit 302, query unit 303, limited Web database 304, parsing unit 305, data updating unit 306, query attribute value generation unit 307, query completion unit 308, Web page reception judging subunit 309, comparison subunit 310, data adding subunit 311, attribute conversion subunit 312.
Detailed Description
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. The components of the embodiments generally described and illustrated in the drawings herein may be arranged and designed in a variety of different configurations. Therefore, the following detailed description of the embodiments provided in the drawings is not intended to limit the claimed scope of the invention, but merely represents selected embodiments. All other embodiments obtained by those skilled in the art on the basis of these embodiments without creative effort fall within the scope of protection of the present invention.
Web data are obtained mainly by extracting information from Web pages. Web pages are divided into the surface Web and the deep Web. The surface Web consists of static Web pages connected by hyperlinks, whose content can be directly indexed and retrieved by current general-purpose search engines (Google, Baidu, etc.). The deep Web refers to Web pages that a Web server generates dynamically in response to user requests. Accessible online databases (referred to here as Web databases or WDBs), such as CNKI, Wanfang and Joyo Amazon, are an important part of the deep Web. The content of a Web database is stored in a real back-end database, and most of it cannot be indexed by current general-purpose search engines. Deep Web page content is generated dynamically by the Web server only when it is queried, according to the user's query request, and the result is returned to the visitor.
The data extraction method and device for limited Web databases provided by the embodiments of the present invention establish the mapping between the concept lattices corresponding to the global formal context and the local formal context, and then carry out a detailed formal analysis. They use only the lower cover of the current query concept as the search space for query concepts, which avoids constructing the lower half of the concept lattice. Corresponding construction theory and pruning rules are also given, which further reduce the complexity of query selection during data extraction from limited Web databases based on formal concept analysis (FCA).
A formal context is a triple K = (O, A, I), where O is a set of objects (entities), A is a set of descriptors (attributes), and I is a binary relation between O and A, that is, I ⊆ O × A.
A formal concept is a pair c = (X, Y), with X ⊆ O and Y ⊆ A, that satisfies X' = Y and Y' = X, where X' is the set of attributes shared by all objects in X and Y' is the set of objects that have all attributes in Y. Such a pair c is called a formal concept of the formal context K, and X and Y are called the extent and the intent of concept c, respectively. The set of all formal concepts produced by the formal context K is denoted C_K.
Concept lattice (formal concept lattice), also called a Galois lattice: for the set C_K of all concepts produced by the formal context K, the ordered set L_K = (C_K, ≤) induced by the partial order on C_K is called the concept lattice of the formal context K. Each node in the concept lattice is a formal concept.
The set of all direct parent concepts (respectively, direct child concepts) of a concept c is called the upper (respectively, lower) cover of c.
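The definitions above can be made concrete with a small sketch. The toy context (three phones and four attributes) and all function names are invented for illustration and do not come from the source; the exhaustive enumeration in all_concepts is only practical for very small contexts.

```python
from itertools import combinations

# Toy formal context K = (O, A, I): object -> set of attributes (made-up data).
CONTEXT = {
    "phone1": {"bar", "cheap"},
    "phone2": {"bar", "cheap", "camera"},
    "phone3": {"slider", "camera"},
}
ATTRIBUTES = set().union(*CONTEXT.values())


def intent(objects):
    """X': the attributes shared by every object in X."""
    objects = set(objects)
    if not objects:
        return set(ATTRIBUTES)
    return set.intersection(*(CONTEXT[o] for o in objects))


def extent(attrs):
    """Y': the objects that have every attribute in Y."""
    attrs = set(attrs)
    return {o for o, has in CONTEXT.items() if attrs <= has}


def is_formal_concept(objects, attrs):
    """(X, Y) is a formal concept when X' = Y and Y' = X."""
    return intent(objects) == set(attrs) and extent(attrs) == set(objects)


def all_concepts():
    """Enumerate C_K by closing every attribute subset (toy sizes only)."""
    found = set()
    attrs = sorted(ATTRIBUTES)
    for size in range(len(attrs) + 1):
        for subset in combinations(attrs, size):
            ext = extent(subset)
            found.add((frozenset(ext), frozenset(intent(ext))))
    return found
```

In this toy context, ({phone1, phone2}, {bar, cheap}) is a formal concept, whereas ({phone1}, {bar, cheap}) is not, because phone2 also has both attributes.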
The data extraction process of a Web database can be modeled as a Select query in SQL. Using formal concept analysis, this process can be formalized as a function Q: when an attribute set Y is used for the query, the query result is written Q(Y). The online Web data can be regarded as the global formal context, written K_G = (O_G, A_G, R_G), and the data extracted to the local database form the local formal context, written K_L = (O_L, A_L, R_L), where A_L = A_G, R_L = R_G. All concepts produced by the global formal context and the concept lattice they form are written C_G and L_G. Correspondingly, all concepts produced by the local formal context K_L and the concept lattice they form are written C_L and L_L.
A function C_L → C_G is defined, where (X, Y) ∈ L_L, and the Galois operations applied to the intent Y over the global formal context are written Y^G' and Y^G''.
Full concept: if c ∈ L_L and ..., then concept c is called a Full concept.
For a formal context K = (O, A, I), if there is an object a whose attribute set is Y, and the number of objects in the whole formal context that have the attribute set Y is greater than Δ, that is, ||a''|| > Δ, then object a is called an indistinguishable object of the formal context K under the limit threshold Δ.
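The indistinguishable-object condition can be checked directly on a small context. The following sketch uses made-up data and hypothetical function names; it computes a'' as the set of objects that have every attribute of a and compares its size with Δ.

```python
# Made-up context: three phones share exactly the same attribute set.
PHONES = {
    "p1": {"bar", "cheap"},
    "p2": {"bar", "cheap"},
    "p3": {"bar", "cheap"},
    "p4": {"slider"},
}


def double_prime(context, obj):
    """a'': every object that has all the attributes of object a."""
    attrs = context[obj]
    return {o for o, has in context.items() if attrs <= has}


def is_indistinguishable(context, obj, delta):
    """True when more than delta objects have all attributes of obj."""
    return len(double_prime(context, obj)) > delta
```

With Δ = 2, object p1 is indistinguishable, since any query built from its attributes matches three objects, while p4 is not.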
Referring to Fig. 1, a data extraction method for a limited Web database is applied to a data extraction device for the limited Web database. The extraction device comprises a local database, and the method comprises:
S101: the extraction device acquires an attribute value in a Web database query interface.
S102: the extraction device generates a query request according to the attribute value and sends the query request to the limited Web database.
S103: the extraction device parses the Web page returned by the query and extracts the query data contained in the Web page.
S104: the extraction device updates the data in the local database according to the query data.
S105: the extraction device analyzes the local database with the EdaliwdbFCA algorithm and produces the next group of query attribute values, with which the limited Web database is queried again.
S106: when the number of query data items equals the preset threshold for the number of data items displayed per page of the returned Web page, the extraction device ends the data extraction.
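The control flow of steps S101 to S106 might be sketched as below. This is not the EdaliwdbFCA algorithm itself: query_database and next_values are caller-supplied stand-ins (assumptions of this sketch) for the query/parse steps and for the attribute value generation step, the stop condition follows S106 as it is worded, and the max_queries guard is an added safety limit.

```python
def run_extraction(first_value, query_database, next_values, threshold,
                   max_queries=50):
    """Follow steps S101-S106 with caller-supplied stand-ins.

    query_database(value) -> list of records (dicts with an "id" key);
        stands in for sending the request and parsing the page (S102, S103).
    next_values(local_db) -> list of further attribute values;
        stands in for the EdaliwdbFCA analysis (S105).
    """
    local_db = {}
    pending = [first_value]                        # S101
    asked = set()
    while pending and len(asked) < max_queries:
        value = pending.pop(0)
        if value in asked:                         # never repeat a query
            continue
        asked.add(value)
        records = query_database(value)            # S102, S103
        for record in records:                     # S104
            local_db.setdefault(record["id"], record)
        if len(records) == threshold:              # S106, as worded
            break
        pending.extend(next_values(local_db))      # S105
    return local_db
```

Passing the two stand-ins as parameters keeps the loop independent of any particular Web database or value-generation strategy.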
The extraction device faces a complex and changeable Internet, where any unexpected event can interrupt the extraction process. Referring to Fig. 2, an embodiment of the present invention provides another data extraction method for a limited Web database, which is robust. The method comprises:
S201: the extraction device acquires an attribute value in a Web database query interface.
S202: the extraction device generates a query request according to the attribute value and sends the query request to the limited Web database. When a query request needs to be sent, the extraction device converts a single-valued attribute into a multi-valued attribute that the Web database query interface can recognize, so that the query can be carried out.
This embodiment describes the conversion in an XML file, so that changes to the interface can be accommodated by modifying the interface file without recompiling the source code. Part of the scale mapping XML file for the Sina mobile phone query interface is listed below.
File: SinaMobileProScale.xml
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!-- sina mobile select web deep database, scale definition -->
<!-- DOCTYPE scale-set SYSTEM "scale.dtd" -->
<!DOCTYPE scale-set [
<!ELEMENT scale-set (scale+)>
<!ELEMENT scale (attribute-list, object+)>
<!ATTLIST scale name CDATA #REQUIRED>
<!ATTLIST scale type CDATA "rating">
<!ATTLIST scale id CDATA #IMPLIED>
<!ELEMENT attribute-list (#PCDATA)>
<!ELEMENT object (#PCDATA)>
<!ATTLIST object name CDATA #REQUIRED>
<!ATTLIST object id CDATA #IMPLIED>
]>
<scale-set>
<scale name="mobile_jiage1" id="0">
<attribute-list></attribute-list>
<object name="0-499" id="0"></object>
<object name="500-999" id="1"></object>
<object name="1000-1499" id="2"></object>
<object name="1500-1999" id="3"></object>
<object name="2000-2999" id="4"></object>
<object name="3000-1000000" id="5"></object>
</scale>
<scale name="mobile_face" id="2">
<attribute-list></attribute-list>
<object name="bar" id="12"></object>
<object name="flip-up, flip-down" id="13"></object>
<object name="slider" id="14"></object>
<object name="rotating, swivel" id="15"></object>
<object name="other" id="16"></object>
</scale>
</scale-set>
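How such a scale file could drive the conversion of a single value into an interface option is sketched below. The sketch assumes, without confirmation from the source, that each object name of the form "low-high" is a price interval and that the object id is the option the query interface expects. The function names are hypothetical, and the embedded XML is a trimmed copy of the listing without its DTD.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the price scale, without the DTD, for illustration.
SCALE_XML = """
<scale-set>
  <scale name="mobile_jiage1" id="0">
    <attribute-list></attribute-list>
    <object name="0-499" id="0"></object>
    <object name="500-999" id="1"></object>
    <object name="1000-1499" id="2"></object>
  </scale>
</scale-set>
"""


def load_intervals(xml_text, scale_name):
    """Read (low, high, option_id) tuples for the named scale."""
    root = ET.fromstring(xml_text)
    intervals = []
    for scale in root.findall("scale"):
        if scale.get("name") != scale_name:
            continue
        for obj in scale.findall("object"):
            low, high = obj.get("name").split("-")
            intervals.append((int(low), int(high), obj.get("id")))
    return intervals


def to_interface_option(price, intervals):
    """Map a single price to the option id whose interval contains it."""
    for low, high, option_id in intervals:
        if low <= price <= high:
            return option_id
    return None  # the price falls outside every listed interval
```

Because the intervals live in the XML file, a change in the interface's price ranges would only require editing that file, which matches the stated motivation for keeping the mapping outside the source code.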
S203: determine whether the Web page returned by the query is received within the preset time; if no Web page is returned within the preset time, the extraction device sends the query request to the limited Web database again.
Once the Web page is returned within the preset time, the following steps are performed:
S204: the extraction device parses the Web page returned by the query and extracts the query data contained in the Web page.
S205: the extraction device compares the extracted query data with the data in the local database.
If they differ, S206 is performed: the extraction device adds to the local database the query data that differ from the data already in the local database.
S207: the extraction device analyzes the local database with the EdaliwdbFCA algorithm and produces the next group of query attribute values, with which the limited Web database is queried again.
S208: when the number of query data items equals the preset threshold for the number of data items displayed per page of the returned Web page, the extraction device ends the data extraction.
In the data extraction method for limited Web databases disclosed in the embodiments of the present invention, a single-attribute concept is selected as the initial query concept. If the current candidate query concept is not a Full concept, the number of returned results is too large and exceeds the limit threshold Δ, so the results cannot all be displayed on one Web page and therefore cannot all be extracted. Based on the local formal context extracted so far, the lower cover Cov_l(c) of this concept c is constructed, and this continues until a concept whose extent cardinality is less than or equal to the limit threshold is chosen as the actual query concept. Throughout the extraction of the Web database, the intent Y of the query concept is sent as the query attribute set, and the local formal context is updated by extracting the results returned under the limit. Pruning rules are used throughout the query process to reduce the number of query concepts and improve the extraction efficiency of the algorithm.
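The descent through lower covers can be illustrated with a simplified sketch. It is not the EdaliwdbFCA algorithm and omits the pruning rules: it computes the lower cover of a concept in a local context by closing the intent extended with one extra attribute and keeping the candidates with maximal extents, then searches breadth-first for the first concept whose extent has between 1 and Δ objects. All names and the data layout (object mapped to a set of attributes) are assumptions of this sketch.

```python
def extent(context, attrs):
    """Objects of the local context that have every attribute in attrs."""
    return frozenset(o for o, has in context.items() if attrs <= has)


def intent(context, objects, all_attrs):
    """Attributes shared by all given objects (all attributes if none)."""
    shared = set(all_attrs)
    for o in objects:
        shared &= context[o]
    return frozenset(shared)


def lower_cover(context, attrs):
    """Direct sub-concepts of the concept generated by attrs."""
    all_attrs = frozenset().union(*context.values())
    closed = intent(context, extent(context, frozenset(attrs)), all_attrs)
    candidates = {}
    for extra in sorted(all_attrs - closed):
        ext = extent(context, closed | {extra})
        candidates[ext] = intent(context, ext, all_attrs)
    # Keep only candidates whose extent is not inside another candidate's.
    maximal = [e for e in candidates if not any(e < other for other in candidates)]
    return [(e, candidates[e]) for e in maximal]


def pick_query_concept(context, attrs, delta):
    """Descend through lower covers until 0 < |extent| <= delta."""
    start = frozenset(attrs)
    frontier = [(extent(context, start), start)]
    while frontier:
        ext, y = frontier.pop(0)
        if 0 < len(ext) <= delta:
            return ext, y
        if ext:
            frontier.extend(lower_cover(context, y))
    return None
```

Each step strictly shrinks the extent, so the search terminates; the returned intent is what would be sent as the query attribute set.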
Referring to Fig. 3, this embodiment provides a data extraction device for a limited Web database. The extraction device comprises a local database 301 and further comprises:
a query attribute value acquisition unit 302, configured to acquire an attribute value in a Web database query interface;
a query unit 303, configured to generate a query request according to the attribute value and send the query request to the limited Web database 304.
The query unit 303 comprises an attribute conversion subunit 312, configured to convert a single-valued attribute into a multi-valued attribute that the Web database query interface can recognize.
a parsing unit 305, configured to parse the Web page returned by the query and extract the query data contained in the Web page;
a data updating unit 306, configured to update the data in the local database 301 according to the query data;
a query attribute value generation unit 307, configured to analyze the local database 301 with the EdaliwdbFCA algorithm and produce the next group of query attribute values, with which the limited Web database 304 is queried again;
a query completion unit 308, configured to end the data extraction when the number of query data items equals the preset threshold for the number of data items displayed per page of the returned Web page.
With the above device, the target data in the limited Web database 304 can be extracted into the local database 301, making deep Web resources searchable. To make the data extraction device more robust and better able to extract the data in the limited Web database 304, and referring to Fig. 4, an embodiment of the present invention provides another data extraction device for the limited Web database 304. The extraction device comprises a local database 301 and further comprises:
a query attribute value acquisition unit 302, configured to acquire an attribute value in a Web database query interface;
a query unit 303, configured to generate a query request according to the attribute value and send the query request to the limited Web database 304;
a parsing unit 305, configured to parse the Web page returned by the query and extract the query data contained in the Web page.
The parsing unit 305 comprises a Web page reception judging subunit 309, configured to determine whether the Web page returned by the query is received within a preset time. If no Web page is returned within the preset time, the query unit 303 sends the query request to the limited Web database 304 again.
a data updating unit 306, configured to update the data in the local database 301 according to the query data.
The data updating unit 306 comprises a comparison subunit 310 and a data adding subunit 311:
the comparison subunit 310 is configured to compare the query data extracted by the parsing unit 305 with the data in the local database 301;
the data adding subunit 311 is configured to add to the local database 301 the extracted query data that differ from the data already in the local database 301.
a query attribute value generation unit 307, configured to analyze the local database 301 with the EdaliwdbFCA algorithm and produce the next group of query attribute values, with which the limited Web database 304 is queried again;
a query completion unit 308, configured to end the data extraction when the number of query data items equals the preset threshold for the number of data items displayed per page of the returned Web page.
To give the extraction device provided by the embodiments good extensibility, so that it can be used to extract from different data sources (different Web databases or simulated Web databases), EdaliwdbFCA is encapsulated in the ExtractStrategy class. The extraction device abstracts the data source to be extracted as a formal context, so the DBContext class describes the formal context and aggregates the abstract extractor class DataExtractor. The Galois connection operations required by the EdaliwdbFCA algorithm are also encapsulated in the DBContext class. The query operations that the extraction device needs to send are carried out by concrete implementations of the abstract function SendQuery in the abstract class DataExtractor. According to the query interface of the specific data source, SendQuery must convert the query concept into multi-valued attributes that conform to the interface specification and send the query request. The extracted data need to be put into the local database 301, so the abstract class DataExtractor contains a database module DBModule object. The classes SinaMobileExtractor and DBDataExtractor are concrete implementations of DataExtractor that handle different extraction tasks; because their data sources differ, their concrete extraction processes and Web query interfaces differ as well. XExtractor stands for any concrete implementation of DataExtractor, which shows that the attribute selection algorithm is independent of the concrete extraction process. To add a new extraction task, it is therefore sufficient to add a corresponding implementation of DataExtractor. The DBDataExtractor class extracts data from a simulated Web database and therefore contains a DBModule object.
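The relationship between DataExtractor, DBModule and a concrete extractor could look roughly like the following sketch. Only the class names come from the description; the method names (written here in Python style, so send_query stands for SendQuery), the record layout, the in-memory simulated database and the extract_once helper are assumptions, and ExtractStrategy and DBContext are left out for brevity.

```python
from abc import ABC, abstractmethod


class DBModule:
    """Stand-in for the local database: records keyed by an assumed 'id'."""

    def __init__(self):
        self.records = {}

    def add(self, record):
        """Store the record if it is new; return 1 if stored, else 0."""
        if record["id"] in self.records:
            return 0
        self.records[record["id"]] = record
        return 1


class DataExtractor(ABC):
    """Abstract extractor: subclasses decide how a query is actually sent."""

    def __init__(self):
        self.db = DBModule()

    @abstractmethod
    def send_query(self, attrs):
        """Convert attrs to the source's interface format and return records."""

    def extract_once(self, attrs):
        """Send one query and store the new records; return how many were new."""
        return sum(self.db.add(record) for record in self.send_query(attrs))


class DBDataExtractor(DataExtractor):
    """Extracts from a simulated limited Web database held in memory."""

    def __init__(self, rows, limit):
        super().__init__()
        self.rows = rows
        self.limit = limit  # at most this many results per query

    def send_query(self, attrs):
        hits = [row for row in self.rows if set(attrs) <= row["attrs"]]
        return hits[: self.limit]
```

A new data source would be supported by adding another subclass that overrides send_query, leaving the storage and selection logic untouched.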
The above are only preferred embodiments of the present invention and are not intended to limit it; for those skilled in the art, the present invention may have various modifications and variations. Any modification, equivalent replacement or improvement made within the spirit and principles of the present invention shall fall within its scope of protection.

Claims (8)

1. A data extraction method for a limited Web database, characterized in that it is applied to a data extraction device for the limited Web database, the extraction device comprises a local database, and the method comprises:
the extraction device acquires an attribute value in a Web database query interface;
the extraction device generates a query request according to the attribute value and sends the query request to the limited Web database;
the extraction device parses the Web page returned by the query and extracts the query data contained in the Web page;
the extraction device updates the data in the local database according to the query data;
the extraction device analyzes the local database with the maximal-sub-concept-based limited Web database extraction algorithm EdaliwdbFCA and produces the next group of query attribute values, with which the limited Web database is queried again;
when the number of query data items equals the preset threshold for the number of data items displayed per page of the returned Web page, the extraction device ends the data extraction.
2. The data extraction method for a limited Web database according to claim 1, characterized in that, before the extraction device parses the Web page returned by the query, the method further comprises:
determining whether the Web page returned by the query is received within a preset time;
if no Web page is returned within the preset time, the extraction device sends the query request to the limited Web database again.
3. The data extraction method for a limited Web database according to claim 1, characterized in that the extraction device updating the data in the local database according to the query data comprises:
the extraction device compares the extracted query data with the data in the local database;
the extraction device adds to the local database the query data that differ from the data already in the local database.
4. The data extraction method for a limited Web database according to claim 1, characterized in that the extraction device generating a query request according to the attribute value comprises:
the extraction device converts a single-valued attribute into a multi-valued attribute that the Web database query interface can recognize.
5. A data extraction device for a limited Web database, characterized in that the extraction device comprises a local database, and the extraction device further comprises:
A query attribute value obtaining unit, configured to obtain an attribute value from the Web database query interface;
A query unit, configured to generate a query request according to the attribute value and send the query request to the limited Web database;
A parsing unit, configured to parse the Web page returned for the query and extract the query data contained in the Web page;
A data updating unit, configured to update the data in the local database according to the query data;
A query attribute value generating unit, configured to analyze the local database using the EdaliwdbFCA algorithm, which extracts limited Web databases based on maximal sub-concepts, and to generate the next group of query attribute values with which to query the limited Web database again;
A query ending unit, configured to end the data extraction when the number of query data items equals the preset threshold for the number of data items displayed per page of the Web page returned after a query.
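One way to picture how the units of claim 5 fit together is as pluggable components of a single object. The sketch below is illustrative only: the class `ExtractionDevice`, its field names and the toy lambdas are hypothetical, and `generate_values` is a plain placeholder rather than EdaliwdbFCA.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Set, Tuple

Record = Tuple[str, ...]


@dataclass
class ExtractionDevice:
    obtain_value: Callable[[], str]                      # attribute value obtaining unit
    query: Callable[[str], str]                          # query unit: value -> page text
    parse: Callable[[str], List[Record]]                 # parsing unit
    generate_values: Callable[[Set[Record]], List[str]]  # attribute value generating unit
    page_size_threshold: int                             # used by the query ending unit
    local_db: Set[Record] = field(default_factory=set)   # local database

    def step(self, value: str) -> Tuple[bool, List[str]]:
        """One round: query, parse, update. Returns (finished, next values)."""
        records = self.parse(self.query(value))
        self.local_db.update(records)                    # data updating unit
        finished = len(records) == self.page_size_threshold  # query ending unit
        return finished, ([] if finished else self.generate_values(self.local_db))


# Toy wiring: pages are "a,b;c,d" strings instead of real HTML.
device = ExtractionDevice(
    obtain_value=lambda: "math",
    query=lambda v: "alice,math;bob,math" if v == "math" else "",
    parse=lambda page: [tuple(row.split(",")) for row in page.split(";") if row],
    generate_values=lambda db: sorted({v for r in db for v in r}),
    page_size_threshold=10,
)
finished, next_values = device.step(device.obtain_value())
```

Keeping each unit as an injected callable mirrors the claim's separation of responsibilities and makes each part replaceable in isolation.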
6. The data extraction device for a limited Web database according to claim 5, characterized in that the parsing unit comprises:
A Web page reception judging subunit, configured to determine whether the Web page returned for the query has been received within a preset time;
If no Web page is returned for the query within the preset time, the query unit sends the query request to the limited Web database again.
7. The data extraction device for a limited Web database according to claim 5, characterized in that the data updating unit comprises:
A comparing subunit, configured to compare the query data extracted by the parsing unit with the data in the local database;
A data adding subunit, configured to add to the local database the extracted query data that differs from the data in the local database.
8. The data extraction device for a limited Web database according to claim 5, characterized in that the query unit comprises:
An attribute converting subunit, configured to convert a single-valued attribute into a multi-valued attribute that the Web database query interface can recognize.
CN201510154092.3A 2015-04-02 2015-04-02 Method and device for extracting data of limited Web databases Expired - Fee Related CN104699848B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510154092.3A CN104699848B (en) 2015-04-02 2015-04-02 Method and device for extracting data of limited Web databases

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201510154092.3A CN104699848B (en) 2015-04-02 2015-04-02 Method and device for extracting data of limited Web databases

Publications (2)

Publication Number Publication Date
CN104699848A CN104699848A (en) 2015-06-10
CN104699848B CN104699848B (en) 2018-04-27

Family

ID=53346968

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510154092.3A Expired - Fee Related CN104699848B (en) 2015-04-02 2015-04-02 Method and device for extracting data of limited Web databases

Country Status (1)

Country Link
CN (1) CN104699848B (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7181471B1 (en) * 1999-11-01 2007-02-20 Fujitsu Limited Fact data unifying method and apparatus
CN101697221A (en) * 2009-09-18 2010-04-21 何国健 Method for obtaining reading access to limited content of web site by purchasing web site products
CN103560943A (en) * 2013-10-31 2014-02-05 北京邮电大学 Network analytic system and method supporting real-time mass data processing

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
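Zhang Zhuo (张卓): "Research on Web Database Extraction Based on Formal Concept Analysis" (基于形式概念分析的Web数据库抽取研究), China Doctoral Dissertations Full-text Database, Information Science and Technology series (《中国博士学位论文全文数据库信息科技辑》) *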

Also Published As

Publication number Publication date
CN104699848B (en) 2018-04-27

Similar Documents

Publication Publication Date Title
US20200183932A1 (en) Optimizing write operations in object schema-based application programming interfaces (apis)
CN107480198B (en) Distributed NewSQL database system and full-text retrieval method
CN102164186B (en) Method and system for realizing cloud search service
CN107391502B (en) Time interval data query method and device and index construction method and device
CN102567436A (en) Multi-Tenant system
CN104967620A (en) Access control method based on attribute-based access control policy
CN105431844A (en) Third party search applications for a search system
CN102760058B (en) Massive software project sharing method oriented to large-scale collaborative development
CN105468720A (en) Method for integrating distributed data processing systems, corresponding systems and data processing method
CN108154024B (en) Data retrieval method and device and electronic equipment
CN107423037B (en) Application program interface positioning method and device
CN102968454A (en) Method and equipment for obtaining search results of popularization object
CN102999600A (en) Method and system for automatically generating embedded database
CN104462429A (en) Method and device for generating database query sentences
CN113434482A (en) Data migration method and device, computer equipment and storage medium
CN104636368A (en) Data retrieval method and device and server
CN105930354B (en) Storage model conversion method and device
CN105843809B (en) Data processing method and device
CN112905600A (en) Data query method and device, storage medium and electronic equipment
CN103902651A (en) Cloud code query method and device based on MongoDB
CN105488165A (en) Data retrieval method and system based on index database
CN107291875B (en) Metadata organization management method and system based on metadata graph
CN104699848A (en) Method and device for extracting data of limited Web databases
CN117009430A (en) Data management method, device, storage medium and electronic equipment
CN115640476A (en) Site column management method, terminal and computer readable storage medium

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CB03 Change of inventor or designer information

Inventor after: Du Juan

Inventor after: Zhang Zhuo

Inventor after: Cao Jianchun

Inventor before: Du Juan

Inventor before: Zhang Zhuo

CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20180427

Termination date: 20210402
