CN101984434A - Webpage data extracting method based on extensible language query - Google Patents

Info

Publication number
CN101984434A
Authority
CN
China
Prior art keywords
node
label
attribute
data
path
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN 201010545520
Other languages
Chinese (zh)
Other versions
CN101984434B (en)
Inventor
聂铁铮
于戈
王波涛
岳德君
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Northeastern University China
Original Assignee
Northeastern University China
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Northeastern University China filed Critical Northeastern University China
Priority to CN201010545520A priority Critical patent/CN101984434B/en
Publication of CN101984434A publication Critical patent/CN101984434A/en
Application granted granted Critical
Publication of CN101984434B publication Critical patent/CN101984434B/en
Expired - Fee Related legal-status Critical Current
Anticipated expiration legal-status Critical

Abstract

A webpage data extraction method based on extensible markup language (XML) queries, belonging to the technical field of computer databases. The method comprises the following steps: determining the schema structure that corresponds to the data content to be extracted from a Web page; locating the data area, the data units, and the attribute texts in the Web page; semantically annotating the attribute texts; generating the node paths of the data units; computing the path expressions for extracting attribute values; generating the XML query statement for data extraction; and extracting the data with that XML query statement. The method generates precise XML query statements, which ensures their correctness; it is highly general and integrates seamlessly with existing methods; and it can accommodate more complex query result output.

Description

Webpage data extraction method based on extensible language query
Technical field
The invention belongs to the field of computer database technology, and in particular relates to a webpage data extraction method based on extensible markup language (XML) queries.
Background art
With the continuous development of the Web, the amount of data on the Web is growing rapidly, and applications increasingly demand Web data. Although the Web contains a large amount of structured and semi-structured data, this data is mainly presented to users through browsers as Hypertext Markup Language (HTML) and is difficult to use directly in applications such as data mining and data integration. Extracting structured and semi-structured data from large numbers of Web pages efficiently and accurately is therefore increasingly important. Typical Web data extraction methods fall into three classes: methods based on the HTML tag tree or the Document Object Model (DOM) tree; methods based on page structure; and methods based on visual information. Methods based on the HTML tag tree or DOM tree mainly include XWRAP, RoadRunner, Lixto, MDR, and MDRII; methods based on page structure mainly include NoDoSE, DEByE, and SG-WRAP; methods based on visual information are mainly represented by ViDRE;
Extracting data records from a page on the basis of the HTML tag tree or DOM tree is a fairly common approach. Before extraction, the Web page is first converted into a tag-based DOM tree; data is then extracted from it according to structural features of the DOM tree and automatic or semi-automatic extraction rules. Methods based on page structure first specify the structure of the part of the page that contains data, and then search the page for similar parts according to this structure and take them as the extraction result. These methods work well on structurally simple pages, but when the DOM tree of a page is complex and the data area contains many noise nodes, the results are poor, and the identification of nested data structures is not supported;
Techniques that extract data from a webpage on the basis of visual information mainly exploit layout conventions in webpage design, that is, where users habitually look for content, and extract data from the corresponding positions. ViDRE, proposed by Microsoft Research Asia, is an extraction method based on visual features. It simulates, to some extent, the way the human eye recognizes a page, and thereby identifies object information. However, when a page has no obvious visual features, the effectiveness of vision-based extraction drops severely. In addition, vision-based approaches are suited to extracting data from a single page; for large numbers of pages with the same structure but different data, extraction efficiency is very low;
The above methods are suitable only for webpages with simple data structures. If the data in a webpage is hierarchical, the extraction result is hard to represent or loses attributes, so pages with complex data structures are hard to handle. Second, these methods generate the extracted result data directly after initialization, so an attribute recognition error is hard to correct in time. Finally, these methods run in isolation and are hard to combine with existing database systems, so unified management of web data is lacking.
Summary of the invention
To remedy the deficiencies of the above methods, the invention provides a webpage data extraction method based on extensible markup language (XML) queries.
The technical solution of the invention is implemented as follows: the webpage data extraction method based on XML queries comprises the following steps:
Step 1: determine the schema structure corresponding to the data content to be extracted from the Web page;
The schema structure is one of two kinds: a table structure in relational form, or a hierarchical structure. The data schema S of a table structure consists of a data entity name E and a set of attributes A = {A_1, ..., A_n}, where A_i (1 <= i <= n, n being the number of attributes) is an attribute in the set. Each attribute consists of an attribute name and a data type, written <N, T>, where N is the attribute name and T is the attribute data type; the data type T is one of integer, float, or string. The hierarchical structure is a complex data structure composed of the basic types; its data schema and attribute set are given by formula images in the original publication (Figure 200501DEST_PATH_IMAGE001 for the schema, Figure 401172DEST_PATH_IMAGE002 for its attributes), where m is the number of attributes in the schema;
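The table-structure schema S = (E, A) described above can be sketched as a small data model. This is only an illustrative sketch: the class names `Attribute` and `TableSchema`, the helper `is_valid`, and the sample attribute names are assumptions made for the example and do not appear in the patent.

```python
from dataclasses import dataclass
from typing import List

# Data types T allowed by the schema definition in step 1.
ALLOWED_TYPES = {"integer", "float", "string"}

@dataclass(frozen=True)
class Attribute:
    name: str   # N: attribute name
    dtype: str  # T: attribute data type

@dataclass(frozen=True)
class TableSchema:
    entity: str                  # E: data entity name
    attributes: List[Attribute]  # A = {A_1, ..., A_n}

def is_valid(schema: TableSchema) -> bool:
    """Check that every attribute uses one of the three basic data types."""
    return all(a.dtype in ALLOWED_TYPES for a in schema.attributes)

# Hypothetical schema for a book entity with three attributes.
book = TableSchema("book", [
    Attribute("title", "string"),
    Attribute("original_price", "float"),
    Attribute("discount", "integer"),
])
```

A hierarchical schema could be modeled by allowing an attribute to hold a nested schema, but that extension is not shown here.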
Step 2: locate the data area, the data units, and the attribute texts in the Web page;
First, the Web page source described in HTML is formatted as an XML document;
The data area Da is the region enclosed by the smallest boundary that contains all data units in the Web page. It is located as the minimal subtree that contains all data units in the Document Object Model (DOM) structure of the Web page;
A data unit Du represents a data entity of the schema structure that the Web data extraction is to obtain. It is usually described by the attributes of the schema and repeats in the page according to some regular pattern. It is located as follows: in the DOM tree of the Web page, find the nodes that hold each attribute content of a data entity in the page; the minimal subtree containing these nodes is a data unit;
An attribute text At is text content in the Web page that contains the value of an attribute of the data schema. The attribute value is usually in a text node of an element node in the DOM tree of the Web page. It is located by finding the node in the DOM tree of the Web page that contains the attribute value text;
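Locating the "minimal subtree that contains these nodes" amounts to finding the lowest common ancestor of the nodes. The sketch below shows one way to do this with Python's standard `xml.etree.ElementTree`; the function names and the tiny sample document (including its `id` attributes) are hypothetical and only serve the illustration.

```python
import xml.etree.ElementTree as ET

def ancestor_chain(node, parent):
    """Return the list [root, ..., node] by following the parent map upward."""
    chain = [node]
    while node in parent:
        node = parent[node]
        chain.append(node)
    return chain[::-1]

def minimal_subtree_root(root, nodes):
    """Return the root of the smallest subtree containing all given nodes."""
    # ElementTree has no parent pointers, so build a child -> parent map.
    parent = {child: p for p in root.iter() for child in p}
    chains = [ancestor_chain(n, parent) for n in nodes]
    lowest = None
    for level in zip(*chains):
        if all(x is level[0] for x in level):
            lowest = level[0]   # still on the shared part of every chain
        else:
            break
    return lowest

# Hypothetical page fragment with two data units inside one data area.
doc = ET.fromstring(
    "<body><div id='list'>"
    "<div id='u1'><h2>T1</h2><h4>A1</h4></div>"
    "<div id='u2'><h2>T2</h2><h4>A2</h4></div>"
    "</div><div id='ad'/></body>"
)
unit1_nodes = [doc.find(".//div[@id='u1']/h2"), doc.find(".//div[@id='u1']/h4")]
all_units = [doc.find(".//div[@id='u1']"), doc.find(".//div[@id='u2']")]
```

Applied to the attribute nodes of one entity, the function yields the data unit; applied to all data units, it yields the data area.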
Step 3: semantically annotate the attribute texts from step 2;
Method: for each attribute text contained in each data unit, specify the one or more attributes of the data schema that it contains;
Step 4: generate the data unit node path, comprising the following steps:
Step 4-1: express the set of data units obtained in step 2 as U = {U_1, U_2, ..., U_n}, where U_i denotes one data unit, i = 1, ..., n;
Step 4-2: for each identified data unit U_i, determine its corresponding element node in the page's XML document, denoted N_i, and then, according to the structure of the XML document, generate the path from the root node to N_i, denoted P_i;
Step 4-3: compute the path expression of the data units, as follows:
Take the path of a data unit node. In the path P_i, a position predicate locates each step of the path expression, that is, each node passed on the way from the document root node to the element node of the data unit. Take the tag of each node in the path expression. The paths of all data units have the same tag sequence; denote the tag sequence starting from the root node by T, containing m tags (T_1, T_2, ..., T_m), where T_1 is the tag of the root node and the remaining tags follow in order. The position of each node among its siblings with the same tag is denoted (p_i1, ..., p_im), where p_i1 is the position of the root node and the remaining positions follow in order. The path is then expressed as:
Path P_i = /tag_1[position_i1]/tag_2[position_i2]/.../tag_m[position_im],
that is, P_i = /T_1[p_i1]/T_2[p_i2]/.../T_m[p_im]
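As an illustration of the path form P_i = /T_1[p_i1]/.../T_m[p_im], the following sketch derives such a path for a node using Python's standard `xml.etree.ElementTree`. The function name `positional_path` and the sample document are assumptions for this example; the position of each node is counted among its siblings with the same tag, as the text specifies.

```python
import xml.etree.ElementTree as ET

def positional_path(root, target):
    """Build /tag_1[pos_1]/.../tag_m[pos_m] from the root down to target."""
    parent = {child: p for p in root.iter() for child in p}
    steps = []
    node = target
    while node is not None:
        p = parent.get(node)
        # Position is counted only among siblings that share the node's tag.
        same_tag = [root] if p is None else [c for c in p if c.tag == node.tag]
        index = next(i for i, s in enumerate(same_tag, 1) if s is node)
        steps.append(f"/{node.tag}[{index}]")
        node = p
    return "".join(reversed(steps))

# Hypothetical document; the element with id='unit' plays the data unit node.
page = ET.fromstring(
    "<html><head/><body><div/><div><p/><div/><div id='unit'/></div></body></html>"
)
unit = page.find(".//div[@id='unit']")
unit_path = positional_path(page, unit)
```

The sketch assumes that `target` belongs to the tree rooted at `root`.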
Step 4-4: for the set of data unit paths, compute the longest common path (LCP) starting from the root node:
The longest common path is the path formed by the nodes shared by the paths of all data unit nodes. It is computed as follows: for the paths of the data unit nodes, match step by step beginning with the first tag from the root node. If the position values of all data unit node paths at the current tag are identical, that is, p_1i = p_2i = ... = p_ni, append the current tag and position value to the longest common path, that is, LCP += /T_i[p_i]. If the position values at the current tag differ, stop matching and take the current longest common path as the final longest common path;
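The matching procedure of step 4-4 can be sketched as follows. The helper names are hypothetical, and the sketch assumes that every step of every input path carries a position predicate, as in the path form of step 4-3. The three sample paths are the data unit paths P1 to P3 given in the embodiment.

```python
import re

# One path step of the form /tag[position].
STEP = re.compile(r"/([^/\[\]]+)\[(\d+)\]")

def parse_steps(path):
    """Split '/html[1]/body[1]' into [('html', 1), ('body', 1)]."""
    return [(tag, int(pos)) for tag, pos in STEP.findall(path)]

def longest_common_path(paths):
    """Return the longest shared prefix of steps, starting at the root."""
    parsed = [parse_steps(p) for p in paths]
    common = []
    for steps in zip(*parsed):
        if all(s == steps[0] for s in steps):
            common.append(steps[0])   # LCP += /T_i[p_i]
        else:
            break                     # first differing position: stop matching
    return "".join(f"/{tag}[{pos}]" for tag, pos in common)

unit_paths = [
    "/html[1]/body[1]/div[6]/div[3]/div[1]/div[4]/div[2]",
    "/html[1]/body[1]/div[6]/div[3]/div[1]/div[5]/div[2]",
    "/html[1]/body[1]/div[6]/div[3]/div[1]/div[6]/div[2]",
]
lcp = longest_common_path(unit_paths)
```

For these three paths the result is the prefix up to and including the fifth step, which is where the position values first diverge.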
Step 4-5: simplify the longest common path LCP computed in step 4-4;
For the node corresponding to a step in the longest common path, denoted n_i with tag T_i: if none of its siblings has the same tag and also has a non-data-unit descendant node reachable by the same successor path "/tag_(i+1)/.../tag_m", the position value of this node can be omitted from the longest common path expression;
Step 4-6: compute the local path by generating predicates. The local path is the path formed by the nodes private to each data unit node; its predicate expressions locate nodes precisely, so that locating all data unit nodes filters out irrelevant nodes:
Predicates are generated as follows. Let the tag of the node at the current step be T_i. Check whether the siblings of the node set at the current step include a node with the same tag that has a non-data-unit descendant node reachable by the same successor path "/tag_(i+1)/.../tag_m". Such non-data-unit nodes are called noise nodes. If there are none, omit the predicate. If there are, check whether the current node has an XML attribute that distinguishes it from the noise nodes; if such an XML attribute exists, use it as the predicate expression; if not, compute the range of position values for the predicate;
The range of position values in the predicate is computed as follows:
If noise nodes appear only before the set of data unit nodes, the position range in the predicate for this tag runs from the smallest position value among the nodes corresponding to tag i of all data unit nodes to the last node with this tag;
If noise nodes appear only after the set of data unit nodes, the position range in the predicate for this tag runs from the first node with this tag to the largest position value among the nodes corresponding to tag i of all data unit nodes;
If the data nodes are regularly separated by noise nodes, compute the interval p_inte at which noise nodes separate the data unit nodes and the length p_cont of each run of consecutive data unit nodes, and compute the smallest and largest position values among the nodes corresponding to tag i of all data unit nodes, denoted pmin and pmax. A node that satisfies the following position conditions is considered a node on the data unit path: (1) the node position value minus pmin, taken modulo p_inte, is less than p_cont; (2) the node position value minus pmax is less than the largest noise node position value plus 1;
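The first two cases above (noise only before, or only after, the data unit nodes) can be sketched as a function that emits a position predicate for one path step. Expressing the ranges with the XPath `position()` function is an assumption made for this illustration, as is the function name; the interleaved case with p_inte and p_cont is deliberately left out.

```python
def position_predicate(data_positions, noise_positions):
    """Return a position predicate for one path step, or None if unhandled.

    data_positions:  positions of data unit nodes among same-tag siblings
    noise_positions: positions of noise nodes among the same siblings
    """
    pmin, pmax = min(data_positions), max(data_positions)
    if not noise_positions:
        return ""                          # no noise nodes: omit the predicate
    if all(n < pmin for n in noise_positions):
        return f"[position() >= {pmin}]"   # noise only before the data units
    if all(n > pmax for n in noise_positions):
        return f"[position() <= {pmax}]"   # noise only after the data units
    return None                            # interleaved noise: not covered here
```

For example, data units at positions 4 to 6 preceded by noise nodes at positions 1 to 3 would be selected by a step such as `div[position() >= 4]`.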
Step 4-7: merge the longest common path and the local path;
Merging the longest common path and the local path yields the path Pu that locates the data units in the XML document of the Web page;
Step 5: compute the path expressions for extracting attribute values, comprising the following steps:
Step 5-1: generate the path that locates the attribute node;
Suppose that, in the sample data, the path of the node holding the value of schema attribute A_i, relative to the data unit node, is:
/tagA_i1[positionA_i1]/tagA_i2[positionA_i2]/.../tagA_ik[positionA_ik]
that is, /TA_i1[pA_i1]/TA_i2[pA_i2]/.../TA_ik[pA_ik], where TA_ij denotes tag A_ij and pA_ij denotes position A_ij, j = 1, ..., k; tag A_ik is the tag of the node that contains the attribute value, and position A_ik is the position of that node among its siblings with the same tag. The path that locates the attribute node can then be simplified with the method of step 4-5;
Step 5-2: determine the attribute value extraction rule;
The attribute value extraction rule applies in the following two cases: (1) the values of several attributes are contained in the same node text; (2) the node text contains text that is not part of an attribute value;
Assume that the non-attribute-value text in the node text is fixed text, and that the values of different attributes in the same node text are also separated by fixed text. It then suffices to determine the fixed strings that delimit attributes in the node text in order to separate the attribute value text, or the values of different attributes. The method is:
First take several sample Web pages and extract from them the node texts that contain the same attribute. If the entire node text is attribute value content, extract it directly; otherwise extract the common substrings and use them to split off the attribute. The rule for extracting an attribute value from the node text is as follows:
If fixed text Text1 precedes the value of attribute A_i in the node text, first take from the node text string Str the substring after Text1, denoted Str-after. Then check whether fixed text Text2 follows the value of attribute A_i; if so, take from Str-after the substring before Text2, denoted Str-before;
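The Str-after / Str-before rule can be sketched with two helpers that mirror the behavior of the XPath functions `substring-after` and `substring-before` (both return an empty string when the separator is absent). The function `extract_value` and the sample strings in the usage note are hypothetical.

```python
def substring_after(text, sep):
    """Part of text after the first occurrence of sep, or '' if sep is absent."""
    i = text.find(sep)
    return text[i + len(sep):] if i >= 0 else ""

def substring_before(text, sep):
    """Part of text before the first occurrence of sep, or '' if sep is absent."""
    i = text.find(sep)
    return text[:i] if i >= 0 else ""

def extract_value(node_text, text1=None, text2=None):
    """Apply the rule: cut after fixed text Text1, then before fixed text Text2."""
    value = node_text
    if text1:
        value = substring_after(value, text1)    # Str-after
    if text2:
        value = substring_before(value, text2)   # Str-before
    return value.strip()
```

With a node text such as "Discount: 85 off Saved: $6.30", calling `extract_value(text, "Discount: ", " off")` isolates the discount value, and `extract_value("$42.00", text1="$")` strips a leading currency sign.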
Step 6: generate the XML query statement for data extraction;
The XML query statement that extracts structured data is constructed mainly from the data unit node path and the attribute node paths obtained in steps 4 and 5. When the XQuery query language is used, the statement is built mainly from XQuery FLWOR expressions, whose clauses serve the following purposes:
FOR clause: locates the set of data unit nodes;
LET clause: adds predicate variables;
WHERE clause: filters data unit nodes with predicates based on attribute paths;
ORDER clause: gives the rule for sorting the results;
RETURN clause: returns the data in the format required by the user;
By the syntax of the XML query language XQuery, hierarchical data content in a Web page can be extracted by nesting FLWOR expressions inside the RETURN clause. The following are methods of constructing the XML query statement for different requirements:
Step 6-1: when the extraction result is a hierarchical structure, the XML query statement to be generated is constructed as follows:
(1) the outermost layer of the statement uses a fixed XML element tag as the root node, with the XML query expression, which for XQuery is a FLWOR expression, in between, in the form: <root node tag> XML query expression </root node tag>;
(2) in the XML query expression, the path expression of the data units locates the data unit node variable; for XQuery, a FOR clause locates the data unit node variable, and LET and WHERE clauses may be used to add predicates that locate data unit nodes;
(3) in the XML query expression, for the output of the query result, the attribute names of the data schema, or text with the same meaning, are used as element tags in the XML document; the attribute node paths and attribute value extraction rules generated in step 5 locate the attribute value text of each attribute under the data unit node variable, in the form: <attribute tag>{ expression formed from the attribute node path and the attribute value extraction rule }</attribute tag>
The overall structure of the XML query statement is:
<root node tag>
{
FOR data-unit-node-variable IN data-unit-node-path
[LET clause]
[WHERE clause]
RETURN <data entity name tag>
<attribute 1 tag>{ expression formed from the attribute 1 node path and attribute value extraction rule }</attribute 1 tag>
...
<attribute n tag>{ expression formed from the attribute n node path and attribute value extraction rule }</attribute n tag>
</data entity name tag>
}
</root node tag>
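The statement template above can be filled in mechanically once the data unit path and the attribute expressions are known. The sketch below assembles the query text as a string; it does not execute it. The function name, the `$u` and `$doc` variable names, and the sample tags and paths in the usage note are assumptions for illustration, and the optional LET and WHERE clauses are omitted.

```python
def build_hierarchical_query(root_tag, entity_tag, unit_path, attributes,
                             doc_var="$doc"):
    """Assemble the text of an XQuery FLWOR statement for hierarchical output.

    attributes: list of (output tag, XQuery expression relative to $u).
    """
    lines = [
        f"<{root_tag}>",
        "{",
        f"  for $u in {doc_var}{unit_path}",   # FOR: locate data unit nodes
        f"  return <{entity_tag}>",            # RETURN: one element per unit
    ]
    for tag, expr in attributes:
        lines.append(f"    <{tag}>{{ {expr} }}</{tag}>")
    lines += [f"  </{entity_tag}>", "}", f"</{root_tag}>"]
    return "\n".join(lines)
```

A call such as `build_hierarchical_query("books", "book", "/html/body/div[2]", [("title", "$u/h2/a/text()")])` produces a statement whose FOR clause iterates over the data unit nodes and whose RETURN clause wraps each attribute expression in its output tag.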
Step 6-2: when the extraction result is a table structure in relational form, the XML query statement to be generated is constructed as follows:
(1) in the XML query expression, the path expression of the data units locates the data unit node variable; for XQuery, a FOR clause locates the data unit node variable, and LET and WHERE clauses are used to add predicates that locate data unit nodes;
(2) in the XML query expression, for the output of the query result, the expressions formed from the attribute node paths and attribute value extraction rules are arranged in the order required by the output, with a special symbol separating the expressions of different attribute values, in the form: { value extraction expression for attribute 1 } separator { value extraction expression for attribute 2 } separator ... separator { value extraction expression for attribute n }
Step 7: extract the data with the XML query statement;
Using the execution engine of an XML query processor, run the XML query statement on the XML document obtained by converting the target webpage; this extracts the specified data content from the webpage formatted as an XML document.
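To make the final step concrete without an XQuery engine, the sketch below swaps in the limited XPath support of Python's standard `xml.etree.ElementTree` as a stand-in executor. It is not the query engine the method describes; the sample page, the paths, and the output field names are all hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical page already formatted as well-formed XML.
PAGE = """<html><body>
<div class="ad">banner</div>
<div class="list_r_list"><div>cover</div><div><h2><a>Book A</a></h2><h6><span>$42.00</span></h6></div></div>
<div class="list_r_list"><div>cover</div><div><h2><a>Book B</a></h2><h6><span>$55.00</span></h6></div></div>
</body></html>"""

def extract_books(page_xml):
    """Locate data units by path, then read attribute values relative to each."""
    root = ET.fromstring(page_xml)
    rows = []
    # Data unit path: an attribute predicate filters out the noise node (the ad).
    for unit in root.findall("./body/div[@class='list_r_list']/div[2]"):
        title = unit.findtext("./h2/a")          # whole node text is the value
        price = unit.findtext("./h6/span[1]")    # value follows a fixed "$"
        rows.append({"title": title, "original_price": float(price.lstrip("$"))})
    return rows

books = extract_books(PAGE)
```

The loop corresponds to the FOR clause over data unit nodes, and the per-unit lookups correspond to the attribute paths and extraction rules of step 5.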
Advantages of the invention: the XML-query-based Web data extraction method of the invention is widely applicable: (1) the invention generates accurate XML query statements; using the path expression generation method, it locates data units and attribute values with precise XPath expressions, which ensures the correctness of the XML query statement; (2) the invention is highly general: the XML query statements generated for extracting data from a data source can run on a database or on an XML query execution engine, and integrate seamlessly with existing systems; (3) the invention accommodates more complex query result output: by adjusting the structure of the generated query clauses, it supports extraction of hierarchically structured data content from Web pages and is not limited to simple relational structures.
Description of drawings
Fig. 1 is a schematic diagram of a Web data page of an online bookstore, to which the webpage data extraction method based on extensible language query of the invention is applied;
Fig. 2 is a flowchart of the webpage data extraction method based on extensible language query of the invention;
Fig. 3 is a schematic diagram of the positions of the data units in the DOM tree of the page document, for the webpage data extraction method based on extensible language query of the invention.
Embodiment
The invention is described in further detail below with reference to the drawings and an embodiment:
Fig. 1 shows a Web data page of an online bookstore. The flow of the method of the invention is shown in Fig. 2, and the steps are as follows:
Step 1: determine the data schema S corresponding to the data content to be extracted from the Web page. The data entity name E is "book"; the attribute names and data types in the attribute set are shown in Table 1:
Table 1: Attribute names and data types of the data entity "book"
Attribute      Name                Type
Attribute 1    Title               string
Attribute 2    Author              string
Attribute 3    Publisher           string
Attribute 4    Publication date    string
Attribute 5    Book synopsis       string
Attribute 6    Original price      float
Attribute 7    Current price       float
Attribute 8    Discount            integer
Attribute 9    Amount saved        float
Step 2: locate the data area, the data units, and the attribute texts in the sample page of Fig. 1. As Fig. 1 shows, the data units are data unit 1, data unit 2, and data unit 3;
First, the HTML page must be formatted as an XML document that conforms to the XML language standard:
<div class="list_book_right">
<h2><img/><a name="link_prd_name" href="" target="_blank">Algorithms and Data Structures: Selected Postgraduate Entrance Exam Questions with Analysis (2nd Edition)</a></h2>
<h3>Customer rating:</h3>
<h4 class="list_r_list_h4">Author: <a href="">Chen Shoukong</a>, <a href="">Hu Xiaokun</a>, <a href="">Li Ling</a> (eds.)</h4>
<h4>Publisher: <a href="">China Machine Press</a></h4>
<h4>Publication date: July 2007</h4>
<h5>This book collects more than 1,600 exam questions from over 300 postgraduate entrance examination papers on "Algorithms and Data Structures" set since 1992 by more than 60 key Chinese universities and research institutes of the Academy of Sciences, and provides reference answers and analysis. It can serve as study material for university computer science and related majors<font class="dot">...</font></h5>
<div class="clear"></div>
<h6><span>$42.00</span><span>$35.70</span> Discount: 85折 Saved: $6.30</h6>
<span class="list_r_list_button"><a href=""><img src=""/></a></span> <span class="list_r_list_button"><a href=""><img src=""/></a></span>
</div>
Step 3: annotate the attribute texts of the data units in the sample page. The texts in the 3 data units of Fig. 1 are annotated as follows:
Data unit 1:
Title: Algorithms and Data Structures: Selected Postgraduate Entrance Exam Questions with Analysis (2nd Edition)
Author: Chen Shoukong, Hu Xiaokun, Li Ling
Publisher: China Machine Press
Publication date: July 2007
Book synopsis: This book collects more than 1,600 exam questions from over 300 postgraduate entrance examination papers on "Algorithms and Data Structures" set since 1992 by more than 60 key Chinese universities and research institutes of the Academy of Sciences, and provides reference answers and analysis. It can serve as study material for university computer science and related majors
Original price: $42.00
Current price: $35.70
Discount: 85
Amount saved: $6.30
Data unit 2:
Title: Data Mining: Concepts and Techniques (2nd Edition of the original)
Author: Han Jiawei, Kamber, Fan Ming, Meng Xiaofeng
Publisher: China Machine Press
Publication date: March 2007
Book synopsis: This book comprehensively covers the important knowledge and technical innovations in the field of data mining. Building on the fairly comprehensive content of the 1st edition, the 2nd edition presents the latest research results in the field, such as mining stream, time-series, and sequence data, and mining spatiotemporal, multimedia, text, and Web data. This book can be used as
Original price: $55.00
Current price: $42.30
Discount: 77
Amount saved: $12.70
Data unit 3:
Title: Oracle 9i & 10g Programming Art: In-Depth Database Architecture
Author: Kyte, Su Jinguo
Publisher: Posts & Telecom Press
Publication date: October 2006
Book synopsis: This book is an authoritative work on the Oracle 9i & 10g database architecture. It covers all of the most important Oracle architectural features, including files, memory structures and processes, locks and latches, transactions, concurrency and multi-versioning, tables and indexes, data types, and partitioning and parallelism, and
Original price: $99.00
Current price: $74.30
Discount: 75
Amount saved: $24.70
Step 4: generate the paths of the data unit nodes;
Step 4-1: the data unit set is U = {U1, U2, U3}, where U1 denotes data unit 1, U2 denotes data unit 2, and U3 denotes data unit 3. The title of data unit 1 is "Algorithms and Data Structures: Selected Postgraduate Entrance Exam Questions with Analysis (2nd Edition)", the title of data unit 2 is "Data Mining: Concepts and Techniques (2nd Edition of the original)", and the title of data unit 3 is "Oracle 9i & 10g Programming Art: In-Depth Database Architecture";
Step 4-2: the data units are marked in Fig. 1, and their positions in the DOM tree of the page document are shown as solid dots in Fig. 3. In Fig. 3, the root node corresponds to the element with tag html in the XML document and to the outermost dashed frame of the webpage in Fig. 1, which encloses all visible and invisible content. Node 1 is the XML element node with tag head and corresponds to the header data of the webpage in Fig. 1; the page meta-information it contains is not visible. Node 2 is the XML element node with tag body and corresponds to the outermost solid frame of the webpage in Fig. 1. Nodes 2.1 to 2.7 are the child nodes of node 2 with tag div; node 2.7 corresponds to the advertisement area at the bottom of the webpage in Fig. 1. A triangle under a node in the figure represents the subtree under that node. Nodes 2.6.1 to 2.6.3 are the child nodes of node 2.6 with tag div; node 2.6.3 corresponds to the data area of the webpage indicated by the solid frame in Fig. 1. Node 2.6.3.1 is the first child node of node 2.6.3 with tag div and corresponds to the region of the webpage indicated by the solid frame of data unit 1 in Fig. 1. The notation div[1] means that the node is the first child of its parent node with tag div. The solid nodes in Fig. 3 are the nodes corresponding to the data units; the path of such a node is the data unit path, and the text nodes in the subtree of the node contain the attribute values of the data unit;
Step 4-3: compute the path expressions;
The paths of the data unit nodes in the XML document are:
P1: "/html[1]/body[1]/div[6]/div[3]/div[1]/div[4]/div[2]"
P2: "/html[1]/body[1]/div[6]/div[3]/div[1]/div[5]/div[2]"
P3: "/html[1]/body[1]/div[6]/div[3]/div[1]/div[6]/div[2]"
where html, body, and div are XML element tags, and div[i] denotes the i-th node with tag div among its siblings with the same tag;
Step 4-4: compute the longest common path LCP starting from the root node;
The longest common path LCP is:
LCP: "/html[1]/body[1]/div[6]/div[3]/div[1]"
Step 4-5: simplify the longest common path LCP computed in step 4-4;
The simplified longest common path expression is:
LCP: "/html/body/div[6]/div[3]/div[1]"
Step 4-6: compute the local path;
The local path expression is:
"/div[@class="list_r_list"]/div[2]"
Step 4-7: merge the longest common path and the local path;
The path expression of the data unit nodes obtained after merging is:
"/html/body/div[6]/div[3]/div[1]/div[@class="list_r_list"]/div[2]"
where div[@class="list_r_list"] denotes the element node with tag div that has the XML attribute class with the value list_r_list; class is the style attribute of a tag node in an HTML document;
Step 5: generate the path expressions for extracting attribute values;
For data unit 1, the path expressions for extracting attribute values are generated from its structure as follows:
1. Attribute "title"
Locating path expression: "/h2/a", where h2 denotes the XML element node with tag h2 and a denotes the XML element node with tag a;
Attribute value extraction rule: the entire content of this attribute node is attribute value information, so the text() function of the XQuery language can be used to extract the attribute value.
2. Attribute "author"
Locating path expression: "/h4[1]/a", where h4[1] denotes the first XML element node with tag h4 and a denotes the XML element node with tag a;
Attribute value extraction rule: the entire content of this attribute node is attribute value information.
3. Attribute "publisher"
Locating path expression: "/h4[2]/a", where h4[2] denotes the second XML element node with tag h4 and a denotes the XML element node with tag a;
Attribute value extraction rule: the entire content of this attribute node is attribute value information.
4. Attribute "publication time"
Locating path expression: "/h4[3]", where h4[3] denotes the third XML element node with tag h4;
Attribute value extraction rule: only part of the content of this attribute node is attribute value information; the rule removes the common non-value string "publication time:".
5. Attribute "book introduction"
Locating path expression: "/h5", where h5 denotes the XML element node with tag h5;
Attribute value extraction rule: the entire content of this attribute node is attribute value information.
6. Attribute "original price"
Locating path expression: "/h6/span[1]", where h6 denotes the XML element node with tag h6 and span[1] denotes the first XML element node with tag span;
Attribute value extraction rule: only part of the content of this attribute node is attribute value information; the rule removes the common non-value string "$".
7. Attribute "current price"
Locating path expression: "/h6/span[2]", where h6 denotes the XML element node with tag h6 and span[2] denotes the second XML element node with tag span;
Attribute value extraction rule: only part of the content of this attribute node is attribute value information; the rule removes the common non-value string "$".
8. Attribute "discount"
Locating path expression: "/h6", where h6 denotes the XML element node with tag h6;
Attribute value extraction rule: only part of the content of this attribute node is attribute value information; the rule removes the common non-value string "discount:" at the front of the node text string and the common non-value string "folding" at its end.
9. Attribute "amount saved"
Locating path expression: "/h6", where h6 denotes the XML element node with tag h6;
Attribute value extraction rule: only part of the content of this attribute node is attribute value information; the rule removes the common non-value string "saving: $" at the front of the node text string.
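The extraction rules listed above all amount to cutting a fixed string off the front of the node text, off its end, or both. A minimal Python sketch follows; the function names are hypothetical, the helpers are only loosely modelled on the XQuery functions substring-after and substring-before, and the sample node texts are assumed for illustration.

```python
def substring_after(text, marker):
    """Return the part of text after the first occurrence of marker ('' if absent)."""
    _, found, tail = text.partition(marker)
    return tail if found else ""


def substring_before(text, marker):
    """Return the part of text before the first occurrence of marker ('' if absent)."""
    head, found, _ = text.partition(marker)
    return head if found else ""


def extract_value(text, prefix="", suffix=""):
    """Apply an extraction rule: drop the fixed prefix, then the fixed suffix."""
    value = text
    if prefix:                       # rule has a leading non-value string
        value = substring_after(value, prefix)
    if suffix:                       # rule has a trailing non-value string
        value = substring_before(value, suffix)
    return value.strip()


# Assumed node texts, shaped like the attributes described above.
print(extract_value("publication time: July 2007", prefix="publication time:"))
print(extract_value("discount: 85folding", prefix="discount:", suffix="folding"))
print(extract_value("Some title"))   # whole node text is the value
```

An attribute whose node text is entirely value information corresponds to calling `extract_value` with no prefix and no suffix.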
Step 6: generate the XML query statement for data extraction;
Taking the generation of extraction results in XML format as an example, and using XQuery as the XML query language, the XQuery statement generated for this example is as follows:
<BookList>
{
for $variable1 in /html/body/div[6]/div[3]/div[1]/div[@class="list_r_list"]/div[2]
return <Book>
<Title>{ $variable1/h2/a/text() }</Title>
<Author>{ $variable1/h4[1]/a/text() }</Author>
<Publisher>{ $variable1/h4[2]/a/text() }</Publisher>
<PublicationTime>{ $variable1/h4[3]/substring-after(text(), "publication time:") }</PublicationTime>
<BookIntroduction>{ $variable1/h5/text() }</BookIntroduction>
<OriginalPrice>{ $variable1/h6/span[1]/substring-after(text(), "$") }</OriginalPrice>
<CurrentPrice>{ $variable1/h6/span[2]/substring-after(text(), "$") }</CurrentPrice>
<Discount>{ $variable1/h6/substring-before(substring-after(text(), "discount:"), "folding") }</Discount>
<AmountSaved>{ $variable1/h6/substring-after(text(), "saving: $") }</AmountSaved>
</Book>
}
</BookList>
Here for, in and return are keywords of the XQuery query language; text() obtains the text within a node; substring-before() returns the part of a string that precedes a given substring; and substring-after() returns the part of a string that follows a given substring.
Step 7: execute the above XML query statement to extract the data;
After the above XML query statement is executed on the example page, the XML data obtained is:
<BookList>
<Book>
<Title>Algorithms and Data Structures: Analysis of Postgraduate Entrance Examination Questions (2nd Edition)</Title>
<Author>Chen Shoukong, Hu Xiaokun, Li Ling</Author>
<Publisher>China Machine Press</Publisher>
<PublicationTime>July 2007</PublicationTime>
<BookIntroduction>This book collects more than 1,600 questions from over 300 sets of master's entrance examination papers on "Algorithms and Data Structures" set since 1992 by more than 60 key universities and academy institutes in China, and provides reference answers and analysis. This book can be used by computer science and related majors at institutions of higher learning for studying data</BookIntroduction>
<OriginalPrice>42.00</OriginalPrice>
<CurrentPrice>35.70</CurrentPrice>
<Discount>85</Discount>
<AmountSaved>6.30</AmountSaved>
</Book>
<Book>
<Title>Data Mining: Concepts and Techniques (2nd Edition of the Original Book)</Title>
<Author>Han Jiawei, Kamber, Fan Ming, Meng Xiaofeng</Author>
<Publisher>China Machine Press</Publisher>
<PublicationTime>March 2007</PublicationTime>
<BookIntroduction>This book comprehensively covers the important knowledge and technical innovations in the field of data mining. Building on the fairly comprehensive content of the 1st edition, the 2nd edition presents the latest research results in the field, such as mining stream, time-series and sequence data, and mining spatiotemporal, multimedia, text and Web data. This book can be used as</BookIntroduction>
<OriginalPrice>55.00</OriginalPrice>
<CurrentPrice>42.30</CurrentPrice>
<Discount>77</Discount>
<AmountSaved>12.70</AmountSaved>
</Book>
<Book>
<Title>The Art of Oracle 9i &amp; 10g Programming: In-Depth Database Architecture</Title>
<Author>Kyte, Su Jinguo</Author>
<Publisher>Posts and Telecom Press</Publisher>
<PublicationTime>October 2006</PublicationTime>
<BookIntroduction>This book is an authoritative work on Oracle 9i &amp; 10g database architecture. It covers all of the most important Oracle architectural features, including files, memory structures and processes, locks and latches, transactions, concurrency and multi-versioning, tables and indexes, data types, and partitioning and parallelism, and</BookIntroduction>
<OriginalPrice>99.00</OriginalPrice>
<CurrentPrice>74.30</CurrentPrice>
<Discount>75</Discount>
<AmountSaved>24.70</AmountSaved>
</Book>
</BookList>

Claims (5)

1. A web page data extraction method based on extensible language query, characterized in that it comprises the following steps:
Step 1: determine the pattern structure corresponding to the data content to be extracted from the Web page;
Step 2: locate the data region, the data units and the attribute text in the Web page;
Step 3: semantically annotate the attribute text obtained in step 2;
Step 4: generate the data-unit node path;
Step 5: calculate the path expressions for extracting attribute values;
Step 6: generate the XML query statement for data extraction;
Step 7: extract the data using the XML query statement.
2. The web page data extraction method based on extensible language query according to claim 1, characterized in that the pattern structure of step 1 comprises two kinds: the table structure of relational form and the hierarchical structure. The data pattern S of the table structure consists of a data entity name E and a set of attributes A = {A_1, ..., A_n}, where A_i (1 <= i <= n, n being the number of attributes) denotes an attribute in the attribute set and consists of an attribute name and an attribute data type, written <N, T>, where N denotes the attribute name and T denotes the attribute data type; the data type T includes the integer type integer, the floating-point type float and the character string type string. The hierarchical structure is a complex data structure composed of basic types; its corresponding data pattern is expressed as [formula image], comprising the attributes [formula image], where m is the number of attributes in the pattern.
3. The web page data extraction method based on extensible language query according to claim 1, characterized in that generating the data-unit node path in step 4 comprises the following steps:
Step 4-1: the set of data units obtained in step 2 is expressed as U = {U_1, U_2, ..., U_n}, where U_i denotes a data unit, i = 1, ..., n;
Step 4-2: for each identified data unit U_i, determine its corresponding element node in the XML document of the page, denoted N_i; then, according to the structure of the XML document, generate for element node N_i the path value from the root node to this node, denoted P_i;
Step 4-3: calculate the path expression of the data units, as follows:
Take the path of a data-unit node. In the path value P_i, a position predicate is used to locate each step of the path expression, that is, each node passed on the way from the document root node to the element node corresponding to the data unit. Take the tag of each node in the path expression. The paths of all data units have the same tag sequence, so the tag sequence starting from the root node is denoted T; it contains m tags, denoted (T_1, T_2, ..., T_m), where T_1 is the tag of the root node and the remaining tags follow in order. The position of each node among its sibling nodes with the same tag is denoted (p_i1, ..., p_im), where p_i1 is the position of the root node tag and the remaining positions follow in order. The path value is then expressed as:
P_i = /tag_1[position_i1]/tag_2[position_i2]/.../tag_m[position_im],
that is, P_i = /T_1[p_i1]/T_2[p_i2]/.../T_m[p_im];
Step 4-4: for the set of data-unit paths, calculate the longest common path LCP starting from the root node:
The longest common path is the path formed by the nodes shared by the paths of all data-unit nodes. It is calculated as follows: for the paths of the data-unit nodes, matching starts at the first tag position from the root node. If the position values of all data-unit node paths under the current tag are identical, that is, p_1i = p_2i = ... = p_ni, the current tag and position value are appended to the longest common path, that is, LCP += /T_i[p_i]. If the position values of the data-unit node paths under the current tag differ, matching stops and the current longest common path is taken as the final longest common path;
Step 4-5: simplify the longest common path LCP calculated in step 4-4;
For the node corresponding to one step of the longest common path, denoted n_i, with tag T_i: if none of its sibling nodes is a non-data-unit node that has the same tag and has a descendant node with the same successor path "/tag_{i+1}/.../tag_m", the position value of this node can be omitted from the longest common path expression;
Step 4-6: calculate the local path by generating predicates; the local path is the path formed by the nodes private to each data-unit node:
The predicate is generated as follows. Suppose the tag of the node at the current step is T_i. Check whether the sibling nodes of the node set at the current step include a non-data-unit node that has the same tag and has a descendant node with the same successor path "/tag_{i+1}/.../tag_m"; such non-data-unit nodes are called noise nodes. If there is none, the predicate is omitted. If there is one, check whether the current node has an XML attribute that distinguishes it from the noise nodes; if such an XML attribute exists, it is used as the predicate expression; if not, the range of position values in the predicate is calculated.
The range of position values in the predicate is calculated as follows:
If noise nodes appear only before the set of data-unit nodes, the position range in the predicate for the data-unit nodes at this tag is: from the minimum position value among the node positions corresponding to tag i over all data-unit nodes, to the last node with this tag;
If noise nodes appear only after the set of data-unit nodes, the position range in the predicate for the data-unit nodes at this tag is: from the first node to the maximum position value among the node positions corresponding to tag i over all data-unit nodes;
If the data nodes are split regularly by noise nodes, calculate the interval p_inte at which the data-unit nodes are split by noise nodes and the length p_cont of each continuous run of data-unit nodes, and calculate the minimum and maximum position values among the node positions corresponding to tag i over all data-unit nodes, denoted pmin and pmax. A node that satisfies the following position conditions is regarded as a node on a data-unit path: (1) the node position value minus pmin, taken modulo p_inte, leaves a remainder less than p_cont; (2) the node position value minus pmax is less than the maximum noise-node position value plus 1;
Step 4-7: merge the longest common path and the local path;
Merging the longest common path with the local path yields the path P_u that locates the data units in the XML document of the Web page.
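The position test for data units that are split regularly by noise nodes (step 4-6) can be sketched in Python. Several things here are assumptions rather than the claimed method: p_inte is read as the period of the repeating pattern, the simple bound pmin <= position <= pmax is used in place of condition (2), and the function names and sample positions are hypothetical.

```python
def on_unit_path(pos, pmin, pmax, p_inte, p_cont):
    """Condition (1): (pos - pmin) mod p_inte < p_cont, within [pmin, pmax]."""
    in_range = pmin <= pos <= pmax          # assumed stand-in for condition (2)
    return in_range and (pos - pmin) % p_inte < p_cont


def selected_positions(sibling_count, pmin, pmax, p_inte, p_cont):
    """List the 1-based sibling positions that are treated as data units."""
    return [
        pos
        for pos in range(1, sibling_count + 1)
        if on_unit_path(pos, pmin, pmax, p_inte, p_cont)
    ]


# Assumed layout of ten siblings: noise at 1, 4, 7, 10; data units at 2-3, 5-6, 8-9.
# The pattern repeats every 3 positions with runs of 2 data units.
print(selected_positions(sibling_count=10, pmin=2, pmax=9, p_inte=3, p_cont=2))
```

The two simpler cases in step 4-6 reduce to a plain range: positions from pmin to the last sibling when noise only precedes the data units, and positions from 1 to pmax when noise only follows them.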
4. The web page data extraction method based on extensible language query according to claim 1, characterized in that calculating the path expressions for extracting attribute values in step 5 comprises the following steps:
Step 5-1: generate the path that locates the attribute node;
Suppose that, in the sample data, the path of the node containing the value of pattern attribute A_i, relative to the data-unit node, is:
/tagA_i1[positionA_i1]/tagA_i2[positionA_i2]/.../tagA_ik[positionA_ik]
that is, /TA_i1[pA_i1]/TA_i2[pA_i2]/.../TA_ik[pA_ik], where TA_ij denotes tag A_ij and pA_ij denotes position A_ij, j = 1, ..., k; tag A_ik is the tag of the node containing the attribute value, and position A_ik is the position of that node among its sibling nodes with the same tag. The path locating the attribute node can then be simplified using the method of step 4-5;
Step 5-2: determine the attribute value extraction rule;
The attribute value extraction rule applies to the following two situations: 1) the values of several attributes are contained in the same node text; 2) the node text contains text that is not part of an attribute value;
Suppose that the non-value text in the node text is fixed text, and that the values of different attributes in the same node text are also separated by fixed text. It then suffices to determine the fixed strings that delimit attributes in the node text in order to separate the attribute value text or the values of different attributes, as follows:
First take several sample Web pages and extract from them the node texts containing the same attribute. If all characters of the node text are attribute value content, extract them directly; otherwise extract the common substrings that delimit the attribute. The rule for extracting the attribute value from the node text is as follows:
If fixed text Text1 precedes the value of attribute A_i in the node text, first take from the node text string Str the substring after Text1, denoted Str-after; then check what follows the value of attribute A_i: if fixed text Text2 follows it, take from Str-after the substring before Text2, denoted Str-before.
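One simple way to find the fixed text from several sample node texts is to take their common leading and trailing characters. The sketch below makes that assumption explicit: `learn_rule` and `apply_rule` are hypothetical names, the samples are invented, and the approach over-matches when all sample values happen to start or end with the same characters, so it is only a rough illustration of the Str-after / Str-before rule. `os.path.commonprefix` is used because it compares strings character by character.

```python
import os


def learn_rule(samples):
    """Return (Text1, Text2): the common prefix and common suffix of the samples."""
    text1 = os.path.commonprefix(samples)
    # Look for the suffix only in what remains after the prefix is removed.
    reversed_rest = [s[len(text1):][::-1] for s in samples]
    text2 = os.path.commonprefix(reversed_rest)[::-1]
    return text1, text2


def apply_rule(node_text, text1, text2):
    """Take Str-after (text after Text1), then Str-before (text before Text2)."""
    str_after = node_text.partition(text1)[2] if text1 else node_text
    str_before = str_after.partition(text2)[0] if text2 else str_after
    return str_before


# Assumed node texts for the same attribute taken from three sample pages.
samples = ["discount: 85folding", "discount: 77folding", "discount: 75folding"]
text1, text2 = learn_rule(samples)
print(repr(text1), repr(text2))
print([apply_rule(s, text1, text2) for s in samples])
```

Using more sample pages makes it less likely that part of an attribute value is mistaken for fixed text.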
5. The web page data extraction method based on extensible language query according to claim 1, characterized in that generating the XML query statement for data extraction in step 6 comprises the following steps:
Step 6-1: when the data extraction result has a hierarchical structure, the XML query statement is constructed as follows:
(1) the outermost layer of the statement uses a fixed XML element tag as the root node, with the XML query expression in the middle; for the XQuery language this is a FLWOR expression, in the following form: <root node tag> XML query expression </root node tag>;
(2) in the XML query expression, the path expression of the data unit is used to bind the data-unit node variable; for the XQuery language, a for clause binds the data-unit node variable, and let and where clauses can be added to supply predicates for locating the data-unit nodes;
(3) in the XML query expression, for the output of the query result, the attribute names in the data pattern, or text with the same semantics, are used as element tags in the XML document; the attribute node locating paths and attribute value extraction rules generated in step 5 are used to locate the attribute value text of the corresponding attribute under the data-unit node variable, in the form: <attribute tag>{ expression formed from the attribute node path and the attribute value extraction rule }</attribute tag>
The overall structure of the XML query statement is:
<root node tag>
{
for data-unit node variable in data-unit node path
[let clause]
[where clause]
return <data entity name tag>
<attribute 1 tag>{ expression formed from the attribute 1 node path and attribute value extraction rule }</attribute 1 tag>
……
<attribute n tag>{ expression formed from the attribute n node path and attribute value extraction rule }</attribute n tag>
</data entity name tag>
}
</root node tag>
Step 6-2: when the data extraction result is a table structure of relational form, the XML query statement is constructed as follows:
(1) in the XML query expression, the path expression of the data unit is used to bind the data-unit node variable; for the XQuery language, a for clause binds the data-unit node variable, and let and where clauses can be added to supply predicates for locating the data-unit nodes;
(2) in the XML query expression, for the output of the query result, the expressions formed from the attribute node paths and attribute value extraction rules are arranged in the order required by the output, with the expressions of different attribute values separated by a special symbol, in the form: { extraction expression for the value of attribute 1 } separator { extraction expression for the value of attribute 2 } separator ... separator { extraction expression for the value of attribute n }.
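Filling in the two statement structures of step 6 is essentially string templating, which can be sketched in Python. The function names, the variable name `$u` and the sample paths are hypothetical. The relational row is produced in the schematic form given in the claim; an actual XQuery processor may need it wrapped, for example inside an element constructor.

```python
def hierarchical_query(root_tag, entity_tag, unit_path, attributes, var="$u"):
    """Build the hierarchical statement; attributes is a list of (tag, expression)."""
    lines = [
        f"<{root_tag}>",
        "{",
        f"for {var} in {unit_path}",
        f"return <{entity_tag}>",
    ]
    for tag, expr in attributes:
        # Each attribute becomes <tag>{ $u/expression }</tag>.
        lines.append(f"<{tag}>{{ {var}/{expr} }}</{tag}>")
    lines += [f"</{entity_tag}>", "}", f"</{root_tag}>"]
    return "\n".join(lines)


def relational_row(attributes, separator, var="$u"):
    """Build '{expr1} separator {expr2} ...' for the relational (table) form."""
    return separator.join(f"{{ {var}/{expr} }}" for _, expr in attributes)


# Assumed attribute rules: (output tag, path plus extraction expression).
attributes = [
    ("Title", "h2/a/text()"),
    ("PublicationTime", 'h4[3]/substring-after(text(), "publication time:")'),
]
query = hierarchical_query("BookList", "Book", "/html/body/div", attributes)
print(query)
print(relational_row(attributes, "|"))
```

Keeping the attribute rules as data (tag plus expression) means the same list can drive both output forms.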
CN201010545520A 2010-11-16 2010-11-16 Webpage data extracting method based on extensible language query Expired - Fee Related CN101984434B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201010545520A CN101984434B (en) 2010-11-16 2010-11-16 Webpage data extracting method based on extensible language query

Publications (2)

Publication Number Publication Date
CN101984434A true CN101984434A (en) 2011-03-09
CN101984434B CN101984434B (en) 2012-09-05

Family

ID=43641603

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201010545520A Expired - Fee Related CN101984434B (en) 2010-11-16 2010-11-16 Webpage data extracting method based on extensible language query

Country Status (1)

Country Link
CN (1) CN101984434B (en)


Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20120905

Termination date: 20141116

EXPY Termination of patent right or utility model