CN108334560A - Information acquisition method and related device - Google Patents

An information acquisition method and related device

Info

Publication number
CN108334560A
Authority
CN
China
Prior art keywords
property name
attribute value
attribute
traverse path
xpath
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201810009236.XA
Other languages
Chinese (zh)
Other versions
CN108334560B (en)
Inventor
王策
张锋
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd
Priority to CN201810009236.XA
Publication of CN108334560A
Application granted
Publication of CN108334560B
Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/955 Retrieval from the web using information identifiers, e.g. uniform resource locators [URL]

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Information Transfer Between Computers (AREA)

Abstract

An embodiment of the invention discloses an information acquisition method and a related device. The method includes: first obtaining a first traversal path of a property name and a second traversal path of an attribute value; then obtaining the property name from page information according to the first traversal path and obtaining the attribute value from the page information according to the second traversal path; and then establishing a mapping relationship between the property name and the attribute value and outputting it as the information acquisition result. Embodiments of the invention can improve the accuracy of information acquisition.

Description

An information acquisition method and related device
Technical field
The present invention relates to the field of computer technology, and in particular to an information acquisition method and a related device.
Background
At present, information online is carried mainly as text. Information acquisition structures the information contained in text into a table-like form. The input to an information acquisition system is raw text, such as web page data or plain text content, and the output is information points in a fixed format. Information points are extracted from various documents and then integrated in a unified form, which allows information to be obtained efficiently from a large number of documents. Information acquisition is usually implemented with the XML Path Language (XPath). In current information acquisition methods the property names of the information are fixed: an XPath is configured only for the attribute value corresponding to each required property name, and the specific content of the attribute value is obtained through that XPath from the document object model tree (DOM tree) corresponding to the text. For example, Fig. 1 shows the infobox information of the encyclopedia entry "XXX", where "Business Name", "Foreign language title", etc. are property names and "Shenzhen XXX Co., Ltd." and "ABC" are the corresponding attribute values. When this infobox information is obtained, "Business Name" and "Foreign language title" are fixed, while "Shenzhen XXX Co., Ltd." and "ABC" are obtained through their XPaths from the DOM tree corresponding to the HTML text of the Baidu Baike page for "XXX".
However, across different pages attribute values overlap heavily while property names differ considerably. For example, the property name corresponding to the attribute value "internet" in Fig. 1 is "business scope", but in the infobox information of the Baidu entry "internet" the property name of "internet" is "Chinese name". Therefore, fixing the property names and using XPath only to obtain attribute values leads to low accuracy of information acquisition.
Summary of the invention
Embodiments of the present invention provide an information acquisition method and a related device that can improve the accuracy of information acquisition.
A first aspect of the present invention provides an information acquisition method, including:
obtaining a first traversal path of a property name and a second traversal path of an attribute value;
obtaining the property name from page information according to the first traversal path, and obtaining the attribute value from the page information according to the second traversal path;
establishing a mapping relationship between the property name and the attribute value and outputting it as the information acquisition result.
Establishing the mapping relationship between the property name and the attribute value includes:
obtaining a first mapping tag of the property name and a second mapping tag of the attribute value;
establishing the mapping relationship between the property name and the attribute value according to the first mapping tag and the second mapping tag.
Establishing the mapping relationship between the property name and the attribute value according to the first mapping tag and the second mapping tag includes:
establishing the mapping relationship between the property name and the attribute value when the first mapping tag is identical to the second mapping tag.
Obtaining the property name from the page information according to the first traversal path and obtaining the attribute value from the page information according to the second traversal path includes:
creating a structure traversal tree from the page information, where the structure traversal tree includes multiple content nodes;
traversing the multiple content nodes of the structure traversal tree, obtaining the property name according to the first traversal path and obtaining the attribute value according to the second traversal path.
Obtaining the first traversal path of the property name and the second traversal path of the attribute value includes:
obtaining an attribute identifier of the property name and the attribute value;
obtaining the first traversal path and the second traversal path from a configuration file according to the attribute identifier, where the configuration file contains the correspondence between the attribute identifier and the first and second traversal paths.
Obtaining the attribute identifier of the property name and the attribute value includes:
obtaining the uniform resource locator of the page information;
obtaining the attribute identifier of the property name and the attribute value according to the uniform resource locator.
Obtaining the property name from the page information according to the first traversal path includes:
determining the type of the property name;
if the property name is an open property name, obtaining the property name according to the first traversal path of the property name.
Correspondingly, a second aspect of the present invention provides an information acquisition device, including:
a path acquisition module, configured to obtain a first traversal path of a property name and a second traversal path of an attribute value;
an information acquisition module, configured to obtain the property name from page information according to the first traversal path and to obtain the attribute value from the page information according to the second traversal path;
a result output module, configured to establish a mapping relationship between the property name and the attribute value and output it as the information acquisition result.
The result output module is specifically configured to:
obtain a first mapping tag of the property name and a second mapping tag of the attribute value;
establish the mapping relationship between the property name and the attribute value according to the first mapping tag and the second mapping tag.
The result output module is specifically configured to:
establish the mapping relationship between the property name and the attribute value when the first mapping tag is identical to the second mapping tag.
The information acquisition module is specifically configured to:
create a structure traversal tree from the page information, where the structure traversal tree includes multiple content nodes;
traverse the multiple content nodes of the structure traversal tree, obtain the property name according to the first traversal path and obtain the attribute value according to the second traversal path.
The path acquisition module is specifically configured to:
obtain an attribute identifier of the property name and the attribute value;
obtain the first traversal path and the second traversal path from a configuration file according to the attribute identifier, where the configuration file contains the correspondence between the attribute identifier and the first and second traversal paths.
The path acquisition module is specifically configured to:
obtain the uniform resource locator of the page information;
obtain the attribute identifier of the property name and the attribute value according to the uniform resource locator.
The information acquisition module is specifically configured to:
determine the type of the property name;
if the property name is an open property name, obtain the property name according to the first traversal path of the property name.
In a third aspect, the present invention provides an information acquisition apparatus, including a processor, a memory and a communication bus, where the communication bus implements connection and communication between the processor and the memory, and the processor executes a program stored in the memory to implement the steps of the information acquisition method provided in the first aspect.
In one possible design, the information acquisition apparatus provided by the present invention may include modules corresponding to the behaviors performed in the above method. The modules may be software and/or hardware.
A further aspect of the present invention provides a computer-readable storage medium storing a plurality of instructions, the instructions being suitable for being loaded by a processor to execute the methods described in the above aspects.
A further aspect of the present invention provides a computer program product including instructions that, when run on a computer, cause the computer to execute the methods described in the above aspects.
In implementing the embodiments of the present invention, a first traversal path of a property name and a second traversal path of an attribute value are obtained first; the property name is then obtained from page information according to the first traversal path and the attribute value is obtained from the page information according to the second traversal path; finally, a mapping relationship between the property name and the attribute value is established and output as the information acquisition result. Because both the property name and the attribute value are obtained through traversal paths and then mapped to each other, the accuracy of information acquisition is improved.
Brief description of the drawings
To describe the technical solutions of the embodiments of the present invention more clearly, the drawings needed in the description of the embodiments are briefly introduced below. Obviously, the drawings described below show some embodiments of the present invention, and a person of ordinary skill in the art can obtain other drawings from them without creative effort.
Fig. 1 is a schematic diagram of an information acquisition result provided by the prior art;
Fig. 2 is a schematic structural diagram of an information acquisition system provided by an embodiment of the present invention;
Fig. 3 is a schematic flowchart of an information acquisition method provided by an embodiment of the present invention;
Fig. 4 is a schematic flowchart of another information acquisition method provided by an embodiment of the present invention;
Fig. 5 is a schematic structural diagram of a DOM tree provided by an embodiment of the present invention;
Fig. 6 is a schematic flowchart of another information acquisition method provided by an embodiment of the present invention;
Fig. 7 is a schematic structural diagram of an information acquisition device provided by an embodiment of the present invention;
Fig. 8 is a schematic structural diagram of an information acquisition apparatus proposed by an embodiment of the present invention.
Detailed description
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings in the embodiments. Obviously, the described embodiments are some rather than all of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present invention without creative effort fall within the protection scope of the present invention.
Referring to Fig. 2, Fig. 2 is a schematic structural diagram of an information acquisition system provided by an embodiment of the present invention. The information acquisition system includes a user equipment 201 and a server 202. The server 202 may be a website (Web) server capable of providing a network information browsing service. The user equipment 201 may be a device that provides voice and/or data connectivity to a user; it may be connected to a computing device such as a laptop or desktop computer, or it may be a standalone device such as a personal digital assistant (PDA). The server receives a service request sent by the user equipment, the service request being a request to browse page information. The server then parses the uniform resource locator (URL) of the page information, obtains the traversal path of the property name and the traversal path of the attribute value from a configuration file, obtains the property name according to its traversal path and the attribute value according to its traversal path, and finally establishes the correspondence between the property name and the attribute value and sends it to the user equipment as the information acquisition result. The user equipment sends the service request to the server and displays the information acquisition result obtained from the server.
Based on the above information acquisition system, as shown in Fig. 3, an information acquisition method proposed by an embodiment of the present invention includes:
S301, the system loads the configuration file pattern.conf and the configuration file xpath.conf. The pattern.conf file contains expression patterns (pattern) and the ID number (pattern_id) of each pattern; for example, as shown in Table 1, the pattern.conf file contains pattern_id 0 and pattern_id 1 and their corresponding patterns. The xpath.conf file contains the pattern_id, the property names, and the XPaths of the attribute values. For example, as shown in Table 2, pattern_id 0 contains the two property names "title" and "brief introduction" and the XPaths of their corresponding attribute values, and pattern_id 1 contains the property name "label" and the XPaths of its two corresponding attribute values.
S302, the URL of the page information is parsed to obtain the pattern_id from the pattern.conf file, and the XPaths of the attribute values are then obtained from the xpath.conf file according to the pattern_id.
S303, the DOM tree of the page information is created, the XPath of each attribute value under the pattern_id is traversed, and the node content is obtained from the DOM tree along each XPath as the corresponding attribute value.
S304, according to the correspondence between property names and attribute values, the result is output in the form <property name, attribute value>. If one property name corresponds to M attribute values, the result can be output in the form <property name, attribute value 1, attribute value 2, ..., attribute value M>. For example, if the property name "label" corresponds to the two attribute values "stock name" and "company", then <label, stock name, company> can be output.
Table 1. pattern.conf file
pattern_id    pattern
0             ^https://baike\.baidu\.com/item/.+/\d+$
1             ^https://baike\.baidu\.com/subview/\d+/\d+\.htm$
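The lookup from a page URL to a pattern_id can be sketched as follows. This is a minimal illustration, not the patent's implementation: the dictionary `PATTERNS` is a hypothetical in-memory stand-in for pattern.conf holding the two regular expressions from Table 1, the function name `match_pattern_ids` is made up, and the sample URLs are invented for the example.

```python
import re

# Hypothetical in-memory form of pattern.conf (Table 1): pattern_id -> URL regex.
PATTERNS = {
    0: r"^https://baike\.baidu\.com/item/.+/\d+$",
    1: r"^https://baike\.baidu\.com/subview/\d+/\d+\.htm$",
}


def match_pattern_ids(url, patterns=PATTERNS):
    """Return every pattern_id whose regular expression matches the URL."""
    return [pid for pid, regex in sorted(patterns.items()) if re.match(regex, url)]


# Illustrative URLs (invented for this sketch, not taken from the source).
print(match_pattern_ids("https://baike.baidu.com/item/XXX/12345"))
print(match_pattern_ids("https://baike.baidu.com/subview/12/345.htm"))
print(match_pattern_ids("https://example.com/other"))
```

Returning a list rather than a single id mirrors the pattern_id list that the later embodiments generate before reading xpath.conf.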
Table 2. xpath.conf file
However, across different pages attribute values overlap heavily while property names differ considerably, so on different pages the property names corresponding to attribute values with the same XPath may be entirely different. Fixing the property names and using XPath only to obtain attribute values therefore leads to low accuracy of information acquisition. To solve this problem, the present invention proposes the following solution.
Referring to Fig. 4, Fig. 4 is a schematic flowchart of another information acquisition method proposed by an embodiment of the present invention. The method includes but is not limited to the following steps:
S401, a first traversal path of a property name and a second traversal path of an attribute value are obtained.
In a specific implementation, a service request sent by the user equipment can be received first, the service request being a request for page information, and the URL of the page information is then obtained. According to the URL, the attribute identifier of the property name and the attribute value is obtained. Finally, the first traversal path and the second traversal path are obtained from a configuration file according to the attribute identifier, where the configuration file contains the correspondence between the attribute identifier and the first and second traversal paths.
The system includes the configuration file pattern.conf and the configuration file xpath.conf. The configuration file pattern.conf contains patterns and their corresponding pattern_id; for example, as shown in Table 1, the pattern.conf file contains pattern_id 0 and pattern_id 1 and their corresponding patterns. The configuration file xpath.conf contains the pattern_id, the property names, and the XPaths of the attribute values. The property names in the xpath.conf file include specific property names and property names in XPath form. A property name in XPath form is an open property name, and the attribute value corresponding to an open property name is an open attribute value. Note that an open property name has exactly one corresponding open attribute value. For example, as shown in Table 3, pattern_id 0 corresponds to two kinds of property name. The first is a specific property name: as shown in the first row of Table 3, the property name "Title" is a specific property name, and the XPath of its attribute value is "/html/body/div[4]/div[2]/div/div[2]/dd/h1". The second is a property name in XPath form: as shown in the second row of Table 3, "/html/body/div[4]/div[2]/div/dl[1]/dt[1]" is a property name in XPath form, and the XPath of its attribute value is "/html/body/div[4]/div[2]/div/dl[1]/dd[1]".
Table 3. Modified xpath.conf file
For example, after the service request is received, the configuration files pattern.conf and xpath.conf are loaded first, and the URL of the page information requested by the user equipment is obtained. The URL is matched against the regular expressions in pattern.conf to obtain the matching pattern and pattern_id, and a pattern_id list is generated. According to the pattern_id list, the specific property names and the XPaths of their attribute values, as well as the XPaths of the open property names and the XPaths of the corresponding open attribute values, are obtained from xpath.conf. For example, the pattern_id can first be obtained from the pattern.conf file shown in Table 1; then, according to the pattern_id, the property names, the XPaths of the attribute values, and the XPaths of the open property names are obtained from the xpath.conf file shown in Table 3, generating the configuration information list shown in Table 4. The configuration information list contains pattern_id 0, the specific property name "Title" and the XPath of its attribute value "/html/body/div[4]/div[2]/div/div[2]/dd/h1", and the XPath of the open property name "/html/body/div[4]/div[2]/div/dl[1]/dt[1]" and the XPath of the corresponding open attribute value "/html/body/div[4]/div[2]/div/dl[1]/dd[1]".
Table 4. Configuration information list
pattern_id    Property name / attribute value    XPath
0             Title                              /html/body/div[4]/div[2]/div/div[2]/dd/h1
0             Open property name                 /html/body/div[4]/div[2]/div/dl[1]/dt[1]
0             Open attribute value               /html/body/div[4]/div[2]/div/dl[1]/dd[1]
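As a rough sketch of how such a configuration information list might be assembled, the example below expands xpath.conf rows for the matched pattern_ids. The on-disk layout of xpath.conf is not given in the text, so `XPATH_CONF` is a hypothetical in-memory stand-in; the rule that a property name starting with "/" is in XPath form, and the names `is_open_name` and `build_config_list`, are likewise assumptions made for illustration.

```python
# Hypothetical in-memory stand-in for xpath.conf; the real file layout is not shown in the text.
# Each row: (pattern_id, property name or property-name XPath, attribute-value XPath).
XPATH_CONF = [
    (0, "Title", "/html/body/div[4]/div[2]/div/div[2]/dd/h1"),
    (0, "/html/body/div[4]/div[2]/div/dl[1]/dt[1]",
        "/html/body/div[4]/div[2]/div/dl[1]/dd[1]"),
]


def is_open_name(name):
    # Assumption: a property name that starts with "/" is given in XPath form.
    return name.startswith("/")


def build_config_list(pattern_ids, conf=XPATH_CONF):
    """Expand rows for the matched pattern_ids into (pattern_id, label, XPath) rows."""
    rows = []
    for pid, name, value_xpath in conf:
        if pid not in pattern_ids:
            continue
        if is_open_name(name):
            # An open property name contributes two XPaths: one for the name, one for the value.
            rows.append((pid, "open property name", name))
            rows.append((pid, "open attribute value", value_xpath))
        else:
            # A specific property name is used as the label; only its value has an XPath.
            rows.append((pid, name, value_xpath))
    return rows


for row in build_config_list([0]):
    print(row)
```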
S402, the property name is obtained from the page information according to the first traversal path, and the attribute value is obtained from the page information according to the second traversal path.
In a specific implementation, a structure traversal tree can be created from the page information, where the structure traversal tree includes multiple content nodes; the multiple content nodes of the structure traversal tree are traversed, the property name is obtained according to the first traversal path, and the attribute value is obtained according to the second traversal path.
Optionally, before the property name is obtained from the page information according to the first traversal path, the type of the property name can be determined. If the property name is an open property name (XPath form), the property name is obtained from the page information according to the first traversal path. If the property name is a specific property name, such as "company business" or "development history", that name can be used directly as the property name, so in this case the property name does not need to be obtained from the page information according to the first traversal path.
For example, as shown in Fig. 5, the HTML page information is parsed through the DOM to generate the corresponding DOM tree. The DOM tree contains multiple content nodes, each of which is an HTML tag or the text content within an HTML tag. After the DOM tree is created, the content nodes in the DOM tree are traversed according to the XPaths in the configuration information list shown in Table 4, and the content of the corresponding node is obtained as the information value of each XPath. For example, when the XPath is /html/head/title, the html node, head node and title node in the DOM tree shown in Fig. 5 are traversed in turn, and the text content "My title" of the title node is obtained as the information value of the XPath. The information value of each XPath is obtained in this way along its own traversal path, and the attribute information list shown in Table 5 is finally generated. The attribute information list contains the specific property name "Title" and its XPath information value "XXX", the open property name and its XPath information value "Foreign language title", and the open attribute value and its XPath information value "ABC".
Table 5. Attribute information list
Property name / attribute value    XPath information value
Title                              XXX
Open property name                 Foreign language title
Open attribute value               ABC
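The traversal of a tree along an absolute path can be sketched with Python's standard `xml.etree.ElementTree`. This is only an illustration under several assumptions: the tiny `PAGE` string is an invented, well-formed stand-in for a real page (real HTML usually needs a dedicated HTML parser), ElementTree supports only a subset of XPath and rejects absolute paths on elements, so the hypothetical helper `node_text` checks the root step by hand and evaluates the rest relative to the root.

```python
import xml.etree.ElementTree as ET

# Invented, well-formed stand-in for a parsed page (not taken from the source).
PAGE = (
    "<html><head><title>My title</title></head>"
    "<body><dl><dt>Foreign language title</dt><dd>ABC</dd></dl></body></html>"
)


def node_text(root, abs_xpath):
    """Follow an absolute path such as /html/head/title and return the node's text.

    ElementTree only accepts relative paths on elements, so the leading root
    step is compared with the root tag and the remainder is evaluated
    relative to the root. Returns None when no node matches.
    """
    steps = abs_xpath.strip("/").split("/")
    if steps[0] != root.tag:
        return None
    if len(steps) == 1:
        return (root.text or "").strip()
    node = root.find("./" + "/".join(steps[1:]))
    return None if node is None else (node.text or "").strip()


root = ET.fromstring(PAGE)
print(node_text(root, "/html/head/title"))
print(node_text(root, "/html/body/dl[1]/dt[1]"))
print(node_text(root, "/html/body/dl[1]/dd[1]"))
```

The same helper serves both kinds of path: a first traversal path yields the text of a `dt` node (the open property name) and a second traversal path yields the text of the matching `dd` node (the open attribute value).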
S403, a mapping relationship between the property name and the attribute value is established and output as the information acquisition result.
In a specific implementation, if the property name is a specific property name, the XPath information value corresponding to the specific property name is used as the attribute value of that specific property name; if the property name is an open property name, a mapping relationship is established between the XPath information value of the open property name and the XPath information value of the open attribute value. The result is output in the format <property name, attribute value>.
For example, in the attribute information list shown in Table 5, the attribute value corresponding to the specific property name "Title" is "XXX", and the attribute value corresponding to the open property name "Foreign language title" is the XPath information value "ABC" of the open attribute value. They can be output as <Title, XXX> and <Foreign language title, ABC> respectively.
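A minimal sketch of this output step, assuming the attribute information list is held as (label, information value) tuples in the order of Table 5. The labels "open property name" and "open attribute value" and the function names `to_pairs` and `render` are hypothetical choices for this example; the sketch also assumes a single open pair, as in this embodiment.

```python
def to_pairs(info_list):
    """Turn (label, information value) rows into (property name, attribute value) pairs."""
    pairs = []
    open_name = None
    for label, info_value in info_list:
        if label == "open property name":
            # The information value itself becomes the property name.
            open_name = info_value
        elif label == "open attribute value":
            if open_name is not None:
                pairs.append((open_name, info_value))
                open_name = None
        else:
            # Specific property name: the label is the name, the information value is the value.
            pairs.append((label, info_value))
    return pairs


def render(pair):
    """Format one pair as <property name, attribute value>."""
    return "<" + ", ".join(pair) + ">"


# Rows taken from Table 5.
TABLE_5 = [
    ("Title", "XXX"),
    ("open property name", "Foreign language title"),
    ("open attribute value", "ABC"),
]

for pair in to_pairs(TABLE_5):
    print(render(pair))
```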
In the embodiments of the present invention, a first traversal path of a property name and a second traversal path of an attribute value are obtained first; the property name is then obtained from page information according to the first traversal path and the attribute value is obtained from the page information according to the second traversal path; finally, a mapping relationship between the property name and the attribute value is established and output as the information acquisition result. Because both the property name and the attribute value are obtained through traversal paths and then mapped to each other, the accuracy of information acquisition is improved.
Referring to Fig. 6, Fig. 6 is a schematic flowchart of another information acquisition method proposed by an embodiment of the present invention. The method includes but is not limited to the following steps:
S601, a first traversal path of a property name and a second traversal path of an attribute value are obtained.
In a specific implementation, a service request sent by the user equipment can be received first, the service request being a request for page information, and the URL of the page information is then obtained. According to the URL, the attribute identifier of the property name and the attribute value is obtained. Finally, the first traversal path and the second traversal path are obtained from a configuration file according to the attribute identifier, where the configuration file contains the correspondence between the attribute identifier and the first and second traversal paths.
The system includes the configuration file pattern.conf and the configuration file xpath.conf. The configuration file pattern.conf contains patterns and their corresponding pattern_id; for example, as shown in Table 1, the pattern.conf file contains pattern_id 0 and pattern_id 1 and their corresponding patterns. The configuration file xpath.conf contains the pattern_id, the property names, and the XPaths of the attribute values. The property names in the xpath.conf file include specific property names and property names in XPath form. A property name in XPath form is an open property name, and the attribute value corresponding to an open property name is an open attribute value. Note that an open property name has exactly one corresponding open attribute value. For example, as shown in Table 6, pattern_id 0 corresponds to two kinds of property name. The first is a specific property name: as shown in the first row of Table 6, the property name "Title" is a specific property name, and the XPath of its attribute value is "/html/body/div[4]/div[2]/div/div[2]/dd/h1". The second is a property name in XPath form: as shown in the second and third rows of Table 6, "/html/body/div[4]/div[2]/div/dl[1]/dt[1]" and "/html/body/div[4]/div[2]/div/dl[1]/dt[2]" are property names in XPath form, and the XPaths of their attribute values are "/html/body/div[4]/div[2]/div/dl[1]/dd[1]" and "/html/body/div[4]/div[2]/div/dl[1]/dd[2]" respectively.
Table 6. Modified xpath.conf file
For example, after the service request is received, the configuration files pattern.conf and xpath.conf are loaded first, and the URL of the page information requested by the user equipment is obtained. The URL is matched against the regular expressions in pattern.conf to obtain the matching pattern and pattern_id, and a pattern_id list is generated. According to the pattern_id list, the specific property names and the XPaths of their attribute values, as well as the XPaths of the open property names and the XPaths of the corresponding open attribute values, are obtained from xpath.conf. If there are n open property names in total, they can be named open property name_1, open property name_2, ..., open property name_n, and the corresponding open attribute values are named open attribute value_1, open attribute value_2, ..., open attribute value_n. For example, the pattern_id can first be obtained from the pattern.conf file shown in Table 1; then, according to the pattern_id, the property names, the XPaths of the attribute values, and the XPaths of the open property names are obtained from the xpath.conf file shown in Table 6, generating the configuration information list shown in Table 7. The configuration information list contains pattern_id 0; the specific property name "Title" and the XPath of its attribute value "/html/body/div[4]/div[2]/div/div[2]/dd/h1"; the XPath of open property name_1 "/html/body/div[4]/div[2]/div/dl[1]/dt[1]" and the XPath of the corresponding open attribute value_1 "/html/body/div[4]/div[2]/div/dl[1]/dd[1]"; and the XPath of open property name_2 "/html/body/div[4]/div[2]/div/dl[1]/dt[2]" and the XPath of the corresponding open attribute value_2 "/html/body/div[4]/div[2]/div/dl[1]/dd[2]".
Table 7. Configuration information list
pattern_id    Property name / attribute value    XPath
0             Title                              /html/body/div[4]/div[2]/div/div[2]/dd/h1
0             Open property name_1               /html/body/div[4]/div[2]/div/dl[1]/dt[1]
0             Open attribute value_1             /html/body/div[4]/div[2]/div/dl[1]/dd[1]
0             Open property name_2               /html/body/div[4]/div[2]/div/dl[1]/dt[2]
0             Open attribute value_2             /html/body/div[4]/div[2]/div/dl[1]/dd[2]
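The numbering of open property names and open attribute values can be sketched as below. The input shape (a list of name-XPath and value-XPath pairs) and the function name `number_open_pairs` are assumptions made for illustration; only the `_1 ... _n` naming scheme comes from the text.

```python
def number_open_pairs(pattern_id, open_pairs):
    """Label the k-th (name XPath, value XPath) pair with the suffix _k, starting at 1."""
    rows = []
    for n, (name_xpath, value_xpath) in enumerate(open_pairs, start=1):
        rows.append((pattern_id, f"open property name_{n}", name_xpath))
        rows.append((pattern_id, f"open attribute value_{n}", value_xpath))
    return rows


# The two open pairs listed in Table 7.
OPEN_PAIRS = [
    ("/html/body/div[4]/div[2]/div/dl[1]/dt[1]", "/html/body/div[4]/div[2]/div/dl[1]/dd[1]"),
    ("/html/body/div[4]/div[2]/div/dl[1]/dt[2]", "/html/body/div[4]/div[2]/div/dl[1]/dd[2]"),
]

for row in number_open_pairs(0, OPEN_PAIRS):
    print(row)
```

Because a name and its value receive the same number, that number can later serve as the shared key for pairing them.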
S602, the property name is obtained from the page information according to the first traversal path, and the attribute value is obtained from the page information according to the second traversal path.
In a specific implementation, a structure traversal tree can be created from the page information, where the structure traversal tree includes multiple content nodes; the multiple content nodes of the structure traversal tree are traversed, the property name is obtained according to the first traversal path, and the attribute value is obtained according to the second traversal path.
Optionally, before the property name is obtained from the page information according to the first traversal path, the type of the property name can be determined. If the property name is an open property name (XPath form), the property name is obtained from the page information according to the first traversal path. If the property name is a specific property name, such as "company business" or "development history", that name can be used directly as the property name, so in this case the property name does not need to be obtained from the page information according to the first traversal path.
For example, as shown in figure 5, by DOM parsing html page information, corresponding DOM tree are generated.DOM tree packets Containing multiple content nodes, each content node shows as the content of text in a HTML markup or HTML markup.It is creating After DOM tree, according to the XPath in configuration information list as shown in table 7, the traversal content node in DOM tree, Obtain the value of information of the corresponding node content as XPath.For example, when XPath be /html/head/title when, Ke Yigen Html nodes, head nodes and title nodes in DOM Tree shown in fig. 5 are traversed successively according to/html/head/title, Then the value of information of the content of text " My title " of title nodes as XPath is obtained, in this way according to different times The value of information that path obtains each XPath respectively is gone through, attribute information list as shown in table 8 is ultimately produced, including specific Property Name " title " and the corresponding XPath values of information " XXX ", the open Property Name _ 1 and corresponding XPath values of information The value of information " ABC " of " foreign language title ", open attribute value _ 1 and corresponding XPath, open Property Name _ 2 and corresponding The XPath values of information " general headquarters place " and the value of information " China Shenzhen " of open attribute value _ 2 and the XPath answered.
Table 8. Attribute information list
Attribute name / attribute value | XPath information value
title | XXX
open attribute name_1 | foreign-language name
open attribute value_1 | ABC
open attribute name_2 | headquarters location
open attribute value_2 | Shenzhen, China
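The traversal described above can be illustrated with a small sketch. It assumes a well-formed page that the Python standard library XML parser can read, and it supports only the simple absolute paths used in this description (tag names with an optional position such as dl[1]). The helper text_at and the sample page are hypothetical; a real system would more likely use a full HTML parser and XPath engine.

```python
import re
import xml.etree.ElementTree as ET

# One path step: a tag name with an optional 1-based position, e.g. "div[4]".
STEP = re.compile(r"^([A-Za-z][\w-]*)(?:\[(\d+)\])?$")


def parse_step(step):
    """Split 'div[4]' into ('div', 4); a missing position means 1."""
    match = STEP.match(step)
    if match is None:
        raise ValueError(f"unsupported path step: {step!r}")
    return match.group(1), int(match.group(2) or 1)


def text_at(root, xpath):
    """Walk the tree along an absolute path and return the node text, or None."""
    steps = [s for s in xpath.split("/") if s]
    if not steps:
        return None
    tag, index = parse_step(steps[0])
    if root.tag != tag or index != 1:
        return None
    node = root
    for step in steps[1:]:
        tag, index = parse_step(step)
        # Positions count only the children that share the tag name.
        same_tag = [child for child in node if child.tag == tag]
        if index < 1 or index > len(same_tag):
            return None
        node = same_tag[index - 1]
    return "".join(node.itertext()).strip()


# Hypothetical sample page, much simpler than the page behind Table 7.
PAGE = (
    "<html><head><title>My title</title></head><body><dl>"
    "<dt>foreign-language name</dt><dd>ABC</dd>"
    "<dt>headquarters location</dt><dd>Shenzhen, China</dd>"
    "</dl></body></html>"
)
root = ET.fromstring(PAGE)
print(text_at(root, "/html/head/title"))
print(text_at(root, "/html/body/dl[1]/dt[2]"))
```

Returning None for a path that does not resolve lets the caller skip an attribute that is absent from a particular page instead of failing the whole extraction.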
S603: Obtain a first mapping tag of the attribute name and a second mapping tag of the attribute value.
In a specific implementation, if the attribute name/attribute value entry is open attribute name_n or open attribute value_n, the number "n" in open attribute name_n can be obtained as the first mapping tag of the information value of the corresponding XPath, and the number "n" in open attribute value_n can be obtained as the second mapping tag of the information value of the corresponding XPath, where n can be any integer such as 1, 2, 3. For example, in the attribute information list shown in Table 8, the number "1" in open attribute name_1 is obtained as the first mapping tag of the corresponding XPath information value "foreign-language name", and the number "1" in open attribute value_1 is obtained as the second mapping tag of the corresponding XPath information value "ABC".
S604: Establish the mapping relationship between the attribute name and the attribute value according to the first mapping tag and the second mapping tag, and output the information acquisition result.
In a specific implementation, if the attribute name is a specific attribute name, the XPath information value corresponding to the specific attribute name is used as the attribute value of that specific attribute name, and the two can be output in the form <attribute name: attribute value>. For example, in the attribute information list shown in Table 8, the attribute value corresponding to the specific attribute name "title" is "XXX", and they are output as <title: XXX>.
If the attribute name/attribute value entry is open attribute name_n or open attribute value_n, the information value of the XPath corresponding to open attribute name_n is stored as an attribute name at the n-th position of an open attribute name list; similarly, the information value of the XPath corresponding to open attribute value_n is stored as an attribute value at the n-th position of an open attribute value list. The attribute information list is traversed from open attribute name_1 to open attribute name_n and from open attribute value_1 to open attribute value_n. Finally, when a first mapping tag is identical to a second mapping tag, a mapping relationship is established between the attribute name corresponding to the first mapping tag and the attribute value corresponding to the second mapping tag, and they can be output in the form <attribute name: attribute value>.
For example, as shown in Table 9-1 and Table 9-2, the first mapping tag of the attribute name "foreign-language name" in the open attribute name list is 1, and the second mapping tag of the attribute value "ABC" in the open attribute value list is 1. The first mapping tag of the attribute name "foreign-language name" is therefore identical to the second mapping tag of the attribute value "ABC", so the mapping relationship between "foreign-language name" and "ABC" is established and they are output in the form <foreign-language name: ABC>. Similarly, the first mapping tag of the attribute name "headquarters location" is 2 and the second mapping tag of the attribute value "Shenzhen, China" is also 2, so the mapping relationship between "headquarters location" and "Shenzhen, China" can be established and <headquarters location: Shenzhen, China> is output.
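Steps S603 and S604 can be sketched as a small pairing routine. The row labels open_name_n and open_value_n, the function name pair_attributes, and the use of dictionaries keyed by the mapping tag (instead of position-indexed lists) are assumptions made for illustration; the sample rows follow Table 8.

```python
def pair_attributes(info_rows):
    """Pair names with values from (label, information value) rows.

    A specific attribute name is paired directly with its value. An open
    attribute name and an open attribute value are paired when the number
    at the end of their labels (the mapping tag) is the same.
    """
    result = {}
    open_names = {}   # first mapping tag -> attribute name
    open_values = {}  # second mapping tag -> attribute value
    for label, value in info_rows:
        if label.startswith("open_name_"):
            open_names[int(label.rsplit("_", 1)[1])] = value
        elif label.startswith("open_value_"):
            open_values[int(label.rsplit("_", 1)[1])] = value
        else:
            # Specific attribute name: the label itself is the name.
            result[label] = value
    for tag in sorted(open_names):
        # Establish the mapping only when both sides carry the same tag.
        if tag in open_values:
            result[open_names[tag]] = open_values[tag]
    return result


TABLE_8 = [
    ("title", "XXX"),
    ("open_name_1", "foreign-language name"),
    ("open_value_1", "ABC"),
    ("open_name_2", "headquarters location"),
    ("open_value_2", "Shenzhen, China"),
]

for name, value in pair_attributes(TABLE_8).items():
    print(f"<{name}: {value}>")
```

In this sketch an open attribute name whose tag has no matching value is dropped rather than output with an empty value; the embodiment does not state how that case is handled, so this is only one possible choice.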
In the embodiments of the present invention, the first traversal path of the attribute name and the second traversal path of the attribute value are obtained first; the attribute name is then obtained from the page information according to the first traversal path, and the attribute value is obtained from the page information according to the second traversal path; finally, the mapping relationship between the attribute name and the attribute value is established and output as the information acquisition result. Because both the attribute name and the attribute value are obtained by means of traversal paths and the attribute name is mapped to the attribute value, the accuracy of information acquisition is improved.
Referring to Fig. 7, Fig. 7 is a schematic structural diagram of an information acquisition device provided by an embodiment of the present invention. The information acquisition device may include:
Path acquisition module 701, configured to obtain the first traversal path of the attribute name and the second traversal path of the attribute value.
In a specific implementation, a service request sent by the user equipment can be received first, where the service request is used to request page information. The URL of the page information is then obtained; according to the URL, the attribute identifier of the attribute name and the attribute value is obtained; finally, the first traversal path and the second traversal path are obtained from a configuration file according to the attribute identifier, where the configuration file includes the correspondence between the attribute identifier and the first traversal path and the second traversal path.
The system includes the configuration file pattern.conf and the configuration file xpath.conf. The configuration file pattern.conf includes each pattern and its corresponding pattern_id; for example, as shown in Table 1, the pattern.conf file includes pattern_id 0 and pattern_id 1 and their corresponding patterns. The configuration file xpath.conf includes the pattern_id, the attribute name and the XPath of the attribute value. The attribute names in the xpath.conf file include specific attribute names and attribute names in XPath form, where an attribute name in XPath form is an open attribute name and the attribute value corresponding to an open attribute name is an open attribute value. It should be noted that an open attribute name has only one corresponding open attribute value. For example, as shown in Table 6, pattern_id 0 corresponds to two kinds of attribute name. The first kind is a specific attribute name: as shown in the first row of Table 6, the attribute name "title" is a specific attribute name, and the XPath of its attribute value is "/html/body/div[4]/div[2]/div/div[2]/dd/h1". The second kind is an attribute name in XPath form: as shown in the second and third rows of Table 6, the attribute names "/html/body/div[4]/div[2]/div/dl[1]/dt[1]" and "/html/body/div[4]/div[2]/div/dl[1]/dt[2]" are attribute names in XPath form, and the XPaths of the corresponding attribute values are "/html/body/div[4]/div[2]/div/dl[1]/dd[1]" and "/html/body/div[4]/div[2]/div/dl[1]/dd[2]".
For example, after the service request is received, the configuration file pattern.conf and the configuration file xpath.conf are loaded first. The URL of the page information requested by the user equipment is then obtained and parsed with regular expressions, and the configuration file pattern.conf is queried for matches to obtain the corresponding pattern and pattern_id, from which a pattern_id list is generated. According to the pattern_id list, the following are obtained from the configuration file xpath.conf: each specific attribute name and the XPath of its attribute value, and the XPath of each open attribute name together with the XPath of its corresponding open attribute value. If there are n open attribute names in total, they can be named open attribute name_1, open attribute name_2, ..., open attribute name_n, and the corresponding open attribute values are named open attribute value_1, open attribute value_2, ..., open attribute value_n. For example, the pattern_id can first be obtained from the configuration file pattern.conf shown in Table 1; then, according to the pattern_id, the attribute name or the XPath of the attribute name, and the XPath of the attribute value, are obtained from the xpath.conf file shown in Table 6, and the configuration information list shown in Table 7 is generated. The configuration information list includes pattern_id 0; the specific attribute name "title" and the XPath of its attribute value, "/html/body/div[4]/div[2]/div/div[2]/dd/h1"; the XPath of open attribute name_1, "/html/body/div[4]/div[2]/div/dl[1]/dt[1]", and the XPath of the corresponding open attribute value_1, "/html/body/div[4]/div[2]/div/dl[1]/dd[1]"; and the XPath of open attribute name_2, "/html/body/div[4]/div[2]/div/dl[1]/dt[2]", and the XPath of the corresponding open attribute value_2, "/html/body/div[4]/div[2]/div/dl[1]/dd[2]".
Information acquisition module 702, configured to obtain the attribute name from the page information according to the first traversal path, and to obtain the attribute value from the page information according to the second traversal path.
In a specific implementation, a structure traversal tree can be created according to the page information, where the structure traversal tree includes multiple content nodes. The multiple content nodes on the structure traversal tree are traversed, the attribute name is obtained according to the first traversal path, and the attribute value is obtained according to the second traversal path.
Optionally, before the attribute name is obtained from the page information according to the first traversal path, the type of the attribute name can be determined. If the attribute name is determined to be an open attribute name (given in XPath form), the attribute name is obtained from the page information according to the first traversal path. If the attribute name is a specific attribute name, for example "company business" or "development history", the specific name itself can be used directly as the attribute name, so in this case the attribute name does not need to be obtained from the page information according to the first traversal path.
For example, as shown in Fig. 5, the HTML page information is parsed by DOM and a corresponding DOM tree is generated. The DOM tree contains multiple content nodes, and each content node represents an HTML tag or the text content within an HTML tag. After the DOM tree is created, the content nodes in the DOM tree are traversed according to each XPath in the configuration information list shown in Table 7, and the content of the corresponding node is obtained as the information value of that XPath. For example, when the XPath is /html/head/title, the html node, the head node and the title node in the DOM tree shown in Fig. 5 can be traversed in turn according to /html/head/title, and the text content "My title" of the title node is then obtained as the information value of the XPath. In this way, the information value of each XPath is obtained according to its traversal path, and the attribute information list shown in Table 8 is finally generated. It includes the specific attribute name "title" and the corresponding XPath information value "XXX", open attribute name_1 and the corresponding XPath information value "foreign-language name", open attribute value_1 and the corresponding XPath information value "ABC", open attribute name_2 and the corresponding XPath information value "headquarters location", and open attribute value_2 and the corresponding XPath information value "Shenzhen, China".
Result output module 703, configured to establish the mapping relationship between the attribute name and the attribute value and output it as the information acquisition result.
In a specific implementation, if the attribute name is a specific attribute name, the XPath information value corresponding to the specific attribute name is used as the attribute value of that specific attribute name, and the two can be output in the form <attribute name: attribute value>. For example, in the attribute information list shown in Table 8, the attribute value corresponding to the specific attribute name "title" is "XXX", and they are output as <title: XXX>.
If the attribute name/attribute value entry is open attribute name_n or open attribute value_n, the number "n" in open attribute name_n is first obtained as the first mapping tag of the information value of the corresponding XPath, and the information value of the XPath corresponding to open attribute name_n is stored as an attribute name at the n-th position of an open attribute name list. Similarly, the number "n" in open attribute value_n is obtained as the second mapping tag of the information value of the corresponding XPath, and the information value of the XPath corresponding to open attribute value_n is stored as an attribute value at the n-th position of an open attribute value list, where n can be any integer such as 1, 2, 3. The attribute information list is traversed from open attribute name_1 to open attribute name_n and from open attribute value_1 to open attribute value_n. For example, in the attribute information list shown in Table 8, the number "1" in open attribute name_1 is obtained as the first mapping tag of the corresponding XPath information value "foreign-language name", and "foreign-language name" is stored as an attribute name at the 1st position of the open attribute name list; the number "1" in open attribute value_1 is obtained as the second mapping tag of the corresponding XPath information value "ABC", and "ABC" is stored as an attribute value at the 1st position of the open attribute value list.
Finally, when a first mapping tag is identical to a second mapping tag, a mapping relationship is established between the attribute name corresponding to the first mapping tag and the attribute value corresponding to the second mapping tag, and they can be output in the form <attribute name: attribute value>.
For example, as shown in Table 9-1 and Table 9-2, the first mapping tag of the attribute name "foreign-language name" in the open attribute name list is 1, and the second mapping tag of the attribute value "ABC" in the open attribute value list is 1. The first mapping tag of the attribute name "foreign-language name" is therefore identical to the second mapping tag of the attribute value "ABC", so the mapping relationship between "foreign-language name" and "ABC" is established and they are output in the form <foreign-language name: ABC>. Similarly, the first mapping tag of the attribute name "headquarters location" is 2 and the second mapping tag of the attribute value "Shenzhen, China" is also 2, so the mapping relationship between "headquarters location" and "Shenzhen, China" can be established and <headquarters location: Shenzhen, China> is output.
In the embodiments of the present invention, the first traversal path of the attribute name and the second traversal path of the attribute value are obtained first; the attribute name is then obtained from the page information according to the first traversal path, and the attribute value is obtained from the page information according to the second traversal path; finally, the mapping relationship between the attribute name and the attribute value is established and output as the information acquisition result. Because both the attribute name and the attribute value are obtained by means of traversal paths and the attribute name is mapped to the attribute value, the accuracy of information acquisition is improved.
Referring next to Fig. 8, Fig. 8 is a schematic structural diagram of an information acquisition apparatus provided by an embodiment of the present invention. As shown in the figure, the information acquisition apparatus may include: at least one processor 801, at least one communication interface 802, at least one memory 803 and at least one communication bus 804.
The processor 801 can be a central processing unit, a general-purpose processor, a digital signal processor, an application-specific integrated circuit, a field-programmable gate array or other programmable logic device, a transistor logic device, a hardware component, or any combination thereof. It may implement or execute the various illustrative logical blocks, modules and circuits described in connection with the disclosure of the present invention. The processor can also be a combination that implements computing functions, for example a combination of one or more microprocessors, or a combination of a digital signal processor and a microprocessor. The communication bus 804 can be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The bus can be divided into an address bus, a data bus, a control bus and so on. For ease of representation, only one thick line is used in Fig. 8, but this does not mean that there is only one bus or one type of bus. The communication bus 804 is used to implement connection and communication between these components. The communication interface 802 of the apparatus in the embodiments of the present invention is used for signaling or data communication with other node devices. The memory 803 may include random access memory, such as non-volatile random access memory (Nonvolatile Random Access Memory, NVRAM), phase-change random access memory (Phase Change RAM, PRAM) or magnetoresistive random access memory (Magnetoresistive RAM, MRAM), and may also include non-volatile memory, for example at least one magnetic disk storage device, electrically erasable programmable read-only memory (Electrically Erasable Programmable Read-Only Memory, EEPROM), a flash memory device such as NOR flash memory or NAND flash memory, or a semiconductor device such as a solid state disk (Solid State Disk, SSD). Optionally, the memory 803 can also be at least one storage device located remotely from the aforementioned processor 801. A set of program code is stored in the memory 803, and the processor 801 executes the program in the memory 803 to perform the following operations:
Obtaining a first traversal path of an attribute name and a second traversal path of an attribute value;
Obtaining the attribute name from page information according to the first traversal path, and obtaining the attribute value from the page information according to the second traversal path;
Establishing the mapping relationship between the attribute name and the attribute value and outputting it as an information acquisition result.
Optionally, the processor 801 is further configured to perform the following operations:
Obtaining a first mapping tag of the attribute name and a second mapping tag of the attribute value;
Establishing the mapping relationship between the attribute name and the attribute value according to the first mapping tag and the second mapping tag.
Optionally, the processor 801 is further configured to perform the following operation:
When the first mapping tag is identical to the second mapping tag, establishing the mapping relationship between the attribute name and the attribute value.
Optionally, the processor 801 is further configured to perform the following operations:
Creating a structure traversal tree according to the page information, where the structure traversal tree includes multiple content nodes;
Traversing the multiple content nodes on the structure traversal tree, obtaining the attribute name according to the first traversal path, and obtaining the attribute value according to the second traversal path.
Optionally, the processor 801 is further configured to perform the following operations:
Obtaining an attribute identifier of the attribute name and the attribute value;
Obtaining the first traversal path and the second traversal path from a configuration file according to the attribute identifier, where the configuration file includes the correspondence between the attribute identifier and the first traversal path and the second traversal path.
Optionally, the processor 801 is further configured to perform the following operations:
Obtaining the uniform resource locator of the page information;
Obtaining the attribute identifier of the attribute name and the attribute value according to the uniform resource locator.
Optionally, the processor 801 is further configured to perform the following operations:
Determining the type of the attribute name;
If the attribute name is an open attribute name, obtaining the attribute name according to the first traversal path of the attribute name.
Further, the processor can also cooperate with the memory and the communication interface to perform the operations of the information acquisition device in the foregoing embodiments of the invention.
The above embodiments can be implemented wholly or partly by software, hardware, firmware or any combination thereof. When implemented in software, they can be implemented wholly or partly in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on a computer, the flows or functions described in the embodiments of the present invention are generated wholly or partly. The computer can be a general-purpose computer, a special-purpose computer, a computer network or another programmable device. The computer instructions can be stored in a computer-readable storage medium, or transmitted from one computer-readable storage medium to another; for example, the computer instructions can be transmitted from one website, computer, server or data center to another website, computer, server or data center in a wired manner (such as coaxial cable, optical fiber or digital subscriber line (DSL)) or a wireless manner (such as infrared, radio or microwave). The computer-readable storage medium can be any usable medium that a computer can access, or a data storage device such as a server or data center that integrates one or more usable media. The usable medium can be a magnetic medium (for example, a floppy disk, hard disk or magnetic tape), an optical medium (for example, a DVD), a semiconductor medium (for example, a solid state disk (Solid State Disk, SSD)), or the like.
The specific implementations described above further explain the purpose, technical solutions and beneficial effects of the present invention in detail. Any modification, equivalent replacement or improvement made within the spirit and principles of the present invention shall fall within the protection scope of the present invention.

Claims (15)

1. An information acquisition method, characterized in that the method comprises:
Obtaining a first traversal path of an attribute name and a second traversal path of an attribute value;
Obtaining the attribute name from page information according to the first traversal path, and obtaining the attribute value from the page information according to the second traversal path;
Establishing the mapping relationship between the attribute name and the attribute value and outputting it as an information acquisition result.
2. The method according to claim 1, characterized in that establishing the mapping relationship between the attribute name and the attribute value comprises:
Obtaining a first mapping tag of the attribute name and a second mapping tag of the attribute value;
Establishing the mapping relationship between the attribute name and the attribute value according to the first mapping tag and the second mapping tag.
3. The method according to claim 2, characterized in that establishing the mapping relationship between the attribute name and the attribute value according to the first mapping tag and the second mapping tag comprises:
When the first mapping tag is identical to the second mapping tag, establishing the mapping relationship between the attribute name and the attribute value.
4. The method according to claim 1, characterized in that obtaining the attribute name from the page information according to the first traversal path and obtaining the attribute value from the page information according to the second traversal path comprises:
Creating a structure traversal tree according to the page information, wherein the structure traversal tree includes multiple content nodes;
Traversing the multiple content nodes on the structure traversal tree, obtaining the attribute name according to the first traversal path, and obtaining the attribute value according to the second traversal path.
5. The method according to claim 1, characterized in that obtaining the first traversal path of the attribute name and the second traversal path of the attribute value comprises:
Obtaining an attribute identifier of the attribute name and the attribute value;
Obtaining the first traversal path and the second traversal path from a configuration file according to the attribute identifier, wherein the configuration file includes the correspondence between the attribute identifier and the first traversal path and the second traversal path.
6. The method according to claim 5, characterized in that obtaining the attribute identifier of the attribute name and the attribute value comprises:
Obtaining the uniform resource locator of the page information;
Obtaining the attribute identifier of the attribute name and the attribute value according to the uniform resource locator.
7. The method according to any one of claims 1 to 6, characterized in that obtaining the attribute name from the page information according to the first traversal path comprises:
Determining the type of the attribute name;
If the attribute name is an open attribute name, obtaining the attribute name according to the first traversal path of the attribute name.
8. An information acquisition device, characterized in that the device comprises:
A path acquisition module, configured to obtain a first traversal path of an attribute name and a second traversal path of an attribute value;
An information acquisition module, configured to obtain the attribute name from page information according to the first traversal path, and to obtain the attribute value from the page information according to the second traversal path;
A result output module, configured to establish the mapping relationship between the attribute name and the attribute value and output it as an information acquisition result.
9. The device according to claim 8, characterized in that the result output module is specifically configured to:
Obtain a first mapping tag of the attribute name and a second mapping tag of the attribute value;
Establish the mapping relationship between the attribute name and the attribute value according to the first mapping tag and the second mapping tag.
10. The device according to claim 9, characterized in that the result output module is specifically configured to:
When the first mapping tag is identical to the second mapping tag, establish the mapping relationship between the attribute name and the attribute value.
11. The device according to claim 8, characterized in that the information acquisition module is specifically configured to:
Create a structure traversal tree according to the page information, wherein the structure traversal tree includes multiple content nodes;
Traverse the multiple content nodes on the structure traversal tree, obtain the attribute name according to the first traversal path, and obtain the attribute value according to the second traversal path.
12. The device according to claim 8, characterized in that the path acquisition module is specifically configured to:
Obtain an attribute identifier of the attribute name and the attribute value;
Obtain the first traversal path and the second traversal path from a configuration file according to the attribute identifier, wherein the configuration file includes the correspondence between the attribute identifier and the first traversal path and the second traversal path.
13. The device according to claim 12, characterized in that the path acquisition module is specifically configured to:
Obtain the uniform resource locator of the page information;
Obtain the attribute identifier of the attribute name and the attribute value according to the uniform resource locator.
14. The device according to any one of claims 8 to 13, characterized in that the information acquisition module is specifically configured to:
Determine the type of the attribute name;
If the attribute name is an open attribute name, obtain the attribute name according to the first traversal path of the attribute name.
15. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a plurality of instructions, and the instructions are adapted to be loaded by a processor to execute the method according to any one of claims 1 to 7.
CN201810009236.XA 2018-01-03 2018-01-03 Information acquisition method and related equipment Active CN108334560B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810009236.XA CN108334560B (en) 2018-01-03 2018-01-03 Information acquisition method and related equipment

Publications (2)

Publication Number Publication Date
CN108334560A true CN108334560A (en) 2018-07-27
CN108334560B CN108334560B (en) 2022-04-15

Family

ID=62924834

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810009236.XA Active CN108334560B (en) 2018-01-03 2018-01-03 Information acquisition method and related equipment

Country Status (1)

Country Link
CN (1) CN108334560B (en)

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
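Zhang Ting et al.: "Research on the semantic features of XPath and their application to XML data operations", Information Technology (《信息技术》) *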

Also Published As

Publication number Publication date
CN108334560B (en) 2022-04-15

Similar Documents

Publication Publication Date Title
US10936179B2 (en) Methods and systems for web content generation
US11372935B2 (en) Automatically generating a website specific to an industry
US10796076B2 (en) Method and system for providing suggested tags associated with a target web page for manipulation by a user optimal rendering engine
US10452787B2 (en) Techniques for automated document translation
US8572202B2 (en) Persistent saving portal
US8468145B2 (en) Indexing of URLs with fragments
US20100250649A1 (en) Scope-Based Extensibility for Control Surfaces
US20080040661A1 (en) Method for inheriting a Wiki page layout for a Wiki page
US20080010387A1 (en) Method for defining a Wiki page layout using a Wiki page
US20180011933A1 (en) Method, apparatus, and server for generating hotspot content
WO2021051624A1 (en) Data acquisition method and apparatus, and electronic device and storage medium
CN110365724A (en) Task processing method, device and electronic equipment
US20130318133A1 (en) Techniques to manage universal file descriptor models for content files
US20080010388A1 (en) Method and apparatus for server wiring model
CN113656737A (en) Webpage content display method and device, electronic equipment and storage medium
US20240061992A1 (en) Generating tagged content from text of an electronic document
KR20090087502A (en) Really simple syndication for data
CN108334560A (en) A kind of information acquisition method and relevant device
CN110516174A (en) The method, apparatus and storage medium of text are obtained based on Simple Syndication
US11914943B1 (en) Generating an electronic document with a consistent text ordering
CN113779438B (en) Webpage text information processing method and device and terminal equipment
Buzydlowski et al. A comparison of a hierarchical tree to an associative map interface for the selection of classification terms
Tarczyński Model of long line with influence of screen
Guenther MIX: what it stands for: Metadata for Images in XML Schema
Troy National Guest Systems probes Windows application

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant