CN108334560A - Information acquisition method and related device - Google Patents
Information acquisition method and related device
- Publication number
- CN108334560A (application CN201810009236.XA)
- Authority
- CN
- China
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/955—Retrieval from the web using information identifiers, e.g. uniform resource locators [URL]
Landscapes
- Engineering & Computer Science (AREA)
- Databases & Information Systems (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
- Information Transfer Between Computers (AREA)
Abstract
Embodiments of the present invention disclose an information acquisition method and a related device. The method includes: first obtaining a first traverse path of an attribute name and a second traverse path of an attribute value; then obtaining the attribute name from page information according to the first traverse path, and obtaining the attribute value from the page information according to the second traverse path; and finally establishing a mapping relationship between the attribute name and the attribute value and outputting it as the information acquisition result. Embodiments of the present invention can improve the accuracy of information acquisition.
Description
Technical field
The present invention relates to the field of computer technology, and in particular to an information acquisition method and a related device.
Background
Online information is currently carried mainly as text. Information acquisition structures the information contained in text into a table-like form. The input of an information acquisition system is raw text, such as web page data or standalone text content, and the output is information points in a fixed format. Information points are extracted from various documents and then integrated in a unified form, which allows information to be obtained efficiently from a large number of documents. Information acquisition is usually implemented with the XML Path Language (XPath). In current information acquisition methods the attribute names are fixed: an XPath is configured only for the attribute value corresponding to each required attribute name, and the content of the attribute value is obtained through that XPath from the document object model tree (DOM tree) of the text. For example, Fig. 1 shows the infobox information of the encyclopedia entry "XXX", in which "company name", "foreign language title" and so on are attribute names, and "Shenzhen XXX Co., Ltd." and "ABC" are the corresponding attribute values. When the infobox information is assembled, "company name" and "foreign language title" are fixed, while "Shenzhen XXX Co., Ltd." and "ABC" are obtained through their XPaths from the DOM tree of the HTML text of the Baidu Baike page for "XXX".
However, across different pages the attribute values overlap heavily while the attribute names differ considerably. For example, the attribute name corresponding to the attribute value "internet" in Fig. 1 is "business scope", whereas in the infobox information of the Baidu entry "internet" the attribute name of "internet" is "Chinese name". A method that fixes the attribute names and obtains only the attribute values through XPath therefore yields low accuracy of information acquisition.
Summary of the invention
Embodiments of the present invention provide an information acquisition method and a related device that can improve the accuracy of information acquisition.
A first aspect of the present invention provides an information acquisition method, including:
obtaining a first traverse path of an attribute name and a second traverse path of an attribute value;
obtaining the attribute name from page information according to the first traverse path, and obtaining the attribute value from the page information according to the second traverse path; and
establishing a mapping relationship between the attribute name and the attribute value and outputting it as the information acquisition result.
Establishing the mapping relationship between the attribute name and the attribute value includes:
obtaining a first mapping tag of the attribute name and a second mapping tag of the attribute value; and
establishing the mapping relationship between the attribute name and the attribute value according to the first mapping tag and the second mapping tag.
Establishing the mapping relationship according to the first mapping tag and the second mapping tag includes:
establishing the mapping relationship between the attribute name and the attribute value when the first mapping tag is identical to the second mapping tag.
Obtaining the attribute name from the page information according to the first traverse path and obtaining the attribute value from the page information according to the second traverse path includes:
creating a structure traversal tree from the page information, where the structure traversal tree includes multiple content nodes; and
traversing the content nodes of the structure traversal tree, obtaining the attribute name according to the first traverse path and the attribute value according to the second traverse path.
Obtaining the first traverse path of the attribute name and the second traverse path of the attribute value includes:
obtaining an attribute identifier of the attribute name and the attribute value; and
obtaining the first traverse path and the second traverse path from a configuration file according to the attribute identifier, where the configuration file contains the correspondence between the attribute identifier and the first and second traverse paths.
Obtaining the attribute identifier of the attribute name and the attribute value includes:
obtaining the uniform resource locator of the page information; and
obtaining the attribute identifier of the attribute name and the attribute value according to the uniform resource locator.
Obtaining the attribute name from the page information according to the first traverse path includes:
determining the type of the attribute name; and
if the attribute name is an open attribute name, obtaining the attribute name according to the first traverse path of the attribute name.
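The tag-matching condition of this aspect can be illustrated with a minimal Python sketch. The `Extracted` record, its field names, and the use of integers as mapping tags are assumptions made for illustration only; the text above requires only that an attribute name and an attribute value be paired when their mapping tags are identical.

```python
from dataclasses import dataclass


@dataclass
class Extracted:
    tag: int   # hypothetical mapping tag attached to an extracted item
    text: str  # text obtained from the page through a traverse path


def pair_by_tag(names, values):
    """Pair each attribute name with the attribute value carrying the same tag."""
    values_by_tag = {v.tag: v.text for v in values}
    return [(n.text, values_by_tag[n.tag])
            for n in names if n.tag in values_by_tag]


names = [Extracted(1, "foreign language title"),
         Extracted(2, "General headquarters place")]
values = [Extracted(2, "China Shenzhen"), Extracted(1, "ABC")]

pairs = pair_by_tag(names, values)
print(pairs)
```

Because the pairing depends only on the tags, the order in which names and values were extracted does not matter in this sketch.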
Correspondingly, a second aspect of the present invention provides an information acquisition device, including:
a path acquisition module, configured to obtain a first traverse path of an attribute name and a second traverse path of an attribute value;
an information acquisition module, configured to obtain the attribute name from page information according to the first traverse path and to obtain the attribute value from the page information according to the second traverse path; and
a result output module, configured to establish a mapping relationship between the attribute name and the attribute value and output it as the information acquisition result.
The result output module is specifically configured to:
obtain a first mapping tag of the attribute name and a second mapping tag of the attribute value; and
establish the mapping relationship between the attribute name and the attribute value according to the first mapping tag and the second mapping tag.
The result output module is specifically configured to:
establish the mapping relationship between the attribute name and the attribute value when the first mapping tag is identical to the second mapping tag.
The information acquisition module is specifically configured to:
create a structure traversal tree from the page information, where the structure traversal tree includes multiple content nodes; and
traverse the content nodes of the structure traversal tree, obtaining the attribute name according to the first traverse path and the attribute value according to the second traverse path.
The path acquisition module is specifically configured to:
obtain an attribute identifier of the attribute name and the attribute value; and
obtain the first traverse path and the second traverse path from a configuration file according to the attribute identifier, where the configuration file contains the correspondence between the attribute identifier and the first and second traverse paths.
The path acquisition module is specifically configured to:
obtain the uniform resource locator of the page information; and
obtain the attribute identifier of the attribute name and the attribute value according to the uniform resource locator.
The information acquisition module is specifically configured to:
determine the type of the attribute name; and
if the attribute name is an open attribute name, obtain the attribute name according to the first traverse path of the attribute name.
In a third aspect, the present invention provides an information acquisition apparatus, including a processor, a memory and a communication bus, where the communication bus implements connection and communication between the processor and the memory, and the processor executes a program stored in the memory to implement the steps of the information acquisition method provided in the first aspect.
In one possible design, the information acquisition apparatus provided by the present invention may include modules corresponding to the behaviors in the above method. The modules may be software and/or hardware.
Yet another aspect of the present invention provides a computer-readable storage medium that stores a plurality of instructions, the instructions being suitable for being loaded by a processor to execute the methods described in the above aspects.
Yet another aspect of the present invention provides a computer program product including instructions which, when run on a computer, cause the computer to execute the methods described in the above aspects.
In implementing the embodiments of the present invention, a first traverse path of an attribute name and a second traverse path of an attribute value are obtained first; the attribute name is then obtained from page information according to the first traverse path, and the attribute value is obtained from the page information according to the second traverse path; finally, a mapping relationship between the attribute name and the attribute value is established and output as the information acquisition result. Because both the attribute name and the attribute value are obtained through traverse paths and the attribute name is mapped to the attribute value, the accuracy of information acquisition is improved.
Brief description of the drawings
To describe the technical solutions in the embodiments of the present invention more clearly, the accompanying drawings required for describing the embodiments are briefly introduced below. The drawings described below show some embodiments of the present invention, and a person of ordinary skill in the art may derive other drawings from them without creative effort.
Fig. 1 is a schematic diagram of an information acquisition result provided by the prior art;
Fig. 2 is a schematic structural diagram of an information acquisition system provided in an embodiment of the present invention;
Fig. 3 is a schematic flowchart of an information acquisition method provided in an embodiment of the present invention;
Fig. 4 is a schematic structural diagram of a DOM tree provided in an embodiment of the present invention;
Fig. 5 is a schematic flowchart of another information acquisition method provided in an embodiment of the present invention;
Fig. 6 is a schematic flowchart of yet another information acquisition method provided in an embodiment of the present invention;
Fig. 7 is a schematic structural diagram of an information acquisition device provided in an embodiment of the present invention;
Fig. 8 is a schematic structural diagram of an information acquisition apparatus proposed in an embodiment of the present invention.
Detailed description
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings in the embodiments of the present invention. The described embodiments are some, rather than all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present invention without creative effort shall fall within the protection scope of the present invention.
Referring to Fig. 2, Fig. 2 is a schematic structural diagram of an information acquisition system provided in an embodiment of the present invention. The information acquisition system includes user equipment 201 and a server 202. The server 202 may be a website (Web) server capable of providing a network information browsing service. The user equipment 201 may refer to a device that provides voice and/or data connectivity to a user; it may be connected to a computing device such as a laptop or desktop computer, or it may be a standalone device such as a personal digital assistant (PDA). The server receives a service request sent by the user equipment, where the service request is a request to browse page information; parses the uniform resource locator (URL) of the page information; obtains the traverse path of the attribute name and the traverse path of the attribute value from a configuration file; obtains the attribute name according to the traverse path of the attribute name and the attribute value according to the traverse path of the attribute value; and finally establishes the correspondence between the attribute name and the attribute value and sends it to the user equipment as the information acquisition result. The user equipment sends the service request to the server and obtains the information acquisition result from the server for display.
Based on the above information acquisition system, as shown in Fig. 3, an information acquisition method proposed in an embodiment of the present invention includes:
S301: The system loads the configuration file pattern.conf and the configuration file xpath.conf. The pattern.conf file contains expression patterns (pattern) and the ID number of each pattern (pattern_id); for example, as shown in Table 1, pattern.conf contains pattern_id 0 and pattern_id 1 and their corresponding patterns. The xpath.conf file contains the pattern_id, the attribute names and the XPaths of the attribute values. For example, as shown in Table 2, pattern_id 0 contains the two attribute names "title" and "brief introduction" and the XPaths of their corresponding attribute values, and pattern_id 1 contains the attribute name "label" and the XPaths of its two corresponding attribute values.
S302: The pattern_id is obtained from the pattern.conf file by parsing the URL of the page information, and the XPaths of the attribute values are then obtained from the xpath.conf file according to the pattern_id.
S303: The DOM tree of the page information is created, the XPath of each attribute value under the pattern_id is traversed, and the node content is obtained from the DOM tree along each XPath as the corresponding attribute value.
S304: According to the correspondence between attribute names and attribute values, the result is output in the form <attribute name, attribute value>. If one attribute name corresponds to M attribute values, the result may be output in the form <attribute name, attribute value 1, attribute value 2, ..., attribute value M>. For example, if the attribute name "label" corresponds to the two attribute values "stock name" and "company", the output may be <label, stock name, company>.
Table 1. pattern.conf file
pattern_id | pattern |
0 | ^https://baike\.baidu\.com/item/.+/\d+$ |
1 | ^https://baike\.baidu\.com/subview/\d+/\d+\.htm$ |
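The lookup of a pattern_id from a page URL can be sketched in Python as follows. The two patterns are copied from Table 1; the in-memory list `PATTERNS`, the function name `match_pattern_ids`, and the sample URLs are assumptions for illustration, since the on-disk layout of pattern.conf is not specified here.

```python
import re

# Rows of Table 1 as (pattern_id, pattern); the list form is an assumption.
PATTERNS = [
    (0, r"^https://baike\.baidu\.com/item/.+/\d+$"),
    (1, r"^https://baike\.baidu\.com/subview/\d+/\d+\.htm$"),
]


def match_pattern_ids(url):
    """Return the pattern_id of every pattern that the URL matches."""
    return [pid for pid, pattern in PATTERNS if re.search(pattern, url)]


# Hypothetical URLs shaped like the two patterns.
print(match_pattern_ids("https://baike.baidu.com/item/XXX/12345"))
print(match_pattern_ids("https://baike.baidu.com/subview/12/34.htm"))
```

Returning a list rather than a single ID leaves room for a URL that matches more than one pattern, which corresponds to the pattern_id list mentioned in the embodiments.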
Table 2. xpath.conf file
However, across different pages the attribute values overlap heavily while the attribute names differ considerably, so on different pages the attribute names corresponding to attribute values with the same XPath may be entirely different. Fixing the attribute names and obtaining only the attribute values through XPath therefore leads to low accuracy of information acquisition. To solve this problem, the present invention proposes the following solution.
Referring to Fig. 4, Fig. 4 is a schematic flowchart of another information acquisition method proposed in an embodiment of the present invention. The method includes, but is not limited to, the following steps:
S401: Obtain a first traverse path of an attribute name and a second traverse path of an attribute value.
In a specific implementation, a service request sent by the user equipment, which requests page information, may be received first, and the URL of the page information is then obtained. According to the URL, the attribute identifier of the attribute name and the attribute value is obtained. Finally, the first traverse path and the second traverse path are obtained from a configuration file according to the attribute identifier, where the configuration file contains the correspondence between the attribute identifier and the first and second traverse paths.
The system includes the configuration files pattern.conf and xpath.conf. pattern.conf contains each pattern and its corresponding pattern_id; for example, as shown in Table 1, pattern.conf contains pattern_id 0 and pattern_id 1 and their corresponding patterns. xpath.conf contains the pattern_id, the attribute names and the XPaths of the attribute values. The attribute names in xpath.conf include specific attribute names and attribute names in XPath form. An attribute name in XPath form is an open attribute name, and the attribute value corresponding to an open attribute name is an open attribute value. Note that each open attribute name has exactly one corresponding open attribute value. For example, as shown in Table 3, pattern_id 0 corresponds to two kinds of attribute names. The first is a specific attribute name: as shown in the first row of Table 3, the attribute name "title" is a specific attribute name, and the XPath of its attribute value is "/html/body/div[4]/div[2]/div/div[2]/dd/h1". The second is an attribute name in XPath form: as shown in the second row of Table 3, "/html/body/div[4]/div[2]/div/dl[1]/dt[1]" is an attribute name in XPath form, and the XPath of its attribute value is "/html/body/div[4]/div[2]/div/dl[1]/dd[1]".
Table 3. Modified xpath.conf file
For example, after a service request is received, the configuration files pattern.conf and xpath.conf are loaded first, and the URL of the page information requested by the user equipment is then obtained. The URL of the page is parsed with regular expressions: it is matched against the configuration file pattern.conf to obtain the corresponding pattern and pattern_id, and a pattern_id list is generated. According to the pattern_id list, the specific attribute names and the XPaths of their attribute values, as well as the XPaths of the open attribute names and the XPaths of the corresponding open attribute values, are obtained from the configuration file xpath.conf. For example, the pattern_id may first be obtained from the configuration file pattern.conf shown in Table 1; then, according to the pattern_id, the attribute names or the XPaths of the attribute names and the XPaths of the attribute values are obtained from the xpath.conf file shown in Table 3, and the configuration information list shown in Table 4 is generated. The configuration information list contains pattern_id 0, the specific attribute name "title" and the XPath "/html/body/div[4]/div[2]/div/div[2]/dd/h1" of its attribute value, the XPath "/html/body/div[4]/div[2]/div/dl[1]/dt[1]" of the open attribute name, and the XPath "/html/body/div[4]/div[2]/div/dl[1]/dd[1]" of the corresponding open attribute value.
Table 4. Configuration information list
pattern_id | Attribute name/attribute value | XPath |
0 | Title | /html/body/div[4]/div[2]/div/div[2]/dd/h1 |
0 | Open attribute name | /html/body/div[4]/div[2]/div/dl[1]/dt[1] |
0 | Open attribute value | /html/body/div[4]/div[2]/div/dl[1]/dd[1] |
S402: Obtain the attribute name from the page information according to the first traverse path, and obtain the attribute value from the page information according to the second traverse path.
In a specific implementation, a structure traversal tree may be created from the page information, where the structure traversal tree includes multiple content nodes. The content nodes of the structure traversal tree are traversed, the attribute name is obtained according to the first traverse path, and the attribute value is obtained according to the second traverse path.
Optionally, before the attribute name is obtained from the page information according to the first traverse path, the type of the attribute name may be determined. If the attribute name is an open attribute name (in XPath form), the attribute name is obtained from the page information according to the first traverse path. If the attribute name is a specific attribute name, such as "corporate business" or "development course", then "corporate business" or "development course" can be used directly as the attribute name, so in this case the attribute name need not be obtained from the page information according to the first traverse path.
For example, as shown in Fig. 5, the HTML page information is parsed with DOM to generate the corresponding DOM tree. The DOM tree contains multiple content nodes, each of which is an HTML tag or the text content inside an HTML tag. After the DOM tree is created, the content nodes in the DOM tree are traversed according to the XPaths in the configuration information list shown in Table 4, and the corresponding node content is obtained as the information value of each XPath. For example, when the XPath is /html/head/title, the html node, the head node and the title node in the DOM tree shown in Fig. 5 are traversed in turn, and the text content "My title" of the title node is obtained as the information value of the XPath. The information value of each XPath is obtained in this way according to its traverse path, and the attribute information list shown in Table 5 is finally generated. The attribute information list contains the specific attribute name "title" and its XPath information value "XXX", the open attribute name and its XPath information value "foreign language title", and the open attribute value and its XPath information value "ABC".
Table 5. Attribute information list
Attribute name/attribute value | XPath information value |
Title | XXX |
Open attribute name | foreign language title |
Open attribute value | ABC |
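The traversal described for /html/head/title can be sketched with Python's standard library. This is a minimal illustration under stated assumptions: the sample page is hypothetical and is written as well-formed markup so that `xml.etree.ElementTree` can parse it, the helper name `xpath_text` is invented here, and ElementTree supports only a subset of XPath (tag steps and positional predicates are enough for the paths in Table 4). A production system would more likely use a real HTML parser with full XPath support.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed sample page standing in for the page information.
PAGE = """<html>
  <head><title>My title</title></head>
  <body>
    <dl><dt>foreign language title</dt><dd>ABC</dd></dl>
  </body>
</html>"""


def xpath_text(root, xpath):
    """Return the text of the node addressed by an absolute XPath, or None."""
    steps = xpath.strip("/").split("/")
    if steps[0] != root.tag:
        return None  # the path does not start at the document root
    # ElementTree paths are relative to the root element, so drop the first step.
    node = root if len(steps) == 1 else root.find("./" + "/".join(steps[1:]))
    if node is None or node.text is None:
        return None
    return node.text.strip()


root = ET.fromstring(PAGE)
print(xpath_text(root, "/html/head/title"))
print(xpath_text(root, "/html/body/dl[1]/dt[1]"))
print(xpath_text(root, "/html/body/dl[1]/dd[1]"))
```

The same helper serves both the first traverse path (the dt node, yielding the open attribute name) and the second traverse path (the dd node, yielding the open attribute value).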
S403: Establish the mapping relationship between the attribute name and the attribute value and output it as the information acquisition result.
In a specific implementation, if the attribute name is a specific attribute name, the XPath information value corresponding to the specific attribute name is used as its attribute value. If the attribute name is an open attribute name, a mapping relationship is established between the XPath information value of the open attribute name and the XPath information value of the open attribute value. The result is output in the format <attribute name, attribute value>.
For example, in the attribute information list shown in Table 5, the attribute value corresponding to the specific attribute name "title" is "XXX", and the attribute value corresponding to the open attribute name "foreign language title" is the XPath information value "ABC" of the open attribute value. They can be output as <title, XXX> and <foreign language title, ABC> respectively.
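The output step can be sketched as follows, using the values of Table 5. The function names `build_result` and `format_pair` and the way the extracted values are passed in are assumptions for illustration; only the <attribute name, attribute value> output form comes from the text.

```python
def build_result(specific, open_name=None, open_value=None):
    """Return (attribute name, attribute value) pairs ready for output.

    specific:   dict of specific attribute name -> extracted information value
    open_name:  information value found at the open attribute name's XPath
    open_value: information value found at the open attribute value's XPath
    """
    pairs = list(specific.items())
    # An open attribute name is only usable together with its open value.
    if open_name is not None and open_value is not None:
        pairs.append((open_name, open_value))
    return pairs


def format_pair(name, value):
    """Render one pair in the <attribute name, attribute value> form."""
    return f"<{name}, {value}>"


pairs = build_result({"title": "XXX"}, "foreign language title", "ABC")
output = [format_pair(name, value) for name, value in pairs]
print(output)
```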
In this embodiment of the present invention, a first traverse path of an attribute name and a second traverse path of an attribute value are obtained first; the attribute name is then obtained from the page information according to the first traverse path, and the attribute value is obtained from the page information according to the second traverse path; finally, the mapping relationship between the attribute name and the attribute value is established and output as the information acquisition result. Because both the attribute name and the attribute value are obtained through traverse paths and the attribute name is mapped to the attribute value, the accuracy of information acquisition is improved.
Referring to Fig. 6, Fig. 6 is a schematic flowchart of yet another information acquisition method proposed in an embodiment of the present invention. The method includes, but is not limited to, the following steps:
S601: Obtain a first traverse path of an attribute name and a second traverse path of an attribute value.
In a specific implementation, a service request sent by the user equipment, which requests page information, may be received first, and the URL of the page information is then obtained. According to the URL, the attribute identifier of the attribute name and the attribute value is obtained. Finally, the first traverse path and the second traverse path are obtained from a configuration file according to the attribute identifier, where the configuration file contains the correspondence between the attribute identifier and the first and second traverse paths.
The system includes the configuration files pattern.conf and xpath.conf. pattern.conf contains each pattern and its corresponding pattern_id; for example, as shown in Table 1, pattern.conf contains pattern_id 0 and pattern_id 1 and their corresponding patterns. xpath.conf contains the pattern_id, the attribute names and the XPaths of the attribute values. The attribute names in xpath.conf include specific attribute names and attribute names in XPath form. An attribute name in XPath form is an open attribute name, and the attribute value corresponding to an open attribute name is an open attribute value. Note that each open attribute name has exactly one corresponding open attribute value. For example, as shown in Table 6, pattern_id 0 corresponds to two kinds of attribute names. The first is a specific attribute name: as shown in the first row of Table 6, the attribute name "title" is a specific attribute name, and the XPath of its attribute value is "/html/body/div[4]/div[2]/div/div[2]/dd/h1". The second is an attribute name in XPath form: as shown in the second and third rows of Table 6, "/html/body/div[4]/div[2]/div/dl[1]/dt[1]" and "/html/body/div[4]/div[2]/div/dl[1]/dt[2]" are attribute names in XPath form, and the XPaths of their attribute values are "/html/body/div[4]/div[2]/div/dl[1]/dd[1]" and "/html/body/div[4]/div[2]/div/dl[1]/dd[2]" respectively.
Table 6. Modified xpath.conf file
For example, after a service request is received, the configuration files pattern.conf and xpath.conf are loaded first, and the URL of the page information requested by the user equipment is then obtained. The URL of the page is parsed with regular expressions: it is matched against the configuration file pattern.conf to obtain the corresponding pattern and pattern_id, and a pattern_id list is generated. According to the pattern_id list, the specific attribute names and the XPaths of their attribute values, as well as the XPaths of the open attribute names and the XPaths of the corresponding open attribute values, are obtained from the configuration file xpath.conf. If there are n open attribute names in total, they may be named open attribute name_1, open attribute name_2, ..., open attribute name_n, and the corresponding open attribute values are named open attribute value_1, open attribute value_2, ..., open attribute value_n. For example, the pattern_id may first be obtained from the configuration file pattern.conf shown in Table 1; then, according to the pattern_id, the attribute names or the XPaths of the attribute names and the XPaths of the attribute values are obtained from the xpath.conf file shown in Table 6, and the configuration information list shown in Table 7 is generated. The configuration information list contains pattern_id 0, the specific attribute name "title" and the XPath "/html/body/div[4]/div[2]/div/div[2]/dd/h1" of its attribute value, the XPath "/html/body/div[4]/div[2]/div/dl[1]/dt[1]" of open attribute name_1 and the XPath "/html/body/div[4]/div[2]/div/dl[1]/dd[1]" of the corresponding open attribute value_1, and the XPath "/html/body/div[4]/div[2]/div/dl[1]/dt[2]" of open attribute name_2 and the XPath "/html/body/div[4]/div[2]/div/dl[1]/dd[2]" of the corresponding open attribute value_2.
Table 7. Configuration information list
pattern_id | Attribute name/attribute value | XPath |
0 | Title | /html/body/div[4]/div[2]/div/div[2]/dd/h1 |
0 | Open attribute name_1 | /html/body/div[4]/div[2]/div/dl[1]/dt[1] |
0 | Open attribute value_1 | /html/body/div[4]/div[2]/div/dl[1]/dd[1] |
0 | Open attribute name_2 | /html/body/div[4]/div[2]/div/dl[1]/dt[2] |
0 | Open attribute value_2 | /html/body/div[4]/div[2]/div/dl[1]/dd[2] |
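The numbering of open attribute names and values can be sketched as follows. The XPaths are those of Table 7; the tuple layout of the rows, the function name `build_config_rows`, and the identifier spellings `open_attribute_name_n` and `open_attribute_value_n` are assumptions for illustration, since the file format of xpath.conf is not reproduced here.

```python
BASE = "/html/body/div[4]/div[2]/div"


def build_config_rows(pattern_id, specific, open_pairs):
    """Build configuration rows of the form (pattern_id, entry name, XPath).

    specific:   list of (specific attribute name, XPath of its value)
    open_pairs: list of (XPath of open name, XPath of open value)
    """
    rows = [(pattern_id, name, xpath) for name, xpath in specific]
    # Number each open name/value pair so the two halves share the suffix n.
    for n, (name_xpath, value_xpath) in enumerate(open_pairs, start=1):
        rows.append((pattern_id, f"open_attribute_name_{n}", name_xpath))
        rows.append((pattern_id, f"open_attribute_value_{n}", value_xpath))
    return rows


rows = build_config_rows(
    0,
    specific=[("title", BASE + "/div[2]/dd/h1")],
    open_pairs=[
        (BASE + "/dl[1]/dt[1]", BASE + "/dl[1]/dd[1]"),
        (BASE + "/dl[1]/dt[2]", BASE + "/dl[1]/dd[2]"),
    ],
)
for row in rows:
    print(row)
```

Giving a name and its value the same suffix is what later allows the two extracted strings to be matched to each other.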
S602 obtains the Property Name and according to described according to first traverse path from page info
Two traverse paths obtain the attribute value from the page info.
In the specific implementation, structure traversal tree can be created according to the page info, wherein the structure traversal, which is set, includes
Multiple content nodes;The multiple content node on the structure traversal tree is traversed, is obtained according to first traverse path
The Property Name and the attribute value is obtained according to second traverse path.
Optionally, according to first traverse path before obtaining the Property Name in page info, can be true
Determine the type of Property Name;If it is determined that the Property Name is open Property Name (Xpath forms), then traversed according to described first
Path obtains the Property Name from page info.If the Property Name is specific Property Name, e.g., " company's industry
" corporate business ", " development course ", then can be determined as Property Name, therefore in this case by business ", " development course " etc.
The Property Name need not be obtained from page info according to the first traverse path.
For example, as shown in Fig. 5, the HTML page info is parsed with DOM to generate the corresponding DOM tree. The DOM tree contains multiple content nodes, and each content node represents an HTML tag or the text content inside an HTML tag. After the DOM tree is created, the content nodes in the DOM tree are traversed according to each XPath in the configuration information list shown in Table 7, and the content of the matching node is obtained as the information value of that XPath. For example, when the XPath is /html/head/title, the html node, head node, and title node of the DOM tree shown in Fig. 5 are traversed in turn, and the text content "My title" of the title node is obtained as the information value of the XPath. The information value of each XPath is obtained in this way according to its traverse path, and the attribute information list shown in Table 8 is finally generated. It includes the specific Property Name "title" with its XPath information value "XXX", open Property Name_1 with its XPath information value "foreign language title", open attribute value_1 with its XPath information value "ABC", open Property Name_2 with its XPath information value "general headquarters place", and open attribute value_2 with its XPath information value "China Shenzhen".
Table 8. Attribute information list
Property Name/attribute value | XPath information value |
---|---|
Title | XXX |
Open Property Name_1 | Foreign language title |
Open attribute value_1 | ABC |
Open Property Name_2 | General headquarters place |
Open attribute value_2 | China Shenzhen |
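The traversal that produces an attribute information list like Table 8 can be sketched as below. This is an illustration under stated assumptions, not the patented implementation: the page is assumed to be well-formed markup that Python's `xml.etree.ElementTree` can parse (real HTML often is not), only child steps with an optional 1-based position such as `dt[2]` are handled, and the names `text_at`, `PAGE`, and `CONFIG`, as well as the shortened XPaths, are hypothetical.

```python
import re
import xml.etree.ElementTree as ET

# One path step: a tag name with an optional 1-based position, e.g. "dt[2]".
STEP = re.compile(r"^([A-Za-z][\w-]*)(?:\[(\d+)\])?$")


def text_at(root, xpath):
    """Return the text of the node reached by an absolute, position-only XPath.

    Returns None when the path does not match the tree.
    """
    steps = [s for s in xpath.split("/") if s]
    if not steps:
        return None
    first = STEP.match(steps[0])
    if first is None or first.group(1) != root.tag:
        return None
    node = root
    for step in steps[1:]:
        m = STEP.match(step)
        if m is None:
            return None
        tag, position = m.group(1), int(m.group(2) or 1)
        # Positions count only the children that share the step's tag name.
        same_tag = [child for child in node if child.tag == tag]
        if not 1 <= position <= len(same_tag):
            return None
        node = same_tag[position - 1]
    return "".join(node.itertext()).strip()


# Hypothetical, simplified page; the paths are shorter than those in Table 7.
PAGE = (
    "<html><head><title>My title</title></head><body><dl>"
    "<dt>Foreign language title</dt><dd>ABC</dd>"
    "<dt>General headquarters place</dt><dd>China Shenzhen</dd>"
    "</dl></body></html>"
)

CONFIG = {
    "open Property Name_1": "/html/body/dl[1]/dt[1]",
    "open attribute value_1": "/html/body/dl[1]/dd[1]",
    "open Property Name_2": "/html/body/dl[1]/dt[2]",
    "open attribute value_2": "/html/body/dl[1]/dd[2]",
}

root = ET.fromstring(PAGE)
attribute_info = {key: text_at(root, path) for key, path in CONFIG.items()}
```

The resulting `attribute_info` dictionary plays the role of the attribute information list: each configured entry is mapped to the text found at its traverse path.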
S603: Obtain the first map tag of the Property Name and the second map tag of the attribute value.
In a specific implementation, if the Property Name/attribute value is open Property Name_n or open attribute value_n, the number "n" in open Property Name_n is taken as the first map tag of the corresponding XPath information value, and the number "n" in open attribute value_n is taken as the second map tag of the corresponding XPath information value, where n may be any integer such as 1, 2, 3, and so on. For example, in the attribute information list shown in Table 8, the number "1" in open Property Name_1 is taken as the first map tag of the corresponding XPath information value "foreign language title", and the number "1" in open attribute value_1 is taken as the second map tag of the corresponding XPath information value "ABC".
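Extracting the map tags from the entry names can be sketched as follows. The key format ("open Property Name_n" and "open attribute value_n"), the function name `map_tag`, and the "first"/"second" labels are assumptions made for this illustration; the source only states that the number n is used as the tag.

```python
import re

# Hypothetical key format: "open Property Name_<n>" or "open attribute value_<n>".
OPEN_KEY = re.compile(r"^open (Property Name|attribute value)_(\d+)$")


def map_tag(key):
    """Return ("first" | "second", n) for an open entry, or None otherwise."""
    match = OPEN_KEY.match(key)
    if match is None:
        return None  # a specific Property Name carries no map tag
    kind = "first" if match.group(1) == "Property Name" else "second"
    return kind, int(match.group(2))


# Entries of the attribute information list (values as in Table 8).
attribute_info = {
    "Title": "XXX",
    "open Property Name_1": "Foreign language title",
    "open attribute value_1": "ABC",
    "open Property Name_2": "General headquarters place",
    "open attribute value_2": "China Shenzhen",
}

tags = {key: map_tag(key) for key in attribute_info}
```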
S604: Establish the mapping relations between the Property Name and the attribute value according to the first map tag and the second map tag, and output the information acquisition result.
In a specific implementation, if the Property Name is a specific Property Name, the XPath information value corresponding to the specific Property Name is used as its attribute value, and the pair can be output in the form <Property Name:Attribute value>. For example, in the attribute information list shown in Table 8, the attribute value corresponding to the specific Property Name "title" is "XXX", and the pair is output as <Title, XXX>.
If the Property Name/attribute value is open Property Name_n or open attribute value_n, the XPath information value corresponding to open Property Name_n is stored as a Property Name at the nth position of the open Property Name list; similarly, the XPath information value corresponding to open attribute value_n is stored as an attribute value at the nth position of the open attribute value list. The attribute information list is traversed in this way from open Property Name_1 to open Property Name_n and from open attribute value_1 to open attribute value_n. Finally, when a first map tag is identical to a second map tag, a mapping relation is established between the Property Name corresponding to the first map tag and the attribute value corresponding to the second map tag, and the pair can be output in the form <Property Name, attribute value>.
For example, as shown in Table 9-1 and Table 9-2, the first map tag of the Property Name "foreign language title" in the open Property Name list is 1, and the second map tag of the attribute value "ABC" in the open attribute value list is 1. The first map tag of the Property Name "foreign language title" is therefore identical to the second map tag of the attribute value "ABC", so the mapping relation between "foreign language title" and "ABC" is established and output in the form <Foreign language title:ABC>. Similarly, the first map tag of the Property Name "general headquarters place" is 2 and the second map tag of the attribute value "China Shenzhen" is also 2, so the mapping relation between "general headquarters place" and "China Shenzhen" is established and <General headquarters place:China Shenzhen> is output.
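The pairing by identical map tags can be sketched as below. This is a minimal illustration: the dictionaries keyed by map tag stand in for the open Property Name list and the open attribute value list, and the function name `pair_by_map_tag` is hypothetical.

```python
def pair_by_map_tag(open_names, open_values):
    """Pair each Property Name with the attribute value that has the same map tag.

    open_names:  dict of first map tag  -> Property Name text
    open_values: dict of second map tag -> attribute value text
    """
    pairs = []
    for tag in sorted(open_names):
        # A mapping relation is established only when the tags are identical.
        if tag in open_values:
            pairs.append((open_names[tag], open_values[tag]))
    return pairs


open_names = {1: "Foreign language title", 2: "General headquarters place"}
open_values = {1: "ABC", 2: "China Shenzhen"}

result = pair_by_map_tag(open_names, open_values)
for name, value in result:
    print(f"<{name}:{value}>")
```

A name without a value under the same tag is simply left unpaired in this sketch; the source does not say how such a case is handled.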
In the embodiment of the present invention, the first traverse path of the Property Name and the second traverse path of the attribute value are obtained first; the Property Name is then obtained from the page info according to the first traverse path, and the attribute value is obtained from the page info according to the second traverse path; finally, the mapping relations between the Property Name and the attribute value are established and output as the information acquisition result. Because both the Property Name and the attribute value are obtained by traverse path and are then mapped to each other, the accuracy of information acquisition is improved.
Referring to Fig. 7, Fig. 7 is a schematic structural diagram of an information acquisition device proposed by an embodiment of the present invention. The information acquisition device may include:
Path acquisition module 701, configured to obtain the first traverse path of the Property Name and the second traverse path of the attribute value.
In a specific implementation, a service request sent by user equipment may be received first, where the service request is used to request page info, and the URL of the page info is then obtained. According to the URL, the attribute identifier of the Property Name and the attribute value is obtained. Finally, the first traverse path and the second traverse path are obtained from a configuration file according to the attribute identifier, where the configuration file includes the correspondence between the attribute identifier and the first traverse path and the second traverse path.
The system includes a configuration file pattern.conf and a configuration file xpath.conf. The configuration file pattern.conf contains each pattern and its corresponding pattern_id; for example, as shown in Table 1, the pattern.conf file contains pattern_id 0 and pattern_id 1 and their corresponding patterns. The configuration file xpath.conf contains the pattern_id, the Property Name, and the XPath of the attribute value. A Property Name in the xpath.conf file is either a specific Property Name or a Property Name in XPath form. A Property Name in XPath form is an open Property Name, and the attribute value corresponding to an open Property Name is an open attribute value; note that each open Property Name has exactly one corresponding open attribute value. For example, as shown in Table 6, pattern_id 0 corresponds to two kinds of Property Name. The first kind is the specific Property Name: as shown in the first row of Table 6, the Property Name "title" is a specific Property Name, and the XPath of its attribute value is "/html/body/div[4]/div[2]/div/div[2]/dd/h1". The second kind is the Property Name in XPath form: as shown in the second and third rows of Table 6, the Property Names "/html/body/div[4]/div[2]/div/dl[1]/dt[1]" and "/html/body/div[4]/div[2]/div/dl[1]/dt[2]" are in XPath form, and the XPaths of the corresponding attribute values are "/html/body/div[4]/div[2]/div/dl[1]/dd[1]" and "/html/body/div[4]/div[2]/div/dl[1]/dd[2]".
For example, after the service request is received, the configuration files pattern.conf and xpath.conf are loaded first, and the URL of the page info requested by the user equipment is obtained. The URL of the page is parsed with regular expressions and matched against the configuration file pattern.conf to obtain the corresponding pattern and pattern_id, and a pattern_id list is generated. According to the pattern_id list, the specific Property Names with the XPaths of their attribute values, and the XPaths of the open Property Names with the XPaths of their open attribute values, are obtained from the configuration file xpath.conf. If there are n open Property Names in total, they can be named open Property Name_1, open Property Name_2, ..., open Property Name_n, and the corresponding open attribute values are named open attribute value_1, open attribute value_2, ..., open attribute value_n. For example, the pattern_id can first be obtained from the configuration file pattern.conf shown in Table 1; then, according to the pattern_id, the Property Name or the XPath of the Property Name, and the XPath of the attribute value, are obtained from the xpath.conf file shown in Table 6, and the configuration information list shown in Table 7 is generated. The configuration information list includes pattern_id 0; the specific Property Name "title" and the XPath of its attribute value, "/html/body/div[4]/div[2]/div/div[2]/dd/h1"; the XPath of open Property Name_1, "/html/body/div[4]/div[2]/div/dl[1]/dt[1]", and the XPath of the corresponding open attribute value_1, "/html/body/div[4]/div[2]/div/dl[1]/dd[1]"; and the XPath of open Property Name_2, "/html/body/div[4]/div[2]/div/dl[1]/dt[2]", and the XPath of the corresponding open attribute value_2, "/html/body/div[4]/div[2]/div/dl[1]/dd[2]".
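The lookup from URL to configuration information list can be sketched as follows. This is an illustration under stated assumptions: the contents of pattern.conf (Table 1) are not reproduced in this section, so the regular expressions in `PATTERNS` and the example.com URLs are hypothetical; the in-memory `XPATHS` rows stand in for xpath.conf and reuse the XPaths quoted above; the function name `build_config_list` and the rule that a name starting with "/" is an open Property Name are also assumptions.

```python
import re

# Stand-in for pattern.conf: pattern_id -> URL pattern (hypothetical patterns).
PATTERNS = {
    0: r"^https?://example\.com/company/\d+$",
    1: r"^https?://example\.com/person/\d+$",
}

# Stand-in for xpath.conf: (pattern_id, Property Name or its XPath, value XPath).
XPATHS = [
    (0, "Title", "/html/body/div[4]/div[2]/div/div[2]/dd/h1"),
    (0, "/html/body/div[4]/div[2]/div/dl[1]/dt[1]",
        "/html/body/div[4]/div[2]/div/dl[1]/dd[1]"),
    (0, "/html/body/div[4]/div[2]/div/dl[1]/dt[2]",
        "/html/body/div[4]/div[2]/div/dl[1]/dd[2]"),
]


def build_config_list(url):
    """Return rows (pattern_id, entry name, XPath) for the patterns matching url."""
    pattern_ids = [pid for pid, pattern in PATTERNS.items() if re.match(pattern, url)]
    rows = []
    n = 0  # running number of open Property Names
    for pid, name, value_xpath in XPATHS:
        if pid not in pattern_ids:
            continue
        if name.startswith("/"):
            # Open Property Name: both the name and the value are read by XPath.
            n += 1
            rows.append((pid, f"open Property Name_{n}", name))
            rows.append((pid, f"open attribute value_{n}", value_xpath))
        else:
            # Specific Property Name: only the value needs an XPath.
            rows.append((pid, name, value_xpath))
    return rows


config_list = build_config_list("https://example.com/company/42")
```

For the hypothetical company URL, `config_list` has the same shape as the configuration information list in Table 7: one row for the specific Property Name and two rows for each open name/value pair.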
Data obtaining module 702, configured to obtain the Property Name from the page info according to the first traverse path, and to obtain the attribute value from the page info according to the second traverse path.
In a specific implementation, a structure traversal tree may be created from the page info, where the structure traversal tree includes multiple content nodes. The content nodes on the structure traversal tree are then traversed: the Property Name is obtained according to the first traverse path, and the attribute value is obtained according to the second traverse path.
Optionally, before the Property Name is obtained from the page info according to the first traverse path, the type of the Property Name may be determined. If the Property Name is an open Property Name (XPath form), the Property Name is obtained from the page info according to the first traverse path. If the Property Name is a specific Property Name, e.g., "corporate business" or "development course", then "corporate business" or "development course" can be used directly as the Property Name, so in this case the Property Name does not need to be obtained from the page info according to the first traverse path.
For example, as shown in Fig. 5, the HTML page info is parsed with DOM to generate the corresponding DOM tree. The DOM tree contains multiple content nodes, and each content node represents an HTML tag or the text content inside an HTML tag. After the DOM tree is created, the content nodes in the DOM tree are traversed according to each XPath in the configuration information list shown in Table 7, and the content of the matching node is obtained as the information value of that XPath. For example, when the XPath is /html/head/title, the html node, head node, and title node of the DOM tree shown in Fig. 5 are traversed in turn, and the text content "My title" of the title node is obtained as the information value of the XPath. The information value of each XPath is obtained in this way according to its traverse path, and the attribute information list shown in Table 8 is finally generated. It includes the specific Property Name "title" with its XPath information value "XXX", open Property Name_1 with its XPath information value "foreign language title", open attribute value_1 with its XPath information value "ABC", open Property Name_2 with its XPath information value "general headquarters place", and open attribute value_2 with its XPath information value "China Shenzhen".
Result output module 703, configured to establish the mapping relations between the Property Name and the attribute value and output them as the information acquisition result.
In a specific implementation, if the Property Name is a specific Property Name, the XPath information value corresponding to the specific Property Name is used as its attribute value, and the pair can be output in the form <Property Name:Attribute value>. For example, in the attribute information list shown in Table 8, the attribute value corresponding to the specific Property Name "title" is "XXX", and the pair is output as <Title, XXX>.
If the Property Name/attribute value is open Property Name_n or open attribute value_n, the number "n" in open Property Name_n is first taken as the first map tag of the corresponding XPath information value, and that information value is stored as a Property Name at the nth position of the open Property Name list. Similarly, the number "n" in open attribute value_n is taken as the second map tag of the corresponding XPath information value, and that information value is stored as an attribute value at the nth position of the open attribute value list, where n may be any integer such as 1, 2, 3, and so on. The attribute information list is traversed in this way from open Property Name_1 to open Property Name_n and from open attribute value_1 to open attribute value_n. For example, in the attribute information list shown in Table 8, the number "1" in open Property Name_1 is taken as the first map tag of the corresponding XPath information value "foreign language title", and "foreign language title" is stored as a Property Name at the 1st position of the open Property Name list; the number "1" in open attribute value_1 is taken as the second map tag of the corresponding XPath information value "ABC", and "ABC" is stored as an attribute value at the 1st position of the open attribute value list.
Finally, when a first map tag is identical to a second map tag, a mapping relation is established between the Property Name corresponding to the first map tag and the attribute value corresponding to the second map tag, and the pair can be output in the form <Property Name, attribute value>.
For example, as shown in Table 9-1 and Table 9-2, the first map tag of the Property Name "foreign language title" in the open Property Name list is 1, and the second map tag of the attribute value "ABC" in the open attribute value list is 1. The first map tag of the Property Name "foreign language title" is therefore identical to the second map tag of the attribute value "ABC", so the mapping relation between "foreign language title" and "ABC" is established and output in the form <Foreign language title:ABC>. Similarly, the first map tag of the Property Name "general headquarters place" is 2 and the second map tag of the attribute value "China Shenzhen" is also 2, so the mapping relation between "general headquarters place" and "China Shenzhen" is established and <General headquarters place:China Shenzhen> is output.
In the embodiment of the present invention, the first traverse path of the Property Name and the second traverse path of the attribute value are obtained first; the Property Name is then obtained from the page info according to the first traverse path, and the attribute value is obtained from the page info according to the second traverse path; finally, the mapping relations between the Property Name and the attribute value are established and output as the information acquisition result. Because both the Property Name and the attribute value are obtained by traverse path and are then mapped to each other, the accuracy of information acquisition is improved.
Referring next to Fig. 8, Fig. 8 is a schematic structural diagram of an information acquisition apparatus proposed by an embodiment of the present invention. As shown in the figure, the information acquisition apparatus may include at least one processor 801, at least one communication interface 802, at least one memory 803, and at least one communication bus 804.
The processor 801 may be a central processing unit, a general-purpose processor, a digital signal processor, an application-specific integrated circuit, a field programmable gate array or other programmable logic device, a transistor logic device, a hardware component, or any combination thereof. It may implement or execute the various illustrative logic blocks, modules, and circuits described in connection with the disclosure of the present invention. The processor may also be a combination that implements computing functions, for example a combination of one or more microprocessors, or a combination of a digital signal processor and a microprocessor. The communication bus 804 may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The bus may be divided into an address bus, a data bus, a control bus, and so on. For ease of representation, only one thick line is used in Fig. 8, but this does not mean that there is only one bus or one type of bus. The communication bus 804 is used to implement connection and communication between these components. The communication interface 802 of the apparatus in the embodiment of the present invention is used for signaling or data communication with other node devices. The memory 803 may include volatile memory, such as non-volatile dynamic random access memory (Nonvolatile Random Access Memory, NVRAM), phase change random access memory (Phase Change RAM, PRAM), and magnetoresistive random access memory (Magnetoresistive RAM, MRAM), and may also include non-volatile memory, such as at least one disk storage device, electrically erasable programmable read-only memory (Electrically Erasable Programmable Read-Only Memory, EEPROM), flash memory devices such as NOR flash memory or NAND flash memory, and semiconductor devices such as a solid state disk (Solid State Disk, SSD). Optionally, the memory 803 may also be at least one storage device located remotely from the aforementioned processor 801. A set of program code is stored in the memory 803, and the processor 801 executes the program in the memory 803 to perform the following:
Obtaining the first traverse path of a Property Name and the second traverse path of an attribute value;
Obtaining the Property Name from page info according to the first traverse path, and obtaining the attribute value from the page info according to the second traverse path;
Establishing the mapping relations between the Property Name and the attribute value, and outputting them as an information acquisition result.
Optionally, the processor 801 is further configured to perform the following operations:
Obtaining the first map tag of the Property Name and the second map tag of the attribute value;
Establishing the mapping relations between the Property Name and the attribute value according to the first map tag and the second map tag.
Optionally, the processor 801 is further configured to perform the following operation:
When the first map tag is identical to the second map tag, establishing the mapping relations between the Property Name and the attribute value.
Optionally, the processor 801 is further configured to perform the following operations:
Creating a structure traversal tree according to the page info, where the structure traversal tree includes multiple content nodes;
Traversing the multiple content nodes on the structure traversal tree, obtaining the Property Name according to the first traverse path, and obtaining the attribute value according to the second traverse path.
Optionally, the processor 801 is further configured to perform the following operations:
Obtaining the attribute identifier of the Property Name and the attribute value;
Obtaining the first traverse path and the second traverse path from a configuration file according to the attribute identifier, where the configuration file includes the correspondence between the attribute identifier and the first traverse path and the second traverse path.
Optionally, the processor 801 is further configured to perform the following operations:
Obtaining the uniform resource locator of the page info;
Obtaining the attribute identifier of the Property Name and the attribute value according to the uniform resource locator.
Optionally, the processor 801 is further configured to perform the following operations:
Determining the type of the Property Name;
If the Property Name is an open Property Name, obtaining the Property Name according to the first traverse path of the Property Name.
Further, the processor may also cooperate with the memory and the communication interface to perform the operations of the source control device in the foregoing embodiments of the invention.
The above embodiments may be implemented wholly or partly by software, hardware, firmware, or any combination thereof. When implemented in software, they may be implemented wholly or partly in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on a computer, the flows or functions described in the embodiments of the present invention are generated wholly or partly. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable device. The computer instructions may be stored in a computer readable storage medium, or transmitted from one computer readable storage medium to another; for example, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center in a wired manner (such as coaxial cable, optical fiber, or digital subscriber line (DSL)) or a wireless manner (such as infrared, wireless, or microwave). The computer readable storage medium may be any usable medium that a computer can access, or a data storage device such as a server or data center that integrates one or more usable media. The usable medium may be a magnetic medium (for example, a floppy disk, hard disk, or magnetic tape), an optical medium (for example, a DVD), or a semiconductor medium (for example, a solid state disk (Solid State Disk, SSD)).
The specific implementations described above further explain the purpose, technical solutions, and beneficial effects of the present invention in detail. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall fall within the protection scope of the present invention.
Claims (15)
1. An information acquisition method, characterized in that the method comprises:
Obtaining a first traverse path of a Property Name and a second traverse path of an attribute value;
Obtaining the Property Name from page info according to the first traverse path, and obtaining the attribute value from the page info according to the second traverse path;
Establishing mapping relations between the Property Name and the attribute value, and outputting them as an information acquisition result.
2. The method as claimed in claim 1, characterized in that establishing the mapping relations between the Property Name and the attribute value comprises:
Obtaining a first map tag of the Property Name and a second map tag of the attribute value;
Establishing the mapping relations between the Property Name and the attribute value according to the first map tag and the second map tag.
3. The method as claimed in claim 2, characterized in that establishing the mapping relations between the Property Name and the attribute value according to the first map tag and the second map tag comprises:
When the first map tag is identical to the second map tag, establishing the mapping relations between the Property Name and the attribute value.
4. The method as claimed in claim 1, characterized in that obtaining the Property Name from the page info according to the first traverse path and obtaining the attribute value from the page info according to the second traverse path comprises:
Creating a structure traversal tree according to the page info, wherein the structure traversal tree includes multiple content nodes;
Traversing the multiple content nodes on the structure traversal tree, obtaining the Property Name according to the first traverse path, and obtaining the attribute value according to the second traverse path.
5. The method as claimed in claim 1, characterized in that obtaining the first traverse path of the Property Name and the second traverse path of the attribute value comprises:
Obtaining an attribute identifier of the Property Name and the attribute value;
Obtaining the first traverse path and the second traverse path from a configuration file according to the attribute identifier, wherein the configuration file includes the correspondence between the attribute identifier and the first traverse path and the second traverse path.
6. The method as claimed in claim 5, characterized in that obtaining the attribute identifier of the Property Name and the attribute value comprises:
Obtaining the uniform resource locator of the page info;
Obtaining the attribute identifier of the Property Name and the attribute value according to the uniform resource locator.
7. The method as claimed in any one of claims 1 to 6, characterized in that obtaining the Property Name from the page info according to the first traverse path comprises:
Determining the type of the Property Name;
If the Property Name is an open Property Name, obtaining the Property Name according to the first traverse path of the Property Name.
8. An information acquisition device, characterized in that the device comprises:
A path acquisition module, configured to obtain a first traverse path of a Property Name and a second traverse path of an attribute value;
A data obtaining module, configured to obtain the Property Name from page info according to the first traverse path, and to obtain the attribute value from the page info according to the second traverse path;
A result output module, configured to establish mapping relations between the Property Name and the attribute value and output them as an information acquisition result.
9. The device as claimed in claim 8, characterized in that the result output module is specifically configured to:
Obtain a first map tag of the Property Name and a second map tag of the attribute value;
Establish the mapping relations between the Property Name and the attribute value according to the first map tag and the second map tag.
10. The device as claimed in claim 9, characterized in that the result output module is specifically configured to:
When the first map tag is identical to the second map tag, establish the mapping relations between the Property Name and the attribute value.
11. The device as claimed in claim 8, characterized in that the data obtaining module is specifically configured to:
Create a structure traversal tree according to the page info, wherein the structure traversal tree includes multiple content nodes;
Traverse the multiple content nodes on the structure traversal tree, obtain the Property Name according to the first traverse path, and obtain the attribute value according to the second traverse path.
12. The device as claimed in claim 8, characterized in that the path acquisition module is specifically configured to:
Obtain an attribute identifier of the Property Name and the attribute value;
Obtain the first traverse path and the second traverse path from a configuration file according to the attribute identifier, wherein the configuration file includes the correspondence between the attribute identifier and the first traverse path and the second traverse path.
13. The device as claimed in claim 12, characterized in that the path acquisition module is specifically configured to:
Obtain the uniform resource locator of the page info;
Obtain the attribute identifier of the Property Name and the attribute value according to the uniform resource locator.
14. The device as claimed in any one of claims 8 to 13, characterized in that the data obtaining module is specifically configured to:
Determine the type of the Property Name;
If the Property Name is an open Property Name, obtain the Property Name according to the first traverse path of the Property Name.
15. A computer readable storage medium, characterized in that the computer readable storage medium stores a plurality of instructions, and the instructions are adapted to be loaded by a processor to execute the method as claimed in any one of claims 1 to 7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810009236.XA CN108334560B (en) | 2018-01-03 | 2018-01-03 | Information acquisition method and related equipment |
Publications (2)
Publication Number | Publication Date |
---|---|
CN108334560A (en) | 2018-07-27 |
CN108334560B (en) | 2022-04-15 |
Family
ID=62924834
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810009236.XA Active CN108334560B (en) | 2018-01-03 | 2018-01-03 | Information acquisition method and related equipment |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108334560B (en) |
- 2018-01-03: CN application CN201810009236.XA (patent CN108334560B), status Active
Non-Patent Citations (1)
Title |
---|
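Zhang Ting (张婷) et al.: "Research on the Semantic Features of XPath and Their Application to XML Data Operations", Information Technology (《信息技术》) * |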
Also Published As
Publication number | Publication date |
---|---|
CN108334560B (en) | 2022-04-15 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US10936179B2 (en) | Methods and systems for web content generation | |
US11372935B2 (en) | Automatically generating a website specific to an industry | |
US10796076B2 (en) | Method and system for providing suggested tags associated with a target web page for manipulation by a useroptimal rendering engine | |
US10452787B2 (en) | Techniques for automated document translation | |
US8572202B2 (en) | Persistent saving portal | |
US8468145B2 (en) | Indexing of URLs with fragments | |
US20100250649A1 (en) | Scope-Based Extensibility for Control Surfaces | |
US20080040661A1 (en) | Method for inheriting a Wiki page layout for a Wiki page | |
US20080010387A1 (en) | Method for defining a Wiki page layout using a Wiki page | |
US20180011933A1 (en) | Method, apparatus, and server for generating hotspot content | |
WO2021051624A1 (en) | Data acquisition method and apparatus, and electronic device and storage medium | |
CN110365724A (en) | Task processing method, device and electronic equipment | |
US20130318133A1 (en) | Techniques to manage universal file descriptor models for content files | |
US20080010388A1 (en) | Method and apparatus for server wiring model | |
CN113656737A (en) | Webpage content display method and device, electronic equipment and storage medium | |
US20240061992A1 (en) | Generating tagged content from text of an electronic document | |
KR20090087502A (en) | Really simple syndication for data | |
CN108334560A (en) | A kind of information acquisition method and relevant device | |
CN110516174A (en) | The method, apparatus and storage medium of text are obtained based on Simple Syndication | |
US11914943B1 (en) | Generating an electronic document with a consistent text ordering | |
CN113779438B (en) | Webpage text information processing method and device and terminal equipment | |
Buzydlowski et al. | A comparison of a hierarchical tree to an associative map interface for the selection of classification terms | |
Tarczyński | Model of long line with influence of screen | |
Guenther | MIX: what it stands for: Metadata for Images in XML Schema | |
Troy | National Guest Systems probes Windows application | |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||