CN104036026A - Methods and systems for storing and positioning selected content of structured document - Google Patents

Info

Publication number
CN104036026A
Authority
CN
China
Prior art keywords
content
node
document
selected
offset
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201410300699.3A
Other languages
Chinese (zh)
Other versions
CN104036026B (en)
Inventor
吴涛军
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Individual
Original Assignee
Individual
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Individual filed Critical Individual
Priority to CN201410300699.3A priority Critical patent/CN104036026B/en
Publication of CN104036026A publication Critical patent/CN104036026A/en
Application granted granted Critical
Publication of CN104036026B publication Critical patent/CN104036026B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22 Indexing; Data structures therefor; Storage structures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/10 Text processing
    • G06F40/12 Use of codes for handling textual entities
    • G06F40/123 Storage facilities

Abstract

The invention relates to methods and devices for storing, positioning and repositioning selected content of a structured document. The storing method comprises: a calculation step of taking any node as a reference node and calculating the offset of the start position and the offset of the end position of the content selected by a user; and a storing step of storing the reference node, the offset of the start position of the selected content and the offset of the end position of the selected content on a server. The positioning method comprises: an input step of reading positioning information from the server; and a positioning step of locating the selected content in a new structured document according to the positioning information. The method for repositioning the selected content of the structured document comprises the steps of positioning-information reading, retrieval, occurrence-count comparison and repositioning.

Description

Method and system for storing and locating selected content of a structured document
Technical field
The application relates to a method and system for storing and locating selected content of a structured document, with which the position of selected content in a structured document can be stored and relocated. The invention belongs to the technical field of information retrieval.
Background art
At present, with the widespread use of computer networks, there is strong demand for storing and locating selected content in structured documents. Users want to locate precisely the content of a structured document they have already visited; that is, when the structured document is visited again, the previously selected content should be located accurately.
To solve this technical problem, the prior art has made some attempts. Two prior-art methods exist for storing and locating selected content in a structured document. The first applies to static structured documents and locates the selected content by saving the screen coordinates at which the user's last mouse-marking operation began. The second locates the selected content by saving its XPath. The DOM (Document Object Model) is a standard programming interface for processing extensible markup languages (including HTML); it treats an HTML document as a tree with many nodes. XPath is a path expression that follows the path form of the DOM and can describe the sequence of steps from one DOM node to another. The second method locates the selected content by recording the XPath from the root node to the DOM node that contains the selected content. The first method is clearly limited because it only works for static web pages; it also depends on the saved screen coordinates, and its locating efficiency is low. The second method is more widely used. For example, US patent application US2013117127A1 by Yahoo, published on May 9, 2013, relates to a system and method for placing advertising information according to content the user previously selected, and follows a similar approach. As further examples, the US patent documents US6976189B1 (published December 13, 2005), US7386762B1 (published June 10, 2008) and US7831864B1 (published November 9, 2010), all by Network Appliance Inc., relate to systems that store and process content-based user behavior; they treat a document as a tree structure containing many nodes and carry out the processing through optimized tree-traversal algorithms.
Both approaches, and the second in particular, suffer from the same technical problem: poor versatility of implementation and poor locating accuracy. Poor versatility shows up across computing devices such as mobile terminals and general-purpose computers. A browser on a mobile terminal renders a web page differently from a browser on a general-purpose computer and generates a different DOM tree, so selected content saved on a general-purpose computer may not be locatable in the web page on a mobile terminal. Poor accuracy shows up in two ways: locating fails after a static web page is modified, and plug-in loading interferes with locating in dynamic web pages. For a static web page, once the web document has been modified, the selected content can no longer be located in the new page from the saved XPath information. For a dynamic web page, web page plug-ins, browser plug-ins or other third-party components may add tags to the page to achieve their effect, so the saved XPath cannot be used to locate the selected content when, for example, a new plug-in cannot be loaded.
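The fragility of a saved XPath can be illustrated with a minimal Python sketch. The markup, the injected `ad` wrapper and the path string are hypothetical, and the standard library's `xml.etree.ElementTree`, which supports only a subset of XPath, stands in for a browser DOM.

```python
import xml.etree.ElementTree as ET

# Hypothetical page as it looked when the selection was saved.
ORIGINAL = "<html><body><div><p>intro</p><p>selected text</p></div></body></html>"

# The same page after a plug-in injected an extra <div> at the top of <body>.
MODIFIED = ("<html><body><div class='ad'/>"
            "<div><p>intro</p><p>selected text</p></div></body></html>")

# Saved path from the root to the node that held the selection (an assumption).
SAVED_PATH = "./body/div[1]/p[2]"


def resolve(markup, path):
    """Return the text of the node at `path`, or None if the path no longer resolves."""
    node = ET.fromstring(markup).find(path)
    return None if node is None else node.text


before = resolve(ORIGINAL, SAVED_PATH)  # the path reaches the saved node
after = resolve(MODIFIED, SAVED_PATH)   # div[1] is now the injected wrapper
```

In this sketch the visible text is unchanged, yet the path stops resolving because one structural tag was added.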
Summary of the invention
To solve the above problems, a method and device are provided that can accurately and stably store and locate selected content of a structured document on both computers and mobile terminals.
To solve the above technical problems, the present invention adopts the following technical solutions:
A method for storing selected content of a structured document comprises the following steps: a calculation step of taking one or more arbitrary nodes as the reference node and calculating the offset of the start position and the offset of the end position of the content selected by the user; and a storing step of storing the reference node, the offset of the start position of the selected content and the offset of the end position of the selected content on a server.
Preferably, in the method for storing selected content of a structured document described above, the storing step also stores the selected content; the reference node is the root node of the body of the structured document; and the storing step also stores the uniform resource locator (URL).
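The stored information can be pictured as one small record per selection. The following Python sketch is only an illustration: the field names, the use of JSON as the payload format and the idea of identifying the reference node by a string are assumptions, not details given in the text.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class SelectionRecord:
    """Hypothetical shape of the information sent to the server."""
    url: str               # uniform resource locator of the structured document
    reference_node: str    # identifier of the reference node (assumed: the body root)
    selected_content: str  # the content the user selected
    start_offset: int      # offset of the start position of the selected content
    end_offset: int        # offset of the end position of the selected content


def to_payload(record):
    """Serialize a record for storage; JSON is an assumed format."""
    return json.dumps(asdict(record), ensure_ascii=False)


record = SelectionRecord(
    url="https://example.com/article",
    reference_node="body",
    selected_content="wide",
    start_offset=12,
    end_offset=16,
)
payload = to_payload(record)
restored = SelectionRecord(**json.loads(payload))
```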
In the method for storing selected content of a structured document according to any of the above schemes and preferred schemes, the calculation step comprises the following sub-steps: a start-position content-length calculation step of calculating the content length between the start position of the selected content and the nearest document node; a traversal step of starting from the reference node and traversing each node before the nearest document node, judging whether the node's tag is a tag in the special tag table, and, if so, taking the value corresponding to that tag in the table as the length of the content in the node, or, if not, obtaining the length of the content in the node; a nearest-document-node offset calculation step of accumulating the lengths of the content in each node before the nearest document node to obtain the offset of the nearest document node; and a start-position offset calculation step of adding the offset of the nearest document node to the content length between the start position of the selected content and the nearest document node to obtain the offset of the start position of the selected content.
Alternatively, in the method for storing selected content of a structured document according to any of the above schemes and preferred schemes, the calculation step comprises the following sub-steps: an end-position content-length calculation step of calculating the content length between the end position of the selected content and the nearest document node; a traversal step of starting from the reference node and traversing each node before the nearest document node, judging whether the node's tag is a tag in the special tag table, and, if so, taking the value corresponding to that tag in the table as the length of the content in the node, or, if not, obtaining the length of the content in the node; a node offset calculation step of accumulating the lengths of the content in each node before the nearest document node to obtain the offset of the nearest document node; and an end-position offset calculation step of adding the offset of the nearest document node to the content length between the end position of the selected content and the nearest document node to obtain the offset of the end position of the selected content.
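The calculation sub-steps can be sketched in Python under some simplifying assumptions. The `Node` class is a toy stand-in for a DOM node, the entries of `SPECIAL_TAGS` are hypothetical (the text does not list the table's contents), and a node whose tag is in the table is counted as a single unit with the table's length.

```python
from dataclasses import dataclass, field

# Hypothetical special tag table: tag name -> fixed content length.
SPECIAL_TAGS = {"script": 0, "style": 0, "img": 1}


@dataclass
class Node:
    tag: str                                      # "#text" marks a text node
    text: str = ""                                # content of a text node
    children: list = field(default_factory=list)


def units(node):
    """Yield (node, content length) pairs in document order."""
    if node.tag in SPECIAL_TAGS:
        yield node, SPECIAL_TAGS[node.tag]        # fixed length from the table
    elif not node.children:
        yield node, len(node.text)                # actual content length
    else:
        for child in node.children:
            yield from units(child)


def offset_of(reference, target, inner):
    """Offset = accumulated length of the nodes before `target` + `inner`."""
    total = 0
    for node, length in units(reference):
        if node is target:
            return total + inner
        total += length
    raise ValueError("target node is not under the reference node")


hello = Node("#text", "Hello ")
main = Node("#text", "world wide web")
body = Node("body", children=[
    Node("p", children=[hello]),
    Node("script", children=[Node("#text", "var x = 1;")]),
    Node("p", children=[main]),
])

# The user selects "wide", i.e. characters 6..10 of the text node `main`.
start_offset = offset_of(body, main, 6)
end_offset = offset_of(body, main, 10)
```

Because the `script` node contributes its table value of zero, its inner text does not shift the offsets.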
A method for locating selected content of a structured document comprises the following steps: an input step of reading from the server positioning information such as the reference node, the offset of the start position of the user's selected content and the offset of the end position; and a positioning step of locating the selected content in the new structured document according to the positioning information.
The positioning information read in the input step comprises the reference node, the offset of the start position of the user's selected content and the offset of the end position.
More specifically, the positioning step comprises: a traversal step of traversing, in the new structured document, the child nodes of the reference node and, at each node passed, judging whether the node's tag is a tag in the special tag table, and, if so, taking the value corresponding to that tag in the table as the length of the content in the node, or, if not, obtaining the length of the content in the node, and accumulating the lengths of the content in the nodes; a content-length obtaining step of, when the accumulated value first becomes greater than or equal to the offset of the start position of the selected content, obtaining the accumulated value of the nearest document node before this node and subtracting it from the offset of the start position to obtain the content length between the start position of the selected content and the nearest document node; a start-position determining step of obtaining the start position of the selected content from said content length; a content-length determining step of, when the accumulated value first becomes greater than or equal to the offset of the end position of the selected content, obtaining the accumulated value of the nearest document node before this node and subtracting it from the offset of the end position to obtain the content length between the end position of the selected content and the nearest document node; an end-position determining step of obtaining the end position of the selected content from said content length; and a selected-content positioning step of locating the selected content according to its start position and end position.
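The positioning step can be sketched in Python as the inverse of the offset calculation. As before, the `Node` class, the `SPECIAL_TAGS` entries and the sample document are assumptions made for illustration; the new document deliberately nests its nodes differently and carries an extra zero-length tag.

```python
from dataclasses import dataclass, field

# Hypothetical special tag table: tag name -> fixed content length.
SPECIAL_TAGS = {"script": 0, "style": 0, "img": 1}


@dataclass
class Node:
    tag: str                                      # "#text" marks a text node
    text: str = ""
    children: list = field(default_factory=list)


def units(node):
    """Yield (node, content length) pairs in document order."""
    if node.tag in SPECIAL_TAGS:
        yield node, SPECIAL_TAGS[node.tag]
    elif not node.children:
        yield node, len(node.text)
    else:
        for child in node.children:
            yield from units(child)


def locate(reference, offset):
    """Return (node, position inside node) for a stored offset.

    Stops at the first node where the accumulated length is greater than
    or equal to the offset, then subtracts the value accumulated before
    that node.
    """
    running = 0
    for node, length in units(reference):
        if running + length >= offset:
            return node, offset - running
        running += length
    raise ValueError("offset lies beyond the reference node's content")


# New structured document: different nesting plus an injected <script>.
hello = Node("#text", "Hello ")
main = Node("#text", "world wide web")
new_body = Node("body", children=[
    Node("div", children=[
        Node("section", children=[Node("p", children=[hello])]),
        Node("script", children=[Node("#text", "track();")]),
        Node("p", children=[main]),
    ]),
])

start_node, start_pos = locate(new_body, 12)   # stored start offset
end_node, end_pos = locate(new_body, 16)       # stored end offset
located = start_node.text[start_pos:end_pos] if start_node is end_node else None
```

The stored offsets 12 and 16 resolve to the same characters even though the tree shape differs from the one in which they were computed.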
As a further optimization of the present invention, the number of times the selected content occurs in the reference node, and which of those occurrences the selection is, are also stored on the server as positioning information. When the content located in the new structured document differs from the selected content in the positioning information, repositioning is attempted. The method for repositioning selected content of a structured document has the following steps: a positioning-information reading step of obtaining the positioning information from the server; a retrieval step of searching the reference node of the new structured document for the selected content in the positioning information and recording the number of times it occurs; an occurrence-count comparison step of judging whether the retrieved count equals the number of occurrences of the selected content in the reference node recorded in the positioning information, and, if they are not equal, notifying the user that the content of the new structured document has changed and cannot be repositioned, or, if they are equal, continuing; and a repositioning step of repositioning the selected content according to which occurrence it is, as recorded in the positioning information.
The positioning information obtained from the server in the positioning-information reading step comprises the reference node, the selected content, the offset of the start position of the selected content, the offset of the end position of the selected content, the number of times the selected content occurs in the reference node, and which occurrence the selected content is.
The repositioning method described above further comprises an updating step of calculating the offsets of the start position and end position of the repositioned selected content and updating the positioning information saved on the server.
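The retrieval, occurrence-count comparison, repositioning and updating steps can be sketched in Python. For brevity the sketch assumes the reference node's content has already been flattened to a string whose indices correspond to offsets; the record keys and sample strings are hypothetical.

```python
def find_all(text, needle):
    """Return the start index of every occurrence of `needle` in `text`."""
    positions = []
    i = text.find(needle)
    while i != -1:
        positions.append(i)
        i = text.find(needle, i + 1)
    return positions


def reposition(new_text, record):
    """Return an updated record, or None when repositioning is impossible.

    `record` holds the selected content, its stored offsets, how many times
    it occurred in the reference node, and which occurrence (1-based) the
    user selected.
    """
    hits = find_all(new_text, record["selected"])
    if len(hits) != record["count"]:
        return None  # content changed; the caller would notify the user
    start = hits[record["index"] - 1]
    end = start + len(record["selected"])
    # Updating step: the recomputed offsets replace the saved ones.
    return {**record, "start": start, "end": end}


# Saved against the old text "Hello world wide web, wide open".
saved = {"selected": "wide", "start": 12, "end": 16, "count": 2, "index": 1}

# New document: text was inserted before the selection, shifting its offsets.
new_text = "Intro. Hello world wide web, wide open"
updated = reposition(new_text, saved)

# A document where the selected content occurs a different number of times.
unrecoverable = reposition("wide wide wide", saved)
```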
To solve the above technical problems, the present invention adopts the following technical solutions:
A device for storing selected content of a structured document comprises: a calculation module, which takes one or more arbitrary nodes as the reference node and calculates the offset of the start position and the offset of the end position of the content selected by the user; and a storage module, which stores the reference node, the offset of the start position of the selected content and the offset of the end position of the selected content on a server.
In the device for storing selected content of a structured document described above, the storage module also stores the selected content.
In the method for storing selected content of a structured document described above, the reference node is the root node of the body of the structured document.
In the device for storing selected content of a structured document described above, the storage module also stores the uniform resource locator (URL).
In the device for storing selected content of a structured document described above, the calculation module comprises the following sub-modules: a start-position content-length calculation module, which calculates the content length between the start position of the selected content and the nearest document node; a traversal module, which starts from the reference node and traverses each node before the nearest document node, judging whether the node's tag is a tag in the special tag table, and, if so, taking the value corresponding to that tag in the table as the length of the content in the node, or, if not, obtaining the length of the content in the node; a nearest-document-node offset calculation module, which accumulates the lengths of the content in each node before the nearest document node to obtain the offset of the nearest document node; and a start-position offset calculation module, which adds the offset of the nearest document node to the content length between the start position of the selected content and the nearest document node to obtain the offset of the start position of the selected content.
In the device for storing selected content of a structured document described above, the calculation module comprises the following sub-modules: an end-position content-length calculation module, which calculates the content length between the end position of the selected content and the nearest document node; a traversal module, which starts from the reference node and traverses each node before the nearest document node, judging whether the node's tag is a tag in the special tag table, and, if so, taking the value corresponding to that tag in the table as the length of the content in the node, or, if not, obtaining the length of the content in the node; a node offset calculation module, which accumulates the lengths of the content in each node before the nearest document node to obtain the offset of the nearest document node; and an end-position offset calculation module, which adds the offset of the nearest document node to the content length between the end position of the selected content and the nearest document node to obtain the offset of the end position of the selected content.
A device for locating selected content of a structured document comprises the following modules: an input module, which reads positioning information from the server; and a positioning module, which locates the selected content in the new structured document according to the positioning information.
In the device for locating selected content of a structured document described above, the positioning information read by the input module comprises the reference node, the offset of the start position of the user's selected content and the offset of the end position.
In the device for locating selected content of a structured document described above, the positioning module comprises: a traversal module, which traverses the child nodes of the reference node in the new structured document and, at each node passed, judges whether the node's tag is a tag in the special tag table, and, if so, takes the value corresponding to that tag in the table as the length of the content in the node, or, if not, obtains the length of the content in the node, and accumulates the lengths of the content in the nodes; a content-length obtaining module, which, when the accumulated value first becomes greater than or equal to the offset of the start position of the selected content, obtains the accumulated value of the nearest document node before this node and subtracts it from the offset of the start position to obtain the content length between the start position of the selected content and the nearest document node; a start-position determination module, which obtains the start position of the selected content from said content length; a content-length determination module, which, when the accumulated value first becomes greater than or equal to the offset of the end position of the selected content, obtains the accumulated value of the nearest document node before this node and subtracts it from the offset of the end position to obtain the content length between the end position of the selected content and the nearest document node; an end-position determination module, which obtains the end position of the selected content from said content length; and a selected-content positioning module, which locates the selected content according to its start position and end position.
A device for repositioning selected content of a structured document comprises: a positioning-information reading module, which obtains the positioning information from the server; a retrieval module, which searches the reference node of the new structured document for the selected content in the positioning information and records the number of times it occurs; an occurrence-count comparison module, which judges whether the retrieved count equals the number of occurrences of the selected content in the reference node recorded in the positioning information, and, if they are not equal, notifies the user that the content of the new structured document has changed and cannot be repositioned, or, if they are equal, continues; and a repositioning module, which repositions the selected content according to which occurrence it is, as recorded in the positioning information.
In the repositioning device described above, the positioning information that the positioning-information reading module obtains from the server comprises the reference node, the selected content, the offset of the start position of the selected content, the offset of the end position of the selected content, the number of times the selected content occurs in the reference node, and which occurrence the selected content is.
The repositioning device described above further comprises an update module, which calculates the offsets of the start position and end position of the repositioned selected content and updates the positioning information saved on the server.
The selected content described in this patent may be text, video, pictures and the like.
Compared with the prior art, the above technical solutions improve both the versatility of the implementation and the accuracy of locating. On the one hand, versatility is improved: however differently computing devices such as mobile terminals and general-purpose computers parse the structure of a structured document, the selected content can be located accurately and stably, and differences in how browsers parse HTML documents are effectively overcome. On the other hand, locating accuracy is improved: adding or removing tags whose length in the special tag table is zero does not affect locating, and when the structured document is changed in a way that does not involve the selected content, the content can be relocated intelligently.
Brief description of the drawings
The above and other objects, features and advantages of the application will be apparent from the following description of the preferred embodiments, which illustrate the gist of the application and its use, and from the accompanying drawings, in which:
Fig. 1 is a flowchart of the method for storing selected content of a structured document;
Fig. 2 is a flowchart of the method for locating selected content of a structured document;
Fig. 3 is a flowchart of the method for repositioning selected content of a structured document;
Fig. 4 is a structural diagram of the device for storing selected content of a structured document;
Fig. 5 is a structural diagram of the device for locating selected content of a structured document;
Fig. 6 is a structural diagram of the device for repositioning selected content of a structured document;
Fig. 7 is a flowchart of an embodiment of storing and locating selected content.
Detailed description of the embodiments
The technical solution of the present invention is described in further detail below with reference to the accompanying drawings:
Fig. 1 is a flowchart of the method for storing selected content of a structured document, which comprises the following steps. A calculation step takes the root node of the body of the structured document as the reference node and calculates the offset of the start position and the offset of the end position of the content selected by the user. It comprises: a start-position content-length calculation step of calculating the content length between the start position of the selected content and the nearest document node; a traversal step of starting from the reference node and traversing each node before the nearest document node, judging whether the node's tag is a tag in the special tag table, and, if so, taking the value corresponding to that tag in the table as the length of the content in the node, or, if not, obtaining the length of the content in the node; a nearest-document-node offset calculation step of accumulating the lengths of the content in each node before the nearest document node to obtain the offset of the nearest document node; a start-position offset calculation step of adding the offset of the nearest document node to the content length between the start position of the selected content and the nearest document node to obtain the offset of the start position of the selected content; an end-position content-length calculation step of calculating the content length between the end position of the selected content and the nearest document node; a traversal step of starting from the reference node and traversing each node before the nearest document node, judging whether the node's tag is a tag in the special tag table, and, if so, taking the value corresponding to that tag in the table as the length of the content in the node, or, if not, obtaining the length of the content in the node; a node offset calculation step of accumulating the lengths of the content in each node before the nearest document node to obtain the offset of the nearest document node; and an end-position offset calculation step of adding the offset of the nearest document node to the content length between the end position of the selected content and the nearest document node to obtain the offset of the end position of the selected content. A storing step then stores the uniform resource locator (URL) of the structured document, the reference node, the selected content, the offset of the start position of the selected content and the offset of the end position of the selected content on the server.
Fig. 2 is a flowchart of the method for locating selected content of a structured document, which comprises the following steps. An input step reads from the server positioning information such as the reference node, the offset of the start position of the user's selected content and the offset of the end position. A positioning step locates the selected content in the new structured document according to the positioning information. It comprises: a traversal step of traversing, in the new structured document, the child nodes of the reference node and, at each node passed, judging whether the node's tag is a tag in the special tag table, and, if so, taking the value corresponding to that tag in the table as the length of the content in the node, or, if not, obtaining the length of the content in the node, and accumulating the lengths of the content in the nodes; a content-length obtaining step of, when the accumulated value first becomes greater than or equal to the offset of the start position of the selected content, obtaining the accumulated value of the nearest document node before this node and subtracting it from the offset of the start position to obtain the content length between the start position of the selected content and the nearest document node; a start-position determining step of obtaining the start position of the selected content from said content length; a content-length determining step of, when the accumulated value first becomes greater than or equal to the offset of the end position of the selected content, obtaining the accumulated value of the nearest document node before this node and subtracting it from the offset of the end position to obtain the content length between the end position of the selected content and the nearest document node; an end-position determining step of obtaining the end position of the selected content from said content length; and a selected-content positioning step of locating the selected content according to its start position and end position.
Fig. 3 is a flowchart of the method for repositioning selected content of a structured document, which comprises the following steps: a positioning-information reading step of obtaining from the server the reference node, the selected content, the offset of the start position of the selected content, the offset of the end position of the selected content, the number of times the selected content occurs in the reference node, and which occurrence the selected content is; a retrieval step of searching the reference node of the new structured document for the selected content in the positioning information and recording the number of times it occurs; an occurrence-count comparison step of judging whether the retrieved count equals the number of occurrences of the selected content in the reference node recorded in the positioning information, and, if they are not equal, notifying the user that the content of the new structured document has changed and cannot be repositioned, or, if they are equal, continuing; a repositioning step of repositioning the selected content according to which occurrence it is, as recorded in the positioning information; and an updating step of calculating the offsets of the start position and end position of the repositioned selected content and updating the positioning information saved on the server.
Fig. 4 is a structural diagram of the device for storing selected content of a structured document. The device comprises: a calculation module, which takes the root node of the body of the structured document as the reference node and calculates the offset of the start position and the offset of the end position of the content selected by the user; and a storage module, which stores the uniform resource locator (URL), the reference node, the selected content, the offset of the start position of the selected content and the offset of the end position of the selected content on the server. The calculation module comprises the following sub-modules: a start-position content-length calculation module, which calculates the content length between the start position of the selected content and the nearest document node; a first traversal module, which starts from the reference node and traverses each node before the nearest document node, judging whether the node's tag is a tag in the special tag table, and, if so, taking the value corresponding to that tag in the table as the length of the content in the node, or, if not, obtaining the length of the content in the node; a nearest-document-node offset calculation module, which accumulates the lengths of the content in each node before the nearest document node to obtain the offset of the nearest document node; a start-position offset calculation module, which adds the offset of the nearest document node to the content length between the start position of the selected content and the nearest document node to obtain the offset of the start position of the selected content; an end-position content-length calculation module, which calculates the content length between the end position of the selected content and the nearest document node; a second traversal module, which starts from the reference node and traverses each node before the nearest document node, judging whether the node's tag is a tag in the special tag table, and, if so, taking the value corresponding to that tag in the table as the length of the content in the node, or, if not, obtaining the length of the content in the node; a node offset calculation module, which accumulates the lengths of the content in each node before the nearest document node to obtain the offset of the nearest document node; and an end-position offset calculation module, which adds the offset of the nearest document node to the content length between the end position of the selected content and the nearest document node to obtain the offset of the end position of the selected content.
Fig. 5 is a structural diagram of the device for locating the selected content of a structured document. The device comprises an input module, which reads from the server locating information such as the reference node and the offsets of the start position and the end position of the content selected by the user, and a locating module, which locates the selected content in the new structured document according to the locating information. The locating module comprises: a traversal module, which traverses the child nodes of the reference node in the new structured document and, for each node visited, judges whether the node's tag is a tag in the special tag table (if so, the value that the table assigns to the tag is used as the length of the content in the node; if not, the length of the content in the node is obtained) and accumulates the content lengths; a content length acquisition module, which, when the accumulated value first becomes greater than or equal to the offset of the start position of the selected content, obtains the accumulated value of the nearest document node before the current node and subtracts it from the offset of the start position to obtain the length of the content between the start position and the nearest document node; a start position determination module, which obtains the start position of the selected content from that content length; a content length determination module, which, when the accumulated value first becomes greater than or equal to the offset of the end position of the selected content, obtains the accumulated value of the nearest document node before the current node and subtracts it from the offset of the end position to obtain the length of the content between the end position and the nearest document node; an end position determination module, which obtains the end position of the selected content from that content length; and a selected content locating module, which locates the selected content according to its start position and end position.
Fig. 6 is a structural diagram of the device for relocating the selected content of a structured document. The device comprises: a locating information reading module, which obtains from the server the locating information, namely the reference node, the selected content, the offset of the start position of the selected content, the offset of its end position, the number of times the selected content occurs in the reference node, and which of those occurrences the selection is; a retrieval module, which searches for the selected content within the reference node of the new structured document and records its number of occurrences; an occurrence comparison module, which judges whether the number of occurrences found equals the number of occurrences recorded in the locating information (if not, the user is notified that the content of the new structured document has changed and relocation is impossible; if so, operation continues); a relocation module, which relocates the selected content according to which occurrence it is, as recorded in the locating information; and an update module, which calculates the offsets of the start position and the end position of the relocated content and updates them in the locating information saved on the server.
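By way of illustration only, the locating information exchanged with the server can be pictured as a small record. The following Python sketch is an assumption rather than part of the disclosure: the class name `LocatingInfo`, its field names and the JSON encoding are all hypothetical. The sample values are those of the worked example given later in this description, with the selected content written as the four Chinese characters "有效节点" ("effective node").

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class LocatingInfo:
    """Hypothetical record of the locating information saved on the server."""
    url: str                # URL of the structured document
    reference_node: str     # node that the offsets are measured from
    selected_content: str   # text the user selected
    start_offset: int       # offset of the start position of the selection
    end_offset: int         # offset of the end position of the selection
    occurrence_count: int   # occurrences of the selection in the reference node
    occurrence_index: int   # which occurrence the selection is (1-based)


def to_json(info: LocatingInfo) -> str:
    # ensure_ascii=False keeps non-ASCII content readable in the payload
    return json.dumps(asdict(info), ensure_ascii=False)


def from_json(payload: str) -> LocatingInfo:
    return LocatingInfo(**json.loads(payload))


info = LocatingInfo(
    url="http://xxxxx.xxx",
    reference_node="body",
    selected_content="有效节点",
    start_offset=63,
    end_offset=71,
    occurrence_count=4,
    occurrence_index=3,
)
restored = from_json(to_json(info))
```

Storing the occurrence count and occurrence index alongside the offsets is what allows the relocation device of Fig. 6 to recover the selection when the offsets alone no longer match.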
As shown in Fig. 7, the invention discloses a method for storing and locating the selected content of structured documents such as web pages, comprising the following steps: Step 1), the user visits a web page and selects content; Step 2), taking the root node of the web page body as the reference node, the offsets of the start position and the end position of the selected content are calculated; Step 3), the locating information is saved to the server; Step 4), the user visits the web page again; Step 5), the locating information is obtained from the server; Step 6), the selected content is located in the new web page according to the offsets in the locating information; Step 7), it is judged whether the located content is identical to the selected content in the locating information: if so, locating ends; if not, step 8) is performed; Step 8), the selected content in the locating information is searched for within the reference node of the new web page, and its number of occurrences is recorded; Step 9), it is judged whether the number of occurrences found equals the number of occurrences of the selected content in the reference node recorded in the locating information: if not, the user is notified that the content of the new web page has changed and relocation is impossible, and locating ends; if so, step 10) is performed; Step 10), the selected content is relocated according to which occurrence it is, as recorded in the locating information; Step 11), the offsets of the start position and the end position of the relocated content are calculated and updated in the locating information saved on the server, and locating ends.
In the following examples, the byte is used as the unit of content offset: a Chinese character has a length of two bytes and an English character has a length of one byte.
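The length unit stated above can be sketched as a small function. This is a minimal illustration under an assumption: every non-ASCII character is counted as two units, which matches a double-byte encoding such as GBK for Chinese characters but not UTF-8, where most Chinese characters occupy three bytes. The function name `content_length` is hypothetical.

```python
def content_length(text: str) -> int:
    """Length in the example's units: 1 per ASCII character, 2 per other character."""
    return sum(1 if ord(ch) < 128 else 2 for ch in text)


ascii_part = content_length("This is ")   # eight one-byte characters
chinese_part = content_length("第三个")    # three two-byte characters
```

Counting explicitly rather than encoding the string keeps the sketch independent of which codecs are available at run time.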
Suppose that the URL of a web page is http://xxxxx.xxx and that its HTML code is as follows:
<html>
<body>
<script type="text/javascript">
document.write("<h1>Hello World!</h1>")
</script>
<p id=1>在第一个有效节点中</p>
<img src="/i/eg_mouse.jpg" width="128" height="128">
<p id=2><i>在第二个有效节点的第一子节点中</i></p>
<p id=3>This is 第三个有效</p>
<p id=4>节点在第四个有效节点</p>
</body>
</html>
The content selected by the user is the four characters "有效节点" ("effective node"): "有效" at the end of the p node whose id is 3 and "节点" at the start of the p node whose id is 4. The system first saves the following to the server as locating information: the web page URL "http://xxxxx.xxx", the reference node "<body>", the selected content "有效节点", the offset of the start position of the selected content, the offset of its end position, the number of times "4" that the selected content occurs in the reference node, and which occurrence "3" the selection is. The offsets of the start position and the end position are calculated as follows:
Step 1), the length of the content "This is 第三个" ("This is the third") between the start position of the selected content "有效节点" and the nearest document node (that is, DOM node, the same below), the p node whose id is 3, is calculated to be 14.
Step 2), starting from the reference node <body>, each node before the p node whose id is 3 is traversed, and it is judged whether the node's tag is a tag in the special tag table; if so, the value that the table assigns to the tag is used as the length of the content in the node; if not, the length of the content in the node is obtained. The first node is the <script> node; <script> is a tag in the special tag table with a value of 0, so the content length of this node is taken as 0. The second node is the p node whose id is 1; <p> is not a tag in the special tag table, and the length of its content "在第一个有效节点中" ("in the first effective node") is 18. The third node is the <img> node; <img> is a tag in the special tag table with a value of 1, so the length of this node is taken as 1. The fourth node is the p node whose id is 2; <p> is not a tag in the special tag table, and the length of the node's own content is 0. The fifth node is <i>, the child node of the p node whose id is 2; <i> is not a tag in the special tag table, and the length of its content "在第二个有效节点的第一子节点中" ("in the first child node of the second effective node") is 30.
Step 3), the content lengths of the nodes before the p node whose id is 3 are added up, that is, 0+18+1+0+30, giving 49 as the offset of the p node whose id is 3.
Step 4), the offset 49 of the p node whose id is 3 is added to the content length 14 between the start position of the selected content and that node, giving 63 as the offset of the start position of the selected content.
Step 5), the length of the content "节点" ("node") between the end position of the selected content "有效节点" and the nearest DOM node, the p node whose id is 4, is calculated to be 4.
Step 6), starting from the reference node, each node before the p node whose id is 4 is traversed in the same way. The first five nodes and their lengths (0, 18, 1, 0 and 30) are as in step 2). The sixth node is the p node whose id is 3; <p> is not a tag in the special tag table, and the length of its content "This is 第三个有效" ("This is the third effective") is 18.
Step 7), the content lengths of the nodes before the p node whose id is 4 are added up, that is, 0+18+1+0+30+18, giving 67 as the offset of the p node whose id is 4.
Step 8), the offset 67 of the p node whose id is 4 is added to the content length 4 between the end position of the selected content and that node, giving 71 as the offset of the end position of the selected content.
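The arithmetic of steps 1) to 8) can be reproduced with a short Python sketch. It is offered only as an illustration under stated assumptions: the document is modelled as a flat list of (tag, own text) pairs in traversal order rather than as a real DOM, the special tag table contains only the two entries used in the example, and the Chinese node texts are those implied by the lengths quoted above. The names `SPECIAL_TAGS`, `node_length`, `node_offset` and `position_offset` are hypothetical.

```python
# Assumed special tag table: tag name -> fixed content length (values from the example)
SPECIAL_TAGS = {"script": 0, "img": 1}


def content_length(text):
    """1 unit per ASCII character, 2 units per other character (assumed encoding)."""
    return sum(1 if ord(ch) < 128 else 2 for ch in text)


def node_length(tag, text):
    """Table value for special tags, otherwise the length of the node's own text."""
    return SPECIAL_TAGS[tag] if tag in SPECIAL_TAGS else content_length(text)


def node_offset(nodes, index):
    """Sum of the lengths of all nodes that precede nodes[index]."""
    return sum(node_length(tag, text) for tag, text in nodes[:index])


def position_offset(nodes, index, text_before):
    """Offset of a position inside nodes[index], text_before being the text ahead of it."""
    return node_offset(nodes, index) + content_length(text_before)


# Nodes under <body> in traversal order, each with its own text only
NODES = [
    ("script", 'document.write("<h1>Hello World!</h1>")'),
    ("p", "在第一个有效节点中"),
    ("img", ""),
    ("p", ""),  # p id=2 has no text of its own; its text is in the <i> child
    ("i", "在第二个有效节点的第一子节点中"),
    ("p", "This is 第三个有效"),
    ("p", "节点在第四个有效节点"),
]

start_offset = position_offset(NODES, 5, "This is 第三个")  # selection starts in p id=3
end_offset = position_offset(NODES, 6, "节点")              # selection ends in p id=4
print(start_offset, end_offset)
```

Under these assumptions the sketch prints 63 and 71, the offsets obtained in steps 4) and 8).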
When the user needs to reproduce the selected content "有效节点" ("effective node"), the locating information is first obtained from the server, and the selected content is then located in the new web page according to the locating information, as follows:
Step 1), the following is obtained from the locating information: the web page URL "http://xxxxx.xxx", the reference node "<body>", the selected content "有效节点", the offset 63 of the start position of the selected content and the offset 71 of its end position.
Step 2), in the newly opened web page "http://xxxxx.xxx", the child nodes of the reference node "<body>" are traversed; for each node visited, it is judged whether the node's tag is a tag in the special tag table; if so, the value that the table assigns to the tag is used as the length of the content in the node; if not, the length of the content in the node is obtained; and the content lengths are accumulated. The first node is the <script> node; <script> is a tag in the special tag table with a value of 0, so the content length of this node is taken as 0 and the accumulated length is 0. The second node is the p node whose id is 1; <p> is not a tag in the special tag table, the length of its content "在第一个有效节点中" is 18, and the accumulated length is 18. The third node is the <img> node; <img> is a tag in the special tag table with a value of 1, so the length of this node is taken as 1 and the accumulated length is 19. The fourth node is the p node whose id is 2; <p> is not a tag in the special tag table, the length of the node's own content is 0, and the accumulated length is 19. The fifth node is <i>, the child node of the p node whose id is 2; <i> is not a tag in the special tag table, the length of its content "在第二个有效节点的第一子节点中" is 30, and the accumulated length is 49. The sixth node is the p node whose id is 3; <p> is not a tag in the special tag table, the length of its content "This is 第三个有效" is 18, and the accumulated length is 67. The seventh node is the p node whose id is 4; <p> is not a tag in the special tag table, the length of its content "节点在第四个有效节点" ("node in the fourth effective node") is 20, and the accumulated length is 87.
Step 3), the accumulated value first becomes greater than or equal to the offset 63 of the start position of the selected content while the sixth node is being traversed; the accumulated value 49 of the fifth node is obtained and subtracted from the offset 63, giving 14 as the length of the content between the start of the sixth node and the start position of the selected content.
Step 4), the start position of the selected content is obtained.
Step 5), the accumulated value first becomes greater than or equal to the offset 71 of the end position of the selected content while the seventh node is being traversed; the accumulated value 67 of the sixth node is obtained and subtracted from the offset 71, giving 4 as the length of the content between the start of the seventh node and the end position of the selected content.
Step 6), the end position of the selected content is obtained.
Step 7), the selected content "有效节点" is located according to its start position and end position.
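The traversal and subtraction of steps 2) to 7) can be sketched in Python as follows. This is an illustrative sketch only, using a flat list of (tag, own text) pairs in place of a real DOM, a two-entry special tag table and node texts in Chinese as implied by the quoted lengths; `locate`, `char_index` and `extract` are hypothetical helper names.

```python
SPECIAL_TAGS = {"script": 0, "img": 1}  # assumed table, values from the example


def content_length(text):
    """1 unit per ASCII character, 2 units per other character (assumed encoding)."""
    return sum(1 if ord(ch) < 128 else 2 for ch in text)


def node_length(tag, text):
    return SPECIAL_TAGS[tag] if tag in SPECIAL_TAGS else content_length(text)


def locate(nodes, offset):
    """Return (node index, length inside that node) for a stored offset."""
    accumulated = 0
    for index, (tag, text) in enumerate(nodes):
        previous = accumulated
        accumulated += node_length(tag, text)
        if accumulated >= offset:  # first node whose accumulated value reaches the offset
            return index, offset - previous
    raise ValueError("offset lies beyond the end of the reference node")


def char_index(text, inner_length):
    """Convert a length in units into a character index within text."""
    used = 0
    for i, ch in enumerate(text):
        if used >= inner_length:
            return i
        used += content_length(ch)
    return len(text)


def extract(nodes, start, end):
    """Text between two stored offsets, skipping special-tag nodes."""
    s_idx, s_len = locate(nodes, start)
    e_idx, e_len = locate(nodes, end)
    pieces = []
    for index in range(s_idx, e_idx + 1):
        tag, text = nodes[index]
        if tag in SPECIAL_TAGS:
            continue
        lo = char_index(text, s_len) if index == s_idx else 0
        hi = char_index(text, e_len) if index == e_idx else len(text)
        pieces.append(text[lo:hi])
    return "".join(pieces)


NODES = [
    ("script", 'document.write("<h1>Hello World!</h1>")'),
    ("p", "在第一个有效节点中"),
    ("img", ""),
    ("p", ""),
    ("i", "在第二个有效节点的第一子节点中"),
    ("p", "This is 第三个有效"),
    ("p", "节点在第四个有效节点"),
]

start_position = locate(NODES, 63)  # sixth node (index 5), 14 units in
end_position = locate(NODES, 71)    # seventh node (index 6), 4 units in
located = extract(NODES, 63, 71)
```

The zero-based indices 5 and 6 correspond to the sixth and seventh nodes of the description, and the inner lengths 14 and 4 are the values obtained in steps 3) and 5).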
Suppose that the content of the web page changes, and that the HTML code after the change is as follows:
<html>
<body>
<script type="text/javascript">
document.write("<h1>Hello World!</h1>")
</script>
<p id=1>第一个有效节点</p>
<img src="/i/eg_mouse.jpg" width="128" height="128">
<p id=2><i>在第二个有效节点的第一子节点中</i></p>
<p id=3>This is 第三个有效</p>
<p id=4>节点在第四个有效节点</p>
</body>
</html>
Now the content located with the stored offsets of the start position and the end position is "节点在第" at the start of the p node whose id is 4, which differs from the selected content "有效节点" ("effective node") in the locating information, so relocation is attempted as follows: Step 1), the locating information is obtained from the server: the web page URL "http://xxxxx.xxx", the reference node "<body>", the selected content "有效节点", the start position offset 63, the end position offset 71, the number of times "4" that the selected content occurs in the reference node, and which occurrence "3" the selection is; Step 2), the selected content "有效节点" is searched for within the reference node "<body>" of the newly opened web page "http://xxxxx.xxx" and is found to occur 4 times in total; Step 3), the computer determines that the number of occurrences found, 4, equals the number of occurrences "4" recorded in the locating information; Step 4), the third occurrence of "有效节点" found in the search is relocated as the selected content; Step 5), the offset of the start position of the relocated content is calculated to be 59 and the offset of its end position 67, and these are updated in the locating information saved on the server.
Suppose that the content of the web page changes again, and that the HTML code after the change is as follows:
<html>
<body>
<script type="text/javascript">
document.write("<h1>Hello World!</h1>")
</script>
<p id=1>第一个有效节点</p>
<img src="/i/eg_mouse.jpg" width="128" height="128">
<p id=2><i>在第二个有节点的第一子节点中</i></p>
<p id=3>This is 第三个有效</p>
<p id=4>节点在第四个有效节点</p>
</body>
</html>
Now the content located with the stored offsets of the start position and the end position is "点在第四" in the p node whose id is 4, which differs from the selected content "有效节点" ("effective node") in the locating information, so relocation is attempted as follows: Step 1), the locating information is obtained from the server: the web page URL "http://xxxxx.xxx", the reference node "<body>", the selected content "有效节点", the start position offset 63, the end position offset 71, the number of times "4" that the selected content occurs in the reference node, and which occurrence "3" the selection is; Step 2), the selected content "有效节点" is searched for within the reference node "<body>" of the newly opened web page "http://xxxxx.xxx" and is found to occur 3 times in total; Step 3), the computer determines that the number of occurrences found, "3", does not equal the number of occurrences "4" recorded in the locating information, so the user is notified that the content of the new web page has changed and relocation is impossible.
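The two relocation attempts described above, one succeeding and one failing, can be sketched in Python. The sketch is an assumption-laden illustration: the document is a flat list of (tag, own text) pairs, each special-tag node is replaced by a placeholder of the table's length so that offsets stay aligned, and the node texts are written in Chinese as implied by the quoted lengths. `flatten`, `relocate` and `make_nodes` are hypothetical names.

```python
SPECIAL_TAGS = {"script": 0, "img": 1}  # assumed table, values from the example
PLACEHOLDER = "\x00"  # one-unit stand-in for the content of a special-tag node


def content_length(text):
    """1 unit per ASCII character, 2 units per other character (assumed encoding)."""
    return sum(1 if ord(ch) < 128 else 2 for ch in text)


def flatten(nodes):
    """Concatenate node texts, replacing special-tag nodes by placeholders."""
    parts = []
    for tag, text in nodes:
        parts.append(PLACEHOLDER * SPECIAL_TAGS[tag] if tag in SPECIAL_TAGS else text)
    return "".join(parts)


def relocate(nodes, selected, stored_count, which):
    """Return updated (start, end) offsets, or None when relocation is impossible."""
    flat = flatten(nodes)
    positions = []
    found = flat.find(selected)
    while found != -1:
        positions.append(found)
        found = flat.find(selected, found + 1)
    if len(positions) != stored_count:
        return None  # occurrence counts differ: the content has changed
    start = content_length(flat[:positions[which - 1]])
    return start, start + content_length(selected)


def make_nodes(first_p, i_text):
    """Build the example's node list with the two texts that vary between versions."""
    return [
        ("script", 'document.write("<h1>Hello World!</h1>")'),
        ("p", first_p),
        ("img", ""),
        ("p", ""),
        ("i", i_text),
        ("p", "This is 第三个有效"),
        ("p", "节点在第四个有效节点"),
    ]


first_change = make_nodes("第一个有效节点", "在第二个有效节点的第一子节点中")
second_change = make_nodes("第一个有效节点", "在第二个有节点的第一子节点中")

updated = relocate(first_change, "有效节点", stored_count=4, which=3)
failed = relocate(second_change, "有效节点", stored_count=4, which=3)
```

Searching the concatenated text rather than each node separately is what lets the third occurrence, which straddles the p nodes with ids 3 and 4, be counted.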
The method and system for storing and locating selected content on web pages provided by the present invention have been described in detail above. In the description and claims, the numbering used for steps when describing a method does not indicate the order of the steps, unless otherwise specified or uniquely determined by the context. Specific examples have been used herein to set forth the principle and embodiments of the present invention, and the above description of the embodiments is intended only to help in understanding the method of the present invention and its core idea. It should be noted that a person of ordinary skill in the art may make improvements and modifications to the present invention without departing from its principle, and such improvements and modifications also fall within the scope of protection of the claims of the present invention.

Claims (10)

1. A method for storing selected content of a structured document, comprising the following steps:
a calculation step of taking one or more arbitrary nodes as the reference node and calculating the offset of the start position and the offset of the end position of the content selected by the user;
a storing step of storing the reference node, the offset of the start position of the selected content and the offset of the end position of the selected content on a server.
2. The method for storing selected content of a structured document as claimed in claim 1, wherein the calculation step comprises the following sub-steps:
a start-position content length calculation step of calculating the length of the content between the start position of the selected content and the nearest document node;
a traversal step of, starting from the reference node, traversing each node before the nearest document node and judging whether the node's tag is a tag in a special tag table; if so, using the value that the table assigns to the tag as the length of the content in the node; if not, obtaining the length of the content in the node;
a nearest-document-node offset calculation step of adding up the content lengths of the nodes before the nearest document node to obtain the offset of the nearest document node;
a start-position offset calculation step of adding the offset of the nearest document node to the length of the content between the start position of the selected content and the nearest document node to obtain the offset of the start position of the selected content.
3. The method for storing selected content of a structured document as claimed in claim 1, wherein the calculation step comprises the following sub-steps:
an end-position content length calculation step of calculating the length of the content between the end position of the selected content and the nearest document node;
a traversal step of, starting from the reference node, traversing each node before the nearest document node and judging whether the node's tag is a tag in a special tag table; if so, using the value that the table assigns to the tag as the length of the content in the node; if not, obtaining the length of the content in the node;
a node offset calculation step of adding up the content lengths of the nodes before the nearest document node to obtain the offset of the nearest document node;
an end-position offset calculation step of adding the offset of the nearest document node to the length of the content between the end position of the selected content and the nearest document node to obtain the offset of the end position of the selected content.
4. A method for locating selected content of a structured document, comprising the following steps:
an input step of reading locating information from a server;
a locating step of locating the selected content in a new structured document according to the locating information.
5. The method for locating selected content of a structured document as claimed in claim 4, wherein the locating step comprises:
a traversal step of traversing the child nodes of the reference node in the new structured document and, for each node visited, judging whether the node's tag is a tag in a special tag table; if so, using the value that the table assigns to the tag as the length of the content in the node; if not, obtaining the length of the content in the node; and accumulating the content lengths;
a content length obtaining step of, when the accumulated value first becomes greater than or equal to the offset of the start position of the selected content, obtaining the accumulated value of the nearest document node before the current node and subtracting it from the offset of the start position to obtain the length of the content between the start position of the selected content and the nearest document node;
a start position determining step of obtaining the start position of the selected content from said content length;
a content length determining step of, when the accumulated value first becomes greater than or equal to the offset of the end position of the selected content, obtaining the accumulated value of the nearest document node before the current node and subtracting it from the offset of the end position to obtain the length of the content between the end position of the selected content and the nearest document node;
an end position determining step of obtaining the end position of the selected content from said content length;
a selected content locating step of locating the selected content according to its start position and end position.
6. A method for relocating selected content of a structured document, comprising the following steps:
a locating information reading step of obtaining locating information from a server;
a retrieval step of searching for the selected content in the locating information within the reference node of a new structured document and recording its number of occurrences;
an occurrence comparison step of judging whether the number of occurrences found equals the number of occurrences of the selected content in the reference node recorded in the locating information; if not, notifying the user that the content of the new structured document has changed and relocation is impossible; if so, continuing operation;
a relocation step of relocating the selected content according to which occurrence it is, as recorded in the locating information.
7. The method for relocating selected content as claimed in claim 6, wherein the locating information obtained from the server in the locating information reading step comprises the reference node, the selected content, the offset of the start position of the selected content, the offset of the end position of the selected content, the number of times the selected content occurs in the reference node, and which of those occurrences the selection is.
8. A device for storing selected content of a structured document, comprising:
a computing module, which takes one or more arbitrary nodes as the reference node and calculates the offset of the start position and the offset of the end position of the content selected by the user;
a storage module, which stores the reference node, the offset of the start position of the selected content and the offset of the end position of the selected content on a server.
9. A device for locating selected content of a structured document, comprising:
an input module, which reads locating information from a server;
a locating module, which locates the selected content in a new structured document according to the locating information.
10. A device for relocating selected content of a structured document, comprising:
a locating information reading module, which obtains locating information from a server;
a retrieval module, which searches for the selected content in the locating information within the reference node of a new structured document and records its number of occurrences;
an occurrence comparison module, which judges whether the number of occurrences found equals the number of occurrences of the selected content in the reference node recorded in the locating information; if not, the user is notified that the content of the new structured document has changed and relocation is impossible; if so, operation continues;
a relocation module, which relocates the selected content according to which occurrence it is, as recorded in the locating information.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410300699.3A CN104036026B (en) 2014-06-27 2014-06-27 Methods and systems for storing and positioning selected content of structured document

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201410300699.3A CN104036026B (en) 2014-06-27 2014-06-27 Methods and systems for storing and positioning selected content of structured document

Publications (2)

Publication Number Publication Date
CN104036026A true CN104036026A (en) 2014-09-10
CN104036026B CN104036026B (en) 2018-02-23

Family

ID=51466796

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201410300699.3A Active CN104036026B (en) 2014-06-27 2014-06-27 Methods and systems for storing and positioning selected content of structured document

Country Status (1)

Country Link
CN (1) CN104036026B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104090947A (en) * 2014-07-03 2014-10-08 无锡市崇安区科技创业服务中心 Method for storing and locating selected content on web page

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1949225A (en) * 2006-11-23 2007-04-18 金蝶软件(中国)有限公司 XML file preprocessing method, apparatus, file structure, reading method and device
CN101271474A (en) * 2007-03-20 2008-09-24 株式会社东芝 System for and method of searching structured documents using indexes
CN102254009A (en) * 2011-07-15 2011-11-23 福建星网锐捷通讯股份有限公司 Method for extracting data of webpage table
CN103605675A (en) * 2013-10-30 2014-02-26 北京京东尚科信息技术有限公司 XML (extensive markup language) path expression extracting method and device
CN103635897A (en) * 2011-06-23 2014-03-12 微软公司 Dynamically updating a running page

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
匿名: "jQuery offset,position,offsetParent,scrollLeft,scrollTop html 控件定位 css position属", 《DESERT3.ITEYE.COM/BLOG/1561965》 *

Also Published As

Publication number Publication date
CN104036026B (en) 2018-02-23

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C53 Correction of patent of invention or patent application
CB02 Change of applicant information

Address after: Yuhua Road, Qinhuai District of Nanjing City, Jiangsu province 210000 No. 22 treasure garden 22-302

Applicant after: Wu Taojun

Address before: 200000 West Yan'an Road 900 Road, Changning District, Shanghai

Applicant before: Wu Taojun

GR01 Patent grant