CN105740475A - Web page transformation method and system - Google Patents
Web page transformation method and system
- Publication number
- CN105740475A (application number CN201610154451.XA)
- Authority
- CN
- China
- Prior art keywords
- page
- object model
- unit
- document object
- subtree
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/958—Organisation or management of web site content, e.g. publishing, maintaining pages or automatic linking
Landscapes
- Engineering & Computer Science (AREA)
- Databases & Information Systems (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Document Processing Apparatus (AREA)
- Information Transfer Between Computers (AREA)
Abstract
The invention discloses a web page transformation method and system. The method comprises: processing an input HTML file with the HTML document object model to obtain a document object model tree; performing subtree processing on the document object model tree and obtaining, from the processing result, the page information units corresponding to the subtrees; performing similarity processing on the page information units and establishing a mapping relation between them according to the similarity result; and injecting the information of the page to be transformed into a target page along the mapping relation, completing the page transformation. The method and system lower the UI design skill required of a web page designer: the designer only needs to make slight modifications and adjustments to the transformed page, can select a web page template visually, and the generated page can satisfy users' preferences to the greatest extent.
Description
Technical field
The present invention relates to the field of Internet technology, and in particular to a web page conversion method and system.
Background
In recent years, with the spread of the Internet and the increasing security and specialization of electronic payment technology, online shopping has become an important way for residents to shop. Companies such as Taobao, Tmall, Suning, and Gome have emerged one after another, driving the development of the e-commerce industry. As demand keeps growing, more and more merchants have opened online stores, which creates great demand for online-store web page design. In fact, the large number of existing online stores are excellent design samples; a method that could quickly build one's own online store from these web page templates would reduce design costs considerably. Unfortunately, there is not yet an effective means of quickly building a web page from a template. The fast template conversion methods that currently exist are mainly page conversion methods based on peer templates, such as one-click skin changing in QQ Space and in blogs, and page reconstruction methods based on color transfer.
However, page conversion methods based on peer templates require the templates of the original page and the target page to correspond strictly. This guarantees that information is injected accurately, but it also means that the overall layout, widget layout, and overall framework are unchanged after conversion. Such conversion stays at the surface level of color and style and does not generate a web page with a different design, so it cannot be regarded as template conversion in the true sense. Page reconstruction based on color transfer only converts color, not the template, and likewise cannot generate a new page that applies another design.
Summary of the invention
The object of the present invention is to overcome the deficiencies of the prior art by providing a web page conversion method and system that lowers the basic UI design skill required of web designers. The converted page needs only slight modification and adjustment by the designer, and the designer can select a template page intuitively, so that the generated page meets the user's preferences to the greatest extent.
To solve the above technical problem, an embodiment of the present invention provides a web page conversion method, the method comprising:
Processing the input HTML file using the HTML document object model to obtain a document object model tree;
Performing subtree processing on the document object model tree and obtaining, according to the processing result, the page information unit corresponding to each subtree;
Performing similarity processing on the page information units and establishing the mapping relations between the page information units according to the similarity result;
Injecting the information of the page to be converted into the target page along the mapping relations, completing the page conversion.
Preferably, the HTML file includes the HTML file to be converted and the target HTML file.
Preferably, performing subtree processing on the document object model tree and obtaining, according to the processing result, the page information unit corresponding to each subtree comprises:
Traversing the document object model tree to obtain its leaf nodes;
Determining that the current depth of a leaf node is d and, based on the depth d, extending upward to the node at depth d-1;
Computing the occurrence frequency, in the document object model tree, of the subtree rooted at the node at depth d-1;
If the occurrence frequency is not less than a threshold, continuing to extend from the leaf node; if the occurrence frequency is less than the threshold, outputting the subtree as a document object model subtree;
Obtaining the page information unit corresponding to the document object model subtree.
Preferably, performing similarity processing on the page information units and establishing the mapping relations between them according to the similarity result comprises:
Performing similarity processing on the page information units to find the two page information units with the highest similarity;
Associating the two page information units with the highest similarity to obtain the mapping relations between page information units.
Preferably, injecting the information of the page to be converted into the target page along the mapping relations between the page information units comprises:
Processing the document object model subtree to obtain its minimal repeating unit;
Injecting the information of the page to be converted into the minimal repeating unit of the target page along the mapping relations to obtain the converted minimal repeating unit;
Performing page conversion according to the converted minimal repeating unit, completing the page conversion.
Correspondingly, an embodiment of the present invention further provides a web page conversion system, the system comprising:
Document processing module: for processing the input HTML file using the HTML document object model to obtain a document object model tree;
Page information unit acquisition module: for performing subtree processing on the document object model tree and obtaining, according to the processing result, the page information unit corresponding to each subtree;
Mapping establishment module: for performing similarity processing on the page information units and establishing the mapping relations between them according to the similarity result;
Page conversion module: for injecting the information of the page to be converted into the target page along the mapping relations, completing the page conversion.
Preferably, the HTML file includes the HTML file to be converted and the target HTML file.
Preferably, the page information unit acquisition module comprises:
Traversal processing unit: for traversing the document object model tree to obtain its leaf nodes;
Depth determination and extension unit: for determining that the current depth of a leaf node is d and, based on the depth d, extending upward to the node at depth d-1;
Occurrence frequency computing unit: for computing the occurrence frequency, in the document object model tree, of the subtree rooted at the node at depth d-1;
Judging unit: for continuing to extend from the leaf node if the occurrence frequency is not less than a threshold, and for outputting the subtree as a document object model subtree if the occurrence frequency is less than the threshold;
Page information unit acquiring unit: for obtaining the page information unit corresponding to the document object model subtree.
Preferably, the mapping establishment module comprises:
Similarity acquisition unit: for performing similarity processing on the page information units to find the two page information units with the highest similarity;
Mapping establishment unit: for associating the two page information units with the highest similarity to obtain the mapping relations between page information units.
Preferably, the page conversion module comprises:
Subtree processing unit: for processing the document object model subtree to obtain its minimal repeating unit;
Information injection unit: for injecting the information of the page to be converted into the minimal repeating unit of the target page along the mapping relations to obtain the converted minimal repeating unit;
Page conversion unit: for performing page conversion according to the converted minimal repeating unit, completing the page conversion.
Implementing the embodiments of the present invention lowers the basic UI design skill required of web designers. The converted page needs only slight modification and adjustment by the designer, and the designer can select a template page intuitively, so that the generated page meets the user's preferences to the greatest extent. Throughout the process, the user only needs to select the original page and the template page and wait for the new web page file to be generated; no manual participation is needed, so the method is simple to operate and fast. Because e-commerce pages often share a similar structure, the method suits them particularly well: the more similar the structures, the better the conversion result.
Brief description of the drawings
To illustrate the technical solutions of the embodiments of the present invention or of the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present invention; a person of ordinary skill in the art can derive other drawings from them without creative effort.
Fig. 1 is a flow chart of the web page conversion method of an embodiment of the present invention;
Fig. 2 is a flow chart of page information unit acquisition in an embodiment of the present invention;
Fig. 3 is a flow chart of injecting the information of the page to be converted into the target page in an embodiment of the present invention;
Fig. 4 is a structural diagram of the web page conversion system of an embodiment of the present invention;
Fig. 5 is a structural diagram of the page information unit acquisition module of an embodiment of the present invention;
Fig. 6 is a structural diagram of the page conversion module of an embodiment of the present invention.
Detailed description of the invention
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments that a person of ordinary skill in the art obtains from these embodiments without creative effort fall within the scope of protection of the present invention.
Fig. 1 is a flow chart of the web page conversion method of an embodiment of the present invention. As shown in Fig. 1, the method includes:
S11: process the input HTML file using the HTML document object model to obtain a document object model tree;
S12: perform subtree processing on the document object model tree and obtain, according to the processing result, the page information unit corresponding to each subtree;
S13: perform similarity processing on the page information units and establish the mapping relations between them according to the similarity result;
S14: inject the information of the page to be converted into the target page along the mapping relations, completing the page conversion.
S11 is described further below:
The Document Object Model (DOM) is an interface independent of browser, platform, and language. It gives web designers and software developers a unified standard for accessing the data, scripts, and presentation-layer objects of their sites. The input HTML file is processed using the HTML DOM, which presents the elements, attributes, and text of the HTML file as a structure tree; this structure tree is called the document object model tree. The input HTML files comprise the HTML file to be converted and the target HTML file.
In implementing this embodiment, if a malformed HTML page is encountered, an open-source library such as beautifulsoup is used to convert the code file into a well-formed HTML file, which is then processed using the HTML DOM.
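As an illustration of S11, the following is a minimal sketch of building a DOM-like tree from an HTML string. It is not the patent's implementation: the names `Node`, `DomBuilder`, and `parse_html` are hypothetical, and the sketch uses Python's standard-library `html.parser` rather than the beautifulsoup library mentioned above, purely to stay dependency-free.

```python
from html.parser import HTMLParser

# Void elements never get a closing tag, so they must not stay on the stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class Node:
    """Hypothetical tree node holding an element's tag, attributes and text."""

    def __init__(self, tag, attrs=None, parent=None):
        self.tag = tag
        self.attrs = dict(attrs or [])
        self.parent = parent
        self.children = []
        self.text = ""


class DomBuilder(HTMLParser):
    """Builds a tree of Node objects while the parser streams through tags."""

    def __init__(self):
        super().__init__()
        self.root = Node("#document")
        self.stack = [self.root]

    def handle_starttag(self, tag, attrs):
        node = Node(tag, attrs, parent=self.stack[-1])
        self.stack[-1].children.append(node)
        if tag not in VOID_TAGS:
            self.stack.append(node)

    def handle_endtag(self, tag):
        # Pop back to the nearest matching open tag; ignore stray close tags.
        for i in range(len(self.stack) - 1, 0, -1):
            if self.stack[i].tag == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        if data.strip():
            self.stack[-1].text += data.strip()


def parse_html(html):
    builder = DomBuilder()
    builder.feed(html)
    builder.close()
    return builder.root


source_tree = parse_html(
    "<ul><li><a href='/p/1'>Phone</a></li><li><a href='/p/2'>Laptop</a></li></ul>"
)
```

In a two-page setting, `parse_html` would be called once for the HTML file to be converted and once for the target HTML file, giving one tree per page.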
S12 is described further below:
DOM subtree processing is performed on the document object model tree to obtain DOM subtrees, and the page information unit corresponding to each subtree is obtained from the DOM subtree. Here, a DOM subtree is a subtree that is as deep as possible and occurs frequently while retaining semantic information.
Fig. 2 is a flow chart of page information unit acquisition in an embodiment of the present invention. As shown in Fig. 2, the flow is as follows:
S121: traverse the document object model tree to obtain its leaf nodes;
S122: determine that the current depth of a leaf node is d and, based on the depth d, extend upward to the node at depth d-1;
S123: compute the occurrence frequency, in the document object model tree, of the subtree rooted at the node at depth d-1;
S124: if the occurrence frequency is not less than the threshold, return to S122; if it is less than the threshold, output the subtree as a DOM subtree;
S125: obtain the page information unit corresponding to the DOM subtree.
S121 is described further below:
In this embodiment, the document object model tree may be traversed level by level (breadth-first) or depth-first; the traversal yields the information of every leaf node of the tree.
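The leaf collection in S121 can be sketched with a level-order traversal as below. This is an illustrative sketch only: the `(tag, [children])` tuple representation and the function name `leaves_with_depth` are assumptions, not part of the source.

```python
from collections import deque

# A node is (tag, [children]); this tuple shape is an assumption of the sketch.
tree = ("body", [
    ("ul", [("li", [("a", [])]), ("li", [("a", [])])]),
    ("p", []),
])


def leaves_with_depth(root):
    """Level-order traversal returning (tag, depth) for every leaf node."""
    found = []
    queue = deque([(root, 0)])
    while queue:
        (tag, children), depth = queue.popleft()
        if not children:
            found.append((tag, depth))
        for child in children:
            queue.append((child, depth + 1))
    return found


leaves = leaves_with_depth(tree)
```

Recording the depth alongside each leaf gives the value d that S122 starts from; a depth-first traversal would return the same leaves in a different order.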
S122 is described further below:
Based on the properties of a tree, the depth of the current leaf node is determined and denoted d; a "growth" process then extends upward from the leaf node to the node at level d-1.
S123 is described further below:
As the "growth" process extends to the node at level d-1, the subtrees rooted at level d-1 are collected and counted, and the occurrence frequency of each of these subtrees in the whole document object model tree is computed.
S124 is described further below:
The occurrence frequency is compared with the threshold. If the frequency is not less than the threshold, the procedure returns to S122; if it is less than the threshold, the subtree is output as a DOM subtree. The threshold used in this embodiment is 2, but it should be determined according to the user's specific circumstances and can be set differently for different situations.
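One possible reading of the S122 to S124 loop is sketched below. The source does not say how two subtrees are judged to be "the same" when counting occurrences; this sketch assumes a structural signature built from tag names only. The names `Node`, `signature`, `walk`, and `grow_subtrees` are hypothetical.

```python
from collections import Counter


class Node:
    """Hypothetical tree node with a parent pointer for upward growth."""

    def __init__(self, tag, children=()):
        self.tag, self.children, self.parent = tag, list(children), None
        for child in self.children:
            child.parent = self


def signature(node):
    """Structural fingerprint: tag names only; text and attributes are ignored."""
    return node.tag + "(" + ",".join(signature(c) for c in node.children) + ")"


def walk(node):
    yield node
    for child in node.children:
        yield from walk(child)


def grow_subtrees(root, threshold=2):
    """Grow upward from each leaf; stop once the grown subtree is rare."""
    counts = Counter(signature(n) for n in walk(root))
    results = []
    for leaf in (n for n in walk(root) if not n.children):
        current = leaf
        # S122: extend from depth d to depth d-1; S123: look up the frequency.
        while current.parent is not None:
            current = current.parent
            # S124: a frequency below the threshold ends the growth.
            if counts[signature(current)] < threshold:
                break
        if current not in results:
            results.append(current)
    return results


page = Node("body", [
    Node("ul", [Node("li", [Node("a")]), Node("li", [Node("a")])]),
])
dom_subtrees = grow_subtrees(page)
```

With the threshold of 2 used in the embodiment, growth passes through the repeated `li` items and stops at the `ul` that contains them, so the output subtree is the container of the repeated structure.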
An example of a DOM subtree follows (the example itself is missing from this text):
S125 is described further below:
The page information unit corresponding to a DOM subtree is obtained from that subtree. A page information unit is the unit of structure and information conversion between pages.
The page information unit of the example DOM subtree from S124 follows (the example itself is missing from this text):
S13 is described further below:
Similarity processing is performed on the page information units, and the mapping relations between them are established according to the similarity result.
Further, the page information units of the page to be converted and those of the target page are traversed to find the pairs of units with the highest mutual similarity. Specifically, a doubly nested loop is used: the outer loop traverses every page information unit Ae of the page to be converted, and for each Ae the inner loop traverses every page information unit Be of the target page, comparing the two and obtaining their similarity S. When the traversal and comparison are finished, the Be with the largest S is the one most closely related to Ae.
Then the two page information units with the highest similarity are associated, establishing the mapping relations between page information units.
During the loop, a black-box natural language processing function Fnlp is used: in the outer and inner loops it takes two page information units as input and outputs their similarity S. During association, the information in each page information unit is extracted and converted into a phrase vector V, and Fnlp defines the distance between the elements of two vectors V. Because different vectors V may differ in length, dynamic time warping (DTW), a technique for dynamically computing the distance between sequences, is used to compute the distance between phrase vectors. This yields the many-to-many mapping relations between the page information units of the page to be converted and those of the target page.
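The nested loop and the DTW distance can be sketched as follows. Several things here are assumptions: the source treats Fnlp as a black box, so `token_distance` is only a trivial stand-in; the smallest DTW distance is taken to mean the highest similarity S; and the sketch keeps one best target per source unit rather than the many-to-many mapping the source describes. The unit names are hypothetical.

```python
def token_distance(a, b):
    """Stand-in for the black-box Fnlp: 0 for equal tokens, 1 otherwise."""
    return 0.0 if a == b else 1.0


def dtw_distance(v1, v2, dist=token_distance):
    """Dynamic time warping distance between two phrase vectors of any length."""
    inf = float("inf")
    n, m = len(v1), len(v2)
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = dist(v1[i - 1], v2[j - 1])
            cost[i][j] = step + min(cost[i - 1][j],      # advance in v1 only
                                    cost[i][j - 1],      # advance in v2 only
                                    cost[i - 1][j - 1])  # advance in both
    return cost[n][m]


def build_mapping(source_units, target_units):
    """Map every source unit to the target unit at the smallest DTW distance."""
    mapping = {}
    for a_name, a_vec in source_units.items():      # outer loop over Ae
        best_name, best_dist = None, float("inf")
        for b_name, b_vec in target_units.items():  # inner loop over Be
            d = dtw_distance(a_vec, b_vec)
            if d < best_dist:
                best_name, best_dist = b_name, d
        mapping[a_name] = best_name
    return mapping


source_units = {"src_title": ["phone", "price", "sale"],
                "src_nav": ["home", "cart", "login"]}
target_units = {"tgt_nav": ["home", "cart"],
                "tgt_title": ["laptop", "price", "sale"]}
unit_mapping = build_mapping(source_units, target_units)
```

DTW is used rather than a position-by-position comparison because the two phrase vectors may have different lengths, as with the three-token `src_nav` and the two-token `tgt_nav` above.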
S14 is described further below:
The information of the page to be converted is injected into the target page along the mapping relations, completing the page conversion.
Fig. 3 is a flow chart of injecting the information of the page to be converted into the target page in an embodiment of the present invention. As shown in Fig. 3, the flow is as follows:
S141: process the DOM subtree to obtain its minimal repeating unit;
S142: inject the information of the page to be converted into the minimal repeating unit of the target page along the mapping relations to obtain the converted minimal repeating unit;
S143: perform page conversion according to the converted minimal repeating unit, completing the page conversion.
S141 is described further below:
Minimal repeating unit processing is performed on the DOM subtree to obtain its minimal repeating unit. The minimal repeating unit is the smallest repeating subtree within the DOM subtree that has a complete information structure, with the semantic information removed.
Further, the process "grows" upward from the leaf nodes of the DOM subtree. When extending from level d to level d-1, the occurrence frequency of the currently grown subtree within its containing DOM subtree is determined. The threshold F is set to 2/3 of the number of nodes at level d-1. If the frequency is greater than F, the subtree from level d-1 downward is judged to be the minimal repeating unit of the DOM subtree; if the frequency is less than F, the DOM subtree continues to "grow".
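The threshold test in S141 can be sketched as below. This is a simplified interpretation, not the patent's procedure: it counts the frequency of each structural signature among the nodes of one level, works level by level rather than leaf by leaf, and reuses the hypothetical `(tag, [children])` tuple shape. All function names are assumptions.

```python
from collections import Counter


def signature(node):
    """Structural fingerprint of a (tag, [children]) node, ignoring content."""
    tag, children = node
    return tag + "(" + ",".join(signature(c) for c in children) + ")"


def nodes_at_depth(root, depth):
    level = [root]
    for _ in range(depth):
        level = [c for _, children in level for c in children]
    return level


def height(node):
    """Number of edges on the longest path from this node down to a leaf."""
    return 1 + max((height(c) for c in node[1]), default=-1)


def minimal_repeating_unit(subtree):
    # Walk the levels upward, starting one level above the deepest leaves.
    for depth in range(height(subtree) - 1, -1, -1):
        level = nodes_at_depth(subtree, depth)
        threshold_f = len(level) * 2 / 3  # F = 2/3 of the nodes at this level
        sig, freq = Counter(signature(n) for n in level).most_common(1)[0]
        if freq > threshold_f:
            return next(n for n in level if signature(n) == sig)
    return subtree


product_list = ("ul", [
    ("li", [("a", []), ("span", [])]),
    ("li", [("a", []), ("span", [])]),
    ("li", [("a", []), ("span", [])]),
])
unit = minimal_repeating_unit(product_list)
```

In this example all three `li` nodes at level 1 share one signature, which exceeds F = 2, so a single `li` with its `a` and `span` children is returned as the repeating unit.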
An example of a minimal repeating unit follows (the example itself is missing from this text):
S142 is described further below:
The information of the page to be converted is injected into the minimal repeating unit of the target page along the mapping relations to obtain the converted minimal repeating unit.
Further, once the mapping relations between the page information units of the page to be converted and those of the target page have been obtained, the information of the page to be converted is injected into the minimal repeating unit of the target page along the mapping relations, and the unit is then replicated and extended to obtain the converted minimal repeating units.
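The inject-then-replicate step can be pictured with the sketch below. The dictionary layout of the template unit, the `slot_mapping` from source field index to target slot index, and the function name `inject` are all assumptions made for illustration; the source does not specify these data structures.

```python
import copy

# Hypothetical minimal repeating unit taken from the target (template) page.
template_unit = {"tag": "li", "children": [
    {"tag": "a", "text": "Template product"},
    {"tag": "span", "text": "0.00"},
]}

# Hypothetical records extracted from the page to be converted.
source_records = [["Phone", "1999"], ["Laptop", "5999"]]

# Assumed mapping: source field index -> slot index in the template unit.
slot_mapping = {0: 0, 1: 1}


def inject(template, record, mapping):
    """Copy the template unit and overwrite its slots with source information."""
    unit = copy.deepcopy(template)
    for src_idx, tgt_idx in mapping.items():
        unit["children"][tgt_idx]["text"] = record[src_idx]
    return unit


# Replicate the unit once per source record ("replicated and extended").
converted_units = [inject(template_unit, r, slot_mapping) for r in source_records]
```

Deep-copying the template before writing into it keeps the target page's structure reusable for every record, which is what allows the unit to be replicated as many times as the source page has items.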
The process of injecting the information of the page to be converted into the minimal repeating unit of the target page is as follows (the illustration is missing from this text):
S143 is described further below:
Page conversion is performed according to the converted minimal repeating unit, completing the page conversion.
Further, within the converted minimal repeating units, the information originally in the target page is replaced by the information injected from the page to be converted, which carries out the page conversion processing and completes the page conversion.
Fig. 4 is a structural diagram of the web page conversion system of an embodiment of the present invention. As shown in Fig. 4, the system includes:
Document processing module 11: for processing the input HTML file using the HTML document object model to obtain a document object model tree;
Page information unit acquisition module 12: for performing subtree processing on the document object model tree and obtaining, according to the processing result, the page information unit corresponding to each subtree;
Mapping establishment module 13: for performing similarity processing on the page information units and establishing the mapping relations between them according to the similarity result;
Page conversion module 14: for injecting the information of the page to be converted into the target page along the mapping relations, completing the page conversion.
Preferably, the HTML file includes the HTML file to be converted and the target HTML file.
Preferably, Fig. 5 is a structural diagram of the page information unit acquisition module of an embodiment of the present invention. As shown in Fig. 5, the page information unit acquisition module 12 includes:
Traversal processing unit 121: for traversing the document object model tree to obtain its leaf nodes;
Depth determination and extension unit 122: for determining that the current depth of a leaf node is d and, based on the depth d, extending upward to the node at depth d-1;
Occurrence frequency computing unit 123: for computing the occurrence frequency, in the document object model tree, of the subtree rooted at the node at depth d-1;
Judging unit 124: for continuing to extend from the leaf node if the occurrence frequency is not less than the threshold, and for outputting the subtree as a DOM subtree if the occurrence frequency is less than the threshold;
Page information unit acquiring unit 125: for obtaining the page information unit corresponding to the DOM subtree.
Preferably, the mapping establishment module 13 includes:
Similarity acquisition unit: for performing similarity processing on the page information units to find the two page information units with the highest similarity;
Mapping establishment unit: for associating the two page information units with the highest similarity to obtain the mapping relations between page information units.
Preferably, Fig. 6 is a structural diagram of the page conversion module of an embodiment of the present invention. As shown in Fig. 6, the page conversion module 14 includes:
Subtree processing unit 141: for processing the DOM subtree to obtain its minimal repeating unit;
Information injection unit 142: for injecting the information of the page to be converted into the minimal repeating unit of the target page along the mapping relations to obtain the converted minimal repeating unit;
Page conversion unit 143: for performing page conversion according to the converted minimal repeating unit, completing the page conversion.
Specifically, for the working principle of the functional modules of the system, refer to the corresponding description of the method embodiment; it is not repeated here.
Implementing the embodiments of the present invention lowers the basic UI design skill required of web designers. The converted page needs only slight modification and adjustment by the designer, and the designer can select a template page intuitively, so that the generated page meets the user's preferences to the greatest extent. Throughout the process, the user only needs to select the original page and the template page and wait for the new web page file to be generated; no manual participation is needed, so the method is simple to operate and fast. Because e-commerce pages often share a similar structure, the method suits them particularly well: the more similar the structures, the better the conversion result.
A person of ordinary skill in the art will understand that all or part of the steps of the methods in the above embodiments can be completed by a program instructing the relevant hardware. The program can be stored in a computer-readable storage medium, which may include read-only memory (ROM), random access memory (RAM), a magnetic disk, an optical disc, and the like.
The web page conversion method and system provided by the embodiments of the present invention have been described in detail above. Specific examples are used herein to explain the principles and implementation of the invention, and the description of the above embodiments is intended only to help in understanding the method of the invention and its core idea. A person of ordinary skill in the art may, following the idea of the invention, make changes to the specific implementation and the scope of application. In summary, the content of this specification should not be construed as limiting the present invention.
Claims (10)
1. a web page conversion method, it is characterised in that described method includes:
Adopt html document object model that the html file of input is processed, obtain document object model tree;
Described document object model tree is carried out subtree process, obtains, according to result, the page info unit that described subtree is corresponding;
Described page info unit is carried out similarity process, sets up the mapping relations between described page info unit according to similarity result;
According to described mapping relations, page info to be converted is injected in target pages along described mapping relations, completes conversion of page.
2. web page conversion method according to claim 1, it is characterised in that described html file includes html file to be converted and target html file.
3. web page conversion method according to claim 1, it is characterised in that described described document object model tree is carried out subtree process, obtains, according to result, the page info unit that described subtree is corresponding, including:
Described document object model tree is carried out traversal processing, obtains the leaf node of described document object model tree;
Determining that described leaf node current depth is d, according to described degree of depth d, expansion depth is the leaf node of d-1;
Calculate the frequency of occurrence in described document object model tree of the subtree in the leaf node that the described degree of depth is d-1;
If judging when described frequency of occurrence is not less than threshold value, then continue to extend the described leaf node degree of depth, if described frequency of occurrence is less than threshold value, then export described subtree as DOM Document Object Model subtree;
Obtain the page info unit that described DOM Document Object Model subtree is corresponding.
4. The web page conversion method according to claim 1, characterized in that performing similarity processing on the page information units, and establishing mapping relationships between the page information units according to the similarity result, comprises:
Performing similarity processing on the page information units to obtain the two page information units with the highest similarity among the page information units;
Associating the two page information units with the highest similarity to obtain the mapping relationship between the page information units.
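The mapping step of claim 4 can be sketched as below. The claim does not name a similarity measure, so this example substitutes the ratio computed by `difflib.SequenceMatcher` over each unit's sequence of tag names; that choice, the dictionary representation of units, and the names `similarity` and `build_mapping` are all assumptions made for illustration.

```python
from difflib import SequenceMatcher


def similarity(tags_a, tags_b):
    """Similarity in [0, 1] between two tag sequences (assumed measure)."""
    return SequenceMatcher(None, tags_a, tags_b).ratio()


def build_mapping(source_units, target_units):
    """Map each source unit to the target unit it is most similar to."""
    mapping = {}
    for name, tags in source_units.items():
        best = max(target_units,
                   key=lambda t: similarity(tags, target_units[t]))
        mapping[name] = best
    return mapping


# Hypothetical units, each described by its tags in document order.
source = {
    "news": ["ul", "li", "a", "li", "a"],
    "banner": ["div", "img"],
}
target = {
    "list": ["ul", "li", "a", "li", "a", "li", "a"],
    "hero": ["div", "img", "p"],
}
print(build_mapping(source, target))
# prints {'news': 'list', 'banner': 'hero'}
```

This sketch pairs every source unit with its best match and does not prevent two source units from sharing one target unit; a one-to-one assignment would need an extra step.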
5. The web page conversion method according to claim 1, characterized in that injecting, according to the mapping relationships between the page information units, the page information to be converted into the target page along the mapping relationships, comprises:
Processing the document object model subtree to obtain the minimal repeating unit of the document object model subtree;
Injecting the page information to be converted, along the mapping relationships, into the minimal repeating unit of the target page to obtain a converted minimal repeating unit;
Performing page conversion according to the converted minimal repeating unit, thereby completing the page conversion.
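The injection step of claim 5 can be illustrated as follows. The sketch assumes that the minimal repeating unit of a target subtree is the child structure that occurs most often, and that "injecting" page information means copying that unit once per source item and writing the item's texts into its leaves. The dictionary node layout and the names `el`, `minimal_repeating_unit`, `fill_leaves`, and `inject` are hypothetical.

```python
import copy
from collections import Counter


def el(tag, text="", children=()):
    """Build a hypothetical DOM node as a plain dictionary."""
    return {"tag": tag, "text": text, "children": list(children)}


def signature(node):
    """Structural fingerprint built from tag names only."""
    return node["tag"] + "(" + ",".join(
        signature(c) for c in node["children"]) + ")"


def minimal_repeating_unit(subtree):
    """Return the first child whose structure is the most frequent one."""
    counts = Counter(signature(c) for c in subtree["children"])
    most_common, _ = counts.most_common(1)[0]
    return next(c for c in subtree["children"]
                if signature(c) == most_common)


def fill_leaves(node, texts):
    """Write the texts into the leaves of node, in document order."""
    if not node["children"]:
        node["text"] = next(texts, "")
    for child in node["children"]:
        fill_leaves(child, texts)


def inject(source_items, target_subtree):
    """Rebuild the target subtree with one converted unit per source item."""
    template = minimal_repeating_unit(target_subtree)
    converted = []
    for item in source_items:
        unit = copy.deepcopy(template)  # keep the target's own structure
        fill_leaves(unit, iter(item))
        converted.append(unit)
    return {**target_subtree, "children": converted}


target_list = el("ul", children=[el("li", children=[el("a", "x")]),
                                 el("li", children=[el("a", "y")])])
result = inject([["News one"], ["News two"], ["News three"]], target_list)
print([li["children"][0]["text"] for li in result["children"]])
# prints ['News one', 'News two', 'News three']
```

Working on a copy of the repeating unit leaves the original target subtree unchanged, so the target page's layout is reused while the number of items follows the source page.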
6. A web page conversion system, characterized in that the system comprises:
Document processing module: configured to process an input HTML file using the HTML Document Object Model to obtain a document object model tree;
Page information unit acquisition module: configured to perform subtree processing on the document object model tree and obtain, according to the processing result, the page information units corresponding to the subtrees;
Mapping establishment module: configured to perform similarity processing on the page information units and establish mapping relationships between the page information units according to the similarity result;
Page conversion module: configured to inject, according to the mapping relationships, the page information to be converted into the target page along the mapping relationships, thereby completing the page conversion.
7. The web page conversion system according to claim 6, characterized in that the HTML file comprises an HTML file to be converted and a target HTML file.
8. The web page conversion system according to claim 6, characterized in that the page information unit acquisition module comprises:
Traversal processing unit: configured to traverse the document object model tree to obtain the leaf nodes of the document object model tree;
Depth determination and expansion unit: configured to determine that the current depth of a leaf node is d, and, according to the depth d, to expand the leaf node at depth d-1;
Occurrence count calculation unit: configured to calculate the number of occurrences, in the document object model tree, of the subtree at the leaf node of depth d-1;
Judging unit: configured to continue expanding the leaf node depth if the number of occurrences is not less than a threshold, and to output the subtree as a document object model subtree if the number of occurrences is less than the threshold;
Page information unit acquisition unit: configured to obtain the page information unit corresponding to the document object model subtree.
9. The web page conversion system according to claim 6, characterized in that the mapping establishment module comprises:
Similarity retrieval unit: configured to perform similarity processing on the page information units to obtain the two page information units with the highest similarity among the page information units;
Mapping establishment unit: configured to associate the two page information units with the highest similarity to obtain the mapping relationship between the page information units.
10. The web page conversion system according to claim 6, characterized in that the page conversion module comprises:
Subtree processing unit: configured to process the document object model subtree to obtain the minimal repeating unit of the document object model subtree;
Information injection unit: configured to inject, according to the mapping relationships, the page information to be converted along the mapping relationships into the minimal repeating unit of the target page to obtain a converted minimal repeating unit;
Page conversion unit: configured to perform page conversion according to the converted minimal repeating unit, thereby completing the page conversion.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610154451.XA CN105740475B (en) | 2016-03-16 | 2016-03-16 | Web page conversion method and system |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610154451.XA CN105740475B (en) | 2016-03-16 | 2016-03-16 | Web page conversion method and system |
Publications (2)
Publication Number | Publication Date |
---|---|
CN105740475A (en) | 2016-07-06 |
CN105740475B (en) | 2020-04-28 |
Family
ID=56251099
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610154451.XA (Active; CN105740475B) | 2016-03-16 | 2016-03-16 | Web page conversion method and system |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN105740475B (en) |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106991131A (en) * | 2017-03-08 | 2017-07-28 | 陕西识代运筹信息科技股份有限公司 | A kind of data processing method and device |
CN107862328A (en) * | 2017-10-31 | 2018-03-30 | 平安科技(深圳)有限公司 | The regular execution method of information word set generation method and rule-based engine |
CN118626742A (en) * | 2024-08-14 | 2024-09-10 | 浙江有数数智科技有限公司 | Processing method, device, equipment and medium for character recognition in webpage |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN1295293A (en) * | 1999-11-05 | 2001-05-16 | 国际商业机器公司 | Method and system for selecting envelope from wide world web service device for users |
CN101261632A (en) * | 2008-04-08 | 2008-09-10 | 杭州电子科技大学 | FrontPage operation paper evaluation method based on HTML grammar tree |
CN102890681A (en) * | 2011-07-20 | 2013-01-23 | 阿里巴巴集团控股有限公司 | Method and system for generating webpage structure template |
US20130091414A1 (en) * | 2011-10-11 | 2013-04-11 | Omer BARKOL | Mining Web Applications |
US20140236968A1 (en) * | 2011-10-31 | 2014-08-21 | Li-Mei Jiao | Discrete Wavelet Transform Method for Document Structure Similarity |
CN104866527A (en) * | 2015-04-24 | 2015-08-26 | 美通云动(北京)科技有限公司 | Dynamic webpage template matching method and device |
Patent Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN1295293A (en) * | 1999-11-05 | 2001-05-16 | 国际商业机器公司 | Method and system for selecting envelope from wide world web service device for users |
CN101261632A (en) * | 2008-04-08 | 2008-09-10 | 杭州电子科技大学 | FrontPage operation paper evaluation method based on HTML grammar tree |
CN102890681A (en) * | 2011-07-20 | 2013-01-23 | 阿里巴巴集团控股有限公司 | Method and system for generating webpage structure template |
US20130091414A1 (en) * | 2011-10-11 | 2013-04-11 | Omer BARKOL | Mining Web Applications |
US20140236968A1 (en) * | 2011-10-31 | 2014-08-21 | Li-Mei Jiao | Discrete Wavelet Transform Method for Document Structure Similarity |
CN104866527A (en) * | 2015-04-24 | 2015-08-26 | 美通云动(北京)科技有限公司 | Dynamic webpage template matching method and device |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106991131A (en) * | 2017-03-08 | 2017-07-28 | 陕西识代运筹信息科技股份有限公司 | A kind of data processing method and device |
CN107862328A (en) * | 2017-10-31 | 2018-03-30 | 平安科技(深圳)有限公司 | The regular execution method of information word set generation method and rule-based engine |
WO2019085075A1 (en) * | 2017-10-31 | 2019-05-09 | 平安科技(深圳)有限公司 | Information element set generation method and rule execution method based on rule engine |
CN118626742A (en) * | 2024-08-14 | 2024-09-10 | 浙江有数数智科技有限公司 | Processing method, device, equipment and medium for character recognition in webpage |
Also Published As
Publication number | Publication date |
---|---|
CN105740475B (en) | 2020-04-28 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN106462555B (en) | Method and system for WEB content generation | |
US11995409B2 (en) | Content generation using target content derived modeling and unsupervised language modeling | |
KR20210116379A (en) | Method, apparatus for text generation, device and storage medium | |
JP2019533205A (en) | User keyword extraction apparatus, method, and computer-readable storage medium | |
Zheng et al. | Template-independent news extraction based on visual consistency | |
CN107992542A (en) | A kind of similar article based on topic model recommends method | |
CN103810251A (en) | Method and device for extracting text | |
WO2024078105A1 (en) | Method for extracting technical problem in patent literature and related device | |
CN105740475A (en) | Web page transformation method and system | |
CN104281565A (en) | Semantic dictionary constructing method and device | |
US20230177359A1 (en) | Method and apparatus for training document information extraction model, and method and apparatus for extracting document information | |
CN111553138B (en) | Auxiliary writing method and device for standardizing content structure document | |
CN103092973B (en) | information extraction method and device | |
CN113434659B (en) | Implicit conflict sensing method in collaborative design process | |
CN113239256B (en) | Method for generating website signature, method and device for identifying website | |
CN105320641B (en) | Text verification method and user terminal | |
CN117111909A (en) | Code automatic generation method, system, computer equipment and storage medium | |
CN106339381B (en) | Information processing method and device | |
CN115269923A (en) | Method, system, equipment and medium for processing webpage text area and text information | |
Pu et al. | A vision-based approach for deep web form extraction | |
YesuRaju et al. | A language independent web data extraction using vision based page segmentation algorithm | |
CN112926318A (en) | Method for extracting new sentiment words of online shopping comments based on syntactic analysis | |
CN106802914B (en) | Heuristic multi-feature rule set webpage blocking method | |
CN108920449A (en) | A kind of document model extended method based on the modeling of extensive theme | |
CN111339289A (en) | Topic model inference method based on commodity comments |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||