CN112527291A - Webpage generation method and device, electronic equipment and storage medium - Google Patents

Webpage generation method and device, electronic equipment and storage medium

Info

Publication number
CN112527291A
Authority
CN
China
Prior art keywords
document
original document
generating
intermediate format
format
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011391660.9A
Other languages
Chinese (zh)
Inventor
左杭
李孟君
杜豪
张展
何渝君
舒忠玲
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hanyun Technology Co Ltd
Original Assignee
Hanyun Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hanyun Technology Co Ltd filed Critical Hanyun Technology Co Ltd
Priority to CN202011391660.9A priority Critical patent/CN112527291A/en
Publication of CN112527291A publication Critical patent/CN112527291A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F8/00 Arrangements for software engineering
    • G06F8/30 Creation or generation of source code
    • G06F8/38 Creation or generation of source code for implementing user interfaces
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/10 Text processing
    • G06F40/12 Use of codes for handling textual entities
    • G06F40/151 Transformation
    • G06F40/154 Tree transformation for tree-structured or markup documents, e.g. XSLT, XSL-FO or stylesheets

Abstract

The application provides a webpage generation method, a webpage generation device, electronic equipment and a storage medium, and relates to the technical field of page design. The method comprises the following steps: acquiring an original document to be converted; determining a document tree corresponding to the original document; generating an intermediate format document according to the document tree corresponding to the original document, wherein the intermediate format document represents the hierarchical structure of the original document and the content of the original document by using a preset syntactic format; and generating a target webpage corresponding to the original document according to the intermediate format document. Because the intermediate format document is generated from the document tree of the original document, and the target webpage is generated from the syntactic format of the intermediate format document, a user only needs to know how to operate text editing software for the target webpage to be generated automatically. No other webpage generation tool or development skill has to be learned, so the learning cost is low and webpage generation efficiency is improved.

Description

Webpage generation method and device, electronic equipment and storage medium
Technical Field
The present application relates to the field of page design technologies, and in particular, to a method and an apparatus for generating a web page, an electronic device, and a storage medium.
Background
As website services grow, websites offer more and more functions, and accordingly more and more documents need to be provided on website interfaces for users. The webpage documents displayed on a website interface are converted from original documents in formats such as Word. Therefore, when a website provides a large volume of documents, how to convert original documents into webpage documents quickly and efficiently, that is, how to generate webpage documents from original documents, is a problem to be solved.
At present, webpage documents are mostly generated by one of three methods. In the first, the page is produced from the output of a rich text editor. In the second, programmers write the page code by hand, synchronize the code to a server once it is written, and generate or update the interface. In the third, programmers write markdown files by hand (markdown is a lightweight markup language) and a markdown file server renders them.
However, all of these prior-art methods require a great deal of time to edit and generate webpage documents, which results in low efficiency.
Disclosure of Invention
An object of the present application is to provide a method, an apparatus, an electronic device, and a storage medium for generating a web page, so as to improve the efficiency of generating the web page.
In order to achieve the above purpose, the technical solutions adopted in the embodiments of the present application are as follows:
In a first aspect, an embodiment of the present application provides a method for generating a web page, where the method includes:
acquiring an original document to be converted, wherein the original document is generated by using text editing software;
determining a document tree corresponding to the original document, wherein the document tree is used for representing the hierarchical structure of the original document and the content of the original document;
generating an intermediate format document according to a document tree corresponding to the original document, wherein the intermediate format document represents the hierarchical structure of the original document and the content of the original document by using a preset syntactic format;
and generating a target webpage corresponding to the original document according to the intermediate format document.
Optionally, the generating an intermediate format document according to the document tree corresponding to the original document includes:
identifying tags in the document tree;
and generating the intermediate format document according to the type of the label in the document tree.
Optionally, the generating the intermediate format document according to the type of the tag in the document tree includes:
and if the type of the label is the type capable of generating the intermediate format document, generating a corresponding object of the label in the intermediate format document, and adding the content contained in the label in the document tree into the corresponding object.
Optionally, generating the intermediate format document according to the type of the tag in the document tree includes:
and if the type of the label is the type which can not generate the intermediate format document, discarding the label and the content contained in the label.
Optionally, the generating a target webpage corresponding to the original document according to the intermediate format document includes:
and generating a target webpage corresponding to the original document according to the mapping relation between the preset syntactic format used by the intermediate format document and the format of the target webpage.
Optionally, the generating a target webpage corresponding to the original document according to the mapping relationship between the preset syntax format used by the intermediate format document and the format of the target webpage includes:
reading a plurality of objects of the intermediate format document;
generating a code segment of each object of the intermediate format document in the target webpage according to the mapping relation between the preset syntactic format used by the intermediate format document and the format of the target webpage;
and combining the code segments of each object in the target webpage to obtain the target webpage corresponding to the original document.
Optionally, the determining a document tree corresponding to the original document includes:
reading a binary stream of the original document;
decompressing the binary stream of the original document to obtain an XML (Extensible Markup Language) structure of the original document;
and determining a document tree corresponding to the original document according to the XML structure of the original document.
In a second aspect, an embodiment of the present application further provides a web page generating apparatus, where the apparatus includes an acquisition module, a determination module and a generation module;
the acquisition module is used for acquiring an original document to be converted, wherein the original document is generated by using text editing software;
the determining module is configured to determine a document tree corresponding to the original document, where the document tree is used to represent a hierarchical structure of the original document and content of the original document;
the generating module is used for generating an intermediate format document according to the document tree corresponding to the original document, wherein the intermediate format document represents the hierarchical structure of the original document and the content of the original document by using a preset syntactic format; and generating a target webpage corresponding to the original document according to the intermediate format document.
Optionally, the generating module is configured to identify a tag in the document tree; and generating the intermediate format document according to the type of the label in the document tree.
Optionally, the generating module is further configured to generate a corresponding object of the tag in the intermediate format document if the type of the tag is a type that can generate the intermediate format document, and add content included in the tag in the document tree to the corresponding object.
Optionally, the generating module is further configured to:
and if the type of the label is the type which can not generate the intermediate format document, discarding the label and the content contained in the label.
Optionally, the generating module is further configured to generate a target webpage corresponding to the original document according to a mapping relationship between the preset syntax format used by the intermediate format document and the format of the target webpage.
Optionally, the generating module is further configured to read a plurality of objects of the intermediate format document;
generating a code segment of each object of the intermediate format document in the target webpage according to the mapping relation between the preset syntactic format used by the intermediate format document and the format of the target webpage;
and combining the code segments of each object in the target webpage to obtain the target webpage corresponding to the original document.
Optionally, the determining module is further configured to read a binary stream of the original document;
decompressing the binary stream of the original document to obtain an XML structure of the original document;
and determining a document tree corresponding to the original document according to the XML structure of the original document.
In a third aspect, an embodiment of the present application further provides an electronic device, including: a processor, a storage medium and a bus, the storage medium storing machine-readable instructions executable by the processor, the processor and the storage medium communicating via the bus when the electronic device is operating, the processor executing the machine-readable instructions to perform the steps of the method as provided by the first aspect.
In a fourth aspect, the present application further provides a storage medium, on which a computer program is stored, where the computer program is executed by a processor to perform the method provided in the first aspect.
The beneficial effects of the present application are as follows:
The application provides a webpage generation method, a webpage generation device, electronic equipment and a storage medium, wherein the method comprises the following steps: acquiring an original document to be converted, wherein the original document is generated by using text editing software; determining a document tree corresponding to the original document, wherein the document tree is used for representing the hierarchical structure of the original document and the content of the original document; generating an intermediate format document according to the document tree corresponding to the original document, wherein the intermediate format document represents the hierarchical structure of the original document and the content of the original document by using a preset syntactic format; and generating a target webpage corresponding to the original document according to the intermediate format document. Because the intermediate format document is generated from the document tree of the original document, and the target webpage is then generated from the syntactic format of the intermediate format document that has been read, a user only needs to know how to operate text editing software for the target webpage to be generated automatically. No other webpage generation tool or development skill has to be learned, so the learning cost is low and webpage generation efficiency is improved.
In addition, by identifying the type of each tag in the document tree and discarding any tag that cannot generate the intermediate format document, together with the content contained in that tag, the accuracy of the generated intermediate format document can be improved.
Furthermore, link tags that are not in the white list are erased, which ensures the security of the generated target webpage.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are required to be used in the embodiments will be briefly described below, it should be understood that the following drawings only illustrate some embodiments of the present application and therefore should not be considered as limiting the scope, and for those skilled in the art, other related drawings can be obtained from the drawings without inventive effort.
Fig. 1 is a schematic structural diagram of an electronic device according to an embodiment of the present disclosure;
Fig. 2 is a schematic flowchart of a method for generating a web page according to an embodiment of the present application;
Fig. 3 is a schematic flowchart of another webpage generating method according to an embodiment of the present application;
Fig. 4 is a schematic flowchart of another webpage generating method according to an embodiment of the present application;
Fig. 5 is a schematic flowchart of another webpage generating method according to an embodiment of the present application;
Fig. 6 is a schematic flowchart of another webpage generating method according to an embodiment of the present application;
Fig. 7 is a schematic overall flow chart of a webpage generating method according to an embodiment of the present application;
Fig. 8 is a schematic structural diagram of a web page generation apparatus according to an embodiment of the present application.
Reference numerals: 100-an electronic device; 101-a processor; 102-memory.
Detailed Description
In order to make the purpose, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it should be understood that the drawings in the present application are for illustrative and descriptive purposes only and are not used to limit the scope of protection of the present application. Additionally, it should be understood that the schematic drawings are not necessarily drawn to scale. The flowcharts used in this application illustrate operations implemented according to some embodiments of the present application. It should be understood that the operations of the flow diagrams may be performed out of order, and steps without logical context may be performed in reverse order or simultaneously. One skilled in the art, under the guidance of this application, may add one or more other operations to, or remove one or more operations from, the flowchart.
In addition, the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. The components of the embodiments of the present application, generally described and illustrated in the figures herein, can be arranged and designed in a wide variety of different configurations. Thus, the following detailed description of the embodiments of the present application, presented in the accompanying drawings, is not intended to limit the scope of the claimed application, but is merely representative of selected embodiments of the application. All other embodiments, which can be derived by a person skilled in the art from the embodiments of the present application without making any creative effort, shall fall within the protection scope of the present application.
It should be noted that in the embodiments of the present application, the term "comprising" is used to indicate the presence of the features stated hereinafter, but does not exclude the addition of further features.
Fig. 1 is a schematic structural diagram of an electronic device according to an embodiment of the present disclosure; the electronic device may be a general-purpose computer or a special-purpose computer, and both of them may be used to implement the web page generation method of the present application. As shown in fig. 1, the electronic device 100 includes: a processor 101 and a memory 102.
The memory 102 is used for storing a program, and the processor 101 calls the program stored in the memory 102 to execute the web page generation method provided in the following embodiments, which will be described in detail in the following through a plurality of specific embodiments.
Fig. 2 is a schematic flowchart of a method for generating a web page according to an embodiment of the present application, and optionally, an execution subject of the method may be a computer, a server, or other devices. As shown in fig. 2, the method includes:
S201, obtaining an original document to be converted.
Wherein the original document is a document generated using text editing software.
In some embodiments, the original document may be a document generated by text editing software such as Word or WPS.
In one implementable manner, for example, the start node and the child nodes of a Word document are obtained. The start node includes the version and the format of the Word document. For example, the version may be the 2003 version, the 2006 version, the 2013 version, etc., and the format may be either doc or docx.
Each child node contains its corresponding content and format. The content of a child node may be content from the Word document; the format may include the multiple title levels, font sizes, font types, margins, picture content, etc. of the Word document content.
S202, determining a document tree corresponding to the original document.
The document tree is used for representing the hierarchical structure of the original document and the content of the original document.
In an implementation mode, the hierarchical structure and the content of the original document are determined according to the obtained starting node and the child nodes of the original document. For example, if a plurality of title structures such as a first-level title, a second-level title, and a third-level title are included in the child nodes of the original document, the hierarchical structure of the original document can be determined according to the title structures in the child nodes.
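The way title levels determine the hierarchy can be sketched as follows. This is a minimal illustration, not the method of the application: the function name `build_tree`, the `(level, title)` input pairs and the dictionary layout are all assumptions introduced here.

```python
def build_tree(headings):
    """Nest a flat list of (level, title) pairs into a tree.

    Hypothetical helper: level 1 is a first-level title, level 2 a
    second-level title, and so on. The root has level 0.
    """
    root = {"title": None, "level": 0, "children": []}
    stack = [root]  # path from the root to the most recent node
    for level, title in headings:
        node = {"title": title, "level": level, "children": []}
        # Climb back up until the top of the stack is a shallower title.
        while stack[-1]["level"] >= level:
            stack.pop()
        stack[-1]["children"].append(node)
        stack.append(node)
    return root


tree = build_tree([(1, "A"), (2, "A.1"), (2, "A.2"), (1, "B")])
```

A stack is used so that each title is attached to the nearest preceding title of a shallower level, which matches how a reader interprets first-, second- and third-level titles.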
S203, generating an intermediate format document according to the document tree corresponding to the original document.
The intermediate format document represents the hierarchical structure of the original document and the content of the original document by using a preset syntactic format.
In an implementation manner, the intermediate format document may be a markdown file. Unlike commonly used text editing software such as Word or WPS, which relies on extensive typesetting and font settings, a markdown file expresses typesetting with simple syntax.
For example, the hierarchical structure of the original document is obtained by traversing the nodes of the document tree corresponding to the original document one by one, and the multi-level titles encountered during the traversal are converted into the corresponding title hierarchy by using the title syntax of the markdown file.
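As a rough sketch of this traversal, the snippet below walks a small in-memory tree and emits markdown title syntax. The `Node` class, the tag names `h1` to `h3` and the function `to_markdown` are assumptions made for illustration; the application does not specify a data model.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    tag: str                      # hypothetical tag names: "h1", "h2", "h3", "p", ...
    text: str = ""
    children: list[Node] = field(default_factory=list)


HEADING_LEVELS = {"h1": 1, "h2": 2, "h3": 3}


def to_markdown(node: Node) -> list[str]:
    """Depth-first traversal that turns title nodes into '#' headings."""
    lines = []
    level = HEADING_LEVELS.get(node.tag)
    if level is not None:
        lines.append("#" * level + " " + node.text)
    elif node.tag == "p" and node.text:
        lines.append(node.text)
    for child in node.children:
        lines.extend(to_markdown(child))
    return lines


doc = Node("body", children=[Node("h1", "Guide"), Node("p", "Intro"), Node("h2", "Install")])
markdown_lines = to_markdown(doc)
```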
Optionally, for example, pictures in the document tree may be compressed and referenced by using the picture syntax of the markdown file, and link tags in the document tree may be filtered, so as to generate the intermediate format file, i.e., a markdown file.
Optionally, before the original document to be converted is obtained, the original document is named according to a preset rule, so that its file name can later be split to generate the directory structure of the intermediate format file. For example, if the original document is named a.b.c.doc, an a directory is created, a b directory is created under the a directory, and c.md is created under the b directory. The generated markdown files and pictures are then stored in the corresponding directories, which yields a set of directories named after the file names of the original documents together with the corresponding picture resources.
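The naming rule in the a.b.c.doc example can be sketched in a few lines. The function name `markdown_path_for` and the decision to reject names that do not end in doc or docx are assumptions for illustration only.

```python
from pathlib import PurePosixPath


def markdown_path_for(original_name: str) -> PurePosixPath:
    """Map a dotted file name such as 'a.b.c.doc' to the path 'a/b/c.md'."""
    stem, _, extension = original_name.rpartition(".")
    if not stem or extension.lower() not in {"doc", "docx"}:
        raise ValueError(f"unexpected file name: {original_name!r}")
    # Every dot-separated part except the last becomes a directory.
    *directories, leaf = stem.split(".")
    return PurePosixPath(*directories, leaf + ".md")


path = markdown_path_for("a.b.c.doc")
```

Only the path is computed here; creating the directories and writing the markdown file and pictures would be a separate step.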
And S204, generating a target webpage corresponding to the original document according to the intermediate format document.
In an implementation manner, the corresponding target webpage code segment can be generated according to the intermediate format document, and then the target webpage code generator is used for generating a complete target webpage interface, and the directory structure of the corresponding target webpage can be generated according to the target webpage name, so as to achieve the effect of generating the target webpage.
To sum up, an embodiment of the present application provides a method for generating a web page, where the method includes: acquiring an original document to be converted, wherein the original document is generated by using text editing software; determining a document tree corresponding to the original document, wherein the document tree is used for representing the hierarchical structure of the original document and the content of the original document; generating an intermediate format document according to the document tree corresponding to the original document, wherein the intermediate format document represents the hierarchical structure of the original document and the content of the original document by using a preset syntactic format; and generating a target webpage corresponding to the original document according to the intermediate format document. Because the intermediate format document is generated from the document tree of the original document, and the target webpage is then generated from the syntactic format of the intermediate format document that has been read, a user only needs to be proficient with text editing software for the target webpage to be generated. No other webpage generation tool or development skill has to be learned, so the learning cost is low and webpage generation efficiency is improved.
Fig. 3 is a schematic flowchart of another webpage generating method according to an embodiment of the present application; as shown in fig. 3, the step S203: generating the intermediate format document according to the document tree corresponding to the original document, and may further include:
S301, identifying the tags in the document tree.
In general, the document tree includes tags of multiple types taken from the content of the original document, for example, the picture tag img, the link tag a, the inline tag span, the paragraph tag p, and the like.
Therefore, the type of each tag in the document tree needs to be identified, so that content belonging to tags that cannot generate the intermediate format document is not converted and the generated intermediate format document is accurate.
S302, generating an intermediate format document according to the type of the label in the document tree.
It should be noted that different processing manners are adopted for different tag types in the document to generate the intermediate format document.
In an implementation manner, for example, if the type of a tag identified in the document tree is the picture tag img, the picture content contained in the picture tag img can be read into a buffer and then converted into encoded data that can be transmitted as bytes.
For convenience of understanding, in this embodiment the encoding is, for example, base64: the picture content contained in the picture tag img is encoded with base64, and a corresponding hash code is obtained from it. Because the hash code is determined by the picture content and is used to generate the name of the picture, an unchanged picture keeps the same name and is not refreshed.
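A content-derived picture name can be sketched as below. The text does not say which hash function is used, so SHA-256, the 16-character truncation and the name `picture_name` are assumptions chosen purely for illustration.

```python
import base64
import hashlib


def picture_name(content: bytes, extension: str = "png") -> str:
    """Derive a stable file name from the picture bytes."""
    encoded = base64.b64encode(content)             # base64 form of the picture
    digest = hashlib.sha256(encoded).hexdigest()    # assumed hash function
    return f"{digest[:16]}.{extension}"


first = picture_name(b"same picture bytes")
second = picture_name(b"same picture bytes")
```

Because the name depends only on the bytes, regenerating the page from an unchanged picture yields the same file name, so a cached copy can be reused.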
Fig. 4 is a schematic flowchart of another webpage generating method according to an embodiment of the present application; as shown in fig. 4, optionally, the step S302: generating the intermediate format document according to the type of the tag in the document tree, and may further include:
S401, judging whether the tag type in the document tree is a type capable of generating the intermediate format document.
It should be noted that the document tree includes multiple types of tags, such as picture tags, link tags, title tags, formula tags, and attachment tags, and not every type of tag can generate the intermediate format document. Each tag encountered while traversing the document tree therefore needs to be checked, so as to improve the efficiency of generating the intermediate format document.
S402, if the type of the label is the type capable of generating the intermediate format document, generating a corresponding object of the label in the intermediate format document, and adding the content contained in the label in the document tree into the corresponding object.
In a possible implementation manner, for example, if the type of a tag in the document tree is identified as the link tag a, the link syntax of the intermediate format document is used to generate the corresponding object of the link tag a in the intermediate format document, and the link content contained in the link tag a is added to that object, so that in the subsequently generated target webpage a user can click to open the link address contained in the link tag a.
Optionally, in order to control risk on the target webpage, the URL (Uniform Resource Locator) of the link tag a may be used to determine whether the link is in a white list defined by the website to which the target webpage belongs. If the link tag a is not in the white list, it is removed, so as to improve the security of the generated target webpage.
If the link tag a is in the white list, the corresponding object of the link tag a is generated in the intermediate format document.
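The white-list check might look like the following sketch. It assumes the white list is a set of host names and that only http and https links are acceptable; neither detail is stated in the application, and `ALLOWED_HOSTS` and `link_to_markdown` are hypothetical names.

```python
from urllib.parse import urlparse

# Hypothetical white list defined by the website that owns the target page.
ALLOWED_HOSTS = {"example.com", "docs.example.com"}


def link_to_markdown(text, url, allowed_hosts=ALLOWED_HOSTS):
    """Return markdown link syntax, or None when the link must be erased."""
    parsed = urlparse(url)
    if parsed.scheme not in {"http", "https"}:
        return None                      # e.g. javascript: or data: URLs
    if parsed.hostname not in allowed_hosts:
        return None                      # host is not in the white list
    return f"[{text}]({url})"


kept = link_to_markdown("Docs", "https://docs.example.com/start")
dropped = link_to_markdown("Elsewhere", "https://unknown.test/")
```

Returning None lets the caller skip the link tag entirely, which corresponds to removing it from the intermediate format document.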
Optionally, in step S302: generating the intermediate format document according to the type of the tag in the document tree, and may further include:
S403, if the type of the tag is a type that cannot generate the intermediate format document, discarding the tag and the content contained in the tag.
In another possible implementation, for example, attachment tags and other special formats cannot generate the intermediate format document. If the type of the next tag in the document tree is identified as the attachment tag type, the attachment tag and the attachment content it contains are discarded, so as to improve the accuracy and efficiency of generating the intermediate format document.
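Steps S401 to S403 amount to a dispatch on tag type, which can be sketched with a handler table. The tag names, the `HANDLERS` table and the function `convert_tags` are illustrative assumptions; the set of convertible types in a real implementation would be larger.

```python
# Hypothetical table: tag type -> function producing the markdown object.
HANDLERS = {
    "h1": lambda content: "# " + content,
    "h2": lambda content: "## " + content,
    "p": lambda content: content,
    "img": lambda content: f"![]({content})",
}


def convert_tags(tags):
    """Convert (tag_type, content) pairs, discarding unsupported types."""
    objects = []
    for tag_type, content in tags:
        handler = HANDLERS.get(tag_type)
        if handler is None:
            continue                     # S403: drop the tag and its content
        objects.append(handler(content)) # S402: create the corresponding object
    return objects


result = convert_tags([("h1", "Title"), ("attachment", "report.zip"), ("p", "Body")])
```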
Optionally, in step S204: generating a target webpage corresponding to the original document according to the intermediate format document, and may further include: and generating a target webpage corresponding to the original document according to the mapping relation between the preset syntactic format used by the intermediate format document and the format of the target webpage.
In one possible implementation, once the format of the intermediate format document and the format of the target webpage have been determined, the mapping relationship between the two can be determined, for example, the mapping relationship between the markdown format and the HTML (Hyper Text Markup Language) format.
Fig. 5 is a schematic flowchart of another webpage generating method according to an embodiment of the present application; as shown in fig. 5, on the basis of the foregoing embodiment, the generating of the target web page corresponding to the original document according to the mapping relationship between the preset syntax format used by the intermediate format document and the format of the target web page may further include the following steps:
S501, reading a plurality of objects of the intermediate format document.
In the present embodiment, for convenience of understanding, for example, a file in which the intermediate format document is a markdown format is described as an example.
Each object of the markdown file contains the content and basic attributes corresponding to the markdown file, so the directory of markdown files that has been read can be parsed and traversed to obtain the plurality of objects in each markdown file.
S502, generating code segments of each object of the intermediate format document in the target webpage according to the mapping relation between the preset syntactic format used by the intermediate format document and the format of the target webpage.
In this embodiment, in order to facilitate understanding, if the format of the target web page is HTML, an HTML code segment in the target web page may be generated for each object in the markdown file according to a mapping relationship between a preset syntax format used by the markdown file and the HTML format.
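A very small instance of such a mapping is sketched below, covering only titles and plain paragraphs. Treating each markdown line as one object, and the names `object_to_html` and `objects_to_fragments`, are simplifying assumptions; a full markdown grammar is far richer.

```python
import html
import re

HEADING = re.compile(r"^(#{1,6}) (.+)$")


def object_to_html(markdown_line: str) -> str:
    """Map one markdown object to its HTML code segment."""
    match = HEADING.match(markdown_line)
    if match:
        level = len(match.group(1))      # number of '#' gives the title level
        return f"<h{level}>{html.escape(match.group(2))}</h{level}>"
    return f"<p>{html.escape(markdown_line)}</p>"


def objects_to_fragments(markdown_lines):
    """Generate one HTML code segment per non-empty markdown object."""
    return [object_to_html(line) for line in markdown_lines if line.strip()]


fragments = objects_to_fragments(["# Guide", "", "Use a < b here."])
```

Text is passed through `html.escape` so that characters such as `<` in the document content cannot be interpreted as markup in the target webpage.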
S503, combining the code segments of each object in the target webpage to obtain the target webpage corresponding to the original document.
Generally, before the code segments of the objects in the target web page are combined, the slot functions of the common interface in the target web page need to be defined, for example a series of slot functions for the content, footer, header, sidebar, or advertisement slots.
After the slots are defined, for example on the basis of the above embodiment, the HTML generator places each HTML code segment into its designated slot to generate the complete target web page interface. For example, the content slot receives the document content of the markdown file contained in the HTML code segment, and the sidebar slot generates interface navigation according to the directory structure. Because each slot has a different function, the corresponding HTML code segments in the target web page can be processed according to the different slot functions.
In this embodiment, for example, the HTML generator inserts the document content of the markdown file contained in the HTML code segment into the content slot to generate the document content of the target web page; a directory structure generated from the markdown objects can be inserted into the sidebar slot to generate the corresponding navigation directory; and a menu structure generated from the markdown objects may be inserted into the menu slot to generate a menu directory in the target web page. Finally, the target web page corresponding to the original document is obtained.
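One way to picture the slot mechanism is a page template with named placeholders that the generator fills in. The sketch below assumes a fixed template with header, sidebar, content, and footer slots; the template layout and the helper names `build_sidebar` and `assemble_page` are illustrative assumptions, not the generator described in the application.

```python
from string import Template

# Hypothetical page skeleton; each $name is one slot.
PAGE_TEMPLATE = Template(
    "<html><head>$header</head><body>"
    "<nav>$sidebar</nav><main>$content</main><footer>$footer</footer>"
    "</body></html>"
)


def build_sidebar(headings: list) -> str:
    """Sidebar slot: turn a list of heading texts into a navigation list."""
    items = "".join(f"<li>{heading}</li>" for heading in headings)
    return f"<ul>{items}</ul>"


def assemble_page(fragments: list, headings: list, header: str = "", footer: str = "") -> str:
    """Place the HTML code segments and navigation into their slots."""
    return PAGE_TEMPLATE.substitute(
        header=header,
        sidebar=build_sidebar(headings),
        content="".join(fragments),  # content slot: segments in document order
        footer=footer,
    )
```

For example, `assemble_page(["<h1>A</h1>", "<p>b</p>"], ["A"])` returns a single HTML string in which the two segments sit inside `<main>` and the heading appears in the navigation list.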
Fig. 6 is a schematic flowchart of another webpage generating method according to an embodiment of the present application; as shown in Fig. 6, the above step S202 of determining a document tree corresponding to the original document may further include:
S601, reading the binary stream of the original document.
In one implementation, for example, the file name and the file content of the original Word document may be read and cached in a memory buffer.
S602, carrying out decompression processing on the binary stream of the original document to obtain the XML structure of the original document.
It should be noted that the decompression is performed so that the multiple formats contained in the original document can be read.
On the basis of the above embodiment, the binary stream cached in the buffer may be decompressed as a ZIP archive (ZIP is a relatively simple archive format that compresses each file separately) to generate the corresponding XML-structured file.
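A minimal sketch of this decompression step, using Python's standard `zipfile` module, is given below. It assumes that the original document is a .docx file and that its main XML part is stored at the conventional path `word/document.xml`; the function name `extract_document_xml` is hypothetical.

```python
import io
import zipfile


def extract_document_xml(docx_bytes: bytes) -> bytes:
    """Decompress the buffered binary stream and return the main XML part."""
    # Wrap the in-memory bytes so that zipfile can treat them as a file.
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as archive:
        # Assumption: the main document part sits at the conventional path.
        return archive.read("word/document.xml")
```

Other parts of the archive, such as embedded pictures, could be read in the same way by listing the archive members with `archive.namelist()`.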
S603, determining a document tree corresponding to the original document according to the XML structure of the original document.
In one possible implementation, for example, the XML structure file of the original document is traversed, following nodes are generated according to the XML structure file, and each following node is then traversed as follows:
(1) Generate an object, and store the type of the traversed XML structure and the corresponding text in the object.
(2) Identify whether the traversed XML structure contains a nested XML structure. If so, traverse the nested XML structure and repeat step (1); if not, insert the current object into the parent attribute and continue traversing to the next following node until the traversal ends.
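The traversal in steps (1) and (2) above can be sketched as a recursive walk over the parsed XML. This sketch uses `xml.etree.ElementTree` from the Python standard library; the node layout (a dictionary with `type`, `text`, and `children`) and the function name `build_tree` are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET


def build_tree(element, parent=None) -> dict:
    """Build a document-tree node for an XML element and its nested elements."""
    # Step (1): create an object holding the type and text of this structure.
    node = {
        "type": element.tag,
        "text": (element.text or "").strip(),
        "children": [],
    }
    # Attach the object to its parent so the hierarchy is preserved.
    if parent is not None:
        parent["children"].append(node)
    # Step (2): nested structures are traversed in the same way.
    for child in element:
        build_tree(child, node)
    return node
```

With real Word XML the tags are namespace-qualified (they appear in the form `{namespace}p`), so a practical version would normalise the tag names before storing them as types.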
Fig. 7 is a schematic overall flow chart of a webpage generating method according to an embodiment of the present application; as shown in fig. 7, the method may include:
S701, reading the binary stream of the original document.
S702, carrying out decompression processing on the binary stream of the original document to obtain the XML structure of the original document.
S703, determining a document tree corresponding to the original document according to the XML structure of the original document.
S704, judging whether the type of a tag in the document tree is a type capable of generating the intermediate format document; if so, going to step S705; otherwise, going to step S706.
S705, if the type of the tag is a type capable of generating the intermediate format document, generating the object corresponding to the tag in the intermediate format document, and adding the content contained in the tag in the document tree to the corresponding object, so as to generate the intermediate format document.
S706, if the type of the tag is a type that cannot generate the intermediate format document, discarding the tag and the content contained in the tag.
S707, reading a plurality of objects of the intermediate format document.
S708, generating code segments of each object of the intermediate format document in the target webpage according to the mapping relation between the preset syntactic format used by the intermediate format document and the format of the target webpage.
S709, combining the code segments of each object in the target webpage to obtain the target webpage corresponding to the original document.
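The tag-type decision in steps S704 to S706 above can be sketched as a filter over the document tree. The set of convertible tag types, the markdown prefixes, and the node layout used below are hypothetical; the application does not enumerate which tag types can generate the intermediate format document.

```python
# Hypothetical tag types that can be expressed in markdown, with their prefixes.
CONVERTIBLE_TAGS = {
    "heading1": "# ",
    "heading2": "## ",
    "paragraph": "",
    "list_item": "- ",
}


def tree_to_markdown(node: dict) -> list:
    """Convert the children of a document-tree node into markdown lines."""
    lines = []
    for child in node.get("children", []):
        prefix = CONVERTIBLE_TAGS.get(child["type"])
        if prefix is None:
            # S706: the tag cannot generate markdown, so the tag and
            # everything it contains are discarded.
            continue
        # S705: create the markdown object and add the tag's content to it.
        lines.append(prefix + child["text"])
        lines.extend(tree_to_markdown(child))
    return lines
```

Joining the returned lines with newline characters gives the text of the intermediate format document in this sketch.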
The webpage generating method provided by the present application is briefly illustrated below with an example.
For example, the original documents are the files index.docx, func.role.docx, and func.task.docx.
(1) Read the content and file name of each original document, and segment the file name of each original document to generate the directory structure of the markdown files, namely the corresponding quick folder and func folder; role.md, task.md, and the corresponding picture files are generated under the func folder.
(2) Read the plurality of objects in each markdown file obtained above, and create the head slot, menu slot, and footer slot functions. Generate the code segment in the target web page corresponding to each object in the markdown file. Finally, process the read markdown file with the md slot function, which returns the corresponding HTML code segment; this segment is combined with the common slots to form a complete HTML code segment. A directory structure identical to that of the markdown files is created and copied into the target web page.
(3) After each markdown file is read, a corresponding HTML code segment file is generated. Each HTML code segment contains the complete common modules (head, font, and menu navigation), and after opening it the user can click the menu navigation bar to jump to other interfaces. For example, after role.html and task.html are generated from role.md and task.md in the func folder, task.html provides all the interfaces that can be jumped to under the current navigation role as well as all the interface options in the top-level func directory. In this way, all the HTML code files together form the target web page corresponding to the original documents.
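The file-name segmentation in step (1) above might look like the following sketch, in which the dot-separated parts of the original file name become folders and the last part becomes the markdown file name. The function name `markdown_path` is hypothetical, and placing a single-part name such as index.docx directly at the top level is an assumption of this sketch rather than something the application states.

```python
from pathlib import PurePosixPath


def markdown_path(docx_name: str) -> PurePosixPath:
    """Map an original file name such as 'func.role.docx' to 'func/role.md'."""
    suffix = ".docx"
    # Drop the extension if present, leaving the dot-separated name parts.
    stem = docx_name[: -len(suffix)] if docx_name.endswith(suffix) else docx_name
    parts = stem.split(".")
    # All parts except the last become folders; the last names the file.
    return PurePosixPath(*parts[:-1], parts[-1] + ".md")
```

The same mapping with an `.html` suffix would give the location of each generated page, which keeps the directory structure of the target web page identical to that of the markdown files.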
The specific execution process of the above steps and the beneficial effects thereof have been described in detail in the foregoing specific embodiments, and are not described in detail herein.
The following describes the apparatus, storage medium, and the like corresponding to the webpage generating method provided by the present application; for their specific implementation processes and technical effects, refer to the description above, which is not repeated below.
Fig. 8 is a schematic structural diagram of a web page generation apparatus according to an embodiment of the present application; as shown in Fig. 8, the apparatus includes: an acquisition module 801, a determination module 802, and a generation module 803.
The acquisition module 801 is configured to obtain an original document to be converted, where the original document is a document generated by using text editing software;
a determining module 802, configured to determine a document tree corresponding to an original document, where the document tree is used to represent a hierarchical structure of the original document and content of the original document;
a generating module 803, configured to generate an intermediate format document according to a document tree corresponding to an original document, where the intermediate format document represents a hierarchical structure of the original document and content of the original document by using a preset syntactic format; and generating a target webpage corresponding to the original document according to the intermediate format document.
Optionally, the generating module 803 is configured to identify tags in the document tree, and to generate the intermediate format document according to the types of the tags in the document tree.
Optionally, the generating module 803 is further configured to generate a corresponding object of the tag in the intermediate format document if the type of the tag is a type that can generate the intermediate format document, and add the content included in the tag in the document tree to the corresponding object.
Optionally, the generating module 803 is further configured to:
if the type of the tag is a type that cannot generate the intermediate format document, discard the tag and the content contained in the tag.
Optionally, the generating module 803 is further configured to generate a target webpage corresponding to the original document according to a mapping relationship between a preset syntax format used by the intermediate format document and a format of the target webpage.
Optionally, the generating module 803 is further configured to read a plurality of objects of the intermediate format document;
generating a code segment of each object of the intermediate format document in the target webpage according to the mapping relation between the preset syntactic format used by the intermediate format document and the format of the target webpage;
and combining the code segments of each object in the target webpage to obtain the target webpage corresponding to the original document.
Optionally, the determining module 802 is further configured to read a binary stream of the original document;
decompressing the binary stream of the original document to obtain an XML structure of the original document;
and determining a document tree corresponding to the original document according to the XML structure of the original document.
The above-mentioned apparatus is used for executing the method provided by the foregoing embodiment, and the implementation principle and technical effect are similar, which are not described herein again.
The above modules may be one or more integrated circuits configured to implement the above methods, for example: one or more Application Specific Integrated Circuits (ASICs), one or more digital signal processors (DSPs), or one or more Field Programmable Gate Arrays (FPGAs). As another example, when one of the above modules is implemented in the form of program code scheduled by a processing element, the processing element may be a general-purpose processor, such as a Central Processing Unit (CPU) or another processor capable of calling program code. As another example, these modules may be integrated together and implemented in the form of a system-on-a-chip (SOC).
Optionally, the present application also provides a program product, such as a computer readable storage medium, comprising a program which, when being executed by a processor, is adapted to carry out the above-mentioned method embodiments.
In the several embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other ways. For example, the above-described apparatus embodiments are merely illustrative, and for example, the division of the units is only one logical division, and other divisions may be realized in practice, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, or in a form of hardware plus a software functional unit.
The integrated unit implemented in the form of a software functional unit may be stored in a computer readable storage medium. The software functional unit is stored in a storage medium and includes several instructions for enabling a computer device (which may be a personal computer, a server, or a network device) or a processor to perform some of the steps of the methods according to the embodiments of the present application. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.

Claims (10)

1. A method for generating a web page, the method comprising:
acquiring an original document to be converted, wherein the original document is generated by using text editing software;
determining a document tree corresponding to the original document, wherein the document tree is used for representing the hierarchical structure of the original document and the content of the original document;
generating an intermediate format document according to a document tree corresponding to the original document, wherein the intermediate format document represents the hierarchical structure of the original document and the content of the original document by using a preset syntactic format;
and generating a target webpage corresponding to the original document according to the intermediate format document.
2. The method of claim 1, wherein generating an intermediate format document from the document tree corresponding to the original document comprises:
identifying tags in the document tree;
and generating the intermediate format document according to the type of the tag in the document tree.
3. The method of claim 2, wherein generating the intermediate format document according to the type of tag in the document tree comprises:
and if the type of the tag is a type capable of generating the intermediate format document, generating the object corresponding to the tag in the intermediate format document, and adding the content contained in the tag in the document tree to the corresponding object.
4. The method of claim 2, wherein generating the intermediate format document according to the type of tag in the document tree comprises:
and if the type of the tag is a type that cannot generate the intermediate format document, discarding the tag and the content contained in the tag.
5. The method according to any one of claims 1-4, wherein the generating a target webpage corresponding to the original document according to the intermediate format document comprises:
and generating a target webpage corresponding to the original document according to the mapping relation between the preset syntactic format used by the intermediate format document and the format of the target webpage.
6. The method according to claim 5, wherein the generating a target webpage corresponding to the original document according to the mapping relationship between the preset syntax format used by the intermediate format document and the format of the target webpage comprises:
reading a plurality of objects of the intermediate format document;
generating a code segment of each object of the intermediate format document in the target webpage according to the mapping relation between the preset syntactic format used by the intermediate format document and the format of the target webpage;
and combining the code segments of each object in the target webpage to obtain the target webpage corresponding to the original document.
7. The method according to any of claims 1-4, wherein the determining the document tree corresponding to the original document comprises:
reading a binary stream of the original document;
decompressing the binary stream of the original document to obtain an extensible markup language (XML) structure of the original document;
and determining a document tree corresponding to the original document according to the XML structure of the original document.
8. An apparatus for generating a web page, the apparatus comprising: the device comprises an acquisition module, a determination module and a generation module;
the acquisition module is used for acquiring an original document to be converted, wherein the original document is generated by using text editing software;
the determining module is configured to determine a document tree corresponding to the original document, where the document tree is used to represent a hierarchical structure of the original document and content of the original document;
the generating module is used for generating an intermediate format document according to the document tree corresponding to the original document, wherein the intermediate format document represents the hierarchical structure of the original document and the content of the original document by using a preset syntactic format; and generating a target webpage corresponding to the original document according to the intermediate format document.
9. An electronic device, comprising: a processor, a storage medium and a bus, the storage medium storing machine-readable instructions executable by the processor, the processor and the storage medium communicating via the bus when the electronic device is operating, the processor executing the machine-readable instructions to perform the steps of the method according to any one of claims 1 to 7.
10. A storage medium, having stored thereon a computer program which, when being executed by a processor, carries out the steps of the method according to any one of claims 1 to 7.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011391660.9A CN112527291A (en) 2020-12-01 2020-12-01 Webpage generation method and device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011391660.9A CN112527291A (en) 2020-12-01 2020-12-01 Webpage generation method and device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN112527291A true CN112527291A (en) 2021-03-19

Family

ID=74996190

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011391660.9A Pending CN112527291A (en) 2020-12-01 2020-12-01 Webpage generation method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN112527291A (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102118439A (en) * 2011-01-19 2011-07-06 百度在线网络技术(北京)有限公司 Method and device for automatically processing document contents and editor
CN110879937A (en) * 2019-10-12 2020-03-13 平安国际智慧城市科技股份有限公司 Method and device for generating webpage from document, computer equipment and storage medium
CN111061975A (en) * 2019-12-13 2020-04-24 腾讯科技(深圳)有限公司 Method and device for processing irrelevant content in page
CN111143749A (en) * 2019-12-31 2020-05-12 中国银行股份有限公司 Webpage display method, device, equipment and storage medium
CN111159099A (en) * 2019-11-15 2020-05-15 杭州数梦工场科技有限公司 Online data generation method and device, electronic equipment and storage medium
CN111797336A (en) * 2020-07-07 2020-10-20 北京明略昭辉科技有限公司 Webpage parsing method and device, electronic equipment and medium

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113297425A (en) * 2021-06-22 2021-08-24 超凡知识产权服务股份有限公司 Document conversion method, device, server and storage medium
CN113297425B (en) * 2021-06-22 2023-09-12 超凡知识产权服务股份有限公司 Document conversion method, device, server and storage medium
CN113536182A (en) * 2021-07-12 2021-10-22 广州万孚生物技术股份有限公司 Method and device for generating long text webpage, electronic equipment and storage medium
CN113779931A (en) * 2021-08-31 2021-12-10 民商数字科技(深圳)有限公司 Knowledge base construction method based on Word and control method thereof
CN114722781A (en) * 2022-03-28 2022-07-08 慧之安信息技术股份有限公司 Method and device for converting streaming document into OFD document

Similar Documents

Publication Publication Date Title
US10067931B2 (en) Analysis of documents using rules
CN112527291A (en) Webpage generation method and device, electronic equipment and storage medium
KR101120301B1 (en) Persistent saving portal
JP2018097846A (en) Api learning
EP1672526A2 (en) File formats, methods, and computer program products for representing documents
CN111176650B (en) Parser generation method, search method, server, and storage medium
US20180260389A1 (en) Electronic document segmentation and relation discovery between elements for natural language processing
RU2579888C2 (en) Universal presentation of text to support various formats of documents and text subsystem
CN116955674B (en) Method and web device for generating graph database statement through LLM
US20130124969A1 (en) Xml editor within a wysiwyg application
CN111831384A (en) Language switching method and device, equipment and storage medium
CN112667563A (en) Document management and operation method and system
US20110078165A1 (en) Document-fragment transclusion
CN107590288B (en) Method and device for extracting webpage image-text blocks
CN107209779B (en) Storage and retrieval of structured content in an unstructured user-editable content repository
US11418622B2 (en) System and methods for web-based software application translation
CN113419721A (en) Web-based expression editing method, device, equipment and storage medium
CN110308907B (en) Data conversion method and device, storage medium and electronic equipment
CN112783482A (en) Visual form generation method, device, equipment and storage medium
US20040221228A1 (en) Method and apparatus for domain specialization in a document type definition
CN116521621A (en) Data processing method and device, electronic equipment and storage medium
CN110543641A (en) chinese and foreign language information comparison method and device
CN107423271B (en) Document generation method and device
US20120192046A1 (en) Generation of a source complex document to facilitate content access in complex document creation
JP5712496B2 (en) Annotation restoration method, annotation assignment method, annotation restoration program, and annotation restoration apparatus

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination