CN111241788B

CN111241788B - Document conversion method, device, equipment and storage medium based on linear model

Info

Publication number: CN111241788B
Application number: CN201911365591.1A
Authority: CN
Inventors: 王征徽
Original assignee: Ping An International Smart City Technology Co Ltd
Current assignee: Ping An International Smart City Technology Co Ltd
Priority date: 2019-12-26
Filing date: 2019-12-26
Publication date: 2023-05-30
Anticipated expiration: 2039-12-26
Also published as: CN111241788A

Abstract

The invention relates to the technical field of big data, and discloses a document conversion method, device, equipment and storage medium based on a linear model, which are used for supporting mutual conversion among multiple types of documents, shortening the document conversion time and improving the conversion efficiency. The method comprises the following steps: receiving a document conversion instruction, wherein the document conversion instruction comprises a target identifier, and the target identifier is used for indicating that an original document of a current type needs to be converted into a target document of a target type; generating a target document tree according to the original document and the target identifier, wherein the target document tree comprises view elements and content elements; invoking a preset linear model to convert the content elements of the target document tree into at least one target linear sequence; calling a preset box model to split the view elements of the target document tree into at least one target box; and generating a target document according to the at least one target box and the at least one target linear sequence, wherein the target document is of the target type.

Description

Document conversion method, device, equipment and storage medium based on linear model

Technical Field

The present invention relates to the field of big data technologies, and in particular, to a method, an apparatus, a device, and a storage medium for converting a document based on a linear model.

Background

Document conversion is a common requirement in daily office work. Different staff are familiar with different office tools and tend to work in the office software they know best, so documents need to be converted between formats to meet everyday office needs.

In an electronic government document system, WORD documents or forms are presented to the client as web page documents. During the approval process, the client is notified, through a portable document format (PDF) document generated by the system, to supplement materials offline, and the client finally signs and seals the document and uploads a scanned copy to the system for archiving.

Some techniques available on the market can convert between two different document types. These techniques often require reassembling the content according to a specific document structure, and different document types provide different document models. Becoming familiar with the related techniques and document models takes considerable effort, the process is error-prone, and the conversion efficiency is low.

Disclosure of Invention

The invention provides a document conversion method, device, equipment and storage medium based on a linear model, which are used for supporting the mutual conversion among various types of documents, reducing the document conversion time and improving the conversion efficiency.

A first aspect of an embodiment of the present invention provides a document conversion method based on a linear model, including: receiving a document conversion instruction, wherein the document conversion instruction comprises a target identifier, and the target identifier is used for indicating that an original document of a current type needs to be converted into a target document of a target type; generating a target document tree according to the original document and the target identifier, wherein the target document tree comprises view elements and content elements; invoking a preset linear model to convert the content elements of the target document tree into at least one target linear sequence; calling a preset box model to split the view elements of the target document tree into at least one target box; and generating a target document according to the at least one target box and the at least one target linear sequence, wherein the target document is of the target type.

Optionally, in a first implementation manner of the first aspect of the embodiment of the present invention, the generating a target document tree according to the original document and the target identifier, where the target document tree includes a view element and a content element includes: determining document elements included in the original document, wherein the document elements comprise view elements and content elements; and analyzing the view elements and the content elements according to a preset tree model to generate a target document tree.

Optionally, in a second implementation manner of the first aspect of the embodiment of the present invention, the parsing the view element and the content element according to a preset tree model to generate a target document tree includes: calling a preset storage format, wherein the preset storage format comprises a view format and a content format; obtaining view attributes of view elements according to the view format, wherein the view attributes comprise paragraph intervals, font colors, background colors and alignment modes; acquiring import content of a content element according to the content format, wherein the import content comprises characters, pictures and links; and determining the subordinate relation of the imported content according to the view attribute, and generating a target document tree.

Optionally, in a third implementation manner of the first aspect of the embodiment of the present invention, the invoking the preset linear model to convert the content element of the target document tree into at least one target linear sequence includes: when the content elements belong to different view elements, determining the view element corresponding to each content element in the target document tree to obtain at least two target view elements; determining the at least two target view elements as target parent nodes; determining a plurality of content elements corresponding to each target parent node as corresponding target child nodes; arranging a plurality of corresponding target child nodes belonging to the same target parent node and the corresponding target parent nodes according to a preset linear model to generate at least two target linear sequences, wherein each target linear sequence corresponds to one target view element; arranging the at least two target linear sequences according to the sequence of the at least two target view elements; or when the content elements belong to the same view element, determining a target view element in the target document tree; determining the target view element as a target parent node; determining content elements in the target document tree as target child nodes; and arranging a plurality of target child nodes and the target parent node according to a preset linear model to generate a target linear sequence.

Optionally, in a fourth implementation manner of the first aspect of the embodiment of the present invention, the calling a preset box model to split the view element of the target document tree into at least one target box includes: calling a preset box model to analyze view elements of the target document tree to obtain at least one target box; acquiring filling attributes of each target box in the view element, wherein the filling attributes comprise left-right spacing, up-down spacing and inner spacing of a target box model; arranging the at least one target box in an order from top to bottom, each target box occupying a single row; the positions of the target boxes in each row are adjusted according to the associated pattern of each target box.

Optionally, in a fifth implementation manner of the first aspect of the embodiment of the present invention, the generating a target document according to the at least one target box and the at least one target linear sequence, where the target document is a target type includes: acquiring a document template page of a target type and a target format; importing the at least one target box into the document template page according to the target format; converting the current format of the target document tree into the target format based on the target format; sequentially adding view elements in a target format and content elements in the target format into a target box according to the sequence of the at least one target linear sequence; a target document of a target type is generated.

Optionally, in a sixth implementation manner of the first aspect of the embodiment of the present invention, after the generating a target document according to the at least one target box and the at least one target linear sequence, the target document is of a target type, the method further includes: acquiring a set recording attribute, wherein the recording attribute is used for recording an original index position of a target view in the original document; and when the target view changes, restoring the target document into the original document according to the recording attribute.

A second aspect of an embodiment of the present invention provides a document conversion apparatus based on a linear model, including: a receiving unit configured to receive a document conversion instruction, where the document conversion instruction includes a target identifier, where the target identifier is used to indicate that an original document of a current type needs to be converted into a target document of a target type; a first generation unit, configured to generate a target document tree according to the original document and the target identifier, where the target document tree includes a view element and a content element; the conversion unit is used for calling a preset linear model to convert the content elements of the target document tree into at least one target linear sequence; the splitting unit is used for calling a preset box model to split the view element of the target document tree into at least one target box; and the second generation unit is used for generating a target document according to the at least one target box and the at least one target linear sequence, wherein the target document is of a target type.

Optionally, in a first implementation manner of the second aspect of the embodiment of the present invention, the first generating unit includes: a determining module, configured to determine document elements included in the original document, where the document elements include a view element and a content element; and the analysis generating module is used for analyzing the view elements and the content elements according to a preset tree model to generate a target document tree.

Optionally, in a second implementation manner of the second aspect of the embodiment of the present invention, the parsing generating module is specifically configured to: calling a preset storage format, wherein the preset storage format comprises a view format and a content format; obtaining view attributes of view elements according to the view format, wherein the view attributes comprise paragraph intervals, font colors, background colors and alignment modes; acquiring import content of a content element according to the content format, wherein the import content comprises characters, pictures and links; and determining the subordinate relation of the imported content according to the view attribute, and generating a target document tree.

Optionally, in a third implementation manner of the second aspect of the embodiment of the present invention, the conversion unit is specifically configured to: when the content elements belong to different view elements, determining the view element corresponding to each content element in the target document tree to obtain at least two target view elements; determining the at least two target view elements as target parent nodes; determining a plurality of content elements corresponding to each target parent node as corresponding target child nodes; arranging a plurality of corresponding target child nodes belonging to the same target parent node and the corresponding target parent nodes according to a preset linear model to generate at least two target linear sequences, wherein each target linear sequence corresponds to one target view element; arranging the at least two target linear sequences according to the sequence of the at least two target view elements; or when the content elements belong to the same view element, determining a target view element in the target document tree; determining the target view element as a target parent node; determining content elements in the target document tree as target child nodes; and arranging a plurality of target child nodes and the target parent node according to a preset linear model to generate a target linear sequence.

Optionally, in a fourth implementation manner of the second aspect of the embodiment of the present invention, the splitting unit is specifically configured to: calling a preset box model to analyze view elements of the target document tree to obtain at least one target box; acquiring filling attributes of each target box in the view element, wherein the filling attributes comprise left-right spacing, up-down spacing and inner spacing of a target box model; arranging the at least one target box in an order from top to bottom, each target box occupying a single row; the positions of the target boxes in each row are adjusted according to the associated pattern of each target box.

Optionally, in a fifth implementation manner of the second aspect of the embodiment of the present invention, the second generating unit is specifically configured to: acquiring a document template page of a target type and a target format; importing the at least one target box into the document template page according to the target format; converting the current format of the target document tree into the target format based on the target format; sequentially adding view elements in a target format and content elements in the target format into a target box according to the sequence of the at least one target linear sequence; a target document of a target type is generated.

Optionally, in a sixth implementation manner of the second aspect of the embodiment of the present invention, the document conversion device based on a linear model further includes: an obtaining unit, configured to obtain a set recording attribute, where the recording attribute is used to record an original index position of a target view in the original document; and the restoring unit is used for restoring the target document into the original document according to the recording attribute when the target view changes.

A third aspect of an embodiment of the present invention provides a linear model-based document conversion apparatus, including a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor implements the linear model-based document conversion method according to any one of the foregoing embodiments when executing the computer program.

A fourth aspect of the embodiments of the present invention provides a computer-readable storage medium storing a computer program, which when executed by a processor, implements the steps of the linear model-based document conversion method according to any one of the above embodiments.

In the technical scheme provided by the embodiment of the invention, a document conversion instruction is received, wherein the document conversion instruction comprises a target identifier, and the target identifier is used for indicating that an original document of a current type needs to be converted into a target document of a target type; a target document tree is generated according to the original document and the target identifier, wherein the target document tree comprises view elements and content elements; a preset linear model is invoked to convert the content elements of the target document tree into at least one target linear sequence; a preset box model is called to split the view elements of the target document tree into at least one target box; and a target document is generated according to the at least one target box and the at least one target linear sequence, wherein the target document is of the target type. According to the embodiment of the invention, a unified tree structure model is provided for identifying the original document, the document content is stored through the linear sequence model, and the stored document content can be converted into a document of any of multiple types, so that mutual conversion among multiple types of documents is supported, the document conversion time is shortened, and the conversion efficiency is improved.

Drawings

FIG. 1 is a schematic diagram of an embodiment of a document conversion method based on a linear model in an embodiment of the present invention;

FIG. 2 is a schematic diagram of another embodiment of a document conversion method based on a linear model in an embodiment of the present invention;

FIG. 3 is a schematic diagram of an embodiment of a document conversion apparatus based on a linear model according to an embodiment of the present invention;

FIG. 4 is a schematic diagram of another embodiment of a document conversion device based on a linear model according to an embodiment of the present invention;

FIG. 5 is a schematic diagram of an embodiment of a document conversion apparatus based on a linear model in an embodiment of the present invention.

Detailed Description

In order to enable those skilled in the art to better understand the present invention, embodiments of the present invention will be described below with reference to the accompanying drawings.

The terms "first," "second," "third," "fourth" and the like in the description and in the claims and in the above drawings, if any, are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that the embodiments described herein may be implemented in other sequences than those illustrated or otherwise described herein. Furthermore, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed or inherent to such process, method, article, or apparatus.

Referring to FIG. 1, a flowchart of a document conversion method based on a linear model according to an embodiment of the present invention specifically includes:

101. Receive a document conversion instruction, wherein the document conversion instruction comprises a target identifier, and the target identifier is used for indicating that an original document of a current type needs to be converted into a target document of a target type.

The server receives a document conversion instruction, wherein the document conversion instruction comprises a target identifier, and the target identifier is used for indicating that an original document of a current type needs to be converted into a target document of a target type. For example, the document conversion instruction may indicate converting a WORD type document into a portable document format (PDF) type document, converting a WORD type document into a hypertext markup language (HTML) type document, or converting an HTML type document into a PDF type document, which is not limited herein.

It should be noted that the present invention may also support mutual conversion between multiple types of documents, where the document conversion instruction may include multiple target identifiers, for example, the document conversion instruction includes both a WORD to HTML identifier and an HTML to PDF identifier, and may sequentially perform conversion processing according to the multiple target identifiers. Other similar identifiers may also be included, and embodiments of the invention are not limited to a particular type of document.
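As a minimal sketch of such an instruction, the following Python fragment models a request that carries several target identifiers and lists the conversions in the order they would be processed. The class names `TargetIdentifier` and `ConversionInstruction`, the function `plan_conversions`, and the field names are hypothetical and are not defined by the embodiment.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class TargetIdentifier:
    """Hypothetical identifier: convert `current_type` into `target_type`."""
    current_type: str
    target_type: str


@dataclass
class ConversionInstruction:
    """Hypothetical instruction carrying one or more target identifiers."""
    document_name: str
    identifiers: List[TargetIdentifier]


def plan_conversions(instruction: ConversionInstruction) -> List[Tuple[str, str]]:
    """Return the (current_type, target_type) steps in the listed order."""
    return [(i.current_type, i.target_type) for i in instruction.identifiers]


# An instruction with both a WORD-to-HTML and an HTML-to-PDF identifier.
instruction = ConversionInstruction(
    "report",
    [TargetIdentifier("WORD", "HTML"), TargetIdentifier("HTML", "PDF")],
)
steps = plan_conversions(instruction)
```

Here `steps` lists the two conversions in sequence, matching the sequential processing described above.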

It is to be understood that the execution subject of the present invention may be a document conversion device based on a linear model, and may also be a terminal or a server, which is not limited herein. The embodiment of the invention is described by taking a server as the execution subject as an example.

102. Generate a target document tree according to the original document and the target identifier, the target document tree including view elements and content elements.

Specifically, the server determines document elements included in the original document, wherein the document elements include view elements and content elements; the server then parses the view elements and the content elements according to a preset tree model to generate a target document tree. The parsing process comprises the following steps: the server calls a preset storage format, which comprises a view format and a content format; the server acquires view attributes of the view elements according to the view format, wherein the view attributes comprise paragraph spacing, font color, background color and alignment mode; the server acquires imported content of the content elements according to the content format, wherein the imported content comprises text, pictures and links; the server determines the subordinate relation of the imported content according to the view attributes, and generates the target document tree.

When the number of target identifiers is greater than 1, more than one target document tree is generated, and the number of target document trees is the same as the number of target identifiers.

It should be noted that, firstly, it must be made clear which kinds of elements can be included under the root of the document; in the embodiment of the present invention there are two kinds: view elements and content elements. Secondly, the subordinate relation between parent nodes and leaf nodes must be defined. In the embodiment of the invention, a content element can only be a leaf node of a view element and cannot exist on its own; no parent-child relation exists between content elements, whereas a parent-child relation can exist between view elements.
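The tree rules above can be sketched as two small Python classes. This is an illustrative sketch only: the names `ViewElement`, `ContentElement`, `add` and `count_content` are hypothetical, and modelling the document root as a view element is an assumption made for brevity.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Union


@dataclass
class ContentElement:
    """Leaf node: it has no children, so it can never be a parent."""
    kind: str   # e.g. "text", "picture", "link"
    value: str


@dataclass
class ViewElement:
    """Inner node: may hold nested views and content elements."""
    name: str
    attributes: Dict[str, str] = field(default_factory=dict)
    children: List[Union["ViewElement", ContentElement]] = field(default_factory=list)

    def add(self, child):
        # Only view elements and content elements may appear in the tree.
        if not isinstance(child, (ViewElement, ContentElement)):
            raise TypeError("child must be a ViewElement or ContentElement")
        self.children.append(child)
        return child


def count_content(view: ViewElement) -> int:
    """Count the content leaves below a view, descending through nested views."""
    total = 0
    for child in view.children:
        if isinstance(child, ViewElement):
            total += count_content(child)
        else:
            total += 1
    return total


# Assumption: the document root is itself modelled as a view element.
root = ViewElement("document")
section = root.add(ViewElement("section", {"alignment": "left"}))
section.add(ContentElement("text", "Hello"))
section.add(ContentElement("picture", "logo.png"))
root.add(ViewElement("footer")).add(ContentElement("link", "https://example.com"))
```

Because `ContentElement` has no `children` field, a content element cannot become the parent of another element, while views can nest freely.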

It will be appreciated that if the original document is a WORD document, it may be converted into XML format (the parsing format provided by the official WORD API is XML) or into another format. The embodiment of the present invention places no particular requirement on the data format, but the data structure must be a tree structure.
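To illustrate the tree-structure requirement, the sketch below reads a small XML string into a nested dictionary using Python's standard `xml.etree.ElementTree` module. The XML shown is a made-up, simplified layout chosen for illustration; it is not the actual XML produced for WORD documents, and the helper name `to_tree` is hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified XML; real WORD XML is far more detailed.
XML = (
    "<document>"
    "<view name='A'><text>hello</text><picture>logo.png</picture></view>"
    "</document>"
)


def to_tree(node: ET.Element) -> dict:
    """Convert an XML element into a nested dict that mirrors the tree."""
    return {
        "tag": node.tag,
        "attributes": dict(node.attrib),
        "text": (node.text or "").strip(),
        "children": [to_tree(child) for child in node],
    }


tree = to_tree(ET.fromstring(XML))
```

Any other source format would serve equally well, provided it can be read into a comparable tree.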

103. Call a preset linear model to convert the content elements of the target document tree into at least one target linear sequence.

The server invokes a preset linear model to convert the content elements of the target document tree into at least one target linear sequence. A linear sequence stores elements of the same type at the same level, and the coordinates of each element in the document are represented by an offset, typically beginning at zero. Thus, the corresponding document element can be found in the document according to its offset from the root element. For example, an article-like document (a WORD type document) typically has different chapters arranged in sequence, i.e., adjacent chapters are closely connected, and the end of one chapter leads directly into the next; the same principle applies to paragraphs. Based on this analysis, the document can be considered to have a linear structure. The document is parsed in a top-down manner, and its data structure is analyzed and marked during parsing: elements of the same type are marked as sibling nodes and are stored in order as a linear sequence according to their linear relation.
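The offset-based addressing can be sketched with nested Python lists, assuming a two-level document of chapters and paragraphs. The sample data and the helper name `locate` are hypothetical.

```python
from typing import List

# Assumption: a document is a sequence of chapters, and each chapter is a
# sequence of paragraphs; both levels are addressed by zero-based offsets.
Document = List[List[str]]

document: Document = [
    ["Chapter 1, paragraph 1", "Chapter 1, paragraph 2"],
    ["Chapter 2, paragraph 1"],
]


def locate(doc: Document, chapter_offset: int, paragraph_offset: int) -> str:
    """Find a paragraph by its chapter offset and its offset in that chapter."""
    return doc[chapter_offset][paragraph_offset]


first_of_second_chapter = locate(document, 1, 0)
```

Sibling elements of the same type sit in one sequence, so a pair of offsets from the root is enough to reach any paragraph in this simplified layout.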

It should be noted that the number of linear sequences corresponds to the number of view elements. The content elements may all belong to the same view element or may belong to different view elements, and the conversion process of the linear sequences differs accordingly. When the content elements belong to different view elements, the server determines the view element corresponding to each content element in the target document tree to obtain at least two target view elements; the server determines the at least two target view elements as target parent nodes; the server determines a plurality of content elements corresponding to each target parent node as corresponding target child nodes; the server arranges a plurality of corresponding target child nodes belonging to the same target parent node and the corresponding target parent nodes according to a preset linear model to generate at least two target linear sequences, wherein each target linear sequence corresponds to one target view element; the server arranges the at least two target linear sequences according to the sequence of the at least two target view elements. Alternatively, when the content elements belong to the same view element, the server determines a target view element in the target document tree; the server determines the target view element as a target parent node; the server determines content elements in the target document tree as target child nodes; the server arranges a plurality of target child nodes and the target parent node according to a preset linear model to generate a target linear sequence.

For example, when the following content arrangement occurs in a document: text 1 -> table 1 -> picture -> text 2 -> text 3 -> table 2, the linear model handles the following two scenarios, respectively:

Scenario 1: if the content elements in the document all belong to the same view element A, the following linear sequence is obtained: view A {text 1 -> table 1 -> picture -> text 2 -> text 3 -> table 2}.

Scenario 2: if the content elements of the document belong to two different view elements, the following linear sequence is obtained: view A {text 1 -> table 1 -> picture} -> view B {text 2 -> text 3 -> table 2}. In this case, views A and B are first arranged in sequence, and then the elements within views A and B are each arranged according to the linear sequence model, so that the data can be organized into its initial state.
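The two cases above can be reproduced with a short Python sketch. The function name `linearize` and the tuple-based representation of a view (a name plus its ordered content elements) are assumptions made for illustration.

```python
from typing import List, Tuple

# Assumption: a view is a (name, ordered content elements) pair.
View = Tuple[str, List[str]]


def linearize(views: List[View]) -> str:
    """Render one linear sequence per view, chained in view order."""
    sequences = []
    for name, contents in views:
        joined = " -> ".join(contents)
        sequences.append(f"view {name} {{{joined}}}")
    return " -> ".join(sequences)


# All content elements belong to the same view element A.
single_view = linearize(
    [("A", ["text 1", "table 1", "picture", "text 2", "text 3", "table 2"])]
)

# The content elements belong to two different view elements.
two_views = linearize(
    [
        ("A", ["text 1", "table 1", "picture"]),
        ("B", ["text 2", "text 3", "table 2"]),
    ]
)
```

In the second call the views are ordered first and the contents of each view are ordered within it, which mirrors the two-level arrangement described above.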

104. Call a preset box model to split the view elements of the target document tree into at least one target box.

The server invokes a preset box model to split the view elements of the target document tree into at least one target box, wherein each target box occupies a single row. The box model treats the target document as a box, which encapsulates the actual content and the elements between the actual content, and which includes: spacing, borders, and actual content. Specifically, the server calls a preset box model to analyze view elements of the target document tree to obtain at least one target box; the server acquires filling attributes of each target box in the view element, wherein the filling attributes comprise left-right spacing, up-down spacing and inner spacing of the target box model; the server arranges at least one target box in an order from top to bottom, and each target box occupies one row independently; the server adjusts the positions of the target boxes in each row according to the associated pattern of each target box.

For example, for a WORD document: a WORD document is typically composed of a series of chapters. Within a chapter, the position of the content is typically controlled by tab stops, which control the left and right indentation of the content in the WORD page and can be used to place the content at the left, middle or right; this corresponds to the left-right spacing of the box model. Between chapters, the typesetting of the content is controlled through the spacing before and after a paragraph, which places the content at the top, middle or bottom of the document; this corresponds to the up-down spacing of the box model. The typesetting between pieces of content, namely the inner spacing of the box model, can be controlled by setting the paragraph line spacing.

It should be noted that the box model regards the target document as being composed of individual boxes. Unless specially set, a box element by default always occupies a whole row, and its width is the width of the document page; elements before and after it can by default only be arranged above or below it. The actual content is the center of the box model and presents the main information of the box; the filling attributes control the spacing between the actual content and the border of the box in each direction. The box model can be understood as a box containing content. Taking a WORD document as an example: a WORD document page is regarded as a box approximately the size of an A4 sheet; the box is then filled with content, and each piece of content is in turn regarded as a small box. The box model can be used to arrange content according to set rules: each box occupies a row by default, and the filling attributes of a box uniquely determine the position of the next box.
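A minimal sketch of this stacking rule follows, assuming each box carries left-right spacing, up-down spacing and inner spacing as plain numbers. The `Box` class, the `layout` function, and the page width of 210 units (roughly the width of an A4 sheet in millimetres) are illustrative assumptions, not values taken from the embodiment.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Box:
    """Hypothetical box with its filling attributes."""
    name: str
    content_height: int
    left: int = 0      # left spacing
    right: int = 0     # right spacing
    top: int = 0       # spacing above the box
    bottom: int = 0    # spacing below the box
    padding: int = 0   # inner spacing around the actual content


def layout(boxes: List[Box], page_width: int) -> List[Tuple[str, int, int, int]]:
    """Stack boxes top to bottom, one per row; return (name, x, y, width)."""
    placed = []
    cursor_y = 0
    for box in boxes:
        y = cursor_y + box.top
        width = page_width - box.left - box.right
        placed.append((box.name, box.left, y, width))
        # The next box starts below this box's content, padding and bottom spacing.
        cursor_y = y + box.content_height + 2 * box.padding + box.bottom
    return placed


placed = layout(
    [
        Box("title", content_height=10, left=20, right=20, top=5, bottom=5),
        Box("body", content_height=50, left=20, right=20, padding=2),
    ],
    page_width=210,
)
```

Each box's vertical position depends only on the boxes above it, which reflects the statement that the filling attributes of one box determine where the next box is placed.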

105. Generate a target document according to the at least one target box and the at least one target linear sequence, wherein the target document is of the target type.

The server generates a target document according to at least one target box and at least one target linear sequence, wherein the target document is of a target type. Specifically, a server acquires a document template page of a target type and a target format; the server imports at least one target box into a document template page according to a target format; the server converts the current format of the target document tree into a target format based on the target format; the server adds view elements in the target format and content elements in the target format into the target box in sequence according to at least one target linear sequence; the server generates a target document of the target type.

It should be noted that the main function of the linear model is not to construct the target document directly, but to add document elements (view elements and content elements) to the document page one by one in order. The working principle of the unified document is that elements with the same parent node are inserted into a new document in sequence according to the document tree, which yields a document with the same structure; this document can be an HTML type document or another type of document. In the actual generation process, a translation step is needed, namely, translating the data types and style definitions of the unified document structure into the data types and style definitions of the specified document type. That is, a document only needs to be parsed into the defined format once, and it is then translated into the desired target document via the translation interface.
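The translation step can be sketched for an HTML target as follows. The mapping from unified content kinds to HTML tags, the use of an alignment attribute as the only view style, and the function names `translate_content` and `to_html` are assumptions for illustration; only `html.escape` is a standard-library call.

```python
from html import escape
from typing import List, Tuple

# Assumption: a view is (alignment, ordered (kind, value) content elements).
Content = Tuple[str, str]
View = Tuple[str, List[Content]]


def translate_content(kind: str, value: str) -> str:
    """Translate one unified content element into its HTML form."""
    if kind == "text":
        return f"<p>{escape(value)}</p>"
    if kind == "picture":
        return f'<img src="{escape(value)}">'
    if kind == "link":
        return f'<a href="{escape(value)}">{escape(value)}</a>'
    raise ValueError(f"unsupported content kind: {kind}")


def to_html(views: List[View]) -> str:
    """Insert views, and the contents of each view, in linear order."""
    parts = []
    for alignment, contents in views:
        inner = "".join(translate_content(kind, value) for kind, value in contents)
        parts.append(f'<div style="text-align: {escape(alignment)}">{inner}</div>')
    return "".join(parts)


html_output = to_html([("center", [("text", "a < b"), ("picture", "logo.png")])])
```

Supporting another target type would, under this sketch, only require a different translation function, while the parsing into the unified structure stays unchanged.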

According to the embodiment of the invention, the unified tree structure model is provided for identifying the original document, the document content is stored through the linear sequence model, the stored document content is converted into any document of a plurality of types, the mutual conversion among the documents of the plurality of types is supported, the document conversion time is shortened, and the conversion efficiency is improved.

Referring to fig. 2, another flowchart of a document conversion method based on a linear model according to an embodiment of the present invention specifically includes:

201. Receive a document conversion instruction, wherein the document conversion instruction comprises a target identifier, and the target identifier is used for indicating that the original document of the current type needs to be converted into the target document of the target type.

202. Generate a target document tree from the original document and the target identifier, the target document tree including view elements and content elements.

203. Call a preset linear model to convert the content elements of the target document tree into at least one target linear sequence.

204. Call a preset box model to split the view elements of the target document tree into at least one target box.

205. Generate a target document according to the at least one target box and the at least one target linear sequence, wherein the target document is of the target type.

206. Acquire the set recording attribute, wherein the recording attribute is used for recording the original index position of the target view in the original document.

The server acquires the set recording attribute, wherein the recording attribute is used for recording the original index position of the target view in the original document. To support customization of the target document, a recording attribute can be added to record the index position of each view in the original document, so that even if a view of the target document is modified, it can still be restored to the original document during conversion.

It should be noted that the relevant attributes of the original document corresponding to each generated target document can be retained, so that if the target document turns out to be wrong, the original document can be restored, thereby reducing loss.

207. When the target view changes, the target document is restored to the original document according to the recording attribute.

When the target view changes, the server restores the target document to the original document according to the recording attribute.
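A minimal sketch of the recording attribute follows. It only restores the original order of views from a recorded index; the attribute name `origin_index`, the dictionary representation of a view, and both function names are assumptions for illustration, and a full restoration of the original document would need more than ordering.

```python
def record_positions(views):
    """Attach a hypothetical 'origin_index' recording attribute to each view."""
    return [dict(view, origin_index=i) for i, view in enumerate(views)]


def restore_order(views):
    """Return the views sorted back into their recorded original positions."""
    return sorted(views, key=lambda view: view["origin_index"])
```

If the views of the target document are later reordered, `restore_order` puts them back in the positions they held in the original document.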

The document conversion method based on the linear model in the embodiment of the present invention is described above. The document conversion apparatus based on the linear model in the embodiment of the present invention is described below. Referring to fig. 3, an embodiment of the document conversion apparatus based on the linear model in the embodiment of the present invention includes:

a receiving unit 301, configured to receive a document conversion instruction, where the document conversion instruction includes a target identifier, where the target identifier is used to indicate that an original document of a current type needs to be converted into a target document of a target type;

a first generating unit 302, configured to generate a target document tree according to the original document and the target identifier, where the target document tree includes a view element and a content element;

a conversion unit 303, configured to invoke a preset linear model to convert the content element of the target document tree into at least one target linear sequence;

a splitting unit 304, configured to invoke a preset box model to split the view element of the target document tree into at least one target box;

a second generating unit 305, configured to generate a target document according to the at least one target box and the at least one target linear sequence, where the target document is of the target type.

In the embodiment of the invention, when the batch import data is abnormal, abnormal data are removed and normal data of the same batch data are re-imported, so that the whole rollback of the same batch data is avoided, and the data import efficiency is improved.

Referring to fig. 4, another embodiment of a document conversion apparatus based on a linear model according to an embodiment of the present invention includes:

Optionally, the first generating unit 302 includes:

a determining module 3021 configured to determine document elements included in the original document, where the document elements include a view element and a content element;

a parsing generation module 3022, configured to parse the view element and the content element according to a preset tree model, so as to generate a target document tree.

Optionally, the parsing generation module 3022 is specifically configured to:

calling a preset storage format, wherein the preset storage format comprises a view format and a content format; obtaining view attributes of view elements according to the view format, wherein the view attributes comprise paragraph intervals, font colors, background colors and alignment modes; acquiring import content of a content element according to the content format, wherein the import content comprises characters, pictures and links; and determining the subordinate relation of the imported content according to the view attribute, and generating a target document tree.
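One way to picture the resulting target document tree is sketched below in Python. The classes `ViewElement` and `ContentElement`, the function `build_tree`, and the input shape are hypothetical; the source only states that view attributes and imported content are obtained and that the subordinate relation is determined.

```python
from dataclasses import dataclass, field


@dataclass
class ContentElement:
    kind: str   # assumed values: "text", "picture", or "link"
    value: str


@dataclass
class ViewElement:
    attributes: dict = field(default_factory=dict)  # e.g. alignment, font color
    children: list = field(default_factory=list)    # subordinate elements


def build_tree(parsed_items):
    """Build a two-level tree from parsed items.

    parsed_items: list of (view_attributes, [(kind, value), ...]) pairs,
    one pair per view element, in document order.
    """
    root = ViewElement()
    for attributes, contents in parsed_items:
        view = ViewElement(attributes=dict(attributes))
        view.children = [ContentElement(kind, value) for kind, value in contents]
        root.children.append(view)
    return root
```

In this sketch each content element is attached to the view element it was parsed with, which is one simple reading of determining the subordinate relation of the imported content.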

Optionally, the conversion unit 303 is specifically configured to:

when the content elements belong to different view elements, determining the view element corresponding to each content element in the target document tree to obtain at least two target view elements; determining the at least two target view elements as target parent nodes; determining a plurality of content elements corresponding to each target parent node as corresponding target child nodes; arranging the plurality of corresponding target child nodes belonging to the same target parent node, together with the corresponding target parent node, according to a preset linear model to generate at least two target linear sequences, wherein each target linear sequence corresponds to one target view element; and arranging the at least two target linear sequences according to the order of the at least two target view elements; or, when the content elements belong to the same view element, determining a target view element in the target document tree; determining the target view element as a target parent node; determining the content elements in the target document tree as target child nodes; and arranging the plurality of target child nodes and the target parent node according to the preset linear model to generate a target linear sequence.
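The grouping of content elements under their parent view elements can be sketched as follows. This is an illustrative reading, not the patented linear model: the function name `to_linear_sequences`, the `(view_id, content)` input pairs, and the choice to place the parent node first in each sequence are assumptions.

```python
def to_linear_sequences(content_elements):
    """Group content elements by their parent view element.

    content_elements: list of (view_id, content) pairs in document order.
    Returns one linear sequence per view element, ordered by the first
    appearance of each view; each sequence starts with its parent node
    followed by that parent's child nodes.
    """
    sequences = {}
    for view_id, content in content_elements:
        # dicts preserve insertion order, so views keep their document order
        sequences.setdefault(view_id, [view_id]).append(content)
    return list(sequences.values())
```

When all content elements share one view element the function returns a single sequence, and when they belong to different view elements it returns one sequence per view, which corresponds to the two cases above.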

Optionally, the splitting unit 304 is specifically configured to:

calling a preset box model to analyze the view elements of the target document tree to obtain at least one target box; acquiring filling attributes of each target box in the view element, wherein the filling attributes comprise left-right spacing, up-down spacing and inner spacing of a target box model; arranging the at least one target box in order from top to bottom, each target box occupying a single row; and adjusting the positions of the target boxes in each row according to the associated pattern of each target box.

Optionally, the second generating unit 305 is specifically configured to:

acquiring a document template page of a target type and a target format; importing the at least one target box into the document template page according to the target format; converting the current format of the target document tree into the target format; sequentially adding view elements in the target format and content elements in the target format into a target box according to the order of the at least one target linear sequence; and generating a target document of the target type.

Optionally, the document conversion apparatus based on the linear model further comprises:

an obtaining unit 306, configured to obtain a set recording attribute, where the recording attribute is used to record an original index position of the target view in the original document;

a restoring unit 307, configured to restore the target document to the original document according to the recording attribute when the target view changes.

In the embodiment of the invention, a target data set is acquired, wherein the target data set comprises a plurality of pieces of data to be imported, and the data to be imported is business data that needs to be imported into a database; a data format of the target data set is determined; each piece of data to be imported is mapped, through a preset class, to a plurality of attributes of the preset class to generate a plurality of instances, wherein each instance comprises one piece of data to be imported; the plurality of instances are imported into the database according to the data format; the importing process of the plurality of instances is monitored through a preset function, and the importing results are stored in the corresponding instances; and when at least one instance is abnormal in the importing process, the normal data in the other instances of the target data set is imported into the database through a preset classification algorithm.

The document conversion apparatus based on the linear model in the embodiment of the present invention is described in detail above with reference to fig. 3 and fig. 4 from the perspective of modularized functional entities. The document conversion device based on the linear model in the embodiment of the present invention is described in detail below from the perspective of hardware processing.

Fig. 5 is a schematic structural diagram of a document conversion device based on a linear model according to an embodiment of the present invention. The linear model-based document conversion device 500 may vary considerably depending on configuration or performance, and may include one or more central processing units (CPUs) 501 (e.g., one or more processors), a memory 509, and one or more storage media 508 (e.g., one or more mass storage devices) storing application programs 507 or data 506. The memory 509 and the storage medium 508 may be transitory or persistent storage. The program stored on the storage medium 508 may include one or more modules (not shown), each of which may include a series of instruction operations for the linear model-based document conversion device. Further, the processor 501 may be configured to communicate with the storage medium 508 and to execute, on the linear model-based document conversion device 500, the series of instruction operations in the storage medium 508.

The linear model-based document conversion device 500 may also include one or more power supplies 502, one or more wired or wireless network interfaces 503, one or more input/output interfaces 504, and/or one or more operating systems 505, such as Windows Server, Mac OS X, Unix, Linux, FreeBSD, and the like. It will be appreciated by those skilled in the art that the structure shown in FIG. 5 does not limit the linear model-based document conversion device, which may include more or fewer components than illustrated, may combine certain components, or may arrange the components differently. The processor 501 may perform the functions of the first generating unit 302, the converting unit 303, the splitting unit 304, the second generating unit 305, the acquiring unit 306, and the restoring unit 307 in the above-described embodiments.

The respective constituent elements of the document conversion device based on the linear model are specifically described below with reference to fig. 5:

The processor 501 is the control center of the linear model-based document conversion device, and can perform processing according to the set linear model-based document conversion method. The processor 501 connects the various parts of the entire linear model-based document conversion device using various interfaces and lines, and performs the various functions of the device and processes its data by running or executing the software programs and/or modules stored in the memory 509 and calling the data stored in the memory 509, thereby supporting inter-conversion between various types of documents, reducing document conversion time, and improving conversion efficiency. The storage medium 508 and the memory 509 are both carriers for storing data; in the embodiment of the present invention, the storage medium 508 may refer to an internal memory with a small storage capacity but a fast speed, and the memory 509 may be an external memory with a large storage capacity but a slow storage speed.

The memory 509 may be used to store software programs and modules, and the processor 501 performs the various functional applications and data processing of the linear model-based document conversion device 500 by running the software programs and modules stored in the memory 509. The memory 509 may mainly comprise a program storage area and a data storage area, wherein the program storage area may store an operating system, an application program required for at least one function (such as calling a preset box model to split the view element of the target document tree into at least one target box), and the like; the data storage area may store data created through the use of the linear model-based document conversion device (such as a target document tree), and the like. In addition, the memory 509 may include high-speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid-state storage device. The linear model-based document conversion method program and the received data stream provided in the embodiment of the present invention are stored in the memory 509, and the processor 501 calls them from the memory 509 when needed.

When the computer instructions are loaded and executed on a computer, the flows or functions according to the embodiments of the present invention are produced in whole or in part. The computer may be a general purpose computer, a special purpose computer, a computer network, or other programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another computer-readable storage medium; for example, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center by wired (e.g., coaxial cable, optical fiber, twisted pair) or wireless (e.g., infrared, radio, microwave) means. The computer-readable storage medium may be any available medium that a computer can access, or a data storage device, such as a server or a data center, that integrates one or more available media. The available medium may be a magnetic medium (e.g., a floppy disk, a hard disk, a magnetic tape), an optical medium (e.g., an optical disk), or a semiconductor medium (e.g., a solid state disk (SSD)), or the like.

It will be clear to those skilled in the art that, for convenience and brevity of description, specific working procedures of the above-described systems, apparatuses and units may refer to corresponding procedures in the foregoing method embodiments, which are not repeated herein.

In the several embodiments provided in the present invention, it should be understood that the disclosed systems, devices, and methods may be implemented in other manners. For example, the apparatus embodiments described above are merely illustrative. The division into units is merely a division by logical function, and other divisions are possible in an actual implementation; for instance, multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. In addition, the mutual coupling, direct coupling, or communication connection shown or discussed may be an indirect coupling or communication connection through some interfaces, devices, or units, and may be electrical, mechanical, or in another form.

The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; they may be located in one place or distributed across a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.

In addition, each functional unit in the embodiment of the present invention may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit. The integrated units may be implemented in hardware or in software functional units.

The integrated units, if implemented in the form of software functional units and sold or used as stand-alone products, may be stored in a computer-readable storage medium. Based on such understanding, the technical solution of the present invention may be embodied, in essence or in whole or in part, in the form of a software product stored in a storage medium, including instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or part of the steps of the method according to the embodiments of the present invention. The aforementioned storage medium includes: a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disk, or other various media capable of storing program code.

The above embodiments are only for illustrating the technical solution of the present invention, and not for limiting the same; although the invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims

1. A document conversion method based on a linear model, comprising:

receiving a document conversion instruction, wherein the document conversion instruction comprises a target identifier, and the target identifier is used for indicating that an original document of a current type needs to be converted into a target document of a target type;

generating a target document tree according to the original document and the target identifier, wherein the target document tree comprises view elements and content elements;

invoking a preset linear model to convert the content elements of the target document tree into at least one target linear sequence;

wherein said invoking a preset linear model converts said content elements of the target document tree into at least one target linear sequence comprising:

when the content elements belong to different view elements, determining the view element corresponding to each content element in the target document tree to obtain at least two target view elements;

determining the at least two target view elements as target parent nodes;

determining a plurality of content elements corresponding to each target parent node as corresponding target child nodes;

arranging a plurality of corresponding target child nodes belonging to the same target parent node and the corresponding target parent nodes according to a preset linear model to generate at least two target linear sequences, wherein each target linear sequence corresponds to one target view element;

arranging the at least two target linear sequences according to the sequence of the at least two target view elements;

or,

determining a target view element in the target document tree when the content elements belong to the same view element;

determining the target view element as a target parent node;

determining content elements in the target document tree as target child nodes;

arranging a plurality of target child nodes and the target parent node according to the preset linear model to generate a target linear sequence;

calling a preset box model to split the view elements of the target document tree into at least one target box;

wherein the calling a preset box model to split the view elements of the target document tree into at least one target box comprises:

calling a preset box model to analyze view elements of the target document tree to obtain at least one target box;

acquiring filling attributes of each target box in the view element, wherein the filling attributes comprise left-right spacing, up-down spacing and inner spacing of a target box model;

arranging the at least one target box in an order from top to bottom, each target box occupying a single row;

adjusting the positions of the target boxes in each row according to the related pattern of each target box;

generating a target document according to the at least one target box and the at least one target linear sequence, wherein the target document is of a target type;

wherein the generating a target document according to the at least one target box and the at least one target linear sequence, the target document being of a target type, comprises:

acquiring a document template page of a target type and a target format;

importing the at least one target box into the document template page according to the target format;

converting the current format of the target document tree into the target format based on the target format;

sequentially adding view elements in a target format and content elements in the target format into a target box according to the sequence of the at least one target linear sequence;

a target document of a target type is generated.

2. The linear model-based document conversion method of claim 1, wherein the generating a target document tree according to the original document and the target identifier, the target document tree including view elements and content elements, comprises:

determining document elements included in the original document, wherein the document elements comprise view elements and content elements;

and analyzing the view elements and the content elements according to a preset tree model to generate a target document tree.

3. The linear model-based document conversion method according to claim 2, wherein the parsing the view element and the content element according to a preset tree model to generate a target document tree comprises:

calling a preset storage format, wherein the preset storage format comprises a view format and a content format;

obtaining view attributes of view elements according to the view format, wherein the view attributes comprise paragraph intervals, font colors, background colors and alignment modes;

acquiring import content of a content element according to the content format, wherein the import content comprises characters, pictures and links;

and determining the subordinate relation of the imported content according to the view attribute, and generating a target document tree.

4. The linear model-based document conversion method according to any one of claims 1-3, wherein after said generating a target document from said at least one target box and said at least one target linear sequence, said target document being of a target type, said method further comprises:

acquiring a set recording attribute, wherein the recording attribute is used for recording an original index position of a target view in the original document;

and when the target view changes, restoring the target document into the original document according to the recording attribute.

5. A document conversion apparatus based on a linear model, comprising:

a receiving unit configured to receive a document conversion instruction, where the document conversion instruction includes a target identifier, where the target identifier is used to indicate that an original document of a current type needs to be converted into a target document of a target type;

a first generating unit, configured to generate a target document tree according to the original document and the target identifier, where the target document tree includes a view element and a content element;

a conversion unit, configured to invoke a preset linear model to convert the content elements of the target document tree into at least one target linear sequence;

determining the at least two target view elements as target parent nodes;

or,

determining the target view element as a target parent node;

determining content elements in the target document tree as target child nodes;

a splitting unit, configured to invoke a preset box model to split the view element of the target document tree into at least one target box;

a second generating unit, configured to generate a target document according to the at least one target box and the at least one target linear sequence, where the target document is of the target type;

acquiring a document template page of a target type and a target format;

a target document of a target type is generated.

6. A linear model-based document conversion device, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor implements the linear model-based document conversion method according to any one of claims 1-4 when executing the computer program.

7. A computer readable storage medium, characterized in that the computer readable storage medium stores a computer program, which when executed by a processor implements the linear model based document conversion method according to any of claims 1-4.