CN107977346B - PDF document editing method and terminal equipment - Google Patents

Publication number
CN107977346B
Authority
CN
China
Prior art keywords
text
line
data structure
editing
word
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201711182015.4A
Other languages
Chinese (zh)
Other versions
CN107977346A (en
Inventor
李譞
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Yitu Software Co.,Ltd.
Original Assignee
Shenzhen Yitu Software Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Yitu Software Co ltd
Priority to CN201711182015.4A
Publication of CN107977346A
Application granted
Publication of CN107977346B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/10 Text processing
    • G06F40/166 Editing, e.g. inserting or deleting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01 Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/048 Interaction techniques based on graphical user interfaces [GUI]
    • G06F3/0481 Interaction techniques based on graphical user interfaces [GUI] based on specific properties of the displayed interaction object or a metaphor-based environment, e.g. interaction with desktop elements like windows or icons, or assisted by a cursor's changing behaviour or appearance
    • G06F3/04812 Interaction techniques based on cursor appearance or behaviour, e.g. being affected by the presence of displayed objects
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01 Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/048 Interaction techniques based on graphical user interfaces [GUI]
    • G06F3/0484 Interaction techniques based on graphical user interfaces [GUI] for the control of specific functions or operations, e.g. selecting or manipulating an object, an image or a displayed text element, setting a parameter value or selecting a range
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/10 Text processing
    • G06F40/12 Use of codes for handling textual entities
    • G06F40/14 Tree-structured documents
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/10 Text processing
    • G06F40/189 Automatic justification

Abstract

The invention is applicable to the technical field of communication and provides a PDF document editing method and a terminal device. The method comprises the following steps: analyzing a PDF document to obtain the logical relationship of the text object contents in the PDF document; constructing, according to the logical relationship, a first data structure and a cursor data structure corresponding to the first data structure; receiving an editing instruction; editing the target text object according to the editing instruction and the cursor data structure, and obtaining a second data structure according to the editing result and the first data structure; re-typesetting the text objects in the PDF document according to the second data structure; and saving the re-typeset result into the PDF document. The invention can complete the editing of a PDF document with high fidelity.

Description

PDF document editing method and terminal equipment
Technical Field
The invention belongs to the technical field of communication, and particularly relates to a PDF document editing method and terminal equipment.
Background
The Portable Document Format (PDF) is a common fixed-layout document format in which the page content is fixed. It has the advantages of being cross-platform, preserving the original formatting of a file, and being an open document format standard. However, because page content in a PDF document is positioned by the coordinates of the text objects within the page, only the precise positions of the text objects are recorded and there is no relationship between the text objects, so a PDF document is not easy to edit.
At present, when a PDF document is edited, the content of a page is usually converted into streaming (reflowable) document data, the streaming document is edited, and the content is converted back into a page of the PDF document after editing is completed. With this round-trip conversion it is difficult to complete the editing with high fidelity.
Disclosure of Invention
In view of this, embodiments of the present invention provide a PDF document editing method and a terminal device, so as to solve the problem in the prior art that editing of a PDF document cannot be completed with high fidelity.
A first aspect of an embodiment of the present invention provides a PDF document editing method, including:
analyzing a PDF document to obtain the logical relationship of the text object contents in the PDF document;
constructing, according to the logical relationship, a first data structure and a cursor data structure corresponding to the first data structure;
receiving an editing instruction;
editing a target text object according to the editing instruction and the cursor data structure, and obtaining a second data structure according to the editing result and the first data structure;
re-typesetting the text objects in the PDF document according to the second data structure;
and saving the re-typeset result into the PDF document.
Optionally, the first data structure and the second data structure are both tree data structures, and the tree data structure includes:
a root container containing one or more text streams therein;
the text stream comprises one or more text blocks with the same writing direction;
the text block comprises one or more text paragraphs;
the text paragraph comprises one or more text lines;
the text line contains one or more words;
one or more text objects are contained in the words;
the words contain the mapping relationship between the virtual characters and the actual characters in the text object.
Optionally, editing the target text object according to the editing instruction and the cursor data structure includes:
the editing instruction comprises cursor information; if the cursor information is single-cursor information, determining an operation position according to the single-cursor information and the cursor data structure, and editing the target text object at the operation position.
Optionally, editing the target text object according to the editing instruction and the cursor data structure includes:
the editing instruction comprises cursor information; if the cursor information is one or more groups of cursor-group information, determining an operation range according to the cursor-group information and the cursor data structure, determining a line order according to the first data structure, and sorting the operation range according to the line order;
decomposing the operation range into ranges within a plurality of paragraphs, and forwarding the operation corresponding to the editing instruction, together with the range within each paragraph, to each related paragraph;
decomposing the range within a paragraph into ranges within a plurality of text lines, and forwarding the operation, together with the range within each text line, to each related text line;
decomposing the range within a text line into ranges within a plurality of words, and forwarding the operation, together with the range within each word, to each related word;
and editing the target text object according to the operation corresponding to the editing instruction and the range within the word.
Optionally, re-typesetting the text objects in the PDF document according to the second data structure specifically includes:
determining the line order of the text according to the second data structure;
determining the starting position of the text objects that need to be re-typeset;
re-typesetting the text objects within the words, which includes:
moving the text objects in the words according to the line order, so that the text object in the next word abuts the tail of the text object in the previous word, and the text objects in all the words lie in a straight line on the character baseline along the writing direction;
re-typesetting the text objects within a text line, which includes:
setting the boundaries of the line, the boundaries comprising a line-head boundary and a line-tail boundary;
if the current text line contains no words, ending the re-typesetting of the current text line;
if the current text line contains words, moving the head of the first word in the line to the line-head boundary;
performing the within-word re-typesetting step on the first word;
when the first word extends beyond the line-tail boundary, cutting the text object in the first word at the boundary, moving the part beyond the line-tail boundary to the next line and placing it at the line-head boundary of the next line, and ending the re-typesetting of the current line;
if the number of words in the text line is 1, moving all the words in the next non-empty line to the current text line, and ending the re-typesetting of the current line;
if the number of words in the text line is more than 2, moving the second word so that it abuts the first word and lies on the same character baseline as the first word;
performing the within-word re-typesetting step on the second word;
when the second word extends beyond the line-tail boundary, moving the second word in front of the first word of the next line, and ending the re-typesetting of the current line;
moving the third word so that it abuts the second word and lies on the same character baseline as the second word, and so on until the re-typesetting of the current text line is finished;
re-typesetting the text objects within a text paragraph, which includes:
performing the within-line re-typesetting step on each line in the paragraph;
when the starting position of the paragraph re-typesetting is the first line of the text paragraph, moving the first line so that it abuts the upper boundary of the text paragraph;
searching backward from the last line and removing empty lines;
re-typesetting the text objects within a text block, which includes:
performing the paragraph re-typesetting step on each text paragraph in the text block;
when the starting position of the block re-typesetting is the first paragraph of the text block, moving the first paragraph so that it abuts the upper boundary of the text block;
and adjusting the positions of the text paragraphs in the text block so that the distance between the lower boundary of each paragraph and the upper boundary of the next paragraph equals the set line spacing.
Optionally, the method further includes:
recording rollback data and the corresponding editing operation;
receiving an undo instruction;
and executing the undo operation according to the undo instruction.
Optionally, before the re-typeset result is saved into the PDF document, the method further includes:
exiting the editing state when no editing instruction is detected.
A second aspect of an embodiment of the present invention provides a PDF document editing apparatus, including:
the analysis unit is used for analyzing the PDF document to obtain the logical relationship of the text object content in the PDF document;
the construction unit is used for constructing a first data structure and a cursor data structure corresponding to the first data structure according to the logical relation;
an instruction receiving unit for receiving an editing instruction;
the editing unit is used for editing the target text object according to the editing instruction and the cursor data structure and acquiring a second data structure according to an editing result and the first data structure;
the typesetting unit is used for typesetting the text objects in the PDF document again according to the second data structure;
and the storage unit is used for storing the re-typeset result into the PDF document.
A third aspect of an embodiment of the present invention provides a PDF document editing terminal device, including: a memory, a processor and a computer program stored in the memory and executable on the processor, the processor implementing the steps of the method as described above when executing the computer program.
A fourth aspect of embodiments of the present invention provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, performs the steps of the method as described above.
Compared with the prior art, the embodiments of the present invention have the following beneficial effects. The PDF document is analyzed to obtain the logical relationship of the text object contents in the PDF document; a first data structure and a corresponding cursor data structure are constructed according to the logical relationship; when a text object is edited, the target text object is edited according to the editing instruction and the cursor data structure, and a second data structure is obtained according to the editing result and the first data structure; the text objects in the PDF document are re-typeset according to the second data structure; and finally the re-typeset result is saved in the PDF document, thereby realizing the editing of the PDF document. The embodiments do not need to convert the PDF document into a streaming document but edit the PDF document directly, so the editing is completed with high fidelity. After the editing instruction has been carried out, the text objects in the PDF document are re-typeset automatically, without manual typesetting, which saves time and labor.
Drawings
In order to illustrate the technical solutions in the embodiments of the present invention more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. The drawings described below show only some embodiments of the present invention, and those skilled in the art can obtain other drawings from them without inventive effort.
FIG. 1 is a flowchart of an implementation of a PDF document editing method according to an embodiment of the present invention;
FIG. 2 is a block diagram of a PDF document editing apparatus according to a second embodiment of the present invention;
FIG. 3 is a schematic diagram of a PDF document editing terminal device according to a third embodiment of the present invention.
Detailed Description
In the following description, for purposes of explanation and not limitation, specific details are set forth, such as particular system structures, techniques, etc. in order to provide a thorough understanding of the embodiments of the invention. It will be apparent, however, to one skilled in the art that the present invention may be practiced in other embodiments that depart from these specific details. In other instances, detailed descriptions of well-known systems, devices, circuits, and methods are omitted so as not to obscure the description of the present invention with unnecessary detail.
In order to explain the technical means of the present invention, the following description will be given by way of specific examples.
Referring to FIG. 1, FIG. 1 is a flowchart of an implementation of a PDF document editing method according to an embodiment of the present invention. The method includes:
Step S101, analyzing the PDF document to obtain the logical relationship of the text object contents in the PDF document.
In the embodiment of the present invention, the logical relationship of the text object contents in the PDF document refers to the logical relationship among the words, text lines, text paragraphs, text blocks, text streams and the like in the PDF document.
Step S102, constructing a first data structure and a cursor data structure corresponding to the first data structure according to the logical relationship.
Optionally, the first data structure and the second data structure are both tree data structures, and the tree data structure includes:
a root container containing one or more text streams therein;
the text stream comprises one or more text blocks with the same writing direction;
the text block comprises one or more text paragraphs;
the text paragraph comprises one or more text lines;
the text line contains one or more words;
one or more text objects are contained in the words;
the words contain the mapping relationship between the virtual characters and the actual characters in the text object.
In the embodiment of the invention, a data structure is a way for a computer to store and organize data, and a tree data structure is a nonlinear data structure that can represent one-to-many relationships between data elements. The first data structure records the line order of the PDF document. The cursor data structure corresponding to the first data structure can accurately locate the position of any word by recording the text objects in the corresponding text block, text paragraph, text line and word, together with their subscripts and virtual subscripts.
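The hierarchy and cursor described above can be pictured with a minimal Python sketch. All class and field names here (TextObject, Word, Cursor, resolve and so on) are hypothetical and do not come from the patent, and the sketch assumes that a "virtual subscript" is simply a character index counted across all text objects of a word.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class TextObject:
    chars: str        # actual characters stored in one PDF text object
    x: float = 0.0    # page coordinates (hypothetical fields)
    y: float = 0.0


@dataclass
class Word:
    objects: List[TextObject] = field(default_factory=list)

    def char_map(self) -> List[Tuple[int, int]]:
        """Map each virtual character index to (object index, actual index)."""
        return [(oi, ci)
                for oi, obj in enumerate(self.objects)
                for ci in range(len(obj.chars))]


@dataclass
class TextLine:
    words: List[Word] = field(default_factory=list)


@dataclass
class Paragraph:
    lines: List[TextLine] = field(default_factory=list)


@dataclass
class TextBlock:
    paragraphs: List[Paragraph] = field(default_factory=list)


@dataclass
class TextStream:
    blocks: List[TextBlock] = field(default_factory=list)
    writing_direction: str = "ltr"   # shared by all blocks in the stream


@dataclass
class Root:
    streams: List[TextStream] = field(default_factory=list)


@dataclass(frozen=True)
class Cursor:
    """One subscript per tree level plus a virtual subscript inside the word."""
    stream: int
    block: int
    paragraph: int
    line: int
    word: int
    virtual_index: int


def resolve(root: Root, cur: Cursor) -> Tuple[TextObject, int]:
    """Follow the cursor down the tree to a text object and an actual index."""
    word = (root.streams[cur.stream].blocks[cur.block]
            .paragraphs[cur.paragraph].lines[cur.line].words[cur.word])
    oi, ci = word.char_map()[cur.virtual_index]
    return word.objects[oi], ci
```

For example, a word stored as the two text objects "PD" and "F" has three virtual characters; a cursor with virtual_index 2 resolves to the object "F" at actual index 0.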
Step S103, receiving an editing instruction.
In embodiments of the present invention, editing instructions include, but are not limited to, adding a new text object, adding characters to a text object, splitting a text object, deleting a text object, moving a text object within a page, setting properties of a text object, adding a new paragraph, and merging paragraphs.
Step S104, editing the target text object according to the editing instruction and the cursor data structure, and obtaining a second data structure according to the editing result and the first data structure.
In the embodiment of the invention, the position or range of the editing operation is determined through the cursor data structure, the target text object is then edited according to the editing instruction, and the first data structure is modified according to the editing result to obtain the second data structure, which corresponds to the text objects in the edited PDF document.
Optionally, editing the target text object according to the editing instruction and the cursor data structure includes:
the editing instruction comprises cursor information; if the cursor information is single-cursor information, determining an operation position according to the single-cursor information and the cursor data structure, and editing the target text object at the operation position.
Optionally, editing the target text object according to the editing instruction and the cursor data structure includes:
the editing instruction comprises cursor information; if the cursor information is one or more groups of cursor-group information, determining an operation range according to the cursor-group information and the cursor data structure, determining a line order according to the first data structure, and sorting the operation range according to the line order;
decomposing the operation range into ranges within a plurality of paragraphs, and forwarding the operation corresponding to the editing instruction, together with the range within each paragraph, to each related paragraph;
decomposing the range within a paragraph into ranges within a plurality of text lines, and forwarding the operation, together with the range within each text line, to each related text line;
decomposing the range within a text line into ranges within a plurality of words, and forwarding the operation, together with the range within each word, to each related word;
and editing the target text object according to the operation corresponding to the editing instruction and the range within the word.
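The level-by-level decomposition of an operation range can be sketched as follows. This is only an illustration under simplifying assumptions: the range is a half-open interval of character offsets rather than a cursor group, the hierarchy is a nested Python list (paragraphs, then lines, then words as strings), and the names split_range, size and forward are hypothetical.

```python
from typing import Iterator, List, Tuple

Range = Tuple[int, int]  # half-open [start, end), measured in characters


def split_range(lengths: List[int], rng: Range) -> Iterator[Tuple[int, Range]]:
    """Yield (child index, local range) for every child that overlaps rng."""
    start, end = rng
    offset = 0
    for i, n in enumerate(lengths):
        lo, hi = max(start, offset), min(end, offset + n)
        if lo < hi:
            yield i, (lo - offset, hi - offset)
        offset += n


def size(node) -> int:
    """Character count of a word (str) or of a nested list of nodes."""
    return len(node) if isinstance(node, str) else sum(size(c) for c in node)


def forward(node, rng: Range, path: tuple = ()) -> List[Tuple[tuple, Range]]:
    """Forward a range down the hierarchy until it reaches individual words."""
    if isinstance(node, str):      # a word: the edit would be applied here
        return [(path, rng)]
    out = []
    for i, local in split_range([size(c) for c in node], rng):
        out.extend(forward(node[i], local, path + (i,)))
    return out
```

With doc = [[["PDF", "edit"], ["tool"]], [["next"]]], the call forward(doc, (2, 9)) returns [((0, 0, 0), (2, 3)), ((0, 0, 1), (0, 4)), ((0, 1, 0), (0, 2))]: each entry names a word by its (paragraph, line, word) path and gives the part of the range that falls inside that word.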
Step S105, rearranging the text objects in the PDF document according to the second data structure.
Optionally, step S105 is implemented as follows:
determining the line order of the text according to the second data structure;
determining the starting position of the text objects that need to be re-typeset;
re-typesetting the text objects within the words, which includes:
moving the text objects in the words according to the line order, so that the text object in the next word abuts the tail of the text object in the previous word, and the text objects in all the words lie in a straight line on the character baseline along the writing direction;
re-typesetting the text objects within a text line, which includes:
setting the boundaries of the line, the boundaries comprising a line-head boundary and a line-tail boundary;
if the current text line contains no words, ending the re-typesetting of the current text line;
if the current text line contains words, moving the head of the first word in the line to the line-head boundary;
performing the within-word re-typesetting step on the first word;
when the first word extends beyond the line-tail boundary, cutting the text object in the first word at the boundary, moving the part beyond the line-tail boundary to the next line and placing it at the line-head boundary of the next line, and ending the re-typesetting of the current line;
if the number of words in the text line is 1, moving all the words in the next non-empty line to the current text line, and ending the re-typesetting of the current line;
if the number of words in the text line is more than 2, moving the second word so that it abuts the first word and lies on the same character baseline as the first word;
performing the within-word re-typesetting step on the second word;
when the second word extends beyond the line-tail boundary, moving the second word in front of the first word of the next line, and ending the re-typesetting of the current line;
moving the third word so that it abuts the second word and lies on the same character baseline as the second word, and so on until the re-typesetting of the current text line is finished;
re-typesetting the text objects within a text paragraph, which includes:
performing the within-line re-typesetting step on each line in the paragraph;
when the starting position of the paragraph re-typesetting is the first line of the text paragraph, moving the first line so that it abuts the upper boundary of the text paragraph;
searching backward from the last line and removing empty lines;
re-typesetting the text objects within a text block, which includes:
performing the paragraph re-typesetting step on each text paragraph in the text block;
when the starting position of the block re-typesetting is the first paragraph of the text block, moving the first paragraph so that it abuts the upper boundary of the text block;
and adjusting the positions of the text paragraphs in the text block so that the distance between the lower boundary of each paragraph and the upper boundary of the next paragraph equals the set line spacing.
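The line-level and paragraph-level steps above can be illustrated with a simplified Python sketch. It is not the patented procedure: words are plain strings, every character is assumed to have the same advance width (CHAR_W), vertical positions are ignored, and the step that pulls words up from the next non-empty line is omitted. The function names are hypothetical.

```python
from typing import List, Tuple

CHAR_W = 1.0  # hypothetical fixed advance width per character

Placed = List[Tuple[str, float]]  # (word, x position of its head)


def relayout_line(words: List[str], head: float, tail: float) -> Tuple[Placed, List[str]]:
    """Place words from the head boundary; return (placed, overflow for next line)."""
    if tail - head < CHAR_W:
        raise ValueError("line must be at least one character wide")
    placed: Placed = []
    x = head
    for i, w in enumerate(words):
        width = len(w) * CHAR_W
        if x + width <= tail:
            placed.append((w, x))      # word abuts the previous word
            x += width
            continue
        if i == 0:
            # The first word is longer than the line: cut it at the tail boundary.
            fit = int((tail - head) // CHAR_W)
            placed.append((w[:fit], head))
            return placed, [w[fit:]] + words[1:]
        # A later word crosses the tail boundary: it and the rest move down.
        return placed, words[i:]
    return placed, []


def relayout_paragraph(lines: List[List[str]], head: float, tail: float) -> List[Placed]:
    """Re-lay out each line, carrying overflow to the front of the next line."""
    result: List[Placed] = []
    carry: List[str] = []
    i = 0
    while i < len(lines) or carry:
        words = carry + (lines[i] if i < len(lines) else [])
        i += 1
        placed, carry = relayout_line(words, head, tail)
        if placed:                     # empty lines are dropped
            result.append(placed)
    return result
```

For instance, relayout_line(["typesetting"], 0.0, 8.0) cuts the word at the tail boundary and returns ([("typesett", 0.0)], ["ing"]), so the remainder starts the next line at its head boundary.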
Step S106, saving the re-typeset result into the PDF document.
In the embodiment of the invention, the PDF document is analyzed to obtain the logical relationship of the text object contents in the PDF document; a first data structure and a corresponding cursor data structure are constructed according to the logical relationship; when a text object is edited, the target text object is edited according to the editing instruction and the cursor data structure, and a second data structure is obtained according to the editing result and the first data structure; the text objects in the PDF document are re-typeset according to the second data structure; and finally the re-typeset result is saved in the PDF document, thereby realizing the editing of the PDF document. The embodiment does not need to convert the PDF document into a streaming document but edits the PDF document directly, so the editing is completed with high fidelity. After the editing instruction has been carried out, the text objects in the PDF document are re-typeset automatically, without manual typesetting, which saves time and labor.
Optionally, the method further includes: recording rollback data and the corresponding editing operation; receiving an undo instruction; and executing the undo operation according to the undo instruction.
In the embodiment of the invention, the rollback data and the corresponding editing operation are recorded during editing and typesetting, so that an executed editing operation can be undone.
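One common way to record rollback data is an undo stack, sketched below in Python. The patent does not specify how the rollback data is stored, so the EditHistory class, the closure-based rollback and the insert_text helper are all assumptions made for illustration.

```python
from typing import Callable, List, Optional, Tuple


class EditHistory:
    """Stack of (operation name, rollback callable) pairs."""

    def __init__(self) -> None:
        self._log: List[Tuple[str, Callable[[], None]]] = []

    def record(self, name: str, rollback: Callable[[], None]) -> None:
        """Remember how to revert an edit that has just been applied."""
        self._log.append((name, rollback))

    def undo(self) -> Optional[str]:
        """Revert the most recent edit; return its name, or None if empty."""
        if not self._log:
            return None
        name, rollback = self._log.pop()
        rollback()
        return name


def insert_text(words: List[str], index: int, text: str, history: EditHistory) -> None:
    """Append text to one word and record the old value as rollback data."""
    old = words[index]
    words[index] = old + text
    history.record("insert_text", lambda: words.__setitem__(index, old))
```

After insert_text(words, 0, "s", history) on words = ["PDF"], the list holds ["PDFs"]; calling history.undo() restores ["PDF"] and returns "insert_text".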
Optionally, before step S106, the method further includes: exiting the editing state when no editing instruction is detected.
In the embodiment of the invention, after the re-typesetting is completed, a third data structure and a second cursor data structure corresponding to the third data structure are constructed according to the typesetting result and the second data structure. When a new editing instruction is detected, the text objects in the PDF document are edited and re-typeset according to the new editing instruction, the third data structure and the second cursor data structure. When no new editing instruction is input, the editing state is exited, and the constructed first, second and third data structures and the first and second cursor data structures are discarded.
It should be understood that, the sequence numbers of the steps in the foregoing embodiments do not imply an execution sequence, and the execution sequence of each process should be determined by its function and inherent logic, and should not constitute any limitation to the implementation process of the embodiments of the present invention.
Embodiment Two
Referring to FIG. 2, FIG. 2 is a block diagram of a PDF document editing apparatus according to a second embodiment of the present invention, where the apparatus includes:
an analysis unit 201, configured to analyze a PDF document to obtain a logical relationship of text object contents in the PDF document;
a constructing unit 202, configured to construct a first data structure and a cursor data structure corresponding to the first data structure according to the logical relationship;
an instruction receiving unit 203 for receiving an editing instruction;
the editing unit 204 is configured to edit the target text object according to the editing instruction and the cursor data structure, and obtain a second data structure according to an editing result and the first data structure;
a typesetting unit 205, configured to re-typeset the text objects in the PDF document according to the second data structure;
a saving unit 206, configured to save the re-typeset result in the PDF document.
Optionally, the first data structure and the second data structure are both tree data structures, and the tree data structure includes:
a root container containing one or more text streams therein;
the text stream comprises one or more text blocks with the same writing direction;
the text block comprises one or more text paragraphs;
the text paragraph comprises one or more text lines;
the text line contains one or more words;
one or more text objects are contained in the words;
the words contain the mapping relationship between the virtual characters and the actual characters in the text object.
Optionally, editing the target text object according to the editing instruction and the cursor data structure includes:
the editing instruction comprises cursor information; if the cursor information is single-cursor information, determining an operation position according to the single-cursor information and the cursor data structure, and editing the target text object at the operation position.
Optionally, editing the target text object according to the editing instruction and the cursor data structure includes:
the editing instruction comprises cursor information; if the cursor information is one or more groups of cursor-group information, determining an operation range according to the cursor-group information and the cursor data structure, determining a line order according to the first data structure, and sorting the operation range according to the line order;
decomposing the operation range into ranges within a plurality of paragraphs, and forwarding the operation corresponding to the editing instruction, together with the range within each paragraph, to each related paragraph;
decomposing the range within a paragraph into ranges within a plurality of text lines, and forwarding the operation, together with the range within each text line, to each related text line;
decomposing the range within a text line into ranges within a plurality of words, and forwarding the operation, together with the range within each word, to each related word;
and editing the target text object according to the operation corresponding to the editing instruction and the range within the word.
Optionally, the typesetting unit 205 is specifically configured to determine a line order of the text according to the second data structure;
determining the initial position of the text object needing to be typeset again;
rearranging the text objects within the words includes:
moving the text objects in the words according to the line sequence, so that the text object in the next word is close to the tail of the text object in the previous word, and the text objects in all the words are arranged into a straight line on the character base line according to the writing direction of the characters;
rearranging the text objects within the text line includes:
setting the boundaries of the line, the boundaries comprising a line head boundary and a line tail boundary;
if the current text line does not comprise words, ending the typesetting of the current text line;
if the current text line comprises words, moving the head of the first word in the line to the head boundary of the line;
performing the step of rearranging the text objects within the words on the first word;
when the first word exceeds the line tail boundary, cutting the text object in the first word at the boundary, moving the part that exceeds the line tail boundary to the next line, and placing it on the head boundary of the next line; ending the typesetting of the current line;
if the number of the words in the text line is 1, moving all the words in the next non-empty line to the current text line, and typesetting the current line again;
if the number of the words in the text line is more than 2, moving the position of the second word to enable the second word to be close to the first word and to be on the same character base line with the first word;
performing the step of rearranging the text objects within the words on the second word;
when the second word exceeds the tail boundary of the line, moving the second word to the front of the first word of the next line, and finishing the typesetting of the previous line;
moving the position of the third word to enable the third word to be close to the second word and to be on the same character base line with the second word, and so on until the re-typesetting of the current text line is finished;
rearranging the text objects within the text paragraph includes:
performing the step of rearranging the text objects within the text line on each line in the paragraph;
when the starting position of the text paragraph rearrangement is the first line of the text paragraph, moving the position of the first line to ensure that the first line is attached to the upper boundary of the text paragraph;
searching backward from the last line, and removing the empty lines;
rearranging the text objects within the text block includes:
performing the step of rearranging the text objects within the text paragraph on each text paragraph in the text block;
when the starting position of the text block rearrangement is the first paragraph of the text block, moving the position of the first paragraph to ensure that the first paragraph is attached to the upper boundary of the text block;
and adjusting the position of the text paragraph in the text block to ensure that the distance between the lower boundary of the previous paragraph and the upper boundary of the next paragraph is the set line distance.
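The line-level steps above amount to refilling each line up to its tail boundary. The sketch below replaces the step-by-step word moves with a simpler greedy reflow, which is a substitution made for illustration and not the patented procedure. It assumes that every character is one unit wide, that a word carries its own trailing space, and that `relayout_lines` and `line_width` are hypothetical names.

```python
from collections import deque
from typing import Deque, List


def relayout_lines(lines: List[List[str]], line_width: int) -> List[List[str]]:
    """Greedily refill lines so that no line exceeds line_width characters."""
    if line_width <= 0:
        raise ValueError("line_width must be positive")
    # Words are consumed in line order; empty lines contribute nothing.
    pending: Deque[str] = deque(w for line in lines for w in line)
    result: List[List[str]] = []
    current: List[str] = []
    used = 0
    while pending:
        word = pending.popleft()
        if used + len(word) <= line_width:
            # The word fits: place it directly after the previous word.
            current.append(word)
            used += len(word)
        elif current:
            # A later word crosses the tail boundary: it becomes the
            # first word of the next line and the current line is closed.
            pending.appendleft(word)
            result.append(current)
            current, used = [], 0
        else:
            # The first word itself crosses the tail boundary: cut it at the
            # boundary and carry the remainder to the head of the next line.
            result.append([word[:line_width]])
            pending.appendleft(word[line_width:])
    if current:
        result.append(current)
    return result


wrapped = relayout_lines([["The ", "quick "], ["brown ", "fox "], [], ["jumps"]], 10)
cut = relayout_lines([["abcdefg"]], 3)
```

Because the reflow consumes words in line order, a short line is topped up from the following non-empty lines automatically, which plays the role of the "move the words of the next non-empty line up" step.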
Optionally, the apparatus further comprises:
the recording unit is used for recording the rollback data and the corresponding editing operation;
a revocation instruction receiving unit configured to receive a revocation instruction;
and the revocation unit is used for executing revocation operation according to the revocation instruction.
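One simple way to realise the recording and revocation units is a history stack. The sketch below is only an assumption about how rollback data might be kept: it stores the full text before each edit, whereas a real editor would more likely store a smaller delta. The class names `EditRecord` and `Document` are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class EditRecord:
    operation: str       # human-readable description of the editing operation
    rollback_data: str   # text as it was before the operation


class Document:
    def __init__(self, text: str) -> None:
        self.text = text
        self._history: List[EditRecord] = []

    def insert(self, pos: int, s: str) -> None:
        # Record the rollback data first, then apply the edit.
        self._history.append(EditRecord(f"insert {s!r} at {pos}", self.text))
        self.text = self.text[:pos] + s + self.text[pos:]

    def delete(self, start: int, end: int) -> None:
        self._history.append(EditRecord(f"delete [{start}, {end})", self.text))
        self.text = self.text[:start] + self.text[end:]

    def undo(self) -> bool:
        """Revoke the most recent edit; return False when nothing is recorded."""
        if not self._history:
            return False
        self.text = self._history.pop().rollback_data
        return True


doc = Document("PDF editing")
doc.insert(3, " document")
doc.delete(0, 4)
after_edits = doc.text
doc.undo()
after_one_undo = doc.text
doc.undo()
after_two_undos = doc.text
```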
Optionally, the apparatus further comprises:
and the exit unit is used for exiting the editing state when no editing instruction is input.
EXAMPLE III
Fig. 3 is a schematic diagram of a PDF document editing terminal device according to an embodiment of the present invention. As shown in fig. 3, the PDF document editing terminal device 3 of the embodiment includes: a processor 30, a memory 31 and a computer program 32 stored in said memory 31 and executable on said processor 30. The processor 30 implements the steps in the respective PDF document editing method embodiments described above, such as steps S101 to S106 shown in fig. 1, when executing the computer program 32. Alternatively, the processor 30, when executing the computer program 32, implements the functions of each module/unit in the above-mentioned device embodiments, for example, the functions of the modules 201 to 206 shown in fig. 2.
Illustratively, the computer program 32 may be partitioned into one or more modules/units that are stored in the memory 31 and executed by the processor 30 to implement the present invention. The one or more modules/units may be a series of computer program instruction segments capable of performing specific functions, which are used for describing the execution process of the computer program 32 in the PDF document editing terminal device 3. For example, the computer program 32 may be divided into a plurality of modules, each module having the following specific functions:
analyzing a PDF document to obtain a logic relation of text object contents in the PDF document;
according to the logical relation, a first data structure and a cursor data structure corresponding to the first data structure are constructed;
receiving an editing instruction;
editing a target text object according to the editing instruction and the cursor data structure, and acquiring a second data structure according to an editing result and the first data structure;
rearranging the text objects in the PDF document according to the second data structure;
and saving the rearranged result into the PDF document.
Optionally, the first data structure and the second data structure are both tree data structures, and the tree data structure includes:
a root container containing one or more text streams therein;
the text stream comprises one or more text blocks with the same writing direction;
the text block comprises one or more text paragraphs;
the text paragraph comprises one or more text lines;
the text line contains one or more words;
one or more text objects are contained in the words;
the words contain the mapping relationship between the virtual characters and the actual characters in the text object.
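The containment hierarchy listed above maps naturally onto nested record types. The following is a sketch under assumptions: the class and field names are invented for illustration, and the character mapping is modelled as a list that takes each character position in the word (the "virtual" character) to a text object index and a character index inside that object (the "actual" character). The source does not specify the mapping at this level of detail.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Tuple


@dataclass
class TextObject:
    text: str  # characters stored in one PDF text object


@dataclass
class Word:
    text_objects: List[TextObject] = field(default_factory=list)
    # char_map[i] = (text object index, character index) for virtual character i
    char_map: List[Tuple[int, int]] = field(default_factory=list)

    def rebuild_char_map(self) -> None:
        self.char_map = [(o, c)
                         for o, obj in enumerate(self.text_objects)
                         for c in range(len(obj.text))]

    def actual_char(self, virtual_index: int) -> str:
        o, c = self.char_map[virtual_index]
        return self.text_objects[o].text[c]


@dataclass
class TextLine:
    words: List[Word] = field(default_factory=list)


@dataclass
class TextParagraph:
    lines: List[TextLine] = field(default_factory=list)


@dataclass
class TextBlock:
    paragraphs: List[TextParagraph] = field(default_factory=list)


@dataclass
class TextStream:
    writing_direction: str  # shared by every block in the stream
    blocks: List[TextBlock] = field(default_factory=list)


@dataclass
class RootContainer:
    streams: List[TextStream] = field(default_factory=list)


def iter_words(root: RootContainer) -> Iterator[Word]:
    """Walk the tree from the root container down to the words."""
    for stream in root.streams:
        for block in stream.blocks:
            for paragraph in block.paragraphs:
                for line in paragraph.lines:
                    yield from line.words


# One word whose characters are split across two text objects.
word = Word([TextObject("P"), TextObject("DF")])
word.rebuild_char_map()
root = RootContainer([TextStream("ltr", [TextBlock([TextParagraph([TextLine([word])])])])])
```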
Optionally, the editing the target text object according to the editing instruction and the cursor data structure includes:
the editing instruction comprises cursor information; if the cursor information is single-cursor information, an operation position is determined according to the single-cursor information and the cursor data structure, and the target text object is edited according to the operation position.
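For the single-cursor case, the essential step is turning one cursor value into a concrete operation position. The sketch below assumes, purely for illustration, that the cursor is a flat character offset over the text and that lines are lists of word strings; the source does not define the cursor data structure in this form, and `locate` and `insert_text` are hypothetical helpers.

```python
from typing import List, Tuple


def locate(lines: List[List[str]], cursor: int) -> Tuple[int, int, int]:
    """Map a flat character offset to (line index, word index, offset in word)."""
    if cursor < 0:
        raise IndexError("cursor must not be negative")
    remaining = cursor
    for li, line in enumerate(lines):
        for wi, word in enumerate(line):
            # A cursor sitting on a word boundary is attached to the end
            # of the preceding word.
            if remaining <= len(word):
                return li, wi, remaining
            remaining -= len(word)
    raise IndexError("cursor is past the end of the text")


def insert_text(lines: List[List[str]], cursor: int, text: str) -> None:
    """Insert text at the operation position derived from a single cursor."""
    li, wi, ci = locate(lines, cursor)
    word = lines[li][wi]
    lines[li][wi] = word[:ci] + text + word[ci:]


lines = [["Hello ", "world"], ["again"]]
position = locate(lines, 8)
insert_text(lines, 8, "X")
```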
Optionally, the editing the target text object according to the editing instruction and the cursor data structure includes:
the editing instruction comprises cursor information; if the cursor information is one or more groups of cursor group information, an operation range is determined according to the one or more groups of cursor group information and the cursor data structure, a line order is determined according to the first data structure, and the operation range is sorted according to the line order;
decomposing the operation range into ranges in a plurality of paragraphs, and forwarding the operation corresponding to the editing instruction and the ranges in the paragraphs to each related paragraph;
decomposing the range in the paragraph into ranges in a plurality of text lines, and forwarding the operation corresponding to the editing instruction and the ranges in the text lines to each related text line;
decomposing the range in the text line into ranges in a plurality of words, and forwarding the operation corresponding to the editing instruction and the ranges in the words to each related word;
and editing the target text object according to the operation corresponding to the editing instruction and the range in the word.
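Before a multi-range operation is decomposed, the cursor groups have to be put in line order. A minimal sketch of that first step follows. It assumes that each cursor group is a pair of `(line index, character offset)` positions and that the line index already reflects the line order taken from the first data structure. Merging overlapping ranges is an extra convenience added here, not something the text above requires.

```python
from typing import List, Tuple

Pos = Tuple[int, int]      # (line index in line order, character offset)
Range = Tuple[Pos, Pos]    # (start, end), with start <= end


def operation_ranges(groups: List[Tuple[Pos, Pos]]) -> List[Range]:
    """Normalise cursor groups, sort them by line order and merge overlaps."""
    # A selection may have been dragged backward, so order each pair first.
    ordered = sorted((min(a, b), max(a, b)) for a, b in groups)
    merged: List[Range] = []
    for start, end in ordered:
        if merged and start <= merged[-1][1]:
            # Overlaps (or touches) the previous range: extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


groups = [((3, 4), (2, 1)), ((0, 5), (1, 2)), ((1, 0), (1, 8))]
sorted_ranges = operation_ranges(groups)
```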
Optionally, the rearranging the text objects in the PDF document according to the second data structure specifically includes:
determining a line order of the text according to the second data structure;
determining the initial position of the text object needing to be typeset again;
rearranging the text objects within the words includes:
moving the text objects in the words according to the line sequence, so that the text object in the next word is close to the tail of the text object in the previous word, and the text objects in all the words are arranged into a straight line on the character base line according to the writing direction of the characters;
rearranging the text objects within the text line includes:
setting the boundaries of the line, the boundaries comprising a line head boundary and a line tail boundary;
if the current text line does not comprise words, ending the typesetting of the current text line;
if the current text line comprises words, moving the head of the first word in the line to the head boundary of the line;
performing the step of rearranging the text objects within the words on the first word;
when the first word exceeds the line tail boundary, cutting the text object in the first word at the boundary, moving the part that exceeds the line tail boundary to the next line, and placing it on the head boundary of the next line; ending the typesetting of the current line;
if the number of the words in the text line is 1, moving all the words in the next non-empty line to the current text line, and typesetting the current line again;
if the number of the words in the text line is more than 2, moving the position of the second word to enable the second word to be close to the first word and to be on the same character base line with the first word;
performing the step of rearranging the text objects within the words on the second word;
when the second word exceeds the tail boundary of the line, moving the second word to the front of the first word of the next line, and finishing the typesetting of the previous line;
moving the position of the third word to enable the third word to be close to the second word and to be on the same character base line with the second word, and so on until the re-typesetting of the current text line is finished;
rearranging the text objects within the text paragraph includes:
performing the step of rearranging the text objects within the text line on each line in the paragraph;
when the starting position of the text paragraph rearrangement is the first line of the text paragraph, moving the position of the first line to ensure that the first line is attached to the upper boundary of the text paragraph;
searching backward from the last line, and removing the empty lines;
rearranging the text objects within the text block includes:
performing the step of rearranging the text objects within the text paragraph on each text paragraph in the text block;
when the starting position of the text block rearrangement is the first paragraph of the text block, moving the position of the first paragraph to ensure that the first paragraph is attached to the upper boundary of the text block;
and adjusting the position of the text paragraph in the text block to ensure that the distance between the lower boundary of the previous paragraph and the upper boundary of the next paragraph is the set line distance.
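The paragraph-level and block-level steps can be illustrated with a small stacking routine. This sketch makes several assumptions that are not in the source: coordinates grow downward, every line in a paragraph has the same height, and the names `Paragraph`, `drop_empty_lines`, `relayout_block` and `spacing` are hypothetical. Line-level reflow is taken as already done.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Paragraph:
    lines: List[List[str]]   # each line is a list of words
    line_height: float
    top: float = 0.0         # upper boundary; y is assumed to grow downward

    @property
    def bottom(self) -> float:
        return self.top + len(self.lines) * self.line_height


def drop_empty_lines(paragraph: Paragraph) -> None:
    """Scan from the last line toward the first and remove empty lines."""
    for i in range(len(paragraph.lines) - 1, -1, -1):
        if not paragraph.lines[i]:
            del paragraph.lines[i]


def relayout_block(paragraphs: List[Paragraph], block_top: float,
                   spacing: float) -> None:
    """Attach the first paragraph to the block top and stack the rest below it."""
    y = block_top
    for paragraph in paragraphs:
        drop_empty_lines(paragraph)
        paragraph.top = y
        # The next paragraph starts at the set distance below this one.
        y = paragraph.bottom + spacing


p1 = Paragraph([["a"], [], ["b"]], line_height=12.0)
p2 = Paragraph([["c"], []], line_height=12.0)
relayout_block([p1, p2], block_top=100.0, spacing=6.0)
```

Scanning backward when deleting keeps the remaining indices valid, which may be why the text describes the search as running from the last line toward the front.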
Optionally, the method further includes:
recording rollback data and corresponding editing operation;
receiving a revocation instruction;
and executing the revocation operation according to the revocation instruction.
Optionally, before the re-typeset result is stored in the PDF document, the method further includes:
when it is detected that no editing instruction is input, exiting the editing state.
The PDF document editing terminal device 3 may be a computing device such as a desktop computer, a notebook computer, a palmtop computer, or a cloud server. The PDF document editing terminal device may include, but is not limited to, a processor 30 and a memory 31. Those skilled in the art will appreciate that fig. 3 is only an example of the PDF document editing terminal device 3 and does not constitute a limitation of it; the device may include more or fewer components than those shown, combine some components, or use different components. For example, the PDF document editing terminal device may further include an input-output device, a network access device, a bus, and the like.
The processor 30 may be a Central Processing Unit (CPU), another general-purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, etc. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor or the like.
The memory 31 may be an internal storage unit of the PDF document editing terminal device 3, for example, a hard disk or an internal memory of the PDF document editing terminal device 3. The memory 31 may also be an external storage device of the PDF document editing terminal device 3, for example, a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, a flash memory card (Flash Card), and the like, with which the PDF document editing terminal device 3 is equipped. Further, the memory 31 may also include both an internal storage unit and an external storage device of the PDF document editing terminal device 3. The memory 31 is used for storing the computer program and other programs and data required by the PDF document editing terminal device. The memory 31 may also be used to temporarily store data that has been output or is to be output.
It will be apparent to those skilled in the art that, for convenience and brevity of description, only the above-mentioned division of the functional units and modules is illustrated, and in practical applications, the above-mentioned function distribution may be performed by different functional units and modules according to needs, that is, the internal structure of the apparatus is divided into different functional units or modules to perform all or part of the above-mentioned functions. Each functional unit and module in the embodiments may be integrated in one processing unit, or each unit may exist alone physically, or two or more units are integrated in one unit, and the integrated unit may be implemented in a form of hardware, or in a form of software functional unit. In addition, specific names of the functional units and modules are only for convenience of distinguishing from each other, and are not used for limiting the protection scope of the present application. The specific working processes of the units and modules in the system may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
In the above embodiments, the descriptions of the respective embodiments have respective emphasis, and reference may be made to the related descriptions of other embodiments for parts that are not described or illustrated in a certain embodiment.
Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
In the embodiments provided in the present invention, it should be understood that the disclosed apparatus/terminal device and method may be implemented in other ways. For example, the above-described embodiments of the apparatus/terminal device are merely illustrative, and for example, the division of the modules or units is only one logical division, and there may be other divisions when actually implemented, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
The integrated modules/units, if implemented in the form of software functional units and sold or used as separate products, may be stored in a computer-readable storage medium. Based on such understanding, all or part of the flow of the method according to the embodiments of the present invention may also be implemented by a computer program, which may be stored in a computer-readable storage medium, and when the computer program is executed by a processor, the steps of the method embodiments may be implemented. The computer program comprises computer program code, which may be in the form of source code, object code, an executable file or some intermediate form, etc. The computer-readable medium may include: any entity or device capable of carrying the computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a Read-Only Memory (ROM), a Random Access Memory (RAM), an electrical carrier signal, a telecommunications signal, a software distribution medium, and the like. It should be noted that the content contained in the computer-readable medium may be appropriately increased or decreased as required by legislation and patent practice in the jurisdiction; for example, in some jurisdictions, computer-readable media do not include electrical carrier signals and telecommunications signals in accordance with legislation and patent practice.
The above-mentioned embodiments are only used for illustrating the technical solutions of the present invention, and not for limiting the same; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; such modifications and substitutions do not substantially depart from the spirit and scope of the embodiments of the present invention, and are intended to be included within the scope of the present invention.

Claims (9)

1. A PDF document editing method is characterized by comprising the following steps:
analyzing a PDF document to obtain a logic relation of text object contents in the PDF document;
according to the logical relation, a first data structure and a cursor data structure corresponding to the first data structure are constructed;
receiving an editing instruction;
editing the target text object according to the editing instruction and the cursor data structure, and acquiring a second data structure according to an editing result and the first data structure;
rearranging the text objects in the PDF document according to the second data structure;
saving the re-typeset result into a PDF document;
the editing the target text object according to the editing instruction and the cursor data structure comprises:
the editing instruction comprises cursor information; if the cursor information is one or more groups of cursor group information, an operation range is determined according to the one or more groups of cursor group information and the cursor data structure, a line order is determined according to the first data structure, and the operation range is sorted according to the line order;
decomposing the operation range into ranges in a plurality of paragraphs, and forwarding the operation corresponding to the editing instruction and the ranges in the paragraphs to each related paragraph;
decomposing the range in the paragraph into ranges in a plurality of text lines, and forwarding the operation corresponding to the editing instruction and the ranges in the text lines to each related text line;
decomposing the range in the text line into ranges in a plurality of words, and forwarding the operation corresponding to the editing instruction and the ranges in the words to each related word;
and editing the target text object according to the operation corresponding to the editing instruction and the range in the word.
2. The PDF document editing method according to claim 1, wherein said first data structure and said second data structure are each a tree data structure, said tree data structure comprising:
a root container containing one or more text streams therein;
the text stream comprises one or more text blocks with the same writing direction;
the text block comprises one or more text paragraphs;
the text paragraph comprises one or more text lines;
the text line contains one or more words;
one or more text objects are contained in the words;
the words contain the mapping relationship between the virtual characters and the actual characters in the text object.
3. The PDF document editing method according to claim 1, wherein said editing a target text object according to said editing instruction and said cursor data structure comprises:
the editing instruction comprises cursor information; if the cursor information is single-cursor information, an operation position is determined according to the single-cursor information and the cursor data structure, and the target text object is edited according to the operation position.
4. The method of editing a PDF document according to claim 1 wherein said rearranging text objects in said PDF document according to said second data structure comprises:
determining a line order of the text according to the second data structure;
determining the initial position of the text object needing to be typeset again;
rearranging the text objects within the words includes:
moving the text objects in the words according to the line sequence, so that the text object in the next word is close to the tail of the text object in the previous word, and the text objects in all the words are arranged into a straight line on the character base line according to the writing direction of the characters;
rearranging the text objects within the text line includes:
setting the boundaries of the line, the boundaries comprising a line head boundary and a line tail boundary;
if the current text line does not comprise words, ending the typesetting of the current text line;
if the current text line comprises words, moving the head of the first word in the line to the head boundary of the line;
performing the step of rearranging the text objects within the words on the first word;
when the first word exceeds the line tail boundary, cutting the text object in the first word at the boundary, moving the part that exceeds the line tail boundary to the next line, and placing it on the head boundary of the next line; ending the typesetting of the current line;
if the number of the words in the text line is 1, moving all the words in the next non-empty line to the current text line, and typesetting the current line again;
if the number of the words in the text line is more than 2, moving the position of the second word to enable the second word to be close to the first word and to be on the same character base line with the first word;
performing the step of rearranging the text objects within the words on the second word;
when the second word exceeds the tail boundary of the line, moving the second word to the front of the first word of the next line, and finishing the typesetting of the previous line;
moving the position of the third word to enable the third word to be close to the second word and to be on the same character base line with the second word, and so on until the re-typesetting of the current text line is finished;
rearranging the text objects within the text paragraph includes:
performing the step of rearranging the text objects within the text line on each line in the paragraph;
when the starting position of the text paragraph rearrangement is the first line of the text paragraph, moving the position of the first line to ensure that the first line is attached to the upper boundary of the text paragraph;
searching backward from the last line, and removing the empty lines;
rearranging the text objects within the text block includes:
performing the step of rearranging the text objects within the text paragraph on each text paragraph in the text block;
when the starting position of the text block rearrangement is the first paragraph of the text block, moving the position of the first paragraph to ensure that the first paragraph is attached to the upper boundary of the text block;
and adjusting the position of the text paragraph in the text block to ensure that the distance between the lower boundary of the previous paragraph and the upper boundary of the next paragraph is the set line distance.
5. The PDF document editing method according to claim 1, said method further comprising:
recording rollback data and corresponding editing operation;
receiving a revocation instruction;
and executing the revocation operation according to the revocation instruction.
6. The PDF document editing method according to claim 1, wherein before saving the re-typeset result to the PDF document, said method further comprises:
when it is detected that no editing instruction is input, exiting the editing state.
7. A PDF document editing apparatus comprising:
the analysis unit is used for analyzing the PDF document to obtain the logical relationship of the text object content in the PDF document;
the construction unit is used for constructing a first data structure and a cursor data structure corresponding to the first data structure according to the logical relation;
an instruction receiving unit for receiving an editing instruction;
the editing unit is used for editing the target text object according to the editing instruction and the cursor data structure and acquiring a second data structure according to an editing result and the first data structure;
the typesetting unit is used for typesetting the text objects in the PDF document again according to the second data structure;
the storage unit is used for storing the re-typeset result into the PDF document;
the editing the target text object according to the editing instruction and the cursor data structure comprises:
the editing instruction comprises cursor information; if the cursor information is one or more groups of cursor group information, an operation range is determined according to the one or more groups of cursor group information and the cursor data structure, a line order is determined according to the first data structure, and the operation range is sorted according to the line order;
decomposing the operation range into ranges in a plurality of paragraphs, and forwarding the operation corresponding to the editing instruction and the ranges in the paragraphs to each related paragraph;
decomposing the range in the paragraph into ranges in a plurality of text lines, and forwarding the operation corresponding to the editing instruction and the ranges in the text lines to each related text line;
decomposing the range in the text line into ranges in a plurality of words, and forwarding the operation corresponding to the editing instruction and the ranges in the words to each related word;
and editing the target text object according to the operation corresponding to the editing instruction and the range in the word.
8. A PDF document editing terminal device comprising a memory, a processor and a computer program stored in said memory and executable on said processor, wherein said processor implements the steps of the method according to any one of claims 1 to 6 when executing said computer program.
9. A computer-readable storage medium, in which a computer program is stored which, when being executed by a processor, carries out the steps of the method according to any one of claims 1 to 6.
CN201711182015.4A 2017-11-23 2017-11-23 PDF document editing method and terminal equipment Active CN107977346B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201711182015.4A CN107977346B (en) 2017-11-23 2017-11-23 PDF document editing method and terminal equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201711182015.4A CN107977346B (en) 2017-11-23 2017-11-23 PDF document editing method and terminal equipment

Publications (2)

Publication Number Publication Date
CN107977346A (en) 2018-05-01
CN107977346B (en) 2021-06-15

Family

ID=62011177

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201711182015.4A Active CN107977346B (en) 2017-11-23 2017-11-23 PDF document editing method and terminal equipment

Country Status (1)

Country Link
CN (1) CN107977346B (en)

Families Citing this family (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108897730B (en) * 2018-06-29 2022-07-29 国信优易数据股份有限公司 PDF text processing method and device
CN109492208B (en) * 2018-10-12 2023-06-23 天津字节跳动科技有限公司 Document editing method and device, equipment and storage medium thereof
CN109657220A (en) * 2018-12-11 2019-04-19 万兴科技股份有限公司 The online editing method, apparatus and electronic equipment of PDF document
CN111460272B (en) * 2019-01-22 2024-02-13 北京国双科技有限公司 Text page ordering method and related equipment
CN110765754A (en) * 2019-09-16 2020-02-07 平安科技(深圳)有限公司 Text data typesetting method and device, computer equipment and storage medium
CN112434495A (en) * 2020-12-14 2021-03-02 万兴科技(湖南)有限公司 Selection method, selection device, computer equipment and storage medium
CN112667438A (en) * 2020-12-24 2021-04-16 万兴科技集团股份有限公司 Text saving and restoring method and device, computer equipment and storage medium
CN113111624B (en) * 2021-04-19 2023-04-18 抖音视界有限公司 Text display method and device, electronic equipment and storage medium

Also Published As

Publication number Publication date
CN107977346A (en) 2018-05-01

Similar Documents

Publication Publication Date Title
CN107977346B (en) PDF document editing method and terminal equipment
CN108470021A (en) Method and device for locating tables in PDF documents
CN107391762B (en) Log data processing method and device
CN108334609B (en) Method, device, equipment and storage medium for realizing JSON format data access in Oracle
CN111914520A (en) Document collaborative editing method and device, computer device and storage medium
CN107590291A (en) Picture searching method, terminal device and storage medium
CN110909123A (en) Data extraction method and device, terminal equipment and storage medium
CN111027640A (en) Video data labeling method and device, terminal equipment and storage medium
CN104462420A (en) Method and device for executing query tasks on database
CN115730605A (en) Data analysis method based on multi-dimensional information
CN109542398B (en) Business system generation method and device and computer readable storage medium
CN108038125B (en) Method, device, equipment and storage medium for automatically comparing fund system test values
CN110175539B (en) Character creating method and device, terminal equipment and readable storage medium
CN115935917A (en) Data processing method, device and equipment for visual chart and storage medium
CN111862343A (en) Three-dimensional reconstruction method, device and equipment and computer readable storage medium
CN110704635A (en) Conversion method and device for triple data in knowledge graph
CN109324838B (en) Execution method and execution device of single chip microcomputer program and terminal
CN110930056A (en) Thinking-guidance-graph-based task management method, terminal device and storage medium
CN110991088A (en) Cable model construction method and system, terminal device and storage medium
CN107943760B (en) Method and device for optimizing fonts of PDF document editing, terminal equipment and storage medium
CN113077469B (en) Sketch image semantic segmentation method and device, terminal device and storage medium
CN110378795B (en) Method and device for generating clause file, storage medium and server
CN112148470B (en) Parameter synchronization method, computer device and readable storage medium
CN110263303B (en) Method and device for tracing text modification history
CN111967240B (en) Text parsing method, text parsing device, terminal equipment and computer readable storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information

Address after: 850000 Tibet autonomous region, Lhasa City, New District, west of the East Ring Road, 1-4 road to the north, south of 1-3 Road, Liu Dong building, east of the 8 unit 6, floor 2, No.

Applicant after: Wanxing Technology Group Co.,Ltd.

Address before: 850000 Tibet autonomous region, Lhasa City, New District, west of the East Ring Road, 1-4 road to the north, south of 1-3 Road, Liu Dong building, east of the 8 unit 6, floor 2, No.

Applicant before: WONDERSHARE TECHNOLOGY Co.,Ltd.

TA01 Transfer of patent application right

Effective date of registration: 20210422

Address after: 518000 a1204, building 11, Shenzhen Bay science and technology ecological park, No.16, Keji South Road, high tech community, Yuehai street, Nanshan District, Shenzhen City, Guangdong Province

Applicant after: Shenzhen Yitu Software Co.,Ltd.

Address before: 850000 Tibet autonomous region, Lhasa City, New District, west of the East Ring Road, 1-4 road to the north, south of 1-3 Road, Liu Dong building, east of the 8 unit 6, floor 2, No.

Applicant before: Wanxing Technology Group Co.,Ltd.

GR01 Patent grant