CN115994521A - Document editing method, presentation method, and document paragraph identification method and device - Google Patents
- Publication number
- CN115994521A (application number CN202211591507.XA)
- Authority
- CN
- China
- Prior art keywords
- paragraph
- pdf document
- target
- editing
- page
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Landscapes
- Document Processing Apparatus (AREA)
Abstract
One or more embodiments of the present invention provide a document editing method, a presentation method, a document paragraph identification method and a device, where the document editing method includes: determining a target editing area in response to a received selection operation based on the PDF document page; identifying the content in the target editing area as a paragraph; and responding to the received editing operation based on the content in the target editing area, and performing editing processing according to the editing operation by taking the content in the target editing area as a paragraph. The method can effectively reduce the complexity of editing the PDF document by the user and improve the editing efficiency of the PDF document.
Description
Technical Field
The present invention relates to the field of computer application technologies, and in particular, to a document editing method, a document presenting method, and a document paragraph recognition method and apparatus.
Background
PDF (Portable Document Format) is an electronic file format. The format is independent of the operating system platform; that is, PDF files can be used on Windows, Unix, and Apple Mac OS alike. This makes it an ideal document format for electronic document distribution and digital information dissemination over the Internet. More and more electronic books, product descriptions, corporate literature, web materials, and e-mails use PDF files.
With the development of technology, PDF documents can be both read and edited with a PDF reader. To edit a PDF document, the user enters a PDF editing mode through the PDF reader and edits the document in that mode. However, under the existing PDF standard protocol, the paragraph information of a PDF document is not saved when the document is edited and saved, so when the user edits the document again, the paragraph information has to be identified by an algorithm. In PDF documents with more complex paragraph layouts, for example newspapers, magazines, or journals, one page may contain several columns, so that two or more paragraphs sit side by side on the page; in this case, the paragraphs identified by the PDF reader with a preset algorithm may be wrong. As another example, if a cross-page paragraph exists in a document, the preset algorithm often fails to identify it properly: if the last piece of content on one page and the first piece of content on the next page actually belong to one paragraph, the preset algorithm is likely to identify them as two paragraphs.
Since correctly performing paragraph recognition on a document is the basis for correct display and editing of the document, the above paragraph recognition defect may seriously affect the correct display and editing of the document.
Disclosure of Invention
In view of this, one or more embodiments of the present invention provide a document editing method, a presentation method, and a method and an apparatus for identifying document paragraphs, which can effectively reduce complexity of editing a PDF document by a user, improve efficiency of editing the PDF document, effectively identify a cross-page PDF document paragraph, improve accuracy of identifying a PDF document paragraph, save paragraph information of the PDF document, and avoid a problem of losing paragraph information of the PDF document after saving the PDF document.
One or more embodiments of the present invention provide a document editing method including: determining a target editing area of the PDF document in response to a received selection operation based on a PDF document page; identifying the content in the target editing area as a paragraph; and responding to the received editing operation of the content in the target editing area, and editing the content in the target editing area as a paragraph according to the editing operation.
Optionally, in response to a received selected operation performed based on a PDF document page, determining a target editing area of the PDF document includes: responding to the selected operation, acquiring first position information and second position information of the PDF document page; and constructing a text box according to the first position information and the second position information, and determining an area in the text box as the target editing area.
Optionally, in response to a received selected operation performed based on a PDF document page, determining a target editing area of the PDF document includes: acquiring a drawing track based on the PDF document page; and forming a text box according to the drawing track, and determining the area in the text box as the target editing area.
Optionally, the method further comprises: after the text box is obtained, the text box is displayed.
Optionally, the method further comprises: after the content in the target editing area is edited as a paragraph according to the editing operation, generating a new PDF document from the edited paragraph and the other, unedited parts of the PDF document.
Optionally, the target editing area includes the content of more than one natural paragraph.
Optionally, the content in the target editing area includes at least one of: text, characters, and pictures.
One or more embodiments of the present invention provide a document editing apparatus including: the determining module is configured to determine a target editing area in response to a received selected operation executed based on the PDF document page; an identification module configured to identify content in the target editing area as a paragraph; and the editing module is configured to respond to the received editing operation on the content in the target editing area and carry out editing processing according to the editing operation by taking the content in the target editing area as a paragraph.
One or more embodiments of the present invention provide an electronic device including: the device comprises a shell, a processor, a memory, a circuit board and a power circuit, wherein the circuit board is arranged in a space surrounded by the shell, and the processor and the memory are arranged on the circuit board; a power supply circuit for supplying power to respective circuits or devices of the electronic apparatus; the memory is used for storing executable program codes; the processor executes a program corresponding to the executable program code by reading the executable program code stored in the memory, for executing any one of the document editing methods described above.
One or more embodiments of the present invention provide a non-transitory computer-readable storage medium storing computer instructions for causing the computer to perform any one of the document editing methods described above.
In one or more embodiments of the present invention, in response to a received selection operation performed based on a PDF document page, a target editing area is determined, and content in the target editing area is identified as a paragraph, so that when a received editing operation based on editing the content in the target editing area is acquired, the content in the target editing area can be edited as a paragraph, thereby realizing the purpose of editing a portion that a user desires to edit as a paragraph, avoiding the problem that the identified paragraph does not satisfy the user desire, effectively reducing the complexity of editing the PDF document by the user, and improving the editing efficiency of the PDF document.
One or more embodiments of the present invention provide a method for identifying a document paragraph, including: acquiring first position information of the last line of a target paragraph located at the end of the current page of a PDF document, and second position information of the first line of the next page of the PDF document; determining, according to the first position information and the second position information, whether the target paragraph ends on the current page; in response to the target paragraph not ending on the current page, identifying the target paragraph and the first paragraph of the next page as one paragraph; and in response to the target paragraph ending on the current page, identifying the target paragraph as one paragraph.
Optionally, the first position information includes a first coordinate of the last character in the last line of the target paragraph, the second position information includes a second coordinate of the first character in the first line of the next page, and determining whether the target paragraph ends on the current page according to the first position information and the second position information includes: determining, according to the first coordinate and the size information of the PDF document page, a first distance between the last character in the last line of the target paragraph and the page edge in a first direction; judging whether the first distance is equal to a preset distance; in response to the first distance not being equal to the preset distance, determining that the target paragraph ends on the current page; in response to the first distance being equal to the preset distance, determining, according to the second coordinate and the size information of the PDF document page, a second distance between the first character in the first line of the next page and the page edge in a second direction, the first direction and the second direction being opposite; judging whether the second distance is equal to the preset distance; in response to the second distance not being equal to the preset distance, determining that the target paragraph ends on the current page; and in response to the second distance being equal to the preset distance, determining that the target paragraph does not end on the current page.
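The two-stage distance test described above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the claimed implementation: the first direction is taken to be toward the right page edge and the second toward the left page edge, x coordinates increase to the right, and `tol` is a hypothetical tolerance introduced because exact floating-point equality is fragile. All names are illustrative.

```python
def paragraph_continues(last_char_right_x: float,
                        first_char_left_x: float,
                        page_width: float,
                        preset_distance: float,
                        tol: float = 0.5) -> bool:
    """Return True if the target paragraph is judged NOT to end on the current page."""
    # First distance: last character of the target paragraph to the right
    # page edge (assumed "first direction").
    first_distance = page_width - last_char_right_x
    if abs(first_distance - preset_distance) > tol:
        # The last line does not reach the expected margin: the paragraph ends here.
        return False
    # Second distance: first character of the next page to the left page edge
    # (assumed "second direction", opposite to the first).
    second_distance = first_char_left_x
    return abs(second_distance - preset_distance) <= tol
```

For instance, on a page 595 units wide with a preset distance of 72, a last line ending at x = 523 followed by a next-page line starting at x = 72 would be treated as one continuing paragraph, whereas a next-page line starting at x = 96 (an indented first line) would not.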
Optionally, the method further comprises: before judging whether the first distance is equal to a preset distance or not, calculating the distance between the paragraph before the target paragraph and the page edge in the first direction to obtain the preset distance.
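One possible way to obtain the preset distance is sketched below. The source does not say how the distance between a paragraph and the page edge is measured; this sketch assumes it is measured from the paragraph's widest line toward the right page edge. The function name and the line representation are hypothetical.

```python
from typing import Sequence


def preset_distance_from_previous(prev_line_right_xs: Sequence[float],
                                  page_width: float) -> float:
    """Distance from the paragraph before the target paragraph to the right page edge.

    prev_line_right_xs holds the right-hand x coordinate of each line of the
    preceding paragraph (an assumed representation).
    """
    if not prev_line_right_xs:
        raise ValueError("previous paragraph has no lines")
    # Assumption: the widest line is the one that reaches the margin.
    return page_width - max(prev_line_right_xs)
```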
Optionally, the method further comprises: in response to the second distance being equal to the preset distance, judging whether the characters in the target paragraph are consistent in font and/or font size with the first paragraph of the next page; and in response to the characters in the target paragraph being inconsistent in font or font size with the first paragraph of the next page, determining that the target paragraph ends on the current page.
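The additional font check can be sketched as a second filter applied after the distance test. This is an illustrative sketch only: it assumes each paragraph can be summarised by one representative font name and size, and `CharStyle` and `ends_on_current_page` are hypothetical names.

```python
from typing import NamedTuple


class CharStyle(NamedTuple):
    font: str
    size: float


def ends_on_current_page(distances_match: bool,
                         target_style: CharStyle,
                         next_style: CharStyle) -> bool:
    """Return True if the target paragraph is judged to end on the current page."""
    # If the distance test already failed, the paragraph has ended.
    if not distances_match:
        return True
    # Distances match: a differing font or font size still means the
    # first paragraph of the next page is a separate paragraph.
    return (target_style.font != next_style.font
            or target_style.size != next_style.size)
```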
Optionally, the method further comprises: before first position information of the last line in a target paragraph at the end of a page in a current page of a PDF document and second position information of the first line in a page of a next page of the PDF document are obtained, a PDF document opening instruction is obtained, and the PDF document is opened according to the opening instruction.
Optionally, the method further comprises: after identifying the target paragraph and the first paragraph of the next page as one paragraph in response to the target paragraph not ending on the current page, or after identifying the target paragraph as one paragraph in response to the target paragraph ending on the current page, storing the data of the identified paragraph as a data structure.
Optionally, the method further comprises: after identifying the target paragraph and the first paragraph of the next page as one paragraph in response to the target paragraph not ending on the current page, or after identifying the target paragraph as one paragraph in response to the target paragraph ending on the current page, performing editing processing according to an editing operation on the PDF document, with the identified paragraph treated as one natural paragraph.
One or more embodiments of the present invention provide a document paragraph recognition apparatus, including: a first acquisition module configured to acquire first position information of the last line of a target paragraph located at the end of the current page of a PDF document, and second position information of the first line of the next page of the PDF document; a first determining module configured to determine, according to the first position information and the second position information, whether the target paragraph ends on the current page; a first identifying module configured to identify the target paragraph and the first paragraph of the next page as one paragraph in response to the target paragraph not ending on the current page; and a second identifying module configured to identify the target paragraph as one paragraph in response to the target paragraph ending on the current page.
Optionally, the first position information includes a first coordinate of the last character in the last line of the target paragraph, the second position information includes a second coordinate of the first character in the first line of the next page, and the first determining module is specifically configured to: determine, according to the first coordinate and the size information of the PDF document page, a first distance between the last character in the last line of the target paragraph and the page edge in a first direction; judge whether the first distance is equal to a preset distance; in response to the first distance not being equal to the preset distance, determine that the target paragraph ends on the current page; in response to the first distance being equal to the preset distance, determine, according to the second coordinate and the size information of the PDF document page, a second distance between the first character in the first line of the next page and the page edge in a second direction, the first direction and the second direction being opposite; judge whether the second distance is equal to the preset distance; in response to the second distance not being equal to the preset distance, determine that the target paragraph ends on the current page; and in response to the second distance being equal to the preset distance, determine that the target paragraph does not end on the current page.
Optionally, the apparatus further includes: the calculating module is configured to calculate the distance between the paragraph before the target paragraph and the edge of the page in the first direction before judging whether the first distance is equal to the preset distance, so as to obtain the preset distance.
Optionally, the apparatus further includes: a judging module configured to judge whether characters in the target paragraph are consistent with the font and/or font size of the first paragraph in the page of the next page in response to the second distance being equal to the preset distance; a second determination module configured to determine that the target paragraph ends at the current page in response to a character in the target paragraph being inconsistent with a font or a font size of a first paragraph in a page of the next page.
Optionally, the apparatus further includes: the second acquisition module is configured to acquire a PDF document opening instruction before acquiring first position information of a last line in a target paragraph at the end of a page in a current page of the PDF document and second position information of the first line in a page of a next page of the PDF document, and open the PDF document according to the opening instruction.
Optionally, the apparatus further includes: a storage module configured to store the data of the identified paragraph as a data structure after the target paragraph and the first paragraph of the next page are identified as one paragraph in response to the target paragraph not ending on the current page, or after the target paragraph is identified as one paragraph in response to the target paragraph ending on the current page.
Optionally, the apparatus further includes: a processing module configured to, after the target paragraph and the first paragraph of the next page are identified as one paragraph in response to the target paragraph not ending on the current page, or after the target paragraph is identified as one paragraph in response to the target paragraph ending on the current page, perform editing processing according to an editing operation on the PDF document, with the identified paragraph treated as one natural paragraph.
One or more embodiments of the present invention also provide an electronic device including: the device comprises a shell, a processor, a memory, a circuit board and a power circuit, wherein the circuit board is arranged in a space surrounded by the shell, and the processor and the memory are arranged on the circuit board; a power supply circuit for supplying power to respective circuits or devices of the electronic apparatus; the memory is used for storing executable program codes; the processor executes a program corresponding to the executable program code by reading the executable program code stored in the memory, for executing any one of the above-described document paragraph recognition methods.
One or more embodiments of the present invention also provide a non-transitory computer-readable storage medium storing computer instructions for causing the computer to perform the method of identifying any one of the document paragraphs described above.
According to the method and device for identifying document paragraphs provided by one or more embodiments of the invention, when a target paragraph at the end of a PDF document page is identified, first position information of the last line of the target paragraph and second position information of the first line of the next page are obtained, and whether the target paragraph ends on the current page is judged according to both. Because the position information of the last line of the current page reflects the state of the last paragraph of the current page, and the position information of the first line of the next page reflects the state of the first paragraph of the next page, combining the two makes it possible to judge whether the first paragraph of the next page and the target paragraph are actually one paragraph. Cross-page paragraphs in the PDF document can thus be identified accurately, which improves the accuracy of PDF document paragraph identification.
One or more embodiments of the present invention provide a document presentation method including: responding to a save operation of a PDF document, saving paragraph information of the PDF document and the PDF document; and presenting the PDF document according to paragraph information of the PDF document.
Optionally, presenting the PDF document according to paragraph information of the PDF document includes: and responding to the editing operation or the opening operation of the PDF document, and presenting the PDF document according to paragraph information of the PDF document.
Optionally, the paragraph information of the PDF document includes: a paragraph identifier, a paragraph start identifier, and a paragraph end identifier, wherein the paragraph identifier is used to identify a paragraph, the paragraph start identifier is used to identify a location of a paragraph start, and the paragraph end identifier is used to identify a location of a paragraph end.
Optionally, the paragraph information further includes: format information of characters in a paragraph expressed in the form of drawing instructions in a PDF standard protocol, wherein the format information of the characters is disposed between the paragraph start identifier and the paragraph end identifier.
Optionally, the paragraph information of the PDF document further includes: dictionary data including paragraph information of one paragraph.
Optionally, the paragraph information included in the dictionary data is format information of a paragraph.
Optionally, the format information of the paragraph includes at least one of the following: paragraph spacing, paragraph alignment, and paragraph indentation.
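The paragraph information listed above can be pictured as a small record per paragraph. The following Python sketch is illustrative only: the field names, the use of integer positions for the start and end identifiers, and the `Alignment` values are assumptions, and the source does not specify how this information is encoded inside the PDF file.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, Iterable, List


class Alignment(Enum):
    LEFT = "left"
    CENTER = "center"
    RIGHT = "right"
    JUSTIFY = "justify"


@dataclass
class ParagraphFormat:
    """Dictionary data: format information of one paragraph."""
    spacing: float = 0.0
    alignment: Alignment = Alignment.LEFT
    indentation: float = 0.0


@dataclass
class ParagraphInfo:
    paragraph_id: str   # paragraph identifier
    start: int          # paragraph start identifier (assumed: a position index)
    end: int            # paragraph end identifier (assumed: a position index)
    # Character format information, kept as opaque drawing-instruction strings
    # that sit between the start and end identifiers.
    char_format_instructions: List[str] = field(default_factory=list)
    fmt: ParagraphFormat = field(default_factory=ParagraphFormat)


def index_paragraphs(infos: Iterable[ParagraphInfo]) -> Dict[str, ParagraphInfo]:
    """Look up saved paragraph information by paragraph identifier."""
    return {info.paragraph_id: info for info in infos}
```

Saving such records together with the document is what lets a later open or edit operation present the paragraphs without re-running paragraph recognition.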
One or more embodiments of the present invention also provide a document presentation apparatus including: a saving module configured to save paragraph information of a PDF document and the PDF document in response to a saving operation on the PDF document; and the presentation module is configured to present the PDF document according to paragraph information of the PDF document.
Optionally, the presentation module is specifically configured to: and responding to the editing operation or the opening operation of the PDF document, and presenting the PDF document according to paragraph information of the PDF document.
Optionally, the paragraph information of the PDF document includes: a paragraph identifier, a paragraph start identifier, and a paragraph end identifier; wherein the paragraph identifier is used for identifying a paragraph, the paragraph start identifier is used for identifying a position of a paragraph start, and the paragraph end identifier is used for identifying a position of a paragraph end.
Optionally, the paragraph information of the PDF document further includes: dictionary data including paragraph information of one paragraph.
Optionally, the paragraph information further includes: format information of characters in a paragraph expressed in the form of drawing instructions in the PDF standard protocol.
Optionally, the paragraph information included in the dictionary data is format information of a paragraph.
Optionally, the format information of the paragraph includes at least one of the following: paragraph spacing, paragraph alignment, and paragraph indentation.
One or more embodiments of the present invention also provide an electronic device including: the device comprises a shell, a processor, a memory, a circuit board and a power circuit, wherein the circuit board is arranged in a space surrounded by the shell, and the processor and the memory are arranged on the circuit board; a power supply circuit for supplying power to respective circuits or devices of the electronic apparatus; the memory is used for storing executable program codes; the processor executes a program corresponding to the executable program code by reading the executable program code stored in the memory, for executing any one of the document presentation methods described above.
One or more embodiments of the present invention also provide a non-transitory computer-readable storage medium storing computer instructions for causing the computer to perform any one of the document presentation methods described above.
According to the document presenting method and device provided by one or more embodiments of the invention, the PDF document and the paragraph information of the PDF document can be stored in response to the storage operation of the PDF document, so that the PDF document can be presented according to the stored paragraph information of the PDF document later, the problem that the paragraph information of the PDF document is lost when the PDF document is edited again because the paragraph information of the PDF document is not stored when the PDF document is stored is avoided, and the accuracy of the paragraph information of the PDF document is improved.
Drawings
In order to more clearly illustrate the embodiments of the invention or the technical solutions in the prior art, the drawings that are required in the embodiments or the description of the prior art will be briefly described, it being obvious that the drawings in the following description are only some embodiments of the invention, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flow diagram illustrating a method of editing a document in accordance with one or more embodiments of the present invention;
FIG. 2 is a schematic diagram of a document editing apparatus according to one or more embodiments of the present invention;
FIG. 3 is a schematic diagram of an electronic device shown in accordance with one or more embodiments of the invention;
FIG. 4 is a flow diagram illustrating a method of identifying a document paragraph in accordance with one or more embodiments of the invention;
FIG. 5 is a schematic diagram illustrating a structure of an apparatus for identifying paragraphs of a document according to one or more embodiments of the present invention;
FIG. 6 is a schematic diagram of an electronic device shown in accordance with one or more embodiments of the invention;
FIG. 7 is a flow diagram illustrating a method of document presentation in accordance with one or more embodiments of the present invention;
FIG. 8 is a schematic diagram of a document presentation apparatus shown in accordance with one or more embodiments of the invention;
FIG. 9 is a schematic diagram of an electronic device shown in accordance with one or more embodiments of the invention.
Detailed Description
Embodiments of the present invention will be described in detail below with reference to the accompanying drawings.
It should be understood that the described embodiments are merely some, but not all, embodiments of the invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
FIG. 1 is a flow diagram illustrating a document editing method according to one or more embodiments of the present invention. As shown in FIG. 1, the method includes:
Step 101: determining a target editing area of the PDF document in response to a received selection operation based on a PDF document page;
In one example, the PDF document page is presented on a display screen of the electronic device, and the user selects the content to be edited on the page. The selection operation may be, for example, the user drawing a frame of any shape in the PDF document page, such as a rectangle, triangle, polygon, circle, or irregular shape; the area within the frame is the target editing area, and the content within the frame is the content the user wants to edit. Take as an example a user drawing a rectangle with a mouse. Before drawing, the mouse cursor is positioned at some point in the PDF document page, for example the starting point of the content to be edited. The point where the cursor is located when the user presses a mouse button (for example, the left button) is taken as the start point; the user then drags the mouse and releases the button, and the point where the cursor is located on release is taken as the end point. The coordinates of the start point and the end point define the frame of a rectangular area; for example, the rectangle may be constructed with the line connecting the start point and the end point as its diagonal. The rectangular area is the target editing area, and the content in it is the content the user wants to edit. In addition, if the PDF document is displayed on a touch display screen, the user may draw the frame directly with a stylus or a finger.
Take as another example an irregularly shaped frame drawn with a finger on a touch screen: the user draws a closed curve directly on the touch screen with a finger; the area within the closed curve is the target editing area, and the content within the closed curve is the content the user wants to edit.
Before step 101, the document editing method according to one or more embodiments of the present invention may further include switching the PDF document to a custom editing mode in response to an editing-mode switching operation. In this mode, when an editing operation on the PDF document is received, the paragraphs of the PDF document are not identified automatically by the preset algorithm; instead, the user's operation of defining the target editing area is received, and the content in the target editing area is identified as one paragraph.
Step 102: identifying the content in the target editing area as a paragraph;
Continuing the above example, after the user draws the frame, the content of the PDF document inside the frame and the content outside the frame may first be determined, and the content inside the frame is identified as one paragraph, which is marked and displayed. Paragraph recognition is not performed on content outside the target editing area.
Step 103: in response to a received editing operation on the content in the target editing area, performing editing processing according to the editing operation with the content in the target editing area treated as a paragraph.
The content in the target editing area may be, for example, any character, symbol, picture, or the like in the PDF document. Editing operations on the content in the target editing area may include, for example, adding, deleting, or modifying these characters, symbols, and pictures, or setting their attribute information. Content outside the target editing area remains unchanged.
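Separating the content inside the frame from the content outside it can be sketched as follows. This is a hedged illustration, not the patented procedure: it assumes a rectangular frame, assumes each content item carries its own bounding box, treats an item as "inside" only when it is fully contained, and uses screen-style coordinates in which y increases downward. The types and function names are hypothetical.

```python
from typing import List, NamedTuple, Sequence, Tuple


class Rect(NamedTuple):
    left: float
    top: float
    right: float
    bottom: float


class Item(NamedTuple):
    content: str
    box: Rect


def contains(outer: Rect, inner: Rect) -> bool:
    """True if inner lies entirely within outer."""
    return (outer.left <= inner.left and inner.right <= outer.right
            and outer.top <= inner.top and inner.bottom <= outer.bottom)


def split_by_area(items: Sequence[Item], area: Rect) -> Tuple[List[Item], List[Item]]:
    """Return (items inside the target editing area, items outside it)."""
    inside = [item for item in items if contains(area, item.box)]
    outside = [item for item in items if not contains(area, item.box)]
    return inside, outside
```

Only the first list would then be treated as one paragraph; the second list is left untouched.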
In one or more embodiments of the present invention, in response to a received selection operation performed based on a PDF document page, a target editing area of the PDF document is determined, and content in the target editing area is identified as a paragraph, so that when a received editing operation based on editing the content in the target editing area is obtained, the content in the target editing area can be edited as a paragraph, thereby realizing the purpose of editing a portion which a user desires to edit as a paragraph, avoiding the problem that the identified paragraph does not satisfy the user desire, effectively reducing the complexity of editing the PDF document by the user, and improving the editing efficiency of the PDF document.
In one or more embodiments of the present invention, determining a target edit area of a PDF document in response to a received selected operation performed based on a PDF document page may include:
responding to the selected operation, acquiring first position information and second position information of the PDF document page; and constructing a text box according to the first position information and the second position information, and determining an area in the text box as the target editing area. In the above example, the first position information may be, for example, the coordinates of the start point, and the second position information may be, for example, the coordinates of the end point, and a rectangular text box may be constructed by using, for example, the line connecting the start point and the end point as a diagonal line of a rectangle.
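Constructing the rectangular text box from the two positions can be sketched as below. The sketch assumes the first and second position information are (x, y) points and normalises them so that the result is the same whichever corner the drag started from; `Rect` and `rect_from_diagonal` are hypothetical names.

```python
from typing import NamedTuple, Tuple

Point = Tuple[float, float]


class Rect(NamedTuple):
    left: float
    top: float
    right: float
    bottom: float


def rect_from_diagonal(start: Point, end: Point) -> Rect:
    """Build the rectangle whose diagonal is the line from start to end."""
    (x0, y0), (x1, y1) = start, end
    # min/max make the result independent of the drag direction.
    return Rect(min(x0, x1), min(y0, y1), max(x0, x1), max(y0, y1))
```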
In one or more embodiments of the invention, determining the target edit area in response to a received selected operation performed based on a PDF document page may include:
acquiring a drawing track based on the PDF document page; and forming a text box according to the drawing track, and determining the area in the text box as the target editing area. Taking the example that a user draws an irregular shape on the touch screen by using a finger, the user directly draws a closed curve on the touch screen by using the finger, and the area in the closed curve is the target editing area, for example, a text box can be formed by a smallest rectangular box capable of covering all contents in the closed curve.
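Forming a rectangular text box from a drawing track can be sketched as follows. As a simplifying assumption, the sketch takes the smallest axis-aligned rectangle covering the sampled track points, which necessarily covers all content inside the closed curve; the function name `bounding_box` and the representation of the track as a list of `(x, y)` points are hypothetical.

```python
def bounding_box(track):
    """Return (left, top, right, bottom) of the smallest axis-aligned
    rectangle that covers every sampled point of the drawing track."""
    if not track:
        raise ValueError("the drawing track contains no points")
    xs = [x for x, _ in track]
    ys = [y for _, y in track]
    return (min(xs), min(ys), max(xs), max(ys))
```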
It should be noted that, in the above embodiment, the content in the text box may be edited. Taking text as an example of the content in the text box, the text may be added, deleted, or modified in the text box, and attribute information corresponding to the text may be set, where the attribute information may include, for example, the size, font, color, and background color of the text.
In one or more embodiments of the present invention, the above document editing method may further include:
After the text box is obtained, the text box is displayed. For example, a border line of a text box may be displayed at the border of the above-described target editing area to distinguish a paragraph currently being edited from other paragraphs in the PDF document.
In one or more embodiments of the present invention, the text box may also be hidden after it is obtained. After the text box is hidden, the content in the target editing area can still be edited, as an already identified paragraph, according to editing operations performed by the user. If the user wants to view the text box, the text box may be displayed after a designated user operation is received, for example, a double click at any position in the target editing area.
In one or more embodiments of the present invention, the above document editing method may further include: after editing processing is performed on the content in the target editing area as a paragraph according to the editing operation, generating a new PDF document from the edited paragraph and the other, unedited parts of the PDF document. For example, in a currently displayed page of a PDF document, the user frame-selects the first natural paragraph and the first two lines of the second natural paragraph (assuming that the second natural paragraph has six lines in total) as the target editing area, and the system displays the first natural paragraph and the first two lines of the second natural paragraph in a text box in the form of one PDF paragraph. If the user wishes to delete the last character in the text box, the character is deleted in response to the corresponding editing operation made by the user, which completes the editing processing of the content in the target editing area. The other parts of the PDF document are kept unchanged, and a new PDF document is generated from the edited PDF paragraph and the other, unedited parts of the PDF document.
In one or more embodiments of the invention, the target editing area includes a plurality of natural paragraphs. That is, in the PDF paragraph identification method described above, the content identified as one paragraph may include more than one natural paragraph. For example, the user may frame-select the content to be edited in the PDF document page, and that content may include several actual natural paragraphs. On this basis, the plurality of natural paragraphs can be identified as one PDF paragraph and presented in one text box for editing by the user. The user can therefore add, delete, or set attributes for the content of the plurality of natural paragraphs within that text box, without entering the text box corresponding to each natural paragraph in turn, which effectively reduces the complexity of user operation.
In one or more embodiments of the invention, the target editing area includes content of less than one natural paragraph. That is, in the PDF paragraph identification method described above, the content identified as one paragraph may be less than one natural paragraph. For example, the user may frame-select only a few lines of a natural paragraph: if a natural paragraph actually includes twelve lines of text but the user only wants to edit two of them, the user may frame-select only those two lines, and the region where the two lines are located is the target editing area. After the user frame-selects the two lines of text, the system can identify them as one PDF paragraph and present them in a text box, so that the user can edit the two lines in the text box without the system identifying other paragraphs in the PDF document. Therefore, according to the document editing method provided by one or more embodiments of the present invention, when a PDF document is edited, the system may perform paragraph identification only on the content of the target area selected by the user, and no paragraph identification is required for other content of the PDF document that does not need to be edited, which simplifies the system processing flow and improves document editing efficiency.
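Restricting paragraph identification to the frame-selected lines can be sketched as follows. This is a minimal illustration under stated assumptions: each line is represented as a hypothetical `(text, (left, top, right, bottom))` pair, a line counts as selected only when its box lies entirely inside the target editing area, and the selected lines are joined with spaces into one paragraph. None of these choices is prescribed by the embodiments.

```python
def lines_in_area(lines, area):
    """Return the text of every line whose box lies entirely inside the area."""
    left, top, right, bottom = area
    return [
        text
        for text, (l, t, r, b) in lines
        if l >= left and t >= top and r <= right and b <= bottom
    ]


def as_single_paragraph(lines, area):
    """Treat all selected lines as one paragraph; unselected lines are ignored."""
    return " ".join(lines_in_area(lines, area))
```

For example, with three lines stacked vertically and an area that covers only the first two, only those two lines are combined into the paragraph, and the third line is never examined further.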
In one or more embodiments of the present invention, the content in the target editing area includes at least one of: text, symbols, and pictures. The text may include, for example, letters or words expressed in any language, and the symbols may include, for example, letters, numbers, operators, punctuation and other symbols, as well as some functional symbols. The picture may be, for example, any picture that can be displayed in a PDF document.
To facilitate understanding of the document editing method provided by one or more embodiments of the present invention, the method is briefly illustrated by the following example.
In this example, a user may be provided with a custom editing mode in a PDF reader; for convenience of description, this editing mode is referred to as mode A. After mode A is entered, the paragraph to be edited may be defined by the user. It need not be a paragraph that is actually presented as a natural paragraph in the PDF document, and may be any portion of the content presented in a page of the PDF document. For example, after the user clicks a button provided in the PDF reader page so that the currently presented PDF document enters mode A, the user draws a rectangular frame in the PDF document page with the mouse: the point at which the user presses the mouse button is taken as the start point, the point at which the user releases the mouse button is taken as the end point, and a rectangular area is constructed from the coordinates of the start point and the end point. The content in the rectangular area is identified as a paragraph, and the characters or pictures in the rectangular area are edited as one large paragraph, which may include a plurality of natural paragraphs. Subsequent editing operations are performed on this large paragraph, and the characters and pictures outside the rectangular area are kept unchanged, so as to obtain the edited PDF document.
FIG. 2 is a block diagram of a document editing apparatus according to one or more embodiments of the present invention, and as shown in FIG. 2, the apparatus 20 includes:
a determining module 21 configured to determine a target editing area of the PDF document in response to a received selection operation performed based on a PDF document page;
an identification module 22 configured to identify the content in the target editing area as a paragraph;
an editing module 23 configured to, in response to a received editing operation on the content in the target editing area, perform editing processing according to the editing operation with the content in the target editing area treated as one paragraph.
In one or more embodiments of the present invention, the determining module may be specifically configured to: responding to the selected operation, acquiring first position information and second position information of the PDF document page; and constructing a text box according to the first position information and the second position information, and determining an area in the text box as the target editing area.
In one or more embodiments of the present invention, the determining module may be specifically configured to: acquiring a drawing track; and forming a text box according to the drawing track, and determining the area in the text box as the target editing area.
In one or more embodiments of the present invention, the above document editing apparatus may further include: and the generation module is configured to generate a new PDF document from the edited paragraph and other parts of the PDF document which are not edited after editing processing is performed on the content in the target editing area as one paragraph according to the editing operation.
In one or more embodiments of the invention, the paragraph may include the content of a plurality of natural paragraphs.
In one or more embodiments of the invention, the paragraph may include content of less than one natural paragraph.
In one or more embodiments of the present invention, the content in the target editing area may include at least one of: text, symbols, and pictures.
One or more embodiments of the present invention also provide an electronic device including: the device comprises a shell, a processor, a memory, a circuit board and a power circuit, wherein the circuit board is arranged in a space surrounded by the shell, and the processor and the memory are arranged on the circuit board; a power supply circuit for supplying power to respective circuits or devices of the electronic apparatus; the memory is used for storing executable program codes; the processor executes a program corresponding to the executable program code by reading the executable program code stored in the memory, for executing any one of the document editing methods described above.
One or more embodiments of the present invention also provide a non-transitory computer-readable storage medium storing computer instructions for causing the computer to perform any one of the document editing methods described above.
Accordingly, as shown in FIG. 3, the electronic device provided by the embodiment of the present invention may include: a housing 31, a processor 32, a memory 33, a circuit board 34, and a power circuit 35, wherein the circuit board 34 is arranged in a space enclosed by the housing 31, and the processor 32 and the memory 33 are arranged on the circuit board 34; the power circuit 35 is for powering the various circuits or devices of the electronic device; the memory 33 is for storing executable program code; the processor 32 executes a program corresponding to the executable program code by reading the executable program code stored in the memory 33, for executing any one of the document editing methods provided in the foregoing embodiments.
FIG. 4 is a flow diagram illustrating a method of identifying a document paragraph, according to one or more embodiments of the invention, as shown in FIG. 4, the method comprising:
Step 401: acquiring first position information of the last line of a target paragraph at the end of the current page of a PDF document, and second position information of the first line of the next page of the PDF document;
Wherein, the target paragraph at the end of the page in the current page may be the last paragraph in the current page of the PDF document, and the paragraph may be an independent natural paragraph; the paragraph may also be part of a natural paragraph, the other part of which is located in the first paragraph of the next page of the PDF document.
The first location information may be, for example, coordinate information related to the last line in the target paragraph, and the second location information may be, for example, coordinate information related to the first line in the page of the next page of the PDF document.
The current page and the next page may be, for example, any two adjacent pages in the PDF document.
When paragraphs are identified in a PDF document, each paragraph may, for example, be identified sequentially in the order in which the paragraphs appear in the PDF document, and the above step 401 is performed when the target paragraph at the end of the current page is reached. Alternatively, cross-page paragraphs may be identified first and the other paragraphs in the PDF document identified afterwards, in which case the above step 401 may be performed before the other paragraphs in the PDF document are identified.
Step 402: determining whether the target paragraph is not ended on the current page according to the first position information and the second position information;
Step 403: in response to the target paragraph not ending on the current page, identifying the target paragraph and the first paragraph on the next page as one paragraph;
If the target paragraph does not end on the current page, part of the content of the target paragraph is located on the next page, so the first paragraph on the next page and the target paragraph are identified as one paragraph.
It should be noted that, in the above steps 401 to 403, the "target paragraph" and the "first paragraph in the page of the next page" are not necessarily two paragraphs in practice, but the two portions are generally identified as two paragraphs when the PDF document is identified, so the two portions are respectively referred to as "paragraphs".
Step 404: in response to the target paragraph ending at the current page, the target paragraph is identified as a paragraph.
In the above steps 403 and 404, if the target paragraph is not ended on the current page, it indicates that the content of the target paragraph is still present on the page of the next page, the first paragraph of the next page may be identified as the content of the target paragraph, that is, the target paragraph and the first paragraph of the page of the next page may be identified as one paragraph, and if the target paragraph is ended on the current page, it indicates that the target paragraph is an independent paragraph, the target paragraph is identified as one paragraph.
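The overall flow of steps 401 to 404 can be sketched as follows. This is a minimal illustration rather than the implementation of the embodiment: pages are represented as hypothetical lists of paragraph strings, and the decision of step 402 is passed in as a caller-supplied predicate `continues_on_next_page`, since the position-based judgment is described separately.

```python
def merge_cross_page(pages, continues_on_next_page):
    """Merge paragraphs that span a page break.

    pages: list of pages, each a list of paragraph strings.
    continues_on_next_page(i): True when the last paragraph of page i
    does not end on page i (the outcome of step 402).
    """
    merged = []
    carry = None  # text of a paragraph left open by the previous page
    for i, page in enumerate(pages):
        paragraphs = list(page)
        if carry is not None and paragraphs:
            # Step 403: the open paragraph and the first paragraph here are one.
            paragraphs[0] = carry + " " + paragraphs[0]
            carry = None
        has_next = i + 1 < len(pages)
        if paragraphs and has_next and continues_on_next_page(i):
            carry = paragraphs.pop()  # keep it open until the next page is read
        merged.extend(paragraphs)     # step 404 for every paragraph that has ended
    if carry is not None:
        merged.append(carry)
    return merged
```

With `pages = [["A", "B start"], ["B end", "C"]]` and a predicate that is true only for the first page, the two halves of the middle paragraph are joined and three paragraphs are returned.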
According to the method for identifying document paragraphs provided by one or more embodiments of the present invention, when the last paragraph in a PDF document page, that is, the target paragraph, is identified, the first position information of the last line of the target paragraph and the second position information of the first line of the next page are obtained, and whether the target paragraph ends on the current page is judged according to the first position information and the second position information, so that a paragraph that continues onto the next page can be identified as a single paragraph.
In one or more embodiments of the present invention, the first location information may include a first coordinate of a last character in a last line of the target paragraph, the second location information may include a second coordinate of a first character in a first line of a page of the next page, and determining whether the target paragraph is not ended in the current page based on the first location information and the second location information may include:
Determining a first distance between the last character in the last line of the target paragraph and the edge of the PDF page in a first direction according to the first coordinate and the size information of the PDF document page;
the size information of the PDF document page may be obtained from attribute information of the PDF document, for example.
The coordinates of the character, such as the first coordinate and the second coordinate, refer to coordinate points of the position of the character in the PDF document page.
The first direction may be, for example, the right direction in a PDF document page. The position of the last character in the last line of content (including text and/or symbols) in the target paragraph may be determined according to the first coordinate of that character in the PDF document page, and the position of the edge of the PDF document page may be determined according to the size information of the PDF document page. The first distance may be, for example, the distance from the last character in the last line of content in the target paragraph to the right edge of the PDF page. The second distance may be, for example, the distance from the first character in the first line of content on the next page to the left edge of the PDF page.
It should be noted that, in one or more embodiments of the present invention, "left" refers to the direction from the end of a line in a PDF document paragraph toward the beginning of the line, and "right" refers to the direction from the beginning of a line toward the end of the line, where "left" may refer to, for example, the left direction when paragraphs are left-aligned, and "right" may refer to, for example, the right direction when paragraphs are right-aligned.
Judging whether the first distance is equal to a preset distance or not;
if it is assumed that, before the target paragraph is identified, other paragraphs before the target paragraph in the PDF document have been identified, the preset distance may be a distance between the PDF paragraph before the target paragraph and an edge of the PDF page (herein, the page edge refers to a left edge and a right edge of the PDF page).
The preset distance may be, for example, a preset fixed value, and the fixed value may be stored in the PDF document, and the value may be obtained directly from the PDF document each time the value is required to be used.
Determining that the target paragraph ends at the current page in response to the first distance being unequal to the preset distance;
If the judgment shows that the distance from the last character in the last line of content in the target paragraph to the right edge of the PDF page is not equal to the preset distance, the last line of the target paragraph does not fill a whole line, so it can be determined that the target paragraph ends on the current page.
Determining a second distance from a first character in a first row of a page of the next page to the edge of the PDF page in the second direction according to the second coordinate and the size information of the PDF document page in response to the first distance being equal to the preset distance, wherein the first direction and the second direction are opposite directions;
judging whether the second distance is equal to the preset distance or not;
determining that the target paragraph ends at the current page in response to the second distance being unequal to the preset distance;
and determining that the target paragraph is not ended on the current page in response to the second distance being equal to the preset distance.
If the judgment shows that the distance from the last character in the last line of content in the target paragraph to the right edge of the PDF page is equal to the preset distance, the last line of the target paragraph on the current page fills a whole line. In this case, the target paragraph may or may not end on the current page, so it is further judged whether the distance from the first character in the first line of content on the next page to the left edge of the PDF page is equal to the preset distance. If it is not equal, it is determined that the first line of content is indented; in the embodiment of the present invention, an indented first line is regarded as the start of an independent paragraph, so the first paragraph on the next page is regarded as an independent paragraph, and it may be determined that the target paragraph has ended on the previous page (that is, the current page referred to above). If the distance from the first character in the first line of content on the next page to the left edge of the PDF page is equal to the preset distance, it is determined that the first line of content is not indented, that is, the first paragraph on the next page is not an independent paragraph; the first paragraph and the target paragraph actually belong to one paragraph, and it is determined that the target paragraph does not end on the previous page.
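The two-stage comparison described above can be sketched as follows. This is a minimal illustration under stated assumptions: x coordinates are measured from the left edge of the page, a single preset distance applies to both the left and the right edge as in the text, and, because coordinates read from a PDF are typically floating-point values, "equal" is relaxed to "within a small tolerance". The function name and the `tol` parameter are hypothetical.

```python
import math


def target_paragraph_continues(last_char_right_x, first_char_left_x,
                               page_width, preset_distance, tol=0.5):
    """Return True when the target paragraph does not end on the current page."""
    # First distance: last character of the last line to the right page edge.
    first_distance = page_width - last_char_right_x
    if not math.isclose(first_distance, preset_distance, abs_tol=tol):
        return False  # the last line is not full, so the paragraph has ended
    # Second distance: first character of the next page to the left page edge.
    second_distance = first_char_left_x
    # An unequal second distance means an indented first line, i.e. a new paragraph.
    return math.isclose(second_distance, preset_distance, abs_tol=tol)
```

The tolerance is a design choice of this sketch only; a strict equality test, as literally worded in the steps, corresponds to `tol=0`.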
In one or more embodiments of the present invention, the method for identifying a document paragraph may further include:
before judging whether the first distance is equal to a preset distance or not, calculating the distance between the paragraph before the target paragraph and the page edge in the first direction to obtain the preset distance.
Taking a certain paragraph as an example, the distance between the first character in any line of the paragraph other than its first line and the left edge of the PDF page may be taken as the preset distance, or the distance between the last character in any line of the paragraph other than its last line and the right edge of the PDF page may be taken as the preset distance. Since the first line of a paragraph is typically indented and the last line of a paragraph typically does not fill a whole line, the first line or the last line of the paragraph may, as appropriate, be excluded from the calculation of the preset distance. It should be noted that a special paragraph, for example a paragraph that includes only one line of content, may be left out of the calculation of the preset distance.
In one or more embodiments of the present invention, there are some special paragraphs in a PDF document for which the identification result of the above method may not be accurate enough. For example, a bulleted or numbered item in the document usually occupies a separate paragraph and is not indented. For such special paragraphs, a further judgment needs to be made on the basis of the above method, so the method for identifying a document paragraph according to the embodiment of the present invention may further include:
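The calculation of the preset distance from a previously identified paragraph can be sketched as follows. This is a minimal illustration: each line is represented as a hypothetical `(first_char_x, last_char_x)` pair measured from the left page edge, the second line is used for the left-side distance and the first line for the right-side distance (the text allows any line other than the first or the last, respectively), and `None` is returned for a one-line paragraph to signal that the paragraph should not be used.

```python
def preset_distance(lines, page_width, side="left"):
    """Derive the preset distance from one already identified paragraph.

    lines: list of (first_char_x, last_char_x) pairs, one per line.
    Returns None when the paragraph has a single line and cannot be used.
    """
    if len(lines) < 2:
        return None
    if side == "left":
        first_x, _ = lines[1]   # skip the first line, which is usually indented
        return first_x
    if side == "right":
        _, last_x = lines[0]    # skip the last line, which is usually not full
        return page_width - last_x
    raise ValueError("side must be 'left' or 'right'")
```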
In response to the second distance being equal to the preset distance, judging whether the characters in the target paragraph are consistent in font and/or font size with the characters in the first paragraph on the next page;
Continuing the above example, when it is determined that the distance from the first character in the first line on the next page to the left edge of the PDF page is equal to the above preset distance, the first line may still belong to a special paragraph without indentation, for example a bulleted or numbered item in the document. The characters in such a special paragraph usually differ in format from the characters in other paragraphs of the document, so to exclude this case it may be further judged whether the characters in the target paragraph are consistent in font and/or font size with the characters in the first paragraph on the next page. In response to the characters in the target paragraph being inconsistent in font or font size with the characters in the first paragraph on the next page, it is determined that the target paragraph ends on the current page.
When the characters in the target paragraph differ in font or in font size from the characters in the first paragraph on the next page, the first paragraph on the next page and the target paragraph are not the same paragraph, so it can be determined that the target paragraph ends on the current page.
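The additional font check can be sketched as follows. This is a minimal illustration: the `Style` type and the function name are hypothetical, and the sketch takes the stricter reading in which both the font and the font size must match for the two parts to be treated as one paragraph.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Style:
    """Character format of a paragraph (hypothetical representation)."""
    font: str
    size: float


def ends_on_current_page_by_style(target_style, next_first_style):
    """Return True when a font or font-size mismatch shows that the first
    paragraph on the next page is a separate paragraph."""
    return (target_style.font != next_first_style.font
            or target_style.size != next_first_style.size)
```

This check is only meaningful after the distance comparison has found the second distance equal to the preset distance; it then separates unindented special paragraphs, such as bulleted items, from genuine continuations.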
In one or more embodiments of the present invention, the above method for identifying document paragraphs may be performed after opening a PDF document according to an open command of the PDF document, or may be performed in a process of identifying each paragraph in the PDF document, based on which, after the above method is performed, all paragraphs in the PDF document may be identified, and each paragraph in the PDF document may be presented in the form of a paragraph when the PDF document is presented. Therefore, the method for identifying a document paragraph may further include: before first position information of the last line in a target paragraph at the end of a page in a current page of a PDF document and second position information of the first line in a page of a next page of the PDF document are obtained, a PDF document opening instruction is obtained, and the PDF document is opened according to the opening instruction.
In one or more embodiments of the present invention, the method for identifying a document paragraph may further include: after identifying the target paragraph and the first paragraph on the next page as one paragraph in response to the target paragraph not ending on the current page, or after identifying the target paragraph as one paragraph in response to the target paragraph ending on the current page, storing the data of the identified paragraph as a data structure. For example, the data belonging to one paragraph may be stored in a data structure of a predetermined kind. When the PDF document is opened next time, the content and paragraph information of each paragraph can be obtained by reading and parsing the data structure corresponding to each paragraph, and each paragraph in the PDF document can then be displayed according to its content and paragraph information, which facilitates the user's editing operations on the PDF document.
In one or more embodiments of the present invention, the method for identifying a document paragraph may further include: after identifying the target paragraph and the first paragraph on the next page as one paragraph in response to the target paragraph not ending on the current page, or after identifying the target paragraph as one paragraph in response to the target paragraph ending on the current page, performing editing processing according to an editing operation on the PDF document, with the identified paragraph treated as one natural paragraph. The editing operation on the PDF document may include, for example, adding, deleting, and modifying the content of the PDF document. For example, suppose that two lines of content are added to a paragraph in a PDF document, so that each paragraph following that paragraph moves back two lines in the PDF document. If a cross-page paragraph identified according to one or more embodiments of the invention follows the modified paragraph, it also moves back two lines as a single paragraph in response to the above modification. The operations of deletion and modification are similar to that of addition and will not be described here again.
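Storing an identified paragraph as a data structure and reading it back can be sketched as follows. The embodiments do not specify the kind of data structure, so everything here is an assumption made for illustration: the `ParagraphRecord` fields, the use of JSON as the stored form, and the function names.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class ParagraphRecord:
    """Data of one identified paragraph (hypothetical field set)."""
    start_page: int
    end_page: int   # differs from start_page for a cross-page paragraph
    text: str
    font: str
    font_size: float


def dump_paragraphs(records):
    """Serialise the identified paragraphs so they can be stored."""
    return json.dumps([asdict(r) for r in records])


def load_paragraphs(payload):
    """Rebuild the paragraph records when the document is opened again."""
    return [ParagraphRecord(**item) for item in json.loads(payload)]
```

Reading the stored records back on the next open avoids repeating the identification and gives the editor the content and paragraph information of each paragraph directly.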
FIG. 5 is a block diagram illustrating a device for identifying a document paragraph according to one or more embodiments of the present invention. As shown in FIG. 5, the device 50 includes:
A first obtaining module 51 configured to obtain first position information of a last line in a target paragraph at the end of a page in a current page of a PDF document, and second position information of a first line in a page of a next page of the PDF document;
a first determining module 52 configured to determine whether the target paragraph does not end on the current page according to the first location information and the second location information;
a first identifying module 53 configured to identify the target paragraph and the first paragraph on the next page as one paragraph in response to the target paragraph not ending on the current page;
a second identifying module 54 is configured to identify the target paragraph as a paragraph in response to the target paragraph ending at the current page.
In one or more embodiments of the present invention, the first location information includes a first coordinate of the last character in the last line of the target paragraph, the second location information includes a second coordinate of the first character in the first line of the next page, and the first determining module is specifically configured to:
determining a first distance between the last character in the last line of the target paragraph and the edge of the PDF page in a first direction according to the first coordinate and the size information of the PDF document page;
Judging whether the first distance is equal to a preset distance or not;
determining that the target paragraph ends at the current page in response to the first distance being unequal to the preset distance;
determining a second distance from a first character in a first row of a page of the next page to the edge of the PDF page in the second direction according to the second coordinate and the size information of the PDF document page in response to the first distance being equal to the preset distance, wherein the first direction and the second direction are opposite directions;
judging whether the second distance is equal to the preset distance or not;
determining that the target paragraph ends at the current page in response to the second distance being unequal to the preset distance;
and determining that the target paragraph is not ended on the current page in response to the second distance being equal to the preset distance.
In one or more embodiments of the present invention, the apparatus for identifying a document paragraph may further include:
the calculating module is configured to calculate the distance between the paragraph before the target paragraph and the edge of the page in the first direction before judging whether the first distance is equal to the preset distance, so as to obtain the preset distance.
In one or more embodiments of the present invention, the apparatus for identifying a document paragraph may further include:
a judging module configured to judge whether characters in the target paragraph are consistent with the font and/or font size of the first paragraph in the page of the next page in response to the second distance being equal to the preset distance;
a second determination module configured to determine that the target paragraph ends at the current page in response to a character in the target paragraph being inconsistent with a font or a font size of a first paragraph in a page of the next page.
In one or more embodiments of the present invention, the apparatus for identifying a document paragraph may further include:
the second acquisition module is configured to acquire a PDF document opening instruction before acquiring first position information of a last line in a target paragraph at the end of a page in a current page of the PDF document and second position information of the first line in a page of a next page of the PDF document, and open the PDF document according to the opening instruction.
In one or more embodiments of the present invention, the apparatus for identifying a document paragraph may further include:
a storage module configured to store the data of the identified paragraph as a data structure after the target paragraph and the first paragraph on the next page are identified as one paragraph in response to the target paragraph not ending on the current page, or after the target paragraph is identified as one paragraph in response to the target paragraph ending on the current page.
In one or more embodiments of the present invention, the apparatus for identifying a document paragraph may further include: a processing module configured to, after the target paragraph and the first paragraph on the next page are identified as one paragraph in response to the target paragraph not ending on the current page, or after the target paragraph is identified as one paragraph in response to the target paragraph ending on the current page, perform editing processing according to an editing operation on the PDF document, with the identified paragraph treated as one natural paragraph.
One or more embodiments of the present invention also provide an electronic device including: the device comprises a shell, a processor, a memory, a circuit board and a power circuit, wherein the circuit board is arranged in a space surrounded by the shell, and the processor and the memory are arranged on the circuit board; a power supply circuit for supplying power to respective circuits or devices of the electronic apparatus; the memory is used for storing executable program codes; the processor executes a program corresponding to the executable program code by reading the executable program code stored in the memory, for executing any one of the above-described document paragraph recognition methods.
One or more embodiments of the present invention also provide a non-transitory computer-readable storage medium storing computer instructions for causing a computer to perform any one of the document paragraph identification methods described above.
Accordingly, as shown in FIG. 6, the electronic device provided by the embodiment of the present invention may include: a shell 61, a processor 62, a memory 63, a circuit board 64 and a power circuit 65, wherein the circuit board 64 is arranged in a space surrounded by the shell 61, and the processor 62 and the memory 63 are arranged on the circuit board 64; the power circuit 65 is used for powering the various circuits or devices of the electronic device; the memory 63 is used for storing executable program code; and the processor 62 runs a program corresponding to the executable program code by reading the executable program code stored in the memory 63, so as to execute any one of the document paragraph identification methods provided in the foregoing embodiments.
FIG. 7 is a flow diagram illustrating a document presentation method in accordance with one or more embodiments of the invention. The method may be performed, for example, by a PDF reader. As shown in FIG. 7, the method includes:
Step 701: in response to a save operation on a PDF document, saving the PDF document and paragraph information of the PDF document.
In one example, before the save operation on the PDF document is acquired, a user performs an editing operation on an original PDF document (that is, a PDF document on which no editing operation has yet been performed) through a PDF reader, so that the paragraph information of the PDF document changes. After a save command triggered by the user's save operation is acquired, paragraph information of the edited PDF document may be generated. The paragraph information may include, for example, the paragraph information of each natural paragraph in the PDF document, and may be saved in the PDF document, for example, in the form of tags (identifiers). The editing operation performed on the PDF document may include, for example, adding, deleting, and modifying content (including text and symbols) in the PDF document.
The save operation on the PDF document is, for example, a save operation triggered by the user clicking a save control presented on the user interface of the PDF reader, or a save operation triggered when the user directly closes the PDF document after editing it.
Step 702: presenting the PDF document according to the paragraph information of the PDF document.
In one or more embodiments of the present invention, presenting the PDF document may refer to displaying the pages of the PDF document in the PDF reader. Presenting the PDF document according to its paragraph information may mean displaying each paragraph of the PDF document in the display page of the PDF reader according to the information about that paragraph recorded in the paragraph information saved in step 701.
According to the document presentation method provided by one or more embodiments of the invention, the PDF document and its paragraph information are saved in response to a save operation on the PDF document, so that the PDF document can be presented according to the saved paragraph information. This avoids the problem that paragraph information is lost when the PDF document is edited again because it was not saved together with the PDF document, and improves the accuracy of the paragraph information of the PDF document.
In one or more embodiments of the present invention, presenting the PDF document according to paragraph information of the PDF document may include:
in response to an editing operation or an opening operation on the PDF document, presenting the PDF document according to the paragraph information of the PDF document.
The opening operation on the PDF document may be, for example, opening the PDF document with the PDF reader. It may be implemented by the user clicking the icon of the PDF document, or by the user selecting the PDF document in the PDF reader and confirming that it should be opened.
Continuing the above example, after step 701 is performed, a command to close the PDF document may be triggered in response to an operation of closing the PDF document, and the PDF document is closed according to that command; alternatively, if no such command is acquired, the PDF document remains open. If the PDF document is closed, the paragraph information saved in step 701 may be obtained the next time the PDF document is opened according to an opening command, and the PDF document is presented according to that paragraph information. Because the PDF document was edited before it was saved, its paragraph information differs from that of the original PDF document (i.e., the PDF document before any editing operation was performed), so what is presented at this time is the edited state of the PDF document. If the PDF document is not closed, it can be presented according to the paragraph information saved in step 701 before the next editing operation is performed; likewise, the paragraph information of the presented PDF document is the changed paragraph information.
In one or more embodiments of the present invention, the paragraph information of the PDF document may include:
a paragraph identifier, a paragraph start identifier, and a paragraph end identifier; wherein the paragraph identifier is used for identifying a paragraph, the paragraph start identifier is used for identifying the position where a paragraph starts, and the paragraph end identifier is used for identifying the position where a paragraph ends.
For example, when the PDF document is saved, the content of a paragraph (e.g., a natural paragraph), which may include text and punctuation marks, may be marked with a tag. A PDF reader that later recognizes the tag can then treat the content marked by the tag as one paragraph. When the PDF document is saved, a start identifier may be used to mark the position where a paragraph starts and an end identifier may be used to mark the position where it ends. A PDF reader that later recognizes the paragraph start identifier takes the position it marks as the start of the current paragraph, and on recognizing the paragraph end identifier takes the position it marks as the end of the current paragraph. Thus the paragraph identifier, the paragraph start identifier and the paragraph end identifier together determine the content of a paragraph, the position where it starts and the position where it ends. Once the paragraph information has been saved with the PDF document, the PDF reader can display each paragraph of the PDF document according to the saved paragraph information when the document is opened or edited again, which avoids paragraph identification errors.
In one or more embodiments of the present invention, the paragraph information of the PDF document may further include:
dictionary data, which may include paragraph information of one paragraph. That is, the paragraph information corresponding to each paragraph in the PDF document may be expressed in the form of dictionary data, and the dictionary data of a paragraph corresponds to the paragraph identifier, paragraph start identifier and paragraph end identifier of that paragraph. A paragraph may be uniquely identified by its paragraph identifier, paragraph start identifier, paragraph end identifier and dictionary data. For example, in the paragraph information of a paragraph, the dictionary data may be located after the paragraph identifier and before the paragraph start identifier and the paragraph end identifier.
In one or more embodiments of the present invention, the paragraph information may further include: format information of the characters in a paragraph (where the characters may include the text and symbols in the paragraph), expressed in the form of drawing instructions in the PDF standard protocol. Because the format information of the characters is expressed with drawing instructions from the PDF standard protocol, it does not need to be redefined, and it can be parsed by the existing means of parsing drawing instructions in the PDF standard protocol. This improves the efficiency of parsing the PDF paragraph information while still following the PDF standard protocol. The drawing instructions in the PDF standard protocol include, but are not limited to, instructions that set the font, font size and position of the text; other drawing instructions in the PDF standard protocol that relate to PDF paragraph information may also be included. For example, the format information of the characters in a paragraph, expressed in the form of drawing instructions in the PDF standard protocol, may be located between the paragraph start identifier and the paragraph end identifier.
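As an illustration of how an existing drawing instruction can carry position and scale information, the sketch below reads a `Tm` (set text matrix) instruction. In the PDF specification, `Tm` takes six operands `a b c d e f`; for unrotated text, `a` and `d` act as horizontal and vertical scale factors and `e` and `f` give the text origin. The class `TextMatrix` and the function `parse_tm` are hypothetical names introduced here, and the operand values are the ones used in the example paragraph information of this specification.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class TextMatrix:
    """The six operands of a PDF Tm instruction (hypothetical helper type)."""
    a: float
    b: float
    c: float
    d: float
    e: float
    f: float

    @property
    def scale(self) -> Tuple[float, float]:
        """Horizontal and vertical scale factors (meaningful for unrotated text)."""
        return (self.a, self.d)

    @property
    def origin(self) -> Tuple[float, float]:
        """Translation components, i.e. where the text is positioned."""
        return (self.e, self.f)


def parse_tm(instruction: str) -> TextMatrix:
    """Parse a whitespace-separated 'a b c d e f Tm' instruction."""
    tokens = instruction.split()
    if len(tokens) != 7 or tokens[-1] != "Tm":
        raise ValueError(f"not a Tm instruction: {instruction!r}")
    return TextMatrix(*(float(token) for token in tokens[:6]))


matrix = parse_tm("18.84 0 0 24 142.57 659.77 Tm")
print(matrix.scale)   # (18.84, 24.0)
print(matrix.origin)  # (142.57, 659.77)
```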
In one or more embodiments of the present invention, the paragraph information included in the dictionary data is format information of the paragraph. In combination with the above, when the saved paragraph information of the PDF document is parsed and the paragraph identifier, the paragraph start identifier, the paragraph end identifier, the format information of the characters in the paragraph and the format information of the paragraph are obtained, the state in which a paragraph was presented in the PDF document before the document was saved can be restored from this paragraph information. The PDF document can therefore be restored, the next time it is edited or opened, to the editing state it was in before it was saved, which makes it easier for the user to edit the PDF document.
In one or more embodiments of the invention, the format information of the paragraph may include at least one of:
paragraph spacing, paragraph alignment, and paragraph indentation. The paragraph spacing information may include the value and unit of the paragraph spacing. The paragraph alignment may include, for example, justified (two-end) alignment, left alignment, centering, right alignment, and distributed alignment. The paragraph indentation may include the indentation manner and the indentation amount of the paragraph.
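The format information listed above could be modelled in memory roughly as follows. This is a sketch under stated assumptions: the type names (`Alignment`, `Measure`, `Indentation`, `ParagraphFormat`), the enum values and the example indentation manner are all hypothetical, chosen only to mirror the three kinds of format information and their sub-fields.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Alignment(Enum):
    """The alignment modes named in the text (member values are assumed spellings)."""
    JUSTIFIED = "Justified"      # two-end alignment
    LEFT = "Left"
    CENTER = "Center"
    RIGHT = "Right"
    DISTRIBUTED = "Distributed"  # scatter alignment


@dataclass(frozen=True)
class Measure:
    """A numeric value together with its unit, e.g. 10 and 'Inch'."""
    value: float
    unit: str


@dataclass(frozen=True)
class Indentation:
    """Indentation manner plus indentation amount."""
    manner: str      # e.g. "before" or "after" the paragraph (assumed labels)
    amount: Measure


@dataclass(frozen=True)
class ParagraphFormat:
    """Format information of one paragraph; every field is optional."""
    spacing: Optional[Measure] = None
    alignment: Optional[Alignment] = None
    indentation: Optional[Indentation] = None

    def has_any(self) -> bool:
        """True when at least one kind of format information is present."""
        return any(
            field is not None
            for field in (self.spacing, self.alignment, self.indentation)
        )


fmt = ParagraphFormat(
    alignment=Alignment.RIGHT,
    indentation=Indentation("after", Measure(10, "Inch")),
)
print(fmt.has_any())
```

Keeping every field optional reflects the "at least one of" wording: a paragraph may record only alignment, only spacing, or any combination.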
The following illustrates, in one example, a PDF document paragraph rendering method of one or more embodiments of the present invention.
When the user clicks save after editing the content of the PDF document, the program (such as the PDF reader described above) groups the content belonging to the same paragraph (including the text and symbols in that paragraph), marks it with a tag, and generates paragraph information for the paragraph as follows:
/Paragraph <</Section 0>> BDC
18.84 0 0 24 142.57 659.77 Tm
<300876EE> Tj
EMC
In the above paragraph information, /Paragraph is the paragraph identifier used to identify a paragraph, and <</Section 0>> is an example of the dictionary data described above. The dictionary data consists of key and value pairs; in this example, /Section is the key and 0 is the value. Paragraph information, for example the format information of the paragraph described above, may be stored in <</Section 0>>. BDC marks the start of the tag, i.e. the paragraph start identifier, and EMC marks the end of the tag, i.e. the paragraph end identifier.
The data between BDC and EMC are drawing instructions in the PDF standard protocol, for example instructions that set the font size and position of the characters. In addition to the above example, other paragraph information may be stored in the dictionary data, such as the alignment of the paragraph. For example, /Alignment /Left indicates that the paragraph is left-aligned, and /FrontIndentationValue 5 /FrontIndentationUnit /hold indicates that the indentation before the paragraph is 5, in units of pounds. Other paragraph format information, such as the paragraph spacing, is stored in a similar way and is not described again here.
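The structure just described (paragraph identifier, dictionary data, start identifier, drawing instructions, end identifier) can be assembled with a few lines of code. The following is a minimal sketch, not the program's actual implementation: the function names `format_value` and `build_paragraph_block` are hypothetical, and the rule that string values are written as PDF name objects while numbers are written as-is is an assumption inferred from the examples.

```python
from typing import List, Mapping, Union

Value = Union[int, float, str]


def format_value(value: Value) -> str:
    """Write numbers as-is and strings as PDF name objects (assumed convention)."""
    if isinstance(value, str):
        return "/" + value
    return str(value)


def build_paragraph_block(entries: Mapping[str, Value], drawing_ops: List[str]) -> str:
    """Wrap drawing instructions in a /Paragraph ... BDC / EMC marked section."""
    pairs = " ".join(f"/{key} {format_value(val)}" for key, val in entries.items())
    header = f"/Paragraph <<{pairs}>> BDC"   # identifier, dictionary data, start identifier
    return "\n".join([header, *drawing_ops, "EMC"])  # EMC is the end identifier


block = build_paragraph_block(
    {"Section": 0, "Alignment": "Left"},
    ["18.84 0 0 24 142.57 659.77 Tm", "<300876EE> Tj"],
)
print(block)
```

Because the drawing instructions are passed through unchanged, the sketch keeps the property emphasised above: character format information stays in standard PDF drawing instructions and only the paragraph-level information goes into the dictionary.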
When the program opens the PDF document again for editing, the paragraph information corresponding to each paragraph of the PDF can be obtained by analyzing the paragraph information stored after the last editing, and the PDF document is presented according to the paragraph information obtained by analysis.
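The parsing step can be sketched in the same spirit. The code below scans a text stream for `/Paragraph <<...>> BDC ... EMC` sections and returns, for each paragraph, its dictionary data and its drawing instructions. It is a simplified illustration under several assumptions: the names `parse_dictionary` and `iter_paragraphs` are hypothetical, dictionary values are limited to numbers and names, and a real PDF content stream would need a proper tokenizer (for instance, literal strings containing "EMC" are not handled).

```python
import re
from typing import Dict, Iterator, List, Tuple, Union

Value = Union[int, float, str]

# Paragraph identifier, dictionary data, start identifier, body, end identifier.
BLOCK_RE = re.compile(
    r"/Paragraph\s*<<(?P<dict>.*?)>>\s*BDC(?P<body>.*?)EMC", re.DOTALL
)
# A PDF name such as /Section, or a plain number such as 0 or 18.84.
TOKEN_RE = re.compile(r"/[^\s/<>]+|[-+]?\d+(?:\.\d+)?")


def parse_dictionary(text: str) -> Dict[str, Value]:
    """Turn '/Key value /Key value ...' into a Python dict."""
    tokens = TOKEN_RE.findall(text)
    if len(tokens) % 2:
        raise ValueError(f"unpaired key or value in dictionary: {text!r}")
    result: Dict[str, Value] = {}
    for key, raw in zip(tokens[0::2], tokens[1::2]):
        if not key.startswith("/"):
            raise ValueError(f"expected a name as key, got {key!r}")
        if raw.startswith("/"):
            result[key[1:]] = raw[1:]      # name value, e.g. /Right
        elif "." in raw:
            result[key[1:]] = float(raw)
        else:
            result[key[1:]] = int(raw)
    return result


def iter_paragraphs(stream: str) -> Iterator[Tuple[Dict[str, Value], List[str]]]:
    """Yield (dictionary data, drawing instructions) for each saved paragraph."""
    for match in BLOCK_RE.finditer(stream):
        ops = [line.strip() for line in match.group("body").splitlines() if line.strip()]
        yield parse_dictionary(match.group("dict")), ops


saved = "/Paragraph <</Section 0>> BDC\n18.84 0 0 24 142.57 659.77 Tm\n<300876EE> Tj\nEMC\n"
for info, ops in iter_paragraphs(saved):
    print(info, ops)
```

A reader built along these lines would use the dictionary data to restore paragraph-level formatting and hand the drawing instructions to its existing content-stream interpreter.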
Based on the above examples, the paragraph information of one or more embodiments of the present invention is further illustrated by way of two examples.
In another example, in response to a PDF save operation, paragraph information of a certain paragraph of a saved PDF document is as follows:
/Paragraph <</Section 0 /Alignment /Right>> BDC
18.84 0 0 24 142.57 659.77 Tm
<300876EE> Tj
EMC
The paragraph information indicates that the paragraph is right-aligned.
In yet another example, in response to a PDF save operation, paragraph information of a certain paragraph of a saved PDF document is as follows:
/Paragraph <</Section 0 /BehindIndentationValue 10 /BehindIndentationUnit /Inch>> BDC
18.84 0 0 24 142.57 659.77 Tm
<300876EE> Tj
EMC
The paragraph information indicates that the indentation after the paragraph is 10, in units of inches.
Note that, since one PDF document generally includes a plurality of paragraphs, the paragraph information of the PDF document saved in response to a save operation in one or more embodiments of the present invention may include, for example, multiple pieces of paragraph information like those shown in the above examples.
FIG. 8 is a schematic diagram of a document presentation apparatus according to one or more embodiments of the present invention. As shown in FIG. 8, the apparatus 80 includes:
a saving module 81 configured to save a PDF document and paragraph information of the PDF document in response to a save operation on the PDF document;
and a presentation module 82 configured to present the PDF document according to the paragraph information of the PDF document.
In one or more embodiments of the present invention, the presentation module may be specifically configured to: in response to an editing operation or an opening operation on the PDF document, present the PDF document according to the paragraph information of the PDF document.
In one or more embodiments of the present invention, the paragraph information of the PDF document may include: a paragraph identifier, a paragraph start identifier, and a paragraph end identifier; wherein the paragraph identifier is used for identifying a paragraph, the paragraph start identifier is used for identifying a position of a paragraph start, and the paragraph end identifier is used for identifying a position of a paragraph end.
In one or more embodiments of the present invention, the paragraph information of the PDF document may further include: dictionary data including paragraph information of one paragraph.
In one or more embodiments of the present invention, the paragraph information may further include: format information of characters in a paragraph expressed in the form of drawing instructions in the PDF standard protocol.
In one or more embodiments of the present invention, the paragraph information included in the dictionary data is format information of a paragraph.
In one or more embodiments of the invention, the format information of the paragraph may include at least one of: paragraph spacing, paragraph alignment, and paragraph indentation.
One or more embodiments of the present invention also provide an electronic device including: the device comprises a shell, a processor, a memory, a circuit board and a power circuit, wherein the circuit board is arranged in a space surrounded by the shell, and the processor and the memory are arranged on the circuit board; a power supply circuit for supplying power to respective circuits or devices of the electronic apparatus; the memory is used for storing executable program codes; the processor executes a program corresponding to the executable program code by reading the executable program code stored in the memory, for executing any one of the document presentation methods described above.
One or more embodiments of the present invention also provide a non-transitory computer-readable storage medium storing computer instructions for causing the computer to perform any one of the document presentation methods described above.
Accordingly, as shown in FIG. 9, the electronic device provided by the embodiment of the present invention may include: a shell 91, a processor 92, a memory 93, a circuit board 94 and a power circuit 95, wherein the circuit board 94 is arranged in a space surrounded by the shell 91, and the processor 92 and the memory 93 are arranged on the circuit board 94; the power circuit 95 is used for powering the various circuits or devices of the electronic device; the memory 93 is used for storing executable program code; and the processor 92 runs a program corresponding to the executable program code by reading the executable program code stored in the memory 93, so as to execute any one of the document presentation methods provided in the foregoing embodiments.
It is noted that relational terms such as first and second are used solely to distinguish one entity or action from another entity or action, without necessarily requiring or implying any actual such relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other like elements in the process, method, article, or apparatus that comprises the element.
In this specification, the embodiments are described in a related manner; identical and similar parts of the embodiments may be understood by reference to one another, and each embodiment mainly describes its differences from the other embodiments.
In particular, for the device embodiments, since they are substantially similar to the method embodiments, the description is relatively simple, and reference is made to the description of the method embodiments in part.
For convenience of description, the above apparatus is described as being functionally divided into various units/modules, respectively. Of course, the functions of the various elements/modules may be implemented in the same piece or pieces of software and/or hardware when implementing the present invention.
Those skilled in the art will appreciate that all or part of the methods in the above embodiments may be implemented by a computer program stored on a computer-readable storage medium, and that the program, when executed, may comprise the steps of the method embodiments described above. The storage medium may be a magnetic disk, an optical disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), or the like.
The foregoing is merely illustrative of the present invention, and the present invention is not limited thereto, and any changes or substitutions easily contemplated by those skilled in the art within the scope of the present invention should be included in the present invention. Therefore, the protection scope of the invention is subject to the protection scope of the claims.
Claims (10)
1. A document editing method, comprising:
determining a target editing area of the PDF document in response to a received selection operation performed on a PDF document page;
identifying the content in the target editing area as a paragraph;
and, in response to a received editing operation on the content in the target editing area, editing the content in the target editing area as a paragraph according to the editing operation.
2. The method of claim 1, wherein determining the target editing area of the PDF document in response to a received selection operation performed on a PDF document page comprises:
in response to the selection operation, acquiring first position information and second position information of the PDF document page;
and constructing a text box according to the first position information and the second position information, and determining the area in the text box as the target editing area.
3. The method of claim 1, wherein determining the target editing area of the PDF document in response to a received selection operation performed on a PDF document page comprises:
acquiring a drawing track based on the PDF document page;
and forming a text box according to the drawing track, and determining the area in the text box as the target editing area.
4. A method according to claim 2 or 3, characterized in that the method further comprises:
after the text box is obtained, the text box is displayed.
5. The method according to claim 1, wherein the method further comprises:
after the content in the target editing area is edited as a paragraph according to the editing operation, generating a new PDF document from the edited paragraph and the other, unedited parts of the PDF document.
6. The method of claim 1, wherein the target editing area comprises the content of more than one natural paragraph.
7. The method of claim 1, wherein the target editing area comprises the content of less than one natural paragraph.
8. The method of claim 1, wherein the content in the target edit area comprises at least one of:
text, characters, and pictures.
9. A document editing apparatus, comprising:
a determining module configured to determine a target editing area of a PDF document in response to a received selection operation performed on a PDF document page;
an identification module configured to identify content in the target editing area as a paragraph;
and the editing module is configured to respond to the received editing operation on the content in the target editing area and carry out editing processing according to the editing operation by taking the content in the target editing area as a paragraph.
10. An electronic device, the electronic device comprising: the device comprises a shell, a processor, a memory, a circuit board and a power circuit, wherein the circuit board is arranged in a space surrounded by the shell, and the processor and the memory are arranged on the circuit board; a power supply circuit for supplying power to respective circuits or devices of the electronic apparatus; the memory is used for storing executable program codes; the processor executes a program corresponding to the executable program code by reading the executable program code stored in the memory, for executing the document editing method according to any one of the above claims 1 to 8.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211591507.XA CN115994521A (en) | 2022-12-12 | 2022-12-12 | Document editing method, presentation method, and document paragraph identification method and device |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211591507.XA CN115994521A (en) | 2022-12-12 | 2022-12-12 | Document editing method, presentation method, and document paragraph identification method and device |
Publications (1)
Publication Number | Publication Date |
---|---|
CN115994521A true CN115994521A (en) | 2023-04-21 |
Family
ID=85989742
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202211591507.XA Pending CN115994521A (en) | 2022-12-12 | 2022-12-12 | Document editing method, presentation method, and document paragraph identification method and device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN115994521A (en) |
- 2022-12-12 CN CN202211591507.XA patent/CN115994521A/en active Pending
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||