CN112100979A - Typesetting processing method based on electronic book, electronic equipment and storage medium - Google Patents


Info

Publication number
CN112100979A
Authority
CN
China
Prior art keywords
picture
page
picture elements
area
elements
Prior art date
Legal status
Pending
Application number
CN202010972641.9A
Other languages
Chinese (zh)
Inventor
张恒
Current Assignee
Ireader Technology Co Ltd
Original Assignee
Ireader Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Ireader Technology Co Ltd
Priority to CN202010972641.9A
Publication of CN112100979A
Current legal status: Pending


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/10 Text processing
    • G06F40/103 Formatting, i.e. changing of presentation of documents
    • G06F40/106 Display of layout of documents; Previewing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/10 Text processing
    • G06F40/189 Automatic justification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T11/00 2D [Two Dimensional] image generation
    • G06T11/60 Editing figures and text; Combining figures or text

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Processing Or Creating Images (AREA)
  • Editing Of Facsimile Originals (AREA)

Abstract

The invention discloses an electronic-book-based typesetting processing method, an electronic device, and a storage medium. The method includes: acquiring picture elements obtained by parsing an original book page of the electronic book; when the original book page contains multiple picture elements, determining, according to the positional relationship among them, whether each picture element belongs to a fragmented picture; if so, merging the picture elements that belong to the fragmented picture to obtain a target picture area containing the merged picture elements; and taking a screenshot of the target picture area to obtain a corresponding screenshot picture, then generating the page typesetting content for the original book page according to the screenshot picture. The method can recognize an overall picture composed of multiple picture elements and preserve the way that picture is composed, so that the typeset content stays consistent with the original content of the electronic book and typesetting efficiency and accuracy are improved.

Description

Typesetting processing method based on electronic book, electronic equipment and storage medium
Technical Field
The invention relates to the field of computer technology, and in particular to an electronic-book-based typesetting processing method, an electronic device, and a storage medium.
Background
In electronic book typesetting, a manuscript in fixed-layout (format) typesetting must first be recognized, and typesetting with a custom effect is then produced in streaming (reflowable) typesetting mode according to the recognition result. Such manuscripts are usually in a non-editable format such as PDF. While the manuscript is being recognized, its page elements, which include text elements, picture elements, and other types, can be identified automatically. The file is then converted automatically into a streaming document according to the recognition result, so that custom typesetting can be applied.
However, in the course of making the present invention, the inventor found that this prior-art solution has at least the following defect: to enrich the display effect, some pictures in an electronic book are not a single picture element but a combination of several picture elements, or of picture elements and other types of page elements. If typesetting is performed directly on each page element obtained by parsing, the way such a picture is composed is destroyed, and the resulting typeset content is inconsistent with the original content of the electronic book.
Disclosure of Invention
In view of the above problems, the present invention provides an electronic-book-based typesetting processing method, an electronic device, and a storage medium that overcome, or at least partially solve, these problems.
According to one aspect of the present invention, an electronic-book-based typesetting processing method is provided, including:
acquiring picture elements obtained by parsing an original book page of the electronic book;
when the original book page contains multiple picture elements, determining, according to the positional relationship among the multiple picture elements, whether each picture element belongs to a fragmented picture;
if so, merging the picture elements that belong to the fragmented picture to obtain a target picture area containing the merged picture elements;
and taking a screenshot of the target picture area to obtain a screenshot picture corresponding to the target picture area, and generating page typesetting content corresponding to the original book page according to the screenshot picture.
According to another aspect of the present invention, an electronic device is provided, including a processor, a memory, a communication interface, and a communication bus, where the processor, the memory, and the communication interface communicate with one another through the communication bus;
the memory is configured to store at least one executable instruction, and the executable instruction causes the processor to perform the operations of the above method.
According to yet another aspect of the present invention, a computer storage medium is provided, storing at least one executable instruction that causes a processor to perform the operations of the above method.
With the electronic-book-based typesetting processing method, electronic device, and storage medium provided by the invention, when an original book page contains multiple picture elements, whether each picture element belongs to a fragmented picture can be determined according to the positional relationship among them; the picture elements belonging to the fragmented picture are merged to obtain a target picture area containing the merged picture elements, and a screenshot picture of that area is obtained through screenshot processing. Because the screenshot picture turns the area into a single complete picture element, the composition of the original picture is not destroyed. The method can therefore recognize an overall picture formed by multiple picture elements and preserve its composition, so that the final typeset content is consistent with the original content of the electronic book, improving typesetting efficiency and accuracy.
The foregoing is only an overview of the technical solutions of the present invention. Embodiments are described below so that the technical means of the invention can be understood more clearly and the above and other objects, features, and advantages of the invention become more readily apparent.
Drawings
Various other advantages and benefits will become apparent to those of ordinary skill in the art upon reading the following detailed description of the preferred embodiments. The drawings are only for purposes of illustrating the preferred embodiments and are not to be construed as limiting the invention. Also, like reference numerals are used to refer to like parts throughout the drawings. In the drawings:
FIG. 1 is a flowchart illustrating a method for processing electronic book-based typesetting according to an embodiment of the invention;
FIG. 2 is a flowchart illustrating a method for processing electronic book-based typesetting according to another embodiment of the invention;
FIG. 3 is a schematic structural diagram of an electronic device according to another embodiment of the invention.
Detailed Description
Exemplary embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While exemplary embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.
Example one
Fig. 1 is a flowchart illustrating a method for processing electronic book-based typesetting according to an embodiment of the present invention. As shown in fig. 1, the method comprises the steps of:
Step S110: Acquire picture elements obtained by parsing an original book page of the electronic book.
Specifically, the original book page of the electronic book is a page before typesetting, usually a fixed-layout page. A page parser can parse the original book page to obtain every page element it contains; in this step, the elements of the picture type are extracted from these page elements.
Step S120: When the original book page contains multiple picture elements, determine whether each picture element belongs to a fragmented picture according to the positional relationship among the multiple picture elements.
When one original book page contains multiple picture elements, the positional relationship among them is obtained, and whether they belong to a fragmented picture is determined from that relationship. A fragmented picture is a partial piece of the content of a complete picture. Because electronic books often contain many illustrations, a large picture is frequently stitched together from several small pictures. Each small picture is then a fragmented picture, and since the fragments jointly form one large picture, their positions relative to one another are relatively fixed.
In the specific determination, one criterion is whether a picture element is located in the middle of the page and occupies the main body area of the page, because content that forms a picture on its own tends to be displayed centered and to occupy the main area of the page. Here, the main body area of the page is an area whose width is not less than a first preset proportion of the total page width. In addition, it can be determined whether other picture elements exist in the area adjacent to a picture element; if no other picture element exists in the adjacent area of a picture element, that picture element is determined not to belong to a fragmented picture. In summary, the positional relationship in this embodiment includes the position coordinates of the picture elements on the page, the proportion of the page they occupy, their positions relative to other picture elements, and so on. The present invention does not limit the specific content of the positional relationship.
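A minimal Python sketch of the two position checks just described, assuming each picture element is reduced to an axis-aligned bounding box in page coordinates. The names (`Rect`, `may_be_fragment`), the first preset proportion of 0.6, the 5% centering tolerance, and the 10-unit adjacency radius are hypothetical values chosen for illustration; the patent does not specify them.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Rect:
    """Axis-aligned bounding box of a picture element, in page coordinates."""
    x0: float
    y0: float
    x1: float
    y1: float

    @property
    def width(self) -> float:
        return self.x1 - self.x0

    @property
    def center_x(self) -> float:
        return (self.x0 + self.x1) / 2


def gap(a: Rect, b: Rect) -> float:
    """Shortest distance between two rectangles (0 if they touch or overlap)."""
    dx = max(b.x0 - a.x1, a.x0 - b.x1, 0.0)
    dy = max(b.y0 - a.y1, a.y0 - b.y1, 0.0)
    return (dx * dx + dy * dy) ** 0.5


def forms_picture_alone(elem: Rect, page_width: float,
                        first_ratio: float = 0.6, center_tol: float = 0.05) -> bool:
    """Centered and at least `first_ratio` of the page width wide (main body area)."""
    centered = abs(elem.center_x - page_width / 2) <= center_tol * page_width
    return centered and elem.width >= first_ratio * page_width


def has_neighbor(elem: Rect, elems: list[Rect], radius: float) -> bool:
    """True if another picture element lies within `radius` of `elem`."""
    return any(other is not elem and gap(elem, other) <= radius for other in elems)


def may_be_fragment(elem: Rect, elems: list[Rect], page_width: float,
                    radius: float = 10.0) -> bool:
    # No adjacent picture element: the element is not a fragment.
    if not has_neighbor(elem, elems, radius):
        return False
    # A centered element filling the main body area is treated as a standalone picture.
    return not forms_picture_alone(elem, page_width)
```

The adjacency check runs first because, per the text, an element with no adjacent picture element is excluded regardless of its size or position.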
Step S130: If so, merge the picture elements that belong to the fragmented picture to obtain a target picture area containing the merged picture elements.
Because the picture elements belonging to a fragmented picture jointly form one complete picture, they need to be merged to obtain a target picture area containing the merged picture elements. During merging, the number and range of the picture elements to merge can be further determined by taking into account factors such as the editing order of the picture elements and the spacing between them. The target picture area is the area that contains all the picture elements belonging to the fragmented picture, that is, all the pieces that make up the same picture.
Step S140: Take a screenshot of the target picture area to obtain a screenshot picture corresponding to the target picture area, and generate page typesetting content corresponding to the original book page according to the screenshot picture.
Specifically, the screenshot picture is treated as one complete picture element, and typesetting is performed on this element together with the other page elements contained in the original book page to obtain the page typesetting content for that page. Because the screenshot picture preserves, in picture form, all the page elements that make up the picture, the composition of the picture is not disrupted.
Therefore, the method can recognize an overall picture formed by multiple picture elements and preserve its composition, so that the final typeset content is consistent with the original content of the electronic book, improving typesetting efficiency and accuracy.
Example two
Fig. 2 is a flowchart illustrating a method for processing electronic book-based typesetting according to another embodiment of the present invention. As shown in fig. 2, the method comprises the steps of:
Step S200: Acquire page elements obtained by parsing an original book page of the electronic book.
Specifically, the original book page of the electronic book is a page before typesetting, usually a fixed-layout page. A page parser parses the original book page to obtain every page element it contains. In this embodiment, the original book page is a single page of the electronic book, since pictures in an electronic book are usually not displayed across pages.
Step S210: Extract picture elements from the parsed page elements.
Specifically, the picture-type elements are extracted from the various types of page elements. A page element is the smallest unit of page content and includes text elements, picture elements, and path elements. A text element consists of text such as English or Chinese characters. A picture element consists of content in a picture format such as JPG. A path element is formed by a path line, which connects two end points with a straight or curved line. Since each type of page element carries corresponding attribute information, picture elements can be extracted according to the attribute information of the page elements.
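The extraction step can be sketched as a filter on a type attribute, together with a small gate reflecting that fragmented pictures only occur when a page holds more than one picture element. The `PageElement` record, its `kind` field, and both function names are hypothetical stand-ins for whatever attribute information a real page parser exposes.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class PageElement:
    """Hypothetical parser output: one page element with its attribute information."""
    kind: str                                  # "text", "picture" or "path"
    bbox: tuple[float, float, float, float]    # (x0, y0, x1, y1) in page coordinates
    order: int                                 # editing order within the original book page
    attrs: dict = field(default_factory=dict)  # any further attributes from the parser


def extract_picture_elements(elements: list[PageElement]) -> list[PageElement]:
    """Keep only picture-type elements, preserving their original order."""
    return [e for e in elements if e.kind == "picture"]


def needs_fragment_check(pictures: list[PageElement]) -> bool:
    """Fragments come in groups, so a page with a single picture is skipped."""
    return len(pictures) > 1
```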
Step S220: When the original book page contains multiple picture elements, determine whether each picture element belongs to a fragmented picture according to the positional relationship among the multiple picture elements.
The main purpose of this step is to identify the fragmented pictures in the electronic book that together compose a complete picture. Since fragmented pictures always come in groups of two or more, a picture element that is the only one on its original book page usually does not belong to a fragmented picture. Accordingly, the determination is needed only when an original book page contains multiple picture elements.
Specifically, the determination may be made in at least one of the following two ways:
In the first way, the element size of each picture element is obtained, and whether the picture element belongs to a fragmented picture is determined from the relationship between its element size and the page size of the original book page. This way relies mainly on size information. The element size of a picture element includes its length, width, and so on; correspondingly, the page size of the original book page includes the page length and page width. The relationship between the two mainly refers to the proportion of the original book page that the picture element occupies. Since a standalone picture is usually displayed in the middle of the page and occupies a certain amount of page space, picture elements occupying a small proportion of the page can be identified as fragmented pictures. In a specific implementation, a ratio threshold such as 20% may be set: if the ratio of the element size of a picture element to the page size of the original book page is smaller than the threshold, the picture element is determined to be a fragmented picture. Small pictures that are stitched into a large picture are generally small, so fragmented pictures can be recognized quickly and efficiently from size information.
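A sketch of the first way, assuming that "ratio of the element size to the page size" means the share of the page area covered by the element; the text could equally be read as comparing length and width separately. The 20% default comes from the example above, and the function names are hypothetical.

```python
def size_ratio(elem_w: float, elem_h: float, page_w: float, page_h: float) -> float:
    """Share of the page area covered by the picture element."""
    return (elem_w * elem_h) / (page_w * page_h)


def is_fragment_by_size(elem_w: float, elem_h: float, page_w: float, page_h: float,
                        ratio_threshold: float = 0.20) -> bool:
    """Elements below the ratio threshold are treated as fragments."""
    return size_ratio(elem_w, elem_h, page_w, page_h) < ratio_threshold


# A 100 x 100 element on a 600 x 800 page covers about 2% of the page.
print(is_fragment_by_size(100, 100, 600, 800))  # True
print(is_fragment_by_size(500, 400, 600, 800))  # False (about 42%)
```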
In the second way, the position information of each picture element in the original book page is obtained, and whether two or more adjacently positioned picture elements belong to a fragmented picture is determined according to that position information. The position information is expressed as position coordinates. From it, one can determine whether a picture element is centered or not centered, and whether other picture elements lie adjacent to it. Accordingly, whether a picture element is a fragmented picture can be determined from whether it is centered and from the distance between it and an adjacent picture element. Typically, a picture element that is not centered and/or whose distance from an adjacent picture element is less than a preset spacing threshold belongs to a fragmented picture.
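A sketch of the second way, again working on bounding boxes. The centering tolerance (5% of the page width), the spacing threshold (8 units), and the names are hypothetical; the `require_both` flag selects between the "and" and "or" readings of the "and/or" condition in the text.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Rect:
    x0: float
    y0: float
    x1: float
    y1: float


def is_centered(r: Rect, page_width: float, tol: float = 0.05) -> bool:
    """Horizontal center within `tol` * page width of the page's center line."""
    return abs((r.x0 + r.x1) / 2 - page_width / 2) <= tol * page_width


def gap(a: Rect, b: Rect) -> float:
    """Shortest distance between two rectangles (0 if they touch or overlap)."""
    dx = max(b.x0 - a.x1, a.x0 - b.x1, 0.0)
    dy = max(b.y0 - a.y1, a.y0 - b.y1, 0.0)
    return (dx * dx + dy * dy) ** 0.5


def is_fragment_by_position(r: Rect, elems: list[Rect], page_width: float,
                            spacing_threshold: float = 8.0,
                            require_both: bool = False) -> bool:
    off_center = not is_centered(r, page_width)
    close = any(o is not r and gap(r, o) < spacing_threshold for o in elems)
    # "and/or": require_both=True is the stricter combination of the two signals.
    return (off_center and close) if require_both else (off_center or close)
```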
The two judgment methods can be used independently or in combination.
Step S230: If so, merge the picture elements that belong to the fragmented picture to obtain a target picture area containing the merged picture elements.
Because multiple fragmented pictures jointly form one complete electronic book illustration, the position of each fragment within the illustration is fixed. To ensure that these relative positions are not disturbed during typesetting, the picture elements belonging to the fragmented picture are merged to obtain a target picture area containing the merged picture elements. The target picture area is the area corresponding to the complete electronic book illustration formed jointly by the fragmented pictures.
Therefore, to ensure that the illustration is laid out correctly, the page elements contained in the target picture area must be exactly the page elements of the illustration: the target picture area must neither omit any page element contained in the illustration nor include any page element that does not belong to it.
To achieve this, in this step the merging can be performed according to the spacing between the fragmented pictures, as follows: merge at least two picture elements that belong to the fragmented picture and whose position interval is smaller than a preset interval threshold to obtain a candidate picture area; determine whether the area range of the candidate picture area needs to be adjusted; if so, determine the target picture area from the adjusted area range of the candidate picture area; if not, determine the target picture area directly from the area range of the candidate picture area. This approach thus merges fragmented pictures pairwise according to the position interval between them, so that closely spaced fragments are combined.
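The pairwise merging can be sketched as below. Chaining pairwise merges through a union-find structure (so that A close to B and B close to C yields one area) is an assumption about how overlapping pairs are combined; the threshold value and all names are illustrative. Each returned rectangle corresponds to a candidate picture area before any range adjustment.

```python
from __future__ import annotations

from dataclasses import dataclass
from functools import reduce


@dataclass(frozen=True)
class Rect:
    x0: float
    y0: float
    x1: float
    y1: float

    def union(self, other: Rect) -> Rect:
        """Smallest rectangle enclosing both rectangles."""
        return Rect(min(self.x0, other.x0), min(self.y0, other.y0),
                    max(self.x1, other.x1), max(self.y1, other.y1))


def gap(a: Rect, b: Rect) -> float:
    dx = max(b.x0 - a.x1, a.x0 - b.x1, 0.0)
    dy = max(b.y0 - a.y1, a.y0 - b.y1, 0.0)
    return (dx * dx + dy * dy) ** 0.5


def candidate_areas(frags: list[Rect], interval_threshold: float) -> list[tuple[list[int], Rect]]:
    """Return (member indices, enclosing rectangle) for every group of two or more fragments."""
    parent = list(range(len(frags)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving keeps the trees shallow
            i = parent[i]
        return i

    # Pairwise check: merge two fragments whose position interval is below the threshold.
    for i in range(len(frags)):
        for j in range(i + 1, len(frags)):
            if gap(frags[i], frags[j]) < interval_threshold:
                parent[find(i)] = find(j)

    groups: dict[int, list[int]] = {}
    for i in range(len(frags)):
        groups.setdefault(find(i), []).append(i)
    return [(members, reduce(Rect.union, (frags[k] for k in members)))
            for members in groups.values() if len(members) >= 2]
```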
In a specific implementation, the candidate picture area obtained by merging according to the position interval may contain interfering page elements, for example page elements that do not belong to the electronic book illustration, or it may miss page elements that do belong to the illustration. It is therefore necessary to determine whether the area range of the candidate picture area needs to be adjusted. The adjustment can be based on the editing order of each picture element in the original book page: after merging at least two picture elements that belong to the fragmented picture and whose position interval is smaller than the preset interval threshold to obtain a candidate picture area, the editing order in the original book page of each picture element contained in the candidate picture area is acquired; whether the area range of the candidate picture area needs to be adjusted is determined according to the editing order; and if so, the target picture area is determined from the adjusted area range. To make this determination, the picture elements in the candidate picture area are sorted by editing order, and for each pair of elements adjacent in that order it is checked whether the order interval between them is greater than a preset order threshold. If so, at least one of the two elements is removed from the candidate picture area, and the area range of the candidate picture area is adjusted according to the remaining picture elements.
The editing order is the order in which the page elements were obtained in the original book page. The editing orders of the elements of one picture are normally consecutive or close to one another, so a picture element whose editing order differs greatly from the others probably does not belong to the picture. For example, suppose most of the fragmented pictures in the candidate picture area of a picture have editing orders in a first interval, from 1 to 100. If the editing order of one fragmented picture falls in a second interval, from 1000 to 1100, that fragmented picture is determined not to belong to the picture content and is removed.
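The editing-order check can be sketched as follows, using numbers in the spirit of the example above. When a gap exceeds the order threshold, the text only says that at least one of the two elements is removed; keeping the longest consecutive run is an assumed policy, and the function names are hypothetical.

```python
from __future__ import annotations


def filter_by_editing_order(orders: list[int], order_threshold: int) -> list[int]:
    """Return indices of the picture elements kept in the candidate area.

    Elements are sorted by editing order; wherever two neighbours in that order
    differ by more than `order_threshold`, the sequence is split into runs.
    The longest run is kept (assumed policy).
    """
    if not orders:
        return []
    ranked = sorted(range(len(orders)), key=lambda i: orders[i])
    runs = [[ranked[0]]]
    for prev, cur in zip(ranked, ranked[1:]):
        if orders[cur] - orders[prev] > order_threshold:
            runs.append([cur])      # order gap too large: start a new run
        else:
            runs[-1].append(cur)
    return sorted(max(runs, key=len))


def enclosing_box(boxes):
    """Adjusted area range (x0, y0, x1, y1) computed from the remaining elements."""
    xs0, ys0, xs1, ys1 = zip(*boxes)
    return (min(xs0), min(ys0), max(xs1), max(ys1))
```

With editing orders `[3, 5, 1040, 7, 9]` and a threshold of 50, the element with order 1040 is split off and the other four are kept.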
In addition, in this embodiment, whether the area range of the candidate picture area needs to be adjusted can also be determined by whether the candidate picture area contains body text. Specifically, it is determined whether the candidate picture area contains text; if so, the area range of the candidate picture area is adjusted according to the position of the text. Since the candidate picture area should not contain body text, page elements whose positions conflict with the text are removed from the candidate picture area.
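A sketch of the body-text adjustment, assuming that a "conflict" means a picture element's box overlaps a body-text box and that the adjusted range is the bounding box of the remaining picture elements. Boxes are `(x0, y0, x1, y1)` tuples; the names are hypothetical.

```python
from __future__ import annotations

from typing import Optional, Tuple

Box = Tuple[float, float, float, float]  # (x0, y0, x1, y1)


def overlaps(a: Box, b: Box) -> bool:
    """Positive-area overlap; boxes that only share an edge do not count."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]


def drop_text_conflicts(pictures: list[Box], body_text: list[Box]) -> tuple[list[Box], Optional[Box]]:
    """Remove picture elements that overlap body text and recompute the area range."""
    kept = [p for p in pictures if not any(overlaps(p, t) for t in body_text)]
    if not kept:
        return [], None  # nothing left: no candidate picture area
    area = (min(p[0] for p in kept), min(p[1] for p in kept),
            max(p[2] for p in kept), max(p[3] for p in kept))
    return kept, area
```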
In addition, the inventor found that some pictures also contain text content that is part of the picture, as distinct from body text. For example, in a fixed-layout manuscript, text and picture may be displayed overlapping (stacked), but in streaming typesetting mode text and pictures are typeset separately, so the stacking effect is lost. To ensure that text belonging to the picture content is placed in the candidate picture area and that the stacking of text and picture is preserved, the following processing is performed:
when merging at least two picture elements that belong to the fragmented picture and whose position interval is smaller than the preset interval threshold to obtain a candidate picture area, the text elements obtained by parsing the original book page of the electronic book are further acquired; first position information of the at least two picture elements in the original book page and second position information of the text elements in the original book page are acquired; whether any text element coincides with the at least two picture elements is determined according to the first position information and the second position information; and if so, the at least two picture elements and the text element coinciding with them are merged together to obtain the candidate picture area. Whether a page element is a picture element or a text element can be determined from its attribute information, and the position coordinates of every page element can be acquired.
Correspondingly, during merging, for the at least two picture elements that belong to the fragmented picture and whose position interval is smaller than the preset interval threshold, their first position information in the original book page is acquired to determine their position range in the page; the second position information of each text element in the original book page is acquired; whether a text element coincides with these picture elements is determined by whether the first and second position information overlap; and if so, the coinciding text elements are merged into the candidate picture area together with the picture elements. Coincidence may be full or partial: the text may lie entirely on the picture elements or only partly overlap them.
The at least two picture elements belonging to the fragmented picture and the text element coinciding with them are merged as follows: first, determine a first rectangular region corresponding to the picture elements and a second rectangular region corresponding to the coinciding text element; then determine the circumscribed rectangle of the first and second rectangular regions, and merge the picture elements and text elements contained in that circumscribed rectangle. Here, the first rectangular region is the first circumscribed rectangle of the at least two picture elements belonging to the fragmented picture; the second rectangular region is the second circumscribed rectangle of the coinciding text element; and the circumscribed rectangle of the first and second rectangular regions is the common circumscribed rectangle of the first and second circumscribed rectangles.
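The circumscribed-rectangle merge can be sketched as below. Coincidence is tested as any positive-area overlap between a text box and an individual picture element, which covers both full and partial coincidence; when several text elements coincide, the second rectangle is taken over all of them. These choices and the names are assumptions.

```python
from __future__ import annotations

from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x0, y0, x1, y1)


def overlaps(a: Box, b: Box) -> bool:
    """Full or partial coincidence: the two boxes share a positive area."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]


def circumscribe(boxes: List[Box]) -> Box:
    """Smallest axis-aligned rectangle enclosing all boxes."""
    return (min(b[0] for b in boxes), min(b[1] for b in boxes),
            max(b[2] for b in boxes), max(b[3] for b in boxes))


def merge_pictures_with_text(pictures: List[Box], texts: List[Box]) -> Tuple[Box, List[Box]]:
    """Return the candidate picture area and the text boxes merged into it."""
    first = circumscribe(pictures)                          # first circumscribed rectangle
    hits = [t for t in texts if any(overlaps(t, p) for p in pictures)]
    if not hits:
        return first, []
    second = circumscribe(hits)                             # second circumscribed rectangle
    return circumscribe([first, second]), hits              # common circumscribed rectangle
```

A caption box that partly overlaps the pictures extends the area to include it, while a text box elsewhere on the page is left out.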
Step S240: Take a screenshot of the target picture area to obtain a screenshot picture corresponding to the target picture area, and generate page typesetting content corresponding to the original book page according to the screenshot picture.
Specifically, the screenshot picture is treated as one complete picture element, and typesetting is performed on it together with the other page elements contained in the original book page to obtain the page typesetting content for that page. Because the screenshot picture preserves, in picture form, all the page elements that make up the picture, such as every fragmented picture, the composition of the picture is not disrupted.
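A sketch of how the screenshot picture might replace the merged elements in the streaming content. The pixel capture itself is left to a PDF renderer (for example, PyMuPDF's `Page.get_pixmap` accepts a `clip` rectangle); here the screenshot is represented only by its clip area. Ordering the output by editing order and giving the screenshot the earliest editing order of its members are assumptions, and `Element` and `build_flow` are hypothetical names.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Element:
    kind: str                                # "text", "picture", "path" or "screenshot"
    bbox: tuple[float, float, float, float]  # (x0, y0, x1, y1) in page coordinates
    order: int                               # editing order in the original book page
    content: str = ""


def build_flow(elements: list[Element], merged_ids: set[int],
               target_area: tuple[float, float, float, float]) -> list[Element]:
    """Replace the merged elements with one screenshot element, then order the flow.

    `merged_ids` holds the indices of the elements inside the target picture area
    (assumed non-empty). The screenshot takes the earliest editing order of its
    members, so it sits where the picture started in the original page.
    """
    members = [elements[i] for i in sorted(merged_ids)]
    shot = Element("screenshot", target_area, min(e.order for e in members),
                   content=f"clip={target_area}")
    rest = [e for i, e in enumerate(elements) if i not in merged_ids]
    return sorted(rest + [shot], key=lambda e: e.order)
```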
In this embodiment, the original book page of the electronic book is a fixed-layout page, and the page typesetting content is streaming (reflowable) typesetting content.
Therefore, the method can recognize an overall picture formed by multiple picture elements and preserve its composition, so that the final typeset content is consistent with the original content of the electronic book, improving typesetting efficiency and accuracy. In addition, the method can identify the picture area accurately from several factors, such as the spacing between elements and their editing order, and can also correctly handle pictures that contain text, so that the screenshot content matches the picture area and typesetting accuracy is improved.
Example three
An embodiment of the present application provides a non-volatile computer storage medium storing at least one executable instruction, and the computer-executable instruction can perform the electronic-book-based typesetting processing method of any of the above method embodiments.
The executable instructions may specifically be configured to cause a processor to:
acquire picture elements obtained by parsing an original book page of the electronic book;
when the original book page contains multiple picture elements, determine, according to the positional relationship among the multiple picture elements, whether each picture element belongs to a fragmented picture;
if so, merge the picture elements that belong to the fragmented picture to obtain a target picture area containing the merged picture elements;
and take a screenshot of the target picture area to obtain a screenshot picture corresponding to the target picture area, and generate page typesetting content corresponding to the original book page according to the screenshot picture.
In an optional implementation, the executable instructions are configured to cause the processor to determine whether each picture element belongs to a fragmented picture according to the positional relationship among the plurality of picture elements by:
acquiring the element size of each picture element, and determining whether the picture element belongs to a fragmented picture according to the relationship between the element size and the page size of the original book page; and/or
acquiring the position information of each picture element in the original book page, and determining, according to the position information, whether at least two positionally adjacent picture elements belong to a fragmented picture.
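The two fragment tests above can be sketched as follows, assuming boxes in page coordinates. The thresholds `max_area_ratio` and `max_gap`, and the `require_both` switch that models "and/or", are hypothetical parameters; the source names the criteria but gives no values.

```python
import math
from dataclasses import dataclass
from typing import List, Set


@dataclass(frozen=True)
class Rect:
    left: float
    top: float
    right: float
    bottom: float

    @property
    def area(self) -> float:
        return max(0.0, self.right - self.left) * max(0.0, self.bottom - self.top)


def box_gap(a: Rect, b: Rect) -> float:
    """Shortest distance between two boxes (0 if they touch or overlap)."""
    dx = max(0.0, max(a.left, b.left) - min(a.right, b.right))
    dy = max(0.0, max(a.top, b.top) - min(a.bottom, b.bottom))
    return math.hypot(dx, dy)


def find_fragments(pictures: List[Rect], page_width: float, page_height: float,
                   max_area_ratio: float = 0.25, max_gap: float = 5.0,
                   require_both: bool = True) -> Set[int]:
    """Indices of picture elements judged to be fragments of a larger picture.

    Size test: the element's area is at most `max_area_ratio` of the page area.
    Adjacency test: some other picture element lies within `max_gap`.
    `require_both` selects "and" (True) or "or" (False) between the two tests.
    """
    page_area = page_width * page_height
    small = {i for i, r in enumerate(pictures) if r.area / page_area <= max_area_ratio}
    adjacent = {i for i, a in enumerate(pictures)
                for j, b in enumerate(pictures)
                if i != j and box_gap(a, b) <= max_gap}
    return small & adjacent if require_both else small | adjacent
```

A full-page illustration fails the size test, and an isolated icon fails the adjacency test, so with `require_both=True` neither is treated as a fragment.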
In an optional implementation, the executable instructions are configured to cause the processor to:
merge at least two picture elements that belong to the fragmented picture and whose positional spacing is smaller than a preset spacing threshold, to obtain a candidate picture area;
acquire the editing order, in the original book page, of each picture element contained in the candidate picture area; and
determine, according to the editing order, whether the area range of the candidate picture area needs to be adjusted; if so, determine the target picture area according to the adjusted area range of the candidate picture area.
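Grouping fragments whose spacing is below the preset threshold is a connected-components problem. The sketch below uses union-find as an illustrative choice, not as the patented implementation; `spacing_threshold`, `group_fragments`, and `candidate_area` are hypothetical names.

```python
import math
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Rect:
    left: float
    top: float
    right: float
    bottom: float


def box_gap(a: Rect, b: Rect) -> float:
    """Shortest distance between two boxes (0 if they touch or overlap)."""
    dx = max(0.0, max(a.left, b.left) - min(a.right, b.right))
    dy = max(0.0, max(a.top, b.top) - min(a.bottom, b.bottom))
    return math.hypot(dx, dy)


def group_fragments(rects: List[Rect], spacing_threshold: float) -> List[List[int]]:
    """Cluster fragments whose spacing is below the threshold (union-find).

    Closeness is chained: if A is close to B and B to C, all three are grouped.
    Only groups of at least two elements become candidate picture areas.
    """
    parent = list(range(len(rects)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(rects)):
        for j in range(i + 1, len(rects)):
            if box_gap(rects[i], rects[j]) < spacing_threshold:
                parent[find(i)] = find(j)

    groups: Dict[int, List[int]] = {}
    for i in range(len(rects)):
        groups.setdefault(find(i), []).append(i)
    return [g for g in groups.values() if len(g) >= 2]


def candidate_area(rects: List[Rect], group: List[int]) -> Rect:
    """Circumscribed rectangle of one group: the initial candidate picture area."""
    members = [rects[i] for i in group]
    return Rect(min(r.left for r in members), min(r.top for r in members),
                max(r.right for r in members), max(r.bottom for r in members))
```

The pairwise loop is quadratic in the number of picture elements, which is usually acceptable for a single book page.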
In an optional implementation, the executable instructions are configured to cause the processor to:
sort the picture elements contained in the candidate picture area by editing order, and determine whether the order interval between two picture elements that are adjacent in this order is larger than a preset order threshold;
if so, remove at least one of the two adjacent picture elements from the candidate picture area, and adjust the area range of the candidate picture area according to the remaining picture elements.
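The editing-order check can be sketched as splitting the sorted editing orders wherever two consecutive values differ by more than the threshold. The source only says that at least one of the two elements is removed; keeping the longest run, as below, is an assumed policy chosen for illustration, and `prune_by_editing_order` is a hypothetical name.

```python
from typing import List, Sequence


def prune_by_editing_order(orders: Sequence[int], order_threshold: int) -> List[int]:
    """Return the positions (into `orders`) of the elements to keep.

    `orders[i]` is the editing order of the i-th picture element in the candidate
    area, for example its index in the page's content stream. Elements are sorted
    by editing order and split wherever two neighbours differ by more than
    `order_threshold`; the longest resulting run is kept (assumed policy).
    """
    if not orders:
        return []
    by_order = sorted(range(len(orders)), key=lambda i: orders[i])
    runs: List[List[int]] = [[by_order[0]]]
    for prev, cur in zip(by_order, by_order[1:]):
        if orders[cur] - orders[prev] > order_threshold:
            runs.append([cur])  # order gap too large: start a new run
        else:
            runs[-1].append(cur)
    return sorted(max(runs, key=len))  # the earliest longest run wins ties
```

After pruning, the adjusted candidate area is the circumscribed rectangle of the kept elements only.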
In an optional implementation, the executable instructions are configured to cause the processor to:
determine whether the candidate picture area contains text; if so, adjust the area range of the candidate picture area according to the position of the text.
In an optional implementation, the executable instructions are configured to cause the processor to:
acquire text elements obtained by parsing the original book page of the electronic book;
acquire first position information of the at least two picture elements belonging to the fragmented picture in the original book page, and second position information of the text elements in the original book page;
determine, according to the first position information and the second position information, whether there is a text element that coincides with the at least two picture elements belonging to the fragmented picture;
if so, merge the at least two picture elements belonging to the fragmented picture together with the coinciding text element to obtain the candidate picture area.
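A minimal sketch of the coincidence test follows. It assumes that "coincident" means the text box and at least one fragment box share a region of positive area; the source does not define the overlap criterion precisely. `overlaps`, `coincident_texts`, and `merge_with_texts` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Rect:
    left: float
    top: float
    right: float
    bottom: float


def overlaps(a: Rect, b: Rect) -> bool:
    """True if the two boxes share a region of positive area."""
    return a.left < b.right and b.left < a.right and a.top < b.bottom and b.top < a.bottom


def coincident_texts(picture_rects: List[Rect], text_rects: List[Rect]) -> List[int]:
    """Indices of text elements that overlap at least one fragment picture."""
    return [j for j, t in enumerate(text_rects)
            if any(overlaps(p, t) for p in picture_rects)]


def merge_with_texts(picture_rects: List[Rect], text_rects: List[Rect]) -> Tuple[Rect, List[int]]:
    """Candidate picture area covering the fragments plus their coinciding texts."""
    hits = coincident_texts(picture_rects, text_rects)
    members = picture_rects + [text_rects[j] for j in hits]
    area = Rect(min(r.left for r in members), min(r.top for r in members),
                max(r.right for r in members), max(r.bottom for r in members))
    return area, hits
```

Boxes that merely touch along an edge are not counted as coinciding under this strict test. A caption that sits just below a figure therefore stays outside the candidate area unless its box actually overlaps the figure.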
In an optional implementation, the executable instructions are configured to cause the processor to:
determine a first rectangular region corresponding to the at least two picture elements belonging to the fragmented picture, and a second rectangular region corresponding to the text element that coincides with them;
determine a circumscribed rectangle corresponding to the first rectangular region and the second rectangular region, and merge the picture elements and text elements contained in the circumscribed rectangle.
In an optional implementation, the original book page of the electronic book is a fixed-layout page, and the page typesetting content is reflowable typesetting content.
EXAMPLE IV
Fig. 3 is a schematic structural diagram of an electronic device according to another embodiment of the present invention. The specific embodiments of the present invention do not limit the specific implementation of the electronic device.
As shown in Fig. 3, the electronic device may include a processor 302, a communication interface 304, a memory 306, and a communication bus 308.
The processor 302, the communication interface 304, and the memory 306 communicate with one another via the communication bus 308. The communication interface 304 is used to communicate with network elements of other devices, such as clients or other servers. The processor 302 is configured to execute a program 310, and may specifically perform the relevant steps of the electronic-book-based typesetting processing method embodiments described above.
Specifically, the program 310 may include program code, and the program code includes computer operating instructions.
The processor 302 may be a central processing unit (CPU), an application-specific integrated circuit (ASIC), or one or more integrated circuits configured to implement the embodiments of the present invention. The one or more processors included in the electronic device may be of the same type, such as one or more CPUs, or of different types, such as one or more CPUs and one or more ASICs.
The memory 306 is configured to store the program 310. The memory 306 may include high-speed RAM, and may also include non-volatile memory, such as at least one disk memory.
Specifically, the program 310 may be configured to cause the processor 302 to perform the following operations:
acquire picture elements obtained by parsing an original book page of the electronic book;
when the original book page contains multiple picture elements, determine, according to the positional relationship among the multiple picture elements, whether each picture element belongs to a fragmented picture;
if so, merge the picture elements belonging to the fragmented picture to obtain a target picture area containing the merged picture elements; and
perform screenshot processing on the target picture area to obtain a screenshot picture corresponding to the target picture area, and generate page typesetting content corresponding to the original book page from the screenshot picture.
In an optional implementation, the executable instructions are configured to cause the processor to determine whether each picture element belongs to a fragmented picture according to the positional relationship among the plurality of picture elements by:
acquiring the element size of each picture element, and determining whether the picture element belongs to a fragmented picture according to the relationship between the element size and the page size of the original book page; and/or
acquiring the position information of each picture element in the original book page, and determining, according to the position information, whether at least two positionally adjacent picture elements belong to a fragmented picture.
In an optional implementation, the executable instructions are configured to cause the processor to:
merge at least two picture elements that belong to the fragmented picture and whose positional spacing is smaller than a preset spacing threshold, to obtain a candidate picture area;
acquire the editing order, in the original book page, of each picture element contained in the candidate picture area; and
determine, according to the editing order, whether the area range of the candidate picture area needs to be adjusted; if so, determine the target picture area according to the adjusted area range of the candidate picture area.
In an optional implementation, the executable instructions are configured to cause the processor to:
sort the picture elements contained in the candidate picture area by editing order, and determine whether the order interval between two picture elements that are adjacent in this order is larger than a preset order threshold;
if so, remove at least one of the two adjacent picture elements from the candidate picture area, and adjust the area range of the candidate picture area according to the remaining picture elements.
In an optional implementation, the executable instructions are configured to cause the processor to:
determine whether the candidate picture area contains text; if so, adjust the area range of the candidate picture area according to the position of the text.
In an optional implementation, the executable instructions are configured to cause the processor to:
acquire text elements obtained by parsing the original book page of the electronic book;
acquire first position information of the at least two picture elements belonging to the fragmented picture in the original book page, and second position information of the text elements in the original book page;
determine, according to the first position information and the second position information, whether there is a text element that coincides with the at least two picture elements belonging to the fragmented picture;
if so, merge the at least two picture elements belonging to the fragmented picture together with the coinciding text element to obtain the candidate picture area.
In an optional implementation, the executable instructions are configured to cause the processor to:
determine a first rectangular region corresponding to the at least two picture elements belonging to the fragmented picture, and a second rectangular region corresponding to the text element that coincides with them;
determine a circumscribed rectangle corresponding to the first rectangular region and the second rectangular region, and merge the picture elements and text elements contained in the circumscribed rectangle.
In an optional implementation, the original book page of the electronic book is a fixed-layout page, and the page typesetting content is reflowable typesetting content.
The algorithms and displays presented herein are not inherently related to any particular computer, virtual machine, or other apparatus. Various general purpose systems may also be used with the teachings herein. The required structure for constructing such a system will be apparent from the description above. Moreover, the present invention is not directed to any particular programming language. It is appreciated that a variety of programming languages may be used to implement the teachings of the present invention as described herein, and any descriptions of specific languages are provided above to disclose the best mode of the invention.
In the description provided herein, numerous specific details are set forth. It is understood, however, that embodiments of the invention may be practiced without these specific details. In some instances, well-known methods, structures and techniques have not been shown in detail in order not to obscure an understanding of this description.
Similarly, it should be appreciated that in the foregoing description of exemplary embodiments of the invention, various features of the invention are sometimes grouped together in a single embodiment, figure, or description thereof for the purpose of streamlining the disclosure and aiding in the understanding of one or more of the various inventive aspects. However, the disclosed method should not be interpreted as reflecting an intention that the claimed invention requires more features than are expressly recited in each claim. Rather, as the following claims reflect, inventive aspects lie in less than all features of a single foregoing disclosed embodiment. Thus, the claims following the detailed description are hereby expressly incorporated into this detailed description, with each claim standing on its own as a separate embodiment of this invention.
Those skilled in the art will appreciate that the modules in the device in an embodiment may be adaptively changed and disposed in one or more devices different from the embodiment. The modules or units or components of the embodiments may be combined into one module or unit or component, and furthermore they may be divided into a plurality of sub-modules or sub-units or sub-components. All of the features disclosed in this specification (including any accompanying claims, abstract and drawings), and all of the processes or elements of any method or apparatus so disclosed, may be combined in any combination, except combinations where at least some of such features and/or processes or elements are mutually exclusive. Each feature disclosed in this specification (including any accompanying claims, abstract and drawings) may be replaced by alternative features serving the same, equivalent or similar purpose, unless expressly stated otherwise.
Furthermore, those skilled in the art will appreciate that although some embodiments described herein include certain features that are included in other embodiments but not others, combinations of features of different embodiments are meant to be within the scope of the invention and to form different embodiments. For example, in the following claims, any of the claimed embodiments may be used in any combination.
It should be noted that the above-mentioned embodiments illustrate rather than limit the invention, and that those skilled in the art will be able to design alternative embodiments without departing from the scope of the appended claims. In the claims, any reference signs placed between parentheses shall not be construed as limiting the claim. The word "comprising" does not exclude the presence of elements or steps not listed in a claim. The word "a" or "an" preceding an element does not exclude the presence of a plurality of such elements. The invention may be implemented by means of hardware comprising several distinct elements, and by means of a suitably programmed computer. In a unit claim enumerating several means, several of these means may be embodied by one and the same item of hardware. The use of the words first, second, third, etc. does not indicate any ordering; these words may be interpreted as names.

Claims (10)

1. An electronic-book-based typesetting processing method, comprising:
acquiring picture elements obtained by parsing an original book page of the electronic book;
when the original book page contains a plurality of picture elements, determining, according to the positional relationship among the plurality of picture elements, whether each picture element belongs to a fragmented picture;
if so, merging the picture elements belonging to the fragmented picture to obtain a target picture area containing the merged picture elements; and
performing screenshot processing on the target picture area to obtain a screenshot picture corresponding to the target picture area, and generating page typesetting content corresponding to the original book page from the screenshot picture.
2. The method according to claim 1, wherein determining, according to the positional relationship among the plurality of picture elements, whether each picture element belongs to a fragmented picture comprises:
acquiring the element size of each picture element, and determining whether the picture element belongs to a fragmented picture according to the relationship between the element size and the page size of the original book page; and/or
acquiring the position information of each picture element in the original book page, and determining, according to the position information, whether at least two positionally adjacent picture elements belong to a fragmented picture.
3. The method according to claim 2, wherein merging the plurality of picture elements belonging to the fragmented picture to obtain the target picture area containing the merged plurality of picture elements comprises:
merging at least two picture elements that belong to the fragmented picture and whose positional spacing is smaller than a preset spacing threshold, to obtain a candidate picture area;
acquiring the editing order, in the original book page, of each picture element contained in the candidate picture area; and
determining, according to the editing order, whether the area range of the candidate picture area needs to be adjusted; and if so, determining the target picture area according to the adjusted area range of the candidate picture area.
4. The method according to claim 3, wherein determining, according to the editing order, whether the area range of the candidate picture area needs to be adjusted comprises:
sorting the picture elements contained in the candidate picture area by editing order, and determining whether the order interval between two picture elements that are adjacent in this order is larger than a preset order threshold;
if so, removing at least one of the two adjacent picture elements from the candidate picture area, and adjusting the area range of the candidate picture area according to the remaining picture elements.
5. The method according to claim 3 or 4, wherein determining, according to the editing order, whether the area range of the candidate picture area needs to be adjusted further comprises:
determining whether the candidate picture area contains text; and if so, adjusting the area range of the candidate picture area according to the position of the text.
6. The method according to any one of claims 3 to 5, wherein merging at least two picture elements that belong to the fragmented picture and whose positional spacing is smaller than the preset spacing threshold, to obtain the candidate picture area, comprises:
acquiring text elements obtained by parsing the original book page of the electronic book;
acquiring first position information of the at least two picture elements belonging to the fragmented picture in the original book page, and second position information of the text elements in the original book page;
determining, according to the first position information and the second position information, whether there is a text element that coincides with the at least two picture elements belonging to the fragmented picture; and
if so, merging the at least two picture elements belonging to the fragmented picture together with the coinciding text element to obtain the candidate picture area.
7. The method according to claim 6, wherein merging the at least two picture elements belonging to the fragmented picture together with the text element coinciding with them comprises:
determining a first rectangular region corresponding to the at least two picture elements belonging to the fragmented picture, and a second rectangular region corresponding to the text element that coincides with them; and
determining a circumscribed rectangle corresponding to the first rectangular region and the second rectangular region, and merging the picture elements and text elements contained in the circumscribed rectangle.
8. The method according to any one of claims 1 to 7, wherein the original book page of the electronic book is a fixed-layout page, and the page typesetting content is reflowable typesetting content.
9. An electronic device, comprising a processor, a memory, a communication interface, and a communication bus, wherein the processor, the memory, and the communication interface communicate with one another via the communication bus; and
the memory is configured to store at least one executable instruction that causes the processor to perform the method according to any one of claims 1 to 8.
10. A computer storage medium storing at least one executable instruction that causes a processor to perform the method according to any one of claims 1 to 8.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010972641.9A CN112100979A (en) 2020-09-16 2020-09-16 Typesetting processing method based on electronic book, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN112100979A true CN112100979A (en) 2020-12-18

Family

ID=73759195

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010972641.9A Pending CN112100979A (en) 2020-09-16 2020-09-16 Typesetting processing method based on electronic book, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN112100979A (en)

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112966484A (en) * 2021-03-01 2021-06-15 维沃移动通信有限公司 Chart typesetting method and device, electronic equipment and readable storage medium
CN112966484B (en) * 2021-03-01 2024-06-07 维沃移动通信有限公司 Chart typesetting method, device, electronic equipment and readable storage medium
CN113011131A (en) * 2021-03-22 2021-06-22 掌阅科技股份有限公司 Typesetting method based on picture electronic book, electronic equipment and storage medium
CN113011131B (en) * 2021-03-22 2022-02-22 掌阅科技股份有限公司 Typesetting method based on picture electronic book, electronic equipment and storage medium
CN114595391A (en) * 2022-03-17 2022-06-07 北京百度网讯科技有限公司 Data processing method and device based on information search and electronic equipment
CN116578376A (en) * 2023-07-12 2023-08-11 福昕鲲鹏(北京)信息科技有限公司 Open format document OFD page display method, device and equipment
CN117576247A (en) * 2024-01-17 2024-02-20 江西拓世智能科技股份有限公司 Picture generation method and system based on artificial intelligence
CN117576247B (en) * 2024-01-17 2024-03-29 江西拓世智能科技股份有限公司 Picture generation method and system based on artificial intelligence

Similar Documents

Publication Publication Date Title
CN110069767B (en) Typesetting method based on electronic book, electronic equipment and computer storage medium
CN112100979A (en) Typesetting processing method based on electronic book, electronic equipment and storage medium
RU2635259C1 (en) Method and device for determining type of digital document
US8645819B2 (en) Detection and extraction of elements constituting images in unstructured document files
EP3940589B1 (en) Layout analysis method, electronic device and computer program product
US10142499B2 (en) Document distribution system, document distribution apparatus, information processing method, and storage medium
CN112380824B (en) PDF document processing method, device, equipment and storage medium for automatically identifying columns
JP2004046315A (en) Device and method for recognizing character, program and storage medium
US9519404B2 (en) Image segmentation for data verification
US9047528B1 (en) Identifying characters in grid-based text
CN111090817A (en) Method for displaying book extension information, electronic equipment and computer storage medium
CN113610068B (en) Test question disassembling method, system, storage medium and equipment based on test paper image
CN112380812B (en) Method, device, equipment and storage medium for extracting incomplete frame line table of PDF (Portable document Format)
CN112100978B (en) Typesetting processing method based on electronic book, electronic equipment and storage medium
CN112417899A (en) Character translation method, device, computer equipment and storage medium
US8116567B2 (en) Digitizing documents
JP2008108114A (en) Document processor and document processing method
CN110956087B (en) Method and device for identifying table in picture, readable medium and electronic equipment
CN113011131B (en) Typesetting method based on picture electronic book, electronic equipment and storage medium
CN109101973B (en) Character recognition method, electronic device and storage medium
CN112699634B (en) Typesetting processing method of electronic book, electronic equipment and storage medium
CN115983198A (en) Method, device and storage medium for extracting header or footer from PDF document
CN112686000B (en) Format conversion method of electronic book document, electronic equipment and storage medium
US20210042555A1 (en) Information Processing Apparatus and Table Recognition Method
CN112784825A (en) Method for identifying characters in picture, method, device and equipment for searching keywords

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination