CN113867654A - PDF page-based splitting and page-splicing method - Google Patents

Info

Publication number
CN113867654A
Authority
CN
China
Prior art keywords
page
splicing
pages
color
pdf
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202111139740.XA
Other languages
Chinese (zh)
Other versions
CN113867654B (en)
Inventor
郑元林
陈兵
廖开阳
刘春霞
陈文倩
王凯迪
孙英健
王可儿
王晓莹
谢雨林
张新会
钟崇军
解博
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xian University of Technology
Original Assignee
Xian University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xian University of Technology
Priority to CN202111139740.XA
Publication of CN113867654A
Application granted
Publication of CN113867654B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/12 Digital output to print unit, e.g. line printer, chain printer
    • G06F3/1201 Dedicated interfaces to print systems
    • G06F3/1202 Dedicated interfaces to print systems specifically adapted to achieve a particular effect
    • G06F3/1203 Improving or facilitating administration, e.g. print management
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/12 Digital output to print unit, e.g. line printer, chain printer
    • G06F3/1201 Dedicated interfaces to print systems
    • G06F3/1223 Dedicated interfaces to print systems specifically adapted to use a particular technique
    • G06F3/1237 Print job management
    • G06F3/1242 Image or content composition onto a page
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/12 Digital output to print unit, e.g. line printer, chain printer
    • G06F3/1201 Dedicated interfaces to print systems
    • G06F3/1223 Dedicated interfaces to print systems specifically adapted to use a particular technique
    • G06F3/1237 Print job management
    • G06F3/1244 Job translation or job parsing, e.g. page banding
    • G06F3/1248 Job translation or job parsing, e.g. page banding by printer language recognition, e.g. PDL, PCL, PDF
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/12 Digital output to print unit, e.g. line printer, chain printer
    • G06F3/1201 Dedicated interfaces to print systems
    • G06F3/1223 Dedicated interfaces to print systems specifically adapted to use a particular technique
    • G06F3/1237 Print job management
    • G06F3/125 Page layout or assigning input pages onto output media, e.g. imposition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/0002 Inspection of images, e.g. flaw detection
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/90 Determination of colour characteristics
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/30 Subject of image; Context of image processing
    • G06T2207/30176 Document
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Human Computer Interaction (AREA)
  • General Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Quality & Reliability (AREA)
  • Record Information Processing For Printing (AREA)

Abstract

The invention discloses a PDF page-based splitting and page-splicing method. A PDF file is first read and its page data is converted into image data. Channel separation and pixel ratio detection are then performed on the image data, which includes calculating the proportion of color pixels in the whole image and the deviation of the color pixels from a standard gray image, and each page is judged to be gray or color. The pages are then separated and recombined according to the gray/color judgment result and the built-in printing mode and page-splicing mode options. The invention uses image analysis and processing to extract and classify PDF pages, and splits and splices the color pages and black-and-white pages of a PDF according to the output mode, thereby improving efficiency, ensuring quality and reducing cost.

Description

PDF page-based splitting and page-splicing method
Technical Field
The invention belongs to the technical field of image analysis and data processing, and relates to a PDF page-based splitting and page-splicing method.
Background
With the spread of the internet and digital technology, the digital publishing industry has entered a fast lane and is favored in graphic quick printing and in the short-run printing market. According to a 2020 survey report on the installed base of digital printing equipment, among the application fields of high-end color digital presses and production digital presses, graphic quick printing is the largest: quick-printing production centers with high quality requirements, strong production capacity and cost requirements are the main force in adopting high-end color digital printing equipment. Publishing and printing comes second: with the wide promotion of on-demand publishing and printing in the publishing industry, many printing enterprises and digital printing enterprises use sheet-fed high-end color digital presses to produce on-demand publishing orders, and this application is mainly for covers.
At present, most original documents are stored and archived in PDF (Portable Document Format). PDF files are suitable for the review, publishing and archiving stages; their main advantages are high-fidelity content rendering, multi-platform and multimedia support, strong interactivity, high security and support for signatures, and they can package both unstructured and structured data. Prepress digital workflow software developed for such documents includes Preps (Kodak), Prinergy Evo and Apogee abroad, and is represented in China by the workflow and layout products of Founder (方正畅流 and 方正飞翔).
It should be noted that, first, these workflow software products or systems focus on the unified, standardized processing and imposition of PDF files for printing, for example checking whether the image resolution meets printing requirements; whether characters or elements are missing or garbled; whether single-color black (gray) or four-color black (gray) is used; and whether typesetting and imposition follow the printing requirements. Second, deploying and operating such software may not be convenient or friendly enough for the graphic quick-printing field, which demands fast turnaround.
Digital printers are classified into monochrome printers (monochrome machines) and color printers (color machines) according to how color is produced. For the same format, the printing cost of a monochrome machine is about one tenth that of a color machine. In actual production, an original usually contains both color and black-and-white pages. Most existing digital printing workflows cannot automatically separate the color part from the black-and-white part: printing everything on a color machine is expensive, printing everything in black and white does not meet the actual requirements, and manual screening and separation wastes time and labor. The invention provides a PDF page-based splitting and page-splicing method to solve the cost problem caused by the inability to print color pages and black-and-white pages separately, controlling cost while keeping the look and feel of the product unchanged.
Disclosure of Invention
The invention aims to provide a PDF page-based splitting and page-splicing method, which can judge and process the page of a PDF file and separate a color page from a gray page; the pages can also be recombined according to a built-in printing mode and a page splicing mode.
The technical scheme adopted by the invention is a PDF page-based splitting and page-splicing method in which splitting and page splicing are performed using built-in splitting and page-splicing modes. The processing of the PDF pages comprises reading the PDF pages, image conversion, gray/color discrimination, page separation, merging and splicing. The method is implemented by the following steps:
step 1, reading a PDF file, acquiring all pages and converting the pages into image data;
step 2, carrying out channel separation and pixel ratio detection on the image data obtained in the step 1, including calculating the ratio of color pixels on the whole image and the deviation value of the color pixels from the standard gray image, and carrying out gray color judgment on the page;
step 3, combining the gray color judgment result obtained in the step 2, and separating and recombining the page according to the built-in printing mode and page splicing mode options;
step 4, passing the recombined pages obtained in step 3 to a writer, which outputs them to complete splitting and page splicing.
The present invention is further characterized in that
step 1 is specifically implemented according to the following method:
the open method and the getPixmap method of the Python PyMuPDF library are used: the open method reads the PDF file and establishes a list with each page as an object; the getPixmap method converts each single page into image data.
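Step 1 can be sketched as follows. This is a minimal illustration rather than the patent's implementation: the function names `render_pages` and `samples_to_rgb` and the `zoom` parameter are assumptions, and the sketch calls the method by its snake_case name `get_pixmap`, which is how recent PyMuPDF releases spell the getPixmap method named above.

```python
def samples_to_rgb(samples, n_channels=3):
    """Group a flat byte buffer (R, G, B, R, G, B, ...) into (r, g, b) tuples.

    If the buffer carries more than three channels per pixel (for example an
    alpha channel), only the first three bytes of each pixel are kept.
    """
    if n_channels < 3:
        raise ValueError("expected at least three channels per pixel")
    return [
        tuple(samples[i:i + 3])
        for i in range(0, len(samples) - n_channels + 1, n_channels)
    ]


def render_pages(pdf_path, zoom=1.0):
    """Yield (width, height, rgb_pixels) for every page of the PDF."""
    import fitz  # PyMuPDF (third-party); imported here so the helper above works without it

    with fitz.open(pdf_path) as doc:      # the document iterates over its pages
        for page in doc:
            pix = page.get_pixmap(
                matrix=fitz.Matrix(zoom, zoom),  # optional scaling (assumed parameter)
                colorspace=fitz.csRGB,           # force three RGB channels
                alpha=False,
            )
            yield pix.width, pix.height, samples_to_rgb(pix.samples, pix.n)
```

Rendering at a lower `zoom` reduces the number of pixels to inspect per page, at the risk of averaging away small colored details.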
The step 2 is specifically implemented according to the following method:
the gray color discrimination algorithm mainly comprises two parts, namely:
(1) calculating the total number of the color pixels; namely the number of color pixel points in the image;
(2) calculating the color pixel ratio; i.e. the ratio in total number of pixels;
the gray color discrimination algorithm is adopted to calculate the number of color pixel points in each image and the ratio of the color pixel points to the total pixels, and the calculation method is expressed as follows:
Figure BDA0003281701850000031
Figure BDA0003281701850000032
Figure BDA0003281701850000033
in the above formulas, R, G and B represent the pixel matrices of the three channels, m and n represent the dimensions of the matrices, X represents the corresponding gray matrix, x represents the total number of color pixels, and t represents the proportion of color pixels in the image;
x and t, i.e. the total number of color pixels and the color pixel ratio, are then evaluated as follows:
Figure BDA0003281701850000034
Figure BDA0003281701850000041
where i_j represents the discrimination result under a single criterion j; λ_j denotes the corresponding threshold, with the color pixel count threshold λ_x set to 100 and the color ratio threshold λ_t set to 3.8%;
i represents the binarized weighted average of the two results, which serves as the final judgment for the current page; the judgment results form a list T of the general form [0,1,0,1,0,0,0,0,1,0, ...], in which each element corresponds one-to-one to a page number and whose length is the total number of pages; 0 indicates that the current page is a gray page and 1 indicates that it is a color page.
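The equation images referenced above are not reproduced in this text. One formulation consistent with the stated symbol definitions is given below as an assumption, not as the patent's exact formulas. The gray matrix is taken as the channel mean, p and q index rows and columns, and the tolerance δ, the weights w_x and w_t, and the cutoff of 1/2 are hypothetical.

```latex
\begin{aligned}
X_{pq} &= \tfrac{1}{3}\left(R_{pq} + G_{pq} + B_{pq}\right), \\
x &= \sum_{p=1}^{m}\sum_{q=1}^{n}
     \mathbf{1}\!\left[\max\!\left(|R_{pq}-X_{pq}|,\ |G_{pq}-X_{pq}|,\ |B_{pq}-X_{pq}|\right) > \delta\right], \\
t &= \frac{x}{m\,n}, \\
i_x &= \begin{cases} 1, & x > \lambda_x \\ 0, & \text{otherwise} \end{cases}
\qquad
i_t = \begin{cases} 1, & t > \lambda_t \\ 0, & \text{otherwise} \end{cases} \\
i &= \begin{cases} 1, & w_x\, i_x + w_t\, i_t \ge \tfrac{1}{2} \\ 0, & \text{otherwise} \end{cases}
\end{aligned}
```

With λ_x = 100 and λ_t = 3.8% as stated, a page with no deviating pixels gives x = 0 and t = 0, so i = 0 and the page is classified as gray.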
Step 3 is specifically implemented according to the following method:
the gray/color discrimination result obtained in step 2 is further combined with the built-in printing mode and page-splicing mode to recombine the pages, specifically:
built-in printing mode: based on the front and back sides of the paper, two options, 'single-sided printing' and 'double-sided printing', are provided, and one is selected;
built-in page-splicing mode: single-page splicing and copy splicing are provided; for the types of file finally to be output, independent splicing and full splicing are provided;
single-page splicing and copy splicing both splice two pages horizontally into one page and differ in the order of the pages; single-page splicing divides all pages into a front part and a rear part in page-number order and takes one page from each part in sequence to splice horizontally into a new page; copy splicing copies the original page and splices the copy horizontally with the original to obtain a new page;
independent splicing and full splicing: refinement options for single-page splicing and copy splicing; independent splicing splices only one file and leaves the other unspliced, i.e. only the gray pages or only the color pages are spliced; full splicing makes no such distinction;
if a page to be spliced is in landscape orientation, its rotation attribute is automatically changed to adjust it to portrait orientation;
adding blank pages: the printing mode and the page-splicing mode can be freely combined, and different combinations place different requirements on the number of pages; if the requirement is not met, blank pages are automatically added at the end until the total number of pages meets it.
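The page order produced by the two splicing modes can be illustrated with a short sketch. It assumes that the "front part" and "rear part" are two equal halves and that the first element of each pair is placed on the left; the function names are hypothetical.

```python
def single_page_pairs(page_numbers):
    """Pair the k-th page of the front half with the k-th page of the rear half."""
    if len(page_numbers) % 2:
        raise ValueError("single-page splicing needs an even number of pages")
    half = len(page_numbers) // 2
    return list(zip(page_numbers[:half], page_numbers[half:]))


def copy_pairs(page_numbers):
    """Pair every page with a copy of itself."""
    return [(p, p) for p in page_numbers]
```

For six pages, single-page splicing under these assumptions yields the sheets (1, 4), (2, 5), (3, 6), so cutting the printed stack down the middle and placing the right pile behind the left one restores the original order; copy splicing yields (1, 1), (2, 2), and so on, giving two identical piles.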
In step 3, the page is adjusted to portrait orientation as follows:
the values of the media box and rotation attributes of the PDF page object are obtained; the media box value contains the height H and the width W of the page, and the rotation value is the rotation angle R of the page in degrees, where a positive value indicates clockwise rotation and a negative value indicates counterclockwise rotation; R is generally 0, and if R is not 0 it must be an integer multiple of 90;
when H is greater than W: if R is 0, no adjustment is made, i.e. the original page is kept unchanged; if R is not 0, R is set to the negative of its original value, i.e. the original page is rotated back by the same angle;
when H is less than W: if R is 0, R is set to 90, i.e. the original page is rotated 90 degrees clockwise; if R is not 0, R is set to the negative of its original value plus 90, i.e. the original page is rotated back by the same angle and then rotated 90 degrees clockwise.
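The rotation rules above translate directly into a small function. This is a sketch of the stated rules only: the function name is hypothetical, and the case H = W, which the text does not cover, is left unchanged here as an assumption.

```python
def portrait_rotation(height, width, rotation):
    """Return the new rotation value R for a page, following the rules above."""
    if rotation % 90 != 0:
        raise ValueError("rotation must be a multiple of 90 degrees")
    if height > width:
        # Unrotated portrait media box: keep R = 0, otherwise negate R.
        return 0 if rotation == 0 else -rotation
    if height < width:
        # Landscape media box: negate R (if any), then add 90 degrees clockwise.
        return 90 if rotation == 0 else -rotation + 90
    return rotation  # square page: not covered by the text, left unchanged
```

For example, a 595 x 842 (width x height) page with R = 90 gets R = -90, while an 842 x 595 page with R = 0 gets R = 90.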
In step 3, blank pages are added as follows:
blank pages are generated with the canvas of the Python reportlab library; the size and orientation of a generated blank page are the same as those of the original document pages, and when the number of pages of the document does not meet the requirements of printing and page splicing, blank pages are automatically added at the end;
single-sided printing without page splicing: no blank page is needed;
single-sided printing with copy splicing: no blank page is needed;
single-sided printing with single-page splicing: one blank page is added when the total number of pages is odd, and none when it is even;
double-sided printing without page splicing: one blank page is added when the total number of pages is odd, and none when it is even;
double-sided printing with copy splicing: one blank page is added when the total number of pages is odd, and none when it is even;
double-sided printing with single-page splicing: the total number of pages must be a multiple of four; if it is not, blank pages are added until the total reaches the next multiple of four.
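The six combinations above reduce to padding the page count to a multiple of 1, 2 or 4. A minimal sketch follows; the function name and the mode labels `None`, `"copy"` and `"single"` are hypothetical.

```python
def blank_pages_needed(total_pages, duplex, splice_mode):
    """Return how many blank pages must be appended at the end.

    duplex      -- True for double-sided printing, False for single-sided
    splice_mode -- None (no splicing), "copy" or "single" (assumed labels)
    """
    if splice_mode not in (None, "copy", "single"):
        raise ValueError("unknown splice mode: %r" % (splice_mode,))
    if duplex and splice_mode == "single":
        multiple = 4   # two pages per side, two sides per sheet
    elif duplex or splice_mode == "single":
        multiple = 2   # two different pages per sheet
    else:
        multiple = 1   # single-sided, with or without copy splicing
    return -total_pages % multiple
```

The expression `-total_pages % multiple` gives the distance to the next multiple, so a 5-page document printed double-sided with single-page splicing receives 3 blank pages for a total of 8.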
The invention has the beneficial effects that:
(1) The splitting algorithm of the invention is based not on page-number input but on page content, i.e. gray and color pages are split and merged. The gray/color discrimination algorithm is robust and stable, which ensures the splitting result; the whole process is completed automatically, eliminating the trouble of manually screening and separating pages.
(2) The invention also provides a page-splicing algorithm that generates a spliced file suitable for direct output on a printer or on a press according to the total number of pages, the printing mode and the page-splicing mode.
(3) For the graphic quick-printing and short-run publishing printing markets, the invention provides a fast and convenient optimized workflow from the standardization of PDF (Portable Document Format) files to black-and-white and color printing output; applied to quick-printing or press production, the method helps improve efficiency, save time and reduce cost.
Drawings
Fig. 1 is a detailed flowchart of a method for splitting and splicing a PDF page according to the present invention.
Detailed Description
The invention is described in detail below with reference to the figures and specific embodiments. All the above functions can be combined in any form to form an alternative embodiment of the present disclosure, and are not described in detail herein.
The description and claims of this application and the use of the terms "step 1," "step 2," "step 3," and the like in the foregoing and following description are intended to describe the context of a relatively complete embodiment of the present invention, which can be used for all functional flows, and are not necessarily intended to describe a particular order of precedence. It will be appreciated that some of the operations in these descriptions may be performed separately or in a sequence that is interchangeable under appropriate circumstances such that the embodiments of the application described herein may be performed, for example, in an order other than that described herein.
The technical scheme adopted by the invention is a PDF page-based splitting and page-splicing method, which is implemented according to the following steps as shown in figure 1:
step 1: page analysis is first performed on the PDF file and the pages are converted into image data:
the analysis and conversion mainly use the open method and the getPixmap method of the Python PyMuPDF library; the open method reads the PDF file and establishes a list with each page as an object; the getPixmap method converts a single page into image data and also supports operations such as scaling, rotating and cropping a region of the converted image.
Step 2: gray/color discrimination is performed on the image data obtained in step 1 to obtain a discrimination result:
the principle of the gray/color discrimination algorithm is to check whether the values at the corresponding position in the three RGB channel pixel matrices are equal; if they are not equal, the pixel is a color pixel. In practice, a pixel still appears gray when its three channel values differ only slightly. The algorithm therefore comprises two aspects:
(1) calculating the total number of the color pixels; i.e. the number of colored pixels in the image.
(2) Calculating the color pixel ratio; i.e. the ratio over the total number of pixels.
The specific calculation formula is as follows:
Figure BDA0003281701850000071
Figure BDA0003281701850000072
Figure BDA0003281701850000073
In the above formulas, R, G and B represent the pixel matrices of the three channels, m and n represent the dimensions of the matrices, X represents the corresponding gray matrix, x represents the total number of color pixels, and t represents the proportion of color pixels in the image.
x and t, i.e. the total number of color pixels and the color pixel ratio, are then evaluated as follows:
Figure BDA0003281701850000081
Figure BDA0003281701850000082
where i_j denotes the result of binarizing a single criterion j and λ_j denotes the corresponding threshold; the thresholds differ between criteria. The judgment on the total number of color pixels is the stricter one, and its threshold λ_x was set to 100 on the basis of experiments; the color ratio threshold λ_t is comparatively loose and is set to 3.8%. It should be noted that, to keep the gray/color discrimination algorithm effective, the thresholds are not limited to these values and may be changed under certain conditions.
u denotes the binarized weighted average of the two results, which serves as the final judgment for the page. The calculation is performed on every page to obtain the judgment results of all pages. The results form a list T of the general form [0,1,0,1,0,0,0,0,1,0, ...], in which each element corresponds one-to-one to a page number and whose length is the total number of pages; 0 indicates that the page is a gray page and 1 indicates that it is a color page.
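The discrimination described in step 2 can be sketched in plain Python. Several points are assumptions because the equation images are not reproduced in the text: the gray value is taken as the channel mean, `tolerance` is a hypothetical parameter for the "differ only slightly" case, and the equal weights with a cutoff of 0.5 are one possible reading of "binarized weighted average". Only the thresholds 100 and 3.8% come from the text.

```python
def count_color_pixels(pixels, tolerance=0):
    """Count pixels whose channels deviate from their own gray value."""
    count = 0
    for r, g, b in pixels:
        gray = (r + g + b) / 3  # assumed gray value: mean of the three channels
        if max(abs(r - gray), abs(g - gray), abs(b - gray)) > tolerance:
            count += 1
    return count


def is_color_page(pixels, count_threshold=100, ratio_threshold=0.038,
                  weights=(0.5, 0.5), tolerance=0):
    """Return 1 for a color page and 0 for a gray page."""
    if not pixels:
        return 0
    x = count_color_pixels(pixels, tolerance)       # total number of color pixels
    t = x / len(pixels)                             # color pixel ratio
    by_count = 1 if x > count_threshold else 0      # single-criterion result for x
    by_ratio = 1 if t > ratio_threshold else 0      # single-criterion result for t
    score = weights[0] * by_count + weights[1] * by_ratio
    return 1 if score >= 0.5 else 0                 # assumed binarization cutoff
```

Applying `is_color_page` to the pixel list of every page, in page order, produces the list T. With the assumed equal weights and cutoff, a page is marked as color when either criterion fires; other weights or cutoffs would change that behavior.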
Step 3: the pages are recombined according to the judgment result obtained in step 2 and the built-in printing mode and page-splicing mode, specifically:
the built-in printing mode and page-splicing mode are provided to better match the characteristics of actual printing output; the printing mode is selectable, with two choices, 'single-sided printing' and 'double-sided printing', based on the front and back sides of the paper; the page-splicing mode is selectable, with 'single-page splicing' and 'copy splicing' provided; in addition, 'independent splicing' and 'full splicing' are provided according to the type of file finally to be output.
'Single-page splicing' and 'copy splicing' both splice two pages horizontally into one page and differ in the order of the pages; single-page splicing divides the pages into a front part and a rear part in page-number order and takes one page from each part in sequence to splice horizontally into a new page; copy splicing copies the original page and splices the copy horizontally with the original to obtain a new page.
'Independent splicing' and 'full splicing' are designed around the output requirement; in general, processing yields two documents, one for gray printing and one for color printing. 'Independent splicing' means splicing only one of the documents and leaving the other unspliced, e.g. splicing only the gray pages or only the color pages; 'full splicing' makes no such distinction.
Portrait conversion: the PDF page object contains a media box (/MediaBox), which describes the size of the page (height and width), and a rotation attribute (/Rotate), which determines how the page is displayed; together they determine whether the page is presented in landscape or portrait orientation. Page splicing is horizontal and requires portrait pages; if a page is in landscape orientation, its rotation attribute is automatically changed to adjust it to portrait.
Blank page addition: the printing mode and the page-splicing mode can be freely combined, and different combinations place different requirements on the number of pages. Double-sided printing requires the total number of original pages to be even; if it is odd, one blank page is automatically added at the end to make it even. When double-sided printing is combined with single-page splicing, the total number of original pages must be a multiple of 4; if it is not, blank pages are automatically added at the end until the total reaches the next multiple of 4.
Blank pages are generated with the canvas of the Python reportlab library; the size and orientation of a generated blank page are the same as those of the original document pages, and when the number of pages of the document does not meet the requirements of printing and page splicing, blank pages are automatically added at the end.
Step 4: the split and spliced pages obtained in step 3 are saved and output, specifically:
PDF files are written mainly with the writer (PdfWriter) provided by the Python pdfrw library. The pages recombined in step 3 are added to their respective writers, and the writers output the color-printing and gray-printing documents, completing splitting and page splicing.
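The split-and-write part of this step can be sketched with pdfrw as follows. The function names and file-path parameters are hypothetical, splicing is omitted, and skipping empty outputs is a design choice of this sketch rather than something the text specifies.

```python
def partition_pages(pages, flags):
    """Split pages into (gray_pages, color_pages) using the 0/1 list T."""
    if len(pages) != len(flags):
        raise ValueError("one flag per page is required")
    gray = [p for p, f in zip(pages, flags) if f == 0]
    color = [p for p, f in zip(pages, flags) if f == 1]
    return gray, color


def write_split_documents(src_path, flags, gray_path, color_path):
    """Write the gray and color pages of src_path to two separate PDF files."""
    from pdfrw import PdfReader, PdfWriter  # third-party; imported lazily

    gray, color = partition_pages(PdfReader(src_path).pages, flags)
    for pages, out_path in ((gray, gray_path), (color, color_path)):
        if not pages:
            continue  # nothing to write for this document
        writer = PdfWriter()
        for page in pages:
            writer.addpage(page)
        writer.write(out_path)
```

Keeping `partition_pages` separate from the I/O makes the page selection easy to check on its own, for example with page labels instead of real page objects.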
It should be noted that the PDF page splitting and page-splicing method of the above embodiment is described using only one example, in which the above functions are combined in a certain sequence of steps; in practical applications the functions may be used independently or with the order of the steps exchanged. That is, some or all of the functions may be selected, for example splitting without splicing, splicing without splitting, splitting before splicing, or splicing before splitting.
The invention has been described above by way of example with reference to the accompanying drawing, and its specific implementation is not limited to the manner described. Insubstantial modifications made using the conception and technical scheme of the invention, and direct application of that conception and technical scheme to other occasions without modification, all fall within the protection scope of the invention.

Claims (6)

1. A PDF page based splitting and page-splicing method is characterized in that a built-in splitting and page-splicing mode is used for splitting and page-splicing, and the page-splicing mode of the PDF page comprises PDF page reading, image conversion, gray color discrimination, page separation, merging and splicing; the method is implemented by the following steps:
step 1, reading a PDF file, acquiring all pages and converting the pages into image data;
step 2, carrying out channel separation and pixel ratio detection on the image data obtained in the step 1, including calculating the ratio of color pixels on the whole image and the deviation value of the color pixels from the standard gray image, and carrying out gray color judgment on the page;
step 3, combining the gray color judgment result obtained in the step 2, and separating and recombining the page according to the built-in printing mode and page splicing mode options;
step 4, passing the recombined pages obtained in step 3 to a writer, which outputs them to complete splitting and page splicing.
2. The method for splitting and splicing PDF pages according to claim 1, wherein the step 1 is implemented according to the following method:
the open method and the getPixmap method of the Python PyMuPDF library are used: the open method reads the PDF file and establishes a list with each page as an object; the getPixmap method converts each single page into image data.
3. The method for splitting and splicing PDF pages according to claim 1, wherein the step 2 is implemented according to the following method:
the gray color discrimination algorithm mainly comprises two parts, namely:
(1) calculating the total number of the color pixels; namely the number of color pixel points in the image;
(2) calculating the color pixel ratio; i.e. the ratio in total number of pixels;
the gray color discrimination algorithm is adopted to calculate the number of color pixel points in each image and the ratio of the color pixel points to the total pixels, and the calculation method is expressed as follows:
Figure FDA0003281701840000021
Figure FDA0003281701840000022
Figure FDA0003281701840000023
in the above formulas, R, G and B represent the pixel matrices of the three channels, m and n represent the dimensions of the matrices, X represents the corresponding gray matrix, x represents the total number of color pixels, and t represents the proportion of color pixels in the image;
x and t, i.e. the total number of color pixels and the color pixel ratio, are then evaluated as follows:
Figure FDA0003281701840000024
Figure FDA0003281701840000025
where i_j represents the discrimination result under a single criterion j; λ_j denotes the corresponding threshold, with the color pixel count threshold λ_x set to 100 and the color ratio threshold λ_t set to 3.8%;
i represents the binarized weighted average of the two results, which serves as the final judgment for the current page; the judgment results form a list T of the form [0,1,0,1,0,0,0,0,1,0, ...], in which each element corresponds one-to-one to a page number and whose length is the total number of pages; 0 indicates that the current page is a gray page and 1 indicates that it is a color page.
4. The method for splitting and splicing PDF pages according to claim 1, wherein the step 3 is implemented according to the following method:
the gray/color discrimination result obtained in step 2 is further combined with the built-in printing mode and the built-in page-splicing mode, specifically:
built-in printing mode: based on the front and back sides of the paper, two options, 'single-sided printing' and 'double-sided printing', are provided, and one of them is selected;
built-in page-splicing mode: single-page splicing and copy splicing are provided; the file types finally output comprise independent splicing and full splicing;
single-page splicing and copy splicing have in common that two pages are spliced horizontally into one page; they differ in the order of the pages; single-page splicing divides all pages into a front half and a rear half in page-number order, then takes one page from each half in sequence and splices them horizontally to obtain a new page; copy splicing duplicates the original page and splices the copy horizontally with the original to obtain a new page;
independent splicing and full splicing: refinement options for single-page splicing and copy splicing; independent splicing splices only one file and leaves the other unspliced, i.e. the gray pages and the color pages are spliced independently; full splicing makes no such distinction;
if a page to be spliced is in landscape orientation, its rotation attribute is changed automatically so that the page becomes portrait;
adding blank pages: the printing mode and the page-splicing mode can be combined freely, and different combinations place different requirements on the number of pages; if the requirement is not met, blank pages are automatically added at the end until the total number of pages meets it.
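The page orders described in this claim can be sketched as follows. This is an illustration under assumptions, not the patented implementation: pages are represented by their 1-based page numbers, the page count is assumed to have already been padded to an even number for single-page splicing, and the function names are hypothetical.

```python
def single_page_pairs(pages):
    """Single-page splicing: split the pages into a front and a rear half in
    page order, then pair the k-th page of each half side by side."""
    if len(pages) % 2:
        raise ValueError("pad with a blank page first: page count must be even")
    half = len(pages) // 2
    return list(zip(pages[:half], pages[half:]))


def copy_pairs(pages):
    """Copy splicing: every page is placed next to a copy of itself."""
    return [(p, p) for p in pages]


def split_by_color(T):
    """Split 1-based page numbers into (gray, color) using the 0/1 list T,
    so that the two groups can be spliced independently."""
    gray = [n for n, flag in enumerate(T, start=1) if flag == 0]
    color = [n for n, flag in enumerate(T, start=1) if flag == 1]
    return gray, color
```

For a six-page document, single-page splicing pairs page 1 with page 4, page 2 with page 5, and page 3 with page 6, whereas copy splicing pairs each page with itself.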
5. The PDF page-based splitting and page-splicing method according to claim 4, wherein the page is adjusted to portrait in step 3 as follows:
the media box and rotation attributes of the PDF page object are read; the media box value contains the page height H and width W, and the rotation value is the page rotation R in degrees, where a positive value indicates clockwise rotation and a negative value indicates anticlockwise rotation; R is generally 0, and if R is not 0, it must be an integer multiple of 90 or 180;
when H is larger than W: if R is 0, no adjustment is made, i.e. the original page is kept unchanged; if R is not 0, R is set to the negative of its original value, i.e. the original page is rotated back by the same angle;
when H is smaller than W: if R is 0, R is set to 90, i.e. the original page is rotated 90 degrees clockwise; if R is not 0, R is set to the negative of its original value plus 90, i.e. the original page is rotated back by the same angle and then rotated 90 degrees clockwise.
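The rule of this claim can be transcribed literally as a small function. The name `adjusted_rotation` is hypothetical, the result is not normalized modulo 360, and the case H equal to W is not covered by the claim, so leaving R unchanged there is an assumption.

```python
def adjusted_rotation(h, w, r):
    """Return the new rotation value R for a page of height h, width w and
    current rotation r (degrees, positive = clockwise), following the claim."""
    if h > w:
        # Taller than wide: keep R = 0, otherwise reverse the existing rotation.
        return 0 if r == 0 else -r
    if h < w:
        # Wider than tall: rotate 90 degrees clockwise, after reversing any
        # existing rotation.
        return 90 if r == 0 else -r + 90
    # Square page: not specified in the claim; left unchanged (assumption).
    return r
```

For example, an A4 page stored as 595 x 842 points (H = 595, W = 842 in the claim's terms) with R = 0 gets R = 90.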
6. The PDF page-based splitting and page-splicing method according to claim 4,
wherein the blank pages are added in step 3 as follows:
blank pages are generated with the canvas module of the Python reportlab library, with the same size and orientation as the pages of the original document; when the number of pages of the document does not meet the requirements of the printing and page-splicing modes, blank pages are automatically added at the end;
when the printing mode is single-sided and no page splicing is used, no blank page needs to be added;
when the printing mode is single-sided and the page-splicing mode is copy splicing, no blank page needs to be added;
when the printing mode is single-sided and the page-splicing mode is single-page splicing, one blank page is added if the total number of pages is odd, and none if it is even;
when the printing mode is double-sided and no page splicing is used, one blank page is added if the total number of pages is odd, and none if it is even;
when the printing mode is double-sided and the page-splicing mode is copy splicing, one blank page is added if the total number of pages is odd, and none if it is even;
when the printing mode is double-sided and the page-splicing mode is single-page splicing, the total number of pages must be a multiple of four; if it is not, blank pages are added until the total reaches the next multiple of four.
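The six cases above reduce to a short lookup. The sketch below only computes how many blank pages to append; generating the blank pages themselves is left out. The function name and the mode strings "none", "copy" and "single" are hypothetical labels chosen for this illustration.

```python
def blank_pages_needed(total_pages, duplex, splicing):
    """Return the number of blank pages to append at the end.

    duplex   -- True for double-sided printing, False for single-sided
    splicing -- "none", "copy" (copy splicing) or "single" (single-page splicing)
    """
    if splicing not in ("none", "copy", "single"):
        raise ValueError("unknown splicing mode: %r" % (splicing,))
    if not duplex:
        # Single-sided: only single-page splicing needs an even page count.
        return total_pages % 2 if splicing == "single" else 0
    if splicing == "single":
        # Double-sided single-page splicing: pad up to the next multiple of four.
        return (-total_pages) % 4
    # Double-sided, no splicing or copy splicing: pad to an even page count.
    return total_pages % 2
```

For a five-page document, double-sided single-page splicing requires three blank pages to reach eight pages, while every other combination requires at most one.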
CN202111139740.XA 2021-09-27 2021-09-27 PDF page-based splitting and page-splicing method Active CN113867654B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111139740.XA CN113867654B (en) 2021-09-27 2021-09-27 PDF page-based splitting and page-splicing method

Publications (2)

Publication Number Publication Date
CN113867654A true CN113867654A (en) 2021-12-31
CN113867654B CN113867654B (en) 2024-03-08

Family

ID=78991502

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111139740.XA Active CN113867654B (en) 2021-09-27 2021-09-27 PDF page-based splitting and page-splicing method

Country Status (1)

Country Link
CN (1) CN113867654B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115139670A (en) * 2022-07-08 2022-10-04 广东阿诺捷喷墨科技有限公司 Inkjet printing method and system based on single pass inkjet data processing

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20070229881A1 (en) * 2006-03-31 2007-10-04 Konica Minolta Systems Laboratory, Inc. Method for printing mixed color and black and white documents
JP2011055131A (en) * 2009-08-31 2011-03-17 Kyocera Mita Corp Image forming apparatus
CN103942187A (en) * 2013-01-18 2014-07-23 北大方正集团有限公司 Page makeup method and device
CN107133000A (en) * 2017-04-27 2017-09-05 上海电机学院 Cross-platform document color analysis and printing interlock method, storage device and terminal
US20200310722A1 (en) * 2019-03-29 2020-10-01 Kyocera Document Solutions Inc. Printing using Multiple Printing Devices
CN112633116A (en) * 2020-12-17 2021-04-09 西安理工大学 Method for intelligently analyzing PDF (Portable document Format) image-text

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115139670A (en) * 2022-07-08 2022-10-04 广东阿诺捷喷墨科技有限公司 Inkjet printing method and system based on single pass inkjet data processing
CN115139670B (en) * 2022-07-08 2024-01-30 广东阿诺捷喷墨科技有限公司 Inkjet printing method and system based on single pass inkjet data processing

Also Published As

Publication number Publication date
CN113867654B (en) 2024-03-08

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant