CN113867654A - PDF page-based splitting and page-splicing method - Google Patents
- Publication number
- CN113867654A (application number CN202111139740.XA)
- Authority
- CN
- China
- Prior art keywords
- page
- splicing
- pages
- color
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/12—Digital output to print unit, e.g. line printer, chain printer
- G06F3/1201—Dedicated interfaces to print systems
- G06F3/1202—Dedicated interfaces to print systems specifically adapted to achieve a particular effect
- G06F3/1203—Improving or facilitating administration, e.g. print management
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/12—Digital output to print unit, e.g. line printer, chain printer
- G06F3/1201—Dedicated interfaces to print systems
- G06F3/1223—Dedicated interfaces to print systems specifically adapted to use a particular technique
- G06F3/1237—Print job management
- G06F3/1242—Image or content composition onto a page
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/12—Digital output to print unit, e.g. line printer, chain printer
- G06F3/1201—Dedicated interfaces to print systems
- G06F3/1223—Dedicated interfaces to print systems specifically adapted to use a particular technique
- G06F3/1237—Print job management
- G06F3/1244—Job translation or job parsing, e.g. page banding
- G06F3/1248—Job translation or job parsing, e.g. page banding by printer language recognition, e.g. PDL, PCL, PDF
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/12—Digital output to print unit, e.g. line printer, chain printer
- G06F3/1201—Dedicated interfaces to print systems
- G06F3/1223—Dedicated interfaces to print systems specifically adapted to use a particular technique
- G06F3/1237—Print job management
- G06F3/125—Page layout or assigning input pages onto output media, e.g. imposition
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/0002—Inspection of images, e.g. flaw detection
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/90—Determination of colour characteristics
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/30—Subject of image; Context of image processing
- G06T2207/30176—Document
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Human Computer Interaction (AREA)
- General Engineering & Computer Science (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Quality & Reliability (AREA)
- Record Information Processing For Printing (AREA)
Abstract
The invention discloses a PDF page-based splitting and page-splicing method. The method first reads a PDF file and converts the page data into image data, then performs channel separation and pixel-ratio detection on the image data, which comprises calculating the proportion of color pixels in the whole image and the deviation of the color pixels from a standard gray image, and classifies each page as gray or color. The pages are then separated and recombined according to the gray/color classification result and the built-in printing-mode and page-splicing-mode options. The invention uses image analysis and processing techniques to extract and classify PDF pages, and splits and splices the color pages and black-and-white pages of a PDF according to the output mode, thereby improving efficiency, ensuring quality and reducing cost.
Description
Technical Field
The invention belongs to the technical field of image analysis and data processing, and relates to a PDF page-based splitting and page-splicing method.
Background
With the spread of the internet and digital technology, the digital publishing industry has entered a fast track and is favored in the quick-print field and the short-run printing market. According to a 2020 survey report on installed digital presses, among the application fields of high-end color digital presses and production digital presses, the largest quick-print production centers, which have high quality requirements, strong production capacity and cost constraints, are by far the main adopters of high-end color digital printing equipment. Publishing printing comes second: with the wide promotion of on-demand publishing and printing in the publishing industry, many printing enterprises and digital printing enterprises use sheet-fed high-end color digital presses to produce on-demand publishing orders, and applications in publishing printing are mainly covers.
At present, most original documents are stored and archived in PDF format. PDF (Portable Document Format) files are well suited to the review, publishing and archiving stages. Their main advantages are high-fidelity content rendering, multi-platform and multimedia support, strong interactivity, high security and support for signatures, and they can package both unstructured and structured data. Software developed for the prepress digital workflow of such documents includes, abroad, Preps (launched by Kodak), Prinergy Evo and Apogee, and, domestically, the workflow and layout products of Peking University Founder.
It should be noted that, first, such workflow software or systems focus on the unified, standardized processing and imposition of PDFs for printing, for example checking whether the image resolution meets the printing requirements; whether characters or elements are missing or garbled; whether single-color black (gray) or four-color black (gray) is used; and whether typesetting and imposition follow the printing requirements. Second, the deployment and operation of such software may not be convenient or friendly enough for the quick-print field, where turnaround time is critical.
Digital printers are classified into monochrome printers (monochrome machines) and color printers (color machines) according to how they produce color. For the same sheet size, the printing cost of a monochrome machine is about one tenth of that of a color machine. In actual production, an original usually contains both color and black-and-white pages. Most existing digital printing workflows cannot separate the color part from the black-and-white part automatically: printing everything on a color machine is expensive, printing everything in black and white fails to meet actual requirements, and manual screening and separation is time-consuming and laborious. The invention provides a PDF page-based splitting and page-splicing method that addresses the cost problem caused by color pages and black-and-white pages not being printed separately, and controls cost while keeping the look and feel of the product unchanged.
Disclosure of Invention
The invention aims to provide a PDF page-based splitting and page-splicing method that can classify the pages of a PDF file and separate color pages from gray pages; the pages can also be recombined according to a built-in printing mode and page-splicing mode.
The technical scheme adopted by the invention is a PDF page-based splitting and page-splicing method,
in which splitting and page splicing are performed using built-in splitting and page-splicing modes, the processing of the PDF pages comprising PDF page reading, image conversion, gray/color discrimination, page separation, merging and splicing; the method is implemented by the following steps:
step 1, reading a PDF file, obtaining all pages and converting them into image data;
step 2, performing channel separation and pixel-ratio detection on the image data obtained in step 1, including calculating the proportion of color pixels in the whole image and the deviation of the color pixels from a standard gray image, and classifying each page as gray or color;
step 3, separating and recombining the pages according to the gray/color classification result obtained in step 2 and the built-in printing-mode and page-splicing-mode options;
step 4, passing the recombined pages obtained in step 3 to a writer, which outputs them to complete the splitting and page splicing.
The present invention is further characterized in that
step 1 is implemented as follows:
the open method and the getPixmap method of the Python PyMuPDF library are used; the open method reads the PDF file and builds a list with pages as objects, and the getPixmap method converts each single page into image data.
Step 2 is implemented as follows:
the gray/color discrimination algorithm mainly comprises two parts:
(1) calculating the total number of color pixels, i.e. the number of color pixels in the image;
(2) calculating the color-pixel ratio, i.e. the proportion of color pixels in the total number of pixels;
the gray/color discrimination algorithm calculates, for each image, the number of color pixels and their proportion of the total pixels; the calculation is expressed as follows:
in the above formula, R, G and B denote the pixel matrices of the three channels, m and n denote the dimensions of the matrices, X denotes the corresponding gray matrix, x denotes the total number of color pixels, and t denotes the proportion of color pixels in the image;
x and t, i.e. the total number of color pixels and the color-pixel ratio, are evaluated as follows:
where i_j denotes the discrimination result under a single criterion j and λ_j denotes the corresponding threshold; the threshold λ_x is set to 100 and the color-ratio threshold λ_t is set to 3.8%;
I denotes the binarized weighted average of the two results, which serves as the final result for the current page; the results form a list T of the general form [0,1,0,1,0,0,0,0,1,0,...], in which each element corresponds to one page, the length equals the total page count, 0 indicates that the page is a gray page, and 1 indicates that the page is a color page.
Step 3 is implemented as follows:
the gray/color discrimination result obtained in step 2 is combined with the built-in printing mode and page-splicing mode to recombine the pages, specifically:
built-in printing mode: two options, "single-sided printing" and "double-sided printing", are provided based on the front and back of the paper, and one of them is selected;
built-in page-splicing mode: "single-page splicing" and "copy splicing" are provided; for the types of files finally output, "individual splicing" and "full splicing" are provided;
single-page splicing and copy splicing both splice two pages side by side into one page and differ in the order of the pages; single-page splicing divides all pages into a front half and a rear half in page order and takes one page from each half in turn, splicing them side by side to obtain a new page; copy splicing duplicates the original page and splices the copy side by side with the original to obtain a new page;
individual splicing and full splicing: refinement options for single-page splicing and copy splicing; individual splicing splices only one of the output files and leaves the other unspliced, i.e. only the gray pages or only the color pages are spliced; full splicing makes no such distinction;
if a page to be spliced is landscape, its rotation attribute is changed automatically so that the page becomes portrait;
adding blank pages: the printing mode and the page-splicing mode can be combined freely, and different combinations place different requirements on the page count; if the page count does not meet the requirement, blank pages are automatically appended at the end until it does.
In step 3, a page is adjusted to portrait as follows:
the values of the media box and rotation attributes of the PDF page object are obtained; the media box value contains the height H and width W of the page, and the rotation value is the rotation angle R of the page, where a positive value indicates clockwise rotation and a negative value indicates counterclockwise rotation; R is usually 0, and if R is not 0 it must be an integer multiple of 90 (such as 90 or 180);
when H is greater than W: if R is 0, no adjustment is made and the original page is kept unchanged; if R is not 0, R is set to the negative of its original value, i.e. the original page is rotated back by the same angle;
when H is less than W: if R is 0, R is set to 90, i.e. the original page is rotated 90 degrees clockwise; if R is not 0, R is set to the negative of its original value plus 90, i.e. the original page is rotated back by the same angle and then rotated 90 degrees clockwise.
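The portrait rule above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, a square page (H equal to W) is treated like a portrait page because the text does not cover it, and "rotated back by the same angle" is read as an adjustment that is added to the existing rotation.

```python
def rotation_adjustment(height, width, rotation):
    """Return the angle (degrees, clockwise positive) by which the page is turned.

    Follows the four cases in the text. Treating height == width like a
    portrait page is an assumption; the text only covers H > W and H < W.
    """
    if rotation % 90 != 0:
        raise ValueError("rotation must be an integer multiple of 90")
    if height >= width:
        # Portrait media box: undo any existing rotation (0 stays 0).
        return -rotation
    # Landscape media box: undo the existing rotation, then turn 90 clockwise.
    return -rotation + 90


def adjusted_rotation(height, width, rotation):
    """Resulting rotation attribute, assuming the adjustment is added to R."""
    return (rotation + rotation_adjustment(height, width, rotation)) % 360
```

Under this reading, a portrait media box always ends with a rotation of 0 and a landscape media box with a rotation of 90, whatever the original value of R.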
In step 3, blank pages are added as follows:
blank pages are generated with the canvas of the Python reportlab library, with the same size and orientation as the pages of the original document; when the page count of the document does not meet the printing and page-splicing requirements, blank pages are automatically appended at the end;
when the printing mode is single-sided and no page splicing is performed, no blank page is needed;
when the printing mode is single-sided and the page-splicing mode is copy splicing, no blank page is needed;
when the printing mode is single-sided and the page-splicing mode is single-page splicing, one blank page is added if the total page count is odd, and none if it is even;
when the printing mode is double-sided and no page splicing is performed, one blank page is added if the total page count is odd, and none if it is even;
when the printing mode is double-sided and the page-splicing mode is copy splicing, one blank page is added if the total page count is odd, and none if it is even;
when the printing mode is double-sided and the page-splicing mode is single-page splicing, the total page count must be a multiple of four; if it is not, blank pages are added until it reaches the next multiple of four.
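The six combinations above reduce to padding the page count to a multiple of 1, 2 or 4. A minimal Python sketch follows; the function name and the mode strings "none", "copy" and "single" are hypothetical labels chosen for illustration, not names from the source.

```python
def blank_pages_needed(total_pages, duplex, splice):
    """Number of blank pages to append for a printing/splicing combination.

    duplex: True for double-sided printing, False for single-sided.
    splice: "none", "copy" (copy splicing) or "single" (single-page splicing).
    """
    if splice not in ("none", "copy", "single"):
        raise ValueError("unknown page-splicing mode: %r" % (splice,))
    if duplex and splice == "single":
        multiple = 4  # two pages per side, two sides per sheet
    elif duplex or splice == "single":
        multiple = 2  # the page count must be even
    else:
        multiple = 1  # single-sided with no splicing or copy splicing
    # Distance from total_pages up to the next multiple (0 if already there).
    return (-total_pages) % multiple
```

For example, a 5-page document printed double-sided with single-page splicing needs 3 blank pages to reach 8.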
The beneficial effects of the invention are as follows:
(1) The splitting algorithm of the invention is based not on page-number input but on page content, i.e. gray and color pages are split and merged. The gray/color discrimination algorithm is robust and stable, which ensures the splitting result; the whole process is completed automatically, which removes the need to screen and separate pages manually.
(2) The invention also provides a page-splicing algorithm that generates, according to the total page count, the printing mode and the page-splicing mode, a spliced file suitable for direct printing.
(3) For the quick-print and short-run publishing printing markets, the invention provides a fast and convenient optimized workflow from the standardization of PDF (Portable Document Format) files to black-and-white and color printing output; in quick-print or print production the method helps to improve efficiency, save time and reduce cost.
Drawings
Fig. 1 is a detailed flowchart of a method for splitting and splicing a PDF page according to the present invention.
Detailed Description
The invention is described in detail below with reference to the figures and specific embodiments. All the above functions can be combined in any form to form an alternative embodiment of the present disclosure, and are not described in detail herein.
The terms "step 1", "step 2", "step 3" and the like in the description and claims of this application, and in the foregoing and following text, are used to describe a relatively complete embodiment of the invention that covers all functional flows, and do not necessarily describe a particular order. It will be appreciated that some of the operations described may be performed separately or in an interchangeable order where appropriate, so that the embodiments described herein can be carried out in an order other than the one described.
The technical scheme adopted by the invention is a PDF page-based splitting and page-splicing method which, as shown in Fig. 1, is implemented according to the following steps:
Step 1: the pages of the PDF file are parsed and converted into image data:
parsing and conversion mainly use the open method and the getPixmap method of the Python PyMuPDF library; the open method reads the PDF file and builds a list with pages as objects; the getPixmap method converts a single page into image data and supports operations such as scaling, rotation and region cropping of the converted image.
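Step 1 might look roughly like the following Python sketch. It is an illustration rather than the patented implementation: the helper names `split_channels` and `render_pages` and the `zoom` parameter are hypothetical, and the page method is written as `get_pixmap`, the name current PyMuPDF releases use for what the text calls getPixmap.

```python
def split_channels(samples, width, height, n=3, stride=None):
    """Split interleaved pixel bytes into R, G and B matrices (lists of rows)."""
    stride = width * n if stride is None else stride
    R, G, B = [], [], []
    for row in range(height):
        base = row * stride  # offset of the first byte of this pixel row
        R.append([samples[base + col * n] for col in range(width)])
        G.append([samples[base + col * n + 1] for col in range(width)])
        B.append([samples[base + col * n + 2] for col in range(width)])
    return R, G, B


def render_pages(pdf_path, zoom=1.0):
    """Render every page to RGB and return one (R, G, B) tuple per page."""
    import fitz  # PyMuPDF; imported lazily so split_channels works without it

    images = []
    doc = fitz.open(pdf_path)  # the document iterates over its pages
    try:
        for page in doc:
            pix = page.get_pixmap(matrix=fitz.Matrix(zoom, zoom),
                                  colorspace=fitz.csRGB, alpha=False)
            images.append(split_channels(pix.samples, pix.width, pix.height,
                                         pix.n, pix.stride))
    finally:
        doc.close()
    return images
```

Pure-Python channel splitting is slow on full-resolution pages; it is used here only to keep the sketch free of further dependencies.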
Step 2: gray/color discrimination is performed on the image data obtained in step 1 to obtain a classification result:
the gray/color discrimination algorithm is based on whether the values at corresponding positions of the R, G and B channel pixel matrices are equal: if they are not equal, the pixel is a color pixel. In practice, a pixel still appears gray when its three channel values differ only slightly. The algorithm therefore comprises two parts:
(1) calculating the total number of color pixels, i.e. the number of color pixels in the image;
(2) calculating the color-pixel ratio, i.e. the proportion of color pixels in the total number of pixels.
The specific calculation formula is as follows:
in the above formula, R, G and B denote the pixel matrices of the three channels, m and n denote the dimensions of the matrices, X denotes the corresponding gray matrix, x denotes the total number of color pixels, and t denotes the proportion of color pixels in the image.
x and t, i.e. the total number of color pixels and the color-pixel ratio, are evaluated as follows:
where i_j denotes the binarized result of a single criterion j and λ_j denotes the corresponding threshold; the thresholds differ between criteria: the criterion on the total number of color pixels is the stricter one, and its threshold λ_x was set to 100 on the basis of experiments; the color ratio is the looser criterion, and its threshold λ_t is 3.8%. To keep the gray/color discrimination algorithm effective, the thresholds are not limited to these values and may be changed under certain conditions.
I denotes the binarized weighted average of the two results, which serves as the final result for the page. The calculation is performed on each page to obtain the results for all pages. The results form a list T of the general form [0,1,0,1,0,0,0,0,1,0,...], in which each element corresponds to one page, the length equals the total page count, 0 indicates that the page is a gray page, and 1 indicates that the page is a color page.
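Since the formulas themselves are not reproduced in this text, the following Python sketch only illustrates the two-criterion idea under explicit assumptions. The gray value is taken as the mean of the three channels, a pixel counts as color when any channel deviates from that mean by more than a tolerance `tol`, and the two binarized results are combined with equal weights and a cut of 0.5. The tolerance, the weights, the cut and the function names are all hypothetical; only the thresholds 100 and 3.8% come from the text.

```python
def classify_page(R, G, B, lam_x=100, lam_t=0.038,
                  tol=10, weights=(0.5, 0.5), cut=0.5):
    """Return 1 for a color page and 0 for a gray page.

    R, G, B are m-by-n matrices (lists of rows) of channel values.
    tol, weights and cut are assumptions made for this sketch.
    """
    m, n = len(R), len(R[0])
    x = 0  # total number of color pixels
    for i in range(m):
        for j in range(n):
            r, g, b = R[i][j], G[i][j], B[i][j]
            gray = (r + g + b) / 3  # assumed gray value of the pixel
            if max(abs(r - gray), abs(g - gray), abs(b - gray)) > tol:
                x += 1
    t = x / (m * n)  # proportion of color pixels
    i_x = 1 if x > lam_x else 0  # criterion on the pixel count
    i_t = 1 if t > lam_t else 0  # criterion on the pixel ratio
    score = weights[0] * i_x + weights[1] * i_t
    return 1 if score >= cut else 0


def classify_pages(images):
    """Build the list T from an iterable of (R, G, B) tuples, one per page."""
    return [classify_page(R, G, B) for R, G, B in images]
```

With equal weights and a cut of 0.5, a page is flagged as color as soon as either criterion fires; raising the cut above 0.5 would require both.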
Step 3: the pages are recombined according to the classification result obtained in step 2 and the built-in printing mode and page-splicing mode, specifically:
the built-in printing mode and page-splicing mode are provided to better match the characteristics of actual print output; the printing mode is selectable, with "single-sided printing" and "double-sided printing" offered based on the front and back of the paper; the page-splicing mode is selectable, with "single-page splicing" and "copy splicing" offered; in addition, "individual splicing" and "full splicing" are offered for the types of files finally output.
Both "single-page splicing" and "copy splicing" splice two pages side by side into one page; they differ in the order of the pages. Single-page splicing divides the pages into a front half and a rear half in page order and takes one page from each half in turn, splicing them side by side into a new page; copy splicing duplicates the original page and splices the original and the copy side by side into a new page.
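The two page orders can be illustrated with a short Python sketch that pairs up page identifiers. The function names are hypothetical, and placing the front-half page on the left of each pair is an assumption; the text does not say which side each page goes on.

```python
def single_page_pairs(pages):
    """Pair the i-th page of the front half with the i-th page of the rear half."""
    if len(pages) % 2 != 0:
        raise ValueError("pad with a blank page first: page count must be even")
    half = len(pages) // 2
    # (left, right) tuples; left comes from the front half (assumed).
    return list(zip(pages[:half], pages[half:]))


def copy_pairs(pages):
    """Pair every page with a copy of itself."""
    return [(page, page) for page in pages]
```

For pages 1 to 4, single-page splicing yields the spreads (1, 3) and (2, 4), while copy splicing yields (1, 1), (2, 2), (3, 3) and (4, 4).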
Individual splicing and full splicing are designed around the output requirements. After processing, the output generally consists of two documents, one for gray printing and one for color printing; "individual splicing" splices only one of the documents and leaves the other unspliced, for example only the gray pages or only the color pages are spliced; "full splicing" makes no such distinction.
Portrait conversion: a PDF page object contains a media box (/MediaBox) that describes the size of the page (height and width) and a rotation attribute (/Rotate) that determines the display orientation; together they determine whether the page is presented in landscape or portrait. Page splicing is horizontal and requires portrait pages; if a page is landscape, its rotation attribute is changed automatically so that the page becomes portrait.
Blank page addition: the printing mode and the page-splicing mode can be combined freely, and different combinations place different requirements on the page count. Double-sided printing requires an even number of original pages; if the number is odd, one blank page is automatically appended at the end to make it even. When double-sided printing is combined with single-page splicing, the total number of original pages must be a multiple of 4; if it is not, blank pages are automatically appended at the end until the page count reaches the next multiple of 4.
Blank pages are generated with the canvas of the Python reportlab library, with the same size and orientation as the pages of the original document; when the page count of the document does not meet the printing and page-splicing requirements, blank pages are automatically appended at the end.
Step 4: the split and spliced pages obtained in step 3 are saved and output, specifically:
PDF files are written mainly with the writer (PdfWriter) provided by the Python pdfrw library. The pages recombined in step 3 are added to separate writers, which output the color-printing and gray-printing documents, completing the splitting and page splicing.
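The output step could be sketched as follows, assuming the list T from step 2 and leaving out page splicing. The function names and the output paths are hypothetical; `PdfReader`, `PdfWriter`, `addpage` and `write` are the pdfrw calls used.

```python
def split_indices(T):
    """Return (gray_indices, color_indices) from the 0/1 list T."""
    gray = [i for i, flag in enumerate(T) if flag == 0]
    color = [i for i, flag in enumerate(T) if flag == 1]
    return gray, color


def write_split(src_path, T, gray_path, color_path):
    """Write the gray pages and the color pages of src_path to two PDF files."""
    from pdfrw import PdfReader, PdfWriter  # imported lazily; needs pdfrw

    pages = PdfReader(src_path).pages
    if len(pages) != len(T):
        raise ValueError("T must contain one entry per page")
    gray, color = split_indices(T)
    for indices, out_path in ((gray, gray_path), (color, color_path)):
        if not indices:
            continue  # nothing to write for this kind of page
        writer = PdfWriter()
        for i in indices:
            writer.addpage(pages[i])
        writer.write(out_path)
```

Using one writer per output file mirrors the description of adding the recombined pages to separate writers.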
It should be noted that the embodiment above only illustrates the PDF page splitting and page-splicing method by combining the functions in a particular order under one set of conditions; in practical applications the functions may be used independently or with the order of the steps exchanged. That is, some or all of the functions may be selected, for example splitting without page splicing, page splicing without splitting, splitting before page splicing, or page splicing before splitting.
The invention has been described above by way of example with reference to the accompanying drawing, and its specific implementation is not limited to the manner described. Various insubstantial modifications made using the concept and technical scheme of the invention, or the direct application of the concept and technical scheme to other occasions without modification, fall within the protection scope of the invention.
Claims (6)
1. A PDF page-based splitting and page-splicing method, characterized in that splitting and page splicing are performed using built-in splitting and page-splicing modes, the processing of the PDF pages comprising PDF page reading, image conversion, gray/color discrimination, page separation, merging and splicing; the method is implemented by the following steps:
step 1, reading a PDF file, obtaining all pages and converting them into image data;
step 2, performing channel separation and pixel-ratio detection on the image data obtained in step 1, including calculating the proportion of color pixels in the whole image and the deviation of the color pixels from a standard gray image, and classifying each page as gray or color;
step 3, separating and recombining the pages according to the gray/color classification result obtained in step 2 and the built-in printing-mode and page-splicing-mode options;
step 4, passing the recombined pages obtained in step 3 to a writer, which outputs them to complete the splitting and page splicing.
2. The PDF page-based splitting and page-splicing method according to claim 1, characterized in that step 1 is implemented as follows:
the open method and the getPixmap method of the Python PyMuPDF library are used; the open method reads the PDF file and builds a list with pages as objects, and the getPixmap method converts each single page into image data.
3. The PDF page-based splitting and page-splicing method according to claim 1, characterized in that step 2 is implemented as follows:
the gray/color discrimination algorithm mainly comprises two parts:
(1) calculating the total number of color pixels, i.e. the number of color pixels in the image;
(2) calculating the color-pixel ratio, i.e. the proportion of color pixels in the total number of pixels;
the gray/color discrimination algorithm calculates, for each image, the number of color pixels and their proportion of the total pixels; the calculation is expressed as follows:
in the above formula, R, G and B denote the pixel matrices of the three channels, m and n denote the dimensions of the matrices, X denotes the corresponding gray matrix, x denotes the total number of color pixels, and t denotes the proportion of color pixels in the image;
x and t, i.e. the total number of color pixels and the color-pixel ratio, are evaluated as follows:
where i_j denotes the discrimination result under a single criterion j and λ_j denotes the corresponding threshold; the threshold λ_x is set to 100 and the color-ratio threshold λ_t is set to 3.8%;
I denotes the binarized weighted average of the two results, which serves as the final result for the current page; the results form a list T of the form [0,1,0,1,0,0,0,0,1,0,...], in which each element corresponds to one page, the length equals the total page count, 0 indicates that the page is a gray page, and 1 indicates that the page is a color page.
4. The method for splitting and splicing PDF pages according to claim 1, wherein the step 3 is implemented according to the following method:
the gray/color discrimination result obtained in step 2 is further recombined with the built-in printing mode and the built-in page splicing mode, the specific contents comprising:
built-in printing mode: providing two options, 'single-sided printing' and 'double-sided printing', based on the front and back sides of the paper, of which one is selected;
built-in page splicing mode: providing single-page splicing and copy splicing; the file types finally output comprise independent splicing and full splicing;
single-page splicing and copy splicing share the feature that two pages are spliced side by side into one page, and differ in the order of the pages; single-page splicing divides all pages into a front half and a rear half in page-number order and takes one page from each half in sequence, splicing the two side by side to obtain a new page; copy splicing copies the original page and then splices the copy side by side with the original page to obtain a new page;
independent splicing and full splicing: refinement options for single-page splicing and copy splicing; independent splicing splices the pages of one file only, without mixing in the other file, i.e. the gray pages and the color pages are spliced separately; full splicing makes no such distinction;
during page splicing, if a page is in landscape (horizontal) orientation, its rotation attribute is automatically changed so that the page is adjusted to portrait (vertical) orientation;
adding blank pages: the printing mode and the page splicing mode can be freely combined, and different combinations place different requirements on the page count; if the requirement is not met, blank pages are automatically added at the end until the total page count meets it.
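The page ordering for the two splicing modes, and the gray/color separation that independent splicing relies on, can be sketched on page indices alone. This is one reading of the claim rather than a confirmed implementation: the function names are hypothetical, each pair is taken to mean (left page, right page), and single-page splicing is assumed to receive an even page count because blank pages have already been appended.

```python
from typing import List, Sequence, Tuple


def single_page_pairs(pages: Sequence[int]) -> List[Tuple[int, int]]:
    """Pair the k-th page of the front half with the k-th page of the rear half."""
    if len(pages) % 2:
        raise ValueError("pad with a blank page first: page count must be even")
    half = len(pages) // 2
    return list(zip(pages[:half], pages[half:]))


def copy_pairs(pages: Sequence[int]) -> List[Tuple[int, int]]:
    """Pair every page with a copy of itself."""
    return [(page, page) for page in pages]


def split_by_color(flags: Sequence[int]) -> Tuple[List[int], List[int]]:
    """Split page indices into (gray, color) using the discrimination list T."""
    gray = [i for i, flag in enumerate(flags) if flag == 0]
    color = [i for i, flag in enumerate(flags) if flag == 1]
    return gray, color
```

Under this reading, independent splicing would call `split_by_color` first and then apply one of the pairing functions to each group separately, while full splicing would apply the pairing function to the whole page list.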
5. The PDF page-based splitting and page-splicing method according to claim 4, wherein the method for adjusting a page to portrait (vertical) orientation in the step 3 comprises the following steps:
acquiring the media box and rotation attribute values of the PDF page object, wherein the media box attribute value contains the height H and the width W of the page, and the rotation attribute value is the rotation angle R of the page in degrees, a positive value indicating that the page is rotated clockwise and a negative value indicating that the page is rotated anticlockwise; R is generally 0, and if R is not 0, R must be an integer multiple of 90, such as 90 or 180;
when H is larger than W: if R is 0, no adjustment is made, i.e. the original page is kept unchanged; if R is not 0, R is set to the negative of its original value, i.e. the original page is rotated back by the same angle;
when H is smaller than W: if R is 0, R is set to 90, i.e. the original page is rotated 90 degrees clockwise; if R is not 0, R is set to the negative of its original value plus 90, i.e. the original page is rotated back by the same angle and then rotated 90 degrees clockwise.
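The rotation rules above can be written as one small function. This sketch follows the wording of the claim literally and treats the returned value as the new rotation attribute. The function name is hypothetical, and the H = W case, which the claim does not cover, is handled like the H > W case as an assumption.

```python
def adjusted_rotation(height: float, width: float, rotation: int) -> int:
    """Return the new rotation value R for a page with the given H, W and R."""
    if rotation % 90 != 0:
        raise ValueError("rotation must be an integer multiple of 90")
    if height < width:
        # Landscape page: rotate 90 degrees clockwise, undoing any existing rotation.
        return 90 if rotation == 0 else -rotation + 90
    # Portrait page (square pages are treated the same way, as an assumption).
    return 0 if rotation == 0 else -rotation
```

For example, a 595 x 842 point page (W x H) with R = 0 keeps R = 0, while the same sheet stored as 842 x 595 with R = 0 receives R = 90.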
6. The PDF page-based splitting and page-splicing method according to claim 4, wherein the method for adding the blank pages in the step 3 comprises the following steps:
generating blank pages with the canvas module of the Python reportlab library, wherein the size and orientation of the generated blank pages are the same as those of the original document pages, and when the page count of the document does not meet the requirements of printing and page splicing, the blank pages are automatically added at the end;
when the printing mode is single-sided and no page splicing is used, no blank pages need to be added;
when the printing mode is single-sided and the page splicing mode is copy splicing, no blank pages need to be added;
when the printing mode is single-sided and the page splicing mode is single-page splicing, one blank page is added if the total page count is odd; no blank page is added if the total page count is even;
when the printing mode is double-sided and no page splicing is used, one blank page is added if the total page count is odd; no blank page is added if the total page count is even;
when the printing mode is double-sided and the page splicing mode is copy splicing, one blank page is added if the total page count is odd; no blank page is added if the total page count is even;
when the printing mode is double-sided and the page splicing mode is single-page splicing, the total page count must be a multiple of four; if it is not, blank pages are added until the total page count reaches the next multiple of four.
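The six combinations above reduce to padding the page count to a multiple of 1, 2 or 4. The following sketch encodes that table; the function name and the string keys for the modes are hypothetical labels chosen for illustration.

```python
# Required page-count multiple for each (printing mode, splicing mode) pair,
# transcribed from the six cases listed in the claim.
REQUIRED_MULTIPLE = {
    ("single", "none"): 1,
    ("single", "copy"): 1,
    ("single", "single_page"): 2,
    ("double", "none"): 2,
    ("double", "copy"): 2,
    ("double", "single_page"): 4,
}


def blank_pages_needed(total_pages: int, printing: str, splicing: str) -> int:
    """Return how many blank pages must be appended at the end of the document."""
    if total_pages < 0:
        raise ValueError("total_pages must not be negative")
    multiple = REQUIRED_MULTIPLE[(printing, splicing)]
    # (-n) % m is the distance from n up to the next multiple of m (0 if already one).
    return (-total_pages) % multiple
```

A 6-page document printed double-sided with single-page splicing therefore needs 2 blank pages, bringing the total to 8.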
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111139740.XA CN113867654B (en) | 2021-09-27 | 2021-09-27 | Splitting and page-spelling method based on PDF page |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113867654A (en) | 2021-12-31 |
CN113867654B (en) | 2024-03-08 |
Family
ID=78991502
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202111139740.XA Active CN113867654B (en) | 2021-09-27 | 2021-09-27 | Splitting and page-spelling method based on PDF page |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113867654B (en) |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20070229881A1 (en) * | 2006-03-31 | 2007-10-04 | Konica Minolta Systems Laboratory, Inc. | Method for printing mixed color and black and white documents |
JP2011055131A (en) * | 2009-08-31 | 2011-03-17 | Kyocera Mita Corp | Image forming apparatus |
CN103942187A (en) * | 2013-01-18 | 2014-07-23 | 北大方正集团有限公司 | Page makeup method and device |
CN107133000A (en) * | 2017-04-27 | 2017-09-05 | 上海电机学院 | Cross-platform document color analysis and printing interlock method, storage device and terminal |
US20200310722A1 (en) * | 2019-03-29 | 2020-10-01 | Kyocera Document Solutions Inc. | Printing using Multiple Printing Devices |
CN112633116A (en) * | 2020-12-17 | 2021-04-09 | 西安理工大学 | Method for intelligently analyzing PDF (Portable document Format) image-text |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115139670A (en) * | 2022-07-08 | 2022-10-04 | 广东阿诺捷喷墨科技有限公司 | Inkjet printing method and system based on single pass inkjet data processing |
CN115139670B (en) * | 2022-07-08 | 2024-01-30 | 广东阿诺捷喷墨科技有限公司 | Inkjet printing method and system based on single pass inkjet data processing |
Also Published As
Publication number | Publication date |
---|---|
CN113867654B (en) | 2024-03-08 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||