WO2022139134A1

WO2022139134A1 - Method and device for inspecting digitally-converted content

Info

Publication number: WO2022139134A1
Application number: PCT/KR2021/014028
Authority: WO
Inventors: 박종한
Original assignee: 주식회사 펍플
Priority date: 2020-12-22
Filing date: 2021-10-12
Publication date: 2022-06-30
Also published as: KR20220089872A

Abstract

A method and a device for inspecting digitally-converted content are disclosed. According to one aspect of the present disclosure, provided is a method for inspecting digitally-converted content performed by a digitally-converted content inspection device, comprising: a process of flattening original content composed of at least one layer into a single layer, and generating processed content that is the result of removing, from among text and objects contained in the original content, text and objects obscured by an object included in an upper layer; a pre-inspection process of comparing text recognized from the processed content with text extracted from the original content; a post-inspection process of comparing the original content with digitally-converted content produced on the basis of the original content; and a process of preparing an inspection report on the basis of the comparison result of the pre-inspection process and the post-inspection process.

Description

Digital conversion content inspection method and device

The present disclosure relates to a digital conversion content verification method and apparatus.

The content described in this section merely provides background information on the present invention and does not constitute the prior art.

Digitally converted content refers to content converted by structuring original content according to a predetermined process. For example, digitally converted content in a format such as HTML may be produced from original content in PDF format, which is a format intended for printing.

FIG. 1 is an exemplary view showing a conventional digitally converted content inspection method.

As shown in Figure 1, in order to check whether the digitally converted content is properly produced, in general, the inspector must directly compare the screen on which the original content is displayed and the screen on which the digitally converted content is displayed.

However, this method has a problem in that, in converting a large amount of original content or original content including a large amount of pages to digitally converted content, the efficiency of inspection time and manpower is lowered. Furthermore, there is a limitation in that the accuracy of the inspection is lowered due to the accumulation of mistakes and/or fatigue that may occur according to the long-term work of the inspector.

In particular, even if the inspector uses an inspection interface that allows the same page of the original content and the digitally converted content to be viewed and inspected simultaneously, it is difficult to directly find typos, differences in image color, and similar defects that arise when original content intended for printing is converted into digitally converted content.

The main purpose of the present disclosure is to provide a digitally converted content inspection method and apparatus that can increase the accuracy of digitally converted content production by detecting typos, small objects, hidden data, and image color changes that are difficult for inspectors to check visually.

Furthermore, the present disclosure provides a digitally converted content inspection method and apparatus capable of producing digitally converted content in consideration of cross-browsing, by comparing the screen actually displayed by each browser for the digitally converted content with the original content.

According to an aspect of the present disclosure, there is provided an inspection method performed by a digitally converted content inspection apparatus, the method comprising: a process of flattening original content composed of at least one layer into a single layer and generating processed content that is a result of removing, from among the text and objects included in the original content, text and objects that are hidden by objects included in an upper layer; a pre-inspection process of comparing the text recognized from the processed content with the text extracted from the original content; a post-inspection process of comparing the original content with digitally converted content produced based on the original content; and a process of creating an inspection report based on comparison results of the pre-inspection process and the post-inspection process.

According to another aspect of the present disclosure, there is provided a digitally converted content inspection apparatus comprising: a preprocessor that flattens original content composed of at least one layer into a single layer and generates processed content that is a result of removing, from among the text and objects included in the original content, text and objects obscured by an object included in an upper layer; a pre-inspection unit that compares the text recognized from the processed content with the text extracted from the original content; a post-inspection unit that compares the original content with digitally converted content produced based on the original content; and a learning unit that creates an inspection report based on comparison results of the pre-inspection and the post-inspection.

As described above, according to the exemplary embodiment of the present disclosure, it is possible to increase the accuracy of digitally converted content production by detecting typos, small objects, hidden data, and image color change that are difficult for an inspector to check with the naked eye. Accordingly, inspection time and manpower required for inspection can be minimized, and the shortcomings of visual inspection can be supplemented.

Furthermore, according to an embodiment of the present disclosure, digitally converted content can be produced in consideration of cross-browsing by comparing the screen actually displayed for each browser with the original content for digitally converted content.

Furthermore, according to an embodiment of the present disclosure, it is possible to gradually improve the inspection accuracy and reduce the inspection time through learning. In particular, by increasing the judgment rate for text direction, rotation, and foreign-language exception handling for each language, it is possible to automate the entire process of producing digitally converted content.

FIG. 2 is a block diagram schematically showing an apparatus for inspecting digitally converted content according to an embodiment of the present disclosure.

FIG. 3 is a block diagram schematically showing a pre-inspection unit according to an embodiment of the present disclosure.

FIG. 4 is a block diagram schematically showing a post-inspection unit according to an embodiment of the present disclosure.

FIG. 5 is a flowchart illustrating a digitally converted content inspection method according to an embodiment of the present disclosure.

Hereinafter, some embodiments of the present disclosure will be described in detail with reference to exemplary drawings. In adding reference numerals to the components of each drawing, it should be noted that the same components are given the same reference numerals as much as possible even though they are indicated on different drawings. In addition, in describing the present disclosure, if it is determined that a detailed description of a related known configuration or function may obscure the gist of the present disclosure, the detailed description thereof will be omitted.

In addition, in describing the components of the present disclosure, terms such as first, second, A, B, (a), and (b) may be used. These terms serve only to distinguish one element from another, and the nature, sequence, or order of the elements is not limited by the terms. Throughout the specification, when a part 'includes' or 'comprises' an element, this means that other elements may be further included, rather than excluded, unless otherwise stated. In addition, terms such as '... unit' and 'module' mean a unit that processes at least one function or operation, which may be implemented as hardware, software, or a combination of hardware and software.

As shown in FIG. 2, the digitally converted content inspection device 20 according to an embodiment of the present disclosure includes all or part of an input unit 200, a data extraction unit 210, a preprocessor 220, a pre-inspection unit 230, a conversion unit 240, a post-inspection unit 250, a learning unit 260, and an output unit 270. Not all blocks shown in FIG. 2 are essential components, and in another embodiment, some blocks included in the digitally converted content inspection device 20 may be added, changed, or deleted. For example, according to another embodiment of the present disclosure, the function performed by the conversion unit 240 may be performed by a digital conversion device (not shown), which is a separate stand-alone device that is interlocked with the digitally converted content inspection device 20. In this case, the digitally converted content inspection device 20 may not include the conversion unit 240.

The input unit 200 receives original content to be digitally converted. The original content may include text information and image information, and preferably a file produced in a PDF (Portable Document Format) format, but is not necessarily limited thereto.

The data extraction unit 210 parses the original content and extracts the text, objects, images, and vector data included in the original content, as well as layout information of the original content. The data extraction unit 210 according to an embodiment of the present disclosure may temporarily store the extracted data.

The data extraction unit 210 provides the extracted data to the pre-inspection unit 230 and/or the conversion unit 240.

Meanwhile, hereinafter, the case in which the original content consists of one page will be described as an example, but this is for convenience of description and the original content may include one or more pages. When the original content consists of a plurality of pages, the data extraction unit 210 may separate the original content for each page and then extract data for each page.

The preprocessor 220 flattens the original content and removes unnecessary text and/or objects from the compressed original content.

In the present disclosure, digitally converted content is a composite of the extracted data and is preferably produced based on HTML (HyperText Markup Language). The digitally converted content may include a background image and vector-based text data, and the information included in the digitally converted content is displayed on the screen in an overlapping state based on layers.

Digitally converted content inspection is a technology for contrasting the layer-based digitally converted content and the surface on which the original content is actually displayed, and it is different from the technology of simply comparing two different images.

In addition, conventional character recognition technology and/or computer vision technology alone can inspect only the text and/or objects exposed on the top layer and cannot inspect text and/or objects hidden behind another object. A processing method specific to digitally converted content inspection is therefore required.

Due to this, the preprocessor 220 according to an embodiment of the present disclosure flattens the original content to unify the layers in the original content. Specifically, the preprocessor 220 may unify the layers by compressing the PDF layer system of the original content. The preprocessor 220 removes text and/or objects obscured by an object in an upper layer from the compressed original content.

The preprocessor 220 according to an embodiment of the present disclosure may find text and/or objects obscured by an upper-layer object based on the area, perimeter, centroid, and bounding box of each object and/or text. For regions where the bounding boxes of objects and/or texts overlap, the preprocessor 220 uses computer vision technology to check whether other text and/or objects are covered by objects of higher layers.

For example, when text in a lower layer is covered by an object in an upper layer, the text is not visible on the screen, but it is still extracted when data is extracted. Since such text information is a garbage value, it should be removed. However, even if the bounding boxes of the object and the text overlap, it cannot be concluded that the text is covered by the object. Accordingly, the preprocessor 220 according to an embodiment of the present disclosure uses computer vision technology to compare the pixel values of the region where the bounding boxes of the object and the text overlap with the color value of the text, thereby determining whether the text is visible on the screen, that is, whether the text of the lower layer is obscured by the object of the upper layer.

Meanwhile, the method of removing text and/or objects obscured by an object in an upper layer from the flattened original content is not limited to the method described above, and a person skilled in the art may add or substitute other methods.
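The overlap test described above can be illustrated with a minimal sketch. It is not the disclosed implementation: the `Box` type, the `text_shows_in_overlap` function, and the `tolerance` parameter are hypothetical names introduced here, and the flattened page is assumed to be available as rows of RGB pixels.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Color = Tuple[int, int, int]


@dataclass
class Box:
    """Axis-aligned bounding box in pixel coordinates (right/bottom exclusive)."""
    left: int
    top: int
    right: int
    bottom: int


def overlap(a: Box, b: Box) -> Optional[Box]:
    """Return the intersection of two boxes, or None if they do not overlap."""
    left, top = max(a.left, b.left), max(a.top, b.top)
    right, bottom = min(a.right, b.right), min(a.bottom, b.bottom)
    if left >= right or top >= bottom:
        return None
    return Box(left, top, right, bottom)


def text_shows_in_overlap(rendered: List[List[Color]], text_box: Box,
                          object_box: Box, text_color: Color,
                          tolerance: int = 16) -> bool:
    """Check whether lower-layer text still shows through an upper-layer object.

    Assumption: the text counts as visible if any rendered pixel inside the
    overlapping region is within `tolerance` of the text color per channel.
    """
    region = overlap(text_box, object_box)
    if region is None:
        return True  # the bounding boxes do not overlap, so nothing covers the text
    for y in range(region.top, region.bottom):
        for x in range(region.left, region.right):
            pixel = rendered[y][x]
            if all(abs(p - t) <= tolerance for p, t in zip(pixel, text_color)):
                return True
    return False
```

Text for which this check returns `False` would be a candidate for removal from the processed content. A real page would need a more careful criterion, for example when the object and the text share a similar color.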

The pre-inspection unit 230 performs a pre-inspection, using character recognition technology and/or computer vision technology, on the original content from which the hidden text and/or hidden objects have been removed. Here, the pre-inspection refers to an inspection performed before the digitally converted content is produced. Because converting original content into digitally converted content takes a considerable amount of time, the pre-inspection unit 230 detects in advance the causes of errors that may occur in the conversion process, thereby reducing unnecessary conversion time and increasing the accuracy of the inspection. For example, with respect to the original content, the pre-inspection unit 230 can determine whether the document is encrypted, whether the document is corrupted, which fonts (or subset fonts) the document uses, whether bookmarks and/or crop lines are included, and the like. The pre-inspection unit 230 may determine, based on the pre-inspection result, whether the conversion unit 240 performs conversion, and may record the pre-inspection result in a log format. The pre-inspection unit 230 is described in detail with reference to FIG. 3.

The converter 240 creates digitally converted content based on text, image, vector data, and/or layout information of the original content extracted from the original content.

The conversion unit 240 according to an embodiment of the present disclosure preferably produces digitally converted content based on HTML (HyperText Markup Language). The digitally converted content may include a background image, vector-based text data, and the like. Here, vector-based text data means that text in the original content has been converted into a vector image. A vector image is an image format that uses points and lines to express an outline and fills the inside with a color or pattern, so that the same appearance as the original is obtained even when the image is enlarged or reduced. Because its boundaries are formed by connecting lines, a vector image has the advantage of always providing a clear image regardless of enlargement or reduction. The vector image may preferably be implemented as SVG (Scalable Vector Graphics), but is not necessarily limited thereto.

The post inspection unit 250 compares the original content and the digitally converted content, and generates a post inspection result. A detailed description of the post inspection unit 250 will be described with reference to FIG. 4 .

The learning unit 260 determines whether to perform re-conversion based on the pre-inspection result and/or the post-inspection result. The learning unit 260 trains the pre-inspection unit 230, the conversion unit 240, and/or the post-inspection unit 250 by using the pre-inspection result and/or the post-inspection result obtained while the conversion is repeatedly performed. The learning unit 260 according to an embodiment of the present disclosure may use a regression analysis model of machine learning.

When there is no difference between the original content and the converted (or reconverted) digitally converted content, the learning unit 260 reflects the setting value used for digitally converted content conversion and inspection in learning. Here, the set value may include a set value for the crop line and/or a language used to implement a character recognition technology or a computer vision technology, a matching condition, a CMAP value, and the like.

On the other hand, if conversion is impossible due to a problem with the original content itself, or if a difference remains between the original content and the converted (or re-converted) digitally converted content despite learning and re-conversion through regression, the learning unit 260 processes the pre-inspection result and/or the post-inspection result and provides the processed result to the output unit 270. Here, the case where a difference remains despite learning and re-conversion through regression may mean a case in which a difference greater than or equal to a preset threshold exists between the original content and the converted (or re-converted) digitally converted content. In this case, the learning unit 260 may generate an inspection report in JSON (JavaScript Object Notation) format by processing the pre-inspection result and/or the post-inspection result. Here, the inspection report is data from which the inspector can identify the cause of the conversion failure, and may include information on the areas or pages where differences occur between the original content and the digitally converted content, information on failure cases, and the like.
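As an illustration of a JSON inspection report, the following sketch assembles one from pre-inspection and post-inspection results. The disclosure does not define a report schema, so every field name here (`pre_inspection`, `pages_with_differences`, `settings`, and so on) is an assumption made for the example.

```python
import json
from typing import Any, Dict, List


def build_inspection_report(pre_result: Dict[str, Any],
                            post_differences: List[Dict[str, Any]],
                            settings: Dict[str, Any]) -> str:
    """Serialize inspection results into a JSON report (hypothetical schema)."""
    # Pages on which at least one difference was detected, without duplicates.
    pages = sorted({diff["page"] for diff in post_differences})
    report = {
        "pre_inspection": pre_result,
        "post_inspection": {
            "difference_count": len(post_differences),
            "pages_with_differences": pages,
            "differences": post_differences,
        },
        # Setting values used for conversion and inspection, kept for the next run.
        "settings": settings,
    }
    return json.dumps(report, ensure_ascii=False, indent=2)
```

A caller might pass differences such as `{"page": 3, "area": [10, 20, 110, 60], "kind": "color"}`, where the `area` and `kind` keys are likewise illustrative.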

The output unit 270 provides an inspection report to the user. The output unit 270 according to an embodiment of the present disclosure may include an output means such as a display to provide the inspection report to the user. The inspector can check the data that can be referenced for the visual inspection based on the inspection report, and can proceed with the next conversion and inspection based on the set values included in the inspection report.

According to another embodiment of the present disclosure, the output unit 270 may provide a pre-inspection result and/or a post-inspection result to the user by transmitting the inspection report to a user terminal. Here, the user terminal is a separate, stand-alone device that is interlocked with the digitally converted content inspection device 20, for example, a laptop, a personal computer (PC), a smartphone, a tablet PC, a personal digital assistant (PDA), or a mobile communication terminal.

As shown in FIG. 3, the pre-inspection unit 230 according to an embodiment of the present disclosure includes all or part of the text recognition unit 300 and the text comparison unit 310. Not all blocks shown in FIG. 3 are essential components, and in another embodiment, some blocks included in the pre-inspection unit 230 may be added, changed, or deleted.

The text recognition unit 300 obtains data from which hidden text and/or hidden objects are removed from the original content (hereinafter, 'processed content') from the preprocessor 220 . After removing the image included in the processed content, the text recognition unit 300 recognizes the text using a character recognition technology. To this end, the text recognition unit 300 according to an embodiment of the present disclosure may include an artificial intelligence-based optical character recognition model (AI-OCR model).

The text comparison unit 310 compares the text (hereinafter, 'recognized text') obtained by the text recognition unit 300 from the processed content using a character recognition technology with the original content, and generates a pre-examination result.

The text comparison unit 310 according to an embodiment of the present disclosure compares the recognized text and the text extracted from the original content by the preprocessor 220 (hereinafter, 'extracted text').

The text comparison unit 310 according to an embodiment of the present disclosure compares the extracted text and the content of the recognized text, that is, a text value.
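A text-value comparison of this kind can be sketched with the Python standard library. This is only an illustration: the disclosure does not name a comparison algorithm, and `compare_text` is a hypothetical helper built on `difflib.SequenceMatcher`.

```python
import difflib
from typing import List, Tuple


def compare_text(extracted: str, recognized: str) -> List[Tuple[str, str, str]]:
    """Return (operation, extracted_fragment, recognized_fragment) per mismatch.

    The operation is one of 'replace', 'delete', or 'insert', as reported by
    difflib; an empty list means the two text values are identical.
    """
    matcher = difflib.SequenceMatcher(a=extracted, b=recognized, autojunk=False)
    mismatches = []
    for op, a0, a1, b0, b1 in matcher.get_opcodes():
        if op != "equal":
            mismatches.append((op, extracted[a0:a1], recognized[b0:b1]))
    return mismatches
```

For instance, a `delete` entry would indicate text that was extracted from the original content but not recognized on the rendered surface, which is what hidden text would look like in this sketch.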

The text comparison unit 310 according to another embodiment of the present disclosure compares style information related to coordinates and/or size, such as direction, spacing, and leading, of the recognized text with style information of the extracted text.

Various concepts related to page size exist in the PDF format; among them, the crop box refers to the size of the page displayed on a screen. When original content is created using editing software such as InDesign or Illustrator, errors and the like may cause the coordinates of a specific text to be expressed as values outside the displayed area.

Meanwhile, since spacing and leading may differ depending on the type of font used in the original content, font information such as the spacing and leading of the font needs to be extracted in order to reconstruct the original content as digitally converted content. When the font file itself is attached to the original content, extracting such font information poses no significant problem. On the other hand, when the characters included in the original content retain only their glyph form, the CMAP information is not clearly present in the original content. In that case, extraction of the text included in the original content may fail, and even if text extraction succeeds, the font of the text may be recognized as an alternative font, so that font information such as spacing and leading is calculated differently from the actual text. Accordingly, when digitally converted content is produced based on the extracted information, problems such as a line break occurring at a location different from the original content may arise.
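The coordinate problem described above can be checked with a short sketch that flags text whose bounding box falls outside the crop box. The `Rect` and `TextRun` types and the `outside_crop_box` function are hypothetical, and the sketch assumes that text coordinates and the crop box are already expressed in the same coordinate system.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Rect:
    x0: float
    y0: float
    x1: float
    y1: float


@dataclass
class TextRun:
    text: str
    box: Rect


def outside_crop_box(runs: List[TextRun], crop: Rect) -> List[TextRun]:
    """Return the text runs whose bounding box is not fully inside the crop box."""
    return [
        run for run in runs
        if run.box.x0 < crop.x0 or run.box.y0 < crop.y0
        or run.box.x1 > crop.x1 or run.box.y1 > crop.y1
    ]
```

Runs reported by this check are candidates for a pre-inspection warning, since they would not appear within the displayed page area.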

The text comparison unit 310 according to an embodiment of the present disclosure may detect these problems before conversion is performed by comparing the style information of the recognized text and the extracted text.

The text comparison unit 310 generates a comparison result as a pre-examination result, and transmits it to the learning unit 260 .

As shown in FIG. 4, the post-inspection unit 250 according to an embodiment of the present disclosure includes all or part of an original image generation unit 400, a converted image generation unit 410, an image comparison unit 420, and a content comparison unit 430. Not all blocks shown in FIG. 4 are essential components, and in another embodiment, some blocks included in the post-inspection unit 250 may be added, changed, or deleted.

The original image generator 400 generates an original image that is a surface image obtained by rendering original content. When the original content consists of a plurality of pages, the original image generator 400 generates an original image for each page.

The converted image generator 410 generates a converted image that is a screen image obtained by rendering the converted content. When the converted content consists of a plurality of pages, the converted image generator 410 generates a converted image for each page. Furthermore, the converted image generating unit 410 generates a converted image for each page of each converted content for each browser, so that it can respond to a cross browsing issue.

The image comparison unit 420 performs a resize correction operation on the original image and the converted image to match their resolutions, and then compares the two images. The image comparison unit 420 according to an embodiment of the present disclosure compares the original image and the converted image and determines whether differences exist between them, such as a change in image color. In this case, the image comparison unit 420 may compare one original image for a specific page of the original content with each of the per-browser converted images corresponding to that original image.

The image comparison unit 420 according to an embodiment of the present disclosure may compare the original image and the converted image by using an open source library related to computer vision. For example, the image comparison unit 420 may compare the original image and the converted image using the template matching and structural similarity index algorithms provided by Open Source Computer Vision (OpenCV).
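The resize correction followed by a comparison can be sketched as below. This is a simplified stand-in rather than the disclosed procedure: it assumes grayscale images stored as nested lists, uses nearest-neighbor resampling, and reports the fraction of pixels that differ by more than an assumed `threshold`. The function names are hypothetical.

```python
from typing import List

Gray = List[List[int]]  # rows of 0-255 grayscale values


def resize_nearest(img: Gray, width: int, height: int) -> Gray:
    """Resample an image to width x height using nearest-neighbor sampling."""
    src_h, src_w = len(img), len(img[0])
    return [
        [img[y * src_h // height][x * src_w // width] for x in range(width)]
        for y in range(height)
    ]


def changed_ratio(original: Gray, converted: Gray, threshold: int = 10) -> float:
    """Fraction of pixels whose absolute difference exceeds `threshold`."""
    h, w = len(original), len(original[0])
    # Resize correction: bring the converted image to the original resolution.
    converted = resize_nearest(converted, w, h)
    changed = sum(
        1
        for y in range(h)
        for x in range(w)
        if abs(original[y][x] - converted[y][x]) > threshold
    )
    return changed / (w * h)
```

With one original image and several per-browser converted images, the same function could be called once per browser to see which rendering deviates most.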

Meanwhile, the template matching algorithm places the converted image on top of the original image and compares them while moving a designated area little by little. Performing template matching over the entire area takes a considerable amount of time, which lengthens the inspection. For this reason, template matching may be performed only for some regions, after which the differences are compared more precisely in the next step using the structural similarity index algorithm.

The content comparison unit 430 compares the objects and/or images included in the original content with those included in the digitally converted content.
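The two-step comparison can be sketched in plain Python as follows. The text refers to OpenCV for these algorithms; to stay self-contained, this example instead implements a basic sum-of-squared-differences template match and a single global structural similarity value (rather than the usual windowed average), so it should be read as a simplified illustration with hypothetical function names.

```python
from typing import List, Tuple

Gray = List[List[int]]  # rows of 0-255 grayscale values


def match_template(image: Gray, template: Gray) -> Tuple[int, int]:
    """Return the (x, y) offset minimizing the sum of squared differences."""
    ih, iw = len(image), len(image[0])
    th, tw = len(template), len(template[0])
    best, best_xy = None, (0, 0)
    for y in range(ih - th + 1):
        for x in range(iw - tw + 1):
            ssd = sum(
                (image[y + j][x + i] - template[j][i]) ** 2
                for j in range(th)
                for i in range(tw)
            )
            if best is None or ssd < best:
                best, best_xy = ssd, (x, y)
    return best_xy


def ssim_global(a: Gray, b: Gray, dynamic_range: int = 255) -> float:
    """Structural similarity computed once over two equally sized images."""
    xs = [v for row in a for v in row]
    ys = [v for row in b for v in row]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    vx = sum((v - mx) * (v - mx) for v in xs) / n
    vy = sum((v - my) * (v - my) for v in ys) / n
    cov = sum((p - mx) * (q - my) for p, q in zip(xs, ys)) / n
    # Standard stabilizing constants for the SSIM formula.
    c1 = (0.01 * dynamic_range) ** 2
    c2 = (0.03 * dynamic_range) ** 2
    return ((2 * mx * my + c1) * (2 * cov + c2)) / (
        (mx * mx + my * my + c1) * (vx + vy + c2)
    )
```

A small patch of the converted image can be located in the original with `match_template`, and the aligned regions can then be scored with `ssim_global`, where a value near 1 indicates nearly identical structure.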

The content comparison unit 430 according to an embodiment of the present disclosure compares an object included in the original content with a vector-based object included in the digitally converted content. As such, the content comparison unit 430 performs an inspection using data mapping, not an inspection of the area shown on the screen.
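One way to picture an inspection based on data mapping is the sketch below, which compares object attributes by key instead of comparing rendered pixels. The disclosure does not specify how objects are mapped, so the shared identifiers and the attribute dictionaries used here are assumptions, as is the `compare_by_mapping` name.

```python
from typing import Any, Dict, List


def compare_by_mapping(original: Dict[str, Dict[str, Any]],
                       converted: Dict[str, Dict[str, Any]]) -> List[str]:
    """Compare objects keyed by a shared (hypothetical) identifier."""
    problems = []
    for obj_id, attrs in original.items():
        if obj_id not in converted:
            problems.append(f"{obj_id}: missing from converted content")
            continue
        for key, value in attrs.items():
            if converted[obj_id].get(key) != value:
                problems.append(f"{obj_id}: {key} differs")
    # Objects that exist only in the converted content are reported as well.
    for obj_id in sorted(converted.keys() - original.keys()):
        problems.append(f"{obj_id}: not present in original content")
    return problems
```

Because the comparison works on data rather than on the visible surface, it also covers objects that another object hides on screen.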

The content comparison unit 430 according to an embodiment of the present disclosure may extract only an image from among the objects included in the original content, and may compare it with the image generated in the conversion process in units of pixels.

As described above, the image comparison unit 420 generates and compares images of the visible surface, whereas the content comparison unit 430 can compare not only the objects in the uppermost layer but also text and/or objects hidden behind other objects.

The image comparison unit 420 and the content comparison unit 430 provide the comparison result to the learning unit 260 as a post-inspection result.

The digitally converted content inspection device 20 extracts data from the original content and generates processed content through a pre-processing process (S500). The digitally converted content inspection device 20 according to an embodiment of the present disclosure parses the original content and extracts the text, objects, images, and vector data included in the original content, as well as layout information of the original content. The digitally converted content inspection device 20 according to an embodiment of the present disclosure flattens the original content composed of at least one layer into a single layer and generates processed content from which, among the text and/or objects included in the original content, the text and objects hidden by objects included in an upper layer have been removed.

The digitally converted content inspection device 20 performs a pre-inspection using the text extracted from the original content and the processed content (S510). The digitally converted content inspection device 20 according to an embodiment of the present disclosure may compare the text value or style information of the text recognized from the processed content with that of the text extracted from the original content. Here, the style information of the text refers to information related to at least one of the direction, spacing, leading, and size of the text. With respect to the original content, the digitally converted content inspection device 20 according to an embodiment of the present disclosure may determine whether the document is encrypted, whether the document is corrupted, which fonts (or subset fonts) the document uses, whether bookmarks and/or crop lines are included, and the like.

The digital conversion content inspection device 20 determines whether the pre-examination result satisfies a preset conversion start condition (S520). Here, the preset conversion start conditions include whether the text recognized from the processed content matches the text extracted from the original content, whether the document is encrypted with respect to the original content, whether the document is broken, the font (or subset font) in the document, and the bookmark and/or may be a condition related to whether or not a crop line is included.
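The conversion start check can be illustrated with a small sketch. The disclosure lists the kinds of conditions but not how they are evaluated, so the `PreInspectionResult` fields and the rule that each listed item blocks conversion are assumptions made for this example; bookmark and crop line conditions are omitted because their effect is not described.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class PreInspectionResult:
    """Hypothetical container for the outcome of the pre-inspection."""
    text_matches: bool  # recognized text equals extracted text
    encrypted: bool = False
    corrupted: bool = False
    unresolved_fonts: List[str] = field(default_factory=list)


def conversion_blockers(result: PreInspectionResult) -> List[str]:
    """List the reasons why conversion should not start (empty if none)."""
    blockers = []
    if not result.text_matches:
        blockers.append("recognized text differs from extracted text")
    if result.encrypted:
        blockers.append("document is encrypted")
    if result.corrupted:
        blockers.append("document is corrupted")
    if result.unresolved_fonts:
        blockers.append("unresolved fonts: " + ", ".join(result.unresolved_fonts))
    return blockers


def may_start_conversion(result: PreInspectionResult) -> bool:
    """The start condition holds when no blocker was found."""
    return not conversion_blockers(result)
```

Returning the list of reasons, rather than only a boolean, makes it straightforward to record the pre-inspection result in a log or to show it to the user when conversion cannot start.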

When the pre-inspection result does not satisfy the preset conversion start condition, the digitally converted content inspection device 20 outputs the pre-inspection result to inform the user that the conversion cannot be performed due to a problem with the original content itself (S580).

If the pre-examination result satisfies the preset conversion start condition, the digital conversion content verification apparatus 20 creates digital conversion content based on the data extracted from the original content (S530).

The digitally converted content inspection device 20 performs a post-inspection using the original content and the digitally converted content (S540). The digitally converted content inspection device 20 according to an embodiment of the present disclosure generates an original image and a converted image, which are screen images (surface images) obtained by rendering the original content and the digitally converted content, respectively, and may compare the original image with the converted image. In this case, the digitally converted content inspection device 20 may detect a difference between the original image and the converted image using a template matching algorithm and/or a structural similarity index algorithm. The digitally converted content inspection device 20 according to an embodiment of the present disclosure may compare an object included in the original content with a vector-based object included in the digitally converted content by using data mapping. The digitally converted content inspection device 20 according to an embodiment of the present disclosure may compare an image included in the original content with an image included in the digitally converted content in units of pixels.

The digitally converted content inspection device 20 checks whether the post-inspection result satisfies a preset re-conversion condition (S550). A case in which the preset re-conversion condition is not satisfied may mean that the post-inspection found no difference between the original content and the converted content, or that a difference between the original content and the converted (or re-converted) digitally converted content persists despite repeated re-conversion.

If the post inspection result satisfies the preset re-conversion condition, the digital conversion content inspection apparatus 20 performs re-conversion and re-examination processes (S500 to S540).

If the post inspection result does not satisfy the preset re-conversion condition, the digital conversion content inspection device 20 checks whether the conversion from the original content to the digital conversion content was successful based on the post inspection result (S560). Here, the conversion success means a case in which there is no difference between the original content and the converted content as a result of the post inspection.

When the conversion from the original content to the digitally converted content is successful, the digitally converted content inspection device 20 reflects the setting values used for conversion and inspection in learning (S570). Here, the setting values may include a setting value for the crop line and/or the language, matching condition, CMAP value, and the like used to implement a character recognition technology or a computer vision technology. Here, 'language' means the language in which character recognition is to be performed, and is a setting value that greatly affects the recognition rate of the character recognition technology. Because the characters themselves can vary depending on the reading direction, character recognition is performed by extracting the areas of the original content in which words or text exist, determining the text direction, and dividing the areas according to the text direction. In particular, for a language in which horizontal and vertical writing are mixed, such as Japanese or Chinese, or a language written from right to left, such as Arabic, accurate values cannot be read without a language setting. The digitally converted content inspection device 20 according to an embodiment of the present disclosure extracts the text direction by parsing words and/or text from the original content, and can identify information such as font and language by checking the CMAP value. Through the learning process, it can increase the accuracy with which it determines text direction, rotation, and language setting.
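One way to estimate horizontal text direction from text parsed out of the original content is to count characters by their Unicode bidirectional class. The sketch below is an assumption for illustration and not the method of the disclosure; the function name and the returned labels are hypothetical. It cannot detect vertical writing, which depends on layout information rather than on the characters themselves.

```python
import unicodedata

RTL_CLASSES = {"R", "AL"}  # strong right-to-left classes (e.g. Hebrew, Arabic letters)
LTR_CLASSES = {"L"}        # strong left-to-right class


def dominant_direction(text: str) -> str:
    """Return 'rtl', 'ltr', or 'neutral' based on strongly directional characters."""
    rtl = sum(1 for ch in text if unicodedata.bidirectional(ch) in RTL_CLASSES)
    ltr = sum(1 for ch in text if unicodedata.bidirectional(ch) in LTR_CLASSES)
    if rtl == 0 and ltr == 0:
        return "neutral"  # digits, punctuation, or empty input only
    return "rtl" if rtl > ltr else "ltr"
```

A result of this kind could serve as one input when choosing the language setting for character recognition, alongside the font and language information obtained from the CMAP value.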

If the conversion from the original content to the digitally converted content fails, the digitally converted content inspection device 20 may provide the user with information necessary for visual inspection by outputting a pre-inspection result and/or a post-inspection result (S580). The digitally converted content inspection device 20 according to an embodiment of the present disclosure may create an inspection report based on the pre-inspection result and/or the post-inspection result. Here, the inspection report may include information about a page or area in which a difference exists between the original content and the digitally converted content, a conversion failure case, and the like. As described above, the digitally converted content inspection device 20 according to an embodiment of the present disclosure can provide not only information about the setting values used for inspection and/or conversion, but also information that the inspector can actually refer to during visual inspection.
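The contents of such an inspection report can be pictured with a small data structure. This is a hypothetical sketch: the class names, field names, and the region format (x, y, width, height) are assumptions made for illustration and are not specified in the disclosure.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Difference:
    page: int
    region: Tuple[int, int, int, int]  # assumed format: x, y, width, height
    stage: str                         # "pre" or "post" inspection
    note: str


@dataclass
class InspectionReport:
    differences: List[Difference] = field(default_factory=list)
    settings: Dict[str, str] = field(default_factory=dict)  # setting values used

    def pages_to_review(self) -> List[int]:
        """Pages on which at least one difference was found, in ascending order."""
        return sorted({d.page for d in self.differences})

    def summary(self) -> str:
        """Plain-text summary an inspector could read before visual inspection."""
        if not self.differences:
            return "No differences found."
        lines = [f"{len(self.differences)} difference(s) on pages {self.pages_to_review()}"]
        for d in self.differences:
            lines.append(f"- page {d.page}, region {d.region}, {d.stage}-inspection: {d.note}")
        return "\n".join(lines)
```

Grouping the differences by page lets the inspector go directly to the pages that need visual inspection instead of reviewing the whole document.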

Although FIG. 5 describes the processes as being executed sequentially, this is merely illustrative of the technical idea of an embodiment of the present disclosure. In other words, a person of ordinary skill in the art to which an embodiment of the present disclosure pertains may apply various modifications and variations, such as changing the order described in FIG. 5 or executing one or more of the processes in parallel, without departing from the essential characteristics of an embodiment of the present disclosure. FIG. 5 is therefore not limited to a time-series order.

Various implementations of the systems and techniques described herein may be realized in digital electronic circuitry, integrated circuitry, a field programmable gate array (FPGA), an application specific integrated circuit (ASIC), computer hardware, firmware, software, and/or a combination thereof. These various implementations may include implementation in one or more computer programs executable on a programmable system. The programmable system includes at least one programmable processor (which may be a special-purpose processor or a general-purpose processor) coupled to receive data and instructions from, and transmit data and instructions to, a storage system, at least one input device, and at least one output device. Computer programs (also known as programs, software, software applications, or code) contain instructions for a programmable processor and are stored on a "computer-readable recording medium".

The computer-readable recording medium includes all types of recording devices in which data readable by a computer system is stored. The computer-readable recording medium may be a non-volatile or non-transitory medium, such as a ROM, CD-ROM, magnetic tape, floppy disk, memory card, hard disk, magneto-optical disk, or storage device, and may further include a transitory medium such as a data transmission medium. In addition, the computer-readable recording medium may be distributed over network-connected computer systems, and the computer-readable code may be stored and executed in a distributed manner.

Various implementations of the systems and techniques described herein may be implemented by a programmable computer. Here, the computer includes a programmable processor, a data storage system (including volatile memory, non-volatile memory, or other types of storage systems or combinations thereof), and at least one communication interface. For example, the programmable computer may be one of a server, a network appliance, a set-top box, an embedded device, a computer expansion module, a personal computer, a laptop, a personal digital assistant (PDA), a cloud computing system, or a mobile device.

The above description is merely illustrative of the technical idea of this embodiment, and a person skilled in the art to which this embodiment belongs may make various modifications and variations without departing from the essential characteristics of the present embodiment. Accordingly, the present embodiments are intended to explain rather than limit the technical spirit of the present embodiment, and the scope of the technical spirit of the present embodiment is not limited by these embodiments. The protection scope of this embodiment should be interpreted by the following claims, and all technical ideas within the scope equivalent thereto should be interpreted as being included in the scope of the present embodiment.

(Explanation of symbols)

20: digitally converted content inspection device
200: input unit
210: data extraction unit
220: pre-processing unit
230: pre-inspection unit
240: conversion unit
250: post-inspection unit
260: learning unit
270: output unit
300: text recognition unit
310: text comparison unit
400: original image generation unit
410: converted image generation unit
420: image comparison unit
430: content comparison unit

CROSS-REFERENCE TO RELATED APPLICATION

This patent application claims priority to Patent Application No. 10-2020-0180497, filed in Korea on December 22, 2020, which is incorporated herein by reference in its entirety.

Claims

An inspection method performed by a digitally converted content inspection device, the method comprising:

a process of flattening original content composed of at least one layer into a single layer and generating processed content from which, among the text and objects included in the original content, the text and objects obscured by an object included in an upper layer have been removed;

a pre-inspection process of comparing text recognized from the processed content with text extracted from the original content;

a post-inspection process of comparing the original content with digitally converted content produced based on the original content; and

a process of creating an inspection report based on the comparison results of the pre-inspection process and the post-inspection process.

The method of claim 1, wherein the pre-inspection process compares a text value recognized from the processed content with a text value extracted from the original content.

The method of claim 1, wherein the pre-inspection process compares style information of the text recognized from the processed content with style information of the text extracted from the original content.

The method of claim 2, wherein the style information of the text is information related to at least one of the direction, spacing, leading, and size of the text.

The method of claim 1, wherein the post-inspection process comprises:

generating an original image that is a screen image obtained by rendering the original content;

generating a converted image that is a screen image obtained by rendering the digitally converted content; and

comparing the original image with the converted image.

The method of claim 5, wherein the comparing of the original image with the converted image detects a difference between the original image and the converted image by using a template matching algorithm and/or a structural similarity index algorithm.

The method of claim 1, wherein the post-inspection process compares an object included in the original content with a vector-based object included in the digitally converted content.

The method of claim 1, wherein the post-inspection process compares an image included in the original content with an image included in the digitally converted content in units of pixels.

The method of claim 1, wherein the inspection report includes information on at least one of a page and an area in which a difference exists between the original content and the digitally converted content.

The method of claim 1, wherein, prior to the process of creating the inspection report, production of the digitally converted content is re-performed based on the comparison results of the pre-inspection process and the post-inspection process.

A digitally converted content inspection device comprising:

a pre-processing unit for flattening original content composed of at least one layer into a single layer and generating processed content from which, among the text and objects included in the original content, the text and objects obscured by an object included in an upper layer have been removed;

a pre-inspection unit for comparing text recognized from the processed content with text extracted from the original content;

a post-inspection unit for comparing the original content with digitally converted content produced based on the original content; and

a learning unit that creates an inspection report based on the comparison results of the pre-inspection process and the post-inspection process.

The device of claim 11, wherein the learning unit trains the pre-inspection unit and the post-inspection unit by using the comparison results of the pre-inspection unit and the post-inspection unit.

A computer program stored in a computer-readable recording medium to execute each process included in the digitally converted content inspection method according to any one of claims 1 to 10.