WO2016175666A1 - A method of automatic separation and identification of layers of substances on paper using a multi-spectral analysis approach - Google Patents
- Publication number
- WO2016175666A1 (application PCT/PL2015/000073)
- Authority
- WO
- WIPO (PCT)
Classifications
-
- G—PHYSICS
- G07—CHECKING-DEVICES
- G07D—HANDLING OF COINS OR VALUABLE PAPERS, e.g. TESTING, SORTING BY DENOMINATIONS, COUNTING, DISPENSING, CHANGING OR DEPOSITING
- G07D7/00—Testing specially adapted to determine the identity or genuineness of valuable papers or for segregating those which are unacceptable, e.g. banknotes that are alien to a currency
- G07D7/06—Testing specially adapted to determine the identity or genuineness of valuable papers or for segregating those which are unacceptable, e.g. banknotes that are alien to a currency using wave or particle radiation
- G07D7/12—Visible light, infrared or ultraviolet radiation
-
- G—PHYSICS
- G01—MEASURING; TESTING
- G01N—INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
- G01N21/00—Investigating or analysing materials by the use of optical means, i.e. using sub-millimetre waves, infrared, visible or ultraviolet light
- G01N21/17—Systems in which incident light is modified in accordance with the properties of the material investigated
- G01N21/25—Colour; Spectral properties, i.e. comparison of effect of material on the light at two or more different wavelengths or wavelength bands
-
- G—PHYSICS
- G01—MEASURING; TESTING
- G01N—INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
- G01N21/00—Investigating or analysing materials by the use of optical means, i.e. using sub-millimetre waves, infrared, visible or ultraviolet light
- G01N21/84—Systems specially adapted for particular applications
- G01N21/8422—Investigating thin films, e.g. matrix isolation method
-
- G—PHYSICS
- G01—MEASURING; TESTING
- G01N—INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
- G01N21/00—Investigating or analysing materials by the use of optical means, i.e. using sub-millimetre waves, infrared, visible or ultraviolet light
- G01N21/84—Systems specially adapted for particular applications
- G01N21/86—Investigating moving sheets
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/213—Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods
- G06F18/2134—Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods based on separation criteria, e.g. independent component analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V30/00—Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
- G06V30/10—Character recognition
- G06V30/16—Image preprocessing
- G06V30/164—Noise filtering
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V30/00—Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
- G06V30/40—Document-oriented image-based pattern recognition
-
- G—PHYSICS
- G01—MEASURING; TESTING
- G01N—INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
- G01N21/00—Investigating or analysing materials by the use of optical means, i.e. using sub-millimetre waves, infrared, visible or ultraviolet light
- G01N21/84—Systems specially adapted for particular applications
- G01N21/8422—Investigating thin films, e.g. matrix isolation method
- G01N2021/8438—Mutilayers
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V30/00—Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
- G06V30/10—Character recognition
Definitions
- the AASD is an algorithm that assesses whether a given spectrum (the "compound spectrum") is a combination of the spectrums of two or more individual materials ("base spectrums").
- the algorithm has three steps:
- Step 1 Recalculation of the measured spectrums of the base and compound materials into the light metrics units (with utilization of normalization formula),
- Step 2 Calculation of so called Mixing Spectrum from the "base spectrums" expressed in light metrics units
- Step 3 Comparison of the "compound spectrum", expressed in light metrics units, with the Mixing Spectrum using the AASC algorithm. If the spectrums are identical (in the sense of AASC), the compound spectrum is a combination of the base spectrums; otherwise it is not.
- Step 1 Each of the measured spectrums (the "compound spectrum" and the "base spectrums") is re-calculated (for each of the wavelengths) into light metrics units using the following formula:
- LIGHT METRICS = 100 * (MEASURED LIGHT INTENSITY - MINIMUM SENSOR INTENSITY) / (MAXIMUM SENSOR INTENSITY - MINIMUM SENSOR INTENSITY)
- Step 2 The Mixing Spectrum (a spectrum that should theoretically be a result of mixing of the base spectrums) is calculated for each of wavelengths using the following formula:
- LIGHT METRICS MIXING SPECTRUM = LIGHT METRICS BASE SPECTRUM 1 * LIGHT METRICS BASE SPECTRUM 2 * ... * LIGHT METRICS BASE SPECTRUM N
- Step 3 The AASC is used to compare the Mixing Spectrum calculated in Step 2 with the compound spectrum converted in Step 1, using a chosen identity coefficient.
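The three AASD steps can be sketched in Python as follows. This is a minimal illustration, not the patented implementation: the function names and sample values are hypothetical, and the sketch assumes that each base spectrum is treated as a fraction (light metrics / 100) before multiplication so that the Mixing Spectrum stays on the same 0..100 scale as the compound spectrum. The text above shows only the product itself.

```python
from math import prod
from typing import List, Sequence


def to_light_metrics(intensities: Sequence[float],
                     sensor_min: float, sensor_max: float) -> List[float]:
    """Step 1: min-max normalise raw intensities into light metrics (0..100)."""
    span = sensor_max - sensor_min
    if span <= 0:
        raise ValueError("sensor_max must be greater than sensor_min")
    return [100.0 * (v - sensor_min) / span for v in intensities]


def mixing_spectrum(base_spectra: Sequence[Sequence[float]]) -> List[float]:
    """Step 2: per-wavelength product of the base spectra.

    ASSUMPTION: values are scaled to fractions (x / 100) before multiplying
    and scaled back afterwards; this rescaling is not stated in the source.
    """
    n_wavelengths = len(base_spectra[0])
    return [100.0 * prod(s[i] / 100.0 for s in base_spectra)
            for i in range(n_wavelengths)]


def is_combination(compound: Sequence[float],
                   base_spectra: Sequence[Sequence[float]],
                   ic: float) -> bool:
    """Step 3: AASC comparison of the compound spectrum with the Mixing Spectrum."""
    mix = mixing_spectrum(base_spectra)
    return all(abs(c - m) < ic for c, m in zip(compound, mix))


# Hypothetical light metrics at three wavelengths.
stamp_ink = [50.0, 80.0, 100.0]
signature_ink = [40.0, 50.0, 90.0]
mix = mixing_spectrum([stamp_ink, signature_ink])   # roughly [20, 40, 90]

overlap_point = [21.0, 39.0, 89.0]   # close to the Mixing Spectrum
other_point = [60.0, 39.0, 89.0]     # differs at the first wavelength

combined = is_combination(overlap_point, [stamp_ink, signature_ink], ic=2.0)
unrelated = is_combination(other_point, [stamp_ink, signature_ink], ic=2.0)
```

With an identity coefficient of 2.0, the first point is accepted as a combination of the two inks and the second is rejected.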
- the TARL is a technique allowing automated identification of layers in an image scanned with a multi-spectral scanner, utilizing the AASC algorithm.
- the TARL technique assumes access to the spectrums of all the points in the analyzed fragment of the image and has the following steps:
- the spectrum is compared with all the already identified layers using the AASC algorithm.
- this spectrum is considered a new layer, and additionally marked as belonging to that layer.
- NL defining the sensitivity of the layer vs. noise detection can be freely chosen depending on the technique application.
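The TARL steps listed here are incomplete, so the following Python sketch fills the gaps with explicit assumptions. It assumes that each layer is represented by the first spectrum assigned to it, and that NL is the minimum number of points a candidate layer needs before it counts as a layer rather than noise. All names and sample values are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

Point = Tuple[int, int]
Spectrum = Sequence[float]


def identical(a: Spectrum, b: Spectrum, ic: float) -> bool:
    """AASC test: every per-wavelength difference is below IC."""
    return all(abs(x - y) < ic for x, y in zip(a, b))


def retrieve_layers(points: Dict[Point, Spectrum], ic: float, nl: int):
    """Group points into layers; candidates with fewer than nl points are noise."""
    candidates: List[Tuple[Spectrum, List[Point]]] = []
    for point, spectrum in points.items():
        # Compare the spectrum with all layers identified so far.
        for reference, members in candidates:
            if identical(reference, spectrum, ic):
                members.append(point)
                break
        else:
            # No match: the spectrum starts a new candidate layer.
            candidates.append((spectrum, [point]))
    layers = [(ref, members) for ref, members in candidates if len(members) >= nl]
    noise = [p for _, members in candidates if len(members) < nl for p in members]
    return layers, noise


# Hypothetical light metrics for five scanned points.
scan = {
    (0, 0): [90.0, 90.0, 90.0],   # paper-like
    (0, 1): [91.0, 89.0, 90.5],   # paper-like
    (1, 0): [10.0, 12.0, 11.0],   # toner-like
    (1, 1): [11.0, 11.0, 12.0],   # toner-like
    (2, 0): [50.0, 20.0, 70.0],   # isolated reading
}
layers, noise = retrieve_layers(scan, ic=3.0, nl=2)
```

In this example two layers are found and the isolated reading is reported as noise. The result depends on the order in which points are visited, because the first spectrum of each layer serves as its reference; this is a simplification of the sketch, not a property stated in the source.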
- the "Database of Substances" is a collection of spectrums of known materials. It may be generated automatically, using the TARL technique, from samples of documents created with known materials (e.g. inks, toners, gels, paints, graphite etc.).
- the Database of Substances may be global or managed by the individual users or institutions. In the latter case, it allows - with usage of the TARKL technique described below - to determine the materials that fall out of the standards utilized by the given institution, which in turn allows e.g. to detect forgeries or other types of singularities.
- TARKL is an extended and modified version of TARL. It uses a chosen subset of the "Database of Substances", selected for the given user or institution, to pre-determine the layers that are expected to be present in the scanned document, which may improve the accuracy of layer detection and produce a more predictable layer division.
- TARKL uses the following steps:
- the known layers are added to the list of the detected layers;
- TSASL Semi-Automated Separation of Layers
- This technique allows indicating the layer identical to (in the sense of AASC) or containing (in the sense of AASD) the chosen spectrum and showing that layer in the original image.
- This technique allows visual, point-and-click separation of layers on the screen of the electronic device (computer, tablet etc.).
- the user can influence the layer identification modifying the parameters listed in the description of the TARL, as well as switching on and off the AASD algorithm.
- Step 0 If the user chooses to include in the retrieval the mixed spectrums too, the TARL or TARKL algorithms are executed on the complete image first to detect the existing layers that may be subject to mixing.
- the technique has the following steps:
- the spectrum of the point is compared with the spectrum of the chosen point using the AASC algorithm.
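The point-and-click comparison step can be sketched as below, assuming the scan is available as a mapping from pixel coordinates to light-metrics spectrums. The function name and the data are hypothetical, and the sketch covers only the AASC comparison, not the optional AASD handling of mixed spectrums.

```python
from typing import Dict, List, Sequence, Tuple

Point = Tuple[int, int]


def select_layer(points: Dict[Point, Sequence[float]],
                 chosen: Point, ic: float) -> List[Point]:
    """Return every point whose spectrum is AASC-identical to the chosen point's."""
    reference = points[chosen]
    return [
        point
        for point, spectrum in points.items()
        if all(abs(r - s) < ic for r, s in zip(reference, spectrum))
    ]


# Hypothetical scan: two toner-like points, one paper-like, one ink-like.
image = {
    (0, 0): [10.0, 12.0, 11.0],
    (0, 1): [11.0, 11.0, 12.0],
    (1, 0): [90.0, 90.0, 90.0],
    (1, 1): [30.0, 60.0, 20.0],
}
selected = select_layer(image, chosen=(0, 0), ic=3.0)
```

Here the user's click on point (0, 0) selects the two toner-like points and leaves the others out.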
- the document is scanned using multi-spectral techniques
- Each of the pages is processed using the TARL or TARKL techniques.
- the technique requires interaction with a user, providing him/her with layer-based information to be assessed.
- the document is scanned using a multi-spectral scanner
- the layers are extracted using the TARL or TARKL technique
- the identified layers are presented to the user, focusing on the elements that are similar in the visible spectrum, but not identical (in the sense of the AASC algorithm);
- the user may also be presented with information on the material used in the different layers.
- This technique is especially effective in distinguishing handwritten signatures and real stamps from their facsimiles, as well as in recognizing different inks that look the same to the naked eye.
- the Invention allows automated separation and identification of layers of substances (different types of inks, paints, toners, organic and non-organic substances etc.), thus providing a way to separate and identify various "business layers" of the document, like: print, stamps, signatures, annotations etc.
- the Invention may be utilized in business (e.g. reading the content of stamps, signatures etc.), in criminology and forensics (e.g. to identify forged or amended official documents, to retrieve erased content etc.), in archiving and digitization (e.g. to retrieve crossed-out entries in registers, signatures and author notes, or content damaged by water, fire etc.), in libraries (to retrieve the original content of books with handwritten notes), to detect singularities in documents (which may be indications of forgeries) and more.
- Figure 1 Spectrums of different materials gathered by a multi-spectral scanner.
- Figure 2 Graphical representation of the Algorithm of Adaptive Spectrum Comparison.
- Figure 3 A graphical representation of the Algorithm of Adaptive Spectrum Dissection.
- Figure 4 A graphical representation of Technique of Automated Retrieval of Layers. In this example three layers were identified and one spectrum (Material 13) was considered noise.
- Figure 5 A result of TSASL technique with and without the Algorithm of Adaptive Spectrum Dissection (AASD).
Abstract
Most paper documents - printed, drawn, painted, written etc. - contain layers of different substances, including ink, paint and toner, as well as various types of contamination and the paper base itself. In many cases the separation of these layers is critical to retrieving the full content of the document (e.g. to identify the people signing, the dates on the stamps, the content of notes hidden beneath more recent layers etc.), and many techniques have been developed to achieve that goal. The Inventors of this patent application utilize multi-spectral techniques as methods of reducing content noise in printed documents in order to improve the efficiency of OCR, as described in the Patent Application [PL P.411280]. The invention provides an automated solution for identification and separation of layers of substances (in most cases inks, toners and paints of different kinds) on paper using multi-spectral analysis. The aim of the invention is to provide an automated method allowing construction of a device which, for every given point on a paper surface, is able to identify the substances present at that point, which often overlay each other, and to retrieve layers representing all the points on the paper where a given substance is present.
Description
A method of automatic separation and identification of layers of substances on paper using a multi-spectral analysis approach
Technical Field
[0001] Multi-spectral analysis
Background Art
[0002] Most paper documents - printed, drawn, painted, written etc. - contain layers of different substances, including ink, paint and toner, as well as various types of contamination and the paper base itself. For instance, a fresh laser printout may contain two main layers (the paper and the toner), but a logistics document may contain tens of layers: the paper, the toner and many signatures, stamps and annotations. In many cases the separation of these layers is critical to retrieving the full content of the document (e.g. to identify the people signing, the dates on the stamps, the content of notes hidden beneath more recent layers etc.), and many techniques have been developed to achieve that goal.
[0003] Multi- and hyper-spectral analysis is a technology that has been used for decades to analyze the properties of various materials, to identify them and, based on the fact that light can penetrate most substances to some degree, to retrieve the shallow three-dimensional structure of objects and, in certain cases, to see through the upper layers of the analyzed object into the layers below. These spectral techniques are often used in forensics and criminology, materials engineering and museums.
[0004] The Inventors of this patent application utilize multi-spectral techniques as methods of reducing content noise in printed documents in order to improve the efficiency of OCR, as described in the Patent Application [PL P.411280]. The methods disclosed there aim at separating the printed part of the content of a document and discarding the rest (including stamps, signatures, print-overs, surface contamination etc.), in order to increase the clarity of the image to be run through Optical Character Recognition algorithms. These techniques do not aim at identifying or separating anything other than the printed text.
[0005] A number of digital image processing techniques have been developed for cases where retrieval of certain features (e.g. a signature) of a scanned document is required, based on digital color-based filtering and interpolation. These techniques, often available as plug-ins for image processing programs (e.g. Photoshop), rely on post-processing of digital images and, while often effective for simpler cases, cannot retrieve the actual layers of the original document, especially when the layers have identical or very close colors (e.g. black on black) or when the upper layers completely cover the lower layers.
Disclosure of Invention
[0006] The invention provides an automated solution for identification and separation of layers of substances (in most cases inks, toners and paints of different kinds) on paper using multi-spectral analysis.
[0007] The aim of the invention is to provide an automated method allowing construction of a device which, for every given point on a paper surface, is able to identify the substances present at that point, which often overlay each other, and to retrieve layers representing all the points on the paper where a given substance is present. A typical use case of this method is a stamp placed over a text, with a hand-written signature over the stamp. The method described here allows separation of these three substances (the printed toner, the stamp ink and the signature ink) at every point where at least one of them is present. This in turn allows separation of the complete print, the complete stamp and the complete signature as distinct document layers.
[0008] The invention combines a multi-spectral scanning technique, which provides spectral data, with specialized algorithms that allow separation and identification of layers based on these spectral data.
Technical Problem
[0009] The challenge of separating layers is currently addressed in one of two ways: (1) digital post-processing of an RGB image, which relies on an easily obtainable digital scan of a document and can be automated and incorporated into a document processing workflow, or (2) hyper- or multi-spectral imaging, which requires specialized scientific equipment and produces large quantities of data of mainly scientific value, and is therefore typically applied in forensic, museum and similar laboratories. Because of its complexity, this second kind of processing cannot currently be effectively incorporated into an automated document processing workflow.
[0010] The aim of this Invention is to provide an automated way of separation of layers of the document, which may be efficiently incorporated into a document processing workflow and used as a standard in enterprises, institutions and potentially in mainstream market devices.
[0011] The Invention is also targeted at automated and semi-automated processing of documents, books, drawings, paintings etc. where separation of layers corresponding to different materials is essential, in particular the recovery of the content of damaged documents (e.g. after flood or fire) and the identification of forged documents and works of art. In these cases, the separation of layers makes it possible to see the difference between materials that are hardly distinguishable by eye (in the examples mentioned above: washed-out ink, ink on burned paper, the subtle difference between the original ink and the ink used by the forger).
Technical Solution
[0012] The proposed solution relies on multi-spectral scanning of the document over a wide wavelength range and on using the spectral data collected for every point to identify the substances present at these points and then to retrieve the layers of different substances across the document.
[0013] Multi-spectral scanning is based on the principle that different substances absorb and reflect light to different degrees at different wavelengths. For instance, if a paper document with text printed in different colors, as well as signatures, stamps and branding print-overs, is observed over a range of wavelengths between infrared and UV, some elements become more or less visible depending on the wavelength; in some cases elements of the image start to glow, while in others some disappear completely.
[0014] Generally, all materials observed on the surface of the document (such as inks, toner, pen, pencil, dirt etc.), as well as the surface itself (usually paper), have their own spectral traces (which may or may not be distinctively unique), depending primarily on their chemical make-up and physical structure. These spectrums reflect the level of absorption and reflection of light by the given material as a function of the light wavelength (as seen in Figure 1). The majority of materials have spectrums unique enough to allow identification of the material based on its spectrum.
[0015] The Invention proposes the following set of techniques for identification and separation of layers related to different substances on the paper and the paper itself, based on algorithmic analysis of the spectrums gathered by the multi-spectral scanner:
1. Automated Retrieval of Layers based on cluster-based analysis of spectrums from the whole document;
2. Automated Retrieval of Known Layers using a context-sensitive subset of the "Database of Substances";
3. Creation of database of typical substances (aka the "Database of Substances");
4. Semi-Automated Separation of Layers using point-and-identify layer retrieval technique.
[0016] In addition to this, the Invention proposes two basic algorithms of comparison, identification and separation of spectrums:
1. Algorithm of Adaptive Spectrum Comparison, which makes it possible to compare two spectrums taken in different lighting conditions and to identify them as spectrums of identical or sufficiently close substances;
2. Algorithm of Adaptive Spectrum Dissection, which makes it possible to identify a spectrum as a product of multiple individual substance spectrums in different lighting conditions, and is used to separate overlapping layers.
Algorithm of Adaptive Spectrum Comparison [AASC]:
[0017] The AASC provides a method to identify two spectrums as "identical", taking into account the fact that measured spectrums depend not only on the analyzed material but also on the measurement accuracy and light conditions, which makes gathering mathematically identical spectrums practically impossible.
[0018] The algorithm compares spectrums of different materials, where the "X" axis of the spectrum graph corresponds to the wavelength used for measurement and the "Y" axis contains the "light metrics" parameter.
[0019] We define the light metrics as a value depending primarily on the measured light intensity, but which may take into account the attributes of the detectors used (e.g. their sensitivity and noise level) and the light conditions. Here, we assume that the light metrics is calculated as:
LIGHT METRICS = 100 * MEASURED LIGHT INTENSITY / MAXIMUM LIGHT INTENSITY OF THE SENSOR
although its calculation may be more complex.
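The simple form of the light metrics formula above can be sketched in Python as follows. The function names and the use of a single `sensor_max` value for all wavelengths are assumptions made for illustration; the text itself notes that a real calculation may be more complex.

```python
from typing import List, Sequence


def light_metrics(measured: float, sensor_max: float) -> float:
    """Light metrics for one wavelength: 100 * measured / sensor maximum."""
    if sensor_max <= 0:
        raise ValueError("sensor_max must be positive")
    return 100.0 * measured / sensor_max


def spectrum_to_light_metrics(intensities: Sequence[float],
                              sensor_max: float) -> List[float]:
    """Convert a whole measured spectrum (one value per wavelength)."""
    return [light_metrics(v, sensor_max) for v in intensities]
```

With a hypothetical 10-bit sensor (`sensor_max = 1024`), a reading of 512 would map to a light metrics value of 50.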
[0020] The algorithm defines two spectrums as identical if, for each of the measured wavelengths, the absolute value of the difference between the light metrics parameters of the two spectrums is lower than IC, where IC is a numeric "identity coefficient" parameter that can be freely defined according to the required algorithm sensitivity.
[0021] The value of IC also determines the notion of an "identity channel", defined as the set of all possible spectrums identical to the given spectrum. Graphically, the identity channel can be represented by a set of intervals of height 2*IC imposed on a given spectrum graph. All spectrums whose light metrics fall within the identity channel for each of the measured wavelengths are identical; any spectrum for which the light metrics falls outside the channel for at least one wavelength is different. Comparing spectrums using the identity channel approach is a graphical representation of the AASC.
[0022] Comparing spectrums using the identity channel approach is presented in Figure 2. In that figure, the spectrums of Material 28 and Test Material are considered identical (as they are contained in the same identity channel), while the spectrum of Material 6 is considered different.
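The comparison rule and the identity channel described above can be illustrated with a minimal Python sketch. It assumes both spectrums are already expressed in light metrics units and sampled at the same wavelengths; the function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def identity_channel(spectrum: Sequence[float],
                     ic: float) -> List[Tuple[float, float]]:
    """Per-wavelength interval of height 2*IC centred on the spectrum."""
    return [(v - ic, v + ic) for v in spectrum]


def aasc_identical(a: Sequence[float], b: Sequence[float], ic: float) -> bool:
    """True if |a - b| is lower than IC at every measured wavelength."""
    if len(a) != len(b):
        raise ValueError("spectrums must cover the same wavelengths")
    # Strict inequality: a difference equal to IC counts as "different".
    return all(low < y < high
               for (low, high), y in zip(identity_channel(a, ic), b))
```

Checking that `b` lies strictly inside the channel of `a` is the same test as requiring the absolute difference to be lower than IC, so the graphical and the numeric formulations agree.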
Algorithm of Adaptive Spectrum Dissection [AASD]:
[0023] The AASD is an algorithm for assessing whether a given spectrum (the "compound spectrum") is a combination of the spectrums of two or more individual materials (the "base spectrums").
[0024] The algorithm has three steps:
1. Step 1: Recalculation of the measured spectrums of the base and compound materials into light metrics units (using a normalization formula);
2. Step 2: Calculation of the so-called Mixing Spectrum from the "base spectrums" expressed in light metrics units;
3. Step 3: Comparison of the "compound spectrum" expressed in light metrics units with the Mixing Spectrum using the AASC algorithm. If the spectrums are identical (in the sense of AASC), the compound spectrum is a combination of the base spectrums; otherwise it is not.
[0025] Step 1: Each of the measured spectrums (the "compound spectrum" and the "base spectrums") is recalculated (for each of the wavelengths) into light metrics units using the following formula:
LIGHT METRICS = 100 * (MEASURED LIGHT INTENSITY - MINIMUM SENSOR INTENSITY) / (MAXIMUM SENSOR INTENSITY - MINIMUM SENSOR INTENSITY)
[0026] Step 2: The Mixing Spectrum (the spectrum that should theoretically result from mixing the base spectrums) is calculated for each of the wavelengths using the following formula:
LIGHT METRICS (MIXING SPECTRUM) = LIGHT METRICS (BASE SPECTRUM 1) * LIGHT METRICS (BASE SPECTRUM 2) * ... * LIGHT METRICS (BASE SPECTRUM N)
[0027] Step 3: The AASC is used to compare the Mixing Spectrum calculated in Step 2 with the Compound Spectrum calculated in Step 1, using a chosen identity coefficient.
[0028] The graphical representation of Step 2 and Step 3 is presented in Figure 3. In that figure, the "Compound Spectrum" was found to be a combination of the "Ink 1" and "Ink 2" spectrums, as it falls into the identity channel of the calculated Mixing Spectrum.
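The three AASD steps can be sketched in Python as below. One point is an assumption of this sketch and not stated in the text: before multiplying, each light metrics value is treated as a fraction of full scale (value / 100) and the product is scaled back by 100, so that the Mixing Spectrum stays in the same 0 to 100 range as the compound spectrum. All function names are hypothetical.

```python
from math import prod
from typing import List, Sequence


def normalize(intensities: Sequence[float],
              sensor_min: float, sensor_max: float) -> List[float]:
    """Step 1: 100 * (measured - min) / (max - min) for each wavelength."""
    span = sensor_max - sensor_min
    if span <= 0:
        raise ValueError("sensor_max must exceed sensor_min")
    return [100.0 * (v - sensor_min) / span for v in intensities]


def mixing_spectrum(bases: Sequence[Sequence[float]]) -> List[float]:
    """Step 2: per-wavelength product of the base spectrums.

    Assumption: values are multiplied as fractions of full scale.
    """
    return [100.0 * prod(v / 100.0 for v in column) for column in zip(*bases)]


def aasd_is_combination(compound: Sequence[float],
                        bases: Sequence[Sequence[float]],
                        ic: float) -> bool:
    """Step 3: AASC comparison of the compound and Mixing Spectrum."""
    mix = mixing_spectrum(bases)
    return all(abs(c - m) < ic for c, m in zip(compound, mix))
```

For example, two hypothetical inks with light metrics 50 and 50 at some wavelength would give a Mixing Spectrum value of 25 there, and a compound reading of 26 would fall inside the identity channel for IC = 2.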
Technique of Automated Retrieval of Layers [TARL]:
[0029] The TARL is a technique for automated identification of layers in an image scanned using the multi-spectral scanner; it utilizes the AASC algorithm.
[0030] The TARL technique assumes access to the spectrums of all the points in the analyzed fragment of the image and has the following steps:
1. Each scanned spectrum is compared with all the already identified layers using the AASC algorithm.
a. If the spectrum is determined to be identical to a layer, it is marked as belonging to that layer.
b. If it is not identical to any of the layers, the spectrum is considered a new layer and is marked as belonging to that layer.
2. After all the spectrums are processed, the number of spectrums belonging to each identified layer is calculated; if a layer has fewer than NL spectrums assigned, it is considered noise rather than a layer. The parameter NL, which defines the sensitivity of layer vs. noise detection, can be freely chosen depending on the application of the technique.
[0031] The above technique can be applied to the whole image or to chosen fragments and is parameterized by the following main coefficients:
1. The Identity Coefficient (used in AASC calculations);
2. The formula for the calculation of the light metrics;
3. The NL parameter defining noise vs. layer detection sensitivity;
4. The order of the processing of the spectrums (different ordering may in some cases produce slightly different division into layers and noise).
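The TARL steps and parameters listed above can be illustrated with the following Python sketch. Two details are assumptions of the sketch: each layer is represented by the first spectrum assigned to it, and a spectrum joins the first layer it matches. Both choices are one reason the processing order can change the result, as noted in the parameter list.

```python
from typing import List, Sequence, Tuple

Spectrum = Sequence[float]
Layer = Tuple[Spectrum, List[int]]  # (reference spectrum, member indices)


def identical(a: Spectrum, b: Spectrum, ic: float) -> bool:
    """AASC rule: difference lower than IC at every wavelength."""
    return all(abs(x - y) < ic for x, y in zip(a, b))


def tarl(spectrums: Sequence[Spectrum], ic: float,
         nl: int) -> Tuple[List[Layer], List[Layer]]:
    """Return (layers, noise) for spectrums given in processing order."""
    found: List[Layer] = []
    for idx, spectrum in enumerate(spectrums):
        for reference, members in found:
            if identical(spectrum, reference, ic):
                members.append(idx)          # step 1a: joins existing layer
                break
        else:
            found.append((spectrum, [idx]))  # step 1b: starts a new layer
    # Step 2: layers with fewer than NL spectrums are treated as noise.
    layers = [layer for layer in found if len(layer[1]) >= nl]
    noise = [layer for layer in found if len(layer[1]) < nl]
    return layers, noise
```

Calling `tarl` on the same spectrums in a different order may produce different reference spectrums and therefore a slightly different division into layers and noise.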
[0032] The "Database of Substances" is a collection of spectrums of known materials. It may be generated automatically, using the TARL technique, from samples of documents created with known materials (e.g. inks, toners, gels, paints, graphite etc.).
[0033] The Database of Substances may be global or managed by individual users or institutions. In the latter case it makes it possible, using the TARKL technique described below, to determine which materials fall outside the standards used by the given institution, which in turn allows e.g. detection of forgeries or other types of singularities.
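As a rough illustration, a Database of Substances could be held as a mapping from a substance name to its reference spectrum and queried with the AASC rule. The dictionary layout, the function name and the sample entries are hypothetical; the text does not prescribe a storage format.

```python
from typing import Dict, List, Sequence

Spectrum = Sequence[float]


def identical(a: Spectrum, b: Spectrum, ic: float) -> bool:
    """AASC rule: difference lower than IC at every wavelength."""
    return all(abs(x - y) < ic for x, y in zip(a, b))


def match_substances(spectrum: Spectrum,
                     database: Dict[str, Spectrum],
                     ic: float) -> List[str]:
    """Names of all known substances whose spectrum is identical to this one.

    An empty result means the material is not in the database, which for an
    institution-specific database may indicate a non-standard material.
    """
    return [name for name, reference in database.items()
            if identical(spectrum, reference, ic)]


# Hypothetical sample entries, in light metrics units.
database = {"blue ink": [10.0, 40.0, 80.0], "toner": [5.0, 5.0, 6.0]}
```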
Technique of Automated Retrieval of Known Layers [TARKL]:
[0034] TARKL is an extended and modified version of TARL. It uses a chosen subset of the "Database of Substances" used by the given user or institution to pre-determine the layers that are expected to be present in the scanned document, which may improve the accuracy of layer detection and produce a more predictable layer division.
[0035] TARKL uses the following steps:
1. The known layers are added to the list of the detected layers;
2. The complete TARL is executed;
3. The layers with no spectrum assigned are removed.
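The three TARKL steps can be sketched in Python as follows. The dictionary-based layer record and the rule that a spectrum joins the first matching layer are assumptions of this sketch; layers found during the scan that are not in the database get the name `None`.

```python
from typing import Dict, List, Optional, Sequence

Spectrum = Sequence[float]


def identical(a: Spectrum, b: Spectrum, ic: float) -> bool:
    """AASC rule: difference lower than IC at every wavelength."""
    return all(abs(x - y) < ic for x, y in zip(a, b))


def tarkl(spectrums: Sequence[Spectrum], known: Dict[str, Spectrum],
          ic: float, nl: int) -> List[dict]:
    # Step 1: seed the layer list with the known substances.
    layers: List[dict] = [{"name": name, "ref": ref, "members": []}
                          for name, ref in known.items()]
    # Step 2: run TARL over the scanned spectrums.
    for idx, spectrum in enumerate(spectrums):
        for layer in layers:
            if identical(spectrum, layer["ref"], ic):
                layer["members"].append(idx)
                break
        else:
            unnamed: Optional[str] = None
            layers.append({"name": unnamed, "ref": spectrum, "members": [idx]})
    kept = [layer for layer in layers if len(layer["members"]) >= nl]
    # Step 3: drop layers with no spectrum assigned (matters when nl is 0).
    return [layer for layer in kept if layer["members"]]
```

Because the known layers are seeded first, spectrums close to a database entry are always grouped around that entry, which is what makes the resulting division more predictable than in plain TARL.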
Technique of Semi-Automated Separation of Layers [TSASL]:
This technique allows indicating the layer identical to (in the sense of AASC) or containing (in the sense of AASD) the chosen spectrum and showing that layer in the original image. This technique allows visual, point-and-click separation of layers on the screen of the electronic device (computer, tablet etc.).
[0037] The user can influence the layer identification by modifying the parameters listed in the description of the TARL, as well as by switching the AASD algorithm on and off.
[0038] Step 0: If the user chooses to include mixed spectrums in the retrieval, the TARL or TARKL algorithm is first executed on the complete image to detect the existing layers that may be subject to mixing.
[0039] The technique has the following steps:
1. The user chooses a point in the original image;
2. For each of the points in the image, the spectrum of the point is compared with the spectrum of the chosen point using the AASC algorithm.
a. If the spectrum is identical to the spectrum of the chosen point, it is marked as belonging to the chosen layer.
b. If it is not identical, and the user chose to include the mixed layers, the AASD algorithm is applied to the spectrum and each of the layers detected in Step 0 to check whether the spectrum is a composite. If so, it is marked as belonging to the chosen layer.
c. If neither (a) nor (b) applies, the point is discarded.
3. The points that were assigned to the chosen layer are presented to the user.
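The point-and-identify procedure can be sketched in Python as below. The sketch makes two assumptions that the text leaves open: a point counts as a composite when its spectrum matches the mix of the chosen spectrum with one of the Step 0 layers, and the mix is computed on light metrics values treated as fractions of full scale. The names `tsasl`, `is_mix` and `layer_refs` are hypothetical.

```python
from math import prod
from typing import Dict, List, Optional, Sequence, Tuple

Spectrum = Sequence[float]
Point = Tuple[int, int]


def identical(a: Spectrum, b: Spectrum, ic: float) -> bool:
    """AASC rule: difference lower than IC at every wavelength."""
    return all(abs(x - y) < ic for x, y in zip(a, b))


def is_mix(compound: Spectrum, bases: Sequence[Spectrum], ic: float) -> bool:
    """AASD check, assuming values are multiplied as fractions of 100."""
    mix = [100.0 * prod(v / 100.0 for v in col) for col in zip(*bases)]
    return identical(compound, mix, ic)


def tsasl(points: Dict[Point, Spectrum], chosen: Point, ic: float,
          layer_refs: Optional[Sequence[Spectrum]] = None) -> List[Point]:
    """Return the points assigned to the layer of the chosen point.

    layer_refs holds the Step 0 layer spectrums; None switches AASD off.
    """
    target = points[chosen]
    selected: List[Point] = []
    for xy, spectrum in points.items():
        if identical(spectrum, target, ic):                    # step 2a
            selected.append(xy)
        elif layer_refs is not None and any(                   # step 2b
                is_mix(spectrum, [target, ref], ic) for ref in layer_refs):
            selected.append(xy)
        # step 2c: otherwise the point is discarded
    return selected
```

Passing `layer_refs=None` corresponds to the user switching the AASD algorithm off, in which case only directly identical points are returned.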
Retrieval of damaged content of the documents:
[0040] The above mentioned techniques may be utilized in order to retrieve the content of the documents that were damaged, for instance by water or fire, as long as the paper base is in sufficient condition to preserve the traces of the inks, paints or other printing/writing materials originally used in the document.
[0041] The restoration of the contents of such documents has the following steps:
1. The document is scanned using multi-spectral techniques;
2. Each of the pages is processed using the TARL or TARKL technique. This allows detection of layers whose chemical and physical structure differs from the surroundings, even if the content is no longer visible to the eye;
3. The separated layers are presented to the users or extracted automatically, discarding the noise.
Automated detection of signature and stamp forgeries:
[0042] This technique allows automatic detection of certain types of forgeries or singularities in the hand-written, hand-signed or stamped documents.
[0043] The technique requires interaction with a user, providing him/her with layer-based information to be assessed.
[0044] The technique has the following steps:
1. The document is scanned using a multi-spectral scanner;
2. The layers are extracted using the TARL or TARKL technique;
3. The identified layers are presented to the user, focusing on the elements that are similar in the visible spectrum, but not identical (in the sense of the AASC algorithm);
4. If the Database of Substances is available, the user may also be presented with information on the material used in the different layers.
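Step 3 above, selecting elements that look alike in the visible range but are not identical over the full measured range, could be sketched in Python as follows. The visible range limits (380 to 750 nm), the function name and the sample layer names are assumptions for illustration; the text does not specify how visible similarity is measured.

```python
from itertools import combinations
from typing import Dict, List, Sequence, Tuple

Spectrum = Sequence[float]

VISIBLE_NM = (380.0, 750.0)  # assumed limits of the visible range


def suspicious_pairs(wavelengths: Sequence[float],
                     layers: Dict[str, Spectrum],
                     ic: float) -> List[Tuple[str, str]]:
    """Pairs of layers identical in the visible range but not overall."""
    visible = [i for i, w in enumerate(wavelengths)
               if VISIBLE_NM[0] <= w <= VISIBLE_NM[1]]
    pairs: List[Tuple[str, str]] = []
    for (name_a, a), (name_b, b) in combinations(layers.items(), 2):
        same_visible = all(abs(a[i] - b[i]) < ic for i in visible)
        same_overall = all(abs(x - y) < ic for x, y in zip(a, b))
        if same_visible and not same_overall:
            pairs.append((name_a, name_b))  # candidates to show the user
    return pairs
```

The returned pairs are only candidates; in line with the technique, the final assessment is left to the user.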
[0045] This technique is especially effective in distinguishing handwritten signatures or real stamps from their facsimiles, as well as in distinguishing different inks that look the same to the naked eye.
Advantageous Effects
[0046] The Invention allows automated separation and identification of layers of substances (different types of inks, paints, toners, organic and non-organic substances etc.), thus providing a way to separate and identify various "business layers" of the document, like: print, stamps, signatures, annotations etc.
[0047] The proposed techniques allow in many cases precise separation of layers that overlap, hiding the content in the layers below. In such cases the Invention allows retrieval of both the layers above and below.
[0048] The Invention may be utilized in business (e.g. reading the content of stamps, signatures etc.), in criminology and forensics (e.g. to identify forged or amended official documents, to retrieve erased content etc.), in archiving and digitization (e.g. to retrieve crossed-out entries in registers, signatures and author notes, or content damaged by water, fire etc.), in libraries (to retrieve the original content of books with handwritten notes), to detect singularities in documents (which may be indications of forgeries) and more.
Brief Description of Drawings
[0049] Figure 1: Spectrums of different materials gathered by a multi-spectral scanner.
[0050] Figure 2: Graphical representation of the Algorithm of Adaptive Spectrum Comparison.
[0051] Figure 3: A graphical representation of the Algorithm of Adaptive Spectrum Dissection.
[0052] Figure 4: A graphical representation of Technique of Automated Retrieval of Layers. In this example three layers were identified and one spectrum (Material 13) was considered noise.
[0053] Figure 5: A result of TSASL technique with and without the Algorithm of Adaptive Spectrum Dissection (AASD).
Claims
Claim 1. Applying multi-spectral scanning to documents in order to automatically retrieve the layers of substances which were used to produce their content.
Claim 2. Applying the method described in Claim 1 to separate the "business layers" of the content, which relate to the individual phases of creating the content (including: printing, writing, stamping, marking, damage etc.).
Claim 3. A method of automated identification of the primary layers of the document utilizing the technique of Automated Retrieval of Layers.
Claim 4. A method of automated identification of layers of the documents utilizing the technique of Automated Retrieval of Known Layers.
Claim 5. A method of human-operated separation of layers of the documents utilizing the technique of Semi-Automated Separation of Layers.
Claim 6. The algorithm of identifying the spectrums of a set of points in the document as a product of spectrums of individual substances utilizing the Algorithm of Adaptive Spectrum Dissection.
Claim 7. The algorithm of comparing the spectrums of a set of points in the document utilizing the Algorithm of Adaptive Spectrum Comparison.
Claim 8. Applying the methods described in Claim 1 and Claim 2 to automatically retrieve the content of signatures and handwritten notes in a document.
Claim 9. Applying the methods described in Claim 1 and Claim 2 to automatically retrieve the content of stamps in a document.
Claim 10. Applying the methods described in Claim 1 and Claim 2 to automatically retrieve the content of crossed-out text in a document.
Claim 11. Applying the methods described in Claim 1 and Claim 2 to automatically retrieve the content of damaged fragments of a document.
Claim 12. Applying the methods described in Claim 1 and Claim 2 to automatically retrieve the drawings and paintings in a document.
Claim 13. Applying the methods described in Claim 1, Claim 2, Claim 3, Claim 4 and Claim 5 to detect forged signatures, forged stamps and facsimiles.
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PLP.412148 | 2015-04-28 | ||
PL412148A PL412148A1 (en) | 2015-04-28 | 2015-04-28 | Method for automatic separation and identification of a substance layer on paper, using the multispectral analysis |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2016175666A1 (en) | 2016-11-03 |
Family
ID=53794463
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/PL2015/000073 WO2016175666A1 (en) | 2015-04-28 | 2015-04-29 | A method of automatic separation and identification of layers of substances on paper using a multi-spectral analysis approach |
Country Status (2)
Country | Link |
---|---|
PL (1) | PL412148A1 (en) |
WO (1) | WO2016175666A1 (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110348527A (en) * | 2019-07-16 | 2019-10-18 | 百度在线网络技术(北京)有限公司 | A kind of method for amalgamation processing of picture, device, equipment and storage medium |
US11276250B2 (en) | 2019-10-23 | 2022-03-15 | International Business Machines Corporation | Recognition for overlapped patterns |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2011047235A1 (en) * | 2009-10-15 | 2011-04-21 | Authentix, Inc. | Document sensor |
WO2011110839A1 (en) * | 2010-03-09 | 2011-09-15 | Isis Innovation Limited | Multi-spectral scanning system |
US20140267754A1 (en) * | 2013-03-15 | 2014-09-18 | Luxtreme Limited | Method for applying a security marking to an object and a hyper-spectral imaging reader |
Non-Patent Citations (2)
Title |
---|
JAY ARRE TOQUE ET AL: "Pigment identification by analytical imaging using multispectral images", IMAGE PROCESSING (ICIP), 2009 16TH IEEE INTERNATIONAL CONFERENCE ON, IEEE, PISCATAWAY, NJ, USA, 7 November 2009 (2009-11-07), pages 2861 - 2864, XP031629172, ISBN: 978-1-4244-5653-6 * |
KEITH T. KNOX: "Enhancement of overwritten text in the Archimedes Palimpsest", PROCEEDINGS OF SPIE, vol. 6810, 1 January 2008 (2008-01-01), pages 681004, XP055003639, ISSN: 0277-786X, DOI: 10.1117/12.766679 * |
Also Published As
Publication number | Publication date |
---|---|
PL412148A1 (en) | 2016-11-07 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
 | 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 15748074; Country of ref document: EP; Kind code of ref document: A1 |
 | NENP | Non-entry into the national phase | Ref country code: DE |
 | 122 | Ep: pct application non-entry in european phase | Ref document number: 15748074; Country of ref document: EP; Kind code of ref document: A1 |