CN116363492A - OCR-based anti-counterfeiting identification method and identification system for document files - Google Patents


Info

Publication number
CN116363492A
Authority
CN
China
Prior art keywords
counterfeiting
counterfeiting mark
dot matrix
mark
file
Legal status
Granted
Application number
CN202310645347.0A
Other languages
Chinese (zh)
Other versions
CN116363492B (en)
Inventor
朱楠楠
刘飞飞
Current Assignee
Jintian Industrial Development Shandong Group Co ltd
Original Assignee
Jintian Industrial Development Shandong Group Co ltd
Application filed by Jintian Industrial Development Shandong Group Co ltd
Priority to CN202310645347.0A
Publication of CN116363492A
Application granted
Publication of CN116363492B
Legal status: Active


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/95 Pattern authentication; Markers therefor; Forgery detection
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/50 Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F16/51 Indexing; Data structures therefor; Storage structures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/50 Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F16/53 Querying
    • G06F16/532 Query formulation, e.g. graphical querying
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/50 Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F16/58 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/583 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition
    • G06V30/14 Image acquisition
    • G06V30/1444 Selective acquisition, locating or processing of specific regions, e.g. highlighted text, fiducial marks or predetermined fields
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition
    • G06V30/19 Recognition using electronic means
    • G06V30/19007 Matching; Proximity measures
    • G06V30/19093 Proximity measures, i.e. similarity or distance measures
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Library & Information Science (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Editing Of Facsimile Originals (AREA)

Abstract

The invention belongs to the technical field of document anti-counterfeiting and provides an OCR-based anti-counterfeiting identification method and system for document files. The method comprises the following steps: performing optical character recognition processing on a document file to obtain a black-and-white dot matrix image file; determining the page count and page size of the black-and-white dot matrix image file, and scaling the page size to a standard size; randomly retrieving one piece of anti-counterfeiting mark generation information from an anti-counterfeiting mark library according to the page count, the generation information comprising an anti-counterfeiting mark number, a plurality of anti-counterfeiting mark page numbers and anti-counterfeiting mark coordinates; determining anti-counterfeiting application areas in the black-and-white dot matrix image file according to the anti-counterfeiting mark page numbers and coordinates, and stacking the plurality of anti-counterfeiting application areas to obtain a dot matrix anti-counterfeiting mark; and adding the anti-counterfeiting mark number and the dot matrix anti-counterfeiting mark to the document file. Because the dot matrix anti-counterfeiting mark is derived from the content of the document, each document carries a unique mark that is difficult to imitate, giving a good anti-counterfeiting effect.

Description

OCR-based anti-counterfeiting identification method and identification system for document files
Technical Field
The invention relates to the technical field of document anti-counterfeiting, and in particular to an OCR-based anti-counterfeiting identification method and system for document files.
Background
To ensure the authenticity and reliability of documents, anti-counterfeiting marks are added to them. Common anti-counterfeiting marks are either watermarks or special materials. Typical anti-counterfeiting materials include color-changing materials, metal security threads, peel-off and scratch-off films, and laser holographic labels; these materials are costly to use, which hinders wide adoption. Conventional watermark patterns, on the other hand, are uniform across documents, easy to imitate, and offer low security. There is therefore a need for an OCR-based anti-counterfeiting identification method and system for document files that solves these problems.
Disclosure of Invention
In view of the shortcomings of the prior art, the invention aims to provide an OCR-based anti-counterfeiting identification method and system for document files that solve the problems described in the background.
The invention is implemented as follows. An OCR-based anti-counterfeiting identification method for document files comprises the following steps:
performing optical character recognition processing on a document file to obtain a black-and-white dot matrix image file;
determining the page count and page size of the black-and-white dot matrix image file, and scaling the page size to a standard size;
randomly retrieving one piece of anti-counterfeiting mark generation information from an anti-counterfeiting mark library according to the page count, wherein the generation information comprises an anti-counterfeiting mark number, a plurality of anti-counterfeiting mark page numbers and anti-counterfeiting mark coordinates;
determining anti-counterfeiting application areas in the black-and-white dot matrix image file according to the anti-counterfeiting mark page numbers and coordinates, and stacking the plurality of anti-counterfeiting application areas to obtain a dot matrix anti-counterfeiting mark;
adding the anti-counterfeiting mark number and the dot matrix anti-counterfeiting mark to the document file;
receiving an anti-counterfeiting identification command for the document file;
and identifying and judging the dot matrix anti-counterfeiting mark in the document file, and generating an identification result.
As a further scheme of the invention, the step of determining the anti-counterfeiting application areas in the black-and-white dot matrix image file according to the anti-counterfeiting mark page numbers and coordinates specifically comprises:
determining, in order of their values, the black-and-white dot matrix images corresponding to the anti-counterfeiting mark page numbers;
determining an anti-counterfeiting application area in each of these black-and-white dot matrix images according to the anti-counterfeiting mark coordinates, the anti-counterfeiting mark page numbers and coordinates corresponding one to one;
and arranging the resulting anti-counterfeiting application areas according to their corresponding anti-counterfeiting mark page numbers.
As a further scheme of the invention, the step of stacking the plurality of anti-counterfeiting application areas to obtain the dot matrix anti-counterfeiting mark specifically comprises:
performing image processing on the arranged anti-counterfeiting application areas to convert their white dots into transparent pixels;
and stacking all the anti-counterfeiting application areas in the arranged order to obtain the dot matrix anti-counterfeiting mark.
As a further scheme of the invention, the step of identifying and judging the dot matrix anti-counterfeiting mark in the document file specifically comprises:
retrieving the corresponding anti-counterfeiting mark generation information according to the anti-counterfeiting mark number in the document file;
performing optical character recognition processing on the document file to obtain a black-and-white dot matrix image file, and processing this image file according to the anti-counterfeiting mark generation information to obtain a verification dot matrix anti-counterfeiting mark;
and comparing the similarity between the dot matrix anti-counterfeiting mark in the document file and the verification dot matrix anti-counterfeiting mark.
As a further scheme of the invention, the step of comparing the similarity between the dot matrix anti-counterfeiting mark in the document file and the verification dot matrix anti-counterfeiting mark specifically comprises:
calculating, with a hash method based on the discrete cosine transform (DCT), the hash values h_1 and h_2 of the dot matrix anti-counterfeiting mark in the document file and of the verification dot matrix anti-counterfeiting mark, respectively;
calculating the Hamming distance dis_h between h_1 and h_2;
and calculating the similarity between the dot matrix anti-counterfeiting mark in the document file and the verification dot matrix anti-counterfeiting mark from the Hamming distance dis_h.
Another object of the invention is to provide an OCR-based anti-counterfeiting identification system for document files, the system comprising:
an optical character recognition module, configured to perform optical character recognition processing on a document file to obtain a black-and-white dot matrix image file;
a page size scaling module, configured to determine the page count and page size of the black-and-white dot matrix image file and scale the page size to the standard size;
an information random retrieval module, configured to randomly retrieve one piece of anti-counterfeiting mark generation information from the anti-counterfeiting mark library according to the page count, the generation information comprising an anti-counterfeiting mark number, a plurality of anti-counterfeiting mark page numbers and anti-counterfeiting mark coordinates;
an anti-counterfeiting mark generation module, configured to determine anti-counterfeiting application areas in the black-and-white dot matrix image file according to the anti-counterfeiting mark page numbers and coordinates, and stack the plurality of anti-counterfeiting application areas to obtain a dot matrix anti-counterfeiting mark;
an anti-counterfeiting mark adding module, configured to add the anti-counterfeiting mark number and the dot matrix anti-counterfeiting mark to the document file;
an anti-counterfeiting identification command module, configured to receive an anti-counterfeiting identification command for the document file;
and an anti-counterfeiting mark verification module, configured to identify and judge the dot matrix anti-counterfeiting mark in the document file and generate an identification result.
As a further scheme of the invention, the anti-counterfeiting mark generation module comprises:
a dot matrix image determining unit, configured to determine, in order of their values, the black-and-white dot matrix images corresponding to the anti-counterfeiting mark page numbers;
an anti-counterfeiting application area unit, configured to determine an anti-counterfeiting application area in each black-and-white dot matrix image according to the anti-counterfeiting mark coordinates, the anti-counterfeiting mark page numbers and coordinates corresponding one to one;
and an anti-counterfeiting area arrangement unit, configured to arrange the resulting anti-counterfeiting application areas according to their corresponding anti-counterfeiting mark page numbers.
As a further scheme of the invention, the anti-counterfeiting mark generation module further comprises:
an anti-counterfeiting image processing unit, configured to perform image processing on the arranged anti-counterfeiting application areas and convert their white dots into transparent pixels;
and an anti-counterfeiting mark generation unit, configured to stack all the anti-counterfeiting application areas in the arranged order to obtain the dot matrix anti-counterfeiting mark.
As a further scheme of the invention, the anti-counterfeiting mark verification module comprises:
a designated information retrieval unit, configured to retrieve the corresponding anti-counterfeiting mark generation information according to the anti-counterfeiting mark number in the document file;
a verification mark generation unit, configured to perform optical character recognition processing on the document file to obtain a black-and-white dot matrix image file, and to process this image file according to the anti-counterfeiting mark generation information to obtain a verification dot matrix anti-counterfeiting mark;
and a mark similarity comparison unit, configured to compare the similarity between the dot matrix anti-counterfeiting mark in the document file and the verification dot matrix anti-counterfeiting mark.
Compared with the prior art, the invention has the following beneficial effects:
the invention randomly retrieves one piece of anti-counterfeiting mark generation information from the anti-counterfeiting mark library, the generation information comprising an anti-counterfeiting mark number, a plurality of anti-counterfeiting mark page numbers and anti-counterfeiting mark coordinates; determines anti-counterfeiting application areas in the black-and-white dot matrix image file according to the anti-counterfeiting mark page numbers and coordinates, and stacks them to obtain a dot matrix anti-counterfeiting mark; and adds the anti-counterfeiting mark number and the dot matrix anti-counterfeiting mark to the document file. Different generation information therefore yields different dot matrix anti-counterfeiting marks, and because each mark is derived from the content of its document, every document carries a unique mark that is difficult to imitate, giving a good anti-counterfeiting effect. The invention requires no special materials, so its cost is low and it is suitable for wide adoption.
Drawings
Fig. 1 is a flowchart of an OCR-based anti-counterfeiting identification method for document files.
Fig. 2 is a flowchart of determining the anti-counterfeiting application areas in the black-and-white dot matrix image file in the method of Fig. 1.
Fig. 3 is a flowchart of stacking a plurality of anti-counterfeiting application areas to obtain a dot matrix anti-counterfeiting mark in the method of Fig. 1.
Fig. 4 is a flowchart of identifying and judging the dot matrix anti-counterfeiting mark in a document file in the method of Fig. 1.
Fig. 5 is a flowchart of the similarity comparison in the method of Fig. 1.
Fig. 6 is a schematic structural diagram of an OCR-based anti-counterfeiting identification system for document files.
Fig. 7 is a schematic structural diagram of the anti-counterfeiting mark generation module in the system of Fig. 6.
Fig. 8 is a schematic structural diagram of the anti-counterfeiting mark verification module in the system of Fig. 6.
Detailed Description
To make the objects, technical solutions and advantages of the invention clearer, the invention is described in further detail below with reference to the accompanying drawings and specific embodiments. It should be understood that the specific embodiments described herein serve only to illustrate the invention and are not intended to limit its scope.
Specific implementations of the invention are described in detail below in connection with specific embodiments.
As shown in Fig. 1, an embodiment of the invention provides an OCR-based anti-counterfeiting identification method for document files, comprising the following steps:
S100, performing optical character recognition processing on the document file to obtain a black-and-white dot matrix image file;
S200, determining the page count and page size of the black-and-white dot matrix image file, and scaling the page size to a standard size;
S300, randomly retrieving one piece of anti-counterfeiting mark generation information from an anti-counterfeiting mark library according to the page count, wherein the generation information comprises an anti-counterfeiting mark number, a plurality of anti-counterfeiting mark page numbers and anti-counterfeiting mark coordinates;
S400, determining anti-counterfeiting application areas in the black-and-white dot matrix image file according to the anti-counterfeiting mark page numbers and coordinates, and stacking the plurality of anti-counterfeiting application areas to obtain a dot matrix anti-counterfeiting mark;
S500, adding the anti-counterfeiting mark number and the dot matrix anti-counterfeiting mark to the document file;
S600, receiving an anti-counterfeiting identification command for the document file;
S700, identifying and judging the dot matrix anti-counterfeiting mark in the document file, and generating an identification result.
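Steps S100 and S200 do not specify the image operations involved. As a minimal sketch, assuming each page is already available as a grayscale grid of values from 0 (black) to 255 (white), with scanning and character recognition out of scope, binarization and scaling to the standard size might look as follows. The fixed threshold of 128, nearest-neighbour resampling and the names `binarize` and `scale_to_standard` are illustrative assumptions, not part of the patent.

```python
from typing import List

Bitmap = List[List[int]]  # 1 = black dot, 0 = white dot


def binarize(gray: List[List[int]], threshold: int = 128) -> Bitmap:
    """Map a grayscale page (0 = black, 255 = white) to a black-and-white dot matrix.

    Assumption: a single global threshold; a real OCR pipeline would also
    denoise and correct skew before this step.
    """
    return [[1 if px < threshold else 0 for px in row] for row in gray]


def scale_to_standard(page: Bitmap, std_w: int, std_h: int) -> Bitmap:
    """Nearest-neighbour resampling of a dot matrix page to the standard size."""
    src_h, src_w = len(page), len(page[0])
    return [
        [page[y * src_h // std_h][x * src_w // std_w] for x in range(std_w)]
        for y in range(std_h)
    ]


if __name__ == "__main__":
    gray_page = [
        [255, 0, 255, 0],
        [0, 255, 0, 255],
    ]
    bw = binarize(gray_page)
    print(bw)  # [[0, 1, 0, 1], [1, 0, 1, 0]]
    print(scale_to_standard(bw, std_w=2, std_h=4))  # [[0, 0], [0, 0], [1, 1], [1, 1]]
```

Nearest-neighbour resampling keeps every output dot strictly black or white, so the scaled page remains a valid dot matrix without a second thresholding pass.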
To ensure the authenticity and reliability of documents, anti-counterfeiting marks are added to them. Common marks are either watermarks or special materials such as color-changing materials, metal security threads, peel-off and scratch-off films, and laser holographic labels. Such materials are costly and therefore hard to adopt widely, while conventional watermark patterns are uniform, easy to imitate, and offer low security.
In this embodiment, the anti-counterfeiting mark is added to and identified on an electronic document. If the document is on paper, it is first scanned to obtain an electronic document file. The document file is then subjected to optical character recognition processing to obtain a black-and-white dot matrix image file; the OCR preprocessing mainly comprises graying, binarization, noise removal and skew correction, and OCR (Optical Character Recognition) technology optically converts the characters in the document into a black-and-white dot matrix image file. Next, the page count and page size of the black-and-white dot matrix image file are read and each page is scaled to the standard size, a fixed value set in advance with a fixed length and width. One piece of anti-counterfeiting mark generation information is then randomly retrieved from the anti-counterfeiting mark library according to the page count: every anti-counterfeiting mark page number in it must be less than or equal to the page count of the dot matrix image file, and the page numbers correspond one to one with the anti-counterfeiting mark coordinates. The anti-counterfeiting mark library is established in advance and contains many different pieces of generation information, which makes the resulting marks difficult to imitate. Next, the anti-counterfeiting application areas in the black-and-white dot matrix image file are determined according to the anti-counterfeiting mark page numbers and coordinates, and the application areas are stacked to obtain the dot matrix anti-counterfeiting mark. The mark is therefore tied to the content of the document, each document carries a unique dot matrix anti-counterfeiting mark, and the anti-counterfeiting effect is excellent. Finally, the anti-counterfeiting mark number and the dot matrix anti-counterfeiting mark are added to the document file, for example as a watermark or in the header or footer. When the dot matrix anti-counterfeiting mark on a document file needs to be checked for counterfeiting, an anti-counterfeiting identification command is entered for that document, and the embodiment then identifies and judges the dot matrix anti-counterfeiting mark in it.
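The text states only that one entry is drawn at random from a pre-built library and that its page numbers must not exceed the page count. A minimal sketch under those two constraints follows. The `MarkGenInfo` record layout, the in-memory list standing in for the library, the mark numbers such as `AC-0001` and the use of `random.Random` are all hypothetical; a production system would more likely query a persistent database and use a cryptographically secure source of randomness.

```python
import random
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class MarkGenInfo:
    """One entry of the anti-counterfeiting mark library (hypothetical layout)."""
    mark_number: str
    # (page number, (x, y) coordinate) pairs: page numbers and coordinates
    # correspond one to one, as the description requires.
    placements: Tuple[Tuple[int, Tuple[int, int]], ...]


def retrieve_gen_info(library: List[MarkGenInfo], page_count: int,
                      rng: Optional[random.Random] = None) -> MarkGenInfo:
    """Randomly pick an entry whose page numbers all fit within page_count."""
    rng = rng or random.Random()
    usable = [info for info in library
              if all(1 <= page <= page_count for page, _ in info.placements)]
    if not usable:
        raise ValueError(f"no library entry fits a {page_count}-page document")
    return rng.choice(usable)


if __name__ == "__main__":
    library = [
        # Uses the page/coordinate example from the embodiment below.
        MarkGenInfo("AC-0001", ((2, (12, 6)), (5, (5, 8)), (8, (8, 12)))),
        MarkGenInfo("AC-0002", ((1, (3, 3)), (2, (7, 4)))),
    ]
    # A 3-page document can only use AC-0002, whose highest page number is 2.
    print(retrieve_gen_info(library, page_count=3).mark_number)  # AC-0002
```

Filtering before sampling keeps the draw uniform over the entries that are actually valid for the document, rather than retrying random picks until one fits.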
As shown in Fig. 2, in a preferred embodiment of the invention, the step of determining the anti-counterfeiting application areas in the black-and-white dot matrix image file according to the anti-counterfeiting mark page numbers and coordinates specifically comprises:
S401, determining, in order of their values, the black-and-white dot matrix images corresponding to the anti-counterfeiting mark page numbers;
S402, determining an anti-counterfeiting application area in each of these black-and-white dot matrix images according to the anti-counterfeiting mark coordinates, the anti-counterfeiting mark page numbers and coordinates corresponding one to one;
S403, arranging the resulting anti-counterfeiting application areas according to their corresponding anti-counterfeiting mark page numbers.
In this embodiment, the black-and-white dot matrix images corresponding to the anti-counterfeiting mark page numbers are first determined in order of the page-number values. For example, suppose a piece of anti-counterfeiting mark generation information contains the page numbers and coordinates page 2-(12, 6), page 5-(5, 8) and page 8-(8, 12). The black-and-white dot matrix images of pages 2, 5 and 8 are retrieved, and the anti-counterfeiting application areas are then determined from the coordinates: the local patterns at (12, 6) on page 2, (5, 8) on page 5 and (8, 12) on page 8 are taken as the three anti-counterfeiting application areas. Each area is centered on its anti-counterfeiting mark coordinate and extends a set length (the radius) around it, the set length being a value fixed in advance.
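The description centers each application area on its mark coordinate with a preset "radius", but does not say whether the window is square or circular, nor whether coordinates count pixels or larger grid cells. The sketch below assumes a square window of side 2r + 1 on a 0/1 bitmap (1 = black), treats coordinates as (x, y) pixel offsets from the top-left corner, pads dots that fall outside the page with white, and returns the areas ordered by page number (S401 to S403). The names `crop_area` and `application_areas` are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

Bitmap = List[List[int]]  # 1 = black dot, 0 = white dot


def crop_area(page: Bitmap, center: Tuple[int, int], r: int) -> Bitmap:
    """Cut a (2r+1) x (2r+1) square centred on (x, y); out-of-page dots count as white."""
    cx, cy = center
    h, w = len(page), len(page[0])
    return [
        [page[y][x] if 0 <= y < h and 0 <= x < w else 0
         for x in range(cx - r, cx + r + 1)]
        for y in range(cy - r, cy + r + 1)
    ]


def application_areas(pages: Dict[int, Bitmap],
                      placements: Sequence[Tuple[int, Tuple[int, int]]],
                      r: int) -> List[Bitmap]:
    """S401-S403: look up each mark page, crop around its coordinate, order by page number."""
    ordered = sorted(placements, key=lambda p: p[0])
    return [crop_area(pages[page_no], xy, r) for page_no, xy in ordered]


if __name__ == "__main__":
    # Two tiny 3x3 pages; page numbers are 1-based as in the description.
    pages = {
        1: [[0, 1, 0], [0, 1, 0], [0, 1, 0]],
        2: [[0, 0, 0], [1, 1, 1], [0, 0, 0]],
    }
    areas = application_areas(pages, [(2, (1, 1)), (1, (1, 1))], r=1)
    print(areas[0])  # page 1 comes first: [[0, 1, 0], [0, 1, 0], [0, 1, 0]]
```

Because every window has the same size, the resulting areas can be stacked pixel by pixel in the next step without further alignment.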
As shown in Fig. 3, in a preferred embodiment of the invention, the step of stacking the plurality of anti-counterfeiting application areas to obtain the dot matrix anti-counterfeiting mark specifically comprises:
S404, performing image processing on the arranged anti-counterfeiting application areas to convert their white dots into transparent pixels;
S405, stacking all the anti-counterfeiting application areas in the arranged order to obtain the dot matrix anti-counterfeiting mark.
In this embodiment, the anti-counterfeiting application areas undergo image processing. Each area is itself a black-and-white dot matrix; its white dots are converted into transparent pixels while its black dots are left unchanged. All the areas are then stacked in the arranged order to obtain the dot matrix anti-counterfeiting mark: the first area forms the bottom layer, the second area the layer directly above it, and so on. After stacking, the pattern formed by all the black dots is the dot matrix anti-counterfeiting mark.
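Because every layer is binary, with white made transparent and black left opaque, stacking the layers amounts to a pixel-wise logical OR of their black dots; for purely black-and-white layers the result therefore does not depend on the stacking order. The sketch below nonetheless stacks bottom to top as the description specifies. It assumes all areas have the same size (which a fixed radius implies) and uses `None` for a transparent pixel; `make_transparent` and `stack_areas` are hypothetical names.

```python
from typing import List, Optional

Bitmap = List[List[int]]            # 1 = black dot, 0 = white dot
Layer = List[List[Optional[int]]]   # None = transparent, 1 = black


def make_transparent(area: Bitmap) -> Layer:
    """S404: white dots become transparent (None); black dots are kept."""
    return [[1 if px else None for px in row] for row in area]


def stack_areas(areas: List[Bitmap]) -> Bitmap:
    """S405: lay the areas bottom-to-top in order; any black dot shows through."""
    layers = [make_transparent(a) for a in areas]
    h, w = len(areas[0]), len(areas[0][0])
    mark = [[0] * w for _ in range(h)]
    for layer in layers:                      # first area is the bottom layer
        for y in range(h):
            for x in range(w):
                if layer[y][x] is not None:   # an opaque (black) dot covers what is below
                    mark[y][x] = layer[y][x]
    return mark


if __name__ == "__main__":
    vertical = [[0, 1, 0], [0, 1, 0], [0, 1, 0]]
    horizontal = [[0, 0, 0], [1, 1, 1], [0, 0, 0]]
    print(stack_areas([vertical, horizontal]))
    # [[0, 1, 0], [1, 1, 1], [0, 1, 0]]
```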
As shown in Fig. 4, in a preferred embodiment of the invention, the step of identifying and judging the dot matrix anti-counterfeiting mark in the document file specifically comprises:
S701, retrieving the corresponding anti-counterfeiting mark generation information according to the anti-counterfeiting mark number in the document file;
S702, performing optical character recognition processing on the document file to obtain a black-and-white dot matrix image file, and processing this image file according to the anti-counterfeiting mark generation information to obtain a verification dot matrix anti-counterfeiting mark;
S703, comparing the similarity between the dot matrix anti-counterfeiting mark in the document file and the verification dot matrix anti-counterfeiting mark.
In this embodiment, when the anti-counterfeiting mark number and the dot matrix anti-counterfeiting mark in a document file are to be verified, the corresponding anti-counterfeiting mark generation information is first retrieved from the anti-counterfeiting mark library according to the anti-counterfeiting mark number. The earlier steps are then repeated: the document file undergoes optical character recognition processing to obtain a black-and-white dot matrix image file, which is processed according to the generation information to obtain a dot matrix anti-counterfeiting mark that serves as the verification dot matrix anti-counterfeiting mark. Finally, the dot matrix anti-counterfeiting mark in the document file is compared with the verification mark to determine authenticity; if the two are judged to be the same, the document file is considered authentic.
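The verification flow S701 to S703 can be sketched end to end under several assumptions. The library is modelled as a dictionary from mark number to (page, coordinate) placements. The pages are taken to be the re-recognised, binarized and rescaled pages of the document under test. Regeneration reuses the square-window crop and black-dot OR described above. The default comparison is a simple pixel-agreement ratio that stands in for, and is not, the patent's DCT hash. The 98% threshold comes from the example below, and all function names are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

Bitmap = List[List[int]]                 # 1 = black dot, 0 = white dot
Placement = Tuple[int, Tuple[int, int]]  # (page number, (x, y))


def regenerate_mark(pages: Dict[int, Bitmap], placements: List[Placement], r: int) -> Bitmap:
    """Rebuild the mark: crop a (2r+1)-square per placement and OR the black dots."""
    size = 2 * r + 1
    mark = [[0] * size for _ in range(size)]
    for page_no, (cx, cy) in sorted(placements):
        page = pages[page_no]
        for dy in range(size):
            for dx in range(size):
                y, x = cy - r + dy, cx - r + dx
                if 0 <= y < len(page) and 0 <= x < len(page[0]) and page[y][x]:
                    mark[dy][dx] = 1
    return mark


def pixel_agreement(a: Bitmap, b: Bitmap) -> float:
    """Placeholder similarity: fraction of equal dots (not the patent's DCT hash)."""
    cells = [(p, q) for ra, rb in zip(a, b) for p, q in zip(ra, rb)]
    return sum(p == q for p, q in cells) / len(cells)


def verify(embedded_mark: Bitmap, mark_number: str,
           library: Dict[str, List[Placement]], pages: Dict[int, Bitmap],
           r: int, threshold: float = 0.98,
           similarity: Callable[[Bitmap, Bitmap], float] = pixel_agreement) -> bool:
    """S701-S703: look up the generation info, rebuild the verification mark, compare."""
    placements = library[mark_number]                                  # S701
    verification_mark = regenerate_mark(pages, placements, r)          # S702
    return similarity(embedded_mark, verification_mark) > threshold    # S703


if __name__ == "__main__":
    pages = {1: [[0, 1, 0], [0, 1, 0], [0, 1, 0]],
             2: [[0, 0, 0], [1, 1, 1], [0, 0, 0]]}
    library = {"AC-0002": [(1, (1, 1)), (2, (1, 1))]}
    genuine = [[0, 1, 0], [1, 1, 1], [0, 1, 0]]
    forged = [[1, 0, 1], [0, 1, 0], [1, 0, 1]]
    print(verify(genuine, "AC-0002", library, pages, r=1))  # True
    print(verify(forged, "AC-0002", library, pages, r=1))   # False
```

Passing the comparison in as the `similarity` parameter keeps the lookup and regeneration steps unchanged when the placeholder is replaced by a DCT-hash comparison.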
As shown in Fig. 5, in a preferred embodiment of the invention, the step of comparing the similarity between the dot matrix anti-counterfeiting mark in the document file and the verification dot matrix anti-counterfeiting mark specifically comprises:
S7031, calculating, with a hash method based on the discrete cosine transform, the hash values h_1 and h_2, where h_1 is the hash value of the dot matrix anti-counterfeiting mark in the document file and h_2 is the hash value of the verification dot matrix anti-counterfeiting mark;
S7032, calculating the Hamming distance dis_h between h_1 and h_2;
S7033, calculating the similarity between the dot matrix anti-counterfeiting mark in the document file and the verification dot matrix anti-counterfeiting mark from the Hamming distance dis_h.
In this embodiment, many methods exist for calculating the similarity of two images; the hash method based on the discrete cosine transform is used here, but other methods may also be used. When the similarity between the two images is greater than a set similarity value, for example 98%, the two images are considered to be the same.
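The patent names a DCT-based hash, the Hamming distance dis_h and a similarity threshold (98% in the example), but not the hash parameters or the formula that turns dis_h into a similarity. The sketch below follows the widely used perceptual-hash (pHash) recipe: resample the mark to 32 x 32, compute the 8 x 8 lowest-frequency DCT-II coefficients, and set each of the 64 bits by comparing a coefficient with their median. The nearest-neighbour resampling, the sizes and the mapping similarity = 1 - dis_h / 64 are assumptions; the DCT is written out in plain Python only to keep the example dependency-free.

```python
import math
from statistics import median
from typing import List

Bitmap = List[List[int]]  # 1 = black dot, 0 = white dot

N = 32  # side of the resampled image (assumption, as in common pHash setups)
K = 8   # low-frequency block kept -> K * K = 64-bit hash


def _resample(img: Bitmap, n: int = N) -> List[List[float]]:
    """Nearest-neighbour resample to an n x n grid of floats."""
    h, w = len(img), len(img[0])
    return [[float(img[y * h // n][x * w // n]) for x in range(n)] for y in range(n)]


def dct_hash(img: Bitmap) -> int:
    """S7031: compare low-frequency DCT-II coefficients with their median, one bit each."""
    f = _resample(img)
    # cos_tab[u][x] = cos(pi * u * (2x + 1) / (2N)), the unnormalised DCT-II basis
    cos_tab = [[math.cos(math.pi * u * (2 * x + 1) / (2 * N)) for x in range(N)]
               for u in range(K)]
    coeffs = []
    for u in range(K):          # vertical frequency
        for v in range(K):      # horizontal frequency
            coeffs.append(sum(cos_tab[u][y] * cos_tab[v][x] * f[y][x]
                              for y in range(N) for x in range(N)))
    med = median(coeffs)
    bits = 0
    for c in coeffs:
        bits = (bits << 1) | (1 if c > med else 0)
    return bits


def hamming(h1: int, h2: int) -> int:
    """S7032: number of differing bits, dis_h."""
    return bin(h1 ^ h2).count("1")


def similarity(h1: int, h2: int, n_bits: int = K * K) -> float:
    """S7033 (assumed mapping): similarity = 1 - dis_h / n_bits."""
    return 1.0 - hamming(h1, h2) / n_bits


if __name__ == "__main__":
    cross = [[0, 1, 0], [1, 1, 1], [0, 1, 0]]
    ring = [[1, 1, 1], [1, 0, 1], [1, 1, 1]]
    h_1, h_2 = dct_hash(cross), dct_hash(cross)
    print(similarity(h_1, h_2) > 0.98)       # True: identical marks
    print(similarity(h_1, dct_hash(ring)))   # compare against the 98% threshold
```

With a 64-bit hash, a 98% threshold under this mapping tolerates at most one differing bit, so in practice the threshold and the hash length would need to be tuned together.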
As shown in fig. 6, the embodiment of the invention further provides an OCR-based anti-counterfeiting identification system for a document file, which comprises:
the optical character recognition module 100, configured to perform optical character recognition processing on the document file to obtain a black-and-white dot matrix image file;
the page size scaling module 200, configured to determine the number of pages and the page size of the black-and-white dot matrix image file, and to scale the page size to a standard size;
the information random access module 300, configured to randomly retrieve one piece of anti-counterfeiting mark generation information from the anti-counterfeiting mark library according to the number of pages, where the anti-counterfeiting mark generation information includes an anti-counterfeiting mark number, a plurality of anti-counterfeiting mark page numbers and anti-counterfeiting mark coordinates;
the anti-counterfeiting mark generation module 400, configured to determine anti-counterfeiting use areas in the black-and-white dot matrix image file according to the anti-counterfeiting mark page numbers and the anti-counterfeiting mark coordinates, and to stack the plurality of anti-counterfeiting use areas to obtain a dot matrix anti-counterfeiting mark;
the anti-counterfeiting mark adding module 500, configured to add the anti-counterfeiting mark number and the dot matrix anti-counterfeiting mark to the document file;
the anti-counterfeiting identification command module 600, configured to receive an anti-counterfeiting identification command for the document file;
the anti-counterfeiting mark verification module 700, configured to identify and verify the dot matrix anti-counterfeiting mark in the document file and to generate an identification result.
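The anti-counterfeiting mark generation information handled by module 300 could be modelled as below. This is a hypothetical sketch: the class and field names, the (x, y, width, height) box format and the filtering rule (every mark page number must not exceed the document's page count, as the embodiment requires) are assumptions for illustration only.

```python
import random
from dataclasses import dataclass
from typing import List, Optional, Tuple

Box = Tuple[int, int, int, int]   # assumption: (x, y, width, height) on the standard-size page


@dataclass(frozen=True)
class MarkGenInfo:
    """One hypothetical entry of the anti-counterfeiting mark library."""
    mark_number: str
    page_numbers: Tuple[int, ...]   # 1-based pages the mark is sampled from
    coordinates: Tuple[Box, ...]    # one box per page number, same order

    def __post_init__(self):
        if not self.page_numbers or len(self.page_numbers) != len(self.coordinates):
            raise ValueError("page numbers and coordinates must correspond one to one")


def pick_generation_info(library: List[MarkGenInfo], page_count: int,
                         rng: Optional[random.Random] = None) -> MarkGenInfo:
    """Randomly choose an entry whose page numbers all fit within the document."""
    usable = [info for info in library if max(info.page_numbers) <= page_count]
    if not usable:
        raise LookupError("no library entry fits a document with this many pages")
    chooser = rng if rng is not None else random
    return chooser.choice(usable)
```

Because a predictable choice would weaken the scheme, a cryptographically strong source such as `secrets.SystemRandom()` may be preferable as `rng` in practice.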
In the embodiment of the invention, anti-counterfeiting marks are added to and identified on electronic document files. If the document is a paper document file, it must first be scanned to obtain an electronic document file. Optical character recognition processing is then performed on the document file to obtain a black-and-white dot matrix image file; the OCR preprocessing mainly comprises graying, binarization, noise removal and skew correction, and OCR mainly uses optical means to convert the characters in the document into a black-and-white dot matrix image file. Next, the number of pages and the page size of the black-and-white dot matrix image file are read, and the page size is scaled to a standard size, which is a preset fixed value with a fixed length and width. One piece of anti-counterfeiting mark generation information is then randomly retrieved from the anti-counterfeiting mark library according to the number of pages. The generation information comprises one anti-counterfeiting mark number, a plurality of anti-counterfeiting mark page numbers and a plurality of anti-counterfeiting mark coordinates; the page numbers and the coordinates correspond one to one, and each anti-counterfeiting mark page number is less than or equal to the number of pages of the black-and-white dot matrix image file. The anti-counterfeiting mark library stores many different pieces of generation information, and different generation information yields different marks, so the mark is difficult to forge and the anti-counterfeiting effect is good. The anti-counterfeiting use areas in the black-and-white dot matrix image file are then determined according to the anti-counterfeiting mark page numbers and coordinates, and the plurality of anti-counterfeiting use areas are stacked to obtain the dot matrix anti-counterfeiting mark. The dot matrix anti-counterfeiting mark is therefore tied to the content of the document file, so that each document file carries a unique dot matrix anti-counterfeiting mark and the anti-counterfeiting effect is excellent. Finally, the anti-counterfeiting mark number and the dot matrix anti-counterfeiting mark are added to the document file, for example as a watermark or in the header or footer.
When it is necessary to verify whether the dot matrix anti-counterfeiting mark on a document file is counterfeit, an anti-counterfeiting identification command for the document file is input, and the embodiment of the invention identifies and verifies the dot matrix anti-counterfeiting mark in the document file.
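The preprocessing and page scaling described above can be illustrated with a short sketch. It covers only graying, binarization and scaling; noise removal and skew correction are omitted. The ITU-R BT.601 luma weights, the fixed threshold of 128, nearest-neighbour scaling and the `STANDARD_SIZE` value are assumptions, as the text only says the standard size is a preset fixed value.

```python
STANDARD_SIZE = (210, 297)   # hypothetical standard (width, height) in pixels


def to_gray(rgb_page):
    """Greyscale using BT.601 luma weights; rgb_page is rows of (r, g, b) tuples."""
    return [[0.299 * r + 0.587 * g + 0.114 * b for (r, g, b) in row] for row in rgb_page]


def binarize(gray_page, threshold=128):
    """Black-and-white dot matrix: 1 = black dot, 0 = white (fixed threshold assumed)."""
    return [[1 if v < threshold else 0 for v in row] for row in gray_page]


def scale_to_standard(page, size=STANDARD_SIZE):
    """Nearest-neighbour scaling so that every page has the same width and height."""
    width, height = size
    h, w = len(page), len(page[0])
    return [[page[y * h // height][x * w // width] for x in range(width)]
            for y in range(height)]


def preprocess(rgb_pages, size=STANDARD_SIZE):
    """Return (number of pages, pages binarized and scaled to the standard size)."""
    pages = [scale_to_standard(binarize(to_gray(p)), size) for p in rgb_pages]
    return len(pages), pages
```

Otsu thresholding or a proper resampling filter would usually give cleaner dots; the fixed values here only keep the sketch short.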
As shown in Fig. 7, in a preferred embodiment of the present invention, the anti-counterfeiting mark generation module 400 includes:
the dot matrix image determining unit 401, configured to determine, in order of the numerical values of the anti-counterfeiting mark page numbers, the black-and-white dot matrix images corresponding to those page numbers;
the anti-counterfeiting use area unit 402, configured to determine the anti-counterfeiting use area of each black-and-white dot matrix image according to the anti-counterfeiting mark coordinates, the anti-counterfeiting mark page numbers corresponding one to one to the anti-counterfeiting mark coordinates;
the anti-counterfeiting area arrangement unit 403, configured to arrange the resulting anti-counterfeiting use areas according to their corresponding anti-counterfeiting mark page numbers.
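Units 401 to 403 amount to cropping one region per (page number, coordinate) pair and ordering the crops by page number. A minimal sketch, assuming 1-based page numbers, pages given as rows of 0/1 dots and (x, y, width, height) boxes; the function names are hypothetical.

```python
def crop(page, box):
    """Cut the (x, y, width, height) region out of a page given as rows of 0/1 dots."""
    x, y, w, h = box
    return [row[x:x + w] for row in page[y:y + h]]


def collect_use_areas(pages, page_numbers, coordinates):
    """Crop one use area per (page number, box) pair, ordered by page number."""
    if len(page_numbers) != len(coordinates):
        raise ValueError("page numbers and coordinates must correspond one to one")
    pairs = sorted(zip(page_numbers, coordinates), key=lambda pc: pc[0])
    return [crop(pages[p - 1], box) for p, box in pairs]   # page numbers are 1-based
```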
As shown in Fig. 7, in a preferred embodiment of the present invention, the anti-counterfeiting mark generation module 400 further includes:
the anti-counterfeiting image processing unit 404, configured to perform image processing on each arranged anti-counterfeiting use area, converting its white dots into transparent pixels;
the anti-counterfeiting mark generation unit 405, configured to stack all the anti-counterfeiting use areas in the arrangement order to obtain the dot matrix anti-counterfeiting mark.
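Units 404 and 405 can be sketched as below, assuming black dots are 1, white dots are 0, transparency is represented by `None`, and areas of different sizes are aligned at the top-left corner (the text does not say how differently sized areas are aligned). The names are hypothetical.

```python
TRANSPARENT = None   # hypothetical marker for a transparent pixel


def make_transparent(area):
    """White dots (0) become transparent; black dots (1) are kept."""
    return [[1 if dot else TRANSPARENT for dot in row] for row in area]


def stack_areas(areas):
    """Overlay the areas in arrangement order on a white canvas, top-left aligned."""
    height = max(len(a) for a in areas)
    width = max(len(row) for a in areas for row in a)
    canvas = [[0] * width for _ in range(height)]
    for layer in map(make_transparent, areas):      # later layers are drawn on top
        for y, row in enumerate(layer):
            for x, dot in enumerate(row):
                if dot is not TRANSPARENT:          # transparent pixels let lower layers show
                    canvas[y][x] = dot
    return canvas
```

Because every non-transparent pixel is black, the stacked result equals a pixel-wise OR of the areas, so in this simplified model the arrangement order does not change the final mark.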
As shown in Fig. 8, in a preferred embodiment of the present invention, the anti-counterfeiting mark verification module 700 includes:
the information designating and calling unit 701, configured to retrieve the corresponding anti-counterfeiting mark generation information according to the anti-counterfeiting mark number in the document file;
the verification mark generating unit 702, configured to perform optical character recognition processing on the document file to obtain a black-and-white dot matrix image file, and to process that image file according to the anti-counterfeiting mark generation information to obtain a verification dot matrix anti-counterfeiting mark;
the mark similarity comparing unit 703, configured to compare the similarity between the dot matrix anti-counterfeiting mark in the document file and the verification dot matrix anti-counterfeiting mark.
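Units 701 to 703 together form a regenerate-and-compare check. The sketch below is self-contained and therefore repeats a compact crop-and-stack step; it starts from already binarized pages rather than re-running OCR. The dictionary-based library, the function names and the pixel-agreement similarity are hypothetical stand-ins; the DCT-hash similarity described for step S7033 could be passed in through the `similarity` parameter instead.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Page = List[List[int]]                  # rows of dots, 1 = black, 0 = white
Box = Tuple[int, int, int, int]         # assumption: (x, y, width, height)


def regenerate_mark(pages: Sequence[Page], page_numbers: Sequence[int],
                    coordinates: Sequence[Box]) -> Page:
    """Crop the use areas in page order and stack them (black dots OR-ed together)."""
    areas = []
    for p, (x, y, w, h) in sorted(zip(page_numbers, coordinates), key=lambda t: t[0]):
        areas.append([row[x:x + w] for row in pages[p - 1][y:y + h]])
    height = max(len(a) for a in areas)
    width = max(len(r) for a in areas for r in a)
    mark = [[0] * width for _ in range(height)]
    for area in areas:
        for y, row in enumerate(area):
            for x, dot in enumerate(row):
                mark[y][x] |= dot
    return mark


def pixel_agreement(a: Page, b: Page) -> float:
    """Stand-in similarity: fraction of equal dots, 0.0 if the shapes differ."""
    if [len(r) for r in a] != [len(r) for r in b]:
        return 0.0
    same = sum(x == y for ra, rb in zip(a, b) for x, y in zip(ra, rb))
    return same / sum(len(r) for r in a)


def verify_document(mark_number: str, embedded_mark: Page, pages: Sequence[Page],
                    library: Dict[str, Tuple[Sequence[int], Sequence[Box]]],
                    similarity: Callable[[Page, Page], float] = pixel_agreement,
                    threshold: float = 0.98) -> bool:
    """Look up the generation info, rebuild the verification mark and compare."""
    if mark_number not in library:             # unknown mark number: treat as counterfeit
        return False
    page_numbers, coordinates = library[mark_number]
    if max(page_numbers) > len(pages):         # document too short for this entry
        return False
    verification_mark = regenerate_mark(pages, page_numbers, coordinates)
    return similarity(embedded_mark, verification_mark) > threshold
```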
The foregoing describes only preferred embodiments of the present invention and is not intended to limit the invention; any modification, equivalent substitution or improvement made within the spirit and principles of the invention shall fall within its scope of protection.
Other embodiments of the present disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure. This application is intended to cover any variations, uses, or adaptations of the disclosure that follow its general principles, including such departures from the present disclosure as come within known or customary practice in the art to which the disclosure pertains. The specification and examples are to be considered exemplary only, with the true scope and spirit of the disclosure indicated by the following claims.

Claims (9)

1. An OCR-based anti-counterfeiting identification method for a document file, the method comprising the steps of:
performing optical character recognition processing on the document file to obtain a black-and-white dot matrix image file;
determining the number of pages and the page size of the black-and-white dot matrix image file, and scaling the page size to a standard size;
randomly retrieving one piece of anti-counterfeiting mark generation information from an anti-counterfeiting mark library according to the number of pages, wherein the anti-counterfeiting mark generation information comprises an anti-counterfeiting mark number, a plurality of anti-counterfeiting mark page numbers and anti-counterfeiting mark coordinates;
determining anti-counterfeiting use areas in the black-and-white dot matrix image file according to the anti-counterfeiting mark page numbers and the anti-counterfeiting mark coordinates, and stacking the plurality of anti-counterfeiting use areas to obtain a dot matrix anti-counterfeiting mark;
adding the anti-counterfeiting mark number and the dot matrix anti-counterfeiting mark to the document file;
receiving an anti-counterfeiting identification command for the document file;
and identifying and verifying the dot matrix anti-counterfeiting mark in the document file, and generating an identification result.
2. The OCR-based anti-counterfeiting identification method for a document file of claim 1, wherein the step of determining the anti-counterfeiting use areas in the black-and-white dot matrix image file according to the anti-counterfeiting mark page numbers and the anti-counterfeiting mark coordinates specifically comprises:
determining, in order of the numerical values of the anti-counterfeiting mark page numbers, the black-and-white dot matrix images corresponding to those page numbers;
determining the anti-counterfeiting use area of each black-and-white dot matrix image according to the anti-counterfeiting mark coordinates, wherein the anti-counterfeiting mark page numbers and the anti-counterfeiting mark coordinates correspond one to one;
and arranging the obtained anti-counterfeiting use areas according to their corresponding anti-counterfeiting mark page numbers.
3. The OCR-based anti-counterfeiting identification method for a document file of claim 2, wherein the step of stacking the plurality of anti-counterfeiting use areas to obtain a dot matrix anti-counterfeiting mark specifically comprises:
performing image processing on each arranged anti-counterfeiting use area to convert its white dots into transparent pixels;
and stacking all the anti-counterfeiting use areas in the arrangement order to obtain the dot matrix anti-counterfeiting mark.
4. The OCR-based anti-counterfeiting identification method for a document file of claim 1, wherein the step of identifying and verifying the dot matrix anti-counterfeiting mark in the document file specifically comprises:
retrieving the corresponding anti-counterfeiting mark generation information according to the anti-counterfeiting mark number in the document file;
performing optical character recognition processing on the document file to obtain a black-and-white dot matrix image file, and processing the black-and-white dot matrix image file according to the anti-counterfeiting mark generation information to obtain a verification dot matrix anti-counterfeiting mark;
and comparing the similarity between the dot matrix anti-counterfeiting mark in the document file and the verification dot matrix anti-counterfeiting mark.
5. The OCR-based anti-counterfeiting identification method for a document file of claim 4, wherein the step of comparing the similarity between the dot matrix anti-counterfeiting mark in the document file and the verification dot matrix anti-counterfeiting mark comprises:
calculating the hash values of the dot matrix anti-counterfeiting mark in the document file and of the verification dot matrix anti-counterfeiting mark, respectively, using a hash method based on the discrete cosine transform, to obtain h_1 and h_2;
calculating the Hamming distance dis_h between h_1 and h_2;
and calculating, from the Hamming distance dis_h, the similarity between the dot matrix anti-counterfeiting mark in the document file and the verification dot matrix anti-counterfeiting mark.
6. An OCR-based anti-counterfeiting identification system for a document file, the system comprising:
an optical character recognition module, configured to perform optical character recognition processing on the document file to obtain a black-and-white dot matrix image file;
a page size scaling module, configured to determine the number of pages and the page size of the black-and-white dot matrix image file and to scale the page size to a standard size;
an information random access module, configured to randomly retrieve one piece of anti-counterfeiting mark generation information from an anti-counterfeiting mark library according to the number of pages, wherein the anti-counterfeiting mark generation information comprises an anti-counterfeiting mark number, a plurality of anti-counterfeiting mark page numbers and anti-counterfeiting mark coordinates;
an anti-counterfeiting mark generation module, configured to determine anti-counterfeiting use areas in the black-and-white dot matrix image file according to the anti-counterfeiting mark page numbers and the anti-counterfeiting mark coordinates, and to stack the plurality of anti-counterfeiting use areas to obtain a dot matrix anti-counterfeiting mark;
an anti-counterfeiting mark adding module, configured to add the anti-counterfeiting mark number and the dot matrix anti-counterfeiting mark to the document file;
an anti-counterfeiting identification command module, configured to receive an anti-counterfeiting identification command for the document file;
and an anti-counterfeiting mark verification module, configured to identify and verify the dot matrix anti-counterfeiting mark in the document file and to generate an identification result.
7. The OCR-based anti-counterfeiting identification system for a document file of claim 6, wherein the anti-counterfeiting mark generation module comprises:
a dot matrix image determining unit, configured to determine, in order of the numerical values of the anti-counterfeiting mark page numbers, the black-and-white dot matrix images corresponding to those page numbers;
an anti-counterfeiting use area unit, configured to determine the anti-counterfeiting use area of each black-and-white dot matrix image according to the anti-counterfeiting mark coordinates, wherein the anti-counterfeiting mark page numbers correspond one to one to the anti-counterfeiting mark coordinates;
and an anti-counterfeiting area arrangement unit, configured to arrange the obtained anti-counterfeiting use areas according to their corresponding anti-counterfeiting mark page numbers.
8. The OCR-based anti-counterfeiting identification system for a document file of claim 7, wherein the anti-counterfeiting mark generation module further comprises:
an anti-counterfeiting image processing unit, configured to perform image processing on each arranged anti-counterfeiting use area, converting its white dots into transparent pixels;
and an anti-counterfeiting mark generation unit, configured to stack all the anti-counterfeiting use areas in the arrangement order to obtain the dot matrix anti-counterfeiting mark.
9. The OCR-based anti-counterfeiting identification system for a document file of claim 6, wherein the anti-counterfeiting mark verification module comprises:
an information designating and calling unit, configured to retrieve the corresponding anti-counterfeiting mark generation information according to the anti-counterfeiting mark number in the document file;
a verification mark generation unit, configured to perform optical character recognition processing on the document file to obtain a black-and-white dot matrix image file, and to process the black-and-white dot matrix image file according to the anti-counterfeiting mark generation information to obtain a verification dot matrix anti-counterfeiting mark;
and a mark similarity comparison unit, configured to compare the similarity between the dot matrix anti-counterfeiting mark in the document file and the verification dot matrix anti-counterfeiting mark.
CN202310645347.0A 2023-06-02 2023-06-02 Anti-counterfeiting identification method and identification system for file files based on OCR Active CN116363492B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310645347.0A CN116363492B (en) 2023-06-02 2023-06-02 Anti-counterfeiting identification method and identification system for file files based on OCR

Publications (2)

Publication Number Publication Date
CN116363492A true CN116363492A (en) 2023-06-30
CN116363492B CN116363492B (en) 2023-08-04

Family

ID=86928599

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310645347.0A Active CN116363492B (en) 2023-06-02 2023-06-02 Anti-counterfeiting identification method and identification system for file files based on OCR

Country Status (1)

Country Link
CN (1) CN116363492B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN2635368Y (en) * 2003-08-08 2004-08-25 海南亚元防伪纸业有限公司 Digital texture antiforgery paper
RU2776828C1 (en) * 2019-01-27 2022-07-27 Хайнань Пайпайкань Информэйшн Технолоджи Ко., Лтд. Method for counterfeit-protected packaging for product liability insurance
CN114943067A (en) * 2022-07-13 2022-08-26 河北汇金集团股份有限公司 Anti-counterfeiting identification method for file archive based on OCR technology
WO2023065285A1 (en) * 2021-10-22 2023-04-27 拍拍看(海南)人工智能有限公司 Texture-based anti-counterfeiting method for fabric, and fabric
CN116029777A (en) * 2022-12-23 2023-04-28 北京菱云科技有限公司 Anti-counterfeiting bill generation method, device, equipment and medium

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
KAI WANG; JIANMING JIN; QINGREN WANG: "High Performance Chinese/English Mixed OCR with Character Level Language Identification", 2009 10th International Conference on Document Analysis and Recognition, pages 406-410 *
WANG Pan; CHEN Changqing; JIN Yuxin (王攀, 陈长晴, 靳雨欣): "Anti-counterfeiting and Traceability of Personnel Archives under Blockchain Technology" (区块链技术下人事档案防伪溯源), Lantai World (兰台世界), pages 70-72 *

Also Published As

Publication number Publication date
CN116363492B (en) 2023-08-04

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant